
A presentation on implementing save games

I recently gave a private presentation on how to implement save games. This is the blog post based on that. I wrote the presentation pretty quickly, so this is not the best structured blog post ever, but hopefully it's useful to someone.

Should you listen to me?

I don't consider myself an expert on save games, but I have worked on them at various stages in my career.

Amberstar, Ambermoon, and Albion, the first three games I worked on, were big CRPGs. The first two games saved to floppy disk, Albion only saved to hard drive. The logic wasn't very complicated, because these games did not use dynamic data structures. The save game consisted of fixed data structures and simple arrays.

As part of my work on the mission systems in Watch Dogs: Legion I also worked on the save games. I did not work on the save game system itself: that was legacy code. But I did spend a lot of time on the code for saving and loading mission data. My number one worry on Legion was something going wrong with the save games. If you mess up mission save data, you can erase enormous amounts of progress, or create corrupted or stuck save games. This actually happened on Amberstar, and back then I had to explain this directly to players, on the phone.

Quite recently I designed and implemented a save game system from scratch. The game that it's a part of has not been released, so it's hard to say if it would survive the crucible of shipping. But it did work during development, and, most encouragingly, other programmers used it, and said nice things about it.

Assumptions

Here are my assumptions for the rest of this blog post:

  • You've got a decently complicated game, with multiple programmers working on multiple systems over a period of years.
  • You want console support and online support.
  • Your game is multiplayer.
  • Your game is open world.
  • The game supports auto-saving.
  • The whole save game fits in memory.
  • Your engine has some kind of system to serialize and deserialize arbitrary data structures to something saveable. In the rest of this blog post I am going to assume we're using JSON, but it doesn't have to be that.

Goals

The key goals for a save game system:

  • Robustness. Data needs to be accurate and correct. This is more important than recency! It's better to have a 5-minute-old save game that is correct than a 30-second-old save game that is corrupted.
  • The API is easy to use. It should be hard to do the wrong thing.
  • It meets certain performance goals: CPU, bandwidth, and memory.
  • It has backwards compatibility: the game can load a save game from an older version of the game.

I am not listing forwards compatibility - an older version of the game can load a save game from a newer version - as a goal. I think it's reasonably rare, but certain platforms may require it. My assumption is that forwards compatibility is just a more annoying version of backwards compatibility, but perhaps I am wrong.

Architecture

The architecture I designed works like this:

  • You have persistent systems: game systems that need to save something. Player data, NPCs, missions, world data, etc.
  • Then you have a save game manager which holds the central save game logic. This is what the persistent systems talk to, and in turn the save game manager will ask the persistent systems to save, at its convenience.
  • Underneath all that you have the storage layer, which actually takes a save game and gets it where it needs to be.

Persistent systems

Each persistent system gets a conceptual "bucket" to put its data into. You can imagine the whole save game as a JSON object, and each system has a key where it can store its data as a sub-object.
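For example, a whole save game might look something like this (all keys and values invented for illustration):

```json
{
  "player":   { "version": 3, "health": 87, "position": [12.5, 0.0, -3.1] },
  "npcs":     { "version": 2, "alive": { "guard_01": true, "vendor_02": false } },
  "missions": { "version": 5, "states": { "intro": "completed", "heist": "running" } }
}
```

Each top-level key is one system's bucket; no system ever writes into another system's bucket.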

Each persistent system needs to rely on other systems, rather than duplicate their data. This was a key lesson I learned on Legion. If a mission involved a certain person or car or whatever in the world, then the mission system had to rely on some other system saving the data for that.

(There is, in fact, a lot of subtlety in the design of save games - which data gets saved, which data gets reset when you load, or - in an open world game - when you move away and come back. But I am not getting into that here, except that as a programmer you should be aware that there should be a design around all this.)

This reliance involves a lot of communication on bigger teams, so that everyone understands what the mutual save game requirements are.

Each persistent system registers itself with the save game manager, and provides:

  • Some kind of nice name for logging and debugging.
  • A storage key (the name of the JSON sub-object I mentioned above).
  • Multiplayer complexities, which I will mention further down.
  • The minimum supported version number, for this system's bucket.
  • The current version number, again for this system's bucket.
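Put together, the registration data might look like this sketch (all names are mine, not from any particular codebase):

```cpp
#include <cstdint>
#include <string>

// What a persistent system hands to the save game manager when it
// registers. All names here are illustrative.
struct PersistentSystemInfo {
    std::string debugName;      // nice name for logging and debugging
    std::string storageKey;     // key of this system's JSON sub-object
    uint32_t    minVersion;     // oldest bucket version we can still load
    uint32_t    currentVersion; // version we write today
    // multiplayer access flags would go here as well
};

// A persistent system implements something like this; the manager calls it.
struct IPersistentSystem {
    virtual ~IPersistentSystem() = default;
    virtual PersistentSystemInfo GetInfo() const = 0;
    virtual std::string Serialize() = 0;              // produce bucket JSON
    virtual void Deserialize(const std::string&) = 0; // consume bucket JSON
    virtual void ResetToDefault() = 0;                // no/invalid save game
};

// Example: a minimal system using the interface.
struct ExampleNpcSystem : IPersistentSystem {
    PersistentSystemInfo GetInfo() const override {
        return {"NPCs", "npcs", 1, 3};
    }
    std::string Serialize() override { return "{}"; }
    void Deserialize(const std::string&) override {}
    void ResetToDefault() override {}
};
```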

That leads me to the first best practice.

Best practice: Version all the things.

Add version numbers to everything. Nothing is more annoying than having to infer the version from existing data. I've had to do it more than once.

The way I like to do it, in C++, is to define macros for each version, and give those meaningful names. So version 0 would be FIRST_SAVE_GAME_VERSION_NUM, version 1 would be NOW_WITH_NPCS_VERSION_NUM, etc. Just something that means you're effectively auto-documenting your versions. If you ever look at a raw save game and wonder what version "7" means, you can look at your macros.
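A minimal sketch of that convention (the version names are invented):

```cpp
// One macro per save game version, named after what changed.
#define FIRST_SAVE_GAME_VERSION_NUM        0
#define NOW_WITH_NPCS_VERSION_NUM          1
#define MISSION_STATE_REWORK_VERSION_NUM   2
#define INVENTORY_TRANSACTIONS_VERSION_NUM 3

// Always write the newest version.
#define CURRENT_SAVE_GAME_VERSION_NUM INVENTORY_TRANSACTIONS_VERSION_NUM
```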

How does saving work?

When data that needs to be saved changes, the relevant persistent system marks itself as dirty with the save game manager. There are two levels of dirtiness:

  1. "Save me now!"
  2. "Next time you save, ask me for fresh data."
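Those two levels can be modeled as a simple escalating enum; here's a sketch with invented names:

```cpp
#include <algorithm>
#include <cstdint>

// The two levels of dirtiness, plus "clean". Names are illustrative.
enum class DirtyLevel : uint8_t {
    Clean = 0,
    SaveOnNextSave = 1, // "next time you save, ask me for fresh data"
    SaveNow = 2,        // "save me now!"
};

// Marking dirty only ever escalates; a SaveNow request must not be
// downgraded by a later SaveOnNextSave request.
inline DirtyLevel Escalate(DirtyLevel current, DirtyLevel requested) {
    return std::max(current, requested);
}
```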

Some data changes are not worth doing a save for. This ties into the design requirements, as well as performance considerations, which I will get to in a bit.

Eventually (! see below) the save game manager will ask all systems marked as dirty to serialize their save data into some serialization container.

The save game manager will tell the storage layer there's new save game data.

Again, eventually (!) the storage layer will actually save that data.

"Eventually"??

This is key for performance tuning in a game that auto-saves. You don't want to save to disk every frame, you may not be allowed to save to console memory too often, you don't want to hammer your back end with continuous updates.

I can't give you a good rate for saving, mostly because I don't remember, but the concept of dirty data, the dirty levels, and the existence of the storage layer are all there to effectively throttle saving.

Note that:

  • the save game manager needs to run some logic each frame, or every couple of frames.
  • the save game manager needs some kind of flush function so that when you for example quit the game, you can force a save.
  • the "one system relying on another" point I mentioned earlier means that if, say, the mission system marks itself as dirty, the NPC management system must do the same - the next time the save game manager saves mission data, the NPC save data needs to be in sync with it.
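The throttling decision itself can be as simple as this sketch (the interval is a placeholder, not a recommendation):

```cpp
// Decide each frame whether the manager should gather dirty systems now.
// The 30-second interval is invented; tune it against your platform's
// storage rules and your back end's tolerance.
struct SaveThrottle {
    double intervalSeconds = 30.0;
    double lastSaveTime    = 0.0;

    bool ShouldSave(double now, bool anySystemSaysSaveNow) const {
        if (anySystemSaysSaveNow)
            return true; // "save me now" bypasses the throttle
        return (now - lastSaveTime) >= intervalSeconds;
    }
};
```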

How does loading work?

During initialization, the save game manager asks the storage layer for the save game of the player being loaded. (I am skipping some potential multiplayer logic here.)

If there is one, load it. Then check if it's valid. Do you have data for all systems? If not, is that a problem, or just a new post-launch system that can handle an empty save game? Are all the version numbers greater than or equal to the minimum version number?

Assuming that is all fine, iterate over all the persistent systems (you'll probably want to do this in the same order each time...) and ask each of them to deserialize and process their data from their respective bucket.

If there is no save game, or no valid save game, ask all persistent systems to reset to a default state. Maybe some systems don't need that, maybe they do - they can decide.
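The validity check described above might look roughly like this, treating the loaded save game as a map from storage keys to bucket versions (all names invented):

```cpp
#include <cstdint>
#include <map>
#include <string>

// Per-bucket version limits as registered by each persistent system.
struct BucketVersions { uint32_t minSupported; uint32_t current; };

// Validate a loaded save game: every bucket we find must be at least
// the minimum supported version for its system. Missing buckets are
// allowed here (e.g. a system added post-launch); each system decides
// what an empty bucket means. Rejecting unknown buckets is a policy
// choice, not a law.
inline bool IsSaveGameValid(
    const std::map<std::string, uint32_t>& savedVersions,    // key -> version
    const std::map<std::string, BucketVersions>& registered) // key -> limits
{
    for (const auto& [key, version] : savedVersions) {
        auto it = registered.find(key);
        if (it == registered.end())
            return false; // data for a system we no longer have
        if (version < it->second.minSupported)
            return false; // too old to migrate
    }
    return true;
}
```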

How many copies?

You may ask: How many copies of save data are we talking about here?

I decided that the save game manager would not keep a copy of the save game. It passes data between the persistent systems and the storage layer.

The storage layer will likely need to do throttling. It might receive several save games before it decides it can save to disk, or send it to a server, or what have you. So it will likely need to do some juggling to make sure it always has the latest copy of the right bucket.

The persistent systems will likely also cache save game data, but for a different reason. If your game's entire world run-time data does not fit into memory, such as in an open world game but definitely not just those, you will need to support spatial persistence (the player comes back and things are the way they were), as well as session persistence (the player turns their machine back on and things are the way they were).

These two concepts are not the same, and I strongly encourage you to treat them as different.

This can mean (without getting into the weeds here) some persistent systems may cache save game data for areas the player is not in, so those can be turned into full run-time data structures the next time the player goes there.

Does that sound complicated? It nearly melted my brain. Sadly, I was not able to work on it long enough to see if better patterns arose out of regular usage. (If you have a simpler solution: write a blog post about it and send me the link please.)

Key point: Start early

I found out the hard way that adding saving and loading to an existing system is hugely annoying. Like, rewriting legacy player initialization flows annoying. Even adding it to a system you yourself wrote from scratch a few months earlier is annoying.

You will introduce new code paths, for save game versus no save game. That is why I introduced the "reset to default" step above, so at least that's a predictable step in the code path for each persistent system.

You will suddenly need access to save game data deep inside systems. Mission initialization, at least in the last few games where I worked on this, can involve a lot of layers of code, node graphs, and recursive functions. To thread access to the current mission's save game through all of those layers can be a huge pain.

At a higher level, adding loading to the game adds an asynchronous step to your game initialization. Initializing all of your systems may have been a nice, synchronous function. Now you suddenly have to wait for data to arrive from who knows where, and your whole game is in limbo.

Pitfall: Saving before you are done loading

Saving before you are done loading sounds stupid, but it can happen when you have multiple systems starting up individually.

This results in a guaranteed save game wipe, because when the save game manager asks your persistent system to save and you haven't loaded yet: welp, you're going to be saving empty data. Ask me how I know.

On Legion, I asserted against this in the persistent system. In the later project, I protected against it in the save game manager.

Best practice: Assert all the things

Enforcing contracts like the one above is a key reason for the save game manager to exist. I had a lot of asserts in there.

You are not writing some arbitrary gameplay code: you are writing a distributed database that holds the most valuable data in the game, data that can have a longer lifetime than the game itself.

Be very clear about what happens when and in which order, what is OK and what isn't, and enforce it.

Best practice: Avoid implicit information or dependencies on static data

This is about the nitty-gritty of serializing data. You want to avoid relying on some kind of implicit information that might change over time.

If you have an unsigned integer that holds a bunch of flags: don't save that integer, save individual boolean variables. Yes, that is a lot of boilerplate, but you WILL be grateful when you have to handle old save game versions, or when you have to debug save game data. And it means you can change the bit indices of those flags without breaking saves.
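Here's one way to sketch that, using a plain map to stand in for the JSON object (the flag names are invented):

```cpp
#include <map>
#include <string>

// Run-time representation: packed flags are fine in memory.
enum MissionFlags : unsigned {
    kMissionSeen   = 1u << 0,
    kMissionPinned = 1u << 1,
};

// For saving, write each flag under its own name instead of the raw
// integer, so bit indices can change without breaking old saves.
inline std::map<std::string, bool> SerializeFlags(unsigned flags) {
    return {
        {"seen",   (flags & kMissionSeen)   != 0},
        {"pinned", (flags & kMissionPinned) != 0},
    };
}

inline unsigned DeserializeFlags(const std::map<std::string, bool>& saved) {
    unsigned flags = 0;
    // Missing keys default to false, which covers older save versions.
    auto get = [&](const char* key) {
        auto it = saved.find(key);
        return it != saved.end() && it->second;
    };
    if (get("seen"))   flags |= kMissionSeen;
    if (get("pinned")) flags |= kMissionPinned;
    return flags;
}
```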

Save IDs of things even if you don't need them at run-time. Let's say for each mission you have a value that indicates whether the mission has not yet started, is running, or has been completed, and the list of missions comes from the data created by mission designers. That list is static in the context of a single run of the game. But in the context of a save game it's not static at all! Missions get added, removed, perhaps rearranged.

Don't just iterate over your list and save the values, save the ID of the mission as well.

Expect to write a lot of logic where you reconcile the save game with static data. In other words, you need to handle both of these cases:

  1. You have save game data for this NPC, but the NPC doesn't exist.
  2. You have static data for this NPC, but no save game data.
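A sketch of that reconciliation for the mission example (IDs and states invented):

```cpp
#include <map>
#include <set>
#include <string>

enum class MissionState { NotStarted, Running, Completed };

// Reconcile saved mission states (keyed by mission ID, not list index)
// with the current static mission list. This handles both mismatch
// cases: saved data for a removed mission is dropped, and a new mission
// with no saved data starts at the default state.
inline std::map<std::string, MissionState> Reconcile(
    const std::set<std::string>& staticMissionIds,
    const std::map<std::string, MissionState>& saved)
{
    std::map<std::string, MissionState> result;
    for (const auto& id : staticMissionIds) {
        auto it = saved.find(id);
        result[id] = (it != saved.end()) ? it->second
                                         : MissionState::NotStarted;
    }
    // Saved entries whose ID is not in staticMissionIds are silently
    // dropped here; a real game might want to log those.
    return result;
}
```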

Best practice: Think of the developer experience

Make it easy to start with a new save game, for developers, for QA, for publishers. This can be tricky: publisher builds may not have access to debug commands, for example.

Auto-wipe save games when data changes. Full backwards save game compatibility ideally happens late in the project: before that it is a drag. It can and should happen before players get their hands on it. You will want alpha testers to play long sessions, you will want QA to be able to build long-running save games.

When to turn on full backwards save game compatibility is an important decision that everyone on the team needs to be aware of. Gameplay programmers need to prepare for it and test their systems, and everyone else needs to know they can now depend on their save games not being wiped.

Make it easy to find save games locally. Make it easy to attach them to bug reports! I heard of one project that put their save game into the meta-data of an image file, which is a super cool idea, because most users know how to upload an image.

Multiplayer

The save game system I built had some support for multiplayer. Our game assumed a host-centric approach to multiplayer and the game logic, including save games, running on a server. That is not the only way to do multiplayer!

To support the host-centric approach, I let each system say whether they needed read and/or write access to the host and non-host player data. That was mostly there for asserting that a system that should only save and load for the host would not try to do anything bad with other players' data.
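Those access declarations could be as simple as this sketch (names invented):

```cpp
// Declared once at registration; checked by the save game manager
// before it hands a system another player's bucket.
struct MultiplayerAccess {
    bool hostRead = false,    hostWrite = false;
    bool nonHostRead = false, nonHostWrite = false;
};

inline bool MayWrite(const MultiplayerAccess& a, bool isHostPlayer) {
    return isHostPlayer ? a.hostWrite : a.nonHostWrite;
}
```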

Multiplayer save games are complicated, but not necessarily on a technical level. The system (the save game manager and storage layer) can only do so much. I recommend trying to design simple rules for how saves work in multiplayer, so that both developers and users understand what will happen.

Transactions

What if you have a back end that absolutely needs transactions, not just snapshots? It simply doesn't provide an API where you can say "here's the whole inventory", but only allows you to send requests to add and remove items, say? And you have to use this back end for contractual reasons? Hypothetically speaking?

I decided to use the following approach:

  • The persistent system in question needs to keep track of the transactions. That's a bit of boilerplate but not particularly hard.
  • It then stores those as an array with a pre-defined name (e.g. "persistentTransactions") in the save game serialization object.
  • The save game manager doesn't care: it just passes through the serialized data.
  • The storage layer may need to do some juggling when it combines multiple save games. It can take the latest version of the normal data, but it needs to concatenate the transaction arrays.
  • The back end then peels off the transactions, sends them to the transactional API, and saves the rest as JSON.
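The storage-layer juggling for transactions might be sketched like this (a heavily simplified save game structure):

```cpp
#include <string>
#include <vector>

// A grossly simplified save game: a snapshot blob plus a list of
// pending transactions under a well-known name.
struct BucketData {
    std::string snapshot;                  // latest "normal" data wins
    std::vector<std::string> transactions; // must be concatenated, not replaced
};

// Combine an older queued save with a newer one, as the storage layer
// might do while throttling.
inline BucketData Combine(const BucketData& older, const BucketData& newer) {
    BucketData out;
    out.snapshot = newer.snapshot;         // take the latest snapshot
    out.transactions = older.transactions; // keep every transaction, in order
    out.transactions.insert(out.transactions.end(),
                            newer.transactions.begin(),
                            newer.transactions.end());
    return out;
}
```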

Conclusion

There's a lot of bits I haven't dug into, like back end API design. I wasn't involved in the implementation of that on that recent project, but I was in a lot of discussions about it. I also didn't talk about the nitty-gritty of multiplayer host-centric mission save games. Or multiple save slots, or keeping backups around.

But I hope you found this useful. Please let me know if you do :)