Tuesday, October 13, 2009

The Main Loop as a Graph Traversal

In an earlier post, I talked about treating game objects as a list of game components, with each component implementing a specific feature of the object. That way, I can call GameObject.update() and get a different result for each game object instance depending on which components have been inserted.

In fact, in Replica Island I apply that same idea to the entire game loop. The whole simulation step can be completed with a single call to MainLoop.update(). Though it's not a game object, the MainLoop object is similar to GameObject it that it simply contains a list of things to update. Those things might be GameObjects, or they might be some other kind of object that can be updated--the main loop doesn't have to know. Things that are inserted into that list will be polled every frame; that's all the MainLoop object does.

For example, there's a bit of code that needs to work out the current state of all the hardware inputs--touch screen, trackball, keyboard, d-pad, etc. That code needs to run every frame, but it's not a game object. As long as it has the right base class and implements update(), it just needs to be added to the MainLoop's list to be run every frame. Then there's the renderer; once all the game objects are finished this object needs to send the draw commands for the frame to the rendering thread (also described in a previous post), so it too can be inserted at the end of the MainLoop's list.

And in fact, systems that need to update every frame can themselves manage other systems that need to update every frame.

For example, game objects can be added directly to the MainLoop's list, but with one or two exceptions I never do that. Instead, a special system, called the GameObjectManager, is run by MainLoop every frame, and that object contains a list of GameObjects which it runs in its update() function. The reason that I have inserted this GameObjectManager class is that not all game objects should be updated every frame. Only a subset--the set that are within some radius from the camera--should be run. The rest should remain inactive to save CPU cycles. So the GameObjectManager, when updated by MainLoop, selects a subset of the GameObjects that it controls and updates them based on the position of the camera.

If you hadn't guessed already, the structure I am describing here is a tree. The MainLoop is the root node of this tree, and its children are things like the input system, render system, and GameObjectManager--bits of code that need to run every frame. The GameObjectManager has all of the game objects as its children, but it stipulates that not all of its children will be visited every traversal. The game objects themselves contain game components as their children; the game components are the leaf nodes of this kind of tree. So, to run the simulation step for the current frame, I just traverse the tree.

Actually, to be precise the structure that I am describing is a graph. The reason it must be a graph rather than a tree is that the structure allows for instancing of subgraphs and of cross-level traversals. For example, certain types of GameComponents that don't need to track state from frame to frame can be shared across multiple game objects; in that case, only one instance of the component exists but it is inserted into a number of different game object instances. In graph terms, shared GameComponents are nodes with multiple parents. However, for the general case the structure behaves like a tree, and so it's pretty safe to think about it that way.

I like using a graph to describe all the work that must be done in a frame because it's an extremely flexible way to set up a main loop. The MainLoop object hasn't changed since I originally wrote it; though the number of objects that it contains has increased, the management code itself has remained the same. For the next game, I can rip out the individual systems that I don't need any longer and insert new ones without altering any of the main program architecture.

This type of graph structure can also give you precise control over how your simulation step is run. Say you want to pause the game, but you need key systems (such as the renderer) to continue operating so that you can run the pause UI graphics. With a tree or graph system, you can insert "pausable" nodes into the tree and append to them children that should stop when the game is paused. At runtime these nodes will simply not traverse their children if the game is paused. This kind of control is hard to thread into a game that is already up and running using traditional hard-coded methods; it usually results in a lot of switches on the g_paused variable littered throughout the code base. With a graph, none of the actual simulation code needs to change--only the graph structure is modified to accommodate pausing.

Another advantage is that it's pretty easy to drive this sort of system with data. Though I haven't done this in Replica Island yet, on previous games I've worked with systems in which the entire runtime for the game is loaded from a file in the form of a main loop graph; you can see how such a structure would be pretty easy to describe in XML, and you could even use Java's reflective properties to automatically instantiate the various systems that live in the tree. Once the graph is described in data, you can change it easily from game to game, or even from level to level if necessary, all with general-purpose infrastructure code. I've not done that with Replica Island yet, but I will eventually--probably after the game ships.

Game graphs are not specific to Android, but I use them a lot and I find them a pretty powerful (and generally underrated) pattern for managing real-time update loops. Like the GameComponent system, they leave the door open to future revision by separating data structures from code. This kind of system is also pretty simple to write (my entire graph is based on two core classes, a node and a group node). Of course, for small projects they are probably overkill--it is likely faster and less error prone to just write a traditional main loop and update the code every time you need to change something. But for medium or large projects, or projects based on a codebase that is intended to be reusable across many different titles, game graphs are a pretty neat way to structure your frame.

Friday, October 2, 2009

Rendering With Two Threads

The Replica Island renderer is based heavily on the GLSurfaceView class that ships with the Android SDK. I've made a couple of modifications but the code is pretty similar to the regular version: a derivation of GLSurfaceView.Renderer that draws the frame gets called every frame, followed by a call to eglSwapBuffers() to actually display the rendered frame.

GLSurfaceView provides a way to run user code in the same thread as the renderer. This makes writing games pretty easy; you can just implement a Runnable, implement a Renderer, stick them both into a GLSurfaceView and get stuff moving around on the screen. Indeed, it's more than sufficient for many applications; my SpriteMethodTest demo works this way just fine.

But for Replica Island I took a different approach. The problem with the single GLSurfaceView thread is that eglSwapBuffers() must block on the hardware until the previous frame finishes drawing. That means that even if you have nothing to draw, a call to eglSwapBuffers() takes 16.67ms to complete. (And of course, if you have a lot to draw, it could take a lot longer).

Now, just in case you are not used to thinking in terms of milliseconds, here's a quick primer. To achieve the magical "60 frames per second" that many games strive for, you need to have a new frame displayed to the user every 16.67 ms. If you go for 30 fps, you have ~32 ms to complete a frame. All your game code, plus all your OpenGL code, plus the actual time it takes to draw the frame must fit within 16.67 ms to achieve 60fps.

In Replica Island, the game code is fairly heavy-weight. I have all that collision to run, plus updates of all the active entities on the screen, plus sound playback and all that jazz. Turns out that it's usually more work to calculate a single simulation step than it is to actually draw the frame. Since this code takes time to execute, the 16 ms block that eglSwapBuffers() incurs makes it really hard to hit 60 fps. What I really want to be able to do is run game code while eglSwapBuffers() is blocking; that way I can pipeline the game updates while the hardware is busy drawing the frame.

So I split the game code off into a separate thread. This makes three threads, by the way: the main UI thread that all Activities have by default, the GLSurfaceView render thread, and this new game thread (actually, there are a few more that are generated by the system for things like orientation sensor updates, but they don't affect the equation much). Now my game code and my renderer can run asynchronously, and I win back some of that time spent in eglSwapBuffers().

Now comes the tricky part. I have two threads running in parallel that need to sync up once a frame so that the game thread can tell the render thread what to do. There's a lot of ways to go about synchronizing these two threads, but I went with a double buffer solution. The game thread fills up a buffer of commands to draw the next frame, and when it is ready it waits for the render thread to begin the next frame. At that point, the buffer is passed to to the render, which can then go off and draw the next frame asynchronously. The buffer that was used to draw the last frame is passed back to the game thread, which fills it up again the next frame. So drawing is the process of swapping these two buffers back and forth during a (hopefully short) choke point at which both threads stop and communicate.

This solution was attractive to me because it was simple, and so far it seems to be plenty fast. However, another solution might be to have a queue that is shared by both threads, with the game thread pushing commands in one end and the renderer executing commands out of the other. In theory such a solution wouldn't need both threads to ever perfectly align--blocking would only occur when one thread or the other was starved. But I haven't done this yet because it is going to be significantly more complex than the double buffer.

My render commands are objects that are allocated out of pools that the game thread owns, and must be returned to those pools when they have been drawn. In the double buffer system, the queue that is returned from the render thread contains commands that can be safely returned to their pools, but in the shared queue system there's no obvious way for the game thread to know how much has been drawn. I suppose there could be two shared queues, one in each direction, but that would still be a lot more complicated than what I have now. Right now almost no code outside of the buffer swap system knows about other threads; the pool objects and the objects they contain are not thread safe and, as it stands, don't need to be.

Is my solution the best for Android apps? I don't know. It seems to work pretty well and it is uncomplicated, which are two points in its favor. Still, I'd like to give this shared queue idea a shot at some point; my gut tells me that it will be slightly faster than the double buffer (less blocking in the average case) but a lot more complex, which might make it not worth the effort. Programmer guts are, however, extremely unreliable, so I will probably give this method a shot after Replica Island ships.