The common approach for taking advantage of multiple cores is, frankly, just plain misguided. Separating your subsystems into different threads does split some of the work across multiple cores, but it has some major problems. First, it’s very hard to work with. Who wants to muck around with locks, synchronization, and inter-thread communication when they could just be writing straight-up rendering or physics code instead? Second, the approach doesn’t actually scale. At best, it will let you take advantage of maybe three or four cores, and that’s if you really know what you’re doing. There are only so many subsystems in a game, and of those, even fewer take up large chunks of CPU time. There are a couple of good alternatives that I know of.
One is to have a main thread along with a worker thread for each additional CPU. Regardless of subsystem, the main thread delegates isolated tasks to the worker threads via some sort of queue (or queues); these tasks may themselves create further tasks. The sole purpose of each worker thread is to grab tasks from the queue one at a time and perform them. The most important part, though, is what happens when a thread needs the result of a task: if the task has completed, it takes the result; if not, it safely removes the task from the queue and performs that task itself. In other words, not every task ends up being scheduled in parallel with the others. Having more tasks than can be executed in parallel is a good thing in this case; it means the design is likely to keep scaling as you add more cores. One downside is that it requires a lot of work up front to design a decent queue and worker loop, unless you have access to a library or language runtime that already provides this for you. The hardest parts are making sure your tasks are truly isolated and thread-safe, and keeping your tasks in a happy middle ground between coarse-grained and fine-grained.
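A minimal sketch of such a queue and worker loop, assuming C++11 threads (the class name `TaskQueue` is mine, not from any particular library). For brevity it omits the “steal the pending task and run it yourself” optimization described above; a real implementation would add futures or continuations for results:

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Workers repeatedly pull std::function tasks off a shared queue.
class TaskQueue {
public:
    explicit TaskQueue(unsigned workers) {
        for (unsigned i = 0; i < workers; ++i)
            threads_.emplace_back([this] { workerLoop(); });
    }
    ~TaskQueue() {
        { std::lock_guard<std::mutex> lk(m_); done_ = true; }
        cv_.notify_all();
        for (auto& t : threads_) t.join();  // workers drain the queue first
    }
    void push(std::function<void()> task) {
        { std::lock_guard<std::mutex> lk(m_); q_.push(std::move(task)); }
        cv_.notify_one();
    }
private:
    void workerLoop() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lk(m_);
                cv_.wait(lk, [this] { return done_ || !q_.empty(); });
                if (done_ && q_.empty()) return;
                task = std::move(q_.front());
                q_.pop();
            }
            task();  // run outside the lock so workers don't serialize
        }
    }
    std::queue<std::function<void()>> q_;
    std::mutex m_;
    std::condition_variable cv_;
    bool done_ = false;
    std::vector<std::thread> threads_;
};
```

Note that the lock is only held while touching the queue itself, never while running a task; that is what lets the workers actually execute in parallel.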
Another alternative to subsystem threads is to parallelize each subsystem in isolation. That is, instead of running rendering and physics in their own threads, write the physics subsystem to use all your cores at once, write the rendering subsystem to use all your cores at once, then have the two systems simply run sequentially (or interleaved, depending on other aspects of your game architecture). For example, in the physics subsystem you could take all the point masses in the game, divide them up among your cores, and then have all the cores update them at once. Each core can then work on your data in tight loops with good locality. This lock-step style of parallelism is similar to what a GPU does. The hardest part here is dividing your work into chunks fine-grained enough that splitting them evenly actually yields an equal amount of work across all processors.
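As a rough sketch of that point-mass example (the `PointMass` struct and `updateAll` function are hypothetical names, and a real engine would reuse a thread pool rather than spawn threads each frame): each thread owns a contiguous slice of the array, so there is no shared mutable state and no locking during the update.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

struct PointMass { float x, y, vx, vy; };

// Split the point masses into one contiguous chunk per core and
// integrate every chunk simultaneously.
void updateAll(std::vector<PointMass>& masses, float dt, unsigned cores) {
    std::vector<std::thread> threads;
    std::size_t chunk = (masses.size() + cores - 1) / cores;
    for (unsigned c = 0; c < cores; ++c) {
        std::size_t begin = c * chunk;
        std::size_t end = std::min(begin + chunk, masses.size());
        if (begin >= end) break;
        threads.emplace_back([&masses, dt, begin, end] {
            for (std::size_t i = begin; i < end; ++i) {  // tight, local loop
                masses[i].x += masses[i].vx * dt;
                masses[i].y += masses[i].vy * dt;
            }
        });
    }
    for (auto& t : threads) t.join();  // lock-step: wait for every core
}
```

The equal-sized chunks here assume each point mass costs the same to update; if per-element cost varies, you would want smaller chunks handed out dynamically.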
However, sometimes it’s just easiest, due to politics, existing code, or other frustrating circumstances, to give each subsystem a thread. In that case, it’s best to avoid making more OS threads than cores for CPU-heavy workloads (if you have a runtime with lightweight threads that happen to balance across your cores, this isn’t as big a deal). Also, avoid excessive communication. One nice trick is pipelining: each major subsystem works on a different frame’s game state at a time. Pipelining reduces the amount of communication needed among your subsystems, since they don’t all need access to the same data at the same time, and it can also mask some of the damage caused by bottlenecks. For example, if your physics subsystem tends to take a long time to complete and your rendering subsystem ends up always waiting for it, your absolute frame rate could be higher if you run the physics subsystem for the next frame while the rendering subsystem is still working on the previous frame. In fact, if you have such bottlenecks and can’t remove them any other way, pipelining may be the most legitimate reason to bother with subsystem threads.
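One way to sketch that physics-then-rendering pipeline is a single-slot mailbox between the two stages (the `GameState` struct and `Mailbox` class are illustrative names of mine, assuming each frame's state can be snapshotted and handed off):

```cpp
#include <condition_variable>
#include <mutex>
#include <optional>

struct GameState { int frame; };  // stand-in for a per-frame snapshot

// Single-slot handoff between the physics and rendering stages.
// Once put() returns, physics is free to start simulating the next
// frame while rendering is still drawing the one it just took.
class Mailbox {
public:
    void put(GameState s) {
        std::unique_lock<std::mutex> lk(m_);
        notFull_.wait(lk, [this] { return !slot_.has_value(); });
        slot_ = s;
        notEmpty_.notify_one();
    }
    GameState take() {
        std::unique_lock<std::mutex> lk(m_);
        notEmpty_.wait(lk, [this] { return slot_.has_value(); });
        GameState s = *slot_;
        slot_.reset();
        notFull_.notify_one();
        return s;
    }
private:
    std::optional<GameState> slot_;
    std::mutex m_;
    std::condition_variable notFull_, notEmpty_;
};
```

The point of the single slot is that the two stages only ever touch the snapshot during the handoff, not the same live data at the same time, which is exactly the communication reduction described above.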
There are a couple of things to consider. The thread-per-subsystem route is easy to reason about, since the code separation is apparent from the get-go. However, depending on how much your subsystems need to talk to each other, inter-thread communication can really kill your performance. In addition, this only scales to N cores, where N is the number of subsystems you split into threads.
If you’re just looking to multithread an existing game, this is probably the path of least resistance. However, if you’re working on some low level engine systems that might be shared between several games or projects, I would consider another approach.
It can take a bit of mind-twisting, but if you can break things up into a job queue with a set of worker threads, it will scale much better in the long run. As the latest and greatest chips come out with a gazillion cores, your game’s performance will scale along with them: just fire up more worker threads.
So basically, if you’re looking to bolt on some parallelism to an existing project, I’d parallelize across subsystems. If you’re building a new engine from scratch with parallel scalability in mind, I’d look into a job queue.
That question has no best answer, as it depends upon what you are trying to accomplish.
The Xbox 360 has three cores and can handle a few threads before context-switching overhead becomes a problem; the PC can deal with quite a few more.
A lot of games have traditionally been single-threaded for ease of programming, and that’s fine for most personal projects. The only things you would likely need separate threads for are networking and audio.
Unreal has a game thread, render thread, network thread, and audio thread (if I remember correctly). This is pretty standard for a lot of current-gen engines, though supporting a separate rendering thread can be a pain and involves a lot of groundwork.
The idTech5 engine being developed for Rage actually uses any number of threads, and it does so by breaking down game tasks into ‘jobs’ that are processed with a tasking system. Their explicit goal is to have their game engine scale nicely when the number of cores on the average gaming system jumps.
The technology I use (and have written) has separate threads for networking, input, audio, rendering, and scheduling. It then has any number of threads that can be used to perform game tasks, all managed by the scheduling thread. A lot of work went into getting all the threads to play nicely with each other, but it seems to be working well and getting very good use out of multicore systems, so perhaps it is mission accomplished (for now; I might break the audio/networking/input work down into just ‘tasks’ that the worker threads update).
It really depends upon your final goal.