How many threads should I have, and for what?

Solutions:


The common approach for taking advantage of multiple cores is, frankly, just plain misguided. Separating your subsystems into different threads will indeed split some of the work across multiple cores, but it has some major problems. First, it’s very hard to work with. Who wants to muck around with locks, synchronization, and communication when they could just be writing straight-up rendering or physics code instead? Second, the approach doesn’t actually scale. At best, it will let you take advantage of maybe three or four cores, and that’s if you really know what you’re doing. There are only so many subsystems in a game, and of those even fewer take up large chunks of CPU time. I know of a couple of good alternatives.

One is to have a main thread along with a worker thread for each additional CPU. Regardless of subsystem, the main thread delegates isolated tasks to the worker threads via some sort of queue(s); these tasks may themselves create yet other tasks, as well. The sole purpose of the worker threads is to each grab tasks from the queue one at a time and perform them. The most important thing, though, is that as soon as a thread needs the result of a task, if the task is completed it can get the result, and if not it can safely remove the task from the queue and go ahead and perform that task itself. That is, not all tasks will end up being scheduled in parallel with each other. Having more tasks than can be executed in parallel is a good thing in this case; it means that it is likely to scale as you add more cores. One downside to this is that it requires a lot of work up front to design a decent queue and worker loop unless you have access to a library or language runtime that already provides this for you. The hardest part is making sure your tasks are truly isolated and thread safe, and making sure your tasks are in a happy middle ground between coarse-grained and fine-grained.
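To make the queue-plus-workers idea concrete, here is a minimal sketch in Python. The names `Task`, `JobQueue`, `submit`, and `result` are made up for illustration and do not come from any particular engine. It assumes tasks do not raise exceptions, and in CPython the global interpreter lock keeps pure-Python CPU work from actually running in parallel, so treat it as a picture of the structure rather than a performance recipe. The key part is `result`: if nobody has picked up the task yet, the caller pulls it out of the queue and runs it itself instead of waiting.

```python
import threading
from collections import deque


class Task:
    """A unit of isolated work whose result can be requested later."""

    def __init__(self, fn, *args):
        self.fn, self.args = fn, args
        self.claimed = False            # True once some thread has taken the task
        self.done = threading.Event()   # set when the result is available
        self.value = None

    def run(self):
        self.value = self.fn(*self.args)
        self.done.set()


class JobQueue:
    """Hypothetical job queue: one shared deque plus a pool of worker threads."""

    def __init__(self, num_workers):
        self.tasks = deque()
        self.cond = threading.Condition()
        self.closed = False
        self.workers = [threading.Thread(target=self._worker_loop, daemon=True)
                        for _ in range(num_workers)]
        for w in self.workers:
            w.start()

    def submit(self, fn, *args):
        task = Task(fn, *args)
        with self.cond:
            self.tasks.append(task)
            self.cond.notify()
        return task

    def _worker_loop(self):
        while True:
            with self.cond:
                while not self.tasks and not self.closed:
                    self.cond.wait()
                if not self.tasks:
                    return              # closed and drained
                task = self.tasks.popleft()
                task.claimed = True
            task.run()                  # run outside the lock

    def result(self, task):
        with self.cond:
            run_here = not task.claimed
            if run_here:
                self.tasks.remove(task)  # O(n) removal; fine for a sketch
                task.claimed = True
        if run_here:
            task.run()                  # nobody started it, so do it ourselves
        else:
            task.done.wait()            # a worker has it; wait for it to finish
        return task.value

    def close(self):
        with self.cond:
            self.closed = True
            self.cond.notify_all()
        for w in self.workers:
            w.join()


def square(x):
    return x * x


jobs = JobQueue(num_workers=3)
tasks = [jobs.submit(square, n) for n in range(8)]
total = sum(jobs.result(t) for t in tasks)
jobs.close()
print(total)  # 140
```

Because a waiting thread runs unclaimed tasks inline, the same code works with zero workers (everything runs on the caller), and a task that submits subtasks and then asks for their results cannot get stuck waiting for a free worker.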

Another alternative to subsystem threads is to parallelize each subsystem in isolation. That is, instead of running rendering and physics in their own threads, write the physics subsystem to use all your cores at once, write the rendering subsystem to use all your cores at once, then have the two systems simply run sequentially (or interleaved, depending on other aspects of your game architecture). For example, in the physics subsystem you could take all the point masses in the game, divide them up among your cores, and then have all the cores update them at once. Each core can then work on your data in tight loops with good locality. This lock-step style of parallelism is similar to what a GPU does. The hardest part here is in making sure that you are dividing your work up into fine-grained chunks such that dividing it evenly actually results in an equal amount of work across all processors.
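As a rough illustration of the point-mass example, the sketch below splits a flat list of positions into one contiguous slice per core and updates all slices at once, then waits for every slice before the step counts as finished. The function names (`integrate_chunk`, `chunk_bounds`, `physics_step`) and the simple position update are assumptions made for the example. It uses Python's standard `ThreadPoolExecutor`; in CPython the global interpreter lock means this particular loop will not speed up, so read it as the shape of the approach, not a benchmark.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def integrate_chunk(positions, velocities, start, stop, dt):
    """Advance one contiguous slice of point masses by a time step."""
    for i in range(start, stop):
        positions[i] += velocities[i] * dt


def chunk_bounds(count, parts):
    """Split range(count) into `parts` contiguous, nearly equal slices."""
    base, extra = divmod(count, parts)
    bounds, start = [], 0
    for p in range(parts):
        stop = start + base + (1 if p < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def physics_step(pool, parts, positions, velocities, dt):
    """Update every point mass, one slice per worker, then wait for all slices."""
    futures = [pool.submit(integrate_chunk, positions, velocities, a, b, dt)
               for a, b in chunk_bounds(len(positions), parts)]
    for f in futures:
        f.result()  # barrier: the step is done only when every slice is done


cores = os.cpu_count() or 1
positions = [0.0] * 10
velocities = [float(i) for i in range(10)]
with ThreadPoolExecutor(max_workers=cores) as pool:
    physics_step(pool, cores, positions, velocities, dt=0.5)
print(positions)
```

Each slice touches only its own indices, so no locks are needed inside the loop. Equal-sized slices only balance the load when every element costs about the same to update; if costs vary, smaller slices handed out dynamically tend to balance better.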

However, sometimes it’s just easiest, due to politics, existing code, or other frustrating circumstances, to give each subsystem a thread. In that case, it’s best to avoid making more OS threads than cores for CPU heavy workloads (if you have a runtime with lightweight threads that just happen to balance across your cores, this isn’t as big of a deal). Also, avoid excessive communication. One nice trick is to try pipelining; each major subsystem can be working on a different game state at a time. Pipelining reduces the amount of communication necessary among your subsystems since they don’t all need access to the same data at the same time, and it also can nullify some of the damage caused by bottlenecks. For example, if your physics subsystem tends to take a long time to complete and your rendering subsystem ends up always waiting for it, your absolute frame rate could be higher if you run the physics subsystem for the next frame while the rendering subsystem is still working on the previous frame. In fact, if you have such bottlenecks and can’t remove them any other way, pipelining may be the most legitimate reason to bother with subsystem threads.
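The pipelining trick can be sketched with two threads and a one-slot queue: physics works on the next frame while rendering is still busy with the previous one. Everything here (`physics_thread`, `render_thread`, the integer stand-in for game state) is invented for illustration. A real engine would hand over a copied or double-buffered snapshot of the game state instead of an integer.

```python
import queue
import threading

FRAMES = 5
handoff = queue.Queue(maxsize=1)  # at most one finished state waits for the renderer
rendered = []


def physics_thread():
    state = 0
    for frame in range(FRAMES):
        state = state + 1            # stand-in for simulating one step
        handoff.put((frame, state))  # blocks if the previous state is still waiting
    handoff.put(None)                # sentinel: no more frames


def render_thread():
    while True:
        item = handoff.get()
        if item is None:
            break
        frame, state = item
        rendered.append(f"frame {frame}: state {state}")  # stand-in for drawing


threads = [threading.Thread(target=physics_thread),
           threading.Thread(target=render_thread)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(rendered)
```

The only communication between the two subsystems is the handoff of a finished state, and the bounded queue keeps physics from running arbitrarily far ahead of rendering.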

There are a couple of things to consider. The thread-per-subsystem route is easy to think about since the code separation is pretty apparent from the get-go. However, depending on how much intercommunication your subsystems need, inter-thread communication could really kill your performance. In addition, this only scales to N cores, where N is the number of subsystems you abstract into threads.

If you’re just looking to multithread an existing game, this is probably the path of least resistance. However, if you’re working on some low level engine systems that might be shared between several games or projects, I would consider another approach.

It can take a bit of mind twisting, but if you can break things up as a job queue with a set of worker threads, it will scale much better in the long run. As the latest and greatest chips come out with a gazillion cores, your game’s performance will scale along with them; just fire up more worker threads.

So basically, if you’re looking to bolt on some parallelism to an existing project, I’d parallelize across subsystems. If you’re building a new engine from scratch with parallel scalability in mind, I’d look into a job queue.

That question has no best answer, as it depends upon what you are trying to accomplish.

The Xbox 360 has three cores and can handle a few threads before context-switching overhead becomes a problem. A PC can deal with quite a few more.

A lot of games have typically been single-threaded for ease of programming. This is fine for most personal games. The only things you would likely need another thread for are networking and audio.

Unreal has a game thread, render thread, network thread, and audio thread (if I remember correctly). This is pretty standard for a lot of current-gen engines, though being able to support a separate rendering thread can be a pain and involves a lot of groundwork.

The idTech5 engine being developed for Rage actually uses any number of threads, and it does so by breaking down game tasks into ‘jobs’ that are processed with a tasking system. Their explicit goal is to have their game engine scale nicely when the number of cores on the average gaming system jumps.

The technology I use (and have written) has a separate thread each for networking, input, audio, rendering, and scheduling. It then has any number of threads which can be used to perform game tasks, and these are managed by the scheduling thread. A lot of work went into getting all the threads to play nicely with each other, but it seems to be working well and getting very good use out of multicore systems, so perhaps it is mission accomplished (for now; I might break down audio/networking/input work into just ‘tasks’ that the worker threads can update).

It really depends upon your final goal.
