THQ/Gas Powered Games Supreme Commander and Supreme Commander: Forged Alliance

1. THQ/Gas Powered Games Supreme Commander and Supreme Commander: Forged Alliance

Thread for Performance

2. Supreme Commander runs best on 4 cores - let’s see how!

•Threading mid-project can be done!
•Decoupled threads give great performance
•Memory management extends the gains
•Lessons learned

3. Threading was a mid-stream change

•Code was initially single-threaded
– Game demanded more performance
– Changed mid-project (6-12 months into development)
– Separate render/sim threads to run at different rates
– Support multiple cores
•Limited architecture choices due to existing code
•Using Boost thread library
– Portable, open-source thread library

4. Render split is essential to speed

•Lots of “little” threads: sound, loading, etc.
•Sim thread: All simulation
•Render thread: Full speed, up to 10x per sim tick
•Sync phase: Once frame is ready to render
– Sync render and sim
– Fully queued in and out of sim
– Fast
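
The split described on this slide can be sketched roughly as follows. This is a minimal illustration, not the game's code: the deck says the project used the Boost thread library, while the sketch uses the C++ standard library's `std::thread` to stay self-contained. `Snapshot`, `RunStats`, `run_decoupled`, and `kMaxFramesPerTick` are hypothetical names. The sim works on private state and publishes it in a short locked sync phase; the render thread draws the last published state, at most 10 times per sim tick.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical state handed from sim to render during the sync phase.
struct Snapshot {
    int tick = 0;
    double unit_x = 0.0;
};

struct RunStats {
    int sim_ticks = 0;
    int frames = 0;
    int max_frames_per_tick = 0;
    int last_tick_seen = 0;
};

constexpr int kMaxFramesPerTick = 10;  // "up to 10x per sim tick"

RunStats run_decoupled(int ticks) {
    assert(ticks >= 0);
    std::mutex sync_mutex;     // held only for the short sync phase
    Snapshot published;        // the state render is allowed to read
    int frames_this_tick = 0;  // reset at every sync
    std::atomic<bool> done{false};
    RunStats stats;

    std::thread render([&] {
        while (!done.load()) {
            Snapshot local;
            bool may_render = false;
            {
                std::lock_guard<std::mutex> lock(sync_mutex);
                if (frames_this_tick < kMaxFramesPerTick) {
                    local = published;  // copy out, then release the lock
                    ++frames_this_tick;
                    stats.max_frames_per_tick =
                        std::max(stats.max_frames_per_tick, frames_this_tick);
                    may_render = true;
                }
            }
            if (may_render) {
                // A real renderer would draw `local` here, outside the lock.
                ++stats.frames;
                stats.last_tick_seen = local.tick;
            } else {
                std::this_thread::yield();  // frame budget for this tick used up
            }
        }
    });

    Snapshot working;  // private to the sim (this) thread
    for (int t = 1; t <= ticks; ++t) {
        working.tick = t;       // stand-in for a full simulation tick
        working.unit_x += 0.5;
        {
            std::lock_guard<std::mutex> lock(sync_mutex);  // sync phase
            published = working;
            frames_this_tick = 0;
        }
        ++stats.sim_ticks;
    }
    done.store(true);
    render.join();
    return stats;
}
```

The lock covers only a copy and a counter update, which is one plausible reading of why the slide calls the sync phase "fast".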

5. Decoupled architecture is built for speed

Issue
Ready to start a frame and a simulation tick

6. Decoupled architecture is built for speed

Run decoupled sim and render.
Fully buffered input to sim, called via the Sim Thread Interface.
[Diagram labels: Issue, Sim Thread Interface, Simulation, Render]
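
One way to picture "fully buffered input to sim" is a command queue that other threads append to and that the sim empties once per tick. The deck does not show the real Sim Thread Interface, so the class below is only a sketch under that assumption; `Command`, `issue`, and `drain` are hypothetical names.

```cpp
#include <cassert>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical order issued by the UI or another thread.
struct Command {
    int unit_id;
    std::string order;
};

// Sketch of a buffered interface in front of the sim thread.
class SimThreadInterface {
public:
    // Called from any thread: the command is queued, never run in place.
    void issue(Command cmd) {
        assert(!cmd.order.empty());
        std::lock_guard<std::mutex> lock(mutex_);
        pending_.push_back(std::move(cmd));
    }

    // Called by the sim thread at the start of a tick: takes the whole
    // batch in one swap, so the lock is held only briefly.
    std::vector<Command> drain() {
        std::vector<Command> batch;
        std::lock_guard<std::mutex> lock(mutex_);
        batch.swap(pending_);
        return batch;
    }

private:
    std::mutex mutex_;
    std::vector<Command> pending_;
};
```

With this shape the sim never sees a half-issued command, and callers never block on a running simulation tick.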

7. Decoupled architecture is built for speed

Render can run repeatedly; how often depends on sim duration.
[Diagram labels: Issue, Simulation, Render … Render (up to 10x per sim tick)]

8. Decoupled architecture is built for speed

Fully decoupled? No.
A few low-level systems have locks.
No major performance impact!
[Diagram labels: Issue, Simulation, Locks, Render … Render (up to 10x per sim tick)]

9. Decoupled architecture is built for speed

Sync sim thread out to render thread, via the Sim Thread Interface (STI) again.
[Diagram labels: Issue, Issue, Simulation, Sim Thread Interface, Render … Render (up to 10x per sim tick), Render]

10. Decoupled architecture is built for speed

Multiplayer: Record everything going through the Sim Thread Interface (STI) and send it over the network.
[Diagram labels: Issue, Sim Thread Interface, Simulation, Sim Thread Interface, Render … Render (up to 10x per sim tick), Render]
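
Recording everything that passes through the interface only helps multiplayer if every machine that replays the same record reaches the same state. The sketch below illustrates that idea with a per-tick command log. The deck does not describe the log format or the network layer, so `Command`, `CommandLog`, `Sim`, and `replay` are hypothetical, and the "network" is simply the same log applied twice.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Hypothetical recorded command: move a unit by dx.
struct Command {
    int unit_id;
    int dx;
};

// Everything that went through the interface, keyed by sim tick.
using CommandLog = std::map<int, std::vector<Command>>;

struct Sim {
    std::map<int, int> unit_x;  // unit id -> position

    // One deterministic tick: state depends only on the commands applied.
    void step(const std::vector<Command>& commands) {
        for (const Command& c : commands) {
            unit_x[c.unit_id] += c.dx;
        }
    }
};

// Run a fresh sim for `ticks` ticks, feeding it the recorded commands.
Sim replay(const CommandLog& log, int ticks) {
    assert(ticks >= 0);
    static const std::vector<Command> no_commands;
    Sim sim;
    for (int t = 0; t < ticks; ++t) {
        auto it = log.find(t);
        sim.step(it == log.end() ? no_commands : it->second);
    }
    return sim;
}
```

Two peers that call `replay` with the same log end up with identical `unit_x` maps, which is the property a lockstep design relies on.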

11. Decoupled architecture is built for speed

And so on…
[Diagram labels: Issue, Issue, Simulation, Render … Render (up to 10x per sim tick), Sim, Render …]

12. Thread model adapts to varying loads

•Architecture scales well with loads
–Render load will often dominate
–Re-render to keep frame rates up
–Sim-heavy map will try to be sim-dominated

13. Displaying frame times – cool!

Thread stats in real time

14. Sometimes, there’s more to render

[Frame-time display, annotated:]
Render: Runs as fast as possible
Simulation
Sim/render sync: Both threads synced, fully queued in and out of sim

15. Other times, there’s more to simulate

Sim runs across many rendered frames

16. A little sync doesn’t slow this code down

Threads are busy most of the time!
[Timeline labels: Frame n, Frame n+1, Sync, Waiting, Busy, Mostly waiting]

17. Memory manager gives an additional boost

• Memory: If you’re not careful in a threaded game…
– Memory use can thrash cache – but not a problem here!
– Memory alloc/free can be slow
• Suspected memory management was problem
– Doing lots of small allocations
– Built code to make it easy to switch mem managers
• Custom mem manager outperforms default malloc/free
– Can cause some debugging questions
– Purchased commercial one for Supreme Commander
– Wrote new one for Forged Alliance
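
"Built code to make it easy to switch mem managers" suggests some common allocation interface with interchangeable back ends. The following is a sketch of that idea only. Neither the commercial manager nor the one written for Forged Alliance is described in the deck, so `Allocator`, `MallocAllocator`, and `SmallBlockPool` are hypothetical, and the pool is deliberately simple and not thread-safe.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Common interface so the game can swap memory managers in one place.
class Allocator {
public:
    virtual ~Allocator() = default;
    virtual void* allocate(std::size_t bytes) = 0;
    virtual void deallocate(void* p) = 0;
};

// Baseline: forward to the default malloc/free.
class MallocAllocator : public Allocator {
public:
    void* allocate(std::size_t bytes) override { return std::malloc(bytes); }
    void deallocate(void* p) override { std::free(p); }
};

// Fixed-size pool aimed at "lots of small allocations": one up-front
// buffer, then allocate/free are just pops and pushes on a free list.
// Not thread-safe; a threaded game would need a lock or per-thread pools.
class SmallBlockPool : public Allocator {
public:
    static constexpr std::size_t kBlockSize = 64;

    explicit SmallBlockPool(std::size_t block_count)
        : storage_(block_count * kBlockSize) {
        for (std::size_t i = 0; i < block_count; ++i) {
            free_list_.push_back(storage_.data() + i * kBlockSize);
        }
    }

    // Returns nullptr if the request is too large or the pool is exhausted.
    void* allocate(std::size_t bytes) override {
        if (bytes > kBlockSize || free_list_.empty()) return nullptr;
        unsigned char* block = free_list_.back();
        free_list_.pop_back();
        return block;
    }

    void deallocate(void* p) override {
        if (p == nullptr) return;
        unsigned char* block = static_cast<unsigned char*>(p);
        // Debug check: the block must come from this pool's buffer.
        assert(block >= storage_.data() &&
               block < storage_.data() + storage_.size());
        free_list_.push_back(block);
    }

private:
    std::vector<unsigned char> storage_;
    std::vector<unsigned char*> free_list_;
};
```

Code that holds only an `Allocator*` can be pointed at either implementation, which makes it cheap to measure whether memory management really is the problem.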

18. What are some current bottlenecks?

•Multiplayer: all sims run concurrently
–Limited by least-common-denominator machine
–That’s the RTS way
•Monolithic render thread
–Multiple monitors, typically different views
–Possibly split off top part of render for second monitor?
–Too expensive/complex for niche feature

19. This was a great learning experience!

• Good intermediate step
– Especially for threading mid-project
• Would do it differently if doing it from scratch
– Target more processor cores
– General worker threads w/dispatch system
– Templates to define an interface to common semantics
– Directed work graph/node graph (hard to express)
– Or …?
• The engine is so good, it’ll be back in Demigod!
– Demigod team using modified Supreme Commander engine

20. We learned some DOs and DON’Ts

•Do:
–Architect for threading from the start, if you can
–Thread single-threaded code, if you must
–Decouple threads where possible
•Don’t:
–Be afraid to thread single-threaded code

21. Supreme Commander runs best on 4 cores – that’s how!

•Threading mid-project can be done!
•Decoupled threads give great performance
•Memory management extends the gains
•Lessons learned

22. So, what do you think?

•Have you tried something like this?
–Successes?
–Failures?
•Have you rejected trying something like this?
–Why?