Unity Batching vs Real Batching (technical)
This is a technical post about the Unity game engine and the concept of batching. So, let’s start with the basics, like:
“What is batching?”
Graphics programmers are so used to hearing, saying and reading this word as a technical term that one could almost assume it was invented by some clever engineer, but of course it is ultimately just an ordinary word. So what does it mean?
Batching is arranging things into sets or groups. In the graphics sense specifically, we are talking about draw call batching, so it makes sense to ask next:
“What is a draw call?”
When writing graphical software, we set up the data we wish to draw/render (it could be a wall or a UI element). Once everything is set up (the colour, position, effects to use, etc.), we ultimately need to request that the item is actually drawn; that request is a draw call. The graphics card (GPU) and the processor (CPU) are separate, so the GPU can render whilst the CPU carries on working. Based on this, a draw call feels like it should be very cheap, and you should be able to issue as many as you like without worry, knowing they’ll get queued up and scheduled by the GPU in a fast and dandy manner. But it turns out this isn’t true: each draw call costs CPU time to prepare and submit, so having too many of them can hurt performance. There are various papers that cover this; an NVIDIA one appears high up on a Google search:
https://www.nvidia.com/docs/IO/8228/BatchBatchBatch.pdf
The idea of batching is to minimize the number of draw calls. Instead of rendering 10 brick walls one at a time, we combine all their data into one place and then render them in one go. The two wins we get here are:
- We reduce the number of draw calls from 10 to 1
- We reduce the number of state changes (switching between materials, vertex buffers, colours, etc.) from 10 to 1
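To make those two wins concrete, here is a toy Python model. It is not engine code and nothing in it talks to a real GPU: the names `DrawStats`, `draw_individually` and `draw_batched` are made up for illustration, and a "wall" is assumed to be just a list of four 2D corner points. All it does is count how often we would set up state and issue a draw.

```python
from dataclasses import dataclass


@dataclass
class DrawStats:
    """Counters for a hypothetical renderer; no real graphics API is involved."""
    draw_calls: int = 0
    state_changes: int = 0


def draw_individually(walls):
    """Submit each wall on its own: one state setup and one draw per wall."""
    stats = DrawStats()
    for _wall in walls:
        stats.state_changes += 1  # bind this wall's material and vertex buffer
        stats.draw_calls += 1     # issue the draw for this wall
    return stats


def draw_batched(walls):
    """Merge all walls into one vertex list, then set state and draw once."""
    stats = DrawStats()
    merged_vertices = [vertex for wall in walls for vertex in wall]
    if merged_vertices:
        stats.state_changes += 1  # bind the shared material and merged buffer
        stats.draw_calls += 1     # a single draw covers every wall
    return stats, merged_vertices


# Ten brick walls, each a quad made of four (x, y) corners.
walls = [[(i, 0.0), (i + 1, 0.0), (i + 1, 1.0), (i, 1.0)] for i in range(10)]

unbatched = draw_individually(walls)
batched, merged = draw_batched(walls)
print("one at a time:", unbatched)
print("batched:      ", batched, "vertices in merged buffer:", len(merged))
```

The merged buffer still holds all 40 vertices, so the GPU draws the same geometry; what shrinks is the per-object bookkeeping on the way there.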
Unity
Unity supports batching, both static (for things that do not move) and dynamic. Making things static is just a tick box, and tutorials/guides will strongly recommend you tick it wherever possible (i.e. whenever the object in question does not move). So we have a reasonably sized scene with various things we wish to draw, we tick everything as static, and then sit back and marvel at how far middleware has moved forwards, that we can just forget about under-the-hood performance and let Unity deal with the batching, knowing we’ll get decent performance/results. Then we run it on a resource-constrained device, like a Samsung Note 4 running inside a Gear VR virtual reality headset, and find out our frame rate/performance is about one quarter of what it needs to be!
So what went wrong? We are using batching, and we know batching is good, so what is missing? Let's do a graphical profile/debug of what Unity is actually rendering for our scene. To make your life easier, I have taken each draw call, shown what it changes on screen, and put it into an animated GIF:
I omitted a bunch of draw calls that do not result in anything visually changing (hidden objects being rendered, for example); even so, the above is 33 draw calls. I didn't expect a single draw call, as a couple of different materials are used and different materials require different draw calls. However, 33 is a lot, especially when you see things like the red/black pipes along the wall being rendered one section at a time. This doesn't feel like batching…
So what Unity actually means by batching isn't batching lots of objects into a single draw call. Instead, it orders the objects being rendered so that objects with the same ‘state’ are rendered consecutively. You can see this happening when it draws things like the floor all together, then the piping all together. So of the two wins batching gives that I talked about above, it fails on reducing draw calls, but it succeeds on reducing the number of state changes. It is doing something, and what it is doing is ‘good’, but it is not great! Unity doesn't have some magic bullet where you can just tick a box and get great performance, and it is important to understand what is going on under the hood (which is easy to forget when Unity makes your life so easy so much of the time).
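The ordering behaviour described above can be sketched in a few lines of Python. This is only a model of what the profile appears to show, not Unity's actual implementation: `count_submissions`, the object names and the material names are all hypothetical, and "state" is reduced to a single material string.

```python
def count_submissions(render_queue):
    """Return (draw_calls, state_changes) for objects drawn in the given order.

    Each entry is (object_name, material). A state change is counted whenever
    the material differs from the one used by the previous draw.
    """
    draw_calls = 0
    state_changes = 0
    current_material = None
    for _name, material in render_queue:
        if material != current_material:
            state_changes += 1
            current_material = material
        draw_calls += 1  # every object is still submitted separately
    return draw_calls, state_changes


# Floor chunks and pipe sections interleaved, as they might be in a scene list.
scene = [
    ("floor_0", "floor"), ("pipe_0", "pipe"), ("floor_1", "floor"),
    ("pipe_1", "pipe"), ("floor_2", "floor"), ("pipe_2", "pipe"),
]

unsorted_result = count_submissions(scene)
sorted_result = count_submissions(sorted(scene, key=lambda obj: obj[1]))
print("scene order:       draw calls, state changes =", unsorted_result)
print("sorted by material: draw calls, state changes =", sorted_result)
```

In this toy scene, sorting takes the state changes from 6 down to 2, but the draw call count stays at 6 either way, which is the same pattern as the pipes being drawn one section at a time.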
This post would be a little sad if it ended here, and it would also seem puzzling given that there are games made with Unity which have reasonable performance. Of course there is a solution: we can do real batching, and make objects render in single draw calls by combining them into larger meta objects, for instance taking the floor chunks, merging them into one object/mesh and rendering that. This process would be a bit of a pain if you had to go through the entire scene and do it manually. Luckily there are better options, for example the Draw Call Minimizer plugin for Unity from the asset store https://www.assetstore.unity3d.com/en/#!/content/2859. With this plugin you can select multiple objects and get it to bake them into a single object (putting all the vertices into one buffer, and combining the textures into a single texture page). We used this plugin to optimize our scene and got some better results:
There are just 6 draw calls now. Note that the floor draws over 2 calls, as do the walls. The reasoning is that there needs to be a balance in how much is rendered in each draw call: if the entire scene were a single draw call, the entire scene would need to be rendered every frame, even if the camera can only see a small section of it (which is typically the case). In the above scene there were quite natural/obvious places to split the scene up (pillars along the wall), but further experimentation could be done to find the absolute best balance. The bottom line is that the above made the game run at double the frame rate/speed, so it is a major performance win. It is also quite easy to see from the animated images above how Unity batching isn’t what we’d classically think of as batching: it does something, but nowhere near as much as true batching (which is what you will probably need if you want to get decent graphics running on mobile devices)! Gear VR games need to run at a constant 60 frames per second (FPS), and without proper batching we were seeing lows of around 20 FPS, so best case we are getting a 3X performance boost, but averaged throughout the level it would be fair to say a 2X boost (which is obviously a very nice win).
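Both halves of this, the merging itself and the balance in how much goes into each draw call, can be sketched in Python. This is an illustration under stated assumptions, not how the plugin works internally. All the function names are hypothetical, the "floor" is a single row of 32 unit tiles, visibility is reduced to a range of tile numbers, and texture combining is left out entirely. The key detail in `merge_meshes` is that each mesh's indices have to be offset by the number of vertices already in the merged buffer.

```python
def merge_meshes(meshes):
    """Combine (vertices, indices) pairs into one vertex list and one index list."""
    merged_vertices = []
    merged_indices = []
    for vertices, indices in meshes:
        base = len(merged_vertices)  # where this mesh's vertices will start
        merged_vertices.extend(vertices)
        merged_indices.extend(base + i for i in indices)  # re-point the indices
    return merged_vertices, merged_indices


def make_floor_tile(x):
    """A unit quad on the ground plane at offset x, built from two triangles."""
    vertices = [(x, 0.0, 0.0), (x + 1.0, 0.0, 0.0),
                (x + 1.0, 0.0, 1.0), (x, 0.0, 1.0)]
    indices = [0, 1, 2, 0, 2, 3]
    return vertices, indices


def bake_chunks(tiles, chunk_size):
    """Merge consecutive tiles; return (first_tile, last_tile, mesh) per chunk."""
    chunks = []
    for start in range(0, len(tiles), chunk_size):
        group = tiles[start:start + chunk_size]
        chunks.append((start, start + len(group) - 1, merge_meshes(group)))
    return chunks


def submission_cost(chunks, visible_range):
    """Return (draw_calls, vertices_submitted) for chunks touching visible_range.

    A chunk is drawn in full as soon as any one of its tiles is visible.
    """
    first_visible, last_visible = visible_range
    draw_calls = 0
    vertices_submitted = 0
    for first_tile, last_tile, (vertices, _indices) in chunks:
        if last_tile >= first_visible and first_tile <= last_visible:
            draw_calls += 1
            vertices_submitted += len(vertices)
    return draw_calls, vertices_submitted


tiles = [make_floor_tile(float(x)) for x in range(32)]
visible = (10, 13)  # assume the camera can only see tiles 10 to 13

costs = {size: submission_cost(bake_chunks(tiles, size), visible)
         for size in (1, 8, 32)}
for size, (calls, verts) in costs.items():
    print(f"chunk size {size:2d}: {calls} draw call(s), {verts} vertices submitted")
```

With these made-up numbers, unmerged tiles cost 4 draw calls for 16 vertices, chunks of 8 cost 1 draw call for 32 vertices, and one giant mesh costs 1 draw call but submits all 128 vertices. The middle option is the kind of balance described above. Within Unity itself, the scripting API offers `Mesh.CombineMeshes` for merging meshes from code.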
What are your hints/tips for squeezing performance out of mobile devices? With virtual reality getting bigger by the day, we can expect to see more and more virtual reality headsets that connect to mobile phones, which will add constant pressure to make more and more impressive-looking content for mobile. What else will we need to do to achieve this?
Nice article! Do you know if there are any similar tools for batching objects instantiated at run time?