Working on model loading and animations

After all that quad and box rendering, which you can read about here, we needed to render actual 3D models. That means parsing a 3D format and using the parsed data to build the geometry we render. To make the loading process as fast as possible, I decided to make our own format that would be very easy to parse.

One way to do this is to use a third-party library like assimp (https://assimp.org), which can parse a bunch of 3D formats like fbx, blend, gltf, dae and many more, and then build our own file from the parsed results. This approach is very versatile: you can export whatever you like from Blender, or grab any of the supported formats directly online, and just parse them with assimp and use them. The downside is that we would have to depend on a third-party library with all its edge cases, ups and downs. So I decided on a different approach.

The approach we ended up using is simply exporting a Collada (.dae) file from Blender. Collada is pretty straightforward to parse, and even though it's sort of on its way out, it's good enough for now. I built a command-line tool called Pina Collada (very creative, I know) that converts a Collada file to our own *.temt file. The important bit is the resulting format: we can swap Collada for something else if the need arises, and the output will still be the same *.temt file.

The temt format is a basic text file that contains: a version, followed by a newline character; the number of vertices, followed by a newline character; the number of indices; some transform matrices; a list of vertices and a list of indices. All elements within a category are separated by spaces and the categories are separated by newlines. This makes it trivial to parse. The last "t" in temt stands for text, and the plan is to build the same structure in binary (so a tem file). For debug purposes temt is good enough. Going binary will reduce parsing times, since converting more than a few thousand floats from string to float is much more costly than deciding that the next 32 bits represent a float. Binary would also take up less space than the text equivalent.
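To make the layout concrete, here is a minimal sketch of parsing such a text format. The struct and function names are hypothetical (not the engine's actual code), and the transform matrices are omitted for brevity:

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical sketch of parsing a temt-like text layout:
// version \n vertexCount \n indexCount \n vertices... \n indices...
// (the real format also carries transform matrices; omitted here).
struct TemtModel {
    int version = 0;
    std::vector<float> vertices;
    std::vector<unsigned> indices;
};

TemtModel ParseTemt(const std::string& text) {
    std::istringstream in(text);
    TemtModel m;
    size_t vertexCount = 0, indexCount = 0;
    in >> m.version >> vertexCount >> indexCount;
    m.vertices.resize(vertexCount * 3);   // x y z per vertex
    for (float& f : m.vertices) in >> f;  // string-to-float: the slow part
    m.indices.resize(indexCount);
    for (unsigned& i : m.indices) in >> i;
    return m;
}
```

Every value goes through a string-to-float or string-to-int conversion, which is exactly the cost a binary tem format would avoid.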

Pina Collada

Arguably the best way to write a tool that parses a Collada file would be to use something like Python. Of course I didn't do that, because I wanted to ~~make life harder for myself~~ test the code I wrote so far. So I created a new .cpp file, included the TECore.h header and started working.

TECore contains all the memory arena stuff, file loading and other utilities and it made the process pretty pleasant. It was much more fun to write this tool than I anticipated. Considering that this tool is not part of the main "engine", I didn't put much effort into making it particularly efficient but it does its job (seemingly) instantly.

The steps taken to get a temt out of a Collada are more or less:

  • load the file into memory;
  • parse the data;
  • build a vertex list (position, normals, texture coordinates);
  • remove duplicate vertices from the list;
  • build an index list for the remaining vertices;
  • write all the data to a buffer and save it as a temt file.
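The de-duplication and index-building steps can be sketched roughly like this. The `Vertex` layout and function names are illustrative, not the actual Pina Collada code:

```cpp
#include <cassert>
#include <map>
#include <tuple>
#include <vector>

// Hypothetical sketch of the de-duplication step: collapse identical
// vertices and emit one index per original vertex slot.
struct Vertex { float px, py, pz, nx, ny, nz, u, v; };

static std::tuple<float, float, float, float, float, float, float, float>
Key(const Vertex& v) { return {v.px, v.py, v.pz, v.nx, v.ny, v.nz, v.u, v.v}; }

// Returns the unique vertex list and fills `indices` with one entry
// per input vertex, pointing into the unique list.
std::vector<Vertex> Deduplicate(const std::vector<Vertex>& in,
                                std::vector<unsigned>& indices) {
    std::map<decltype(Key(Vertex{})), unsigned> seen;
    std::vector<Vertex> unique;
    indices.clear();
    for (const Vertex& v : in) {
        auto [it, inserted] = seen.try_emplace(Key(v), (unsigned)unique.size());
        if (inserted) unique.push_back(v);  // first time we see this vertex
        indices.push_back(it->second);      // always emit an index
    }
    return unique;
}
```

A map keyed on the full vertex keeps the tool simple; for an offline tool that runs (seemingly) instantly, that's plenty.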

You can see the terminal output of the tool in the screenshot below.


Writing this tool was pretty straightforward and the bugs that emerged along the way were pretty easy to fix. Testing the TECore code in a different project was very satisfying.

The main issue that emerged was the need for a dedicated string primitive. Most of the issues that occurred were due to null-terminated strings. It is very clear now that TECore should have a string primitive that holds a char* and a length, along with functions that operate on it. I don't foresee desperately needing those for the main project, but it is something that I'll implement at some point.

Rendering the models

Since the hard work was done by Pina Collada, loading the temt files into the main project was trivial. All the data was already laid out conveniently; I just had to read it and use it. Rendering the models wasn't much more complicated than rendering a box, and this part of the project went surprisingly fast. You can see the models in the video below, with the focus on a bug caused by one of those null-terminated string issues (it was easy to fix).

Dog Walk Bug

Of course, when something goes surprisingly fast, you sometimes get hit by something that goes surprisingly slow. In this case, even though static model loading was, I dare say, trivial, loading and rendering skeletal animation models was way slower than anticipated.

Skeletal animations

The only thing I knew about skeletal animations was that you can rotate an arm from the shoulder joint and the full arm will move as a unit, so you could build keyframes by posing the model like one of those wooden humanoid figures. Beyond that, I had no idea how it all worked. After learning about it in more depth, I think it's pure genius. I'll try to explain the inner workings as best I can.

A skeletal model hierarchy starts with a root bone, which in a humanoid model is usually the hip bone. Then follow the child bones. You can see a bone hierarchy in tree form in the image below.

Typical humanoid bone hierarchy

The cool part comes from having the coordinates of each bone defined in the coordinate space of its parent bone. This is sometimes referred to as bone space, so bone coordinates are defined in bone space. Let's focus on the right arm, starting from the shoulder as the base coordinate space.

Arm bones in T-Pose

In the drawing above, the upper arm bone is defined in the coordinate space of the shoulder bone. The lower arm bone is defined in the coordinate space of the upper arm bone and the hand bone is defined in the coordinate space of the lower arm bone.

I just placed them all at coordinates x = 1, y = 0. This replicates a T-pose model, which typically means the model has its arms stretched out to the sides, parallel to the floor.

A very important detail, which I already mentioned a few times above, is that those (1, 0) coordinates are not the same. The upper arm bone is at (1, 0) relative to the shoulder bone. The lower arm bone is at (1, 0) as well, but this time relative to the upper arm bone. If we defined the lower arm bone relative to the shoulder bone as well, it would be placed at (2, 0) instead.

Now what's the big deal about having the bones defined in the parent bone space? The magic happens when we rotate one of the non-leaf bones. So let's rotate the upper arm bone by 45 degrees.

Rotated arm

After rotating the upper arm bone, the child bone coordinates remain unchanged. Because they are defined in the parent's bone space, and we just rotated that space, the child bones didn't move at all in bone space (they're in the same place they used to be relative to their parent).

To get a more intuitive sense, we could define each bone's coordinate space on its own piece of paper and layer each child's paper on its parent's paper, like in the image below.

Paper metaphor

If we then rotate the blue paper, representing the upper arm bone, all the pieces of paper on top rotate with it. The orange lower arm paper still has the same position relative to its blue parent paper, and the same goes for the green paper.

Rotated paper

This is the core of why we can manipulate 3D models with bones the way we do. I find it very inspiring and part of the genius of skeletal animations.
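The parent-space idea can be captured in a few lines. This is a toy 2D illustration (not engine code): the child keeps its local offset, and the parent's rotation carries it along.

```cpp
#include <cassert>
#include <cmath>

struct Vec2 { float x, y; };

Vec2 Rotate(Vec2 v, float radians) {
    float c = std::cos(radians), s = std::sin(radians);
    return { c * v.x - s * v.y, s * v.x + c * v.y };
}

// World position of a bone = parent's world position + parent's world
// rotation applied to the bone's local (bone-space) offset.
Vec2 BoneWorld(Vec2 parentWorld, float parentWorldRot, Vec2 localOffset) {
    Vec2 r = Rotate(localOffset, parentWorldRot);
    return { parentWorld.x + r.x, parentWorld.y + r.y };
}
```

With the upper arm at (1, 0) relative to the shoulder and rotated 45 degrees, the lower arm's local (1, 0) never changes; only its resulting world position does.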

Vertices

If we just stick to bones we don't have much of a 3D model; we still need vertices. The default vertex coordinates we get in a Collada file are the coordinates of the T-pose (also called the bind pose), defined in local (model) space. The animation transformations, however, are defined in bone space. Each vertex has a list of bones it is influenced by; each bone has a weight (influence), and all bone weights for a specific vertex add up to 1. A keyframe of animation is defined by a list of transformations in bone space.

However, we cannot render a model in bone space; we need to go back to model space and continue with our regular flow from there. Conveniently, Collada already provides inverse bind transform matrices for all bones, so we don't have to calculate those ourselves. These matrices are used to go from bone space to model space, and we do that by going up the bone hierarchy and computing the model space coordinates for every bone.

The web is full of tutorials covering this part, so I won't go into many details, but a short, intuitive (and slightly inaccurate) way of describing it goes something like this: if you want the hand transform in model space, you need its parent (the lower arm) in model space, because the hand is defined in its parent's bone space. For the lower arm to be in model space, it needs its parent (the upper arm) in model space, and so on until you reach the root bone. A more efficient way to compute this is to start with the root bone and work towards the leaf bones, storing matrices along the way (this way you never compute a transform matrix twice).

Lerp, Slerp, Nlerp

In order to implement actual animations we need to interpolate between the keyframes we compute. Conveniently, Collada also provides a time stamp for each transformation so we know how much time we have between two keyframes. The transforms of each keyframe contain position, scale and rotation. Scale and position can be interpolated with a regular Lerp but interpolating rotations is where it gets interesting.

Slerp vs Lerp

If we try to just use linear interpolation for rotation it doesn't quite work. As you can see in the image above, rotating our black arrow from A to B should follow the blue path, but with a Lerp it follows the green path instead. In order to get accurate rotations, similar to the animations we see in the original Blender model, we would need to use spherical linear interpolation. Spherical linear interpolation (Slerp) yields values on the dotted blue line, which is exactly what we need. The problem, however, is that Slerp is pretty expensive to compute. In skeletal animation you have a bunch of bones and you would need to compute Slerps for each bone 60 times per second (for a 60 FPS target). Computers are fast enough these days, but if you can save some computation and use that extra time for something else that improves your product, you (almost) always choose that path.

The next best thing is to use Nlerp (normalized linear interpolation), which sits somewhere between Slerp and Lerp in the results it produces. With Nlerp, the interpolated value (as well as all the vectors involved) is normalized, which makes the value end up on the blue curve. It's basically cheating: a regular Lerp with the vector magnitude adjusted so the result lands on the blue curve instead of the green path. The catch is that it doesn't follow the same path as Slerp along the way. If we set up a "race" between a Slerp and an Nlerp, the Slerp would move at constant speed along the curve, while the Nlerp's speed would vary (it moves fastest through the middle of the interpolation). It turns out that in skeletal animations the rotation difference between keyframes is usually pretty small, so the difference between Slerp and Nlerp becomes less visible. The rotation is still not perfect, but it is close enough and the performance boost should be worth it.
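A minimal quaternion Nlerp sketch, assuming the usual sign flip so we interpolate along the shorter arc (the type and function names are illustrative, not the engine's actual API):

```cpp
#include <cassert>
#include <cmath>

struct Quat { float x, y, z, w; };

Quat Nlerp(Quat a, Quat b, float t) {
    // If the quaternions point in opposite hemispheres, negate one so
    // we interpolate along the shorter arc.
    float dot = a.x * b.x + a.y * b.y + a.z * b.z + a.w * b.w;
    if (dot < 0) b = { -b.x, -b.y, -b.z, -b.w };
    // Plain component-wise Lerp...
    Quat q = { a.x + t * (b.x - a.x), a.y + t * (b.y - a.y),
               a.z + t * (b.z - a.z), a.w + t * (b.w - a.w) };
    // ...followed by normalization back onto the unit sphere.
    float len = std::sqrt(q.x * q.x + q.y * q.y + q.z * q.z + q.w * q.w);
    return { q.x / len, q.y / len, q.z / len, q.w / len };
}
```

No trigonometric calls anywhere: just multiplies, adds and one square root, which is the whole performance argument against Slerp.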

Disclaimer: I didn't really test the differences, I just went with Nlerp directly. Our Agility 3D project has a single skeletal animation of a dog with 26 bones, so we would get at most 26 Slerps per frame; I don't think that would make a big difference. Also, Nlerp was much easier to implement than Slerp. On a bigger project with many animated characters, Nlerp should provide a huge boost (a Slerp implementation requires a few trigonometric function calls, while an Nlerp is a simple Lerp followed by a normalization).

How about those weights?

Like I mentioned before, each vertex is influenced by a list of bones, each with its own weight (influence), and the weights add up to 1 for each vertex. So if a vertex is influenced by a single bone, that weight is 1. If it's influenced by 2 bones, with one having 20% influence and the other 80%, the weights are 0.2 and 0.8, and so on. These weights are taken into account when computing the final vertex transform: we perform a weighted interpolation of the position, scale and rotation of the bone transforms for that particular vertex and come up with the final transform. This comes in very handy for more organic animation. When animating a human arm, for example, it's not square at the elbow; it gets rounded and looks believable, because both the upper arm and lower arm bones influence the vertices around the elbow, so we get a smooth transition instead of sharp corners.
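The weighted blend can be sketched as below. This toy version blends only positions in 2D; in the real shader each candidate position would come from a full bone matrix (including the inverse bind transform) applied to the bind-pose vertex, and scale and rotation are blended too.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Vec2 { float x, y; };

// "transformed" holds, per influencing bone, the position that bone's
// transform would give this vertex; "weights" are the matching
// influences, assumed to sum to 1.
Vec2 BlendVertex(const std::vector<Vec2>& transformed,
                 const std::vector<float>& weights) {
    Vec2 out = { 0, 0 };
    for (size_t i = 0; i < weights.size(); ++i) {
        out.x += weights[i] * transformed[i].x;
        out.y += weights[i] * transformed[i].y;
    }
    return out;
}
```

A vertex near the elbow pulled to (0, 0) by one bone and (10, 0) by another, with weights 0.2 and 0.8, ends up at (8, 0): the smooth in-between that makes the joint look rounded.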

You can see our first working animation in the video below. That model was made in Blender by following this tutorial on YouTube: https://www.youtube.com/watch?v=4z7G4TyKE9g.

You don't need pants for the victory dance

Limitations

In order to use all these bones and weights, we need to send bone IDs and their weights for each vertex to the shader. After some research online, I found that most people use 4 bones per vertex to limit the amount of data transferred to the GPU and the computation required for interpolations and related transforms. So we also went with 4 bones per vertex.

If an animation has areas with more than 4 bones per vertex, I just take the top 4 bones (the highest weights) and recompute their weights to add up to 1. This works fine for most animations, as the 5th bone and beyond have pretty small weights that can safely be ignored. That being said, there are exceptions: if 5 or more bones have more or less equal weights, the animation will look very wrong with just 4 bones. This is especially true for snake-like models, but we also encountered some issues in our dog model (around the neck area).
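The top-4 trimming step might look something like this (a sketch, not the actual converter code):

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <utility>
#include <vector>

// Keep only the 4 strongest influences on a vertex and rescale their
// weights so they sum to 1 again. Pairs are (boneId, weight).
std::vector<std::pair<int, float>>
TopFourWeights(std::vector<std::pair<int, float>> influences) {
    // Sort by weight, highest first.
    std::sort(influences.begin(), influences.end(),
              [](const std::pair<int, float>& a,
                 const std::pair<int, float>& b) { return a.second > b.second; });
    if (influences.size() > 4) influences.resize(4);
    // Renormalize the survivors so the weights add up to 1.
    float total = 0;
    for (const auto& p : influences) total += p.second;
    for (auto& p : influences) p.second /= total;
    return influences;
}
```

When the dropped weights are tiny, the rescaling barely changes anything; when 5 or more weights are roughly equal, a fifth of the influence gets redistributed, which is exactly where the artifacts show up.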

The workaround for this was to limit the number of bones per vertex from inside Blender itself. This results in a slightly less refined animation (it varies on a case-by-case basis), but it's good enough for our purposes.

If in the far future we need some snake-like animations, we can add separate render functions that support more bones. With this approach we can keep the 4-bones-per-vertex limit for most rendering and use the higher-bone-count path only for the exceptions.

Conclusion

Implementing static 3D model parsing/loading and rendering was much easier than I thought, while skeletal animation was quite the opposite. In the end it was a huge learning experience and I'm very grateful that I had the time to implement all that.

Things are getting more well-rounded, and it's finally time to implement the full Agility 3D Unity part in our own "engine". Unexpected situations will most certainly pop up, but that's part of the process. I still find this project very fun to work on, and we'll see where we end up.

I hope you enjoyed reading this article; you can find more on our website.
