Decoding a binary file format with reverse engineering

In the previous post I described the program I wrote to help me decode a 3D model stored in a binary file. Now I will walk through the decoding process itself and the chain of thought behind it.

DISCLAIMER: This article turned out to be a bit long, but it has lots of images!

Getting and choosing the files

Let's see where the game saves its files. In this case, it is a folder called Vehicles/Parts:

From the game I gathered that the boost bottles are small, low-poly models. That makes them a good thing to look for in a file, and I'll tell you why later.

I was lucky enough to choose PURE, a game that saves its files with meaningful names. I say lucky because I chose it only because of how much fun I had playing it:

I chose the LOD1 files because they are the highest-quality assets and are the most likely to have complete internal format information. We will see that this criterion saved me a lot of work.

Before working with these files, let's look at how most binary files are encoded.

Common binary file encoding

Binary files such as PNG and the Unix "Executable and Linkable Format" (ELF) start with an identifier of the file format, often called a magic number, which helps a program choose the right parser for them.

PNG file starting with the ".PNG" characters (the leading "." is the non-printable byte 0x89)
ELF file starting with the ".ELF" characters (the leading "." is the non-printable byte 0x7F)
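
As a minimal sketch of how such an identifier can be used to pick a parser, the snippet below compares the start of a file against known signatures. The two signatures come from the public PNG and ELF specifications; the function name identify_format is only an illustrative choice.

```python
# Known signatures taken from the public PNG and ELF specifications.
MAGIC_NUMBERS = {
    b"\x89PNG\r\n\x1a\n": "PNG",
    b"\x7fELF": "ELF",
}


def identify_format(data: bytes) -> str:
    """Return the name of the format whose signature starts the data."""
    for magic, name in MAGIC_NUMBERS.items():
        if data.startswith(magic):
            return name
    return "unknown"


if __name__ == "__main__":
    # A file that starts with the 4-byte integer 5 has no known signature.
    print(identify_format(b"\x05\x00\x00\x00"))
```

A format without a textual signature, like the ".model" files examined later, simply falls through to "unknown" and has to be recognized by other means.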

Another common strategy is to use the first part of the file as a header with information about the file. This includes pointers (offsets) to the other parts of the file for quick access.

Here is ELF using this technique. According to Wikipedia, the ELF header stores the e_entry field at offset 0x18:

Pink encloses the value at offset 0x18

If we follow that address, we find the entry function in the file.

Pink encloses offset 0x8c90 and yellow encloses the entry function code
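
Reading that header field can be sketched in a few lines. This is a simplified example, not a full ELF parser: it only reads the class and byte-order bytes of the identifier and then the e_entry value at offset 0x18. The function name read_elf_entry is hypothetical. Keep in mind that e_entry is in general a virtual address, so it lines up with a file offset only when the file is laid out that way.

```python
import struct


def read_elf_entry(data: bytes) -> int:
    """Return the e_entry value stored at offset 0x18 of an ELF header."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    is_64bit = data[4] == 2                 # EI_CLASS: 1 = 32-bit, 2 = 64-bit
    endian = "<" if data[5] == 1 else ">"   # EI_DATA: 1 = little, 2 = big endian
    # e_entry is 8 bytes wide in 64-bit files and 4 bytes wide in 32-bit files.
    fmt = endian + ("Q" if is_64bit else "I")
    (entry,) = struct.unpack_from(fmt, data, 0x18)
    return entry


if __name__ == "__main__":
    # Synthetic 64-bit little-endian header whose entry value is 0x8c90.
    header = b"\x7fELF\x02\x01\x01\x00" + b"\x00" * 16 + struct.pack("<Q", 0x8C90)
    print(hex(read_elf_entry(header)))
```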

With these techniques in mind, let's try to decode the ".model" file.

The model file

It wouldn't be surprising if these ".model" files had some kind of identifier at the beginning. Let's compare the two BOOST_BOTTLE files:

BOOST_BOTTLE_01_LOD1.model
BOOST_BOTTLE_02_LOD1.model

As we can see, both files start with a 4-byte integer with the value 5. This is a good sign: the format follows basic binary file encoding practices.

Next to it there is a seemingly random value. Comparing it with the size of each file, we see that it falls within the file's range, which is a good sign that it is a pointer into the file.

BOOST_BOTTLE_01_LOD1.model
BOOST_BOTTLE_02_LOD1.model

When I go to this address in each file, I even see the same text shortly after it:

BOOST_BOTTLE_01_LOD1.model
BOOST_BOTTLE_02_LOD1.model
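
The checks made so far (read the first two integers, test whether the second one falls inside the file, then look for readable text near that address) can be sketched as below. Everything here is an assumption for illustration: the values are treated as little-endian unsigned 32-bit integers, and the function names are hypothetical, not part of the viewer described in this article.

```python
import struct


def read_model_header(data: bytes):
    """Read two little-endian u32 values from the start of the file."""
    leading_value, candidate_offset = struct.unpack_from("<II", data, 0)
    return leading_value, candidate_offset


def looks_like_offset(value: int, file_size: int) -> bool:
    """A value can only be a pointer into the file if it lands inside it."""
    return 8 <= value < file_size


def ascii_strings_after(data: bytes, start: int, window: int = 64, min_len: int = 4):
    """Collect runs of printable ASCII in data[start:start + window]."""
    found, run = [], bytearray()
    for byte in data[start:start + window]:
        if 0x20 <= byte < 0x7F:
            run.append(byte)
            continue
        if len(run) >= min_len:
            found.append(run.decode("ascii"))
        run.clear()
    if len(run) >= min_len:
        found.append(run.decode("ascii"))
    return found


if __name__ == "__main__":
    # Synthetic stand-in for a ".model" file, not real game data.
    sample = struct.pack("<II", 5, 32) + b"\x00" * 24 + b"\x01\x02opaque shadow\x00"
    value, offset = read_model_header(sample)
    if looks_like_offset(offset, len(sample)):
        print(value, hex(offset), ascii_strings_after(sample, offset))
```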

The text I found says "opaque shadow", which gives me the clue that it is a material or shading setting. I chose to ignore materials and textures for now, as I'm only interested in the 3D mesh.

With this I have enough information to add some notes to the file. From now on I will stay with "BOOST_BOTTLE_01_LOD1.model".

The famous Notes feature!!

Finding the pattern

Now here it gets a little complicated. To understand this part better, you should read this little article first; just the background section is enough.

First, I assume that since the material has a pointer, the vertices and indices of the mesh come before the material. My first thought is to find the indices by looking for index patterns like "012", "021" or "210", as those would be the indices of the first triangle. In a binary file you also need to know whether the indices are stored in 2 bytes or 4 bytes (modern games don't use 1-byte indices, and in 2009 you can forget about 8-byte ones). For the search I use VS Code with the Hex Editor extension because I haven't added search to my viewer yet.

Found the 012 index pattern


Here I found it using only 2 bytes per index, and it was found 3 times. I will take the first match as the start of the mesh indices. With that information I fill in the index fields in the pattern entry panel:

In the image above I set the offset to the one marked in the search image, the index type to 16-bit unsigned integer (2 bytes), and the count to 3. That is a very conservative amount, just one triangle, but if I set a larger count I risk drawing vertices that are not in memory and crashing the application.
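
The search for the 0, 1, 2 pattern can also be scripted instead of done by hand in a hex editor. The sketch below is a minimal version that assumes little-endian indices and tries one index width at a time; find_index_pattern is a hypothetical helper, not a function of the viewer.

```python
import struct


def find_index_pattern(data: bytes, pattern=(0, 1, 2), index_size: int = 2):
    """Return every offset where the pattern occurs as little-endian integers."""
    code = {2: "H", 4: "I"}[index_size]  # 16-bit or 32-bit unsigned indices
    needle = struct.pack("<%d%s" % (len(pattern), code), *pattern)
    matches = []
    pos = data.find(needle)
    while pos != -1:
        matches.append(pos)
        pos = data.find(needle, pos + 1)
    return matches


if __name__ == "__main__":
    # Synthetic buffer with two occurrences of the 16-bit pattern.
    sample = (b"\xff" * 10 + struct.pack("<6H", 0, 1, 2, 2, 1, 3)
              + b"\xff" * 4 + struct.pack("<3H", 0, 1, 2))
    for size in (2, 4):
        print(size, find_index_pattern(sample, index_size=size))
```

Running it for both widths shows which one produces matches. Matches that are not aligned to the index size are more likely to be coincidences and can be filtered out if needed.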

Now to the vertex positions. I will assume that they come right after the material pointer, at offset 0x8, and the count will also be 3.

For now the vertex component type is set to a 32-bit (4-byte) float, but so far I don't know if that's right, so I have to look at offset 0x8 and see which type gives a more plausible number. I'll rule out integer types since they have rarely been used for vertices since the PS1 era:

F32 showing a value that is too small
F16 showing a small but coherent value

As shown in the images above, F32 is not a good guess because it gives a very small value. F16 gives a value closer to 1.0, so I will choose the F16 type.
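
The same comparison can be reproduced with Python's struct module, which supports half-precision floats through the "e" format code. The sketch below assumes little-endian data, and decode_candidates is a hypothetical name. The sample bytes are synthetic: two F16 values of 1.0 stored back to back, which read as a single tiny number when interpreted as F32.

```python
import struct


def decode_candidates(data: bytes, offset: int):
    """Interpret the bytes at offset as both a 32-bit and a 16-bit float."""
    (as_f32,) = struct.unpack_from("<f", data, offset)
    (as_f16,) = struct.unpack_from("<e", data, offset)
    return {"f32": as_f32, "f16": as_f16}


if __name__ == "__main__":
    # Two half floats of 1.0 packed next to each other (synthetic data).
    sample = struct.pack("<2e", 1.0, 1.0)
    print(decode_candidates(sample, 0))
```

The type whose decoded values land in a plausible range for model coordinates is the better guess.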

A vertex padding of 6 (here, the number of bytes from one vertex to the next, often called the stride) places the vertex positions right next to each other, highlighted in blue, but this value gives me nothing on screen:

By trying a bunch of values I got to a padding of 28 bytes, which gave me a triangle!!

The triangle looks dark because of the shading
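
Reading positions with a given padding can be sketched as follows. It assumes each vertex starts with three little-endian F16 components (x, y, z) and that the padding value is the distance in bytes between the starts of consecutive vertices; the remaining bytes of each vertex are skipped because their meaning is still unknown. The name read_positions is hypothetical.

```python
import struct

POSITION_SIZE = 6  # three F16 components of 2 bytes each


def read_positions(data: bytes, offset: int, count: int, stride: int):
    """Read up to count (x, y, z) tuples, one every stride bytes."""
    positions = []
    for i in range(count):
        start = offset + i * stride
        if start + POSITION_SIZE > len(data):
            break  # stop instead of reading past the end of the file
        positions.append(struct.unpack_from("<3e", data, start))
    return positions


if __name__ == "__main__":
    # Synthetic buffer: 8 header bytes, then 3 vertices of 28 bytes each.
    verts = [(0.5, 1.0, -2.0), (1.0, 0.5, 2.0), (2.0, 1.0, 0.5)]
    body = b"".join(struct.pack("<3e", *v) + b"\xaa" * 22 for v in verts)
    print(read_positions(b"\x00" * 8 + body, offset=0x8, count=3, stride=28))
```

Trying different padding values then amounts to calling the function with different stride arguments and checking whether the result draws a sensible triangle.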

Remember how I said it would be helpful to pick a small asset? Since the boost bottle has few triangles, the triangles are larger, so the first triangle is easier to see when I draw it. A small file also makes the pattern easier to spot, since we don't have to scroll through thousands of "lines" to find the next section.

Now, to find the number of vertices, my best guess is to find the end of the indices and look for the largest value, which gives me the last vertex index. How do I find the end of the indices? Well, this one was pure luck: it turns out that the second match of the 012 index pattern sits right next to the end of the indices.

Pink: the second pattern match. Yellow: last index value

The last index appears to be 302, so the mesh has 303 vertices. Trying this count I get the complete model!!

By deriving an algorithm from the chain of thought behind the reverse engineering process, I was able to load many models:

The famous ATV engine

Algorithm:

- find the index pattern 012
- find the second 012 index pattern to get the number of indices (output: index_array)
- find the highest index in the index array and add 1 (output: vertex_count)
- load the vertex array starting at offset 0x8 until vertex_count vertices are read
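
The steps above can be sketched as a single function. This is a simplified illustration under several assumptions, not the exact code of the viewer: data is little-endian, indices are 16-bit, positions are three F16 values, the padding is 28 bytes, and the index list is treated as ending exactly where the second pattern match begins, which is only an approximation. The name load_mesh is hypothetical.

```python
import struct

INDEX_PATTERN = struct.pack("<3H", 0, 1, 2)  # indices 0, 1, 2 as 16-bit values


def load_mesh(data: bytes, vertex_offset: int = 0x8, stride: int = 28):
    """Return (vertices, indices) following the four steps listed above."""
    # Step 1: the first match marks the start of the index array.
    first = data.find(INDEX_PATTERN)
    if first == -1:
        raise ValueError("index pattern 0, 1, 2 not found")
    # Step 2: the second match is treated as the end of the index array.
    second = data.find(INDEX_PATTERN, first + len(INDEX_PATTERN))
    if second == -1:
        raise ValueError("second index pattern not found")
    index_count = (second - first) // 2
    index_count -= index_count % 3  # keep only whole triangles
    indices = list(struct.unpack_from("<%dH" % index_count, data, first))
    # Step 3: the highest index gives the number of vertices.
    vertex_count = max(indices) + 1
    # Step 4: read one position every stride bytes, starting at vertex_offset.
    vertices = [
        struct.unpack_from("<3e", data, vertex_offset + i * stride)
        for i in range(vertex_count)
    ]
    return vertices, indices
```

The stride parameter is exposed on purpose, since other files use a different padding.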

But not all models follow this algorithm. For example, LOD2 has a different vertex padding (24 bytes) and the indices come right after the material pointer. Some LOD1 files don't work either and simply show a bunch of smashed triangles.


Conclusion

To reverse engineer a 3D model, it helps to know how models are stored on the GPU, and to know the game well enough to choose the right file to work with, which saves a lot of time. Finding the vertices and indices requires a lot of time and educated guessing, but the parser can be derived from the reverse engineering process you used to get the model on the screen.

Another very, very important thing: the time invested in the tool was worth it!!
