C compilation process - An introduction
Introduction
Knowing what goes on behind the scenes when we compile our code.
The compilation process
Overview of the compilation process
- The compilation process converts C source code into an executable file that the operating system can run.
- This process is composed of four main steps:
1. Preprocessing
2. Compilation
3. Assembly
4. Linking
- Let's dissect each step to see its internals, but before diving in, there are two rules to keep in mind:
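1. Only source (`.c`) files are compiled. Header files are not compiled on their own; the preprocessor pulls them into the source files that include them.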
2. Each source file is compiled separately.
- For each step, we will look at three points:
1. The input of the process.
2. The function of the process.
3. The output of the process.
Step 1 - Preprocessing
Input
A C source code file (`.c`) is fed into this process.
Function
- Inclusion of header files.
- Each `#include` directive is replaced by the full contents of the named header file.
- Expansion of macros.
- Each use of a macro defined with the `#define` directive is replaced by its replacement text.
- Removal of comments.
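As a small illustration, the file below uses an `#include`, an object-like macro, a function-like macro, and comments. The comments (which the preprocessor itself strips) note what each line becomes after preprocessing. The names `LIMIT`, `SQUARE`, `sum_of_squares`, and `print_sum_of_squares` are made up for this sketch.

```c
#include <stdio.h>   /* replaced by the entire contents of stdio.h */

/* Object-like macro: every LIMIT token becomes 5. */
#define LIMIT 5

/* Function-like macro: SQUARE(i) becomes ((i) * (i)).
   The extra parentheses keep calls like SQUARE(a + 1) correct. */
#define SQUARE(x) ((x) * (x))

/* Hypothetical helper: sums the squares of 0 .. LIMIT - 1. */
int sum_of_squares(void)
{
    int total = 0;
    for (int i = 0; i < LIMIT; i++)   /* after preprocessing: i < 5 */
        total += SQUARE(i);          /* after preprocessing: ((i) * (i)) */
    return total;
}

/* printf is only declared by stdio.h; its code lives in the C library. */
void print_sum_of_squares(void)
{
    printf("%d\n", sum_of_squares());
}
```

Running `gcc -E` on this file shows the loop condition as `i < 5`, the macro call as `((i) * (i))`, and none of the comments.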
Output
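- The output is an expanded C source file in which every preprocessing directive has been carried out and removed. (GCC adds line markers such as `# 1 "myProgram.c"` so that later error messages can point back to the original files and lines.)
- The output file typically has the extension `.i` (or `.pre`).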
- This file is known as a translation unit or compilation unit.
Viewing preprocessed code
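To view the translation unit, use GCC's `-E` flag, which stops after preprocessing and prints the result to standard output (add `-o myProgram.i` to save it to a file instead).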
gcc -E myProgram.c
Step 2 - Compiler
Input
- The input of this step is the translation unit.
- The compiler works on one translation unit at a time, so when a program has several source files, each one is compiled separately.
Function
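- The compiler translates the translation unit into assembly code for the target machine. With optimization flags such as GCC's `-O2`, it also optimizes that code; by default (`-O0`), GCC performs little optimization.
- Along the way, the compiler performs lexical, syntax, and semantic analysis on the translation unit, so most errors reported at this stage are syntax errors or other violations of the language rules, such as type mismatches.
- The compiler only resolves what is defined in the current translation unit. It does not resolve functions defined elsewhere, such as `printf()`: it only needs their declarations (for `printf()`, the prototype from `<stdio.h>`) and leaves the actual references for the linker to resolve.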
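The sketch below shows why a declaration is enough at this stage. `add()` and `twice_sum()` are hypothetical functions: the prototype tells the compiler the parameter and return types of `add()`, so the call inside `twice_sum()` can be compiled without seeing the function body. So that the example builds as a single file, the definition sits at the bottom; in a multi-file project it would live in its own source file and the linker would connect the call to it.

```c
/* Declaration (prototype): all the compiler needs to generate the call. */
int add(int a, int b);

/* Compiled using only the prototype above. If add() were defined in
   another file, this call would be left as a reference for the linker. */
int twice_sum(int a, int b)
{
    return 2 * add(a, b);
}

/* Definition. In a real project this would typically live in a separate
   file (for example, a hypothetical add.c) compiled on its own. */
int add(int a, int b)
{
    return a + b;
}
```

If the definition of `add()` were missing entirely, `gcc -c` would still succeed on this file; the error (an "undefined reference" message from GNU ld) would only appear at link time.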
Output
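- An assembly code file (`.s` with GCC) containing the assembly instructions that correspond to our C code. Some toolchains can also emit a listing file that shows each source line next to its assembly.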
Viewing the assembly code
To view the assembly code, use GCC's `-S` flag, which stops after compilation and writes the assembly to `myProgram.s`.
gcc -S myProgram.c
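A tiny function keeps the `.s` output easy to read. This is only a sketch: the exact instructions depend on the target architecture, the GCC version, and the optimization level. Compiled with `gcc -S -O2` on x86-64, `square()` typically becomes a `square:` label followed by an `imull` multiply and a `ret`. The file name `square.c` is hypothetical.

```c
/* Compile with: gcc -S -O2 square.c
   then open square.s and look for the label "square:". */
int square(int x)
{
    return x * x;
}
```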
Step 3 - Assembler
Input
- The input of this step is an assembly code file, either generated by the compiler or written by hand.
Function
- Translating the assembly code into machine code and storing it in an object file.
Output
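- An object file (`.o` with GCC) containing machine code, together with a symbol table listing the symbols the file defines and the ones it still needs from other files.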
Creating an object file from an assembly file
as myProgram.s -o myProgram.o
Creating an object file from a C source code file
gcc -c myProgram.c
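The symbol table of an object file can be inspected directly. Compiling the sketch below with `gcc -c report.c` (a hypothetical file name) and running `nm report.o` typically lists `report_value` with the letter `T` (defined in the text, or code, section) and `printf` with `U` (undefined, left for the linker). The exact output varies by platform.

```c
#include <stdio.h>

/* Defined here, so it appears in the object file's symbol table as a
   defined code symbol. */
int report_value(int x)
{
    /* printf is only declared in stdio.h; its code lives in the C library,
       so the object file records it as an undefined symbol. */
    return printf("value: %d\n", x);   /* returns the number of characters written */
}
```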
Step 4 - Linker
Input
The linker takes three kinds of input:
1. A linker file (linker script).
2. One or more object files.
3. Library files.
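+ What is a linker file?
It is a file written in the linker's script language that describes the memory layout of the target machine the code will run on, and tells the linker where in that memory each part of the program (code, data, and so on) should be placed.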
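On a desktop system you rarely write this file yourself: GCC hands the linker a default one (`ld --verbose` prints it). Its effects are still visible from C. On Linux, the default linker script defines the symbols `etext`, `edata`, and `end`, which mark the first address past the code, the initialized data, and the uninitialized data (BSS), respectively. The sketch below assumes Linux with GNU ld or a compatible linker; these symbols are not part of standard C, and the helper names are hypothetical.

```c
#include <stdint.h>
#include <stdio.h>

/* Defined by the linker script, not by any C file (see end(3) on Linux).
   Only their addresses are meaningful; never read their values. */
extern char etext, edata, end;

/* Returns 1 if the segments appear in the expected order:
   code, then initialized data, then BSS. */
int segments_in_order(void)
{
    uintptr_t text_end = (uintptr_t)&etext;
    uintptr_t data_end = (uintptr_t)&edata;
    uintptr_t bss_end  = (uintptr_t)&end;
    return text_end <= data_end && data_end <= bss_end;
}

/* Prints the boundary addresses chosen at link time. */
void print_segment_ends(void)
{
    printf("end of text: %p\n", (void *)&etext);
    printf("end of data: %p\n", (void *)&edata);
    printf("end of bss : %p\n", (void *)&end);
}
```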
Function
- Combining the object files and the required library code into a single executable.
- Resolving references to externally defined functions and variables, such as the `printf()` function.
- Assigning final memory addresses to code and data, following the layout described in the linker file.
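The sketch below shows where the linker typically places different kinds of variables. The section names in the comments are the usual ones for GCC on ELF systems and are typical rather than guaranteed. The zero value of `uninitialized_counter`, however, is guaranteed by the C standard for objects with static storage duration. All names here are hypothetical.

```c
/* Typically placed in .data: its non-zero initial value is stored in the file. */
int initialized_counter = 42;

/* Typically placed in .bss: no initializer, so it takes no space in the
   file and is zero-filled when the program is loaded. */
int uninitialized_counter;

/* Typically placed in .rodata: read-only constant data. */
const char greeting[] = "hello";

/* Local variables are not placed by the linker at all: they live on the
   stack (or in registers) and are created at run time. */
int add_to_counters(int n)
{
    int local = n;
    initialized_counter += local;
    uninitialized_counter += local;
    return initialized_counter + uninitialized_counter;
}
```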
Output
An executable binary file for the target machine (for example, an ELF executable on Linux).