C compilation process - An introduction


Introduction

Knowing what goes on behind the scenes when we compile our code.

The compilation process

Overview of the compilation process

-   The compilation process converts C source code into an executable file that the operating system can run.

-   This process is composed of four main steps:

    1.  Preprocessing

    2.  Compilation

    3.  Assembly

    4.  Linking

-   We will dissect each step in turn to see its internals, but before diving in, there are two rules we must be aware of:

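    1.  Only source files (`.c`) are compiled; header files are never compiled on their own, only as part of the source files that include them.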

    2.  Each source file is compiled separately.

-   For each step, our analysis will cover three points:

    1.  The input of the process.

    2.  The function of the process.

    3.  The output of the process.

Step 1 - Preprocessing

Input

A C source code file (`.c`) is fed into this step.

Function

-   Inclusion of header files.

    -   Each `#include` directive is replaced by the full contents of the named header file.

-   Expansion of macros.

    -   Every use of a macro defined with the `#define` directive is replaced by the macro's replacement text.

-   Removal of comments.

Output

-   The output of this step is an expanded C source file in which all the preprocessing directives have been processed.

-   The output file typically has a `.i` or `.pre` extension.

-   This file is known as a translation unit or compilation unit.

Viewing preprocessed code

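To view the translation unit, use the `-E` flag of GCC. It stops after preprocessing and prints the result to standard output (add `-o myProgram.i` to save it to a file instead).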

`gcc -E myProgram.c`

Step 2 - Compiler

Input

-   The input of this step is the translation unit.

-   The compiler operates on one translation unit at a time, so when a program has multiple source files, each one is compiled in a separate run.

Function

-   The compiler translates the translation unit into assembly code for the target processor, optimizing it when optimization is enabled (for example, with GCC's `-O2` flag).

-   One of the compiler's main tasks is performing lexical analysis and syntax analysis on the translation unit, so the errors reported at this stage are mainly syntax errors and other violations of the language rules, such as type mismatches.

-   The compiler resolves only what is defined in the current translation unit. It does not resolve functions defined elsewhere, such as `printf()`: a declaration (here, from `<stdio.h>`) is enough to compile the call, and the actual reference is left for the linker to resolve.

Output

-   An assembly file (`.s`): this file contains the assembly instructions that correspond to our C code.

Viewing the assembly code

To view the assembly code, use the `-S` flag of GCC, which stops after compilation (before assembling) and writes the result to `myProgram.s`.

`gcc -S myProgram.c`

Step 3 - Assembler

Input

-   The input of this step is an assembly file, either generated by the compiler or written by hand.

Function

-   The conversion of the assembly code file into an object code file.

Output

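-   An object file (`.o`) containing machine code plus a symbol table that lists the symbols the file defines and the external symbols it still needs.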

Creating an object file from an assembly file

`as myProgram.s -o myProgram.o`

Creating an object file directly from a C source code file (GCC runs preprocessing, compilation, and assembly, then stops before linking)

`gcc -c myProgram.c`

Step 4 - Linker

Input

The linker takes three kinds of input files:

1.  The linker file (linker script).

2.  Object files.

3.  Library files.

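-   What is a linker file?

    It is a script, written in the linker's command language, that describes the segmentation of the memory of the target machine the code will run on (the memory regions available) and which parts of the program are placed in each region. For ordinary desktop programs, the GNU linker falls back on a built-in default script, so custom linker files are mostly written for embedded targets.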

Function

-   Combining the object files and the needed library code into a single executable.

-   Resolution of references to external functions and objects, such as calls to `printf()`.

-   Assigning memory addresses to the code and data, following the layout described in the linker file.

Output

An executable binary file for the target machine.


More articles by Khaled Mustafa

  • Demystifying the Differences Between Soft Links and Hard Links

  • A NATO Phonetics Converter

  • Bit-Banding
