Building Process in C

The build process can be divided into four stages, as shown in the following figure.

  1. Preprocessing
  2. Compilation
  3. Assembly
  4. Linking

Preprocessing

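Preprocessing is pure text replacement. The preprocessor handles every directive that starts with (#) and replaces it with its result: a macro name is replaced by its value, and for (#include <>) the preprocessor searches the include directories and pastes in the contents of the specified header. The exceptions are (#pragma) and other directives intended for later stages such as the linker; these are passed through rather than removed.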
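Here is a hypothetical file.c to illustrate this. The macro and function names are made up. Running gcc -E file.c stops after preprocessing. Its output would show three things: BUFFER_SIZE and SQUARE replaced by their text, only the enabled #if branch kept, and the contents of <stdio.h> pasted in where the #include was. The comments note roughly what each line becomes.

```c
#include <stdio.h>   /* the preprocessor pastes the whole header text here */

/* Object-like macro: every later BUFFER_SIZE becomes the text 64 */
#define BUFFER_SIZE 64

/* Function-like macro: pure text substitution, hence the extra parentheses */
#define SQUARE(x) ((x) * (x))

/* Conditional compilation: only one branch survives into file.i */
#define DEBUG 1

int buffer_size(void)
{
    return BUFFER_SIZE;          /* after preprocessing: return 64; */
}

int square_of_sum(int a, int b)
{
    return SQUARE(a + b);        /* after preprocessing: return ((a + b) * (a + b)); */
}

void log_message(const char *msg)
{
#if DEBUG
    printf("debug: %s\n", msg);  /* kept because DEBUG is 1 */
#else
    (void)msg;                   /* dropped from file.i because DEBUG is 1 */
#endif
}
```

The parentheses in SQUARE matter because this is text replacement. Without them, SQUARE(a + b) would expand to a + b * a + b, which is not the square of the sum.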

#pragma directive: This is a special-purpose directive used to turn certain compiler features on or off. Pragmas are compiler-specific, i.e., they vary from compiler to compiler, and a compiler ignores any pragma it does not recognize. Some of the #pragma directives are discussed below.
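#pragma pack is one pragma that GCC, Clang, and MSVC all accept. The sketch below, with hypothetical struct names, turns off structure padding for one struct and then restores the default. The unpacked size depends on the target's alignment rules, so the code compares sizes instead of hard-coding them.

```c
#include <stddef.h>

/* Default layout: the compiler may insert padding after 'flag' so that
   'value' starts on its natural alignment boundary. */
struct sample_default {
    char flag;
    int  value;
};

/* Save the current packing setting, then pack members with no padding. */
#pragma pack(push, 1)
struct sample_packed {
    char flag;
    int  value;
};
#pragma pack(pop)   /* restore the previous packing for the rest of the file */

size_t default_size(void) { return sizeof(struct sample_default); }
size_t packed_size(void)  { return sizeof(struct sample_packed); }
```

On a typical target with a 4-byte int, default_size() returns 8 and packed_size() returns 5.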

#pragma startup and #pragma exit: These directives let us specify functions that run before program startup (before control passes to main()) and just before the program exits (after control returns from main()). Some compilers, such as Turbo C, support them. GCC does not support them and ignores them.
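GCC does not implement #pragma startup or #pragma exit. On GCC and Clang, the constructor and destructor function attributes give a comparable effect. This is a different, compiler-specific mechanism, swapped in here for illustration. In this sketch, with hypothetical function names, init_hardware runs before main() and shutdown_hardware runs after main() returns.

```c
#include <stdio.h>

/* Set by the startup function so callers can check that it already ran. */
static int hardware_ready = 0;

/* GCC/Clang extension: runs before main() is entered,
   similar in intent to Turbo C's "#pragma startup init_hardware". */
__attribute__((constructor)) static void init_hardware(void)
{
    hardware_ready = 1;
}

/* Runs after main() returns (or exit() is called),
   similar in intent to "#pragma exit shutdown_hardware". */
__attribute__((destructor)) static void shutdown_hardware(void)
{
    puts("shutdown_hardware: cleaning up");
}

int is_hardware_ready(void)
{
    return hardware_ready;
}
```

A main() that calls is_hardware_ready() sees 1, and the cleanup message is printed after main() finishes.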


The input to the preprocessing stage is a C file (file.c), and the output is an intermediate file (file.i). Each file.c produces its own file.i. To generate the intermediate file with the GCC toolchain, run this command in your CMD or terminal: (gcc -save-temps file_name.c). Along with the executable, this command keeps the intermediate file (.i), the assembly file (.s), and the object file (.o). The following two figures show the C file and the intermediate file.

Compiler

The compiler is the second stage in our journey. It consists of three layers.

  1. Front End: The front end deals with the language itself: scanning, parsing, and building the parse tree. In other words, it checks the code against the language's syntax and detects syntax errors. A file with a syntax error stops here and does not pass to the next stage, while error-free files continue. This stage also converts the C code into an Abstract Syntax Tree, or simply AST. The form of the AST is specific to each compiler, and the compiler uses it to determine which algorithms and code paths to apply in the following stages.
  2. Middle End: This is the optimization layer. The compiler applies its algorithms to things it can fully see inside the file, such as local variables, static variables, and static functions. If a local variable is never used, the compiler deletes it. If a static variable is never modified, the compiler can cache its value and treat it as a constant. If it is never used at all, the compiler can remove it. A static function that is never called can also be removed: no other file can call it, because it is static, so nothing depends on it. If the optimizer detects a computation inside a loop that does not depend on the loop, it hoists the computation above the loop to improve execution time.
  3. Back End: This layer produces the target code: it generates the assembly file (file.s or file.asm). Each microcontroller has its own instruction set, so this layer is target-dependent. It also decides which section each piece of code and data belongs to, and it gives each item an address relative to the start of its section. #pragma directives can affect this layer of the compiler. Finally, it generates a symbol table. This important data structure is created and maintained by the compiler to keep track of the semantics of identifiers. It stores scope and binding information about names, and information about instances of entities such as variable and function names, classes, objects, etc.
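This deliberately tiny, hypothetical AST for integer expressions makes the front-end idea concrete. Real compilers use far richer structures; GCC, for example, lowers its trees to an internal form called GIMPLE. The principle is the same, though. The parser turns text like 3 + 4 * 2 into a tree whose shape encodes operator precedence, and later stages walk that tree.

```c
#include <stdlib.h>

/* Node kinds for a toy expression language (hypothetical). */
typedef enum { NODE_NUM, NODE_ADD, NODE_MUL } NodeKind;

typedef struct Node {
    NodeKind kind;
    int value;                 /* used only by NODE_NUM */
    struct Node *left, *right; /* used only by operator nodes */
} Node;

static Node *make_num(int value)
{
    Node *n = malloc(sizeof *n);
    if (!n) abort();
    n->kind = NODE_NUM;
    n->value = value;
    n->left = n->right = NULL;
    return n;
}

static Node *make_op(NodeKind kind, Node *left, Node *right)
{
    Node *n = malloc(sizeof *n);
    if (!n) abort();
    n->kind = kind;
    n->value = 0;
    n->left = left;
    n->right = right;
    return n;
}

/* Walk the tree bottom-up: children first, then the operator. */
int eval(const Node *n)
{
    switch (n->kind) {
    case NODE_NUM: return n->value;
    case NODE_ADD: return eval(n->left) + eval(n->right);
    case NODE_MUL: return eval(n->left) * eval(n->right);
    }
    return 0; /* unreachable for well-formed trees */
}

void free_tree(Node *n)
{
    if (!n) return;
    free_tree(n->left);
    free_tree(n->right);
    free(n);
}

/* The tree a parser would build for "3 + 4 * 2" is ADD(3, MUL(4, 2)):
   the multiplication sits lower in the tree, so it is evaluated first. */
Node *build_example(void)
{
    return make_op(NODE_ADD, make_num(3),
                   make_op(NODE_MUL, make_num(4), make_num(2)));
}
```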
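The middle-end transformations described above can be illustrated in C source. The compiler actually performs them on its internal representation, not on your source file. In this hypothetical sketch, each _before function is what the developer writes. Each _after function is roughly what an optimizing compiler (for example, GCC at -O2) could reduce it to. The exact result depends on the compiler and its settings.

```c
/* Never called in this file; because it is static, no other file can call
   it either, so the compiler is free to drop it entirely. */
static int unused_helper(int x)
{
    return x * 3;
}

/* Never modified, so its value can be propagated as a constant. */
static int scale = 4;

int area_before(int width)
{
    int unused = 42;          /* dead local: never read, so it is deleted */
    return width * scale;     /* 'scale' is known to be 4 */
}

int area_after(int width)
{
    return width * 4;
}

/* Loop-invariant code motion: 'base * factor' does not depend on i. */
int sum_before(const int *data, int n, int base, int factor)
{
    int total = 0;
    for (int i = 0; i < n; i++) {
        total += data[i] + base * factor;  /* recomputed every iteration */
    }
    return total;
}

int sum_after(const int *data, int n, int base, int factor)
{
    int total = 0;
    int offset = base * factor;            /* hoisted out of the loop */
    for (int i = 0; i < n; i++) {
        total += data[i] + offset;
    }
    return total;
}
```

Each pair returns the same results for the same inputs, which is the point: optimization may change how the code runs, but not what it computes.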
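The following hypothetical, minimal symbol table gives a rough picture of the bookkeeping involved. Each entry records a name, what kind of entity it is, and the scope level it was declared in. A lookup returns the innermost visible declaration, so an inner variable shadows an outer one with the same name. Production compilers use hash tables and much richer entries; this sketch uses a fixed-size array to stay short.

```c
#include <string.h>

typedef enum { SYM_VARIABLE, SYM_FUNCTION } SymKind;

typedef struct {
    char    name[32];
    SymKind kind;
    int     scope;    /* 0 = file scope, 1+ = nested blocks */
} Symbol;

#define MAX_SYMBOLS 64

static Symbol table[MAX_SYMBOLS];
static int    count = 0;
static int    current_scope = 0;

/* Record a declaration in the current scope; returns 0 on success. */
int sym_declare(const char *name, SymKind kind)
{
    if (count >= MAX_SYMBOLS || strlen(name) >= sizeof table[0].name)
        return -1;
    strcpy(table[count].name, name);
    table[count].kind = kind;
    table[count].scope = current_scope;
    count++;
    return 0;
}

void sym_enter_scope(void) { current_scope++; }

/* Leaving a block forgets every name declared inside it. */
void sym_exit_scope(void)
{
    if (current_scope == 0)
        return;
    while (count > 0 && table[count - 1].scope == current_scope)
        count--;
    current_scope--;
}

/* Search newest-first so inner declarations shadow outer ones. */
const Symbol *sym_lookup(const char *name)
{
    for (int i = count - 1; i >= 0; i--)
        if (strcmp(table[i].name, name) == 0)
            return &table[i];
    return NULL;
}
```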

The compiler takes each file.i as input and generates a file.s for it. The following figure shows the assembly file.

Assembler

This stage is a very important one. It converts the assembly file into an object file, which holds the machine code in binary (zeros and ones) together with information the linker needs. Object files follow a standard format, such as the Common Object File Format (COFF) or the Executable and Linkable Format (ELF) that GCC uses on Linux. An ELF object file is organized into four parts:

  • File header table (F.H.T)
  • Program header table (P.H.T)
  • Sections
  • Section header table (S.H.T)

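The F.H.T (the ELF header) identifies the file. It records the magic number, the 32- or 64-bit class, the byte order, the target machine, and the file type (relocatable, executable, or shared). It also records where the program header table and the section header table are located. This is metadata for the tools, and it is not burned or flashed to the target with the hex file.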

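The P.H.T describes segments: groups of sections that the loader maps into memory. For each segment it gives the file offset, address, size, and permissions. It does not describe individual functions. A function lives in the code section (.text), and its start address and size are recorded in the symbol table. The P.H.T isn't mandatory in a relocatable object file (file.o), because such a file is never loaded directly. The linker creates the P.H.T when it produces the executable.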

Sections: this part holds the actual contents, in sections such as .text (the code section), .data, .bss, and so on.
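As a rough guide for GCC on ELF targets, the comments in this sketch show which section each definition typically lands in. Running nm or objdump -t on the resulting file.o is one way to check. The variable and function names are hypothetical, and the exact placement can vary with the compiler version and flags.

```c
/* Initialized global: stored in .data, because its initial value
   must be kept in the file. */
int boot_count = 3;

/* Global without an initializer: placed in .bss with GCC 10 and later
   (older GCC versions put it in COMMON unless -fno-common is used).
   .bss takes no space in the file; it is zero-filled at startup. */
int error_count;

/* Read-only data: typically placed in .rodata. */
const char firmware_version[] = "1.0.2";

/* Function code: placed in .text, the code section. */
int next_boot_count(void)
{
    int local = boot_count + 1;  /* stack variable at run time: no section */
    return local;
}
```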


The S.H.T describes each section in the file in detail: its name, type, size, offset within the file, and flags. The S.H.T is mandatory in an object file, because the linker relies on it to find and combine the sections. The figure shows an example of this table.

The input to the assembler is an assembly file (file.s or file.asm), and for each input file it generates an object file (file.o).
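Every ELF file starts with a 16-byte identification block (e_ident). It begins with the magic bytes 0x7F 'E' 'L' 'F', followed by the class (1 = 32-bit, 2 = 64-bit) and the byte order (1 = little-endian, 2 = big-endian). The 2-byte e_type field right after it says whether the file is relocatable (1, an object file), executable (2), or shared (3). The hypothetical helper below decodes just those fields from a buffer, such as the first bytes read from file.o. It is a sketch, not a full ELF parser; on Linux, <elf.h> and readelf -h cover the complete header.

```c
#include <stddef.h>

/* Fields decoded from the start of an ELF file (hypothetical helper type). */
typedef struct {
    int bits;          /* 32 or 64 */
    int little_endian; /* 1 if little-endian, 0 if big-endian */
    int type;          /* e_type: 1 = relocatable (.o), 2 = executable, 3 = shared */
} ElfInfo;

/* Returns 0 on success, -1 if buf is not a recognizable ELF header. */
int elf_read_header(const unsigned char *buf, size_t len, ElfInfo *out)
{
    if (len < 18)                      /* 16-byte e_ident + 2-byte e_type */
        return -1;
    if (buf[0] != 0x7F || buf[1] != 'E' || buf[2] != 'L' || buf[3] != 'F')
        return -1;                     /* wrong magic number */

    switch (buf[4]) {                  /* EI_CLASS */
    case 1: out->bits = 32; break;
    case 2: out->bits = 64; break;
    default: return -1;
    }

    switch (buf[5]) {                  /* EI_DATA */
    case 1: out->little_endian = 1; break;
    case 2: out->little_endian = 0; break;
    default: return -1;
    }

    /* e_type sits at offset 16, stored in the file's own byte order. */
    if (out->little_endian)
        out->type = buf[16] | (buf[17] << 8);
    else
        out->type = (buf[16] << 8) | buf[17];
    return 0;
}
```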



Linker


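The linker is the stage that generates the output file we actually want. It takes object files and archived libraries (.a) and generates a file with an extension such as .exe, .bin, .hex, or .elf, for microcontrollers and PCs. With the GNU toolchain, the linker itself writes an .elf file, and .hex or .bin files are then usually produced from it with objcopy. The linker takes its orders from the linker script, which is written in the linker's own language, not C or C++. The script tells the linker how to place the sections of the object files into memory. The symbol tables of the object files let the linker resolve references between files. The figure shows part of a linker script file.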
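On a microcontroller, the linker script also defines symbols that the startup code uses. In a common pattern, the startup code copies the initial values of .data from flash to RAM and zeroes .bss before main() runs. The symbol names vary by vendor script; the names used here (_sidata, _sdata, _edata, _sbss, _ebss) are an assumption. The sketch below takes those boundaries as parameters so it can run on a PC. In real firmware they would be declared as extern symbols provided by the linker script.

```c
#include <stdint.h>

/* In real firmware these addresses come from the linker script, e.g.
       extern uint32_t _sidata, _sdata, _edata, _sbss, _ebss;
   (names vary by vendor and are an assumption here).
   Passing them as parameters keeps this sketch testable on a PC. */
void init_memory(const uint32_t *data_load,   /* .data initial values in flash */
                 uint32_t *data_start,        /* .data start in RAM */
                 uint32_t *data_end,          /* .data end in RAM */
                 uint32_t *bss_start,         /* .bss start in RAM */
                 uint32_t *bss_end)           /* .bss end in RAM */
{
    /* Copy initialized variables from their load address to RAM. */
    for (uint32_t *dst = data_start; dst < data_end; )
        *dst++ = *data_load++;

    /* Zero-fill uninitialized variables. */
    for (uint32_t *dst = bss_start; dst < bss_end; )
        *dst++ = 0;
}
```

The zero-fill loop is what makes a global such as int counter; start at 0 on a bare-metal target.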


The linker's inputs are object files (files.o) and archive libraries, and it generates a single file in the selected format. To provide a library to the linker, you archive it. You collect the library's object files into one archive with the .a extension (library.a) using the ar tool, for example (ar rcs library.a file1.o file2.o). An archive bundles the object files with an index; it does not compress them. The figure shows part of a hex file.
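A .hex file is usually in Intel HEX format. Each line is one record, written in hex digits: a colon, a byte count, a 16-bit address, a record type, the data bytes, and a checksum. The checksum is the two's complement of the low byte of the sum of all preceding bytes in the record. Adding every byte of a valid record, checksum included, therefore gives 0 modulo 256. The hypothetical helper below computes that checksum from the already-decoded bytes of a record.

```c
#include <stddef.h>
#include <stdint.h>

/* Intel HEX checksum for one record.
   'bytes' holds the decoded record fields in order: byte count,
   address high byte, address low byte, record type, then the data bytes. */
uint8_t ihex_checksum(const uint8_t *bytes, size_t len)
{
    uint8_t sum = 0;
    for (size_t i = 0; i < len; i++)
        sum = (uint8_t)(sum + bytes[i]);   /* keep only the low 8 bits */
    return (uint8_t)(0x100 - sum);         /* two's complement */
}
```

For example, the data record :0300300002337A1E sums to 0xE2 before its checksum, and 0x100 - 0xE2 = 0x1E, which is the record's final byte. The end-of-file record :00000001FF works the same way.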


