Flexibility and Control programming STM32F4x without STM32CubeIDE: Part 1 – Bare Metal

Jesus Antonio Nieblas

Published Mar 4, 2024

Repository with the project: GitHub

Introduction

When we start in the world of microcontrollers, we all begin with an Integrated Development Environment (IDE) provided by the microcontroller's manufacturer. In my case, I still remember in university when I downloaded MPLAB IDE to program the PIC16F887. During my practical work, I used Code Composer Studio and the TIVA TM4C123G board, where I became familiar with ARM architecture and developed an affection for it.

Years later, curiosity led me to explore the minimum required to program a microcontroller. I acquired two development boards (STM32F103 and STM32F411RE). Along the way, I came across the term 'Bare Metal', which is essentially what I was looking for: programming an embedded system directly on the hardware with as few files as possible. I achieved it and then left it forgotten once again.

I recently brushed up on that knowledge, but this time working with the STM32F446RE board and added CMSIS and HAL Driver libraries, among other changes like using CMake. On this occasion, I thought of storing these files on GitHub and sharing them with those who share the same curiosity, as I consider it an interesting topic. So, this post is to demonstrate how to create a 'Bare Metal' project and gradually add two of the standard libraries for working with STM32 boards.

Development Environment Setup

It's important to mention that Linux will be used to program the microcontroller. In this case, I used a VirtualBox virtual machine running an LTS release of Ubuntu (22.04.3). For editing files, I use VSCode, either within the virtual machine or by connecting VSCode on Windows to the Ubuntu virtual machine over SSH.

The next step is to install the ARM toolchain (arm-none-eabi), which builds the files, and OpenOCD, which loads the resulting file onto the development board. We also need to install Make to avoid typing all the commands by hand.

I use the following forum for the manual installation of the ARM toolchain: Here

Note: Ubuntu 22.04 comes with Python version 3.10 by default, which causes conflicts with the toolchain. We manually switch to version 3.8; the following forum explains it better: Here

Command line to install OpenOCD and Make:

sudo apt update 
sudo apt-get install openocd 
sudo apt install make

We now have the development environment ready to proceed.

Build Process

The build process for embedded systems transforms source files (*.c and *.s) into relocatable object files (*.o) using a compiler. The linker then combines these object files, resolving their addresses, into an executable file (*.elf). The *.elf file contains the executable code and provides information about memory organization and the location of program sections. This final file is ready to be loaded and executed on an embedded system.

Following the Bare Metal methodology, we use only 3 files:

Linker script: linker_script.ld

Startup code: startup.c

Source code: main.c

With that in place, the build starts with arm-none-eabi-gcc compiling main.c (the main code) and startup.c (the startup code) individually into object files (*.o). Then, in the linking stage, the linker (arm-none-eabi-ld, which we invoke through the arm-none-eabi-gcc driver) takes the object files along with the linker script linker_script.ld and combines them into an executable file (*.elf). In this project, the file is named blink.elf.

After compiling the code and obtaining the executable, the next step is to transfer it to the target device. We use OpenOCD on our PC to communicate with the ST-LINK programmer, which in turn establishes communication with the microcontroller. The executable is stored in non-volatile flash memory as indicated in the linker script. Upon starting the microcontroller, our startup code takes care of copying the initialized data section (.data) to SRAM, and the uninitialized data section (.bss) is filled with zeros. Subsequently, the main() function is called, initiating the execution of our application. This process ensures that our program runs correctly on the microcontroller.

Linker script

The linker script defines how memory will be organized and where each program section will be placed in the system memory space during linking. It is crucial to ensure proper code execution on embedded systems by specifying the location of critical areas such as the code start, the interrupt vector table, and other essential sections.

ENTRY() names the program's entry point, here the Reset_Handler function. On a Cortex-M4 the core itself fetches the reset address from the vector table, so ENTRY() mainly records the entry point in the *.elf file for tools such as the debugger.

ENTRY(Reset_Handler)

MEMORY() is used to define memory regions, specifying the size and location of FLASH and SRAM memory. To set the location and size of memory, we can refer to the microcontroller datasheet, in this case, the STM32F446RE. For SRAM, the starting address is 0x20000000 with a capacity of 128 KB, and for FLASH, the starting address is 0x08000000 with a capacity of 512 KB.

rwx (read, write, execute): Allows reading, writing, and executing in a memory region.
rx (read, execute): Allows reading and executing from a memory region.

MEMORY 
{ 
  FLASH (rx): ORIGIN = 0x08000000, LENGTH = 512K 
  SRAM (rwx): ORIGIN = 0x20000000, LENGTH = 128K 
}
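As a quick sanity check of these numbers, the region boundaries can be worked out in plain C on a PC. This is only a sketch: region_end and region_contains are hypothetical helper names used for illustration, not part of the project.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

// Values taken from the MEMORY block above (STM32F446RE).
#define FLASH_ORIGIN 0x08000000U
#define FLASH_LENGTH (512U * 1024U)
#define SRAM_ORIGIN  0x20000000U
#define SRAM_LENGTH  (128U * 1024U)

// First address past the end of a region (hypothetical helper).
uint32_t region_end(uint32_t origin, uint32_t length)
{
  assert(length <= UINT32_MAX - origin); // the region must not wrap around
  return origin + length;
}

// True if addr lies inside [origin, origin + length).
bool region_contains(uint32_t origin, uint32_t length, uint32_t addr)
{
  return addr >= origin && addr < region_end(origin, length);
}
```

With these values, FLASH covers 0x08000000 to 0x0807FFFF and SRAM covers 0x20000000 to 0x2001FFFF, so region_end(SRAM_ORIGIN, SRAM_LENGTH) gives 0x20020000, the first address past SRAM.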

SECTIONS is used to assign the different sections of the program to specific locations in memory. The sections used here are isr_vector, text, data, and bss.

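isr_vector: Contains the interrupt vector table, which holds the initial stack pointer value followed by the start addresses of the exception and interrupt handlers. This section is critical for interrupt management in the microcontroller. As an initialized array it would normally end up in the data section, but we give it its own section so the linker script can place it at the very start of FLASH.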
text: Stores executable code. The program instructions are found in this section.
data: Holds initialized variables. When the program starts, the data in this section already has assigned values.
bss: Reserved for uninitialized variables. Upon program initiation, this section is filled with zeros.
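To make this mapping concrete, here is a small sketch of where a typical GCC build would put ordinary C objects. All names are made up for illustration, and the exact placement can vary with compiler version and flags.

```c
#include <assert.h>
#include <stdint.h>

// Initialized global: the initial value is stored in FLASH and copied
// into the data section in SRAM by the startup code.
uint32_t blink_period_ms = 500;

// Uninitialized globals: they occupy bss and must read as zero in main().
uint32_t tick_count;
uint8_t rx_buffer[16];

// Constant data: stays in FLASH next to the code (read-only data).
const uint8_t led_pattern[4] = {1, 0, 1, 1};

// Function body: machine code in the text section.
uint8_t pattern_at(uint32_t index)
{
  assert(index < sizeof led_pattern); // stay inside the table
  return led_pattern[index];
}
```

On the target, blink_period_ms only reads as 500 and tick_count only reads as 0 if the startup code has done its copy and zero-fill before main() runs.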

SECTIONS 
{ 
  .isr_vector : 
  { 
    KEEP(*(.isr_vector)) 
  } >FLASH 

  .text : 
  { 
    . = ALIGN(4); 
    *(.text*) 
    *(.rodata*) 
 
    . = ALIGN(4); 
    _etext = .; 
  } >FLASH 

  _sidata = LOADADDR(.data); 

  .data : 
  { 
    . = ALIGN(4); 
    _sdata = .; 
 
    *(.data) 
 
    . = ALIGN(4); 
    _edata = .; 
  } >SRAM AT> FLASH 

  .bss : 
  { 
    . = ALIGN(4); 
    _sbss = .; 

    *(.bss) 
 
    . = ALIGN(4); 
    _ebss = .; 
  } >SRAM 
}

It is also important to define the symbols _etext, _sdata, _edata, _sbss, and _ebss using the location counter (.), and _sidata using LOADADDR(.data). These symbols are used in the startup code to ensure that copying and zero-filling occur at the correct memory addresses. Additionally, we align everything on 4-byte boundaries, following the programming guide's recommendation. This avoids unaligned memory accesses, which are only allowed for certain instructions, are slower than aligned accesses, and can lead to a usage fault exception if used improperly.
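The effect of ALIGN(4) on the location counter is plain integer arithmetic, which can be tried on a PC. The following is a minimal sketch; align_up and section_size are hypothetical helper names, not something the linker or the project provides.

```c
#include <assert.h>
#include <stdint.h>

// Round addr up to the next multiple of alignment (a power of two).
// This mirrors what ". = ALIGN(4);" does to the location counter.
uint32_t align_up(uint32_t addr, uint32_t alignment)
{
  assert(alignment != 0U && (alignment & (alignment - 1U)) == 0U);
  return (addr + alignment - 1U) & ~(alignment - 1U);
}

// Number of bytes between two linker symbols, e.g. _sdata and _edata.
uint32_t section_size(uint32_t start, uint32_t end)
{
  assert(end >= start);
  return end - start;
}
```

For example, align_up(0x20000001, 4) yields 0x20000004, while an address that is already aligned, such as 0x20000004, is left unchanged.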

Startup

We will approach the Startup file as follows:

Configure stack pointer and interrupt vectors: Set the stack start and interrupt address table.
Copy data from Flash to SRAM: Transfer the initial values of the .data section from Flash to SRAM, where the variables can be modified at run time.
Initialize uninitialized variables with zeros: Fill the .bss section with zeros so uninitialized variables start with a default value.
Call the main() function.

The stack pointer usually starts at the end of SRAM. This is because the Cortex-M4 uses a full descending stack (SP is decremented before each store), so the initial value of SP should be the first address past the top of the stack region. The initial value of the main stack pointer is defined as follows:

#include <stdint.h> 

#define SRAM_START (0x20000000U)  
#define SRAM_SIZE (128U * 1024U)  
#define SRAM_END ((SRAM_START) + (SRAM_SIZE))
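To see why SP may start one word past the end of the region, the full descending behaviour can be imitated on a PC with a small array standing in for the top of SRAM. This is only a sketch of the idea: stack_mem, push, pop and stack_depth are hypothetical names, and the real push and pop are done by the processor.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_WORDS 4U

// Tiny stand-in for the top of SRAM.
// stack_mem + STACK_WORDS plays the role of SRAM_END.
static uint32_t stack_mem[STACK_WORDS];
static uint32_t *sp = stack_mem + STACK_WORDS;

// Full descending push: decrement SP first, then store.
void push(uint32_t value)
{
  assert(sp > stack_mem); // no room left would be a stack overflow
  sp--;
  *sp = value;
}

// Pop: load from SP, then increment.
uint32_t pop(void)
{
  assert(sp < stack_mem + STACK_WORDS); // nothing to pop
  uint32_t value = *sp;
  sp++;
  return value;
}

// Number of words currently on the stack.
uint32_t stack_depth(void)
{
  return (uint32_t)((stack_mem + STACK_WORDS) - sp);
}
```

The first push lands in the last word of the array, so the address one past the end is never written. The same reasoning makes SRAM_END a valid initial SP.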

The next step is to initialize the vector table, with its entries in the order given by the vector table in the microcontroller's reference manual.

void Reset_Handler(void); 
void Default_Handler(void); 
void NMI_Handler(void) __attribute__((weak, alias("Default_Handler"))); 
// continue adding device interrupt handlers 
 
uint32_t isr_vector[] __attribute__((section(".isr_vector"))) = { 
  SRAM_END, 
  (uint32_t)& Reset_Handler, 
  (uint32_t)& NMI_Handler, 
//continue adding device interrupt handlers 
};
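The weak alias mechanism can be tried on a PC before touching the board. The sketch below assumes GCC or Clang on Linux (the alias attribute is not available on every platform). The counter, the handlers array and the dispatch function are hypothetical additions for the demo, and Default_Handler counts calls instead of looping forever.

```c
#include <assert.h>
#include <stdint.h>

typedef void (*handler_t)(void);

static uint32_t default_hits; // hypothetical counter, only for this host demo

// On the target this would loop forever; here it just counts calls.
void Default_Handler(void) { default_hits++; }

// No strong definition exists, so both names resolve to Default_Handler.
void NMI_Handler(void) __attribute__((weak, alias("Default_Handler")));
void HardFault_Handler(void) __attribute__((weak, alias("Default_Handler")));

// Handlers in exception-number order, starting at exception 2.
// Entry 0 of the real table is the initial SP and entry 1 is Reset.
#define FIRST_EXCEPTION 2U
static const handler_t handlers[] = { NMI_Handler, HardFault_Handler };

// Byte offset of a vector table entry: each entry is one 32-bit word.
uint32_t vector_offset(uint32_t index) { return index * 4U; }

// Call the handler for an exception number (2 = NMI, 3 = HardFault).
void dispatch(uint32_t exception_number)
{
  assert(exception_number >= FIRST_EXCEPTION);
  assert(exception_number - FIRST_EXCEPTION < sizeof handlers / sizeof handlers[0]);
  handlers[exception_number - FIRST_EXCEPTION]();
}
```

Defining a regular (strong) NMI_Handler in another source file would replace the alias at link time without any change to the table.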

The last 3 points mentioned in configuring the Startup are addressed by implementing the Reset_Handler() function.

extern uint32_t _etext, _sdata, _edata, _sbss, _ebss, _sidata; 
void main(void); 

void Reset_Handler(void) 
{ 
  // Copy .data from FLASH to SRAM 
  uint32_t data_size = (uint32_t)&_edata - (uint32_t)&_sdata; 
  uint8_t *flash_data = (uint8_t*) &_sidata; // Data load address (in flash) 
  uint8_t *sram_data = (uint8_t*) &_sdata; // Data virtual address (in sram) 
   
  for (uint32_t i = 0; i < data_size; i++) 
  { 
    sram_data[i] = flash_data[i]; 
  } 
  
  // Zero-fill .bss section in SRAM 
  uint32_t bss_size = (uint32_t)&_ebss - (uint32_t)&_sbss; 
  uint8_t *bss = (uint8_t*) &_sbss; 
  
  for (uint32_t i = 0; i < bss_size; i++) 
  { 
    bss[i] = 0; 
  } 
  // call to main 
  main(); 
} 

void Default_Handler(void) 
{ 
  while (1); 
}

In the linker script, we specified that the Reset_Handler() function is the entry point of our program. At this stage, we use the symbols defined in the linker script to copy the .data section from its load address in flash memory (starting at _sidata) to SRAM (from _sdata to _edata). Additionally, we zero the entire .bss section in SRAM (from _sbss to _ebss). Finally, we call the main() function.
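The two loops in Reset_Handler() can be exercised on a PC by replacing the linker symbols with ordinary buffers. This is a sketch under that assumption: copy_data and zero_bss are hypothetical helper names that take explicit pointers and sizes instead of reading _sidata, _sdata and _sbss.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Copy size bytes of initial values from the load address to the run address.
void copy_data(uint8_t *sram_data, const uint8_t *flash_data, size_t size)
{
  assert(sram_data != NULL && flash_data != NULL);
  for (size_t i = 0; i < size; i++)
  {
    sram_data[i] = flash_data[i];
  }
}

// Clear size bytes so that uninitialized globals start at zero.
void zero_bss(uint8_t *bss, size_t size)
{
  assert(bss != NULL);
  for (size_t i = 0; i < size; i++)
  {
    bss[i] = 0;
  }
}
```

Calling copy_data on a "fake flash" array and zero_bss on the bytes that follow it in a "fake SRAM" array reproduces what the startup code does between reset and main().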

Main

In this source file, I won't explain much because I believe programming an LED is not the focus of the post. Instead, the emphasis is on configuring files for programming without using an IDE. So, I'll provide a simple blink using only registers; it works for almost all models in the STM32F4XX family.

#include <stdint.h> 
  
#define PERIPHERAL_BASE (0x40000000U) 
#define AHB1_BASE (PERIPHERAL_BASE + 0x20000U) 
#define GPIOA_BASE (AHB1_BASE + 0x0U) 
#define RCC_BASE (AHB1_BASE + 0x3800U)   

#define RCC_AHB1ENR_OFFSET (0x30U) 
#define RCC_AHB1ENR ((volatile uint32_t*) (RCC_BASE + RCC_AHB1ENR_OFFSET)) 
#define RCC_AHB1ENR_GPIOAEN (0x00U)   

#define GPIO_MODER_OFFSET (0x00U) 
#define GPIOA_MODER ((volatile uint32_t*) (GPIOA_BASE + GPIO_MODER_OFFSET)) 

#define GPIO_MODER_MODER5 (10U) 
#define GPIO_ODR_OFFSET (0x14U) 
#define GPIOA_ODR ((volatile uint32_t*) (GPIOA_BASE + GPIO_ODR_OFFSET)) 

#define LED_PIN 5 

void main(void) 
{ 
  *RCC_AHB1ENR |= (1 << RCC_AHB1ENR_GPIOAEN); 
  // do two dummy reads after enabling the peripheral clock, as per the errata 

  volatile uint32_t dummy; 
  dummy = *(RCC_AHB1ENR); 
  dummy = *(RCC_AHB1ENR); 
  
  *GPIOA_MODER |= (1 << GPIO_MODER_MODER5); 
 
  while(1) 
  { 
    *GPIOA_ODR ^= (1 << LED_PIN); 
    // volatile keeps the compiler from optimizing the delay loop away 
    for (volatile uint32_t i = 0; i < 1000000; i++); 
  } 
  
}
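The register arithmetic in this file can be checked on a PC by treating register contents as plain integers. The sketch below is not part of the project: moder_set_output and odr_toggle are hypothetical helpers. The blink code sets the mode with a single |=, which works because the two mode bits of pin 5 are 00 after reset. Clearing the field first, as done here, is a more defensive variant.

```c
#include <assert.h>
#include <stdint.h>

#define PERIPHERAL_BASE (0x40000000U)
#define AHB1_BASE (PERIPHERAL_BASE + 0x20000U)
#define GPIOA_BASE (AHB1_BASE + 0x0U)
#define RCC_BASE (AHB1_BASE + 0x3800U)

// Absolute register addresses that the macros expand to, checked at compile time.
_Static_assert(RCC_BASE + 0x30U == 0x40023830U, "RCC_AHB1ENR address");
_Static_assert(GPIOA_BASE + 0x14U == 0x40020014U, "GPIOA_ODR address");

// MODER holds a 2-bit mode field per pin; 01 selects general purpose output.
uint32_t moder_set_output(uint32_t moder, uint32_t pin)
{
  assert(pin < 16U);
  moder &= ~(3U << (pin * 2U)); // clear both mode bits first
  moder |= (1U << (pin * 2U));  // then write 01
  return moder;
}

// ODR holds one output bit per pin; XOR flips it.
uint32_t odr_toggle(uint32_t odr, uint32_t pin)
{
  assert(pin < 16U);
  return odr ^ (1U << pin);
}
```

For pin 5 the mode field sits in bits 11:10, which is why GPIO_MODER_MODER5 is defined as 10 in the blink code.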

Makefile

To automate the compilation, a Makefile is created to streamline tasks such as creating the executable, loading it onto the development board, and deleting the executable.

Without Make, we would use the following command in the terminal to generate the executable:

arm-none-eabi-gcc main.c startup.c -T linker_script.ld -o blink.elf -mcpu=cortex-m4 -mthumb -nostdlib -Wl,--no-warn-rwx-segments

To load the executable onto the development board, you would use the following command in the terminal:

openocd -f interface/stlink.cfg -f target/stm32f4x.cfg -c "program blink.elf verify reset exit"

The following Makefile makes this easier by defining three targets:

# Makefile to compile and link code for STM32F446RE 
# Note: recipe lines must start with a tab character 

# Compiler and options configuration 
CC = arm-none-eabi-gcc 

CFLAGS = -mcpu=cortex-m4 -mthumb 
LDFLAGS = -T linker_script.ld -nostdlib -Wl,--no-warn-rwx-segments 

OPENOCD_PATHS = -f interface/stlink.cfg -f target/stm32f4x.cfg 

# Executable name 
TARGET = blink.elf 

# Source files 
SRCS = main.c startup.c 
OBJS = $(SRCS:.c=.o) 

.PHONY: all clean flash 

all: $(TARGET) 

# Link step: gcc acts as the driver and calls the linker 
$(TARGET): $(OBJS) 
	$(CC) $(CFLAGS) $(LDFLAGS) $(OBJS) -o $@ 

%.o: %.c 
	$(CC) $(CFLAGS) -c $< -o $@ 

clean: 
	rm -f $(OBJS) $(TARGET) 

# Build first if needed, then program the board 
flash: $(TARGET) 
	openocd $(OPENOCD_PATHS) -c "program $(TARGET) verify reset exit"

We open a terminal in the directory where all the files are located and can execute the following commands:

make all: To generate the blink.elf file.
make flash: To load the blink.elf file onto the board.
make clean: To delete the object files and the blink.elf file.

Note: If you are using a virtual machine, remember to attach the board's USB device to the virtual machine so that it is not held by Windows.

In the second part, we will explain in more detail how Make works and implement CMake by adding the CMSIS library.
