Flexibility and Control programming STM32F4x without STM32CubeIDE: Part 1 – Bare Metal
Repository with the project: GitHub
Introduction
When we start in the world of microcontrollers, we all begin with an Integrated Development Environment (IDE) provided by the microcontroller's manufacturer. In my case, I still remember in university when I downloaded MPLAB IDE to program the PIC16F887. During my practical work, I used Code Composer Studio and the TIVA TM4C123G board, where I became familiar with ARM architecture and developed an affection for it.
Years later, curiosity led me to explore the minimum required to program a microcontroller. I acquired two development boards (STM32F103 and STM32F411RE). Along the way, I came across the term 'Bare Metal', which is essentially what I was looking for: programming an embedded system directly on the hardware with a minimal set of files. I achieved it and then left it forgotten once again.
I recently brushed up on that knowledge, this time working with the STM32F446RE board and adding the CMSIS and HAL driver libraries, along with other changes such as using CMake. On this occasion, I decided to store these files on GitHub and share them with those who have the same curiosity, as I consider it an interesting topic. So, this post demonstrates how to create a 'Bare Metal' project and gradually add two of the standard libraries for working with STM32 boards.
Development Environment Setup
It's important to mention that Linux will be used to program the microcontroller. In this case, I used a VirtualBox virtual machine running an LTS release of Ubuntu (22.04.3). For editing files, I use VSCode, either within the virtual machine or over SSH from Windows to the Ubuntu virtual machine.
The next step is to download the Toolchain to generate the files and OpenOCD, which helps us load the file onto the development board. We also need to install Make to avoid typing all the commands.
I use the following forum for the manual installation of the ARM toolchain: Here
Note: Ubuntu 22.04 comes with Python version 3.10 by default, which causes conflicts with the toolchain. We manually switch to version 3.8; the following forum explains it better: Here
Command line to install OpenOCD and Make:
sudo apt update
sudo apt-get install openocd
sudo apt install make
We now have the development environment ready to proceed.
Build Process
The compilation process for embedded systems transforms source files (*.c and *.s) into relocatable object files (*.o) using a compiler. The linker then combines these object files into a single executable file (*.elf) in which every section has been assigned its final address. The *.elf file contains the executable code and provides information about memory organization and the location of program sections. This final file is ready to be loaded and executed on an embedded system.
Following the Bare Metal methodology, we use only 3 files:
Linker script: linker_script.ld
Startup code: startup.c
Source code: main.c
With these files in place, the build starts with arm-none-eabi-gcc compiling main.c and startup.c, the main code and the startup code respectively, into individual object files (*.o). Then, in the linking stage, arm-none-eabi-ld takes the object files along with the linker script linker_script.ld and combines them into the executable file (*.elf). In the commands used later in this post, the linker is invoked indirectly through the arm-none-eabi-gcc driver rather than by calling arm-none-eabi-ld by hand. In this example, the resulting file is named blink.elf.
After compiling the code and obtaining the executable, the next step is to transfer it to the target device. We use OpenOCD on our PC to communicate with the ST-LINK programmer, which in turn establishes communication with the microcontroller. The executable is stored in non-volatile flash memory as indicated in the linker script. Upon starting the microcontroller, our startup code takes care of copying the initialized data section (.data) to SRAM, and the uninitialized data section (.bss) is filled with zeros. Subsequently, the main() function is called, initiating the execution of our application. This process ensures that our program runs correctly on the microcontroller.
Linker script
The linker script defines how memory is organized and where each program section is placed in the system memory space at link time. It is crucial for proper code execution on embedded systems because it specifies the location of critical areas such as the code start, the interrupt vector table, and other essential sections.
ENTRY() specifies the program’s start address, with Reset_Handler as the program’s entry point function.
ENTRY(Reset_Handler)
MEMORY defines the memory regions, specifying the location and size of FLASH and SRAM. These values come from the microcontroller datasheet, in this case for the STM32F446RE. For SRAM, the starting address is 0x20000000 with a capacity of 128 KB, and for FLASH, the starting address is 0x08000000 with a capacity of 512 KB.
MEMORY
{
    FLASH (rx): ORIGIN = 0x08000000, LENGTH = 512K
    SRAM (rwx): ORIGIN = 0x20000000, LENGTH = 128K
}
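As a quick sanity check on the MEMORY values, the end of each region is simply its origin plus its length. The following is a host-side sketch in C, not part of the firmware; the `mem_region` type and the `region_end` and `region_contains` helpers are hypothetical names introduced here only for illustration.

```c
#include <stdint.h>

/* Hypothetical helper type: mirrors one entry of the MEMORY command. */
typedef struct {
    uint32_t origin;  /* ORIGIN value from the linker script */
    uint32_t length;  /* LENGTH value in bytes */
} mem_region;

/* Values copied from the linker script above (STM32F446RE). */
static const mem_region FLASH_REGION = { 0x08000000u, 512u * 1024u };
static const mem_region SRAM_REGION  = { 0x20000000u, 128u * 1024u };

/* First address past the end of the region (exclusive upper bound). */
uint32_t region_end(mem_region r)
{
    return r.origin + r.length;
}

/* Returns 1 if addr lies inside the region, 0 otherwise. */
int region_contains(mem_region r, uint32_t addr)
{
    return addr >= r.origin && addr < region_end(r);
}
```

With these values, `region_end(SRAM_REGION)` evaluates to 0x20020000 and `region_end(FLASH_REGION)` to 0x08080000. Any address the linker assigns to a section must fall below the end of the region it was placed in.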
SECTIONS assigns the different sections of the program to specific locations in memory. They are usually named .isr_vector, .text, .data, and .bss.
SECTIONS
{
    .isr_vector :
    {
        KEEP(*(.isr_vector))
    } >FLASH
    .text :
    {
        . = ALIGN(4);
        *(.text)
        *(.rodata)
        . = ALIGN(4);
        _etext = .;
    } >FLASH
    _sidata = LOADADDR(.data);
    .data :
    {
        . = ALIGN(4);
        _sdata = .;
        *(.data)
        . = ALIGN(4);
        _edata = .;
    } >SRAM AT> FLASH
    .bss :
    {
        . = ALIGN(4);
        _sbss = .;
        *(.bss)
        . = ALIGN(4);
        _ebss = .;
    } >SRAM
}
It is also important to define the symbols _etext, _sdata, _edata, _sbss, and _ebss using the location counter (.), plus _sidata, which LOADADDR(.data) sets to the flash address where the initial values of .data are stored. These symbols are used in the startup code so that copying and zero-filling happen at the correct memory addresses. Additionally, we align everything on 4-byte boundaries, following the programming guide recommendation. This avoids unaligned memory accesses, which are only allowed for certain instructions, are slower than aligned accesses, and can lead to a usage fault exception if used improperly.
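To make the effect of `. = ALIGN(4);` concrete, here is a small host-side sketch of the same rounding arithmetic. The `align_up` and `is_aligned` names are hypothetical helpers for illustration, and the sketch assumes the alignment is a power of two.

```c
#include <stdint.h>

/* Round addr up to the next multiple of align (align must be a power of two).
 * This is the arithmetic that `. = ALIGN(4);` applies to the location counter. */
uint32_t align_up(uint32_t addr, uint32_t align)
{
    return (addr + align - 1u) & ~(align - 1u);
}

/* Returns 1 when addr is already a multiple of align, 0 otherwise. */
int is_aligned(uint32_t addr, uint32_t align)
{
    return (addr & (align - 1u)) == 0u;
}
```

For example, a location counter at 0x20000001 is moved up to 0x20000004, while one already at 0x20000004 stays where it is.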
Startup
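We will approach the Startup file as follows:
Set the initial stack pointer
Define the vector table
Copy the .data section from FLASH to SRAM
Zero-fill the .bss section in SRAM
Call main()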
The stack pointer usually points to the end of SRAM. This is because stack operations on Cortex-M4 processors use a full descending stack (SP is decremented before each store), so the initial value of SP should be set to the first memory address after the top of the stack region. The main stack pointer is configured as follows:
#include <stdint.h> // uint32_t, used by the vector table below
#define SRAM_START (0x20000000U)
#define SRAM_SIZE (128U * 1024U)
#define SRAM_END ((SRAM_START) + (SRAM_SIZE))
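The full descending behavior described above can be simulated on a PC. This is only a sketch: the `sim_stack` type and its functions are hypothetical, and a four-word array stands in for the top of SRAM, with the index `STACK_WORDS` playing the role of SRAM_END.

```c
#include <stdint.h>

#define STACK_WORDS 4u

/* Hypothetical stand-in for the top of SRAM: a tiny array of words. */
typedef struct {
    uint32_t mem[STACK_WORDS];
    uint32_t sp;  /* index of the last pushed word; STACK_WORDS means empty */
} sim_stack;

/* Initial SP points one past the highest usable word, like SRAM_END. */
void sim_init(sim_stack *s)
{
    s->sp = STACK_WORDS;
}

/* Full descending push: decrement SP first, then store. */
void sim_push(sim_stack *s, uint32_t value)
{
    s->sp -= 1u;
    s->mem[s->sp] = value;
}

/* Pop: load from SP, then increment. */
uint32_t sim_pop(sim_stack *s)
{
    uint32_t value = s->mem[s->sp];
    s->sp += 1u;
    return value;
}
```

The first push lands in the highest word of the array, which is why an initial SP of SRAM_END never causes a write outside SRAM: the address SRAM_END itself is never stored to.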
The next step is to initialize the vector table in the order specified by the microcontroller datasheet.
void Reset_Handler(void);
void Default_Handler(void);
void NMI_Handler(void) __attribute__((weak, alias("Default_Handler")));
// continue adding device interrupt handlers
uint32_t isr_vector[] __attribute__((section(".isr_vector"))) = {
    SRAM_END,
    (uint32_t)& Reset_Handler,
    (uint32_t)& NMI_Handler,
    //continue adding device interrupt handlers
};
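Because the table is a plain array of 32-bit words, the position of each entry fixes which exception it serves. The host-side sketch below shows the arithmetic; the helper names are hypothetical, and it relies on the Cortex-M convention that entry 0 holds the initial stack pointer, entries 1 to 15 are system exceptions, and device interrupts start at entry 16.

```c
#include <stdint.h>

/* Cortex-M reserves the first 16 table entries: entry 0 is the initial
 * stack pointer and entries 1..15 are the system exceptions. */
#define SYSTEM_ENTRIES 16u
#define WORD_SIZE      4u

/* Byte offset of a table entry from the start of the vector table. */
uint32_t vector_offset(uint32_t index)
{
    return index * WORD_SIZE;
}

/* Table index of a device interrupt, given its IRQ number (0-based). */
uint32_t irq_to_index(uint32_t irq_number)
{
    return SYSTEM_ENTRIES + irq_number;
}
```

Reset_Handler sits at index 1 (offset 0x04), NMI_Handler at index 2 (offset 0x08), and the first device interrupt at offset 0x40. Reserved slots still need a placeholder entry such as 0 so that the handlers after them stay at the right offset.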
The last 3 points mentioned in configuring the Startup are addressed by implementing the Reset_Handler() function.
extern uint32_t _etext, _sdata, _edata, _sbss, _ebss, _sidata;
void main(void);
void Reset_Handler(void)
{
    // Copy .data from FLASH to SRAM
    uint32_t data_size = (uint32_t)&_edata - (uint32_t)&_sdata;
    uint8_t *flash_data = (uint8_t*) &_sidata; // Data load address (in flash)
    uint8_t *sram_data = (uint8_t*) &_sdata; // Data virtual address (in sram)
    for (uint32_t i = 0; i < data_size; i++)
    {
        sram_data[i] = flash_data[i];
    }
    // Zero-fill .bss section in SRAM
    uint32_t bss_size = (uint32_t)&_ebss - (uint32_t)&_sbss;
    uint8_t *bss = (uint8_t*) &_sbss;
    for (uint32_t i = 0; i < bss_size; i++)
    {
        bss[i] = 0;
    }
    // call to main
    main();
}
void Default_Handler(void) {
    while(1);
}
In the linker script, we specify that the Reset_Handler() function is the entry point of our program. At this stage, we use the symbols defined in the linker script to copy the .data section from flash memory (starting at _sidata) to SRAM (from _sdata to _edata). Additionally, we zero the entire .bss section in SRAM (from _sbss to _ebss). Finally, we call the main function.
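Since the linker script aligns the section boundaries to 4 bytes, the same copy and zero-fill can also be done one word at a time. The sketch below is a variant, not the code used in this project, which copies byte by byte; `copy_words` and `zero_words` are hypothetical helper names, and the sketch assumes the load address in flash is word-aligned as well.

```c
#include <stdint.h>

/* Copy words from src into the range [dst, dst_end). */
void copy_words(uint32_t *dst, const uint32_t *dst_end, const uint32_t *src)
{
    while (dst < dst_end) {
        *dst++ = *src++;
    }
}

/* Zero-fill the words in the range [dst, dst_end). */
void zero_words(uint32_t *dst, const uint32_t *dst_end)
{
    while (dst < dst_end) {
        *dst++ = 0u;
    }
}
```

On the target, the calls would look like `copy_words(&_sdata, &_edata, &_sidata);` followed by `zero_words(&_sbss, &_ebss);`. On a PC, the same functions can be exercised with ordinary arrays standing in for flash and SRAM.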
Main
In this source file, I won't explain much because I believe programming an LED is not the focus of the post. Instead, the emphasis is on configuring files for programming without using an IDE. So, I'll provide a simple blink using only registers; it works for almost all models in the STM32F4XX family.
#include <stdint.h>
#include <stdio.h>
#define PERIPHERAL_BASE (0x40000000U)
#define AHB1_BASE (PERIPHERAL_BASE + 0x20000U)
#define GPIOA_BASE (AHB1_BASE + 0x0U)
#define RCC_BASE (AHB1_BASE + 0x3800U)
#define RCC_AHB1ENR_OFFSET (0x30U)
#define RCC_AHB1ENR ((volatile uint32_t*) (RCC_BASE + RCC_AHB1ENR_OFFSET))
#define RCC_AHB1ENR_GPIOAEN (0x00U)
#define GPIO_MODER_OFFSET (0x00U)
#define GPIOA_MODER ((volatile uint32_t*) (GPIOA_BASE + GPIO_MODER_OFFSET))
#define GPIO_MODER_MODER5 (10U)
#define GPIO_ODR_OFFSET (0x14U)
#define GPIOA_ODR ((volatile uint32_t*) (GPIOA_BASE + GPIO_ODR_OFFSET))
#define LED_PIN 5
void main(void)
{
    // enable the GPIOA peripheral clock
    *RCC_AHB1ENR |= (1 << RCC_AHB1ENR_GPIOAEN);
    // do two dummy reads after enabling the peripheral clock, as per the errata
    volatile uint32_t dummy;
    dummy = *(RCC_AHB1ENR);
    dummy = *(RCC_AHB1ENR);
    // configure PA5 as a general purpose output
    *GPIOA_MODER |= (1 << GPIO_MODER_MODER5);
    while(1)
    {
        *GPIOA_ODR ^= (1 << LED_PIN);
        // volatile counter so the delay loop is not optimized away
        for (volatile uint32_t i = 0; i < 1000000; i++);
    }
}
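One detail worth noting in the code above: MODER holds a 2-bit field per pin, and the `|=` only sets the lower bit of the PA5 field, so it relies on the upper bit still being 0 after reset. A more defensive approach is to clear the whole field before writing it. The sketch below shows that read-modify-write as a pure function that can be checked on a PC; `set_pin_mode` and `GPIO_MODE_OUTPUT` are hypothetical names introduced for illustration.

```c
#include <stdint.h>

/* MODER holds a 2-bit mode field per pin: 00 input, 01 output,
 * 10 alternate function, 11 analog. */
#define GPIO_MODE_OUTPUT 1u

/* Return moder with the 2-bit field for `pin` replaced by `mode`. */
uint32_t set_pin_mode(uint32_t moder, uint32_t pin, uint32_t mode)
{
    uint32_t shift = pin * 2u;
    moder &= ~(3u << shift);         /* clear both bits of the field */
    moder |= (mode & 3u) << shift;   /* write the new mode */
    return moder;
}
```

On the target this would be used as `*GPIOA_MODER = set_pin_mode(*GPIOA_MODER, LED_PIN, GPIO_MODE_OUTPUT);`, which leaves the fields of all other pins untouched.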
Makefile
To automate the compilation, a Makefile is created to streamline tasks such as creating the executable, loading it onto the development board, and deleting the executable.
Without Make, we would use the following command in the terminal to generate the executable:
arm-none-eabi-gcc main.c startup.c -T linker_script.ld -o blink.elf -mcpu=cortex-m4 -mthumb -nostdlib -Wl,--no-warn-rwx-segments
To load the executable onto the development board, you would use the following command in the terminal:
openocd -f interface/stlink.cfg -f target/stm32f4x.cfg -c "program blink.elf verify reset exit"
With the following Makefile the work is made easier, and we create three tasks:
# Makefile to compile and link code for STM32F446RE
# Note: recipe lines must be indented with a tab character
# Compiler and options configuration
CC = arm-none-eabi-gcc
LD = arm-none-eabi-ld
CFLAGS = -mcpu=cortex-m4 -mthumb -nostdlib -Wl,--no-warn-rwx-segments
LDFLAGS = -T linker_script.ld
OPENOCD_PATHS = -f interface/stlink.cfg -f target/stm32f4x.cfg
# Executable name
TARGET = blink.elf
# Source files
SRCS = main.c startup.c
OBJS = $(SRCS:.c=.o)
# These targets do not produce files with the same name
.PHONY: all clean flash
all: $(TARGET)
$(TARGET): $(OBJS)
	$(CC) $(CFLAGS) $(LDFLAGS) $(OBJS) -o $@
%.o: %.c
	$(CC) $(CFLAGS) -c $< -o $@
clean:
	rm -f $(OBJS) $(TARGET)
# Rebuild the executable if needed before flashing it
flash: $(TARGET)
	openocd $(OPENOCD_PATHS) -c "program $(TARGET) verify reset exit"
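We open a terminal in the directory where all the files are located and can execute the following commands, one per Makefile target, to build the executable, load it onto the board, and delete the generated files:
make
make flash
make clean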
Note: If you are using a virtual machine, remember to attach the board's USB device to the virtual machine so that it is not held by the Windows host.
In the second part, we will explain in more detail how Make works and move to CMake while adding the CMSIS library.