Introduction
RISC-V Embedded Systems Training
VEGA edition
BTA Design Services
Between Layers
Overview
TBD
What will you learn?
The goal of these sessions is to teach you the following:
- How to set up a modern containerized embedded systems development environment
- The basics of RISC-V firmware development
- Focus will be on the OpenISA VEGAboard (RV32M1-VEGA) development board
- Simulating hardware using Renode
- The basics of real-time operating systems (RTOS) and Zephyr
In the end, the hope is that you gain fundamental generalizable knowledge relating to the development of firmware for microcontroller-based systems.
Development board giveaways
In addition to the above, we’ll be giving away up to 15 VEGAboards for free to participants throughout the sessions. Make sure to attend!
Training schedule and agenda
- Location: TBD
- Times:
- Day TBD: time TBD
Contact information
Instructors:
- Alfredo Herrera | aherrera (at) alean-tec.com
- Yusef Karim | yusef (at) betweenlayers.io
- Mathieu Gagnon | TBD (at) TBD
Why RISC-V?
RISC-V is an open, royalty-free Instruction Set Architecture (ISA) originally developed at UC Berkeley in 2010. Unlike ARM or x86, anyone can implement a RISC-V processor without licensing fees or legal agreements, and anyone can read the official specification without signing an NDA.
That openness has turned RISC-V into something much larger than a research or education project. It has crept into virtually every corner of industry. NVIDIA alone ships over a billion RISC-V cores embedded throughout its hardware stack, a number that sounds absurd until you see how pervasively they use it across GPUs, embedded controllers, and security subsystems. If you use a modern SSD, GPU, or cloud service, you may be running RISC-V without even knowing it.
The ecosystem that has grown around the specification is what makes RISC-V more than just another ISA. RISC-V International, the non-profit Swiss foundation that governs the standard, coordinates hundreds of member organizations, from chip vendors and hyperscalers to universities and independent engineers, all of whom have an equal seat at the table. The specification itself evolves in the open on public mailing lists and GitHub repositories, in contrast to the closed processes behind proprietary ISAs.
Around that core specification, a layer of open-source silicon has emerged. The OpenHW Group develops production-quality, verification-heavy RISC-V cores (the CV32E40P and CVA6 families among them) that companies can integrate into real products. lowRISC, a non-profit based in Cambridge, maintains Ibex, a small 32-bit core that has found its way into everything from educational FPGA boards to the security subsystems of large-scale infrastructure. These are not toy designs; they are built, tested, and shipped in production hardware (for example, in Google Chromebooks).
The security research community has also adopted RISC-V as its testbed of choice, precisely because the ISA is hackable in the best sense: you can modify it, extend it, and tape out your ideas. The RISC-V Platform Security Model Specification is a framework for hardware-level security primitives, covering physical memory protection, trusted execution environments, and attestation. At the research frontier, CHERI (Capability Hardware Enhanced RISC Instructions), and specifically CHERIoT (its adaptation for microcontrollers), is implemented as a RISC-V extension. CHERIoT runs on real hardware today, including on the Sonata FPGA board, which is built around the Ibex core. That same Ibex core powers Google’s OpenTitan root-of-trust chip.
So when you learn RISC-V, you are not learning a niche curiosity. The ISA we will be using in this training drives active research, ships in billions of devices, and secures real systems.
At a high level, RISC-V is:
- Open: a royalty-free Instruction Set Architecture (ISA) originally developed at UC Berkeley in 2010 and now governed by an open and transparent non-profit foundation (RISC-V International).
- Community driven: primarily driven and evolved through open-source specifications and implementations, allowing individuals (like us) and industry to freely contribute and continually advance it.
- Modular: The RV32I for 32-bit (what our development board supports) and RV64I for 64-bit base instruction sets are the minimum for each implementation. After that, there are over 100 ratified extensions to pick and choose from when designing real-world hardware.
Note
You may be thinking that writing software for an architecture supporting 100+ extensions would be a comparative nightmare. To avoid such nightmares, RISC-V has developed something called profiles which “are named groupings of standard processor ISA bases plus extensions (each identified as Mandatory or Optional)”.
RISC-V was shaped by decades of lessons from earlier architectures (MIPS, SPARC, Alpha, ARM), and its designers made deliberate choices that directly affect how simple our hardware implementation can be. The RISC-V Reader summarizes this beautifully in its very first chapter by considering many aspects important to ISA design such as cost, simplicity, performance, isolation of architecture from implementation, room for growth, program size, and ease of programming. If you want to learn more about the RISC-V ISA on your own as we progress, it is highly recommended to pick up a copy of the RISC-V Reader!
All in all, RISC-V is open, growing fast, and here to stay. We hope this motivates you to learn and continue this training with us!
Development environment
Overview
Embedded development has a reputation for being fiddly to set up. You typically need a specific compiler (one that produces code for your microcontroller rather than your laptop), a program that talks to the debug probe on the board, a handful of supporting libraries, and sometimes a simulator. Getting all of the above installed correctly can sometimes become a nightmare! To avoid this, this training will use a containerized development environment.
Note
A container is a lightweight, isolated Linux environment that runs on top of your own operating system (your “host”). Similar to virtual machines, you can think of it as an isolated box that you can use to run and install custom software inside without affecting your host system.
We’ll provide a custom pre-built container image for this training that contains everything you’ll need.
The container image is described by a single Containerfile in the vega-quickstart repository. You will never need to read through or edit it by hand (but feel free to take a look to learn more).
The Visual Studio Code Dev Containers extension handles everything: when you open the vega-quickstart folder in Visual Studio Code, it pulls the container (the first time only, which takes a few minutes), starts it, makes your project folder visible inside it, and attaches the editor to a shell running in it. From your point of view, you are just editing files and using a terminal as usual; under the hood those actions are happening inside the container.
What is inside the container?
The container is built on Ubuntu 24.04 (a common Linux distribution). On top of that base, it includes three main pieces:
- Custom RISC-V compiler suite, often called a toolchain: the compiler, linker, and related tools that turn C source code into a binary the VEGAboard can execute. We use the prebuilt toolchain from OpenISA, configured for the rv32i instruction set (a minimal 32-bit RISC-V variant, which is what the VEGAboard’s cores implement).
- Renode, a simulator that can virtually emulate a VEGAboard. This lets you run and debug your firmware without any physical hardware attached, which is handy for getting started and for experimenting.
- OpenOCD, the program that communicates with the debug probe on the board. A debug probe is the small circuit, built into the VEGAboard, that lets your computer load firmware onto the chip and step through the running code.
Alongside the above, the image contains some additional utilities (make, git, vim, minicom, etc) and creates a regular, non-administrator user called dev that you will be logged in as when you open a terminal.
How VS Code ties it together
The devcontainer.json file under the .devcontainer directory tells VS Code how to launch the container. Two details are worth knowing about:
- The container is started in privileged mode so that USB devices on your host (importantly, the debug probe on the VEGAboard) are visible inside it
- VS Code automatically installs a small set of extensions inside the container for you: C/C++ tooling, Makefile support, CMake highlighting, GitLens, a spell checker, and XML/YAML helpers. You do not need to install any of these yourself. These extensions are defined in devcontainer.json, so feel free to add additional extensions that you typically use.
Host requirements
Your host machine only needs four things:
- A container runtime - this is the program that actually runs containers
- Docker Desktop is the easiest choice on macOS and Windows
- On Linux you can use Docker Engine or Podman
- Visual Studio Code
- The Dev Containers extension for VS Code
- Git - the standard version control tool which you’ll use to pull down the quickstart template (see the next section)
Quickstart template
Once you’ve installed the above, clone the vega-quickstart repository to your machine:
git clone https://github.com/between-layers/vega-quickstart.git
Open the cloned repository folder in VS Code, and accept the prompt to “Reopen in Container”. After a few minutes, VS Code should drop you into a terminal inside the container. From there you can edit code, run make to build firmware, run Renode to simulate the board, or (if your host is set up for it) connect to the real VEGAboard over USB. When you are done, closing VS Code shuts the container down; any code changes stay on your host.
The rest of this page covers the host-specific setup that the container cannot handle on its own. Most of it is about giving the container permission to see the VEGAboard’s USB connection.
Additional host-specific help
Linux
Install Docker Engine (or Podman) and VS Code using your distribution’s package manager or the upstream instructions. It may be worth adding your user to the docker group so that VS Code can talk to the container runtime without asking for a password every time (see the official Docker post-installation steps for more info).
Finally, USB permissions will need one small tweak. If connecting to the physical VEGAboard, you’ll likely be using the Segger J-Link debug probe provided in the box.
This device identifies itself to your computer with USB vendor ID 1366. By default Linux only lets the root user open such devices, which is a problem because the container runs as the unprivileged dev user. The fix is a udev rule: a one-line configuration that tells the udev device manager to make the device readable and writable by everyone on the machine.
To create the udev rule, run the following commands:
sudo tee /etc/udev/rules.d/99-jlink.rules <<'EOF'
SUBSYSTEM=="usb", ATTR{idVendor}=="1366", MODE="0666"
EOF
sudo udevadm control --reload-rules
sudo udevadm trigger
Unplug and replug the VEGAboard after applying the rule. You should then be able to flash and debug from inside the container.
Note
USB device visibility inside containers can sometimes be finicky. If you run into issues connecting to your board:
- Double check the board is plugged in properly
- See if your host recognizes the device by running lsusb or ls /dev/tty*
- If the device is there on the host, but not in the container, try restarting the container with docker restart <container-name>
Tip
If you are using Podman instead of Docker, make sure its socket is enabled (systemctl --user enable --now podman.socket) and point the Dev Containers extension at it via the dev.containers.dockerPath setting.
macOS
Install Docker Desktop for Mac and VS Code. Grant Docker Desktop full disk access if it prompts you, otherwise it cannot share your project folder with the container.
There is one thing that the container cannot do on macOS: talk to the VEGAboard over USB. Docker Desktop’s internal VM does not expose your Mac’s USB ports, so the debug probe is invisible from inside the container. You can still do all editing, building, and simulation inside the container, exactly like on Linux and Windows. However, if you have a physical board and want to flash it, you will need to install the OpenISA SDK natively on macOS and run OpenOCD from there.
ARM-based
One additional exception for macOS applies if you’re using an Apple Silicon (M1 and newer) machine. Both the container image and the OpenISA macOS bundle are Intel (x86_64). Luckily, macOS can run them transparently through Apple’s Rosetta 2 translation layer.
Install it once with:
softwareupdate --install-rosetta --agree-to-license
Then open Docker Desktop’s settings and enable “Use Rosetta for x86/amd64 emulation on Apple Silicon” under General.
Note
Builds and simulations may be noticeably slower than on an Intel Mac or a native Linux machine due to emulation, but everything should still work.
Installing the OpenISA SDK (for flashing real hardware)
OpenISA publishes a macOS bundle that contains their prebuilt toolchain and a working OpenOCD. You only need this on macOS, and only for flashing.
- From the open-isa.org v1.0.0 release, download these two files into a working directory of your choice (for example ~/rv32m1/): Toolchain_Mac.tar.gz and rv32m1_sdk_riscv_installer.sh
- Run the SDK installer from that same directory. It will unpack both the SDK and the toolchain/OpenOCD side by side:
cd ~/rv32m1
chmod +x rv32m1_sdk_riscv_installer.sh
./rv32m1_sdk_riscv_installer.sh
- Add OpenOCD to your shell’s PATH so you can invoke it from any terminal. Adjust the path if the installer puts it somewhere different on your system:
echo 'export PATH="$HOME/rv32m1/Toolchain_Mac/riscv32-unknown-elf-gcc/openocd/bin:$PATH"' >> ~/.zshrc
source ~/.zshrc
openocd --version
When you are ready to flash, open a native macOS terminal (not the one inside the container) in the same project folder and run the project’s flash target, or invoke openocd directly with the project’s .cfg. The container and the host share the folder, so the firmware you just built inside the container is already visible from your Mac terminal.
Windows
Install Docker Desktop for Windows with the WSL2 backend (this is the default in recent versions). WSL2, short for Windows Subsystem for Linux 2, is a lightweight Linux VM that Windows ships with; Docker Desktop uses it to run containers. You will also want VS Code and the Dev Containers extension installed on the Windows side. The extension launches the container inside the WSL2 VM and then connects to it.
For USB access to the VEGAboard, install usbipd-win, a small open-source tool that forwards USB devices from Windows into a WSL2 environment. After installing it, open an elevated PowerShell window and run:
usbipd list
usbipd bind --busid <BUSID>
usbipd attach --wsl --busid <BUSID>
Replace <BUSID> with the identifier shown for the J-Link device in the output of usbipd list (look for vendor ID 1366). You will need to re-run the attach command each time you unplug the board or reboot. Once attached, running lsusb inside the container should list the probe.
Firmware development
This section introduces the primary focus of this training: firmware development on RISC-V based microcontrollers.
You’ll learn:
- The main concepts relating to firmware development in general
- How to write your first firmware program to blink an LED on the VEGAboard
- How to virtually simulate the VEGAboard and run your programs in simulation (directly on your laptop!)
- How to flash and run your programs on a real physical VEGAboard
Wait, what is firmware?
Firmware is the software that runs directly on a microcontroller.
Note
Wait, what is a microcontroller?
A microcontroller unit (MCU) is a whole tiny computer packed onto a single chip: a CPU, a small amount of memory (both flash for code and RAM for data), and a fixed set of peripherals, all sharing the same piece of silicon. The VEGAboard’s main chip, the RV32M1, is one example.
This is different from the microprocessor in your laptop, which only handles the CPU part and relies on separate chips for RAM, storage, and I/O. Because an MCU has everything on-board (and typically has it in much smaller proportions), it can be small, cheap, and low-power enough to live inside a thermostat, a car’s door lock, a pair of headphones, etc - all of which run one (or a handful of) small dedicated fixed programs (i.e., the firmware).
Firmware is the code that lives in the chip’s flash memory, starts running the instant power is applied, and continues executing until power is removed (or the system crashes ;D). Unlike a desktop application, it sometimes has no operating system underneath it at all. It is simply a program that talks to hardware.
Because the hardware is so much smaller than a laptop (often a few hundred kilobytes of flash, tens of kilobytes of RAM, and a single CPU running in the tens of megahertz), firmware is written with those constraints in mind. At times, every byte of memory needs to be minimized, every clock cycle accounted for, and the program has to handle everything itself: setting up the chip after reset, reacting to signals from the outside world, and keeping track of time.
The sections below walk through the three ideas that set firmware apart from “regular” software: how a program starts when there is no operating system to launch it, how a single CPU juggles many things at once, and how the code actually interacts with the physical world around it.
How execution starts
On a desktop, your operating system loads your program into memory, sets up its stack, and calls main(). On a bare-metal microcontroller, there is no operating system to do any of that. The chip has to bring itself up from nothing.
When the VEGAboard powers on (or you press reset), the CPU begins executing from a fixed, known address in flash. Typically, the very first thing it finds there is the vector table: a small array of addresses that tell the CPU where to jump for important events, with the very first entry being the reset handler. The reset handler is just a function, usually written in a mix of assembly and C, and its job is to prepare the chip to run your code.
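To make the idea of a vector table concrete, here is a minimal sketch that runs on a laptop rather than on the chip. It models the table as an array of function pointers, as described above. The event numbers, handler names, and the dispatch function are all hypothetical; a real chip defines its own table layout and entry format, and the hardware (not a C function) performs the jump.

```c
/* Hypothetical event numbers; a real chip defines its own table layout. */
enum { EVT_RESET = 0, EVT_TIMER = 1, EVT_UART = 2, EVT_COUNT = 3 };

typedef void (*handler_t)(void);

/* Records which handler ran last so the sketch can be observed on a laptop. */
static int last_handled = -1;

static void reset_handler(void)   { last_handled = EVT_RESET; }
static void timer_handler(void)   { last_handled = EVT_TIMER; }
static void default_handler(void) { last_handled = EVT_COUNT; }

/* The "vector table": one handler address per event, reset first. */
static const handler_t vector_table[EVT_COUNT] = {
    [EVT_RESET] = reset_handler,
    [EVT_TIMER] = timer_handler,
    [EVT_UART]  = default_handler, /* nothing specific registered yet */
};

/* Stand-in for what the hardware does when an event fires. */
static void dispatch(unsigned event) {
    if (event < EVT_COUNT) {
        vector_table[event]();
    }
}
```

Calling dispatch(EVT_RESET) plays the role of a power-on or reset: the table is consulted and control lands in reset_handler.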
That preparation does a few things in order:
- Set up the stack pointer so the CPU has somewhere to store local variables and return addresses.
- Copy any initialized global variables (the .data section) from flash into RAM, since RAM starts out with undefined contents.
- Zero out uninitialized globals (the .bss section), so variables declared without an initializer start at 0.
- Optionally configure the chip’s clocks, caches, and other essentials.
- Finally, call main().
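The memory-related steps above can be sketched in plain C. This is an illustrative model that runs on a laptop, not the SDK's actual startup code: the arrays flash_data_image, ram_data, and ram_bss are hypothetical stand-ins for regions whose boundaries would normally come from symbols in the linker script, and app_main stands in for main(). Setting the stack pointer and configuring clocks are omitted because they need assembly or chip registers.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; real startup code gets these from the linker script. */
static const uint32_t flash_data_image[3] = { 10u, 20u, 30u };  /* initial values kept in flash */
static uint32_t ram_data[3] = { 0xDEADBEEFu, 0xDEADBEEFu, 0xDEADBEEFu }; /* "undefined" RAM */
static uint32_t ram_bss[4]  = { 0xDEADBEEFu, 0xDEADBEEFu, 0xDEADBEEFu, 0xDEADBEEFu };

/* Copy initialized globals (.data) from flash into RAM. */
static void copy_data(const uint32_t *src, uint32_t *dst, size_t words) {
    for (size_t i = 0; i < words; ++i) {
        dst[i] = src[i];
    }
}

/* Zero uninitialized globals (.bss). */
static void zero_bss(uint32_t *dst, size_t words) {
    for (size_t i = 0; i < words; ++i) {
        dst[i] = 0u;
    }
}

/* Stand-in for the application's main(); it can now rely on its globals. */
static int app_main(void) {
    return (int)(ram_data[0] + ram_data[1] + ram_data[2] + ram_bss[0]);
}

/* Stand-in for the C part of a reset handler. */
static int fake_reset_handler(void) {
    copy_data(flash_data_image, ram_data, 3);
    zero_bss(ram_bss, 4);
    return app_main();
}
```

If either step were skipped, app_main would compute with whatever garbage happened to be in RAM, which is exactly why the reset handler runs before main().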
Only after all of the above does your main() function actually start running. And unlike on a desktop, main() on a microcontroller almost never returns. There is nothing for it to return to. Instead it typically ends with an infinite while (1) loop that does the real work forever (as you gain experience in this area, you’ll learn that busy looping forever is typically a bad idea and that’s where things like deep sleep and time-based scheduling come in).
Note
If you want to see this process in full detail (for a different chip, but with the same ideas), Memfault’s Zero to main() series walks through every step of startup code, from the reset vector to the first line of main.
What is a peripheral?
Doing random computation is great and all, but how can computation on something like the VEGAboard result in sensing or actuation in the real physical world?
A CPU on its own can add numbers and move data around in memory, but it cannot blink an LED, send a byte over a wire, or sample a voltage. Those jobs are handled by peripherals: dedicated hardware blocks that sit next to the CPU inside the microcontroller. Typical peripherals include GPIO (general-purpose I/O pins), UART (serial communication), SPI and I2C (for talking to external chips), timers, and ADCs (analog-to-digital converters).
One thing to note about peripherals is that they run independently of the CPU. Once you configure a UART peripheral and hand it a byte to transmit, it shifts the bits out on its own while the CPU goes off to do something else. In that sense, a microcontroller is really a small CPU surrounded by a dozen tiny, single-purpose coprocessors.
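To picture what “shifts the bits out on its own” means, the sketch below models in software the sequence of line levels a UART transmitter produces for one byte. It assumes the common 8N1 framing (one start bit, eight data bits sent least significant first, one stop bit); the function name and array layout are hypothetical, and on real hardware this work happens inside the peripheral with no CPU involvement.

```c
#include <stdint.h>

#define FRAME_BITS 10u /* 1 start + 8 data + 1 stop, assuming 8N1 framing */

/* Fill `line` with the logic levels a transmitter would drive for one byte. */
static void uart_frame(uint8_t byte, uint8_t line[FRAME_BITS]) {
    line[0] = 0u; /* start bit: the idle-high line is pulled low */
    for (unsigned i = 0; i < 8u; ++i) {
        line[1u + i] = (uint8_t)((byte >> i) & 1u); /* data bits, least significant first */
    }
    line[9] = 1u; /* stop bit: the line returns high */
}
```

For the character 'A' (0x41), the frame is 0, then 1 0 0 0 0 0 1 0, then 1.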
The way the CPU talks to these peripherals is called memory-mapped I/O. Each peripheral has a block of addresses reserved for it in the chip’s address space, and within that block sit a handful of registers, each controlling one aspect of the peripheral. Writing to an address directly changes a peripheral’s behaviour. Reading an address gives you a peripheral’s current state.
For example, the VEGAboard’s LED is connected to pin 24 of GPIO port A. The GPIOA peripheral lives at address 0x48020000 and exposes six 32-bit registers back-to-back in memory:
GPIOA @ 0x48020000
0x48020000 ┌───────────────────────────┐
│ PDOR (RW) │ Output latch: 1 bit per pin.
0x48020004 ├───────────────────────────┤
│ PSOR (WO) │ Write 1 to *set* PDOR bits.
0x48020008 ├───────────────────────────┤
│ PCOR (WO) │ Write 1 to *clear* PDOR bits.
0x4802000C ├───────────────────────────┤
│ PTOR (WO) │ Write 1 to *toggle* PDOR bits.
0x48020010 ├───────────────────────────┤
│ PDIR (RO) │ Reads back the actual pin state.
0x48020014 ├───────────────────────────┤
│ PDDR (RW) │ Direction: 0 = input, 1 = output.
0x48020018 └───────────────────────────┘
("RW" = read+write, "WO" = write-only, "RO" = read-only)
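The set, clear, and toggle registers are easier to understand with a small software model of how the output latch reacts to writes. This is only a sketch of the behaviour described in the map above; gpio_model_t and the write_* functions are hypothetical names, and on the real chip the hardware applies these updates when the CPU stores to the corresponding address.

```c
#include <stdint.h>

/* Hypothetical software model of one GPIO port; only the output latch is stored. */
typedef struct {
    uint32_t pdor; /* output latch, one bit per pin */
} gpio_model_t;

/* What the hardware does to PDOR when the CPU stores `mask` to each register. */
static void write_psor(gpio_model_t *p, uint32_t mask) { p->pdor |= mask;  } /* set    */
static void write_pcor(gpio_model_t *p, uint32_t mask) { p->pdor &= ~mask; } /* clear  */
static void write_ptor(gpio_model_t *p, uint32_t mask) { p->pdor ^= mask;  } /* toggle */
```

Bits written as 0 leave the matching pins untouched, so software can change one pin with a single store instead of reading PDOR, modifying it, and writing it back.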
Within a single register, each of the 32 bits maps to one pin on the port. For PDOR, bit 24 is the one wired to the LED:
PDOR @ 0x48020000
bit 31 bit 24 bit 0
│ │ │
v v v
┌─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┐
│ │ │ │ │ │ │ │L│ │ │ │ │ │ │ │ │ │ │ │ │ │ │ │ │ │ │ │ │ │ │ │ │
└─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┘
L = LED (1 = on, 0 = off)
Setting bit 24 of PDOR turns the LED on; clearing it turns the LED off. In C, that looks roughly like:
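#include <stdint.h>

// PDOR of GPIOA; volatile so every access really reaches the hardware
volatile uint32_t *pdor = (volatile uint32_t *)0x48020000u;
*pdor |= (1u << 24);  // LED on
*pdor &= ~(1u << 24); // LED off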
You will rarely write code quite that raw in practice. Vendor-supplied software development kits (SDKs) wrap these registers in named structs and helper functions so you can write something like GPIO_PinWrite(GPIOA, 24, 1) instead. But underneath those abstractions, every peripheral interaction bottoms out in a load or store to a specific memory address.
How is multitasking done on MCUs?
Firmware often has to blink an LED, read a sensor, respond to a button, and print out data, all “at the same time”. How?
The simplest and most common pattern is a super loop: one big while (1) inside main that checks each task in turn and does a bit of work for each one. It looks something like this:
int main(void) {
    setup_everything();
    while (1) {
        update_led();
        read_sensor_if_ready();
        handle_uart();
    }
}
As long as none of the individual tasks block for too long, each one gets serviced often enough to feel simultaneous. The blinky application you’ll meet in the next section is the most minimal version of this pattern: a single while (1) that toggles a GPIO pin and waits.
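One common way to keep super loop tasks from blocking is to give each task a period and let it check a tick counter on every pass. The sketch below shows that idea under some assumptions: task_t, task_poll, and super_loop_pass are hypothetical names, and the `now` value would come from a hardware timer in real firmware rather than from a loop variable.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical periodic task: due once every `period` ticks. */
typedef struct {
    uint32_t period;
    uint32_t last_run;
    uint32_t run_count;
} task_t;

/* Run the task if it is due. Unsigned subtraction stays correct when the
 * tick counter wraps around. Returns true if the task ran. */
static bool task_poll(task_t *t, uint32_t now) {
    if ((uint32_t)(now - t->last_run) < t->period) {
        return false; /* not due yet: return immediately instead of waiting */
    }
    t->last_run = now;
    t->run_count++;   /* the task's real work would go here */
    return true;
}

/* One pass of the super loop: every task gets a quick look. */
static void super_loop_pass(task_t *led, task_t *sensor, uint32_t now) {
    (void)task_poll(led, now);
    (void)task_poll(sensor, now);
}
```

With an LED task at a 500-tick period and a sensor task at 100 ticks, stepping `now` from 1 to 1000 runs the LED task twice and the sensor task ten times, and neither task ever waits on the other.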
The super loop breaks down when something needs to happen right now, for example, reacting the microsecond a pulse arrives on a pin. For that, microcontrollers provide interrupts: hardware signals that pause whatever the CPU is doing, jump to a small handler function to deal with the event, and then resume the interrupted code. We’ll dedicate a later section to interrupts and timers, but the short version is that well-designed firmware usually combines both: a super loop doing the slow, steady work, and interrupts handling anything that is time-sensitive.
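A typical way to combine the two is for the interrupt handler to do almost nothing except record that the event happened, leaving the slow work to the super loop. The sketch below illustrates that split with hypothetical names (button_irq_handler, handle_button_if_pressed). On a laptop nothing calls the handler automatically, so a test has to call it by hand; on real hardware the CPU would jump to it when the interrupt fires.

```c
#include <stdbool.h>
#include <stdint.h>

/* Shared between the handler and the main loop; volatile so the compiler
 * re-reads it on every check instead of caching it in a register. */
static volatile bool button_pressed = false;
static uint32_t presses_handled = 0u;

/* Hypothetical interrupt handler: keep it short, just record the event. */
static void button_irq_handler(void) {
    button_pressed = true;
}

/* Called from the super loop: does the slower work outside the interrupt. */
static void handle_button_if_pressed(void) {
    if (button_pressed) {
        button_pressed = false;
        presses_handled++; /* the real response to the button would go here */
    }
}
```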
When the super loop stops scaling (many independent tasks, strict timing deadlines, multiple developers working in parallel), the next step up is a real-time operating system, or RTOS. An RTOS lets you write each task as if it had the CPU to itself and takes care of switching between them. The RTOS section of this training covers this briefly by introducing Zephyr.
TLDR
- Firmware is the software that runs directly on a microcontroller (MCU), typically with no general purpose operating system beneath it and tight limits on memory and CPU speed.
- An MCU starts executing from a fixed address in flash. Startup code sets up the stack, initializes memory, and eventually calls main(), which never returns.
- Peripherals are small, independent hardware blocks (GPIO, UART, timers, etc) that the CPU drives by reading and writing specific memory addresses. Every firmware operation eventually boils down to a load or store instruction (defined in the RV32I spec!).
- A single CPU core fakes multitasking through a super loop plus interrupts. When applications become too complex, people typically use an RTOS for better abstraction and task handling.
Blinky!
The classic “hello world” of firmware is getting a single LED to blink. It sounds trivial, but under the hood it touches a surprising number of the ideas from the previous section: pin muxing, clock configuration, memory-mapped peripherals, and the super loop. This section walks through building and understanding the blinky application provided in the quickstart repository.
Note
If you really want to have fun, it is recommended to download the RV32M1 reference manual and look through the related sections of the manual as you read through everything below.
The RV32M1 SDK
Blinking an LED by poking 0x48020000 directly (as we discussed in the previous section) works, but things will quickly get out of hand without better abstraction. As soon as you want a second GPIO, UART, timer, etc, you’re either re-reading the reference manual every session (it’s over 4000 pages!) or copy-pasting definitions across files. This is why chip vendors ship a software development kit (SDK): a collection of headers and drivers that wrap the raw peripheral registers in named structs and helper functions.
For the RV32M1, that SDK is the rv32m1-sdk.
The quickstart repository pulls this SDK in as a git submodule at vega-quickstart/rv32m1-sdk. If you cloned the quickstart without --recurse-submodules, the directory will be empty and every build will fail with “no such file” errors. To populate it, run the following from inside vega-quickstart:
git submodule update --init --recursive
Once populated, the layout underneath rv32m1-sdk/ looks roughly like this:
rv32m1-sdk/
├── devices/RV32M1/
│ ├── RV32M1_ri5cy.h CMSIS-style definitions for every peripheral
│ ├── system_RV32M1_ri5cy.c very early startup (SystemInit)
│ ├── gcc/startup_RV32M1_ri5cy.S reset handler and vector table
│ ├── drivers/ fsl_gpio, fsl_clock, fsl_lpuart, ...
│ └── utilities/ debug console, printf, logging
├── boards/rv32m1_vega/ board-specific pin maps and vendor examples
├── RISCV/ RISC-V specific intrinsics and CSR helpers
└── middleware/ FreeRTOS, USB stack, etc (we ignore this)
Note
Fun fact: The fsl_ prefix on every driver file is a legacy remnant of Freescale Semiconductor, a company NXP acquired in 2015. It stands for “Freescale Software Library” and persists here since NXP originally maintained this SDK.
Peeking inside a driver
Although not totally necessary for you to follow the rest of the training, let’s trace one call from the blinky application down to the bare-metal register write we saw last section. This will help you understand how to read and interact with the SDK source in case you want to develop your own applications in the future.
The application toggles the LED with:
GPIO_TogglePinsOutput(BOARD_LED_GPIO, 1u << BOARD_LED_GPIO_PIN);
BOARD_LED_GPIO is defined in the app’s own board.h as GPIOA, and BOARD_LED_GPIO_PIN is 24. The symbol GPIOA itself is defined deep in devices/RV32M1/RV32M1_ri5cy.h as:
#define GPIOA_BASE (0x48020000u)
#define GPIOA ((GPIO_Type *)GPIOA_BASE)
In English: GPIOA is just a pointer to a GPIO_Type struct laid out at address 0x48020000. The GPIO_Type struct is carefully declared so that its fields land exactly on top of the PDOR, PSOR, PCOR, PTOR, PDIR, and PDDR registers from the memory map (we looked at this in the previous section). Peek into drivers/fsl_gpio.h and the toggle helper is a single-line inline function:
static inline void GPIO_TogglePinsOutput(GPIO_Type *base, uint32_t mask) {
    base->PTOR = mask;
}
So GPIO_TogglePinsOutput(GPIOA, 1u << 24) compiles down to the same kind of store we wrote by hand in the previous section: a single 32-bit write of 0x01000000 to address 0x4802000C. The SDK is not doing anything magical here. It is giving us names for the same bits. The same pattern holds for GPIO_PinInit, GPIO_SetPinsOutput, and GPIO_ClearPinsOutput: each one is a thin wrapper over a single register access, and the compiler inlines and optimizes the function away.
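You can check the address arithmetic yourself without touching hardware. The sketch below declares a struct that mirrors the six registers from the memory map and uses offsetof to confirm where each field lands. gpio_regs_t, ptor_address, and toggle_pins are hypothetical names written for this illustration; the SDK's real GPIO_Type may differ in detail.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the GPIOA register map; not the SDK's GPIO_Type. */
typedef struct {
    volatile uint32_t PDOR; /* offset 0x00 */
    volatile uint32_t PSOR; /* offset 0x04 */
    volatile uint32_t PCOR; /* offset 0x08 */
    volatile uint32_t PTOR; /* offset 0x0C */
    volatile uint32_t PDIR; /* offset 0x10 */
    volatile uint32_t PDDR; /* offset 0x14 */
} gpio_regs_t;

/* Compile-time checks that the fields land on the documented offsets. */
_Static_assert(offsetof(gpio_regs_t, PTOR) == 0x0C, "PTOR offset");
_Static_assert(offsetof(gpio_regs_t, PDDR) == 0x14, "PDDR offset");
_Static_assert(sizeof(gpio_regs_t) == 0x18, "register block size");

#define GPIOA_BASE_ADDR 0x48020000u

/* Address the CPU would store to when toggling, computed without hardware. */
static uint32_t ptor_address(void) {
    return GPIOA_BASE_ADDR + (uint32_t)offsetof(gpio_regs_t, PTOR);
}

/* Same shape as the toggle helper, but usable on a RAM-backed fake port. */
static void toggle_pins(gpio_regs_t *base, uint32_t mask) {
    base->PTOR = mask;
}
```

Pointing toggle_pins at an ordinary gpio_regs_t variable instead of the real base address is a handy trick for unit-testing driver-style code on a laptop.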
Note
As we can see, an SDK is really just a readability layer, not a runtime.
Everything you write eventually comes down to a load or store instruction against a fixed address, and when something misbehaves you can (and should) open the driver source and read what it actually does.
Writing our first program
Our first application will be a small blinky program; you can find it under vega-quickstart/apps/blinky/:
apps/blinky/
├── board.h pin and peripheral definitions
├── board.c pin mux, clock, and UART setup
└── main.c the main application logic and loop
board.c
Three functions are called near the top of main() in main.c. Each of them comes from board.c and is somewhat complex (relative to the rest of main.c).
BOARD_InitPins handles pin muxing. Physical pins on the RV32M1 package can be routed to several different peripherals: the same pin can act as GPIOA24, an analog input, or some set of other alternate functions. Selection is done by setting a handful of bits (the MUX bits) in the PORTA register block.
Note
You can find (a rather complicated) table of “pinouts” (definitions of all pin multiplexing options) in Section 23.3 of the reference manual.
Before we can drive the LED, we have to tell the chip that pin 24 of port A is a GPIO (as opposed to any of the other options). The function also enables the clock to PORTA and PORTC and routes PTC7 and PTC8 to LPUART0’s RX (receive) and TX (transmit) lines, since the debug console needs those pins.
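Selecting a pin function comes down to replacing a small bit field inside a register while leaving the other bits alone. The sketch below shows that field update on a plain integer. The field position (a 3-bit MUX field at bits 10 to 8) and the value 1 for GPIO are assumptions based on related NXP parts, and set_mux is a hypothetical helper; check the reference manual for the real layout before relying on any of these numbers.

```c
#include <stdint.h>

/* Assumed layout: 3-bit MUX field at bits 10..8 of a per-pin control register. */
#define MUX_SHIFT 8u
#define MUX_WIDTH 3u
#define MUX_MASK  (((1u << MUX_WIDTH) - 1u) << MUX_SHIFT)
#define MUX_GPIO  1u /* assumed: function 1 selects GPIO */

/* Return `reg` with only the MUX field replaced by `function`. */
static uint32_t set_mux(uint32_t reg, uint32_t function) {
    return (reg & ~MUX_MASK) | ((function << MUX_SHIFT) & MUX_MASK);
}
```

The clear-then-set pattern matters: simply OR-ing in the new value would leave stale bits behind if the field previously held a different function.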
BOARD_BootClockRUN configures the chip’s clock tree. Out of reset, the CPU is running from the internal Fast Frequency Internal Reference Clock (FIRC) oscillator.
For our simple blinky application, we don’t care that much about what speed the clock is running at. However, in most/all real applications you really must have clocks at a known speed and various peripheral dividers set up correctly; every peripheral, communication protocol, timer, and even the power utilized by the board relies on proper management and knowledge of clocks.
BOARD_InitDebugConsole points LPUART0’s clock source at the FIRC and hands the peripheral to the SDK’s debug console module at 115200 baud. Once this is done, any PRINTF() in the program will go out of pin PTC8 (LPUART0 TX) as serial data.
Note
“Baud” is the rate at which a serial interface transmits or receives, measured in symbols per second. Because UART uses simple binary signaling where each symbol encodes exactly one bit, 115200 baud means 115200 bits per second on the wire. With the common 8N1 framing, each 8-bit character is wrapped in a start bit and a stop bit, so it costs 10 bits on the wire, which works out to 11520 bytes/characters per second. A faster baud rate means faster communication between the two connected points.
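The 11520 bytes per second figure can be reproduced with a little arithmetic. The sketch below assumes 8N1 framing (one start bit, eight data bits, no parity, one stop bit, so 10 bits per byte); the function names are hypothetical helpers, not part of the SDK.

```c
#include <stdint.h>

/* Bits on the wire per byte: 1 start bit + data bits + parity bits + stop bits. */
static uint32_t bits_per_frame(uint32_t data_bits, uint32_t parity_bits, uint32_t stop_bits) {
    return 1u + data_bits + parity_bits + stop_bits;
}

/* Best-case throughput in bytes per second for a given baud rate. */
static uint32_t bytes_per_second(uint32_t baud, uint32_t frame_bits) {
    return baud / frame_bits;
}

/* Microseconds needed to send n bytes back-to-back, rounded up. */
static uint32_t transmit_time_us(uint32_t n, uint32_t baud, uint32_t frame_bits) {
    uint64_t bits = (uint64_t)n * frame_bits;
    return (uint32_t)((bits * 1000000u + baud - 1u) / baud);
}
```

At 115200 baud a single character takes roughly 87 microseconds to send, which is a long time for a CPU and one reason printing from time-critical code is best avoided.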
Almost none of the code in board.c is what you would write from scratch for every project. You can usually just write it once per board and then mostly ignore it, which is exactly what we’ve done here.
Note
The version in apps/blinky/board.c is a trimmed-down adaptation of the vendor example at rv32m1-sdk/boards/rv32m1_vega/driver_examples/gpio/led_output/ri5cy/, rewritten to keep only what blinky actually uses.
main.c
#include "board.h"
#include "fsl_debug_console.h"
#include "fsl_gpio.h"

static void delay(void) {
    volatile uint32_t i;
    for (i = 0; i < 800000; ++i)
        __asm("NOP");
}

int main(void) {
    gpio_pin_config_t led_config = { kGPIO_DigitalOutput, 0 };

    BOARD_InitPins();
    BOARD_BootClockRUN();
    BOARD_InitDebugConsole();

    PRINTF("\r\nRV32M1-VEGA RI5CY baremetal app\r\n");
    GPIO_PinInit(BOARD_LED_GPIO, BOARD_LED_GPIO_PIN, &led_config);
    PRINTF("Starting to blink LED...\r\n");

    while (1) {
        delay();
        GPIO_TogglePinsOutput(BOARD_LED_GPIO, 1u << BOARD_LED_GPIO_PIN);
    }
}
Before entering the loop, main runs three board-level setup calls (imported via board.h), prints a banner over the UART, and configures pin 24 of GPIOA as a digital output with an initial value of 0 (LED off). After that it loops forever, waiting a bit and then toggling the LED.
A few things are worth calling out:
- The delay() function is a busy loop, not a real timer (we’ll learn more about those in a later section)
  - delay() blocks the CPU in a tight for with an inline NOP. The inline NOP ensures the compiler doesn’t optimize the loop away
  - The volatile qualifier on i is for the same reason: without it, an optimizing compiler might notice that nothing depends on i and delete the whole loop
  - As we noted earlier, busy-looping is a bad long-term habit (it wastes power and blocks the CPU from doing anything useful), but for a first program it’s the easiest way to provide delays between our toggles
- The PRINTF macro is not the printf from the C standard library (since we don’t have a standard library for our bare-metal code). It expands to the SDK’s own DbgConsole_Printf, which writes bytes out over LPUART0 one at a time
  - LPUART0 is connected to the J12 USB port on the physical VEGAboard; we’ll learn how to view the serial output in later sections
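One way to see why a busy loop is "not a real timer" is to estimate how long it takes. The sketch below assumes a 48 MHz core clock (the figure quoted for BOARD_BootClockRUN elsewhere in this section) and a purely hypothetical cost of 6 cycles per loop iteration; the real cost depends on the compiler, the optimization flags, and the core, which is exactly the problem. `busy_loop_ms` is an illustrative helper, not part of the SDK.

```c
#include <stdint.h>

/* Estimate how long a busy loop takes, in milliseconds.
   cycles_per_iteration is a guess: it depends on the compiler, flags, and core. */
static uint32_t busy_loop_ms(uint32_t iterations, uint32_t cycles_per_iteration,
                             uint32_t clock_hz)
{
    /* 64-bit intermediate so that iterations * cycles * 1000 cannot overflow. */
    uint64_t total_cycles = (uint64_t)iterations * cycles_per_iteration;
    return (uint32_t)((total_cycles * 1000u) / clock_hz);
}
```

Under those assumptions, `busy_loop_ms(800000, 6, 48000000)` comes out at 100 ms. Change the clock or the compiler flags and the delay changes with them, whereas a hardware timer would not.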
The flow of execution
Now that we’ve seen every piece, we can tie them together into the full path a single blink takes:
1. Reset brings the CPU up running from the Fast Frequency Internal Reference Clock (FIRC). The startup assembly in startup_RV32M1_ri5cy.S then zeroes .bss, copies .data, and calls main.
2. BOARD_InitPins writes to PORTA and PORTC mux registers so pin 24 is GPIO and pins PTC7/PTC8 are LPUART0.
3. BOARD_BootClockRUN sets the system clock to 48 MHz using the FIRC.
4. BOARD_InitDebugConsole prepares LPUART0 so that subsequent PRINTF calls can emit characters.
5. GPIO_PinInit writes 1 << 24 into GPIOA’s PDDR register, marking that pin as an output.
6. The main super loop runs forever: delay() burns a few hundred thousand NOPs worth of cycles, then GPIO_TogglePinsOutput writes 0x01000000 to GPIOA’s PTOR register, which flips bit 24 of PDOR in hardware, in turn toggling the LED.
Step 6 is the entirety of our application logic doing “useful work”. Everything else is just (largely generic) initial setup.
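The register effects described in the last two steps can be modeled on a host machine. This is a simplified sketch rather than SDK code: `gpio_model_t`, `model_pin_init_output`, and `model_write_ptor` are hypothetical names, and the model only captures the behavior described above (a PDDR bit marks an output, and a PTOR write flips the matching PDOR bits).

```c
#include <stdint.h>

/* Host-side model of the GPIO state touched by a blink (not SDK code). */
typedef struct {
    uint32_t pdor; /* data output latch */
    uint32_t pddr; /* data direction: 1 = output */
} gpio_model_t;

#define LED_PIN 24u

/* Model of GPIO_PinInit for an output pin: set the direction bit. */
static void model_pin_init_output(gpio_model_t *gpio, uint32_t pin)
{
    gpio->pddr |= (1u << pin);
}

/* Model of a PTOR write: every 1 bit in mask flips the matching PDOR bit. */
static void model_write_ptor(gpio_model_t *gpio, uint32_t mask)
{
    gpio->pdor ^= mask;
}
```

Calling `model_write_ptor(&gpio, 1u << LED_PIN)` twice returns PDOR to its starting value, which is why the firmware never needs to read the pin state before toggling it.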
Compiling our program
Turning main.c into something that can run on the VEGAboard is a multi-stage process:
- Compile each .c into an object file with a cross-compiler
- Assemble the startup code, link everything against a linker script that knows the chip’s memory layout
- Finally, convert the ELF output into a raw binary for flashing
Building manually
If you wanted to build everything entirely by hand (we don’t recommend it), the invocation would look something like this (shortened for readability):
# From vega-quickstart/
SDK=rv32m1-sdk
DEV=$SDK/devices/RV32M1
BOARD=$SDK/boards/rv32m1_vega

riscv32-unknown-elf-gcc -march=rv32imc -O0 -g -ffreestanding -fno-builtin \
    -DCPU_RV32M1_ri5cy -D__STARTUP_CLEAR_BSS \
    -I apps/blinky -I $DEV -I $DEV/drivers -I $DEV/utilities \
    -I $SDK/RISCV -I $SDK/devices \
    -c apps/blinky/main.c -o main.o

# ... repeat for board.c, system_RV32M1_ri5cy.c, fsl_gpio.c, fsl_clock.c,
# fsl_msmc.c, fsl_lpuart.c, fsl_common.c, fsl_debug_console.c, etc.

riscv32-unknown-elf-gcc -march=rv32imc \
    -c $DEV/gcc/startup_RV32M1_ri5cy.S -o startup.o

riscv32-unknown-elf-gcc -march=rv32imc \
    -T $BOARD/driver_examples/gpio/led_output/ri5cy/riscvgcc/RV32M1_ri5cy_flash.ld \
    -ffreestanding -nostdlib -Xlinker --gc-sections \
    -Xlinker -z -Xlinker muldefs \
    -o blinky.elf main.o board.o startup.o ... \
    -Wl,--start-group -lm -lc -lgcc -lnosys -Wl,--end-group

riscv32-unknown-elf-objcopy -O binary blinky.elf blinky.bin
Note:
- riscv32-unknown-elf-gcc is a cross-compiler: it runs on your laptop but emits RISC-V instructions.
- The -march=rv32imc flag tells the compiler which subset of the RISC-V ISA to target: 32-bit base integer (i), multiply/divide (m), and compressed 16-bit encodings (c), which matches what the RI5CY core on the VEGA implements.
- The -ffreestanding -fno-builtin -nostdlib flags tell GCC that no hosted C runtime exists. There is no operating system to provide memory allocators, a standard library, etc. The compiler must not assume that calling printf can reach stdout, and the linker must not pull in startup code from libc.
- The linker script (RV32M1_ri5cy_flash.ld) tells the linker where flash and RAM are located (in terms of memory addresses), which section goes where, and where the vector table has to be placed for the CPU to find it at reset.
  - Errors or incorrect addresses in the linker script may lead to immediate hard-faults the instant the board starts. Such errors can be very hard to debug, so try to use vendor-provided linker scripts whenever possible.
- The final objcopy call strips the ELF formatting off the compiled code. The resulting .bin is a flat dump of what the flash contents should look like.
Using the Makefile
Doing all of the above is not fun, especially since you need to do it every time you make changes and need to recompile. Thankfully, you don’t have to; the quickstart repository’s top-level Makefile wraps all of the above into a single command:
make blinky
The build output lands in build/blinky/:
build/blinky/
├── blinky.elf full ELF with debug info
├── blinky.bin flat binary
├── blinky.hex Intel HEX
├── main.o
├── board.o
└── ... (all the other .o and .d files)
Note
At the end of a successful build the Makefile also runs riscv32-unknown-elf-size on the ELF, printing the text, data, and bss sizes. Those numbers are useful as a rough sanity check. For example, our bare blinky app should be a few kilobytes of text at most. If it suddenly balloons in size, something you did not intend has probably snuck in.
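To turn those three numbers into flash and RAM footprints, recall that the startup code copies .data from flash into RAM and zeroes .bss. A minimal sketch of the arithmetic follows; it ignores stack and heap, which the linker script reserves separately, and `section_sizes_t`, `flash_bytes`, and `ram_bytes` are illustrative names.

```c
#include <stdint.h>

/* Section sizes as reported by a `size`-style tool, in bytes. */
typedef struct {
    uint32_t text;
    uint32_t data;
    uint32_t bss;
} section_sizes_t;

/* Flash holds code/constants plus the initial values of .data. */
static uint32_t flash_bytes(section_sizes_t s) { return s.text + s.data; }

/* RAM holds .data (copied at startup) plus zero-initialized .bss. */
static uint32_t ram_bytes(section_sizes_t s) { return s.data + s.bss; }
```

With made-up sizes of 4096 bytes of text, 64 of data, and 512 of bss, the image would occupy 4160 bytes of flash and 576 bytes of RAM.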
The Makefile is organized so that adding a new application is just a matter of dropping a new directory under apps/ with one or more .c files inside. Running make <appname> builds it, make flash-<appname> builds and flashes it (covered in the real-hardware section next), and make sim-<appname> builds and runs it inside Renode (covered in the simulating hardware section later). If a given application needs different compiler flags or extra SDK drivers, you can add an apps/<name>/config.mk file to override the relevant variables without touching the top-level build rules.
From here on out, no need to run compilation commands by hand, just use make! However, now when you make <app> and it prints a wall of compile lines and a final size breakdown, you know exactly what each of those lines is doing and why - awesome!
TLDR
- The RV32M1 SDK ships as a git submodule at vega-quickstart/rv32m1-sdk. Populate it with git submodule update --init --recursive before building.
- SDK helpers like GPIO_TogglePinsOutput are thin inlined wrappers over a single register store. They give the raw peripheral bits readable names without adding runtime cost.
- The blinky app is one big super loop: initialize pins, clocks, and the UART; then forever delay and toggle GPIOA pin 24. All three init calls (BOARD_InitPins, BOARD_BootClockRUN, BOARD_InitDebugConsole) are board scaffolding you write once and ignore thereafter.
- Building for the VEGAboard uses the riscv32-unknown-elf- cross-toolchain with -march=rv32imc and a vendor-supplied linker script, producing an .elf and .bin. The top-level Makefile wraps all of this behind make <app>.
Connecting to real hardware
With blinky built (recalling make blinky produces a blinky.elf and blinky.bin under build/blinky/), let’s try to actually get that binary onto the VEGAboard and watch it run. This section walks through the three things you’ll do every time you iterate on firmware: connect the board, flash your built binary, and, optionally, attach a debugger.
Connecting
There are 2 physical connections and a button that are important to us for running code on the VEGA. All of them are highlighted in the left image below; the right image shows what a connected setup looks like.
The OpenSDA USB port (J12) is a standard Micro-USB. Plugged into your laptop, it provides power to the board and also exposes LPUART0 (the pins we wired up in BOARD_InitDebugConsole) as a virtual serial device. On Linux (and inside our container) it usually enumerates as /dev/ttyACM0. This is where you’ll see output from PRINTF() statements going to your terminal.
The RISC-V JTAG header (J55) is a small ribbon connector. The RV32M1 contains several CPU cores (the RI5CY and ZERO-RISCY RISC-V cores, plus two Arm Cortex-M cores) and this header is wired specifically to the JTAG chain of the RI5CY RISC-V core we care about. You can connect a SEGGER J-Link debugger here. Mind the ribbon cable orientation: the red stripe marks pin 1, and the right image above shows the correct direction. JTAG is the protocol every tool in the rest of this section uses to halt the CPU, program flash, set breakpoints, and single-step.
Note
Want to learn more about JTAG? See the Diving into JTAG series by Memfault.
The reset button (SW1) is a momentary push that pulls the chip’s reset line low. Pressing and releasing it re-runs the startup assembly from startup_RV32M1_ri5cy.S and re-enters main. Handy when a freshly-flashed image misbehaves, or when you just want a clean starting state without yanking the USB cable.
Flashing
“Flashing” means copying our built code into the non-volatile flash memory inside the RV32M1 so it persists across resets and power cycles. The binary data travels out through the J-Link over JTAG. To drive this, we use the Open On-Chip Debugger (OpenOCD), which translates higher-level “write this word to that address” operations into JTAG transactions and talks to the chip’s on-chip flash controller.
By hand
Flashing by hand looks like this:
openocd -f support/openocd/openocd_rv32m1_vega_ri5cy.cfg \
-c "init" \
-c "halt" \
-c "ri5cy_boot" \
-c "flash write_image erase build/blinky/blinky.elf" \
-c "reset run" \
-c "exit"
Each -c passes one TCL command to openocd after it finishes loading the config file:
- init opens the JTAG connection and initializes openocd’s internal state.
- halt stops the CPU. We can’t reprogram the flash bank a program is currently executing from.
- ri5cy_boot is a vendor-supplied TCL routine defined in the config file. It selects the RI5CY core as the active target, since the chip exposes multiple cores on the same JTAG chain.
- flash write_image erase ... erases the sectors that will be written, then programs the ELF’s loadable segments into flash.
- reset run pulses reset and lets the CPU begin executing the newly-flashed image.
- exit disconnects and shuts openocd down cleanly.
Note
If you are curious what ri5cy_boot actually does, see support/openocd/openocd_rv32m1_vega_ri5cy.cfg. It essentially just pokes the chip’s control registers to bring the RI5CY core out of reset.
Using the Makefile
As always, the top-level Makefile provides a wrapper for easier use:
make flash-blinky
This first rebuilds the app if any source has changed, then hands the resulting ELF to openocd with the same command shown above. The command generalizes: make flash-<app> works for any app under apps/.
Debugging
Flashing gets code onto the board, but the moment something doesn’t work (wrong LED, hung in a loop, unexpected fault) you may need to inspect what the CPU is actually doing to get to the bottom of it. The same J-Link that programs flash can also act as a live window into the CPU’s state: we can halt the CPU at any address, read and write memory, inspect registers, and single-step through instructions.
The setup has two parts:
- openocd runs as a GDB server, translating GDB’s remote serial protocol into JTAG operations. By convention, it listens on TCP port 3333.
- riscv32-unknown-elf-gdb runs as the client on your laptop. It loads the ELF’s debug information (source line numbers, variable names, struct layouts) and connects to the openocd GDB server port.
Note
GDB stands for the GNU Project Debugger; it’s a fairly standard tool worth learning more about!
Note that we need a RISC-V build of GDB (riscv32-unknown-elf-gdb) since the target is a RISC-V core. This is included in the project’s container image for you.
By hand
In one terminal, start the server:
openocd -f support/openocd/openocd_rv32m1_vega_ri5cy.cfg \
-c "init" -c "halt" -c "ri5cy_boot"
In a second terminal, launch GDB against the ELF and connect:
riscv32-unknown-elf-gdb build/blinky/blinky.elf \
-ex "target remote :3333"
Using the Makefile
As with flashing, the Makefile provides shortcuts:
make gdbserver # terminal 1
make gdb-blinky # terminal 2
gdbserver runs openocd with exactly the three commands above. gdb-<app> launches GDB on the corresponding ELF and auto-connects to :3333.
A quick GDB cheat sheet
Once GDB is attached, here are some commands you’ll find useful:
- monitor reset halt: reset the chip and stop at the first instruction
- break main (or b main): set a breakpoint at main
- info breakpoints: show all breakpoints
- delete <num> (d <num>): delete a breakpoint
- continue (c): run until the next breakpoint or signal
- step (s): advance one source line, descending into function calls
- next (n): advance one source line, stepping over calls
- stepi (si): advance one machine instruction
- print <expr> (p): evaluate a C expression. p *(GPIO_Type *)0x48020000 dumps every field of the GPIOA register block/struct
- info registers: show all CPU registers
- disassemble <function>: show disassembly (machine code instructions) of function
- load: reprogram flash through the debugger
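The print example in the cheat sheet works because the SDK describes each register block as a C struct whose field order mirrors the hardware offsets, so casting the base address to a struct pointer lets GDB label every register. Here is a host-side sketch of that idea. Only PDOR at offset 0x00 and PTOR at 0x0C are confirmed by this text; the remaining fields follow the layout common to similar NXP GPIO blocks and are an assumption. `gpio_regs_t` is a made-up name, not the SDK's `GPIO_Type`.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    volatile uint32_t PDOR; /* 0x00: data output */
    volatile uint32_t PSOR; /* 0x04: set output (assumed) */
    volatile uint32_t PCOR; /* 0x08: clear output (assumed) */
    volatile uint32_t PTOR; /* 0x0C: toggle output */
    volatile uint32_t PDIR; /* 0x10: data input (assumed) */
    volatile uint32_t PDDR; /* 0x14: data direction (offset assumed) */
} gpio_regs_t;

/* Compile-time checks that the struct matches the offsets quoted in the text. */
_Static_assert(offsetof(gpio_regs_t, PDOR) == 0x00, "PDOR offset");
_Static_assert(offsetof(gpio_regs_t, PTOR) == 0x0C, "PTOR offset");
```

The static assertions are a cheap way to catch a struct that has drifted out of step with the reference manual before it ever reaches hardware.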
Warning
Running monitor reset halt will leave OpenOCD and the RI5CY core out of sync. To reset cleanly from inside GDB, use the following sequence:
(gdb) monitor reset halt
(gdb) monitor ri5cy_boot
(gdb) monitor halt
(gdb) load
(gdb) tbreak main
(gdb) c
For blinky specifically, a quick sanity check is to b GPIO_TogglePinsOutput, continue, and confirm the breakpoint fires once per blink.
Viewing serial output
Debugging through GDB is great, but sometimes using print statements over UART is just less hassle. Anything the board writes over LPUART0 comes back on the OpenSDA USB (J12). The Makefile provides a helper for this:
make serial
This opens minicom at 115200 baud (matching the rate BOARD_InitDebugConsole configured) against the first /dev/ttyACM*, /dev/ttyUSB*, or /dev/cu.usbmodem* it finds. After flashing blinky you should see the RV32M1-VEGA RI5CY baremetal app banner and then Starting to blink LED....
TLDR
- You need to know about 2 connections and a button: OpenSDA USB (J12) for power and serial, RISC-V JTAG (J55) for flashing and debugging, and reset (SW1) to restart the chip.
- Flashing uses openocd to write our program into the chip’s flash.
  - You can use make flash-<app> to flash any app.
- Debugging is a two-process setup: openocd as a GDB server on port 3333, and riscv32-unknown-elf-gdb as the client.
  - You can use make gdbserver in one terminal plus make gdb-<app> in another.
- make serial opens the virtual UART at 115200 baud, which is where every PRINTF in your application ends up.
Simulating hardware
Real hardware is great…but sometimes slow to iterate on and not readily accessible. To talk with real hardware, you have to rebuild your firmware, flash it over a debug probe, and then poke at the board (blinking an LED, squinting at a serial terminal, etc) to see whether anything is working. A simulator sidesteps all of this. It runs your firmware on a virtual copy of the board, entirely inside your computer, and gives you direct visibility into what the code is doing. Most importantly, you can also simulate a board you don’t physically own!
This training uses Renode, an open-source simulator from Antmicro that can model full embedded systems (CPU, memory, and peripherals). Renode already ships with a basic platform description for the VEGAboard, so we can run firmware on it out of the box.
Note
It turns out the platform description provided in the official Renode repository is not very complete. The quickstart repository has custom definitions under support/renode. We won’t go into detail about these files, but feel free to read through them if you are interested in learning more about Renode.
Renode basics
Before we run anything, it helps to know three Renode concepts:
- A platform description (.repl file) lists the virtual hardware: what CPU, how much memory, which peripherals live at which addresses. Renode includes a basic one for the VEGAboard at platforms/boards/vegaboard_ri5cy.repl.
  - The quickstart repository for these labs contains a more advanced one under support/renode
- A Renode script (.resc file) is a small recipe that builds a machine from a .repl, loads firmware into it, and wires up things like UART windows. Renode also includes scripts/single-node/vegaboard_ri5cy.resc that does this for our board.
- The monitor is Renode’s interactive prompt. When you start Renode you land in the monitor and type commands such as start, pause, or showAnalyzer to drive the simulation.
You don’t need to write any of this from scratch for the VEGAboard. The quickstart repository does all the setup for you. You’ll primarily just be running a handful of monitor commands to explore the system as it runs.
Running blinky in Renode
Let’s jump right into it! We’ll be simulating the same blinky app we developed in previous sections.
First, make sure blinky has been built. From the repository root, inside the dev container, run:
make blinky
This produces the ELF file Renode will load:
build/blinky/blinky.elf
From the same shell inside the container, launch Renode:
renode
Note
If you see a line like Couldn't start UI - falling back to console mode, that’s fine. It just means Renode didn’t find a graphical display (typical when you’re inside a container over SSH or a remote connection). The monitor prompt works identically either way.
At the (monitor) prompt, point Renode at your ELF file and include the bundled VEGAboard script:
(monitor) $bin=@/workspaces/vega-quickstart/build/blinky/blinky.elf
(monitor) include @/workspaces/vega-quickstart/support/renode/vegaboard_ri5cy.resc
The first line sets a variable named $bin that the platform Renode script reads to decide which binary to load. The second line executes the Renode script, which creates the virtual machine, wires up the CPU and peripherals, loads your firmware into flash, and calls the monitor command showAnalyzer lpuart0 for you so the simulated UART is attached to your terminal from the start. After it finishes you’ll see a new prompt that reflects the name of the machine:
(Vegaboard-RI5CY)
Now start the emulation. start and its one-letter alias s both work:
(Vegaboard-RI5CY) start
You should see the UART output from main.c appear on the terminal:
lpuart0: RV32M1-VEGA RI5CY baremetal app
lpuart0: Starting to blink LED...
Those are the PRINTF() calls in main.c writing to lpuart0, which the bundled script routed to your terminal. If you don’t see them, double-check the ELF path in $bin.
To stop cleanly, pause and quit the monitor:
(Vegaboard-RI5CY) pause
(Vegaboard-RI5CY) quit
Watching the LED
The quickstart’s platform at support/renode/vegaboard_ri5cy_platform.repl properly defines GPIO ports and the logic backing them (via the NXP_GPIO peripheral in support/renode/NXP_GPIO.cs).
Thus, writes land in a real register inside Renode and the LED’s output state is observable from the monitor.
As discussed in earlier sections, the LED is on GPIOA pin 24 (see BOARD_LED_GPIO and BOARD_LED_GPIO_PIN in board.h). GPIOA lives at 0x48020000; the PDOR register (the current output latch) is at offset 0x00.
While the simulation is running, sample it from the monitor:
(Vegaboard-RI5CY) sysbus ReadDoubleWord 0x48020000
Run it a few times. The value alternates between 0x00000000 and 0x01000000 as the firmware toggles pin 24. Bit 24 set is the LED on; clear is off.
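Decoding the sampled value is a one-bit test. The sketch below mirrors the rule stated above (bit 24 set means the LED is on); `led_is_on` and `LED_PIN` are illustrative names rather than SDK or Renode identifiers.

```c
#include <stdbool.h>
#include <stdint.h>

#define LED_PIN 24u

/* Decode a sampled PDOR value: per the text, bit 24 set means the LED is on. */
static bool led_is_on(uint32_t pdor)
{
    return ((pdor >> LED_PIN) & 1u) != 0u;
}
```

Masking out a single bit, instead of comparing against 0x01000000 directly, keeps the check correct even if other GPIOA pins are driven later on.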
For a live view of every access gpioa receives, turn on global peripheral access logging before start (or pause first):
(Vegaboard-RI5CY) sysbus LogAllPeripheralsAccess true
Each iteration of the while (1) loop in main.c then prints a line showing a write of 0x01000000 to offset 0x0C on gpioa (PTOR, the toggle register). The rate at which those writes appear is your blink rate. Pause the simulation, change the iteration count in delay(), rebuild, and watch the log speed up or slow down. The logging is noisy but useful while you’re learning, since every read and write the CPU makes shows up with its address, value, and the address of the instruction that issued it.
A faster workflow
Doing all of the commands above will quickly become tedious.
Luckily for you, we’ve defined a helper in the base Makefile that automates
launching Renode and loading the specified application in a single command.
To launch and simulate the blinky application, simply run:
make sim-blinky
This will take you straight from a shell prompt to a running simulation of blinky.
Going forward, any application we write can be simulated by simply running make sim-<app>!
Challenge
TODO: challenge is to configure the blue and green pins of the on-board RGB LED and update the main loop to toggle all 3 in a sequence (red-wait-green-wait-blue-wait-repeat).