Introduction

RISC-V Embedded Systems Training
VEGA edition
BTA Design Services
Between Layers

Overview

TBD

What will you learn?

The goal of these sessions is to teach you the following:

  • How to set up a modern containerized embedded systems development environment
  • The basics of RISC-V firmware development
  • Simulating hardware using Renode
  • The basics of real-time operating systems (RTOS) and Zephyr

In the end, the hope is that you gain fundamental generalizable knowledge relating to the development of firmware for microcontroller-based systems.

Development board giveaways

In addition to the above, we’ll be giving away up to 15 VEGAboards for free to participants throughout the sessions. Make sure to attend!

OpenISA Vegaboard

Training schedule and agenda

  • Location: TBD
  • Times:
    • Day TBD: time TBD

Contact information

Instructors:

  • Alfredo Herrera | aherrera (at) alean-tec.com
  • Yusef Karim | yusef (at) betweenlayers.io
  • Mathieu Gagnon | TBD (at) TBD

Why RISC-V?

RISC-V is an open, royalty-free Instruction Set Architecture (ISA) originally developed at UC Berkeley in 2010. Unlike ARM or x86, anyone can implement a RISC-V processor without licensing fees or legal agreements, and anyone can read the official specification without signing an NDA.

That openness has turned RISC-V into something much larger than a research or education project. It has crept into virtually every corner of industry. NVIDIA alone ships over a billion RISC-V cores embedded throughout its hardware stack, a number that sounds absurd until you see how pervasively they use it across GPUs, embedded controllers, and security subsystems. If you use a modern SSD, GPU, or cloud service, you may be running RISC-V without even knowing it.

The ecosystem that has grown around the specification is what makes RISC-V more than just another ISA. RISC-V International, the non-profit Swiss foundation that governs the standard, coordinates hundreds of member organizations, from chip vendors and hyperscalers to universities and independent engineers, all of whom have an equal seat at the table. The specification itself evolves in the open on public mailing lists and GitHub repositories, in contrast to the closed processes behind proprietary ISAs.

Around that core specification, a layer of open-source silicon has emerged. The OpenHW Group develops production-quality, verification-heavy RISC-V cores (the CV32E40P and CVA6 families among them) that companies can integrate into real products. lowRISC, a non-profit based in Cambridge, maintains Ibex, a small 32-bit core that has found its way into everything from educational FPGA boards to the security subsystems of large-scale infrastructure. These are not toy designs; they are built, tested, and shipped in production hardware (for example, in Google Chromebooks).

The security research community has also adopted RISC-V as its testbed of choice, precisely because the ISA is hackable in the best sense: you can modify it, extend it, and tape out your ideas. The RISC-V Platform Security Model Specification is a framework for hardware-level security primitives, covering physical memory protection, trusted execution environments, and attestation. At the research frontier, CHERI (Capability Hardware Enhanced RISC Instructions), and specifically CHERIoT (its adaptation for microcontrollers), is implemented as a RISC-V extension. CHERIoT runs on real hardware today, including on the Sonata FPGA board, which is built around the Ibex core. That same Ibex core powers Google’s OpenTitan root-of-trust chip.

So when you learn RISC-V, you are not learning a niche curiosity. The ISA we will be using in this training drives active research, ships in billions of devices, and secures real systems.

At a high level, RISC-V is:

  1. Open: a royalty-free Instruction Set Architecture (ISA) originally developed at UC Berkeley in 2010 and now governed by an open and transparent non-profit foundation (RISC-V International).
  2. Community driven: primarily driven and evolved through open-source specifications and implementations, allowing individuals (like us) and industry to freely contribute and continually advance it.
  3. Modular: the base integer instruction sets, RV32I for 32-bit (what our development board supports) and RV64I for 64-bit, are the minimum every implementation must provide. On top of a base, there are over 100 ratified extensions to pick and choose from when designing real-world hardware.

Note

You may be thinking that writing software for an architecture supporting 100+ extensions would be a comparative nightmare. To avoid such nightmares, RISC-V has developed something called profiles which “are named groupings of standard processor ISA bases plus extensions (each identified as Mandatory or Optional)”.

RISC-V was shaped by decades of lessons from earlier architectures (MIPS, SPARC, Alpha, ARM), and its designers made deliberate choices that directly affect how simple our hardware implementation can be. The RISC-V Reader summarizes this beautifully in its very first chapter by considering many aspects important to ISA design such as cost, simplicity, performance, isolation of architecture from implementation, room for growth, program size, and ease of programming. If you want to learn more about the RISC-V ISA on your own as we progress, it is highly recommended to pick up a copy of the RISC-V Reader!

All in all, RISC-V is here to stay: it is open, and it is growing fast. We hope this motivates you to learn and continue this training with us!

Development environment

Overview

Embedded development has a reputation for being fiddly to set up. You typically need a specific compiler (one that produces code for your microcontroller rather than your laptop), a program that talks to the debug probe on the board, a handful of supporting libraries, and sometimes a simulator. Getting all of the above installed correctly can sometimes become a nightmare! To avoid this, this training will use a containerized development environment.

Note

A container is a lightweight, isolated Linux environment that runs on top of your own operating system (your “host”). Similar to virtual machines, you can think of it as an isolated box that you can use to run and install custom software inside without affecting your host system.

We’ll provide a custom pre-built container image for this training that contains everything you’ll need.

The container image is described by a single Containerfile in the vega-quickstart repository. You will never need to read through or edit it by hand (but feel free to take a look to learn more). The Visual Studio Code Dev Containers extension handles everything: when you open the vega-quickstart folder in Visual Studio Code, it pulls the container (the first time only, which takes a few minutes), starts it, makes your project folder visible inside it, and attaches the editor to a shell running in it. From your point of view, you are just editing files and using a terminal as usual; under the hood those actions are happening inside the container.

What is inside the container?

The container is built on Ubuntu 24.04 (a common Linux distribution). On top of that base, it includes three main pieces:

  • Custom RISC-V compiler suite, often called a toolchain: the compiler, linker, and related tools that turn C source code into a binary the VEGAboard can execute. We use the prebuilt toolchain from OpenISA, configured for the rv32i instruction set (a minimal 32-bit RISC-V variant, which is what the VEGAboard’s cores implement).
  • Renode, a simulator that can virtually emulate a VEGAboard. This lets you run and debug your firmware without any physical hardware attached, which is handy for getting started and for experimenting.
  • OpenOCD, the program that communicates with the debug probe on the board. A debug probe is the small circuit, built into the VEGAboard, that lets your computer load firmware onto the chip and step through the running code.

Alongside the above, the image contains some additional utilities (make, git, vim, minicom, etc) and creates a regular, non-administrator user called dev that you will be logged in as when you open a terminal.

How VS Code ties it together

The devcontainer.json file under the .devcontainer directory tells VS Code how to launch the container. Two details are worth knowing about:

  • The container is started in privileged mode so that USB devices on your host (importantly, the debug probe on the VEGAboard) are visible inside it
  • VS Code automatically installs a small set of extensions inside the container for you: C/C++ tooling, Makefile support, CMake highlighting, GitLens, a spell checker, and XML/YAML helpers. You do not need to install any of these yourself.
    • These extensions are also defined in devcontainer.json; feel free to add any additional extensions you typically use

Host requirements

Your host machine only needs four things:

  • A container runtime - this is the program that actually runs containers
  • Visual Studio Code
  • The Dev Containers extension for VS Code
  • Git - the standard version control tool which you’ll use to pull down the quickstart template (see the next section)

Quickstart template

Once you’ve installed the above, clone the vega-quickstart repository to your machine:

git clone https://github.com/between-layers/vega-quickstart.git

Open the cloned repository folder in VS Code, and accept the prompt to “Reopen in Container”. After a few minutes, VS Code should drop you into a terminal inside the container. From there you can edit code, run make to build firmware, run Renode to simulate the board, or (if your host is set up for it) connect to the real VEGAboard over USB. When you are done, closing VS Code shuts the container down; any code changes stay on your host.

The rest of this page covers the host-specific setup that the container cannot handle on its own. Most of it is about giving the container permission to see the VEGAboard’s USB connection.

Additional host-specific help

Linux

Install Docker Engine (or Podman) and VS Code using your distribution’s package manager or the upstream instructions. It may be worth adding your user to the docker group so that VS Code can talk to the container runtime without asking for a password every time (see the official Docker post-installation steps for more info).

Finally, USB permissions will need one small tweak. If you are connecting to the physical VEGAboard, you'll likely be using the Segger J-Link debug probe provided in the box. This device identifies itself to your computer with USB vendor ID 1366. By default Linux only lets the root user open such devices, which is a problem because the container runs as the unprivileged dev user. The fix is a udev rule: a one-line configuration that tells udev (the Linux device manager) to make the device readable and writable by everyone on the machine.

To create the udev rule, run the following commands:

sudo tee /etc/udev/rules.d/99-jlink.rules <<'EOF'
SUBSYSTEM=="usb", ATTR{idVendor}=="1366", MODE="0666"
EOF

sudo udevadm control --reload-rules
sudo udevadm trigger

Unplug and replug the VEGAboard after applying the rule. You should then be able to flash and debug from inside the container.

Note

USB device visibility inside containers can sometimes be finicky. If you run into issues connecting to your board:

  • Double check the board is plugged in properly
  • See if your host recognizes the device by running lsusb or ls /dev/tty*
  • If the device is there on the host, but not in the container - try restarting the container with docker restart <container-name>

Tip

If you are using Podman instead of Docker, make sure its socket is enabled (systemctl --user enable --now podman.socket) and point the Dev Containers extension at it via the dev.containers.dockerPath setting.

macOS

Install Docker Desktop for Mac and VS Code. Grant Docker Desktop full disk access if it prompts you; otherwise it cannot share your project folder with the container.

There is one thing that the container cannot do on macOS: talk to the VEGAboard over USB. Docker Desktop’s internal VM does not expose your Mac’s USB ports, so the debug probe is invisible from inside the container. You can still do all editing, building, and simulation inside the container, exactly like on Linux and Windows. However, if you have a physical board and want to flash it, you will install the OpenISA SDK natively on macOS and run OpenOCD from there.

ARM-based

One additional caveat applies if you're using a newer Apple Silicon Mac (M1 or newer): both the container image and the OpenISA macOS bundle are built for Intel (x86_64). Luckily, macOS can run them transparently through Apple's Rosetta 2 translation layer.

Install it once with:

softwareupdate --install-rosetta --agree-to-license

Then open Docker Desktop’s settings and enable “Use Rosetta for x86/amd64 emulation on Apple Silicon” under General.

Note

Builds and simulations may be noticeably slower than on an Intel Mac or a native Linux machine due to emulation, but everything should still work.

Installing the OpenISA SDK (for flashing real hardware)

OpenISA publishes a macOS bundle that contains their prebuilt toolchain and a working OpenOCD. You only need this on macOS, and only for flashing.

  1. From the open-isa.org v1.0.0 release, download these two files into a working directory of your choice (for example ~/rv32m1/):

    • Toolchain_Mac.tar.gz
    • rv32m1_sdk_riscv_installer.sh
  2. Run the SDK installer from that same directory. It will unpack both the SDK and the toolchain/OpenOCD side by side:

    cd ~/rv32m1
    chmod +x rv32m1_sdk_riscv_installer.sh
    ./rv32m1_sdk_riscv_installer.sh
    
  3. Add OpenOCD to your shell’s PATH so you can invoke it from any terminal. Adjust the path if the installer puts it somewhere different on your system:

    echo 'export PATH="$HOME/rv32m1/Toolchain_Mac/riscv32-unknown-elf-gcc/openocd/bin:$PATH"' >> ~/.zshrc
    source ~/.zshrc
    openocd --version
    

When you are ready to flash, open a native macOS terminal (not the one inside the container) in the same project folder and run the project’s flash target, or invoke openocd directly with the project’s .cfg. The container and the host share the folder, so the firmware you just built inside the container is already visible from your Mac terminal.

Windows

Install Docker Desktop for Windows with the WSL2 backend (this is the default in recent versions). WSL2, short for Windows Subsystem for Linux 2, is a lightweight Linux VM that Windows ships with; Docker Desktop uses it to run containers. You will also want VS Code and the Dev Containers extension installed on the Windows side. The extension launches the container inside the WSL2 VM and then connects to it.

For USB access to the VEGAboard, install usbipd-win, a small open-source tool that forwards USB devices from Windows into a WSL2 environment. After installing it, open an elevated PowerShell window and run:

usbipd list
usbipd bind --busid <BUSID>
usbipd attach --wsl --busid <BUSID>

Replace <BUSID> with the identifier shown for the J-Link device in the output of usbipd list (look for vendor ID 1366). You will need to re-run the attach command each time you unplug the board or reboot. Once attached, running lsusb inside the container should list the probe.

Firmware development

This section introduces the primary focus of this training: firmware development on RISC-V based microcontrollers.

You’ll learn:

  • The main concepts relating to firmware development in general
  • How to write your first firmware program to blink an LED on the VEGAboard
  • How to virtually simulate the VEGAboard and run your programs in simulation (directly on your laptop!)
  • How to flash and run your programs on a real physical VEGAboard

Wait, what is firmware?

Firmware is the software that runs directly on a microcontroller.

Note

Wait, what is a microcontroller?

A microcontroller unit (MCU) is a whole tiny computer packed onto a single chip: a CPU, a small amount of memory (both flash for code and RAM for data), and a fixed set of peripherals, all sharing the same piece of silicon. The VEGAboard’s main chip, the RV32M1, is one example.

This differs from the microprocessor in your laptop, which only handles the CPU part and relies on separate chips for RAM, storage, and I/O. Because an MCU has everything on-board (and typically in much smaller proportions), it can be small, cheap, and low-power enough to live inside a thermostat, a car's door lock, a pair of headphones, etc., all of which run one (or a handful of) small, dedicated, fixed programs (i.e., the firmware).

Firmware is the code that lives in the chip’s flash memory, starts running the instant power is applied, and continues executing until power is removed (or the system crashes ;D). Unlike a desktop application, it sometimes has no operating system underneath it at all. It is simply a program that talks to hardware.

Because the hardware is so much smaller than a laptop (often a few hundred kilobytes of flash, tens of kilobytes of RAM, and a single CPU running in the tens of megahertz), firmware is written with those constraints in mind. At times, every byte of memory needs to be minimized, every clock cycle accounted for, and the program has to handle everything itself: setting up the chip after reset, reacting to signals from the outside world, and keeping track of time.

The sections below walk through the three ideas that set firmware apart from “regular” software: how a program starts when there is no operating system to launch it, how a single CPU juggles many things at once, and how the code actually interacts with the physical world around it.

How execution starts

On a desktop, your operating system loads your program into memory, sets up its stack, and calls main(). On a bare-metal microcontroller, there is no operating system to do any of that. The chip has to bring itself up from nothing.

When the VEGAboard powers on (or you press reset), the CPU begins executing from a fixed, known address in flash. Typically, the very first thing it finds there is the vector table: a small table that tells the CPU where to jump for important events, with one entry dedicated to reset that leads to the reset handler. (On Arm Cortex-M parts the entries are addresses; on many RISC-V cores they are jump instructions, but the idea is the same.) The reset handler is just a function, usually written in a mix of assembly and C, and its job is to prepare the chip to run your code.

That preparation does a few things in order:

  1. Set up the stack pointer so the CPU has somewhere to store local variables and return addresses.
  2. Copy any initialized global variables (the .data section) from flash into RAM, since RAM starts out with undefined contents.
  3. Zero out uninitialized globals (the .bss section), so variables declared without an initializer start at 0.
  4. Optionally configure the chip’s clocks, caches, and other essentials.
  5. Finally, call main().
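Steps 2 and 3 are usually just a pair of simple loops. The sketch below shows them in C, assuming the section boundaries are handed in as pointers. On a real chip those pointers come from symbols exported by the linker script (their names vary between toolchains and SDKs, so the ones in the comment are hypothetical), and step 1 has to happen in assembly before any C code like this can run.

```c
#include <stdint.h>

/* Sketch of steps 2 and 3 of a reset handler. On real hardware the
 * arguments would be linker-script symbols (hypothetical names such as
 * __data_load, __data_start, __data_end, __bss_start, __bss_end); they
 * are parameters here so the same logic can also run on a host. */
void init_memory(const uint32_t *data_load,                 /* .data image in flash */
                 uint32_t *data_start, uint32_t *data_end,  /* .data in RAM */
                 uint32_t *bss_start, uint32_t *bss_end)    /* .bss in RAM */
{
    /* Step 2: copy the initial values of globals from flash into RAM. */
    while (data_start < data_end) {
        *data_start++ = *data_load++;
    }

    /* Step 3: zero every word of the .bss section. */
    while (bss_start < bss_end) {
        *bss_start++ = 0u;
    }
}
```

A startup routine would call something like init_memory(...) exactly once, after the assembly code has set the stack pointer and before it calls main().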

Only after all of the above does your main() function actually start running. And unlike on a desktop, main() on a microcontroller almost never returns. There is nothing for it to return to. Instead it typically ends with an infinite while (1) loop that does the real work forever (as you gain experience in this area, you'll learn that busy-looping forever is usually a bad idea, and that's where things like deep sleep and time-based scheduling come in).

Note

If you want to see this process in full detail (for a different chip, but with the same ideas), Memfault’s Zero to main() series walks through every step of startup code, from the reset vector to the first line of main.

What is a peripheral?

Doing random computation is great and all, but how can computation on something like the VEGAboard result in sensing or actuation in the real physical world?

A CPU on its own can add numbers and move data around in memory, but it cannot blink an LED, send a byte over a wire, or sample a voltage. Those jobs are handled by peripherals: dedicated hardware blocks that sit next to the CPU inside the microcontroller. Typical peripherals include GPIO (general-purpose I/O pins), UART (serial communication), SPI and I2C (for talking to external chips), timers, and ADCs (analog-to-digital converters).

One thing to note about peripherals is that they run independently of the CPU. Once you configure a UART peripheral and hand it a byte to transmit, it shifts the bits out on its own while the CPU goes off to do something else. In that sense, a microcontroller is really a small CPU surrounded by a dozen tiny, single-purpose coprocessors.

The way the CPU talks to these peripherals is called memory-mapped I/O. Each peripheral has a block of addresses reserved for it in the chip’s address space, and within that block sit a handful of registers, each controlling one aspect of the peripheral. Writing to an address directly changes a peripheral’s behaviour. Reading an address gives you a peripheral’s current state.
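In C, memory-mapped I/O is usually done through volatile pointers, so the compiler performs every read and write exactly as written instead of caching or removing them. A minimal pair of helpers might look like this (the names mmio_read32 and mmio_write32 are our own, not from any SDK):

```c
#include <stdint.h>

/* Hypothetical helpers for 32-bit memory-mapped register access.
 * 'volatile' forces a real load or store on every call, which matters
 * because reading or writing a register can have side effects. */
static inline uint32_t mmio_read32(uintptr_t addr) {
    return *(volatile uint32_t *)addr;
}

static inline void mmio_write32(uintptr_t addr, uint32_t value) {
    *(volatile uint32_t *)addr = value;
}
```

On the VEGAboard, mmio_write32(0x4802000Cu, 1u << 24) would store to GPIOA's PTOR register from the map below. On a host, you can point the helpers at an ordinary variable to try them out.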

For example, the VEGAboard’s LED is connected to pin 24 of GPIO port A. The GPIOA peripheral lives at address 0x48020000 and exposes six 32-bit registers back-to-back in memory:

GPIOA @ 0x48020000

0x48020000  ┌───────────────────────────┐
            │ PDOR  (RW)                │  Output latch: 1 bit per pin.
0x48020004  ├───────────────────────────┤
            │ PSOR  (WO)                │  Write 1 to *set* PDOR bits.
0x48020008  ├───────────────────────────┤
            │ PCOR  (WO)                │  Write 1 to *clear* PDOR bits.
0x4802000C  ├───────────────────────────┤
            │ PTOR  (WO)                │  Write 1 to *toggle* PDOR bits.
0x48020010  ├───────────────────────────┤
            │ PDIR  (RO)                │  Reads back the actual pin state.
0x48020014  ├───────────────────────────┤
            │ PDDR  (RW)                │  Direction: 0 = input, 1 = output.
0x48020018  └───────────────────────────┘

("RW" = read+write, "WO" = write-only, "RO" = read-only)
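Because the six registers sit back-to-back at 4-byte offsets, C code commonly describes such a block as a struct and overlays it on the base address. A minimal sketch of that idea (gpio_regs_t and MY_GPIOA are illustrative names; the SDK defines its own equivalent, GPIO_Type), with compile-time checks that each field lands on the offset from the map:

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative overlay of the GPIOA register block shown above. */
typedef struct {
    volatile uint32_t PDOR; /* 0x00: output latch (RW)          */
    volatile uint32_t PSOR; /* 0x04: write 1 to set bits (WO)    */
    volatile uint32_t PCOR; /* 0x08: write 1 to clear bits (WO)  */
    volatile uint32_t PTOR; /* 0x0C: write 1 to toggle bits (WO) */
    volatile uint32_t PDIR; /* 0x10: actual pin levels (RO)      */
    volatile uint32_t PDDR; /* 0x14: direction, 1 = output (RW)  */
} gpio_regs_t;

/* Fail the build if any field drifts from its documented offset. */
_Static_assert(offsetof(gpio_regs_t, PSOR) == 0x04, "PSOR offset");
_Static_assert(offsetof(gpio_regs_t, PCOR) == 0x08, "PCOR offset");
_Static_assert(offsetof(gpio_regs_t, PTOR) == 0x0C, "PTOR offset");
_Static_assert(offsetof(gpio_regs_t, PDIR) == 0x10, "PDIR offset");
_Static_assert(offsetof(gpio_regs_t, PDDR) == 0x14, "PDDR offset");
_Static_assert(sizeof(gpio_regs_t) == 0x18, "block size");

/* Only meaningful on the target: the struct placed at GPIOA's address. */
#define MY_GPIOA ((gpio_regs_t *)0x48020000u)
```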

Within a single register, each of the 32 bits maps to one pin on the port. For PDOR, bit 24 is the one wired to the LED:

PDOR @ 0x48020000
   bit 31        bit 24                                          bit 0
    │             │                                               │
    v             v                                               v
   ┌─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┐
   │ │ │ │ │ │ │ │L│ │ │ │ │ │ │ │ │ │ │ │ │ │ │ │ │ │ │ │ │ │ │ │ │
   └─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┘
                      L = LED (1 = on, 0 = off)

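Once pin 24 is configured as an output (its bit in PDDR set to 1), setting bit 24 of PDOR turns the LED on and clearing it turns the LED off. In C, that looks roughly like:

#include <stdint.h>

volatile uint32_t *pdor = (volatile uint32_t *)0x48020000u;
*pdor |= (1u << 24);   // LED on  (read-modify-write of PDOR)
*pdor &= ~(1u << 24);  // LED off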
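Note that updating PDOR with |= or &= is a read-modify-write: the CPU loads PDOR, changes one bit, and stores it back, so if an interrupt handler changes another bit of the same register in between, that change is lost. The PSOR and PCOR registers from the map avoid this, since a single store of a mask affects only the bits written as 1. The sketch below uses them and also sets the pin's direction bit in PDDR first. The port base is passed in as a parameter (our own design choice so it can be tried against a plain array on a host), and pin muxing and clock gating are assumed to be done already.

```c
#include <stdint.h>

/* Word offsets of the GPIO registers, taken from the GPIOA map above. */
enum { PDOR = 0, PSOR = 1, PCOR = 2, PTOR = 3, PDIR = 4, PDDR = 5 };

#define LED_PIN 24u

/* Make the LED pin an output (a read-modify-write, done once at startup). */
void led_init(volatile uint32_t *gpio) {
    gpio[PDDR] |= (1u << LED_PIN);
}

/* Single stores: only the bits written as 1 change, no read needed. */
void led_on(volatile uint32_t *gpio)  { gpio[PSOR] = (1u << LED_PIN); }
void led_off(volatile uint32_t *gpio) { gpio[PCOR] = (1u << LED_PIN); }

/* On the target: volatile uint32_t *gpioa = (volatile uint32_t *)0x48020000u; */
```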

You will rarely write code quite that raw in practice. Vendor-supplied software development kits (SDKs) wrap these registers in named structs and helper functions so you can write something like GPIO_PinWrite(GPIOA, 24, 1) instead. But underneath those abstractions, every peripheral interaction bottoms out in a load or store to a specific memory address.

How is multitasking done on MCUs?

Firmware often needs to blink an LED, read a sensor, respond to a button, and print out data, all “at the same time”. How?

The simplest and most common pattern is a super loop: one big while (1) inside main that checks each task in turn and does a bit of work for each one. It looks something like this:

int main(void) {
    setup_everything();
    while (1) {
        update_led();
        read_sensor_if_ready();
        handle_uart();
    }
}

As long as none of the individual tasks block for too long, each one gets serviced often enough to feel simultaneous. The blinky application you’ll meet in the next section is the most minimal version of this pattern: a single while (1) that toggles a GPIO pin and waits.
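One way to keep every task from blocking is to have each one check whether it is due and return immediately if not. The sketch below is a hypothetical design (task_t, superloop_step, and the counting stand-in tasks are our own names), assuming a millisecond counter is available from a hardware timer. On a board it would run as while (1) { superloop_step(tasks, 2, now_ms); }, with now_ms read from that timer.

```c
#include <stdint.h>

/* A cooperative task: runs every period_ms and must return quickly. */
typedef struct {
    uint32_t period_ms;   /* how often the task should run */
    uint32_t last_run_ms; /* time at which it last ran      */
    void (*run)(void);    /* the work itself; must never block */
} task_t;

/* Hypothetical stand-in tasks that just count how often they run. */
static uint32_t led_updates, sensor_reads;
static void update_led(void)  { ++led_updates; }
static void read_sensor(void) { ++sensor_reads; }

/* One pass of the super loop. The unsigned subtraction stays correct
 * even when the millisecond counter wraps around. */
void superloop_step(task_t *tasks, int count, uint32_t now_ms) {
    for (int i = 0; i < count; ++i) {
        if ((uint32_t)(now_ms - tasks[i].last_run_ms) >= tasks[i].period_ms) {
            tasks[i].last_run_ms = now_ms;
            tasks[i].run();
        }
    }
}
```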

The super loop breaks down when something needs to happen right now, for example, reacting the microsecond a pulse arrives on a pin. For that, microcontrollers provide interrupts: hardware signals that pause whatever the CPU is doing, jump to a small handler function to deal with the event, and then resume the interrupted code. We’ll dedicate a later section to interrupts and timers, but the short version is that well-designed firmware usually combines both: a super loop doing the slow, steady work, and interrupts handling anything that is time-sensitive.
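A common way to combine the two is to let the interrupt handler merely record that something happened and leave the real work to the super loop. A minimal sketch with hypothetical names; how a handler gets attached to a specific interrupt is chip- and SDK-specific and is not shown:

```c
#include <stdbool.h>
#include <stdint.h>

/* Shared between the interrupt handler and the super loop, so it must be
 * volatile: the compiler may not assume it stays unchanged between reads. */
static volatile bool button_event = false;
static uint32_t presses_handled = 0;

/* Hypothetical interrupt handler: do the bare minimum and return. */
void button_isr(void) {
    button_event = true;
}

/* Called from the super loop: the slower work happens here, outside the ISR. */
void handle_button(void) {
    if (button_event) {
        button_event = false;
        ++presses_handled;  /* e.g. debounce, log, switch LED mode */
    }
}
```

Because the flag is a single bool, several presses that arrive before the loop gets to it are merged into one. If every event matters, a counter that the loop reads and resets with interrupts briefly disabled is the usual next step.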

When the super loop stops scaling (many independent tasks, strict timing deadlines, multiple developers working in parallel), the next step up is a real-time operating system, or RTOS. An RTOS lets you write each task as if it had the CPU to itself and takes care of switching between them. The RTOS section of this training covers this briefly by introducing Zephyr.

TLDR

  • Firmware is the software that runs directly on a microcontroller (MCU), typically with no general purpose operating system beneath it and tight limits on memory and CPU speed.
  • An MCU starts executing from a fixed address in flash. Startup code sets up the stack, initializes memory, and eventually calls main(), which typically never returns.
  • Peripherals are small, independent hardware blocks (GPIO, UART, timers, etc.) that the CPU drives by reading and writing specific memory addresses. Every peripheral interaction eventually boils down to a load or store instruction (defined in the RV32I spec!).
  • A single CPU core fakes multitasking through a super loop plus interrupts. When applications become too complex, people typically use an RTOS for better abstraction and task handling.

Blinky!

The classic “hello world” of firmware is getting a single LED to blink. It sounds trivial, but under the hood it touches a surprising number of the ideas from the previous section: pin muxing, clock configuration, memory-mapped peripherals, and the super loop. This section walks through building and understanding the blinky application provided in the quickstart repository.

Note

If you really want to have fun, it is recommended to download the RV32M1 reference manual and look through the related sections of the manual as you read through everything below.

The RV32M1 SDK

Blinking an LED by poking 0x48020000 directly (as we discussed in the previous section) works, but things will quickly get out of hand without better abstraction. As soon as you want a second GPIO, UART, timer, etc, you’re either re-reading the reference manual every session (it’s over 4000 pages!) or copy-pasting definitions across files. This is why chip vendors ship a software development kit (SDK): a collection of headers and drivers that wrap the raw peripheral registers in named structs and helper functions.

For the RV32M1, that SDK is the rv32m1-sdk. The quickstart repository pulls this SDK in as a git submodule at vega-quickstart/rv32m1-sdk. If you cloned the quickstart without --recurse-submodules, the directory will be empty and every build will fail with “no such file” errors. To populate it, run the following from inside vega-quickstart:

git submodule update --init --recursive

Once populated, the layout underneath rv32m1-sdk/ looks roughly like this:

rv32m1-sdk/
├── devices/RV32M1/
│   ├── RV32M1_ri5cy.h               CMSIS-style definitions for every peripheral
│   ├── system_RV32M1_ri5cy.c        very early startup (SystemInit)
│   ├── gcc/startup_RV32M1_ri5cy.S   reset handler and vector table
│   ├── drivers/                     fsl_gpio, fsl_clock, fsl_lpuart, ...
│   └── utilities/                   debug console, printf, logging
├── boards/rv32m1_vega/              board-specific pin maps and vendor examples
├── RISCV/                           RISC-V specific intrinsics and CSR helpers
└── middleware/                      FreeRTOS, USB stack, etc (we ignore this)

Note

Fun fact: the fsl_ prefix on every driver file is a legacy remnant of Freescale Semiconductor, a company NXP acquired in 2015. It stands for “Freescale Software Library” and persists here because NXP originally maintained this SDK.

Peeking inside a driver

Although not strictly necessary for following the rest of the training, let's trace one call from the blinky application down to the bare-metal register write we saw last section. This will help you understand how to read and interact with the SDK source if you want to develop your own applications in the future.

The application toggles the LED with:

GPIO_TogglePinsOutput(BOARD_LED_GPIO, 1u << BOARD_LED_GPIO_PIN);

BOARD_LED_GPIO is defined in the app’s own board.h as GPIOA, and BOARD_LED_GPIO_PIN is 24. The symbol GPIOA itself is defined deep in devices/RV32M1/RV32M1_ri5cy.h as:

#define GPIOA_BASE  (0x48020000u)
#define GPIOA       ((GPIO_Type *)GPIOA_BASE)

In English: GPIOA is just a pointer to a GPIO_Type struct laid out at address 0x48020000. The GPIO_Type struct is carefully declared so that its fields land exactly on top of the PDOR, PSOR, PCOR, PTOR, PDIR, and PDDR registers from the memory map (we looked at this in the previous section). Peek into drivers/fsl_gpio.h and the toggle helper is a single-line inline function:

static inline void GPIO_TogglePinsOutput(GPIO_Type *base, uint32_t mask) {
    base->PTOR = mask;
}

So GPIO_TogglePinsOutput(GPIOA, 1u << 24) compiles down to the same kind of store we wrote by hand in the previous section: a single 32-bit write of 0x01000000 (bit 24 set) to address 0x4802000C (the PTOR register, at offset 0x0C from the GPIOA base). The SDK is not doing anything magical here; it is giving us names for the same bits. The same pattern holds for GPIO_SetPinsOutput and GPIO_ClearPinsOutput: each is a thin inline wrapper over a single register write that the compiler optimizes away. GPIO_PinInit does a little more (it sets the pin's initial output level and its direction bit in PDDR), but it too comes down to a handful of register accesses.
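To make pin initialization less opaque, here is a sketch of the kind of work a pin-init function does: for an output, latch the initial level first, then flip the direction bit. This is our own simplified illustration with hypothetical names (demo_gpio_t, demo_pin_init), not the SDK's code; open the driver source under devices/RV32M1/drivers/ to see what the real function does.

```c
#include <stdint.h>

/* Illustrative GPIO register block (same layout as the GPIOA map). */
typedef struct {
    volatile uint32_t PDOR, PSOR, PCOR, PTOR, PDIR, PDDR;
} demo_gpio_t;

typedef enum { DEMO_INPUT = 0, DEMO_OUTPUT = 1 } demo_dir_t;

/* Hypothetical pin init. For an output, the level is latched before the
 * direction changes so the pin never briefly drives the wrong value. */
void demo_pin_init(demo_gpio_t *base, uint32_t pin, demo_dir_t dir, uint8_t level) {
    uint32_t mask = 1u << pin;
    if (dir == DEMO_INPUT) {
        base->PDDR &= ~mask;       /* 0 = input */
    } else {
        if (level) {
            base->PSOR = mask;     /* start high */
        } else {
            base->PCOR = mask;     /* start low */
        }
        base->PDDR |= mask;        /* 1 = output */
    }
}
```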

Note

As we can see, an SDK is really just a readability layer, not a runtime.

Everything you write eventually comes down to a load or store instruction against a fixed address, and when something misbehaves you can (and should) open the driver source and read what it actually does.

Writing our first program

Our first application is a small blinky program, which you can find under vega-quickstart/apps/blinky/:

apps/blinky/
├── board.h    pin and peripheral definitions
├── board.c    pin mux, clock, and UART setup
└── main.c     the main application logic and loop

board.c

Three functions called near the top of main.c come from board.c, and each is fairly complex relative to the rest of main.c.

BOARD_InitPins handles pin muxing. Each physical pin on the RV32M1 package can be routed to one of several peripherals: the same pin can act as GPIOA24, an analog input, or one of a set of other alternate functions. The selection is made by setting a handful of bits (the MUX bits) in that pin's control register within the PORTA register block.

Note

You can find (a rather complicated) table of “pinouts” (definitions of all pin multiplexing options) in Section 23.3 of the reference manual.

Before we can drive the LED, we have to tell the chip that pin 24 of port A is a GPIO (as opposed to any of the other options). The function also enables the clock to PORTA and PORTC and routes PTC7 and PTC8 to LPUART0's RX (receive) and TX (transmit) lines, since the debug console needs those pins.
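Changing a pin's function is a classic read-modify-write of a bit field: clear the field, then OR in the new value, leaving the register's other bits untouched. The sketch below assumes the MUX field occupies bits 10:8 of the pin's control register and that value 1 selects GPIO, as on other NXP parts with this style of PORT block; both are assumptions to confirm against the RV32M1 reference manual.

```c
#include <stdint.h>

/* ASSUMPTION: MUX field layout as on other NXP PORT blocks; check the
 * RV32M1 reference manual's PORT chapter for the authoritative layout. */
#define PCR_MUX_SHIFT 8u
#define PCR_MUX_MASK  (0x7u << PCR_MUX_SHIFT)
#define PCR_MUX_GPIO  1u   /* ASSUMPTION: alternative 1 = GPIO */

/* Return 'pcr' with only its MUX field replaced by 'alt'. */
uint32_t pcr_set_mux(uint32_t pcr, uint32_t alt) {
    return (pcr & ~PCR_MUX_MASK) | ((alt << PCR_MUX_SHIFT) & PCR_MUX_MASK);
}
```

On the chip, the result would be written back to the pin's control register through a volatile pointer, after the PORTA clock has been enabled.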

BOARD_BootClockRUN configures the chip's clock tree. Out of reset, the CPU runs from the Fast Internal Reference Clock (FIRC), an oscillator built into the chip.

Clock diagram
RV32M1 Reference Manual Figure 28-1

For our simple blinky application, we don't care much about what speed the clock is running at. In nearly every real application, however, clocks must run at known speeds with the various peripheral dividers set correctly: every peripheral, communication protocol, and timer, and even the board's power consumption, depends on proper management and knowledge of clocks.

BOARD_InitDebugConsole points LPUART0’s clock source at the FIRC and hands the peripheral to the SDK’s debug console module at 115200 baud. Once this is done, any PRINTF() in the program will go out of pin PTC8 (LPUART0 TX) as serial data.

Note

“Baud” is the signaling rate of a serial interface, in symbols per second. UART uses simple binary signaling where each symbol encodes exactly one bit, so 115200 baud means 115200 bits per second on the wire. Those bits are not all data. In the common 8N1 format, each 8-bit character is framed by a start bit and a stop bit, so one character costs 10 bits and 115200 baud carries at most 11520 characters per second. A higher baud rate means faster communication between the two connected points.
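Baud rate and clock speed are tied together by the UART’s divider. For NXP’s LPUART the relationship is baud = clock / ((OSR + 1) × SBR), where OSR + 1 is the oversampling ratio and SBR is the baud-rate divisor.

The sketch below makes two assumptions: LPUART0 is fed the full 48 MHz FIRC, and the peripheral uses 16× oversampling. The SDK’s LPUART_Init may choose a different oversampling ratio, so treat the numbers as illustrative. The helper names are hypothetical.

```c
#include <stdint.h>

/* NXP LPUART: baud = clk / (oversample * SBR), where oversample = OSR + 1.
   Pick the divisor SBR that gets closest to the requested baud. */
uint32_t lpuart_sbr(uint32_t clk_hz, uint32_t oversample, uint32_t baud) {
    uint32_t div = oversample * baud;
    return (clk_hz + div / 2u) / div; /* round to nearest */
}

/* The rate the hardware actually produces for a given divisor. */
uint32_t lpuart_actual_baud(uint32_t clk_hz, uint32_t oversample, uint32_t sbr) {
    return clk_hz / (oversample * sbr);
}

/* Characters per second once framing bits are counted
   (8N1: start + 8 data + stop = 10 bits per character). */
uint32_t uart_chars_per_sec(uint32_t baud, uint32_t bits_per_frame) {
    return baud / bits_per_frame;
}
```

With these assumptions, SBR rounds to 26. The achieved rate is 48 MHz / (16 × 26) ≈ 115384 baud, about 0.16% fast, which is comfortably within what a typical UART receiver tolerates. If the clock were not what the code assumed, the divisor would be wrong and the serial output would turn into garbage. This is one concrete reason clocks matter.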

Almost none of the code in board.c is something you write from scratch for every project. You usually write it once per board and then mostly ignore it, which is exactly what we’ve done here.

Note

The version in apps/blinky/board.c is a trimmed-down adaptation of the vendor example at rv32m1-sdk/boards/rv32m1_vega/driver_examples/gpio/led_output/ri5cy/, rewritten to keep only what blinky actually uses.

main.c

#include "board.h"
#include "fsl_debug_console.h"
#include "fsl_gpio.h"

static void delay(void) {
    volatile uint32_t i;
    for (i = 0; i < 800000; ++i)
        __asm("NOP");
}

int main(void) {
    gpio_pin_config_t led_config = { kGPIO_DigitalOutput, 0 };

    BOARD_InitPins();
    BOARD_BootClockRUN();
    BOARD_InitDebugConsole();

    PRINTF("\r\nRV32M1-VEGA RI5CY baremetal app\r\n");

    GPIO_PinInit(BOARD_LED_GPIO, BOARD_LED_GPIO_PIN, &led_config);

    PRINTF("Starting to blink LED...\r\n");
    while (1) {
        delay();
        GPIO_TogglePinsOutput(BOARD_LED_GPIO, 1u << BOARD_LED_GPIO_PIN);
    }
}

Before entering the loop, main runs three board-level setup calls (declared in board.h), prints a banner over the UART, and configures pin 24 of GPIOA as a digital output with an initial value of 0 (LED off). After that it loops forever, waiting a bit and then toggling the LED.

A few things are worth calling out:

  • The delay() function is a busy loop, not a real timer (we’ll learn more about those in a later section)
    • delay() blocks the CPU in a tight for loop with an inline NOP. The inline NOP ensures the compiler doesn’t optimize the loop away
    • The volatile qualifier on i is there for the same reason: without it, an optimizing compiler might notice that nothing depends on i and delete the whole loop
    • As we noted earlier, busy-looping is a bad long-term habit (it wastes power and blocks the CPU from doing anything useful), but for a first program it’s the easiest way to provide delays between our toggles
  • The PRINTF macro is not the printf from the C standard library (there is no operating system behind stdout in our bare-metal code). It expands to the SDK’s own DbgConsole_Printf, which writes bytes out over LPUART0 one at a time
    • LPUART0 is connected to the J12 USB port on the physical VEGAboard; we’ll learn how to view the serial output in later sections
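How long does delay() actually wait? That depends on how many CPU cycles each loop iteration takes. At -O0, each iteration includes the volatile load and store of i, the compare, the branch, and the NOP, so the count is set by the compiler’s output, not by the source.

The helper below is a back-of-the-envelope sketch. busy_loop_ms is a hypothetical name, and the cycles-per-iteration figure is a guess you would need to confirm, for example by disassembling delay in GDB or by timing the LED.

```c
#include <stdint.h>

/* Rough wall-clock estimate for a busy-wait loop.
   cycles_per_iter is an ASSUMPTION you must measure for your build. */
uint32_t busy_loop_ms(uint32_t iterations, uint32_t cycles_per_iter, uint32_t cpu_hz) {
    uint64_t cycles = (uint64_t)iterations * cycles_per_iter; /* avoid 32-bit overflow */
    return (uint32_t)(cycles * 1000u / cpu_hz);
}
```

Suppose each iteration takes 10 cycles (a guess) and the CPU runs at 48 MHz. Then 800000 iterations come out to roughly 166 ms per toggle. Change the compiler flags or the clock, and the same loop produces a different delay. That fragility is exactly why real firmware uses hardware timers instead.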

The flow of execution

Now that we’ve seen every piece, we can tie them together into the full path a single blink takes:

  1. Reset brings the CPU up running from the Fast Frequency Internal Reference Clock (FIRC), executes the startup assembly in startup_RV32M1_ri5cy.S, zeroes .bss, copies .data, and calls main.
  2. BOARD_InitPins writes to PORTA and PORTC mux registers so pin 24 is GPIO and pins PTC7/PTC8 are LPUART0.
  3. BOARD_BootClockRUN sets the system clock to 48 MHz, sourced from the FIRC.
  4. BOARD_InitDebugConsole prepares LPUART0 so that subsequent PRINTF calls can emit characters.
  5. GPIO_PinInit sets bit 24 (1 << 24) in GPIOA’s PDDR register, marking that pin as an output.
  6. The main super loop runs forever: delay() burns 800,000 loop iterations’ worth of cycles, then GPIO_TogglePinsOutput writes 0x01000000 to GPIOA’s PTOR register, which flips bit 24 of PDOR in hardware, in turn toggling the LED.
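Steps 5 and 6 rely on the GPIO block’s set, clear, and toggle registers. The CPU writes a mask, and the hardware updates PDOR accordingly, so no read-modify-write is needed in software.

The host-side model below mimics that behavior, showing why writing 0x01000000 to PTOR flips only bit 24. It is a sketch: gpio_model_t is hypothetical, and the PSOR (0x04), PCOR (0x08), and PDDR (0x14) offsets are assumed from a Kinetis-style layout.

```c
#include <stdint.h>

/* PDOR (0x00) and PTOR (0x0C) match the text; the others are ASSUMED. */
enum {
    GPIO_OFF_PDOR = 0x00, /* data output: the latch that drives the pins */
    GPIO_OFF_PSOR = 0x04, /* write 1s to set bits in PDOR */
    GPIO_OFF_PCOR = 0x08, /* write 1s to clear bits in PDOR */
    GPIO_OFF_PTOR = 0x0C, /* write 1s to toggle bits in PDOR */
    GPIO_OFF_PDDR = 0x14  /* data direction: 1 = output */
};

/* Hypothetical host-side model of one GPIO port's state. */
typedef struct {
    uint32_t pdor;
    uint32_t pddr;
} gpio_model_t;

/* Apply a 32-bit register write the way the hardware would. */
void gpio_model_write(gpio_model_t *g, uint32_t offset, uint32_t value) {
    switch (offset) {
    case GPIO_OFF_PDOR: g->pdor = value;   break;
    case GPIO_OFF_PSOR: g->pdor |= value;  break; /* 0 bits: no effect */
    case GPIO_OFF_PCOR: g->pdor &= ~value; break; /* 0 bits: no effect */
    case GPIO_OFF_PTOR: g->pdor ^= value;  break; /* 0 bits: no effect */
    case GPIO_OFF_PDDR: g->pddr = value;   break;
    default: break; /* other offsets are ignored in this sketch */
    }
}
```

Because 0 bits are ignored, firmware can toggle pin 24 without knowing or disturbing the state of the other 31 pins on the port.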

Step 6 is the entirety of our application logic doing “useful work”. Everything else is just (largely generic) initial setup.

Compiling our program

Turning main.c into something that can run on the VEGAboard is a multi-stage process:

  • Compile each .c into an object file with a cross-compiler
  • Assemble the startup code, then link everything together using a linker script that knows the chip’s memory layout
  • Finally, convert the ELF output into a raw binary for flashing

Building manually

If you wanted to build everything entirely by hand (we don’t recommend it), the invocation would look something like this (shortened for readability):

# From vega-quickstart/
SDK=rv32m1-sdk
DEV=$SDK/devices/RV32M1
BOARD=$SDK/boards/rv32m1_vega

riscv32-unknown-elf-gcc -march=rv32imc -O0 -g -ffreestanding -fno-builtin \
    -DCPU_RV32M1_ri5cy -D__STARTUP_CLEAR_BSS \
    -I apps/blinky -I $DEV -I $DEV/drivers -I $DEV/utilities \
    -I $SDK/RISCV -I $SDK/devices \
    -c apps/blinky/main.c -o main.o
# ... repeat for board.c, system_RV32M1_ri5cy.c, fsl_gpio.c, fsl_clock.c,
#     fsl_msmc.c, fsl_lpuart.c, fsl_common.c, fsl_debug_console.c, etc.

riscv32-unknown-elf-gcc -march=rv32imc \
    -c $DEV/gcc/startup_RV32M1_ri5cy.S -o startup.o

riscv32-unknown-elf-gcc -march=rv32imc \
    -T $BOARD/driver_examples/gpio/led_output/ri5cy/riscvgcc/RV32M1_ri5cy_flash.ld \
    -ffreestanding -nostdlib -Xlinker --gc-sections \
    -Xlinker -z -Xlinker muldefs \
    -o blinky.elf main.o board.o startup.o ... \
    -Wl,--start-group -lm -lc -lgcc -lnosys -Wl,--end-group

riscv32-unknown-elf-objcopy -O binary blinky.elf blinky.bin

Note:

  • riscv32-unknown-elf-gcc is a cross-compiler: it runs on your laptop but emits RISC-V instructions.
    • The -march=rv32imc flag tells the compiler which subset of the RISC-V ISA to target: 32-bit base integer (i), multiply/divide (m), and compressed 16-bit encodings (c), which matches what the RI5CY core on the VEGA implements.
  • -ffreestanding -fno-builtin -nostdlib tell GCC that no hosted C runtime exists. There is no operating system to provide memory allocators, a standard library, etc. The compiler must not assume that calling printf can reach stdout, and the linker must not pull in startup code from libc.
  • The linker script (RV32M1_ri5cy_flash.ld) tells the linker where flash and RAM are located (in terms of memory addresses), which section goes where, and where the vector table has to be placed for the CPU to find it at reset.
    • Errors or incorrect addresses in the linker script can crash the firmware the instant the board starts, and such errors can be very hard to debug. Use vendor-provided linker scripts whenever possible.
  • The final objcopy call strips the ELF formatting off the compiled code. The resulting .bin is a flat dump of what the flash contents should look like.
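The linker script is also what makes the startup code’s “zero .bss, copy .data” step possible. It defines symbols marking where the initial values of .data sit in flash and where .data and .bss live in RAM.

The real work happens in assembly in startup_RV32M1_ri5cy.S. The C function below is only a sketch of the same logic. The section boundaries are passed in as parameters because the actual symbol names depend on the linker script, and init_sections is a hypothetical name.

```c
#include <stdint.h>

/* Sketch of what startup code does before calling main(). In a real image
   the five pointers are linker-script symbols (names vary between scripts);
   word-aligned section boundaries are ASSUMED. */
void init_sections(uint32_t *data_start, uint32_t *data_end,
                   const uint32_t *data_load,
                   uint32_t *bss_start, uint32_t *bss_end) {
    /* .data: copy initial values from their load address in flash to RAM */
    while (data_start < data_end) {
        *data_start++ = *data_load++;
    }
    /* .bss: globals without an initializer must start out as zero */
    while (bss_start < bss_end) {
        *bss_start++ = 0u;
    }
}
```

If the linker script gets any of these boundaries wrong, initialized globals come up with garbage values before main even runs. This is one reason such bugs are hard to track down.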

Using the Makefile

Doing all of the above is not fun, especially since you need to do it every time you make changes and need to recompile. Thankfully, you don’t have to; the quickstart repository’s top-level Makefile wraps all of the above into a single command:

make blinky

The build output lands in build/blinky/:

build/blinky/
├── blinky.elf    full ELF with debug info
├── blinky.bin    flat binary
├── blinky.hex    Intel HEX
├── main.o
├── board.o
└── ... (all the other .o and .d files)

Note

At the end of a successful build the Makefile also runs riscv32-unknown-elf-size on the ELF, printing the text, data, and bss sizes. Those numbers are a useful rough check: our bare blinky app should be a few kilobytes of text at most. If it suddenly balloons in size, something you didn’t intend has probably snuck in.
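The size columns map onto the chip’s two kinds of memory:

- text and data both occupy flash, because the initial values of .data are stored there and copied out at startup;
- data and bss both occupy RAM.

A small sketch of that bookkeeping follows; the helper names are hypothetical. It counts only static usage, not the stack or any heap.

```c
#include <stdint.h>

/* Flash holds the code/constants plus the initial values of .data. */
uint32_t flash_bytes(uint32_t text, uint32_t data) {
    return text + data;
}

/* RAM holds .data and .bss at run time; stack and heap are NOT included. */
uint32_t ram_bytes(uint32_t data, uint32_t bss) {
    return data + bss;
}
```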

The Makefile is organized so that adding a new application is just a matter of dropping a new directory under apps/ with one or more .c files inside. Running make <appname> builds it, make flash-<appname> builds and flashes it (covered in the real-hardware section next), and make sim-<appname> builds and runs it inside Renode (covered in the simulating hardware section later). If a given application needs different compiler flags or extra SDK drivers, you can add an apps/<name>/config.mk file to override the relevant variables without touching the top-level build rules.

From here on out there’s no need to run compilation commands by hand; just use make! But now, when make <app> prints a wall of compile lines and a final size breakdown, you know exactly what each of those lines is doing and why.

TLDR

  • The RV32M1 SDK ships as a git submodule at vega-quickstart/rv32m1-sdk. Populate it with git submodule update --init --recursive before building.
  • SDK helpers like GPIO_TogglePinsOutput are thin inlined wrappers over a single register store. They give the raw peripheral bits readable names without adding runtime cost.
  • The blinky app is one big super loop: initialize pins, clocks, and the UART; then forever delay and toggle GPIOA pin 24. All three init calls (BOARD_InitPins, BOARD_BootClockRUN, BOARD_InitDebugConsole) are board scaffolding you write once and ignore thereafter.
  • Building for the VEGAboard uses the riscv32-unknown-elf- cross-toolchain with -march=rv32imc and a vendor-supplied linker script, producing an .elf and .bin. The top-level Makefile wraps all of this behind make <app>.

Connecting to real hardware

With blinky built (recall that make blinky produces blinky.elf and blinky.bin under build/blinky/), let’s get that binary onto the VEGAboard and watch it run. This section walks through the three things you’ll do every time you iterate on firmware: connect the board, flash your built binary, and, optionally, attach a debugger.

Connecting

There are 2 physical connections and a button that are important to us for running code on the VEGA. All of them are highlighted in the left image below; the right image shows what a connected setup looks like.

VEGAboard connectors
VEGAboard connected

The OpenSDA USB port (J12) is a standard Micro-USB. Plugged into your laptop, it provides power to the board and also exposes LPUART0 (the pins we wired up in BOARD_InitDebugConsole) as a virtual serial device. On Linux (and inside our container) it usually enumerates as /dev/ttyACM0. This is where the output of your PRINTF() statements reaches your terminal.

The RISC-V JTAG header (J55) is a small ribbon connector. The RV32M1 contains several CPU cores (the RI5CY and ZERO-RISCY RISC-V cores, plus Arm Cortex-M4F and Cortex-M0+ cores), and this header is wired specifically to the JTAG chain of the RI5CY core we care about. You can connect a SEGGER J-Link debugger here. Mind the ribbon cable orientation: the red stripe marks pin 1, and the right image above shows the correct direction. JTAG is the protocol every tool in the rest of this section uses to halt the CPU, program flash, set breakpoints, and single-step.

Note

Want to learn more about JTAG? See the Diving into JTAG series by Memfault.

The reset button (SW1) is a momentary push that pulls the chip’s reset line low. Pressing and releasing it re-runs the startup assembly from startup_RV32M1_ri5cy.S and re-enters main. Handy when a freshly-flashed image misbehaves, or when you just want a clean starting state without yanking the USB cable.

Flashing

“Flashing” means copying our built code into the non-volatile flash memory inside the RV32M1 so that it persists across resets and power cycles. The binary data travels out of the J-Link over JTAG. To drive this we use the Open On-Chip Debugger (OpenOCD). OpenOCD translates higher-level operations such as “write this word to that address” into JTAG transactions, which it uses to talk to the chip’s on-board flash controller.

By hand

Flashing by hand looks like this:

openocd -f support/openocd/openocd_rv32m1_vega_ri5cy.cfg \
    -c "init" \
    -c "halt" \
    -c "ri5cy_boot" \
    -c "flash write_image erase build/blinky/blinky.elf" \
    -c "reset run" \
    -c "exit"

Each -c passes one TCL command to openocd after it finishes loading the config file:

  • init opens the JTAG connection and initializes openocd’s internal state.
  • halt stops the CPU. We can’t reprogram the flash bank a program is currently executing from.
  • ri5cy_boot is a vendor-supplied TCL routine defined in the config file. It selects the RI5CY core as the active target, since the chip exposes multiple cores on the same JTAG chain.
  • flash write_image erase ... erases the sectors that will be written, then programs the ELF’s loadable segments into flash.
  • reset run pulses reset and lets the CPU begin executing the newly-flashed image.
  • exit disconnects and shuts openocd down cleanly.
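Flash can only be erased in whole sectors. This means write_image erase has to erase every sector the image touches, even partially. The arithmetic is sketched below, and sectors_touched is a hypothetical helper. The 4 KiB sector size used in the test values is an assumption for illustration; check the flash chapter of the reference manual for the RV32M1’s real sector size.

```c
#include <stdint.h>

/* Number of flash sectors touched by an image of `len` bytes starting at
   `start`. Every touched sector must be erased before it can be programmed. */
uint32_t sectors_touched(uint32_t start, uint32_t len, uint32_t sector_size) {
    if (len == 0u) {
        return 0u;
    }
    uint32_t first = start / sector_size;
    uint32_t last  = (start + len - 1u) / sector_size; /* sector of the final byte */
    return last - first + 1u;
}
```

Note the edge case in the test: a 2-byte image that straddles a sector boundary still costs two full sector erases.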

Note

If you are curious what ri5cy_boot actually does, see support/openocd/openocd_rv32m1_vega_ri5cy.cfg. It essentially just pokes the chip’s control registers to bring the RI5CY core out of reset.

Using the Makefile

As always, the top-level Makefile provides a wrapper for easier use:

make flash-blinky

This first rebuilds the app if any source has changed, then hands the resulting ELF to openocd with the same command shown above. The command generalizes: make flash-<app> works for any app under apps/.

Debugging

Flashing gets code onto the board, but the moment something doesn’t work (wrong LED, hung in a loop, unexpected fault) you may need to inspect what the CPU is actually doing to get to the bottom of it. The same J-Link that programs flash can also act as a live window into the CPU’s state: we can halt the CPU at any address, read and write memory, inspect registers, and single-step through instructions.

The setup has two parts:

  1. openocd runs as a GDB server, translating GDB’s remote serial protocol into JTAG operations. By convention, it listens on TCP port 3333.
  2. riscv32-unknown-elf-gdb runs as the client on your laptop. It loads the ELF’s debug information (source line numbers, variable names, struct layouts) and connects to the openocd GDB server port.
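GDB’s remote serial protocol is plain text. Each packet is framed as $payload#cc, where cc is the sum of the payload bytes modulo 256, written as two hex digits. The receiver acknowledges with +. For example, g asks for all registers, and m48020000,4 asks to read 4 bytes starting at GPIOA’s base address.

The sketch below builds such packets to show what travels over port 3333. It ignores the escaping that binary payloads need, and rsp_frame is a hypothetical helper, not part of GDB or OpenOCD.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Checksum: sum of payload bytes, modulo 256. */
unsigned rsp_checksum(const char *payload) {
    unsigned sum = 0u;
    for (const unsigned char *p = (const unsigned char *)payload; *p != '\0'; ++p) {
        sum += *p;
    }
    return sum & 0xFFu;
}

/* Write "$<payload>#<cc>" into out. Returns the packet length, or -1 if it
   does not fit. Does NOT escape '$', '#', or '}' inside the payload. */
int rsp_frame(const char *payload, char *out, size_t out_len) {
    int n = snprintf(out, out_len, "$%s#%02x", payload, rsp_checksum(payload));
    return (n < 0 || (size_t)n >= out_len) ? -1 : n;
}
```

For instance, the single-character payload g frames to $g#67, since the character g is 0x67.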

Note

GDB stands for the GNU Project Debugger. It’s a fairly standard tool and well worth learning more about!

Note that we need a RISC-V build of GDB (riscv32-unknown-elf-gdb) since the target is a RISC-V core. This is included in the project’s container image for you.

By hand

In one terminal, start the server:

openocd -f support/openocd/openocd_rv32m1_vega_ri5cy.cfg \
    -c "init" -c "halt" -c "ri5cy_boot"

In a second terminal, launch GDB against the ELF and connect:

riscv32-unknown-elf-gdb build/blinky/blinky.elf \
    -ex "target remote :3333"

Using the Makefile

As with flashing, the Makefile provides shortcuts:

make gdbserver    # terminal 1
make gdb-blinky   # terminal 2

gdbserver runs openocd with exactly the three commands above. gdb-<app> launches GDB on the corresponding ELF and auto-connects to :3333.

A quick GDB cheat sheet

Once GDB is attached, here are some commands you’ll find useful:

  • monitor reset halt: reset the chip and stop at the first instruction
  • break main (or b main): set a breakpoint at main
  • info breakpoints: show all breakpoints
  • delete <num> (d <num>): delete a breakpoint
  • continue (c): run until the next breakpoint or signal
  • step (s): advance one source line, descending into function calls
  • next (n): advance one source line, stepping over calls
  • stepi (si): advance one machine instruction
  • print <expr> (p): evaluate a C expression. p *(GPIO_Type *)0x48020000 dumps every field of the GPIOA register block/struct
  • info registers: show all CPU registers
  • disassemble <function>: show the disassembly (machine code instructions) of a function
  • load: reprogram flash through the debugger

Warning

Running monitor reset halt on its own leaves OpenOCD and the RI5CY core out of sync. To reset cleanly from inside GDB, use the following sequence:

(gdb) monitor reset halt
(gdb) monitor ri5cy_boot
(gdb) monitor halt
(gdb) load
(gdb) tbreak main
(gdb) c

For blinky specifically, a quick sanity check is to b GPIO_TogglePinsOutput, continue, and confirm the breakpoint fires once per blink.

Viewing serial output

Debugging through GDB is great, but sometimes using print statements over UART is just less hassle. Anything the board writes over LPUART0 comes back on the OpenSDA USB (J12). The Makefile provides a helper for this:

make serial

This opens minicom at 115200 baud (matching the rate BOARD_InitDebugConsole configured) against the first /dev/ttyACM*, /dev/ttyUSB*, or /dev/cu.usbmodem* it finds. After flashing blinky you should see the RV32M1-VEGA RI5CY baremetal app banner and then Starting to blink LED....
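There is nothing magic about the terminal program. It configures the serial device for raw input at 115200 baud with 8 data bits, no parity, and 1 stop bit (8N1), then reads bytes as they arrive.

If you ever want to consume the board’s output from your own host-side tool, a minimal POSIX termios sketch looks like the one below. The function name is hypothetical, and the device path (for example /dev/ttyACM0) is whatever your system enumerates.

```c
#define _DEFAULT_SOURCE /* exposes cfmakeraw() on glibc */
#include <termios.h>
#include <unistd.h>

/* Configure an already-open serial descriptor for raw 8N1 at 115200 baud.
   Returns 0 on success, -1 on failure (for example, if fd is not a tty). */
int serial_configure_115200(int fd) {
    struct termios tio;
    if (tcgetattr(fd, &tio) != 0) {
        return -1;
    }
    cfmakeraw(&tio);                 /* no echo or line editing; 8 data bits, no parity */
    tio.c_cflag |= (CLOCAL | CREAD); /* ignore modem lines, enable the receiver */
    tio.c_cflag &= ~CSTOPB;          /* 1 stop bit (cfmakeraw leaves this alone) */
    if (cfsetispeed(&tio, B115200) != 0 || cfsetospeed(&tio, B115200) != 0) {
        return -1;
    }
    return tcsetattr(fd, TCSANOW, &tio);
}
```

Typical use is to open the device with open("/dev/ttyACM0", O_RDWR | O_NOCTTY) and pass the descriptor to serial_configure_115200. After that, call read() in a loop to receive the banner and any later PRINTF output.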

TLDR

  • You need to know about 2 connections and a button: OpenSDA USB (J12) for power and serial, RISC-V JTAG (J55) for flashing and debugging, and reset (SW1) to restart the chip.
  • Flashing uses openocd to write our program into the chip’s flash.
    • You can use make flash-<app> to flash any app.
  • Debugging is a two-process setup: openocd as a GDB server on port 3333, and riscv32-unknown-elf-gdb as the client.
    • You can use make gdbserver in one terminal plus make gdb-<app> in another.
  • make serial opens the virtual UART at 115200 baud, which is where every PRINTF in your application ends up.

Simulating hardware

Real hardware is great…but it can be slow to iterate on and is not always readily available. To test on real hardware, you have to rebuild your firmware, flash it over a debug probe, and then poke at the board (watching an LED blink, squinting at a serial terminal, etc.) to see whether anything is working. A simulator sidesteps all of this. It runs your firmware on a virtual copy of the board, entirely inside your computer, and gives you direct visibility into what the code is doing. Most importantly, you can also simulate a board you don’t physically own!

This training uses Renode, an open-source simulator from Antmicro that can model full embedded systems (CPU, memory, and peripherals). Renode already ships with a basic platform description for the VEGAboard, so we can run firmware on it out of the box.

Note

It turns out the platform description provided in the official Renode repository is not very complete, so the quickstart repository has custom definitions under support/renode. We won’t go into detail about these files, but feel free to read through them if you’re interested in learning more about Renode.

Renode basics

Before we run anything, it helps to know three Renode concepts:

  1. A platform description (.repl file) lists the virtual hardware: what CPU, how much memory, which peripherals live at which addresses. Renode includes a basic one for the VEGAboard at platforms/boards/vegaboard_ri5cy.repl.
    • The quickstart repository for these labs contains a more advanced one under support/renode
  2. A Renode script (.resc file) is a small recipe that builds a machine from a .repl, loads firmware into it, and wires up things like UART windows. Renode also includes scripts/single-node/vegaboard_ri5cy.resc that does this for our board.
  3. The monitor is Renode’s interactive prompt. When you start Renode you land in the monitor and type commands such as start, pause, or showAnalyzer to drive the simulation.

You don’t need to write any of this from scratch for the VEGAboard; the quickstart repository does all the setup for you. You’ll primarily just be running a handful of monitor commands to explore the system as it runs.

Running blinky in Renode

Let’s jump right into it! We’ll be simulating the same blinky app we developed in previous sections.

First, make sure blinky has been built. From the repository root, inside the dev container, run:

make blinky

This produces the ELF file Renode will load:

build/blinky/blinky.elf

From the same shell inside the container, launch Renode:

renode

Note

If you see a line like Couldn't start UI - falling back to console mode, that’s fine. It just means Renode didn’t find a graphical display (typical when you’re inside a container over SSH or a remote connection). The monitor prompt works identically either way.

At the (monitor) prompt, point Renode at your ELF file and include the bundled VEGAboard script:

(monitor) $bin=@/workspaces/vega-quickstart/build/blinky/blinky.elf
(monitor) include @/workspaces/vega-quickstart/support/renode/vegaboard_ri5cy.resc

The first line sets a variable named $bin that the platform Renode script reads to decide which binary to load. The second line executes the Renode script, which creates the virtual machine, wires up the CPU and peripherals, loads your firmware into flash, and calls the monitor command showAnalyzer lpuart0 for you so the simulated UART is attached to your terminal from the start. After it finishes you’ll see a new prompt that reflects the name of the machine:

(Vegaboard-RI5CY)

Now start the emulation. start and its one-letter alias s both work:

(Vegaboard-RI5CY) start

You should see the UART output from main.c appear on the terminal:

lpuart0: RV32M1-VEGA RI5CY baremetal app
lpuart0: Starting to blink LED...

That’s the PRINTF() calls in main.c writing to lpuart0, which the bundled script routed to your terminal. If you don’t see them, double-check the ELF path in $bin.

To stop cleanly, pause and quit the monitor:

(Vegaboard-RI5CY) pause
(Vegaboard-RI5CY) quit

Watching the LED

The quickstart’s platform at support/renode/vegaboard_ri5cy_platform.repl properly defines GPIO ports and the logic backing them (via the NXP_GPIO peripheral in support/renode/NXP_GPIO.cs).

Thus, writes land in a real register inside Renode and the LED’s output state is observable from the monitor.

As discussed in earlier sections, the LED is on GPIOA pin 24 (see BOARD_LED_GPIO and BOARD_LED_GPIO_PIN in board.h). GPIOA lives at 0x48020000; the PDOR register (the current output latch) is at offset 0x00.

While the simulation is running, sample it from the monitor:

(Vegaboard-RI5CY) sysbus ReadDoubleWord 0x48020000

Run it a few times. The value alternates between 0x00000000 and 0x01000000 as the firmware toggles pin 24. Bit 24 set is the LED on; clear is off.

For a live view of every access gpioa receives, turn on global peripheral access logging before start (or pause first):

(Vegaboard-RI5CY) sysbus LogAllPeripheralsAccess true

Each iteration of the while (1) loop in main.c then prints a line showing a write of 0x01000000 to offset 0x0C on gpioa (PTOR, the toggle register). The rate at which those writes appear is your toggle rate (two toggles make one full on/off blink). Pause the simulation, change the iteration count in delay(), rebuild, and watch the log speed up or slow down. The logging is noisy but useful while you’re learning, since every read and write the CPU makes shows up with its address, value, and the address of the instruction that issued it.

A faster workflow

Typing all of the commands above quickly becomes tedious. Luckily, we’ve defined a helper in the top-level Makefile that launches Renode and loads the specified application in a single command.

To launch and simulate the blinky application, simply run:

make sim-blinky

This will take you straight from a shell prompt to a running simulation of blinky. Going forward, any application we write can be simulated by simply running make sim-<app>!

Challenge

TODO: challenge is to configure the blue and green pins of the on-board RGB LED and update the main loop to toggle all 3 in a sequence (red-wait-green-wait-blue-wait-repeat).