RISC-V: custom instruction and its simulation
Agenda
This article shows how to add a new instruction to RISC-V and simulate it.
These topics are covered along the way:
- Whole GNU
riscv
toolchain installation; - Implementation of a new instruction for
spike
RISC-V ISA simulator; - Manual instruction encoding in C/C++;
- Custom instruction simulation (with visible output);
- [riscv32-]GCC plugin development;
You may find associated repository useful.
Many things can go wrong.
Be prepared to fix upcoming issues by yourself.
The final result is very rewarding, I promise.
Toolchain installation
Choose installation directory. Call it RISCV
.
Add these lines to your ~/.bashrc
:
# Directory which will contain everything we need.
export RISCV_HOME=~/riscv-home
# $RISCV will point to toolchain install location.
export RISCV="${RISCV_HOME}/riscv"
export PATH="${PATH}:${RISCV}/bin"
Run mkdir -p "${RISCV_HOME}" "${RISCV}"
.
Use 1_install/2_download-repos script to clone all required repositories.
If you wish to save some time and traffic, avoid recursive clone of toolchain repository. Instead, clone sub-modules by hand. You may exclude “riscv-glibc”.
Be warned: I have not tested partial toolchain build, caveat emptor
Satisfy GNU toolchain
prerequisites by installing all required packages.
In addition, spike requires device-tree-compiler
package.
We choose:
- RISCV32 over RISCV64
- newlib over glibc
Repositories must be built in this order:
- riscv-gnu-toolchain
- riscv-fesvr, riscv-pk
- riscv-isa-sim
You can use 1_install/3_build-repos script as a guideline.
To check installation, use 1_install/4_check-install.
Custom instruction description
Within the framework of this article, we will implement mac instruction.
rv32im
has mul
and add
instructions, mac
combines them.
It defined as a0 := a0 + a1 * a2
(ordinary 3-address instruction).
# Without mac (preserve registers):
mv t0, a0 # addi r0, a0, 0
mul a1, a2, a3
add a1, a1, t0
# With mac:
mac a1, a2, a3
Adding “mac” instruction to the rv32im
To add an instruction to the simulator:
- Describe the instruction’s functional behavior;
- Add the opcode and opcode mask to “riscv/opcodes.h”;
First step is accomplished by adding a riscv/insns/mac.h
file:
/* file "$RISCV_HOME/riscv-isa-sim/riscv/insns/mac.h" */
// 'M' extension means we require integer mul/div standard extension.
require_extension('M');
// RD = RD + RS1 * RS2
reg_t tmp = sext_xlen(RS1 * RS2);
WRITE_RD(sext_xlen(READ_REG(insn.rd()) + tmp));
For the second step, we use riscv-opcodes.
cd "${RISCV_HOME}/riscv-opcodes"
echo -e "mac rd rs1 rs2 31..25=1 14..12=0 6..2=0x1A 1..0=3\n" >> opcodes
make install
It turns out there is a third step which is not documented.
New entry must be added to the riscv_insn_list
.
sed -i 's/riscv_insn_list = \\/riscv_insn_list = mac\\/g' \
"${RISCV_HOME}/riscv-isa-sim/riscv/riscv.mk.in"
Rebuild the simulator.
cd "${RISCV}/riscv-isa-sim/build"
sudo make install
Testing rv32im brand new instruction
At this stage:
- Compiler knows nothing about
mac
. It can not emit that instruction; - Assembler knows nothing about
mac
. We can not usemac
in inline assembly;
Our last resort is manual encoding.
#include <stdio.h>
// Needed to verify results.
int mac_c(int a, int b, int c) {
a += b * c; // Semantically, it is "mac"
return a;
}
// Should not be inlined, because we expect arguments
// in particular registers.
__attribute__((noinline))
int mac_asm(int a, int b, int c) {
asm __volatile__ (".word 0x02C5856B\n");
return a;
}
int main(int argc, char** argv) {
int a = 2, b = 3, c = 4;
printf("%d =?= %d\n", mac_c(a, b, c), mac_asm(a, b, c));
}
Save test program as test_mac.c
.
riscv32-unknown-elf-gcc test_mac.c -O1 -march=rv32im -o test_mac
spike --isa=RV32IM "${RISCV_PK}" test_mac
You should see 14 =?= 14
printed to stdout.
If result differs, riscv32-unknown-elf-gdb
can help you in troubleshooting.
Mac encoding explained
Be sure to look at official specifications if you aim for precise descriptions.
mac
will mimic mul
encoding, but use different opcode.
# file "riscv-opcodes/opcodes"
# differs
# |
# v
mac rd rs1 rs2 31..25=1 14..12=0 6..2=0x1A 1..0=3
mul rd rs1 rs2 31..25=1 14..12=0 6..2=0x0C 1..0=3
# ^ ^ ^ ^ ^ ^ ^
# | | | | | | |
# | | | | | | |
# | | | | | | also opcode 3 bits
# | | | | | opcode 5 bits
# | | | | funct3 3 bits
# | | | funct7 7 bits
# | | rs2 (src2) 5 bits
# | rs1 (src1) 5 bits
# dest 5 bits
Actual encoding has different order of components and opcode is really single 7 bit segment.
5 bits per register operand means that we have 32 addressable registers.
# Encoding used for "mac a0, a1, a2"
0x02C5856B [base 16]
==
10110001011000010101101011 [base 2]
==
00000010110001011000010101101011 [base 2]
# Group by related bit chunks:
0000001 01100 01011 000 01010 1101011
^ ^ ^ ^ ^ ^
| | | | | |
| | | | | opcode (6..2=0x0C 1..0=3)
| | | | dest (10 : a0)
| | | funct3 (14..12=0)
| | src1 (11 : a1)
| src2 (12 : a2)
funct7 (31..25=1)
Plugin vs patch
There are two ways to extend GCC:
- Patch GCC itself
- Write loadable plugin for GCC
Prefer plugins to GCC patches whenever possible.
GCC wiki “plugins” page described
advantages in the “Background” section.
In this guide, both methods will be covered.
Useful links:
- Simple GCC plugin series of posts
- GCC plugins manual
GCC “rv32imMac” plugin
TODO
GIMPLE “gmac” statement
TODO
The pleasure of intrinsics
TODO
Compiling “mac” without intrinsic
TODO