Logo by Tokino Kei.
Sentinel is a small RISC-V CPU (RV32I_Zicsr) written in Amaranth.
It implements the Machine Mode privileged spec, and is designed to fit into
~1000 4-input LUTs or less on an FPGA. It is a good candidate for control tasks
where a programmable state machine or custom size-tailored core would otherwise
be used.
Unlike most RISC-V implementations, Sentinel is microcoded, not pipelined. Instructions require multiple clock cycles to execute. Sentinel is therefore not necessarily a good fit for applications where high throughput/IPC is required. See below.
Sentinel has been tested against RISC-V Formal and the RISCOF frameworks, and passes both. Once I have added a few extra tests, the core can be considered correct with respect to the RISC-V Formal model. The core is also probably correct with respect to the SAIL golden model.
I've liked the way the word "sentinel" sounds ever since I first learned of the word, either from the title of a book on NJ lighthouses, or from an enemy in an old Sega Genesis RPG. The term has always stuck with me since then, albeit in a much more positive light than "the soldier golems of the forces of Darkness" :). Since "sentinel" means "one who stands watch", I think it's an apt name for a CPU intended to watch over the rest of your silicon, but otherwise stay out of the way. Also, since lighthouses are indeed "Sentinels Of The Shore", I wanted to shoehorn a lighthouse into the logo :).
Sentinel uses:
- PDM as its package/dependency manager, and to orchestrate all things you can do with this repo.
- m5meta microcode assembler, without which Sentinel would optimize down to ~0 LUTs :).
- yosys and nextpnr for size-benchmarking. The user must provide these.
- pytest for basic/regression testing.
- DoIt as a lower-level dependency-graph aware task orchestrator (called from pdm).
- riscv-tests, an older set of unit tests for RISC-V processors. Running these is done via pytest.
- RISC-V Formal to verify that desirable properties of Sentinel (such as "instructions write the correct destination") hold for all possible inputs over a bounded number of clock cycles after reset.
- RISCOF, the unit test framework that is maintained by RISC-V International themselves. This appears to have originally been derived from riscv-tests, but is much more comprehensive.
The latter five are only required for development. Additionally for development, a user must provide:
- riscv64-unknown-elf-gcc to compile tests from riscv-tests (I'm not sure what the correct way to install the compiler is nowadays; I use 8.3.0).
- SymbiYosys, a driver program for RISC-V Formal.
- Boolector, the SMT Solver that RISC-V Formal uses.
RISCOF also requires the SAIL RISC-V emulator. This is a pain to compile, so I provide a Linux binary (and eventually Windows if I can get OCaml to behave long enough. I used to be able to install it just fine :'D!).
A user must first run the following before anything else:
pdm install -G dev -G examples
I expect most users to only need to import from sentinel.top. The top-level module of the Sentinel CPU is appropriately named Top:
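from amaranth import Elaboratable, Module
from sentinel.top import Top

class MySoC(Elaboratable):
    def __init__(self):
        # Instantiate the Sentinel CPU core.
        self.cpu = Top()
        ...

    def elaborate(self, plat):
        m = Module()
        # Register the CPU as a submodule of this SoC.
        m.submodules.cpu = self.cpu
        ...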
Top exposes a Wishbone Classic bus, and an irq input pin as the interface to all other modules in an FPGA design. Of course, Top also has clk and rst lines, which belong to the sync clock domain rather than being directly exposed in Top's Signature. sync is the only clock domain that Sentinel uses.
See the AttoSoC class in examples/attosoc.py for a full working example. A working demo can be generated from this example, as explained below.
This command will generate a core with a Wishbone Classic bus, and clk, rst, and irq input pins (as mentioned above, Sentinel uses a single clock domain):
pdm gen > sentinel.v
On reset, Sentinel begins execution at address 0. See the CSR section for information on exception handling (including interrupts).
The Wishbone bus uses a block transfer to do a back-to-back memory write and instruction fetch. Otherwise, the Wishbone bus will deassert CYC/STB the cycle after receipt of ACK. I may need to interface to IP that can't handle block cycles, so I will probably relax the block cycle requirement in the future via an option.
For help, run:
pdm gen -h
If using Sentinel as an installed package, the previous section still applies, except the command is now:
[pdm run] python -m sentinel.gen
If you're using pdm to handle Python dependencies in e.g. a mixed Python/Verilog project, and Sentinel is one of those Python dependencies, you may wish to use scripts to provide a shortcut for Verilog generation in your pyproject.toml (call = "python -m sentinel.gen" does not work!):
[tool.pdm.scripts]
gen = { call = "sentinel.gen:generate", help="generate Sentinel Verilog file" }
Generate A Demo Bitstream For Lattice iCEstick
pdm demo
For help, run:
pdm demo -h
pdm test
or
pdm test-quick
The above will invoke pytest and test Sentinel against handcrafted examples, as well as the riscv-tests repo binaries. See the README.md in tests/upstream for information on how to refresh the binaries.
Right now (11/5/2023), the difference between test and test-quick is minimal.
pdm rvformal-all [-n num_cores]
or
pdm rvformal test-name
See README.md in tests/formal for more information, including valid/available test names.
pdm riscof-all
or
pdm riscof-override /path/to/test_list.yaml
See README.md in tests/riscof for more information.
pdm run --list
pdm doit list [--all] [task]
doit tasks are documented as a courtesy, and to make sure developers/users don't get stuck. I am unsure about doit tasks' stability, so prefer running pdm as a wrapper to doit rather than running doit directly.
TODO. I need to create a test that gets latency and throughput for each instruction type of the core. Some general observations (as of 11/18/2023), from examining the microcode:
- There is room for improvement, even without making the core bigger.
- Fetch/Decode takes a minimum of two cycles thanks to Wishbone classic's
REQ/ACK handshake taking two cycles.
- When Wishbone ACK is asserted, Decode is taking place.
- The GP register file is synchronous, with a single read port and a single write port. Sentinel loads RS1 out of the register file during Decode.
- All instructions share the same operation the cycle after ACK/Decode:
  - Check for exceptions/interrupts, go to exception handler if so.
  - Latch RS1 into the ALU.
  - Load RS2 out of the register file, in anticipation for a "simple" instruction.
  - Jump to the instruction-specific microcode block.
- At minimum, an instruction (addi, or, etc) takes 3 cycles to retire after the initial shared cycles. This means Sentinel instructions have a minimum latency of 6 cycles per instruction (CPI).
- Sentinel instructions have a maximum throughput of 4 CPI by overlapping the 2 Fetch/Decode cycles of the next instruction after the initial 3 shared cycles of the current instruction when possible ("pipelining").
- Some instructions overlap one of the Fetch/Decode cycles, some don't overlap either of them. In particular, shift instructions with a nonzero shift count don't pipeline Fetch/Decode. It may be possible to always overlap at least one cycle, but I haven't tweaked the core yet to ensure this is a sound optimization.
- Shift instructions need work:
  - For a shift of zero, shift-immediate latency is 10 CPI, throughput 9 CPI. Shift-register latency is 11 CPI, throughput 10 CPI.
  - For a shift of nonzero n, shift-immediate and shift-register latency and throughput is 7 + 2*n CPI.
- Branch-not-taken latency and throughput is 7 CPI. Branch-taken latency and throughput is 8 CPI.
- JAL/JALR latency is 9 CPI, throughput is 7 CPI.
- Store latency and throughput is 8 CPI minimum. 2 cycles minimum are spent waiting for Wishbone ACK.
  - The core will not release STB/CYC between the store and fetch of the next instruction.
- Load latency is 10 CPI minimum, and throughput is 9 CPI. 2 cycles minimum are spent waiting for Wishbone ACK.
  - The core will release STB/CYC before fetch of the next instruction.
- CSR instructions require an extra Decode cycle compared to all other instructions (to check for legality).
  - At minimum, a read of a read-only zero CSR register has a latency of 7 CPI, and a throughput of 6 CPI.
  - At maximum, csrrc has a latency of 11 CPI, and a throughput of 10 CPI.
- Entering an exception handler requires 5 clocks from the cycle at which the exception condition is detected.
- mret has a latency and throughput of 8 CPI.
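Until a proper benchmark exists, the throughput figures above can be turned into a rough cycle estimate. The following is a minimal sketch, not part of Sentinel: the category names, THROUGHPUT_CPI, shift_cpi and estimate_cycles are all hypothetical names invented here, and the numbers are the best-case values quoted in the list (memory ACK arriving as early as possible, "simple" instructions fully overlapping the next fetch).

```python
# Best-case throughput (cycles per instruction) copied from the observations
# above. The category names are labels invented for this sketch.
THROUGHPUT_CPI = {
    "simple": 4,            # addi, or, etc. with full Fetch/Decode overlap
    "shift_imm_zero": 9,    # shift-immediate with a shift count of zero
    "shift_reg_zero": 10,   # shift-register with a shift count of zero
    "branch_not_taken": 7,
    "branch_taken": 8,
    "jump": 7,              # JAL/JALR
    "store": 8,             # minimum; assumes the earliest possible ACK
    "load": 9,              # minimum; assumes the earliest possible ACK
    "mret": 8,
}


def shift_cpi(n):
    """Cycles for a shift by a nonzero count n (latency equals throughput)."""
    if n <= 0:
        raise ValueError("use the *_zero categories for a shift count of zero")
    return 7 + 2 * n


def estimate_cycles(trace):
    """Sum best-case throughput over a trace.

    Each item is either a category name from THROUGHPUT_CPI, or a
    ("shift", n) tuple for a shift by a nonzero count n.
    """
    total = 0
    for item in trace:
        if isinstance(item, tuple):
            kind, n = item
            if kind != "shift":
                raise ValueError(f"unknown tuple item: {item!r}")
            total += shift_cpi(n)
        else:
            total += THROUGHPUT_CPI[item]
    return total
```

For example, estimate_cycles(["simple", ("shift", 3), "load", "branch_taken"]) adds 4 + 13 + 9 + 8 cycles. Treat the result as a lower bound: slower memory and instructions that overlap fewer Fetch/Decode cycles will both push the real count higher.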
Sentinel physically implements the following CSRs:
- mscratch
- mcause
  The core can only physically trigger a subset of defined exceptions:
  - Machine external interrupt
  - Instruction address misaligned
  - Illegal instruction
  - Breakpoint
  - Load address misaligned
  - Store address misaligned
  - Environment call from M-mode
  In particular worth noting:
  - Misaligned accesses are not implemented in hardware.
  - There is no machine timer (a 64-bit counter is a bit too much to ask for right now :(...).
- mip
  Only the MEIP bit is implemented. The RISC-V Privileged Spec says: "MEIP is read-only in mip, and is set and cleared by a platform-specific interrupt controller."
  The user must provide their own interrupt controller. One simple implementation is to OR all external interrupt sources together, and query each peripheral when MEIP is pending to find which peripherals need attention. This is implemented for the serial and timer peripherals in the attosoc example.
  In the future, I may implement the high (platform-specific) 16 bits of mip/mie to make interrupt-handling quicker.
- mie
  - Only the MEIE bit is implemented.
- mstatus
  - Only the MPP, MPIE, and MIE bits are implemented.
- mtvec
  - The BASE is writeable; only the Direct MODE setting is implemented.
- mepc
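The OR-based interrupt scheme described under mip can be modelled in a few lines of plain Python. This is a behavioural sketch only, not the attosoc implementation: Peripheral, meip and handle_external_interrupt are hypothetical names, and each peripheral is assumed to expose one level-sensitive interrupt line that is cleared by servicing it.

```python
class Peripheral:
    """Hypothetical peripheral with a single level-sensitive interrupt line."""

    def __init__(self, name):
        self.name = name
        self.irq_pending = False

    def service(self):
        # A real handler would read or write the peripheral's registers here;
        # in this model, servicing simply drops the interrupt line.
        self.irq_pending = False


def meip(peripherals):
    """Level seen on the irq pin: the OR of every peripheral's line."""
    return any(p.irq_pending for p in peripherals)


def handle_external_interrupt(peripherals):
    """Poll each peripheral in turn and service those that need attention.

    Returns the names of the peripherals that were serviced, in poll order.
    """
    serviced = []
    for p in peripherals:
        if p.irq_pending:
            p.service()
            serviced.append(p.name)
    return serviced
```

With a serial and a timer peripheral, raising only the timer's line makes meip() true, and the handler then services just the timer and leaves meip() false. Poll order doubles as a fixed priority in this model, which is one reason a wider mip/mie could make dispatch quicker.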
Additionally, the following CSRs are implemented as read-only zero (only the first 5 of the below registers trigger an exception on an attempt to write):
- mvendorid
- marchid
- mimpid
- mhartid
- mconfigptr
- misa
- mstatush
- mcountinhibit
- mtval
- mcycle
- minstret
- mhpmcounter3-31
- mhpmevent3-31
All remaining machine-mode CSRs are unimplemented and trigger an exception on any access:
- medeleg
- mideleg
- mcounteren
- mtinst
- mtval2
- menvcfg
- menvcfgh
- mseccfg
- mseccfgh
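The three CSR groups above can be summarized as a small lookup, which may be handy when writing tests or a software model. This is a sketch under stated assumptions: csr_access and the set names are hypothetical, per-bit restrictions (such as MEIP being read-only) are not modelled, and "ignored" for the remaining read-only zero registers is an inference from the note that only the first five trigger an exception on a write attempt.

```python
# CSRs with physical state behind them (individual bits may still be fixed).
IMPLEMENTED = {"mscratch", "mcause", "mip", "mie", "mstatus", "mtvec", "mepc"}

# Read-only zero; an attempt to write raises an exception.
RO_ZERO_WRITE_TRAPS = {"mvendorid", "marchid", "mimpid", "mhartid", "mconfigptr"}

# Read-only zero; a write does not raise an exception (assumed to be ignored).
RO_ZERO_WRITE_IGNORED = (
    {"misa", "mstatush", "mcountinhibit", "mtval", "mcycle", "minstret"}
    | {f"mhpmcounter{i}" for i in range(3, 32)}
    | {f"mhpmevent{i}" for i in range(3, 32)}
)

# Any access, read or write, raises an exception.
UNIMPLEMENTED = {
    "medeleg", "mideleg", "mcounteren", "mtinst", "mtval2",
    "menvcfg", "menvcfgh", "mseccfg", "mseccfgh",
}


def csr_access(name, write):
    """Classify an access as "ok", "reads zero", "ignored" or "exception"."""
    if name in IMPLEMENTED:
        return "ok"
    if name in RO_ZERO_WRITE_TRAPS:
        return "exception" if write else "reads zero"
    if name in RO_ZERO_WRITE_IGNORED:
        return "ignored" if write else "reads zero"
    if name in UNIMPLEMENTED:
        return "exception"
    raise KeyError(f"not a machine-mode CSR listed in this section: {name}")
```

For instance, csr_access("mhartid", write=True) yields "exception", while csr_access("mcycle", write=True) yields "ignored".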