# Building RISC-V Processors for Space Applications

**Space Computing Conference** 

# **RISC-V** growing exponentially



### Why RISC-V for Space Computing

- Open ISA = Fastest growing ecosystem in the industry
  - RISC-V = exponential growth in number of users
  - Secure, safe choice for long term government initiatives
  - 60B RISC-V cores manufactured by 2025
- Scalable portfolio, addresses multiple performance points
- Multiple vendors allow for healthy competition without single supplier lock-in

Confidential © 2021 SiFive. All Rights Reserved.

### SiFive RISC-V Leadership



### We invented **RISC-V**

SiFive's founders are the same UC Berkeley professor and PhDs who invented the RISC-V Instruction Set Architecture (ISA) and have been leading its commercial implementation since 2010.



"2018/2019/2020: SiFive Recognized as Most Respected Private Semiconductor Company"




### Powered By SiFive - Examples



# Leaders adopting SiFive Processors



"We're excited to partner with SiFive because of their ability to deliver CPUs and software for the modern RISC-V ecosystem."

- Jim Keller, President and CTO, Tenstorrent

# RENESAS

"We are very excited to work with SiFive as their lead partner to develop next-generation semiconductor solutions through the collaboration of our accumulated expertise in the automotive field, and SiFive's high-end RISC-V technologies."

- Takeshi Kataoka, Senior Vice President, General Manager, Automotive Solution Business Unit, Renesas

# SAMSUNG

"We are pleased to work with SiFive to accelerate our customers' AI / ML custom SoC design, illustrating the innovation potential we can achieve together."

- Mijung Noh, VP, Foundry Design Service Team, Samsung Electronics

### SiFive: The Undisputed Leader in RISC-V Computing

### CPU Cores



#### 32 and 64-bit Processors

- Microcontrollers, IoT devices, real-time control, control plane processing
- Highly customizable to application specific requirements
- Mature, industry proven designs
- Compare to Arm Cortex-M, Cortex-R, and up to Cortex-A53



### SiFive Performance

#### **64-bit Application Processors**

- Networking, Infrastructure, Enterprise, Consumer
- Highest performance, most advanced RISC-V Processor Available
- Scale out, high performance, processing capabilities with vector compute
- Compare to the latest Cortex-A

### AI Cores



#### **Scalable AI Processors**

- Edge AI, Cloud, Training, Inference
- Very high performance and efficiency for AI workloads (vector processing)
- Built on top of RISC-V Vectors and the SiFive Intelligence Extensions
- Compare to Arm SVE2

### SiFive RISC-V Processor IP Portfolio

|  | 2-Series | 3/5-Series | 7-Series | P200-Series | P500-Series |
|---|---|---|---|---|---|
| <b>Product family</b> | SiFive Essential | SiFive Essential | SiFive Essential | SiFive Performance | SiFive Performance |
| <b>Series profile</b> | Power & area optimized<br>2-3 stage single-issue | Efficient performance<br>5-6 stage single-issue | High performance<br>8 stage superscalar | Efficient performance<br>Integrated Vectors | Maximum performance<br>3-wide OoO Superscalar |
| <b>U Cores</b><br>64-bit application<br>Linux, datacenter, network baseband |  | U5-Series<br>Linux-capable application processors<br>U54, U54-MC<br>Compare to Cortex-A3x | U7-Series<br>High performance Linux-capable<br>U74, U74-MC<br>Compare to Cortex-A53 | P270<br>Efficient Performance Vector Application Core<br>Compare to Cortex-A55 | P550<br>Highest Performance Application Core<br>Compare to Cortex-A72, A73, A75 |
| <b>S Cores</b><br>64-bit embedded<br>Storage, AR/VR, Realtime | S2-Series<br>Area-optimized 64-bit microcontrollers<br>S21<br>No 64-bit Arm equivalent | S5-Series<br>S51, S54<br>Compare to Cortex-R4, R5 | S7-Series<br>High performance 64-bit embedded<br>S76, S76-MC<br>Compare to Cortex-R8 | SiFive Intelligence X200-Series<br>AI Accelerated Processor<br>SW + HW Solutions |  |
| <b>E Cores</b><br>32-bit embedded<br>MCU, edge computing, AI, IoT | E2-Series<br>Our smallest, most efficient cores<br>E20, E21, E24<br>Compare to Cortex-M0+, M3, M4F | E3-Series<br>Balanced performance and efficiency<br>E31, E34<br>Compare to Cortex-R4, R5 | E7-Series<br>High performance 32-bit embedded<br>E76, E76-MC<br>Compare to Cortex-M7 | X280-Series<br>Scalable, Programmable Edge Training AI/ML<br>X280-VLEN 512b<br>Compare to Arm SVE2 |  |

Scalable from Microcontrollers to HPC



## SiFive P550 Application Processor

### **High-Performance Out-of-Order RISC-V Application Processor**

The P550 Series is a 13-stage, triple-issue, out-of-order processor that makes high-performance RISC-V processors a reality while maintaining class-leading area and performance density. The P550 processor supports multicore coherence with up to 4 cores in a core complex.



# SiFive P550

Highest performance 64-bit RISC-V application processor

#### Breakthrough RISC-V performance

- 3x performance per mm² compared to Cortex-A75\*\*
- Performance >8 SpecINT2k6/GHz, higher single-threaded performance than Cortex-A75\*

#### P550 Core Architectural Features

- RV64GBC capable core with Sv39/Sv48 Virtual Memory Support
- Three Issue, out-of-order Pipeline tuned for scalable performance
- Private L2 Caches and Streaming Prefetcher for improved memory performance
- SECDED ECC with Error Reporting

#### Enabling next generation applications

- Cache stashing to L3 for tightly coupled accelerators
- Mix+Match capable for real-time deterministic workloads
- Available now!





\* Cortex-A75 performance data source: https://www.anandtech.com/show/13503/the-mate-20-mate-20-pro-review/3

\*\* Based on P550 with 32KB L1 I\$ and D\$ and 256KB pL2 vs estimated 7nm Cortex-A75 area from: https://www.linleygroup.com/mpr/article.php?id=12151

### P550 - 3x Performance/Area vs Cortex-A75

P550 (HiPerf Config) is achieving > 8.65 SpecINT2k6/GHz geomean, and > 9.3 SpecFP2k6/GHz (C/C++ only) geomean

P550 delivers higher performance than Cortex-A75 in both benchmarks\*





Source: https://www.anandtech.com/show/13503/the-mate-20-mate-20-pro-review/3

# P550 is *less than half* the area of a comparable Cortex-A75

#### P550 and Cortex-A75 Area in 7nm



Source: https://www.linleygroup.com/mpr/article.php?id=12151

Chart values (relative area in 7nm): Cortex-A75 2.5, P550 1.00



## Accelerators + RISC-V Vectors ⇒ Rich Software Ecosystem

RVV enables computation on data arrays, complementing scalar ops

- Vector length agnostic programming model
- Pervasiveness of RISC-V ensures expansive SW ecosystem
- Extensible RISC-V ISA enables domain specific extensions

Configurable SiFive IPs enable large design space for accelerators:



- VLEN: length in bits of each vector register
- ELEN: maximum element (data type) width supported
- DLEN: width of the datapaths and load/store paths used for moving vector data
- LMUL: multiplier that groups vector registers to form even longer vectors
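
How these parameters interact can be sketched in a few lines of C. The relation VLMAX = LMUL × VLEN / SEW comes from the RVV specification; SEW (the selected element width, at most ELEN) is not named on the slide. The function names and the datapath-pass estimate are illustrative assumptions, not SiFive APIs, and the pass count is only a rough model that ignores pipelining.

```c
#include <assert.h>
#include <stdint.h>

/* Elements per vector register group: VLMAX = LMUL * VLEN / SEW.
 * LMUL is passed as a fraction (lmul_num / lmul_den) so fractional
 * settings such as 1/2 can be expressed. Names are hypothetical. */
uint32_t rvv_vlmax(uint32_t vlen_bits, uint32_t sew_bits,
                   uint32_t lmul_num, uint32_t lmul_den)
{
    assert(sew_bits != 0 && lmul_den != 0);
    return (vlen_bits * lmul_num) / (sew_bits * lmul_den);
}

/* Rough estimate (an assumption, not a documented figure) of how many
 * passes a full register group needs through a DLEN-wide datapath:
 * ceil(LMUL * VLEN / DLEN). */
uint32_t rvv_datapath_passes(uint32_t vlen_bits, uint32_t dlen_bits,
                             uint32_t lmul_num, uint32_t lmul_den)
{
    assert(dlen_bits != 0 && lmul_den != 0);
    uint32_t group_bits = (vlen_bits * lmul_num) / lmul_den;
    return (group_bits + dlen_bits - 1) / dlen_bits;
}
```

With a 512-bit VLEN, `rvv_vlmax(512, 32, 1, 1)` gives 16 single-precision elements per instruction, and grouping eight registers of 8-bit elements, `rvv_vlmax(512, 8, 8, 1)`, gives 512.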


## The RISC-V Vector (RVV) Advantage

SIMD is inefficient:

- Big startup overhead to pick code variant (alignment/array size/length, etc.)
- Large code size to handle "bookkeeping", including
  - odd-sized application vectors (fringe cases)
  - prolog/epilog for software pipelining
- To accelerate SIMD, register width is doubled ... but so is the instruction set
  - E.g., IA-32 grew from 80 to ~1400 instructions since 1978, largely fueled by SIMD.
  - Do you remember what "vfmadd213pd" means and when to use it?

### **RVV to the Rescue!**

- Dynamically reconfigurable vector registers
  - Change vector length and data type at runtime
  - Group (architectural) registers into longer logical registers ("LMUL")
- Portable code: works for any "VLEN"!
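
The vector-length-agnostic idea can be illustrated with a portable C model of a strip-mined loop. This is a sketch under stated assumptions: `vsetvl_model` is a hypothetical stand-in for the `vsetvli` instruction (it grants `min(remaining, VLMAX)`, one of the behaviors the RVV specification permits), and the inner scalar loop stands in for the vector load, multiply, add, and store instructions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of vsetvli: the hardware grants at most VLMAX
 * elements for this pass, or fewer if fewer remain. */
size_t vsetvl_model(size_t avl, size_t vlmax)
{
    assert(vlmax > 0);
    return avl < vlmax ? avl : vlmax;
}

/* Strip-mined saxpy. The same loop is correct for any vlmax; only the
 * number of passes changes. Returns the number of passes taken. */
size_t saxpy_stripmined(size_t n, float a, const float *x, float *y,
                        size_t vlmax)
{
    size_t passes = 0;
    for (size_t done = 0; done < n; ) {
        size_t vl = vsetvl_model(n - done, vlmax);
        /* Stands in for vle32.v / vfmul / vfadd / vse32.v on vl elements. */
        for (size_t i = 0; i < vl; ++i)
            y[done + i] = a * x[done + i] + y[done + i];
        done += vl;
        ++passes;
    }
    return passes;
}
```

There is no separate tail loop for odd-sized arrays: the final pass simply receives a shorter `vl`, which is the bookkeeping that fixed-width SIMD code has to spell out by hand.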

# SiFive Intelligence X280

### Vector Processing for AI/ML workloads

- SiFive Intelligence Extensions: custom instructions that accelerate performance-critical AI/ML operations

### 512-bit vector length processor

- Variable length operations, up to 512 bits of data per cycle, with dynamic vector length configuration
- Ideal balance of control and data parallel compute

### Performance

- 5.7 CoreMarks/MHz
- 2.3 TOPS (INT8 Matrix Multiplication)\*
  - Multiple instantiations of the processor give much higher TOPS

### Scalar processing built from U7 series core

- 64-bit RISC-V ISA, 8-stage dual-issue in-order pipeline

### High performance vector memory subsystem

- Multi-layer Caching support for optimum data movement
- Stride Pre-fetcher
- Virtual memory support, up to 48-bit addressing

### High performance, flexible connectivity to SoC peripherals

- Multi-core processor configuration with up to 4 cores



# X280, Wide Range of Use Cases, with AI Computation



- Advanced Controller/Applications core with AI optimized Vector Extensions
- Multi-core configuration options
- Small size, lower power consumption

- Advanced Controller/Applications core with AI optimized Vector Extensions
- Multi-core configuration options
- Connectivity to NN accelerators

- Advanced Controller/Applications core with AI optimized Vector Extensions
- Multiple instantiations of Multi-core configuration options
- Connectivity to large NN accelerators
- Single unified system memory

### Extending LLVM with Autovectorizing C++ Compiler

SiFive is leading the effort to bring state-of-the-art compiler technology to RVV. Generic C code can be automatically vectorized to take advantage of vector processing. Further optimization work, such as multiply-add fusion, is in progress. This example is load/store bound, so it is close to optimal. The inventor of LLVM is on the SiFive leadership team!

```
#include <stddef.h>

/* y = a * x + y over n single-precision elements */
void saxpy_golden(size_t n, const float a, const float *x, float *y)
{
    for (size_t i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

LLVM Output:

```
saxpy_golden:                           # @saxpy_golden
# %bb.0:                                # %entry
        beqz     a0, .LBB0_3
# %bb.1:                                # %vector.ph
        mv       a3, zero
        vsetvli  a4, zero, e32,m1,tu,mu
        vfmv.v.f v1, fa0
.LBB0_2:                                # %vector.body
                                        # =>This Inner Loop Header: Depth=1
        slli     a4, a3, 2
        add      a6, a1, a4
        sub      a5, a0, a3
        vsetvli  a5, a5, e32,m1,tu,mu
        vle32.v  v2, (a6)
        add      a4, a4, a2
        vle32.v  v3, (a4)
        vfmul.vv v4, v2, v1
        vfadd.vv v2, v4, v3
        add      a3, a3, a5
        vse32.v  v2, (a4)
        bne      a3, a0, .LBB0_2
.LBB0_3:                                # %for.cond.cleanup
        ret
```

### **Full Support for TensorFlow Lite**



### SiFive Recode: Translates SIMD code to RISC-V Vectors

Protect your existing software investment, migrate with confidence



#### Compiled SiFive Assembly Code



### Fault Tolerance Plans

#### Dual Core Lock Step (DCLS):

- Supporting ASIL-D system as SEooC
- Cluster works as 2 cores (primary)
- N-cycle delayed shadow checker (N would be small)
- FMEDA: Failure Modes, Effects & Diagnostic Analysis

#### Split-Lock Mode:

- Supporting ASIL-B system as SEooC
- Cluster works as 4 cores when comparators and replicators are disabled
- Integrated comparators compare outputs from the logical and redundant processing elements to detect divergence
- Each core is ASIL-B capable
- FMEDA: Failure Modes, Effects & Diagnostic Analysis
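
The N-cycle delayed shadow checker and the comparators described above could be modeled roughly as follows. This is a behavioral sketch only, not SiFive's implementation: the `dcls_checker` type, its functions, and the 32-bit "output" word are all hypothetical. The primary core's per-cycle output is held in a small circular buffer and compared against the shadow core's output N cycles later.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DCLS_MAX_DELAY 8  /* illustrative upper bound on N */

typedef struct {
    uint32_t held[DCLS_MAX_DELAY]; /* primary outputs awaiting the shadow */
    size_t delay;                  /* N: shadow lags primary by N cycles  */
    size_t cycle;                  /* cycles processed so far             */
    bool fault;                    /* sticky divergence flag              */
} dcls_checker;

void dcls_init(dcls_checker *c, size_t delay)
{
    assert(c != NULL && delay > 0 && delay <= DCLS_MAX_DELAY);
    c->delay = delay;
    c->cycle = 0;
    c->fault = false;
}

/* Call once per cycle. shadow_out is what the shadow core produces now,
 * which should equal what the primary produced N cycles ago.
 * Returns true once any divergence has been seen. */
bool dcls_step(dcls_checker *c, uint32_t primary_out, uint32_t shadow_out)
{
    size_t slot = c->cycle % c->delay; /* holds primary output from cycle - N */
    if (c->cycle >= c->delay && c->held[slot] != shadow_out)
        c->fault = true;
    c->held[slot] = primary_out;
    c->cycle++;
    return c->fault;
}
```

Delaying the shadow means a transient disturbance that hits both cores at the same instant corrupts different points of execution in each, so the comparison still has a chance to catch it.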





## Other considerations for fault tolerance

#### **Memory Protection:**

- All SRAMs within core protected (e.g., parity, ECC, others) with few exceptions
- Single Error Correction, Double Error Detection ECC and Parity protection in cache
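
To make "Single Error Correction, Double Error Detection" concrete, here is a minimal sketch using an extended Hamming (8,4) code on a 4-bit value. The code choice, word size, and all names are assumptions for illustration; real cache ECC protects much wider words and the source does not say which code SiFive uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { ECC_OK, ECC_CORRECTED, ECC_UNCORRECTABLE } ecc_status;

/* Parity (XOR) of all eight bits. */
unsigned parity8(uint8_t v)
{
    v ^= v >> 4; v ^= v >> 2; v ^= v >> 1;
    return v & 1u;
}

/* Bit layout: bit0 = overall parity, bits 1,2,4 = Hamming parity,
 * bits 3,5,6,7 = data d0..d3. */
uint8_t secded_encode(uint8_t nibble)
{
    assert(nibble < 16);
    unsigned d0 = nibble & 1u, d1 = (nibble >> 1) & 1u;
    unsigned d2 = (nibble >> 2) & 1u, d3 = (nibble >> 3) & 1u;
    unsigned p1 = d0 ^ d1 ^ d3, p2 = d0 ^ d2 ^ d3, p4 = d1 ^ d2 ^ d3;
    uint8_t cw = (uint8_t)((p1 << 1) | (p2 << 2) | (d0 << 3) | (p4 << 4) |
                           (d1 << 5) | (d2 << 6) | (d3 << 7));
    cw |= (uint8_t)parity8(cw);  /* make overall parity even */
    return cw;
}

ecc_status secded_decode(uint8_t cw, uint8_t *nibble)
{
    assert(nibble != NULL);
    unsigned syndrome = 0;
    for (unsigned pos = 1; pos < 8; ++pos)
        if ((cw >> pos) & 1u)
            syndrome ^= pos;          /* nonzero syndrome = failing position */
    ecc_status st = ECC_OK;
    if (parity8(cw)) {                /* odd overall parity: single-bit error */
        cw ^= (uint8_t)(1u << syndrome);  /* syndrome 0 means bit0 itself */
        st = ECC_CORRECTED;
    } else if (syndrome != 0) {       /* even parity but bad syndrome: two errors */
        return ECC_UNCORRECTABLE;
    }
    *nibble = (uint8_t)(((cw >> 3) & 1u) | (((cw >> 5) & 1u) << 1) |
                        (((cw >> 6) & 1u) << 2) | (((cw >> 7) & 1u) << 3));
    return st;
}
```

Any single flipped bit is repaired transparently, while any two flipped bits are reported rather than silently miscorrected, which is the property that matters for error reporting in a radiation environment.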

#### **Interconnect Protection:**

- Multi-cluster NoC will support edge-to-edge ECC & CRC where applicable

#### Safety Test Software Libraries:

- Boot time diagnostics
- Non-destructive runtime diagnostics: these execute during normal system operation, so they must not destroy any data and must complete their specific function within a clearly defined time
- Safety-ensured toolchain



## Freedom U SDK

### What is Freedom U SDK

- A software development kit providing a Yocto/OpenEmbedded recipe to build your own custom Linux distribution
  - System is built from reusable and composable layers
- Build embedded Linux images and userspaces
  - Build U-Boot SPL, OpenSBI, U-Boot, Device Tree Binary
  - Customize userspace for your needs
  - Pre-built disk image configured for software development
- Example: meta-sifive is built on top of meta-riscv
  - SiFive specific patches
  - SiFive specific targets like Freedom Unleashed and QEMU

Yocto / OpenEmbedded Layers



## Investing in High-Performance

#### 3/5-Series

Efficient performance, 5-6 stage single-issue

#### 7-Series

High performance, 8 stage superscalar

#### U5-Series

Linux-capable application processors: U54, U54-MC. Compare to Cortex-A3x.

#### U7-Series

High performance, Linux-capable: U74, U74-MC. Compare to Cortex-A53.

#### P200-Series

Efficient performance, integrated vectors

#### P270

Efficient Performance Vector Application Core. Compare to Cortex-A55.

#### P500-Series

Maximum performance, 3-wide OoO superscalar

#### P550

Highest Performance Application Core. Compare to Cortex-A72, A73, A75.

SiFive has been leading the RISC-V movement into the high-end since launching the Coherent Multi-Core capable U5 Core Series in 2017 and the U7 Core Series in 2018.



### **HiFive Unleashed**

- World's first coherent multicore RISC-V Linux-capable silicon
- 4x SiFive U54 cores, 8GB DDR, Ethernet, serial and digital IO
- Used to build the RISC-V Linux ecosystem
- First available **February 2018**

High-performance processors take more than great hardware to reach their full potential. Recognizing this, SiFive has also invested in creating RISC-V silicon to enable and empower the ecosystem.



### **HiFive Unmatched**

- Higher performance and more connectivity
- 4x SiFive U74 cores, 16GB DDR, PCIe Gen 3
- Used to establish Linux build farms and further expand the Linux ecosystem
- First available Q1 2021

### **FPGA Software Development Platform**

# Off-the-Shelf FPGA Platform for Multi-Core Software Development

- Based on Xilinx VCU118, a readily available high-capacity FPGA development board
- Image contains P550 and IO subsystem running at 32MHz
- Ethernet, Serial, Debug, Trace, 4GB DDR

### Freedom-U-SDK Linux BSP

- SD-card-based Linux boot image with native development tools
- Cross-compiler environment for userspace application development
- Yocto-based workflow also available for BSP-level development



### SiFive Insight Advanced Trace and Debug

All SiFive licenses include JTAG run-control debug, including multi-core support. In addition to standard debug, the Advanced Trace and Debug option adds instruction trace using the Nexus 5001 HTM encoding with class-leading trace compression. The Advanced Trace and Debug option allows trace to be sent to dedicated SRAM, system memory, or a dedicated pin interface for maximum flexibility.

| Feature                    | Notes                                                                                        |
|----------------------------|----------------------------------------------------------------------------------------------|
| Nexus HTM Trace<br>Encoder | Trace compression as low as 0.4 bits/instruction                                             |
| PC Sampling                | Enables non-intrusive hot-spot execution profiling                                           |
| Trace Funnel               | Combines trace from multiple cores into a single stream                                      |
| Multiple Trace Sinks       | SRAM, PIB, System Memory                                                                     |
| 40-bit Trace Timestamps    | Encodes timing information into trace packets                                                |
| Instrumented Trace (ITC)   | Allows the programmer to instrument<br>code with printf style writes to the<br>Trace Encoder |
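
The compression and timestamp figures in the table translate into sink bandwidth and wrap-around time with simple arithmetic. The sketch below is a back-of-the-envelope estimate: the clock rate, the instructions-per-cycle value, and the assumption that timestamps tick at the core clock are hypothetical inputs, and 0.4 bits/instruction is the table's best case ("as low as"), not a guaranteed rate.

```c
#include <assert.h>
#include <stdint.h>

/* Trace bits per second = clock (Hz) * instructions per cycle
 * * trace bits per instruction. */
double trace_bandwidth_bps(double clock_hz, double ipc, double bits_per_instr)
{
    assert(clock_hz >= 0.0 && ipc >= 0.0 && bits_per_instr >= 0.0);
    return clock_hz * ipc * bits_per_instr;
}

/* Seconds until an n-bit timestamp counter wraps at the given tick rate. */
double timestamp_wrap_seconds(unsigned timestamp_bits, double tick_hz)
{
    assert(timestamp_bits < 64 && tick_hz > 0.0);
    return (double)(UINT64_C(1) << timestamp_bits) / tick_hz;
}
```

For example, a hypothetical core retiring one instruction per cycle at 1 GHz would generate about 400 Mbit/s of trace at 0.4 bits/instruction, and a 40-bit timestamp ticking at 1 GHz would wrap roughly every 1100 seconds.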



# Summary

- RISC-V offers the best architecture for long-term mission critical applications
- SiFive is the recognized leader in RISC-V computing
  - Based in the USA in California and Texas
- SiFive Performance and Intelligence families are available today
  - Deliver unsurpassed performance and efficiency
- Clear development path to new features enabling fault tolerance

