Cook FPGA

 _____ ____   ____    _    
|  ___|  _ \ / ___|  / \   
| |_  | |_) | |  _  / _ \  
|  _| |  __/| |_| |/ ___ \ 
|_|   |_|    \____/_/   \_\

This repository is intended for folks who are new to FPGAs and want to learn about them. It is a collection of useful resources and links rather than a thorough FPGA tutorial. Traditional HDL (Hard and Difficult Language) is not the main focus; instead, we focus on using high-level languages (e.g., C++) to cook FPGA.

Originally, this repository was started by a newbie to record his learning of FPGA. It was later made public in the hope that it could help researchers start their journey with FPGA, with less pain and whiskey.

The resources collected here, and the way the contents are organized, are not in perfect shape. This repository is still raw and needs major improvements. Any form of contribution is welcome and appreciated.

Main contents:

README.md
- Basics about Digital Design
- Basics about FPGA
- Relevant Courses and Books
- Papers about FPGA internal
Xilinx
- xilinx.md
- xilinx_constraints.md
- xilinx_cheatsheet.md
- xilinx_lessons_vivado.md
- xilinx_lessons_hls.md
submodules/: GitHub repositories about FPGA
hls/: Sample Xilinx HLS C++ code
- AXI Stream
- Network protocol processing
xilinx_arty_a7/: Sample Xilinx projects for the Arty A7-100 board
- Tri-mode MAC reference design
- Simple LED
- Clocked LED
FAQ.md
- Some implementation questions about FPGA

Get Started

FPGA Intro

URL: RapidWright FPGA Architecture Basics
URL: RapidWright Xilinx Architecture Terminology
Book: Parallel Programming for FPGAs
- Basics of FPGA and HLS
URL: All about FPGAs, EE Times
Slides: Intro FPGA CSE467 UW
URL: I/O Pads
- BGA Wiki: In a BGA (ball grid array), the pins are replaced by pads on the bottom of the package. Compare it with a PGA (pin grid array) package and the difference between a pin and a pad becomes clear, as does the reason it is called a pad. This also tells you what the pad in the I/O block diagram is.

Digital Basics

Verilog

High-Level Synthesis (HLS)

Courses

Online: Real Digital
CMU ECE 18-643
- I like its slides; they are very informative. The slides on PR (partial reconfiguration), Verilog, and HLS are good.
- Also read its references; they are all quite good papers.
Cornell ECE5775 from Prof. Zhiru Zhang
GMU ECE 699 Software/Hardware Co-design S16
GMU ECE 699 Software/Hardware Co-design S15
- DAMN, this is a good and practical course.
MIT 6.111 Introductory Digital Systems Laboratory
MIT 6.375 Complex Digital Systems
UCB EECS 151/251A

Books

Parallel Programming for FPGAs
The Zynq book
- 15.5.3 Pipelining
- 15.5.4 Dataflow
FPGAs for Software Programmers
Data Processing on FPGAs, Synthesis Lectures on Data Management

Papers

Virtualization

How can operating system concepts be applied to FPGAs? How can on-board memory and on-chip logic be virtualized? And how is an FPGA ultimately different from a CPU in terms of resource sharing? The papers in this section could give you some hints.

General

Memory Hierarchy

(Papers dealing with BRAM, registers, on-board DRAM, and system DRAM.)
LEAP Scratchpads: Automatic Memory and Cache Management for Reconfigurable Logic, FPGA'11
- Main design hierarchy: BRAM is used as an L1 cache, on-board DRAM as an L2 cache, and host memory as the backing store. Everything is abstracted away behind their interface (similar to load/store), so programming is pretty much the same as writing for a CPU.
- According to Sec. 2.2.2, its scratchpad controller uses a simple segment-based mapping scheme, like the one in AmorphOS.
LEAP Shared Memories: Automating the Construction of FPGA Coherent Memories, FCCM'14
- Follow-up work on LEAP Scratchpads that extends it to provide cache coherence between multiple FPGAs.
- Coherent scratchpads with the MOSI protocol.
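To make the MOSI states concrete, here is a minimal C++ sketch of a textbook snooping-style MOSI transition function for one cache line. It illustrates the protocol under that assumption only. It is not LEAP's actual coherence engine, and the `State`, `Event`, `Transition`, and `step` names are hypothetical.

```cpp
#include <cassert>

// Textbook MOSI line states: Modified, Owned, Shared, Invalid.
enum class State { M, O, S, I };

// What one cache sees for one line: its own accesses, or requests from
// other caches observed (snooped) on the shared interconnect.
enum class Event { LocalRead, LocalWrite, RemoteRead, RemoteWrite };

// New state, plus whether this cache must supply the (possibly dirty)
// line to the requester instead of memory.
struct Transition {
    State next;
    bool supplies_data;
};

Transition step(State s, Event e) {
    const bool dirty_owner = (s == State::M || s == State::O);
    switch (e) {
    case Event::LocalRead:
        // A read miss in I fetches the line in S; valid states are unchanged.
        return {s == State::I ? State::S : s, false};
    case Event::LocalWrite:
        // Every write ends in M; from S, O, or I other copies must be invalidated first.
        return {State::M, false};
    case Event::RemoteRead:
        // A dirty owner moves to O and forwards the data, deferring the
        // write-back to memory. This deferral is what O adds over MSI.
        if (dirty_owner) return {State::O, true};
        return {s, false};
    case Event::RemoteWrite:
        // Another cache wants exclusive access: a dirty owner forwards the
        // data, and this copy is invalidated.
        return {State::I, dirty_owner};
    }
    assert(false && "unhandled event");
    return {s, false};
}
```

Suppose a line is written locally (I -> M) and then read by another cache. It moves to O and the data is forwarded directly, so the write-back to DRAM can be postponed.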
CoRAM: An In-Fabric Memory Architecture for FPGA-Based Computing
- CoRAM provides an interface for managing the on- and off-chip memory resources of an FPGA.
- Cache, TLB, NoC, it has almost everything. The thesis is very comprehensive and informative.
Sharing, Protection, and Compatibility for Reconfigurable Fabric with AMORPHOS, OSDI'18
- Hull: provides memory protection for on-board DRAM using segment-based address translation.
Virtualized Execution Runtime for FPGA Accelerators in the Cloud, IEEE Access'17
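Both LEAP's scratchpad controller and AmorphOS's Hull are noted above as using segment-based mapping. Below is a minimal C++ sketch of base-and-bound translation for on-board DRAM. It assumes one contiguous segment per accelerator slot, and `Segment` and `SegmentTable` are hypothetical names, not the interface of either system.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <utility>
#include <vector>

// One contiguous window of on-board DRAM per accelerator slot.
struct Segment {
    uint64_t base;   // physical start address in on-board DRAM
    uint64_t limit;  // segment size in bytes
};

// Hypothetical per-slot segment table (illustrative names only).
class SegmentTable {
public:
    explicit SegmentTable(std::vector<Segment> segs) : segs_(std::move(segs)) {}

    // Translate a slot-local address to a physical DRAM address.
    // std::nullopt models a protection fault (out-of-bounds access).
    std::optional<uint64_t> translate(std::size_t slot, uint64_t vaddr) const {
        if (slot >= segs_.size()) return std::nullopt;
        const Segment& s = segs_[slot];
        if (vaddr >= s.limit) return std::nullopt;  // bound check: one comparator
        return s.base + vaddr;                      // relocation: one adder
    }

private:
    std::vector<Segment> segs_;
};
```

The appeal for hardware is that translation is one comparison plus one addition, with no page-table walk. The cost is that each slot needs one contiguous region. Paging would lift that restriction at the price of page tables and a TLB.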

Dynamic Memory Allocation

A High-Performance Memory Allocator for Object-Oriented Systems, IEEE'96
SysAlloc: A Hardware Manager for Dynamic Memory Allocation in Heterogeneous Systems, FPL'15
- malloc() and free() for FPGA on-board DRAM.
Hi-DMM: High-Performance Dynamic Memory Management in High-Level Synthesis, IEEE'18
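SysAlloc and Hi-DMM move malloc()/free() into hardware. Their actual designs are not reproduced here. As a hedged illustration of why allocation can be hardware-friendly, here is a hypothetical fixed-size-block bitmap allocator in C++. `alloc()` plays the role of malloc() and `release()` the role of free(). The region base, block count, and block size are assumptions.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>

// Hypothetical allocator for a DRAM region split into NumBlocks
// fixed-size blocks; one bit per block marks it as in use.
template <std::size_t NumBlocks, uint64_t BlockBytes>
class BitmapAllocator {
public:
    explicit BitmapAllocator(uint64_t base) : base_(base) {}

    // malloc(): take the first free block. This loop is sequential; in
    // hardware the same search could be a priority encoder over the bitmap.
    std::optional<uint64_t> alloc() {
        for (std::size_t i = 0; i < NumBlocks; ++i) {
            if (!used_[i]) {
                used_[i] = true;
                return base_ + i * BlockBytes;
            }
        }
        return std::nullopt;  // out of memory
    }

    // free(): clear the bit of the block that starts at addr.
    void release(uint64_t addr) {
        assert(addr >= base_ && (addr - base_) % BlockBytes == 0);
        const std::size_t i = (addr - base_) / BlockBytes;
        assert(i < NumBlocks && used_[i]);
        used_[i] = false;
    }

    std::size_t in_use() const { return used_.count(); }

private:
    uint64_t base_;
    std::bitset<NumBlocks> used_;
};
```

Fixed-size blocks avoid tracking free-region sizes, at the cost of internal fragmentation. Handling variable-size requests (e.g., with size classes or a buddy scheme) is where real hardware allocators get more involved.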

Integrate with Virtual Memory

(Papers dealing with the OS virtual memory system. Note that all of these papers introduce some form of MMU into the FPGA so that FPGA logic can work with the host virtual memory system. This added MMU is similar to a CPU MMU in the sense that both do address translation. But note that the virtual memory system itself still runs in Linux, including page fault handling, swapping, and TLB shootdown. What would really stand out is implementing the virtual memory system itself in the FPGA. :-/ )
Virtual Memory Window for Application-Specific Reconfigurable Coprocessors, DAC'04
- Early work that adds a new MMU to the FPGA to let FPGA logic access on-chip DRAM. Note that this is not the system main memory, so the translation page table is different.
- Has some insights on prefetching and MMU CAM design.
Seamless Hardware Software Integration in Reconfigurable Computing Systems, 2005
- Follow-up summary of the earlier DAC'04 Virtual Memory Window work.
A Reconfigurable Hardware Interface for a Modern Computing System, FCCM'07
- This work adds a new MMU, including a 16-entry TLB, to the FPGA. The FPGA and CPU share the same user virtual address space and use the same physical memory. They share memory at cache-line granularity, so in this sense the FPGA is just another core. On a TLB miss in the FPGA MMU, the FPGA sends an interrupt to the CPU and lets software handle the miss. Handling TLB misses in software is not efficient, but they did make cache coherence between FPGA and CPU easy.
Low-Latency High-Bandwidth HW/SW Communication in a Virtual Memory Environment, FPL'08
- This work actually adds a new MMU to the FPGA, which works just like a CPU MMU. In some sense, it is similar to an IOMMU.
- But I think they missed one important aspect: cache coherence between CPU and FPGA. There is not much information about this in the paper; it seems they do not have a cache on the FPGA. Anyhow, this is why CCIX and OpenCAPI have recently been proposed.
Memory Virtualization for Multithreaded Reconfigurable Hardware, FPL'11
- Part of the ReconOS project
- They implemented a simple MMU inside the FPGA that includes a TLB. On protection violations or invalid page accesses, their MMU simply hands over to the CPU page-fault routines. How is this different from the FPL'08 one? Actually, IMO, they are the same.
S4 Virtualized Execution Runtime for FPGA Accelerators in the Cloud, IEEE Access'17
- This paper also implements a hardware MMU, but the virtual memory system still runs on Linux.
- Also listed in the Cloud Infrastructure section.
Lightweight Virtual Memory Support for Many-Core Accelerators in Heterogeneous Embedded SoCs, 2015
Lightweight Virtual Memory Support for Zero-Copy Sharing of Pointer-Rich Data Structures in Heterogeneous Embedded SoCs, IEEE'17
- Part of the PULP project.
- Essentially a software-managed IOMMU. The control path runs as a Linux kernel module; the datapath is a lightweight AXI transaction translation.
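Most papers in this section add an MMU with a small TLB to the FPGA and leave page-fault handling to Linux; the FCCM'07 work even hands TLB misses to the CPU. Here is a minimal C++ sketch of that split. It assumes 4 KiB pages and a 16-entry fully associative TLB with round-robin replacement. A hypothetical `MissHandler` callback stands in for the interrupt plus the software page-table walk. The class is illustrative, not any paper's design.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <functional>
#include <optional>
#include <utility>

// Hypothetical FPGA-side TLB. On a miss it calls miss_handler_, which
// models the interrupt asking host software to walk the OS page table.
class Tlb {
public:
    static constexpr int kEntries = 16;         // assumed TLB size
    static constexpr unsigned kPageShift = 12;  // assumed 4 KiB pages

    // Maps a virtual page number to a physical page number, or nullopt
    // if the host cannot resolve the fault.
    using MissHandler = std::function<std::optional<uint64_t>(uint64_t vpn)>;

    explicit Tlb(MissHandler h) : miss_handler_(std::move(h)) {}

    std::optional<uint64_t> translate(uint64_t vaddr) {
        const uint64_t vpn = vaddr >> kPageShift;
        const uint64_t offset = vaddr & ((uint64_t{1} << kPageShift) - 1);
        for (const Entry& e : entries_) {
            if (e.valid && e.vpn == vpn) {  // in hardware: a parallel CAM match
                ++hits;
                return (e.ppn << kPageShift) | offset;
            }
        }
        ++misses;
        const std::optional<uint64_t> ppn = miss_handler_(vpn);  // slow path via the CPU
        if (!ppn) return std::nullopt;
        entries_[next_victim_] = {true, vpn, *ppn};  // refill, round-robin victim
        next_victim_ = (next_victim_ + 1) % kEntries;
        return (*ppn << kPageShift) | offset;
    }

    int hits = 0;
    int misses = 0;

private:
    struct Entry {
        bool valid = false;
        uint64_t vpn = 0;
        uint64_t ppn = 0;
    };
    std::array<Entry, kEntries> entries_{};
    int next_victim_ = 0;
    MissHandler miss_handler_;
};
```

Every miss pays a round trip through the CPU, which is why a software-managed TLB miss is not efficient. The TLB hit rate decides how often that cost is paid.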

Integrate OS/CPU/FPGA

A Virtual Hardware Operating System for the Xilinx XC6200, FPL'96
Operating systems for reconfigurable embedded platforms: online scheduling of real-time tasks, IEEE'04
hthreads: a hardware/software co-designed multithreaded RTOS kernel, 2005
Reconfigurable computing: architectures and design methods, IEE'05
BORPH: An Operating System for FPGA-Based Reconfigurable Computers, PhD Thesis
FUSE: Front-end user framework for O/S abstraction of hardware accelerators, FCCM'11
ReconOS – an Operating System Approach for Reconfigurable Computing, IEEE Micro'14
- Lets FPGA logic invoke OS kernel functions. They built a shell in the FPGA and delegate threads on the CPU to achieve this.
- They implemented their own MMU (using pre-established page tables) to let FPGA logic access system memory. Ref.
- Read the "Operating Systems for Reconfigurable Computing" sidebar, nice summary.

Applications

What are the typical applications that can be offloaded into FPGA? What has already been done before? This section lists many interesting applications and systems deployed on FPGA.

Integrate with Frameworks

Map-reduce as a Programming Model for Custom Computing Machines, FCCM'08
- This paper proposes a model for translating MapReduce code written in C into code that can run on FPGAs and GPUs. Many details are omitted, and they don't actually have the compiler.
- Single-host framework, everything is in FPGA and GPU.
Axel: A Heterogeneous Cluster with FPGAs and GPUs, FPGA'10
- A distributed MapReduce framework targeting clusters with CPUs, GPUs, and FPGAs. The main idea is scheduling FPGA/GPU jobs.
- Distributed framework.
FPMR: MapReduce Framework on FPGA, FPGA'10
- A MapReduce framework on a single host's FPGA. You need to write the processing logic in Verilog/HLS and hook it into their framework. The framework mainly includes a data transfer controller and a simple scheduler that enables certain blocks at certain times.
- Single-host framework, everything is in FPGA.
Melia: A MapReduce Framework on OpenCL-Based FPGAs, IEEE'16
- Another framework, written in OpenCL; users program in OpenCL as well. Like the previous work, it is more about the framework design than about specific algorithms on FPGA.
- Single-host framework, everything is in FPGA. But they discuss running on multiple FPGAs.
- That makes four MapReduce FPGA papers here, and I believe there are more. The marriage between MapReduce and FPGA is not hard to understand: the FPGA can be viewed as another core with different capabilities. The real question is how to design a good scheduling algorithm and data moving/caching mechanisms, given the FPGA's reprogramming time and limited on-board memory. These papers give some hints on this.
UCLA: When Apache Spark Meets FPGAs: A Case Study for Next-Generation DNA Sequencing Acceleration, HotCloud'16
UCLA: Programming and Runtime Support to Blaze FPGA Accelerator Deployment at Datacenter Scale, SoCC'16
- A system that hooks FPGA with Spark.
- There is a line of work that hooks FPGAs into big data processing frameworks (Spark), so that the FPGA implementation and the scale-out software can be separated. Spark can schedule FPGA jobs on different machines and take care of scale-out, failure handling, etc. But I personally think this line of work is really just an extension of the ReconOS/FUSE/BORPH line of work. The main reason is that both lines of work try to integrate jobs running on the CPU with jobs running on the FPGA. The goal is an easier way for CPU and FPGA to talk, or, put another way, a better division of labor between them. Whether single-machine (like ReconOS, Melia) or distributed (like Blaze, Axel), they are essentially the same.
UCLA: Heterogeneous Datacenters: Options and Opportunities, DAC'16
- Follow-up work of Blaze. Nice comparison of big and wimpy cores.
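The MapReduce notes above raise a scheduling question: reprogramming is slow and on-board memory is limited. Here is a hedged C++ sketch of one simple heuristic. It places jobs onto PR slots by greedy earliest finish time and charges a fixed reconfiguration cost whenever a slot must load a different kernel. The model and all names (`Slot`, `Job`, `schedule`) are hypothetical and not taken from any of the papers.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <string>
#include <vector>

// A PR slot holds one kernel (bitstream) at a time.
struct Slot {
    std::string kernel;       // loaded kernel; "" means empty
    uint64_t busy_until = 0;  // time (us) at which the slot becomes free
};

struct Job {
    std::string kernel;
    uint64_t arrival_us;
    uint64_t run_us;
};

// Greedy placement in arrival order: each job goes to the slot where it
// would finish earliest, counting reconfig_us when the slot holds a
// different kernel (loading into an empty slot also counts). Returns the
// makespan and, optionally, the number of reconfigurations.
uint64_t schedule(std::vector<Slot>& slots, const std::vector<Job>& jobs,
                  uint64_t reconfig_us, int* reconfigs_out = nullptr) {
    uint64_t makespan = 0;
    int reconfigs = 0;
    for (const Job& j : jobs) {
        std::size_t best = 0;
        uint64_t best_finish = std::numeric_limits<uint64_t>::max();
        for (std::size_t i = 0; i < slots.size(); ++i) {
            const uint64_t start = std::max(slots[i].busy_until, j.arrival_us);
            const uint64_t cost = (slots[i].kernel == j.kernel) ? 0 : reconfig_us;
            const uint64_t finish = start + cost + j.run_us;
            if (finish < best_finish) {
                best_finish = finish;
                best = i;
            }
        }
        if (slots[best].kernel != j.kernel) ++reconfigs;
        slots[best].kernel = j.kernel;
        slots[best].busy_until = best_finish;
        makespan = std::max(makespan, best_finish);
    }
    if (reconfigs_out) *reconfigs_out = reconfigs;
    return makespan;
}
```

Take one slot and a 1000 us reconfiguration cost. The job order A, A, B finishes at 2300 us after two reconfigurations, while A, B, A finishes at 3300 us after three. Batching jobs by kernel therefore matters when reprogramming is slow.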

Cloud Infrastructure

Huawei: FPGA as a Service in the Cloud
UCLA: Customizable Computing: From Single Chip to Datacenters, IEEE'18
UCLA: Accelerator-Rich Architectures: Opportunities and Progresses, DAC'14
- Reminds me of OmniX. Disaggregation at a different scale.
- This paper actually targets the single-machine case, but it can reflect a distributed setting.
Enabling FPGAs in the Cloud, CF'14
- The paper raises four important aspects of enabling FPGAs in the cloud: Abstraction, Sharing, Compatibility, and Security. The FPGA itself requires a shell (the paper calls it service logic) and is partitioned into multiple slots. The things discussed in the paper are straightforward but worth reading. They did not solve the FPGA sharing issue, which was later solved by AmorphOS.
FPGAs in the Cloud: Booting Virtualized Hardware Accelerators with OpenStack, FCCM'14
- Uses OpenStack to manage FPGA resources. The FPGA is partitioned into multiple regions, and each region can be reconfigured with PR. The FPGA shell includes: 1) a basic MAC and packet dispatcher, 2) a memory controller with a segment-based partition scheme, and 3) a soft processor used for runtime PR control. One very important aspect of this project is that they envision the FPGA's input coming from Ethernet, which is very true nowadays. This also makes the project quite similar to Catapult. It is a very solid paper, though the evaluation is a little weak. What could be added: migration and different-sized regions.
- The CF and FCCM papers above are similar in the sense that both build a SW framework and a HW shell to provide a unified cloud management system. They differ in their shell design: the CF one takes inputs from a DMA engine (i.e., local system DRAM), while the FCCM one takes inputs from Ethernet. Everything after the DMA engine or MAC is essentially similar.
- It seems all of them use a simple segment-based memory partition for user FPGA logic. What are the pros and cons of using paging here?
S1 DyRACT: A partial reconfiguration enabled accelerator and test platform, FPL'14
S2 Virtualized FPGA Accelerators for Efficient Cloud Computing, CloudCom'15
S3 Designing a Virtual Runtime for FPGA Accelerators in the Cloud, FPL'16
S4 Virtualized Execution Runtime for FPGA Accelerators in the Cloud, IEEE Access'17
- The above four papers came from the same group of folks. S1 developed a framework to do PR over PCIe, okay. S2 is a follow-up on S1. Read S2's Section IV (hardware architecture) for many implementation details, such as the internal FPGA switch and the AXI stream interface. But there is no discussion of memory virtualization. S3 is a two-page short paper, and S4 is the realization of S3. I was particularly interested in whether S4 implemented its own virtual memory management. The answer is NO. S4 leverages on-chip Linux and just builds a customized MMU, using BRAM to store page tables, similar to the papers listed in Integrate with Virtual Memory. Many things discussed in S4 had already been proposed multiple times in earlier cloud FPGA papers since 2014.
MS: A Reconfigurable Fabric for Accelerating Large-Scale Datacenter Services, ISCA'14
MS: A Cloud-Scale Acceleration Architecture, Micro'16
- Catapult is unique in its shell, which includes the Lightweight Transport Layer (LTL) and the Elastic Router (ER). The cloud management part, which the paper only briefly mentions, should actually include everything the above CF'14 and FCCM'14 papers have. The LTL has congestion control, packet loss detection/resend, and ACK/NACK. The ER is a crossbar switch used by FPGA-internal modules, which is essential to connect the shell and roles.
- These two Catapult papers are simply a must read.
MS: A Configurable Cloud-Scale DNN Processor for Real-Time AI, ISCA'18
MS: Azure Accelerated Networking: SmartNICs in the Public Cloud, NSDI'18
MS: Direct Universal Access: Making Data Center Resources Available to FPGA, NSDI'19
- Catapult is just sweet, isn't it?
ASIC Clouds: Specializing the Datacenter, ISCA'16
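The LTL's loss detection, resend, and ACK/NACK are only named in the Catapult notes above. Below is a hedged C++ illustration of that kind of mechanism: a minimal sender that tracks sequence numbers only and ignores wraparound and congestion control. Go-back-N with cumulative ACKs is an assumption here, not a description of LTL's actual design, and `GoBackNSender` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical go-back-N sender. "wire" collects the sequence numbers
// that would be put on the network.
class GoBackNSender {
public:
    explicit GoBackNSender(uint32_t window) : window_(window) {}

    // Send the next new packet if the window has room; returns false otherwise.
    bool send(std::vector<uint32_t>& wire) {
        if (next_seq_ - base_ >= window_) return false;
        wire.push_back(next_seq_++);
        return true;
    }

    // Cumulative ACK: the receiver has every packet up to and including seq.
    void on_ack(uint32_t seq) {
        if (seq >= base_ && seq < next_seq_) base_ = seq + 1;
    }

    // NACK: the receiver saw a gap at `missing`, so everything before it
    // arrived. Go back and resend from the oldest unacknowledged packet.
    void on_nack(uint32_t missing, std::vector<uint32_t>& wire) {
        if (missing > base_ && missing <= next_seq_) base_ = missing;
        retransmit(wire);
    }

    // Timeout: no feedback in time, so resend the whole outstanding window.
    void on_timeout(std::vector<uint32_t>& wire) { retransmit(wire); }

    uint32_t unacked() const { return next_seq_ - base_; }

private:
    void retransmit(std::vector<uint32_t>& wire) const {
        for (uint32_t s = base_; s < next_seq_; ++s) wire.push_back(s);
    }

    uint32_t window_;
    uint32_t base_ = 0;      // oldest unacknowledged sequence number
    uint32_t next_seq_ = 0;  // next new sequence number
};
```

Go-back-N keeps receiver state trivial (accept only the next expected sequence number), which suits hardware. The cost is resending packets that may already have arrived.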

Programmable Network

Database

Accelerating database systems using FPGAs: A survey, FPL'18

Machine Learning

Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Networks, FPGA'15
From High-Level Deep Neural Models to FPGAs, ISCA'16
Deep Learning on FPGAs: Past, Present, and Future, arXiv'16
Accelerating binarized neural networks: Comparison of FPGA, CPU, GPU, and ASIC, FPT'16
FINN: A Framework for Fast, Scalable Binarized Neural Network Inference, FPGA'17
In-Datacenter Performance Analysis of a Tensor Processing Unit, ISCA'17
Accelerating Binarized Convolutional Neural Networks with Software-Programmable FPGAs, FPGA'17
A Configurable Cloud-Scale DNN Processor for Real-Time AI, ISCA'18
A Network-Centric Hardware/Algorithm Co-Design to Accelerate Distributed Training of Deep Neural Networks, MICRO'18
DNNBuilder: an Automated Tool for Building High-Performance DNN Hardware Accelerators for FPGAs, ICCAD'18
FA3C: FPGA-Accelerated Deep Reinforcement Learning, ASPLOS'19

Graph

A Scalable Processing-in-Memory Accelerator for Parallel Graph Processing, ISCA'15
Energy Efficient Architecture for Graph Analytics Accelerators, ISCA'16
Boosting the Performance of FPGA-based Graph Processor using Hybrid Memory Cube: A Case for Breadth First Search, FPGA'17
FPGA-Accelerated Transactional Execution of Graph Workloads, FPGA'17
An FPGA Framework for Edge-Centric Graph Processing, CF'18

KVS

Achieving 10Gbps line-rate key-value stores with FPGAs, HotCloud'13
Thin Servers with Smart Pipes: Designing SoC Accelerators for Memcached, ISCA'13
An FPGA Memcached Appliance, FPGA'13
Scaling out to a Single-Node 80Gbps Memcached Server with 40Terabytes of Memory, HotStorage'15
KV-Direct: High-Performance In-Memory Key-Value Store with Programmable NIC, SOSP'17
- The Morning Paper write-up is also useful for better understanding this paper.
Ultra-Low-Latency and Flexible In-Memory Key-Value Store System Design on CPU-FPGA, FPT'18

Genome

Consensus

Consensus in a Box: Inexpensive Coordination in Hardware, NSDI'16

Video Processing

TODO

Blockchain

TODO

Micro-services

TODO

Languages

From JVM to FPGA: Bridging Abstraction Hierarchy via Optimized Deep Pipelining, HotCloud'18

FPGA Internal

General

FPGA and CPLD architectures: a tutorial, 1996
Reconfigurable computing: a survey of systems and software, 2002
Reconfigurable computing: architectures and design methods, IEE'05
FPGA Architecture: Survey and Challenges, 2007
- Read the first two paragraphs of each section, then come back and read the rest if needed.
RAMP: Research Accelerator For Multiple Processors, 2007
Three Ages of FPGAs: A Retrospective on the First Thirty Years of FPGA Technology, IEEE'15

Partial Reconfiguration

Logical Optimization and Technology Mapping

Place and Route

RTL2FPGA

AoyuQC/FPGA
