AoyuQC/FPGA

 _____ ____   ____    _    
|  ___|  _ \ / ___|  / \   
| |_  | |_) | |  _  / _ \  
|  _| |  __/| |_| |/ ___ \ 
|_|   |_|    \____/_/   \_\

Cook FPGA

This repository is intended for folks who are new to FPGAs and want to learn something about them. It is a collection of useful resources and links rather than a thorough FPGA tutorial. Traditional HDL (Hard and Difficult Language) is not the main focus; instead, we focus on using high-level languages (e.g., C++) to cook FPGA.

Originally, this repository was started by a newbie to record his learning of FPGA, and was later made public in the hope that it could help researchers start their journey with FPGA, with less pain and whiskey.

The resources collected here, and the way the contents are organized, are not in perfect shape. This repository is still raw and needs major improvements. Any form of contribution is welcome and appreciated.

Main contents:

  • README.md
    • Basics about Digital Design
    • Basics about FPGA
    • Relevant Courses and Books
    • Papers about FPGA internal
  • Xilinx
    • xilinx.md
    • xilinx_constraints.md
    • xilinx_cheatsheet.md
    • xilinx_lessons_vivado.md
    • xilinx_lessons_hls.md
  • submodules/: Github repositories about FPGA
  • hls/: Sample Xilinx HLS C++ code
    • AXI Stream
    • Network protocol processing
  • xilinx_arty_a7: Sample Xilinx projects for Arty A7 100 board
    • Tri-mode MAC reference design
    • Simple LED
    • Clocked LED
  • FAQ.md
    • Some implementation questions about FPGA

Get Started

FPGA Intro

Digital Basics

Verilog

High-Level Synthesis (HLS)

Courses

Books

Papers

Virtualization

How can Operating System concepts be applied to FPGAs? How can on-board memory and on-chip logic be virtualized? And how is an FPGA ultimately different from a CPU in terms of resource sharing? Papers in this section could give you some hints.

General

Memory Hierarchy

Dynamic Memory Allocation

Integrate with Virtual Memory

Integrate OS/CPU/FPGA

Applications

What are the typical applications that can be offloaded to an FPGA? What has already been done? This section lists many interesting applications and systems deployed on FPGA.

Integrate with Frameworks

  • Map-reduce as a Programming Model for Custom Computing Machines, FCCM'08
    • This paper proposes a model to translate MapReduce code written in C to code that could run on FPGA and GPU. Many details are omitted, and the authors don't really have a working compiler.
    • Single-host framework, everything is in FPGA and GPU.
  • Axel: A Heterogeneous Cluster with FPGAs and GPUs, FPGA'10
    • A distributed MapReduce Framework, targets clusters with CPU, GPU, and FPGA. Mainly the idea of scheduling FPGA/GPU jobs.
    • Distributed Framework.
  • FPMR: MapReduce Framework on FPGA, FPGA'10
    • A MapReduce framework on a single host's FPGA. You need to write Verilog/HLS for the processing logic to hook into their framework. The framework mainly includes a data transfer controller and a simple scheduler that enables certain blocks at certain times.
    • Single-host framework, everything is in FPGA.
  • Melia: A MapReduce Framework on OpenCL-Based FPGAs, IEEE'16
    • Another framework, written in OpenCL, and users can use OpenCL to program as well. Similar to previous work, it's more about the framework design, not specific algorithms on FPGA.
    • Single-host framework, everything is in FPGA. But they have a discussion on running on multiple FPGAs.
    • Four MapReduce-on-FPGA papers are listed here; I believe there are more. The marriage between MapReduce and FPGA is not hard to understand: an FPGA can be viewed as another core with different capabilities. The real question is how to design a good scheduling algorithm and good data movement/caching mechanisms, given the FPGA's reprogramming time and limited on-board memory. These papers give some hints on this.
  • UCLA: When Apache Spark Meets FPGAs: A Case Study for Next-Generation DNA Sequencing Acceleration, HotCloud'16
  • UCLA: Programming and Runtime Support to Blaze FPGA Accelerator Deployment at Datacenter Scale, SoCC'16
    • A system that hooks FPGA with Spark.
    • There is a line of work that hooks FPGAs into big data processing frameworks (Spark), so that the FPGA implementation and the scale-out software can be separated. Spark can schedule FPGA jobs onto different machines and take care of scale-out, failure handling, etc. But I personally think this line of work is really just an extension of the ReconOS/FUSE/BORPH line of work. The main reason is that both lines try to integrate jobs running on the CPU with jobs running on the FPGA, so that the CPU and FPGA have an easier way to talk, or, put another way, a better division of labor. Whether single-machine (like ReconOS, Melia) or distributed (like Blaze, Axel), they are essentially the same.
  • UCLA: Heterogeneous Datacenters: Options and Opportunities, DAC'16
    • Follow-up work of Blaze. Nice comparison of big and wimpy cores.

Cloud Infrastructure

Programmable Network

Database

Machine Learning

  • Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Networks, FPGA'15
  • From High-Level Deep Neural Models to FPGAs, ISCA'16
  • Deep Learning on FPGAs: Past, Present, and Future, arXiv'16
  • Accelerating binarized neural networks: Comparison of FPGA, CPU, GPU, and ASIC, FPT'16
  • FINN: A Framework for Fast, Scalable Binarized Neural Network Inference, FPGA'17
  • In-Datacenter Performance Analysis of a Tensor Processing Unit, ISCA'17
  • Accelerating Binarized Convolutional Neural Networks with Software-Programmable FPGAs, FPGA'17
  • A Configurable Cloud-Scale DNN Processor for Real-Time AI, ISCA'18
  • A Network-Centric Hardware/Algorithm Co-Design to Accelerate Distributed Training of Deep Neural Networks, MICRO'18
  • DNNBuilder: an Automated Tool for Building High-Performance DNN Hardware Accelerators for FPGAs, ICCAD'18
  • FA3C: FPGA-Accelerated Deep Reinforcement Learning, ASPLOS'19

Graph

  • A Scalable Processing-in-Memory Accelerator for Parallel Graph Processing, ISCA'15
  • Energy Efficient Architecture for Graph Analytics Accelerators, ISCA'16
  • Boosting the Performance of FPGA-based Graph Processor using Hybrid Memory Cube: A Case for Breadth First Search, FPGA'17
  • FPGA-Accelerated Transactional Execution of Graph Workloads, FPGA'17
  • An FPGA Framework for Edge-Centric Graph Processing, CF'18

KVS

  • Achieving 10Gbps line-rate key-value stores with FPGAs, HotCloud'13
  • Thin Servers with Smart Pipes: Designing SoC Accelerators for Memcached, ISCA'13
  • An FPGA Memcached Appliance, FPGA'13
  • Scaling out to a Single-Node 80Gbps Memcached Server with 40Terabytes of Memory, HotStorage'15
  • KV-Direct: High-Performance In-Memory Key-Value Store with Programmable NIC, SOSP'17
  • Ultra-Low-Latency and Flexible In-Memory Key-Value Store System Design on CPU-FPGA, FPT'18

Genome

Consensus

  • Consensus in a Box: Inexpensive Coordination in Hardware, NSDI'16

Video Processing

  • TODO

Blockchain

  • TODO

Micro-services

  • TODO

Languages

  • From JVM to FPGA: Bridging Abstraction Hierarchy via Optimized Deep Pipelining, HotCloud'18

FPGA Internal

General

Partial Reconfiguration

Logical Optimization and Technology Mapping

Place and Route

RTL2FPGA

About

Recipe for FPGA cooking
