
LeNetHWAccelerator - Goal

Design and implement a hardware accelerator for the LeNet inference deep CNN (DNN) structure in SystemVerilog (SV), following the data-flow architecture. The implementation is parameterized so it can accommodate other DNN architectures (AlexNet, VGG, etc.) as well.

Approach

  1. Implement the LeNet training DNN using the PyTorch library and store the weight and bias matrices after 5 epochs.
  2. Implement a parameterized LeNet inference DNN from scratch in basic Python, to check that my understanding of the implementation details is correct (a minimal Python sketch of these operations is shown after this list).
  3. Verify the convolution and maxpool operations and the overall DNN flow in the above code, using multiple image and kernel (weight) matrices.
  4. Implement the LeNet DNN in SV using steps similar to (2), while following the data-flow architecture for reduced area and power consumption.
     i. Implement a parameterized convolution structure that can read an image matrix (random image) and kernel matrices (random: Laplacian, SobelX, SobelY and their variations, etc.) and generate the convolved output of those two matrices.
     ii. Verify that the SV output is able to reconstruct the convolved matrix by following the data-flow approach.
     iii. Thoroughly verify the convolved SV output for different sizes and values of the image and kernel matrices, as well as different stride values.
     iv. Follow a similar approach to implement SV code that performs the maxpool, ReLU and FC operations separately.
     v. Integrate all the different blocks (Conv, Maxpool, FC and ReLU) according to the LeNet DNN architecture.
  5. Use the actual weight (kernel) matrices generated by the trained DNN in step (1) to verify and calculate the accuracy of the SV-based LeNet DNN HW accelerator.
  6. Integrate the individual blocks (Conv, Maxpool, FC and ReLU) to build other DNN architectures and verify their accuracy.
  7. Apply other hardware-reduction techniques (modified Booth algorithm, etc.) to further optimize the hardware.
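
As a reference for steps 2–4, the sketch below shows the convolution, ReLU and maxpool operations in plain Python. It is only an illustrative sketch; the function names and the small test matrices are assumptions, not the actual repository code.

```python
def conv2d(image, kernel, stride=1, padding=0):
    """Naive single-channel 2D convolution with zero padding."""
    if padding:
        n = len(image[0]) + 2 * padding
        padded = [[0] * n for _ in range(padding)]
        for row in image:
            padded.append([0] * padding + row + [0] * padding)
        padded += [[0] * n for _ in range(padding)]
        image = padded
    h, w, k = len(image), len(image[0]), len(kernel)
    out_h = (h - k) // stride + 1
    out_w = (w - k) // stride + 1
    out = [[0] * out_w for _ in range(out_h)]
    for i in range(out_h):
        for j in range(out_w):
            acc = 0
            for u in range(k):
                for v in range(k):
                    acc += image[i * stride + u][j * stride + v] * kernel[u][v]
            out[i][j] = acc
    return out

def relu(mat):
    # Element-wise max(0, x)
    return [[max(0, x) for x in row] for row in mat]

def maxpool(mat, size=2, stride=2):
    # Sliding-window maximum over size x size windows
    out_h = (len(mat) - size) // stride + 1
    out_w = (len(mat[0]) - size) // stride + 1
    return [[max(mat[i * stride + u][j * stride + v]
                 for u in range(size) for v in range(size))
             for j in range(out_w)] for i in range(out_h)]

# Example: 5x5 image, 3x3 Laplacian kernel, stride 1, no padding -> 3x3 conv output
image = [[1, 2, 3, 0, 1],
         [0, 1, 2, 3, 1],
         [3, 1, 0, 2, 2],
         [2, 3, 1, 0, 1],
         [1, 0, 2, 1, 3]]
laplacian = [[0, 1, 0],
             [1, -4, 1],
             [0, 1, 0]]
print(maxpool(relu(conv2d(image, laplacian))))
```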

Status (Aug 4, 2020)

Completed up to step 4.iii.

Constraints

-> Images need to be gray-scale (i.e., only 1 colour channel).
-> Real values of the weight and bias matrices should be converted to whole numbers (0, 1, 2, etc.).
-> Allowed parameterization: size (l, b) of the image and kernel matrices, stride, and bit width of each pixel in the image and kernel matrices. As of now, padding is kept at 0.
   E.g.: parameter IMAGE_PIXEL_WIDTH = 8, parameter KERNEL_PIXEL_WIDTH = 5, parameter WC1_IMAGE_WIDTH = 30, parameter WC1_KERNEL = 3, parameter WC1_STRIDE = 3, parameter WC1_PADDING = 0
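
As a sanity check, the output size implied by the example parameters above follows the usual convolution output-size relation; the snippet below simply evaluates it (the parameter names mirror the example, they are not the actual RTL parameters):

```python
# Convolved output width = (image - kernel + 2*padding) / stride + 1
WC1_IMAGE_WIDTH = 30
WC1_KERNEL = 3
WC1_STRIDE = 3
WC1_PADDING = 0

out_width = (WC1_IMAGE_WIDTH - WC1_KERNEL + 2 * WC1_PADDING) // WC1_STRIDE + 1
print(out_width)  # 10 -> the convolved output is a 10x10 matrix
```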

Introduction

Yann LeCun, Leon Bottou, Yoshua Bengio and Patrick Haffner proposed a neural network architecture for handwritten and machine-printed character recognition in the 1990s, which they called LeNet-5.

(Figure: the LeNet-5 architecture.)

The LeNet-5 architecture consists of two sets of convolutional and average pooling layers, followed by a flattening convolutional layer, then two fully-connected layers and finally a softmax classifier.

First Layer:

The input for LeNet-5 is a 32×32 grayscale image which passes through the first convolutional layer with 6 feature maps or filters having size 5×5 and a stride of one. The image dimensions change from 32x32x1 to 28x28x6.
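The 28×28 output size follows from the same output-size relation used in the snippet above: (32 − 5 + 2·0)/1 + 1 = 28, and each of the 6 filters produces one such feature map, hence 28x28x6.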


Second Layer:

Then LeNet-5 applies an average pooling (sub-sampling) layer with a filter size of 2×2 and a stride of two. The resulting image dimensions are reduced to 14x14x6.
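With a 2×2 window and a stride of 2, each feature map shrinks from 28×28 to (28 − 2)/2 + 1 = 14, while the number of feature maps stays at 6, hence 14x14x6.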


Third Layer:

Next, there is a second convolutional layer with 16 feature maps having size 5×5 and a stride of 1. In this layer, only 10 out of 16 feature maps are connected to 6 feature maps of the previous layer as shown below.

(Figure: connectivity between the S2 feature maps and the C3 feature maps.)

The main reason is to break the symmetry in the network and keep the number of connections within reasonable bounds. That’s why the number of training parameters in this layer is 1516 instead of 2400 and, similarly, the number of connections is 151600 instead of 240000.
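One way to arrive at these numbers, following the connectivity table in the original LeNet-5 paper (6 of the C3 maps draw from 3 of the S2 maps, 6 from 4 contiguous maps, 3 from 4 non-contiguous maps, and 1 from all 6): parameters = 6·(3·5·5 + 1) + 6·(4·5·5 + 1) + 3·(4·5·5 + 1) + 1·(6·5·5 + 1) = 456 + 606 + 303 + 151 = 1516, and since every parameter is applied at each position of the 10×10 output, connections = 1516 × 10 × 10 = 151600.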


Fourth Layer:

The fourth layer (S4) is again an average pooling layer with filter size 2×2 and a stride of 2. This layer is the same as the second layer (S2) except that it has 16 feature maps, so the output will be reduced to 5x5x16.


Fifth Layer:

The fifth layer (C5) is a fully connected convolutional layer with 120 feature maps each of size 1×1. Each of the 120 units in C5 is connected to all the 400 nodes (5x5x16) in the fourth layer S4.
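Counting one bias per unit, this layer has 120 × (400 + 1) = 48120 trainable parameters.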


Sixth Layer:

The sixth layer is a fully connected layer (F6) with 84 units.
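Each of the 84 units is connected to all 120 C5 outputs, giving 84 × (120 + 1) = 10164 trainable parameters (again counting one bias per unit).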


Output Layer:

Finally, there is a fully connected softmax output layer ŷ with 10 possible values corresponding to the digits from 0 to 9.
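Softmax turns the 10 output scores z_i into probabilities, ŷ_i = exp(z_i) / Σ_j exp(z_j), and the predicted digit is the index of the largest ŷ_i.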

