Data Pipeline written in Go to transform data from an S3 bucket and insert into a Redshift table

mjanicki01/go-data-pipeline


Real-time Data Processing with Go, Elastic Beanstalk, and Redshift

Project Overview

This project demonstrates a real-time data processing pipeline built as a Go application. The application reads a CSV dataset from an S3 bucket, processes it, and inserts the results into a Redshift table. It is deployed with Elastic Beanstalk.

Objective

  • Create a real-time data processing pipeline with a Go application.
  • Deploy the application using Elastic Beanstalk.
  • Store processed data in Amazon Redshift.
  • Verify data processing and storage using SQL queries in Redshift.

Prerequisites

Ensure you have the following installed on your local machine:

AWS Setup

  1. Download Dataset

  2. Create an S3 Bucket

  3. Upload the Dataset to S3

    • Upload the Online Retail CSV file to the S3 bucket you created. Instructions can be found here.

  4. Create a Redshift Cluster

  5. Create a New Database within the Redshift Cluster

.env Variables

# To read from S3:
REGION=
BUCKET=
# Name of the .csv file in S3:
KEY=

# To push data to Redshift:
AWS_ACCESS_KEY_ID=
AWS_SECRET_ACCESS_KEY=
REDSHIFT_CONN_STRING=

Run with Docker

docker-compose build
docker-compose up

# To print the processed data:
curl "http://localhost:8080?action=print"
# Note: the data is printed to the Docker container's console, not to the terminal where curl is called.

# To insert processed data into Redshift:
curl "http://localhost:8080?action=insert"

Run without Docker

go mod tidy

# To print the processed data:
go run main.go -action=print

# To insert processed data into Redshift:
go run main.go -action=insert
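When run directly, the program takes the same choice as an `-action` flag. The sketch below shows one way to parse and validate it with the standard `flag` package; `parseAction` is a hypothetical helper, and the flag set name and usage text are assumptions.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// parseAction extracts and validates the -action flag from args.
// Using a dedicated FlagSet keeps the function testable.
func parseAction(args []string) (string, error) {
	fs := flag.NewFlagSet("pipeline", flag.ContinueOnError)
	action := fs.String("action", "", `what to do with the processed data: "print" or "insert"`)
	if err := fs.Parse(args); err != nil {
		return "", err
	}
	switch *action {
	case "print", "insert":
		return *action, nil
	default:
		return "", fmt.Errorf("unsupported -action value %q", *action)
	}
}

func main() {
	action, err := parseAction(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	fmt.Println("selected action:", action)
}
```

Rejecting unknown values early avoids fetching data from S3 only to discover the requested action is invalid.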
