This project demonstrates a real-time data processing pipeline built as a Go application: it reads a dataset stored in S3, processes it, and stores the results in Amazon Redshift. The application is deployed using Elastic Beanstalk.
- Create a real-time data processing pipeline with a Go application.
- Deploy the application using Elastic Beanstalk.
- Store processed data in Amazon Redshift.
- Verify data processing and storage using SQL queries in Redshift.
Ensure you have the following installed on your local machine:
- Go (the application is run locally with `go run`)
- Docker and Docker Compose (used to build and run the containerized application)
- An AWS account with access to S3, Redshift, and Elastic Beanstalk, with credentials configured locally
Download Dataset
- Download the Online Retail dataset from the UCI Machine Learning Repository.
Create an S3 Bucket
- Follow the AWS documentation to create an S3 bucket in your account.
Upload the Dataset to S3
- Upload the Online Retail CSV file to the S3 bucket you created, either through the AWS console or programmatically; a minimal upload sketch is shown below.
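If you prefer to script the upload, the sketch below shows one way to do it. It is only an illustration: it assumes the aws-sdk-go-v2 packages and uses placeholder region, bucket, key, and file names; uploading through the console or `aws s3 cp` works just as well.

```go
package main

import (
	"context"
	"log"
	"os"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/service/s3"
)

func main() {
	// Placeholder values; replace them with your own region, bucket, and object key.
	const region, bucket, key = "us-east-1", "my-retail-data-bucket", "online_retail.csv"

	cfg, err := config.LoadDefaultConfig(context.TODO(), config.WithRegion(region))
	if err != nil {
		log.Fatal(err)
	}

	f, err := os.Open("online_retail.csv") // the dataset, saved locally as CSV
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	// Upload the file as a single S3 object.
	_, err = s3.NewFromConfig(cfg).PutObject(context.TODO(), &s3.PutObjectInput{
		Bucket: aws.String(bucket),
		Key:    aws.String(key),
		Body:   f,
	})
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("uploaded to s3://%s/%s", bucket, key)
}
```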
Create a Redshift Cluster
- Follow the AWS documentation to create a Redshift cluster.
Create a New Database within the Redshift Cluster
- Once your Redshift cluster is created, use the AWS Management Console or AWS CLI to create a new database within the cluster.
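If you would rather script this step than use the console, a minimal sketch is shown below. It assumes the lib/pq driver (Redshift is reachable over the Postgres wire protocol) and uses a hypothetical cluster endpoint, credentials, and database name.

```go
package main

import (
	"database/sql"
	"log"

	_ "github.com/lib/pq" // Redshift speaks the Postgres wire protocol
)

func main() {
	// Hypothetical endpoint, credentials, and database name; substitute your cluster's values.
	connStr := "postgres://awsuser:password@my-cluster.abc123.us-east-1.redshift.amazonaws.com:5439/dev?sslmode=require"

	db, err := sql.Open("postgres", connStr)
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// CREATE DATABASE must run outside a transaction, so a plain Exec is used.
	if _, err := db.Exec("CREATE DATABASE retail_db"); err != nil {
		log.Fatal(err)
	}
	log.Println("database retail_db created")
}
```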
Set the following environment variables so the application can read from S3 and push data to Redshift:

```sh
# To read from S3:
REGION=
BUCKET=
KEY= # name of the .csv file in S3

# To push data to Redshift:
AWS_ACCESS_KEY_ID=
AWS_SECRET_ACCESS_KEY=
REDSHIFT_CONN_STRING=
```
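For illustration, a filled-in configuration might look like the following. Every value here is a placeholder, and the connection-string format assumes a Postgres-compatible driver; adjust it to whatever main.go actually expects.

```sh
REGION=us-east-1
BUCKET=my-retail-data-bucket
KEY=online_retail.csv

AWS_ACCESS_KEY_ID=AKIAXXXXXXXXXXXXXXXX
AWS_SECRET_ACCESS_KEY=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
REDSHIFT_CONN_STRING=postgres://awsuser:password@my-cluster.abc123.us-east-1.redshift.amazonaws.com:5439/retail_db?sslmode=require
```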
Build and start the application with Docker Compose:

```sh
docker-compose build
docker-compose up
```

Once the service is up, trigger the two actions over HTTP:

```sh
# To print the processed data (the output appears in the Docker container's
# console, not in the terminal where curl is run):
curl "http://localhost:8080?action=print"

# To insert processed data into Redshift:
curl "http://localhost:8080?action=insert"
```
To run the application locally without Docker, fetch the dependencies and pass the same actions as flags:

```sh
go mod tidy

# To print the processed data:
go run main.go -action=print

# To insert processed data into Redshift:
go run main.go -action=insert
```
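To verify the load (the last of the project goals), run a SQL query against the cluster, either in the Redshift query editor or programmatically. The sketch below reuses `REDSHIFT_CONN_STRING`, assumes the lib/pq driver, and queries a hypothetical table name `online_retail`; substitute the table the insert step actually creates.

```go
package main

import (
	"database/sql"
	"log"
	"os"

	_ "github.com/lib/pq"
)

func main() {
	// Reuse the same connection string the pipeline uses for inserts.
	db, err := sql.Open("postgres", os.Getenv("REDSHIFT_CONN_STRING"))
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// "online_retail" is a hypothetical table name; query whichever table
	// the insert step actually populates.
	var count int
	if err := db.QueryRow("SELECT COUNT(*) FROM online_retail").Scan(&count); err != nil {
		log.Fatal(err)
	}
	log.Printf("rows stored in Redshift: %d", count)
}
```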