
A data pipeline to analyse various Coronavirus datasets from the UK's Office for National Statistics (ONS) using Spark, Zeppelin, Scala and Python.


raymondklutse/Corona-Virus-Analysis


Corona-Virus-Analysis

Context

The coronavirus has spread across the world since it first appeared in December 2019. It has claimed many lives and has had a major impact on economies worldwide. The UK has been hit particularly hard, both in lives lost and in economic strain. In this project, we analyse datasets published by the Office for National Statistics (ONS) to gain insight into how the pandemic is spreading and what its economic impact has been.

Getting started

  1. Install the following packages
  • Java 11.0.10
  • Scala 2.12.10
  • Spark 3.0.2 for Hadoop 3.2+
  • Hadoop 3.3.0 *
  • Docker Desktop 3.2.2
  • Zeppelin 0.9.0 using Docker:
docker run -p 8080:8080 --rm \
-v $PWD/path_to_logs:/zeppelin/logs \
-v $PWD/path_to_data:/zeppelin/seed \
-v $PWD/path_to_notebook:/zeppelin/notebook \
-e ZEPPELIN_LOG_DIR='/zeppelin/logs' \
-e ZEPPELIN_NOTEBOOK_DIR='/zeppelin/notebook' \
--name zeppelin apache/zeppelin:0.9.0
  2. Open http://localhost:8080 in your browser (the port published by -p 8080:8080) to reach the Zeppelin UI

  3. Run the notebook
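The launch and "open the UI" steps above can also be scripted. The Python below is a minimal sketch, not part of the repository: `zeppelin_docker_cmd` and `wait_for_zeppelin` are hypothetical helpers, it assumes Docker is on your PATH, and `path_to_logs`, `path_to_data` and `path_to_notebook` are placeholders for your own directories. It rebuilds the same docker run invocation as an argument list and polls http://localhost:8080 until Zeppelin answers.

```python
import os
import shlex
import time
import urllib.request

ZEPPELIN_IMAGE = "apache/zeppelin:0.9.0"


def zeppelin_docker_cmd(logs_dir, data_dir, notebook_dir, host_port=8080):
    """Return the `docker run` argument list used in step 1 (hypothetical helper).

    Host directories are made absolute, like $PWD/... in the shell command.
    """
    return [
        "docker", "run", "--rm",
        "-p", f"{host_port}:8080",  # Zeppelin listens on 8080 inside the container
        "-v", f"{os.path.abspath(logs_dir)}:/zeppelin/logs",
        "-v", f"{os.path.abspath(data_dir)}:/zeppelin/seed",
        "-v", f"{os.path.abspath(notebook_dir)}:/zeppelin/notebook",
        "-e", "ZEPPELIN_LOG_DIR=/zeppelin/logs",
        "-e", "ZEPPELIN_NOTEBOOK_DIR=/zeppelin/notebook",
        "--name", "zeppelin",
        ZEPPELIN_IMAGE,
    ]


def wait_for_zeppelin(url="http://localhost:8080", timeout=60.0, interval=1.0):
    """Poll `url` until it returns a successful HTTP response or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval):
                return True
        except OSError:  # refused connections, timeouts and HTTP errors all derive from OSError
            time.sleep(interval)
    return False


cmd = zeppelin_docker_cmd("path_to_logs", "path_to_data", "path_to_notebook")
print(shlex.join(cmd))  # the equivalent single-line shell command
# To start the container in the background (requires Docker) and wait for the UI:
#   import subprocess; subprocess.Popen(cmd); wait_for_zeppelin()
```

Passing the command as a list sidesteps a common pitfall of multi-line shell commands: a space after a trailing backslash silently ends the line continuation.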

PS: If you want to use your own data, store it in the Data directory.
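To bring in your own data, a small helper can copy a file into place and show its header row before you open the notebook. This is a minimal sketch under stated assumptions: `stage_dataset` is a hypothetical helper, it assumes your data is a UTF-8 CSV file, and it assumes the Data directory is the host directory you mount at /zeppelin/seed.

```python
import csv
import shutil
from pathlib import Path


def stage_dataset(src, data_dir="Data"):
    """Copy a CSV file into the (assumed) Data directory; return its new path and header row.

    `stage_dataset` and the CSV-only check are illustrative, not part of the repository.
    """
    src = Path(src)
    if src.suffix.lower() != ".csv":
        raise ValueError(f"expected a .csv file, got {src.name}")
    dest_dir = Path(data_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)  # create Data/ if it does not exist yet
    dest = Path(shutil.copy2(src, dest_dir / src.name))  # copy2 also preserves timestamps
    with dest.open(newline="", encoding="utf-8") as f:
        header = next(csv.reader(f), [])  # empty list for an empty file
    return dest, header
```

For example, `stage_dataset("downloads/my_ons_extract.csv")` would copy the (hypothetical) file into Data/ and return its column names, so you can check them against what the notebook expects.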
