CS F320: Foundations of Data Science Assignment
Datasets (Kaggle):
https://www.kaggle.com/nitinvinayak/13-dimension-10-million-big-data-high-dimension
https://www.kaggle.com/nitinvinayak/shuttle
Python 3.8 and Spark 3.0 are used
For Spark installation help:
https://stackoverflow.com/questions/54377365/apache-spark-on-cluster-of-only-2-computers
https://towardsdatascience.com/how-to-use-pyspark-on-your-computer-9c7180075617
https://medium.com/@josemarcialportilla/installing-scala-and-spark-on-ubuntu-5665ee4b62b1
https://medium.com/@josemarcialportilla/installing-scala-and-spark-on-windows-249632e6b83b
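
After installation, a quick check that the local environment matches the versions listed above (Python 3.8, Spark 3.0) can save some debugging. The sketch below is not part of the assignment code. The function names are hypothetical. It assumes PySpark was installed as a Python package (for example with pip), so a standalone Spark install may not be detected, and it treats the listed versions as minimums, which is also an assumption.

```python
import sys
from importlib import metadata


def major_minor(version_text):
    """Return (major, minor) as ints from a version string such as '3.0.1'."""
    major, minor = version_text.split(".")[:2]
    return int(major), int(minor)


def installed_pyspark_version():
    """Return the installed pyspark version string, or None if the package is absent."""
    try:
        return metadata.version("pyspark")
    except metadata.PackageNotFoundError:
        return None


def check_environment(required_python=(3, 8), required_spark=(3, 0)):
    """Return (python_ok, spark_ok) for the running interpreter and installed pyspark."""
    python_ok = sys.version_info[:2] >= required_python
    spark_text = installed_pyspark_version()
    spark_ok = spark_text is not None and major_minor(spark_text) >= required_spark
    return python_ok, spark_ok


if __name__ == "__main__":
    python_ok, spark_ok = check_environment()
    print("Python version:", sys.version.split()[0], "(ok)" if python_ok else "(too old)")
    print("PySpark version:", installed_pyspark_version(), "(ok)" if spark_ok else "(missing or too old)")
```

If PySpark is reported as missing even though Spark runs from the command line, Spark was probably installed outside the Python environment. The links above cover that setup.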