This repository is for learning purposes, aimed at anyone who is new to machine learning with Apache Spark. It uses the data from the Kaggle Titanic competition: https://www.kaggle.com/c/titanic
Requirements:
- Scala 2.11.x
- Apache Spark 2.2
- Tested locally and on Cloudera (CDH 5.12)
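A minimal `build.sbt` matching these versions might look like the sketch below. The project name and the exact patch versions (`2.11.12`, `2.2.0`) are assumptions, not taken from this repository's build file. The Spark modules stay in the default scope so that `sbt run` can find them on the classpath; `sbt package` only packages the project's own classes, so the resulting JAR still relies on the Spark installation of the cluster it is submitted to.

```scala
// build.sbt: hypothetical sketch, the repository's own build definition may differ
name := "kaggle-titanic"

version := "0.1"

scalaVersion := "2.11.12"

// spark-sql provides SparkSession and DataFrames; spark.ml lives in spark-mllib
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"  % "2.2.0",
  "org.apache.spark" %% "spark-sql"   % "2.2.0",
  "org.apache.spark" %% "spark-mllib" % "2.2.0"
)
```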
Build and run:
- `sbt update` - fetch the dependencies
- `sbt "run local"` - run the code on your local machine
- `sbt package` - build a JAR to use with `spark-submit`
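These commands imply that the application runs either with a local master (via `sbt "run local"`) or under `spark-submit`, where the master is supplied on the command line. One common way to support both is sketched below. The object name `RunModeSketch` and its argument handling are hypothetical; the repository's own entry point may do this differently.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical entry point: the repository's actual main class may
// parse its arguments differently.
object RunModeSketch {
  // "local" as the first argument -> run on all local cores;
  // anything else -> leave the master unset so that spark-submit
  // (or the cluster configuration) provides it
  def masterFor(args: Array[String]): Option[String] =
    args.headOption.filter(_ == "local").map(_ => "local[*]")

  def main(args: Array[String]): Unit = {
    val builder = SparkSession.builder().appName("kaggle-titanic")
    // Only call .master(...) when a local run was requested
    val spark = masterFor(args).fold(builder)(m => builder.master(m)).getOrCreate()
    println(s"Running on ${spark.sparkContext.master}")
    spark.stop()
  }
}
```

With this pattern, `sbt "run local"` would start Spark on all local cores, while a JAR launched with `spark-submit --master yarn` would take its master from that flag.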
- The ParamGrid values used for cross-validation can be set in `ParamGridParameters.scala`.
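For readers new to `spark.ml`, the sketch below shows how a parameter grid is typically built with `ParamGridBuilder` and handed to a `CrossValidator`. It is not the contents of `ParamGridParameters.scala`: the object name `ParamGridSketch`, the choice of `RandomForestClassifier`, the feature columns (taken from Kaggle's `train.csv`), the `label` column and the candidate values are all assumptions for illustration.

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.RandomForestClassifier
import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator
import org.apache.spark.ml.feature.{StringIndexer, VectorAssembler}
import org.apache.spark.ml.param.ParamMap
import org.apache.spark.ml.tuning.{CrossValidator, ParamGridBuilder}

// Hypothetical stand-in for ParamGridParameters.scala; the real file may differ.
object ParamGridSketch {
  // Candidate values to search over (illustrative, not tuned)
  val numTreesValues: Array[Int] = Array(20, 50)
  val maxDepthValues: Array[Int] = Array(5, 10)

  // Assumes a double "label" column derived from Survived
  val rf: RandomForestClassifier = new RandomForestClassifier()
    .setLabelCol("label")
    .setFeaturesCol("features")

  // build() returns every combination of the values: 2 x 2 = 4 parameter maps
  val paramGrid: Array[ParamMap] = new ParamGridBuilder()
    .addGrid(rf.numTrees, numTreesValues)
    .addGrid(rf.maxDepth, maxDepthValues)
    .build()

  // Feature preparation on train.csv columns; nulls (Age in particular)
  // must be filled beforehand, because VectorAssembler rejects null values
  val pipeline: Pipeline = new Pipeline().setStages(Array(
    new StringIndexer().setInputCol("Sex").setOutputCol("SexIndex"),
    new VectorAssembler()
      .setInputCols(Array("Pclass", "SexIndex", "Age", "SibSp", "Parch", "Fare"))
      .setOutputCol("features"),
    rf
  ))

  // Each of the 4 parameter maps is trained and scored on each of the 3 folds
  val crossValidator: CrossValidator = new CrossValidator()
    .setEstimator(pipeline)
    .setEvaluator(new BinaryClassificationEvaluator().setLabelCol("label"))
    .setEstimatorParamMaps(paramGrid)
    .setNumFolds(3)
}
```

Calling `ParamGridSketch.crossValidator.fit(df)` on a suitable DataFrame would train 4 x 3 = 12 models, then refit the best parameter map on the full data; `avgMetrics` on the returned model holds the mean area under the ROC curve for each grid point. Adding a value to either array grows the number of fits multiplicatively, which is worth keeping in mind when editing the grid.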
References:
- Exploring spark.ml with the Titanic Kaggle competition
- Titanic: Machine Learning from Disaster (Kaggle)
- Building Classification model using Apache Spark
- Revisit Titanic Data using Apache Spark
- Would You Survive the Titanic? A Guide to Machine Learning in Python

This project, like all github.com/multivacplatform projects, is governed by the Multivac Platform Open Source Code of Conduct. See also the Typelevel Code of Conduct for specific examples of harassing behavior that is not tolerated.
Code and documentation copyright (c) 2017-2019 ISCPIF - CNRS. Code released under the MIT license.