Summary #2

Open
alexander-dubinski opened this issue Jan 23, 2019 · 1 comment

Comments

@alexander-dubinski

Rubric Score

Criteria 1: Valid Python Code

  • Score Level: 4 (Exceeds Expectations)
  • Comment(s): All code is valid without errors.

Criteria 2: Exploration of Data

  • Score Level: 4 (Exceeds Expectations)
  • Comment(s): The data is explored well and the visualization is useful. However, some items, such as the correlation table, should have been included in the presentation and commented on. This would help me, as the reader, understand what you were thinking as you worked through the data, and would give me an idea of what the data looks like. In professional data science, only about 30-40% of the job is coding and analysis; the other 60-70% is explaining and visualizing results for non-technical people.

Criteria 3: Machine Learning Techniques used correctly

  • Score Level: 3 (Meets Expectations)
  • Comment(s): ML techniques are used correctly, and a good variety of models is used. I have two criticisms, though. First, the results of your models were not very useful; in most cases, just guessing would have been a better option for classification. This poor result warrants some discussion of what went wrong in the results section of the presentation. Second, you used regression for classification but reported the r-squared. For classification, r-squared is not meaningful and will usually be low, because the targets are discrete class labels rather than continuous values. The metric to report for classification (whether the model is a regression or another type) is accuracy, the percentage of correct classifications on test data.
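
As a rough illustration of the accuracy point, here is a minimal sketch in plain Python. It is not taken from the project: the labels, the regression outputs, the 0.5 cutoff, and the helper names (`accuracy`, `majority_baseline_accuracy`, `threshold`) are all hypothetical. "Just guessing" is interpreted here as always predicting the most common training label, which is one reasonable baseline among several.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def majority_baseline_accuracy(y_train, y_test):
    """Accuracy obtained by always predicting the most common training label."""
    most_common_label = Counter(y_train).most_common(1)[0][0]
    return accuracy(y_test, [most_common_label] * len(y_test))


def threshold(scores, cutoff=0.5):
    """Turn continuous regression outputs into 0/1 class labels."""
    return [1 if s >= cutoff else 0 for s in scores]


# Hypothetical data, made up for illustration only.
y_train = [0, 0, 0, 1, 1, 0, 0, 1]
y_test = [0, 1, 0, 0, 1, 0]
regression_scores = [0.2, 0.7, 0.4, 0.6, 0.3, 0.1]  # assumed model outputs

model_acc = accuracy(y_test, threshold(regression_scores))
baseline_acc = majority_baseline_accuracy(y_train, y_test)

print(f"model accuracy:    {model_acc:.2f}")
print(f"baseline accuracy: {baseline_acc:.2f}")
```

If the model's accuracy on test data is not clearly above the baseline, that gap (or lack of one) is the result worth discussing in the report.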

Criteria 4: Report - Are conclusions clear and supported by data?

  • Score Level: 3
  • Comment(s): Conclusions are present and contain a good amount of discussion. My main issue is that there are no results or figures to back up your conclusions, and your results warrant a much deeper discussion of why they were not useful.

Criteria 5: Code formatting

  • Score Level: 4
  • Comment(s): Code is great! Good job using a notebook and not a python script.

Overall Score: 18/20

Overall, this project is very well done. The biggest issue is that the results were so poor, but this can be investigated more deeply in the future. Great job and happy coding!

@andrewhercules
Owner

@addubinski, for Multiple Linear Regression and K Nearest Neighbors Regression, I computed the r-squared value using the .score function, as shown in the lessons. Based on your comment, though, I think I should have used a different value for the K Nearest Neighbors Regression. Could you please tell me which value to use?
