GitHub repo: https://github.com/adamliningerwhite/Pass-Class
For our classifier, we built a logistic regression model trained on 700,000 (password, strength) pairs obtained from Kaggle. More information on how the training data was collected and classified can be found here.
We also tried several other classification models (random forest, SVM, and neural networks), which often give higher accuracy at the cost of longer training and prediction times. Ultimately, the improvements in accuracy weren't worth the expense, so we settled on a simpler logistic regression model that achieves 82% test accuracy.
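To make the modeling approach concrete, here is a minimal sketch of a logistic regression password classifier in scikit-learn. It is not the code in classify.py: the toy passwords, the 0/1/2 label scheme, and the character n-gram TF-IDF features are all assumptions chosen for illustration, since the feature extraction is not described above.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy stand-in for the Kaggle (password, strength) pairs.
# Assumed label scheme for this sketch: 0 = weak, 1 = medium, 2 = strong.
passwords = [
    "123456", "password", "qwerty", "abc123",
    "sunshine1", "Monkey2020", "bluesky99",
    "Tr0ub4dor&3", "x9$Lq!72mZp#", "G7^kW2@vRt8*",
]
labels = [0, 0, 0, 0, 1, 1, 1, 2, 2, 2]

# Assumption: represent each password by its character 1- and 2-grams.
# lowercase=False keeps upper/lower case distinct, which matters for strength.
model = make_pipeline(
    TfidfVectorizer(analyzer="char", ngram_range=(1, 2), lowercase=False),
    LogisticRegression(max_iter=1000),
)
model.fit(passwords, labels)

# Classify a new password and inspect the per-class probabilities.
prediction = model.predict(["hunter2"])[0]
probabilities = model.predict_proba(["hunter2"])[0]
print("predicted class:", prediction)
print("class probabilities:", probabilities)
```

With only ten training examples the predictions are not meaningful; the point is the shape of the pipeline, which stays the same when the toy lists are replaced by the full dataset.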
We drew on prior knowledge of ML and data science from MATH154 - Computational Statistics and MATH158 - Statistical Linear Models to complete this assignment. These classes use R and the RStudio IDE, so this assignment marks our first experience using Python libraries for data cleaning and model building. The transition was fairly smooth, and we made quick progress with the help of library documentation and online beginner tutorials.
Because we are new to Python ML libraries and their dependencies, our description of the setup process may be imperfect, but we have done our best.
Our program imports the following libraries:
- NumPy
- Pandas
- joblib
- scikit-learn (imported as sklearn)
Before running the program, macOS users should follow these steps (double-checked on a clean Mac):
- Install Python 3:
brew install python
- Install NumPy:
pip install numpy
- Install Pandas:
pip install pandas
- Install scikit-learn (this also installs joblib as a dependency):
pip install -U scikit-learn
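After the install steps, a quick way to confirm that everything is in place is to check that each module can be found by the interpreter. The sketch below uses only the standard library; the helper name `missing_modules` is made up for this example and is not part of our program.

```python
import importlib.util

# Import names of the libraries listed above (scikit-learn imports as "sklearn").
REQUIRED_MODULES = ["numpy", "pandas", "joblib", "sklearn"]


def missing_modules(names):
    """Return the subset of top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules are importable.")
```

If any module is reported missing, rerun the corresponding pip command with the same Python installation that will run classify.py.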
Steps for running our program are simple:
- cd to the directory containing classify.py
- Type
python classify.py
and hit enter
- Give the program a few moments to read data and build the model
- When prompted for input, type the password you want to classify and hit enter
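The startup delay comes from reading the data and fitting the model. joblib is commonly used to save a fitted scikit-learn model to disk so that later runs can load it instead of retraining. The sketch below illustrates that pattern under stated assumptions: the helper names `build_model` and `load_or_build`, the cache file name, and the toy training data are hypothetical, and this is not necessarily how classify.py uses joblib.

```python
import os
import tempfile

import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline


def build_model():
    """Fit a small stand-in model (toy data, assumed 0 = weak / 1 = strong)."""
    passwords = ["123456", "password", "qwerty", "x9$Lq!72mZp#", "G7^kW2@vRt8*"]
    labels = [0, 0, 0, 1, 1]
    model = make_pipeline(
        TfidfVectorizer(analyzer="char", lowercase=False),
        LogisticRegression(max_iter=1000),
    )
    return model.fit(passwords, labels)


def load_or_build(path):
    """Load a cached model from `path` if present; otherwise train and save it."""
    if os.path.exists(path):
        return joblib.load(path)
    model = build_model()
    joblib.dump(model, path)
    return model


# Hypothetical cache location inside a fresh temporary directory.
cache_path = os.path.join(tempfile.mkdtemp(), "model.joblib")
first = load_or_build(cache_path)   # trains the model and writes the cache file
second = load_or_build(cache_path)  # loads the cached model from disk
print(second.predict(["letmein"]))
```

Only load joblib files you created yourself, since loading a file can execute arbitrary code.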