Automated machine learning (AutoML) automates the end-to-end process of applying machine learning to real-world problems. Given a dataset, AutoML iterates over different data featurizations, machine learning algorithms, hyperparamters, etc. to select the best model.
This problem is to predict if a Wikipedia comment contains a personal attack or not. We will use small sample of the Wikipedia Detox dataset: one dataset for training and a second to test the model produced by AutoML. Human judges assigned every comment in these datasets a toxicity label:
- 0 - nice/positive
- 1 - toxic/negative
We will build a model that will analyze a string and predict a sentiment value of 0 or 1.
The generalized problem of binary classification is to classify items into one of two classes (classifying items into more than two classes is called multiclass classification).
Some examples:
- predict if an insurance claim is valid or not
- predict if a plane will be delayed or will arrive on time
- predict if a face ID (photo) belongs to the owner of a device
For all these examples, the parameter we want to predict can take only one of two values. In other words, this value is represented by boolean
type.
Load the datasets required to train and test:
IDataView trainingDataView = mlContext.Data.LoadFromTextFile<SentimentIssue>(TrainDataPath, hasHeader: true);
IDataView testDataView = mlContext.Data.LoadFromTextFile<SentimentIssue>(TestDataPath, hasHeader: true);
Instantiate and run an AutoML experiment. In doing so, specify how long the experiment should run in seconds (ExperimentTime
), and set a progress handler that will receive notifications after AutoML trains & evaluates each new model.
// Run AutoML binary classification experiment
ExperimentResult<BinaryClassificationMetrics> experimentResult = mlContext.Auto()
.CreateBinaryClassificationExperiment(ExperimentTime)
.Execute(trainingDataView, progressHandler: new BinaryExperimentProgressHandler());
Grab the best model produced by the AutoML experiment
ITransformer model = experimentResult.BestRun.Model;
and evaluate its quality on a test dataset that was not used in training (wikipedia-detox-250-line-test.tsv
).
EvaluateNonCalibrated
compares the predicted values for the test dataset to the real values. It produces various metrics, such as accuracy:
var predictions = trainedModel.Transform(testDataView);
var metrics = mlContext.BinaryClassification.EvaluateNonCalibrated(predictions, scoreColumnName: "Score");
Using the trained model, call the Predict()
API to predict the sentiment for new sample text sampleStatement
:
// Create prediction engine related to the loaded trained model
var predEngine = mlContext.Model.CreatePredictionEngine<SentimentIssue, SentimentPrediction>(model);
// Score
var predictedResult = predEngine.Predict(sampleStatement);
predictedResult.PredictionLabel
will be either true or false. If true, the predicted sentiment is toxic; if false, non-toxic.