Programming steps in ML.NET
ML.NET is a free, open-source, and cross-platform machine learning framework, created by Microsoft, for the .NET developer platform. ML.NET allows you to train, build, and ship custom machine learning models using C# or F# for a variety of ML scenarios.
With ML.NET, it takes only a few steps to build your own custom machine learning model.
In this post I explain the basic steps of programming in ML.NET. In most cases there are seven steps, described below.
ML.NET follows the same basic steps for nearly every scenario; it combines data loading, transformations, and model training to make it easy for you to create machine learning models.
1. Create ML.NET context
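Every ML.NET operation starts from an MLContext object. The following is a minimal sketch of creating one; the fixed seed and the CreateContextExample class name are my own illustrative choices, not something the framework requires.

```csharp
using System;
using Microsoft.ML;

public static class CreateContextExample
{
    // A fixed seed (an assumption for this sketch) makes randomized
    // operations, such as data splits, repeatable between runs.
    public static MLContext CreateContext() => new MLContext(seed: 0);

    public static void Main()
    {
        MLContext mlContext = CreateContext();

        // The context exposes the catalogs used in the later steps:
        // Data, Transforms, BinaryClassification, Model, and so on.
        Console.WriteLine(mlContext.Data != null);
    }
}
```

You normally create the context once and reuse it for loading, training, and evaluation.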
2. Load data
Machine learning uses known data (for example, training data) to find patterns in order to make predictions on new, unknown data.
The inputs for machine learning are called Features, which are the attributes used to make predictions. The output of machine learning is called the Label, which is the actual prediction.
Data in ML.NET is represented as an IDataView, which is a flexible, efficient way of describing tabular data (for example, rows and columns). IDataView objects can contain numbers, text, booleans, vectors, and more. You can load data from files or from real-time streaming sources into an IDataView.
LoadFromTextFile allows you to load data from TXT, CSV, TSV, and other file formats:
IDataView trainingData = mlContext.Data
.LoadFromTextFile<SentimentInput>(dataPath, separatorChar: ',', hasHeader: true);
LoadFromEnumerable enables loading from in-memory collections, JSON/XML, relational and non-relational databases (for example, SQL, CosmosDB, MongoDB), and many other data sources:
IDataView trainingData = mlContext.Data
.LoadFromEnumerable<SentimentInput>(inMemoryCollection);
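The snippets above rely on a SentimentInput class that the post does not show. Here is a sketch of what it might look like, together with a small in-memory collection. The column order, the bool label type, and the convention that true means positive sentiment are assumptions for illustration.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.ML;
using Microsoft.ML.Data;

// Hypothetical input schema. LoadColumn indexes only matter when the
// class is used with LoadFromTextFile; the order here is assumed.
public class SentimentInput
{
    [LoadColumn(0)]
    public string SentimentText { get; set; }

    // Assumed convention: true = positive, false = negative.
    [LoadColumn(1)]
    public bool Sentiment { get; set; }
}

public static class LoadDataExample
{
    public static IDataView LoadSamples(MLContext mlContext)
    {
        var inMemoryCollection = new List<SentimentInput>
        {
            new SentimentInput { SentimentText = "Great product", Sentiment = true },
            new SentimentInput { SentimentText = "This is very rude!", Sentiment = false },
        };
        return mlContext.Data.LoadFromEnumerable(inMemoryCollection);
    }

    public static void Main()
    {
        var mlContext = new MLContext();
        IDataView trainingData = LoadSamples(mlContext);

        // Each public property becomes a column of the IDataView.
        foreach (var column in trainingData.Schema)
            Console.WriteLine($"{column.Name}: {column.Type}");
    }
}
```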
3. Transform data
In most cases, the data that you have available isn’t suitable to be used directly to train a machine learning model. The raw data needs to be pre-processed using data transformations.
Transformers take data, do some work on it, and return new, transformed data.
There is a built-in set of data transforms for replacing missing values, converting data, featurizing text, and more.
// Convert sentiment text into numeric features
IEstimator<ITransformer> dataTransformPipeline = mlContext.Transforms.Text
.FeaturizeText("Features", "SentimentText");
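To see what a transform actually produces, you can fit the estimator and inspect the schema of the transformed data. This is a rough sketch under the same assumptions as before (a hypothetical SentimentInput class and two made-up sample rows); it only shows that a new Features column appears.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.ML;

// Hypothetical input schema, repeated so the sketch is self-contained.
public class SentimentInput
{
    public string SentimentText { get; set; }
    public bool Sentiment { get; set; }
}

public static class TransformExample
{
    public static IDataView Featurize(MLContext mlContext, IDataView data)
    {
        var dataTransformPipeline = mlContext.Transforms.Text
            .FeaturizeText("Features", "SentimentText");

        // Fit learns the text vocabulary; Transform applies it.
        ITransformer transformer = dataTransformPipeline.Fit(data);
        return transformer.Transform(data);
    }

    public static void Main()
    {
        var mlContext = new MLContext();
        IDataView data = mlContext.Data.LoadFromEnumerable(new List<SentimentInput>
        {
            new SentimentInput { SentimentText = "Great product", Sentiment = true },
            new SentimentInput { SentimentText = "This is very rude!", Sentiment = false },
        });

        IDataView transformed = Featurize(mlContext, data);

        // The original columns are kept and a numeric Features column is added.
        foreach (var column in transformed.Schema)
            Console.WriteLine($"{column.Name}: {column.Type}");
    }
}
```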
4. Choose algorithm
When using machine learning and ML.NET, you must choose a machine learning task that goes along with your scenario. ML.NET offers over 30 algorithms (or trainers) for a variety of ML tasks:
ML Task | Algorithms |
---|---|
Binary classification (for example, sentiment analysis) | AveragedPerceptronTrainer, SdcaLogisticRegressionBinaryTrainer |
Multi-class classification (for example, topic categorization) | LightGbmMulticlassTrainer, OneVersusAllTrainer |
Regression (for example, price prediction) | LbfgsPoissonRegressionTrainer, FastTreeRegressionTrainer |
Clustering (for example, customer segmentation) | KMeansTrainer |
Anomaly Detection (for example, shampoo sales spike detection) | RandomizedPcaTrainer |
Recommendation (for example, movie recommender) | MatrixFactorizationTrainer |
Ranking (for example, search results) | LightGbmRankingTrainer, FastTreeRankingTrainer |
IEstimator<ITransformer> trainer = mlContext.BinaryClassification.Trainers
.AveragedPerceptron(labelColumnName: "Sentiment", featureColumnName: "Features");
IEstimator<ITransformer> trainingPipeline = dataTransformPipeline.Append(trainer);
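Because every trainer is just another estimator, swapping algorithms usually means changing one line. The sketch below builds the same pipeline with either of the two binary classification trainers from the table; the BuildPipeline helper and its useSdca flag are hypothetical names I introduce for illustration.

```csharp
using System;
using Microsoft.ML;

public static class ChooseTrainerExample
{
    public static IEstimator<ITransformer> BuildPipeline(MLContext mlContext, bool useSdca)
    {
        IEstimator<ITransformer> dataTransformPipeline = mlContext.Transforms.Text
            .FeaturizeText("Features", "SentimentText");

        // Both trainers read the same label and feature columns,
        // so the rest of the pipeline does not change.
        IEstimator<ITransformer> trainer;
        if (useSdca)
        {
            trainer = mlContext.BinaryClassification.Trainers
                .SdcaLogisticRegression(labelColumnName: "Sentiment", featureColumnName: "Features");
        }
        else
        {
            trainer = mlContext.BinaryClassification.Trainers
                .AveragedPerceptron(labelColumnName: "Sentiment", featureColumnName: "Features");
        }

        return dataTransformPipeline.Append(trainer);
    }

    public static void Main()
    {
        var mlContext = new MLContext();
        IEstimator<ITransformer> trainingPipeline = BuildPipeline(mlContext, useSdca: false);

        // Nothing has been trained yet; the pipeline is only a description.
        Console.WriteLine(trainingPipeline.GetType().Name);
    }
}
```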
5. Train model
The data transformations and algorithms you have specified are not executed until you call the Fit() method (because of ML.NET’s lazy loading approach). This is when model training happens.
An estimator takes in data, learns from the data, and creates a transformer. In the case of model training, the training data is the input, and the trained model is the output; the trained model is thus a transformer that turns the input Features from new data into output predictions.
ITransformer model = trainingPipeline.Fit(trainingData);
6. Evaluate model
ML.NET offers evaluators that assess the performance of your model on a variety of metrics:
- Accuracy
- Area under the curve (AUC)
- R-Squared
- Root Mean Squared Error (RMSE)
// Make predictions on test data
IDataView predictions = model.Transform(testDataView);
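// Evaluate model and return metrics
// (AveragedPerceptron does not produce a calibrated Probability column,
// so the non-calibrated evaluator is used here)
var metrics = mlContext.BinaryClassification
.EvaluateNonCalibrated(predictions, labelColumnName: "Sentiment");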
// Print out accuracy metric
Console.WriteLine("Accuracy: " + metrics.Accuracy);
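The evaluation code assumes a testDataView that the model has not seen during training. One common way to obtain it is to split the loaded data before calling Fit(). The sketch below shows only the split; the ten generated rows, the 20% test fraction, and the helper names are assumptions for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.ML;

// Hypothetical input schema, repeated so the sketch is self-contained.
public class SentimentInput
{
    public string SentimentText { get; set; }
    public bool Sentiment { get; set; }
}

public static class SplitExample
{
    public static DataOperationsCatalog.TrainTestData Split(MLContext mlContext, IDataView allData)
    {
        // Roughly 20% of the rows are held out for evaluation.
        return mlContext.Data.TrainTestSplit(allData, testFraction: 0.2);
    }

    public static int CountRows(MLContext mlContext, IDataView data)
    {
        return mlContext.Data
            .CreateEnumerable<SentimentInput>(data, reuseRowObject: false)
            .Count();
    }

    public static IDataView MakeSampleData(MLContext mlContext)
    {
        // Ten made-up rows, alternating positive and negative labels.
        var rows = Enumerable.Range(0, 10).Select(i => new SentimentInput
        {
            SentimentText = i % 2 == 0 ? $"Nice comment {i}" : $"Rude comment {i}",
            Sentiment = i % 2 == 0,
        });
        return mlContext.Data.LoadFromEnumerable(rows.ToList());
    }

    public static void Main()
    {
        var mlContext = new MLContext(seed: 0);
        var split = Split(mlContext, MakeSampleData(mlContext));

        IDataView trainingData = split.TrainSet;
        IDataView testDataView = split.TestSet;

        Console.WriteLine($"Train rows: {CountRows(mlContext, trainingData)}");
        Console.WriteLine($"Test rows: {CountRows(mlContext, testDataView)}");
    }
}
```

The split is random, so the exact row counts can vary; with a real dataset you would train on TrainSet and evaluate on TestSet.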
7. Deploy and consume model
You can save your trained model as a binary file that is then integrated into your .NET applications.
mlContext.Model.Save(model, trainingData.Schema, "model.zip");
Once you have saved the trained model, you can load the model in your other .NET applications.
MLContext mlContext = new MLContext();
DataViewSchema predictionPipelineSchema;
ITransformer trainedModel = mlContext.Model.Load("model.zip", out predictionPipelineSchema);
You can then use the loaded model to start making predictions. You can use the Prediction Engine, which is a convenience API used for making single predictions, or the Transform method, which is used for making batch predictions.
var predEngine = mlContext.Model.CreatePredictionEngine<SentimentInput, SentimentOutput>(trainedModel);
SentimentInput sampleComment = new SentimentInput{ SentimentText = "This is very rude!" };
SentimentOutput result = predEngine.Predict(sampleComment);
Console.WriteLine(result.Prediction);
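For batch predictions, the Transform method mentioned above can score a whole collection at once. The following is a sketch, not a production setup: the SentimentOutput class, the four-row toy training set, and the helper names are all assumptions, and a model trained on so little data is not meaningful.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Data;

// Hypothetical input schema, repeated so the sketch is self-contained.
public class SentimentInput
{
    public string SentimentText { get; set; }
    public bool Sentiment { get; set; }
}

// Hypothetical output schema: maps the trainer's PredictedLabel column
// to the Prediction property used in the post.
public class SentimentOutput
{
    [ColumnName("PredictedLabel")]
    public bool Prediction { get; set; }

    public float Score { get; set; }
}

public static class BatchPredictionExample
{
    // Trains a toy model so the example runs without an external file.
    public static ITransformer TrainToyModel(MLContext mlContext)
    {
        var samples = new List<SentimentInput>
        {
            new SentimentInput { SentimentText = "I love this", Sentiment = true },
            new SentimentInput { SentimentText = "Really great work", Sentiment = true },
            new SentimentInput { SentimentText = "This is very rude!", Sentiment = false },
            new SentimentInput { SentimentText = "Awful and offensive", Sentiment = false },
        };
        IDataView data = mlContext.Data.LoadFromEnumerable(samples);

        var pipeline = mlContext.Transforms.Text
            .FeaturizeText("Features", "SentimentText")
            .Append(mlContext.BinaryClassification.Trainers
                .AveragedPerceptron(labelColumnName: "Sentiment", featureColumnName: "Features"));

        return pipeline.Fit(data);
    }

    public static List<SentimentOutput> PredictBatch(
        MLContext mlContext, ITransformer model, IEnumerable<SentimentInput> comments)
    {
        IDataView input = mlContext.Data.LoadFromEnumerable(comments);

        // Transform scores every row in one pass.
        IDataView predictions = model.Transform(input);

        return mlContext.Data
            .CreateEnumerable<SentimentOutput>(predictions, reuseRowObject: false)
            .ToList();
    }

    public static void Main()
    {
        var mlContext = new MLContext(seed: 0);
        ITransformer model = TrainToyModel(mlContext);

        var comments = new List<SentimentInput>
        {
            new SentimentInput { SentimentText = "Great product" },
            new SentimentInput { SentimentText = "How rude" },
        };

        foreach (SentimentOutput result in PredictBatch(mlContext, model, comments))
            Console.WriteLine($"{result.Prediction} (score {result.Score})");
    }
}
```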
Important Features in ML.NET
Once you’ve got the basics down, you can use the following links to learn about some more advanced scenarios and subjects in ML.NET to enhance your machine learning experience:
- Training Image Classification models (DNN/TensorFlow native) with ML.NET: https://aka.ms/code-image-training
- Model Explainability (feature importance): https://aka.ms/code-mlnet-explainability
- Using huge and sparse datasets (thousands or millions of columns): https://aka.ms/code-sparse-datasets
- Deploy ML.NET model into highly scalable and multi-threaded ASP.NET Core apps and services: https://aka.ms/scalable
- Deploy ML.NET model into an Azure Function: https://aka.ms/code-mlnet-azure-functions
ML.NET machine learning tasks and scenarios
ML Task | Task description | Example scenarios |
---|---|---|
Binary classification | Classify data into two categories | Sentiment analysis (positive/negative), spam detection (spam/not spam) |
Multi-class classification | Classify data into three or more categories | Issue classification, iris flowers classification |
Regression | Predict a numeric value | Price prediction, demand prediction |
Recommendation | Suggest items to users based on user and item history | Product recommendation, movie recommendation |
Time series forecast | Predict future observations based on historical data | Sales forecasting |
Anomaly detection | Detect anomalies in imbalanced datasets | Credit Card Fraud Detection |
Time series spike detection | Detect spikes or anomaly change points in data over time | Sales spike detection |
Clustering | Group instances of data into groups that contain similar characteristics | Customer segmentation |
Ranking | Sort search results depending on the importance of each topic | Search engine result ranking |
Computer vision: Image Classification | Identify and interpret images | Image classification |
Computer vision: Object Detection | Detect multiple objects within the same picture/photo. | Object detection |
ML.NET Tools and Automated Machine Learning
Although writing the code to train ML.NET models is easy, choosing the correct data transformations and algorithms for your data and ML scenario can be a challenge, especially if you don’t have a data science background. However, with the preview release of Automated Machine Learning and tooling for ML.NET, Microsoft has automated the model selection process for you so that you can easily get started with machine learning in .NET without requiring prior machine learning knowledge.
The Automated Machine Learning feature in ML.NET (in short called AutoML) works locally on your own development computer and automatically builds and trains models with a combination of the best performing algorithm and settings. You just have to specify the machine learning task and supply the dataset, and AutoML chooses and outputs the highest quality model by trying out multiple combinations of algorithms and related algorithm options.
AutoML currently supports binary classification (e.g., sentiment analysis), multi-class classification (e.g., issue classification), and regression (e.g., price prediction), with support for more scenarios in the works.
Although you can use AutoML directly via the ML.NET AutoML API, ML.NET also offers tooling on top of AutoML to make machine learning in .NET even more approachable. The next sections introduce these tools.
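As a rough idea of what calling the AutoML API directly can look like, here is a sketch of a binary classification experiment. It assumes the Microsoft.ML.AutoML preview package is installed; the sentiment.csv path, the 60-second time budget, and the SentimentInput column layout are hypothetical, and the API may differ between preview versions.

```csharp
using System;
using Microsoft.ML;
using Microsoft.ML.AutoML;
using Microsoft.ML.Data;

// Hypothetical input schema; the column order is assumed.
public class SentimentInput
{
    [LoadColumn(0)]
    public string SentimentText { get; set; }

    [LoadColumn(1)]
    public bool Sentiment { get; set; }
}

public static class AutoMLExample
{
    // The time budget is the main setting: AutoML tries as many
    // trainer and option combinations as fit into it.
    public static BinaryClassificationExperiment CreateExperiment(
        MLContext mlContext, uint maxSeconds)
    {
        return mlContext.Auto()
            .CreateBinaryClassificationExperiment(maxSeconds);
    }

    public static void Main(string[] args)
    {
        // Hypothetical dataset path; pass your own file as the first argument.
        string dataPath = args.Length > 0 ? args[0] : "sentiment.csv";

        var mlContext = new MLContext();
        IDataView trainingData = mlContext.Data
            .LoadFromTextFile<SentimentInput>(dataPath, separatorChar: ',', hasHeader: true);

        ExperimentResult<BinaryClassificationMetrics> result =
            CreateExperiment(mlContext, maxSeconds: 60)
                .Execute(trainingData, labelColumnName: "Sentiment");

        // BestRun holds the winning trainer, its metrics, and the trained model.
        RunDetail<BinaryClassificationMetrics> best = result.BestRun;
        Console.WriteLine($"Best trainer: {best.TrainerName}");
        Console.WriteLine($"Accuracy: {best.ValidationMetrics.Accuracy}");

        ITransformer bestModel = best.Model;
        mlContext.Model.Save(bestModel, trainingData.Schema, "model.zip");
    }
}
```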
ML.NET Model Builder
You can use ML.NET Model Builder, a simple UI tool in Visual Studio, to build ML.NET models and generate model training and consumption code. The tool internally uses AutoML to choose the data transformations, algorithms, and algorithm options for your data that will produce the most accurate model.
You provide three things to Model Builder in order to get a trained machine learning model:
- The machine learning scenario
- Your dataset
- How long you would like to train
ML.NET CLI
If you don’t use Visual Studio or don’t work on Windows, ML.NET also provides cross-platform tooling so that you can still use AutoML to easily create machine learning models. You can install and run the ML.NET CLI (command-line interface), a dotnet Global Tool, on any command-prompt (Windows, macOS, or Linux) to generate high-quality ML.NET models based on training datasets you provide. Like Model Builder, the ML.NET CLI also generates sample C# code to run that model plus the C# code that was used to create and train it so that you can explore the algorithm and settings that AutoML chose.
To generate custom ML.NET models with the ML.NET CLI, you simply have to call the mlnet auto-train command and provide your dataset, ML task, and time to train as parameters, and the CLI will output the same summary information as in Model Builder. You can try out the ML.NET CLI at https://aka.ms/code-mlnet-cli.
Conclusion
In this post we have discussed the basic steps of programming in ML.NET. I will use these steps for machine learning (ML) programming in my coming posts. We have also looked at the ML.NET tools: Automated Machine Learning, Model Builder, and the ML.NET CLI.
In my next post, I will explain how to get started with ML.NET.
This post is part of “Machine-learning-Step by step”.