Classify the severity of restaurant health violations
In this post we learn how to build a multiclass classification model using Model Builder to categorize the risk level of restaurant violations found during health inspections.
I am going to create a C# .NET console application that categorizes the risk of health violations using a machine learning model built with Model Builder.
You will learn how to:
- Prepare and understand the data
- Create a Model Builder config file
- Choose a scenario
- Load data from a database
- Train the model
- Evaluate the model
- Use the model for predictions
Prerequisites
- Visual Studio 2022
- .NET 7
Create a console application
- Create a C# .NET 7 console application called “RestaurantViolations”.
- Install the ML.NET NuGet package (Microsoft.ML), as seen in the following figure:
Prepare the data
The data set used to train and evaluate the machine learning model is originally from the San Francisco Department of Public Health Restaurant Safety Scores. For convenience, the dataset has been condensed to only include the columns relevant to train the model and make predictions.
Download the dataset from Restaurant Safety Scores dataset, unzip it, and move it to the project root folder. The dataset contains a folder with two files:
- Dataset file: RestaurantScores.csv
- Database file: RestaurantScores.mdf
If you open the dataset file in Visual Studio, you see three columns and many rows. Each row contains information about violations observed during a Health Department inspection and a risk assessment of the threat those violations present to public health and safety. Three sample rows are shown in the following table:
InspectionType | ViolationDescription | RiskCategory |
---|---|---|
Routine – Unscheduled | Inadequately cleaned or sanitized food contact surfaces | Moderate Risk |
New Ownership | High risk vermin infestation | High Risk |
Routine – Unscheduled | Wiping cloths not clean or properly stored or inadequate sanitizer | Low Risk |
- InspectionType: the type of inspection. This can be a first-time inspection for a new establishment, a routine inspection, a complaint inspection, or one of many other types.
- ViolationDescription: a description of the violation found during the inspection.
- RiskCategory: the risk severity a violation poses to public health and safety.
The label is the column you want to predict. When performing a classification task, the goal is to assign a category (text or numerical). In this classification scenario, the severity of the violation is assigned the value of low, moderate, or high risk. Therefore, the Risk Category is the label. The features are the inputs you give the model to predict the label. In this case, the Inspection Type and Violation Description are used as features or inputs to predict the Risk Category.
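To make the split between features and label concrete, here is a minimal C# sketch that uses only the three sample rows from the table above. The tuple layout and variable names are my own choice for illustration. Model Builder generates its own ModelInput class later on, and this is not that code.

```csharp
using System;

// The three sample rows from the table above:
// (InspectionType, ViolationDescription, RiskCategory).
var rows = new (string InspectionType, string ViolationDescription, string RiskCategory)[]
{
    ("Routine – Unscheduled", "Inadequately cleaned or sanitized food contact surfaces", "Moderate Risk"),
    ("New Ownership", "High risk vermin infestation", "High Risk"),
    ("Routine – Unscheduled", "Wiping cloths not clean or properly stored or inadequate sanitizer", "Low Risk"),
};

foreach (var row in rows)
{
    // Features: the inputs the model is given.
    var features = (row.InspectionType, row.ViolationDescription);

    // Label: the value the model learns to predict.
    string label = row.RiskCategory;

    Console.WriteLine($"features = ({features.InspectionType}; {features.ViolationDescription}) -> label = {label}");
}
```

Each printed line pairs the two feature values with the label of the same row, which is the mapping the trained model has to learn.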
Create Model Builder Config File
When you first add Model Builder to the solution, it prompts you to create an mbconfig file. The mbconfig file keeps track of everything you do in Model Builder so that you can reopen the session.
- In Solution Explorer, right-click the RestaurantViolations project, and select Add > Machine Learning Model….
- Name the mbconfig project RestaurantViolationsPrediction, and click the Add button.
Choose a scenario
To train your model, select from the list of available machine learning scenarios provided by Model Builder. In this case, the scenario is Data classification as shown in the following figure:
Environment:
Open the Environment step and select Local (CPU). Training will run on your local machine.
Load the data
Model Builder accepts data from a SQL Server database or a local file in csv, tsv, or txt format.
- In the data step of the Model Builder tool, select SQL Server from the data source type selection.
- Select the Choose data source button.
- In the Choose Data Source dialog, select Microsoft SQL Server Database File.
- Uncheck the Always use this selection checkbox and click Continue, as shown in the following figure:
- Under Database file name, select Browse and select the downloaded RestaurantScores.mdf file, as follows.
- Select Test Connection to verify that the connection succeeds.
- Select OK.
- From the Table dropdown choose Violations.
- In the Column to predict (Label) dropdown choose RiskCategory.
- Leave the default selections in Advanced data options as shown in the following figure:
- Click the Next step button to move to the train step in Model Builder.
Train the model
The machine learning task used to train the risk classification model in this post is multiclass classification. During the model training process, Model Builder trains separate models using different multiclass classification algorithms and settings to find the best performing model for your dataset.
The time required for the model to train is proportional to the amount of data. Model Builder automatically selects a default value for Time to train (seconds) based on the size of your data source.
- Open the Train step in Visual Studio. Here Model Builder sets the value of Time to train (seconds) to 60 seconds. Training for a longer period of time allows Model Builder to explore a larger number of algorithms and combinations of parameters in search of the best model.
- Click Start Training.
Note: If you get an error, it may be caused by the NuGet package version. If you installed a preview version, remove it and install the latest stable version. I installed Microsoft.ML version 1.7.1.
Throughout the training process, progress data is displayed in the Training results section of the train step.
- Status displays the completion status of the training process.
- Best accuracy displays the accuracy of the best performing model found by Model Builder so far. Higher accuracy means the model predicted more correctly on test data.
- Best algorithm displays the name of the best performing algorithm found by Model Builder so far.
- Last algorithm displays the name of the algorithm most recently used by Model Builder to train the model.
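As a rough illustration of what an accuracy number means, the sketch below counts how often a predicted label matches the actual one. Both label arrays are made up for this example and do not come from the dataset or from Model Builder, which computes its metrics internally on its own test data.

```csharp
using System;
using System.Linq;

// Hypothetical labels, invented for illustration only.
string[] actual    = { "Low Risk", "Moderate Risk", "High Risk", "Low Risk", "Moderate Risk" };
string[] predicted = { "Low Risk", "Moderate Risk", "Low Risk",  "Low Risk", "High Risk" };

// Count the positions where the prediction equals the actual label.
int correct = actual.Zip(predicted, (a, p) => a == p).Count(isMatch => isMatch);

// Accuracy is the fraction of predictions that were correct.
double accuracy = (double)correct / actual.Length;

Console.WriteLine($"Correct predictions: {correct} of {actual.Length}");
Console.WriteLine($"Accuracy: {accuracy:0.00}");
```

In this made-up sample three of the five predictions match, so the accuracy is 0.6. A model that gets more rows right on the test data moves this value closer to 1.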
Code generation
Once training is complete, the mbconfig file contains the generated model, called RestaurantViolationsPrediction.zip, along with two C# files:
- RestaurantViolationsPrediction.consumption.cs: This file has a public method that will load the model and create a prediction engine with it and return the prediction.
- RestaurantViolationsPrediction.training.cs: This file consists of the training pipeline that Model Builder came up with to build the best model including any hyperparameters that it used.
This training run produced the results displayed in the following figure:
Click the Next step button to navigate to the evaluate step.
Evaluate the model
Clicking Next step or Evaluate opens the Evaluate step.
The training step produces a single model, the one with the best performance. In the evaluate step of Model Builder, the Best model section shows the algorithm used by that model in the Model entry, along with its metrics in Accuracy, as seen in the following image:
Additionally, in the Output window of Visual Studio, there will be a summary table containing top models and their metrics as shown in the following figure:
This section will also allow you to test your model by performing a single prediction. It will offer text boxes to fill in values and you can click the Predict button to get a prediction from the best model. By default this will be filled in by a random row in your dataset.
Now click the Predict button to see the results displayed in the following figure:
As you can see, the prediction result is:
- Moderate risk 100%
- Low risk <1%
- High risk <1%
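A display like this can be produced by ranking one score per class and formatting each score as a percentage. The sketch below is only an illustration of that idea: the class names, the score values, and the Format helper are hypothetical and are not taken from Model Builder or from the generated code.

```csharp
using System;
using System.Linq;

// Hypothetical class names and scores, chosen only to mimic the display above.
string[] labels = { "Low Risk", "Moderate Risk", "High Risk" };
float[] scores = { 0.003f, 0.996f, 0.001f };

// Hypothetical helper: scores below one percent are shown as "<1%".
string Format(float value) => value < 0.01f ? "<1%" : $"{Math.Round(value * 100)}%";

// Pair each label with its score and sort from most to least likely.
var ranked = labels
    .Zip(scores, (name, value) => (Label: name, Score: value))
    .OrderByDescending(pair => pair.Score)
    .ToArray();

foreach (var entry in ranked)
{
    Console.WriteLine($"{entry.Label} {Format(entry.Score)}");
}
```

The highest-scoring class ends up first in the list, which is also the class reported as the predicted label.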
Complete Program.cs with the code generated in the training step
As we saw in the training step, two files are generated. One of them, RestaurantViolationsPrediction.consumption.cs, contains the ModelInput and ModelOutput classes that we need in Program.cs. Using this file, we can complete Program.cs with the following code:
using static RestaurantViolations.RestaurantViolationsPrediction;
// Create sample data
ModelInput input = new ModelInput
{
    InspectionType = "Complaint",
    ViolationDescription = "Inadequate sewage or wastewater disposal"
};
// Make prediction
ModelOutput result = Predict(input);
// Print Prediction
Console.WriteLine($"Inspection type: {input.InspectionType}");
Console.WriteLine($"Violation description: {input.ViolationDescription}");
Console.WriteLine($"Predicted risk category: {result.PredictedLabel}");
Console.ReadKey();
Copy the code above and paste it into Program.cs.
Run the application
Run the application in Visual Studio by pressing Ctrl+F5:
The result looks like the following:
Consume the model
This step offers project templates that you can use to consume the model. Choose the one that best suits how you want to serve the model:
- Console App
- Web API
Create a console app
- In Visual Studio, add a new console app (.NET 7) to the solution and name it RestaurantViolationsPrediction_Console.
- Add a project reference from this project to the RestaurantViolations project.
- Copy RestaurantViolationsPrediction.zip from the RestaurantViolations project and add it to this console app.
- Copy the following code into Program.cs of this project:
using Microsoft.ML;
using RestaurantViolations;
using static RestaurantViolations.RestaurantViolationsPrediction;

MLContext mlContext = new MLContext();

// Load the trained model; the .zip file is expected in the application's base directory
string modelPath = AppDomain.CurrentDomain.BaseDirectory + "RestaurantViolationsPrediction.zip";
var mlModel = mlContext.Model.Load(modelPath, out var modelInputSchema);
var predEngine = mlContext.Model.CreatePredictionEngine<ModelInput, ModelOutput>(mlModel);

// Create sample data
ModelInput input = new ModelInput
{
    InspectionType = "Complaint",
    ViolationDescription = "Inadequate sewage or wastewater disposal"
};

// Make prediction
ModelOutput result = predEngine.Predict(input);

// Print Prediction
Console.WriteLine($"Inspection type: {input.InspectionType}");
Console.WriteLine($"Violation description: {input.ViolationDescription}");
Console.WriteLine($"Predicted risk category: {result.PredictedLabel}");
Console.ReadKey();
- Run this application.
- The output generated by the program should look similar to the snippet below:
The source code can be found on my GitHub.
Conclusion
In this post we have successfully built a machine learning model that categorizes the risk of restaurant health violations using Model Builder. We loaded the data from a database, trained and evaluated the model, and consumed it from a console app.
My next post describes Azure Machine Learning Workspace.
This post is part of “ML.NET-Step by step”.