
Classify the severity of restaurant health violations

In this post we learn how to build a multiclass classification model using Model Builder to categorize the risk level of restaurant violations found during health inspections.

I am going to create a C# .NET console application that categorizes the risk of health violations using a machine learning model built with Model Builder.

You will learn how to:

  • Prepare and understand the data
  • Create a Model Builder config file
  • Choose a scenario
  • Load data from a database
  • Train the model
  • Evaluate the model
  • Use the model for predictions

Prerequisites

  • Visual Studio 2022
  • .NET 7

Create a console application

  1. Create a C# .NET 7 console application called “RestaurantViolations”.
  2. Install the ML.NET NuGet package (Microsoft.ML), as seen in the following figure: restaurant-health-violations-1.png

Prepare the data

The dataset used to train and evaluate the machine learning model is originally from the San Francisco Department of Public Health Restaurant Safety Scores. For convenience, the dataset has been condensed to include only the columns relevant to training the model and making predictions.

Download the Restaurant Safety Scores dataset, unzip it, and move it to the project root folder. The download contains a folder with two files:

  • Dataset file: RestaurantScores.csv
  • Database file: RestaurantScores.mdf

If you open the dataset file in Visual Studio, you see three columns and many rows. Each row contains information about violations observed during an inspection by the Health Department and a risk assessment of the threat those violations present to public health and safety. Three sample rows are shown in the following table:

InspectionType | ViolationDescription | RiskCategory
Routine – Unscheduled | Inadequately cleaned or sanitized food contact surfaces | Moderate Risk
New Ownership | High risk vermin infestation | High Risk
Routine – Unscheduled | Wiping cloths not clean or properly stored or inadequate sanitizer | Low Risk

  • InspectionType: the type of inspection. This can be a first-time inspection for a new establishment, a routine inspection, a complaint inspection, or one of many other types.
  • ViolationDescription: a description of the violation found during the inspection.
  • RiskCategory: the risk severity a violation poses to public health and safety.

The label is the column you want to predict. When performing a classification task, the goal is to assign a category (text or numerical). In this classification scenario, the severity of the violation is assigned the value of low, moderate, or high risk. Therefore, the Risk Category is the label. The features are the inputs you give the model to predict the label. In this case, the Inspection Type and Violation Description are used as features or inputs to predict the Risk Category.
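To make the split between features and label concrete, here is a minimal sketch in plain C# (no ML.NET required). It reuses the three sample rows from the table above. The variable names are only illustrative and are not part of the code Model Builder generates.

```csharp
using System;
using System.Linq;

// Three sample rows from the table above: two feature columns followed by the label column.
var rows = new (string InspectionType, string ViolationDescription, string RiskCategory)[]
{
    ("Routine – Unscheduled", "Inadequately cleaned or sanitized food contact surfaces", "Moderate Risk"),
    ("New Ownership", "High risk vermin infestation", "High Risk"),
    ("Routine – Unscheduled", "Wiping cloths not clean or properly stored or inadequate sanitizer", "Low Risk"),
};

// Features: the inputs given to the model.
var features = rows.Select(r => (r.InspectionType, r.ViolationDescription)).ToArray();

// Label: the distinct categories the model learns to predict.
var labels = rows.Select(r => r.RiskCategory)
                 .Distinct()
                 .OrderBy(label => label, StringComparer.Ordinal)
                 .ToArray();

Console.WriteLine($"Rows: {features.Length}");
Console.WriteLine($"Label classes: {string.Join(", ", labels)}");
```

Because the label has three possible values, this is a multiclass problem rather than a binary one.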

Create Model Builder Config File

When you first add Model Builder to the solution, it prompts you to create an mbconfig file. The mbconfig file keeps track of everything you do in Model Builder so that you can reopen the session.

  1. In Solution Explorer, right-click the RestaurantViolations project, and select Add > Machine Learning Model….
  2. Name the mbconfig project RestaurantViolationsPrediction, and click the Add button.

Choose a scenario

To train your model, select from the list of available machine learning scenarios provided by Model Builder. In this case, the scenario is Data classification as shown in the following figure:

restaurant-health-violations-2.png
Select Scenario: Data classification

Environment:

Select the Environment step and then select Local (CPU). Training runs locally on your machine.

Load the data

Model Builder accepts data from a SQL Server database or a local file in csv, tsv, or txt format.

  1. In the data step of the Model Builder tool, select SQL Server from the data source type selection.
  2. Select the Choose data source button.
    1. In the Choose Data Source dialog, select Microsoft SQL Server Database File.
    2. Uncheck the Always use this selection checkbox and click Continue, as shown in the following figure: restaurant-health-violations-3.png
    3. Next to Database file name, select Browse and select the downloaded RestaurantScores.mdf file, as follows: restaurant-health-violations-4.png
    4. Select Test Connection to verify that the connection succeeds.
    5. Select OK.
  3. From the Table dropdown, choose Violations.
  4. In the Column to predict (Label) dropdown, choose RiskCategory.
  5. Leave the default selections in Advanced data options as shown in the following figure:restaurant-health-violations-5.png
  6. Click the Next step button to move to the train step in Model Builder.

Train the model

The machine learning task used to train the risk classification model in this post is multiclass classification. During training, Model Builder trains separate models using different multiclass classification algorithms and settings to find the best performing model for your dataset.

The time required for the model to train is proportional to the amount of data. Model Builder automatically selects a default value for Time to train (seconds) based on the size of your data source.

  1. Select the Train step in Visual Studio. Model Builder sets the value of Time to train (seconds) to 60 seconds. Training for a longer period allows Model Builder to explore a larger number of algorithms and combinations of parameters in search of the best model.
  2. Click Start Training.

Note: If you get an error, it may be caused by the NuGet package version. If you installed a preview version, remove it and install the latest stable version. I installed Microsoft.ML version 1.7.1.

Throughout the training process, progress data is displayed in the Training results section of the train step.

  • Status displays the completion status of the training process.
  • Best accuracy displays the accuracy of the best performing model found by Model Builder so far. Higher accuracy means the model predicted more correctly on test data.
  • Best algorithm displays the name of the best performing algorithm found by Model Builder so far.
  • Last algorithm displays the name of the algorithm most recently used by Model Builder to train the model.
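As a rough illustration of what an accuracy figure means, the sketch below computes the fraction of correct predictions over a small test set. It assumes accuracy is the plain ratio of correct predictions to total predictions. The (actual, predicted) pairs are hypothetical and are not results from this dataset.

```csharp
using System;
using System.Linq;

// Hypothetical (actual, predicted) pairs for a held-out test set; not real results.
var outcomes = new (string Actual, string Predicted)[]
{
    ("High Risk", "High Risk"),
    ("Low Risk", "Low Risk"),
    ("Moderate Risk", "Low Risk"),      // the only misclassified row
    ("Moderate Risk", "Moderate Risk"),
};

// Accuracy = number of correct predictions / total number of predictions.
int correct = outcomes.Count(o => o.Actual == o.Predicted);
double accuracy = (double)correct / outcomes.Length;

Console.WriteLine($"Correct: {correct} of {outcomes.Length}");
Console.WriteLine($"Accuracy: {accuracy:P0}");
```

Three of the four hypothetical predictions match, so the accuracy here is 0.75.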

Code generation

Once training is complete, the mbconfig file contains the generated model, called RestaurantViolationsPrediction.zip, along with two C# files:

  • RestaurantViolationsPrediction.consumption.cs: This file has a public method that will load the model and create a prediction engine with it and return the prediction.
  • RestaurantViolationsPrediction.training.cs: This file consists of the training pipeline that Model Builder came up with to build the best model including any hyperparameters that it used.
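To give a feel for what a training pipeline of this kind can look like, here is a minimal hand-written sketch. It is not the code Model Builder generates: the choice of the SdcaMaximumEntropy trainer, the intermediate column names, and the ViolationRow and RiskPrediction classes are all assumptions made for illustration. It trains on three in-memory rows only, so the resulting model is not meaningful. The sketch requires the Microsoft.ML NuGet package.

```csharp
using System;
using Microsoft.ML;

var mlContext = new MLContext(seed: 0);

// Tiny in-memory sample (rows from the table earlier in this post), for illustration only.
var samples = new[]
{
    new ViolationRow { InspectionType = "Routine – Unscheduled", ViolationDescription = "Inadequately cleaned or sanitized food contact surfaces", RiskCategory = "Moderate Risk" },
    new ViolationRow { InspectionType = "New Ownership", ViolationDescription = "High risk vermin infestation", RiskCategory = "High Risk" },
    new ViolationRow { InspectionType = "Routine – Unscheduled", ViolationDescription = "Wiping cloths not clean or properly stored or inadequate sanitizer", RiskCategory = "Low Risk" },
};
IDataView data = mlContext.Data.LoadFromEnumerable(samples);

// Assumed pipeline: map the label to a key, featurize both text columns,
// concatenate them, train a multiclass trainer, and map the predicted key back to text.
var pipeline = mlContext.Transforms.Conversion.MapValueToKey("Label", nameof(ViolationRow.RiskCategory))
    .Append(mlContext.Transforms.Text.FeaturizeText("InspectionTypeFeatures", nameof(ViolationRow.InspectionType)))
    .Append(mlContext.Transforms.Text.FeaturizeText("ViolationFeatures", nameof(ViolationRow.ViolationDescription)))
    .Append(mlContext.Transforms.Concatenate("Features", "InspectionTypeFeatures", "ViolationFeatures"))
    .Append(mlContext.MulticlassClassification.Trainers.SdcaMaximumEntropy("Label", "Features"))
    .Append(mlContext.Transforms.Conversion.MapKeyToValue("PredictedLabel"));

ITransformer model = pipeline.Fit(data);

// Create a prediction engine and classify one new violation.
var engine = mlContext.Model.CreatePredictionEngine<ViolationRow, RiskPrediction>(model);
RiskPrediction prediction = engine.Predict(new ViolationRow
{
    InspectionType = "Complaint",
    ViolationDescription = "Inadequate sewage or wastewater disposal",
});
Console.WriteLine($"Predicted risk category: {prediction.PredictedLabel}");

// Hypothetical input type: two features plus the label column.
public class ViolationRow
{
    public string InspectionType { get; set; } = "";
    public string ViolationDescription { get; set; } = "";
    public string RiskCategory { get; set; } = "";
}

// Hypothetical output type: only the predicted label is read back.
public class RiskPrediction
{
    public string PredictedLabel { get; set; } = "";
}
```

The generated training file may use a different trainer and hyperparameters, since Model Builder picks whichever combination scored best during its search.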

This training run produced the results displayed in the following figure:

restaurant-health-violations-6.png
Result of training

Click the Next step button to navigate to the evaluate step.

Evaluate the model

Clicking Next step or Evaluate displays the Evaluate step.

The result of the training step is the single model that performed best. In the evaluate step of the Model Builder tool, the Best model section contains the algorithm used by the best performing model in the Model entry, along with the metrics for that model in Accuracy, as seen in the following image:

restaurant-health-violations-7.png
The result of the training step

Additionally, in the Output window of Visual Studio, there will be a summary table containing top models and their metrics as shown in the following figure:

restaurant-health-violations-8.png
summary table containing top models and their metrics

This section also lets you test your model by performing a single prediction. It offers text boxes to fill in values, and you can click the Predict button to get a prediction from the best model. By default, the text boxes are filled in with a random row from your dataset.

Click the Predict button to see the results displayed in the following figure:

restaurant-health-violations-9.png
Prediction from the best model

As you can see, the prediction result is:

  • Moderate risk 100%
  • Low risk <1%
  • High risk <1%
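A multiclass model produces one score per class, and the predicted label is the class with the highest score. The sketch below shows that selection step in plain C#. The score values are hypothetical, chosen only to resemble the percentages listed above, and are not actual model output.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical per-class scores, chosen to resemble the result above (not actual model output).
var scores = new Dictionary<string, float>
{
    ["Moderate Risk"] = 0.998f,
    ["Low Risk"] = 0.001f,
    ["High Risk"] = 0.001f,
};

// The predicted label is the class with the highest score (MaxBy requires .NET 6 or later).
KeyValuePair<string, float> best = scores.MaxBy(pair => pair.Value);

Console.WriteLine($"Predicted risk category: {best.Key}");
// prints "Predicted risk category: Moderate Risk"
```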

Complete Program.cs with the code generated in the training step

As noted in the training step, two files are generated. One of them, RestaurantViolationsPrediction.consumption.cs, contains the ModelInput and ModelOutput classes that Program.cs needs. Using this file, we can complete Program.cs with the following code:

using static RestaurantViolations.RestaurantViolationsPrediction;

// Create sample data
ModelInput input = new ModelInput
{
    InspectionType = "Complaint",
    ViolationDescription = "Inadequate sewage or wastewater disposal"
};

// Make prediction
ModelOutput result = Predict(input);

// Print Prediction
Console.WriteLine($"Inspection type: {input.InspectionType}");
Console.WriteLine($"Violation description: {input.ViolationDescription}");
Console.WriteLine($"Predicted risk category: {result.PredictedLabel}");
Console.ReadKey();

Copy the code above and paste it into Program.cs.

Run the application

Run the application in Visual Studio by pressing Ctrl+F5.

The result is as follows:

restaurant-health-violations-10.png
prediction by running program code

Consume the model

This step offers project templates that you can use to consume the model. Choose the one that best suits how you want to serve the model:

  • Console App
  • Web API

Create a console app

  1. In Visual Studio, add a console app (.NET 7) to the solution and name it RestaurantViolationsPrediction_Console.
  2. Add a project reference to the RestaurantViolations project.
  3. Copy RestaurantViolationsPrediction.zip from the RestaurantViolations project into this project. The code below loads the file from the application's base directory, so make sure it is copied to the output directory.
  4. Copy the following code into Program.cs of this project:
    using Microsoft.ML;
    using RestaurantViolations;
    using static RestaurantViolations.RestaurantViolationsPrediction;
    
    // Load the trained model from the application's output directory
    MLContext mlContext = new MLContext();
    string modelPath = Path.Combine(AppDomain.CurrentDomain.BaseDirectory, "RestaurantViolationsPrediction.zip");
    var mlModel = mlContext.Model.Load(modelPath, out var modelInputSchema);

    // Create a prediction engine for single predictions
    var predEngine = mlContext.Model.CreatePredictionEngine<ModelInput, ModelOutput>(mlModel);
    ModelInput input = new ModelInput
    {
        InspectionType = "Complaint",
        ViolationDescription = "Inadequate sewage or wastewater disposal"
    };
    
    ModelOutput result = predEngine.Predict(input);
    // Print Prediction
    Console.WriteLine($"Inspection type: {input.InspectionType}");
    Console.WriteLine($"Violation description: {input.ViolationDescription}");
    Console.WriteLine($"Predicted risk category: {result.PredictedLabel}");
    Console.ReadKey();
    
  5. Run this application.
  6. The output generated by the program should look similar to the following figure:

    restaurant-health-violations-11.png
    Output from the Console App

The source code can be found on my GitHub.

Conclusion

In this post, we built a machine learning model with Model Builder to categorize the risk of restaurant health violations. We loaded data from a database, trained and evaluated the model, and consumed it from a console app.

My next post describes the Azure Machine Learning workspace.

 

This post is part of “ML.NET-Step by step”.
