Welcome to part 3 of the Hoeffding Tree machine learning series, where you learn how to build a scoring model in studio. The third video is now available here. For a refresher on the training phase, check out our previous videos and blogs:
- Creating a Training Model (Part 1) – video and blog
- Creating a Training Project (Part 2) – video and blog
Now, let’s get into part 3, and then take a glimpse at what’s next.
Summary
Part 3 is the beginning of the scoring phase, where you create a scoring model that uses your training model to make predictions about future events. You can start your scoring as soon as there’s trained data in the database. But since the scoring and training accuracy increase in parallel, it’s best practice to wait until the training accuracy reaches your desired percentage before you commence scoring.
At runtime, the scoring model stream regularly checks for a change in the training model. If there is a change, the scoring model stream pulls from the most recent training content. For things to go smoothly, you need to use specific Hoeffding Tree scoring input and output schemas.
Input schema
[IS] [IDS] +
Specify the ID column and the data features.
- ID column: Use either an integer or a string.
- Data features: Use the same columns as those in the training function: an integer, a double, or a string. Like before, there’s no limit to the number of columns.
Note that there’s no string classifier column like we had for the training function. Why? Because we don’t yet know whether a claim is legitimate or fraudulent. The scoring model predicts the classifier using the latest training content.
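To make the input rule concrete, here is a minimal Python sketch that reads the schema notation above as a regular expression over one-letter type codes. It assumes I, D, and S stand for integer, double, and string; the function names are hypothetical and are not part of the product.

```python
import re

# Assumption: I = integer, D = double, S = string, and the notation means
# one ID column ([IS]) followed by one or more feature columns ([IDS]+).
SCORING_INPUT = re.compile(r"[IS][IDS]+")

TYPE_CODES = {"integer": "I", "double": "D", "string": "S"}


def signature(column_types):
    """Collapse a list of column type names into a string such as 'IDS'."""
    return "".join(TYPE_CODES[t] for t in column_types)


def is_valid_scoring_input(column_types):
    """Return True if the column types fit the scoring input pattern."""
    return SCORING_INPUT.fullmatch(signature(column_types)) is not None
```

For example, `is_valid_scoring_input(["integer", "double", "string"])` passes, while a schema whose ID column is a double, or one with no feature columns at all, does not.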
Output schema
[IS] SD
Specify the ID column, the prediction class, and the accuracy (the probability of the predicted class).
- ID column: Use either an integer or a string.
- Prediction class: Specify the classifier, represented by the string ‘Yes’ (fraudulent claim) or ‘No’ (legitimate claim).
- Accuracy: Specify a double, which holds the probability of the predicted class.
For more details on these schemas and the scoring algorithm itself, check out Hoeffding Tree and Decision Tree Scoring.
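As a rough illustration of what one output row looks like, the sketch below models it as a small Python record. The class and field names are hypothetical, and the 0 to 1 range for the probability is an assumption. The post only says the column is a double.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class ScoredRow:
    """One output row: ID column, prediction class, probability."""
    row_id: Union[int, str]   # integer or string ID
    prediction: str           # 'Yes' (fraudulent) or 'No' (legitimate)
    probability: float        # assumed to lie between 0.0 and 1.0

    def __post_init__(self):
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(self.row_id, bool) or not isinstance(self.row_id, (int, str)):
            raise TypeError("ID must be an integer or a string")
        if self.prediction not in ("Yes", "No"):
            raise ValueError("prediction class must be 'Yes' or 'No'")
        if not 0.0 <= self.probability <= 1.0:
            raise ValueError("probability must be between 0.0 and 1.0")
```

A row such as `ScoredRow(101, "Yes", 0.87)` would read as "claim 101 is predicted fraudulent with probability 0.87".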
With that, we now move on to creating the scoring model.
Creating and building a scoring model
In studio, drill down in your HANA data service to get to the Models folder, and select Discover to see existing models. You’ll see the training model you created earlier in the series, as shown below:
Choose Add Model, then select the model to open its properties. In the General tab, fill in the following fields:
Choose HoeffdingTreeScoring as the machine learning function. The above input and output schemas match the source data you’ll use in the next video. For this simple example, we set the sync point to 5 rows. This means that for every 5 rows, the scoring model will sync with the database to pull in trained data.
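The sync point can be pictured with a short Python simulation. This is only a sketch of the idea, not how the streaming engine is implemented: the `ScoringStream` class, the dictionary standing in for the database, and the choice to sync right after every fifth row are all assumptions.

```python
class ScoringStream:
    """Toy scoring stream that re-checks the training model every N rows."""

    def __init__(self, model_store, sync_point=5):
        # model_store is a hypothetical stand-in for the database:
        # a dict with a "version" number and a "predict" callable.
        self.model_store = model_store
        self.sync_point = sync_point
        self.rows_seen = 0
        self.version = None
        self.predict = None
        self.reloads = 0
        self._sync()  # load whatever trained content exists at startup

    def _sync(self):
        """Pull the latest training content only if it has changed."""
        latest = self.model_store["version"]
        if latest != self.version:
            self.version = latest
            self.predict = self.model_store["predict"]
            self.reloads += 1

    def score(self, row):
        """Score one row, then sync once a multiple of sync_point is reached."""
        result = self.predict(row)
        self.rows_seen += 1
        if self.rows_seen % self.sync_point == 0:
            self._sync()
        return result
```

With a sync point of 5, a training update that lands while rows 1 to 5 are being scored only takes effect from row 6 onward. A smaller sync point picks up changes sooner at the cost of more frequent checks.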
Finally, choose which training model to reference. In the video, this was ‘hoeffdingtrain’. A scoring model must reference a training model; without trained data, there’s nothing to score.
Don’t worry about the Parameters tab, because that only applies to the Hoeffding Tree training function. To learn more about model properties, here’s a handy reference.
A couple of things to note before moving on. If you’re using a streaming plugin older than SP3, you must manually enter the input schema and the name of the training model you’re referencing. For SP3 and later, though, the input schema field auto-populates based on what training model you choose from a dropdown list of imported models.
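If you do have to enter the input schema by hand, it helps to remember how it relates to the training schema: same ID and feature columns, minus the classifier. The Python sketch below shows that relationship. It assumes the training schema ends with the string classifier column, and the column names are made up for illustration. It is not a description of how the plugin auto-populates the field.

```python
def scoring_input_schema(training_schema):
    """Derive a scoring input schema by dropping the trailing classifier.

    training_schema is a list of (column_name, type_name) pairs, assumed
    to end with the string classifier column.
    """
    if len(training_schema) < 3:
        raise ValueError("expected an ID, at least one feature, and a classifier")
    # Split off the last column; everything before it is kept as-is.
    *id_and_features, (label_name, label_type) = training_schema
    if label_type != "string":
        raise ValueError(f"classifier column {label_name!r} should be a string")
    return id_and_features


# Hypothetical training schema for the insurance example.
training = [
    ("id", "integer"),
    ("age", "integer"),
    ("amount", "double"),
    ("fraud", "string"),
]
scoring = scoring_input_schema(training)
```

Here `scoring` keeps `id`, `age`, and `amount`, and leaves out `fraud`, which is the column the scoring model predicts.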
After you’ve filled in the properties, right-click the new model and rename it:
What's next?
Now that you’ve created your own scoring model, get ready to put it into effect! In part 4, you’ll build a project that runs the scoring model to predict whether future insurance claims are legitimate.
For more on machine learning models, check out Model Management in the Streaming Analytics Developer Guide. To learn more about creating machine learning models in Web IDE, check out this blog post.