Background
Automated Machine Learning (AutoML) refers to the process of automating the end-to-end tasks of applying machine learning to real-world problems.
AutoML frameworks streamline machine learning by automating data preprocessing, feature selection, model selection, hyperparameter tuning, and even model deployment. This lets users rapidly prototype and develop high-quality models while lowering the barrier to working with ML models and pipelines.
This blog focuses on FLAML (A Fast and Lightweight AutoML Library), Microsoft’s AutoML library. FLAML is designed to automate the machine learning process while minimizing computational resources and time. It excels at simplifying machine learning workflows, particularly model training and optimization, and it also allows customization of how models are created, trained, and optimized. This simplicity and flexibility make FLAML a strong choice for both beginners and experienced practitioners looking to streamline their workflows.
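To make that list a bit more concrete, here is a deliberately naive sketch of two of those steps, model selection and hyperparameter tuning, written with plain scikit-learn (FLAML is not used here). It simply tries every configuration in a small, hand-picked grid and keeps the one with the best cross-validated accuracy. The candidate learners and grids are arbitrary choices for illustration; libraries like FLAML exist precisely to replace this kind of brute-force loop with a much more efficient search.

```python
# Deliberately naive illustration of model selection + hyperparameter tuning:
# exhaustive grid search with cross-validation. This is NOT FLAML's search
# strategy; it only shows what an AutoML library automates (far more efficiently).
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Candidate learners, each paired with a small hand-picked hyperparameter grid.
CANDIDATES = {
    "logistic_regression": (
        LogisticRegression,
        [{"C": c, "max_iter": 1000} for c in (0.1, 1.0, 10.0)],
    ),
    "random_forest": (
        RandomForestClassifier,
        [{"n_estimators": n, "random_state": 0} for n in (10, 50)],
    ),
}


def naive_automl(X, y, cv=5):
    """Return (learner_name, params, mean_cv_accuracy) of the best configuration tried."""
    best_name, best_params, best_score = None, None, float("-inf")
    for name, (estimator_cls, grid) in CANDIDATES.items():
        for params in grid:
            score = cross_val_score(estimator_cls(**params), X, y, cv=cv, scoring="accuracy").mean()
            if score > best_score:
                best_name, best_params, best_score = name, params, score
    return best_name, best_params, best_score


if __name__ == "__main__":
    X, y = load_iris(return_X_y=True)
    name, params, score = naive_automl(X, y)
    print(f"best learner: {name}, params: {params}, CV accuracy: {score:.3f}")
```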
Prerequisite Steps
Download the sample repository here. This repo contains all the code and data for demoing the FLAML framework.
Step-by-Step Example
For a gentle introduction to FLAML, the example below will showcase the framework with the Iris dataset. The associated code can be found in this notebook.
For those unfamiliar, this dataset contains measurements of flowers from three species of Iris (setosa, virginica, and versicolor). Each row describes a flower's sepals and petals (the length and width of each) along with the species it belongs to. The ML task is therefore classification: predicting each flower's species from its sepal and petal measurements.
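If you want a quick look at the data before opening the notebook, scikit-learn ships its own copy of Iris. The sketch below uses `sklearn.datasets.load_iris`, which is not part of the sample repo (the notebook reads the data from SAP Datasphere or from a CSV instead), just to show the feature columns and the class balance.

```python
# Inspect the Iris dataset using the copy bundled with scikit-learn.
# The sample notebook loads Iris from SAP Datasphere or a CSV in the repo instead.
from sklearn.datasets import load_iris

iris = load_iris(as_frame=True)
df = iris.frame  # 4 measurement columns plus an integer "target" column

# Map the integer target back to species names for readability.
df["species"] = df["target"].map(dict(enumerate(iris.target_names)))

print(df.shape)                       # (150, 6) after adding "species"
print(list(iris.feature_names))       # sepal/petal length and width, in cm
print(df["species"].value_counts())   # 50 rows per species
```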
Let’s dive into the code!
To begin, run the cells under the install header:
These cells are commented out by default, as these packages may already be installed in the working notebook environment. However, if that is not the case, please uncomment these cells and then run them.
For customers with data in SAP Datasphere, the fedml-databricks library installed above is used to connect to SAP Datasphere and retrieve data from it. Any flavor of FedML would work here, and this code runs on any platform, because the library is used only for the data connection; the additional features that come with each flavor of FedML are not required for this FLAML demo.
Note that the fourth cell can be run ahead of time if you know that the installed xgboost package is at or above version 2.1.0; otherwise, it can always be run later if needed.
An incompatible xgboost version surfaces as an error during the AutoML pipeline execution, which indicates that the fourth cell must be run.
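If you would rather check the installed version up front than wait for the error, a small standard-library-only sketch like the one below can tell you. The 2.1.0 threshold comes from the note above; the function names `release_tuple` and `needs_fourth_cell` are hypothetical and not part of the sample repo.

```python
# Check whether the installed xgboost is at or above 2.1.0, the version that
# calls for running the fourth install cell. Uses only the standard library.
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple

THRESHOLD = (2, 1, 0)


def release_tuple(v: str) -> Tuple[int, int, int]:
    """Extract the numeric release part, e.g. '2.1.0rc1' -> (2, 1, 0); pre-release tags are ignored."""
    m = re.match(r"(\d+)(?:\.(\d+))?(?:\.(\d+))?", v)
    if m is None:
        raise ValueError(f"unrecognized version string: {v!r}")
    return tuple(int(g) if g is not None else 0 for g in m.groups())


def needs_fourth_cell(installed: Optional[str] = None) -> bool:
    """True if the given (or installed) xgboost version is >= 2.1.0."""
    if installed is None:
        try:
            installed = version("xgboost")
        except PackageNotFoundError:
            return False  # xgboost is not installed, so this version check does not apply
    return release_tuple(installed) >= THRESHOLD


if __name__ == "__main__":
    print("run the fourth cell:", needs_fourth_cell())
```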
Next, make sure to import the necessary libraries into the notebook:
Not much to note here, but for more information on data retrieval and preprocessing steps, feel free to explore the fedml.py and automl.py files.
After this import step completes, please run through the Data Loading and Prep section:
This code handles the creation of the dataframe and prepares the data for the AutoML stage.
There are a few details to note here, starting with the second cell above:
If a connection to SAP Datasphere is established and the Iris data is available there, leave the cell as is and ignore the commented-out version of the function.
If neither of these is set up, comment out the current function and uncomment the version with the “csv_path” parameter. This version of the get_data function uses the CSV data in the repo to create the df, le, and encoded_cols variables.
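For readers curious about what the CSV branch might do, here is a hypothetical sketch of a `get_data(csv_path=...)` function built on pandas and scikit-learn's `LabelEncoder`. The label column name `species`, and the meaning of `encoded_cols` (the list of label-encoded columns), are assumptions inferred from the variable names above; the actual implementation in the repo may differ.

```python
# Hypothetical sketch of a CSV-based get_data; the repo's real function may differ.
# Assumptions: the label column is named "species", `le` encodes that column, and
# `encoded_cols` lists the columns that were label-encoded.
import io
from typing import List, Tuple

import pandas as pd
from sklearn.preprocessing import LabelEncoder


def get_data(csv_path, target: str = "species") -> Tuple[pd.DataFrame, LabelEncoder, List[str]]:
    df = pd.read_csv(csv_path)  # accepts a file path or any file-like object
    le = LabelEncoder()
    # Replace species names with integers 0..n_classes-1 (classes sorted alphabetically).
    df[target] = le.fit_transform(df[target])
    encoded_cols = [target]
    return df, le, encoded_cols


if __name__ == "__main__":
    sample = io.StringIO(
        "sepal_length,sepal_width,petal_length,petal_width,species\n"
        "5.1,3.5,1.4,0.2,setosa\n"
        "7.0,3.2,4.7,1.4,versicolor\n"
        "6.3,3.3,6.0,2.5,virginica\n"
    )
    df, le, encoded_cols = get_data(sample)
    print(df["species"].tolist(), list(le.classes_), encoded_cols)
```

Because `le` is returned alongside the dataframe, the integer predictions produced later can be decoded back to species names with `le.inverse_transform`.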
Now, after running these cells, the fun begins with the start of the AutoML section:
The first cell here uses a helper function, infer_problem_type, from the automl.py file:
The returned variable, task, is part of the configuration for the FLAML AutoML pipeline. It defines which type of machine learning problem the pipeline should tackle: classification or regression. Note that FLAML is not limited to these two types of problems, but they are more than enough to showcase the library's capabilities.
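The real infer_problem_type lives in the repo's automl.py. As an illustration only, a simple heuristic of this kind might look like the sketch below; the signature, the `max_classes` cutoff, and the rules themselves are all assumptions rather than the repo's logic.

```python
# Hypothetical heuristic in the spirit of infer_problem_type; the real helper's
# logic and signature may differ.
import pandas as pd
from pandas.api.types import is_float_dtype, is_numeric_dtype


def infer_problem_type(y: pd.Series, max_classes: int = 20) -> str:
    """Guess 'classification' or 'regression' from the target column."""
    if not is_numeric_dtype(y):
        return "classification"  # string or categorical labels
    if is_float_dtype(y) and not (y.dropna() % 1 == 0).all():
        return "regression"  # non-integer values imply a continuous target
    # Integer-valued targets with few distinct values are treated as class labels.
    return "classification" if y.nunique() <= max_classes else "regression"


if __name__ == "__main__":
    print(infer_problem_type(pd.Series(["setosa", "versicolor", "virginica"])))
```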
The task variable completes the configuration for the AutoML pipeline, so the next cell can run the pipeline, which happens on the second line of that cell:
Once fit is called, the following output should appear from the cell:
Furthermore, the end of the output should look something like this:
With just a few simple steps, a machine learning model has been created! Better yet, FLAML searched across several commonly used learners and selected the best-performing one. It really is that easy with a framework like FLAML!
The remaining cells display the best model the pipeline created, the accuracy of the best model, and a dataframe that compares predictions versus actual values (from the test set):
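The last step boils down to comparing predictions with the held-out labels. A sketch of that comparison, decoding integer labels back to species names with a LabelEncoder (assumed to play the role of the `le` returned by get_data), might look like this; `compare_predictions` is a hypothetical helper, not a function from the repo, and `y_pred` could come from `automl.predict(X_test)`.

```python
# Sketch of the final evaluation step: accuracy plus a predicted-vs-actual
# table with labels decoded back to species names.
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import LabelEncoder


def compare_predictions(y_true, y_pred, le: LabelEncoder):
    """Return (accuracy, dataframe of actual vs. predicted species)."""
    acc = accuracy_score(y_true, y_pred)
    comparison = pd.DataFrame({
        "actual": le.inverse_transform(y_true),
        "predicted": le.inverse_transform(y_pred),
    })
    comparison["correct"] = comparison["actual"] == comparison["predicted"]
    return acc, comparison


if __name__ == "__main__":
    le = LabelEncoder().fit(["setosa", "versicolor", "virginica"])
    acc, table = compare_predictions([0, 1, 2, 2], [0, 1, 1, 2], le)
    print(acc)
    print(table)
```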
In short, the AutoML class from FLAML provides a robust ML pipeline that trains and tunes several built-in models, rapidly producing strong results for the ML use case at hand. The AutoML object also offers flexibility beyond this point, since it is reusable and highly customizable. For more details, refer to the underlying code of the fit function (located in automl.py in the FLAML repo), which documents all of the parameters available to the user.
Conclusion
Congratulations! You have completed the sample notebook, which demonstrates the utility of AutoML libraries. Hopefully, this helps you build more efficient and robust machine learning workflows.
For further information on FLAML, please refer to the FLAML documentation. For more experienced readers and those curious to learn more about AutoML frameworks, I encourage you to also check out AutoGluon and MLJAR. Additionally, please check out our sample repo, which contains SAP-related classification and regression tasks.
If you have any questions or would like more information, please reach out to us at paa@sap.com.
Thanks!