
First of all, an overview of the Logistic Regression method that we are going to implement using SAP Data Intelligence. Logistic Regression is a statistical method that was first used in the biological sciences in the early twentieth century and was later adopted in many social science applications. Logistic Regression is used when the dependent variable (target) is categorical.
For example:
To predict whether an email is spam (1) or not (0)
To predict whether a tumor is malignant (1) or not (0)
Consider a scenario where we need to classify whether an email is spam or not. If we used linear regression for this problem, we would need to set a threshold on which the classification is based. Say the actual class is malignant, the predicted continuous value is 0.4, and the threshold is 0.5; the data point would then be classified as not malignant, which can have serious consequences in a real application. Logistic Regression avoids this by predicting a probability between 0 and 1 instead of an unbounded continuous value. The following article shows how to implement the Logistic Regression statistical method on a dummy data set using SAP Data Intelligence.
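To make the contrast concrete, here is a minimal sketch (not part of the original walkthrough) of how the logistic (sigmoid) function maps an unbounded linear score to a probability that can then be thresholded:

```python
import numpy as np

def sigmoid(z):
    # Logistic function: squashes any real value into the (0, 1) range,
    # so the output can be read as a class probability.
    return 1.0 / (1.0 + np.exp(-z))

# A linear model can produce values outside [0, 1] (e.g. -0.3 or 1.7),
# which makes a fixed threshold unreliable; the logistic transform keeps
# the prediction interpretable as P(class = 1).
linear_scores = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
probabilities = sigmoid(linear_scores)
predictions = (probabilities >= 0.5).astype(int)  # threshold on the probability

print(probabilities)  # approx. [0.12 0.38 0.5  0.62 0.88]
print(predictions)    # [0 0 1 1 1]
```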
UPLOAD THE DATA TO THE S3 BUCKET
From the SAP DI Launchpad, go to Connection Management.
Click the icon for creating a connection.
Select S3 as the connection type.
Enter the connection details in the Data Intelligence Connection Management window.
Clicking Check Status shows whether the connection is working.
Open the Metadata Explorer to upload the data to the AWS S3 bucket.
In the Metadata Explorer, go to the Catalog and then select Browse Connections.
The Browse Connections window opens as shown in the screenshot.
Select the S3 Cube connection to upload the data to the S3 bucket.
Then open our directory.
Upload the data files by clicking the upload icon.
Select the file we want to upload to the S3 bucket and click Upload.
After uploading the files, the Upload Complete status is shown in green.
The file is now successfully uploaded to the S3 bucket.
Build the Pipeline in the Modeler Tile
Go to the Modeler tile and open the Modeler window. There are five tabs: Graphs, Operators, Repository, Configuration Types, and Data Types.
First, we have to create a graph: click the '+' icon on the Graphs tab.
Once the graph is created, search for the Read File operator in the Operators tab.
To access AWS S3 files in a pipeline, we can use the Read File operator, which can read from S3 directly.
Drag and drop the Read File operator onto the canvas.
Search for Wiretap in the Operators tab.
Drag and drop the Wiretap operator into the graph area.
Input port (ref): message.fileReference
A File Reference pointing to the file to be read. If the reference is a directory, nothing is done and no output is produced.
Output port (file): message.file
A File whose contents may be presented as a whole or in batches, according to the operator configuration.
Output port (error): message.error
An Error Message, in case an error was raised during an operation.
Here, we connect the Read File operator's message.file port to the Wiretap.
In the connection configuration, select "Connection Management" as the configuration type.
For the connection ID, select s3Cube.
We then have to select the path of the uploaded file in the Read File configuration.
We can check the configuration of Wiretap.
Now, save the graph with all the operations from the previous steps.
Once the graph is running, click on the running instance of the graph and click Open UI to see the output in wiretap.
Here, we can see the output of the data in the Wiretap.
USING PANDAS IN THE PYTHON OPERATOR FOR DATA WRANGLING
The Modeler window is open.
Go to the Operators tab and search for the Python3 Operator.
Drag and drop the Python3 Operator into the graph area.
After dragging, right-click the Python3 Operator and select Add Port to add the input and output ports.
Here, we have to add an input port and an output port for receiving and sending the data.
The following is a depiction of the pipeline; a ToString converter has been used to convert the data into a string.
Inside the Python operator, the data manipulation is performed, which requires the pandas library; a sketch of such a script is shown below.
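A minimal sketch of what the script inside the Python3 Operator could look like, assuming the incoming message body is CSV text coming from the Read File operator. The port names "input" and "output" are assumptions and must match the ports added to the operator; the `api` object is provided by the Data Intelligence operator runtime.

```python
# Sketch of a possible Python3 Operator script for basic wrangling with pandas.
import io
import pandas as pd

def on_input(msg):
    # The message body is assumed to be raw CSV content (bytes or string).
    body = msg.body.decode("utf-8") if isinstance(msg.body, bytes) else msg.body
    df = pd.read_csv(io.StringIO(body))

    # Example wrangling steps: drop duplicates and fill missing numeric values.
    df = df.drop_duplicates()
    df = df.fillna(df.median(numeric_only=True))

    # Send the wrangled data onwards as CSV text.
    api.send("output", df.to_csv(index=False))

# `api` is injected by the Python3 Operator at runtime.
api.set_port_callback("input", on_input)
```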
Output of the Pipeline execution:
UPLOAD THE SAME DATA INTO LOCAL DI DATALAKE THROUGH DATA MANAGER
We have to create the Data Collection by clicking the Create button.
In the Metadata Explorer, we have to upload the data to the DI Data Lake.
CREATE THE JUPYTER ENVIRONMENT
Create an ML Scenario by clicking the + icon.
After creating the ML Scenario, we have various sections: Datasets, Notebooks, Pipelines, Executions, Models, and Deployments.
To create a Jupyter Notebook, click the + icon in the Notebooks section.
Exploratory Data Analysis using Jupyter Notebook on the dataset which we have uploaded above
Open Jupyter Notebook
Go to the Data Browser to select our workspace.
Open our data workspace.
After opening the data workspace, copy the code snippet to the clipboard and paste it into a Jupyter Notebook cell.
Create a new kernel in a new environment to install the libraries in an isolated manner.
Open the Launcher and go to the Terminal.
Now, the kernel is successfully created.
Select the new kernel for the Jupyter Notebook.
Now, we do Exploratory Data Analysis in the Jupyter Notebook.
Check the distribution of age among the passengers.
Here, we plot the correlation matrix using a heat map to identify the correlation between features; a sketch of both EDA steps follows below.
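A sketch of these two EDA steps, assuming the dataset has already been loaded into a DataFrame named `df` and contains an "Age" column (the column name is an assumption based on the Titanic-style passenger data):

```python
# EDA sketch; `df` and the "Age" column are assumptions.
import matplotlib.pyplot as plt
import seaborn as sns

# Distribution of passenger age
df["Age"].plot(kind="hist", bins=30, title="Age distribution")
plt.xlabel("Age")
plt.show()

# Correlation matrix of the numeric features shown as a heat map
corr = df.corr(numeric_only=True)
sns.heatmap(corr, annot=True, cmap="coolwarm")
plt.title("Feature correlation")
plt.show()
```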
Now, we train a LogisticRegression model to classify whether a passenger survived or not.
Once the model is trained, we check the model accuracy.
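A minimal training sketch with scikit-learn; the target column name "Survived" and the feature selection are assumptions about the dummy dataset:

```python
# Training sketch; "Survived" as the target column is an assumption.
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Keep numeric features only and fill remaining gaps for simplicity.
X = df.drop(columns=["Survived"]).select_dtypes("number").fillna(0)
y = df["Survived"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"Test accuracy: {accuracy:.3f}")
```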
BUILD THE TRAINING PIPELINE AND SAVE THE MODEL
Creating the training pipeline.
Here, we give the name and select the Python Producer template.
In this Python Producer graph (the training pipeline), we first group the Python3 operator. After that, we need to tag the Docker image file.
Script of the Python operator
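Roughly what this script could look like when adapted to the Python Producer template; the port names ("input", "metrics", "modelBlob") follow the template's convention but are assumptions here, and the `api` object is again injected by the operator runtime:

```python
# Sketch of the training script inside the Python Producer graph.
import io
import pickle
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

def on_input(data):
    # `data` is assumed to be the CSV content of the training file.
    df = pd.read_csv(io.StringIO(data))
    X = df.drop(columns=["Survived"]).select_dtypes("number").fillna(0)
    y = df["Survived"]
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    model = LogisticRegression(max_iter=1000)
    model.fit(X_train, y_train)

    # Report accuracy on the metrics port of the template graph.
    accuracy = accuracy_score(y_test, model.predict(X_test))
    api.send("metrics", api.Message({"accuracy": str(accuracy)}))

    # Serialize the trained model and pass it to the Artifact Producer as a blob.
    api.send("modelBlob", pickle.dumps(model))

api.set_port_callback("input", on_input)
```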
Save the pipeline and run it. After running the pipeline, we have to check its status.
The Artifact Producer saves the model pickle file in the Semantic Data Lake for further use.
CREATE THE INFERENCE PIPELINE FOR THE SAVED MODEL
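Assuming the inference pipeline is built from the Python Consumer (inference) template, its Python3 operator script could look roughly like the sketch below. The port names ("model", "input", "output"), the JSON request shape, and the feature order are all assumptions for illustration:

```python
# Sketch of an inference script for the saved Logistic Regression model.
import json
import pickle
import numpy as np

model = None

def on_model(model_blob):
    # Deserialize the pickle produced by the training pipeline.
    global model
    model = pickle.loads(model_blob)

def on_input(msg):
    # The request body is assumed to be JSON such as
    # {"features": [[3, 22.0, 1, 0, 7.25]]} in the same column order as training.
    body = json.loads(msg.body)
    features = np.array(body["features"])
    prediction = model.predict(features).tolist()
    api.send("output", api.Message(json.dumps({"prediction": prediction})))

api.set_port_callback("model", on_model)
api.set_port_callback("input", on_input)
```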
TEST IT WITH THE POSTMAN APP TO SEE IF WE ARE GETTING THE OUTPUT
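The same test done in Postman can also be scripted; the sketch below uses Python's requests library. The deployment URL, endpoint path, user, and password are placeholders, and the exact URL is taken from the Deployments section of the ML Scenario:

```python
# Calling the deployed inference endpoint; all URL and credential values are placeholders.
import requests

url = "https://<di-host>/<deployment-url>/v1/predict"  # copy from the Deployments tab
payload = {"features": [[3, 22.0, 1, 0, 7.25]]}

response = requests.post(
    url,
    json=payload,
    auth=("<tenant>\\<user>", "<password>"),
    headers={"X-Requested-With": "XMLHttpRequest"},
)
print(response.status_code, response.text)
```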