2025 Mar 05 6:25 PM - edited 2025 Mar 05 6:28 PM
SAP provides a wonderful service on BTP, Data Attribute Recommendation (DAR), which supports multi-input, multi-output models. However, some financial companies handle extremely sensitive financial data, including personal information and transaction details.
Such companies are often concerned about data privacy and security, and are usually subject to strict regulatory requirements regarding data handling and storage.
In such a scenario, where they do not want to transfer any data to an external platform, they can build the same kind of model in-house using the following steps.
Let's walk through each stage of the code.
First, you need to prepare your data. Let's assume you have a dataset with both numerical and categorical features.
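A quick way to decide which columns go to which preprocessing step is to split them by dtype. The sketch below assumes the same invoice columns used in the full code later in this post and uses pandas `select_dtypes`. Note that a code-like integer column such as Fiscalyear lands in the numeric group; if you prefer to treat it as a category, move it to the categorical list by hand.

```python
import pandas as pd

# Sample invoice records with the same columns as the training data used later
X = pd.DataFrame({
    "Company_code": ["A", "B", "A"],
    "Vendor": ["X", "Y", "Z"],
    "Amount": [100, 200, 150],
    "InvoiceDocumenttype": ["Type1", "Type2", "Type1"],
    "Fiscalyear": [2021, 2022, 2021],
})

# Numeric dtypes (int, float) go to scaling; everything else is treated as categorical
numeric_cols = list(X.select_dtypes(include="number").columns)
categorical_cols = list(X.select_dtypes(exclude="number").columns)

print(numeric_cols)      # ['Amount', 'Fiscalyear']
print(categorical_cols)  # ['Company_code', 'Vendor', 'InvoiceDocumenttype']
```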
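Next, preprocess the data. Categorical variables must be converted into numerical ones before the scikit-learn models used here can consume them, for example with OneHotEncoder, which creates one binary column per category. Numerical features are typically standardized with StandardScaler, which rescales each column to zero mean and unit variance.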
Both OneHotEncoder and StandardScaler are available in Python through the scikit-learn library.
You can install scikit-learn, together with pandas (which the code also uses), using pip:
pip install scikit-learn pandas
Next, you create a pipeline that includes both the preprocessing steps and the model. This ensures that the same transformations are applied during both training and inference.
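The reason the shared transformations matter is that the preprocessing is fitted once on the training data and then reused. The sketch below (the `train` and `new_row` frames are illustrative) shows that a fitted ColumnTransformer maps a single new invoice onto the same four columns as the training data, whereas fitting a fresh encoder on that one row would only see one vendor and produce a single column the model was never trained on.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

train = pd.DataFrame({"Vendor": ["X", "Y", "Z"], "Amount": [100, 200, 150]})
new_row = pd.DataFrame({"Vendor": ["X"], "Amount": [120]})

preprocessor = ColumnTransformer(transformers=[
    ("num", StandardScaler(), ["Amount"]),
    ("cat", OneHotEncoder(), ["Vendor"]),
])
preprocessor.fit(train)

# transform() reuses the training mean/std and the training vendor list
train_features = preprocessor.transform(train)
new_features = preprocessor.transform(new_row)
print(preprocessor.named_transformers_["num"].mean_)  # [150.]

# Encoding the new row on its own gives an incompatible feature layout
naive = OneHotEncoder().fit_transform(new_row[["Vendor"]])
print(train_features.shape, new_features.shape, naive.shape)  # (3, 4) (1, 4) (1, 1)
```

Wrapping the preprocessor and the classifier in one Pipeline, as the full code below does, means you cannot accidentally skip this step or refit it on inference data.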
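For inference, the input data must be preprocessed exactly as during training; the pipeline does this automatically. It then makes predictions, which scikit-learn classifiers return as the original categorical labels, so the last step is simply to map each prediction to its output field.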
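To illustrate why no separate decoding step is needed, this minimal sketch trains a MultiOutputClassifier directly on string targets. The single numeric feature stands in for already-preprocessed input, and the labels are made up to mirror the post's examples. Each per-field estimator keeps its own label set in `classes_`, and `predict` returns those labels, one column per field.

```python
import numpy as np
from sklearn.multioutput import MultiOutputClassifier
from sklearn.tree import DecisionTreeClassifier

# Already-numeric features (think of them as the preprocessor's output)
X = np.array([[0.0], [1.0], [0.0]])
# Two string-valued targets, e.g. a G/L account and a tax code
y = np.array([["Account1", "Tax1"],
              ["Account2", "Tax2"],
              ["Account1", "Tax1"]])

clf = MultiOutputClassifier(DecisionTreeClassifier(random_state=0)).fit(X, y)

# Each per-target estimator stores its own label set
for est in clf.estimators_:
    print(est.classes_)

# predict returns the original labels, one column per target
print(clf.predict(np.array([[1.0]])))  # [['Account2' 'Tax2']]
```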
Here is the complete code:
```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.multioutput import MultiOutputClassifier
from sklearn.tree import DecisionTreeClassifier

# Example input data (invoice fields used as features)
X_train = pd.DataFrame({
    'Company_code': ['A', 'B', 'A'],
    'Vendor': ['X', 'Y', 'Z'],
    'Amount': [100, 200, 150],
    'InvoiceDocumenttype': ['Type1', 'Type2', 'Type1'],
    'Fiscalyear': [2021, 2022, 2021]
})

# Example output data (fields to recommend)
y_train = pd.DataFrame({
    'HKONT': ['Account1', 'Account2', 'Account1'],
    'KOSTL': ['Cost1', 'Cost2', 'Cost1'],
    'Profitcenters': ['Center1', 'Center2', 'Center1'],
    'Paymentterms': ['Term1', 'Term2', 'Term1'],
    'Partnerbanktype': ['Bank1', 'Bank2', 'Bank1'],
    'Taxcode': ['Tax1', 'Tax2', 'Tax1']
})

# Preprocessing: scale numerical columns, one-hot encode categorical columns.
# handle_unknown='ignore' encodes categories not seen in training (e.g. a new vendor)
# as all zeros instead of raising an error at prediction time.
preprocessor = ColumnTransformer(
    transformers=[
        ('num', StandardScaler(), ['Amount', 'Fiscalyear']),
        ('cat', OneHotEncoder(handle_unknown='ignore'), ['Company_code', 'Vendor', 'InvoiceDocumenttype'])
    ])

# Multi-output model: one decision tree per output field
model = Pipeline(steps=[
    ('preprocessor', preprocessor),
    ('classifier', MultiOutputClassifier(DecisionTreeClassifier(random_state=0)))
])

# Fit the model
model.fit(X_train, y_train)

# Example input for prediction
X_test = pd.DataFrame({
    'Company_code': ['A'],
    'Vendor': ['X'],
    'Amount': [120],
    'InvoiceDocumenttype': ['Type1'],
    'Fiscalyear': [2021]
})

# Predict: the pipeline applies the fitted preprocessing, then each tree predicts its field
predictions = model.predict(X_test)

# Convert predictions to a readable format: map each output field name to its predicted value
fields = list(y_train.columns)
prediction_dict = dict(zip(fields, predictions[0]))
print(prediction_dict)
```
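The same fitted pipeline can also score several invoices at once. This sketch reuses the post's sample data but keeps only two of the six output fields to stay short; `X_new` and `recommendations` are illustrative names. It relies on `handle_unknown="ignore"` so that the second invoice, whose vendor "W" never appeared in training, is still scored instead of raising an error. With such tiny training data the prediction for an unseen vendor simply follows the other features, so treat it as a demonstration of the mechanics, not of model quality.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.multioutput import MultiOutputClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.tree import DecisionTreeClassifier

X_train = pd.DataFrame({
    "Company_code": ["A", "B", "A"],
    "Vendor": ["X", "Y", "Z"],
    "Amount": [100, 200, 150],
    "InvoiceDocumenttype": ["Type1", "Type2", "Type1"],
    "Fiscalyear": [2021, 2022, 2021],
})
y_train = pd.DataFrame({
    "HKONT": ["Account1", "Account2", "Account1"],
    "Taxcode": ["Tax1", "Tax2", "Tax1"],
})

preprocessor = ColumnTransformer(transformers=[
    ("num", StandardScaler(), ["Amount", "Fiscalyear"]),
    # Unseen categories become all-zero columns instead of raising an error
    ("cat", OneHotEncoder(handle_unknown="ignore"),
     ["Company_code", "Vendor", "InvoiceDocumenttype"]),
])
model = Pipeline(steps=[
    ("preprocessor", preprocessor),
    ("classifier", MultiOutputClassifier(DecisionTreeClassifier(random_state=0))),
])
model.fit(X_train, y_train)

# Two incoming invoices; vendor "W" was never seen during training
X_new = pd.DataFrame({
    "Company_code": ["B", "A"],
    "Vendor": ["Y", "W"],
    "Amount": [190, 110],
    "InvoiceDocumenttype": ["Type2", "Type1"],
    "Fiscalyear": [2022, 2021],
})

# One row per invoice, one column per recommended field
recommendations = pd.DataFrame(model.predict(X_new),
                               columns=y_train.columns, index=X_new.index)
print(recommendations)
# Row 0 -> Account2 / Tax2, row 1 -> Account1 / Tax1
```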