2025 May 02 8:57 AM - edited 2025 May 02 10:12 PM
I find the logic of the code slightly confusing, for example why you assign
data.loc[i,'HIERCUST4'] = w_cust to all rows in every iteration of the `j` loop...
...but if I read your requirements right, and `data` and `data_h` are both Pandas dataframes, then how about trying this code?
# Iterate through each customer from data_h
for _, row_h in data_h.iterrows():
    w_cust = row_h['KUNNR']
    # Boolean mask for matching HKUNNR
    mask = data['HKUNNR'] == w_cust
    # Conditional assignments only where the mask is True
    data.loc[mask, 'HIERCUST1'] = data.loc[mask, 'HKUNNR']
    data.loc[mask, 'HIERCUST2'] = data.loc[mask, 'KUNNR']
    data.loc[mask, 'HIERCUST4'] = w_cust

Regards.
Hi Vitaliy-R
Here is my code:
def transform(data):
    """
    This function body should contain all the desired transformations on incoming DataFrame. Permitted builtin functions
    as well as permitted NumPy and Pandas objects and functions are available inside this function.
    Permitted NumPy and Pandas objects and functions can be used with aliases 'np' and 'pd' respectively.
    This function executes in a sandbox mode. Please refer the documentation for permitted objects and functions. Using
    any restricted functions or objects would cause an internal exception and result in a pipeline failure.
    Any code outside this function body will not be executed and inclusion of such code is discouraged.
    :param data: Pandas DataFrame
    :return: Pandas DataFrame
    """
    #####################################################
    # Provide the function body for data transformation #
    #####################################################
    data["HIERCUST1"] = ""
    data["HIERCUST2"] = ""
    data["HIERCUST3"] = ""
    data["HIERCUST4"] = ""
    data["HIERCUST5"] = ""
    H = int(0)
    H1 = int(0)
    nb_ind = int(0)
    w_cust = ""
    #---------------------------------------------------------------------
    # Nodes without parents, level-1 hierarchy heads
    #---------------------------------------------------------------------
    data.sort_values('HKUNNR')
    for i in range(data.shape[0]):
        if data['HKUNNR'][i] == "":
            nb_ind = nb_ind + 1
    ind = np.arange(nb_ind)
    #-----Create a temporary table for the hierarchy heads
    data_h = pd.DataFrame(columns=['KUNNR'], index = ind)
    #-----Populate the hierarchy-heads table
    for i in range(data.shape[0]):
        if data['HKUNNR'][i] == "":
            data.loc[i,'HIERCUST1'] = data['KUNNR'][i]
            new_row = {'KUNNR':data['KUNNR'][i]}
            data_h.loc[len(data_h)] = new_row
    #---------------------------------------------------------------------
    # Hierarchy level 2, children of the level-1 hierarchy heads
    #---------------------------------------------------------------------
    data.sort_values('KUNNR')
    data_h.sort_values('KUNNR')
    """
    for i, row in data.iterrows():
        if row ['HKUNNR'] == "0001010970":
            data.loc[i,'HIERCUST1'] = row ['HKUNNR']
            data.loc[i,'HIERCUST2'] = row ['KUNNR']
    for j in list(range(0, len(data_h))):
        #for j, row in data_h.iterrows():
        #    w_cust = row ['KUNNR']
        w_cust = "0001010970"
        for i, row in data.iterrows():
            data.loc[i,'HIERCUST4'] = w_cust
            if row ['HKUNNR'] == w_cust:
                data.loc[i,'HIERCUST1'] = row ['HKUNNR']
                data.loc[i,'HIERCUST2'] = row ['KUNNR']
    """
    #----------------------------SAP forum-----------
    for _, row_h in data_h.iterrows():
        w_cust = row_h['KUNNR']
        # Boolean mask for matching HKUNNR
        mask = data['HKUNNR'] == w_cust
        # Conditional assignments only where the mask is True
        data.loc[mask, 'HIERCUST1'] = data.loc[mask, 'HKUNNR']
        data.loc[mask, 'HIERCUST2'] = data.loc[mask, 'KUNNR']
        data.loc[mask, 'HIERCUST4'] = w_cust
    #----------------------------SAP forum end-------
    return data

As you can see, there is no filter...
Best regards
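For comparison, the level-1 and level-2 steps of the function above could be written without explicit row loops. This is only a minimal sketch, assuming that `HKUNNR` is an empty string for hierarchy heads and that the whole table is visible in a single call. The function name `assign_levels` and the sample rows are hypothetical, the `HIERCUST4` assignment is left out, and whether `isin` is permitted in the script operator's sandbox is not confirmed here.

```python
import pandas as pd

def assign_levels(data):
    """Fill HIERCUST1/HIERCUST2 for level-1 heads and their direct children."""
    data = data.copy()
    for col in ["HIERCUST1", "HIERCUST2"]:
        data[col] = ""

    # Level 1: rows with no parent are hierarchy heads.
    is_head = data["HKUNNR"] == ""
    data.loc[is_head, "HIERCUST1"] = data.loc[is_head, "KUNNR"]

    # Level 2: rows whose parent is one of the heads.
    heads = data.loc[is_head, "KUNNR"]
    is_child = data["HKUNNR"].isin(heads)
    data.loc[is_child, "HIERCUST1"] = data.loc[is_child, "HKUNNR"]
    data.loc[is_child, "HIERCUST2"] = data.loc[is_child, "KUNNR"]
    return data

# Hypothetical sample: A is a head, B and C are its children, D is a grandchild.
sample = pd.DataFrame({
    "KUNNR":  ["A", "B", "C", "D"],
    "HKUNNR": ["",  "A", "A", "B"],
})
result = assign_levels(sample)
print(result)
```

The grandchild row D stays empty in this sketch because only the first two levels are handled, as in the original function.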
Could this be related to the fact that the data arrives in multiple batches? The code might never see the full dataset.
"In a data flow, the script operator may receive the incoming table in multiple batches of rows, depending on the size of the table. This means that the transform function is called multiple times, for each batch of rows, and that its data parameter contains only the rows for the given batch.
Hence, the operations that require the complete table within the data parameter are not possible. For example, removing duplicates."
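The effect of batching on a parent/child lookup can be reproduced locally. The sketch below is an illustration only: the helper `find_children_of_heads`, the sample rows, and the two-row batch split are all hypothetical, and the real batch boundaries are decided by the data flow. It shows that when a head and its child land in different batches, a per-batch lookup finds no match even though the full table contains one.

```python
import pandas as pd

def find_children_of_heads(batch):
    """Return KUNNR values whose parent is a head visible in this batch."""
    heads = batch.loc[batch["HKUNNR"] == "", "KUNNR"]
    return batch.loc[batch["HKUNNR"].isin(heads), "KUNNR"].tolist()

# Hypothetical table: A and C are heads, B is a child of C, D is a child of A.
full = pd.DataFrame({
    "KUNNR":  ["A", "B", "C", "D"],
    "HKUNNR": ["",  "C", "",  "A"],
})

# One call on the complete table sees both heads and both children.
whole_table = find_children_of_heads(full)

# Two calls on two-row batches: each child is separated from its head.
batches = [full.iloc[0:2], full.iloc[2:4]]
batched = [k for b in batches for k in find_children_of_heads(b)]

print("whole table:", whole_table)
print("batched:", batched)
```

If this is what is happening, the heads would need to be available to every call, for example by joining them in before the script operator, which is a design choice outside this sketch.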