on 2024 May 29 2:45 PM
Hello,
I have a Python script running inside a Data Flow that results in the following error.
Since it's impossible to debug inside Datasphere, I took the source code to another interpreter and loaded my dataset from a CSV file to test it. The code runs fine there and produces the expected result. Any ideas what might be causing the issue? This is a showstopper for what I'm doing right now, so any input would be greatly appreciated.
Thanks,
Dimitar
Here is the code
def transform(data):
    """
    This function body should contain all the desired transformations on incoming DataFrame. Permitted builtin functions
    as well as permitted NumPy and Pandas objects and functions are available inside this function.
    Permitted NumPy and Pandas objects and functions can be used with aliases 'np' and 'pd' respectively.
    This function executes in a sandbox mode. Please refer the documentation for permitted objects and functions. Using
    any restricted functions or objects would cause an internal exception and result in a pipeline failure.
    Any code outside this function body will not be executed and inclusion of such code is discouraged.
    :param data: Pandas DataFrame
    :return: Pandas DataFrame
    """
    data=data[data['refunds'].isnull() == False]
    data.rename(str.upper, axis='columns', inplace=True)
    result = pd.DataFrame(
        data=data,
        columns=[
            'ORDER_LINE_ID', 'ID', 'AMOUNT', 'COMMISSION_AMOUNT', 'COMMISSION_VAT_AMOUNT', 'QUANTITY',
            'DATE_WAITING_REFUND', 'DATE_WAITING_REFUND_PAYMENT', 'DATE_REFUNDED', 'STATE', 'TRANSACTION_DATE',
            'TRANSACTION_NUMBER', 'REASON_CODE', 'REASON_LABEL', 'PRICE_TAXES', 'SHIPPING_PRICE_TAXES',
            'CTA_AMOUNT', 'CTA_CODE', 'CTA_RATE', 'RFD_COUNT', 'CURRENCY_ISO_CODE', 'DATE_LOADED','REFUNDS']
    )
    all_refunds=[]
    for idx in result.index:
        row = result.loc[idx].to_dict()
        tmp = row['REFUNDS'].replace('"', '').split("{id")
        refunds = list(filter(lambda x: len(x) > 1, tmp))
        #get the number of refunds for each order id
        row['RFD_COUNT'] = len(refunds)
        for rec in refunds:
            # remove all double quotes.
            rec = rec.replace('"', "")
            # split off the commission taxes
            tax_split = rec.split("commission_taxes:[")
            # split off the key/value pairs after the commission taxes
            tmp = tax_split[1].split("],")
            com_taxes = tmp[0][1:-1] #remove the json braces {}
            # rebuild the refund key/val pairs after splitting off the commission taxes
            refund = "id" + tax_split[0] + tmp[1]
            # remove all occurrences of: [,],{,}
            refund = refund.replace("[", "").replace("]", "").replace("{", "").replace("}", "")
            # drop the extra comma on the record if it's not the last refund for this order
            if refund[-1] == ",":
                refund = refund[:-1]
            for key_val in refund.split(","):
                tmp = key_val.split(":")
                key = tmp[0].upper()
                if key in result.columns:
                    val = tmp[1]
                    row[key] = val
            for key_val in com_taxes.split(","):
                tmp = key_val.split(":")
                key = tmp[0].upper()
                val = tmp[1]
                row['CTA_'+key] = val
            all_refunds.append(row)
    result = result.loc[0:0]
    result = pd.concat([result,pd.DataFrame(data=all_refunds)])
    result.drop(['REFUNDS'],axis='columns', inplace = True)
    return result
The code does not seem to use any of the unsupported statements listed at https://help.sap.com/docs/SAP_DATASPHERE/c8a54ee704e94e15926551293243fd1d/73e8ba1a69cd4eeba722b458a2..., so I would suggest running it in the external interpreter again, but with an environment similar to the one used in DSP: python==3.9, NumPy==1.21.5, Pandas==1.2.5
`as_tuple` usually comes from `decimal.Decimal` (https://docs.python.org/3/library/decimal.html#decimal.Decimal.as_tuple), so the error might be raised from inside one of the other libraries available in DSP's environment, like Pandas or NumPy.
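To follow up on both points locally, a small check like the one below might help. It is only a sketch: `environment_report` and `find_decimal_columns` are hypothetical helper names, the sample DataFrame is made up, and the idea that the Data Flow hands some columns to the script as `decimal.Decimal` objects (while a CSV load yields floats) is an untested assumption, not something confirmed for Datasphere.

```python
import sys
from decimal import Decimal

import numpy as np
import pandas as pd


def environment_report():
    """Return the interpreter and library versions as a dict of strings."""
    return {
        "python": "{}.{}.{}".format(*sys.version_info[:3]),
        "numpy": np.__version__,
        "pandas": pd.__version__,
    }


def find_decimal_columns(df):
    """Return the names of columns holding at least one decimal.Decimal value."""
    decimal_columns = []
    for name in df.columns:
        column = df[name]
        # Decimal objects can only be stored in object-dtype columns.
        if column.dtype != object:
            continue
        if column.map(lambda value: isinstance(value, Decimal)).any():
            decimal_columns.append(name)
    return decimal_columns


# Made-up sample: AMOUNT holds Decimal objects, QUANTITY holds plain ints.
sample = pd.DataFrame({
    "AMOUNT": [Decimal("10.50"), Decimal("3.20")],
    "QUANTITY": [1, 2],
    "REFUNDS": ["{id:1}", "{id:2}"],
})

# Compare these versions with the ones listed for DSP above.
print(environment_report())
# Columns that would behave differently from float columns read from CSV.
print(find_decimal_columns(sample))
```

If the local versions differ from the ones listed above, or if rebuilding the test data with `Decimal` values in the numeric columns reproduces the error, that would narrow down where `as_tuple` is being called.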