Invalid datetime value when using Python Script in Datasphere

_Markus_
Explorer

Hello everyone,

I am still relatively new to Datasphere and am currently familiarizing myself with Python scripts for data transformation. I am running into a problem in the mapping that does not occur when the fields are mapped directly in Datasphere without a Python script.

My source has 56 fields, which are to be extended by 2 additional calculated fields. If I write the data 1:1 (without a script) into a target table, the data transformation works. As soon as the script is inserted and the data is passed through it, the following error message appears:

 

[Screenshot: error message reporting an invalid datetime value]

I cannot tell from the error message which column is causing the problem. Is there a way to identify it, or do I have to extend the dataflow column by column, deploying and running the script each time, to find the column that causes the error?

 

Vitaliy-R
Developer Advocate
Hi Markus. From what I can see, it fails at the field that the Python transformation (using the Pandas package) assumes should be a valid datetime. Instead, it seems to receive a date of January 1st of year 1.
_Markus_
Explorer
Hey Vitaliy, thanks for your reply. Your assumption is correct. I had a look at the SQL view on the source system and found the (AS400) timestamp column, which allows the value '0001-01-01'. I hadn't thought of that yesterday. But I wonder whether there is an easy way to find the affected column from the Datasphere side. Is there a log file somewhere with the column information for this issue?
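For context, the '0001-01-01' value is outside the range Pandas can represent at all: its timestamps are 64-bit nanosecond offsets from 1970, which only covers roughly the years 1677 to 2262. A minimal sketch of what happens when such a value hits `pd.to_datetime` (the literal value is taken from this thread):

```python
import pandas as pd

# pandas datetimes are 64-bit nanosecond offsets from 1970-01-01, which
# bounds representable timestamps to roughly 1677-2262:
print(pd.Timestamp.min)  # 1677-09-21 00:12:43.145224193
print(pd.Timestamp.max)  # 2262-04-11 23:47:16.854775807

# An AS400 "zero" timestamp like '0001-01-01' is outside this range,
# so the default conversion raises:
try:
    pd.to_datetime("0001-01-01 00:00:00")
except pd.errors.OutOfBoundsDatetime as exc:
    print("rejected:", exc)

# With errors='coerce' the value becomes NaT (missing) instead of raising:
print(pd.to_datetime("0001-01-01 00:00:00", errors="coerce"))  # NaT
```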
_Markus_
Explorer
And then there is the question of how to deal with the fact that the input value is valid in SQL / Datasphere natively, but not in Pandas. The error occurs as soon as the script is called, which means I have no way of checking the date field within the script and correcting it if necessary. So the only options are to avoid the script in the dataflow or to fix possible invalid values in calculated columns beforehand. Or am I missing something?
Simon_Ye
Advisor

Hi,

First, the issue is caused by an invalid datetime value; you just need to check the related columns. It is not very convenient to debug a Python script in Datasphere, so you can try debugging it in another application, such as a Jupyter Notebook, with some sample data.
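One way to do that check in a notebook is to coerce each candidate column and list the values that fail. The frame below is made-up sample data (column names and the '0001-01-01' sentinel are illustrative, not the actual source):

```python
import pandas as pd

# Made-up sample mimicking the source table; names and values are
# illustrative only.
sample = pd.DataFrame({
    "ORDER_ID": ["A1", "A2"],
    "CREATED_AT": ["2024-05-01 10:00:00", "2024-05-02 11:30:00"],
    "CHANGED_AT": ["2024-05-03 09:15:00", "0001-01-01 00:00:00"],  # AS400 "zero" date
})

# errors='coerce' turns both unparseable and out-of-bounds values into NaT,
# so the NaT positions point straight at the offending column and rows.
for col in ["CREATED_AT", "CHANGED_AT"]:  # candidate timestamp columns
    coerced = pd.to_datetime(sample[col], errors="coerce")
    bad = sample.loc[coerced.isna(), col]
    if not bad.empty:
        print(f"column {col!r} has values pandas cannot represent: {bad.tolist()}")
```

Running this prints only `CHANGED_AT`, since `CREATED_AT` converts cleanly.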

Second, if possible, you can try another node type to add the columns you want.

Thanks,

Simon

_Markus_
Explorer

Hey Simon,

thanks for your idea. I hadn't even thought about the Jupyter notebook. I'll give that a try.

However, after further research, it seems to me that the problem lies in an incompatibility of the data types. In the source (an AS400 database) there is a column of data type "timestamp". With the dataflow alone, the data from the remote table can be transferred to the local table without any problems.
If I add a script without additional code (only "return data") to the dataflow, the error message appears.
The error therefore occurs regardless of the content of the Python script.

The only solution, then, is to check all columns for possible incompatibilities before the dataflow reaches the Python script.
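Since the conversion fails before the script body even runs, the cleanup has to happen upstream (e.g. in a calculated column of the source view). The sketch below only illustrates, in pandas terms, the rule one would express there; the data and the '1900-01-01' replacement value are assumptions, not part of the thread:

```python
import pandas as pd

# Illustrative data; in the real dataflow this column never reaches the
# script as a string, which is why the fix belongs in the source view.
df = pd.DataFrame({
    "CHANGED_AT": ["2024-05-03 09:15:00", "0001-01-01 00:00:00"],
})

# Rule: anything pandas cannot represent (outside ~1677-2262) becomes NaT...
cleaned = pd.to_datetime(df["CHANGED_AT"], errors="coerce")

# ...or, if the target table needs a concrete value rather than a null,
# clamp to a sentinel that is valid in both SQL and pandas:
clamped = cleaned.fillna(pd.Timestamp("1900-01-01"))
print(clamped.tolist())
```

The same CASE-style logic (if timestamp < some minimum, substitute a sentinel or NULL) can then be written once in the view, keeping the Python script untouched.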