You are using DSP as a Data Fabric to load multiple data sources; everything works great and you can enjoy life (you know, teleworking from some heavenly place)... until something goes wrong. You receive an email saying a Data Flow has failed. That SFTP server seems less reliable than it should be. So now what?
Leave your Virgin Mojito and open DSP.
The test consists of loading a CSV with 30,000 rows through a Data Flow and provoking an error in the middle of the process, to see what we end up with in the destination table.
The CSV has 30,000 rows with two fields:
The source is SFTP, with the CSV configured as:
The destination is a DSP local table with:
The execution of the Data Flow ends with an error.
In the destination table, data has been saved: specifically, the IDs from 1 to 19,000.
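To reproduce this kind of test, you need a CSV with a deliberately broken row somewhere in the middle. A minimal sketch that generates one is below; the field names (`ID`, `DESCRIPTION`), the file name, and the position of the bad row are my own assumptions, not the ones used in the original test.

```python
import csv

ROWS = 30_000
BAD_ROW = 20_500  # hypothetical position of the broken row, mid-file

with open("test_load.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["ID", "DESCRIPTION"])  # two fields; names are assumptions
    for i in range(1, ROWS + 1):
        if i == BAD_ROW:
            # Non-numeric ID so the Data Flow fails when casting the column
            writer.writerow(["not-a-number", f"row {i}"])
        else:
            writer.writerow([i, f"row {i}"])
```

Pointing the Data Flow at a file like this lets you control exactly which 1,000-row block triggers the exception.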
So I draw two conclusions:
- The most important one is that the Data Flow execution is not an “all or nothing” operation: commits are made during the process.
- The second is that, since we have two fetch-size controls (one in the source and one in the destination), the block of rows 19,001-20,000 was not loaded even though it contained no error. Most likely, when the block of 1,000 rows containing the error was read, the exception was thrown while the destination side had not yet processed the 19,001-20,000 block, which is why it is missing.
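The behavior in the second conclusion can be sketched with a toy model: a writer that commits blocks one step behind the reader. This is my own simplification, not DSP's actual implementation, but it reproduces the observed result (error in block 20,001-21,000, only rows 1-19,000 committed, the fully-read block 19,001-20,000 lost).

```python
def simulate_data_flow(rows, fetch_size=1_000):
    """Toy model: the writer commits each block one step behind the
    reader, so a read error loses the in-flight (already read) block."""
    committed = 0
    pending = None          # block fully read but not yet written
    block = []
    try:
        for row in rows:
            if row is None:                      # None marks the bad row
                raise ValueError("bad row while reading")
            block.append(row)
            if len(block) == fetch_size:
                if pending is not None:
                    committed += len(pending)    # writer commits previous block
                pending = block
                block = []
    except ValueError:
        pass    # the pending block is lost, like rows 19,001-20,000
    return committed

rows = list(range(1, 30_001))
rows[20_500 - 1] = None      # error inside the block 20,001-21,000
print(simulate_data_flow(rows))   # 19000 rows committed
```

With the error in block 21, the writer has committed blocks 1-19 and is still holding block 20 when the exception arrives, exactly matching what the destination table showed.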
I ran a second test, increasing the rows to 110,000, placing the bad row at 105,001, and removing the fetch-size control on both sides, to check whether the theoretical maximum fetch size is 100,000.
But it saved rows 1 to 104,000, implying that the fetch size was not 100,000. I don't know whether the value it takes depends on the source, the combination of source and destination, etc.
When doing ETLs with manually managed delta loads (based on, e.g., timestamps), there is a risk that, when an error occurs, rows from the failed load are left in the target table.
This can make the calculation of the next timestamp to load erroneous, so those rows should be deleted before it is computed.
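The cleanup step above can be sketched as follows. This is a minimal illustration using SQLite; the table layout and the `load_ok` flag marking rows from a failed run are hypothetical, stand-ins for however you track load status in your own target table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE target (id INTEGER, ts TEXT, load_ok INTEGER);
    -- last successful load ended at 10:00; a later load failed mid-way
    INSERT INTO target VALUES
        (1, '2024-01-01T09:00', 1),
        (2, '2024-01-01T10:00', 1),
        (3, '2024-01-01T11:00', 0),  -- leftovers from the failed run
        (4, '2024-01-01T12:00', 0);
""")

# Naive watermark: includes the failed run's rows -> wrong delta start
naive = conn.execute("SELECT MAX(ts) FROM target").fetchone()[0]

# Safe approach: delete the failed run's rows, then compute the watermark
conn.execute("DELETE FROM target WHERE load_ok = 0")
safe = conn.execute("SELECT MAX(ts) FROM target").fetchone()[0]

print(naive)  # 2024-01-01T12:00 (wrong: rows 11:00-12:00 would be skipped)
print(safe)   # 2024-01-01T10:00 (correct starting point for the next delta)
```

The naive watermark would make the next delta load skip the rows between 10:00 and 12:00, since they appear to be already loaded when in fact they belong to the run that failed.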