cancel
Showing results for 
Search instead for 
Did you mean: 
Read only

Read flat file within data flow using query transform

Former Member
0 Likes
810

Hello Data Services Community!  I have a Data Services question that I hope someone here can shed light upon.

Here's the situation: my client receives many small flat files (CSV) from remote devices every 15 minutes into a specific project folder.  There are many projects and, thus, many folders.  There is a specific naming convention for the files and the folders.  The parent folder name contains the project identifier and the file name contains the device counter.  The combination of the project identifier and the device counter allows my client to identify which type of device is sending the file and, thus, to which HANA table the data contained in the file should be delivered.  Each type of device sends data with a completely unique column sequence and so the target HANA tables are different.  Fortunately, there aren't too many device types (less than 10).

Here's what I've done so far: I have a workflow with a script to recursively read the folder structure and create a text file with all of the files that match the appropriate file type - in this case, *.CSV.  In my mind, this will allow me to capture a snapshot of the files to process at that instant without having to worry about other files arriving in the meantime.  Next, I have a data flow that pulls in this newly created text file.  I added a query transform to parse the project identifier and the device counter and then another query transform to lookup the device type (class).  The case transform that follows takes the lookup value and "should" route the data to the appropriate HANA table.

Conceptually, I think I should have another flat file source element after the case transform.  The file name for that source element comes from the original text file described above. That is what I've tried to create here with the embedded data flow.

Above you can see the structure of the embedded data flow that I have in the data flow above.  This approach does not seem to work as the flat file in the embedded flow never receives the name from the variable.  The variable ($datafile_name) is a global variable that I attempted to populate in the query transforms that immediately follow the case transform.

I tried another approach without success - again, the variable that should populate in the flat file definition is not recognized and so the data isn't imported.

I would appreciate any input from the community.  If I should rethink the entire flow, I am open to suggestions.  If you can see a fault in my logic, please offer solutions.  Thanks in advance!

View Entire Topic
former_member187605
Active Contributor
0 Likes

Why don't you add the filename to your data stream at the beginning of your flow? Set the Include file name column property to Yes.

Your variant with the embedded data flow will never work. An embedded data flow must come at either the beginning or at the end of a data flow and it is always linked to it thru a (virtual) nested schema source or target file.

Former Member
0 Likes

Dirk - first of all, thank you for the very prompt response.  I've seen your input on a number of other posts and you've consistently provided invaluable insight.

Technically, the embedded flow is at the end of the parent data flow so that's why I thought it might work.  I'm not sure where to look for the nested schema that would help me populate data into the embedded flow.

I may not have been clear on what is being read in the text file at the beginning of my data flow.  It is only a collection of file names (as opposed to file name and content as I think you may have thought).  Each row in that file has a filename that should go into a specific branch of the case and become the input for a flat file schema specific to the target HANA table.

The content of the data files does not have sufficient information to identify the target.  Maybe I should attempt to read the content of the text file in addition to the file names...?  Essentially, the first text file would have a structure of something like:

Full_Path_and_Name - varchar(255)

Data_Contents - varchar(1000)

The script to produce the text file with just names is pretty simple:

# define folder and file type variables to be used in DIR below

$path_and_file = $starting_path || $file_type;

# define name of resulting file with all file names

$file_list = $starting_path || 'filelist.txt';

# pushes results of DIR to a file, use flag 8 to give a return code and

# any messages that appear from the CMD window after completing the DIR

$cmd_return = exec( 'cmd.exe', 'dir [$path_and_file] /b /s /a:-d > [$file_list]', 8 );

print( $cmd_return );

if ( file_exists( $file_list ) = 1 ) print( 'File list created');

Adding logic to read the file might be more complex but could resolve the problem.

former_member187605
Active Contributor
0 Likes

Can you have a look at http://scn.sap.com/docs/DOC-61979, and more specifically how I deal with different types of records within a file? That's very similar to your situation, except that you've got a different record format in multiple files. You may be able to reuse part of that logic. Build your Case transform on the Full_Path_and_Name column and then disassemble the Data_Contents column as in my example.

In order to better understand how DS deals with embedded data flows, select FF_Device_Cass_12, QRY_COMINE_DATA_12 and the atrget table, rifght-click and then select Make Embedded Data Flow... from the popup menu.