Frequency of record count updates in DataBridge 10 vs DataBridge 7.5

Former Member
0 Kudos

I'm upgrading a client from PCM 7.5 to PCM 10.0 and running regression tests.

In 7.5, I've noticed that DataBridge provides an update every 200 records, e.g.:

Processing REVENUE (200)

Processing REVENUE (400)

Processing REVENUE (600)

In 10.0, DataBridge is providing an update every 20000 records, e.g.:

Processing REVENUE (20000)

Processing REVENUE (40000)

Processing REVENUE (60000)


Is there a preference I can set to change this, or is it a permanent change in PCM 10?

Thanks

Steve

Accepted Solutions (0)

Answers (1)


Former Member
0 Kudos

Steve

There is no parameter in the source data pre-processing statements, record statements, or output data post-processing statements that lets a user set the interval <N> at which the running total of lines processed is reported. The value of <N> is a fixed default.

However, there is nothing stopping you from writing your own set of SPE files, each reading <N> records and using a Skip <N> Lines statement (for the 2nd, 3rd, 4th... SPE files), and bundling them in a control file. The net effect is that you get feedback on the load after each SPE file has processed the <N> records you set in it.

The SPE files, though time consuming to write (you will be adding to a standard SPE file and multiplying it as needed), need only be written once and will be applicable to any model in any solution for any client (assuming a standard table destination and standard file input). Updating <N>, or any other parameter, in your scripts should you need to is a simple find and replace, and running the DataBridge wizard should do the trick to modify the SPE with any additional statements you require; a rough sketch of that generation step follows.
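
As a purely illustrative sketch (in Python, though any scripting language would do), the find-and-replace generation step might look like the following. The template tokens {{SKIP}} and {{RECORDS}}, and all of the file names, are assumptions rather than real DataBridge syntax; the template itself would be one of your own working SPE files with those placeholders dropped in where the skip and record-count statements belong.

# Sketch only: generate a series of SPE files from a hand-written template,
# substituting the skip offset and chunk size, then list them in a control file.
# {{SKIP}}, {{RECORDS}} and the file names are assumptions, not DataBridge syntax.

CHUNK = 10000          # records per SPE file (the <N> you choose)
MAX_RECORDS = 1000000  # upper bound you are prepared to handle
TEMPLATE = "revenue_template.spe"   # your standard SPE file with placeholder tokens
CONTROL = "revenue_load.ctl"        # control file that bundles the generated SPE files

with open(TEMPLATE, encoding="utf-8") as f:
    template = f.read()

spe_names = []
for i, skip in enumerate(range(0, MAX_RECORDS, CHUNK), start=1):
    body = template.replace("{{SKIP}}", str(skip)).replace("{{RECORDS}}", str(CHUNK))
    name = f"revenue_part_{i:03d}.spe"
    with open(name, "w", encoding="utf-8") as out:
        out.write(body)
    spe_names.append(name)

# Write the control file as a simple list of the generated SPE files
# (adjust to whatever format your control file actually expects).
with open(CONTROL, "w", encoding="utf-8") as ctl:
    ctl.write("\n".join(spe_names) + "\n")

print(f"Generated {len(spe_names)} SPE files listed in {CONTROL}")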

Hope this helps.

Regards

Michael

Former Member
0 Kudos

Hi Michael

Thanks for finding the time to make this suggestion.

I see what you're proposing, but I think it's a bit over-engineered for my particular requirement, and in any case you'd need to know in advance how many records were in your source.  I'll live with the update every 10,000 records.  Shame, because now for files with fewer than 10,000 records I get no record update during load at all, but it's not worth adding complexity in order to get round it.

Steve

Former Member
0 Kudos

Steve

Regarding your point about needing to know the number of records in the source: you don't. You simply end your series of SPE files in the control file with a skip followed by a standard SPE script, which will run to the end of the input file regardless of how many records are left. That said, you surely must know what your system's data capacity is, and so have a good idea of the upper limit, so that the SPE files will be sufficient?

I agree the solution is over-engineered, but I would challenge the requirement itself. You have a known data source file of N records which is being loaded into PCM via DataBridge, perhaps automatically, semi-automatically or manually. PCM has no means of scanning the record count prior to a DataBridge load, but it has trace and error capturing after the load has completed (if enabled), and an alert log which tells you N records were loaded if there are no errors. Why exactly do you want updates during the load?

If you require an ad hoc, interactive load process, perhaps consider Data Loader, with scripts which echo the record count after every N records of your choosing. You have the added bonus that you can build in any further validation of the data before it gets put into the PP tables.

Regards

Michael

Former Member
0 Kudos

Hi Michael

I'd like (more frequent) updates during the DataBridge load because they provide reassurance that something's actually happening. An update every 10,000 records is fine if you have a million records, but if you have 9,000 records it's less useful.

I don't know the number of records being imported, as it happens. Sometimes I have hundreds, sometimes hundreds of thousands, and the distribution is long-tailed. So I could take the view that I'll never have more than a million records, but then (a) if I want to read the file in chunks of 1,000 I'll need 1,000 SPE files, (b) most of the time hundreds of those SPE files won't be doing anything other than slowing down the process, and (c) if the file size does ever grow over a million records, I may not notice.

All these problems are solvable with more coding, I guess, but again, I don't want to solve what is basically an irritation with a significant amount of software development.

Cheers

Steve

Former Member
0 Kudos

Hi Steve

I would assume you'd decided DataBridge was the optimal solution for the client's data feed into PCM. I suppose SAP would say that, since they substantially increased the largest theoretical OLAP cube possible in PCM, they had to rationalise the updates, which slow down the DataBridge routine.

OK, so the distribution of the source data volumes has a high standard deviation and any expected volume figure is meaningless. Have they considered adding a simple layer of logic which creates source data files of 10,000 records each, with any leftover records in a final load file?

Since we're talking about statistical distributions: for data loads of the same average DQ, the probability of a load failure increases as the number of records N grows very large. In other words, your client should be more reassured when N is at the small end of that distribution of data set sizes, regardless of anything else.

This is simple ETL if you have a good ETL solution, and even if you don't, a simple Windows text-splitter utility run automatically on the source data file will do it for you. A simple bit of Windows batch scripting can loop over the split files and re-run the DataBridge import for each one, and these can all be neatly packaged in a console routine set to send an alert after each file is successfully loaded; a rough sketch of the splitting step is below.
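
To make the splitting idea concrete, here is a rough sketch in Python (a batch script or any text-splitter utility would do the same job). The file names and the assumption of a single header row repeated into each chunk are illustrative only; adapt them to the actual extract.

# Sketch only: split a delimited source file into chunks of 10,000 records
# plus a final file with the remainder, so each chunk can be fed to DataBridge
# in turn. File names and the single header row are assumptions.

CHUNK = 10000
SOURCE = "revenue_extract.txt"   # hypothetical source data file
PREFIX = "revenue_chunk"

with open(SOURCE, encoding="utf-8") as src:
    header = src.readline()          # assume one header row, repeated per chunk
    part, out, written = 0, None, 0
    for line in src:
        if out is None or written == CHUNK:
            if out:
                out.close()
            part += 1
            out = open(f"{PREFIX}_{part:03d}.txt", "w", encoding="utf-8")
            out.write(header)
            written = 0
        out.write(line)
        written += 1
    if out:
        out.close()

print(f"Wrote {part} chunk file(s) of up to {CHUNK} records each")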

Data Loader is already optimised for this sort of repeated loading, but since you didn't mention it in your last reply I guess it is off the table? Anyway, it's an option, it handles high record volumes well, and you can make it as interactive as you like, with feedback such as the number of records available to load and information on the records loaded. It even tells you what kind of load error has occurred before you load into the PP tables (I said this before); otherwise it will load them into the PP tables in one step, like DataBridge does.

Could I be so bold as to say the real requirement here might be pre-validation of the records before the load, to prevent or minimise load failure? Reassurance that the last N records have loaded won't stop the error if there is going to be one in record N+1.

Regardless of whether reassurance or pre-validation of the data is the requirement, it has a cost. The question is: are they willing to pay?

The subtle point here is that reassurance can come after the fact, in the shape of a DataBridge trace log giving the number of successfully imported records, which the client can presumably tie back to the source data record count. If the client wants it during the process, then Data Loader is the way to go, since they are adding a constraint to the process which doesn't improve it, and Data Loader is designed to give more feedback than DataBridge.

Just out of interest what is the performance when your client loads 1 million records using DataBridge?

Regards

Michael

Former Member
0 Kudos

Hi Michael

Thanks for the reply; honestly, you're making it way more complicated than it needs to be. Appreciate your time, though.

Cheers

Steve

Former Member
0 Kudos

Hi Steve

Giving options for a requirement is not where the complication comes in; it's failing to see that satisfying it in PCM is not a value-add. We both agree on this, I think.

Satisfying the requirement for interim load updates makes it way more complicated than it needs to be. If you are loading a source file of 500,000 records and you had an update every N = 10,000 records, that's 50 updates, and even more if N were smaller.

The problem is that no matter what value of N you set for the update interval (unless you set N = 1), you can technically have a load file with fewer records than that, which means you'll never see an update for it.

Your client's reassurance should come from validating the data before the load and confirming the records loaded successfully afterwards. That gives them confidence in the DQ and information about the load file, such as the number of records. A minimal example of that kind of pre-load check is below.
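
For illustration only, a minimal pre-load check along those lines might look like this in Python. The file name, delimiter and the single rule shown (consistent field counts) are assumptions; the point is simply to count the records and catch obvious DQ problems before DataBridge ever sees the file.

# Sketch only: count records in the source file and flag rows whose field count
# doesn't match the header, before anything is handed to DataBridge.
import csv

SOURCE = "revenue_extract.txt"   # hypothetical source data file
DELIMITER = "\t"                 # assumed tab-delimited extract

with open(SOURCE, encoding="utf-8", newline="") as f:
    reader = csv.reader(f, delimiter=DELIMITER)
    header = next(reader)
    expected = len(header)
    total, bad = 0, []
    for line_no, row in enumerate(reader, start=2):
        total += 1
        if len(row) != expected:
            bad.append(line_no)

print(f"{total} data records in {SOURCE}")
if bad:
    print(f"{len(bad)} record(s) with unexpected field counts, e.g. lines {bad[:5]}")
else:
    print("Field counts look consistent; record count can be reconciled after load")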

Thanks for posting an example of a seemingly simple non-functional requirement which reflects a client business need that is better handled in ETL.

Please can I get some credit for my efforts?

Regards

Michael