cancel
Showing results for 
Search instead for 
Did you mean: 

BODS 4.2 - Table Comparison Sorted Input Option - Collation Statement in Order by Clause

swenck
Discoverer
0 Kudos

In the Table Comparison Transform when I choose the Sorted Input option it places this in the Order by Clause after the first column: "COLLATE Latin1_General_BIN". I get spurious results in the result set with this occurring. If I change the option to Row by Row Select the order by clause does not produce the COLLATE statement in the order by clause and the results of the transform are as expected. What would cause this type of behavior in the Table Comparison Transform?

Accepted Solutions (0)

Answers (3)

Answers (3)

former_member187605
Active Contributor
0 Kudos

For an overview of comparison methods, with pros and cons of each option check https://wiki.scn.sap.com/wiki/display/EIM/Table_comparison+transform%3A+Comparison+method+settings+a....

I know the collate clause as a typical MS SQL Server feature. Could it be that your source and target data are in a different code page? To ensure that both streams are sorted in exactly the same order, a mapping to a common encoding schema must take place first.

vibhorgautam
Explorer
0 Kudos

Hii Steve, out of three methods, Sorted input performs best but prerequisites are that it will take the columns in a sorted format only and table comparison will then assume input rows to be in a sorted format and will not trace each row to compare the rows and field values.

your input rows must be sorted (ascending) in the same order as you've set the primary keys of the transform. In most cases this will require you to insert a Query transform before the Table Comparison transform, and to do the correct sorting in this transform. But once you've done that, this option is faster than the "Cached comparison table" method. As everything is sorted, Data Services indeed only needs to read the comparison table once. To give you an idea, for a comparison table with about 250 000 rows and as many input rows, the "Row-by-row select" method supposing it takes about 7 to 8 minutes. Switching to "Sorted input" brought this time down to a mere 7 seconds. And "Cached comparison table" was only slightly slower, with 10 to 12 seconds.

Regards,

Vibhor

former_member442248
Active Participant
0 Kudos

Hi Steve.

You will have to first decide as to why you want to use Sorted input option. It is for a very specific use case. Please go through the Table comparison documentation and you will be able to get the reason.

Regards. S

swenck
Discoverer

Hi Shazin - my understanding is that the Sorted Input Option provides the best performance compared to the Row by Row Select Option in the Table Comparison Transform. I am using this along with the History Preservation and Key Generation components for Slowly Changing Dimension processing.

What is this "very specific use case" you are referring to regarding the Sorted Input Option - can you elaborate? Also - why is choosing Sorted Input adding the "COLLATE" clause in the order by of the Optimized SQL? This is causing erroneous results in the output.