cancel
Showing results for 
Search instead for 
Did you mean: 

High Cardinality Flag

Former Member
0 Kudos

If a Dimension only has the High Cardinality flag checked, I understand that the index will be B-Tree. However, if I am to determine whether this setting is correct if I want to check the cardinality i.e.

Number of Distinct Values/Number of Records

This is to be done using the Number of Records in the Dimension table and not the Fact table is that correct? Thanks

Accepted Solutions (1)

Accepted Solutions (1)

Former Member
0 Kudos

Hi Harpal,

Yes..You have to consider the no. of records in the Dim table.

<u><b>From help doc</b></u>

High Cardinality means that this dimension contains a large number of characteristic values. This information is used in accordance with the individual database platform in order to optimize performance. For example, an index type other than the standard may be used. Generally a dimension is perceived to have high cardinality when the dimension is at least 20% the size of the fact tables in terms of the number of entries each contain. Avoid marking a dimension as having high cardinality if you are in any doubt.

Bye

Dinesh

Former Member
0 Kudos

Thanks Dinesh,

What I am not clear on is as follows, the Dim table will have a unique Dim Id for each record. Therefore which characteristic otherwise would you use in the Dimension to work out the cardinality? Also why would you just have high cardinality checked by itself i.e. without Line Item Dimension I understand that it gives you a B –tree index but I am more trying to relate to a scenario type explanation. Thanks

Former Member
0 Kudos

Hi,

It has to check (logically)with the no of records to be created for this SID value in the fact table . Thats the meaning of Cardinality in this case. for example Sales doc in sales overview cube.

With rgds,

Anil Kumar Sharma .P

Former Member
0 Kudos

Hi Anil,

If possible could you give a simple scenario and outline steps

Thanks

Former Member
0 Kudos

Hi,

I donot have perfect procedure to find out cardinality. But my idea is calculate ,for example,no of sales orders created till now and calculate the no of records in VBAP .Then use formula to calculate cardinality

(No of records from VBAP * 100)/ No of sales orders .

The above formula is calculated with the values from R/3.

If it is more than 10 % then define sales order as line item dimension in BW.

With rgds,

Anil Kumar Sharma .P

Former Member
0 Kudos

Have you talked to someone with domain expertise? That is, someone that is familiar withthe data you are going to be loading. Dimension design is a complete guess without some knolwedge of the data.

Review Oracle's doc on bitmap index and cardinality.

Rather than agonizing over the theoretical, there is no reason you can not define the dimensions, load your data and then review what you have. You can change the cardinality indicator on a dimension, reload the data and then review explain plans to see what has changed.

Former Member
0 Kudos

If I put the following scenario forward:

I have a COPA cube. The cube has 5 dimensions (plus the usual 3 three dimensions). The fact table consists of about 8.5 Million records. The First Dimension has High Cardinality checked, it is not a Line Item Dimension and contains characteristics like Business Area, Company Code, Division. If I check the number of records in this Dimension there are just over 6,000 records. Therefore, I have calculated the cardinality as follows:

6000/8.5Million * 100 = 0.07%

Therefore I am questioning the validity of having this Dimension as High Cardinality on the calculation above. However, if I try to determine distinct values for the Dimension table since it has a DIM ID this will always have a distinct value, therefore not sure if this is a way to calculate cardinality.

Thanks

Former Member
0 Kudos

Hi,

<i>Therefore I am questioning the validity of having this Dimension as High Cardinality on the calculation above</i>.

So it is not at all considerable as Line item.

With rgds,

Anil Kumar Sharma .P

Former Member
0 Kudos

Thanks Anil and all others, what we are doing is checking recommendations made on an EarlyWatch Report. Do you therefore agree that the method that I calculated the Cardinality in this case is correct?

Former Member
0 Kudos

You're right, for a fact table of 8.5 million, you would NOT want a dimension with only 6,000 values to be marked high cardinality.

The approach to calculate the dimension relative to the fact table is fine. The challenge in the initial design of dimensions is that <b>without expert domain knowledge</b>, it is difficult to figure out how big a dimension will be until you have built the cube and loaded data. Unless you can analyze the data from R3 some way, you have to go thru load and review process.

Yes every Dim ID is a distinct value by design. What you are trying to avoid is putting characteristics that by themselves, each have low cardinality, but have no relationship to one another, and when put together in the same dimension, result in a large dimension, e.g.

Let's take your existing dimension with 6,000 values (6,000 different combinations of the characteristics currently in the dimension), and you add another characteristic that has 1,000 distinct values by itself.

Adding this characteristic to this dimension could result in no new dimension rows if the new characteristic was directly related to an existing characteristic(s),

e.g. lets say you were adding a char called Region, which was nothing more than a concatenation of Division and Business Area. Dim still has only 6,000 values. (When you have parsings like this where a characteristic is a piece of another characteristic, you would want them in the same dimension).

Or lets say you were adding a characteristic to this dimension that has no relationship to any of the existing chars, a Posting Date. Each occurence of the 6,000 dimension combinations has all 1,000 Posting dates associated with it. Now your dimension table is 6,000 * 1,000 = 6,000,000 rows !!! Now your dimension IDs would be considered to have high cardinality. The answer in this design however, is NOT to set this dimension to high caridnality, but rather, to put Posting Date in it's own dimension.

Hope this helps a little.

Former Member
0 Kudos

Helps Alot!! Thanks Guys have awarded points

Former Member
0 Kudos

Not to critizize the Early Watch Report, but you need to keep in mind that many of the things they look at are simply from a point in time. They don't know what you might be doing with respect to loads, future data volumes, etc.

Former Member
0 Kudos

Thanks Pizzaman, that is one of the reasons why we are actually analysing the recommendation and we also feel that some recommendations may be static as part of a report. Enjoy the weekend.

Answers (3)

Answers (3)

Former Member
0 Kudos

Hi Harpal,

I would not be able to explain now..Gotta go..

Do you have the book "Mastering Business Information Warehouse". The concept of high cardinality is explained in detail there.

Bye

Dinesh

Former Member
0 Kudos

Hi Harpal,

Also refer the below link for Line Item Dim and High Cardinality.

http://help.sap.com/saphelp_nw04/helpdata/en/a7/d50f395fc8cb7fe10000000a11402f/frameset.htm

Bye

Dinesh

Former Member
0 Kudos

Hi,

To find out no of <i>Number of Distinct Values</i>, calculate the no of records in SID table. And To find out no of <i>Number of Records</i>, calculate the no of records in fact table. Then use your formula.

With records,

Anil Kumar Sharma .P