Technology Blogs by Members
Explore a vibrant mix of technical expertise, industry insights, and tech buzz in member blogs covering SAP products, technology, and events. Get in the mix!
jgleichmann
Active Contributor

last updated: 2022-11-04


It took me nearly two years to write the second part of this partition series. The first one for BW partitioning can be found here.

The main question of this blog: How to determine the best partitioning combination?

In an ERP or S/4HANA system a table can approach the 2 billion row mark. This is a HANA design limit for a column store table (per partition). In this case you have to partition the table. But unlike in a BW system, there are no standard rules for partitioning, which means you have to choose a partitioning rule on your own. But how do you determine the best partitioning combination?

  1. Alerts

  2. Find tables with high amount of records

  3. Term clarification

    1. Cardinality

    2. Selectivity

    3. HASH Partitioning

    4. Range Partitioning

    5. Multi-Level Partitioning



  4. Designing Partitions

  5. Example BSEG

  6. Example BSIS

  7. Example for a Z-Table <tbd>

  8. Parameter for parallel splitting

  9. Partitioning process

  10. Tips

  11. Partition / Re-partitioning references

  12. Summary






1. Alerts


First, check whether partitioning is needed at all, i.e. check the tables with the most row entries. By design, HANA has a limit of 2 billion rows per table/partition.

If your alerting is working correctly you will receive an alert (ID 27). It looks like this:

“Record count of column-store table partitions ( ID 27 )

Determines the number of records in the partitions of column-store tables. A table partition cannot contain more than 2,147,483,648 (2 billion) rows.”

  • Interval 1 hour

  • High: 1,900,000,000

  • Medium: 1,800,000,000

  • Low: 1,500,000,000


The limitations are described in SAP note 2154870.
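Besides waiting for the alert, you can check manually how close your tables are to the limit. A minimal sketch against the monitoring views (the 1,500,000,000 threshold mirrors the low alert level):

```sql
-- Tables/partitions approaching the 2 billion row design limit
SELECT SCHEMA_NAME, TABLE_NAME, PART_ID, RECORD_COUNT
FROM M_CS_TABLES
WHERE RECORD_COUNT > 1500000000
ORDER BY RECORD_COUNT DESC;
```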



Other related alerts:

"Alert ID 20 - Table growth of non-partitioned column-store tables"

"Alert ID 17 - Record count of non-partitioned column-store tables"

 




2. Find tables with high amount of records


Check all tables with a high number of records. For this you can use the SQL statement from SAP Note 1969700 (HANA_Tables_TopGrowingTables_Records_History).

Rule of thumb for the initial partitioning:

  • min. 100 million entries per partition for the initial partitioning

  • max. 500 million entries per partition for the initial partitioning

  • if you choose too many partitions you may end up with bad performance, because one thread per partition has to be triggered (e.g. you have 20 threads as statement concurrency limit and 40 partitions have to be scanned, which results in waiting for resources)

  • if you choose too few partitions, you may have to repartition again fairly soon, which means another maintenance window / downtime

  • recommendation: HASH partitioning on a selective column, typically part of the primary key

  • make sure that a single partition does not exceed a size of 20 to 50 GB, due to delta merge performance (SAP Note 2100010)


In our example we determined BSEG and BSIS as tables with a high number of rows. To plan maintenance in time, you should act when a table exceeds 1.5 billion entries.


For our scenario BSEG has 4.5 billion and BSIS 4.1 billion rows, so they are already partitioned. But is this partitioning, chosen at the initial creation, still optimal over time?

Growth over 30 days:






3. Term clarification


To determine the optimal partitioning combination, we have to clarify some terms:

  • Cardinality

  • Selectivity

  • HASH Partitioning

  • Range Partitioning

  • Multi-level partitioning


 

Cardinality


In the context of databases, cardinality refers to the uniqueness of data values contained in a column. High cardinality means that the column contains a large percentage of totally unique values. Low cardinality means that the column contains a lot of “repeats” in its data range. In a customer table, a low cardinality column would be the “Gender” column. This column will likely only have “M” and “F” as the range of values to choose from, and all the thousands or millions of records in the table can only pick one of these two values for this column. Ok, to be accurate nowadays we have to add a third value for diverse “D”.

Source: https://www.techopedia.com/definition/18/cardinality-databases






Note: Partitioning works better when the cardinality of the partitioning field is not too high

Selectivity


The selectivity basically is a measure of how much variety there is in the values of a given table column in relation to the total number of rows in a given table. The cardinality is just part of the formula that is used to calculate the selectivity.






Selectivity = cardinality/(number of rows) * 100
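Applied to a concrete column, the formula can be computed directly on the table. A sketch for BUKRS in BSEG ("SAPSCHEMA" is a placeholder for your actual schema):

```sql
-- selectivity = cardinality / (number of rows) * 100
SELECT COUNT(DISTINCT BUKRS) AS CARDINALITY,
       COUNT(*)              AS NUM_ROWS,
       COUNT(DISTINCT BUKRS) * 100.0 / COUNT(*) AS SELECTIVITY_PCT
FROM "SAPSCHEMA"."BSEG";
```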

Hash Partitioning


Hash partitioning is used to distribute rows to partitions equally for load balancing. The number of the assigned partition is computed by applying a hash function to the value of a specified column. Hash partitioning does not require an in-depth knowledge of the actual content of the table.



Source: openHPI: In-Memory_Data_Management_2017
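As a minimal sketch of what such a hash partitioning looks like in SQL (table name and partition count are illustrative):

```sql
-- distribute rows over 8 partitions via a hash on BELNR
ALTER TABLE "SAPSCHEMA"."T1" PARTITION BY HASH (BELNR) PARTITIONS 8;
```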

Range Partitioning


Range partitioning creates dedicated partitions for certain values or value ranges in a table. For example, a range partitioning scheme can be chosen to create a partition for each calendar month. Partitioning requires an in-depth knowledge of the values that are used or are valid for the chosen partitioning column.


Partitions may be created or dropped as needed and applications may choose to use range partitioning to manage data at a fine level of detail, for example, an application may create a partition for an upcoming month so that new data is inserted into that new partition.




Source: openHPI: In-Memory_Data_Management_2017
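A sketch of a range partitioning by fiscal year (GJAHR); the ranges are examples, and the OTHERS partition catches all remaining values:

```sql
ALTER TABLE "SAPSCHEMA"."T1" PARTITION BY RANGE (GJAHR)
( PARTITION '2019' <= VALUES < '2021',
  PARTITION '2021' <= VALUES < '2023',
  PARTITION OTHERS );
```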

Advantage: a dynamic range can be used to make this a generic 'no maintenance' partitioning model. With this feature, which is only supported for range partitioning, new partitions are created automatically once a threshold is reached:



With SPS06 it is also possible to do this for defined time intervals.
ALTER TABLE T ALTER PARTITION OTHERS DYNAMIC INTERVAL 1 YEAR;
ALTER TABLE T ALTER PARTITION OTHERS DYNAMIC INTERVAL 3 MONTH;
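Besides time intervals, the OTHERS partition can also be split automatically based on a row-count threshold (the value is illustrative):

```sql
-- create a new partition once OTHERS exceeds 50 million rows
ALTER TABLE T ALTER PARTITION OTHERS DYNAMIC THRESHOLD 50000000;
```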

Multi-Level partitioning


In some scenarios it makes sense to use columns which are not part of the primary key. But with single-level partitioning you can only select primary key columns. Multi-level partitioning makes it possible to select a key column on the first level and any other column on the second level. It is also possible to mix different partitioning methods.

The following combinations are possible [firstlevel-secondlevel]:

  • hash-hash

  • range-range

  • hash-range

  • round-robin-range
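A hash-range sketch of such a combination: first level on the key column MANDT, second level on a non-key year column (names and partition counts are illustrative):

```sql
ALTER TABLE "SAPSCHEMA"."T1"
PARTITION BY HASH (MANDT) PARTITIONS 4,
RANGE (GJAHR)
( PARTITION '2019' <= VALUES < '2022',
  PARTITION OTHERS );
```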


 




4. Designing Partitions


Currently, online repartitioning is based on table replication. Tables with the naming convention _SYS_OMR_<source_table>#<id> are used as interim tables during online repartitioning operations. For details please read the “Designing Partitions” section in the documentation.

As summary:

  • Use partitioning columns that are often used in WHERE clauses for partition pruning

  • If you don’t know which partition scheme to use, start with hash partitioning

  • Use as many columns in the hash partitioning as required for good load balancing, but try to use only those columns that are typically used in requests

  • Queries do not necessarily become faster when smaller partitions are searched. Often queries make use of indexes and the table or partition size is not significant. If the search criterion is not selective though, partition size does matter.

  • Using time-based partitioning often involves the use of hash-range partitioning with range on a date column

  • If you split an index (SAP also refers to CS tables as indexes), always use a multiple of the source parts (for example 2 to 4 partitions). This way the split is executed in parallel and does not require parts to be moved to a single server first.

  • Do not split/merge a table unless necessary.

  • Ideally tables have a time criterion in the primary key. This can then be used for time-based partitioning.

  • Single level partitioning limitation: the limitation of only being able to use key columns as partitioning columns (homogeneous partitioning)

  • the client (MANDT/MANDANT) as single attribute for partitioning is not recommended - only useful in multi level partitioning scenarios
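To verify how an existing design actually distributes the rows, you can inspect the partition level directly, e.g. via M_TABLE_PARTITIONS (schema and table are placeholders):

```sql
SELECT TABLE_NAME, PART_ID, RECORD_COUNT
FROM M_TABLE_PARTITIONS
WHERE SCHEMA_NAME = 'SAPSCHEMA' AND TABLE_NAME = 'BSEG'
ORDER BY PART_ID;
```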






5. Example BSEG


We determined that BSEG has 4.5 billion rows. Now we need to find out on which column the table should be partitioned.

SAP's recommendation in note 2044468 is a HASH partitioning on BELNR.

Note 2289491, on the other hand, describes a range partitioning on BUKRS. As you can see, it always depends on how your system is used. OK, let's find out in a test environment.

Now we know:

  • table has to be partitioned (4,5 billion rows)

  • most scanned columns

  • recommendation for partitioning by SAP








Questions for success:

  • what is the primary key of the table?

  • which columns are often selected?

  • which columns are often used in the where clause?

  • what is the cardinality of these columns?

  • how many rows are inside the table?

  • how many partitions are needed?



2044468 - FAQ: SAP HANA Partitioning
2418299 - SAP HANA: Partitioning Best Practices / Examples for SAP Tables
2289491 - Best Practices for Partitioning of Finance Tables

To quote the HPI course on the partitioning details (In-Memory_Data_Management_2017):

“There are number of different optimization goals to be considered while choosing a suitable partitioning strategy. For instance, when optimizing for performance, it makes sense to have tuples of different tables, that are likely to be joined for further processing, on one server. This way the join can be done much faster due to optimal data locality, because there is no delay for transferring the data across the network. In contrast, for statistical queries like counts, tuples from one table should be distributed across as many nodes as possible in order to benefit from parallel processing."

To sum up, the best partitioning strategy depends very much on the specific use case.

The main challenge for hash-based partitioning is to choose a good hash function, that implicitly achieves locality or access improvements.

Primary key BSEG:
MANDT
BUKRS
BELNR
GJAHR
BUZEI

 

The following columns have a particularly high number of scanned records (SQL: “HANA_Tables_ColumnStore_Columns_Statistics”, MIN_SCANNED_RECORDS_PER_S = 5000000):



  • To achieve this just fill in your table inside the modification section


 

There are recommendations for the most common SAP standard tables in note 2044468. If you are close to the standard and don't use a lot of custom code, try these partitioning recommendations first. If you have heavy load from your own statements, which use a totally different execution plan, you may have to determine your own partitioning columns. Be aware of the partitioning limits!

If you want to check the optimal partitioning combinations use this statement:
select statement_hash, statement_string, 
execution_count, total_execution_time from m_sql_plan_cache
where statement_string like '%BSEG%'
order by total_execution_time desc;

 



From the result you have to analyze the “where” clause and find a common pattern.

Take as few fields as needed. Normally fields like MATNR, BELNR or DOCNR are selective fields and well suited for partitioning.

For a deep dive you can additionally run the DataCollector with the selected statement_hash (HANA_SQL_StatementHash_DataCollector) within the statement collection (1969700).

Additionally, you can use the ColumnStore statistics (HANA_Tables_ColumnStore_Columns_Statistics_2.00.030+ ) which is also part of the SQL collection to analyze how many scans are on which column. If you are using multiple clients, it may be wise to add the field MANDT as partitioning criteria.

HANA_Data_ColumnValueCounter_CommandGenerator
SELECT TOP 100
  'BSEG' TABLE_NAME,
  S.*
FROM
( SELECT
    BUKRS,
    LPAD(COUNT(*), 11) OCCURRENCES,
    LPAD(TO_DECIMAL(COUNT(*) / SUM(COUNT(*)) OVER () * 100, 10, 2), 7) PERCENT
  FROM
  ( SELECT
      BUKRS
    FROM
      "SAPSCHEMA"."BSEG"
  )
  GROUP BY
    BUKRS
  ORDER BY
    OCCURRENCES DESC
) S





Only 2 values for BUKRS, “0010” and “0020”. This means the cardinality is too low and partitioning won't work well on this attribute. So both extremes, too high and too low cardinality, are bad.

Most scanned columns are PCTR, HKONT and BELNR.

We have about 30 clients in this test system, so it makes sense to use MANDT in the partitioning clause on the first level as entry point.

  1. MANDT first level hash partitioning

  2. BELNR second level hash partitioning


With 4.5 billion rows we need about 12-15 partitions, i.e. roughly 300 million rows per partition (rule of thumb: between 100 and 500 million), if we used only single-level partitioning. With multi-level partitioning the fill degree of the partitions depends on the column data cardinality of each level.
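The chosen design could be implemented like this (4 x 4 = 16 partitions as a sketch; adjust the counts to your row count and growth):

```sql
ALTER TABLE "SAPSCHEMA"."BSEG"
PARTITION BY HASH (MANDT) PARTITIONS 4,
HASH (BELNR) PARTITIONS 4;
```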








Note

In the end it is just an indication for the analyzed timeframe and no guarantee for the future. You should repeat this process whenever the application and the selections change, e.g. due to user behaviour, new SAP packages or custom code.

 




6. Example: BSIS


Primary Key
MANDT
BUKRS
HKONT
AUGDT
AUGBL
ZUONR
GJAHR
BELNR
BUZEI

Column Stats



often executed statements on BSIS



cardinality









Facts:

  • 4,1 billion rows

  • About 30 client values

  • Only one value for BUKRS

  • The top occurrences in BELNR sum up to less than 0.001%


The recommendation of SAP is a RANGE partitioning on BUKRS. If RANGE on BUKRS or a similar range partitioning is not possible, use HASH partitions on BELNR.

We can't use BUKRS as partitioning column due to its low cardinality. BELNR alone has a high cardinality and might not be efficient enough on its own.

There are 2 recommendations in this scenario:

  1. MANDT first level range partitioning

  2. HKONT second level range partitioning


 

  1. MANDT first level hash partitioning

  2. BELNR second level hash partitioning


With 4.1 billion rows we need about 12-14 partitions, i.e. roughly 300 million rows per partition (rule of thumb: between 100 and 500 million), if we used only single-level partitioning. With multi-level partitioning the fill degree of the partitions depends on the column data cardinality of each level.
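The second (hash-hash) variant could be sketched as follows; the partition counts are illustrative and should be derived from your row count and growth:

```sql
ALTER TABLE "SAPSCHEMA"."BSIS"
PARTITION BY HASH (MANDT) PARTITIONS 4,
HASH (BELNR) PARTITIONS 4;
```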




7. Another example for Z-table


<tbd>

 




8. Parameter for splitting


As already mentioned in the first part of this series, we can optimize the splitting procedure with some parameters, which have to be adjusted according to your hardware:
indexserver.ini -> [partitioning] -> split_threads (default: 16) [range: 1-128]
indexserver.ini -> [partitioning] -> bulk_load_threads (default: 4) [range: 1-20]
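These parameters can be set online via SQL, for example (the values are illustrative and should match your hardware):

```sql
ALTER SYSTEM ALTER CONFIGURATION ('indexserver.ini', 'SYSTEM')
SET ('partitioning', 'split_threads')     = '32',
    ('partitioning', 'bulk_load_threads') = '8'
WITH RECONFIGURE;
```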

 



    • On a system with 176 CPU cores, the repartitioning initially failed due to this error and then completed in 9.18 h with the settings below.




indexserver.ini -> [partitioning] -> split_threads =50
indexserver.ini -> [partitioning] -> bulk_load_threads = 8




    • With 120 CPU cores, it took 14.40 h without the error after applying the settings below.




indexserver.ini -> [partitioning] -> split_threads =95
indexserver.ini -> [partitioning] -> bulk_load_threads = 10

Source:
Note 2890332

HANA 2.0 SPS04 Rev. >= 43


indexserver.ini -> [mergedog] -> auto_merge_decision_func = '((DRC*TMD > 3600*(MRC+0.0001)) or ((DMS>PAL/2000 or DMS > 1000 or DCC>100) and (DRC > MRC/100 or INSTR(BASENAME, "_SYS_OMR_") > 0)) or (DMR>0.2*MRC and DMR > 0.001) or (DRC=0 and DMR=0 and MPU=1))' 
indexserver.ini -> [table_replication] -> optimize_replica_for_online_ddl_in_log_replay = 'true'
indexserver.ini -> [table_replication] -> optimize_replica_for_online_ddl = 'true'
indexserver.ini -> [persistence] -> logreplay_savepoint_interval_s = '600'

Source:
Note 2874176




9. Partitioning process


You can monitor the partitioning process via M_JOB_PROGRESS monitoring view.

master process has in total 7 steps: Mapping, GetDataRanges, PreProcess, RepartitionMain, RepartitionDeltaAndMVCC, PostProcess, DropSources

Each worker group consists of 13 steps: CreateTargets, DeterminepartsMain, DeterminepartsDelta, InitPersistenceMain, InitPersistenceDelta, CheckConsistencyMain, CheckConsistencyDelta, RepartitionMainLocal, RepartitionDeltaLocal, RepartitionMVCCMain, RepartitionMVCCDelta, UpdateRuntime, Finalize

Each thread worker has 3 steps: SplitMergeAttributes, Save, Idle
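A simple query to follow the progress during the operation could look like this:

```sql
SELECT JOB_NAME, OBJECT_NAME, CURRENT_PROGRESS, MAX_PROGRESS, START_TIME
FROM M_JOB_PROGRESS;
```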








Example:

Table : 80 columns

Indexes: PK + 2 secondary indexes

=> 83 attributes to split (+internal once)

CPU threads : 64vCPUs

If you have configured 60 split threads, one group will be started and will use 60 threads. When one column is finished, the next one is started. The bulk load threads won't be considered.

 

Grouping (repartitioning)


If you already have a partition design in place but it no longer fits your needs, you have to pay attention to the grouping during the repartitioning process.

Same example as before, only that the table is already partitioned with HASH 32.

old design: HASH 32

new design: HASH 4

32 can be divided by 4, so 4 groups will be started. But if you stay with 60 split threads you will overload the system! So go with 60/4 = 15: the system will start 4 groups with 15 threads each.

 

Note








old design / new design

or

new design = old design / 2

 








bulk_load_threads = old design / new design

split_threads = logical CPUs / bulk_load_threads

 

 




10. Tips


Duration of a partitioning process


Export the table from production in binary format and import it into a separate system/schema to validate the duration of the partitioning process. Before you begin, you should load and merge the table; this will speed up the procedure.
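Sketched as commands ('/hana/export' is a placeholder path, the thread count is illustrative):

```sql
-- on the source system: binary export
EXPORT "SAPSCHEMA"."BSEG" AS BINARY INTO '/hana/export' WITH REPLACE THREADS 8;
-- on the test system: import, then load to memory and merge the delta
IMPORT "SAPSCHEMA"."BSEG" FROM '/hana/export' WITH REPLACE THREADS 8;
LOAD "SAPSCHEMA"."BSEG" ALL;
MERGE DELTA OF "SAPSCHEMA"."BSEG";
```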

If you want to partition online, you should consider the locking phase / delta merge phase. If those phases run for a long time in your test system, verify the impact with capture & replay, or just do it offline (without application workload) or at times with less workload on the system. There are also some known issues regarding merge/partitioning locks, so please check your revision!

Dependency of the runtime

The runtime of the partitioning process depends on different factors:

  • number of fields/columns of the table (each column will be processed by one thread)

  • data type / length of the field

  • delta size

  • size of the column

  • cardinality of the column

  • composition of the PKEY

  • concat attributes

  • number of threads for partitioning (available CPU cores)

  • LOAD attribute PAGE (NSE)


Repartitioning process:

  • make sure that your new design is a multiple or a divisor of the current number of partitions

    • 4 => 8

    • 6 => 3

    • 32 => 4



  • adjust the parameters in order to the partition grouping


Only in this case can the repartitioning happen in parallel on different partition groups and hosts (“parallel split / merge”).

 

You can speed up the procedure with some tuning of parameters or deleting indexes. This should always be tested carefully.






Attention

When you drop unique indexes you can't partition your table online! This would result in duplicate entries!



 




11. Partition / Re-partitioning references


3234063 - How to partition/re-partition a table

3111531 - SAP HANA - How to convert an existing hash partitioned table to range.

ALTER TABLE Statement (Data Definition)





12. Summary


There is no silver bullet for every customer. It always depends on the usage and the change of values in a table, and additionally on changes of the applications and the different selections. Try different partitioning columns in your test system and check the performance. You should also optimize the compression of those big tables from time to time. Compression optimization takes place automatically as part of a delta merge process, but the system decides when. On huge tables this can happen pretty late, which results in bad performance. Partitioning can also have some positive effects besides getting around the record limitation, for instance better delta merge and record change performance.


So, stay tuned and give your system regular health checks. SAP provides such services, called TPO, but some well-known HANA experts provide such services as well 😉
18 Comments
former_member247294
Contributor
Really good example on how to partition and may be applied on other databases also. Thanks for the valuable info and all the details about the process and the decisions involved !
former_member183260
Participant
Good ...

 
Michæl
Participant
Thank you for showing the road to "no silver bullets"

Remember not to waste them on clients customers or MANDT BUKRS ...

You could imagine it comes to hare and a hedgehog situation, missing the Technical Performance Optimization

Best regards MS
former_member205400
Active Participant
Jens,

First ... great blog. Thank you so much !

Second ... in the #2 section above I see:

Rule of thumb for the initial partitioning:

  • 100 mio entries per partition

  • 500 mio entries per partition


Is that 500 million max per partition and 100 million max per sub-partition?

Thanks again !

Mike
jgleichmann
Active Contributor

Hi Mike,

this is the official recommendation of SAP. It depends on the type of partitioning, the selection/partitioning criteria and the data growth. This is not a maximum! It is more a recommendation as a starting point. With HASH partitioning it is not so easy to reach this. With range partitioning you can define the distribution pretty well.

To answer your question I will show a result of a second-level partitioning:

You calculate the 100 to 500 million rows per sub partition as starting point of the sub parts. The maximum is once again 2 billion rows as limit per partition.

Regards,

Jens


Hi Jens,

 

Thank you very much for this informative article first of all.

I tried to split a table of an arbitrary size into partitions on HANA cockpit.

First, I created a table of 1500 rows and prepared some setup as follows:

Partitioning rule:

1. Repartition when number of table rows exceeds => 100
2. Maximum number of rows per partition => 500
3. When partitioning, ensure initial number of partitions => 2
4. Each table in this group has the same number of partitions => false
5. Hash Paritioning

I set ‘max_partitions_limited_by_locations’ to be false and

‘Force partition to split if specified by the settings’ to be true and run the partitioning rule on above tables.

Following is the indexserver trace.

TablePlacement    : For Table JIN1:PARTTEST - 1 locations are possible
TablePlacement : getMinRowsForPartitioning match for [JIN1.PARTTEST]:SCHEMA_NAME,TABLE_NAME - MinRowsForPartitioning:500
TablePlacement : getInitialPartitions match for [JIN1.PARTTEST]:SCHEMA_NAME,TABLE_NAME - InitialPartitions:2
TablePlacement : getRepartitioningThreshold match for [JIN1.PARTTEST]:SCHEMA_NAME,TABLE_NAME - RepartitioningThreshold:100
TablePlacement : Threshold for partition size is 100. Estimated row count is 0.
TablePlacement : Required Number of Partitions by Threshold [100] and limited by available locations [1]: 1 to store 0 number of record in table/partition.
TablePlacement : required number of partitions by threshold [100]: 1 to store 0 number of record in table/partition.
TablePlacement : regular number by threshold 1.

 

The result is an unpartitioned table. Even when done to a table of 1.5 billion rows with different partitioning rule the result is the same.

partitioning     : BEGIN REPARTITION Table J21::BACKUP_OPER:Z_HET_THREAD_SAMPLES (t 4276)
TablePlacement : For Table BACKUP_OPER:Z_HET_THREAD_SAMPLES - 1 locations are possible
TablePlacement : getMinRowsForPartitioning match for [BACKUP_OPER.Z_HET_THREAD_SAMPLES]:SCHEMA_NAME,TABLE_NAME - MinRowsForPartitioning:10000000
TablePlacement : getInitialPartitions match for [BACKUP_OPER.Z_HET_THREAD_SAMPLES]:SCHEMA_NAME,TABLE_NAME - InitialPartitions:3
TablePlacement : getRepartitioningThreshold match for [BACKUP_OPER.Z_HET_THREAD_SAMPLES]:SCHEMA_NAME,TABLE_NAME - RepartitioningThreshold:100000
TablePlacement : Threshold for partition size is 100000. Estimated row count is 0.
TablePlacement : Required Number of Partitions by Threshold [100000] and limited by available locations [1]: 1 to store 0 number of record in table/partition.
TablePlacement : required number of partitions by threshold [100000]: 1 to store 0 number of record in table/partition.
TablePlacement : regular number by threshold 1.
partitioning : SourceServers:[selbld122:32103] TargetServers:[] targetStorageSpec:[]
partitioning : REPARTITIONING J21::BACKUP_OPER:Z_HET_THREAD_SAMPLESen from HASH 1 TRANSACTION_ID to HASH 1 CONNECTION_ID

 

I understand that it is not optimal to partition a table of <2 billion rows, but I explicitly changed the configuration so that when the threshold is hit, the partitioning is forced to happen. But i failed to partition a table on HANA level manually. None of my colleagues succeeded as well.

Can you shed some light on this issue?

 

Thanks!

 

 

jgleichmann
Active Contributor
Hi Jinsol,

if I understand you correctly, you set up partitioning rules and executed a table redistribution. First of all: which kind of redistribution? As you may know, the HANA Cockpit offers more than one:


In your scenario the option 'Check the number of partitions' would be the correct one. Check the new plan and verify if all your rules leading to the expected result.

Additionally, check if the other parameters match the recommendations of notes 1899817 and 2600030. In particular, the 'balance by*' parameters are obsolete with >= HANA 2.0 SPS04.

 

Regards,

Jens
priyanka_sadana2
Explorer
Hi Jens,

 

Very well written blog. Thank you for your contribution.

We have a scenario where the database table will have 15 million records per object and there are 8 kinds of objects, i.e. the total number of possible records is 15 million * 8 = 120 million = 0.12 billion.

 

Should we create separate 8 tables for this or is there a way that we keep all the data in one table only and use some partitioning technique.

Note that We cannot archive this data.

 

Thanks and regards,

Priyanka Sadana
jgleichmann
Active Contributor
Hi Priyanka,

120 million is probably not a use case for partitioning. But it always depends on the fields (number and size). Maybe you want to take advantage of a merge per partition or of NSE. With these details I see no reason why you cannot use one table. How you act here depends on the future growth of the table.

 

Regards,

Jens

 
ParagJain
Participant
Hello Jens,

We have a table with RAW16 (GUID) as a key and expect 5-6 billion records to be stored. This is like a condition table , common table for conditions of our custom orders, invoices, returns, etc.

Sample design

  1. Condition_guid (key) Raw16

  2. Reference_id(non-key) Char32

  3. Condition_type(non-key) char 6

  4. Amount

  5. Currency


 

We are deliberating between hash and hash-hash (primary key and reference id).

What would be the recommendation and no of partitions ?

 

Regards, Parag

 
jgleichmann
Active Contributor
Hi Parag,

 

with RAW16 you mean VARBINARY. Please be aware of the documented limitations regarding data types. If the GUID is highly selective, there is normally no need to use second-level partitioning. But it depends on your application. In the end you have to test it and verify which combination has the best performance.

The number of partitions depends on the size of the table and its records (your example: 6 billion rows). The monthly growth is also an important aspect. If the growth rate is quite high, go for 20 partitions, provided the partition size is also between 20-30 GB. In a low-growth scenario I would recommend 16 partitions (in relation to the part size).

Regards,

Jens
former_member122654
Participant

The question was withdrawn

sergiu_popa
Participant
Hi Jens,

Thanks for this blog entry... it really helps a lot with understanding how you can take a better decision when partitioning a table. Nonetheless, there are some things which are not that clear to me... yet.

I want to partition table CDPOS and I'm looking at SAP Note 2044468.
There, SAP recommends a HASH partitioning on column OBJECTID, CHANGENR or TABKEY for table CDPOS. So I went on and checked SAP's recommendation in a DEV system where CDPOS has ~1million entries. From the 1 million entries, I found that OBJECTID has ~130.000 unique values, CHANGENR has ~155.000 unique values and TABKEY has ~815.000 unique values.

In this case, it seems to me none of the columns SAP recommends are suitable. Moreover, I don't understand their recommendation because it is expected that either OBJECTID, CHANGENR or  TABKEY will have a lot of unique entries in CDPOS regardless of customer's particularities.

Could you help me understand a scenario where OBJECTID, CHANGENR or TABKEY would be suitable in HASH partitioning for CDPOS?

Thanks,
Sergiu

 

 
jgleichmann
Active Contributor

Hi Sergiu,

 

first of all, the DEV system is not the system you should design your partitioning on. Please use the PRD system. There you have to analyze the most scanned columns and the SQLs. If you have 1 million records and 130,000 unique values for OBJECTID, you may find the top IDs with over 8%. Check out the BSIS example. If there is no better combination you have to use OBJECTID, but this doesn't look too bad to me.

 

Regards,

Jens

sergiu_popa
Participant
Hi Jens,

Thanks for your reply. Of course I will use the PRD system. For now DEV is what I have access to, so I thought I should give a first analysis a shot.

In the end my confusion was generated by how HASH partitioning is working. I thought that by using those columns I would generate too many partitions, when in fact I can decide how many partitions to create 🙂.

Cheers!

 
jgleichmann
Active Contributor
Hi Sergiu,

correct, you can determine how many partitions will be created. HASH partitioning packs different values into one partition without knowing the values. Range partitioning is more performant because the values are known, but it is only a valid solution if you have numeric or date values. In all scenarios you can define how many partitions will be created. The only exception is dynamic range partitioning: there, partitions are created when the defined limit is reached.

But a good hint, I will add some command examples for the scenarios to the blog.

 

Regards,

Jens

 
evelyn_curreri
Explorer
Hi Jens,

Thanks for such a great article. I'm new to table partitioning so I'm trying to implement but ran into an issue:

I just partitioned PTDW_TIMES_DB with the following command:

alter table "SAPSCHEMA"."PTDW_TIMES_DB" partition by hash (MANDT) partitions 3;

However, noticed that the records were not split equally into the three partitions but they were all added onto the first partition. Did I miss a parameter on my partitioning command?

Thanks in advance for your help.

Evelyn
jgleichmann
Active Contributor
Hi Evelyn,

thanks for reading. The client is not the optimal attribute for splitting and should only be used if you have more than 5 productive clients, or for multi-level partitioning. It seems that your system has 3 clients but the amount of data in the table is the same for every client. This is the reason for the irregular splitting behavior.
select MANDT, count(*) from "SAPSCHEMA"."PTDW_TIMES_DB" group by MANDT;

 

  1. evaluate an attribute which is used frequently in your queries besides MANDT

  2. check the rows of the table

  3. check the size of the table

  4. evaluate the number of splits

  5. test the table in a testsystem with your configuration


 

Regards,

Jens