Introduction
In my previous post (
Split string into multiple rows using SQL in SAP HANA) I was demonstrating how to split variable string value into multiple lines. In this post I would like to focus on the similar scenario, but this time the challenge will be to do the same not for variable but for table column. SAP HANA enables
STRING_AGG function for concatenating values from multiple lines into single string, however doing opposite is quite challenging.
Scenario
In my scenario I will use my test table containing contact person details. Its structure is as follows:
CREATE COLUMN TABLE CONTACTS
(
ID INT,
COUNTRY VARCHAR(2),
FULL_NAME VARCHAR(100),
PHONE_NUMBERS VARCHAR(200)
);
Existing CONTACTS table has PHONE_NUMBERS column which stores comma delimited numbers. The purpose is to display phone numbers as separate rows and keep other contact information as in the source table. Single phone number length may vary and count of phone numbers can be also different for each record.
Split column values into multiple lines
To implement the logic for splitting column values into multiple lines, following steps are required
- Retrieve from the string values delimited by separators
- Dynamically define maximum number of values for all concatenated strings
- Generate multiple lines for each record
- Distribute retrieved values to generated rows
Step 1. Retrieve from the string values delimited by separators
In this step I will use string function, which allows to distinguish values from comma separated string. For this purpose I will use
SUBSTR_REGEXPR SQL function. This function allows to retrieve substring from specific string based on regular expression. It also allows to specify which occurrence of the matching substring we want to display.
Following expression allows to retrieve first occurrence of the string of any characters excluding commas
SUBSTR_REGEXPR('[^,]+' IN "PHONE_NUMBERS" OCCURRENCE 1 )
Knowing that in my scenario there is up to 3 phone numbers concatenated within single value, let's add the expression for remaining numbers:
Query:
SELECT
"ID",
"COUNTRY",
"FULL_NAME",
"PHONE_NUMBERS",
SUBSTR_REGEXPR('[^,]+' IN "PHONE_NUMBERS" OCCURRENCE 1) AS "PHONE_NUMBER1",
SUBSTR_REGEXPR('[^,]+' IN "PHONE_NUMBERS" OCCURRENCE 2) AS "PHONE_NUMBER2",
SUBSTR_REGEXPR('[^,]+' IN "PHONE_NUMBERS" OCCURRENCE 3) AS "PHONE_NUMBER3"
FROM
CONTACTS;
Result:
Step 2. Dynamically define maximum number of values for all concatenated strings
In this step we want to define what is the maximum number of phone number values in single string. For this purpose I will use OCCURRENCES_REGEXPR function to count number of separators within the string.
Then I will add +1, because number of commas is always less by 1 than the number of phone numbers in the string:
OCCURRENCES_REGEXPR(',' IN "PHONE_NUMBERS" ) + 1
Now we have count of phone number occurrences for each string.
Query:
SELECT
"ID",
"COUNTRY",
"FULL_NAME",
"PHONE_NUMBERS",
OCCURRENCES_REGEXPR(',' IN "PHONE_NUMBERS" ) + 1 AS "OCCURRENCES_COUNT"
FROM
CONTACTS;
Result:
Finally I want to see the maximum value to know, how many lines I need to generate. This value will be assigned to the variable MAX_NR_OCCURRENCES and will be used in
Step 3. For the purpose of creating variable, I used
anonymous block:
Query:
DO
BEGIN
DECLARE MAX_NR_OCCURRENCES INT;
SELECT
MAX( OCCURRENCES_REGEXPR(',' IN "PHONE_NUMBERS" ) + 1 )
INTO
MAX_NR_OCCURRENCES
FROM
CONTACTS;
END
Step 3. Generate multiple lines for each record
For generating multiple lines for each record I will cross join CONTACT table with series of 3 records (because in my case there are max 3 phone numbers in string). To generate N records I used
SERIES_GENERATE_INTEGER function. Variable defined in
Step 2 will be used as input parameter for this function to define number of records to be generated:
Query:
DO
BEGIN
DECLARE MAX_NR_OCCURRENCES INT;
SELECT
MAX( OCCURRENCES_REGEXPR(',' IN "PHONE_NUMBERS" ) + 1 )
INTO
MAX_NR_OCCURRENCES
FROM
CONTACTS;
SELECT * FROM SERIES_GENERATE_INTEGER(1,0, :MAX_NR_OCCURRENCES);
END
Result:
Now let's cross join the series with result set from
Step 1. This way each record will be copied 3 times:
Query:
DO
BEGIN
DECLARE MAX_NR_OCCURRENCES INT;
SELECT
MAX( OCCURRENCES_REGEXPR(',' IN "PHONE_NUMBERS" ) + 1 )
INTO
MAX_NR_OCCURRENCES
FROM
CONTACTS;
SELECT
CNT."ID",
CNT."COUNTRY",
CNT."FULL_NAME",
CNT."PHONE_NUMBERS",
SUBSTR_REGEXPR('[^,]+' IN "PHONE_NUMBERS" OCCURRENCE 1) AS "PHONE_NUMBER1",
SUBSTR_REGEXPR('[^,]+' IN "PHONE_NUMBERS" OCCURRENCE 2) AS "PHONE_NUMBER2",
SUBSTR_REGEXPR('[^,]+' IN "PHONE_NUMBERS" OCCURRENCE 3) AS "PHONE_NUMBER3",
SERIES."ELEMENT_NUMBER"
FROM
CONTACTS CNT
CROSS JOIN SERIES_GENERATE_INTEGER(1,0, :MAX_NR_OCCURRENCES) SERIES;
END
Result:
Step 4. Distribute retrieved values to generated rows
In this step we will apply final query adjustments, to move distinguished phone numbers from columns to consecutive rows. To achieve that we can use ELEMENT_NUMBER column from SERIES_GENERATE_INTEGER function, which returns consecutive numbers for each line within specific contact person. This column will be consumed by OCCURRENCE parameter. By having consecutive numbers in ELEMENT_NUMBER column we can dynamically substring separated values one by one, and display them in consecutive rows. We also need to remember that initially for each record we generated three lines, so at the end we also need to filter out empty rows (for these cases where there are less than 3 phone numbers in string)
Query:
DO
BEGIN
DECLARE MAX_NR_OCCURRENCES INT;
SELECT
MAX( OCCURRENCES_REGEXPR(',' IN "PHONE_NUMBERS" ) + 1 )
INTO
MAX_NR_OCCURRENCES
FROM
CONTACTS;
SELECT
CNT."ID",
CNT."COUNTRY",
CNT."FULL_NAME",
SUBSTR_REGEXPR('[^,]+' IN "PHONE_NUMBERS" OCCURRENCE SERIES."ELEMENT_NUMBER") AS "PHONE_NUMBER"
FROM
CONTACTS CNT
CROSS JOIN SERIES_GENERATE_INTEGER(1,0, :MAX_NR_OCCURRENCES) SERIES
WHERE
SUBSTR_REGEXPR('[^,]+' IN "PHONE_NUMBERS" OCCURRENCE SERIES."ELEMENT_NUMBER") IS NOT NULL;
END
Result:
Summary
This post was to give proposed solution for splitting table column and generating multiple lines, but also to highlight the power of using string functions in combination with regular expressions.
Thanks for reading!