REGEX - find zip code in text

Former Member

Hi experts!

I have to find the zip code (for Germany) in a given text. I can find 5 digits in my text, but my problem is ignoring numbers that consist of more than 5 digits!

My first try works for all cases but not for the last one.

FIND FIRST OCCURRENCE OF REGEX '([0-9]{5})' IN ld_string SUBMATCHES ld_plz.

D-12345 Mainz -> should match 12345

D 12345 Mainz -> should match 12345

12345 Mainz -> should match 12345

12345Mainz -> should match 12345

Mainz D-12345 -> should match 12345

D-123 45 Mainz -> error because of the space between the numbers

D-12333345 Mainz -> error because only 5 digits are valid for a German zip code; my REGEX does not work here!

Thanks a lot!

Regards,

Florian

6 REPLIES

Former Member

D-123 45 Mainz -> error because of the space between the numbers

FIND ` ` IN ld_string. " ` ` is a space in backquotes (the key beside 1 on the keyboard)
IF sy-subrc = 0.
  MESSAGE 'error, space not allowed' TYPE 'E'.
ENDIF.

And for this:

D-12333345 Mainz -> error because only 5 digits are valid for a German zip code;

I guess your code is correct, but you can try removing the parentheses:

FIND REGEX '[0-9]{5}' IN ld_string.

Clemenss
Active Contributor

Where's the question?

I think the introduction of REGEX in ABAP is revolutionary!

Regards,

Clemens

former_member156446
Active Contributor

Hi Florian, check this:

FIND FIRST OCCURRENCE OF REGEX '^([0-9]{5})$' IN ld_string SUBMATCHES ld_plz.


Hi J@Y!

That's what I tried first, but unfortunately it does not seem to work. I tested it with DEMO_REGEX_TOY and it does not match in any of my cases. I think that ^ and $ stand for the start and the end of the whole content of the variable LD_STRING, so the pattern only matches when the string is exactly 12345.
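
To illustrate the point about ^ and $, here is a minimal sketch. The report name and the literals are only assumptions for the demo; the pattern is the one suggested above.

```abap
REPORT z_regex_anchor_demo.

DATA lv_plz TYPE string.

" The whole string is exactly five digits, so the anchored pattern matches.
FIND FIRST OCCURRENCE OF REGEX '^([0-9]{5})$' IN '12345' SUBMATCHES lv_plz.
WRITE: / 'SY-SUBRC for 12345:', sy-subrc.

" Other characters surround the digits, so ^ and $ prevent a match.
CLEAR lv_plz.
FIND FIRST OCCURRENCE OF REGEX '^([0-9]{5})$' IN 'D-12345 Mainz' SUBMATCHES lv_plz.
WRITE: / 'SY-SUBRC for D-12345 Mainz:', sy-subrc.
```

The first FIND should set SY-SUBRC to 0 and the second to 4, which is why the anchored pattern is of no use for a zip code embedded in a longer text.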

Regards,

Florian

Clemenss
Active Contributor

Hi Florian,

curiosity persists.

After some playing around I reduced the pattern using \d, the placeholder for any single digit. Then I noticed that the 5-digit sequence will also match 5 digits out of 6, so I used \D, the placeholder for any character other than a digit. I don't know how to recognize an (optional) line start or end as an alternative to a non-digit, so I just enclose the string to be checked in spaces. Please suggest a more elegant solution.

My test form

FORM regex.
  DATA:
    lv_subm   TYPE string,
    lt_string TYPE TABLE OF string.
  FIELD-SYMBOLS:
    <string>  TYPE string.
  " Test cases: valid zip codes plus two strings that must not match.
  APPEND:
    'D-12345 Mainz' TO lt_string,
    'D 12345 Mainz' TO lt_string,
    '12345 Mainz'   TO lt_string,
    '12345Mainz'    TO lt_string,
    '123456Mainz'   TO lt_string,
    '123 45Mainz'   TO lt_string,
    'Mainz D-12345' TO lt_string.
  LOOP AT lt_string ASSIGNING <string>.
    CLEAR lv_subm.
    " Pad with spaces so that \D can also match at the start and the end.
    CONCATENATE ` ` <string> ` ` INTO <string>.
    " Exactly five digits with a non-digit on each side.
    FIND REGEX '\D(\d{5})\D' IN <string> SUBMATCHES lv_subm.
    WRITE: / <string>, 20 'matches', 30 lv_subm, 40 'SY-SUBRC=', sy-subrc.
  ENDLOOP.
ENDFORM.                    " REGEX

creates this output:

 D-12345 Mainz     matches   12345     SY-SUBRC=     0
 D 12345 Mainz     matches   12345     SY-SUBRC=     0
 12345 Mainz       matches   12345     SY-SUBRC=     0
 12345Mainz        matches   12345     SY-SUBRC=     0
 123456Mainz       matches             SY-SUBRC=     4
 123 45Mainz       matches             SY-SUBRC=     4
 Mainz D-12345     matches   12345     SY-SUBRC=     0

As I do not fully understand the meaning of FIRST OCCURRENCE, I just removed it.
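
As far as I know, FIND without an addition searches for the first occurrence, so leaving out FIRST OCCURRENCE OF does not change the result. A possible way to avoid padding the string with spaces is to allow the start or the end of the string as an alternative to \D. This is only a sketch: the form name is hypothetical, and it assumes that the regex engine of your release accepts the non-capturing group (?: ... ).

```abap
FORM regex_no_padding.
  DATA:
    lv_subm   TYPE string,
    lt_string TYPE TABLE OF string.
  FIELD-SYMBOLS:
    <string>  TYPE string.
  APPEND:
    'D-12345 Mainz'    TO lt_string,
    '12345Mainz'       TO lt_string,
    'Mainz D-12345'    TO lt_string,
    'D-12333345 Mainz' TO lt_string.
  LOOP AT lt_string ASSIGNING <string>.
    CLEAR lv_subm.
    " (?:^|\D) : start of string or a non-digit before the five digits
    " (?:\D|$) : a non-digit or end of string after the five digits
    FIND REGEX '(?:^|\D)(\d{5})(?:\D|$)' IN <string> SUBMATCHES lv_subm.
    WRITE: / <string>, 20 'matches', 30 lv_subm, 40 'SY-SUBRC=', sy-subrc.
  ENDLOOP.
ENDFORM.
```

Because the outer groups are non-capturing, the only submatch is the five-digit block, so SUBMATCHES still needs a single target variable.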

Regards,

Clemens

rainer_hbenthal
Active Contributor

Do not try to put everything into one regex; that makes it slow, hard to read and hard to maintain.

As a first approach I would add word boundaries:


FIND FIRST OCCURRENCE OF REGEX '(\<[0-9]{5}\>)' IN ld_string SUBMATCHES ld_plz.

This fixes most of your examples, but not 12345Mainz, because there is no word boundary between the digits and the letters.

If the regex above fails, I would try another regex that fits the last remaining example, maybe presenting the results in a list for user approval.
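
A rough sketch of this two-step idea could look like the following. The form name, the parameter names and the fallback pattern are my own assumptions, not something tested on a real system; the fallback also assumes that the non-capturing group (?: ... ) is available.

```abap
FORM find_plz USING    iv_text TYPE string
              CHANGING cv_plz  TYPE string.
  CLEAR cv_plz.
  " Step 1: five digits delimited by word boundaries.
  FIND FIRST OCCURRENCE OF REGEX '(\<[0-9]{5}\>)'
       IN iv_text SUBMATCHES cv_plz.
  IF sy-subrc <> 0.
    " Step 2 (fallback): five digits not adjacent to another digit,
    " which also covers cases like 12345Mainz.
    FIND FIRST OCCURRENCE OF REGEX '(?:^|[^0-9])([0-9]{5})(?:[^0-9]|$)'
         IN iv_text SUBMATCHES cv_plz.
  ENDIF.
ENDFORM.
```

SY-SUBRC after the PERFORM is that of the last FIND executed, so a caller can tell whether any zip code was found at all. Hits that come only from the fallback are the ones I would show to the user for approval.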