
Pattern recognition in a given string

venkatesha_n
Product and Topic Expert
0 Likes
4,129

Hi Experts,

I have a problem finding repeated patterns in a given string.

For example, take the string lv_string = '345723345982343452343345'.

As you can see, lv_string contains '345' several times.

So I want ABAP code that lists all such repeated patterns in a given string.

Thanks in advance.

Venky.

1 ACCEPTED SOLUTION
Read only

kesavadas_thekkillath
Active Contributor
0 Likes
3,673

Vishnu,

The code is readily available in F1 help FIND - pattern

43 REPLIES 43
Read only

kesavadas_thekkillath
Active Contributor
0 Likes
3,314

Hey F1... please help Venkatesha.

Read only

0 Likes
3,314

It's not there in the F1 help...

Read only

0 Likes
3,314

You have to Find it.

Read only

0 Likes
3,314

Hello,

It's not just CP / CA / NP / CO, etc.

I am also not talking about the regular expressions we normally use via cl_abap_matcher or cl_abap_regex.

Try to understand the problem first:

NOTE: you are given just a string and nothing else; no search key or pattern to look for is provided.

We have to search the given string to find out which parts of it are repeated.

I hope this is enough to understand the problem.

thanks

Venky.

Read only

0 Likes
3,314

To be fair to the OP, the question is not trivial.

I challenge everyone here to write code to detect a string with patterns repeated more than once in the string. The pattern can be anything and is not known at design time.

Hint: regular expressions may be the key; normal ABAP pattern matching is not much use.

There are some code snippets on the web that are not in ABAP but may still be useful:

http://social.msdn.microsoft.com/Forums/en/csharpgeneral/thread/037047fc-5506-4656-ad27-dab9a6c501ee

Edited by: Vishnu Tallapragada on Dec 21, 2011 1:01 PM

I just realized that the OP and I posted at the same time.

Read only

0 Likes
3,314
data: lv_string type string.
data: mcnt type i.
lv_string = '345723345982343452343345'.

find ALL OCCURRENCES OF '345' in lv_string MATCH COUNT mcnt.

write: / mcnt.

Did i win cookies?

Read only

0 Likes
3,314

Hello Maen Anachronos ,

Please try to understand the problem..

As I already said, you are given just one big string like '345723345982343452343345'.

You have to find out which parts of that string are repeated within it.

If there are several such repeated parts, all of those repeated substrings should be output.

Interesting?....

Thanks,

Venky.

Read only

0 Likes
3,314

Using the FIND statement, it should be easy to program this, even if the search string is not given.

Read only

0 Likes
3,314

Keshav and Maen - sorry but you still don't seem to get it.

We don't know what the pattern is.

We want to detect a string with repeated patterns, the pattern we don't know in advance.

It can be anything

345w2343234523

Vishnudoesn'tlikeVishnu

Read only

0 Likes
3,314

Oh, but I do understand.

Point is: you need to be creative to use the FIND statement.

Read only

0 Likes
3,314

Vishnu,

See this example in documentation


DATA: patt       TYPE string VALUE `now`, 
      text       TYPE string, 
      result_tab TYPE match_result_tab. 

FIELD-SYMBOLS <match> LIKE LINE OF result_tab. 


FIND ALL OCCURRENCES OF patt IN 
     `Everybody knows this is nowhere` 
     RESULTS result_tab. 

LOOP AT result_tab ASSIGNING <match>. 
  WRITE: / <match>-offset, <match>-length. 
ENDLOOP. 

Mods: sorry for pasting the standard code; I did it just to demonstrate that it's as simple as that.

@OP: no points, please.

Read only

0 Likes
3,314
data: gv_string type string.
data: gv_length type i.
data: gv_offset type i.
data: gv_search type string.
data: mcnt type i.


gv_string = '345723345982343452343345'.
gv_length = strlen( gv_string ).

write: / gv_string.

gv_offset = 0.
do gv_length times.
  if gv_offset eq 0.
    gv_search = gv_string(1).
  else.
    concatenate gv_search gv_string+gv_offset(1) into gv_search.
  endif.

  find ALL OCCURRENCES OF gv_search in gv_string MATCH COUNT mcnt.
  write: /'Search for', gv_search, 'counted', mcnt.

  gv_offset = gv_offset + 1.

enddo.

Now i expect a really big cake instead of a cookie.

Read only

0 Likes
3,314

Maen, I guess, it will work, but it is not very efficient.

Read only

0 Likes
3,314

I have provided that standard code... Did anybody look into it?

Read only

0 Likes
3,314

hehe... and why not? It's not like he's going to search for repeating patterns in the holy bible.

And the challenge was: write a piece of code to do it.

Read only

0 Likes
3,314

> I have provided that standard code... Did anybody look into it?

Hihihi... but he doesn't know the pattern up front. The problem is: look for a repeating pattern in a string.

Read only

0 Likes
3,314

Got it... Let me give my brain some exercise before going home, so that I can drive accurately.

Read only

0 Likes
3,314

I know it can be done by breaking the string into tokens of all possible lengths at all possible offsets and searching for those tokens in the string. To be honest, I was expecting some regular expression trick that can do the job as efficiently as possible.

Maen, when I look at your code again, I don't think you are searching tokens at all offsets and of all possible lengths. Will check it out later.

Read only

0 Likes
3,314

> Maen, when I look at your code again, I don't think you are searching tokens at all offsets and of all possible lengths. Will check it out later.

That's right. I'm only starting from the beginning.

Read only

0 Likes
3,314
TYPES: BEGIN OF ty_search,
          value  TYPE string,
          count  TYPE i,
        END OF ty_search.


DATA: it_search TYPE HASHED TABLE OF ty_search WITH UNIQUE KEY value.
DATA: wa_search TYPE ty_search.
DATA: gv_string TYPE string.
DATA: gv_length TYPE i.
DATA: gv_offset TYPE i.
DATA: gv_search TYPE string.
DATA: mcnt TYPE i.


gv_string = '345723345982343452343345'.
WRITE: / gv_string.

DO.
  gv_length = STRLEN( gv_string ).

  IF gv_length EQ 0.
    EXIT.
  ENDIF.

  gv_offset = 0.


  DO gv_length TIMES.
    IF gv_offset EQ 0.
      gv_search = gv_string(1).
    ELSE.
      CONCATENATE gv_search gv_string+gv_offset(1) INTO gv_search.
    ENDIF.

    READ TABLE it_search TRANSPORTING NO FIELDS WITH TABLE KEY value = gv_search.
    IF sy-subrc NE 0.
      FIND ALL OCCURRENCES OF gv_search IN gv_string MATCH COUNT mcnt.
      IF mcnt GT 1.
        wa_search-value = gv_search.
        wa_search-count = mcnt.
        INSERT wa_search INTO TABLE it_search.
      ENDIF.


    ENDIF.
    gv_offset = gv_offset + 1.

  ENDDO.
  SHIFT gv_string LEFT BY 1 PLACES.


ENDDO.
LOOP AT it_search INTO wa_search.
  WRITE: /'Search for', wa_search-value, 50 'counted', wa_search-count.
ENDLOOP.

Like this then: 2 really big cakes.

Read only

0 Likes
3,314

Ah.. small mistake..

Added:

DATA: gv_string2 TYPE string.

Changed:

gv_string = gv_string2 = '345723345982343452343345'.

Changed:

FIND ALL OCCURRENCES OF gv_search IN gv_string2 MATCH COUNT mcnt.

@OP: just need to be a bit creative to find a solution.

Read only

0 Likes
3,314

This one shows the power and beauty of regular expressions.

Here we find repeated non-blank patterns of three or more characters in a string and list them along with how many times each is repeated.

In the string 'Today, as never before, the fates of men are so intimately linked to one another that a disaster for one is a disaster for everybody', the program lists the following:

Count       Pattern
----------------
2           ever
3           for
2           the
2           ate
2           one
2           disaster

The key ingredient of the program is the regular expression

([^ ]{3,}).*(\1)

which matches, one pair at a time, a run of three or more non-blank characters that occurs again later in the string. The rest you can figure out!

DATA: ls_string TYPE string VALUE 'Today, as never before, the fates of men are so intimately linked to one another that a disaster for one is a disaster for everybody'.
DATA: regex    TYPE c LENGTH 120,
      offset   TYPE i.

DATA: lt_result TYPE match_result_tab,
      ls_result TYPE LINE OF match_result_tab,
      ls_submatch TYPE LINE OF match_result-submatches,
      ls_pattern TYPE string.

DATA: BEGIN OF lt_found OCCURS 0,
        pattern TYPE string,
        count TYPE i,
      END OF lt_found.

FIELD-SYMBOLS: <ls_found> LIKE lt_found.

regex = '([^ ]{3,}).*(\1)'.
offset = 0.
TRY.
    DO.
      FIND ALL OCCURRENCES OF REGEX regex IN ls_string RESULTS lt_result.
      IF sy-subrc = 0.
        READ TABLE lt_result INDEX 1 INTO ls_result.
        IF sy-subrc = 0.
          READ TABLE ls_result-submatches INTO ls_submatch INDEX 1.
          IF sy-subrc = 0.
            ls_pattern = ls_string+ls_submatch-offset(ls_submatch-length).
            READ TABLE lt_found ASSIGNING <ls_found> WITH KEY pattern = ls_pattern.
            IF sy-subrc NE 0.
              lt_found-count = 2.
              lt_found-pattern = ls_string+ls_submatch-offset(ls_submatch-length).
              APPEND lt_found.
            ELSE.
              ADD 1 TO <ls_found>-count.
            ENDIF.
            offset = ls_submatch-offset + ls_submatch-length.
            ls_string = ls_string+offset.
          ENDIF.
        ENDIF.
      ELSE.
        EXIT.
      ENDIF.
    ENDDO.
  CATCH cx_sy_regex.
    MESSAGE 'Invalid regular expression' TYPE 'S' DISPLAY LIKE 'E'. "#EC NOTEXT
ENDTRY.

LOOP AT lt_found.
  WRITE:/ lt_found-count, lt_found-pattern.
ENDLOOP.

Read only

0 Likes
3,314

You still owe me 2 really big cakes!

Read only

0 Likes
3,314

> You still owe me 2 really big cakes!

Hahaha.. I will once I come to your place some day!

I tested yours and it is working fine. Though I would want to restrict it to tokens above a certain length, say 3.

By the way, did you execute and test the code snippet I gave?

Of course, it will only compile on ABAP release 7.0 or above.

Also, just the lines below can check whether there are repeated patterns in a string. Now try to match that power!

regex = '([^ ]{3,}).*(\1)'.
FIND ALL OCCURRENCES OF REGEX regex IN ls_string.
IF sy-subrc = 0.
  " Repeated patterns exist!
ENDIF.
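
For anyone who prefers the class-based API mentioned earlier in the thread, the same check might look roughly like this with cl_abap_matcher. This is an untested sketch, not part of the original post: the lo_/lv_ variable names and the sample text are made up for illustration, and the backreference is written without the second capture group.

```abap
DATA: lo_matcher TYPE REF TO cl_abap_matcher,
      lv_text    TYPE string VALUE `a disaster for one is a disaster for everybody`,
      lv_first   TYPE string.

TRY.
    " Same idea as above: three or more non-blank characters that occur again later.
    lo_matcher = cl_abap_matcher=>create( pattern = `([^ ]{3,}).*\1`
                                          text    = lv_text ).
    IF lo_matcher->find_next( ) = abap_true.
      " Submatch 1 is the first occurrence of the repeated part.
      lv_first = lo_matcher->get_submatch( 1 ).
      WRITE: / 'Repeated:', lv_first.
    ENDIF.
  CATCH cx_sy_regex cx_sy_matcher.
    WRITE: / 'Regex error'.
ENDTRY.
```

The matcher object keeps its position between calls, so calling find_next( ) in a loop would walk through further matches without cutting the string by hand.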

Read only

0 Likes
3,314

Yup, works absolutely perfect! Well done!

I only fiddled around with regex one or two years ago, and even then only in a very simple form. To be honest, I didn't expect this to be that easy to achieve with regex, although I suspect it took you some trial and error to finally get it done.

Still: well done!

Read only

0 Likes
3,314

And additionally: this is the reason why I keep visiting SDN/SCN: to discover gems like this among all the .......

Read only

0 Likes
3,314

Yes, that feeling is mutual Maen!

Glad that we can all meet and learn new things!

Thanks to the OP for bringing up this topic.

Read only

0 Likes
3,314

Yup! Thanks to Vishnu! This '([^ ]{1,}).*(\1)' will certainly be useful someday.

m.

Read only

0 Likes
3,314

I think we scared OP away....

Read only

0 Likes
3,314

> I think we scared OP away....

Regular Expressions can scare away even the bravest

Read only

0 Likes
3,314

Hi Guys,

First of all, thanks for all the replies to my query.

@Vishnu: you revealed a hidden secret of regular expressions.

@Maen: you cracked the hidden secret of regular expressions in your own way.

@Brendan: your code is simple to understand for a fresher like me!

@Rob: you are right... But I know the experts will understand the problem and find a solution right away!

Well done Guys.....

Venky.

Read only

0 Likes
3,314

> @Rob: you are right..... But I know the experts will understand the problem for a solution right away!!.

Nope - you can't understand if there are unanswered questions. I speak from experience.

Rob

Read only

0 Likes
3,314

Well...

> More importantly; what is a repeated pattern? For example, in the string '2222', there are three repetitions of '2', two of '22' and one of '222' including overlaps. So do overlapping strings count or not?

A repeated pattern is a repeated pattern. I mean, we can always find more questions to ask: does it contain only numbers? Only characters? A mix of them? Do we also have to look for mirrored patterns? Etc.

The thing is that the pattern given as the solution here covers all cases (lengths from 1 to n, any type of string, overlapping or not), hence the beauty of the solution!

> This should be done before coding starts - remember?

...or we could also just read a bit between the lines...

> you can't understand if there are unanswered questions. I speak from experience.

Yes we can. The proof: there is no need to know the answer to your question about the length, for example, since the proposed solution works for any pattern length ;) The same goes for your question about overlaps...

... And I speak from experience too

cheers,

m.

Read only

Former Member
0 Likes
3,314

Hi

Dude, just use ALL OCCURRENCES and you will find the solution:

FIND ALL OCCURRENCES OF '345' IN lv_string MATCH COUNT var.

Cheers

NZAB

Read only

venkatesha_n
Product and Topic Expert
0 Likes
3,314

Hello NZAB,

Can you please go through the problem properly....?

Thanks.

Venky.

Read only

0 Likes
3,314

He doesn't know whether it is 345 or Vishnu or Maen or Keshav that repeats in the string.

He just wants to know whether any pattern (which can be anything) repeats in a string and, if so, which one(s). Not an easy problem.

Keshav, he doesn't know whether it is "now" that repeats in his string. It can be "now" or "then" or "here" or "there"; it can be anything.

Read only

Former Member
0 Likes
3,314

DATA: patt      TYPE string,
      text      TYPE string,
      lv_length TYPE i,
      lv_key    TYPE i,
      lv_rest   TYPE i,
      lv_match  TYPE i.

text = `345723345982343452343345`.
lv_length = strlen( text ).

" Outer loop: every start offset. Inner loop: every length from that offset.
DO lv_length TIMES.
  lv_key  = sy-index - 1.
  lv_rest = lv_length - lv_key.
  DO lv_rest TIMES.
    patt = text+lv_key(sy-index).
    FIND ALL OCCURRENCES OF patt IN text MATCH COUNT lv_match.
    IF lv_match > 1.
      WRITE: / patt, lv_match.
    ENDIF.
  ENDDO.
ENDDO.

You could store each PATT in a table once it has been checked and do a READ so as not to check it again.

I guess I should have refreshed before posting.

Edited by: Brendan Reid on Dec 21, 2011 9:46 PM
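
The suggestion of remembering each checked pattern could be sketched roughly as follows. This is an untested illustration, not Brendan's code: ty_seen, lt_seen and the other names are made up, and every substring is stored (including those found only once) so that it is never searched for twice.

```abap
TYPES: BEGIN OF ty_seen,
         patt  TYPE string,
         count TYPE i,
       END OF ty_seen.

DATA: lt_seen   TYPE HASHED TABLE OF ty_seen WITH UNIQUE KEY patt,
      ls_seen   TYPE ty_seen,
      lv_text   TYPE string VALUE `345723345982343452343345`,
      lv_length TYPE i,
      lv_off    TYPE i,
      lv_rest   TYPE i.

lv_length = strlen( lv_text ).

DO lv_length TIMES.
  lv_off  = sy-index - 1.
  lv_rest = lv_length - lv_off.
  DO lv_rest TIMES.
    ls_seen-patt = lv_text+lv_off(sy-index).
    " Skip substrings that were already counted.
    READ TABLE lt_seen TRANSPORTING NO FIELDS WITH TABLE KEY patt = ls_seen-patt.
    IF sy-subrc <> 0.
      FIND ALL OCCURRENCES OF ls_seen-patt IN lv_text MATCH COUNT ls_seen-count.
      INSERT ls_seen INTO TABLE lt_seen.
    ENDIF.
  ENDDO.
ENDDO.

" Only substrings found more than once are repeated patterns.
LOOP AT lt_seen INTO ls_seen WHERE count > 1.
  WRITE: / ls_seen-patt, ls_seen-count.
ENDLOOP.
```

A hashed table keeps the lookup cheap, and each distinct substring is reported once instead of once per position where it starts.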

Read only

Former Member
0 Likes
3,314

I think the first thing that should be done is to question and get clarification on the specs.

How long must the pattern be; one character, three, something else?

More importantly; what is a repeated pattern? For example, in the string '2222', there are three repetitions of '2', two of '22' and one of '222' including overlaps. So do overlapping strings count or not?

This should be done before coding starts - remember?

Rob
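
Rob's overlap question can be made concrete with a small sketch. As far as I know, FIND ALL OCCURRENCES resumes the search after the end of each match and so does not report overlapping hits; that is an assumption worth verifying on your own system. Overlapping occurrences are counted here by testing every start offset by hand. The variable names are illustrative and the snippet is untested.

```abap
DATA: lv_text    TYPE string VALUE `2222`,
      lv_patt    TYPE string VALUE `22`,
      lv_plen    TYPE i,
      lv_last    TYPE i,
      lv_off     TYPE i,
      lv_nonover TYPE i,
      lv_over    TYPE i.

" Count as reported by FIND (assumed to be non-overlapping).
FIND ALL OCCURRENCES OF lv_patt IN lv_text MATCH COUNT lv_nonover.

" Overlapping count: compare the pattern at every possible start offset.
lv_plen = strlen( lv_patt ).
lv_last = strlen( lv_text ) - lv_plen.
lv_off  = 0.
WHILE lv_off <= lv_last.
  IF lv_text+lv_off(lv_plen) = lv_patt.
    lv_over = lv_over + 1.
  ENDIF.
  lv_off = lv_off + 1.
ENDWHILE.

WRITE: / 'FIND count:', lv_nonover,
       / 'Overlapping count:', lv_over.
```

For '22' in '2222' the manual loop finds matches at offsets 0, 1 and 2, which corresponds to Rob's "two repetitions" once the first occurrence is set aside. Whichever definition is wanted has to be settled before comparing the outputs of the different solutions in this thread.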