Select text between two tags

MikeB

Hi,

How can I extract the text between two tags?

For instance, we have the sentence: «I love <some_anchor>to visit old Europe cities</some_anchor> on holidays».

I want to extract the text enclosed by a given tag pair, e.g. <some_anchor>, and store it in an internal table.

Should I use regular expressions (regex), or is a simple pattern search enough?

And how can I do it?

Thanks.

1 ACCEPTED SOLUTION
MikeB

Hi, I solved my issue.

I published a detailed explanation in a separate post on my blog:

«Regular expressions in ABAP. Approach to HTML processing with regex»

http://scn.sap.com/community/abap/blog/2012/10/08/a-regular-expression-regex-approach-to-html-proces...

5 REPLIES 5
Former Member

It depends... are you trying to match just one specific tag pair, or are you trying to capture arbitrary XML/HTML markup? It's generally accepted that you can't parse HTML with regex.

Regex will work for a single known tag pair, but be careful with it. Try program DEMO_REGEX_TOY to experiment with your regular expressions. Consider the following:

DATA result TYPE match_result_tab.
DATA line   LIKE LINE OF result.
DATA sub    TYPE submatch_result.
DATA text   TYPE string.

text = '<tag>this</tag>asdfasdf</tag>asdf<tag>all</tag></tag>as<tag>matches</tag>'.

" Capture everything between <tag> and the nearest </tag>.
FIND ALL OCCURRENCES OF REGEX '<tag>((?:[^<]|<(?!/tag>))*)</tag>'
  IN text IGNORING CASE RESULTS result.

" Each match carries one submatch: the captured group.
LOOP AT result INTO line.
  LOOP AT line-submatches INTO sub.
    WRITE: / text+sub-offset(sub-length).
  ENDLOOP.
ENDLOOP.

The result of this is:

this
all
matches

The pattern '<tag>((?:[^<]|<(?!/tag>))*)</tag>' looks slightly confusing because regex is greedy: a plain .* does not stop matching until it reaches the last closing tag in the text. The group (?:[^<]|<(?!/tag>))* instead consumes either a character other than '<', or a '<' that does not start '</tag>', so each match ends at the first closing tag. In my example, the simpler regex <tag>(.*)</tag> would match the whole string:

'<tag>this</tag>asdfasdf</tag>asdf<tag>all</tag></tag>as<tag>matches</tag>'
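
Since the original question asks for the text to be stored in an internal table, here is a minimal sketch of that step, applied to the sentence from the question. The report name and the variable names (fragments, fragment, matches) are made up for illustration, and the pattern is the same tempered one as above with the tag name swapped in.

```abap
REPORT z_demo_tag_text.

DATA text      TYPE string.
DATA matches   TYPE match_result_tab.
DATA match     LIKE LINE OF matches.
DATA submatch  TYPE submatch_result.
" Illustrative result table: one row per text fragment found.
DATA fragments TYPE STANDARD TABLE OF string WITH DEFAULT KEY.
DATA fragment  TYPE string.

text = 'I love <some_anchor>to visit old Europe cities</some_anchor> on holidays'.

" Match each <some_anchor>...</some_anchor> pair, stopping at the first closing tag.
FIND ALL OCCURRENCES OF REGEX '<some_anchor>((?:[^<]|<(?!/some_anchor>))*)</some_anchor>'
  IN text IGNORING CASE RESULTS matches.

LOOP AT matches INTO match.
  " The first (and only) submatch is the captured text between the tags.
  READ TABLE match-submatches INTO submatch INDEX 1.
  IF sy-subrc = 0 AND submatch-length > 0.   " skip empty captures
    fragment = text+submatch-offset(submatch-length).
    APPEND fragment TO fragments.
  ENDIF.
ENDLOOP.

" Show what was collected.
LOOP AT fragments INTO fragment.
  WRITE: / fragment.
ENDLOOP.
```

This only covers a flat, non-nested tag pair without attributes; anything beyond that runs into the limits discussed in this thread.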

Read only

0 Likes
1,534

Hi, Jorg

I need to handle every pair of tags separately. For example, given <SPAN STYLE="…">…<SPAN STYLE="…">…</SPAN></SPAN>, both SPAN tags have to be processed.

Also, I noticed a problem in the text you proposed as an example:

text  = '<tag>this</tag>asdfasdf</tag>asdf<tag>all</tag></tag>as<tag>matches</tag>'.

The tag hierarchy is broken: the third tag «</tag>» is a closing tag, but it has no matching opening tag, so no correct hierarchy can be built.

Read only

0 Likes
1,534

My example only matches "<tag>", and not "<tag ..something else..>". You'll need to come up with a bit more elaborate opening tag regex.

Yes, nested tags are an issue. Parsing HTML with regex is not a good idea.
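
As a rough sketch of what a "more elaborate opening tag regex" could look like, the pattern below allows optional attributes after the tag name by adding (?:\s[^>]*)? before the closing '>'. The sample text, report name and variable names are assumptions for illustration, and the sketch assumes attribute values contain no '>' character.

```abap
REPORT z_demo_tag_attributes.

DATA text     TYPE string.
DATA matches  TYPE match_result_tab.
DATA match    LIKE LINE OF matches.
DATA submatch TYPE submatch_result.

" Illustrative input: one SPAN with an attribute, one without.
text = '<SPAN STYLE="color:red">first</SPAN> and <span>second</span>'.

" <span            literal tag name (case is ignored below)
" (?:\s[^>]*)?     optional whitespace plus attributes, up to the next '>'
" >                end of the opening tag
" ((?:...)*)       captured text, stopping at the first </span>
FIND ALL OCCURRENCES OF REGEX '<span(?:\s[^>]*)?>((?:[^<]|<(?!/span>))*)</span>'
  IN text IGNORING CASE RESULTS matches.

LOOP AT matches INTO match.
  READ TABLE match-submatches INTO submatch INDEX 1.
  IF sy-subrc = 0 AND submatch-length > 0.
    WRITE: / text+submatch-offset(submatch-length).
  ENDIF.
ENDLOOP.
```

Nested SPAN tags are still not handled: for nested input the capture would run from the outer opening tag to the first closing tag and include the inner opening tag as plain text.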

Read only

0 Likes
1,534

I just thought of something... perhaps the XML parsing classes can help you out. XML is all tags as well.
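
A minimal sketch of that idea using the iXML library, assuming the input is well-formed XML (real-world HTML often is not, in which case the parse step fails). The wrapping <root> element, the report name and the variable names are added here for illustration only.

```abap
REPORT z_demo_ixml_tag_text.

DATA xml            TYPE string.
DATA ixml           TYPE REF TO if_ixml.
DATA stream_factory TYPE REF TO if_ixml_stream_factory.
DATA istream        TYPE REF TO if_ixml_istream.
DATA document       TYPE REF TO if_ixml_document.
DATA parser         TYPE REF TO if_ixml_parser.
DATA elements       TYPE REF TO if_ixml_node_collection.
DATA iterator       TYPE REF TO if_ixml_node_iterator.
DATA node           TYPE REF TO if_ixml_node.
DATA fragments      TYPE STANDARD TABLE OF string WITH DEFAULT KEY.
DATA fragment       TYPE string.

" A single root element is required for well-formed XML (assumption: added for the demo).
xml = '<root>I love <some_anchor>to visit old Europe cities</some_anchor> on holidays</root>'.

" Parse the string into a DOM document.
ixml           = cl_ixml=>create( ).
stream_factory = ixml->create_stream_factory( ).
istream        = stream_factory->create_istream_string( string = xml ).
document       = ixml->create_document( ).
parser         = ixml->create_parser( stream_factory = stream_factory
                                      istream        = istream
                                      document       = document ).

IF parser->parse( ) <> 0.
  WRITE: / 'Input is not well-formed XML'.
  RETURN.
ENDIF.

" Collect the text content of every <some_anchor> element, nested or not.
elements = document->get_elements_by_tag_name( name = 'some_anchor' ).
iterator = elements->create_iterator( ).
node     = iterator->get_next( ).
WHILE node IS BOUND.
  fragment = node->get_value( ).
  APPEND fragment TO fragments.
  node = iterator->get_next( ).
ENDWHILE.

LOOP AT fragments INTO fragment.
  WRITE: / fragment.
ENDLOOP.
```

Unlike the regex approach, the parser tracks the element hierarchy itself, so each nested element is returned as its own node.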
