2012 Sep 11 11:30 PM
Hi,
How can I select the text between two anchors?
For instance, take the sentence: «I love <some_anchor>to visit old Europe cities</some_anchor> on holidays».
I want to select the text located between a pair of tags, e.g. <some_anchor>, and store it in an internal table.
Should I use regex (regular expressions), or is a plain pattern enough?
And how can I do it?
Thanks.
2012 Oct 08 4:38 AM
Hi, I solved my issue.
A detailed explanation is published in a separate post on my blog:
«Regular expressions in ABAP. Approach to HTML processing with regex»
2012 Sep 12 12:54 AM
It depends... are you trying to match just one specific tag pair, or are you trying to capture arbitrary XML/HTML-type text? It's generally accepted that you can't parse HTML with regex.
Regex will work, but be careful with it... try program DEMO_REGEX_TOY to experiment with your regular expressions. Consider the following:
* Table of matches; each match carries the offset/length of its submatches.
DATA result TYPE match_result_tab.
DATA line LIKE LINE OF result.
DATA sub TYPE submatch_result.
DATA text TYPE string.
text = '<tag>this</tag>asdfasdf</tag>asdf<tag>all</tag></tag>as<tag>matches</tag>'.
* Capture everything between <tag> and the nearest </tag>: any character
* other than '<', or a '<' that does not start '</tag>'.
FIND ALL OCCURRENCES OF REGEX '<tag>((?:[^<]|<(?!/tag>))*)</tag>' IN text IGNORING CASE RESULTS result.
LOOP AT result INTO line.
  LOOP AT line-submatches INTO sub.
*   Print the captured text using the submatch offset and length.
    WRITE: / text+sub-offset(sub-length).
  ENDLOOP.
ENDLOOP.
The result of this is:
this
all
matches
The regex '<tag>((?:[^<]|<(?!/tag>))*)</tag>' looks slightly confusing because regex is greedy: a plain .* will not stop matching until it reaches the last closing tag in the text. In my example, regex <tag>(.*)</tag> would match the whole string:
'<tag>this</tag>asdfasdf</tag>asdf<tag>all</tag></tag>as<tag>matches</tag>'
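Since the original question asks for the text to end up in an internal table, here is a minimal sketch of the same idea applied to the sentence from the question. The names found, content and match are only illustrative, and the pattern assumes the opening tag is written exactly as <some_anchor> with no attributes.

```abap
DATA text    TYPE string.
DATA matches TYPE match_result_tab.
DATA match   TYPE match_result.
DATA sub     TYPE submatch_result.
DATA content TYPE string.
* Hypothetical target table for the extracted texts.
DATA found   TYPE STANDARD TABLE OF string WITH DEFAULT KEY.

text = 'I love <some_anchor>to visit old Europe cities</some_anchor> on holidays'.

* Group 1 captures everything up to the nearest closing tag.
FIND ALL OCCURRENCES OF REGEX '<some_anchor>((?:[^<]|<(?!/some_anchor>))*)</some_anchor>'
  IN text IGNORING CASE RESULTS matches.

LOOP AT matches INTO match.
* The pattern has one capture group, so only submatch 1 is read.
  READ TABLE match-submatches INTO sub INDEX 1.
* Skip empty tag bodies to avoid a zero-length subfield access.
  IF sy-subrc = 0 AND sub-length > 0.
    content = text+sub-offset(sub-length).
    APPEND content TO found.
  ENDIF.
ENDLOOP.
```

After the loop, found should hold one row per tag pair; for the sentence above that is a single row containing "to visit old Europe cities".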
2012 Sep 12 7:30 AM
Hi, Jorg
I need to handle every pair of tags separately. I mean, if we have <SPAN STYLE="…">…<SPAN STYLE="…">…</SPAN></SPAN>, we need to handle both SPAN tags.
Also, I noticed that in the text you proposed as an example:
text = '<tag>this</tag>asdfasdf</tag>asdf<tag>all</tag></tag>as<tag>matches</tag>'.
there is an issue with the tag hierarchy: the third tag «</tag>» is a closing tag, but it has no matching opening tag, so the tags do not form a correct hierarchy.
2012 Sep 12 8:44 AM
My example only matches "<tag>", and not "<tag ..something else..>". You'll need to come up with a slightly more elaborate regex for the opening tag.
Yeah, nested tags are an issue. Parsing HTML with regex is not a good idea.
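As a rough sketch of what a more elaborate opening tag could look like, the pattern below lets the opening tag carry optional attributes. The sample text is made up for illustration, and the pattern assumes that attribute values never contain a '>' character.

```abap
DATA text    TYPE string.
DATA matches TYPE match_result_tab.
DATA match   TYPE match_result.
DATA sub     TYPE submatch_result.

* Made-up sample: one tag with an attribute, one without.
text = '<SPAN STYLE="color:red">outer</SPAN> and <span>plain</span>'.

* (?:\s[^>]*)? allows optional attributes after the tag name;
* the capture group then runs up to the nearest closing tag.
FIND ALL OCCURRENCES OF REGEX '<span(?:\s[^>]*)?>((?:[^<]|<(?!/span>))*)</span>'
  IN text IGNORING CASE RESULTS matches.

LOOP AT matches INTO match.
  LOOP AT match-submatches INTO sub.
    WRITE: / text+sub-offset(sub-length).
  ENDLOOP.
ENDLOOP.
```

With this sample text the loop should write "outer" and "plain". It does not handle nesting: for <span>a<span>b</span></span> the match runs from the first opening tag to the first closing tag, so the captured text is "a<span>b".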
2012 Sep 12 8:47 AM
I just thought of something... perhaps the XML parsing classes can help you out. XML is all tags as well.
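A rough sketch of that idea, assuming "the XML parsing classes" means the iXML library (CL_IXML and the IF_IXML_* interfaces) and that the input is well-formed XML, which real-world HTML often is not. The sample document and the variable names are only illustrative.

```abap
DATA xml       TYPE string.
DATA ixml      TYPE REF TO if_ixml.
DATA factory   TYPE REF TO if_ixml_stream_factory.
DATA istream   TYPE REF TO if_ixml_istream.
DATA document  TYPE REF TO if_ixml_document.
DATA parser    TYPE REF TO if_ixml_parser.
DATA nodes     TYPE REF TO if_ixml_node_collection.
DATA iterator  TYPE REF TO if_ixml_node_iterator.
DATA node      TYPE REF TO if_ixml_node.
DATA node_text TYPE string.

* Made-up, well-formed sample with nested span elements.
xml = '<p><span style="a">outer <span style="b">inner</span></span></p>'.

ixml     = cl_ixml=>create( ).
factory  = ixml->create_stream_factory( ).
istream  = factory->create_istream_string( string = xml ).
document = ixml->create_document( ).
parser   = ixml->create_parser( stream_factory = factory
                                istream        = istream
                                document       = document ).

* parse( ) returns 0 when the document was parsed without errors.
IF parser->parse( ) = 0.
* Element names are case-sensitive in XML.
  nodes    = document->get_elements_by_tag_name( name = 'span' ).
  iterator = nodes->create_iterator( ).
  node     = iterator->get_next( ).
  WHILE node IS BOUND.
*   get_value( ) returns the text content of the element.
    node_text = node->get_value( ).
    WRITE: / node_text.
    node = iterator->get_next( ).
  ENDWHILE.
ENDIF.
```

Because the parser builds a tree, the outer and the inner span are visited as separate elements, which is exactly the nested case the regex approach struggles with.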