Recently I needed to process HTML code and replace certain CSS declarations with their HTML tag equivalents. For instance, a font-weight: bold; declaration in the style attribute of a <span> tag must be replaced with the <strong> HTML tag. One way to solve this problem is to use regular expressions in ABAP. Below I explain my solution, with the ABAP regex code in detail.
First of all, we need to find a <span style="…"> block that contains a font-weight property, and then wrap the content of this block in an HTML <strong> tag.
- REPLACE ALL OCCURRENCES OF REGEX '(font-weight:[^>]*>)([^♦]*)(♦)(</span>)'
- IN html_string WITH '$1<strong>$2</strong>$3$4' IGNORING CASE.
You may wonder about the „♦“ symbol; I will explain it at the end of this post.
Some comments:
- Parentheses „(…)“ define a group whose matched text can be reused, moved, or dropped in the replacement.
- The expression „[^>]*“ matches everything up to the next „>“ character; „[^♦]*“ works the same way for „♦“.
- A „$“ character followed by a number refers to the group with that number, so we can place each group exactly where we need it in the result.
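To make the group numbering concrete, here is a minimal sketch that applies the statement above to a single span. The report name and the sample input are assumptions made for illustration, and the „♦“ anchor is already placed in the input by hand. Note that the pattern matches any font-weight value, not only bold.

```abap
REPORT z_regex_groups_demo.

" Hypothetical sample input; the '♦' anchor is already in place
DATA(html_string) = `<span style="font-weight: bold;">Hello♦</span>`.

" Groups for this input:
"   $1 = font-weight: bold;">
"   $2 = Hello
"   $3 = ♦
"   $4 = </span>
REPLACE ALL OCCURRENCES OF REGEX '(font-weight:[^>]*>)([^♦]*)(♦)(</span>)'
  IN html_string WITH '$1<strong>$2</strong>$3$4' IGNORING CASE.

" html_string is now:
" <span style="font-weight: bold;"><strong>Hello</strong>♦</span>
WRITE / html_string.
```

The inline declaration DATA(…) assumes ABAP 7.40 or later; on older releases a classic DATA html_string TYPE string declaration followed by an assignment does the same.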
Now that we have found the relevant <span> block and wrapped its content in the desired tag, we can remove the font-weight property from the <span style="…"> block.
- REPLACE ALL OCCURRENCES OF REGEX '(font-weight:[^;]*;)'
- IN html_string WITH '' IGNORING CASE.
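One thing to keep in mind: the pattern above requires a „;“ after the value. If font-weight is the last declaration and has no trailing semicolon, „[^;]*“ keeps going until the next „;“ anywhere in the string. The sketch below shows a possible variant for that case. It is my assumption, not part of the original solution, and it presumes that the style attribute is enclosed in double quotes; the report name and sample input are hypothetical.

```abap
REPORT z_font_weight_remove_demo.

" Hypothetical sample input: the declaration has no trailing semicolon
DATA(html_string) = `<span style="font-weight: bold">Hi</span>`.

" Variant pattern (an assumption, not from the original post):
" stop at ';' or at the closing double quote, and make the ';' optional
REPLACE ALL OCCURRENCES OF REGEX 'font-weight:[^;"]*;?'
  IN html_string WITH '' IGNORING CASE.

" html_string is now: <span style="">Hi</span>
WRITE / html_string.
```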
That's all. We have just replaced the font-weight property of a <span> block with the <strong> HTML tag.
Now it is time to explain the „♦“ symbol. It is a workaround for nested HTML tags inside the span block, e.g. <span style="…">…<em>…</em>…</span>. To detect the end of the span block content, and not the end of a nested tag, I insert an anchor, the „♦“ symbol, before each </span> and use this anchor in my regex. This assumes that „♦“ does not otherwise occur in the HTML.
At the end I have to remove this anchor with the following regex:
- REPLACE ALL OCCURRENCES OF REGEX '♦'
- IN html_string WITH '' IGNORING CASE.
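To see why the anchor is needed, consider a naive pattern without it. The sketch below is only for comparison; the pattern, the report name, and the sample input are hypothetical. Here „[^<]*“ stops at the first „<“, which belongs to the nested <em> tag, so „</span>“ does not follow and the statement replaces nothing.

```abap
REPORT z_regex_no_anchor_demo.

" Hypothetical sample input with a nested tag inside the span
DATA(html_string) = `<span style="font-weight: bold;">a <em>b</em> c</span>`.

" Naive pattern without an anchor (for comparison only):
" group 2 cannot cross the '<' of the nested <em> tag
REPLACE ALL OCCURRENCES OF REGEX '(font-weight:[^>]*>)([^<]*)(</span>)'
  IN html_string WITH '$1<strong>$2</strong>$3' IGNORING CASE.

" REPLACE sets sy-subrc to a non-zero value when nothing was found
IF sy-subrc <> 0.
  WRITE / 'No match: the nested <em> tag blocks the naive pattern.'.
ENDIF.
```

With the anchor in place, „[^♦]*“ can run across nested tags, because they contain „<“ and „>“ but no „♦“.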
Final code:
- " workaround for nested tags:
- " a special char '♦' is inserted before every </span> so that,
- " when there are nested HTML tags, we know the real end
- " of the content that we want to surround
- " with the basic HTML tag
- REPLACE ALL OCCURRENCES OF REGEX '</span>'
- IN html_string WITH '♦</span>' IGNORING CASE.
- " surround bold (FONT-WEIGHT: bold) text with HTML's STRONG tag
- REPLACE ALL OCCURRENCES OF REGEX '(font-weight:[^>]*>)([^♦]*)(♦)(</span>)'
- IN html_string WITH '$1<strong>$2</strong>$3$4' IGNORING CASE.
- " remove unneeded CSS-style font-weight property
- REPLACE ALL OCCURRENCES OF REGEX '(font-weight:[^;]*;)'
- IN html_string WITH '' IGNORING CASE.
- " delete workaround for nested tags case
- REPLACE ALL OCCURRENCES OF REGEX '♦'
- IN html_string WITH '' IGNORING CASE.
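As a usage example, the four statements can be wrapped in a small helper and run on a sample string. This is a sketch under assumptions: the report name, the class lcl_html, the method bold_span_to_strong, and the sample input are hypothetical names chosen for illustration.

```abap
REPORT z_span_to_strong_demo.

" Hypothetical helper class that wraps the four steps of the final code
CLASS lcl_html DEFINITION.
  PUBLIC SECTION.
    CLASS-METHODS bold_span_to_strong
      CHANGING html_string TYPE string.
ENDCLASS.

CLASS lcl_html IMPLEMENTATION.
  METHOD bold_span_to_strong.
    " 1. mark the end of every span block with the anchor
    REPLACE ALL OCCURRENCES OF REGEX '</span>'
      IN html_string WITH '♦</span>' IGNORING CASE.
    " 2. wrap the span content in <strong>
    REPLACE ALL OCCURRENCES OF REGEX '(font-weight:[^>]*>)([^♦]*)(♦)(</span>)'
      IN html_string WITH '$1<strong>$2</strong>$3$4' IGNORING CASE.
    " 3. remove the font-weight declaration
    REPLACE ALL OCCURRENCES OF REGEX '(font-weight:[^;]*;)'
      IN html_string WITH '' IGNORING CASE.
    " 4. remove the anchor again
    REPLACE ALL OCCURRENCES OF REGEX '♦'
      IN html_string WITH '' IGNORING CASE.
  ENDMETHOD.
ENDCLASS.

START-OF-SELECTION.
  " Hypothetical sample input with a nested <em> tag
  DATA(html) = `<span style="font-weight: bold; color: red;">Hello <em>big</em> world</span>`.

  lcl_html=>bold_span_to_strong( CHANGING html_string = html ).

  " html is now:
  " <span style=" color: red;"><strong>Hello <em>big</em> world</strong></span>
  WRITE / html.
```

The space left behind in style=" color: red;" comes from the original declaration list; the removal pattern only deletes the text from „font-weight:“ up to and including the „;“.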
P.S. If you know a better way to solve this problem, feel free to share your experience!