MikeB
Contributor
Recently I needed to process HTML code and replace certain CSS declarations with their HTML tag equivalents. For instance, a font-weight: bold; property in the style attribute of a <span> tag must be replaced with the <strong> HTML tag. One way to solve this problem is to use regular expressions in ABAP. Below I explain my solution and walk through the ABAP regex code in detail.

First of all, we need to detect a <span style="…"> block that contains a font-weight property, and then surround the content of this block with the HTML <strong> tag.

  REPLACE ALL OCCURRENCES OF REGEX '(font-weight:[^>]*>)([^♦]*)(♦)(</span>)'
    IN html_string WITH '$1<strong>$2</strong>$3$4' IGNORING CASE.


You may wonder about the „♦“ symbol; I'll come back to it at the end of this post.
 

Some comments:

  • Parentheses „(…)“ define a group (a block) that can be reused, moved, or dropped in the replacement text.

  • The expression „[^>]*“ matches everything up to the next „>“ character; „[^♦]*“ works the same way for „♦“.

  • A „$“ followed by a number (for example $1) inserts the corresponding group at that position in the replacement.
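
To make the groups concrete, here is a minimal sketch that runs the statement above on a sample string. The report name zdemo_regex_groups and the sample HTML are assumptions for illustration, and the „♦“ anchor is typed into the sample by hand because it is only explained at the end of the post.

```abap
REPORT zdemo_regex_groups.  " hypothetical report name

DATA html_string TYPE string.

" Sample input (assumption): the anchor '♦' already sits before </span>
html_string = `<span style="font-weight: bold;">Hello♦</span>`.

" Group 1: font-weight: bold;">   (up to and including the first '>')
" Group 2: Hello                  (everything before the anchor)
" Group 3: ♦                      (the anchor itself)
" Group 4: </span>
REPLACE ALL OCCURRENCES OF REGEX '(font-weight:[^>]*>)([^♦]*)(♦)(</span>)'
  IN html_string WITH '$1<strong>$2</strong>$3$4' IGNORING CASE.

WRITE / html_string.
" Expected: <span style="font-weight: bold;"><strong>Hello</strong>♦</span>
```

Only group 2 ends up between the new tags; groups 1, 3 and 4 are written back unchanged.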


Now that we have found the relevant <span> block and wrapped its content in the desired tag, we can remove the font-weight property from the <span style="…"> block.

  REPLACE ALL OCCURRENCES OF REGEX '(font-weight:[^;]*;)'
    IN html_string WITH '' IGNORING CASE.
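
As a quick check, the sketch below applies this statement to a sample style attribute. It also shows one possible variant for a case the pattern does not cover: when font-weight is the last declaration and has no trailing semicolon, „[^;]*;“ either finds no match or keeps reading up to the next semicolon further on in the document. The variant pattern, the report name and the sample strings are assumptions for illustration, not part of the original solution, and the variant only handles double-quoted style attributes.

```abap
REPORT zdemo_remove_font_weight.  " hypothetical report name

DATA html_string TYPE string.

" Case 1: the declaration ends with a semicolon (pattern from the post)
html_string = `<span style="color: red; font-weight: bold;">text</span>`.
REPLACE ALL OCCURRENCES OF REGEX '(font-weight:[^;]*;)'
  IN html_string WITH '' IGNORING CASE.
WRITE / html_string.
" Expected: <span style="color: red; ">text</span>

" Case 2 (assumed variant): stop at ';' or at the closing double quote
" and treat the semicolon as optional
html_string = `<span style="font-weight: bold">text</span>`.
REPLACE ALL OCCURRENCES OF REGEX 'font-weight:[^;"]*;?'
  IN html_string WITH '' IGNORING CASE.
WRITE / html_string.
" Expected: <span style="">text</span>
```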


That's all. We have just replaced the font-weight property in the <span> block with the <strong> HTML tag.

Now it is time to explain the „♦“ symbol. It is a workaround for the case of nested HTML tags inside the span block, e.g. <span style="…">…<em>…</em>…</span>.

To detect the end of the span block's content rather than the end of a nested tag, I insert an anchor, the „♦“ symbol, before </span> and use this anchor in my regex. Any character works as the anchor, as long as it does not otherwise occur in the HTML.

At the end, I have to remove this anchor with the following regex:

  REPLACE ALL OCCURRENCES OF REGEX '♦'
    IN html_string WITH '' IGNORING CASE.
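
To see why the anchor is needed, the sketch below first tries a pattern without it. That anchor-free pattern is a hypothetical alternative, not something from the original solution: it uses „[^<]*“ for the content, which stops at the nested <em> tag, so </span> never follows and nothing matches. The report name and the sample HTML are also assumptions.

```abap
REPORT zdemo_anchor_needed.  " hypothetical report name

DATA html_string TYPE string.

" Sample input (assumption): a bold span with a nested <em> tag
html_string = `<span style="font-weight: bold;">a <em>b</em> c</span>`.

" Hypothetical anchor-free pattern: [^<]* ends at '<em>', not at '</span>'
FIND REGEX '(font-weight:[^>]*>)([^<]*)(</span>)'
  IN html_string IGNORING CASE.
WRITE / sy-subrc.   " 4: no match

" Insert the anchor before every </span>, then search with the anchored pattern
REPLACE ALL OCCURRENCES OF REGEX '</span>'
  IN html_string WITH '♦</span>' IGNORING CASE.
FIND REGEX '(font-weight:[^>]*>)([^♦]*)(♦)(</span>)'
  IN html_string IGNORING CASE.
WRITE / sy-subrc.   " 0: match found
```

FIND sets sy-subrc to 0 when the pattern is found and to 4 when it is not, which makes it a convenient way to test a pattern before using it in REPLACE.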


 

Final code:

  " Workaround for nested tags: put the special character '♦' before
  " every </span> so that we know the real end of the content that
  " we want to surround with the basic HTML tag
  REPLACE ALL OCCURRENCES OF REGEX '</span>'
    IN html_string WITH '♦</span>' IGNORING CASE.

  " surround bold (FONT-WEIGHT: bold) text with HTML's STRONG tag
  REPLACE ALL OCCURRENCES OF REGEX '(font-weight:[^>]*>)([^♦]*)(♦)(</span>)'
    IN html_string WITH '$1<strong>$2</strong>$3$4' IGNORING CASE.

  " remove the no longer needed CSS font-weight property
  REPLACE ALL OCCURRENCES OF REGEX '(font-weight:[^;]*;)'
    IN html_string WITH '' IGNORING CASE.

  " delete the workaround anchor for the nested tags case
  REPLACE ALL OCCURRENCES OF REGEX '♦'
    IN html_string WITH '' IGNORING CASE.
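
For reuse, the four statements can be wrapped in a method. The sketch below is one way to do that; the class lcl_html, the method bold_to_strong, the report name and the sample input are hypothetical names chosen for illustration.

```abap
REPORT zdemo_bold_to_strong.  " hypothetical report name

DATA html_string TYPE string.

CLASS lcl_html DEFINITION.
  PUBLIC SECTION.
    " Replaces font-weight styling in <span> blocks with <strong> tags
    CLASS-METHODS bold_to_strong
      CHANGING html TYPE string.
ENDCLASS.

CLASS lcl_html IMPLEMENTATION.
  METHOD bold_to_strong.
    " 1. mark the end of every span block with the anchor
    REPLACE ALL OCCURRENCES OF REGEX '</span>'
      IN html WITH '♦</span>' IGNORING CASE.
    " 2. wrap the span content in <strong>
    REPLACE ALL OCCURRENCES OF REGEX '(font-weight:[^>]*>)([^♦]*)(♦)(</span>)'
      IN html WITH '$1<strong>$2</strong>$3$4' IGNORING CASE.
    " 3. drop the font-weight declaration
    REPLACE ALL OCCURRENCES OF REGEX '(font-weight:[^;]*;)'
      IN html WITH '' IGNORING CASE.
    " 4. remove the anchor again
    REPLACE ALL OCCURRENCES OF REGEX '♦'
      IN html WITH '' IGNORING CASE.
  ENDMETHOD.
ENDCLASS.

START-OF-SELECTION.
  " Sample input (assumption): bold span with a nested <em> tag
  html_string = `<span style="font-weight: bold;">a <em>b</em> c</span>`.
  lcl_html=>bold_to_strong( CHANGING html = html_string ).
  WRITE / html_string.
  " Expected: <span style=""><strong>a <em>b</em> c</strong></span>
```

Two limits of the approach are worth keeping in mind. Every </span> receives an anchor, so a <span> nested inside the bold span would close the <strong> element too early. The approach also assumes that „♦“ never occurs in the input, because the last step deletes every occurrence of it.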


 



P.S. If you know a better way to solve this problem, feel free to share your experience!