Application Development and Automation Discussions
Join the discussions or start your own on all things application development, including tools and APIs, programming models, and keeping your skills sharp.

Remove HTML tags from a string

daniel_humberg
Advisor
0 Likes
7,068

I have a string that contains a couple of HTML or XHTML tags, for example

 lv_my_string = '<p style="something">Hello <strong>World</strong>!</p>'.

For a special use case, I want to remove all HTML from that string and process only the plain text:

 lv_my_new_string = 'Hello World!'.

Is there any method, function module, XSLT or anything else for that already?

1 ACCEPTED SOLUTION
Read only

Former Member
3,527

You can use some Regular Expressions -:)


DATA: message TYPE string.

message = '<p style="something">Hello <strong>World</strong>!</p>'.

* Replace every tag: '<' followed by a letter or '/', up to the next '>'
REPLACE ALL OCCURRENCES OF REGEX '<[a-zA-Z\/][^>]*>' IN message WITH space.
WRITE: / message.

Greetings,

Blag.
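
The pattern above only treats "<" as the start of a tag when it is directly followed by a letter or "/", so a stray "<" in running text (for example in "1 < 2") should be left alone. A minimal sketch of the same idea with the built-in replace( ) string function, assuming a release that provides it (7.02 or later, where the regex parameter is still supported); the report and variable names are hypothetical:

```abap
REPORT z_strip_html_sketch.  "hypothetical report name

DATA: lv_html  TYPE string,
      lv_plain TYPE string.

lv_html = '<p style="something">Hello <strong>World</strong>! 1 < 2</p>'.

* occ = 0 replaces every match; the empty string literal removes the tag
lv_plain = replace( val   = lv_html
                    regex = '<[a-zA-Z\/][^>]*>'
                    with  = ``
                    occ   = 0 ).

WRITE: / lv_plain.
```

The functional form returns the result instead of changing the input, which may be convenient when the original HTML is still needed afterwards.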

11 REPLIES
Read only

kesavadas_thekkillath
Active Contributor
0 Likes
3,527

Check the function modules HRDSYS_CONVERT_FROM_HTML and SOTR_TAGS_REMOVE_FROM_STRING.

Read only

Former Member
0 Likes
3,527

Hi,

Check

Read only

Former Member
0 Likes
3,527

Try using the following FMs:

1. SOTR_TAGS_REMOVE_FROM_STRING

2. SWA_STRING_REMOVE_SUBSTRING

Read only

0 Likes
3,527

Thanks for the help.

1. SOTR_TAGS_REMOVE_FROM_STRING => nice, but not perfect. It also removes single "<" and ">" characters from the text. I would have to encode them beforehand, right?

2. HRDSYS_CONVERT_FROM_HTML => returns only an empty table in my test

3. SWA_STRING_REMOVE_SUBSTRING => what kind of delete pattern would I use?

Read only

0 Likes
3,527

Hi Daniel,

Hope this code solves your problem.


DATA: ipstr  TYPE string,
      opstr  TYPE string,   "Piece split off in the current step
      opstr1 TYPE string,   "Rest of the string still to be processed
      opstr2 TYPE string,   "Collected plain text
      len    TYPE i,
      ch     TYPE c LENGTH 1,
      pos    TYPE i,        "Offset of the current character in the input string
      count  TYPE i.        "Number of '<' and '>' characters found

* Input string
ipstr = '<p style="something">Hello <strong>World</strong>!</p><br>I need the data.</br>How are you?'.

len = strlen( ipstr ).
DO len TIMES.
* Read the string character by character
  ch  = ipstr+pos(1).
  pos = pos + 1.
* Count every '<' and '>' to get an upper bound for the split loop
  IF ch = '<' OR ch = '>'.
    count = count + 1.
  ENDIF.
ENDDO.

 

Edited by: Vasuki S Patki on Dec 17, 2009 4:56 PM

SPLIT ipstr AT '>' INTO opstr opstr1.
DO count TIMES.
  SPLIT opstr1 AT '<' INTO opstr opstr1.
  CONCATENATE opstr2 opstr INTO opstr2.
  SPLIT opstr1 AT '>' INTO opstr opstr1.
ENDDO.
WRITE: / opstr2.

Please combine the code fragments posted above; I hope this helps you.

I tested the combined code and it works fine.

Read only

0 Likes
3,527

Hi Daniel,

I tried using the FM (SWA_STRING_REMOVE_SUBSTRING), but I guess it expects a particular pattern, which is not so apparent in your case. I've written a small piece of code that you can try in an FM or a PERFORM, and that should do the trick. Please let me know if you have any questions.


PARAMETERS: P_LINE(100).

TYPES: BEGIN OF TY_LINE,
         LINE(100),
       END OF TY_LINE.

DATA: T_LINE  TYPE STANDARD TABLE OF TY_LINE,
      WA_LINE LIKE LINE OF T_LINE.

DATA: W_LINE(100),
      W_COUNT TYPE I,
      W_FLAG,            "'1' while the scan is inside a tag
      W_I     TYPE I.    "Offset of the current character

W_COUNT = STRLEN( P_LINE ).

DO W_COUNT TIMES.
  IF P_LINE+W_I(1) = '<'.
*   A tag starts: store the text collected so far
    W_FLAG = '1'.
    IF NOT WA_LINE-LINE IS INITIAL.
      APPEND WA_LINE TO T_LINE.
      CLEAR WA_LINE.
    ENDIF.
  ELSEIF P_LINE+W_I(1) = '>'.
*   The tag ends
    W_FLAG = '0'.
  ELSEIF W_FLAG <> '1'.
*   Outside a tag: keep the character. CONCATENATE ignores trailing
*   blanks of fixed-length fields, so blanks in the text are dropped.
    CONCATENATE WA_LINE-LINE P_LINE+W_I(1) INTO WA_LINE-LINE.
  ENDIF.
  W_I = W_I + 1.
ENDDO.

* Store any text that follows the last tag
IF NOT WA_LINE-LINE IS INITIAL.
  APPEND WA_LINE TO T_LINE.
ENDIF.

LOOP AT T_LINE INTO WA_LINE.
  CONCATENATE W_LINE WA_LINE-LINE INTO W_LINE SEPARATED BY SPACE.
ENDLOOP.

SHIFT W_LINE LEFT DELETING LEADING SPACE.
WRITE: W_LINE.

Input:

<p style="something">Hello <strong>World</strong>!</p>

Output:

HELLO WORLD ! 

Regards,

Pritam

Read only

Former Member
0 Likes
3,527

Hi Daniel,

I realized I made a typo while copying the code. The DO ... ENDDO loop should look like this.


DO W_COUNT TIMES.
  IF P_LINE+W_I(1) = '<'.
    W_FLAG = 1.
    W_I = W_I + 1.
    CONTINUE.
  ENDIF.
 
  IF P_LINE+W_I(1) = '>'.
    W_FLAG = 0.
    W_I = W_I + 1.
    CONTINUE.
  ENDIF.

  IF W_FLAG = 1.
    W_I = W_I + 1.
    CONTINUE.
  ELSE.
    CONCATENATE WA_LINE-LINE P_LINE+W_I(1) INTO WA_LINE-LINE.
    W_I = W_I + 1.
  ENDIF.
 ENDDO.

Please try this and let me know if you have any questions.

Read only

0 Likes
3,527

Hi Daniel,

There was a formatting issue when I posted this code, caused by the "<" symbol.

Basically, inside the DO loop the IF condition has to be written twice: the first time to check for "<" and, if satisfied, set w_flag to "1"; the second time to check for ">" and, if satisfied, set w_flag to "0".
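
The flag logic described above can be sketched as a single pass over the string. This is only an illustrative sketch, not the exact code from the earlier posts: the report and variable names are hypothetical, it works on TYPE string so that blanks between words are kept, and it assumes that every "<" in the input opens a tag and every ">" closes one.

```abap
REPORT z_strip_tags_scan_sketch.  "hypothetical report name

DATA: lv_input  TYPE string,
      lv_output TYPE string,
      lv_char   TYPE c LENGTH 1,
      lv_len    TYPE i,
      lv_offset TYPE i,
      lv_in_tag TYPE c LENGTH 1.  "'X' while the scan is inside a tag

lv_input = '<p style="something">Hello <strong>World</strong>!</p>'.
lv_len   = strlen( lv_input ).

DO lv_len TIMES.
* sy-index counts from 1, offsets count from 0
  lv_offset = sy-index - 1.
  lv_char   = lv_input+lv_offset(1).

  IF lv_char = '<'.
    lv_in_tag = 'X'.
  ELSEIF lv_char = '>'.
    lv_in_tag = space.
  ELSEIF lv_in_tag IS INITIAL.
*   Outside a tag: keep the character, including blanks
    CONCATENATE lv_output lv_char INTO lv_output RESPECTING BLANKS.
  ENDIF.
ENDDO.

WRITE: / lv_output.
```

Unlike the regular expression approach, this scan also drops a lone "<" or ">" in the text together with whatever follows it up to the next ">", so such characters would need to be encoded beforehand.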