‎2009 Dec 16 2:23 PM
I have a string that contains a couple of HTML or XHTML tags, for example
lv_my_string = '<p style="something">Hello <strong>World</strong>!</p>'.
For a special use case, I want to remove all HTML from that string and process only the plain text
lv_my_new_string = 'Hello World!'.
Is there any method, function module, XSLT or anything else for that already?
‎2009 Dec 17 4:24 PM
You can use some Regular Expressions -:)
DATA: message TYPE string.
message = '<p style="something">Hello <strong>World</strong>!</p>'.
REPLACE ALL OCCURRENCES OF REGEX '<[a-zA-Z\/][^>]*>' IN message with space.
WRITE:/ message.
Greetings,
Blag.
‎2009 Dec 16 2:25 PM
Check
HRDSYS_CONVERT_FROM_HTML & SOTR_TAGS_REMOVE_FROM_STRING
‎2009 Dec 17 12:54 AM
Try using the following FMs:
1. SOTR_TAGS_REMOVE_FROM_STRING
2. SWA_STRING_REMOVE_SUBSTRING
‎2009 Dec 17 9:48 AM
Thx for the help.
1. SOTR_TAGS_REMOVE_FROM_STRING => nice but not perfect. It also removes single characters like "<" and ">" from the text. I would have to encode them beforehand, right?
2. HRDSYS_CONVERT_FROM_HTML => returns only an empty table in my test
3. SWA_STRING_REMOVE_SUBSTRING => what kind of delete pattern would I use?
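On the encoding question in point 1, here is a minimal sketch of one way it could work. It assumes the literal angle brackets arrive as the entities &lt; and &gt;, and it uses the REGEX replacement posted in this thread instead of the function module. The report name, the variable name and the sample text are made up for illustration.

```abap
REPORT z_strip_html_entities.

DATA lv_text TYPE string.

" Assumed input: literal angle brackets are already encoded as entities
lv_text = '<p>if a &lt; b then <strong>stop</strong></p>'.

" Remove everything that looks like an opening or closing tag
REPLACE ALL OCCURRENCES OF REGEX '<[a-zA-Z\/][^>]*>' IN lv_text WITH ``.

" Decode the entities afterwards; &amp; goes last so that a text such as
" '&amp;lt;' is not decoded twice
REPLACE ALL OCCURRENCES OF '&lt;'  IN lv_text WITH '<'.
REPLACE ALL OCCURRENCES OF '&gt;'  IN lv_text WITH '>'.
REPLACE ALL OCCURRENCES OF '&amp;' IN lv_text WITH '&'.

WRITE / lv_text.
```

Decoding only after the tags are gone means a decoded "<" can no longer be mistaken for the start of a tag. For the sample text this should leave: if a < b then stop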
‎2009 Dec 17 10:58 AM
Hi Daniel,
Hope this code solves your problem.
DATA : ipstr TYPE string,
opstr1 type string,
opstr2 type string,
opstr TYPE string,
len TYPE i VALUE 0,
ch TYPE char1,
num TYPE i VALUE 0, "No of Characters to be taken
pos TYPE char3, "Position of Char in the Input String
count(3) type n.
*Input string
ipstr = '<p style="something">Hello <strong>World</strong>!</p><br>I need the data.</br>How are you?'.
len = STRLEN( ipstr ).
DO len TIMES.
*Char by Char
ch = ipstr+pos(1).
pos = pos + 1.
*Scan each char in input String for ">"
FIND '>' IN ch IGNORING CASE.
IF sy-subrc = 0.
count = count + 1.
endif.
FIND '<' IN ch IGNORING CASE.
IF sy-subrc = 0.
count = count + 1.
endif.
enddo.
Edited by: Vasuki S Patki on Dec 17, 2009 4:56 PM
‎2009 Dec 17 11:04 AM
split ipstr at '>' INTO opstr opstr1.
DO count TIMES.
split opstr1 at '<' into opstr opstr1.
‎2009 Dec 17 11:32 AM
concatenate opstr2 opstr into opstr2.
split opstr1 at '>' into opstr opstr1.
ENDDO.
WRITE :/ opstr2.
Please combine all the code posted above; I hope this helps you.
I tested it with this code and works fine..
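Since the code above is spread over several posts, here is a compact sketch of the same SPLIT idea in one piece. It is not the exact code from the posts: it loops until the input is used up instead of counting the brackets first, and the report and variable names are hypothetical.

```abap
REPORT z_strip_html_split.

DATA: lv_rest   TYPE string,   " part of the input not processed yet
      lv_text   TYPE string,   " plain text in front of the next tag
      lv_tag    TYPE string,   " tag content, thrown away
      lv_result TYPE string.

lv_rest = '<p style="something">Hello <strong>World</strong>!</p>'.

WHILE lv_rest IS NOT INITIAL.
  " Everything in front of the next '<' is plain text
  SPLIT lv_rest AT '<' INTO lv_text lv_rest.
  CONCATENATE lv_result lv_text INTO lv_result.
  " Everything up to the next '>' belongs to the tag and is dropped
  SPLIT lv_rest AT '>' INTO lv_tag lv_rest.
ENDWHILE.

WRITE / lv_result.
```

Like the original, this treats every "<" as the start of a tag, so a literal "<" in the text would swallow everything up to the next ">".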
‎2009 Dec 18 6:19 PM
Hi Daniel,
I tried using the FM (SWA_STRING_REMOVE_SUBSTRING) but I guess it is expecting a particular pattern which is not so apparent in your case. I've written a small piece of code which you can try using in a FM or a PERFORM and that should do the trick. Please let me know if you have any questions.
PARAMETER: P_LINE(100).
TYPES: BEGIN OF TY_LINE,
LINE(100),
END OF TY_LINE.
DATA: T_LINE TYPE STANDARD TABLE OF TY_LINE,
WA_LINE LIKE LINE OF T_LINE.
DATA: W_LINE(100),
W_LEN(100),
W_COUNT TYPE I,
W_FLAG,
W_FLAG1,
W_I TYPE I.
W_COUNT = STRLEN( P_LINE ).
DO W_COUNT TIMES.
IF P_LINE+W_I(1) = '<'.
W_FLAG = 1.
W_I = W_I + 1.
IF NOT WA_LINE-LINE IS INITIAL.
APPEND WA_LINE-LINE TO T_LINE.
CLEAR WA_LINE.
ENDIF.
CONTINUE.
ELSEIF P_LINE+W_I(1) = '>'.
W_FLAG = 0.
W_I = W_I + 1.
CONTINUE.
ENDIF.
IF W_FLAG = 1.
W_I = W_I + 1.
CONTINUE.
ELSE.
CONCATENATE WA_LINE-LINE P_LINE+W_I(1) INTO WA_LINE-LINE.
W_I = W_I + 1.
ENDIF.
ENDDO.
LOOP AT T_LINE INTO WA_LINE.
CONCATENATE W_LINE WA_LINE-LINE INTO W_LINE SEPARATED BY SPACE.
ENDLOOP.
SHIFT W_LINE LEFT DELETING LEADING SPACE.
WRITE: W_LINE.
Input:
<p style="something">Hello <strong>World</strong>!</p>
Output:
HELLO WORLD !
Regards,
Pritam
‎2009 Dec 19 7:34 AM
Hi Daniel,
I realized I made a typo while copying the code. The DO ... ENDDO loop should look like this.
DO W_COUNT TIMES.
IF P_LINE+W_I(1) = '<'.
W_FLAG = 1.
W_I = W_I + 1.
CONTINUE.
ENDIF.
IF P_LINE+W_I(1) = '>'.
W_FLAG = 0.
W_I = W_I + 1.
CONTINUE.
ENDIF.
IF W_FLAG = 1.
W_I = W_I + 1.
CONTINUE.
ELSE.
CONCATENATE WA_LINE-LINE P_LINE+W_I(1) INTO WA_LINE-LINE.
W_I = W_I + 1.
ENDIF.
ENDDO.
Please try this and let me know if you have any questions.
‎2009 Dec 19 7:57 AM
Hi Daniel,
There was a formatting issue when posting this code because of the symbol {<}.
Basically, inside the DO loop the IF condition has to be written twice: the first time to check for {<}, and if it is satisfied w_flag is set to "1"; the second time to check for ">", and if it is satisfied w_flag is set to "0".
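To make the two checks easier to follow, here is a short sketch of the flag logic written as one CASE inside a WHILE loop. It is a simplified variant, not the code from the posts above: it works on a string instead of a PARAMETER, skips the internal table, and all names are hypothetical.

```abap
REPORT z_strip_html_flag.

DATA: lv_input  TYPE string,
      lv_result TYPE string,
      lv_len    TYPE i,
      lv_off    TYPE i,
      lv_in_tag TYPE c LENGTH 1.   " 'X' while the loop is inside a tag

lv_input = '<p style="something">Hello <strong>World</strong>!</p>'.
lv_len   = strlen( lv_input ).

WHILE lv_off < lv_len.
  CASE lv_input+lv_off(1).
    WHEN '<'.
      lv_in_tag = 'X'.             " a tag starts, stop copying
    WHEN '>'.
      lv_in_tag = space.           " the tag ends, copy again
    WHEN OTHERS.
      IF lv_in_tag IS INITIAL.
        " RESPECTING BLANKS keeps a single blank character from being lost
        CONCATENATE lv_result lv_input+lv_off(1)
               INTO lv_result RESPECTING BLANKS.
      ENDIF.
  ENDCASE.
  lv_off = lv_off + 1.
ENDWHILE.

WRITE / lv_result.
```

Because the blanks are kept while copying, no extra space ends up in front of the "!". A literal ">" in the text would still be dropped, so this only suits input where angle brackets appear in tags alone.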