Application Development and Automation Discussions

Parse HTML to Internal Table

former_member302911
Active Participant

Hi all,

I'm having trouble parsing the following generated XLS file, which is actually HTML:

<html xmlns:v="urn:schemas-microsoft-com:vml"
xmlns:o="urn:schemas-microsoft-com:office:office"
xmlns:x="urn:schemas-microsoft-com:office:excel"
xmlns="http://www.w3.org/TR/REC-html40"><meta content="text/html; charset=UTF-8" http-equiv="Content-Type">
<meta name="ProgId" content="Excel.Sheet"/>
<meta name="Generator" content="Microsoft Excel 10"/>
<!--[if !mso]>
<style>
v\\:* {behavior:url(#default#VML);}");
o\\:* {behavior:url(#default#VML);}");
x\\:* {behavior:url(#default#VML);}");
.shape {behavior:url(#default#VML);}");
</style>");
<![endif]-->
<!--[if gte mso 9]><xml>
<x:ExcelWorkbook>
<x:ExcelWorksheets>
<x:ExcelWorksheet>
<x:Name>report</w:Name>
<x:WorksheetOptions>
<x:ProtectContents>False</w:ProtectContents>
<x:ProtectObjects>False</w:ProtectObjects>
<x:ProtectScenarios>False</w:ProtectScenarios>
</w:WorksheetOptions>
</w:ExcelWorksheet>
</w:ExcelWorksheets>
<x:ProtectStructure>False</w:ProtectStructure>
<x:ProtectWindows>False</w:ProtectWindows>
</w:ExcelWorkbook>");
</xml><![endif]-->
<head>
<style>
br {mso-data-placement:same-cell;}
</style>
</head>
<body>
<style>
table {
mso-displayed-decimal-separator:"\.";
mso-displayed-thousand-separator:"\,";
}
</style>
<table width="100%">
<tr>
<td align=center colspan=8 valign=top>
<span class="pageHead">
<nobr><h1>Rating</h1></nobr></span>
</td>
</tr>
<tr>
<td align=center colspan=8 valign=top>
<span class="pageHead"><nobr>
Generated Automatically
</nobr></span>
</td></tr>
<tr>
<td> </td>
</tr>
<tr>
<td> </td>
</tr>
</table>

********************************FROM HERE***********************************
<table border="1" cellspacing="0" cellpadding="0" width="100%">
<tr>
<th>Column 1</th>
<th>Column 2</th>
<th>Column 3</th>
</tr>
<tr>
<td style="vnd.ms-excel.numberformat:@">Text 1</td>
<td style="vnd.ms-excel.numberformat:@">Text 2</td>
<td>1</td>
</tr>
<tr>
<td style="vnd.ms-excel.numberformat:@">Text 1</td>
<td style="vnd.ms-excel.numberformat:@">Text 2</td>
<td>2</td>
</tr>
<tr>
<td style="vnd.ms-excel.numberformat:@">Text 1</td>
<td style="vnd.ms-excel.numberformat:@">Text 2</td>
<td>3</td>
</tr>
</table>
********************************TO HERE***********************************

</body></html>

Scenario is the following:

  1. An email arrives in the SAP system.
  2. The XLS attachment is read (XSTRING format).
  3. The XSTRING is converted to a STRING table.
  4. The result (HTML format) is parsed into an internal table.
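
Step 3 of the scenario could look roughly like the following ABAP sketch. It assumes the attachment bytes are already available in a variable (attachment_xstring is a made-up name) and that the file really is UTF-8, as its meta tag states.

```abap
" Sketch only: attachment_xstring is a hypothetical variable that is
" assumed to already hold the raw bytes of the XLS attachment.
DATA attachment_xstring TYPE xstring.

" Decode the bytes into a character string (UTF-8 assumed from the
" charset declared in the file's meta tag).
DATA(html) = cl_abap_codepage=>convert_from(
               source   = attachment_xstring
               codepage = `UTF-8` ).

" Optional: break the string into a STRING table, one entry per line.
SPLIT html AT cl_abap_char_utilities=>newline INTO TABLE DATA(html_lines).
```

Keeping the whole document in one string as well is convenient, because later steps can then search it without looping over table lines.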

I can't convert the XLS directly because the file arrives as an XSTRING: it is not on a local machine but comes from third-party software via email.

I want to parse the marked part (between FROM HERE and TO HERE) into an internal table.

What is the best approach to do this?

Best regards,

Angelo.

1 ACCEPTED SOLUTION

ChrisSolomon
Active Contributor

Oddly enough, this reminds me of one of the most famous answers on Stack Overflow (i.e., why you do NOT use regex to parse HTML)...

http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/17...

9 REPLIES 9

retired_member
Product and Topic Expert

There are lots of possibilities. For example, you can stay with string processing and parse the relevant part of the HTML file with regular expressions. Or you can treat the XSTRING directly as XML and parse it with the methods of the sXML library.
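
A minimal sketch of the string-processing route, for illustration only: it pulls the cell texts out of one row with a regular expression. The variable names and the pattern are assumptions, and the pattern only copes with simple, flat cells like the ones in this file. Nested or irregular markup would break it.

```abap
" Sample row taken from the data table in the question.
DATA(row_html) =
  `<tr><td style="vnd.ms-excel.numberformat:@">Text 1</td><td>1</td></tr>`.

" Match every <td ...>text</td>; the capture group is the cell text.
FIND ALL OCCURRENCES OF REGEX `<td[^>]*>([^<]*)</td>`
     IN row_html RESULTS DATA(matches).

DATA cells TYPE string_table.
LOOP AT matches INTO DATA(match).
  " submatches[ 1 ] holds offset and length of the first capture group.
  DATA(group) = match-submatches[ 1 ].
  APPEND substring( val = row_html
                    off = group-offset
                    len = group-length ) TO cells.
ENDLOOP.
" cells now holds one entry per <td> of the row.
```

Each extracted row would still have to be mapped to the fields of the target structure by position.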

Horst, thanks a lot for the help.

I'll look into what you suggested.

Do you have a helpful example or link?

Best regards,

Angelo.

matt
Active Contributor

Please search yourself.

chaouki_akir
Contributor

Hello,

What do you mean by "Incoming Email in SAP System"? Where do you see an "incoming email" in an SAP system? In transaction SBWP?

former_member302911
Active Participant

The transaction to see incoming e-mails is SOIN.

See this Blog

retired_member
Product and Topic Expert

LOL ......

Really funny! 😄

former_member302911
Active Participant

I solved the problem by parsing the file with a Simple Transformation.

The XML classes returned a lot of tag errors.

I separated the relevant part from the internal table:

<table border="1" cellspacing="0" cellpadding="0" width="100%">
<tr>
<th>Column 1</th>
<th>Column 2</th>
<th>Column 3</th>
</tr>
<tr>
<td style="vnd.ms-excel.numberformat:@">Text 1</td>
<td style="vnd.ms-excel.numberformat:@">Text 2</td>
<td>1</td>
</tr>
<tr>
<td style="vnd.ms-excel.numberformat:@">Text 1</td>
<td style="vnd.ms-excel.numberformat:@">Text 2</td>
<td>2</td>
</tr>
<tr>
<td style="vnd.ms-excel.numberformat:@">Text 1</td>
<td style="vnd.ms-excel.numberformat:@">Text 2</td>
<td>3</td>
</tr>
</table>

...and used this Simple Transformation:

<?sap.transform simple?>
<tt:transform xmlns:tt="http://www.sap.com/transformation-templates" xmlns:ddic="http://www.sap.com/abapxml/types/dictionary" xmlns:def="http://www.sap.com/abapxml/types/defined">
  <tt:root name="READ" type="ddic:ZSTRUCTURE"/>
  <tt:template>
    <table>
      <tt:skip count="1" name="tr"/>
      <tt:loop ref=".READ">
        <tr>
          <td tt:value-ref="COL1"/>
          <td tt:value-ref="COL2"/>
          <td tt:value-ref="COL3"/>
        </tr>
      </tt:loop>
    </table>
  </tt:template>
</tt:transform>
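
Calling the transformation from ABAP could look like the sketch below. ZHTML_TABLE is a hypothetical name for the transformation shown above, html is assumed to hold the decoded attachment as one string, and ZSTRUCTURE is assumed to be a table type whose lines have the components COL1, COL2 and COL3.

```abap
" Assumptions: html holds the decoded attachment; ZSTRUCTURE is the
" DDIC table type named in the transformation; ZHTML_TABLE is a
" made-up name for the Simple Transformation.
DATA html TYPE string.
DATA rows TYPE zstructure.

" Locate the data table: it is the only one opened with border="1".
DATA(start) = find( val = html sub = `<table border="1"` ).
IF start >= 0.
  DATA(stop) = find( val = html sub = `</table>` off = start ).
  IF stop > start.
    " Cut out the block including the closing </table> tag.
    DATA(table_html) = substring(
      val = html
      off = start
      len = stop - start + strlen( `</table>` ) ).
    TRY.
        " READ is the root name declared in the transformation.
        CALL TRANSFORMATION zhtml_table
          SOURCE XML table_html
          RESULT read = rows.
      CATCH cx_transformation_error INTO DATA(error).
        " Malformed input ends up here; keep the text for logging.
        DATA(error_text) = error->get_text( ).
    ENDTRY.
  ENDIF.
ENDIF.
```

Cutting the table out first keeps the mismatched tags in the Excel header block away from the XML parser, which is presumably where the tag errors came from.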

Best regards,

Angelo.