<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic OCR parameters explanation in Artificial Intelligence Forum</title>
    <link>https://community.sap.com/t5/artificial-intelligence-forum/ocr-parameters-explanation/m-p/706659#M104</link>
    <description>&lt;P&gt;Hey,&lt;BR /&gt;I've got some questions regarding the parameters of the OCR service.&lt;BR /&gt;As stated in the &lt;A href="https://help.sap.com/viewer/b04a8fe9c04745b98ad8652ccd5d636f/1.0/en-US/3fa18aca0e35421394b620327875f04a.html" target="_blank"&gt;documentation&lt;/A&gt;, there are several options for the page segmentation mode and the type of the machine learning model.&lt;BR /&gt;The descriptions of these parameters are very short. Does anyone know where I can find a more in-depth description?&lt;BR /&gt;&lt;/P&gt;
  &lt;P&gt;&lt;STRONG&gt;Questions about the modelType&lt;/STRONG&gt;&lt;BR /&gt;Regarding the different modelTypes, I would like to know the difference between lstmPrecise, lstmFast and lstmStandard. I am familiar with LSTM cells, but I couldn't find any information on what makes the "&lt;EM&gt;precise&lt;/EM&gt; model" &lt;EM&gt;precise&lt;/EM&gt;, the "&lt;EM&gt;fast&lt;/EM&gt; mode" &lt;EM&gt;fast&lt;/EM&gt;, and so on.&lt;/P&gt;
  &lt;P&gt;There is also a model with "LSTM cells and standard processing algorithms". Is there any information on which standard processing algorithms are used?&lt;/P&gt;
  &lt;P&gt;I am also looking for information on the training of these models.&lt;/P&gt;
  &lt;P&gt;&lt;STRONG&gt;Questions about the pageSegMode&lt;/STRONG&gt;&lt;/P&gt;
  &lt;P&gt;Most of the options are pretty self-explanatory; however, I stumbled upon pageSegMode 13 - "Raw line. Treat the image as a single text line, bypassing hacks that are Tesseract-specific".&lt;BR /&gt;I know Tesseract as free software for optical character recognition. Is the OCR service SAP provides based on Tesseract?&lt;BR /&gt;Which Tesseract-specific hacks are bypassed?&lt;BR /&gt;&lt;/P&gt;
  &lt;P&gt;I really hope that there is someone out there who can help me with these questions or at least has an idea who might know the answers.&lt;BR /&gt;&lt;BR /&gt;Thanks in advance and best regards,&lt;BR /&gt;Leonie&lt;/P&gt;</description>
    <pubDate>Sun, 04 Feb 2024 04:29:15 GMT</pubDate>
    <dc:creator>Former Member</dc:creator>
    <dc:date>2024-02-04T04:29:15Z</dc:date>
    <item>
      <title>OCR parameters explanation</title>
      <link>https://community.sap.com/t5/artificial-intelligence-forum/ocr-parameters-explanation/m-p/706659#M104</link>
      <description>&lt;P&gt;Hey,&lt;BR /&gt;I've got some questions regarding the parameters of the OCR service.&lt;BR /&gt;As stated in the &lt;A href="https://help.sap.com/viewer/b04a8fe9c04745b98ad8652ccd5d636f/1.0/en-US/3fa18aca0e35421394b620327875f04a.html" target="_blank"&gt;documentation&lt;/A&gt;, there are several options for the page segmentation mode and the type of the machine learning model.&lt;BR /&gt;The descriptions of these parameters are very short. Does anyone know where I can find a more in-depth description?&lt;BR /&gt;&lt;/P&gt;
  &lt;P&gt;&lt;STRONG&gt;Questions about the modelType&lt;/STRONG&gt;&lt;BR /&gt;Regarding the different modelTypes, I would like to know the difference between lstmPrecise, lstmFast and lstmStandard. I am familiar with LSTM cells, but I couldn't find any information on what makes the "&lt;EM&gt;precise&lt;/EM&gt; model" &lt;EM&gt;precise&lt;/EM&gt;, the "&lt;EM&gt;fast&lt;/EM&gt; mode" &lt;EM&gt;fast&lt;/EM&gt;, and so on.&lt;/P&gt;
  &lt;P&gt;There is also a model with "LSTM cells and standard processing algorithms". Is there any information on which standard processing algorithms are used?&lt;/P&gt;
  &lt;P&gt;I am also looking for information on the training of these models.&lt;/P&gt;
  &lt;P&gt;&lt;STRONG&gt;Questions about the pageSegMode&lt;/STRONG&gt;&lt;/P&gt;
  &lt;P&gt;Most of the options are pretty self-explanatory; however, I stumbled upon pageSegMode 13 - "Raw line. Treat the image as a single text line, bypassing hacks that are Tesseract-specific".&lt;BR /&gt;I know Tesseract as free software for optical character recognition. Is the OCR service SAP provides based on Tesseract?&lt;BR /&gt;Which Tesseract-specific hacks are bypassed?&lt;BR /&gt;&lt;/P&gt;
  &lt;P&gt;I really hope that there is someone out there who can help me with these questions or at least has an idea who might know the answers.&lt;BR /&gt;&lt;BR /&gt;Thanks in advance and best regards,&lt;BR /&gt;Leonie&lt;/P&gt;</description>
      <pubDate>Sun, 04 Feb 2024 04:29:15 GMT</pubDate>
      <guid>https://community.sap.com/t5/artificial-intelligence-forum/ocr-parameters-explanation/m-p/706659#M104</guid>
      <dc:creator>Former Member</dc:creator>
      <dc:date>2024-02-04T04:29:15Z</dc:date>
    </item>
  </channel>
</rss>

