natanael1

The 3-Hour to 7-Second Story

We had a weekly data integration from ABAP in Cloud, via an OData service, that took about 3 hours (10,800 seconds) and nearly 600 requests to finish. After a small redesign, the same data now arrives in ~7 seconds using just 3 requests, roughly a 1,500x reduction. No new servers, no fancy tools, just smarter packaging of the data, though of course with some drawbacks.

The full PDF paper and code are available in this GitHub repo: https://github.com/legonmarian/abap-btp-api-optimization

Quick Context

  • Infrastructure: ABAP in Cloud on SAP BTP.
  • Data: a big, flat table (~3 million rows).
  • Task: expose this data to another system once a week.
  • Status quo: an OData service exposing this table, called ~600 times at 5,000 rows per call. It was slow and expensive; the extraction workflow took ~3 hours.
  • Constraints: streaming is not available, the data shape should ideally stay the same, and no external systems should be involved.
  • "Vision": deliver everything fast, simply, and cheaply for both sides.

The Simple Change

Why not OData for this job:

OData is a really good tool for interactive reads: $filter, $expand, small pages, typed entities, and Fiori applications. Our use case was the opposite: one flat dataset, all of it, as fast as possible.
With OData we’d still pay the cost of many small pages and per-entity overhead the client didn’t need. And most of the consumers wanted a simple file payload they could ingest with generic tools.
Beyond that, the biggest restriction is that OData handles the data transfer out of the box, so you have no control over the data shape or the compression (which turns out to matter a lot later).

Why a plain HTTP service instead

An HTTP service with a custom HTTP handler class, on the other hand, gives us full control over the wire format (JSON/CSV), headers, and compression.

This is the main reason we moved from an elegant but slow out-of-the-box OData service to a fast HTTP service with a custom handler.

Why “just HTTP” still wasn’t enough

The HTTP service provides nothing out of the box beyond full control over the request and response objects. That means all the remaining work needed to return a DB table in the response has to be implemented by the developer, including:

  • pagination
  • serialization
  • compression (optional)

On top of that, with custom code and big tables the developer must ensure the program stays within the allocated resources and constraints, in terms of both runtime and memory.
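To make the pagination piece concrete, here is roughly the ABAP SQL shape of a deterministic offset/count slice. This is only a sketch, reusing the placeholder table and column names from the handler shown later; serialization and compression follow in Step 2.

" Sketch only: one deterministic offset/count slice of the table.
" ORDER BY a (unique) key is what keeps pages stable and retries safe.
DATA(lv_offset) = 1000000.   " hypothetical values; in the real handler they
DATA(lv_count)  = 1000000.   " come from the request's query parameters

SELECT column_1, column_2, column_3
  FROM dbtable
  ORDER BY column_2
  INTO TABLE @DATA(lt_page)
  UP TO @lv_count ROWS
  OFFSET @lv_offset.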

We therefore explored (and documented in the code) multiple ways of doing this. For more details, please check the PDF. Here are some of our conclusions:

  • If compression is not suitable for your use case, moving from JSON to CSV is a huge advantage.
  • Compression is the best solution; this way you can keep your JSON schema.
  • There are several compression algorithms, but the most suitable for us in ABAP in Cloud proved to be GZIP, which is supported out of the box. We compare the compression/decompression performance of multiple algorithms in the PDF paper.
  • The fastest JSON serializer is CALL TRANSFORMATION, which in our measurements was thousands of times faster than the XCO library.
  • Reading a DB table into an internal table and serializing it requires memory, and we ran out of it when trying to serialize the full table.
    • An alternative is to read the table in chunks, serializing and compressing each chunk before moving on to the next. This is a bit less client-friendly from a decompression perspective (a sketch follows after this list).
  • Streaming data is not possible with the current ICF of ABAP in Cloud.
  • One last performance optimization is parallelization, which can cut the load time by a factor of 3-4, but use it carefully.
  • Always profile for bottlenecks.
  • Always stress test to find the right parameters, like page size and number of concurrent calls.
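For completeness, the chunked variant mentioned above could look roughly like the sketch below. It reuses the article's placeholder table, the convert_json_transformation serializer from Step 2, and the handler's response object. It is not the pattern we finally chose: every loop pass appends another gzip member, which is exactly the client-friendliness drawback.

" Sketch only: serialize and gzip the table chunk by chunk to bound memory use.
DATA lv_body TYPE xstring.

SELECT column_1, column_2, column_3
  FROM dbtable
  ORDER BY column_2
  INTO TABLE @DATA(lt_chunk)
  PACKAGE SIZE 500000.

  " each pass adds one more gzip member to the response body
  cl_abap_gzip=>compress_binary(
    EXPORTING raw_in   = convert_json_transformation( lt_chunk )
    IMPORTING gzip_out = DATA(lv_gzip) ).
  CONCATENATE lv_body lv_gzip INTO lv_body IN BYTE MODE.

ENDSELECT.

" many clients only auto-unpack the first gzip member (see the gotchas below)
response->set_binary( lv_body ).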

What we landed on (the pattern)

  • Keep JSON for compatibility, but generate it using CALL TRANSFORMATION.
  • Compress every response and signal it with Content-Encoding: gzip. Use a single gzip member per response so common clients auto-decompress reliably.
  • Use coarse paging (a few big pages) to cut round trips: 1,000,000 records per call.
  • Parallelize the page requests on the client.

Before vs After

TL;DR: We didn’t change the data, only the delivery: coarse pages, fast JSON, and gzip over a plain HTTP contract.

Parameter | Before | After
Dataset | ~3,000,000 rows (flat, ~12 columns) | Same
Page size | 5,000 rows/page | ~1,000,000 rows/page (tunable)
Number of requests | ~593 | 3
End-to-end time | ~3 hours (sequential pulls) | ~6-7 seconds (3 parallel pulls)
Total transfer | ~6,000 MB | ~9 MB
Payload per page | ~1.15 MB per 5k rows (raw) | ~3 MB per 1M rows (gzipped)
Protocol | OData | Plain REST
Compression | None | Content-Encoding: gzip (single member)
Client pattern | Sequential loop | Fetch pages in parallel, then merge

How to reproduce this in three simple steps

Heads-up: below I refer to appendices; you can find them in the detailed PDF paper about the optimization. The paper and the code are available on GitHub.


Step 1. Define a tiny HTTP contract

  • Endpoint: GET /entity?offset=…&count=…

  • Helper: GET /entity?get_only_count=true returns a small pagination object with totals and suggested pages, for example:
{
  "number_of_records": 2496434,
  "batch_size": { "maximum": 1500000, "recommended": 1000000 },
  "recommended_pages": [
    "/entity?offset=0&count=1000000",
    "/entity?offset=1000000&count=1000000",
    "/entity?offset=2000000&count=1000000"
  ]
}

This keeps clients simple and lets them plan parallel pulls.
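A minimal sketch of how the server could assemble this helper object, assuming the placeholder table used throughout the article, the convert_json_transformation serializer from Step 2, and the example batch sizes above (structure and variable names are hypothetical):

" Sketch only: build the pagination object returned for get_only_count=true.
TYPES: BEGIN OF ty_batch_size,
         maximum     TYPE i,
         recommended TYPE i,
       END OF ty_batch_size,
       BEGIN OF ty_pagination,
         number_of_records TYPE i,
         batch_size        TYPE ty_batch_size,
         recommended_pages TYPE string_table,
       END OF ty_pagination.

DATA(ls_pagination) = VALUE ty_pagination(
  batch_size = VALUE #( maximum = 1500000 recommended = 1000000 ) ).

SELECT COUNT( * ) FROM dbtable INTO @DATA(lv_total).
ls_pagination-number_of_records = lv_total.

DATA(lv_offset) = 0.
WHILE lv_offset < ls_pagination-number_of_records.
  APPEND |/entity?offset={ lv_offset }&count={ ls_pagination-batch_size-recommended }|
    TO ls_pagination-recommended_pages.
  lv_offset = lv_offset + ls_pagination-batch_size-recommended.
ENDWHILE.

" Tiny payload, so no compression is needed. Note that the ID transformation
" renders component names in upper case; adjust on either side if the contract
" requires the exact lower-case keys shown above.
response->set_binary( convert_json_transformation( ls_pagination ) ).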


Step 2. Build one big page, serialize, and gzip it on the server

a) Fast JSON generation with CALL TRANSFORMATION
Appendix C shows the lean serializer that performed the best; for the performance comparison, please check the PDF paper.

METHOD convert_json_transformation.
  DATA(lo_writer) = cl_sxml_string_writer=>create( type = if_sxml=>co_xt_json ).
  CALL TRANSFORMATION id
    SOURCE itab = data
    RESULT XML lo_writer.
  string = lo_writer->get_output( ).
ENDMETHOD.

Use this to turn your internal table into JSON quickly; note that get_output( ) returns the JSON as an xstring (UTF-8 bytes), which is exactly what the gzip step below expects.

b) Minimal HTTP handler that serves one gzipped page
Appendix I demonstrates the pattern: set headers, read a deterministic slice, serialize, gzip once, send bytes.

METHOD gzip_json_single_page.
  response->set_status( 200 ).
  response->set_content_type( 'application/gzip' ).
  response->set_header_field(
    i_name = 'Content-Disposition'
    i_value = |attachment; filename="data_subset.gz"| ).
  " switch off the framework's own compression; we gzip the payload ourselves
  response->set_compression(
    options = if_web_http_response=>co_compress_none ).
  response->set_header_field(
    i_name  = 'Content-Encoding'
    i_value = |gzip| ).

  SELECT column_1, column_2, ... , column_12
    FROM dbtable
    ORDER BY column_2
    INTO TABLE @DATA(page)
    UP TO @page_size ROWS.

  cl_abap_gzip=>compress_binary(
    EXPORTING raw_in = convert_json_transformation( page )
    IMPORTING gzip_out = DATA(gzip) ).

  response->set_binary( gzip ).
ENDMETHOD.

c) Where the handler is wired
Appendix F shows the entry point choosing which implementation to run:

METHOD if_http_service_extension~handle_request.
  " choose one of these
  gzip_json_single_page( CHANGING request = request response = response ).
  " only for demonstration
  "gzip_csv_single_page( CHANGING request = request response = response ).
  " only for demonstration
  "gzip_csv_multiple_pages( CHANGING request = request response = response ).
ENDMETHOD.
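The appendix keeps the entry point deliberately trivial. Here is a slightly fuller sketch of how the dispatch could look once the Step 1 contract is wired in; send_pagination_info is a hypothetical helper that builds the get_only_count object sketched in Step 1, and the page branch is the appendix method from Step 2b (which reads offset and count itself):

METHOD if_http_service_extension~handle_request.
  " Sketch only: route based on the tiny contract from Step 1.
  IF request->get_form_field( 'get_only_count' ) = 'true'.
    " totals + recommended pages (hypothetical helper)
    send_pagination_info( response ).
  ELSE.
    " one big page: read the slice, serialize with CALL TRANSFORMATION, gzip once, send
    gzip_json_single_page( CHANGING request = request response = response ).
  ENDIF.
ENDMETHOD.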

Step 3. Let the client pull a few big pages in parallel

  • Call get_only_count first to get the total and the recommended pages.
  • Fire 2-4 page requests in parallel, then merge locally.
  • Most clients auto-decompress when Content-Encoding is set, which is exactly why the server returns a single gzip response (a minimal consumer sketch follows below).
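If the consumer happens to be another ABAP system, pulling one page could look like the sketch below; cl_web_http_client_manager, cl_http_destination_provider, and cl_abap_gzip are standard ABAP Cloud APIs, the URL is a placeholder, and error handling is trimmed. Non-ABAP clients follow the same flow with whatever HTTP and gzip tooling they already have, fanning the page requests out in parallel.

" Sketch only: fetch one recommended page, decompress it, deserialize it.
TRY.
    DATA(lo_destination) = cl_http_destination_provider=>create_by_url(
      i_url = 'https://<host>/entity?offset=0&count=1000000' ).
    DATA(lo_client) = cl_web_http_client_manager=>create_by_http_destination( lo_destination ).

    DATA(lo_response) = lo_client->execute( i_method = if_web_http_client=>get ).

    " One gzip member per response; if the HTTP stack has not already undone
    " Content-Encoding: gzip for us, decompress explicitly.
    cl_abap_gzip=>decompress_binary(
      EXPORTING gzip_in = lo_response->get_binary( )
      IMPORTING raw_out = DATA(lv_json) ).

    " Back into an internal table via the same fast ID transformation.
    DATA lt_rows TYPE TABLE OF dbtable WITH EMPTY KEY.
    CALL TRANSFORMATION id SOURCE XML lv_json RESULT itab = lt_rows.

  CATCH cx_web_http_client_error cx_http_dest_provider_error cx_sy_compression_error.
    " handle / log the error
ENDTRY.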

Optional: CSV variant from the appendix

If you ever need CSV, Appendix H shows the single-page CSV + gzip flow. The JSON path above stayed our final choice because gzip erases most of JSON’s key overhead while keeping the tooling friendly. If you'd like to read more on why the JSON-to-CSV conversion becomes unnecessary after gzipping, please check the PDF paper.
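For illustration, a naive CSV rendering of the same page could look like this; a sketch only, with the placeholder columns used throughout, while concat_lines_of and cl_abap_char_utilities are standard ABAP. Appendix H has the real flow, including the gzip step.

" Sketch only: render the page as CSV text, then gzip it like the JSON variant.
DATA lt_lines TYPE string_table.
APPEND |column_1;column_2;column_3| TO lt_lines.   " header row
LOOP AT lt_page INTO DATA(ls_row).
  APPEND |{ ls_row-column_1 };{ ls_row-column_2 };{ ls_row-column_3 }| TO lt_lines.
ENDLOOP.
DATA(lv_csv) = concat_lines_of( table = lt_lines sep = cl_abap_char_utilities=>newline ).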

When to use this pattern and when not to

Use it when

  • You need to deliver a large, flat dataset fast, usually for batch or analytics.

  • Your consumers are happy with a plain HTTP GET that returns JSON.
  • You can sort by a unique key and read stable slices with offset and count.

Think twice when

  • Consumers need rich OData features like server-side filtering and $expand. You will be giving those up and implementing only what you need in plain HTTP.

  • Clients cannot hold the decompressed page in memory. A 1,000,000 row page is roughly a few hundred MB once decompressed, so plan for that on the client side.
  • You require true streaming. ABAP ICF in the cloud does not support HTTP/1.1 chunked transfer, so streaming is out of scope here.

Gotchas to avoid

  • Do not concatenate multiple gzip members in one HTTP response if you expect tools like Postman to auto-decompress. Many clients only unpack the first member. Prefer one contiguous gzip stream per response.

  • Set the right headers. Send Content-Encoding: gzip for the single member response. If you build a multi-member payload, clients may not decode it automatically.
  • Keep ordering stable. Always ORDER BY a unique key to make pages deterministic and retries idempotent.
  • Pick a page size both sides can hold. For us 1,000,000 rows per page hits a good balance. Results and size scale with total rows, not so much with page count.
  • Stick to one gzip pass per response. Compress after you generate JSON for the page, not per record or per mini-chunk inside the same response. It keeps clients simple.

For the full PDF paper and code, please check this GitHub repo: https://github.com/legonmarian/abap-btp-api-optimization

 

6 Comments
zeno_rummler
The GitHub link gives me a 404.
natanael1
@zeno_rummler thx for that, the link is now fixed.
Sandra_Rossi

I'm more impressed by the quality of your post than by the performance gain 😉

  1. Could you please confirm: if for some reason the project wants OData only (not clear why) and a "bulk" OData service is developed with a Base64 string property containing the ~3 MB per 1M rows (gzipped), it shouldn't be much slower than a minimal HTTP service like yours, right?
  2. Something else: you are zipping via CL_ABAP_GZIP and you use "Content-Encoding: gzip"; doesn't it compress twice (by the HTTP client in the SAP kernel)? What is the benefit?

Thank you.

natanael1

@Sandra_Rossi thank you very much!
So to answer your question: I think it won't be much slower, maybe ~33% slower because of the Base64 size increase, but this is just a theoretical assumption, which I'm not a huge fan of 😅; to have a clear answer I'd need to implement a small PoC and profile it. But could you explain why Base64 is needed?

Regarding double compression: as far as I know, setting the response header of the HTTP service to "Content-Encoding: gzip" doesn't compress the data out of the box; you still need to handle the compression manually, similar to how "Content-Type: application/json" won't serialize the payload to JSON. But I will try it out and see if there's some magic 😅. I plan to come up with a follow-up to this post, because there's actually a better (more standard) way: exposing a simple OData service and just asking for compressed data by setting the request header Accept-Encoding: gzip.

Sandra_Rossi

@natanael1 

  1. OData's medium is JSON (or Atom XML, which is deprecated since V4.0). How else than Base64 can you send compressed data via OData? I mean, what's better?
  2. It was just a guess, please forget it: I read too late that the magic would be with Transfer-Encoding, not with Content-Encoding. But if I believe rfc2616 - Content-Encoding vs Transfer Encoding in HTTP - Stack Overflow, "Transfer Encodings other than "chunked" (sadly) aren't implemented in practice". If I understand correctly, the header field "Transfer-Encoding: chunked" is implemented by the HTTP server or client, not by the receiving application, while Content-Encoding (like Content-Type and the rest of the message) is information to be used or not by each client or server, and so it could be absent if the payload is always gzipped and the receiving application knows that and un-gzips it systematically.
natanael1

Ooh, I see what you mean, so basically you mean sending an OData response like this:

{
  "d": {
    "results": [
      {
        "__metadata": {
          "uri": "https://api.example.com/odata/BulkData('bulk1')",
          "type": "Demo.BulkData"
        },
        "ID": "bulk1",
        "Name": "Example",
        "CompressedData": "H4sIAAAAAAAA/0tMSgYBAAD..."  /* B64 encoded GZip */
      }
    ],
    "__count": "1"
  }
}

And it is valid, I see where you're coming from with the Base64, but I don't think it is very efficient, because the client needs extra implementation to decode and then decompress CompressedData. There are a couple of alternatives, though; one of them is OData's $value, which lets you expose the raw value of a property, and that raw value can be the gzip binary.

However, I think the better approach would be to create a simple OData service that exposes the actual row values and to send the header Accept-Encoding: gzip. If you do this, you keep your payload standard and everything works out of the box as before, but the data is compressed by the OData engine. Here is an example with a standard SAP ABAP in Cloud API called Manage Git Repositories, which is an OData API:

[Screenshot: request without compression]

So in this case I called the OData API with the Accept-Encoding header disabled on purpose. The response comes back uncompressed with a size of 21 KB, which is generally a very small payload 😂, but it can still be smaller.

Now I will make the same call again, but with the Accept-Encoding: gzip header enabled:

[Screenshot: request with Accept-Encoding: gzip]

And as you can see, the response has decreased dramatically in size: it is now only 2 KB, compared to the previous 21 KB, and Postman shows both the compressed and the decompressed payload size. The beauty of this is that in the end you have your clean OData API, without needing any manual compression, decompression, decoding, etc.