Technology Blogs by SAP
Learn how to extend and personalize SAP applications. Follow the SAP technology blog for insights into SAP BTP, ABAP, SAP Analytics Cloud, SAP HANA, and more.
JonGooding
Product and Topic Expert
For customers that have an existing (or planned) Google BigQuery footprint for data analytics and need to blend those datasets with SAP data, SAP Data Warehouse Cloud can cater for large BigQuery datasets while keeping the SAP data within SAP Data Warehouse Cloud, without having to move the data.

The process we went through to test this, using a public dataset, was in summary:

  1. In the Google Cloud Console, created a large dataset and generated a key for external access

  2. In SAP Data Warehouse Cloud, connected to the BigQuery dataset, and accessed the objects

  3. In SAP Data Warehouse Cloud, created a view over the BigQuery datasets with a parameter to handle larger datasets in BigQuery


In more detail...

  1. Accessed BigQuery and found a decent public dataset to use as an example. I have had some good history with GDELT as a source, and found some data of reasonable size. So I copied it to our local project using the BigQuery Data Transfer:



Google Big Query Data Transfer


The dataset I used had a size of 170.39 GB:


With a schema:


As some preparation work for accessing the data in DWC:

  • Created the BigQuery Service Account





  • Generated the Service Account Key:
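The key downloaded from Google Cloud is a JSON file. Before pasting it into the DWC connection, a quick sanity check on its contents can save a round trip. A minimal sketch, assuming the standard Google service account key format; the field values below are dummies:

```python
import json

# Fields present in a standard Google Cloud service account key file
REQUIRED_FIELDS = {"type", "project_id", "private_key", "client_email"}

def validate_service_account_key(key_json: str) -> bool:
    """Return True if the key JSON looks like a usable service account key."""
    key = json.loads(key_json)
    missing = REQUIRED_FIELDS - key.keys()
    if missing:
        raise ValueError(f"key is missing fields: {sorted(missing)}")
    return key["type"] == "service_account"

# Dummy (non-functional) key body for illustration only
dummy = json.dumps({
    "type": "service_account",
    "project_id": "my-gbq-project",  # hypothetical project name
    "private_key": "-----BEGIN PRIVATE KEY-----\n...",
    "client_email": "dwc-reader@my-gbq-project.iam.gserviceaccount.com",
})
print(validate_service_account_key(dummy))  # True
```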



2. In SAP Data Warehouse Cloud, using the BigQuery key and the BigQuery project name

Firstly, import the Google certificate (under System/Configuration).

Create the BigQuery connection, using the key generated previously:


Data Warehouse Cloud Connection


Now the fun begins. Given we have roughly 170 GB of data in BigQuery, there are some learnings that can be shared about things we have tried:

  • BigQuery datasets can be fully replicated into DWC if the dataset is small enough


For a dataset of 2.6 GB in BigQuery, here it is replicated in DWC at 1 GB on disk. The full replication took 16 minutes.
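Scaling that observed rate linearly gives a rough sense of why full replication is unattractive for the 170 GB dataset. This is a back-of-envelope extrapolation only; real replication throughput will vary:

```python
# Observed in the test above: 2.6 GB replicated in 16 minutes
observed_gb = 2.6
observed_minutes = 16
rate_gb_per_min = observed_gb / observed_minutes  # ~0.16 GB/min

# Rough linear extrapolation to the full 170.39 GB dataset
full_dataset_gb = 170.39
estimated_minutes = full_dataset_gb / rate_gb_per_min
print(f"Estimated full replication: {estimated_minutes / 60:.1f} hours")
```

On these assumptions, a full copy would take well over 17 hours, before even hitting the size limits described next.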


DWC Replication with BigQuery




  • Trying to fully replicate datasets larger than 10 GB fails with the error:


{"origin":"history","code":"71000129","message":"SQLException; REST API: requestType=BigQuery_FetchQuery, statusCode=403, msg=Response too large to return. Consider specifying a destination table in your job configuration. For more details, see https://cloud.google.com/bigquery/troubleshooting-errors","timestamp":"2022-09-01 11:04:21.6113190"}

3. Applying a filter to access more manageable datasets

  • For large datasets, it was also taking too long to query and access the data through the BigQuery remote table. So a simple workaround is to apply filters within Data Warehouse Cloud:



DWC View with Parameter


In our example, we used the location as the parameter, as it was the easiest to sort by and to test the results.
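What the parameterised view effectively does is push a filter down to BigQuery instead of scanning the whole remote table. As an illustration of the equivalent query, here is a minimal sketch; the table and column names are hypothetical, not the actual GDELT schema used above:

```python
def build_filtered_query(table: str, column: str, value: str) -> str:
    """Build a BigQuery standard-SQL query filtered on one column.

    Quoting here is deliberately simplistic; real code should use
    BigQuery query parameters (e.g. @location) rather than string
    interpolation, to avoid injection and quoting bugs.
    """
    escaped = value.replace("'", "\\'")
    return (f"SELECT * FROM `{table}` "
            f"WHERE {column} = '{escaped}'")

# Hypothetical GDELT-style table filtered by location, as in the view
sql = build_filtered_query("my-gbq-project.gdelt.events",
                           "ActionGeo_CountryCode", "AS")
print(sql)
```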

So some results:

Running the table scan, limited by one parameter, directly in BigQuery gives:


BigQuery results


 

Initial Testing in SAP Data Warehouse Cloud:

The DWC result set for the Data Preview (1,000 records) returned in 11 seconds. This is with a dataset that hasn't been cached, out of a total of about 900,000 records.

A really quick SAP Analytics Cloud Story based on the SAP Data Warehouse Cloud live connection:


SAP Analytics Cloud Story based on BigQuery data


 

Running the same query in SAP Analytics Cloud via SAP Data Warehouse Cloud:


SAC Performance Analysis


There are many more enhancements that could be made, like dynamic filters based on the story context and actual integration with SAP data. But I just wanted to demonstrate that it's quite easy to make BigQuery data accessible in SAP Data Warehouse Cloud without moving the data.
1 Comment
siva_prakash
Hi jon.gooding

Interesting blog. You mentioned that datasets larger than 10 GB fail with an error.
From your experience, what is the best solution to bring in the 170 GB dataset above?