Technology Blogs by SAP
Learn how to extend and personalize SAP applications. Follow the SAP technology blog for insights into SAP BTP, ABAP, SAP Analytics Cloud, SAP HANA, and more.
cancel
Showing results for 
Search instead for 
Did you mean: 
Sefan_Linders
Product and Topic Expert
Product and Topic Expert

Introduction

At time of writing there is no dedicated connection tile for Snowflake in SAP Datasphere. An alternative that you can use today is the Generic JDBC connection, which can leverage the Snowflake JDBC driver. Since I got several questions on how to configure this and overcome certain limitations, I'd like to share my view on how to best configure this. This setup supports both federation and batch replication through remote tables. Data Flows are not natively supported, but you can use the created remote tables as a source in a Data Flow.This blog treats the connection from SAP Datasphere, but as the underlying framework for the connection is SAP Smart Data Integration, a similar configuration can be made on SAP HANA Cloud, although the user interface will be different.

Requirements

Besides the obvious need for a SAP Datasphere tenant and a Snowflake account, you will need the following:

  • Installed SAP Data Provisioning Agent (version 2.6.1.0 or later). The installation is not handled in this blog;
  • Admin rights on Snowflake to create a user and assign privileges.

How to set up

Prepare the CamelJDBC adapter and XML files

Most of the steps are described or linked from in these SAP Help pages, but in summary the following steps need to be followed:

  • Upload the Snowflake JDBC driver to folder <DPAgent_root>/camel/lib;
  • After uploading the driver, restart the DP Agent for the driver to load;
  • Check if the CamelJDBC config part in <DPAgent_root>/camel/adapters.xml is uncommented (details here);
  • Make sure CAP_LIMIT is part of the capabilities in file <DPAgent_root>/camel/adapters.xml, otherwise TOP or LIMIT operations are not pushed down;
  • Make sure the Generic JDBC adapter is activated in SAP Datasphere;
  • If you made any changes in the config files or upgraded the agent, make sure to refresh the capabilities in SAP Datasphere.
  • Make sure you have the right BACKTICK setting. In <DPAgent_Install_DIR>/camel, check the file configfile-jdbc.properties. It should say delimident=true, not delimident=BACKTICK. If other Camel clients require this, they should reside in a separate DP agent installation.
  • In the same configfile-jdbc.properties file, check the "filename" parameter to see which jdbc-dialect file needs to be edited in the next two steps:
    • Edit file sample-jdbc-dialect.xml, add the following line in the SQLRewrite block: <FuncRewrite fromFunc="LIMIT" toFunc="LIMIT"/>. In later versions of DP Agent, this is needed to manage the push down of LIMIT operations.
    • In the same sample-jdbc-dialect.xml file, also change the data type mapping for source type "NUMBER". Make sure the file has the following two entries for that source type. This is necessary because in Snowflake, any integer-like value (e.g. tinyint or bigint) is defined as NUMBER (38,0) in the source dictionary. To overcome issues with values not fitting in the default mapped HANA INTEGER data type, the mapping should be changed to BIGINT.

 

 

<Mapping srcType="NUMBER" length="" precision="" scale="[1,38]" hanaType="DECIMAL" />
<Mapping srcType="NUMBER" length="" precision="" scale="[-38,0]" hanaType="BIGINT" />

 

 

 

Create a user in Snowflake

Create a dedicated user in Snowflake that you use to logon with from SAP Datasphere.  With the following SQL statements I created a user on Snowflake, but there is obviously a bit of freedom here on how you want to do things. Please do note that:

  • This user requires operate/usage privileges on a warehouse or you will run into errors;
  • I have chosen not to assign a default database and a default warehouse to the user, and instead I provide those parameters in the connection string in the SAP Datasphere connection. But you have to assign it either to the user directly, or add it in the connection string, or you run into errors.
  • I recommend to assign a maximum of one database to a user. Assigning more databases can become confusing, as SAP Datasphere will list all schemas of all databases in one go, without differentiating between databases, and only the default database can actually be queried. If you have more than one database to connect to, create one connection with a dedicated user for each database.

 

 

 

 

 

create role dwc_role;
grant operate on warehouse sample_wh_xs to role dwc_role;
grant usage on warehouse sample_wh_xs to role dwc_role;
create user dwc_generic password='<PASSWORD>' must_change_password = false;
grant role dwc_role to user dwc_generic;
alter user dwc_generic set default_role = dwc_role;

 

 

 

 

 

Create the connection in SAP Datasphere

You need to create a DWC connection for each Snowflake database you want to connect. The reason is that one Snowflake connection can give you access to multiple databases. However, SAP Datasphere does not use the database metadata. This can result in multiple issues:

  • Seeing multiple schemas with the same name, not knowing to which database they belong;
  • Not being able to fetch data from any other database other than the default database that has been set for the user or configured in the connection string.

In SAP Datasphere, choose to setup a new connection and choose connection tile "Generic JDBC". The configuration should look similar as in below screenshot.

Figure 1: Configuration in Generic JDBC connection tile in SAP Datasphere


JDBC driver class: net.snowflake.client.jdbc.SnowflakeDriverJDBC connection string: jdbc❄️//.snowflakecomputing.com?db=SNOWFLAKE_SAMPLE_DATA&warehouse=sample_wh_xsAs you can see, I have set a database and warehouse which will be used as default in the JDBC session. Alternatively, you can leave this empty and assign a default user and warehouse when creating the user in Snowflake.After completing this connection wizard, the setup is completed and you should be able to start using your Snowflake as a source for modeling.

Troubleshooting

Below, the most common issues with their solution are listed. Please note that when you run into something unexpected in SAP Datasphere regarding this connection type, and the error thrown in the user interface is not clear or not present, please check the Data Provisioning Agent log files. This can be done either from SAP Datasphere directly, or by checking the framework.trc log files on the Data Provisioning Agent instance in the <DPAgent_root>/log directory.

Data preview fetches all data from source

Upon data preview, it can happen that all data is being fetched from the source. You would notice this either because the data preview time is very high, or you have checked the SQL statement in the remote query monitor, or in the Snowflake historic statements where you do not see a TOP statement applied. In that case:

  • Check if your Data Provisioning Agent is up to date. Make sure to run the latest DPA. A feature to push down TOP and LIMIT operations was added in version 2.6.1.0;
  • (After upgrading) make sure to check the capabilities as explained in paragraph "Prepare the CamelJDBC adapter and XML files", rebooted the Agent, and refreshed the agent capabilities in SAP Datasphere;
  • You might also have to re-save your connection in SAP Datasphere, by just opening the connection settings, re-entering your credentials and saving the connection. Sometimes this is needed additionally to refresh the adapter capabilities for this remote source.

DWC does not list databases, and shows all schemas of all databases unorganized

Below a screenshot of seeing duplicate schema names in SAP Datasphere.

Figure 2: Snowflake schema representation in SAP Datasphere source hierarchy

The reason for the duplicate schemas showing up, is that these schemas are present in multiple Snowflake databases. However, the database metadata is not used to present the source system hierarchy and therefore the schemas look like duplicates.To partially overcome the representation issue, the Snowflake database user should be restricted in the number of databases authorised for. This also means that if you want to access multiple Snowflake databases, you should create separate connections and use separate Snowflake users for each connection.However, there are always the two databases DEMO_DB and UTIL_DB assigned to any user, which both have a schema INFORMATION_SCHEMA and PUBLIC. Therefore, these will always show up as duplicates.

DWC data preview error: “Schema X does not exist or not authorized”

When trying to access a schema other than the user default schema, the following error is listed in the agent logs after trying to fetch data, even though the remote table can be created. When doing a data preview, the error occurs.2022-05-19 12:36:44,419 [ERROR] [1ffdd62b-2fac-4883-9841-54281957b4cc52674] DefaultErrorHandler | CamelLogger.log - Failed delivery for (MessageId: ID-ip-172-31-0-19-1652960718543-0-3 on ExchangeId: ID-ip-x-x-0-3). Exhausted after delivery attempt: 1 caught: net.snowflake.client.jdbc.SnowflakeSQLException: SQL compilation error:
Schema 'SAMPLE_DB.TPCH_SF1' does not exist or not authorized.The reason for this is that SAP Datasphere is not aware of, or ignores, the database parameter. To overcome this issue, create an individual connection for each Snowflake database that you want to access, with a parameter to set the default database. This can be achieved using a database connection parameter in the connection settings, as you can see in the paragraph Create the connection in SAP Datasphere.

Data preview returns 0 records, or throws an error on data preview “an error occurred while loading metadata”

It can happen (like it happened to me) that initially data fetching and data preview is working, and then at some point it just runs into an error. When this happened to me, the error in the agent logs was as follows:2022-05-23 08:22:08,550 [ERROR] [15ad44f9-9977-4e42-b317-b045f7fa4cc474829] DefaultErrorHandler | CamelLogger.log [] - Failed delivery for (MessageId: EDAED96EEC4A7FE-0000000000000002 on ExchangeId: EDAED96EEC4A7FE-0000000000000002). Exhausted after delivery attempt: 1 caught: net.snowflake.client.jdbc.SnowflakeSQLException: No active warehouse selected in the current session.  Select an active warehouse with the 'use warehouse' command.It turns out that I had not set an active warehouse for the user, a setting that might have changed after I stopped working actively in the Snowflake admin tool and a new session was initiated which did not have a default warehouse assigned.The solution is to assign a default warehouse to the user. This can be done either by assigning a default warehouse to the user using Snowflake user management, or by defining the default warehouse in the connection parameters, as you can see in the example in the connection configuration paragraph. When you make changes like these, you might have to disconnect the session from the Snowflake admin tool to force a new session with the new settings.

Data preview error: NumberFormatException

In most cases, this is because you are loading values larger than Integer into a HANA Integer field. Check paragraph Prepare the CamelJDBC adapter and XML files on how to change the mapping for source data type NUMBER(38,0).You might also have another data type mapping issue and have to adjust another source to target data type mapping. Check the agents framework.trc logs for more clues. Usually the erroneous value is listed, from which you can deduct which source or target data type is the culprit.

Cannot load JDBC driver class

The following error might appear in the DP Agent framework.trc file:

Failed to query schemas. org.apache.commons.dbcp.SQLNestedException: Cannot load JDBC driver class 'net.snowflake.client.jdbc.SnowflakeDriver'

In that case:

  • Check if the driver is really the right Snowflake JDBC driver (in my case the file name has the pattern "snowflake-jdbc-<version>.jar") and if permissions allow the dp agent user to read it;
  • Check if you uploaded into the right folder (<DPAgent_root>/camel/lib);
  • Make sure you restarted the DP Agent after you uploaded the driver;

No specific error, but the connection doesn't work

If the connection validation throws a non-specific error such as "internal error: Cannot get remote source objects [...]" and the DP Agent logs do not show any more specific errors, then double check the hostname of your Snowflake instance. If there are errors in the hostname, this might not be be found in the DP Agent log files.

Conclusion

Hopefully this blog helped you setting up or troubleshooting your connection from SAP Datasphere to Snowflake, using the Generic JDBC connection tile. If you have any experiences to add, just leave them in the comments. 

19 Comments