I'd like to use this blog post to quickly run over the various methods for connecting to SAP Data Warehouse Cloud from Python, e.g. from a Jupyter notebook. Since SAP Data Warehouse Cloud sits on top of SAP HANA Cloud and allows you to connect to it with read & write access, this altogether boils down to connecting to SAP HANA Cloud from Python. These standard ways exist:

  • hdbcli - standard Python client for SAP HANA, available on PyPi

  • SQLAlchemy is a Python SQL toolkit and object-relational mapper. By adding the dialect sqlalchemy-hana to it, you can connect to HANA in a very convenient manner and e.g. read HANA tables into Pandas DataFrames or easily write Pandas DataFrames back to SAP HANA. It is not officially endorsed by SAP, but still works like a charm in most situations.

  • hana_ml is the Python machine learning client for SAP HANA. It is geared towards data scientists and makes HANA's ML functionality easily available to a Python-savvy audience, but any Python fan can use it to conveniently access HANA data.


Note that in this blog post I'll only explore the first two, because for hana_ml, excellent material is already available in hana-ml-samples@GitHub. For connectivity, just open any of the notebooks published in that repo.
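For completeness, here's roughly what a hana_ml connection looks like. This is a minimal sketch with placeholder credentials (host, user and password are assumptions), not a substitute for the notebooks in that repo:

# Minimal hana_ml connection sketch; install via: pip install hana-ml
# Host, user and password are placeholders - use the database user details
# created in DWC as described in the next section
from hana_ml.dataframe import ConnectionContext

cc = ConnectionContext(
    address="xyz.hana.prod-eu10.hanacloud.ondemand.com",
    port=443,
    user="PYTHONCODESAMPLES#PYTHON",
    password="<your pwd here>",
    encrypt=True
)
print(cc.hana_version())  # quick smoke test of the connection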

Setting up a Database User in DWC



  1. Open Space Management of SAP Data Warehouse Cloud and find your space (here: PYTHONCODESAMPLES). Scroll down and start creating your database user for data consumption as well as data ingestion.

  2. Specify the user name suffix (here: PYTHON)

  3. Next, tell the system whether you want to enable read access directly from the space schema (same name as the space, i.e. PYTHONCODESAMPLES in my case). With this, your database user will have access to all views (not tables) that were flagged for consumption in their properties in Data Builder

  4. Then tell the system whether you also want to write back. If yes, a so-called Open SQL Schema is generated with the name <schema>#<user> (hence in my case PYTHONCODESAMPLES#PYTHON). Note that this is really a different schema: you will be able to consume its data in your space schema, but you will not be able to write directly to the space schema itself (this is for security).


Now let's see how this looks in the system:


After hitting Create, the system shows all relevant connection details, i.e. host, port (443), user name & password.


Don't forget to copy especially the password, since this is the only place it is ever shown. If you forget to copy it, you'll need to inspect the database user again and request a new password.

Note: If your password contains slashes ("/"), you should request a new one. SQLAlchemy has issues with slashes since it requires them in escaped form, and otherwise throws hard-to-understand error messages ("Connect failed (invalid SERVERNODE)").
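If requesting a new password is inconvenient, URL-encoding the password before assembling the connection string may also help. A small sketch (the password is hypothetical, and I haven't verified this against every sqlalchemy-hana version):

# Sketch: URL-encode special characters so they survive URL parsing
from urllib.parse import quote_plus

rawPwd = "my/secret"          # hypothetical password containing a slash
safePwd = quote_plus(rawPwd)  # yields "my%2Fsecret"
# use safePwd instead of the raw password when building the
# 'hana://user:pwd@host:port' connection URL shown further below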

Add your IP to DWC's IP Allowlist


In order for DWC to accept calls from Python or your database client, your IP needs to be allowlisted. Just ask your search engine of choice "what is my IP" to check your public IP. Then open DWC Homepage > Configuration > IP Allowlist and add that IP there. If you skip this step, DWC will not accept your call, resulting in errors of the type "cannot connect to database...".
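If you prefer to stay in Python, a quick sketch like the one below also does the trick. It assumes the requests package and uses the public ipify service (any similar service works):

# Sketch: determine your public IP from Python via the ipify service
import requests

print(requests.get("https://api.ipify.org").text)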



Setting up your Python environment


Let's now spin up a Jupyter notebook and do all subsequent actions directly in it. For simplicity's sake, I copy the respective cell contents below one by one.


First you need to install hdbcli as well as SQLAlchemy. SQLAlchemy comes in two parts: the main engine and a separate extension for the HANA-specific 'dialect'. Install these three via PyPI/Conda, plus python-dotenv, which we'll use to load credentials from a .env file:

%pip install hdbcli
%conda install -c anaconda sqlalchemy
%pip install sqlalchemy-hana
%conda install -c conda-forge python-dotenv


Obviously your SAP HANA Cloud user and password should not appear in your GitHub repo, so it's best to put them into a separate .env file that you prevent from being committed via .gitignore.

Copy the password into a new file .env with this content

TEST=some test value
HANA_PWD_PYTHONCODESAMPLES=<your pwd here>


You can keep this file out of git by adding a .gitignore file. This will prevent git from uploading the .env file to the repository. .gitignore should contain a single line that says:
.env

Then you can start loading the libraries, test the environment and save the connection details. Let's start by importing the respective libraries
import os
from hdbcli import dbapi
from dotenv import load_dotenv
import pandas as pd

from sqlalchemy import create_engine
from sqlalchemy.engine import reflection
import sqlalchemy.types

Next, test the environment file. It should yield 'some test value', since this is what the .env file contains for the key "TEST"
# testing out that reading from .env works as expected
load_dotenv()
os.environ.get("TEST")

And let's set some constants for use in all subsequent calls. For simplicity, we also specify the name of an example table & view (both should obviously exist in their respective schemas)
# Connection constants for space PYTHONCODESAMPLES
dbHost="xyz.hana.prod-eu10.hanacloud.ondemand.com" #you should have noted them above
dbPort=443
dbUser="PYTHONCODESAMPLES#PYTHON"
dbPwd=os.environ.get("HANA_PWD_PYTHONCODESAMPLES") #load pwd from .env file
dbIngestionSchema="PYTHONCODESAMPLES#PYTHON"
dbConsumptionSchema="PYTHONCODESAMPLES"
dbExampleTable="NEWTABLE"
dbExampleView="V_Opportunities"

Connectivity via HANA Database Client hdbcli


hdbcli is the standard HANA client for all connections. It is supported by SAP directly and installable via PyPI. The example below is taken directly from SAP's standard documentation.
#Import your dependencies
import platform
from hdbcli import dbapi

#verify that this is a 64 bit version of Python
print ("Platform architecture: " + platform.architecture()[0])

#Initialize your connection
conn = dbapi.connect(
    address=dbHost,
    port=dbPort,
    user=dbUser,
    password=dbPwd,
    encrypt=True,                 # must be set to True when connecting to HANA Cloud
    sslValidateCertificate=False  # recommended True for HANA Cloud, False for HANA Express
)
#If no errors, print connected
print('connected')

Now open a connection cursor and start printing out the first couple of records to the console
cursor = conn.cursor()
sql_command = f'select top 2 * from "{dbConsumptionSchema}"."{dbExampleView}"'
print(sql_command)
cursor.execute(sql_command)
rows = cursor.fetchall()
for row in rows:
    for col in row:
        print("%s" % col, end=" ")
    print(" ")

In my case, output was this:


If there is a table in your ingestion schema (you might want to create it ahead of time in SAP HANA Database Explorer or any other database client like e.g. DBeaver), then this code will also work.
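If you'd rather create such a table from Python right away, a minimal hdbcli sketch could look like the following. The table layout is illustrative only, and it reuses the conn object from above:

# Sketch: create an example table in the ingestion schema via hdbcli
cursor = conn.cursor()
cursor.execute(
    f'CREATE COLUMN TABLE "{dbIngestionSchema}"."{dbExampleTable}" '
    '("id" INTEGER, "col1" NVARCHAR(10))'
)
cursor.close()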

Note: we create the table further down in the post by writing a Pandas DataFrame to SAP HANA.
cursor = conn.cursor()
sql_command = f'select top 2 * from "{dbIngestionSchema}"."{dbExampleTable}"'
print(sql_command)
cursor.execute(sql_command)
rows = cursor.fetchall()
for row in rows:
    for col in row:
        print("%s" % col, end=" ")
    print(" ")

In my case, the output is this:



Connectivity via SQLAlchemy


SQLAlchemy is an object-oriented way of connecting to databases. Its HANA dialect is only informally supported and has minor issues, but altogether I still consider it way more practical than hdbcli, as you'll see below, primarily because of its tight integration with Pandas. Plenty of blog posts cover SQLAlchemy in general, so I'll focus on the HANA side here.

Apart from that, there are really few HANA specifics that you need to be aware of to get going. So let's dive in straight away.
# creation of central connection objects. Set echo=True if you need a log of all statements issued towards the DB
engine = create_engine(f'hana://{dbUser}:{dbPwd}@{dbHost}:{dbPort}', connect_args={
    'sslTrustStore': "",
    'encrypt': 'true',
    'sslHostNameInCertificate': '*'
}, echo=False)
connection = engine.connect()
inspector = reflection.Inspector.from_engine(engine)

#rp = connection.execute(f'select top 10 * from "{dbConsumptionSchema}"."{dbExampleView}"')
rp = connection.execute(f'select top 2 * from "{dbConsumptionSchema}"."{dbExampleView}"')
print(rp.fetchall())

Your output will be a list of sqlalchemy.engine.result.RowProxy objects:
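As a small convenience sketch, you can also turn such a result set into a Pandas DataFrame in one go, using the result's keys() as column names:

# Sketch: convert the query result into a Pandas DataFrame
rp = connection.execute(f'select top 2 * from "{dbConsumptionSchema}"."{dbExampleView}"')
df = pd.DataFrame(rp.fetchall(), columns=rp.keys())

That said, the sections below show an even more direct route via pd.read_sql_table.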


Note: If you run into a "Connect failed (invalid SERVERNODE)" error here, you should double-check your password. If it contains a slash, SQLAlchemy mistakenly treats it as a control character rather than escaping it, causing authentication to fail. You should consider requesting a new password in DWC.

Check what schemas exist


With the next call you can get all database schemas that your database user has access to
engine.dialect.get_schema_names(connection=connection)

In my case this is
['dwc_global',
'dwc_tenant_owner',
'pythoncodesamples',
'PYTHONCODESAMPLES#PYTHON',
'sys',
'_sys_bi',
'_sys_di',
'_sys_plan_stability',
'_sys_sql_analyzer',
'_sys_task']

Beware when using these schema names in subsequent calls: SQLAlchemy needs schema names in mixed case, while plain SQL requires them in upper case only!
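To make this quirk concrete, here's a short sketch using my schema names (mirroring the calls used later in this post):

# SQLAlchemy reflection wants the lower-case schema name...
cols = inspector.get_columns(table_name="V_Opportunities", schema="pythoncodesamples")
# ...while hand-written SQL needs the quoted upper-case name
rp = connection.execute('select top 1 * from "PYTHONCODESAMPLES"."V_Opportunities"')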

Read views from consumption schema


The consumption schema only exposes views. Use a slightly different method to read those:
engine.dialect.get_view_names(connection=connection, schema=dbConsumptionSchema)

for my schema yields
['V_Opportunities', 'V_SalesReps']

Now any view is directly at our fingertips and loads straight into a Pandas DataFrame
df = pd.read_sql_table(dbExampleView, con=engine, schema=dbConsumptionSchema)

This is really a standard DataFrame, as you can see here:
type(df), df.columns, df.shape

yields
(pandas.core.frame.DataFrame,
Index(['Opp_ID', 'Name', 'Account', 'Deal Size', 'Owner', 'Email',
'Sales Phase', 'Forecast', 'Projected Close Date', 'Close Probability',
'Opportunity Phase', 'RFP completed', 'Discount Pct'],
dtype='object'),
(12, 13))

Writing to the Database


Pandas DataFrames can be written directly to the database via pandas.DataFrame.to_sql. You can choose to replace, append or fail if the table already exists. The table will be added to the ingestion schema (here: PYTHONCODESAMPLES#PYTHON). If you want to use it in DWC later, you need to import it from that source.


Let's now create a DataFrame and write it to SAP HANA. If the table exists, you can choose to abort, replace or append; I typically just overwrite my current table for simplicity. Note that SAP HANA is fine with this: even if there are views in the consumption schema built on that table, you can still replace the full table and the views will instantly yield the new data.
new_df = pd.DataFrame({"id": [1,2,3], "col1":["a","b","c"]})
new_df.to_sql("newtable",con=engine, index=False, if_exists="replace")

# Now the table has been written to SAP HANA as we can check e.g. by reading all table names of the schema
engine.table_names(schema=dbIngestionSchema)

You can also influence the data types used by the database by setting the dtype parameter when calling to_sql.
new_df = pd.DataFrame({"id": [1,2,3], "col1":["a","b","c"], "postcode":[69118, 69125, 69127]})
table_dtypes = {"postcode":sqlalchemy.types.String(15)}
new_df.to_sql(name="newtable",con=engine, dtype=table_dtypes, index=False, if_exists="replace")

While previously the dtype of column postcode was int64, it will now be object when you read the table again from the database
updatedTable = pd.read_sql_table("newtable", con=engine, schema=dbIngestionSchema)
updatedTable.dtypes

Reading tables from ingestion schema


Anything written to the ingestion schema is technically a table. Beware of upper/lower case in table names. Let's now read the table that we previously wrote from a Pandas DataFrame:
tables = engine.table_names(schema=dbIngestionSchema)
table = pd.read_sql_table(tables[0], con=engine, schema=dbIngestionSchema)
table

This will yield



Read table/view metadata


Finally, you can also read metadata for tables & views. Note that the types below are sqlalchemy.types:
itMeta = inspector.get_columns(table_name='V_Opportunities', schema=dbConsumptionSchema.lower())
itMeta

This will yield a list of dictionaries describing each column as in


I hope this blog post helps in your endeavours to use SAP Data Warehouse Cloud from your Python environment. If you have any questions, please drop a note and I'm happy to help out.

Cheers, Jan