Technology Blogs by SAP
schneidertho
Product and Topic Expert
In my last blog post I wrote about what it means that SAP Data Hub is a containerized application. Today I want to talk about the installation of SAP Data Hub. All explanations relate to SAP Data Hub 2.3 or newer.

This has a prelude: For SAP TechEd I had submitted a MeetUp about installing SAP Data Hub. I wanted to demonstrate the installation during this MeetUp with a live demo… and unfortunately realized a few hours beforehand that the MeetUp room did not come with a monitor ☹. My improvisation consisted of a few hastily prepared whiteboard drawings. I took these drawings as the basis for this blog post.

A lot has been written about installing SAP Data Hub in the official documentation (in particular here and here). My intention behind this blog post is clearly not to enable you to install SAP Data Hub without consulting the documentation. Instead, I would like to complement the documentation by looking a bit behind the scenes.

Overview of an installation


Simply put, every installation of SAP Data Hub, independent of where you install the software, consists of three phases: preparation, installation and post-installation.

The installation phase consists of four sub-phases: pre-flight checks, mirroring, deployment and validation. This is what it looks like:



During the installation you get in touch with different “things”. I have tried to depict them in the following diagram:


Preparation


During the preparation phase, you need to set up an installation host (1) to run the installation procedure, as well as a Kubernetes cluster (2) and a “local” container registry (3) into which the installation procedure installs SAP Data Hub.

Remark: If SAP Data Hub is operated as part of an SAP system landscape, the recommended installation procedure is SAP Maintenance Planner (see here). An alternative is a command line tool (install.sh) delivered with SAP Data Hub. Today I will use the command line tool. I might write a separate blog post about using SAP Maintenance Planner… or maybe a colleague will. Behind the scenes, install.sh always runs, so what you learn today stays valid.

The installation host (1) is a Linux computer / virtual machine. It must meet certain requirements. For example, it needs to have Docker, Python, the Kubernetes command-line tool (kubectl) and the Kubernetes package manager (helm) installed.
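To illustrate the tool prerequisites, here is a minimal Python sketch that reports which of the listed tools are missing from the PATH of the installation host. It is not part of SAP Data Hub; the executable names (in particular `python3` rather than `python`) are assumptions, and it only checks presence, not versions.

```python
import shutil
from typing import Callable, List, Optional, Sequence

# Tools the post names as prerequisites on the installation host.
# Executable names are assumptions (Python may be installed as "python").
REQUIRED_TOOLS = ("docker", "python3", "kubectl", "helm")


def missing_tools(
    tools: Sequence[str] = REQUIRED_TOOLS,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the tools that cannot be found on the PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing on installation host:", ", ".join(missing))
    else:
        print("All required tools found.")
```

The `which` parameter exists only so the lookup can be swapped out, for example in a test.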

The installation procedure for SAP Data Hub assumes that you have a running Kubernetes cluster (2) as well as a “local” container registry (3). Just like the installation host, both must meet certain requirements. The Kubernetes cluster (2) needs to consist of at least three nodes (all details can be found here).
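As a rough self-check of the cluster size, the following Python sketch counts worker nodes in the JSON output of `kubectl get nodes -o json` and flags workers with too little memory. The thresholds are assumptions: three workers follows the text above, and the memory limit is set a little below the 32 GB per worker that the author cites in the comments, because the capacity Kubernetes reports is typically somewhat lower than the nominal VM size. Nodes carrying a master or control-plane role label are excluded; the official documentation remains the authoritative source for requirements.

```python
import json
import subprocess
from typing import Dict, List

# Binary suffixes Kubernetes uses for memory quantities (decimal ones omitted).
_BINARY_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}
# Labels that mark control-plane (master) nodes; these are not counted as workers.
_MASTER_LABELS = ("node-role.kubernetes.io/master", "node-role.kubernetes.io/control-plane")


def parse_memory(quantity: str) -> int:
    """Convert a quantity such as '32946176Ki' to bytes."""
    for suffix, factor in _BINARY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # plain bytes; decimal suffixes (K, M, G) raise ValueError


def check_workers(node_list: Dict, min_nodes: int = 3, min_memory_gib: float = 30) -> List[str]:
    """Return human-readable problems; an empty list means the check passed."""
    workers = [
        node for node in node_list.get("items", [])
        if not any(label in node["metadata"].get("labels", {}) for label in _MASTER_LABELS)
    ]
    problems = []
    if len(workers) < min_nodes:
        problems.append(f"only {len(workers)} worker node(s), need at least {min_nodes}")
    for node in workers:
        memory_gib = parse_memory(node["status"]["capacity"]["memory"]) / 2**30
        if memory_gib < min_memory_gib:
            problems.append(f"{node['metadata']['name']}: {memory_gib:.1f} GiB memory")
    return problems


def load_nodes() -> Dict:
    """Read the node list via kubectl (requires access to the cluster)."""
    result = subprocess.run(["kubectl", "get", "nodes", "-o", "json"],
                            capture_output=True, text=True, check=True)
    return json.loads(result.stdout)
```

On a host with cluster access you would call `check_workers(load_nodes())`; keeping the parsing separate from the kubectl call makes the logic testable without a cluster.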

Depending on whether you install SAP Data Hub on-premise or in the cloud (and on which cloud provider), the steps to spin up the cluster and registry differ. I do not want to bloat this blog post by listing the individual commands.

Installation


After you have prepared for the installation, you install SAP Data Hub. To do so, you download the software archive from the SAP Software Download Center (4) to the installation host (1):



After unpacking the software archive (in this example DHFOUNDATION03_3-80004015.zip, i.e. SAP Data Hub 2.3 patch 3), you find the following folder structure on the installation host (1):



Now the fun begins. You start the installation by running the command line tool (install.sh). The command line tool has two mandatory parameters: the Kubernetes namespace used to deploy SAP Data Hub and the “local” container registry (3). You can start the installation like this:



The command line tool has many more optional parameters. If you do not pass them, it will later prompt for (some of) them.
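Since the screenshot of the call is not reproduced here, this Python sketch shows how the two mandatory parameters could be passed when scripting the installation. The flag spellings `--namespace=` and `--registry=` are an assumption based on the SAP Data Hub 2.x documentation (check the documentation for your version); `--validate` is the flag mentioned in the comments below. The namespace and registry values are hypothetical.

```python
import shlex
from typing import List


def install_command(namespace: str, registry: str, validate: bool = False) -> List[str]:
    """Build the install.sh call as an argument list (flag names are assumptions)."""
    cmd = ["./install.sh", f"--namespace={namespace}", f"--registry={registry}"]
    if validate:
        cmd.append("--validate")  # re-run only the validation phase
    return cmd


# Hypothetical values: namespace "datahub", a registry in Google Container Registry.
print(shlex.join(install_command("datahub", "eu.gcr.io/my-project")))
# prints: ./install.sh --namespace=datahub --registry=eu.gcr.io/my-project
```

Passing the list to `subprocess.run(..., check=True)` from the unpacked archive directory would start the (interactive) installation.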

Pre-Flight Checks


After you have started the installation, the command line tool performs a couple of checks to ensure that the necessary prerequisites to install SAP Data Hub are fulfilled. These checks are supposed to ensure that the installation does not break halfway. They are comparable to the checks a pilot performs prior to takeoff to minimize the risk of a plane crash. Hence SAP’s developers called them “pre-flight checks”:
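The idea behind pre-flight checks (verify everything first, change nothing until all checks pass) can be sketched in Python. This is a hypothetical illustration of the pattern, not how install.sh is implemented; the `run_preflight` and `install` helpers and any check names are made up.

```python
from typing import Callable, Dict, Iterable, List


def run_preflight(checks: Dict[str, Callable[[], bool]]) -> List[str]:
    """Run every check and return the names of those that failed."""
    failed = []
    for name, check in checks.items():
        try:
            ok = check()
        except Exception:
            ok = False  # a check that crashes counts as failed
        if not ok:
            failed.append(name)
    return failed


def install(checks: Dict[str, Callable[[], bool]], steps: Iterable[Callable[[], None]]) -> None:
    """Hypothetical driver: abort before the first step if any check fails."""
    failed = run_preflight(checks)
    if failed:
        raise RuntimeError(f"pre-flight checks failed: {', '.join(failed)}")
    for step in steps:
        step()
```

Running all checks instead of stopping at the first failure means you see every unmet prerequisite in one pass.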



After the pre-flight checks, the command line tool prompts for additional parameters (those you did not pass when calling install.sh). Finally, it asks you to confirm the parameters (aka “configuration”) for the installation:


Mirroring


After you have confirmed the parameters, the command line tool first mirrors the container images for SAP Data Hub. They will later be used to run the different components of SAP Data Hub on the Kubernetes cluster (2).

Mirroring means the command line tool pulls the necessary container images from the (private) SAP container registry (5) as well as from (public) third-party container registries (6) to the installation host (1). Afterwards, it tags the container images for the “local” container registry (3) and pushes them there. “Local” means the container registry which is used by the Kubernetes cluster (2).
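The pull / tag / push sequence can be sketched in Python. The sketch only builds the docker commands instead of running them. It assumes the image path after the registry host stays unchanged in the “local” registry, which matches the image name in the kubectl describe output quoted in the comments (registry.azurecr.io/com.sap.hana.container/…); how install.sh names Docker Hub images is not shown in the post, so that part is an assumption.

```python
from typing import List, Tuple


def split_registry(image: str) -> Tuple[str, str]:
    """Split 'host/path:tag' into ('host', 'path:tag').

    Follows Docker's convention: the first component is a registry host only if
    it contains '.' or ':' or is 'localhost'; otherwise the image is on Docker Hub.
    """
    first, sep, rest = image.partition("/")
    if sep and ("." in first or ":" in first or first == "localhost"):
        return first, rest
    return "docker.io", image


def mirror_commands(image: str, local_registry: str) -> List[List[str]]:
    """Commands to copy one image into the local registry (not executed here)."""
    _, path = split_registry(image)
    target = f"{local_registry}/{path}"
    return [
        ["docker", "pull", image],
        ["docker", "tag", image, target],
        ["docker", "push", target],
    ]
```

This is also why every image appears twice on the installation host afterwards: once under its source name and once under the tag for the local registry.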

The following screenshot shows some of the container images on the installation host (1):



You can see that each container image is listed twice:

  • Once the container image is tagged with the container registry it was pulled from, e.g. the SAP container registry (73554900100200008830.dockersrv.repositories.sap.ondemand.com) or Docker Hub (docker.io).

  • Once it is tagged with the container registry it was pushed to, i.e. the “local” container registry (3) used by the Kubernetes cluster (2). In this example this is the container registry eu.gcr.io/…234664.


The SAP container registry (5) includes the container images for all versions (support packages, patches) of SAP Data Hub. The command line tool is “bound” to one version of SAP Data Hub (in this example SAP Data Hub 2.3 patch 3). All relevant container images are listed in the ./tools/images.sh file inside the software archive downloaded from the SAP Software Download Center (4).

Deployment


After all necessary container images have been mirrored, the command line tool deploys the different components of SAP Data Hub. For this it uses the Kubernetes package manager (helm). At the end of the deployment, all containers needed by SAP Data Hub will run on the Kubernetes cluster (2). The cluster will look similar to this now (the screenshot shows all running containers):



The files helm needs are stored in the ./deployment directory inside the software archive downloaded from the SAP Software Download Center (4).
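To check yourself whether all containers have come up after the deployment, a Python sketch like the following lists the pods that are not ready, based on the JSON output of `kubectl get pods -n <namespace> -o json`. It is an illustration, not part of install.sh; it treats a pod as ready only if it has container statuses and all of them report ready, and it skips pods in phase `Succeeded` (finished jobs).

```python
import json
import subprocess
from typing import Dict, List


def pods_not_ready(pod_list: Dict) -> List[str]:
    """Return 'name (phase)' for every pod whose containers are not all ready."""
    not_ready = []
    for pod in pod_list.get("items", []):
        status = pod.get("status", {})
        if status.get("phase") == "Succeeded":
            continue  # completed job pods are expected to stop
        statuses = status.get("containerStatuses", [])
        if not statuses or not all(c.get("ready") for c in statuses):
            not_ready.append(f"{pod['metadata']['name']} ({status.get('phase', 'Unknown')})")
    return not_ready


def load_pods(namespace: str) -> Dict:
    """Read the pods of a namespace via kubectl (requires access to the cluster)."""
    result = subprocess.run(["kubectl", "get", "pods", "-n", namespace, "-o", "json"],
                            capture_output=True, text=True, check=True)
    return json.loads(result.stdout)
```

For a pod stuck in `Pending`, `kubectl describe pod <name>` usually shows why it cannot be scheduled, for example insufficient memory.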

Validation


Finally, install.sh runs a couple of validations to ensure that SAP Data Hub is functional. The following screenshot shows the output in case all validations are successful:



The validations include:

  • Creation of tables in SAP Vora, execution of several queries (vora-cluster)

  • Execution of smoke tests for Spark (vora-sparkkonk8s)
    Remark: Certain features of SAP Data Hub make use of Spark and run Spark workloads on the Kubernetes cluster.

  • Connection to SAP Data Hub System Management and verification of installed applications, e.g. Connection Management, Metadata Explorer, Vora Tools (vora-vsystem)

  • Verification of the SAP HANA database used by applications like the aforementioned ones (datahub-app-base-db)


You can find the detailed results of the validations in the ./logs folder:


Post-Installation


After you have successfully installed the software, additional post-installation steps may be necessary. Again (just like for the preparation), the steps depend on whether you install SAP Data Hub on-premise or in the cloud (and on which cloud provider). And again, I do not want to bloat this blog post by listing the individual commands. If you want to know the details, take a look at the official documentation.

Hooray, SAP Data Hub is running. You can log on with the user / password passed to the command line tool earlier:



That’s all for now. I hope you found this blog post interesting. Next time I will most likely write something about data pipelines and workflows…
34 Comments
0 Kudos
Hello Thorsten

Maybe you can help me.

I want to install Data Hub on a Kubernetes Cluster on a Centos 7 VM.

But my hana-0 and vora-*** pods won't start.

The only error I get from the installation script is: waiting for these pods to become ready.

After that, the installation is canceled.

I hope you can help me.
schneidertho
Product and Topic Expert
0 Kudos
Hi Miguel,

how big is your cluster? Is it one VM/node? The hana-0 pod alone requests 20GB of memory. My assumption is that it does not get this memory and the pod hence never gets ready. Do you have a chance to use a bigger cluster?

Cheers
Thorsten
0 Kudos
Hello Thorsten

Thanks for your fast answer.

I have 3 VMs in my cluster. I upgraded my master server to 64 GB of memory, a 64 GB hard drive and 6 cores, but the pods still won't start. The other two servers each have 16 GB of memory, a 32 GB hard drive and 4 cores.

How much memory did you give your VMs?

Cheers

Miguel
schneidertho
Product and Topic Expert
0 Kudos
Hi Miguel,

the minimum requirement is 3 nodes, 32 GB RAM each (the 3 nodes are the workers only and do not include the master)

https://help.sap.com/viewer/e66c399612e84a83a8abe97c0eeb443a/2.3.latest/en-US/79724de552db4b2b81c4a8...

I am not saying that it is completely impossible to "tweak" things. But I do not like to recommend things we don't support as per the official documentation. Hope that helps...

Cheers

Thorsten
0 Kudos
Hi Thorsten,

 

Thank you for the helpful blog.

I am trying to install SAP Data Hub and the installer asked me for login credentials to access SAP Docker Artifactory.

I have tried with my SAP S-Users (including with my SAP Logon to user DMZ stores) unsuccessfully - I got error: [ERROR] Could not login with the provided credentials!

How can I request credentials for SAP Docker Artifactory access (technical user or S-user)?

 

Cheers,

Anton
schneidertho
Product and Topic Expert
0 Kudos
Hi Anton,

do you (or does your company) have a license for SAP Data Hub? Otherwise the download will not work.

Best regards

Thorsten
0 Kudos
Hello Thorsten,

I am installing SAP Data Hub 2.3. During the installation, I am stuck at the mirroring process. Mirroring one of the images throws the error below:



Can you please let me know where on the file system the images are downloaded and extracted? As it says "no space left on device", we need to find out whether the directory is on the installation host or on the Kubernetes cluster.

Please advise.

Thanks,
Shivani
schneidertho
Product and Topic Expert
0 Kudos
Hi Shivani,

I looked at your screenshot and also checked with our team developing the installer. It seems your installation host does not have enough space. You need 10 GB of disk space and 20 GB for container images:

https://help.sap.com/viewer/e66c399612e84a83a8abe97c0eeb443a/2.4.latest/en-US/40cc1c6cd72546378182f0...
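For a quick check before re-running the installation, here is a Python sketch that compares free space against those figures using `shutil.disk_usage`. The mapping of directories to sizes is an assumption: the current directory for the installation files and `/var/lib/docker` (Docker's default data directory on Linux, which may differ on your host) for the container images. It does not add the requirements together when both directories live on the same filesystem.

```python
import shutil
from typing import Callable, Dict, List

GIB = 2**30

# Assumed mapping of directory -> required free space in GiB, based on the
# figures above (10 GB for the installation, 20 GB for container images).
REQUIREMENTS = {".": 10, "/var/lib/docker": 20}


def space_shortfalls(
    requirements: Dict[str, float] = REQUIREMENTS,
    usage: Callable = shutil.disk_usage,
) -> List[str]:
    """Return a message for every directory with less free space than required."""
    problems = []
    for path, needed_gib in requirements.items():
        free_gib = usage(path).free / GIB
        if free_gib < needed_gib:
            problems.append(f"{path}: {free_gib:.1f} GiB free, need {needed_gib} GiB")
    return problems
```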

Hope that helps.

Br

Thorsten
0 Kudos
Hello,

I got past the above error, but now I am getting an error in the validation phase.





 

Do you have any idea ?

Thanks,

Shivani
schneidertho
Product and Topic Expert
0 Kudos
Hi,

have you taken a look at the logs of dqp-validator-job as suggested in the validation log? What do the logs "say"?

Cheers
Thorsten
0 Kudos
Hello Thorsten,

Below are the logs for the dqp-validator-job:

Running query: SELECT USER_NAME FROM SYS.USERS WHERE USER_NAME='default\vora-admin'; connecting to 10.35.255.244 at port 10002 (10.35.255.244:10002) ... switched to existing session "" query send... error on server response: "could not handle api call, failure reason : :-1, CException, Code: 10021 : Runtime category : an std::exception wrapped. invalid user default\vora-adminNext level: invalid user default\vora-admin " cease to connect to 10.35.255.244:10002... Validating schemas have been created in Vora ... Running query: SELECT * FROM SYS.SCHEMAS connecting to 10.35.255.244 at port 10002 (10.35.255.244:10002) ... switched to existing session "" query send... error on server response: "could not handle api call, failure reason : :-1, CException, Code: 10021 : Runtime category : an std::exception wrapped. invalid user default\vora-adminNext level: invalid user default\vora-admin " cease to connect to 10.35.255.244:10002...

I do not understand where this user default\vora-admin is created.

Thanks

Shivani
schneidertho
Product and Topic Expert
0 Kudos
Hi,

can you please use the “--validate” flag with install.sh to validate the installation.

I have talked to the development team. Hard to know the root cause w/o diving into more details of the installation log.

We have recently fixed an error in 2.4.1 which sporadically led to this problem. If install.sh with the “--validate” flag passes, your cluster is good.

If not, it is best to open a ticket and let development support check this.

Best regards
Thorsten
0 Kudos
Hello!

Thank you for this blog, it was helpful while setting up DH on GCP. I have a service account attached to my Kubernetes cluster and nodes. This service account has GCS Storage Admin capabilities. But I always see this WARNING in my trace logs for pipelines:

 

"3/21/2019, 2:55:10 PM","WARN ","Scope 'devstorage.read_write' missing. Unable to push images to GCR",vflow,container,1,newGoogleContainerRegistry

 

Any suggestions?

 

Thank You,

Will
schneidertho
Product and Topic Expert
0 Kudos
Hey Will,

Can you run any graphs from the modeling environment (like the simple data generator sample which is delivered)?

If that works, I currently would not worry about the warning. If it does not work, then something with the permissions is not right.

Can you check what happens when you run the data generator sample?

Cheers
Thorsten
marcus_schiffer
Active Participant
0 Kudos
Hi,

 

I have the same issue: hana-0 will not start. My cluster is on Azure with 4 nodes, each with 8 vCPUs and 32 GB RAM.

What am I missing ?
schneidertho
Product and Topic Expert
0 Kudos
Hi,

hard to say without further information. Please check what the logs for the pod say (run kubectl logs and/or kubectl describe...).

Cheers
Thorsten
marcus_schiffer
Active Participant
0 Kudos

Hi,

 

here is some output from kubectl describe.

It seems to be an authorization issue. However, I am not sure which user would cause this error.

The pre-flight checks showed that pull / push are OK.

So what am I missing ?

Failed to pull image “registry.azurecr.io/com.sap.hana.container/base-opensuse42.3-amd64:2.03.031.00-3.1.0”: rpc error: code = Unknown desc = Error response from daemon: Get https://registry.azurecr.io/v2/com.sap.hana.container/base-opensuse42.3-amd64/manifests/2.03.031.00-...: unauthorized: authentication required

schneidertho
Product and Topic Expert
0 Kudos
Hi Marcus,

I am not sure if the pre-flight checks also check the pull from the Azure registry or only from the SAP registry.

To me it seems either your service principal does not have the necessary roles or the image pull secret is missing.

db8ac33b71d34a778adf273b064c4883 has written an excellent post on how to install Data Hub on Azure: https://blogs.sap.com/2019/01/10/your-sap-on-azure-part-13-install-sap-data-hub-on-azure-kubernetes-...

Have you checked this?

Cheers

Thorsten


BJarkowski
Active Contributor
0 Kudos
Thanks thorsten.schneider for a mention and your kind words about my blog.

marcus.schiffer  I also think this is an issue with the Service Principal authorization. In my blog there is a script that should fix it. I don't want to copy it here, but you can easily identify it by looking for "Modify for your environment. The ACR_NAME is the name of your Azure Container".
marcus_schiffer
Active Participant
0 Kudos
Hi,

thanks for the reply.

I solved this by adding the application ID of the repository to the role contributor in the ACR.

Now it works and the installation finished.
marcus_schiffer
Active Participant
0 Kudos
Now I have a new issue: Data Hub runs, but no pipeline runs in the Pipeline Modeler.

The error is always:

cannot connect to docker registry https://registry.azurecr.io: Get https://registry.azurecr.io/v2/: http: non-successful response (status=401 body="{\"errors\":[{\"code\":\"UNAUTHORIZED\",\"message\":\"authentication required\",\"detail\":null}]}\n")


So there must be some other authorization problem.

Any help is appreciated.
BJarkowski
Active Contributor
marcus_schiffer
Active Participant
0 Kudos
Hi Bartosz,

 

this seems to be the solution; however, I am a bit lost with all this authorization stuff.

The help tells me to create the secret file with a user (that should be the name of the registry) and a password.

But I do not see a way to create this user and PW in Azure ACR.

So where would I get the password from to put it in the secret file?

 
BJarkowski
Active Contributor
0 Kudos
In Azure Container Registry blade there is an entry in the menu called Access Keys. I think I have used that.
RolandKramer
Active Contributor
0 Kudos
Hello Thorsten,
I also recently did a Data Hub 2.6 installation with SAP Maintenance Planner and the SL Container Bridge. Another blog also describes the setup of the jump server for the SLC Bridge.

Best Regards Roland
0 Kudos
Hello Marcus,

The link mentioned by Bartosz is not working anymore. I would appreciate it if you could point me to the solution.

https://help.sap.com/viewer/e66c399612e84a83a8abe97c0eeb443a/2.4.0.archive02/en-US/db861eb7aeac41d4b...

Regards

Satish

 

 
0 Kudos
Hi Marcus,

how did you solve this authorization issue ?

I don't have the rights to see the link from sap help desk.

 

Cheers

Emir
HWO
Explorer
0 Kudos
Which authorization method should we select if we install 2.6 using SAP CAL: the standard one or the extended one for Kubernetes?

 
RolandKramer
Active Contributor
0 Kudos
Hello Rahul

From the Microsoft Azure Help - https://docs.microsoft.com/en-us/azure/aks/tutorial-kubernetes-prepare-acr the Basis Authorisation method should be sufficient ...



Best Regards Roland
0 Kudos
Hello Thorsten

Maybe you can help me,

I was trying to install SAP Data Hub 2.6 on AWS EKS, and the installation got stuck in one phase with the error below. In case it is a space issue: I have already given my EC2 instance 150 GB of space. If it really is a space issue, can you please suggest how we can increase the space for hana-0?

hana-0 shows pending status.



 

Thanks & Regards,

R.Van ES.
schneidertho
Product and Topic Expert
0 Kudos
Hi,

I expect this to be insufficient memory (or CPU). You can find details when you describe the pod with the kubectl command.

Cheers
Thorsten
Rushi
Product and Topic Expert
0 Kudos


renevanes


Rushi
Product and Topic Expert
0 Kudos
Please check the output of "kubectl get sc" (storage classes). If no storage class is marked as the default, the HANA pods will fail because their persistent volumes are not bound to a storage class.
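To script that check, here is a Python sketch that reads `kubectl get storageclass -o json` and returns the classes annotated as default. It relies on the standard Kubernetes annotation `storageclass.kubernetes.io/is-default-class` (older clusters used the `storageclass.beta.kubernetes.io/` spelling, so both are checked). An empty result suggests that claims without an explicit storage class may stay unbound, which would match the pending hana-0 pod described above.

```python
import json
import subprocess
from typing import Dict, List

# Annotations that mark a StorageClass as the cluster default
# (the beta spelling was used by older Kubernetes versions).
DEFAULT_ANNOTATIONS = (
    "storageclass.kubernetes.io/is-default-class",
    "storageclass.beta.kubernetes.io/is-default-class",
)


def default_storage_classes(sc_list: Dict) -> List[str]:
    """Return the names of storage classes annotated as default."""
    defaults = []
    for sc in sc_list.get("items", []):
        annotations = sc["metadata"].get("annotations", {})
        if any(annotations.get(key) == "true" for key in DEFAULT_ANNOTATIONS):
            defaults.append(sc["metadata"]["name"])
    return defaults


def load_storage_classes() -> Dict:
    """Read the storage classes via kubectl (requires access to the cluster)."""
    result = subprocess.run(["kubectl", "get", "storageclass", "-o", "json"],
                            capture_output=True, text=True, check=True)
    return json.loads(result.stdout)
```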