One way to speed up the installation of SAP Data Hub is to have a reusable installation host on AWS to use for the installation and for maintenance tasks. I have assembled the various installation steps into this blog post as a guide for building the installation host. Using the steps below, you should have an environment that passes all the preflight checks done by the SAP Data Hub installer.
The installation guide on help.sap.com lists the Installation Host Prerequisites, but the steps are scattered across various links in its hierarchy, so I put them all together in one place. I have confirmed this for 2.3.x and 2.4.x, and it should work for future releases of Data Hub, but always confirm with the current product documentation. The prerequisites are:
Docker (minimum version 1.12.6) is installed and able to push to the internal registry.
Python 2.7 is installed.
The Python YAML package (PyYAML) is installed.
The Kubernetes command-line tool, kubectl, is required, using one of the following versions:
1.10.x (greater than or equal to 1.10.1)
kubectl must have access to the Kubernetes cluster
The Kubernetes package manager, Helm, is installed and properly configured, using one of the following versions:
2.9.x (the required version for AWS EKS)
For an AWS-specific installation, you'll also need the AWS Command Line Interface (awscli), version 1.16.73 or later.
Optional components (my personal recommendation and best practice)
screen for Linux
I'll explain the steps and provide commands for each of these items. This will save you from going to each of the various websites and deciphering each installation individually.
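Before working through the individual steps, it can help to see at a glance which of the prerequisite tools are already present on a host. The following is a minimal sketch, not part of the SAP installer: `missing_tools` is a hypothetical helper name, and it only checks that a command is on the PATH, not that its version is supported.

```shell
missing_tools() {
  # Print each named command that cannot be found on PATH, one per line.
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}
```

For example, `missing_tools docker kubectl helm aws unzip screen` prints nothing once everything in this post is installed. Assuming the Python 2.7 interpreter is available as `python2.7`, the PyYAML prerequisite can be checked separately with `python2.7 -c 'import yaml'`.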
First, provision an EC2 instance as your installation host/jumpbox
For this I will assume you have an AWS account and have appropriate permissions to create instances.
Log in to the Amazon Console and navigate to EC2. Make sure to create the EC2 instance in the same AWS region where you will install SAP Data Hub (to limit cross-region networking costs).
Click the "Launch Instance" button to start a new EC2 instance. Before you launch, check that the Region shown in the console is the one you intend to use.
The next step is to select the Amazon Machine Image (AMI) to launch. I chose Ubuntu Server 18.04 64-bit because the software and package installers are easy to use.
Next, select the EC2 instance type for your jumpbox. I chose t3.xlarge because it has 4 vCPUs, 16 GB of RAM, and up to 5 Gigabit networking, which helps in the Docker mirroring phase of the Data Hub installation. You can choose another instance type if you prefer or want to reduce costs.
Next, configure your instance details. The key change to make is the size of the jumpbox storage volume. The default storage on my instance was 8 GB, which is not enough space to download the installation files and Docker images to the local machine.
The installation guide recommends "It has at least 10GB free disk space for SAP Data Hub installation folders and files, and has at least 20GB free disk space for used container images."
My suggestion is to provision 100-200 GB to give yourself plenty of room for multiple installations and future upgrades.
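Once the instance is running, the free space can be checked against the figures quoted above (10 GB for installation files, 20 GB for container images). This is a rough sketch using `df`; the helper names `free_gb` and `has_free_gb` are made up for illustration, and it assumes the filesystem name reported by `df` contains no spaces.

```shell
free_gb() {
  # Print the whole gigabytes available on the filesystem holding path $1.
  # df -Pk reports 1024-byte blocks; field 4 of line 2 is "Available".
  df -Pk "$1" | awk 'NR==2 { print int($4 / 1024 / 1024) }'
}

has_free_gb() {
  # Succeed when path $1 has at least $2 GB free.
  [ "$(free_gb "$1")" -ge "$2" ]
}
```

For example, `has_free_gb "$HOME" 10 && echo "enough room for installation files"`. Docker keeps images under `/var/lib/docker` by default, so once Docker is installed that path is the one to check for the 20 GB figure, unless it shares a volume with your home directory.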
Optional - It is a good idea to add tags to the instance for reference later.
Configure the security groups, and make sure the keypair associated with the instance is available on your local machine so you can log in.
Once the instance is available, note its IP address and log in using SSH from your terminal program with your keypair file. The IP address is located in the lower panel of the instances view of the AWS console.
The machine images on AWS typically have ec2-user as the system user, except for Ubuntu images, which use "ubuntu". Log in from the terminal with the following:
ssh -i <keyname.pem> ubuntu@<ip address>
Note: I am on a MacBook, so I can use the pem key format. If you're on Windows, you'll use PuTTY and have to convert the .pem keyfile to a .ppk. There are lots of tutorials on AWS and StackOverflow that cover this topic.
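On macOS and Linux, ssh usually refuses a private key file that other users can read, so it is worth tightening the permissions on the downloaded .pem file before the first login. A small sketch, where `lock_down_key` and `ssh_to_host` are hypothetical helper names:

```shell
lock_down_key() {
  # Make the private key file $1 readable by its owner only.
  chmod 400 "$1"
}

ssh_to_host() {
  # Connect as the ubuntu user; $1 is the key file, $2 the instance IP address.
  lock_down_key "$1" && ssh -i "$1" "ubuntu@$2"
}
```

Usage would be `ssh_to_host <keyname.pem> <ip address>`, with the same placeholders as the command above.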
Now we can install the software on the installation host
Unfortunately, this installs the latest kubectl (1.14.1), which is not supported, so we have to roll it back to a supported version. After some research I found that 1.11.5 works. Here's how to downgrade it.
Export the path to your profile so it loads every time you log in:
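One possible way to pin a specific kubectl version is to download the binary directly into `$HOME/bin`. This is only a sketch and not necessarily the method used for this post: it assumes the upstream Kubernetes download host `dl.k8s.io`, a Linux amd64 machine, and that `curl` is installed. The helper names `kubectl_url` and `install_kubectl` are hypothetical.

```shell
kubectl_url() {
  # Build the download URL for kubectl version $1 (for example 1.11.5).
  # Assumption: upstream dl.k8s.io host, Linux amd64 binary.
  printf 'https://dl.k8s.io/release/v%s/bin/linux/amd64/kubectl\n' "$1"
}

install_kubectl() {
  # Download kubectl version $1 into $HOME/bin and make it executable.
  mkdir -p "$HOME/bin"
  curl -fsSL -o "$HOME/bin/kubectl" "$(kubectl_url "$1")"
  chmod +x "$HOME/bin/kubectl"
}
```

After `install_kubectl 1.11.5`, running `kubectl version --client` should report the pinned version, provided `$HOME/bin` comes first on your PATH.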
echo 'export PATH=$HOME/bin:$PATH' >> ~/.bashrc
If there is no bin folder under your $HOME directory, create it and run the commands again.
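Running the `echo ... >> ~/.bashrc` command more than once appends the same export line each time. A small sketch that avoids the duplicates, where `ensure_line` is a hypothetical helper name:

```shell
ensure_line() {
  # Append line $1 to file $2 unless an identical line is already present.
  grep -qxF -- "$1" "$2" 2>/dev/null || printf '%s\n' "$1" >> "$2"
}
```

For this step that would be `mkdir -p "$HOME/bin"` followed by `ensure_line 'export PATH=$HOME/bin:$PATH' "$HOME/.bashrc"`. The single quotes keep `$HOME` and `$PATH` unexpanded, so they are evaluated at each login.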
Install AWS Command Line Interface (awscli)
You'll need the AWS CLI to connect to your AWS instances and the Kubernetes cluster. The installation is a single command:
sudo apt install awscli
Check the version:
aws --version
Unfortunately, the installed version (1.14.x) is too old for us. The minimum required version is 1.16.73.
To upgrade the awscli, we need to install pip3, which works with Python 3. The Ubuntu image we chose already has Python 3 installed. To confirm, type:
python3 --version
To install pip3, just run:
sudo apt install python3-pip
Now we can update the awscli:
pip3 install awscli --upgrade --user
Confirm the version again:
aws --version
We now have version 1.16.147 which is higher than the minimum version 1.16.73.
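The comparison against the minimum version can also be scripted. The sketch below assumes purely numeric dotted versions and the usual `aws-cli/<version> ...` banner printed by `aws --version`. Both helper names are made up for illustration, and the functions use plain global shell variables.

```shell
awscli_version_from() {
  # Extract "1.16.147" from a banner such as "aws-cli/1.16.147 Python/3.x ...".
  v=${1#aws-cli/}
  printf '%s\n' "${v%% *}"
}

version_ge() {
  # Succeed when dotted numeric version $1 is >= version $2.
  v1=$1
  v2=$2
  while [ -n "$v1" ] || [ -n "$v2" ]; do
    p1=${v1%%.*}
    p2=${v2%%.*}
    if [ "${p1:-0}" -gt "${p2:-0}" ]; then return 0; fi
    if [ "${p1:-0}" -lt "${p2:-0}" ]; then return 1; fi
    # Drop the component just compared; clear the string after the last one.
    case $v1 in *.*) v1=${v1#*.} ;; *) v1= ;; esac
    case $v2 in *.*) v2=${v2#*.} ;; *) v2= ;; esac
  done
  return 0
}
```

A check for this step could then be `version_ge "$(awscli_version_from "$(aws --version 2>&1)")" 1.16.73 && echo "awscli OK"`. If the old version still shows up after the upgrade, `~/.local/bin` (where `pip3 --user` places scripts) may not be on your PATH yet; logging out and back in usually fixes that on Ubuntu.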
Optional: additional tools for installing Data Hub 2.x
Now that we have the basics, you'll need unzip to extract the Data Hub installation files.
Installing unzip is easy. Just run:
sudo apt install unzip
I also install screen for Linux for the Data Hub installation. It lets you detach from a terminal session and let the Docker image mirroring stage of the installation run without worrying about losing your connection. For more information, see this handy guide: https://linuxize.com/post/how-to-use-linux-screen/
To install it, run:
sudo apt install screen
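A typical workflow is to start a named session with `screen -S datahub`, detach with Ctrl-a d, and reattach later with `screen -r datahub` (the session name `datahub` is just an example). Because screen sets the `STY` environment variable inside its sessions, a small guard can remind you to start one before kicking off a long-running step. This is a sketch with hypothetical helper names:

```shell
in_screen() {
  # screen exports STY inside its sessions; an empty STY means a plain shell.
  [ -n "${STY:-}" ]
}

warn_unless_in_screen() {
  # Print a reminder on stderr when not inside a screen session.
  in_screen || echo "Not inside screen: start one with 'screen -S datahub'" >&2
}
```

Calling `warn_unless_in_screen` right before you launch the installer is enough; it prints nothing when you are already inside a session.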
Now you are prepared to run your Data Hub installation from your installation host. With the tooling we've installed, you should be ready to pass all the preflight checks with ease. I've used these steps on multiple implementations, and I hope this blog post saves you time as you prepare for your own Data Hub installation.