Disclaimer
This tutorial is intended as a guide for the creation of demo/test data only. The sample script provided is not intended for use in a productive system.
Purpose
The following tutorial explains a way of generating demo data for the Gigya related database tables in SAP Business Suite Foundation.
Following are the tables:
SMI_USR_ACCOUNT
SMI_USR_CRTFCT
SMI_USR_EDCTN
SMI_USR_FAVORITE
SMI_USR_IDENTITY
SMI_USR_LIKE
SMI_USR_PATENT
SMI_USR_PBLCTN
SMI_USR_PHONE
SMI_USR_PROFILE
SMI_USR_SKILL
SMI_USR_WORKEXP
The pre-installed Python Interpreter from the SAP HANA client is used to execute a Python script from SAP HANA Studio.
To run the script, you will also need to make a few customizing and configuration settings in order to use the Pydev Plugin in SAP HANA Studio.
Prerequisites
Make sure that the following prerequisites are met before you start out :
• Installation of SAP HANA Studio and SAP HANA Client
Install SAP HANA Studio and SAP HANA Client and apply for a HANA user with Read, Write and Update authorization for foundation database tables listed in the previous section.
Setup
1. Configuring Python in SAP HANA Studio Client
Python version 2.6 is already embedded in SAP HANA client, so you do not need to install Python from scratch. To configure Python API to connect to SAP HANA, proceed as follows.
1. Copy and paste the following files from C:\Program Files\SAP\hdbclient\hdbcli to C:\Program Files\SAP\hdbclient\Python\Lib
a. _init_.py
b. dbapi.py
c. resultrow.py
2. Copy and paste the following files from C:\Program Files\SAP\hdbclient toC:\Program\Files\SAP\hdbclient\Python\Lib
a. pyhdbcli.pdb
b. pyhdbcli.pyd
Note:
In Windows OS, by default the installation path is C:\Program Files\SAP\.. for a 64 bit installation SAP HANA Studio and SAP HANA Database client
If you opted for a 32 bit Installation, the default path is C:\Program Files(x86)\sap\..
2. Setting up the Editor to run the file
2.1. Install Pydev plugin to use Python IDE for Eclipse
The preferred method is to use the Eclipse IDE from SAP HANA Studio. To be able to run the python script, you first need to install the Pydev plugin in SAP HANA Studio.
a. Open SAP HANA Studio. Click HELP on menu tab and select Install New Software
b. Click the button Add and enter the following information
Name : pydev
Location : http://pydev.org/updates
c. Select the settings as shown in this screenshot.
d. Press Next twice
e. Accept the license agreements, then press Finish.
f. Restart SAP HANA studio.
In SAP HANA studio, carry out the following steps:
a. Select the menu entries Window -> Preferences
b. Select PyDev -> Interpreters -> Python Interpreter
c. Click New button, type in an Interpreter name. Enter in filed Interpreter Executable the following executable file C:\Program Files\hdbclient\Python\Python.exe. Press OK twice.
2.3. Create a Python project
In SAP HANA Studio, carryout the following steps:
a. Click File -> New -> Project, then select Pydev project
b. Type in a project name, then press Finish
c. Right-click on your project. Click New -> File, then type your file name, press Finish.
Customizing and Running the Script
1. Customizing the python script
Copy and paste the below provided code into the newly created python file. Enter the values for the below parameters in the file.
a. server – HANA server name (Ex : lddbq7d.wdf.sap.corp)
b. port – HANA server port
c. username_hana – HANA server username
d. password_hana – HANA server password
e. schema – schema name
f. client – client number
g. count - number of users for which the records shall be created
import sys, dbapi
from time import strftime
from random import randint, choice
#Returns prefix + ndigits
def randomN(prefix, ndigits):
range_start = 10**(ndigits-1)
range_end = (10**ndigits)-1
return prefix + str(randint(range_start, range_end))
def get_patent_pub_name():
part1 = choice(['Decomposistion', 'Sel-focusing', 'Ground-based', 'Process', 'Method', 'System', 'Apparatus'])
part2 = choice(['of', 'and', 'for', 'in'])
part3 = choice(['Carbon dioxide', 'Oxygen', 'Nitrogen', 'Hydride', 'Peroxide', 'Ultraviolet radiation', 'Light' ,'molecule'])
part4 = choice(['conversion', 'generation', 'mixture', 'container', 'dispenser'])
return ' '.join([part1, part2, part3, part4])
# def random_date(start, end):
# return start + timedelta(seconds=randint(0, int((end - start).total_seconds())))
server = 'lddbbfi.wdf.sap.corp'
port = 30215
username_hana = ''
password_hana = ''
schema = 'SAPBFI'
client = '001'
#This is the number of users for which records shall be created
count = 5
hdb_target = dbapi.connect(server, port, username_hana, password_hana)
cursor_target = hdb_target.cursor()
profile_sql = 'upsert ' + schema + '.SMI_USR_PROFILE(CLIENT, DATAPROVIDERNAME, USERIDINDATAPROVIDER, FIRSTNAME, LASTNAME, NICKNAME, PHOTOURL, PROFILEURL, AGE, GENDER, BIRTHDAY, BIRTHMONTH, BIRTHYEAR, COUNTRY, STATE, CITY, ADDRESS, BIO, THUMBNAILURL, ZIP, PROXYEMAIL, LANGUAGE, HONORS, PROFESSIONALHEADLINE, INDUSTRY, SPECIALITIES, RELIGION, INTERESTEDIN, RELATIONSHIPSTATUS, HOMETOWN, FOLLOWERSCOUNT, FOLLOWINGCOUNT, USERNAME, NAME, LOCALE, ISVERIFIED, USERTIMEZONE, EDUCATIONLEVEL) values (?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?) with primary key'
identity_sql = 'upsert ' + schema + '.SMI_USR_IDENTITY(CLIENT, DATAPROVIDERNAME, USERIDINDATAPROVIDER, COUNTER, SOCIALMEDIACHANNEL, SOCIALUSER, ISLOGINIDENTITY, NICKNAME, ISALLOWEDFORLOGIN, ISEXPIREDSESSION, LASTLOGINTIMESTAMP_UTC, PHOTOURL, THUMBNAILURL, FIRSTNAME, LASTNAME, GENDER, AGE, BIRTHDAY, BIRTHMONTH, BIRTHYEAR, EMAIL, COUNTRYCODE, STATE, CITY, ZIP, PROFILEURL, PROXIEDEMAIL, ADDRESS, LANGUAGES, PROFESSIONALHEADLINE, BIO, INDUSTRY, SPECIALITIES, RELIGION, POLITICALVIEW, INTERESTEDIN, RELATIONSHIPSTATUS, HOMETOWN, FOLLOWERSCOUNT, FOLLOWINGCOUNT, USERNAME, LOCALE, ISVERIFIED, USERTIMEZONE) values (?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?) with primary key'
account_sql = 'upsert ' + schema + '.SMI_USR_ACCOUNT(CLIENT, DATAPROVIDERNAME, USERIDINDATAPROVIDER, USERIDSIGNATURE, SIGNATURETIMESTAMP_UTC, SOCIALMEDIACHANNEL, ISUSERREGISTERED, USERREGSTRDTIMESTAMP_UTC, ISUSERACCOUNTVERIFIED, USERACCNTVERIFIEDTIMESTAMP_UTC, ISUSERACCNTACTIVE, ISUSERACCNTLOCKEDOUT, INFLUENCERRANK, LASTLOGINLOCATION_COUNTRYCODE, LASTLOGINLOCATION_STATE, LASTLOGINLOCATION_CITY, LASTLOGINLOCATION_LATITUDE, LASTLOGINLOCATION_LONGITUDE, OLDESTDATAUPDATEDTIMESTAMP_UTC, ACCOUNTCREATEDTIMESTAMP_UTC, REGISTRATIONSOURCE) values (?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?) with primary key'
patent_sql = 'upsert ' + schema + '.SMI_USR_PATENT(CLIENT, PATENT_UUID, TITLE, DATAPROVIDERNAME, USERIDINDATAPROVIDER, SUMMARY, PATENTNUMBER, PATENTOFFICE, STATUS, PATENTDATE, PATENTURL) values (?,?,?,?,?,?,?,?,?,?,?) with primary key'
education_sql = 'upsert ' + schema + '.SMI_USR_EDCTN(CLIENT, EDU_UUID, DATAPROVIDERNAME, USERIDINDATAPROVIDER, SCHOOL, SCHOOLTYPE, FIELDOFSTUDY, DEGREE, STARTYEAR, ENDYEAR) values (?,?,?,?,?,?,?,?,?,?) with primary key'
workexp_sql = 'upsert ' + schema + '.SMI_USR_WORKEXP(CLIENT, WORK_UUID, DATAPROVIDERNAME, USERIDINDATAPROVIDER, COMPANY, COMPANYID, WORK_TITLE, COMPANYSIZE, WORK_STARTDATE, WORK_ENDDATE, WORK_INDUSTRY, ISCURRENTCOMPANY) values (?,?,?,?,?,?,?,?,?,?,?,?) with primary key'
favorite_sql = 'upsert ' + schema + '.SMI_USR_FAVORITE(CLIENT, FAV_UUID, DATAPROVIDERNAME, USERIDINDATAPROVIDER, TYPE, NAME, CATEGORY) values (?,?,?,?,?,?,?) with primary key'
skill_sql = 'upsert ' + schema + '.SMI_USR_SKILL(CLIENT, SKILL_UUID, DATAPROVIDERNAME, USERIDINDATAPROVIDER, SKILL, SKILL_LEVEL, SKILL_YEARS) values (?,?,?,?,?,?,?) with primary key'
phone_sql = 'upsert ' + schema + '.SMI_USR_PHONE(CLIENT, PHONE_UUID, DATAPROVIDERNAME, USERIDINDATAPROVIDER, PHONETYPE, PHONENUMBER) values (?,?,?,?,?,?) with primary key'
publication_sql = 'upsert ' + schema + '.SMI_USR_PBLCTN(CLIENT, PBLCTN_UUID, DATAPROVIDERNAME, USERIDINDATAPROVIDER, PBLCTN_TITLE, PBLCTN_SUMMARY, PUBLISHER, PUBLICATIONDATE, PUBLICATIONURL) values (?,?,?,?,?,?,?,?,?) with primary key'
like_sql = 'upsert ' + schema + '.SMI_USR_LIKE(CLIENT, LIKE_UUID, DATAPROVIDERNAME, USERIDINDATAPROVIDER, NAME, CATEGORY, ID, LIKECREATIONTIMSTAMP_UTC) values (?,?,?,?,?,?,?,?) with primary key'
cert_sql = 'upsert ' + schema + '.SMI_USR_CRTFCT(CLIENT, CERT_UUID, DATAPROVIDERNAME, USERIDINDATAPROVIDER, CERT_NAME, AUTHORITY, CERT_NUMBER, CERT_STARTDATE, CERT_ENDDATE) values (?,?,?,?,?,?,?,?,?) with primary key'
channel_list = ['TW', 'FB','BLOG']
men_names = ['Mohan', 'Suresh', 'Salman', 'Nivin', 'Jayasurya', 'Vijay', 'Prabhas', 'Fahad', 'Fazil', 'Asif', 'Prithviraj', 'Muhammed', 'Shankar', 'Rajni', 'Ajith', 'Surya', 'Kamal']
women_names = ['Mamta', 'Kavya', 'Sindhu', 'Shriya', 'Trisha', 'Tabu', 'Simran', 'Meena', 'Asin', 'Kareena', 'Vidya', 'Sonakshi', 'Aiswarya', 'Preity', 'Namita', 'Sherin', 'Shamna', 'Miya' ,'Sruthy']
countrycodes = ['IN', 'DE', 'FR', 'US', 'CH', 'IT', 'RU']
professionalheadlines = ['Data mining Expert', 'Career consultant', 'Programming Guru', 'Final word in English Grammar', 'Wildlife Explorer', 'Geologist', 'Writer, Director', 'Singer, Actor', 'Expert Sculptor', 'Master in Physics', 'Astronomy Rockstar', 'Social Science Guru']
for i in range(count):
dataprovidername = 'GIGYA'
useridindataprovider = guid = randomN('_guid_', 29)
counter = '1'
socialmediachannel = choice(channel_list)
socialuser = str(randint(111111111111111111, 999999999999999999))
isloginidentity = 't'
gender = choice(['1', '2'])
if gender == '1':
firstname = choice(men_names)
lastname = choice(women_names)
nickname = firstname[:3].lower() + '_' + choice(['star', 'therock', 'blazing', 'ismyelf', 'rocks', 'theking', 'kingest', 'royal', 'crazy', 'rider', 'fiery']) + str(randint(222, 999))
else:
firstname = choice(women_names)
lastname = choice(men_names)
nickname = firstname[:3].lower() + '_' + choice(['star', 'barbie', 'blazing', 'ismyelf', 'rocks', 'thequeen', 'queenest', 'royal', 'crazy', 'beauty', 'girl']) + str(randint(222, 999))
isallowedforlogin = 't'
isexpiredsession = 'f'
lastlogintimestamp_utc = 0
photourl = 'http://www.' + socialmediachannel.lower() + '.com/photo/' + socialuser
thumbnailurl = 'http://www.' + socialmediachannel.lower() + '.com/thumbnail/' + socialuser
age = str(randint(18, 90))
birthday = ''
birthmonth = ''
birthyear = ''
email = nickname + '@' + choice(['gmail', 'yahoo', 'mail', 'hotmail']) + '.com'
countrycode = choice(countrycodes)
state = 'test'
city = 'test'
zip = str(randint(2222222, 9999999))
profileurl = 'http://www.' + socialmediachannel.lower() + '.com/' + socialuser
proxiedemail = 'test'
address = 'test'
languages = 'test'
professionalheadline = choice(professionalheadlines)
bio = 'test'
honors = 'test'
industry = 'test'
specialities = 'test'
religion = 'test'
politicalview = 'test'
interestedin = choice(['1', '2'])
relationshipstatus = ''
hometown = 'test'
followerscount = str(randint(0, 500))
followingcount = str(randint(0, 500))
username = firstname + socialuser
locale = choice(['en_US', 'en_UK', 'en_IN'])
isverified = choice(['t', 'f'])
usertimezone = 'test'
educationlevel = 'test'
profile_record = (client, dataprovidername, useridindataprovider, firstname, lastname, nickname, photourl, profileurl, age, gender, birthday, birthmonth, birthyear, countrycode, state, city, address, bio, thumbnailurl, zip, proxiedemail, languages, honors, professionalheadline, industry, specialities, religion, interestedin, relationshipstatus, hometown, followerscount, followingcount, username, username, locale, isverified, usertimezone, educationlevel)
cursor_target.execute(profile_sql, profile_record)
identity_record = (client, dataprovidername, useridindataprovider, counter, socialmediachannel, socialuser, isloginidentity, nickname, isallowedforlogin, isexpiredsession, lastlogintimestamp_utc, photourl, thumbnailurl, firstname, lastname, gender, age, birthday, birthmonth, birthyear, email, countrycode, state, city, zip, profileurl, proxiedemail, address, languages, professionalheadline, bio, industry, specialities, religion, politicalview, interestedin, relationshipstatus, hometown, followerscount, followingcount, username, locale, isverified, usertimezone)
cursor_target.execute(identity_sql, identity_record)
useridsignature = 'test1'
signaturetimestamp_utc = '123'
isuserregistered = choice(['t', 'f'])
userregstrdtimestamp_utc = '123'
isuseraccountverified = choice(['t', 'f'])
useraccntverifiedtimestamp_utc = '123'
isuseraccntactive = choice(['t', 'f'])
isuseraccntlockedout = choice(['t', 'f'])
influencerrank = str(randint(0, 101))
lastloginlocation_countrycode = countrycode
lastloginlocation_state = 'test'
lastloginlocation_city = 'test'
lastloginlocation_latitude = '123'
lastloginlocation_longitude = '123'
oldestdataupdatedtimestamp_utc = '123'
accountcreatedtimestamp_utc = '123'
registrationsource = 'test'
account_record = (client, dataprovidername, useridindataprovider, useridsignature, signaturetimestamp_utc, socialmediachannel, isuserregistered, userregstrdtimestamp_utc, isuseraccountverified, useraccntverifiedtimestamp_utc, isuseraccntactive, isuseraccntlockedout, influencerrank, lastloginlocation_countrycode, lastloginlocation_state, lastloginlocation_city, lastloginlocation_latitude, lastloginlocation_longitude, oldestdataupdatedtimestamp_utc, accountcreatedtimestamp_utc, registrationsource)
cursor_target.execute(account_sql, account_record)
num_of_patents = randint(1, 5)
for i in range(num_of_patents):
patent_uuid = randomN('patent_id', 12)
title = get_patent_pub_name()
summary = 'This patent is about the ' + title
patentnumber = str(randint(222222222, 999999999))
patentoffice = 'Patent office-' + countrycode
status = choice(['Awarded', 'Submitted', 'Under scrutiny', 'Declined', 'Application received'])
patentdate = ''
patenturl = 'https://www.' + patentoffice + '.com/patents/' + patentnumber
patent_record = (client, patent_uuid, title, dataprovidername, useridindataprovider, summary, patentnumber, patentoffice, status, patentdate, patenturl)
cursor_target.execute(patent_sql, patent_record)
pblctn_uuid = randomN('publctn_id', 12)
pblctn_title = title
pblctn_summary = choice(['A work on ', 'A write up on ', 'Book about ', 'Article: ', 'Book: ']) + title
publisher = choice(['Mondadori', 'Bonnier', 'ThomsonReuters', 'Harper Collins', 'Oxford', 'Wiley', 'O\'reily', 'Shogakukan', 'Informa', 'Simon & Schuster', 'Pearson', 'Saraiva', 'Sanoma', 'Cambridge University Press'])
publicationdate = ''
publicationurl = 'https://www.' + publisher.replace(' ', '') + '.com/' + pblctn_title.replace(' ', '')
publication_record = (client, pblctn_uuid, dataprovidername, useridindataprovider, pblctn_title, pblctn_summary, publisher, publicationdate, publicationurl)
cursor_target.execute(publication_sql, publication_record)
cert_uuid = randomN('cert_id', 12)
cert_name = title
cert_number = str(randint(23423423,345345345))
authority = choice(['Mondadori', 'Bonnier', 'Harper Collins', 'Wiley', 'O\'reily', 'Shogakukan', 'Informa', 'Pearson', 'Saraiva', 'Sanoma', 'Cambridge University'])
cert_startdate = ''
cert_enddate = ''
cert_record = (client, cert_uuid, dataprovidername, useridindataprovider, cert_name, authority, cert_number, cert_startdate, cert_enddate)
cursor_target.execute(cert_sql, cert_record)
edu_uuid = randomN('edu_id', 12)
school = choice(['PES Institute of Technology', 'Bangalore University', 'IIT Madras', 'NIT Calicut', 'Government Engg college, Thrissur', 'VIT','MIT', 'MSRIT', 'RVCE', 'UVCE'])
schooltype = choice(['Engineering', 'Technical Education', 'Higher studies', 'Advanced studies'])
fieldofstudy = choice(['Computer Science', 'Electronics and Communication', 'Civil engineering', 'Mechanical Engineering', 'Electrical Engineering', 'Production engineering'])
degree = choice(['B.Tech', 'MS', 'M.Tech', 'BS', 'B.Sc', 'M.Sc'])
startyear = str(randint(2000, 2010))
endyear = str(int(startyear) + 4)
education_record = (client, edu_uuid, dataprovidername, useridindataprovider, school, schooltype, fieldofstudy, degree, startyear, endyear)
cursor_target.execute(education_sql, education_record)
work_uuid = randomN('work_id', 12)
company = choice(['SAP Labs India', 'IBM', 'CISCO', 'Microsoft', 'Google', 'Yahoo', 'Housing', 'Wipro', 'Infosys', 'TCS'])
companyid = randomN(company[:3], 9)
work_title = choice(['Senior developer', 'Developer Associate', 'Programmer', 'Coder', 'Hacker', 'Software Engineer', 'Data expert', 'Web developer', 'System programmer', 'UI Expert', 'Quality Assurance', 'Knowledge Management', 'Architect', 'Team Lead'])
companysize = str(randint(5000, 500000))
work_startdate = ''
work_enddate = ''
work_industry = 'Software'
iscurrentcompany = choice(['X', ''])
workexp_record = (client, work_uuid, dataprovidername, useridindataprovider, company, companyid, work_title, companysize, work_startdate, work_enddate, work_industry, iscurrentcompany)
cursor_target.execute(workexp_sql, workexp_record)
num_of_skills = randint(1, 10)
for i in range(num_of_skills):
skill_uuid = randomN('patent_id', 12)
skill = choice(['Algorithms', 'Analytics', 'Android', 'Applications', 'Blogging', 'Business', 'Business Analysis', 'Business Intelligence', 'Business Storytelling', 'Content Management', 'Content Marketing', 'Content Strategy', 'Data Analysis', 'Data Analytics', 'Data Engineering', 'Data Mining', 'Data Science', 'Data Warehousing', 'Database Administration', 'Database Management', 'Digital Marketing', 'Hospitality', 'Human Resources', 'Information Management', 'Information Security', 'Legal', 'Leadership ', 'Management', 'Marketing', 'Market Research', 'Media Planning', 'Microsoft Office Skills', 'Mobile Apps', 'Mobile Development', 'Network and Information Security', 'Newsletters', 'Online Marketing', 'Presentation', 'Project Management', 'Public Relations', 'Recruiting', 'Relationship Management', 'Research', 'Risk Management', 'Search Engine Optimization', 'Social Media', 'Social Media Management', 'Social Networking', 'Software', 'Software Engineering', 'Software Management', 'Strategic Planning', 'Strategy', 'Technical', 'Training', 'UI / UX', 'User Testing', 'Web Content', 'Web Development', 'Web Programming', 'WordPress', 'Writing'])
skill_level = choice(['Beginner', 'Medium', 'Advanced', 'Expert'])
skill_level_years_dict = {'Beginner': 0, 'Medium': 4, 'Advanced': 10, 'Expert': 20}
skill_years = skill_level_years_dict[skill_level]
skill_record = (client, skill_uuid, dataprovidername, useridindataprovider, skill, skill_level, skill_years)
cursor_target.execute(skill_sql, skill_record)
fav_uuid = randomN('work_id', 12)
type = ''
name = choice(['Eminem', 'Metallica', 'Led Zeppelin', 'Mother Jane', 'Avial', 'Lamb of God', 'Nirvana'])
category = 'Music'
favorite_record = (client, fav_uuid, dataprovidername, useridindataprovider, type, name, category)
cursor_target.execute(favorite_sql, favorite_record)
like_uuid = randomN('like_id', 12)
type = ''
name = choice(['Eminem', 'Metallica', 'Led Zeppelin', 'Mother Jane', 'Avial', 'Lamb of God', 'Nirvana'])
category = 'Music'
id = randomN('id', 7)
likecreationtimstamp_utc = '123'
like_record = (client, like_uuid, dataprovidername, useridindataprovider, name, category, id, likecreationtimstamp_utc)
cursor_target.execute(like_sql, like_record)
phone_uuid = randomN('work_id', 12)
phonetype = choice(['mobile', 'telephone'])
phonenumber = str(randint(9132323154, 9947931930))
phone_record = (client, phone_uuid, dataprovidername, useridindataprovider, phonetype, phonenumber)
cursor_target.execute(phone_sql, phone_record)
hdb_target.commit()
print('Done pushing data for ' + str(count) + ' users into ' + server + '!')
2. Run the script from your editor
3. Checking the Results in the database tables.
The script randomly chooses values for various fields from a specified set of values. For example:
countrycode will be chosen randomly from the list ['IN', 'DE', 'FR', 'US', 'CH', 'IT', 'RU'].
These lists can be modified as per the requirement for the demo.
Over and out! :smile:
Related Blog posts:
Demo Social and Sentiment data generation using Python script
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
User | Count |
---|---|
5 | |
2 | |
1 | |
1 | |
1 | |
1 | |
1 | |
1 | |
1 | |
1 |