Introduction
There are a number of wonderful blog posts here on the
SAP Community talking about how you can get started with
SAP HANA Cloud Document Store, and what you can do with it
For this blog post, I wanted to meet
absolute beginners where they are:
- Introducing the design considerations behind using the more familiar Relational (Row and Column) Stores
- Discussing where we might run into challenges when using Relational Stores
- Looking at where a Document Store might help
- Explaining at a high-level what SAP HANA Cloud Document Store is
This blog post is intended to be a high level introduction - it won't address all design considerations, and will simplify where able. Those looking for more
in-depth technical articles can find some recommendations under
Other Resources at the end of this post
Relational Stores Recap
When working with a database using
Relational Stores, we want to start by deciding how we'll organise and structure our data. This structure is called a
Data Model
To recap a very simplified
Relational Data Model, let's look at how we might store information about a book
Our simplified book data
You can describe this book in many other ways - it may be brown, it may be sold in a hardcover, and it may be 388 pages long
We only want to store information that will be useful to our business. For our example, the list above will be enough
With an understanding of which information we care about, we know what we'll need to store in a
Database Table. Someone looking at this problem for the first time might sketch out a
table as below
An initial table
A book's
ISBN is unique - each
ISBN refers to exactly one published book. This uniqueness lets us use the
ISBN as something called a
Primary Key (indicated by the key symbol)
By looking at our
Primary Key Column (in this case
ISBN), we can tell which book each row of data is about
Without knowing better we deploy our table in a database and people begin using it. Before long, our colleagues come to us with issues
Challenges with "One Central Table"
A colleague wants to search for
all books written by Cameron Swift. Except it turns out there's more than one person in the world called Cameron Swift
To help tell them apart, we add an
Author ID that is unique to each
Author. While we're at it, we give our
Publishers a
Publisher ID too
Adding IDs to our Table
A second colleague wants to enter
a book with two Authors.In theory, we could add
another set of Author ID and Name columns. Unfortunately, that won't help us when we need to enter a book with
three Authors, let alone four or more Authors
To help us store this data, we start by separating the data into multiple
tables. We move the
Author and
Publisher data out into their own
tables
Splitting our data into Books, Authors and Publishers Tables
As we can see above, the
Author ID and
Publisher ID also uniquely identify each row in our
Authors and
Publishers tables. These are the
Primary Keys for those tables, in the same way that
ISBN is for our
Books table
"Relational" Data
There's no point storing our data across different tables unless we know how they're linked so we can get it back out again
This is where the term
Relational Data comes from - in order to make sense of the data, we need to understand our
tables and the
Relationships between those
tables
The way we represent this relationship is by adding the
Primary Key from one
table as a
column in another
table
As an example, we're going to link the
Publishers table to our
Books table by including the primary key for our
Publisher table (the
Publisher ID) as a column in our
Books table
Linking our Books and Publishers Tables
By matching up the
Publisher ID columns from our
Books and Publishers tables, we can now easily find out the publisher for each book. Now that we understand how this works for
Publishers, we can look at our
Authors
Including Multiple Authors
When we think about including the
Author IDs inside our book table, it may sound like we have the same problem we started with. As we showed earlier, we can't include multiple
Author IDs in our
Books table, because there can be any number of
Authors (with the exception of zero, because books don't write themselves)
We need to create a
third table to link our
Books and
Authors tables. We'll refer to this as a
Linking Table. This table is a list of
ISBNs and associated
Author IDs
Our BooksAuthored Table links the Books and Authors Tables
Because a
book can have many
authors and an
author can write many
books, neither column here is
unique. This is by design
Starting with the
ISBN for any given
book, we can then look in our
Linking Table. Each row with a matching
ISBN points to an
author of that book.
Using this approach means we don't have to make any changes to the
structure of our
tables as we add multiple a
uthors. For each additional
author, we just add another row to our
Linking Table
Our Relational Data Structure
Our full data structure for storing this simplified example would look something like this:
Our full data structure
Challenges with Relational Tables
Working through these challenges makes something very clear:
When new business requirements arise, these could require changes in our underlying data model. As we saw above: even with a simplified scenario - changing data models is not a trivial task
We can't change the set of columns for just a single row - all rows within a table use the same set of columns
Business requirements often require changes to our data models
Depending on the scope of the change required, we could be looking at adding columns to a single table (like we did with
Author ID and
Publisher ID), or more complex changes requiring re-mapping our data across a number of tables
In this case, we'll likely need to update the logic that writes to and reads from these tables -
everywhere we use them
Does this mean that storing data in
Relational Stores is a bad idea?
Absolutely not. There are benefits to storing data in this way - we often want to enforce a
known, fixed schema and we want any variation to be considered and approved. This is where
Relational Data storage might be the best fit
Document Stores
On the other hand, there are times where the
fixed approach of predefined columns and tables isn't the best way to store our data
Maybe we want to store
nested data - for example, our books where we can have any number of authors -
without splitting the data between different tables
Perhaps we're receiving data from
Cloud APIs: if we're getting
data from
Ariba's APIs, the data isn't returned to us as entries ready to be stored in a table. Instead, we'll likely get results formatted in something called
JSON (JavaScript Object Notation)
JSON Documents
The term
JSON Document is used to describe either
JSON Objects or
JSON Arrays. We're only going to describe
JSON Objects in this blog post
Inside our
JSON Objects, we have a number of
Key/Value Pairs. For example,
[Book] Title is a
Key, and the
Value for this might be
Document Storage. Let's revisit our earlier example
Relational (table) structure vs JSON Document structure
Above we can see how the properties of a book could be modeled in either a
Table or
JSON Document. Our
JSON Document is an object within curly braces { }, made from a number of
Key/Value Pairs
The JSON Document is
self-describing. That is - if we retrieve the book data, we can check the
keys to tell what each
value represents. If we retrieve a single row of data from a table, the column names aren't included by default
Retrieving Book Data from Relational vs Document Stores
With our small number of
columns, this is workable. Are we going to be able to remember
fifty columns?
A hundred? Nobody has that kind of patience
JSON Collections
Because our
JSON Documents store the structure alongside the data,
we can easily
vary the structure between entries. Unlike
rows stored in a
table, JSON Documents are stored in something called a
JSON Collection
To help us understand this, we can picture each
JSON Document as a physical sheet of paper with our data written on it
Working with a blank piece of paper, we can write our data
exactly as it is - we don't need to stick to a fixed format
Loose sheets of paper aren't great for storage and retrieval, so we store our related documents in a Manila Folder. This folder is our
JSON Collection
JSON Documents and Collections
When you store a
JSON Document, you put it in a related
JSON Collection. When you want to access the
Document later, you retrieve it from the same
Collection
Using
SAP HANA Cloud Document Store, we can do this using a modified SQL Syntax. But that's a story for another, more technical
blog post
Document Storage and Retrieval
SAP HANA Cloud Document Store is an additional feature that can be activated within an existing
SAP HANA Cloud instance - for more information you can read the blog post linked under
Next Steps below
Conclusion
In this blog post we've discussed
Relational Storage using
Tables, including the benefits and limitations of working with a
fixed, known data structure
We've also discussed how we can work with
data that doesn't conform to a fixed structure using
JSON Documents, stored inside
JSON Collections
As always, the practical reality is more complex than the theory. Hopefully this has been a useful high level introduction to when you might use a
Document Store like
SAP HANA Cloud Document Store
I welcome any questions or comments below
Next Steps
For more
Practical Application:
Maybe you'd like to find out how you can activate SAP HANA Document Store within SAP HANA Cloud
Enabling JSON Document Store by swiftc
Perhaps you've already activated SAP HANA Document Store and want to get some experience with inserting and selecting JSON Documents
HANA DocStore First Steps by swiftc
For more
Theory:
You may have heard the term NoSQL Databases before. Document Store is a type of NoSQL Database, but isn't the only type. Perhaps you'd like to learn about another type of NoSQL Database, the Graph Database
SAP HANA Database as a Graph Store - Introduction by poornapragna.malamandisuresh
Other Resources
Enabling JSON Document Store by
swiftc
HANA DocStore First Steps by
swiftc
The Small JSON Document Store Cookbook by
mathias.kemeter
Spotlight: SAP HANA Cloud JSON Document Store by
laura.nevin
The SAP HANA JSON Document Store - Introduction (Part 1) by
kai.mueller01
Ariba Analytics using SAP Analytics Cloud, Data Intelligence Cloud and HANA DocStore - Part 1 by
swiftc
Note: While I am an employee of SAP, any views/thoughts are my own, and do not necessarily reflect those of my employer