This ASUG webcast took place last month, and I finally got around to watching it to get a better understanding of Hadoop and Big Data. SAP provided the webcast.

If you have SAP solutions, what does it mean to use Hadoop?  That was the topic of this webcast.  This webcast covered the CIO Guide on Big Data “How to Use Hadoop ... | SAP HANA

Figure 1: Source: SAP

The SAP speaker reviewed the Gartner definition of big data:

High volume means hundreds of TB or petabytes

High velocity is where the data arrives rapidly

Variety includes SAP system, social media, and other types

You want it to give you better insight

Differences between HANA and Hadoop

Figure 2: Source: SAP

Hadoop can run over several servers

It is open source, which lowers cost.

Hadoop is designed to run on commodity servers; you don’t need servers with higher reliability.

Figure 3: Source: SAP

It uses the MapReduce programming model, which the speaker said is simple to use.

The first phase (map) is the select phase, and the second phase (reduce) combines the results.

This allows it to scale in volume
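
To make the two phases concrete, here is a minimal single-process sketch in Python. It is not Hadoop code and was not shown in the webcast; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are my own, and the word-count task is just the usual teaching example. On a real cluster the map and reduce steps would run in parallel across many servers, which is where the scaling comes from.

```python
from collections import defaultdict


def map_phase(lines):
    """Select phase: emit a (word, 1) pair for every word in every line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle(pairs):
    """Group emitted values by key (the framework does this between phases)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Combine phase: collapse each key's list of values into one result."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines):
    """Run the three steps in sequence on an in-memory list of lines."""
    return reduce_phase(shuffle(map_phase(lines)))


if __name__ == "__main__":
    sample = ["HANA and Hadoop", "Hadoop stores raw data"]
    print(word_count(sample))
```

Because each line is mapped independently and each key is reduced independently, the work can be split across servers without the pieces needing to talk to each other.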

Hadoop is slower than a conventional relational database and slower still than HANA.

Figure 4: Source: SAP

At the bottom of Figure 4 is the data storage layer, the Hadoop Distributed File System (HDFS), which can store any type of data you can think of, in any volume: 500 TB or more.

On top of that is the computation engine that processes the data.

It works the opposite way from a relational database, where you first define the structure, then clean and correct the data, and only then load it into the database, which takes time.

In Hadoop you load the raw data as it is and then use it.
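
The contrast between the two loading styles can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the sample records, the `id`/`amount` fields, and the function names are all hypothetical, and plain Python lists stand in for both the database table and the Hadoop file store.

```python
import json

# Hypothetical raw input; the second and third records are deliberately bad.
RAW = ['{"id": 1, "amount": "10.5"}', '{"id": 2}', "not json"]


def load_relational_style(raw_lines):
    """Define, clean, correct, then load: bad records are rejected up front."""
    table = []
    for line in raw_lines:
        try:
            rec = json.loads(line)
            table.append((int(rec["id"]), float(rec["amount"])))
        except (ValueError, KeyError, TypeError):
            continue  # record does not fit the schema, so it is never stored
    return table


def load_hadoop_style(raw_lines):
    """Load the raw data untouched; nothing is validated or dropped."""
    return list(raw_lines)


def total_amount(raw_store):
    """Interpretation happens at read time, inside the query itself."""
    total = 0.0
    for line in raw_store:
        try:
            total += float(json.loads(line).get("amount", 0))
        except (ValueError, AttributeError, TypeError):
            pass  # skip lines this particular query cannot interpret
    return total


if __name__ == "__main__":
    print(load_relational_style(RAW))
    print(len(load_hadoop_style(RAW)), total_amount(load_hadoop_style(RAW)))
```

The trade-off is visible in where the error handling lives: the relational-style loader pays the cleaning cost once at load time, while the Hadoop-style store keeps everything and pushes that cost into every query.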

Figure 5: Source: SAP

Figure 5 shows the Hadoop ecosystem

It shows two computation engines.  Hive does not offer full SQL.

HBase can be used to retrieve a piece of data by its key.

Next the speaker covered HANA, which has been covered here on SCN before.
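
As a rough picture of what key-based access means, here is a toy key-value table in Python. It is not the HBase client API; the class `KeyValueTable`, its methods, and the `cust#42` row key are invented for illustration only.

```python
class KeyValueTable:
    """Toy stand-in for a key-value table; NOT the HBase client API."""

    def __init__(self):
        self._rows = {}

    def put(self, row_key, columns):
        """Insert or update the columns stored under row_key."""
        self._rows.setdefault(row_key, {}).update(columns)

    def get(self, row_key):
        """Return a copy of the row stored under row_key, or None if absent."""
        row = self._rows.get(row_key)
        return dict(row) if row is not None else None


if __name__ == "__main__":
    table = KeyValueTable()
    table.put("cust#42", {"name": "Acme"})
    table.put("cust#42", {"city": "Berlin"})
    print(table.get("cust#42"))
```

The point is that a single row is fetched directly by its key, without scanning the rest of the data.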

Figure 6: Source: SAP

Figure 6 compares the three ways of looking at the data.

A relational database is good for solving problems, but if you want OLTP and OLAP in real time, SAP says HANA is a good fit, as long as you don’t have too much data.

Hadoop can handle any type of data and large volumes at low cost, but you can’t do OLTP with it, so it can’t be a substitute for a relational database.

Businesses will need all three, says SAP.  It is not a question of “HANA or Hadoop”; it is HANA and Hadoop.

My next blog, part 2 of this ASUG webcast, will cover HANA and Hadoop, Key Scenarios.
