SAP Data Warehouse Cloud fights the twin problems of data sprawl and data entropy with a more efficient, agile, and easy-to-understand approach to managing distributed data sets.
Imagine a library that houses a vast collection of books, centuries of print from across the globe. Now imagine that this collection is housed not under one stately roof, but across dozens of smaller buildings spread out over thousands of miles; and instead of one centralized catalog that allows users to locate and access the books they need, each building maintains its own catalog with its own classification and storage logic. What starts as a scholar's dream, an abundance of useful information, quickly becomes a nightmare: none of it easily accessible, most of it just out of reach. To make things even more complicated, the books are written in multiple languages.
You need a way to order that information so that it is easy to find the book you want, without having to know any of the details of how it is catalogued and stored within the building. That is the situation in which many companies find themselves today: rich with information but unable to make the best use of that wealth. The phenomenon is called data sprawl.
Sprawl Contributes to Entropy
Data sprawl often accompanies data entropy. As the value and relevance of data diminish (that’s data entropy), the data itself tends to move to the simplest and lowest-cost storage. The less value the data has, the farther from the enterprise it is typically situated. Higher value data, on the other hand, is most often located on-premise or on a cloud with a dedicated network connection back to the enterprise.
Click log data, which encodes how and what customers look for online and on their mobile applications, is usually locked in cloud block-storage systems. IoT data, which encodes how customers use products, is often tied up in distributed databases and in thousands of edge databases. And to complicate matters further, this abundance of data is not created equal. Only a small percentage of available data is structured, such as sales data, customer data, and master data, and standardized in a format that makes it easily searchable and quick to analyze. Most of today's data volumes come from previously untapped, semi-structured and unstructured sources, often in the cloud, which encode important insights organizations can use to remain competitive. The largest data volumes come from video and image files, but data created by machines and application usage is catching up fast.
More Data, More Challenges
Businesses are accumulating more data than ever before, which should allow them to be more efficient, more flexible, and more profitable; however, many companies report that the opposite is happening. The abundance of data brings not only enormous opportunities, but enormous challenges. We know that companies want and need a consistent view of their operations, customers, suppliers, and partners. The data to create that view exists, but as it becomes more difficult to locate and access, companies feel that they are losing their ability to understand not only their customers but also their own businesses.
Without a comprehensive, secure, logical, and efficient data management system—a heart to keep the blood flowing to all corners of a business—essential data can become useless or, worse, a drain on an enterprise as it struggles to access and interpret its mountains of increasingly complex and varied information. A traditional on-premise enterprise data warehouse used to be the defining solution for analyzing data. While never perfect, these systems covered the basics. But as data has multiplied and expanded with the addition of new systems (departmental, personal, machine, mobile) and through the adoption of cloud solutions and other line-of-business uses, data has quickly spread beyond the four physical walls of the organization—creating a modern data landscape that is difficult to manage.
Given the sheer complexity and volume of data that businesses now have access to, the notion of an enterprise data warehouse that can easily store top-line aggregate data for analysis has been rendered moot; companies know they can't go back to a 100% on-premise data management system. Other vendors claim to offer an easy solution: "bring all of your data to our cloud, and we'll provide the insight." As appealing as this one-stop cloud solution may sound in theory, it is not tenable in practice. Bandwidth between clouds is limited, and outbound data movement costs are exorbitant. This is not a viable solution for increasingly real-time and cost-conscious IT departments.
A New Approach
Solving the problem of data entropy calls for a solution that lets vast amounts of distributed data be analyzed easily and efficiently. Organizations must be able to determine the location of the information they need at any given time, while moving the smallest possible amount of data between sources.
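One common way to move the smallest possible amount of data is pushdown: instead of copying a remote table locally, the query engine sends the filter or aggregation to each source and retrieves only the small result. The sketch below illustrates the idea in Python, simulating two hypothetical remote sources with in-memory SQLite databases; the table name, schema, and regions are invented for illustration, not an SAP interface.

```python
import sqlite3

# Two hypothetical remote sources, simulated with in-memory SQLite databases.
def make_source(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    return conn

emea = make_source([("DE", 120.0), ("FR", 80.0), ("DE", 40.0)])
apj = make_source([("JP", 200.0), ("JP", 50.0)])

# Pushdown: each source computes its own partial aggregate, so only one
# row per region crosses the "network", never the raw sales records.
def partial_totals(conn):
    return conn.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region"
    ).fetchall()

# Combine the tiny partial results into a global total per region.
totals = {}
for source in (emea, apj):
    for region, amount in partial_totals(source):
        totals[region] = totals.get(region, 0.0) + amount

print(totals)
```

The same principle scales from two simulated sources to a federated landscape: the expensive raw data stays where it lives, and only aggregates travel.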
Not all cloud applications are designed to surrender their data easily, creating islands of data isolation that deprive knowledge workers of valuable insights outside their walls. Standard enterprise applications like HR or CRM are now built with distributed key value stores and shared OLTP databases, making traditional SQL access impossible. Even internally developed custom applications are now often built on NoSQL stores, which speed application development, but at a cost: these applications are designed to solve a specific set of business problems, not to ensure that the underlying data is readily available to be harmonized with other corporate information.
End users and analysts often have unrealistic expectations that data from these types of applications (or external channels) will be easily accessible for analysis and reporting. However, it is not easy to wrangle the various types of non-standardized data from the cloud and other external data channels. As unstructured information continues to proliferate rapidly, more and more companies struggle to cope with the sprawl. Business cloud applications are typically not designed to make it easy to export data, but rather to increase opportunities to create and charge for analytic services. This leads to customers lamenting that their understanding of their businesses is, in fact, decreasing.
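Part of what makes this wrangling hard is that semi-structured records have no fixed schema: fields nest, repeat, or go missing from one record to the next. The sketch below shows one simple, commonly used normalization step, flattening nested JSON into a uniform tabular shape; the record fields and the flattening rules (dotted column names, lists reduced to counts) are illustrative assumptions, not a prescribed method.

```python
import json

# Hypothetical semi-structured records from a NoSQL application store;
# the field names and nesting are invented for illustration.
raw = [
    '{"id": 1, "customer": {"name": "Acme", "country": "DE"}, "total": 99.5}',
    '{"id": 2, "customer": {"name": "Beta"}, "items": [1, 2, 3]}',
]

def flatten(record: dict, prefix: str = "") -> dict:
    """Flatten nested objects into dotted column names; lists become counts."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, name + "."))
        elif isinstance(value, list):
            flat[name + ".count"] = len(value)
        else:
            flat[name] = value
    return flat

rows = [flatten(json.loads(r)) for r in raw]

# The union of all keys yields a tabular schema; fields a record
# lacks become None, making the data queryable alongside structured sources.
columns = sorted({key for row in rows for key in row})
table = [[row.get(col) for col in columns] for row in rows]
```

Real pipelines add type coercion, deduplication, and schema evolution on top of this, which is exactly the ongoing effort the paragraph above describes.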
Data Assets and Digital Transformation
While data sprawl encompasses the difficulty of locating disparate data sources, data entropy concerns the declining value and accessibility of that data. Data sprawl and entropy are a threat to the promise of digital transformation, and yet they are the direct result of embracing cloud applications, IoT, and quickly built internal applications, all pillars of digital transformation. Companies need a solution that offers adapters to all data sources and data types, in order to properly access, query, and process data efficiently, and to do this in a highly secure, reliable, and governed way.
Data lifecycle management is another factor in considering how to address data sprawl. As a rule, data is referenced less often as it ages, although certain data remains crucial no matter how old it is. (A good example is master data.) Data is an asset, and its value and quality must be tracked over time. Access frequency is one critical proxy for value, but there are others. These indicators must be accounted for in order to understand which data should be protected and stored in hot stores, like SAP HANA, and which data can be moved to other solutions. Data must be managed throughout its lifecycle and moved to cost- and security-appropriate locations as its value declines. Data must also be archived, and ultimately destroyed, on a per-country basis, both to meet regulations and to limit the company's legal exposure.
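A lifecycle policy of this kind can be expressed as a simple tiering rule driven by age and access frequency. The sketch below is a minimal illustration; the thresholds, tier names, and function are hypothetical choices for this example, not an SAP HANA API or a recommended policy.

```python
from datetime import date, timedelta

# Hypothetical tiering policy: value is proxied by age and access
# frequency, as described in the text. Thresholds are illustrative only.
def assign_tier(last_accessed: date, accesses_per_month: float,
                today: date) -> str:
    age_days = (today - last_accessed).days
    if accesses_per_month >= 100 or age_days <= 30:
        return "hot"    # e.g. an in-memory store such as SAP HANA
    if accesses_per_month >= 1 or age_days <= 365:
        return "warm"   # cheaper disk-based storage
    return "cold"       # archive, pending regulated destruction

today = date(2020, 1, 1)
print(assign_tier(today - timedelta(days=5), 500, today))    # hot
print(assign_tier(today - timedelta(days=200), 3, today))    # warm
print(assign_tier(today - timedelta(days=900), 0.1, today))  # cold
```

In practice such a rule would also weigh the other value indicators the text mentions, master-data status for instance, and per-country retention rules before anything is archived or destroyed.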
The "Data Age" is upon us, and data sprawl is here to stay. Companies need agile solutions to help them navigate the challenges that sprawl presents: solutions that allow them to reassert control over their data and to use it however, whenever, and wherever they need it.