March is an exciting time for people all around the world. For many, it means winter is melting into spring and Easter (along with the requisite holidays, chocolate eggs, and of course bunnies) is just around the corner. For our hardworking SAP Predictive Analytics team, it is also a time of celebration as we have just announced the general availability of SAP Predictive Analytics 2.5!
While every release is very special to us, this one is particularly sweet because it introduces some features that our team has been working on for quite some time. In addition to many product enhancements, optimizations, and new features (see the full “What’s New”), I’d like to highlight a few of the real “biggies” for SAP PA 2.5:
Native Spark Modeling
The thing about Big Data is that… well… it’s BIG… Zillions of rows are great, but where Big Data becomes really interesting is when the data become really wide (i.e., have a large number of columns). How would you end up with thousands of columns? Easy. Take, for example, the complexity of an airplane jet engine and how many sensors it has to measure everything from engine bearings to temperature. Now imagine those tens of thousands of sensors (each represented by a single column) being read every five seconds for the duration of the flight, multiplied by the number of engines. That’s terabytes of wide data per hour.
Extracting that amount of data for analysis is simply not feasible because many databases can’t even handle that number of columns, and even if they could, the time required for the analysis may make the results meaningless. The Native Spark Modeling feature in SAP Predictive Analytics 2.5 (sometimes called “In Database Modeling/IDBM” in the interface) delegates the predictive modeling processing to Spark on Hadoop, so data transfer between the predictive engine and the data source is avoided.
Native Spark Modeling provides the following benefits when analyzing data using Spark on Hadoop:
Processing closer to the data source – reducing expensive I/O.
Faster response times – training models in less time to enable you to do more.
Higher scalability – create more models and use wider datasets than ever before.
Better CPU utilization – reduce costs and increase operational efficiency.
Easier access to Big Data – now business analysts can work with Hadoop without Spark coding skills.
As data volumes increase, we have an even greater ability to find even smaller patterns in the data. However, some events in the data happen so infrequently that it is sometimes hard to determine whether a predictive pattern for the event really exists or whether we are just seeing coincidental “noise” in the “signal”. Take a jet engine again: thankfully, jet engines fail very infrequently, but this presents a huge problem in predictive maintenance scenarios because what we are trying to do is find a pattern within the data that could have “predicted” the failure so we can try to prevent the next one. The consequence of finding a pattern in random data instead of a true set of factors for the failure is potentially a very expensive and unnecessary engine servicing that could not only cost millions, but could also ground the plane the engine is mounted on.
SAP Predictive Analytics 2.5 has an improved ability to help in these “rare event” cases by generating a predictive model only when there is sufficient indication that the model can be trusted. If the system determines the generated model cannot be trusted with enough confidence, it will alert you rather than providing a potentially inadequate model.
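To make the idea of "only hand over a model when it can be trusted" a bit more concrete, here is a minimal Python sketch of such a gate. It is not how SAP Predictive Analytics decides this internally; the function names (`auc`, `trust_check`) and the thresholds (`min_events`, `min_auc`, `max_spread`) are purely hypothetical choices for illustration. The sketch assumes you already have held-out predictions from a few validation folds, and it refuses the model when there are too few rare events, or when the ranking quality is weak or unstable across folds.

```python
from statistics import mean, pstdev


def auc(labels, scores):
    """Rank-based AUC: chance that a random event outscores a random non-event."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        return None  # AUC is undefined without both classes
    # Count pairwise wins; ties count as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def trust_check(folds, min_events=20, min_auc=0.7, max_spread=0.1):
    """Decide whether a rare-event model looks trustworthy.

    folds: list of (labels, scores) pairs from held-out validation folds.
    All thresholds are illustrative assumptions, not product defaults.
    Returns (trusted, reason).
    """
    events = sum(sum(labels) for labels, _ in folds)
    if events < min_events:
        return False, f"only {events} rare events; need at least {min_events}"

    aucs = [auc(labels, scores) for labels, scores in folds]
    if any(a is None for a in aucs):
        return False, "at least one fold has no events or no non-events"

    if mean(aucs) < min_auc:
        return False, f"mean AUC {mean(aucs):.2f} is below {min_auc}"

    if pstdev(aucs) > max_spread:
        return False, f"AUC varies too much across folds ({pstdev(aucs):.2f})"

    return True, "enough events and stable quality across folds"
```

You would call `trust_check` with the per-fold labels and scores and alert the user with the returned reason whenever the first element is `False`, instead of shipping a model that may just be fitting noise.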
IP Protection for Partner Extensions in R
One of the more attractive aspects of the open source language “R” is the ability to easily share and obtain predictive algorithms and libraries. While the exact number changes all the time, there are currently over 6,000 R libraries freely available. Why so many? Data scientists sometimes create their own algorithms from scratch or modify existing ones to solve specific problems of an industry, target data source, or even a single customer. In these cases, the creator may not want to share their work as it may either represent a competitive advantage over other companies, or it may be part of their own intellectual property that they wish to protect from being easily viewed or edited (unless they are paid for it!).
SAP Predictive Analytics 2.5 now includes a feature to create R extensions that are “encrypted in transit”, meaning they can be transported and used by others without disclosing the recipe to their secret sauce. Now, customers and partners are able to invest in their R extensions while preserving their intellectual property. The SAP Analytics Extensions Directory also allows our partners to distribute and even monetize their extensions through an SAP-managed portal that can be directly launched from within the SAP Predictive Analytics interface.
More to Come (Soon!)
As exciting as March is for our Product Team, we’re driving really hard towards Q2 because we’ve got some great things lined up for SAPPHIRE NOW, which will be in Orlando from May 17 to 19, 2016. At the conference, we are also planning a number of ASUG Educational Sessions that will not only give you roadmap information, demo scenarios, and deep dive details, but also some exciting news about where the future of SAP Predictive Analytics is headed, so make sure you attend if you can!