on 2023 Jul 27 12:44 PM
Hello,
I have a question about SAP Analytics Cloud (SAC), more precisely about classification in predictive scenarios.
I have created a classification model based on a dataset that looks like this:

SAC does not detect influencers that seem obvious and can be spotted with the naked eye.
One example is the day of the week. SAC says that packages delivered on Wednesday and Friday arrive on time most often, but the dataset shows that these are the least punctual days. The same is true for domestic and international deliveries.

Does anyone have an answer as to why this may be?
Is it due to the uneven distribution of the target variable?
I hope someone can help me with this.
Have a nice day.
Jan
Hi Jan, doing the maths, you have roughly 360 late packages and the rest are on time. Generally speaking, several hundred examples of the "minority class" (the less represented class in the dataset, in your case late packages) are enough for the algorithm to do a good job, which seems to be the case here. I think the problem lies rather in your interpretation of the bar charts. A positive bar means the category contains a higher share of the minority class (late packages) than the dataset overall, while a negative bar means the opposite. What you see from the model seems to be in line with your business understanding. Does that make sense? Best regards, Antoine
Hi Antoine, that makes more sense than how I interpreted it.
I have oriented myself to this blog: Classification in SAP Analytics Cloud in Detail | SAP Blogs
Here the following is described:


Therefore, I read the graph as saying that Friday and Wednesday have a positive impact on package delivery.
Best regards Jan
Hi Jan, I do understand that this visualization is not necessarily easy to decipher. I looked up the help topic related to it, and I am not 100% happy with the clarity it provides in terms of interpretation. I'll follow up with my local documentation developers to make this better. In the meantime, if you are happy with my guidance, I suggest you accept my answer. Your use case sounds very exciting and promising. If you have more how-to questions, feel free to raise them in the SAP Community and tag me! Best regards, Antoine
"...A negative bar (influence on target less than 0) indicates that the category contains fewer positives cases (%) than the percentage of positive cases in the overall validation data source...". "Positive cases" here means cases belonging to the minority class, that is, target = 0 / late package in your example.
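A minimal sketch of that reading, in Python. It is not how SAC computes its influencer charts internally; it only mirrors the sentence quoted above by comparing each category's share of positive cases with the overall share. The function name `influence_by_category` and the toy rows are hypothetical and are not taken from Jan's dataset.

```python
from collections import defaultdict

def influence_by_category(rows, positive_label):
    """Return {category: category positive rate - overall positive rate}.

    rows is an iterable of (category, target) pairs.
    A value above 0 corresponds to a positive bar, below 0 to a negative bar.
    """
    rows = list(rows)
    overall_rate = sum(1 for _, target in rows if target == positive_label) / len(rows)

    totals = defaultdict(int)
    positives = defaultdict(int)
    for category, target in rows:
        totals[category] += 1
        if target == positive_label:
            positives[category] += 1

    return {
        category: positives[category] / totals[category] - overall_rate
        for category in totals
    }

# Hypothetical toy data: target 0 = late (minority class), 1 = on time.
rows = (
    [("Wed", 0)] * 3 + [("Wed", 1)] * 7
    + [("Mon", 0)] * 1 + [("Mon", 1)] * 9
)
influence = influence_by_category(rows, positive_label=0)
for day, value in influence.items():
    print(f"{day}: {value:+.2f}")
```

With late packages as the positive class, a positive value for a weekday means more late packages than average on that day, not more on-time ones.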
Hi Antoine,
this helped me a lot.
Thanks for your quick help!
Jan
Jan, you can still improve your predictive power (accuracy) further by adding more explanatory variables to the dataset. To improve the prediction confidence, you need more rows, especially from the now famous minority class.
Hi Antoine,
there are other variables in the dataset that are expected to have an impact on the prediction. However, SAC does not recognize them.
One example is whether the package is delivered domestically or abroad. The dataset shows that the percentage of late packages is significantly higher for packages sent abroad than for packages sent within the same country.
However, only 500 of the 18,000 packages are sent abroad. Is it possible that this is why SAC does not recognize it?
Hello Jan, it is difficult to comment without seeing this in detail. To your point, this behavior might be very specific to these 500 packages, yet not significant at the level of the whole dataset. Are you trying to predict whether packages will be late, or to explain why they might be late? These are quite different intents. Best regards, Antoine
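To make the "specific to these 500 packages" point concrete, here is a rough Python sketch that compares a segment's late rate with its share of all late packages. The totals (18,000 packages, roughly 360 late, 500 sent abroad) come from this thread; the number of late packages sent abroad (50) is a purely hypothetical placeholder, and `segment_summary` is an illustrative helper, not an SAC function.

```python
def segment_summary(total_rows, total_late, segment_rows, segment_late):
    """Compare a segment's late rate with the rest of the data and with its weight."""
    rest_rows = total_rows - segment_rows
    rest_late = total_late - segment_late
    return {
        "segment_late_rate": segment_late / segment_rows,
        "rest_late_rate": rest_late / rest_rows,
        "segment_share_of_rows": segment_rows / total_rows,
        "segment_share_of_all_late": segment_late / total_late,
    }

summary = segment_summary(
    total_rows=18_000,   # from the thread
    total_late=360,      # approximate figure from the thread
    segment_rows=500,    # packages sent abroad, from the thread
    segment_late=50,     # HYPOTHETICAL: not given in the thread
)
for name, value in summary.items():
    print(f"{name}: {value:.1%}")
```

Under these assumed numbers, the segment's late rate would be several times higher than the rest, yet it would still account for only a small part of all late packages, which is one way a pattern can be visible in the data without weighing much across the whole dataset.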
It would then be interesting for you to try Smart Discovery. The main focus of Smart Discovery is to explain, while the main focus of Smart Predict is to predict (even if Smart Predict has some explanatory elements).
Indeed Wednesday seems to be the most problematic day for packages to come on time, followed by Friday.
Does 0 stand for late and 1 for on time?
Hello Jan, how many records do you have in total in your dataset? Best regards Antoine