- SAP Community
- Products and Technology
- Technology
- Technology Q&A
- How to detect and exclude outliers from linear reg...

Turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

- Subscribe to RSS Feed
- Mark Question as New
- Mark Question as Read
- Bookmark
- Subscribe
- Printer Friendly Page
- Report Inappropriate Content

How to detect and exclude outliers from linear regression?

VolkerBarth

Contributor

- Subscribe to RSS Feed
- Mark Question as New
- Mark Question as Read
- Bookmark
- Subscribe
- Printer Friendly Page
- Report Inappropriate Content

on 2011 Jun 08 1:26 PM

4,288

- SAP Managed Tags:
- SAP SQL Anywhere

[Just to follow-up on this question]:

I'm doing **linear regression tests** with SQL Anywhere's builtin OLAP functions. Say, for a very simplified example, I would assume a linear correlation between the columns *x* and *y* in a table *MyTable*.

So I would generate a linear function with

select REGR_COUNT(y, x) as cnt, round(REGR_SLOPE(y, x), 4) as slope, round(REGR_INTERCEPT(y, x), 4) as yIntercept, round(REGR_R2(my, x), 4) as fitness from MyTable where x > 0 and y > 0;

This works well generally. However, what would be a senseful method to **exclude outliers**?

A simple test for maximum/minimum values (or a ranking) seems inadequate as outliers would be defined as based on their *value pairs*, not just on the *y* value.

Currently, Im trying to use the above query as common table expression and then to check for those pairs that have a bigger deviation compared to the generated linear function:

with CTE_LR as (select REGR_COUNT(y, x) as cnt, round(REGR_SLOPE(y, x), 4) as slope, round(REGR_INTERCEPT(y, x), 4) as yIntercept, round(REGR_R2(my, x), 4) as fitness from MyTable where x > 0 and y > 0) select x, y, round(slope * x + yIntercept, 4) as yCalc, abs(yCalc - y) as absDiff, abs(yCalc - y) / y as relDiff from MyTable M, CTE_LR where x > 0 and y > 0 order by relDiff desc, x, y;

**However, this helps to detect outliers post-mortem, but obviously they have already influenced the linear regression.** I could then build another regression without these outliers (say, those with a certain relative deviation) but that again might exclude the "wrong outliers" based on them being part of the previous regression.

Therefore I would like a way to exclude them *beforehand*. Is there a (not too complicated) way to do so?

MCMartin

Participant

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Report Inappropriate Content

2011 Jun 08
1:55 PM

VolkerBarth

Contributor

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Report Inappropriate Content

2011 Jun 09
5:20 AM

That seems helpful (and I haven't been aware of that variation of RANK()), but...

...the problem is I'm not looking for absolute values but maximum/minimum *y* values w.r.t. the according *x* values.

So basically it's a question **how to do a "partition" (or a group by) over a continuous range of double values** - and unfortunately not a range which is evenly filled with *x* values.

MCMartin

Participant

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Report Inappropriate Content

2011 Jun 09
7:14 AM

Ask a Question

Related Content

- Evaluating SAP Analytics Cloud’s (SAC) Classification with Smart Predict in Technology Blogs by SAP
- New Machine Learning features in SAP HANA Cloud database release 2024 Q2 in Technology Blogs by SAP
- Time Series Forecasting Model Performance Evaluation in Predictive Planning in Technology Blogs by SAP
- Fairness in Machine Learning - A New Feature in SAP HANA Cloud PAL in Technology Blogs by SAP
- What’s New in SAP HANA Cloud – September 2023 in Technology Blogs by SAP

Top Q&A Solution Author

User | Count |
---|---|

67 | |

10 | |

10 | |

10 | |

10 | |

8 | |

8 | |

7 | |

5 | |

5 |

You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.