File Exchange

## Theil-Sen Robust Linear Regression

version 1.2.0.0 (2.53 KB) by
Quickly perform linear regression that is robust to outliers.

Updated 29 Sep 2015

This function executes a fast version of the non-parametric Theil-Sen robust linear regression algorithm by finding the median slope between all pairwise combinations of points in a given data set.
For my application I needed to run a robust regression on large data sets (many thousands of points), but the implementations I found on the File Exchange were far too slow (see figure). This code is substantially faster, and for large data sets can be two orders of magnitude faster than those currently available.

### Cite As

Zachary Danziger (2021). Theil-Sen Robust Linear Regression (https://www.mathworks.com/matlabcentral/fileexchange/48294-theil-sen-robust-linear-regression), MATLAB Central File Exchange. Retrieved .

ABDELHAMID ELTAHAN

Thank you for your effort, I cant calculate the intercept?? Would you tell me how to calculate it please??

Leslie

Great routine, thanks!
Minor correction: Citation should be to Gilbert (1987) §16.5, not 6.5

Manthos Vogiatzoglou

Dear Zachary
I am a total newbie to the field so there are a few things I am missing here.
I want to estimate a linear model such y = a + b1*X1 + b2*X2. My input array should be data = [ones(n,1), X1, X2, y]?
What is the meaning of the offset output? What about standard errors? Any help and/or references would be highly appreciated. Thanks

Nommy

Then how would you plot the line on the time series data after you have estimated the slope?

Rafael Gatica

normanius

Lionel Juillen AF

Very good contrib

T C

I believe the 'nanmedian' function requires a toolbox?

You can make this function toolbox free if you use the following code

b = median(D(~isnan(D)));

Replace D with whatever the parameter of nanmedium was previously.

Great script btw!

Zachary Danziger

@Ian: Absolutely right, I have updated the submission with your suggestion.

Ian Craig

Fantastic script, but you can squeeze even more speed out of this. When you calculate the slopes, you are building a symmetric matrix and calculating each slope twice, when you only need to do it once. Change the calculation of C to:

C(i,i:end) = (data(i,2)-data(i:end,2))./(data(i,1) - data(i:end,1));

(That still calculates unnecessary 'NaN' values for the diagonal, but the indexing is easy.)

Using timeit to time both functions with various values of n up to 8192, I get an almost 2x speedup (not surprising because I'm only doing half the calculations). For 8192 points on my machine, the time goes from 11.278s to 6.3985s.

Zachary Danziger

@Dillon: You are absolutely right, the legend entries are swapped.

Dillon Amaya

There is a small error in the example code you have in the documentation. I believe you mislabeled the Least Squares line and the the Theil-Sen line. LS should be blue and TS should be red.

Levi Golston

This is the best of the three Theil-Sen scripts. Can return both m and b, faster by 30x, and handles multi-dimensional input.

##### MATLAB Release Compatibility
Created with R2015a
Compatible with any release
##### Platform Compatibility
Windows macOS Linux