A short description of various outlier detection methods for time-series data and their implementations.
Z-Score Outlier Detection
One of the simplest ways to automate outlier identification for machine learning is to implement a z-score confidence band approach.
Suppose you start with a series of annual prices $p_1, \dots, p_n$. Taking the difference between consecutive prices gives the annual returns $r_1, \dots, r_{n-1}$. Then compute the mean and the standard deviation of the returns:
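$$\mu = \frac{1}{N}\sum_{i=1}^{N} r_i \qquad \text{and} \qquad \sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N} (r_i - \mu)^2}$$
where $N = n - 1$ is the number of returns.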
To identify outliers, build a confidence band around the mean return whose width you can adjust. Set a threshold z (for z-score) that controls the width of the band. A return is flagged as an outlier whenever it falls outside the upper or lower bound of the band, i.e.,
$$r_i > \mu + z\sigma \quad \text{or} \quad r_i < \mu - z\sigma$$
Adjusting the z-score scales the band, which lets you control how extreme a return must be, relative to the rest of the distribution, before it is flagged. Plotting the band shows which points were identified.
![](https://static.wixstatic.com/media/8255e9_56b5972c0b2a455ab8f1004d19a78b53~mv2.png/v1/fill/w_706,h_226,al_c,q_85,enc_auto/8255e9_56b5972c0b2a455ab8f1004d19a78b53~mv2.png)
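The band test described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `zscore_outliers`, the default `z=2.0`, and the price series are all made up for the example. Returns are taken as simple differences of consecutive prices, and the population standard deviation (dividing by the number of returns) is used.

```python
from statistics import fmean, pstdev


def zscore_outliers(prices, z=2.0):
    """Return (index, value) pairs for returns outside mu +/- z * sigma."""
    # Differences of consecutive prices: n prices give n - 1 returns.
    returns = [b - a for a, b in zip(prices, prices[1:])]
    mu = fmean(returns)
    # Population standard deviation (divides by the number of returns).
    sigma = pstdev(returns, mu)
    lower, upper = mu - z * sigma, mu + z * sigma
    return [(i, r) for i, r in enumerate(returns) if r > upper or r < lower]


# Hypothetical price series with one sharp drop and recovery.
prices = [100, 102, 101, 103, 104, 90, 105, 106, 107, 108]
outliers = zscore_outliers(prices, z=2.0)
print(outliers)
```

With this made-up series and z = 2, the two returns around the drop to 90 are flagged. Raising z to 3 widens the band enough that nothing is flagged, which shows how the threshold trades sensitivity for fewer false alarms.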
The following guide can be used to choose z-score values based on the probability of the events you want to capture. It assumes a normal distribution, but stock return distributions are usually leptokurtic, with excess kurtosis or “fat tails”, so higher z-score values are recommended.
![](https://static.wixstatic.com/media/8255e9_16bba401273d4f8bbd9561ea6917c8bb~mv2.png/v1/fill/w_739,h_437,al_c,q_85,enc_auto/8255e9_16bba401273d4f8bbd9561ea6917c8bb~mv2.png)
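The link between a z-score and the probability of landing outside the band can also be computed directly. The sketch below assumes a standard normal distribution, so for fat-tailed returns it will understate how often extreme values occur. The helper name `two_sided_tail_probability` is made up for the example.

```python
import math


def two_sided_tail_probability(z):
    """P(|Z| > z) for a standard normal variable Z."""
    # 2 * (1 - Phi(z)) equals erfc(z / sqrt(2)) for the standard normal CDF Phi.
    return math.erfc(z / math.sqrt(2))


for z in (1.0, 2.0, 3.0):
    p = two_sided_tail_probability(z)
    print(f"z = {z:.1f}: probability outside the band = {p:.4f}")
```

Under the normal assumption, roughly 32% of returns fall outside a band of z = 1, about 4.6% outside z = 2, and about 0.27% outside z = 3.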