Following seminal papers by Box (1953) and Tukey (1960), which demonstrated the need for robust statistical procedures, the theory of robust statistics blossomed in the 1960s and 1970s. The breakdown value is a measure of the proportion of contamination that a procedure can withstand and still maintain its robustness. Outlier detection using distributionally robust optimization. These are all called high-breakdown estimators since they can be tuned to resist contamination in up to 50% of the observations. S-estimators of regression parameters, proposed by Rousseeuw and Yohai (1984), search for the slope and intercept values that minimize some measure of scale associated with the residuals. Rousseeuw (1984) proposed an approximate algorithm based on drawing random subsamples of the same size as the number of carriers. We advocate the least median of squares method (Rousseeuw, 1984) because it appeals to the intuition and is easy to use. Later, they were applied to the multivariate scale and location estimation problem (Davies, 1992). A similar result was mentioned by Donoho and Rousseeuw at the 1985 Oberwolfach workshop on robustness.
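The random-subsample idea attributed above to Rousseeuw (1984) can be sketched for a straight-line fit. This is a minimal illustration under stated assumptions, not the published algorithm: the function name `lms_line`, the toy data, the number of subsamples and the seed are all hypothetical choices, and the subsample size is taken to be 2 (one point per coefficient of the line).

```python
import random
from statistics import median

def lms_line(x, y, n_subsamples=500, seed=0):
    """Approximate least-median-of-squares fit of y = intercept + slope * x.

    Draws random pairs of points, fits the exact line through each pair,
    and keeps the line with the smallest median squared residual.
    """
    rng = random.Random(seed)
    n = len(x)
    best = None  # (criterion, intercept, slope)
    for _ in range(n_subsamples):
        i, j = rng.sample(range(n), 2)
        if x[i] == x[j]:
            continue  # vertical line through the pair, skip it
        slope = (y[j] - y[i]) / (x[j] - x[i])
        intercept = y[i] - slope * x[i]
        crit = median((yk - intercept - slope * xk) ** 2
                      for xk, yk in zip(x, y))
        if best is None or crit < best[0]:
            best = (crit, intercept, slope)
    return best[1], best[2]

# Hypothetical toy data: 7 points exactly on y = 1 + 2x and 3 gross outliers.
x = list(range(10))
y = [1 + 2 * v for v in x]
y[7], y[8], y[9] = 100.0, -50.0, 80.0

intercept, slope = lms_line(x, y)
print(intercept, slope)
```

With 30% of the toy observations replaced by gross outliers, the selected line is still the one through the clean points, which is the behaviour a high-breakdown criterion is meant to deliver.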
Individual differences in the perception of biological motion. Examples include the least median of squares (LMS; Rousseeuw, 1984), which minimizes the median of the absolute residuals, the least trimmed squares (LTS; Rousseeuw, 1985), which minimizes the sum of the q smallest squared residuals, and S-estimation (Rousseeuw and Yohai, 1984), which has a higher statistical efficiency than LTS with the same break... The results directly paralleled the uncorrected analyses. Outlier detection using nonconvex penalized regression, Yiyuan She and Art B. Econometrics, free full text: financial big data solutions. This does not mean one should not use LTS, just that one should be aware of the Gaussian efficiency price one is paying, as a function of the fraction of contamination (breakdown point), and probably opt for lower breakdown points when confident that the fraction of ... S-estimators, proposed by Rousseeuw and Yohai (1984), were the first high... Therefore it can be viewed as a statistical theory dealing with approximate parametric models and a bridge between the Fisherian parametric approach and the full nonparametric approach. This algorithm, which we call fast-S, is based on modifying each candidate with a step that improves the S-optimality criterion, and thus allows us to reduce the number of subsamples required to obtain a desired breakdown... All of these estimates, however, have very low efficiency under a regression model with normal errors. This article presents some applications of an HBP estimator called the S-estimator (Rousseeuw and Yohai, Robust and Nonlinear Time Series Analysis, eds W. ... Note that the maximum-likelihood estimator is an M-estimator, obtained by putting ... The maximum-likelihood estimator can give arbitrarily bad results when the underlying assumptions, e.g. ... In this paper, we propose to use instead a modification of the C-step algorithm proposed by Rousseeuw and Van Driessen (1999), which is actually a lot faster. On the basis of these robust estimates, Rousseeuw and Leroy ...
In Robust and Nonlinear Time Series Analysis, eds. Franke, Härdle and Martin. Part of the Lecture Notes in Statistics book series (LNS, volume 26). When setting int to true, this adds an intercept column to the design matrix. To overcome these limitations, Maronna (2011) has recently proposed an ... Rousseeuw and Yohai (1984) indicated that OLS estimates have a breakdown point (BP) of 1/n, which tends to zero as the sample size n grows large. Robust estimation in simultaneous equations models, Journal of Statistical ... Outlier detection using nonconvex penalized regression. Robust regression by means of S-estimators, SpringerLink. We find (1) that robust regression applications are appropriate for modeling stock returns in global markets. S-estimators of regression parameters, proposed by Rousseeuw and Yohai (1984), search for the slope and intercept values that minimize some measure of ... Model of robust regression with parametric and nonparametric ... Rousseeuw (1984) showed that such estimators achieve a high breakdown value; that is, they continue to give reasonable results even in the presence of many bad observations. One of the commonly used robust loss functions is Huber's function (Huber, 1981), where ... The concept of robust estimators has been further extended in Huber, Rousseeuw and Yohai, and Rousseeuw and Tyler, and is broadly discussed in the existing literature in the context of robust methods for principal component analysis, as in Maronna or Huber and Ronchetti.
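The excerpt above breaks off before defining Huber's function. As a hedged illustration, the sketch below uses the usual textbook form: quadratic for small residuals and linear beyond a tuning constant. The names `huber_rho` and `huber_psi` and the default constant c = 1.345 are assumptions made for this example and do not come from the text.

```python
def huber_rho(r, c=1.345):
    """Huber loss: 0.5 * r^2 for |r| <= c, linear growth beyond c."""
    a = abs(r)
    if a <= c:
        return 0.5 * r * r
    return c * a - 0.5 * c * c

def huber_psi(r, c=1.345):
    """Derivative of the Huber loss: the residual clipped to [-c, c]."""
    return max(-c, min(c, r))

# Small residuals are treated as in least squares; large ones are
# penalised only linearly, which bounds their influence on the fit.
for r in (0.5, 1.0, 5.0, 50.0):
    print(r, huber_rho(r), huber_psi(r))
```

Because the derivative is bounded by c, a single large residual cannot dominate the estimating equations the way it does under a squared-error loss.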
Rousseeuw and Yohai (1984) [24] introduced the trimmed least squares (TLS) regression, which is a highly robust method for fitting a linear regression model. The efficiency of an S-estimator in the linear model is maximized under a constraint on the breakdown point, and the form of the optimal score function is derived. The LWS is regression, scale, and affine equivariant, similar to the LMS and the LTS (Rousseeuw and Leroy, 1987). The use of alternative regression methods in social ... With the same breakdown value, it has a higher statistical ef... (Rousseeuw, 1984). The asymptotic breakdown point is then defined as ... MM estimation, introduced by Yohai (1987), which combines high breakdown value estimation and M estimation. Therefore, one single unusual observation can have a large impact on the OLS estimate. High-breakdown-point estimation of some regression models. Fast robust SUR with economical and actuarial applications.
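The statement above that a single unusual observation can have a large impact on the OLS estimate can be checked numerically. The sketch below is illustrative only: `ols_line` is a hypothetical helper implementing the closed-form simple-regression fit, and the data are made up. One response value is pushed further and further away while the other nine points stay on the line y = 1 + 2x.

```python
def ols_line(x, y):
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

x = list(range(10))
clean = [1 + 2 * xi for xi in x]
_, slope_clean = ols_line(x, clean)

# Replace only the last response with increasingly extreme values.
slopes = []
for bad in (1e2, 1e4, 1e6):
    y = clean[:-1] + [bad]
    slopes.append(ols_line(x, y)[1])

print(slope_clean, slopes)
```

The fitted slope grows without bound as the single contaminated value grows, which is what a breakdown point of 1/n means in practice.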
This observation allows us to elaborate on a property of high-breakdown estimators first noted by Rousseeuw (1984) and formally defined by Yohai and Zamar (1988). A class of robust estimates for the linear model is introduced. The breakdown point approach is highly attractive for a number of reasons, not the least ... Least squares, for example, minimizes the variance of the residuals and is a special case of S-estimators. For more details see Salibian-Barrera and Yohai (2006) or Thieler, Fried and Rathjens (2016).
Least trimmed squares (LTS) regression is based on the subset of h cases out of n whose least squares fit possesses the smallest sum of squared residuals. ... a measure of dispersion of the residuals that is less sensitive to extreme values than the variance. ... Ronchetti, Rousseeuw, and Stahel (1986), Maronna, Martin, and Yohai (2006), and Dell'Aquila and Ronchetti (2006) for an overview. A combination of the high breakdown value method and M-estimation is MM-estimation (Yohai, 1987). The books by Hampel et al. and Rousseeuw and Leroy [18] also cover robust tests. For comparison to the partial correlation and linear regression analyses summarized above, we also conducted robust regression analyses using the S (Rousseeuw and Yohai, 1984) and MM estimation (Yohai, 1987) procedures, both of which correct estimates for the effects of outliers.
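The LTS definition above (the h-subset whose least squares fit has the smallest sum of squared residuals) can be approximated with concentration steps in the spirit of the C-step algorithm mentioned elsewhere on this page. The sketch below is a simplified illustration for a straight line and not the published FAST-LTS procedure: `lts_line`, the number of random starts, the seed and the toy data are hypothetical, and the x values are assumed to be distinct.

```python
import random

def ols_line(x, y):
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

def lts_line(x, y, h, n_starts=50, seed=0):
    """Approximate LTS fit; returns (objective, intercept, slope)."""
    rng = random.Random(seed)
    n = len(x)
    best = None
    for _ in range(n_starts):
        subset = rng.sample(range(n), 2)  # random elemental start
        if x[subset[0]] == x[subset[1]]:
            continue
        current = None
        while True:
            # Fit on the current subset, then score the fit on all points.
            a, b = ols_line([x[i] for i in subset], [y[i] for i in subset])
            r2 = [(y[i] - a - b * x[i]) ** 2 for i in range(n)]
            order = sorted(range(n), key=r2.__getitem__)
            obj = sum(r2[i] for i in order[:h])  # h smallest squared residuals
            if current is not None and obj >= current[0]:
                break  # no further improvement from this start
            current = (obj, a, b)
            subset = order[:h]  # concentrate on the h best-fitted cases
        if best is None or current[0] < best[0]:
            best = current
    return best

# Hypothetical toy data: 7 clean points on y = 1 + 2x and 3 outliers.
x = list(range(10))
y = [1 + 2 * v for v in x]
y[7], y[8], y[9] = 100.0, -50.0, 80.0

obj, a, b = lts_line(x, y, h=7)
print(obj, a, b)
```

Each concentration step cannot increase the trimmed sum of squares, so every start converges to a local minimum; several random starts are used because that minimum need not be global.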
S estimation is a high breakdown value method introduced by Rousseeuw and Yohai (1984). The name S-estimators was chosen as they are based on estimators of scale. M-estimation is the simplest approach both computationally ... Both of these estimators are useful for variable selection, but can only be tuned to be either highly robust or highly efficient under the normal model (Yohai, 1987). It is shown that an S-estimate based on a jump-function-type ρ solves the min-max bias problem for the class of M-estimates with very general scale. A robust learning approach for regression models based on ... A comparison of outlier detection procedures and robust ... It has a higher statistical efficiency than S-estimation. Robust least squares refers to a variety of regression methods designed to be robust, or less sensitive, to outliers. The breakdown value is a measure of the proportion of contamination that an estimation method can withstand and still maintain its robustness.
Rousseeuw and Yohai (1984) minimize the variance of the residuals. Rousseeuw and Yohai (1984), by permission of Springer. EViews offers three different methods for robust least squares. In this analysis of the risk and return of stocks in global markets, we apply several robust regression techniques in producing stock selection models and several optimization techniques in portfolio construction in global stock universes. The feasible solution algorithm for least trimmed squares. Robust Regression and Outlier Detection, Rousseeuw, Peter J. A class of robust and fully efficient regression estimators. An empirical comparison between robust estimation and ...
The paper will provide an overview of robust regression methods, describe the syntax of PROC ROBUSTREG, and illustrate the use of the procedure to ... In this talk we present an algorithm for S-estimates (see Rousseeuw and Yohai, 1984) similar to FAST-LTS. That is, the estimate minimizes the M-scale of the residuals, implicitly defined by equation (2). However, all these estimates have very low efficiency under a regression model with normal errors. A fast algorithm for the minimum covariance determinant estimator. For this reason, Rousseeuw and Yohai (1984) propose to minimize ... More on S-estimates below (see Rousseeuw and Yohai, 1984). Rousseeuw and Yohai (1984), by permission of Springer-Verlag, New York. The S-estimator: Rousseeuw and Yohai (1984) develop a high-breakdown estimator that minimizes the dispersion of the residuals.
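The defining equation of the M-scale is not reproduced in the excerpt above. The sketch below therefore assumes the usual textbook form, in which the scale s solves mean(rho(r_i / s)) = b. It further assumes a Tukey bisquare rho normalised to a maximum of 1, with c = 1.547 and b = 0.5 (a common choice associated with a 50% breakdown point). The function names and the sample residuals are hypothetical.

```python
from statistics import median

def rho_bisquare(u, c=1.547):
    """Tukey bisquare rho, normalised so that rho(u) = 1 for |u| >= c."""
    if abs(u) >= c:
        return 1.0
    t = 1.0 - (u / c) ** 2
    return 1.0 - t ** 3

def m_scale(res, b=0.5, c=1.547, tol=1e-10, max_iter=500):
    """Solve mean(rho(r / s)) = b for s by fixed-point iteration."""
    s = median(abs(r) for r in res) / 0.6745  # MAD-type starting value
    if s == 0:
        return 0.0  # at least half of the residuals are exactly zero
    for _ in range(max_iter):
        mean_rho = sum(rho_bisquare(r / s, c) for r in res) / len(res)
        s_new = s * (mean_rho / b) ** 0.5
        if abs(s_new - s) <= tol * s:
            return s_new
        s = s_new
    return s

# Hypothetical residuals: eight moderate values and two gross outliers.
res = [0.3, -0.5, 0.1, 0.8, -0.2, 0.4, -0.6, 12.0, -9.0, 0.05]
s = m_scale(res)
print(s)
```

Because rho is bounded, the two gross residuals each contribute at most 1 to the sum, so the resulting scale reflects the bulk of the residuals rather than the outliers.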
These estimates, called MM-estimates, have simultaneously the following properties. Robust regression by means of S-estimators, in Robust and Nonlinear Time Series Analysis. No background knowledge or choice of tuning constants is needed. CiteSeerX: a fast algorithm for S-estimates of regression. Unfortunately, another common feature of these estimators is their time-consuming nature. Introduction to Rousseeuw (1984), least median of squares regression.
He obtained his PhD in 1981 at the Vrije Universiteit Brussel, following research carried out at the ETH in Zurich in the group of Frank Hampel, which led to a book on influence functions. The performance of this method was improved by the FAST-LTS algorithm of Rousseeuw and Van Driessen (2000). If this is the case, then should a robust method be used for RAIM? MM estimation, introduced by Yohai (1987), combines high breakdown value estimation and ... Its self-contained treatment allows readers to skip the mathematical material, which is concentrated in a few sections. Penalized weighted least squares for outlier detection and ... S-estimators were first introduced in the context of regression by Rousseeuw and Yohai (1984). Given the same breakdown value, S estimation has a higher statistical efficiency than LTS estimation. The goal of S-estimators is to have a simple high-breakdown regression estimator which shares the flexibility and nice asymptotic properties of M-estimators. Efficient robust regression via two-stage generalized empirical ... Rousseeuw (1984), least trimmed squares (Rousseeuw, 1984), R-estimators (Jaeckel, 1972), M-estimators (Huber, 1964), generalised M-estimators (Hampel et al. ... Rousseeuw (1984) proposed the least median of squares (LMS) and the least trimmed squares (LTS). A very important problem in finance is the construction of portfolios of assets that balance risk and reward in an optimal way.
To remedy this problem, Rousseeuw and Bassett introduced a new robust estimator which ... It seeks to provide a robust estimator that is minimized over the subsets. Introduction to Rousseeuw (1984), least median of squares. The following dataset can be found in the World Almanac and Book of Facts. Self-efficacy, aftercare and relapse in a treatment program for alcoholics. The asymptotics of S-estimators in the linear regression model, JSTOR. The book of Hampel, Ronchetti, Rousseeuw and Stahel (1986) develops a ... S estimation, which is a high breakdown value method that was introduced by Rousseeuw and Yohai (1984). In the latter two papers, the authors construct regression estimators which have both high breakdown points and high efficiency. Rousseeuw (born October 1956) is a statistician known for his work on robust statistics and cluster analysis. PhD thesis, University of Michigan, University Micro...
It is worth noting that ss is a highly nonlinear, nondifferentiable function with multiple local maxima. Rousseeuw and Yohai (1984) proposed S-estimates, defined by the property of minimizing an M-estimate of the residuals' scale. To compute it, they use a modified version of the forward search algorithm (see e.g. ... Robust and Nonlinear Time Series Analysis, 256-272, 1984. High breakdown-point estimates of regression by means of ... On robust properties of convex risk minimization methods for pattern recognition. ... Huber, 1964, 1973), least median of squares (LMS; Rousseeuw, 1984), least trimmed squares (LTS; Rousseeuw, 1985), S-estimation (Rousseeuw and Yohai, 1984), and MM-estimation (Yohai, 1987), are elaborated in the book of Rousseeuw and Leroy (2005). On the optimality of S-estimators, Ola Hössjer, Uppsala University, Uppsala, Sweden. Received June 1991. Abstract: ... A critical issue in portfolio development is how to address data outliers that reflect very unusual, generally nonrecurring, market conditions. We will consider estimators of scale defined by a function, which satisfy ... Yohai (1984), and S-estimators for multivariate location and scatter have been ... The maximal bias under arbitrary contaminations of size ... These estimates have a very high computational complexity, and therefore the usual algorithms compute only approximate solutions.
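The definition above (an S-estimate minimizes an M-estimate of the residual scale) and the remark that usual algorithms compute only approximate solutions can be illustrated together. The sketch below is a crude random-subsample search for a straight line and is not the fast-S algorithm. The M-scale is assumed to take the form mean(rho(r_i / s)) = 0.5 with a Tukey bisquare rho and c = 1.547, and the names `s_estimate_line` and `m_scale`, the toy data, the number of subsamples and the seed are all hypothetical.

```python
import random
from statistics import median

def rho_bisquare(u, c=1.547):
    """Tukey bisquare rho, normalised so that rho(u) = 1 for |u| >= c."""
    if abs(u) >= c:
        return 1.0
    t = 1.0 - (u / c) ** 2
    return 1.0 - t ** 3

def m_scale(res, b=0.5, c=1.547, tol=1e-10, max_iter=500):
    """Solve mean(rho(r / s)) = b for s by fixed-point iteration."""
    s = median(abs(r) for r in res) / 0.6745
    if s == 0:
        return 0.0
    for _ in range(max_iter):
        mean_rho = sum(rho_bisquare(r / s, c) for r in res) / len(res)
        s_new = s * (mean_rho / b) ** 0.5
        if abs(s_new - s) <= tol * s:
            return s_new
        s = s_new
    return s

def s_estimate_line(x, y, n_subsamples=300, seed=0):
    """Approximate S-estimate of a line; returns (scale, intercept, slope)."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_subsamples):
        i, j = rng.sample(range(len(x)), 2)
        if x[i] == x[j]:
            continue
        slope = (y[j] - y[i]) / (x[j] - x[i])
        intercept = y[i] - slope * x[i]
        # Score the candidate line by the M-scale of its residuals.
        scale = m_scale([yk - intercept - slope * xk for xk, yk in zip(x, y)])
        if best is None or scale < best[0]:
            best = (scale, intercept, slope)
    return best

# Hypothetical toy data: noisy points around y = 1 + 2x plus 3 outliers.
x = list(range(10))
noise = [0.1, -0.2, 0.15, -0.05, 0.2, -0.1, 0.05, 0.0, 0.0, 0.0]
y = [1 + 2 * xi + e for xi, e in zip(x, noise)]
y[7], y[8], y[9] = 100.0, -50.0, 80.0

scale, intercept, slope = s_estimate_line(x, y)
print(scale, intercept, slope)
```

Only candidate lines through pairs of observations are examined, so the result is an approximation to the true minimizer; refining each candidate with local improvement steps is the kind of enhancement the fast-S excerpts on this page describe.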
Part of the Springer Series in Statistics book series (SSS). High breakdown point robust regression with censored data, Salibian-Barrera, Matias and Yohai, Victor J. The performance of this method was improved by the FAST-LTS algorithm of Rousseeuw and Van Driessen (1998). Robust Regression and Outlier Detection, Rousseeuw, Peter ... CiteSeerX citation query: robust regression by means of ... Rousseeuw and Yohai (1984) proposed S-estimates, defined by the property of minimizing an M-estimate of the residuals' scale. In addition, asymptotic distributions of the estimators are given, coupled with second-order corrections to the bias of the estimators. These estimates have a very high computational complexity, and thus the usual algorithms compute only approximate solutions. Its breakdown is 50% when h is approximately n/2 (Rousseeuw and Leroy, 1987). The asymptotic breakdown point of the S-estimator is given by Rousseeuw and Yohai (1984). Rousseeuw and Yohai (1984) proved consistency and asymptotic normality. It leads to the notion of breakdown (Hodges, 1967; Hampel, 1974; Rousseeuw, 1984) and bias robustness (see for example Donoho and Liu, 1988, or Martin, Yohai and Zamar, 1989).