Outlier removal, sum scores, and the inflation of the Type I error rate in independent samples t tests: the power of alternatives and recommendations

Marjan Bakker; Jelte M Wicherts

doi:10.1037/met0000014

Psychol Methods. 2014 Sep;19(3):409-27. doi: 10.1037/met0000014. Epub 2014 Apr 28.

Authors

Marjan Bakker¹, Jelte M Wicherts²

Affiliations

¹ Department of Psychology.
² Tilburg School of Social and Behavioral Sciences, Tilburg University.

PMID: 24773354
DOI: 10.1037/met0000014

Abstract

In psychology, outliers are often excluded before running an independent samples t test, and data are often nonnormal because of the use of sum scores based on tests and questionnaires. This article concerns the handling of outliers in the context of independent samples t tests applied to nonnormal sum scores. After reviewing common practice, we present results of simulations of artificial and actual psychological data, which show that the removal of outliers based on commonly used Z value thresholds severely increases the Type I error rate. We found Type I error rates of above 20% after removing outliers with a threshold value of Z = 2 in a short and difficult test. Inflations of Type I error rates are particularly severe when researchers are given the freedom to alter threshold values of Z after having seen the effects thereof on outcomes. We recommend the use of nonparametric Mann-Whitney-Wilcoxon tests or robust Yuen-Welch tests without removing outliers. These alternatives to independent samples t tests are found to have nominal Type I error rates with a minimal loss of power when no outliers are present in the data and to have nominal Type I error rates and good power when outliers are present.
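
The comparison described in the abstract can be sketched as a small Monte Carlo simulation under the null hypothesis. This is a minimal illustration and not the authors' simulation design. The data model (sum scores of independent binary items with equal success probability), the function names, the sample sizes, and the 20% trimming proportion are all assumptions made for this example. The Yuen-Welch test relies on the `trim` argument of `scipy.stats.ttest_ind`, which assumes SciPy 1.7 or later.

```python
import numpy as np
from scipy import stats


def simulate_sum_scores(rng, n, n_items=10, p_correct=0.2):
    """Sum scores on a short, difficult test (assumed model: i.i.d. binary items)."""
    return rng.binomial(n_items, p_correct, size=n).astype(float)


def remove_outliers(x, z_threshold=2.0):
    """Drop observations whose within-group |Z| exceeds the threshold."""
    sd = x.std(ddof=1)
    if sd == 0:  # no spread, so nothing can be flagged as an outlier
        return x
    z = (x - x.mean()) / sd
    return x[np.abs(z) <= z_threshold]


def type_i_error_rates(n_sims=2000, n_per_group=50, alpha=0.05, seed=0):
    """Estimate rejection rates when both groups share one population."""
    rng = np.random.default_rng(seed)
    rejections = {
        "t_test": 0,
        "t_test_after_removal": 0,
        "mann_whitney": 0,
        "yuen_welch": 0,
    }
    for _ in range(n_sims):
        a = simulate_sum_scores(rng, n_per_group)
        b = simulate_sum_scores(rng, n_per_group)
        p_values = {
            # Ordinary independent samples t test on all data
            "t_test": stats.ttest_ind(a, b).pvalue,
            # Same test after Z-based outlier removal within each group
            "t_test_after_removal": stats.ttest_ind(
                remove_outliers(a), remove_outliers(b)
            ).pvalue,
            # Nonparametric alternative, no outlier removal
            "mann_whitney": stats.mannwhitneyu(
                a, b, alternative="two-sided"
            ).pvalue,
            # Yuen-Welch: trimmed means with unequal variances
            "yuen_welch": stats.ttest_ind(
                a, b, equal_var=False, trim=0.2
            ).pvalue,
        }
        for name, p in p_values.items():
            # A NaN p value (degenerate sample) counts as a non-rejection
            rejections[name] += int(p < alpha)
    return {name: count / n_sims for name, count in rejections.items()}


if __name__ == "__main__":
    for name, rate in type_i_error_rates().items():
        print(f"{name}: {rate:.3f}")
```

Because both groups are drawn from the same population, every rejection is a false positive, so each printed rate is an estimate of the Type I error rate to compare against the nominal 5%. The estimates depend on the assumed data model and settings and are not expected to reproduce the figures reported in the article.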

Publication types

Research Support, Non-U.S. Gov't
Review

MeSH terms

Computer Simulation*
Humans
Models, Statistical*
Statistics as Topic*
Statistics, Nonparametric*