How to Deal with Data that Didn't Come from a Book



STATGRAPHICS® Centurion XV is a powerful and intuitive statistics package designed to handle real-world data. With over 150 procedures, it combines tabular and graphical output in a format that encourages users to take a close look at their data before applying any statistical procedure. Because real data are not always normally distributed, independent, or free of outliers, the package also provides methods to test for and deal with violations of common statistical assumptions.

Testing for outliers

Everyone who analyzes data has faced the problem of deciding whether an apparently abnormal value should be left in the data set or removed. Getting that decision wrong in either direction can have serious consequences. While no statistical technique can say with certainty whether a questionable data value belongs with the others, the STATGRAPHICS Centurion Outlier Identification procedure provides important clues.
Among the tools provided are:
(1) Formal tests, such as Grubbs’ test and Dixon’s test, that examine the observations farthest from the mean to determine whether they come from the same population as the rest.
(2) Tukey’s test for outside points, often displayed on a box-and-whisker or stem-and-leaf plot.
(3) Winsorized statistics that adjust the mean and standard deviation for the possible presence of outliers.
(4) Nonparametric statistical procedures that are resistant to outliers, such as an ANOVA based on ranks or a regression fit based on absolute deviations rather than squared deviations.
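The first three tools can be illustrated with a minimal, standard-library Python sketch. It computes Grubbs’ statistic G = max|xᵢ − x̄| / s, flags points beyond Tukey’s inner fences (1.5 × IQR outside the quartiles), and Winsorizes a sample by pulling the most extreme values in to their nearest neighbours. The function names, the sample data, and the use of `statistics.quantiles` for the quartiles (Tukey’s original hinges can differ slightly) are assumptions made for illustration; this is not a description of how STATGRAPHICS implements the procedure. Judging whether G is significant also requires a critical value based on the t distribution, which is omitted here.

```python
import statistics


def grubbs_statistic(data):
    """Return (G, suspect) where G = max|x - mean| / s, s = sample standard deviation."""
    mean = statistics.mean(data)
    s = statistics.stdev(data)
    suspect = max(data, key=lambda x: abs(x - mean))
    return abs(suspect - mean) / s, suspect


def tukey_outside_points(data, k=1.5):
    """Return points below Q1 - k*IQR or above Q3 + k*IQR (k = 1.5 gives the inner fences)."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [x for x in data if x < lower or x > upper]


def winsorize(data, g=1):
    """Clamp the g smallest and g largest values to the (g+1)-th smallest and largest.

    Assumes len(data) > 2 * g. The original order of the data is preserved.
    """
    xs = sorted(data)
    low, high = xs[g], xs[len(xs) - g - 1]
    return [min(max(x, low), high) for x in data]


# Hypothetical measurements containing one suspicious value.
data = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 13.5]

G, suspect = grubbs_statistic(data)
print(f"Grubbs G = {G:.3f} for suspect value {suspect}")
print("Tukey outside points:", tukey_outside_points(data))

w = winsorize(data)
print(f"Winsorized mean = {statistics.mean(w):.3f}, sd = {statistics.stdev(w):.3f}")
```

Here the suspicious value 13.5 inflates the ordinary mean to 10.5, while the Winsorized sample keeps all seven observations but limits that value’s influence on the mean and standard deviation.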



Comparison of fitted distributions.

Dealing with non-normality

Although many statistical procedures assume that data come from a Gaussian distribution, that is not always the case. When it’s not, analysts either need to transform the data or adapt their statistical procedures to the actual form of the distribution.
STATGRAPHICS Centurion provides tools for both approaches. The Power Transformations procedure assists users in selecting a transformation that eliminates skewness and kurtosis. The Distribution Fitting procedure fits up to 45 distributions and sorts them by goodness-of-fit. Procedures such as Process Capability Analysis, Control Charts, and Life Data Analysis then adapt to the selected distribution.
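Both approaches can be sketched in a few lines of Python. One widely used family of power transformations is Box-Cox, y = (x^λ − 1)/λ for λ ≠ 0 and y = ln x for λ = 0, with λ chosen to maximize the normal profile log-likelihood. The sketch below searches a grid of λ values and, as a small stand-in for distribution fitting, fits a normal and a lognormal distribution by maximum likelihood and ranks them by the Kolmogorov-Smirnov distance (smaller is better). The choice of Box-Cox, the λ grid, the two candidate distributions, the KS criterion, and all function names are assumptions; the article does not say which criteria STATGRAPHICS uses. The data must be strictly positive.

```python
import math
import statistics
from statistics import NormalDist


def box_cox(xs, lam):
    """Box-Cox power transform; every value in xs must be positive."""
    if lam == 0.0:
        return [math.log(x) for x in xs]
    return [(x ** lam - 1.0) / lam for x in xs]


def box_cox_loglik(xs, lam):
    """Normal profile log-likelihood of lambda (additive constants dropped)."""
    n = len(xs)
    var = statistics.pvariance(box_cox(xs, lam))  # MLE of the variance
    return -0.5 * n * math.log(var) + (lam - 1.0) * sum(math.log(x) for x in xs)


def best_lambda(xs, lo=-2.0, hi=2.0, step=0.01):
    """Grid search for the lambda that maximizes the profile log-likelihood."""
    count = int(round((hi - lo) / step)) + 1
    grid = [round(lo + i * step, 6) for i in range(count)]  # rounding makes 0.0 exact
    return max(grid, key=lambda lam: box_cox_loglik(xs, lam))


def ks_statistic(xs, cdf):
    """Kolmogorov-Smirnov distance between the empirical CDF and a fitted CDF."""
    xs = sorted(xs)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def rank_fits(xs):
    """Fit normal and lognormal models by maximum likelihood; sort by KS distance."""
    normal = NormalDist(statistics.mean(xs), statistics.pstdev(xs))
    logs = [math.log(x) for x in xs]
    lognormal = NormalDist(statistics.mean(logs), statistics.pstdev(logs))
    fits = {
        "normal": ks_statistic(xs, normal.cdf),
        "lognormal": ks_statistic(xs, lambda x: lognormal.cdf(math.log(x))),
    }
    return sorted(fits.items(), key=lambda item: item[1])


# Hypothetical right-skewed data: exponentiated standard normal scores.
xs = [math.exp(NormalDist().inv_cdf((i + 0.5) / 50)) for i in range(50)]

print("best lambda:", best_lambda(xs))
for name, d in rank_fits(xs):
    print(f"{name:10s} KS = {d:.3f}")
```

For these deliberately lognormal-shaped data, the search should select λ close to 0 (a log transform), and the lognormal fit should rank ahead of the normal fit, which shows the two routes to the same conclusion.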

Accounting for lack of independence

With today’s automated data collection systems, samples are frequently collected at closely spaced increments of time. Any sort of process dynamics introduces correlations into successive measurements, which wreaks havoc on standard control charts that assume independence between successive samples. In such cases, process monitoring must be based on a statistical model that captures the dependencies in the data.
A useful control chart for autocorrelated data is an ARIMA control chart. Based upon parametric time series models, such charts separate the effects of past events from current shocks to the system. In STATGRAPHICS Centurion, ARIMA control charts can be created for either Phase I analysis or Phase II monitoring.
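The idea behind a model-based chart can be sketched with the simplest ARIMA model, an AR(1) process xₜ = c + φ·xₜ₋₁ + eₜ. The Python code below estimates the lag-1 autocorrelation, fits φ by conditional least squares, and places 3-sigma limits on the one-step-ahead residuals, so that a point signals roughly when the current shock is unusual rather than when the process is simply carrying over its previous level. This is a simplified, hypothetical Phase I sketch: a full ARIMA chart would also select the model order, the article does not describe STATGRAPHICS’ own estimation method, and the simulated series and function names are illustrative assumptions.

```python
import random
import statistics


def lag1_autocorrelation(xs):
    """Sample lag-1 autocorrelation; values near 0 are consistent with independence."""
    mean = statistics.mean(xs)
    num = sum((xs[t] - mean) * (xs[t - 1] - mean) for t in range(1, len(xs)))
    den = sum((x - mean) ** 2 for x in xs)
    return num / den


def fit_ar1(xs):
    """Conditional least-squares fit of x_t = c + phi * x_{t-1} + e_t; returns (c, phi)."""
    prev, curr = xs[:-1], xs[1:]
    mp, mc = statistics.mean(prev), statistics.mean(curr)
    sxy = sum((p - mp) * (c - mc) for p, c in zip(prev, curr))
    sxx = sum((p - mp) ** 2 for p in prev)
    phi = sxy / sxx
    return mc - phi * mp, phi


def ar1_residual_chart(xs, k=3.0):
    """Return one-step-ahead residuals, their k-sigma limits, and the signalling times."""
    const, phi = fit_ar1(xs)
    times = range(1, len(xs))
    residuals = [xs[t] - (const + phi * xs[t - 1]) for t in times]
    sigma = statistics.stdev(residuals)  # simple estimate of the shock standard deviation
    lcl, ucl = -k * sigma, k * sigma      # least squares centres the residuals on 0
    signals = [t for t, r in zip(times, residuals) if not lcl <= r <= ucl]
    return residuals, (lcl, ucl), signals


# Hypothetical data: an AR(1) process with phi = 0.8 and a one-off shock of +8 at t = 300.
rng = random.Random(1)
series = [0.0]
for t in range(1, 500):
    bump = 8.0 if t == 300 else 0.0
    series.append(0.8 * series[-1] + rng.gauss(0.0, 1.0) + bump)

r1 = lag1_autocorrelation(series)
const, phi = fit_ar1(series)
residuals, limits, signals = ar1_residual_chart(series)
print(f"r1 = {r1:.2f}, phi = {phi:.2f}, limits = ({limits[0]:.2f}, {limits[1]:.2f})")
print("signals at t =", signals)
```

The injected shock at t = 300 should appear as a residual signal, while the elevated values that follow it should not, because the fitted model expects them to decay back toward the mean.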

Conclusions

When dealing with real-world data, the analyst must be prepared to handle violations of standard assumptions. STATGRAPHICS Centurion provides an extensive toolbox designed to handle the types of data often encountered in practice. In addition, since many practitioners are not professional statisticians, tools such as the StatWizard™ and StatAdvisor™ are provided to assist users in selecting and interpreting the statistical techniques. When combined with STATGRAPHICS’ trademark graphics and accurate, reliable algorithms, proper application of statistical methods has never been easier.


Neil W. Polhemus
Chief Technology Officer
StatPoint, Inc.
 
© 2008 Advantage Business Media. All rights reserved.