There is more to a probability plot than just the p value: the overall graphical pattern also provides a great deal of useful information. Probability plots are a powerful tool to better understand your data. In this post, I intend to present the main principles of probability plots and focus on their visual interpretation using some real data.

In probability plots, the data density distribution is transformed into a linear plot. To do this, the cumulative distribution function (the so-called CDF, cumulating all probabilities below a given threshold) is used (see the graph below). For a normal distribution, the CDF looks like an S shape. In order to transform this S-shaped curve into a line, a special Gausso-arithmetic (nonlinear) scale is needed for the vertical Y scale.

A low p value indicates that the normality hypothesis needs to be rejected. But I want to focus specifically on analyzing graphical patterns in probability plots, based on a subjective visual examination of the data. In assessing how close the points are to a straight line, the "fat pencil" test is often used: imagine a thick pencil laid along the line. If the points are all covered by this imaginary pencil, then the hypothesized distribution (the normal distribution in this case) is likely to be appropriate.

Using Probability Plots to Identify Outliers or Significant Effects

Probability plots may be useful to identify outliers or unusual values. The points located along the probability plot line represent “normal,” common, random variations. The points at the upper or lower extreme of the line, or which are distant from this line, represent suspect values or outliers. Outliers may strongly affect regression or ANOVA models, since a single outlier may result in all predictor coefficients being biased. So probability plots on residual values from a statistical model are very useful for model validation and for detecting outliers that might be caused by failed tests, wrong measurements, etc.

Probability plots also help us understand experimental designs. In a DOE (design of experiments) analysis, the effect plots are probability plots that represent factor or interaction effects. They may be used to identify significant effects. Effects that lie along the normal probability plot line are not significant (these effects are only caused by random variations), whereas the points that look like outliers represent real significant effects.

Using Probability Plots to Identify Asymmetrical Distributions

In the graph below, the data has been generated from an extremely asymmetrical (exponential) distribution. Clearly the points do not follow the probability plot line, with more dispersion on the longer (right-sided) tail. The data are very concentrated and close to one another at the other end (left side) of the distribution. This is useful in capability analyses: such a curvilinear pattern indicates that an asymmetrical distribution would be more appropriate than the normal one. Cp and Cpk estimates are very sensitive to non-normality issues. In a DOE or in a regression analysis, a plot like this indicates that you need either to transform your data (into a normal distribution) or to use another, more appropriate distribution.

Using Probability Plots to Identify Discrete Distributions

In the graph below, distinct groups of points are displayed along the probability plot line. Clearly this does not represent a continuous distribution. In measurement system analysis, such a pattern often indicates that the resolution of a measuring device is insufficient (with a low number of distinct categories, for example). There are certainly small differences between the points that look identical in this graph, but the measurement device cannot detect and recognize such small differences, resulting in a discrete distribution.

Example: Probability Distribution Plots Using Laser Game Data

Laser games have become a very popular form of entertainment over the last few years.
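
The linearization behind a normal probability plot can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure of any particular statistics package: the function names `probability_plot_points` and `straightness` are hypothetical, the median-rank plotting positions are one common convention among several, the data are simulated, and the correlation coefficient is used here as a rough numeric stand-in for the visual "fat pencil" check.

```python
import random
from statistics import NormalDist, mean


def probability_plot_points(data):
    """Return (sorted values, normal scores) for a normal probability plot."""
    xs = sorted(data)
    n = len(xs)
    std_normal = NormalDist()
    # Median-rank plotting positions (an assumed convention; others exist).
    probs = [(i - 0.3) / (n + 0.4) for i in range(1, n + 1)]
    # The inverse normal CDF stretches the S-shaped CDF into a straight line.
    scores = [std_normal.inv_cdf(p) for p in probs]
    return xs, scores


def straightness(xs, scores):
    """Pearson correlation between the sorted data and their normal scores."""
    mx, ms = mean(xs), mean(scores)
    sxy = sum((x - mx) * (s - ms) for x, s in zip(xs, scores))
    sxx = sum((x - mx) ** 2 for x in xs)
    sss = sum((s - ms) ** 2 for s in scores)
    return sxy / (sxx * sss) ** 0.5


# Simulated samples (illustrative only): one normal, one exponential.
rng = random.Random(1)
normal_sample = [rng.gauss(10, 2) for _ in range(200)]
skewed_sample = [rng.expovariate(1.0) for _ in range(200)]

r_normal = straightness(*probability_plot_points(normal_sample))
r_skewed = straightness(*probability_plot_points(skewed_sample))
print(f"normal sample:      r = {r_normal:.3f}")
print(f"exponential sample: r = {r_skewed:.3f}")
```

Plotting the sorted values against the normal scores gives the probability plot itself. A correlation close to 1 corresponds to points that hug the line, while a markedly lower value reflects the kind of curvature produced by an asymmetrical distribution.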