Pearson's chi-square test
Pearson's chi-square test is a statistical test used to assess goodness of fit or to test whether there is a difference between samples of data [1]. A one-sample chi-square test assesses the goodness of fit between observed frequencies and theoretical expected frequencies [2]. An example of a one-sample chi-square is soil type on a farm: the farm is the single sample being tested, and the soil types are the categories. The sample can have multiple categories, but only one sample is tested. A two-or-more-samples chi-square test is used to test for differences between samples of data [3]. An example of a two-sample chi-square is testing the differences between soil types on two separate farms; the two farms are the two samples.
The chi-square statistical test can be used to assess geographic data. The one-sample test enables geographers to examine the differences between observed data and expected data, while the two-or-more-samples test enables them to examine the differences between samples.
Criteria for a Chi-Square Test
- In terms of the scale of measurement, the data must be nominal.
- The data may also be "categorized" ordinal or interval data.
- The categories of data must be mutually exclusive.
- The "data must be in frequencies, i.e. the number of discrete objects occurring in different categories" [4]. The data cannot be in percentages or proportions.
One-Sample Chi-Square Test
The chi-square statistic is a sum of differences between observed and expected outcome frequencies, each squared and divided by the expectation:

\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}

where:
- O_i = an observed frequency for bin i
- E_i = an expected (theoretical) frequency for bin i, asserted by the null hypothesis
The resulting value can be compared to the chi-square distribution to determine the goodness of fit.
In order to determine the degrees of freedom of the chi-square distribution, one takes the total number of observed frequencies and subtracts one. For example, if there are eight different frequencies, one would compare against a chi-square distribution with seven degrees of freedom.
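The one-sample calculation above can be sketched in a few lines of Python. The category counts here are invented for illustration, with uniform expected frequencies under the null hypothesis:

```python
# Hypothetical one-sample (goodness-of-fit) chi-square calculation.
# Observed: counts of four soil types on a single farm (invented data).
# Expected: uniform frequencies asserted by the null hypothesis.

def chi_square(observed, expected):
    # Sum of squared differences, each divided by the expected frequency.
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [18, 22, 26, 14]
expected = [20, 20, 20, 20]

stat = chi_square(observed, expected)
df = len(observed) - 1  # degrees of freedom = number of categories - 1
```

The resulting statistic would then be compared against the chi-square distribution with `df` degrees of freedom, e.g. via a table of critical values.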
Another way to describe the chi-square statistic is with the differences weighted based on measurement error:

\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{\sigma_i^2}

where \sigma_i^2 is the variance of the i-th observation [5]. This definition is useful when one has estimates for the error on the measurements.
The reduced chi-square statistic is simply the chi-square divided by the number of degrees of freedom: [5][6][7][8]

\chi^2_\nu = \frac{\chi^2}{\nu}

where \nu is the number of degrees of freedom, usually given by \nu = n - m, where n is the number of bins and m is the number of fitted parameters. The advantage of the reduced chi-square is that it already normalizes for the number of data points and model complexity. As a rule of thumb, \chi^2_\nu \gg 1 indicates a poor model fit. However, \chi^2_\nu < 1 indicates that the model is 'over-fitting' the data (either the model is improperly fitting noise, or the error bars have been over-estimated). A \chi^2_\nu > 1 indicates that the fit has not fully captured the data (or that the error bars have been under-estimated). In principle, \chi^2_\nu = 1 is the best fit for the given data and error bars.
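The error-weighted and reduced chi-square statistics can be sketched together. The measurements, model predictions, and error bars below are invented, with one fitted parameter assumed:

```python
# Hypothetical error-weighted chi-square and reduced chi-square.
# obs: measured values; model: fitted model predictions; sigma: error bars.

def chi_square_weighted(obs, model, sigma):
    # Each squared residual is divided by the variance of that observation.
    return sum((o - m) ** 2 / s ** 2 for o, m, s in zip(obs, model, sigma))

obs = [1.1, 1.9, 3.2, 4.1]    # invented measurements
model = [1.0, 2.0, 3.0, 4.0]  # invented model predictions
sigma = [0.1, 0.1, 0.2, 0.1]  # invented one-sigma errors

chi2 = chi_square_weighted(obs, model, sigma)
nu = len(obs) - 1    # n = 4 bins minus m = 1 fitted parameter
reduced = chi2 / nu  # values near 1 suggest a fit consistent with the errors
```

Here each residual is about one error bar in size, so the reduced chi-square comes out close to 1, as expected for a reasonable fit.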
Binomial Case
A binomial experiment is a sequence of independent trials in which each trial results in one of two outcomes, success or failure. There are n trials, each with probability of success denoted by p. If N_i is the observed count for cell i and p_i its probability under the null hypothesis, then provided that np_i ≫ 1 for every i (where i = 1, 2, ..., k),

\chi^2 = \sum_{i=1}^{k} \frac{(N_i - np_i)^2}{np_i}

This has approximately a chi-square distribution with k − 1 df. The fact that df = k − 1 is a consequence of the restriction \sum_{i=1}^{k} N_i = n. There are k observed cell counts; however, once any k − 1 are known, the remaining one is uniquely determined. Basically, one can say there are only k − 1 freely determined cell counts, thus df = k − 1.
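The binomial case can be sketched the same way. The counts below are invented, testing a fair-coin null hypothesis of p = 0.5 over 100 trials:

```python
# Hypothetical binomial chi-square: 100 trials, null hypothesis p = 0.5.
# counts = [successes, failures]; expected cell counts are n * p_i.

def multinomial_chi_square(counts, probs):
    n = sum(counts)  # total number of trials
    return sum((c - n * p) ** 2 / (n * p) for c, p in zip(counts, probs))

counts = [55, 45]   # invented observed successes and failures
probs = [0.5, 0.5]  # cell probabilities under the null hypothesis
stat = multinomial_chi_square(counts, probs)
df = len(counts) - 1  # k - 1 = 1, since the counts must sum to n
```

With k = 2 cells there is only one degree of freedom: once the number of successes is known, the number of failures is fixed by the total.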
References
- ↑ Ebdon, David (1985). Statistics in Geography, pp. 66–67. Oxford: Blackwell Publishing.
- ↑ Ebdon, David (1985). Statistics in Geography, p. 66. Oxford: Blackwell Publishing.
- ↑ Ebdon, David (1985). Statistics in Geography, pp. 67, 71. Oxford: Blackwell Publishing.
- ↑ Ebdon, David (1985). Statistics in Geography, p. 67. Oxford: Blackwell Publishing.
- ↑ Charlie Laub and Tonya L. Kuhl: Chi-Square Data Fitting. University of California, Davis.
- ↑ John Robert Taylor: An introduction to error analysis, page 268. University Science Books, 1997.
- ↑ Kirkman, T.W.: Chi-Square Curve Fitting.
- ↑ David M. Glover, William J. Jenkins, and Scott C. Doney: Least Squares and regression techniques, goodness of fit and tests, non-linear least squares techniques. Woods Hole Oceanographic Institute, 2008.
Further Reading
- Ebdon, David (1985). Statistics in Geography. Oxford: Blackwell Publishing. ISBN 978-0-631-13688-0.