Pearson's chi-square test
Pearson's chi-square test is a statistical test used to assess goodness of fit or to test whether there is a difference between samples of data [1]. A one-sample chi-square test assesses the goodness of fit between observed frequencies and theoretical expected frequencies [2]. An example of a one-sample chi-square is soil type on a farm: the farm is the single sample being tested, and the soil types are the categories. The sample can have multiple categories, but only one sample is tested. A two-or-more-samples chi-square test is used to test for differences between samples of data [3]. An example of a two-sample chi-square is testing the differences between soil types on two separate farms; the two farms are the two samples.
The chi-square statistical test can be used to assess geographic data. The one-sample test enables geographers to examine the differences between observed data and expected data, while the two-or-more-samples test enables them to examine the differences between samples.
Criteria for a Chi-Square Test
- In terms of the scale of measurement, the data must be nominal.
- The data may also be "categorized" ordinal or interval data.
- The categories of data must be mutually exclusive.
- The "data must be in frequencies, i.e. the number of discrete objects occurring in different categories" [4]. The data cannot be in percentages or proportions.
One-Sample Chi-Square Test
The chi-square statistic is a sum of differences between observed and expected outcome frequencies, each squared and divided by the expectation:

\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}

where:
- O_i = an observed frequency for bin i
- E_i = an expected (theoretical) frequency for bin i, asserted by the null hypothesis
The resulting value can be compared to the chi-square distribution to determine the goodness of fit.
In order to determine the degrees of freedom of the chi-square distribution, one takes the total number of observed frequencies and subtracts one. For example, if there are eight different frequencies, one would compare against a chi-square distribution with seven degrees of freedom.
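The one-sample calculation above can be sketched in a few lines of Python. The category counts here are invented for illustration, with uniform expected frequencies under the null hypothesis:

```python
# Hypothetical one-sample (goodness-of-fit) chi-square calculation.
# Observed: counts of four soil types on a single farm (invented data).
# Expected: uniform frequencies asserted by the null hypothesis.

def chi_square(observed, expected):
    # Sum of squared differences, each divided by the expected frequency.
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [18, 22, 26, 14]
expected = [20, 20, 20, 20]

stat = chi_square(observed, expected)
df = len(observed) - 1  # degrees of freedom = number of categories - 1
```

The resulting statistic would then be compared against the chi-square distribution with `df` degrees of freedom, e.g. via a table of critical values.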
Another way to describe the chi-square statistic is with the differences weighted based on measurement error:

\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{\sigma_i^2}

where \sigma_i^2 is the variance of the i-th observation [5]. This definition is useful when one has estimates for the error on the measurements.
The reduced chi-square statistic is simply the chi-square divided by the number of degrees of freedom: [5][6][7][8]

\chi^2_\nu = \frac{\chi^2}{\nu}

where \nu is the number of degrees of freedom, usually given by \nu = n - m, where n is the number of bins and m is the number of fitted parameters. The advantage of the reduced chi-square is that it already normalizes for the number of data points and model complexity. As a rule of thumb, \chi^2_\nu \gg 1 indicates a poor model fit. However, \chi^2_\nu < 1 indicates that the model is 'over-fitting' the data (either the model is improperly fitting noise, or the error bars have been over-estimated). A \chi^2_\nu > 1 indicates that the fit has not fully captured the data (or that the error bars have been under-estimated). In principle, \chi^2_\nu = 1 is the best fit for the given data and error bars.
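The error-weighted and reduced chi-square statistics can be sketched together. The measurements, model predictions, and error bars below are invented, with one fitted parameter assumed:

```python
# Hypothetical error-weighted chi-square and reduced chi-square.
# obs: measured values; model: fitted model predictions; sigma: error bars.

def chi_square_weighted(obs, model, sigma):
    # Each squared residual is divided by the variance of that observation.
    return sum((o - m) ** 2 / s ** 2 for o, m, s in zip(obs, model, sigma))

obs = [1.1, 1.9, 3.2, 4.1]    # invented measurements
model = [1.0, 2.0, 3.0, 4.0]  # invented model predictions
sigma = [0.1, 0.1, 0.2, 0.1]  # invented one-sigma errors

chi2 = chi_square_weighted(obs, model, sigma)
nu = len(obs) - 1    # n = 4 bins minus m = 1 fitted parameter
reduced = chi2 / nu  # values near 1 suggest a fit consistent with the errors
```

Here each residual is about one error bar in size, so the reduced chi-square comes out close to 1, as expected for a reasonable fit.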
Binomial Case
A binomial experiment is a sequence of independent trials in which each trial results in one of two outcomes, success or failure. There are n trials, each with probability of success denoted by p. If N_i is the observed count for cell i and p_i its probability under the null hypothesis, then provided that np_i ≫ 1 for every i (where i = 1, 2, ..., k),

\chi^2 = \sum_{i=1}^{k} \frac{(N_i - np_i)^2}{np_i}

This has approximately a chi-square distribution with k − 1 df. The fact that df = k − 1 is a consequence of the restriction \sum_{i=1}^{k} N_i = n. There are k observed cell counts; however, once any k − 1 are known, the remaining one is uniquely determined. Basically, one can say there are only k − 1 freely determined cell counts, thus df = k − 1.
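The binomial case can be sketched the same way. The counts below are invented, testing a fair-coin null hypothesis of p = 0.5 over 100 trials:

```python
# Hypothetical binomial chi-square: 100 trials, null hypothesis p = 0.5.
# counts = [successes, failures]; expected cell counts are n * p_i.

def multinomial_chi_square(counts, probs):
    n = sum(counts)  # total number of trials
    return sum((c - n * p) ** 2 / (n * p) for c, p in zip(counts, probs))

counts = [55, 45]   # invented observed successes and failures
probs = [0.5, 0.5]  # cell probabilities under the null hypothesis
stat = multinomial_chi_square(counts, probs)
df = len(counts) - 1  # k - 1 = 1, since the counts must sum to n
```

With k = 2 cells there is only one degree of freedom: once the number of successes is known, the number of failures is fixed by the total.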
References
- ↑ Ebdon, David (1985). Statistics in Geography, pp. 66–67. Oxford: Blackwell Publishing.
- ↑ Ebdon, David (1985). Statistics in Geography, p. 66. Oxford: Blackwell Publishing.
- ↑ Ebdon, David (1985). Statistics in Geography, pp. 67, 71. Oxford: Blackwell Publishing.
- ↑ Ebdon, David (1985). Statistics in Geography, p. 67. Oxford: Blackwell Publishing.
- ↑ Charlie Laub and Tonya L. Kuhl: Chi-Square Data Fitting. University of California, Davis.
- ↑ John Robert Taylor: An introduction to error analysis, page 268. University Science Books, 1997.
- ↑ Kirkman, T.W.: Chi-Square Curve Fitting.
- ↑ David M. Glover, William J. Jenkins, and Scott C. Doney: Least Squares and regression techniques, goodness of fit and tests, non-linear least squares techniques. Woods Hole Oceanographic Institute, 2008.
Further Reading
- Ebdon, David (1985). Statistics in Geography. Oxford: Blackwell Publishing. ISBN 978-0-631-13688-0.