# Statistics

Statistics is a branch of mathematics dealing with the collection, analysis, interpretation, and presentation of masses of numerical data. The study of geography is greatly enhanced when paired with statistics, and statistical analysis is often required to produce new ideas and insight. According to the Australian Bureau of Statistics, "A statistical geography provides the extra dimension of location to statistics. A statistical geography effectively divides the area of interest, on which the statistics are collected, into spatial categories, called statistical areas, that allow the user to see not just how the data varies but also where it varies. An effective statistical geography is one which supports many uses and enables comparisons over time."

## Role of Statistics in Geography

Geographers use statistics in numerous ways:

• To describe and summarize spatial data.
• To make generalizations concerning complex spatial patterns.
• To estimate the probability of outcomes for an event at a given location.
• To use samples of geographic data to infer characteristics for a larger set of geographic data (population).
• To determine if the magnitude or frequency of some phenomenon differs from one location to another.
• To learn whether an actual spatial pattern matches some expected pattern.
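
As a minimal sketch of the first use listed above, describing and summarizing spatial data, the following Python example computes two common descriptive measures for a point pattern: the mean center and the standard distance. The coordinates are made up for illustration, and `mean_center` and `standard_distance` are hypothetical helper names rather than functions from any GIS package.

```python
from math import sqrt
from statistics import fmean


def mean_center(points):
    """Return the average x and average y of a list of (x, y) points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return fmean(xs), fmean(ys)


def standard_distance(points):
    """Return the root-mean-square distance of the points from their mean center."""
    cx, cy = mean_center(points)
    squared = [(x - cx) ** 2 + (y - cy) ** 2 for x, y in points]
    return sqrt(fmean(squared))


# Hypothetical point locations in map units (assumed data, not from the article).
sites = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]

print("mean center:", mean_center(sites))
print("standard distance:", standard_distance(sites))
```

The mean center plays the role that the arithmetic mean plays for ordinary data, and the standard distance plays the role of the standard deviation: it indicates how tightly the points are grouped around their center.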

## Statistics in GIS

Performing analysis in GIS naturally involves an element of science, particularly the analysis step of the scientific process. Results generated by analysis within a GIS are not significant or valid simply because the tools used yielded a result. Statistical tests must be performed on the results to confirm mathematically that they are significant, that is, that they actually mean something. Below are typical methods used by geographers to confirm the significance of their analysis. More information about inferential statistics and hypothesis testing can be found elsewhere on this wiki.

## Statistical Methods Used in Geography

Many different types of statistical methods can be used in geography. ArcMap can execute these methods, allowing the user to see graphically what is happening in a given study area. ArcMap's capability to perform these tasks shows how GIS and statistics have grown together. Below are a few statistical methods that are commonly used in geography.

• Descriptive Statistics
• Probability and Discrete Probability Distributions
• Continuous Probability Distributions and Probability Models
• Inferential Statistics: Confidence Intervals, Hypothesis Testing, and Sampling.
• Analysis of Variance
• Correlation
• Regression Analysis
• Spatial Patterns
• Data Reduction: Factor Analysis and Cluster Analysis
• Spatial Analysis (Examples: Nearest neighbor analysis and Thiessen polygons)
• Student's t-test
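
To illustrate one of the spatial analysis methods listed above, here is a minimal sketch of nearest neighbor analysis in Python. It computes the ratio of the observed mean nearest-neighbor distance to the distance expected for a completely random pattern of the same density. The point patterns, the study area size, and the function name `nearest_neighbor_index` are assumptions made for this example, and no edge correction is applied.

```python
from math import dist, sqrt


def nearest_neighbor_index(points, area):
    """Ratio of observed to expected mean nearest-neighbor distance.

    The expected distance assumes a completely random pattern with the
    same point density: 1 / (2 * sqrt(n / area)). No edge correction.
    """
    n = len(points)
    # Distance from each point to its closest other point (brute force).
    nearest = [
        min(dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ]
    observed = sum(nearest) / n
    expected = 0.5 / sqrt(n / area)
    return observed / expected


# Hypothetical patterns inside a 5 x 5 study area (area = 25 square units).
grid = [(x + 0.5, y + 0.5) for x in range(5) for y in range(5)]
clustered = [(2.0 + 0.1 * x, 2.0 + 0.1 * y) for x in range(5) for y in range(5)]

print("regular grid:", nearest_neighbor_index(grid, area=25.0))
print("clustered:", nearest_neighbor_index(clustered, area=25.0))
```

A ratio near 1 is consistent with a random pattern, values below 1 suggest clustering, and values above 1 suggest a dispersed or regular pattern. Whether a departure from 1 is statistically significant still has to be checked with a formal test.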

## Type I and Type II Errors

[Figure: A diagram showing four possible outcomes of statistical testing]

There are two types of errors that occur in statistical testing: Type I and Type II. A Type I error results from falsely rejecting a null hypothesis that is true. A Type II error results from failing to reject a null hypothesis that is false. Some chance of error is always present in statistical tests because conclusions are drawn from samples rather than from the whole population. For a given significance level, the chance of a Type II error can be reduced by increasing the sample size.

The likelihood of making a Type I error is denoted by the Greek letter alpha (α) and is referred to as the alpha or significance level. Common values chosen for α are 0.01, 0.05, or 0.10. Though it is important to keep the likelihood of errors as small as possible, this cannot be accomplished simply by choosing an incredibly small value for α. This is because, for a fixed sample size, there is an inverse relationship between α and β, the likelihood of making a Type II error. The lower the α-value that is chosen, the greater the chance that a false hypothesis will fail to be rejected. As the diagram shows, there are four possible outcomes associated with statistical testing. If the null hypothesis is true, we either make a correct decision with probability 1 - α or make a Type I error with probability α. If the null hypothesis is false, we either make a correct decision with probability 1 - β or make a Type II error with probability β.
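
The trade-off between α and β can be made concrete with a small calculation. The sketch below assumes a one-sided z-test of a mean with known standard deviation, which is only one of many possible tests. The effect size, standard deviation, and sample sizes are hypothetical values chosen for illustration, and `type_ii_error` is a helper name introduced here.

```python
from math import sqrt
from statistics import NormalDist


def type_ii_error(alpha, effect, sigma, n):
    """Beta for a one-sided z-test of H0: mu = mu0 against a true mean mu0 + effect.

    Assumes normally distributed data with known standard deviation sigma.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)  # rejection threshold under H0
    shift = effect * sqrt(n) / sigma        # how far the true mean moves the test statistic
    return std_normal.cdf(z_crit - shift)   # probability of failing to reject H0


# Lowering alpha raises beta when the sample size is fixed (hypothetical inputs).
for alpha in (0.10, 0.05, 0.01):
    beta = type_ii_error(alpha, effect=0.5, sigma=2.0, n=50)
    print(f"alpha={alpha:.2f}  n=50   beta={beta:.3f}")

# Increasing the sample size lowers beta when alpha is fixed.
for n in (25, 50, 100, 200):
    beta = type_ii_error(0.05, effect=0.5, sigma=2.0, n=n)
    print(f"alpha=0.05  n={n:<4} beta={beta:.3f}")
```

Under these assumptions, each reduction in α produces a larger β for the same data, while a larger sample reduces β without changing α.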