# Normalized Variable

*Normalization* or *Standardization* is a process of transforming a variable into a more analytically useful form, usually using a ratio. Raw statistical data is often susceptible to misinterpretation, and normalization is one method of correcting for this. In geospatial and spatial analysis applications, the term is used for two different derivations:

- Transforming values on different absolute scales into a common measure based on the statistical distributions from which they are collected.
- Transforming a total-count (extensive) variable into a continuous field (intensive) variable, which is more appropriate for applications such as choropleth maps.

In both cases, the normalized variable is a ratio, with the original variable of interest as the numerator. In the first case, the denominator is a measure of the spread of the distribution of each variable, such as the total range of values or the standard deviation. In the second case, the denominator is another total-count variable that the analyst wants to control for. For example, normalizing a total by area creates a density variable.

## Scale Standardization

In some analyses, one may wish to directly compare phenomena that come from different statistical distributions. For example, King County, Washington (Seattle), had a population of roughly 500,000 in 1940 and roughly 2,000,000 in 2010. Obviously, it has grown significantly, but so has the rest of the United States. If 2,000,000 would qualify Seattle as a moderately large city today, what did 500,000 mean by 1940 standards? To make a direct comparison, one would need to compare the 2010 population, relative to the 2010 statistical distribution of county populations, with the 1940 population, relative to the 1940 distribution.

The most common form of standardization is the z-score, which measures the number of standard deviations the value deviates from the mean. The z-score thus measures whether a value is extraordinarily high or low, or if it is "typical."
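For a value x drawn from a distribution with mean μ and standard deviation σ, the z-score is (x − μ) / σ. The following is a minimal sketch of that calculation, assuming the population standard deviation is the intended measure of spread. The function name `z_scores` and the county population lists are hypothetical and chosen only for illustration; they are not real census figures.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Return the z-score of each value relative to its own distribution."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (an assumption)
    if sigma == 0:
        raise ValueError("z-scores are undefined when all values are equal")
    return [(v - mu) / sigma for v in values]


# Hypothetical county populations for two census years (illustrative only).
pop_1940 = [50_000, 100_000, 150_000, 200_000, 500_000]
pop_2010 = [200_000, 400_000, 600_000, 800_000, 2_000_000]

# Each year is standardized against its own distribution.
z_1940 = z_scores(pop_1940)
z_2010 = z_scores(pop_2010)

for label, z in (("1940", z_1940[-1]), ("2010", z_2010[-1])):
    print(f"Largest county in {label}: z = {z:.2f}")
```

Because the hypothetical 2010 list is simply the 1940 list multiplied by four, the largest county receives the same z-score in both years: it grew in absolute terms but kept the same standing relative to its peers.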

Another method is to divide each value by the maximum in its distribution. As a GIS example, suppose an analysis produces parcels whose areas range from 2 to 1250 acres. The parcel with an area of 2 acres has a normalized value of 2/1250 = 0.0016, and the parcel with the maximum area of 1250 acres has a normalized value of 1250/1250 = 1. By doing this for each area, all parcels are brought onto one scale with a maximum value of 1, and the largest parcel receives the maximum score.
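The parcel arithmetic above can be sketched in a few lines. The function name `normalize_by_max` is hypothetical, and the two middle parcel areas are invented for illustration; only the minimum of 2 acres and the maximum of 1250 acres come from the example.

```python
def normalize_by_max(values):
    """Divide each value by the largest value so the maximum becomes 1."""
    max_value = max(values)
    if max_value <= 0:
        raise ValueError("the maximum must be positive")
    return [v / max_value for v in values]


# Parcel areas in acres; 310 and 875 are made-up intermediate values.
parcel_acres = [2, 310, 875, 1250]
scores = normalize_by_max(parcel_acres)

for acres, score in zip(parcel_acres, scores):
    print(f"{acres:>5} acres -> {score:.4f}")
```

Note that this method fixes only the top of the scale at 1. The smallest value is not mapped to 0 unless the raw minimum is itself 0.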

## Field Normalization

Suppose a choropleth map of census data shows the number of people per census tract. Large rural tracts may then appear to hold more people than small urban tracts, simply because they cover more area. Normalizing the population by tract area, to show people per square mile or people per hectare, removes the effect of tract size. The normalized map represents the data more faithfully and shows that people are far more concentrated in urban areas than in rural ones.
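As a rough illustration of this effect, the sketch below converts tract populations into densities. The tract names, populations, and areas are hypothetical values chosen to make the contrast obvious, and `population_density` is an illustrative helper, not part of any GIS library.

```python
def population_density(population, area):
    """Return people per unit area (an intensive variable)."""
    if area <= 0:
        raise ValueError("area must be positive")
    return population / area


# Hypothetical tracts: the rural tract has more people but far more land.
tracts = [
    {"name": "rural tract", "population": 6000, "area_sq_mi": 300.0},
    {"name": "urban tract", "population": 4000, "area_sq_mi": 0.5},
]

for tract in tracts:
    tract["density"] = population_density(
        tract["population"], tract["area_sq_mi"]
    )
    print(f'{tract["name"]}: {tract["population"]} people, '
          f'{tract["density"]:.0f} per sq mi')
```

Mapped by raw count, the rural tract would be shaded darker. Mapped by density, the urban tract stands out, which matches how people are actually distributed on the ground.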

A common mistake made by people new to mapping is comparing areas based on a count statistic such as the number of people who fall into a particular category (e.g., marital status, age, ethnicity). This is not a very meaningful analysis because areas are usually arbitrary in size, and larger areas typically will have more people. Normalizing data factors out the size of areas by transforming counts (measures of magnitude) into ratios (measures of intensity). ^{[1]}

While variables are often normalized against area, this is not the only option. A marital status variable could be normalized against the population over age 15, household composition (e.g., one-person households) could be normalized against total households, and household income could be normalized against total households. It is important to remember that just because a ratio can be created from the data does not mean it is a meaningful one. Normalizing against the wrong variable can easily misrepresent the data.
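To show how the choice of denominator changes the result, the sketch below normalizes a marital status count against two different base populations. All figures are hypothetical, and `rate` is an illustrative helper name.

```python
def rate(count, base):
    """Return count as a share of base, where base is the chosen denominator."""
    if base <= 0:
        raise ValueError("base must be positive")
    return count / base


# Hypothetical tract; values are illustrative, not census data.
tract = {
    "married": 3_000,
    "total_population": 10_000,
    "population_15_plus": 7_500,
}

# Denominator includes children, who are not part of the relevant group.
share_of_everyone = rate(tract["married"], tract["total_population"])

# Denominator restricted to the population over age 15.
share_of_adults = rate(tract["married"], tract["population_15_plus"])

print(f"Married share of total population: {share_of_everyone:.0%}")
print(f"Married share of population 15+:   {share_of_adults:.0%}")
```

The first ratio is lower only because the denominator includes children. Comparing tracts on that ratio would partly measure how many children live in each tract rather than marital status.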