Multiple Regression

From wiki.gis.com

In statistics, multiple regression is a type of regression analysis that estimates the strength and form of the relationship between one dependent variable and several independent variables. It acknowledges that several factors can operate simultaneously, as is often the case in geography.

Regression analysis is used for prediction, for forecasting, and to understand how the independent and dependent variables relate to each other. Each independent variable should correlate with the dependent variable, but high multicollinearity (correlation between independent variables[1]) should not exist.

Equation

The multiple regression model is represented mathematically as an algebraic relationship between a response variable (Y) and two or more predictor variables (X1, X2, ...). Both the response and the predictors should be quantitative, that is, have meaningful numerical values.

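\hat{y} = a + b_{1}x_{1} + b_{2}x_{2} + \cdots + b_{p}x_{p}
  • a = Constant (the intercept)
  • x_{1}, ..., x_{p} = Independent variables
  • b_{1}, ..., b_{p} = Slopes (regression coefficients), one per independent variable
  • \hat{y} = Predicted value of the dependent variable
  • p = The number of independent variables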
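As a rough illustration of how the constant a and the slopes b_1, ..., b_p can be estimated, the sketch below fits the equation by ordinary least squares using the normal equations. It is a minimal example in plain Python, not a production routine: the function names (solve_linear_system, fit_multiple_regression, predict) and the sample data are hypothetical, and the data are generated exactly from y = 1 + 2*x1 + 3*x2 so that the fit can be checked by eye.

```python
def solve_linear_system(A, rhs):
    """Solve A z = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    # Augmented matrix [A | rhs] as floats.
    M = [list(map(float, row)) + [float(v)] for row, v in zip(A, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors may be perfectly collinear")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution.
    z = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (M[i][n] - s) / M[i][i]
    return z


def fit_multiple_regression(rows, y):
    """Return (a, [b_1, ..., b_p]) minimising the sum of squared residuals."""
    # A leading 1 in every row makes the first coefficient the constant a.
    design = [[1.0] + [float(x) for x in row] for row in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    coef = solve_linear_system(xtx, xty)
    return coef[0], coef[1:]


def predict(a, b, row):
    """Evaluate y_hat = a + b_1*x_1 + ... + b_p*x_p for one observation."""
    return a + sum(bi * xi for bi, xi in zip(b, row))


# Hypothetical data: two predictors, response built as y = 1 + 2*x1 + 3*x2.
rows = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
y = [1 + 2 * x1 + 3 * x2 for x1, x2 in rows]

a, b = fit_multiple_regression(rows, y)
y_hat = predict(a, b, [3, 2])
```

With noise-free data like this, the recovered a and b should match the generating values up to floating-point error. The solver raises an error when the system is singular, which is what happens when one predictor is an exact linear combination of the others.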

Assumptions

When dealing with multiple regression we assume the following:

  • The relationship between each independent variable and the dependent variable is linear.
  • The residuals have mean zero.
  • The residuals are independent.
  • For each value of an independent variable, the residuals have a normal distribution.
  • There is no multicollinearity.
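Two of these assumptions can be given a quick numerical check once a model has been fitted. The sketch below computes the residuals, their mean, and the Durbin-Watson statistic. The Durbin-Watson statistic is not mentioned in this article; it is used here as one common check for first-order dependence between successive residuals, and it is only meaningful when the observations have a natural order. Values near 2 suggest little first-order autocorrelation. The function names and the observed and predicted values are hypothetical.

```python
def residuals(observed, predicted):
    """Residual = observed value minus value predicted by the model."""
    return [y - y_hat for y, y_hat in zip(observed, predicted)]


def mean(values):
    return sum(values) / len(values)


def durbin_watson(res):
    """Sum of squared successive differences divided by sum of squared residuals."""
    numerator = sum((res[i] - res[i - 1]) ** 2 for i in range(1, len(res)))
    denominator = sum(e * e for e in res)
    return numerator / denominator


# Hypothetical observed values and model predictions, in observation order.
observed = [2.0, 4.1, 5.9, 8.2, 9.8]
predicted = [2.0, 4.0, 6.0, 8.0, 10.0]

res = residuals(observed, predicted)
residual_mean = mean(res)   # should be close to zero
dw = durbin_watson(res)     # close to 2 suggests little first-order dependence
```

In this made-up example the residuals alternate in sign, so the statistic comes out well above 2, which hints at negative dependence between neighbouring residuals. For spatial data, where observations have no single ordering, a spatial autocorrelation measure would be the more appropriate check.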

Multicollinearity

With multiple regression, it is assumed that there is no multicollinearity among the independent variables. This means that the correlations between independent variables should not be strong, and that no independent variable can be linearly predicted from the others with a non-trivial degree of accuracy. If multicollinearity does exist among the independent variables in a multiple regression model, the estimated coefficients lack precision and the model could lead to inaccurate results. There are numerous ways to deal with multicollinearity in a multiple regression problem. Some of the more common solutions include:

  • Obtaining more precise data [2]
  • Standardizing your independent variables [3]
  • Dropping the variables in question altogether [4]
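Before choosing one of these remedies, it helps to see which independent variables are strongly correlated. The sketch below computes the Pearson correlation for every pair of predictors and flags pairs whose absolute correlation reaches a threshold. The threshold of 0.8, the function names, and the data values are assumptions made for illustration only. A pairwise check like this can also miss multicollinearity that involves three or more variables at once.

```python
from itertools import combinations
from math import sqrt


def pearson(u, v):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    spread_u = sqrt(sum((a - mean_u) ** 2 for a in u))
    spread_v = sqrt(sum((b - mean_v) ** 2 for b in v))
    return cov / (spread_u * spread_v)


def flag_collinear_pairs(predictors, threshold=0.8):
    """Return (name_1, name_2, r) for each pair with |r| >= threshold.

    predictors maps a variable name to its list of observed values.
    The default threshold is an illustrative assumption, not a standard.
    """
    flagged = []
    for (name_1, vals_1), (name_2, vals_2) in combinations(predictors.items(), 2):
        r = pearson(vals_1, vals_2)
        if abs(r) >= threshold:
            flagged.append((name_1, name_2, r))
    return flagged


# Hypothetical values for three independent variables at five locations.
predictors = {
    "january_temp": [1, 2, 3, 4, 5],
    "july_temp": [2, 4, 6, 8, 11],
    "humidity": [3, 1, 4, 1, 5],
}

flagged = flag_collinear_pairs(predictors)
```

In this made-up data set only the two temperature variables are flagged, so one of them would be a candidate for dropping.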

Multiple Regression in Geography

Figure: A multiple regression of population percentage increase (the dependent variable) on several independent variables: January mean temperature, July mean temperature, urban influence, census region, and area humidity.

In geographic analysis, regression allows for the exploration of spatial patterns. It can help to explain the factors that influence the patterns that emerge when variables are analyzed in a GIS[5]. There are numerous ways to use multiple regression in geographic studies. Some example variables that can be examined include:

  • Prices of houses in a community. [6]
  • Suitable habitat for an animal species.
  • The reported cases of a certain disease throughout an area.
  • Population increase in an area.

References

  1. Rogerson, Peter (2010). Statistical Methods for Geography: A Student's Guide. Thousand Oaks, California: SAGE Publications Inc. pp. 225-227.
  2. Rogerson, Peter (2010). Statistical Methods for Geography: A Student's Guide. Thousand Oaks, California: SAGE Publications Inc. p. 227.
  3. Rogerson, Peter (2010). Statistical Methods for Geography: A Student's Guide. Thousand Oaks, California: SAGE Publications Inc. p. 227.
  4. Rogerson, Peter (2010). Statistical Methods for Geography: A Student's Guide. Thousand Oaks, California: SAGE Publications Inc. p. 227.
  5. Regression Analysis Basics. Retrieved from: http://resources.esri.com/help/9.3/arcgisdesktop/com/gp_toolref/spatial_statistics_toolbox/regression_analysis_basics.htm
  6. Rogerson, Peter (2010). Statistical Methods for Geography: A Student's Guide. Thousand Oaks, California: SAGE Publications Inc. p. 225.

See also