Exploratory data analysis

From wiki.gis.com

In statistics, Exploratory Data Analysis (EDA) is an approach to analyzing data sets in order to summarize their main characteristics. EDA can make use of a statistical model, but it is mainly used to see what the data can tell us beyond formal modeling or hypothesis testing.

Objectives

Exploratory data analysis can involve a variety of techniques. The following tasks are among the most commonly performed when exploring data:[1]

  • Examine the distribution of your data
  • Look for global and local outliers
  • Look for global trends
  • Examine local variation
  • Examine spatial autocorrelation
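The first two tasks can be sketched numerically. The Python below computes a five-number summary (the basis of a box plot) and flags values lying more than 1.5 times the interquartile range beyond the quartiles (Tukey's fences) as candidate global outliers. It is a minimal sketch: the helper names and the rainfall values are hypothetical, local outliers would additionally require neighborhood information, and quartile conventions differ between packages, so values close to a fence may be classified differently elsewhere.

```python
import statistics

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for a list of numbers."""
    data = sorted(values)
    # "inclusive" interpolates linearly between order statistics;
    # other software may use a different quartile rule.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return data[0], q1, median, q3, data[-1]

def tukey_fences(values, k=1.5):
    """Return the lower and upper fences and the values lying outside them."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    outliers = [v for v in values if v < lower or v > upper]
    return lower, upper, outliers

# Hypothetical sample: annual rainfall (cm) at ten weather stations.
rainfall = [62, 70, 71, 68, 75, 66, 73, 69, 140, 72]
print(five_number_summary(rainfall))
print(tukey_fences(rainfall))
```

Here the station reporting 140 cm falls above the upper fence and would merit a closer look before any modeling.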

Not every step is needed in every case, but together they can help lead to the following:[2]

  • Maximizing insight into a data set
  • Uncovering underlying structure
  • Extracting important variables
  • Determining optimal factor settings

Exploring the data with these goals in mind can help analysts formulate hypotheses that may motivate new data collection and experiments. Becoming familiar with the data being used also supports stronger tests and models, and therefore results that are more accurate and can be stated with more confidence.

EDA and Geography

Exploratory Data Analysis allows us to interpret spatial relationships, as in this GeoDa graphic of homicide rates in the Southern United States in the 1990s.[3]

EDA can be applied to spatial data to observe spatial relationships. It is an effective way to analyze the data behind a map and to decide how best to portray the phenomena it shows in a logical and expressive way.[4] Several software tools and reference works discuss EDA or graphically represent spatial statistical relationships.[5] These include:

  • NIST e-Handbook
  • ArcGIS
  • CAATMOG 49
  • GeoDa workbook
  • GeoViz toolkit

Exploratory Data Analysis helps the user determine whether their data have a spatial component. Tobler’s first law of geography states that "everything is related to everything else, but near things are more related than distant things," so attribute values in one region may resemble, or be influenced by, those in nearby regions. EDA can help visualize these relationships.
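The tendency Tobler's law describes is often summarized with global Moran's I, I = (N / W) × Σi Σj wij (xi − x̄)(xj − x̄) / Σi (xi − x̄)², where N is the number of regions, wij the spatial weights and W their sum. Values above the null expectation of −1/(N − 1) suggest that similar values cluster; values below it suggest that neighboring values tend to differ. The sketch below assumes binary contiguity weights (wij = 1 for adjacent regions, 0 otherwise); the `morans_i` helper and the five-region example are hypothetical, and it omits the significance testing (for example, by permutation) that dedicated tools perform.

```python
def morans_i(values, neighbors):
    """Global Moran's I with binary contiguity weights.

    values:    one attribute value per region
    neighbors: neighbors[i] lists the indices of regions adjacent to region i
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    # W: sum of all weights, i.e. the number of directed neighbor links.
    w_total = sum(len(nbrs) for nbrs in neighbors)
    # Sum of cross-products of deviations over every neighboring pair (i, j).
    cross = sum(dev[i] * dev[j] for i in range(n) for j in neighbors[i])
    squares = sum(d * d for d in dev)
    return (n / w_total) * (cross / squares)

# Hypothetical example: five regions in a row, each adjacent to the next.
chain = [[1], [0, 2], [1, 3], [2, 4], [3]]
print(morans_i([1, 2, 3, 4, 5], chain))  # smooth gradient: positive I
print(morans_i([1, 5, 1, 5, 1], chain))  # alternating pattern: negative I
```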

History

The foundational work on Exploratory Data Analysis (EDA) is John Tukey's Exploratory Data Analysis (1977).[6] The approach was developed further in other noteworthy publications, such as Mosteller and Tukey's Data Analysis and Regression (1977),[7] and it has since become a widely used way to analyze data sets.

Techniques

Example of a scatterplot matrix, in which each cell contains a plot of one pair of variables. All plots in a row share the same variable on the Y axis, and all plots in a column share the same variable on the X axis, which helps reveal relationships among the variables. Source: Statgraphics Centurion[8]
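A correlation matrix is a compact numeric companion to a scatterplot matrix: the entry for a pair of variables holds the Pearson correlation coefficient of the data shown in the corresponding panel. The sketch below uses only the Python standard library; the variable names and values are hypothetical. Pearson's r measures only linear association, so the plots themselves remain worth inspecting.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_matrix(columns):
    """Return {(name_a, name_b): r} for every ordered pair of named columns."""
    names = list(columns)
    return {(a, b): pearson(columns[a], columns[b]) for a in names for b in names}

# Hypothetical data: three variables measured at the same five sites.
data = {
    "elevation": [100, 200, 300, 400, 500],
    "temperature": [20.0, 19.4, 18.7, 18.1, 17.4],
    "rainfall": [50, 55, 52, 60, 58],
}
for (a, b), r in correlation_matrix(data).items():
    print(f"{a:>11} vs {b:<11} r = {r:+.2f}")
```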

There are many tools that are helpful for EDA. Some of these tools are listed below:

  • Scatter plots
  • Histograms
  • Stem-and-leaf plots
  • Box plots
  • Parallel coordinates plots
  • Run charts
  • Multi-variability charts
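Of these tools, the stem-and-leaf plot, popularized by Tukey, is simple enough to produce as plain text. The sketch below assumes non-negative integer data and splits each value into a stem (all digits except the last) and a leaf (the last digit). The `stem_and_leaf` function and the exam scores are hypothetical.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return text lines of a stem-and-leaf plot for non-negative integers.

    Each value v is split into stem = v // 10 and leaf = v % 10.
    """
    groups = defaultdict(list)
    for v in sorted(values):
        groups[v // 10].append(v % 10)
    lines = []
    # Include empty stems so that gaps in the distribution stay visible.
    for stem in range(min(groups), max(groups) + 1):
        leaves = "".join(str(leaf) for leaf in groups.get(stem, []))
        lines.append(f"{stem:>3} | {leaves}")
    return lines

# Hypothetical exam scores.
scores = [42, 55, 57, 61, 63, 63, 68, 70, 74, 79, 81, 97]
print("\n".join(stem_and_leaf(scores)))
```

Like a histogram turned on its side, the plot shows the shape of the distribution, while still retaining every individual value.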

Software

  • SpaceStat - a program that provides an extensive suite of spatiotemporal and statistical tools, including exploratory spatial data analysis, spatial econometric analysis, and the creation of spatial weights sets and variogram models.
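Variogram modeling, as mentioned for SpaceStat, typically starts from an empirical semivariogram: for each distance class h, γ(h) = (1 / 2N(h)) × Σ (zi − zj)², summed over the N(h) point pairs whose separation falls in that class. The sketch below is a plain-Python illustration of that formula, not SpaceStat's implementation or API; the function name, the nearest-lag binning rule, and the transect data are assumptions made for this example.

```python
import math
from collections import defaultdict

def empirical_semivariogram(points, values, bin_width):
    """Return {lag: gamma} from point coordinates and attribute values.

    points:    list of (x, y) coordinates
    values:    attribute value measured at each point
    bin_width: spacing between lag classes (assumed binning rule: each pair
               goes to the lag k * bin_width nearest its separation distance)
    """
    sums = defaultdict(float)  # sum of squared differences per lag class
    counts = defaultdict(int)  # N(h): number of point pairs per lag class
    n = len(points)
    for i in range(n):
        for j in range(i + 1, n):  # visit each unordered pair once
            h = math.dist(points[i], points[j])
            k = math.floor(h / bin_width + 0.5)
            sums[k] += (values[i] - values[j]) ** 2
            counts[k] += 1
    return {k * bin_width: sums[k] / (2 * counts[k]) for k in sorted(counts)}

# Hypothetical transect: four equally spaced samples.
transect = [(0, 0), (1, 0), (2, 0), (3, 0)]
print(empirical_semivariogram(transect, [1.0, 2.0, 4.0, 7.0], bin_width=1.0))
```

Semivariance that grows with lag distance, as in this example, indicates that nearby samples are more alike than distant ones; a variogram model is then fitted to these points.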

Notes

  1. "Exploratory Spatial Data Analysis (ESDA)". ArcGIS Resources. http://resources.arcgis.com/en/help/main/10.1/index.html#//00310000000m000000
  2. "What Is EDA?" Engineering Statistics Handbook. http://www.itl.nist.gov/div898/handbook/eda/section1/eda11.htm
  3. Anselin, Luc, Ibnu Syabri and Youngihn Kho (2006). GeoDa: An Introduction to Spatial Data Analysis. Geographical Analysis 38 (1), 5-22.
  4. ESRI, "Exploratory Spatial Data Analysis (ESDA)", http://desktop.arcgis.com/en/arcmap/latest/extensions/geostatistical-analyst/exploratory-spatial-data-analysis-esda-.htm, Accessed September 24, 2017.
  5. de Smith, M. J., Goodchild, M. F., & Longley, P. A. (2015). Geospatial analysis: A comprehensive guide to principles, techniques and software tools (5th ed.). http://www.spatialanalysis.com
  6. Tukey, John (1977), Exploratory Data Analysis, Addison-Wesley.
  7. Mosteller, Frederick and Tukey, John (1977), Data Analysis and Regression, Addison-Wesley.
  8. http://www.statgraphics.com/eda.htm

References

"What Is EDA?" Engineering Statistics Handbook. N.p., n.d. http://www.itl.nist.gov/div898/handbook/eda/section1/eda11.htm

de Smith, M. J., Goodchild, M. F., & Longley, P. A. (2015). Geospatial analysis: A comprehensive guide to principles, techniques and software tools (5th ed.). Retrieved September 1, 2016, from http://www.spatialanalysis.com

Miller, H. J. (2004) "Tobler's First Law and spatial analysis". Annals of the Association of American Geographers, 94, 284–289.