Exploratory data analysis
Exploratory data analysis (EDA) is a statistical approach to analyzing data sets in order to summarize their main characteristics. A statistical model may or may not be used, but EDA is mainly concerned with what the data can reveal beyond formal modeling or hypothesis testing.
Objectives
Exploratory data analysis can involve a variety of techniques. The most common tasks when exploring data are: [1]
- Examine the distribution of your data
- Look for global and local outliers
- Look for global trends
- Examine local variation
- Examine spatial autocorrelation
Not all of these steps are necessary in every case, but they can help with the following: [2]
- Maximizing insight into a data set
- Uncovering underlying structure
- Extracting important variables
- Determining optimal factor settings
Exploring the data with these goals in mind can help formulate hypotheses that motivate new data collection and experiments. Becoming familiar with the data being used can lead to more appropriate tests and models, and in turn to more accurate and reliable results.
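The first two tasks listed above, examining the distribution and looking for global outliers, can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: it assumes the common 1.5 × IQR fence rule for flagging outliers, and the `elevations` sample and both function names are hypothetical.

```python
import statistics

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for a sequence of numbers."""
    data = sorted(values)
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return data[0], q1, q2, q3, data[-1]

def tukey_outliers(values, k=1.5):
    """Return the values lying more than k * IQR beyond the quartiles."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical sample: elevations (m) recorded at ten survey points.
elevations = [102, 98, 105, 110, 97, 101, 99, 250, 104, 100]

summary = five_number_summary(elevations)
outliers = tukey_outliers(elevations)
print("five-number summary:", summary)
print("flagged as outliers:", outliers)
```

The fence rule only finds global outliers. A local outlier, a value that is unusual relative to its neighbors but not relative to the whole data set, needs a comparison that takes location into account.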
EDA and Geography
- NIST e-Handbook
- ArcGIS
- CAATMOG 49
- GeoDa workbook
- GeoViz toolkit
Exploratory data analysis helps the user identify whether the data have a spatial component. According to Tobler’s first law of geography, everything is related to everything else, but near things are more related than distant things, so attribute values in one region may be influenced by those in nearby regions. EDA can help visualize these relationships.
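One way to put a number on the spatial relationship described above is global Moran's I, a widely used measure of spatial autocorrelation. The text does not name it; it is used here only as an illustration. The sketch below assumes a hypothetical layout of six regions arranged in a line, with binary weights of 1 between immediate neighbors; the function names and sample values are likewise invented.

```python
def morans_i(values, weights):
    """Global Moran's I for values x_i and a spatial weights matrix w_ij."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)  # sum of all weights
    cross = sum(weights[i][j] * dev[i] * dev[j]
                for i in range(n) for j in range(n))
    ss = sum(d * d for d in dev)
    return (n / s0) * (cross / ss)

def chain_weights(n):
    """Binary contiguity weights for n regions in a line (hypothetical layout)."""
    return [[1 if abs(i - j) == 1 else 0 for j in range(n)] for i in range(n)]

w = chain_weights(6)
clustered = [1, 2, 3, 4, 5, 6]    # similar values sit next to each other
alternating = [1, 6, 1, 6, 1, 6]  # each value differs sharply from its neighbors

i_clustered = morans_i(clustered, w)
i_alternating = morans_i(alternating, w)
print("clustered:", i_clustered)
print("alternating:", i_alternating)
```

The clustered sequence gives a positive value (neighbors resemble each other, as Tobler's law suggests), while the alternating sequence gives a negative value. In practice the weights matrix would come from the actual geometry of the regions rather than from a simple chain.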
History
The original work on exploratory data analysis is Exploratory Data Analysis, Tukey (1977). [6] Over the years it has benefited from other noteworthy publications such as Data Analysis and Regression, Mosteller and Tukey (1977), [7] and it has gained popularity as a way to analyze data sets.
Techniques

There are many tools that are helpful for EDA. Some of these tools are listed below:
- Scatter plots
- Histograms
- Stem-and-leaf plots
- Box plots
- Parallel coordinates
- Run charts
- Multi-variability charts
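As a small illustration of one tool from the list, a stem-and-leaf plot can be built with nothing more than integer division. The sketch below assumes non-negative integers, takes the tens as stems and the units digit as leaves, and uses a hypothetical `scores` sample.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return the lines of a stem-and-leaf display for non-negative integers,
    using the tens as stems and the units digit as leaves."""
    stems = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        stems[stem].append(leaf)
    lines = []
    # Include empty stems so that gaps in the data stay visible.
    for stem in range(min(stems), max(stems) + 1):
        leaves = "".join(str(leaf) for leaf in stems.get(stem, []))
        lines.append(f"{stem:>2} | {leaves}")
    return lines

# Hypothetical sample of ten integer scores.
scores = [12, 15, 21, 23, 23, 28, 30, 34, 35, 47]
plot = stem_and_leaf(scores)
print("\n".join(plot))
```

Unlike a histogram, the display keeps every original value readable while still showing the shape of the distribution.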
Software
- SpaceStat - a program that provides an extensive suite of spatiotemporal and statistical tools including: exploratory spatial data analysis; spatial econometric analyses; and the creation of spatial weights sets and variogram models.
Notes
- [1] "Exploratory Spatial Data Analysis (ESDA)". ArcGIS Resources. http://resources.arcgis.com/en/help/main/10.1/index.html#//00310000000m000000
- [2] http://www.itl.nist.gov/div898/handbook/eda/section1/eda11.htm
- [3] Anselin, Luc, Ibnu Syabri and Youngihn Kho (2006). GeoDa: An Introduction to Spatial Data Analysis. Geographical Analysis 38 (1), 5-22.
- [4] ESRI, "Exploratory Spatial Data Analysis (ESDA)", http://desktop.arcgis.com/en/arcmap/latest/extensions/geostatistical-analyst/exploratory-spatial-data-analysis-esda-.htm, Accessed September 24, 2017.
- [5] John, D. S., Goodchild, M. F., & Longley, P. (2015). Geospatial analysis: A comprehensive guide to principles, techniques and software tools (5th ed.). http://www.spatialanalysis.com
- [6] Tukey, John (1977), Exploratory Data Analysis, Addison-Wesley.
- [7] Mosteller, Frederick and Tukey, John (1977), Data Analysis and Regression, Addison-Wesley.
- [8] http://www.statgraphics.com/eda.htm
References
"What Is EDA?" Engineering Statistics Handbook. N.p., n.d. http://www.itl.nist.gov/div898/handbook/eda/section1/eda11.htm
John, D. S., Goodchild, M. F., & Longley, P. (2015). Geospatial analysis: A comprehensive guide to principles, techniques and software tools (5th ed.). Retrieved September 1, 2016, from http://www.spatialanalysis.com
Miller, H. J. (2004) "Tobler's First Law and spatial analysis". Annals of the Association of American Geographers, 94, 284–289.