Geocoding

From wiki.gis.com

Geocoding is the process of assigning a location, in the form of geographic coordinates (often expressed as latitude and longitude), to a feature by matching it against other geographic data, such as street addresses or ZIP codes (postal codes). With geographic coordinates, the features can be mapped and entered into Geographic Information Systems, or the coordinates can be embedded into media such as digital photographs via geotagging.

Reverse geocoding is the opposite: finding an associated textual location, such as a street address, from geographic coordinates.

A geocoder is a piece of software or a (web) service that helps in this process.

Address interpolation

A simple method of geocoding is address interpolation. This method uses data from a street geographic information system in which the street network is already mapped within the geographic coordinate space. Each street segment is attributed with address ranges (e.g. the house numbers from one end of the segment to the other). Geocoding takes an address and matches it to a street and a specific segment (such as a block, in towns that use the "block" convention). It then interpolates the position of the address within the range along the segment.

Example

Take for example: 742 Evergreen Terrace

Let's say that this segment (for instance, a block) of Evergreen Terrace runs from 700 to 799. Even-numbered addresses would fall on one side (e.g. west side) of Evergreen Terrace, with odd-numbered addresses on the other side (e.g. east side). 742 Evergreen Terrace would (probably) be located slightly less than halfway up the block, on the west side of the street. A point would be mapped at that location along the street, perhaps offset some distance to the west of the street centerline.
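
The example above can be sketched in a few lines of Python. This is a minimal illustration, not the method of any particular geocoder: the Segment class, its field names, and the segment coordinates are assumptions made for the sketch, and the segment is treated as a straight line in a planar coordinate system.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Segment:
    """One street segment with its address range (hypothetical schema)."""
    street: str
    low: int                    # lowest house number on the segment
    high: int                   # highest house number on the segment
    start: Tuple[float, float]  # (x, y) at the low-numbered end
    end: Tuple[float, float]    # (x, y) at the high-numbered end
    even_side: str = "west"     # side of the street with even numbers
    odd_side: str = "east"      # side of the street with odd numbers


def interpolate(segment: Segment, house_number: int):
    """Return (fraction along segment, (x, y) on the centerline, side of street)."""
    if not segment.low <= house_number <= segment.high:
        raise ValueError("house number outside the segment's address range")
    span = segment.high - segment.low
    # Assumes house numbers are spread evenly along the segment.
    fraction = (house_number - segment.low) / span if span else 0.5
    x = segment.start[0] + fraction * (segment.end[0] - segment.start[0])
    y = segment.start[1] + fraction * (segment.end[1] - segment.start[1])
    side = segment.even_side if house_number % 2 == 0 else segment.odd_side
    return fraction, (x, y), side


# Made-up coordinates: a 100-unit block running south to north.
block = Segment("Evergreen Terrace", 700, 799, start=(0.0, 0.0), end=(0.0, 100.0))
fraction, point, side = interpolate(block, 742)
print(f"{fraction:.3f} of the way along, on the {side} side, at {point}")
```

For 742 the fraction is 42/99, or about 42% of the way along the block, on the even (west) side. The returned point lies on the centerline; a real geocoder would typically offset it toward the chosen side.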

Complicating factors

However, this process is not always as straightforward as in this example.

Difficulties arise when

  • distinguishing between ambiguous addresses such as 742 Evergreen Terrace and 742 W Evergreen Terrace.
  • attempting to geocode new addresses for a street that is not yet added to the geographic information system database.

While there might be a 742 Evergreen Terrace in Springfield, there might also be a 742 Evergreen Terrace in Shelbyville. Asking for the city name (and state, province, country, etc. as needed) can solve this problem. Some situations require the use of postal codes or district names for disambiguation. For example, there are multiple 100 Washington Streets in Boston, Massachusetts[1] because several cities have been annexed without changing street names.

Finally, several caveats on using interpolation:

  • The typical attribution of a street segment assumes that all "even" numbered parcels are on one side of the segment, and all "odd" numbered parcels are on the other. This is often not true in real life.
  • Interpolation assumes that the given parcels are evenly distributed along the length of the segment. This is almost never true in real life; it is not uncommon for a geocoded address to be off by several thousand feet.
  • Segment information (esp. from sources such as TIGER) includes a maximum upper bound for addresses and is interpolated as though the full address range is in use. For example, a segment (block) might have a listed range of 100-199, while the last address at the end of the block is 110. In this case, address 110 would be geocoded about 10% of the distance down the segment rather than near the end.
  • Most interpolation implementations produce a point as the resulting "address" location. In reality, a physical address extends along the segment. Consider geocoding the address of a shopping mall: the physical lot may run quite some distance along the street segment. It could also be thought of as a two-dimensional polygon that fronts on several different streets or, worse, in cities with multi-level streets, as a three-dimensional shape that meets different streets at several different levels. The interpolation nevertheless treats it as a single point.
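
The address-range caveat can be checked with simple arithmetic. The sketch below compares where address 110 lands when the listed range (100-199) is used against where it would land if the range actually in use (100-110) were known. The function name and the 500-foot block length are assumptions chosen only for illustration, and the "actual" position itself assumes evenly spaced houses.

```python
def fraction_along(house_number, range_low, range_high):
    """Linear position of a house number within an address range (0.0 to 1.0)."""
    return (house_number - range_low) / (range_high - range_low)


SEGMENT_LENGTH_FT = 500.0  # hypothetical block length

listed = fraction_along(110, 100, 199)  # range as listed in the reference data
actual = fraction_along(110, 100, 110)  # range of addresses actually in use
error_ft = (actual - listed) * SEGMENT_LENGTH_FT

print(f"listed range: {listed:.1%}, actual range: {actual:.1%}, error: {error_ft:.0f} ft")
```

With the listed range the address falls about 10% of the way down the block, while the house really sits at the far end, so on this hypothetical block the point is misplaced by roughly 450 feet.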

A very common error is to believe the accuracy ratings of a given map's geocodable attributes. The "accuracy" currently touted by most vendors has no bearing on whether an address is attributed to the correct segment, attributed to the correct "side" of the segment, or placed at an accurate position along that segment. With the geocoding process used for U.S. Census TIGER datasets, 5-7.5% of the addresses may be allocated to a different census tract, while 50% of the geocoded points might be located in a different property parcel.[2]

Because of this, interpolated results should be avoided except in non-critical applications, such as pizza delivery. Interpolated geocoding is usually not appropriate for making authoritative decisions, for instance where life safety is affected. Emergency services, for example, do not make an authoritative decision based on their interpolations; an ambulance or fire truck will always be dispatched regardless of what the map says.

Other techniques

Other means of geocoding might include locating a point at the centroid (center) of a land parcel, if parcel (property) data is available in the geographic information system database. In rural areas or other places lacking high quality street network data and addressing, GPS is useful for mapping a location. For traffic accidents, geocoding to a street intersection or midpoint along a street centerline is a suitable technique. Most highways in developed countries have mile markers to aid in emergency response, maintenance, and navigation.
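
As an illustration of the parcel-centroid technique, the following sketch computes the area-weighted centroid of a simple polygon with the standard shoelace formula. It assumes the parcel boundary is given as planar (projected) x, y vertices in order; the function name and the sample parcel are made up for the example.

```python
def polygon_centroid(vertices):
    """Area-weighted centroid of a simple polygon.

    vertices: list of (x, y) pairs in boundary order, first point not repeated.
    """
    twice_area = 0.0
    cx = 0.0
    cy = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]  # wrap around to close the ring
        cross = x0 * y1 - x1 * y0
        twice_area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if twice_area == 0:
        raise ValueError("degenerate polygon with zero area")
    # Centroid = sum / (6 * area), and area = twice_area / 2.
    return cx / (3 * twice_area), cy / (3 * twice_area)


# Hypothetical rectangular parcel, 40 units wide and 20 units deep.
parcel = [(0.0, 0.0), (40.0, 0.0), (40.0, 20.0), (0.0, 20.0)]
print(polygon_centroid(parcel))
```

For a concave parcel (an L-shaped lot, for instance) the centroid can fall outside the boundary, so some systems may prefer a point guaranteed to lie inside the polygon.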

Address locators that use different reference data types or that require different input information can be combined in a Composite Address Locator. An address is processed through each locator in turn, until a match is found. This method can be helpful if the input address table may have inaccurate addresses or incomplete address information. A common use of a composite address locator is combining a street address locator with a zip code locator. When an address is not matched according to its street address, the zip code can be used to associate this address with a zip code region center point.
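
The fallback behavior of a composite locator can be sketched as a loop over individual locators. This is only an illustration of the idea, not the API of any GIS product: the function names, the lookup tables, the "00000" zip code, and the coordinates are all made up.

```python
from typing import Optional, Tuple

Coordinate = Tuple[float, float]  # (latitude, longitude)

# Hypothetical reference data; the coordinates are invented for illustration.
STREET_ADDRESSES = {("742 EVERGREEN TERRACE", "00000"): (44.0462, -123.0220)}
ZIP_CENTERS = {"00000": (44.0500, -123.0900)}


def street_locator(street: str, zip_code: str) -> Optional[Coordinate]:
    """Match on the full street address; return None when there is no match."""
    return STREET_ADDRESSES.get((street.upper().strip(), zip_code))


def zip_locator(street: str, zip_code: str) -> Optional[Coordinate]:
    """Fall back to the center point of the zip code region."""
    return ZIP_CENTERS.get(zip_code)


LOCATORS = [("street address", street_locator), ("zip code", zip_locator)]


def composite_locate(street: str, zip_code: str, locators=LOCATORS):
    """Try each locator in turn and return the first match with its source."""
    for name, locate in locators:
        result = locate(street, zip_code)
        if result is not None:
            return name, result
    return None  # unmatched by every locator


print(composite_locate("742 Evergreen Terrace", "00000"))
print(composite_locate("1 Misspelled Strete", "00000"))
```

Returning the name of the locator that matched lets downstream users tell a precise street-level match from a coarse zip code center point.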

An alternate name table can be used when a single street segment may have multiple names. For example, the main road through the center of Saratoga Springs, NY is called Broadway, but portions of this road are also associated with State Highway 50, Route 29, Route 9, and Route 9N. In order to account for addresses that use any of these street names, an alternate name table can be used to check for the various names.
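
A minimal sketch of an alternate name table follows, using the Saratoga Springs names from the paragraph above. It simplifies the real situation: in practice the alternate names apply only to particular segments, so a production table would be keyed per segment rather than per street. The dictionary and function names are hypothetical.

```python
# Alternate street name -> primary name used in the reference data.
ALTERNATE_NAMES = {
    "STATE HIGHWAY 50": "BROADWAY",
    "ROUTE 29": "BROADWAY",
    "ROUTE 9": "BROADWAY",
    "ROUTE 9N": "BROADWAY",
}


def primary_street_name(name: str) -> str:
    """Normalize case and spacing, then map an alternate name to the primary one."""
    key = " ".join(name.upper().split())
    return ALTERNATE_NAMES.get(key, key)


print(primary_street_name("route 9N"))     # BROADWAY
print(primary_street_name("Broadway"))     # BROADWAY
print(primary_street_name("Main Street"))  # MAIN STREET
```

Names that are not in the table pass through unchanged, so the lookup can be applied to every input address before matching.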

It is also possible to use a combination of these geocoding techniques - using a particular technique for certain cases and situations and other techniques for other cases.

Solutions for complicating factors and geocoding software

Software packages vary in their ability to resolve the issues described in the Complicating factors section above.

Google Earth Pro

Google Earth Pro is free software from Google with a capable geocoding engine that can resolve variant forms of an address. For example, it recognizes that "123 St. Street Provo Ut" is the same thing as "123 State Street, Provo, Utah". Detailed instructions, including the data format that the Google Earth Pro geocoding engine requires, are available at: Google Support: Import Addresses.

ArcMap

ArcMap is not free software, but its maker, Esri, specializes in geospatial technologies, and the program has robust ways to resolve complicating factors. There is more information here: ArcMap Geocoding Capabilities


QGIS

Uses

Geocoded locations are useful in many GIS analysis and cartography tasks. Geocoding is common on the web, for services like finding driving directions to or from some address, or finding a list of the geographically nearest store or service locations. Geocoding is one of several methods of obtaining geographic coordinates for geotagging media, such as photographs or RSS items.

Unique Feature Identifier

A Unique Feature Identifier (UFI) is a number used for geocoding that uniquely identifies cities, towns, villages and other geographic features.[3] For example, the Rio Paya (a stream) in Venezuela has the Unique Feature Identifier -955094.[4]

Privacy concerns

The proliferation of, and ease of access to, geocoding (and reverse-geocoding) services raises privacy concerns. For example, in mapping crime incidents, law enforcement agencies aim to balance the privacy rights of victims and offenders with the public's right to know. Law enforcement agencies have experimented with alternative geocoding techniques that allow them to mask some of the locational detail (e.g., address specifics that would lead to identifying a victim or offender). In providing online crime mapping to the public, they also place disclaimers regarding the locational accuracy of points on the map, acknowledging these location masking techniques, and impose terms of use for the information.

List of some geocoding systems

Some of these code systems are free for use, others have different licences.

  • ISO 6709 Standard Representation for Geographic Point Location by Coordinates
  • C-squares - compact encoding of geographic coordinate bounds (latitude-longitude)
  • FIPS country codes (FIPS 10-4), area code, administrative, free
  • FIPS place codes (FIPS 55) U.S. only, free
  • FIPS county codes (FIPS 6-4) US only, free
  • FIPS state codes (FIPS 5-2) US only, free
  • Canadian Location Code, encodes a weather forecast region
  • Geohash, compact string encoding of a geographic coordinate with arbitrary precision, in public domain
  • Georef, a military / air navigation coordinate system for point and area identification
  • IATA airport codes, area/point codes, airports
  • ICAO airport codes, area/point codes, airports
  • IANA country codes similar to ISO 3166-1 alpha-2
  • IOC country codes, area, worldwide
  • ISO 3166 country and subdivision codes
  • ITU-R country codes
  • ITU-T country calling codes
  • ITU-T mobile calling codes
  • Maidenhead Locator System
  • MapDot Protocol: world locations coded into a zone sequence, free
  • MARC country codes
  • Marsden Squares
  • NAC, area codes (area can be indefinitely small)
  • NUTS area code, partially administrative, worldwide: countries, Europe: country to community
  • ONS code, UK only, administrative
  • Postal codes, area, worldwide, country-codes by UPU, free
  • Quarter Degree Grid Cells
  • UN M.49 region codes, area code, continents, countries (like ISO 3166-1 numeric)
  • SALB (Second Administrative Level Boundaries), by UN
  • SGC codes, Canada only, statistical
  • UN/LOCODE, area, administrative, cities
  • UTM
  • WMO squares
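
To show how one of the listed systems turns a coordinate into a code, here is a minimal sketch of Geohash encoding. It follows the publicly documented scheme (alternately halving the longitude and latitude ranges, starting with longitude, and packing every five bits into one base-32 character); the function name and default precision are choices made for this example.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # Geohash alphabet (no a, i, l, o)


def geohash_encode(lat: float, lon: float, precision: int = 9) -> str:
    """Encode a latitude/longitude pair as a Geohash of `precision` characters."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars = []
    bits = 0          # number of bits collected for the current character
    value = 0         # the bits themselves, most significant first
    use_lon = True    # bits alternate, starting with longitude
    while len(chars) < precision:
        rng, coord = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if coord >= mid:
            value = (value << 1) | 1
            rng[0] = mid   # keep the upper half
        else:
            value = value << 1
            rng[1] = mid   # keep the lower half
        use_lon = not use_lon
        bits += 1
        if bits == 5:
            chars.append(BASE32[value])
            bits = 0
            value = 0
    return "".join(chars)


print(geohash_encode(57.64911, 10.40744, precision=5))  # u4pru
```

Each additional character narrows the cell further, so a shorter Geohash is always a prefix of the longer one for the same point.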

List of Geocoding Web Services

See also

References

  1. Google Maps
  2. Ratcliffe, Jerry H. (2001). "On the accuracy of TIGER-type geocoded address data in relation to cadastral and census areal units". International Journal of Geographical Information Science 15 (5). http://jratcliffe.net/papers/Ratcliffe%20(2001)%20On%20the%20accuracy%20of%20TIGER-type%20geocoding.pdf.
  3. GNS Help. National Geospatial-Intelligence Agency Web site. Accessed 17 September 2010.
  4. Geographical name data for Rio Paya, Venezuela. Geographic.org Web site.

External links