Batch Address Locator Testing

From wiki.gis.com

Address Locator Testing

ArcGIS Server 10.2.2

Esri address locator testing found that a properly configured server can perform geocoding quite well. The processing challenge comes with batch geocoding, where you are geocoding thousands of addresses to geo-enrich your database. It is important to properly size and configure your ArcGIS Server for optimum throughput when geocoding a large number of addresses.

Geocoding performance will depend on the address locator engine that you are using (e.g. Esri’s StreetMap Premium Address Locator).
Figure 1.1: Geocodes/hour, average response times, and CPU service times were generated from a series of address locator tests. Throughput was roughly the same across a variety of batch transaction sizes.
In the optimum server configuration, ArcGIS Server loads the geocoding engine (the indexed locator dataset) into memory, and queries complete at memory speed. On a multi-core server, you can set up multiple geocoding instances; each instance loads its own copy of the indexed locator dataset into memory for optimum performance. The server’s total memory is therefore an important consideration: as you add more concurrent geocoding instances, it is possible to exhaust the platform’s physical memory, which can significantly reduce peak throughput.

Esri testing with the StreetMap Premium Address Locator dataset demonstrated that batch geocode processing with a single service instance can sustain a throughput of about 250 geocodes/second (900,000 geocodes/hour). A batch job can be processed in parallel by running additional service instances. Each geocoding service instance loads a dedicated locator dataset into memory to optimize processing throughput.

In our testing, we deployed 10 concurrent service instances and achieved a throughput of about 2,500 geocode requests per second (9,000,000 requests per hour) on a single server. At 10 instances the server’s available physical memory was exhausted, and deploying additional service instances reduced peak throughput because of memory thrashing.
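The figures above follow from simple arithmetic, sketched below. The function and constant names are illustrative only (they are not part of any Esri tool), and the calculation assumes throughput scales linearly with the number of instances, which the test observed only up to the point where memory ran out.

```python
# Per-instance rate reported for the StreetMap Premium test (geocodes/second).
PER_INSTANCE_RATE = 250
SECONDS_PER_HOUR = 3600


def batch_throughput(instances, per_instance_rate=PER_INSTANCE_RATE):
    """Return (geocodes/second, geocodes/hour), assuming linear scaling."""
    per_second = instances * per_instance_rate
    return per_second, per_second * SECONDS_PER_HOUR


if __name__ == "__main__":
    # One instance, then the 10-instance configuration from the test.
    for n in (1, 10):
        per_second, per_hour = batch_throughput(n)
        print(f"{n:>2} instance(s): {per_second:,} geocodes/s, "
              f"{per_hour:,} geocodes/hour")
```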

The test configuration included a single physical server with four (4) Intel Xeon E5-4650 8-core processors (32 cores total) and only 64 GB of physical memory (SPECrate_int2006 throughput = 37.6/core). Memory utilization increased by roughly 10 percent with each additional deployed service instance. Physical memory was exhausted with 10 concurrent geocoding service instances, at which point server CPU utilization had reached only 20 percent. Peak throughput for this test was clearly limited by the amount of physical memory available on the server.
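As a rough check on why memory rather than CPU was the bottleneck, the sketch below derives the approximate per-instance memory footprint from the numbers above and compares the memory-limited instance count with the core count. The names are hypothetical, and treating each instance as needing about one core is an assumption made for the comparison, not something the test states.

```python
from fractions import Fraction

TEST_SERVER_MEMORY_GB = 64                      # physical memory in the test
MEMORY_SHARE_PER_INSTANCE = Fraction(10, 100)   # ~10% of memory per instance
TEST_SERVER_CORES = 32


def instance_footprint_gb(total_gb=TEST_SERVER_MEMORY_GB,
                          share=MEMORY_SHARE_PER_INSTANCE):
    """Approximate memory consumed by one geocoding instance, in GB."""
    return total_gb * share


def memory_limited_instances(total_gb, footprint_gb):
    """Largest whole number of instances that fit in physical memory."""
    return int(total_gb // footprint_gb)


if __name__ == "__main__":
    footprint = instance_footprint_gb()
    by_memory = memory_limited_instances(TEST_SERVER_MEMORY_GB, footprint)
    # Assumption: roughly one instance per core is the CPU-side ceiling.
    by_cpu = TEST_SERVER_CORES
    limit = "memory" if by_memory < by_cpu else "CPU"
    print(f"~{float(footprint):.1f} GB per instance; "
          f"{by_memory} instances fit in memory, {by_cpu} cores available; "
          f"bottleneck: {limit}")
```

Exact fractions are used instead of floats so that the whole-instance count is not thrown off by rounding.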

Increasing physical memory to match the available CPU processing capacity should result in an equivalent (or near-linear) increase in platform throughput. Increasing the server’s physical memory to 256 GB should therefore support 32-40 concurrent service instances, with an estimated peak throughput of around 10,000 geocode requests per second (an estimated 36,000,000 requests per hour) at roughly 80 percent CPU utilization.
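The projection can be reproduced with a short linear extrapolation from the test results. This is only a sketch: the constants come from the test described above (about 6.4 GB and 2 percent CPU per instance, 250 geocodes/second per instance), the function name is made up for illustration, and real scaling may fall short of linear.

```python
from fractions import Fraction

# Values derived from the 64 GB test: 10 instances used all memory and 20% CPU.
FOOTPRINT_GB = Fraction(64, 10)           # memory per instance
CPU_PCT_PER_INSTANCE = Fraction(20, 10)   # CPU utilization per instance
PER_INSTANCE_RATE = 250                   # geocodes/second per instance
SECONDS_PER_HOUR = 3600


def project(memory_gb):
    """Linearly extrapolate capacity for a server with memory_gb of RAM."""
    instances = int(memory_gb // FOOTPRINT_GB)
    per_second = instances * PER_INSTANCE_RATE
    return {
        "instances": instances,
        "cpu_percent": float(instances * CPU_PCT_PER_INSTANCE),
        "geocodes_per_second": per_second,
        "geocodes_per_hour": per_second * SECONDS_PER_HOUR,
    }


if __name__ == "__main__":
    for memory_gb in (64, 256):
        print(memory_gb, "GB ->", project(memory_gb))
```

For 256 GB this lands at the upper end of the 32-40 instance range and matches the 36,000,000 requests per hour estimate at about 80 percent CPU.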

Per-core processing throughput is determined by processor speed: faster processors provide higher per-core throughput rates. If an Intel Xeon E5-4650 with a per-core SPEC rate of 37.6 can produce 250 geocodes/second per core, then a server with a per-core SPEC rate of 45 (about 20% faster) should be able to produce about 300 geocodes/second per core (assuming that there is sufficient physical memory on the server for near-linear scaling).
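The processor-speed adjustment is a simple ratio of per-core SPEC rates, shown in the sketch below. The function name is illustrative, and the calculation assumes throughput is proportional to the per-core SPECrate_int2006 value, as the paragraph above does.

```python
BASELINE_SPEC_PER_CORE = 37.6   # Intel Xeon E5-4650, SPECrate_int2006 per core
BASELINE_RATE = 250             # geocodes/second per core on the baseline


def scaled_rate(spec_per_core,
                baseline_spec=BASELINE_SPEC_PER_CORE,
                baseline_rate=BASELINE_RATE):
    """Estimate geocodes/second per core for a different processor."""
    return baseline_rate * spec_per_core / baseline_spec


if __name__ == "__main__":
    rate = scaled_rate(45)
    speedup = 45 / BASELINE_SPEC_PER_CORE - 1
    print(f"{speedup:.0%} faster -> about {rate:.0f} geocodes/s per core")
```

With a per-core rate of 45, the ratio works out to roughly 299 geocodes/second, which the text rounds to 300.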

The information above is based on our specific test configuration. Actual performance can vary depending on your selected platform and address locator dataset configuration.