GIS Data Administration - 27th Edition (Spring 2010)

System Design Strategies 27th Edition (Spring 2010)

Data management is a primary consideration when developing enterprise GIS architectures. Enterprise GIS normally benefits from efforts to consolidate agency GIS data resources. There are several reasons for supporting data consolidation. These reasons include improving user access to data resources, providing better data protection, and enhancing the quality of the data. Consolidation of IT support resources also reduces hardware cost and the overall cost of system administration.

The simplest and most cost-effective way to manage data resources is to keep one copy of the data in a central data repository and provide required user access to this data to support data maintenance and operational GIS query and analysis needs. This is not always practical, and many system solutions require that organizations maintain distributed copies of the data. Significant compromises may have to be made to support distributed data architectures.

This section provides an overview of data management technology. Several basic data management tasks will be identified along with the current state of technology to support these tasks. These data management tasks include ways to store, protect, back up, move, and manage spatial data.

Ways to Store Spatial Data

Storage technology has evolved over the past 20 years to improve data access and provide better management of available storage resources. Understanding the advantages of each technical solution will help you select the storage architecture that best supports your needs.

Evolution of Storage Area Networks

Figure 4-1 provides an overview of the evolution of traditional storage from internal workstation disk to the storage area network architecture.

Figure 4-1 Advent of the Storage Area Network

Internal Disk Storage. The most elementary storage architecture puts the storage disk on the local machine. Most computer hardware today includes internal disk as the storage medium, and both workstations and servers can be configured with internal disk storage. Because internal disk can be accessed only through the local workstation or server, this can be a significant limitation in a shared server environment: if the server operating system goes down, other systems have no way to access the internal data resources.

File server storage provides a network share that can be accessed by many client applications within the local network. Disk mounting protocols (NFS and CIFS) give applications local-style access over the network to the data on the file server platform. Query processing is performed by the application client, which can generate a high volume of chatty communication between client and server over the network connection.

Database server storage provides query processing on the server platform, which significantly reduces the required chatty network communications. Database software also improves data management by providing better administration and control of the data.

Internal storage can include RAID mirror disk volumes that preserve the data store in the event of a single disk failure. Many servers include storage trays with multiple disk drives for RAID 5 configurations and high-capacity storage needs. Because internal storage is accessible only to its host server, data centers that grew larger in the 1990s often had some servers with too much disk (unused capacity) and others with too little. This made disk volume management a challenge, since data volumes could not be shared between the internal storage of different servers. External storage architectures (Direct Attached Storage, Storage Area Networks, and Network Attached Storage) provide a way for organizations to “break out” of these “silo-based” storage solutions and build a more manageable and adaptive storage architecture.

Direct Attached Storage. A direct attached storage (DAS) architecture places the storage disk on an external storage array platform. Host bus adapters (HBAs) connect the server operating system to the external storage controller using the same block-level protocols used for internal disk storage, so from an application perspective direct attached storage appears and functions the same as internal storage. External storage arrays can be designed with fully redundant components (the system continues operating after any single component failure), so a single storage array product can satisfy high-availability storage requirements.

Direct attached storage arrays often provide several fiber channel connections between the storage controller and the server HBAs. For high availability, it is standard practice to configure two HBA fiber channel connections for each server. Typical direct attached storage solutions provide from 4 to 8 fiber channel connections, so a single direct attached storage array controller can support up to 4 servers, each with redundant fiber channel connections. Multiple disk storage volumes are configured and assigned to specific host servers, and each host server has full access control over its assigned storage volumes. In a server failover scenario, the primary server disk volumes can be reassigned to the failover server.

Storage Area Networks. The difference between direct attached storage and a storage area network is the introduction of a Fiber Channel switch to establish network connectivity between multiple servers and multiple external storage arrays. The storage area network (SAN) improves administrative flexibility for assigning and managing storage resources when you have a growing number of server environments. The server HBAs and the external storage array controllers are connected to the Fiber Channel switch, so any server can be assigned storage volumes from any storage array in the storage farm (connected through the same storage network). Storage protocols are the same as with direct attached or internal storage, so from a software perspective these storage architectures appear the same and are transparent to the application and data interface.

Evolution of Network Attached Storage

Network Attached Storage. By the late 1990s, many data centers were using servers to provide client application access to shared file data sources. High-availability environments required complex clustered failover file servers so that users would retain access to the file share if one file server failed. Hardware vendors responded with a hybrid appliance configuration for these network file shares, called Network Attached Storage (NAS), which combines a file server and storage in a single consolidated, high-availability storage platform. The file server runs a modified operating system that supports both NFS and CIFS disk mount protocols, and a storage array with this modified file server network interface is deployed as a simple network attached storage appliance. The appliance includes a standard Network Interface Card (NIC) connection to the local area network, and client applications connect to its file shares over standard disk mount protocols. Network attached storage provided a very simple way to deploy a high-capacity network file share for access by a large number of UNIX and/or Windows network clients. Figure 4-2 shows the evolution of the Network Attached Storage architecture.

Figure 4-2 Advent of the Network Attached Storage

Network attached storage provides a very effective architecture alternative for supporting network file shares and has become very popular among GIS customers. When GIS data migrated from early file-based data stores (coverages, LIBRARIAN, ArcStorm, shapefiles) to a more database-centric data management environment (ArcSDE geodatabase servers), network attached storage vendors suggested that customers could use a network file share to support database server storage. This approach had limitations. Dedicated data storage volumes must be assigned to, and controlled by, each host database server to avoid data corruption. Database query performance was also slower because of chatty IP disk mount protocols, and IP network bandwidth was lower than in Fiber Channel switch environments (1 Gbps IP networks versus 2 Gbps Fiber Channel networks). As a result, Network Attached Storage was not an optimum alternative to Storage Area Networks for geodatabase server environments. It was, however, an optimum architecture for file-based data sources, and use of NAS technology continued to grow.

Because network attached storage is simple, you can use a standard local area network (LAN) switch to connect your servers and storage solutions; this is a big selling point for NAS proponents. There is considerable competition between Storage Area Network and Network Attached Storage technologies, particularly for the more common database environments. The SAN community claims its architecture has higher-bandwidth connections and uses standard storage block protocols. The NAS community claims it can support your storage network using standard LAN communication protocols and can serve both database servers and network file access clients from the same storage solution.

The network attached storage community eventually introduced the more efficient iSCSI protocol for IP storage networks (SCSI storage protocols carried over IP networks). GIS architectures today include a growing number of file data sources (examples include the ArcGIS Server Image Extension, ArcGIS Server pre-processed 2-D and 3-D map caches, and the file geodatabase). For many GIS operations, a blend of SAN and NAS storage technologies provides the optimum storage solution, and iSCSI over an IP switched storage network offers an attractive compromise for mixed DBMS and file share storage requirements.

Ways to Protect Spatial Data

Enterprise GIS environments depend heavily on GIS data to support a variety of critical business processes. Data is one of the most valuable resources of a GIS, and protecting data is fundamental to supporting critical business operations.

The primary data protection line of defense is provided by the storage solutions. Most storage vendors have standardized on redundant array of independent disks (RAID) storage solutions for data protection. A brief overview of basic storage protection alternatives includes the following:

Just a Bunch of Disks (JBOD): A disk volume with no RAID protection is referred to as a just a bunch of disks (JBOD) configuration. It provides no data protection and no performance optimization.

RAID 0: A disk volume in a RAID 0 configuration provides striping of data across several disks in the storage array. Striping supports parallel disk controller access to data across several disks reducing the time required to locate and transfer the requested data. Data is transferred to array cache once it is found on each disk. RAID 0 striping provides optimum data access performance with no data protection. One hundred percent of the disk volume is available for data storage.

RAID 1: A disk volume in a RAID 1 configuration provides mirror copies of the data on disk pairs within the array. If one disk in a pair fails, data can be accessed from the remaining disk copy. The failed disk can be replaced and data restored automatically from the mirror copy without bringing the storage array down for maintenance. RAID 1 provides optimum data protection with minimum performance gain. Available data storage is limited to 50 percent of the total disk volume, since a mirror disk copy is maintained for every data disk in the array.

RAID 3 and 4: A disk volume in a RAID 3 or RAID 4 configuration supports striping of data across all disks in the array except for one parity disk. A parity bit is calculated for each data stripe and stored on the parity disk. If one of the disks fails, the parity bit can be used to recalculate and restore the missing data. RAID 3 and RAID 4 provide good data protection and make efficient use of the storage volume: every disk except the single parity disk can be used for data storage.

There are some technical differences between RAID 3 and RAID 4, which, for our purposes, are beyond the scope of this discussion. Both of these storage configurations have potential performance disadvantages. The common parity disk must be accessed for each write, which can result in disk contention under heavy peak user loads. Performance may also suffer because of requirements to calculate and store the parity bit for each write. Write performance issues are normally resolved through array cache algorithms on most high-performance disk storage solutions.
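Striping and parity reconstruction can be illustrated with a short Python sketch. It assumes equal-sized blocks, a round-robin block layout, and a single dedicated parity disk (the RAID 3/4 arrangement). The function names are hypothetical, and real array controllers do this in firmware with vendor-specific layouts. Parity here is the bytewise XOR of the blocks in each stripe, so any one missing block equals the XOR of the surviving blocks and the parity block.

```python
from functools import reduce


def stripe(blocks: list[bytes], data_disks: int) -> list[list[bytes]]:
    """RAID 0-style striping: logical block i goes to disk i % data_disks."""
    disks: list[list[bytes]] = [[] for _ in range(data_disks)]
    for i, block in enumerate(blocks):
        disks[i % data_disks].append(block)
    return disks


def xor_blocks(blocks) -> bytes:
    """Bytewise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def parity_disk(disks: list[list[bytes]]) -> list[bytes]:
    """One parity block per stripe (a stripe is one block from each data disk)."""
    return [xor_blocks(row) for row in zip(*disks)]


def rebuild(disks: list[list[bytes]], parity: list[bytes], failed: int) -> list[bytes]:
    """Recompute the failed disk's blocks from the surviving disks plus parity."""
    survivors = [d for i, d in enumerate(disks) if i != failed]
    return [xor_blocks([*row, p]) for row, p in zip(zip(*survivors), parity)]


blocks = [b"AAAA", b"BBBB", b"CCCC", b"DDDD", b"EEEE", b"FFFF"]
disks = stripe(blocks, data_disks=3)      # 3 data disks, 2 stripes
parity = parity_disk(disks)               # stored on the separate parity disk
print(rebuild(disks, parity, failed=1))   # [b'BBBB', b'EEEE']
```

Every write updates the one parity disk, which is the contention point described above that RAID 5 avoids by rotating parity across the array.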

The following RAID configurations are the most commonly used to support ArcSDE storage solutions. These solutions represent RAID combinations that best support data protection and performance goals. Figure 4-3 provides an overview of the most popular composite RAID configuration.

Figure 4-3 Ways to Protect Spatial Data (Standard RAID Configurations)

RAID 1/0: RAID 1/0 is a composite solution including RAID 0 striping and RAID 1 mirroring. This is the optimum solution for high performance and data protection. This is also the costliest solution. Available data storage is limited to 50 percent of the total disk volume, since a mirror disk copy is maintained for every data disk in the array.

RAID 5: RAID 5 combines the striping and parity of the RAID 3 solution with distribution of the parity blocks for each stripe across all disks in the array, which avoids the parity disk contention bottleneck. This improved parity solution provides optimum disk utilization and near-optimum performance; the capacity of all but one disk is available for data storage.

Hybrid Solutions: Some vendors provide alternative proprietary RAID strategies to enhance their storage solution. New ways to store data on disk can improve performance and protection and may simplify other data management needs. Each hybrid solution should be evaluated to determine if and how it may support specific data storage needs.
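The capacity trade-offs among these configurations reduce to simple arithmetic. The Python sketch below computes the usable fraction of raw capacity for each level and, for RAID 5, which disk holds parity for a given stripe. The rotating layout is one simple scheme chosen for illustration, since actual parity placement is vendor specific. The array of eight 300 GB disks is hypothetical.

```python
def usable_fraction(level: str, disks: int) -> float:
    """Fraction of raw disk capacity available for data, following the RAID
    descriptions above. Assumes identical disks."""
    if level in ("JBOD", "RAID 0"):
        return 1.0
    if level in ("RAID 1", "RAID 1/0"):
        if disks < 2 or disks % 2:
            raise ValueError("mirroring needs disk pairs")
        return 0.5                      # every data disk has a mirror copy
    if level in ("RAID 3", "RAID 4", "RAID 5"):
        if disks < 3:
            raise ValueError("parity RAID needs at least 3 disks")
        return (disks - 1) / disks      # one disk's worth of parity
    raise ValueError(f"unsupported level: {level}")


def raid5_parity_disk(stripe_index: int, disks: int) -> int:
    """Disk holding parity for a given stripe under a simple rotating layout
    (parity moves one disk to the left on each stripe)."""
    return (disks - 1 - stripe_index) % disks


raw_gb = 8 * 300  # hypothetical array: eight 300 GB disks
for level in ("RAID 0", "RAID 1/0", "RAID 5"):
    print(level, raw_gb * usable_fraction(level, 8), "GB usable")
print([raid5_parity_disk(s, 4) for s in range(6)])  # [3, 2, 1, 0, 3, 2]
```

Rotating parity spreads parity writes evenly across all disks. This is why RAID 5 keeps the capacity efficiency of RAID 3 without a single parity disk becoming a hot spot.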

ArcSDE data storage strategies depend on the selected database environment.

SQL Server: Log files are located on RAID 1 mirror, and index and data tables are located on RAID 5 disk volume.

Oracle, Informix, and DB2: Index tables and log files are located on RAID 1/0 mirror, and striped data volumes and data tables are located on RAID 5.

Ways to Back Up Spatial Data

Data protection at the disk level minimizes the need for system recovery in the event of a single disk failure but will not protect against a variety of other data failure scenarios. It is always important to keep a current backup copy of critical data resources at a safe location away from the primary site. Data backups typically provide the last line of defense for protecting data investments. Careful planning and attention to storage backup procedures are important factors in a successful backup strategy. Data loss can result from many situations, the most probable being administrative or user error. Figure 4-4 provides an overview of the different ways to back up spatial data.

Figure 4-4 Ways to Back Up Spatial Data

Host Tape Backup: Traditional server backup solutions use lower-cost tape storage for backup. Data must be converted to a tape storage format and stored on a linear tape medium. Backups can be a long, drawn-out process that consumes considerable server processing resources (typically a full CPU for the duration of the backup) and requires special data management for operational environments.

For database environments, these backups must occur based on a single point in time to maintain database continuity. Database vendors support online backup requirements by establishing a procedural snapshot of the database. A copy of the protected snapshot data is retained in a snapshot table when changes are made to the database, supporting point-in-time backup of the database and potential database recovery back to the time of the snapshot.

Host processors can be used to support backup operations during off-peak hours. If backups are required during peak-use periods, backups can impact server performance.

Network Client Tape Backup: The traditional online backup can often be supported over the LAN with the primary batch backup process running on a separate client platform. DBMS snapshots may still be used to support point-in-time backups for online database environments. Client backup processes can contribute to potential network performance bottlenecks between the server and the client machine because of the high data transfer rates during the backup process.

Storage Area Network Client Tape Backup: Some backup solutions support direct disk storage access without impacting the host DBMS server environment. Storage backup is performed over the SAN or through a separate fiber channel connection to the disk array, with the batch process running on a separate client platform. A disk-level storage array snapshot is used to support point-in-time backups for online database environments. Host platform processing loads and LAN performance bottlenecks can be avoided with disk-level backup solutions.

Disk Copy Backup: The size of databases has increased dramatically in recent years, growing from tens of gigabytes to hundreds of gigabytes and, in many cases, terabytes of data. Recovery of large databases from tape backups is very slow and can take days for large spatial database environments. At the same time, the cost of disk storage has decreased dramatically, making disk copy solutions for large database environments competitive in price with tape storage solutions. A copy of the database on local disk, or a copy of these disks at a remote recovery site, supports immediate recovery following a storage failure: the DBMS is simply restarted using the backup disk copy.
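The point-in-time snapshots used by the host, network client, and SAN backup methods above share one idea. After the snapshot is taken, the original contents of any block about to change are saved first, so a backup can read a consistent image while online updates continue. Below is a minimal copy-on-write sketch in Python. The class and method names are hypothetical, and DBMS snapshot tables and storage array snapshots differ in detail.

```python
from __future__ import annotations


class SnapshotVolume:
    """Copy-on-write snapshot sketch: the first write to a block after
    snapshot() saves the block's prior contents for the backup reader."""

    def __init__(self, blocks: dict[int, bytes]):
        self.blocks = dict(blocks)                          # live data
        self.saved: dict[int, bytes | None] | None = None   # pre-images since snapshot

    def snapshot(self) -> None:
        self.saved = {}

    def write(self, block_no: int, data: bytes) -> None:
        if self.saved is not None and block_no not in self.saved:
            # Preserve only the first pre-image; later writes leave the snapshot alone.
            self.saved[block_no] = self.blocks.get(block_no)
        self.blocks[block_no] = data

    def read_snapshot(self, block_no: int) -> bytes | None:
        """What the block contained when the snapshot was taken (None if absent)."""
        if self.saved is None:
            raise RuntimeError("no active snapshot")
        if block_no in self.saved:
            return self.saved[block_no]
        return self.blocks.get(block_no)    # unchanged since the snapshot


vol = SnapshotVolume({0: b"roads", 1: b"parcels"})
vol.snapshot()                  # backup starts here
vol.write(1, b"parcels v2")     # online edits continue
vol.write(1, b"parcels v3")
vol.write(2, b"hydrants")
backup = {n: vol.read_snapshot(n) for n in (0, 1, 2)}
```

Only changed blocks are copied, so the snapshot costs little until edits accumulate. That overhead grows with edit activity during the backup window.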

Ways to Move Spatial Data

Many enterprise GIS solutions require continued maintenance of distributed copies of the GIS data resources, typically replicated from a central GIS data repository or enterprise database environment. Organizations with a single enterprise database solution still need to protect data resources in the event of an emergency such as a fire, flood, or other natural disaster, or an accident. Many organizations have recently reviewed their business practices and updated their plans for business continuance in the event of a major loss of data resources. The tragic events of September 11, 2001, demonstrated the value of such plans and increased interest and awareness of the need for this type of protection.

This section reviews the various ways organizations move spatial data. Traditional methods copy data on tape or disk and physically deliver this data to the remote site through standard transportation modes. Once at the remote site, data is reinstalled on the remote server environment. Technology has evolved to provide more efficient alternatives for maintaining distributed data sources. Understanding the available options and risks involved in moving data is important in defining optimum enterprise GIS architecture.

Traditional Data Transfer Methods

Figure 4-5 identifies traditional methods for moving a copy of data to a remote location.

Figure 4-5 Ways to Move Spatial Data (Traditional Tape Backup/Disk Copy)

Traditional methods include backup and recovery of data using standard tape or disk transfer media. Moving data using these methods is commonly called "sneaker net." These methods provide a way to transfer data without the support of a physical network.

Tape Backup: Tape backup solutions can be used to move data to a separate server environment. Tape transfers are normally very slow. The reduced cost of disk storage has made disk copy a much more feasible option.

Disk Copy: A replicated copy of the database on disk storage can support rapid restore at a separate site. The database can be restarted with the new data copy and brought online within a short recovery period.

ArcGIS Geodatabase Transition

Moving subsets of a single database cannot normally be supported with standard backup strategies. Data must be extracted from the primary database and imported into the remote database to support the data transfer. Database transition can be supported using standard ArcGIS export/import functions. These tools can be used as a method of establishing and maintaining a copy of the database at a separate location. Figure 4-6 identifies ways to move spatial data using ArcGIS data transition functions.

Figure 4-6 Ways to Move Spatial Data (Geodatabase Transition)

ArcSDE Admin Commands: Batch processes can use ArcSDE admin commands to export and import an ArcSDE database. Moving data with these commands is most practical when completely replacing data layers; they are not optimum solutions for transferring data into a complex ArcSDE geodatabase environment.

ArcCatalog/ArcTools Commands: ArcCatalog supports migration of data between ArcSDE geodatabase environments, extracts from a personal geodatabase, and imports from a personal geodatabase to an ArcSDE environment.

Database Replication

Customers have experienced a variety of technical challenges when configuring DBMS spatial data replication solutions. ArcSDE data model modifications may be required to support DBMS replication. Edit loads are applied to both server environments, with potential impacts on performance and server sizing. Data changes must be transmitted over network connections between the two servers, creating potential communication bottlenecks. These challenges must be overcome to support a successful DBMS replication solution. Customers have indicated that DBMS replication solutions can work but require considerable patience and carry significant implementation risk. Acceptable solutions are available from some DBMS vendors to support replication to a read-only backup database server. Dual master server configurations significantly increase the complexity of an already complex replication solution. Figure 4-7 presents the different ways to move spatial data using database replication.

Figure 4-7 Ways to Move Spatial Data (Database Replication)

Synchronous Replication. Real-time replication requires commitment of data transfer to the replicated server before releasing the client application on the primary server. Edit operations with this configuration would normally result in performance delays because of the typical heavy volume of spatial data transfers and the required client interaction times. High-bandwidth fiber connectivity (1000 Mbps bandwidth) is recommended between the primary server and the replicated backup server to minimize performance delays.

Asynchronous Replication. Near real-time database replication strategies decouple the primary server from the data transfer transaction to the secondary server environment. Asynchronous replication can be supported over WAN connections, since the slow transmission times are isolated from primary server performance. Data transfers (updates) can be delayed to off-peak periods if WAN bandwidth limitations dictate, supporting periodic updates of the secondary server environment at a frequency supporting operational requirements.
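The difference between the two modes is when the client is released. The Python sketch below models this with in-memory dictionaries standing in for the primary and replicated servers. The class names are hypothetical, and network latency, failures, and conflict handling are left out.

```python
from collections import deque


class Replica:
    def __init__(self) -> None:
        self.rows: dict[str, str] = {}

    def apply(self, key: str, value: str) -> None:
        self.rows[key] = value


class Primary:
    """Synchronous mode: commit() returns only after the replica applies the change.
    Asynchronous mode: the change is queued and shipped later by flush()."""

    def __init__(self, replica: Replica, synchronous: bool) -> None:
        self.rows: dict[str, str] = {}
        self.replica = replica
        self.synchronous = synchronous
        self.pending: deque = deque()

    def commit(self, key: str, value: str) -> None:
        self.rows[key] = value
        if self.synchronous:
            self.replica.apply(key, value)       # client waits on this transfer
        else:
            self.pending.append((key, value))    # client is released immediately

    def flush(self) -> int:
        """Ship queued changes (e.g. in an off-peak window); return the count sent."""
        sent = 0
        while self.pending:
            self.replica.apply(*self.pending.popleft())
            sent += 1
        return sent


sync_primary = Primary(Replica(), synchronous=True)
sync_primary.commit("parcel 17", "split")

async_primary = Primary(Replica(), synchronous=False)
async_primary.commit("parcel 17", "split")
lag_before_flush = dict(async_primary.replica.rows)   # replica is behind
shipped = async_primary.flush()
```

In the synchronous case, every commit pays the transfer time, which is why high-bandwidth connectivity is recommended. In the asynchronous case, the secondary lags until the queue is flushed.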

Disk-Level Replication

Disk-level replication is a well-established technology, supporting global replication of critical data for many types of industry solutions. Spatial data is stored on disk sectors in the same way as any other data and, as such, does not require special attention beyond what might be required for other data types. Disk volume configuration (where data is located on disk and which volumes are transferred to the remote site) may be critical to ensuring database integrity. Mirror copies are refreshed based on point-in-time snapshot functions supported by the storage vendor solution. Disk-level replication transfers block-level data changes on disk to a mirror disk volume at a remote location. Transfer can be supported during active online transactions with minimal impact on DBMS server performance capacity. Secondary DBMS applications must be restarted to refresh the DBMS cache and processing environment to the point in time of the replicated disk volume.

Figure 4-8 Ways to Move Spatial Data (Disk-Level Replication)

Synchronous Replication: Real-time replication requires commitment of data transfer to the replicated storage array before releasing the DBMS application on the primary server. High-bandwidth fiber connectivity (1000 Mbps bandwidth) is recommended between the primary server and the replicated backup server to avoid performance delays.

Asynchronous Replication: Near real-time disk-level replication strategies decouple the primary disk array from the commit transaction of changes to the secondary storage array environment. Asynchronous replication can be supported over WAN connections, since the slow transmission times are isolated from primary DBMS server performance. Disk block changes can be stored and data transfers delayed to off-peak periods if WAN bandwidth limitations dictate, supporting periodic updates of the secondary disk storage volumes to meet operational requirements.  
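Because disk-level replication works below the DBMS, it only needs to know which disk blocks changed. The Python sketch below tracks changed block numbers and ships them in a batch, as an asynchronous configuration might during an off-peak window. The block size and names are hypothetical. The sketch ignores the write ordering and snapshot consistency that a real storage array must preserve for database integrity.

```python
class ReplicatedVolume:
    """Asynchronous block-level replication sketch: writes land on the primary
    volume and only the numbers of changed blocks are tracked; ship_changes()
    later copies those blocks to the remote mirror."""

    def __init__(self, num_blocks: int, block_size: int = 4) -> None:
        self.primary = [bytes(block_size)] * num_blocks   # zero-filled blocks
        self.mirror = list(self.primary)
        self.dirty: set[int] = set()

    def write(self, block_no: int, data: bytes) -> None:
        self.primary[block_no] = data
        self.dirty.add(block_no)    # repeated writes to a block are tracked once

    def ship_changes(self) -> list[int]:
        """Copy changed blocks to the mirror; return the block numbers sent."""
        shipped = sorted(self.dirty)
        for n in shipped:
            self.mirror[n] = self.primary[n]
        self.dirty.clear()
        return shipped


vol = ReplicatedVolume(num_blocks=8)
vol.write(3, b"abcd")
vol.write(3, b"wxyz")   # overwrites the earlier change before it is shipped
vol.write(5, b"efgh")
sent = vol.ship_changes()
```

Tracking block numbers rather than individual writes means a block updated several times between transfers is sent only once. This is one reason batched asynchronous transfer can fit within limited WAN bandwidth.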

Ways to Manage and Access Spatial Data

Release of the ArcGIS technology introduced the ArcSDE geodatabase, which provides a way to manage long transaction edit sessions within a single database instance. ArcSDE supports long transactions using versions (different views) of the database. A geodatabase can support thousands of concurrent versions of the data within a single database instance. The default version represents the real world, and other named versions represent proposed changes and database updates in progress.

Figure 4-9 shows a typical long transaction workflow life cycle. The workflow represents design and construction of a typical housing subdivision. Several design alternatives might initially be represented as separate named versions in the database to support planning for a new subdivision. One of these designs (versions) is approved to support the construction phase. After the construction phase is complete, the selected design (version) is modified to represent the as-built environment. Once the development is completed, the final design version will be reconciled with the geodatabase and posted to the default version to reflect the new subdivision changes.

Figure 4-9 Long Transaction Workflow Life Cycle

The simplest way to introduce the versioning concept in the geodatabase is by using some logical flow diagrams. Figure 4-10 demonstrates the explicit state model represented in the geodatabase. The default version lineage is represented in the center of the diagram, and a new default version state is added each time edits are posted to the default view. Each edit post represents a state change in the default view (accepted changes to the real-world view). There can be thousands of database changes (versions) at a time. As changes are completed, these versions are posted to the default lineage.

Figure 4-10 Explicit State Model

The "new version" on the top of the diagram shows the life cycle of a long transaction. The transaction begins as changes from "state 1" of the default lineage. Maintenance updates reflected in that version are represented by new states in the edit session (1a, 1b, and 1c). During the edit session, the default version accepts new changes from other completed versions. The new version active edit session is not aware of the posted changes to the default lineage (2, 3, 4, and 5) since it is referenced from default state 1. Once the new version is complete, it must be reconciled with the default lineage. The reconcile process compares the changes in the new version (1a, 1b, and 1c) with changes in the default lineage (2, 3, 4, and 5) to make sure there are no edit conflicts. If the reconcile process identifies conflicts, these conflicts must be resolved before the new version can be posted to the default lineage. Once all conflicts are resolved, the new version is posted to the default lineage forming state 6.
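The explicit state model can be sketched as a tree of states, each holding the row edits made in that state. In this Python sketch, a version is a pointer to a state. The class and method names are hypothetical and are not the ArcSDE API. Reconcile finds the common ancestor of the version and DEFAULT and reports rows edited on both sides since then. Post adds the version's edits to DEFAULT as a new state. A real geodatabase reconcile also pulls DEFAULT changes into the version and supports interactive conflict resolution, which this sketch omits.

```python
from __future__ import annotations


class StateTree:
    """Explicit state model sketch: states form a tree, versions point at states."""

    def __init__(self) -> None:
        self.parent: dict[int, int | None] = {0: None}
        self.edits: dict[int, dict[str, str]] = {0: {}}   # state -> {row: new value}
        self.versions: dict[str, int] = {"DEFAULT": 0}
        self.next_id = 1

    def _new_state(self, parent: int, edits: dict[str, str]) -> int:
        sid = self.next_id
        self.next_id += 1
        self.parent[sid] = parent
        self.edits[sid] = dict(edits)
        return sid

    def create_version(self, name: str, source: str = "DEFAULT") -> None:
        self.versions[name] = self.versions[source]

    def edit(self, version: str, edits: dict[str, str]) -> None:
        self.versions[version] = self._new_state(self.versions[version], edits)

    def lineage(self, state: int) -> list[int]:
        path: list[int] = []
        current: int | None = state
        while current is not None:
            path.append(current)
            current = self.parent[current]
        return path[::-1]                     # root (state 0) first

    def _common_ancestor(self, a: int, b: int) -> int:
        common = 0
        for x, y in zip(self.lineage(a), self.lineage(b)):
            if x != y:
                break
            common = x
        return common

    def _states_after(self, state: int, ancestor: int) -> list[int]:
        path = self.lineage(state)
        return path[path.index(ancestor) + 1:]

    def reconcile(self, version: str) -> set[str]:
        """Rows edited both in the version and in DEFAULT since they diverged."""
        v, d = self.versions[version], self.versions["DEFAULT"]
        base = self._common_ancestor(v, d)
        mine = {row for s in self._states_after(v, base) for row in self.edits[s]}
        theirs = {row for s in self._states_after(d, base) for row in self.edits[s]}
        return mine & theirs

    def post(self, version: str) -> int:
        conflicts = self.reconcile(version)
        if conflicts:
            raise ValueError(f"resolve conflicts first: {sorted(conflicts)}")
        v, d = self.versions[version], self.versions["DEFAULT"]
        merged: dict[str, str] = {}
        for s in self._states_after(v, self._common_ancestor(v, d)):
            merged.update(self.edits[s])      # later states win
        self.versions["DEFAULT"] = self._new_state(d, merged)
        self.versions[version] = self.versions["DEFAULT"]
        return self.versions["DEFAULT"]


tree = StateTree()
tree.edit("DEFAULT", {"lot 1": "vacant"})           # state 1
tree.create_version("subdivision")                  # starts from state 1
tree.edit("subdivision", {"lot 7": "design"})       # state 2
tree.edit("subdivision", {"lot 7": "as-built"})     # state 3
tree.edit("DEFAULT", {"lot 2": "road"})             # state 4, from other posted work
print(tree.reconcile("subdivision"))                # set(): no conflicts
posted = tree.post("subdivision")                   # new DEFAULT state 5

tree.create_version("park plan")
tree.edit("park plan", {"lot 2": "park"})           # state 6
tree.edit("DEFAULT", {"lot 2": "school"})           # state 7
print(tree.reconcile("park plan"))                  # {'lot 2'}: resolve before posting
```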

Figure 4-11 shows a typical workflow history of the default lineage. Named versions (t1, t4, and t7) represent edit transactions in progress that have not been posted back to the default lineage. The parent states of these versions (1, 4, and 7) are locked in the default lineage to support the long edit sessions that have not been posted. The default lineage includes several states (2, 3, 5, and 6) that were created by posting completed changes.

Figure 4-11 Default History

Figure 4-12 demonstrates a geodatabase compress. Very long default lineages (thousands of states) can impact database performance. The geodatabase compress function consolidates all default changes into the named version parent states, thus decreasing the length of the default lineage and improving database performance.

Figure 4-12 Geodatabase Compress
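Using the states in Figure 4-11, a compress can be sketched as folding each run of unreferenced default states into the next state that a version still depends on. This Python function is a simplified model under that assumption. The lineage is assumed to run from state 0 to state 7, and the function name is hypothetical. The actual compress operation follows additional rules that this sketch does not reproduce.

```python
def compress(lineage: list[int], referenced: set[int]) -> dict[int, list[int]]:
    """Fold default-lineage states that no version references into the next
    state that must be kept. Returns {kept state: states folded into it}.
    State 0 and the current DEFAULT state (last in the lineage) are always kept."""
    keep = set(referenced) | {lineage[0], lineage[-1]}
    compressed: dict[int, list[int]] = {}
    pending: list[int] = []
    for state in lineage:
        if state in keep:
            compressed[state] = pending   # edits of pending states merge here
            pending = []
        else:
            pending.append(state)
    return compressed


# Assumed Figure 4-11 lineage: versions t1, t4, t7 hold parent states 1, 4, 7.
result = compress(list(range(8)), referenced={1, 4, 7})
print(list(result))             # [0, 1, 4, 7]
print(result[4], result[7])     # [2, 3] [5, 6]
```

Folding states 2 and 3 into state 4 leaves every version's view unchanged, because state 4 already descends from them. Only the number of states that queries must walk is reduced.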

Now that the geodatabase versioning concept is understood, it is helpful to recognize how it is physically implemented within the database table structure. When a feature table within the geodatabase is versioned, two new tables are created to track changes to the base feature table. An Adds Table is created to track rows added to the base feature table, and a Deletes Table is created to record rows deleted from the Base Table. Each row in the Adds and Deletes tables represents a change state within the geodatabase. As changes are posted to the default version, these changes are represented by pointers in the Adds and Deletes tables. Once the geodatabase is versioned, the real-world view (default version) is represented by the Base Table plus the Adds and Deletes tables included in the default lineage (the Base Table alone does not represent default). All outstanding versions must be reconciled and posted to compress all default changes back to the Base Table (zero state). This is not likely to occur for a working maintenance database in a real-world environment.
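A version's view can be sketched as the Base Table with the Adds and Deletes rows for that version's lineage applied in state order. In this Python sketch, the table layouts are simplified to (state, row id, attributes) tuples, and the function name is hypothetical. An update is modeled as a delete plus an add of the same row id.

```python
def version_view(base: dict[int, str],
                 adds: list[tuple[int, int, str]],
                 deletes: list[tuple[int, int]],
                 lineage: list[int]) -> dict[int, str]:
    """Rows visible to a version: Base Table plus Adds minus Deletes, using only
    delta rows whose state is in the version's lineage, applied in state order."""
    order = {state: i for i, state in enumerate(lineage)}
    events = [(order[s], 0, row, None) for s, row in deletes if s in order]
    events += [(order[s], 1, row, attrs) for s, row, attrs in adds if s in order]
    view = dict(base)
    # Within one state, apply deletes (kind 0) before adds (kind 1) so that an
    # update (delete + add of the same row id) leaves the new attributes.
    for _, kind, row, attrs in sorted(events, key=lambda e: (e[0], e[1])):
        if kind == 0:
            view.pop(row, None)
        else:
            view[row] = attrs
    return view


base = {1: "parcel A", 2: "parcel B"}
adds = [(1, 3, "parcel C"),            # state 1: new row 3
        (2, 2, "parcel B (split)"),    # state 2: update of row 2 ...
        (5, 4, "parcel D")]            # state 5: edit in another version
deletes = [(2, 2)]                     # ... recorded as delete + add
default_view = version_view(base, adds, deletes, lineage=[0, 1, 2])
other_view = version_view(base, adds, deletes, lineage=[0, 1, 5])
```

The Base Table is never modified here. This mirrors the point above that default is the Base Table plus the delta rows in its lineage, not the Base Table alone.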

ArcSDE Geodatabase

The ArcGIS technology includes a spatial database engine (ArcSDE) for managing and sharing GIS data. Figure 4-13 provides an overview of the ArcSDE components.

Figure 4-13 ArcSDE Components

Every ESRI software product includes an ArcSDE communications client. The ArcSDE schema includes relationships and dependencies used to manage geodatabase versioning and replication functionality. The ArcSDE schema also includes the geodatabase license code stored in host DBMS tables. ArcSDE also includes an executable that translates communications between ArcGIS ArcObjects and the supported DBMS. The ArcSDE executable is included in the ArcGIS ArcObjects DBMS direct connect application program interface (API), and is also available for install on the DBMS server or middle server tier as a separate application executable (GSRVR).

Geodatabase Evolution. The ArcSDE geodatabase schema has evolved from the initial spatial binary schema and storage types to the current XML schema with SQL spatial storage types. Figure 4-14 shows this evolution, which has improved spatial data access, performance, and scalability and expanded the collection of supported spatial storage data types.

Figure 4-14 Geodatabase Evolution

The GIS spatial and attribute data are stored in relational database tables. The ArcSDE and user schemas define the geodatabase table structure, relationships, and dependencies. Figure 4-15 provides a representation of the Base Table, Adds Table, and Deletes Table in a versioned geodatabase.

Figure 4-15 Geodatabase Tables

Distributed Geodatabase

ArcSDE manages the versioning schema of the geodatabase and supports client application access to the appropriate views of the geodatabase. ArcSDE also supports importing data into and exporting data from the appropriate database tables, and it maintains the geodatabase schema that defines the relationships and dependencies between the various tables.

Geodatabase Single-Generation Replication

The ArcGIS 8.3 release introduced a disconnected editing solution. This solution extracts a registered geodatabase version to a personal geodatabase or to a separate database instance for disconnected editing. The disconnected editor collects the version's adds and deletes and, on reconnecting to the parent server, can upload them to the central ArcSDE database as a version update.

Figure 4-16 presents an overview of ArcGIS 8.3 disconnected editing with checkout to a personal geodatabase (PGD). The ArcGIS 8.3 release is restricted to a single checkout/check-in transaction for each client edit session.

Figure 4-16 Geodatabase Single-generation Replication —Personal Geodatabase Checkout
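The single-generation checkout/check-in cycle described above can be sketched as follows. This is a hypothetical, in-memory model; the `Checkout` class and its methods are illustrative names, not an ArcGIS API. The checkout holds an extracted copy of the version, collects adds and deletes while disconnected, applies them once to the parent version at check-in, and then refuses further use, mirroring the ArcGIS 8.3 single checkout/check-in restriction.

```python
# Hypothetical model of an ArcGIS 8.3-style single-generation checkout.
# Names and structures are illustrative, not an ArcGIS API.


class CheckoutClosedError(RuntimeError):
    """Raised when a checkout that has already been checked in is used again."""


class Checkout:
    def __init__(self, parent_version):
        self.rows = dict(parent_version)  # extracted copy for disconnected editing
        self.adds = {}                    # row_id -> attributes added or edited offline
        self.deletes = set()              # row_ids deleted offline
        self.is_open = True

    def _require_open(self):
        if not self.is_open:
            raise CheckoutClosedError("checkout already checked in; create a new checkout")

    def upsert(self, row_id, attrs):
        """Add or edit a row while disconnected."""
        self._require_open()
        self.rows[row_id] = attrs
        self.adds[row_id] = attrs
        self.deletes.discard(row_id)

    def delete(self, row_id):
        """Delete a row while disconnected."""
        self._require_open()
        self.rows.pop(row_id, None)
        self.adds.pop(row_id, None)
        self.deletes.add(row_id)

    def check_in(self, parent_version):
        """Apply the collected deletes, then adds, to the parent version exactly once."""
        self._require_open()
        updated = dict(parent_version)
        for row_id in self.deletes:
            updated.pop(row_id, None)
        updated.update(self.adds)
        self.is_open = False  # single checkout/check-in: this checkout is now spent
        return updated


parent = {1: {"name": "Main St"}, 2: {"name": "Oak Ave"}}
checkout = Checkout(parent)
checkout.upsert(1, {"name": "Main Street"})
checkout.delete(2)
checkout.upsert(3, {"name": "Elm Rd"})
parent = checkout.check_in(parent)
```

Any further editing after check-in requires a new checkout, which is the limitation that the later multi-generation replication removes.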

Figure 4-17 presents an overview of ArcGIS 8.3 disconnected editing with checkout to a separate ArcSDE geodatabase. The ArcGIS 8.3 release is restricted to a single checkout/check-in transaction for each child ArcSDE database. The child ArcSDE database can support multiple disconnected or local version edit sessions during the checkout period. All child versions must be reconciled before check-in with the parent ArcSDE database; any outstanding child versions are lost during the child ArcSDE database check-in process.

Figure 4-17 Geodatabase Single-generation Replication — Database Checkout
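The requirement to reconcile child versions before check-in can be sketched as a guard on the check-in step. This is a hypothetical model: `ChildDatabase`, `reconcile_and_post`, and the `force` flag are illustrative names, and reconciliation here simply applies a version's edits to the child default version without the conflict detection that a real reconcile performs.

```python
# Hypothetical model of a child ArcSDE database checkout with local versions.
# Reconcile here simply applies a version's edits to the child default
# version; a real reconcile also detects and resolves conflicts.


class UnreconciledVersionsError(RuntimeError):
    """Raised when check-in is attempted while child versions are outstanding."""


class ChildDatabase:
    def __init__(self, checked_out_rows):
        self.default = dict(checked_out_rows)  # child default version
        self.versions = {}                     # version name -> {row_id: attrs or None}

    def create_version(self, name):
        self.versions[name] = {}

    def edit(self, name, row_id, attrs):
        """Record an edit in a local version; attrs=None records a delete."""
        self.versions[name][row_id] = attrs

    def reconcile_and_post(self, name):
        """Apply a local version's edits to the child default and retire the version."""
        for row_id, attrs in self.versions.pop(name).items():
            if attrs is None:
                self.default.pop(row_id, None)
            else:
                self.default[row_id] = attrs

    def check_in(self, force=False):
        """Return the child default rows for the parent and the names of any lost versions."""
        outstanding = sorted(self.versions)
        if outstanding and not force:
            raise UnreconciledVersionsError(f"reconcile and post first: {outstanding}")
        self.versions.clear()  # outstanding child versions do not survive check-in
        return dict(self.default), outstanding


child = ChildDatabase({1: {"name": "Main St"}})
child.create_version("crew_a")
child.create_version("crew_b")
child.edit("crew_a", 2, {"name": "Oak Ave"})
child.edit("crew_b", 1, None)
child.reconcile_and_post("crew_a")
# "crew_b" is still outstanding, so a normal check-in is refused.
try:
    child.check_in()
except UnreconciledVersionsError as err:
    print(err)
```

Only the child default version travels back to the parent, so the delete recorded in `crew_b` would be discarded if check-in were forced at this point.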

Geodatabase Multi-generation Replication

The ArcGIS 8.3 database checkout functions provided with disconnected editing can be used to support peer-to-peer database refresh. Figure 4-18 shows a peer-to-peer database checkout, in which ArcSDE disconnected editing functionality is used to periodically refresh specific feature tables in a separate instance of the geodatabase. This approach can support a separate view-only distribution geodatabase configured as a nonversioned copy of the default version.

Figure 4-18 One-way Multi-generation Replication —Peer to Peer Geodatabase Updates
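A one-way, peer-to-peer refresh can be sketched as a full copy of selected tables from the parent's default version into a read-only snapshot. The function name, the table names, and the use of Python's `types.MappingProxyType` to stand in for a view-only, nonversioned copy are assumptions for illustration. Each refresh copies the named tables in full rather than sending only incremental changes.

```python
from types import MappingProxyType

# Hypothetical sketch of a one-way, full refresh of selected feature tables
# from the parent's default version into a view-only distribution copy.
# Table names and structures are illustrative only.


def refresh_distribution_copy(default_version, table_names):
    """Return a read-only, nonversioned snapshot of the named tables.

    Rows are copied so later parent edits do not leak into the snapshot;
    the table and row mappings are wrapped as read-only views.
    """
    return MappingProxyType({
        name: MappingProxyType(
            {row_id: dict(attrs) for row_id, attrs in default_version[name].items()}
        )
        for name in table_names
    })


default_version = {
    "parcels": {1: {"apn": "001"}},
    "roads": {10: {"name": "Main St"}},
    "work_orders": {500: {"status": "open"}},  # not published to the view-only site
}
published_tables = ["parcels", "roads"]

first_refresh = refresh_distribution_copy(default_version, published_tables)
default_version["roads"][11] = {"name": "Oak Ave"}  # later edit at the parent
next_refresh = refresh_distribution_copy(default_version, published_tables)
```

The earlier snapshot is unaffected by the parent edit until the next periodic refresh replaces it.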

The ArcGIS 9.2 software incorporates support for incremental updates between ArcSDE geodatabase environments.

Geodatabase Multi-generation Replication: The ArcGIS disconnected editing functionality was expanded with the ArcGIS 9 releases to support loosely coupled ArcSDE distributed database environments. Figure 4-19 presents an overview of the loosely coupled ArcSDE distributed database concept.

Figure 4-19 Distributed Geodatabase Architecture

Multi-generation replication supports a single ArcSDE geodatabase distributed over multiple platform environments. The child checkout versions of the parent database support an unlimited number of update transactions without losing local version edits or requiring a new checkout. Updates are passed between parent and child database environments through simple datagrams that can be transmitted over standard WAN communications. This geodatabase architecture supports distributed database environments across multiple sites connected by limited-bandwidth communications, because only the reconciled changes are transmitted between sites to support database synchronization.
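The idea that only reconciled changes cross the WAN can be sketched with a change log keyed by generation numbers. This is a hypothetical model: the `Replica` class, the JSON message layout, and the acknowledgment bookkeeping are illustrative assumptions, not the ArcGIS replication message format. Each export sends the net change per row since the peer's last acknowledgment, and unacknowledged changes are resent on the next export, so a child can synchronize any number of times without a new checkout.

```python
import json
from typing import Optional

# Hypothetical model of one side of a multi-generation replica. The JSON
# message format and generation bookkeeping are illustrative only; they are
# not the ArcGIS replication message format.


class Replica:
    def __init__(self, rows):
        self.rows = dict(rows)
        self.generation = 0  # last generation exported
        self.acked = 0       # last generation the peer acknowledged
        self.log = []        # (generation, row_id, attrs or None for a delete)

    def edit(self, row_id, attrs: Optional[dict]):
        """Apply a local edit and log it against the next generation to export."""
        if attrs is None:
            self.rows.pop(row_id, None)
        else:
            self.rows[row_id] = attrs
        self.log.append((self.generation + 1, row_id, attrs))

    def export_changes(self):
        """Build a message with the net change per row since the last acknowledgment."""
        self.generation += 1
        net = {}
        for gen, row_id, attrs in self.log:
            if gen > self.acked:
                net[row_id] = attrs  # later edits to the same row replace earlier ones
        return json.dumps({"generation": self.generation, "changes": net})

    def import_changes(self, message):
        """Apply a peer's message and return its generation for acknowledgment."""
        msg = json.loads(message)
        for row_id, attrs in msg["changes"].items():
            row_id = int(row_id)  # JSON object keys arrive as strings; assumes integer IDs
            if attrs is None:
                self.rows.pop(row_id, None)
            else:
                self.rows[row_id] = attrs
        return msg["generation"]

    def acknowledge(self, generation):
        """Record the peer's acknowledgment and drop log entries it already has."""
        self.acked = max(self.acked, generation)
        self.log = [entry for entry in self.log if entry[0] > self.acked]


parent = Replica({1: {"name": "Main St"}})
child = Replica({1: {"name": "Main St"}})

parent.edit(1, {"name": "Main Street"})
parent.edit(1, {"name": "Main Street N"})  # only the net change for row 1 is sent
parent.edit(2, {"name": "Oak Ave"})
message = parent.export_changes()
parent.acknowledge(child.import_changes(message))
```

Because repeated edits to the same row collapse into one entry, the message size tracks the number of changed rows rather than the number of edit operations, which is what makes synchronization practical over limited-bandwidth links.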

Figure 4-20 provides an overview of common ArcGIS Server geodatabase use case scenarios.

Figure 4-20 Geodatabase Replication Use Cases

Data Management Overview

Support for distributed database solutions has traditionally introduced high-risk operations, with the potential for data corruption and the use of stale data sources in GIS operations. Some organizations do support successful distributed solutions; their success rests on careful planning and close attention to the administrative processes that support the distributed data sites. More GIS implementations, however, succeed with centrally consolidated database environments that provide effective remote user performance and support. Future distributed database management solutions may significantly reduce the risk of supporting distributed environments. Whether centralized or distributed, the success of an enterprise GIS solution will depend heavily on the administrative team that keeps the system operational and provides an architecture that supports user access needs. Figure 4-21 provides an overview of the lessons learned discussed in this chapter.

Figure 4-21 GIS Data Administration Summary

The next chapter, on performance fundamentals, focuses on understanding the technology and presents the fundamental terms and relationships used in system architecture design and capacity planning.

System Design Strategies 26th edition - An Esri ® Technical Reference Document • 2009 (final PDF release)