NetCDF

From wiki.gis.com
Jump to: navigation, search
Network Common Data Form
Filename extension .nc
.cdf
Internet media type application/netcdf
application/x-netcdf
Magic number CDF\001
Developed by R
Type of format scientific binary data
Extended from CDF

NetCDF (Network Common Data Form) is a set of software libraries and self-describing, machine-independent data formats that support the creation, access, and sharing of array-oriented scientific data. The project homepage is hosted by the Unidata program at the University Corporation for Atmospheric Research (UCAR). They are also the chief source of netCDF software, standards development, updates etc. The format is an open standard.

The project is actively supported. The recently released (2008) version 4.0 greatly enhances the data model by allowing the use of the HDF5 data file format.

The format was originally based on the conceptual model of the NASA CDF but has since diverged and is not compatible with it.

Format description

The data format is "self-describing". This means that there is a header which describes the layout of the rest of the file, in particular the data arrays, as well as arbitrary file metadata in the form of name/value attributes. The format is platform independent, with issues such as endianness being addressed in the software libraries. The data arrays are rectangular, not ragged, and stored in a simple and regular fashion that allows efficient subsetting.

The new 4.0 version of the netCDF API, allows the use of the HDF5 data format. NetCDF users can create HDF5 files with benefits not available with the netCDF format, such as much larger files and multiple unlimited dimensions. The netCDF library continues to use the classic netCDF binary format by default. Full backward compatibility in accessing old netCDF files is supported.

Software

Access libraries

The software libraries supplied by UCAR provide read-write access to netCDF files, encoding and decoding the necessary arrays and metadata. The core library is written in C, and provides an API for C, C++ and Fortran applications. An independent implementation, also developed and maintained by Unidata, is written in 100% Java, which extends the core data model and adds additional functionality. Interfaces to netCDF based on the C library are also available in other languages including R (ncdf and ncvar packages), Perl, Python, Ruby, Matlab, IDL, and Octave. The specification of the API calls is very similar across the different languages, apart from inevitable differences of syntax. The API calls for version 2 were rather different from those in version 3, but are also supported by version 3 for backward compatibility. Application programmers using supported languages need not normally be concerned with the file structure itself, even though it is available as an open format.

Applications

A wide range of application software has been written which makes use of netCDF files. These range from command line utilities to graphical visualization packages.

  • A commonly used set of Unix command line utilities for netCDF files is the NetCDF Operators (NCO) suite, which provide a range of commands for manipulation and analysis of netCDF files including basic record concatenating, slicing and averaging.
  • NcBrowse is a generic netCDF file viewer that includes Java graphics, animations and 3D visualizations for a wide range of netCDF file conventions.
  • ncview is a visual browser for netCDF format files. Typically you would use ncview to get a quick and easy, push-button look at your netCDF files. You can view simple movies of the data, view along various dimensions, take a look at the actual data values, change color maps, invert the data, etc.
  • Panoply is a netCDF file viewer developed at the NASA Goddard Institute for Space Studies which focuses on presentation of geo-gridded data. It is written in Java and thus platform independent. Although its feature set overlaps with ncBrowse and ncview, Panoply is distinguished by offering a wide variety of map projections and ability to work with different scale color tables.
  • Ferret is an interactive computer visualization and analysis environment designed to meet the needs of oceanographers and meteorologists analyzing large and complex gridded data sets. Ferret offers a Mathematica-like approach to analysis; new variables may be defined interactively as mathematical expressions involving data set variables. Calculations may be applied over arbitrarily shaped regions. Fully documented graphics are produced with a single command.
  • nCDF_Browser is a visual nCDF browser, written in the IDL programming language. Variables, attributes, and dimensions can be immediately downloaded to the IDL command line for further processing. All the Coyote Library files necessary to run nCDF_Browser are available in the zip file.
  • ArcGIS version 9.2 supports netCDF files. The Multidimension Tools toolbox can be used to create raster layers, feature layers, and table views from netCDF data in ArcMap, or convert feature, raster, and table data to netCDF.
  • Origin 8 imports netCDF files as matrix books where each book can hold a 4D array. Users can select a subset of the imported data to make surface, controur or image plots.

Common uses

It is commonly used in climatology, meteorology and oceanography applications (e.g., weather forecasting, climate change) and GIS applications.

It is an input/output format for many GIS applications, and for general scientific data exchange. To quote from their site "NetCDF (network Common Data Form) is an interface for array-oriented data access and a library that provides an implementation of the interface. The netCDF library also defines a machine-independent format for representing scientific data."

Climate Forecast Conventions

The Climate and Forecast (CF) conventions are metadata conventions for earth science data, intended to promote the processing and sharing of files created with the NetCDF Application Programmer Interface. The conventions define metadata that are included in the same file as the data (thus making the file "self-describing"), that provide a definitive description of what the data in each variable represents, and of the spatial and temporal properties of the data. This enables users of data from different sources to decide which quantities are comparable, and allows building applications with powerful extraction, regridding, and display capabilities.

CF is intended for use with state estimation and forecast data, for atmosphere, ocean, and climate. It was designed primarily to address gridded data types such as numerical model outputs and binned climatologies, but it is also applicable to many classes of observational data and has been adopted by a number of groups for such applications. CF is framed as a standard for data written in netCDF, but most of its ideas relate to fluid earth science metadata design in general, not specifically to netCDF. For example, some support for CF is now available to client that remotely access to Hierarchical Data Format data, and CF Standard Names are becoming a widely used ontology for data in other formats.

Examples of CF metadata include specification of coordinate systems information needed to locate data in space and time, standard names for quantities to determine whether data from different sources are comparable, and information about grids, such as grid cell bounds and cell averaging methods.

The CF Metadata Conventions document is maintained at the CF Conventions web site, which also hosts a list of institutions and projects that have adopted or endorsed the use of CF conventions for their data archives and applications.

Several principles guide the development of CF conventions:

  • Data should be self-describing, without external tables needed for interpretation.
  • Conventions should be developed only as needed, rather than anticipating possible needs.
  • Conventions should not be onerous to use for either data-writers or data-readers.
  • Metadata should be readable by humans as well as interpretable by programs.
  • Redundancy should be avoided to prevent inconsistencies when writing data.

Specific CF metadata descriptors use values of attributes to represent

  • Data provenance: title, institution, contact, source (e.g. model), history (audit trail of operations), references, comment
  • Description of associated activity: project, experiment
  • Description of data: units, standard_name, long_name, auxiliary_variables, missing_value, valid_range, flag_values, flag_meanings
  • Description of coordinates: coordinates, bounds, grid_mapping (with formula_terms); time specified with reference_time ("time since T0") and calendar attributes.
  • Meaning of grid cells: cell_methods, cell_measures, and climatological statistics.

The CF Standard Name table supports interoperability of geophysical data by providing a precise description of physical quantities being represented. This allows users of data from different sources to determine whether quantities are intended to be comparable, by uniquely associating a standard name with each geophysical parameter in a data set. Note that this is the string value of the standard_name attribute, not the name of the parameter. The CF standard name table identifies over 1000 physical quantities, each with a precise description and associated canonical units. Guidelines for construction of CF standard names are documented on the conventions web site.

As an example of the information provided by CF standard names, the entry for sea-level atmospheric pressure includes:

  • standard name: air_pressure_at_sea_level
  • description: sea_level means mean sea level, which is close to the geoid in sea areas. Air pressure at sea level is the quantity often abbreviated as MSLP or PMSL.
  • canonical units: Pa

Parallel-NetCDF

An extension of netCDF for parallel computing called Parallel-NetCDF (or PnetCDF) has been developed by Argonne National Laboratory and Northwestern University [1]. This is built upon MPI-IO, the I/O extension to MPI communications, and using the high-level netCDF data structures, the PnetCDF libraries can make use of optimizations to efficiently distribute the file read and write applications between multiple processors. The PnetCDF package can read/write only classic and 64-bit offset formats. PnetCDF cannot read or write the HDF5-based format available with netCDF-4.0.

Parallel I/O in the Unidata netCDF library has been supported since release 4.0, for HDF5 data files only.

NetCDF-Java Common Data Model

The NetCDF-Java library currently reads the following file formats and remote access protocols:

  • BUFR (ongoing development)
  • CINRAD level II (Chinese Radar format)
  • DMSP (Defense Meteorological Satellite Program)
  • DORADE radar file format
  • GINI (GOES Ingest and NOAAPORT Interface ) image format
  • GEMPAK gridded data
  • GRIB version 1 and version 2 (ongoing work on tables)
  • GTOPO 30-sec elevation dataset (USGS)
  • Hierarchical Data Format (HDF4, HDF-EOS, HDF5, HDF5-EOS)
  • NetCDF (classic and large format)
  • NetCDF-4(built on HDF5)
  • NEXRAD Radar level 2 and level 3.
  • OPeNDAP client/server protocol

There are a number of other formats in development. Since each of these is accessed transparently through the NetCDF API, the NetCDF-Java library is said to implement a Common Data Model for scientific datasets.

The Common Data Model has three layers, which build on top of each other to add successively richer semantics:

  1. The data access layer, also known as the syntactic layer, handles data reading.
  2. The coordinate system layer identifies the coordinates of the data arrays. Coordinates are a completely general concept for scientific data; specialized georeferencing coordinate systems, important to the Earth Science community, are specially annotated.
  3. The scientific data type layer identifies specific types of data, such as grids, images, and point data, and adds specialized methods for each kind of data.

The Data model of the data access layer is a generalization of the NetCDF-3 data model, and substantially the same as the NetCDF-4 data model. The coordinate system layer implements and extends the concepts in the CF Metadata Conventions. The scientific data type layer allows data to be manipulated in coordinate space, analogous to the Open Geospatial Consortium specifications. The identification of coordinate systems and data typing is ongoing, but users can plug in their own classes at runtime for specialized processing.

See also

  • Common Data Format (CDF)
  • GRIB (GRIdded Binary)
  • Hierarchical Data Format (HDF)
  • FITS

External links

This article was originally based on material from the Free On-line Dictionary of Computing, which is licensed under the GFDL.