OceanArray Data Format Specification
This specification defines the preferred final format for data output from the oceanarray processing pipeline. It builds upon the OceanSITES data format while incorporating elements from the OG1 (OceanGliders) format and making specific choices for oceanographic mooring data.
For detailed reference, see the complete OceanSITES Data Format Reference Manual.
1. Overview & Context
The oceanarray format follows NetCDF-4 conventions with CF (Climate and Forecast) metadata standards. This format ensures interoperability while optimizing for oceanographic time series data from moored instruments.
1.1. Relationship to Standards
OceanSITES Foundation: This format builds upon the OceanSITES Data Format Reference Manual v1.4.
OG1 Elements: Incorporates select patterns from the OceanGliders format (OG1), particularly for contributor metadata and time formatting.
CF Compliance: Maintains strict adherence to CF Conventions and UDUNITS-2 standards.
2. Key Design Decisions
Warning
The following choices differ from referenced standards and may need future revision based on community feedback.
2.1. Deviations from OceanSITES Standard
| Aspect | OceanSITES Standard | OceanArray Choice |
|---|---|---|
| Vertical Dimension | DEPTH | N_LEVELS |
| Units Format | | |
| Time Format | | YYYYmmddTHHMMss |
Rationale for N_LEVELS: The CCHDO community uses N_LEVELS as the standard vertical dimension name, providing better alignment with hydrographic data processing workflows.
Rationale for UDUNITS: Strict adherence to UDUNITS ensures compatibility with scientific software packages and avoids ambiguity in unit interpretation.
Rationale for Time Format: The OG1 time format provides a more compact representation suitable for attribute strings while maintaining ISO 8601 compatibility.
3. File Organization & Naming
3.1. File Naming Convention
Files follow the OceanSITES naming pattern with oceanarray-specific modifications:
Basic Pattern: OS_[PLATFORM]_[DEPLOYMENT]_[MODE]_[PARAMS].nc
Components:
- OS = OceanSITES prefix (maintains compatibility)
- [PLATFORM] = Platform identifier (e.g., “CIS-1”)
- [DEPLOYMENT] = Deployment code (e.g., “200502” for Feb 2005)
- [MODE] = Data mode: R (real-time), P (provisional), D (delayed-mode)
- [PARAMS] = Parameter identifier (e.g., “CTD”, “ADCP”)
Examples:
- OS_CIS-1_200502_D_CTD.nc: Delayed-mode CTD data
- OS_PAPA_201505_D_ADCP.nc: Delayed-mode current data
Reference: See OceanSITES file naming in 4.1.1 Deployment Data files Naming Convention.
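The naming pattern above can be built and checked programmatically. The following is a minimal Python sketch, not part of the oceanarray package: the names build_filename and parse_filename are hypothetical, and the sketch assumes that no component contains an underscore, since the underscore is the field separator.

```python
import re
from typing import NamedTuple

# Data modes allowed in file names by the pattern above.
VALID_MODES = {"R", "P", "D"}

# Assumption: components never contain "_", because "_" separates the fields.
_FILENAME_RE = re.compile(
    r"OS_(?P<platform>[^_]+)_(?P<deployment>[^_]+)_(?P<mode>[RPD])_(?P<params>[^_]+)\.nc"
)


class FileNameParts(NamedTuple):
    platform: str
    deployment: str
    mode: str
    params: str


def build_filename(platform: str, deployment: str, mode: str, params: str) -> str:
    """Assemble OS_[PLATFORM]_[DEPLOYMENT]_[MODE]_[PARAMS].nc from its parts."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {sorted(VALID_MODES)}, got {mode!r}")
    for part in (platform, deployment, params):
        if not part or "_" in part:
            raise ValueError(f"invalid file name component: {part!r}")
    return f"OS_{platform}_{deployment}_{mode}_{params}.nc"


def parse_filename(name: str) -> FileNameParts:
    """Split a file name into its components, or raise ValueError if it does not match."""
    match = _FILENAME_RE.fullmatch(name)
    if match is None:
        raise ValueError(f"not a valid oceanarray file name: {name!r}")
    return FileNameParts(**match.groupdict())
```

For example, build_filename("CIS-1", "200502", "D", "CTD") reproduces the first example above, and parse_filename recovers the four components from it.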
4. Global Attributes
Requirement Status (RS): M = Mandatory, HD = Highly Desired, S = Suggested
Reference: See complete OceanSITES global attributes in 2.2 Global attributes.
Global attribute names are case sensitive.
4.1. Discovery and Identification
| Attribute | Definition | Example | RS |
|---|---|---|---|
| | OceanSITES site name where platform installed | | M |
| | Unique platform identifier | | M |
| | Data quality mode: R/P/D (typically D for oceanarray) - see 8.4. OceanSITES 1.4 Reference Table 4: Data Mode | | M |
| | Free-format dataset description | | HD |
| | Detailed description (up to 100 words) for data discovery | | HD |
| | Unique dataset identifier (often filename without .nc) | | HD |
| | Organization managing dataset names | | HD |
| | WMO identifier unique within OceanSITES project | | HD |
| | Platform type from SeaVoX L06 vocabulary | | HD |
| | OceanSITES theme areas (comma-separated) - see 1.3 OceanSITES Organizational model | | HD |
| | OceanArray format version | | M |
| | OceanSITES data type (maintains compatibility) | | M |
4.2. Provenance
Warning
DEVIATION from OceanSITES: This section consolidates the OceanSITES creator_* fields (creator_name, creator_email, creator_url, creator_institution) and principal_investigator_* fields (principal_investigator_name, principal_investigator_email, principal_investigator_institution) into unified contributor_* attributes. This follows the OG1 pattern, which supports multiple contributors with defined roles instead of separate creator and PI fields. Different contributor roles (PI, Data scientist, Operator, etc.) can then be attributed within a single, consistent metadata structure.
| Attribute | Definition | Example | RS |
|---|---|---|---|
| | Name of the contributors to the oceanographic mission. Multiple contributors are separated by commas. PI name is mandatory. | | M |
| | Email of the contributors to the oceanographic mission. Multiple contributors’ emails are separated by commas. PI email is mandatory. | | M |
| | Unique id of the contributors to the oceanographic mission. Multiple contributors’ ids are separated by commas. | | HD |
| | Role of the contributors to the oceanographic mission. Multiple contributors’ roles are separated by commas. PI role is mandatory. | | M |
| | Controlled vocabulary for the roles used in the “contributor_role”. Multiple contributors’ roles and vocabularies are separated by commas. PI vocabulary is mandatory. | | M |
| | Names of institutions involved in the oceanographic mission. Multiple institutions are separated by commas. Operator is mandatory. | | M |
| | URL to the repository of the institution id. Multiple vocabularies are separated by commas. | | HD |
| | Role of the institutions involved in the oceanographic mission. Multiple institutions’ roles are separated by commas. Operator role is mandatory. | | M |
| | The controlled vocabulary of the role used in the institution’s role. Multiple vocabularies are separated by commas. Operator vocabulary is mandatory. | | M |
Standard Contributor Roles: Data scientist, Manufacturer, PI, Technical Coordinator, Operator, Owner
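Because every contributor attribute is a comma-separated list, the entries must stay aligned across attributes (the n-th name, email, and role all describe the same person). The Python sketch below illustrates one way to build them from a single list. It is an illustration under assumptions: the helper name build_contributor_attrs is hypothetical, the attribute names contributor_name and contributor_email are assumed from the contributor_* pattern, and ", " is assumed as the separator.

```python
# Roles listed above under "Standard Contributor Roles".
STANDARD_ROLES = {
    "Data scientist",
    "Manufacturer",
    "PI",
    "Technical Coordinator",
    "Operator",
    "Owner",
}


def build_contributor_attrs(contributors):
    """Build aligned comma-separated attributes from (name, email, role) tuples.

    Assumption: the attribute names and the ", " separator are illustrative.
    """
    # The specification requires a PI entry.
    if not any(role == "PI" for _, _, role in contributors):
        raise ValueError("at least one contributor must have the role 'PI'")
    for name, email, role in contributors:
        if role not in STANDARD_ROLES:
            raise ValueError(f"unknown contributor role: {role!r}")
        # A comma inside a value would break the alignment between the lists.
        if "," in name or "," in email:
            raise ValueError("commas are reserved as the list separator")
    names, emails, roles = zip(*contributors)
    return {
        "contributor_name": ", ".join(names),
        "contributor_email": ", ".join(emails),
        "contributor_role": ", ".join(roles),
    }
```

Building all three strings from one list of tuples keeps the positions in step, which is harder to guarantee when each attribute is edited by hand.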
Provenance and Data History
| Attribute | Definition | Example | RS |
|---|---|---|---|
| | | | M |
| | | | HD |
| | Provides an audit trail for modifications to the original data. Each line should begin with a timestamp, and include user name, modification name, and modification arguments. The timestamp should follow YYYYmmddTHHMMss format. | | HD |
| | Processing level from 3.3 Reference table 3: Processing level | | HD |
| | A value valid for the whole dataset, one of: ‘unknown’ (no QC done, no known problems); ‘excellent’ (no known problems, all important QC done); ‘probably good’ (validation phase); ‘mixed’ (some problems, see variable attributes) | | HD |
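The audit-trail description above (timestamp first, then user name, modification name, and arguments) can be sketched as a small helper. This is only an illustration: the function names are hypothetical, and the use of UTC and of single spaces between fields are assumptions that the specification does not state.

```python
from datetime import datetime, timezone


def history_entry(when, user, modification, arguments=""):
    """Format one audit-trail line: timestamp, user, modification, arguments."""
    # Assumption: naive datetimes are interpreted as UTC.
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    stamp = when.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%S")
    parts = [stamp, user, modification]
    if arguments:
        parts.append(arguments)
    return " ".join(parts)


def append_history(existing, entry):
    """Append an entry on its own line, keeping earlier entries intact."""
    return f"{existing}\n{entry}" if existing else entry
```

Appending, and never rewriting, keeps the attribute usable as an audit trail.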
4.3. Geospatial-Temporal Attributes
| Attribute | Definition | Example | RS |
|---|---|---|---|
| | Minimum latitude of dataset coverage | | M |
| | Maximum latitude of dataset coverage | | M |
| | Minimum longitude of dataset coverage | | M |
| | Maximum longitude of dataset coverage | | M |
| | Minimum depth/elevation of dataset coverage | | M |
| | Maximum depth/elevation of dataset coverage | | M |
| | Indicates which direction is positive; “up” means that z represents height, while “down” means that z represents pressure or depth. If not specified then “down” is assumed. (ACDD) | | HD |
| | Units of depth, pressure, or height. If not specified then “meter” is assumed. (ACDD) | | S |
| | | | M |
| | | | M |
| | ISO 8601 duration format (examples: P1Y, P3M, P415D, P1Y1M3D) | | HD |
| | CF Discrete Sampling Geometry type (timeSeries or timeSeriesProfile) | | HD |
| | Date and time in ISO format of platform deployment | | S |
| | Ship name for deployment (see https://ocean.ices.dk/codes/ShipCodes.aspx) | | S |
| | Cruise name for deployment (may be found on operators’ sites or rvdata.us) | | S |
| | ICES ship code for deployment vessel | | S |
| | ICES ship code plus cruise start date for deployment | | S |
| | Date and time in ISO format of platform recovery | | S |
| | Ship name for recovery (see https://ocean.ices.dk/codes/ShipCodes.aspx) | | S |
| | Cruise name for recovery (may be found on operators’ sites or rvdata.us) | | S |
| | ICES ship code for recovery vessel | | S |
| | ICES ship code plus cruise start date for recovery | | S |
Time Format Rationale: The compact YYYYmmddTHHMMss format is the ISO 8601 basic format (no separators between date or time fields). It reduces attribute string length while remaining human readable.
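In Python, the compact form corresponds to the strftime pattern "%Y%m%dT%H%M%S". The sketch below converts in both directions; the function names are hypothetical, and treating all times as UTC is an assumption, since the compact string carries no time zone designator.

```python
from datetime import datetime, timezone

# strftime/strptime pattern for YYYYmmddTHHMMss.
COMPACT_FORMAT = "%Y%m%dT%H%M%S"


def to_compact(when):
    """Format a datetime as YYYYmmddTHHMMss (aware datetimes are converted to UTC)."""
    if when.tzinfo is not None:
        when = when.astimezone(timezone.utc)
    return when.strftime(COMPACT_FORMAT)


def from_compact(text):
    """Parse YYYYmmddTHHMMss into an aware datetime, assuming UTC."""
    return datetime.strptime(text, COMPACT_FORMAT).replace(tzinfo=timezone.utc)
```

The compact string for 14 February 2005, 09:05:00 UTC is 15 characters long, compared with 20 for the extended form 2005-02-14T09:05:00Z.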
File dates: The file dates date_created and date_modified follow our interpretation of the ACDD definitions. date_created is the time stamp on the file; date_modified may be used to represent the ‘version date’ of the geophysical data in the file. Because date_created may change when, for example, metadata is added or the file format is updated, the optional date_modified MAY be earlier than date_created.
Geospatial extents: The extents (geospatial_lat_min, geospatial_lat_max, geospatial_lon_min, geospatial_lon_max) are preferably stored as strings for use in the GDAC software; numeric fields are also acceptable. This information is linked to the site and may not be specific to the platform deployment.
4.4. Publication Information
| Attribute | Definition | Example | RS |
|---|---|---|---|
| | Frequency of dataset updates | | M |
| | Published or web-based references that describe the data or methods used to produce it. Include a reference to OceanSITES and a project-specific reference if appropriate. | | HD |
| | A statement describing the data distribution policy; it may be a project- or DAC-specific statement, but must allow free use of data. OceanSITES has adopted the CLIVAR data policy. (ACDD) | | S |
| | The citation to be used in publications using the dataset; should include a reference to OceanSITES, the name of the PI, the site name, platform code, data access date, time, and URL, and, if available, the DOI of the dataset. | | S |
| | A place to acknowledge various types of support for the project that produced this data. (ACDD) | | S |
5. Dimensions
NetCDF dimensions provide information on the size of the data variables, and additionally tie coordinate variables to data. CF recommends that if any or all of the dimensions of a variable have the interpretations of “date or time” (T), “height or depth” (Z), “latitude” (Y), or “longitude” (X) then those dimensions should appear in the relative order T, Z, Y, X in the variable’s definition (in the CDL).
Reference: See OceanSITES dimensions in 2.3 Dimensions.
Warning
DEVIATION from OceanSITES: OceanArray uses N_LEVELS instead of the OceanSITES standard DEPTH dimension to align with CCHDO (CLIVAR and Carbon Hydrographic Data Office) conventions used in hydrographic data processing workflows. This also avoids the confusion of a dimension named DEPTH when the instruments actually record pressure.
| Name | Type | Definition | RS |
|---|---|---|---|
| TIME | Unlimited | Number of time steps. Example: for a mooring with one value per day and a mission length of one year, TIME contains 365 time steps. | M |
| N_LEVELS | Fixed | [DEVIATION from OceanSITES] Number of depth levels. Used instead of OceanSITES DEPTH. | M |
| LATITUDE | Fixed (=1) | Dimension of the LATITUDE coordinate variable. | M |
| LONGITUDE | Fixed (=1) | Dimension of the LONGITUDE coordinate variable. | M |
Dimension Ordering: Following OceanSITES convention, dimensions should be ordered as (TIME, N_LEVELS, Y, X). CF recommends T, Z, Y, X order in variable definitions.
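The ordering rule can be checked mechanically for any variable's dimension tuple. The sketch below is a hypothetical helper, not an oceanarray function; it assumes the horizontal dimensions are named LATITUDE and LONGITUDE and ignores any dimension it does not recognise.

```python
# Axis rank of each dimension name: T, Z, Y, X.
# Assumption: the Y and X dimensions are named LATITUDE and LONGITUDE.
AXIS_RANK = {"TIME": 0, "N_LEVELS": 1, "LATITUDE": 2, "LONGITUDE": 3}


def is_cf_ordered(dims):
    """Return True if the recognised dimensions appear in T, Z, Y, X order."""
    ranks = [AXIS_RANK[name] for name in dims if name in AXIS_RANK]
    # Strictly increasing ranks also reject a repeated dimension.
    return all(a < b for a, b in zip(ranks, ranks[1:]))
```

A data variable declared on (TIME, N_LEVELS) passes this check, while (N_LEVELS, TIME) does not.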
6. Coordinate Variables
NetCDF coordinates are a special subset of variables. Coordinate variables orient the data in time and space; they may be dimension variables or auxiliary coordinate variables (identified by the ‘coordinates’ attribute on a data variable).
Reference: See OceanSITES coordinate variables in 2.4 Coordinate variables.
| Variable | Dimension | Attributes and Requirements | RS |
|---|---|---|---|
| TIME | TIME | data type: double; long_name = “Time elapsed since 1970-01-01T00:00:00Z”; calendar = “gregorian”; units = “seconds since 1970-01-01T00:00:00Z”; axis = “T”; _FillValue = -1.0; valid_min = 1e9, valid_max = 4e9; ancillary_variables = “TIME_QC”; interpolation_methodology = “” | M |
| LONGITUDE | LONGITUDE | data type: double; long_name = “longitude of measurement location”; standard_name = “longitude”; units = “degrees_east”; axis = “X”; _FillValue = -9999.9; valid_min = -180.0, valid_max = 180.0; ancillary_variables = “LONGITUDE_QC”; interpolation_methodology = “” | HD |
| | | | HD |
| PRESSURE | N_LEVELS | data type: double; long_name = “Pressure below surface of the water body”; standard_name = “sea_water_pressure”; units = “dbar”; axis = “Z”; positive = “down”; _FillValue = -9999.9; valid_min = 0.0, valid_max = 10000.0; ancillary_variables = “PRESSURE_QC”; interpolation_methodology = “” | HD |
Note: The PRESSURE coordinate variable uses the N_LEVELS dimension, following oceanographic convention where instruments typically measure pressure rather than depth directly.
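The TIME attributes above imply a simple encoding: seconds elapsed since 1970-01-01T00:00:00Z, stored as a double, with -1.0 as the fill value and a valid range of 1e9 to 4e9. A minimal Python sketch follows; the function names are hypothetical and naive datetimes are assumed to be UTC.

```python
from datetime import datetime, timezone

# Values taken from the TIME attributes listed above.
TIME_FILL_VALUE = -1.0
TIME_VALID_MIN = 1e9
TIME_VALID_MAX = 4e9


def encode_time(when):
    """Return seconds since 1970-01-01T00:00:00Z, or the fill value for None."""
    if when is None:
        return TIME_FILL_VALUE
    # Assumption: naive datetimes are interpreted as UTC.
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    return when.timestamp()


def time_in_valid_range(seconds):
    """Check an encoded time against valid_min and valid_max."""
    return TIME_VALID_MIN <= seconds <= TIME_VALID_MAX
```

Note that 1e9 seconds corresponds to 9 September 2001, so with these limits any earlier sample would fail the range check.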
7. Data Variables & Quality Control
7.1. Data Variable Structure
Data variables contain the actual measurements and information about their quality, uncertainty, and mode by which they were obtained.
Reference: See OceanSITES data variables in oceanSITES data variables.
Standard Structure:
float <PARAM>(TIME, N_LEVELS);
  <PARAM>:standard_name = "<CF_NAME>";
  <PARAM>:units = "<UDUNITS_STRING>";
  <PARAM>:_FillValue = <VALUE>;
  <PARAM>:long_name = "<DESCRIPTION>";
7.2. Quality Control Variables
Each measured parameter should have an associated quality control variable with suffix “_QC”.
Reference: See OceanSITES QC variables in oceanSITES QC variables.
QC Structure:
byte <PARAM>_QC(TIME, N_LEVELS);
  <PARAM>_QC:long_name = "quality flag for <PARAM>";
  <PARAM>_QC:flag_values = 0b, 1b, 2b, 3b, 4b, 7b, 8b, 9b;
  <PARAM>_QC:flag_meanings = "unknown good_data probably_good_data potentially_correctable_bad_data bad_data nominal_value interpolated_value missing_value";
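The flag_values and flag_meanings attributes pair up positionally: the n-th value corresponds to the n-th space-separated meaning. The Python sketch below decodes that pairing and uses it to mask samples. The helper names are hypothetical, and treating flags 1 and 2 as acceptable is an assumption that a given analysis may want to change.

```python
# Values and meanings as listed in the QC structure above.
FLAG_VALUES = (0, 1, 2, 3, 4, 7, 8, 9)
FLAG_MEANINGS = (
    "unknown good_data probably_good_data potentially_correctable_bad_data "
    "bad_data nominal_value interpolated_value missing_value"
)


def flag_table(values=FLAG_VALUES, meanings=FLAG_MEANINGS):
    """Map each flag value to its meaning; the two lists pair up by position."""
    names = meanings.split()
    if len(names) != len(values):
        raise ValueError("flag_values and flag_meanings differ in length")
    return dict(zip(values, names))


def keep_accepted(data, flags, accepted=(1, 2)):
    """Replace every sample whose flag is not accepted with None."""
    return [value if flag in accepted else None for value, flag in zip(data, flags)]
```

Note that the flag values skip 5 and 6, so the meaning of a flag cannot be looked up by list index; the explicit mapping avoids that mistake.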
8. Reference Tables
8.1. Units
All units must follow the UDUNITS-2 standard:
| Quantity | UDUNITS Format | Notes |
|---|---|---|
| Latitude | | [DEVIATION] OceanSITES uses |
| Longitude | | [DEVIATION] OceanSITES uses |
| Temperature | | Preferred over |
| Pressure | | Standard oceanographic unit |
| Depth | | Standard SI unit |
| Salinity | | Dimensionless or practical salinity units |
| Current Speed | | SI derived unit |
| Current Direction | | Angular measurement |
| Ocean Transport | | 1 Sv = 10^6 m³/s (note: not |
Reference: See OceanSITES units in oceanSITES reference tables.
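The transport relation in the table, 1 Sv = 10^6 m³/s, amounts to a single scale factor. A short Python sketch with hypothetical function names:

```python
# 1 Sv = 10^6 cubic metres per second, as stated in the units table.
M3_PER_S_PER_SV = 1.0e6


def m3s_to_sverdrup(transport_m3s):
    """Convert a volume transport from m^3/s to sverdrups."""
    return transport_m3s / M3_PER_S_PER_SV


def sverdrup_to_m3s(transport_sv):
    """Convert a volume transport from sverdrups to m^3/s."""
    return transport_sv * M3_PER_S_PER_SV
```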
8.2. QC Flag Values
Reference: See complete QC flags in oceanSITES QC flags.
| Flag | Meaning | Description |
|---|---|---|
| 0 | unknown | No QC was performed |
| 1 | good_data | All QC tests passed |
| 2 | probably_good_data | Data are probably good |
| 3 | potentially_correctable_bad_data | Data are not to be used without scientific correction |
| 4 | bad_data | Data have failed one or more tests |
| 7 | nominal_value | Data were not observed but reported (e.g. instrument target depth) |
| 8 | interpolated_value | Missing data interpolated from neighboring data |
| 9 | missing_value | Fill value |
8.3. Processing Levels
Reference: See OceanSITES processing levels in oceanSITES processing levels.
Standard processing level descriptions for the processing_level attribute:
- Raw instrument data
- Instrument data that has been converted to geophysical values
- Post-recovery calibrations have been applied
- Data has been scaled using contextual information
- Known bad data has been replaced with null values
- Data manually reviewed
- Data verified against model or other contextual information
8.4. OceanSITES 1.4 Reference Table 4: Data Mode
The values for the variables <PARAM>_DM, the global attribute data_mode, and variable attributes <PARAM>:DM_indicator are defined as follows:
| Value | Meaning | Description |
|---|---|---|
| R | Real-time data | Data coming from the (typically remote) platform through a communication channel without physical access to the instruments, disassembly or recovery of the platform. Example: for a mooring with a radio communication, this would be data obtained through the radio. |
| P | Provisional data | Data obtained after instruments have been recovered or serviced; some calibrations or editing may have been done, but the data is not thought to be fully processed. Refer to the history attribute for more detailed information. |
| D | Delayed-mode data | Data published after all calibrations and quality control procedures have been applied on the internally recorded or best available original data. This is the best possible version of processed data. |
| M | Mixed | This value is only allowed in the global attribute “data_mode” or in attributes to variables in the form “<PARAM>:DM_indicator”. It indicates that the file contains data in more than one of the above states. In this case, the variable(s) <PARAM>_DM specify which data is in which data mode. |
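The rule for M can be expressed directly: per-sample modes are restricted to R, P, and D, and the summary value is M only when more than one of them occurs. The following Python sketch illustrates this with a hypothetical helper name; it is not an oceanarray function.

```python
# Modes that individual samples (the <PARAM>_DM variables) may take.
SAMPLE_MODES = {"R", "P", "D"}


def summarize_data_mode(sample_modes):
    """Derive the data_mode / DM_indicator value from per-sample modes."""
    modes = set(sample_modes)
    if not modes:
        raise ValueError("no data modes given")
    unknown = modes - SAMPLE_MODES
    if unknown:
        # "M" is a summary value only, so it is rejected here as well.
        raise ValueError(f"invalid per-sample data modes: {sorted(unknown)}")
    # A single state is reported as-is; several states collapse to "M".
    return modes.pop() if len(modes) == 1 else "M"
```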
9. Implementation Guidelines
9.1. Technical Compliance
UDUNITS Compliance: Strict adherence to UDUNITS-2 ensures compatibility with analysis tools like Python’s cf_units, NCO, and CDO.
CF Compliance: All files must pass validation with the CF Checker.
9.2. Validation Checklist
| Requirement | Status |
|---|---|
| CF Convention compliance (cf-checker passes) | ☐ Required |
| All required global attributes present | ☐ Required |
| Coordinate variables have required attributes | ☐ Required |
| Data variables have QC companions | ☐ Required |
| Units follow UDUNITS standard | ☐ Required |
| QC flags use standard values (0,1,2,3,4,7,8,9) | ☐ Required |
| File naming follows OS_[PLATFORM]_[DEPLOYMENT]_[MODE]_[PARAMS].nc | ☐ Required |
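Two items on the checklist, QC companions and standard flag values, need nothing more than the variable names and the flag arrays, so they can be checked without a NetCDF library. The Python sketch below is a partial illustration only: the function names are hypothetical, and it does not replace cf-checker or the remaining checklist items.

```python
# Flag values permitted by the checklist.
ALLOWED_FLAGS = {0, 1, 2, 3, 4, 7, 8, 9}


def missing_qc_companions(variable_names):
    """Return the variables that lack a companion named <PARAM>_QC."""
    names = set(variable_names)
    return sorted(
        name
        for name in names
        if not name.endswith("_QC") and f"{name}_QC" not in names
    )


def invalid_flags(flags):
    """Return the distinct flag values that are not in the standard set."""
    return sorted(set(flags) - ALLOWED_FLAGS)
```

With the illustrative variable names TEMP, TEMP_QC, and PSAL, the first check reports PSAL as lacking a QC companion. An empty result from both functions means these two checklist items pass.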
9.3. References
OceanSITES Manual: OceanSITES Data Format Reference Manual
OceanSITES Website: https://www.ocean-ops.org/oceansites/
CF Conventions: http://cfconventions.org/
CF Checker: https://github.com/cedadev/cf-checker