3. Automatic QC flagging

Here we create automatic QC flags based on the U.S. Integrated Ocean Observing System (IOOS) Quality Assurance/Quality Control of Real-Time Oceanographic Data (QARTOD; https://ioos.noaa.gov/project/qartod/).

The outcome is a QC flag for every data point, using the flag values below.

Quality Control Flag Values and Meanings

| Flag | OceanSITES meaning | IOC meaning | Notes |
|------|--------------------|-------------|-------|
| 0 | unknown | not defined | Used in OceanSITES, not IOC |
| 1 | good_data | Data point passed the test | Passed documented required QC tests |
| 2 | probably_good_data | Test was not evaluated | OceanSITES assumes quality; IOC indicates no test performed or unknown |
| 3 | potentially_correctable_bad_data | Data point is interesting/unusual or suspect | OceanSITES implies fixable; IOC flags as suspect (non-critical or subjective failure) |
| 4 | bad_data | Data point fails the test | Failed critical QC tests or flagged by data provider |
| 7 | nominal_value | not defined | Constant value, e.g., for reference or nominal settings; not used by IOC |
| 8 | interpolated_value | not defined | Estimated or gap-filled data; not used by IOC |
| 9 | missing_value | Data point is missing | Placeholder when data are absent |
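To keep the numeric codes and their names in one place, the OceanSITES column of the table can be held as a dictionary from which the flag attributes are generated. This is a minimal sketch; the names `OCEANSITES_FLAGS`, `flag_attrs` and `decode_flags` are hypothetical and not part of oceanarray.

```python
import numpy as np

# OceanSITES flag values and meanings, copied from the table above.
OCEANSITES_FLAGS = {
    0: "unknown",
    1: "good_data",
    2: "probably_good_data",
    3: "potentially_correctable_bad_data",
    4: "bad_data",
    7: "nominal_value",
    8: "interpolated_value",
    9: "missing_value",
}


def flag_attrs(flags=OCEANSITES_FLAGS):
    """Return CF-style flag_values / flag_meanings attributes for a QC variable."""
    values = sorted(flags)
    return {
        # CF conventions expect flag_values to share the QC variable's type (int8 here).
        "flag_values": np.array(values, dtype="int8"),
        "flag_meanings": " ".join(flags[v] for v in values),
    }


def decode_flags(qc, flags=OCEANSITES_FLAGS):
    """Translate numeric flags into their meanings (KeyError for undefined codes)."""
    return [flags[int(f)] for f in np.ravel(qc)]
```

Generating both attributes from one dictionary guarantees that `flag_values` and `flag_meanings` stay in the same order.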

Including QC Flags in an xarray Dataset

To add a QC flag variable to an xarray Dataset, define a new variable (e.g., TEMP_QC) with the same dimensions as the data variable, and assign the appropriate attributes:

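import numpy as np
import xarray as xr

# ds is an existing xarray.Dataset containing a TEMP variable.
# Initialise every sample to 1 (good_data).
ds["TEMP_QC"] = xr.DataArray(
    np.ones(ds["TEMP"].shape, dtype="int8"),
    dims=ds["TEMP"].dims,
    attrs={
        "long_name": "quality flag for TEMP",
        # CF conventions expect flag_values to match the variable's type (int8)
        "flag_values": np.array([0, 1, 2, 3, 4, 7, 8, 9], dtype="int8"),
        "flag_meanings": (
            "unknown good_data probably_good_data potentially_correctable_bad_data "
            "bad_data nominal_value interpolated_value missing_value"
        ),
    },
)
# Link the data variable to its flags (CF ancillary_variables attribute)
ds["TEMP"].attrs["ancillary_variables"] = "TEMP_QC"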
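Once the flag variable exists, each test writes flag values into it. The ioos_qc package provides the QARTOD gross range test as `ioos_qc.qartod.gross_range_test`; the numpy-only sketch below only illustrates the logic of that test. The function name and the threshold values in the usage example are hypothetical, not recommended limits.

```python
import numpy as np

GOOD, SUSPECT, BAD, MISSING = 1, 3, 4, 9  # OceanSITES flag values


def gross_range_flags(values, fail_span, suspect_span=None):
    """Flag each sample against fixed limits, in the style of the QARTOD gross range test.

    outside fail_span                      -> 4 (bad_data)
    outside suspect_span, inside fail_span -> 3 (potentially_correctable_bad_data)
    NaN                                    -> 9 (missing_value)
    everything else                        -> 1 (good_data)
    """
    values = np.asarray(values, dtype=float)
    flags = np.full(values.shape, GOOD, dtype="int8")
    if suspect_span is not None:
        lo, hi = suspect_span
        flags[(values < lo) | (values > hi)] = SUSPECT
    lo, hi = fail_span
    flags[(values < lo) | (values > hi)] = BAD  # overrides SUSPECT
    flags[np.isnan(values)] = MISSING  # NaN compares False above, so set it last
    return flags
```

With the dataset above, this would be applied as `ds["TEMP_QC"].values[:] = gross_range_flags(ds["TEMP"].values, fail_span=(-2.5, 40.0), suspect_span=(-2.0, 30.0))`, using illustrative limits only.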

1. Overview

Raw mooring records often contain extraneous data before deployment or after recovery (e.g., recordings on deck, values during ascent/descent, post-recovery handling). These segments must be trimmed so that only the interval when the instrument was collecting valid in-situ measurements at its nominal depth is retained. In this stage, the data are:

  • Visualised to identify data issues (e.g., spikes at the start or end of the deployment)

  • Optionally low-pass filtered (e.g., 2-day Butterworth)

  • Inspected manually

  • Optionally adjusted (e.g., revised trimming bounds)

  • Prepared for further processing (e.g., gridding)
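Trimming amounts to keeping samples between a start and an end time. A minimal numpy sketch, assuming the deployment bounds are known as ISO-8601 strings; the function name `trim_to_deployment` is hypothetical. With xarray the same selection is `ds.sel(TIME=slice(start, end))`, which also includes both end points.

```python
import numpy as np


def trim_to_deployment(time, values, start, end):
    """Keep only samples with start <= time <= end (both bounds inclusive).

    time: array-like of datetime64; start, end: ISO-8601 strings.
    """
    time = np.asarray(time, dtype="datetime64[ns]")
    keep = (time >= np.datetime64(start)) & (time <= np.datetime64(end))
    return time[keep], np.asarray(values)[keep]
```

For example, `trim_to_deployment(ds["TIME"].values, ds["TEMP"].values, "2021-01-05T20:00", "2023-02-25T17:00")` returns the trimmed time axis and values.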

2. Purpose

  • Flag data quality per sample

  • Generate summary plots and statistics
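A simple summary statistic is the number and percentage of samples carrying each flag. This is a minimal numpy sketch; `flag_summary` is a hypothetical helper, not an oceanarray function.

```python
import numpy as np


def flag_summary(qc):
    """Count samples per flag value and express each as a percentage of the total."""
    values, counts = np.unique(np.asarray(qc), return_counts=True)
    total = counts.sum()
    return {
        int(v): {"count": int(c), "percent": round(float(100.0 * c / total), 2)}
        for v, c in zip(values, counts)
    }
```

Applied to `ds["TEMP_QC"].values`, this gives a per-flag breakdown that can accompany the summary plots.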

3. Input

  • Standardised xarray.Dataset containing raw time series (TIME, TEMP, etc.)

  • Configuration for the automatic QC tests to be applied (e.g., QARTOD gross range test, spike test)
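The test configuration can be expressed as a nested dictionary (variable, then test module, then test, then parameters), modelled on the configuration layout used by ioos_qc as we understand it; check the exact keys against the ioos_qc documentation. All threshold values below are hypothetical, and `check_spans` is an illustrative helper that catches a common mistake: a suspect span that is not contained in the fail span.

```python
# Hypothetical thresholds, for illustration only; choose site-specific values.
QC_CONFIG = {
    "TEMP": {
        "qartod": {
            "gross_range_test": {"suspect_span": [-2.0, 30.0], "fail_span": [-2.5, 40.0]},
            "spike_test": {"suspect_threshold": 1.0, "fail_threshold": 2.0},
        }
    }
}


def check_spans(config):
    """Return variables whose gross-range suspect_span is not inside fail_span."""
    problems = []
    for var, modules in config.items():
        test = modules.get("qartod", {}).get("gross_range_test")
        if not test or "suspect_span" not in test:
            continue  # nothing to compare
        (f_lo, f_hi), (s_lo, s_hi) = test["fail_span"], test["suspect_span"]
        if not (f_lo <= s_lo <= s_hi <= f_hi):
            problems.append(var)
    return problems
```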

4. Output

  • Additional QC flag variables named <PARAM>_QC on the xarray.Dataset

  • A record of the automatic QC configuration that was applied
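netCDF attributes cannot hold nested dictionaries, so one way to keep the applied configuration with the data is to serialise it to a JSON string attribute. A minimal sketch; the attribute name `qc_config` and both helper names are hypothetical.

```python
import json


def record_qc_config(attrs, config):
    """Return a copy of attrs with the QC configuration stored as a JSON string."""
    attrs = dict(attrs)  # do not mutate the caller's attributes
    attrs["qc_config"] = json.dumps(config, sort_keys=True)
    return attrs


def read_qc_config(attrs):
    """Recover the QC configuration dictionary from the attributes."""
    return json.loads(attrs["qc_config"])
```

For example: `ds["TEMP_QC"].attrs = record_qc_config(ds["TEMP_QC"].attrs, {"gross_range_test": {"fail_span": [-2.5, 40.0]}})`.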

5. Example

from oceanarray.methods import auto_qc

ds_trimmed = newname_here(ds_std, start="2021-01-05T20:00", end="2023-02-25T17:00")
<xarray.Dataset>
Dimensions:      (TIME: 104576)
Coordinates:
  * TIME         (TIME) datetime64[ns] ...
Data variables:
    TEMPERATURE  (TIME) float32 ...
    PRESSURE     (TIME) float32 ...
Attributes:
    start_time: 2021-01-05T20:00
    end_time: 2023-02-25T17:00
    trimmed: True

6. Implementation Notes

  • Relies heavily on the ioos_qc Python package
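When several tests are run, their per-sample results must be combined into one flag. ioos_qc offers its own aggregation; check its documentation for its precedence rules. The numpy sketch below uses an assumed "worst flag wins" precedence (missing, then bad, then suspect, then good, then not evaluated); both the precedence list and `aggregate_flags` are hypothetical.

```python
import numpy as np

# Assumed precedence, most severe first. Codes not listed (e.g., 0) are ignored.
PRECEDENCE = [9, 4, 3, 1, 2]


def aggregate_flags(*test_flags):
    """Combine per-test flag arrays into one flag per sample (most severe wins)."""
    stacked = np.stack([np.asarray(f, dtype="int8") for f in test_flags])
    result = np.full(stacked.shape[1:], PRECEDENCE[-1], dtype="int8")
    # Walk from least to most severe so that more severe flags overwrite milder ones.
    for flag in reversed(PRECEDENCE):
        result[(stacked == flag).any(axis=0)] = flag
    return result
```

Under this ordering a sample that passed one test and was not evaluated by another ends up as 1, and it stays 2 only if no test evaluated it.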

7. FAIR Considerations

  • Don't change the data; only apply flags

  • Retain the configuration used for automatic flagging (i.e., which tests and thresholds were applied)

  • Note: Since we use the OceanSITES data format, we should use OceanSITES flags. However, flag 2 has conflicting meanings: "probably good data" in OceanSITES but "test not evaluated" for IOC. It may be wiser not to use flag 2 at all, and to use flag 3 for any data point that is not flag 1.
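If flag 2 is avoided, results from QARTOD-style tests need translating before they are written to OceanSITES files. This sketch assumes one possible mapping, sending QARTOD 2 ("not evaluated") to OceanSITES 0 ("unknown") and passing 1, 3, 4 and 9 through unchanged; the mapping and the function name are hypothetical choices, not an agreed convention.

```python
import numpy as np

# Assumed translation: QARTOD 2 ("not evaluated") -> OceanSITES 0 ("unknown"),
# so OceanSITES flag 2 ("probably_good_data") is never produced automatically.
QARTOD_TO_OCEANSITES = {1: 1, 2: 0, 3: 3, 4: 4, 9: 9}


def qartod_to_oceansites(flags):
    """Translate QARTOD flag codes to OceanSITES codes; unmapped codes become 0."""
    flags = np.asarray(flags)
    out = np.zeros(flags.shape, dtype="int8")
    for q, o in QARTOD_TO_OCEANSITES.items():
        out[flags == q] = o  # compare against the input, so mappings never chain
    return out
```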

See also: 3. Calibration (Instrument-level Corrections)