3. Automatic QC flagging

Here we create automatic QC flags based on the U.S. Integrated Ocean Observing System (IOOS) Quality Assurance of Real Time Ocean Data (QARTOD) project (https://ioos.noaa.gov/project/qartod/).

The outcome of this step is a quality flag for each data sample.

Quality Control Flag Values and Meanings
Flag	OceanSITES Meaning	IOC Meaning	Notes
0	unknown	not defined	Used in OceanSITES, not IOC
1	good_data	Data point passed the test	Passed documented required QC tests
2	probably_good_data	Test was not evaluated	OceanSITES assumes quality; IOC indicates no test performed or unknown
3	potentially_correctable_bad_data	Data point is interesting/unusual or suspect	OceanSITES implies fixable; IOC flags as suspect (non-critical or subjective failure)
4	bad_data	Data point fails the test	Failed critical QC tests or flagged by data provider
7	nominal_value	not defined	Constant value, e.g. for reference or nominal settings; not used by IOC
8	interpolated_value	not defined	Estimated or gap-filled data; not used by IOC
9	missing_value	Data point is missing	Placeholder when data are absent

Including QC Flags in an xarray Dataset

To add a QC flag variable to an xarray Dataset, define a new variable (e.g., TEMP_QC) with the same dimensions as the data variable, and assign the appropriate attributes:

import numpy as np
import xarray as xr

ds["TEMP_QC"] = xr.DataArray(
   np.ones(ds["TEMP"].shape, dtype="int8"),
   dims=ds["TEMP"].dims,
   attrs={
      "long_name": "quality flag for TEMP",
      "flag_values": [0, 1, 2, 3, 4, 7, 8, 9],
      "flag_meanings": "unknown good_data probably_good_data potentially_correctable_bad_data bad_data nominal_value interpolated_value missing_value"
   }
)

1. Overview

Raw mooring records often contain extraneous data from before deployment or after recovery (e.g., deck recording, values during ascent/descent, post-recovery handling). These segments must be trimmed to retain only the time interval when the instrument was collecting valid in-situ measurements at its nominal depth. In this stage, the data are:

Visualised to identify data issues (e.g., spikes at deployment start/end)
Optionally low-pass filtered (e.g., 2-day Butterworth)
Inspected manually
Optionally adjusted (e.g., revised trimming bounds)
Prepared for further processing (e.g., gridding)
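
The optional low-pass step above can be sketched as follows. This is a minimal illustration, not the package's own routine: the function name `lowpass`, the hourly sampling interval, and the filter order of 4 are all assumptions. It uses a zero-phase Butterworth filter from SciPy with the cutoff expressed as a period in days.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt


def lowpass(values, cutoff_days=2.0, sample_interval_hours=1.0, order=4):
    """Zero-phase Butterworth low-pass; cutoff given as a period in days.

    Hypothetical helper. NaNs are not handled here and would propagate,
    so gaps should be filled or masked before filtering.
    """
    fs = 24.0 / sample_interval_hours  # sampling rate in samples per day
    cutoff = 1.0 / cutoff_days         # cutoff frequency in cycles per day
    sos = butter(order, cutoff, btype="low", fs=fs, output="sos")
    return sosfiltfilt(sos, np.asarray(values, dtype=float))


# Synthetic hourly record: a 10-day signal plus a 6-hour oscillation.
t_days = np.arange(0, 30, 1 / 24)
slow = np.sin(2 * np.pi * t_days / 10.0)
fast = 0.5 * np.sin(2 * np.pi * t_days / 0.25)
smoothed = lowpass(slow + fast)
```

The filtered series is intended for visual inspection only; the flags themselves are assigned against the unfiltered data.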

2. Purpose

Flag data quality per sample
Generate summary plots and statistics
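
As a rough sketch of the summary statistics, the flag counts for one variable can be tabulated with NumPy. The helper name `summarise_flags` and the sample flag array are made up for illustration; the flag names come from the table above.

```python
import numpy as np

# Flag values and OceanSITES names, taken from the table in this section.
FLAG_NAMES = {
    0: "unknown",
    1: "good_data",
    2: "probably_good_data",
    3: "potentially_correctable_bad_data",
    4: "bad_data",
    7: "nominal_value",
    8: "interpolated_value",
    9: "missing_value",
}


def summarise_flags(flags):
    """Return {flag_name: (count, percent)} for an integer flag array."""
    flags = np.asarray(flags)
    values, counts = np.unique(flags, return_counts=True)
    total = flags.size
    return {
        FLAG_NAMES.get(int(v), f"undefined_{int(v)}"): (int(c), float(100.0 * c / total))
        for v, c in zip(values, counts)
    }


# Illustrative flags only; in practice this would be ds["TEMP_QC"].values.
temp_qc = np.array([1, 1, 1, 4, 1, 3, 1, 9], dtype="int8")
summary = summarise_flags(temp_qc)
for name, (count, pct) in summary.items():
    print(f"{name:35s} {count:4d} {pct:5.1f}%")
```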

3. Input

Standardised xarray.Dataset containing raw time series (TIME, TEMP, etc.)
Configuration information for the automatic QC tests to be applied (e.g., QARTOD global range test, spike test)
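
To make the configuration input concrete, here is a minimal NumPy sketch of a range check of the kind QARTOD calls a gross range test. It is a stand-in written for illustration, not the ioos_qc implementation; the function name `gross_range_flags`, the layout of `config`, and the threshold values are assumptions.

```python
import numpy as np

# Flag values follow the IOC/QARTOD column of the table in this section.
GOOD, SUSPECT, FAIL, MISSING = 1, 3, 4, 9


def gross_range_flags(values, fail_span, suspect_span=None):
    """Flag each sample against a hard (fail) and optional soft (suspect) range."""
    values = np.asarray(values, dtype=float)
    flags = np.full(values.shape, GOOD, dtype="int8")
    if suspect_span is not None:
        lo, hi = suspect_span
        flags[(values < lo) | (values > hi)] = SUSPECT
    lo, hi = fail_span
    flags[(values < lo) | (values > hi)] = FAIL  # fail overrides suspect
    flags[np.isnan(values)] = MISSING            # NaN is treated as missing
    return flags


# Hypothetical per-parameter configuration; thresholds are illustrative only.
config = {"TEMP": {"fail_span": (-2.5, 40.0), "suspect_span": (-2.0, 30.0)}}

temp = np.array([4.1, 4.2, 35.0, 99.9, np.nan, -2.2])
temp_qc = gross_range_flags(temp, **config["TEMP"])
print(temp_qc)  # [1 1 3 4 9 3]
```

Keeping the thresholds in a plain dictionary means the same object can be stored alongside the output as the record of what was applied.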

4. Output

Additional flag variables on the xarray.Dataset, named <PARAM>_QC (e.g., TEMP_QC).
Configuration information for the automatic QC applied
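
One possible way to produce both outputs together is to attach the flag variable and store the applied configuration as an attribute on it. In the sketch below, the helper `attach_qc` and the attribute name `qc_config` are hypothetical, and the flags and thresholds are placeholders. The configuration is serialised to JSON because netCDF attributes cannot hold nested dictionaries.

```python
import json

import numpy as np
import xarray as xr

FLAG_MEANINGS = (
    "unknown good_data probably_good_data potentially_correctable_bad_data "
    "bad_data nominal_value interpolated_value missing_value"
)


def attach_qc(ds, param, flags, qc_config):
    """Return a copy of ds with <param>_QC added and the QC config recorded."""
    out = ds.copy()
    out[f"{param}_QC"] = xr.DataArray(
        np.asarray(flags, dtype="int8"),
        dims=ds[param].dims,
        attrs={
            "long_name": f"quality flag for {param}",
            "flag_values": np.array([0, 1, 2, 3, 4, 7, 8, 9], dtype="int8"),
            "flag_meanings": FLAG_MEANINGS,
            "qc_config": json.dumps(qc_config),  # hypothetical attribute name
        },
    )
    # CF-style link from the data variable to its flag variable.
    out[param].attrs["ancillary_variables"] = f"{param}_QC"
    return out


# Tiny stand-in dataset; values, flags, and thresholds are illustrative.
time = np.arange("2021-01-05T20", "2021-01-06T00", dtype="datetime64[h]")
ds = xr.Dataset(
    {"TEMP": ("TIME", np.array([4.1, 4.2, 99.9, 4.0], dtype="float32"))},
    coords={"TIME": time},
)
qc_config = {"gross_range_test": {"fail_span": [-2.5, 40.0]}}
ds_qc = attach_qc(ds, "TEMP", [1, 1, 4, 1], qc_config)
```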

5. Example

from oceanarray.methods import auto_qc

ds_trimmed = newname_here(ds_std, start="2021-01-05T20:00", end="2023-02-25T17:00")

<xarray.Dataset>
Dimensions:      (TIME: 104576)
Coordinates:
  * TIME         (TIME) datetime64[ns] ...
Data variables:
    TEMPERATURE  (TIME) float32 ...
    PRESSURE     (TIME) float32 ...
Attributes:
    start_time: 2021-01-05T20:00
    end_time: 2023-02-25T17:00
    trimmed: True

6. Implementation Notes

Rely heavily on the ioos_qc Python package.

7. FAIR Considerations

Don’t change the data; only apply flags.
Retain the configuration used for automatic flagging (i.e., which thresholds were applied).
Note: Since we are using the OceanSITES data format, we should use OceanSITES flagging. However, flag “2” means different things in the two schemes (probably_good_data in OceanSITES, test not evaluated in IOC). It may be wiser to avoid flag 2 altogether and use flag 3 for anything that is not flag 1.
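
One way to keep flag 2 out of the output is to remap the IOC/QARTOD-style test results before writing them. The sketch below is only one reading of the note above: the helper `to_oceansites` is hypothetical, and mapping "test not evaluated" (2) to OceanSITES "unknown" (0) is an assumption, not a decision recorded here.

```python
import numpy as np

# IOC/QARTOD result -> OceanSITES flag.
# ASSUMPTION: 2 ("test was not evaluated") is written as 0 ("unknown"),
# so that OceanSITES flag 2 ("probably_good_data") is never emitted.
QARTOD_TO_OCEANSITES = {1: 1, 2: 0, 3: 3, 4: 4, 9: 9}


def to_oceansites(qartod_flags):
    """Translate an array of IOC/QARTOD flags into OceanSITES flags."""
    qartod_flags = np.asarray(qartod_flags)
    out = np.zeros(qartod_flags.shape, dtype="int8")  # default: 0 = unknown
    for src, dst in QARTOD_TO_OCEANSITES.items():
        out[qartod_flags == src] = dst
    return out


print(to_oceansites([1, 2, 3, 4, 9]))  # [1 0 3 4 9]
```

Whatever mapping is chosen, recording it with the rest of the QC configuration keeps the flags traceable.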