3. Automatic QC flagging
==============================

Here we create automatic QC flags based on the U.S. Integrated Ocean Observing System (IOOS) Quality Assurance of Real Time Ocean Data (QARTOD; https://ioos.noaa.gov/project/qartod/). The outcome is a quality flag for each data sample, using the values below.

.. list-table:: Quality Control Flag Values and Meanings
   :header-rows: 1
   :widths: 6 22 22 40

   * - Flag
     - OceanSITES Meaning
     - IOC Meaning
     - Notes
   * - 0
     - **unknown**
     - not defined
     - Used in OceanSITES, not IOC
   * - 1
     - **good_data**
     - **Data point passed the test**
     - Passed documented required QC tests
   * - 2
     - **probably_good_data**
     - Test was not evaluated
     - OceanSITES assumes quality; IOC indicates no test performed or unknown
   * - 3
     - **potentially_correctable_bad_data**
     - **Data point is interesting/unusual or suspect**
     - OceanSITES implies fixable; IOC flags as suspect (non-critical or subjective failure)
   * - 4
     - **bad_data**
     - **Data point fails the test**
     - Failed critical QC tests or flagged by data provider
   * - 7
     - **nominal_value**
     - not defined
     - Constant value, e.g. for reference or nominal settings; not used by IOC
   * - 8
     - **interpolated_value**
     - not defined
     - Estimated or gap-filled data; not used by IOC
   * - 9
     - **missing_value**
     - **Data point is missing**
     - Placeholder when data are absent

**Including QC Flags in an xarray Dataset**

To add a QC flag variable to an xarray Dataset, define a new variable (e.g., `TEMP_QC`) with the same dimensions as the data variable, and assign the appropriate attributes:

.. code-block:: python

   import numpy as np
   import xarray as xr

   # `ds` is an existing xarray.Dataset that already contains "TEMP".
   # Initialise every sample to flag 1 (good_data).
   ds["TEMP_QC"] = xr.DataArray(
       np.ones(ds["TEMP"].shape, dtype="int8"),
       dims=ds["TEMP"].dims,
       attrs={
           "long_name": "quality flag for TEMP",
           # Same dtype as the flag variable itself
           "flag_values": np.array([0, 1, 2, 3, 4, 7, 8, 9], dtype="int8"),
           "flag_meanings": (
               "unknown good_data probably_good_data "
               "potentially_correctable_bad_data bad_data "
               "nominal_value interpolated_value missing_value"
           ),
       },
   )

1. Overview
-----------

Raw mooring records often contain extraneous data before deployment or after recovery (e.g., deck recording, values during ascent/descent, post-recovery handling). These segments must be trimmed to retain only the time interval when the instrument was collecting valid in-situ measurements at the nominal depth during deployment.

In this stage, the data are:

- Visualised to identify data issues (e.g., spikes at the start or end of deployment)
- Optionally low-pass filtered (e.g., 2-day Butterworth)
- Inspected manually
- Optionally adjusted:

  - Revised trimming bounds

- Prepared for further processing (e.g., gridding)

2. Purpose
----------

- Flag data quality per sample
- Generate summary plots and statistics

3. Input
--------

- Standardised `xarray.Dataset` containing raw time series (`TIME`, `TEMP`, etc.)
- Configuration information for the automatic QC tests to be applied (e.g., QARTOD global range test, spike test)

4. Output
---------

- Additional flag variables on the `xarray.Dataset`, named with a `_QC` suffix (e.g., `TEMP_QC`)
- Configuration information for the automatic QC applied

5. Example
----------

.. code-block:: python

   from oceanarray.methods import auto_qc

   # `newname_here` is a placeholder for the QC routine; its final name is not yet fixed.
   ds_trimmed = newname_here(ds_std, start="2021-01-05T20:00", end="2023-02-25T17:00")

.. code-block:: text

   Dimensions:      (TIME: 104576)
   Coordinates:
     * TIME         (TIME) datetime64[ns] ...
   Data variables:
       TEMPERATURE  (TIME) float32 ...
       PRESSURE     (TIME) float32 ...
   Attributes:
       start_time:  2021-01-05T20:00
       end_time:    2023-02-25T17:00
       trimmed:     True

6. Implementation Notes
-----------------------

- Rely heavily on the `ioos_qc` Python package

7. FAIR Considerations
----------------------

- Don't change the data; only apply flags
- Retain the configuration used for the automatic flagging, i.e., which thresholds were applied
- **Note:** Since we are using the OceanSITES data format, we should use OceanSITES flagging. However, flag 2 has conflicting meanings: OceanSITES reads it as "probably good data", while IOC reads it as "test was not evaluated". It may be wiser to avoid flag 2 altogether and use flag 3 for any sample that does not qualify for flag 1.

See also: :doc:`calibration`
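
**Sketch: a range test that only writes flags**

As a minimal sketch of the kind of automatic test described above, the following applies a gross range check in plain NumPy. It leaves the data untouched, returns `int8` flags using the values from the table at the top of this page, and keeps the thresholds in a small configuration dictionary so they can be retained with the output. The helper `gross_range_flags`, the `qc_config` dictionary and the threshold values are illustrative assumptions, not part of `oceanarray` or `ioos_qc`; in practice the equivalent QARTOD test from `ioos_qc` would be used instead.

.. code-block:: python

   import numpy as np

   # Flag values taken from the table at the top of this page.
   GOOD, SUSPECT, FAIL, MISSING = 1, 3, 4, 9


   def gross_range_flags(values, fail_span, suspect_span=None):
       """Return int8 flags for a simple gross range check (hypothetical helper)."""
       values = np.asarray(values, dtype="float64")
       flags = np.full(values.shape, GOOD, dtype="int8")

       # Samples outside the narrower span are suspect (flag 3).
       if suspect_span is not None:
           lo, hi = suspect_span
           flags[(values < lo) | (values > hi)] = SUSPECT

       # Samples outside the wider span fail outright (flag 4).
       lo, hi = fail_span
       flags[(values < lo) | (values > hi)] = FAIL

       # Comparisons with NaN are False, so missing samples are flagged last.
       flags[np.isnan(values)] = MISSING
       return flags


   # Made-up thresholds for illustration only, not recommended settings.
   qc_config = {"fail_span": (-2.5, 40.0), "suspect_span": (0.0, 30.0)}

   temp = np.array([4.1, 4.3, 35.0, np.nan, 99.9, 4.2])
   temp_qc = gross_range_flags(temp, **qc_config)
   print(temp_qc.tolist())  # [1, 1, 3, 9, 4, 1]

The resulting array can be stored as `TEMP_QC` following the pattern under *Including QC Flags in an xarray Dataset*. One option for retaining the configuration is to write `qc_config` into the attributes of the flag variable. Using flag 3 rather than flag 2 for suspect samples follows the suggestion in the FAIR note.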