2. Trimming to Deployed period ============================== This document outlines the **trimming** step in the oceanarray processing workflow. Trimming refers to the process of isolating the valid deployment period from a time series—typically corresponding to the interval between instrument deployment and recovery. This step also applies clock corrections and produces standardized NetCDF files ready for further processing. This step may need to be run several times, adjusting the start and end of the deployment periods based on data inspection. It corresponds to **Stage 2** from RAPID data processing and management, which uses as input the RDB format and outputs (in RDB format) the `*.use` file. Here, we start with the Stage 1 output `*_raw.nc` files and produce `*_use.nc` files. 1. Overview ----------- Raw mooring records often contain extraneous data before deployment or after recovery (e.g., deck recording, values during ascent/descent, post-recovery handling). These segments must be trimmed to retain only the time interval when the instrument was collecting valid in-situ measurements at the nominal depth during deployment. In this stage: - Clock offsets are applied to correct instrument timing - Data is trimmed to deployment/recovery time windows - Unnecessary variables are removed - Metadata is enriched and standardized - Files are converted from `*_raw.nc` to `*_use.nc` format 2. Purpose ---------- - Apply clock corrections to synchronize instrument times - Remove non-oceanographic data before deployment and after recovery - Ensure temporal consistency across instruments on the same mooring - Define the usable deployment interval for each instrument - Standardize file naming and metadata structure 3. Current Implementation (Stage 2) ----------------------------------- The trimming process is implemented in the :mod:`oceanarray.stage2` module, which provides automated clock correction and temporal trimming for mooring datasets. 3.1. Input Requirements ^^^^^^^^^^^^^^^^^^^^^^^ The :class:`oceanarray.stage2.Stage2Processor` class processes Stage 1 output files: - **Raw NetCDF files** (`*_raw.nc`): Standardized files from Stage 1 processing - **YAML configuration**: Mooring metadata including deployment times and clock offsets - **Deployment windows**: Start and end times for valid data periods 3.2. Processing Workflow ^^^^^^^^^^^^^^^^^^^^^^^^ The Stage 2 processing follows these steps: 1. **Configuration Loading**: Read YAML files containing: - Deployment and recovery timestamps - Instrument-specific clock offsets - Mooring location and metadata 2. **Clock Offset Application**: Correct instrument timing by: - Adding clock offset variables to datasets - Shifting time coordinates by specified offset amounts - Preserving original datasets (immutable processing) 3. **Temporal Trimming**: Remove invalid data by: - Trimming to deployment time window (if specified) - Trimming to recovery time window (if specified) - Handling missing or invalid time bounds gracefully 4. **Metadata Enrichment**: Ensure complete metadata by: - Adding missing instrument depth, serial number, type - Preserving existing metadata without overwriting - Maintaining provenance information 5. **Variable Cleanup**: Remove unnecessary variables: - Legacy time variables (e.g., `timeS`) - Derived variables not needed for further processing 6. **NetCDF Output**: Write processed files with: - Updated filename suffix (`*_use.nc`) - Optimized compression and chunking - CF-compliant metadata structure 3.3. Configuration Format ^^^^^^^^^^^^^^^^^^^^^^^^^^ Clock offsets and deployment times are specified in YAML configuration: .. code-block:: yaml name: mooring_name deployment_time: '2018-08-12T08:00:00' recovery_time: '2018-08-26T20:47:24' instruments: - instrument: microcat serial: 7518 depth: 100 clock_offset: 300 # seconds 3.4. Usage Example ^^^^^^^^^^^^^^^^^^ .. code-block:: python from oceanarray.stage2 import Stage2Processor, process_multiple_moorings_stage2 # Process a single mooring processor = Stage2Processor('/path/to/data/') success = processor.process_mooring('mooring_name') # Process multiple moorings moorings = ['mooring1', 'mooring2', 'mooring3'] results = process_multiple_moorings_stage2(moorings, '/path/to/data/') 4. Output Format ---------------- The processed output includes: - **Trimmed NetCDF files** (`*_use.nc`) with: - Time coordinates corrected for clock offsets - Data restricted to deployment period - Standardized metadata and variable structure - Optimized storage format - **Processing logs** with detailed information about: - Clock offsets applied - Trimming operations performed - Data volume changes - Error conditions encountered Example output structure: .. code-block:: python Dimensions: (time: 8640) # Trimmed from original 124619 Coordinates: * time (time) datetime64[ns] 2018-08-12T08:05:00 ... 2018-08-26T19:55:00 Data variables: temperature (time) float32 ... salinity (time) float32 ... pressure (time) float32 ... serial_number int64 7518 InstrDepth int64 100 instrument `__ This script performed similar functions to the current Python implementation: .. literalinclude:: ../_static/code/microcat_raw2use_003.m :language: matlab :lines: 1-40 :linenos: :caption: Excerpt from `microcat_raw2use_003.m` The modern Python implementation in :mod:`oceanarray.stage2` provides equivalent functionality with improved: - **Scalability**: Process multiple moorings and years of data efficiently - **Reproducibility**: Automated processing with comprehensive logging - **Flexibility**: Easy modification of deployment windows and clock offsets - **Integration**: Seamless connection to subsequent processing stages 9. Implementation Notes ----------------------- - **Time Handling**: All times are processed as UTC datetime64 objects for consistency - **Clock Corrections**: Applied as integer second offsets to maintain precision - **Immutable Processing**: Original datasets are preserved; all operations return new objects - **Flexible Trimming**: Supports trimming by deployment time, recovery time, or both - **Batch Processing**: Efficient processing of multiple instruments and moorings 10. FAIR Considerations ----------------------- - **Findable**: Standardized file naming and metadata structure - **Accessible**: NetCDF format with CF conventions for broad compatibility - **Interoperable**: Consistent data structure across instruments and deployments - **Reusable**: Comprehensive metadata and processing provenance Trimmed intervals and clock corrections are documented transparently in dataset attributes and processing logs to maintain full provenance. See also: :doc:`../oceanarray`, :doc:`standardisation`, :doc:`filtering`