2. Trimming to Deployed period
This document outlines the trimming step in the oceanarray processing workflow. Trimming refers to the process of isolating the valid deployment period from a time series—typically corresponding to the interval between instrument deployment and recovery. This step also applies clock corrections and produces standardized NetCDF files ready for further processing.
This step may need to be run several times, adjusting the start and end of the deployment periods based on data inspection.
It corresponds to Stage 2 of RAPID data processing and management, which takes RDB-format files as input and outputs *.use files (also in RDB format). Here, we start with the Stage 1 output *_raw.nc files and produce *_use.nc files.
1. Overview
Raw mooring records often contain extraneous data before deployment or after recovery (e.g., deck recording, values during ascent/descent, post-recovery handling). These segments must be trimmed to retain only the time interval when the instrument was collecting valid in-situ measurements at the nominal depth during deployment. In this stage:
Clock offsets are applied to correct instrument timing
Data is trimmed to deployment/recovery time windows
Unnecessary variables are removed
Metadata is enriched and standardized
Files are converted from *_raw.nc to *_use.nc format
2. Purpose
Apply clock corrections to synchronize instrument times
Remove non-oceanographic data before deployment and after recovery
Ensure temporal consistency across instruments on the same mooring
Define the usable deployment interval for each instrument
Standardize file naming and metadata structure
3. Current Implementation (Stage 2)
The trimming process is implemented in the oceanarray.stage2 module, which provides automated clock correction and temporal trimming for mooring datasets.
3.1. Input Requirements
The oceanarray.stage2.Stage2Processor class processes Stage 1 output files:
Raw NetCDF files (*_raw.nc): Standardized files from Stage 1 processing
YAML configuration: Mooring metadata including deployment times and clock offsets
Deployment windows: Start and end times for valid data periods
3.2. Processing Workflow
The Stage 2 processing follows these steps:
1. Configuration Loading: Read YAML files containing:
   - Deployment and recovery timestamps
   - Instrument-specific clock offsets
   - Mooring location and metadata
2. Clock Offset Application (sketched in code after this list): Correct instrument timing by:
   - Adding clock offset variables to datasets
   - Shifting time coordinates by the specified offset amounts
   - Preserving original datasets (immutable processing)
3. Temporal Trimming: Remove invalid data by:
   - Trimming to the deployment time window (if specified)
   - Trimming to the recovery time window (if specified)
   - Handling missing or invalid time bounds gracefully
4. Metadata Enrichment: Ensure complete metadata by:
   - Adding missing instrument depth, serial number, and type
   - Preserving existing metadata without overwriting
   - Maintaining provenance information
5. Variable Cleanup: Remove unnecessary variables:
   - Legacy time variables (e.g., timeS)
   - Derived variables not needed for further processing
6. NetCDF Output: Write processed files with:
   - Updated filename suffix (*_use.nc)
   - Optimized compression and chunking
   - CF-compliant metadata structure
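The core of steps 2 and 3 can be expressed with plain xarray operations. The following is a minimal sketch, not the oceanarray.stage2 internals; the helper names apply_clock_offset and trim_to_deployment and the synthetic dataset are illustrative:

import numpy as np
import xarray as xr

# Synthetic stand-in for a *_raw.nc record (10-minute sampling)
time = np.arange(
    np.datetime64("2018-08-11T18:00"), np.datetime64("2018-08-27T06:00"),
    np.timedelta64(10, "m"),
).astype("datetime64[ns]")
ds_raw = xr.Dataset(
    {"temperature": ("time", np.full(time.size, 20.0, dtype="float32"))},
    coords={"time": time},
)

def apply_clock_offset(ds, offset_seconds):
    """Shift the time coordinate; returns a new dataset (input preserved)."""
    out = ds.assign_coords(time=ds.time + np.timedelta64(offset_seconds, "s"))
    return out.assign(clock_offset=offset_seconds)

def trim_to_deployment(ds, deployment=None, recovery=None):
    """Keep samples within [deployment, recovery]; either bound may be None."""
    return ds.sel(time=slice(deployment, recovery))

ds_use = trim_to_deployment(
    apply_clock_offset(ds_raw, 300),
    deployment="2018-08-12T08:00:00",
    recovery="2018-08-26T20:47:24",
)

Passing None for either bound leaves that end of the record untrimmed, which matches the "if specified" behaviour described above.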
3.3. Configuration Format
Clock offsets and deployment times are specified in YAML configuration:
name: mooring_name
deployment_time: '2018-08-12T08:00:00'
recovery_time: '2018-08-26T20:47:24'
instruments:
  - instrument: microcat
    serial: 7518
    depth: 100
    clock_offset: 300  # seconds
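A configuration in this form can be read with PyYAML. A minimal sketch; the filename is illustrative:

import yaml

with open("mooring_name.yaml") as f:  # illustrative filename
    config = yaml.safe_load(f)

deployment = config["deployment_time"]  # '2018-08-12T08:00:00'
for inst in config["instruments"]:
    print(inst["instrument"], inst["serial"], inst.get("clock_offset", 0))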
3.4. Usage Example
from oceanarray.stage2 import Stage2Processor, process_multiple_moorings_stage2
# Process a single mooring
processor = Stage2Processor('/path/to/data/')
success = processor.process_mooring('mooring_name')
# Process multiple moorings
moorings = ['mooring1', 'mooring2', 'mooring3']
results = process_multiple_moorings_stage2(moorings, '/path/to/data/')
4. Output Format
The processed output includes:
Trimmed NetCDF files (*_use.nc) with:
- Time coordinates corrected for clock offsets
- Data restricted to the deployment period
- Standardized metadata and variable structure
- Optimized storage format

Processing logs with detailed information about:
- Clock offsets applied
- Trimming operations performed
- Data volume changes
- Error conditions encountered
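The "optimized compression and chunking" corresponds to an encoding mapping passed to xarray's to_netcdf. A minimal sketch, continuing the ds_use example from Section 3.2; the filename and settings are illustrative:

# Per-variable compression and chunking (values are illustrative)
encoding = {
    var: {"zlib": True, "complevel": 4, "chunksizes": (ds_use.sizes["time"],)}
    for var in ds_use.data_vars if "time" in ds_use[var].dims
}
ds_use.to_netcdf("mooring_name_microcat_7518_use.nc", encoding=encoding)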
Example output structure:
<xarray.Dataset>
Dimensions:        (time: 8640)   # trimmed from the original 124619
Coordinates:
  * time           (time) datetime64[ns] 2018-08-12T08:05:00 ... 2018-08-26T19:55:00
Data variables:
    temperature    (time) float32 ...
    salinity       (time) float32 ...
    pressure       (time) float32 ...
    serial_number  int64 7518
    InstrDepth     int64 100
    instrument     <U8 'microcat'
    clock_offset   int64 300
Attributes:
    mooring_name:     test_mooring
    deployment_time:  2018-08-12T08:00:00
    recovery_time:    2018-08-26T20:47:24
5. Quality Control and Error Handling
Stage 2 processing includes comprehensive quality control:
Time Range Validation: Ensures deployment/recovery times are reasonable
Clock Offset Verification: Logs all timing corrections applied
Data Completeness Checks: Warns when trimming results in empty datasets
File Integrity: Validates input files before processing
Graceful Degradation: Continues processing other instruments if one fails
All processing activities are logged with timestamps for audit trails and debugging.
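The completeness check, for instance, can be as small as the following sketch (the logger name and helper are illustrative, not the module's API):

import logging

log = logging.getLogger("oceanarray.stage2")  # illustrative logger name

def check_not_empty(ds, mooring, serial):
    """Warn when trimming leaves no samples in the deployment window."""
    if ds.sizes.get("time", 0) == 0:
        log.warning("%s / serial %s: trimming produced an empty dataset", mooring, serial)
        return False
    return True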
6. Integration with Processing Chain
Stage 2 processed files serve as input to subsequent processing steps:
Stage 3: Further quality control and calibration applications
Later stages: filtering, gridding, stitching for array products
The consistent structure and timing corrections applied during Stage 2 ensure that downstream processing tools can operate reliably across different instrument types and deployments.
7. Historical Context: RAPID Processing
This trimming step evolved from the RAPID programme’s Stage 2 processing, which converted RDB format files from *.raw to *.use format. The modern implementation provides equivalent functionality with several improvements:
Automated Processing: Batch processing of multiple instruments and moorings
Enhanced Metadata: Comprehensive provenance and quality control information
Flexible Configuration: YAML-based configuration for easy modification
Error Recovery: Robust handling of missing files and invalid configurations
Modern Formats: NetCDF output with CF conventions for interoperability
8. Legacy Processing Scripts
The original RAPID processing chain used MATLAB scripts for trimming operations. The excerpt below, from a microcat preprocessing script, performed similar functions to the current Python implementation:
clear
% basic preprocessing for microcat data
%
% features
% 1. eliminate launching and recovery period
% 2. save data to rodb file
% 3. create data overview sheet
%
% uses timeaxis.m, auto_filt.m, julian.m

% 11.01.01 Kanzow
% 13.08.02 Kanzow : debugged
%
%

% --- get mooring information from infofile


moor = 'ebm1_2_200736'; % Mooring name

operator = 'wallace';

plot_interval = [2007 11 07 0;  % start time of time axis on plot
                 2008 11 20 0]; % end time of time axis on plot

mc_id = [333 337]; % microcat id

%-- path

% -- set path for data input and output

% inpath   = ['/data/rapid/cd170/moorings/',moor,'/microcat/'];
% outpath  = ['/data/rapid/cd170/moorings/',moor,'/microcat/'];
% infofile = ['/data/rapid/cd170/moorings/',moor,'/',moor,'info.dat'];

inpath   = ['~/Data/rpdmoc/rapid/data/moor/proc/',moor,'/microcat/']; % C Wallace changed paths for cruise d334
outpath  = ['~/Data/rpdmoc/rapid/data/moor/proc/',moor,'/microcat/'];
infofile = ['~/Data/rpdmoc/rapid/data/moor/proc/',moor,'/',moor,'info.dat'];
The modern Python implementation in oceanarray.stage2 provides equivalent functionality, with improvements in:
Scalability: Process multiple moorings and years of data efficiently
Reproducibility: Automated processing with comprehensive logging
Flexibility: Easy modification of deployment windows and clock offsets
Integration: Seamless connection to subsequent processing stages
9. Implementation Notes
Time Handling: All times are processed as UTC datetime64 objects for consistency
Clock Corrections: Applied as integer second offsets to maintain precision
Immutable Processing: Original datasets are preserved; all operations return new objects
Flexible Trimming: Supports trimming by deployment time, recovery time, or both
Batch Processing: Efficient processing of multiple instruments and moorings
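The immutability note reflects xarray semantics: operations return new objects rather than mutating their inputs, which is what makes re-running Stage 2 with adjusted windows safe. A small illustrative check:

import numpy as np
import xarray as xr

time = np.array(["2018-08-12T00:00", "2018-08-12T01:00"], dtype="datetime64[ns]")
ds = xr.Dataset({"temperature": ("time", np.float32([20.1, 20.3]))}, coords={"time": time})

# Shifting the clock returns a new dataset; the input is left untouched
shifted = ds.assign_coords(time=ds.time + np.timedelta64(300, "s"))
assert bool(ds.time[0] != shifted.time[0])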
10. FAIR Considerations
Findable: Standardized file naming and metadata structure
Accessible: NetCDF format with CF conventions for broad compatibility
Interoperable: Consistent data structure across instruments and deployments
Reusable: Comprehensive metadata and processing provenance
Trimmed intervals and clock corrections are documented transparently in dataset attributes and processing logs to maintain full provenance.
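For example, the applied corrections can be recorded in global attributes when the *_use.nc file is written (attribute names here are illustrative, not a fixed schema):

from datetime import datetime, timezone

# Self-describing provenance on the trimmed dataset (attribute names illustrative)
ds_use.attrs.update({
    "clock_offset_seconds": 300,
    "trim_start": "2018-08-12T08:00:00",
    "trim_end": "2018-08-26T20:47:24",
    "history": f"{datetime.now(timezone.utc).isoformat()} stage2: clock offset applied, trimmed to deployment",
})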
See also: oceanarray API, 1. Standardisation (Internally-consistent format), filtering