Development Roadmap
This document outlines the development roadmap for the OceanArray processing framework, focusing on features documented in the processing workflow that need implementation, technical improvements, and future functionality priorities.
Status Overview
The OceanArray framework currently provides a solid foundation for oceanographic data processing, but several key components documented in the processing framework require implementation or completion.
Current Implementation Status:
- ✅ Implemented & Working
  - Stage 1: Standardisation (stage1.py)
  - Stage 2: Trimming & Clock Corrections (stage2.py)
  - Step 1: Time Gridding (time_gridding.py)
  - Clock Offset Analysis (clock_offset.py)
  - Data Readers (readers.py)
  - Basic QC visualization (plotters.py)
  - Configurable Logging System
- 🟡 Partially Implemented
  - Step 2: Vertical Gridding - physics-based interpolation exists (rapid_interp.py)
- ❌ Documented but Not Implemented
  - Stage 3: Automatic Quality Control using QARTOD standards
  - Stage 4: Calibration Information Integration (microcat focus)
  - Stage 5: Conversion to OceanSites format
  - Step 3: Concatenation of deployments
  - Multi-site merging for boundary profiles
Priority 1: Core Missing Features
1. Stage 3: Automatic Quality Control using QARTOD Standards
Documentation: docs/source/methods/auto_qc.rst
Purpose: Apply systematic quality control checks following QARTOD (Quality Assurance/Quality Control of Real-Time Oceanographic Data) standards to identify and flag suspect data.
Current State: Basic QC functions exist in tools.py:run_qc(), with salinity outlier detection and temporal spike detection, and visualization in plotters.py.
Missing Implementation:
- Complete stage3.py module implementing the full QARTOD test suite
- Integration with the ioos_qc package for standardized tests
- QARTOD-compliant flag value handling (0,1,2,3,4,7,8,9)
- Configurable QC test parameters via YAML
- Automated QC report generation with summary statistics
- QC metadata preservation in NetCDF output
QARTOD Tests to Implement:
- Gross range test (min/max bounds)
- Climatological test (seasonal expectations)
- Spike test (temporal derivatives)
- Rate of change test
- Flat line test (stuck values)
- Multi-variate tests (T-S relationships)
- Neighbor test (spatial consistency)
Estimated Effort: 2-3 weeks
Implementation Plan:
- Create oceanarray/stage3.py module with QCProcessor class
- Design YAML-based QC configuration system for test parameters
- Integrate ioos_qc package for standardized QARTOD implementations
- Implement comprehensive QARTOD flag handling and metadata
- Add QC validation and reporting with summary statistics
- Integrate with Stage 2 → Stage 3 → Stage 4 pipeline
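As a rough illustration of the flag logic behind two of the tests listed above, the sketch below implements a gross range test and a simple spike test in plain Python. It is not the ioos_qc implementation and not existing OceanArray code: the function names, the thresholds, and the spike definition (deviation from the mean of the two neighbouring samples) are assumptions made for illustration. Only the flag values 1 (good), 3 (suspect), 4 (fail) and 9 (missing) are used.

```python
import math

# Flag values used in this sketch (a subset of the flag set listed above).
GOOD, SUSPECT, FAIL, MISSING = 1, 3, 4, 9


def gross_range_flags(values, fail_span, suspect_span):
    """Flag each value against hard (fail) and soft (suspect) bounds."""
    flags = []
    for v in values:
        if v is None or math.isnan(v):
            flags.append(MISSING)
        elif not (fail_span[0] <= v <= fail_span[1]):
            flags.append(FAIL)
        elif not (suspect_span[0] <= v <= suspect_span[1]):
            flags.append(SUSPECT)
        else:
            flags.append(GOOD)
    return flags


def spike_flags(values, suspect_threshold, fail_threshold):
    """Flag points that deviate from the mean of their two neighbours.

    End points have no two-sided neighbourhood and are left as GOOD here.
    """
    flags = [GOOD] * len(values)
    for i in range(1, len(values) - 1):
        reference = 0.5 * (values[i - 1] + values[i + 1])
        deviation = abs(values[i] - reference)
        if deviation > fail_threshold:
            flags[i] = FAIL
        elif deviation > suspect_threshold:
            flags[i] = SUSPECT
    return flags


# Illustrative values only (hypothetical temperature record).
temperature = [10.1, 10.2, 35.0, 10.3, float("nan"), -5.0]
range_flags = gross_range_flags(
    temperature, fail_span=(-2.5, 40.0), suspect_span=(0.0, 30.0)
)
spiky = [10.0, 10.1, 15.0, 10.2, 10.3]
spike_result = spike_flags(spiky, suspect_threshold=2.5, fail_threshold=4.0)
```

In a real stage3.py the thresholds would presumably come from the YAML QC configuration rather than being passed inline, and a large spike can also raise the deviation of its immediate neighbours, so threshold choice matters.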
2. Stage 4: Calibration Information Integration (Microcat Focus)
Documentation: docs/source/methods/calibration.rst
Purpose: Apply instrument calibration corrections, with initial focus on Sea-Bird MicroCAT conductivity-temperature sensors, incorporating pre- and post-deployment calibration information.
Current State: Basic microcat calibration functions exist in process_rodb.py for legacy RODB workflows.
Missing Implementation:
- Complete stage4.py module for modern CF-compliant calibration workflow
- Integration with Sea-Bird calibration certificate parsing
- Pre/post-deployment calibration comparison and drift analysis
- Conductivity cell thermal mass corrections
- Calibration uncertainty propagation through processing chain
- Calibration metadata preservation in NetCDF output
- Support for multiple calibration coefficient sets
Calibration Features to Implement:
- Sea-Bird calibration certificate parsing (.xmlcon, .cal files)
- Conductivity calibration equation application (frequency-based)
- Temperature calibration with ITS-90 conversion
- Pressure sensor calibration and atmospheric correction
- Thermal mass correction for conductivity measurements
- Calibration drift analysis between pre/post deployments
- Uncertainty quantification and propagation
Estimated Effort: 2-3 weeks
Implementation Plan:
- Create oceanarray/stage4.py module with CalibrationProcessor class
- Design calibration configuration system for coefficient management
- Implement Sea-Bird calibration certificate parsing
- Add thermal mass correction algorithms
- Create pre/post calibration comparison tools
- Add uncertainty propagation and metadata preservation
- Integrate with Stage 3 → Stage 4 → Stage 5 pipeline
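One way the pre/post calibration comparison could feed a correction is sketched below. It assumes the sensor offset drifts linearly in time between the pre- and post-deployment calibration checks; that assumption, the function name, and the sample numbers are all hypothetical and are not taken from process_rodb.py.

```python
def linear_drift_correction(times, values, t_pre, offset_pre, t_post, offset_post):
    """Subtract a sensor offset that drifts linearly from offset_pre to offset_post.

    times, t_pre and t_post share one numeric unit (e.g. days since deployment).
    Offsets are defined as (instrument reading - reference) at each calibration.
    """
    if t_post <= t_pre:
        raise ValueError("t_post must be later than t_pre")
    corrected = []
    for t, v in zip(times, values):
        # Fraction of the deployment elapsed at time t (0 at pre, 1 at post).
        fraction = (t - t_pre) / (t_post - t_pre)
        offset = offset_pre + fraction * (offset_post - offset_pre)
        corrected.append(v - offset)
    return corrected


# Illustrative values only: a reading that drifts upward by 0.020 over 100 days.
times = [0.0, 50.0, 100.0]
conductivity = [35.000, 35.010, 35.020]
corrected = linear_drift_correction(
    times, conductivity, t_pre=0.0, offset_pre=0.000, t_post=100.0, offset_post=0.020
)
```

Here all three corrected values come out at about 35.000, because the drift in the readings matches the assumed offset exactly. Other drift models (for example a step change) would need a different interpolation.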
3. Stage 5: OceanSites Format Conversion
Documentation: docs/source/methods/conversion.rst
Purpose: Convert processed and calibrated data to OceanSites format specification for community data sharing and archival.
Current State: Some format conversion exists in convertOS.py, but it does not yet fully comply with the OceanSites specification.
Missing Implementation:
- Complete stage5.py module for OceanSites format conversion
- Global attribute validation and enforcement per OceanSites standards
- CF-convention compliance checking and validation
- Variable attribute standardization according to OceanSites vocabulary
- Comprehensive metadata template system
- Quality flag conversion to OceanSites standards
Estimated Effort: 2-3 weeks
Implementation Plan:
- Create oceanarray/stage5.py module with OceanSitesProcessor class
- Implement complete OceanSites format validation
- Add CF-compliance checking and enforcement
- Design metadata template system for OceanSites requirements
- Add quality flag conversion from QARTOD to OceanSites standards
- Integrate with Stage 4 → Stage 5 pipeline
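The quality flag conversion step could be as small as a lookup table. The sketch below shows the shape of such a conversion. The specific flag pairs are placeholders chosen for illustration, not a mapping taken from either specification or from convertOS.py, and would need to be checked against both standards before use.

```python
# Hypothetical mapping from QARTOD-style flags to OceanSites-style flags.
# These pairs are assumptions for illustration only.
QARTOD_TO_OCEANSITES = {
    1: 1,  # pass -> good data
    2: 0,  # not evaluated -> no QC performed
    3: 2,  # suspect -> probably good data (assumed; 3 is another candidate)
    4: 4,  # fail -> bad data
    9: 9,  # missing -> missing value
}


def convert_flags(flags, mapping=QARTOD_TO_OCEANSITES):
    """Convert a sequence of flags, refusing any value without a mapping."""
    converted = []
    for flag in flags:
        if flag not in mapping:
            raise ValueError(f"no mapping defined for flag {flag}")
        converted.append(mapping[flag])
    return converted


oceansites_flags = convert_flags([1, 2, 3, 4, 9])
```

Raising on an unmapped flag, rather than silently substituting a default, keeps an incomplete table from producing plausible-looking but wrong output files.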
4. Step 3: Deployment Concatenation
Documentation: docs/source/methods/concatenation.rst
Current State: No implementation found.
Missing Implementation:
- Multi-deployment time series merging
- Gap handling and interpolation
- Consistent time-pressure grid creation
- Metadata preservation across deployments
- Quality flag propagation
Estimated Effort: 1-2 weeks
Implementation Plan:
- Create oceanarray/concatenation.py module
- Design deployment merging algorithm
- Implement gap filling strategies
- Add time-pressure grid standardization
- Create validation and QC checks
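A minimal sketch of the merging step is given below. It assumes each deployment is already sampled on the target grid step, and it uses integer time stamps (for example hours since a reference date) so that dictionary lookups are exact. The function name and the overlap rule are assumptions for illustration only.

```python
import math


def concatenate_deployments(deployments, t_start, t_end, dt):
    """Place samples from several deployments on one regular time grid.

    deployments: list of dicts mapping integer time -> value.
    Grid points not covered by any deployment are filled with NaN.
    Where deployments overlap, the later deployment in the list wins.
    """
    n = (t_end - t_start) // dt + 1
    grid = [t_start + i * dt for i in range(n)]
    merged = {}
    for deployment in deployments:
        merged.update(deployment)
    values = [merged.get(t, math.nan) for t in grid]
    return grid, values


# Illustrative values only: two deployments separated by a two-step gap.
first = {0: 10.0, 1: 10.1, 2: 10.2}
second = {5: 11.0, 6: 11.1}
grid, values = concatenate_deployments([first, second], t_start=0, t_end=6, dt=1)
```

The gap between deployments is left as NaN here. Whether to interpolate across it is the gap-filling decision listed in the plan and is deliberately not made in this sketch.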
5. Enhanced Visualization System
Current State: Basic plotting functions exist in plotters.py.
Missing Implementation:
- Interactive plotting capabilities
- Multi-instrument comparison plots
- Time series overview with zoom functionality
- QC flag visualization overlays
- Deployment boundary and gap visualization
- Statistical summary plots
- Customizable plot templates
Estimated Effort: 2-3 weeks
Implementation Plan:
- Expand plotters.py with interactive features
- Add multi-instrument comparison tools
- Implement QC flag overlay visualization
- Create statistical summary plots
- Add customizable plotting templates
- Integrate with processing pipeline for automatic reporting
6. Intelligent Metadata Fallback System
Current State: Metadata extraction relies on explicit YAML configuration.
Missing Implementation:
- Filename pattern parsing for instrument type and serial number
- Fallback metadata extraction when YAML is incomplete
- Intelligent instrument identification from file patterns
- Automatic serial number detection from filenames
- Validation and warning system for inferred metadata
Estimated Effort: 1 week
Implementation Plan:
- Create filename parsing utilities in utilities.py
- Design instrument type detection patterns
- Add serial number extraction from common filename formats
- Implement metadata validation and fallback logic
- Add logging and warnings for inferred metadata
- Integrate with Stage 1 processing pipeline
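The fallback logic could look roughly like the sketch below: values from the YAML take precedence, and anything missing is inferred from the filename and logged as such. The filename patterns, the function name, and the `inferred_keys` field are hypothetical. The real naming conventions would have to come from the project's data files.

```python
import logging
import re

logger = logging.getLogger(__name__)

# Hypothetical filename patterns, for illustration only.
PATTERNS = [
    ("microcat", re.compile(r"(?:sbe37|microcat)[_-]?(?P<serial>\d{4,6})", re.IGNORECASE)),
    ("rbr", re.compile(r"rbr[_-]?(?P<serial>\d{4,6})", re.IGNORECASE)),
]


def infer_metadata(filename, yaml_metadata=None):
    """Return metadata, falling back to filename inference for missing keys."""
    metadata = dict(yaml_metadata or {})
    inferred = {}
    for instrument, pattern in PATTERNS:
        match = pattern.search(filename)
        if match:
            inferred = {"instrument": instrument, "serial": match.group("serial")}
            break
    for key, value in inferred.items():
        if key not in metadata:
            metadata[key] = value
            # Record which keys were guessed so later stages can report them.
            metadata.setdefault("inferred_keys", []).append(key)
            logger.warning("Inferred %s=%r from filename %s", key, value, filename)
    return metadata


from_filename = infer_metadata("SBE37_7518_2018.cnv")
with_yaml = infer_metadata("SBE37_7518_2018.cnv", {"serial": "9999"})
```

In the second call the YAML serial number is kept and only the instrument type is inferred, which is the precedence a fallback system would normally want.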
7. Comprehensive Mooring Processing Reports
Current State: No automated reporting system exists.
Missing Implementation:
- HTML report generation for each mooring
- Processing completeness analysis (YAML vs actual files)
- Missing file detection and reporting
- Data coverage visualization and statistics
- Automated figure generation for all available variables
- Processing timeline and status summaries
- Integration with existing processing pipeline
Estimated Effort: 2-3 weeks
Implementation Plan:
- Create oceanarray/reporting.py module with ReportGenerator class
- Design HTML template system for mooring reports
- Implement file completeness checking (YAML vs *_use.nc vs raw files)
- Add automated visualization generation for all data variables
- Create processing status and timeline summaries
- Integrate with processing pipeline for automatic report generation
- Design directory structure: moor/proc/{mooring}/processing/{report,logs,figures}/
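The completeness check in the plan is essentially a set comparison between what the YAML lists and what exists on disk. The sketch below assumes processed files are named `<prefix>_<serial>_use.nc`; that naming rule and the function name are hypothetical.

```python
def check_completeness(expected_serials, filenames, suffix="_use.nc"):
    """Compare serial numbers listed in the YAML with processed filenames.

    expected_serials: serial numbers from the mooring YAML, as strings.
    filenames: names of files found in the processing directory.
    Assumes names of the form '<prefix>_<serial>_use.nc' (hypothetical).
    """
    found = set()
    for name in filenames:
        if name.endswith(suffix):
            stem = name[: -len(suffix)]
            # The serial is taken as the last underscore-separated token.
            found.add(stem.rsplit("_", 1)[-1])
    expected = set(expected_serials)
    return {
        "present": sorted(expected & found),
        "missing": sorted(expected - found),
        "unexpected": sorted(found - expected),
    }


report = check_completeness(
    expected_serials=["7518", "9999"],
    filenames=["moor1_7518_use.nc", "moor1_1234_use.nc", "notes.txt"],
)
```

The filename list could come from something like `Path(proc_dir).glob("*_use.nc")`, and the resulting dictionary maps naturally onto a "missing files" table in an HTML report.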
Priority 2: Advanced Processing Features
8. Multi-site Merging for Boundary Profiles
Documentation: docs/source/methods/multisite_merging.rst
Current State: No implementation found.
Missing Implementation:
- Cross-site data integration
- Boundary profile construction
- Static stability checking
- Site-specific weighting strategies
- Spatial interpolation methods
Estimated Effort: 3-4 weeks
Implementation Plan:
- Create oceanarray/multisite_merging.py module
- Implement spatial merging algorithms
- Add static stability validation
- Design site weighting strategies
- Create boundary profile outputs
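Static stability validation can be approximated by checking that density does not decrease with increasing pressure in the merged profile. The sketch below assumes the density values are potential densities referenced to a common pressure, computed elsewhere (for example with gsw). The function name and the tolerance parameter are assumptions for illustration.

```python
def find_density_inversions(pressure, density, tolerance=0.0):
    """Return indices i where density[i + 1] < density[i] - tolerance.

    pressure must be strictly increasing (shallowest level first).
    density is assumed to be potential density on a common reference pressure.
    """
    if len(pressure) != len(density):
        raise ValueError("pressure and density must have the same length")
    if any(p2 <= p1 for p1, p2 in zip(pressure, pressure[1:])):
        raise ValueError("pressure must be strictly increasing")
    return [
        i
        for i in range(len(density) - 1)
        if density[i + 1] < density[i] - tolerance
    ]


# Illustrative values only: one inversion between the second and third levels.
pressure = [100.0, 200.0, 300.0, 400.0]
sigma = [26.5, 26.8, 26.7, 27.0]
inversions = find_density_inversions(pressure, sigma)
```

A nonzero tolerance lets small inversions within measurement noise pass, which may matter when profiles from different sites are stitched together.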
9. Complete Vertical Gridding Integration
Documentation: docs/source/methods/vertical_gridding.rst
Current State: Physics-based interpolation exists in rapid_interp.py but is not yet integrated with the main processing pipeline.
Missing Implementation:
- Integration with main processing pipeline
- Climatology data management
- Configuration for different interpolation strategies
- Gap filling and extrapolation options
- Validation against known profiles
Estimated Effort: 1-2 weeks
Implementation Plan:
- Refactor rapid_interp.py for general use
- Create configuration system for interpolation parameters
- Add climatology data handling
- Integrate with mooring processing workflow
- Add validation and diagnostic tools
Priority 3: Enhanced Calibration System
10. Comprehensive Calibration Framework
Documentation: docs/source/methods/calibration.rst
Current State: Basic microcat calibration exists in process_rodb.py.
Missing Implementation:
- Multi-instrument calibration support (not just microcat)
- Structured calibration metadata handling
- Pre/post-cruise comparison workflows
- Calibration uncertainty propagation
- Automated calibration log parsing
Estimated Effort: 2-3 weeks
Implementation Plan:
- Expand process_rodb.py calibration functions
- Create calibration configuration system
- Add uncertainty propagation
- Design calibration workflow automation
- Add comprehensive logging and provenance
Priority 4: System Architecture Improvements
11. Methods Module Organization
Current State: Processing functions scattered across multiple modules.
Improvement: Create an organized methods/ directory structure:
oceanarray/methods/
├── __init__.py
├── auto_qc.py
├── calibration.py
├── concatenation.py
├── conversion.py
├── multisite_merging.py
└── vertical_gridding.py
Estimated Effort: 1 week
12. Enhanced Configuration System
Current State: Basic logging configuration exists.
Missing Features:
- Global processing configuration
- Site-specific parameter management
- Processing pipeline configuration
- Validation and schema checking
Estimated Effort: 1-2 weeks
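To make "validation and schema checking" concrete, the sketch below validates a configuration dictionary (as it might look after loading a YAML file) against a simple key-to-type schema. The schema keys, the function name, and the example values are hypothetical. A dedicated schema library could replace this hand-rolled check.

```python
def validate_config(config, schema):
    """Return a list of error strings; an empty list means the config is valid.

    schema maps each required key to its expected Python type.
    """
    errors = []
    for key, expected_type in schema.items():
        if key not in config:
            errors.append(f"missing required key: {key}")
        elif not isinstance(config[key], expected_type):
            errors.append(
                f"{key}: expected {expected_type.__name__}, "
                f"got {type(config[key]).__name__}"
            )
    for key in config:
        if key not in schema:
            errors.append(f"unknown key: {key}")
    return errors


# Hypothetical schema and configuration, for illustration only.
SCHEMA = {"site": str, "time_grid_minutes": int, "qc_tests": list}
config = {"site": "example_site", "time_grid_minutes": "60", "extra": 1}
errors = validate_config(config, SCHEMA)
```

Collecting all errors in one pass, instead of stopping at the first, gives users a complete list of problems to fix in a single edit of their configuration file.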
13. Test Coverage Improvement
Current State: Basic tests exist in the tests/ directory.
Missing Features:
- End-to-end pipeline testing
- Method-specific unit tests
- Configuration validation tests
- Performance benchmarking
Estimated Effort: 2-3 weeks (ongoing)
Technical Debt Note: This is accumulated testing debt: the functionality exists but lacks comprehensive test coverage, which makes maintenance and refactoring riskier.
Priority 5: Advanced Analysis Features
14. Data Storage Efficiency Improvements
Current State: Standard NetCDF output with basic compression.
Missing Implementation:
- Optimized chunking strategies
- Advanced compression algorithms
- Memory-efficient processing for large datasets
- Streaming processing capabilities
- Storage format optimization
Estimated Effort: 2-3 weeks
Implementation Plan:
- Profile current storage bottlenecks
- Implement optimized chunking strategies
- Add advanced compression options
- Create memory-efficient processing pipelines
- Add storage format benchmarking
Development Milestones
Phase 1: Core Framework Completion (Months 1-3)
Improve test coverage (address technical debt)
Implement intelligent metadata fallback system
Enhance visualization system
Implement comprehensive mooring processing reports
Complete auto QC framework
Implement OceanSites format conversion
Add deployment concatenation
Phase 2: Advanced Processing (Months 4-6)
Organize methods module structure
Enhance configuration system
Implement multi-site merging
Complete vertical gridding integration
Enhance calibration framework
Phase 3: System Optimization (Months 7-9)
Improve data storage efficiency
Performance optimization and profiling
Create comprehensive documentation
User experience improvements
Technical Debt and Maintenance
Ongoing Improvements
Code Quality:
- Add type hints throughout codebase
- Improve error handling and validation
- Standardize documentation strings
- Enhance logging throughout pipeline
Performance:
- Profile processing bottlenecks
- Optimize memory usage for large datasets
- Add parallel processing capabilities
- Implement caching strategies
User Experience:
- Create command-line interface
- Add progress indicators for long operations
- Improve error messages and debugging
- Create tutorial notebooks
Documentation:
- Complete API documentation
- Add processing examples
- Create troubleshooting guides
- Document best practices
Dependencies and External Integration
Key External Dependencies
- ioos_qc: For comprehensive QC implementation
- gsw (TEOS-10): For seawater property calculations
- verticalnn: For physics-based vertical interpolation
- xarray & netCDF4: Core data handling
- dask: For large dataset processing (future)
Integration Opportunities
Pangaea: Data publication workflows
OceanSites: Enhanced format compliance
ERDDAP: Direct data ingestion capabilities
Community and Collaboration
Contribution Priorities
Method validation with known datasets
Cross-array compatibility testing
Performance benchmarking
User interface development
Processing workflow documentation
This roadmap provides a structured path toward completing the OceanArray processing framework while maintaining focus on documented requirements and practical implementation priorities.