Legacy Modules

This section documents the legacy RODB/RAPID format processing modules that are maintained for backward compatibility with existing datasets.

Warning

These modules are deprecated and not recommended for new projects.

For new processing workflows, use the modern CF-compliant pipeline:

1. Standardisation (Internally-consistent format) (Stage 1)
2. Trimming to Deployed period (Stage 2)
3. Step 1: Time Gridding and Optional Filtering (Time Gridding)

Overview

The legacy modules are located in oceanarray.legacy and provide processing functions for RAPID/RODB format oceanographic data. These were the original processing functions developed for the RAPID-MOC array but have been superseded by the modern CF-compliant workflow.

Legacy Processing Workflow

The legacy workflow follows this pattern:

1. Read RODB data using RodbReader
2. Process individual instruments using process_rodb functions
3. Stack instruments into a mooring using mooring_rodb functions
4. Convert to OceanSITES format using convertOS functions

Legacy Modules

`oceanarray.legacy.rodb`

RODB format data reader for legacy RAPID datasets.

rodbhead.py - Decode RO database keywords.

Originally written in MATLAB by:

Krahmann, IfM Kiel, Oct 1995 (v0.1.0)
Updated for EPIC conventions: Feb 1996 (v0.1.1)
Optimized and extended by multiple contributors through 2005
Last change noted in MATLAB: C. Mertens, May 1997 (v0.1.4)
Extensive extensions by D. Kieke (2000) and T. Kanzow (2005)

Ported to Python by E Frajka-Williams, 2025

oceanarray.legacy.rodb.format_latlon(value, is_lat=True)

oceanarray.legacy.rodb.is_rodb_file(filepath: Path) → bool

Check whether a file is RODB-style based on filename and/or content.

oceanarray.legacy.rodb.parse_rodb_keys_file(filepath)

Parse a rodb_keys.txt file with MATLAB-style lines into structured dicts.

Returns a dictionary with a list of entries under the ‘RODB_KEYS’ key.

oceanarray.legacy.rodb.rodbload(filepath, variables: list[str] = None) → Dataset

Read a RODB .use or .raw file into an xarray.Dataset.

oceanarray.legacy.rodb.rodbsave(filepath, ds: Dataset, fmt=None)

Save an xarray.Dataset to a RODB-style .use or .raw file.

Parameters:

filepath (str or Path) – Output file path.
ds (xarray.Dataset) – Dataset to write. Expects time-series variables to be indexed by ‘obs’.
fmt (str, optional) – Format string passed to np.savetxt. If None, one is generated automatically.

`oceanarray.legacy.process_rodb`

Individual instrument processing functions for RODB data.

oceanarray.legacy.process_rodb.apply_microcat_calibration_from_txt(txt_file: str, use_file: str) → Dataset

Apply calibration offsets from a *.microcat.txt file to the original .use data file.

Parameters:

txt_file (str) – Path to the calibration log file (e.g., ‘wb1_12_2015_005.microcat’).
use_file (str) – Path to the original ‘.use’ file.

Returns:

ds – Dataset with calibrated temperature, conductivity, and pressure.

Return type:

xr.Dataset

oceanarray.legacy.process_rodb.mean_of_middle_percent(values, percent=95)

Compute the mean of values within the central percent of the data.

Parameters:

values (array-like) – Input data (1D array). NaNs will be ignored.
percent (float) – Desired central percentage (e.g., 95 for middle 95%).

Returns:

Mean of values within the specified middle percentage.

Return type:

float

oceanarray.legacy.process_rodb.middle_percent(values, percent=95)

Return the lower and upper bounds for the central percent of the data.

Parameters:

values (array-like) – Input data (1D array). NaNs will be ignored.
percent (float) – Desired central percentage (e.g., 95 for middle 95%).

Returns:

(lower_bound, upper_bound)

Return type:

tuple

oceanarray.legacy.process_rodb.normalize_by_middle_percent(values, percent=95)

Normalize a data series by the mean and standard deviation of its central percent range.

Parameters:

values (array-like) – Input data (1D array). NaNs are ignored.
percent (float) – Central percentage to define the ‘middle’ of the distribution (e.g., 95).

Returns:

Normalized array with the same shape as input.

Return type:

array
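The middle-percent helpers in this module can be illustrated with a short standalone sketch. It is not the library's implementation: it assumes the central range is bounded by symmetric, linearly interpolated percentiles and that the standard deviation is the population form. The names percentile, middle_bounds, and normalize_by_middle are hypothetical.

```python
import math
from statistics import fmean, pstdev


def percentile(sorted_vals, q):
    """Linearly interpolated percentile (q in 0..100) of a pre-sorted list."""
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac


def middle_bounds(values, percent=95):
    """Lower and upper bounds of the central `percent` of the data, ignoring NaN."""
    finite = sorted(v for v in values if not math.isnan(v))
    tail = (100.0 - percent) / 2.0  # assumption: equal tails on both sides
    return percentile(finite, tail), percentile(finite, 100.0 - tail)


def normalize_by_middle(values, percent=95):
    """Subtract the mean and divide by the std of the central range."""
    lower, upper = middle_bounds(values, percent)
    core = [v for v in values if not math.isnan(v) and lower <= v <= upper]
    mean, std = fmean(core), pstdev(core)  # std == 0 would raise ZeroDivisionError
    return [(v - mean) / std for v in values]  # NaN inputs stay NaN
```

Because the statistics come only from the central range, a few extreme values (for example, readings logged on deck before deployment) do not inflate the mean or standard deviation, so they stand out clearly after normalization.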

oceanarray.legacy.process_rodb.normalize_dataset_by_middle_percent(ds, percent=95)

Normalize all 1D data variables in an xarray Dataset that match the length of TIME, using the mean and std over the central percent of each variable.

Parameters:

ds (xarray.Dataset) – Input dataset with a ‘TIME’ coordinate.
percent (float) – Percentage of central values to define the middle (e.g., 95 for middle 95%).

Returns:

New dataset with normalized data variables.

Return type:

xarray.Dataset

oceanarray.legacy.process_rodb.stage2_trim(ds: Dataset, deployment_start: datetime = None, deployment_end: datetime = None) → Dataset

Trim dataset to deployment period and fill time gaps with NaN.

Parameters:

ds (xarray.Dataset) – Input dataset with ‘TIME’ coordinate and variables like ‘TEMP’, ‘CNDC’, optionally ‘PRES’.
deployment_start (datetime, optional) – Start of the valid deployment period. If None, uses first timestamp in ds.
deployment_end (datetime, optional) – End of the valid deployment period. If None, uses last timestamp in ds.

Returns:

Trimmed and gap-filled dataset.

Return type:

xr.Dataset
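The idea of trimming to a deployment window and filling gaps with NaN can be sketched without xarray. This is an illustration only, assuming the samples sit exactly on a regular grid whose step is known. The function trim_and_fill and its step argument are hypothetical and not part of oceanarray.

```python
import math
from datetime import datetime, timedelta


def trim_and_fill(times, values, start=None, end=None, step=timedelta(hours=1)):
    """Keep samples inside [start, end] and insert NaN at missing time steps."""
    start = times[0] if start is None else start
    end = times[-1] if end is None else end
    # Map each retained timestamp to its value for fast lookup.
    lookup = {t: v for t, v in zip(times, values) if start <= t <= end}
    if not lookup:
        return [], []
    out_times, out_values = [], []
    t, last = min(lookup), max(lookup)
    while t <= last:
        out_times.append(t)
        out_values.append(lookup.get(t, math.nan))  # NaN marks a gap
        t += step
    return out_times, out_values
```

Called with hourly samples at 00:00, 01:00, 02:00, 04:00, and 05:00 and a window of 01:00 to 04:00, the sketch returns four time steps with NaN at 03:00.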

oceanarray.legacy.process_rodb.std_of_middle_percent(values, percent=95)

Compute the standard deviation of values within the central percent of the data.

Parameters:

values (array-like) – Input data (1D array). NaNs will be ignored.
percent (float) – Desired central percentage (e.g., 95 for middle 95%).

Returns:

Standard deviation of values within the specified middle percentage.

Return type:

float

oceanarray.legacy.process_rodb.trim_suggestion(ds, percent=95, threshold=6, vars_to_check=['T', 'C', 'P'])

Normalize dataset variables using the middle percentile and determine suggested deployment start and end times where the normalized values are below a given threshold.

Parameters:

ds (xarray.Dataset) – Input dataset with a ‘TIME’ coordinate and 1D data variables.
percent (float) – Percentage for middle-percent normalization (e.g., 95).
threshold (float) – Absolute threshold on normalized values to consider as stable.
vars_to_check (list of str) – List of variable names to consider for start/end detection.

Returns:

start_time (np.datetime64 or None) – Suggested deployment start time.
end_time (np.datetime64 or None) – Suggested deployment end time.
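As a rough illustration of the thresholding step, the sketch below takes series that have already been normalized and returns the first and last times at which every checked variable is within the threshold. Whether the library requires all variables or any one of them to be stable is not stated here, so the "all" rule is an assumption. The function suggest_trim is a hypothetical name.

```python
from datetime import datetime


def suggest_trim(times, normalized_vars, threshold=6.0):
    """Return (start, end) of the span where all |normalized values| < threshold.

    normalized_vars maps a variable name to a list aligned with `times`.
    NaN values never satisfy the comparison, so they count as unstable.
    """
    stable = [
        t
        for i, t in enumerate(times)
        if all(abs(series[i]) < threshold for series in normalized_vars.values())
    ]
    if not stable:
        return None, None
    return stable[0], stable[-1]
```

The (None, None) result mirrors the documented return types, which allow either time to be None when no stable sample is found.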

`oceanarray.legacy.mooring_rodb`

Mooring-level stacking and filtering functions for RODB data.

oceanarray.legacy.mooring_rodb.add_serial_and_sensor_info(ds_combined, ds_list)

Add serial number/depth mapping and sensor variable info to the combined dataset.

Parameters:

ds_combined (xarray.Dataset) – The combined mooring-level dataset (with DEPTH coordinate).
ds_list (list of xarray.Dataset) – List of instrument-level datasets used to create ds_combined.

Returns:

xarray.Dataset – The dataset with added attributes and SENSOR_VARS variable.
dict – The updated attributes dictionary.

oceanarray.legacy.mooring_rodb.auto_filt(y, sr, co, typ='low', fo=6)

Apply a Butterworth digital filter to a data array.

Parameters:

y (array_like) – Input data array (1D).
sr (float) – Sampling rate (Hz or 1/time units of your data).
co (float or tuple of float) – Cutoff frequency/frequencies. A scalar for ‘low’ or ‘high’, a 2-tuple for ‘bandstop’.
typ (str, optional) – Filter type: ‘low’, ‘high’, or ‘bandstop’. Default is ‘low’.
fo (int, optional) – Filter order. Default is 6.

Returns:

yf – Filtered data array.

Return type:

ndarray
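For orientation, the relations below give the squared magnitude response of an analog low-pass Butterworth prototype of order n (here fo) with cutoff f_c (here co), and the cutoff expressed as a fraction of the Nyquist frequency sr/2, which is the convention used by common digital filter design routines. That auto_filt normalizes its cutoff this way is an assumption, not something stated above.

```latex
|H(f)|^2 = \frac{1}{1 + \left( f / f_c \right)^{2n}},
\qquad
W_n = \frac{\mathrm{co}}{\mathrm{sr}/2}
```

At f = f_c the squared response is 1/2 for any order; a higher order only steepens the roll-off beyond the cutoff. As a worked case, data sampled twice per day (sr = 2 per day) with a 2-day low-pass cutoff (co = 0.5 cycles per day) gives W_n = 0.5.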

oceanarray.legacy.mooring_rodb.combine_mooring_OS(ds_list)

Combine a list of OceanSITES instrument-level datasets into a mooring-level dataset.

Parameters:

ds_list (list of xarray.Dataset) – List of instrument-level datasets to concatenate along DEPTH.

Returns:

Combined dataset with cleaned and updated global attributes.

Return type:

xarray.Dataset

oceanarray.legacy.mooring_rodb.filter_all_time_vars(ds, cutoff_days=2, fo=6)

Apply a lowpass Butterworth filter to all data variables that depend on TIME.

Parameters:

ds (xarray.Dataset) – Dataset containing time series variables.
cutoff_days (float, optional) – Lowpass filter cutoff in days. Default is 2 days.
fo (int, optional) – Filter order. Default is 6.

Returns:

Dataset with filtered variables.

Return type:

xarray.Dataset

oceanarray.legacy.mooring_rodb.find_common_attributes(ds_list)

Return attributes that are common across all datasets with the same value.
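The notion of "common attributes" can be shown on plain dictionaries, such as the attrs mapping of each dataset. This is a minimal sketch rather than the library code. The function common_attributes and the attribute names in the usage note are hypothetical.

```python
def common_attributes(attr_dicts):
    """Return the key/value pairs that appear with identical values in every dict."""
    if not attr_dicts:
        return {}
    first, rest = attr_dicts[0], attr_dicts[1:]
    return {
        key: value
        for key, value in first.items()
        if all(key in other and other[key] == value for other in rest)
    }
```

For two instruments that share a platform code but differ in serial number, only the platform code survives, which is the behaviour wanted when building mooring-level global attributes.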

oceanarray.legacy.mooring_rodb.find_time_vars(ds_list, time_key='TIME')

Return all variable names that have (time_key,) dimensions in any dataset.

oceanarray.legacy.mooring_rodb.get_12hourly_time_grid(time_or_ds, freq='12h', start_offset=Timedelta('1 days 00:00:00'), end_offset=Timedelta('0 days 00:00:00'), time_var='TIME')

Given a pandas.DatetimeIndex, array of datetimes, or xarray.Dataset, return a regular time grid (default: 12-hourly) from the first full day after the start to the last full day before the end.

Parameters:

time_or_ds (array-like of datetime64, pandas.DatetimeIndex, or xarray.Dataset) – Input time array or dataset containing a time variable.
freq (str, optional) – Frequency string for the output grid (default ‘12h’).
start_offset (pd.Timedelta, optional) – Offset to add to the start time before normalizing (default 1 day).
end_offset (pd.Timedelta, optional) – Offset to subtract from the end time after normalizing (default 0).
time_var (str, optional) – Name of the time variable in the dataset (default ‘TIME’).

Returns:

Regular time grid at the specified frequency.

Return type:

pandas.DatetimeIndex
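The offset and normalization rules described in the parameters can be sketched with the standard library alone. The sketch assumes "normalizing" means flooring to midnight, and it returns a plain list rather than a pandas.DatetimeIndex. The function regular_time_grid is a hypothetical name.

```python
from datetime import datetime, timedelta


def regular_time_grid(times, step=timedelta(hours=12),
                      start_offset=timedelta(days=1), end_offset=timedelta(0)):
    """Build a regular grid between the offset, midnight-floored start and end."""

    def to_midnight(t):
        return t.replace(hour=0, minute=0, second=0, microsecond=0)

    start = to_midnight(min(times) + start_offset)  # offset added before flooring
    end = to_midnight(max(times)) - end_offset      # offset subtracted after flooring
    grid = []
    t = start
    while t <= end:
        grid.append(t)
        t += step
    return grid
```

For a record running from 2020-01-01 15:30 to 2020-01-04 09:00, the default settings give five grid points from 2020-01-02 00:00 to 2020-01-04 00:00.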

oceanarray.legacy.mooring_rodb.interp_to_12hour_grid(ds1)

Interpolate all variables with a TIME dimension to a regular 12-hour grid.

Handles both 1D and multidimensional variables with TIME as the first dimension.
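The interpolation step can be pictured for a single 1D series as below. The interpolation method used by the library is not stated here, so linear interpolation is an assumption, as is returning NaN outside the sampled range. Times are given as plain numbers (for example, hours since the first sample) to keep the sketch short, and interp_linear is a hypothetical name.

```python
import math
from bisect import bisect_right


def interp_linear(x, y, x_new):
    """Linearly interpolate y(x) onto x_new; x must be sorted in ascending order."""
    out = []
    for xn in x_new:
        if xn < x[0] or xn > x[-1]:
            out.append(math.nan)  # no extrapolation beyond the record
            continue
        i = bisect_right(x, xn) - 1  # index of the sample at or before xn
        if i == len(x) - 1:          # xn coincides with the final sample
            out.append(y[-1])
            continue
        frac = (xn - x[i]) / (x[i + 1] - x[i])
        out.append(y[i] + frac * (y[i + 1] - y[i]))
    return out
```

A multidimensional variable with TIME as its first dimension could be handled by applying the same routine to each column in turn.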

oceanarray.legacy.mooring_rodb.stack_instruments(ds_list, time_key='TIME')

`oceanarray.legacy.convertOS`

OceanSITES format conversion functions for legacy RODB data.

oceanarray.legacy.convertOS.add_fixed_coordinates(ds, metadata)

Add fixed spatial coordinates LATITUDE, LONGITUDE, and DEPTH from metadata.

oceanarray.legacy.convertOS.add_global_attributes(ds, metadata)

Add OceanSITES-compliant global attributes to the dataset.

oceanarray.legacy.convertOS.add_instrument_metadata(ds, sensor_dict, metadata)

Add centralized instrument metadata and link variables via the ‘instrument’ attribute.

Parameters:

ds (xarray.Dataset) – The dataset to modify.
sensor_dict (dict) – Dictionary with ‘instrument_<N>’ keys and/or TOOL### templates, plus ‘variables’.
metadata (dict) – Header metadata to extract instrument-specific fields like serial and dates.

oceanarray.legacy.convertOS.add_project_attributes(ds, project_attrs=None, metadata=None)

Merge project-level attributes into dataset, with derived and dynamic values.

oceanarray.legacy.convertOS.add_variable_attributes(ds, vocab_attrs)

Convert a dataset loaded from RODB format into OceanSITES-compliant format.

Parameters:

ds (xarray.Dataset) – Original dataset with RODB-style variables and TIME coordinate.
metadata_txt (str or Path) – Path to RODB metadata text file (header block with key=value lines).
var_map_yaml (str or Path) – YAML file mapping original variable names to OceanSITES names.
vocab_yaml (str or Path) – YAML file providing OceanSITES variable attributes.
sensor_yaml (str or Path, optional) – YAML file with instrument metadata and variable-instrument mapping.
project_yaml (str or Path, optional) – YAML file with project-level global attributes.

Returns:

OceanSITES-compliant dataset.

Return type:

xarray.Dataset

oceanarray.legacy.convertOS.format_time_variable(ds)

oceanarray.legacy.convertOS.infer_data_mode(source_filename=None, history=None)

Infer data_mode (D=delayed, R=real-time, P=provisional) from filename or processing history.

Parameters:

source_filename (str or Path, optional) – Original filename (e.g., ‘example.raw’, ‘data.microcat’).
history (str, optional) – Global attribute describing the file’s processing history.

Returns:

One of “D”, “R”, or “P”. Defaults to “D” (delayed mode) if no clear match found.

Return type:

str

oceanarray.legacy.convertOS.load_yaml(path)

oceanarray.legacy.convertOS.parse_rodb_metadata(txt_path)

Parse RODB metadata from a .raw or .use file header.

Parameters:

txt_path (str or Path) – Path to the RODB file.

Returns:

(header_dict, data_start_index)

Return type:

tuple
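The (header_dict, data_start_index) return shape can be illustrated with a small parser for a header block of key=value lines followed by data rows. This is a sketch under that assumption only; the real function may handle comments, blank lines, or repeated keys differently. The function parse_header and the header keys in the usage note are hypothetical.

```python
def parse_header(lines):
    """Split leading 'key = value' lines from the data rows that follow.

    Returns the header as a dict and the index of the first data row.
    """
    header = {}
    for index, line in enumerate(lines):
        key, sep, value = line.partition("=")
        if not sep:  # first line without '=' is treated as the start of the data
            return header, index
        header[key.strip()] = value.strip()
    return header, len(lines)  # the file held a header and no data rows
```

Given three header lines such as "Mooring = wb1_12_2015" followed by a numeric row, the sketch returns a three-entry dictionary and the index 3, so a caller can pass everything from that index onward to a numeric reader.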

oceanarray.legacy.convertOS.sort_global_attributes(ds, attr_order=None)

Legacy Configuration

Legacy configuration files are stored in oceanarray/config/legacy/:

rodb_keys.yaml - RODB variable name mappings
rodb_keys.txt - Text format RODB variable definitions

Migration Guide

To migrate from legacy to modern processing:

Legacy Workflow:

from oceanarray.legacy import process_instrument, combine_mooring_OS
from oceanarray.legacy.rodb import RodbReader

# Legacy processing
reader = RodbReader('data.rodb')
data = reader.read()
processed = process_instrument(data)
mooring = combine_mooring_OS([processed])

Modern Workflow:

from oceanarray.stage1 import MooringProcessor
from oceanarray.stage2 import Stage2Processor
from oceanarray.time_gridding import TimeGridProcessor

# Modern CF-compliant processing
stage1 = MooringProcessor('/data/path')
stage1.process_mooring('mooring_name')

stage2 = Stage2Processor('/data/path')
stage2.process_mooring('mooring_name')

gridder = TimeGridProcessor('/data/path')
gridder.process_mooring('mooring_name')

Key Differences

Legacy Demo Notebooks

Legacy demo notebooks are available in notebooks/legacy/:

demo_instrument_rdb.ipynb - Legacy RODB instrument processing
demo_mooring_rdb.ipynb - Legacy RODB mooring processing
demo_batch_instrument.ipynb - Batch processing and QC analysis

These notebooks demonstrate the legacy workflow but are not recommended for new processing tasks.

Deprecation Timeline

The legacy modules will be maintained for backward compatibility but will not receive new features:

Current: Full backward compatibility maintained
Future: Bug fixes only, no new features
Long-term: May be moved to separate package or archived

For all new processing workflows, please use the modern CF-compliant pipeline documented in the main methods section.
