Legacy Modules
This section documents the legacy RODB/RAPID format processing modules that are maintained for backward compatibility with existing datasets.
Warning
These modules are deprecated and not recommended for new projects.
For new processing workflows, use the modern CF-compliant pipeline:
2. Trimming to Deployed period (Stage 2)
Step 1: Time Gridding and Optional Filtering (Time Gridding)
Overview
The legacy modules are located in oceanarray.legacy and provide processing functions for RAPID/RODB format oceanographic data. These were the original processing functions developed for the RAPID-MOC array but have been superseded by the modern CF-compliant workflow.
Legacy Processing Workflow
The legacy workflow follows this pattern:
1. Read RODB data using RodbReader.
2. Process individual instruments using process_rodb functions.
3. Stack instruments into a mooring using mooring_rodb functions.
4. Convert to OceanSITES format using convertOS functions.
Legacy Modules
oceanarray.legacy.rodb
RODB format data reader for legacy RAPID datasets.
rodbhead.py - Decode RO database keywords.
- Originally written in MATLAB by:
Krahmann, IfM Kiel, Oct 1995 (v0.1.0)
Updated for EPIC conventions: Feb 1996 (v0.1.1)
Optimized and extended by multiple contributors through 2005
Last change noted in MATLAB: C. Mertens, May 1997 (v0.1.4)
Extensive extensions by D. Kieke (2000) and T. Kanzow (2005)
Ported to Python by E Frajka-Williams, 2025
- oceanarray.legacy.rodb.is_rodb_file(filepath: Path) -> bool
Check whether a file is RODB-style based on filename and/or content.
- oceanarray.legacy.rodb.parse_rodb_keys_file(filepath)
Parse a rodb_keys.txt file with MATLAB-style lines into structured dicts.
Returns a dictionary with a list of entries under the ‘RODB_KEYS’ key.
- oceanarray.legacy.rodb.rodbload(filepath, variables: list[str] = None) -> Dataset
Read a RODB .use or .raw file into an xarray.Dataset.
- oceanarray.legacy.rodb.rodbsave(filepath, ds: Dataset, fmt=None)
Save an xarray.Dataset to a RODB-style .use or .raw file.
- Parameters:
filepath (str or Path) – Output file path.
ds (xarray.Dataset) – Dataset to write. Expects time-series variables to be indexed by ‘obs’.
fmt (str, optional) – Format string passed to np.savetxt. If None, one is generated automatically.
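Since fmt is passed to np.savetxt, a small standalone sketch may help show how such a format string is interpreted. It does not call rodbsave; the column values and the format string are made up for illustration, and the header block of a real RODB file is not reproduced.

```python
import io

import numpy as np

# Two hypothetical columns (temperature, pressure) for three observations.
data = np.array([[4.1234, 1000.5],
                 [4.1250, 1000.7],
                 [4.1301, 1001.2]])

# A single multi-column format string: one specifier per column.
# Writing to an in-memory buffer keeps the example free of side effects.
buffer = io.StringIO()
np.savetxt(buffer, data, fmt="%8.4f %7.1f")
text = buffer.getvalue()
print(text)
```

Each row is rendered with the same format string, so column widths and decimal places stay fixed across the file.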
oceanarray.legacy.process_rodb
Individual instrument processing functions for RODB data.
- oceanarray.legacy.process_rodb.apply_microcat_calibration_from_txt(txt_file: str, use_file: str) -> Dataset
Apply calibration offsets from a *.microcat.txt file to the original .use data file.
- Parameters:
txt_file (str) – Path to the calibration log file (e.g., ‘wb1_12_2015_005.microcat’).
use_file (str) – Path to the original ‘.use’ file.
- Returns:
ds – Dataset with calibrated temperature, conductivity, and pressure.
- Return type:
xr.Dataset
- oceanarray.legacy.process_rodb.mean_of_middle_percent(values, percent=95)
Compute the mean of values within the central percent of the data.
- Parameters:
values (array-like) – Input data (1D array). NaNs will be ignored.
percent (float) – Desired central percentage (e.g., 95 for middle 95%).
- Returns:
Mean of values within the specified middle percentage.
- Return type:
float
- oceanarray.legacy.process_rodb.middle_percent(values, percent=95)
Return the lower and upper bounds for the central percent of the data.
- Parameters:
values (array-like) – Input data (1D array). NaNs will be ignored.
percent (float) – Desired central percentage (e.g., 95 for middle 95%).
- Returns:
(lower_bound, upper_bound)
- Return type:
tuple
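The idea of a "central percent" can be sketched with NumPy percentiles. This is an illustration only, not the library's code: the helper names middle_bounds and mean_of_middle are hypothetical, and it assumes the bounds are symmetric percentiles and that values on the bounds are included.

```python
import numpy as np

def middle_bounds(values, percent=95):
    """Return (lower, upper) percentile bounds of the central `percent` of the data."""
    values = np.asarray(values, dtype=float)
    tail = (100.0 - percent) / 2.0  # e.g. 2.5 on each side for the middle 95%
    lower, upper = np.nanpercentile(values, [tail, 100.0 - tail])
    return float(lower), float(upper)

def mean_of_middle(values, percent=95):
    """Mean of the values lying inside the central bounds (assumed inclusive)."""
    values = np.asarray(values, dtype=float)
    lower, upper = middle_bounds(values, percent)
    inside = (values >= lower) & (values <= upper)  # NaN compares False and drops out
    return float(values[inside].mean())

# 0..99 plus one large spike and one missing value.
data = np.arange(101.0)
data[-1] = 1e6
data = np.append(data, np.nan)

lower, upper = middle_bounds(data, percent=90)
central_mean = mean_of_middle(data, percent=90)
plain_mean = float(np.nanmean(data))
print(lower, upper, central_mean, plain_mean)
```

The spike dominates the plain mean but falls outside the central bounds, which is why statistics of the middle range are useful for spotting out-of-water records.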
- oceanarray.legacy.process_rodb.normalize_by_middle_percent(values, percent=95)
Normalize a data series by the mean and standard deviation of its central percent range.
- Parameters:
values (array-like) – Input data (1D array). NaNs are ignored.
percent (float) – Central percentage to define the ‘middle’ of the distribution (e.g., 95).
- Returns:
Normalized array with the same shape as input.
- Return type:
array
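A minimal sketch of this kind of normalization, under stated assumptions: the function name normalize_by_middle is hypothetical, the central range is taken from symmetric percentiles, and the population standard deviation is used. The library may differ in any of these details.

```python
import numpy as np

def normalize_by_middle(values, percent=95):
    """Scale a series by the mean and std of its central `percent` of values."""
    values = np.asarray(values, dtype=float)
    tail = (100.0 - percent) / 2.0
    lower, upper = np.nanpercentile(values, [tail, 100.0 - tail])
    middle = values[(values >= lower) & (values <= upper)]  # NaNs are excluded here
    # The full series, including outliers and NaNs, is scaled and keeps its shape.
    return (values - middle.mean()) / middle.std()

series = np.array([1.0, 2.0, 3.0, 4.0, 5.0, np.nan, 500.0])
z = normalize_by_middle(series, percent=60)
print(z)
```

Because the spike does not contribute to the mean or standard deviation, its normalized value is very large, while ordinary samples stay close to zero.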
- oceanarray.legacy.process_rodb.normalize_dataset_by_middle_percent(ds, percent=95)
Normalize all 1D data variables in an xarray Dataset that match the length of TIME, using the mean and std over the central percent of each variable.
- Parameters:
ds (xarray.Dataset) – Input dataset with a ‘TIME’ coordinate.
percent (float) – Percentage of central values to define the middle (e.g., 95 for middle 95%).
- Returns:
New dataset with normalized data variables.
- Return type:
xarray.Dataset
- oceanarray.legacy.process_rodb.stage2_trim(ds: Dataset, deployment_start: datetime = None, deployment_end: datetime = None) -> Dataset
Trim dataset to deployment period and fill time gaps with NaN.
- Parameters:
ds (xarray.Dataset) – Input dataset with ‘TIME’ coordinate and variables like ‘TEMP’, ‘CNDC’, optionally ‘PRES’.
deployment_start (datetime, optional) – Start of the valid deployment period. If None, uses first timestamp in ds.
deployment_end (datetime, optional) – End of the valid deployment period. If None, uses last timestamp in ds.
- Returns:
Trimmed and gap-filled dataset.
- Return type:
xr.Dataset
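Trimming and gap filling can be illustrated with a pandas Series instead of an xarray Dataset. This is a sketch rather than the library's implementation: the helper trim_and_fill is hypothetical, and it assumes the nominal sampling interval can be taken as the median time step.

```python
import numpy as np
import pandas as pd

def trim_and_fill(series, start=None, end=None):
    """Trim a time-indexed Series to [start, end] and insert NaN at missing samples."""
    start = series.index[0] if start is None else pd.Timestamp(start)
    end = series.index[-1] if end is None else pd.Timestamp(end)
    trimmed = series.loc[start:end]  # label slicing includes both end points
    # Assumption: the median time step is the nominal sampling interval.
    step = trimmed.index.to_series().diff().median()
    full_index = pd.date_range(trimmed.index[0], trimmed.index[-1], freq=step)
    return trimmed.reindex(full_index)  # timestamps absent from the data become NaN

# Hourly record with the 04:00 sample missing.
times = pd.date_range("2015-01-01 00:00", periods=8, freq="h").delete(4)
temp = pd.Series(np.linspace(4.0, 4.6, 7), index=times)

result = trim_and_fill(temp, start="2015-01-01 01:00", end="2015-01-01 06:00")
print(result)
```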
- oceanarray.legacy.process_rodb.std_of_middle_percent(values, percent=95)
Compute the standard deviation of values within the central percent of the data.
- Parameters:
values (array-like) – Input data (1D array). NaNs will be ignored.
percent (float) – Desired central percentage (e.g., 95 for middle 95%).
- Returns:
Standard deviation of values within the specified middle percentage.
- Return type:
float
- oceanarray.legacy.process_rodb.trim_suggestion(ds, percent=95, threshold=6, vars_to_check=['T', 'C', 'P'])
Normalize dataset variables using the middle percentile and determine suggested deployment start and end times where the normalized values are below a given threshold.
- Parameters:
ds (xarray.Dataset) – Input dataset with a ‘TIME’ coordinate and 1D data variables.
percent (float) – Percentage for middle-percent normalization (e.g., 95).
threshold (float) – Absolute threshold on normalized values to consider as stable.
vars_to_check (list of str) – List of variable names to consider for start/end detection.
- Returns:
start_time (np.datetime64 or None) – Suggested deployment start time.
end_time (np.datetime64 or None) – Suggested deployment end time.
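The threshold test can be sketched for a single, already normalized variable. The helper suggest_trim is hypothetical, and how the library combines several variables from vars_to_check is not shown here; this only illustrates picking the first and last samples whose normalized magnitude is below the threshold.

```python
import numpy as np

def suggest_trim(time, normalized, threshold=6):
    """Return the first and last times where |normalized| is below the threshold."""
    stable = np.abs(normalized) < threshold  # NaN compares False, so gaps count as unstable
    if not stable.any():
        return None, None
    idx = np.flatnonzero(stable)
    return time[idx[0]], time[idx[-1]]

# Six hourly samples: large values at both ends mimic on-deck or descent records.
time = np.datetime64("2015-01-01T00:00") + np.arange(6) * np.timedelta64(1, "h")
z = np.array([40.0, 12.0, 0.3, -0.8, 0.5, 25.0])

start, end = suggest_trim(time, z)
print(start, end)
```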
oceanarray.legacy.mooring_rodb
Mooring-level stacking and filtering functions for RODB data.
- oceanarray.legacy.mooring_rodb.add_serial_and_sensor_info(ds_combined, ds_list)
Add serial number/depth mapping and sensor variable info to the combined dataset.
- Parameters:
ds_combined (xarray.Dataset) – The combined mooring-level dataset (with DEPTH coordinate).
ds_list (list of xarray.Dataset) – List of instrument-level datasets used to create ds_combined.
- Returns:
xarray.Dataset – The dataset with added attributes and SENSOR_VARS variable.
dict – The updated attributes dictionary.
- oceanarray.legacy.mooring_rodb.auto_filt(y, sr, co, typ='low', fo=6)
Apply a Butterworth digital filter to a data array.
- Parameters:
y (array_like) – Input data array (1D).
sr (float) – Sampling rate (Hz or 1/time units of your data).
co (float or tuple of float) – Cutoff frequency/frequencies. A scalar for ‘low’ or ‘high’, a 2-tuple for ‘bandstop’.
typ (str, optional) – Filter type: ‘low’, ‘high’, or ‘bandstop’. Default is ‘low’.
fo (int, optional) – Filter order. Default is 6.
- Returns:
yf – Filtered data array.
- Return type:
ndarray
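A Butterworth filter with these parameters can be sketched using SciPy. This is not the library's code: the function name butter_filter is hypothetical, and it assumes a zero-phase (forward-backward) application in second-order-section form, with the cutoff normalized by the Nyquist frequency sr / 2.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def butter_filter(y, sr, co, typ="low", fo=6):
    """Zero-phase Butterworth filter; `co` and `sr` share the same frequency units."""
    nyquist = sr / 2.0
    wn = np.asarray(co, dtype=float) / nyquist  # normalized cutoff(s), must lie in (0, 1)
    sos = butter(fo, wn, btype=typ, output="sos")
    return sosfiltfilt(sos, np.asarray(y, dtype=float))

# Hourly samples (sr = 24 per day) over 60 days:
# a 10-day signal plus a 12-hour oscillation.
t = np.arange(0, 60, 1 / 24)
slow = np.sin(2 * np.pi * t / 10)
fast = 0.5 * np.sin(2 * np.pi * t / 0.5)

# Cutoff of 0.5 cycles per day corresponds to a 2-day lowpass.
smoothed = butter_filter(slow + fast, sr=24, co=0.5)
```

Away from the ends of the record, the smoothed series follows the 10-day signal closely; the first and last few days are affected by edge transients and are usually treated with caution.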
- oceanarray.legacy.mooring_rodb.combine_mooring_OS(ds_list)
Combine a list of OceanSITES instrument-level datasets into a mooring-level dataset.
- Parameters:
ds_list (list of xarray.Dataset) – List of instrument-level datasets to concatenate along DEPTH.
- Returns:
Combined dataset with cleaned and updated global attributes.
- Return type:
xarray.Dataset
- oceanarray.legacy.mooring_rodb.filter_all_time_vars(ds, cutoff_days=2, fo=6)
Apply a lowpass Butterworth filter to all data variables that depend on TIME.
- Parameters:
ds (xarray.Dataset) – Dataset containing time series variables.
cutoff_days (float, optional) – Lowpass filter cutoff in days. Default is 2 days.
fo (int, optional) – Filter order. Default is 6.
- Returns:
Dataset with filtered variables.
- Return type:
xarray.Dataset
- oceanarray.legacy.mooring_rodb.find_common_attributes(ds_list)
Return attributes that are common across all datasets with the same value.
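The intersection logic can be sketched on plain dictionaries standing in for each dataset's attrs. The helper common_attributes and the attribute values are hypothetical; array-valued attributes would need a different equality test than the simple == used here.

```python
def common_attributes(attr_dicts):
    """Keep only the keys present in every dict with an identical value."""
    if not attr_dicts:
        return {}
    first, *rest = attr_dicts
    return {
        key: value
        for key, value in first.items()
        if all(key in other and other[key] == value for other in rest)
    }

# Hypothetical global attributes from three instruments on one mooring.
attrs_a = {"project": "RAPID", "platform_code": "WB1", "serial_number": 3481}
attrs_b = {"project": "RAPID", "platform_code": "WB1", "serial_number": 3482}
attrs_c = {"project": "RAPID", "platform_code": "WB1"}

shared = common_attributes([attrs_a, attrs_b, attrs_c])
print(shared)
```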
- oceanarray.legacy.mooring_rodb.find_time_vars(ds_list, time_key='TIME')
Return all variable names that have (time_key,) dimensions in any dataset.
- oceanarray.legacy.mooring_rodb.get_12hourly_time_grid(time_or_ds, freq='12h', start_offset=Timedelta('1 days 00:00:00'), end_offset=Timedelta('0 days 00:00:00'), time_var='TIME')
Given a pandas.DatetimeIndex, array of datetimes, or xarray.Dataset, return a regular time grid (default: 12-hourly) from the first full day after the start to the last full day before the end.
- Parameters:
time_or_ds (array-like of datetime64, pandas.DatetimeIndex, or xarray.Dataset) – Input time array or dataset containing a time variable.
freq (str, optional) – Frequency string for the output grid (default ‘12h’).
start_offset (pd.Timedelta, optional) – Offset to add to the start time before normalizing (default 1 day).
end_offset (pd.Timedelta, optional) – Offset to subtract from the end time after normalizing (default 0).
time_var (str, optional) – Name of the time variable in the dataset (default ‘TIME’).
- Returns:
Regular time grid at the specified frequency.
- Return type:
pandas.DatetimeIndex
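The grid construction described above can be sketched with pandas for the case of a plain time array. The helper regular_grid is hypothetical and follows the parameter descriptions literally: the start offset is added before normalizing to midnight, and the end offset is subtracted after normalizing.

```python
import pandas as pd

def regular_grid(times, freq="12h",
                 start_offset=pd.Timedelta(days=1),
                 end_offset=pd.Timedelta(0)):
    """Regular grid from the first full day after the start to the last midnight before the end."""
    times = pd.DatetimeIndex(times)
    start = (times.min() + start_offset).normalize()  # add offset, then snap to midnight
    end = times.max().normalize() - end_offset        # snap to midnight, then subtract offset
    return pd.date_range(start, end, freq=freq)

# Hypothetical 15-minute record starting and ending mid-day.
obs = pd.date_range("2015-03-01 07:30", "2015-03-05 16:45", freq="15min")
grid = regular_grid(obs)
print(grid)
```

With these inputs the grid runs from 2015-03-02 00:00 to 2015-03-05 00:00 in 12-hour steps, so every grid point is bracketed by observations.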
oceanarray.legacy.convertOS
OceanSITES format conversion functions for legacy RODB data.
- oceanarray.legacy.convertOS.add_fixed_coordinates(ds, metadata)
Add fixed spatial coordinates LATITUDE, LONGITUDE, and DEPTH from metadata.
- oceanarray.legacy.convertOS.add_global_attributes(ds, metadata)
Add OceanSITES-compliant global attributes to the dataset.
- oceanarray.legacy.convertOS.add_instrument_metadata(ds, sensor_dict, metadata)
Add centralized instrument metadata and link variables via the ‘instrument’ attribute.
- Parameters:
ds (xarray.Dataset) – The dataset to modify.
sensor_dict (dict) – Dictionary with ‘instrument_<N>’ keys and/or TOOL### templates, plus ‘variables’.
metadata (dict) – Header metadata to extract instrument-specific fields like serial and dates.
- oceanarray.legacy.convertOS.add_project_attributes(ds, project_attrs=None, metadata=None)
Merge project-level attributes into dataset, with derived and dynamic values.
- oceanarray.legacy.convertOS.convert_rodb_to_oceansites(ds: Dataset, metadata_txt: str | Path, var_map_yaml: str | Path, vocab_yaml: str | Path, sensor_yaml: str | Path | None = None, project_yaml: str | Path | None = None) -> Dataset
Convert a dataset loaded from RODB format into OceanSITES-compliant format.
- Parameters:
ds (xarray.Dataset) – Original dataset with RODB-style variables and TIME coordinate.
metadata_txt (str or Path) – Path to RODB metadata text file (header block with key=value lines).
var_map_yaml (str or Path) – YAML file mapping original variable names to OceanSITES names.
vocab_yaml (str or Path) – YAML file providing OceanSITES variable attributes.
sensor_yaml (str or Path, optional) – YAML file with instrument metadata and variable-instrument mapping.
project_yaml (str or Path, optional) – YAML file with project-level global attributes.
- Returns:
OceanSITES-compliant dataset.
- Return type:
xarray.Dataset
- oceanarray.legacy.convertOS.infer_data_mode(source_filename=None, history=None)
Infer data_mode (D=delayed, R=real-time, P=provisional) from filename or processing history.
- Parameters:
source_filename (str or Path, optional) – Original filename (e.g., ‘example.raw’, ‘data.microcat’).
history (str, optional) – Global attribute describing the file’s processing history.
- Returns:
One of “D”, “R”, or “P”. Defaults to “D” (delayed mode) if no clear match found.
- Return type:
str
Legacy Configuration
Legacy configuration files are stored in oceanarray/config/legacy/:
- rodb_keys.yaml: RODB variable name mappings
- rodb_keys.txt: Text format RODB variable definitions
Migration Guide
To migrate from legacy to modern processing:
Legacy Workflow:
from oceanarray.legacy import process_instrument, combine_mooring_OS
from oceanarray.legacy.rodb import RodbReader
# Legacy processing
reader = RodbReader('data.rodb')
data = reader.read()
processed = process_instrument(data)
mooring = combine_mooring_OS([processed])
Modern Workflow:
from oceanarray.stage1 import MooringProcessor
from oceanarray.stage2 import Stage2Processor
from oceanarray.time_gridding import TimeGridProcessor
# Modern CF-compliant processing
stage1 = MooringProcessor('/data/path')
stage1.process_mooring('mooring_name')
stage2 = Stage2Processor('/data/path')
stage2.process_mooring('mooring_name')
gridder = TimeGridProcessor('/data/path')
gridder.process_mooring('mooring_name')
Key Differences
Legacy Demo Notebooks
Legacy demo notebooks are available in notebooks/legacy/:
- demo_instrument_rdb.ipynb: Legacy RODB instrument processing
- demo_mooring_rdb.ipynb: Legacy RODB mooring processing
- demo_batch_instrument.ipynb: Batch processing and QC analysis
These notebooks demonstrate the legacy workflow but are not recommended for new processing tasks.
Deprecation Timeline
The legacy modules will be maintained for backward compatibility but will not receive new features:
Current: Full backward compatibility maintained
Future: Bug fixes only, no new features
Long-term: May be moved to separate package or archived
For all new processing workflows, please use the modern CF-compliant pipeline documented in the main methods section.