oceanarray API

Load and process moored oceanographic time series data from raw instrument format to array-integrated transport products.

Inputs and Outputs

readers

Shared utilities and base classes for loading raw instrument data.

writers

Write datasets to disk in standardized NetCDF format.

oceanarray.writers.save_OS_instrument(ds: Dataset, data_dir: Path)[source]

Save an OceanSITES dataset to netCDF, using the ‘id’ global attribute as the filename.

Parameters:
  • ds (xarray.Dataset) – Dataset with OceanSITES-compliant global attributes including ‘id’.

  • data_dir (pathlib.Path) – Directory to save the netCDF file.

Returns:

Full path to the saved NetCDF file.

Return type:

Path
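The naming rule described above can be sketched without xarray. This is a minimal illustration, not the library's code: the helper name os_output_path, the '.nc' extension, and the example id are assumptions.

```python
from pathlib import Path


def os_output_path(attrs: dict, data_dir: Path) -> Path:
    """Build an output path from the 'id' global attribute (hypothetical helper)."""
    file_id = attrs.get("id")
    if not file_id:
        raise ValueError("global attribute 'id' is required to name the file")
    # Assumption: the file name is the id followed by a '.nc' extension.
    return Path(data_dir) / f"{file_id}.nc"


# Illustrative id only; real OceanSITES ids follow the project's own convention.
example_path = os_output_path({"id": "OS_EXAMPLE_MOORING_2020"}, Path("data/out"))
print(example_path)
```

With a real dataset, the attrs argument would correspond to ds.attrs.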

oceanarray.writers.save_dataset(ds: Dataset, output_file: str = '../test.nc') → bool[source]

Attempts to save the dataset to a NetCDF file. If a TypeError occurs due to invalid attribute values, it converts the invalid attributes to strings and retries the save operation.

Parameters:
  • ds (xarray.Dataset) – The dataset to be saved.

  • output_file (str, optional) – The path to the output NetCDF file. Defaults to ‘../test.nc’.

Returns:

True if the dataset was saved successfully, False otherwise.

Return type:

bool

Notes

This function is based on a workaround for issues with saving datasets containing attributes of unsupported types. See: https://github.com/pydata/xarray/issues/3743
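The retry pattern described for save_dataset can be sketched independently of xarray. Everything here is an assumption made for illustration: the helper save_with_retry, the SAFE_TYPES tuple, and the stand-in strict_writer are hypothetical, and the real function may decide which attributes are invalid in a different way.

```python
from typing import Any, Callable, Dict

# Assumption: attribute types a NetCDF writer accepts without conversion.
SAFE_TYPES = (str, int, float, list, tuple)


def save_with_retry(attrs: Dict[str, Any], write: Callable[[Dict[str, Any]], None]) -> bool:
    """Try write(attrs); on TypeError, stringify unsupported values and retry once."""
    try:
        write(attrs)
        return True
    except TypeError:
        cleaned = {
            key: value if isinstance(value, SAFE_TYPES) else str(value)
            for key, value in attrs.items()
        }
        try:
            write(cleaned)
            return True
        except Exception:
            return False


def strict_writer(attrs: Dict[str, Any]) -> None:
    """Stand-in writer that rejects None and dict values, like a strict backend."""
    for key, value in attrs.items():
        if value is None or isinstance(value, dict):
            raise TypeError(f"invalid attribute value for {key!r}")


ok = save_with_retry({"title": "demo", "history": None, "meta": {"a": 1}}, strict_writer)
print(ok)
```

Returning a bool rather than raising mirrors the documented return value, so callers can decide how to handle a failed save.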

plotters

Tools for plotting mooring time series, profile sections, and transport products.

oceanarray.plotters.pcolor_timeseries_by_depth(ds_interp, var='SA')[source]
oceanarray.plotters.plot_climatology(clim_ds: Dataset, var: str = 'dTdp', clim_ds_smoothed: Dataset | None = None, fig=None, ax=None)[source]

Plot seasonal climatology of dT/dP or dS/dP, optionally with smoothed version overlaid.

Parameters:
  • clim_ds (xr.Dataset) – Raw climatology dataset with ‘dTdp’ and/or ‘dSdp’.

  • var (str, optional) – Variable to plot (‘dTdp’ or ‘dSdp’), by default ‘dTdp’.

  • clim_ds_smoothed (xr.Dataset, optional) – Smoothed climatology dataset to overlay, by default None.

  • fig (matplotlib.figure.Figure, optional) – Existing figure to plot on. If None, a new figure is created.

  • ax (matplotlib.axes.Axes, optional) – Existing axes to plot on. If None, new axes are created.

Notes

If smoothed climatology is provided, the raw climatology is shown in grey. Otherwise, only the provided climatology is shown in color.

oceanarray.plotters.plot_microcat(ds)[source]
oceanarray.plotters.plot_qartod_summary(ds, var='TEMP', qc_var='QC_ROLLUP')[source]

Plot QARTOD rollup flags and flagged data points for a given variable.

Parameters:
  • ds (xarray.Dataset) – Dataset containing the variable and the QC flag.

  • var (str, optional) – Name of the variable to plot (default is “TEMP”).

  • qc_var (str, optional) – Name of the QC rollup flag variable (default is “QC_ROLLUP”).

oceanarray.plotters.plot_timeseries_by_depth(ds, var='TEMP')[source]

Plot individual time series for each depth level.

Parameters:
  • ds (xarray.Dataset) – Dataset containing the variable to plot.

  • var (str) – Variable name (default is “TEMP”).

oceanarray.plotters.plot_trim_windows(ds, dstart, dend, NN=np.timedelta64(12, 'h'))[source]

Plot start and end windows for variables T, C, P in the dataset, highlighting data before/after dstart/dend.

Parameters:
  • ds (xarray.Dataset) – Dataset containing variables ‘T’, ‘C’, ‘P’ and ‘TIME’.

  • dstart (np.datetime64) – Deployment start time.

  • dend (np.datetime64) – Deployment end time.

  • NN (np.timedelta64, optional) – Window size (default: 12 hours).

oceanarray.plotters.scatter_profile_vs_PRES(ds_interp, ds_12h, var='CT')[source]
oceanarray.plotters.show_attributes(data)[source]

Processes an xarray Dataset or a netCDF file, extracts attribute information, and returns a DataFrame with details about the attributes.

Parameters:

data (str or xr.Dataset) – The input data, either a file path to a netCDF file or an xarray Dataset.

Returns:

A DataFrame containing the following columns:

  • Attribute: The name of the attribute.

  • Value: The value of the attribute.

Return type:

pandas.DataFrame

oceanarray.plotters.show_variables(data)[source]

Processes an xarray Dataset or a netCDF file, extracts variable information, and returns a styled DataFrame with details about the variables.

Parameters:

data (str or xr.Dataset) – The input data, either a file path to a netCDF file or an xarray Dataset.

Returns:

A styled DataFrame containing the following columns:

  • dims: The dimension of the variable (or “string” if it is a string type).

  • name: The name of the variable.

  • units: The units of the variable (if available).

  • comment: Any additional comments about the variable (if available).

Return type:

pandas.io.formats.style.Styler

Instrument Processing

stage 1 - standardisation

Read raw instrument files with format-specific readers and write standardised NetCDF files for Stage 2.

Refactored stage1 processing for mooring data with improved readability.

class oceanarray.stage1.MooringProcessor(base_dir: str)[source]

Handles stage1 processing of mooring data.

COORDS_TO_REMOVE = {'sbe-asc': ['depth', 'latitude', 'longitude'], 'sbe-cnv': ['depth', 'latitude', 'longitude']}
READER_MAP = {'nortek-aqd': <class 'ctd_tools.readers.nortek_ascii_reader.NortekAsciiReader'>, 'rbr-dat': <class 'ctd_tools.readers.rbr_ascii_reader.RbrAsciiReader'>, 'rbr-rsk': <class 'ctd_tools.readers.rbr_rsk_auto_reader.RbrRskAutoReader'>, 'sbe-asc': <class 'ctd_tools.readers.sbe_ascii_reader.SbeAsciiReader'>, 'sbe-cnv': <class 'ctd_tools.readers.sbe_cnv_reader.SbeCnvReader'>}
VARS_TO_REMOVE = {'sbe-asc': ['potential_temperature', 'julian_days_offset', 'density'], 'sbe-cnv': ['potential_temperature', 'julian_days_offset', 'density']}
process_mooring(mooring_name: str, output_path: str | None = None) → bool[source]

Process a single mooring’s data.

Parameters:
  • mooring_name – Name of the mooring to process

  • output_path – Optional custom output path. If None, uses default structure.

Returns:

True if processing completed successfully, False otherwise

Return type:

bool

oceanarray.stage1.process_multiple_moorings(mooring_list: List[str], basedir: str) → Dict[str, bool][source]

Process multiple moorings.

Parameters:
  • mooring_list – List of mooring names to process

  • basedir – Base directory containing the data

Returns:

Dict mapping mooring names to success status
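Because the return value maps each mooring name to a success flag, a caller can report failures in a few lines. The helper failed_moorings and the mooring names below are hypothetical; only the Dict[str, bool] shape comes from the signature above.

```python
from typing import Dict, List


def failed_moorings(results: Dict[str, bool]) -> List[str]:
    """Return the names whose processing did not succeed, sorted for stable output."""
    return sorted(name for name, ok in results.items() if not ok)


# Stand-in for the dict returned by process_multiple_moorings(...).
results = {"mooring_a": True, "mooring_b": False, "mooring_c": True}
failed = failed_moorings(results)
print(f"{len(results) - len(failed)} succeeded, failed: {failed}")
```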

oceanarray.stage1.stage1_mooring(mooring_name: str, basedir: str, output_path: str | None = None) → bool[source]

Process a single mooring’s data (backwards compatibility function).

Parameters:
  • mooring_name – Name of the mooring to process

  • basedir – Base directory containing the data

  • output_path – Optional output path override

Returns:

True if processing completed successfully

Return type:

bool

stage 2 - trimming

Trim instrument records to the deployment window and flag out-of-bounds values.

Stage 2 processing for mooring data: Apply clock offsets and trim to deployment period.

This module handles:

  • Loading processed Stage 1 NetCDF files

  • Applying clock corrections from YAML configuration

  • Trimming data to deployment/recovery time windows

  • Writing updated NetCDF files with ‘_use’ suffix
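The clock-correction and trimming steps can be illustrated on plain timestamps. This is a sketch under stated assumptions, not the Stage 2 implementation: the function correct_and_trim is hypothetical, the offset is assumed to be added to the recorded time, and both window edges are assumed inclusive.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

Sample = Tuple[datetime, float]


def correct_and_trim(
    samples: List[Sample],
    clock_offset_s: float,
    deploy_time: datetime,
    recover_time: datetime,
) -> List[Sample]:
    """Shift timestamps by a clock offset, then keep samples inside the deployment window."""
    # Assumption: corrected time = recorded time + offset.
    shift = timedelta(seconds=clock_offset_s)
    corrected = [(t + shift, v) for t, v in samples]
    return [(t, v) for t, v in corrected if deploy_time <= t <= recover_time]


start = datetime(2020, 1, 1, 0, 0)
samples = [(start + timedelta(hours=h), float(h)) for h in range(6)]
trimmed = correct_and_trim(
    samples,
    clock_offset_s=600,
    deploy_time=datetime(2020, 1, 1, 1, 0),
    recover_time=datetime(2020, 1, 1, 4, 0),
)
print(trimmed)
```

Applying the offset before trimming matters: a sample recorded at 04:00 moves to 04:10 and falls outside the window.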

class oceanarray.stage2.Stage2Processor(base_dir: str)[source]

Handles Stage 2 processing: clock correction and temporal trimming.

process_mooring(mooring_name: str, output_path: str | None = None) → bool[source]

Process Stage 2 for a single mooring.

Parameters:
  • mooring_name – Name of the mooring to process

  • output_path – Optional custom output path

Returns:

True if processing completed successfully

Return type:

bool

oceanarray.stage2.process_multiple_moorings_stage2(mooring_list: List[str], basedir: str) → Dict[str, bool][source]

Process Stage 2 for multiple moorings.

Parameters:
  • mooring_list – List of mooring names to process

  • basedir – Base directory containing the data

Returns:

Dict mapping mooring names to success status

oceanarray.stage2.stage2_mooring(mooring_name: str, basedir: str, output_path: str | None = None) → bool[source]

Process Stage 2 for a single mooring (backwards compatibility function).

Parameters:
  • mooring_name – Name of the mooring to process

  • basedir – Base directory containing the data

  • output_path – Optional output path override

Returns:

True if processing completed successfully

Return type:

bool

calibration

Apply post-cruise calibration offsets derived from shipboard CTD comparisons.

convertOS

Convert to OceanSITES format.

Mooring Processing

Step 1 - time_gridding

Apply Butterworth filters to remove tides and smooth high-frequency noise.

Step 1 processing for mooring data: Time gridding and optional filtering.

This module handles:

  • Loading processed Stage 2 NetCDF files (_use.nc) from multiple instruments

  • Optional time-domain filtering applied to individual instrument records

  • Interpolating all instruments onto a common time grid

  • Combining instruments into a single dataset with N_LEVELS dimension

  • Encoding instrument metadata as coordinate arrays

  • Writing time-gridded mooring datasets

This represents Step 1 in the mooring-level processing workflow:

  • Step 1: Time gridding (this module)

  • Step 2: Vertical gridding (future)

  • Step 3: Multi-deployment stitching (future)

IMPORTANT: Filtering is applied to individual instrument records BEFORE interpolation to preserve data integrity and avoid interpolation artifacts.
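The order of operations (filter each record first, then interpolate onto the shared grid) can be sketched in plain Python. The names running_mean, interp_to_grid and grid_mooring are hypothetical, times are given in hours as floats, and a centred running mean stands in for the module's own filters purely for illustration.

```python
from typing import List, Optional, Sequence, Tuple

Record = Tuple[Sequence[float], Sequence[float]]  # (times in hours, values)


def running_mean(values: Sequence[float], window: int) -> List[float]:
    """Centred running mean; the window shrinks at the record edges."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        segment = values[max(0, i - half): i + half + 1]
        smoothed.append(sum(segment) / len(segment))
    return smoothed


def interp_to_grid(
    times: Sequence[float], values: Sequence[float], grid: Sequence[float]
) -> List[Optional[float]]:
    """Linear interpolation onto an ascending grid; None outside the record's span."""
    out: List[Optional[float]] = []
    j = 0
    for t in grid:
        if t < times[0] or t > times[-1]:
            out.append(None)
            continue
        while j < len(times) - 2 and times[j + 1] < t:
            j += 1
        weight = (t - times[j]) / (times[j + 1] - times[j])
        out.append(values[j] + weight * (values[j + 1] - values[j]))
    return out


def grid_mooring(
    records: Sequence[Record], grid: Sequence[float], window: Optional[int] = None
) -> List[List[Optional[float]]]:
    """Return one row per instrument, i.e. shape (N_LEVELS, len(grid))."""
    levels = []
    for times, values in records:
        if window:
            values = running_mean(values, window)  # filter BEFORE interpolation
        levels.append(interp_to_grid(times, values, grid))
    return levels


records = [
    ([0.0, 2.0, 4.0], [10.0, 12.0, 14.0]),
    ([1.0, 2.0, 3.0, 5.0], [5.0, 5.0, 5.0, 5.0]),
]
gridded = grid_mooring(records, grid=[1.0, 2.0, 3.0, 4.0], window=3)
print(gridded)
```

Filtering on each instrument's native sampling keeps the filter from acting on values that interpolation has already smoothed or invented.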

Version: 1.1

Last updated: 2025-09-07

class oceanarray.time_gridding.TimeGriddingProcessor(base_dir: str)[source]

Handles Step 1 processing: time gridding and optional filtering of mooring instruments.

process_mooring(mooring_name: str, output_path: str | None = None, file_suffix: str = '_use', vars_to_keep: List[str] = None, filter_type: str | None = None, filter_params: Dict[str, Any] | None = None) → bool[source]

Process Step 1 for a single mooring: time gridding and optional filtering.

Parameters:
  • mooring_name – Name of the mooring to process

  • output_path – Optional custom output path

  • file_suffix – Suffix for input files (‘_use’ or ‘_raw’)

  • vars_to_keep – List of variables to include in combined dataset

  • filter_type – Type of time filtering to apply (‘lowpass’, ‘detide’, ‘bandpass’)

  • filter_params – Parameters for filtering

Returns:

True if processing completed successfully

Return type:

bool

oceanarray.time_gridding.process_multiple_moorings_time_gridding(mooring_list: List[str], basedir: str, file_suffix: str = '_use', filter_type: str | None = None, filter_params: Dict[str, Any] | None = None) → Dict[str, bool][source]

Process Step 1 for multiple moorings.

Parameters:
  • mooring_list – List of mooring names to process

  • basedir – Base directory containing the data

  • file_suffix – Suffix for input files (‘_use’ or ‘_raw’)

  • filter_type – Optional time filtering to apply (‘lowpass’, ‘detide’, ‘bandpass’)

  • filter_params – Optional parameters for filtering

Returns:

Dict mapping mooring names to success status

oceanarray.time_gridding.time_gridding_mooring(mooring_name: str, basedir: str, output_path: str | None = None, file_suffix: str = '_use', filter_type: str | None = None, filter_params: Dict[str, Any] | None = None) → bool[source]

Process Step 1 for a single mooring (convenience function).

Parameters:
  • mooring_name – Name of the mooring to process

  • basedir – Base directory containing the data

  • output_path – Optional output path override

  • file_suffix – Suffix for input files (‘_use’ or ‘_raw’)

  • filter_type – Optional time filtering to apply (‘lowpass’, ‘detide’, ‘bandpass’)

  • filter_params – Optional parameters for filtering

Returns:

True if processing completed successfully

Return type:

bool
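Since stage1_mooring, stage2_mooring and time_gridding_mooring all accept (mooring_name, basedir) and return a bool, they can be chained by a small driver that stops at the first failure. The driver run_stages and the stand-in stage functions below are hypothetical; with oceanarray installed, the documented functions would be passed in their place.

```python
from typing import Callable, Dict, Sequence, Tuple

Stage = Callable[[str, str], bool]


def run_stages(
    mooring_name: str, basedir: str, stages: Sequence[Tuple[str, Stage]]
) -> Dict[str, bool]:
    """Run stages in order and stop after the first one that reports failure."""
    status: Dict[str, bool] = {}
    for label, stage in stages:
        ok = bool(stage(mooring_name, basedir))
        status[label] = ok
        if not ok:
            break
    return status


# Stand-ins so the sketch runs without data; replace with stage1_mooring,
# stage2_mooring and time_gridding_mooring from oceanarray.
def fake_stage1(mooring_name: str, basedir: str) -> bool:
    return True


def fake_stage2(mooring_name: str, basedir: str) -> bool:
    return False


def fake_time_gridding(mooring_name: str, basedir: str) -> bool:
    return True


status = run_stages(
    "mooring_a",
    "/path/to/data",
    [("stage1", fake_stage1), ("stage2", fake_stage2), ("time_gridding", fake_time_gridding)],
)
print(status)
```

Stopping early avoids running time gridding on files that Stage 2 did not produce.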

Step 2 - vertical gridding

Vertically interpolate T/S/P data onto a common pressure grid using climatological constraints.

Step 3 - stitching

Concatenate deployments and interpolate onto a continuous time base.

Array Processing

transports

Compute transport time series by integrating geostrophic velocity profiles and applying boundary corrections.

General Tools and Utilities

tools

Helper functions for unit conversion, time alignment, and quality control.

oceanarray.tools.calc_ds_difference(ds1, ds2)[source]
oceanarray.tools.calc_psal(ds)[source]
oceanarray.tools.downsample_to_sparse(temp_profiles, salt_profiles, full_pressures, sparse_pressures)[source]

Downsample full T/S profiles to sparse pressure levels.

Parameters:
  • temp_profiles (np.ndarray) – Full temperature profiles, shape (n_profiles, n_pressures_full).

  • salt_profiles (np.ndarray) – Full salinity profiles, shape (n_profiles, n_pressures_full).

  • full_pressures (np.ndarray) – Full pressure levels corresponding to temp_profiles and salt_profiles, shape (n_pressures_full,).

  • sparse_pressures (np.ndarray) – Target sparse pressure levels to sample, shape (n_pressures_sparse,).

Returns:

sparse_inputs – Concatenated sparse temperature and salinity features, shape (n_profiles, 2 * n_pressures_sparse). (temp_sparse followed by salt_sparse)

Return type:

np.ndarray
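The output layout (sparse temperatures followed by sparse salinities in each row) can be shown with plain lists. This sketch assumes linear interpolation in pressure and target levels that lie inside the full grid; the real function takes NumPy arrays, and its sampling method is not stated here. The function names are hypothetical.

```python
from bisect import bisect_left
from typing import List, Sequence


def interp_profile(
    profile: Sequence[float], full_p: Sequence[float], sparse_p: Sequence[float]
) -> List[float]:
    """Linearly interpolate one profile from full_p (ascending) onto sparse_p."""
    out = []
    for p in sparse_p:
        i = bisect_left(full_p, p)
        if i < len(full_p) and full_p[i] == p:
            out.append(profile[i])  # exact level match, no interpolation needed
        elif i == 0 or i == len(full_p):
            raise ValueError(f"pressure {p} lies outside the full grid")
        else:
            weight = (p - full_p[i - 1]) / (full_p[i] - full_p[i - 1])
            out.append(profile[i - 1] + weight * (profile[i] - profile[i - 1]))
    return out


def downsample_sketch(temp_profiles, salt_profiles, full_p, sparse_p) -> List[List[float]]:
    """Return rows of length 2 * len(sparse_p): temperature first, then salinity."""
    return [
        interp_profile(t, full_p, sparse_p) + interp_profile(s, full_p, sparse_p)
        for t, s in zip(temp_profiles, salt_profiles)
    ]


full_p = [0.0, 100.0, 200.0, 300.0]
temp = [[20.0, 15.0, 10.0, 5.0]]
salt = [[35.0, 35.5, 36.0, 36.5]]
sparse = downsample_sketch(temp, salt, full_p, sparse_p=[100.0, 250.0])
print(sparse)
```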

oceanarray.tools.find_cold_entry_exit(time, temp, quantile=0.95, dwell_seconds=1800, smooth_window=5)[source]

Identify first sustained entry into ‘cold’ regime and last sustained exit.

Parameters:
  • time (array-like of datetime64)

  • temp (array-like of float)

  • quantile (float) – Quantile of temp used as the threshold, given as a fraction between 0 and 1 (e.g. 0.1 is the 10th percentile).

  • dwell_seconds (int) – Minimum time in cold regime for it to count (seconds).

  • smooth_window (int) – Rolling median window length (samples).

Returns:

t_start, t_end, threshold

Return type:

tuple of (Timestamp or None, Timestamp or None, float)

oceanarray.tools.flag_salinity_outliers(ds, n_std=4)[source]

Flags PSAL values that are more than n_std standard deviations from the mean, computed separately for each depth level.

Parameters:
  • ds (xarray.Dataset) – Dataset containing “PSAL” variable with dimensions including “DEPTH”.

  • n_std (float, optional) – Number of standard deviations from the mean to define an outlier (default is 4).

Returns:

Boolean array with True where salinity is flagged as an outlier.

Return type:

xarray.DataArray (bool)
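The per-depth test can be sketched with the standard library. Here each inner list is the time series at one depth level; the function name and the use of the population standard deviation are assumptions, and the real function operates on an xarray.Dataset.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def flag_outliers_sketch(psal_by_depth: Sequence[Sequence[float]], n_std: float = 4) -> List[List[bool]]:
    """Flag values more than n_std standard deviations from that depth's mean."""
    flags = []
    for series in psal_by_depth:
        centre = mean(series)
        spread = pstdev(series)  # assumption: population standard deviation
        flags.append([abs(value - centre) > n_std * spread for value in series])
    return flags


shallow = [35.0] * 30
shallow[10] = 36.0          # one spike at this depth
deep = [34.8] * 30          # constant record: nothing should be flagged
flags = flag_outliers_sketch([shallow, deep])
print(flags[0].index(True))
```

Computing the statistics separately per depth keeps the vertical salinity gradient from being mistaken for outliers.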

oceanarray.tools.flag_temporal_spikes(ds, var='CNDC', threshold=5)[source]

Flag large absolute differences along the time dimension at each depth.

Parameters:
  • threshold – Maximum allowed difference, in units of the variable (default is 5).
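A spike test on time differences can be sketched for a single depth level. The function name is hypothetical, and two choices here are assumptions: the flag is placed on the later sample of each offending pair, and the first sample is never flagged.

```python
from typing import List, Sequence


def flag_spikes_sketch(series: Sequence[float], threshold: float = 5) -> List[bool]:
    """Flag samples whose absolute change from the previous sample exceeds threshold."""
    flags = [False]  # assumption: the first sample has no predecessor, so it is not flagged
    for previous, current in zip(series, series[1:]):
        flags.append(abs(current - previous) > threshold)
    return flags


cndc = [35.0, 35.2, 45.0, 35.1, 35.0]
spike_flags = flag_spikes_sketch(cndc, threshold=5)
print(spike_flags)
```

Note that a single spike produces two large differences, so both the spike and the sample after it are flagged under this convention.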

oceanarray.tools.flag_vertical_inconsistencies(ds, var='CNDC', threshold=2)[source]

Flag points that differ strongly from their vertical neighbors.

Parameters:
  • threshold – Maximum allowed difference between vertically adjacent sensors (default is 2).

oceanarray.tools.lag_correlation(x, y, max_lag, min_overlap=10)[source]

Pearson correlation at integer lags in [-max_lag, max_lag].
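A lagged Pearson correlation can be sketched with the standard library. The sign convention (a positive lag pairs x[i] with y[i + lag]), the dict return type, and returning NaN when fewer than min_overlap pairs remain are all assumptions of this sketch, not confirmed behaviour of the library function.

```python
import math
from typing import Dict, Sequence


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences; NaN if either is constant."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    if var_a == 0 or var_b == 0:
        return math.nan
    return cov / math.sqrt(var_a * var_b)


def lag_correlation_sketch(x, y, max_lag: int, min_overlap: int = 10) -> Dict[int, float]:
    """Correlation at each integer lag in [-max_lag, max_lag]."""
    result = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = x[: max(0, len(x) - lag)], y[lag:]
        else:
            a, b = x[-lag:], y[: max(0, len(y) + lag)]
        n = min(len(a), len(b))
        result[lag] = pearson(a[:n], b[:n]) if n >= min_overlap else math.nan
    return result


x = [math.sin(0.3 * i) for i in range(60)]
y = [0.0] * 3 + x[:-3]           # y is x delayed by three samples
corr = lag_correlation_sketch(x, y, max_lag=5)
best_lag = max(corr, key=corr.get)
print(best_lag)
```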

oceanarray.tools.process_dataset(ds: Dataset, latlim: tuple[float, float] = (26.0, 27.0), lonlim: tuple[float, float] = (-77.0, -76.5), pgrid: ndarray = None) → tuple[Dataset, Dataset][source]

Filter and process a hydrographic dataset for use in training.

This function selects a region of interest, extracts and downsamples profiles of temperature and salinity onto both standard and sparse pressure grids. It also computes potential density anomaly for both resolutions.

Parameters:
  • ds (xr.Dataset) – Input dataset containing hydrographic data including CT, SA, PRES, and metadata.

  • latlim (tuple of float, optional) – Latitude limits for filtering, by default (26.0, 27.0).

  • lonlim (tuple of float, optional) – Longitude limits for filtering, by default (-77.0, -76.5).

Returns:

  • ds_standard (xr.Dataset) – Dataset downsampled to standard pressure levels.

  • ds_sparse (xr.Dataset) – Dataset downsampled to sparse pressure levels.

See also

verticalnn.data_utils.downsample_to_sparse

Used to interpolate to target pressure levels.

verticalnn.config.STANDARD_PRESSURES

Standard pressure grid.

verticalnn.config.SPARSE_PRESSURES

Sparse pressure grid.

oceanarray.tools.run_qc(ds)[source]
oceanarray.tools.split_value(data, nbins=30)[source]

utilities

General utilities for file management, logging, and parsing ASCII metadata.

oceanarray.utilities.apply_defaults(default_source: str, default_files: List[str]) → Callable[source]

Decorator to apply default values for ‘source’ and ‘file_list’ parameters if they are None.

Parameters:
  • default_source (str) – Default source URL or path.

  • default_files (list of str) – Default list of filenames.

Returns:

A wrapped function with defaults applied.

Return type:

Callable
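A decorator with this behaviour could look roughly like the following. It is a sketch, not the library's code: the name apply_defaults_sketch, the keyword-only handling of the wrapped function's arguments, and the placeholder URL and file names are assumptions.

```python
import functools
from typing import Callable, List


def apply_defaults_sketch(default_source: str, default_files: List[str]) -> Callable:
    """Fill in 'source' and 'file_list' when the caller leaves them as None."""

    def decorator(func: Callable) -> Callable:
        @functools.wraps(func)
        def wrapper(source=None, file_list=None, **kwargs):
            if source is None:
                source = default_source
            if file_list is None:
                file_list = default_files
            return func(source=source, file_list=file_list, **kwargs)

        return wrapper

    return decorator


# Placeholder URL and file names, for illustration only.
@apply_defaults_sketch("https://example.org/data", ["a.nc", "b.nc"])
def describe(source=None, file_list=None):
    return f"{source}: {len(file_list)} file(s)"


print(describe())
print(describe(file_list=["x.nc"]))
```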

oceanarray.utilities.concat_with_scalar_vars(datasets, dim, scalar_vars=None)[source]

Concatenate a list of xarray Datasets along a given dimension, preserving scalar variables (0-D DataArrays) as scalars (not broadcast).

Parameters:
  • datasets (list of xarray.Dataset) – Datasets to concatenate.

  • dim (str) – Dimension along which to concatenate.

  • scalar_vars (list of str, optional) – List of variable names to treat as scalars. If None, auto-detect scalar variables (those with ndim == 0 in any dataset).

Returns:

Concatenated dataset with scalar variables re-attached as 0-D DataArrays.

Return type:

xarray.Dataset

oceanarray.utilities.get_dims(ds_gridded)[source]

Helper function to extract pressure key, time key, and their respective dimensions from a dataset.

Parameters:

ds_gridded (xarray.Dataset) – Dataset containing the variables and dimensions.

Returns:

  • pres_key (str) – Key for the pressure variable.

  • time_key (str) – Key for the time variable.

  • pres_dim (str) – Dimension associated with the pressure variable.

  • time_dim (str) – Dimension associated with the time variable.

oceanarray.utilities.get_time_key(ds)[source]

Return the name of the time coordinate or variable in an xarray.Dataset.

Parameters:

ds (xarray.Dataset) – The dataset to inspect.

Returns:

The name of the time coordinate or variable.

Return type:

str

Raises:

ValueError – If no time dimension or coordinate is found.

oceanarray.utilities.is_iso8601_utc(timestr)[source]

Validate whether a string is in ISO8601 UTC format: YYYY-MM-DDTHH:MM:SSZ

Parameters:

timestr (str) – Input time string.

Returns:

True if valid ISO8601 UTC format, False otherwise.

Return type:

bool
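One way to check the YYYY-MM-DDTHH:MM:SSZ form is a strict pattern match followed by a calendar check. The function below is a sketch with a hypothetical name; the library's own validation may differ in detail.

```python
import re
from datetime import datetime

_ISO_UTC = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z")


def is_iso8601_utc_sketch(timestr) -> bool:
    """Return True only for strings of the exact form YYYY-MM-DDTHH:MM:SSZ."""
    if not isinstance(timestr, str) or not _ISO_UTC.fullmatch(timestr):
        return False
    try:
        # The regex fixes the layout; strptime rejects impossible dates and times.
        datetime.strptime(timestr, "%Y-%m-%dT%H:%M:%SZ")
    except ValueError:
        return False
    return True


print(is_iso8601_utc_sketch("2025-09-07T12:00:00Z"))
print(is_iso8601_utc_sketch("2025-09-07 12:00:00"))
```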

oceanarray.utilities.iso8601_duration_from_seconds(seconds)[source]

Convert a duration in seconds to an ISO 8601 duration string.

Parameters:

seconds (float) – Duration in seconds.

Returns:

ISO 8601 duration string, e.g., ‘PT1H’, ‘PT30M’, ‘PT15S’.

Return type:

str
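The conversion can be sketched as follows. The documented examples cover single-unit results only, so the handling of mixed durations (for example 'PT1H30M'), rounding to whole seconds, the absence of a day component, and 'PT0S' for zero are assumptions of this sketch.

```python
def iso8601_duration_sketch(seconds: float) -> str:
    """Format a non-negative duration in seconds as an ISO 8601 'PT...' string."""
    total = int(round(seconds))  # assumption: round to whole seconds
    if total < 0:
        raise ValueError("duration must be non-negative")
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    parts = []
    if hours:
        parts.append(f"{hours}H")
    if minutes:
        parts.append(f"{minutes}M")
    if secs or not parts:
        parts.append(f"{secs}S")  # also covers a zero-length duration
    return "PT" + "".join(parts)


for value in (3600, 1800, 15, 5400):
    print(value, iso8601_duration_sketch(value))
```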