`seagliderOG1 API`

convertOG1

Convert Seaglider basestation files to OG1 format.

This module provides the core functionality for converting Seaglider basestation NetCDF files into OG1 (Ocean Gliders 1) format. It handles data processing, variable renaming, attribute assignments, and dataset standardization.

seagliderOG1.convertOG1.add_gps_info_to_dataset(ds: Dataset, gps_ds: Dataset) → Dataset[source]

Add GPS information (LATITUDE_GPS, LONGITUDE_GPS, TIME_GPS) to the dataset.

The GPS values will be included within the N_MEASUREMENTS dimension, with non-NaN values only when GPS information is available. The dataset will be sorted by TIME.

Parameters:

ds (xarray.Dataset) – The dataset with renamed dimensions and variables, representing the main data.
gps_ds (xarray.Dataset) – The dataset containing GPS information, typically extracted from the original basestation dataset.

Returns:

The updated dataset with added GPS information. This includes values for LATITUDE_GPS, LONGITUDE_GPS, and TIME_GPS only when GPS information is available.

Return type:

xarray.Dataset

Notes

The dataset is sorted by TIME (or ctd_time from the original basestation dataset).
If the data are not sorted by time, there may be unintended consequences.
The function assumes that the GPS dataset contains variables log_gps_lon, log_gps_lat, and log_gps_time for longitude, latitude, and time respectively.
The function uses the sg_data_point dimension as defined in the OG1 vocabulary.

seagliderOG1.convertOG1.convert_to_OG1(list_of_datasets: list[Dataset] | Dataset, contrib_to_append: dict[str, str] | None = None) → tuple[Dataset, list[str]][source]

Convert Seaglider basestation datasets to OG1 format. Processes a list of xarray datasets or a single xarray dataset, converts them to OG1 format, concatenates the datasets, sorts by time, and applies attributes. Main conversion function that processes basestation datasets, applies OG1 standardization, concatenates multiple datasets, and adds global attributes.

Parameters:

list_of_datasets (list of xarray.Dataset or xarray.Dataset) – A list of xarray datasets or a single xarray dataset in basestation format.
contrib_to_append (dict of str, optional) – Dictionary containing additional contributor information to append. Default is None.

Returns:

A tuple containing: - ds_og1 (xarray.Dataset): The concatenated and processed dataset in OG1 format. - varlist (list of str): A list of variable names from the input datasets.

Return type:

tuple of (xarray.Dataset, list of str)

seagliderOG1.convertOG1.extract_attr_to_keep(ds1: Dataset, attr_as_is: list[str] = {'acknowledgment', 'date_created', 'disclaimer', 'file_version', 'geospatial_lat_max', 'geospatial_lat_min', 'geospatial_lon_max', 'geospatial_lon_min', 'geospatial_vertical_max', 'geospatial_vertical_min', 'id', 'institution', 'keywords', 'keywords_vocabulary', 'license', 'naming_authority', 'project'}) → dict[str, str][source]

Extract attributes to retain unchanged.

Parameters:

ds1 (xarray.Dataset) – Source dataset.
attr_as_is (list) – Attribute names to retain without modification.

Returns:

Retained attributes.

Return type:

dict

seagliderOG1.convertOG1.extract_attr_to_rename(ds1: Dataset, attr_to_rename: dict[str, str] = {'comment': 'history', 'platform_id': 'PLATFORM_SERIAL_NUMBER', 'site': 'summary', 'uri': 'uuid', 'uri_comment': 'UUID'}) → dict[str, str][source]

Extract and rename attributes according to OG1 vocabulary.

Parameters:

ds1 (xarray.Dataset) – Source dataset.
attr_to_rename (dict) – Mapping of new_name: old_name for attribute renaming.

Returns:

Renamed attributes.

Return type:

dict

seagliderOG1.convertOG1.extract_variables(ds: Dataset) → tuple[Dataset, Dataset, Dataset][source]

Split variables from the basestation file that have no dimensions into categorized datasets.

This function further processes the variables from the basestation file that had no dimensions. It categorizes them based on their prefixes or characteristics into three groups: variables from sg_calib_constants, log files, and other mission/dive-specific values.

Parameters:

ds (xarray.Dataset) – The input dataset. This function is designed to work on variables from the basestation file that had no dimensions, typically after being processed by split_by_unique_dims.

Returns:

A tuple containing three xarray Datasets: - sg_cal : xarray.Dataset

Dataset containing variables starting with ‘sg_cal_’ (originally from sg_calib_constants.m). The variables are renamed to remove the ‘sg_cal_’ prefix, so they can be accessed directly (e.g., sg_cal.hd_a).

dc_logxarray.Dataset
Dataset containing variables starting with ‘log_’. These variables are typically from log files.
dc_otherxarray.Dataset
Dataset containing other mission/dive-specific values. This includes depth-averaged currents and other variables like magnetic_variation.

Return type:

tuple of (xarray.Dataset, xarray.Dataset, xarray.Dataset)

seagliderOG1.convertOG1.get_contributors(ds: Dataset, values_to_append: dict[str, str] | None = None) → dict[str, str][source]

Extract and format contributor information for OG1 attributes.

Processes creator and contributor information from dataset attributes, formats them as comma-separated strings, and handles institution mapping.

Parameters:

ds (xarray.Dataset) – Dataset containing original contributor attributes.
values_to_append (dict, optional) – Additional contributor information to append.

Returns:

Dictionary with formatted contributor attribute strings.

Return type:

dict

seagliderOG1.convertOG1.get_time_attributes(ds: Dataset) → dict[str, str][source]

Extract and clean time-related attributes from the dataset.

Converts various time formats to OG1-standard YYYYMMDDTHHMMSS format and adds date_modified timestamp.

Parameters:: ds (xarray.Dataset) – The input dataset containing various attributes.
Returns:: A dictionary containing cleaned time-related attributes.
Return type:: dict

seagliderOG1.convertOG1.process_and_save_data(input_location: str, save: bool = False, output_dir: str = '.', run_quietly: bool = True) → Dataset[source]

Process and save data from the specified input location.

This function loads and concatenates datasets from the server, converts them to OG1 format, and saves the resulting dataset to a NetCDF file. If the file already exists, the function will prompt the user to decide whether to overwrite it or not.

Parameters:

input_location (str) – The location of the input data to be processed.
save (bool, optional) – Whether to save the processed dataset to a file. Default is False.
output_dir (str, optional) – The directory where the output file will be saved. Default is ‘.’.
run_quietly (bool, optional) – If True, suppresses user prompts and assumes ‘no’ for overwriting files. Default is True.

Returns:

The processed dataset.

Return type:

xarray.Dataset

seagliderOG1.convertOG1.process_dataset(ds1_base: Dataset, firstrun: bool = False) → tuple[Dataset, list[str], Dataset, Dataset, Dataset][source]

Processes a dataset by performing a series of transformations and extractions.

Parameters:

ds1_base (xarray.Dataset) – The input dataset from a basestation file, containing various attributes and variables.
firstrun (bool, optional) – Indicates whether this is the first run of the processing pipeline. Default is False.

Returns:

A tuple containing: - ds_new (xarray.Dataset): The processed dataset with renamed variables, assigned attributes,

converted units, and additional information such as GPS info and dive number.

attr_warnings (list[str]): A list of warnings related to attribute assignments.
sg_cal (xarray.Dataset): A dataset containing variables starting with ‘sg_cal’.
dc_other (xarray.Dataset): A dataset containing other variables not categorized under ‘sg_cal’ or ‘dc_log’.
dc_log (xarray.Dataset): A dataset containing variables starting with ‘log_’.

Return type:

tuple

Notes

The function performs the following steps:
1. Handles and splits the inputs:
  
  Extracts the dive number from the attributes.
  
  Splits the dataset by unique dimensions.
  
  Extracts the gps_info from the split dataset.
  
  Extracts variables starting with ‘sg_cal’ (originally from sg_calib_constants.m).
2. Renames the dataset dimensions, coordinates, and variables according to OG1:
  
  Extracts and renames dimensions for ‘sg_data_point’ (N_MEASUREMENTS).
  
  Renames variables according to the OG1 vocabulary.
  
  Assigns variable attributes according to OG1 and logs warnings for conflicts.
  
  Converts units in the dataset (e.g., cm/s to m/s) where possible.
  
  Converts QC flags to int8.
3. Adds new variables:
  
  Adds GPS info as LATITUDE_GPS, LONGITUDE_GPS, and TIME_GPS (increasing the length of N_MEASUREMENTS).
  
  Adds the divenum as a variable of length N_MEASUREMENTS.
  
  Adds the PROFILE_NUMBER (odd for dives, even for ascents).
  
  Adds the PHASE of the dive (1 for ascent, 2 for descent, 3 for between the first two surface points).
  
  Adds the DEPTH_Z with positive up.
4. Returns the processed dataset, attribute warnings, and categorized datasets.
The function sorts the dataset by TIME and may exhibit undesired behavior if there are not two surface GPS fixes before a dive.

seagliderOG1.convertOG1.standardise_OG10(ds: Dataset, firstrun: bool = False, unit_format: dict[str, str] = {'S/m': 'S m-1', 'cm/s': 'cm s-1', 'degreesCelsius': 'Celsius', 'degrees_Celsius': 'Celsius', 'g/kg': 'g kg-1', 'g/m^3': 'g m-3', 'kg/m^3': 'kg m-3', 'm/s': 'm s-1', 'mS/cm': 'mS cm-1', 'meters': 'm'}) → Dataset[source]

Standardize the dataset to OG1 format by renaming dimensions, variables, and assigning attributes.

Applies OG1 vocabulary for variable names, units, and attributes. Performs unit conversions and QC flag standardization.

Parameters:

ds (xarray.Dataset) – The input dataset to be standardized.
firstrun (bool, optional) – Indicates whether this is the first run of the standardization process. Default is False.
unit_format (dict of str, optional) – A dictionary mapping unit strings to their standardized format. Default is vocabularies.unit_str_format.

Returns:

The standardized dataset in OG1 format.

Return type:

xarray.Dataset

seagliderOG1.convertOG1.update_dataset_attributes(ds: Dataset, contrib_to_append: dict[str, str] | None) → dict[str, str][source]

Update the attributes of the dataset based on the provided attribute input.

Processes contributor information, time attributes, and applies OG1 global attribute vocabulary in the correct order.

Parameters:

ds (xarray.Dataset) – The input dataset whose attributes need to be updated.
contrib_to_append (dict of str or None) – A dictionary containing additional contributor information to append. Default is None.

Returns:

A dictionary of ordered attributes with updated values.

Return type:

dict

vocabularies

Vocabularies and mappings for OG1 format conversion.

This module defines the vocabulary mappings, unit conversions, and attribute configurations used to convert Seaglider basestation files to OG1 format. It loads YAML configuration files and defines various dictionaries used throughout the conversion process.

Key Components: - Variable name mappings from basestation to OG1 format - Unit conversion factors and formatting rules - Global attribute definitions and ordering - Variables to include/exclude during conversion - Dimension renaming mappings

Configuration Files: - OG1_var_names.yaml: Variable name mappings - OG1_vocab_attrs.yaml: Variable attribute vocabularies - OG1_sensor_attrs.yaml: Sensor attribute definitions - OG1_global_attrs.yaml: Global attribute configurations - OG1_author.yaml: Default author/contributor information

Notes

This module is primarily data-driven and loads most configuration from YAML files in the config/ directory. The YAML-based approach allows easy modification of OG1 format requirements without code changes.

readers

seagliderOG1.readers.filter_files_by_profile(file_list: list[str], start_profile: int | None = None, end_profile: int | None = None) → list[str][source]

Filter files by profile/dive number range.

Filters Seaglider basestation files based on profile number range. Expects filenames of the form pXXXYYYY.nc, where XXX is the 3-digit glider serial number and YYYY is the 4-digit dive cycle number.

Example: p0420001.nc represents glider 042, dive 0001.

Note: Input file_list does not need to be sorted.

Parameters:

file_list (list of str) – List of Seaglider filenames to filter.
start_profile (int, optional) – Minimum profile number (inclusive).
end_profile (int, optional) – Maximum profile number (inclusive).

Returns:

Filtered list of filenames within the specified range.

Return type:

list of str

seagliderOG1.readers.list_files(source: str, registry_loc: str = 'seagliderOG1', registry_name: str = 'seaglider_registry.txt') → list[str][source]

List NetCDF files from a source (URL or local directory).

For online sources, scrapes directory listings using BeautifulSoup. For local sources, lists files in the directory.

Parameters:

source (str) – URL (http/https) or local directory path to scan for files.
registry_loc (str, optional) – Legacy parameter, not currently used.
registry_name (str, optional) – Legacy parameter, not currently used.

Returns:

Sorted list of NetCDF filenames (.nc files only).

Return type:

list of str

Raises:

ValueError – If source is neither a valid URL nor directory path.

seagliderOG1.readers.load_basestation_files(source: str, start_profile: int | None = None, end_profile: int | None = None) → list[Dataset][source]

Load multiple Seaglider basestation files with optional profile filtering.

Main function for loading Seaglider data from either online repositories or local directories. Supports filtering by dive/profile number range.

Parameters:

source (str) – Directory path containing Seaglider NetCDF files.
start_profile (int, optional) – Minimum profile number to load.
end_profile (int, optional) – Maximum profile number to load.

Returns:

List of loaded basestation datasets, ordered by filename.

Return type:

list of xarray.Dataset

seagliderOG1.readers.load_first_basestation_file(source: str) → Dataset[source]

Load the first (alphabetically) basestation file from a source.

Useful for quick examination of data structure and metadata from a Seaglider mission without loading all files.

Parameters:: source (str) – URL or local directory path containing NetCDF files.
Returns:: The first basestation dataset.
Return type:: xarray.Dataset

seagliderOG1.readers.load_sample_dataset(file_path: str = '/home/runner/work/seagliderOG1/seagliderOG1/data/demo_sg005/p0050001_20080606.nc') → Dataset[source]

Download sample datasets for use with seagliderOG1.

Parameters:: dataset_name (str, optional) – Name of the sample dataset to load. Must be one of the available datasets in the registry. Default is “p0330015_20100906.nc”.
Returns:: The requested sample dataset loaded from the cache.
Return type:: xarray.Dataset
Raises:: KeyError – If the requested dataset is not available in the registry.

seagliderOG1.readers.log_repair(log_path: str | Path, filename: str, fixed_vars: list[str], error: Exception | str)[source]: Append one repair entry to the log file.

seagliderOG1.readers.repair_netcdf_time_metadata_inplace(path: str | Path, repair_dir: str | Path, backup: bool = True) → list[str][source]

Repair variables stored as text but mislabeled as CF time variables.

The original file is backed up into repair_dir before the repaired version replaces the original file.

seagliderOG1.readers.scan_and_repair_files(source: str, start_profile: int | None = None, end_profile: int | None = None) → None[source]

Scan NetCDF files and repair inconsistent time metadata only when needed.

Parameters:

source (str) – Directory path containing NetCDF files to scan and repair.
start_profile (int, optional) – Minimum profile number to include (inclusive).
end_profile (int, optional) – Maximum profile number to include (inclusive).

Return type:

None

plotters

Plotting and visualization functions for Seaglider data.

This module provides functions for plotting and inspecting Seaglider datasets, including variable summaries, attribute displays, and depth profile visualizations.

seagliderOG1.plotters.plot_ctd_depth_vs_time(ds: Dataset, start_traj: int | None = None, end_traj: int | None = None) → None[source]

Plot CTD depth vs time with GPS fix highlighting.

Creates a depth time series plot with special highlighting of points where GPS latitude data is available (non-NaN), useful for identifying surface intervals.

Parameters:

ds (xarray.Dataset) – The input dataset containing ‘ctd_time’, ‘ctd_depth’, and ‘gps_lat’ variables.
start_traj (int, optional) – The starting trajectory number to filter the data. If None, no filtering applied.
end_traj (int, optional) – The ending trajectory number to filter the data. If None, no filtering applied.

Notes

Plots CTD depth as black dots for all data points.
Overlays red circles at points where GPS latitude is non-NaN.
Inverts y-axis to show depth increasing downward.
Filters data by trajectory range if specified.

seagliderOG1.plotters.plot_depth_colored(data: DataFrame | Dataset, color_by: str | None = None, start_dive: int | None = None, end_dive: int | None = None) → None[source]

Plot depth as a function of time with optional coloring and dive filtering.

Creates a depth time series plot with optional color coding by another variable and filtering by dive number range.

Parameters:

data (pandas.DataFrame or xarray.Dataset) – The input data containing depth, time, and dive number variables. Should contain ‘ctd_depth’/’DEPTH’, ‘ctd_time’/’TIME’, and dive number variables.
color_by (str, optional) – The variable name to color the plot by. If None, uses a simple line plot.
start_dive (int, optional) – The starting dive number to filter the data. If None, no filtering applied.
end_dive (int, optional) – The ending dive number to filter the data. If None, no filtering applied.

Notes

Accepts dive number variables: ‘dive_number’, ‘divenum’, or ‘dive_num’.
Uses scatter plot with colorbar when color_by is specified.
Inverts y-axis and formats time axis similar to plot_profile_depth.

Raises:

TypeError – If input data is not a pandas DataFrame or xarray Dataset.
ValueError – If no valid dive number variable is found in the dataset.

seagliderOG1.plotters.plot_profile_depth(data: DataFrame | Dataset) → None[source]

Plot profile depth as a function of time.

Creates a time series plot of depth data with automatic point reduction for large datasets and proper axis formatting.

Parameters:: data (pandas.DataFrame or xarray.Dataset) – The input data containing depth and time variables. Should contain either ‘ctd_depth’/’ctd_time’ or ‘DEPTH’/’TIME’ variables.

Notes

Automatically reduces the total number of points to less than 100,000 for performance.
Inverts y-axis to show depth increasing downward.
Formats x-axis with month-day labels and adds year information.
Sets tight y-axis limits rounded to nearest 10 meters.

Raises:

TypeError – If input data is not a pandas DataFrame or xarray Dataset.
KeyError – If required time or depth variables are not found in the dataset.

seagliderOG1.plotters.show_attributes(data: str | Dataset) → DataFrame[source]

Process an xarray Dataset or netCDF file and extract global attribute information.

Creates a DataFrame with comprehensive details about all global attributes in the dataset, including their values and data types.

Parameters:: data (str or xarray.Dataset) – The input data, either a file path to a netCDF file or an xarray Dataset.
Returns:: A DataFrame containing the following columns: - Attribute: The name of the attribute. - Value: The value of the attribute. - DType: The data type of the attribute value.
Return type:: pandas.DataFrame
Raises:: TypeError – If input data is not a file path or xarray Dataset.

seagliderOG1.plotters.show_contents(data: str | Dataset, content_type: str = 'variables') → DataFrame[source]

Show contents of an xarray Dataset or a netCDF file.

Wrapper function to display either variables or attributes from the dataset.

Parameters:

data (str or xarray.Dataset) – The input data, either a file path to a netCDF file or an xarray Dataset.
content_type (str, optional) – The type of content to show, either ‘variables’ (or ‘vars’) or ‘attributes’ (or ‘attrs’). Default is ‘variables’.

Returns:

A DataFrame with details about the variables or attributes.

Return type:

pandas.DataFrame

Raises:

TypeError – If input data is not a file path or xarray Dataset.
ValueError – If content_type is not ‘variables’, ‘vars’, ‘attributes’, or ‘attrs’.

seagliderOG1.plotters.show_variables(data: str | Dataset) → DataFrame[source]

Process an xarray Dataset or netCDF file and extract variable information.

Creates a styled DataFrame with comprehensive details about all variables in the dataset, including dimensions, units, comments, and data types.

Parameters:: data (str or xarray.Dataset) – The input data, either a file path to a netCDF file or an xarray Dataset.
Returns:: A styled DataFrame containing the following columns: - dims: The dimension of the variable (or “string” if it is a string type). - name: The name of the variable. - units: The units of the variable (if available). - comment: Any additional comments about the variable (if available). - standard_name: CF standard name (if available). - dtype: Data type of the variable.
Return type:: pandas.io.formats.style.Styler
Raises:: TypeError – If input data is not a file path or xarray Dataset.

seagliderOG1.plotters.show_variables_by_dimension(data: str | Dataset, dimension_name: str = 'trajectory') → DataFrame[source]

Process dataset and extract variables filtered by a specific dimension.

Creates a styled DataFrame showing only variables that have the specified dimension, useful for examining variables of a particular type.

Parameters:

data (str or xarray.Dataset) – The input data, either a file path to a netCDF file or an xarray Dataset.
dimension_name (str, optional) – The name of the dimension to filter variables by. Default is “trajectory”.

Returns:

A styled DataFrame containing the following columns: - dims: The dimension of the variable (or “string” if it is a string type). - name: The name of the variable. - units: The units of the variable (if available). - comment: Any additional comments about the variable (if available).

Return type:

pandas.io.formats.style.Styler

Raises:

TypeError – If input data is not a file path or xarray Dataset.

writers

seagliderOG1.writers.save_dataset(ds: Dataset, output_file: str = '../test.nc') → None[source]

Attempts to save the dataset to a NetCDF file.

If a TypeError occurs due to invalid attribute values, converts the invalid attributes to strings and retries the save operation.

Parameters:

ds (xarray.Dataset) – The dataset to be saved.
output_file (str, optional) – The path to the output NetCDF file. Defaults to ‘../test.nc’.

Returns:

True if the dataset was saved successfully, False otherwise.

Return type:

bool

Notes

Based on: https://github.com/pydata/xarray/issues/3743

tools

seagliderOG1.tools.add_dive_number(ds: Dataset, dive_number: int | None = None) → Dataset[source]

Add dive number as a variable to the dataset. Assumes present in the basestation attributes.

Parameters:

ds – The dataset to which the dive number will be added.
dive_number – The dive number to add. If None, extracts from dataset attributes.
optional – The dive number to add. If None, extracts from dataset attributes.

Returns:

The dataset with the dive number added.

Return type:

xarray.Dataset

seagliderOG1.tools.add_hdm_parameters(ds_OG1, hdm_parameters)[source]: Add HDM parameters to the OG1 dataset as new variables with their attributes.

Parameters:

ds_OG1 (xarray.Dataset): The OG1 dataset to which HDM parameters will be added. hdm_parameters (dict): A dictionary containing HDM parameters and their attributes

in the format {standard_name: {“value”: …, “attributes”: {…}}}.

Returns:

xarray.Dataset: Updated OG1 dataset with HDM parameters added as variables.

seagliderOG1.tools.add_sensor_to_dataset(dsa: Dataset, ds: Dataset, sg_cal: Dataset, firstrun: bool = False) → Dataset[source]

Add sensor information to the dataset based on instrument data and calibration parameters.

This function processes different types of seaglider sensors including CTD (sbe41), oxygen sensors (sbe43, aa4381, aa4330), and optical sensors (wlbb2f). For each sensor, it extracts calibration information, parses serial numbers and calibration dates, and creates sensor variables with appropriate attributes and metadata.

Parameters:

dsa – The target dataset to add sensor information to.
ds – Dataset containing instrument/sensor data with attributes.
sg_cal – Dataset containing calibration information and calibcomm strings.
firstrun – Whether to enable detailed logging for debugging. Default is False.
optional – Whether to enable detailed logging for debugging. Default is False.

Returns:

The updated dataset with sensor variables and metadata added.

Return type:

xarray.Dataset

Notes

Handles CTD sensors by parsing calibcomm strings for serial numbers and dates
Processes oxygen sensors using calibcomm_oxygen or calibcomm_optode information
Creates sensor variable names in format: ‘SENSOR_{type}_{serial}’
Skips altimeter sensors as they are not processed in current implementation
For wlbb2f sensors, calibration parsing is commented out pending data availability

seagliderOG1.tools.assign_phase(ds: Dataset) → Dataset[source]

This function adds new variables ‘PHASE’ and ‘PHASE_QC’ to the dataset ds, which indicate the phase of each measurement. The phase is determined based on the pressure readings (‘PRES’) for each unique dive number (‘dive_num’).

Note: In this formulation, we are only separating into dives and climbs based on when the glider is at the maximum depth. Future work needs to separate out the other phases: https://github.com/OceanGlidersCommunity/OG-format-user-manual/blob/main/vocabularyCollection/phase.md and generate a PHASE_QC. Assigns phase values to the dataset based on pressure readings.

Parameters:

(xarray.Dataset) (ds)

Returns:

xarray.Dataset (The dataset with an additional ‘PHASE’ variable, where:)
xarray.Dataset (The dataset with additional ‘PHASE’ and ‘PHASE_QC’ variables, where:) –
- ‘PHASE’ indicates the phase of each measurement:
  - Phase 2 is assigned to measurements up to and including the maximum pressure point.
  - Phase 1 is assigned to measurements after the maximum pressure point.
- ’PHASE_QC’ is an additional variable with no QC applied.
Note (In this formulation, we are only separating into dives and climbs based on when the glider is at the maximum depth. Future work needs to separate out the other phases: https://github.com/OceanGlidersCommunity/OG-format-user-manual/blob/main/vocabularyCollection/phase.md and generate a PHASE_QC)

seagliderOG1.tools.assign_profile_number(ds: Dataset, ds1: Dataset) → Dataset[source]

Assign profile numbers to measurements based on dive phases (down/up casts).

This function separates each dive into two profiles: descent (down cast) and ascent (up cast) phases. The dive is split at the maximum pressure point, with the descent phase getting the dive number and ascent getting dive + 0.5. Profile numbers are then calculated as 2 * dive_num_cast - 1.

Parameters:

ds – The dataset to add profile numbers to.
ds1 – Dataset containing dive number information in attributes.

Returns:

Dataset with ‘dive_num_cast’ and ‘PROFILE_NUMBER’ variables added.

Return type:

xarray.Dataset

Notes

Requires pressure variable (PRES, ctd_pressure, Pressure, or pres)
Down cast: dive_num_cast = dive number
Up cast: dive_num_cast = dive number + 0.5
Profile numbers: descent = 2*dive-1, ascent = 2*dive

seagliderOG1.tools.calc_Z(ds: Dataset) → Dataset[source]

Calculate the depth (Z position) of the glider using the gsw library to convert pressure to depth.

Parameters:: ds – The input dataset containing ‘PRES’, ‘LATITUDE’, and ‘LONGITUDE’ variables.
Returns:: The dataset with an additional ‘DEPTH’ variable.
Return type:: xarray.Dataset

seagliderOG1.tools.combine_two_dim_of_dataset(ds: Dataset, dim1: str = 'sg_data_point', dim2: str = 'ctd_data_point') → Dataset[source]

Updates the original dataset by removing variables with dim1 and dim2 and adding the merged dataset.

Parameters:

ds – The original dataset.
dim1 – First dimension to be removed. Default is ‘sg_data_point’.
optional – First dimension to be removed. Default is ‘sg_data_point’.
dim2 – Second dimension to be removed. Default is ‘ctd_data_point’.
optional – Second dimension to be removed. Default is ‘ctd_data_point’.

Returns:

The updated dataset with merged variables.

Return type:

xarray.Dataset

seagliderOG1.tools.convert_qc_flags(dsa: Dataset, qc_name: str) → Dataset[source]

Convert QC flag variables to proper integer format and update attributes.

This function converts QC flag variables from string format to int8, handles NaN values appropriately, removes ‘QC_’ prefixes from flag meanings, and adds proper metadata including long_name and standard_name.

Parameters:

dsa – The dataset containing QC flag variables.
qc_name – The name of the QC flag variable to process.

Returns:

Dataset with converted QC flag variable and updated attributes.

Return type:

xarray.Dataset

Notes

Must be called after the main variable has been assigned its OG1 long_name.

seagliderOG1.tools.convert_units(ds: Dataset) → Dataset[source]

Convert the units of variables in an xarray Dataset to preferred units.

This is useful, for instance, to convert cm/s to m/s based on vocabulary specifications.

Parameters:: ds – The dataset containing variables to convert.
Returns:: The dataset with converted units.
Return type:: xarray.Dataset

seagliderOG1.tools.convert_units_var(var_values: ndarray, current_unit: str, new_unit: str, unit1_to_unit2: dict = {'Celsius_to_degreesCelsius': {'current_unit': 'Celsius', 'factor': 1, 'new_unit': 'degreesCelsius'}, 'Pa_to_dbar': {'current_unit': 'Pa', 'factor': 0.0001, 'new_unit': 'dbar'}, 'S m-1_to_mS cm-1': {'current_unit': 'S m-1', 'factor': 0.1, 'new_unit': 'mS cm-1'}, 'S/m_to_mS/cm': {'current_unit': 'S/m', 'factor': 0.1, 'new_unit': 'mS/cm'}, 'cm s-1_to_m s-1': {'current_unit': 'cm s-1', 'factor': 0.01, 'new_unit': 'm s-1'}, 'cm/s_to_m/s': {'current_unit': 'cm/s', 'factor': 0.01, 'new_unit': 'm/s'}, 'cm_to_m': {'current_unit': 'cm', 'factor': 0.01, 'new_unit': 'm'}, 'dbar_to_Pa': {'current_unit': 'dbar', 'factor': 10000, 'new_unit': 'Pa'}, 'dbar_to_kPa': {'current_unit': 'dbar', 'factor': 10, 'new_unit': 'kPa'}, 'degreesCelsius_to_Celsius': {'current_unit': 'degreesCelsius', 'factor': 1, 'new_unit': 'Celsius'}, 'g m-3_to_kg m-3': {'current_unit': 'g m-3', 'factor': 0.001, 'new_unit': 'kg m-3'}, 'g/m^3_to_kg/m^3': {'current_unit': 'g/m3', 'factor': 0.001, 'new_unit': 'kg/m3'}, 'kg m-3_to_g m-3': {'current_unit': 'kg m-3', 'factor': 1000, 'new_unit': 'g m-3'}, 'kg/m^3_to_g/m^3': {'current_unit': 'kg/m3', 'factor': 1000, 'new_unit': 'g/m3'}, 'km_to_m': {'current_unit': 'km', 'factor': 1000, 'new_unit': 'm'}, 'm s-1_to_cm s-1': {'current_unit': 'm s-1', 'factor': 100, 'new_unit': 'cm s-1'}, 'm/s_to_cm/s': {'current_unit': 'm/s', 'factor': 100, 'new_unit': 'cm/s'}, 'mS cm-1_to_S m-1': {'current_unit': 'mS cm-1', 'factor': 10, 'new_unit': 'S m-1'}, 'mS/cm_to_S/m': {'current_unit': 'mS/cm', 'factor': 10, 'new_unit': 'S/m'}, 'm_to_cm': {'current_unit': 'm', 'factor': 100, 'new_unit': 'cm'}, 'm_to_km': {'current_unit': 'm', 'factor': 0.001, 'new_unit': 'km'}}, firstrun: bool = False) → tuple[ndarray, str][source]

Convert the units of variables in an xarray Dataset to preferred units. This is useful, for instance, to convert cm/s to m/s.

Parameters:

(xarray.Dataset) (ds)
(list) (preferred_units)
(dict) (unit1_to_unit2)
string (Each key is a unit) –
- ‘factor’: The factor to multiply the variable by to convert it.
- ’units_name’: The new unit name after conversion.
with (and each value is a dictionary) –
- ‘factor’: The factor to multiply the variable by to convert it.
- ’units_name’: The new unit name after conversion.

Returns:

xarray.Dataset

Return type:

The dataset with converted units.

seagliderOG1.tools.encode_times(ds: Dataset) → Dataset[source]

Encode time variables with standard units and remove problematic attributes.

Parameters:: ds – Dataset containing time variables to encode.
Returns:: Dataset with properly encoded time variables.
Return type:: xarray.Dataset

seagliderOG1.tools.encode_times_og1(ds: Dataset) → Dataset[source]

Encode time variables according to OG1 format specifications.

Parameters:: ds – Dataset containing time variables to encode.
Returns:: Dataset with OG1-formatted time variables.
Return type:: xarray.Dataset

seagliderOG1.tools.extract_hdm_parameters(list_datasets)[source]

Extracts HDM parameters and their attributes from a list of datasets. If the parameter has the same value across all datasets, it keeps a single value; otherwise, it returns the full list.

Parameters:

list_datasets (list): List of xarray.Dataset objects. standard_names (dict): Vocabulary mapping internal names to standard names.

Returns:

dict: A nested dictionary where keys are standard names and values contain: the ‘data’ and ‘attributes’.

seagliderOG1.tools.find_best_dtype(var_name: str, da: DataArray) → type[source]

Determine the optimal data type for a variable based on its name and values.

Parameters:

var_name – The name of the variable.
da – The data array to analyze.

Returns:

The recommended numpy data type.

Return type:

type

Notes

Latitude/longitude variables use double precision
QC variables use int8
Time variables keep original dtype
Integer variables are downsized based on value range
Float64 variables are converted to float32

seagliderOG1.tools.gather_sensor_info(ds_other: Dataset, ds_sgcal: Dataset, firstrun: bool = False) → Dataset[source]

Gathers sensor information from the provided datasets and organizes it into a new dataset.

Parameters:

ds_other – The dataset containing sensor data.
ds_sgcal – The dataset containing calibration data.
firstrun – A flag indicating if this is the first run. Default is False.
optional – A flag indicating if this is the first run. Default is False.

Returns:

A dataset containing the gathered sensor information.

Return type:

xarray.Dataset

Notes

The function looks for specific sensor names in the ds_other dataset and adds them to a new dataset ds_sensor.
If ‘aanderaa4330_instrument_dissolved_oxygen’ is present in ds_other, it is renamed to ‘aa4330’.
If ‘Pcor’ is present in ds_sgcal, an additional sensor ‘sbe43’ is created based on ‘sbe41’ with specific attributes.
If ‘optode_FoilCoefA1’ is present in ds_sgcal, an additional sensor ‘aa4831’ is created based on ‘sbe41’ with specific attributes.
The function sets appropriate attributes for the sensors ‘aa4330’, ‘aa4831’, and ‘sbe43’ if they are present.

seagliderOG1.tools.get_sg_attrs(ds: Dataset) → dict[source]

Extract seaglider attributes and calibration information into a dictionary.

Parameters:: ds – Dataset containing seaglider attributes and calibration data.
Returns:: Dictionary containing seaglider variables and their attributes.
Return type:: dict

seagliderOG1.tools.merge_parts_of_dataset(ds: Dataset, dim1: str = 'sg_data_point', dim2: str = 'ctd_data_point') → Dataset[source]

Merges variables from a dataset along two dimensions, ensuring consistency in coordinates. The function first separates the dataset into two datasets based on the specified dimensions, renames the second dimension to match the first, and then concatenates them along the first dimension.

Missing time values are filled with NaN, and the final dataset is sorted by time.

Parameters:

ds (xarray.Dataset) – The input dataset containing both dimensions.
dim1 (str) – Primary dimension name (e.g., ‘sg_data_point’).
dim2 (str) – Secondary dimension name to be merged into dim1 (e.g., ‘ctd_data_point’).

Returns:

merged_ds – A merged dataset sorted by time.

Return type:

xarray.Dataset

Notes

Original author: Till Moritz

seagliderOG1.tools.reformat_units_str(old_unit: str, unit_format: dict = {'S/m': 'S m-1', 'cm/s': 'cm s-1', 'degreesCelsius': 'Celsius', 'degrees_Celsius': 'Celsius', 'g/kg': 'g kg-1', 'g/m^3': 'g m-3', 'kg/m^3': 'kg m-3', 'm/s': 'm s-1', 'mS/cm': 'mS cm-1', 'meters': 'm'}) → str[source]

Reformat a unit string based on the provided unit format dictionary.

Parameters:

old_unit – The original unit string to reformat.
unit_format – A dictionary mapping old unit strings to new formatted unit strings.
optional – A dictionary mapping old unit strings to new formatted unit strings.

Returns:

The reformatted unit string, or the original if no mapping exists.

Return type:

str

seagliderOG1.tools.reformat_units_var(ds: Dataset, var_name: str, unit_format: dict = {'S/m': 'S m-1', 'cm/s': 'cm s-1', 'degreesCelsius': 'Celsius', 'degrees_Celsius': 'Celsius', 'g/kg': 'g kg-1', 'g/m^3': 'g m-3', 'kg/m^3': 'kg m-3', 'm/s': 'm s-1', 'mS/cm': 'mS cm-1', 'meters': 'm'}) → str[source]

Rename units in the dataset based on the provided dictionary for OG1.

Parameters:

ds – The input dataset containing variables with units to be renamed.
var_name – The name of the variable whose units should be reformatted.
unit_format – A dictionary mapping old unit strings to new formatted unit strings.
optional – A dictionary mapping old unit strings to new formatted unit strings.

Returns:

The reformatted unit string.

Return type:

str

seagliderOG1.tools.set_best_dtype(ds: Dataset) → Dataset[source]

Optimize data types across all variables in the dataset to reduce memory usage.

Parameters:: ds – The dataset to optimize.
Returns:: Dataset with optimized data types and appropriate fill values.
Return type:: xarray.Dataset

seagliderOG1.tools.set_best_dtype_value(value, var_name: str)[source]

Determine the best data type for a single value based on its variable name and convert it.

Parameters:: value (any) – The input value to convert.
Returns:: converted_value – The value converted to the best data type.
Return type:: any

seagliderOG1.tools.set_fill_value(new_dtype: type) → int[source]

Calculate appropriate fill value for integer data types.

Parameters:: new_dtype – The target integer data type.
Returns:: The fill value calculated as 2^(bits-1) - 1.
Return type:: int

seagliderOG1.tools.split_by_unique_dims(ds: Dataset) → dict[source]

Splits an xarray dataset into multiple datasets based on the unique set of dimensions of the variables.

Parameters:: ds – The input xarray dataset containing various variables.
Returns:: A dictionary mapping dimension tuples to datasets, each with variables sharing the same set of dimensions.
Return type:: dict

seagliderOG1 API

convertOG1

vocabularies

readers

plotters

writers

tools

Parameters:

Returns:

Parameters:

Returns:

utilities

`seagliderOG1 API`