Coding Conventions

This document outlines the coding standards and conventions for the seagliderOG1 project.

Code Style

Python Code Formatting

We use Black for consistent code formatting:

black .

Key formatting rules:

  • Line length: 88 characters (Black default)

  • Use double quotes for strings

  • Consistent indentation (4 spaces)

  • Trailing commas in multi-line structures
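As a rough illustration of these rules, the snippet below is laid out the way Black's defaults would be expected to format it. The names `SENSOR_UNITS` and `describe_sensor` are hypothetical and are not part of seagliderOG1.

```python
# Hypothetical example; names are for illustration only.
SENSOR_UNITS = {
    "temperature": "degree_Celsius",
    "pressure": "dbar",
    "salinity": "1",  # trailing comma: adding an entry later touches one line
}


def describe_sensor(name):
    """Return a short label such as 'pressure [dbar]'."""
    # Double quotes and 4-space indentation, as listed above.
    return f"{name} [{SENSOR_UNITS[name]}]"
```

Running `black .` on code that is already laid out like this should leave it unchanged.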

Linting

We use Ruff for fast Python linting:

ruff check --fix  # Auto-fix issues where possible
ruff check        # Check without fixing

Enabled rule categories:

  • E, W - pycodestyle errors and warnings

  • F - pyflakes errors

  • D - pydocstyle (documentation)

  • ANN - type annotations

  • B - flake8-bugbear

  • C90 - mccabe complexity

  • TRY - tryceratops

  • ARG - flake8-unused-arguments

  • SLF - flake8-self
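To make a few of these categories concrete, here is a minimal sketch of a function written with the D, ANN, and B rules in mind. The function `append_contributor` is hypothetical, and the sketch is not guaranteed to pass every enabled rule under the project's exact configuration.

```python
from __future__ import annotations


def append_contributor(
    name: str,
    contributors: list[str] | None = None,
) -> list[str]:
    """Return a new contributor list with ``name`` added.

    Parameters
    ----------
    name : str
        Contributor to add.
    contributors : list[str] | None, optional
        Existing contributors. Default is None, meaning an empty list.

    Returns
    -------
    list[str]
        A new list; the input list is not modified.
    """
    # flake8-bugbear (B006): a mutable default such as ``contributors=[]``
    # would be shared between calls, so None is used as the sentinel.
    existing = [] if contributors is None else list(contributors)
    existing.append(name)
    return existing
```

The annotations address the ANN rules, the docstring addresses the D rules, and the `None` sentinel avoids the shared-default pitfall that flake8-bugbear reports.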

Pre-commit Hooks

Run before every commit:

pre-commit run --all-files

This automatically runs:

  • Black formatting

  • Ruff linting with auto-fix

  • Codespell for typos

  • Basic file checks (trailing whitespace, end-of-file, YAML validation)

  • pytest tests

Code Organization

File Structure

Follow the established package structure:

seagliderOG1/
├── readers.py      # Data input/reading functions
├── writers.py      # Data output/writing functions
├── convertOG1.py   # Main conversion logic
├── tools.py        # User-facing utility functions
├── utilities.py    # Internal helper functions
├── vocabularies.py # OG1 vocabulary mappings
└── plotters.py     # Visualization functions

Import Organization

Order imports following PEP 8:

# Standard library
import logging
import os
from datetime import datetime

# Third-party packages
import numpy as np
import pandas as pd
import xarray as xr

# Local imports
from seagliderOG1 import utilities, vocabularies

Documentation

Docstrings

Use NumPy-style docstrings for all public functions:

def convert_to_OG1(list_of_datasets, contrib_to_append=None):
    """
    Process datasets and convert them to OG1 format.

    Parameters
    ----------
    list_of_datasets : list[xr.Dataset] | xr.Dataset
        A list of xarray datasets or a single dataset in basestation format.
    contrib_to_append : dict[str, str] | None, optional
        Dictionary containing additional contributor information. Default is None.

    Returns
    -------
    tuple[xr.Dataset, list[str]]
        A tuple containing:
        - ds_og1 : xr.Dataset
            The processed dataset in OG1 format.
        - varlist : list[str]
            A list of variable names from input datasets.

    Raises
    ------
    ValueError
        If input datasets are invalid.

    Examples
    --------
    >>> datasets = readers.load_basestation_files("path/to/files/")
    >>> og1_data, variables = convert_to_OG1(datasets)
    """

Comments

  • Use comments sparingly; reserve them for complex logic

  • Prefer self-documenting code with clear variable names

  • Add TODO comments for future improvements

Variable Naming

Conventions

  • Functions and variables: snake_case

  • Constants: UPPER_CASE

  • Classes: PascalCase (when needed)

  • Private functions: _leading_underscore

Dataset Variables

Follow OG1 naming conventions:

  • Time coordinates: TIME, TIME_GPS

  • Spatial coordinates: LATITUDE, LONGITUDE, DEPTH

  • Measurements: Descriptive names in UPPER_CASE

  • QC flags: Append _QC to variable name
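The `_QC` suffix rule is simple enough to capture in a small helper. The function below is a hypothetical sketch, not an existing seagliderOG1 function, and its UPPER_CASE check is an assumption based on the list above.

```python
def qc_variable_name(variable_name: str) -> str:
    """Return the QC flag name for a data variable, e.g. TEMP -> TEMP_QC."""
    # Assumption: data variables are UPPER_CASE, as listed above.
    if not variable_name.isupper():
        raise ValueError(f"Expected an UPPER_CASE name, got {variable_name!r}")
    return f"{variable_name}_QC"
```

For example, `qc_variable_name("TEMP")` gives `"TEMP_QC"`, while a lower-case name such as `"temp"` raises a `ValueError`.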

Descriptive Names

Prefer descriptive names over abbreviations:

# Good
temperature_data = ds["TEMP"]
pressure_data = ds["PRES"]
dive_number = ds.attrs["dive_number"]

# Avoid
temp = ds["TEMP"]
p = ds["PRES"]
dn = ds.attrs["dive_number"]

Testing

Test Structure

  • Place tests in the tests/ directory

  • Mirror package structure: test_module.py for each module.py

  • Use pytest for all testing

Test Naming

def test_convert_to_og1_single_dataset():
    """Test conversion with a single input dataset."""
    pass

def test_convert_to_og1_multiple_datasets():
    """Test conversion with multiple input datasets."""
    pass

Coverage

  • Aim to cover every public function with at least one test

  • Test both success and failure cases

  • Include edge cases and boundary conditions
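A minimal sketch of what success, failure, and boundary tests might look like with pytest follows. The function under test, `parse_dive_number`, is hypothetical and is defined inline only so the example is self-contained.

```python
import pytest


def parse_dive_number(text: str) -> int:
    """Hypothetical function under test: parse a non-negative dive number."""
    value = int(text)
    if value < 0:
        raise ValueError(f"Dive number must be non-negative, got {value}")
    return value


def test_parse_dive_number_valid():
    """Success case."""
    assert parse_dive_number("42") == 42


def test_parse_dive_number_zero():
    """Boundary case: zero is the smallest valid dive number."""
    assert parse_dive_number("0") == 0


def test_parse_dive_number_negative_raises():
    """Failure case: negative values are rejected."""
    with pytest.raises(ValueError, match="non-negative"):
        parse_dive_number("-1")
```

In a real module the function would live in the package and only the tests would sit in tests/.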

Data Handling

xarray Best Practices

  • Use descriptive coordinate and dimension names

  • Set appropriate attributes for variables

  • Handle missing data consistently (use NaN)

  • Call .compute() on dask-backed arrays only when the values are actually needed in memory

File I/O

  • Use context managers for file operations

  • Handle file paths consistently (use pathlib when appropriate)

  • Provide informative error messages for I/O failures
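The three points above can be combined in one short function. This is a sketch only; `read_config_text` is a hypothetical helper and not part of the package.

```python
from __future__ import annotations

from pathlib import Path


def read_config_text(path: str | Path) -> str:
    """Read a text file, raising an informative error if it is missing."""
    path = Path(path)  # accept both str and Path inputs
    if not path.is_file():
        # Include the resolved path so the user can see where we looked.
        raise FileNotFoundError(f"Config file not found: {path.resolve()}")
    # The context manager closes the file even if reading fails.
    with path.open(encoding="utf-8") as handle:
        return handle.read()
```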

Error Handling

Logging

Use the standard logging module:

import logging

_log = logging.getLogger(__name__)

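def process_data(data):
    _log.info("Starting data processing")
    try:
        # expensive_operation is a placeholder for the real work
        result = expensive_operation(data)
    except Exception:
        # _log.exception logs at ERROR level and includes the traceback
        _log.exception("Failed to process data")
        raise
    # Lazy %-style arguments are only formatted if DEBUG is enabled
    _log.debug("Processed %d items", len(result))
    return result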

Exception Handling

  • Use specific exception types when possible

  • Provide helpful error messages

  • Log errors appropriately

  • Don’t suppress exceptions without good reason
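One way to apply these points is to catch a narrow exception and re-raise a more specific one with a message that helps the user. The sketch below assumes a hypothetical `MissingVariableError` class and `get_required` helper; neither exists in seagliderOG1.

```python
class MissingVariableError(LookupError):
    """Hypothetical error raised when a required variable is absent."""


def get_required(variables: dict, name: str):
    """Return variables[name], or raise an error listing what is available."""
    try:
        return variables[name]
    except KeyError as err:  # catch only the specific failure we expect
        available = ", ".join(sorted(variables)) or "none"
        message = f"Required variable {name!r} not found (available: {available})"
        # "from err" keeps the original KeyError attached as the cause.
        raise MissingVariableError(message) from err
```

Chaining with `from err` preserves the original traceback, so the exception is made more informative without being suppressed.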

Performance

Efficiency Guidelines

  • Use vectorized operations with NumPy/pandas

  • Avoid unnecessary data copying

  • Choose data types deliberately (e.g., float32 halves memory use relative to float64 but reduces precision)

  • Consider memory usage for large datasets

Memory Management

  • Use generators for large data processing

  • Clear large variables when no longer needed

  • Be mindful of xarray lazy loading
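As a sketch of the generator guideline, the example below processes values in fixed-size chunks so that only one chunk is held in memory at a time. The helpers `iter_chunks` and `mean_in_chunks` are hypothetical, and the default chunk size is an arbitrary assumption.

```python
from __future__ import annotations

from collections.abc import Iterable, Iterator


def iter_chunks(values: Iterable[float], chunk_size: int) -> Iterator[list[float]]:
    """Yield successive lists of at most ``chunk_size`` values."""
    if chunk_size < 1:
        raise ValueError(f"chunk_size must be positive, got {chunk_size}")
    chunk: list[float] = []
    for value in values:
        chunk.append(value)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []  # start a fresh list; the old one can be freed
    if chunk:
        yield chunk  # final, possibly shorter, chunk


def mean_in_chunks(values: Iterable[float], chunk_size: int = 1000) -> float:
    """Compute a mean without materializing all values at once."""
    total = 0.0
    count = 0
    for chunk in iter_chunks(values, chunk_size):
        total += sum(chunk)
        count += len(chunk)
    return total / count if count else float("nan")
```

Because `iter_chunks` is a generator, the `chunk_size` check runs when iteration starts, not when the function is called.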

Version Control

Commit Messages

Use clear, descriptive commit messages:

[FEAT] Add GPS coordinate interpolation
[FIX] Handle missing pressure data in dive profiles
[DOC] Update installation instructions
[TEST] Add tests for vocabulary mapping
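If it helps, the prefix format can be checked mechanically. The sketch below is a hypothetical helper, not an existing project hook, and it assumes that only the four tags shown above are accepted.

```python
import re

# Assumption: only the four tags shown above are valid.
COMMIT_PATTERN = re.compile(r"^\[(FEAT|FIX|DOC|TEST)\] \S.*$")


def is_valid_commit_subject(subject: str) -> bool:
    """Return True if the subject line starts with a known [TAG] prefix."""
    return COMMIT_PATTERN.match(subject) is not None
```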

Branch Naming

  • Feature branches: feature/description

  • Bug fixes: fix/description

  • Documentation: docs/description

Configuration

YAML Files

  • Use consistent indentation (2 spaces)

  • Include comments for complex configurations

  • Validate YAML syntax

  • Follow the existing structure in the seagliderOG1/config/ directory

Dependencies

Adding New Dependencies

  • Add it to the appropriate requirements file:

    • requirements.txt - Runtime dependencies

    • requirements-dev.txt - Development tools

  • Use version constraints appropriately

  • Document why new dependencies are needed

  • Prefer established, well-maintained packages