Data - Bathymetry & Pangaea
cruiseplan.data package.
This package contains data management and external data integration modules:
bathymetry: Bathymetry data handling using ETOPO datasets with interpolation
cache: File-based caching system for expensive computations and downloads
pangaea: Integration with the PANGAEA data repository for oceanographic data
These modules handle external data sources, caching strategies, and data processing required for cruise planning, including bathymetric information and scientific datasets.
cruiseplan.data.bathymetry module
Bathymetry data download and management.
This module provides functionality for downloading, caching, and accessing bathymetry data from ETOPO datasets for depth lookups.
- class cruiseplan.data.bathymetry.BathymetryManager(source: str = 'etopo2022', data_dir: str = 'data/bathymetry')[source]
Bases: object
Handles ETOPO bathymetry data with lazy loading and bilinear interpolation.
Manages bathymetric data from ETOPO datasets, providing depth lookups and grid subsets for oceanographic applications. Implements fallback to mock data when bathymetry files are unavailable.
- source
Bathymetry data source identifier.
- Type:
str
- data_dir
Directory containing bathymetry data files.
- Type:
Path
- _is_mock
Whether the manager is operating in mock mode.
- Type:
bool
- _dataset
NetCDF dataset object when loaded.
- Type:
Optional[nc.Dataset]
- _lats
Latitude coordinate array.
- Type:
Optional[np.ndarray]
- _lons
Longitude coordinate array.
- Type:
Optional[np.ndarray]
- __init__(source: str = 'etopo2022', data_dir: str = 'data/bathymetry')[source]
Initialize the bathymetry manager.
- Parameters:
source (str, optional) – Bathymetry data source (default: “etopo2022”).
data_dir (str, optional) – Data directory. Can be absolute path or relative to current working directory (default: “data/bathymetry”).
- close()[source]
Close the NetCDF dataset if open.
Should be called when the manager is no longer needed, to free resources.
- ensure_gebco_2025(silent_if_exists=False)[source]
Ensure GEBCO 2025 dataset is available for high-resolution bathymetry.
Downloads and extracts the GEBCO 2025 data if it is not present or is corrupted. The full dataset is ~7.5 GB uncompressed and is downloaded as a ~4 GB zip archive.
- Parameters:
silent_if_exists (bool, optional) – If True, don’t log when file already exists (default: False).
- Returns:
True if GEBCO 2025 is available, False if download failed or cancelled.
- Return type:
bool
- get_depth_at_point(lat: float, lon: float) float[source]
Get depth at a specific geographic point.
Returns depth in meters (negative values indicate depth below sea level). Uses bilinear interpolation on the ETOPO grid for accurate results.
- Parameters:
lat (float) – Latitude in decimal degrees.
lon (float) – Longitude in decimal degrees.
- Returns:
Depth in meters (negative for below sea level).
- Return type:
float
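The bilinear interpolation behind point lookups can be illustrated with a self-contained sketch. This is an illustrative toy, not the module's actual implementation, which operates on the NetCDF-backed ETOPO grid:

```python
import bisect

def bilinear_depth(lat, lon, lats, lons, depths):
    """Bilinearly interpolate a depth value from a regular grid.

    lats/lons are ascending 1-D coordinate lists; depths is a 2-D
    list indexed as depths[lat_index][lon_index].
    """
    # Find the grid cell containing the point (clamped to the grid edges).
    i = max(0, min(bisect.bisect_right(lats, lat) - 1, len(lats) - 2))
    j = max(0, min(bisect.bisect_right(lons, lon) - 1, len(lons) - 2))

    # Fractional position of the point inside the cell.
    t = (lat - lats[i]) / (lats[i + 1] - lats[i])
    u = (lon - lons[j]) / (lons[j + 1] - lons[j])

    # Weighted average of the four surrounding grid nodes.
    return (
        depths[i][j] * (1 - t) * (1 - u)
        + depths[i + 1][j] * t * (1 - u)
        + depths[i][j + 1] * (1 - t) * u
        + depths[i + 1][j + 1] * t * u
    )

# Midpoint of a cell with corner depths -100, -200, -300, -400 m:
print(bilinear_depth(0.5, 0.5, [0.0, 1.0], [0.0, 1.0],
                     [[-100.0, -200.0], [-300.0, -400.0]]))  # → -250.0
```

At the cell midpoint the result is simply the mean of the four corners, which is why interpolated lookups are smoother than nearest-node sampling.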
- get_grid_subset(lat_min, lat_max, lon_min, lon_max, stride=1)[source]
Get a subset of the bathymetry grid for contour plotting.
Returns 2D arrays suitable for matplotlib contour plotting. Supports downsampling with stride parameter for performance.
- Parameters:
lat_min (float) – Minimum latitude of the subset.
lat_max (float) – Maximum latitude of the subset.
lon_min (float) – Minimum longitude of the subset.
lon_max (float) – Maximum longitude of the subset.
stride (int, optional) – Downsampling factor (default: 1, no downsampling).
- Returns:
Tuple of (lons, lats, depths) as 2D numpy arrays for contour plotting.
- Return type:
tuple
Notes
This method performs expensive NetCDF grid slicing operations. Consider using cruiseplan.utils.cache.CacheManager for repeated grid subset requests with overlapping geographic bounds, especially in interactive applications like station_picker and map generation workflows.
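The stride semantics can be shown with a small sketch. The real method slices a NetCDF variable; this toy version over nested lists only illustrates what "every stride-th row and column" means:

```python
def downsample(grid, stride=1):
    """Downsample a 2-D grid (list of rows) by keeping every
    `stride`-th row and column, mirroring the stride parameter
    of get_grid_subset."""
    return [row[::stride] for row in grid[::stride]]

grid = [[r * 10 + c for c in range(6)] for r in range(6)]  # 6x6 grid
small = downsample(grid, stride=2)                         # 3x3 grid
print(len(small), len(small[0]))  # → 3 3
```

A stride of 2 quarters the number of grid nodes, which is usually an acceptable trade-off for coarse overview maps while keeping full resolution for detailed plots.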
- cruiseplan.data.bathymetry.check_bathymetry_availability(source: str) bool[source]
Check if bathymetry files are available for the specified source.
- Parameters:
source (str) – Bathymetry source (“etopo2022” or “gebco2025”).
- Returns:
True if bathymetry files are available and valid, False otherwise.
- Return type:
bool
- cruiseplan.data.bathymetry.determine_bathymetry_source(requested_source: str) str[source]
Determine the optimal bathymetry source with automatic fallback.
If the requested source is not available but an alternative is, the function automatically switches to the available source.
- Parameters:
requested_source (str) – The user’s requested bathymetry source.
- Returns:
The optimal available bathymetry source.
- Return type:
str
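The fallback policy can be sketched as follows. The helper name, the availability set, and the preference order are assumptions for illustration; the real function consults check_bathymetry_availability for each source:

```python
def pick_source(requested, available):
    """Illustrative fallback: return the requested source if its files
    are present, otherwise the first available alternative."""
    if requested in available:
        return requested
    for alt in ("etopo2022", "gebco2025"):
        if alt in available:
            return alt
    return requested  # nothing available; caller may trigger a download

# GEBCO requested but only ETOPO files on disk: fall back to ETOPO.
print(pick_source("gebco2025", {"etopo2022"}))  # → etopo2022
```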
- cruiseplan.data.bathymetry.download_bathymetry(target_dir: str = 'data/bathymetry', source: str = 'etopo2022') bool[source]
Download bathymetry dataset with progress bar.
Downloads either ETOPO 2022 (60 arc-second resolution) or GEBCO 2025 (15 arc-second resolution) bathymetry data, depending on the source parameter.
- Parameters:
target_dir (str, optional) – Target directory for bathymetry files (default: “data/bathymetry”). Files will be saved directly in this directory.
source (str, optional) – Bathymetry source to download (default: “etopo2022”). Options: “etopo2022”, “gebco2025”.
- Returns:
Returns the file path (str) if the download succeeded or the file already exists; False if the download was explicitly cancelled; None if the download failed completely.
- Return type:
str, bool, or None
cruiseplan.data.pangaea module
PANGAEA database integration and data retrieval.
This module provides functionality for searching, downloading, and processing oceanographic datasets from the PANGAEA data repository.
- class cruiseplan.data.pangaea.PangaeaManager(cache_dir: str = '.cache')[source]
Bases: object
Manager for PANGAEA dataset search and retrieval.
Provides functionality to search PANGAEA datasets using spatial queries, fetch metadata and coordinate data, and cache results for performance. Integrates with the cruise planning system for incorporating existing cruise tracks into planning workflows.
- cache
File-based cache for storing fetched dataset metadata.
- __init__(cache_dir: str = '.cache')[source]
Initialize PANGAEA manager.
- Parameters:
cache_dir (str, optional) – Directory for cache storage (default: “.cache”).
- create_map(datasets: list[dict[str, Any]], filename: str = 'pangaea_map.html') Path[source]
Convenience wrapper to visualize datasets fetched by this manager.
- Parameters:
datasets (List[Dict[str, Any]]) – List of dataset dictionaries with latitude/longitude data.
filename (str, optional) – Output filename for the HTML map (default: “pangaea_map.html”).
- Returns:
Path to the generated HTML map file.
- Return type:
Path
- fetch_datasets(doi_list: list[str], rate_limit: float | None = None, merge_campaigns: bool = False, progress_callback: Callable[[int, int, str], None] | None = None) list[dict[str, Any]][source]
Process a list of DOIs and return standardized metadata objects.
- Parameters:
doi_list (List[str]) – List of DOI strings to fetch.
rate_limit (Optional[float], optional) – Optional requests per second limit (None = no rate limiting).
merge_campaigns (bool, optional) – Whether to merge campaigns with same name (default: False).
progress_callback (Optional[Callable[[int, int, str], None]], optional) – Optional function(current, total, message) for progress updates.
- Returns:
List of dataset dictionaries with standardized metadata.
- Return type:
List[Dict[str, Any]]
- search(query: str, bbox: tuple | None = None, limit: int = 10) list[dict[str, Any]][source]
Search PANGAEA using the native PanQuery bbox support.
- Parameters:
query (str) – Search query string for PANGAEA datasets.
bbox (tuple, optional) – Bounding box as (min_lon, min_lat, max_lon, max_lat).
limit (int, optional) – Maximum number of results to return (default: 10).
- Returns:
List of dataset metadata dictionaries.
- Return type:
List[Dict[str, Any]]
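Note the bbox ordering (min_lon, min_lat, max_lon, max_lat), with longitude first. A small illustrative check, not part of the package, makes the convention concrete:

```python
def in_bbox(lat, lon, bbox):
    """Check whether a point falls inside a (min_lon, min_lat,
    max_lon, max_lat) bounding box, the ordering search() expects."""
    min_lon, min_lat, max_lon, max_lat = bbox
    return min_lat <= lat <= max_lat and min_lon <= lon <= max_lon

# Hypothetical study box: lon -10..10, lat 76..81
box = (-10.0, 76.0, 10.0, 81.0)
print(in_bbox(78.5, 2.0, box))  # → True
print(in_bbox(70.0, 2.0, box))  # → False
```

Mixing up the (lon, lat) ordering with the more common (lat, lon) convention silently returns wrong or empty search results, so it is worth double-checking when constructing the tuple.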
- cruiseplan.data.pangaea.load_campaign_data(file_path: str | Path, merge_tracks: bool = True) list[dict][source]
Load pre-processed PANGAEA campaign data from pickle file.
This function is required by the CLI to integrate with the interactive picker. If merge_tracks is True, it ensures that all datasets with the same label are combined into a single track before being returned.
- Parameters:
file_path (Union[str, Path]) – Path to the pickled PANGAEA campaign file.
merge_tracks (bool) – If True, runs merge_campaign_tracks on the loaded data. (Default: True)
- Returns:
List of campaign datasets (merged if requested).
- Return type:
List[Dict]
- cruiseplan.data.pangaea.merge_campaign_tracks(datasets: list[dict]) list[dict][source]
Merge datasets by their ‘label’ (campaign name).
Aggregates coordinates into single arrays and collects all source DOIs.
- Parameters:
datasets (List[Dict]) – List of dataset dictionaries to merge.
- Returns:
Merged campaign datasets with combined coordinates.
- Return type:
List[Dict]
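The merge step can be sketched with a minimal stand-in. The field names (lats, lons, doi) are assumptions for illustration; the actual dataset dictionaries produced by fetch_datasets may use a different schema:

```python
def merge_by_label(datasets):
    """Illustrative merge: combine datasets sharing a 'label' into one
    record, concatenating coordinates and collecting source DOIs."""
    merged = {}
    for ds in datasets:
        key = ds["label"]
        if key not in merged:
            merged[key] = {"label": key, "lats": [], "lons": [], "dois": []}
        merged[key]["lats"].extend(ds["lats"])
        merged[key]["lons"].extend(ds["lons"])
        merged[key]["dois"].append(ds["doi"])
    return list(merged.values())

# Two datasets from the same (hypothetical) campaign collapse into one track.
tracks = merge_by_label([
    {"label": "MSM21", "lats": [75.0], "lons": [0.0], "doi": "10.1594/a"},
    {"label": "MSM21", "lats": [76.0], "lons": [1.0], "doi": "10.1594/b"},
])
print(len(tracks), tracks[0]["dois"])  # → 1 ['10.1594/a', '10.1594/b']
```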
- cruiseplan.data.pangaea.read_doi_list(file_path: str | Path) list[str][source]
Read a DOI list from a text file, filtering out comments and empty lines.
- Parameters:
file_path (Union[str, Path]) – Path to the DOI list file.
- Returns:
List of DOI strings.
- Return type:
list[str]
- Raises:
ValueError – If the file cannot be read or contains no valid DOIs.
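The expected file format can be illustrated with a small parser sketch. The ‘#’ comment marker is an assumption, since the docstring only says comments are filtered; the real function also handles file I/O errors:

```python
def parse_doi_lines(text):
    """Illustrative parser matching read_doi_list's contract: skip
    blank lines and comment lines, raise ValueError when nothing
    usable remains."""
    dois = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):  # '#' comments assumed
            dois.append(line)
    if not dois:
        raise ValueError("no valid DOIs found")
    return dois

sample = """# DOIs for the planning run
10.1594/PANGAEA.123456

10.1594/PANGAEA.654321
"""
print(parse_doi_lines(sample))
```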
- cruiseplan.data.pangaea.save_campaign_data(datasets: list[dict], file_path: str | Path, progress_callback: Callable[[str], None] | None = None, original_dataset_count: int | None = None) None[source]
Save PANGAEA datasets to pickle file.
- Parameters:
datasets (List[Dict]) – List of dataset dictionaries to save.
file_path (Union[str, Path]) – Output file path for the pickle file.
progress_callback (Optional[Callable[[str], None]], optional) – Optional function(message) for progress updates.
original_dataset_count (Optional[int], optional) – Optional count of datasets before merging for summary.
- Raises:
ValueError – If there’s an error saving the file.