Data - Bathymetry & Pangaea
cruiseplan.data package.
This package contains data management and external data integration modules:
bathymetry: Bathymetry data handling using ETOPO datasets with interpolation
cache: File-based caching system for expensive computations and downloads
pangaea: Integration with the PANGAEA data repository for oceanographic data
These modules handle external data sources, caching strategies, and data processing required for cruise planning, including bathymetric information and scientific datasets.
cruiseplan.data.bathymetry module
Bathymetry data download and management.
This module provides functionality for downloading, caching, and accessing bathymetry data from ETOPO datasets for depth lookups.
- class cruiseplan.data.bathymetry.BathymetryManager(source: str = 'etopo2022', data_dir: str = 'data/bathymetry')[source]
Bases: object
Handles ETOPO bathymetry data with lazy loading and bilinear interpolation.
Manages bathymetric data from ETOPO datasets, providing depth lookups and grid subsets for oceanographic applications. Implements fallback to mock data when bathymetry files are unavailable.
- source
Bathymetry data source identifier.
- Type:
str
- data_dir
Directory containing bathymetry data files.
- Type:
Path
- _is_mock
Whether the manager is operating in mock mode.
- Type:
bool
- _dataset
NetCDF dataset object when loaded.
- Type:
Optional[nc.Dataset]
- _lats
Latitude coordinate array.
- Type:
Optional[np.ndarray]
- _lons
Longitude coordinate array.
- Type:
Optional[np.ndarray]
- __init__(source: str = 'etopo2022', data_dir: str = 'data/bathymetry')[source]
Initialize the bathymetry manager.
- Parameters:
source (str, optional) – Bathymetry data source (default: “etopo2022”).
data_dir (str, optional) – Data directory. Can be absolute path or relative to current working directory (default: “data/bathymetry”).
- close()[source]
Close the NetCDF dataset if open.
Should be called when the manager is no longer needed, to free resources.
- ensure_gebco_2025(silent_if_exists=False)[source]
Ensure GEBCO 2025 dataset is available for high-resolution bathymetry.
Downloads and extracts the GEBCO 2025 data if it is not present or is corrupted. The full dataset is ~7.5 GB uncompressed and is downloaded as a ~4 GB zip archive.
- Parameters:
silent_if_exists (bool, optional) – If True, don’t log when file already exists (default: False).
- Returns:
True if GEBCO 2025 is available, False if download failed or cancelled.
- Return type:
bool
- get_depth_at_point(lat: float, lon: float) float[source]
Get depth at a specific geographic point.
Returns depth in meters (negative values indicate depth below sea level). Uses bilinear interpolation on the ETOPO grid for accurate results.
- Parameters:
lat (float) – Latitude in decimal degrees.
lon (float) – Longitude in decimal degrees.
- Returns:
Depth in meters (negative for below sea level).
- Return type:
float
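The bilinear interpolation behind point lookups can be illustrated with a self-contained sketch. This is an illustrative toy, not the module's actual implementation, which operates on the NetCDF-backed ETOPO grid:

```python
import bisect

def bilinear_depth(lat, lon, lats, lons, depths):
    """Bilinearly interpolate a depth value from a regular grid.

    lats/lons are ascending 1-D coordinate lists; depths is a 2-D
    list indexed as depths[lat_index][lon_index].
    """
    # Find the grid cell containing the point (clamped to the grid edges).
    i = max(0, min(bisect.bisect_right(lats, lat) - 1, len(lats) - 2))
    j = max(0, min(bisect.bisect_right(lons, lon) - 1, len(lons) - 2))

    # Fractional position of the point inside the cell.
    t = (lat - lats[i]) / (lats[i + 1] - lats[i])
    u = (lon - lons[j]) / (lons[j + 1] - lons[j])

    # Weighted average of the four surrounding grid nodes.
    return (
        depths[i][j] * (1 - t) * (1 - u)
        + depths[i + 1][j] * t * (1 - u)
        + depths[i][j + 1] * (1 - t) * u
        + depths[i + 1][j + 1] * t * u
    )

# Midpoint of a cell with corner depths -100, -200, -300, -400 m:
print(bilinear_depth(0.5, 0.5, [0.0, 1.0], [0.0, 1.0],
                     [[-100.0, -200.0], [-300.0, -400.0]]))  # → -250.0
```

At the cell midpoint the result is simply the mean of the four corners, which is why interpolated lookups are smoother than nearest-node sampling.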
- get_grid_subset(lat_min, lat_max, lon_min, lon_max, stride=1)[source]
Get a subset of the bathymetry grid for contour plotting.
Returns 2D arrays suitable for matplotlib contour plotting. Supports downsampling with stride parameter for performance.
- Parameters:
lat_min (float) – Minimum latitude of the subset.
lat_max (float) – Maximum latitude of the subset.
lon_min (float) – Minimum longitude of the subset.
lon_max (float) – Maximum longitude of the subset.
stride (int, optional) – Downsampling factor (default: 1, no downsampling).
- Returns:
Tuple of (lons, lats, depths) as 2D numpy arrays for contour plotting.
- Return type:
tuple
Notes
This method performs expensive NetCDF grid slicing operations. Consider using cruiseplan.utils.cache.CacheManager for repeated grid subset requests with overlapping geographic bounds, especially in interactive applications like station_picker and map generation workflows.
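The stride semantics can be shown with a small sketch. The real method slices a NetCDF variable; this toy version over nested lists only illustrates what "every stride-th row and column" means:

```python
def downsample(grid, stride=1):
    """Downsample a 2-D grid (list of rows) by keeping every
    `stride`-th row and column, mirroring the stride parameter
    of get_grid_subset."""
    return [row[::stride] for row in grid[::stride]]

grid = [[r * 10 + c for c in range(6)] for r in range(6)]  # 6x6 grid
small = downsample(grid, stride=2)                         # 3x3 grid
print(len(small), len(small[0]))  # → 3 3
```

A stride of 2 quarters the number of grid nodes, which is usually an acceptable trade-off for coarse overview maps while keeping full resolution for detailed plots.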
- cruiseplan.data.bathymetry.check_bathymetry_availability(source: str) bool[source]
Check if bathymetry files are available for the specified source.
- Parameters:
source (str) – Bathymetry source (“etopo2022” or “gebco2025”).
- Returns:
True if bathymetry files are available and valid, False otherwise.
- Return type:
bool
- cruiseplan.data.bathymetry.determine_bathymetry_source(requested_source: str) str[source]
Determine the optimal bathymetry source with automatic fallback.
If the requested source is not available but an alternative is, the function automatically switches to the available source.
- Parameters:
requested_source (str) – The user’s requested bathymetry source.
- Returns:
The optimal available bathymetry source.
- Return type:
str
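The fallback policy can be sketched as follows. The helper name, the availability set, and the preference order are assumptions for illustration; the real function consults check_bathymetry_availability for each source:

```python
def pick_source(requested, available):
    """Illustrative fallback: return the requested source if its files
    are present, otherwise the first available alternative."""
    if requested in available:
        return requested
    for alt in ("etopo2022", "gebco2025"):
        if alt in available:
            return alt
    return requested  # nothing available; caller may trigger a download

# GEBCO requested but only ETOPO files on disk: fall back to ETOPO.
print(pick_source("gebco2025", {"etopo2022"}))  # → etopo2022
```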
- cruiseplan.data.bathymetry.download_bathymetry(target_dir: str = 'data/bathymetry', source: str = 'etopo2022') bool[source]
Download bathymetry dataset with progress bar.
Downloads either ETOPO 2022 (60 arc-second resolution) or GEBCO 2025 (15 arc-second resolution) bathymetry data, depending on the source parameter.
- Parameters:
target_dir (str, optional) – Target directory for bathymetry files (default: “data/bathymetry”). Files will be saved directly in this directory.
source (str, optional) – Bathymetry source to download (default: “etopo2022”). Options: “etopo2022”, “gebco2025”.
- Returns:
Returns the file path (str) if the download succeeded or the file already exists; False if the download was explicitly cancelled; None if the download failed completely.
- Return type:
str, bool, or None
cruiseplan.data.pangaea module
PANGAEA database integration and data retrieval.
This module provides functionality for searching, downloading, and processing oceanographic datasets from the PANGAEA data repository.
- class cruiseplan.data.pangaea.PangaeaManager(cache_dir: str = '.cache')[source]
Bases: object
Manager for PANGAEA dataset search and retrieval.
Provides functionality to search PANGAEA datasets using spatial queries, fetch metadata and coordinate data, and cache results for performance. Integrates with the cruise planning system for incorporating existing cruise tracks into planning workflows.
- cache
File-based cache for storing fetched dataset metadata.
- __init__(cache_dir: str = '.cache')[source]
Initialize PANGAEA manager.
- Parameters:
cache_dir (str, optional) – Directory for cache storage (default: “.cache”).
- create_map(datasets: list[dict[str, Any]], filename: str = 'pangaea_map.html') Path[source]
Convenience wrapper to visualize datasets fetched by this manager.
- Parameters:
datasets (List[Dict[str, Any]]) – List of dataset dictionaries with latitude/longitude data.
filename (str, optional) – Output filename for the HTML map (default: “pangaea_map.html”).
- Returns:
Path to the generated HTML map file.
- Return type:
Path
- fetch_datasets(doi_list: list[str], rate_limit: float | None = None, merge_campaigns: bool = False, progress_callback: Callable[[int, int, str], None] | None = None) list[dict[str, Any]][source]
Process a list of DOIs and return standardized metadata objects.
- Parameters:
doi_list (List[str]) – List of DOI strings to fetch.
rate_limit (Optional[float], optional) – Optional requests per second limit (None = no rate limiting).
merge_campaigns (bool, optional) – Whether to merge campaigns with same name (default: False).
progress_callback (Optional[Callable[[int, int, str], None]], optional) – Optional function(current, total, message) for progress updates.
- Returns:
List of dataset dictionaries with standardized metadata.
- Return type:
List[Dict[str, Any]]
- search(query: str, bbox: tuple | None = None, limit: int = 10) list[dict[str, Any]][source]
Search PANGAEA using the native PanQuery bbox support.
- Parameters:
query (str) – Search query string for PANGAEA datasets.
bbox (tuple, optional) – Bounding box as (min_lon, min_lat, max_lon, max_lat).
limit (int, optional) – Maximum number of results to return (default: 10).
- Returns:
List of dataset metadata dictionaries.
- Return type:
List[Dict[str, Any]]
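Note the bbox ordering (min_lon, min_lat, max_lon, max_lat), with longitude first. A small illustrative check, not part of the package, makes the convention concrete:

```python
def in_bbox(lat, lon, bbox):
    """Check whether a point falls inside a (min_lon, min_lat,
    max_lon, max_lat) bounding box, the ordering search() expects."""
    min_lon, min_lat, max_lon, max_lat = bbox
    return min_lat <= lat <= max_lat and min_lon <= lon <= max_lon

# Hypothetical study box: lon -10..10, lat 76..81
box = (-10.0, 76.0, 10.0, 81.0)
print(in_bbox(78.5, 2.0, box))  # → True
print(in_bbox(70.0, 2.0, box))  # → False
```

Mixing up the (lon, lat) ordering with the more common (lat, lon) convention silently returns wrong or empty search results, so it is worth double-checking when constructing the tuple.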
- cruiseplan.data.pangaea.load_campaign_data(file_path: str | Path, merge_tracks: bool = True) list[dict][source]
Load pre-processed PANGAEA campaign data from pickle file.
This function is required by the CLI to integrate with the interactive picker. If merge_tracks is True, it ensures that all datasets with the same label are combined into a single track before being returned.
- Parameters:
file_path (Union[str, Path]) – Path to the pickled PANGAEA campaign file.
merge_tracks (bool) – If True, runs merge_campaign_tracks on the loaded data. (Default: True)
- Returns:
List of campaign datasets (merged if requested).
- Return type:
List[Dict]
- cruiseplan.data.pangaea.merge_campaign_tracks(datasets: list[dict]) list[dict][source]
Merge datasets by their ‘label’ (campaign name).
Aggregates coordinates into single arrays and collects all source DOIs.
- Parameters:
datasets (List[Dict]) – List of dataset dictionaries to merge.
- Returns:
Merged campaign datasets with combined coordinates.
- Return type:
List[Dict]
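The merge step can be sketched with a minimal stand-in. The field names (lats, lons, doi) are assumptions for illustration; the actual dataset dictionaries produced by fetch_datasets may use a different schema:

```python
def merge_by_label(datasets):
    """Illustrative merge: combine datasets sharing a 'label' into one
    record, concatenating coordinates and collecting source DOIs."""
    merged = {}
    for ds in datasets:
        key = ds["label"]
        if key not in merged:
            merged[key] = {"label": key, "lats": [], "lons": [], "dois": []}
        merged[key]["lats"].extend(ds["lats"])
        merged[key]["lons"].extend(ds["lons"])
        merged[key]["dois"].append(ds["doi"])
    return list(merged.values())

# Two datasets from the same (hypothetical) campaign collapse into one track.
tracks = merge_by_label([
    {"label": "MSM21", "lats": [75.0], "lons": [0.0], "doi": "10.1594/a"},
    {"label": "MSM21", "lats": [76.0], "lons": [1.0], "doi": "10.1594/b"},
])
print(len(tracks), tracks[0]["dois"])  # → 1 ['10.1594/a', '10.1594/b']
```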
- cruiseplan.data.pangaea.read_doi_list(file_path: str | Path) list[str][source]
Read a DOI list from a text file, filtering out comments and empty lines.
- Parameters:
file_path (Union[str, Path]) – Path to the DOI list file.
- Returns:
List of DOI strings.
- Return type:
list[str]
- Raises:
ValueError – If the file cannot be read or contains no valid DOIs.
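The expected file format can be illustrated with a small parser sketch. The ‘#’ comment marker is an assumption, since the docstring only says comments are filtered; the real function also handles file I/O errors:

```python
def parse_doi_lines(text):
    """Illustrative parser matching read_doi_list's contract: skip
    blank lines and comment lines, raise ValueError when nothing
    usable remains."""
    dois = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):  # '#' comments assumed
            dois.append(line)
    if not dois:
        raise ValueError("no valid DOIs found")
    return dois

sample = """# DOIs for the planning run
10.1594/PANGAEA.123456

10.1594/PANGAEA.654321
"""
print(parse_doi_lines(sample))
```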
- cruiseplan.data.pangaea.save_campaign_data(datasets: list[dict], file_path: str | Path, progress_callback: Callable[[str], None] | None = None, original_dataset_count: int | None = None) None[source]
Save PANGAEA datasets to pickle file.
- Parameters:
datasets (List[Dict]) – List of dataset dictionaries to save.
file_path (Union[str, Path]) – Output file path for the pickle file.
progress_callback (Optional[Callable[[str], None]], optional) – Optional function(message) for progress updates.
original_dataset_count (Optional[int], optional) – Optional count of datasets before merging for summary.
- Raises:
ValueError – If there’s an error saving the file.