{ "cells": [ { "cell_type": "markdown", "id": "c6a29764-f39c-431c-8e77-fbc6bfe20f01", "metadata": {}, "source": [ "# Demo: mooring-level processing (Step 1)\n", "\n", "This notebook demonstrates the first mooring-level processing step: time gridding with the `oceanarray.time_gridding` module.\n", "\n" ] }, { "cell_type": "markdown", "id": "006edeaa", "metadata": {}, "source": [ "## Step 1: Time Gridding and Optional Filtering Demo\n", "\n", "The Step 1 workflow for mooring data consists of:\n", "- Loading multiple instrument datasets\n", "- Optional time-domain filtering (applied BEFORE interpolation; the run in this notebook uses no filtering)\n", "- Interpolating onto a common time grid\n", "- Combining into a unified mooring dataset\n", "\n", "**Key Point**: Filtering is applied to each instrument record on its native time grid, BEFORE interpolation, so that the filter acts on the original measurements rather than on values that interpolation has already modified.\n", "\n", "Version: 1.0  \n", "Date: 2025-09-07" ] }, { "cell_type": "code", "execution_count": 1, "id": "6a1920f3", "metadata": { "execution": { "iopub.execute_input": "2025-09-25T06:33:25.881842Z", "iopub.status.busy": "2025-09-25T06:33:25.881669Z", "iopub.status.idle": "2025-09-25T06:33:26.476041Z", "shell.execute_reply": "2025-09-25T06:33:26.475467Z" } }, "outputs": [], "source": [ "import numpy as np\n", "import pandas as pd\n", "import xarray as xr\n", "import matplotlib.pyplot as plt\n", "from pathlib import Path\n", "import yaml\n", "\n", "# Import the time gridding module\n", "from oceanarray.time_gridding import (\n", "    TimeGriddingProcessor,\n", "    time_gridding_mooring,\n", "    process_multiple_moorings_time_gridding\n", ")\n", "\n", "# Set up plotting\n", "plt.style.use('default')\n" ] }, { "cell_type": "markdown", "id": "4ad6e0e5", "metadata": { "vscode": { "languageId": "javascript" } }, "source": [ "### Configuration\n", "\n", "First, let's set up our data paths and examine the mooring configuration."
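] }, { "cell_type": "markdown", "id": "3e7f1a52", "metadata": {}, "source": [ "The configuration code further down reads only a few keys from the mooring YAML file: `name`, `waterdepth`, `latitude`, `longitude`, and, for each entry in `instruments`, `instrument`, `serial` and `depth`. The cell below parses a minimal, hypothetical excerpt to show that structure. Its values are copied from the `dsE_1_2018` output further down and trimmed to a single instrument. The real file may contain additional keys, and this is not the file itself." ] }, { "cell_type": "code", "execution_count": null, "id": "8b2d6c94", "metadata": {}, "outputs": [], "source": [ "import yaml\n", "\n", "# Hypothetical excerpt (NOT the real file): only the keys this notebook reads,\n", "# with values copied from the dsE_1_2018 output and a single instrument kept\n", "example_yaml = '''\n", "name: dsE_1_2018\n", "waterdepth: 929\n", "latitude: 65.47561666666667\n", "longitude: -29.570083333333333\n", "instruments:\n", "  - instrument: microcat\n", "    serial: 7518\n", "    depth: 880\n", "'''\n", "\n", "# Parsed into plain dicts/lists; a separate name avoids clobbering the real `config`\n", "example_config = yaml.safe_load(example_yaml)\n", "print(example_config['name'], example_config['waterdepth'], 'm')\n", "for inst in example_config['instruments']:\n", "    print(inst['instrument'], inst['serial'], inst['depth'])"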
] }, { "cell_type": "code", "execution_count": 2, "id": "ce860d75", "metadata": { "execution": { "iopub.execute_input": "2025-09-25T06:33:26.478114Z", "iopub.status.busy": "2025-09-25T06:33:26.477834Z", "iopub.status.idle": "2025-09-25T06:33:26.481436Z", "shell.execute_reply": "2025-09-25T06:33:26.481011Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Processing directory: ../data/moor/proc/dsE_1_2018\n", "Configuration file: ../data/moor/proc/dsE_1_2018/dsE_1_2018.mooring.yaml\n", "Config exists: True\n" ] } ], "source": [ "# Set your data paths here\n", "basedir = '../data'\n", "mooring_name = 'dsE_1_2018'\n", "\n", "# Construct paths\n", "proc_dir = Path(basedir) / 'moor' / 'proc' / mooring_name\n", "config_file = proc_dir / f\"{mooring_name}.mooring.yaml\"\n", "\n", "print(f\"Processing directory: {proc_dir}\")\n", "print(f\"Configuration file: {config_file}\")\n", "print(f\"Config exists: {config_file.exists()}\")" ] },
{ "cell_type": "code", "execution_count": 3, "id": "a4b5b029", "metadata": { "execution": { "iopub.execute_input": "2025-09-25T06:33:26.482986Z", "iopub.status.busy": "2025-09-25T06:33:26.482821Z", "iopub.status.idle": "2025-09-25T06:33:26.497410Z", "shell.execute_reply": "2025-09-25T06:33:26.496988Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Mooring Configuration:\n", "Name: dsE_1_2018\n", "Water depth: 929 m\n", "Location: 65.47561666666667°N, -29.570083333333333°E\n", "\n", "Instruments (23):\n", "  1. sbe56 (serial: 6363) at 629 m\n", "  2. sbe16 (serial: 2419) at 679 m\n", "  3. sbe56 (serial: 6401) at 689 m\n", "  4. sbe56 (serial: 6402) at 699 m\n", "  5. sbe56 (serial: 8482) at 709 m\n", "  6. sbe56 (serial: 6365) at 719 m\n", "  7. sbe56 (serial: 6409) at 729 m\n", "  8. sbe56 (serial: 6397) at 739 m\n", "  9. sbe56 (serial: 6366) at 749 m\n", "  10. sbe56 (serial: 6394) at 759 m\n", "  11. sbe56 (serial: 6370) at 769 m\n", "  12. sbe16 (serial: 2418) at 780 m\n", "  13. tr1050 (serial: 13889) at 790 m\n", "  14. rbrsolo (serial: 101651) at 800 m\n", "  15. tr1050 (serial: 15580) at 810 m\n", "  16. rbrsolo (serial: 101647) at 820 m\n", "  17. tr1050 (serial: 13874) at 830 m\n", "  18. rbrsolo (serial: 101645) at 840 m\n", "  19. tr1050 (serial: 15574) at 850 m\n", "  20. rbrsolo (serial: 101646) at 860 m\n", "  21. tr1050 (serial: 15577) at 870 m\n", "  22. microcat (serial: 7518) at 880 m\n", "  23. sbe56 (serial: 6364) at 905 m\n" ] } ], "source": [ "# Load and examine the mooring configuration\n", "if config_file.exists():\n", "    with open(config_file, 'r') as f:\n", "        config = yaml.safe_load(f)\n", "\n", "    print(\"Mooring Configuration:\")\n", "    print(f\"Name: {config['name']}\")\n", "    print(f\"Water depth: {config.get('waterdepth', 'unknown')} m\")\n", "    print(f\"Location: {config.get('latitude', 'unknown')}°N, {config.get('longitude', 'unknown')}°E\")\n", "    print(f\"\\nInstruments ({len(config.get('instruments', []))}):\")\n", "\n", "    # Serial numbers are stored under the 'serial' key (the same key the file lookup below uses)\n", "    for i, inst in enumerate(config.get('instruments', [])):\n", "        print(f\"  {i+1}. {inst.get('instrument', 'unknown')} \"\n", "              f\"(serial: {inst.get('serial', 'unknown')}) at {inst.get('depth', 'unknown')} m\")\n", "else:\n", "    print(\"Configuration file not found!\")\n", "    print(\"Please check your data path and mooring name.\")" ] },
{ "cell_type": "markdown", "id": "c87567a6", "metadata": { "vscode": { "languageId": "javascript" } }, "source": [ "### Examine individual instrument files\n", "\n", "Let's look at the individual instrument files before processing to understand the different sampling rates and data characteristics."
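] }, { "cell_type": "markdown", "id": "5c1e9f07", "metadata": {}, "source": [ "The table below reports each record's median sampling interval. The processing log further down suggests that Step 1 builds its common grid from the median of these intervals. The cell below is a rough, hypothetical sketch of that idea, using synthetic data and plain NumPy/pandas rather than the `oceanarray.time_gridding` implementation. It filters one record on its native grid, takes the common interval as the median of the per-record median intervals, and linearly interpolates every record over their overlapping period. The running-mean filter, the linear interpolation and the overlap-only grid are illustrative assumptions; the library's own choices may differ." ] }, { "cell_type": "code", "execution_count": null, "id": "d4a8b362", "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import pandas as pd\n", "\n", "# Hypothetical synthetic records (NOT data from this mooring); time in seconds from a common origin\n", "rng = np.random.default_rng(0)\n", "t_a = np.arange(0.0, 21600.0, 10.0)     # record A: 10 s sampling\n", "t_b = np.arange(30.0, 21600.0, 60.0)    # record B: 60 s sampling, starts 30 s later\n", "t_c = np.arange(120.0, 21600.0, 300.0)  # record C: 300 s sampling, starts 120 s later\n", "records = {\n", "    name: (t, 3.0 + 0.1 * np.sin(t / 2000.0) + 0.01 * rng.standard_normal(t.size))\n", "    for name, t in {'A': t_a, 'B': t_b, 'C': t_c}.items()\n", "}\n", "\n", "def median_interval(t):\n", "    # Median spacing between samples; robust to occasional gaps\n", "    return float(np.median(np.diff(t)))\n", "\n", "def running_mean(values, window):\n", "    # Centered running mean: a simple stand-in for whichever low-pass filter is configured\n", "    return pd.Series(values).rolling(window, center=True, min_periods=1).mean().to_numpy()\n", "\n", "# (a) Optional filtering: smooth record A on its NATIVE 10 s grid, before any interpolation\n", "t, v = records['A']\n", "records['A'] = (t, running_mean(v, window=6))  # 6 samples x 10 s = 1 min window\n", "\n", "# (b) Common interval = median of the per-record median intervals; grid covers the overlap\n", "dt = float(np.median([median_interval(t) for t, _ in records.values()]))\n", "start = max(t[0] for t, _ in records.values())\n", "end = min(t[-1] for t, _ in records.values())\n", "common_t = np.arange(start, end + dt / 2, dt)\n", "\n", "# (c) Linearly interpolate every record onto the common grid, stacked as (time, level)\n", "gridded = np.column_stack([np.interp(common_t, t, v) for t, v in records.values()])\n", "print(f'common interval: {dt:.0f} s, grid points: {common_t.size}, array shape: {gridded.shape}')"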
] }, { "cell_type": "code", "execution_count": 4, "id": "b824bf8c", "metadata": { "execution": { "iopub.execute_input": "2025-09-25T06:33:26.499111Z", "iopub.status.busy": "2025-09-25T06:33:26.498935Z", "iopub.status.idle": "2025-09-25T06:33:26.544229Z", "shell.execute_reply": "2025-09-25T06:33:26.543758Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "| Instrument | Serial | Depth [m] | File | Start | End | Points | Sampling | Variables |\n", "|:-------------|---------:|------------:|:-----------------------|:--------------------|:--------------------|---------:|:-----------|:-------------------------------------------------------------------------------------------------------------------------|\n", "| sbe56 | 6363 | 629 | MISSING | | | 0 | | |\n", "| sbe16 | 2419 | 679 | MISSING | | | 0 | | |\n", "| sbe56 | 6401 | 689 | MISSING | | | 0 | | |\n", "| sbe56 | 6402 | 699 | MISSING | | | 0 | | |\n", "| sbe56 | 8482 | 709 | MISSING | | | 0 | | |\n", "| sbe56 | 6365 | 719 | MISSING | | | 0 | | |\n", "| sbe56 | 6409 | 729 | MISSING | | | 0 | | |\n", "| sbe56 | 6397 | 739 | MISSING | | | 0 | | |\n", "| sbe56 | 6366 | 749 | MISSING | | | 0 | | |\n", "| sbe56 | 6394 | 759 | MISSING | | | 0 | | |\n", "| sbe56 | 6370 | 769 | MISSING | | | 0 | | |\n", "| sbe16 | 2418 | 780 | MISSING | | | 0 | | |\n", "| tr1050 | 13889 | 790 | MISSING | | | 0 | | |\n", "| rbrsolo | 101651 | 800 | MISSING | | | 0 | | |\n", "| tr1050 | 15580 | 810 | MISSING | | | 0 | | |\n", "| rbrsolo | 101647 | 820 | MISSING | | | 0 | | |\n", "| tr1050 | 13874 | 830 | MISSING | | | 0 | | |\n", "| rbrsolo | 101645 | 840 | MISSING | | | 0 | | |\n", "| tr1050 | 15574 | 850 | MISSING | | | 0 | | |\n", "| rbrsolo | 101646 | 860 | MISSING | | | 0 | | |\n", "| tr1050 | 15577 | 870 | MISSING | | | 0 | | |\n", "| microcat | 7518 | 880 | dsE_1_2018_7518_use.nc | 2018-08-13T07:25:51 | 2018-08-26T10:37:50 | 113473 | 10.0 sec | temperature, salinity, conductivity, pressure, serial_number, 
InstrDepth, instrument, clock_offset, start_time, end_time |\n", "| sbe56 | 6364 | 905 | MISSING | | | 0 | | |\n", "\n", "Found 1 instrument datasets\n" ] } ], "source": [ "# Find and examine individual instrument files\n", "file_suffix = \"_use\"\n", "instrument_files = []\n", "instrument_datasets = []\n", "rows = []\n", "\n", "if config_file.exists():\n", "    for inst_config in config.get(\"instruments\", []):\n", "        instrument_type = inst_config.get(\"instrument\", \"unknown\")\n", "        serial = inst_config.get(\"serial\", 0)\n", "        depth = inst_config.get(\"depth\", 0)\n", "\n", "        # Look for the file\n", "        filename = f\"{mooring_name}_{serial}{file_suffix}.nc\"\n", "        filepath = proc_dir / instrument_type / filename\n", "\n", "        if filepath.exists():\n", "            ds = xr.open_dataset(filepath)\n", "            instrument_files.append(filepath)\n", "            instrument_datasets.append(ds)\n", "\n", "            # Time coverage\n", "            t0, t1 = ds.time.values[0], ds.time.values[-1]\n", "            npoints = len(ds.time)\n", "\n", "            # Median sampling interval\n", "            time_diff = np.diff(ds.time.values) / np.timedelta64(1, \"m\")  # in minutes\n", "            median_interval = np.nanmedian(time_diff)\n", "            if median_interval > 1:\n", "                sampling = f\"{median_interval:.1f} min\"\n", "            else:\n", "                sampling = f\"{median_interval*60:.1f} sec\"\n", "\n", "            # Collect a row for the table\n", "            rows.append(\n", "                {\n", "                    \"Instrument\": instrument_type,\n", "                    \"Serial\": serial,\n", "                    \"Depth [m]\": depth,\n", "                    \"File\": filepath.name,\n", "                    \"Start\": str(t0)[:19],\n", "                    \"End\": str(t1)[:19],\n", "                    \"Points\": npoints,\n", "                    \"Sampling\": sampling,\n", "                    \"Variables\": \", \".join(list(ds.data_vars)),\n", "                }\n", "            )\n", "        else:\n", "            rows.append(\n", "                {\n", "                    \"Instrument\": instrument_type,\n", "                    \"Serial\": serial,\n", "                    \"Depth [m]\": depth,\n", "                    \"File\": \"MISSING\",\n", "                    \"Start\": \"\",\n", "                    \"End\": \"\",\n", "                    \"Points\": 0,\n", "                    \"Sampling\": \"\",\n", "                    \"Variables\": \"\",\n", "                }\n", "            )\n", "\n", "    # Make a DataFrame summary (to_markdown requires the optional 'tabulate' package)\n", "    summary = pd.DataFrame(rows)\n", "    print(summary.to_markdown(index=False))\n", "\n", "    print(f\"\\nFound {len(instrument_datasets)} instrument datasets\")\n" ] },
{ "cell_type": "markdown", "id": "dcd5ace6", "metadata": {}, "source": [ "### Process with time gridding (no filtering)" ] },
{ "cell_type": "code", "execution_count": 5, "id": "9430b5f1", "metadata": { "execution": { "iopub.execute_input": "2025-09-25T06:33:26.545935Z", "iopub.status.busy": "2025-09-25T06:33:26.545777Z", "iopub.status.idle": "2025-09-25T06:33:26.832403Z", "shell.execute_reply": "2025-09-25T06:33:26.831901Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Processing mooring with time gridding only (no filtering)...\n", "============================================================\n", "Starting Step 1 (time gridding) processing for mooring: dsE_1_2018\n", "Using files with suffix: _use\n", "WARNING: File not found: ../data/moor/proc/dsE_1_2018/sbe56/dsE_1_2018_6363_use.nc\n", "WARNING: File not found: ../data/moor/proc/dsE_1_2018/sbe16/dsE_1_2018_2419_use.nc\n", "WARNING: File not found: ../data/moor/proc/dsE_1_2018/sbe56/dsE_1_2018_6401_use.nc\n", "WARNING: File not found: ../data/moor/proc/dsE_1_2018/sbe56/dsE_1_2018_6402_use.nc\n", "WARNING: File not found: ../data/moor/proc/dsE_1_2018/sbe56/dsE_1_2018_8482_use.nc\n", "WARNING: File not found: ../data/moor/proc/dsE_1_2018/sbe56/dsE_1_2018_6365_use.nc\n", "WARNING: File not found: ../data/moor/proc/dsE_1_2018/sbe56/dsE_1_2018_6409_use.nc\n", "WARNING: File not found: ../data/moor/proc/dsE_1_2018/sbe56/dsE_1_2018_6397_use.nc\n", "WARNING: File not found: ../data/moor/proc/dsE_1_2018/sbe56/dsE_1_2018_6366_use.nc\n", "WARNING: File not found: ../data/moor/proc/dsE_1_2018/sbe56/dsE_1_2018_6394_use.nc\n", "WARNING: File not found: ../data/moor/proc/dsE_1_2018/sbe56/dsE_1_2018_6370_use.nc\n", "WARNING: File not found: 
../data/moor/proc/dsE_1_2018/sbe16/dsE_1_2018_2418_use.nc\n", "WARNING: File not found: ../data/moor/proc/dsE_1_2018/tr1050/dsE_1_2018_13889_use.nc\n", "WARNING: File not found: ../data/moor/proc/dsE_1_2018/rbrsolo/dsE_1_2018_101651_use.nc\n", "WARNING: File not found: ../data/moor/proc/dsE_1_2018/tr1050/dsE_1_2018_15580_use.nc\n", "WARNING: File not found: ../data/moor/proc/dsE_1_2018/rbrsolo/dsE_1_2018_101647_use.nc\n", "WARNING: File not found: ../data/moor/proc/dsE_1_2018/tr1050/dsE_1_2018_13874_use.nc\n", "WARNING: File not found: ../data/moor/proc/dsE_1_2018/rbrsolo/dsE_1_2018_101645_use.nc\n", "WARNING: File not found: ../data/moor/proc/dsE_1_2018/tr1050/dsE_1_2018_15574_use.nc\n", "WARNING: File not found: ../data/moor/proc/dsE_1_2018/rbrsolo/dsE_1_2018_101646_use.nc\n", "WARNING: File not found: ../data/moor/proc/dsE_1_2018/tr1050/dsE_1_2018_15577_use.nc\n", "Loading microcat serial 7518: dsE_1_2018_7518_use.nc\n", "WARNING: File not found: ../data/moor/proc/dsE_1_2018/sbe56/dsE_1_2018_6364_use.nc\n", "\n", "WARNING: Missing instruments compared to YAML configuration:\n", " - sbe56:6363\n", " - sbe16:2419\n", " - sbe56:6401\n", " - sbe56:6402\n", " - sbe56:8482\n", " - sbe56:6365\n", " - sbe56:6409\n", " - sbe56:6397\n", " - sbe56:6366\n", " - sbe56:6394\n", " - sbe56:6370\n", " - sbe16:2418\n", " - tr1050:13889\n", " - rbrsolo:101651\n", " - tr1050:15580\n", " - rbrsolo:101647\n", " - tr1050:13874\n", " - rbrsolo:101645\n", " - tr1050:15574\n", " - rbrsolo:101646\n", " - tr1050:15577\n", " - sbe56:6364\n", " Expected 23, found 1\n", "\n", "Loaded 1 instrument datasets\n", "Dataset 0 depth 880 [microcat:7518]:\n", " Start: 2018-08-13T07:25:51, End: 2018-08-26T10:37:50\n", " Time interval - Median: 10.02 sec, Range: 9.94 sec to 10.02 sec, Std: 0.04 sec\n", " Variables: ['temperature', 'salinity', 'conductivity', 'pressure', 'serial_number', 'InstrDepth', 'instrument', 'clock_offset', 'start_time', 'end_time']\n", "\n", "TIMING ANALYSIS:\n", " Overall median 
interval: 0.17 min\n", " Range of median intervals: 0.17 to 0.17 min\n", " Common grid will use 0.17 min intervals\n", " microcat:7518: 0.17 min -> 0.17 min (0.0% change) MINIMAL CHANGE\n", " Common time grid: 113472 points from 2018-08-13T07:25:51.008000000 to 2018-08-26T10:37:41.008000000\n", "\n", "INTERPOLATING FILTERED DATASETS ONTO COMMON GRID:\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Successfully interpolated dataset 0\n", "Variable 'u_velocity' not found in any dataset, skipping\n", "Variable 'v_velocity' not found in any dataset, skipping\n", "Successfully wrote time-gridded dataset: ../data/moor/proc/dsE_1_2018/dsE_1_2018_mooring_use.nc\n", "Combined dataset shape: {'time': 113472, 'N_LEVELS': 1}\n", "Variables: ['temperature', 'salinity', 'conductivity', 'pressure', 'instrument_id']\n", "\n", "Processing result: SUCCESS\n" ] } ], "source": [ "# Process without filtering\n", "print(\"Processing mooring with time gridding only (no filtering)...\")\n", "print(\"=\"*60)\n", "\n", "result = time_gridding_mooring(mooring_name, basedir, file_suffix='_use')\n", "\n", "print(f\"\\nProcessing result: {'SUCCESS' if result else 'FAILED'}\")" ] },
{ "cell_type": "code", "execution_count": 6, "id": "60fdf8e1", "metadata": { "execution": { "iopub.execute_input": "2025-09-25T06:33:26.834121Z", "iopub.status.busy": "2025-09-25T06:33:26.833869Z", "iopub.status.idle": "2025-09-25T06:33:26.846062Z", "shell.execute_reply": "2025-09-25T06:33:26.845589Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Output file exists: ../data/moor/proc/dsE_1_2018/dsE_1_2018_mooring_use.nc\n" ] } ], "source": [ "# Load and examine the combined dataset\n", "output_file = proc_dir / f\"{mooring_name}_mooring_use.nc\"\n", "\n", "if output_file.exists():\n", "    print(f\"Output file exists: {output_file}\")\n", "\n", "    # Load the combined dataset\n", "    combined_ds = xr.open_dataset(output_file)\n", "else:\n", "    print(\"Output file not found - processing may have failed\")" ] },
{ "cell_type": "markdown", "id": "ab26f061", "metadata": {}, "source": [ "### Visualize Combined Dataset\n", "\n", "Let's plot the combined dataset to see how the different instruments look on the common time grid.\n" ] },
{ "cell_type": "code", "execution_count": 7, "id": "68780c36", "metadata": { "execution": { "iopub.execute_input": "2025-09-25T06:33:26.847775Z", "iopub.status.busy": "2025-09-25T06:33:26.847588Z", "iopub.status.idle": "2025-09-25T06:33:27.645981Z", "shell.execute_reply": "2025-09-25T06:33:27.645446Z" } }, "outputs": [], "source": [ "import numpy as np\n", "import matplotlib.pyplot as plt\n", "\n", "def plot_combined_timeseries(\n", "    combined_ds,\n", "    variables=(\"temperature\", \"salinity\", \"pressure\"),\n", "    cmap_name=\"viridis\",\n", "    line_alpha=0.8,\n", "    line_width=1.2,\n", "    percentile_limits=(1, 99),\n", "):\n", "    \"\"\"\n", "    Plot selected variables from a combined mooring dataset as stacked time series.\n", "\n", "    Parameters\n", "    ----------\n", "    combined_ds : xarray.Dataset\n", "        Must have dims: time, N_LEVELS. Optional coords: nominal_depth, serial_number.\n", "    variables : iterable[str]\n", "        Variable names to try to plot (if present in dataset).\n", "    cmap_name : str\n", "        Matplotlib colormap name for coloring by instrument level.\n", "    line_alpha : float\n", "        Line transparency.\n", "    line_width : float\n", "        Line width.\n", "    percentile_limits : (low, high)\n", "        Percentiles to use for automatic y-limits (e.g., (1, 99)).\n", "\n", "    Returns\n", "    -------\n", "    fig, axes\n", "        The Matplotlib figure and axes, or (None, None) if there is nothing to plot.\n", "    \"\"\"\n", "    if combined_ds is None:\n", "        print(\"Combined dataset not available.\")\n", "        return None, None\n", "    n_levels = combined_ds.sizes.get(\"N_LEVELS\")\n", "    if n_levels is None:\n", "        raise ValueError(\"Dataset must contain dimension 'N_LEVELS'.\")\n", "\n", "    available = [v for v in variables if v in combined_ds.data_vars]\n", "    if not available:\n", "        print(\"No requested variables found to plot.\")\n", "        return None, None\n", "\n", "    # Colors by level\n", "    cmap = plt.get_cmap(cmap_name)\n", "    colors = cmap(np.linspace(0, 1, n_levels))\n", "\n", "    fig, axes = plt.subplots(\n", "        len(available), 1, figsize=(14, 3.6 * len(available)), sharex=True, constrained_layout=True\n", "    )\n", "    if len(available) == 1:\n", "        axes = [axes]\n", "\n", "    depth_arr = combined_ds.get(\"nominal_depth\")\n", "    serial_arr = combined_ds.get(\"serial_number\")\n", "\n", "    first_axis = True\n", "    for ax, var in zip(axes, available):\n", "        values_for_limits = []\n", "        for level in range(n_levels):\n", "            depth = None if depth_arr is None else depth_arr.values[level]\n", "            serial = None if serial_arr is None else serial_arr.values[level]\n", "            label = None\n", "            if first_axis:\n", "                if depth is not None and np.isfinite(depth):\n", "                    label = f\"Serial {serial} ({int(depth)} m)\" if serial is not None else f\"({int(depth)} m)\"\n", "                elif serial is not None:\n", "                    label = f\"Serial {serial}\"\n", "\n", "            da = combined_ds[var].isel(N_LEVELS=level)\n", "            da = da.where(np.isfinite(da), drop=True)\n", "            if da.size == 0:\n", "                continue\n", "\n", "            values_for_limits.append(da.values)\n", "\n", "            ax.plot(\n", "                da[\"time\"].values,\n", "                da.values,\n", "                color=colors[level],\n", "                alpha=line_alpha,\n", "                linewidth=line_width,\n", "                label=label,\n", "            )\n", "\n", "        # Set labels and grid\n", "        ax.set_ylabel(var.replace(\"_\", \" \").title())\n", "        ax.grid(True, alpha=0.3)\n", "        ax.set_title(f\"{var.replace('_', ' ').title()} - Combined Time Grid\")\n", "\n", "        # Legend only once, and only if at least one line carries a label\n", "        if first_axis:\n", "            if ax.get_legend_handles_labels()[1]:\n", "                ax.legend(ncol=3, fontsize=8, loc=\"upper right\", frameon=False)\n", "            first_axis = False\n", "\n", "        # Auto y-limits based on percentiles\n", "        if values_for_limits:\n", "            flat = np.concatenate(values_for_limits)\n", "            low, high = np.nanpercentile(flat, percentile_limits)\n", "            ax.set_ylim(low, high)\n", "\n", "    axes[-1].set_xlabel(\"Time\")\n", "    return fig, axes\n", "\n", "# Usage:\n", "if 'combined_ds' in locals():\n", "    plot_combined_timeseries(combined_ds)\n" ] } ],
"metadata": { "kernelspec": { "display_name": "venv (3.11.7)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.13.7" } }, "nbformat": 4, "nbformat_minor": 5 }