fetchAZA API
convertAZA
- fetchAZA.convertAZA.convertAZA(data_path, fn, STN='sample', deploy_date='2000-01-01', recovery_date='2099-01-01', latitude='0', longitude='0', water_depth='0', keys=['DQZ', 'PIES', 'INC', 'TMP', 'KLR'], cleanup=True, overwrite=None)[source]
Processes and converts AZA data from CSV to netCDF format.
- Parameters:
data_path (str) – Path to the data directory.
fn (str) – Filename of the input CSV file.
STN (str) – Station identifier.
deploy_date (str) – Deployment date in ‘YYYY-MM-DD’ format.
recovery_date (str) – Recovery date in ‘YYYY-MM-DD’ format.
latitude (float) – Latitude of the station.
longitude (float) – Longitude of the station.
water_depth (float) – Water depth at the station.
keys (list of str) – Keys for intermediate files to process.
cleanup (bool) – Whether to delete intermediate files after processing.
overwrite (bool, optional) – If True, overwrite existing files. If False, skip existing files. If None, prompt the user for input.
- Returns:
ds_pressure and ds_AZA datasets.
- Return type:
tuple
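Example (an illustrative sketch; the file name, station metadata, and dates below are assumptions, not values from the package):
from fetchAZA.convertAZA import convertAZA
# Hypothetical deployment; adjust paths and metadata to the actual mooring.
ds_pressure, ds_AZA = convertAZA(
    data_path='data/', fn='sample.csv', STN='MOORING01',
    deploy_date='2023-01-01', recovery_date='2024-01-01',
    latitude=59.5, longitude=-38.2, water_depth=2000.0,
    cleanup=True, overwrite=True,
)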
readers
- fetchAZA.readers.combine_pattern(datasets, var_to_combine='INDEX_NUM')[source]
Combines multiple xarray datasets into a single dataset, adding relevant attributes.
- Parameters:
datasets (dict) – A dictionary where keys are dataset names and values are xarray.Dataset objects.
var_to_combine (str, optional) – Name of the variable along which the datasets are combined (default: 'INDEX_NUM').
- Returns:
A combined xarray dataset with added attributes.
- Return type:
xarray.Dataset
- fetchAZA.readers.csv_to_xarray(file_path)[source]
Converts a CSV file into a dictionary of xarray Datasets, grouped by event type. The CSV file is expected to have a header section and a data section. The header section contains metadata about the columns, including event types, column names, and optional units. The data section contains the actual data values, with each row corresponding to an event type. The function processes the header to extract column names and units, and then groups the data by event type to create xarray Datasets. Units are assigned as attributes to the corresponding variables in the datasets.
- Parameters:
file_path (str) – The path to the CSV file to be processed.
- Returns:
A dictionary where keys are event types (str) and values are xarray.Dataset objects containing the data for each event type. Each dataset includes variables with units assigned as attributes (if available).
- Return type:
dict
Notes
The header section should precede the data section in the CSV file.
The data section should start with a line containing “# Data”.
Each row in the data section should begin with the event type, followed by the data values.
Column names in the header can include units in the format “Column Name (Unit)” or “Column Name %” or “Column Name V”.
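Example (a minimal sketch, assuming a CSV file laid out as described above; the path is illustrative):
from fetchAZA.readers import csv_to_xarray
datasets = csv_to_xarray('data/sample.csv')  # hypothetical file path
for event_type, ds in datasets.items():
    print(event_type, list(ds.data_vars))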
- fetchAZA.readers.csv_to_xarray_pattern(file_path, pattern=['AZS', 'AZA', 'AZA', 'AZA', 'AZS'])[source]
Reads a CSV file with a header section and a data section (after a “# Data” line), then searches the data lines for contiguous groups of lines that match the given logging event pattern (by default: [‘AZS’, ‘AZA’, ‘AZA’, ‘AZA’, ‘AZS’]).
- For each matched group, two new columns are appended:
“Sample Num”: increments for each matched group (starting at 1).
“Sequence Num”: within the group, numbering from 1 to len(pattern).
Only groups that match the pattern are processed; any data lines that do not belong to such a group are skipped.
The function then groups rows by event type and creates an xarray.Dataset for each, using the header information (column names and units) for that event type.
- Parameters:
file_path (str) – Path to the CSV file.
pattern (list of str, optional) – A list of event types defining a valid group (default: [‘AZS’,’AZA’,’AZA’,’AZA’,’AZS’]).
- Returns:
A dictionary with keys for each event type found in matched groups and values that are xarray.Datasets built from the corresponding rows.
- Return type:
dict
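Example (a sketch using the default AZS/AZA pattern; the file path is illustrative):
from fetchAZA.readers import csv_to_xarray_pattern
grouped = csv_to_xarray_pattern('data/sample.csv',
                                pattern=['AZS', 'AZA', 'AZA', 'AZA', 'AZS'])
# Datasets built from matched groups carry the 'Sample Num' and 'Sequence Num' columns.
print(list(grouped))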
- fetchAZA.readers.data_overview(data_path, base_filename, extracted_parts, output_file_path=None)[source]
Reads data details from a directory and prints or writes them to a file.
- Parameters:
data_path (str) – The path to the data directory.
base_filename (str) – The base filename for the data files.
extracted_parts (list) – A list of parts to be extracted from the base filename.
output_file_path (str, optional) – The path to the output file. If None, prints to standard output.
- Return type:
None
- fetchAZA.readers.load_netcdf_datasets(data_path, file_root, keys=None)[source]
Load netCDF files matching the given file_root and optional keys from the specified data_path into a dictionary of xarray datasets.
- Parameters:
data_path (str) – The directory path where the netCDF files are located.
file_root (str) – The root name of the files to match.
keys (list of str, optional) – A list of keys to filter the files. Only files containing these keys in their names will be loaded. If None, all matching files will be loaded.
- Returns:
A dictionary where keys are dataset names and values are xarray datasets.
- Return type:
dict
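Example (a sketch; the directory, file root, and keys are assumptions):
from fetchAZA.readers import load_netcdf_datasets
datasets = load_netcdf_datasets('data/', 'sample', keys=['KLR', 'DQZ'])
print(list(datasets))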
- fetchAZA.readers.parse_column_name(col)[source]
Extracts the column name and unit from a given column string.
- Parameters:
col (str) – The column string, which may include a unit in parentheses or specific suffixes.
- Returns:
A tuple containing the column name (str) and the unit (str or None).
- Return type:
tuple
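Example (expected behaviour based on the description above; the exact output is an assumption, not verified against the implementation):
from fetchAZA.readers import parse_column_name
name, unit = parse_column_name('Ambient Pressure (kPa)')
# Expected: name == 'Ambient Pressure', unit == 'kPa'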
- fetchAZA.readers.parse_header_data(lines, allowed_events=None)[source]
Separates the lines of a CSV file into header lines and data lines, optionally filtering by allowed event types.
- Parameters:
lines (list of str) – The lines read from the CSV file.
allowed_events (set, optional) – A set of allowed event types. If provided, only lines with the first field matching an allowed event type will be included. If None, all lines are included.
- Returns:
A tuple containing:
- header_lines (list of str): The header lines from the file.
- data_lines (list of str): The data lines from the file.
- Return type:
tuple
- fetchAZA.readers.process_header_lines(header_lines)[source]
Processes the header lines to extract event headers and units.
- Parameters:
header_lines (list of str) – List of header lines from the CSV file.
- Returns:
A tuple containing:
- event_headers (dict): A dictionary mapping event types to column headers.
- event_units (dict): A dictionary mapping event types to column units.
- Return type:
tuple
- fetchAZA.readers.read_csv_to_xarray(input_file, deploy_date=None, recovery_date=None, pattern=['AZS', 'AZA', 'AZA', 'AZA', 'AZS'])[source]
Processes a CSV file and converts it into NetCDF-compatible datasets. This function reads data from the input CSV file, processes it into xarray datasets by separating the data into datasets corresponding to each logging event type, standardizes the datasets, and then processes datasets that match a specified sequence pattern. The matched datasets are further combined into an additional xarray dataset.
- Parameters:
input_file (str) – The path to the input CSV file.
deploy_date (str, optional) – Deployment date in 'YYYY-MM-DD' format. Default is None.
recovery_date (str, optional) – Recovery date in 'YYYY-MM-DD' format. Default is None.
pattern (list of str, optional) – A list of event types defining a valid group (default: ['AZS', 'AZA', 'AZA', 'AZA', 'AZS']).
- Returns:
A dictionary of standardized xarray datasets. The keys are event types, and the values are the corresponding datasets. An additional dataset with the key ‘AZAseq’ contains the combined dataset created from patterns in the input CSV file.
- Return type:
dict
Notes
The function first processes the input file into individual datasets based on event types.
It then standardizes each dataset by converting constant variables to attributes, handling fill values, and ensuring proper data types.
The function identifies and processes groups of rows matching the specified pattern, adding sequence-related metadata.
The combined dataset (‘AZAseq’) includes additional attributes describing the sequence pattern.
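Example (a sketch; the file path and dates are illustrative):
from fetchAZA.readers import read_csv_to_xarray
datasets = read_csv_to_xarray('data/sample.csv',
                              deploy_date='2023-01-01',
                              recovery_date='2024-01-01')
ds_seq = datasets['AZAseq']  # combined dataset built from the matched pattern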
- fetchAZA.readers.standardise_dataset(ds, replace_fill=True, fill_value=nan)[source]
- Standardizes the given dataset by:
Converting constant variables to global attributes and removing them.
Converting integer-like and float-like strings to numbers.
Converting datetime strings to datetime64.
Dropping variables marked for removal.
Fixing object variables via utilities.reformat_object_vars.
Additionally, for any variable of an integer type whose valid values should be >= 1, any 0 encountered is assumed to be a fill value. If replace_fill is True, those 0's are replaced with fill_value and the variable is given a '_FillValue' attribute.
Finally, for variables with “Serial” in their name that are effectively constant (i.e. all non-fill values are the same), they are promoted to global attributes.
- Parameters:
ds (xarray.Dataset) – The input dataset to be standardized.
replace_fill (bool, optional) – Whether to replace assumed fill values (zeros in integer-valued variables) with fill_value. Default is True.
fill_value (optional) – The value used to replace detected fill values. Default is NaN.
- Returns:
The standardized dataset with updated variables and attributes.
- Return type:
xarray.Dataset
Notes
Variables with constant values are converted to global attributes and removed from the dataset.
Integer-like and float-like strings are converted to their respective numeric types.
Datetime strings are converted to pandas datetime64 format.
The function assumes the presence of a utilities.reformat_object_vars utility for handling object variables.
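Example (a sketch; the file path is illustrative and the 'PIES' key is assumed to be present in the file):
from fetchAZA.readers import csv_to_xarray, standardise_dataset
datasets = csv_to_xarray('data/sample.csv')
ds_std = standardise_dataset(datasets['PIES'])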
tools
- fetchAZA.tools.convert_units(ds, key='')[source]
Convert the units of variables in an xarray Dataset to preferred units. This is useful, for instance, to convert cm/s to m/s.
- Parameters:
ds (xarray.Dataset) – The dataset whose variable units should be converted.
key (str, optional) – Identifier for the dataset (default: '').
- Returns:
The dataset with converted units.
- Return type:
xarray.Dataset
- fetchAZA.tools.convert_units_var(var_values, current_unit, new_unit, unit_conversion={'Celsius': {'factor': 1, 'units_name': 'degrees_Celsius'}, 'Deg C': {'factor': 1, 'units_name': 'Celcius'}, 'Pa': {'factor': 0.0001, 'units_name': 'dbar'}, 'S m-1': {'factor': 0.1, 'units_name': 'mS cm-1'}, 'S/m': {'factor': 0.1, 'units_name': 'mS/cm'}, 'cm': {'factor': 0.01, 'units_name': 'm'}, 'cm s-1': {'factor': 0.01, 'units_name': 'm s-1'}, 'cm/s': {'factor': 0.01, 'units_name': 'm/s'}, 'dbar': {'factor': 10, 'units_name': 'kPa'}, 'deg': {'factor': 1, 'units_name': 'degrees'}, 'degrees_Celcius': {'factor': 1, 'units_name': 'Celsius'}, 'degrees_Celsius': {'factor': 1, 'units_name': 'Celsius'}, 'g m-3': {'factor': 0.001, 'units_name': 'kg m-3'}, 'kPa': {'factor': 0.1, 'units_name': 'dbar'}, 'kPa s-1': {'factor': 0.1, 'units_name': 'dbar s-1'}, 'kg m-3': {'factor': 1000, 'units_name': 'g m-3'}, 'km': {'factor': 1000, 'units_name': 'm'}, 'm': {'factor': 0.001, 'units_name': 'km'}, 'm s-1': {'factor': 100, 'units_name': 'cm s-1'}, 'm/s': {'factor': 100, 'units_name': 'cm/s'}, 'mS cm-1': {'factor': 10, 'units_name': 'S m-1'}, 'mS/cm': {'factor': 10, 'units_name': 'S/m'}})[source]
Convert the values of a single variable from its current unit to a new unit. This is useful, for instance, to convert cm/s to m/s.
- Parameters:
var_values (numeric or array-like) – The values to be converted.
current_unit (str) – The unit of the input values.
new_unit (str) – The target unit after conversion.
unit_conversion (dict, optional) – Conversion table. Each key is a unit string, and each value is a dictionary with 'factor' (the factor to multiply the variable by) and 'units_name' (the new unit name after conversion).
- Returns:
The converted values.
- Return type:
numeric or array-like
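Example (a sketch of a single conversion using the default table; the input values are made up):
from fetchAZA.tools import convert_units_var
# 'kPa' maps to a factor of 0.1 and a new unit of 'dbar' in the default table.
pressure_dbar = convert_units_var([1010.0, 1020.0], 'kPa', 'dbar')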
- fetchAZA.tools.process_datasets(data_path, file_root, deploy_date, recovery_date, keys=['KLR', 'INC', 'DQZ', 'TMP', 'PIES', 'AZAseq'])[source]
Processes datasets by loading, transforming, and combining data from multiple sources.
Steps performed:
1. Load netCDF datasets based on provided keys.
2. Convert units and adjust time formats.
3. Assign sampling time for the AZA sequence dataset.
4. Filter datasets to the deployment period.
5. Reindex datasets on time.
6. Rename variables in datasets using predefined mappings.
7. Add dataset-specific attributes to variables.
8. Combine selected datasets into a single dataset.
9. Interpolate the combined dataset to an evenly spaced time grid.
10. Clean and organize dataset attributes and variables.
11. Process the AZA sequence dataset, including renaming attributes and cleaning variables.
External utilities and tools such as readers, timetools, and utilities are used for specific operations. Attributes with certain prefixes or exact matches are removed from the combined dataset.
- Parameters:
data_path (str) – Path to the directory containing the dataset files.
file_root (str) – Root name of the files to be processed.
deploy_date (str or datetime) – Deployment start date for filtering the datasets.
recovery_date (str or datetime) – Recovery end date for filtering the datasets.
keys (list of str, optional) – List of dataset keys to process. Defaults to [‘KLR’, ‘INC’, ‘DQZ’, ‘TMP’, ‘PIES’, ‘AZAseq’].
- Returns:
ds_pressure (xarray.Dataset): Combined and interpolated dataset containing pressure-related data.
ds_AZA (xarray.Dataset): Processed AZA sequence dataset with cleaned attributes.
- Return type:
tuple
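Example (a sketch; paths, file root, and dates are illustrative):
from fetchAZA.tools import process_datasets
ds_pressure, ds_AZA = process_datasets(
    'data/', 'sample', '2023-01-01', '2024-01-01',
    keys=['KLR', 'INC', 'DQZ', 'TMP', 'PIES', 'AZAseq'],
)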
- fetchAZA.tools.reformat_units_var(ds, var_name, unit_format={'Deg C': 'Celsius', 'S/m': 'S m-1', 'cm/s': 'cm s-1', 'degrees_Celsius': 'Celsius', 'g/m^3': 'g m-3', 'hex': 'hexadecimal', 'kPa/Sec': 'kPa s-1', 'm/s': 'm s-1', 'meters': 'm', 's': 'seconds'})[source]
Renames units in the dataset based on the provided dictionary for OG1.
- Parameters:
ds (xarray.Dataset) – The dataset containing the variable whose units should be renamed.
var_name (str) – The name of the variable whose units attribute should be reformatted.
unit_format (dict, optional) – A dictionary mapping original unit strings to their preferred format.
- Returns:
The dataset with renamed units.
- Return type:
xarray.Dataset
writers
- fetchAZA.writers.delete_netcdf_datasets(data_path, file_root, keys=['KLR', 'DQZ', 'PIES', 'TMP', 'INC'])[source]
Delete netCDF files matching the given file_root and optional keys from the specified data_path.
- Parameters:
data_path (str) – The directory path where the netCDF files are located.
file_root (str) – The root name of the files to match.
keys (list of str, optional) – A list of keys to filter the files. Only files containing these keys in their names will be deleted. If None, all matching files will be deleted.
- Returns:
The number of files successfully deleted.
- Return type:
int
- fetchAZA.writers.save_dataset(ds, output_file='../data/test.nc', overwrite=None)[source]
Attempts to save the dataset to a NetCDF file. If a TypeError occurs due to invalid attribute values, it converts the invalid attributes to strings and retries the save operation.
- Parameters:
ds (xarray.Dataset) – The dataset to be saved.
output_file (str) – The path to the output NetCDF file. Default is '../data/test.nc'.
overwrite (bool, optional) – If True, overwrite existing files. If False, skip existing files. If None, prompt the user for input.
- Returns:
True if the dataset was saved successfully, False otherwise.
- Return type:
bool
- fetchAZA.writers.save_datasets(data_sets_new, input_fn, overwrite=None)[source]
Save multiple datasets to NetCDF files with filenames derived from the input filename.
- Parameters:
data_sets_new (dict) – A dictionary where keys are dataset identifiers (e.g., strings) and values are the corresponding datasets to be saved.
input_fn (str) – The input filename used as a base to generate output filenames.
overwrite (bool, optional) – If True, overwrite existing files. If False, skip existing files. If None, prompt the user for input.
Notes
Iterates through the data_sets_new dictionary.
For each dataset, constructs an output filename by appending the dataset key to the base name of input_fn and adding the .nc extension.
Calls save_dataset to save each dataset to the corresponding output file.
Prints the key of each dataset being processed.
If saving a dataset fails, prints an error message indicating the failure.
Assumes the existence of a save_dataset function that handles the actual saving of datasets to files.
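Example (a sketch; the input file path is illustrative):
from fetchAZA.readers import read_csv_to_xarray
from fetchAZA.writers import save_datasets
data_sets_new = read_csv_to_xarray('data/sample.csv')
save_datasets(data_sets_new, 'data/sample.csv', overwrite=True)
# One NetCDF file is written per key, named from the base of input_fn plus the key.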
plotters
- fetchAZA.plotters.compare_pressure(ds_pressure, keys, STN)[source]
Compare pressure data between multiple time series in a dataset over a specified time range.
This function interpolates the pressure data from the specified time series onto a common time axis, calculates the differences between them, and plots the pressure data and their differences.
- Parameters:
ds_pressure (xarray.Dataset) – The dataset containing pressure data. It must have a time coordinate ('TIME', 'SAMPLE_TIME', or 'RECORD_TIME') and a pressure variable for each key in keys.
keys (list of str) – A list of strings identifying the pressure time series to compare in the dataset.
STN (str) – Station identifier used to annotate the plots.
- Returns:
fig (matplotlib.figure.Figure) – The figure object containing the plots.
axs (numpy.ndarray) – An array of axes objects corresponding to the subplots.
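Example (a sketch; the keys 'KLR' and 'PIES' are assumed to identify pressure series in the dataset):
from fetchAZA.tools import process_datasets
from fetchAZA.plotters import compare_pressure
ds_pressure, ds_AZA = process_datasets('data/', 'sample', '2023-01-01', '2024-01-01')
fig, axs = compare_pressure(ds_pressure, keys=['KLR', 'PIES'], STN='MOORING01')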
- fetchAZA.plotters.get_record_type(key)[source]
Retrieve the record type description for a given key or a list of keys.
- Parameters:
key (str or list) – The key or list of keys to look up in the record_types dictionary.
- Returns:
The record type description if the key is found, otherwise None.
- Return type:
str or None
- fetchAZA.plotters.plot_AZA_pressure(ds_AZA, variables, demean=False)[source]
Plot a comparison of pressure-related variables within a single dataset.
- Parameters:
ds_AZA (xarray.Dataset) – The dataset containing the variables to plot.
variables (list) – List of variable names (strings) to plot from the dataset.
demean (bool, optional) – If True, demean the data by subtracting the mean of each variable before plotting. Defaults to False.
- Returns:
fig (matplotlib.figure.Figure) – Figure object containing the plots.
axs (numpy.ndarray) – Array of axes objects corresponding to the subplots.
Notes
If the number of data points in a variable is less than or equal to 20, the data points are plotted with symbols.
The function automatically adjusts the number of columns in the subplot grid based on the number of variables.
Unused subplot axes are removed for cleaner visualization.
The y-axis is inverted for all plots.
Units for each variable are extracted from the dataset attributes and displayed on the y-axis label.
- fetchAZA.plotters.plot_age_values(ds_AZA, fig_path, time_var='SAMPLE_TIME', fig=None, ax=None)[source]
Plots AGE values for AZA and AZS datasets.
- Parameters:
ds_AZA (xarray.Dataset) – The AZA dataset.
fig_path (str) – Path to save the figure.
time_var (str) – Time variable name. Default is 'SAMPLE_TIME'.
fig (matplotlib.figure.Figure, optional) – Figure to plot into. If None, a new figure and axes are created.
ax (matplotlib.axes._axes.Axes, optional) – Axes to plot into. If None, a new figure and axes are created.
- Returns:
The axes containing the plot.
- Return type:
matplotlib.axes._axes.Axes
- fetchAZA.plotters.plot_all_variables(ds_pressure, STN, data_path=None)[source]
Plot and save figures for all variables in a given xarray dataset.
- Parameters:
ds_pressure (xarray.Dataset) – The dataset containing the variables to plot.
STN (str) – The station identifier.
data_path (str, optional) – The directory path where the figures will be saved. If None, the figures will be displayed instead.
- Return type:
None
Notes
This function cycles through all numeric variables in the dataset and creates time series plots.
Each figure contains up to 6 variables, and the figures are saved with filenames that include the station identifier and part number.
- fetchAZA.plotters.plot_data_type(data_sets_new, data_type, STN, deploy_date, recovery_date, data_path=None)[source]
Plot and save figures for a specified data type.
- Parameters:
data_sets_new (dict) – A dictionary containing xarray datasets, keyed by data type.
data_type (str) – The type of data to plot.
STN (str) – The station identifier.
deploy_date (str or datetime) – The start date for filtering the dataset.
recovery_date (str or datetime) – The end date for filtering the dataset.
data_path (str, optional) – The directory path where the figures will be saved. If None, the figures will be displayed instead.
- Return type:
None
Notes
This function checks if the specified data type exists in the data_sets_new dictionary.
If the data type exists, it selects the dataset for the given time range (deploy_date to recovery_date), identifies numeric variables, and creates figures to plot these variables over time.
Each figure contains up to 4 variables, and the figures are saved with filenames that include the data type, station identifier, and part number.
- fetchAZA.plotters.plot_hist_fig(data_sets_new, types_to_plot, deploy_date, recovery_date, data_path, STN)[source]
Generate and save histogram plots for specified data types within a given time range.
This function creates histograms for numeric variables in the provided datasets, filtered by the deployment and recovery dates. Each histogram is saved as an image file in the specified data path.
- Parameters:
data_sets_new (dict) – A dictionary containing xarray datasets, where keys are data types and values are the corresponding datasets.
types_to_plot (list) – A list of data types to generate histograms for.
deploy_date (str or datetime) – The start date for filtering the datasets.
recovery_date (str or datetime) – The end date for filtering the datasets.
data_path (str) – The directory path where the histogram images will be saved. If None, the images will not be saved.
STN (str) – The station identifier to include in the plot titles and saved filenames.
- Return type:
None
Notes
The function assumes that the datasets are xarray objects with a ‘TIME’ dimension.
Only numeric variables in the datasets are considered for plotting.
If a variable has a timedelta64 dtype, it is converted to seconds before plotting.
The function creates subplots with up to 12 histograms per figure.
If there are fewer variables than subplot slots, the extra slots are removed.
The mean of each variable is marked on the histogram with a red dashed line.
The saved filenames follow the format: ‘{STN}_{data_type}_histograms.png’.
Example
plot_hist_fig(
    data_sets_new=my_datasets,
    types_to_plot=['type1', 'type2'],
    deploy_date='2023-01-01',
    recovery_date='2023-12-31',
    data_path='/path/to/save',
    STN='Station123',
)
- fetchAZA.plotters.plot_histograms(ds, STN)[source]
Plot histograms of all data variables in the dataset.
This function creates a grid of histograms showing the distribution of values for each data variable in the dataset. The histograms are arranged in a grid with up to 4 columns, and unused subplot positions are removed.
- Parameters:
ds (xarray.Dataset) – The dataset containing variables to plot. All data variables in the dataset will have histograms created.
STN (str) – Station identifier that will be displayed in the top right corner of the figure.
- Returns:
The function displays the plot but does not return any values.
- Return type:
None
Notes
NaN values are automatically excluded from the histograms.
Each histogram uses 30 bins by default.
The grid layout uses up to 4 columns with the number of rows calculated based on the number of variables.
Unused subplot axes are removed for cleaner visualization.
- fetchAZA.plotters.plot_pressure_diff_and_age(ds, key1, key2, seqnum)[source]
Plots the pressure difference, AGE, and temperatures for specified sequences in a dataset.
- Parameters:
ds (xarray.Dataset) – The dataset containing the data to be plotted. It should include variables such as ‘SEQUENCE_NUM’, ‘SAMPLE_TIME’, ‘AGE’, ‘AMBIENT_TEMPERATURE’, and ‘TRANSFER_TEMPERATURE’.
key1 (str) – The name of the first pressure variable in the dataset.
key2 (str) – The name of the second pressure variable in the dataset.
seqnum (list of int) – A list of sequence numbers to filter and plot.
- Returns:
fig (matplotlib.figure.Figure) – The figure object containing the plots.
axs (numpy.ndarray of matplotlib.axes._subplots.AxesSubplot) – The array of axes objects for the subplots.
Notes
The function calculates the difference between the two pressure variables (key1 and key2) and plots it against time for each sequence in seqnum.
The AGE variable is converted to seconds and plotted against time.
Ambient and transfer temperatures are plotted on the third subplot.
The x-axis is formatted to display time in 2-month intervals.
The temperature subplot’s y-axis limits are determined based on the median and standard deviation of the combined temperature data.
The function assumes that the dataset variables have attributes standard_name and units for proper labeling.
A station identifier (STN) is expected to be available in the global scope for annotation.
- Raises:
KeyError – If any of the required variables (key1, key2, 'AGE', 'AMBIENT_TEMPERATURE', 'TRANSFER_TEMPERATURE', or 'SEQUENCE_NUM') are missing from the dataset.
- fetchAZA.plotters.plot_pressure_sequence(ds, key1, STN)[source]
Plot pressure data for different sequence numbers with comparisons and differences.
This function creates a three-panel plot showing pressure data for different sequence numbers. It compares AZA (sequences 2 and 4) with AZS (sequences 1 and 5) data and displays the differences between paired sequences.
- Parameters:
ds (xarray.Dataset) – The dataset containing pressure data with ‘SEQUENCE_NUM’ and ‘SAMPLE_TIME’ coordinates, and the specified pressure variable.
key1 (str) – The name of the pressure variable to plot (e.g., ‘AMBIENT_PRESSURE’). Must be present in the dataset with ‘units’, ‘standard_name’, and ‘long_name’ attributes.
STN (str) – Station identifier for annotation (currently unused in the function).
- Returns:
fig (matplotlib.figure.Figure) – The figure object containing all three subplots.
axs (numpy.ndarray) – Array of three axes objects corresponding to the subplots.
Notes
The function filters data for sequence numbers 1, 2, 4, and 5.
Left panel: Comparison of sequences 1 (AZS) and 2 (AZA).
Middle panel: Comparison of sequences 4 (AZA) and 5 (AZS).
Right panel: Differences between paired sequences (2-1 and 5-4).
AZA sequences are shown in blue solid lines, AZS in red dashed lines.
The function expects the variable to have ‘units’, ‘standard_name’, and ‘long_name’ attributes.
- fetchAZA.plotters.plot_temperature_variables(ds_pressure, keys, STN, fig=None, axs=None)[source]
Plot temperature variables from a dataset against time.
This function creates time series plots of temperature variables, automatically determining the appropriate time coordinate and creating subplots for each specified variable key.
- Parameters:
ds_pressure (xarray.Dataset) – The dataset containing temperature variables. Should have a time coordinate such as ‘RECORD_TIME’ or ‘TIME’.
keys (str or list of str) – A single key (str) or a list of keys to match variable names for plotting. Each key will generate a separate subplot.
STN (str) – Station identifier for annotation on the plots.
fig (matplotlib.figure.Figure, optional) – Existing figure to plot into. If None, a new figure is created. Default is None.
axs (numpy.ndarray or matplotlib.axes._axes.Axes, optional) – Existing axes to plot into. If None, new axes are created. Default is None.
- Returns:
fig (matplotlib.figure.Figure) – The figure object containing the plots.
axs (numpy.ndarray or list) – The array or list of axes objects corresponding to the subplots.
Notes
The function prioritizes ‘TIME’ over ‘RECORD_TIME’ as the time coordinate.
If neither is found, it uses the first coordinate in the dataset.
Each key gets its own subplot with shared x-axis for time comparison.
- fetchAZA.plotters.plot_temperatures(ds, fig=None, ax=None, ylim=None)[source]
Plot transfer and ambient temperature time series from a dataset.
This function creates a time series plot showing both transfer temperature and ambient temperature variables from the dataset. The y-axis limits are automatically calculated based on the data range (mean ± 2 standard deviations) unless specified.
- Parameters:
ds (xarray.Dataset) – The dataset containing temperature variables. Must have ‘TRANSFER_TEMPERATURE’, ‘AMBIENT_TEMPERATURE’, and ‘SAMPLE_TIME’ variables, and a ‘Station’ attribute.
fig (matplotlib.figure.Figure, optional) – Existing figure to plot into. If None, a new figure is created. Default is None.
ax (matplotlib.axes._axes.Axes, optional) – Existing axes to plot into. If None, new axes are created. Default is None.
ylim (tuple of float, optional) – Y-axis limits as (lower_limit, upper_limit). If None, limits are automatically calculated from the data. Default is None.
- Returns:
fig (matplotlib.figure.Figure) – The figure object containing the plot.
ax (matplotlib.axes._axes.Axes) – The axes object containing the plot.
Notes
The function expects specific variable names: ‘TRANSFER_TEMPERATURE’, ‘AMBIENT_TEMPERATURE’, and ‘SAMPLE_TIME’.
Y-axis limits are auto-calculated as mean ± 2 standard deviations if not provided.
Station information is displayed in the bottom right corner of the plot.
X-axis labels are rotated 45 degrees for better readability.
- fetchAZA.plotters.show_attributes(data)[source]
Extract and display attribute information from an xarray Dataset or a netCDF file.
- Parameters:
data (str or xr.Dataset) – The input data, either a file path to a netCDF file or an xarray Dataset.
- Returns:
A DataFrame containing the following columns:
- Attribute: The name of the attribute.
- Value: The value of the attribute.
- DType: The data type of the attribute value.
- Return type:
pandas.DataFrame
Notes
If the input is a file path, the function reads the attributes from the netCDF file.
If the input is an xarray Dataset, the function reads the attributes directly from the Dataset.
- fetchAZA.plotters.show_contents(data, content_type='variables')[source]
Display the contents of an xarray Dataset or a netCDF file.
- Parameters:
data (str or xr.Dataset) – The input data, either a file path to a netCDF file or an xarray Dataset.
content_type (str) – The type of content to display. Options are: - ‘variables’ or ‘vars’: Show details about the variables. - ‘attributes’ or ‘attrs’: Show details about the attributes. Default is ‘variables’.
- Returns:
A styled DataFrame with details about the variables or attributes.
- Return type:
pandas.io.formats.style.Styler or pandas.DataFrame
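Example (a sketch; the file name is hypothetical):
from fetchAZA.plotters import show_contents
show_contents('data/sample_KLR.nc', content_type='attrs')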
- fetchAZA.plotters.show_variables(data)[source]
Extract variable information from an xarray Dataset or a netCDF file.
- Parameters:
data (str or xr.Dataset) – The input data, either a file path to a netCDF file or an xarray Dataset.
- Returns:
A styled DataFrame containing the following columns:
- dims (str): The dimension of the variable (or "string" if it is a string type).
- name (str): The name of the variable.
- units (str): The units of the variable (if available).
- comment (str): Any additional comments about the variable (if available).
- Return type:
pandas.io.formats.style.Styler
- fetchAZA.plotters.show_variables_by_dimension(data, dimension_name='trajectory')[source]
Extract variable information filtered by a specific dimension from an xarray Dataset or a netCDF file.
- Parameters:
data (str or xr.Dataset) – The input data, either a file path to a netCDF file or an xarray Dataset.
dimension_name (str) – The name of the dimension to filter variables by.
- Returns:
A styled DataFrame containing the following columns:
- dims (str): The dimension of the variable (or "string" if it is a string type).
- name (str): The name of the variable.
- units (str): The units of the variable (if available).
- comment (str): Any additional comments about the variable (if available).
- Return type:
pandas.io.formats.style.Styler
utilities
- fetchAZA.utilities.convert_float_to_int(ds)[source]
Converts float variables in an xarray dataset to integer type if all values are sufficiently close to their integer representation.
- Parameters:
ds (xarray.Dataset) – An xarray dataset containing variables to be checked and potentially converted.
- Returns:
The dataset is modified in place, with applicable variables converted to integer type.
- Return type:
None
- fetchAZA.utilities.convert_type(value)[source]
Convert a string value to an appropriate type.
This function attempts to convert a string value to an integer or a float. If the value contains a leading zero and is entirely numeric, it is returned as a string to preserve the leading zero. If the value contains a decimal point, it is converted to a float. If the value is numeric without a decimal point, it is converted to an integer. If the conversion to an integer or float fails, the value is returned as a string.
- Parameters:
value (str) – The string value to be converted.
- Returns:
The converted value as an integer, float, or string.
- Return type:
int, float, or str
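Example (expected behaviour based on the rules above; illustrative, not verified against the implementation):
from fetchAZA.utilities import convert_type
convert_type('42')    # -> 42 (int)
convert_type('3.14')  # -> 3.14 (float)
convert_type('007')   # -> '007' (kept as a string to preserve the leading zero)
convert_type('N/A')   # -> 'N/A' (falls back to string)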
- fetchAZA.utilities.find_best_dtype(var_name, da)[source]
Determines the most appropriate data type for a given variable based on its name and data array.
Rules
If the variable name contains “latitude” or “longitude” (case-insensitive), return np.double.
If the variable name ends with “qc” (case-insensitive), return np.int8.
If the variable name contains “time” (case-insensitive), return the input data type.
If the variable name ends with "raw" or the input data type is an integer:
- If the maximum value in the data array is less than 2^15, return np.int16.
- If the maximum value in the data array is less than 2^31, return np.int32.
If the input data type is np.float64, return np.float32.
Otherwise, return the input data type.
- Parameters:
var_name (str) – The name of the variable. This is used to infer the type based on naming conventions.
da (xarray.DataArray) – The data array containing the variable's values and its current data type.
- Returns:
The recommended data type for the variable.
- Return type:
numpy.dtype
- fetchAZA.utilities.netcdf_compliancer(ds)[source]
Check for variables and attributes with empty string values in the dataset.
- Parameters:
ds (xarray.Dataset) – The dataset to check.
- Return type:
None
- fetchAZA.utilities.reformat_object_vars(data)[source]
Fix variables with mixed data types in xarray datasets or a single xarray dataset.
This function processes variables with mixed data types (e.g., object type) by attempting to convert them to a single consistent type (e.g., float, int, or string). It removes common fill values (‘0’) and checks the remaining values to determine the appropriate type. If conversion to integer is not possible, the variable is converted to a string.
- Parameters:
data (dict or xarray.Dataset) – A dictionary of xarray datasets or a single xarray dataset.
- Returns:
A dictionary of modified xarray datasets or a single modified xarray dataset with variables of type object converted to numeric (if possible) or string (otherwise).
- Return type:
dict or xarray.Dataset
- fetchAZA.utilities.set_best_dtype(ds)[source]
Adjusts the data types of variables in an xarray Dataset to optimize memory usage while preserving data integrity. The function evaluates each variable’s current data type and determines a more efficient data type, if applicable. It also updates attributes like valid_min and valid_max to match the new data type.
- Parameters:
ds (xarray.Dataset) – The input dataset whose variables’ data types will be evaluated and potentially adjusted.
- Returns:
A new dataset with optimized data types for its variables, potentially saving memory space.
- Return type:
xarray.Dataset
Notes
If a variable’s data type is changed to an integer type, NaN values are replaced with a fill value, and the _FillValue encoding is updated accordingly.
Logs the percentage of memory saved due to data type adjustments.
Relies on the helper functions find_best_dtype and set_fill_value to determine the optimal data type and appropriate fill value, respectively.
Logs debug messages for each variable whose data type is changed, including the original and new data types.
Logs an info message summarizing the overall memory savings.
Assumes that find_best_dtype and set_fill_value are defined elsewhere in the codebase.
Assumes that _log is a configured logger available in the current scope.
- fetchAZA.utilities.set_fill_value(new_dtype)[source]
Calculate and return the maximum fill value for a given numeric data type. The function extracts the bit-width of the provided data type (e.g., int32, uint16) and computes the maximum value that can be represented by that type, assuming it is a signed integer type.
- Parameters:
new_dtype (str) – A string representation of the data type (e.g., ‘int32’, ‘uint16’).
- Returns:
The maximum fill value for the given data type.
- Return type:
int
- Raises:
ValueError – If the bit-width cannot be extracted from the provided data type string.