Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dolfyn cleaning function updates #354

Open
wants to merge 16 commits into
base: develop
Choose a base branch
from

Conversation

jmcvey3
Copy link
Contributor

@jmcvey3 jmcvey3 commented Sep 18, 2024

Renames and makes the functions that calculate water depth and remove surface interference more robust. The functions were designed around Nortek ADCPs but didn't take into account the calculations RDI already does. Errors or warnings will guide the user through the proper use of functions if previous calculations were already completed or data is missing.

Also removes the "ignore rankwarning" as some users may not want to do so, and updates future package deprecations.

Solves the problems Calum and I found in Issue #353. Tests were updated as appropriate.

Also added some fixes for determining beam angle and carrier frequency attributes for Sentinel V ADCPs.

@ssolson ssolson mentioned this pull request Oct 28, 2024
@ssolson ssolson self-requested a review October 29, 2024 20:10
@ssolson ssolson added the Clean Up Improve code consistency and readability label Oct 29, 2024
@ssolson
Copy link
Contributor

ssolson commented Oct 29, 2024

@jmcvey3 is this ready for review?

@ssolson ssolson linked an issue Oct 29, 2024 that may be closed by this pull request
@jmcvey3
Copy link
Contributor Author

jmcvey3 commented Oct 29, 2024

@jmcvey3 is this ready for review?

It is. I've also done some serious reorganizing of the I/O files in an attempt to make them easier to follow, and have added a few random fixes for random hiccups. The fix for 535 is in the first couple commits.

@ssolson
Copy link
Contributor

ssolson commented Oct 30, 2024

Thanks James. I plan to tackle this Monday.

Copy link
Contributor

@ssolson ssolson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jmcvey3 thanks for fixing the bug and improving the modularity of the dolfyn module.

I was adding docstrings to what I thought were new functions but realized they were primarily moved functions in our meeting yesterday. As such I will finish improving docstrings when we get to linting this module in the future.

That said could you review the proposed docstrings I already did?

Comment on lines 14 to 16
"""
Fetches or calculates range offset from dataset attributes.
"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"""
Determines the range offset from the dataset attributes. The function 
first checks if specific attributes are present in the dataset (`ds`) 
and calculates the range offset accordingly. If the attribute `h_deploy` 
is found, it is renamed to `range_offset` with a deprecation warning.

Parameters
----------
ds : xarray.Dataset
    Dataset containing attributes used to calculate the range offset.

Returns
-------
float
    The calculated or retrieved range offset. Returns 0 if no 
    relevant attributes are present.

Raises
------
DeprecationWarning
    Warns that the attribute `h_deploy` is deprecated and has been 
    renamed to `range_offset`.
"""



def set_range_offset(ds, h_deploy):
def __check_for_range_offset(ds):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know we haven't done typehints here yet but we will eventually. Wouldn't hurt to start with new functions now.

def __check_for_range_offset(ds: "xarray.Dataset") -> float:

Comment on lines 95 to 96
def find_surface(*args, **kwargs):
warnings.warn(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets include a docstring to catch the deprecation in our documentation/ display in IDEs.

    """
    Deprecated function. Use `water_depth_from_amplitude` instead.
    """

Comment on lines 183 to 184
def find_surface_from_P(*args, **kwargs):
warnings.warn(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Include docstring deprecation for documentation/ IDEs

    """
    Deprecated function. Use `water_depth_from_pressure` instead.
    """

Comment on lines 282 to 283
def nan_beyond_surface(*args, **kwargs):
warnings.warn(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Include deprecation docstring

    Deprecated function. Use `remove_surface_interference` instead.

@@ -1352,14 +732,16 @@ def check_offset(self, offset, readbytes):
fd.seek(4 + self._fixoffset, 1)

def remove_end(self, iens):
"""Remove uncompleted measurements"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"""
Removes incomplete measurements from the dataset.

This method cleans up any partially read data by truncating measurements 
to the specified ensemble index (`iens`). This is typically called upon 
reaching the end of the file to ensure only complete data is retained.

Parameters
----------
iens : int
    The index up to which data is considered complete and should be retained.
"""


def save_profiles(self, dat, nm, en, iens):
ds = defs._get(dat, nm)
"""Reformat and save profile measurements"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"""
Reformats and saves profile measurements to the dataset.

This method processes and saves profile data from individual measurements, 
adapting to changing cell counts and cell sizes as needed. It appends 
the processed profile data to the specified location in the dataset.

Parameters
----------
dat : dict
    The dataset dictionary where profile measurements are stored.
nm : str
    The name of the measurement variable being saved.
en : dict
    A dictionary containing ensemble data with measurement arrays for each variable.
iens : int
    The index of the current ensemble, indicating where to save data in the profile.

Returns
-------
dict
    The updated dataset dictionary with the reformatted and saved profile data.
"""

Comment on lines 785 to 786
"""Clean up recorded data, particularly variablem cell sizes
and profile ranges."""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"""
Cleans up recorded data by adjusting variable cell sizes and profile ranges.

This method handles adjustments when cell sizes change during data collection, 
performing depth-bin averaging for smaller cells if needed. It also updates 
the configuration data, range coordinates, and manages any surface layer profiles.

Parameters
----------
dat : dict
    The dataset dictionary containing data variables and coordinates to be cleaned up.
cfg : dict
    Configuration dictionary, which is updated with cell size, range, and additional 
    attributes after cleanup.

Returns
-------
tuple
    - dict : The updated dataset dictionary with cleaned data.
    - dict : The updated configuration dictionary with new attributes.
"""

"""
Reshape the array `arr` to shape (...,n,n_bin).
"""
"""Reshape the array `arr` to shape (...,n,n_bin)."""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"""
Reshapes the input array `arr` to a shape of (..., n, n_bin).

This method reshapes the last dimension of `arr` into two dimensions, where 
the last two dimensions become `(n, n_bin)`.

Parameters
----------
arr : np.ndarray
    The array to reshape. The last dimension of `arr` will be divided into 
    `(n, n_bin)` based on the value of `n_bin`.
n_bin : int or float, optional
    The bin size for reshaping. If `n_bin` is an integer, it divides the 
    last dimension directly. If not, indices are adjusted, and excess 
    bins may be removed. Default is None.

Returns
-------
np.ndarray
    The reshaped array with shape (..., n, n_bin).
"""

"""
Remove the attributes from the data that were never loaded.
"""
"""Remove the attributes from the data that were never loaded."""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"""
This method cleans up the dataset by removing any attributes that were
defined but not loaded, updates configuration attributes, and sets the
sampling frequency (fs) based on the data source program. Additionally,
it adjusts the axes of certain variables as defined in data_defs.

Parameters
----------
dat : dict
    The dataset dictionary to be finalized. This dictionary is modified 
    in place by removing unused attributes, setting configuration values 
    as attributes, and calculating `fs`.

Returns
-------
dict
    The finalized dataset dictionary with cleaned attributes and added metadata.
"""

@jmcvey3
Copy link
Contributor Author

jmcvey3 commented Nov 14, 2024

@jmcvey3 thanks for fixing the bug and improving the modularity of the dolfyn module.

I was adding docstrings to what I thought were new functions but realized they were primarily moved functions in our meeting yesterday. As such I will finish improving docstrings when we get to linting this module in the future.

That said could you review the proposed docstrings I already did?

Yep, and sounds like a plan

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Clean Up Improve code consistency and readability
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Error from api.clean.find_surface_from_P(ds, salinity=XX)
2 participants