Support other filename prefixes than 'D' #291

Closed
nordam opened this issue Sep 18, 2024 · 5 comments
Assignees
Labels
minor new feature intended for changes that require bumping MINOR version. New features that are backwards-compatible. wontfix This will not be worked on

Comments

@nordam
Collaborator

nordam commented Sep 18, 2024

No description provided.

@emlynjdavies emlynjdavies added the minor new feature intended for changes that require bumping MINOR version. New features that are backwards-compatible. label Sep 18, 2024
@emlynjdavies
Collaborator

Longer-term plan is to implement a silcam data structure where the timestamp is contained in metadata, rather than the filename (see #288).

Instead, we will better document the currently accepted filename format (see #294).

@emlynjdavies emlynjdavies closed this as not planned Sep 19, 2024
@emlynjdavies emlynjdavies added the wontfix This will not be worked on label Sep 19, 2024
@nordam nordam reopened this Oct 11, 2024
@nordam
Collaborator Author

nordam commented Oct 11, 2024

Sorry to nag on this, but how about changing the date parser function as shown below? It uses what I think is a fairly robust regex to find a substring in the filename that matches the date format, and only if the default approach of removing the first letter didn't work.


import logging
import os
import re

import pandas as pd


def timestamp_from_filename(filename):
    '''get a pandas timestamp from a silcam filename

    Parameters
    ----------
    filename : str
        silcam filename (.silc)

    Returns
    -------
    timestamp : timestamp
        timestamp from pandas.to_datetime()
    '''

    # get the timestamp of the image (in this case from the filename)
    basename = os.path.basename(filename)
    try:
        # Default approach: Skip the first letter, parse the rest
        timestamp = pd.to_datetime(os.path.splitext(basename)[0][1:])
    except ValueError as e:
        # pandas' DateParseError is a subclass of ValueError, so it is caught here
        # Backup approach: try looking for a standard datestring,
        # assuming we are still in the 21st century, i.e. that the datestring
        # is of the form 20yymmddThhmmss.ffffff
        # The regex below will match 20 followed by six digits followed by T
        # followed by six digits followed by . followed by zero or more digits
        # Search the basename only, so dates in directory names are ignored
        datestrings = re.findall(r'20[0-9]{6}T[0-9]{6}\.[0-9]*', basename)
        # findall returns a list, only proceed if exactly 1 matching string was found
        if len(datestrings) != 1:
            logger = logging.getLogger()
            logger.error(f'Default date parsing failed with error: {e}')
            logger.error('Fallback date parser could not uniquely identify a single datestring')
            raise
        timestamp = pd.to_datetime(datestrings[0])

    return timestamp

@emlynjdavies
Collaborator

Yeah, this could be an approach that enhances the functionality without complicating the configs and input arguments on the user side, although it does still assume a specific datestring.

I suppose another alternative is to implement an optional input that defines the string format, which could be passed directly to the optional format argument of pd.to_datetime().
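
For illustration, a minimal sketch of what a user-supplied format string could look like. It uses datetime.strptime from the standard library so that it runs without pandas; the same strptime-style directive string is the kind of value that the format argument of pd.to_datetime() accepts. The function name parse_stem_with_format, the date_format parameter and the RGB prefix are hypothetical and not part of the existing code.

```python
import os
from datetime import datetime


def parse_stem_with_format(filename, date_format):
    """Parse the filename stem using an explicit strptime-style format.

    Hypothetical helper: `date_format` must describe the whole stem,
    including any literal prefix characters.
    """
    # Drop the directory and the final extension (e.g. ".silc")
    stem = os.path.splitext(os.path.basename(filename))[0]
    # Literal characters in the format (here a prefix) must match exactly
    return datetime.strptime(stem, date_format)


# Assumed example: a three-letter prefix instead of the default 'D'
ts = parse_stem_with_format(
    "/data/RGB20240918T120000.123456.silc",
    "RGB%Y%m%dT%H%M%S.%f",
)
```

With this approach the prefix becomes part of the format, so nothing about the filename is guessed, at the cost of one more setting for the user to supply.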

@nordam
Collaborator Author

nordam commented Oct 11, 2024

It makes the same assumptions about the datestring as the default, since the modified code also uses pd.to_datetime to create the timestamp. The only change is that the datestring can appear elsewhere in the filename.

(Well, almost the same. I added the assumption that the year starts with 20, to make spurious matches less likely. A spurious match would in any case require the filename to contain 20, six digits, a T, six more digits, a dot, and possibly some more digits for some reason other than representing the timestamp, which sounds unlikely to begin with.)
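
To make the matching behaviour concrete, here is a small sketch that applies the same regex to a few made-up filenames. The helper find_datestrings and the example filenames are assumptions for illustration only.

```python
import re

# Same pattern as in the proposed fallback parser
DATE_PATTERN = r'20[0-9]{6}T[0-9]{6}\.[0-9]*'


def find_datestrings(name):
    """Return every substring of `name` that looks like 20yymmddThhmmss.ffffff."""
    return re.findall(DATE_PATTERN, name)


# A longer prefix still yields exactly one match
single = find_datestrings("RGB20240918T120000.123456.silc")

# No datestring at all: the fallback would log an error and re-raise
none_found = find_datestrings("image_0001.silc")

# Two datestrings: ambiguous, so the fallback would also refuse to guess
ambiguous = find_datestrings("20240918T120000.000000_20240918T120001.500000.silc")

# A year that does not start with 20 is not matched
old_year = find_datestrings("D19991231T235959.000000.silc")
```

Only the first case would produce a timestamp. The other three return zero or two matches and so hit the `len(datestrings) != 1` branch.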

@nordam
Collaborator Author

nordam commented Oct 11, 2024

Raymond says no.

@nordam nordam closed this as completed Oct 11, 2024