Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Fix auto-reload with globs in IGNORE_FILES #3441

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

yashaslokesh
Copy link

@yashaslokesh yashaslokesh commented Jan 2, 2025

The default IGNORE_FILES value of '.*' was being compiled to a regular expression and then passed into watchfiles.DefaultFilter to filter out files when autoreload is enabled.

This commit compares the patterns from IGNORE_FILES directly with the filename to match the usage of fnmatch elsewhere in the codebase. The new filtering class will continue working as expected for custom IGNORE_FILES settings.

Pull Request Checklist

Resolves: #3431

  • Ensured tests pass and (if applicable) updated functional test output
  • Conformed to code style guidelines by running appropriate linting tools
  • Added tests for changed code
  • Updated documentation for changed code

I didn't update any documentation because the current instructions for IGNORE_FILES are correct, this setting is still expected to be a list of glob patterns.

Edit: January 14th - Wrote documentation to further explain the IGNORE_FILES setting and the new default value

Also tested locally in addition to adding unit tests, both with the default settings and with IGNORE_FILES = ['*.md']

  • Changes to content and theme files matching the pattern did not trigger an auto-reload
  • Changes to content and theme files NOT matching the pattern triggered an auto-reload.

@justinmayer justinmayer requested a review from a team January 4, 2025 09:25
Copy link
Contributor

@boxydog boxydog left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Love that this has unit tests, which give a concrete idea of what the behavior should be.

I don't love that the behavior seems a bit hard for me to understand, though it could just be me.

self.ignore_file_patterns = ignore_file_patterns

def __call__(self, change: watchfiles.Change, path: str) -> bool:
return super().__call__(change, path) and not any(
Copy link
Contributor

@boxydog boxydog Jan 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand this logic. Is it something like:

"Ignore the file if it regex-matches any pattern in ignore_file_patterns, unless the basename of the file also fn-matches any pattern in ignore_file_patterns"?

That seems quite subtle to me, i.e. a bit hard to understand.

I could be wrong, though, and it's not obvious to me what the simple solution is.

@yashaslokesh
Copy link
Author

ah, I can add more comments describing the purpose of the filter class and how it's used with the watchfiles functions

For a brief summary for you, fnmatch is used throughout the code for matching filename/filepath strings with glob patterns. The watchfiles requires a list of regexes to be used instead of glob patterns. To avoid confusion or any incompatibilities with this conversion, I decided to skip the regex conversion (fnmatch.translate is not used) and just use fnmatch.fnmatch in the code

The filter class is setup to return True if a file should be processed, i.e. if it is not ignored

The design for the watchfiles module is a little convoluted. As for why the unit tests are looping over the watchfiles.Change enum, this is to ensure that files are processed on addition, deletion, AND modification.

@boxydog
Copy link
Contributor

boxydog commented Jan 5, 2025

ah, I can add more comments describing the purpose of the filter class and how it's used with the watchfiles functions

For a brief summary for you, fnmatch is used throughout the code for matching filename/filepath strings with glob patterns. The watchfiles requires a list of regexes to be used instead of glob patterns. To avoid confusion or any incompatibilities with this conversion, I decided to skip the regex conversion (fnmatch.translate is not used) and just use fnmatch.fnmatch in the code

The filter class is setup to return True if a file should be processed, i.e. if it is not ignored

The design for the watchfiles module is a little convoluted.

Thanks for your reply. Sounds like "just match with unix globs instead of regexes", which is pretty easy to understand and use. If that's true, then great.

Sorry I am slow, but why call super().__call__(change, path) if you're just replacing the behavior with fnmatch? Could you just call fnmatch?

As for why the unit tests are looping over the watchfiles.Change enum, this is to ensure that files are processed on addition, deletion, AND modification.

I don't think I asked about this?, but anyway I was pretty happy with the unit tests.

@yashaslokesh
Copy link
Author

yashaslokesh commented Jan 5, 2025

I'm doing the super call to leave in the sensible defaults set by watchfiles.DefaultFilter, you can see the lists of those here, https://watchfiles.helpmanual.io/api/filters/#watchfiles.DefaultFilter.ignore_dirs

I was debating whether to keep the super call and just use fnmatch for the filter. Only using fnmatch gives more power to the Pelican user to decide which files to ignore at the cost of having to write a longer IGNORE_FILES list

I don't think I asked about this?, but anyway I was pretty happy with the unit tests.

Sorry, was elaborating to get ahead of possible questions or concerns from other contributors 😊

@boxydog
Copy link
Contributor

boxydog commented Jan 6, 2025

I'm doing the super call to leave in the sensible defaults set by watchfiles.DefaultFilter, you can see the lists of those here, https://watchfiles.helpmanual.io/api/filters/#watchfiles.DefaultFilter.ignore_dirs

I was debating whether to keep the super call and just use fnmatch for the filter. Only using fnmatch gives more power to the Pelican user to decide which files to ignore at the cost of having to write a longer IGNORE_FILES list

I don't think I asked about this?, but anyway I was pretty happy with the unit tests.

Sorry, was elaborating to get ahead of possible questions or concerns from other contributors 😊

This all sounds reasonable to me. I would put it in code comments, as you suggested, and docs, so a casual reader knows what's going on.

They both should say that the format is "unix glob" format plus commonly hidden files (with a link to the default for filewatcher).

@justinmayer
Copy link
Member

@yashaslokesh: Any chance you could add the comments @boxydog mentioned? I agree that would help make things clearer, and I would like to release a new version of Pelican in a day or two if possible.

@yashaslokesh
Copy link
Author

hi! yes, sorry, got sidetracked at work for a few days. I found a way to improve my implementation as well, I will try to get these new changes with the comments out today

…h Unix shell wildcards

The default `IGNORE_FILES` value of `'.*'` was being compiled to a regular expression and then passed into `watchfiles.DefaultFilter` to filter out files when autoreload is enabled.

This commit compares the patterns from `IGNORE_FILES` directly with the filename to match the usage of `fnmatch` elsewhere in the codebase. The new filtering class will continue working as expected for custom `IGNORE_FILES` settings.
@yashaslokesh
Copy link
Author

yashaslokesh commented Jan 14, 2025

@justinmayer Fixed the IGNORE_FILES setting to ignore hidden files in all directories, wrote more comprehensive testing, updated documentation and comments with these new changes, and squashed commits.

My previous changes compared the IGNORE_FILES patterns only to the basename of the file path, but this won't allow the user to use Unix globs to their full potential. I made the change to compare against the full file path using os.path.abspath(path) and this also required changing the IGNORE_FILES default value to ignore all hidden files in any directories, from ['.*'] -> ['**/.*'].

@pauloxnet
Copy link
Member

I confirm that the new IGNORE_FILES works locally for me, but I hadn't the time to deeply review all the code changes. I see some test failures, can you fix those before ask for review?

@justinmayer
Copy link
Member

The test failures shown here are almost certainly not related to this pull request. I am currently investigating, but the cause is likely related to some dependency change and/or the Typogrify-related changes @davidlesieur and I have been collaborating on.

@justinmayer
Copy link
Member

Regarding the aforementioned test failures, I should have known… Pygments strikes again: #3446

@justinmayer justinmayer changed the title fix #3431: Change autoreload behavior to work correctly with Unix shell wildcards fix: Fix auto-reload with globs in IGNORE_FILES Jan 14, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

pelican -lr not reloading on content changes in 4.10.2 on Windows 11
4 participants