-
Notifications
You must be signed in to change notification settings - Fork 65
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement the codespell pre-commit hook #403
Implement the codespell pre-commit hook #403
Conversation
13f540d
to
e2e4bfe
Compare
e2e4bfe
to
446ae05
Compare
@drbenvincent -- hey just hoping to see if this is the right direction for this PR. |
Hi @westford14 thanks for the PR! I'll be able to take a look and do a review later today/tomorrow 🙂 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR @westford14. This looks pretty good to me. We'll definitely merge, but I've got a suggested edit...
If you are running into loads of issues with the notebooks, then that's fine. Though it's a bit of a shame because there are likely a number of errors in notebooks. What about converting the notebooks into markdown files and running codespell on those markdown files? ChatGPT suggests a way:
python script to convert notebooks to markdown
import nbformat
from nbconvert import MarkdownExporter
import sys
import os
def notebook_to_markdown(notebook_path, output_dir):
with open(notebook_path, 'r', encoding='utf-8') as f:
nb = nbformat.read(f, as_version=4)
markdown_exporter = MarkdownExporter()
(body, resources) = markdown_exporter.from_notebook_node(nb)
# Ensure the output directory exists
os.makedirs(output_dir, exist_ok=True)
# Create output file name based on the notebook name
output_file = os.path.join(output_dir, os.path.splitext(os.path.basename(notebook_path))[0] + ".md")
with open(output_file, 'w', encoding='utf-8') as f:
f.write(body)
if __name__ == "__main__":
if len(sys.argv) != 3:
print("Usage: python notebook_to_markdown.py <notebook.ipynb> <output_dir>")
else:
notebook_path = sys.argv[1]
output_dir = sys.argv[2]
notebook_to_markdown(notebook_path, output_dir)
We'd also have to update the optional docs dependencies for those extra imports.
Add that temp folder to .gitignore
# Ignore the temporary markdown files directory
tmp_markdown/
Run the script in the pre-commit checks
- repo: local
hooks:
- id: convert-notebooks
name: Convert Notebooks to Markdown
entry: python notebook_to_markdown.py
language: python
files: \.ipynb$
pass_filenames: false
args: ['{}', 'tmp_markdown']
- id: codespell
name: Spell Check Markdown Files
entry: codespell
args: [
"-w",
"--ignore-words=.codespell-whitelist.txt",
]
language: python
types: [markdown]
files: tmp_markdown/.*\.md$
It's clear that that is not exactly right because it's not got the ignore args etc. But something along these lines?
Tagging you @OriolAbril because I know you've thought about these with the pymc-examples repo. Here I'm not suggesting that we keep or commit the converted notebooks, just use them purely locally for spellchecking.
The only added complexity might be ensuring that the pre-commit checks also run in the remote tests. But should all be doable I hope!
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #403 +/- ##
=======================================
Coverage 94.35% 94.35%
=======================================
Files 30 30
Lines 1967 1967
=======================================
Hits 1856 1856
Misses 111 111 ☔ View full report in Codecov by Sentry. |
Awesome thanks for direction here! I'll give this a go and ping you once it's ready for re-review. Though the diff might be kind of a lot to parse through 😬 |
now calls for codespell.
Check out this pull request on See visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB |
hey @drbenvincent -- let me know what you think of this, i slightly changed the chatgpt script and also added a final pre-commit to remove the created temporary directory for the notebook to markdown conversions again sorry for the large diff in this PR since, you were right, there were a number of spelling issues in the notebooks themselves |
Thanks @westford14. I hope to be able to take a proper look tomorrow or Monday |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks pretty cool! Just 2 minor things:
My only minor gripe is the addition of the .codespell
directory in the root of the repo. Would it be possible to move that into ~/docs/
or ~/docs/source/
?
Thinking about long term stability and maintainability, I'm thinking that it could be good to add some test coverage to the notebook_to_markdown
function. I'm in 2 minds whether this is overkill or not - In a way this new code becomes quite important as part of the pre-commit checks. But at the same time, it's important in the developer workflow and isn't a user-facing thing that will upset a lot of people if it breaks. But I am leaning towards adding some basic test coverage. What do you think?
thanks for your guidance on this! |
@drbenvincent took a cut at the tests and I wasn't totally sure where the test should go, didn't really make sense to put it into the let me know if this makes sense |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. Happy to merge once the conflicts with .pre-commit-config.yaml
are resolved.
I'm still a little new to GitHub Actions - but as far as I understand non of this stuff is not being triggered to run remotely when there's a PR or push to main and that the .pre-commit-config.yaml
is only being run locally when a user pushes to their branch? I think that's probably what we want.
EDIT: Well it seems that these checks are being run remotely, and they are failing. Quick look suggests it's just a matter of adding in some of the new optional dependencies into the pyproject.toml
?
I think |
@@ -41,3 +41,39 @@ repos: | |||
# needed to make excludes in pyproject.toml work | |||
# see here https://github.com/econchick/interrogate/issues/60#issuecomment-735436566 | |||
pass_filenames: false | |||
- repo: local |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm, design choices! (i.e. no right or wrong here, but would love to double-check ideas)
May I ask, codespell appears to behave more like a software test than a linter with the setup (convert-notebooks
) and teardown (remove-temp-directory-notebooks
), so would it be better for this to be implemented inside CI/CD instead of in pre-commit hooks?
Not suggesting that we do so, but I just wanted to see whether there's a strong(er) rationale for leaving it in a pre-commit hook than within a GitHub action independently. Is the intent for it to be run locally? Also, might there be a more compact way of configuring this?
To be clear, definitely not suggesting that we move away from what's implemented. Just asking these questions to make sure the rationale is strong.
The only ask I'd have here is to document this design choice in the documentation directory. (My criteria for documentation is that if a topic has been asked and the answers are not in the docs already, then it should be documented.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is probably more of a philosophical question for you (@ericmjl ) and @drbenvincent
initially this was just checking spelling outside of the jupyter notebooks, which to me definitely feels like a pre-commit
check, but now the check is including the notebook to markdown conversion to find spelling mistakes in these notebooks. maybe this highlights a bit of scope creep for this one PR / issue because purely as a pre-commit
check I think it makes sense to just look at the .py
and .md
files and then the notebook spelling check is more of a CI check
so, it's up to you two but i'm happy to keep plugging away at this PR with the updated doc changes and rationale changes but maybe it makes sense to prune back this PR to just the base codespell
checks then cherry-pick out the commits to a new PR for an actual CI check for jupyter notebooks since those should be more static and don't need to be checked on every commit -- thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @westford14. Yeah, I was thinking the same thing - happy for you to do this.
rev: v2.3.0 | ||
hooks: | ||
- id: codespell | ||
args: [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Noticing that codespell can be configured by pyproject.toml. I think it's worth standardizing on pyproject.toml
as the place for configuration. Establishing the pattern will be good for the long-term health of the package. @drbenvincent what are your thoughts here?
I'm mostly thinking of the -S
flags btw, just to see if we can compact down the args list.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree - pyproject.toml
should do as much of the project config work as possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yea that's an easy fix
442302b
to
df16e4b
Compare
Sorry about the noise, I was doing a test and didn't mean to commit to this branch. I reset the branch to the previous commit. |
Regarding the aforementioned test, I was suspicious that it might not work on Windows. I'm not sure how representative a GitHub runner is, but the hook ran successfully on the Windows GH runner. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, let's go with your suggestion here https://github.com/pymc-labs/CausalPy/pull/403/files#r1753839292
That way, we can keep this as a single pre-commit hook to check the .py
and .md
files.
i'm going to close this PR, but reference this in new PRs that are forthcoming -- thanks for all the back and forth with me on this! |
Closes #401 -- I ignored all the jupyter notebooks as there's a lot of custom variable definition in there that is getting flagged as incorrect words. I felt like trying to maintain a list of all these words, long term, as a dictionary would be a bit too tricky / annoying.
📚 Documentation preview 📚: https://causalpy--403.org.readthedocs.build/en/403/