Implement the codespell pre-commit hook #403

westford14 · 2024-09-02T15:26:43Z

Closes #401 -- I ignored all the jupyter notebooks as there's a lot of custom variable definition in there that is getting flagged as incorrect words. I felt like trying to maintain a list of all these words, long term, as a dictionary would be a bit too tricky / annoying.

📚 Documentation preview 📚: https://causalpy--403.org.readthedocs.build/en/403/

westford14 · 2024-09-03T07:34:19Z

@drbenvincent -- hey just hoping to see if this is the right direction for this PR. codespell is fine if not a little finicky

drbenvincent · 2024-09-03T07:59:17Z

Hi @westford14 thanks for the PR! I'll be able to take a look and do a review later today/tomorrow 🙂

drbenvincent

Thanks for the PR @westford14. This looks pretty good to me. We'll definitely merge, but I've got a suggested edit...

If you are running into loads of issues with the notebooks, then that's fine. Though it's a bit of a shame because there are likely a number of errors in notebooks. What about converting the notebooks into markdown files and running codespell on those markdown files? ChatGPT suggests a way:

python script to convert notebooks to markdown

import nbformat
from nbconvert import MarkdownExporter
import sys
import os

def notebook_to_markdown(notebook_path, output_dir):
    with open(notebook_path, 'r', encoding='utf-8') as f:
        nb = nbformat.read(f, as_version=4)
    
    markdown_exporter = MarkdownExporter()
    (body, resources) = markdown_exporter.from_notebook_node(nb)
    
    # Ensure the output directory exists
    os.makedirs(output_dir, exist_ok=True)

    # Create output file name based on the notebook name
    output_file = os.path.join(output_dir, os.path.splitext(os.path.basename(notebook_path))[0] + ".md")
    
    with open(output_file, 'w', encoding='utf-8') as f:
        f.write(body)

if __name__ == "__main__":
    if len(sys.argv) != 3:
        print("Usage: python notebook_to_markdown.py <notebook.ipynb> <output_dir>")
    else:
        notebook_path = sys.argv[1]
        output_dir = sys.argv[2]
        notebook_to_markdown(notebook_path, output_dir)

We'd also have to update the optional docs dependencies for those extra imports.

Add that temp folder to `.gitignore`

# Ignore the temporary markdown files directory
tmp_markdown/

Run the script in the pre-commit checks

- repo: local
  hooks:
    - id: convert-notebooks
      name: Convert Notebooks to Markdown
      entry: python notebook_to_markdown.py
      language: python
      files: \.ipynb$
      pass_filenames: false
      args: ['{}', 'tmp_markdown']

    - id: codespell
      name: Spell Check Markdown Files
      entry: codespell
      args: [
          "-w",
          "--ignore-words=.codespell-whitelist.txt",
        ]
      language: python
      types: [markdown]
      files: tmp_markdown/.*\.md$

It's clear that that is not exactly right because it's not got the ignore args etc. But something along these lines?

Tagging you @OriolAbril because I know you've thought about these with the pymc-examples repo. Here I'm not suggesting that we keep or commit the converted notebooks, just use them purely locally for spellchecking.

The only added complexity might be ensuring that the pre-commit checks also run in the remote tests. But should all be doable I hope!

.codespell-whitelist.txt

docs/source/knowledgebase/glossary.rst

codecov · 2024-09-03T11:05:59Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 94.35%. Comparing base (4b0fcef) to head (4d26e6d).
Report is 2 commits behind head on main.

Additional details and impacted files

@@           Coverage Diff           @@
##             main     #403   +/-   ##
=======================================
  Coverage   94.35%   94.35%           
=======================================
  Files          30       30           
  Lines        1967     1967           
=======================================
  Hits         1856     1856           
  Misses        111      111

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

westford14 · 2024-09-03T12:51:06Z

Thanks for the PR @westford14. This looks pretty good to me. We'll definitely merge, but I've got a suggested edit...

If you are running into loads of issues with the notebooks, then that's fine. Though it's a bit of a shame because there are likely a number of errors in notebooks. What about converting the notebooks into markdown files and running codespell on those markdown files? ChatGPT suggests a way:

python script to convert notebooks to markdown
import nbformat
from nbconvert import MarkdownExporter
import sys
import os

def notebook_to_markdown(notebook_path, output_dir):
    with open(notebook_path, 'r', encoding='utf-8') as f:
        nb = nbformat.read(f, as_version=4)
    
    markdown_exporter = MarkdownExporter()
    (body, resources) = markdown_exporter.from_notebook_node(nb)
    
    # Ensure the output directory exists
    os.makedirs(output_dir, exist_ok=True)

    # Create output file name based on the notebook name
    output_file = os.path.join(output_dir, os.path.splitext(os.path.basename(notebook_path))[0] + ".md")
    
    with open(output_file, 'w', encoding='utf-8') as f:
        f.write(body)

if __name__ == "__main__":
    if len(sys.argv) != 3:
        print("Usage: python notebook_to_markdown.py <notebook.ipynb> <output_dir>")
    else:
        notebook_path = sys.argv[1]
        output_dir = sys.argv[2]
        notebook_to_markdown(notebook_path, output_dir)
We'd also have to update the optional docs dependencies for those extra imports.

Add that temp folder to .gitignore
# Ignore the temporary markdown files directory
tmp_markdown/
Run the script in the pre-commit checks
- repo: local
  hooks:
    - id: convert-notebooks
      name: Convert Notebooks to Markdown
      entry: python notebook_to_markdown.py
      language: python
      files: \.ipynb$
      pass_filenames: false
      args: ['{}', 'tmp_markdown']

    - id: codespell
      name: Spell Check Markdown Files
      entry: codespell
      args: [
          "-w",
          "--ignore-words=.codespell-whitelist.txt",
        ]
      language: python
      types: [markdown]
      files: tmp_markdown/.*\.md$
It's clear that that is not exactly right because it's not got the ignore args etc. But something along these lines?

Tagging you @OriolAbril because I know you've thought about these with the pymc-examples repo. Here I'm not suggesting that we keep or commit the converted notebooks, just use them purely locally for spellchecking.

The only added complexity might be ensuring that the pre-commit checks also run in the remote tests. But should all be doable I hope!

Awesome thanks for direction here! I'll give this a go and ping you once it's ready for re-review. Though the diff might be kind of a lot to parse through 😬

now calls for codespell.

review-notebook-app · 2024-09-05T10:16:12Z

Check out this pull request on

See visual diffs & provide feedback on Jupyter Notebooks.

Powered by ReviewNB

westford14 · 2024-09-05T10:19:19Z

hey @drbenvincent -- let me know what you think of this, i slightly changed the chatgpt script and also added a final pre-commit to remove the created temporary directory for the notebook to markdown conversions

again sorry for the large diff in this PR since, you were right, there were a number of spelling issues in the notebooks themselves

drbenvincent · 2024-09-05T11:45:17Z

Thanks @westford14. I hope to be able to take a proper look tomorrow or Monday

drbenvincent

This looks pretty cool! Just 2 minor things:

My only minor gripe is the addition of the .codespell directory in the root of the repo. Would it be possible to move that into ~/docs/ or ~/docs/source/?

Thinking about long term stability and maintainability, I'm thinking that it could be good to add some test coverage to the notebook_to_markdown function. I'm in 2 minds whether this is overkill or not - In a way this new code becomes quite important as part of the pre-commit checks. But at the same time, it's important in the developer workflow and isn't a user-facing thing that will upset a lot of people if it breaks. But I am leaning towards adding some basic test coverage. What do you think?

westford14 · 2024-09-09T14:48:41Z

This looks pretty cool! Just 2 minor things:

My only minor gripe is the addition of the .codespell directory in the root of the repo. Would it be possible to move that into ~/docs/ or ~/docs/source/?

Thinking about long term stability and maintainability, I'm thinking that it could be good to add some test coverage to the notebook_to_markdown function. I'm in 2 minds whether this is overkill or not - In a way this new code becomes quite important as part of the pre-commit checks. But at the same time, it's important in the developer workflow and isn't a user-facing thing that will upset a lot of people if it breaks. But I am leaning towards adding some basic test coverage. What do you think?

yea abosolutely can move this into the ~/docs/source/ directory
for your second point, i agree that adding some basic test coverage would be good here -- i'll give it a first pass and then re-request a review

thanks for your guidance on this!

westford14 · 2024-09-09T15:57:05Z

@drbenvincent took a cut at the tests and I wasn't totally sure where the test should go, didn't really make sense to put it into the causalpy/tests dir so i put it into the docs/source/.codespell dir and created another ci step

let me know if this makes sense

drbenvincent

Looks good. Happy to merge once the conflicts with .pre-commit-config.yaml are resolved.

I'm still a little new to GitHub Actions - but as far as I understand non of this stuff is not being triggered to run remotely when there's a PR or push to main and that the .pre-commit-config.yaml is only being run locally when a user pushes to their branch? I think that's probably what we want.

EDIT: Well it seems that these checks are being run remotely, and they are failing. Quick look suggests it's just a matter of adding in some of the new optional dependencies into the pyproject.toml?

drbenvincent · 2024-09-10T13:52:29Z

I think nbformat and nbconvert might also need to be added to the optional test dependencies.

ericmjl · 2024-09-10T14:00:06Z

.pre-commit-config.yaml

@@ -41,3 +41,39 @@ repos:
        # needed to make excludes in pyproject.toml work
        # see here https://github.com/econchick/interrogate/issues/60#issuecomment-735436566
        pass_filenames: false
+  - repo: local


Hmmm, design choices! (i.e. no right or wrong here, but would love to double-check ideas)

May I ask, codespell appears to behave more like a software test than a linter with the setup (convert-notebooks) and teardown (remove-temp-directory-notebooks), so would it be better for this to be implemented inside CI/CD instead of in pre-commit hooks?

Not suggesting that we do so, but I just wanted to see whether there's a strong(er) rationale for leaving it in a pre-commit hook than within a GitHub action independently. Is the intent for it to be run locally? Also, might there be a more compact way of configuring this?

To be clear, definitely not suggesting that we move away from what's implemented. Just asking these questions to make sure the rationale is strong.

The only ask I'd have here is to document this design choice in the documentation directory. (My criteria for documentation is that if a topic has been asked and the answers are not in the docs already, then it should be documented.)

this is probably more of a philosophical question for you (@ericmjl ) and @drbenvincent

initially this was just checking spelling outside of the jupyter notebooks, which to me definitely feels like a pre-commit check, but now the check is including the notebook to markdown conversion to find spelling mistakes in these notebooks. maybe this highlights a bit of scope creep for this one PR / issue because purely as a pre-commit check I think it makes sense to just look at the .py and .md files and then the notebook spelling check is more of a CI check

so, it's up to you two but i'm happy to keep plugging away at this PR with the updated doc changes and rationale changes but maybe it makes sense to prune back this PR to just the base codespell checks then cherry-pick out the commits to a new PR for an actual CI check for jupyter notebooks since those should be more static and don't need to be checked on every commit -- thoughts?

Hi @westford14. Yeah, I was thinking the same thing - happy for you to do this.

ericmjl · 2024-09-10T14:03:24Z

.pre-commit-config.yaml

+    rev: v2.3.0
+    hooks:
+      - id: codespell
+        args: [


Noticing that codespell can be configured by pyproject.toml. I think it's worth standardizing on pyproject.toml as the place for configuration. Establishing the pattern will be good for the long-term health of the package. @drbenvincent what are your thoughts here?

I'm mostly thinking of the -S flags btw, just to see if we can compact down the args list.

Agree - pyproject.toml should do as much of the project config work as possible.

yea that's an easy fix

maresb · 2024-09-10T16:40:54Z

Sorry about the noise, I was doing a test and didn't mean to commit to this branch. I reset the branch to the previous commit.

maresb · 2024-09-10T16:53:52Z

Regarding the aforementioned test, I was suspicious that it might not work on Windows.

I'm not sure how representative a GitHub runner is, but the hook ran successfully on the Windows GH runner.

drbenvincent

Yeah, let's go with your suggestion here https://github.com/pymc-labs/CausalPy/pull/403/files#r1753839292

That way, we can keep this as a single pre-commit hook to check the .py and .md files.

westford14 · 2024-09-13T13:22:32Z

i'm going to close this PR, but reference this in new PRs that are forthcoming -- thanks for all the back and forth with me on this!

westford14 force-pushed the westford14/codespell-implementation branch from 13f540d to e2e4bfe Compare September 2, 2024 16:50

Implement the codespell pre-commit hook

446ae05

westford14 force-pushed the westford14/codespell-implementation branch from e2e4bfe to 446ae05 Compare September 2, 2024 17:00

drbenvincent requested changes Sep 3, 2024

View reviewed changes

.codespell-whitelist.txt Outdated Show resolved Hide resolved

docs/source/knowledgebase/glossary.rst Show resolved Hide resolved

Create a notebook conversion script that the pre-commit job

db9e69b

now calls for codespell.

fixing issue with asthe and complier

4d26e6d

westford14 requested a review from drbenvincent September 6, 2024 10:47

drbenvincent requested changes Sep 9, 2024

View reviewed changes

Adding test coverage to the codespell utility

df16e4b

westford14 requested a review from drbenvincent September 9, 2024 15:56

drbenvincent reviewed Sep 10, 2024

View reviewed changes

ericmjl reviewed Sep 10, 2024

View reviewed changes

maresb force-pushed the westford14/codespell-implementation branch from 442302b to df16e4b Compare September 10, 2024 16:40

drbenvincent requested changes Sep 13, 2024

View reviewed changes

westford14 closed this Sep 13, 2024

westford14 mentioned this pull request Sep 13, 2024

Implement the codespell pre-commit hook #408

Merged

westford14 mentioned this pull request Sep 16, 2024

Codespell CI Job #411

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Implement the codespell pre-commit hook #403

Implement the codespell pre-commit hook #403

westford14 commented Sep 2, 2024 •

edited by github-actions bot

Loading

westford14 commented Sep 3, 2024

drbenvincent commented Sep 3, 2024

drbenvincent left a comment •

edited

Loading

codecov bot commented Sep 3, 2024 •

edited

Loading

westford14 commented Sep 3, 2024

python script to convert notebooks to markdown

Add that temp folder to `.gitignore`

Run the script in the pre-commit checks

review-notebook-app bot commented Sep 5, 2024

westford14 commented Sep 5, 2024

drbenvincent commented Sep 5, 2024

drbenvincent left a comment

westford14 commented Sep 9, 2024

westford14 commented Sep 9, 2024

drbenvincent left a comment •

edited

Loading

drbenvincent commented Sep 10, 2024

ericmjl Sep 10, 2024

westford14 Sep 11, 2024

drbenvincent Sep 13, 2024

ericmjl Sep 10, 2024 •

edited

Loading

drbenvincent Sep 10, 2024

westford14 Sep 11, 2024

maresb commented Sep 10, 2024

maresb commented Sep 10, 2024

drbenvincent left a comment •

edited

Loading

westford14 commented Sep 13, 2024

Implement the codespell pre-commit hook #403

Implement the codespell pre-commit hook #403

Conversation

westford14 commented Sep 2, 2024 • edited by github-actions bot Loading

westford14 commented Sep 3, 2024

drbenvincent commented Sep 3, 2024

drbenvincent left a comment • edited Loading

Choose a reason for hiding this comment

python script to convert notebooks to markdown

Add that temp folder to .gitignore

Run the script in the pre-commit checks

codecov bot commented Sep 3, 2024 • edited Loading

Codecov Report

westford14 commented Sep 3, 2024

python script to convert notebooks to markdown

Add that temp folder to .gitignore

Run the script in the pre-commit checks

review-notebook-app bot commented Sep 5, 2024

westford14 commented Sep 5, 2024

drbenvincent commented Sep 5, 2024

drbenvincent left a comment

Choose a reason for hiding this comment

westford14 commented Sep 9, 2024

westford14 commented Sep 9, 2024

drbenvincent left a comment • edited Loading

Choose a reason for hiding this comment

drbenvincent commented Sep 10, 2024

ericmjl Sep 10, 2024

Choose a reason for hiding this comment

westford14 Sep 11, 2024

Choose a reason for hiding this comment

drbenvincent Sep 13, 2024

Choose a reason for hiding this comment

ericmjl Sep 10, 2024 • edited Loading

Choose a reason for hiding this comment

drbenvincent Sep 10, 2024

Choose a reason for hiding this comment

westford14 Sep 11, 2024

Choose a reason for hiding this comment

maresb commented Sep 10, 2024

maresb commented Sep 10, 2024

drbenvincent left a comment • edited Loading

Choose a reason for hiding this comment

westford14 commented Sep 13, 2024

westford14 commented Sep 2, 2024 •

edited by github-actions bot

Loading

drbenvincent left a comment •

edited

Loading

Add that temp folder to `.gitignore`

codecov bot commented Sep 3, 2024 •

edited

Loading

Add that temp folder to `.gitignore`

drbenvincent left a comment •

edited

Loading

ericmjl Sep 10, 2024 •

edited

Loading

drbenvincent left a comment •

edited

Loading