Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rewrite file_source and file_parser #102

Open
2 of 3 tasks
Pennycook opened this issue Aug 26, 2024 · 1 comment
Open
2 of 3 tasks

Rewrite file_source and file_parser #102

Pennycook opened this issue Aug 26, 2024 · 1 comment
Labels
documentation Improvements or additions to documentation enhancement New feature or request help wanted Extra attention is needed
Milestone

Comments

@Pennycook
Copy link
Contributor

Feature/behavior summary

These two files are large, complicated, and poorly documented. The desired functionality is actually very simple:

  • "Clean" source files, by stripping comments and combining whitespace.
  • Provide an iterator over some representation of the cleaned lines.

Request attributes

  • Would this be a refactor of existing code?
  • Does this proposal require new package dependencies?
  • Would this change break backwards compatibility?

Related issues

No response

Solution description

  • Replace file_source with a simpler approach, probably by re-using existing lexers.
  • Rewrite file_parser to use a simpler representation of the line_info and LineGroup objects.

Additional notes

We should explore implementing this functionality with pygments; it should be possible to iterate through tokens and discard the ones we don't want. This would also open up the opportunity to rewrite other parts of Code Base Investigator to use pygments tokens.

If we do use pygments, we'd be introducing a new dependency.

@Pennycook Pennycook added documentation Improvements or additions to documentation enhancement New feature or request help wanted Extra attention is needed labels Aug 26, 2024
@Pennycook Pennycook added this to the 2.0.0 milestone Aug 26, 2024
@Pennycook
Copy link
Contributor Author

After #122 is merged, parse_file accounts for ~20% of execution time in my offline stress test.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improvements or additions to documentation enhancement New feature or request help wanted Extra attention is needed
Projects
None yet
Development

No branches or pull requests

2 participants
@Pennycook and others