Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize path handling #122

Merged
merged 5 commits into from
Oct 24, 2024
Merged

Optimize path handling #122

merged 5 commits into from
Oct 24, 2024

Conversation

Pennycook
Copy link
Contributor

Related issues

Closes #114.

Proposed changes

  • Use abspath instead of realpath when resolving symlinks is unnecessary.
  • Allow functions to read from paths containing symlinks, by using standard open.

realpath() resolves any and all symlinks in the path; in situations
where we just need to store or print an absolute path, we should use
abspath() instead.

Signed-off-by: John Pennycook <[email protected]>
Now that we no longer use realpath() when locating files, there is a
possibility that paths will still contain symlinks when we read them.

Signed-off-by: John Pennycook <[email protected]>
@Pennycook Pennycook added the performance Improvements to execution time label Oct 10, 2024
@Pennycook Pennycook marked this pull request as draft October 10, 2024 14:01
@Pennycook
Copy link
Contributor Author

I'm seeing a few unexpected differences in computed divergence after this change, so I'm converting this to a draft until I can figure out what's going on.

Reintroduce realpath() when searching for information we have stored
about files, to ensure that platform information propagates through
symlinks to the original files.

Signed-off-by: John Pennycook <[email protected]>
@Pennycook Pennycook marked this pull request as ready for review October 10, 2024 15:01
@Pennycook
Copy link
Contributor Author

The issue was that without realpath, files and symlinks to those files were being treated as two separate entities. If one platform used the real file and another platform used the symlink, that would be incorrectly counted as divergence.

Reintroducing realpath into ParserState fixes the problem. Things are still much faster, because a call to realpath is now only required when searching for file information, rather than every time we encounter a path.

I suspect there are opportunities to optimize things further by hoisting some of these realpath calls, but it might be best to explore that during the planned refactoring of ParserState.

Significantly improves performance by further reducing the number of
calls to realpath.

Signed-off-by: John Pennycook <[email protected]>
@Pennycook
Copy link
Contributor Author

I suspect there are opportunities to optimize things further by hoisting some of these realpath calls, but it might be best to explore that during the planned refactoring of ParserState.

I realized this morning that it would actually be very easy to cache computed realpaths inside of ParserState without actually doing any refactoring, so I've gone ahead and done that. With that fix committed, my test runs are now almost 2.5x faster.

@Pennycook Pennycook added this to the 2.0.0 milestone Oct 11, 2024
Copy link
Contributor

@laserkelvin laserkelvin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@Pennycook Pennycook merged commit 4cb95e2 into intel:main Oct 24, 2024
3 checks passed
@Pennycook Pennycook deleted the optimize-paths branch October 24, 2024 06:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Improvements to execution time
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Optimize symlink handling
2 participants