Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-122071: Fixed traceback leaks global code when file does not exist #122126

Open
wants to merge 21 commits into
base: main
Choose a base branch
from

Conversation

lhish
Copy link

@lhish lhish commented Jul 22, 2024

Copy link

cpython-cla-bot bot commented Jul 22, 2024

All commit authors signed the Contributor License Agreement.
CLA signed

@bedevere-app
Copy link

bedevere-app bot commented Jul 22, 2024

Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool.

If this change has little impact on Python users, wait for a maintainer to apply the skip news label instead.

@devdanzin
Copy link
Contributor

I got the same test failures without is_main, so I added it. But if you can write a test that fails due to checking is_main, that'll lead to a better solution.

@lhish
Copy link
Author

lhish commented Jul 22, 2024

I got the same test failures without is_main, so I added it. But if you can write a test that fails due to checking is_main, that'll lead to a better solution.

t.py

import rely
import traceback
try:
    rely.fun()
except:
    traceback.print_exc()

rely.py

def fun():
    exec(compile("tuple()[1]","s","exec"))

@lhish
Copy link
Author

lhish commented Jul 22, 2024

By the way, based on preliminary testing, the zipimporter module does not appear to trigger issues related to this bug.

@lhish lhish changed the title gh-122071: Fixed traceback leaks global code when exec gh-122071: Fixed traceback leaks global code when file does not exist Jul 22, 2024
@lhish
Copy link
Author

lhish commented Jul 22, 2024

Regarding the modifications to the testing section: The previous testing method utilized the characteristic that when the target file didn't exist, the current file would be used as the content of the non-existent file for testing purposes. I have now changed this method to directly use the current file for testing.

@lhish
Copy link
Author

lhish commented Aug 24, 2024

After reviewing the discussion in GH-122145, I found it very insightful. The logic presented there is more general and avoids the need for a specific check for zipimporter, as I had implemented. Therefore, I have revised my code accordingly.

Lib/test/test_traceback.py Outdated Show resolved Hide resolved
Lib/linecache.py Outdated Show resolved Hide resolved
Lib/linecache.py Outdated Show resolved Hide resolved
Lib/test/test_traceback.py Outdated Show resolved Hide resolved
Lib/test/test_traceback.py Outdated Show resolved Hide resolved
@picnixz
Copy link
Contributor

picnixz commented Aug 25, 2024

EDIT: It appears that test failures are important. So maybe the previous logic was fine. By the way, #122161 (comment) suggested returning False instead of True for lazycaching.

@lhish
Copy link
Author

lhish commented Aug 27, 2024

EDIT: It appears that test failures are important. So maybe the previous logic was fine. By the way, #122161 (comment) suggested returning False instead of True for lazycaching.

After some investigation, I found that the get_lines function seems to be unique to linecache, so it shouldn't cause any issues. However, I noticed that if we return False directly, using linecache["nonexistent_filename"] results in an error, whereas it didn't before the modification. Therefore, I'm uncertain if returning False is the correct approach.

@lhish
Copy link
Author

lhish commented Aug 28, 2024

To handle Fakeloader and zipfiles, I've added additional conditions. However, I don't believe this is the best solution. If anyone has any suggestions or alternative ideas, please help me.

Copy link
Contributor

@picnixz picnixz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When you say that FakeLoader and ZipImporter do not work well, what are the issues with them? how do they fail? (sorry if you already replied to those questions but I need someone to refresh my memory :'))

Lib/linecache.py Outdated Show resolved Hide resolved
Lib/linecache.py Outdated Show resolved Hide resolved
Lib/linecache.py Outdated
Comment on lines 186 to 187
mod_file = module_globals.get('__file__')
if isinstance(loader, importlib._bootstrap_external.SourceFileLoader) and (not mod_file or (not mod_file.endswith(filename) and not mod_file.endswith('.pyc'))):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You said that returning False makes linecache["non-existant"] fail, but how does it fail exactly (namely what are the operations you do (i.e. please provide a reproducer)? is it possible to fix this case? technically, if the file does not exist linecache should just return an empty list.

If the issue is with an existing test, then the test might also need to be updated (since the logic has been changed here).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we directly return False, then when attempting to lazycache something that meets the conditions in the newly added if statement, it won’t add a new element to linecache.cache. As a result, accessing this key later will raise a KeyError. However, if we return an empty getline instead and return True, this issue doesn’t occur. While I doubt anyone would use code like the following, this code did not produce an error before the change:

import linecache
linecache.lazycache("11", globals())
print(linecache.cache["11"])

The output is:

print(linecache.cache["11"])
      ~~~~~~~~~~~~~~~^^^^^^
KeyError: '11'

Additionally, regarding the loader, if we don’t add this sourceloader restriction, the test_loader in test_linecache will fail:

def test_loader(self):
        filename = 'scheme://path'

        for loader in (None, object(), NoSourceLoader()):
            linecache.clearcache()
            module_globals = {'__name__': 'a.b.c', '__loader__': loader}
            self.assertEqual(linecache.getlines(filename, module_globals), [])

        linecache.clearcache()
        module_globals = {'__name__': 'a.b.c', '__loader__': FakeLoader()}
        self.assertEqual(linecache.getlines(filename, module_globals),
                         ['source for a.b.c\n'])

Since module_globals does not have filename set, mod_file = module_globals.get('__file__') definitely set mod_file to None. Without this if statement, it would normally retrieve source through the loader. However, the new if condition now covers this case, which means almost all non-standard loaders might return empty results (though I haven’t exhaustively tested this).
For ZipImporter, if inspect.getsource() is used to obtain the contents of a module imported via zipimporter, an error occurs because inspect.getsource relies on linecache.getline. When zipimporter goes through lazycache, it enters the newly added if statement, resulting in an empty getline. This is because the if checks whether mod_file and file_name have the same suffix. In the case of zipimporter, mod_file ends with .pyc, while file_name ends with .py. This ultimately causes failure in test.test_zipimport.UncompressedZipImportTestCase.testGetCompiledSource.

Copy link
Contributor

@picnixz picnixz Oct 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're not meant to access the cache using .cache directly. The cache should be accessed using linecache.getlines(). So we should perhaps see whether other path need to be updated in updatecache().

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have two questions regarding your reply.

First, Since my English is not very good, I struggled to understand your final suggestion about updating other paths in updatecache(). If you meant that all direct accesses to the cache in updatecache should be replaced with linecache.getlines, I don't think that's feasible. This is because getlines itself calls updatecache, which would create a circular dependency if updatecache were modified this way.

Second, If simply returns False, it would cause two existing Linecache-related tests to fail:

Test Case 1: test_lazycache_already_cached:
This test directly accesses the cache and expects it to have content. Here’s a snippet of the test:

    def test_lazycache_already_cached(self):
        linecache.clearcache()
        lines = linecache.getlines(NONEXISTENT_FILENAME, globals())
        self.assertEqual(
            False,
            linecache.lazycache(NONEXISTENT_FILENAME, globals()))
        self.assertEqual(4, len(linecache.cache[NONEXISTENT_FILENAME]))

In this case, returning False would result in a failure because the test expects the cache to contain data.
Test Case 2: test_lazycache_smoke:
This test expects that lazycache should return True when called with a nonexistent filename. Here’s the relevant snippet:

    def test_lazycache_smoke(self):
        lines = linecache.getlines(NONEXISTENT_FILENAME, globals())
        linecache.clearcache()
        self.assertEqual(
            True, linecache.lazycache(NONEXISTENT_FILENAME, globals()))
        self.assertEqual(1, len(linecache.cache[NONEXISTENT_FILENAME]))
        # Note here that we're looking up a nonexistent filename with no
        # globals: this would error if the lazy value wasn't resolved.
        self.assertEqual(lines, linecache.getlines(NONEXISTENT_FILENAME))

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First, Since my English is not very good

Don't worry, I replied a bit too fast. My English is not perfect either so don't hesistate to ask if I was unclear!

If you meant that all direct accesses to the cache in updatecache should be replaced with linecache.getlines, I don't think that's feasible

Sorry, I meant to see whether we correctly covered the cases (namely, to see if the algorithm needs to be updated because of this new logic).


For the tests, the logic could be changed. Leaking the global code is probably worse than not leaking it IMO. I'll have a better look at the lazy-caching interface and the question on non-existent filenames (maybe we could make it work).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, it does seem that returning false is appropriate in this case. However, I'm not entirely sure if other direct usages of linecache.cache need to be modified beyond the linecache and test modules. My understanding is that no further changes are necessary because the other two usages occur in pyshell and timeit. In both of these cases, the cache is directly assigned to, rather than read from.

Additionally, I've adjusted two tests that were failing in my latest commit. However, I'm not completely certain if these adjustments are logically sound.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants