Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
gh-115999: Implement thread-local bytecode and enable specialization for
BINARY_OP
#123926base: main
Are you sure you want to change the base?
gh-115999: Implement thread-local bytecode and enable specialization for
BINARY_OP
#123926Changes from 71 commits
776a1e1
2b40870
344d7ad
f203d00
82b456a
b021704
aea69c5
552277d
50a6089
3f1d941
7d2eb27
d5476b9
e3b367a
b2375bf
2707f8e
3fdcb28
4a55ce5
8b3ff60
862afa1
0b4d952
7795e99
693a4cc
b43531e
9025f43
c44c7d9
e2a6656
e6513d1
a18396f
81fe1a2
837645e
f13e132
942f628
66cb24d
ad12bd4
1bbbbbc
e63e403
8b97771
d34adeb
6d4fe73
c2d8693
b104782
deb5216
2f11cc7
04f1ac3
aa330b1
7dfd1ca
7c9da24
dd144d0
ad180d1
95d2264
b6380de
adb59ef
39c947d
2cc5830
96ec126
5ecebd9
815b2fe
fb90d23
4e42414
814e4ca
ba3930a
cb8a774
0f8a55b
70ce0fe
f512353
4be2b1f
61c7aa9
ab6222c
6bbb220
4580e3c
b992f44
4c040d3
5b7658c
bec5bce
c9054b7
1a48ab2
b16ae5f
176b24e
c107495
07f9140
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You were storing the bytecode in the frame directly before, IIRC.
This looks more expensive, and is used on at least one fast path:
https://github.com/python/cpython/pull/123926/files#diff-729a985b0cb8b431cb291f1edb561bbbfea22e3f8c262451cd83328a0936a342R4821
Does it makes things faster overall, or is it just more compact?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep. You suggested storing
tlbc_index
instead since it was smaller. I think this is better for a couple of reasons:RESUME_CHECK
. Previously, we would have to load the bytecode pointer for the current thread and deopt if it didn't match what was in the frame. Now we only have to compare tlbc indices. This is a cost shift, however, since now the callers of_PyFrame_GetBytecode
have to do the more expensive load of the bytecode. I think the size reduction + simplification ofRESUME_CHECK
probably outweighs the higher cost of_PyFrame_GetBytecode
. tier2 is also still disabled in free-threaded builds, so it's a bit hard to evaluate the relative cost of slower trace exits vs fasterRESUME_CHECK
s. Once we get tier2 enabled we can reevaluate.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only purpose of passing the thread state is to initialize the tlbc index, IIUC.
Most callers will already have a tlbc index, so could we pass that instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The tlbc index is only present in free-threaded builds. To do this I think we'd need to either have separate versions of
_PyFrame_Initialize
for free-threaded/default builds or introduce the notion of thread-local bytecode into the default build. Passing thePyThreadState
as the first parameter seems simpler and more maintainable.