Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Off-by-one error in zero_pad #1599

Closed
daejunpark opened this issue Sep 5, 2019 · 14 comments · Fixed by #1611 or #1605
Closed

Off-by-one error in zero_pad #1599

daejunpark opened this issue Sep 5, 2019 · 14 comments · Fixed by #1611 or #1605

Comments

@daejunpark
Copy link
Contributor

daejunpark commented Sep 5, 2019

Version Information

  • vyper Version (output of vyper --version): 0.1.0b12+commit.8663ac5

What's your issue about?

Off-by-one error in zero_pad():

https://github.com/ethereum/vyper/blob/ab39d4e9c5168eff2646dd29e23d7212654b636d/vyper/parser/parser_utils.py#L904

Logically, the loop termination condition should use ge instead of gt.

How can it be fixed?

It can be fixed by replace gt by ge in the above, but I'm not sure if such a fix does not affect others.

@charles-cooper
Copy link
Member

@daejunpark could you please take a look at 5a84571 and let me know if you think it is a reasonable solution?

@daejunpark
Copy link
Contributor Author

daejunpark commented Sep 10, 2019

@charles-cooper thanks for this. It is a really clever trick, and I like it (from my hacker instinct)!

But I have a concern about this kind of off-label uses of opcodes for the long term in general. For example, if people discuss the behavior change of calldatacopy in a future hard fork, they may be not aware of this off-label use, and fail to consider/analyze the effect of their change to this.

I have a question. Is the zero padding size always less than 32? (I cannot think of a counterexample.)

If so, you may want to consider to use the ID precompiled for the zero padding. That is,

CALL <gas> 4 0 <src> <len> <dest> <len>

where <src> points to the zero bytes that can be preallocated in the first region of the memory.
Although this will consume more gas than the calldatacopy approach, but it is an expected use case, so would be safer for the long term.

@charles-cooper
Copy link
Member

charles-cooper commented Sep 10, 2019

If so, you may want to consider to use the ID precompiled for the zero padding. That is,
CALL <gas> 4 0 <src> <len> <dest> <len>

Yes, I've been thinking about this approach for awhile. In this case the zero-pad is always <32 bytes, but having this technique for zeroing arbitrary amounts of memory would be handy. So we could always guarantee that the source is a zeroed block of memory simply by reading from past-the-end of where we have allocated, or some number so large it would be impractical to write to (e.g. 2**32). But I think reasoning about where the past-the-end pointer should be for CALLDATACOPY is easier - it is always CALLDATASIZE!

But I have a concern about this kind of off-label uses of opcodes for the long term in general. For example, if people discuss the behavior change of calldatacopy in a future hard fork, they may be not aware of this off-label use, and fail to consider/analyze the effect of their change to this.

I was also concerned about this. However, I feel like the existing semantics were carefully considered and were written that way for a reason. Note the difference in semantics between CALLDATACOPY and CODECOPY - CODECOPY issues a STOP if you try to read from past-the-end. So even if it may not be the "intended" use-case for CALLDATACOPY, it is not exactly "off-label" either! A change would be quite a significant - almost pathological - change to the EVM and would definitely break the semantics of existing programs. But in any case, I think what we can do is create a new pseudo-opcode mzero <dst> <len> which zeroes out the destination block (analogously to stdlib's bzero) and then choose to implement it however we want. Then even if CALLDATACOPY is changed for whatever reason, the overhead of swapping out the implementation of mzero is fairly low.

@daejunpark
Copy link
Contributor Author

@charles-cooper I agree with all of your points, but please note that the overhead of updating the mzero implementation may not be low from the contract owners' perspective, because they will need to redeploy the updated bytecode if that happens, and the redeployment may not be easy in certain situations.

@CarlBeek what do you think about this in terms of the deposit contract operation?

@daejunpark
Copy link
Contributor Author

daejunpark commented Sep 10, 2019

FYI, this is also related: #1610

@charles-cooper
Copy link
Member

@charles-cooper I agree with all of your points, but please note that the overhead of updating the mzero implementation may not be low from the contract owners' perspective, because they will need to redeploy the updated bytecode if that happens, and the redeployment may not be easy in certain situations.

Thanks for the note. My assumption was that, if the opcode semantics are ever changed, it would be as an EVM version update so that only contracts deployed after fork_blocknum would have the new semantics, while existing contracts would retain the old semantics.

@charles-cooper
Copy link
Member

charles-cooper commented Sep 10, 2019

I want to do the quick fix for the gt issue though since it seems more pressing. As it stands, the loop can write a \0 byte to one past the end of the bytearray.

@daejunpark
Copy link
Contributor Author

Thanks for the quick fix! It will help to facilitate the deposit contract verification process.

BTW, my understanding is that even a contract deployed before certain fork will be ran using the new semantics of the folk. (Or, I may misunderstood your explanation.)

@charles-cooper
Copy link
Member

@daejunpark there are several proposals to add versioning to the EVM https://github.com/ethereum-cat-herders/PM/issues/53. I don't believe (and this is just my educated guess) such a significant semantic change to CALLDATALOAD would make it in without some kind of versioning scheme.

@CarlBeek
Copy link

Initially it seemed to me that this was too off-label to be a part of compiler behaviour, but after taking a look at the yellow paper, Geth, and py-evm implementations, I agree with @charles-cooper that this would be something that would need to be specified via EVM versioning.

On the topic of longevity and redeployment, the deposit contract is intended to live until at least the death of Eth1 somewhere way in the future and thus it is vital that this be resolved in a robust manner. As mentioned above, however, I don't think that this will present a problem.

@fubuloubu
Copy link
Member

Meeting: Merge #1611, then #1605 later

@davesque
Copy link
Contributor

Thought I'd chime in and just say that, in my mind, the likelihood that CALLDATALOAD will stop yielding zeros for ranges outside of CALLDATASIZE any time soon is basically zero. I'd feel pretty comfortable adding compiler features that depend on that behavior.

@fubuloubu
Copy link
Member

Note from meeting: relying on CALLDATALOAD assumption is okay as long as it gets documented well.

@daejunpark
Copy link
Contributor Author

Just to make sure, I do not have an objection to the calldataload approach, if it is very unlikely that the current semantics of calldataload will change in the future.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
5 participants