Off-by-one error in `zero_pad` #1599

daejunpark · 2019-09-05T00:18:59Z

Version Information

vyper Version (output of vyper --version): 0.1.0b12+commit.8663ac5

What's your issue about?

Off-by-one error in zero_pad():

https://github.com/ethereum/vyper/blob/ab39d4e9c5168eff2646dd29e23d7212654b636d/vyper/parser/parser_utils.py#L904

Logically, the loop termination condition should use ge instead of gt.

How can it be fixed?

It can be fixed by replacing gt with ge in the above, but I'm not sure whether such a fix would affect anything else.
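
To make the difference concrete, here is a minimal Python model of the padding loop. It is a sketch, not the Vyper source: it assumes the loop index starts at the byte length and that the loop breaks when the index is compared against the length rounded up to a multiple of 32. The names `ceil32` and `padded_offsets` are hypothetical helpers introduced for illustration.

```python
from typing import List


def ceil32(n: int) -> int:
    """Round n up to the next multiple of 32."""
    return (n + 31) // 32 * 32


def padded_offsets(length: int, cmp: str) -> List[int]:
    """Return the byte offsets that the modeled loop would zero.

    cmp is "gt" (the condition reported in this issue) or "ge" (the proposed fix).
    Assumption: the loop starts at `length` and breaks against ceil32(length).
    """
    end = ceil32(length)
    written = []
    i = length
    while True:
        done = (i > end) if cmp == "gt" else (i >= end)
        if done:
            break
        written.append(i)  # models writing one zero byte at offset i
        i += 1
    return written


if __name__ == "__main__":
    print(padded_offsets(30, "gt"))  # includes offset 32, one past the padded end
    print(padded_offsets(30, "ge"))  # stops at offset 31
```

In this model, a 30-byte value padded to 32 bytes gets an extra write at offset 32 under gt, while ge stops at offset 31. A value whose length is already a multiple of 32 shows the same effect: gt still writes one byte, ge writes none.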

charles-cooper · 2019-09-08T21:14:38Z

@daejunpark could you please take a look at 5a84571 and let me know if you think it is a reasonable solution?

daejunpark · 2019-09-10T17:33:45Z

@charles-cooper thanks for this. It is a really clever trick, and I like it (from my hacker instinct)!

But I have a general, long-term concern about this kind of off-label use of opcodes. For example, if people discuss changing the behavior of calldatacopy in a future hard fork, they may not be aware of this off-label use, and may fail to consider or analyze the effect of their change on it.

I have a question. Is the zero padding size always less than 32? (I cannot think of a counterexample.)

If so, you may want to consider using the ID (identity) precompiled contract for the zero padding. That is,

CALL <gas> 4 0 <src> <len> <dest> <len>

where <src> points to zero bytes that can be preallocated in the first region of memory.
Although this will consume more gas than the calldatacopy approach, it is an expected use case, so it would be safer in the long term.
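
As a rough illustration of this idea, the following Python sketch models memory as a `bytearray` and the precompile at address 4 as a function that returns its input unchanged. It does not model gas or the CALL mechanics, and `identity_precompile` and `zero_pad_with_identity` are hypothetical names, not part of Vyper or any EVM library.

```python
def identity_precompile(data: bytes) -> bytes:
    """Model of the precompile at address 4: the output equals the input."""
    return bytes(data)


def zero_pad_with_identity(memory: bytearray, src: int, dest: int, length: int) -> None:
    """Copy `length` bytes from a zeroed region at `src` to `dest`.

    Assumption: memory[src:src + length] was preallocated as zeros and is
    never written to; the check below makes that assumption explicit.
    """
    chunk = bytes(memory[src:src + length])
    if any(chunk):
        raise ValueError("source region is not zeroed")
    memory[dest:dest + length] = identity_precompile(chunk)


if __name__ == "__main__":
    memory = bytearray(b"\xff" * 96)
    memory[0:32] = bytes(32)          # preallocated zero region
    zero_pad_with_identity(memory, src=0, dest=62, length=2)
    print(memory[60:66].hex())
```

The sketch shows why the approach depends on the source region staying zeroed: if anything ever writes there, the padding silently stops being zero, which is why the model raises instead of copying.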

charles-cooper · 2019-09-10T18:05:50Z

> If so, you may want to consider using the ID (identity) precompiled contract for the zero padding. That is,
> CALL <gas> 4 0 <src> <len> <dest> <len>

Yes, I've been thinking about this approach for a while. In this case the zero-pad is always <32 bytes, but having this technique for zeroing arbitrary amounts of memory would be handy. We could always guarantee that the source is a zeroed block of memory simply by reading from past the end of what we have allocated, or from some offset so large that it would be impractical to write to (e.g. 2**32). But I think reasoning about where the past-the-end pointer should be is easier for CALLDATACOPY: it is always CALLDATASIZE!
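
The CALLDATACOPY behavior being relied on can be modeled in a few lines of Python. This is a sketch of the semantics as discussed in this thread (bytes read from beyond the end of calldata are zero), not an EVM implementation; `calldatacopy` and `zero_region` are hypothetical helper names.

```python
def calldatacopy(memory: bytearray, calldata: bytes, dest: int, src: int, length: int) -> None:
    """Copy `length` bytes of calldata starting at `src` into memory at `dest`.

    Bytes at or beyond len(calldata) read as zero.
    """
    for k in range(length):
        idx = src + k
        memory[dest + k] = calldata[idx] if idx < len(calldata) else 0


def zero_region(memory: bytearray, calldata: bytes, dest: int, length: int) -> None:
    """Zero memory[dest:dest + length] by copying from offset CALLDATASIZE."""
    calldatacopy(memory, calldata, dest, len(calldata), length)


if __name__ == "__main__":
    memory = bytearray(b"\xff" * 64)
    zero_region(memory, b"\x01\x02\x03", dest=10, length=20)
    print(memory.hex())
```

Because the source offset is always the calldata size, the result does not depend on what the calldata contains or on any region of memory being kept clean.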

> But I have a general, long-term concern about this kind of off-label use of opcodes. For example, if people discuss changing the behavior of calldatacopy in a future hard fork, they may not be aware of this off-label use, and may fail to consider or analyze the effect of their change on it.

I was also concerned about this. However, I feel that the existing semantics were carefully considered and were written that way for a reason. Note the difference in semantics between CALLDATACOPY and CODECOPY: CODECOPY issues a STOP if you try to read from past-the-end. So even if it may not be the "intended" use case for CALLDATACOPY, it is not exactly "off-label" either! Changing it would be a significant, almost pathological, change to the EVM and would definitely break the semantics of existing programs. In any case, I think what we can do is create a new pseudo-opcode mzero <dst> <len> that zeroes out the destination block (analogous to stdlib's bzero) and then choose to implement it however we want. Then, even if CALLDATACOPY is changed for whatever reason, the overhead of swapping out the implementation of mzero is fairly low.
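
A sketch of what such an abstraction could look like, using nested Python lists as a stand-in for compiler IR nodes. Everything here is hypothetical: the function names, the list-based node format, and the `zero_src` parameter are assumptions for illustration and do not describe the actual Vyper implementation.

```python
from typing import List, Union

# Stand-in for an IR node: an integer literal, an opcode name, or a nested list.
Node = Union[int, str, list]


def mzero_via_calldatacopy(dst: Node, length: Node) -> List[Node]:
    """Zero `length` bytes at `dst` by copying from past the end of calldata."""
    return ["calldatacopy", dst, "calldatasize", length]


def mzero_via_identity(dst: Node, length: Node, zero_src: Node = 0) -> List[Node]:
    """Alternative: copy from a zeroed region through the precompile at address 4.

    Argument order follows the CALL form quoted earlier in this thread.
    """
    return ["call", "gas", 4, 0, zero_src, length, dst, length]


# Single switch point: callers only ever use mzero().
MZERO_IMPL = mzero_via_calldatacopy


def mzero(dst: Node, length: Node) -> List[Node]:
    """Pseudo-opcode: zero out `length` bytes of memory starting at `dst`."""
    return MZERO_IMPL(dst, length)


if __name__ == "__main__":
    print(mzero(64, 32))
```

Callers depend only on `mzero`, so swapping the strategy is a one-line change in the compiler. It does not change bytecode that has already been deployed.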

daejunpark · 2019-09-10T18:58:34Z

@charles-cooper I agree with all of your points, but please note that the overhead of updating the mzero implementation may not be low from the contract owners' perspective, because they will need to redeploy the updated bytecode if that happens, and the redeployment may not be easy in certain situations.

@CarlBeek what do you think about this in terms of the deposit contract operation?

daejunpark · 2019-09-10T19:21:45Z

FYI, this is also related: #1610

charles-cooper · 2019-09-10T20:08:35Z

> @charles-cooper I agree with all of your points, but please note that the overhead of updating the mzero implementation may not be low from the contract owners' perspective, because they will need to redeploy the updated bytecode if that happens, and the redeployment may not be easy in certain situations.

Thanks for the note. My assumption was that, if the opcode semantics are ever changed, it would be as an EVM version update so that only contracts deployed after fork_blocknum would have the new semantics, while existing contracts would retain the old semantics.

charles-cooper · 2019-09-10T20:14:36Z

I want to do the quick fix for the gt issue though since it seems more pressing. As it stands, the loop can write a \0 byte to one past the end of the bytearray.

daejunpark · 2019-09-10T21:33:34Z

Thanks for the quick fix! It will help to facilitate the deposit contract verification process.

BTW, my understanding is that even a contract deployed before a certain fork will be run using the new semantics of the fork. (Or I may have misunderstood your explanation.)

charles-cooper · 2019-09-10T21:47:27Z

@daejunpark there are several proposals to add versioning to the EVM: https://github.com/ethereum-cat-herders/PM/issues/53. I don't believe (and this is just my educated guess) that such a significant semantic change to CALLDATALOAD would make it in without some kind of versioning scheme.

CarlBeek · 2019-09-11T13:33:49Z

Initially it seemed to me that this was too off-label to be a part of compiler behaviour, but after taking a look at the yellow paper, Geth, and py-evm implementations, I agree with @charles-cooper that this would be something that would need to be specified via EVM versioning.

On the topic of longevity and redeployment, the deposit contract is intended to live until at least the death of Eth1 somewhere way in the future and thus it is vital that this be resolved in a robust manner. As mentioned above, however, I don't think that this will present a problem.

fubuloubu · 2019-09-16T15:33:56Z

Meeting: Merge #1611, then #1605 later

davesque · 2019-09-17T06:01:16Z

Thought I'd chime in and just say that, in my mind, the likelihood that CALLDATALOAD will stop yielding zeros for ranges outside of CALLDATASIZE any time soon is basically zero. I'd feel pretty comfortable adding compiler features that depend on that behavior.

fubuloubu · 2019-09-17T13:10:54Z

Note from meeting: relying on the CALLDATALOAD assumption is okay as long as it is well documented.

daejunpark · 2019-09-17T16:52:07Z

Just to make sure, I do not have an objection to the calldataload approach, if it is very unlikely that the current semantics of calldataload will change in the future.

fubuloubu mentioned this issue Sep 10, 2019

Meeting 16th September 2019 #1608

Closed

daejunpark mentioned this issue Sep 10, 2019

Non-semantics-preserving refactoring for zero_pad() #1610

Closed

charles-cooper mentioned this issue Sep 16, 2019

Fix termination condition in zero_pad #1611

Merged

charles-cooper mentioned this issue Sep 21, 2019

Zero pad from calldata #1605

Merged

charles-cooper closed this as completed in #1611 Sep 24, 2019
