Investigate PoV sizes #667

Closed
ordian opened this issue Sep 11, 2023 · 12 comments
Comments

@ordian

ordian commented Sep 11, 2023

gm

I wonder if you're aware of the PoV sizes of the blocks produced by both Basilisk on Kusama and HydraDX on Polkadot. Our scraping services show that the PoV contribution of each of your parachains is around 8GB per day, which is more than 1MB per PoV on average (this is way above any other parachain).

Are you aware of this, or do you know what might be contributing to the PoV size? (You can probably also check the collator logs to confirm.)

Given that there are not enough extrinsics in the blocks to explain that, is it possible that you, for example, do some iteration in every block, reading some state? If not, that could indicate a bug in the state proof recorder that includes some data it shouldn't.
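
As a rough sanity check of the figures above, the daily total can be converted into an average PoV size per block. This is only a sketch: the 12-second parachain block time and the decimal units (1 GB = 10^9 bytes) are assumptions, not something stated in the comment.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_pov_bytes(bytes_per_day: float, block_time_seconds: int = 12) -> float:
    """Average PoV size per block for a daily total and an assumed block time."""
    blocks_per_day = SECONDS_PER_DAY // block_time_seconds
    return bytes_per_day / blocks_per_day


if __name__ == "__main__":
    # ~8 GB per day, as reported above; 12 s block time is an assumption.
    avg = average_pov_bytes(8e9)
    print(f"{avg / 1e6:.2f} MB per block on average")
```

With these assumptions there are 7200 blocks per day, so 8 GB per day works out to a little over 1 MB per block, which matches the claim.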

@mrq1911
Member

mrq1911 commented Sep 11, 2023

I can confirm this: the collator has been producing 1MB+ proofs for as far back as I can see in the collator logs (2 weeks). Do you have historical data about PoV size?

@ordian
Author

ordian commented Sep 11, 2023

I can only say it was still the case in July, but I don't have historical data from before that.

@ordian
Author

ordian commented Sep 11, 2023

Might be related: paritytech/polkadot-sdk#1498

@burdges

burdges commented Sep 13, 2023

@cheme do you have a radix 2 flavor of https://github.com/paritytech/trie or whatever somewhere?

@cheme

cheme commented Sep 14, 2023

> @cheme do you have a radix 2 flavor of https://github.com/paritytech/trie or whatever somewhere?

I had this branch from 2020, paritytech/trie#84, but today it is far behind the current trie crate (probably 1 or 2 days of work to update).

@mrq1911
Member

mrq1911 commented Sep 18, 2023

it looks like we are attaching the runtime code wasm to every block proof.

this is an example list of the trie nodes we are building the proof from:

[
  ''... 2538366 more characters,
  '5f08dcde934c658227ee1dfafcd6e16903050108dc4d79aad5a9d01a359995838830a80733a0bff7e4eb087bfc621ef1873fec49be4f21c56d926b91f020b5071f14935cb93f001f1127c53d3eac6eed23ffea64',
  '5f0a42f33323cb5ced3b44dd825fda9fcc804545454545454545454545454545454545454545454545454545454545454545',
  '5f0e0621c4869aa60c02be9adcc98a0d1d050108dc4d79aad5a9d01a359995838830a80733a0bff7e4eb087bfc621ef1873fec49be4f21c56d926b91f020b5071f14935cb93f001f1127c53d3eac6eed23ffea64',
  '764704b568d21667356a5a050c118746b4def25cfda6ef3a00000000804545454545454545454545454545454545454545454545454545454545454545',
  '7d0bce545fb382c34570e5dfbf338f5e4e7b9012096b41c4eb3aaf947f6ea429080000',
  '7e0f0c53fa332d4d9712c66fd92efcb64e7b9012096b41c4eb3aaf947f6ea429080000',
  '7e1467a096bcd71a5b6a0c8155e208104e7b9012096b41c4eb3aaf947f6ea429080000',
  '7e3237373ffdfeb1cab4222e3b520d6b4e7b9012096b41c4eb3aaf947f6ea429080200',
  '7e323df7cc47150b3930e2666b0aa3134e7b9012096b41c4eb3aaf947f6ea429080200',
  ...
]

1st one contains our runtime code wasm:

🗜  Compressed:                 Yes, 78.78%
✨ Reserved meta:                OK - [6D, 65, 74, 61]
🎁 Metadata version:            V14
🔥 Core version:                hydradx-178 (hydradx-0.tx1.au1)
🗳️  system.setCode hash:               0x64c439e579c3bfff9f4ebb8be01ca8a33f5c6f565c42531b46011974a9f79c93
🗳️  authorizeUpgrade hash:     0xa059f2c663f68b95f2e72ad34e2ff34569706ebee1c6fe74c519e847eb5dab3a
#️⃣  Blake2-256 hash:           0x32dc435cbda2592facebf36852feb2ec411f7b77cd33a9ec8ba109cb579a7cb9
📦 IPFS:                        https://www.ipfs.io/ipfs/QmU5Lw394PxSziP6vMNH7B2UdFn4XZfXVKQrG2hXG4NELk

why do we attach it to the storage proof?

@cheme

cheme commented Sep 18, 2023

Well, you are probably using trie_version 0 (maybe state_version). With version 1 the value is not attached to the node and is only included if accessed by the runtime. WARNING: switching requires a migration (or warp sync will be broken).
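
To illustrate the difference between the two versions, here is a toy size model, not the real trie encoding. It assumes that in version 1 only a 32-byte hash of a large value is kept in the node and the value itself is added to the proof only when the runtime reads it; the 33-byte inline threshold and the `node_overhead` parameter are assumptions made for the sketch. The wasm length is estimated from the hex length shown in the list above.

```python
HASH_LEN = 32          # length of a 256-bit hash in bytes
INLINE_THRESHOLD = 33  # assumption: values at least this long are stored by hash in version 1


def node_proof_bytes(value_len: int, state_version: int,
                     value_accessed: bool, node_overhead: int = 0) -> int:
    """Toy estimate of what one touched node adds to a storage proof."""
    if state_version == 0 or value_len < INLINE_THRESHOLD:
        # The value is part of the node, so touching the node includes it.
        return node_overhead + value_len
    # Version 1: the node carries only the hash of the value.
    size = node_overhead + HASH_LEN
    if value_accessed:
        # The value itself is included only if the runtime actually read it.
        size += value_len
    return size


# Rough wasm size: the hex string above has about 2538366 characters.
WASM_LEN = 2_538_366 // 2

if __name__ == "__main__":
    for version in (0, 1):
        size = node_proof_bytes(WASM_LEN, version, value_accessed=False)
        print(f"state_version {version}: touching the node costs {size} bytes")
```

Under this model, merely touching the node costs the full wasm size in version 0 and only the hash in version 1.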

So if you are using state_version 0, maybe you query an entry close to key ":code", e.g. ":codex", which would include the node (and its value) at ":code" in your proof.
So any query for a key starting with ":code", as well as any insertion or removal of a key close to ":code" (which may change the node prefix at ":code" and thus touch the node), would include the wasm in the proof.
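
The "key starting with ':code'" case can be sketched as a prefix check on nibble paths. This is a simplification that assumes a radix-16 trie in which a lookup traverses every node whose full path is a prefix of the lookup key; it does not model the insert/removal case or the actual node layout, and the function names are hypothetical.

```python
from typing import List


def to_nibbles(key: bytes) -> List[int]:
    """Split each byte into its high and low 4-bit nibble."""
    nibbles = []
    for byte in key:
        nibbles.append(byte >> 4)
        nibbles.append(byte & 0x0F)
    return nibbles


def lookup_passes_through(lookup_key: bytes, node_key: bytes) -> bool:
    """True if the path to lookup_key goes through the node stored at node_key."""
    lookup = to_nibbles(lookup_key)
    node = to_nibbles(node_key)
    return lookup[:len(node)] == node


if __name__ == "__main__":
    for key in (b":codex", b":code", b":heappages"):
        print(key, lookup_passes_through(key, b":code"))
```

In this model a read of ":codex" passes through the ":code" node, while a read of ":heappages" does not.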

> 1st one appears to contain wasm file

Yes, the wasm is in the top trie at key ':code' (the UTF-8 bytes of that string).
It would make sense to trace all runtime accesses during block processing and try to find which key query can touch the ':code' trie node. I don't remember exactly how it can be done, but maybe with try-runtime and some logging (I cannot check right now); if traces are missing, they can be added to the sp-io storage functions or to the sp-state-machine trie accesses directly.

@enthusiastmartin
Contributor

enthusiastmartin commented Sep 18, 2023

With state_version set to 1.

2023-09-18 16:40:12 [Parachain] PoV size { header: 0.1787109375kb, extrinsics: 2.865234375kb, storage_proof: 4.306640625kb }
2023-09-18 16:40:12 [Parachain] Compressed PoV size: 6.330078125kb

So I guess, we should consider migrating.

Would you have some details on how to do that?
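
For anyone comparing before and after numbers from collator logs, a small parser for the "PoV size" line can help. This is a sketch that assumes the log format is exactly as in the line quoted above; the regular expression and function name are hypothetical. The logged values are exact multiples of 1/1024, which suggests that "kb" here means 1024 bytes, but that is an inference.

```python
import re

# Pattern for the "PoV size { ... }" line as it appears in the log excerpt above.
POV_LINE = re.compile(
    r"PoV size \{ header: (?P<header>[\d.]+)kb, "
    r"extrinsics: (?P<extrinsics>[\d.]+)kb, "
    r"storage_proof: (?P<storage_proof>[\d.]+)kb \}"
)

SAMPLE_LINE = (
    "2023-09-18 16:40:12 [Parachain] PoV size { header: 0.1787109375kb, "
    "extrinsics: 2.865234375kb, storage_proof: 4.306640625kb }"
)


def parse_pov_line(line: str):
    """Return the three PoV components in kb, or None if the line does not match."""
    match = POV_LINE.search(line)
    if match is None:
        return None
    return {name: float(value) for name, value in match.groupdict().items()}


if __name__ == "__main__":
    parts = parse_pov_line(SAMPLE_LINE)
    print(parts, "total kb:", sum(parts.values()))
```

Summing the three components of the sample line gives roughly 7.35 kb uncompressed, compared with the 1MB+ proofs seen before the change.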

@cheme

cheme commented Sep 19, 2023

There is a link to the md guide, and a lot of links tracking the migration progress on the relay chains, at https://github.com/paritytech/devops/issues/1508#issuecomment-1271565180 .
Note that the migration process for a parachain may seem more complicated than for a relay chain (progress is made by manually adding an extrinsic in each block). This is to avoid going over the block size: with the automatic process used on the relay chain, there is always a risk that the content of the chain will include a very big value on top of an already big proof.
But from my point of view, a parachain could still carefully audit its content and assert that such a scenario will not happen (possibly even migrating some values ahead of time and skipping them afterward in the automatic migration process, but this is not currently coded in the state-migration pallet).
Generally, the migration process is not complicated: it just requires that every key-value pair in the runtime's storage gets written again once (if you look at the automatic state migration process, we just store a progress key and advance by a few values at the start of every block).
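
The "store a progress key and advance a few values per block" idea can be sketched as follows. This is a toy model over an in-memory dictionary, not the state-migration pallet: the class, its fields, and the per-block `limit` are all hypothetical, and rewriting a value is represented by assigning it back to itself.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ToyStateMigration:
    storage: Dict[bytes, bytes]
    limit: int                      # max keys rewritten per block (hypothetical)
    cursor: Optional[bytes] = None  # the "progress key": last key rewritten
    done: bool = False
    rewritten: List[bytes] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.limit < 1:
            raise ValueError("limit must be at least 1")

    def on_block_start(self) -> int:
        """Rewrite up to `limit` keys after the cursor; return how many were rewritten."""
        if self.done:
            return 0
        pending = sorted(k for k in self.storage
                         if self.cursor is None or k > self.cursor)
        batch = pending[: self.limit]
        for key in batch:
            # Writing the value back is what re-encodes it in the new format.
            self.storage[key] = self.storage[key]
            self.rewritten.append(key)
        if batch:
            self.cursor = batch[-1]
        if len(pending) <= self.limit:
            self.done = True
        return len(batch)


def blocks_to_finish(migration: ToyStateMigration) -> int:
    """Run the migration block by block and count the blocks it takes."""
    blocks = 0
    while not migration.done:
        migration.on_block_start()
        blocks += 1
    return blocks
```

For example, five keys with a limit of two per block finish in three blocks. A real migration would also need to bound the size of the values touched in each block, which is the block-size risk described above.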

@mrq1911
Member

mrq1911 commented Sep 20, 2023

> There is a link to the md guide and lot of link to different progress on relay chain https://github.com/paritytech/devops/issues/1508#issuecomment-1271565180 .

sry, this link is broken for me, is it in a private repo?

@cheme

cheme commented Sep 21, 2023

🤦 yes it is a private one, the link to the guide was https://hackmd.io/JagpUd8tTjuKf9HQtpvHIQ (a post referring to it: https://forum.polkadot.network/t/state-trie-migration/852).

@jak-pan
Contributor

jak-pan commented Apr 22, 2024

Solved by #799 and running migration on wasm

@jak-pan jak-pan closed this as completed Apr 22, 2024