feat: Check the metadata and the file #548

Open
jasinliu opened this issue Jul 24, 2024 · 4 comments
Labels
enhancement New feature or request

Comments

@jasinliu
Contributor

Describe the enhancement requested

Currently, our metadata and storage files are kept separately, and the metadata can be modified independently. This is very convenient, but it becomes troublesome when an error occurs.

We need a checking tool that verifies that the metadata is valid and that the storage files are consistent with the metadata.

Component(s)

Format, Other

@jasinliu jasinliu added the enhancement New feature or request label Jul 24, 2024
@SemyonSinchenko
Member

What do you think about making such a tool as a part of the planned GraphAr CLI?

#463

@jasinliu
Contributor Author

Yes. Please let me take this on.

@yecol
Contributor

yecol commented Jul 26, 2024

Hi @jasinliu, I don't think this problem comes from storing the metadata and the files separately.
The situation you mentioned, changing only the metadata, is allowed and is even by design:

For example, a user has a graph G with edges labeled A/B/C and vertices labeled D/E.
They can easily generate a graph G' with edges labeled A and vertices labeled D
just by copying and modifying the metadata into M'.
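
The example above can be sketched in a few lines. This assumes a simplified, hypothetical metadata layout (plain Python dicts with made-up `label`, `src`, `dst`, and `path` fields), not GraphAr's actual schema. The point it illustrates is that G' is produced by filtering metadata entries while the storage files stay untouched.

```python
import copy

# Hypothetical, simplified metadata M for graph G (NOT GraphAr's real schema).
M = {
    "name": "G",
    "vertices": [
        {"label": "D", "path": "vertex/D/"},
        {"label": "E", "path": "vertex/E/"},
    ],
    "edges": [
        {"label": "A", "src": "D", "dst": "D", "path": "edge/A/"},
        {"label": "B", "src": "D", "dst": "E", "path": "edge/B/"},
        {"label": "C", "src": "E", "dst": "E", "path": "edge/C/"},
    ],
}


def derive_subgraph_metadata(meta, name, vertex_labels, edge_labels):
    """Return new metadata that keeps only the requested labels.

    Only the metadata is copied and filtered; the paths in the result still
    point at the original storage files.
    """
    derived = copy.deepcopy(meta)
    derived["name"] = name
    derived["vertices"] = [
        v for v in derived["vertices"] if v["label"] in vertex_labels
    ]
    derived["edges"] = [
        e for e in derived["edges"] if e["label"] in edge_labels
    ]
    return derived


# M' describes G': edges labeled A and vertices labeled D.
M_prime = derive_subgraph_metadata(M, "G'", {"D"}, {"A"})
print([v["label"] for v in M_prime["vertices"]])  # ['D']
print([e["label"] for e in M_prime["edges"]])     # ['A']
```

Because `M_prime` reuses the same paths as `M`, both graphs read the same files, which is why a check that the metadata is "unmodified" would reject a legitimate use.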

Hence, I suggest that the validation tool should not check whether the pairing of metadata and files is unmodified. Instead, it should validate the following:

  • The modified metadata is self-consistent. In the example above, the metadata M' for G' must have the edges A connecting ONLY vertices labeled D; otherwise G' lacks vertices.
  • For the storage files, I suggest that the metadata record each file's location and its digest (e.g., MD5), to ensure the file has not been modified since it was last archived. When loading, check the digest to ensure the payload is the data you intend to read.
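
A minimal sketch of the two checks above, under stated assumptions: the metadata layout (`vertices`, `edges`, `src`, `dst`) and the per-file records (`location`, `digest`) are hypothetical names chosen for illustration, and none of this is an existing GraphAr API. Only the Python standard library is used.

```python
import hashlib
import os
import tempfile


def check_self_valid(meta):
    """Report edges whose endpoint vertex labels are absent from the metadata."""
    known = {v["label"] for v in meta["vertices"]}
    problems = []
    for e in meta["edges"]:
        for end in ("src", "dst"):
            if e[end] not in known:
                problems.append(
                    f"edge {e['label']}: {end} label {e[end]!r} not in metadata"
                )
    return problems


def file_digest(path, algorithm="md5", chunk_size=1 << 20):
    """Hash a file in chunks so large files are not loaded into memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def check_files(file_records, root):
    """Compare each recorded digest with the file currently on disk.

    file_records is a hypothetical list of {"location": ..., "digest": ...}.
    """
    problems = []
    for rec in file_records:
        path = os.path.join(root, rec["location"])
        if not os.path.isfile(path):
            problems.append(f"{rec['location']}: missing")
        elif file_digest(path) != rec["digest"]:
            problems.append(f"{rec['location']}: digest mismatch")
    return problems


# Metadata that keeps edge B (D -> E) but drops vertex label E is not self-valid.
bad_meta = {
    "vertices": [{"label": "D"}],
    "edges": [{"label": "B", "src": "D", "dst": "E"}],
}
print(check_self_valid(bad_meta))

# Record a digest at "archive" time, then detect a later modification.
with tempfile.TemporaryDirectory() as root:
    chunk = os.path.join(root, "chunk0")
    with open(chunk, "wb") as f:
        f.write(b"payload")
    records = [{"location": "chunk0", "digest": file_digest(chunk)}]
    before = check_files(records, root)
    with open(chunk, "ab") as f:
        f.write(b"!")
    after = check_files(records, root)

print(before)  # []
print(after)   # ['chunk0: digest mismatch']
```

MD5 is enough to catch accidental changes; if tampering is a concern, passing `algorithm="sha256"` to `file_digest` is a drop-in alternative in this sketch.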

@jasinliu
Contributor Author

Thank you very much, this is a very good suggestion. It also suggests that one storage file can correspond to multiple different graphs.

3 participants