Make Sum fields clearer. #2
Having two separate fields complicates things. When a file is received, its integrity sum is persisted in an extended attribute, so now we would have to persist the sum, the signature, or both, and then decide which one to read back. If both are present, which one takes precedence? For now, I have renamed the field from "sum" to "integrity", with the intent of allowing signature algorithms in addition to the simple checksums used now. What do people think?
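To make the precedence question concrete, here is a minimal Python sketch. The attribute name `user.integrity`, the base64 value encoding, and the helpers `make_integrity`, `read_integrity`, `persist_integrity` and `load_integrity` are all hypothetical, not any implementation's actual API. The precedence rule (signature over sum) is likewise an assumption, chosen because a signature does everything a checksum does and more. With a single "integrity" field, only one extended attribute is ever written, so no precedence is needed on read.

```python
import base64
import hashlib
import json
import os
from typing import Optional

# Hypothetical attribute name (Linux "user." namespace); one attribute for any method.
XATTR_NAME = "user.integrity"


def make_integrity(method: str, data: bytes) -> dict:
    """Build an integrity entry for a plain checksum method known to hashlib."""
    digest = hashlib.new(method, data).digest()
    return {"method": method, "value": base64.b64encode(digest).decode("ascii")}


def read_integrity(msg: dict) -> Optional[dict]:
    """Return one integrity entry from a message.

    Messages using the renamed field carry only 'integrity'; older ones may
    carry 'sum', 'signature', or both. Preferring the signature over the sum
    is an assumed rule, not one agreed in this thread.
    """
    for key in ("integrity", "signature", "sum"):
        if key in msg:
            return msg[key]
    return None


def persist_integrity(path: str, entry: dict) -> None:
    # A single extended attribute, whatever the method (Linux-only os call).
    os.setxattr(path, XATTR_NAME, json.dumps(entry).encode("utf-8"))


def load_integrity(path: str) -> dict:
    # json.loads accepts the raw bytes returned by os.getxattr.
    return json.loads(os.getxattr(path, XATTR_NAME))
```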
I think we agreed on this some time ago, but perhaps @petersilva could summarize here the recent discussion about unusual algorithms, such as "arbitrary". I am quite OK with things like "FLK-SHA512" (a hash of the concatenation of the first and last kilobyte of the file, used for large files to avoid reading the whole thing). It means a bit of work for me, but it makes perfect sense. Would we also need other size variants, e.g. "FLM-.." (first and last megabyte), or perhaps "FL4K-..."? It becomes a bit cryptic, but I could live with it.
This comment mostly just reports what is currently in the Canadian implementations. All of the currently implemented methods are in use, and each was added because of a use case we encountered: https://github.com/MetPX/sarracenia/blob/master/doc/sr_postv3.7.rst#sum-method-value
Not present: FLK-SHA512 is as you described it. It is not yet implemented, but I am considering it for one use case where I need a compromise between no data checksum (such as a sum of the name) and a full data checksum (sha512).
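A minimal Python sketch of the FLK-SHA512 idea as described above: SHA-512 over the first kilobyte concatenated with the last kilobyte. The thread notes it is not yet implemented, so the function name `flk_sha512`, the hex output, and the handling of files shorter than two kilobytes are assumptions. The `chunk` parameter shows how the hypothetical "FLM" or "FL4K" variants would differ only in window size.

```python
import hashlib
import os


def flk_sha512(path: str, chunk: int = 1024) -> str:
    """Hypothetical FLK-SHA512: SHA-512 of the first `chunk` bytes followed by
    the last `chunk` bytes of the file, without reading the middle.

    chunk=1024 gives "FLK"; chunk=1024 * 1024 would give the "FLM" variant.
    """
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        first = f.read(chunk)
        # Assumption: for files shorter than 2 * chunk the two windows simply
        # overlap (for files shorter than chunk, both are the whole file).
        f.seek(max(0, size - chunk))
        last = f.read(chunk)
    return hashlib.sha512(first + last).hexdigest()
```

Whatever the window size, only two reads are issued, so the cost is constant regardless of file size. The trade-off is that any change confined to the middle of the file goes undetected.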
Thoughts:

The sum field serves competing, complementary, and nested goals.

All the mesh algorithm needs is 1. These purposes encompass one another: 3 does strictly more than 2, and 2 strictly more than 1. The same ordering holds for size: a proper signature is going to be a lot more bytes than a checksum, and a checksum (2) will in turn be, say, 512 bits for SHA-512, a lot more than a typical UUID (128 bits). We could use separate data structures for all three, but it is tempting to combine them somehow.
An example of identical data that nevertheless differs: in North America, there is GOES DCS (Data Collection System), a low-bandwidth uplink for automated stations. Various organizations and sites operate LRGS (Local Readout Ground Stations) to pick up DCS data from a local satellite dish. Often there is a tail on the actual datum that gives information about signal strength and noise, so such data is obviously going to differ for every dish. People posting such data could make it site-neutral and binary-identical by stripping off the radio metadata, but then anyone who wants that information would miss it. Ideally, then, a checksum that excluded the tail would be used.
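A minimal Python sketch of a sum that excludes the radio tail. It assumes, hypothetically, that the tail is a trailer whose length the posting source knows; real DCS messages may delimit it differently, and the name `sum_excluding_tail` is illustrative only. Two dishes receiving the same datum with different signal reports then produce the same sum.

```python
import hashlib


def sum_excluding_tail(record: bytes, tail_len: int) -> str:
    """SHA-512 over the payload only, ignoring a hypothetical trailer of
    per-dish radio metadata (signal strength, noise)."""
    if tail_len < 0 or tail_len > len(record):
        raise ValueError("tail_len out of range")
    # Slice by explicit length: record[:-0] would wrongly yield b"".
    payload = record[: len(record) - tail_len]
    return hashlib.sha512(payload).hexdigest()


# Same datum seen by two dishes with different radio tails (illustrative bytes).
dish_a = b"PAYLOAD" + b"SS=-42dB"
dish_b = b"PAYLOAD" + b"SS=-37dB"
```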
One thing is constant: an intermediary party does not know enough about the data to select an appropriate sum algorithm, so the choice must be made by the source.
I agree: the checksum must be independent of the actual transfer encoding or compression. Verifying the checksum only makes sense in the systems that are going to use the data, and those will do the unpacking anyway. Moreover, the "content" field is used only for small data.
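A minimal Python sketch of that principle: the sum is computed over the decoded bytes, so the same data carried inline as plain text, base64, or compressed yields the same value. The shape of the `content` dict (`encoding`, `value`) and the `gzip+base64` encoding name are assumptions for illustration.

```python
import base64
import gzip
import hashlib


def content_bytes(content: dict) -> bytes:
    """Recover the original data from a hypothetical inline content field."""
    encoding, value = content["encoding"], content["value"]
    if encoding == "utf-8":
        return value.encode("utf-8")
    if encoding == "base64":
        return base64.b64decode(value)
    if encoding == "gzip+base64":  # hypothetical encoding name
        return gzip.decompress(base64.b64decode(value))
    raise ValueError(f"unsupported content encoding: {encoding}")


def content_sum(content: dict) -> str:
    # Hash the decoded data, never its transfer representation.
    return hashlib.sha512(content_bytes(content)).hexdigest()
```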
The ET-CTS committee found the `sum=","` notation idiosyncratic. One option:
`"sum" = { "method": "md5", "value": "the checksum value" }`
They also raised the slightly different notion of a signature that can accomplish the same thing as a checksum, while also confirming provenance. The suggestion is:
`"signature" = { "method": "???", "value": "the signature value" }`
Currently, sum is a required field; the proposal is to require one of sum or signature instead.
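A minimal Python sketch of the proposal: converting the comma notation into the object form and checking that a message carries a sum or a signature. The helper names are hypothetical, "one of" is read as "at least one" (an assumption), and method codes from the old notation are kept verbatim rather than mapped to algorithm names.

```python
def parse_legacy_sum(s: str) -> dict:
    """Split the comma notation 'method,value' into the proposed object form.
    Method codes are kept as-is; no mapping to algorithm names is attempted."""
    method, sep, value = s.partition(",")
    if not sep or not method or not value:
        raise ValueError(f"not in method,value form: {s!r}")
    return {"method": method, "value": value}


def validate(msg: dict) -> None:
    """Require at least one of sum or signature (assumed reading of "one of"),
    each with a non-empty method and value."""
    present = [key for key in ("sum", "signature") if key in msg]
    if not present:
        raise ValueError("message needs a sum or a signature")
    for key in present:
        field = msg[key]
        if not isinstance(field, dict) or not field.get("method") or not field.get("value"):
            raise ValueError(f"{key} must have a non-empty method and value")
```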