
The "entropy per bit" value is misleading (for non-bit-oriented entropy sources) #13

Open
joshuaehill opened this issue Sep 20, 2018 · 9 comments

Comments

@joshuaehill

The "Min Entropy per bit" value that the tool outputs is likely to be misinterpreted and abused.

The tool supplies an average value for the min entropy per bit (the per-symbol min entropy divided by the bits per symbol). It is likely that folks will make the assumption that entropy is uniformly distributed throughout the symbol (a wildly incorrect assumption for many sources!) and attempt to sub-divide the symbols and credit the proportional entropy for the sub-portion of the symbol that is being used.

@dj-on-github
Owner

It reports both. I'm not sure why this matters - the entropy per bit is useful for getting normalized results for comparisons. From a certification perspective, you want to show that you are meeting the input requirements of the extractor, and all the vetted extractors take multi-symbol inputs. So (number_of_input_symbols_to_ext * entropy_per_symbol) == (number_of_input_bits_to_ext * entropy_per_bit).
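A minimal sketch of that identity (the figures are made up for illustration): when whole symbols are consumed, the per-bit number is just the per-symbol number rescaled, so the two products agree by construction.

```python
# Hypothetical figures: an 8-bit-per-symbol source assessed at 5.2 bits
# of min entropy per symbol.
bits_per_symbol = 8
h_symbol = 5.2                       # assessed min entropy per symbol
h_bit = h_symbol / bits_per_symbol   # per-bit average the tool reports

n_symbols = 1000                     # whole symbols fed to the extractor
n_bits = n_symbols * bits_per_symbol

# Both accountings credit the same total entropy when whole symbols are used.
assert n_symbols * h_symbol == n_bits * h_bit
print(f"total credited: {n_symbols * h_symbol:.1f} bits")
```

The disagreement in the rest of this thread is only about what happens when *partial* symbols are used.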

@joshuaehill
Author

That last equality statement is not true for many sources if you truncate the samples. It is true if complete samples are used.

For example, one common scheme is to sample a fast-running counter (e.g., a TSC value), where the sampling occurs as a consequence of some event whose exact timing is difficult for an attacker to guess. If you look at how the min entropy is distributed in the samples from such a system, the low-order bits are often more difficult for an attacker to predict than the high-order bits (the high-order bits are often essentially wholly known to any suitably informed attacker). Thus the low-order bits tend to have more min entropy than the high-order bits.

Providing a min entropy assessment as a per-bit average suggests that one can freely subdivide a sample and credit each bit of the sample with the stated average. If one includes the entire sample, then (by definition) you get the total sample entropy, and the equality you state is clearly true. If you instead subdivide the sample, it is hard to say anything about the entropy of the part that remains, and for systems where min entropy isn't uniformly distributed, it's very likely that the number of bits multiplied by the per-bit average won't be the correct value to credit.
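A toy simulation (purely illustrative, not output from the tool) of the counter-sampling scheme described above makes the non-uniformity concrete:

```python
import math
import random

# Toy model of a TSC-style source: the attacker knows the counter's coarse
# value, so the high 4 bits of each 8-bit sample are fixed (here 0xA),
# while timing jitter makes the low 4 bits close to uniform.
random.seed(1)
samples = [0xA0 | random.getrandbits(4) for _ in range(100_000)]

def min_entropy_of_bit(samples, pos):
    """Empirical min entropy of a single bit position: -log2(p_max)."""
    ones = sum((s >> pos) & 1 for s in samples)
    p_max = max(ones, len(samples) - ones) / len(samples)
    return -math.log2(p_max)

for pos in range(8):
    print(f"bit {pos}: {min_entropy_of_bit(samples, pos):.3f}")
# The low-order bits come out near 1.0 bit of min entropy each; the fixed
# high-order bits come out at exactly 0. The per-bit average (about 0.5)
# describes neither group, so crediting a truncated sample at the average
# rate overstates the high bits and understates the low bits.
```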

I have witnessed this occurring "in the wild" on several occasions, and the results are sometimes unfortunate.

@dj-on-github
Owner

dj-on-github commented Sep 20, 2018 via email

@joshuaehill
Author

You may want to wait before putting a bunch of time into making the output look like NIST's 2016 Python implementation, as NIST plans on releasing a completely different C++ tool "real soon now". The last I heard (about a month ago), they had all the development done and were performing testing.

@dj-on-github
Owner

dj-on-github commented Sep 20, 2018 via email

@dj-on-github
Owner

CSV is in. Multi file isn't.

@joshuaehill
Author

NIST released their updated reference implementation today.

@dj-on-github
Owner

dj-on-github commented Sep 21, 2018 via email

@yuyinw
Contributor

yuyinw commented Jul 6, 2021

When I use CPU jitter to collect 3840 bytes (30720 bits) and run this Python tool, it outputs Minimum Min Entropy = 0.6581506573264674. So is the final result (30720 * 0.6581506573264674 ≈ 20218 bits)?
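If the whole 30720-bit sequence is used intact, that arithmetic is right; the caveat from earlier in this thread is that the credit no longer holds if the samples are truncated or subdivided. Checking the numbers:

```python
n_bits = 3840 * 8              # 3840 bytes collected = 30720 bits
h_bit = 0.6581506573264674     # tool's reported Minimum Min Entropy per bit
total = n_bits * h_bit         # total min entropy credit for the whole sequence
print(f"{total:.0f}")          # 20218, matching the figure in the question
```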
