Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Misleading description for hash-to-Edwards point #400

Closed
pornin opened this issue Jun 23, 2022 · 15 comments
Closed

Misleading description for hash-to-Edwards point #400

pornin opened this issue Jun 23, 2022 · 15 comments
Labels
do-for-4.0 This should be resolved before we do a 4.0 release

Comments

@pornin
Copy link
Contributor

pornin commented Jun 23, 2022

The function EdwardsPoint::hash_from_bytes() is described as "performing hashing to the group" and explicitly references draft-irtf-cfrg-hash-to-curve:

/// Perform hashing to the group using the Elligator2 map
///
/// See https://tools.ietf.org/html/draft-irtf-cfrg-hash-to-curve-10#section-6.7.1
pub fn hash_from_bytes<D>(bytes: &[u8]) -> EdwardsPoint

This is a rather misleading description because:

  • This function does not implement the hash-to-curve process of draft-irtf-cfrg-hash-to-curve: the hash from input to a field element is different, and the handling of the sign bit is also different.
  • Only a single Elligator2 map is applied, yielding something that draft-irtf-cfrg-hash-to-curve calls encode_to_curve, not hash_to_curve (the draft reserves the latter name for the construction that invokes the map twice, and adds together the two output points, to yield a quasi-uniform output distribution).

The first point is mostly about interoperability; while the map is not the same as in the draft, it is still safe (in particular, this code just uses the first 255 bits of a hash output as a field element, while the draft would generate 129 extra bits and perform a modular reduction; but in Curve25519, the field modulus is very close to 2^255, so the bias from using just 255 input bits is negligible). The second point is, arguably, more important for security, because the current hash_from_bytes() produces points with a non-uniform distribution that is easy to distinguish from a uniform random generation. Most protocols (and in particular the Signal stuff that this implementation follows) don't mind, but some of them require a uniform distribution, and the lack of uniformity can have consequences ranging from invalid security proofs to actual information leaks.

I recommend, at least, modifying the documentation to make it explicit that the implemented functionality is not compatible with draft-irtf-cfrg-hash-to-curve, and that it has a non-uniform output distribution, which may be troublesome for some (admittedly not many) protocols.

Looking at the existing pull requests, I see PR #377, which not only modifies the description of hash_from_bytes(), but also implements (in new, additional functions) the draft-irtf-hash-to-curve process. I have not looked in details at that PR, but the idea seems good.

@rozbb rozbb added the do-for-4.0 This should be resolved before we do a 4.0 release label Oct 18, 2022
@rozbb
Copy link
Contributor

rozbb commented Nov 17, 2022

I don't like that these function is here at all, actually.

  • hash_from_bytes doesn't appear to adhere to any specs, it's misdocumented, and I can't tell who uses it (libsignal doens't seem to be). I also can't figure out where these test vectors are from.
  • The elligator_encode function (called map_to_curve in the RFC) uses notation from the paper, not the RFC, which makes it a bit hard to read. Also it only has a single test vector, and it doens't appear to have come from the RFC. Also, it's the Montgomery version of map_to_curve, but hash_from_bytes uses it to output Edwards curves.

I'd like to remove these functions entirely, unless someone can find a non-broken upstream use. If removal isn't possible, I'd like to feature-gate it behind hazmat or something.

Plan for removal: What I'd like to do is deprecate this in 4.0 and remove it in 4.1. But this technically breaks semver. So maybe a better route is to feature-gate it behind deprecate-in-4.1 or something.

@tarcieri thoughts?

@tarcieri
Copy link
Contributor

We’ve done a decent amount of work in the @RustCrypto elliptic-curve crate trying to implement curve-agnostic traits for the hash2curve APIs, although they’re still rough around the edges and I wouldn’t recommend them as anything more than an optional dependency, especially as the elliptic-curve crate is pre-1.0. But it may be good to expose inherent methods which try to align with those APIs.

My best guess is hash_from_bytes predates the RFC and is a nonstandard construction. If it doesn’t appear to have users it can probably be safely removed.

@tarcieri
Copy link
Contributor

tarcieri commented Nov 17, 2022

Also you can feature gate them behind a feature which is explicitly called out as exempt from semver, perhaps complete with a deprecation warning.

The same thing could be done with the elliptic-curve APIs too: they could be gated behind a hash2curve or hash2curve-unstable feature if we wanted to conditionally impl them.

@burdges
Copy link
Contributor

burdges commented Nov 18, 2022

elligator_ristretto_flavorappears to be a separate beast entirely.

@rozbb
Copy link
Contributor

rozbb commented Nov 20, 2022

Ugh you're right, the Ristretto version also does not appear to implement the Elligator standard (it only does half the map). I think this should also be deprecated. We could pretty easily re-introduce elligator2 methods in a future major release. The RFC has lots of test vectors

@burdges
Copy link
Contributor

burdges commented Nov 20, 2022

There were good reasons to expose the original singleton elligator map for ed25519, no?

It's not exposed for ristretto now anyways, likely because there are no ristretto protocols you wish to resemble. I simply meant the ristretto version does not depend upon the ed25519 version.

In ristretto, the actual hash from_uniform_bytes adds two elligator calls, so just add the RFC test vectors into the code. Right now, the code uses elligator test vectors, while the RFC uses elligator2 test vectors. It should all be fine.

@rozbb
Copy link
Contributor

rozbb commented Nov 20, 2022

I'd like to release before fixing this issue though. We can deprecate these functions but you bring up an important question: do we need a new name for the functions that will replace these? Is that a question for later?

@burdges
Copy link
Contributor

burdges commented Nov 21, 2022

Ignore my ed25519 question above..

Afaik, there are no problems with the Ristretto code, so afaik nothing there should be deprecated or changed now or in the future, just maybe adding the RFC test vectors eventually.

@rozbb
Copy link
Contributor

rozbb commented Nov 21, 2022

The Ristretto code appears to only be doing half of the one-way map defined in the RFC (ie it should be running MAP twice, and summing the results). We could retroactively fix it, but this would cause subtle breakage. So what I'd rather do is deprecate this, put it behind a feature flag, and introduce the correct function later.

I'm wrong. It appears to be doing the correct thing. I should add the real test vectors to make sure.

@rozbb
Copy link
Contributor

rozbb commented Nov 21, 2022

Fixed my point above. A separate point is that these hashing functions do no domain separation, despite the RFC mandating it. Fixing it would be breaking, but leaving it would be Not Good. Hmmm

@burdges
Copy link
Contributor

burdges commented Nov 21, 2022

You can remind people in comments, but clearly from_uniform_bytes cannot do domain separation, and it has a lot of users.

@pornin
Copy link
Contributor Author

pornin commented Nov 21, 2022

RistrettoPoint::from_uniform_bytes() implements the "one-way map" which is defined in the ristretto draft spec (that is probably going to be a published RFC soon). It is fine. The hash-to-curve draft (appendix B) defines hash_to_ristretto255() as the combination of an expand_message() call (which is basically a domain-separated XOF) and the one-way map from the the ristretto draft; in that sense, curve25519-dalek implements (correctly!) the second half of hash_to_ristretto255() (under the name from_uniform_bytes()).

My issue here was specifically about EdwardsPoint::hash_from_bytes(), which is completely separate from the ristretto code. My beef with hash_from_bytes() is twofold: that it wrongly documents that it follows the map described in the referenced hash-to-curve draft (it implements something close, but not that exact one), and that it uses the word "hash" despite having a non-uniform output. Normally you could fix that with only a documentation change, i.e. modifying the function description into something that explicitly states that it implements a map with a non-uniform output which is similar in principle (but not identical) to the Elligator2 map.

As for deprecating: since hash_from_bytes() does not follow any pre-existing standard (draft or not), it probably does not have many users, though I think that it has been used for some Signal-related purposes. You'd have to ask Trevor Perrin about that. Marking the function as deprecated is orthogonal to the question of fixing its documentation, though.

@jrose-signal
Copy link
Contributor

I can get in touch with Trevor if you like, but if we did use EdwardsPoint::hash_from_bytes I'm pretty sure we no longer do; our current "hash from bytes" operations are in lizard_ristretto.rs, plus the aforementioned RistrettoPoint::from_uniform_bytes. We don't use EdwardsPoints directly except in our slightly customized implementation of Ed25519 (maintaining compatibility with the implementation in the old libsignal-protocol-java).

(It'd be neat to get those upstreamed but I do not myself have a cryptographer background and can't say whether they're generally useful operations, or too specific to what Signal / zkgroup is doing.)

@rozbb
Copy link
Contributor

rozbb commented Nov 22, 2022

That's good to know. Hopefully this means the deprecation won't be noticed. Thank you!

@rozbb
Copy link
Contributor

rozbb commented Nov 24, 2022

Resolved in #438

@rozbb rozbb closed this as completed Nov 24, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
do-for-4.0 This should be resolved before we do a 4.0 release
Projects
None yet
Development

No branches or pull requests

5 participants