-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: implement usage_key_hashes and expose via sequence metadata api #28436
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Notes for reviewers.
openedx/core/lib/hash_utils.py
Outdated
def hash_usage_key(usage_key: UsageKey) -> str: | ||
""" | ||
Get the blake2b hash key for the given usage_key and encode the value. The | ||
hash key will be added to the usage key's mapping dictionary for decoding | ||
in LMS. | ||
|
||
Args: | ||
usage_key: the id of the location to which to generate the path | ||
|
||
Returns: | ||
The string of the encoded hashed key. | ||
""" | ||
short_key = hashlib.blake2b(bytes(str(usage_key), 'utf-8'), digest_size=6) | ||
encoded_hash = urlsafe_b64encode(bytes(short_key.hexdigest(), 'utf-8')) | ||
return str(encoded_hash, 'utf-8') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@KristinAoki This is pulled near-verbatim from your implementation in openedx.core.djangoapps.course_experience.url_helpers
. The differences are:
- It's in a more generic location now. I didn't want to make
learning_sequences
depend on something course-specific likecourse_experience
. - It does not save the mappings to a persisted in-process dictionary; it's a pure function now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I ended up moving this again; it's in openedx.core.djangoapps.content.learning_sequences.api.key_hashing
now.
# Load requested sequence from learning_sequences, trying to interpret | ||
# the path param first as a UsageKey, and as then as a hash of a UsageKey. | ||
try: | ||
usage_key = UsageKey.from_string(usage_key_string) | ||
except InvalidKeyError: | ||
raise NotFound(f"Invalid usage key: '{usage_key_string}'.") # lint-amnesty, pylint: disable=raise-missing-from | ||
# Try to parse as a real UsageKey, and then load the sequence. | ||
usage_key = UsageKey.from_string(usage_key_or_hash) | ||
learning_sequence = get_learning_sequence(usage_key) | ||
except (InvalidKeyError, LearningSequenceData.DoesNotExist): | ||
learning_sequence = None | ||
if not usage_key: | ||
# If we are here, then _parsing_ failed. | ||
try: | ||
# Try treating the path parameter as a *hash* of a UsageKey. | ||
learning_sequence = get_learning_sequence_by_hash(usage_key_or_hash) | ||
usage_key = learning_sequence.usage_key | ||
except LearningSequenceData.DoesNotExist: | ||
learning_sequence = None | ||
if not learning_sequence: | ||
raise NotFound(f"Invalid usage key: '{usage_key_or_hash}'.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
New behavior 1: If the Sequence Metadata API gets a hashed usage_key as its input, it will leverage learning_sequences to recover the correct sequence and its original usage_key.
This way, the courseware frontend can immediately call the Sequence Metadata API with a hashed usage_key, instead of being blocked on the Course Metadata API's response.
sequence_data = sequence.get_metadata(view=view) | ||
|
||
# Insert usage_key_hashes of this sequence and its units into the response. | ||
# We do this here in order to avoid further complicating seq_module | ||
# with details of usage key hashing. | ||
sequence_data["usage_key_hash"] = learning_sequence.usage_key_hash | ||
for unit_data in sequence_data["items"]: | ||
unit_data["usage_key_hash"] = hash_usage_key(unit_data["item_id"]) | ||
|
||
return Response(sequence.get_metadata(view=view)) | ||
return Response(sequence_data) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
New behavior 2: The Sequence Metadata API will include hashes of the sequence usage_key and all its unit usage_keys.
This way, the frontend can construct a unit_usage_key<->unit_usage_key_hash
mapping as soon as this API call returns.
7c99d47
to
52d0c6b
Compare
openedx/core/djangoapps/content/learning_sequences/api/key_hashing.py
Outdated
Show resolved
Hide resolved
openedx/core/djangoapps/content/learning_sequences/api/outlines.py
Outdated
Show resolved
Hide resolved
openedx/core/djangoapps/content/learning_sequences/api/key_hashing.py
Outdated
Show resolved
Hide resolved
openedx/core/djangoapps/content/learning_sequences/api/outlines.py
Outdated
Show resolved
Hide resolved
8e77416
to
ce3dc0c
Compare
ce3dc0c
to
7f63eb1
Compare
openedx/core/djangoapps/content/learning_sequences/api/tests/test_data.py
Show resolved
Hide resolved
openedx/core/djangoapps/content/learning_sequences/api/tests/test_data.py
Show resolved
Hide resolved
usage_keys_list = ', '.join([ | ||
str(sequence.usage_key) for sequence in sequences | ||
]) | ||
raise Exception( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we throw something more specific here. ValueError
maybe?
# A URL-safe Base64-encoding of a blake2b hash of the usage key. | ||
# For aesthetic use (eg, as a path parameter, to shorten URLs). | ||
# This field is experimental. See TNL-8638 for more information. | ||
usage_key_hash = attr.ib(type=str) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Might want to specify in the comment whether padding =
characters are stripped or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do.
# Length, in characters, of a base64-encoded usage key hash. | ||
# These hashes are being experimentally used to shorten courseware URLs. | ||
# See TNL-8638 for more details. | ||
USAGE_KEY_HASH_LENGTH = 8 | ||
|
||
# Number of hash digest bits needed in order to produce a base64-encoded output | ||
# of length `USAGE_KEY_HASH_LENGTH`. | ||
# Each base64 character captures 6 bits (because 2 ^ 6 = 64). | ||
_USAGE_KEY_HASH_BITS = USAGE_KEY_HASH_LENGTH * 6 | ||
|
||
# Number of hash digest bytes needed in order to produce a base64-encoded output | ||
# of length `USAGE_KEY_HASH_LENGTH`. Is equal to one eighth of `_USAGE_KEY_HASH_BITS`. | ||
# In the event that _USAGE_KEY_HASH_BITS is not divisible by 8, we round up. | ||
# We will cut off the extra any output at the end of `hash_usage_key`. | ||
_USAGE_KEY_HASH_BYTES = math.ceil(_USAGE_KEY_HASH_BITS / 8) | ||
|
||
# A regex to capture strings that could be the output of `hash_usage_key`. | ||
# Captures a string of length `USAGE_KEY_HASH_LENGTH`, made up of letters, | ||
# numbers, dashes, underscores, and/or equals signs. | ||
USAGE_KEY_HASH_PATTERN = rf'(?P<usage_key_hash>[A-Z_a-z0-9=-]{{{USAGE_KEY_HASH_LENGTH}}})' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't the linter complain if you declare these constants in the middle of the module? If not, that works. 🤷
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No complaints! Maybe because they come before any function definitions?
usage_key_hash = models.CharField( | ||
max_length=USAGE_KEY_HASH_LENGTH, | ||
null=True, | ||
default=None, | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs an index of its own, or the API query that fetches using it will have to do a full table scan.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or alternatively, you could make the unique_together
constraint go in the opposite order, so that usage_key_hash
came first.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Didn't realize the order made a difference. I'll swap the order of those.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it doesn't matter for the purposes of enforcing the constraint, but it does for speed of lookup. When you have a composite index of (a, b)
, the database can use it for lookups on just a
as well.
except InvalidKeyError: | ||
raise NotFound(f"Invalid usage key: '{usage_key_string}'.") # lint-amnesty, pylint: disable=raise-missing-from | ||
learning_sequence = get_learning_sequence(usage_key) | ||
except (InvalidKeyError, LearningSequenceData.DoesNotExist) as exc: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please separate the logged error if we have LearningSequenceData.DoesNotExist
, and have it not fail altogether when that happens?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some comments, requests, and optional nits.
Your PR has finished running tests. There were no failures. |
e0cb001
to
6ea685b
Compare
We want to enable shorter URLs in the courseware pages of frontend-app-learning. To do this, we define a function hash_usage_key, which maps usage keys to an 8-character- long, urlsafe-base64-encoded hash. The function is defined in the learning_sequences data model, and hashes of sequence keys are persisted to learning_sequences upon outline generation. We also present an experimental learning_sequences python API for fetching sequences by their hash (instead of their usage key). Building on this, we enhance the Sequence Metadata HTTP API to accept hashes as sequence identifiers (in addition to usage keys). The API also now returns the usage_key_hash of the requested sequence, as well as the usage_key_hashes of every unit it returns as part of the sequence. We may revisit the exposure of usage_key_hashes from this API in https://openedx.atlassian.net/browse/TNL-8638, as we think it may be better to only expose this data from the learning_sequences Outlines HTTP API. Finally, we should note that this removes one constraint from the LearningSequence model: <unique together on (learning_context_key, usage_key)> and adds two new ones: <unique on usage_key> <unique together on (learning_context_key, usage_key_hash)> in order to support the new learning_sequences Public Jira (archived): https://openedx.atlassian.net/browse/TNL-8511 Private-ref: https://2u-internal.atlassian.net/browse/TNL-8511 Co-Authored-By: Kristin Aoki <[email protected]>
6ea685b
to
d3df2e6
Compare
Just rebased this, because I still think it'd be great to improve the Learning URLs if any of us ever get the chance to keep working on it :) FYI @KristinAoki . |
@kdmccormick should this be closed or maybe converted to a draft? |
Yep, I'll convert to draft. |
If we do ever get back to this (and I hope we do), I think the right thing to do would be to compute and store the hashes on Learning Core Container models rather than edx-platform learning_sequences models. |
Description
We need to make the Sequence Metadata API aware of hashed usage keys. Specifically:
usage_key
orusage_key_hash
as its path parameter.usage_key_hash
es of the sequence and of all its units.The squashed commit message will be:
Supporting information
This PR is in support of TNL-8511, courseware URL shortening. It will be incorporated into @KristinAoki 's edx-platform PR, which powers her frontend-app-learning PR.
This PR makes references to TNL-8638. That ticket is currently empty, but we can use it to retroactively define and then track this project's follow-up work.
Testing instructions
"usage_key_hash": null
on the returned sequence and all of its units.usage_key_hash
values are filled in. Each value should be distinct.usage_key_hash
. Call the Sequence Metadata API one more time, replacing the sequence usage key in the URL parameter with the usage_key_hash.Deadline
Thurs 8/12