
feat: implement usage_key_hashes and expose via sequence metadata api #28436

Closed

Conversation

Member

@kdmccormick kdmccormick commented Aug 10, 2021

Description

We need to make the Sequence Metadata API aware of hashed usage keys. Specifically:

  1. The Seq Meta API needs to be able to accept either a usage_key or usage_key_hash as its path parameter.
  2. The Seq Meta API needs to return the usage_key_hashes of the sequence and of all its units (roughly as sketched below).
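Concretely, the enriched response would carry the new field at both levels, roughly like this (only items, item_id, and usage_key_hash come from the code in this PR; everything else is elided, and the hash strings are placeholders):

{
    # ...existing sequence-level metadata...
    "usage_key_hash": "<urlsafe-base64 hash of the sequence usage key>",
    "items": [
        {
            "item_id": "<unit usage key>",
            "usage_key_hash": "<urlsafe-base64 hash of that unit's usage key>",
            # ...existing unit-level metadata...
        },
    ],
}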

The squashed commit message will be:

feat: implement usage_key_hashes and expose via sequence metadata api

We want to enable shorter URLs in the courseware pages of
frontend-app-learning.

To do this, we define a function hash_usage_key,
which maps usage keys to an 8-character-
long, urlsafe-base64-encoded hash. The function is
defined in the learning_sequences data model, and hashes
of sequence keys are persisted to learning_sequences upon
outline generation. We also present an experimental
learning_sequences python API for fetching sequences by
their hash (instead of their usage key).

Building on this, we enhance the Sequence Metadata
HTTP API to accept hashes as sequence identifiers
(in addition to usage keys). The API also now returns
the usage_key_hash of the requested sequence, as
well as the usage_key_hashes of every unit it returns
as part of the sequence. We may revisit the exposure
of usage_key_hashes from this API in TNL-8638,
as we think it may be better to only expose this data
from the learning_sequences Outlines HTTP API.

Finally, we should note that this removes one
constraint from the LearningSequence model:
    <unique together on (learning_context_key, usage_key)>
and adds two new ones:
    <unique on usage_key>
    <unique together on (learning_context_key, usage_key_hash)>
in order to support the new learning_sequences
API functions.

Co-Authored-By: Kristin Aoki <[email protected]>

TNL-8511

Supporting information

This PR is in support of TNL-8511, courseware URL shortening. It will be incorporated into @KristinAoki's edx-platform PR, which powers her frontend-app-learning PR.

This PR makes references to TNL-8638. That ticket is currently empty, but we can use it to retroactively define and then track this project's follow-up work.

Testing instructions

  • On master: call the Sequence Metadata API for any sequence with visible units, and note its contents.
  • Switch to this branch, and restart LMS and CMS.
  • Call the Sequence Metadata API for the same sequence. Note that its return value is the same, other than the addition of "usage_key_hash": null on the returned sequence and all of its units.
  • In CMS Admin, regenerate the outline for the course to which your sequence belongs.
  • Call the Sequence Metadata API again, and note that the usage_key_hash values are filled in. Each value should be distinct.
  • Copy the sequence-level usage_key_hash. Call the Sequence Metadata API one more time, replacing the sequence usage key in the URL parameter with the usage_key_hash.
  • Note that the response is unchanged. (A scripted version of these last two checks is sketched below.)
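For convenience, the last two checks can be scripted along these lines (the endpoint path, host, and example usage key are assumptions for illustration; substitute your own environment and sequence):

import requests

LMS = "http://localhost:18000"  # e.g. a devstack LMS; adjust for your environment
# Assumed path of the Sequence Metadata API, with a placeholder sequence key.
SEQ_KEY = "block-v1:edX+DemoX+Demo_Course+type@sequential+block@basic_questions"

session = requests.Session()
# (Authentication omitted for brevity: reuse a logged-in session cookie or
# staff credentials as needed.)

by_key = session.get(f"{LMS}/api/courseware/sequence/{SEQ_KEY}").json()
seq_hash = by_key["usage_key_hash"]   # filled in once the outline is regenerated
by_hash = session.get(f"{LMS}/api/courseware/sequence/{seq_hash}").json()
assert by_hash == by_key              # same payload for either identifier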

Deadline

Thurs 8/12

Member Author

@kdmccormick kdmccormick left a comment


Notes for reviewers.

openedx/core/lib/hash_utils.py (outdated; resolved)
Comment on lines 40 to 54
def hash_usage_key(usage_key: UsageKey) -> str:
    """
    Get the blake2b hash for the given usage_key and encode the value.

    Args:
        usage_key: the usage key to hash.

    Returns:
        The string of the encoded hash.
    """
    short_key = hashlib.blake2b(bytes(str(usage_key), 'utf-8'), digest_size=6)
    encoded_hash = urlsafe_b64encode(bytes(short_key.hexdigest(), 'utf-8'))
    return str(encoded_hash, 'utf-8')
Member Author


@KristinAoki This is pulled near-verbatim from your implementation in openedx.core.djangoapps.course_experience.url_helpers. The differences are:

  • It's in a more generic location now. I didn't want to make learning_sequences depend on something course-specific like course_experience.
  • It does not save the mappings to a persisted in-process dictionary; it's a pure function now.
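For reference, a minimal usage sketch of the helper as shown above (the example usage key is arbitrary, and the exact digest depends on the input):

from opaque_keys.edx.keys import UsageKey

# Any valid usage key works; this one is just an illustrative example.
key = UsageKey.from_string(
    "block-v1:edX+DemoX+Demo_Course+type@sequential+block@basic_questions"
)
short = hash_usage_key(key)   # hash_usage_key as defined in the excerpt above
# The result is a deterministic, urlsafe-base64 string: the same key always
# yields the same value, and no state is kept between calls.
assert short == hash_usage_key(key)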

Member Author


I ended up moving this again; it's in openedx.core.djangoapps.content.learning_sequences.api.key_hashing now.

Comment on lines 531 to 548
+ # Load requested sequence from learning_sequences, trying to interpret
+ # the path param first as a UsageKey, and then as a hash of a UsageKey.
  try:
-     usage_key = UsageKey.from_string(usage_key_string)
- except InvalidKeyError:
-     raise NotFound(f"Invalid usage key: '{usage_key_string}'.")  # lint-amnesty, pylint: disable=raise-missing-from
+     # Try to parse as a real UsageKey, and then load the sequence.
+     usage_key = UsageKey.from_string(usage_key_or_hash)
+     learning_sequence = get_learning_sequence(usage_key)
+ except (InvalidKeyError, LearningSequenceData.DoesNotExist):
+     learning_sequence = None
+ if not usage_key:
+     # If we are here, then _parsing_ failed.
+     try:
+         # Try treating the path parameter as a *hash* of a UsageKey.
+         learning_sequence = get_learning_sequence_by_hash(usage_key_or_hash)
+         usage_key = learning_sequence.usage_key
+     except LearningSequenceData.DoesNotExist:
+         learning_sequence = None
+ if not learning_sequence:
+     raise NotFound(f"Invalid usage key: '{usage_key_or_hash}'.")
Member Author


New behavior 1: If the Sequence Metadata API gets a hashed usage_key as its input, it will leverage learning_sequences to recover the correct sequence and its original usage_key.

This way, the courseware frontend can immediately call the Sequence Metadata API with a hashed usage_key, instead of being blocked on the Course Metadata API's response.

Comment on lines 567 to 576
+ sequence_data = sequence.get_metadata(view=view)
+
+ # Insert usage_key_hashes of this sequence and its units into the response.
+ # We do this here in order to avoid further complicating seq_module
+ # with details of usage key hashing.
+ sequence_data["usage_key_hash"] = learning_sequence.usage_key_hash
+ for unit_data in sequence_data["items"]:
+     unit_data["usage_key_hash"] = hash_usage_key(unit_data["item_id"])
+
- return Response(sequence.get_metadata(view=view))
+ return Response(sequence_data)
Member Author


New behavior 2: The Sequence Metadata API will include hashes of the sequence usage_key and all its unit usage_keys.

This way, the frontend can construct a unit_usage_key<->unit_usage_key_hash mapping as soon as this API call returns.
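For illustration, building that two-way mapping from the response is a couple of dict comprehensions (sequence_data here stands for the parsed JSON body; the field names are the ones set in the excerpt above):

# Map each unit's usage key to its hash, and vice versa.
key_to_hash = {
    unit["item_id"]: unit["usage_key_hash"]
    for unit in sequence_data["items"]
}
hash_to_key = {hash_: key for key, hash_ in key_to_hash.items()}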

@kdmccormick kdmccormick force-pushed the kdmccormick/learning-sequences-hash-mappings branch 3 times, most recently from 7c99d47 to 52d0c6b on August 11, 2021 19:44
@kdmccormick kdmccormick force-pushed the kdmccormick/learning-sequences-hash-mappings branch 4 times, most recently from 8e77416 to ce3dc0c on August 12, 2021 18:58
@kdmccormick kdmccormick changed the title feat: make sequence metadata API handle and provide hashed usage keys feat: implement usage_key_hashes and expose via sequence metadata api Aug 12, 2021
@kdmccormick kdmccormick force-pushed the kdmccormick/learning-sequences-hash-mappings branch from ce3dc0c to 7f63eb1 on August 12, 2021 19:42
@kdmccormick kdmccormick marked this pull request as ready for review August 12, 2021 19:43
usage_keys_list = ', '.join([
    str(sequence.usage_key) for sequence in sequences
])
raise Exception(
Contributor


Can we throw something more specific here? ValueError, maybe?

# A URL-safe Base64-encoding of a blake2b hash of the usage key.
# For aesthetic use (eg, as a path parameter, to shorten URLs).
# This field is experimental. See TNL-8638 for more information.
usage_key_hash = attr.ib(type=str)
Contributor


Nit: Might want to specify in the comment whether padding = characters are stripped or not.

Member Author


Will do.

Comment on lines +390 to +409
# Length, in characters, of a base64-encoded usage key hash.
# These hashes are being experimentally used to shorten courseware URLs.
# See TNL-8638 for more details.
USAGE_KEY_HASH_LENGTH = 8

# Number of hash digest bits needed in order to produce a base64-encoded output
# of length `USAGE_KEY_HASH_LENGTH`.
# Each base64 character captures 6 bits (because 2 ^ 6 = 64).
_USAGE_KEY_HASH_BITS = USAGE_KEY_HASH_LENGTH * 6

# Number of hash digest bytes needed in order to produce a base64-encoded output
# of length `USAGE_KEY_HASH_LENGTH`. Is equal to one eighth of `_USAGE_KEY_HASH_BITS`.
# In the event that _USAGE_KEY_HASH_BITS is not divisible by 8, we round up.
# We will cut off any extra output at the end of `hash_usage_key`.
_USAGE_KEY_HASH_BYTES = math.ceil(_USAGE_KEY_HASH_BITS / 8)

# A regex to capture strings that could be the output of `hash_usage_key`.
# Captures a string of length `USAGE_KEY_HASH_LENGTH`, made up of letters,
# numbers, dashes, underscores, and/or equals signs.
USAGE_KEY_HASH_PATTERN = rf'(?P<usage_key_hash>[A-Z_a-z0-9=-]{{{USAGE_KEY_HASH_LENGTH}}})'
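A quick check of the arithmetic in those comments (recomputed here, not taken from the diff): 8 base64 characters represent 48 bits, which is exactly 6 bytes, so this particular configuration produces no '=' padding at all.

import math
from base64 import urlsafe_b64encode

USAGE_KEY_HASH_LENGTH = 8
assert USAGE_KEY_HASH_LENGTH * 6 == 48   # bits represented by 8 base64 characters
assert math.ceil(48 / 8) == 6            # digest bytes needed to cover those bits
# 6 bytes encode to exactly 8 base64 characters with no '=' padding, so nothing
# actually needs to be trimmed when the target length is 8.
assert len(urlsafe_b64encode(b"\x00" * 6)) == 8
assert b"=" not in urlsafe_b64encode(b"\x00" * 6)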
Contributor


Doesn't the linter complain if you declare these constants in the middle of the module? If not, that works. 🤷

Member Author


No complaints! Maybe because they come before any function definitions?

Comment on lines +119 to +123
usage_key_hash = models.CharField(
    max_length=USAGE_KEY_HASH_LENGTH,
    null=True,
    default=None,
)
Contributor


This needs an index of its own, or the API query that fetches using it will have to do a full table scan.

Contributor


Or alternatively, you could make the unique_together constraint go in the opposite order, so that usage_key_hash came first.

Member Author


Didn't realize the order made a difference. I'll swap the order of those.

Contributor


Yeah, it doesn't matter for the purposes of enforcing the constraint, but it does for speed of lookup. When you have a composite index of (a, b), the database can use it for lookups on just a as well.
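For concreteness, a sketch of the swapped ordering being discussed (field names follow the commit message; the field types are simplified stand-ins rather than the PR's actual field classes):

from django.db import models

class LearningSequence(models.Model):  # abridged illustration only
    learning_context_key = models.CharField(max_length=255)
    usage_key = models.CharField(max_length=255, unique=True)
    usage_key_hash = models.CharField(max_length=8, null=True, default=None)

    class Meta:
        unique_together = [
            # Hash first: the composite index enforcing this constraint can then
            # also serve lookups that filter on usage_key_hash alone, so no
            # separate single-column index is needed.
            ("usage_key_hash", "learning_context_key"),
        ]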

- except InvalidKeyError:
-     raise NotFound(f"Invalid usage key: '{usage_key_string}'.")  # lint-amnesty, pylint: disable=raise-missing-from
+     learning_sequence = get_learning_sequence(usage_key)
+ except (InvalidKeyError, LearningSequenceData.DoesNotExist) as exc:
Contributor


Can you please separate the logged error if we have LearningSequenceData.DoesNotExist, and have it not fail altogether when that happens?

Member Author


Will do.
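One way to read that request (a sketch only; the reviewer's intent and the eventual implementation may differ, and the names below are taken from the excerpts above): keep falling through to the hash lookup when parsing fails, but when a parsed key simply has no learning_sequences row, log it and carry on without hashes instead of failing the whole request.

import logging

log = logging.getLogger(__name__)  # hypothetical module-level logger

learning_sequence = None
try:
    usage_key = UsageKey.from_string(usage_key_or_hash)
except InvalidKeyError:
    usage_key = None  # fall through to the hash-based lookup shown earlier
else:
    try:
        learning_sequence = get_learning_sequence(usage_key)
    except LearningSequenceData.DoesNotExist:
        # Outline data may simply not have been generated for this sequence;
        # log it and continue without usage_key_hash values.
        log.warning("No learning_sequences data for sequence %s", usage_key)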

Contributor

@ormsbee ormsbee left a comment


Some comments, requests, and optional nits.

@edx-status-bot

Your PR has finished running tests. There were no failures.

@kdmccormick kdmccormick force-pushed the kdmccormick/learning-sequences-hash-mappings branch 2 times, most recently from e0cb001 to 6ea685b on December 20, 2022 18:24
We want to enable shorter URLs in the courseware pages of
frontend-app-learning.

To do this, we define a function hash_usage_key,
which maps usage keys to an 8-character-
long, urlsafe-base64-encoded hash. The function is
defined in the learning_sequences data model, and hashes
of sequence keys are persisted to learning_sequences upon
outline generation. We also present an experimental
learning_sequences python API for fetching sequences by
their hash (instead of their usage key).

Building on this, we enhance the Sequence Metadata
HTTP API to accept hashes as sequence identifiers
(in addition to usage keys). The API also now returns
the usage_key_hash of the requested sequence, as
well as the usage_key_hashes of every unit it returns
as part of the sequence. We may revisit the exposure
of usage_key_hashes from this API in
https://openedx.atlassian.net/browse/TNL-8638,
as we think it may be better to only expose this data
from the learning_sequences Outlines HTTP API.

Finally, we should note that this removes one
constraint from the LearningSequence model:
    <unique together on (learning_context_key, usage_key)>
and adds two new ones:
    <unique on usage_key>
    <unique together on (learning_context_key, usage_key_hash)>
in order to support the new learning_sequences
API functions.

Public Jira (archived): https://openedx.atlassian.net/browse/TNL-8511

Private-ref: https://2u-internal.atlassian.net/browse/TNL-8511
Co-Authored-By: Kristin Aoki <[email protected]>
@kdmccormick kdmccormick force-pushed the kdmccormick/learning-sequences-hash-mappings branch from 6ea685b to d3df2e6 on December 20, 2022 18:29
@kdmccormick
Member Author

Just rebased this, because I still think it'd be great to improve the Learning URLs if any of us ever get the chance to keep working on it :)

FYI @KristinAoki .

@e0d
Contributor

e0d commented Nov 20, 2023

@kdmccormick should this be closed or maybe converted to a draft?

@kdmccormick
Member Author

Yep, I'll convert to draft.

@kdmccormick kdmccormick marked this pull request as draft November 21, 2023 21:14
@kdmccormick
Member Author

If we do ever get back to this (and I hope we do), I think the right thing to do would be to compute and store the hashes on Learning Core Container models rather than edx-platform learning_sequences models.

@kdmccormick kdmccormick deleted the kdmccormick/learning-sequences-hash-mappings branch October 24, 2024 18:39