Add compression support for Content #153

Open
ormsbee opened this issue Feb 7, 2024 · 1 comment
Labels
data model Anything relating to the relational models or more abstract "model" concepts around Learning Core.

Comments

@ormsbee
Contributor

ormsbee commented Feb 7, 2024

This came up in #149

I was looking through some example course data and there are a handful of Capa problems that weigh in at ~13-14 KB. But when compressed with zlib, that goes down to about 2 KB. The larger problems tend to be that way because they have a lot of Python code and HTML table markup, both of which compress really well.
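For anyone who wants to repeat this kind of measurement, here is a rough sketch. The sample text is synthetic (a made-up mix of Python code and HTML table markup, not real course data), so the ratio it prints will not match the numbers above.

```python
import zlib

# Synthetic stand-in for a large Capa problem: repetitive Python code
# plus HTML table markup. This is NOT real course data.
python_part = "".join(
    f"def check_{i}(expect, ans):\n    return abs(float(ans) - {i}) < 0.01\n"
    for i in range(60)
)
table_part = "".join(
    f"  <tr><td>Row {i}</td><td>{i * 2}</td></tr>\n" for i in range(150)
)
sample = (
    '<problem>\n<script type="loncapa/python">\n'
    + python_part
    + "</script>\n<table>\n"
    + table_part
    + "</table>\n</problem>\n"
)

raw = sample.encode("utf-8")
compressed = zlib.compress(raw)

# Report both sizes in bytes; the round trip must be lossless.
print(f"raw={len(raw)} bytes, compressed={len(compressed)} bytes")
assert zlib.decompress(compressed) == raw
```

Running the same two lines (`zlib.compress` and `len`) over exported problem text would give the real figures.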

General plan

  1. Rename the text field to uncompressed_text.
  2. Create a new BinaryField for compressed_text.
  3. Create a cached property text that knows how to switch between the two.

At the time we write to Content, we run zlib compression on the text and decide whether to use the compressed or uncompressed field for this row. The other field is left null. When we first introduce this feature, we can run the compression over existing rows as a data migration, though that wouldn't be a requirement.
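A minimal sketch of the plan above, written as a plain Python class rather than a Django model so it runs on its own. `ContentRow` and `from_text` are hypothetical names, and the "keep whichever is smaller" rule is an assumption; the issue does not say how the choice between fields would be made.

```python
import zlib
from dataclasses import dataclass
from functools import cached_property
from typing import Optional


@dataclass
class ContentRow:
    """Hypothetical stand-in for a Content row with the two proposed fields."""

    uncompressed_text: Optional[str] = None
    compressed_text: Optional[bytes] = None

    @classmethod
    def from_text(cls, text: str) -> "ContentRow":
        raw = text.encode("utf-8")
        packed = zlib.compress(raw)
        # Assumed rule: store the compressed form only when it is smaller.
        # The unused field stays None (null in the database).
        if len(packed) < len(raw):
            return cls(compressed_text=packed)
        return cls(uncompressed_text=text)

    @cached_property
    def text(self) -> Optional[str]:
        # Decompress at most once per instance, then reuse the result.
        if self.compressed_text is not None:
            return zlib.decompress(self.compressed_text).decode("utf-8")
        return self.uncompressed_text
```

Very short strings come out larger after zlib because of its header and checksum, which is why a per-row decision is worth having. In a real model the two attributes would be a text field and a `BinaryField`, with `text` left as a property on the model class.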

Pruning is still the more important feature for controlling content size growth.

@ormsbee
Contributor Author

ormsbee commented Feb 28, 2024

We need to balance the usefulness of this vs. the loss of query capability. The thing we'd really want to do here is to let MySQL handle the compression using page compression. What's stopping us is that AWS Aurora doesn't support that as of v3: https://www.skeema.io/blog/2022/01/27/exploring-aurora-v3/

The scary bit is that Aurora silently converts the COMPRESSED row format to COMPACT, which removes the ability to have larger keys and is worse than the default DYNAMIC format.

Given that most deployments use Tutor, maybe it's possible to detect that and have a bit of SQL executed to add compression for situations where we're not running under Aurora.
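One possible shape for that check, as a rough sketch only. It detects Aurora directly (by looking for the `aurora_version` server variable) instead of detecting Tutor, which is a substitution on my part. The function names and the table name passed in are hypothetical, and `cursor` is assumed to be any DB-API style cursor.

```python
import re


def is_aurora(cursor) -> bool:
    # Aurora MySQL exposes an `aurora_version` variable; on stock MySQL
    # this query is expected to return no rows.
    cursor.execute("SHOW VARIABLES LIKE 'aurora_version'")
    return cursor.fetchone() is not None


def maybe_enable_page_compression(cursor, table_name: str) -> bool:
    """Turn on InnoDB page compression unless running under Aurora.

    Returns True if the ALTER TABLE statement was issued.
    """
    # table_name is interpolated into SQL, so accept plain identifiers only.
    if not re.fullmatch(r"[A-Za-z0-9_]+", table_name):
        raise ValueError(f"unexpected table name: {table_name!r}")
    if is_aurora(cursor):
        return False
    # Only pages written after this point are compressed; existing data
    # stays as-is until the table is rebuilt (e.g. with OPTIMIZE TABLE).
    cursor.execute(f'ALTER TABLE `{table_name}` COMPRESSION="zlib"')
    return True
```

Page compression also depends on file-per-table tablespaces and on the underlying filesystem supporting hole punching, so a real version would probably need more checks than this.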

@ormsbee ormsbee added the data model Anything relating to the relational models or more abstract "model" concepts around Learning Core. label Feb 28, 2024