813 - Persist portal metadata computed file and upload after creation #825

Merged
96 commits
5f38079
create the management command file for the portal-wide metadata and r…
nozomione Jul 19, 2024
62bff10
create the test file for the portal-wide matadata management command …
nozomione Jul 19, 2024
f4df115
(edit) rename the setup_database method to load_test_data, remove 'pr…
nozomione Jul 22, 2024
7118063
Merge remote-tracking branch 'origin/feature/portal-metadata-command'…
nozomione Jul 23, 2024
4b07459
add ComputedFile::get_portal_metadata_file method and readme_file.get…
nozomione Jul 23, 2024
3d9f6af
(edit) call ComputedFile::get_portal_metadata_file in create-portal-m…
nozomione Jul 23, 2024
3538049
(edit) add the portal metadata file generation workflow to ComputedFi…
nozomione Jul 24, 2024
301cab9
(edit) adjust the create portal metadata command and its test (add te…
nozomione Jul 24, 2024
6250be9
Merge remote-tracking branch 'origin/feature/portal-metadata-command'…
nozomione Jul 25, 2024
72e8664
(edit) check against queryset objects count rather than IDs for proje…
nozomione Jul 25, 2024
fc3b731
(minor) fix a typo and remove comments
nozomione Jul 25, 2024
468c52d
Merge branch 'nozomione/797-scaffolding-management-command-file-1' in…
nozomione Jul 25, 2024
836a8d9
Merge branch 'origin/feature/portal-metadata-command' into nozomione/…
nozomione Jul 25, 2024
d5ce758
add common.GENERATED_PORTAL_METADATA_DOWNLOAD_CONFIG for the portal m…
nozomione Jul 25, 2024
7b416aa
(edit) edit readme_file.get_file_contents temporarily(and remove get_…
nozomione Jul 25, 2024
37e29d8
(edit) check the portal_metadata_only key to swap config and template…
nozomione Jul 26, 2024
9474b77
(edit) add a type hint for the queryset parameter
nozomione Jul 26, 2024
ef38860
(merge) resolve conflict and merge the latest dev branch updates to n…
nozomione Jul 26, 2024
690aa4b
(edit) store the portal metadata computed file to a local variable co…
nozomione Jul 26, 2024
2a0f7ef
Merge branch 'nozomione/797-generate-readme-file-zip-2' into nozomion…
nozomione Jul 26, 2024
f26e0d7
(edit) add the static method ComputedFile::get_local_portal_metadata_…
nozomione Jul 26, 2024
d001b6a
Merge 'dev' branch into nozomione/813-persist-portal-metadata-compute…
nozomione Jul 31, 2024
03a3791
(edit) add save/upload logic to the create portal metadata management…
nozomione Jul 31, 2024
428b9f8
(fix) fixed the readme output
nozomione Jul 31, 2024
5283d28
Merge 'dev' branch into nozomione/797-generate-readme-file-zip-2
nozomione Jul 31, 2024
c65e59e
Merge branch 'nozomione/797-generate-readme-file-zip-2' into nozomion…
nozomione Jul 31, 2024
1628d3e
Merge branch 'nozomione/797-generate-metadata-file-zip-3' into nozomi…
nozomione Jul 31, 2024
9024959
(merge) resolve conflict and merge the tracking feature/portal-metada…
nozomione Aug 1, 2024
3435b18
(edit) adjust readme_file.get_file_contents using new readme template…
nozomione Aug 1, 2024
9a2728c
(edit) remove the test_zip_file and test_readme_file methods and asse…
nozomione Aug 1, 2024
d379f7e
(merge) resolve conflict and merge 'nozomione/797-generate-readme-fil…
nozomione Aug 1, 2024
f5104a2
(bug) remove the extra context manager for readme check
nozomione Aug 1, 2024
fb2533d
Merge 'nozomione/797-generate-readme-file-zip-2' into nozomione/797-g…
nozomione Aug 1, 2024
bfde9f6
(edit) remove the test_metadata_file method and directly assert the m…
nozomione Aug 1, 2024
b2d26cb
(minor) move comments and variable for zip assertion inside the conte…
nozomione Aug 2, 2024
8ddd3b1
Merge 'nozomione/797-generate-readme-file-zip-2' into nozomione/797-g…
nozomione Aug 2, 2024
4aeb76e
(edit) use the csv module's DictReader for the metadata.tsv file asse…
nozomione Aug 2, 2024
31ff9af
Merge nozomione/797-generate-metadata-file-zip-3 into nozomione/813-p…
nozomione Aug 2, 2024
54a203d
(edit) remove TODO comment and move the body of test_computed_file in…
nozomione Aug 2, 2024
4c68764
(minor) remove the LOCAL_ZIP_FILE_PATH variable (no need - since the …
nozomione Aug 2, 2024
011abdf
(typo) fix a typo
nozomione Aug 2, 2024
20564c4
Merge branch 'dev' into nozomione/797-generate-readme-file-zip-2
nozomione Aug 2, 2024
a7e56b5
Merge branch 'feature/portal-metadata-command' into nozomione/797-gen…
nozomione Aug 5, 2024
9c703c8
Merge remote-tracking branch 'origin/nozomione/797-generate-readme-fi…
nozomione Aug 5, 2024
0d12432
(minor) remove a comment in computed_file, instead it wil be included…
nozomione Aug 5, 2024
f3ce338
Merge branch 'nozomione/797-generate-metadata-file-zip-3' into nozomi…
nozomione Aug 5, 2024
3f0ccd5
Merge branch 'feature/portal-metadata-command' into nozomione/797-gen…
nozomione Aug 13, 2024
7213be4
Merge branch 'nozomione/797-generate-readme-file-zip-2' into nozomion…
nozomione Aug 13, 2024
7ecd55c
Merge branch 'nozomione/797-generate-metadata-file-zip-3' into nozomi…
nozomione Aug 13, 2024
802080e
(edit) add an arg for upload_s3 and define the constans for args' def…
nozomione Aug 13, 2024
0200be7
(rename) append the suffix '_FILE' to the constants README and METADATA
nozomione Aug 13, 2024
1f5e074
(rename) rename common.GENERATED_SAMPLE_DOWNLOAD_CONFIGS (to plural) …
nozomione Aug 16, 2024
ed9e0e4
(edit) refactor download_config handling for portal-metadata (revert …
nozomione Aug 16, 2024
017b992
(clean up) remove the constant added previously(no longer used)
nozomione Aug 16, 2024
9bd173f
Merge branch 'nozomione/797-generate-readme-file-zip-2' into nozomion…
nozomione Aug 16, 2024
5d81c0e
(edit) use list literals to add a check that matches the project down…
nozomione Aug 19, 2024
3cfca8a
Merge branch 'nozomione/797-generate-readme-file-zip-2' into nozomion…
nozomione Aug 19, 2024
02cdde6
add a new static method ComputedFile::get_local_file_path (remove Com…
nozomione Aug 19, 2024
cbc25eb
Merge 'nozomione/797-generate-metadata-file-zip-3' branch into nozomi…
nozomione Aug 19, 2024
0e5cf01
Merge branch 'dev' into nozomione/813-persist-portal-metadata-compute…
nozomione Aug 19, 2024
03b7b40
(migration) merge migration files to resolve multiple leaf nodes
nozomione Aug 19, 2024
ffcc45e
(edit) use s3.upload_output_file (remove computed_file.upload_s3_file…
nozomione Aug 19, 2024
0c7b5ae
(minor) remove the constant 'ENCODING' to match the implementation of…
nozomione Aug 19, 2024
e7ba46a
(TODO comment) add a TODO comment to indicate that once PR #839 is me…
nozomione Aug 19, 2024
189573f
(edit) add the command 'configure_aws_cli' to resolve the duplicated …
nozomione Aug 20, 2024
970d67e
(TENP) temporaily skip isort to modify the import order of the manage…
nozomione Aug 20, 2024
0a8b524
(edit) add mock for s3.upload_output_file used in create_portal_metadata
nozomione Aug 20, 2024
dc12498
(migration) undo the merged migration files
nozomione Aug 20, 2024
b910454
(fix) run pre-commit and migrate
nozomione Aug 20, 2024
d87b7d5
Merge branch 'feature/portal-metadata-command' into nozomione/797-gen…
nozomione Aug 21, 2024
2c66417
(adjust) rollblack previous migration and re-migrate the portal_metad…
nozomione Aug 21, 2024
8610545
Merge branch 'nozomione/797-generate-readme-file-zip-2' into nozomion…
nozomione Aug 21, 2024
631cef9
Merge branch 'nozomione/797-generate-metadata-file-zip-3' into nozomi…
nozomione Aug 21, 2024
cfe68bf
(mionr) re-locate the logger message and fix typos in create_portal_m…
nozomione Aug 21, 2024
450d11a
(minor) make another revision of the comments
nozomione Aug 21, 2024
baa659c
(remove TEMP) remove temporaily added codeblocks
nozomione Aug 21, 2024
2b77b7a
(edit) access kwargs props using square brackets instead of the get m…
nozomione Aug 22, 2024
48ab892
(edit) use the get method to give the default value in if condition t…
nozomione Aug 22, 2024
0258446
(minor) add a check for adding logging handler to make sure no duplic…
nozomione Aug 22, 2024
182ab63
(minor) add a comment and remove the handler var (no needed)
nozomione Aug 22, 2024
b692665
(minor) add a comment for the handler check
nozomione Aug 22, 2024
964fa78
(edit) rename GENERATED_PROJECT_DOWNLOAD_CONFIG to GENERATED_PROJECT_…
nozomione Aug 22, 2024
f2574ef
Merge branch 'nozomione/797-generate-readme-file-zip-2' into nozomion…
nozomione Aug 22, 2024
651d753
(edit) use the config.logging.get_and_configure_logger for the log me…
nozomione Aug 22, 2024
52c6daf
Merge 'nozomione/797-generate-metadata-file-zip-3' branch into nozomi…
nozomione Aug 22, 2024
9454372
Merge 'feature/portal-metadata-command' branch into nozomione/797-gen…
nozomione Aug 26, 2024
68053c9
(edit) add Iterable type hint instead of Queryset in readme_file.get_…
nozomione Aug 26, 2024
e0cf99f
Merge 'nozomione/797-generate-readme-file-zip-2)' into nozomione/797-…
nozomione Aug 26, 2024
afadbac
Merge 'nozomione/797-generate-metadata-file-zip-3' into nozomione/813…
nozomione Aug 26, 2024
609dc06
(edit) pass the argument 'computed_file.s3_bucket' to 's3.upload_outo…
nozomione Aug 26, 2024
d1f8a25
(edit) clean up output data regardless of computed file existence and…
nozomione Aug 30, 2024
e652a76
Merge branch 'feature/portal-metadata-command' into nozomione/813-per…
nozomione Sep 4, 2024
0300b9f
(fix) adjust the computed file byte size in the test_create_portal)me…
nozomione Sep 4, 2024
cd59c14
(edit) use walrus operator for computed_file assignment and save comp…
nozomione Sep 5, 2024
daccc79
(edit) add assertEqualWithVariance (which checks computed_file file s…
nozomione Sep 5, 2024
75ed5b3
(edit) remove data type checks for the field values and instad, perfo…
nozomione Sep 6, 2024
34 changes: 19 additions & 15 deletions api/scpca_portal/management/commands/create_portal_metadata.py
@@ -3,40 +3,44 @@
 from django.conf import settings
 from django.core.management.base import BaseCommand

-from scpca_portal import common
+from scpca_portal import common, s3
 from scpca_portal.config.logging import get_and_configure_logger
 from scpca_portal.models import ComputedFile, Project

 logger = get_and_configure_logger(__name__)


 class Command(BaseCommand):
-    help = """Creates a computed file and zip for portal-wide metadata,
-    saves the instance to the databse, and
-    uploads the zip file to S3 bucket.
+    help = """Creates a computed file for portal-wide metadata.
+    Saves generated computed file to the db.
+    Optionally uploads file to s3 and cleans up output data.
     """

-    @staticmethod
-    def clean_up_output_data():
-        """Cleans up the output data files after processing the computed file"""
-        logger.info("Cleaning up output data")
-        # This static method may not be required using buffers
-
     def add_arguments(self, parser):
         parser.add_argument(
             "--clean-up-output-data", action=BooleanOptionalAction, default=settings.PRODUCTION
         )
+        parser.add_argument(
+            "--update-s3", action=BooleanOptionalAction, default=settings.UPDATE_S3_DATA
+        )

     def handle(self, *args, **kwargs):
         self.create_portal_metadata(**kwargs)

-    def create_portal_metadata(self, **kwargs):
+    def create_portal_metadata(self, clean_up_output_data: bool, update_s3: bool, **kwargs):
         logger.info("Creating the portal-wide metadata computed file")
-        computed_file = ComputedFile.get_portal_metadata_file(
+        if computed_file := ComputedFile.get_portal_metadata_file(
             Project.objects.all(), common.GENERATED_PORTAL_METADATA_DOWNLOAD_CONFIG
-        )
+        ):
+            if update_s3:
+                logger.info("Updating the zip file in S3")
+                s3.upload_output_file(computed_file.s3_key, computed_file.s3_bucket)
Comment on lines +35 to +37
Contributor

Let's move updating above saving it to the DB, so that if it fails on upload we exit and the file is unavailable.
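
A minimal sketch of the ordering suggested in this comment, assuming hypothetical stand-ins (`FakeComputedFile`, `persist`, `failing_upload`, `successful_upload`) rather than the project's real `ComputedFile` model or `s3.upload_output_file`. Uploading first means a failed upload raises before anything is saved, so no saved record can point at a file that never reached the bucket.

```python
class UploadError(Exception):
    """Raised by the hypothetical uploader when the upload fails."""


class FakeComputedFile:
    """Hypothetical stand-in for a model instance; not the project's ComputedFile."""

    def __init__(self, s3_key, s3_bucket):
        self.s3_key = s3_key
        self.s3_bucket = s3_bucket
        self.saved = False

    def save(self):
        # A real model would write to the database here.
        self.saved = True


def persist(computed_file, upload, update_s3=True):
    """Upload first, then save, so a failed upload leaves nothing saved."""
    if update_s3:
        upload(computed_file.s3_key, computed_file.s3_bucket)  # may raise
    computed_file.save()
    return computed_file


def failing_upload(key, bucket):
    """Hypothetical uploader that always fails."""
    raise UploadError(f"could not upload {key} to {bucket}")


def successful_upload(key, bucket):
    """Hypothetical uploader that always succeeds."""
    return True
```

With `failing_upload`, `persist` raises `UploadError` and `saved` stays `False`; with `successful_upload`, the instance is saved after the upload returns.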


+            logger.info("Saving the object to the database")
+            computed_file.save()

-        if kwargs["clean_up_output_data"]:
-            self.clean_up_output_data()
+            if clean_up_output_data:
+                logger.info("Cleaning up the output directory")
+                computed_file.clean_up_local_computed_file()

         return computed_file
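
The two flags registered in `add_arguments` use `argparse.BooleanOptionalAction` (Python 3.9+), which also accepts a `--no-` form of each option. Below is a minimal standalone sketch, assuming plain `argparse` outside Django and hypothetical default constants in place of `settings.PRODUCTION` and `settings.UPDATE_S3_DATA`; `build_parser` and `parse_options` are illustrative names, not part of the project.

```python
from argparse import ArgumentParser, BooleanOptionalAction

# Hypothetical defaults standing in for settings.PRODUCTION and settings.UPDATE_S3_DATA.
DEFAULT_CLEAN_UP = False
DEFAULT_UPDATE_S3 = False


def build_parser():
    """Build a parser with the same two boolean flags as the command."""
    parser = ArgumentParser(prog="create_portal_metadata")
    parser.add_argument(
        "--clean-up-output-data", action=BooleanOptionalAction, default=DEFAULT_CLEAN_UP
    )
    parser.add_argument(
        "--update-s3", action=BooleanOptionalAction, default=DEFAULT_UPDATE_S3
    )
    return parser


def parse_options(argv):
    """Return parsed options as a dict keyed by destination name."""
    # argparse derives each key by replacing dashes with underscores.
    return vars(build_parser().parse_args(argv))
```

Calling `parse_options(["--update-s3", "--no-clean-up-output-data"])` yields the keys `update_s3` and `clean_up_output_data`, which is why the options can be received as the keyword arguments named in the `create_portal_metadata` signature.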
7 changes: 5 additions & 2 deletions api/scpca_portal/models/computed_file.py
@@ -43,7 +43,6 @@ class OutputFileFormats:
     format = models.TextField(choices=OutputFileFormats.CHOICES, null=True)
     includes_merged = models.BooleanField(default=False)
     modality = models.TextField(choices=OutputFileModalities.CHOICES, null=True)
-    portal_metadata_only = models.BooleanField(default=False)
     metadata_only = models.BooleanField(default=False)
     portal_metadata_only = models.BooleanField(default=False)
     s3_bucket = models.TextField()
@@ -133,7 +132,11 @@ def get_portal_metadata_file(cls, projects, download_config: Dict) -> Self:
         )

         computed_file = cls(
-            portal_metadata_only=True,
+            format=download_config.get("format"),
+            modality=download_config.get("modality"),
+            includes_merged=download_config.get("includes_merged"),
+            metadata_only=download_config.get("metadata_only"),
+            portal_metadata_only=download_config.get("portal_metadata_only"),
             s3_bucket=settings.AWS_S3_OUTPUT_BUCKET_NAME,
             s3_key=common.PORTAL_METADATA_COMPUTED_FILE_NAME,
             size_in_bytes=zip_file_path.stat().st_size,
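
The change above replaces a hard-coded `portal_metadata_only=True` with values read from the download config. A minimal sketch of that mapping, assuming a hypothetical `FileFlags` dataclass in place of the Django model and an example config whose values mirror those asserted in this PR's test; the actual contents of `common.GENERATED_PORTAL_METADATA_DOWNLOAD_CONFIG` are not shown on this page.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class FileFlags:
    """Hypothetical stand-in for the config-driven ComputedFile fields."""

    format: Optional[str]
    modality: Optional[str]
    includes_merged: Optional[bool]
    metadata_only: Optional[bool]
    portal_metadata_only: Optional[bool]


FLAG_NAMES = ("format", "modality", "includes_merged", "metadata_only", "portal_metadata_only")

# Assumed example config; values mirror the expectations in the PR's test.
EXAMPLE_CONFIG = {
    "modality": None,
    "format": None,
    "includes_merged": False,
    "metadata_only": True,
    "portal_metadata_only": True,
}


def flags_from_config(download_config: Dict) -> FileFlags:
    """Copy each flag from the config, using dict.get as in the diff."""
    # dict.get returns None for a missing key instead of raising KeyError.
    return FileFlags(**{name: download_config.get(name) for name in FLAG_NAMES})
```

One consequence of `dict.get` worth keeping in mind: a config that omits a key passes `None` for that field rather than falling back to a default.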
@@ -1,6 +1,8 @@
 import csv
 import shutil
 from io import TextIOWrapper
+from typing import Dict
+from unittest.mock import patch
 from zipfile import ZipFile

 from django.conf import settings
@@ -29,9 +31,6 @@ def tearDownClass(cls):
         super().tearDownClass()
         shutil.rmtree(common.OUTPUT_DATA_PATH, ignore_errors=True)

-    def assertProjectReadmeContains(self, text, zip_file):
-        self.assertIn(text, zip_file.read("README.md").decode("utf-8"))
-
     def load_test_data(self):
         # Expected object counts
         PROJECTS_COUNT = 3
@@ -54,31 +53,71 @@ def load_test_data(self):
         self.assertEqual(Sample.objects.all().count(), SAMPLES_COUNT)
         self.assertEqual(Library.objects.all().count(), LIBRARIES_COUNT)

-    def test_create_portal_metadata(self):
+    # TODO: After PR #839 is merged into dev, add readme file format testing
+    def assertProjectReadmeContains(self, text, zip_file):
+        self.assertIn(text, zip_file.read(README_FILE).decode("utf-8"))
Comment on lines +56 to +58
Contributor

Good point, we should draft the issue to do this instead of just having the comment.

Member Author

I've gone ahead and filed the issue here #869 👍!


+    def assertFields(self, computed_file, expected_fields: Dict):
+        for expected_key, expected_value in expected_fields.items():
+            actual_value = getattr(computed_file, expected_key)
+            message = f"Expected {expected_value}, received {actual_value} on '{expected_key}'"
+            self.assertEqual(actual_value, expected_value, message)
+
+    def assertEqualWithVariance(self, value, expected, variance=50):
+        # Make sure the given value is within the range of expected bounds
+        message = f"{value} is out of range"
+        self.assertGreaterEqual(value, expected - variance, message)
+        self.assertLessEqual(value, expected + variance, message)
+
+    @patch("scpca_portal.management.commands.create_portal_metadata.s3.upload_output_file")
+    def test_create_portal_metadata(self, mock_upload_output_file):
+        # Set up the database for test
         self.load_test_data()
-        self.processor.create_portal_metadata(clean_up_output_data=False)
+        # Create the portal metadata computed file
+        self.processor.create_portal_metadata(clean_up_output_data=False, update_s3=True)
+
+        # Test the computed file
+        computed_files = ComputedFile.objects.filter(portal_metadata_only=True)
+        # Make sure the computed file is created and singular
+        self.assertEqual(computed_files.count(), 1)
+        computed_file = computed_files.first()
+        # Make sure the computed file size is as expected range
+        self.assertEqualWithVariance(computed_file.size_in_bytes, 8430)
+        # Make sure all fields match the download configuration values
+        download_config = {
+            "modality": None,
+            "format": None,
+            "includes_merged": False,
+            "metadata_only": True,
+            "portal_metadata_only": True,
+        }
+        self.assertFields(computed_file, download_config)
+        # Make sure mock_upload_output_file called once
+        mock_upload_output_file.assert_called_once_with(
+            computed_file.s3_key, computed_file.s3_bucket
+        )

-        # Test the content of the generated zip file
         zip_file_path = ComputedFile.get_local_file_path(
             common.GENERATED_PORTAL_METADATA_DOWNLOAD_CONFIG
         )
-        with ZipFile(zip_file_path) as zip:
+        # Test the content of the generated zip file
+        with ZipFile(zip_file_path) as zip_file:
             # There are 2 file:
             # ├── README.md
             # |── metadata.tsv
             expected_file_count = 2
             # Make sure the zip has the exact number of expected files
-            files = set(zip.namelist())
+            files = set(zip_file.namelist())
             self.assertEqual(len(files), expected_file_count)
             self.assertIn(README_FILE, files)
             self.assertIn(METADATA_FILE, files)
             # README.md
             expected_text = (
                 "This download includes associated metadata for samples from all projects"
             )
-            self.assertProjectReadmeContains(expected_text, zip)
+            self.assertProjectReadmeContains(expected_text, zip_file)
             # metadata.tsv
-            with zip.open(METADATA_FILE) as metadata_file:
+            with zip_file.open(METADATA_FILE) as metadata_file:
                 csv_reader = csv.DictReader(
                     TextIOWrapper(metadata_file, "utf-8"),
                     delimiter=common.TAB,
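
The test in this diff combines two techniques: replacing the upload call with a mock so that no real S3 request is made, and checking the file size within a tolerance instead of for exact equality. A self-contained sketch of both, assuming a hypothetical `s3` namespace and `create_file` function in place of the project's module and management command; the key, bucket name, and byte size are illustrative values only.

```python
import unittest
from types import SimpleNamespace
from unittest.mock import patch

# Hypothetical stand-in for the s3 module used by the command.
s3 = SimpleNamespace(upload_output_file=lambda key, bucket: True)


def create_file(update_s3):
    """Hypothetical command body: build a fake computed file and optionally upload it."""
    computed_file = SimpleNamespace(
        s3_key="portal_metadata.zip", s3_bucket="example-bucket", size_in_bytes=8442
    )
    if update_s3:
        s3.upload_output_file(computed_file.s3_key, computed_file.s3_bucket)
    return computed_file


class VarianceTestCase(unittest.TestCase):
    def assertEqualWithVariance(self, value, expected, variance=50):
        # Pass when value lies within [expected - variance, expected + variance].
        message = f"{value} is out of range"
        self.assertGreaterEqual(value, expected - variance, message)
        self.assertLessEqual(value, expected + variance, message)

    def test_upload_called_once(self):
        # Swap the uploader for a mock only for the duration of the call.
        with patch.object(s3, "upload_output_file") as mock_upload:
            computed_file = create_file(update_s3=True)
        mock_upload.assert_called_once_with(computed_file.s3_key, computed_file.s3_bucket)
        self.assertEqualWithVariance(computed_file.size_in_bytes, 8430)


def run_tests():
    """Run the sketch's test case and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(VarianceTestCase)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

The PR's test patches the name where it is looked up (`scpca_portal.management.commands.create_portal_metadata.s3.upload_output_file`); `patch.object` on the local namespace plays the same role in this sketch.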