813 - Persist portal metadata computed file and upload after creation #825
…egister it to sportal
…test_create_portal_metadata
…oject_id' kwarg when calling the load_data command, check PROJECT_COUNT, SAMPLES_COUNT, and LIBRARIES_COUNT by quering libraries metadata from the Library model to verify database setup instead of checking an explicit list of project ids
… into nozomione/797-scaffolding-management-command-file-1
…_portal_metadata_file_content for the portal-wide metadata readme file generation
…etadata management command and update its test
…le::get_portal_metadata_file
…st for metadata file, organize and split the logic into separate methods for readbility)
… into nozomione/797-scaffolding-management-command-file-1
…cts, samples, and libraries
…to nozomione/797-generate-readme-file-zip-2
…797-generate-readme-file-zip-2
…etadata download and adjust create_portal_metadata management command, change suffix CONFIGURATIONS to CONFIG for the download config and adjust the codebase accordingly
…portal_metadata_file_content - no longer needed), add a new model field portal_metadata_only and update ComputedFile::get_portal_metadata_file to return the instance of the computed file, and adjust test_create_portal_metadata
… contexts values (remove the Project instance check) and adjust the codebase accordingly
…ozomione/797-generate-metadata-file-zip-3
…mputed_file in the create_portal_metadata management command, and temporarily add noqa to supress Flake8 unused variable warning (with TODO comment)
…e/797-generate-metadata-file-zip-3
…path and remove OUTPUT_* constants from common and adjust the codebase
… command for the database/s3 bucket respectively
…ut_file' (new change from dev), add the new parameters ('clean_up_output_data’, 'update_s3') with type hints to the 'create_portal_metadata' method function signature in the create portal metadata command, and adjust its test
I've applied your feedback and this PR is ready for your review. Thank you David!
help = """Creates a computed file and zip for portal-wide metadata.
Saves generated computed file the db.
Uploads file to s3 and cleans up output data depending on passed options.
- help = """Creates a computed file and zip for portal-wide metadata.
- Saves generated computed file the db.
- Uploads file to s3 and cleans up output data depending on passed options.
+ help = """Creates a computed file for portal-wide metadata.
+ Saves generated computed file to the db.
+ Optionally uploads file to s3 and cleans up output data.
if clean_up_output_data:
    logger.info("Cleaning up the output directory")
    computed_file.clean_up_local_computed_file()
-     if clean_up_output_data:
-         logger.info("Cleaning up the output directory")
-         computed_file.clean_up_local_computed_file()
+ if clean_up_output_data:
+     logger.info("Cleaning up the output directory")
+     computed_file.clean_up_local_computed_file()
This should run even if file generation was unsuccessful, because we may have a half-completed file on disk.
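If generation can raise rather than return nothing, one way to guarantee the cleanup still runs is a `try`/`finally` block. This is only a sketch under that assumption: `generate_file` and `create_with_cleanup` are hypothetical stand-ins, not the project's functions.

```python
import os
import tempfile


def generate_file(path: str, fail: bool) -> str:
    """Hypothetical generator: writes part of a file, then optionally fails."""
    with open(path, "w") as f:
        f.write("partial content")
        if fail:
            raise RuntimeError("generation failed midway")
    return path


def create_with_cleanup(path: str, clean_up_output_data: bool, fail: bool = False) -> str:
    """Run generation and, when requested, clean up afterwards no matter what."""
    try:
        return generate_file(path, fail)
    finally:
        # Runs on success and on failure, so a half-written file never lingers.
        if clean_up_output_data and os.path.exists(path):
            os.remove(path)
```

Calling `create_with_cleanup(path, clean_up_output_data=True, fail=True)` still raises `RuntimeError`, but the partial file at `path` is removed first.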
# TODO: After PR #839 is merged into dev, add readme file format testing
def assertProjectReadmeContains(self, text, zip_file):
    self.assertIn(text, zip_file.read(README_FILE).decode("utf-8"))
Good point, we should draft the issue to do this instead of just having the comment.
I've gone ahead and filed the issue here #869 👍!
computed_file = self.processor.create_portal_metadata(
    clean_up_output_data=False, update_s3=True
)

# Test computed_file
if computed_file:
    expected_size = 8469
    self.assertEqual(computed_file.size_in_bytes, expected_size)
    mock_upload_output_file.assert_called_once_with(
        computed_file.s3_key, computed_file.s3_bucket
    )
else:
    self.fail("No computed file")
I think here we should be calling the same assertions independent of the command running successfully. That is what we are here to test / ensure.
I rewrote the test below, but I think for better coverage we should also assert that all of the GENERATED_PORTAL_METADATA_DOWNLOAD_CONFIG values have been applied to the computed file as well.
- computed_file = self.processor.create_portal_metadata(
-     clean_up_output_data=False, update_s3=True
- )
- # Test computed_file
- if computed_file:
-     expected_size = 8469
-     self.assertEqual(computed_file.size_in_bytes, expected_size)
-     mock_upload_output_file.assert_called_once_with(
-         computed_file.s3_key, computed_file.s3_bucket
-     )
- else:
-     self.fail("No computed file")
+ self.processor.create_portal_metadata(
+     clean_up_output_data=False, update_s3=True
+ )
+ computed_files = ComputedFile.objects.filter(portal_metadata_only=True)
+ self.assertEqual(computed_files.count(), 1)
+ computed_file = computed_files.first()
+ self.assertEqual(computed_file.size_in_bytes, 8469)
+ mock_upload_output_file.assert_called_once_with(
+     computed_file.s3_key, computed_file.s3_bucket
+ )
Thank you for the rewrite, it looks great! I've also applied your suggestion to assert that all field values match the download config file at d1f8a25 👍!
… enhance the help string for the create_portal_metadata management command, add the assertion for computed file count (for singularity) and fields (values match the portal metadata download configuration) in test_create_portal_metadata, update CopmutedFile::get_portal_metadata_file to explicitly set field values defined in the download config when instantiating a computed file for the portal metadata
I've applied your feedback and this PR is ready for another look. Thank you David!
…sist-portal-metadata-computed-file-data
LGTM. If you find that asserting the exact value does not pass consistently, I left a comment with an alternative.
self.assertEqual(computed_files.count(), 1)
computed_file = computed_files.first()
# Make sure the computed file size is as expected
self.assertEqual(computed_file.size_in_bytes, 8460)
- self.assertEqual(computed_file.size_in_bytes, 8460)
+ self.assertEqualWithVariance(computed_file.size_in_bytes, 8460, 50)
+ # outside of this test
+ def assertEqualWithVariance(self, value, expected, variance):
+     self.assertGreater(expected + variance, value)
+     self.assertLess(expected - variance, value)
If you find that the value changes and you cannot reliably assert an exact match, you can add an assertion that allows for some wiggle room.
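As a possible alternative to a custom helper, the standard library's `unittest.TestCase.assertAlmostEqual` accepts a `delta` argument that expresses the same tolerance check. The sketch below is not what the PR uses; the class name and the measured size are hypothetical. One difference from the helper above: the `delta` bound is inclusive, whereas `assertGreater`/`assertLess` are strict.

```python
import unittest


class SizeToleranceExample(unittest.TestCase):
    def test_size_within_tolerance(self):
        size_in_bytes = 8472  # hypothetical measured size
        # Passes when abs(size_in_bytes - 8460) <= 50.
        self.assertAlmostEqual(size_in_bytes, 8460, delta=50)
```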
This is such a great alternative! I've applied it at daccc79 👍
I’ve also added a new method called assertFields, which asserts that the fields in the portal metadata computed file match the download configuration settings. We talked about this optimization idea during our 1:1, so I went ahead and applied the input as well!
Sorry, one more comment about ordering, so that we don't end up with an object in the DB that no one can download.
logger.info("Creating the portal-wide metadata computed file")
computed_file = ComputedFile.get_portal_metadata_file(
    Project.objects.all(), common.GENERATED_PORTAL_METADATA_DOWNLOAD_CONFIG
)

if kwargs["clean_up_output_data"]:
    self.clean_up_output_data()
if computed_file:
You can use the walrus operator here.
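For reference, the walrus operator (an assignment expression, available from Python 3.8) assigns and tests a value in a single `if`. A minimal sketch follows, in which `get_portal_metadata_file`, `run`, and the `s3_key` value are hypothetical stand-ins rather than the project's code:

```python
from typing import Optional


def get_portal_metadata_file(succeed: bool) -> Optional[dict]:
    """Hypothetical stand-in that returns a file record, or None on failure."""
    return {"s3_key": "portal_metadata.zip"} if succeed else None


def run(succeed: bool) -> str:
    # Assign and test in one expression instead of two separate statements.
    if computed_file := get_portal_metadata_file(succeed):
        return f"created {computed_file['s3_key']}"
    return "no computed file"
```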
if update_s3:
    logger.info("Updating the zip file in S3")
    s3.upload_output_file(computed_file.s3_key, computed_file.s3_bucket)
Let's move the S3 upload above saving to the DB, so that if the upload fails we exit before saving and the file stays unavailable.
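The ordering can be sketched as follows, assuming the upload function reports success as a boolean. `upload_then_save` is a hypothetical helper, and the list stands in for the database:

```python
from typing import Callable, List


def upload_then_save(
    record: dict,
    upload: Callable[[dict], bool],
    saved_records: List[dict],
) -> bool:
    """Persist the record only after the upload has succeeded."""
    if not upload(record):
        # Exit early: nothing is saved, so no record points at a missing file.
        return False
    saved_records.append(record)  # stands in for computed_file.save()
    return True
```

With this order, a failed upload leaves the database untouched, which avoids a saved object that no one can download.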
…uted_file object after the successful s3 file upload in create_portal_metadata
…ize within the specified range) and assertFields (which checks the computed file field values against download configurations) to test_create_portal_metadata
I've updated the PR and it's ready for your review. Thank you David!
def assert_bool(value, expected_value, message=""):
    if expected_value:
        self.assertTrue(value, message)
    else:
        self.assertFalse(value, message)

def assert_is_none(value, message=""):
    self.assertIsNone(value, message)

def assert_equal(value, expected_value, message=""):
    self.assertEqual(value, expected_value, message)

for expected_key, expected_value in expected_fields.items():
    message = f"Field '{expected_key}' does not match"
    output_value = getattr(computed_file, expected_key)
    if isinstance(expected_value, bool):
        assert_bool(output_value, expected_value, message)
    elif expected_value is None:
        assert_is_none(output_value, message)
    else:
        assert_equal(output_value, expected_value, message)
- def assert_bool(value, expected_value, message=""):
-     if expected_value:
-         self.assertTrue(value, message)
-     else:
-         self.assertFalse(value, message)
- def assert_is_none(value, message=""):
-     self.assertIsNone(value, message)
- def assert_equal(value, expected_value, message=""):
-     self.assertEqual(value, expected_value, message)
- for expected_key, expected_value in expected_fields.items():
-     message = f"Field '{expected_key}' does not match"
-     output_value = getattr(computed_file, expected_key)
-     if isinstance(expected_value, bool):
-         assert_bool(output_value, expected_value, message)
-     elif expected_value is None:
-         assert_is_none(output_value, message)
-     else:
-         assert_equal(output_value, expected_value, message)
+ for attr, expected_value in expected_fields.items():
+     actual_value = getattr(computed_file, attr)
+     message = f"Expected {expected_value}, received {actual_value} on '{attr}'"
+     self.assertEqual(actual_value, expected_value, message)
I'd like to see less custom logic here. We really just need to assert equality, e.g. False == False, True == True, string == string, etc.
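A self-contained sketch of the plain-equality approach, using `subTest` so that every mismatching field is reported rather than only the first. The class name and the `modality` field are hypothetical, as are the example values; this is an illustration, not the PR's final `assertFields`:

```python
import unittest
from types import SimpleNamespace


class FieldsExample(unittest.TestCase):
    def assertFields(self, obj, expected_fields):
        for attr, expected_value in expected_fields.items():
            # Each field is its own sub-test, so one failure does not hide others.
            with self.subTest(field=attr):
                self.assertEqual(getattr(obj, attr), expected_value)

    def test_fields(self):
        # Hypothetical object standing in for a computed file.
        computed_file = SimpleNamespace(
            portal_metadata_only=True, modality=None, s3_key="example.zip"
        )
        expected = {"portal_metadata_only": True, "modality": None, "s3_key": "example.zip"}
        self.assertFields(computed_file, expected)
```

One caveat of relying on `assertEqual` alone: in Python `1 == True`, so an integer `1` would compare equal to an expected `True`.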
I've applied this feedback at 75ed5b3👍
…rm equality check in assertFields
…#825) * create the management command file for the portal-wide metadata and register it to sportal * create the test file for the portal-wide matadata management command test_create_portal_metadata * (edit) rename the setup_database method to load_test_data, remove 'project_id' kwarg when calling the load_data command, check PROJECT_COUNT, SAMPLES_COUNT, and LIBRARIES_COUNT by quering libraries metadata from the Library model to verify database setup instead of checking an explicit list of project ids * add ComputedFile::get_portal_metadata_file method and readme_file.get_portal_metadata_file_content for the portal-wide metadata readme file generation * (edit) call ComputedFile::get_portal_metadata_file in create-portal-metadata management command and update its test * (edit) add the portal metadata file generation workflow to ComputedFile::get_portal_metadata_file * (edit) adjust the create portal metadata command and its test (add test for metadata file, organize and split the logic into separate methods for readbility) * (edit) check against queryset objects count rather than IDs for projects, samples, and libraries * (minor) fix a typo and remove comments * add common.GENERATED_PORTAL_METADATA_DOWNLOAD_CONFIG for the portal metadata download and adjust create_portal_metadata management command, change suffix CONFIGURATIONS to CONFIG for the download config and adjust the codebase accordingly * (edit) edit readme_file.get_file_contents temporarily(and remove get_portal_metadata_file_content - no longer needed), add a new model field portal_metadata_only and update ComputedFile::get_portal_metadata_file to return the instance of the computed file, and adjust test_create_portal_metadata * (edit) check the portal_metadata_only key to swap config and template contexts values (remove the Project instance check) and adjust the codebase accordingly * (edit) add a type hint for the queryset parameter * (edit) store the portal metadata computed file to a local variable 
computed_file in the create_portal_metadata management command, and temporarily add noqa to supress Flake8 unused variable warning (with TODO comment) * (edit) add the static method ComputedFile::get_local_portal_metadata_path and remove OUTPUT_* constants from common and adjust the codebase * (edit) add save/upload logic to the create portal metadata management command for the database/s3 bucket respectively * (fix) fixed the readme output * (edit) adjust readme_file.get_file_contents using new readme template structures, and make common.GENERATED_PORTAL_METADATA_DOWNLOAD_CONFIG from a list to a dict * (edit) remove the test_zip_file and test_readme_file methods and assert directly from test_create_portal_metadata, and change the readme assert value and remove TODO comments * (bug) remove the extra context manager for readme check * (edit) remove the test_metadata_file method and directly assert the metadata file in test_create_portal_metadata, and use common.METADATA_COLUMN_SORT_ORDER as the value of expected_keys instead of hand-coded value * (minor) move comments and variable for zip assertion inside the context manager * (edit) use the csv module's DictReader for the metadata.tsv file assertion * (edit) remove TODO comment and move the body of test_computed_file inside test_create_portal_metadata for the computed file assertion * (minor) remove the LOCAL_ZIP_FILE_PATH variable (no need - since the test no longer has individual methods) * (typo) fix a typo * (minor) remove a comment in computed_file, instead it wil be included in the PR * (edit) add an arg for upload_s3 and define the constans for args' default value to the create_portal_metadata management command, and add constants for default values and rename 'zip' to 'zip_file' in the context mamanger for testing zip file in the test_create_portal_metadata * (rename) append the suffix '_FILE' to the constants README and METADATA * (rename) rename common.GENERATED_SAMPLE_DOWNLOAD_CONFIGS (to plural) and 
adjust codebase accordingly * (edit) refactor download_config handling for portal-metadata (revert from dict to list to perform the same check (in) as the project's config, remove the variable, add a QuerySet check to determine whether 'projects' should be passed as when dealing with a single project * (clean up) remove the constant added previously(no longer used) * (edit) use list literals to add a check that matches the project download config check * add a new static method ComputedFile::get_local_file_path (remove Computed::get_local_portal_metadata_path), add the 'zip_file_path' variable in ComputedFile::get_portal_metadata_file, and adjust test_create_portal_metadata * (migration) merge migration files to resolve multiple leaf nodes * (edit) use s3.upload_output_file (remove computed_file.upload_s3_file - no longer used), fix verb from 'upload' to 'update' and adjust the codebase, remove constants (CLEAN_UP_OUTPUT_DATA and UPLOAD_S3 - no longer used) in create_portal_metadata management command, and minor adjustment in its test * (minor) remove the constant 'ENCODING' to match the implementation of PR 389 (the whitespace removal from the readme files) * (TODO comment) add a TODO comment to indicate that once PR #839 is merged into 'dev' branch, readme file format testing will be added, and 'assertProjectReadmeContains' will be removed from test_create_portal_metadata * (edit) add the command 'configure_aws_cli' to resolve the duplicated load data call in test_create_portal_metadata * (TENP) temporaily skip isort to modify the import order of the management commands (to prevent tests to run in parallel) * (edit) add mock for s3.upload_output_file used in create_portal_metadata * (migration) undo the merged migration files * (fix) run pre-commit and migrate * (adjust) rollblack previous migration and re-migrate the portal_metadata_only field for ComputedFile model * (mionr) re-locate the logger message and fix typos in create_portal_metadata management command 
file * (minor) make another revision of the comments * (remove TEMP) remove temporaily added codeblocks * (edit) access kwargs props using square brackets instead of the get method * (edit) use the get method to give the default value in if condition to prevent an error when the args is not passed when calling the portal metadata management command * (minor) add a check for adding logging handler to make sure no duplicate loggings * (minor) add a comment and remove the handler var (no needed) * (minor) add a comment for the handler check * (edit) rename GENERATED_PROJECT_DOWNLOAD_CONFIG to GENERATED_PROJECT_DOWNLOAD_CONFIGS, and replace iin with is in the readme_file module for the portal metadata config test * (edit) use the config.logging.get_and_configure_logger for the log message and remove the logging module) from the create_portal_metadata management command, adjust test_create_portal_metadata to explicitly check each file with assertIn (no for loop and remove the expected_files variable), use csv and TextIOWrapper to open TSV file to check the content, and assert expected_keys using set (no longer using list), and add check for library ids * Merge 'feature/portal-metadata-command' branch into nozomione/797-generate-readme-file-zip-2 * (edit) add Iterable type hint instead of Queryset in readme_file.get_file_contents, and pass the list as arguments for Project and Sample when calling readme_file.get_file_contents in computed_file * (edit) pass the argument 'computed_file.s3_bucket' to 's3.upload_outout_file' (new change from dev), add the new parameters ('clean_up_output_data’, 'update_s3') with type hints to the 'create_portal_metadata' method function signature in the create portal metadata command, and adjust its test * (edit) clean up output data regardless of computed file existence and enhance the help string for the create_portal_metadata management command, add the assertion for computed file count (for singularity) and fields (values match the 
portal metadata download configuration) in test_create_portal_metadata, update CopmutedFile::get_portal_metadata_file to explicitly set field values defined in the download config when instantiating a computed file for the portal metadata * (fix) adjust the computed file byte size in the test_create_portal)metadata * (edit) use walrus operator for computed_file assignment and save computed_file object after the successful s3 file upload in create_portal_metadata * (edit) add assertEqualWithVariance (which checks computed_file file size within the specified range) and assertFields (which checks the computed file field values against download configurations) to test_create_portal_metadata * (edit) remove data type checks for the field values and instad, perform equality check in assertFields
* Merge dev updates into feature/consolidate-readmes (#850) * add s3 module * update references in management command, models and tests to point to functions in new s3 module, remove duplicate code * move utils::list_s3_paths to new s3 module, update references in other files * move TestListS3Paths class from test_utils to new test_s3 testing module * add generic s3::download_s3_files method, add s3::download_metadata_files and s3::download_data_files methods * fix list.extend bug in s3::download_s3_files * add s3::download_sample_data_files and s3::download_project_data_files methods, impove s3::download_metadata_files * swap s3::download_data for s3::download_metadata_files in load_data command, call s3::download_project_data_files in Project::create_computed_files, call s3::download_sample_data_files in Sample::create_computed_files, add Sample::has_downloaded_data property method * improve readability of Project::get_data_file_paths * delete s3::download_data and s3::download_data_files functions, update s3 module function docstrings * add s3::download_data_files * integrate s3::download_data_files inside of ComputedFile::get_project|sample_file, remove references to s3::download_sample_data_files inside of Sample::create_computed_files and s3::download_project_data_files inside of Project::create_computed_files * remove unused s3::download_project_data_files and s3::download_sample_data_files * remove unused property method Sample::has_downloaded_data * replace s3::download_data_files with s3::download_input_files, replace s3::download_metadata_files with s3::download_input_metadta, delete no longer used s3::download_s3_files, rename s3::list_s3_paths to s3::list_input_paths, and move function higher up in module * rename s3::delete_s3_file to s3::delete_output_file and improve function, rename s3::upload_s3_file to s3::upload_output_file and improve function * add s3::generate_pre_signed_link function * change reference to aws boto3 variable from s3 to aws_s3 
in s3 module * update references to new s3 module function names in Project, Sample, Library, and ComputedFile models * update tests to reflect new s3 module function names * exchange s3::create_download_url for s3::generate_pre_signed_link in ComputedFile model and ComputedFile tests * delete no longer used s3::create_download_url * add configure_aws_cli django management command * add calls to configure_aws_cli management command for testing environmment, staging and prod environments, and add it to sportal * add logging to configure_aws_cli management command * delete unused s3::configure_aws_cli and call to it in load_data command * delete unused aws cli configuration options for load_data command * add age_timing attribute to Sample model, rename age_at_diagnosis attribute to age in Sample model * remove unused SAMPLE_METADATA_KEYS construct from metadata_file module and update metadata_file::load_samples_metadata accordingly * add migration for Sample model added and rename attributes * update references from age_at_diagnosis to age throughout code, add in new attribute age_timing along side age * fix random formatting bug in Sample::get_metadata, add missed age_timing attribute into expected_keys in test_load_data * update testing bucket to most recent bucket, adjust sample_cell_count_estimate assertion value in test_load_data::test_single_cell_metadata to account for change to filtered cells in input SCPCL999990_metadata.json file * add boto3 config to configure_aws_cli command * short term patch - move configure_aws_cli command call out of run_tests and into test_load_data * remove no longer used Force attribute from calls to s3::delete_output_file in Project::purge * pass s3_key to s3::delete_output_file in Project::purge and not entire ComputedFile object * update type hints in s3 module * explicitly destructure used kwargs in configure_aws_cli function in configure_aws_cli command * replace project_samples_mapping with load_data::project_has_s3_files 
helper function in load_data command * ensure sample exists before creating library * docker-compose to docker compose * remove docker-compose in GHA * docker-compose to docker compose * remove docker-compose in GHA * remove Sample::AgeTiming(models.TextChoices), set age and age_timing Sample attributes to default=common.NA, add comment to test_load_data noting that sample_cell_count_estimate assertion in test_single_cell_metadata will likely fail during version changes * docker-compose to docker compose * remove docker-compose in GHA * swap migration files * adjust migration to rename age field and not to delete it and re-add it * update age and age_timing Sample attributes migration, make fields non-nullable without a default in Sample model * update ProjectSamplesTable component on client side to incorporate renaming of age_at_diagnosis Sample attribute to age and addition of new age_timing attribute * propogate attribute changes to api/resources/samples and related storybook mock data prop * add bool return values for s3::download_input_files, s3::download_input_metadata, s3::upload_output_file * change logging level in s3::list_input_paths from error to warning * update code clarity in ComputedFile::download_url * fix bug in in update_s3 clause in Project::on_get_project_file and Sample::on_get_sample_file, update return early value in s3::delete_output_file * match filename to docs * match filename to docs * update postgres minor version 12.14 -> 12.19 * db.t2.micro -> db.t3.micro --------- Co-authored-by: Avrohom Gottlieb <[email protected]> Co-authored-by: David Mejia <[email protected]> Co-authored-by: Avrohom Gottlieb <[email protected]> * add input tests to test_s3 on function s3::list_input_paths * add output tests to test_s3 on function s3::list_input_paths, delete remaining unnecessary tests * clarify that recursive is called by default by adding words 'by_default' test_llist_input_paths_recursive_flag_passed test case in test_s3, update comments * 
fix possible end of s3 resource slash bug in s3::list_input_paths, clean up tests in test_s3 module * update s3 resource trailing slash appending to to occur only when recursive=False in s3::list_input_paths, update tests accordingly * 837 - Manage whitespace in generated readmes (#839) * ensure sample exists before creating library * docker-compose to docker compose * remove docker-compose in GHA * (edit) adjust readme template files to have no more than double newlines * add a new class method ‘save_readme_files’ which rename each zip file’s README.md to the name of the zip file and save it to test_data/readmes folder and call this method from tearDownClass, add a constant ENCODING and replace hard-coded unicode 'utf-8', and add not to ignore api/test_data/readmes folder to gitignore * add a new instance method 'assertProjectReadmeContent' which compares the saved formatted readme file content in test_data/readmes with the README.md content of each zip files, and call it from 'test_readme_files'(previously 'save_readme_files') * (minor) remove no longer used constant * (edit) move the Local Data for beck end rule to bottom of the file to prevent, and track files in api/test_data/readmes * (edit) add api/test_data/readmes to exlude in pre-commit trailing-whitespace * (edit) add a boolean flag SAVE_README_OUTPUT that determines whether the readme files should be saved to api/readmes (readme files should be re-generated when their contents changes for formatting testing, otherwise by default it set to False) * (rename) append the suffix '_FILE' to the constant README * (edit) refactor test_load_data's class method 'test_readme_files' to exclude PROJECT ID prefixes when renaming README.md (to remove redundancy check) and to only save the readme files if the combination (for each computed file type) doesn't exist, and make scope adjustments * (delete) remove previosly generated readme files from test_data/readmes (no longer needed) * add the generated 
test_data/readmes/ files * (edit) modify the class method save_readme_files (previously test_readme_files) and regenerate test_data/readmes files * (remove) remove the class method save_readme_files (no longer required), constans ENCODING and SAVE_README_OUTPUT, and adjust codebase in test_load_data * (edit) adjust assertProjectReadmeContent to take zip_file as a positional arg and to replace project ID and data in readme contents with placeholder values for generic format testing using its inner method get_masked_content, and replace assertProjectReadmeContains (deleted) with assertProjectReadmeContent in each test * (edit) update assertProjectReadmeContent to use the splitlines method for content comparison, adjust get_updated_content (previously get_masked_content) to replace the placeholders with each test's project id and today's date, and adjust the test_data/readmes files and codebase * (minor) replace f-string to just project_id (not required) and add a bit more context to the comment for splitlines * (edit) add logging to track the names of the readme file and the current test method to assertProjectReadmeContent for easiler debugging * (edit) add a custom message log (print the names of the test and readme file in test_data/readmes) only when test failed * (edit) update 'assertProjectReadmeContent' to take a list of project IDs and use the string.replace method to replace the placeholders(no longer using regex) in test_load_data, and adjust test_data/readems's placeholders * (edit) update the assertion failure message for readme content comparison in assertProjectReadmeContent * (edit) remove the placeholders variable and directly replace the placeholder with project IDs in assertProjectReadmeContent in test_load_data --------- Co-authored-by: David Mejia <[email protected]> * (merge) resolve conflicts to merge feature/consolidate-readmes-into-dev * Revert "Fix conflict in test_load_data" * Revert "Revert "Fix conflict in test_load_data"" * Merge dev 
updates into feature/portal-medatada-command (#807) * create readme_creation.py module, move all readme constants and ComputedFile::get_readme_from_download_config method to new file, update references in Project and ComputedFile models that call these constants * add readme_creation::create_readme_file, add TEMPLATE_PATHS data structure in readme_creation module * fix bug in readme_creation::create_readme_file * swap out readme implementations in ComputedFile::get_project_file and ComputedFile::get_sample_file * remove eager readme generation logic from Project model and no longer used constants in readme_creation module * update constant names and method name and signature in readme_creation module, propogated changes to caller methods in ComputedFile model * change file name from readme_creation to readme_file * update references to readme_file in ComputedFile in light of name change * remove leftover project keyword arg reference in ComputedFile --------- Co-authored-by: Avrohom Gottlieb <[email protected]> Co-authored-by: Avrohom Gottlieb <[email protected]> * Merge dev updates into feature/portal-medatada-command (#812) * create readme_creation.py module, move all readme constants and ComputedFile::get_readme_from_download_config method to new file, update references in Project and ComputedFile models that call these constants * add readme_creation::create_readme_file, add TEMPLATE_PATHS data structure in readme_creation module * fix bug in readme_creation::create_readme_file * swap out readme implementations in ComputedFile::get_project_file and ComputedFile::get_sample_file * remove eager readme generation logic from Project model and no longer used constants in readme_creation module * update constant names and method name and signature in readme_creation module, propogated changes to caller methods in ComputedFile model * change file name from readme_creation to readme_file * move MetadaFilenames class of constants from ComputedFile model to metadata_file 
module, move output metadata file naming logic from ComputedFile to metadata_file * replace writing of metadata to file with writing to buffer in ComputedFile::get_project|sample_file, update metadata_file.write_metadata_dicts to handle writing to buffer * update references to readme_file in ComputedFile in light of name change * remove leftover project keyword arg reference in ComputedFile * add new method metadata_file::get_file_contents, move buffer logic out of ComputedFile::get_project|sample_file into metadata_file::get_file_contents, rename metadata_file::get_metadata_file_name to metadata_file::get_file_name * improve readability of metadata_file::get_file_name * update metadata_file::get_file_contents signature to accept list of metadata dicts as opposed to list of libraries * update tests in test_metadata_file to handle metadata_file::get_file_contents and not metadata_file::write_metadata_dicts * remove metadata_file::write_metadata_dicts --------- Co-authored-by: Avrohom Gottlieb <[email protected]> Co-authored-by: Avrohom Gottlieb <[email protected]> * 797 - Scaffolding the porta-wide metadata management command and test files (#804) * create the management command file for the portal-wide metadata and register it to sportal * create the test file for the portal-wide matadata management command test_create_portal_metadata * (edit) rename the setup_database method to load_test_data, remove 'project_id' kwarg when calling the load_data command, check PROJECT_COUNT, SAMPLES_COUNT, and LIBRARIES_COUNT by quering libraries metadata from the Library model to verify database setup instead of checking an explicit list of project ids * (edit) check against queryset objects count rather than IDs for projects, samples, and libraries * (minor) fix a typo and remove comments * Merge dev updates into feature/portal-metadata-command (#857) * add s3 module * update references in management command, models and tests to point to functions in new s3 module, remove 
duplicate code * move utils::list_s3_paths to new s3 module, update references in other files * move TestListS3Paths class from test_utils to new test_s3 testing module * add generic s3::download_s3_files method, add s3::download_metadata_files and s3::download_data_files methods * fix list.extend bug in s3::download_s3_files * add s3::download_sample_data_files and s3::download_project_data_files methods, improve s3::download_metadata_files * swap s3::download_data for s3::download_metadata_files in load_data command, call s3::download_project_data_files in Project::create_computed_files, call s3::download_sample_data_files in Sample::create_computed_files, add Sample::has_downloaded_data property method * improve readability of Project::get_data_file_paths * delete s3::download_data and s3::download_data_files functions, update s3 module function docstrings * add s3::download_data_files * integrate s3::download_data_files inside of ComputedFile::get_project|sample_file, remove references to s3::download_sample_data_files inside of Sample::create_computed_files and s3::download_project_data_files inside of Project::create_computed_files * remove unused s3::download_project_data_files and s3::download_sample_data_files * remove unused property method Sample::has_downloaded_data * replace s3::download_data_files with s3::download_input_files, replace s3::download_metadata_files with s3::download_input_metadata, delete no longer used s3::download_s3_files, rename s3::list_s3_paths to s3::list_input_paths, and move function higher up in module * rename s3::delete_s3_file to s3::delete_output_file and improve function, rename s3::upload_s3_file to s3::upload_output_file and improve function * add s3::generate_pre_signed_link function * change reference to aws boto3 variable from s3 to aws_s3 in s3 module * update references to new s3 module function names in Project, Sample, Library, and ComputedFile models * update tests to reflect new s3 module function names *
exchange s3::create_download_url for s3::generate_pre_signed_link in ComputedFile model and ComputedFile tests * delete no longer used s3::create_download_url * add configure_aws_cli Django management command * add calls to configure_aws_cli management command for testing environment, staging and prod environments, and add it to sportal * add logging to configure_aws_cli management command * delete unused s3::configure_aws_cli and call to it in load_data command * delete unused aws cli configuration options for load_data command * add age_timing attribute to Sample model, rename age_at_diagnosis attribute to age in Sample model * remove unused SAMPLE_METADATA_KEYS construct from metadata_file module and update metadata_file::load_samples_metadata accordingly * add migration for Sample model added and renamed attributes * update references from age_at_diagnosis to age throughout code, add in new attribute age_timing alongside age * fix random formatting bug in Sample::get_metadata, add missed age_timing attribute into expected_keys in test_load_data * update testing bucket to most recent bucket, adjust sample_cell_count_estimate assertion value in test_load_data::test_single_cell_metadata to account for change to filtered cells in input SCPCL999990_metadata.json file * add boto3 config to configure_aws_cli command * short term patch - move configure_aws_cli command call out of run_tests and into test_load_data * remove no longer used Force attribute from calls to s3::delete_output_file in Project::purge * pass s3_key to s3::delete_output_file in Project::purge and not entire ComputedFile object * update type hints in s3 module * explicitly destructure used kwargs in configure_aws_cli function in configure_aws_cli command * replace project_samples_mapping with load_data::project_has_s3_files helper function in load_data command * docker-compose to docker compose * remove docker-compose in GHA * remove Sample::AgeTiming(models.TextChoices), set age and age_timing
Sample attributes to default=common.NA, add comment to test_load_data noting that sample_cell_count_estimate assertion in test_single_cell_metadata will likely fail during version changes * docker-compose to docker compose * remove docker-compose in GHA * swap migration files * adjust migration to rename age field and not to delete it and re-add it * update age and age_timing Sample attributes migration, make fields non-nullable without a default in Sample model * update ProjectSamplesTable component on client side to incorporate renaming of age_at_diagnosis Sample attribute to age and addition of new age_timing attribute * propogate attribute changes to api/resources/samples and related storybook mock data prop * add bool return values for s3::download_input_files, s3::download_input_metadata, s3::upload_output_file * change logging level in s3::list_input_paths from error to warning * update code clarity in ComputedFile::download_url * fix bug in in update_s3 clause in Project::on_get_project_file and Sample::on_get_sample_file, update return early value in s3::delete_output_file * match filename to docs * match filename to docs * update postgres minor version 12.14 -> 12.19 * db.t2.micro -> db.t3.micro * fix setting name AWS_S3_BUCKET_NAME * enforce latest RDS CA * update cert name: latest -> cert * set rds ca: rds-ca-rsa2048-g1 --------- Co-authored-by: Avrohom Gottlieb <[email protected]> Co-authored-by: David Mejia <[email protected]> Co-authored-by: Avrohom Gottlieb <[email protected]> * 797 - Generate and add `README.md` to a zip file (#808) * create the management command file for the portal-wide metadata and register it to sportal * create the test file for the portal-wide matadata management command test_create_portal_metadata * (edit) rename the setup_database method to load_test_data, remove 'project_id' kwarg when calling the load_data command, check PROJECT_COUNT, SAMPLES_COUNT, and LIBRARIES_COUNT by quering libraries metadata from the Library model 
to verify database setup instead of checking an explicit list of project ids * add ComputedFile::get_portal_metadata_file method and readme_file.get_portal_metadata_file_content for the portal-wide metadata readme file generation * (edit) call ComputedFile::get_portal_metadata_file in create-portal-metadata management command and update its test * (edit) check against queryset objects count rather than IDs for projects, samples, and libraries * (minor) fix a typo and remove comments * add common.GENERATED_PORTAL_METADATA_DOWNLOAD_CONFIG for the portal metadata download and adjust create_portal_metadata management command, change suffix CONFIGURATIONS to CONFIG for the download config and adjust the codebase accordingly * (edit) edit readme_file.get_file_contents temporarily(and remove get_portal_metadata_file_content - no longer needed), add a new model field portal_metadata_only and update ComputedFile::get_portal_metadata_file to return the instance of the computed file, and adjust test_create_portal_metadata * (edit) check the portal_metadata_only key to swap config and template contexts values (remove the Project instance check) and adjust the codebase accordingly * (edit) add a type hint for the queryset parameter * (edit) store the portal metadata computed file to a local variable computed_file in the create_portal_metadata management command, and temporarily add noqa to supress Flake8 unused variable warning (with TODO comment) * (edit) adjust readme_file.get_file_contents using new readme template structures, and make common.GENERATED_PORTAL_METADATA_DOWNLOAD_CONFIG from a list to a dict * (edit) remove the test_zip_file and test_readme_file methods and assert directly from test_create_portal_metadata, and change the readme assert value and remove TODO comments * (bug) remove the extra context manager for readme check * (minor) move comments and variable for zip assertion inside the context manager * (rename) rename common.GENERATED_SAMPLE_DOWNLOAD_CONFIGS 
(to plural) and adjust codebase accordingly * (edit) refactor download_config handling for portal-metadata (revert from dict to list to perform the same check (in) as the project's config, remove the variable, add a QuerySet check to determine whether 'projects' should be passed as when dealing with a single project * (clean up) remove the constant added previously(no longer used) * (edit) use list literals to add a check that matches the project download config check * (adjust) rollblack previous migration and re-migrate the portal_metadata_only field for ComputedFile model * (edit) rename GENERATED_PROJECT_DOWNLOAD_CONFIG to GENERATED_PROJECT_DOWNLOAD_CONFIGS, and replace iin with is in the readme_file module for the portal metadata config test * Merge 'feature/portal-metadata-command' branch into nozomione/797-generate-readme-file-zip-2 * (edit) add Iterable type hint instead of Queryset in readme_file.get_file_contents, and pass the list as arguments for Project and Sample when calling readme_file.get_file_contents in computed_file * 797 - Generate and add `metadata.tsv` to a zip file (#810) * create the management command file for the portal-wide metadata and register it to sportal * create the test file for the portal-wide matadata management command test_create_portal_metadata * (edit) rename the setup_database method to load_test_data, remove 'project_id' kwarg when calling the load_data command, check PROJECT_COUNT, SAMPLES_COUNT, and LIBRARIES_COUNT by quering libraries metadata from the Library model to verify database setup instead of checking an explicit list of project ids * add ComputedFile::get_portal_metadata_file method and readme_file.get_portal_metadata_file_content for the portal-wide metadata readme file generation * (edit) call ComputedFile::get_portal_metadata_file in create-portal-metadata management command and update its test * (edit) add the portal metadata file generation workflow to ComputedFile::get_portal_metadata_file * (edit) 
adjust the create portal metadata command and its test (add test for metadata file, organize and split the logic into separate methods for readbility) * (edit) check against queryset objects count rather than IDs for projects, samples, and libraries * (minor) fix a typo and remove comments * add common.GENERATED_PORTAL_METADATA_DOWNLOAD_CONFIG for the portal metadata download and adjust create_portal_metadata management command, change suffix CONFIGURATIONS to CONFIG for the download config and adjust the codebase accordingly * (edit) edit readme_file.get_file_contents temporarily(and remove get_portal_metadata_file_content - no longer needed), add a new model field portal_metadata_only and update ComputedFile::get_portal_metadata_file to return the instance of the computed file, and adjust test_create_portal_metadata * (edit) check the portal_metadata_only key to swap config and template contexts values (remove the Project instance check) and adjust the codebase accordingly * (edit) add a type hint for the queryset parameter * (edit) store the portal metadata computed file to a local variable computed_file in the create_portal_metadata management command, and temporarily add noqa to supress Flake8 unused variable warning (with TODO comment) * (edit) add the static method ComputedFile::get_local_portal_metadata_path and remove OUTPUT_* constants from common and adjust the codebase * (edit) adjust readme_file.get_file_contents using new readme template structures, and make common.GENERATED_PORTAL_METADATA_DOWNLOAD_CONFIG from a list to a dict * (edit) remove the test_zip_file and test_readme_file methods and assert directly from test_create_portal_metadata, and change the readme assert value and remove TODO comments * (bug) remove the extra context manager for readme check * (edit) remove the test_metadata_file method and directly assert the metadata file in test_create_portal_metadata, and use common.METADATA_COLUMN_SORT_ORDER as the value of expected_keys instead 
of hand-coded value * (minor) move comments and variable for zip assertion inside the context manager * (edit) use the csv module's DictReader for the metadata.tsv file assertion * (rename) rename common.GENERATED_SAMPLE_DOWNLOAD_CONFIGS (to plural) and adjust codebase accordingly * (edit) refactor download_config handling for portal-metadata (revert from dict to list to perform the same check (in) as the project's config, remove the variable, add a QuerySet check to determine whether 'projects' should be passed as when dealing with a single project * (clean up) remove the constant added previously(no longer used) * (edit) use list literals to add a check that matches the project download config check * add a new static method ComputedFile::get_local_file_path (remove Computed::get_local_portal_metadata_path), add the 'zip_file_path' variable in ComputedFile::get_portal_metadata_file, and adjust test_create_portal_metadata * (adjust) rollblack previous migration and re-migrate the portal_metadata_only field for ComputedFile model * (edit) rename GENERATED_PROJECT_DOWNLOAD_CONFIG to GENERATED_PROJECT_DOWNLOAD_CONFIGS, and replace iin with is in the readme_file module for the portal metadata config test * (edit) use the config.logging.get_and_configure_logger for the log message and remove the logging module) from the create_portal_metadata management command, adjust test_create_portal_metadata to explicitly check each file with assertIn (no for loop and remove the expected_files variable), use csv and TextIOWrapper to open TSV file to check the content, and assert expected_keys using set (no longer using list), and add check for library ids * Merge 'feature/portal-metadata-command' branch into nozomione/797-generate-readme-file-zip-2 * (edit) add Iterable type hint instead of Queryset in readme_file.get_file_contents, and pass the list as arguments for Project and Sample when calling readme_file.get_file_contents in computed_file * (pre-commit) remove the old unused 
redandant method for the portal metadata from computed_file model * 813 - Persist portal metadata computed file and upload after creation (#825) * create the management command file for the portal-wide metadata and register it to sportal * create the test file for the portal-wide matadata management command test_create_portal_metadata * (edit) rename the setup_database method to load_test_data, remove 'project_id' kwarg when calling the load_data command, check PROJECT_COUNT, SAMPLES_COUNT, and LIBRARIES_COUNT by quering libraries metadata from the Library model to verify database setup instead of checking an explicit list of project ids * add ComputedFile::get_portal_metadata_file method and readme_file.get_portal_metadata_file_content for the portal-wide metadata readme file generation * (edit) call ComputedFile::get_portal_metadata_file in create-portal-metadata management command and update its test * (edit) add the portal metadata file generation workflow to ComputedFile::get_portal_metadata_file * (edit) adjust the create portal metadata command and its test (add test for metadata file, organize and split the logic into separate methods for readbility) * (edit) check against queryset objects count rather than IDs for projects, samples, and libraries * (minor) fix a typo and remove comments * add common.GENERATED_PORTAL_METADATA_DOWNLOAD_CONFIG for the portal metadata download and adjust create_portal_metadata management command, change suffix CONFIGURATIONS to CONFIG for the download config and adjust the codebase accordingly * (edit) edit readme_file.get_file_contents temporarily(and remove get_portal_metadata_file_content - no longer needed), add a new model field portal_metadata_only and update ComputedFile::get_portal_metadata_file to return the instance of the computed file, and adjust test_create_portal_metadata * (edit) check the portal_metadata_only key to swap config and template contexts values (remove the Project instance check) and adjust the 
codebase accordingly * (edit) add a type hint for the queryset parameter * (edit) store the portal metadata computed file to a local variable computed_file in the create_portal_metadata management command, and temporarily add noqa to supress Flake8 unused variable warning (with TODO comment) * (edit) add the static method ComputedFile::get_local_portal_metadata_path and remove OUTPUT_* constants from common and adjust the codebase * (edit) add save/upload logic to the create portal metadata management command for the database/s3 bucket respectively * (fix) fixed the readme output * (edit) adjust readme_file.get_file_contents using new readme template structures, and make common.GENERATED_PORTAL_METADATA_DOWNLOAD_CONFIG from a list to a dict * (edit) remove the test_zip_file and test_readme_file methods and assert directly from test_create_portal_metadata, and change the readme assert value and remove TODO comments * (bug) remove the extra context manager for readme check * (edit) remove the test_metadata_file method and directly assert the metadata file in test_create_portal_metadata, and use common.METADATA_COLUMN_SORT_ORDER as the value of expected_keys instead of hand-coded value * (minor) move comments and variable for zip assertion inside the context manager * (edit) use the csv module's DictReader for the metadata.tsv file assertion * (edit) remove TODO comment and move the body of test_computed_file inside test_create_portal_metadata for the computed file assertion * (minor) remove the LOCAL_ZIP_FILE_PATH variable (no need - since the test no longer has individual methods) * (typo) fix a typo * (minor) remove a comment in computed_file, instead it wil be included in the PR * (edit) add an arg for upload_s3 and define the constans for args' default value to the create_portal_metadata management command, and add constants for default values and rename 'zip' to 'zip_file' in the context mamanger for testing zip file in the test_create_portal_metadata * (rename) 
append the suffix '_FILE' to the constants README and METADATA * (rename) rename common.GENERATED_SAMPLE_DOWNLOAD_CONFIGS (to plural) and adjust codebase accordingly * (edit) refactor download_config handling for portal-metadata (revert from dict to list to perform the same check (in) as the project's config, remove the variable, add a QuerySet check to determine whether 'projects' should be passed as when dealing with a single project * (clean up) remove the constant added previously(no longer used) * (edit) use list literals to add a check that matches the project download config check * add a new static method ComputedFile::get_local_file_path (remove Computed::get_local_portal_metadata_path), add the 'zip_file_path' variable in ComputedFile::get_portal_metadata_file, and adjust test_create_portal_metadata * (migration) merge migration files to resolve multiple leaf nodes * (edit) use s3.upload_output_file (remove computed_file.upload_s3_file - no longer used), fix verb from 'upload' to 'update' and adjust the codebase, remove constants (CLEAN_UP_OUTPUT_DATA and UPLOAD_S3 - no longer used) in create_portal_metadata management command, and minor adjustment in its test * (minor) remove the constant 'ENCODING' to match the implementation of PR 389 (the whitespace removal from the readme files) * (TODO comment) add a TODO comment to indicate that once PR #839 is merged into 'dev' branch, readme file format testing will be added, and 'assertProjectReadmeContains' will be removed from test_create_portal_metadata * (edit) add the command 'configure_aws_cli' to resolve the duplicated load data call in test_create_portal_metadata * (TENP) temporaily skip isort to modify the import order of the management commands (to prevent tests to run in parallel) * (edit) add mock for s3.upload_output_file used in create_portal_metadata * (migration) undo the merged migration files * (fix) run pre-commit and migrate * (adjust) rollblack previous migration and re-migrate the 
portal_metadata_only field for ComputedFile model * (minor) re-locate the logger message and fix typos in create_portal_metadata management command file * (minor) make another revision of the comments * (remove TEMP) remove temporarily added codeblocks * (edit) access kwargs props using square brackets instead of the get method * (edit) use the get method to give the default value in if condition to prevent an error when the arg is not passed when calling the portal metadata management command * (minor) add a check for adding logging handler to make sure there is no duplicate logging * (minor) add a comment and remove the handler var (not needed) * (minor) add a comment for the handler check * (edit) rename GENERATED_PROJECT_DOWNLOAD_CONFIG to GENERATED_PROJECT_DOWNLOAD_CONFIGS, and replace 'in' with 'is' in the readme_file module for the portal metadata config test * (edit) use the config.logging.get_and_configure_logger for the log message (and remove the logging module) from the create_portal_metadata management command, adjust test_create_portal_metadata to explicitly check each file with assertIn (no for loop and remove the expected_files variable), use csv and TextIOWrapper to open TSV file to check the content, and assert expected_keys using set (no longer using list), and add check for library ids * Merge 'feature/portal-metadata-command' branch into nozomione/797-generate-readme-file-zip-2 * (edit) add Iterable type hint instead of QuerySet in readme_file.get_file_contents, and pass the list as arguments for Project and Sample when calling readme_file.get_file_contents in computed_file * (edit) pass the argument 'computed_file.s3_bucket' to 's3.upload_output_file' (new change from dev), add the new parameters ('clean_up_output_data', 'update_s3') with type hints to the 'create_portal_metadata' method signature in the create portal metadata command, and adjust its test * (edit) clean up output data regardless of computed file existence and enhance the help
string for the create_portal_metadata management command, add the assertion for computed file count (for singularity) and fields (values match the portal metadata download configuration) in test_create_portal_metadata, update ComputedFile::get_portal_metadata_file to explicitly set field values defined in the download config when instantiating a computed file for the portal metadata * (fix) adjust the computed file byte size in test_create_portal_metadata * (edit) use walrus operator for computed_file assignment and save computed_file object after the successful s3 file upload in create_portal_metadata * (edit) add assertEqualWithVariance (which checks computed_file file size within the specified range) and assertFields (which checks the computed file field values against download configurations) to test_create_portal_metadata * (edit) remove data type checks for the field values and instead, perform equality check in assertFields * 815 - Ensure only one portal metadata computed file exists at a time (#859) * create the management command file for the portal-wide metadata and register it to sportal * create the test file for the portal-wide metadata management command test_create_portal_metadata * (edit) rename the setup_database method to load_test_data, remove 'project_id' kwarg when calling the load_data command, check PROJECT_COUNT, SAMPLES_COUNT, and LIBRARIES_COUNT by querying libraries metadata from the Library model to verify database setup instead of checking an explicit list of project ids * add ComputedFile::get_portal_metadata_file method and readme_file.get_portal_metadata_file_content for the portal-wide metadata readme file generation * (edit) call ComputedFile::get_portal_metadata_file in create-portal-metadata management command and update its test * (edit) add the portal metadata file generation workflow to ComputedFile::get_portal_metadata_file * (edit) adjust the create portal metadata command and its test (add test for metadata file, organize
and split the logic into separate methods for readbility) * (edit) check against queryset objects count rather than IDs for projects, samples, and libraries * (minor) fix a typo and remove comments * add common.GENERATED_PORTAL_METADATA_DOWNLOAD_CONFIG for the portal metadata download and adjust create_portal_metadata management command, change suffix CONFIGURATIONS to CONFIG for the download config and adjust the codebase accordingly * (edit) edit readme_file.get_file_contents temporarily(and remove get_portal_metadata_file_content - no longer needed), add a new model field portal_metadata_only and update ComputedFile::get_portal_metadata_file to return the instance of the computed file, and adjust test_create_portal_metadata * (edit) check the portal_metadata_only key to swap config and template contexts values (remove the Project instance check) and adjust the codebase accordingly * (edit) add a type hint for the queryset parameter * (edit) store the portal metadata computed file to a local variable computed_file in the create_portal_metadata management command, and temporarily add noqa to supress Flake8 unused variable warning (with TODO comment) * (edit) add the static method ComputedFile::get_local_portal_metadata_path and remove OUTPUT_* constants from common and adjust the codebase * (edit) add save/upload logic to the create portal metadata management command for the database/s3 bucket respectively * (fix) fixed the readme output * (edit) adjust readme_file.get_file_contents using new readme template structures, and make common.GENERATED_PORTAL_METADATA_DOWNLOAD_CONFIG from a list to a dict * (edit) remove the test_zip_file and test_readme_file methods and assert directly from test_create_portal_metadata, and change the readme assert value and remove TODO comments * (bug) remove the extra context manager for readme check * (edit) remove the test_metadata_file method and directly assert the metadata file in test_create_portal_metadata, and use 
common.METADATA_COLUMN_SORT_ORDER as the value of expected_keys instead of hand-coded value * (minor) move comments and variable for zip assertion inside the context manager * (edit) use the csv module's DictReader for the metadata.tsv file assertion * (edit) remove TODO comment and move the body of test_computed_file inside test_create_portal_metadata for the computed file assertion * (minor) remove the LOCAL_ZIP_FILE_PATH variable (no need - since the test no longer has individual methods) * (typo) fix a typo * (minor) remove a comment in computed_file, instead it wil be included in the PR * (edit) add an arg for upload_s3 and define the constans for args' default value to the create_portal_metadata management command, and add constants for default values and rename 'zip' to 'zip_file' in the context mamanger for testing zip file in the test_create_portal_metadata * (rename) append the suffix '_FILE' to the constants README and METADATA * (rename) rename common.GENERATED_SAMPLE_DOWNLOAD_CONFIGS (to plural) and adjust codebase accordingly * (edit) refactor download_config handling for portal-metadata (revert from dict to list to perform the same check (in) as the project's config, remove the variable, add a QuerySet check to determine whether 'projects' should be passed as when dealing with a single project * (clean up) remove the constant added previously(no longer used) * (edit) use list literals to add a check that matches the project download config check * add a new static method ComputedFile::get_local_file_path (remove Computed::get_local_portal_metadata_path), add the 'zip_file_path' variable in ComputedFile::get_portal_metadata_file, and adjust test_create_portal_metadata * (migration) merge migration files to resolve multiple leaf nodes * (edit) use s3.upload_output_file (remove computed_file.upload_s3_file - no longer used), fix verb from 'upload' to 'update' and adjust the codebase, remove constants (CLEAN_UP_OUTPUT_DATA and UPLOAD_S3 - no longer used) 
in create_portal_metadata management command, and minor adjustment in its test * (minor) remove the constant 'ENCODING' to match the implementation of PR 389 (the whitespace removal from the readme files) * (TODO comment) add a TODO comment to indicate that once PR #839 is merged into 'dev' branch, readme file format testing will be added, and 'assertProjectReadmeContains' will be removed from test_create_portal_metadata * (edit) add the command 'configure_aws_cli' to resolve the duplicated load data call in test_create_portal_metadata * (TENP) temporaily skip isort to modify the import order of the management commands (to prevent tests to run in parallel) * (edit) add mock for s3.upload_output_file used in create_portal_metadata * (migration) undo the merged migration files * (fix) run pre-commit and migrate * (adjust) rollblack previous migration and re-migrate the portal_metadata_only field for ComputedFile model * (mionr) re-locate the logger message and fix typos in create_portal_metadata management command file * (minor) make another revision of the comments * (remove TEMP) remove temporaily added codeblocks * add the purge function to the create_portal_metadata management command and add its test to test_create_portal_metadata * add the argument 'purge' and the keyword argument 'delete_from_s3' for purge_computed)file, and call it from the handle method * (edit) access kwargs props using square brackets instead of the get method * (edit) access kwargs props using square brackets instead of the get method * (edit) use the get method to give the default value in if condition to prevent an error when the args is not passed when calling the portal metadata management command * (edit) add fallback values to kwargs for purge in create_portal_metadata management command, add mock for upload_output_file to test_purge_computed_file * (minor) add a check for adding logging handler to make sure no duplicate loggings * (minor) add a comment and remove the handler var 
(not needed) * (minor) add a comment for the handler check * (minor) improve comments * (edit) rename GENERATED_PROJECT_DOWNLOAD_CONFIG to GENERATED_PROJECT_DOWNLOAD_CONFIGS, and replace iin with is in the readme_file module for the portal metadata config test * (edit) use the config.logging.get_and_configure_logger for the log message (and remove the logging module) in the create_portal_metadata management command, adjust test_create_portal_metadata to explicitly check each file with assertIn (no for loop, and remove the expected_files variable), use csv and TextIOWrapper to open the TSV file to check the content, assert expected_keys using a set (no longer using a list), and add a check for library ids * Merge 'feature/portal-metadata-command' branch into nozomione/797-generate-readme-file-zip-2 * (edit) add Iterable type hint instead of Queryset in readme_file.get_file_contents, and pass the list as arguments for Project and Sample when calling readme_file.get_file_contents in computed_file * (edit) pass the argument 'computed_file.s3_bucket' to 's3.upload_output_file' (new change from dev), add the new parameters ('clean_up_output_data', 'update_s3') with type hints to the 'create_portal_metadata' method signature in the create portal metadata command, and adjust its test * (edit) call 'purge_computed_file' from 'create_portal_metadata' in the 'create_portal_metadata' command and add a new test 'test_only_one_computed_file_at_any_point' (remove 'test_purge_computed_file') in 'test.create_portal_metadata' * (rename) rename the keyword arg from 'delete_from_s3' to 'update_s3' in the 'purge_computed_file' method in the 'create_portal_metadata' command * (edit) clean up output data regardless of computed file existence and enhance the help string for the create_portal_metadata management command, add the assertion for computed file count (for singularity) and fields (values match the portal metadata download configuration) in test_create_portal_metadata, update 
ComputedFile::get_portal_metadata_file to explicitly set field values defined in the download config when instantiating a computed file for the portal metadata * (edit) rename 'exsisting_computed_file' to 'old_computed_file' and optimize the 'if' check using the walrus operator, remove the 'if' check for computed file existence when purging and let it crash when no computed file exists (since we'll call the purge method only if the computed file exists) * (fix) adjust the computed file byte size in test_create_portal_metadata * (edit) use walrus operator for computed_file assignment and save computed_file object after the successful s3 file upload in create_portal_metadata * (edit) add assertEqualWithVariance (which checks that the computed_file file size is within the specified range) and assertFields (which checks the computed file field values against download configurations) to test_create_portal_metadata * (edit) update test_only_one_computed_file_at_any_point to call create_portal_metadata twice without storing the return values when checking the singularity of the computed file object, query the newly saved computed file to ensure mock_delete_output_file can be called once with its field values, (minor) remove the first logger ('Purging the portal-wide metadata computed...') from the purge method in the create_portal_metadata management command --------- Co-authored-by: Avrohom Gottlieb <[email protected]> Co-authored-by: David Mejia <[email protected]> Co-authored-by: Avrohom Gottlieb <[email protected]>
* Merge dev updates into feature/portal-metadata-command (#807) * create readme_creation.py module, move all readme constants and ComputedFile::get_readme_from_download_config method to new file, update references in Project and ComputedFile models that call these constants * add readme_creation::create_readme_file, add TEMPLATE_PATHS data structure in readme_creation module * fix bug in readme_creation::create_readme_file * swap out readme implementations in ComputedFile::get_project_file and ComputedFile::get_sample_file * remove eager readme generation logic from Project model and no longer used constants in readme_creation module * update constant names and method name and signature in readme_creation module, propagate changes to caller methods in ComputedFile model * change file name from readme_creation to readme_file * update references to readme_file in ComputedFile in light of name change * remove leftover project keyword arg reference in ComputedFile --------- Co-authored-by: Avrohom Gottlieb <[email protected]> Co-authored-by: Avrohom Gottlieb <[email protected]> * Merge dev updates into feature/portal-metadata-command (#812) * create readme_creation.py module, move all readme constants and ComputedFile::get_readme_from_download_config method to new file, update references in Project and ComputedFile models that call these constants * add readme_creation::create_readme_file, add TEMPLATE_PATHS data structure in readme_creation module * fix bug in readme_creation::create_readme_file * swap out readme implementations in ComputedFile::get_project_file and ComputedFile::get_sample_file * remove eager readme generation logic from Project model and no longer used constants in readme_creation module * update constant names and method name and signature in readme_creation module, propagate changes to caller methods in ComputedFile model * change file name from readme_creation to readme_file * move MetadataFilenames class of constants from ComputedFile model to 
metadata_file module, move output metadata file naming logic from ComputedFile to metadata_file * replace writing of metadata to file with writing to buffer in ComputedFile::get_project|sample_file, update metadata_file.write_metadata_dicts to handle writing to buffer * update references to readme_file in ComputedFile in light of name change * remove leftover project keyword arg reference in ComputedFile * add new method metadata_file::get_file_contents, move buffer logic out of ComputedFile::get_project|sample_file into metadata_file::get_file_contents, rename metadata_file::get_metadata_file_name to metadata_file::get_file_name * improve readability of metadata_file::get_file_name * update metadata_file::get_file_contents signature to accept list of metadata dicts as opposed to list of libraries * update tests in test_metadata_file to handle metadata_file::get_file_contents and not metadata_file::write_metadata_dicts * remove metadata_file::write_metadata_dicts --------- Co-authored-by: Avrohom Gottlieb <[email protected]> Co-authored-by: Avrohom Gottlieb <[email protected]> * 797 - Scaffolding the portal-wide metadata management command and test files (#804) * create the management command file for the portal-wide metadata and register it to sportal * create the test file for the portal-wide metadata management command test_create_portal_metadata * (edit) rename the setup_database method to load_test_data, remove 'project_id' kwarg when calling the load_data command, check PROJECT_COUNT, SAMPLES_COUNT, and LIBRARIES_COUNT by querying libraries metadata from the Library model to verify database setup instead of checking an explicit list of project ids * (edit) check against queryset objects count rather than IDs for projects, samples, and libraries * (minor) fix a typo and remove comments * Merge dev updates into feature/portal-metadata-command (#835) * docker-compose to docker compose * remove docker-compose in GHA --------- Co-authored-by: David Mejia <[email 
protected]> * Merge dev updates into feature/portal-metadata-command (#857) * add s3 module * update references in management command, models and tests to point to functions in new s3 module, remove duplicate code * move utils::list_s3_paths to new s3 module, update references in other files * move TestListS3Paths class from test_utils to new test_s3 testing module * add generic s3::download_s3_files method, add s3::download_metadata_files and s3::download_data_files methods * fix list.extend bug in s3::download_s3_files * add s3::download_sample_data_files and s3::download_project_data_files methods, impove s3::download_metadata_files * swap s3::download_data for s3::download_metadata_files in load_data command, call s3::download_project_data_files in Project::create_computed_files, call s3::download_sample_data_files in Sample::create_computed_files, add Sample::has_downloaded_data property method * improve readability of Project::get_data_file_paths * delete s3::download_data and s3::download_data_files functions, update s3 module function docstrings * add s3::download_data_files * integrate s3::download_data_files inside of ComputedFile::get_project|sample_file, remove references to s3::download_sample_data_files inside of Sample::create_computed_files and s3::download_project_data_files inside of Project::create_computed_files * remove unused s3::download_project_data_files and s3::download_sample_data_files * remove unused property method Sample::has_downloaded_data * replace s3::download_data_files with s3::download_input_files, replace s3::download_metadata_files with s3::download_input_metadta, delete no longer used s3::download_s3_files, rename s3::list_s3_paths to s3::list_input_paths, and move function higher up in module * rename s3::delete_s3_file to s3::delete_output_file and improve function, rename s3::upload_s3_file to s3::upload_output_file and improve function * add s3::generate_pre_signed_link function * change reference to aws boto3 variable 
from s3 to aws_s3 in s3 module * update references to new s3 module function names in Project, Sample, Library, and ComputedFile models * update tests to reflect new s3 module function names * exchange s3::create_download_url for s3::generate_pre_signed_link in ComputedFile model and ComputedFile tests * delete no longer used s3::create_download_url * add configure_aws_cli django management command * add calls to configure_aws_cli management command for testing environmment, staging and prod environments, and add it to sportal * add logging to configure_aws_cli management command * delete unused s3::configure_aws_cli and call to it in load_data command * delete unused aws cli configuration options for load_data command * add age_timing attribute to Sample model, rename age_at_diagnosis attribute to age in Sample model * remove unused SAMPLE_METADATA_KEYS construct from metadata_file module and update metadata_file::load_samples_metadata accordingly * add migration for Sample model added and rename attributes * update references from age_at_diagnosis to age throughout code, add in new attribute age_timing along side age * fix random formatting bug in Sample::get_metadata, add missed age_timing attribute into expected_keys in test_load_data * update testing bucket to most recent bucket, adjust sample_cell_count_estimate assertion value in test_load_data::test_single_cell_metadata to account for change to filtered cells in input SCPCL999990_metadata.json file * add boto3 config to configure_aws_cli command * short term patch - move configure_aws_cli command call out of run_tests and into test_load_data * remove no longer used Force attribute from calls to s3::delete_output_file in Project::purge * pass s3_key to s3::delete_output_file in Project::purge and not entire ComputedFile object * update type hints in s3 module * explicitly destructure used kwargs in configure_aws_cli function in configure_aws_cli command * replace project_samples_mapping with 
load_data::project_has_s3_files helper function in load_data command * docker-compose to docker compose * remove docker-compose in GHA * remove Sample::AgeTiming(models.TextChoices), set age and age_timing Sample attributes to default=common.NA, add comment to test_load_data noting that sample_cell_count_estimate assertion in test_single_cell_metadata will likely fail during version changes * docker-compose to docker compose * remove docker-compose in GHA * swap migration files * adjust migration to rename age field and not to delete it and re-add it * update age and age_timing Sample attributes migration, make fields non-nullable without a default in Sample model * update ProjectSamplesTable component on client side to incorporate renaming of age_at_diagnosis Sample attribute to age and addition of new age_timing attribute * propogate attribute changes to api/resources/samples and related storybook mock data prop * add bool return values for s3::download_input_files, s3::download_input_metadata, s3::upload_output_file * change logging level in s3::list_input_paths from error to warning * update code clarity in ComputedFile::download_url * fix bug in in update_s3 clause in Project::on_get_project_file and Sample::on_get_sample_file, update return early value in s3::delete_output_file * match filename to docs * match filename to docs * update postgres minor version 12.14 -> 12.19 * db.t2.micro -> db.t3.micro * fix setting name AWS_S3_BUCKET_NAME * enforce latest RDS CA * update cert name: latest -> cert * set rds ca: rds-ca-rsa2048-g1 --------- Co-authored-by: Avrohom Gottlieb <[email protected]> Co-authored-by: David Mejia <[email protected]> Co-authored-by: Avrohom Gottlieb <[email protected]> * Rebase feature/portal-metadata-command with the latest dev (#877) * Merge dev updates into feature/consolidate-readmes (#850) * add s3 module * update references in management command, models and tests to point to functions in new s3 module, remove duplicate code * move 
utils::list_s3_paths to new s3 module, update references in other files * move TestListS3Paths class from test_utils to new test_s3 testing module * add generic s3::download_s3_files method, add s3::download_metadata_files and s3::download_data_files methods * fix list.extend bug in s3::download_s3_files * add s3::download_sample_data_files and s3::download_project_data_files methods, impove s3::download_metadata_files * swap s3::download_data for s3::download_metadata_files in load_data command, call s3::download_project_data_files in Project::create_computed_files, call s3::download_sample_data_files in Sample::create_computed_files, add Sample::has_downloaded_data property method * improve readability of Project::get_data_file_paths * delete s3::download_data and s3::download_data_files functions, update s3 module function docstrings * add s3::download_data_files * integrate s3::download_data_files inside of ComputedFile::get_project|sample_file, remove references to s3::download_sample_data_files inside of Sample::create_computed_files and s3::download_project_data_files inside of Project::create_computed_files * remove unused s3::download_project_data_files and s3::download_sample_data_files * remove unused property method Sample::has_downloaded_data * replace s3::download_data_files with s3::download_input_files, replace s3::download_metadata_files with s3::download_input_metadta, delete no longer used s3::download_s3_files, rename s3::list_s3_paths to s3::list_input_paths, and move function higher up in module * rename s3::delete_s3_file to s3::delete_output_file and improve function, rename s3::upload_s3_file to s3::upload_output_file and improve function * add s3::generate_pre_signed_link function * change reference to aws boto3 variable from s3 to aws_s3 in s3 module * update references to new s3 module function names in Project, Sample, Library, and ComputedFile models * update tests to reflect new s3 module function names * exchange 
s3::create_download_url for s3::generate_pre_signed_link in ComputedFile model and ComputedFile tests * delete no longer used s3::create_download_url * add configure_aws_cli django management command * add calls to configure_aws_cli management command for testing environmment, staging and prod environments, and add it to sportal * add logging to configure_aws_cli management command * delete unused s3::configure_aws_cli and call to it in load_data command * delete unused aws cli configuration options for load_data command * add age_timing attribute to Sample model, rename age_at_diagnosis attribute to age in Sample model * remove unused SAMPLE_METADATA_KEYS construct from metadata_file module and update metadata_file::load_samples_metadata accordingly * add migration for Sample model added and rename attributes * update references from age_at_diagnosis to age throughout code, add in new attribute age_timing along side age * fix random formatting bug in Sample::get_metadata, add missed age_timing attribute into expected_keys in test_load_data * update testing bucket to most recent bucket, adjust sample_cell_count_estimate assertion value in test_load_data::test_single_cell_metadata to account for change to filtered cells in input SCPCL999990_metadata.json file * add boto3 config to configure_aws_cli command * short term patch - move configure_aws_cli command call out of run_tests and into test_load_data * remove no longer used Force attribute from calls to s3::delete_output_file in Project::purge * pass s3_key to s3::delete_output_file in Project::purge and not entire ComputedFile object * update type hints in s3 module * explicitly destructure used kwargs in configure_aws_cli function in configure_aws_cli command * replace project_samples_mapping with load_data::project_has_s3_files helper function in load_data command * ensure sample exists before creating library * docker-compose to docker compose * remove docker-compose in GHA * docker-compose to docker compose * 
remove docker-compose in GHA * remove Sample::AgeTiming(models.TextChoices), set age and age_timing Sample attributes to default=common.NA, add comment to test_load_data noting that sample_cell_count_estimate assertion in test_single_cell_metadata will likely fail during version changes * docker-compose to docker compose * remove docker-compose in GHA * swap migration files * adjust migration to rename age field and not to delete it and re-add it * update age and age_timing Sample attributes migration, make fields non-nullable without a default in Sample model * update ProjectSamplesTable component on client side to incorporate renaming of age_at_diagnosis Sample attribute to age and addition of new age_timing attribute * propogate attribute changes to api/resources/samples and related storybook mock data prop * add bool return values for s3::download_input_files, s3::download_input_metadata, s3::upload_output_file * change logging level in s3::list_input_paths from error to warning * update code clarity in ComputedFile::download_url * fix bug in in update_s3 clause in Project::on_get_project_file and Sample::on_get_sample_file, update return early value in s3::delete_output_file * match filename to docs * match filename to docs * update postgres minor version 12.14 -> 12.19 * db.t2.micro -> db.t3.micro --------- Co-authored-by: Avrohom Gottlieb <[email protected]> Co-authored-by: David Mejia <[email protected]> Co-authored-by: Avrohom Gottlieb <[email protected]> * add input tests to test_s3 on function s3::list_input_paths * add output tests to test_s3 on function s3::list_input_paths, delete remaining unnecessary tests * clarify that recursive is called by default by adding words 'by_default' test_llist_input_paths_recursive_flag_passed test case in test_s3, update comments * fix possible end of s3 resource slash bug in s3::list_input_paths, clean up tests in test_s3 module * update s3 resource trailing slash appending to to occur only when recursive=False in 
s3::list_input_paths, update tests accordingly * 837 - Manage whitespace in generated readmes (#839) * ensure sample exists before creating library * docker-compose to docker compose * remove docker-compose in GHA * (edit) adjust readme template files to have no more than double newlines * add a new class method 'save_readme_files' which renames each zip file's README.md to the name of the zip file and saves it to the test_data/readmes folder, call this method from tearDownClass, add a constant ENCODING to replace the hard-coded 'utf-8', and update gitignore so that the api/test_data/readmes folder is not ignored * add a new instance method 'assertProjectReadmeContent' which compares the saved formatted readme file content in test_data/readmes with the README.md content of each zip file, and call it from 'test_readme_files' (previously 'save_readme_files') * (minor) remove no longer used constant * (edit) move the Local Data for back end rule to the bottom of the file to prevent, and track files in api/test_data/readmes * (edit) add api/test_data/readmes to exclude in pre-commit trailing-whitespace * (edit) add a boolean flag SAVE_README_OUTPUT that determines whether the readme files should be saved to api/readmes (readme files should be re-generated when their contents change for formatting testing; otherwise it is set to False by default) * (rename) append the suffix '_FILE' to the constant README * (edit) refactor test_load_data's class method 'test_readme_files' to exclude PROJECT ID prefixes when renaming README.md (to remove the redundancy check) and to only save the readme files if the combination (for each computed file type) doesn't exist, and make scope adjustments * (delete) remove previously generated readme files from test_data/readmes (no longer needed) * add the generated test_data/readmes/ files * (edit) modify the class method save_readme_files (previously test_readme_files) and regenerate test_data/readmes files * (remove) remove the class method save_readme_files (no 
longer required) and the constants ENCODING and SAVE_README_OUTPUT, and adjust codebase in test_load_data * (edit) adjust assertProjectReadmeContent to take zip_file as a positional arg and to replace project ID and date in readme contents with placeholder values for generic format testing using its inner method get_masked_content, and replace assertProjectReadmeContains (deleted) with assertProjectReadmeContent in each test * (edit) update assertProjectReadmeContent to use the splitlines method for content comparison, adjust get_updated_content (previously get_masked_content) to replace the placeholders with each test's project id and today's date, and adjust the test_data/readmes files and codebase * (minor) replace f-string with just project_id (not required) and add a bit more context to the comment for splitlines * (edit) add logging to assertProjectReadmeContent to track the names of the readme file and the current test method for easier debugging * (edit) add a custom message log (print the names of the test and readme file in test_data/readmes) only when a test fails * (edit) update 'assertProjectReadmeContent' to take a list of project IDs and use the string.replace method to replace the placeholders (no longer using regex) in test_load_data, and adjust the placeholders in test_data/readmes * (edit) update the assertion failure message for readme content comparison in assertProjectReadmeContent * (edit) remove the placeholders variable and directly replace the placeholder with project IDs in assertProjectReadmeContent in test_load_data --------- Co-authored-by: David Mejia <[email protected]> * (merge) resolve conflicts to merge feature/consolidate-readmes-into-dev * Revert "Fix conflict in test_load_data" * Revert "Revert "Fix conflict in test_load_data"" * (rebase) rebase feature/portal-metadata-command with origin dev * Merge dev updates into feature/portal-metadata-command (#812) * create readme_creation.py module, move all readme constants and 
ComputedFile::get_readme_from_download_config method to new file, update references in Project and ComputedFile models that call these constants * add readme_creation::create_readme_file, add TEMPLATE_PATHS data structure in readme_creation module * fix bug in readme_creation::create_readme_file * swap out readme implementations in ComputedFile::get_project_file and ComputedFile::get_sample_file * remove eager readme generation logic from Project model and no longer used constants in readme_creation module * update constant names and method name and signature in readme_creation module, propogated changes to caller methods in ComputedFile model * change file name from readme_creation to readme_file * move MetadaFilenames class of constants from ComputedFile model to metadata_file module, move output metadata file naming logic from ComputedFile to metadata_file * replace writing of metadata to file with writing to buffer in ComputedFile::get_project|sample_file, update metadata_file.write_metadata_dicts to handle writing to buffer * update references to readme_file in ComputedFile in light of name change * remove leftover project keyword arg reference in ComputedFile * add new method metadata_file::get_file_contents, move buffer logic out of ComputedFile::get_project|sample_file into metadata_file::get_file_contents, rename metadata_file::get_metadata_file_name to metadata_file::get_file_name * improve readability of metadata_file::get_file_name * update metadata_file::get_file_contents signature to accept list of metadata dicts as opposed to list of libraries * update tests in test_metadata_file to handle metadata_file::get_file_contents and not metadata_file::write_metadata_dicts * remove metadata_file::write_metadata_dicts --------- Co-authored-by: Avrohom Gottlieb <[email protected]> Co-authored-by: Avrohom Gottlieb <[email protected]> * 797 - Scaffolding the porta-wide metadata management command and test files (#804) * create the management command file for the 
portal-wide metadata and register it to sportal * create the test file for the portal-wide matadata management command test_create_portal_metadata * (edit) rename the setup_database method to load_test_data, remove 'project_id' kwarg when calling the load_data command, check PROJECT_COUNT, SAMPLES_COUNT, and LIBRARIES_COUNT by quering libraries metadata from the Library model to verify database setup instead of checking an explicit list of project ids * (edit) check against queryset objects count rather than IDs for projects, samples, and libraries * (minor) fix a typo and remove comments * Merge dev updates into feature/portal-metadata-command (#857) * add s3 module * update references in management command, models and tests to point to functions in new s3 module, remove duplicate code * move utils::list_s3_paths to new s3 module, update references in other files * move TestListS3Paths class from test_utils to new test_s3 testing module * add generic s3::download_s3_files method, add s3::download_metadata_files and s3::download_data_files methods * fix list.extend bug in s3::download_s3_files * add s3::download_sample_data_files and s3::download_project_data_files methods, impove s3::download_metadata_files * swap s3::download_data for s3::download_metadata_files in load_data command, call s3::download_project_data_files in Project::create_computed_files, call s3::download_sample_data_files in Sample::create_computed_files, add Sample::has_downloaded_data property method * improve readability of Project::get_data_file_paths * delete s3::download_data and s3::download_data_files functions, update s3 module function docstrings * add s3::download_data_files * integrate s3::download_data_files inside of ComputedFile::get_project|sample_file, remove references to s3::download_sample_data_files inside of Sample::create_computed_files and s3::download_project_data_files inside of Project::create_computed_files * remove unused s3::download_project_data_files and 
s3::download_sample_data_files * remove unused property method Sample::has_downloaded_data * replace s3::download_data_files with s3::download_input_files, replace s3::download_metadata_files with s3::download_input_metadta, delete no longer used s3::download_s3_files, rename s3::list_s3_paths to s3::list_input_paths, and move function higher up in module * rename s3::delete_s3_file to s3::delete_output_file and improve function, rename s3::upload_s3_file to s3::upload_output_file and improve function * add s3::generate_pre_signed_link function * change reference to aws boto3 variable from s3 to aws_s3 in s3 module * update references to new s3 module function names in Project, Sample, Library, and ComputedFile models * update tests to reflect new s3 module function names * exchange s3::create_download_url for s3::generate_pre_signed_link in ComputedFile model and ComputedFile tests * delete no longer used s3::create_download_url * add configure_aws_cli django management command * add calls to configure_aws_cli management command for testing environmment, staging and prod environments, and add it to sportal * add logging to configure_aws_cli management command * delete unused s3::configure_aws_cli and call to it in load_data command * delete unused aws cli configuration options for load_data command * add age_timing attribute to Sample model, rename age_at_diagnosis attribute to age in Sample model * remove unused SAMPLE_METADATA_KEYS construct from metadata_file module and update metadata_file::load_samples_metadata accordingly * add migration for Sample model added and rename attributes * update references from age_at_diagnosis to age throughout code, add in new attribute age_timing along side age * fix random formatting bug in Sample::get_metadata, add missed age_timing attribute into expected_keys in test_load_data * update testing bucket to most recent bucket, adjust sample_cell_count_estimate assertion value in test_load_data::test_single_cell_metadata to 
account for change to filtered cells in input SCPCL999990_metadata.json file * add boto3 config to configure_aws_cli command * short term patch - move configure_aws_cli command call out of run_tests and into test_load_data * remove no longer used Force attribute from calls to s3::delete_output_file in Project::purge * pass s3_key to s3::delete_output_file in Project::purge and not entire ComputedFile object * update type hints in s3 module * explicitly destructure used kwargs in configure_aws_cli function in configure_aws_cli command * replace project_samples_mapping with load_data::project_has_s3_files helper function in load_data command * docker-compose to docker compose * remove docker-compose in GHA * remove Sample::AgeTiming(models.TextChoices), set age and age_timing Sample attributes to default=common.NA, add comment to test_load_data noting that sample_cell_count_estimate assertion in test_single_cell_metadata will likely fail during version changes * docker-compose to docker compose * remove docker-compose in GHA * swap migration files * adjust migration to rename age field and not to delete it and re-add it * update age and age_timing Sample attributes migration, make fields non-nullable without a default in Sample model * update ProjectSamplesTable component on client side to incorporate renaming of age_at_diagnosis Sample attribute to age and addition of new age_timing attribute * propogate attribute changes to api/resources/samples and related storybook mock data prop * add bool return values for s3::download_input_files, s3::download_input_metadata, s3::upload_output_file * change logging level in s3::list_input_paths from error to warning * update code clarity in ComputedFile::download_url * fix bug in in update_s3 clause in Project::on_get_project_file and Sample::on_get_sample_file, update return early value in s3::delete_output_file * match filename to docs * match filename to docs * update postgres minor version 12.14 -> 12.19 * db.t2.micro -> 
db.t3.micro * fix setting name AWS_S3_BUCKET_NAME * enforce latest RDS CA * update cert name: latest -> cert * set rds ca: rds-ca-rsa2048-g1 --------- Co-authored-by: Avrohom Gottlieb <[email protected]> Co-authored-by: David Mejia <[email protected]> Co-authored-by: Avrohom Gottlieb <[email protected]> --------- Co-authored-by: Avrohom Gottlieb <[email protected]> Co-authored-by: David Mejia <[email protected]> Co-authored-by: Avrohom Gottlieb <[email protected]> * 797 - Generate and add `README.md` to a zip file (#808) * create the management command file for the portal-wide metadata and register it to sportal * create the test file for the portal-wide matadata management command test_create_portal_metadata * (edit) rename the setup_database method to load_test_data, remove 'project_id' kwarg when calling the load_data command, check PROJECT_COUNT, SAMPLES_COUNT, and LIBRARIES_COUNT by quering libraries metadata from the Library model to verify database setup instead of checking an explicit list of project ids * add ComputedFile::get_portal_metadata_file method and readme_file.get_portal_metadata_file_content for the portal-wide metadata readme file generation * (edit) call ComputedFile::get_portal_metadata_file in create-portal-metadata management command and update its test * (edit) check against queryset objects count rather than IDs for projects, samples, and libraries * (minor) fix a typo and remove comments * add common.GENERATED_PORTAL_METADATA_DOWNLOAD_CONFIG for the portal metadata download and adjust create_portal_metadata management command, change suffix CONFIGURATIONS to CONFIG for the download config and adjust the codebase accordingly * (edit) edit readme_file.get_file_contents temporarily(and remove get_portal_metadata_file_content - no longer needed), add a new model field portal_metadata_only and update ComputedFile::get_portal_metadata_file to return the instance of the computed file, and adjust test_create_portal_metadata * (edit) check the 
portal_metadata_only key to swap config and template context values (remove the Project instance check) and adjust the codebase accordingly * (edit) add a type hint for the queryset parameter * (edit) store the portal metadata computed file to a local variable computed_file in the create_portal_metadata management command, and temporarily add noqa to suppress Flake8 unused variable warning (with TODO comment) * (edit) adjust readme_file.get_file_contents using new readme template structures, and change common.GENERATED_PORTAL_METADATA_DOWNLOAD_CONFIG from a list to a dict * (edit) remove the test_zip_file and test_readme_file methods and assert directly from test_create_portal_metadata, and change the readme assert value and remove TODO comments * (bug) remove the extra context manager for readme check * (minor) move comments and variable for zip assertion inside the context manager * (rename) rename common.GENERATED_SAMPLE_DOWNLOAD_CONFIGS (to plural) and adjust codebase accordingly * (edit) refactor download_config handling for portal-metadata (revert from dict to list to perform the same check (in) as the project's config, remove the variable, add a QuerySet check to determine whether 'projects' should be passed as when dealing with a single project) * (clean up) remove the constant added previously (no longer used) * (edit) use list literals to add a check that matches the project download config check * (adjust) roll back previous migration and re-migrate the portal_metadata_only field for ComputedFile model * (edit) rename GENERATED_PROJECT_DOWNLOAD_CONFIG to GENERATED_PROJECT_DOWNLOAD_CONFIGS, and replace iin with is in the readme_file module for the portal metadata config test * Merge 'feature/portal-metadata-command' branch into nozomione/797-generate-readme-file-zip-2 * (edit) add Iterable type hint instead of Queryset in readme_file.get_file_contents, and pass the list as arguments for Project and Sample when calling readme_file.get_file_contents in 
computed_file * 797 - Generate and add `metadata.tsv` to a zip file (#810) * create the management command file for the portal-wide metadata and register it to sportal * create the test file for the portal-wide metadata management command test_create_portal_metadata * (edit) rename the setup_database method to load_test_data, remove 'project_id' kwarg when calling the load_data command, check PROJECT_COUNT, SAMPLES_COUNT, and LIBRARIES_COUNT by querying libraries metadata from the Library model to verify database setup instead of checking an explicit list of project ids * add ComputedFile::get_portal_metadata_file method and readme_file.get_portal_metadata_file_content for the portal-wide metadata readme file generation * (edit) call ComputedFile::get_portal_metadata_file in create-portal-metadata management command and update its test * (edit) add the portal metadata file generation workflow to ComputedFile::get_portal_metadata_file * (edit) adjust the create portal metadata command and its test (add test for metadata file, organize and split the logic into separate methods for readability) * (edit) check against queryset objects count rather than IDs for projects, samples, and libraries * (minor) fix a typo and remove comments * add common.GENERATED_PORTAL_METADATA_DOWNLOAD_CONFIG for the portal metadata download and adjust create_portal_metadata management command, change suffix CONFIGURATIONS to CONFIG for the download config and adjust the codebase accordingly * (edit) edit readme_file.get_file_contents temporarily (and remove get_portal_metadata_file_content - no longer needed), add a new model field portal_metadata_only and update ComputedFile::get_portal_metadata_file to return the instance of the computed file, and adjust test_create_portal_metadata * (edit) check the portal_metadata_only key to swap config and template context values (remove the Project instance check) and adjust the codebase accordingly * (edit) add a type hint for the queryset parameter * 
(edit) store the portal metadata computed file to a local variable computed_file in the create_portal_metadata management command, and temporarily add noqa to supress Flake8 unused variable warning (with TODO comment) * (edit) add the static method ComputedFile::get_local_portal_metadata_path and remove OUTPUT_* constants from common and adjust the codebase * (edit) adjust readme_file.get_file_contents using new readme template structures, and make common.GENERATED_PORTAL_METADATA_DOWNLOAD_CONFIG from a list to a dict * (edit) remove the test_zip_file and test_readme_file methods and assert directly from test_create_portal_metadata, and change the readme assert value and remove TODO comments * (bug) remove the extra context manager for readme check * (edit) remove the test_metadata_file method and directly assert the metadata file in test_create_portal_metadata, and use common.METADATA_COLUMN_SORT_ORDER as the value of expected_keys instead of hand-coded value * (minor) move comments and variable for zip assertion inside the context manager * (edit) use the csv module's DictReader for the metadata.tsv file assertion * (rename) rename common.GENERATED_SAMPLE_DOWNLOAD_CONFIGS (to plural) and adjust codebase accordingly * (edit) refactor download_config handling for portal-metadata (revert from dict to list to perform the same check (in) as the project's config, remove the variable, add a QuerySet check to determine whether 'projects' should be passed as when dealing with a single project * (clean up) remove the constant added previously(no longer used) * (edit) use list literals to add a check that matches the project download config check * add a new static method ComputedFile::get_local_file_path (remove Computed::get_local_portal_metadata_path), add the 'zip_file_path' variable in ComputedFile::get_portal_metadata_file, and adjust test_create_portal_metadata * (adjust) rollblack previous migration and re-migrate the portal_metadata_only field for ComputedFile 
model * (edit) rename GENERATED_PROJECT_DOWNLOAD_CONFIG to GENERATED_PROJECT_DOWNLOAD_CONFIGS, and replace iin with is in the readme_file module for the portal metadata config test * (edit) use the config.logging.get_and_configure_logger for the log message and remove the logging module) from the create_portal_metadata management command, adjust test_create_portal_metadata to explicitly check each file with assertIn (no for loop and remove the expected_files variable), use csv and TextIOWrapper to open TSV file to check the content, and assert expected_keys using set (no longer using list), and add check for library ids * Merge 'feature/portal-metadata-command' branch into nozomione/797-generate-readme-file-zip-2 * (edit) add Iterable type hint instead of Queryset in readme_file.get_file_contents, and pass the list as arguments for Project and Sample when calling readme_file.get_file_contents in computed_file * (pre-commit) remove the old unused redandant method for the portal metadata from computed_file model * 813 - Persist portal metadata computed file and upload after creation (#825) * create the management command file for the portal-wide metadata and register it to sportal * create the test file for the portal-wide matadata management command test_create_portal_metadata * (edit) rename the setup_database method to load_test_data, remove 'project_id' kwarg when calling the load_data command, check PROJECT_COUNT, SAMPLES_COUNT, and LIBRARIES_COUNT by quering libraries metadata from the Library model to verify database setup instead of checking an explicit list of project ids * add ComputedFile::get_portal_metadata_file method and readme_file.get_portal_metadata_file_content for the portal-wide metadata readme file generation * (edit) call ComputedFile::get_portal_metadata_file in create-portal-metadata management command and update its test * (edit) add the portal metadata file generation workflow to ComputedFile::get_portal_metadata_file * (edit) adjust the 
create portal metadata command and its test (add test for metadata file, organize and split the logic into separate methods for readbility) * (edit) check against queryset objects count rather than IDs for projects, samples, and libraries * (minor) fix a typo and remove comments * add common.GENERATED_PORTAL_METADATA_DOWNLOAD_CONFIG for the portal metadata download and adjust create_portal_metadata management command, change suffix CONFIGURATIONS to CONFIG for the download config and adjust the codebase accordingly * (edit) edit readme_file.get_file_contents temporarily(and remove get_portal_metadata_file_content - no longer needed), add a new model field portal_metadata_only and update ComputedFile::get_portal_metadata_file to return the instance of the computed file, and adjust test_create_portal_metadata * (edit) check the portal_metadata_only key to swap config and template contexts values (remove the Project instance check) and adjust the codebase accordingly * (edit) add a type hint for the queryset parameter * (edit) store the portal metadata computed file to a local variable computed_file in the create_portal_metadata management command, and temporarily add noqa to supress Flake8 unused variable warning (with TODO comment) * (edit) add the static method ComputedFile::get_local_portal_metadata_path and remove OUTPUT_* constants from common and adjust the codebase * (edit) add save/upload logic to the create portal metadata management command for the database/s3 bucket respectively * (fix) fixed the readme output * (edit) adjust readme_file.get_file_contents using new readme template structures, and make common.GENERATED_PORTAL_METADATA_DOWNLOAD_CONFIG from a list to a dict * (edit) remove the test_zip_file and test_readme_file methods and assert directly from test_create_portal_metadata, and change the readme assert value and remove TODO comments * (bug) remove the extra context manager for readme check * (edit) remove the test_metadata_file method and 
directly assert the metadata file in test_create_portal_metadata, and use common.METADATA_COLUMN_SORT_ORDER as the value of expected_keys instead of hand-coded value * (minor) move comments and variable for zip assertion inside the context manager * (edit) use the csv module's DictReader for the metadata.tsv file assertion * (edit) remove TODO comment and move the body of test_computed_file inside test_create_portal_metadata for the computed file assertion * (minor) remove the LOCAL_ZIP_FILE_PATH variable (no need - since the test no longer has individual methods) * (typo) fix a typo * (minor) remove a comment in computed_file, instead it wil be included in the PR * (edit) add an arg for upload_s3 and define the constans for args' default value to the create_portal_metadata management command, and add constants for default values and rename 'zip' to 'zip_file' in the context mamanger for testing zip file in the test_create_portal_metadata * (rename) append the suffix '_FILE' to the constants README and METADATA * (rename) rename common.GENERATED_SAMPLE_DOWNLOAD_CONFIGS (to plural) and adjust codebase accordingly * (edit) refactor download_config handling for portal-metadata (revert from dict to list to perform the same check (in) as the project's config, remove the variable, add a QuerySet check to determine whether 'projects' should be passed as when dealing with a single project * (clean up) remove the constant added previously(no longer used) * (edit) use list literals to add a check that matches the project download config check * add a new static method ComputedFile::get_local_file_path (remove Computed::get_local_portal_metadata_path), add the 'zip_file_path' variable in ComputedFile::get_portal_metadata_file, and adjust test_create_portal_metadata * (migration) merge migration files to resolve multiple leaf nodes * (edit) use s3.upload_output_file (remove computed_file.upload_s3_file - no longer used), fix verb from 'upload' to 'update' and adjust the 
codebase, remove constants (CLEAN_UP_OUTPUT_DATA and UPLOAD_S3 - no longer used) in create_portal_metadata management command, and minor adjustment in its test * (minor) remove the constant 'ENCODING' to match the implementation of PR 389 (the whitespace removal from the readme files) * (TODO comment) add a TODO comment to indicate that once PR #839 is merged into 'dev' branch, readme file format testing will be added, and 'assertProjectReadmeContains' will be removed from test_create_portal_metadata * (edit) add the command 'configure_aws_cli' to resolve the duplicated load data call in test_create_portal_metadata * (TENP) temporaily skip isort to modify the import order of the management commands (to prevent tests to run in parallel) * (edit) add mock for s3.upload_output_file used in create_portal_metadata * (migration) undo the merged migration files * (fix) run pre-commit and migrate * (adjust) rollblack previous migration and re-migrate the portal_metadata_only field for ComputedFile model * (mionr) re-locate the logger message and fix typos in create_portal_metadata management command file * (minor) make another revision of the comments * (remove TEMP) remove temporaily added codeblocks * (edit) access kwargs props using square brackets instead of the get method * (edit) use the get method to give the default value in if condition to prevent an error when the args is not passed when calling the portal metadata management command * (minor) add a check for adding logging handler to make sure no duplicate loggings * (minor) add a comment and remove the handler var (no needed) * (minor) add a comment for the handler check * (edit) rename GENERATED_PROJECT_DOWNLOAD_CONFIG to GENERATED_PROJECT_DOWNLOAD_CONFIGS, and replace iin with is in the readme_file module for the portal metadata config test * (edit) use the config.logging.get_and_configure_logger for the log message and remove the logging module) from the create_portal_metadata management command, adjust 
test_create_portal_metadata to explicitly check each file with assertIn (no for loop and remove the expected_files variable), use csv and TextIOWrapper to open TSV file to check the content, and assert expected_keys using set (no longer using list), and add check for library ids * Merge 'feature/portal-metadata-command' branch into nozomione/797-generate-readme-file-zip-2 * (edit) add Iterable type hint instead of Queryset in readme_file.get_file_contents, and pass the list as arguments for Project and Sample when calling readme_file.get_file_contents in computed_file * (edit) pass the argument 'computed_file.s3_bucket' to 's3.upload_outout_file' (new change from dev), add the new parameters ('clean_up_output_data’, 'update_s3') with type hints to the 'create_portal_metadata' method function signature in the create portal metadata command, and adjust its test * (edit) clean up output data regardless of computed file existence and enhance the help string for the create_portal_metadata management command, add the assertion for computed file count (for singularity) and fields (values match the portal metadata download configuration) in test_create_portal_metadata, update CopmutedFile::get_portal_metadata_file to explicitly set field values defined in the download config when instantiating a computed file for the portal metadata * (fix) adjust the computed file byte size in the test_create_portal)metadata * (edit) use walrus operator for computed_file assignment and save computed_file object after the successful s3 file upload in create_portal_metadata * (edit) add assertEqualWithVariance (which checks computed_file file size within the specified range) and assertFields (which checks the computed file field values against download configurations) to test_create_portal_metadata * (edit) remove data type checks for the field values and instad, perform equality check in assertFields * 815 - Ensure only one portal metadata computed file exists at a time (#859) * create the 
management command file for the portal-wide metadata and register it to sportal * create the test file for the portal-wide matadata management command test_create_portal_metadata * (edit) rename the setup_database method to load_test_data, remove 'project_id' kwarg when calling the load_data command, check PROJECT_COUNT, SAMPLES_COUNT, and LIBRARIES_COUNT by quering libraries metadata from the Library model to verify database setup instead of checking an explicit list of project ids * add ComputedFile::get_portal_metadata_file method and readme_file.get_portal_metadata_file_content for the portal-wide metadata readme file generation * (edit) call ComputedFile::get_portal_metadata_file in create-portal-metadata management command and update its test * (edit) add the portal metadata file generation workflow to ComputedFile::get_portal_metadata_file * (edit) adjust the create portal metadata command and its test (add test for metadata file, organize and split the logic into separate methods for readbility) * (edit) check against queryset objects count rather than IDs for projects, samples, and libraries * (minor) fix a typo and remove comments * add common.GENERATED_PORTAL_METADATA_DOWNLOAD_CONFIG for the portal metadata download and adjust create_portal_metadata management command, change suffix CONFIGURATIONS to CONFIG for the download config and adjust the codebase accordingly * (edit) edit readme_file.get_file_contents temporarily(and remove get_portal_metadata_file_content - no longer needed), add a new model field portal_metadata_only and update ComputedFile::get_portal_metadata_file to return the instance of the computed file, and adjust test_create_portal_metadata * (edit) check the portal_metadata_only key to swap config and template contexts values (remove the Project instance check) and adjust the codebase accordingly * (edit) add a type hint for the queryset parameter * (edit) store the portal metadata computed file to a local variable computed_file in the 
create_portal_metadata management command, and temporarily add noqa to supress Flake8 unused variable warning (with TODO comment) * (edit) add the static method ComputedFile::get_local_portal_metadata_path and remove OUTPUT_* constants from common and adjust the codebase * (edit) add save/upload logic to the create portal metadata management command for the database/s3 bucket respectively * (fix) fixed the readme output * (edit) adjust readme_file.get_file_contents using new readme template structures, and make common.GENERATED_PORTAL_METADATA_DOWNLOAD_CONFIG from a list to a dict * (edit) remove the test_zip_file and test_readme_file methods and assert directly from test_create_portal_metadata, and change the readme assert value and remove TODO comments * (bug) remove the extra context manager for readme check * (edit) remove the test_metadata_file method and directly assert the metadata file in test_create_portal_metadata, and use common.METADATA_COLUMN_SORT_ORDER as the value of expected_keys instead of hand-coded value * (minor) move comments and variable for zip assertion inside the context manager * (edit) use the csv module's DictReader for the metadata.tsv file assertion * (edit) remove TODO comment and move the body of test_computed_file inside test_create_portal_metadata for the computed file assertion * (minor) remove the LOCAL_ZIP_FILE_PATH variable (no need - since the test no longer has individual methods) * (typo) fix a typo * (minor) remove a comment in computed_file, instead it wil be included in the PR * (edit) add an arg for upload_s3 and define the constans for args' default value to the create_portal_metadata management command, and add constants for default values and rename 'zip' to 'zip_file' in the context mamanger for testing zip file in the test_create_portal_metadata * (rename) append the suffix '_FILE' to the constants README and METADATA * (rename) rename common.GENERATED_SAMPLE_DOWNLOAD_CONFIGS (to plural) and adjust codebase 
accordingly * (edit) refactor download_config handling for portal-metadata (revert from dict to list to perform the same check (in) as the project's config, remove the variable, add a QuerySet check to determine whether 'projects' should be passed as when dealing with a single project * (clean up) remove the constant added previously(no longer used) * (edit) use list literals to add a check that matches the project download config check * add a new static method ComputedFile::get_local_file_path (remove Computed::get_local_portal_metadata_path), add the 'zip_file_path' variable in ComputedFile::get_portal_metadata_file, and adjust test_create_portal_metadata * (migration) merge migration files to resolve multiple leaf nodes * (edit) use s3.upload_output_file (remove computed_file.upload_s3_file - no longer used), fix verb from 'upload' to 'update' and adjust the codebase, remove constants (CLEAN_UP_OUTPUT_DATA and UPLOAD_S3 - no longer used) in create_portal_metadata management command, and minor adjustment in its test * (minor) remove the constant 'ENCODING' to match the implementation of PR 389 (the whitespace removal from the readme files) * (TODO comment) add a TODO comment to indicate that once PR #839 is merged into 'dev' branch, readme file format testing will be added, and 'assertProjectReadmeContains' will be removed from test_create_portal_metadata * (edit) add the command 'configure_aws_cli' to resolve the duplicated load data call in test_create_portal_metadata * (TENP) temporaily skip isort to modify the import order of the management commands (to prevent tests to run in parallel) * (edit) add mock for s3.upload_output_file used in create_portal_metadata * (migration) undo the merged migration files * (fix) run pre-commit and migrate * (adjust) rollblack previous migration and re-migrate the portal_metadata_only field for ComputedFile model * (mionr) re-locate the logger message and fix typos in create_portal_metadata management command file * (minor) 
make another revision of the comments * (remove TEMP) remove temporaily added codeblocks * add the purge function to the create_portal_metadata management command and add its test to test_create_portal_metadata * add the argument 'purge' and the keyword argument 'delete_from_s3' for purge_computed)file, and call it from the handle method * (edit) access kwargs props using square brackets instead of the get method * (edit) access kwargs props using square brackets instead of the get method * (edit) use the get method to give the default value in if condition to prevent an error when the args is not passed when calling the portal metadata management command * (edit) add fallback values to kwargs for purge in create_portal_metadata management command, add mock for upload_output_file to test_purge_computed_file * (minor) add a check for adding logging handler to make sure no duplicate loggings * (minor) add a comment and remove the handler var (no needed) * (minor) add a comment for the handler check * (minor) improve comments * (edit) rename GENERATED_PROJECT_DOWNLOAD_CONFIG to GENERATED_PROJECT_DOWNLOAD_CONFIGS, and replace iin with is in the readme_file module for the portal metadata config test * (edit) use the config.logging.get_and_configure_logger for the log message and remove the logging module) from the create_portal_metadata management command, adjust test_create_portal_metadata to explicitly check each file with assertIn (no for loop and remove the expected_files variable), use csv and TextIOWrapper to open TSV file to check the content, and assert expected_keys using set (no longer using list), and add check for library ids * Merge 'feature/portal-metadata-command' branch into nozomione/797-generate-readme-file-zip-2 * (edit) add Iterable type hint instead of Queryset in readme_file.get_file_contents, and pass the list as arguments for Project and Sample when calling readme_file.get_file_contents in computed_file * (edit) pass the argument 
'computed_file.s3_bucket' to 's3.upload_outout_file' (new change from dev), add the new parameters ('clean_up_output_data’, 'update_s3') with type hints to the 'create_portal_metadata' method function signature in the create portal metadata command, and adjust its test * (edit) call 'purge_computed_file' from 'create_portal_metadata' in the 'create_portal_matadata' command and add a new test 'test_only_one_computed_file_at_any_point' (remove 'test_purge_computed_file') in 'test.create_portal_metadata' * (rename) rename the keyword arg from 'delete_from_s3' to 'update_s3 in the 'purge_computed_file' method in the 'cretate_portal_metadata' command' * (edit) clean up output data regardless of computed file existence and enhance the help string for the create_portal_metadata management command, add the assertion for computed file count (for singularity) and fields (values match the portal metadata download configuration) in test_create_portal_metadata, update CopmutedFile::get_portal_metadata_file to explicitly set field values defined in the download config when instantiating a computed file for the portal metadata * (edit) rename 'exsisting_computed_file' to 'old_computed_file' and optimize the 'if' check using the warlus operator, remove the 'if' check for copmuted file existence when purgingand let it crash when no computed file exists (since we'll call the purge method only if the computed file exists) * (fix) adjust the computed file byte size in the test_create_portal)metadata * (edit) use walrus operator for computed_file assignment and save computed_file object after the successful s3 file upload in create_portal_metadata * (edit) add assertEqualWithVariance (which checks computed_file file size within the specified range) and assertFields (which checks the computed file field values against download configurations) to test_create_portal_metadata * (edit) update test_only_one_computed_file_at_any_point to call twice create_portal_metadata without storing their 
values when checking the singularity of the computed file object, query the newly saved computed file to ensure mock_delete_output_file can be called one with its field values, (minor) remove the first logger ('Purging the portal-wide metadata computed...') from the purge method in create_portal_metadata management command * Prepare merge feature/portal-metadata-command into dev (#885) * Merge dev updates into feature/consolidate-readmes (#850) * add s3 module * update references in management command, models and tests to point to functions in new s3 module, remove duplicate code * move utils::list_s3_paths to new s3 module, update references in other files * move TestListS3Paths class from test_utils to new test_s3 testing module * add generic s3::download_s3_files method, add s3::download_metadata_files and s3::download_data_files methods * fix list.extend bug in s3::download_s3_files * add s3::download_sample_data_files and s3::download_project_data_files methods, impove s3::download_metadata_files * swap s3::download_data for s3::download_metadata_files in load_data command, call s3::download_project_data_files in Project::create_computed_files, call s3::download_sample_data_files in Sample::create_computed_files, add Sample::has_downloaded_data property method * improve readability of Project::get_data_file_paths * delete s3::download_data and s3::download_data_files functions, update s3 module function docstrings * add s3::download_data_files * integrate s3::download_data_files inside of ComputedFile::get_project|sample_file, remove references to s3::download_sample_data_files inside of Sample::create_computed_files and s3::download_project_data_files inside of Project::create_computed_files * remove unused s3::download_project_data_files and s3::download_sample_data_files * remove unused property method Sample::has_downloaded_data * replace s3::download_data_files with s3::download_input_files, replace s3::download_metadata_files with 
s3::download_input_metadta, delete no longer used s3::download_s3_files, rename s3::list_s3_paths to s3::list_input_paths, and move function higher up in module * rename s3::delete_s3_file to s3::delete_output_file and improve function, rename s3::upload_s3_file to s3::upload_output_file and improve function * add s3::generate_pre_signed_link function * change reference to aws boto3 variable from s3 to aws_s3 in s3 module * update references to new s3 module function names in Project, Sample, Library, and ComputedFile models * update tests to reflect new s3 module function names * exchange s3::create_download_url for s3::generate_pre_signed_link in ComputedFile model and ComputedFile tests * delete no longer used s3::create_download_url * add configure_aws_cli django management command * add calls to configure_aws_cli management command for testing environmment, staging and prod environments, and add it to sportal * add logging to configure_aws_cli management command * delete unused s3::configure_aws_cli and call to it in load_data command * delete unused aws cli configuration options for load_data command * add age_timing attribute to Sample model, rename age_at_diagnosis attribute to age in Sample model * remove unused SAMPLE_METADATA_KEYS construct from metadata_file module and update metadata_file::load_samples_metadata accordingly * add migration for Sample model added and rename attributes * update references from age_at_diagnosis to age throughout code, add in new attribute age_timing along side age * fix random formatting bug in Sample::get_metadata, add missed age_timing attribute into expected_keys in test_load_data * update testing bucket to most recent bucket, adjust sample_cell_count_estimate assertion value in test_load_data::test_single_cell_metadata to account for change to filtered cells in input SCPCL999990_metadata.json file * add boto3 config to configure_aws_cli command * short term patch - move configure_aws_cli command call out of run_tests 
and into test_load_data * remove no longer used Force attribute from calls to s3::delete_output_file in Project::purge * pass s3_key to s3::delete_output_file in Project::purge and not entire ComputedFile object * update type hints in s3 module * explicitly destructure used kwargs in configure_aws_cli function in configure_aws_cli command * replace project_samples_mapping with load_data::project_has_s3_files helper function in load_data command * ensure sample exists before creating library * docker-compose to docker compose * remove docker-compose in GHA * docker-compose to docker compose * remove docker-compose in GHA * remove Sample::AgeTiming(models.TextChoices), set age and age_timing Sample attributes to default=common.NA, add comment to test_load_data noting that sample_cell_count_estimate assertion in test_single_cell_metadata will likely fail during version changes * docker-compose to docker compose * remove docker-compose in GHA * swap migration files * adjust migration to rename age field and not to delete it and re-add it * update age and age_timing Sample attributes migration, make fields non-nullable without a default in Sample model * update ProjectSamplesTable component on client side to incorporate renaming of age_at_diagnosis Sample attribute to age and addition of new age_timing attribute * propogate attribute changes to api/resources/samples and related storybook mock data prop * add bool return values for s3::download_input_files, s3::download_input_metadata, s3::upload_output_file * change logging level in s3::list_input_paths from error to warning * update code clarity in ComputedFile::download_url * fix bug in in update_s3 clause in Project::on_get_project_file and Sample::on_get_sample_file, update return early value in s3::delete_output_file * match filename to docs * match filename to docs * update postgres minor version 12.14 -> 12.19 * db.t2.micro -> db.t3.micro --------- Co-authored-by: Avrohom Gottlieb <[email protected]> 
Co-authored-by: David Mejia <[email protected]> Co-authored-by: Avrohom Gottlieb <[email protected]> * add input tests to test_s3 on function s3::list_input_paths * add output tests to test_s3 on function s3::list_input_paths, delete remaining unnecessary tests * clarify that recursive is called by default by adding words 'by_default' test_llist_input_paths_recursive_flag_passed test case in test_s3, update comments * fix possible end of s3 resource slash bug in s3::list_input_paths, clean up tests in test_s3 module * update s3 resource trailing slash appending to to occur only when recursive=False in s3::list_input_paths, update tests accordingly * 837 - Manage whitespace in generated readmes (#839) * ensure sample exists before creating library * docker-compose to docker compose * remove docker-compose in GHA * (edit) adjust readme template files to have no more than double newlines * add a new class method ‘save_readme_files’ which rename each zip file’s README.md to the name of the zip file and save it to test_data/readmes folder and call this method from tearDownClass, add a constant ENCODING and replace hard-coded unicode 'utf-8', and add not to ignore api/test_data/readmes folder to gitignore * add a new instance method 'assertProjectReadmeContent' which compares the saved formatted readme file content in test_data/readmes with the README.md content of each zip files, and call it from 'test_readme_files'(previously 'save_readme_files') * (minor) remove no longer used constant * (edit) move the Local Data for beck end ru…
Issue Number
Closing #813
Feature branch:
feature/portal-metadata-command
Purpose/Implementation Notes
The following methods were added to the `create_portal_metadata` management command:
- `save` method to persist `computed_file` in the `ComputedFile` model's table
- `ComputedFile::upload_s3_file` to upload `portal_metadata.zip` to the S3 bucket

Adjusted the test:
- `test_create_portal_metadata`
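For reviewers, the persist-and-upload flow described above can be sketched roughly as follows. This is a minimal, framework-free illustration and not the code in this PR: the `ComputedFile` dataclass is a hypothetical stand-in for the Django model, `build_file` and `upload` are hypothetical callables standing in for `ComputedFile::get_portal_metadata_file` and the S3 upload helper, and saving only after a successful upload (as well as saving when `update_s3` is off) is an assumption drawn from the commit notes.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class ComputedFile:
    """Hypothetical stand-in for the Django ComputedFile model."""

    s3_bucket: str
    s3_key: str
    saved: bool = False

    def save(self) -> None:
        # In the real project this would be Django's Model.save().
        self.saved = True


def create_portal_metadata(
    build_file: Callable[[], Optional[ComputedFile]],
    upload: Callable[[str, str], bool],
    update_s3: bool = True,
) -> Optional[ComputedFile]:
    """Build the computed file, optionally upload it, then persist it."""
    if computed_file := build_file():
        if update_s3 and not upload(computed_file.s3_key, computed_file.s3_bucket):
            # Assumption: a failed upload should not leave a saved record
            # pointing at an object that is missing from the bucket.
            return None
        computed_file.save()
    return computed_file


# Usage with a fake uploader that records what it was asked to upload.
uploads: List[str] = []


def fake_upload(key: str, bucket: str) -> bool:
    uploads.append(f"{bucket}/{key}")
    return True


result = create_portal_metadata(
    lambda: ComputedFile("example-bucket", "portal_metadata.zip"), fake_upload
)
```

Passing the uploader in as a callable mirrors how the test mocks the S3 call, so the flow can be exercised without touching a real bucket.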
Types of changes
Functional tests
N/A
Checklist
Screenshots
N/A