Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

915 - Add generate computed file command #936

Merged

Conversation

avrohomgottlieb
Copy link
Contributor

@avrohomgottlieb avrohomgottlieb commented Oct 15, 2024

Issue Number

Closes #915 and #841

Purpose/Implementation Notes

This PR makes 3 additions

  • Adds a loader::generate_computed_file method
  • Adds a generate_computed_file management command, which will be used as the entrypoint with Batch jobs
  • Adds a dispatch_to_batch management command, responsible to iterating over all permutations of projects, samples and downloads_configs which need computed files generated for them

Types of changes

What types of changes does your code introduce?

  • Bugfix (non-breaking change which fixes an issue)
  • Refactor (addresses code organization and design mentioned in corresponding issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)

Functional tests

List out the functional tests you've completed to verify your changes work locally.

Checklist

  • Lint and unit tests pass locally with my changes

Screenshots

N/A


def _create_computed_file_callback(future, *, update_s3: bool, clean_up_output_data: bool) -> None:
"""
Wrap multiprocessing logic by grabbing computed file future and uploading it tohe s3.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sp

self,
project_id: str,
sample_id: str,
download_config: Dict,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be a string that references a dict in common.py

Comment on lines 185 to 198
if project:
computed_file = ComputedFile.get_project_file(
project, download_config, project.get_output_file_name(download_config)
)
_create_computed_file(computed_file, update_s3, clean_up_output_data=False)
elif sample:
computed_file = ComputedFile.get_sample_file(
sample,
download_config,
sample.get_output_file_name(download_config),
Lock(), # this should be removed when CF::get_sample_file is refactored
)
_create_computed_file(computed_file, update_s3, clean_up_output_data=False)
sample.project.update_downloadable_sample_count()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if project:
computed_file = ComputedFile.get_project_file(
project, download_config, project.get_output_file_name(download_config)
)
_create_computed_file(computed_file, update_s3, clean_up_output_data=False)
elif sample:
computed_file = ComputedFile.get_sample_file(
sample,
download_config,
sample.get_output_file_name(download_config),
Lock(), # this should be removed when CF::get_sample_file is refactored
)
_create_computed_file(computed_file, update_s3, clean_up_output_data=False)
sample.project.update_downloadable_sample_count()
if project and computed_file := ComputedFile.get_project_file(project, download_config):
_create_computed_file(computed_file, update_s3, clean_up_output_data=False)
if sample and computed_file := ComputedFile.get_sample_file(sample, download_config):
_create_computed_file(computed_file, update_s3, clean_up_output_data=False)
sample.project.update_downloadable_sample_count()

I think you can make Lock() the default value for a lock in get_sample_file. Otherwise rewriting it this way makes it a bit clearer.

@avrohomgottlieb avrohomgottlieb changed the base branch from dev to feature/batch October 16, 2024 13:30
Copy link
Contributor

@davidsmejia davidsmejia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks mostly correct, just some bugs / changes that we should discuss again before implementing.

projects = (
Project.objects.filter(project_computed_files__is_null=True)
if not project_id
else [Project.objects.filter(scpca_id=project_id).first()]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
else [Project.objects.filter(scpca_id=project_id).first()]
else Project.objects.filter(scpca_id=project_id)

if download_config_name not in common.PROJECT_DOWNLOAD_CONFIGS.keys():
logger.error(f"{download_config_name} is not a valid project download config name.")
logger.info(
f"Here are the correct project download_config names: "
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
f"Here are the correct project download_config names: "
f"Here are valid download_config_name values for projects: "

if download_config_name not in common.SAMPLE_DOWNLOAD_CONFIGS.keys():
logger.error(f"{download_config_name} is not a valid sample download config name.")
logger.info(
f"Here are the correct sample download_config names: "
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
f"Here are the correct sample download_config names: "
f"Here are valid download_config_name values for samples: "

f"Here are the correct sample download_config names: "
f"{common.SAMPLE_DOWNLOAD_CONFIGS.keys()}"
)
download_config = common.PROJECT_DOWNLOAD_CONFIGS[download_config_name]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
download_config = common.PROJECT_DOWNLOAD_CONFIGS[download_config_name]
download_config = common.SAMPLE_DOWNLOAD_CONFIGS[download_config_name]

@@ -205,9 +205,9 @@ def get_output_file_name(self, download_config: Dict) -> str:
def get_computed_file(self, download_config: Dict) -> ComputedFile:
"Return the project computed file that matches the passed download_config."
if download_config["metadata_only"]:
return self.computed_files.filter(metadata_only=True).first()
return self.project_computed_files.filter(metadata_only=True).first()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return self.project_computed_files.filter(metadata_only=True).first()
return self.computed_files.filter(metadata_only=True).first()

Let's keep using the property for now.


return self.computed_files.filter(
return self.project_computed_files.filter(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return self.project_computed_files.filter(
return self.computed_files.filter(

@@ -123,7 +123,7 @@ def get_metadata(self) -> Dict:

def get_computed_file(self, download_config: Dict) -> ComputedFile:
"Return the sample computed file that matches the passed download_config."
return self.computed_files.filter(
return self.sample_computed_files.filter(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return self.sample_computed_files.filter(
return self.computed_files.filter(

@avrohomgottlieb avrohomgottlieb changed the title 915 - Add generate computed files command 915 - Add generate computed file command Oct 29, 2024
@avrohomgottlieb avrohomgottlieb merged commit 5ab62f6 into feature/batch Oct 29, 2024
5 checks passed
@avrohomgottlieb avrohomgottlieb deleted the avrohom/915-add-generate-computed-files-command branch October 29, 2024 19:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add generate_computed_file Command Create dispatch_computed_file_generation command
2 participants