

Add worker to query classifier data to fit cognoml model #11

Open
jessept opened this issue Nov 12, 2016 · 4 comments

@jessept
Collaborator

jessept commented Nov 12, 2016

@awm33 I created this so we can track it going forward.

A given classifier task has a list of Entrez gene IDs (entrezids) and disease types. The worker code will query for any samples that match the list of disease types and join them to the mutations table. The result will not be in [sample_id, mutation_status] form, so the worker needs to transform it into that form and pass it to the cognoml code.
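A minimal sketch of that transformation in Python. It assumes the query returns two hypothetical JSON-style lists: samples with `sample_id` and `disease` fields, and mutation records with `sample_id` and `entrez_gene_id` fields. The helper names, the field names, and the rule that a sample counts as mutated (1) when it has a mutation in any of the task's genes are illustrative assumptions, not the actual worker or cognoml API.

```python
from typing import Dict, Iterable, List, Mapping


def to_mutation_status(
    samples: Iterable[Mapping],
    mutations: Iterable[Mapping],
    diseases: Iterable[str],
    entrez_ids: Iterable[int],
) -> Dict[str, int]:
    """Hypothetical worker helper: map each sample_id to 1 (mutated) or 0."""
    disease_set = set(diseases)
    gene_set = set(entrez_ids)
    # Every sample in the selected disease types starts as not mutated.
    status = {s["sample_id"]: 0 for s in samples if s["disease"] in disease_set}
    # Flag samples with a mutation in any of the task's genes (the join step).
    for m in mutations:
        if m["sample_id"] in status and m["entrez_gene_id"] in gene_set:
            status[m["sample_id"]] = 1
    return status


def to_pairs(status: Mapping[str, int]) -> List[list]:
    """Reshape the mapping into [sample_id, mutation_status] rows, sorted by sample_id."""
    return [[sample_id, status[sample_id]] for sample_id in sorted(status)]


# Example query results (made-up data; 7157 is the Entrez gene ID for TP53).
samples = [
    {"sample_id": "S1", "disease": "BRCA"},
    {"sample_id": "S2", "disease": "BRCA"},
    {"sample_id": "S3", "disease": "LUAD"},
]
mutations = [
    {"sample_id": "S2", "entrez_gene_id": 7157},
    {"sample_id": "S3", "entrez_gene_id": 7157},
]
print(to_pairs(to_mutation_status(samples, mutations, ["BRCA"], [7157])))
# [['S1', 0], ['S2', 1]]
```

S3 is dropped because LUAD was not among the requested disease types, which is the filtering step described above.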

@dhimmel
Member

dhimmel commented Nov 12, 2016

> The worker code will query for any samples that match the list of disease types and join that to the mutations table.

I'm not sure the worker code has to do this at all. The frontend will need access to this information for its realtime displays. Therefore, could we outsource or consolidate this computation in the frontend?

@awm33
Member

awm33 commented Nov 12, 2016

@dhimmel

> The frontend will need access to this information for their realtime displays.

What realtime displays?

> Therefore, we can potentially outsource/consolidate this computation to the frontend?

It's just reformatting some JSON results into [sample_id, mutation_status]. I don't think it will be computationally expensive.

If I can just pass something like [sample_id, mutation_status], a dict, or whatever single format is chosen, that's all that's needed. Also, the cognoml code should know to skip pulling the mutation statuses remotely, since they have already been passed in by the worker code.

@dhimmel
Member

dhimmel commented Nov 12, 2016

> If I can just pass something like [sample_id,mutation_status] or a dict, or whatever singular format

@awm33, so your main priority is to pass both the sample_id and mutation_status info in a single parameter? Your problem with the current design is that we force you to split this information into two parameters?

If I'm understanding correctly, I'd then favor a parameter called sample_to_mutation_status that consumes a dictionary. The dictionary is nice because it makes clear that the ordering is not important.
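A minimal sketch of how a dict-valued `sample_to_mutation_status` parameter could be consumed. The function name `split_mapping` and the 0/1 validation are hypothetical, not cognoml's actual code; the point is that sorting by sample_id makes the result independent of the dict's insertion order, since dict equality already ignores it.

```python
from typing import Dict, List, Tuple


def split_mapping(sample_to_mutation_status: Dict[str, int]) -> Tuple[List[str], List[int]]:
    """Hypothetical: turn the dict into aligned sample_id and label lists.

    Sorting by sample_id makes the output independent of insertion order.
    """
    # Reject anything that is not a binary mutation status.
    invalid = {k: v for k, v in sample_to_mutation_status.items() if v not in (0, 1)}
    if invalid:
        raise ValueError(f"mutation_status must be 0 or 1: {invalid}")
    sample_ids = sorted(sample_to_mutation_status)
    labels = [sample_to_mutation_status[s] for s in sample_ids]
    return sample_ids, labels


a = {"S2": 1, "S1": 0}
b = {"S1": 0, "S2": 1}
assert a == b  # dict equality ignores insertion order
print(split_mapping(a))  # (['S1', 'S2'], [0, 1])
```

A list of [sample_id, mutation_status] pairs would carry an ordering that callers might wrongly assume is meaningful; the dict avoids that ambiguity.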

@awm33
Member

awm33 commented Nov 12, 2016

@dhimmel Ah, it looks like I can still pass it. An earlier version of the refactor PR expected a dataframe.

Then I think we just need to make the get_mutations_df call optional, and we should be good.
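A minimal sketch of the "optional fetch" pattern, assuming the statuses arrive as a dict. `get_mutations_df_stub` is a hypothetical stand-in for the remote download that get_mutations_df performs (its name suggests it returns a DataFrame; a dict is used here only to keep the sketch stdlib-only), and `resolve_mutation_status` is likewise hypothetical, not cognoml's API.

```python
from typing import Callable, Dict, Optional


def get_mutations_df_stub() -> Dict[str, int]:
    """Hypothetical stand-in for the remote download done by get_mutations_df."""
    raise RuntimeError("remote download should have been skipped")


def resolve_mutation_status(
    sample_to_mutation_status: Optional[Dict[str, int]] = None,
    fetch: Callable[[], Dict[str, int]] = get_mutations_df_stub,
) -> Dict[str, int]:
    """Use worker-supplied statuses if present; otherwise fall back to fetching."""
    if sample_to_mutation_status is not None:
        # The worker already provided the data, so the remote fetch never runs.
        return dict(sample_to_mutation_status)
    return fetch()


print(resolve_mutation_status({"S1": 0, "S2": 1}))  # {'S1': 0, 'S2': 1}
```

Checking `is not None` rather than truthiness keeps an intentionally empty mapping from silently triggering a download.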

jessept self-assigned this Nov 16, 2016