Finish report #15

Closed
13 of 19 tasks
carrascomj opened this issue Dec 9, 2019 · 3 comments
Assignees
Labels
documentation Improvements or additions to documentation model Architecture and hyperparameters deep learning model related pre-processing

Comments

@carrascomj
Owner

carrascomj commented Dec 9, 2019

Christmas is coming and the deadline is looming! Please tell others if you are working on one of the tasks and edit this issue as you please.

Urgent and doable right now

  • Introduction (in Overleaf).
  • Methods.
  • Draft of Abstract.
  • Pre-processing exploitation.
  • Pre-processing explain: add information in the input pd.DataFrame about the
    functions of each gene so we can find explanations later.
  • Post-processing gathering (put all the functions together).
  • Table of results/performance/confusion matrix (try stuff and annotate
    results).
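
For the results table, a minimal sketch of a binary gene/no-gene confusion matrix with a few derived scores. The function names and the 0 = no gene, 1 = gene encoding are assumptions made for illustration; this is not the code in /src/evaluation.

```python
def confusion_matrix(y_true, y_pred):
    """Return [[TN, FP], [FN, TP]] for binary labels (0 = no gene, 1 = gene)."""
    cm = [[0, 0], [0, 0]]
    # Rows index the true label, columns the predicted label.
    for truth, prediction in zip(y_true, y_pred):
        cm[truth][prediction] += 1
    return cm


def summary(cm):
    """Derive accuracy, precision and recall from a 2x2 confusion matrix."""
    (tn, fp), (fn, tp) = cm
    total = tn + fp + fn + tp
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


if __name__ == "__main__":
    cm = confusion_matrix([1, 0, 1, 1, 0], [1, 0, 0, 1, 1])
    print(cm)
    print(summary(cm))
```

Annotating each experiment with these numbers should make it easier to compare runs side by side in the report.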

Urgent but blocked

  • Results.
  • Discussion.
  • Conclusion (I am keen on separating all the sections, but these three
    may be combined if needed).
  • Finish Abstract.
  • Analysis of difficult cases.

Not urgent but doable

  • Convnet: try one-hot encoding instead of index input (4 input channels).
  • Convnet: analysis of activations.
  • CBOW: tSNE/PCA.
  • Proper handling of UNK characters (N, S, etc.), just remove them and break
    the window at this point.
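
Two of the items above (one-hot input and UNK handling) could be sketched together as follows. This is only an illustration in plain Python: the helper names are hypothetical, the alphabet is assumed to be A, C, G, T, and any other character is treated as unknown.

```python
import re

ALPHABET = "ACGT"  # assumed alphabet; every other character counts as unknown


def split_at_unknown(sequence):
    """Drop runs of non-ACGT characters and break the sequence at those points."""
    pieces = re.split(r"[^ACGT]+", sequence.upper())
    return [piece for piece in pieces if piece]


def one_hot(sequence):
    """Encode a sequence as 4 channels (rows A, C, G, T), one column per base."""
    return [[1 if base == letter else 0 for base in sequence] for letter in ALPHABET]


if __name__ == "__main__":
    for piece in split_at_unknown("ACGNNTTSA"):
        print(piece, one_hot(piece))
```

Windows would then be cut from each piece separately, so no window spans an unknown character.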

Proposed in feedback

  • Add more conv layers with different kernel size.
  • Explore CBOW k.
  • There were more things but I don't remember them, just edit this section if you feel like doing so.
@carrascomj carrascomj added documentation Improvements or additions to documentation model Architecture and hyperparameters deep learning model related pre-processing labels Dec 9, 2019
@carrascomj carrascomj added this to the MILESTONE 1: Gene/No gene milestone Dec 9, 2019
@carrascomj
Owner Author

carrascomj commented Dec 9, 2019

Added some classifiers to the gene ranges in the input dataframe (10138d0). Those classifiers are name and product_accession of their corresponding "CDS" region, whose id should allow us to cluster and analyze "difficult cases".

This feature makes the pre-processing workflow slower. To get back to the previous speed, we would need to change the logic in src/pre-processing/get_labeled_ranges to work with pd.DataFrame instead of numpy arrays.

EDIT: improved a bit with memoization based on sequentiality of overlapping windows.
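
One way to read "memoization based on sequentiality" is to remember where the previous window stopped searching the gene ranges, so the next window does not start from the beginning. The sketch below is a guess at that idea, not the code in 10138d0. It assumes windows are visited in increasing start order and that gene ranges are half-open (start, end) tuples, sorted and not overlapping each other; `label_windows` is a hypothetical name.

```python
def label_windows(window_starts, window_size, gene_ranges):
    """Label each window 1 if it overlaps a gene range, else 0.

    Assumes window_starts is increasing and gene_ranges is a list of
    half-open (start, end) tuples, sorted by start and mutually disjoint.
    """
    labels = []
    first = 0  # index of the first gene range that can still overlap a window
    for start in window_starts:
        end = start + window_size
        # Ranges that end at or before this window's start cannot overlap
        # any later window either, so they are skipped for good.
        while first < len(gene_ranges) and gene_ranges[first][1] <= start:
            first += 1
        overlaps = first < len(gene_ranges) and gene_ranges[first][0] < end
        labels.append(1 if overlaps else 0)
    return labels


if __name__ == "__main__":
    genes = [(5, 10), (20, 25)]
    print(label_windows([0, 2, 4, 8, 10, 14, 18, 22, 26], 4, genes))
```

Because the cursor only moves forward, the whole pass is linear in the number of windows plus the number of ranges.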

@carrascomj carrascomj pinned this issue Dec 9, 2019
@carrascomj
Owner Author

Overlapping windows are now the default for the window_pipeline.
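
For readers of the report, overlapping windows can be pictured with the sketch below. It is an illustration only: `sliding_windows` and its `stride` parameter are hypothetical names, not the window_pipeline API.

```python
def sliding_windows(sequence, window_size, stride=1):
    """Yield (start, window) pairs; windows overlap whenever stride < window_size."""
    if window_size <= 0 or stride <= 0:
        raise ValueError("window_size and stride must be positive")
    # The last valid start is len(sequence) - window_size, so trailing
    # partial windows are never produced.
    for start in range(0, len(sequence) - window_size + 1, stride):
        yield start, sequence[start:start + window_size]


if __name__ == "__main__":
    for start, window in sliding_windows("ACGTAC", window_size=4, stride=1):
        print(start, window)
```

Setting `stride` equal to `window_size` gives back non-overlapping windows.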

@carrascomj
Owner Author

carrascomj commented Dec 10, 2019

All the post-processing was updated to match what is used in the notebooks (commit b5969bc). As an extra, the functions for calculating the confusion matrix and the CLI were updated in /src/evaluation and /bin, respectively.


3 participants