1. Introduction
leADS (multi-label learning based on active dataset subsampling) is a simple framework that subsamples pathway data to reduce the negative impact on training loss caused by imbalances in the distribution of pathways in the dataset. Specifically, leADS (Fig. \ref{fig:workflow}A) performs training in three iterative steps:
- Building an acquisition model. At the very first iteration, an initially empty set is populated with randomly selected data from a given pathway dataset (Fig. \ref{fig:workflow}a-b). Then, an ensemble of g members is constructed (Fig. \ref{fig:workflow}c), where each member is trained on a randomly selected portion of the training data.
- Dataset sub-sampling. During this step, a subset of pathway data is selected using one of the following four acquisition functions: entropy, mutual information, variation ratios, and normalized propensity-scored precision at k (nPSP@k). For the chosen function, the top per% of examples are retrieved, where per% (\in (0, 100]) is a prespecified hyperparameter indicating the subsampling proportion (Fig. \ref{fig:workflow}d).
- Train using sub-sampled data. Examples from the previous step are used to train leADS with the multi-label 1-vs-All approach, similar to mlLGPR (Fig. \ref{fig:workflow}e).
The three steps are repeated iteratively.
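The sub-sampling step can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the leADS implementation: the function names (`binary_entropy`, `acquisition_scores`, `select_top`) are hypothetical, the ensemble output is assumed to be an array of per-label probabilities of shape `(g, n, t)` (members, examples, pathways), and the three scores use common textbook forms for binary labels, summed over labels. nPSP@k is omitted because its definition is not given here.

```python
import numpy as np


def binary_entropy(p, eps=1e-12):
    """Entropy (in nats) of Bernoulli variables with success probability p."""
    p = np.clip(p, eps, 1.0 - eps)  # avoid log(0)
    return -(p * np.log(p) + (1.0 - p) * np.log(1.0 - p))


def acquisition_scores(probs, method="entropy"):
    """Score each example from ensemble predictions.

    probs: array of shape (g, n, t) with per-label probabilities
           (assumed layout: members x examples x labels).
    Returns an array of shape (n,); higher means more uncertain.
    """
    probs = np.asarray(probs, dtype=float)
    mean_p = probs.mean(axis=0)  # ensemble-averaged prediction, shape (n, t)
    if method == "entropy":
        return binary_entropy(mean_p).sum(axis=1)
    if method == "mutual_information":
        # Entropy of the mean minus the mean entropy of the members.
        expected_h = binary_entropy(probs).mean(axis=0)
        return (binary_entropy(mean_p) - expected_h).sum(axis=1)
    if method == "variation_ratios":
        # Fraction of members voting "present" for each label.
        votes = (probs >= 0.5).mean(axis=0)
        return (1.0 - np.maximum(votes, 1.0 - votes)).sum(axis=1)
    raise ValueError(f"unknown acquisition function: {method}")


def select_top(scores, per):
    """Return indices of the top per% examples, with per in (0, 100]."""
    if not 0 < per <= 100:
        raise ValueError("per must lie in (0, 100]")
    k = max(1, int(np.ceil(per / 100.0 * len(scores))))
    return np.argsort(-np.asarray(scores), kind="stable")[:k]


# Toy usage: 2 ensemble members, 3 examples, 1 label.
probs = np.array([[[0.5], [0.99], [0.99]],
                  [[0.5], [0.01], [0.99]]])
scores = acquisition_scores(probs, method="mutual_information")
selected = select_top(scores, per=30)  # indices of examples to train on
```

In this toy input the two members disagree only on the second example, so mutual information and variation ratios rank it highest, whereas entropy of the averaged prediction also scores the first example (both members predict 0.5) as uncertain.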
If you find leADS useful in your research, please consider citing the following paper:
M. A. Basher, Abdur Rahman and Hallam, Steven J. "Multi-label pathway prediction based on active dataset subsampling." bioRxiv (2020).
For any inquiries or issues, please contact Abdurrahman Abul-Basher at: [email protected]