I've tried the official settings, but the results lag far behind the reported ones.
Given that the official setup uses a sequence length 2x-4x that of the downstream tasks, I tried two experiments:

1. fine-tune from the officially released tiny 1k (d_model 256) weights;
2. train a 2k model with no sequence-length warmup (max length 2k).

Both underperformed the reported results.

Here is the config file for my nucleotide transformer tasks; below it I work through a few of the interpolated values as a sanity check:
# @package _global_
defaults:
model:
  _name_: dna_embedding
  d_model: 256
  n_layer: 2
  d_inner: ${eval:4 * ${.d_model}}
  vocab_size: 12
  resid_dropout: 0.0
  embed_dropout: 0.1
  fused_mlp: False # figure out how to use fused MLP, maybe only with bf16 + a100
  fused_dropout_add_ln: True
  residual_in_fp32: True
  pad_vocab_size_multiple: 8
  layer:
    _name_: hyena
    emb_dim: 5
    filter_order: 64
    short_filter_order: 3
    l_max: 1026 # required to be set the same as the pretrained model if using, don't forget the +2! ${eval:${dataset.max_length}+2}
    modulate: True
    w: 10
    lr: ${optimizer.lr}
    wd: 0.0
    lr_pos_emb: 0.0
task:
  _name_: masked_multiclass
  loss: cross_entropy
  metrics:
    - accuracy
  torchmetrics: null

trainer:
  accelerator: gpu
  devices: 4
  num_nodes: 1
  accumulate_grad_batches: ${div_up:${train.global_batch_size}, ${eval:${trainer.devices} * ${dataset.batch_size} * ${trainer.num_nodes}}}
  max_epochs: 200
  precision: 16 # bf16 only a100
  gradient_clip_val: 1.0
# name                    maxlen  classes  samples  metric
# enhancer                200     2        14968    MCC
# enhancer_types          200     3        14968    MCC
# H3                      500     2        13468    MCC
# H3K4me1                 500     2        28509    MCC
# H3K4me2                 500     2        27614    MCC
# H3K4me3                 500     2        33119    MCC
# H3K9ac                  500     2        25003    MCC
# H3K14ac                 500     2        29743    MCC
# H3K36me3                500     2        31392    MCC
# H3K79me3                500     2        25953    MCC
# H4                      500     2        13140    MCC
# H4ac                    500     2        30685    MCC
# promoter_all            300     2        53276    F1
# promoter_non_tata       300     2        47759    F1
# promoter_tata           300     2        5517     F1
# splice_sites_acceptor   600     2        19961    F1
# splice_sites_donor      600     2        19775    F1
dataset:
  batch_size: 32
  dataset_name: 'H3K4me1'
  tokenizer_name: char
  add_eos: false
  rc_aug: false # reverse complement augmentation
  return_mask: false
  padding_side: left

scheduler:
  t_in_epochs: False
  t_initial: ${eval:${div_up:${dataset.train_len}, ${train.global_batch_size}} * ${trainer.max_epochs}}
  warmup_lr_init: 1e-6
  warmup_t: ${eval:${div_up:${dataset.train_len}, ${train.global_batch_size}} * ${trainer.max_epochs} * 0.01}
  lr_min: ${eval:0.1 * ${optimizer.lr}}

optimizer:
  lr: 6e-4
  weight_decay: 0.1
train:
  gpu_mem: ${eval:"round(float(__import__('subprocess').check_output('nvidia-smi -i 0 --query-gpu=memory.total --format=csv,noheader,nounits', shell=True).strip().decode()) / 1000)"}
  seed: 2222
  global_batch_size: ${eval:${trainer.devices}*${dataset.batch_size}}
  remove_test_loader_in_eval: true # no test set in this benchmark
  pretrained_model_strict_load: False # false allows encoder/decoder to be used if new model uses it
  # for loading backbone and not head, requires both of these flags below
  pretrained_model_path: hyena-dna/outputs/weights.ckpt
  pretrained_model_state_hook:
    _name_: load_backbone
    freeze_backbone: false # seems to work much better if false (ie finetune entire model)
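
As a sanity check on the effective batch size and schedule, here is how I read the ${eval:...} / ${div_up:...} interpolations in the config above. This is just arithmetic I wrote out by hand, not code from the repo, and train_len is a stand-in (the real value is the size of the train split, which the pipeline fills in):

```python
from math import ceil

# Hand-rolled sketch of the config interpolations (not repo code).
devices, num_nodes = 4, 1      # trainer.devices, trainer.num_nodes
batch_size = 32                # dataset.batch_size
max_epochs = 200               # trainer.max_epochs
lr = 6e-4                      # optimizer.lr
train_len = 28_509             # stand-in: total H3K4me1 samples from the table, not the true train-split size

global_batch_size = devices * batch_size                            # train.global_batch_size = 128
accumulate_grad_batches = ceil(global_batch_size /
                               (devices * batch_size * num_nodes))  # div_up -> 1, so no accumulation
steps_per_epoch = ceil(train_len / global_batch_size)               # div_up(train_len, global_batch_size)
t_initial = steps_per_epoch * max_epochs                            # scheduler.t_initial (cosine length in steps)
warmup_t = t_initial * 0.01                                         # scheduler.warmup_t (1% of training)
lr_min = 0.1 * lr                                                   # scheduler.lr_min

print(global_batch_size, accumulate_grad_batches, t_initial, round(warmup_t), lr_min)
```

If I am reading this right, with 4 GPUs the effective batch size is 128 and there is no gradient accumulation.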
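
And this is the length bookkeeping I am assuming for model.layer.l_max: the tiny 1k checkpoint was trained with l_max = 1026, and each downstream task only needs its max length plus the "+2" from the comment on that line, so every task in the table should fit (a sketch with a few values from the table, not repo code):

```python
# Sketch of the l_max constraint I am assuming: the pretrained l_max must cover
# the downstream max length plus the "+2" mentioned in the l_max comment.
pretrained_l_max = 1026  # tiny 1k checkpoint

task_max_len = {  # a few max lengths from the table above
    "enhancer": 200,
    "promoter_all": 300,
    "H3K4me1": 500,
    "splice_sites_acceptor": 600,
}

for task, max_len in task_max_len.items():
    needed = max_len + 2
    status = "fits" if needed <= pretrained_l_max else "too long"
    print(f"{task}: needs l_max >= {needed}, checkpoint has {pretrained_l_max} -> {status}")
```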
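
For completeness, the train.gpu_mem interpolation just queries the total memory of GPU 0 and rounds it to (roughly) GB; written out as plain Python it is:

```python
import subprocess

# train.gpu_mem, written out: nvidia-smi reports memory.total in MiB for GPU 0,
# which the expression divides by 1000 and rounds.
out = subprocess.check_output(
    "nvidia-smi -i 0 --query-gpu=memory.total --format=csv,noheader,nounits",
    shell=True,
)
gpu_mem = round(float(out.strip().decode()) / 1000)
print(gpu_mem)
```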