Merging mass-mapping to main #3
base: master
@@ -1,72 +1,25 @@
# A Regularized Conditional GAN for Posterior Sampling in Inverse Problems [[arXiv]](https://arxiv.org/abs/2210.13389)
# Generative modelling for mass-mapping with fast uncertainty quantification [[arXiv]](https://arxiv.org/abs/2410.24197)

## Setup
See ```docs/setup.md``` for basic environment setup instructions.

**Review comment:** This file does not exist

**Review comment:** You need to add a reference to the
## Reproducing our Results
### MRI
See ```docs/mri.md``` for instructions on how to set up and reproduce our MRI results.

## Extending the Code
See ```docs/new_applications.md``` for basic instructions on how to extend the code to your application.

## Reproducing the our Results

**Review comment:** typo

###
See ```docs/mass_mapping.md``` for instructions on how to set up and reproduce our COSMOS results.

**Review comment:** Could you coordinate with Matthijs so that the radio application of the GAN is also updated on this main branch. It'd be good to have everything to reproduce the results of both papers in a single repo.
## Questions and Concerns
If you have any questions, or run into any issues, don't hesitate to reach out at [email protected].

## TODO
- [x] Migrate to PyTorch Lightning
- [x] Reimplement MRI rcGAN
- [x] Update MRI experiment to R=8
- [ ] Reimplement inpainting rcGAN
- [ ] Extend to super resolution

If you have any questions, or run into any issues, don't hesitate to reach out at [email protected].
## References
This repository contains code from the following works, which should be cited:

```
@article{zbontar2018fastmri,
  title={fastMRI: An open dataset and benchmarks for accelerated MRI},
  author={Zbontar, Jure and Knoll, Florian and Sriram, Anuroop and Murrell, Tullie and Huang, Zhengnan and Muckley, Matthew J and Defazio, Aaron and Stern, Ruben and Johnson, Patricia and Bruno, Mary and others},
  journal={arXiv preprint arXiv:1811.08839},
  year={2018}
}

@article{devries2019evaluation,
  title={On the evaluation of conditional GANs},
  author={DeVries, Terrance and Romero, Adriana and Pineda, Luis and Taylor, Graham W and Drozdzal, Michal},
  journal={arXiv preprint arXiv:1907.08175},
  year={2019}
}

@inproceedings{Karras2020ada,
  title={Training Generative Adversarial Networks with Limited Data},
  author={Tero Karras and Miika Aittala and Janne Hellsten and Samuli Laine and Jaakko Lehtinen and Timo Aila},
  booktitle={Proc. NeurIPS},
  year={2020}
}

@inproceedings{zhao2021comodgan,
  title={Large Scale Image Completion via Co-Modulated Generative Adversarial Networks},
  author={Zhao, Shengyu and Cui, Jonathan and Sheng, Yilun and Dong, Yue and Liang, Xiao and Chang, Eric I and Xu, Yan},
  booktitle={International Conference on Learning Representations (ICLR)},
  year={2021}
}

@misc{zeng2022github,
  howpublished = {Downloaded from \url{https://github.com/zengxianyu/co-mod-gan-pytorch}},
  month = sep,
  author={Yu Zeng},
  title = {co-mod-gan-pytorch},
  year = 2022
}
```

This repository was forked from rcGAN by Bendel et al., with significant changes and modifications made by Whitney et al.

**Review comment:** Add link to original Bendel paper as well as the original rcGAN repo

**Review comment:** It'd be good to add a reproducibility section and links to Zenodo. It'd be good to find there the trained GAN used to generate the results. Also, some samples and reconstructions of the GAN. Should we include the simulations in a Zenodo? It depends on the size of the sims. If they are too big, you could just upload some of them to have a representative set.
## Citation
If you find this code helpful, please cite our paper:
```
@article{bendel2022arxiv,
  author = {Bendel, Matthew and Ahmad, Rizwan and Schniter, Philip},
  title = {A Regularized Conditional {GAN} for Posterior Sampling in Inverse Problems},
  year = {2022},
  journal = {arXiv:2210.13389}
}

@article{2024arxiv,
  author = {Whitney, Jessica and Liaudat, Tobías and Price, Matthew and Mars, Matthijs and McEwen, Jason},
  title = {Generative modelling for mass-mapping with fast uncertainty quantification},
  year = {2024},
  journal = {arXiv:2410.24197}
}
```
@@ -1,6 +1,14 @@
# rcGAN development version

# Installation

If in the Hypatia cluster, first run:

**Review comment:** We should not mention Hypatia, but be general for a computer cluster.

``` bash
source /share/apps/anaconda/3-2022.05/etc/profile.d/conda.sh
```
First, install the conda dependencies, setting the correct channels:
``` bash
conda create --name cGAN --file conda_requirements.txt --channel pytorch --channel nvidia --channel conda-forge --channel defaults
```

@@ -26,10 +34,56 @@ configs -> `~/.config/wandb` -> `WANDB_CONFIG_DIR`
# Set the variables
``` diff
-export WANDB_DIR=/share/gpu0/tl3/wandb/logs
-export WANDB_CACHE_DIR=/share/gpu0/tl3/wandb/.cache/wandb
-export WANDB_CONFIG_DIR=/share/gpu0/tl3/wandb/.config/wandb
+export WANDB_DIR=/share/gpu0/jjwhit/wandb/logs
+export WANDB_CACHE_DIR=/share/gpu0/jjwhit/wandb/.cache/wandb
+export WANDB_CONFIG_DIR=/share/gpu0/jjwhit/wandb/.config/wandb
```
# Training the model

Training is as simple as running the following command:
``` bash
python train.py --config ./configs/mass_map.yml --exp-name rcgan_test --num-gpus X
```
where ```X``` is the number of GPUs you plan to use. Note that this project uses Weights and Biases (wandb) for logging. See [their documentation](https://docs.wandb.ai/quickstart) for instructions on how to set up environment variables. Alternatively, you may use a different logger. See PyTorch Lightning's [documentation](https://lightning.ai/docs/pytorch/stable/extensions/logging.html) for options.

If you need to resume training, use the following command:
``` bash
python train.py --config ./configs/mass_map.yml --exp-name rcgan_test --num-gpus X --resume --resume-epoch Y
```
where ```Y``` is the epoch to resume from.

By default, we save the previous 50 epochs. Ensure that your checkpoint path points to a location with sufficient disk space; if disk space is a concern, 50 can be reduced to 25. This is important for the next step, validation.
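The 50-epoch retention policy can also be enforced by hand when disk space runs low. The sketch below is illustrative only: the `epoch=N.ckpt` naming scheme and the `prune_checkpoints` helper are assumptions, not code from this repository.

```python
import re

def prune_checkpoints(filenames, keep=50):
    """Return the checkpoint filenames to delete, keeping the `keep` most recent epochs.

    Assumes Lightning-style names such as 'epoch=12.ckpt'; anything that
    does not match the pattern is left untouched.
    """
    pattern = re.compile(r"epoch=(\d+)\.ckpt$")
    matched = [(int(m.group(1)), f) for f in filenames if (m := pattern.search(f))]
    matched.sort(reverse=True)             # newest epochs first
    return [f for _, f in matched[keep:]]  # everything older than the newest `keep`
```

For example, with 60 saved epochs and `keep=50`, the helper would flag the 10 oldest files for deletion.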
## Multi-GPU Runs
To make the lightning module work on multiple GPUs (and on multiple nodes) when using the SLURM workload manager, we need to be careful in setting up the SLURM job script. An example of how to do this can be found here: https://pytorch-lightning.readthedocs.io/en/1.2.10/clouds/slurm.html.

In particular, if we want to run on 4 GPUs on one node, we need to make sure that we ask for 4 GPUs as well as 4 tasks (since Lightning will create 1 task per GPU) per node:

```
#SBATCH --gres=gpu:4         # n_gpus
#SBATCH --ntasks-per-node=4  # ntasks needs to be the same as n_gpus
```

An example of a job script for training using multiple GPUs can be found in [examples/example_multi_gpu.sh](https://github.com/astro-informatics/rcGAN/blob/dev-multiGPU/examples/example_multi_gpu_train.sh).
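Putting the two `#SBATCH` directives together, a minimal job script might look as follows. This is a sketch, not the repository's example script: the job name, time limit, and conda environment name (`cGAN`, from the installation step) are assumptions you should adapt to your cluster.

``` shell
#!/bin/bash
#SBATCH --job-name=rcgan_train
#SBATCH --nodes=1
#SBATCH --gres=gpu:4         # request 4 GPUs on the node
#SBATCH --ntasks-per-node=4  # must equal the number of GPUs (1 task per GPU)
#SBATCH --time=24:00:00      # assumed time limit

# Activate the environment created during installation
conda activate cGAN

# Lightning reads the SLURM environment and runs one process per task/GPU
srun python train.py --config ./configs/mass_map.yml --exp-name rcgan_test --num-gpus 4
```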
## Batch size tuning
Additionally, I have created a script, [find_batch_size.py](https://github.com/astro-informatics/rcGAN/blob/dev-multiGPU/find_batch_size.py), that finds the largest batch size you can run per GPU. This depends on the VRAM available on the GPU and can therefore vary across machines/nodes. An example job file can be found in [examples/example_find_batch_size.sh](https://github.com/astro-informatics/rcGAN/blob/dev-multiGPU/examples/example_find_batch_size.sh). Usage is:

```
python find_batch_size.py --config [config_file.yml]
```

Finally, to support larger batch sizes, we can accumulate the gradients over several batches. To enable this and set the amount of accumulation, add the following to your config file:

```
batch_size: 8               # batch size per GPU (because of DDP)
accumulate_grad_batches: 2  # updates the model after 2 batches per GPU
```

When using the distributed data parallel (DDP) training strategy, the model is copied exactly on each GPU and each copy sees only a part of the data during the epoch. After processing 1 batch on each of the GPUs, the gradients from each of the GPUs are averaged and the models are updated. If we use gradient accumulation, the gradients are instead averaged over several such steps. The effective batch size of the model is therefore: n_gpus * batch_size * accumulate_grad_batches.
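The effective batch size formula is worth sanity-checking before launching a long run; a one-line helper (the function name is just for illustration):

```python
def effective_batch_size(n_gpus, batch_size, accumulate_grad_batches):
    """Samples contributing to a single optimizer step under DDP with gradient accumulation."""
    return n_gpus * batch_size * accumulate_grad_batches

# e.g. 4 GPUs with batch_size: 8 and accumulate_grad_batches: 2
print(effective_batch_size(4, 8, 2))  # → 64
```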
@@ -0,0 +1,36 @@
# Change checkpoint and data paths
checkpoint_dir: /share/gpu0/jjwhit/mass_map/mm_models/  # Where the model will save checkpoints
data_path: /share/gpu0/jjwhit/kappa_cosmos_simulations/cropped_dataset/  # Path to simulation dataset
cosmo_dir_path: /home/jjwhit/rcGAN/mass_map_utils/cosmos/  # Path to cosmos information such as the mask
save_path: /share/gpu0/jjwhit/samples/real_output/  # Where figures and samples will be saved

# Define the experience
experience: mass_mapping
kappa_mean: 0.00015744006243248638  # Value calculated during preprocessing
kappa_std: 0.02968584954283938  # Value calculated during preprocessing

# Number of code vectors for each phase
num_z_test: 32
num_z_valid: 8
num_z_train: 2

# Data
in_chans: 4  # Real+Imag parts from observation + Kaiser-Squires map
out_chans: 1  # A real convergence map
im_size: 300  # Pixel width/height (square images)

# Optimizer:
lr: 0.001
beta_1: 0
beta_2: 0.99

# Loss weights
gp_weight: 10
adv_weight: 1e-5

# Training
batch_size: 9
num_epochs: 100
psnr_gain_tol: 0.25

num_workers: 4
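The `kappa_mean`/`kappa_std` values in this config are the statistics used to standardize convergence maps. A sketch of the transform and its inverse follows; the function names are illustrative, not taken from the repository.

```python
KAPPA_MEAN = 0.00015744006243248638  # from the config, computed during preprocessing
KAPPA_STD = 0.02968584954283938

def normalize(kappa):
    """Standardize a convergence value (or array) to zero mean, unit variance."""
    return (kappa - KAPPA_MEAN) / KAPPA_STD

def unnormalize(z):
    """Invert the standardization to recover physical convergence values."""
    return z * KAPPA_STD + KAPPA_MEAN
```

Network inputs and outputs would live in the normalized space, with `unnormalize` applied before computing any physical summary of the posterior samples.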
@@ -0,0 +1,34 @@
# Change checkpoint and data paths
checkpoint_dir: /share/gpu0/mars/TNG_data/rcGAN/models/
data_path: /share/gpu0/mars/TNG_data/rcGAN/fourier/

# Define the experience
experience: radio

# Number of code vectors for each phase
num_z_test: 32
num_z_valid: 8
num_z_train: 2

# Data
in_chans: 3  # Real+Imag parts from obs
out_chans: 2
im_size: 360  # 360x360 pixel images

# Optimizer:
lr: 0.001
beta_1: 0
beta_2: 0.99

# Loss weights
gp_weight: 10
adv_weight: 1e-5

# Training
batch_size: 1
# Remember to increase this for full training
num_epochs: 10
psnr_gain_tol: 0.25

num_workers: 4
@@ -0,0 +1,34 @@
# Change checkpoint and data paths
checkpoint_dir: /share/gpu0/mars/TNG_data/rcGAN/models/varying/
data_path: /share/gpu0/mars/TNG_data/rcGAN/image_psfs/

# Define the experience
experience: radio

# Number of code vectors for each phase
num_z_test: 32
num_z_valid: 8
num_z_train: 2

# Data
in_chans: 3  # Real+Imag parts from obs
out_chans: 2
im_size: 256  # 256x256 pixel images

# Optimizer:
lr: 0.001
beta_1: 0
beta_2: 0.99

# Loss weights
gp_weight: 10
adv_weight: 1e-5

# Training
batch_size: 8
# Remember to increase this for full training
num_epochs: 100
psnr_gain_tol: 0.25

num_workers: 4
@@ -0,0 +1,35 @@
# Change checkpoint and data paths
checkpoint_dir: /share/gpu0/tl3/cGAN/radio/trained_model/
data_path: /share/gpu0/mars/TNG_data/rcGAN/image_psfs/

# Define the experience
experience: radio

# Number of code vectors for each phase
num_z_test: 32
num_z_valid: 8
num_z_train: 2

# Data
in_chans: 3  # Real+Imag parts from obs
out_chans: 2
im_size: 256  # 256x256 pixel images

# Optimizer:
lr: 0.001
beta_1: 0
beta_2: 0.99

# Loss weights
gp_weight: 10
adv_weight: 1e-5

# Training
batch_size: 4
accumulate_grad_batches: 2
# Remember to increase this for full training
num_epochs: 1
psnr_gain_tol: 0.25

num_workers: 1
**Review comment:** It'd be good to add a small description of the method (something like the abstract used for the paper but more synthetic). Also select one figure from the paper to illustrate the method (I'd suggest the one from the COSMOS field showing the reconstruction and std dev).