Skip to content

Commit

Permalink
Merge pull request #43 from mancusolab/dev
Browse files Browse the repository at this point in the history
Update to new version by fixing typos, small bugs, and adding option to remove ambiguous SNPs
  • Loading branch information
zeyunlu authored Jul 25, 2024
2 parents 97cb3ad + b7955f0 commit c60ab10
Show file tree
Hide file tree
Showing 8 changed files with 354 additions and 79 deletions.
8 changes: 4 additions & 4 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -108,13 +108,13 @@ You can play it with your own ideas!
| 0.13 | Add `--keep` command to enable user to specify a file that contains the subjects ID SuShiE will perform on. Add `--ancestry_index` command to enable user to specify a file that contains the ancestry index for fine-mapping. With this, user can input single phenotype, genotype, and covariate file that contains all the subjects across ancestries. Implement padding to increase inference time. Record elbo at each iteration and can access it in the `infer.SuShiEResult` object. The alphas table now outputs the average purity and KL divergence for each `L`. Change `--kl_threshold` to `--divergence`. Add `--maf` command to remove SNPs that less than minor allele frequency threshold within each ancestry. Add `--max_select` command to randomly select maximum number of SNPs to compute purity to avoid unnecessary memory spending. Add a QC function to remove duplicated SNPs. |
| 0.14 | Remove KL-Divergence pruning. Enhance command line appearance and improve the output files contents. Fix small bugs on multivariate KL. |
| 0.15 | Fix several typos; add a sanity check on reading vcf genotype data by assigning gt_types==Unknown as NA; Add preprint information. |
| 0.16 | Add option to remove ambiguous SNPs; fix several bugs and enhance codes quality. |

## Support

Please report any bugs or feature requests in the [Issue
Tracker](https://github.com/mancusolab/sushie/issues). If users have any
questions or comments, please contact Zeyun Lu (<[email protected]>) and
Nicholas Mancuso (<[email protected]>).
For any questions, comments, bug reporting, and feature requests, please contact Zeyun Lu (<[email protected]>) and
Nicholas Mancuso (<[email protected]>), and open a new thread in the [Issue
Tracker](https://github.com/mancusolab/sushie/issues).

## Other Software

Expand Down
13 changes: 9 additions & 4 deletions data/make_example.py
Original file line number Diff line number Diff line change
Expand Up @@ -10,6 +10,8 @@

# flip allele if necessary
def _allele_check(baseA0, baseA1, compareA0, compareA1):
# no snps that have more than 2 alleles
# e.g., G and T for EUR and G and A for AFR
correct = jnp.array(
((baseA0 == compareA0) * 1) * ((baseA1 == compareA1) * 1), dtype=int
)
Expand All @@ -18,9 +20,8 @@ def _allele_check(baseA0, baseA1, compareA0, compareA1):
)
correct_idx = jnp.where(correct == 1)[0]
flipped_idx = jnp.where(flipped == 1)[0]
wrong_idx = jnp.where((correct + flipped) == 0)[0]

return correct_idx, flipped_idx, wrong_idx
return correct_idx, flipped_idx


# read plink1.9 triplet
Expand Down Expand Up @@ -51,14 +52,13 @@ def _allele_check(baseA0, baseA1, compareA0, compareA1):
if n_pop > 1:
for idx in range(1, n_pop):
# keep track of miss match alleles
correct_idx, tmp_flip_idx, tmp_wrong_idx = _allele_check(
correct_idx, tmp_flip_idx = _allele_check(
snps["a0_0"].values,
snps["a1_0"].values,
snps[f"a0_{idx}"].values,
snps[f"a1_{idx}"].values,
)
flip_idx.append(tmp_flip_idx)
snps = snps.drop(index=tmp_wrong_idx)
snps = snps.drop(columns=[f"a0_{idx}", f"a1_{idx}"])
snps = snps.rename(columns={"a0_0": "a0", "a1_0": "a1"})

Expand Down Expand Up @@ -130,3 +130,8 @@ def _allele_check(baseA0, baseA1, compareA0, compareA1):
pd.concat(fam_index)[["iid"]].iloc[sel_pt, :].to_csv(
"./keep.subject", index=False, header=None, sep="\t"
)

rng_key, unif_key = random.split(rng_key, 2)
unif = random.uniform(unif_key, shape=(snps.shape[0],), minval=5, maxval=200)
snps["unif"] = unif
snps[["snp", "unif"]].to_csv("./prior_weights", index=False, header=None, sep="\t")
109 changes: 109 additions & 0 deletions data/prior_weights
Original file line number Diff line number Diff line change
@@ -0,0 +1,109 @@
rs2294190 82.629944
rs4653029 153.17215
rs2075931 123.91783
rs1883109 127.030655
rs2038008 158.2589
rs17385253 36.513393
rs2142534 37.721756
rs12098170 194.61343
rs4653032 40.086555
rs4653033 132.69426
rs10799058 189.72482
rs2075934 139.20802
rs11578724 121.33664
rs12134720 43.281784
rs10489377 13.916
rs4653036 11.430512
rs10799059 176.69559
rs2075933 61.103603
rs12035126 64.29181
rs4653037 24.734858
rs1016091 189.90926
rs6425895 107.337944
rs7544781 167.4216
rs4653042 181.94672
rs10799060 135.90553
rs12409604 191.37685
rs750882 199.56975
rs798076 72.80513
rs798075 126.811356
rs10914956 76.430916
rs11586990 170.84335
rs10914958 34.19725
rs41416648 179.50003
rs798066 77.906
rs3856131 23.705788
rs3905173 87.37802
rs12047023 129.29813
rs3845481 106.01135
rs2359106 74.0428
rs4652841 127.19247
rs12027125 19.288433
rs12566683 165.95
rs6694043 137.73972
rs4652846 187.0494
rs12038357 58.336098
rs1539374 83.38
rs10493058 143.16483
rs10914967 145.76859
rs10737381 67.32597
rs1418483 191.92468
rs7539462 152.5103
rs12136079 79.86711
rs10799067 100.771484
rs1342467 92.286095
rs947671 21.026522
rs6665029 73.00425
rs6677231 25.26735
rs6671294 152.75058
rs7530276 146.42778
rs12566354 122.516945
rs10914983 193.34686
rs12074036 192.44273
rs12033008 70.78067
rs12039962 183.77693
rs1342476 61.950565
rs6679913 131.7318
rs6703587 181.58058
rs10914986 181.33696
rs728448 52.374058
rs728447 47.42535
rs10799073 108.81207
rs12090691 76.55221
rs1886340 72.35997
rs4653061 139.58275
rs4653062 120.76586
rs10914991 122.037544
rs12117895 170.4533
rs1342475 194.37923
rs2147384 111.788536
rs7544557 27.45278
rs6695510 28.803083
rs716289 108.95569
rs10753314 149.97383
rs10799074 184.9501
rs2359631 87.64986
rs7547341 5.1670213
rs12059222 12.2328415
rs2093880 76.70654
rs6425904 107.91474
rs7548924 115.589226
rs10914996 188.51263
rs12132123 155.16908
rs868501 124.81361
rs612353 92.61688
rs12036748 190.79884
rs4652852 5.195567
rs491000 65.55253
rs506361 85.42702
rs6425908 187.92491
rs9787098 43.03403
rs509788 22.220636
rs12039039 35.800373
rs12041637 172.85303
rs9787040 140.03925
rs10915010 53.220226
rs4491030 27.456919
rs10493062 127.62521
rs912537 66.919495
rs927656 145.42041
26 changes: 21 additions & 5 deletions docs/manual.rst
Original file line number Diff line number Diff line change
Expand Up @@ -47,7 +47,8 @@ Although we highly recommend users to perform high-quality QC on their own genot
#. Only keep SNPs that are available in all the ancestries.
#. Adjust genotype data across ancestries based on the same reference alleles. Drop non-biallelic SNPs.
#. Remove SNPs that have minor allele frequency (MAF) less than 1% within each ancestry (users can change 1% with ``--maf``).
#. For single ancestry SuSiE, users have the option to perform rank inverse normalization transformation on the phenotype data.
# Users also have an option to remove ambiguous SNPs (i.e., A/T, T/A, C/G, or GC) by specifying ``--remove-ambiguous`` (Default is NOT to remove them).
#. For single ancestry SuSiE or Mega-SuSiE, users have the option to perform rank inverse normalization transformation on the phenotype data.

See :func:`sushie.cli.process_raw` for these QCs' source codes.

Expand Down Expand Up @@ -260,6 +261,16 @@ Users can use ``--keep`` command to specify a file that contains the subject IDs
cd ./data/
sushie finemap --pheno EUR.pheno AFR.pheno --vcf vcf/EUR.vcf vcf/AFR.vcf --keep keep.subject --output ./test_result
14. I want to assign the prior weights for each SNP
--------------------------------------------------

Users can use ``--pi`` command to specify a tsv file that contains the SNP ID and their prior weights. The weights will be normalized to sum to 1 before inference.

.. code:: bash
cd ./data/
sushie finemap --pheno EUR.pheno AFR.pheno --vcf vcf/EUR.vcf vcf/AFR.vcf --pi prior_weights --output ./test_result
.. _Param:

Parameters
Expand Down Expand Up @@ -314,10 +325,10 @@ Parameters
- ``--L 5``
- Integer number of shared effects pre-specified. Larger number may cause slow inference.
* - ``--pi``
- Float
- 1/p
- ``--pi 0.1``
- Prior probability for each SNP to be causal (:math:`\pi` in :ref:`Model`). Default is ``1/p`` where ``p`` is the number of SNPs in the region. It is the fixed across all ancestries.
- str
- "uniform"
- ``--pi ./prior_weights``
- Prior probability for each SNP to be causal (:math:`\pi` in :ref:`Model`). Default is uniform (i.e., ``1/p`` where ``p`` is the number of SNPs in the region. It is the fixed across all ancestries. Alternatively, users can specify the file path that contains the prior weights for each SNP. The weights have to be positive value. The weights will be normalized to sum to 1 before inference. The file has to be a tsv file that contains two columns where the first column is the SNP ID and the second column is the prior weights. Additional columns will be ignored. For SNPs do not have prior weights in the file, it will be assigned the average value of the rest. It can be a compressed file (e.g., tsv.gz). No headers.
* - ``--resid-var``
- Float
- 1e-3
Expand Down Expand Up @@ -393,6 +404,11 @@ Parameters
- False
- ``--no-reorder``
- Indicator to re-order single effects based on Frobenius norm of alpha-weighted posterior mean square. Default is False (to re-order). Specify --no-reorder will store 'True' value.
* - ``--remove-ambiguous``
- Boolean
- False
- ``--remove-ambiguous``
- Indicator to remove ambiguous SNPs (i.e., A/T, T/A, C/G, or G/C) from the genotypes. Recommend to remove these SNPs if each ancestry data is from different studies. Default is False (do not remove). Specify --remove-ambiguous will store 'True' value.
* - ``--meta``
- Boolean
- False
Expand Down
2 changes: 1 addition & 1 deletion setup.cfg
Original file line number Diff line number Diff line change
Expand Up @@ -10,7 +10,7 @@ author = Zeyun Lu, Nicholas Mancuso
author_email = [email protected], [email protected]
license = MIT
license_files = LICENSE.txt
long_description = file: README.rst
long_description = file: README.md
long_description_content_type = text/x-rst; charset=UTF-8
url = https://github.com/mancusolab/sushie
# Add here related links, for example:
Expand Down
Loading

0 comments on commit c60ab10

Please sign in to comment.