update readme
Kern Lab authored Feb 3, 2018
1 parent 0895f41 commit 86f9cb1
Showing 1 changed file with 25 additions and 0 deletions.
25 changes: 25 additions & 0 deletions README.md
@@ -14,6 +14,7 @@ such as `conda` or `pip`. The complete list of dependencies looks like this:

- numpy
- scipy
- pandas
- scikit-allel
- scikit-learn
- tensorflow
Expand Down Expand Up @@ -262,4 +263,28 @@ optional arguments:
The predict mode takes as input nDims (as above), the two model files output by the train mode, an input file of empirical feature
vectors, and a file name for the prediction output.

#### a quick example of the train/predict cycle
We have supplied in the repo some example data that can give you a quick run through the train/predict cycle (we will also
shortly provide a soup-to-nuts example that starts by calculating feature vectors from simulations and ends with prediction of
genomic data). Let's quickly give that code a spin. The directories `testing/` and `training/` each contain appropriately
formatted diploid feature vectors that are ready to be fed into diploSHIC. First we will train the diploSHIC CNN, but we will
restrict the number of training epochs to 10 to keep things relatively brief (this runs in less than 5 minutes on our server).
```
$ python diploSHIC.py train 12 training/ testing/ fooModel --epochs 10
```
As it runs, a bunch of information monitoring the training of the network will appear. We are tracking the loss and accuracy in the
validation set. When optimization is complete, our trained network will be contained in two files, `fooModel.json` and
`fooModel.weights.hdf5`. The last bit of output from `diploSHIC.py` gives us information about the loss and accuracy on
the held-out test data. From the above run, mine looks like this:
```
evaluation on test set:
diploSHIC loss: 0.404791
diploSHIC accuracy: 0.846800
```
Not bad. In practice I would set the `--epochs` value much higher than 10; the default setting of 100 should suffice in most cases.
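
If you want to compare test-set loss and accuracy across runs with different `--epochs` values, the final lines of output can be pulled out of a saved log. The following is only a minimal sketch: it assumes the output was captured as text and that the evaluation lines keep the `diploSHIC loss:` / `diploSHIC accuracy:` layout shown above. The names `METRIC_RE` and `parse_evaluation` are hypothetical helpers, not part of diploSHIC.

```python
import re

# Assumption: the evaluation lines look exactly like the ones printed above,
# i.e. "diploSHIC loss: <float>" and "diploSHIC accuracy: <float>".
METRIC_RE = re.compile(r"^diploSHIC (loss|accuracy):\s*([0-9.]+)\s*$", re.MULTILINE)


def parse_evaluation(text):
    """Return the metrics found in captured output as a dict of floats."""
    return {name: float(value) for name, value in METRIC_RE.findall(text)}


# Example input copied from the run shown above.
captured = """evaluation on test set:
diploSHIC loss: 0.404791
diploSHIC accuracy: 0.846800
"""

metrics = parse_evaluation(captured)
print(metrics)
```

Text that contains no matching lines yields an empty dict, so a caller can tell a missing evaluation block apart from a poor one.
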
Now that we have a trained model we can make predictions on some empirical data. In the repo there is a file called `testEmpirical.fvec`
that we will use as input:
```
$ python diploSHIC.py predict 12 fooModel.json fooModel.weights.hdf5 testEmpirical.fvec testEmpirical.preds
```
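
The two commands above can also be driven from a small Python script, which is handy if you want to rerun the cycle with a different model prefix or epoch count. This is a sketch under stated assumptions: it only assembles the same positional arguments and `--epochs` flag used in the example, it derives the two model file names from the prefix as described above (`<prefix>.json` and `<prefix>.weights.hdf5`), and the function names `train_cmd`, `predict_cmd`, and `run_cycle` are hypothetical.

```python
import subprocess
import sys


def train_cmd(n_dims, train_dir, test_dir, out_prefix, epochs=None):
    """Build the argument list for the train mode, mirroring the example above."""
    cmd = [sys.executable, "diploSHIC.py", "train", str(n_dims),
           train_dir, test_dir, out_prefix]
    if epochs is not None:
        cmd += ["--epochs", str(epochs)]
    return cmd


def predict_cmd(n_dims, model_prefix, fvec_path, preds_path):
    """Build the argument list for the predict mode.

    Assumption: train mode wrote <prefix>.json and <prefix>.weights.hdf5.
    """
    return [sys.executable, "diploSHIC.py", "predict", str(n_dims),
            model_prefix + ".json", model_prefix + ".weights.hdf5",
            fvec_path, preds_path]


def run_cycle(prefix="fooModel", epochs=10):
    """Run train, then predict; check=True stops at the first failing step."""
    subprocess.run(train_cmd(12, "training/", "testing/", prefix, epochs), check=True)
    subprocess.run(predict_cmd(12, prefix, "testEmpirical.fvec",
                               "testEmpirical.preds"), check=True)


# Dry run: print the commands (minus the interpreter path) without executing them.
print(" ".join(train_cmd(12, "training/", "testing/", "fooModel", epochs=10)[1:]))
print(" ".join(predict_cmd(12, "fooModel", "testEmpirical.fvec",
                           "testEmpirical.preds")[1:]))
```

Calling `run_cycle()` from the root of the repo should execute both steps in order. Leaving `epochs` as `None` in `train_cmd` omits the flag so that the tool's own default applies.
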
