Update README.rst

text-machine-lab · Aug 16, 2018 · 5d86336
1 parent 4fc765d
commit 5d86336
Showing 1 changed file with 20 additions and 52 deletions.
diff --git a/README.rst b/README.rst
@@ -15,27 +15,32 @@ Please note that for optimal performance, CliNER requires the users to obtain a
 * See the CliNER Wiki page for additional resources. 
 
   https://github.com/text-machine-lab/CliNER/wiki
+
+
+Out-of-the-Box Model
+--------
+
+Although i2b2 licensing prevents us from releasing our CliNER models trained on i2b2 data, we generated some comparable models from automatically annotated MIMIC II text.
+
+This silver MIMIC model can be found at http://text-machine.cs.uml.edu/cliner/models/silver.crf
 
 Installation
 --------
 
+        $ git clone https://github.com/text-machine-lab/CliNER.git
 
         $ pip install -r requirements.txt
 
+Verifying Installation
+--------
+
         $ wget http://text-machine.cs.uml.edu/cliner/models/silver.crf
 
         $ mv silver.crf models/silver.crf
 
         $ cliner predict --txt data/examples/ex_doc.txt --out data/predictions --model models/silver.crf --format i2b2
 
-
-Out-of-the-Box Model
---------
-
-Although i2b2 licensing prevents us from releasing our cliner models trained on i2b2 data, we generated some comparable models from automatically-annotated MIMIC II text.
-
-This silver MIMIC model can be found at http://text-machine.cs.uml.edu/cliner/models/silver.crf
-
+If you *do not* run into build errors, then your installation is complete.
 
 Example Data
 --------
@@ -57,7 +62,7 @@ Usage
 
 Here are some use cases:
 
-(1) Check that CliNER installed correctly
+(1) Help
 
 This help message will list the options available to run (train/predict/evaluate)
 
@@ -89,59 +94,22 @@ This allows us to evaluate how well CliNER does by comparing it against a gold s
 
 Evaluate how well the system predictions did. Both sets of data must be in the same format, and that format must be specified. This means that both the examples and data/test_predictions directories contain the file pretend.con.
 
-
-
 Optional Resources
 --------
 
 There are a few external resources that are not packaged with CliNER but can improve prediction performance for feature extraction with the CRF.
 
 **GENIA**
 
-*Why would I want this?* The GENIA tagger is a tool similar to CliNER but designed for Biomedical text. Depending on the domain of your data, this tool's pretrained model may or may not be able to improve performance for CliNER as it detects concepts.
-
-The GENIA tagger identifies named entities in biomedical text. 
-To install:
-
-        > wget http://www.nactem.ac.uk/tsujii/GENIA/tagger/geniatagger-3.0.2.tar.gz
-
-        > tar xzvf geniatagger-3.0.2.tar.gz
-
-        > cd geniatagger-3.0.2
-
-        > make
-
-Edit config.txt so that GENIA references the geniatagger executable just built. (e.g. "GENIA   /someuser/CliNER/geniatagger-3.0.2/geniatagger")
-
-[GENIA Reference](http://www.nactem.ac.uk/tsujii/GENIA/tagger/)
+*Why would I want this?* 
 
+The GENIA tagger is a tool similar to CliNER but designed for biomedical text. Depending on the domain of your data, this tool's pretrained model may or may not be able to improve performance for CliNER as it detects concepts.
 
 **UMLS**
 
-*Why would I want this?* The UMLS, or Unified Medical Language System, is a very comprehensive database of various medical terms and concepts. Access to it would allow CliNER to leverage domain-specific knowledge.
-
-SORRY! This resource is contains potentially sensitive clinical data, and requires a confidentiality agreement. We can't do that part for you. 
-
-In order to use the UMLS tables, you must request a license. 
-See: http://www.nlm.nih.gov/databases/umls.html
-
-How to obtain UMLS tables:
-
-* Download all the files from: https://www.nlm.nih.gov/research/umls/licensedcontent/umlsknowledgesources.html
-* Unzip mmsys.zip into a folder and put all other files downloaded into that folder.
-* Execute run_linux.sh and select 'Install UMLS' on gui.
-* Choose a destination for umls directory, hit 'Ok' and then 'Create New Config'.
-* Accept the agreement.
-* Select 'Only Active UMLS Sources' as your default subset.
-* Select 'Done' at the top right of gui pane and then select 'Begin Subset'.
-* This process may take a while, the directory '<Destination_Directory_Path>/<UMLS VERSION>/META' should contain the necessary files needed.
-
-You will need to get following tables: **LRARBR, MRREL.RRF, MRCONSO.RRF, MRSTY.RRF**
-
-**Put these tables in the $CLINER_DIR/umls_tables directory.**
+*Why would I want this?* 
 
-In order to tell CliNER that the tables are there, you must edit the file "$CLINER_DIR/config.txt" and change the line saying "UMLS  None" to "UMLS <path to dir containing tables>".
+The UMLS, or Unified Medical Language System, is a very comprehensive database of various medical terms and concepts. Access to it would allow CliNER to leverage domain-specific knowledge.
 
-**The database will be built from the tables when CliNER is run for the first time.**
-
-**[UMLS Reference](https://www.nlm.nih.gov/research/umls/quickstart.html)**   
+For installation of optional resources, please refer to the CliNER Wiki: 
+    https://github.com/text-machine-lab/CliNER/wiki#optional-resources
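
The "Verifying Installation" steps added in this commit could be collected into one script. The following is only a sketch, not part of the commit: the function names `model_ready` and `verify_install` are hypothetical, and it assumes it is run from the root of the cloned CliNER repository with `wget` and the `cliner` command available. The URL, paths, and `cliner predict` flags are copied from the README text above.

```shell
#!/bin/sh
# Sketch only: wraps the README's "Verifying Installation" commands.
# model_ready and verify_install are hypothetical helper names.

MODEL_URL="http://text-machine.cs.uml.edu/cliner/models/silver.crf"
MODEL_PATH="models/silver.crf"

# Succeeds if the given file exists and is non-empty.
model_ready() {
    [ -s "$1" ]
}

# Downloads the silver model if it is missing, then runs the README's
# example prediction. Assumes the current directory is the CliNER checkout.
verify_install() {
    mkdir -p models
    if ! model_ready "$MODEL_PATH"; then
        wget -O "$MODEL_PATH" "$MODEL_URL" || return 1
    fi
    cliner predict --txt data/examples/ex_doc.txt --out data/predictions \
        --model "$MODEL_PATH" --format i2b2
}
```

After sourcing the script, calling `verify_install` runs the check; a non-zero exit status suggests that the download or the prediction step failed.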