This repository has been archived by the owner on Aug 15, 2020. It is now read-only.

Commit
Update README.rst
Sim Dy authored Aug 16, 2018
1 parent 4fc765d commit 5d86336
Showing 1 changed file with 20 additions and 52 deletions.
72 changes: 20 additions & 52 deletions README.rst
@@ -15,27 +15,32 @@ Please note that for optimal performance, CliNER requires the users to obtain a
* See the CliNER Wiki page for additional resources.

https://github.com/text-machine-lab/CliNER/wiki


Out-of-the-Box Model
--------

Although i2b2 licensing prevents us from releasing our CliNER models trained on i2b2 data, we generated some comparable models from automatically annotated MIMIC II text.

This silver MIMIC model can be found at http://text-machine.cs.uml.edu/cliner/models/silver.crf

Installation
--------

$ git clone https://github.com/text-machine-lab/CliNER.git

$ pip install -r requirements.txt

Verifying Installation
--------

$ wget http://text-machine.cs.uml.edu/cliner/models/silver.crf

$ mv silver.crf models/silver.crf

$ cliner predict --txt data/examples/ex_doc.txt --out data/predictions --model models/silver.crf --format i2b2
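
The verification steps above can be wrapped in a small script. This is a minimal sketch, not part of CliNER: the function names ``run`` and ``verify_install`` and the ``DRY_RUN`` variable are made up for illustration, while the model URL, paths, and ``cliner predict`` flags are copied from the commands above. It assumes ``wget`` and ``cliner`` are on your ``PATH`` and that you run it from the repository root.

```shell
#!/bin/sh
# Sketch only: fetch the silver model if it is missing, then run the sample prediction.
# URL, paths, and flags come from the README; the helper names are hypothetical.

MODEL_URL="http://text-machine.cs.uml.edu/cliner/models/silver.crf"
MODEL_PATH="models/silver.crf"

run() {
    # With DRY_RUN=1, print the command instead of executing it.
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

verify_install() {
    # Download the model only when it is not already in place.
    if [ ! -f "$MODEL_PATH" ]; then
        run mkdir -p "$(dirname "$MODEL_PATH")"
        run wget -O "$MODEL_PATH" "$MODEL_URL"
    fi
    # Same prediction command as in the README.
    run cliner predict --txt data/examples/ex_doc.txt --out data/predictions \
        --model "$MODEL_PATH" --format i2b2
}
```

Calling ``verify_install`` runs the steps; setting ``DRY_RUN=1`` first prints the commands without running them, which is a cheap way to check paths before downloading anything.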


Out-of-the-Box Model
--------

Although i2b2 licensing prevents us from releasing our CliNER models trained on i2b2 data, we generated some comparable models from automatically annotated MIMIC II text.

This silver MIMIC model can be found at http://text-machine.cs.uml.edu/cliner/models/silver.crf

If you *do not* run into build errors, then your installation is complete.

Example Data
--------
@@ -57,7 +62,7 @@ Usage

Here are some use cases:

(1) Check that CliNER installed correctly
(1) Help

This help message will list the options available to run (train/predict/evaluate)

@@ -89,59 +94,22 @@ This allows us to evaluate how well CliNER does by comparing it against a gold s

Evaluate the system predictions against the gold standard. Both sets of data must be in the same format, and that format must be specified. In this example, both the examples and data/test_predictions directories contain the file pretend.con.
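
Since evaluation pairs files by name across the two directories, it can help to check for missing predictions first. The sketch below is an illustration only and is not shipped with CliNER; ``missing_predictions`` is a hypothetical helper, and it assumes files are matched purely by file name and extension.

```shell
#!/bin/sh
# missing_predictions GOLD_DIR PRED_DIR EXT  (hypothetical helper, not a CliNER command)
# Prints the name of every GOLD_DIR/*.EXT file that has no same-named file in PRED_DIR.
# Returns 0 when nothing is missing and 1 otherwise.
missing_predictions() {
    gold_dir=$1
    pred_dir=$2
    ext=$3
    status=0
    for gold in "$gold_dir"/*."$ext"; do
        [ -e "$gold" ] || continue    # the glob matched nothing
        name=$(basename "$gold")
        if [ ! -f "$pred_dir/$name" ]; then
            echo "$name"
            status=1
        fi
    done
    return "$status"
}
```

For the example in the text, the call would take the gold directory, ``data/test_predictions``, and ``con`` as its three arguments; an empty result means every gold file such as pretend.con has a counterpart.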



Optional Resources
--------

There are a few external resources that are not packaged with CliNER but can improve prediction performance for feature extraction with the CRF.

**GENIA**

*Why would I want this?* The GENIA tagger is a tool similar to CliNER but designed for Biomedical text. Depending on the domain of your data, this tool's pretrained model may or may not be able to improve performance for CliNER as it detects concepts.

The GENIA tagger identifies named entities in biomedical text.
To install:

> wget http://www.nactem.ac.uk/tsujii/GENIA/tagger/geniatagger-3.0.2.tar.gz

> tar xzvf geniatagger-3.0.2.tar.gz

> cd geniatagger-3.0.2

> make

Edit config.txt so that GENIA references the geniatagger executable just built. (e.g. "GENIA /someuser/CliNER/geniatagger-3.0.2/geniatagger")

[GENIA Reference](http://www.nactem.ac.uk/tsujii/GENIA/tagger/)
*Why would I want this?*

The GENIA tagger is a tool similar to CliNER but designed for biomedical text. Depending on the domain of your data, this tool's pretrained model may or may not be able to improve performance for CliNER as it detects concepts.

**UMLS**

*Why would I want this?* The UMLS, or Unified Medical Language System, is a very comprehensive database of various medical terms and concepts. Access to it would allow CliNER to leverage domain-specific knowledge.

Sorry! This resource contains potentially sensitive clinical data and requires a confidentiality agreement. We can't do that part for you.

In order to use the UMLS tables, you must request a license.
See: http://www.nlm.nih.gov/databases/umls.html

How to obtain UMLS tables:

* Download all the files from: https://www.nlm.nih.gov/research/umls/licensedcontent/umlsknowledgesources.html
* Unzip mmsys.zip into a folder and put all other files downloaded into that folder.
* Execute run_linux.sh and select 'Install UMLS' in the GUI.
* Choose a destination for umls directory, hit 'Ok' and then 'Create New Config'.
* Accept the agreement.
* Select 'Only Active UMLS Sources' as your default subset.
* Select 'Done' at the top right of the GUI pane and then select 'Begin Subset'.
* This process may take a while. When it finishes, the directory '<Destination_Directory_Path>/<UMLS VERSION>/META' should contain the necessary files.

You will need the following tables: **LRARBR, MRREL.RRF, MRCONSO.RRF, MRSTY.RRF**

**Put these tables in the $CLINER_DIR/umls_tables directory.**
*Why would I want this?*

In order to tell CliNER that the tables are there, you must edit the file "$CLINER_DIR/config.txt" and change the line saying "UMLS None" to "UMLS <path to dir containing tables>".
The UMLS, or Unified Medical Language System, is a very comprehensive database of various medical terms and concepts. Access to it would allow CliNER to leverage domain-specific knowledge.

**The database will be built from the tables when CliNER is run for the first time.**
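
The two manual steps above (placing the tables and editing config.txt) can be checked and applied with a short script. This is a sketch under stated assumptions rather than a CliNER tool: ``check_umls_tables`` and ``point_config_at_umls`` are hypothetical names, the table names are copied verbatim from the list above, and the config file is assumed to hold one ``KEY value`` pair per line as in the "UMLS None" example.

```shell
#!/bin/sh
# Table names copied verbatim from the README list.
UMLS_TABLES="LRARBR MRREL.RRF MRCONSO.RRF MRSTY.RRF"

# check_umls_tables DIR  (hypothetical helper)
# Prints one line per missing table; returns 1 if any table is missing.
check_umls_tables() {
    dir=$1
    status=0
    for table in $UMLS_TABLES; do
        if [ ! -f "$dir/$table" ]; then
            echo "missing: $table"
            status=1
        fi
    done
    return "$status"
}

# point_config_at_umls CONFIG DIR  (hypothetical helper)
# Rewrites the line starting with "UMLS " in CONFIG so that it names DIR.
# "|" is the sed delimiter because paths contain "/"; DIR must not contain "|" or "&".
point_config_at_umls() {
    config=$1
    dir=$2
    tmp=$(mktemp)
    sed "s|^UMLS .*|UMLS $dir|" "$config" > "$tmp" &&
        cat "$tmp" > "$config"    # write back in place, keeping the file's permissions
    rm -f "$tmp"
}
```

A typical sequence would be ``check_umls_tables "$CLINER_DIR/umls_tables"`` followed, if nothing is reported missing, by ``point_config_at_umls "$CLINER_DIR/config.txt" "$CLINER_DIR/umls_tables"``. Other lines in the config file are left untouched.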

**[UMLS Reference](https://www.nlm.nih.gov/research/umls/quickstart.html)**
For installation of optional resources, please refer to the CliNER Wiki:
https://github.com/text-machine-lab/CliNER/wiki#optional-resources
