This project has been archived. For more information about BioMedICUS see the main project repository: https://github.com/nlpie/biomedicus
A lightweight (small and dependency-free) Java 8 library for Penn-like tokenization. This was developed as a stand-alone component of BioMedICUS, a biomedical and clinical NLP engine developed by the NLP-IE Group at the University of Minnesota Institute for Health Informatics.
To use the library in a Maven project, add the following dependency to your pom.xml:
<dependencies>
  <dependency>
    <groupId>edu.umn.biomedicus</groupId>
    <artifactId>biomedicus-tokenization</artifactId>
    <version>0.0.3</version>
  </dependency>
</dependencies>
Alternatively, download the .jar and add it to your project's classpath.
import edu.umn.biomedicus.tokenization.Tokenizer;
import edu.umn.biomedicus.tokenization.TokenResult;

public class Example {
  public void example() {
    String text = "An example sentence.";
    // Iterate over the tokens of the text one at a time.
    for (TokenResult result : Tokenizer.tokenize(text)) {
      // Pass the original text back in to get the token's characters.
      CharSequence tokenText = result.text(text);
    }
  }
}
import java.util.List;

import edu.umn.biomedicus.tokenization.Tokenizer;
import edu.umn.biomedicus.tokenization.TokenResult;

public class Example {
  public void example() {
    String text = "An example sentence.";
    // Get all tokens of the text as a list.
    List<TokenResult> results = Tokenizer.allTokens(text);
    for (TokenResult result : results) {
      // Pass the original text back in to get the token's characters.
      CharSequence tokenText = result.text(text);
    }
  }
}
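In the examples above, result.text(text) takes the original string as an argument, which suggests that a result identifies a token by its position in the text and does not hold its own copy of the characters. The sketch below illustrates that pattern only. It is not the library's implementation: SpanSketch, Span, and whitespaceSpans are hypothetical names, the offset-based design is an assumption, and the splitting rule is plain whitespace, not Penn-like tokenization.

```java
import java.util.ArrayList;
import java.util.List;

public class SpanSketch {

  /** Hypothetical stand-in for a token result: stores offsets, not text. */
  public static final class Span {
    public final int start; // inclusive offset into the source text
    public final int end;   // exclusive offset into the source text

    public Span(int start, int end) {
      this.start = start;
      this.end = end;
    }

    /** Returns the characters this span covers in the given source text. */
    public CharSequence text(CharSequence source) {
      return source.subSequence(start, end);
    }
  }

  /**
   * Splits on whitespace only. A Penn-like tokenizer applies further rules
   * (for example, separating punctuation), which this sketch omits.
   */
  public static List<Span> whitespaceSpans(CharSequence source) {
    List<Span> spans = new ArrayList<>();
    int start = -1; // -1 means we are not currently inside a token
    for (int i = 0; i < source.length(); i++) {
      if (Character.isWhitespace(source.charAt(i))) {
        if (start >= 0) {
          spans.add(new Span(start, i));
          start = -1;
        }
      } else if (start < 0) {
        start = i;
      }
    }
    if (start >= 0) {
      spans.add(new Span(start, source.length()));
    }
    return spans;
  }

  public static void main(String[] args) {
    String text = "An example sentence.";
    for (Span span : whitespaceSpans(text)) {
      System.out.println(span.start + "-" + span.end + ": " + span.text(text));
    }
  }
}
```

Under this assumed design, a result is two integers per token, and the offsets can be used to map each token back to its location in the source document.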
You can find the API documentation for this project here.
For issues or enhancement requests, feel free to submit to the Issues tab on GitHub.
BioMedICUS has a Gitter chat and a Google Group for contacting developers with questions, suggestions, or feedback.
BioMedICUS is developed by the University of Minnesota Institute for Health Informatics NLP/IE Group with assistance from the Open Health Natural Language Processing (OHNLP) Consortium.
Anyone is welcome and encouraged to contribute. If you discover a bug, or think the project could use an enhancement, follow these steps:
- Create an issue and offer to code a solution. We can discuss the issue and decide whether any code would be a good addition to the project.
- Fork the project. [https://github.com/nlpie/biomedicus-tokenizer/fork]
- Create a feature branch (git checkout -b feature-name).
- Code your solution.
- Follow the Google style guide for Java. There are IDE profiles available here.
- Write unit tests for any non-trivial aspects of your code. If you are fixing a bug, write a regression test: one that confirms the behavior you fixed stays fixed.
- Commit to your branch (git commit -am 'Summary of changes').
- Push to GitHub (git push origin feature-name).
- Create a pull request on this repository from your forked project. We will review and discuss your code and merge it.