This is a Java rewrite of the Python RAKE implementation.
An implementation of the Rapid Automatic Keyword Extraction (RAKE) algorithm, as described in: Rose, S., Engel, D., Cramer, N., & Cowley, W. (2010). Automatic Keyword Extraction from Individual Documents. In M. W. Berry & J. Kogan (Eds.), Text Mining: Applications and Theory. John Wiley & Sons.
Normal run
Document doc = new Document(text);
RakeAnalyzer rake = new RakeAnalyzer();
rake.loadDocument(doc);
rake.runWithoutOffset();
System.out.println(doc.termListToString());
Run with offset information and stemming
Document doc = new Document(text);
RakeAnalyzer rake = new RakeAnalyzer();
rake.loadDocument(doc);
rake.run();
System.out.println(doc.termMapToString());
Keywords are recognized by the stop-word-based algorithm, with the following additions:
- Adjoining keywords, to recognize phrases that contain interior stop words, such as "axis of evil".
- KStemming algorithm ported from Lucene, for example to stem "university students" to "university student".
- Construct an index of keywords with term frequency (tf) and document frequency (df).
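The stop-word-based recognition step can be illustrated with a minimal, self-contained sketch. It is not this library's API: the class `RakeSketch` and the methods `candidatePhrases` and `scorePhrases` are hypothetical names chosen for illustration. The sketch assumes the scoring described in the RAKE paper, where each word scores degree divided by frequency and a candidate phrase scores the sum of its word scores. Adjoining keywords, stemming, and the tf/df index are left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration class; not part of this project.
class RakeSketch {

    /** Splits text into candidate phrases at punctuation and at stop words. */
    static List<List<String>> candidatePhrases(String text, Set<String> stopWords) {
        List<List<String>> phrases = new ArrayList<>();
        // Punctuation always ends a candidate phrase.
        for (String fragment : text.toLowerCase().split("[.,;:!?()\\[\\]\"]+")) {
            List<String> current = new ArrayList<>();
            for (String word : fragment.trim().split("\\s+")) {
                if (word.isEmpty() || stopWords.contains(word)) {
                    // A stop word closes the phrase collected so far.
                    if (!current.isEmpty()) {
                        phrases.add(current);
                        current = new ArrayList<>();
                    }
                } else {
                    current.add(word);
                }
            }
            if (!current.isEmpty()) {
                phrases.add(current);
            }
        }
        return phrases;
    }

    /** Scores each phrase as the sum of degree(word) / frequency(word). */
    static Map<String, Double> scorePhrases(List<List<String>> phrases) {
        Map<String, Integer> freq = new HashMap<>();
        Map<String, Integer> degree = new HashMap<>();
        for (List<String> phrase : phrases) {
            for (String word : phrase) {
                freq.merge(word, 1, Integer::sum);
                // Degree counts co-occurring words, including the word itself.
                degree.merge(word, phrase.size(), Integer::sum);
            }
        }
        Map<String, Double> scores = new LinkedHashMap<>();
        for (List<String> phrase : phrases) {
            double score = 0.0;
            for (String word : phrase) {
                score += (double) degree.get(word) / freq.get(word);
            }
            scores.put(String.join(" ", phrase), score);
        }
        return scores;
    }

    public static void main(String[] args) {
        Set<String> stopWords = new HashSet<>(Arrays.asList("of"));
        String text = "Compatibility of systems of linear constraints";
        // prints {compatibility=1.0, systems=1.0, linear constraints=4.0}
        System.out.println(scorePhrases(candidatePhrases(text, stopWords)));
    }
}
```

Because a word's degree grows with the length of the phrases it appears in, multi-word candidates such as "linear constraints" outrank isolated words, which is why a reasonably complete stop-word list matters for the quality of the extracted keywords.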
The pom.xml requires one additional custom Maven module as a dependency:
<dependency>
    <groupId>io.deepreader.java.commons</groupId>
    <artifactId>commons-util</artifactId>
    <version>1.0-SNAPSHOT</version>
</dependency>
You can get the module manually by cloning its repository, which is hosted on GitHub:
git clone https://github.com/idf/commons-util