rake4j

This is a rewrite of Python RAKE in Java.

An implementation of the Rapid Automatic Keyword Extraction (RAKE) algorithm as described in: Rose, S., Engel, D., Cramer, N., & Cowley, W. (2010). Automatic Keyword Extraction from Individual Documents. In Text Mining: Applications and Theory.
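
For orientation, the core scoring idea of RAKE can be sketched in a few lines. This is a minimal illustration of the published algorithm and not the code in this repository: the class `RakeSketch`, its method names, and the tiny stop list are all hypothetical. It splits the text into candidate phrases at punctuation and stop words, scores each word as deg(w) / freq(w), and scores a phrase as the sum of its word scores. Tokenization is ASCII-only, and adjoining keywords and stemming are left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch class; not part of rake4j.
class RakeSketch {
    // Tiny illustrative stop list (assumption); a real run would load a full stop-word file.
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList(
            "a", "an", "and", "are", "for", "in", "is", "of", "on", "over", "the", "to"));

    /** Splits text into candidate phrases at punctuation and stop words. */
    static List<List<String>> candidatePhrases(String text) {
        List<List<String>> phrases = new ArrayList<>();
        // Any run of characters other than ASCII letters, digits, or whitespace ends a phrase.
        for (String fragment : text.toLowerCase().split("[^a-z0-9\\s]+")) {
            List<String> current = new ArrayList<>();
            for (String word : fragment.trim().split("\\s+")) {
                if (word.isEmpty() || STOP_WORDS.contains(word)) {
                    // A stop word closes the phrase collected so far.
                    if (!current.isEmpty()) {
                        phrases.add(current);
                        current = new ArrayList<>();
                    }
                } else {
                    current.add(word);
                }
            }
            if (!current.isEmpty()) {
                phrases.add(current);
            }
        }
        return phrases;
    }

    /** Scores each word as deg(w) / freq(w); the degree counts the word itself. */
    static Map<String, Double> wordScores(List<List<String>> phrases) {
        Map<String, Integer> freq = new HashMap<>();
        Map<String, Integer> degree = new HashMap<>();
        for (List<String> phrase : phrases) {
            for (String word : phrase) {
                freq.merge(word, 1, Integer::sum);
                degree.merge(word, phrase.size(), Integer::sum);
            }
        }
        Map<String, Double> scores = new HashMap<>();
        for (Map.Entry<String, Integer> e : freq.entrySet()) {
            scores.put(e.getKey(), degree.get(e.getKey()) / (double) e.getValue());
        }
        return scores;
    }

    /** Scores each candidate phrase as the sum of its word scores, in order of first appearance. */
    static Map<String, Double> phraseScores(String text) {
        List<List<String>> phrases = candidatePhrases(text);
        Map<String, Double> words = wordScores(phrases);
        Map<String, Double> result = new LinkedHashMap<>();
        for (List<String> phrase : phrases) {
            double sum = 0.0;
            for (String word : phrase) {
                sum += words.get(word);
            }
            result.put(String.join(" ", phrase), sum);
        }
        return result;
    }
}
```

Calling `RakeSketch.phraseScores(text)` returns a map from candidate phrase to score. Longer phrases made of words that rarely occur alone tend to score highest, which is the behaviour the paper relies on.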

Run

Sample

Normal run

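        // text is a String holding the raw document content.
        Document doc = new Document(text);
        RakeAnalyzer rake = new RakeAnalyzer();
        rake.loadDocument(doc);
        // Extract keywords without offset information.
        rake.runWithoutOffset();
        System.out.println(doc.termListToString());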

Run with offset information and stemming

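        // text is a String holding the raw document content.
        Document doc = new Document(text);
        RakeAnalyzer rake = new RakeAnalyzer();
        rake.loadDocument(doc);
        // Extract keywords with offset information and stemming.
        rake.run();
        System.out.println(doc.termMapToString());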

Features

Recognizes keywords with the stop-word-based RAKE algorithm

  • Joins adjoining keywords, so that phrases with interior stop words such as "axis of evil" are recognized.
  • KStem stemming algorithm ported from Lucene, which stems "university students" to "university student".
  • Builds an index of keywords with term frequency (tf) and document frequency (df).
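
As a rough illustration of the tf/df index in the last item, here is a minimal sketch. The class `KeywordIndexSketch` and its methods are hypothetical and do not reflect the actual rake4j classes. It also assumes that tf counts occurrences across all indexed documents, which the description above does not specify.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch class; not part of rake4j.
class KeywordIndexSketch {
    // Total occurrences of each keyword across all documents (assumed meaning of tf).
    private final Map<String, Integer> termFrequency = new HashMap<>();
    // Number of documents in which each keyword appears at least once.
    private final Map<String, Integer> documentFrequency = new HashMap<>();

    /** Adds the keywords extracted from one document. */
    void addDocument(List<String> keywords) {
        Set<String> seenInThisDocument = new HashSet<>();
        for (String keyword : keywords) {
            termFrequency.merge(keyword, 1, Integer::sum);
            // Count the document only the first time the keyword is seen in it.
            if (seenInThisDocument.add(keyword)) {
                documentFrequency.merge(keyword, 1, Integer::sum);
            }
        }
    }

    int tf(String keyword) {
        return termFrequency.getOrDefault(keyword, 0);
    }

    int df(String keyword) {
        return documentFrequency.getOrDefault(keyword, 0);
    }
}
```

Indexing stemmed keywords means that "university students" and "university student" share one entry, so their counts accumulate together.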

Dependencies

The pom.xml requires one additional custom Maven module as a dependency:

        <dependency>
            <groupId>io.deepreader.java.commons</groupId>
            <artifactId>commons-util</artifactId>
            <version>1.0-SNAPSHOT</version>
        </dependency>

You can get the module manually by cloning its repository:

git clone https://github.com/idf/commons-util

References

Python RAKE
Python RAKE (forked)
Java RAKE