rake4j

rake4j is a Java rewrite of Python RAKE.

It implements the Rapid Automatic Keyword Extraction (RAKE) algorithm described in: Rose, S., Engel, D., Cramer, N., & Cowley, W. (2010). Automatic Keyword Extraction from Individual Documents.

Run

Sample

Normal run

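        // text holds the raw content of the document to analyze
        Document doc = new Document(text);
        RakeAnalyzer rake = new RakeAnalyzer();
        rake.loadDocument(doc);
        // extract keywords without offset information
        rake.runWithoutOffset();
        System.out.println(doc.termListToString());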

Run with offset information and stemming

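        // text holds the raw content of the document to analyze
        Document doc = new Document(text);
        RakeAnalyzer rake = new RakeAnalyzer();
        rake.loadDocument(doc);
        // extract keywords with offset information and stemming
        rake.run();
        System.out.println(doc.termMapToString());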

Features

Recognizes keywords with the RAKE algorithm, which uses stop words to delimit candidate phrases.

Adjoins keywords separated by interior stop words, so that phrases such as "axis of evil" are recognized.
Applies the KStemming algorithm ported from Lucene, for example stemming "university students" to "university student".
Constructs an index of keywords with term frequency (tf) and document frequency (df).
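
To make the stop-word-based extraction concrete, the following is a minimal, self-contained sketch of the core RAKE steps from the paper: split the text into candidate phrases at stop words and punctuation, then score each phrase as the sum of deg(w)/freq(w) over its words. It is not the rake4j implementation. The class name `RakeSketch`, the methods `candidates` and `score`, and the tiny stop list are all hypothetical and chosen for illustration, and the sketch leaves out adjoining keywords, stemming, and the tf/df index.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

public class RakeSketch {
    // Tiny illustrative stop list (an assumption); a real run would load a full stop-word file.
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList(
            "a", "an", "and", "for", "in", "is", "of", "on", "the", "to"));

    /** Splits text into candidate phrases at punctuation and stop words. */
    static List<List<String>> candidates(String text) {
        List<List<String>> phrases = new ArrayList<>();
        // Punctuation always ends a phrase, so split on it first.
        for (String fragment : text.toLowerCase(Locale.ROOT).split("[^a-z0-9\\s]+")) {
            List<String> current = new ArrayList<>();
            for (String word : fragment.trim().split("\\s+")) {
                if (word.isEmpty() || STOP_WORDS.contains(word)) {
                    // A stop word closes the phrase collected so far.
                    if (!current.isEmpty()) {
                        phrases.add(current);
                    }
                    current = new ArrayList<>();
                } else {
                    current.add(word);
                }
            }
            if (!current.isEmpty()) {
                phrases.add(current);
            }
        }
        return phrases;
    }

    /** Scores each phrase as the sum of deg(w) / freq(w) over its words. */
    static Map<String, Double> score(List<List<String>> phrases) {
        Map<String, Integer> freq = new HashMap<>();
        Map<String, Integer> degree = new HashMap<>();
        for (List<String> phrase : phrases) {
            for (String word : phrase) {
                freq.merge(word, 1, Integer::sum);
                // Degree counts co-occurrences within the phrase, including the word itself.
                degree.merge(word, phrase.size(), Integer::sum);
            }
        }
        Map<String, Double> scores = new LinkedHashMap<>();
        for (List<String> phrase : phrases) {
            double total = 0.0;
            for (String word : phrase) {
                total += (double) degree.get(word) / freq.get(word);
            }
            scores.put(String.join(" ", phrase), total);
        }
        return scores;
    }

    public static void main(String[] args) {
        String text = "Rapid automatic keyword extraction for the analysis of documents.";
        score(candidates(text)).forEach((phrase, s) -> System.out.println(phrase + " -> " + s));
    }
}
```

Long phrases score highly because every word in them gains degree from its neighbours, which is why a step such as adjoining keywords matters for phrases that contain interior stop words.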

Dependencies

The build depends on another custom Maven module, which must be declared in pom.xml:

        <dependency>
            <groupId>io.deepreader.java.commons</groupId>
            <artifactId>commons-util</artifactId>
            <version>1.0-SNAPSHOT</version>
        </dependency>

You can get the module manually by cloning its repository:

        git clone https://github.com/idf/commons-util

References

Python RAKE
Python RAKE (forked)
Java RAKE
