A Java implementation of the Rapid Automatic Keyword Extraction (RAKE)
idf/rake4j
rake4j

This is a rewrite of Python RAKE in Java.

An implementation of the Rapid Automatic Keyword Extraction (RAKE) algorithm as described in: Rose, S., Engel, D., Cramer, N., & Cowley, W. (2010). Automatic Keyword Extraction from Individual Documents. In M. W. Berry & J. Kogan (Eds.), Text Mining: Applications and Theory. John Wiley & Sons.
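
For readers unfamiliar with RAKE, the core idea can be sketched in a few lines: split the text into candidate phrases at stop words and punctuation, then score each phrase as the sum of deg(w) / freq(w) over its words. The class below is a minimal, stand-alone illustration under simplifying assumptions (ASCII-only tokenization, a caller-supplied stop word set). The names RakeSketch, candidatePhrases, and score are hypothetical and are not part of the rake4j API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Hypothetical stand-alone sketch of RAKE scoring; not the rake4j API. */
class RakeSketch {

    /** Splits text into candidate phrases at stop words and punctuation. */
    static List<List<String>> candidatePhrases(String text, Set<String> stopWords) {
        List<List<String>> phrases = new ArrayList<>();
        // Simplification: anything other than a-z, 0-9, or whitespace ends a phrase.
        for (String fragment : text.toLowerCase(Locale.ROOT).split("[^a-z0-9\\s]+")) {
            List<String> current = new ArrayList<>();
            for (String token : fragment.trim().split("\\s+")) {
                if (token.isEmpty()) {
                    continue;
                }
                if (stopWords.contains(token)) {
                    // A stop word closes the phrase collected so far.
                    if (!current.isEmpty()) {
                        phrases.add(current);
                    }
                    current = new ArrayList<>();
                } else {
                    current.add(token);
                }
            }
            if (!current.isEmpty()) {
                phrases.add(current);
            }
        }
        return phrases;
    }

    /** Scores each phrase as the sum of deg(w) / freq(w) over its words. */
    static Map<String, Double> score(List<List<String>> phrases) {
        Map<String, Integer> freq = new HashMap<>();
        Map<String, Integer> degree = new HashMap<>();
        for (List<String> phrase : phrases) {
            for (String word : phrase) {
                freq.merge(word, 1, Integer::sum);
                // deg(w) counts co-occurrences within phrases, including w itself.
                degree.merge(word, phrase.size(), Integer::sum);
            }
        }
        Map<String, Double> scores = new LinkedHashMap<>();
        for (List<String> phrase : phrases) {
            double total = 0.0;
            for (String word : phrase) {
                total += (double) degree.get(word) / freq.get(word);
            }
            scores.put(String.join(" ", phrase), total);
        }
        return scores;
    }
}
```

Calling RakeSketch.score(RakeSketch.candidatePhrases(text, stopWords)) returns a map from each candidate phrase to its score. Longer phrases made of words that rarely occur alone tend to score highest.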

Run

Sample

Normal run

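        // text holds the raw input string to extract keywords from
        Document doc = new Document(text);
        RakeAnalyzer rake = new RakeAnalyzer();
        rake.loadDocument(doc);
        // Extract keywords without recording their offsets in the text
        rake.runWithoutOffset();
        System.out.println(doc.termListToString());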

Run with offset information and stemming

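        // text holds the raw input string to extract keywords from
        Document doc = new Document(text);
        RakeAnalyzer rake = new RakeAnalyzer();
        rake.loadDocument(doc);
        // Extract keywords with offset information and stemming
        rake.run();
        System.out.println(doc.termMapToString());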

Features

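Keywords are recognized by the RAKE algorithm, which splits text into candidate phrases at stop words. In addition:

  • Adjoining keywords: phrases that contain interior stop words, such as "axis of evil", are recognized.
  • KStemming: the KStem algorithm, ported from Lucene, stems "university students" to "university student".
  • Keyword index: an index of keywords is constructed with term frequency (tf) and document frequency (df).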
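
As a rough illustration of the keyword index in the last item, the sketch below keeps two counters per keyword. It assumes that tf is the total number of occurrences across all indexed documents and that df is the number of documents containing the keyword; rake4j may define or store these differently. The class name KeywordIndexSketch and its members are hypothetical and do not come from the rake4j source.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical keyword index; the names here are not taken from rake4j. */
class KeywordIndexSketch {
    // tf (assumed): total occurrences of a keyword across all documents added so far.
    final Map<String, Integer> tf = new TreeMap<>();
    // df: number of documents that contain the keyword at least once.
    final Map<String, Integer> df = new TreeMap<>();

    /** Adds the keywords extracted from one document to the index. */
    void addDocument(List<String> keywords) {
        Set<String> seen = new HashSet<>();
        for (String keyword : keywords) {
            tf.merge(keyword, 1, Integer::sum);
            // Count each keyword only once per document for df.
            if (seen.add(keyword)) {
                df.merge(keyword, 1, Integer::sum);
            }
        }
    }
}
```

Indexing stemmed keywords, so that "university students" and "university student" share one entry, keeps both counts from being split across surface forms.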

Dependencies

The pom.xml requires one additional custom Maven module as a dependency:

        <dependency>
            <groupId>io.deepreader.java.commons</groupId>
            <artifactId>commons-util</artifactId>
            <version>1.0-SNAPSHOT</version>
        </dependency>

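You can obtain the module manually by cloning its repository:

git clone https://github.com/idf/commons-util

Because the dependency is a SNAPSHOT version, it will likely need to be installed into your local Maven repository (for example, with mvn install) before rake4j builds.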

References

Python RAKE
Python RAKE (forked)
Java RAKE
