Program which searches through thousands of papers and tries to match a certain input string

nickhir/PhraseBase


PhraseBase

PhraseBase is a small program for quickly searching thousands of papers for a specific phrase or expression.

Installation

  1. Download the papers that the program will search through from Kaggle. Kaggle download

  2. Unzip the document_parses.zip file. This can take a long time, since approximately 200,000 papers are included, and unzipping everything takes up around 20 GB of disk space. However, you can stop the unzipping at any point and work with only a fraction of the papers, e.g. only 10,000 papers.

  3. Clone the repository to the desired location and create an environment that contains all necessary packages:

    git clone https://github.com/nickhir/PhraseBase.git
    cd PhraseBase
    conda env create -f environment.yml
    conda activate PhraseBase
    
  4. Lastly, modify path.txt so that it points to the directory containing the JSON files from the archive you downloaded in step 1 and unzipped in step 2. The path should look something like path/to/the/directory.
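If you only want a fraction of the papers (step 2), you do not have to interrupt the unzipping by hand. The following is a minimal sketch, not part of PhraseBase itself, that extracts only the first N JSON files from the archive using Python's standard zipfile module. The function name `extract_first_n` and the `limit` parameter are hypothetical names chosen for illustration.

```python
import zipfile
from pathlib import Path


def extract_first_n(zip_path, out_dir, limit=10_000):
    """Extract at most `limit` .json members from the archive; return the count."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    extracted = 0
    with zipfile.ZipFile(zip_path) as archive:
        for member in archive.infolist():
            if extracted >= limit:
                break
            # Skip directory entries and anything that is not a JSON file.
            if member.is_dir() or not member.filename.endswith(".json"):
                continue
            # Members keep their relative path from the archive below out_dir.
            archive.extract(member, out_dir)
            extracted += 1
    return extracted
```

For example, `extract_first_n("document_parses.zip", "papers", limit=10_000)` would stop after 10,000 files. The files keep the folder layout they have inside the archive, so path.txt should point at whichever directory actually ends up containing the JSON files.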

How to use

Simply run python phrasebase.py inside the PhraseBase environment. You can then type in the string you are looking for; regular expressions are supported. The program iterates through every JSON file located in the path that you specified. You can stop the search at any time by pressing Ctrl+C and look at the results found so far.

You can also run python phrasebase.py --IGNORECASE, which makes the search ignore upper/lower case in your input string.
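To illustrate the idea, here is a rough sketch of such a search loop. It is not the actual code in phrasebase.py: the function name `search_papers` is hypothetical, and the sketch assumes it is enough to match against the raw text of each file rather than parsing its JSON structure. It does show how a Ctrl+C interrupt can end the scan while keeping the matches collected so far.

```python
import re
from pathlib import Path


def search_papers(directory, pattern, flags=0):
    """Return {file name: [matched strings]} for each JSON file containing `pattern`."""
    regex = re.compile(pattern, flags)
    results = {}
    try:
        # Assumption: JSON files may sit in subdirectories, so search recursively.
        for path in sorted(Path(directory).rglob("*.json")):
            text = path.read_text(encoding="utf-8", errors="replace")
            matches = [m.group(0) for m in regex.finditer(text)]
            if matches:
                results[path.name] = matches
    except KeyboardInterrupt:
        # Ctrl+C stops the scan early; whatever was found so far is still returned.
        pass
    return results
```

Calling `search_papers("path/to/the/directory", r"ACE\d")` would return the matching files together with the strings that matched in each of them.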

Running python phrasebase.py --exact will only return exact matches.
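One plausible way to wire up these two flags is sketched below. This is an illustration rather than the code in phrasebase.py. The helper names `parse_args` and `build_regex` are hypothetical, and the sketch assumes that "exact" means the input is taken literally instead of being interpreted as a regular expression; the actual program may define exact matches differently.

```python
import argparse
import re


def parse_args(argv=None):
    """Parse the two command-line flags described in this README."""
    parser = argparse.ArgumentParser(description="Search papers for a phrase.")
    parser.add_argument("--IGNORECASE", dest="ignore_case", action="store_true",
                        help="ignore upper/lower case of the input string")
    parser.add_argument("--exact", action="store_true",
                        help="only return exact matches")
    return parser.parse_args(argv)


def build_regex(query, ignore_case=False, exact=False):
    """Compile the user's input into a regular expression object."""
    if exact:
        # Assumption: "exact" = match the input literally, so escape regex syntax.
        query = re.escape(query)
    flags = re.IGNORECASE if ignore_case else 0
    return re.compile(query, flags)
```

With this sketch, `build_regex("covid", ignore_case=True)` also matches "COVID", while `build_regex("a.c", exact=True)` matches only the literal text "a.c" and not "abc".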

