
Implemented four retrieval models for search engine implementation and evaluated their performance.


spoorva/CS6200-Final_Project


CS6200-Final_Project

Search Engine Implementation

Members -
Nanditha Sundararajan
Poorva Sonparote
Shruti Parpattedar


Languages used - Python and Java
Coded in version - Python 3.7.2

Setup

This code requires the following software packages installed for it to run successfully:

  • Python 3.7
    Download and install from "https://www.python.org/downloads/"
  • Lucene 4.7.2
    Download and install Lucene from
    https://lucene.apache.org/
    https://archive.apache.org/dist/lucene/java/4.7.2/
  • BeautifulSoup package
    Can be downloaded from "https://www.crummy.com/software/BeautifulSoup/"
    Can be installed using pip by entering the following command in a terminal or command prompt:
    	 pip install beautifulsoup4
    

    Compile and Run

    Unzip the given solution folder into a local directory; this extracts all files required to run the project.

    Phase 1 -

    Task 1 - Four baseline runs
    Implementation of TF-IDF, the Query Likelihood Model (Jelinek-Mercer smoothing) and BM25 in Python. The program internally calls Indexer.py and Parser.py to parse and index the corpus.
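The three Python baselines can be sketched as scoring functions over an in-memory inverted index. This is a minimal sketch, not the project's Indexer.py or retrieval code: `build_index` and the toy corpus are hypothetical, the TF-IDF variant shown (length-normalized tf times log(N/df)) is only one of several common weightings, and k1 = 1.2, b = 0.75, k2 = 100 and λ = 0.35 are typical textbook values rather than settings confirmed by this README.

```python
import math
from collections import Counter


def build_index(docs):
    """Build an inverted index term -> {doc_id: tf} and a doc_id -> length map."""
    index, doc_len = {}, {}
    for doc_id, tokens in docs.items():
        doc_len[doc_id] = len(tokens)
        for term, tf in Counter(tokens).items():
            index.setdefault(term, {})[doc_id] = tf
    return index, doc_len


def tfidf_score(query, index, doc_len):
    """Sum of (length-normalized tf) * log(N / df) over the query terms."""
    n_docs = len(doc_len)
    scores = Counter()
    for term in query:
        postings = index.get(term, {})
        if not postings:
            continue
        idf = math.log(n_docs / len(postings))
        for doc_id, tf in postings.items():
            scores[doc_id] += (tf / doc_len[doc_id]) * idf
    return scores


def bm25_score(query, index, doc_len, k1=1.2, b=0.75, k2=100):
    """BM25 with no relevance information (r = R = 0)."""
    n_docs = len(doc_len)
    avdl = sum(doc_len.values()) / n_docs
    scores = Counter()
    for term, qf in Counter(query).items():
        postings = index.get(term, {})
        idf = math.log((n_docs - len(postings) + 0.5) / (len(postings) + 0.5))
        for doc_id, tf in postings.items():
            K = k1 * ((1 - b) + b * doc_len[doc_id] / avdl)
            scores[doc_id] += idf * ((k1 + 1) * tf / (K + tf)) * ((k2 + 1) * qf / (k2 + qf))
    return scores


def jm_qlm_score(query, index, doc_len, lam=0.35):
    """Query likelihood with Jelinek-Mercer smoothing; lam weights the collection model."""
    coll_len = sum(doc_len.values())
    scores = {doc_id: 0.0 for doc_id in doc_len}
    for term in query:
        postings = index.get(term, {})
        cf = sum(postings.values())
        if cf == 0:
            continue  # term not in the collection: skip to avoid log(0)
        for doc_id, dl in doc_len.items():
            p = (1 - lam) * postings.get(doc_id, 0) / dl + lam * cf / coll_len
            scores[doc_id] += math.log(p)
    return scores


# Toy corpus (illustrative data, not the project's corpus)
docs = {
    "d1": ["information", "retrieval", "system"],
    "d2": ["database", "system", "design"],
    "d3": ["information", "theory"],
    "d4": ["query", "language", "model"],
    "d5": ["operating", "system", "kernel"],
}
index, doc_len = build_index(docs)
for fn in (tfidf_score, bm25_score, jm_qlm_score):
    scores = fn(["information", "retrieval"], index, doc_len)
    print(fn.__name__, max(scores, key=scores.get))  # d1 ranks first for all three here
```

Note that the query likelihood model scores every document (smoothing gives unseen terms a non-zero probability), while TF-IDF and BM25 only score documents that contain at least one query term.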

    Implementation of Lucene's default retrieval model using Java. The helper program Query_cleaning.py cleans the queries so that they can be used by Lucene.
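The exact rules in Query_cleaning.py are not described here. One plausible approach, sketched in Python, is to blank out the characters that Lucene's classic QueryParser treats as query syntax and lowercase the text so that words such as AND, OR and NOT are not read as boolean operators. `clean_query` is a hypothetical name; on the Java side, `QueryParser.escape()` is an alternative that escapes these characters rather than removing them.

```python
# Characters that Lucene's classic QueryParser treats as query syntax.
LUCENE_SPECIAL = '+-&|!(){}[]^"~*?:\\/'


def clean_query(raw: str) -> str:
    """Turn a free-text query into plain terms that QueryParser will not misread as syntax."""
    # Replace every syntax character with a space so adjacent words stay separate
    cleaned = raw.translate(str.maketrans({ch: " " for ch in LUCENE_SPECIAL}))
    # Lowercase so AND / OR / NOT are not parsed as operators, then collapse whitespace
    return " ".join(cleaned.lower().split())


print(clean_query("What articles exist which deal with TSS (Time Sharing System)?"))
# prints: what articles exist which deal with tss time sharing system
```

Removing `-` also splits hyphenated words into two terms, which is usually acceptable for a bag-of-words baseline.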

    Run the following commands -
    Task1-First3Runs.py
    Query_cleaning.py
    Lucene-proj/src/LuceneRun.java

    Task 2 - Query Enhancement
    Implementation of two query enhancement techniques, query-time stemming and pseudo-relevance feedback, using the BM25 retrieval model. The program internally calls Indexer.py and Parser.py to parse and index the corpus.
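Query-time stemming leaves the index unstemmed and instead expands each query word with the indexed words that share its stem. The sketch below assumes that approach; `toy_stem` is a hypothetical crude suffix stripper standing in for a real stemmer such as Porter's, and the expanded query would then be scored with BM25.

```python
from collections import defaultdict


def toy_stem(word: str) -> str:
    """Hypothetical crude suffix stripper (NOT Porter); keeps stems of 3+ characters."""
    for suffix in ("ations", "ation", "ings", "ing", "ers", "er", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def build_stem_classes(vocabulary):
    """Group index vocabulary words by their stem: stem -> {words}."""
    classes = defaultdict(set)
    for word in vocabulary:
        classes[toy_stem(word)].add(word)
    return classes


def expand_query(query_terms, stem_classes):
    """Replace each query term with every indexed word in its stem class."""
    expanded = []
    for term in query_terms:
        variants = stem_classes.get(toy_stem(term), {term})
        for v in sorted(variants | {term}):
            if v not in expanded:
                expanded.append(v)
    return expanded


vocabulary = ["compute", "computer", "computers", "computing", "system", "systems"]
classes = build_stem_classes(vocabulary)
print(expand_query(["computers", "system"], classes))
# prints ['computer', 'computers', 'computing', 'system', 'systems']
```

The crude stemmer leaves "compute" out of the "comput" class, which illustrates why the choice of stemmer matters for query-time expansion.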

    Run the following commands -
    Task2-QueryTimeStemming.py
    Task2-PseudoRelevance.py

    Task 3 - Stopping and Stemming Index
    Implementation of a stopped corpus (no stemming) and a stemmed corpus with stemmed queries, each run with the BM25 and TF-IDF retrieval models. The program internally calls Indexer.py and Parser.py to parse and index the corpus.
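Stopping and stemming at indexing time can be sketched as one index builder with optional stop-word and stemmer hooks; queries must go through the same processing so their terms match the index. The stop list and the `strip_plural` stemmer below are hypothetical placeholders, not the list or stemmer used in this project.

```python
import re
from collections import Counter, defaultdict

# Hypothetical stop list; a real run would load the full list from a file.
STOP_WORDS = frozenset({"a", "an", "and", "are", "for", "in", "is", "of", "on", "the", "to", "with"})


def tokenize(text):
    """Lowercase and split on anything that is not a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())


def strip_plural(term):
    """Hypothetical stand-in stemmer: drops a trailing 's' from longer words."""
    return term[:-1] if term.endswith("s") and len(term) > 3 else term


def build_processed_index(docs, stop_words=frozenset(), stemmer=None):
    """Inverted index term -> {doc_id: tf}, with optional stopping and stemming."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        terms = [t for t in tokenize(text) if t not in stop_words]
        if stemmer is not None:
            terms = [stemmer(t) for t in terms]
        for term, tf in Counter(terms).items():
            index[term][doc_id] = tf
    return dict(index)


def process_query(query, stop_words=frozenset(), stemmer=None):
    """Apply the same stopping/stemming as the index so query terms can match."""
    terms = [t for t in tokenize(query) if t not in stop_words]
    return [stemmer(t) for t in terms] if stemmer is not None else terms


docs = {"d1": "The design of sorting algorithms", "d2": "An algorithm for sorting"}
stopped = build_processed_index(docs, STOP_WORDS)
stemmed = build_processed_index(docs, STOP_WORDS, strip_plural)
print(sorted(stopped))        # ['algorithm', 'algorithms', 'design', 'sorting']
print(stemmed["algorithm"])   # {'d1': 1, 'd2': 1}
```

In the stopped-only index "algorithm" and "algorithms" stay separate terms; the stemmed index merges them, so a query for either form matches both documents.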

    Run the following commands -
    Task3-StoppedIndex.py
    Task3-StemmedIndex.py

    Phase 2 -

    Implementation of snippet generation and query-term highlighting. The program internally calls Indexer.py and Parser.py to parse and index the corpus, and snippetGeneration.py to generate the snippets.
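A minimal sketch of one snippet heuristic: split the document into sentences, keep the few that contain the most distinct query terms, and wrap each query term in `<b>` tags. This is an assumption about the approach, not a description of snippetGeneration.py; `generate_snippet`, `highlight` and `max_sentences` are hypothetical names.

```python
import re


def highlight(snippet, terms):
    """Wrap whole-word, case-insensitive matches of the query terms in <b> tags."""
    if not terms:
        return snippet
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, sorted(terms))) + r")\b", re.IGNORECASE)
    return pattern.sub(r"<b>\1</b>", snippet)


def generate_snippet(text, query_terms, max_sentences=2):
    """Pick the sentences containing the most distinct query terms, then highlight them."""
    terms = {t.lower() for t in query_terms}
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

    def score(sentence):
        return len(set(re.findall(r"[a-z0-9]+", sentence.lower())) & terms)

    # Rank by score (ties broken by position), then restore document order
    ranked = sorted(range(len(sentences)), key=lambda i: (-score(sentences[i]), i))
    chosen = sorted(i for i in ranked[:max_sentences] if score(sentences[i]) > 0)
    return highlight(" ... ".join(sentences[i] for i in chosen), terms)


text = ("Time sharing systems are common. The weather was nice. "
        "A time sharing monitor schedules jobs.")
print(generate_snippet(text, ["time", "sharing", "monitor"]))
```

Restoring document order after ranking keeps the snippet readable, at the cost of showing the best sentence second when it appears later in the text.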

    Run the following commands -
    Phase2Run.py

    Phase 3 -

    Ninth run - Query Expansion using Pseudo Relevance Feedback with Stopping
    Implementation of query enhancement using pseudo-relevance feedback and stopping. The program internally calls Indexer.py and Parser.py to parse and index the corpus.
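Pseudo-relevance feedback treats the top-ranked documents as relevant and adds their most frequent terms to the query; stopping keeps common words such as "the" out of the expansion. In this sketch `expand_with_prf`, `k` (number of feedback documents) and `m` (number of added terms) are hypothetical, and the stop list is a small placeholder rather than the project's list.

```python
from collections import Counter

# Hypothetical stop list; a real run would load the full list from a file.
STOP_WORDS = frozenset({"a", "an", "and", "are", "for", "in", "is", "of", "on", "the", "to", "with"})


def expand_with_prf(query_terms, ranked_doc_ids, doc_tokens, k=10, m=5, stop_words=STOP_WORDS):
    """Append the m most frequent non-stop, non-query terms from the top-k documents."""
    seen = set(query_terms)
    counts = Counter()
    for doc_id in ranked_doc_ids[:k]:  # assume the top k results are relevant
        counts.update(t for t in doc_tokens[doc_id] if t not in stop_words and t not in seen)
    # most_common() orders equal counts by first occurrence
    return list(query_terms) + [t for t, _ in counts.most_common(m)]


doc_tokens = {
    "d1": ["the", "parallel", "processing", "of", "arrays"],
    "d2": ["parallel", "algorithms", "for", "the", "processing"],
    "d3": ["weather", "report"],
}
print(expand_with_prf(["parallel"], ["d1", "d2", "d3"], doc_tokens, k=2, m=2))
# prints ['parallel', 'processing', 'arrays']
```

The expanded query is then re-run through the retrieval model; without the stop filter, "the" would be the strongest expansion term here.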

    Run the following command -
    Phase3Run.py

    Evaluation
    Evaluates the runs using MAP, MRR, precision, recall, Precision@5, Precision@20, Recall@5 and Recall@20.
    Reads the list of runs to be evaluated from a file named Output_files_list.txt.
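The metrics listed above can be sketched per query and then averaged over queries. This assumes a run is a mapping from query id to a ranked list of document ids and the relevance judgments are a mapping from query id to a set of relevant document ids; reading Output_files_list.txt and the judgment files is left out, and skipping queries without judgments is an assumed convention.

```python
def precision_at(ranked, relevant, k):
    """Fraction of the top-k results that are relevant."""
    return sum(1 for d in ranked[:k] if d in relevant) / k


def recall_at(ranked, relevant, k):
    """Fraction of all relevant documents found in the top k."""
    return sum(1 for d in ranked[:k] if d in relevant) / len(relevant)


def average_precision(ranked, relevant):
    """Mean of the precision values at the rank of each relevant document retrieved."""
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)  # relevant docs never retrieved contribute 0


def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant document, or 0 if none is retrieved."""
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


def evaluate(run, qrels, cutoffs=(5, 20)):
    """Average each metric over the queries that have relevance judgments."""
    judged = [q for q in run if qrels.get(q)]
    if not judged:
        raise ValueError("no judged queries in run")
    n = len(judged)
    summary = {
        "MAP": sum(average_precision(run[q], qrels[q]) for q in judged) / n,
        "MRR": sum(reciprocal_rank(run[q], qrels[q]) for q in judged) / n,
    }
    for k in cutoffs:
        summary[f"P@{k}"] = sum(precision_at(run[q], qrels[q], k) for q in judged) / n
        summary[f"R@{k}"] = sum(recall_at(run[q], qrels[q], k) for q in judged) / n
    return summary


run = {"1": ["d1", "d2", "d3", "d4", "d5"], "2": ["d9", "d8", "d7"]}
qrels = {"1": {"d1", "d3"}, "2": {"d7"}}
print(evaluate(run, qrels))
```

Precision@k divides by k even when fewer than k documents were returned, so short result lists are penalized, which is the usual convention.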

    Run the following command -
    Evaluation.py

    Extra Credit -

    Implementation of a search engine based on the Relevance Model, using pseudo-relevance feedback and KL-divergence for scoring. The program internally calls Indexer.py and Parser.py to parse and index the corpus.
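The extra-credit model can be sketched as follows, assuming an RM1-style relevance model estimated from the top-ranked documents with uniform document priors and Jelinek-Mercer smoothed document models (λ = 0.35 is a placeholder). Documents are ranked by Σ_w P(w|R) log P(w|D), which orders them the same way as the negative divergence -KL(R || D), because the entropy of R is identical for every document. All function names are hypothetical.

```python
import math
from collections import Counter


def smoothed_lm(tokens, coll_counts, coll_len, lam=0.35):
    """Return P(w|D) under Jelinek-Mercer smoothing (lam weights the collection model)."""
    tf = Counter(tokens)
    dl = len(tokens)

    def p(w):
        return (1 - lam) * tf[w] / dl + lam * coll_counts[w] / coll_len

    return p


def relevance_model(query, top_docs, doc_tokens, coll_counts, coll_len, n_terms=20):
    """RM1-style estimate: P(w|R) proportional to sum_D P(w|D) * prod_q P(q|D)."""
    query = [q for q in query if coll_counts[q] > 0]  # unseen terms would give log(0)
    vocab = set().union(*(doc_tokens[d] for d in top_docs))
    weights = Counter()
    for d in top_docs:
        p = smoothed_lm(doc_tokens[d], coll_counts, coll_len)
        q_lik = math.exp(sum(math.log(p(q)) for q in query))  # P(Q|D)
        for w in vocab:
            weights[w] += p(w) * q_lik
    top = weights.most_common(n_terms)  # keep only the strongest terms
    total = sum(v for _, v in top)
    return {w: v / total for w, v in top}


def kl_score(rel_model, tokens, coll_counts, coll_len):
    """sum_w P(w|R) log P(w|D): rank-equivalent to -KL(R || D) for a fixed R."""
    p = smoothed_lm(tokens, coll_counts, coll_len)
    return sum(pr * math.log(p(w)) for w, pr in rel_model.items())


doc_tokens = {
    "d1": ["parallel", "processing", "arrays"],
    "d2": ["parallel", "algorithms", "processing"],
    "d3": ["weather", "report", "today"],
}
coll_counts = Counter(t for toks in doc_tokens.values() for t in toks)  # must be a Counter
coll_len = sum(coll_counts.values())
rm = relevance_model(["parallel"], ["d1", "d2"], doc_tokens, coll_counts, coll_len)
ranking = sorted(doc_tokens, key=lambda d: kl_score(rm, doc_tokens[d], coll_counts, coll_len), reverse=True)
print(ranking)
```

Smoothing matters here: without the collection term, any document missing a relevance-model word would get log(0).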

    Run the following command -
    KL_Divergence.py

    All outputs are stored in the Outputs folder. All evaluation results, along with the compiled evaluations and the MAP/MRR summary, are stored in the Evaluation folder.
