WebSearch

The Chinese report for this experiment is report.md. The original crawled news data (~1.9 GB) is not provided here, but you can download the processed data.

To use this search engine, extract output.zip into the ./output folder, then run bool_search.py or semantic_search.py. process.py is only used to process the raw data and generate the output.
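The extraction step can be scripted with the standard library's zipfile module. The following is a minimal sketch: it assumes output.zip sits in the current directory (the repository root) and that its entries should land directly inside ./output. If the archive already contains a top-level output/ folder, extract it into . instead. The function name extract_output is hypothetical and not part of the repository.

```python
from __future__ import annotations

import zipfile
from pathlib import Path


def extract_output(archive: str = "output.zip", dest: str = "output") -> list[str]:
    """Extract the pre-processed data archive into `dest`.

    Returns the names of the archive entries. Assumes (hypothetically) that the
    archive's entries belong directly inside `dest`.
    """
    archive_path = Path(archive)
    if not archive_path.is_file():
        # The processed data must be downloaded before it can be extracted.
        raise FileNotFoundError(f"{archive_path} not found; download the processed data first")
    dest_path = Path(dest)
    dest_path.mkdir(parents=True, exist_ok=True)  # create ./output if missing
    with zipfile.ZipFile(archive_path) as zf:
        zf.extractall(dest_path)  # recreates any subfolders stored in the archive
        return zf.namelist()
```

After calling extract_output() once from the repository root, bool_search.py and semantic_search.py should find their data under ./output.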

Note: The code quality is poor, including but not limited to ill-formed class methods and a mix of procedural and object-oriented code. Pull requests are welcome if you want to improve it; the authors are too busy (or too lazy) to fix these issues.

Dependencies

1. Install the pip dependencies:

   pip install -r requirements.txt

2. Download the NLTK data and set the NLTK_DATA environment variable to the download path:

   wordnet -> NLTK_DATA/corpora/wordnet
   stopwords -> NLTK_DATA/corpora/stopwords

3. Download the pre-processed data (see above).
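Before running the search scripts, it can help to confirm that the NLTK resources ended up where the layout above expects them. The following is a minimal stdlib-only sketch; missing_nltk_resources is a hypothetical helper, not part of the repository. It assumes NLTK_DATA holds a single directory (NLTK itself also accepts several, separated by os.pathsep). It treats either the unpacked folder or the .zip archive NLTK downloads as present; if the project needs the unpacked folder, unzip the archive in place. The data itself can be fetched with NLTK's downloader, e.g. python -m nltk.downloader -d "$NLTK_DATA" wordnet stopwords.

```python
from __future__ import annotations

import os
from pathlib import Path

# Resources listed in the README, relative to the NLTK data root.
REQUIRED = {
    "wordnet": Path("corpora") / "wordnet",
    "stopwords": Path("corpora") / "stopwords",
}


def missing_nltk_resources(root: str | None = None) -> list[str]:
    """Return the names of required NLTK resources not found under `root`.

    `root` defaults to the NLTK_DATA environment variable (assumed to be a
    single directory). A resource counts as present if either its folder or
    its .zip archive exists.
    """
    root = root or os.environ.get("NLTK_DATA")
    if not root:
        raise RuntimeError("NLTK_DATA is not set")
    base = Path(root)
    missing = []
    for name, rel in REQUIRED.items():
        folder = base / rel
        # e.g. corpora/wordnet or corpora/wordnet.zip
        if not (folder.is_dir() or folder.with_suffix(".zip").is_file()):
            missing.append(name)
    return missing
```

An empty list means both resources were found; otherwise the returned names are the ones still to download.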

You can tune some parameters in the source code to improve search results.

Credits

Thanks to SuzanaK for the synonyms list (licensed under CC BY-SA 3.0), and to all the open-source tools used in this project.

Repository: Catoverflow/WebSearch

Folders and files

Latest commit

History

Repository files navigation

WebSearch

Dependencies

Credits

About

Topics

Resources

License

Stars

Watchers

Forks

Contributors 2

Languages