TextSearch

TextSearch.jl is a package for creating vector representations of text, mostly independent of the language. It is intended to be used with SimilaritySearch.jl, but can be used independently if needed. TextSearch.jl was renamed from TextModel.jl to reflect its capabilities and mission.

For generic text analysis, use other packages such as TextAnalysis.jl.

It supports a number of simple text preprocessing functions and three kinds of tokenizers: word n-grams, character q-grams, and skip-grams. It supports creating multisets of tokens, commonly called bags of words (BOW). TextSearch.jl can produce sparse vector representations based on term-weighting schemes such as TF, IDF, and TFIDF. It also supports term-weighting schemes designed for text classification tasks, mostly based on distributional representations.
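To make these concepts concrete, here is a minimal sketch in plain Julia that does not use the TextSearch.jl API. The function names `char_qgrams`, `word_ngrams`, `bag_of_tokens`, and `tfidf` are hypothetical and exist only for illustration. The weighting shown is one common TF-IDF variant, and the package's own formulas may differ.

```julia
# Illustrative only: plain Julia, not the TextSearch.jl API.

# Character q-grams: every run of q consecutive characters.
function char_qgrams(text::AbstractString, q::Integer)
    chars = collect(text)  # collect handles multi-byte characters safely
    [String(chars[i:i+q-1]) for i in 1:length(chars)-q+1]
end

# Word n-grams: every run of n consecutive whitespace-separated words.
function word_ngrams(text::AbstractString, n::Integer)
    words = split(text)
    [join(words[i:i+n-1], ' ') for i in 1:length(words)-n+1]
end

# Bag of words: a multiset mapping each token to its number of occurrences.
function bag_of_tokens(tokens)
    bow = Dict{String,Int}()
    for t in tokens
        bow[t] = get(bow, t, 0) + 1
    end
    bow
end

# One common TF-IDF variant (an assumption, not necessarily the package's):
# weight = (count / total tokens in the document) * log(N / document frequency).
function tfidf(bows)
    n = length(bows)
    df = Dict{String,Int}()
    for bow in bows, t in keys(bow)
        df[t] = get(df, t, 0) + 1
    end
    [Dict(t => (c / sum(values(bow))) * log(n / df[t]) for (t, c) in bow) for bow in bows]
end
```

Each resulting dictionary can be read as a sparse vector: only tokens present in the document have an entry, and tokens that occur in every document receive weight zero under this variant.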

Installing

You can install the package from the Julia package manager as follows:

] add TextSearch

You can also run the test suite as follows:

] test TextSearch

Using the library

The examples directory contains a few Pluto.jl notebooks that show how to use the package.

After cloning the repository, you must instantiate the environment in that directory:

using Pkg
pkg"instantiate"

Once the environment is instantiated, start Pluto and explore the example notebooks:

using Pluto
Pluto.run()
