From d46a854b8c9a58f3fd80a537aeb258b78d471541 Mon Sep 17 00:00:00 2001
From: Sanjaya Kumar Saxena
Date: Sun, 24 Mar 2024 20:01:06 +0530
Subject: [PATCH] docs(README): add details of word vector support

Co-authored-by: Rachna
---
 README.md | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/README.md b/README.md
index 07d20e9..6b8b3eb 100644
--- a/README.md
+++ b/README.md
@@ -7,6 +7,8 @@
 WinkNLP is a JavaScript library for Natural Language Processing (NLP). Designed specifically to make development of NLP applications **easier** and **faster**, winkNLP is optimized for the right balance of performance and accuracy.
 
+Its word embedding support unlocks deeper text analysis. Represent words and text as numerical vectors with ease, bringing higher accuracy in tasks like semantic similarity, text classification, and beyond – even within a browser.
+
 It is built ground up with [no external dependency](https://snyk.io/test/github/winkjs/wink-nlp?tab=dependencies) and has a [lean code base of ~10Kb minified & gzipped](https://bundlephobia.com/package/wink-nlp). A test coverage of [~100%](https://coveralls.io/github/winkjs/wink-nlp?branch=master) and compliance with the [Open Source Security Foundation best practices](https://bestpractices.coreinfrastructure.org/en/projects/6035) make winkNLP the ideal tool for building production grade systems with confidence.
 
 WinkNLP with full [Typescript support](https://github.com/winkjs/wink-nlp/blob/master/types/index.d.ts), runs on Node.js, [web browsers](https://github.com/winkjs/wink-nlp#how-to-install-for-web-browser) and [Deno](https://github.com/winkjs/wink-nlp#how-to-run-on-deno).
@@ -35,11 +37,13 @@ WinkNLP has a [comprehensive natural language processing (NLP) pipeline](https:/
 🖼 Best-in-class text visualization: Programmatically mark tokens, sentences, entities, etc. using HTML mark or any other tag of your choice.
 ♻️ Extensive text processing features: Remove and/or retain tokens with specific attributes such as part-of-speech, named entity type, token type, stop word, shape and many more; compute Flesch reading ease score; generate n-grams; normalize, lemmatise or stem. Checkout how with the right kind of text preprocessing, even Naive Bayes classifier achieves impressive (≥90%) accuracy in sentiment analysis and chatbot intent classification tasks.
 🔠 Pre-trained language models: Compact sizes starting from ~1MB (minified & gzipped) – reduce model loading time drastically down to ~1 second on a 4G network.
-💼 Host of utilities & tools: BM25 vectorizer; Several similarity methods – Cosine, Tversky, Sørensen-Dice, Otsuka-Ochiai; Helpers to get bag of words, frequency table, lemma/stem, stop word removal and many more.
+↗️ Word vectors: 100-dimensional English word embeddings for over 350K English words, which are optimized for winkNLP. Allows easy computation of sentence or document embeddings.
 
-
-> WinkJS also has packages like [Naive Bayes classifier](https://github.com/winkjs/wink-naive-bayes-text-classifier), [multi-class averaged perceptron](https://github.com/winkjs/wink-perceptron) and [popular token and string distance methods](https://github.com/winkjs/wink-distance), which complement winkNLP.
+### Utilities & Tools 💼
+- [BM25 Vectorizer](https://winkjs.org/wink-nlp/bm25-vectorizer.html)
+- [Similarity methods](https://winkjs.org/wink-nlp/similarity.html) – Cosine, Tversky, Sørensen-Dice, Otsuka-Ochiai
+- [its & as helpers](https://winkjs.org/wink-nlp/its-as-helper.html) to get Bag of Words, Frequency table, Lemma, Stem, Stop word removal, Negation handling and many more.
 
 ## Documentation