WORDCHEF

PHP + pgSQL + pgvector webapp to take linear combinations of words

  • Uses spaCy to generate the full vocabulary and the corresponding word vectors
  • Word vectors are stored in a PostgreSQL database with pgvector, which allows fast semantic search
  • Given two words, look up their word vectors and take the average
  • Find the nearest five words to the averaged vector
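The averaging and nearest-neighbour steps above can be sketched in a few lines of plain Python. This is only an illustration, not the app's code: the app performs the lookup in PostgreSQL with pgvector, whereas here a small in-memory dictionary stands in for the database. The three-dimensional `toy_vectors`, the helper names, and the choice of cosine similarity as the distance measure are all assumptions made for the example.

```python
import math

def average(u, v):
    """Component-wise mean of two equal-length vectors."""
    return [(a + b) / 2 for a, b in zip(u, v)]

def cosine_similarity(u, v):
    """Cosine of the angle between u and v (both assumed non-zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def nearest_words(vectors, query, k=5):
    """Return the k words whose vectors are most similar to the query."""
    ranked = sorted(
        vectors,
        key=lambda word: cosine_similarity(vectors[word], query),
        reverse=True,
    )
    return ranked[:k]

# Made-up vectors for illustration only; real spaCy vectors have hundreds of dimensions.
toy_vectors = {
    "king":   [0.9, 0.8, 0.1],
    "queen":  [0.9, 0.2, 0.8],
    "man":    [0.1, 0.9, 0.1],
    "woman":  [0.1, 0.2, 0.9],
    "prince": [0.7, 0.7, 0.2],
    "apple":  [0.1, 0.1, 0.1],
}

mix = average(toy_vectors["king"], toy_vectors["woman"])
result = nearest_words(toy_vectors, mix, k=5)
print(result)
```

Note that the two input words will usually appear among their own nearest neighbours; whether to filter them out is a design choice the sketch leaves open.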

TOPIC MODELING

Available at https://wordchef.app/topic_modeling

Built from https://github.com/jacksonwalters/nlp/topic_modeling

  • User uploads comments .csv and keywords .csv
  • Load pre-trained gensim doc2vec model
  • Embed comments as docs to get vector embeddings
  • Use k-means to cluster comments into groups
  • Compute TF-IDF scores across clusters
  • Embed keywords/bigrams and compute nearest bigram to cluster centroid
  • Find nearest keyword/bigram to each comment
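As a rough sketch of the clustering and labelling steps above, the following pure-Python example runs a tiny k-means over comment vectors and then picks the nearest keyword for each cluster centroid and for each comment. It is a simplified illustration under several assumptions: the two-dimensional vectors are made up (the real pipeline gets them from a gensim doc2vec model), Euclidean distance is assumed, the centroids are naively initialised from the first `k` points, and the TF-IDF step is left out. All function and variable names are hypothetical.

```python
import math

def distance(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(column) / n for column in zip(*points)]

def k_means(points, k, iterations=10):
    """Very small k-means: returns (labels, centroids)."""
    # Naive deterministic initialisation: use the first k points as centroids.
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its closest centroid.
        labels = [
            min(range(k), key=lambda c: distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = mean(members)
    return labels, centroids

def nearest_keyword(keyword_vectors, target):
    """Return the keyword whose vector is closest to the target vector."""
    return min(keyword_vectors, key=lambda w: distance(keyword_vectors[w], target))

# Made-up embeddings standing in for doc2vec output.
comment_vectors = [[0.0, 0.1], [5.0, 5.1], [0.2, 0.0], [5.2, 4.9]]
keyword_vectors = {"pricing": [0.0, 0.0], "support": [5.0, 5.0]}

labels, centroids = k_means(comment_vectors, k=2)
cluster_topics = [nearest_keyword(keyword_vectors, c) for c in centroids]
comment_keywords = [nearest_keyword(keyword_vectors, v) for v in comment_vectors]

print(labels)            # cluster index per comment
print(cluster_topics)    # one keyword per cluster centroid
print(comment_keywords)  # one keyword per comment
```

In practice a library implementation of k-means with a better initialisation would be preferable; the hand-rolled version is used here only to keep the example dependency-free.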