Releases · x-tabdeveloping/turftopic

05 Nov 07:34

x-tabdeveloping

v0.8.0

1629496

v0.8.0 Latest

Automated Topic Naming

Turftopic can now automatically assign human-readable names to topics, using either LLMs or n-gram retrieval.

from turftopic import KeyNMF
from turftopic.namers import OpenAITopicNamer

model = KeyNMF(10).fit(corpus)

namer = OpenAITopicNamer("gpt-4o-mini")
model.rename_topics(namer)
model.print_topics()

Topic ID	Topic Name	Highest Ranking
0	Operating Systems and Software	windows, dos, os, ms, microsoft, unix, nt, memory, program, apps
1	Atheism and Belief Systems	atheism, atheist, atheists, belief, religion, religious, theists, beliefs, believe, faith
2	Computer Architecture and Performance	motherboard, ram, memory, cpu, bios, isa, speed, 486, bus, performance
3	Storage Technologies	disk, drive, scsi, drives, disks, floppy, ide, dos, controller, boot
	...

23 Oct 13:16

x-tabdeveloping

v0.7.0

0bc8a0e

v0.7.0

New in version 0.7.0

Component re-estimation, refitting and topic merging

Some models can now be modified efficiently after training,
without recomputing all attributes from scratch.
This is especially significant for clustering models and $S^3$.

from turftopic import SemanticSignalSeparation, ClusteringTopicModel

s3_model = SemanticSignalSeparation(5, feature_importance="combined").fit(corpus)
# Re-estimating term importances
s3_model.estimate_components(feature_importance="angular")
# Refitting S^3 with a different number of topics (very fast)
s3_model.refit(n_components=10, random_seed=42)

clustering_model = ClusteringTopicModel().fit(corpus)
# Reduces number of topics automatically with a given method
clustering_model.reduce_topics(n_reduce_to=20, reduction_method="smallest")
# Merge topics manually
clustering_model.join_topics([0,3,4,5])
# Resets original topics
clustering_model.reset_topics()
# Re-estimates term importances based on a different method
clustering_model.estimate_components(feature_importance="centroid")

Manual topic naming

You can now manually label topics in all models in Turftopic.

# you can specify a dict mapping IDs to names
model.rename_topics({0: "New name for topic 0", 5: "New name for topic 5"})
# or a list of topic names
model.rename_topics([f"Topic {i}" for i in range(10)])
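
The two input shapes above can be pictured with a small stand-alone sketch. The helper `resolve_topic_names` is hypothetical and not part of Turftopic; it assumes that a dict renames only the listed topic IDs while a list replaces every name at once, which may differ from what the library does internally.

```python
from typing import Union


def resolve_topic_names(
    current: list[str], new_names: Union[dict[int, str], list[str]]
) -> list[str]:
    """Hypothetical helper: merge new names into the current topic names."""
    if isinstance(new_names, dict):
        # A dict only touches the topic IDs it mentions.
        resolved = list(current)
        for topic_id, name in new_names.items():
            resolved[topic_id] = name
        return resolved
    # A list is assumed to carry exactly one name per topic.
    if len(new_names) != len(current):
        raise ValueError("Expected one name per topic.")
    return list(new_names)


current = [f"Topic {i}" for i in range(6)]
partial = resolve_topic_names(current, {0: "New name for topic 0", 5: "New name for topic 5"})
full = resolve_topic_names(current, [f"Renamed {i}" for i in range(6)])
```

Here `partial` keeps the original names for topics 1 to 4, and `full` replaces all six names.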

Saving, loading and publishing to HF Hub

You can now load, save and publish models with dedicated functionality.

from turftopic import load_model

model.to_disk("out_folder/")
model = load_model("out_folder/")

model.push_to_hub("your_user/model_name")
model = load_model("your_user/model_name")

25 Jun 11:04

x-tabdeveloping

v0.4.0

1dbf359

v0.4.0

Release Highlights:

1. Online KeyNMF

KeyNMF can now be fitted in an online fashion in batches:

from itertools import batched  # available from Python 3.12
from turftopic import KeyNMF

model = KeyNMF(10, top_n=5)

corpus = ["some string", "etc", ...]
for batch in batched(corpus, 200):
    batch = list(batch)
    model.partial_fit(batch)
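
`itertools.batched` only exists in Python 3.12 and later. On older interpreters, a small stand-in along the following lines could be used instead. This is a minimal sketch; the name `batched_compat` is made up for illustration and is not part of Turftopic or the standard library.

```python
from itertools import islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def batched_compat(iterable: Iterable[T], n: int) -> Iterator[tuple[T, ...]]:
    """Yield successive tuples of at most n items from the iterable."""
    if n < 1:
        raise ValueError("n must be at least one")
    iterator = iter(iterable)
    # islice pulls up to n items per pass; an empty tuple ends the loop.
    while batch := tuple(islice(iterator, n)):
        yield batch


docs = [f"document {i}" for i in range(5)]
batches = [list(batch) for batch in batched_compat(docs, 2)]
```

Each element of `batches` is a plain list of strings, which is the form passed to `model.partial_fit` in the loop above.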

2. Precompute keyword matrices in KeyNMF

You can precompute the keyword matrix of a KeyNMF model and then use it in training.

model.extract_keywords(["Cars are perhaps the most important invention of the last couple of centuries. They have revolutionized transportation in many ways."])

[{'transportation': 0.44713873,
  'invention': 0.560524,
  'cars': 0.5046208,
  'revolutionized': 0.3339205,
  'important': 0.21803442}]

keyword_matrix = model.extract_keywords(corpus)
model.fit(keywords=keyword_matrix)

3. Concept Compass in $S^3$

You can now produce a concept compass figure with $S^3$ similar to that in the paper:

from turftopic import SemanticSignalSeparation

model = SemanticSignalSeparation(10).fit(corpus)

# You will need to `pip install plotly` before this.
fig = model.concept_compass(topic_x=1, topic_y=4)
fig.show()

4. Bugfixes in Dynamic Modeling

Binning in dynamic modeling is now fixed: the requested number of time slices is created, and the first time slice is no longer left out.
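
To make the intended behaviour concrete, here is a minimal sketch of equal-width time binning. It is not Turftopic's implementation; the function `bin_timestamps` and its clamping rule are assumptions chosen only to show what "the requested number of slices, first slice included" means.

```python
from datetime import datetime


def bin_timestamps(timestamps: list[datetime], bins: int) -> list[int]:
    """Assign each timestamp to one of `bins` equal-width time slices."""
    if bins < 1:
        raise ValueError("bins must be at least one")
    start, end = min(timestamps), max(timestamps)
    span = (end - start).total_seconds()
    if span == 0:
        # All documents share one timestamp, so they share one slice.
        return [0] * len(timestamps)
    labels = []
    for ts in timestamps:
        fraction = (ts - start).total_seconds() / span
        # Clamp so the latest timestamp lands in the last slice
        # instead of opening an extra one.
        labels.append(min(int(fraction * bins), bins - 1))
    return labels


timestamps = [datetime(2024, month, 1) for month in range(1, 13)]
labels = bin_timestamps(timestamps, bins=4)
```

With monthly timestamps and `bins=4`, every slice index from 0 to 3 is used, and the earliest document falls in slice 0.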

10 Jun 08:09

x-tabdeveloping

v0.3.0

7117f23

v0.3.0

Highlight: Dynamic KeyNMF

From version 0.3.0 you can use KeyNMF for dynamic topic modeling:

from datetime import datetime
from turftopic import KeyNMF

corpus: list[str] = [...]
timestamps: list[datetime] = [...]

model = KeyNMF(10)
doc_topic_matrix = model.fit_transform_dynamic(corpus, timestamps=timestamps, bins=10)

model.print_topics_over_time()

# This needs Plotly: pip install plotly
model.plot_topics_over_time()
