```python
thetaWeightedAvg = sums * theta
thetaWeightedAvg = thetaWeightedAvg / num_docs
print('\nThe 10 most used topics are {}'.format(thetaWeightedAvg.argsort()[::-1][:10]))
```
From my understanding, multiplying each document's frequency (`sums`) by its document-topic probabilities `theta` amplifies or reduces each topic's probability according to how much that document contributes. Averaging then gives some insight into which topics are important across the whole corpus. Is that right? Also, what would be the difference if we only averaged the document-topic proportions (no weighting)?
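For concreteness, here is a minimal numpy sketch contrasting the two averages. The toy `theta` and `sums` values and shapes are assumptions for illustration, not taken from the repo, and I've added an explicit `.sum(0)` over documents so the snippet is self-contained (in the snippet above that summation presumably happens elsewhere, e.g. accumulated over batches):

```python
import numpy as np

# Toy values (assumptions for illustration): theta holds document-topic
# proportions, shape (num_docs, num_topics); sums holds each document's
# token count, shape (num_docs, 1).
theta = np.array([[0.8, 0.1, 0.1],
                  [0.7, 0.2, 0.1],
                  [0.1, 0.1, 0.8],
                  [0.1, 0.2, 0.7]])
sums = np.array([[100.], [120.], [5.], [3.]])
num_docs = theta.shape[0]

# Frequency-weighted average: long documents contribute proportionally
# more, so the result reflects the share of tokens assigned to each topic.
thetaWeightedAvg = (sums * theta).sum(0) / num_docs
print(thetaWeightedAvg.argsort()[::-1])  # -> [0 1 2]: topic 0 dominates

# Plain average: every document counts equally regardless of length,
# so the two short topic-2 documents pull topic 2 up to a tie with topic 0.
plainAvg = theta.mean(0)
print(plainAvg.argsort()[::-1])  # -> [2 0 1]: topics 0 and 2 tie at 0.425
```

So the weighting makes long documents matter more: the weighted average ranks topics by roughly how many tokens they account for, while the unweighted average ranks them by how many documents they appear in, regardless of length.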
I think the best topics can be selected according to the task requirements. For example, if you want topics that are easiest to interpret, you can use topic coherence; if you want the model that best fits the data, you can use perplexity.
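In case it helps, coherence can be computed with a library such as gensim; here is a minimal sketch, where the toy `texts` and `topics` are made up for illustration and not taken from this model:

```python
from gensim.corpora import Dictionary
from gensim.models.coherencemodel import CoherenceModel

# Toy tokenized corpus and the top words of two hypothetical topics.
texts = [["cat", "dog", "pet", "animal"],
         ["stock", "market", "economy", "trade"],
         ["dog", "animal", "pet", "cat"]]
dictionary = Dictionary(texts)
corpus = [dictionary.doc2bow(t) for t in texts]
topics = [["cat", "dog", "pet"], ["stock", "market", "economy"]]

# u_mass coherence scores a topic by how often its top words co-occur
# in the corpus; values closer to zero indicate more coherent topics.
cm = CoherenceModel(topics=topics, corpus=corpus, dictionary=dictionary,
                    coherence='u_mass')
print(cm.get_coherence())            # mean coherence over topics
print(cm.get_coherence_per_topic())  # one score per topic
```

Perplexity, by contrast, typically comes from the model's own held-out log-likelihood during evaluation, so it doesn't need a separate library.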
Could you please elaborate on the coherence/perplexity indices? I thought this was a way of selecting the most used topics by doc_frequency and topic proportion.
Maybe this question is dumb, but I don't understand why the average of the weighted document-topic proportions is a metric for the most important topics?