Query Pattern Recognition
dbraga edited this page Dec 16, 2014 · 6 revisions
This module aims to identify important, frequently occurring query patterns using probabilistic topic modeling. Pattern recognition is usually performed on a particular dataset of interest, e.g. query exceptions. The patterns recognized in such a dataset can help diagnose the common causes of query exceptions.
We use the open-source Mallet tool for this purpose.
- Edit the application.default.properties file to specify the path of the exceptions file, the destination file for the Mallet-usable dataset, the location of the stopwords file, and the output visualization directory:
file.exceptions = /tmp/exceptions
file.exceptions.mallet-format = /tmp/exceptions-only
file.stopWords = thoth-topic-modeling/src/main/resources/stopwords.txt
directory.topicModeling.visualization = thoth-topic-modeling/viz/
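The file.exceptions.mallet-format and file.stopWords settings imply that raw exceptions are filtered and rewritten into a layout Mallet can import. The page does not show that layout, so the sketch below is only an illustration. It assumes the one-instance-per-line form accepted by Mallet's file import (a name, a label, then the text). The class name MalletFormatSketch, the "doc" prefix, the "exception" label, and the sample messages are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch, not the actual ThothExceptionsToMallet code.
class MalletFormatSketch {

    // Builds one "name label text" line, dropping stopwords from the text.
    static String toMalletLine(int index, String rawException, Set<String> stopWords) {
        List<String> kept = new ArrayList<>();
        // Lowercase, then split on anything that is not a letter, digit, or underscore.
        for (String token : rawException.toLowerCase(Locale.ROOT).split("[^a-z0-9_]+")) {
            if (!token.isEmpty() && !stopWords.contains(token)) {
                kept.add(token);
            }
        }
        // "exception" is an assumed placeholder label; unsupervised topic
        // modeling does not use the label field.
        return "doc" + index + " exception " + String.join(" ", kept);
    }

    public static void main(String[] args) {
        // Invented stopwords and messages, for illustration only.
        Set<String> stopWords = new HashSet<>(Arrays.asList("the", "of", "at"));
        List<String> raw = Arrays.asList(
                "Parse error: undefined field price_range at the parser",
                "Timeout: query exceeded the time limit of 5000 ms");
        for (int i = 0; i < raw.size(); i++) {
            System.out.println(toMalletLine(i, raw.get(i), stopWords));
        }
    }
}
```

Each output line is one document for the topic model, so a single exception message is the unit over which topics are inferred.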
- Run the ThothExceptionsToMallet.java class to create a Mallet-usable dataset of query exceptions.
- Make sure that the topic modeling parameters are set appropriately in the application.default.properties file:
directory.topicModeling.numTopics = 10
directory.topicModeling.numIterations = 1000
directory.topicModeling.numKeywordsToOutput = 50
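Of the three parameters, numKeywordsToOutput is the least self-explanatory. Judging by its name, it caps how many of the highest-weighted words are kept for each topic. The sketch below illustrates that reading with plain Java collections. The class TopKeywordsSketch, the alphabetical tie-breaking, and the sample counts are assumptions, not Thoth or Mallet code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: keeps the N heaviest words of a single topic.
class TopKeywordsSketch {

    static List<String> topKeywords(Map<String, Integer> wordCounts, int limit) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(wordCounts.entrySet());
        // Highest count first; ties broken alphabetically so the order is stable.
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        List<String> words = new ArrayList<>();
        for (int i = 0; i < Math.min(limit, entries.size()); i++) {
            words.add(entries.get(i).getKey());
        }
        return words;
    }

    public static void main(String[] args) {
        // Invented counts, for illustration only.
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("field", 12);
        counts.put("undefined", 12);
        counts.put("parse", 7);
        counts.put("timeout", 3);
        System.out.println(topKeywords(counts, 3)); // prints [field, undefined, parse]
    }
}
```

With the value of 50 shown above, a topic with fewer than 50 distinct words would simply return all of them.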
- Run the TopicModel.java class to generate a set of CSV files, one per topic.
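The page does not describe the CSV layout that the visualization reads, so the following is only a sketch of the "one file per topic" idea. The file names (topic-0.csv, topic-1.csv, ...), the "word,weight" header, and the class TopicCsvSketch are assumptions made for illustration. The files actually produced by TopicModel.java may differ.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch (not the actual TopicModel.java): one CSV file per topic.
class TopicCsvSketch {

    // Writes topic-<i>.csv files with an assumed "word,weight" layout
    // and returns the paths that were written.
    static List<Path> writeTopics(List<Map<String, Integer>> topics, Path outputDir) {
        List<Path> written = new ArrayList<>();
        try {
            Files.createDirectories(outputDir);
            for (int i = 0; i < topics.size(); i++) {
                List<String> rows = new ArrayList<>();
                rows.add("word,weight");
                for (Map.Entry<String, Integer> entry : topics.get(i).entrySet()) {
                    rows.add(entry.getKey() + "," + entry.getValue());
                }
                Path file = outputDir.resolve("topic-" + i + ".csv");
                Files.write(file, rows, StandardCharsets.UTF_8);
                written.add(file);
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return written;
    }

    public static void main(String[] args) throws IOException {
        // Invented topic contents, for illustration only.
        Map<String, Integer> topic0 = new LinkedHashMap<>();
        topic0.put("undefined", 12);
        topic0.put("field", 9);
        List<Map<String, Integer>> topics = new ArrayList<>();
        topics.add(topic0);
        Path dir = Files.createTempDirectory("thoth-viz-sketch");
        for (Path path : writeTopics(topics, dir)) {
            System.out.println(path);
        }
    }
}
```

In the real setup the output directory would be the one configured as directory.topicModeling.visualization; the sketch uses a temporary directory so that it can be run without touching the project tree.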
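- Serve the thoth-topic-modeling/viz/ directory over HTTP and open index.html (http://localhost:8000/index.html with the command below) to visualize the topics as tag clouds. SimpleHTTPServer serves the current directory and accepts a port, not a path, so change into the directory first (with Python 3, use python3 -m http.server 8000 instead):
cd thoth-topic-modeling/viz/ && python -m SimpleHTTPServer 8000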