Feature/corpus analysis #183

Open
wants to merge 3 commits into
base: develop

flavioamieiro
Member

Adds worker to calculate the FreqDist of a corpus

This is the first draft of a worker that can take a corpus and create an
analysis for it. This first attempt is a freqdist worker that takes the
freqdist of each document in the corpus and condenses them into a new analysis:
the freqdist for the entire corpus.
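
As a rough illustration of the idea (not the code in this PR), merging per-document distributions into a corpus-level one could look like the sketch below. It assumes each document's freqdist is stored as a list of `(token, count)` pairs; that storage format and the name `merge_freqdists` are assumptions made for the example.

```python
from collections import Counter


def merge_freqdists(document_freqdists):
    """Sum per-document (token, count) pairs into one corpus-level distribution.

    Assumption: each document's freqdist is an iterable of (token, count) pairs.
    """
    corpus_counts = Counter()
    for freqdist in document_freqdists:
        for token, count in freqdist:
            corpus_counts[token] += count
    # most_common() returns (token, count) pairs sorted by descending count.
    return corpus_counts.most_common()


# Two hypothetical documents that share the token "the".
doc_a = [("the", 3), ("cat", 1)]
doc_b = [("the", 2), ("dog", 1)]
corpus_freqdist = merge_freqdists([doc_a, doc_b])
```

Summing into a single `Counter` touches each pair once, so the cost grows with the total number of distinct tokens per document rather than with the raw text size.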

This is a work in progress because I was mainly concerned with the basis for
this to work (especially the celery task). I did not pay any attention to the
way the worker itself works (it's probably doing more work than it needs to),
and it probably also needs more tests.
from utils import TaskTest


class TestCorpusFreqDistWorker(TaskTest):

Wouldn't it be better to test PyPLNCorpusTask separately from CorpusFreqDist? Then later, if another subclass of PyPLNCorpusTask is created, only the returned dict would need to be checked.

Also, is this hitting an actual mongo instance? If so, would you consider mocking the db methods?
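
A minimal sketch of what that could look like, using `unittest.mock` from the standard library. The `CorpusTask` and `DocumentCount` classes, the `collection` argument, and the `corpus_id` query are all hypothetical stand-ins invented for this example; they are not PyPLN's actual API.

```python
from unittest import mock


class CorpusTask:
    """Hypothetical stand-in for a base corpus task: loads documents,
    then delegates the analysis to process()."""

    def __init__(self, collection):
        # Anything with a find() method works, including a mock.
        self.collection = collection

    def run(self, corpus_id):
        documents = list(self.collection.find({"corpus_id": corpus_id}))
        return self.process(documents)

    def process(self, documents):
        raise NotImplementedError


class DocumentCount(CorpusTask):
    """Hypothetical subclass: only process() is specific to it."""

    def process(self, documents):
        return {"document_count": len(documents)}


# The mock replaces the database collection, so no Mongo instance is needed.
fake_collection = mock.Mock()
fake_collection.find.return_value = [{"text": "a"}, {"text": "b"}]

result = DocumentCount(fake_collection).run(corpus_id=1)

# The base-class behaviour (how documents are fetched) is checked here;
# a subclass test would only need to check the returned dict.
fake_collection.find.assert_called_once_with({"corpus_id": 1})
```

Passing the collection in through the constructor is one way to make the database replaceable in tests; patching the module-level client with `mock.patch` would be another.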

Member Author


You are right. I was testing both in the same test case (and not testing correctly). I separated the tests and I think it's better now.

It really is hitting an actual mongo instance. This is inherited from the old days when MongoDict was still part of our codebase. It's also one of the reasons our tests are slow. I would be very glad to mock everything and have better, more isolated, and faster tests. I would probably need your help, though @geron :)

…ests

Thanks @geron for pointing out that I was testing everything together