Closed-ended classifiers: Inconsistency in handling blank strings #4

Open
erleholgersen opened this issue May 30, 2017 · 1 comment

Comments

@erleholgersen
Contributor

erleholgersen commented May 30, 2017

Blank strings and strings that contain none of the feature words are currently handled differently by the three closed-ended classifiers.

For form and target, such strings are predicted to belong to the most common class in the training set (rally/demonstration and domestic government, respectively). For issue, they are classified as none, which is not the most common class in the training set.

See page 19 of Alex's thesis chapter 2, and the following example code:

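import pandas as pd
from mpeds.classify_protest import MPEDS

test_classifier = MPEDS()

# One blank string and one string containing none of the feature words
test_data = pd.Series(['', 'avocados and grapefruits'])

# issue yields none; form and target yield the most common training class
test_classifier.getIssue(test_data)
test_classifier.getForm(test_data)
test_classifier.getTarget(test_data)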

@erleholgersen changed the title from "Closed-ended classifiers: Inconsistency in handing blank strings" to "Closed-ended classifiers: Inconsistency in handling blank strings" on May 30, 2017
@alexhanna
Member

Oh, that seems bizarre. We should probably return None in this case and raise a warning that says something like "No words found in vectorizer."
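A minimal sketch of what that suggestion could look like, written independently of the MPEDS code. The helper `predict_or_none`, the toy vocabulary, and `toy_predict` are all hypothetical stand-ins, not part of the package. The tokenization here is a simple regex and may not match what the real vectorizers do.

```python
import re
import warnings


def predict_or_none(texts, vocabulary, predict_one):
    """Return one label per text, or None when a text has no known feature words.

    `vocabulary` and `predict_one` are hypothetical stand-ins for a fitted
    vectorizer's vocabulary and a classifier's single-document predict call.
    """
    vocab = set(word.lower() for word in vocabulary)
    labels = []
    for text in texts:
        tokens = re.findall(r"[a-z0-9]+", text.lower())
        if not any(token in vocab for token in tokens):
            # Blank string, or no overlap with the feature words
            warnings.warn("No words found in vectorizer.")
            labels.append(None)
        else:
            labels.append(predict_one(text))
    return labels


# Toy stand-ins, for illustration only
toy_vocabulary = ["rally", "march", "strike"]


def toy_predict(text):
    return "rally/demonstration"


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    labels = predict_or_none(
        ["", "avocados and grapefruits", "a rally downtown"],
        toy_vocabulary,
        toy_predict,
    )
```

Applying the same check in front of all three classifiers would make issue, form, and target behave consistently on such inputs.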
