Closed-ended classifiers: Inconsistency in handling blank strings #4

erleholgersen · 2017-05-30T02:24:30Z

Blank strings, and strings that contain none of the feature words, are currently handled differently by the three closed-ended classifiers.

For form and target, such strings are predicted to belong to the most common class in the training set (rally/demonstration and domestic government, respectively). For issue, they are classified as none, which is not the most common class in the training set.

See page 19 of Alex's thesis chapter 2, and the following example code:

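import pandas as pd
from mpeds.classify_protest import MPEDS

test_classifier = MPEDS()

# One blank string and one string containing none of the feature words
test_data = pd.Series(['', 'avocados and grapefruits'])

# Issue returns none; form and target return the most common training class
test_classifier.getIssue(test_data)
test_classifier.getForm(test_data)
test_classifier.getTarget(test_data)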

alexhanna · 2017-05-31T13:14:28Z

Oh, that seems bizarre. We should probably return `None` in this case and raise a warning that says something like "No words found in vectorizer."
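A rough sketch of what that could look like, independent of the actual MPEDS internals. Everything here is an assumption for illustration: `predict_or_none`, the `vocabulary` set, and the `predict` callable are hypothetical stand-ins, not functions that exist in `mpeds`. The idea is only to check each string against the feature words before classifying, and to return `None` with a warning when nothing matches.

```python
import re
import warnings


def predict_or_none(texts, vocabulary, predict):
    """Return one label per text, or None for texts with no known words.

    Hypothetical helper: `vocabulary` is a set of feature words and
    `predict` is any callable mapping a list of strings to a list of labels.
    """
    results = []
    for text in texts:
        # Lowercase and split into word tokens, roughly as a vectorizer would
        tokens = set(re.findall(r"\w+", text.lower()))
        if tokens & vocabulary:
            results.append(predict([text])[0])
        else:
            # Blank string, or no overlap with the feature words
            warnings.warn("No words found in vectorizer: %r" % text)
            results.append(None)
    return results


# Toy stand-ins for the real vocabulary and classifier (assumptions)
vocab = {"rally", "march", "police"}
always_majority = lambda batch: ["rally/demonstration" for _ in batch]

labels = predict_or_none(
    ["", "avocados and grapefruits", "a march downtown"],
    vocab,
    always_majority,
)
print(labels)  # [None, None, 'rally/demonstration']
```

Applying the same check in front of all three classifiers would make form, target, and issue behave the same way on blank or out-of-vocabulary input.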

erleholgersen changed the title ~~Closed-ended classifiers: Inconsistency in handing blank strings~~ Closed-ended classifiers: Inconsistency in handling blank strings May 30, 2017
