Respect tag rules for custom tags #1155

tony-scio · 2024-10-23T00:43:26Z

I'd like to be able to define custom tags that are applied only if their allowed parts of speech are respected. It looks like the plugin interface tries to support that, but it doesn't seem to be working. Wondering if this is a bug, feature request, or if there's another way to accomplish this. Here's what I tried:

nlp.plugin({
  tags: {
    Employee: {
      also: ['ProperNoun'],
      not: ['Verb', 'Adverb', 'Adjective'],
    },
  },
  words: {
    will: 'Employee',
  },
})

nlp('Will is an employee').match('#Employee') // Matches like I expect.
nlp('I will go to the store').match('#Employee') // Matches, but I expected not to match since "will" is used as a verb and "Employee" is defined not to be a verb.

spencermountain · 2024-10-27T15:03:15Z

hey Tony, you're right - there are a number of things going wrong with this example. Apologies for the confusion.

Let me look at fixing the default 'will is' tagging. Your plugin looks correct. You may be interested in the freeze() feature, to coerce all appearances of 'will' to your custom tag. There's a lot of gross overlap when the user-defined lexicon gets beaten up by downstream tagging changes. The freeze feature is supposed to remedy this.

Will put this on the pile, for the next release. thanks
cheers

tony-scio · 2024-10-27T17:43:30Z

Thanks! If you point me in the right direction, I could also take a stab at a PR.

Regarding freeze, I see how it can enforce my custom lexicon, but don't see how I can tell it to enforce the default lexicon and apply a custom one only if it fits (at least in a way that'd work against multiple docs). If you wouldn't mind, could you write a couple of lines that'd use freeze to make the above example work on two different docs?

spencermountain · 2024-10-31T16:09:55Z

yeah, I'm torn about this too, and the lex vs freeze thing has a lot of gross mystery to it.
I would use the default lexicon, and clean up any tagging issues with match().tag() statements.

// the second argument to match() selects the [bracketed] capture group
doc.match('#ProperNoun [will] #Infinitive', 0).tag('Verb')
doc.match('[will] #Copula', 0).tag('Employee')

that way you're always in control over what you get, and there's no fancy-biz. (or at least, less!)
cheers
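
To address the earlier question about running this against more than one doc, here is a minimal sketch that wraps the two rules above in a reusable function. The name tagWill is hypothetical and not part of compromise. It assumes the Employee tag is still registered through the plugin, but that the `words: { will: 'Employee' }` entry is dropped so the default lexicon tags "will" first. Whether these two patterns cover every sentence you care about is untested here.

```javascript
// Hypothetical helper, not part of compromise.
// `doc` is assumed to be a compromise document, e.g. the result of nlp(text).
function tagWill(doc) {
  // "will" followed by an infinitive is the auxiliary verb
  doc.match('#ProperNoun [will] #Infinitive', 0).tag('Verb')
  // "will" followed by a copula ("is", "was") is treated as the custom tag
  doc.match('[will] #Copula', 0).tag('Employee')
  return doc
}

// Assumed usage, with compromise loaded as `nlp`:
//   const texts = ['Will is an employee', 'I will go to the store']
//   const docs = texts.map((t) => tagWill(nlp(t)))
//   docs.forEach((d) => console.log(d.has('#Employee')))
```

Because the function only touches the doc it is given, the same rules apply identically to every doc, and the order of the two match().tag() calls stays under your control.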

spencermountain added the bug and tagger labels · Oct 27, 2024
