Respect tag rules for custom tags #1155

Open
tony-scio opened this issue Oct 23, 2024 · 3 comments

@tony-scio
Contributor

I'd like to be able to define custom tags that are applied only if their allowed parts of speech are respected. It looks like the plugin interface tries to support that, but it doesn't seem to be working. Wondering if this is a bug, feature request, or if there's another way to accomplish this. Here's what I tried:

nlp.plugin({
  tags: {
    Employee: {
      also: ['ProperNoun'],
      not: ['Verb', 'Adverb', 'Adjective'],
    },
  },
  words: {
    will: 'Employee',
  },
})

nlp('Will is an employee').match('#Employee') // Matches like I expect.
nlp('I will go to the store').match('#Employee') // Matches, but I expected not to match since "will" is used as a verb and "Employee" is defined not to be a verb.

@spencermountain
Owner

hey Tony, you're right - there's a number of things going wrong with this example. Apologies for the confusion.

Let me look at fixing the default tagging of 'will is'. Your plugin looks correct. You may be interested in the freeze() feature, which is meant to coerce all 'will' appearances to your own 'will' entry. There's a lot of gross overlap when the user-defined lex gets beaten up by downstream tagging changes, and freeze is supposed to remedy this.

Will put this on the pile, for the next release. thanks
cheers

@tony-scio
Contributor Author

tony-scio commented Oct 27, 2024

Thanks! If you point me in the right direction, I could also take a stab at a PR.

Regarding freeze, I see how it can enforce my custom lexicon, but don't see how I can tell it to enforce the default lexicon and apply a custom one only if it fits (at least in a way that'd work against multiple docs). If you wouldn't mind, could you write a couple of lines that'd use freeze to make the above example work on two different docs?

@spencermountain
Owner

yeah, i'm torn about this too, and the lex vs freeze thing has a lot of gross mystery to it.
I would use the default lex, and clean up any tagging issues with match().tag() statements.

// the second argument to match() selects a capture group; 0 is the bracketed [will]
doc.match('#ProperNoun [will] #Infinitive', 0).tag('Verb')
doc.match('[will] #Copula', 0).tag('Employee')

that way you're always in control over what you get, and there's no fancy-biz. (or at least, less!)
cheers
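
To reuse the match().tag() approach across several docs, the two rules could be wrapped in a small function and called on each doc in turn. This is only a sketch: `tagWill` is a hypothetical helper name, the patterns are the ones suggested above, and it assumes the `Employee` tag has already been registered (for example with the `tags` block from the original plugin, without the `words` entry). Whether each pattern fires depends on how the default tagger labels the surrounding words.

```javascript
// Hypothetical helper (not part of compromise): applies the contextual
// rules to any doc returned by nlp(text), then hands the same doc back.
function tagWill(doc) {
  // "will" between a proper noun and an infinitive is treated as a verb
  doc.match('#ProperNoun [will] #Infinitive', 0).tag('Verb')
  // "will" directly before a copula ("Will is ...") gets the custom tag
  doc.match('[will] #Copula', 0).tag('Employee')
  return doc
}

// Usage, assuming `nlp` has been imported from 'compromise':
// const texts = ['Will is an employee', 'I will go to the store']
// const docs = texts.map((text) => tagWill(nlp(text)))
// docs.forEach((doc) => console.log(doc.text(), doc.has('#Employee')))
```

Keeping the rules in one function means the default lexicon stays untouched, and every doc gets the same corrections in the same order.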
