Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Is it expected behavior that a Sentence can have two fields with the same name? #280

Open
BeckySharp opened this issue Apr 10, 2021 · 7 comments

Comments

@BeckySharp
Copy link
Contributor

see title. This is currently possible, if not desired I can add a require or something

@myedibleenso
Copy link
Member

myedibleenso commented Jun 22, 2021

Hmm...

So, for example, chunks:

Sentence(
  numTokens=9,
  fields=Seq(
    TokensField(
      name="words",
      tokens=Seq("he", "wore", "an", "itsy", "bitsy", "teeny", "weeny", "yellow", "speedo"),
    ),
    TokensField(
      name="chunks",
      tokens=Seq("O", "O", "O", "B-ADJP", "I-ADJP", "I-ADJP",  "I-ADJP",  "I-ADJP",  "I-ADJP", "O"),
    ),
    TokensField(
      name="chunks",
      tokens=Seq("O", "O", "O", "B-NP", "I-NP", "I-NP",  "I-NP",  "I-NP",  "I-NP", "I-NP"),
    ),
    TokensField(
      name="chunks",
      tokens=Seq("B-NP", "O", "B-NP", "I-NP", "I-NP",  "I-NP",  "I-NP",  "I-NP", "I-NP"),
    ),
    TokensField(
      name="chunks",
      tokens=Seq("O", "B-DP", "I-DP", "I-DP", "I-DP",  "I-DP",  "I-DP",  "I-DP", "I-DP"),
    ),
    TokensField(
      name="chunks",
      tokens=Seq("O", "B-VP", "I-VP", "I-VP", "I-VP",  "I-VP",  "I-VP",  "I-VP", "I-VP"),
    )
  )
)

... is currently possible, right?

@marcovzla
Copy link
Contributor

There is nothing stopping you from making an OdinsonDocument like that, but why would you want to index that? what behavior do you expect?

@BeckySharp
Copy link
Contributor Author

i would expect odinson to warn me at index time at least that i'm going to be... overwriting?? things? doing something weird/bad or something? idk...
i'm not saying that it should be valid, rather I am asking if we should let people make the "mistake" or if we should issue a warning / error

@myedibleenso
Copy link
Member

I may not have understood the original issue. I wanted to share an example of what I thought this issue describes to anchor the discussion.

why would you want to index that? what behavior do you expect?

I'd like to be able to match different types of phrases (chunks in this example) in different rules. This example is meant to illustrate that these overlap or one might subsume another. An alternative to this is to name each of these fields differently (ex. nps, vps, etc.).

I don't want to end up with only the last entry with this name when indexing.

@myedibleenso
Copy link
Member

^I haven't tried indexing and querying that example, so I don't know what the current behavior is tbh.

@BeckySharp
Copy link
Contributor Author

BeckySharp commented Jun 22, 2021

@myedibleenso wanna make a unit test to see what currently happens, and then we can (a) decide what we think the behavior should be (should we allow it? should we disallow it?) and then we can (b) make sure we achieve that behavior?

@myedibleenso
Copy link
Member

Whatever behavior we decide we want, I agree that it should be tested.

Sure, I'll create a unit test as a starting point.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants