Suppress less informative extractions #43

Open
schmmd opened this issue Dec 19, 2012 · 3 comments

@schmmd
Member

schmmd commented Dec 19, 2012

Our original intention with Ollie was to find as many correct extractions as possible. Since each application has its own requirements, its developers could then write simple logic to keep the extractions they are interested in.

However, some extractions are often strictly less informative than others, and these are not useful for many applications. For example, you might have the following:

Superintendent Janet Robinson said Adam Lanza attended Sandy Hook elementary, although she could not remember the year.

(he, was a student at, Sandy Hook)
(he, was a student at, some point)
(he, was, a student)

The last extraction is strictly less informative than the first. In some applications (e.g. search) we may want all three so that we have results for more queries, but in others (e.g. document summarization) we don't want the third because it is redundant. Ollie should have an option to suppress strictly less informative extractions.
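One way such an option could work is sketched below. This is only an illustration under loud assumptions: `Triple`, `strictlyLessInformative`, and `suppress` are hypothetical names, not part of Ollie, and "strictly less informative" is approximated as "its words are a proper subset of another extraction's words", which is cruder than comparing real token spans.

```scala
// Hypothetical, simplified stand-in for an Ollie extraction (not the library's class).
final case class Triple(arg1: String, rel: String, arg2: String) {
  // All lower-cased words in the triple, used as a crude proxy for its content.
  def words: Set[String] =
    Seq(arg1, rel, arg2)
      .flatMap(_.toLowerCase.split("\\s+"))
      .filter(_.nonEmpty)
      .toSet
}

object Suppress {
  // Assumption: a is strictly less informative than b when
  // a's words are a proper subset of b's words.
  def strictlyLessInformative(a: Triple, b: Triple): Boolean =
    a.words != b.words && a.words.subsetOf(b.words)

  // Keep only extractions that are not strictly less informative than another one.
  def suppress(extrs: Seq[Triple]): Seq[Triple] =
    extrs.filterNot(a => extrs.exists(b => strictlyLessInformative(a, b)))
}
```

Applied to the three extractions above, this keeps the first two and drops (he, was, a student), since every one of its words also appears in the first extraction.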

@ghost assigned schmmd Dec 19, 2012
@niranjanb
Member

Yes, I think it would be useful to provide a means for suppressing strictly less informative extractions.

In addition to having an option for filtering, we could provide access to the logic that determines whether an extraction is strictly less informative than another extraction from the same sentence.

def subsumes(other: OllieExtraction): Boolean

Going on a bit of a tangent and expanding a little on this idea, the following extraction comparison methods might be useful:

def overlapsWith(other: OllieExtraction): Boolean

def sharesArg1(other: OllieExtraction): Boolean

def sharesArg2(other: OllieExtraction): Boolean

def sharesArg(other: OllieExtraction): Boolean

Perhaps these capabilities do not belong in the core Ollie library but in a separate ollie-utils library that provides mechanisms to manipulate and transform extractions (argument and relation string normalization, and equality under these transformations).

Applications can (and in many cases need to) write their own logic to do these things but having a default implementation that comes with the Ollie library sounds useful to me.
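To make "equality under these transformations" concrete, here is a minimal sketch. The normalization itself (trim, lower-case, collapse whitespace) is purely an assumption for illustration, and `Normalize`, `norm`, and `sameArg` are hypothetical names that do not exist in Ollie.

```scala
object Normalize {
  // Assumed normalization: trim, lower-case, collapse runs of whitespace.
  def norm(s: String): String =
    s.trim.toLowerCase.replaceAll("\\s+", " ")

  // Two argument strings are considered equal if their normalized forms match.
  def sameArg(a: String, b: String): Boolean =
    norm(a) == norm(b)
}
```

A real utility would likely need more than this (for example lemmatization or determiner stripping), which is exactly the kind of application-specific choice discussed here.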

@schmmd
Member Author

schmmd commented Dec 26, 2012

Niranjan, just compare the intervals if you want this functionality, e.g. extr1.arg1.span overlaps extr2.arg1.span or extr1.span overlaps extr2.span. Or you could compare the nodes: two arguments are disjoint when (extr1.arg1.nodes intersect extr2.arg1.nodes) == Set.empty, and two whole extractions are disjoint when (extr1.nodes intersect extr2.nodes) == Set.empty (the parentheses matter, since == binds tighter than intersect). If you want to check whether two extractions have the same arg1, you just need extr1.arg1 == extr2.arg1. Adding methods for simple operations that can already be expressed succinctly only adds a layer of confusion (what does the method do, and which method should I use?).
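For readers without the Ollie classes at hand, the comparisons described in this comment can be tried against a small stand-in model. Everything here is hypothetical: `Span`, `Part`, `Extr`, the `overlaps` method, and the token indices are made up for the sketch and are not Ollie's actual types or values.

```scala
// Hypothetical minimal model for illustration; the real Ollie classes differ.
final case class Span(start: Int, end: Int) { // half-open token interval [start, end)
  def overlaps(that: Span): Boolean = start < that.end && that.start < end
}

// One part of an extraction: its text, its token span, and its token (node) indices.
final case class Part(text: String, span: Span, nodes: Set[Int])

final case class Extr(arg1: Part, rel: Part, arg2: Part) {
  def nodes: Set[Int] = arg1.nodes ++ rel.nodes ++ arg2.nodes
}

object CompareDemo {
  // Token indices are invented for the example.
  val extr1 = Extr(
    Part("he", Span(0, 1), Set(0)),
    Part("was a student at", Span(1, 5), Set(1, 2, 3, 4)),
    Part("Sandy Hook", Span(5, 7), Set(5, 6)))
  val extr2 = Extr(
    Part("he", Span(0, 1), Set(0)),
    Part("was", Span(1, 2), Set(1)),
    Part("a student", Span(2, 4), Set(2, 3)))

  // Same arg1: plain case-class equality.
  val sameArg1: Boolean = extr1.arg1 == extr2.arg1
  // Span comparison between the two arg2 parts.
  val arg2SpansOverlap: Boolean = extr1.arg2.span overlaps extr2.arg2.span
  // Parentheses are required: == binds tighter than the alphanumeric infix `intersect`.
  val nodesDisjoint: Boolean = (extr1.nodes intersect extr2.nodes) == Set.empty
}
```

Each check is a one-liner at the call site, which is the point being made: dedicated wrapper methods would add little beyond a name.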

@schmmd
Member Author

schmmd commented Dec 26, 2012

But I'd love to have a normalization routine for relations or arguments. If you have some, let's talk about it sometime.
