Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expose Tantivy's DisjunctionMaxQuery #244

Merged
merged 2 commits into from
Apr 24, 2024

Conversation

aecio
Copy link
Contributor

@aecio aecio commented Apr 22, 2024

Added the Query.disjunction_max_query method and corresponding tests.

Expected usage:

query1 = Query.term_query(index.schema, "title", "sea") 
query2 = Query.term_query(index.schema, "title", "mice") 
query = Query.disjunction_max_query([query1, query2], tie_breaker=0.5)

result = index.searcher().search(query, 10)

See also issue #20.

@aecio aecio force-pushed the expose-disjunction-max-query branch from a5e1f75 to f9afdfa Compare April 22, 2024 18:47
@alex-au-922
Copy link
Contributor

Seems our function signatures for both in Rust and in Python (BooleanQuery and DistjunctionMaxQuery) are quite different. Would you mind reviewing my commit as well #243 so that we might come up with a more consistent function signature?

@aecio
Copy link
Contributor Author

aecio commented Apr 23, 2024

It looks like the main difference is the use of PyList vs. Vec for the subqueries parameter. I can try to change it to use Vec instead if you think it's better. I assume PyO3 checks the types automatically in that case, so it's not necessary to check the types manually in the Rust implementation? I'll rebase and try to make these changes tomorrow.

@alex-au-922
Copy link
Contributor

alex-au-922 commented Apr 23, 2024

I have no preference actually, perhaps @cjrh may provide us some feedback? Whether you think PyList or Vec is better for the project. I can submit another PR for the BooleanQuery.

@cjrh
Copy link
Collaborator

cjrh commented Apr 23, 2024

My preference is to use the Rust types like Vec in general, and take advantage of PyO3's automatic conversion, like in the BooleanQuery PR. Thanks @alex-au-922 for having a look at this PR too.

@@ -197,6 +197,10 @@ class Query:
def all_query() -> Query:
pass

@staticmethod
def disjunction_max_query(subqueries: list[Query], tie_breaker: Optional[float]) -> Query:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def disjunction_max_query(subqueries: list[Query], tie_breaker: Optional[float]) -> Query:
def disjunction_max_query(subqueries: list[Query], tie_breaker: Optional[float] = None) -> Query:

[nit] According to PyO3 doc, tie_breaker would be automatically set to None (not tested.) If so, specifying that might be useful for users to save typing a bit.

As a convenience, functions without a #[pyo3(signature = (...))] option will treat trailing Option arguments as having a default of None.

https://pyo3.rs/v0.21.0/function/signature#trailing-optional-arguments

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does seem to set None automatically. Currently both of the following calls work:

query = Query.disjunction_max_query([query1, query2])
query = Query.disjunction_max_query([query1, query2], tie_breaker=0.5)

In the latest commit, I added a default value of None to the Python signature. Does this address the comment or did I miss the point?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for including my suggestion in the commit!

@aecio aecio force-pushed the expose-disjunction-max-query branch from f9afdfa to 71df684 Compare April 23, 2024 16:53
@aecio
Copy link
Contributor Author

aecio commented Apr 23, 2024

Hi all, thanks for the review!
I just pushed a new commit with the following Python signatures and using Rust's Vec<Query>:

    @staticmethod
    def boolean_query(subqueries: Sequence[tuple[Occur, Query]]) -> Query:
        pass

    @staticmethod
    def disjunction_max_query(subqueries: Sequence[Query], tie_breaker: Optional[float] = None) -> Query:
        pass
    #[staticmethod]
    pub(crate) fn disjunction_max_query(
        subqueries: Vec<Query>,
        tie_breaker: Option<&PyFloat>,
    ) -> PyResult<Query>

Let me know if this addresses the comments.

@cjrh
Copy link
Collaborator

cjrh commented Apr 24, 2024

I'm fixing the CI issue here: #251

When that's merged these tests can rerun.

@cjrh
Copy link
Collaborator

cjrh commented Apr 24, 2024

I approved the PR, just waiting to get a clean test run before merging.

@wallies wallies merged commit deb88cc into quickwit-oss:master Apr 24, 2024
@aecio
Copy link
Contributor Author

aecio commented Apr 24, 2024

Thank you all for the reviews and merging.

@cjrh
Copy link
Collaborator

cjrh commented Apr 24, 2024

@aecio No, thank you for your contribution 👏🏼 🙇🏼

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants