Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expose Tantivy's TermSetQuery #249

Merged
merged 1 commit into from
Apr 26, 2024
Merged

Conversation

aecio
Copy link
Contributor

@aecio aecio commented Apr 23, 2024

This PR adds the Query.term_set_query method and corresponding tests.

Expected usage:

terms = ["old", "man"]
query = Query.term_set_query(index.schema, "title", terms)
result = index.searcher().search(query, 10)

The new term_set_query method has a signature similar to the existing term_query:

@staticmethod
def term_query(schema: Schema, field_name: str, field_value: Any, index_option: str = "position") -> Query:
    pass

@staticmethod
def term_set_query(schema: Schema, field_name: str, field_values: Sequence[Any]) -> Query:
    pass

See also issue #20.

@aecio aecio force-pushed the expose-term-set-query branch from fb44194 to 90619e4 Compare April 24, 2024 02:19
src/query.rs Outdated
field_name: &str,
field_values: Vec<&PyAny>,
) -> PyResult<Query> {
let mut terms: Vec<tv::Term> = Vec::new();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we know how long field_values is, perhaps we can use with_capacity? Then we won't need multiple allocations in the loop. Just an idea.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, makes total sense. I wasn't aware of this API. I just updated and rebased the PR with the latest master.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for being late to the party. More often than not, it is preferable to use the built-in iterator adaptors with automatically do things like pre-allocate collections if possible, i.e. in this case collect should achieve the same effect but avoid adding the loop and allocation noise.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Specifically, something like

let terms = field_values
  .into_iter()
  .map(|field_value| {
    make_term(&schema.inner, field_name, &field_value).map_err(|err| {
      exceptions::PyTypeError::new_err(format!(
          "Cannot create a term from the value: {}",
          field_value
      ))
    })
  })
  .collect::<Result<Vec<_>, _>>()?;

should work and be just as efficient.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No worries, @adamreichold. I'm very new to Rust and I had tried a similar approach using iterator + map earlier but wasn't able to make the compiler happy. But with your snippet, I could make it work and was able to further simplify the code. It turns out that the original exception from make_term() is expressive enough so I just removed the map_errwhich was ignoring the argument |err| and causing a warning. Does it look better now?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Much better. Thanks for going the extra mile!

@cjrh cjrh added the feature-parity Feature parity with upstream tantivy label Apr 25, 2024
@aecio aecio force-pushed the expose-term-set-query branch 2 times, most recently from c0a3281 to 452616d Compare April 25, 2024 17:36
@aecio aecio force-pushed the expose-term-set-query branch from 452616d to d35fe09 Compare April 25, 2024 21:03
@cjrh cjrh merged commit 8216f17 into quickwit-oss:master Apr 26, 2024
10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature-parity Feature parity with upstream tantivy
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants