tools(similarity): filter out low scoring chunks #144

Closed
matoushavlena opened this issue Nov 5, 2024 · 1 comment
Labels
enhancement New feature or request

Comments

@matoushavlena
Contributor

The Wikipedia tool might be returning chunks that have low similarity scores and are therefore not relevant or useful.

Similarly to minPageNameSimilarity, we would like to introduce a threshold that filters low-scoring chunks/documents out of the similarity tool's results. The initial value could be 0.25, but some exploration might be needed to decide on the right threshold.
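
A minimal sketch of what such a threshold filter could look like. The `ScoredChunk` type and the `filterByMinScore` helper are hypothetical names used only for illustration, not the framework's actual API, and the sketch assumes that a higher score means a closer match:

```typescript
// Hypothetical shape of a scored chunk; not the framework's actual type.
interface ScoredChunk {
  text: string;
  score: number; // assumed: higher means more similar
}

// Keep only the chunks whose score is at or above the threshold.
// The default of 0.25 is the initial value proposed in this issue.
function filterByMinScore(chunks: ScoredChunk[], minScore = 0.25): ScoredChunk[] {
  return chunks.filter((chunk) => chunk.score >= minScore);
}

const sampleChunks: ScoredChunk[] = [
  { text: "Highly relevant paragraph", score: 0.82 },
  { text: "Borderline paragraph", score: 0.25 },
  { text: "Unrelated paragraph", score: 0.1 },
];

// With the default threshold, only the last chunk is dropped.
const keptChunks = filterByMinScore(sampleChunks);
```

Making the threshold a parameter with a default would keep the value easy to tune while the right cutoff is still being explored.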

When no documents are returned, the underlying tool (Wikipedia in this case) should return an LLM-friendly message, such as "No results were found. Try to reformat your query." This message already exists for the Wikipedia tool when no relevant pages are returned, so we need to keep it DRY and reuse it rather than duplicate it.
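
One way to keep the message DRY, sketched under the assumption that tool results can be reduced to a list of strings. `NO_RESULTS_MESSAGE` and `formatToolOutput` are hypothetical names for illustration, not existing framework exports:

```typescript
// Single shared definition of the LLM-friendly fallback message (hypothetical constant).
const NO_RESULTS_MESSAGE = "No results were found. Try to reformat your query.";

// Hypothetical helper: every code path that can end up with no results
// (no relevant pages, or all chunks filtered out) goes through the same function.
function formatToolOutput(results: string[]): string {
  if (results.length === 0) {
    return NO_RESULTS_MESSAGE;
  }
  return results.join("\n\n");
}

const emptyOutput = formatToolOutput([]);
const filledOutput = formatToolOutput(["First chunk", "Second chunk"]);
```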

@pilartomas
Contributor

The scoring is provider-specific, so the filter needs to reflect that by accepting a predicate; a single numeric value won't be sufficient.
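
A rough sketch of the predicate-based variant. All names here (`ScoredDocument`, `ScorePredicate`, `filterByScore`, `atLeast`, `atMost`) are hypothetical and only illustrate the idea; the two example providers are assumptions, one returning similarity scores (higher is better) and one returning distances (lower is better):

```typescript
// Hypothetical types; not the framework's actual API.
interface ScoredDocument<T> {
  document: T;
  score: number;
}

// The caller decides what counts as a good enough score.
type ScorePredicate = (score: number) => boolean;

function filterByScore<T>(
  docs: ScoredDocument<T>[],
  keep: ScorePredicate,
): ScoredDocument<T>[] {
  return docs.filter((doc) => keep(doc.score));
}

// Predicate factory for providers where a higher score is better (assumed).
const atLeast = (min: number): ScorePredicate => (score) => score >= min;
// Predicate factory for providers where a lower score (distance) is better (assumed).
const atMost = (max: number): ScorePredicate => (score) => score <= max;

const similarityDocs = [
  { document: "a", score: 0.9 },
  { document: "b", score: 0.1 },
];
const distanceDocs = [
  { document: "c", score: 0.2 },
  { document: "d", score: 1.7 },
];

const keptBySimilarity = filterByScore(similarityDocs, atLeast(0.25));
const keptByDistance = filterByScore(distanceDocs, atMost(0.5));
```

The plain numeric threshold then becomes just one possible predicate, so a default such as `atLeast(0.25)` could still be offered for providers where it makes sense.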

The "No results were found." message is indeed used by the Wikipedia tool output, but the runner checks the output for emptiness and uses BeeToolNoResultsPrompt instead, so I wonder if this isn't solved already.

@Tomas2D thoughts?
