tools(similarity): filter out low scoring chunks #144

matoushavlena · 2024-11-05T12:01:18Z

The Wikipedia tool may return chunks with low similarity scores, which are therefore not relevant or useful.

Similarly to minPageNameSimilarity, we would like to introduce a threshold that filters low-scoring chunks/documents out of the similarity tool's results. The initial value could be 0.25, but some exploration might be needed to decide on the right threshold.

When no documents are returned, the underlying tool (Wikipedia in this case) should return an LLM-friendly message, such as "No results were found. Try to reformat your query." This message already exists for the Wikipedia tool when no relevant pages are returned. We need to keep it DRY.


pilartomas · 2024-11-05T13:16:48Z

The scoring is provider-specific, so the filter needs to reflect that by accepting a predicate; a single numeric value won't be sufficient.
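A minimal sketch of what a predicate-based filter could look like. The names here (`ScoredDocument`, `ScorePredicate`, `minScore`, `maxDistance`, `filterDocuments`) are hypothetical and only illustrate the idea; they are not the framework's actual API. The point is that a provider where higher scores are better and a provider where lower scores are better can both be expressed with the same filter.

```typescript
// Hypothetical shape for illustration; not the framework's actual types.
interface ScoredDocument {
  text: string;
  score: number;
}

// A predicate decides whether a single document is kept.
type ScorePredicate = (doc: ScoredDocument) => boolean;

// For providers where a higher score means more similar (e.g. cosine similarity).
const minScore =
  (threshold: number): ScorePredicate =>
  (doc) =>
    doc.score >= threshold;

// For providers where a lower score means more similar (e.g. a distance metric).
const maxDistance =
  (threshold: number): ScorePredicate =>
  (doc) =>
    doc.score <= threshold;

// Keeps only the documents accepted by the predicate.
function filterDocuments(docs: ScoredDocument[], keep: ScorePredicate): ScoredDocument[] {
  return docs.filter(keep);
}

const docs: ScoredDocument[] = [
  { text: "Highly related chunk.", score: 0.82 },
  { text: "Unrelated chunk.", score: 0.11 },
];

// Using the 0.25 starting value suggested above for a higher-is-better provider.
const relevant = filterDocuments(docs, minScore(0.25));
```

A plain numeric threshold then becomes just one predicate among others, so callers with simple needs lose nothing.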

The "No results were found." message is indeed used in the Wikipedia tool's output, but the runner checks the output for emptiness and uses BeeToolNoResultsPrompt instead, so I wonder whether this isn't solved already.
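For reference, a rough sketch of the kind of emptiness check being discussed, with the message defined in one place so it stays DRY. `NO_RESULTS_MESSAGE` and `formatToolOutput` are made-up names for illustration; this is not how the runner or BeeToolNoResultsPrompt is actually implemented.

```typescript
// Hypothetical constant and helper, for illustration only.
const NO_RESULTS_MESSAGE = "No results were found. Try to reformat your query.";

// Returns the shared fallback message when nothing survived filtering,
// otherwise joins the remaining chunks into a single output string.
function formatToolOutput(chunks: string[]): string {
  return chunks.length === 0 ? NO_RESULTS_MESSAGE : chunks.join("\n\n");
}
```

If the runner already performs a check like this on every tool's output, the individual tools would not need their own copy of the message.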

@Tomas2D thoughts?


matoushavlena assigned pilartomas Nov 5, 2024

Tomas2D added the enhancement New feature or request label Nov 6, 2024

Tomas2D added a commit that referenced this issue Jan 2, 2025

feat(tools): add minScore filter condition for SimilarityTool

6598906

Ref: #144 Signed-off-by: Tomas Dvorak <[email protected]>

Tomas2D assigned Tomas2D and unassigned pilartomas Jan 2, 2025

Tomas2D mentioned this issue Jan 2, 2025

feat(tools): add minScore filter condition for SimilarityTool #266

Merged

Tomas2D added a commit that referenced this issue Jan 3, 2025

feat(tools): add minScore filter condition for SimilarityTool (#266)

e9d5d4f

Ref: #144

Tomas2D closed this as completed Jan 6, 2025
