error while using SemanticChunking #1722

mstrYoda · 2025-01-08T12:16:53Z

Using SemanticChunking raises the following error: ValueError: Failed to load embeddings via SentenceTransformerEmbeddings: SentenceTransformer.__init__() got an unexpected keyword argument 'similarity_threshold'

This happens because the chonkie library has no similarity_threshold param; instead (I guess) it uses similarity_window.

class SemanticChunking(ChunkingStrategy):
    """Chunking strategy that splits text into semantic chunks using chonkie"""

    def __init__(
        self, embedder: Optional[Embedder] = None, chunk_size: int = 5000, similarity_threshold: Optional[float] = 0.5
    ):
        self.embedder = embedder or OpenAIEmbedder(model="text-embedding-3-small")
        self.chunk_size = chunk_size
        self.similarity_threshold = similarity_threshold
        self.chunker = SemanticChunker(
            embedding_model=self.embedder.model,  # type: ignore
            chunk_size=self.chunk_size,
            similarity_threshold=self.similarity_threshold,  # this line raises the error
        )
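
One plausible reading of the error message, assuming the chunker forwards keyword arguments it does not recognise to the embedding backend: the unknown similarity_threshold ends up in SentenceTransformer.__init__(), which rejects it, and the failure is re-raised as a ValueError. The sketch below reproduces that mechanism with stand-in classes only. StubTransformer, StubChunker and try_build are hypothetical names and are not chonkie or sentence-transformers code.

```python
class StubTransformer:
    """Hypothetical stand-in for an embedding backend with a strict constructor."""

    def __init__(self, model_name: str):
        self.model_name = model_name


class StubChunker:
    """Hypothetical stand-in for a chunker that forwards unknown kwargs.

    Assumption: anything the chunker does not recognise is passed on to the
    embedding backend, which is why the backend's __init__ shows up in the error.
    """

    def __init__(self, embedding_model: str, chunk_size: int = 512,
                 threshold: float = 0.5, **kwargs):
        self.chunk_size = chunk_size
        self.threshold = threshold
        try:
            # Unrecognised keyword arguments travel down to the backend here.
            self.backend = StubTransformer(embedding_model, **kwargs)
        except TypeError as exc:
            raise ValueError(
                f"Failed to load embeddings via StubTransformer: {exc}"
            ) from exc


def try_build(**kwargs) -> str:
    """Return 'ok' if the chunker can be built, else the error text."""
    try:
        StubChunker("some-model", **kwargs)
        return "ok"
    except ValueError as exc:
        return str(exc)


print(try_build(threshold=0.5))             # recognised name, consumed by the chunker
print(try_build(similarity_threshold=0.5))  # unknown name, rejected by the backend
```

If this reading is right, the parameter name is the only problem: the value itself never reaches the chunker's similarity logic.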

manthanguptaa · 2025-01-08T12:58:15Z

Hey @mstrYoda, good catch. The correct param here is threshold. I will make a fix.
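
A minimal sketch of what such a fix could look like, assuming only what is said above: the chunker expects threshold, while the wrapper keeps its public similarity_threshold argument. StubSemanticChunker is a hypothetical stand-in so the example runs without chonkie installed, the embedder is reduced to a plain model-name string, and accepts_param is an illustrative helper. This is not the actual patch.

```python
import inspect
from typing import Optional


class StubSemanticChunker:
    """Hypothetical stand-in for the chunker; only the parameter names matter."""

    def __init__(self, embedding_model: str, chunk_size: int = 512,
                 threshold: float = 0.5):
        self.embedding_model = embedding_model
        self.chunk_size = chunk_size
        self.threshold = threshold


class SemanticChunking:
    """Wrapper that keeps `similarity_threshold` as its own public argument."""

    def __init__(self, embedding_model: str = "text-embedding-3-small",
                 chunk_size: int = 5000,
                 similarity_threshold: Optional[float] = 0.5):
        self.chunk_size = chunk_size
        self.similarity_threshold = similarity_threshold
        self.chunker = StubSemanticChunker(
            embedding_model=embedding_model,
            chunk_size=self.chunk_size,
            # Map the wrapper's name onto the name the chunker accepts.
            threshold=self.similarity_threshold,
        )


def accepts_param(cls: type, name: str) -> bool:
    """Check whether a class constructor declares a parameter with this name."""
    return name in inspect.signature(cls.__init__).parameters


print(accepts_param(StubSemanticChunker, "threshold"))
print(accepts_param(StubSemanticChunker, "similarity_threshold"))
print(SemanticChunking(similarity_threshold=0.7).chunker.threshold)
```

Running the same inspect.signature check against the installed chunker class is a quick way to confirm the parameter name for a given library version before relying on it.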
