Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: EmbeddingsBuilder redesign #52

Open
11 of 14 tasks
garance-buricatu opened this issue Oct 8, 2024 · 1 comment
Open
11 of 14 tasks

feat: EmbeddingsBuilder redesign #52

garance-buricatu opened this issue Oct 8, 2024 · 1 comment
Assignees
Labels
design External API Design
Milestone

Comments

@garance-buricatu
Copy link
Contributor

garance-buricatu commented Oct 8, 2024

Feature Request

Any object that implements the trait Embedabble can be added to the embeddings builder and therefore can be embedded.

Motivation

Currently, embeddings builder is tightly coupled with the DocumentEmbedding type (only objects of this type can be used by the builder). Downsides: user needs to type cast their types to DocumentEmbedding and the structure of DocumentEmbedding is predetermined so the user needs to design their embedding code around this.

Proposal

  1. Define the trait Embedabble which contains a method fn embedabble() which returns a list of strings (the strings that should be embedded).
  2. Change the EmbeddingsBuilder to accept any type that implements the trait Embedabble.
  3. Write a custom derive macro that can be used on any struct which derives the Embedabble trait automatically.

Linked issues:

#60
#61
#62

Todo:

  • Give trait and macro same name like serde

  • implement Embeddable trait for many primitive types

  • macro fails when there are no embed tags

  • macro fails when embed with is empty string

  • need to handle error in custom embed function

  • review borrow and owning eveywhere. done for embeddings.rs

  • update docstrings. done for embeddings.rs

  • clean up embedding builder definition with traits maybe?

  • should build() return Embedding object or just string? return Embedding object - document string used for debugging

  • in memory vector store, rethink type of document D --> move to another branch

  • why is vector search of in memory vector store so bad? bug found

  • in memory store - implement from trait from both returns types of build into add_document

  • tests

  • make macro as feature flag

@garance-buricatu garance-buricatu self-assigned this Oct 8, 2024
@garance-buricatu garance-buricatu added the design External API Design label Oct 8, 2024
@garance-buricatu garance-buricatu changed the title refactor: EmbeddingsBuilder feat: EmbeddingsBuilder redesign Oct 8, 2024
@cvauclair
Copy link
Contributor

@garance-buricatu can you add some formatting and a little bit more context to this issue? Thank you!

@mateobelanger mateobelanger added this to the v0.5 milestone Nov 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
design External API Design
Projects
None yet
Development

No branches or pull requests

3 participants