Add Microservice Embedding Endpoints and embedding model setup #4558
For now, no auth is needed. The other tweak I'd make is to design this with a bit more abstraction. So instead of I think that gets you to:
Which returns:
And for
Which responds:
The nice thing about this is that if we want to embed something else later, we can, without having to tweak the API. Otherwise looks good to me.
Since opinions will need to be split, we will need a way to link the chunks to the parent opinion. I suggest doing so by adding a chunk number to opinion_id. E.g., if we split opinion_id 123 into three chunks, the chunks will have opinion_ids 123_1, 123_2, and 123_3 respectively. If you agree with this approach, do we still want the embedding API to be responsible for splitting opinions? If yes, we'll need to send opinion_id too. My suggestion would be to have the FastAPI endpoint do only the embedding work, with the logic: if 'query' -> CPU, if 'opinion' -> GPU. The splitting and sending of the embeddings + chunks to S3 would be handled by a separate instance/script.
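The suggested ID scheme can be sketched as a small helper. This is only an illustration of the naming convention proposed above; the function name is hypothetical:

```python
def chunk_ids(opinion_id: int, num_chunks: int) -> list[str]:
    """Build child IDs for the chunks of one opinion, e.g. 123 -> 123_1, 123_2, ..."""
    return [f"{opinion_id}_{i}" for i in range(1, num_chunks + 1)]

# Splitting opinion 123 into three chunks yields:
print(chunk_ids(123, 3))  # ['123_1', '123_2', '123_3']
```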
@legaltextai I was imagining that we could request all chunk embeddings for a single opinion at a time, so there’s no need to identify which opinion they belong to. I was reviewing Sentence Transformer encode_multi_process method which states:
So the idea for this endpoint is that it can receive a simple query
Or multiple opinion texts:
For simple queries, it would be as easy as just requesting one embedding for it and returning it. For multiple opinion texts, I was thinking that for each opinion text in the request, we could do something like:
This will only be possible if the order of the embeddings from the model response is the same as the list of sentences in the input, so we can map the chunks to their embeddings.
However, if the order of embeddings is not guaranteed, we won’t be able to map the chunks and their embeddings correctly, and we would need to look for a different strategy. Do you know how that works? Is the order of the sentence input the same as the embeddings?
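Assuming the model does return embeddings in the same order as the input sentences, the mapping described above reduces to zipping the chunk list with the embedding list. A minimal sketch, with a stand-in for the real model call (`fake_embed` and `map_chunks_to_embeddings` are hypothetical names):

```python
def fake_embed(texts):
    # Stand-in for model.encode(texts); assumed to return one vector per
    # input text, in the same order as the input list.
    return [[float(len(t))] for t in texts]

def map_chunks_to_embeddings(chunks):
    """Pair each chunk with its embedding, relying on order preservation."""
    embeddings = fake_embed(chunks)
    return [
        {"chunk_number": i, "chunk": chunk, "embedding": emb}
        for i, (chunk, emb) in enumerate(zip(chunks, embeddings), start=1)
    ]

rows = map_chunks_to_embeddings(["first chunk", "second"])
```

If order were not guaranteed, this zip-based mapping would silently pair chunks with the wrong vectors, which is why the question above matters.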
Yes, we could split opinion texts before sending them to the embedding API. However, the request body would be a bit more complex. I was thinking of abstracting that process and handling it within the embedding API as a preliminary step before requesting embeddings from the model. This way, if we need to adjust chunk sizes or anything else, we can tweak it within the microservice, and we won't have to worry about it in the Django command that requests embeddings.
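Keeping the splitting inside the microservice means the chunk size can live as a service setting. A minimal sketch of such a setting-driven splitter (fixed word windows; the name and default are hypothetical):

```python
def split_into_chunks(text: str, max_words: int = 350) -> list[str]:
    """Split text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]
```

Changing `max_words` in the microservice config would then be invisible to the Django command that requests embeddings.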
@legaltextai some questions about your approach, please.
Do you mean that within a single embedding request, you send multiple chunks in the body, and the embedding API returns a response with the embeddings for all the chunks? Or do you send multiple API requests, each with a single chunk? If it's the former, in your approach, how would the API handle embedding multiple chunks? Would it send one chunk at a time to the model, or send all the chunks at once? You shared some code that includes:
If this only supports one chunk at a time, I think we should explore encode_multi_process.

@mlissner Regarding the proposed architecture, the embedding endpoint will also split opinion texts into chunks, but it's possible that it can only perform embeddings as @legaltextai described. In that case, we may need to delegate chunk splitting to two different processes in CL: one for the initial batch process and another for real-time indexing via the ES signal processor, which is also a valid approach. My initial thought was that text splitting could be tied to the embedding endpoint settings, like chunk length. However, I'm open to your thoughts so we can decide whether we want to handle text splitting outside the embedding endpoint.
This is for a different embedding model (UAE), but this is how I used to embed those chunks in batches:
... then I would upsert the records to my Postgres database. We can replace the metadata with the text chunk and save into S3 instead. I am OK with any solution you decide to implement. I just thought having a single endpoint for embeddings only would be an easier approach.
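The batching pattern described above (the original snippet was not preserved in this thread) can be sketched generically. The helper names are hypothetical, and `embed_fn` stands in for a real model call such as `model.encode(batch)`, assumed to return one vector per input in order:

```python
def batched(items, batch_size):
    """Yield successive fixed-size batches from a list."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]

def embed_in_batches(chunks, embed_fn, batch_size=32):
    """Embed chunks batch by batch, concatenating results in input order."""
    vectors = []
    for batch in batched(chunks, batch_size):
        vectors.extend(embed_fn(batch))
    return vectors
```

The same loop body is where an upsert to Postgres, or a write of chunk + embedding to S3, would go.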
Sorry to take a minute to catch up on this conversation. I'm with Alberto that we should at least check out encode_multi_process. I also think it's better to do text splitting in the microservice for a few reasons:
I think the opposing view is simplicity in the microservice, but we can't get rid of the complexity of chunking; we can only decide where the complexity lives. If the microservice doesn't do chunking, then we do it in our batch script, which makes it part of CL itself. I'd much rather that complexity live outside of CL. The simpler we can make the monster that is CourtListener, the better. Thank you both!
Thank you, Mike. I will test encode_multi_process to see if it can organize the chunks in the correct order. Another angle to consider: is it only us who will be using this API endpoint? No external access?
At least not for now. Others can download the microservice and run it in their infra, but we won't make ours accessible.
Here is my first take on the FastAPI for queries and opinion batches. You may ignore that 'text' endpoint for now. The 'batch' endpoint breaks down opinions by sentences, not exceeding 350 words in total (approx. 512 tokens, the context limit for this embedding model). Please let me know what you think. Text pre-processing: as JSON is sensitive to text formatting, we need to decide whether to handle pre-processing within the microservice or in the client-side script. I've been testing with text from the CL website. We may need to add additional preprocessing steps.
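The sentence-based splitting described above (whole sentences packed greedily, capped at roughly 350 words) can be sketched as follows. This is an illustration, not the actual endpoint code; it uses a naive regex sentence split where a real implementation would use a proper sentence tokenizer:

```python
import re

def split_by_sentences(text: str, max_words: int = 350) -> list[str]:
    """Greedily pack whole sentences into chunks of at most max_words words.

    A single sentence longer than max_words becomes its own (oversized)
    chunk rather than being broken mid-sentence.
    """
    sentences = [
        s.strip()
        for s in re.split(r"(?<=[.!?])\s+", text.strip())
        if s.strip()
    ]
    chunks, current, count = [], [], 0
    for sentence in sentences:
        n = len(sentence.split())
        if current and count + n > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks
```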
Hm, that link doesn't work, but where's the code for the microservice?
Can you please try again? As the model is loaded into my GPU, can you please let me know when you are done testing?
I took a quick look. I don't know FastAPI, but the Swagger interface looked about right and the code seemed fine at a glance. I think you need to get that into a pull request format so that we can merge it into our own repo. To that end, I created freelawproject/inception. Some things to make it good:
Can you work on that? When that's done, what I think we can do while Alberto is out is create a PR for the many pieces of this puzzle, and when he gets back he can review them all. Sound good?
Sounds good. Is Doctor still a good template to follow?
Yeah, it should be great, thank you!
After #4557 is done, we should continue on this one.
1.2 Add Microservice Embedding Endpoints and embedding model setup
Implement API endpoint to:
a. Split texts into chunks
b. Request embeddings for the chunks
Consider:
Query embeddings request body:
Opinion lists request:
Query embeddings response:
Opinion lists response:
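The concrete request and response bodies were not preserved in this thread. A hypothetical pair, following the type-tagged abstraction discussed above (field names are illustrative only), might look like this for a query:

```json
{
  "type": "query",
  "text": "What is the standard for qualified immunity?"
}
```

and like this for an opinion list request and its chunked response:

```json
{
  "type": "opinion",
  "documents": [
    {"id": 123, "text": "Full opinion text..."}
  ]
}
```

```json
{
  "embeddings": [
    {
      "id": 123,
      "chunks": [
        {"chunk_number": 1, "chunk": "First chunk of text...", "embedding": [0.12, -0.03]}
      ]
    }
  ]
}
```

A `type` field like this is what lets new embeddable objects be added later without changing the endpoint's shape.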
Error Handling
Review and handle the error types the embedding endpoint can return so we can differentiate between transient errors (e.g., ConnectionError) and bad requests, with appropriate HTTP status codes, so we can decide on the client side whether to retry the request or not. For example:
400 Bad Request
422 Unprocessable Content
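The client-side retry decision described above can be sketched as a small helper. This is a hypothetical sketch, not CL code; the set of retryable statuses is an assumption based on common HTTP semantics:

```python
# Status codes assumed to indicate transient server/network conditions
# that are worth retrying (rate limiting and 5xx gateway errors).
TRANSIENT_STATUSES = {429, 500, 502, 503, 504}

def should_retry(status_code: int) -> bool:
    """Decide whether a failed embedding request is worth retrying.

    400 Bad Request and 422 Unprocessable Content indicate a problem with
    the request itself, so retrying with the same payload would fail again.
    """
    return status_code in TRANSIENT_STATUSES
```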
Authentication
Determine whether the microservice requires authentication. If it's for internal use only, authentication can be omitted (similar to Doctor).
The output for this issue would be:
A question here @mlissner to determine the type of authentication to use:
Will this microservice be offered as a service to other customers, requiring it to be exposed outside our internal network?