-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Generalize RAG + PDF Chat feature #641
base: main
Are you sure you want to change the base?
Conversation
🔥 So cool, will test it locally later, but just from looking at the demo, do you think there's an easy way to show an indicator of when a PDF is already uploaded and will be sent with the message ? |
I see. Let me think about it |
const loadingTask = pdfjsLib.getDocument({ data }); | ||
const pdf = await loadingTask.promise; | ||
|
||
const N_MAX_PAGES = 20; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems a bit low, 100 or 200 pages maybe?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. 100 or 200 would be bit slow to create embeddings on CPU using the current solution of transformers.js (I will provide actual benchmark numbers).
In this case, should I review/push for the community PR #646 that makes it possible to have an embedding endpoint for faster embeddings creation. We can still use tfjs embeddings (the current approach) for websearch, and use TEI-powered embeddings endpoint for PDF embeddings creation & possibly for assistants #639 (if users can upload documents). Wdyt ?
src/lib/components/UploadBtn.svelte
Outdated
/> | ||
<CarbonUpload class="mr-2 text-xs " /> Upload image | ||
<CarbonUpload class="mr-2 text-xs " /> | ||
{#if uploadPdfStatus === PdfUploadStatus.Uploaded} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes as already said, you probably want to display uploaded files somewhere in the UI.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
handled in #641 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The feature itself works super well, I'm super impressed by what it can do tbh 🔥
There's just a couple things that are not super clear to me around the user experience that I think could be improved:
- Why does it create an empty conversation on file upload ? Seems to me like it should show that a PDF has been uploaded client-side, and create the conversation with the PDF only when the first message is sent, the way we deal with images
- Would be cool to have an indicator in a conversation that shows that a PDF is available in a specific conversation. Like something at the top of the conversation that says
"${filename}.pdf is available in this conversation"
or something - It's not super clear from the UI perspective what happens when you upload a PDF to a conversation that already has one. I think it silently replaces the old PDF by the new one, but it's a bit confusing, I think we could just make it so that PDFs can only be uploaded at the beginning of a conversation, wdyt?
I also left a couple of fixes for type checking as comment/suggestions in the PR
context: string; | ||
} | ||
|
||
/* eslint-disable no-shadow */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/* eslint-disable no-shadow */ |
I think this can be removed by changing .eslintrc.cjs
- "no-shadow": ["error"],
+ "@typescript-eslint/no-shadow": "error",
@@ -0,0 +1,114 @@ | |||
<script lang="ts"> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it have been possible to reuse OpenWebSearchResults.svelte
and just make it a generic OpenResults
maybe ? Looking at the diff between the two it looks like the only difference is the button name ("PDF search"
and "Web search"
) and the input type
</script> | ||
|
||
<svelte:head> | ||
<title>{PUBLIC_APP_NAME}</title> | ||
</svelte:head> | ||
|
||
<ChatWindow | ||
on:message={(ev) => createConversation(ev.detail)} | ||
on:message={(ev) => createConversationWithMsg(ev.detail)} | ||
on:uploadpdf={(ev) => createConversationWithPdf(ev.detail)} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure why we create an empty chat with a document, seems to me like it should be done like for images, where the files are "stored" in the front-end and the conversation created along when the first message is sent ?
ba5f9b1
to
e9ffdab
Compare
Updates:
Screen.Recording.2024-01-09.at.1.45.02.PM.movWhat I'm working on now:
|
Thanks for working on this! One question I had is: what was your thought process for adding a new upload button for PDFs, versus using the existing drag-and-drop functionality that already exists for images? |
@wdhorton currently the UI might still evolve. For now, the reason for resuing the same upload btn (instead of adding a new upload btn) is: having two different upload btns makes the UI look cluttered, especially on smaller screens |
I have few questions:
|
* Generlize RAG * wip * fix casting
Update: this PR is getting big. Unfortunately, there is no other option (I think). The specific points are:
Should I open a PR for vectorDB support against this branch? wdyt @nsarrazin @gary149 |
Thanks for the quick response guys! |
@mishig25 What is your estimation for merging this in your opinion? |
Can't give exact date. But will do my best to merge it soon :) |
pdfSearchResults = await RAGs["pdfChat"].retrieveRagContext(conv, newPrompt, update); | ||
} | ||
|
||
messages[messages.length - 1].ragContext = pdfSearchResults; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This causes a bug with web search.
messages[messages.length - 1].ragContext = webSearchResults is being overridden with undefined if no pdf document has been uploaded by the user.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Submitted a PR to fix the issue #745
Merge. hire 10 more people to help. |
Will merge soon
Hiring 10 more people rarely results in 2x productivity (let alone 10x) |
It's a tough one to implement. I appreciate the work on this one. I had a couple of thoughts on this I'd like to share. First, I thought of a plugin-like system that is registered based on the file type, has Alternatively, it could also be a third party API - much like OpenAI function calling works - so that the responsibility is NOT on you but on the end user who chooses to deploy. It's great for developers but would probably be a pain for the those who just one to feel the power of deploying and has no use beyond (inspired by the shutdown story of banana.dev). I believe these would inherently fit better as the nature of open-source deployments is to customize. As such, there is an infinite number of use cases and solutions... PDFs, images, audio files, web search as |
Can you ideally skip apache2? Or check error logs from it to see if anything is being logged? Could be that some headers are lost or file being too big and blocked. What does your network devtools say on this upload response? |
The error on the console shows 403 Forbidden error meaning it has something to do with authentication. I am using custom models with separate endpoint rather than the default. However, the chat function with the custom model works as expected. When the pdf is being uploaded, it throws the 403. My question is, the pdf fetch function is using a different authentication? Let me add further console logs in both my local machine and server to see the difference. I will let you know the results here. |
I have the same problem as #693 It retrieves correctly when looking at the prompt and parameters, but the model answers as if it has gotten only the question. |
hey, whats the current status of the PR? |
Also very interested in this feature. Is there help needed for this? Wasn't sure the blocker or how to help. |
@mishig25 Thanks for working on this! Very interested in this feature as well thus curious if there are any updates it? |
Hi @mishig25 and all, from my side I would also offer to help in my freetime if I can, as this is quite an important feature. Reading the above I think that an overview of the required steps is important or someone that coordinates tasks. E.g. is something missing in PR #745 ? Or is a redesign of the entire RAG-logic including externalization of the API required? Or is there maybe a meeting needed with some overarching architecture etc.? |
I am really confused, current version of the chat doesn't seem to have the file upload despite this PR being sent a long ago. WHY??? |
it's better to just have 1 employee solving the issue - especially when the company is valued at $4,500 million dollars... |
why has this not be added? |
TLDR: implement PDF-chat feature
Update 1 here
Closes #609
When user uploads a PDF:
Limitations
Testing locally
*install new pdf-parse dependency with npm ci
Screen recording
Testing by uploading Mamba paper
Screen.Recording.2023-12-18.at.3.46.40.PM.mov