Doc().query method returns relevant answer from relevant papers but returns unrelevant pages #823

Open
Snikch63200 opened this issue Jan 20, 2025 · 1 comment
Labels
bug Something isn't working

Comments

Snikch63200 commented Jan 20, 2025

Hi,

When I use the Doc().query method on a document collection, the answer is relevant and drawn from relevant papers, but the cited pages seem random.

The answer seems to refer to different pages of the documents. Maybe it is related to PDF parsing...

Is there any solution for this?

Best regards.

@dosu.

dosubot bot commented Jan 20, 2025

The issue you're experiencing with the Doc().query method returning seemingly random citation pages might be related to the PDF parsing approach or the logic used to associate answers with specific pages. Unfortunately, the specific logic or algorithm used by the Doc().query method to select citation pages is not detailed in the available information. However, there are known limitations with the PDF parsing approach that could potentially affect the results:

  1. Corrupt PDF Files: If a PDF file is corrupt, it might not be parsed correctly, leading to incorrect page references [1].

  2. Page Size Limit Exceeded: If the text on a page exceeds a certain limit, it might not be processed correctly, which could also affect page references [1].

To address this, you might want to check the integrity of your PDF files and ensure they are not exceeding any size limits during parsing. Additionally, reviewing the parsing logic in the parse_pdf_to_pages function might provide further insights into potential issues.
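One way to narrow this down is to check whether the text quoted in an answer actually appears on the page that was cited. The sketch below is only an illustration and does not use any paper-qa API: it assumes you have already extracted the text of each page into a plain dict keyed by page label (how you obtain that dict from your parser is not shown), and the helper names `pages_containing` and `oversized_pages` as well as the `max_chars` threshold are hypothetical.

```python
import re


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so PDF line breaks do not block matches."""
    return re.sub(r"\s+", " ", text).strip().lower()


def pages_containing(pages: dict[str, str], snippet: str) -> list[str]:
    """Return the labels of all pages whose text contains the snippet.

    `pages` maps a page label (assumed here to be a string such as "1")
    to the text extracted from that page.
    """
    needle = normalize(snippet)
    if not needle:
        return []
    return [label for label, text in pages.items() if needle in normalize(text)]


def oversized_pages(pages: dict[str, str], max_chars: int) -> list[str]:
    """Return the labels of pages whose raw text is longer than max_chars.

    `max_chars` is a hypothetical threshold chosen by the caller; it is
    not a value taken from any library.
    """
    return [label for label, text in pages.items() if len(text) > max_chars]


if __name__ == "__main__":
    # Toy data standing in for text extracted from a real PDF.
    sample_pages = {
        "1": "Introduction to   the\nmethod.",
        "2": "Results show a clear improvement.",
    }
    print(pages_containing(sample_pages, "the method"))
    print(oversized_pages(sample_pages, max_chars=30))
```

If a quoted passage turns up on a page other than the one cited, that points toward the page association step rather than the answer generation. Pages flagged as oversized are candidates for the size-limit problem mentioned above.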

@dosubot dosubot bot added the bug Something isn't working label Jan 20, 2025