Allow read_pdf to accept a file-like object #103

Lnk2past · 2019-12-06T15:36:11Z

In our use case we have PDF data streamed in memory from an external service; in order for us to process it using camelot we need to save that data out to a file and then pass the filename over. It would be great to be able to just send a file-like object through the interface instead, as this would save us from needing to write temporary files only to read them back in. I do not think there is a workaround for this at the moment, but if there is, any information would be greatly appreciated.

I do not know if I will have time to work on a PR soon, but does this sound like a reasonable feature to add?
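For reference, the temporary-file workaround described above can be wrapped so that callers pass a path, raw bytes, or a binary file-like object and always get back a filesystem path. This is a minimal sketch, not camelot's API: `as_pdf_path` is a hypothetical helper name, and the only camelot call it assumes is `camelot.read_pdf(filepath, pages=...)` taking a path.

```python
import contextlib
import os
import shutil
import tempfile
from typing import BinaryIO, Iterator, Union

# Anything the hypothetical helper accepts: a path, raw bytes, or a binary stream.
PdfSource = Union[str, os.PathLike, bytes, bytearray, BinaryIO]


@contextlib.contextmanager
def as_pdf_path(source: PdfSource) -> Iterator[str]:
    """Yield a filesystem path for `source` (hypothetical helper, not part of camelot).

    Paths are passed through unchanged; bytes and binary file-like objects are
    spooled to a temporary .pdf file that is deleted when the block exits.
    """
    if isinstance(source, (str, os.PathLike)):
        # Already on disk: hand the path through untouched.
        yield os.fspath(source)
        return

    # mkstemp + closing the handle before yielding lets another reader reopen
    # the file by name, which an open NamedTemporaryFile does not allow on Windows.
    fd, path = tempfile.mkstemp(suffix=".pdf")
    try:
        with os.fdopen(fd, "wb") as out:
            if isinstance(source, (bytes, bytearray)):
                out.write(source)
            else:
                # Copies from the stream's current position to EOF.
                shutil.copyfileobj(source, out)
        yield path
    finally:
        os.remove(path)
```

Call `camelot.read_pdf(path, ...)` inside the `with` block so the temporary file still exists while camelot reads it. If the stream has already been partially read, `seek(0)` it first, because `shutil.copyfileobj` copies from the current position.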


yeus · 2020-10-05T21:51:57Z

In this other repository (https://github.com/atlanhq/camelot) (I assume the original one?), there are already two merge requests pending and waiting to be accepted for this issue:

Maybe we can do this quickly with those ;). I think this is really a feature that a lot of people would like to have...

vinayak-mehta · 2020-10-05T22:23:12Z

Thanks for pointing that out! Right now #13 is taking up a lot of my time, but I will try to get to this over the weekend.

yeus · 2020-10-05T22:26:37Z

For people whose main problem is that they want to keep the file "in memory", for example as a spooled temporary file, a short workaround could be the following:

Use this library: https://github.com/mbello/memory-tempfile to create a file on a tmpfs in memory. This solution only works on Linux, though. Additionally, it's difficult to do this in Docker images or on Kubernetes.
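A standard-library sketch of the same idea, swapped in for memory-tempfile rather than showing that library's API: on most Linux systems `/dev/shm` is a tmpfs mount, so a temporary file created there lives in RAM. The helper name `ram_backed_tempfile` and the fallback to the default temp directory are assumptions for illustration. Inside Docker, `/dev/shm` defaults to 64 MB unless the container is started with a larger `--shm-size`.

```python
import os
import tempfile
from typing import IO

# Assumption: /dev/shm is a tmpfs mount, as on most Linux distributions.
SHM_DIR = "/dev/shm"


def ram_backed_tempfile(suffix: str = ".pdf") -> IO[bytes]:
    """Return a NamedTemporaryFile in /dev/shm if it is usable, else in the default temp dir.

    Hypothetical helper. The file is deleted automatically when it is closed.
    """
    use_shm = os.path.isdir(SHM_DIR) and os.access(SHM_DIR, os.W_OK)
    directory = SHM_DIR if use_shm else None  # None means tempfile.gettempdir()
    return tempfile.NamedTemporaryFile(suffix=suffix, dir=directory)
```

Write the PDF bytes, call `flush()`, then pass `tf.name` to `camelot.read_pdf` while the file is still open. Reopening an open `NamedTemporaryFile` by name works on Linux but not on Windows, which matches the Linux-only caveat above.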

yeus · 2020-10-05T22:28:45Z

@vinayak-mehta just saw your comment. Looking forward to this! If you need any help (testing, review...) just contact me ;) although I am not that deep into the library ...

vinayak-mehta · 2020-10-05T22:32:36Z

Thanks for the suggestion, and for offering your help! I will try to get to the PRs by the weekend and will definitely comment here if I need help :)

pilotjoe · 2020-10-08T19:43:11Z

I mentioned another use case for this in atlanhq/camelot#189, where reading from a file-like object would come in handy when a website requires more advanced authentication (e.g. SharePoint) and the object has to be fetched with a library like requests.
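A sketch of that use case, assuming the caller has already configured an authenticated `requests.Session` (how the SharePoint authentication itself is set up is not covered in this thread). The response is streamed in chunks into a temporary file whose path can then be handed to `camelot.read_pdf`; `download_to_temp_pdf` is a hypothetical helper name.

```python
import contextlib
import os
import tempfile
from typing import Any, Iterator


@contextlib.contextmanager
def download_to_temp_pdf(session: Any, url: str, timeout: float = 30.0) -> Iterator[str]:
    """Stream `url` through an authenticated session into a temporary .pdf file.

    Hypothetical helper. `session` is expected to behave like requests.Session:
    get(..., stream=True) returning a response usable as a context manager,
    with raise_for_status() and iter_content(). The file is removed on exit.
    """
    fd, path = tempfile.mkstemp(suffix=".pdf")
    try:
        with os.fdopen(fd, "wb") as out:
            with session.get(url, stream=True, timeout=timeout) as response:
                response.raise_for_status()
                for chunk in response.iter_content(chunk_size=64 * 1024):
                    if chunk:  # skip empty keep-alive chunks
                        out.write(chunk)
        yield path
    finally:
        os.remove(path)
```

Streaming keeps memory bounded for large documents, at the cost of still touching disk, which is exactly what a native file-like interface would avoid.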

vinayak-mehta · 2020-10-12T16:55:20Z

@pilotjoe Thank you for your comment describing the use-case.

Last week, I ended up spending a lot of time on #13. Will get to this soon.

yash12392 · 2021-03-09T15:55:23Z

Hey @vinayak-mehta, just checking in: did you get around to doing this?

HeskethGD · 2022-06-20T12:40:35Z

Would love this feature to be implemented. The use case is an AWS Lambda function that reads a PDF from S3 and processes it with regex to find the relevant pages; we then wish to pass those pages as bytes to a table extraction package, ideally without having to write the file to disk and read it back again in the Lambda.
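One way to avoid writing a trimmed PDF in this scenario, sketched under the assumption that the regex step yields 1-based page numbers: `camelot.read_pdf` already takes a `pages` string such as `"1,3,4"` or `"1,4-end"`, so the original file (saved once under Lambda's writable `/tmp` directory) can be read with only the relevant pages selected. `to_camelot_pages` is a hypothetical helper that builds that string.

```python
from typing import Iterable, List


def to_camelot_pages(page_numbers: Iterable[int]) -> str:
    """Turn 1-based page numbers into a compact camelot `pages` string, e.g. "1-3,5,9".

    Hypothetical helper: duplicates are dropped and consecutive runs become ranges.
    """
    pages = sorted(set(page_numbers))
    if not pages:
        raise ValueError("at least one page number is required")
    if pages[0] < 1:
        raise ValueError("page numbers are 1-based")

    parts: List[str] = []
    start = prev = pages[0]
    for page in pages[1:]:
        if page == prev + 1:
            # Still inside a consecutive run; extend it.
            prev = page
            continue
        parts.append(str(start) if start == prev else f"{start}-{prev}")
        start = prev = page
    parts.append(str(start) if start == prev else f"{start}-{prev}")
    return ",".join(parts)
```

For example, `camelot.read_pdf(path, pages=to_camelot_pages(hits))`. If the page indices come from a 0-based source such as pypdf's `reader.pages`, add 1 to each before calling the helper.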

Vesalon · 2023-06-23T07:41:46Z

Want to add to the comments that this would be a very useful feature to have. Writing to and reading from disk can be quite expensive.

yg-smile · 2024-04-22T23:33:04Z

This would be a very useful feature. Any update would be greatly appreciated.

bosd · 2024-09-01T18:39:23Z

I was working on a forward port over here:
py-pdf#16
Feel free to help and contribute over there.

ziaulrehman40 mentioned this issue Jan 13, 2020: FEATURE REQUEST: Support remote public files #108 (Open)

pilotjoe mentioned this issue Oct 8, 2020: [WIP] PDFHandler accepts file like objects atlanhq/camelot#189 (Closed)

Niremizov pushed a commit to omkod/camelot that referenced this issue Sep 2, 2024: 5c42348, "Merge pull request camelot-dev#103 from py-pdf/dependabot/pip/certifi… …-2024.8.30"
