add support for file_bytes argument with managed_file_context() #16

Draft: wants to merge 1 commit into main


@bosd (Collaborator) commented Mar 26, 2024

This is an attempt to cherry-pick the changes from PR #15.
Fixes: camelot-dev#170, camelot-dev#245, atlanhq/camelot#376 and atlanhq/camelot#331.

I was really fighting with git and the merge commits on a fork of a fork, and so on...
The cherry-pick approach was the cleanest.

I hope I resolved the merge conflicts correctly.
Please review.

@bosd force-pushed the Johnmaras-file-bytes-support2 branch 7 times, most recently from b605099 to 3a86892, March 27, 2024 10:26
Review thread on camelot/handlers.py (outdated, resolved)
@bosd force-pushed the Johnmaras-file-bytes-support2 branch 7 times, most recently from 70af652 to e445899, March 27, 2024 22:14
@bosd marked this pull request as ready for review March 27, 2024 22:20
@bosd requested a review from foarsitter April 3, 2024 06:01
@foarsitter (Collaborator) left a comment

Some of my thoughts:

  1. Adding the extra variable file_bytes feels to me more like adding an extra type for filepath.
  2. In __init__ there are two places where we raise an InvalidArguments.
  3. filepath isn't used beyond open(self.filepath, "rb"), so why do we need a separate attribute for file_bytes?
  4. The name managed_file_context isn't immediately clear to me; open() would be. We could even implement PDFHandler itself as a context manager so that we can write "with self", but I think "with self.open()" would be better.
  5. Does @contextmanager do the cleanup magically, or do we need to do that manually?
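To make points 1 to 4 concrete, here is a minimal sketch, assuming raw bytes are accepted as just another type of filepath. PDFHandlerSketch and its open() method are hypothetical names, not camelot's current API, and InvalidArguments is redefined locally as a stand-in for the exception mentioned in point 2.

```python
import io
import os
from contextlib import contextmanager
from typing import BinaryIO, Iterator, Union

PathOrBytes = Union[str, os.PathLike, bytes]


class InvalidArguments(Exception):
    """Local stand-in for the exception mentioned in the review (assumption)."""


class PDFHandlerSketch:
    """Hypothetical handler where raw bytes are just another kind of filepath."""

    def __init__(self, filepath: PathOrBytes) -> None:
        # One validation point instead of two separate raises in __init__.
        if not isinstance(filepath, (str, os.PathLike, bytes)):
            raise InvalidArguments(
                f"filepath must be a path or bytes, not {type(filepath).__name__}"
            )
        self.filepath = filepath

    @contextmanager
    def open(self) -> Iterator[BinaryIO]:
        """Yield a readable binary file object, whatever type filepath had."""
        if isinstance(self.filepath, bytes):
            # In-memory PDF: wrap the bytes so callers can read()/seek() as with a file.
            with io.BytesIO(self.filepath) as f:
                yield f
        else:
            # On-disk PDF: this is the builtin open(); the method name does not
            # shadow it inside the body. The inner `with` closes the file even
            # if the caller's with-block raises.
            with open(self.filepath, "rb") as f:
                yield f


# Usage: bytes and paths go through the same code path.
handler = PDFHandlerSketch(b"%PDF-1.7 placeholder, not a real PDF")
with handler.open() as f:
    print(f.read(5))  # b'%PDF-'
```

Keeping a single filepath attribute removes the second validation branch; the trade-off is that every consumer has to go through open() instead of reading self.filepath directly.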

@maylorian

Hi,

Is this something that's planned to be merged soon? We're trying to move away from the old camelot to this repo, but this is what's holding us back.

Thanks.

@bosd (Collaborator, Author) commented Aug 21, 2024

I'm currently not working on this one.
I've been focused on housekeeping tasks in this repo.
Feel free to contribute and speed this PR along.

@bosd added the enhancement (New feature or request) label Aug 26, 2024
@bosd mentioned this pull request Aug 28, 2024
@bosd added the help wanted (Extra attention is needed) label Sep 1, 2024
@bosd force-pushed the Johnmaras-file-bytes-support2 branch 2 times, most recently from 8f7f5ab to bd01e8e, September 5, 2024 20:19
@bosd (Collaborator, Author) commented Sep 14, 2024

@foarsitter Thanks for your review.
I was merely forward-porting this to the new repo, without looking very deeply into it.
Now I've investigated the code a bit more, but I have to admit I don't fully understand it yet 😮‍💨

> Some of my thoughts:
>
>   1. Adding the extra variable file_bytes feels to me more like adding an extra type for filepath.
>   2. In __init__ there are two places where we raise an InvalidArguments.

Do you have a suggestion for how to refactor this code?

>   3. filepath isn't used beyond open(self.filepath, "rb"), so why do we need a separate attribute for file_bytes?

I don't know. I would assume that file_bytes has its own attribute because it needs to be handled differently,
e.g. written to a temporary file on disk if it gets too large to fit into memory?
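For the "spill to disk when too large" idea, the standard library already provides tempfile.SpooledTemporaryFile, which buffers data in memory until it exceeds max_size bytes and then rolls over to a temporary file on disk. A minimal sketch; the helper name bytes_to_spooled_file and the 5 MiB threshold are assumptions for illustration, not part of this PR.

```python
import tempfile

# Assumed threshold for illustration; neither camelot nor this PR defines it.
MAX_IN_MEMORY = 5 * 1024 * 1024  # 5 MiB


def bytes_to_spooled_file(
    file_bytes: bytes, max_size: int = MAX_IN_MEMORY
) -> tempfile.SpooledTemporaryFile:
    """Hypothetical helper: copy PDF bytes into a seekable binary file object.

    The data stays in an in-memory buffer until it exceeds max_size bytes;
    SpooledTemporaryFile then transparently moves it to a temporary file on disk.
    """
    spooled = tempfile.SpooledTemporaryFile(max_size=max_size, mode="w+b")
    spooled.write(file_bytes)
    spooled.seek(0)  # rewind so the reader starts at the beginning of the PDF
    return spooled


pdf_bytes = b"%PDF-1.7 placeholder, not a real PDF"
with bytes_to_spooled_file(pdf_bytes) as f:
    print(f.read(5))  # b'%PDF-'
```

One caveat: if the caller already holds the whole bytes object, the data is in memory regardless, so rolling over to disk only saves memory once the original bytes are released. Accepting a file-like object might therefore be the more memory-friendly interface.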

>   4. The name managed_file_context isn't immediately clear to me; open() would be. We could even implement PDFHandler itself as a context manager so that we can write "with self", but I think "with self.open()" would be better.
>   5. Does @contextmanager do the cleanup magically, or do we need to do that manually?

Yes, I think it is done automagically. AFAIK that's the purpose of the wrapper.
https://realpython.com/python-with-statement/

Or should there be a try/finally, like:

    try:
        yield file
    finally:
        file.close()
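A small experiment helps answer point 5. contextlib.contextmanager runs the code before yield on entry and resumes the generator on exit; if the with-block raises, the exception is thrown into the generator at the yield, so code after yield is skipped unless it sits in a finally clause (or the resource is opened with its own with statement inside the generator). The function names below are made up for illustration.

```python
from contextlib import contextmanager

events = []  # records which steps of each context manager actually ran


@contextmanager
def without_finally():
    events.append("open")
    yield "resource"
    events.append("close")  # skipped when the with-body raises


@contextmanager
def with_finally():
    events.append("open")
    try:
        yield "resource"
    finally:
        events.append("close")  # always runs, even when the with-body raises


for manager in (without_finally, with_finally):
    events.clear()
    try:
        with manager():
            raise ValueError("simulated failure while parsing the PDF")
    except ValueError:
        pass
    print(manager.__name__, events)
```

Running it prints without_finally ['open'] followed by with_finally ['open', 'close'], so the decorator itself does not close anything; the try/finally (or an inner with open(...)) is what guarantees cleanup.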

@bosd force-pushed the Johnmaras-file-bytes-support2 branch from bd01e8e to bdc2ae7, September 19, 2024 19:55
@bosd marked this pull request as draft September 19, 2024 19:56
Labels: enhancement (New feature or request), help wanted (Extra attention is needed)

Successfully merging this pull request may close these issues:
Camelot functionality has read_pdf from file but no option read from bytes

4 participants