Future maintenance #15
Hi Stefan, I am interested in volunteering to get this done too. The outline you propose looks like a good plan. Are you the owner of the pdfrw organization? cc/ @sarnold |
We should aim to get the test suite running in CI too. |
Yes, I am. For the time being, it just is a placeholder.
It mostly does in this repository. Some tests have been disabled, although they should be safe to enable as there is no visual difference (internal changes in |
Pinging @t-houssian and @Lucas-C who might be interested in this too. |
Thanks for the ping :) @MartinThoma may also be interested in the subject: Maybe the best option would be to maintain this package inside the https://github.com/py-pdf organization, which is already active? |
I think that's a great idea. It could bring greater visibility and increased collaboration between projects. |
@federicobond Thanks for the ping as well! I sadly don't have the time currently to help out much on this but do think that what @Lucas-C has mentioned is a great idea. I used this fork in a project of mine called fillpdf (https://github.com/t-houssian/fillpdf). I released this fork because it was the best I could find to use in my project. I created that fillpdf project because of how hard it was to work in the current pdf filling libraries so I think any clean up and making things more user friendly would be awesome. Feel free as well to add fillpdf to the ecosystem and use any of the code from it. Best of luck y'all! |
Thanks for pinging me 🤗 Yes, I've spent quite some time since April 2022 merging PyPDF2 back into pypdf, setting up CI/tests and docs, merging over 100 PRs, and fixing several hundred issues. Now we have at least one other super active pypdf developer again and I hope that PyPDF3 / PyPDF4 developers and users will move back to pypdf 🤞
It seems to me that pdfrw is solving a subset of the problems that pypdf is solving. For this reason I would love it if the two projects (and especially the developers around them) could converge. I approached Patrick Maupin in April 2022. Sadly I don't know pdfrw well enough to really judge whether merging the two would be reasonably possible. I was thinking that we might be able to define a "pypdf-core" which is similar to pdfrw, but nobody has done any work in that direction so far. I'm also uncertain about which use cases current pdfrw users actually have. Looking at SO, I'd rather recommend them to use pypdf.
Another activity besides blog posts + answering questions is nudging the fpdf / fpdf2 people to make their relationship clear to the community. @Lucas-C and I recently received a super nice e-mail from the original author; I'm hopeful here 🎉
pdfrw in the py-pdf GitHub organization
I'd be open to moving pdfrw into the py-pdf GitHub organization. I would love an exchange between PDF-related projects / developers and sharing of issues/solutions/test cases. The official git seems to be https://github.com/pmaupin/pdfrw (1700 stars) whereas https://github.com/sarnold/pdfrw only has 24 stars. I'm interested in bringing the Python-PDF communities closer together, not in fracturing the communities even more. So I'd rather not move https://github.com/sarnold/pdfrw into py-pdf at this stage.
The pdfrw GitHub organization
What does https://github.com/pdfrw do? I don't see anything in there. |
Thank you for your input @MartinThoma, very appreciated!
I agree this would be very desirable for the ecosystem.
I can add my 2 cents here: we began using PyPDF a few years ago at our company to include a stamp on each page of some files that are uploaded to our system. Its performance was pretty bad: it took whole seconds and consumed quite a bit of memory to process moderately long files. We ended up switching to pdfrw and saw a huge improvement. This may no longer hold today, but pdfrw worked well enough for us and was easy enough to debug that we have stayed with it since.
That would be awesome! Also increasing the bus factor for these projects.
I believe sarnold's fork is just pmaupin master + some small fixes/improvements, most of which we would need to land into master eventually (someone correct me if I'm wrong). Other than that, the projects haven't really diverged.
I believe it's just @stefan6419846 squatting the name in case it was going to be used. |
I totally agree! 😊 In fact, maybe we could consider merging https://github.com/PyFPDF (which is mostly Would you be open to this @MartinThoma? Also, maybe at some point the org should have a code of conduct & some projects management guidelines? (edit:) I see that the only public member of the |
This is correct. I just created this organization to block the name when thinking about the future of the project and creating this issue as well. As responses have been quite sparse until yesterday (with my e-mails to Patrick and Steve being unanswered for nearly two months now as well), I did not yet take this further. I am open to move this to the aforementioned
Speaking of my use-case: I mostly use |
Sounds awesome to me! We should talk about permissions/expectations beforehand, though. I would suggest that you open an issue/discussion in https://github.com/PyFPDF/fpdf2 to discuss this :-) The two roles I can give are: I would make you @Lucas-C an owner of py-pdf, but would appreciate it if we had a discussion before adding new owners (for members, I don't care too much). Although owners have all permissions on all repositories, I would expect them/me not to interfere with them unless the repository's maintainer(s) are inactive for a long time (e.g. 3 months?) or something security-critical happens (e.g. a dependency was introduced that is malicious/typo-squatting). As both pypdf and fpdf are pretty big, we should write such things down within py-pdf (maybe make a GitHub page at https://py-pdf.github.io/ ) |
I've heard that before 🤔 When I have some time I need to create benchmarks + investigate that 🕵️ |
Sounds great to me! 😊 |
Hey, I'm just a user, but I know how hard it is to keep a project going, so from a user perspective: do what you've got to do! Also: thank you for your continued work. It is appreciated. |
I'm so happy this is moving along! 😄 As for pdfrw, should we wait until @Lucas-C becomes a py-pdf owner to discuss next steps? |
Hi! I described how I plan for I'd be happy to get feedback from you all 😊 |
I am not a developer, just a pypdf user. Thanks for your amazing job |
This month we discovered and fixed a couple of issues that affect file size ( py-pdf/pypdf#1926 , py-pdf/pypdf#1906 ). If you can come up with a nice comparison script or a good test scenario, I could add it to https://github.com/py-pdf/benchmarks I'm all for an open and fair assessment of the qualities of different libraries. This benchmark allowed us to improve the text extraction quality of pypdf a lot. Maybe we can do something similar for other workflows / operations. edit: Recently I've been spending less time on open source. If you make a PR to https://github.com/py-pdf/benchmarks that might help 😅 |
EDIT: @MartinThoma I am not sure if your last post was an answer to mine or a general comment. As I said, I am not a developer. I do not use git, so PRs are pretty unknown to me. |
Thank you for clarifying and for sharing your benchmarking code. I will eventually add the idea to https://github.com/py-pdf/benchmarks . It might just take some time (and I will list you as a co-author of that PR, so you get credit for it :-) ) |
@t-houssian @Lucas-C @MartinThoma what is the status of moving this repo to the py-pdf org? @sarnold is this project still maintained or archived? |
Good question. I have just moved Would you agree with this suggestion @MartinThoma & @MasterOdin? |
Makes sense, happy to help get the GH action pipelines set up. |
This fork already has GitHub actions set up, so this part should be relatively easy in theory. Nevertheless, some of the tests have apparently been disabled for now and might need further evaluation: https://github.com/sarnold/pdfrw/commits/master/tests/expected.txt I did some research some months ago about the actual differences on the PDF files (related to more recent reportlab package etc.) and as far as I remember, most of the (visual) results were rather identical (I am currently on vacation and thus have no access to my experimental code). Just for the record: Valid reference files generated by Python 3.5 (and partly Python 3.6) might be downloaded from the artifacts at https://github.com/stefan6419846/pdfrw_reference_python36/actions/ |
I want a healthy Python / PDF ecosystem and I want to avoid having lots of small projects with tons of overlap.
Maintenance:
Unique Selling Point: pdfrw can make modifications to PDF files, similar to pypdf. However, pdfrw is a lot faster. Besides the speed, I don't know of a single feature that pdfrw supports which pypdf does not.
Community: As it has big overlaps with pypdf, I take it as a comparison
Maintainer support for project transfer:
Summary: I'm uncertain. I think pdfrw must have some very good ideas regarding parsing of PDFs built-in. However, I don't see a single feature that pdfrw supports and pypdf doesn't. I'm also not certain how good the community support of pdfrw / pdfrw2 is and if we could maintain it well. Given those first impressions, I think I'd rather try to improve pypdf with ideas from pdfrw + help the community make a switch than move pdfrw to py-pdf. |
@Lucas-C Does fpdf2 use pdfrw2? If that is the case, I can see that you have an inherent interest in taking care of pdfrw. If you want to take care of it, I'd be ok with it :-) However, we should try to get some option to release a new version on PyPI. I'm currently observing how this does not work well with camelot-py 🥲 |
I completely forgot this: pmaupin#232 If pdfrw is the basis of many other projects, I'd also say it would fit well into py-pdf. |
More download stats: https://pypistats.org/packages/pdfrw - 7% still use python 2 😱 https://pypistats.org/packages/pdfrw about 4% of python 2 users |
I wrote to both Patrick and Steve in February when I initially opened this issue to get their opinion about an organization-based approach and the future maintenance in general, but never received any public or private response from them. There might be different reasons for this. |
I tried to contact Patrick Maupin 5 days ago, but haven't received a response so far. I would wait 2 weeks in total. If somebody wants to take on the work of maintaining pdfrw, we could do the following:
|
No, All things considered, I'm not particularly interested in maintaining I think I'm even going to get rid of that |
I made a quick performance comparison between
Those are the execution times of running those scripts on my computer, using a 4.8MB base PDF document with 47 pages:
Based on those results, @MartinThoma: what do you think is the bottleneck here for Edit: the scripts I used can be found there: https://github.com/py-pdf/fpdf2/tree/master/tutorial (they require the source & destination PDF files to be specified as arguments) |
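For anyone who wants to reproduce this kind of comparison, here is a minimal timing sketch. It is not one of the scripts linked above: `time_callable`, `compare`, and the two `stamp_with_library_*` functions are hypothetical names, and the stand-in functions only burn CPU so that the harness runs without any PDF library installed. In a real run they would be replaced by the actual stamping code for each library.

```python
import statistics
import time
from typing import Callable, Dict, List


def time_callable(func: Callable[[], object], repeats: int = 5) -> List[float]:
    """Call func `repeats` times and return each wall-clock duration in seconds."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        durations.append(time.perf_counter() - start)
    return durations


def compare(candidates: Dict[str, Callable[[], object]], repeats: int = 5) -> Dict[str, float]:
    """Return the median duration in seconds for each named candidate."""
    return {
        name: statistics.median(time_callable(func, repeats))
        for name, func in candidates.items()
    }


# Hypothetical stand-ins: replace the bodies with the real stamping code.
def stamp_with_library_a() -> None:
    sum(range(10_000))


def stamp_with_library_b() -> None:
    sum(range(100_000))


if __name__ == "__main__":
    results = compare(
        {"library A": stamp_with_library_a, "library B": stamp_with_library_b}
    )
    # Print the fastest candidate first.
    for name, seconds in sorted(results.items(), key=lambda item: item[1]):
        print(f"{name}: {seconds:.4f} s")
```

The median over several runs is used rather than a single measurement so that one slow run (disk cache, background load) does not dominate the result. Memory use is not measured here.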
I am aware of the speed difference. I've actually already created a benchmark for it: https://github.com/py-pdf/benchmarks#watermarking-speed Sadly, I cannot pin-point a single simple reason for that difference. I think a part of the reason is that we represent floats in with |
I spent an hour investigating the performances of Whereas Maybe |
Thanks for your work on this fork, which seems to be the most active and up-to-date one.
Unfortunately, GitHub makes it hard to work with forks or even discover them as they usually are hidden in the search results and in-repository search for forks is not available. Additionally, while there is a package on PyPI, it is out-of-date and does not correspond to this repository directly.
What are your plans for the future of your fork? I considered working on my own fork to keep this package available for my use cases, but with your existing work this could become easier. What I am currently thinking of: