Hey all,
Thank you all for this fantastic script! It works very well, although I found a PDF (attached) whose highlights are being severely truncated. I tweaked the `boxhit` function to return `True` if there is any overlap at all, which gave me better results, but the script still does not pick up the last line of each highlight. It looks like the original boxes and the rectangle in the Annotation object are indeed missing this last line (the annotation's y0 is bigger than the item's)...
Anyway... I can provide more info if you'd like, and I'd very much appreciate any insight into fixing this, although it is also possible that it is more of a pdfminer issue...
Thanks for the report and sample PDF. I've futzed with the hit detection algorithm quite a few times before, but haven't had any reports of issues with it for a long time so I suspect this may be an issue with the PDF annotation software as much as it is with pdfminer. I could consider making the 0.5 constant tunable, but that sounds like it wouldn't have fully solved your issue (?)
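For anyone following along, here is a rough sketch of what a tunable overlap threshold could look like. This is not the script's actual `boxhit` code: the names `overlap_fraction`, `box_hit` and `min_overlap` are made up for illustration, and boxes are assumed to be `(x0, y0, x1, y1)` tuples in PDF coordinates.

```python
from typing import Tuple

# Hypothetical box type: (x0, y0, x1, y1) in PDF coordinates (y grows upward).
Box = Tuple[float, float, float, float]


def overlap_fraction(item: Box, box: Box) -> float:
    """Return the fraction of the item's area that lies inside box."""
    ix0, iy0, ix1, iy1 = item
    bx0, by0, bx1, by1 = box
    # Width and height of the intersection rectangle (non-positive if disjoint).
    w = min(ix1, bx1) - max(ix0, bx0)
    h = min(iy1, by1) - max(iy0, by0)
    if w <= 0 or h <= 0:
        return 0.0
    item_area = (ix1 - ix0) * (iy1 - iy0)
    return (w * h) / item_area if item_area > 0 else 0.0


def box_hit(item: Box, box: Box, min_overlap: float = 0.5) -> bool:
    """Hit if the item overlaps box at all and by at least min_overlap."""
    frac = overlap_fraction(item, box)
    return frac > 0 and frac >= min_overlap
```

With `min_overlap=0` any overlap counts as a hit, which matches the tweak described above. It also shows why lowering the threshold cannot recover a line that the annotation rectangle does not cover at all: if the annotation's y0 is above the item's y1, the overlap is zero regardless of the threshold.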
Thanks, @0xabu, for the quick reply! Even with the 0.5 constant lowered to 0, I'm still missing some of the letters...
I dug a bit deeper and I believe it is a pdfminer.six issue.
I filed (or rather commented on an existing) issue there. Just in case, I'm leaving a link here: pdfminer/pdfminer.six#281 (comment)
pwc-tax-guide.pdf