add OCR / translation #24
base: ocr
fixed docu
dir.create(output_dir_name, showWarnings = F)
writeLines(x, file(paste0(dirname(path), "/outputs/", basename(path), ".txt")))
@behrica I'm a bit wary of this code, which creates a directory and writes lines to it without the user directly specifying the directory or opting in to saving the results. I saw similar code earlier too. Is this piece needed for the OCR?
The reason for this is traceability.
My users like that the magic of OCR + translation becomes visible in the form of files,
so they can archive the whole folder of original PDFs + OCR results + translation results.
This can later explain why certain things were not found, which is important in our context.
Maybe there is a better way to do this in R.
We could have an option to enable or disable this.
The same is true for progress logging.
Because the potentially large PDFs get uploaded to Azure, a "search" over lots of files could take hours.
(I implemented some caching, so a second search will not repeat that work.)
Do you have a suggestion for this?
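One possible shape for such an option, as a minimal sketch only: the helper name `save_ocr_text()` and its `output_dir` and `verbose` arguments are hypothetical and not part of this PR. The idea is that nothing is written unless the caller names a directory, and progress messages appear only on request.

```r
# Hypothetical helper, not part of the PR: writes OCR text to disk only when
# the caller supplies `output_dir`, and reports progress only when `verbose`.
save_ocr_text <- function(x, path, output_dir = NULL, verbose = FALSE) {
  if (is.null(output_dir)) {
    return(invisible(NULL))  # default: nothing is written
  }
  dir.create(output_dir, showWarnings = FALSE, recursive = TRUE)
  out_file <- file.path(output_dir, paste0(basename(path), ".txt"))
  writeLines(x, out_file)  # passing a file path, so no connection is left open
  if (verbose) {
    message("Saved OCR text for ", basename(path), " to ", out_file)
  }
  invisible(out_file)
}

# Usage: opt in by naming a directory (a temporary one here).
out <- save_ocr_text(c("line one", "line two"), "report.pdf",
                     output_dir = file.path(tempdir(), "ocr_outputs"))
```

Using `message()` rather than `print()` for progress keeps the output suppressible with `suppressMessages()`, which may suit long-running searches.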
Co-authored-by: Brandon LeBeau <[email protected]>
cleaned up some commented code replaced T -> TRUE, F -> FALSE
I cleaned up the code, used argument lists, and replaced T -> TRUE, F -> FALSE.
I am aware that I am not a good R programmer, more an R user. In this sense my PR is more a "proof of concept" than "ready-made code". For my colleagues this is a real time saver.
new PR as requested