PDF text line contains apostrophe; are not searchable #59

Open
subha94u opened this issue Jan 22, 2013 · 3 comments

Comments

@subha94u

When byte values <= 127 are interpreted as ASCII characters but when >127 are represented by multi-byte sequences and fonts with /Encoding /WinAnsiEncoding and most likely with `/Mac*Encoding'

I am facing an issue: in my PDF, some text contains an apostrophe (e.g. John's), but that text is not searchable, and no other word on the same line can be found either. I searched on Google and found that it is related to an encoding problem. Please have a look at the problem and update me with a solution.

Thanks

@KurtCode
Owner

I am not sure I understand your first sentence.

A document sometimes uses a standard character encoding, like MacRoman or a Windows encoding. Others may carry a custom mapping from character codes to Unicode characters. Finally, there are fonts that have their own idea of how to encode characters. The Computer Modern font is just one such font that I've come across.

I suspect the line with the apostrophe is written in a font that we do not know how to derive proper text content from.
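
To illustrate how the choice of encoding can break an apostrophe search, here is a minimal Python sketch. It is not code from this project. It assumes Python's `cp1252` and `mac_roman` codecs as rough stand-ins for `/WinAnsiEncoding` and `/MacRomanEncoding`, and the helpers `normalize_quotes` and `found` are hypothetical names made up for the example.

```python
# Hypothetical illustration; not code from this project.
# Assumption: Python's cp1252 / mac_roman codecs approximate the PDF
# /WinAnsiEncoding and /MacRomanEncoding tables for these byte values.

raw_win = b"John\x92s"   # 0x92 is the right single quote in Windows-1252
raw_mac = b"John\xd5s"   # 0xD5 is the right single quote in MacRoman

as_cp1252 = raw_win.decode("cp1252")       # "John" + U+2019 + "s"
as_macroman = raw_mac.decode("mac_roman")  # same text, different byte
as_latin1 = raw_win.decode("latin-1")      # wrong guess: 0x92 becomes U+0092

def normalize_quotes(text):
    """Map typographic single quotes to the ASCII apostrophe."""
    return text.translate({0x2018: "'", 0x2019: "'"})

def found(needle, haystack):
    """Substring search that ignores the quote style on both sides."""
    return normalize_quotes(needle) in normalize_quotes(haystack)

# A plain ASCII search misses the typographic quote ...
plain_hit = "John's" in as_cp1252
# ... while the normalized search matches it.
normalized_hit = found("John's", as_cp1252)
```

Even when the bytes are decoded correctly, the extracted text holds U+2019 rather than the ASCII apostrophe that a user types, so a literal search can still miss unless both sides are normalized. If the wrong encoding is guessed, as in the `latin-1` case, the apostrophe turns into a control character instead.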

@subha94u
Author

Here is a PDF: https://www.filesanywhere.com/fs/v.aspx?v=8a726b865d6573b5a3ac In this PDF, searching for text that contains an apostrophe finds nothing.

@songbaoqiang

It's a common problem. Does anyone have an idea how to get rid of it?
