Unicode symbols handling #36

Open
ralfeus opened this issue Oct 10, 2017 · 4 comments
Comments

@ralfeus

ralfeus commented Oct 10, 2017

Some symbols are handled incorrectly.
For example:
Original: “Smoking Kills.”
Result: ΓÇ£Smoking Kills.ΓÇ¥

Original: lawyers’
Result: lawyersΓÇÖ
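The garbled output above is consistent with UTF-8 bytes being decoded as code page 437 (the legacy DOS/Windows console encoding). The sketch below reproduces both reported results under that assumption. The helper names `garble` and `ungarble` are hypothetical and are not part of webpage2html; where in the pipeline the wrong decoding happens is not established by this report.

```python
# Sketch only: assumes the page is UTF-8 and that some step decodes the
# bytes as cp437. Neither helper exists in webpage2html.

def garble(text: str, wrong_codec: str = "cp437") -> str:
    """Encode text as UTF-8, then decode the bytes with the wrong codec."""
    return text.encode("utf-8").decode(wrong_codec)


def ungarble(text: str, wrong_codec: str = "cp437") -> str:
    """Reverse garble(): recover the original text from the mojibake."""
    return text.encode(wrong_codec).decode("utf-8")


if __name__ == "__main__":
    print(garble("“Smoking Kills.”"))   # ΓÇ£Smoking Kills.ΓÇ¥
    print(garble("lawyers’"))           # lawyersΓÇÖ
    print(ungarble("lawyersΓÇÖ"))       # lawyers’
```

Each curly quote is three bytes in UTF-8 (for example `E2 80 9C` for “), and cp437 maps those bytes to Γ, Ç and £, which is why every affected character turns into a three-character sequence starting with ΓÇ.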

@zTrix
Owner

zTrix commented Oct 14, 2017

Is there a demo HTML page I can use for testing?

@ralfeus
Author

ralfeus commented Oct 14, 2017

Here it is

Book.zip

@zTrix
Owner

zTrix commented Jan 2, 2018

Sorry for the very late response, but I could not reproduce the result you reported.

@adrelanos

This HTTP header alone does not work for me:

content-type: text/html

This HTTP header works for me:

content-type: text/html; charset=UTF-8

In other words, webpage2html gets confused when the content-type HTTP header lacks charset=UTF-8. I don't know whether this should be considered a bug, but it is perhaps worth documenting.

In my case, it helped to add charset utf-8; to the nginx location config.
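To illustrate the difference between the two headers, here is a minimal sketch of reading the charset parameter from a content-type value and falling back to UTF-8 when it is absent. It uses only the Python standard library. The function names and the UTF-8 fallback are assumptions for illustration and do not describe how webpage2html currently decodes responses.

```python
from email.message import Message
from typing import Optional


def charset_from_content_type(header_value: str) -> Optional[str]:
    """Return the charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = header_value
    # get_content_charset() lowercases the value and returns None if missing.
    return msg.get_content_charset()


def decode_body(raw: bytes, content_type: str, fallback: str = "utf-8") -> str:
    """Decode a response body, using `fallback` when no charset is declared.

    The UTF-8 fallback is an assumption of this sketch, not documented
    webpage2html behaviour.
    """
    charset = charset_from_content_type(content_type) or fallback
    return raw.decode(charset, errors="replace")


if __name__ == "__main__":
    body = "lawyers’".encode("utf-8")
    print(charset_from_content_type("text/html"))                 # None
    print(charset_from_content_type("text/html; charset=UTF-8"))  # utf-8
    print(decode_body(body, "text/html"))                         # lawyers’
```

With a bare `text/html` header there is no charset to read, so the result depends entirely on whatever default the client picks, which would explain why adding the charset on the server side helped.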
