urlopen connection to your test HTTPS URL #26

Open
jaluecht opened this issue Mar 18, 2018 · 1 comment

Comments

@jaluecht

I am following your documentation to obtain the Aphrodite web page with the following code:

from urllib.request import urlopen

my_address = "https://realpython.com/practice/aphrodite.html"

html_page = urlopen(my_address)
html_text = html_page.read().decode('utf-8')

print(html_text)

I am getting SSL errors. When I add the cafile option, I get an invalid certificate error. How can I make this work?

@ScriptAutomate

You are right that the default code in Scrape and Parse Text From Websites doesn't work without some modifications. The website is likely blocking plain urllib calls that don't send a common User-Agent header.

I get the following error when using the default code from RealPython, as you have above:

HTTPError: HTTP Error 403: Forbidden
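For anyone trying to tell the two failure modes in this thread apart (the SSL error from the original post versus the 403 I see), here is a rough sketch. It assumes certificate problems reach you wrapped in URLError, which is how urlopen usually reports them. The names describe_failure and fetch are hypothetical helpers, not part of the tutorial.

```python
from urllib.error import HTTPError, URLError
from urllib.request import urlopen


def describe_failure(exc):
    """Return a short, human-readable label for an exception from urlopen."""
    # HTTPError is a subclass of URLError, so it has to be checked first.
    if isinstance(exc, HTTPError):
        return f"server answered with HTTP {exc.code}: {exc.reason}"
    if isinstance(exc, URLError):
        # Assumption: SSL/certificate and DNS problems usually end up here.
        return f"connection-level failure: {exc.reason}"
    return f"unexpected error: {exc!r}"


def fetch(url):
    """Return the decoded page, or None after printing why the fetch failed."""
    try:
        with urlopen(url) as response:
            return response.read().decode("utf-8")
    except URLError as exc:
        print(describe_failure(exc))
        return None
```

A 403 means the TLS connection itself succeeded and the server refused the request, so a certificate fix would not help in that case.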

If anyone else is having problems here, this should work (if you want to keep using urllib as part of the exercise):

from urllib.request import Request, urlopen

mozilla_request = Request('https://realpython.com/practice/aphrodite.html',
                          headers={'User-Agent': 'Mozilla/5.0'})
html_page = urlopen(mozilla_request)
html_text = html_page.read().decode('utf-8')

print(html_text)

NOTE: I definitely need to credit this StackOverflow post for the method used in my answer above.
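If you need the same workaround for several pages, the request setup can be wrapped in a small helper. This is only a sketch of the idea above: build_request, fetch_html and DEFAULT_USER_AGENT are names I made up for illustration, and the 'Mozilla/5.0' value is simply the one from the snippet, not something the site is known to require.

```python
from urllib.request import Request, urlopen

# Assumption: the same minimal browser-like value used in the snippet above.
DEFAULT_USER_AGENT = "Mozilla/5.0"


def build_request(url, user_agent=DEFAULT_USER_AGENT):
    """Create a GET request that carries an explicit User-Agent header."""
    return Request(url, headers={"User-Agent": user_agent})


def fetch_html(url, user_agent=DEFAULT_USER_AGENT, encoding="utf-8"):
    """Download a page and decode it to text."""
    # The context manager closes the connection once the body is read.
    with urlopen(build_request(url, user_agent)) as response:
        return response.read().decode(encoding)
```

Calling fetch_html("https://realpython.com/practice/aphrodite.html") would send the same request as the snippet above.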

Using the requests package works fine, too. It isn't part of the standard library, so pip install requests or pipenv install requests needs to be run beforehand:

import requests

r = requests.get("https://realpython.com/practice/aphrodite.html")
print(r.text)
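A slightly more defensive variant of the requests call is sketched below, in case the site starts rejecting the default requests user agent as well. The helper names browser_like_headers and fetch_with_requests are hypothetical, and the 10 second timeout is an arbitrary choice on my part.

```python
def browser_like_headers(user_agent="Mozilla/5.0"):
    """Build the header dict to send; the value mirrors the urllib workaround."""
    return {"User-Agent": user_agent}


def fetch_with_requests(url, timeout=10):
    """Fetch a page with requests and raise if the server returns an error status."""
    # Imported here so the helpers above still load when requests is not installed.
    import requests

    response = requests.get(url, headers=browser_like_headers(), timeout=timeout)
    # raise_for_status() turns 4xx/5xx answers (such as 403) into an exception.
    response.raise_for_status()
    return response.text
```

With raise_for_status() a 403 raises an exception, so you don't end up printing an error page as if it were the real content.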
