Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Upgrade dependencies #839

Merged
merged 32 commits into from
Apr 2, 2024
Merged

Upgrade dependencies #839

merged 32 commits into from
Apr 2, 2024

Conversation

edsu
Copy link
Contributor

@edsu edsu commented Apr 12, 2023

This is a draft since 4 tests are failing after the upgrades...

Description

Upgrade various dependencies (gevent, werkzeug, markupsafe, jinja2, flask and httpbin) with the goal of fixing builds with Python v3.11 and higher.

Motivation and Context

I spent a bit of time trying to let gevent float to 3.1.2 and working through the effects of running pip install -r requirements.txt.

The good news is gevent v22.10.2 installs under Python3.11 now. But installing the latest werkzeug (which is currently unpinned) brings along the need to upgrade markupsafe and jinja2. I worked through the minimal changes to get that working, which mostly involved using pass_context instead of contextfunction.

The bad news is that the tests installed an old version of flask, which pulled in an old version of werkzeug which breaks things again. The reason why flask < v2 is installed is to get the old werkzeug which is needed by httpbin, which appears to be dead?

I think we could pin back werkzeug in our requirements.txt to solve this. But it seems like we would just be forestalling the inevitable need to upgrade? The approach taken here is to install a fork of httpbin for testing which has been updated to work with the latest werkzeug.

Fixes #835

Types of changes

  • Replay fix (fixes a replay specific issue)
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have added or updated tests to cover my changes.
  • All new and existing tests passed.

Add aaronhmiller/httpbin fork which has had https://github.com/aaronhmiller/httpbin.git applied so it can work with latest werkzeug.
@FraMecca
Copy link

FraMecca commented Jun 2, 2023

Hi, thank you for this draft MR.

I am trying to get this project into the gentoo user repository so it would be very valuable to have python 3.10+ support.

For this reason I tried to add two small modifications in order to fix the tests:

  • this is my own fork adding a commit from your branch https://github.com/FraMecca/pywb/tree/upgrade-dependencies: it adds back the pop_path_info function that was removed from werkzeug.wsgi
  • I forked pyamf (required by warcio, so an indirect dependency) to work with python3.11: the C extension is not available anymore but there is replacement code. I think that works better because it is easier to update the code this way: https://github.com/FraMecca/pyamf

@FraMecca
Copy link

FraMecca commented Jun 2, 2023

Overall there are only 5 tests failing on my branch but they are all related to path rewriting. I need help fixing those because I have no experience with that part of the codebase

platform linux -- Python 3.10.6, pytest-7.1.3, pluggy-1.0.0
rootdir: /home/user/pywb, configfile: tox.ini, testpaths: tests
plugins: cov-4.1.0
collected 512 items                                                                                                                                                          

tests/test_acl.py ..........                                                                                                                                           [  1%]
tests/test_acl_manager.py .................                                                                                                                            [  5%]
tests/test_auto_colls.py ..........................................                                                                                                    [ 13%]
tests/test_cdx_server_app.py .................                                                                                                                         [ 16%]
tests/test_cert_req.py ..                                                                                                                                              [ 17%]
tests/test_cli.py .........                                                                                                                                            [ 18%]
tests/test_embargo.py ......                                                                                                                                           [ 20%]
tests/test_force_https.py ........                                                                                                                                     [ 21%]
tests/test_integration.py ....................................................................................................                                         [ 41%]
tests/test_live_rewriter.py ..........FFFF.............F......                                                                                                         [ 47%]
tests/test_locales.py ssss                                                                                                                                             [ 48%]
tests/test_manager.py .......                                                                                                                                          [ 50%]
tests/test_memento.py ..............................                                                                                                                   [ 55%]
tests/test_prefer_header.py ...................................................................................                                                        [ 72%]
tests/test_prefixed_deploy.py ......                                                                                                                                   [ 73%]
tests/test_proxy.py ssssssssssssssssssssssssssssssssssssssssssssss                                                                                                     [ 82%]
tests/test_range.py ............                                                                                                                                       [ 84%]
tests/test_record_dedup.py .....                                                                                                                                       [ 85%]
tests/test_record_replay.py ........................                                                                                                                   [ 90%]
tests/test_redirect_classic.py ............                                                                                                                            [ 92%]
tests/test_redirect_revisits.py .......                                                                                                                                [ 93%]
tests/test_redirects.py ...............                                                                                                                                [ 96%]
tests/test_root_coll.py ........                                                                                                                                       [ 98%]
tests/test_socks.py ....                                                                                                                                               [ 99%]
tests/test_zipnum_auto_dir.py ....     

@edsu
Copy link
Contributor Author

edsu commented Jun 19, 2023

@FraMecca I see 4 tests failing in my Python 3.11 dev environment:

FAILED tests/test_live_rewriter.py::TestLiveRewriter::test_live_bad_content_length[frame] - webtest.app.AppError: Bad response: 400 Bad Request (not 200)
FAILED tests/test_live_rewriter.py::TestLiveRewriter::test_live_bad_content_length[non-frame] - webtest.app.AppError: Bad response: 400 Bad Request (not 200)
FAILED tests/test_live_rewriter.py::TestLiveRewriter::test_live_bad_content_length_with_range[frame] - webtest.app.AppError: Bad response: 400 Bad Request (not 206)
FAILED tests/test_live_rewriter.py::TestLiveRewriter::test_live_bad_content_length_with_range[non-frame] - webtest.app.AppError: Bad response: 400 Bad Request (not 206)

I'm pretty sure these are related to me switching to a fork of httpbin. I need to get these tests problems resolved now since I'm having trouble installing greenlet v1 on the latest debian Docker images, which ship with Python 3.11.

@FraMecca
Copy link

@edsu so we have the same failures on our branches. I don't understand how you are able to pull that without switching to a py3 version of pyamf. What do you plan to do for httpbin?

This was removed in werkzeug 2.3 and exists as shift_pop_info in
wsgiref:

pallets/werkzeug#2415
@edsu
Copy link
Contributor Author

edsu commented Jun 29, 2023

@FraMecca I'm currently using a fork of httpbin that has been updated: https://github.com/webrecorder/pywb/blob/upgrade-dependencies/test_requirements.txt#L9

I updated this PR to replace werkzeug.wsgi import pop_path_info with wsgiref.util import shift_path_info since the werkzeug team decided it was duplicated there.

It looks like py3AMF is being installed conditionally in the setup.py. Is that not working for you?

It seems like JSON in responses is now minimized without spaces? I'm not
sure what changed in the stack to do that.
@edsu
Copy link
Contributor Author

edsu commented Jun 29, 2023

I think that perhaps these length tests are failing because spaces are now being removed from JSON responses somewhere in the stack, maybe by the httpbin fork?

Here are a bunch of space related changes I needed to make to get some tests to pass: 5b72d96

@FraMecca
Copy link

FraMecca commented Jul 1, 2023

@edsu I get 400 instead of 200 on some tests that use httbin, so it might be a different issue

@FraMecca
Copy link

FraMecca commented Jul 1, 2023

It looks like py3AMF is being installed conditionally in the setup.py. Is that not working for you?

It's needed, for the tests, right? In any case porting the library to py3 wasn't a big effort so I would be glad if it can be of help to you

@edsu
Copy link
Contributor Author

edsu commented Sep 22, 2023

@FraMecca there is already a Python3 port of pyamf at https://github.com/StdCarrot/Py3AMF. Does that not work for you?

@edsu
Copy link
Contributor Author

edsu commented Sep 22, 2023

It looks like https://github.com/psf/httpbin is being actively maintained by the Python Software Foundation so I'm going to switch to using that.

Sadly, I see it still has the same set of HTTP 400 instead of HTTP 200 errors:

FAILED tests/test_live_rewriter.py::TestLiveRewriter::test_live_bad_content_length[frame] - webtest.app.AppError: Bad response: 400 Bad Request (not 200)

But at least if a change in httbin is needed it is feasible to get it applied. The original repository is abandoned.

@edsu
Copy link
Contributor Author

edsu commented Sep 22, 2023

Sadly it appears that httpbin is no longer able to return invalid Content-Length headers:

curl -i -X GET "http://localhost:8080/response-headers?content-length=149,149" -H "accept: application/json"
HTTP/1.1 500 Internal Server Error
Connection: close
Content-Type: text/html
Content-Length: 141

<html>
  <head>
    <title>Internal Server Error</title>
  </head>
  <body>
    <h1><p>Internal Server Error</p></h1>

  </body>
</html

This is what appears in the docker console when run against the psf/httpbin:

[2023-09-22 10:21:01 +0000] [7] [ERROR] Error handling request /response-headers?content-length=149,149
Traceback (most recent call last):
  File "/opt/httpbin/lib/python3.10/site-packages/gunicorn/workers/base_async.py", line 55, in handle
    self.handle_request(listener_name, req, client, addr)
  File "/opt/httpbin/lib/python3.10/site-packages/gunicorn/workers/ggevent.py", line 128, in handle_request
    super().handle_request(listener_name, req, sock, addr)
  File "/opt/httpbin/lib/python3.10/site-packages/gunicorn/workers/base_async.py", line 108, in handle_request
    respiter = self.wsgi(environ, resp.start_response)
  File "/opt/httpbin/lib/python3.10/site-packages/flask/app.py", line 2213, in __call__
    return self.wsgi_app(environ, start_response)
  File "/opt/httpbin/lib/python3.10/site-packages/flask/app.py", line 2197, in wsgi_app
    return response(environ, start_response)
  File "/opt/httpbin/lib/python3.10/site-packages/werkzeug/wrappers/response.py", line 579, in __call__
    start_response(status, headers)
  File "/opt/httpbin/lib/python3.10/site-packages/gunicorn/http/wsgi.py", line 243, in start_response
    self.process_headers(headers)
  File "/opt/httpbin/lib/python3.10/site-packages/gunicorn/http/wsgi.py", line 264, in process_headers
    self.response_length = int(value)
ValueError: invalid literal for int() with base 10: '149,149'

Maybe the older version of httpbin wasn't so strict?

Use the latest httpbin from psf/httpbin and adjust some json responses
since the formatting has changed slightly.
@aquatix
Copy link

aquatix commented Feb 23, 2024

Any news on this? We would like to upgrade our old Debian 11 servers to 12 :)

@tw4l
Copy link
Member

tw4l commented Mar 29, 2024

Any news on this? We would like to upgrade our old Debian 11 servers to 12 :)

Hi, thanks for your patience and to @edsu for starting this branch! Apologies for the long delay on taking this up, we've been very focused on Browsertrix of late and are a small team. I'm planning to merge this into main soon with some other fixes/features and we should put out a new release shortly!

@tw4l
Copy link
Member

tw4l commented Mar 29, 2024

Pinning gevent==23.9.0 and greenlet>=2.0.2 to specific versions because httpbin requires greenlet < 3 for now. We can update to latest/greenlet 3 when the PSF-maintained httpbin updates, or contribute that separately.

@tw4l tw4l requested a review from ikreymer March 29, 2024 16:54
@tw4l tw4l marked this pull request as ready for review March 29, 2024 16:54
@tw4l
Copy link
Member

tw4l commented Mar 29, 2024

Note that werkzeug 2.3.0 dropped support for Python 3.7; if we want to continue to support Python < 3.8, we'd need to pin to werkzeug==2.2.3, which does not support Python 3.12 so may become an issue shortly.

@tw4l
Copy link
Member

tw4l commented Mar 29, 2024

It's also worth noting that werkzeug dropped support for Python 2 and Python 3.5 in version 2.0.0, released in 2021, and we previously didn't have the version pinned so likely our claim to support Python 2 and <3.6 hasn't actually been true for a little while. There are almost certainly some additional things that can be dropped accordingly, but I didn't want the scope of this PR to expand too much so that might be better done in a follow-up.

@tw4l
Copy link
Member

tw4l commented Apr 1, 2024

We decided to support Python 3.7-3.11 in the next release, and will drop Python 3.7, add Python 3.12, and upgrade dependencies further in the following release. I will make an issue to document what will be needed at that point.

Copy link
Member

@ikreymer ikreymer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work! This is an important step in moving pywb forward!

@tw4l tw4l merged commit b4955cc into main Apr 2, 2024
10 checks passed
@tw4l tw4l deleted the upgrade-dependencies branch April 2, 2024 21:16
@aquatix
Copy link

aquatix commented Apr 3, 2024

Great work @tw4l !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Pywb fails to install on 3.11 & 3.12 (at least on M1 Macs)
5 participants