
[Improvement] Pip could resume a package download halfway when the connection is poor #4796

Open
winstonma opened this issue Oct 20, 2017 · 28 comments · May be fixed by #12991
Labels
C: download About fetching data from PyPI and other sources state: awaiting PR Feature discussed, PR is needed type: enhancement Improvements to functionality

Comments

@winstonma

winstonma commented Oct 20, 2017

  • Pip version: 9.0.1
  • Python version: 3.6.2
  • Operating system: macOS 10.13

Description

When I have a poor internet connection (the network cuts out unexpectedly), updating a pip package is painful. When I retry the pip install, it stops midway and gives me the same md5 error.

My workaround is:

  1. Download the package from PyPI (using a browser or wget, both of which can retry/resume)
  2. pip install the downloaded file
  3. Remove the downloaded file

If pip's downloader had a resume feature, the problem would be solved.

What I've run

pip install -U jupyterlab in poor network conditions

Collecting jupyterlab
  Downloading jupyterlab-0.28.4-py2.py3-none-any.whl (8.7MB)
    4% |█▋                              | 430kB 1.1MB/s eta 0:00:08
THESE PACKAGES DO NOT MATCH THE HASHES FROM THE REQUIREMENTS FILE. If you have updated the package versions, please update the hashes. Otherwise, examine the package contents carefully; someone may have tampered with them.
    jupyterlab from https://pypi.python.org/packages/b1/6d/d1d033186a07e08af9dc09db41401af7d6e18f98b73bd3bef75a1139dd1b/jupyterlab-0.28.4-py2.py3-none-any.whl#md5=9a93b1dc85f5924151f0ae9670024bd0:
        Expected md5 9a93b1dc85f5924151f0ae9670024bd0
             Got        4b6835257af9609a227a72b18ea011e3
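
For reference, the kind of resume logic I'm asking for looks roughly like this (a minimal sketch using requests; the Range header is what wget and browsers use to resume, and the server must support it — the function and names are illustrative, not pip internals):

    import hashlib
    import os

    import requests

    def download_with_resume(url, dest, expected_md5=None):
        """Fetch url into dest, resuming any partial file already on disk."""
        headers = {}
        if os.path.exists(dest):
            # Ask the server for only the bytes we are missing.
            headers["Range"] = "bytes=%d-" % os.path.getsize(dest)
        with requests.get(url, headers=headers, stream=True, timeout=30) as resp:
            resp.raise_for_status()
            # 206 means the Range header was honoured; a plain 200 means
            # the server sent the whole file, so start over from scratch.
            mode = "ab" if resp.status_code == 206 else "wb"
            with open(dest, mode) as f:
                for chunk in resp.iter_content(chunk_size=64 * 1024):
                    f.write(chunk)
        if expected_md5 is not None:
            with open(dest, "rb") as f:
                digest = hashlib.md5(f.read()).hexdigest()
            if digest != expected_md5:
                raise ValueError("md5 mismatch: got %s" % digest)
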
@pradyunsg pradyunsg added type: enhancement Improvements to functionality C: download About fetching data from PyPI and other sources labels Oct 20, 2017
@CTimmerman

CTimmerman commented Jul 9, 2018

I don't know how pip's hashing works, but here's some working, simple, modular resume code in a single file/function: https://gist.github.com/CTimmerman/ccf884f8c8dcc284588f1811ed99be6c

@seandepagnier

I have a poor connection and I often resume pip manually using wget.

This is easy for a wheel using wget -c; you can then install the wheel with pip. But when it's a tarball I have to use the setup script, and I don't get the same result, even though it works in the end.

@chrahunt chrahunt added type: feature request Request for a new feature and removed type: feature request Request for a new feature labels Dec 17, 2019
@chrahunt
Member

This should be easier to implement now since all the logic regarding downloads is isolated in pip._internal.network.download.

@johny65

johny65 commented May 14, 2020

Any updates on this? I was installing a huge package (specifically TensorFlow, 500+ MB), and for some reason pip was killed at 99% of the download... Re-running the command started the download again from 0...

@pradyunsg
Member

@johny65 No updates.

Folks are welcome to contribute this functionality to pip. As noted by @chrahunt, there's a clear part of the codebase for these changes to be made in. :)

@McSinyx
Contributor

McSinyx commented May 15, 2020

I have a few questions about the design for this enhancement. First, why (or how) does this happen?

When I have poor internet connection (the network is cut unexpectedly) [...] When I retry the pip install, it would stop at the midpoint and give me the same md5 error.

My guess is that back then wheels were stored directly in the cache dir instead of being downloaded to a temporary location, as is done now. If so, the hashing error should already be solved.

However, because the wheel being downloaded now lives in a directory that is cleaned up afterwards, do we want to expose that mechanism as something configurable (e.g. pip install --wheel-dir=<user-assigned path> <packages>), or do we want to recommend that people with poor connections run pip download -d <user-assigned path> <packages> and then pip install? Personally I prefer the latter approach, where we'd need to make pip download write directly to the specified dir, and I'm not sure whether doing that would break any existing use case.
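
A sketch of the latter workflow as it exists today (the directory path is illustrative):

    pip download -d ~/wheels <packages>
    pip install --no-index --find-links ~/wheels <packages>

The second command installs only from the local directory (--no-index prevents any network access), so a failed download can simply be retried before installing.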

@ShashankAW

Any updates on this? I was installing a huge package (specifically TensorFlow, 500+ MB), and for some reason pip was killed at 99% of the download... Re-running the command started the download again from 0...

Same with PyTorch, which was 1 GB in size. A day's quota just got exhausted with no fruitful result.

@uranusjr
Member

FWIW, you can always curl manually (applying whatever resuming logic you need and checking the integrity yourself) and pip install the downloaded file instead.

@pradyunsg pradyunsg added the state: awaiting PR Feature discussed, PR is needed label Dec 1, 2020
@pradyunsg
Member

Folks are welcome to contribute this functionality to pip.

@yichi-yang

Folks are welcome to contribute this functionality to pip.

I'd like to give this a try and have created a proof-of-concept PR here: #11180.

I'm not quite sure what the command line options for this feature should look like. I imagine we will need new options to turn the feature on/off and to limit the number of retries (this is different from the --retries switch). So maybe --resume-incomplete-download to opt in and --resume-attempts to set the limit?

@uranusjr
Member

If this gets implemented, I would want it to be enabled by default, and to fall back automatically to the previous implementation if resuming is not successful (e.g. if the server does not support resuming). This matches the behaviour of normal downloading clients, e.g. web browsers.

@yichi-yang

If this gets implemented, I would want it to be enabled by default, and to fall back automatically to the previous implementation if resuming is not successful (e.g. if the server does not support resuming). This matches the behaviour of normal downloading clients, e.g. web browsers.

How about the number of attempts? Should we keep making new requests as long as the responses have a successful status code (e.g. 200) and non-empty bodies (i.e. some progress is made on each request)?
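
For concreteness, the policy I have in mind looks roughly like this (a sketch; download_range is a hypothetical helper that performs one Range request and appends whatever bytes it receives, and dest is assumed to already exist as a partial file):

    import os

    def fetch_while_progressing(url, dest, total_size, max_stalls=5):
        """Keep issuing range requests while each attempt makes progress."""
        stalls = 0
        while os.path.getsize(dest) < total_size:
            before = os.path.getsize(dest)
            try:
                download_range(url, dest, start=before)  # hypothetical helper
            except OSError:
                pass  # a dropped connection counts as zero progress
            if os.path.getsize(dest) == before:
                stalls += 1
                if stalls >= max_stalls:
                    raise RuntimeError("no progress after %d attempts" % stalls)
            else:
                stalls = 0  # progress was made, reset the stall counter
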

@uranusjr
Member

Instead of trying to guess how many attempts is reasonable, perhaps pip should store the incomplete download somewhere (e.g. in cache?) and resume it on the next pip install. This also better matches browser behaviour—the download is not re-attempted automatically, but the user can click a button to resume.

@CTimmerman

CTimmerman commented Jun 12, 2022

If-Unmodified-Since should ensure it's the same file and therefore safe to resume. https://gist.github.com/CTimmerman/ccf884f8c8dcc284588f1811ed99be6c
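
A minimal sketch of that validation (assuming last_modified was saved from the Last-Modified header of the response that started the download; the function name is illustrative):

    import requests

    def resume_if_unchanged(url, dest, offset, last_modified):
        """Append the missing bytes to dest, but only if the remote
        file has not changed since the partial download began."""
        headers = {
            "Range": "bytes=%d-" % offset,
            "If-Unmodified-Since": last_modified,
        }
        resp = requests.get(url, headers=headers, stream=True, timeout=30)
        if resp.status_code == 412:
            return False  # Precondition Failed: file changed, restart from zero
        resp.raise_for_status()
        if resp.status_code != 206:
            return False  # server ignored the Range header; restart from zero
        with open(dest, "ab") as f:
            for chunk in resp.iter_content(chunk_size=64 * 1024):
                f.write(chunk)
        return True
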

@yichi-yang

Instead of trying to guess how many attempts is reasonable, perhaps pip should store the incomplete download somewhere (e.g. in cache?) and resume it on the next pip install. This also better matches browser behaviour—the download is not re-attempted automatically, but the user can click a button to resume.

Currently pip uses CacheControl to handle HTTP caching, but CacheControl doesn't cache responses with incomplete bodies (or Range requests with status code 206), so it doesn't help in our case (an incomplete download). It seems to me that implementing a cache independent of the existing HTTP and wheel caches, for the sole purpose of resuming failed downloads, would be a lot of work.

Also, I'm not sure the browser behavior is desirable in this case. With large wheels (e.g. pytorch > 2 GB) and my crappy Internet, a download consistently fails 4~5 times before completing. If users are installing many large packages (e.g. from a requirements.txt), having to resume manually multiple times can be annoying. That's why I think opt-in might work better: in most cases resuming is not required, and when it is, we can present a warning informing users that 1) the download is incomplete, and 2) they can use a command line option to automatically resume the download next time.

@pradyunsg
Member

One caveat with trying to mimic the browser is that, unlike a browser's UI, which lets the user cancel/pause/resume any specific download, pip doesn't have such a rich user interface in the CLI.

We'd need to provide at least one knob for this resuming behaviour -- either opt-in or opt-out. I think that when you're not in "resume my downloads" mode, pip should also clean up any existing incomplete downloads.

That said, picking between opt-in and opt-out is not really a blocker to implementing either behaviour. It's a matter of changing a flag's default value in the PR (let's use a flag with values like --incomplete-downloads=resume/discard for handling this), which is easy enough. :)

@yichi-yang

yichi-yang commented Jul 17, 2022

I think my PR #11180 is ready for a first round of review. Suggestions for more meaningful flag names, log messages, and exception messages are welcome.

@Rom1deTroyes

Having the same problem downloading PyTorch + OpenCV on a Streamlit project for the third time today (connection lost after 6 hours...), I wonder whether making pip able to use an external downloader could be a thing? yt-dl provides:

    --external-downloader COMMAND        Use the specified external downloader.
                                         Currently supports aria2c,avconv,axel,
                                         curl,ffmpeg,httpie,wget
    --external-downloader-args ARGS      Give these arguments to the external
                                         downloader

Something like pip install --external-downloader wget --external-downloader-args '-r' requirements.txt?
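
Under the hood that could be little more than shelling out, e.g. (a sketch, not an existing pip option; wget's -c flag is what enables resuming):

    import subprocess

    def external_download(url, dest, downloader="wget", extra_args=()):
        # wget -c resumes a partial download; -O pins the output path.
        cmd = [downloader, "-c", "-O", dest, *extra_args, url]
        subprocess.run(cmd, check=True)  # raises if the downloader fails
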

@CTimmerman

Having the same problem downloading PyTorch + OpenCV on a Streamlit project for the third time today (connection lost after 6 hours...), I wonder whether making pip able to use an external downloader could be a thing? yt-dl provides:

    --external-downloader COMMAND        Use the specified external downloader.
                                         Currently supports aria2c,avconv,axel,
                                         curl,ffmpeg,httpie,wget
    --external-downloader-args ARGS      Give these arguments to the external
                                         downloader

Something like pip install --external-downloader wget --external-downloader-args '-r' requirements.txt?

Which of those also work on Windows? Resuming HTTP downloads is simple, as evidenced by the PR at #11180, which is fine by me, but I feel it's such a basic feature that it should be supported upstream.

@pradyunsg
Member

We're not going to use an external programme for network interaction within pip. This should be implemented as logic within pip itself.

@Nneji123

What's the progress on this feature? It's annoying to try to install packages like TensorFlow and PyTorch and then get errors when the downloads are almost complete.

@yichi-yang

What's the progress on this feature? It's annoying to try to install packages like TensorFlow and PyTorch and then get errors when the downloads are almost complete.

I have a proof-of-concept PR here: #11180. It's been a while since I last worked on it, and there has been some discussion about the user interface that I haven't incorporated into the PR.

Personally I feel the major problems are:

  1. We need to decide whether this is better fixed upstream (though I think parts of the resume logic will have to be handled by pip in either case).
  2. What user interface should we use?

I think it would be nice to have some input from the maintainers, e.g. priorities, expectations, etc.

@uranusjr
Member

By upstream do you mean requests? As for which UX to use, I don't think anyone has really expressed strong opinions, only pointed out things the end product needs to handle. So the best way to drive this forward would be to implement what you feel is best and see what people think of it.

@yichi-yang

By upstream do you mean requests? As for which UX to use, I don't think anyone has really expressed strong opinions, only pointed out things the end product needs to handle. So the best way to drive this forward would be to implement what you feel is best and see what people think of it.

Sounds good. I'll update that PR when I get time (I've been busy lately).
By upstream I'm referring to the issue that requests doesn't enforce the Content-Length check: psf/requests#4956.
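
For context, enforcing that check manually is straightforward (a sketch of what pip has to do itself as long as requests doesn't; the function name is illustrative):

    import requests

    def download_checked(url, dest):
        """Stream url to dest and verify the body length ourselves."""
        with requests.get(url, stream=True, timeout=30) as resp:
            resp.raise_for_status()
            expected = int(resp.headers.get("Content-Length", -1))
            received = 0
            with open(dest, "wb") as f:
                for chunk in resp.iter_content(chunk_size=64 * 1024):
                    f.write(chunk)
                    received += len(chunk)
        if expected >= 0 and received != expected:
            raise IOError("incomplete body: got %d of %d bytes"
                          % (received, expected))
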

@nbkgit

nbkgit commented Apr 20, 2024

It's 2024 and there is still no resume for large packages. The connection is closed by the server and I have to start numpy and pyspark over and over again. A resume would save a lot of resources, as pip currently retrieves the same stream all over again.
I'm sorry that I'm not versed enough to write it myself, but it is necessary.

@mrlectus

mrlectus commented May 3, 2024

It's 2024 and there is still no resume for large packages. The connection is closed by the server and I have to start numpy and pyspark over and over again. A resume would save a lot of resources, as pip currently retrieves the same stream all over again. I'm sorry that I'm not versed enough to write it myself, but it is necessary.

Yes, very necessary.

@thk686

thk686 commented Jul 15, 2024

Currently in the western Amazon on a Starlink connection, trying to download birdnetlib, and this is killing me. It would be so much better to use the rsync protocol with checksums.

@gmargaritis

Hello everyone 👋

I opened a PR for this one (#12991). Happy to hear your thoughts and finally get it merged!
