CDN sync seems to be slower than usual this week #740

Closed · 2 tasks done
leofang opened this issue Apr 6, 2023 · 28 comments
Labels
  locked - [bot] locked due to inactivity
  type::bug - describes erroneous operation; use severity::* to classify the type

Comments

@leofang commented Apr 6, 2023

### Checklist

- [x] I added a descriptive title
- [x] I searched open reports and couldn't find a duplicate

### What happened?

CDN sync seems to be slower than usual this week. Taking libcublas as an example:

[Screenshot taken 2023-04-05, 8:42 PM]

I started monitoring the status via `conda search --platform linux-aarch64 libcublas` after this PR was merged and the copy to the conda-forge channel was done. As shown above, it took ~47 minutes for conda search to find the package. IIRC the CDN sync time had previously been reduced significantly, to 15-30 minutes, so this is a bit concerning.
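
A minimal sketch of the monitoring loop described above (the package and platform are the ones from this report; the 60-second interval is arbitrary):

```shell
# Poll conda search once a minute until the package becomes visible.
# conda search exits non-zero while the package is missing from repodata.
while ! conda search --platform linux-aarch64 libcublas >/dev/null 2>&1; do
  echo "not visible yet: $(date)"
  sleep 60
done
echo "package became visible at $(date)"
```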

### Conda Info

```shell
$ conda info

     active environment : opt_einsum_dev
    active env location : /home/leof/miniforge3/envs/opt_einsum_dev
            shell level : 2
       user config file : /home/leof/.condarc
 populated config files : /home/leof/miniforge3/.condarc
                          /home/leof/.condarc
          conda version : 23.3.1
    conda-build version : not installed
         python version : 3.9.16.final.0
       virtual packages : __archspec=1=x86_64
                          __cuda=12.1=0
                          __glibc=2.31=0
                          __linux=5.8.0=0
                          __unix=0=0
       base environment : /home/leof/miniforge3  (writable)
      conda av data dir : /home/leof/miniforge3/etc/conda
  conda av metadata url : None
           channel URLs : https://conda.anaconda.org/conda-forge/linux-64
                          https://conda.anaconda.org/conda-forge/noarch
          package cache : /home/leof/miniforge3/pkgs
                          /home/leof/.conda/pkgs
       envs directories : /home/leof/miniforge3/envs
                          /home/leof/.conda/envs
               platform : linux-64
             user-agent : conda/23.3.1 requests/2.28.2 CPython/3.9.16 Linux/5.8.0-53-generic ubuntu/20.04.2 glibc/2.31
                UID:GID : 1019:1019
             netrc file : None
           offline mode : False
```

### Conda Config

```shell
$ conda config --show-sources
==> /home/leof/miniforge3/.condarc <==
channels:
  - conda-forge

==> /home/leof/.condarc <==
channels:
  - conda-forge
```

### Conda list

_No response_

### Additional Context

_No response_
@leofang added the type::bug label on Apr 6, 2023
@leofang (Author) commented Apr 6, 2023

cc: @jakirkham

@barabo commented Apr 6, 2023

I'm looking into an issue with one of the Cloudflare caches. It seems that it only caches .tar.bz2 files (we forgot to add .conda when we started putting conda files in the repo for conda-forge), so .conda downloads would likely be much slower.
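
A quick way to check whether a given artifact is actually served from the Cloudflare cache (a sketch; substitute a real filename from the channel for PKG) is to compare the cf-cache-status response header for the two formats:

```shell
# cf-cache-status: HIT means the file came from cache; MISS/DYNAMIC means
# it did not. PKG is a placeholder filename stem.
PKG=libcublas-12.1.0.26-0
curl -sI "https://conda.anaconda.org/conda-forge/linux-64/${PKG}.tar.bz2" | grep -i cf-cache-status
curl -sI "https://conda.anaconda.org/conda-forge/linux-64/${PKG}.conda"  | grep -i cf-cache-status
```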

@jakirkham (Member)

Ah ok. This makes much more sense. Thanks Carl! 🙏

@jakirkham (Member) commented Apr 7, 2023

Recently saw an issue where binaries for a package, libnvjitlink, were uploaded for Linux and Windows at roughly the same time. However, the Windows package mirrored much more slowly.

Edit: The Windows packages mentioned here took ~1.5hrs to mirror. This was the original build and this is the first CI build to get the package.


[Screenshot showing Linux and Windows packages, taken 2023-04-06, 6:35 PM]

Note: Download count is 1 because I did a download from the web UI.

Searching for Linux package:

```shell
$ conda search 'libnvjitlink[channel=conda-forge, subdir=linux-64]'
Loading channels: done
# Name                       Version           Build  Channel
libnvjitlink                 12.0.76      hcb278e6_0  conda-forge
```

Searching for Windows package:

```shell
$ conda search 'libnvjitlink[channel=conda-forge, subdir=win-64]'
Loading channels: done
No match found for: conda-forge/win-64::libnvjitlink. Search: conda-forge/win-64::*libnvjitlink*

PackagesNotFoundError: The following packages are not available from current channels:

  - conda-forge/win-64::libnvjitlink

Current channels:

  - https://conda.anaconda.org/conda-forge/win-64
  - https://conda.anaconda.org/conda-forge/noarch

To search for alternate channels that may provide the conda package you're
looking for, navigate to

    https://anaconda.org

and use the search bar at the top of the page.
```

@dholth (Contributor) commented Apr 10, 2023

@barabo it makes sense to cache .conda but is it faster? Wouldn't the CDN sync usually be the first downloader of brand-new .conda packages from a cold cache? (But, we see plenty of downloads in the screenshot)

@jakirkham (Member)

The Windows package shows a download count of 1 only because I clicked the link in the web UI to download it. There were 0 downloads prior, and no additional downloads until the CDN sync completed ~1.5hrs after upload.

@jakirkham (Member) commented Apr 19, 2023

Noticing this with cuda-nvcc on ppc64le (other packages uploaded at the same time are already available):

The package has been up for ~1.75hrs but is not available from the CDN (getting missing-package errors when requesting it).

[Screenshot taken 2023-04-19, 3:45 PM]

@leofang (Author) commented Apr 24, 2023

FYI it took >60 mins to reflect a simple channel label change: conda-forge/admin-requests#710 (comment)

@beckermr

I want to chime in here: I am now seeing CDN sync times of over 15 minutes on the conda-forge status page on a regular basis.

@barabo commented Apr 28, 2023

@dholth and I are going to sync on this early next week. Something does seem to be going on - we'll get to the bottom of it.

@jakirkham (Member)

Thanks Carl! 🙏

Please let us know if you need anything 🙂

@dholth (Contributor) commented May 1, 2023

We've shortened the cron interval so that updates should happen more frequently. Keep an eye on it and we'll see whether any other part of the pipeline is delayed.

@barabo commented May 1, 2023

The cron interval was every 10 minutes, which was fine when the job reliably ran in under 7 minutes. It recently started taking over 10 minutes for some runs, so we shortened the interval to 2 minutes.

I'm still looking at the logs to see if there's a way to speed it up.
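
A hypothetical crontab illustration of that change (the actual job name and path are not shown in this thread):

```shell
# Before: the sync job ran every 10 minutes.
# */10 * * * * /opt/cdn-sync/run-sync.sh
# After: every 2 minutes, so one slow run delays the next pass less.
*/2 * * * * /opt/cdn-sync/run-sync.sh
```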

@jakirkham (Member)

Ok, I would be interested to know why the script is taking longer. AIUI there was some work in the past to cut down the script runtime pretty significantly.

@jezdez (Member) commented May 15, 2023

Looking at this, I'm uncertain whether we've come to a conclusion. @barabo, do you think we can close this?

@jakirkham (Member)

Reading Carl's last comment, my (potentially incorrect) understanding is that the cron job used for mirroring is starting to take longer. The cause is unknown and being investigated, so this is not yet fully resolved.

@dholth (Contributor) commented May 15, 2023

I'm not too worried about it yet. The cron job used to take 6-7 minutes, and now it sometimes takes a little longer (which would have caused a 10 minute delay in the past); but sometimes it still runs in < 10 minutes. We should try to vacuum the databases at least.
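
For reference, if the sync job's bookkeeping lives in SQLite (an assumption; the thread does not name the database), vacuuming is a one-liner (hypothetical path):

```shell
# Rebuild the database file to reclaim space and defragment it,
# which can speed up subsequent queries. The path is a placeholder.
sqlite3 /var/lib/cdn-sync/metadata.db 'VACUUM;'
```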

@jakirkham (Member)

If it is reliably mirroring at 10-minute intervals, great; the issues mentioned above were when 1hr+ mirroring times were seen.

@m3vaz commented Jun 29, 2023

@jakirkham @leofang we're still seeing these issues with packages that were posted 23 hours ago, e.g. cuda-python.

@barabo commented Jun 29, 2023

Looking into the nvidia clone worker right now. It appears to have gotten stuck 18 hours ago and needed a restart. I believe it's done updating now.

@m3vaz commented Jun 29, 2023

@barabo Confirmed, I see the packages now.

Is there a way we could check on sync status for a given channel? (for when we hit similar issues in the future)

@barabo commented Jun 29, 2023

I believe you can do something like this to get a sense of when a channel subdir was last updated:

```shell
$ curl -Is https://conda.anaconda.org/nvidia/linux-64/repodata.json | grep last-modified
last-modified: Thu, 29 Jun 2023 18:35:12 GMT
```

It won't work if there are no new packages in linux-64 for that channel, but if you know that's what you're looking for, it should be a good test.
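
To check every subdir of a channel at once, a small loop over the common subdirs works (a sketch; adjust the channel and subdir list as needed):

```shell
# Print the last-modified time of repodata.json for each subdir.
for subdir in linux-64 linux-aarch64 linux-ppc64le osx-64 osx-arm64 win-64 noarch; do
  printf '%-14s ' "$subdir"
  curl -Is "https://conda.anaconda.org/nvidia/$subdir/repodata.json" | grep -i '^last-modified' || echo 'n/a'
done
```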

@m3vaz commented Jun 29, 2023

Can we assume that the update job takes ~10 minutes and is run every 10 minutes (as referenced earlier in the issue)?

@jakirkham (Member)

cc @adibbley (for awareness)

@barabo commented Jun 30, 2023

conda-forge syncs every 10 minutes, but I think the nvidia channel (and a few others) only sync every 20 minutes. We can look into increasing that cadence, if necessary.

@jakirkham (Member)

We are seeing this issue with the nvidia channel again. Could someone please take a look?

cc @raydouglass

@jezdez (Member) commented Aug 1, 2023

@jakirkham we're looking into it

@jezdez (Member) commented Dec 15, 2023

This was resolved at the time, closing.

@jezdez closed this as completed on Dec 15, 2023
The github-actions bot added the "locked" label on Aug 15, 2024
The github-actions bot locked the conversation as resolved and limited it to collaborators on Aug 15, 2024