Comfy3D-pt25 is failing fresh build #73

Open
yuisheaven opened this issue Dec 16, 2024 · 13 comments

@yuisheaven

yuisheaven commented Dec 16, 2024

I started a fresh installation of the Comfy3D-pt25 but it fails due to dependency conflicts. The last lines are these:

2024-12-16 15:33:13 [F1216 14:33:13.286945381 glutil.cpp:338] eglInitialize() failed
2024-12-16 15:33:36 /runner-scripts/entrypoint.sh: line 58: 1807 Aborted (core dumped) python3 ./ComfyUI/main.py --listen --port 8188 ${CLI_ARGS}

Here are the full logs of the fresh deploy:
docker-logs.txt


I do not think this is relevant for the specific case but I am running Docker Desktop on a Windows 11 machine.

The container was started by running docker-compose up -d against the docker compose file, using the published image (not self-built).

@YanWenKun
Owner

The [glutil.cpp:338] eglInitialize() failed error was thrown by nvdiffrast. I'll test it soon on a Windows machine.

Before that, I suggest a quick try using the dependencies that come with the image:

Delete .cache and .local, keep other files. Then start the container.
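
A minimal sketch of that reset step, assuming the container's persistent data lives in a single host directory. The `storage_root` argument and the `reset_user_dirs` helper are hypothetical names, not part of the image; only the two directory names come from the comment above.

```python
import shutil
import tempfile
from pathlib import Path


def reset_user_dirs(storage_root, names=(".cache", ".local")):
    """Delete the named directories under storage_root and keep everything else.

    Returns the names that were actually removed.
    """
    removed = []
    root = Path(storage_root)
    for name in names:
        target = root / name
        if target.is_dir():
            shutil.rmtree(target)
            removed.append(name)
    return removed


if __name__ == "__main__":
    # Demonstration on a throwaway directory, not on a real volume.
    demo_root = Path(tempfile.mkdtemp())
    for name in (".cache", ".local", "ComfyUI"):
        (demo_root / name).mkdir()
    print("removed:", reset_user_dirs(demo_root))
    print("kept:", sorted(p.name for p in demo_root.iterdir()))
```

Point it at the mounted volume while the container is stopped, then start the container again.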

@yuisheaven
Author

I did that, but got the same result.

@yuisheaven
Author

I should add that I did not have any problems with the cu124-megapack version. But I want to use the Comfy3D-pt25 one, which runs into the error mentioned in the description.

@yuisheaven
Author

(different names, volumes, networks)

@YanWenKun
Owner

I have reproduced the issue on a clean Windows 11 Docker Desktop installation. This is strange, as the same image runs fine on Linux with Podman.

After hours of searching and troubleshooting, I couldn't find a solution. All of the PyTorch 2.2/2.3/2.5 builds fail to start.

I even set up a fresh WSL2 openSUSE distro and installed everything step-by-step, but it still throws the eglInitialize() failed error.

I plan to try building on Ubuntu. But for now I'm leaving this issue open to see if anyone else has any insights.

@YanWenKun YanWenKun pinned this issue Dec 17, 2024
@yuisheaven
Author

By chance, do you know which was the last build of your Comfy3D-pt25 images that still worked with Docker on Windows? I would like to try an older image in the meantime. @YanWenKun

@yuisheaven
Author

I tried images going back to comfy3d-pt25-20241111, but with no success. Same error.

@yuisheaven
Author

2024-12-17 19:05:51 Building wheels for collected packages: nvdiffrast
2024-12-17 19:05:51 Building wheel for nvdiffrast (setup.py) ... done
2024-12-17 19:05:51 Created wheel for nvdiffrast: filename=nvdiffrast-0.3.3-py3-none-any.whl size=139906 sha256=e9805499e700c4c29f6d83c722e1d6ab340afed4ad8adb7c1eeb6819b7b727a5
2024-12-17 19:05:51 Stored in directory: /tmp/pip-ephem-wheel-cache-gq7wwgk2/wheels/d1/82/ea/91b5b9219953f7784a69f9e8d2dacad80beb8a99a5e46af62e
2024-12-17 19:05:51 Successfully built nvdiffrast
2024-12-17 19:05:52 Installing collected packages: numpy, nvdiffrast
2024-12-17 19:05:51 ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
2024-12-17 19:05:51 numba 0.60.0 requires numpy<2.1,>=1.22, but you have numpy 2.2.0 which is incompatible.
2024-12-17 19:05:51 gpytoolbox 0.3.3 requires numpy<2, but you have numpy 2.2.0 which is incompatible.
2024-12-17 19:05:51 Successfully installed numpy-2.2.0 nvdiffrast-0.3.3
2024-12-17 19:05:52 Collecting numpy==1.26.4
2024-12-17 19:05:52 Downloading numpy-1.26.4-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (61 kB)
2024-12-17 19:05:52 Downloading numpy-1.26.4-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (18.0 MB)

This block seems quite suspicious to me. It looks like a dependency conflict between numpy, numba and gpytoolbox.

numba needs numpy >=1.22 and <2.1
gpytoolbox needs numpy <2
So both would be fine with the version that is auto-installed later (1.26.4). But I assume there might be a problem because numpy-2.2.0 and nvdiffrast-0.3.3 were installed together. Maybe nvdiffrast was somehow linked against numpy-2.2.0 and then had problems with the downgrade?
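
To make the constraint reasoning concrete, here is a small sketch that checks both numpy versions from the log against the two requirement strings pip printed. The `satisfies` helper is a simplified, hypothetical stand-in that only understands plain numeric versions; in a real script, `packaging.specifiers.SpecifierSet` would be the more robust choice.

```python
import operator

# Comparison operators, two-character symbols first so "<=" is not read as "<".
OPS = {
    "<=": operator.le,
    ">=": operator.ge,
    "==": operator.eq,
    "<": operator.lt,
    ">": operator.gt,
}


def parse_version(text, width=4):
    """Turn '1.26.4' into (1, 26, 4, 0), zero-padded so tuples compare cleanly."""
    parts = [int(part) for part in text.strip().split(".")]
    return tuple(parts + [0] * (width - len(parts)))


def satisfies(version, specifier):
    """Check a version against a comma-separated specifier such as '<2.1,>=1.22'."""
    current = parse_version(version)
    for clause in specifier.split(","):
        clause = clause.strip()
        for symbol, compare in OPS.items():
            if clause.startswith(symbol):
                if not compare(current, parse_version(clause[len(symbol):])):
                    return False
                break
        else:
            raise ValueError(f"unsupported clause: {clause!r}")
    return True


if __name__ == "__main__":
    # Requirement strings copied from the pip error output in the log.
    requirements = {"numba 0.60.0": "<2.1,>=1.22", "gpytoolbox 0.3.3": "<2"}
    for numpy_version in ("2.2.0", "1.26.4"):
        for package, spec in requirements.items():
            verdict = "ok" if satisfies(numpy_version, spec) else "conflict"
            print(f"numpy {numpy_version} vs {package} ({spec}): {verdict}")
```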

@yuisheaven
Author

yuisheaven commented Dec 17, 2024

I think I found something, and it follows from my previous post. I edited comfy3d-pt25/runner-scripts/build-deps.sh to install numpy 1.26.4 first and everything else afterwards.

I started the stack with docker compose up -d --build. It still has problems importing a 3D package, but it no longer crashes.
The import fails because some package is still upgrading numpy at some point.
I see that numpy is pinned globally to 1.26.4 in builder-scripts/pak9.txt, but some package overrides this and causes trouble. I am now trying to find out which one, and whether the solution is simply to prevent the numpy upgrade.

@yuisheaven
Author

The package that always installs the wrong numpy version seems to be nvdiffrast. It always installs numpy 2.2.0, even if 1.26.4 is already installed. I tried it with --upgrade-strategy only-if-needed, but that was apparently ignored.
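
One way to narrow down which installed packages pull in numpy is to list every distribution that declares a numpy requirement. This is a sketch using the standard library's `importlib.metadata`; `find_dependents` and `requirement_name` are hypothetical helper names, and the output depends entirely on what is installed in the environment where it runs. It shows declared requirements only, so it cannot by itself explain why pip chose 2.2.0.

```python
import re
from importlib.metadata import distributions


def requirement_name(requirement):
    """Extract the normalized project name from a string like 'numpy<2.1,>=1.22'."""
    match = re.match(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)", requirement)
    if not match:
        return ""
    return re.sub(r"[-_.]+", "-", match.group(1)).lower()


def find_dependents(target="numpy"):
    """Return sorted (distribution, requirement) pairs that depend on target."""
    wanted = requirement_name(target)
    found = []
    for dist in distributions():
        for requirement in dist.requires or []:
            if requirement_name(requirement) == wanted:
                found.append((dist.metadata["Name"] or "unknown", requirement))
    return sorted(found)


if __name__ == "__main__":
    for name, requirement in find_dependents("numpy"):
        print(f"{name}: {requirement}")
```

Running it inside the container's Python environment, before and after the nvdiffrast install, would show whether an unpinned or loosely pinned numpy requirement is present.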

@yuisheaven
Author

So I now tried changing the nvdiffrast installation to

pip install \
    "git+https://github.com/NVlabs/nvdiffrast.git" --upgrade-strategy only-if-needed

This apparently resulted in the two packages numba and gpytoolbox being installed correctly, but I still ended up with the [glutil.cpp:338] eglInitialize() failed error and the process aborted again, so I am mostly out of clues at this point.
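
An alternative that was not tried in this thread is a pip constraints file, which makes pip report a conflict instead of silently replacing a pinned package. The sketch below writes such a file and builds the install command without running it. The helper names and the file location are assumptions, and nothing here suggests it would fix the eglInitialize() error, only the numpy drift.

```python
import sys
import tempfile
from pathlib import Path


def write_constraints(path, pins):
    """Write 'name==version' lines to a pip constraints file and return its path."""
    path = Path(path)
    lines = [f"{name}=={version}" for name, version in sorted(pins.items())]
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return path


def pip_install_command(requirement, constraints_path):
    """Build (but do not run) a pip command that honors the constraints file."""
    return [
        sys.executable, "-m", "pip", "install",
        "--constraint", str(constraints_path),
        requirement,
    ]


if __name__ == "__main__":
    # Hypothetical location; any writable path works.
    constraints = write_constraints(
        Path(tempfile.mkdtemp()) / "constraints.txt", {"numpy": "1.26.4"}
    )
    command = pip_install_command(
        "git+https://github.com/NVlabs/nvdiffrast.git", constraints
    )
    print(" ".join(command))
```

To actually run the command, pass the returned list to `subprocess.run(command, check=True)`.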

@yuisheaven
Author

I have now also resolved the warning about the unspecified CUDA architecture by adding
export TORCH_CUDA_ARCH_LIST="8.6" to build-deps.sh, but 8.6 is of course only the right value for my GPU.
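
Rather than hard-coding 8.6, the value can be derived from the GPUs that PyTorch sees. This is a sketch under the assumption that torch is installed and the GPU is visible when the script runs; `arch_list_from_capabilities` and `detect_capabilities` are hypothetical helper names.

```python
def arch_list_from_capabilities(capabilities):
    """Format (major, minor) compute capabilities as a TORCH_CUDA_ARCH_LIST value."""
    unique = sorted(set(capabilities))
    return ";".join(f"{major}.{minor}" for major, minor in unique)


def detect_capabilities():
    """Query PyTorch for the compute capability of every visible CUDA device."""
    import torch  # imported lazily so the formatter works without torch

    return [
        torch.cuda.get_device_capability(index)
        for index in range(torch.cuda.device_count())
    ]


if __name__ == "__main__":
    value = arch_list_from_capabilities(detect_capabilities())
    # Print a line that a shell script could eval or append to its environment.
    print(f'export TORCH_CUDA_ARCH_LIST="{value}"')
```

If no GPU is visible at that point, the result is an empty string, in which case a hard-coded fallback is still needed.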

Nonetheless, I still run into the very same error in the end, and I can't find a clue about what exactly is causing the failure or how to troubleshoot it further. My main problem is that I am only familiar with Ubuntu.

I've attached my latest log files and my current build-deps.sh (which works better for me personally), renamed to .txt so that I can upload it. I hope this helps a future developer figure out more than I could in the short time.

Additional info: nvidia-smi returned the graphics card correctly.

latest-docker-log.txt
build-deps.txt

@YanWenKun
Owner

Thanks for the information!

I tried an Ubuntu-based image on Docker Desktop WSL2 and manually installed everything. I got the same [glutil.cpp:338] eglInitialize() failed error.

I then tried Podman Desktop WSL2, with GPU enabled, running comfy3d-pt25 image. Still got the [glutil.cpp:338] eglInitialize() failed error.

I haven't dug into the code yet. I noticed that dropping EGL may be an option. I'll try it later.
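
One possible reading of dropping EGL is to use nvdiffrast's CUDA rasterizer context instead of the OpenGL one. nvdiffrast exposes both `RasterizeGLContext` and `RasterizeCudaContext`; which of them the 3D-Pack nodes actually construct, and where, has not been checked here, so the helper below is a hypothetical sketch rather than a patch. Because the log shows the process aborting with a core dump, the sketch assumes the failure cannot be caught with try/except, so it selects the CUDA context up front instead of falling back after a failed GL attempt.

```python
def make_raster_context(dr, use_opengl=False, device=None):
    """Create an nvdiffrast rasterizer context, defaulting to the CUDA backend.

    `dr` is the imported nvdiffrast.torch module, passed in so the selection
    logic can be exercised without a GPU.
    """
    if use_opengl:
        # OpenGL/EGL path: the one that fails with eglInitialize() in this thread.
        return dr.RasterizeGLContext(device=device)
    # CUDA path: does not need an OpenGL/EGL context.
    return dr.RasterizeCudaContext(device=device)


if __name__ == "__main__":
    import nvdiffrast.torch as dr  # requires nvdiffrast and a CUDA-capable GPU

    context = make_raster_context(dr)
    print(type(context).__name__)
```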

But for now I'm working on integrating TRELLIS, both for this Dockerfile and for the Windows package.
If you need 3D-Pack running on Windows, check that out. (I also recommend using Sandboxie for a clean environment.)


2 participants