QOL Improvements #27

Open
wants to merge 32 commits into base: main

d8ahazard

Add:

  1. Args for custom checkpoint, llm, outputs locations.
  2. Args to pass HF token on startup.
  3. Args for default llm/checkpoint on startup.
  4. Selector for custom/default checkpoints, llms.
  5. Seed randomization via -1
  6. Save outputs to user-specified dir.
  7. Load any checkpoint via from_single_file
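As a rough illustration of items 1 to 3, startup arguments of this kind could be wired up with `argparse`. This is only a sketch: every flag name and default below is hypothetical and is not taken from the PR's code.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # All flag names and defaults are hypothetical, chosen for illustration only.
    parser = argparse.ArgumentParser(description="Launcher options (sketch)")
    parser.add_argument("--checkpoints-dir", default="models/checkpoints",
                        help="Folder scanned for local image checkpoints")
    parser.add_argument("--llm-dir", default="models/llm",
                        help="Folder scanned for local LLMs")
    parser.add_argument("--outputs-dir", default="outputs",
                        help="Folder where generated images are saved")
    parser.add_argument("--hf-token", default=None,
                        help="Hugging Face token passed at startup")
    parser.add_argument("--default-checkpoint", default=None,
                        help="Checkpoint selected when the UI starts")
    parser.add_argument("--default-llm", default=None,
                        help="LLM selected when the UI starts")
    return parser


# Parse an explicit argument list so the example does not depend on sys.argv.
args = build_parser().parse_args(["--outputs-dir", "my_outputs"])
print(args.outputs_dir, args.checkpoints_dir)
```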

@d8ahazard d8ahazard mentioned this pull request Jun 2, 2024
@d8ahazard d8ahazard marked this pull request as draft June 2, 2024 18:33
@d8ahazard d8ahazard marked this pull request as ready for review June 2, 2024 18:44
@xhoxye

xhoxye commented Jun 2, 2024

`Running on local URL: http://0.0.0.0:7860

To create a public link, set share=True in launch().
Traceback (most recent call last):
File "C:\ProgramData\anaconda3\envs\omost\lib\site-packages\gradio\queueing.py", line 528, in process_events
response = await route_utils.call_process_api(
File "C:\ProgramData\anaconda3\envs\omost\lib\site-packages\gradio\route_utils.py", line 270, in call_process_api
output = await app.get_blocks().process_api(
File "C:\ProgramData\anaconda3\envs\omost\lib\site-packages\gradio\blocks.py", line 1908, in process_api
result = await self.call_function(
File "C:\ProgramData\anaconda3\envs\omost\lib\site-packages\gradio\blocks.py", line 1497, in call_function
prediction = await utils.async_iteration(iterator)
File "C:\ProgramData\anaconda3\envs\omost\lib\site-packages\gradio\utils.py", line 632, in async_iteration
return await iterator.__anext__()
File "C:\ProgramData\anaconda3\envs\omost\lib\site-packages\gradio\utils.py", line 758, in asyncgen_wrapper
response = await iterator.__anext__()
File "E:\AI\Omost\Omost\chat_interface.py", line 554, in _stream_fn
first_response, first_interrupter = await async_iteration(generator)
File "C:\ProgramData\anaconda3\envs\omost\lib\site-packages\gradio\utils.py", line 632, in async_iteration
return await iterator.__anext__()
File "C:\ProgramData\anaconda3\envs\omost\lib\site-packages\gradio\utils.py", line 625, in __anext__
return await anyio.to_thread.run_sync(
File "C:\ProgramData\anaconda3\envs\omost\lib\site-packages\anyio\to_thread.py", line 56, in run_sync
return await get_async_backend().run_sync_in_worker_thread(
File "C:\ProgramData\anaconda3\envs\omost\lib\site-packages\anyio\_backends\_asyncio.py", line 2177, in run_sync_in_worker_thread
return await future
File "C:\ProgramData\anaconda3\envs\omost\lib\site-packages\anyio\_backends\_asyncio.py", line 859, in run
result = context.run(func, *args)
File "C:\ProgramData\anaconda3\envs\omost\lib\site-packages\gradio\utils.py", line 608, in run_sync_iterator_async
return next(iterator)
File "C:\ProgramData\anaconda3\envs\omost\lib\site-packages\torch\utils\_contextlib.py", line 35, in generator_context
response = gen.send(None)
File "E:\AI\Omost\Omost\gradio_app.py", line 208, in chat_fn
seed = np.random.randint(0, 2 ** 32 - 1)
File "numpy\random\mtrand.pyx", line 780, in numpy.random.mtrand.RandomState.randint
File "numpy\random\_bounded_integers.pyx", line 2881, in numpy.random._bounded_integers._rand_int32
ValueError: high is out of bounds for int32
Last assistant response is not valid canvas: expected string or bytes-like object`
[Image: QQ screenshot 20240603070353]
[Image: QQ screenshot 20240603070529]

@ginto-sakata

@xhoxye seems like you have a 32-bit version of numpy

@xhoxye

xhoxye commented Jun 3, 2024

The original version works fine @ginto-sakata

@ginto-sakata

ginto-sakata commented Jun 3, 2024

@xhoxye that's because @d8ahazard added random seed generation with an upper limit of 2**32 - 1.
Your error indicates that you might have a 32-bit version of numpy.
Check whether your numpy is 64-bit:

>>> import numpy as np
>>> print(np.int_)
<class 'numpy.int64'>

@xhoxye

xhoxye commented Jun 3, 2024

@ginto-sakata
How do I change it?

@d8ahazard
Author

Try another pull, I think I fixed the precision issue.

@xhoxye

xhoxye commented Jun 3, 2024

I'm still getting the same error here unless I force seed = np.random.randint(0, 2 ** 31 - 1). Reading hf_download when rendering an image also gives an error, and after refreshing I can't load the model I copied to the Omost\models\checkpoints folder.

@ginto-sakata

ginto-sakata commented Jun 3, 2024

@ginto-sakata How do I change it?

Just make sure you have 64-bit python and install numpy again (you might want to do a clean install just in case)

edit: Never mind, the problem is not in your system; this is intended behavior for numpy on Windows:
https://numpy.org/devdocs/reference/random/generated/numpy.random.randint.html

This function defaults to the C-long dtype, which is 32bit on windows and otherwise 64bit on 64bit platforms

I have tried it on my Windows system and yes, it shows <class 'numpy.int32'>
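One way to sidestep the platform-dependent default is to pass an explicit `dtype` to `np.random.randint`, so the bounds check no longer depends on the C long size. The sketch below is an assumption about how the seed line could be written and is not the fix that was actually pushed to the PR; the helper name `random_seed` is hypothetical.

```python
import numpy as np

# Upper bound from the failing line in the traceback (randint excludes it).
SEED_HIGH = 2 ** 32 - 1


def random_seed() -> int:
    # Hypothetical helper. dtype=np.int64 makes the bounds check use 64-bit
    # integers instead of the platform's default C long.
    return int(np.random.randint(0, SEED_HIGH, dtype=np.int64))


seed = random_seed()
print(seed)
```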

@lludlow

lludlow commented Jun 3, 2024

  • Seed randomization via -1

Do we always want a random seed? a random seed button would give you more control

@d8ahazard
Author

  • Seed randomization via -1

Do we always want a random seed? a random seed button would give you more control

Could be a good idea. I was just emulating how AUTO does it in that -1 == "randomized", otherwise, put in whatever you want.
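The "-1 means randomized" convention described here can be sketched in a few lines. The function name `resolve_seed` and the `MAX_SEED` constant are hypothetical, and the standard-library `random` module is used instead of numpy, so this is not the PR's actual code.

```python
import random

# Assumed upper bound, matching the 2**32 - 1 limit mentioned in this thread.
MAX_SEED = 2 ** 32 - 1


def resolve_seed(seed: int) -> int:
    # -1 is treated as "pick a random seed"; any other value is used as given.
    if seed == -1:
        return random.randint(0, MAX_SEED)  # inclusive on both ends
    return seed


print(resolve_seed(1234))  # the user-supplied seed is passed through
print(resolve_seed(-1))    # a freshly drawn seed
```

A separate "random seed" button, as suggested above, could call the same helper with -1 and write the result back into the seed field.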

@xhoxye

xhoxye commented Jun 5, 2024

The .safetensors model should use fp16; otherwise it will not work with 8 GB of VRAM.
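For reference, a single-file SDXL checkpoint can be requested in half precision through diffusers' `from_single_file` by passing `torch_dtype`. This is a minimal sketch assuming the checkpoint is an SDXL model and that `diffusers` and `torch` are installed; the helper name is hypothetical and this is not the loading code used in the PR.

```python
def load_sdxl_fp16(checkpoint_path: str):
    # Hypothetical helper. Imports live inside the function so that defining
    # it does not require torch or diffusers to be importable.
    import torch
    from diffusers import StableDiffusionXLPipeline

    # torch_dtype=torch.float16 asks diffusers to hold the weights in half
    # precision, roughly halving memory use compared with float32.
    return StableDiffusionXLPipeline.from_single_file(
        checkpoint_path,
        torch_dtype=torch.float16,
    )
```

Calling `load_sdxl_fp16(...)` with the path to a local .safetensors file would return a pipeline whose components are stored in fp16.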

@jtabox

jtabox commented Jun 19, 2024

Hi, I've been using this improved branch for a while, thanks for your work. I don't have any bug to report, just a question I can't figure out:
When using local SDXL models, when loading, some of them take forever at the Load to GPU: UNet2DConditionModel stage, while others load quickly, and I can't figure out why.

For example:

Using the default RunDiffusion/Juggernaut-X-v10 model:

Load to GPU: UNet2DConditionModel loads at 1.36 iterations/sec, and takes less than 20 secs.

While using a local juggernautXL_juggernautX.safetensors:

Load to GPU: UNet2DConditionModel loads at 100.39 seconds/iteration and I have to cancel it because it takes more than an hour.

This isn't happening with all my local models, some do it others don't. Is there anything I can do to change this? I'm running on a 3080 with 10 GB.

@Dibucci

Dibucci commented Jun 25, 2024

I'm new to this but I'm loving how this works. It's honestly amazing imo. The only issue I seem to have is that I have no clue how to code, or where/what to look for in the files to even change which model it is getting. I'm currently running this in a Python env, but as someone who doesn't know code beyond how to set up a "simple" env, I'm wondering if there are any QOL improvements that would include a dropdown for model selection from a folder that we can just put compatible models into?

I already use SD with comfyUI and other things, and would love to see how this works on some of my own SDXL merges I have worked on, but I'm at a loss for how to even do it.

@jtabox

jtabox commented Jun 26, 2024

I'm new to this
... snip ...
I'm at a loss for how to even do it.

You're probably not using this specific fork you commented on? This comment chain is for another fork (https://github.com/runnitai/Omost/tree/main), not lllyasviel's main fork. Because this specific fork here already has some QoL improvements implemented, and among them is what you're asking for, i.e. the ability to use a dropdown menu and choose models you've put into a folder.

The easiest way to use this here fork would be to create a new folder and follow the same instructions as you did for the main fork, but change the git clone command to use this other repo.

  1. git clone https://github.com/runnitai/Omost.git
  2. conda create -n omost python=3.10 && conda activate omost && cd Omost
  3. pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121
  4. pip install -r requirements.txt
  5. python gradio_app.py

Inside the Omost folder you should see the subfolders models and then checkpoints. Put any models you'd like to use in the latter. Inside the UI there's a Refresh Models button. Press it and your models should be available from the dropdown menu.
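Conceptually, a refresh button like this only needs to rescan the checkpoints folder and hand the file names to the dropdown. The sketch below shows that scan under the assumption that models are `.safetensors` files; the function name is hypothetical and the fork's real implementation may differ.

```python
from pathlib import Path


def list_checkpoints(checkpoint_dir: str, suffix: str = ".safetensors") -> list[str]:
    # Hypothetical helper: return the sorted file names of all checkpoints
    # found directly inside checkpoint_dir (no recursion into subfolders).
    root = Path(checkpoint_dir)
    if not root.is_dir():
        return []
    return sorted(
        p.name for p in root.iterdir()
        if p.is_file() and p.suffix.lower() == suffix
    )


# Example call; the relative path mirrors the folder layout described above.
print(list_checkpoints("models/checkpoints"))
```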

Edit: as you might have read in my previous comment, some local models seem to take an extremely long time to load, making the process practically impossible. I haven't figured out what causes this yet. I have a 3080 so loading models and generating images isn't an issue generally, so it's probably not VRAM related. Not to mention that I can run those same models normally in A1111 & ComfyUI.

@Dibucci

Dibucci commented Jun 27, 2024

ty Jtabox, I've never been good with git stuff, this helps a lot, I'll try this here shortly

now to see if it works with a symlink so i don't have to physically move stuff

@Dibucci

Dibucci commented Jun 27, 2024

Edit: as you might have read in my previous comment, some local models seem to take an extremely long time to load, making the process practically impossible. I haven't figured out what causes this yet. I have a 3080 so loading models and generating images isn't an issue generally, so it's probably not VRAM related. Not to mention that I can run those same models normally in A1111 & ComfyUI.

I see what you mean. Just tried 3 different models, cuz I'm trying not to use pony (it's good but I hate the "score" system). The one called EchoAlpha didn't load; the new Compassmix, which is an SDXL lightning model, just hung; and then I tried Furry_XL, aka Seart's furry model that was used to make Compassmix, and it worked, but it was moving so slow, at like 90-110 seconds per iteration. However, on SD WebUIs I usually get about 2-4 iterations per second at 1024x1024 res and at least 1.2 if I go as high as 1440x1440, which is rare but sometimes I get lucky at that res with a good image. I'm gonna try it with pony, but I've also noticed that it's got different LLMs to select from as well. So, is the one selected by default when I load up this fork of the program the same one used in the main branch? Only asking cuz with the main branch it generates at normal speed.... but I guess I better try RealVision first cuz I haven't tried it at pure default yet, just went straight to putting in my own model.

I have an RTX 4070TI OC with 12GB of VRAM, the OC is Overclock and it was OC from factory according to the package, and no, i don't know how to OC, or what it is OC to, so whatever it's set to its staying at. I won't touch something I don't understand myself.

EDIT: so, default with RealVisXL V4.0 works fine, about 2 it/s, which is fine and dandy. But so far any other model I try from those mentioned above either hangs or takes forever, and my whole system bogs down. Like my PC was lagging at like 10 fps for everything

@Dibucci

Dibucci commented Jun 27, 2024

so, I followed your instructions, but idk if "I'm" doing something wrong or what at this point...

I sat here for about 15 Minutes and still couldn't get one iteration with BASE PonyDiffusionXL model... the ones that download from the program work fine, so... idk....

when it hangs like that idk what else to do so I just CTRL+C on the command window and it closes the running instance. I'm going to try downloading the models "fresh" from CivitAI and see if maybe it's something to do with the models I have moved from my SD WebUI folder to this program's folder.

if it doesn't work, then maybe I did something wrong or maybe there's some weird incompatibility with my card... idk. And if it does work with the same model "re-downloaded" straight from CivitAI into the checkpoints folder of this program, then I'm just gonna be even more confused and wondering if maybe I should delete and re-download all my models cuz somehow maybe "THEY" are the thing that's broken xD

this is what's going on in the console when It tries to gen the image

```
Load to GPU: LlamaForCausalLM
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
G:\Omost\Omost 2\Omost\lib\site-packages\transformers\models\llama\modeling_llama.py:649: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at ..\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:455.)
  attn_output = torch.nn.functional.scaled_dot_product_attention(
You shouldn't move a model that is dispatched using accelerate hooks.
Unload to CPU: LlamaForCausalLM
Load to GPU: CLIPTextModel
Load to GPU: CLIPTextModel
Unload to CPU: CLIPTextModel
Unload to CPU: CLIPTextModel
Load to GPU: UNet2DConditionModel
  0%|                                                                                                             | 0/25 [00:00<?, ?it/s]
```

@jtabox

jtabox commented Jun 28, 2024

so, I followed your instructions, but idk if "I'm" doing something wrong or what at this point...
... snip ...
Load to GPU: UNet2DConditionModel

Aye, that's the exact point I'm stuck at, regarding some models. Some local models load fine, but others take so much time at this step (Load to GPU: UNet2DConditionModel) that I have to Ctrl+C. For the models that work, this process takes mere seconds, but for the models that don't, it's extremely slow and unusable.

Unfortunately my knowledge regarding coding these things kinda stops here, so I can't even figure out if it's some setting I'm missing or if it's because I lack GPU memory. Though I don't think it's the latter, those same models load fine in Forge, Fooocus & Comfy UI.

But yeah, that was the reason I posted initially in this thread, any suggestions would be very welcome.

@d8ahazard
Author

@lllyasviel - I see you're on a tear doing updates to Forge and Fooocus. Maybe take a look at this PR and consider merging it?

6 participants