FindAImage

Using AI image descriptions to organize the meme portfolio.

  • Simple utility. Search photos in browser.
  • Offline and private. Optionally use OpenAI or Google.
  • Internet-ready. Can publish album as a website.
  • Lightweight. Does not require torch. 4 GiB VRAM.
  • Free and open-source. NO WARRANTIES. See LICENSE.

preview

Who doesn't have a folder of their favorite memes? But it becomes tedious scrolling through pages and pages of memes and photos to find the right one for every occasion.

Enjoying it so far, or want more features? Support development. (PayPal donation link).

Get updates from GitHub

git clone https://github.com/themanyone/FindAImage.git
cd FindAImage

Or, if it's already downloaded, git pull.

Python Dependencies

pip install -r requirements.txt

Optional ChatGPT from OpenAI

Export OPENAI_API_KEY to enable ChatGPT. Edit .bashrc, or another startup file:

export OPENAI_API_KEY=<my API key>

Optional Google Gemini

  • Sign up for a GOOGLE_API_KEY
  • pip install -q -U google-generativeai
  • export GENAI_KEY=<YOUR API_KEY>
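To see which optional services the exported keys would enable, a small check like the one below can help. This is only a sketch: the variable names are the ones given in this README, while the backend labels and the `available_backends` helper are hypothetical and not part of FindAImage.

```python
import os

# Environment variable names as given in this README; the backend labels
# ("openai", "gemini", "local") are hypothetical, for illustration only.
KEY_VARS = {"openai": "OPENAI_API_KEY", "gemini": "GENAI_KEY"}

def available_backends(env=None):
    """Return backends usable with the keys found in env (default: os.environ)."""
    env = os.environ if env is None else env
    found = ["local"]  # the local LLAVA server needs no API key
    for name, var in KEY_VARS.items():
        if env.get(var):  # skips both unset and empty variables
            found.append(name)
    return found

print(available_backends())
```

Run it in the same shell where the keys were exported. A fresh terminal may be needed after editing .bashrc.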

Local LLAVA server

A local server is a good way to generate captions, avoid censorship, and keep everything private. Install llama-cpp-python. If you already cloned llama.cpp, you can link it under llama-cpp-python/vendor to avoid downloading it twice. Build the OpenAI-compatible web server with acceleration such as CUDA or Vulkan, if possible. See the tutorial below for additional instructions on finding and downloading a LLAVA model for it.

We set up a llama.cfg that includes a link to our model. If you add other models, just make sure the model_alias contains 'vision' or 'llava' so we can identify it as a vision model. Increase n_gpu_layers if there is enough VRAM. Get models from here.

{
    "host": "0.0.0.0",
    "port": 8087,
    "root_path": "/completion",
    "models": [
        {
            "model": "/home/k/.local/share/models/llava-phi-3-mini-int4.gguf",
            "model_alias": "llava-phi-3",
            "chat_format": "llava-1-5",
            "clip_model_path":
"/home/k/.local/share/models/llava-phi-3-mini-mmproj-f16.gguf",
            "n_gpu_layers": 7,
            "offload_kqv": true,
            "n_threads": 7,
            "n_batch": 512,
            "n_ctx": 1024
        },
        
...
        
    ]
}
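The alias rule above can be checked before starting the server. The sketch below loads a config of the same shape and lists the aliases that would count as vision models. The sample entries, the `vision_aliases` helper, and the case-insensitive match are assumptions for illustration. The project's own check may differ.

```python
import json

# Sample config in the shape shown above; the second model entry is made up.
SAMPLE_CFG = """
{
    "host": "0.0.0.0",
    "port": 8087,
    "models": [
        {"model": "llava-phi-3-mini-int4.gguf", "model_alias": "llava-phi-3"},
        {"model": "some-text-model.gguf", "model_alias": "text-only"}
    ]
}
"""

def vision_aliases(cfg):
    """Return aliases containing 'vision' or 'llava' (case-insensitive, an assumption)."""
    aliases = (m.get("model_alias", "") for m in cfg.get("models", []))
    return [a for a in aliases if "vision" in a.lower() or "llava" in a.lower()]

cfg = json.loads(SAMPLE_CFG)
print(vision_aliases(cfg))
```

To check a real file, replace `json.loads(SAMPLE_CFG)` with `json.load(open("llama.cfg"))`. An empty result means no model would be offered for captioning.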

Then we make sure AImages.py matches the configuration we set up. If you change the above port, also change LLAVA_ENDPOINT in album_create.py. We're using port 8087 for these examples for no particular reason.

...
    elif ai_model == 'local':
        lclient = OpenAI(base_url=LLAVA_ENDPOINT, api_key="sk-xxx")
        # This uses the self-hosted path, which should be okay if the server
        # is on the same machine or network.
        url = f"{host}:{port}/images/{filename}"
        print(url)
        response = lclient.chat.completions.create(
            model = "llava-phi-3",
            messages=[
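A mismatched port is an easy mistake to make here. The sketch below compares the port in an endpoint URL with the port in the config. The endpoint value is hypothetical, so check album_create.py for the real LLAVA_ENDPOINT. `ports_match` is an illustrative helper, not project code.

```python
from urllib.parse import urlsplit

def ports_match(endpoint, cfg):
    """True if the port in the endpoint URL equals the 'port' in the config dict."""
    return urlsplit(endpoint).port == cfg.get("port")

# Hypothetical endpoint value; check album_create.py for the real LLAVA_ENDPOINT.
LLAVA_ENDPOINT = "http://localhost:8087/v1"
cfg = {"host": "0.0.0.0", "port": 8087}
print(ports_match(LLAVA_ENDPOINT, cfg))
```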

For best results,

  • start server with --config_file=llama.cfg,
  • download several .gguf models from here,
  • populate llama.cfg as in docs,
  • and have at least one llava model for images.

Photo Album Builder

Once llama-cpp-python is set up and running, and configured with some models, you can test captioning photos in the memes directory. This will create a server to host the photo album builder. The builder then creates a web page that will be the photo album.

./album_create.py memes

You should see a URL for the photo album builder. Ctrl+click it to open it. Or type it into your browser. Monitor memory usage with nvtop.

The link might look something like this. http://localhost:9165

If there is an existing index.html in the image folder, it will import captions from there. If not, it will scan the image metadata for keywords. If the photos were already tagged with keywords using a tool like LLavaImageTagger it will use those.

Supervise children. Be aware that these models are under active development. Their output, though usually fine, may not always be safe for all ages.

Once you open the web page, you can

  • select a model from the drop-down menu in the upper-left,
  • click buttons to generate captions,
  • click inside text boxes to manually edit captions,
  • and save the annotated photo album.

Copy the saved index.html back to the directory where the images are. Launch it with a browser (or double click it in your file manager) any time you want to search images.

Now try making portfolios out of other image folders.

./album_create.py ~/Pictures/2024

Bonus Chat

Test your llama-cpp-python configuration with aichat.py. It starts a chat server so anyone on your wifi can select and chat with the local LLMs you downloaded, upload or capture pictures from a webcam (for models that support them), read and translate text in images, or ask questions about them.

Canvas mode. You can edit questions, code, and responses right in the interface by clicking twice on the text. A button will appear to submit a new query with your edits, comments, or annotations.

chat

Linux Tutorial

This section is no longer required, but it is still recommended. Learn to use local AI from the command line on Linux. From there we can automate caption generation of entire directories and subdirectories. The command line is where we get ideas to make this stuff.

Install at least tidy. For documentation, consider also installing pinfo.

Fedora, CentOS. dnf install tidy pinfo

Ubuntu, Debian. apt install tidy pinfo

Arch. pacman -S tidy pinfo

Install llama.cpp

For this section, we are using our own unofficial fork of llama.cpp. We have submitted our changes via pull request. If accepted, maybe the official version will become usable.

git clone https://github.com/themanyone/llama.cpp.git
cd llama.cpp
git checkout hk # switch to --template branch.

Build according to the project's instructions.

Install by copying executables to somewhere in $PATH, such as ~/.local/bin/.

cp llama-* ~/.local/bin/

Link the models directory.

cd /path/to/llama.cpp # the directory you cloned above
ln -s "$(pwd)/models" ~/.local/share/models

Obtain a llava model and matching mmproj file from huggingface in gguf format.

wget -c -P ~/.local/share/models https://huggingface.co/xtuner/llava-phi-3-mini-gguf/resolve/main/llava-phi-3-mini-int4.gguf
wget -c -P ~/.local/share/models https://huggingface.co/xtuner/llava-phi-3-mini-gguf/resolve/main/llava-phi-3-mini-mmproj-f16.gguf

Build scripts

Create a script to launch the llava model and mmproj file together with your favorite options. Running this command will create (or overwrite) ~/.local/bin/llava_phi3.sh. Edit the paths first if your models are stored somewhere else. The -ngl 16 option loads some of the model's layers into VRAM. Increase it to -ngl 33 if you have plenty of VRAM.

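cat << 'EOF' > ~/.local/bin/llava_phi3.sh
#!/bin/bash
# Quoting 'EOF' keeps "$@" and the backslashes literal in the script.
llama-llava-cli -ngl 16 \
-m ~/.local/share/models/llava-phi-3-mini-int4.gguf \
--mmproj  ~/.local/share/models/llava-phi-3-mini-mmproj-f16.gguf \
-c 4096 "$@"
EOF
chmod +x ~/.local/bin/llava_phi3.sh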

Now we are ready to test the model on some images. First, try a single image. Change directory to where images reside.

cd ~/Pictures
llava_phi3.sh --image file.jpg

Gather image data

If that works, we can create a database in the form of a web page of the whole directory. While technically not a database, it allows visual and text searching, plus copy and paste access to photographs.

shopt -s nullglob
printf -- "--image %q " *.png *.webm *.jpg *.jpeg|xargs llava_phi3.sh -p "Write a quick, 10-50 word caption for this image. Just one caption. Minimum 10 words." --template '<figure><img src="[image]" alt="[[image]]"><figcaption>[description]</figcaption></figure>' -c 4096 --log-disable | tee data

The -- tells printf that no options follow, so the format string, which itself begins with --image, is not mistaken for one. The %q outputs file names with spaces and special characters properly escaped. We could have used find for this. The nullglob option to shopt prevents errors when no images match one of the patterns. Without it, Bash passes the unmatched glob pattern itself as if it were an image, so we turn that behavior off.

You can even recurse into sub-directories with printf if you enable globstar with shopt -s globstar and use patterns like **/*.jpg. For more information (FYI): pinfo bash --node "The Shopt Builtin".
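The same argument list can be built in Python, where quoting is not a concern because each file name stays a separate list item. This is a sketch, not project code. `image_args` is a hypothetical helper. Unlike the Bash globs, it matches extensions case-insensitively and also picks up hidden files.

```python
from pathlib import Path

SUFFIXES = {".png", ".webm", ".jpg", ".jpeg"}  # same patterns as the globs above

def image_args(directory, recursive=False):
    """Return ['--image', path, '--image', path, ...] for images in directory."""
    root = Path(directory)
    # rglob("*") descends into sub-directories, much like globstar's **.
    paths = root.rglob("*") if recursive else root.glob("*")
    args = []
    for p in sorted(paths):
        if p.is_file() and p.suffix.lower() in SUFFIXES:
            args += ["--image", str(p)]
    return args  # an empty list when nothing matches, like nullglob

print(image_args("."))
```

The resulting list could be appended to a command passed to `subprocess.run`, which hands each name to the program unchanged.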

Update from .csv

Someday it might be necessary to update the captions, working with a subset of images in a comma-separated, quoted .csv file. This is made possible by reading the data into an array. FYI: pinfo bash --node "Arrays"

IFS="," read -r -a a < files.csv

Or, if you have xsel installed, you can work with .csv data copied to the clipboard with CTRL-C.

IFS="," read -r -a a <<< "$(xsel -b)"

Setting IFS (the internal field separator) to a comma tells Bash to split the input on commas. FYI: pinfo bash --node "Word Splitting"

echo "${a[@]}"|xargs printf -- "--image %q " | xargs llava_phi3.sh -p "Write a quick, 10-50 word caption for this image. Just one caption. Minimum 10 words." --template '<figure><img src="[image]" alt="[[image]]"><figcaption>[description]</figcaption></figure>' -c 4096 --log-disable | tee data

The echo "${a[@]}" prints the array elements, which still carry their quotes from the .csv data. FYI: pinfo bash --node "Quoting". xargs honors those quotes, so we use it repeatedly to keep file names with spaces intact as they pass through the pipeline.
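For comparison, Python's csv module handles quoted, comma-separated names directly, including names that contain commas themselves. A minimal sketch follows. `names_from_csv` and the sample line are made up for illustration.

```python
import csv
import io

def names_from_csv(text):
    """Return every non-empty field from comma-separated, optionally quoted text."""
    names = []
    # skipinitialspace tolerates a space after each comma.
    for row in csv.reader(io.StringIO(text), skipinitialspace=True):
        names.extend(field for field in row if field)
    return names

sample = '"test pattern.png","trading patterns.png",dad.jpg\n'
print(names_from_csv(sample))
```

Unlike the single read call above, this reads every line of the input, not just the first.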

Analyze image data

Photos are processed one by one, formatting the output according to the template we provided. We now have a data file that looks like this.

<figure><img src="test pattern.png" alt="test pattern.png"><figcaption> The colorful television screen displays the image of a fish tank with blue, red, yellow, green, and blue elements.

</figcaption></figure><figure><img src="trading patterns.png" alt="trading patterns.png"><figcaption> A computer monitor displaying a variety of graphs and diagrams.

</figcaption></figure><figure><img src="Youtube-button.png" alt="Youtube-button.png"><figcaption> The YouTube logo is red and white.

</figcaption></figure><figure><img src="20230218_215924.jpg" alt="20230218_215924.jpg"><figcaption> A small digital scale shows the number 378.

</figcaption></figure><figure><img src="dad.jpg" alt="dad.jpg"><figcaption> A person plays the grand piano in an exhibition hall.

</figcaption></figure><figure><img src="ferry.jpg" alt="ferry.jpg"><figcaption> A boat is docked at a port near a forest.

</figcaption></figure><figure><img src="github_error.jpg" alt="github_error.jpg"><figcaption> The image shows a screenshot of a screenshot of a screenshot of a screenshot of a screenshot of a screenshot of a screenshot of a screenshot of a screenshot</figcaption></figure>
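Because the data file is regular figure/img/figcaption markup, it is easy to read back into a program, for example to spot captions that need fixing like the last one above. The sketch below uses only the standard library. `CaptionParser` and `read_captions` are hypothetical names, not part of FindAImage.

```python
from html.parser import HTMLParser

class CaptionParser(HTMLParser):
    """Collect {image src: caption} from <figure><img><figcaption> markup."""

    def __init__(self):
        super().__init__()
        self.captions = {}
        self._src = None          # src of the most recent <img>
        self._in_caption = False
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            self._src = dict(attrs).get("src")
        elif tag == "figcaption":
            self._in_caption = True
            self._text = []

    def handle_data(self, data):
        if self._in_caption:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "figcaption" and self._in_caption:
            self._in_caption = False
            if self._src is not None:
                # Collapse the stray newlines and spaces around each caption.
                self.captions[self._src] = " ".join("".join(self._text).split())
            self._src = None

def read_captions(html):
    """Parse markup like the data file and return a dict of captions."""
    parser = CaptionParser()
    parser.feed(html)
    parser.close()
    return parser.captions

data = ('<figure><img src="dad.jpg" alt="dad.jpg"><figcaption> A person plays '
        'the grand piano in an exhibition hall.\n\n</figcaption></figure>'
        '<figure><img src="ferry.jpg" alt="ferry.jpg"><figcaption> A boat is '
        'docked at a port near a forest.\n\n</figcaption></figure>')
print(read_captions(data))
```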

Build a web page automatically

We could manually clean this data up to make a proper HTML page. But tools like HTML tidy already exist for that. This command builds album.html from data.

tidy -i -o album.html data

<!DOCTYPE html>
<html>
<head>
  <meta name="generator" content=
  "HTML Tidy for HTML5 for Linux version 5.8.0">
  <title></title>
</head>
<body>
  <figure>
    <img src="test%20pattern.png" alt="test pattern.png">
    <figcaption>
      The colorful television screen displays the image of a fish
      tank with blue, red, yellow, green, and blue elements.
    </figcaption>
  </figure>
  <figure>
    <img src="trading%20patterns.png" alt="trading patterns.png">
    <figcaption>
      A computer monitor displaying a variety of graphs and
      diagrams.
    </figcaption>
  </figure>
...
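Notice that tidy wrote the space in each src attribute as %20 while leaving the alt text alone. That is ordinary URL percent-encoding. The short sketch below reproduces it with Python's standard library, in case a script needs to match file names on disk against the src values.

```python
from urllib.parse import quote, unquote

name = "test pattern.png"
encoded = quote(name)        # percent-encode characters that are unsafe in URLs
print(encoded)
print(unquote(encoded))      # decode back to the file name on disk
```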

We now have a nice web page of photos with AI-generated captions. Feel free to make corrections. Let's add some CSS to make the photo album look better. Insert <link rel="stylesheet" href="album.css"> somewhere between the <head> and </head> tags.

<!DOCTYPE html>
<html>
<head> 
  <meta name="generator" content=
  "HTML Tidy for HTML5 for Linux version 5.8.0">
  <title></title>
  <link rel="stylesheet" href="album.css">
</head>
...
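If you rebuild albums often, the stylesheet link can be inserted by a script instead of by hand. This is a minimal sketch using plain string replacement. `add_stylesheet` is a hypothetical helper, and it assumes the page contains a literal </head> tag, as tidy's output does.

```python
def add_stylesheet(html, href="album.css"):
    """Insert a stylesheet <link> just before </head>, unless already present."""
    link = f'<link rel="stylesheet" href="{href}">'
    if link in html or "</head>" not in html:
        return html  # nothing to do, or no head section to put it in
    return html.replace("</head>", f"  {link}\n</head>", 1)

page = "<html>\n<head>\n  <title></title>\n</head>\n<body></body>\n</html>\n"
print(add_stylesheet(page))
```

Running it twice is harmless, since the link is only added when it is missing.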

And create a basic CSS file.

cat << EOF > album.css
  img{
    height:300px;
  }
  figure{
    width:350px;
    display:inline-block;
    white-space: nowrap;
  }
  figcaption {
    position: absolute;
    width: inherit;
    overflow: hidden;
    text-overflow: ellipsis;
    background: #181818;
  }
  figure:hover{
    white-space: normal;
  }
  body{
    background-color: #181818;
    color: #e0e0e0;
  }
EOF

Launch the album

Rename the album to something creative. Launch the album in the default web browser.

xdg-open album.html

Create a link to your album on the desktop. While viewing the album, simply drag the link in the address bar to the desktop. Or create a symlink from the command line.

ln -s "$(pwd)/album.html" ~/Desktop/
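A symlink with a relative target is resolved relative to the directory the link lives in, so a link on the desktop should point at the album's absolute path. The sketch below shows the same step in Python. `link_on_desktop` is a hypothetical helper, and it replaces only an existing symlink, never a regular file.

```python
from pathlib import Path

def link_on_desktop(album, desktop):
    """Create desktop/<album name> as a symlink to the album's absolute path."""
    target = Path(album).resolve()      # absolute, so the link works from anywhere
    link = Path(desktop) / target.name
    if link.is_symlink():
        link.unlink()                   # replace a stale link from an earlier run
    link.symlink_to(target)
    return link

# Example (not run here): link_on_desktop("album.html", Path.home() / "Desktop")
```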

Search for text captions in the browser by pressing CTRL+F. It will scroll to the image in question. Right-click on your mug shots to copy them, paste them to social media, etc. You could also publish the album on a web server, GitHub Pages, or Google Drive. Or use good old-fashioned lftp to upload it to your server box.

Advanced search

There is some JavaScript to alternately show and hide groups of images based on what you type into a search bar. For an example of this, look in the memes directory. You may use this so long as it doesn't become a hidden part of a commercial product.
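The filtering idea itself is simple. The sketch below shows it in Python rather than JavaScript, and it is not the project's code: `matches` and `search` are hypothetical helpers that keep an image only when every typed word appears in its caption.

```python
def matches(caption, query):
    """True if every word in query appears in caption, ignoring case."""
    text = caption.lower()
    return all(word in text for word in query.lower().split())

def search(captions, query):
    """Return the image names whose caption matches the query."""
    return [name for name, cap in captions.items() if matches(cap, query)]

captions = {
    "ferry.jpg": "A boat is docked at a port near a forest.",
    "dad.jpg": "A person plays the grand piano in an exhibition hall.",
}
print(search(captions, "boat forest"))
```

An empty query matches everything, which corresponds to showing the whole album before anything is typed.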

Closing thoughts

Well, that's it. We built llama.cpp, downloaded a llava model, made some scripts, and built a photo album. We made a searchable web page, with AI-generated image captions. And we created a shortcut on the Desktop. What else could we be doing with the help of AI?

Discuss

- GitHub https://github.com/themanyone
- YouTube https://www.youtube.com/themanyone
- Mastodon https://mastodon.social/@themanyone
- LinkedIn https://www.linkedin.com/in/henry-kroll-iii-93860426/
- TheNerdShow.com http://thenerdshow.com/

Copyright (C) 2024 Henry Kroll III, www.thenerdshow.com. See LICENSE for details.