LLM Format: GGUF (PrivateGPT uses llama.cpp to run these models, so any LLM you want to try must be in GGUF format.)
Local Vector DB: Chroma
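A quick way to confirm a downloaded model really is GGUF: the file starts with the ASCII magic bytes "GGUF" (the model path below is a placeholder):
head -c 4 /path/to/your-model.gguf
GGUF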
-
Create a New Conda Environment: Create a new Conda environment specifically for this project with Python 3.11, which privateGPT requires.
conda create -n privategpt-py3.11 python=3.11
conda activate privategpt-py3.11
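To confirm the new environment is active and picked up the right interpreter:
python --version
Python 3.11.x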
-
Clone the Repository: Clone the privateGPT repository as in the original instructions.
git clone https://github.com/imartinez/privateGPT
cd privateGPT
-
Install Poetry:
curl -sSL https://install.python-poetry.org | python3 -
-
Update PATH to the poetry executable (make sure to activate privategpt-py3.11 again):
cd
vi ~/.bashrc   # add: export PATH="$HOME/.local/bin:$PATH"
. ~/.bashrc
conda activate privategpt-py3.11
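Verify Poetry is now resolvable on the updated PATH:
which poetry
poetry --version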
-
Install Dependencies with Poetry: Use Poetry to install the project dependencies.
poetry install --with ui,local
-
Run Setup Scripts: Continue with the setup scripts as in the original command.
poetry run python scripts/setup
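This script downloads the default embedding and GGUF LLM files. As a sanity check, they should appear under the repository's models/ directory (the exact path is an assumption based on this privateGPT version's default settings; adjust for your checkout):
ls -lh models/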
-
Add GCC Environment Requirements
sudo apt update && sudo apt install -y build-essential cmake pkg-config
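Confirm the compiler toolchain is available before building:
gcc --version
cmake --version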
-
Install llama.cpp and llama-cpp-python
Ref: https://llama-cpp-python.readthedocs.io/en/latest/
This will build llama.cpp from source using CMake and your system's C compiler (required) and install the library alongside the Python package.
conda activate privategpt-py3.11
python -m pip install llama-cpp-python
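A quick import check that the bindings installed cleanly (recent llama-cpp-python releases expose __version__; treat that attribute as an assumption for older versions):
python -c "import llama_cpp; print(llama_cpp.__version__)"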
-
Validate cuBLAS Is Installed and Working
ls /usr/local/cuda/lib64/libcublas.so*
/usr/local/cuda/lib64/libcublas.so
/usr/local/cuda/lib64/libcublas.so.12
/usr/local/cuda/lib64/libcublas.so.12.3.4.1
echo $LD_LIBRARY_PATH
/usr/local/cuda-12.3/lib64:/usr/local/cuda-12.3/lib64
git clone https://github.com/NVIDIA/CUDALibrarySamples
cd CUDALibrarySamples/cuBLAS/Level-1/amax
cmake .
make
./cublas_amax_example
A
1.00 2.00 3.00 4.00
=====
result
4
=====
-
If cuBLAS Is Not Installed - cuBLAS Installation for CUDA GPU Support (Not Tested)
Ref: https://github.com/ggerganov/llama.cpp#build
This provides BLAS acceleration using the CUDA cores of your Nvidia GPU. Make sure to have the CUDA toolkit installed. You can download it from your Linux distro's package manager (e.g. apt install nvidia-cuda-toolkit) or from here: CUDA Toolkit (https://developer.nvidia.com/cuda-downloads).
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
mkdir build
cd build
cmake .. -DLLAMA_CUBLAS=ON
cmake --build . --config Release
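If the build succeeds, the binaries land under build/bin. A minimal smoke test, assuming the main binary name llama.cpp used at the time of writing (newer trees rename it to llama-cli):
./bin/main --help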
-
Install llama-cpp-python with CUDA GPU Support
CMAKE_ARGS='-DLLAMA_CUBLAS=on' poetry run pip install --force-reinstall --no-cache-dir llama-cpp-python
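To check that the reinstalled wheel was actually compiled with CUDA support, one option is the llama_supports_gpu_offload helper (available in recent llama-cpp-python releases; treat its availability as an assumption for your version):
poetry run python -c "from llama_cpp import llama_supports_gpu_offload; print(llama_supports_gpu_offload())"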
-
Launch PrivateGPT (the local profile configures it to use the local LLM)
cd privateGPT
CUDA_VISIBLE_DEVICES=0 PGPT_PROFILES=local make run
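Once it is running, the web UI is served at http://localhost:8001 by default, and running nvidia-smi in a second terminal should show the private-gpt process holding GPU memory if layers were offloaded:
nvidia-smi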
-
Without Local LLM - Launch PrivateGPT: Since you're in the Conda environment, you can use python directly instead of specifying python3.11.
poetry run python -m private_gpt