[LLM] update llm server dockerfiles #9771

Status: Open. Wants to merge 2 commits into base branch develop.
33 changes: 15 additions & 18 deletions llm/server/dockerfiles/Dockerfile_serving_cuda118_cudnn8
@@ -1,31 +1,28 @@
 FROM registry.baidubce.com/paddlepaddle/fastdeploy:llm-base-gcc12.3-cuda11.8-cudnn8-nccl2.15.5

 WORKDIR /opt/output/
 COPY ./server/ /opt/output/Serving/

 ENV LD_LIBRARY_PATH="/usr/local/cuda-11.8/compat/:$LD_LIBRARY_PATH"

 RUN pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple
-RUN python3 -m pip install --pre paddlepaddle-gpu -i https://www.paddlepaddle.org.cn/packages/nightly/cu118/ \
+RUN python3 -m pip install --pre paddlepaddle-gpu -i https://www.paddlepaddle.org.cn/packages/nightly/cu123/ \
 && python3 -m pip install paddlenlp==3.0.0b0 \
-&& python3 -m pip install --no-cache-dir sentencepiece pycryptodome tritonclient[all]==2.41.1
+&& python3 -m pip install --no-cache-dir sentencepiece pycryptodome tritonclient[all]==2.41.1 \
+&& python3 -m pip install --no-cache-dir --force-reinstall https://paddlepaddle-inference-banchmark.bj.bcebos.com/paddlenlp_ops-0.0.0-py3-none-any.whl \
+&& apt-get clean && rm -rf /var/lib/apt/lists/*

-RUN git clone https://gitee.com/paddlepaddle/PaddleNLP.git && cd PaddleNLP/csrc \
-&& python3 setup_cuda.py build && python3 setup_cuda.py install --user \
-&& cp -r /opt/output/PaddleNLP/paddlenlp /usr/local/lib/python3.10/dist-packages/ \
-&& cp -r /root/.local/lib/python3.10/site-packages/* /usr/local/lib/python3.10/dist-packages/ \
-&& rm -rf /opt/output/PaddleNLP
+RUN mkdir -p /opt/source/ && cd /opt/source/ \
+&& git clone https://github.com/PaddlePaddle/Paddle.git \
+&& git clone https://github.com/PaddlePaddle/PaddleNLP.git \
+&& cp -r /opt/source/PaddleNLP/paddlenlp /usr/local/lib/python3.10/dist-packages/ \
+&& python3 -m pip install --no-cache-dir -r PaddleNLP/requirements.txt \
+&& python3 -m pip install --no-cache-dir -r PaddleNLP/llm/server/server/requirements.txt

-RUN python3 -m pip install -r /opt/output/Serving/requirements.txt && rm /opt/output/Serving/requirements.txt
-RUN mv Serving/server /usr/local/lib/python3.10/dist-packages/
 RUN mkdir -p /opt/output/Serving/llm_model/model/1 \
-&& mv /opt/output/Serving/config/config.pbtxt /opt/output/Serving/llm_model/model/ \
-&& rm -rf /opt/output/Serving/config/
-RUN echo "from server.triton_server import TritonPythonModel" >>/opt/output/Serving/llm_model/model/1/model.py
+&& cp /opt/source/PaddleNLP/llm/server/server/config/config.pbtxt /opt/output/Serving/llm_model/model/ \
+&& cp /opt/source/PaddleNLP/llm/server/server/scripts/start_server.sh /opt/output/Serving/ \
+&& cp /opt/source/PaddleNLP/llm/server/server/scripts/stop_server.sh /opt/output/Serving/

-RUN cd /opt/output/Serving/ \
-&& cp scripts/start_server.sh . && cp scripts/stop_server.sh . \
-&& rm -rf scripts
+ENV PYTHONPATH="/opt/source/PaddleNLP/llm/server/server"
+RUN echo "from server.triton_server import TritonPythonModel" >>/opt/output/Serving/llm_model/model/1/model.py

 ENV http_proxy=""
 ENV https_proxy=""
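The mkdir/cp/echo steps in this Dockerfile assemble a minimal Triton Python-backend model repository (config.pbtxt beside a version directory 1/ containing model.py), and the generated one-line model.py only resolves because ENV PYTHONPATH puts the server package from the PaddleNLP checkout on the import path. A stand-alone sketch of that mechanism, using temporary directories in place of the real /opt paths and a stub triton_server module (the real TritonPythonModel lives in PaddleNLP):

```shell
set -e
ROOT=$(mktemp -d)

# Stub of the `server` package that the image gets from the PaddleNLP
# checkout; TritonPythonModel here is a placeholder class.
mkdir -p "$ROOT/src/server"
touch "$ROOT/src/server/__init__.py"
printf 'class TritonPythonModel:\n    pass\n' > "$ROOT/src/server/triton_server.py"

# Same layout the Dockerfile builds under /opt/output/Serving/llm_model/:
# model/config.pbtxt plus a version dir 1/ holding the generated model.py.
mkdir -p "$ROOT/llm_model/model/1"
printf 'backend: "python"\n' > "$ROOT/llm_model/model/config.pbtxt"
echo "from server.triton_server import TritonPythonModel" > "$ROOT/llm_model/model/1/model.py"

# With the package dir on PYTHONPATH (what ENV PYTHONPATH=... does in the
# image), the one-line model.py imports cleanly.
PYTHONPATH="$ROOT/src" python3 "$ROOT/llm_model/model/1/model.py" && echo "import ok"
```

This is why the PR can drop the old `mv Serving/server /usr/local/lib/python3.10/dist-packages/` step: the package no longer needs to live in dist-packages, it just needs to be reachable via PYTHONPATH.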
31 changes: 14 additions & 17 deletions llm/server/dockerfiles/Dockerfile_serving_cuda123_cudnn9
@@ -1,31 +1,28 @@
 FROM registry.baidubce.com/paddlepaddle/fastdeploy:llm-base-gcc12.3-cuda12.3-cudnn9-nccl2.15.5

 WORKDIR /opt/output/
 COPY ./server/ /opt/output/Serving/

 ENV LD_LIBRARY_PATH="/usr/local/cuda-12.3/compat/:$LD_LIBRARY_PATH"

 RUN pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple
 RUN python3 -m pip install --pre paddlepaddle-gpu -i https://www.paddlepaddle.org.cn/packages/nightly/cu123/ \
 && python3 -m pip install paddlenlp==3.0.0b0 \
-&& python3 -m pip install --no-cache-dir sentencepiece pycryptodome tritonclient[all]==2.41.1
+&& python3 -m pip install --no-cache-dir sentencepiece pycryptodome tritonclient[all]==2.41.1 \
+&& python3 -m pip install --no-cache-dir --force-reinstall https://paddlepaddle-inference-banchmark.bj.bcebos.com/paddlenlp_ops-0.0.0-py3-none-any.whl \
+&& apt-get clean && rm -rf /var/lib/apt/lists/*

-RUN git clone https://gitee.com/paddlepaddle/PaddleNLP.git && cd PaddleNLP/csrc \
-&& python3 setup_cuda.py build && python3 setup_cuda.py install --user \
-&& cp -r /opt/output/PaddleNLP/paddlenlp /usr/local/lib/python3.10/dist-packages/ \
-&& cp -r /root/.local/lib/python3.10/site-packages/* /usr/local/lib/python3.10/dist-packages/ \
-&& rm -rf /opt/output/PaddleNLP
+RUN mkdir -p /opt/source/ && cd /opt/source/ \
+&& git clone https://github.com/PaddlePaddle/Paddle.git \
+&& git clone https://github.com/PaddlePaddle/PaddleNLP.git \
+&& cp -r /opt/source/PaddleNLP/paddlenlp /usr/local/lib/python3.10/dist-packages/ \
+&& python3 -m pip install --no-cache-dir -r PaddleNLP/requirements.txt \
+&& python3 -m pip install --no-cache-dir -r PaddleNLP/llm/server/server/requirements.txt

-RUN python3 -m pip install -r /opt/output/Serving/requirements.txt && rm /opt/output/Serving/requirements.txt
-RUN mv Serving/server /usr/local/lib/python3.10/dist-packages/
 RUN mkdir -p /opt/output/Serving/llm_model/model/1 \
-&& mv /opt/output/Serving/config/config.pbtxt /opt/output/Serving/llm_model/model/ \
-&& rm -rf /opt/output/Serving/config/
-RUN echo "from server.triton_server import TritonPythonModel" >>/opt/output/Serving/llm_model/model/1/model.py
+&& cp /opt/source/PaddleNLP/llm/server/server/config/config.pbtxt /opt/output/Serving/llm_model/model/ \
+&& cp /opt/source/PaddleNLP/llm/server/server/scripts/start_server.sh /opt/output/Serving/ \
+&& cp /opt/source/PaddleNLP/llm/server/server/scripts/stop_server.sh /opt/output/Serving/

-RUN cd /opt/output/Serving/ \
-&& cp scripts/start_server.sh . && cp scripts/stop_server.sh . \
-&& rm -rf scripts
+ENV PYTHONPATH="/opt/source/PaddleNLP/llm/server/server"
+RUN echo "from server.triton_server import TritonPythonModel" >>/opt/output/Serving/llm_model/model/1/model.py

 ENV http_proxy=""
 ENV https_proxy=""
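Both Dockerfiles prepend the CUDA compat directory to LD_LIBRARY_PATH rather than overwriting it. The dynamic loader searches colon-separated entries left to right, so the compat libraries are found first while everything already on the path stays reachable. A minimal sketch of the prepend semantics (the compat path is the one from this file; nothing here requires CUDA to be installed):

```shell
# Same prepend pattern as the ENV line in the Dockerfile.
LD_LIBRARY_PATH="/usr/local/cuda-12.3/compat/:$LD_LIBRARY_PATH"

# The first colon-separated entry is searched first by the loader.
first_entry=$(printf '%s' "$LD_LIBRARY_PATH" | cut -d: -f1)
echo "$first_entry"   # /usr/local/cuda-12.3/compat/
```

Overwriting instead of prepending would hide any host-injected library paths, which is why the `:$LD_LIBRARY_PATH` suffix matters.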