quickstart neuralchat notebook for workshops (#1269)

* hackathon content to introduce neuralchat Signed-off-by: kta-intel <[email protected]>
intel · Feb 8, 2024 · 033b7d5 · 033b7d5
1 parent f5a7435
commit 033b7d5
Show file tree

Hide file tree

Showing 2 changed files with 411 additions and 0 deletions.
diff --git a/...nsion_for_transformers/neural_chat/docs/notebooks/workshop/01_quickstart_neuralchat.ipynb b/...nsion_for_transformers/neural_chat/docs/notebooks/workshop/01_quickstart_neuralchat.ipynb
@@ -0,0 +1,382 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "5c6ddfc4-ecd6-4f0a-9e32-5ff4ab014db7",
+   "metadata": {},
+   "source": [
+    "# QuickStart: Intel® Extension For Transformers*: NeuralChat on 4th Generation Intel® Xeon® Scalable Processors"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "0a628ae6-ddd2-41d3-acf1-855b6f31aef5",
+   "metadata": {},
+   "source": [
+    "## Prepare Environment"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "7ae386d8-6138-478d-ae4d-8572a561967c",
+   "metadata": {},
+   "source": [
+    "Follow the README to install the necessary requirements to run this tutorial. In summary, you will need to install the following:\n",
+    "-  Intel(R) Extension for Transformers* from source (to get latest updates)\n",
+    "-  NeuralChat requirements\n",
+    "-  Retrieval Plugin Requirements\n",
+    "-  Audio Plugin (TTS and ASR) Requirements\n",
+    "  "
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "5733fa69-5e68-453c-8f6f-401ee096f4a7",
+   "metadata": {},
+   "source": [
+    "Check hardware"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "20d39d47-087f-4e2a-b766-b3707ee342e9",
+   "metadata": {
+    "scrolled": true
+   },
+   "outputs": [],
+   "source": [
+    "!lscpu"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "c3c6d1e2-61f1-4ee4-98c7-4f6202e7f2ea",
+   "metadata": {},
+   "source": [
+    "## Building a simple chatbot\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "a5555f5b-84d1-4154-9388-8aa17041fec0",
+   "metadata": {
+    "scrolled": true
+   },
+   "outputs": [],
+   "source": [
+    "from intel_extension_for_transformers.neural_chat import build_chatbot, PipelineConfig\n",
+    "config = PipelineConfig(model_name_or_path='Intel/neural-chat-7b-v3-1')\n",
+    "chatbot = build_chatbot(config)\n",
+    "response = chatbot.predict(query=\"Tell me about Intel Xeon Scalable Processors.\")\n",
+    "print(response)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "db00b822-eef0-4b92-9a9a-9d7a8d148d6f",
+   "metadata": {},
+   "source": [
+    "## Optimizing your chatbot\n",
+    "Enable mixed precision with bfloat16"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "6e832b1a-0f7a-4715-b245-85666729e206",
+   "metadata": {
+    "scrolled": true
+   },
+   "outputs": [],
+   "source": [
+    "from intel_extension_for_transformers.neural_chat import build_chatbot, PipelineConfig\n",
+    "from intel_extension_for_transformers.transformers import MixedPrecisionConfig\n",
+    "config = PipelineConfig(model_name_or_path='Intel/neural-chat-7b-v3-1',\n",
+    "                        optimization_config=MixedPrecisionConfig(dtype='bfloat16'))\n",
+    "chatbot = build_chatbot(config)\n",
+    "response = chatbot.predict(query=\"Tell me about Intel Xeon Scalable Processors.\")\n",
+    "print(response)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "4ff9ca77-149b-47cc-9a5b-a6f70877f09e",
+   "metadata": {},
+   "source": [
+    "INT4 weight only quantization"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "a804023f-5a79-47ce-9b4b-61ea9746df5f",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from intel_extension_for_transformers.neural_chat import build_chatbot, PipelineConfig\n",
+    "from intel_extension_for_transformers.transformers import WeightOnlyQuantConfig\n",
+    "from intel_extension_for_transformers.neural_chat.config import LoadingModelConfig\n",
+    "config = PipelineConfig(model_name_or_path=\"Intel/neural-chat-7b-v3-1\",\n",
+    "                        optimization_config=WeightOnlyQuantConfig(compute_dtype=\"int8\", weight_dtype=\"int4_fullrange\"), \n",
+    "                        loading_config=LoadingModelConfig(use_llm_runtime=False))\n",
+    "chatbot = build_chatbot(config)\n",
+    "response = chatbot.predict(query=\"Tell me about Intel Xeon Scalable Processors.\")\n",
+    "print(response)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "adc089c9-623f-43ec-a470-31e5f8d4cac4",
+   "metadata": {},
+   "source": [
+    "## Customizing your chatbot\n",
+    "### Plugin: Retrieval\n",
+    "Without retrieval plugin"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "640b5c7d-15e0-4438-b451-9172159398fc",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from intel_extension_for_transformers.neural_chat import build_chatbot, PipelineConfig\n",
+    "config = PipelineConfig(model_name_or_path='Intel/neural-chat-7b-v3-1')\n",
+    "chatbot = build_chatbot(config)\n",
+    "response = chatbot.predict(query=\"How many cores does the Intel® Xeon® Platinum 8480+ Processor have in total?\")\n",
+    "print(response)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "4a65c5a6-1d33-43f6-8c61-92322b74e2cc",
+   "metadata": {},
+   "source": [
+    "With retrieval plugin"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "aaef691b-048a-4d5c-ada9-39acec7eeea2",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "!mkdir docs\n",
+    "!curl -OL https://raw.githubusercontent.com/intel/intel-extension-for-transformers/main/intel_extension_for_transformers/neural_chat/assets/docs/sample.txt\n",
+    "!mv sample.txt ./docs"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "40e6474f-565b-4399-b250-a314447c2120",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "!cat ./docs/sample.txt"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "3779e396-a963-4503-8a85-a0d897445c58",
+   "metadata": {
+    "scrolled": true
+   },
+   "outputs": [],
+   "source": [
+    "from intel_extension_for_transformers.neural_chat import PipelineConfig\n",
+    "from intel_extension_for_transformers.neural_chat import build_chatbot\n",
+    "from intel_extension_for_transformers.neural_chat import plugins\n",
+    "plugins.retrieval.enable=True\n",
+    "plugins.retrieval.args[\"input_path\"]=\"./docs/sample.txt\"\n",
+    "config = PipelineConfig(model_name_or_path='Intel/neural-chat-7b-v3-1',\n",
+    "                        plugins=plugins)\n",
+    "chatbot = build_chatbot(config)\n",
+    "response = chatbot.predict(query=\"How many cores does the Intel® Xeon® Platinum 8480+ Processor have in total?\")\n",
+    "print(response)\n",
+    "\n",
+    "plugins.retrieval.enable=False # disable retrieval"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "e2da2eef-e383-4e54-b9c1-4079cf053a33",
+   "metadata": {},
+   "source": [
+    "### Plugin: ASR & TTS\n",
+    "Enable voice chat"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "72fe3865-b23a-405d-8199-4a869f448ec0",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "!curl -OL https://raw.githubusercontent.com/intel/intel-extension-for-transformers/main/intel_extension_for_transformers/neural_chat/assets/audio/sample.wav"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "103340b5-1b57-486c-9fba-3060de63ab42",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from intel_extension_for_transformers.neural_chat import build_chatbot, PipelineConfig\n",
+    "from intel_extension_for_transformers.neural_chat import plugins\n",
+    "plugins.tts.enable = True\n",
+    "plugins.tts.args[\"output_audio_path\"] = \"./response.wav\"\n",
+    "plugins.asr.enable = True\n",
+    "\n",
+    "config = PipelineConfig(model_name_or_path='Intel/neural-chat-7b-v3-1',\n",
+    "                        plugins=plugins)\n",
+    "chatbot = build_chatbot(config)\n",
+    "result = chatbot.predict(query=\"./sample.wav\")\n",
+    "print(result)\n",
+    "\n",
+    "plugins.tts.enable = False\n",
+    "plugins.asr.enable = False"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "e93a0779-8c26-4c62-a514-64bf9c58896f",
+   "metadata": {},
+   "source": [
+    "### [Optional]: Fine-tuning"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "b8419deb-8a06-41ba-af86-875d1d1c0fc7",
+   "metadata": {},
+   "source": [
+    "We use the [Alpaca dataset](https://github.com/tatsu-lab/stanford_alpaca) from Stanford University as the general domain dataset to fine-tune the model. This dataset is provided in the form of a JSON file, [alpaca_data.json](https://github.com/tatsu-lab/stanford_alpaca/blob/main/alpaca_data.json). In Alpaca, researchers have manually crafted 175 seed tasks to guide `text-davinci-003` in generating 52K instruction data for diverse tasks."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "45f5c30c-ecb4-4341-a586-4efdb3906bbd",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "!curl -OL https://raw.githubusercontent.com/tatsu-lab/stanford_alpaca/main/alpaca_data.json"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "660a3a94-230a-497f-9ad9-2645b8f2d186",
+   "metadata": {},
+   "source": [
+    "Finetune the model on Alpaca-format dataset to conduct text generation.\n",
+    "\n",
+    "We employ the [LoRA approach](https://arxiv.org/pdf/2106.09685.pdf) to finetune the LLM efficiently."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "1008618d-a82e-4249-813e-ae46b95c64d5",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from transformers import TrainingArguments\n",
+    "from intel_extension_for_transformers.neural_chat.config import (\n",
+    "    ModelArguments,\n",
+    "    DataArguments,\n",
+    "    FinetuningArguments,\n",
+    "    TextGenerationFinetuningConfig,\n",
+    ")\n",
+    "from intel_extension_for_transformers.neural_chat.chatbot import finetune_model\n",
+    "model_args = ModelArguments(model_name_or_path=\"Intel/neural-chat-7b-v3-1\")\n",
+    "data_args = DataArguments(train_file=\"alpaca_data.json\")\n",
+    "training_args = TrainingArguments(\n",
+    "    output_dir='./finetuned_model_path',\n",
+    "    do_train=True,\n",
+    "    do_eval=True,\n",
+    "    num_train_epochs=3,\n",
+    "    overwrite_output_dir=True,\n",
+    "    per_device_train_batch_size=4,\n",
+    "    per_device_eval_batch_size=4,\n",
+    "    gradient_accumulation_steps=1,\n",
+    "    save_strategy=\"no\",\n",
+    "    log_level=\"info\",\n",
+    "    save_total_limit=2,\n",
+    "    bf16=True\n",
+    ")\n",
+    "finetune_args = FinetuningArguments()\n",
+    "finetune_cfg = TextGenerationFinetuningConfig(\n",
+    "            model_args=model_args,\n",
+    "            data_args=data_args,\n",
+    "            training_args=training_args,\n",
+    "            finetune_args=finetune_args,\n",
+    "        )\n",
+    "finetune_model(finetune_cfg)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "701abb4e-41fa-4aa4-81e0-2abc3f15f093",
+   "metadata": {},
+   "source": [
+    "Load the fine tuned model"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "4112b3b3-ef0d-4157-8c23-cf3ed5d59e8a",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from intel_extension_for_transformers.neural_chat import build_chatbot\n",
+    "from intel_extension_for_transformers.neural_chat import PipelineConfig\n",
+    "from intel_extension_for_transformers.neural_chat.config import LoadingModelConfig\n",
+    "\n",
+    "config = PipelineConfig(model_name_or_path=\"Intel/neural-chat-7b-v3-1\",\n",
+    "                      loading_config=LoadingModelConfig(peft_path=\"./finetuned_model_path\"))\n",
+    "chatbot = build_chatbot(config)\n",
+    "response = chatbot.predict(query=\"Tell me about Intel Xeon Scalable Processors.\")\n",
+    "print(response)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "186e46d1-3f94-423c-abcb-4a533777e26c",
+   "metadata": {},
+   "source": [
+    "### Congratulations! You have completed the NeuralChat quickstart\n",
+    "Visit [notebooks directory](https://github.com/intel/intel-extension-for-transformers/blob/c30353fcb0e5ceab440a7508b5980ccebcac8750/intel_extension_for_transformers/neural_chat/docs/full_notebooks.md) to see more examples"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "itrex",
+   "language": "python",
+   "name": "itrex"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.9.16"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}