diff --git a/sdk_blueprints/Gretel_Navigator_IAPI_Parallel_Blueprint.ipynb b/sdk_blueprints/Gretel_Navigator_IAPI_Parallel_Blueprint.ipynb new file mode 100644 index 00000000..b653df4c --- /dev/null +++ b/sdk_blueprints/Gretel_Navigator_IAPI_Parallel_Blueprint.ipynb @@ -0,0 +1,258 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "awlODvx7fQeB" + }, + "source": [ + "\n", + "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/gretelai/gretel-blueprints/blob/main/sdk_blueprints/Gretel_Navigator_IAPI_Parallel_Blueprint.ipynb)\n", + "\n", + "
\n", + "\n", + "
\"Gretel\"
\n", + "\n", + "
\n", + "\n", + "## 👋 Welcome to the Navigator real-time inference API Parallel Blueprint!\n", + "\n", + "In this Blueprint, we will speed up the rate of generation by parallelizing requests to Navigator.\n", + "\n", + "\n", + "
\n", + "\n", + "## ✅ Set up your Gretel account\n", + "\n", + "To get started, you will need a [free Gretel account](https://console.gretel.ai/).\n", + "\n", + "
\n", + "\n", + "#### Ready? Let's go 🚀" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "HplFGj5HNZiJ" + }, + "source": [ + "## 💾 Install `gretel-client` and its dependencies" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "IZPDLpEPIXSW" + }, + "outputs": [], + "source": [ + "%%capture\n", + "!pip install gretel-client" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "JHqesHdmOCe_" + }, + "source": [ + "## 🛜 Configure your Gretel session\n", + "\n", + "- [The Gretel object](https://docs.gretel.ai/create-synthetic-data/gretel-sdk/the-gretel-object) provides a high-level interface for streamlining interactions with Gretel's APIs.\n", + "\n", + "- Retrieve your Gretel API key [here](https://console.gretel.ai/users/me/key)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "CLIgnOzcNpHD" + }, + "outputs": [], + "source": [ + "from gretel_client import Gretel\n", + "\n", + "gretel = Gretel(api_key=\"prompt\", validate=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "7GO2SLE3ebdw" + }, + "source": [ + "## 🚀 Real-time inference API\n", + "\n", + "- The Navigator real-time inference API makes it possible to programmatically run Navigator outside the [Gretel Console](https://console.gretel.ai/navigator).\n", + "\n", + "- Our [Python SDK](https://github.com/gretelai/gretel-python-client) provides an intuitive high-level interface for the Navigator API.\n", + "\n", + "- Navigator currently supports two data generation modes: `\"tabular\"` and `\"natural_language\"`. In both modes, you can choose the backend model that powers the generation." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "xsiUQH5hP9TV" + }, + "outputs": [], + "source": [ + "# list \"tabular\" backend models\n", + "gretel.factories.get_navigator_model_list(\"tabular\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "0zGw50JLQDWP" + }, + "outputs": [], + "source": [ + "# list \"natural_language\" backend models\n", + "gretel.factories.get_navigator_model_list(\"natural_language\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "PrbIPm_5QKX4" + }, + "source": [ + "**Notes:**\n", + "\n", + "- `gretelai/auto` automatically selects the current default model, which will change over time as models continue to evolve.\n", + "\n", + "- The `factories` attribute of the `Gretel` object provides methods for creating new objects that interact with Gretel's APIs." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Ru2Ogl83BSqn" + }, + "source": [ + "## 📊 Parallel tabular data generation\n", + "\n", + "- We use the `initialize_navigator_api` method of the `factories` attribute to create one Navigator API object per thread.\n", + "\n", + "- With `model_type = \"tabular\"` (which is the default), we initialize Navigator's tabular API.\n", + "\n", + "- To select a different backend model, use the optional `backend_model` argument, which we've set to `gretelai/auto`."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "1InUug1Aeahi" + }, + "outputs": [], + "source": [ + "import random\n", + "from concurrent.futures import ThreadPoolExecutor\n", + "from threading import Lock\n", + "\n", + "import pandas as pd\n", + "from tqdm import tqdm\n", + "\n", + "\n", + "def generate_random_params():\n", + " \"\"\"\n", + " Generate random values for LLM parameters to ensure moderate creativity.\n", + "\n", + " Returns:\n", + " dict: A dictionary containing random values for temperature, top_p, and top_k.\n", + " \"\"\"\n", + " params = {\n", + " \"temperature\": round(\n", + " random.uniform(0.5, 0.75), 2\n", + " ), # Random float between 0.5 and 0.75\n", + " \"top_p\": round(\n", + " random.uniform(0.8, 0.95), 2\n", + " ), # Random float between 0.8 and 0.95\n", + " \"top_k\": random.randint(30, 45), # Random integer between 30 and 45\n", + " }\n", + " return params\n", + "\n", + "\n", + "def generate_records_parallel(prompt: str, num_records=25, num_threads=5):\n", + " shared_df = pd.DataFrame()\n", + "\n", + " mutex = Lock()\n", + "\n", + " def generate_data(progress: tqdm):\n", + " # Each thread gets its own Navigator API object and sampling parameters\n", + " tabular = gretel.factories.initialize_navigator_api(\n", + " \"tabular\", backend_model=\"gretelai/auto\"\n", + " )\n", + " nonlocal shared_df\n", + " params = generate_random_params()\n", + " try:\n", + " for item in tabular.generate(\n", + " prompt,\n", + " num_records=num_records,\n", + " stream=True,\n", + " disable_progress_bar=True,\n", + " **params\n", + " ):\n", + " # Guard the shared DataFrame so concurrent appends don't race\n", + " with mutex:\n", + " shared_df = pd.concat(\n", + " [shared_df, pd.DataFrame(item, index=[0])], ignore_index=True\n", + " )\n", + " progress.update(1)\n", + " except Exception as e:\n", + " print(f\"Error while generating records: {e}\")\n", + "\n", + " with tqdm(total=num_records * num_threads) as progress, ThreadPoolExecutor(\n", + " num_threads\n", + " ) as executor:\n", + " # Submit one generation task per thread\n", + " for _ in range(num_threads):\n", +
" executor.submit(generate_data, progress)\n", + " return shared_df" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "9T7EydHVS4vd" + }, + "outputs": [], + "source": [ + "prompt = \"\"\"\n", + "Generate customer bank transaction data. Include the following columns:\n", + "- customer_name\n", + "- customer_id\n", + "- transaction_date\n", + "- transaction_amount\n", + "- transaction_type\n", + "- transaction_category\n", + "- account_balance\n", + "\"\"\"\n", + "num_records = 25\n", + "num_threads = 5\n", + "\n", + "df = generate_records_parallel(prompt, num_records=num_records, num_threads=num_threads)" + ] + } + ], + "metadata": { + "colab": { + "provenance": [] + }, + "kernelspec": { + "display_name": "Python 3", + "name": "python3" + }, + "language_info": { + "name": "python" + } + }, + "nbformat": 4, + "nbformat_minor": 0 +}