I would like to be able to rely on the Prompt Hub for our real-time chat responses. Unfortunately, the performance of the API endpoints and/or the Python SDK does not yet seem sufficient to enable this.
Given the following benchmark_prompthub_pull.py:
import time
import statistics

from dotenv import load_dotenv
from langsmith import client


def measure_pull_prompt(prompt_name, repeat=5):
    """Pull the specified prompt `repeat` times and measure latency."""
    times = []
    c = client.Client()  # We create a new client so that each measure is fresh.
    for _ in range(repeat):
        start = time.perf_counter()
        _ = c.pull_prompt(prompt_name)
        end = time.perf_counter()
        times.append(end - start)
    return times


def print_stats(label, times):
    """Print min, max, mean, and standard deviation for the recorded times."""
    print(f"\nStats for '{label}':")
    print(f" All times: {times}")
    print(f" Mean time: {statistics.mean(times):.4f} s")
    print(
        f" StdDev time: {statistics.stdev(times):.4f} s"
        if len(times) > 1
        else " StdDev time: N/A (only one measurement)"
    )
    print(f" Min time: {min(times):.4f} s")
    print(f" Max time: {max(times):.4f} s")


def main():
    load_dotenv()  # Load environment variables, e.g. LangSmith credentials

    public_prompt = "rlm/rag-prompt"  # Example: a public prompt from the Hub
    private_prompt = (
        "brims-seller-2024-12-17"  # Example: a private prompt in your account
    )

    public_times = measure_pull_prompt(public_prompt, repeat=5)
    private_times = measure_pull_prompt(private_prompt, repeat=5)

    print_stats(public_prompt, public_times)
    print_stats(private_prompt, private_times)


if __name__ == "__main__":
    main()
When I run it, I get:
Stats for 'rlm/rag-prompt':
All times: [0.7337127500213683, 0.16061124997213483, 0.16808425000635907, 0.43246658297721297, 0.2569338330067694]
Mean time: 0.3504 s
StdDev time: 0.2407 s
Min time: 0.1606 s
Max time: 0.7337 s
Stats for 'brims-seller-2024-12-17':
All times: [0.23353637498803437, 0.1711990410112776, 1.7675039170426317, 0.2944247499690391, 0.4333985830307938]
Mean time: 0.5800 s
StdDev time: 0.6709 s
Min time: 0.1712 s
Max time: 1.7675 s
Based on this, five pulls of a private prompt ranged from 171 ms to 1.8 s, with a mean of about 580 ms. For the public prompt, times ranged from 161 ms to 734 ms, with a mean of about 350 ms. Note that rlm/rag-prompt is the most popular prompt on the Prompt Hub, so this is likely a best case for what to expect from public prompts.
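One caveat on the methodology: measure_pull_prompt creates a fresh client for each prompt, so the first timed pull in each series may also include connection setup. The variant sketched below (illustrative, not part of the script above; it reuses the same imports) shares one client and adds an untimed warm-up pull to isolate steady-state pull latency:

def measure_pull_prompt_warm(prompt_name, repeat=5):
    """Variant: reuse one client and do an untimed warm-up pull first,
    so connection setup is excluded from the measured latencies."""
    times = []
    c = client.Client()  # shared across iterations
    _ = c.pull_prompt(prompt_name)  # warm-up pull, not timed
    for _ in range(repeat):
        start = time.perf_counter()
        _ = c.pull_prompt(prompt_name)
        times.append(time.perf_counter() - start)
    return times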
Given that LLM response time is already a bottleneck, we don't have the latency budget for these pulls at this time. Please work on this; it is a blocker for our adoption of the Prompt Hub, particularly in the context of LangGraph Platform, where we would like assistants to reference Prompt Hub templates.
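For comparison, the workaround on our side would be an in-process cache in front of pull_prompt, roughly as sketched below (function name and TTL are illustrative, not production code); but any such cache delays prompt edits from reaching the running service, which undercuts much of the value of referencing Prompt Hub templates in the first place.

import time

from langsmith import Client

_client = Client()
_prompt_cache = {}  # prompt name -> (fetched_at, prompt object)
CACHE_TTL_SECONDS = 300  # illustrative value


def get_prompt(prompt_name):
    """Return a Prompt Hub prompt, pulling it at most once per TTL window."""
    now = time.monotonic()
    hit = _prompt_cache.get(prompt_name)
    if hit is not None and now - hit[0] < CACHE_TTL_SECONDS:
        return hit[1]
    prompt = _client.pull_prompt(prompt_name)
    _prompt_cache[prompt_name] = (now, prompt)
    return prompt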