- [09.2024]: Join our Discord community if you have any feedback!
- [09.2024]: Check out our xLAM technical report paper.
- [08.2024]: We are excited to announce the release of the full xLAM family, our suite of Large Action Models, ranging from the "tiny giant" to industrial powerhouses! These models have achieved impressive rankings, placing #1 and #6 on the Berkeley Function-Calling Leaderboard. Check out our Hugging Face collection.
- [07.2024]: We are excited to announce the release of our two function-calling models: xLAM-1b-fc-r and xLAM-7b-fc-r. These models have achieved impressive rankings, placing #3 and #25 on the Berkeley Function-Calling Leaderboard and outperforming many significantly larger models. Stay tuned for more powerful models coming soon.
- [06.2024]: Check out our latest work, APIGen, an automated pipeline for generating verifiable and diverse function-calling datasets, along with the function-calling models trained on its data. Our dataset xlam-function-calling-60k is currently among the top-3 trending datasets on Hugging Face, standing out in a field of 173,670 datasets as of July 4, 2024. See also the tweet by the Salesforce CEO and the coverage by VentureBeat and 新智元.
- [03.2024]: The xLAM model is released! Try it with the AgentLite benchmark or other benchmarks, where its performance is comparable to GPT-4!
- [02.2024]: Initial release of the AgentOhana and xLAM paper!
This repo is for research purposes only.
Autonomous agents powered by large language models (LLMs) have garnered significant research attention. However, fully harnessing the potential of LLMs for agent-based tasks presents inherent challenges due to the heterogeneous nature of diverse data sources featuring multi-turn trajectories.
This repo introduces xLAM, which aggregates agent trajectories from distinct environments spanning a wide array of scenarios. It standardizes and unifies these trajectories into a consistent format, streamlining the creation of a generic data loader optimized for agent training. Building on this data unification, our training pipeline maintains balance across different data sources and preserves independent randomness across devices during dataset partitioning and model training.
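To make the idea of a unified trajectory format more concrete, the sketch below shows one way a standardized record could look and how a multi-turn trajectory can be split into prompt/answer training pairs. The `Turn` and `Trajectory` classes, the role names, and `to_prompt_answer_pairs` are hypothetical illustrations, not the actual AgentOhana schema or the data loader in this repo.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Turn:
    role: str      # hypothetical role names, e.g. "user", "assistant", "observation"
    content: str


@dataclass
class Trajectory:
    source: str    # environment the trajectory came from, e.g. "webshop"
    turns: List[Turn] = field(default_factory=list)


def to_prompt_answer_pairs(traj: Trajectory) -> List[Tuple[str, str]]:
    """Emit one (prompt, answer) pair per assistant turn; the prompt is all earlier turns."""
    pairs: List[Tuple[str, str]] = []
    history: List[str] = []
    for turn in traj.turns:
        if turn.role == "assistant":
            pairs.append(("\n".join(history), turn.content))
        history.append(f"{turn.role}: {turn.content}")
    return pairs


traj = Trajectory(
    source="webshop",
    turns=[
        Turn("user", "Find running shoes under $50."),
        Turn("assistant", "search[running shoes]"),
        Turn("observation", "3 results found."),
        Turn("assistant", "click[item 1]"),
    ],
)
for prompt, answer in to_prompt_answer_pairs(traj):
    print(repr(prompt), "->", answer)
```

Because every source is first mapped into the same record type, a single conversion function like this can serve all environments, which is the property that makes a generic data loader possible.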
If you already know Mixtral, xLAM-v0.1 is a significant upgrade that performs better on many tasks. With the same number of parameters, the model has been fine-tuned across a wide range of agent tasks and scenarios while preserving the capabilities of the original model.
xLAM-v0.1-r is version 0.1 of the Large Action Model series; the "-r" suffix indicates that it is released for research. This model is compatible with the vLLM and FastChat platforms.
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Salesforce/xLAM-v0.1-r")
model = AutoModelForCausalLM.from_pretrained("Salesforce/xLAM-v0.1-r", device_map="auto")

messages = [
    {"role": "user", "content": "What is your favourite condiment?"},
    {"role": "assistant", "content": "Well, I'm quite partial to a good squeeze of fresh lemon juice. It adds just the right amount of zesty flavour to whatever I'm cooking up in the kitchen!"},
    {"role": "user", "content": "Do you have mayonnaise recipes?"}
]

# Format the conversation with the model's chat template and move it to the model's device
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
outputs = model.generate(inputs, max_new_tokens=512)
# Decode only the newly generated tokens (skip the echoed prompt)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```
Note: You may need to tune the temperature setting for different applications. A lower temperature typically helps for tasks that require deterministic outcomes. In addition, for tasks that demand adherence to specific formats or function calls, we strongly recommend explicitly including formatting instructions in the prompt.
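Temperature rescales the model's next-token logits before sampling: dividing by a small value sharpens the distribution toward the most likely token, which is why lower temperatures give more deterministic outputs. The self-contained sketch below demonstrates this; the logits are made-up numbers, not taken from xLAM.

```python
import math
from typing import List


def softmax_with_temperature(logits: List[float], temperature: float) -> List[float]:
    """Convert logits to probabilities after dividing them by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # hypothetical next-token logits
for t in (1.0, 0.2):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

With Hugging Face `transformers`, temperature only takes effect when sampling is enabled, e.g. `model.generate(inputs, max_new_tokens=512, do_sample=True, temperature=0.2)`; with the default greedy decoding (`do_sample=False`) the output is already deterministic.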
There are two main options for serving the xLAM model as an OpenAI-compatible chat completion API (here we use Salesforce/xLAM-8x7b-r on a 4x A100 (40GB) setup as an example):
vLLM offers efficient serving with lower latency. To serve the model with vLLM:
```bash
vllm serve Salesforce/xLAM-8x7b-r --host 0.0.0.0 --port 8000 --tensor-parallel-size 4
```
FastChat provides a more feature-rich serving setup. To serve with FastChat:
- Start the controller:
```bash
python3 -m fastchat.serve.controller --host 0.0.0.0
```
- Start the OpenAI-compatible API server:
```bash
python3 -m fastchat.serve.openai_api_server --host 0.0.0.0 --port 8000
```
- Launch the model worker (`--worker-address` must point to the same port the worker listens on via `--port`):
```bash
python3 -m fastchat.serve.vllm_worker \
    --model-names "Salesforce/xLAM-8x7b-r" \
    --model-path Salesforce/xLAM-8x7b-r \
    --host 0.0.0.0 \
    --port 31005 \
    --worker-address http://localhost:31005 \
    --num-gpus 4 \
    --limit-worker-concurrency 64
```
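Both serving options expose an OpenAI-style chat completions route, so the model can also be queried without the xLAM client. The stdlib-only sketch below builds such a request; it assumes a server from the steps above is listening on port 8000 and that the route is `/v1/chat/completions`, as in the OpenAI API. The helper name `build_chat_request` is hypothetical.

```python
import json
import urllib.request
from typing import Any, Dict, List


def build_chat_request(base_url: str, model: str, messages: List[Dict[str, Any]],
                       temperature: float = 0.0) -> urllib.request.Request:
    """Build a POST request for an OpenAI-compatible /chat/completions endpoint."""
    payload = {"model": model, "messages": messages, "temperature": temperature}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_chat_request(
    "http://localhost:8000/v1/",
    "Salesforce/xLAM-8x7b-r",
    [{"role": "user", "content": "Hello!"}],
)
# Uncomment once the server is running:
# with urllib.request.urlopen(req) as resp:
#     reply = json.load(resp)
#     print(reply["choices"][0]["message"]["content"])
```

The `model` field should match the name the server registered (the model path for vLLM, or the `--model-names` value for FastChat).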
Once the model is served, you can use the following xLAM client to interact with it for function calling or other applications:
```python
from xLAM.client import xLAMChatCompletion, xLAMConfig

# Configure the client
config = xLAMConfig(base_url="http://localhost:8000/v1/", model="Salesforce/xLAM-8x7b-r")
llm = xLAMChatCompletion.from_config(config)

# Example conversation
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What's the weather like in New York?"},
    {"role": "assistant", "content": "To get the weather information for New York, I'll need to use the get_weather function.", "tool_calls": {"name": "get_weather", "arguments": '{"location": "New York", "unit": "fahrenheit"}'}},
    {"role": "tool", "name": "get_weather", "content": '{"temperature": 72, "description": "Partly cloudy"}'},
    {"role": "user", "content": "Now, search for the weather in San Francisco."}
]

# Example function definitions (optional)
tools = [
    {
        "name": "get_weather",
        "description": "Get the current weather for a location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "The city and state, e.g. San Francisco, New York"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"], "description": "The unit of temperature to return"}
            },
            "required": ["location"]
        }
    },
    {
        "name": "search",
        "description": "Search for information on the internet",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "The search query, e.g. 'latest news on AI'"}
            },
            "required": ["query"]
        }
    },
    {
        "name": "respond",
        "description": "When you are ready to respond, use this function. This function allows the assistant to formulate and deliver appropriate replies based on the input message and the context of the conversation. Generate a concise response for simple questions, and a more detailed response for complex questions.",
        "parameters": {
            "type": "object",
            "properties": {
                "message": {"type": "string", "description": "The content of the message to respond to."}
            },
            "required": ["message"]
        }
    }
]

response = llm.completion(messages, tools=tools)
print(response)
```
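Before executing a function call returned by the model, it can help to check it against the tool definitions. The sketch below is a hypothetical helper (`missing_required_args` is not part of the xLAM client); it assumes a tool call shaped like the `tool_calls` entry in the conversation above, i.e. a `name` plus a JSON-encoded `arguments` string, and reports required parameters that are missing.

```python
import json
from typing import Any, Dict, List

# A tool definition in the same JSON-schema style as the `tools` list above
weather_tool = {
    "name": "get_weather",
    "description": "Get the current weather for a location",
    "parameters": {
        "type": "object",
        "properties": {
            "location": {"type": "string"},
            "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
        },
        "required": ["location"],
    },
}


def missing_required_args(tool_call: Dict[str, Any], tools: List[Dict[str, Any]]) -> List[str]:
    """Return the required parameters that `tool_call` omits, according to its tool definition."""
    schema = next((t for t in tools if t["name"] == tool_call["name"]), None)
    if schema is None:
        raise KeyError(f"unknown tool: {tool_call['name']}")
    args = json.loads(tool_call["arguments"])  # arguments arrive as a JSON-encoded string
    required = schema["parameters"].get("required", [])
    return [name for name in required if name not in args]


call = {"name": "get_weather", "arguments": '{"unit": "celsius"}'}
print(missing_required_args(call, [weather_tool]))  # → ['location']
```

This only checks the `required` list; a full validator (for example one based on JSON Schema) would also check types and `enum` values.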
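An example of loading a unified agent dataset and interleaving data sources for training:
```python
from fm_datasets import webshop_multi_turn_v2
from fm_utils.seed_random import init_device_seed
from fm_utils.interleave_datasets import interleave_data

# `tokenizer` and `script_args` are assumed to be defined already (see the trainer example)
sft_webshop_multi_turn = webshop_multi_turn_v2.SFTWebShopMultiTurnV2(tokenizer, script_args)

# Initialize the random seed for the current device
seed = init_device_seed(seed=42)

# Mix the data sources according to `sample_probs` and build train/eval datasets
train_dataset, eval_dataset = interleave_data(
    data_objects=[sft_webshop_multi_turn],
    sample_probs=[1.0],
    return_type="prompt_answer",
    seq_length=4096,
    seed=seed,
)
```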
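The `sample_probs` argument controls how often each data source is drawn from. The pure-Python sketch below illustrates the general idea of probability-weighted interleaving with a per-device seed; `interleave` and `device_seed` are hypothetical helpers, not the actual implementations of `interleave_data` or `init_device_seed`, whose internals may differ.

```python
import random
from typing import Iterable, Iterator, List, Sequence


def device_seed(base_seed: int, rank: int) -> int:
    # Hypothetical scheme: offset the base seed by the device (process) rank
    # so that each device samples with its own random stream.
    return base_seed + rank


def interleave(sources: Sequence[Iterable], probs: Sequence[float], seed: int) -> Iterator:
    """Yield items, choosing the next source at random with weights `probs`.

    Stops as soon as the chosen source is exhausted.
    """
    if len(sources) != len(probs):
        raise ValueError("need exactly one probability per source")
    rng = random.Random(seed)  # local RNG, so the global random state is untouched
    iterators: List[Iterator] = [iter(s) for s in sources]
    indices = list(range(len(iterators)))
    while True:
        chosen = rng.choices(indices, weights=probs, k=1)[0]
        try:
            item = next(iterators[chosen])
        except StopIteration:
            return
        yield item


rank = 0  # e.g. the process index in a multi-GPU run
stream = interleave([["web1", "web2", "web3"], ["tool1", "tool2"]], probs=[0.7, 0.3], seed=device_seed(42, rank))
print(list(stream))
```

Using a local `random.Random` per device keeps runs reproducible for a given seed while letting different devices draw different mixtures.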
We provide two SFT trainers, v1 and v2lite. v1 builds on the trl library and is optimized for LoRA, while v2lite is written from scratch on top of Accelerate and is optimized for full fine-tuning. The two trainers share almost the same interface.
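```python
from xLAM.fm_utils.derived_data_collator import DataCollatorForPromptAnswer
from xLAM.fm_trainers.sft_foundation_trainer import SFTFoundationTrainer  # v1 trainer
from xLAM.train.fm_trainers.sft_foundation_trainer_lite import SFTFoundationTrainerLite, prepare_accelerator  # v2lite trainer

# `parser` is assumed to be the script's argument parser (e.g. a transformers.HfArgumentParser);
# `tokenizer`, `instruction_template_ids`, `response_template_ids`, `train_dataset`,
# and `eval_dataset` are assumed to be defined already
script_args = parser.parse_args_into_dataclasses()[0]

# Collator that uses the instruction/response templates to separate prompts from answers
collator = DataCollatorForPromptAnswer(
    instruction_template=instruction_template_ids,
    response_template=response_template_ids,
    tokenizer=tokenizer,
    mlm=False,
)

# v2lite trainer
accelerator = prepare_accelerator(script_args)
trainer = SFTFoundationTrainerLite(
    args=script_args,
    accelerator=accelerator,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    collator=collator,
)
trainer.train()
```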
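`DataCollatorForPromptAnswer` takes response template token IDs, which suggests that, as with completion-only collators such as trl's `DataCollatorForCompletionOnlyLM`, the loss is computed only on answer tokens. The sketch below illustrates that masking idea for a single-turn example; it is an assumption about the behavior rather than the collator's actual code, and `mask_prompt_labels` and the token IDs are hypothetical.

```python
from typing import List, Sequence

IGNORE_INDEX = -100  # PyTorch's CrossEntropyLoss ignores this label value by default


def find_subsequence(seq: Sequence[int], pattern: Sequence[int]) -> int:
    """Return the start index of the first occurrence of `pattern` in `seq`, or -1."""
    n = len(pattern)
    for start in range(len(seq) - n + 1):
        if list(seq[start:start + n]) == list(pattern):
            return start
    return -1


def mask_prompt_labels(input_ids: Sequence[int], response_template_ids: Sequence[int]) -> List[int]:
    """Copy `input_ids` into labels, masking every token up to the end of the response template."""
    labels = list(input_ids)
    start = find_subsequence(input_ids, response_template_ids)
    if start == -1:
        # No response marker found: mask everything so the example adds no loss
        return [IGNORE_INDEX] * len(labels)
    for i in range(start + len(response_template_ids)):
        labels[i] = IGNORE_INDEX
    return labels


# Hypothetical IDs: prompt tokens 11-13, response template [7, 8], answer tokens 4 and 5
print(mask_prompt_labels([11, 12, 13, 7, 8, 4, 5], [7, 8]))  # → [-100, -100, -100, -100, -100, 4, 5]
```

Multi-turn data would additionally use the instruction template to re-mask each later user turn; this sketch handles only one prompt/answer pair.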
You can use our preconfigured Docker image, gcr.io/salesforce-research-internal/xlam-2024-02-14; an example YAML file is provided in envs_config. Then install the package without its dependencies:
```bash
pip install -e . --no-dependencies
```
Alternatively, install the package directly together with its dependencies, although this may cause errors in your configured environment:
```bash
pip install -e .
```
Refer to the complete example scripts for more details, or run the bash scripts below for a quick start.
For v1 (replace `{path}` with your output directory):
```bash
nohup accelerate launch --config_file xLAM/train/scripts/multi_gpu.yaml xLAM/train/scripts/sft_train_model_v1.py --model_name mistralai/Mixtral-8x7B-Instruct-v0.1 --seq_length 4096 --run_name sft_mixtral8X7B_v2_02072024 --output_dir {path} > sft_mixtral8X7B_v2_02072024.nohup 2>&1 &
```
For v2 (v2lite):
```bash
source xLAM/train/scripts/model_run_v2lite_full.sh
```
LLM Name | ZS | ZST | ReAct | PlanAct | PlanReAct | BOLAA |
---|---|---|---|---|---|---|
Llama-2-70B-chat | 0.0089 | 0.0102 | 0.4273 | 0.2809 | 0.3966 | 0.4986 |
Vicuna-33B | 0.1527 | 0.2122 | 0.1971 | 0.3766 | 0.4032 | 0.5618 |
Mixtral-8x7B-Instruct-v0.1 | 0.4634 | 0.4592 | 0.5638 | 0.4738 | 0.3339 | 0.5342 |
GPT-3.5-Turbo | 0.4851 | 0.5058 | 0.5047 | 0.4930 | 0.5436 | 0.6354 |
GPT-3.5-Turbo-Instruct | 0.3785 | 0.4195 | 0.4377 | 0.3604 | 0.4851 | 0.5811 |
GPT-4-0613 | 0.5002 | 0.4783 | 0.4616 | 0.7950 | 0.4635 | 0.6129 |
xLAM-v0.1-r | 0.5201 | 0.5268 | 0.6486 | 0.6573 | 0.6611 | 0.6556 |
LLM Name | ZS | ZST | ReAct | PlanAct | PlanReAct |
---|---|---|---|---|---|
Mixtral-8x7B-Instruct-v0.1 | 0.3912 | 0.3971 | 0.3714 | 0.3195 | 0.3039 |
GPT-3.5-Turbo | 0.4196 | 0.3937 | 0.3868 | 0.4182 | 0.3960 |
GPT-4-0613 | 0.5801 | 0.5709 | 0.6129 | 0.5778 | 0.5716 |
xLAM-v0.1-r | 0.5492 | 0.4776 | 0.5020 | 0.5583 | 0.5030 |
Please note: All prompts provided by AgentLite are considered "unseen prompts" for xLAM-v0.1-r, meaning the model has not been trained with data related to these prompts.
LLM Name | Act | ReAct | BOLAA |
---|---|---|---|
GPT-3.5-Turbo-16k | 0.6158 | 0.6005 | 0.6652 |
GPT-4-0613 | 0.6989 | 0.6732 | 0.7154 |
xLAM-v0.1-r | 0.6563 | 0.6640 | 0.6854 |
LLM Name | Easy F1 Score | Easy Accuracy | Medium F1 Score | Medium Accuracy | Hard F1 Score | Hard Accuracy |
---|---|---|---|---|---|---|
GPT-3.5-Turbo-16k-0613 | 0.410 | 0.350 | 0.330 | 0.25 | 0.283 | 0.20 |
GPT-4-0613 | 0.611 | 0.47 | 0.610 | 0.480 | 0.527 | 0.38 |
xLAM-v0.1-r | 0.532 | 0.45 | 0.547 | 0.46 | 0.455 | 0.36 |
LLM Name | Unseen Insts & Same Set | Unseen Tools & Seen Cat | Unseen Tools & Unseen Cat |
---|---|---|---|
ToolLLaMA V2 | 0.4385 | 0.4300 | 0.4350 |
GPT-3.5-Turbo-0125 | 0.5000 | 0.5150 | 0.4900 |
GPT-4-0125-preview | 0.5462 | 0.5450 | 0.5050 |
xLAM-v0.1-r | 0.5077 | 0.5650 | 0.5200 |
LLM Name | 1-step | 2-step | 3-step | 4-step | 5-step |
---|---|---|---|---|---|
GPT-4-0613 | - | - | - | - | 69.45 |
Claude-Instant-1 | 12.12 | 32.25 | 39.25 | 44.37 | 45.90 |
xLAM-v0.1-r | 4.10 | 28.50 | 36.01 | 42.66 | 43.96 |
Claude-2 | 26.45 | 35.49 | 36.01 | 39.76 | 39.93 |
Lemur-70b-Chat-v1 | 3.75 | 26.96 | 35.67 | 37.54 | 37.03 |
GPT-3.5-Turbo-0613 | 2.73 | 16.89 | 24.06 | 31.74 | 36.18 |
AgentLM-70b | 6.48 | 17.75 | 24.91 | 28.16 | 28.67 |
CodeLlama-34b | 0.17 | 16.21 | 23.04 | 25.94 | 28.16 |
Llama-2-70b-chat | 4.27 | 14.33 | 15.70 | 16.55 | 17.92 |
LLM Name | Success Rate | Progress Rate |
---|---|---|
xLAM-v0.1-r | 0.533 | 0.766 |
DeepSeek-67B | 0.400 | 0.714 |
GPT-3.5-Turbo-0613 | 0.367 | 0.627 |
GPT-3.5-Turbo-16k | 0.317 | 0.591 |
Lemur-70B | 0.283 | 0.720 |
CodeLlama-13B | 0.250 | 0.525 |
CodeLlama-34B | 0.133 | 0.600 |
Mistral-7B | 0.033 | 0.510 |
Vicuna-13B-16K | 0.033 | 0.343 |
Llama-2-70B | 0.000 | 0.483 |
This code is licensed under Apache 2.0. Models based on DeepSeek additionally require you to follow the use-based restrictions in the linked DeepSeek license. This is a research-only project.
We want to acknowledge the works that have contributed to our paper and to the agent research community! If you find our work useful, please consider citing:
@article{zhang2024agentohana,
title={AgentOhana: Design Unified Data and Training Pipeline for Effective Agent Learning},
author={Zhang, Jianguo and Lan, Tian and Murthy, Rithesh and Liu, Zhiwei and Yao, Weiran and Tan, Juntao and Hoang, Thai and Yang, Liangwei and Feng, Yihao and Liu, Zuxin and others},
journal={arXiv preprint arXiv:2402.15506},
year={2024}
}
@article{liu2024apigen,
title={APIGen: Automated PIpeline for Generating Verifiable and Diverse Function-Calling Datasets},
author={Liu, Zuxin and Hoang, Thai and Zhang, Jianguo and Zhu, Ming and Lan, Tian and Kokane, Shirley and Tan, Juntao and Yao, Weiran and Liu, Zhiwei and Feng, Yihao and others},
journal={arXiv preprint arXiv:2406.18518},
year={2024}
}
@article{zhang2024xlamfamilylargeaction,
title={xLAM: A Family of Large Action Models to Empower AI Agent Systems},
author={Zhang, Jianguo and Lan, Tian and Zhu, Ming and Liu, Zuxin and Hoang, Thai and Kokane, Shirley and Yao, Weiran and Tan, Juntao and Prabhakar, Akshara and Chen, Haolin and Liu, Zhiwei and Feng, Yihao and Awalgaonkar, Tulika and Murthy, Rithesh and Hu, Eric and Chen, Zeyuan and Xu, Ran and Niebles, Juan Carlos and Heinecke, Shelby and Wang, Huan and Savarese, Silvio and Xiong, Caiming},
journal={arXiv preprint arXiv:2409.03215},
year={2024}
}