
Gpt math solver #991

Draft
wants to merge 136 commits into main from kevin666aa:gpt_math_solver

Commits (136)
7dc4035
handle format error in message in _construct_params
yiranwu0 Apr 11, 2023
83ff983
fix typo
yiranwu0 Apr 12, 2023
ab2cada
Add math solver with automatic tool queries.
yiranwu0 Apr 16, 2023
2d70c99
add imports in QueryHandler
yiranwu0 Apr 16, 2023
c823cbf
update math solver
yiranwu0 Apr 23, 2023
766b022
require wolfram id in readme
yiranwu0 Apr 23, 2023
84ba0be
Merge branch 'main' into gpt_math_solver
yiranwu0 Apr 23, 2023
8f67ed7
fix bug in running python code
yiranwu0 Apr 23, 2023
a511f0a
Update flaml/autogen/math_solver/MathSolver.py
yiranwu0 Apr 23, 2023
87ad79d
Update flaml/autogen/math_solver/README.md
yiranwu0 Apr 23, 2023
a16fa5f
revise according to comments
yiranwu0 Apr 23, 2023
e21fd76
Merge branch 'gpt_math_solver' of github.com:kevin666aa/FLAML into gp…
yiranwu0 Apr 23, 2023
45dcb7f
fix code format
yiranwu0 Apr 23, 2023
435e7a4
Add prompt to system message
yiranwu0 Apr 23, 2023
d1747cf
refactor file names
yiranwu0 Apr 24, 2023
56627a7
refine prompts
yiranwu0 Apr 24, 2023
9821820
add baseline PoT
yiranwu0 Apr 24, 2023
e37ee3e
fix bugs in query_handler
yiranwu0 Apr 24, 2023
5d44e5e
refine prompts
yiranwu0 Apr 24, 2023
bab2878
refine prompt to output fractions
yiranwu0 Apr 24, 2023
d0b0d4b
change prompt
yiranwu0 Apr 24, 2023
3e171a3
add temperature as args
yiranwu0 Apr 24, 2023
2261c5c
fix concat float to str
yiranwu0 Apr 24, 2023
8c5a86c
change prompt back to use fractions instead of decimal
yiranwu0 Apr 24, 2023
2b8b717
rewind prompt back to e37ee3
yiranwu0 Apr 25, 2023
8b68ff7
pass args.samples_per_category in PoT
yiranwu0 Apr 25, 2023
54407a7
fix counting bug in PoT and print in math_solver
yiranwu0 Apr 25, 2023
4806631
fix error: convert exception to str
yiranwu0 Apr 25, 2023
80a7063
add logger to log stdouts and compress files
yiranwu0 Apr 25, 2023
d737644
refine logging
yiranwu0 Apr 25, 2023
d146e35
add option to put prompt in either system or user message, add option…
yiranwu0 Apr 26, 2023
26c0caa
clean up main.py
yiranwu0 Apr 26, 2023
2a1a47e
create pseudo_main.py
yiranwu0 Apr 26, 2023
edfc679
fix category loading bug
yiranwu0 Apr 26, 2023
6a15761
handle timeout
yiranwu0 Apr 26, 2023
ab64723
two new prompts
yiranwu0 Apr 26, 2023
f723a8f
add bash
yiranwu0 Apr 27, 2023
1a5c93c
more prompts
yiranwu0 Apr 27, 2023
955edca
change run sequence
yiranwu0 Apr 27, 2023
8519967
add more prompts
yiranwu0 Apr 28, 2023
912193e
catch wolfram error
yiranwu0 Apr 28, 2023
c8f90b4
more runs on v2.1 select, v1.2 select, add new v3select
yiranwu0 Apr 28, 2023
7a8c2ac
compress when all finished
yiranwu0 Apr 28, 2023
b9a7e04
py exec output fix
yiranwu0 Apr 28, 2023
65f1580
v3.1 select
yiranwu0 Apr 29, 2023
73088ce
new both prompt, v3.2select
yiranwu0 Apr 29, 2023
144c148
change execute to run
yiranwu0 Apr 29, 2023
812477a
refine query handling and v3.3select
yiranwu0 Apr 30, 2023
25e2708
catch wolfram errors
yiranwu0 Apr 30, 2023
1c00283
ablation on only using python and zeroshot baseline
yiranwu0 May 1, 2023
1330a00
change run sequence
yiranwu0 May 1, 2023
e61212f
new run
yiranwu0 May 1, 2023
2b5dd52
new run
yiranwu0 May 1, 2023
ac11d2a
consistent output folder in PoT
yiranwu0 May 1, 2023
9d291b9
rerun pot, refined prompt v1.3 v1.4 and v3.4
yiranwu0 May 2, 2023
ce7144a
resume 22 if not finished
yiranwu0 May 2, 2023
6fefde3
handle wolfram exception
yiranwu0 May 2, 2023
eaae6ce
one run for v1.5
yiranwu0 May 2, 2023
8fdf74f
one run for v1.5 corrections
yiranwu0 May 2, 2023
ca75c91
two more prompts v3.5select and v3.1python based on v3python
yiranwu0 May 3, 2023
47179ce
remove error string clipping
yiranwu0 May 3, 2023
a8c3758
handle UnicodeDecodeError
yiranwu0 May 3, 2023
132638a
handle UnicodeDecodeError
yiranwu0 May 3, 2023
280f9de
quick test on adding wolfram to v3.1python
yiranwu0 May 3, 2023
45a4abd
rerun v3.1 with refine, add v3.7select to further test wolfram
yiranwu0 May 4, 2023
b0efcbf
switch run seq v3.7select then v3.1python
yiranwu0 May 4, 2023
10c28ae
add v3.2python, slightly refine from v3.1. try out v3.3python
yiranwu0 May 4, 2023
bfe61aa
more args for PoT and refine load_level5 func
yiranwu0 May 5, 2023
39ea367
trial 38-42: validate our methods on all levels of problems, run large…
yiranwu0 May 5, 2023
0cebecb
update run.sh
yiranwu0 May 5, 2023
bddb610
move
sonichi May 6, 2023
326da82
add v4
yiranwu0 May 7, 2023
bd040b5
Merge branch 'gpt_math_solver' of github.com:kevin666aa/FLAML into gp…
yiranwu0 May 7, 2023
c8ba447
test with new system message
yiranwu0 May 7, 2023
62b5259
add baseline pnas, run v4.2 on level5 problems, test new sys message …
yiranwu0 May 8, 2023
ef509d4
fix trial 49
yiranwu0 May 8, 2023
e60850f
remove print
yiranwu0 May 8, 2023
5fe0b0b
run v3 with specified sentence removed, 4.2 with original sys message…
yiranwu0 May 9, 2023
d92b559
remove trial 52
yiranwu0 May 9, 2023
ede98a5
endpoint
sonichi May 9, 2023
7082355
Merge branch 'gpt_math_solver' of https://github.com/kevin666aa/FLAML…
sonichi May 9, 2023
7d34485
fix bug in queryhandler
yiranwu0 May 9, 2023
8e34218
Merge branch 'gpt_math_solver' of github.com:kevin666aa/FLAML into gp…
yiranwu0 May 9, 2023
c592837
fix queryhandler 2
yiranwu0 May 9, 2023
6345e0b
v3.3python
yiranwu0 May 10, 2023
40fd299
remove print
yiranwu0 May 10, 2023
dac1551
test final prompts
yiranwu0 May 11, 2023
fff4e4b
change run sequence
yiranwu0 May 11, 2023
da0f7d9
run exact v3.1 as before
yiranwu0 May 11, 2023
2775e08
keep running v3.1python and add general_5
yiranwu0 May 11, 2023
ad10b71
add general_5
yiranwu0 May 11, 2023
7800a46
continue run 55 and 56
yiranwu0 May 12, 2023
4f78539
switch seq
yiranwu0 May 12, 2023
a76113f
trial 63 v3.5python, then run large-scale with v3.3python
yiranwu0 May 12, 2023
7d22c07
add v3.3, 3.7, 3.8
yiranwu0 May 13, 2023
908d283
revise 3.6-3.8
yiranwu0 May 13, 2023
079b4e2
v3.9
yiranwu0 May 13, 2023
f071214
test intermediate algebra and precalc on v3.9
yiranwu0 May 13, 2023
6444f91
test v3.9 on 50 problems, then zero shot
yiranwu0 May 13, 2023
1c4a278
fix prompt
yiranwu0 May 13, 2023
c744613
endpoint
sonichi May 13, 2023
2ad469f
Merge branch 'gpt_math_solver' of https://github.com/kevin666aa/FLAML…
sonichi May 13, 2023
b806dfb
run all problems on v3.9, and pnas
yiranwu0 May 13, 2023
3733028
endpoint
sonichi May 13, 2023
cbd0be0
Merge remote-tracking branch 'upstream/main' into gpt_math_solver
May 15, 2023
89d7512
run v1python
yiranwu0 May 15, 2023
3791326
Merge branch 'gpt_math_solver' of github.com:kevin666aa/FLAML into gp…
yiranwu0 May 15, 2023
2a6ffa1
run v1python+wolfram
yiranwu0 May 16, 2023
d833938
run pot with sys message
yiranwu0 May 19, 2023
bf73756
endpoint
sonichi May 19, 2023
f1b3873
Merge branch 'gpt_math_solver' of https://github.com/kevin666aa/FLAML…
sonichi May 19, 2023
e3d8de1
run pot with system message
yiranwu0 May 19, 2023
d84213d
Merge branch 'gpt_math_solver' of https://github.com/kevin666aa/FLAML…
sonichi May 19, 2023
f8c68ff
Merge branch 'gpt_math_solver' of github.com:kevin666aa/FLAML into gp…
May 19, 2023
9bc17db
fewshot+zeroshot prompt
May 19, 2023
769803e
add assert
May 19, 2023
85d9b59
refine fewshot
yiranwu0 May 20, 2023
bce7f4f
run pre-commit
yiranwu0 May 20, 2023
59bc9f9
rerun v3.9 with cache and get token info
yiranwu0 May 21, 2023
32de58f
run PoT on all problems
yiranwu0 May 22, 2023
9dabf61
Merge remote-tracking branch 'upstream/main' into gpt_math_solver
yiranwu0 May 22, 2023
9c3efd4
merge new changes and update pot
yiranwu0 May 22, 2023
fc8bcdc
endpoint
sonichi May 22, 2023
c711143
Merge branch 'gpt_math_solver' of https://github.com/kevin666aa/FLAML…
sonichi May 22, 2023
f535e50
fix decode in PoT
yiranwu0 May 22, 2023
841ff2a
Merge branch 'gpt_math_solver' of github.com:kevin666aa/FLAML into gp…
yiranwu0 May 22, 2023
43d8277
clean up and rename
yiranwu0 May 27, 2023
01f7712
resolve conflict in setup
yiranwu0 May 27, 2023
d4d8242
Merge branch 'microsoft:main' into gpt_math_solver
yiranwu0 Jun 7, 2023
1cfce5f
clean up
yiranwu0 Jun 7, 2023
be7bb3d
update readme
yiranwu0 Jun 7, 2023
d3e8719
add mathchat flow chart
yiranwu0 Jun 7, 2023
2c8823f
Update README.md
yiranwu0 Jun 7, 2023
7808b4f
Merge branch 'microsoft:main' into gpt_math_solver
yiranwu0 Jun 10, 2023
348446b
add missing files
yiranwu0 Jul 10, 2023
c49ab9c
Merge branch 'gpt_math_solver' of github.com:kevin666aa/FLAML into gp…
yiranwu0 Jul 10, 2023
262 changes: 262 additions & 0 deletions flaml/autogen/math_solver/MathSolver.py
@@ -0,0 +1,262 @@
from QueryHandler import QueryHandler
from flaml.autogen.math_utils import (
    eval_math_responses,
    remove_boxed,
    last_boxed_only_string,
    write_json,
    remove_asy_sections,
    math_type_mapping,
)
from flaml import oai
import os
import json
import re
import copy
from openai.error import InvalidRequestError

PROMPTS = {
    "select": """
Let's use two tools (python code and Wolfram alpha) to solve this problem step by step. You should always follow your own reasoning and only query when necessary.

First state the key idea to solve the problem. Then follow the process:
1. Output one step.
2. Take out any queries that can be asked through python or Wolfram alpha (for example, any calculations or equations that can be calculated) and choose the best tool to be used.
3. Please format the query in json:
{ "tool" : "", # "python" or "wolfram"
"query": "", # your query here, either python code or Wolfram query.
}
Note: when you put python code in the query, you should: 1. make sure the indentation is correct (use '\\t'). 2. use the 'print' function for the output. 3. always use fractions instead of decimals.
4. Wait for me to give the results.
5. Correct this step based on the results, or give a new query if the results are invalid.
6. When you get the answer, put the answer in \\box{}.

Problem:
""",

    # use python
    "python": """
Let's use python code to solve this problem step by step. You should always follow your own reasoning and only query when necessary.

First state the key idea to solve the problem. Then follow the process:
1. Output one step.
2. Take out any queries that can be asked through python (for example, any calculations or equations that can be calculated). When you are querying python, you should: 1. use tab ('\\t') for indentation. 2. use the 'print' function for the output. 3. always output fractions instead of decimals.
3. Please format the query in json:
{ "tool" : "python",
"query": "", # your code here.
}
4. Wait for me to give the results.
5. Correct this step based on the results, or give a new query if the results are invalid.
6. When you get the answer, put the answer in \\box{}.

Problem:
""",

    # use wolfram
    "wolfram": """
Let's use Wolfram Alpha to solve this problem step by step. You should always follow your own reasoning and only query when necessary.

First state the key idea to solve the problem. Then follow the process:
1. Output one step.
2. Take out any queries that can be asked through Wolfram Alpha (for example, any calculations or equations that can be calculated).
3. Please format the query in json:
{ "tool" : "wolfram",
"query": "", # your query here. Please use wolfram language.
}
4. Wait for me to give the results.
5. Correct this step based on the results, or give a new query if the results are invalid.
6. When you get the answer, put the answer in \\box{}.

Problem:
""",
}
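
# Editor's note -- illustrative sketch, NOT part of this PR: QueryHandler is
# imported above, but its diff is not shown on this page. Assuming it extracts
# the JSON query format that the prompts above specify, the parsing step might
# look roughly like this (the function name and regex are assumptions):
def _extract_query_sketch(reply: str):
    """Pull the first {...} block out of a model reply and validate it as a tool query."""
    match = re.search(r"\{.*?\}", reply, re.DOTALL)
    if match is None:
        return None  # this step contains no tool query
    try:
        query = json.loads(match.group(0))
    except json.JSONDecodeError:
        return None  # malformed JSON counts as an invalid query
    if query.get("tool") in ("python", "wolfram") and "query" in query:
        return query
    return None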

class MathSolver:
    def __init__(self, model,
                 prompt_type='select',
                 max_round=10,
                 max_invalid_q_per_step=3,
                 n=1,
                 use_cache=True):
        self.max_round = max_round
        if prompt_type not in PROMPTS:
            raise ValueError(f'Prompt type {prompt_type} not supported, choose from {PROMPTS.keys()}')

        self.prompt_type = prompt_type
        self.prompt = PROMPTS[prompt_type]

        self.default_config = {
            'model': model,
            'messages': [
                {"role": "system", "content": "You are a helpful assistant."},

Contributor: One question could be: Does changing the system prompt make any difference?

Contributor: What about making the prompt before the "Problem:" a system message?


            ],
            'n': n,  # n should be 1 for now
            # 'temperature': 1,
        }
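
        # Editor's note -- illustrative sketch of the reviewer's suggestion above
        # (not part of the PR): putting the prompt before "Problem:" into the
        # system message, instead of prepending it to the user message, would
        # look roughly like:
        #     'messages': [
        #         {"role": "system", "content": PROMPTS[prompt_type]},
        #         {"role": "user", "content": remove_asy_sections(problem['problem'])},
        #     ]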

        self.max_invalid_q_per_step = max_invalid_q_per_step
        self.use_cache = use_cache


    def make_conversation(self, problem, n=1, file_to_be_saved=None):
        query_handler = QueryHandler()

        # initialize the conversation
        config = copy.deepcopy(self.default_config)
        config['messages'].append({"role": "user", "content": self.prompt + remove_asy_sections(problem['problem'])})

        # save a readable conversation in a txt file
        def save_message_to_file(message):
            if conversation_saver is not None:
                conversation_saver.write(message)
                conversation_saver.flush()

        conversation_saver = None
        # define the separator unconditionally: it is used in messages below even when no file is saved
        separate_line = '\n' + '-' * 40 + '\n'
        if file_to_be_saved is not None:
            conversation_saver = open(file_to_be_saved, 'a')
        save_message_to_file(f'Problem: {self.str_splitter(problem["problem"])}\n\n {separate_line}')

        # init parameters
        is_valid_reply = False  # only True when a \box answer is detected
        invalid_q = 0  # count of consecutive invalid queries
        token_used, total_cost = 0, 0
        response_with_ans = ""  # save the response with \box to extract the answer
        for rr in range(self.max_round):
            # 1. get the response from the assistant
            try:
                raw_responses = oai.ChatCompletion.create(None, **config, use_cache=self.use_cache)
            except InvalidRequestError as e:
                print(problem['type'], problem['problem_id'], e)
                break
            if raw_responses == -1:
                break
            responses = oai.ChatCompletion.extract_text(raw_responses)
            assert len(responses) == 1, 'More than one response'  # right now we only use one response
            save_message_to_file(f'assistant: {self.str_splitter(responses[0])}{separate_line}')
            # token_used = raw_responses['usage']['total_tokens']
            total_cost += oai.ChatCompletion.cost(self.default_config['model'], raw_responses)
            config['messages'].append({"role": "assistant", "content": responses[0]})
            if '\\box' in responses[0]:
                # if the assistant gives a valid reply, stop the conversation
                is_valid_reply = True
                response_with_ans = responses[0]
                break

            # 2. handle the response and get the query
            query_response, is_query_success = query_handler.handle_query(responses[0])
            if len(query_response) > 2000:
                # prevent long responses by string length: 2000 chars -> around 500-1000 tokens
                query_response = 'Your requested query response is too long. You might have made a mistake. Please revise your reasoning and query.'
                is_query_success = False
            config['messages'].append({"role": "user", "content": query_response})

            invalid_q = 0 if is_query_success else invalid_q + 1
            if invalid_q >= self.max_invalid_q_per_step:
                assert config['messages'][-1]['role'] == 'user', 'The last message should be from the user'
                skip_query_str = 'Please revisit the problem statement and your reasoning. If you think this step is correct, solve it yourself and continue the next step. Otherwise, correct this step.'
                config['messages'][-1]['content'] = skip_query_str
                save_message_to_file(f'****: Replacing {query_response}****\n')
                invalid_q = 0

            save_message_to_file('user: {a}{s}'.format(a=config['messages'][-1]['content'], s=separate_line))
        save_message_to_file('Solution: ' + problem['solution'])

        return {
            'valid_q_count': query_handler.valid_q_count,  # number of valid queries

Contributor: If len(query_response) > 2000, is it a valid query or not?

Contributor (Author): It is a valid query (it has a result) but not a good query (the result is not the desired one).

            'total_q_count': query_handler.total_q_count,
            'is_valid_reply': is_valid_reply,  # whether the assistant can give a valid reply
            'response_with_ans': response_with_ans,
            'ans': remove_boxed(last_boxed_only_string(response_with_ans)),
            'messages': config['messages'],
            'round': rr + 1,
            'cost': total_cost,
        }


    def str_splitter(self, string, length=130):
        """
        Add '\n' every 'length' characters to make the output more readable.
        If at 'length' there is a word, add '\n' before the word.

        Args:
            string (str): The input string to be processed.
            length (int): The maximum number of characters in a line before adding a newline.

        Returns:
            str: The processed string with newlines added.
        """

        words = string.split(' ')
        current_line = []
        current_length = 0
        result = []

        for word in words:
            if current_length + len(word) + len(current_line) > length:
                result.append(' '.join(current_line))
                current_line = []
                current_length = 0

            current_line.append(word)
            current_length += len(word)

        if current_line:
            result.append(' '.join(current_line))

        return '\n'.join(result)
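
    # Editor's aside (not part of the PR): str_splitter never reads `self`, so it
    # could be a @staticmethod. Illustrative behavior, wrapping near `length` chars:
    #   MathSolver.str_splitter(None, 'a b c d e', length=3)
    #   returns 'a b\nc d\ne'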


    def solve_one_category(self, problem_set, saving_folder):
        """
        Solve all problems in a category.
        Assumption 1: all problems are of the same type.
        Assumption 2: when resuming from a previous run, the sequence of problems is the same as in the previous run (same shuffling seed).

        Args:
            problem_set (list): a list of problems
            saving_folder (str): the result folder to save the solved problems; the category folder will be created inside

        Returns:
            None
        """

        # assume all problems are of the same type; TODO: ensure this assumption
        saving_folder = os.path.join(saving_folder, math_type_mapping[problem_set[0]['type']])
        # mkdir if not exist
        os.makedirs(saving_folder, exist_ok=True)

        # load solved problems from the saving folder
        done_problems = set([int(f.split('.')[0]) for f in os.listdir(saving_folder) if 'json' in f])

        correct_counts = 0
        for count, problem in enumerate(problem_set):
            problem_path = os.path.join(saving_folder, problem['problem_id'] + '.json')

            # 1. if the problem is already solved, continue
            if int(problem['problem_id']) in done_problems:
                problem = json.load(open(problem_path, 'r'))
                correct_counts += problem['is_correct']
                print(f'{count}: {correct_counts}/{count+1} successes. valid response: {problem["is_valid_reply"]}, correct: {problem["is_correct"]}, {problem["round"]} rounds. (This problem is loaded from a previous run.)')
                continue

            # 2. solve the problem
            result = self.make_conversation(problem, file_to_be_saved=os.path.join(saving_folder, problem['problem_id'] + '.txt'))
            metrics = eval_math_responses([result['response_with_ans']], problem['solution'])

            # 3. save the result
            correct_ans = remove_boxed(last_boxed_only_string(problem['solution']))
            problem.update({
                'is_valid_reply': result['is_valid_reply'],
                'is_correct': bool(metrics['success_vote']),
                'correct_ans': correct_ans,
                'voted_answer': remove_boxed(last_boxed_only_string(metrics['voted_answer'])),
                'round': result['round'],
                'valid_q_count': result['valid_q_count'],  # total number of valid queries
                'total_q_count': result['total_q_count'],  # total number of queries
                'cost': result['cost'],  # total cost of the conversation
                'messages': result['messages'],  # the conversation
            })
            write_json(problem, problem_path)

            # 4. continue to the next problem
            correct_counts += problem['is_correct']
            print(f'Problem {problem["problem_id"]}. Is Valid: {problem["is_valid_reply"]}, Is Correct: {bool(problem["is_correct"])}, Conversation Round: {problem["round"]}, Accum Successes: {correct_counts}/{count+1}')

        tp = problem_set[0]['type']
        print(f'{tp} correct rate: {correct_counts}/{len(problem_set)} = {correct_counts/len(problem_set)}')
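
A hedged usage sketch (editor's addition, not part of the PR): the problem dicts carry the fields the code above reads -- 'problem', 'type', 'problem_id', and 'solution'. The model name, category string, and results folder below are illustrative assumptions.

solver = MathSolver(model='gpt-4', prompt_type='python', max_round=10)
problem_set = [
    {
        'problem_id': '0',
        'type': 'Algebra',  # assumed to be a key of math_type_mapping
        'problem': 'What is $1+1$?',
        'solution': '$1+1=\\boxed{2}$',
    }
]
solver.solve_one_category(problem_set, saving_folder='./results')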


90 changes: 90 additions & 0 deletions flaml/autogen/math_solver/MathVoting.py
Contributor: I'm skipping this file for now.
@@ -0,0 +1,90 @@
from QueryHandler import QueryHandler
from flaml.autogen.math_utils import (
    eval_math_responses,
    remove_boxed,
    last_boxed_only_string,
    write_json,
    remove_asy_sections,
    math_type_mapping,
)
from flaml import oai
import os
import json
import re
import copy


class SelfConsistency:
    def __init__(self, n=10, n_per_time=5, cache_folder='.cache'):
        self.n = n
        self.n_per_time = n_per_time
        self.start_seed = 41
        self.cache_folder = cache_folder

    def vanilla_voting(self, accum_responses, solution):
        if isinstance(accum_responses[0], dict):
            accum_responses = [r['response_with_ans'] for r in accum_responses]
        return eval_math_responses(accum_responses, solution)

    def early_stop_voting(self, accum_responses):
        # TODO: not implemented yet
        if isinstance(accum_responses[0], dict):
            accum_responses = [r['response_with_ans'] for r in accum_responses]
        pass

    def sequential_reasoning_path_sampling(self, problem, saving_folder, solving):
        """
        Sample reasoning paths sequentially, caching partial results to disk.

        Args:
            problem (dict): problem dict
            saving_folder (str): saving folder
            solving (function): solver function, either MathSolver.make_conversation or a vanilla prompt

        return from vanilla prompt: {
            'responses': responses,
            'cost': oai.ChatCompletion.cost(model, raw_responses),
            'prompt_cost': oai.ChatCompletion.price1K(model, 0) * raw_responses["usage"]["prompt_tokens"] / 1000
        }

        return from math solver: {
            'valid_q_count': query_handler.valid_q_count,  # number of valid queries
            'total_q_count': query_handler.total_q_count,
            'is_valid_reply': is_valid_reply,  # whether the assistant can give a valid reply
            'response_with_ans': response_with_ans,
            'ans': ans,
            'messages': config['messages'],
            'round': len(config['messages']) // 2 + 1,
            'cost': total_cost,
        }
        """
        accum_responses = []  # can be a list of dicts (for MathSolver) or a list of strings
        accum_cost = 0
        file = os.path.join(saving_folder, 'responses_' + problem['problem_id'] + '.json')
        if os.path.exists(file):
            saved = json.load(open(file, 'r'))
            accum_responses = saved['responses']
            accum_cost = saved['cost']

        query_count = len(accum_responses)
        tmp_n = self.n_per_time
        while query_count < self.n:
            oai.ChatCompletion.set_cache(seed=self.start_seed + query_count, cache_path=self.cache_folder)
            tmp_n = min(tmp_n, self.n - query_count)  # do not sample more than n paths in total

            responses = solving(problem=problem, n=tmp_n)

            if 'responses' in responses.keys():
                accum_responses.extend(responses['responses'])
                if query_count != 0:
                    accum_cost -= responses['prompt_cost']  # if not the first round, deduct the prompt cost
            else:  # the response comes from the math solver, a single response
                accum_responses.extend([responses])

            accum_cost += responses['cost']
            # save the responses each time
            write_json(
                {
                    'cost': accum_cost,
                    'true_ans': remove_boxed(last_boxed_only_string(problem['solution'])),
                    'answers': [remove_boxed(last_boxed_only_string(r)) for r in accum_responses],
                    'responses': accum_responses,
                },
                file,
            )

            query_count += tmp_n

        # TODO: cost calculation: should the prompt cost for each round be counted?
        return {'responses': accum_responses, 'cost': accum_cost}
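
A hedged wiring sketch (editor's addition, not part of the PR): sequential_reasoning_path_sampling expects a `solving` callable that takes keyword arguments `problem` and `n`, so MathSolver.make_conversation can be adapted with a small wrapper. Names and values are illustrative; the math-solver path returns one response per call, so sampling one path at a time keeps the accounting simple.

sc = SelfConsistency(n=10, n_per_time=1)
solver = MathSolver(model='gpt-4', prompt_type='python')


def solving(problem, n=1):
    # adapt make_conversation to the (problem=..., n=...) interface
    return solver.make_conversation(problem, n=n)


# `problem` is a problem dict like the one sketched after MathSolver above
result = sc.sequential_reasoning_path_sampling(problem, './results', solving)
metrics = sc.vanilla_voting(result['responses'], problem['solution'])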



