Ollama client (from adalflow.components.model_client.ollama_client) does not work with stream=True #299
Comments
I can work on this, @liyin2015. After going through the code, I think the issue is in the way adalflow processes the streamed output. I made the following changes to _post_call, which seem to solve it. Let me know if this looks good and I can open a PR.

    # Requires `import types` at module level (for types.GeneratorType).
    def _post_call(self, completion: Any) -> GeneratorOutput:
        r"""Get string completion and process it with the output_processors."""
        # parse chat completion will only fill the raw_response
        log.debug("in post call")
        output: GeneratorOutput = self.model_client.parse_chat_completion(completion)
        # Streaming case: the client returned a generator of per-chunk outputs
        if isinstance(output, types.GeneratorType):

            def processed_generator():
                """Process each chunk dynamically."""
                try:
                    for raw_output in output:
                        log.debug(f"Processing raw chunk: {raw_output.raw_response}")
                        processed_chunk = raw_output.raw_response
                        if self.output_processors and processed_chunk:
                            try:
                                processed_chunk = self.output_processors(processed_chunk)
                            except Exception as e:
                                log.error(f"Error processing the output processors: {e}")
                                yield GeneratorOutput(
                                    data=None,
                                    raw_response=raw_output.raw_response,
                                    error=str(e),
                                )
                                continue
                        yield GeneratorOutput(
                            data=processed_chunk,
                            raw_response=raw_output.raw_response,
                            error=None,
                        )
                except Exception as e:
                    log.error(f"Error while streaming processed chunks: {e}")
                    yield GeneratorOutput(error=str(e), raw_response=None)

            # Return a new GeneratorOutput with the processed generator
            return GeneratorOutput(data=processed_generator(), raw_response=output)

        # Non-streaming case: now adding the data field to the output
        data = output.raw_response
        if self.output_processors:
            if data:
                try:
                    data = self.output_processors(data)
                    output.data = data
                except Exception as e:
                    log.error(f"Error processing the output processors: {e}")
                    output.error = str(e)
        else:
            output.data = data
        return output

I tested it with the following:

    ollama_ai = {
        "model_client": OllamaClient(host=host),
        "model_kwargs": {
            "model": "phi3:latest",
            "stream": True,
        },
    }
    generator = Generator(**ollama_ai)
    output = generator({"input_str": "What is the capital of France?"})
    # With stream=True, output.data is a generator of per-chunk outputs
    for chunk in output.data:
        print(chunk.data, end="", flush=True)
    # for stream: False
    # print(output.data)
Thanks, @BalasubramanyamEvani. I added the else clause to the if to make it clearer. Code below:
I did a round of testing and it works. Regards,
Bug description
The Ollama client does not work with stream set to True. The Ollama client in adalflow.components.model_client.ollama_client has the following method to handle streaming:
Because the method uses yield, the caller needs a loop to get all the tokens. Is there a reason to use yield instead of return?
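To illustrate the question, here is a minimal sketch in plain Python. The function names `stream_tokens` and `joined_tokens` are hypothetical and not part of adalflow. A function that contains yield returns a generator object, so nothing is produced until the caller iterates over it, whereas a function that uses return hands back a finished value.

```python
import types
from typing import Iterator, List


def stream_tokens(tokens: List[str]) -> Iterator[str]:
    """Hypothetical streaming function: yields one token at a time."""
    for token in tokens:
        yield token


def joined_tokens(tokens: List[str]) -> str:
    """Hypothetical non-streaming function: returns the full text at once."""
    return "".join(tokens)


tokens = ["Par", "is", "."]

stream = stream_tokens(tokens)
# Calling the function only creates a generator; no token has been produced yet.
is_generator = isinstance(stream, types.GeneratorType)

# The caller has to drain the generator (a for loop, "".join, list, ...).
collected = "".join(chunk for chunk in stream)

# The return-based version gives the complete string directly.
full_text = joined_tokens(tokens)
```

Note that a generator can only be consumed once: after the join above, iterating over `stream` again yields nothing.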
There are two ways to go about solving this:
SOLUTION 1
Change yield to return.
SOLUTION 2
Change the parse_chat_completion method to collect all the tokens and then return the GeneratorOutput.
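A rough sketch of what Solution 2 could look like, under stated assumptions: `SketchOutput` is a hypothetical stand-in for adalflow's GeneratorOutput, `parse_stream_eagerly` is a made-up name, and the chunk shape (`chunk["message"]["content"]`) is assumed to match what the Ollama chat stream returns. It is not the library's actual implementation.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, Optional


@dataclass
class SketchOutput:
    """Hypothetical stand-in for adalflow's GeneratorOutput."""
    data: Optional[str] = None
    raw_response: Optional[str] = None
    error: Optional[str] = None


def parse_stream_eagerly(chunks: Iterable[Dict[str, Any]]) -> SketchOutput:
    """Drain the whole stream, then return a single output object."""
    try:
        # Assumed chunk shape: {"message": {"content": "<token text>"}}
        text = "".join(chunk["message"]["content"] for chunk in chunks)
    except Exception as e:  # e.g. a chunk with an unexpected shape
        return SketchOutput(error=str(e))
    return SketchOutput(raw_response=text)


# Fake streams used in place of a real Ollama response.
fake_stream = iter([
    {"message": {"content": "Par"}},
    {"message": {"content": "is"}},
])
result = parse_stream_eagerly(fake_stream)
bad = parse_stream_eagerly(iter([{"unexpected": "shape"}]))
```

The trade-off with this approach is that the caller receives nothing until the model has finished, so the incremental display that stream=True is meant to provide is lost.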
One thing to remember is that for the async implementation we would have to create async_parse_chat_completion, because the parse_chat_completion method would not work for asynchronous calls.
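The reason a separate async method is needed can be shown with a small sketch using only the standard library. The names `fake_async_stream`, `collect_async`, and `sync_for_fails` are hypothetical. An async stream must be consumed with `async for` inside a coroutine; a plain `for` loop, as a synchronous parser would use, raises TypeError.

```python
import asyncio
from typing import AsyncIterator, List


async def fake_async_stream(tokens: List[str]) -> AsyncIterator[str]:
    """Hypothetical async token stream standing in for an async client response."""
    for token in tokens:
        await asyncio.sleep(0)  # hand control back to the event loop
        yield token


async def collect_async(stream: AsyncIterator[str]) -> str:
    """Drain an async stream with `async for` and return the joined text."""
    parts = []
    async for token in stream:
        parts.append(token)
    return "".join(parts)


def sync_for_fails(stream) -> bool:
    """Return True if a plain `for` loop over the stream raises TypeError."""
    try:
        for _ in stream:
            pass
    except TypeError:
        return True
    return False


async_text = asyncio.run(collect_async(fake_async_stream(["Par", "is"])))
sync_loop_rejected = sync_for_fails(fake_async_stream(["x"]))
```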
@liyin2015 Once this has been reviewed and confirmed as an issue, I will go ahead and raise a PR for the implementation.
Regards,
What version are you seeing the problem on?
How to reproduce the bug