Replies: 4 comments
-
Think about what you are asking. If the output is text, splitting it into two responses (because of a content limit or whatever reason) makes sense. If the output is an object, what would splitting it into two responses even mean? Neither Instructor nor Pydantic expects an object to arrive in "pieces". I think that's the case, but I could be wrong. See #566
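To make the "pieces" point concrete, here's a minimal sketch (the `Report` schema is illustrative, assuming Pydantic v2): a structured response that gets cut off mid-generation is not valid JSON, so there is no partial object to validate.

```python
from pydantic import BaseModel, ValidationError

class Report(BaseModel):
    title: str
    findings: list[str]

# A response cut off mid-generation is not valid JSON, so there is no
# well-defined "first piece" of the object to validate.
truncated = '{"title": "Q3 summary", "findings": ["revenue up", "churn do'

try:
    Report.model_validate_json(truncated)
except ValidationError as exc:
    print(exc)  # Pydantic rejects the incomplete JSON outright
```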
-
Maybe I was unclear in my description: the issue in this case happens when producing a list or an iterable of smaller objects. If the number of those objects is high, you can still reach the output context window limit.
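For concreteness, here's a hedged sketch of that failure mode (the model name and `LineItem` schema are illustrative; `instructor.from_openai` follows recent instructor versions):

```python
import instructor
from openai import OpenAI
from pydantic import BaseModel

class LineItem(BaseModel):
    sku: str
    description: str

client = instructor.from_openai(OpenAI())

# Each LineItem is tiny, but asking for hundreds of them in one shot can
# exceed the output window; instructor then raises IncompleteOutputException.
items = client.chat.completions.create(
    model="gpt-4-turbo",  # illustrative model name
    response_model=list[LineItem],
    messages=[
        {"role": "user", "content": "Extract every line item from this 40-page invoice: ..."},
    ],
)
```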
-
this is tough, not sure what the best path forward is. can you provide a sketch? just API-level, what the DX looks like?
-
@jxnl the thought was to use a similar design to the retry logic, but check the streamed completion chunks for a "length" finish reason and re-prompt with the partial output so far (rough sketch below). Testing this manually, it works well for a few iterations, and then all the models start to repeat themselves regardless of the complexity of the requested objects. I'm a little puzzled by that result, and it has me questioning whether this would even be a useful feature.
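Something like this, as a rough sketch against the raw OpenAI SDK (the `max_continues` knob and the continuation prompt wording are hypothetical, not existing instructor API):

```python
from openai import OpenAI

client = OpenAI()

def stream_until_complete(messages, max_continues=3, **kwargs):
    """Accumulate streamed chunks; when the stream ends with
    finish_reason == "length", re-prompt the model to continue."""
    collected = []
    for _ in range(max_continues + 1):
        partial = []
        finish_reason = None
        stream = client.chat.completions.create(messages=messages, stream=True, **kwargs)
        for chunk in stream:
            choice = chunk.choices[0]
            if choice.delta.content:
                partial.append(choice.delta.content)
            if choice.finish_reason is not None:
                finish_reason = choice.finish_reason
        collected.extend(partial)
        if finish_reason != "length":
            # Complete output: join and hand off to JSON parsing / validation.
            return "".join(collected)
        # Cut off by the token limit: feed the partial output back and ask
        # the model to pick up exactly where it stopped.
        messages = messages + [
            {"role": "assistant", "content": "".join(partial)},
            {"role": "user", "content": "Continue exactly where you left off. Do not repeat earlier output."},
        ]
    raise RuntimeError("output still incomplete after max_continues continuations")

# e.g. stream_until_complete(messages, model="gpt-4-turbo")  # illustrative model name
```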
-
Hi, I've been running into an issue for a while now, and it looks like a good opportunity for an instructor feature.
Most of the supported models return up to 4096 output tokens, and if the model would return more than that and indicates it stopped due to "length" (token length), an `IncompleteOutputException` is rightfully thrown. For OpenAI and Anthropic (I haven't tested other models), it's possible to prompt the model to continue providing structured output where it left off, as long as the user includes the progress so far in the prompt. It may take one or more passes to retrieve the entire output. This happens when the context window is on the longer side, which a lot of newer models support, but the list of items to return is too long to fit into the output window.
I'd be glad to take a stab at a contribution here, but want to know whether it would be useful to provide an option to prompt the LLM API to "continue" up to x number of times, similar to the retry logic (sketch of the DX below).
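The option could mirror the existing `max_retries` knob; `max_continues` here is hypothetical, not an existing instructor parameter, and the model name and schema are illustrative:

```python
# Hypothetical DX, mirroring instructor's existing max_retries parameter.
items = client.chat.completions.create(
    model="gpt-4-turbo",            # illustrative model name
    response_model=list[LineItem],  # illustrative schema
    max_retries=3,                  # existing: re-ask after validation errors
    max_continues=2,                # hypothetical: re-prompt "continue" on length stops
    messages=[{"role": "user", "content": "Extract every line item from the invoice: ..."}],
)
```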