Bugfix/kickoff hangs when llm call fails #1943

Merged: 10 commits merged into main on Jan 22, 2025

Conversation

@bhancockio (Collaborator) commented on Jan 21, 2025

Pretty important bug fix for OSS:

Root issue:

  • We are not properly handling the case where LiteLLM fails to make a call because it doesn't have the proper API keys.
  • As a result, the crew retries the same LLM call until it hits the max iteration limit (20 by default).

Issues for users:

Solution:

  • We properly handle LiteLLM exceptions now and exit early.

Closes #1934

@joaomdmoura (Collaborator) commented:

Disclaimer: This review was made by a crew of AI Agents.

Code Review Comment for PR #1943

Overview

This PR significantly improves the handling of authentication errors and enhances logging mechanisms in the CrewAI codebase, specifically related to LiteLLM integration and error management.

Key Code Improvements

1. Logging Enhancements

In the current code, there are several print statements for debugging purposes. This approach is not suitable for production-level code:

Current Implementation:

print("Authentication error: Please check your API credentials")

Suggested Improvement:
Transition to the Python logging framework:

import logging

logging.error("Authentication error: Please check your API credentials")

This change will allow for different logging levels and better control over log output.
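For instance, keeping a named, module-level logger gives that control in one place. A minimal sketch (the function name is illustrative, not from the codebase):

import logging

# A named, module-level logger lets output be filtered or routed per module.
logger = logging.getLogger(__name__)

def report_auth_failure(provider: str) -> None:
    # Unlike print(), this respects the configured logging level and handlers.
    logger.error("Authentication error: check your %s API credentials", provider)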


2. Error Handling Improvement

The nested try-except structures could lead to decreased readability and maintenance challenges. It's advisable to simplify the error handling mechanism:

Current Implementation:

try:
    # operation
except LiteLLMAuthenticationError as auth_error:
    # handle auth error
except Exception as e:
    # handle general error

Suggested Improvement:
Encapsulate proper error-handling logic to reduce nesting:

def _handle_errors(self):
    try:
        # operation
    except LiteLLMAuthenticationError:
        self._handle_auth_error()
    except Exception as e:
        self._handle_generic_error(e)

This modular approach improves readability and can be reused in other parts of the code.
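As a rough, self-contained illustration of the same fail-fast idea this PR implements (assuming LiteLLMAuthenticationError is litellm's AuthenticationError; the wrapper name is hypothetical):

from litellm.exceptions import AuthenticationError as LiteLLMAuthenticationError

def invoke_llm_once(call_llm):
    """Run a single LLM call and fail fast on authentication problems."""
    try:
        return call_llm()
    except LiteLLMAuthenticationError:
        # Credentials will not fix themselves between retries, so surface
        # the problem immediately instead of looping to the iteration limit.
        raise
    except Exception as exc:
        # Other failures may be transient and can go through the normal retry path.
        raise RuntimeError(f"LLM call failed: {exc}") from exc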


3. Magic Constants

The code currently makes use of hard-coded strings, which should be defined as constants for maintainability:

Current Implementation:

self._printer.print(content="Authentication error with litellm occurred", color="red")

Suggested Improvement:
Define these as class constants:

class CrewAgentExecutor:
    ERROR_COLOR = "red"
    AUTH_ERROR_MESSAGE = "Authentication error with litellm occurred."

This practice fosters easier updates and consistency in error messaging.
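Usage at the call sites then stays consistent, for example:

self._printer.print(content=self.AUTH_ERROR_MESSAGE, color=self.ERROR_COLOR)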


Links to Historical Context and Learnings

While I couldn't fetch related PRs, it is important to note that previous pull requests have highlighted the need for consistency in error handling and logging strategy, as evidenced by the ongoing pattern of switching from print statements to a more robust logging framework.

Lessons learned from earlier PRs emphasize the significance of structured error management and a cohesive logging strategy that can greatly enhance debugging capabilities and system transparency.

Specific Recommendations

  1. Adopt a Consistent Logging Framework: Replace all debugging print statements with the appropriate logging levels to ensure production readiness.
  2. Enhance Documentation: Incorporate detailed docstrings for all new methods, including parameters, return types, and exceptional scenarios that may arise.
  3. Testing Enhancements: Implement unit tests for newly introduced error handling scenarios, and include integration tests focused specifically on LiteLLM authentication flows (see the sketch after this list).
  4. Centralized Configuration Management: Extract configurable parameters into a configuration file, allowing for easy modifications and environment-specific configurations.
  5. Security Review: Ensure sensitive information does not get logged, particularly around authentication errors, to mitigate risks of data leaks.
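For recommendation 3, a minimal test along these lines would pin down the fail-fast behavior (a sketch only, not the actual test added in this PR; it assumes an OpenAI-backed agent and forces an invalid key):

import pytest
from crewai import Agent, Crew, LLM, Task

def test_kickoff_fails_fast_on_bad_api_key(monkeypatch):
    # Force an authentication failure by pointing LiteLLM at an invalid key.
    monkeypatch.setenv("OPENAI_API_KEY", "invalid-key")

    agent = Agent(
        role="tester",
        goal="test goal",
        backstory="test backstory",
        llm=LLM(model="gpt-4"),
        max_retry_limit=0,  # mirrors the test in this PR
    )
    task = Task(description="say hi", expected_output="a greeting", agent=agent)
    crew = Crew(agents=[agent], tasks=[task])

    # Before this fix the loop would retry until the max-iteration limit;
    # now kickoff should raise promptly instead of hanging.
    with pytest.raises(Exception):
        crew.kickoff()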

Conclusion

Overall, the adjustments made in PR #1943 enhance code maintainability, reliability, and production readiness. Addressing the outlined recommendations will promote better practices and a higher quality codebase as we continue to develop the CrewAI project. This thoughtful enhancement and continued focus on code quality will pave the way for a robust error handling and logging framework that supports our ongoing efforts in machine learning and AI.

Comment on lines +265 to +267
if isinstance(e, LiteLLMAuthenticationError):
    # Do not retry on authentication errors
    raise e

Nice! When we raise, should we create a colored logger to make it clear that no keys were provided?

@@ -145,10 +149,40 @@ def _invoke_loop(self):
    if self._is_context_length_exceeded(e):
        self._handle_context_length()
        continue
    elif self._is_litellm_authentication_error(e):

Nice, you did it here! Beautiful!

goal="test goal",
backstory="test backstory",
llm=LLM(model="gpt-4"),
max_retry_limit=0, # Disable retries for authentication errors

Shouldn't this work without doing this? As in, if that error happens, should we drop max_retry_limit to 0?

@bhancockio merged commit 67f0de1 into main on Jan 22, 2025
4 checks passed
Linked issue: [BUG] kickoff hangs when LLM call fails