Implement Failover Mechanism for Critical Dependencies to Ensure 99% Uptime #113

Open
SverreNystad opened this issue May 8, 2024 · 1 comment
Assignees
Labels
enhancement New feature or request

Description

To meet the Availability quality requirement A1, which states: "System uptime must be 99%, with capabilities to handle critical operations around the clock," we need to address the uptime dependencies of TutorAI on our commercial off-the-shelf (COTS) solutions, specifically OpenAI and MongoDB.

Current Issue

  1. OpenAI:

    • Uptime Guarantee: OpenAI does not provide a Service Level Agreement (SLA) guaranteeing any specific uptime.
    • Track Record: OpenAI does not consistently achieve 99% uptime.
    • Impact: Without a failover mechanism, any downtime from OpenAI directly affects TutorAI's availability.
  2. MongoDB:

    • Uptime Guarantee: MongoDB provides an SLA guaranteeing at least 99% uptime (as per their SLA documentation).
    • Impact: Despite the SLA, downtime would still disrupt major functionalities of TutorAI.
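
To make the impact concrete, here is a rough sketch of the arithmetic. It assumes TutorAI needs both dependencies to be up at the same time and that they fail independently. The 99% figures are illustrative, not measured values, and the function names are made up for this example. Under those assumptions, two dependencies at exactly 99% each give 0.99 × 0.99 = 0.9801, which is already below the A1 target.

```python
from math import prod

HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def serial_availability(availabilities):
    """Availability of a system that needs every dependency to be up.

    Assumes the dependencies fail independently of each other.
    """
    return prod(availabilities)


def downtime_hours_per_year(availability):
    """Expected yearly downtime implied by an availability fraction."""
    return (1 - availability) * HOURS_PER_YEAR


# Illustrative figures only: both dependencies sitting exactly at 99%.
combined = serial_availability([0.99, 0.99])
print(f"combined availability: {combined:.4f}")
print(f"downtime budget at 99%: {downtime_hours_per_year(0.99):.1f} h/year")
```

A 99% target leaves a budget of roughly 87.6 hours of downtime per year for the whole system, so any outage in a dependency that has no fallback is spent directly from that budget.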

Proposed Solution

To ensure TutorAI meets its uptime requirement, we must implement a failover mechanism for both OpenAI and MongoDB:

  1. For OpenAI:

    • Develop a failover system that automatically switches API calls to an alternative Large Language Model (LLM) provider or model, such as Gemini, Claude, Llama, or Grok, whenever OpenAI is unavailable.
  2. For MongoDB:

    • Implement a fallback solution for critical database operations. This could involve setting up a secondary database system or utilizing a distributed database architecture to minimize downtime impact.
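
The proposal above could start from something like the following minimal sketch. It is not TutorAI's actual code: `call_with_failover`, `AllBackendsFailed`, and the two stand-in provider functions are hypothetical names, and no real provider SDK is called. The idea is to try each backend in priority order and return the first successful result.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


class AllBackendsFailed(Exception):
    """Raised when every configured backend raised an error."""


def call_with_failover(
    backends: Sequence[tuple[str, Callable[[], T]]],
) -> tuple[str, T]:
    """Try each (name, operation) pair in order and return the first success."""
    errors: list[str] = []
    for name, operation in backends:
        try:
            return name, operation()
        except Exception as exc:  # a real system would catch narrower error types
            errors.append(f"{name}: {exc}")
    raise AllBackendsFailed("; ".join(errors))


# Hypothetical stand-ins for real clients (no network calls are made here).
def primary_llm() -> str:
    raise ConnectionError("primary LLM provider unavailable")


def secondary_llm() -> str:
    return "answer from secondary provider"


used, answer = call_with_failover(
    [("primary", primary_llm), ("secondary", secondary_llm)]
)
print(used, "->", answer)
```

The same wrapper could in principle guard a critical database read, with a secondary store as the second entry. For MongoDB specifically, replica sets already elect a new primary automatically when the current one fails, so that may be worth evaluating before building a custom fallback.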

Action Items

  • Research and Integration:
    • Evaluate potential LLM providers (Gemini, Claude, Llama, Grok) for compatibility and performance.
    • Develop and test the failover mechanism to switch between LLM providers seamlessly.
  • Database Fallback Solutions:
    • Identify suitable fallback strategies for MongoDB.
    • Implement and test the chosen database failover solution.

Conclusion

Implementing these failover mechanisms is crucial to ensuring that TutorAI can achieve the required 99% uptime, thus maintaining reliable operations around the clock despite potential downtime from our COTS dependencies.
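
As a rough check on why failover helps, the sketch below estimates the availability of a primary backend paired with one fallback. It assumes the two fail independently and that switching is instantaneous, which real systems only approximate. The 98% inputs are illustrative numbers, not measurements of any provider, and the function name is made up for this example.

```python
def availability_with_failover(primary: float, fallback: float) -> float:
    """Probability that at least one of two independent backends is up."""
    return 1 - (1 - primary) * (1 - fallback)


# Illustrative figures, not measured values.
estimate = availability_with_failover(0.98, 0.98)
print(f"{estimate:.4f}")
```

Under these assumptions, two backends that each fall short of 99% on their own would together clear the target, since both have to be down at once for the feature to fail.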

@SverreNystad SverreNystad added the enhancement New feature or request label May 8, 2024
@SverreNystad SverreNystad self-assigned this May 8, 2024
@SverreNystad SverreNystad changed the title Add several TextGenerator implementations Implement Failover Mechanism for Critical Dependencies to Ensure 99% Uptime May 8, 2024
@SverreNystad
Member Author

The Chain of Responsibility pattern could be useful for handling the failover:
Chain of Responsibility is a behavioral design pattern that passes a request along a chain of potential handlers until one of them handles it. Read more here
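
A minimal sketch of how the pattern might look for text generation follows. The class names (`TextGeneratorHandler`, `UnavailableGenerator`, `EchoGenerator`) are hypothetical and only loosely borrow the "TextGenerator" wording from this issue's earlier title. The two concrete handlers are stand-ins that make no network calls.

```python
from abc import ABC, abstractmethod
from typing import Optional


class TextGeneratorHandler(ABC):
    """One link in the chain; passes the prompt on if it cannot answer."""

    def __init__(self, successor: Optional["TextGeneratorHandler"] = None) -> None:
        self.successor = successor

    def handle(self, prompt: str) -> str:
        try:
            return self.generate(prompt)
        except Exception:
            # Re-raise if this was the last handler in the chain.
            if self.successor is None:
                raise
            return self.successor.handle(prompt)

    @abstractmethod
    def generate(self, prompt: str) -> str:
        ...


class UnavailableGenerator(TextGeneratorHandler):
    """Stand-in for a provider that is currently down."""

    def generate(self, prompt: str) -> str:
        raise ConnectionError("provider is down")


class EchoGenerator(TextGeneratorHandler):
    """Stand-in for a working fallback provider."""

    def generate(self, prompt: str) -> str:
        return f"echo: {prompt}"


chain = UnavailableGenerator(successor=EchoGenerator())
print(chain.handle("What is failover?"))
```

Each handler only knows about its successor, so adding or reordering providers means rebuilding the chain rather than editing the calling code.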
