Health Check API #43

Open
Amruth-Vamshi opened this issue Dec 19, 2024 · 0 comments

Health Check API Documentation

Overview

The Health Check API provides a single place to monitor the availability and functionality of every critical service the chatbot application depends on, covering both internal systems and external APIs, so that issues can be detected and resolved promptly.

Purpose

The /health API is used to:

  • Verify that all dependencies (databases, APIs, etc.) are operational.
  • Detect issues early and log failures with detailed diagnostics.
  • Provide actionable data for system administrators to resolve issues quickly.

Monitored Services

Internal Services

  1. Postgres (Database)

    • Status: Connectivity check.
    • Impact: Telemetry and other database-dependent features will fail if unavailable.
  2. Redis (Cache System)

    • Status: Set/Get operation validation.
    • Impact: Query response times may increase if unavailable.

External APIs

  1. Bhashini

    • Services Checked:
      • Text Translation
      • Speech-to-Text
      • Text-to-Speech
      • Language Detection
    • Impact: Translation, speech services, and language identification will fail.
  2. Wadhwani LLM

    • Services Checked:
      • LLM-based responses to non-payment questions.
    • Impact: Chatbot will fail to answer general (non-payment) user queries.
  3. PM Kisan

    • Services Checked:
      • OTP Sending and Verification
      • User Details Retrieval
    • Impact: PM Kisan-related features such as OTP authentication will fail.

API Endpoints

1. GET /health

Description

Aggregates health statuses for all monitored services.

Success Response

{
  "status": "ok",
  "info": {
    "Postgres": {
      "status": { "isAvailable": true },
      "name": "Postgres",
      "type": "internal",
      "impactMessage": "Telemetry will not work",
      "sla": {
        "timeForResolutionInMinutes": 60,
        "priority": 0
      }
    },
    "Redis": {
      "status": { "isAvailable": true },
      "name": "Redis",
      "type": "internal",
      "impactMessage": "Query time of API's will be slower",
      "sla": {
        "timeForResolutionInMinutes": 60,
        "priority": 0
      }
    },
    "bhashini": {
      "status": { "isAvailable": true },
      "name": "bhashini",
      "type": "external",
      "impactMessage": "Bhashini services (translate, transliterate, t2s, s2t, language detection) will not work",
      "sla": {
        "timeForResolutionInMinutes": -1,
        "priority": 0
      }
    },
    "Wadhwani": {
      "status": { "isAvailable": true },
      "name": "Wadhwani LLM",
      "type": "external",
      "impactMessage": "Chatbot will not be able to answer general questions",
      "sla": {
        "timeForResolutionInMinutes": 120,
        "priority": 1
      }
    },
    "PM Kisan": {
      "status": { "isAvailable": true },
      "name": "PM Kisan",
      "type": "external",
      "impactMessage": "PM Kisan OTP verification and user details retrieval will not work",
      "sla": {
        "timeForResolutionInMinutes": 120,
        "priority": 1
      }
    }
  },
  "error": {},
  "details": {}
}
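
The response above can be described with a few types. The following is a minimal sketch, with field names and value sets inferred only from the sample responses in this issue; the interface names and the `isHealthy` helper are hypothetical and not part of the project.

```typescript
// Shapes inferred from the sample responses in this issue; not an official schema.
interface Sla {
  timeForResolutionInMinutes: number; // the Bhashini sample uses -1; its meaning is not documented here
  priority: number;
}

interface ServiceHealth {
  status: { isAvailable: boolean };
  name: string;
  type: "internal" | "external";
  impactMessage: string;
  error?: string; // only present for failed services
  sla: Sla;
}

interface HealthResponse {
  status: "ok" | "error";
  info: Record<string, ServiceHealth>;
  error: Record<string, ServiceHealth>;
  details: Record<string, unknown>;
}

// Hypothetical helper: healthy means status "ok" and nothing listed under "error".
function isHealthy(response: HealthResponse): boolean {
  return response.status === "ok" && Object.keys(response.error).length === 0;
}

// Trimmed-down version of the success response above.
const sampleOk: HealthResponse = {
  status: "ok",
  info: {
    Postgres: {
      status: { isAvailable: true },
      name: "Postgres",
      type: "internal",
      impactMessage: "Telemetry will not work",
      sla: { timeForResolutionInMinutes: 60, priority: 0 },
    },
  },
  error: {},
  details: {},
};

console.log(isHealthy(sampleOk)); // true
```

Typing `type` as a union of the two observed values is a design choice for this sketch; if other values exist in practice, a plain `string` would be safer.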

Error Response

When one or more services fail, the /health API returns an error response with details of the failed services.

Example Error Response

{
  "status": "error",
  "info": {},
  "error": {
    "Redis": {
      "status": { "isAvailable": false },
      "name": "Redis",
      "type": "internal",
      "impactMessage": "Query time of API's will be slower",
      "error": "Redis connection failed",
      "sla": {
        "timeForResolutionInMinutes": 60,
        "priority": 0
      }
    },
    "bhashini": {
      "status": { "isAvailable": false },
      "name": "bhashini",
      "type": "external",
      "impactMessage": "Bhashini services (translate, transliterate, t2s, s2t, language detection) will not work",
      "error": "Translation API timed out",
      "sla": {
        "timeForResolutionInMinutes": 120,
        "priority": 1
      }
    }
  },
  "details": {}
}
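
A consumer of this response, such as an alerting script, might turn the `error` object into readable lines for administrators. This is a sketch under that assumption; `FailedService` and `describeFailures` are hypothetical names, and the entry shape is taken from the example above.

```typescript
// Minimal shape of one failed entry, inferred from the example error response.
interface FailedService {
  name: string;
  type: "internal" | "external";
  impactMessage: string;
  error: string;
  sla: { timeForResolutionInMinutes: number; priority: number };
}

// Hypothetical helper: one human-readable line per failed service.
function describeFailures(errors: Record<string, FailedService>): string[] {
  return Object.keys(errors).map((key) => {
    const s = errors[key];
    return `[${s.type}] ${s.name}: ${s.error} (impact: ${s.impactMessage}; SLA: ${s.sla.timeForResolutionInMinutes} min)`;
  });
}

// The Redis entry from the example response, minus the status field.
const failed: Record<string, FailedService> = {
  Redis: {
    name: "Redis",
    type: "internal",
    impactMessage: "Query time of API's will be slower",
    error: "Redis connection failed",
    sla: { timeForResolutionInMinutes: 60, priority: 0 },
  },
};

for (const line of describeFailures(failed)) {
  console.log(line);
}
// [internal] Redis: Redis connection failed (impact: Query time of API's will be slower; SLA: 60 min)
```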

2. GET /health/ping

Description

Simple health check to confirm the service is running.

Success Response

{
  "status": "ok",
  "details": {
    "bff": {
      "status": "up"
    }
  }
}

Implementation Details

Controller (health.controller.ts)

  • Defines endpoints for /health and /health/ping.
  • Delegates checks to the HealthService.

Service (health.service.ts)

  • Implements detailed health checks for each service.
  • Each service check:
    • Returns a structured response using getStatus() on success.
    • Throws HealthCheckError on failure.
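
The success/failure contract described above could look roughly like the following. To keep the sketch runnable on its own, `getStatus` and `HealthCheckError` are local stand-ins that only mimic the idea of the helpers named in this issue; their signatures here are assumptions, and `runCheck` is a hypothetical wrapper.

```typescript
// Local stand-ins for getStatus() and HealthCheckError, defined here only to
// keep the sketch self-contained. Their signatures are assumptions.
type IndicatorResult = Record<string, { isAvailable: boolean; error?: string }>;

class HealthCheckError extends Error {
  readonly causes: IndicatorResult;
  constructor(message: string, causes: IndicatorResult) {
    super(message);
    this.name = "HealthCheckError";
    this.causes = causes;
    // Keeps instanceof working if the code is compiled down to ES5.
    Object.setPrototypeOf(this, HealthCheckError.prototype);
  }
}

function getStatus(key: string, isAvailable: boolean, error?: string): IndicatorResult {
  return { [key]: error === undefined ? { isAvailable } : { isAvailable, error } };
}

// Hypothetical wrapper: run a probe, return a structured result on success,
// and throw HealthCheckError carrying the failure details otherwise.
async function runCheck(key: string, probe: () => Promise<void>): Promise<IndicatorResult> {
  try {
    await probe();
    return getStatus(key, true);
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    throw new HealthCheckError(`${key} check failed`, getStatus(key, false, message));
  }
}

// Usage with two fake probes: one that succeeds and one that fails.
async function demo(): Promise<void> {
  console.log(await runCheck("Postgres", async () => {}));
  try {
    await runCheck("Redis", async () => {
      throw new Error("Redis connection failed");
    });
  } catch (err) {
    if (err instanceof HealthCheckError) {
      console.log(err.message, err.causes);
    }
  }
}

demo();
```

Throwing instead of returning a failed result lets the caller collect failures in one place, which matches the separate `error` section in the responses shown earlier.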

Monitoring Services

Internal Services

  1. Postgres

    • Run a simple SQL query (SELECT 1) to ensure database connectivity.
  2. Redis

    • Perform a set and get operation to validate cache functionality.

External APIs

  1. Bhashini

    • Validate APIs for:
      • Translation
      • Speech-to-Text
      • Text-to-Speech
      • Language Detection
  2. Wadhwani LLM

    • Test a predefined query to verify functionality.
  3. PM Kisan

    • OTP Verification: Validate OTP send and retrieval APIs.
    • User Details Retrieval: Ensure user data can be fetched correctly.
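
As an illustration of the Redis set/get check listed above, here is a minimal sketch. The `CacheClient` interface, the `probeCache` function, and the `health:probe` key are all hypothetical; a real Redis client would need a thin adapter to this interface, and the in-memory fake exists only so the example runs without a server.

```typescript
// Minimal cache interface, assumed for this sketch.
interface CacheClient {
  set(key: string, value: string): Promise<void>;
  get(key: string): Promise<string | null>;
}

// Hypothetical probe: write a fresh value, read it back, and compare.
async function probeCache(client: CacheClient): Promise<boolean> {
  const key = "health:probe"; // hypothetical key name
  const value = String(Date.now());
  try {
    await client.set(key, value);
    return (await client.get(key)) === value;
  } catch {
    return false; // any client error counts as unavailable
  }
}

// In-memory fake so the sketch runs without a Redis server.
function createFakeCache(): CacheClient {
  const store = new Map<string, string>();
  return {
    async set(key, value) {
      store.set(key, value);
    },
    async get(key) {
      return store.get(key) ?? null;
    },
  };
}

probeCache(createFakeCache()).then((ok) => console.log("cache available:", ok));
```

Comparing the value read back, instead of only checking that `set` did not throw, also catches a cache that accepts writes but cannot serve reads.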

Error Handling

  • All failures are reported in the error section of the JSON response.
  • Each error includes:
    • status: Availability flag (isAvailable is false).
    • name: Service name.
    • type: internal or external.
    • impactMessage: What breaks due to the failure.
    • error: Detailed error message.
    • sla: Time to resolve (in minutes) and priority.
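
One way to assemble such an error entry is sketched below, assuming each service has static metadata (name, type, impact message, SLA) and the failure arrives as a thrown value. `ServiceDescriptor`, `ErrorEntry`, and `toErrorEntry` are hypothetical names used only for illustration.

```typescript
// Static metadata for one monitored service (hypothetical type).
interface ServiceDescriptor {
  name: string;
  type: "internal" | "external";
  impactMessage: string;
  sla: { timeForResolutionInMinutes: number; priority: number };
}

// One entry of the "error" section, following the fields listed above.
interface ErrorEntry extends ServiceDescriptor {
  status: { isAvailable: boolean };
  error: string;
}

// Hypothetical helper: combine the metadata with the failure cause.
function toErrorEntry(service: ServiceDescriptor, cause: unknown): ErrorEntry {
  const message = cause instanceof Error ? cause.message : String(cause);
  return { status: { isAvailable: false }, ...service, error: message };
}

// Values copied from the Redis entry in the example error response.
const redis: ServiceDescriptor = {
  name: "Redis",
  type: "internal",
  impactMessage: "Query time of API's will be slower",
  sla: { timeForResolutionInMinutes: 60, priority: 0 },
};

const entry = toErrorEntry(redis, new Error("Redis connection failed"));
console.log(JSON.stringify({ [entry.name]: entry }, null, 2));
```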