Shaylen Reddy edited this page Oct 8, 2023 · 2 revisions

The System-Degraded State

The idea for this came from an objective in the AZ-400 exam. Neither the exam nor my studies offered a proper solution, so I came up with this one

The objective:

Develop a Site Reliability Engineering (SRE) strategy (5-10%)

  • Develop an actionable alerting strategy
    • develop communication mechanism to notify users of degraded systems

I chose to implement this as a banner that checks whether the SystemDegraded app setting is true and notifies the user when it is. The idea is simple, but toggling the setting on automatically involves several pieces of Azure

This piece is one I should have demonstrated when the solution was in Azure [missed opportunities 😞]

The Code Related to the System-Degraded State

This is the app setting

{
    "SystemDegraded": false
}

This is the code for the banner in _Layout.cshtml

@inject IConfiguration Configuration

@if (Configuration.GetValue<bool>("SystemDegraded") is true)
{
    <div class="alert alert-danger text-center p-0" role="alert">
        <div class="d-inline-flex py-3">
            <div>
                <span class="fas fa-exclamation-triangle me-2"></span>
            </div>
            <div>
                <p class="text-start m-0">
                    System Degraded: Parts of the system are down and we are trying our best to resolve this.<br>
                    Sorry for the inconvenience caused
                </p>
            </div>
        </div>
    </div>
}

Taking Advantage of Azure Services

In order for this solution to come together, a few things need to happen

The health check status needs to be published to Application Insights, and publishing is enabled whenever Application Insights is enabled

// Part of the AddCommonChecks()
if (healthChecksModel.PublishHealthStatusToAppInsights)
{
    healthChecks.AddApplicationInsightsPublisher();
}

For every Liveness and Health check performed, if the application is responsive, its result is published to Application Insights
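To show where the snippet above could sit, here is a minimal sketch of a registration method. It assumes the AspNetCore.HealthChecks.Publisher.ApplicationInsights package; the `HealthChecksModel` class, the method signature, and the "self" liveness check are hypothetical stand-ins, since the real AddCommonChecks() is not shown on this page.

```csharp
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Diagnostics.HealthChecks;

// Hypothetical stand-in for the settings object referenced in the snippet above.
public sealed class HealthChecksModel
{
    public bool PublishHealthStatusToAppInsights { get; set; }
}

public static class HealthCheckRegistration
{
    // Hypothetical signature: the real AddCommonChecks() may take different parameters.
    public static IHealthChecksBuilder AddCommonChecks(
        this IServiceCollection services,
        HealthChecksModel healthChecksModel)
    {
        // A trivial check that reports healthy whenever the process can respond.
        var healthChecks = services
            .AddHealthChecks()
            .AddCheck("self", () => HealthCheckResult.Healthy(), tags: new[] { "liveness" });

        // Only publish to Application Insights when the setting asks for it.
        if (healthChecksModel.PublishHealthStatusToAppInsights)
        {
            healthChecks.AddApplicationInsightsPublisher();
        }

        return healthChecks;
    }
}
```

Keeping the publisher behind a setting means applications that run without Application Insights can use the same registration method.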

Next, a log query needs to run every five minutes to find how many times the health check status was 0

customEvents
| where name == "AspNetCoreHealthCheck"
| where customMeasurements.["AspNetCoreHealthCheckStatus"] == 0
| project
    timestamp,
    ApplicationName=customDimensions.["Assembly"],
    ApplicationVersion=application_Version,
    HealthCheckStatus=customMeasurements.["AspNetCoreHealthCheckStatus"],
    HealthCheckDuration=customMeasurements.["AspNetCoreHealthCheckDuration"]

View the full commit that explains this in detail here

Thereafter, an alert with an action group [responders] needs to be created so that an ops team is notified and the system-degraded state is turned on

// The completed action group
resource actionGroup 'Microsoft.Insights/actionGroups@2022-06-01' = {
  name: actionGroupName
  location: 'Global'
  properties: {
    groupShortName: 'sysdegraded'
    enabled: true
    emailReceivers: [
      {
        name: 'myself'
        emailAddress: emailAddressOfResponder
        useCommonAlertSchema: false
      }
    ]
    azureFunctionReceivers: [
      {
        name: 'System Degraded Toggler'
        functionAppResourceId: functionApp.id
        functionName: 'TurnItOn'
        httpTriggerUrl: 'https://${functionApp.properties.defaultHostName}/api/turniton?code=${listKeys('${functionApp.id}/host/default', functionApp.apiVersion).masterKey}'
        useCommonAlertSchema: true
      }
    ]
  }
  tags: {
    intendedResourceName: 'ag-systemdegraded-${environment}'
  }
}

The solution begins with App Configuration, which is added as a configuration source that overrides appsettings.json [explored below]

Here's the key-value in App Configuration as part of IaC

resource systemDegradedKeyValue 'Microsoft.AppConfiguration/configurationStores/keyValues@2022-05-01' = {
  parent: appConfiguration
  name: 'SystemDegraded'
  properties: {
    contentType: 'application/json'
    value: 'false'
  }
}

Though this works, the toggle needs to happen automatically. A person should not have to turn it on after the alert fires

Here enters the System-Degraded Toggler Function App

There's an Azure SDK, Azure.Data.AppConfiguration, that provides functionality to create and/or modify app settings in App Configuration

Great, so now the solution is beginning to look more complete: use an HTTP trigger for the function app and it'll turn on the system-degraded state

But there's one thing missing: authentication of the function app to App Configuration

Now a feature called Managed Identity comes into play. It allows services in Azure to authenticate to other services without storing credentials anywhere

The system-assigned managed identity is used, and it lives for the lifetime of the Azure service it belongs to

// Slightly shortened for brevity
[Function("TurnItOn")]
public async Task<HttpResponseData> Run([HttpTrigger(AuthorizationLevel.Admin, "post")] HttpRequestData request)
{
    var appConfigurationUri = new Uri($"https://{GetEnvironmentVariable("AzureAppConfigName")}.azconfig.io");

    var configurationClient = new ConfigurationClient(appConfigurationUri, new DefaultAzureCredential());

    var systemDegradedConfigurationSetting =
        new ConfigurationSetting("SystemDegraded", "true")
        {
            ContentType = "application/json"
        };

    await configurationClient.SetConfigurationSettingAsync(systemDegradedConfigurationSetting);

    var response = request.CreateResponse(HttpStatusCode.OK);

    return response;
}
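The function above relies on DefaultAzureCredential, which tries several credential sources in turn and picks up the managed identity when running in Azure. As a minimal sketch, assuming you wanted to make that choice explicit, a small factory could look like this; `AppConfigurationClientFactory` and its `managedIdentityOnly` parameter are hypothetical names and not part of the solution.

```csharp
using System;
using Azure.Core;
using Azure.Data.AppConfiguration;
using Azure.Identity;

// Hypothetical helper, shown only to illustrate the credential choice.
public static class AppConfigurationClientFactory
{
    public static ConfigurationClient Create(string storeName, bool managedIdentityOnly)
    {
        var endpoint = new Uri($"https://{storeName}.azconfig.io");

        // ManagedIdentityCredential with no arguments targets the system-assigned identity.
        // DefaultAzureCredential also works locally by falling back to developer credentials.
        TokenCredential credential = managedIdentityOnly
            ? new ManagedIdentityCredential()
            : new DefaultAzureCredential();

        return new ConfigurationClient(endpoint, credential);
    }
}
```

Whichever credential is used, the identity also needs a role on the App Configuration store that permits writes, such as App Configuration Data Owner, before SetConfigurationSettingAsync can succeed.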

The Code to Enable Configuration Refresh

Although this code is part of the WebConfigurationBuilder used by all applications, once the WebApplication is built, configuration refresh is only used by the Mvc Frontend

if (builder.Configuration.GetValue<bool>("AzureAppConfig:Enabled"))
{
    // Adds Azure App Configuration support using 'SystemDegraded' as the sentinel key to enable configuration refresh
    // It only has 'SystemDegraded' to have it toggled on by a function app and override the state to inform users
    builder.Configuration.AddAzureAppConfiguration(options =>
    {
        options
            .Connect(builder.Configuration["AzureAppConfig:ConnectionString"])
            .Select("*")
            .ConfigureRefresh(refreshOptions =>
            {
                refreshOptions.Register("SystemDegraded", true);
                refreshOptions.SetCacheExpiration(TimeSpan.FromSeconds(30));
            });
    });
}

This code provides an abstraction for using Azure App Configuration

public static WebApplication ConditionallyUseAzureAppConfiguration(this WebApplication app)
{
    if (app.Configuration.GetValue<bool>("AzureAppConfig:Enabled"))
    {
        app.UseAzureAppConfiguration();
    }

    return app;
}

This line is added only for the Mvc Frontend, but it could potentially be added to all applications

app.ConditionallyUseAzureAppConfiguration();
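Putting the pieces of this section together, the following is a minimal sketch of a Program.cs. It assumes the Microsoft.Azure.AppConfiguration.AspNetCore package and a web project. The `/system-degraded` endpoint is hypothetical and exists only to show a refreshed read. The call to `builder.Services.AddAzureAppConfiguration()` does not appear on this page; it is included on the assumption that the solution registers the refresh services somewhere, because the middleware depends on them.

```csharp
// Program.cs: a minimal sketch, not the solution's actual startup code
using System;
using Microsoft.Extensions.Configuration;

var builder = WebApplication.CreateBuilder(args);

if (builder.Configuration.GetValue<bool>("AzureAppConfig:Enabled"))
{
    builder.Configuration.AddAzureAppConfiguration(options =>
    {
        options
            .Connect(builder.Configuration["AzureAppConfig:ConnectionString"])
            .Select("*")
            .ConfigureRefresh(refreshOptions =>
            {
                // Reload all keys when the 'SystemDegraded' sentinel changes
                refreshOptions.Register("SystemDegraded", true);
                refreshOptions.SetCacheExpiration(TimeSpan.FromSeconds(30));
            });
    });

    // Assumption: registers the refresh services required by UseAzureAppConfiguration()
    builder.Services.AddAzureAppConfiguration();
}

var app = builder.Build();

if (app.Configuration.GetValue<bool>("AzureAppConfig:Enabled"))
{
    // Triggers a refresh on incoming requests once the cache has expired
    app.UseAzureAppConfiguration();
}

// Hypothetical endpoint: returns the current value of the flag
app.MapGet("/system-degraded", (IConfiguration configuration) =>
    configuration.GetValue<bool>("SystemDegraded"));

app.Run();
```

Because the refresh is driven by incoming requests and the cache expires after 30 seconds, a change made by the function app shows up on a request shortly after that window, with no restart needed.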

Sequence Diagram

This diagram provides a high-level [although not that detailed] look at this process

sequenceDiagram
    participant healthchecksui as Health Checks UI
    participant webapps as All Applications
    participant appi as Application Insights
    actor team as Ops Team
    participant func as System-Degraded Toggler
    participant appcs as App Configuration
    loop Every 5 minutes
        loop Every 30 seconds
            healthchecksui ->> webapps: /health/liveness
            webapps ->> appi: Publish health status
            healthchecksui ->> webapps: /health
            webapps ->> appi: Publish health status
        end

        appi ->> appi: Run log query

        alt Logs count > 5
            appi ->> appi: Trigger alert
            appi ->> team: Sends email notification
            appi ->> func: Makes POST request
            func ->> appcs: Sets 'SystemDegraded' to true
        end
    end
