How it works: System Degraded
The idea for this came from an AZ-400 exam objective. In my studies, no proper solution was ever provided for it, so I came up with this one
The objective:
Develop a Site Reliability Engineering (SRE) strategy (5-10%)
- Develop an actionable alerting strategy
- develop communication mechanism to notify users of degraded systems
The way I chose to implement this was a banner that checks whether the `SystemDegraded` app setting is true and notifies the user when it's turned on. The idea is simple, but toggling it on automatically involved several pieces of Azure
This piece is one I should have demonstrated when the solution was in Azure [missed opportunities 😞]
This is the app setting
"SystemDegraded": false
This is the code for the banner in _Layout.cshtml
@inject IConfiguration Configuration
@if (Configuration.GetValue<bool>("SystemDegraded") is true)
{
    <div class="alert alert-danger text-center p-0" role="alert">
        <div class="d-inline-flex py-3">
            <span class="fas fa-exclamation-triangle me-2"></span>
            <p class="text-start m-0">
                System Degraded: Parts of the system are down and we are trying our best to resolve this.<br>
                Sorry for the inconvenience caused
            </p>
        </div>
    </div>
}
In order for this solution to come together, a few things need to happen
First, the health check status needs to be published to Application Insights; publishing is enabled whenever Application Insights is enabled
// Part of the AddCommonChecks()
if (healthChecksModel.PublishHealthStatusToAppInsights)
For every Liveness and Health check performed, if the application is responsive, its result is published to Application Insights
Next, a log query needs to evaluate the logs every five minutes for how many times the health check status was 0
| where name == "AspNetCoreHealthCheck"
| where customMeasurements.["AspNetCoreHealthCheckStatus"] == 0
| project
View the full commit that explains this in detail here
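To make the evaluation concrete, here is a minimal C# sketch of the decision the alert rule makes, not the alert rule itself. It assumes a status of 0 means the check was not healthy, and it borrows the "more than 5" threshold from the diagram further down. `HealthCheckEvent` and `DegradedAlertRule` are hypothetical names used only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for one "AspNetCoreHealthCheck" custom event.
public sealed record HealthCheckEvent(DateTimeOffset Timestamp, double Status);

public static class DegradedAlertRule
{
    // Assumed values: a five-minute window and the "> 5" threshold from the diagram.
    public static readonly TimeSpan Window = TimeSpan.FromMinutes(5);
    public const int Threshold = 5;

    // Returns true when more than Threshold events with status 0
    // fall inside the window that ends at 'now'.
    public static bool ShouldFire(IEnumerable<HealthCheckEvent> events, DateTimeOffset now)
    {
        var windowStart = now - Window;
        var unhealthyCount = events.Count(e =>
            e.Timestamp > windowStart && e.Timestamp <= now && e.Status == 0);
        return unhealthyCount > Threshold;
    }
}
```

In the real setup this counting is done by the log query and the alert rule in Azure Monitor; the sketch only mirrors that logic so it is easier to reason about when the banner would be switched on.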
Thereafter, an alert with an action group [responders] needs to be created so that an ops team can be notified and have the system-degraded state turned on
// The completed action group
resource actionGroup 'Microsoft.Insights/actionGroups@2022-06-01' = {
  name: actionGroupName
  location: 'Global'
  properties: {
    groupShortName: 'sysdegraded'
    enabled: true
    emailReceivers: [
      {
        name: 'myself'
        emailAddress: emailAddressOfResponder
        useCommonAlertSchema: false
      }
    ]
    azureFunctionReceivers: [
      {
        name: 'System Degraded Toggler'
        functionAppResourceId: functionApp.id
        functionName: 'TurnItOn'
        httpTriggerUrl: 'https://${functionApp.properties.defaultHostName}/api/turniton?code=${listKeys('${functionApp.id}/host/default', functionApp.apiVersion).masterKey}'
        useCommonAlertSchema: true
      }
    ]
  }
  tags: {
    intendedResourceName: 'ag-systemdegraded-${environment}'
  }
}
The solution begins with App Configuration, which is added as a configuration source that overrides `appsettings.json` [explored below]
Here's the key-value in App Configuration as part of IaC
resource systemDegradedKeyValue 'Microsoft.AppConfiguration/configurationStores/keyValues@2022-05-01' = {
  parent: appConfiguration
  name: 'SystemDegraded'
  properties: {
    contentType: 'application/json'
    value: 'false'
  }
}
Though this works, it needs to happen automatically; an individual should not be the one to turn it on after the alert is fired
Here enters the System-Degraded Toggler Function App
There's an Azure SDK, Azure.Data.AppConfiguration, that provides functionality to create and/or modify app settings in App Configuration
Great, so now the solution is beginning to look more complete: simply use an HTTP trigger for the function app and it'll turn on the system-degraded state
But there's one thing missing: authentication of the function app to App Configuration
Now a feature called Managed Identity comes into play. It allows services in Azure to authenticate to other services without storing credentials anywhere
The system-assigned managed identity is used here, and it lives for the lifetime of the Azure service
// Slightly shortened for brevity
public async Task<HttpResponseData> Run([HttpTrigger(AuthorizationLevel.Admin, "post")] HttpRequestData request)
{
    var appConfigurationUri = new Uri($"https://{GetEnvironmentVariable("AzureAppConfigName")}.azconfig.io");
    var configurationClient = new ConfigurationClient(appConfigurationUri, new DefaultAzureCredential());
    var systemDegradedConfigurationSetting =
        new ConfigurationSetting("SystemDegraded", "true")
        {
            ContentType = "application/json"
        };
    await configurationClient.SetConfigurationSettingAsync(systemDegradedConfigurationSetting);
    var response = request.CreateResponse(HttpStatusCode.OK);
    return response;
}
Though this code is part of the WebConfigurationBuilder used for all applications, after the WebApplication is built, it's only used for the Mvc Frontend
if (builder.Configuration.GetValue<bool>("AzureAppConfig:Enabled"))
// Adds Azure App Configuration support using 'SystemDegraded' as the sentinel key to enable configuration refresh
// It only has 'SystemDegraded' to have it toggled on by a function app and override the state to inform users
builder.Configuration.AddAzureAppConfiguration(options =>
.ConfigureRefresh(refreshOptions =>
refreshOptions.Register("SystemDegraded", true);
This code provides an abstraction for using Azure App Configuration
public static WebApplication ConditionallyUseAzureAppConfiguration(this WebApplication app)
if (app.Configuration.GetValue<bool>("AzureAppConfig:Enabled"))
return app;
This line is added only for the Mvc Frontend, but potentially can be added for all applications
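For readers who want to see the pieces side by side, here is a rough sketch of how the configuration source, the sentinel key and the refresh middleware might be wired up in a single `Program.cs`. It is not the project's actual `WebConfigurationBuilder` code. It assumes the Microsoft.Azure.AppConfiguration.AspNetCore and Azure.Identity packages, the implicit usings of the ASP.NET Core web SDK, and a hypothetical `AzureAppConfig:Endpoint` setting that holds the store's URI.

```csharp
// Program.cs (sketch only, not the project's real startup code).
using Azure.Identity;

var builder = WebApplication.CreateBuilder(args);
var useAppConfig = builder.Configuration.GetValue<bool>("AzureAppConfig:Enabled");

if (useAppConfig)
{
    // "AzureAppConfig:Endpoint" is a hypothetical setting name used for illustration.
    var endpoint = new Uri(builder.Configuration["AzureAppConfig:Endpoint"]!);

    // Added after appsettings.json, so values from App Configuration take precedence.
    builder.Configuration.AddAzureAppConfiguration(options =>
        options.Connect(endpoint, new DefaultAzureCredential())
               .ConfigureRefresh(refreshOptions =>
                   // 'SystemDegraded' is the sentinel key; true reloads all settings when it changes.
                   refreshOptions.Register("SystemDegraded", true)));

    // Registers the refresh services that the middleware below depends on.
    builder.Services.AddAzureAppConfiguration();
}

var app = builder.Build();

if (useAppConfig)
{
    // Lets incoming requests trigger a check of the sentinel key once the cached value expires.
    app.UseAzureAppConfiguration();
}

app.Run();
```

Because the refresh is driven by incoming requests, the banner would appear shortly after the function app flips the key rather than instantly; how long that takes depends on the refresh settings of the provider.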
This diagram provides a high-level [although not that detailed] look at this process
sequenceDiagram
    participant healthchecksui as Health Checks UI
    participant webapps as All Applications
    participant appi as Application Insights
    actor team as Ops Team
    participant func as System-Degraded Toggler
    participant appcs as App Configuration
    loop Every 5 minutes
        loop Every 30 seconds
            healthchecksui ->> webapps: /health/liveness
            webapps ->> appi: Publish health status
            healthchecksui ->> webapps: /health
            webapps ->> appi: Publish health status
        end
        appi ->> appi: Run log query
        alt Logs count > 5
            appi ->> appi: Trigger alert
            appi ->> team: Sends email notification
            appi ->> func: Makes POST request
            func ->> appcs: Sets 'SystemDegraded' to true
        end
    end
- Health Checks UI
- Mvc Frontend
- Web Backend-For-Frontend
- Address Service
- Address Worker
- Identity Service
- Order Service
- Order Worker
- Tyres Service