Alerts proactively notify you when issues are found with your infrastructure or application using your monitoring data in Azure Monitor.
They allow you to identify and address issues before the users of your system notice them.

Below are few of the target resource for whom we can create alerts in Azure:
- Virtual machines.
- Storage accounts.
- Log Analytics workspace.
- Application Insights.
- SQLMI
- Databases
We can create an alert based on following criterias:
- Metric values
- Log search queries
- Activity log events
- Health of the underlying Azure platform
- Tests for website availability
Note: In this posts we will only use ARM templates to create alerts instead of portal as part of automation.
Metric Alerts
The Azure platform offers over a thousand metrics, many of which have dimensions. For example: CPU Utilization, Memory Overload, diskRead etc.
Sample template.json file
{
"$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json#",
"contentVersion": "1.0.0.0",
"parameters": {
"alertName": {
"type": "string",
"minLength": 1,
"metadata": {
"description": "Name of the alert"
}
},
"alertDescription": {
"type": "string",
"defaultValue": "This is a metric alert",
"metadata": {
"description": "Description of alert"
}
},
"alertSeverity": {
"type": "int",
"defaultValue": 3,
"allowedValues": [
0,
1,
2,
3,
4
],
"metadata": {
"description": "Severity of alert {0,1,2,3,4}"
}
},
"isEnabled": {
"type": "bool",
"defaultValue": true,
"metadata": {
"description": "Specifies whether the alert is enabled"
}
},
"resourceId": {
"type": "string",
"minLength": 1,
"metadata": {
"description": "Full Resource ID of the resource emitting the metric that will be used for the comparison. For example /subscriptions/00000000-0000-0000-0000-0000-00000000/resourceGroups/ResourceGroupName/providers/Microsoft.compute/virtualMachines/VM_xyz"
}
},
"metricName": {
"type": "string",
"minLength": 1,
"metadata": {
"description": "Name of the metric used in the comparison to activate the alert."
}
},
"operator": {
"type": "string",
"defaultValue": "GreaterThan",
"allowedValues": [
"Equals",
"NotEquals",
"GreaterThan",
"GreaterThanOrEqual",
"LessThan",
"LessThanOrEqual"
],
"metadata": {
"description": "Operator comparing the current value with the threshold value."
}
},
"threshold": {
"type": "string",
"defaultValue": "0",
"metadata": {
"description": "The threshold value at which the alert is activated."
}
},
"timeAggregation": {
"type": "string",
"defaultValue": "Average",
"allowedValues": [
"Average",
"Minimum",
"Maximum",
"Total",
"Count"
],
"metadata": {
"description": "How the data that is collected should be combined over time."
}
},
"windowSize": {
"type": "string",
"defaultValue": "PT5M",
"allowedValues": [
"PT1M",
"PT5M",
"PT15M",
"PT30M",
"PT1H",
"PT6H",
"PT12H",
"PT24H"
],
"metadata": {
"description": "Period of time used to monitor alert activity based on the threshold. Must be between one minute and one day. ISO 8601 duration format."
}
},
"evaluationFrequency": {
"type": "string",
"defaultValue": "PT1M",
"allowedValues": [
"PT1M",
"PT5M",
"PT15M",
"PT30M",
"PT1H"
],
"metadata": {
"description": "how often the metric alert is evaluated represented in ISO 8601 duration format"
}
},
"actionGroupId": {
"type": "string",
"defaultValue": "",
"metadata": {
"description": "The ID of the action group that is triggered when the alert is activated or deactivated"
}
}
},
"variables": { },
"resources": [
{
"name": "[parameters('alertName')]",
"type": "Microsoft.Insights/metricAlerts",
"location": "global",
"apiVersion": "2018-03-01",
"tags": {},
"properties": {
"description": "[parameters('alertDescription')]",
"severity": "[parameters('alertSeverity')]",
"enabled": "[parameters('isEnabled')]",
"scopes": ["[parameters('resourceId')]"],
"evaluationFrequency":"[parameters('evaluationFrequency')]",
"windowSize": "[parameters('windowSize')]",
"criteria": {
"odata.type": "Microsoft.Azure.Monitor.SingleResourceMultipleMetricCriteria",
"allOf": [
{
"name" : "1st criterion",
"metricName": "[parameters('metricName')]",
"dimensions":[],
"operator": "[parameters('operator')]",
"threshold" : "[parameters('threshold')]",
"timeAggregation": "[parameters('timeAggregation')]"
}
]
},
"actions": [
{
"actionGroupId": "[parameters('actionGroupId')]"
}
]
}
}
]
}
Sample parameters.json file
{
"$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentParameters.json#",
"contentVersion": "1.0.0.0",
"parameters": {
"alertName": {
"value": "New Metric Alert"
},
"alertDescription": {
"value": "New metric alert created via template"
},
"alertSeverity": {
"value":3
},
"isEnabled": {
"value": true
},
"resourceId": {
"value": "/subscriptions/replace-with-subscription-id/resourceGroups/replace-with-resourceGroup-name/providers/Microsoft.Compute/virtualMachines/replace-with-resource-name"
},
"metricName": {
"value": "Percentage CPU"
},
"operator": {
"value": "GreaterThan"
},
"threshold": {
"value": "80"
},
"timeAggregation": {
"value": "Average"
},
"actionGroupId": {
"value": "/subscriptions/replace-with-subscription-id/resourceGroups/resource-group-name/providers/Microsoft.Insights/actionGroups/replace-with-action-group"
}
}
}
See below description for the parameters used in above template
Criteria – A combination of signal and logic applied on a target resource. Examples:
- Percentage CPU > 70%
- Server Response Time > 4 ms
- Result count of a log query > 100
Alert Name – A specific name for the alert rule configured by the user.
Alert Description – A description for the alert rule configured by the user.
Severity – The severity of the alert after the criteria specified in the alert rule is met. Severity can range from 0 to 4.
- Sev 0 = Critical
- Sev 1 = Error
- Sev 2 = Warning
- Sev 3 = Informational
- Sev 4 = Verbose
Action and Action Groups – A specific action taken when the alert is fired. An action group can be an event like an azure function or simply an email action.
Activity log based Alerts
Activity log alerts are alerts that activate when a new activity log event occurs that matches the conditions specified in the alert. Based on the order and volume of the events recorded in Azure activity log, the alert rule will fire
To create an activity log alert rule by using an Azure Resource Manager template, you create a resource of the type microsoft.insights/activityLogAlerts
.
{
"$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json#",
"contentVersion": "1.0.0.0",
"parameters": {
"activityLogAlertName": {
"type": "string",
"metadata": {
"description": "Unique name (within the Resource Group) for the Activity log alert."
}
},
"activityLogAlertEnabled": {
"type": "bool",
"defaultValue": true,
"metadata": {
"description": "Indicates whether or not the alert is enabled."
}
},
"actionGroupResourceId": {
"type": "string",
"metadata": {
"description": "Resource Id for the Action group."
}
}
},
"resources": [
{
"type": "Microsoft.Insights/activityLogAlerts",
"apiVersion": "2017-04-01",
"name": "[parameters('activityLogAlertName')]",
"location": "Global",
"properties": {
"enabled": "[parameters('activityLogAlertEnabled')]",
"scopes": [
"[subscription().id]"
],
"condition": {
"allOf": [
{
"field": "category",
"equals": "Administrative"
},
{
"field": "operationName",
"equals": "Microsoft.Resources/deployments/write"
},
{
"field": "resourceType",
"equals": "Microsoft.Resources/deployments"
}
]
},
"actions": {
"actionGroups":
[
{
"actionGroupId": "[parameters('actionGroupResourceId')]"
}
]
}
}
}
]
}
Custom Log Alerts using KQL
Here we see a simple alert based on Kusto Query Language (KQL) which we can run and test in Log Analytics workspace and the create a query based alert in Azure Monitor.
Log alerts allow users to use a Log Analytics query to evaluate resources logs every set frequency, and fire an alert based on the results. Rules can trigger one or more actions using Action Groups.
For example, following query helps in troubleshooting APP service errors by querying AppServiceConsoleLogs table
AppServiceConsoleLogs |
where ResultDescription contains "error"
Now let us see how can we use this query in our ARM template to create a log alert.
template.json file
{
"$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json#",
"contentVersion": "1.0.0.0",
"parameters": {
"alertName": {
"type": "string",
"minLength": 1,
"metadata": {
"description": "Name of the alert"
}
},
"location": {
"type": "string",
"minLength": 1,
"metadata": {
"description": "Location of the alert"
}
},
"alertDescription": {
"type": "string",
"defaultValue": "This is a metric alert",
"metadata": {
"description": "Description of alert"
}
},
"alertSeverity": {
"type": "int",
"defaultValue": 3,
"allowedValues": [
0,
1,
2,
3,
4
],
"metadata": {
"description": "Severity of alert {0,1,2,3,4}"
}
},
"isEnabled": {
"type": "bool",
"defaultValue": true,
"metadata": {
"description": "Specifies whether the alert is enabled"
}
},
"resourceId": {
"type": "string",
"minLength": 1,
"metadata": {
"description": "Full Resource ID of the resource emitting the metric that will be used for the comparison. For example /subscriptions/00000000-0000-0000-0000-0000-00000000/resourceGroups/ResourceGroupName/providers/Microsoft.compute/virtualMachines/VM_xyz"
}
},
"query": {
"type": "string",
"minLength": 1,
"metadata": {
"description": "Name of the metric used in the comparison to activate the alert."
}
},
"metricMeasureColumn": {
"type": "string",
"metadata": {
"description": "Name of the measure column used in the alert evaluation."
}
},
"resourceIdColumn": {
"type": "string",
"metadata": {
"description": "Name of the resource ID column used in the alert targeting the alerts."
}
},
"operator": {
"type": "string",
"defaultValue": "GreaterThan",
"allowedValues": [
"Equals",
"NotEquals",
"GreaterThan",
"GreaterThanOrEqual",
"LessThan",
"LessThanOrEqual"
],
"metadata": {
"description": "Operator comparing the current value with the threshold value."
}
},
"threshold": {
"type": "string",
"defaultValue": "0",
"metadata": {
"description": "The threshold value at which the alert is activated."
}
},
"numberOfEvaluationPeriods": {
"type": "string",
"defaultValue": "4",
"metadata": {
"description": "The number of periods to check in the alert evaluation."
}
},
"minFailingPeriodsToAlert": {
"type": "string",
"defaultValue": "3",
"metadata": {
"description": "The number of unhealthy periods to alert on (must be lower or equal to numberOfEvaluationPeriods)."
}
},
"timeAggregation": {
"type": "string",
"defaultValue": "Average",
"allowedValues": [
"Average",
"Minimum",
"Maximum",
"Total",
"Count"
],
"metadata": {
"description": "How the data that is collected should be combined over time."
}
},
"windowSize": {
"type": "string",
"defaultValue": "PT5M",
"allowedValues": [
"PT1M",
"PT5M",
"PT15M",
"PT30M",
"PT1H",
"PT6H",
"PT12H",
"PT24H"
],
"metadata": {
"description": "Period of time used to monitor alert activity based on the threshold. Must be between one minute and one day. ISO 8601 duration format."
}
},
"evaluationFrequency": {
"type": "string",
"defaultValue": "PT1M",
"allowedValues": [
"PT1M",
"PT5M",
"PT15M",
"PT30M",
"PT1H"
],
"metadata": {
"description": "how often the metric alert is evaluated represented in ISO 8601 duration format"
}
},
"muteActionsDuration": {
"type": "string",
"defaultValue": "PT5M",
"allowedValues": [
"PT1M",
"PT5M",
"PT15M",
"PT30M",
"PT1H",
"PT6H",
"PT12H",
"PT24H"
],
"metadata": {
"description": "Mute actions for the chosen period of time (in ISO 8601 duration format) after the alert is fired."
}
},
"actionGroupId": {
"type": "string",
"defaultValue": "",
"metadata": {
"description": "The ID of the action group that is triggered when the alert is activated or deactivated"
}
}
},
"variables": { },
"resources": [
{
"name": "[parameters('alertName')]",
"type": "Microsoft.Insights/scheduledQueryRules",
"location": "[parameters('location')]",
"apiVersion": "2020-05-01-preview",
"tags": {},
"properties": {
"description": "[parameters('alertDescription')]",
"severity": "[parameters('alertSeverity')]",
"enabled": "[parameters('isEnabled')]",
"scopes": ["[parameters('resourceId')]"],
"evaluationFrequency":"[parameters('evaluationFrequency')]",
"windowSize": "[parameters('windowSize')]",
"criteria": {
"allOf": [
{
"query": "[parameters('query')]",
"metricMeasureColumn": "[parameters('metricMeasureColumn')]",
"resourceIdColumn": "[parameters('resourceIdColumn')]",
"dimensions":[],
"operator": "[parameters('operator')]",
"threshold" : "[parameters('threshold')]",
"timeAggregation": "[parameters('timeAggregation')]",
"failingPeriods": {
"numberOfEvaluationPeriods": "[parameters('numberOfEvaluationPeriods')]",
"minFailingPeriodsToAlert": "[parameters('minFailingPeriodsToAlert')]"
}
}
]
},
"muteActionsDuration": "[parameters('muteActionsDuration')]",
"actions": [
{
"actionGroupId": "[parameters('actionGroupId')]"
}
]
}
}
]
}
parameter.json file
{
"$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentParameters.json#",
"contentVersion": "1.0.0.0",
"parameters": {
"alertName": {
"value": "AppServiceErrorAlert"
},
"location": {
"value": "EastUS"
},
"alertDescription": {
"value": "This is custom log alert to trouble shoot App Service"
},
"alertSeverity": {
"value": "2"
},
"isEnabled": {
"value": "true"
},
"resourceId": {
"value": "/subscriptions/00000000-0000-0000-0000-0000-00000000/resourceGroups/ResourceGroupName/providers/Microsoft.compute/virtualMachines/VM_xyz"
},
"query": {
"value": "AppServiceConsoleLogs | where ResultDescription contains 'error'"
},
"operator": {
"value": "Greater Than"
},
"threshold": {
"value": "0"
},
"numberOfEvaluationPeriods": {
"value": "4"
},
"timeAggregation": {
"value": "Maximum"
},
"windowSize": {
"value": "PT5H"
},
"evaluationFrequency": {
"value": "PT5M"
},
"actionGroupId": {
"value": ""/subscriptions/00000000-0000-0000-0000-0000-00000000/resourceGroups/ResourceGroupName/providers/Microsoft.compute/actionGroups/Action_Group_Name""
}
}
AZ CLI command to execute above templates:
az deployment group create --resource-group testgroup --template-file template.json --parameters parameters.json
So this was a short tutorial on alerts, in future posts we will do deep dive onto KQL based Custom log alerts.
Great Article! In the Custom KQL Alert, is the resourceId the Log Analytics Workspace? How does it know which Log Analytics to perform this query against?
LikeLike