Microsoft Azure is the cloud platform of choice for thousands of UK businesses, from startups running their first virtual machine to enterprises operating complex multi-region architectures. Azure offers extraordinary flexibility and power, but that flexibility comes with a significant risk: without proper monitoring, Azure costs can spiral out of control, and performance issues can go undetected until they affect your customers or staff.
The pay-as-you-go model that makes cloud computing so attractive also makes it uniquely challenging to manage financially. Unlike on-premises infrastructure where costs are largely fixed and predictable, Azure charges fluctuate based on resource consumption, data transfer, storage volumes, and dozens of other variables. A misconfigured virtual machine left running over a bank holiday weekend, an auto-scaling rule that triggers too aggressively, or a storage account accumulating data without lifecycle policies can each add hundreds or thousands of pounds to your monthly bill.
This guide covers the essential tools, techniques, and best practices for monitoring both performance and costs in Azure, ensuring your cloud investment delivers value without unpleasant surprises on the invoice.
Azure Monitor: Your Central Observatory
Azure Monitor is the native platform for collecting, analysing, and acting on telemetry data from your Azure environment. It consolidates metrics (numerical performance data), logs (detailed event records), and traces (application-level diagnostics) into a unified platform. Azure resources automatically emit basic platform metrics, such as CPU utilisation, network throughput, and disk IOPS for virtual machines, which Azure Monitor collects without any additional configuration. More detailed guest operating system data, including in-guest memory counters, generally requires installing the Azure Monitor Agent on the VM.
For deeper visibility, Application Insights — a feature of Azure Monitor — provides application performance management (APM) for web applications. It tracks request rates, response times, failure rates, dependency calls, and user sessions, giving you a complete picture of how your applications are performing from the end user's perspective.
Key Metrics to Track in Azure Monitor
Understanding which metrics to prioritise is essential for effective Azure monitoring. For virtual machines, CPU utilisation and available memory are the primary indicators of compute health. Sustained CPU usage above 80 per cent typically indicates that a VM is undersized for its workload, whilst consistently low usage below 20 per cent suggests the resource is oversized and costing more than necessary. Memory pressure is equally important — when available memory drops below 15 per cent, applications begin to slow as the operating system resorts to disk-based paging.
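The thresholds above can be turned into a simple rightsizing check. The following is a minimal sketch in Python, assuming you have already exported average CPU and available-memory percentages for each VM. The function name `classify_vm` and its inputs are illustrative and are not part of any Azure SDK.

```python
def classify_vm(avg_cpu_pct: float, available_memory_pct: float) -> list[str]:
    """Return rightsizing findings for one VM, using the thresholds from the text."""
    findings = []
    if avg_cpu_pct > 80:
        findings.append("undersized: sustained CPU above 80%")
    elif avg_cpu_pct < 20:
        findings.append("oversized: sustained CPU below 20%")
    if available_memory_pct < 15:
        findings.append("memory pressure: available memory below 15%")
    return findings


# Made-up sample values for two VMs.
print(classify_vm(avg_cpu_pct=12.0, available_memory_pct=60.0))
print(classify_vm(avg_cpu_pct=91.5, available_memory_pct=9.0))
```

Treat the output as a prompt for investigation rather than a verdict: a VM with low average CPU may still need its current size for short peaks.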
Disk metrics deserve particular attention in Azure environments. Azure managed disks have defined IOPS and throughput limits based on their tier, and exceeding these limits results in throttling that can severely impact application performance. Monitor both disk read/write IOPS and disk queue length — a consistently high queue length indicates that your storage tier cannot keep pace with demand, and upgrading to a Premium SSD or Ultra Disk may be necessary.
Network metrics round out the infrastructure monitoring picture. Track both inbound and outbound data transfer, as Azure charges for outbound data leaving a region. Unexpected spikes in outbound traffic can indicate both performance issues and cost concerns simultaneously. For web applications, monitor the number of active connections and connection failures to identify capacity constraints before they affect end users.
Log Analytics Workspaces
While metrics provide numerical snapshots of resource health, Log Analytics workspaces in Azure Monitor offer deep investigative capability through detailed log data. Logs capture events, errors, configuration changes, and security-related activities across your entire Azure estate. The Kusto Query Language (KQL) enables powerful analysis of this data, allowing your team to correlate events across multiple resources, identify root causes of incidents, and spot trends that metrics alone would miss.
For UK businesses, designing your Log Analytics workspace architecture requires balancing several considerations. A single centralised workspace simplifies management and enables cross-resource correlation, but may raise data sovereignty concerns if you operate across multiple regions. The data retention period also has cost implications: Azure charges for data ingestion and, separately, for retaining data beyond the default 30-day period. Many UK organisations find that a 90-day retention period in Log Analytics, combined with longer-term archival to Azure Storage for compliance purposes, strikes the right balance between operational usefulness and cost.
While Azure Monitor provides comprehensive native monitoring, many UK businesses supplement it with third-party tools such as Datadog, New Relic, or Grafana for enhanced visualisation, cross-cloud monitoring, or specific capabilities. The right choice depends on your environment complexity. For most UK SMEs running Azure-only workloads, Azure Monitor combined with Azure Advisor provides sufficient visibility without the additional cost and complexity of third-party platforms.
Setting Up Effective Alerts
Monitoring data is only useful if it triggers action when something goes wrong. Azure Monitor alerts allow you to define conditions — such as CPU exceeding 90 per cent for more than five minutes, or a web application returning more than ten errors per minute — and automatically notify your team or trigger remediation actions when those conditions are met.
Effective alerting requires careful calibration. Too few alerts mean problems go unnoticed. Too many alerts create alert fatigue, where your team stops paying attention because most notifications are false positives or need no action. Start with alerts for genuinely critical conditions and refine the thresholds over time based on your environment's normal behaviour patterns.
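To make the idea of a sustained condition concrete, here is a small Python sketch of the logic behind a rule such as "CPU above 90 per cent for five minutes". It is not how Azure Monitor evaluates alerts internally. It assumes one sample per minute and treats five consecutive samples above the threshold as a breach, and the function name `breaches_sustained` is hypothetical.

```python
def breaches_sustained(samples: list[float], threshold: float, min_consecutive: int) -> bool:
    """True if at least `min_consecutive` consecutive samples exceed `threshold`."""
    run = 0
    for value in samples:
        # Extend the current run on a breach, otherwise start counting again.
        run = run + 1 if value > threshold else 0
        if run >= min_consecutive:
            return True
    return False


# Assumed: one CPU sample per minute (made-up values).
cpu = [72, 95, 96, 93, 97, 94, 92, 60]
print(breaches_sustained(cpu, threshold=90, min_consecutive=5))
```

Requiring consecutive breaches instead of a single high reading is one simple way to cut false positives from short spikes.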
Action Groups and Automated Remediation
When an alert fires, Azure uses action groups to determine what happens next. An action group defines a set of notification and remediation actions — sending an email to your IT team, posting a message to a Microsoft Teams channel, triggering an SMS to the on-call engineer, or invoking an Azure Automation runbook to attempt automatic remediation. For UK businesses with limited IT staff, automated remediation can be particularly valuable. A runbook that automatically restarts a failed application service or scales up a resource that has hit its capacity limit can resolve issues in minutes rather than hours, especially outside normal working hours.
Consider implementing tiered alerting that reflects the severity and urgency of different conditions. Warning-level alerts might send an email to the team channel during business hours, whilst critical alerts should trigger SMS notifications and phone calls regardless of the time. For organisations operating under service level agreements with their clients, this tiered approach ensures that SLA-threatening issues receive immediate attention whilst less urgent matters are addressed during normal operations.
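The tiered approach described above can be sketched as a small routing function. This is an illustration only: the channel names, the Monday to Friday 09:00 to 17:00 business hours, and the decision to hold out-of-hours warnings until the next working day are all assumptions, and in practice the routing would be configured in action groups, not in code.

```python
from datetime import datetime


def route_alert(severity: str, fired_at: datetime) -> list[str]:
    """Pick notification channels for an alert (channel names are illustrative)."""
    # Assumed business hours: Monday to Friday, 09:00 to 17:00 local time.
    in_business_hours = fired_at.weekday() < 5 and 9 <= fired_at.hour < 17
    if severity == "critical":
        return ["sms", "phone"]  # sent regardless of the time
    if severity == "warning":
        # Outside business hours, hold the warning for the next working day.
        return ["email"] if in_business_hours else ["queue-for-next-business-day"]
    return []


print(route_alert("critical", datetime(2024, 6, 1, 2, 0)))   # Saturday, 02:00
print(route_alert("warning", datetime(2024, 6, 3, 10, 0)))   # Monday, 10:00
```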
Azure also supports smart alert grouping, which uses machine learning to identify related alerts and group them together. This reduces noise significantly — rather than receiving fifty individual alerts when a network issue causes multiple resources to fail simultaneously, your team receives a single grouped notification that identifies the common cause. This feature is particularly useful during major incidents where alert storms can overwhelm the response team.
Essential Azure Alerts for UK Businesses
| Alert Category | Metric | Suggested Threshold | Severity |
|---|---|---|---|
| Compute | VM CPU utilisation | > 90% for 5+ minutes | Warning |
| Compute | VM available memory | < 10% for 5+ minutes | Critical |
| Storage | Storage account capacity | > 80% of quota | Warning |
| Application | HTTP 5xx error rate | > 5 per minute | Critical |
| Application | Response time | > 3 seconds average | Warning |
| Security | Failed sign-in attempts | > 10 in 5 minutes | Critical |
| Cost | Daily spend anomaly | > 120% of 30-day average | Warning |
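The cost row in the table compares today's spend with a trailing 30-day average. As a rough illustration of that arithmetic, the Python sketch below flags a day whose spend exceeds 120 per cent of the average. The function name and the sample figures are made up, and the daily totals are assumed to come from your own cost export.

```python
from statistics import mean


def is_spend_anomaly(previous_30_days: list[float], today: float, ratio: float = 1.2) -> bool:
    """True if today's spend exceeds `ratio` times the trailing average."""
    if not previous_30_days:
        return False  # no baseline yet, so nothing to compare against
    return today > ratio * mean(previous_30_days)


history = [100.0] * 30  # made-up figures: £100 per day for 30 days
print(is_spend_anomaly(history, today=125.0))  # 125 is above the 120 threshold
print(is_spend_anomaly(history, today=118.0))  # 118 is below it
```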
Azure Cost Management: Controlling Your Cloud Bill
Azure Cost Management + Billing is the built-in tool for understanding, monitoring, and optimising your Azure spending. It provides cost analysis views that break down spending by resource, resource group, subscription, service, region, and time period. For UK businesses managing multiple Azure subscriptions or sharing costs across departments, it supports budgets, cost allocation, and chargeback reporting.
Setting Budgets and Alerts
The first step in cost control is setting budgets. Create a monthly budget in Azure Cost Management that reflects your expected spending level, and configure alerts at 80 per cent, 90 per cent, and 100 per cent of the budget. This provides early warning when spending is tracking above expectations and gives you time to investigate and intervene before the month ends.
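As a worked example of the 80, 90, and 100 per cent thresholds, the sketch below reports which of them the month-to-date spend has reached. It only illustrates the arithmetic. Azure Cost Management evaluates budget alerts for you, and the function name and figures here are hypothetical.

```python
def crossed_thresholds(spend_to_date: float, monthly_budget: float,
                       thresholds=(0.80, 0.90, 1.00)) -> list[int]:
    """Return the budget percentages that current spend has reached."""
    if monthly_budget <= 0:
        raise ValueError("monthly_budget must be positive")
    used = spend_to_date / monthly_budget
    return [round(t * 100) for t in thresholds if used >= t]


# Made-up figures: £4,600 spent against a £5,000 monthly budget (92 per cent).
print(crossed_thresholds(4600, 5000))
```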
Azure Advisor Cost Recommendations
Azure Advisor is an intelligent assistant that analyses your resource utilisation patterns and provides specific recommendations for reducing costs. Common recommendations include shutting down or resizing underutilised virtual machines, purchasing Reserved Instances for consistently running workloads, deleting unused public IP addresses and unattached disks, and implementing storage lifecycle policies to move infrequently accessed data to cooler tiers.
Building a Cost Optimisation Workflow
Azure Advisor recommendations are only valuable if they are acted upon consistently. Establish a regular cost optimisation workflow — ideally a monthly review meeting where your team examines the latest Advisor recommendations, reviews spending trends in Cost Management, and makes decisions about rightsizing, Reserved Instance purchases, and resource cleanup. Many UK businesses find that dedicating just two hours per month to this review process yields savings that far outweigh the time invested.
The FinOps methodology, increasingly adopted by UK organisations, provides a structured framework for cloud financial management. At its core, FinOps recognises that cloud cost management is a shared responsibility involving finance, technology, and business teams. Rather than leaving cost decisions solely to IT, FinOps encourages collaboration — finance teams understand the billing model, technology teams understand the resource requirements, and business teams understand the value each workload delivers. This cross-functional approach prevents the common pattern where IT optimises costs in isolation, potentially degrading performance for revenue-generating applications.
For larger UK organisations, consider implementing a cloud centre of excellence or designating a FinOps practitioner responsible for ongoing cost governance. This role involves monitoring spending trends, identifying optimisation opportunities, negotiating Enterprise Agreement terms with Microsoft, and ensuring that cost-saving recommendations are implemented consistently across all teams and subscriptions. The investment in this capability typically pays for itself many times over through reduced waste and more informed purchasing decisions.
Reserved Instances and Savings Plans
For workloads that run continuously, such as production servers, databases, and application gateways, Reserved Instances offer substantial savings over pay-as-you-go pricing. By committing to a one-year or three-year term for specific resources, you can save up to 72 per cent compared to on-demand rates, with the actual discount depending on the resource type, region, and term length.
Azure Savings Plans offer similar discounts with more flexibility. Rather than committing to a specific VM size and region, a Savings Plan commits to a fixed hourly spend amount and applies the discount automatically across eligible resources. This is particularly useful for UK businesses whose workload composition may change over time.
Reserved Instances
- Deepest discounts (up to 72% savings)
- Committed to specific VM size and region
- Ideal for stable, predictable workloads
- 1-year or 3-year commitment terms
- Exchange and refund policies available
- Best for production servers and databases
Savings Plans
- Good discounts (up to 65% savings)
- Flexible across VM sizes and regions
- Ideal for changing workload compositions
- 1-year or 3-year commitment terms
- Automatically applies to eligible usage
- Best for dynamic or growing environments
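Whether a commitment pays off depends on how much of the month the resource actually runs, because the committed amount is charged whether or not the resource is in use. The sketch below works through that arithmetic in Python. The hourly rate and the 40 per cent discount are invented for illustration, real discounts vary, and 730 hours is a common approximation of a month.

```python
HOURS_PER_MONTH = 730  # common billing approximation of one month


def monthly_costs(payg_hourly: float, discount: float, hours_used: float) -> tuple[float, float]:
    """Return (pay-as-you-go cost, committed cost) for one month.

    The committed cost covers every hour of the month, used or not.
    """
    payg = payg_hourly * hours_used
    committed = payg_hourly * (1 - discount) * HOURS_PER_MONTH
    return payg, committed


def break_even_utilisation(discount: float) -> float:
    """Fraction of the month a resource must run before committing is cheaper."""
    return 1 - discount


# Hypothetical VM at £0.20 per hour with a 40% commitment discount.
print(monthly_costs(0.20, 0.40, hours_used=730))  # running all month
print(monthly_costs(0.20, 0.40, hours_used=300))  # office hours only
print(break_even_utilisation(0.40))
```

With these made-up numbers, a VM that runs only 300 hours a month is cheaper on pay-as-you-go, which is why commitments suit workloads that run continuously.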
Tagging Strategy for Cost Visibility
Tags are metadata labels applied to Azure resources that enable cost allocation and reporting. A well-implemented tagging strategy answers critical business questions: how much are we spending on the development environment versus production? What is the cloud cost per project or client? Which department is driving the highest Azure consumption?
At a minimum, implement tags for environment (production, staging, development), department or cost centre, project or client name, owner (the person responsible for the resource), and creation date. Enforce tagging compliance through Azure Policy, which can prevent the creation of untagged resources and automatically apply default tags where they are missing.
Tag Governance and Enforcement
Implementing a tagging strategy is only half the battle — maintaining tag compliance over time is the greater challenge. Without governance, tag quality degrades rapidly as new team members create resources without understanding the tagging requirements, or existing staff take shortcuts under pressure. Azure Policy provides the enforcement mechanism, but it needs to be configured thoughtfully. Overly strict policies that block resource creation when tags are missing can frustrate developers and slow deployment pipelines, whilst overly lenient policies that merely audit non-compliance tend to be ignored.
A pragmatic approach for UK businesses is to enforce mandatory tags — such as environment, owner, and cost centre — through deny policies that prevent resource creation without them, whilst using audit policies for recommended but optional tags. Combine this with Azure Policy initiatives that can be applied consistently across all subscriptions in your organisation. Regular compliance reports generated from Azure Policy data provide visibility into tagging health and highlight areas that need attention.
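The split between mandatory and recommended tags can be illustrated with a short Python check. This is a sketch of the decision logic only, for example as a pre-deployment check in a pipeline. Enforcement itself would be done through Azure Policy. The tag names and the function `check_tags` are hypothetical, and comparing tag names in lower case is an assumption of this sketch.

```python
MANDATORY_TAGS = {"environment", "owner", "cost-centre"}  # missing: block creation
RECOMMENDED_TAGS = {"project", "created"}                 # missing: report only


def check_tags(tags: dict[str, str]) -> dict[str, list[str]]:
    """Split missing tags into 'deny' (mandatory) and 'audit' (recommended)."""
    # A tag counts as present only if it has a non-blank value.
    present = {name.lower() for name, value in tags.items() if value.strip()}
    return {
        "deny": sorted(MANDATORY_TAGS - present),
        "audit": sorted(RECOMMENDED_TAGS - present),
    }


result = check_tags({"Environment": "production", "owner": "a.smith"})
print(result)
```

A non-empty `deny` list corresponds to a deployment that a deny policy would reject, while entries under `audit` would only appear in a compliance report.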
Cost Allocation and Showback Reporting
Tags enable cost allocation, but the real value comes from using that allocation data to drive accountability. Showback reporting — sharing cost data with department heads or project managers — creates awareness of cloud consumption patterns and encourages more responsible resource usage. When a project manager sees that their development environment is costing three thousand pounds per month because VMs are left running around the clock, they are far more motivated to implement auto-shutdown schedules than when that cost is buried in a centralised IT budget.
Azure Cost Management supports exporting cost data to Power BI for custom reporting, or you can use the built-in cost views filtered by tags to generate department-specific or project-specific spending reports. For UK businesses using Microsoft 365 and Power BI, this integration provides a familiar reporting interface that non-technical stakeholders can use to explore their cloud spending independently.
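For teams that prefer to script a quick showback summary, the following Python sketch totals cost per tag value. The row structure (`cost_gbp` and `tags`) is an assumed, simplified shape and does not match the Cost Management export schema, so map your exported columns onto it first.

```python
from collections import defaultdict


def showback(cost_rows: list[dict], tag: str) -> dict[str, float]:
    """Sum cost per value of `tag`; rows without the tag go under 'untagged'."""
    totals: dict[str, float] = defaultdict(float)
    for row in cost_rows:
        key = row.get("tags", {}).get(tag, "untagged")
        totals[key] += row["cost_gbp"]
    return dict(totals)


# Made-up rows for illustration.
rows = [
    {"cost_gbp": 1200.0, "tags": {"department": "sales"}},
    {"cost_gbp": 300.0, "tags": {"department": "sales"}},
    {"cost_gbp": 450.0, "tags": {}},
    {"cost_gbp": 50.0},
]
print(showback(rows, "department"))
```

Reporting the `untagged` total alongside the departmental figures also shows how much spend the tagging policy has yet to cover.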
Performance Monitoring Best Practices
Cost management and performance monitoring are two sides of the same coin. Oversized resources waste money and can also mask performance issues, because extra capacity hides problems rather than solving them. Undersized resources save money in the short term but degrade user experience and can cause outages during peak demand.
Establish performance baselines for your key workloads during normal operation. Record metrics like average CPU utilisation, memory consumption, disk latency, and network throughput over a typical week. These baselines provide the reference point against which anomalies can be detected and capacity planning decisions can be made.
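One simple way to use a baseline, shown here as a sketch and not a recommended standard, is to record the mean and standard deviation of a metric over a typical week and flag readings that fall far outside that range. The three-standard-deviation rule and the function names are assumptions chosen for illustration, and the sample values are invented.

```python
from statistics import mean, stdev


def build_baseline(samples: list[float]) -> tuple[float, float]:
    """Return (mean, standard deviation); needs at least two samples."""
    return mean(samples), stdev(samples)


def is_anomalous(value: float, baseline: tuple[float, float], k: float = 3.0) -> bool:
    """True if `value` is more than `k` standard deviations from the mean."""
    avg, sd = baseline
    return abs(value - avg) > k * sd


# Made-up daily average CPU percentages for one typical week.
week = [40, 42, 38, 41, 39, 40, 40]
baseline = build_baseline(week)
print(is_anomalous(75, baseline))
print(is_anomalous(42, baseline))
```

Workloads with strong daily or weekly cycles may need separate baselines per time window, since a single average can hide normal peaks.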
Application Performance Monitoring
Infrastructure metrics alone do not tell the complete performance story. A virtual machine may show healthy CPU and memory utilisation whilst the application running on it delivers a poor user experience due to inefficient database queries, slow third-party API calls, or memory leaks in the application code. Application Insights, integrated with Azure Monitor, bridges this gap by instrumenting your application code to capture detailed performance telemetry.
For UK businesses running web applications on Azure, Application Insights provides an application map that visualises the dependencies between your application components and external services, highlighting where failures and slowdowns are occurring. The smart detection feature uses machine learning to identify anomalies — unusual increases in failure rates, degradation in response times, or unexpected changes in traffic patterns — without requiring you to define explicit thresholds for every possible condition. This is particularly valuable for applications with variable usage patterns, such as e-commerce platforms that experience seasonal peaks around Black Friday or the January sales.
Network Performance Considerations
Network performance is often the overlooked element of Azure monitoring, yet it frequently has the greatest impact on end-user experience. For UK businesses serving customers across the country, the latency between your Azure resources and your users matters enormously. Azure Network Watcher provides tools for diagnosing network issues, including connection monitoring that continuously tests connectivity between resources, and packet capture for detailed network traffic analysis.
Consider implementing Azure Front Door or Azure CDN for customer-facing applications. These services cache content at edge locations closer to your users, reducing latency and improving page load times. They also provide built-in DDoS protection and web application firewall capabilities, addressing both performance and security requirements. For UK businesses with users primarily in the United Kingdom and Europe, hosting in the UK South, UK West, or West Europe regions and pairing them with a CDN helps keep latency low for your primary audience.
Monitoring egress costs is another critical aspect of network performance management. Azure charges for data leaving its network, and these costs can accumulate significantly for applications that serve large files, stream media, or transfer data between regions. Review your network architecture regularly to identify opportunities for reducing egress — caching strategies, compression, and co-locating related resources in the same region can all contribute to lower egress charges without compromising performance.
Monitoring Azure performance and costs is not a set-and-forget exercise. It requires ongoing attention, regular review, and a willingness to act on the insights your monitoring tools provide. The businesses that manage Azure most effectively are those that treat cloud cost management as a continuous discipline rather than a periodic clean-up, and that recognise performance and cost as interconnected aspects of the same challenge.
Need Help Managing Your Azure Environment?
Cloudswitched provides Azure management services for UK businesses, combining performance monitoring, cost optimisation, and security management into a comprehensive cloud operations service. We help you right-size resources, implement Reserved Instances, and maintain performance baselines — so your Azure investment delivers maximum value. Contact us for an Azure cost review.