Installation

Install with CLI (recommended)
gh skills-hub install azure-compute

Don't have the extension? Run gh extension install samueltauil/skills-hub first.

Download and extract to your repository:

.github/skills/azure-compute/

Extract the ZIP to .github/skills/ in your repo. The folder name must match azure-compute for Copilot to auto-discover it.

Skill Files (23)

SKILL.md 5.3 KB
---
name: azure-compute
description: "Azure VM and VMSS router for recommendations, pricing, autoscale, orchestration, connectivity troubleshooting, capacity reservations, and Essential Machine Management. WHEN: Azure VM, VMSS, scale set, recommend, compare, server, website, burstable, lightweight, VM family, workload, GPU, learning, simulation, dev/test, backend, autoscale, load balancer, Flexible orchestration, Uniform orchestration, cost estimate, connect, refused, Linux, black screen, reset password, reach VM, port 3389, NSG, troubleshoot, capacity reservation, CRG, reserve VMs, guarantee capacity, pre-provision capacity, CRG association, CRG disassociation, essential machine management, EMM, machine enrollment."
license: MIT
metadata:
  author: Microsoft
  version: "2.4.2"
---

# Azure Compute Skill

Routes Azure VM requests to the appropriate workflow based on user intent.

## When to Use This Skill

Activate this skill when the user:
- Asks about Azure Virtual Machines (VMs) or VM Scale Sets (VMSS)
- Asks about choosing a VM, VM sizing, pricing, or cost estimates
- Needs a workload-based recommendation for scenarios like database, GPU, deep learning, HPC, web tier, or dev/test
- Mentions VM families, autoscale, load balancing, or Flexible versus Uniform orchestration
- Wants to troubleshoot Azure VM connectivity issues such as unreachable VMs, RDP/SSH failures, black screens, NSG/firewall issues, or credential resets
- Asks about Capacity Reservation Groups (CRGs), reserving VM capacity, associating/disassociating VMs with a CRG, or guaranteeing compute capacity
- Asks about Essential Machine Management (EMM), machine enrollment, onboarding VMs for monitoring/security, or enabling machine management at subscription level
- Uses prompts like "Help me choose a VM"

## Routing

```text
User intent?
├─ Recommend / choose / compare / price a VM or VMSS
│  └─ Route to [VM Recommender](workflows/vm-recommender/vm-recommender.md)
│
├─ Can't connect / RDP / SSH / troubleshoot a VM
│  └─ Route to [VM Troubleshooter](workflows/vm-troubleshooter/vm-troubleshooter.md)
│
├─ Capacity reservation / CRG / reserve capacity / associate VM with CRG
│  └─ Route to [Capacity Reservation](workflows/capacity-reservation/capacity-reservation.md)
│
├─ Essential Machine Management / EMM / machine enrollment
│  └─ Route to [Essential Machine Management](workflows/essential-machine-management/essential-machine-management.md)
│
└─ Unclear
   └─ Ask: "Are you looking for a VM recommendation, troubleshooting a connectivity issue, managing capacity reservations, or enabling Essential Machine Management?"
```

| Signal                                                                        | Workflow                                                                                   |
| ----------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------ |
| "recommend VM", "which VM", "VM size", "VM pricing", "VMSS", "scale set"     | [VM Recommender](workflows/vm-recommender/vm-recommender.md)                               |
| "can't connect", "RDP", "SSH", "NSG blocking", "reset password", "black screen" | [VM Troubleshooter](workflows/vm-troubleshooter/vm-troubleshooter.md)                   |
| "capacity reservation", "CRG", "reserve capacity", "guarantee capacity", "associate VM with CRG" | [Capacity Reservation](workflows/capacity-reservation/capacity-reservation.md) |
| "essential machine management", "EMM", "machine enrollment" | [Essential Machine Management](workflows/essential-machine-management/essential-machine-management.md) |

> **Routing rule:** Always read the matched workflow file before accessing any reference files. The workflow file contains the step-by-step guidance and the reference routing table for the user's request.

## Workflows

| Workflow                  | Purpose                                                  | References                                                                   |
| ------------------------- | -------------------------------------------------------- | ---------------------------------------------------------------------------- |
| **VM Recommender**        | Recommend VM sizes, VMSS, pricing using public APIs/docs | [vm-families](references/vm-families.md), [retail-prices-api](references/retail-prices-api.md), [vmss-guide](references/vmss-guide.md), [vm-quotas](references/vm-quotas.md) |
| **VM Troubleshooter**     | Diagnose and resolve VM connectivity failures (RDP/SSH)  | [cannot-connect-to-vm](workflows/vm-troubleshooter/references/cannot-connect-to-vm.md) |
| **Capacity Reservation**  | Create and manage Capacity Reservation Groups (CRGs)     | [capacity-reservation-overview](workflows/capacity-reservation/references/capacity-reservation-overview.md), [association-disassociation](workflows/capacity-reservation/references/association-disassociation.md) |
| **Essential Machine Management** | Enable and manage EMM for subscription-level VM onboarding | [emm-overview](workflows/essential-machine-management/references/emm-overview.md), [emm-prerequisites](workflows/essential-machine-management/references/emm-prerequisites.md), [emm-enable-flow-portal-guidance](workflows/essential-machine-management/references/emm-enable-flow-portal-guidance.md), [emm-enable-flow](workflows/essential-machine-management/references/emm-enable-flow.md) |

references/
retail-prices-api.md 6.3 KB
# Azure Retail Prices API Guide

The [Azure Retail Prices API](https://learn.microsoft.com/en-us/rest/api/cost-management/retail-prices/azure-retail-prices) is **unauthenticated**: no Azure account or subscription is needed.

## Endpoint

```text
https://prices.azure.com/api/retail/prices
```

Preview version (includes savings plan rates):
```text
https://prices.azure.com/api/retail/prices?api-version=2023-01-01-preview
```

## Querying VM Prices

> **No Azure CLI command exists** for the Retail Prices API. Since the API is unauthenticated, use `curl` (bash) or `Invoke-RestMethod` (PowerShell) directly. The `az rest` command also works but adds no auth benefit.

### Basic VM price lookup

```http
GET https://prices.azure.com/api/retail/prices?$filter=serviceName eq 'Virtual Machines' and armRegionName eq 'eastus' and armSkuName eq 'Standard_D4s_v5' and priceType eq 'Consumption'
```

```bash
curl -s "https://prices.azure.com/api/retail/prices?\$filter=serviceName%20eq%20'Virtual%20Machines'%20and%20armRegionName%20eq%20'eastus'%20and%20armSkuName%20eq%20'Standard_D4s_v5'%20and%20priceType%20eq%20'Consumption'"
```

```powershell
$filter = "serviceName eq 'Virtual Machines' and armRegionName eq 'eastus' and armSkuName eq 'Standard_D4s_v5' and priceType eq 'Consumption'"
$response = Invoke-RestMethod "https://prices.azure.com/api/retail/prices?`$filter=$filter"
$response.Items | Select-Object armSkuName, retailPrice, unitOfMeasure, meterName
```
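
The percent-encoding in the `curl` example can also be generated rather than typed by hand. The following is a minimal Python sketch using only the standard library; the helper name `build_price_url` is hypothetical and not part of any Azure SDK.

```python
from urllib.parse import quote

BASE_URL = "https://prices.azure.com/api/retail/prices"


def build_price_url(region: str, sku: str, price_type: str = "Consumption") -> str:
    """Return a Retail Prices API URL with a percent-encoded OData $filter."""
    odata_filter = (
        "serviceName eq 'Virtual Machines'"
        f" and armRegionName eq '{region}'"
        f" and armSkuName eq '{sku}'"
        f" and priceType eq '{price_type}'"
    )
    # Leave single quotes readable (as in the curl example); encode spaces as %20.
    encoded = quote(odata_filter, safe="'")
    return f"{BASE_URL}?$filter={encoded}"


print(build_price_url("eastus", "Standard_D4s_v5"))
```

The resulting string can be passed to any HTTP client; no authentication header is involved.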

### Filter by family (all D-series in a region)

```http
GET https://prices.azure.com/api/retail/prices?$filter=serviceName eq 'Virtual Machines' and armRegionName eq 'eastus' and contains(armSkuName, 'Standard_D') and priceType eq 'Consumption'
```

```bash
curl -s "https://prices.azure.com/api/retail/prices?\$filter=serviceName%20eq%20'Virtual%20Machines'%20and%20armRegionName%20eq%20'eastus'%20and%20contains(armSkuName,%20'Standard_D')%20and%20priceType%20eq%20'Consumption'"
```

```powershell
$filter = "serviceName eq 'Virtual Machines' and armRegionName eq 'eastus' and contains(armSkuName, 'Standard_D') and priceType eq 'Consumption'"
$response = Invoke-RestMethod "https://prices.azure.com/api/retail/prices?`$filter=$filter"
$response.Items | Select-Object armSkuName, retailPrice, meterName
```

### Reservation pricing

```http
GET https://prices.azure.com/api/retail/prices?$filter=serviceName eq 'Virtual Machines' and armSkuName eq 'Standard_D4s_v5' and priceType eq 'Reservation'
```

```bash
curl -s "https://prices.azure.com/api/retail/prices?\$filter=serviceName%20eq%20'Virtual%20Machines'%20and%20armSkuName%20eq%20'Standard_D4s_v5'%20and%20priceType%20eq%20'Reservation'"
```

```powershell
$filter = "serviceName eq 'Virtual Machines' and armSkuName eq 'Standard_D4s_v5' and priceType eq 'Reservation'"
$response = Invoke-RestMethod "https://prices.azure.com/api/retail/prices?`$filter=$filter"
$response.Items | Select-Object armSkuName, retailPrice, reservationTerm, meterName
```

### Non-USD currency

Append the `currencyCode` parameter:
```http
GET https://prices.azure.com/api/retail/prices?currencyCode='EUR'&$filter=serviceName eq 'Virtual Machines' and armSkuName eq 'Standard_D4s_v5'
```

```bash
curl -s "https://prices.azure.com/api/retail/prices?currencyCode='EUR'&\$filter=serviceName%20eq%20'Virtual%20Machines'%20and%20armSkuName%20eq%20'Standard_D4s_v5'"
```

```powershell
$filter = "serviceName eq 'Virtual Machines' and armSkuName eq 'Standard_D4s_v5'"
$response = Invoke-RestMethod "https://prices.azure.com/api/retail/prices?currencyCode='EUR'&`$filter=$filter"
$response.Items | Select-Object armSkuName, retailPrice, currencyCode, meterName
```

## Available Filters

| Filter          | Example Value                    | Notes                          |
| --------------- | -------------------------------- | ------------------------------ |
| `serviceName`   | `'Virtual Machines'`             | Case-sensitive in preview API  |
| `armRegionName` | `'eastus'`, `'westeurope'`       | ARM region name                |
| `armSkuName`    | `'Standard_D4s_v5'`              | Full ARM SKU name              |
| `priceType`     | `'Consumption'`, `'Reservation'` | Pay-as-you-go vs reserved      |
| `serviceFamily` | `'Compute'`                      | Broad category                 |
| `productName`   | `'Virtual Machines Dv5 Series'`  | Product line                   |
| `meterName`     | `'D4s v5'`, `'D4s v5 Spot'`      | Includes Spot and Low Priority |

> **Warning:** Filter values are **case-sensitive** in API version `2023-01-01-preview` and later.

## Response Fields

| Field                  | Description                                                        |
| ---------------------- | ------------------------------------------------------------------ |
| `armSkuName`           | ARM SKU name (e.g., `Standard_D4s_v5`)                             |
| `retailPrice`          | Microsoft retail price (USD unless overridden)                     |
| `unitOfMeasure`        | Usually `1 Hour` for VMs                                           |
| `armRegionName`        | Region code                                                        |
| `meterName`            | Human-readable meter (includes "Spot" / "Low Priority" variants)   |
| `productName`          | Product line with OS (e.g., "Virtual Machines Dv5 Series Windows") |
| `type`                 | `Consumption`, `Reservation`, or `DevTestConsumption`              |
| `reservationTerm`      | `1 Year` or `3 Years` (reservation only)                           |
| `savingsPlan`          | Array with `term` and `unitPrice` (preview API only)               |
| `isPrimaryMeterRegion` | Filter to `true` to avoid duplicate regional meters                |

## Pagination

The API returns at most 1,000 records per request. Follow `NextPageLink` in the response to retrieve the next page:

```json
{ "NextPageLink": "https://prices.azure.com:443/api/retail/prices?$filter=...&$skip=1000" }
```
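
Following `NextPageLink` until it runs out can be sketched as a small generator. This is an illustrative Python sketch, not an official client: `fetch_json` and `iter_price_items` are hypothetical names, and it assumes `NextPageLink` is null or absent on the last page.

```python
import json
from typing import Callable, Iterator
from urllib.request import urlopen


def fetch_json(url: str) -> dict:
    """Fetch one page over HTTPS (the API needs no authentication)."""
    with urlopen(url) as response:
        return json.load(response)


def iter_price_items(
    first_url: str, fetch: Callable[[str], dict] = fetch_json
) -> Iterator[dict]:
    """Yield every item across all pages by following NextPageLink."""
    url = first_url
    while url:
        page = fetch(url)
        yield from page.get("Items", [])
        # Assumption: the last page has a null or missing NextPageLink.
        url = page.get("NextPageLink")
```

The `fetch` parameter is injectable so the loop can be exercised against canned pages without network access.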

## Tips for Recommendations

1. **Filter Linux vs Windows**: `productName` contains the OS, e.g., `'Virtual Machines Dv5 Series'` (Linux) vs `'Virtual Machines Dv5 Series Windows'`
2. **Use `isPrimaryMeterRegion eq true`** to deduplicate
3. **Compare Consumption + Reservation + Savings Plan** for full cost picture
4. **Monthly estimate**: `retailPrice × 730` (hours/month)
5. **Spot pricing**: Filter `meterName` containing `'Spot'` for discounted interruptible VMs
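
Several of these tips can be applied client-side to items already returned by the API. The sketch below is illustrative only: the function names and sample items are hypothetical, the prices are placeholders rather than real quotes, and it assumes `unitOfMeasure` is `1 Hour`.

```python
HOURS_PER_MONTH = 730  # hours/month figure from the tips above


def is_linux_on_demand(item: dict) -> bool:
    """True for primary-region Linux meters that are neither Spot nor Low Priority."""
    meter = item.get("meterName", "")
    return bool(
        item.get("isPrimaryMeterRegion")
        and not item.get("productName", "").endswith("Windows")
        and "Spot" not in meter
        and "Low Priority" not in meter
    )


def monthly_estimate(item: dict) -> float:
    """Scale the hourly retailPrice to a 730-hour month."""
    return round(item["retailPrice"] * HOURS_PER_MONTH, 2)


# Hypothetical sample items; prices are placeholders, not real quotes.
sample_items = [
    {"productName": "Virtual Machines Dv5 Series", "meterName": "D4s v5",
     "retailPrice": 0.25, "isPrimaryMeterRegion": True},
    {"productName": "Virtual Machines Dv5 Series Windows", "meterName": "D4s v5",
     "retailPrice": 0.5, "isPrimaryMeterRegion": True},
    {"productName": "Virtual Machines Dv5 Series", "meterName": "D4s v5 Spot",
     "retailPrice": 0.05, "isPrimaryMeterRegion": True},
]

for item in filter(is_linux_on_demand, sample_items):
    print(item["meterName"], monthly_estimate(item))
```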
vm-families.md 5.7 KB
# VM Family Guide

Select a VM family by matching the user's workload to the right category. Families describe hardware intent, not individual SKUs.

> **Source**: [Azure VM sizes overview](https://learn.microsoft.com/en-us/azure/virtual-machines/sizes/overview)
>
> **Note:** This reference may become stale. Before making final recommendations, verify critical specifications (especially Spot VM support, newer series availability, and specific family capabilities) by fetching the relevant learn.microsoft.com documentation.

## Family Selection Table

| Workload                             | Family                | Series                             | Quota Resource Name                       | Why                                                   |
| ------------------------------------ | --------------------- | ---------------------------------- | ----------------------------------------- | ----------------------------------------------------- |
| Web servers, dev/test, microservices | **General Purpose**   | D-series (Dsv5, Ddsv5, Dasv5)      | `standardDSv5Family` / `standardDDSv5Family` | Balanced CPU:memory ratio                             |
| Burstable / intermittent loads       | **General Purpose**   | B-series (Bsv2, Basv2)             | `standardBsv2Family` / `standardBasv2Family` | Low baseline CPU, credits for bursts; cheapest option |
| CI/CD, batch, gaming servers         | **Compute Optimized** | F-series (Fsv2, Fasv6)             | `standardFSv2Family`                      | High CPU:memory ratio                                 |
| Relational DBs, in-memory caches     | **Memory Optimized**  | E-series (Esv5, Edsv5, Easv5)      | `standardESv5Family` / `standardEDSv5Family` | High memory:CPU ratio                                 |
| SAP HANA, very large DBs             | **Memory Optimized**  | M-series (Msv3, Mdsv3)             | `standardMSMediumMemoryv3Family`          | Extreme memory (up to 4 TB)                           |
| Big Data, NoSQL, data warehousing    | **Storage Optimized** | L-series (Lsv3, Lasv3)             | `standardLSv3Family`                      | High disk throughput and IOPS                         |
| ML training, inference, rendering    | **GPU**               | NC-series (NCadsH100v5, NCasT4v3)  | `StandardNCadsH100v5Family`               | NVIDIA GPU compute                                    |
| Large-scale AI/ML training           | **GPU**               | ND-series (ND_MI300X_v5, NDH100v5) | `standardNDSH100v5Family`                 | Multi-GPU, high memory                                |
| Virtual desktop, cloud gaming        | **GPU**               | NV-series (NVadsA10v5)             | `StandardNVADSA10v5Family`                | GPU graphics/visualization                            |
| Cloud gaming, VDI (AMD GPU)          | **GPU**               | NG-series (NGadsV620v1)            | `StandardNGADSV620v1Family`               | AMD Radeon GPU; cost-effective graphics               |
| Confidential workloads               | **Confidential**      | DC-series (DCasv5, DCadsv5)        | `standardDCASv5Family`                    | Hardware-based TEE isolation                          |
| Confidential + encrypted memory      | **Confidential**      | EC-series (ECasv5, ECadsv5)        | `standardECASv5Family`                    | TEE isolation with memory encryption                  |
| CFD, weather simulation, FEA         | **HPC**               | HB/HC-series (HBv4, HBv5)          | `standardHBv4Family` / `standardHBv5Family` | InfiniBand, high memory bandwidth                     |
| EDA, large memory HPC                | **HPC**               | HX-series                          | `standardHXFamily`                        | Very large memory capacity                            |

> ⚠️ **Do not normalize quota name casing.** The mixed casing (e.g., `standard` vs `Standard`) matches the exact values returned by `az vm list-usage`. Changing them will break quota lookups.

## Decision Tree

```text
Workload needs GPU?
├─ Yes → training/inference? → NC/ND-series
│        visualization/VDI?  → NV/NG-series
├─ No
│  ├─ Confidential computing? → DC/EC-series
│  ├─ HPC (MPI, InfiniBand)? → HB/HC/HX-series
│  ├─ High disk I/O (NoSQL, warehousing)? → L-series
│  ├─ Memory-heavy (DB, cache, SAP)?
│  │  ├─ Extreme (>1 TB RAM) → M-series
│  │  └─ Standard → E-series
│  ├─ CPU-heavy (batch, CI/CD)? → F-series
│  ├─ Burstable / dev-test? → B-series
│  └─ Balanced / general web → D-series
```
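
The decision tree reads naturally as an ordered chain of checks. The following Python sketch mirrors the branch order above; the `Workload` fields and the `pick_series` function are hypothetical names introduced for illustration, and the `gpu` values `"compute"` and `"graphics"` are assumed labels for the two GPU branches.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Workload:
    gpu: Optional[str] = None  # "compute" (training/inference) or "graphics" (visualization/VDI)
    confidential: bool = False
    hpc: bool = False
    high_disk_io: bool = False
    memory_heavy: bool = False
    ram_tb: float = 0.0
    cpu_heavy: bool = False
    burstable: bool = False


def pick_series(w: Workload) -> str:
    """Walk the decision tree top to bottom and return the first matching series."""
    if w.gpu == "compute":
        return "NC/ND-series"
    if w.gpu == "graphics":
        return "NV/NG-series"
    if w.confidential:
        return "DC/EC-series"
    if w.hpc:
        return "HB/HC/HX-series"
    if w.high_disk_io:
        return "L-series"
    if w.memory_heavy:
        # Extreme memory (>1 TB RAM) goes to M-series, otherwise E-series.
        return "M-series" if w.ram_tb > 1 else "E-series"
    if w.cpu_heavy:
        return "F-series"
    if w.burstable:
        return "B-series"
    return "D-series"


print(pick_series(Workload(memory_heavy=True, ram_tb=2)))
```

Order matters: a workload that is both confidential and memory-heavy resolves to DC/EC-series because that branch is checked first, as in the tree.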

## Key Trade-offs

| Choice                    | Pro                              | Con                                            |
| ------------------------- | -------------------------------- | ---------------------------------------------- |
| B-series (burstable)      | Lowest cost                      | Throttled when credits exhausted               |
| AMD (`a` suffix) vs Intel | ~5–15% cheaper                   | Some workloads assume Intel extensions         |
| ARM (`p` suffix, Cobalt)  | Best price-performance for Linux | Windows not supported; check app compatibility |
| Previous-gen (v4, v3)     | Sometimes cheaper                | Not recommended for new deployments            |
| Spot VMs                  | Up to 90% discount               | Can be evicted with 30s notice                 |

## Naming Convention

`Standard_<Family><Subfamily?><vCPUs><Features>_<Version>`

| Letter | Meaning                   |
| ------ | ------------------------- |
| `a`    | AMD CPU                   |
| `p`    | ARM (Cobalt/Ampere) CPU   |
| `d`    | Local temp disk           |
| `s`    | Premium SSD capable       |
| `l`    | Low memory per core       |
| `i`    | Isolated (dedicated host) |
| `b`    | Block storage perf        |

Example: `Standard_D4as_v5` → D-family, AMD, 4 vCPUs, premium SSD, version 5.
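
The naming convention can be decoded mechanically for simple SKU names. This Python sketch is deliberately simplified: `parse_sku` and `ParsedSku` are hypothetical names, family and subfamily letters are kept together, and names that do not fit the plain `Standard_<letters><vCPUs><features>_v<N>` shape (for example, ones with an accelerator segment or no version) are rejected rather than guessed at.

```python
import re
from typing import List, NamedTuple

# Feature letters from the table above.
FEATURES = {
    "a": "AMD CPU",
    "p": "ARM (Cobalt/Ampere) CPU",
    "d": "Local temp disk",
    "s": "Premium SSD capable",
    "l": "Low memory per core",
    "i": "Isolated (dedicated host)",
    "b": "Block storage perf",
}

SKU_PATTERN = re.compile(r"^Standard_([A-Z]+)(\d+)([a-z]*)_v(\d+)$")


class ParsedSku(NamedTuple):
    family: str
    vcpus: int
    features: List[str]
    version: int


def parse_sku(name: str) -> ParsedSku:
    """Split a simple SKU name into family, vCPU count, features, and version."""
    match = SKU_PATTERN.match(name)
    if match is None:
        raise ValueError(f"SKU name not covered by this simplified pattern: {name}")
    family, vcpus, letters, version = match.groups()
    features = [FEATURES.get(letter, f"unknown ({letter})") for letter in letters]
    return ParsedSku(family, int(vcpus), features, int(version))


print(parse_sku("Standard_D4as_v5"))
```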
vm-quotas.md 4.1 KB
# VM Quota Validation Guide

Check Azure VM/VMSS quota availability before recommending or deploying, to confirm that the subscription and region have sufficient vCPU capacity.

> ⚠️ **NEVER use the `azure-quota` MCP server as as It is unreliable.** Always try `az quota` CLI commands first.

## Quota Structure

VM quotas are tracked at **two levels** under `Microsoft.Compute`:

| Quota Level | Resource Name | What It Limits |
|---|---|---|
| **Total Regional** | `cores` | All vCPUs across all families in a region |
| **Per-Family** | e.g., `standardDSv3Family` | vCPUs for a specific VM family |

> ⚠️ **Both levels must have capacity.** A deployment fails if either is exceeded.

### Common Quota Resource Names

See [vm-families.md](./vm-families.md) for quota resource names per VM family. Use `az quota list` to discover names not listed there.

> ⚠️ **Do NOT guess quota names from SKU names.** Use `az quota list` to discover correct resource names.

## Quota Check Workflow

### Option A: `az vm list-usage` (Recommended for VM quotas)

No extension required. Returns **both current usage and limit in a single call** for all VM families in a region, equivalent to running `az quota usage show` and `az quota list` together for VM vCPU quotas.

```bash
# All VM family quotas in a region
az vm list-usage --location <region> -o table

# Filter to a specific family
az vm list-usage --location <region> --query "[?contains(name.value,'<quotaName>')].{Name:name.localizedValue, QuotaName:name.value, Current:currentValue, Limit:limit}" -o table
```

> 💡 **Tip:** `az vm list-usage` is the simplest way to check VM quotas. Use `az quota` (Option B) when you need to **request quota increases** or manage quotas for non-VM resource types.

### Option B: `az quota` CLI (For quota increases or non-VM resources)

Prerequisite: `az extension add --name quota`

| Step | Command | Purpose |
|---|---|---|
| 1. Discover names | `az quota list --scope /subscriptions/<sub-id>/providers/Microsoft.Compute/locations/<region> -o table` | Find quota resource name for the VM family |
| 2. Check usage | `az quota usage show --resource-name <name> --scope ...` | Current vCPU consumption |
| 3. Check limit | `az quota show --resource-name <name> --scope ...` | Maximum allowed vCPUs |
| 4. Check regional | Repeat steps 2–3 with `--resource-name cores` | Total regional vCPU cap |

### Calculate Capacity

```text
Available = Limit - Current Usage  (check both family AND regional)
vCPUs Needed = vCPUs per VM × Instance Count

✅ Deploy if: vCPUs Needed ≤ min(Family Available, Regional Available)
❌ Blocked if: either is exceeded
```

**Example:** 3× `Standard_D4s_v5` (4 vCPUs each) = 12 needed. Family: 100 − 40 = 60 ✅. Regional: 350 − 280 = 70 ✅.
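
The capacity rule can be expressed as a single comparison. A minimal Python sketch follows; the function name `can_deploy` and its parameter names are hypothetical, and the numbers are the ones from the example above.

```python
def can_deploy(
    vcpus_per_vm: int,
    instance_count: int,
    family_limit: int,
    family_used: int,
    regional_limit: int,
    regional_used: int,
) -> bool:
    """Return True when both the family and the regional quota can absorb the request."""
    needed = vcpus_per_vm * instance_count
    family_available = family_limit - family_used
    regional_available = regional_limit - regional_used
    return needed <= min(family_available, regional_available)


# 3 x Standard_D4s_v5 (4 vCPUs each): family 100 - 40 = 60, regional 350 - 280 = 70
print(can_deploy(4, 3, family_limit=100, family_used=40, regional_limit=350, regional_used=280))
```

The limit and usage values would come from the `Limit` and `Current` columns of `az vm list-usage` for the family row and the `cores` row.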

## Handling Insufficient Quota

| Option | Action |
|---|---|
| **Request increase** | `az quota update --resource-name <name> --scope ... --limit-object value=<new-limit> --resource-type dedicated`. Most increases auto-approve within minutes. |
| **Try different region** | Run the quota check workflow against alternative regions to find available capacity |
| **Switch VM family** | Recommend an alternative family with quota (e.g., D-series full → Dads v5 AMD variant) |

## VMSS Considerations

For scale sets, validate against **autoscale maximum**: `vCPUs per VM × Max Instance Count`.

| Autoscale Setting | vCPUs to Validate |
|---|---|
| Fixed count (5 instances) | vCPUs × 5 |
| Autoscale min=2, max=10 | vCPUs × 10 |

## Error Reference

| Error | Cause | Action |
|---|---|---|
| `QuotaExceeded` | Family vCPU limit reached | Request increase or change family/region |
| `OperationNotAllowed` | Subscription lacks capacity | Request quota increase |
| `cores` limit hit | Regional vCPUs exhausted | Request regional increase |
| CLI commands fail entirely | Auth/extension issue | Use MCP fallback (see below) |

## Related Resources

- Invoke the **azure-quotas** skill for complete quota CLI workflows across all Azure providers
- [VM Family Guide](vm-families.md): family-to-workload mapping
- [Azure VM quotas documentation](https://learn.microsoft.com/en-us/azure/virtual-machines/quotas)
vmss-guide.md 6.2 KB
# VMSS Guide

Determine when to recommend a Virtual Machine Scale Set (VMSS) over a single VM, and which VMSS configuration to suggest.

> **Note:** This reference provides quick guidance but may become stale. Always verify VMSS features, limitations, and orchestration mode capabilities by fetching the latest documentation from:
> - https://learn.microsoft.com/en-us/azure/virtual-machine-scale-sets/overview
> - https://learn.microsoft.com/en-us/azure/virtual-machine-scale-sets/virtual-machine-scale-sets-autoscale-overview
> - https://learn.microsoft.com/en-us/azure/virtual-machine-scale-sets/orchestration-modes-api-comparison

## What Is a VM Scale Set?

A VMSS creates and manages a group of load-balanced, identically configured VM instances. Key capabilities:

- **Autoscale**: automatically add/remove instances based on metrics or schedules
- **High availability**: spread instances across fault domains and Availability Zones
- **Load balancing**: integrate with Azure Load Balancer (L4) or Application Gateway (L7)
- **Large scale**: up to 1,000 instances per scale set (marketplace images)
- **No extra cost**: you pay only for the underlying VM instances, storage, and networking

## When to Recommend VMSS vs Single VM

| Scenario                                              | Recommend | Reasoning                                   |
| ----------------------------------------------------- | --------- | ------------------------------------------- |
| Stateless web/API behind a load balancer              | VMSS      | Homogeneous fleet, autoscale on demand      |
| Batch or parallel compute jobs                        | VMSS      | Scale out for jobs, scale to zero when idle |
| Autoscale needed (CPU, queue depth, schedule)         | VMSS      | Built-in autoscale rules                    |
| Microservices with identical replicas                 | VMSS      | Consistent config, rolling updates          |
| High availability across zones (many instances)       | VMSS      | Automatic zone distribution                 |
| Single long-lived server (jumpbox, domain controller) | VM        | No scaling benefit; simpler config          |
| Unique per-instance configuration                     | VM        | Scale sets assume identical instances       |
| Quick proof of concept or dev/test                    | VM        | Faster to stand up, lower complexity        |

## Orchestration Modes

VMSS supports two orchestration modes. **Flexible** is recommended for all new workloads.

| Feature                  | Flexible (recommended) | Uniform (legacy) |
| ------------------------ | ---------------------- | ---------------- |
| Mix VM sizes in one set  | ✅ Yes | ❌ No |
| Add existing VMs to set  | ✅ Yes | ❌ No |
| Availability Zone spread | ✅ Automatic | ✅ Automatic |
| Fault domain control     | ✅ Yes | ✅ Yes |
| Max instances            | 1,000 | 1,000 |
| Spot instances           | ✅ Yes | ✅ Yes |
| Single-instance VMSS     | ✅ Yes | ❌ No |
| VM model updates         | Automatic, Manual, Rolling | Automatic, Manual, Rolling |

> **Warning:** Orchestration mode cannot be changed after creation. Always recommend Flexible unless the user has a specific Uniform requirement.

## Autoscale Patterns

| Pattern            | Trigger                                  | Example                                                      |
| ------------------ | ---------------------------------------- | ------------------------------------------------------------ |
| **Metric-based**   | CPU, memory, queue length, custom metric | Scale out when avg CPU > 70% for 5 min                       |
| **Schedule-based** | Time of day, day of week                 | Scale to 10 instances Mon–Fri 8 AM; scale down to 2 at night |
| **Combined**       | Metric + schedule together               | Baseline schedule with metric burst capacity                 |
| **Predictive**     | ML-forecasted demand (preview)           | Pre-scale before expected traffic spike                      |

### Autoscale Best Practices

- Set a **minimum instance count ≥ 2** for production HA
- Use a **cool-down period** (default 5 min) to avoid flapping
- Scale out aggressively, scale in conservatively (asymmetric rules)
- Monitor with [Azure Monitor autoscale diagnostics](https://learn.microsoft.com/en-us/azure/azure-monitor/autoscale/autoscale-best-practices)

## Networking

| Component               | When to Use                                                              |
| ----------------------- | ------------------------------------------------------------------------ |
| **Azure Load Balancer** | Layer-4 (TCP/UDP) traffic distribution; most common for backend services |
| **Application Gateway** | Layer-7 (HTTP/HTTPS) with TLS termination, URL routing, WAF              |
| **No load balancer**    | Batch/HPC jobs where instances pull work from a queue                    |

## Cost Estimation Tips

- VMSS itself is **free**; cost is the sum of per-instance VM pricing
- Estimate at **min** and **max** instance counts for autoscale budgets
- Use **Spot instances** in VMSS for up to 90% savings on interruptible workloads
- Combine with **Reservations** or **Savings Plans** on the baseline instance count
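
Estimating at the minimum and maximum instance counts gives a budget range for an autoscaling set. The Python sketch below is illustrative: `monthly_budget_range` is a hypothetical helper, the hourly price is a placeholder rather than a real quote, and 730 hours per month is the figure used in the Retail Prices API guide.

```python
from typing import Tuple

HOURS_PER_MONTH = 730


def monthly_budget_range(
    hourly_price: float, min_instances: int, max_instances: int
) -> Tuple[float, float]:
    """Return (floor, ceiling) monthly compute cost for an autoscaling scale set."""
    floor = hourly_price * min_instances * HOURS_PER_MONTH
    ceiling = hourly_price * max_instances * HOURS_PER_MONTH
    return round(floor, 2), round(ceiling, 2)


# Placeholder price of 0.25 per hour, autoscale min=2, max=10
print(monthly_budget_range(0.25, min_instances=2, max_instances=10))
```

The floor is the portion that a Reservation or Savings Plan on the baseline instance count would apply to; storage and networking are not included.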

## Key VMSS Limits

| Limit                                  | Value                              |
| -------------------------------------- | ---------------------------------- |
| Max instances per scale set            | 1,000 (marketplace/gallery images) |
| Max instances (managed image)          | 600                                |
| Scale sets per subscription per region | 2,500                              |
| Scale operations concurrency           | Up to 1,000 VMs in a single batch  |

## Further Reading

- [VMSS orchestration modes](https://learn.microsoft.com/en-us/azure/virtual-machine-scale-sets/virtual-machine-scale-sets-orchestration-modes)
- [Autoscale best practices](https://learn.microsoft.com/en-us/azure/azure-monitor/autoscale/autoscale-best-practices)
- [VMSS networking](https://learn.microsoft.com/en-us/azure/virtual-machine-scale-sets/virtual-machine-scale-sets-networking)
- [VMSS Flexible portal quickstart](https://learn.microsoft.com/en-us/azure/virtual-machine-scale-sets/flexible-virtual-machine-scale-sets-portal)
workflows/capacity-reservation/
capacity-reservation.md 7.8 KB
# Azure Capacity Reservation

Helps users create and configure Azure Capacity Reservation Groups (CRGs) to guarantee VM compute capacity in a specific region without deploying VMs.

## Reference Files

Read these before responding to the user:

| Signal                                           | Reference                                                                    |
|--------------------------------------------------|------------------------------------------------------------------------------|
| General CRG concepts, CLI commands, finding CRGs | [Capacity Reservation Overview](references/capacity-reservation-overview.md) |
| Associate/disassociate VM or VMSS with a CRG     | [Association & Disassociation](references/association-disassociation.md)     |

## When to Use This Workflow

Activate this workflow when the user explicitly asks about Capacity Reservation Groups (CRGs) or capacity reservations.

Also **proactively suggest** CRG when the user's scenario matches any of these patterns:

- **Deployment failure is unacceptable**: disaster recovery, customer-facing services, or mission-critical workloads where capacity unavailability would cause an outage
- **Known scale-out events**: product launches, seasonal traffic spikes, or planned migrations where capacity must be guaranteed ahead of time
- **In-demand SKUs**: GPU, high-memory, or new/popular VM sizes that are frequently capacity-constrained
- **Specific SKU + zone + region required**: the workload cannot fall back to a different size, zone, or region
- **Centralized capacity pooling**: capacity is being managed centrally across multiple subscriptions (CRGs support cross-subscription sharing)

> **Note:** CRGs are typically used for critical workloads only, not all deployments. They are SLA-backed but billed at pay-as-you-go rates whether capacity is consumed or not.

## Key Concepts

| Concept                           | Description                                                                                                      |
|-----------------------------------|------------------------------------------------------------------------------------------------------------------|
| **Capacity Reservation Group**    | A logical container that holds one or more capacity reservations; must be associated with VMs at deployment time |
| **Capacity Reservation**          | A reservation for a specific VM size and quantity in a specific Availability Zone                                |
| **Scope**                         | CRGs are scoped to a single Azure region and subscription                                                        |
| **Billing**                       | Charges begin as soon as the reservation is created, whether or not VMs are deployed against it                  |

## Workflow

### Step 1: Gather Requirements

Ask the user for (infer when possible, except where noted):

| Requirement              | Required | Notes                                                                                                                                                                                                  |
|--------------------------|----------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| **Region**               | Yes      | Infer from context if possible (e.g., eastus, westeurope)                                                                                                                                              |
| **VM size(s)**           | Yes      | e.g., Standard_D4s_v5, Standard_E8s_v5                                                                                                                                                                 |
| **Quantity**             | Yes      | **Always ask; do not infer.**                                                                                                                                                                          |
| **Availability Zone(s)** | No       | CRGs can be created without zones. Only include zones if the user explicitly requests a zonal reservation. **Do not pick a zone on the user's behalf** unless they explicitly ask for any/random zone  |
| **Resource group**       | Yes      | Existing or new resource group name                                                                                                                                                                    |

### Step 2: Create Capacity Reservation Group and Reservation

> ⚠️ **PowerShell users:** Replace `\` line continuations with backticks (`` ` ``) or collapse commands to a single line.

```bash
# Create the CRG
# Zonal (specify one or more zones the group will support):
az capacity reservation group create \
  -g <resource-group> \
  -n <crg-name> \
  -l <region> \
  --zones 1 2 3

# Non-zonal (omit --zones for regional-only reservations):
az capacity reservation group create \
  -g <resource-group> \
  -n <crg-name> \
  -l <region>

# Create the reservation
# If the CRG is zonal, specify --zone matching one of the group's zones.
# If the CRG is non-zonal, omit --zone.
az capacity reservation create \
  -g <resource-group> \
  -c <crg-name> \
  -n <reservation-name> \
  --sku <vm-size> \
  --capacity <quantity> \
  --zone <zone>            # omit if CRG is non-zonal
```

### Step 3: Verify Reservation

```bash
az capacity reservation show \
  -g <resource-group> \
  -c <crg-name> \
  -n <reservation-name> \
  --query "{name:name, sku:sku.name, capacity:sku.capacity, provisioningState:provisioningState}"
```

### Step 4: Offer Next Steps

- Associate VMs or VMSS with the Capacity Reservation Group at deployment time
- See [Capacity Reservation Overview](references/capacity-reservation-overview.md) for detailed guidance

## Managing Existing Reservations

For operations beyond creation, see the relevant reference:

- **Associate a VM or VMSS** with a CRG: see [Association & Disassociation](references/association-disassociation.md)
- **Disassociate a VM or VMSS** from a CRG: see [Association & Disassociation](references/association-disassociation.md)
- **Find a matching CRG** for a VM, or enumerate all reservations/groups: see [Finding Valid CRGs](references/capacity-reservation-overview.md#finding-valid-crgs-for-a-vm)

## Error Handling

| Scenario                             | Action                                                                                                                                                                                        |
|--------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| SKU not available in region/zone     | Run `az vm list-skus --location <region> --size <vm-size> --resource-type virtualMachines -o table`. Suggest alternatives from output                                                         |
| Quota exceeded                       | Use the **azure-quotas** skill to check usage and request an increase                                                                                                                         |
| Insufficient platform capacity       | Azure lacks physical hardware in the region/zone. Suggest a different zone, region, or VM size                                                                                                |
| Duplicate SKU + zone in CRG          | Only one reservation per VM size per zone (or per size if non-zonal) is allowed in a CRG. Update the existing reservation's capacity instead                                                  |
workflows/capacity-reservation/references/
association-disassociation.md 3.8 KB
# Associating and Disassociating VMs/VMSS with a Capacity Reservation Group

## Association Model

```text
Capacity Reservation Group (CRG)
├── Capacity Reservation: Standard_D4s_v5 × 5 (Zone 1)
├── Capacity Reservation: Standard_D4s_v5 × 3 (Zone 2)
└── Capacity Reservation: Standard_E8s_v5 × 2 (Zone 1)

VM / VMSS
└── capacityReservationGroup.id = <CRG resource ID>
    └── Azure auto-matches to a reservation with the right VM size + zone
```
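
The size + zone matching rule in the diagram can be illustrated locally. This is only a sketch of the rule, not Azure's actual allocation logic: the helper name `find_matching_reservation` and the whitespace-separated input format are assumptions made for the example.

```bash
# Illustration only: applies the "same VM size + same zone" rule to a local list.
# Input on stdin: one reservation per line as "<vm-size> <zone> <capacity>".
# Prints the matching line(s); returns non-zero when nothing matches.
find_matching_reservation() {
  local size="$1" zone="$2"
  awk -v s="$size" -v z="$zone" \
    '$1 == s && $2 == z { print; found = 1 } END { exit !found }'
}
```

Fed the three reservations from the diagram, a `Standard_D4s_v5` VM in zone 2 would match only the second reservation, while a `Standard_E8s_v5` VM in zone 2 would match none.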

### Associating VMs

Set the `capacityReservationGroup` property when creating or updating a VM.

#### New VM

```bash
az vm create \
  -g <rg> \
  -n <vm-name> \
  --image <image> \
  --size Standard_D4s_v5 \
  --zone 1 \
  --capacity-reservation-group <crg-id>
```

#### Existing VM

Zonal VMs can be associated while running:

```bash
az vm update -g <rg> -n <vm-name> --capacity-reservation-group <crg-id>
```

Regional VMs (no zone) must be deallocated first:

```bash
az vm deallocate -g <rg> -n <vm-name>
az vm update -g <rg> -n <vm-name> --capacity-reservation-group <crg-id>
az vm start -g <rg> -n <vm-name>
```

### Associating VMSS

```bash
az vmss create \
  -g <rg> \
  -n <vmss-name> \
  --image <image> \
  --vm-sku Standard_D4s_v5 \
  --instance-count 5 \
  --zones 1 \
  --capacity-reservation-group <crg-id>
```

Existing VMSS can be associated using `az vmss update` similarly to VMs. Regional VMSS must be deallocated first. Zonal VMSS can be associated without deallocating, but this is currently a [Preview feature](https://learn.microsoft.com/en-us/azure/virtual-machines/capacity-reservation-associate-virtual-machine-scale-set).

## Disassociating from a Capacity Reservation Group

Both the VM/VMSS and the underlying capacity reservation logically occupy capacity. Azure imposes constraints to avoid ambiguous allocation states, so you cannot simply remove the association while resources are running against it.

There are three ways to disassociate. The commands below use `az vm`; for VMSS, substitute `az vmss` and add `az vmss update-instances --instance-ids "*"` as a final step when using a **Manual** upgrade policy.

### Option 1: Deallocate, then remove association

Best when the VM/VMSS can tolerate downtime.

```bash
az vm deallocate -g <rg> -n <vm-name>
az vm update -g <rg> -n <vm-name> --capacity-reservation-group None
az vm start -g <rg> -n <vm-name>   # optional
```

### Option 2: Set reserved quantity to zero, then remove association

Best when the VM/VMSS cannot be deallocated and the reservation is no longer needed.

```bash
az capacity reservation update \
  -g <rg> --capacity-reservation-group <crg> \
  -n <reservation-name> --capacity 0
az vm update -g <rg> -n <vm-name> --capacity-reservation-group None
```

### Option 3: Delete the VM/VMSS

Deleting the resource automatically removes the association. Some latency may occur before the capacity reservation allocation state updates.

### VMSS Upgrade Policy Behavior

| Policy        | Behavior                                                               |
|---------------|------------------------------------------------------------------------|
| **Automatic** | Instances update automatically; no further action needed               |
| **Rolling**   | Instances update in batches with an optional pause between them        |
| **Manual**    | You must run `az vmss update-instances --instance-ids "*"` per update  |

## Learn More

- [Associate a VM to a Capacity Reservation Group](https://learn.microsoft.com/en-us/azure/virtual-machines/capacity-reservation-associate-vm)
- [Remove/disassociate a VM from a Capacity Reservation Group](https://learn.microsoft.com/en-us/azure/virtual-machines/capacity-reservation-remove-vm)
- [Remove/disassociate a VMSS from a Capacity Reservation Group](https://learn.microsoft.com/en-us/azure/virtual-machines/capacity-reservation-remove-virtual-machine-scale-set)
capacity-reservation-overview.md 5.9 KB
# Capacity Reservation Overview

Reference material for Azure Capacity Reservation Groups and Capacity Reservations.

## What Is a Capacity Reservation Group?

A Capacity Reservation Group (CRG) is a logical container for one or more capacity reservations. It acts as the association point for VMs and VMSS: you associate a VM or scale set with the **group**, and Azure matches the VM to a suitable reservation within that group.

## Constraints

| Constraint                     | Detail                                                                                                     |
|--------------------------------|------------------------------------------------------------------------------------------------------------|
| **Region-scoped**              | A CRG and all its reservations must be in the same Azure region                                            |
| **Zone-specific**              | Each reservation targets a specific Availability Zone (or is non-zonal)                                    |
| **Subscription-scoped**        | A CRG lives in a single subscription but can be shared with other subscriptions via the `sharing` property |
| **VM size per reservation**    | Each capacity reservation covers exactly one VM size                                                       |
| **Billing starts immediately** | You are charged for reserved capacity whether or not VMs are running against it                            |

## Association and Disassociation

See [association-disassociation.md](association-disassociation.md) for how to associate and disassociate VMs/VMSS with a CRG.

## Common CLI Commands

| Action                      | Command                                                                                                     |
|-----------------------------|-------------------------------------------------------------------------------------------------------------|
| List CRGs                   | `az capacity reservation group list`                                                                        |
| Show CRG                    | `az capacity reservation group show -g <rg> -n <crg> --instance-view`                                       |
| Delete CRG                  | `az capacity reservation group delete -g <rg> -n <crg>`                                                     |
| List reservations           | `az capacity reservation list -g <rg> --capacity-reservation-group <crg>`                                   |
| Update reservation quantity | `az capacity reservation update -g <rg> --capacity-reservation-group <crg> -n <res> --capacity <new-count>` |
| Delete reservation          | `az capacity reservation delete -g <rg> --capacity-reservation-group <crg> -n <res>`                        |

## Finding Valid CRGs for a VM

To associate a VM with a CRG, the CRG must contain a capacity reservation that matches the VM's **size**, **region**, and **zone** (if zonal). While `az capacity reservation group list` can enumerate CRGs at the subscription level, filtering down to matching reservations across many groups is inefficient. Azure Resource Graph is recommended for cross-resource-group discovery.

### Option 1: Azure Resource Graph (recommended)

ARG can query all capacity reservations across resource groups in a single call, filtering by location, VM size, and zone. This is the most efficient approach.

> ⚠️ **Prerequisite:** `az extension add --name resource-graph`

You must collapse this query to a **single line** before running it:

```bash
az graph query -q "
  Resources
  | where type =~ 'Microsoft.Compute/capacityReservationGroups/capacityReservations'
  | where location =~ '<region>'
  | where properties.provisioningState =~ 'Succeeded'
  | where sku.name =~ '<vm-size>'
  | project id,
            crgId = extract('(.*)/capacityReservations', 1, id),
            resourceGroup,
            zones,
            size = sku.name,
            capacity = coalesce(sku.capacity, 0),
            associationCount = coalesce(array_length(properties.virtualMachinesAssociated), 0),
            location
" --query "data[]" -o table
```

The `crgId` in the output is the parent Capacity Reservation Group resource ID. This is the value to use when associating a VM or VMSS.

To further narrow results for zonal VMs, add a zone filter:

```kql
| where zones has '<zone>'
```
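
Because the query has to be collapsed to a single line, a small shell helper can do the joining. This is a minimal sketch assuming a shell with `tr` and `sed` available; `collapse_query` is a hypothetical name, not part of the Azure CLI.

```bash
# Hypothetical helper (not part of the Azure CLI): reads a multi-line KQL
# query on stdin, joins it into one line, and squeezes repeated whitespace.
collapse_query() {
  tr '\n' ' ' | sed -E 's/[[:space:]]+/ /g; s/^ //; s/ $//'
}
```

With the query saved to a file (say, a hypothetical `query.kql`), the result could be passed along as `az graph query -q "$(collapse_query < query.kql)" --query "data[]" -o table`.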

### Option 2: CLI enumeration

If ARG is unavailable, list CRGs per resource group and inspect their reservations:

```bash
# List all CRGs
az capacity reservation group list -o table

# List reservations within a CRG and check for matching size/capacity
az capacity reservation list \
  -g <rg> \
  --capacity-reservation-group <crg-name> \
  --query "[?sku.name=='<vm-size>'].{name:name, size:sku.name, capacity:sku.capacity, zones:zones}" \
  -o table
```

## Estimating Reservation Cost

Capacity reservations are billed at the same pay-as-you-go rate as the underlying VM size, whether or not VMs are running against them. Use the [Retail Prices API guide](../../../references/retail-prices-api.md) (unauthenticated) to look up hourly rates.

**Estimated monthly cost:** `quantity × hourly rate × 730`

> ⚠️ Prices returned are **estimates based on current retail pay-as-you-go rates**, not a final cost or contractual commitment. Actual charges may vary due to taxes, discounts (Reserved Instances, Savings Plans), or price changes.
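
The monthly estimate formula can be worked through in the shell. The sketch below assumes `awk` is available; the function name `estimate_monthly_cost` and the hourly rate of 0.192 are made up for illustration and are not real Azure prices.

```bash
# Estimated monthly cost = quantity x hourly rate x 730 hours.
# Hypothetical helper; the hourly rate must come from a real price lookup.
estimate_monthly_cost() {
  local quantity="$1" hourly_rate="$2"
  awk -v q="$quantity" -v r="$hourly_rate" \
    'BEGIN { printf "%.2f\n", q * r * 730 }'
}

# Example with an assumed (not real) rate of 0.192 per hour for 5 instances:
estimate_monthly_cost 5 0.192   # prints 700.80
```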

## Important Notes

- **Deletion is blocked until prerequisites are met:** Azure rejects a CRG delete unless all VMs/VMSS are disassociated and all capacity reservations are deleted. Order: disassociate VMs/VMSS → delete reservations → delete group.
- **Quota required:** Capacity reservations consume vCPU quota just like running VMs.
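
The deletion order above can be sketched as a small wrapper around the commands from the Common CLI Commands table. This is an assumption-laden sketch, not a supported script: `delete_crg` is a hypothetical name, it assumes every VM/VMSS has already been disassociated, and it stops at the first failed delete.

```bash
# Sketch of the teardown order. Assumes all VMs/VMSS are already
# disassociated. delete_crg is a hypothetical wrapper name; the az
# commands are the ones listed under Common CLI Commands.
delete_crg() {
  local rg="$1" crg="$2" name
  # 1. Delete every reservation in the group.
  for name in $(az capacity reservation list -g "$rg" \
      --capacity-reservation-group "$crg" --query "[].name" -o tsv); do
    az capacity reservation delete -g "$rg" \
      --capacity-reservation-group "$crg" -n "$name" || return 1
  done
  # 2. Delete the now-empty group.
  az capacity reservation group delete -g "$rg" -n "$crg"
}
```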

## Learn More

- [Azure Capacity Reservations documentation](https://learn.microsoft.com/en-us/azure/virtual-machines/capacity-reservation-overview)
- [Create a Capacity Reservation](https://learn.microsoft.com/en-us/azure/virtual-machines/capacity-reservation-create)
- [Association and disassociation](association-disassociation.md)
workflows/essential-machine-management/
essential-machine-management.md 4.9 KB
# Essential Machine Management (EMM) Workflow

Routes EMM-related requests to the appropriate reference based on user intent.

## Overview

Essential Machine Management simplifies onboarding and configuration of management for Azure VMs and Arc-enabled servers at the subscription level. When enabled, all VMs in a subscription are automatically enrolled with a curated set of monitoring, security, and operations features.

> ⚠️ **Warning:** EMM is currently in **public preview**.

## Routing

```text
User intent?
├─ Enable / onboard / enroll subscription for EMM
│  └─ Copilot-guided (default) → Load [EMM Enable Flow](references/emm-enable-flow.md)
│
├─ User explicitly asks for portal guidance
│  └─ Load [EMM Enable Flow (Portal)](references/emm-enable-flow-portal-guidance.md)
│
├─ What is EMM / features / pricing / tiers
│  └─ Load [EMM Overview](references/emm-overview.md)
│
├─ Prerequisites / permissions / roles / managed identity
│  └─ Load [EMM Prerequisites](references/emm-prerequisites.md)
│
├─ View enrolled subscriptions / browse / status
│  └─ See "Browse Enrolled Subscriptions" below
│
├─ Offboard / disable EMM for a subscription
│  └─ See "Offboard a Subscription" below
│
└─ Troubleshoot EMM issues
   └─ See "Troubleshooting" below
```

| Signal | Reference |
| ------ | --------- |
| "enable EMM", "onboard subscription", "enroll VMs", "set up machine management" | [EMM Enable Flow](references/emm-enable-flow.md) |
| User explicitly mentions "portal", "Azure portal", "portal UI" | [EMM Enable Flow (Portal)](references/emm-enable-flow-portal-guidance.md) |
| "what is EMM", "features", "pricing", "tiers", "what does EMM include" | [EMM Overview](references/emm-overview.md) |
| "permissions", "roles", "prerequisites", "managed identity for EMM" | [EMM Prerequisites](references/emm-prerequisites.md) |

> ⚠️ **Important:** Only route to the portal guide when the user explicitly mentions "portal". All other enable requests use the Copilot-guided flow.

## Browse Enrolled Subscriptions

Query the EMM resource on each subscription to check enrollment status:

```text
GET https://management.azure.com/subscriptions/{subscriptionId}/providers/Microsoft.ManagedOps/managedOps/default?api-version=2025-07-28-preview
```

| Response | Meaning |
| -------- | ------- |
| `200` with `provisioningState: Succeeded` | Subscription is enrolled |
| `200` with `provisioningState: Failed` | Enrollment attempted but failed; check error details |
| `404` | Subscription is not enrolled |
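
The response table can be turned into a small classifier for reporting. This is a sketch only: `emm_enrollment_status` is a hypothetical helper name, and it expects the caller to have already extracted the HTTP status code and `provisioningState` from the GET response.

```bash
# Hypothetical helper: maps the HTTP status code and provisioningState
# from the GET call to the meanings in the table above.
emm_enrollment_status() {
  local http_code="$1" state="$2"
  case "$http_code:$state" in
    200:Succeeded) echo "enrolled" ;;
    200:Failed)    echo "failed" ;;
    404:*)         echo "not-enrolled" ;;
    *)             echo "unknown" ;;
  esac
}
```

Any combination not listed in the table (for example an in-progress state) falls through to `unknown` rather than being guessed at.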

When enrolled, the response includes:
- **SKU/tier**: e.g., Essential
- **Enabled services**: Azure Monitor Insights, Update Manager, Change Tracking, Policy & Machine Configuration, Defender CSPM, Defender for Servers
- **UAMI**: the user-assigned managed identity resource ID
- **Workspaces**: Log Analytics and Azure Monitor workspace resource IDs
- **Created by / date**: who enrolled and when (in `systemData`)

To scan multiple subscriptions, use `mcp_azure_mcp_subscription_list` to list available subscriptions, then query each one. Report results as a table:

```text
| Subscription | Status | SKU | Services Enabled |
```

## Offboard a Subscription

To disable EMM for a subscription, follow the "Disable EMM (Offboard)" section in [EMM Enable Flow](references/emm-enable-flow.md).

> ⚠️ **Warning:** When you disable a subscription, machines no longer use consolidated pricing. Pricing reverts to standard per-service pricing, which may increase costs. Existing VM configurations are not removed; disable unneeded services manually.

## Troubleshooting

For common EMM issues, refer to the official documentation:
- [Troubleshoot Essential Machine Management (Preview)](https://learn.microsoft.com/en-us/azure/operations/configuration-enrollment-troubleshoot)

Common issues include:
- Missing role assignments (EMM Administrator, Managed Identity Operator, Resource Policy Contributor)
- Resource provider `Microsoft.ManagedOps` not registered in the subscription
- UAMI lacking Contributor permission on the subscription
- Cross-subscription workspace access requires additional RP registration

## Error Handling

| Error | Cause | Remediation |
| ----- | ----- | ----------- |
| Permission denied during enable | User lacks required roles | Assign EMM Administrator, Managed Identity Operator, and Resource Policy Contributor roles |
| UAMI role check fails | Managed identity lacks Contributor | Assign Contributor role to the UAMI at subscription scope |
| RP not registered | `Microsoft.ManagedOps` not registered | Register via `Register-AzResourceProvider -ProviderNamespace "Microsoft.ManagedOps"` |
| Cross-subscription workspace error | Workspace in different sub without RP registration | Register `Microsoft.ManagedOps` in the workspace subscription and assign EMM Administrator on the workspace resource group |
| Deployment fails | ARM template validation error | Check deployment link in browse view for detailed error; verify all prerequisites |
workflows/essential-machine-management/references/
emm-enable-flow-portal-guidance.md 3.1 KB
# EMM Enable Flow (Portal)

Step-by-step guide for enabling Essential Machine Management through the Azure portal UI.

## Quick Reference

| Property | Value |
| -------- | ----- |
| Portal blade | `EnableMachineManagement.ReactView` |
| Extension | `Microsoft_Azure_Computehub` |
| Portal path | Compute infrastructure → Monitoring+Operations → Essential Machine Management → Enable |
| Resource type | `Microsoft.ManagedOps/ManagedOps` |

## Enable Flow Steps

The portal enable flow is a multi-tab wizard with 4 tabs:

### Tab 1: Scope

Select the target subscription and managed identity.

| Field | Description | Required |
| ----- | ----------- | -------- |
| Subscription | The subscription to enable EMM for. Shows VM and Arc machine counts per subscription. | ✅ |
| User-assigned managed identity | UAMI with Contributor on the subscription. Used for onboarding VMs. | ✅ |

**Validation displayed:**
- Required user role assignments vs current user role assignments
- Required UAMI role assignments vs current UAMI role assignments

> 💡 **Tip:** If roles are missing, the UI shows exactly which roles are needed. Assign them before proceeding.

### Tab 2: Configure

Select or create the monitoring workspaces.

| Field | Description | Required |
| ----- | ----------- | -------- |
| Log Analytics workspace | Collects log data (Change Tracking & Inventory). Can create new inline. | ✅ |
| Azure Monitor workspace | Collects metrics data (VM Insights). Can create new inline. | ✅ |

**Notes:**
- Workspaces can be in a different subscription than the one being enabled
- If cross-subscription, additional RP registration and role assignments are needed (see [Prerequisites](emm-prerequisites.md))

### Tab 3: Security

Optional security add-ons.

| Feature | Description | Cost |
| ------- | ----------- | ---- |
| Foundational CSPM | Agentless, risk-prioritized cloud security posture insights. Always included. | Free |
| Defender CSPM | Advanced CSPM with attack path analysis. Optional toggle. | Paid |
| Defender for Servers | Comprehensive server protection with EDR, vulnerability management, file integrity monitoring. Optional toggle. | Paid |

### Tab 4: Review & Enable

Displays a summary of all selections:
- Included features (always: Azure Monitor VM Insights, Azure Policy & Machine Configurations, Change Tracking & Inventory, Azure Update Manager)
- Selected scope (subscription, UAMI)
- Configure selections (Log Analytics workspace, Azure Monitor workspace)
- Security add-ons enabled
- Pricing information with links

Clicking **Enable** triggers:
1. Resource provider registrations on the target subscription
2. Cross-subscription RP registration if workspaces are in a different subscription
3. Subscription-level ARM template deployment

## What Happens After Enable

- A deployment is created: `ManagedOps_{uamiName}_{subscriptionId}`
- Policy assignments are created to configure all VMs in the subscription
- Remediation tasks are created for existing VMs
- New VMs added to the subscription are automatically enrolled
- The subscription appears in the browse view with status "Succeeded"
emm-enable-flow.md 9.2 KB
# EMM Enable Flow

Copilot-guided step-by-step workflow for enabling Essential Machine Management on a subscription. Copilot orchestrates each step, triggering the necessary CLI commands or API calls on behalf of the user.

## Quick Reference

| Property | Value |
| -------- | ----- |
| Resource type | `Microsoft.ManagedOps/ManagedOps` |
| Resource provider | `Microsoft.ManagedOps` |
| API version | `2025-07-28-preview` |
| Deployment scope | Subscription-level |

## Workflow Steps

### Step 1: Select Target Subscription

Ask the user which subscription to enable EMM for. Use MCP tools to list subscriptions if needed.

| MCP Tool | Purpose |
| -------- | ------- |
| `mcp_azure_mcp_subscription_list` | List available subscriptions |

Store the selected `subscriptionId` and `tenant` for all subsequent steps.

### Step 2: Validate User Role Assignments

Check that the current user has the 3 required roles on the target subscription. This takes three calls: one to get the user's object ID, one to get the user's role assignments, and one to get all role definitions. Then compare the user's assigned permissions against the required roles.

**Step 2a: Get current user's object ID**

```bash
az rest --method GET --url "https://graph.microsoft.com/v1.0/me" --query id -o tsv
```

**Step 2b: Get user's role assignments on the subscription**

```text
GET https://management.azure.com/subscriptions/{subscriptionId}/providers/Microsoft.Authorization/roleAssignments?api-version=2022-04-01&$filter=assignedTo('{objectId}')
```

> 💡 **Tip:** The `assignedTo` filter is self-scoped: it allows the user to query their own role assignments without needing `Microsoft.Authorization/roleAssignments/read`. However, a 403 will still occur if the user has no role on the subscription at all.

**Step 2c: Get all role definitions on the subscription**

```text
GET https://management.azure.com/subscriptions/{subscriptionId}/providers/Microsoft.Authorization/roleDefinitions?api-version=2022-04-01
```

**Step 2d: Join and check permissions**

For each role assignment, match `properties.roleDefinitionId` to the role definitions to resolve the role name and its `properties.permissions[]`. Then check whether the user's combined permissions cover all three required roles:

| Required Role | Key Permissions (actions) |
| ------------- | ------------------------ |
| Essential Machine Management Administrator | `Microsoft.ManagedOps/managedOps/*`, `Microsoft.Insights/dataCollectionRules/*`, `Microsoft.Monitor/accounts/*`, `Microsoft.OperationalInsights/workspaces/read`, `Microsoft.Security/pricings/*` |
| Managed Identity Operator | `Microsoft.ManagedIdentity/userAssignedIdentities/*/read`, `Microsoft.ManagedIdentity/userAssignedIdentities/*/assign/action` |
| Resource Policy Contributor | `Microsoft.Authorization/policyassignments/*`, `Microsoft.Authorization/policydefinitions/*`, `Microsoft.PolicyInsights/*` |

> 💡 **Tip:** If the user has **Owner** at subscription scope, they satisfy all required permissions. Check for this role first as a fast path.

```text
Check result?
├─ All 3 roles covered → Proceed to Step 3
├─ Owner found → All roles satisfied, proceed to Step 3
└─ Missing roles → Inform user which roles are missing and how to assign them, then re-check
```
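
The decision above can be sketched as a shell check. Note the simplification: this version compares resolved role **names** rather than joining `properties.permissions[]` as Step 2d describes, so it would miss a custom role that grants equivalent permissions. `check_emm_roles` is a hypothetical helper name.

```bash
# Hypothetical helper: reads resolved role names (one per line) on stdin
# and prints "ok" or the list of missing required roles.
# Simplification: matches by role name, not by individual permissions.
check_emm_roles() {
  local roles missing="" required
  roles="$(cat)"
  # Fast path: Owner at subscription scope satisfies everything.
  if printf '%s\n' "$roles" | grep -qxF "Owner"; then
    echo "ok"
    return 0
  fi
  for required in \
    "Essential Machine Management Administrator" \
    "Managed Identity Operator" \
    "Resource Policy Contributor"; do
    printf '%s\n' "$roles" | grep -qxF "$required" \
      || missing="${missing}${required}; "
  done
  if [ -z "$missing" ]; then
    echo "ok"
  else
    echo "missing: ${missing%; }"
    return 1
  fi
}
```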

### Step 3: Select or Create a User-Assigned Managed Identity (UAMI)

Ask the user to provide an existing UAMI or create a new one. The UAMI must have **Contributor** on the target subscription.

Verify the UAMI's role using the same API pattern as Step 2, but filter by the UAMI's principal ID (object ID) instead of the user's:

```text
GET https://management.azure.com/subscriptions/{subscriptionId}/providers/Microsoft.Authorization/roleAssignments?api-version=2022-04-01&$filter=assignedTo('{uamiPrincipalId}')
```

Check that at least one assignment resolves to the **Contributor** role definition.

> 💡 **Tip:** If the UAMI lacks the Contributor role, guide the user to assign it before proceeding.

Store the full UAMI resource ID: `/subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.ManagedIdentity/userAssignedIdentities/<name>`
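
If the user supplies only the subscription, resource group, and identity name, the full ID can be assembled from the pattern above. A minimal sketch; `uami_resource_id` is a hypothetical helper and the argument values in any call are placeholders.

```bash
# Hypothetical helper: assembles the full UAMI resource ID from its parts.
uami_resource_id() {
  local sub="$1" rg="$2" name="$3"
  printf '/subscriptions/%s/resourceGroups/%s/providers/Microsoft.ManagedIdentity/userAssignedIdentities/%s\n' \
    "$sub" "$rg" "$name"
}
```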

### Step 4: Select or Create Monitoring Workspaces

Ask the user for a **Log Analytics workspace** and an **Azure Monitor workspace**. Offer to create new ones if needed.

| Resource | CLI Command | Purpose |
| -------- | ----------- | ------- |
| Log Analytics workspace (list) | `az monitor log-analytics workspace list --subscription <subId> -o table` | List existing workspaces |
| Log Analytics workspace (create) | `az monitor log-analytics workspace create --workspace-name <name> --resource-group <rg> --subscription <subId> --location <location>` | Create new workspace |
| Azure Monitor workspace (list) | `az resource list --resource-type "Microsoft.Monitor/accounts" --subscription <subId> -o table` | List existing workspaces |
| Azure Monitor workspace (create) | `az resource create --resource-type "Microsoft.Monitor/accounts" --name <name> --resource-group <rg> --subscription <subId> --location <location> --properties "{}"` | Create new workspace |

> ⚠️ **Warning:** If workspaces are in a **different subscription** than the target:
> - Register `Microsoft.ManagedOps` RP in the workspace subscription
> - User needs **EMM Administrator** role on the workspace resource group
> - UAMI needs **Contributor** on the workspace resource group

Store both workspace resource IDs.

### Step 5: Configure Security Options

Ask the user about optional security add-ons.

| Feature | Default | Cost |
| ------- | ------- | ---- |
| Foundational CSPM | Always enabled | Free |
| Defender CSPM | Disabled | Paid |
| Defender for Servers | Disabled | Paid |

Store user selections as `enabled` or `disabled`.

### Step 6: Register Resource Providers

Register required RPs on the target subscription before deployment.

```bash
az provider register --namespace Microsoft.ManagedOps --subscription <subscriptionId>
az provider register --namespace Microsoft.OperationsManagement --subscription <subscriptionId>
az provider register --namespace Microsoft.PolicyInsights --subscription <subscriptionId>
az provider register --namespace Microsoft.Insights --subscription <subscriptionId>
az provider register --namespace Microsoft.OperationalInsights --subscription <subscriptionId>
az provider register --namespace Microsoft.Monitor --subscription <subscriptionId>
az provider register --namespace Microsoft.ManagedIdentity --subscription <subscriptionId>
az provider register --namespace Microsoft.Security --subscription <subscriptionId>
```

> 💡 **Tip:** RP registration is idempotent, so it is safe to run even if the provider is already registered.

### Step 7: Deploy EMM via ARM API

Submit the PUT request to enable EMM on the subscription.

```text
PUT /subscriptions/{subscriptionId}/providers/Microsoft.ManagedOps/managedOps/default?api-version=2025-07-28-preview
```

Request body:

```json
{
  "properties": {
    "desiredConfiguration": {
      "defenderCspm": "<enabled|disabled>",
      "defenderForServers": "<enabled|disabled>",
      "changeTrackingAndInventory": {
        "logAnalyticsWorkspaceId": "<log-analytics-workspace-resource-id>"
      },
      "userAssignedManagedIdentityId": "<uami-resource-id>",
      "azureMonitorInsights": {
        "azureMonitorWorkspaceId": "<azure-monitor-workspace-resource-id>"
      }
    }
  }
}
```

Populate the request body with the values collected in previous steps.
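
One way to populate the body is a shell function that fills the template from the stored values. This is a sketch under assumptions: `build_emm_body` is a hypothetical name, and the values are assumed to contain no characters that need JSON escaping (true for ordinary Azure resource IDs and for `enabled`/`disabled`).

```bash
# Hypothetical helper: prints the PUT request body from collected values.
# Arguments, in order:
#   $1 defenderCspm        (enabled|disabled)
#   $2 defenderForServers  (enabled|disabled)
#   $3 Log Analytics workspace resource ID
#   $4 UAMI resource ID
#   $5 Azure Monitor workspace resource ID
# Assumes no argument contains characters that need JSON escaping.
build_emm_body() {
  cat <<EOF
{
  "properties": {
    "desiredConfiguration": {
      "defenderCspm": "$1",
      "defenderForServers": "$2",
      "changeTrackingAndInventory": {
        "logAnalyticsWorkspaceId": "$3"
      },
      "userAssignedManagedIdentityId": "$4",
      "azureMonitorInsights": {
        "azureMonitorWorkspaceId": "$5"
      }
    }
  }
}
EOF
}
```

The output could then be supplied as the request body, for example through the `--body` parameter of `az rest --method PUT`.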

### Step 8: Verify Enrollment

After deployment completes, confirm the subscription is enrolled.

```text
GET /subscriptions/{subscriptionId}/providers/Microsoft.ManagedOps/managedOps/default?api-version=2025-07-28-preview
```

```text
Deployment status?
├─ Succeeded → Report success to user. All existing VMs will be enrolled via policy remediation.
├─ In progress → Wait and re-check after a short interval.
└─ Failed → Read error details and route to Error Handling in the parent workflow.
```
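
The wait-and-re-check branch can be sketched as a generic polling loop. This is an illustration, not a prescribed implementation: `wait_for_terminal_state` is a hypothetical helper, and the status command passed to it is assumed to print only the provisioning state (how that value is extracted from the GET response is left to the caller).

```bash
# Hypothetical helper: runs a status command repeatedly until it prints a
# terminal state (Succeeded or Failed) or the attempts run out.
# Usage: wait_for_terminal_state <max-attempts> <sleep-seconds> <command...>
wait_for_terminal_state() {
  local attempts="$1" delay="$2" state i=0
  shift 2
  while [ "$i" -lt "$attempts" ]; do
    state="$("$@")"
    case "$state" in
      Succeeded) echo "Succeeded"; return 0 ;;
      Failed)    echo "Failed"; return 1 ;;
    esac
    i=$((i + 1))
    sleep "$delay"
  done
  echo "TimedOut"
  return 2
}
```

Bounding the number of attempts keeps a stuck deployment from blocking the workflow indefinitely; the timeout case is reported separately so it is not mistaken for a failure.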

## Disable EMM (Offboard)

To disable EMM for a subscription:

```text
DELETE /subscriptions/{subscriptionId}/providers/Microsoft.ManagedOps/managedOps/default?api-version=2025-07-28-preview
```

> ⚠️ **Warning:** Disabling reverts pricing to standard per-service rates, which may increase costs. Existing VM configurations are not removed.

## Error Handling

| Error | Cause | Remediation |
| ----- | ----- | ----------- |
| 403 on role check | User has no RBAC role assignment on the subscription (the `assignedTo` filter is self-scoped and does not require `roleAssignments/read`, but the user must have at least one role on the subscription) | Inform the user that they have no role assignment on this subscription and cannot proceed with EMM enrollment until one is granted |
| Missing required roles | User missing EMM Administrator, Managed Identity Operator, or Resource Policy Contributor | Guide user to assign missing roles, then re-validate |
| UAMI lacks Contributor | Managed identity missing Contributor role | Assign Contributor to the UAMI at subscription scope |
| RP registration failed | Insufficient permissions to register providers | User needs Contributor or Owner on the subscription |
| PUT deployment fails | ARM validation error | Check error details; verify all prerequisites met |
| Cross-subscription error | Workspace in different sub without RP/role setup | Register `Microsoft.ManagedOps` in workspace sub; assign roles on workspace RG |
emm-overview.md 2.4 KB
# EMM Overview

Essential Machine Management (EMM) simplifies onboarding and configuration of management for Azure VMs and Arc-enabled servers at the subscription level.

## What is EMM?

When you enable a subscription for EMM, all VMs and Arc-enabled servers in that subscription are automatically enrolled and configured with a curated set of management features. Any new VMs added to the subscription are also automatically enrolled.

## Features Included

### Essentials Tier (Always Enabled)

| Feature | Description |
| ------- | ----------- |
| Azure Monitor VM Insights | Monitors VM performance and health, configures metric-based recommended alerts |
| Azure Update Manager | Automates OS update deployment |
| Azure Machine Configuration | Audits Azure security baseline policy |
| Change Tracking & Inventory | Tracks VM configuration changes, maintains resource inventory |

### Security Tier (Optional Add-ons)

| Feature | Description | Cost |
| ------- | ----------- | ---- |
| Foundational CSPM | Agentless, risk-prioritized security posture insights | Free |
| Defender CSPM | Advanced CSPM with attack path analysis | Paid |
| Defender for Cloud | EDR, vulnerability management, file integrity monitoring, threat detection | Paid |

## Pricing

- **Azure VMs:** Essentials tier features at no extra charge
- **Arc-enabled servers with Windows Server SA/PayGo/ESU:** No extra charge
- **Other Arc-enabled servers:** $9/server/month once billing is enabled (future date, currently free in preview)
- **Change Tracking & Inventory logs:** Incur separate Log Analytics ingestion charges
- **Security tier add-ons:** Standard Microsoft Defender pricing applies

## Key Characteristics

- **Subscription-level scope:** Enables for all VMs in a subscription at once
- **No VM exclusion:** Currently no ability to exclude individual VMs
- **Existing services preserved:** If a VM already has Update Manager with a maintenance schedule, it keeps that schedule
- **REST API available:** Official docs focus on the portal experience, but a REST API (`Microsoft.ManagedOps`) is available and used by the Copilot-guided flow
- **Resource type:** `Microsoft.ManagedOps/managedOps`

## Documentation Links

- [Essential Machine Management (Preview)](https://learn.microsoft.com/en-us/azure/operations/configuration-enrollment)
- [Troubleshoot EMM](https://learn.microsoft.com/en-us/azure/operations/configuration-enrollment-troubleshoot)
emm-prerequisites.md 3.5 KB
# EMM Prerequisites

Requirements that must be met before enabling Essential Machine Management.

## Required Azure Resources

| Resource | Purpose |
| -------- | ------- |
| Log Analytics workspace | Collects log data from Change Tracking & Inventory |
| Azure Monitor workspace | Collects metrics data from VM Insights |
| User-assigned managed identity (UAMI) | Used to onboard and configure VMs in the subscription |

## Required User Roles

The user performing the enrollment must have these roles on the target subscription:

| Role | Description |
| ---- | ----------- |
| Essential Machine Management Administrator | Manages EMM resources, DCRs, monitor/workspace operations, security pricing |
| Managed Identity Operator | Reads and assigns user-assigned identities |
| Resource Policy Contributor | Creates/modifies resource policies, policy assignments, and exemptions |

### Cross-Subscription Workspace Scenario

If the Log Analytics or Azure Monitor workspace is in a **different subscription**:
- The user must also have **Essential Machine Management Administrator** on the resource group of the workspace
- The `Microsoft.ManagedOps` RP must be registered in the workspace subscription

## Required Managed Identity Roles

The user-assigned managed identity must have:

| Role | Scope |
| ---- | ----- |
| Contributor | Target subscription being enabled |

If workspaces are in a different subscription:
- **Contributor** on the resource group of the Log Analytics workspace and/or Azure Monitor workspace

## EMM Administrator Permissions Detail

The Essential Machine Management Administrator role includes these actions:

```text
Microsoft.Resources/deployments/*
Microsoft.Insights/dataCollectionRules/read
Microsoft.Insights/dataCollectionRules/write
Microsoft.Monitor/accounts/write
Microsoft.Monitor/accounts/read
Microsoft.ManagedOps/managedOps/read
Microsoft.ManagedOps/managedOps/write
Microsoft.ManagedOps/managedOps/delete
Microsoft.OperationsManagement/solutions/read
Microsoft.OperationsManagement/solutions/write
Microsoft.OperationalInsights/workspaces/read
Microsoft.OperationalInsights/workspaces/sharedkeys/action
Microsoft.OperationalInsights/workspaces/sharedkeys/read
Microsoft.OperationalInsights/workspaces/listKeys/action
Microsoft.Resources/subscriptions/resourceGroups/read
Microsoft.Insights/metricAlerts/write
Microsoft.Insights/metricAlerts/read
Microsoft.Security/pricings/write
Microsoft.Security/pricings/read
```

## Resource Provider Registrations

The following RPs are registered automatically during the enable flow:

| Resource Provider | Purpose |
| ----------------- | ------- |
| `Microsoft.ManagedOps` | Core EMM resource provider |
| `Microsoft.OperationsManagement` | Operations management solutions |
| `Microsoft.PolicyInsights` | Policy compliance and remediation |
| `Microsoft.Insights` | Monitoring and data collection rules |
| `Microsoft.OperationalInsights` | Log Analytics workspaces |
| `Microsoft.Monitor` | Azure Monitor workspaces |
| `Microsoft.ManagedIdentity` | Managed identity operations |
| `Microsoft.Security` | Defender for Cloud and CSPM |
| `Microsoft.Resources` | ARM deployments |

## Validation Checklist

Before enabling EMM, verify:

- [ ] User has all 3 required roles on the subscription
- [ ] UAMI exists and has Contributor on the subscription
- [ ] Log Analytics workspace exists (or will be created)
- [ ] Azure Monitor workspace exists (or will be created)
- [ ] If cross-subscription workspaces: additional roles and RP registrations in place
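
The checklist above can be expressed as a simple validation routine. The sketch below is illustrative only: the function and parameter names are hypothetical, and the inputs are assumed to have been gathered beforehand (for example, from role-assignment queries).

```python
REQUIRED_USER_ROLES = {
    "Essential Machine Management Administrator",
    "Managed Identity Operator",
    "Resource Policy Contributor",
}


def validate_prerequisites(user_roles, uami_roles, log_analytics_ready,
                           monitor_workspace_ready, cross_subscription=False,
                           cross_subscription_ready=False):
    """Return a list of unmet checklist items; an empty list means ready.

    The *_ready flags mean "exists or will be created", as in the checklist.
    """
    problems = []
    missing = sorted(REQUIRED_USER_ROLES - set(user_roles))
    if missing:
        problems.append("User is missing roles: " + ", ".join(missing))
    if "Contributor" not in uami_roles:
        problems.append("UAMI lacks Contributor on the subscription")
    if not log_analytics_ready:
        problems.append("Log Analytics workspace is missing")
    if not monitor_workspace_ready:
        problems.append("Azure Monitor workspace is missing")
    if cross_subscription and not cross_subscription_ready:
        problems.append("Cross-subscription roles or RP registration not in place")
    return problems
```

Returning every unmet item at once, rather than stopping at the first, lets the user fix all gaps before re-validating.
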
workflows/vm-recommender/
vm-recommender.md 10.3 KB
# Azure VM Recommender

Recommend Azure VM sizes, VM Scale Sets (VMSS), and configurations by analyzing workload type, performance requirements, scaling needs, and budget. No Azure subscription is required: all data comes from public Microsoft documentation and the unauthenticated Retail Prices API.

## When to Use This Skill

- User asks which Azure VM or VMSS to choose for a workload
- User needs VM size recommendations for web, database, ML, batch, HPC, or other workloads
- User wants to compare VM families, sizes, or pricing tiers
- User asks about trade-offs between VM options (cost vs performance)
- User needs a cost estimate for Azure VMs without an Azure account
- User asks whether to use a single VM or a scale set
- User needs autoscaling, high availability, or load-balanced VM recommendations
- User asks about VMSS orchestration modes (Flexible vs Uniform)

## Workflow

> Use the reference files for initial filtering.

> **CRITICAL:** Always verify against live documentation on learn.microsoft.com before making final recommendations. If `web_fetch` fails, fall back to the reference files but warn the user that the information may be stale.

### Step 1: Gather Requirements

Ask the user for (infer when possible):

| Requirement            | Examples                                                           |
| ---------------------- | ------------------------------------------------------------------ |
| **Workload type**      | Web server, relational DB, ML training, batch processing, dev/test |
| **vCPU / RAM needs**   | "4 cores, 16 GB RAM" or "lightweight" / "heavy"                    |
| **GPU needed?**        | Yes → GPU families; No → general/compute/memory                    |
| **Storage needs**      | High IOPS, large temp disk, premium SSD                            |
| **Budget priority**    | Cost-sensitive, performance-first, balanced                        |
| **OS**                 | Linux or Windows (affects pricing)                                 |
| **Region**             | Affects availability and price                                     |
| **Instance count**     | Single instance, fixed count, or variable/dynamic                  |
| **Scaling needs**      | None, manual scaling, autoscale based on metrics or schedule       |
| **Availability needs** | Best-effort, fault-domain isolation, cross-zone HA                 |
| **Load balancing**     | Not needed, Azure Load Balancer (L4), Application Gateway (L7)     |

### Step 2: Determine VM vs VMSS

**Workflow:**

1. Review [VMSS Guide](../../references/vmss-guide.md) to understand when VMSS vs single VM is appropriate
2. Use the gathered requirements to decide which approach fits best
3. **REQUIRED: If recommending VMSS**, fetch current documentation to verify capabilities:
   ```bash
   web_fetch https://learn.microsoft.com/en-us/azure/virtual-machine-scale-sets/overview
   web_fetch https://learn.microsoft.com/en-us/azure/virtual-machine-scale-sets/virtual-machine-scale-sets-autoscale-overview
   ```
4. **If `web_fetch` fails**, proceed with reference file guidance but include this warning:
   > Unable to verify against latest Azure documentation. Recommendation based on reference material that may not reflect recent updates.

```text
Needs autoscaling?
├─ Yes → VMSS
├─ No
│  ├─ Multiple identical instances needed?
│  │  ├─ Yes → VMSS
│  │  └─ No
│  │     ├─ High availability across fault domains / zones?
│  │     │  ├─ Yes, many instances → VMSS
│  │     │  └─ Yes, 1-2 instances → VM + Availability Zone
│  │     └─ Single instance sufficient? → VM
```
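
The decision tree above translates directly into a few conditionals. The following is a minimal sketch with hypothetical names; treating "many instances" as more than two is an assumption drawn from the "1-2 instances" branch.

```python
def choose_hosting_model(needs_autoscale, identical_instances, needs_ha,
                         instance_count=1):
    """Walk the VM vs VMSS decision tree and return the recommendation."""
    if needs_autoscale:
        return "VMSS"
    if identical_instances:
        return "VMSS"
    if needs_ha:
        # Assumption: "many instances" means more than the 1-2 in the tree.
        return "VMSS" if instance_count > 2 else "VM + Availability Zone"
    return "VM"
```

For example, a single jumpbox with no scaling or HA requirement resolves to `"VM"`, while two instances that need zone-level availability resolve to `"VM + Availability Zone"`.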

| Signal                                        | Recommendation                | Why                                                                   |
| --------------------------------------------- | ----------------------------- | --------------------------------------------------------------------- |
| Autoscale on CPU, memory, or schedule         | **VMSS**                      | Built-in autoscale; no custom automation needed                       |
| Stateless web/API tier behind a load balancer | **VMSS**                      | Homogeneous fleet with automatic distribution                         |
| Batch / parallel processing across many nodes | **VMSS**                      | Scale out on demand, scale to zero when idle                          |
| Mixed VM sizes in one group                   | **VMSS (Flexible)**           | Flexible orchestration supports mixed SKUs                            |
| Single long-lived server (jumpbox, AD DC)     | **VM**                        | No scaling benefit; simpler management                                |
| Unique per-instance config required           | **VM**                        | Scale sets assume homogeneous configuration                           |
| Stateful workload, tightly-coupled cluster    | **VM** (or VMSS case-by-case) | Evaluate carefully; VMSS Flexible can work for some stateful patterns |

> **Warning:** If the user is unsure, default to **single VM** for simplicity. Recommend VMSS only when scaling, HA, or fleet management is clearly needed.

### Step 3: Select VM Family

**Workflow:**

1. Review [VM Family Guide](../../references/vm-families.md) to identify 2-3 candidate VM families that match the workload requirements
2. **REQUIRED: verify specifications** for your chosen candidates by fetching current documentation:
   ```bash
   web_fetch https://learn.microsoft.com/en-us/azure/virtual-machines/sizes/<family-category>/<series-name>
   ```
   
   Examples:
   - B-series: `https://learn.microsoft.com/en-us/azure/virtual-machines/sizes/general-purpose/b-family`
   - D-series: `https://learn.microsoft.com/en-us/azure/virtual-machines/sizes/general-purpose/ddsv5-series`
   - GPU: `https://learn.microsoft.com/en-us/azure/virtual-machines/sizes/gpu-accelerated/nc-family`

3. **If considering Spot VMs**, also fetch:
   ```bash
   web_fetch https://learn.microsoft.com/en-us/azure/virtual-machine-scale-sets/use-spot
   ```

4. **If `web_fetch` fails**, proceed with reference file guidance but include this warning:
   > Unable to verify against latest Azure documentation. Recommendation based on reference material that may not reflect recent updates or limitations (e.g., Spot VM compatibility).

This step applies to both single VMs and VMSS since scale sets use the same VM SKUs.

### Step 4: Look Up Pricing

Query the Azure Retail Prices API; see the [Retail Prices API Guide](../../references/retail-prices-api.md).

> **Tip:** VMSS has no extra charge; pricing is per VM instance. Use the same VM pricing from the API and multiply by the expected instance count to estimate VMSS cost. For autoscaling workloads, estimate cost at both the minimum and maximum instance count.
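
To make the arithmetic concrete, the sketch below builds a Retail Prices API query URL and estimates a monthly cost range for a scale set. The helper names are hypothetical, the 730 hours-per-month figure is an assumed convention, and the `0.20` hourly price in the usage note is a made-up number rather than a real quote.

```python
from urllib.parse import quote, urlencode

RETAIL_PRICES_URL = "https://prices.azure.com/api/retail/prices"
HOURS_PER_MONTH = 730  # assumption: common monthly-hours convention


def build_price_query(arm_sku_name, arm_region_name="eastus"):
    """Build (but do not send) a pay-as-you-go VM price query URL."""
    odata_filter = (
        "serviceName eq 'Virtual Machines'"
        f" and armRegionName eq '{arm_region_name}'"
        f" and armSkuName eq '{arm_sku_name}'"
        " and priceType eq 'Consumption'"
    )
    # quote_via=quote encodes spaces as %20 instead of '+'.
    return RETAIL_PRICES_URL + "?" + urlencode({"$filter": odata_filter},
                                               quote_via=quote)


def estimate_monthly_cost(hourly_price, min_instances, max_instances,
                          hours=HOURS_PER_MONTH):
    """Return (low, high) monthly cost for the autoscale instance range."""
    if min_instances > max_instances:
        raise ValueError("min_instances must not exceed max_instances")
    low = round(hourly_price * min_instances * hours, 2)
    high = round(hourly_price * max_instances * hours, 2)
    return low, high
```

With an illustrative price of `0.20` per hour, `estimate_monthly_cost(0.20, 2, 10)` gives the cost at the autoscale floor and ceiling; a single VM is the special case where both counts are 1.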

### Step 5: Validate Quota Availability

> **GATE: Do not present recommendations until quota is validated.**

If the user has an Azure subscription and region, follow the [VM Quota Validation Guide](../../references/vm-quotas.md) to check vCPU capacity for each candidate VM family. If there is no subscription, skip this step and add a note that quota should be checked before deployment.

| Outcome | Action |
|---|---|
| ✅ Sufficient | Proceed to Step 6 |
| ⚠️ Near limit (>80%) | Proceed but warn; suggest requesting increase |
| ❌ Insufficient | Request increase, swap family, or try another region |

Include a "Quota Status" column (✅/⚠️/❌) in the recommendation table.

> 📖 **Full details:** See [VM Quota Validation Guide](../../references/vm-quotas.md) for quota structure, CLI commands, VMSS considerations, and fallback methods.
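
The three quota outcomes can be sketched as a small classifier. The function name is hypothetical, and applying the 80% threshold to projected usage (current usage plus the requested vCPUs) is an assumption; the quota guide may define the threshold differently.

```python
def classify_quota(current_usage, limit, requested_vcpus, warn_ratio=0.80):
    """Classify vCPU quota for one VM family as one of three outcomes."""
    # Assumption: the threshold applies to usage after the deployment.
    projected = current_usage + requested_vcpus
    if projected > limit:
        return "insufficient"
    if projected > warn_ratio * limit:
        return "near_limit"
    return "sufficient"
```

For a VMSS with autoscale, passing the vCPU total at the maximum instance count as `requested_vcpus` gives the conservative answer.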

### Step 6: Present Recommendations

Provide **2–3 options** with trade-offs:

| Column         | Purpose                                         |
| -------------- | ----------------------------------------------- |
| Hosting Model  | VM or VMSS (with orchestration mode if VMSS)    |
| VM Size        | ARM SKU name (e.g., `Standard_D4s_v5`)          |
| vCPUs / RAM    | Core specs                                      |
| Instance Count | 1 for VM; min–max range for VMSS with autoscale |
| Estimated $/hr | Per-instance pay-as-you-go from API             |
| Why            | Fit for the workload                            |
| Trade-off      | What the user gives up                          |

> **Tip:** Always explain *why* a family fits and what the user trades off (cost vs cores, burstable vs dedicated, single VM simplicity vs VMSS scalability, etc.).

For VMSS recommendations, also mention:
- Recommended orchestration mode (Flexible for most new workloads)
- Autoscale strategy (metric-based, schedule-based, or both)
- Load balancer type (Azure Load Balancer for L4, Application Gateway for L7/TLS)

### Step 7: Offer Next Steps

- Compare reservation / savings plan pricing (query API with `priceType eq 'Reservation'`)
- Suggest [Azure Pricing Calculator](https://azure.microsoft.com/pricing/calculator/) for full estimates
- For VMSS: suggest reviewing [autoscale best practices](https://learn.microsoft.com/en-us/azure/azure-monitor/autoscale/autoscale-best-practices) and [VMSS networking](https://learn.microsoft.com/en-us/azure/virtual-machine-scale-sets/virtual-machine-scale-sets-networking)

## Error Handling

| Scenario                        | Action                                                                           |
| ------------------------------- | -------------------------------------------------------------------------------- |
| API returns empty results       | Broaden filters; check `armRegionName`, `serviceName`, `armSkuName` spelling     |
| User unsure of workload type    | Ask clarifying questions; default to General Purpose D-series                    |
| Region not specified            | Use `eastus` as default; note prices vary by region                              |
| Unclear if VM or VMSS needed    | Ask about scaling and instance count; default to single VM if unsure             |
| User asks VMSS pricing directly | Use same VM pricing API; VMSS has no extra charge, so multiply by instance count |

## References

- [VM Family Guide](../../references/vm-families.md): Family-to-workload mapping and selection
- [Retail Prices API Guide](../../references/retail-prices-api.md): Query patterns, filters, and examples
- [VMSS Guide](../../references/vmss-guide.md): When to use VMSS, orchestration modes, and autoscale patterns
- [VM Quota Validation Guide](../../references/vm-quotas.md): vCPU quota checks, CLI commands, and capacity planning
workflows/vm-troubleshooter/references/
cannot-connect-to-vm.md 5.6 KB
# Cannot Connect to VM

Index of VM connectivity troubleshooting references. Route to the appropriate file based on the user's symptom category.

> ⚠️ **Determine OS first.** If the user hasn't stated their OS, check via CLI (`az vm get-instance-view`) or ask. OS matters because:
> - **Windows** → RDP (port 3389), Windows Firewall, TermService, PowerShell Run Commands
> - **Linux** → SSH (port 22), iptables/firewalld/UFW, sshd, Shell Run Commands
> - **Other images** (FreeBSD, Flatcar, etc.) → SSH; firewall and init systems vary, so fetch the latest docs

## Routing

| Signal in User Message                                                         | Category                  | Reference                                                  |
| ------------------------------------------------------------------------------ | ------------------------- | ---------------------------------------------------------- |
| "can't RDP", timeout, black screen, RDP error, internal error                  | Unable to RDP             | [rdp-connectivity.md](rdp-connectivity.md)                 |
| "can't SSH", refused, permission denied, publickey                             | Unable to SSH             | [ssh-connectivity.md](ssh-connectivity.md)                 |
| NSG, no public IP, NIC disabled, routing, effective rules                      | Network Issues            | [network-connectivity.md](network-connectivity.md)         |
| Guest firewall, Windows Firewall, iptables, firewalld, BlockInboundAlways      | Firewall Blocking         | [firewall-blocking.md](firewall-blocking.md)               |
| VM agent down, Run Command timeout, Serial Console, boot diagnostics, BSOD     | VM Agent Not Responding   | [vm-agent-not-responding.md](vm-agent-not-responding.md)   |
| Wrong password, credentials, access denied, CredSSP, account expired           | Credential / Auth Errors  | [credential-auth-errors.md](credential-auth-errors.md)     |
| TermService stopped, RDP disabled, port changed, TLS cert, NLA, GPO, licensing | RDP Service / Config      | [rdp-service-config.md](rdp-service-config.md)             |

## Workflow

1. Identify the symptom category from the routing table above
2. Open the matching reference file for the Symptoms → Solutions table and Quick Commands
3. Narrow to the specific solution row matching the user's symptom
4. **Before any extension-backed operation, run [Pre-Flight Safety Checks](#pre-flight-safety-checks)**
5. Fetch the linked documentation URL for the latest guidance
6. Summarize diagnostic steps and resolution, referencing the official docs

---

## Pre-Flight Safety Checks

> ⚠️ **Warning:** Always run these checks before any command that depends on the VM agent or extensions (`az vm user update`, `az vm user reset-ssh`, `az vm user reset-remote-desktop`, `az vm run-command invoke`). Running extension-backed operations on a VM with an unhealthy agent or stuck extensions can **deadlock the VM** and require manual portal recovery.

```bash
# 1. Check VM power state, provisioning state, and agent status
az vm get-instance-view --name <vm-name> -g <resource-group> \
  --query "instanceView.{powerState:[statuses[?starts_with(code,'PowerState/')]][0][0].code, provisioningState:[statuses[?starts_with(code,'ProvisioningState/')]][0][0].code, vmAgentStatus:vmAgent.statuses[0].displayStatus}" -o json

# 2. Check existing extension states
az vm extension list --vm-name <vm-name> -g <resource-group> \
  --query "[].{name:name, provisioningState:provisioningState}" -o table
```

| Check | Safe Value | Unsafe: Do NOT proceed |
| ----- | ---------- | ----------------------- |
| Power state | `PowerState/running` | Any other value, missing, or query error |
| Provisioning state | `ProvisioningState/succeeded` | `Updating`, `Creating`, `Failed`, `Deleting`, missing, or query error |
| VM agent status | `Ready` | `Not Ready`, `null`, missing, or query error |
| Extension states | All `Succeeded` or no extensions | Any extension in `Creating`, `Updating`, `Deleting`, or `Failed` |

> 💡 **Tip:** If a check returns `null`, empty, or the CLI command itself errors, treat the result as **unsafe**.

**If any check is unsafe:**
1. **Stop.** Do NOT run any extension-backed command.
2. Inform the user which check(s) failed and what the current state is.
3. Use non-agent alternatives: **Serial Console**, **offline repair VM**, or **Portal-based actions**.
4. If the state appears transient (e.g., VM just started, provisioning briefly not `succeeded`), wait 30–60 seconds and **re-run the checks only**. Do not run the extension command until all checks pass.
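
The safe/unsafe table can be sketched as a gate function that takes the values returned by the two CLI queries. The function name and the shape of its inputs are hypothetical; comparisons are deliberately exact, so any unexpected casing or a `None` value is reported as unsafe, which errs on the conservative side.

```python
def preflight_failures(power_state, provisioning_state, agent_status,
                       extension_states):
    """Return the names of failed checks; an empty list means safe.

    extension_states maps extension name -> provisioningState string.
    None or any value other than the documented safe value counts as unsafe.
    """
    failures = []
    if power_state != "PowerState/running":
        failures.append("power_state")
    if provisioning_state != "ProvisioningState/succeeded":
        failures.append("provisioning_state")
    if agent_status != "Ready":
        failures.append("vm_agent_status")
    # No extensions at all is safe; each listed one must be Succeeded.
    for name, state in sorted(extension_states.items()):
        if state != "Succeeded":
            failures.append(f"extension:{name}")
    return failures
```

An extension-backed command would run only when the returned list is empty; otherwise the listed names tell the user which checks failed.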

---

## Escalation

If the issue doesn't match any symptom above, or if the documented solutions don't resolve it:

1. **Check Azure Resource Health:** Portal > VM > Resource health (checks for platform-level issues)
2. **Offer to restart the VM** (requires user approval): `az vm restart --name <vm-name> -g <resource-group>`
3. **Offer to redeploy the VM** (requires user approval; moves the VM to a new host): `az vm redeploy --name <vm-name> -g <resource-group>`
4. **Comprehensive troubleshooting:**
   - Windows: [Troubleshoot RDP connections](https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/windows/troubleshoot-rdp-connection)
   - Linux: [Troubleshoot SSH connections](https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/linux/troubleshoot-ssh-connection)
   - Windows hub: [All Windows VM troubleshooting docs](https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/windows/welcome-virtual-machines-windows)
   - Linux hub: [All Linux VM troubleshooting docs](https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/linux/welcome-virtual-machines-linux)
credential-auth-errors.md 5.7 KB
# Credential and Authentication Errors

User can reach the VM but authentication fails.

## Windows (RDP): Symptoms → Solutions

| Symptom                                                    | Solution                                                                      | Documentation                                                                                                                                 |
| ---------------------------------------------------------- | ----------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------- |
| "Your credentials did not work"                            | Reset password via Portal or CLI                                              | [Reset RDP service or password](https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/windows/reset-rdp)                      |
| "Must change password before logging on"                   | Reset password via Portal (bypasses the requirement)                          | [Reset RDP service or password](https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/windows/reset-rdp)                      |
| "This user account has expired"                            | Extend account via Run Command: `net user <user> /expires:never`              | [Reset RDP service or password](https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/windows/reset-rdp)                      |
| "Trust relationship between workstation and domain failed" | Reset machine account or rejoin domain                                        | [Troubleshoot RDP connection](https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/windows/troubleshoot-rdp-connection)      |
| "Access is denied" / "Connection was denied"               | Add user to Remote Desktop Users group                                        | [Specific RDP errors](https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/windows/troubleshoot-specific-rdp-errors#wincred) |
| Wrong username format                                      | Use `VMNAME\user` for local, `DOMAIN\user` for domain accounts                | [Specific RDP errors](https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/windows/troubleshoot-specific-rdp-errors#wincred) |
| CredSSP "encryption oracle" error                          | Temporary: set AllowEncryptionOracle=2 on client; permanent: patch both sides | [CredSSP remediation](https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/windows/credssp-encryption-oracle-remediation)    |

## Linux (SSH): Symptoms → Solutions

| Symptom                                                     | Solution                                                                      | Documentation                                                                                                                                    |
| ----------------------------------------------------------- | ----------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------ |
| "Permission denied (publickey)"                             | Wrong key, wrong user, or key not in authorized_keys; reset key via CLI       | [Detailed SSH troubleshooting](https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/linux/detailed-troubleshoot-ssh-connection) |
| "Permission denied (password)"                              | Wrong password or password auth disabled in sshd_config                        | [Detailed SSH troubleshooting](https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/linux/detailed-troubleshoot-ssh-connection) |
| Account locked after failed attempts                        | Unlock via Run Command: `passwd -u <user>`, or reset the failure counter with `faillock --user <user> --reset` (`pam_tally2 --reset --user <user>` on older distributions) | [Troubleshoot SSH connection](https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/linux/troubleshoot-ssh-connection)           |
| "Permission denied" with Entra ID (AAD) SSH login           | Missing role: Virtual Machine Administrator Login or Virtual Machine User Login | [Troubleshoot SSH connection](https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/linux/troubleshoot-ssh-connection)           |
| sudo password prompt fails / user not in sudoers            | Fix via Run Command or Serial Console                                          | [Troubleshoot SSH connection](https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/linux/troubleshoot-ssh-connection)           |

## Quick Commands: Windows

> ⚠️ **Warning:** Commands below use the VM agent/extensions. Run [Pre-Flight Safety Checks](cannot-connect-to-vm.md#pre-flight-safety-checks) before using them.

```bash
# ⚑ Reset password
az vm user update --name <vm-name> -g <resource-group> -u <username> -p '<new-password>'

# ⚑ Reset RDP configuration (also re-enables NLA)
az vm user reset-remote-desktop --name <vm-name> -g <resource-group>
```

## Quick Commands: Linux

> ⚠️ **Warning:** Commands below use the VM agent/extensions. Run [Pre-Flight Safety Checks](cannot-connect-to-vm.md#pre-flight-safety-checks) before using them.

```bash
# ⚑ Reset SSH public key
az vm user update --name <vm-name> -g <resource-group> \
  -u <username> --ssh-key-value "<ssh-public-key>"

# ⚑ Reset password for Linux VM
az vm user update --name <vm-name> -g <resource-group> \
  -u <username> -p '<new-password>'

# ⚑ Unlock a locked account via Run Command
az vm run-command invoke --name <vm-name> -g <resource-group> \
  --command-id RunShellScript --scripts "passwd -u <username>"
```
firewall-blocking.md 4.8 KB
# Firewall Blocking Connectivity

The guest OS firewall (Windows Firewall, or iptables/firewalld/UFW on Linux) is blocking inbound connections even though the NSG allows them.

## Symptoms → Solutions

| Symptom                                                     | OS      | Solution                                             | Documentation                                                                                                                                          |
| ----------------------------------------------------------- | ------- | ---------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------ |
| Windows Firewall blocking RDP                               | Windows | Re-enable "Remote Desktop" firewall rule group       | [Guest OS firewall blocking](https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/windows/guest-os-firewall-blocking-inbound-traffic) |
| Firewall policy set to BlockInboundAlways                   | Windows | Reset to `blockinbound,allowoutbound` policy         | [Enable/disable firewall rule](https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/windows/enable-disable-firewall-rule-guest-os)    |
| Third-party AV/firewall blocking                            | Windows | Stop the third-party service, test, then reconfigure | [Guest OS firewall blocking](https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/windows/guest-os-firewall-blocking-inbound-traffic) |
| iptables/nftables blocking SSH (port 22)                    | Linux   | Add allow rule or flush blocking chain               | [Troubleshoot SSH connection](https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/linux/troubleshoot-ssh-connection)                 |
| firewalld blocking SSH                                      | Linux   | Open port 22 in the active zone                      | [Troubleshoot SSH connection](https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/linux/troubleshoot-ssh-connection)                 |
| UFW blocking SSH (Ubuntu/Debian)                            | Linux   | Run `ufw allow 22/tcp` or disable UFW temporarily    | [Troubleshoot SSH connection](https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/linux/troubleshoot-ssh-connection)                 |
| Cannot access firewall settings: no connectivity (Windows)  | Windows | Use offline repair VM to modify registry             | [Disable guest OS firewall offline](https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/windows/disable-guest-os-firewall-windows)   |
| Cannot access firewall settings: no connectivity (Linux)    | Linux   | Use Serial Console or repair VM to edit iptables/firewalld config | [Repair Linux VM commands](https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/linux/repair-linux-vm-using-azure-virtual-machine-repair-commands) |

## Quick Commands - Windows

> ⚠️ **Warning:** Commands marked with ⚡ use the VM agent/extensions. Run [Pre-Flight Safety Checks](cannot-connect-to-vm.md#pre-flight-safety-checks) before using them.

```bash
# ⚡ Reset RDP config (re-enables RDP, creates firewall rule for 3389)
az vm user reset-remote-desktop --name <vm-name> -g <resource-group>

# ⚡ Query Windows Firewall rules via Run Command
az vm run-command invoke --name <vm-name> -g <resource-group> \
  --command-id RunPowerShellScript \
  --scripts "netsh advfirewall firewall show rule name='Remote Desktop - User Mode (TCP-In)'"

# ⚡ Enable Remote Desktop firewall rule via Run Command
az vm run-command invoke --name <vm-name> -g <resource-group> \
  --command-id RunPowerShellScript \
  --scripts "netsh advfirewall firewall set rule group='Remote Desktop' new enable=yes"
```

## Quick Commands - Linux

> ⚠️ **Warning:** Commands below use the VM agent/extensions. Run [Pre-Flight Safety Checks](cannot-connect-to-vm.md#pre-flight-safety-checks) before using them.

```bash
# ⚡ Check iptables rules via Run Command
az vm run-command invoke --name <vm-name> -g <resource-group> \
  --command-id RunShellScript --scripts "iptables -L -n --line-numbers"

# ⚡ Allow SSH through iptables via Run Command
az vm run-command invoke --name <vm-name> -g <resource-group> \
  --command-id RunShellScript --scripts "iptables -I INPUT -p tcp --dport 22 -j ACCEPT"

# ⚡ Check firewalld status and open SSH via Run Command
az vm run-command invoke --name <vm-name> -g <resource-group> \
  --command-id RunShellScript \
  --scripts "firewall-cmd --state && firewall-cmd --add-service=ssh --permanent && firewall-cmd --reload"

# ⚡ Check/allow UFW (Ubuntu/Debian) via Run Command
az vm run-command invoke --name <vm-name> -g <resource-group> \
  --command-id RunShellScript --scripts "ufw status; ufw allow 22/tcp"
```
network-connectivity.md 4.7 KB
# Network Connectivity Problems

User's VM is running but unreachable due to network-level issues (NSG, routing, NIC, DNS).

> ⚠️ **OS Note:** NSG, routing, and public IP issues are OS-agnostic (Azure platform layer). NIC and DNS issues have OS-specific remediation; see the OS column below.

## Symptoms → Solutions

| Symptom                                    | OS      | Solution                                                | Documentation                                                                                                                                  |
| ------------------------------------------ | ------- | ------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------- |
| NSG has no allow rule for RDP/SSH port     | Any     | Add inbound allow rule for TCP 3389 or 22               | [NSG blocking RDP](https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/windows/troubleshoot-rdp-nsg-problem)                 |
| NSG at both NIC and subnet level blocking  | Any     | Traffic must pass both NSGs; check effective rules      | [Diagnose VM traffic filtering](https://learn.microsoft.com/en-us/azure/network-watcher/diagnose-vm-network-traffic-filtering-problem)         |
| Custom route (UDR) sending traffic to NVA  | Any     | Check effective routes, verify NVA is forwarding        | [Diagnose VM routing](https://learn.microsoft.com/en-us/azure/network-watcher/diagnose-vm-network-routing-problem)                             |
| VM has no public IP                        | Any     | Add a public IP or connect via Azure Bastion            | [Public IP addresses](https://learn.microsoft.com/en-us/azure/virtual-network/ip-services/public-ip-addresses)                                 |
| NIC is disabled inside guest OS            | Windows | Enable NIC via Run Command or Serial Console            | [Troubleshoot RDP: NIC disabled](https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/windows/troubleshoot-rdp-nic-disabled)   |
| NIC is down inside guest OS                | Linux   | Bring interface up via Run Command or Serial Console    | [Troubleshoot SSH connection](https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/linux/troubleshoot-ssh-connection)          |
| Static IP misconfiguration inside guest    | Windows | Azure VMs should use DHCP; reset NIC to restore         | [Reset network interface](https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/windows/reset-network-interface)               |
| Static IP misconfiguration inside guest    | Linux   | Restore DHCP config in `/etc/netplan/` or `/etc/sysconfig/network-scripts/` | [Troubleshoot SSH connection](https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/linux/troubleshoot-ssh-connection) |
| Ghost NIC after disk swap or resize        | Windows | Old NIC holds the IP config so the new NIC can't get an IP; reset the NIC | [Reset network interface](https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/windows/reset-network-interface)               |
| DNS resolution failure                     | Any     | Check DNS server config; Azure default is 168.63.129.16 | [DHCP troubleshooting](https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/windows/troubleshoot-rdp-dhcp-disabled)           |

## Quick Commands - Platform (Any OS)

```bash
# Check effective NSG rules on a NIC
az network nic list-effective-nsg --name <nic-name> -g <resource-group>

# Check effective routes
az network nic show-effective-route-table --name <nic-name> -g <resource-group> -o table

# Check if VM has a public IP
az vm list-ip-addresses --name <vm-name> -g <resource-group> -o table

# Test connectivity from VM to a destination
az network watcher test-connectivity --source-resource <vm-resource-id> \
  --dest-address <destination-ip> --dest-port <port> -g <resource-group>
```

## Quick Commands - Windows

```bash
# Reset NIC (restores DHCP, removes stale config; Windows only)
az vm repair reset-nic --name <vm-name> -g <resource-group> --yes
```

## Quick Commands - Linux

> ⚠️ **Warning:** Commands below use the VM agent/extensions. Run [Pre-Flight Safety Checks](cannot-connect-to-vm.md#pre-flight-safety-checks) before using them.

```bash
# ⚡ Check network interface status via Run Command
az vm run-command invoke --name <vm-name> -g <resource-group> \
  --command-id RunShellScript --scripts "ip link show; ip addr show"

# ⚡ Bring interface up via Run Command
az vm run-command invoke --name <vm-name> -g <resource-group> \
  --command-id RunShellScript --scripts "ip link set eth0 up && dhclient eth0"
```
rdp-connectivity.md 6.9 KB
# Unable to RDP into the VM

User is trying to RDP into a Windows VM but the connection fails (timeout, refused, or error dialog).

## Symptoms → Solutions

| Symptom                                                                     | Solution                                                   | Documentation                                                                                                                                                                            |
| --------------------------------------------------------------------------- | ---------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Connection times out, no response at all                                    | NSG missing allow rule for port 3389                       | [NSG blocking RDP](https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/windows/troubleshoot-rdp-nsg-problem)                                                           |
| Connection times out, NSG rules look correct                                | Guest OS firewall is blocking inbound RDP                  | [Guest OS firewall blocking](https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/windows/guest-os-firewall-blocking-inbound-traffic)                                   |
| "Your credentials did not work"                                             | Wrong password or username format                          | [Credentials error](https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/windows/troubleshoot-specific-rdp-errors#windows-security-error-your-credentials-did-not-work) |
| "An internal error has occurred"                                            | RDP service, TLS certificate, or security layer issue      | [RDP internal error](https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/windows/troubleshoot-rdp-internal-error)                                                      |
| Black screen after login                                                    | Explorer.exe crash, GPU driver, or GPO stuck               | [Detailed RDP troubleshooting](https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/windows/detailed-troubleshoot-rdp)                                                  |
| "No Remote Desktop License Servers available"                               | RDS licensing grace period expired                         | [Specific RDP errors](https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/windows/troubleshoot-specific-rdp-errors#rdplicense)                                         |
| "Remote Desktop can't find the computer"                                    | VM has no public IP, DNS issue, or VM is deallocated       | [Specific RDP errors](https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/windows/troubleshoot-specific-rdp-errors#rdpname)                                            |
| "An authentication error has occurred / LSA"                                | NLA/CredSSP mismatch, clock skew, or wrong username format | [Specific RDP errors](https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/windows/troubleshoot-specific-rdp-errors#rdpauth)                                            |
| "Remote Desktop can't connect to the remote computer"                       | Generic; multiple possible causes                          | [Specific RDP errors](https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/windows/troubleshoot-specific-rdp-errors#rdpconnect)                                         |
| "Because of a security error"                                               | TLS certificate or version mismatch                        | [RDP general error](https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/windows/troubleshoot-rdp-general-error)                                                        |
| RDP connects then disconnects immediately                                   | Session limits, idle timeout, or resource exhaustion       | [RDP disconnections](https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/windows/troubleshoot-rdp-connection)                                                          |
| Works from some IPs but not others                                          | NSG source IP restriction too narrow                       | [NSG blocking RDP](https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/windows/troubleshoot-rdp-nsg-problem)                                                           |
| Event log shows specific RDP error Event IDs                                | Match Event ID to known cause (e.g., 1058, 36870)          | [RDP issues by Event ID](https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/windows/event-id-troubleshoot-vm-rdp-connecton)                                           |
| "Authentication error has occurred" / "function requested is not supported" | CredSSP, NLA, or certificate issue                         | [Authentication errors on RDP](https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/windows/cannot-connect-rdp-azure-vm)                                                |
| Guest NIC is disabled inside the VM                                         | Enable NIC via Run Command or Serial Console               | [Troubleshoot RDP: NIC disabled](https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/windows/troubleshoot-rdp-nic-disabled)                                            |

## Quick Commands

> ⚠️ **Warning:** Commands marked with ⚡ use the VM agent/extensions. Run [Pre-Flight Safety Checks](cannot-connect-to-vm.md#pre-flight-safety-checks) before using them.

```bash
# Check VM power state (filter on the PowerState/ code rather than a fixed index)
az vm get-instance-view --name <vm-name> -g <resource-group> \
  --query "instanceView.statuses[?starts_with(code, 'PowerState/')].displayStatus" -o tsv

# Check NSG rules
az network nsg rule list --nsg-name <nsg-name> -g <resource-group> -o table

# ⚡ Reset RDP configuration to defaults (re-enables RDP, resets port, restarts TermService)
az vm user reset-remote-desktop --name <vm-name> -g <resource-group>

# ⚡ Reset VM password
az vm user update --name <vm-name> -g <resource-group> -u <username> -p '<new-password>'

# IP Flow Verify: test if NSG allows traffic (wildcard port quoted so the shell does not expand it)
az network watcher test-ip-flow --direction Inbound --protocol TCP \
  --local <vm-private-ip>:3389 --remote "<your-public-ip>:*" \
  --vm <vm-name> -g <resource-group>
```

## General RDP Troubleshooting

If the symptom doesn't match a specific row above, follow Microsoft's systematic approach:

- [Troubleshoot RDP connections to an Azure VM](https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/windows/troubleshoot-rdp-connection)
- [Detailed RDP troubleshooting steps](https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/windows/detailed-troubleshoot-rdp)
rdp-service-config.md 3.2 KB
# RDP Service and Configuration Issues

VM is reachable but the RDP service itself is broken or misconfigured.

## Symptoms → Solutions

| Symptom                                | Solution                                                               | Documentation                                                                                                                                    |
| -------------------------------------- | ---------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------ |
| TermService not running                | Start the service and set to Automatic                                 | [Reset RDP service](https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/windows/reset-rdp)                                     |
| RDP port changed from 3389             | Reset port or update NSG to allow the custom port                      | [Detailed RDP troubleshooting](https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/windows/detailed-troubleshoot-rdp)          |
| RDP disabled (fDenyTSConnections = 1)  | Reset RDP config via CLI or Portal                                     | [Reset RDP service](https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/windows/reset-rdp)                                     |
| TLS/SSL certificate expired or corrupt | Delete cert and restart TermService to regenerate                      | [RDP internal error](https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/windows/troubleshoot-rdp-internal-error)              |
| NLA/Security Layer mismatch            | Temporarily disable NLA for recovery                                   | [RDP general error](https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/windows/troubleshoot-rdp-general-error)                |
| GPO overriding local RDP settings      | Check `HKLM:\SOFTWARE\Policies\Microsoft\Windows NT\Terminal Services` | [Detailed RDP troubleshooting](https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/windows/detailed-troubleshoot-rdp)          |
| RDS licensing expired                  | Remove RDSH role or configure license server                           | [Specific RDP errors](https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/windows/troubleshoot-specific-rdp-errors#rdplicense) |

## Quick Commands

> ⚠️ **Warning:** Commands marked with ⚡ use the VM agent/extensions. Run [Pre-Flight Safety Checks](cannot-connect-to-vm.md#pre-flight-safety-checks) before using them.

```bash
# ⚡ Reset all RDP configuration to defaults
az vm user reset-remote-desktop --name <vm-name> -g <resource-group>

# ⚡ Check TermService status via Run Command
az vm run-command invoke --name <vm-name> -g <resource-group> \
  --command-id RunPowerShellScript --scripts "Get-Service TermService | Select-Object Status, StartType"

# Restart VM (if RDP service is unrecoverable; requires user approval)
az vm restart --name <vm-name> -g <resource-group>

# Redeploy VM (moves to new host; last resort, requires user approval)
az vm redeploy --name <vm-name> -g <resource-group>
```
ssh-connectivity.md 5.4 KB
# Unable to SSH into the VM

User is trying to SSH into a Linux VM but the connection fails.

## Symptoms → Solutions

| Symptom                                           | Solution                                                                                   | Documentation                                                                                                                                    |
| ------------------------------------------------- | ------------------------------------------------------------------------------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------ |
| "Connection refused" on port 22                   | SSH service not running or listening on a different port                                   | [Troubleshoot SSH connection](https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/linux/troubleshoot-ssh-connection)           |
| "Connection timed out"                            | NSG blocking port 22, VM not running, or no public IP                                      | [Troubleshoot SSH connection](https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/linux/troubleshoot-ssh-connection)           |
| "Permission denied (publickey)"                   | Wrong SSH key, wrong user, or key not in authorized_keys                                   | [Detailed SSH troubleshooting](https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/linux/detailed-troubleshoot-ssh-connection) |
| "Permission denied (password)"                    | Wrong password or password auth disabled in sshd_config                                    | [Detailed SSH troubleshooting](https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/linux/detailed-troubleshoot-ssh-connection) |
| "Host key verification failed"                    | VM was redeployed and got a new host key                                                   | Remove old entry from `~/.ssh/known_hosts`                                                                                                       |
| "Server unexpectedly closed connection"           | Disk full, SSH config error, or PAM issue                                                  | [Detailed SSH troubleshooting](https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/linux/detailed-troubleshoot-ssh-connection) |
| SSH hangs with no response                        | Firewall (iptables/firewalld), routing, or NIC issue                                       | [Troubleshoot SSH connection](https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/linux/troubleshoot-ssh-connection)           |
| Cannot SSH into Debian Linux VM                   | Debian-specific network or sshd config issue                                               | [Cannot connect to Debian Linux VM](https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/linux/cannot-connect-debian-linux)     |
| SSH blocked after SELinux policy change           | SELinux misconfigured and blocking sshd                                                    | [SELinux troubleshooting](https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/linux/linux-selinux-troubleshooting)             |
| "Permission denied" with Entra ID (AAD) SSH login | Missing role assignment: Virtual Machine Administrator Login or Virtual Machine User Login | [Troubleshoot SSH connection](https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/linux/troubleshoot-ssh-connection)           |
| Linux VM not booting (UEFI boot failure)          | Gen2 VM UEFI boot issue preventing SSH                                                     | [Linux VM UEFI boot failures](https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/linux/azure-linux-vm-uefi-boot-failures)     |

## Quick Commands

> ⚠️ **Warning:** Commands marked with ⚡ use the VM agent/extensions. Run [Pre-Flight Safety Checks](cannot-connect-to-vm.md#pre-flight-safety-checks) before using them.

```bash
# ⚡ Reset SSH configuration to defaults (resets sshd_config, restarts sshd)
az vm user reset-ssh --name <vm-name> -g <resource-group>

# ⚡ Reset SSH public key for a user
az vm user update --name <vm-name> -g <resource-group> \
  -u <username> --ssh-key-value "<ssh-public-key>"

# ⚡ Reset password for Linux VM
az vm user update --name <vm-name> -g <resource-group> \
  -u <username> -p '<new-password>'

# ⚡ Check if sshd is running via Run Command
az vm run-command invoke --name <vm-name> -g <resource-group> \
  --command-id RunShellScript --scripts "systemctl status sshd"

# ⚡ Check SELinux status via Run Command
az vm run-command invoke --name <vm-name> -g <resource-group> \
  --command-id RunShellScript --scripts "getenforce"

# ⚡ Set SELinux to permissive mode (temporary; reverts at the next reboot)
az vm run-command invoke --name <vm-name> -g <resource-group> \
  --command-id RunShellScript --scripts "setenforce 0"
```

## General SSH Troubleshooting

If the symptom doesn't match a specific row above, follow Microsoft's systematic approach:

- [Troubleshoot SSH connections to an Azure Linux VM](https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/linux/troubleshoot-ssh-connection)
- [Detailed SSH troubleshooting steps](https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/linux/detailed-troubleshoot-ssh-connection)
vm-agent-not-responding.md 3.6 KB
# VM Agent Not Responding

Run Command and password reset depend on the VM agent. If the agent is unhealthy, alternative methods are needed.

> ⚠️ **OS Note:** Serial Console, Boot Diagnostics, and repair VM commands are available for both Windows and Linux but use separate doc pages and tools. Match the correct OS below.

## Symptoms → Solutions

| Symptom                                                          | OS      | Solution                                                 | Documentation                                                                                                                                                       |
| ---------------------------------------------------------------- | ------- | -------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Run Command times out                                            | Windows | VM agent may be down; use Serial Console instead         | [Serial Console (Windows)](https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/windows/serial-console-overview)                                   |
| Run Command times out                                            | Linux   | VM agent may be down; use Serial Console instead         | [Serial Console (Linux)](https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/linux/serial-console-linux)                                          |
| Password reset fails via Portal/CLI                              | Windows | VMAccess extension can't communicate; use offline reset  | [Reset password without agent](https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/windows/reset-local-password-without-agent)                    |
| Password/key reset fails via Portal/CLI                          | Linux   | VMAccess extension can't communicate; use Serial Console | [Serial Console (Linux)](https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/linux/serial-console-linux)                                          |
| VM not booting (Boot Diagnostics shows BSOD/stuck)               | Windows | OS-level issue; use repair VM for offline fix            | [Repair Windows VM](https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/windows/repair-windows-vm-using-azure-virtual-machine-repair-commands)    |
| VM not booting (Boot Diagnostics shows kernel panic/stuck)       | Linux   | Use repair VM for offline Linux disk fix                 | [Repair Linux VM](https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/linux/repair-linux-vm-using-azure-virtual-machine-repair-commands)          |
| VMAccess extension error on domain controller                    | Windows | VMAccess doesn't support DCs; use Serial Console         | [Serial Console (Windows)](https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/windows/serial-console-overview)                                   |

## Quick Commands

```bash
# Connect to Serial Console via CLI
az serial-console connect --name <vm-name> -g <resource-group>

# Enable boot diagnostics (required for Serial Console)
az vm boot-diagnostics enable --name <vm-name> -g <resource-group>

# Get boot diagnostics serial log
az vm boot-diagnostics get-boot-log --name <vm-name> -g <resource-group>

# Create repair VM for offline fixes
az vm repair create --name <vm-name> -g <resource-group> \
  --repair-username repairadmin --repair-password '<password>'

# Restore after offline fix
az vm repair restore --name <vm-name> -g <resource-group>
```
workflows/vm-troubleshooter/
vm-troubleshooter.md 9.8 KB
# Azure VM Connectivity Troubleshooting

> Diagnose and resolve Azure VM connectivity failures (RDP/SSH) by identifying symptoms, routing to the right solution, fetching the latest Microsoft documentation, and guiding the user through resolution.

## Quick Reference

| Property      | Details                                                                                                           |
| ------------- | ----------------------------------------------------------------------------------------------------------------- |
| Best for      | RDP/SSH connection failures, NSG/firewall misconfig, credential resets, NIC issues                                |
| Primary tools | Azure CLI, Azure PowerShell, Serial Console, Boot Diagnostics, Run Command                                        |
| Reference     | [references/cannot-connect-to-vm.md](references/cannot-connect-to-vm.md) |

## MCP Tools

| Tool            | Purpose                                                | Parameters                                                                                                           |
| --------------- | ------------------------------------------------------ | -------------------------------------------------------------------------------------------------------------------- |
| `fetch_webpage` | Fetch latest Microsoft troubleshooting docs at runtime | `urls` (Required): Array of doc URLs from reference file; `query` (Optional): User's symptom for relevant extraction |

## Triggers

Activate this skill when the user mentions:

- "can't connect to my VM" / "can't RDP" / "can't SSH"
- "RDP not working" / "SSH refused" / "connection timed out"
- "black screen" on VM
- "reset VM password" / "forgot password"
- "NSG blocking" / "firewall blocking" / "port 3389"
- "serial console" access
- "internal error" on RDP
- "VM not reachable" / "public IP not working"
- "RDP disconnects" / "session dropped"

---

## Guardrails

- **Default to read-only diagnostics.** Gather evidence before suggesting any fix.
- Do not run extension-backed commands (`az vm user update`, `az vm user reset-ssh`, `az vm user reset-remote-desktop`, `az vm run-command invoke`) without first passing [Pre-Flight Safety Checks](#phase-25-pre-flight-safety-checks-before-extension-backed-operations).
- Do not restart, redeploy, deallocate, or delete a VM unless the user explicitly asks for remediation.
- Do not conclude root cause without quoting the evidence that supports it (e.g., NSG rule output, VM agent status, extension state).
- When multiple issues are found (e.g., NSG + credential), fix the network-layer issue first before attempting agent-dependent fixes.

## Evidence Order

Gather diagnostic evidence in this order before suggesting remediation:

1. **VM state:** power state, provisioning state, VM agent health, extension states
2. **Network layer:** public IP, NSG rules (NIC + subnet), effective routes, IP flow verify
3. **Guest OS layer (if agent is healthy):** service status via Run Command, firewall rules, sshd/TermService config
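
The first two steps can be sketched as a single read-only shell function. This is a minimal sketch, not part of the skill: the function name `gather_evidence` and its three arguments are hypothetical, and the `--query` expression is an assumption about which instance-view fields are worth pulling. The `az` subcommands themselves are the ones used in the reference files.

```bash
# Hypothetical helper: collects read-only evidence in the order listed above.
# $1 = VM name, $2 = resource group, $3 = NIC name (all supplied by the caller).
gather_evidence() {
  vm=$1 rg=$2 nic=$3

  # 1. VM state: power/provisioning codes, VM agent health, extension states
  az vm get-instance-view --name "$vm" --resource-group "$rg" \
    --query "{states: instanceView.statuses[].code, agent: instanceView.vmAgent.statuses[0].displayStatus, extensions: instanceView.extensions[].statuses[0].displayStatus}" \
    -o json

  # 2. Network layer: public IP, effective NSG rules, effective routes
  az vm list-ip-addresses --name "$vm" --resource-group "$rg" -o table
  az network nic list-effective-nsg --name "$nic" --resource-group "$rg"
  az network nic show-effective-route-table --name "$nic" --resource-group "$rg" -o table

  # 3. Guest OS layer is intentionally left out: it relies on Run Command,
  #    which must wait until the pre-flight checks have passed.
}
```

Calling `gather_evidence <vm-name> <resource-group> <nic-name>` changes nothing on the VM, so it fits the read-only default in the guardrails.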

---

## Workflow

### Phase 1: Determine User Intent

Infer the connectivity issue from the user's message. If the issue is clear, proceed to Phase 2. If ambiguous, ask **one** clarifying question:

| Signal in User Message                                                    | Inferred Category  |
| ------------------------------------------------------------------------- | ------------------ |
| "can't RDP", "RDP timeout", "RDP error", "black screen", "internal error" | Unable to RDP      |
| "can't SSH", "SSH refused", "permission denied", "publickey"              | Unable to SSH      |
| "NSG", "firewall", "port blocked", "no public IP", "NIC disabled"         | Network / Firewall |
| "credentials", "password", "wrong password", "access denied"              | Credential / Auth  |
| "VM agent", "Run Command not working", "Serial Console"                   | VM Agent / Tools   |

If unclear, ask: **"Are you trying to connect via RDP (Windows) or SSH (Linux), and what error message or behavior are you seeing?"**
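
The signal table above amounts to a keyword match. As a rough sketch (the function name `classify_intent` is hypothetical, and the first-match-wins ordering is an assumption, since the table does not say how to break ties between categories):

```bash
# Hypothetical helper: maps a user message to one of the categories in the table.
# Rows are tried top to bottom; the first matching row wins (an assumption).
classify_intent() {
  # Lowercase the message so matching is case-insensitive.
  msg=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$msg" in
    *rdp*|*"black screen"*|*"internal error"*)
      echo "Unable to RDP" ;;
    *ssh*|*"permission denied"*|*publickey*)
      echo "Unable to SSH" ;;
    *nsg*|*firewall*|*"port blocked"*|*"no public ip"*|*"nic disabled"*)
      echo "Network / Firewall" ;;
    *credentials*|*password*|*"access denied"*)
      echo "Credential / Auth" ;;
    *"vm agent"*|*"run command"*|*"serial console"*)
      echo "VM Agent / Tools" ;;
    *)
      # No signal matched: ask the clarifying question instead of guessing.
      echo "Unclear" ;;
  esac
}
```

A message such as "forgot my password" maps to `Credential / Auth`, while a message with no recognized signal returns `Unclear`, which corresponds to asking the single clarifying question.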

If the user shares an Azure VM name or resource ID, look up the resource with the azure-resource-lookup skill if it is available; otherwise, use the Azure CLI.

### Phase 2: Route to Solution

Open [references/cannot-connect-to-vm.md](references/cannot-connect-to-vm.md) and use its routing table to identify the symptom category and open the matching sub-reference for the full **Symptoms → Solutions** table and Quick Commands.

If additional details are needed to narrow to a specific solution row, ask the user. For example:
- "What error message do you see in the RDP dialog?"
- "Does the connection time out, or do you get an error immediately?"
- "Is this a Windows or Linux VM?"

### Phase 2.5: Pre-Flight Safety Checks (Before Extension-Backed Operations)

> ⚠️ **Warning:** This phase is **mandatory** before running any command that depends on the VM agent or extensions. Skipping these checks can deadlock the VM and require manual portal recovery.

**Extension-backed commands include:** `az vm user update`, `az vm user reset-ssh`, `az vm user reset-remote-desktop`, `az vm run-command invoke`, and any operation that installs or invokes a VM extension.

Run the pre-flight checks from [Pre-Flight Safety Checks in references/cannot-connect-to-vm.md](references/cannot-connect-to-vm.md#pre-flight-safety-checks) and evaluate:

| Check | Required Value | If Failed |
| ----- | -------------- | --------- |
| VM power state | `PowerState/running` | Start the VM first |
| VM provisioning state | `ProvisioningState/succeeded` | Do NOT run extension commands. Wait for current operation to complete, or use Serial Console / offline repair |
| VM agent status | `Ready` | Do NOT run extension commands. Use Serial Console or offline repair instead |
| Existing extensions | No extensions in `Creating`, `Updating`, or `Deleting` state | Do NOT add new extensions. Wait for completion, remove stuck extensions via Portal, or use Serial Console |

> 💡 **Tip:** If any check returns `null`, empty, or the CLI command itself errors, treat the result as **unsafe**.

**If any check fails:**
1. **Stop.** Do NOT attempt any extension-backed remediation.
2. **Inform the user** which check(s) failed and what the current state is.
3. **Suggest non-agent alternatives:** Serial Console, offline repair VM, or Portal-based actions.
4. If the state appears transient (e.g., VM just started), wait 30–60 seconds and **re-run the pre-flight checks**; do not run the extension command until all checks pass.
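
The gate described by the table and failure steps above could be expressed roughly as follows. This is a sketch under stated assumptions: `preflight_failures` is a hypothetical helper, its inputs are values already extracted by the pre-flight commands in the reference file (not shown here), and an empty extension list is assumed to mean "no extensions installed" rather than a failed query.

```python
from typing import Iterable, List, Optional

# Extension states that the table above treats as still in flight.
BUSY_EXTENSION_STATES = {"creating", "updating", "deleting"}


def preflight_failures(
    power_state: Optional[str],
    provisioning_state: Optional[str],
    agent_status: Optional[str],
    extension_states: Optional[Iterable[Optional[str]]],
) -> List[str]:
    """Return the names of failed checks; an empty list means every check passed.

    None or empty values never equal the required value, so they count as
    unsafe, in line with the tip above.
    """
    failures = []
    if power_state != "PowerState/running":
        failures.append("VM power state")
    if provisioning_state != "ProvisioningState/succeeded":
        failures.append("VM provisioning state")
    if agent_status != "Ready":
        failures.append("VM agent status")
    # Assumption: None means the extension query failed (unsafe), while an
    # empty list means the VM has no extensions, which satisfies the check.
    if extension_states is None or any(
        not state or state.lower() in BUSY_EXTENSION_STATES
        for state in extension_states
    ):
        failures.append("Existing extensions")
    return failures


if __name__ == "__main__":
    failed = preflight_failures(
        "PowerState/running", "ProvisioningState/updating", "Ready", ["Succeeded"]
    )
    if failed:
        print("Stop. Failed checks:", ", ".join(failed))
    else:
        print("All pre-flight checks passed.")
```

Returning the list of failed check names, rather than a single boolean, makes it straightforward to tell the user which check failed before suggesting Serial Console or offline repair.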

### Phase 3: Fetch Documentation

Once you've identified the specific solution row, fetch the linked Microsoft documentation URL for the latest troubleshooting guidance:

```javascript
fetch_webpage({
  urls: ["<documentation-url-from-solution-row>"],
  query: "<user's specific symptom or error message>"
})
```

This ensures the user gets current guidance even if Microsoft updates their docs.

### Phase 4: Diagnose and Respond

Combine the fetched documentation with the quick commands from the reference file to give the user a response:

1. **Explain the likely cause** based on their symptom
2. **Provide the immediate diagnostic/fix commands** from the reference file's Quick Commands section
3. **Summarize the key resolution steps** from the fetched documentation
4. **If the user is logged into Azure**, offer to run diagnostic CLI commands to confirm the root cause before applying fixes
5. **Recommend next steps**: what to verify after the fix, and what to do if it doesn't work

### Phase 5: Escalation (if needed)

If the symptom doesn't match any solution in the reference file, or the fix doesn't resolve the issue:

1. Check the VM's instance view statuses (power and provisioning state): `az vm get-instance-view --name <vm> -g <rg> --query "instanceView.statuses" -o table`
2. Offer to restart the VM (requires user approval): `az vm restart --name <vm> -g <rg>`
3. Offer to redeploy the VM (requires user approval; moves the VM to a new host): `az vm redeploy --name <vm> -g <rg>`
4. Fetch the comprehensive guide: [Troubleshoot RDP connections](https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/windows/troubleshoot-rdp-connection) or [Troubleshoot SSH connections](https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/linux/troubleshoot-ssh-connection)
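
One way to keep the approval requirement explicit is to model the CLI steps above as data and filter them before anything runs. The sketch below only builds argument lists for the three `az` commands listed above and does not execute them; `EscalationStep`, `escalation_steps`, and `runnable_steps` are hypothetical names introduced for illustration.

```python
from typing import List, NamedTuple


class EscalationStep(NamedTuple):
    description: str
    command: List[str]
    requires_approval: bool


def escalation_steps(vm: str, resource_group: str) -> List[EscalationStep]:
    """Build the escalation commands from the list above for one VM."""
    target = ["--name", vm, "-g", resource_group]
    return [
        EscalationStep(
            "Check instance view statuses",
            ["az", "vm", "get-instance-view", *target,
             "--query", "instanceView.statuses", "-o", "table"],
            False,
        ),
        EscalationStep("Restart the VM", ["az", "vm", "restart", *target], True),
        EscalationStep(
            "Redeploy the VM (moves it to a new host)",
            ["az", "vm", "redeploy", *target],
            True,
        ),
    ]


def runnable_steps(steps: List[EscalationStep], approved: bool) -> List[EscalationStep]:
    """Keep only read-only steps unless the user has explicitly approved."""
    return [step for step in steps if approved or not step.requires_approval]


if __name__ == "__main__":
    # "web01" and "prod-rg" are placeholder names.
    for step in runnable_steps(escalation_steps("web01", "prod-rg"), approved=False):
        print(step.description, "->", " ".join(step.command))
```

Each `command` list can be passed to `subprocess.run` once the user approves. Restart and redeploy stay out of the runnable set until then.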

---

## Error Handling

| Error                                  | Likely Cause                    | Action                                                                             |
| -------------------------------------- | ------------------------------- | ---------------------------------------------------------------------------------- |
| `fetch_webpage` fails or returns empty | URL may have changed            | Fall back to quick commands in reference file; suggest user check the URL manually |
| CLI command fails with "not found"     | VM name or resource group wrong | Ask user to verify VM name and resource group                                      |
| Run Command times out                  | VM agent not responding         | Route to "VM Agent Not Responding" section in reference file                       |
| Serial Console not available           | Boot diagnostics not enabled    | Run `az vm boot-diagnostics enable --name <vm> -g <rg>` first                      |
| Password reset fails                   | VMAccess extension error        | Check reference file for VMAccess alternatives (offline reset, Serial Console)     |
| VM stuck in "Updating" after extension op | Extension deadlocked the VM agent | Do NOT add more extensions. Remove stuck extensions via Portal, then restart. See [Pre-Flight Safety Checks](references/cannot-connect-to-vm.md#pre-flight-safety-checks) |
| `VMAgentStatusCommunicationError`      | Agent not reporting status      | Do NOT run extension commands. Use Serial Console or offline repair VM             |

---

## References

- [Cannot Connect to VM: Symptom Router](references/cannot-connect-to-vm.md)

License (MIT)

MIT License

Copyright 2025 (c) Microsoft Corporation.

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.