Installation

Install with CLI (recommended)
gh skills-hub install azure-compute

Don't have the extension? Run gh extension install samueltauil/skills-hub first.

Download and extract to your repository:

.github/skills/azure-compute/

Extract the ZIP to .github/skills/ in your repo. The folder name must match azure-compute for Copilot to auto-discover it.

Skill Files (7)

SKILL.md 2.9 KB
---
name: azure-compute
description: "Azure VM and VMSS router for recommendations, pricing, autoscale, orchestration, and connectivity troubleshooting. WHEN: Azure VM, VMSS, scale set, recommend, compare, server, website, burstable, lightweight, VM family, workload, GPU, learning, simulation, dev/test, backend, autoscale, load balancer, Flexible orchestration, Uniform orchestration, cost estimate, connect, refused, Linux, black screen, reset password, reach VM, port 3389, NSG, troubleshoot."
license: MIT
metadata:
  author: Microsoft
  version: "2.0.0"
---

# Azure Compute Skill

Routes Azure VM requests to the appropriate workflow based on user intent.

## When to Use This Skill

Activate this skill when the user:
- Asks about Azure Virtual Machines (VMs) or VM Scale Sets (VMSS)
- Asks about choosing a VM, VM sizing, pricing, or cost estimates
- Needs a workload-based recommendation for scenarios like database, GPU, deep learning, HPC, web tier, or dev/test
- Mentions VM families, autoscale, load balancing, or Flexible versus Uniform orchestration
- Wants to troubleshoot Azure VM connectivity issues such as unreachable VMs, RDP/SSH failures, black screens, NSG/firewall issues, or credential resets
- Uses prompts like "Help me choose a VM"

## Routing

```text
User intent?
├─ Recommend / choose / compare / price a VM or VMSS
│  └─ Route to [VM Recommender](workflows/vm-recommender/vm-recommender.md)
│
├─ Can't connect / RDP / SSH / troubleshoot a VM
│  └─ Route to [VM Troubleshooter](workflows/vm-troubleshooter/vm-troubleshooter.md)
│
└─ Unclear
   └─ Ask: "Are you looking for a VM recommendation, or troubleshooting a connectivity issue?"
```

| Signal                                                                        | Workflow                                                           |
| ----------------------------------------------------------------------------- | ------------------------------------------------------------------ |
| "recommend VM", "which VM", "VM size", "VM pricing", "VMSS", "scale set"     | [VM Recommender](workflows/vm-recommender/vm-recommender.md)       |
| "can't connect", "RDP", "SSH", "NSG blocking", "reset password", "black screen" | [VM Troubleshooter](workflows/vm-troubleshooter/vm-troubleshooter.md) |
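
The routing rules can be sketched as a small keyword matcher. This is only an illustration: the skill itself relies on the model reading the user's intent, and the `route` function, the `SIGNALS` mapping, and the naive substring matching are assumptions, not part of the skill.

```python
# Hypothetical keyword router; the phrase lists mirror the signal table.
RECOMMENDER = "workflows/vm-recommender/vm-recommender.md"
TROUBLESHOOTER = "workflows/vm-troubleshooter/vm-troubleshooter.md"

SIGNALS = {
    RECOMMENDER: ("recommend vm", "which vm", "vm size", "vm pricing", "vmss", "scale set"),
    TROUBLESHOOTER: ("can't connect", "rdp", "ssh", "nsg blocking", "reset password", "black screen"),
}

CLARIFYING_QUESTION = (
    "Are you looking for a VM recommendation, "
    "or troubleshooting a connectivity issue?"
)


def route(prompt: str) -> str:
    """Return a workflow path, or the clarifying question when intent is unclear."""
    text = prompt.lower().replace("\u2019", "'")  # normalize curly apostrophes
    matches = [wf for wf, phrases in SIGNALS.items() if any(p in text for p in phrases)]
    # Exactly one workflow matched: route there. Zero or both matched: ask.
    return matches[0] if len(matches) == 1 else CLARIFYING_QUESTION
```

A prompt that matches both workflows (or neither) falls through to the clarifying question, which matches the "Unclear" branch of the routing tree.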

## Workflows

| Workflow              | Purpose                                                  | References                                                                   |
| --------------------- | -------------------------------------------------------- | ---------------------------------------------------------------------------- |
| **VM Recommender**    | Recommend VM sizes, VMSS, pricing using public APIs/docs | [vm-families](references/vm-families.md), [retail-prices-api](references/retail-prices-api.md), [vmss-guide](references/vmss-guide.md) |
| **VM Troubleshooter** | Diagnose and resolve VM connectivity failures (RDP/SSH) | [cannot-connect-to-vm](workflows/vm-troubleshooter/references/cannot-connect-to-vm.md) |
references/
retail-prices-api.md 6.3 KB
# Azure Retail Prices API Guide

The [Azure Retail Prices API](https://learn.microsoft.com/en-us/rest/api/cost-management/retail-prices/azure-retail-prices) is **unauthenticated**: no Azure account or subscription needed.

## Endpoint

```text
https://prices.azure.com/api/retail/prices
```

Preview version (includes savings plan rates):
```text
https://prices.azure.com/api/retail/prices?api-version=2023-01-01-preview
```

## Querying VM Prices

> **No Azure CLI command exists** for the Retail Prices API. Since the API is unauthenticated, use `curl` (bash) or `Invoke-RestMethod` (PowerShell) directly. The `az rest` command also works but adds no auth benefit.

### Basic VM price lookup

```http
GET https://prices.azure.com/api/retail/prices?$filter=serviceName eq 'Virtual Machines' and armRegionName eq 'eastus' and armSkuName eq 'Standard_D4s_v5' and priceType eq 'Consumption'
```

```bash
curl -s "https://prices.azure.com/api/retail/prices?\$filter=serviceName%20eq%20'Virtual%20Machines'%20and%20armRegionName%20eq%20'eastus'%20and%20armSkuName%20eq%20'Standard_D4s_v5'%20and%20priceType%20eq%20'Consumption'"
```

```powershell
$filter = "serviceName eq 'Virtual Machines' and armRegionName eq 'eastus' and armSkuName eq 'Standard_D4s_v5' and priceType eq 'Consumption'"
$response = Invoke-RestMethod "https://prices.azure.com/api/retail/prices?`$filter=$filter"
$response.Items | Select-Object armSkuName, retailPrice, unitOfMeasure, meterName
```
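
For callers that build the query in code rather than by hand, the same URL can be assembled with Python's standard library. This is a minimal sketch: `build_filter` and `build_prices_url` are hypothetical helper names, and values containing a single quote are not escaped.

```python
from urllib.parse import quote

BASE_URL = "https://prices.azure.com/api/retail/prices"


def build_filter(**equals: str) -> str:
    """Join field/value pairs into an OData `eq` filter expression."""
    return " and ".join(f"{field} eq '{value}'" for field, value in equals.items())


def build_prices_url(filter_expr: str) -> str:
    """Percent-encode the filter, keeping quotes, parentheses, and commas literal."""
    encoded = quote(filter_expr, safe="'(),")
    return BASE_URL + "?$filter=" + encoded


url = build_prices_url(
    build_filter(
        serviceName="Virtual Machines",
        armRegionName="eastus",
        armSkuName="Standard_D4s_v5",
        priceType="Consumption",
    )
)
```

The resulting `url` is the same string passed to `curl` in the bash example, without the shell escape before `$filter`.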

### Filter by family (all D-series in a region)

```http
GET https://prices.azure.com/api/retail/prices?$filter=serviceName eq 'Virtual Machines' and armRegionName eq 'eastus' and contains(armSkuName, 'Standard_D') and priceType eq 'Consumption'
```

```bash
curl -s "https://prices.azure.com/api/retail/prices?\$filter=serviceName%20eq%20'Virtual%20Machines'%20and%20armRegionName%20eq%20'eastus'%20and%20contains(armSkuName,%20'Standard_D')%20and%20priceType%20eq%20'Consumption'"
```

```powershell
$filter = "serviceName eq 'Virtual Machines' and armRegionName eq 'eastus' and contains(armSkuName, 'Standard_D') and priceType eq 'Consumption'"
$response = Invoke-RestMethod "https://prices.azure.com/api/retail/prices?`$filter=$filter"
$response.Items | Select-Object armSkuName, retailPrice, meterName
```

### Reservation pricing

```http
GET https://prices.azure.com/api/retail/prices?$filter=serviceName eq 'Virtual Machines' and armSkuName eq 'Standard_D4s_v5' and priceType eq 'Reservation'
```

```bash
curl -s "https://prices.azure.com/api/retail/prices?\$filter=serviceName%20eq%20'Virtual%20Machines'%20and%20armSkuName%20eq%20'Standard_D4s_v5'%20and%20priceType%20eq%20'Reservation'"
```

```powershell
$filter = "serviceName eq 'Virtual Machines' and armSkuName eq 'Standard_D4s_v5' and priceType eq 'Reservation'"
$response = Invoke-RestMethod "https://prices.azure.com/api/retail/prices?`$filter=$filter"
$response.Items | Select-Object armSkuName, retailPrice, reservationTerm, meterName
```

### Non-USD currency

Append `currencyCode` parameter:
```http
GET https://prices.azure.com/api/retail/prices?currencyCode='EUR'&$filter=serviceName eq 'Virtual Machines' and armSkuName eq 'Standard_D4s_v5'
```

```bash
curl -s "https://prices.azure.com/api/retail/prices?currencyCode='EUR'&\$filter=serviceName%20eq%20'Virtual%20Machines'%20and%20armSkuName%20eq%20'Standard_D4s_v5'"
```

```powershell
$filter = "serviceName eq 'Virtual Machines' and armSkuName eq 'Standard_D4s_v5'"
$response = Invoke-RestMethod "https://prices.azure.com/api/retail/prices?currencyCode='EUR'&`$filter=$filter"
$response.Items | Select-Object armSkuName, retailPrice, currencyCode, meterName
```

## Available Filters

| Filter          | Example Value                    | Notes                          |
| --------------- | -------------------------------- | ------------------------------ |
| `serviceName`   | `'Virtual Machines'`             | Case-sensitive in preview API  |
| `armRegionName` | `'eastus'`, `'westeurope'`       | ARM region name                |
| `armSkuName`    | `'Standard_D4s_v5'`              | Full ARM SKU name              |
| `priceType`     | `'Consumption'`, `'Reservation'` | Pay-as-you-go vs reserved      |
| `serviceFamily` | `'Compute'`                      | Broad category                 |
| `productName`   | `'Virtual Machines Dv5 Series'`  | Product line                   |
| `meterName`     | `'D4s v5'`, `'D4s v5 Spot'`      | Includes Spot and Low Priority |

> **Warning:** Filter values are **case-sensitive** in API version `2023-01-01-preview` and later.

## Response Fields

| Field                  | Description                                                        |
| ---------------------- | ------------------------------------------------------------------ |
| `armSkuName`           | ARM SKU name (e.g., `Standard_D4s_v5`)                             |
| `retailPrice`          | Microsoft retail price (USD unless overridden)                     |
| `unitOfMeasure`        | Usually `1 Hour` for VMs                                           |
| `armRegionName`        | Region code                                                        |
| `meterName`            | Human-readable meter (includes "Spot" / "Low Priority" variants)   |
| `productName`          | Product line with OS (e.g., "Virtual Machines Dv5 Series Windows") |
| `type`                 | `Consumption`, `Reservation`, or `DevTestConsumption`              |
| `reservationTerm`      | `1 Year` or `3 Years` (reservation only)                           |
| `savingsPlan`          | Array with `term` and `unitPrice` (preview API only)               |
| `isPrimaryMeterRegion` | Filter to `true` to avoid duplicate regional meters                |

## Pagination

The API returns at most 1,000 records per request. Follow `NextPageLink` in the response to retrieve the next page:

```json
{ "NextPageLink": "https://prices.azure.com:443/api/retail/prices?$filter=...&$skip=1000" }
```
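
Following `NextPageLink` can be sketched as a generator. The `Items` and `NextPageLink` keys come from the examples in this guide; the function names and the injectable `fetch` parameter are assumptions added so the loop can be exercised without network access.

```python
import json
from typing import Callable, Iterator
from urllib.request import urlopen


def fetch_json(url: str) -> dict:
    """Default fetcher: a plain unauthenticated GET that returns parsed JSON."""
    with urlopen(url, timeout=30) as response:
        return json.load(response)


def iter_price_items(first_url: str, fetch: Callable[[str], dict] = fetch_json) -> Iterator[dict]:
    """Yield every item across pages by following NextPageLink until it is empty."""
    url = first_url
    while url:
        page = fetch(url)
        yield from page.get("Items", [])
        url = page.get("NextPageLink")  # missing or null on the last page
```

Because the function is a generator, a caller can stop early (for example, after the first match) without requesting the remaining pages.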

## Tips for Recommendations

1. **Filter Linux vs Windows**: `productName` contains the OS, e.g., `'Virtual Machines Dv5 Series'` (Linux) vs `'Virtual Machines Dv5 Series Windows'`
2. **Use `isPrimaryMeterRegion eq true`** to deduplicate
3. **Compare Consumption + Reservation + Savings Plan** for full cost picture
4. **Monthly estimate**: `retailPrice × 730` (hours/month)
5. **Spot pricing**: Filter `meterName` containing `'Spot'` for discounted interruptible VMs
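
Several of these tips can be applied client-side once items have been fetched. The sketch below is one possible approach, not the skill's prescribed method; the helper names are hypothetical and `cheapest_monthly` raises `ValueError` if no item passes the filter.

```python
HOURS_PER_MONTH = 730  # hours/month figure from tip 4


def is_linux_on_demand(item: dict) -> bool:
    """Keep primary-region, non-Windows, non-Spot, non-Low-Priority consumption meters."""
    meter = item.get("meterName", "")
    return (
        item.get("isPrimaryMeterRegion") is True
        and item.get("type") == "Consumption"
        and not item.get("productName", "").endswith("Windows")
        and "Spot" not in meter
        and "Low Priority" not in meter
    )


def cheapest_monthly(items: list) -> tuple:
    """Return (armSkuName, monthly estimate) for the cheapest matching meter."""
    candidates = [i for i in items if is_linux_on_demand(i)]
    best = min(candidates, key=lambda i: i["retailPrice"])
    return best["armSkuName"], round(best["retailPrice"] * HOURS_PER_MONTH, 2)
```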
vm-families.md 4.8 KB
# VM Family Guide

Select a VM family by matching the user's workload to the right category. Families describe hardware intent, not individual SKUs.

> **Source**: [Azure VM sizes overview](https://learn.microsoft.com/en-us/azure/virtual-machines/sizes/overview)
>
> **Note:** This reference may become stale. Before making final recommendations, verify critical specifications (especially Spot VM support, newer series availability, and specific family capabilities) by fetching the relevant learn.microsoft.com documentation.

## Family Selection Table

| Workload                             | Family                | Series                             | Why                                                   |
| ------------------------------------ | --------------------- | ---------------------------------- | ----------------------------------------------------- |
| Web servers, dev/test, microservices | **General Purpose**   | D-series (Dsv5, Ddsv5, Dasv5)      | Balanced CPU:memory ratio                             |
| Burstable / intermittent loads       | **General Purpose**   | B-series (Bsv2, Basv2)             | Low baseline CPU, credits for bursts; cheapest option |
| CI/CD, batch, gaming servers         | **Compute Optimized** | F-series (Fsv2, Fasv6)             | High CPU:memory ratio                                 |
| Relational DBs, in-memory caches     | **Memory Optimized**  | E-series (Esv5, Edsv5, Easv5)      | High memory:CPU ratio                                 |
| SAP HANA, very large DBs             | **Memory Optimized**  | M-series (Msv3, Mdsv3)             | Extreme memory (up to 4 TB)                           |
| Big Data, NoSQL, data warehousing    | **Storage Optimized** | L-series (Lsv3, Lasv3)             | High disk throughput and IOPS                         |
| ML training, inference, rendering    | **GPU**               | NC-series (NCadsH100v5, NCasT4v3)  | NVIDIA GPU compute                                    |
| Large-scale AI/ML training           | **GPU**               | ND-series (ND_MI300X_v5, NDH100v5) | Multi-GPU, high memory                                |
| Virtual desktop, cloud gaming        | **GPU**               | NV-series (NVadsA10v5)             | GPU graphics/visualization                            |
| Cloud gaming, VDI (AMD GPU)          | **GPU**               | NG-series (NGadsV620v1)            | AMD Radeon GPU; cost-effective graphics               |
| Confidential workloads               | **Confidential**      | DC-series (DCasv5, DCadsv5)        | Hardware-based TEE isolation                          |
| Confidential + encrypted memory      | **Confidential**      | EC-series (ECasv5, ECadsv5)        | TEE isolation with memory encryption                  |
| CFD, weather simulation, FEA         | **HPC**               | HB/HC-series (HBv4, HBv5)          | InfiniBand, high memory bandwidth                     |
| EDA, large memory HPC                | **HPC**               | HX-series                          | Very large memory capacity                            |

## Decision Tree

```text
Workload needs GPU?
├─ Yes → training/inference? → NC/ND-series
│        visualization/VDI?  → NV/NG-series
├─ No
│  ├─ Confidential computing? → DC/EC-series
│  ├─ HPC (MPI, InfiniBand)? → HB/HC/HX-series
│  ├─ High disk I/O (NoSQL, warehousing)? → L-series
│  ├─ Memory-heavy (DB, cache, SAP)?
│  │  ├─ Extreme (>1 TB RAM) → M-series
│  │  └─ Standard → E-series
│  ├─ CPU-heavy (batch, CI/CD)? → F-series
│  ├─ Burstable / dev-test? → B-series
│  └─ Balanced / general web → D-series
```
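
The decision tree reads top to bottom, so it can be expressed as a chain of early returns. The `Workload` fields below are hypothetical names chosen for this sketch; only the branch order and the series labels come from the tree.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    gpu: str = ""  # "", "compute" (training/inference), or "graphics" (visualization/VDI)
    confidential: bool = False
    hpc: bool = False
    high_disk_io: bool = False
    memory_heavy: bool = False
    ram_tb: float = 0.0
    cpu_heavy: bool = False
    burstable: bool = False


def pick_family(w: Workload) -> str:
    """Walk the decision tree top to bottom and return the first matching series."""
    if w.gpu:
        return "NC/ND-series" if w.gpu == "compute" else "NV/NG-series"
    if w.confidential:
        return "DC/EC-series"
    if w.hpc:
        return "HB/HC/HX-series"
    if w.high_disk_io:
        return "L-series"
    if w.memory_heavy:
        return "M-series" if w.ram_tb > 1 else "E-series"
    if w.cpu_heavy:
        return "F-series"
    if w.burstable:
        return "B-series"
    return "D-series"
```

Earlier branches win when several flags are set, so a confidential, memory-heavy workload resolves to DC/EC-series, as in the tree.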

## Key Trade-offs

| Choice                    | Pro                              | Con                                            |
| ------------------------- | -------------------------------- | ---------------------------------------------- |
| B-series (burstable)      | Lowest cost                      | Throttled when credits exhausted               |
| AMD (`a` suffix) vs Intel | ~5–15% cheaper                   | Some workloads assume Intel extensions         |
| ARM (`p` suffix, Cobalt)  | Best price-performance for Linux | Windows not supported; check app compatibility |
| Previous-gen (v4, v3)     | Sometimes cheaper                | Not recommended for new deployments            |
| Spot VMs                  | Up to 90% discount               | Can be evicted with 30s notice                 |

## Naming Convention

`Standard_<Family><Subfamily?><vCPUs><Features>_<Version>`

| Letter | Meaning                   |
| ------ | ------------------------- |
| `a`    | AMD CPU                   |
| `p`    | ARM (Cobalt/Ampere) CPU   |
| `d`    | Local temp disk           |
| `s`    | Premium SSD capable       |
| `l`    | Low memory per core       |
| `i`    | Isolated (dedicated host) |
| `b`    | Block storage perf        |

Example: `Standard_D4as_v5` → D-family, AMD, 4 vCPUs, premium SSD, version 5.
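
The naming convention can be decoded mechanically for simple names. The regular expression below is a deliberately simplified assumption: it lumps family and subfamily letters together and rejects names with extra underscore-separated segments or other variations, so treat it as a sketch rather than a complete parser.

```python
import re

# Feature letters copied from the table above.
FEATURES = {
    "a": "AMD CPU",
    "p": "ARM (Cobalt/Ampere) CPU",
    "d": "Local temp disk",
    "s": "Premium SSD capable",
    "l": "Low memory per core",
    "i": "Isolated (dedicated host)",
    "b": "Block storage perf",
}

# Simplified pattern: family letters, vCPU count, feature letters, optional version.
SKU_PATTERN = re.compile(
    r"^Standard_(?P<family>[A-Z]+)(?P<vcpus>\d+)(?P<features>[a-z]*)(?:_v(?P<version>\d+))?$"
)


def parse_sku(name: str) -> dict:
    """Split a simple ARM SKU name into family, vCPUs, features, and version."""
    match = SKU_PATTERN.match(name)
    if match is None:
        raise ValueError(f"Unrecognized SKU name: {name}")
    return {
        "family": match["family"],
        "vcpus": int(match["vcpus"]),
        "features": [FEATURES.get(letter, letter) for letter in match["features"]],
        "version": int(match["version"]) if match["version"] else None,
    }
```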
vmss-guide.md 6.2 KB
# VMSS Guide

Determine when to recommend a Virtual Machine Scale Set (VMSS) over a single VM, and which VMSS configuration to suggest.

> **Note:** This reference provides quick guidance but may become stale. Always verify VMSS features, limitations, and orchestration mode capabilities by fetching the latest documentation from:
> - https://learn.microsoft.com/en-us/azure/virtual-machine-scale-sets/overview
> - https://learn.microsoft.com/en-us/azure/virtual-machine-scale-sets/virtual-machine-scale-sets-autoscale-overview
> - https://learn.microsoft.com/en-us/azure/virtual-machine-scale-sets/orchestration-modes-api-comparison

## What Is a VM Scale Set?

A VMSS creates and manages a group of load-balanced, identically configured VM instances. Key capabilities:

- **Autoscale**: automatically add/remove instances based on metrics or schedules
- **High availability**: spread instances across fault domains and Availability Zones
- **Load balancing**: integrate with Azure Load Balancer (L4) or Application Gateway (L7)
- **Large scale**: up to 1,000 instances per scale set (marketplace images)
- **No extra cost**: you pay only for the underlying VM instances, storage, and networking

## When to Recommend VMSS vs Single VM

| Scenario                                              | Recommend | Reasoning                                   |
| ----------------------------------------------------- | --------- | ------------------------------------------- |
| Stateless web/API behind a load balancer              | VMSS      | Homogeneous fleet, autoscale on demand      |
| Batch or parallel compute jobs                        | VMSS      | Scale out for jobs, scale to zero when idle |
| Autoscale needed (CPU, queue depth, schedule)         | VMSS      | Built-in autoscale rules                    |
| Microservices with identical replicas                 | VMSS      | Consistent config, rolling updates          |
| High availability across zones (many instances)       | VMSS      | Automatic zone distribution                 |
| Single long-lived server (jumpbox, domain controller) | VM        | No scaling benefit; simpler config          |
| Unique per-instance configuration                     | VM        | Scale sets assume identical instances       |
| Quick proof of concept or dev/test                    | VM        | Faster to stand up, lower complexity        |

## Orchestration Modes

VMSS supports two orchestration modes. **Flexible** is recommended for all new workloads.

| Feature                  | Flexible (recommended) | Uniform (legacy) |
| ------------------------ | ---------------------- | ---------------- |
| Mix VM sizes in one set  | ✅ Yes | ❌ No |
| Add existing VMs to set  | ✅ Yes | ❌ No |
| Availability Zone spread | ✅ Automatic | ✅ Automatic |
| Fault domain control     | ✅ Yes | ✅ Yes |
| Max instances            | 1,000 | 1,000 |
| Spot instances           | ✅ Yes | ✅ Yes |
| Single-instance VMSS     | ✅ Yes | ❌ No |
| VM model updates         | Automatic, Manual, Rolling | Automatic, Manual, Rolling |

> **Warning:** Orchestration mode cannot be changed after creation. Always recommend Flexible unless the user has a specific Uniform requirement.

## Autoscale Patterns

| Pattern            | Trigger                                  | Example                                                      |
| ------------------ | ---------------------------------------- | ------------------------------------------------------------ |
| **Metric-based**   | CPU, memory, queue length, custom metric | Scale out when avg CPU > 70% for 5 min                       |
| **Schedule-based** | Time of day, day of week                 | Scale to 10 instances Mon–Fri 8 AM; scale down to 2 at night |
| **Combined**       | Metric + schedule together               | Baseline schedule with metric burst capacity                 |
| **Predictive**     | ML-forecasted demand (preview)           | Pre-scale before expected traffic spike                      |

### Autoscale Best Practices

- Set a **minimum instance count ≥ 2** for production HA
- Use a **cool-down period** (default 5 min) to avoid flapping
- Scale out aggressively, scale in conservatively (asymmetric rules)
- Monitor with [Azure Monitor autoscale diagnostics](https://learn.microsoft.com/en-us/azure/azure-monitor/autoscale/autoscale-best-practices)
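
An asymmetric rule set ("scale out aggressively, scale in conservatively") might look like the sketch below. It models only the decision logic, not the Azure autoscale engine. The 70% scale-out threshold echoes the metric-based example; the 30% scale-in threshold, the step sizes, and the maximum of 10 are illustrative assumptions.

```python
def desired_count(
    current: int,
    avg_cpu: float,
    minimum: int = 2,          # minimum of 2 for production HA
    maximum: int = 10,         # assumed ceiling for this sketch
    scale_out_above: float = 70.0,
    scale_in_below: float = 30.0,  # assumed; keep a wide gap to avoid flapping
    out_step: int = 2,         # add instances quickly
    in_step: int = 1,          # remove instances slowly
) -> int:
    """Asymmetric rule: larger steps out than in, always clamped to [minimum, maximum]."""
    if avg_cpu > scale_out_above:
        target = current + out_step
    elif avg_cpu < scale_in_below:
        target = current - in_step
    else:
        target = current
    return max(minimum, min(maximum, target))
```

A real deployment would also apply the cool-down period between evaluations, which this sketch leaves out.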

## Networking

| Component               | When to Use                                                              |
| ----------------------- | ------------------------------------------------------------------------ |
| **Azure Load Balancer** | Layer-4 (TCP/UDP) traffic distribution; most common for backend services |
| **Application Gateway** | Layer-7 (HTTP/HTTPS) with TLS termination, URL routing, WAF              |
| **No load balancer**    | Batch/HPC jobs where instances pull work from a queue                    |

## Cost Estimation Tips

- VMSS itself is **free**; cost is the sum of per-instance VM pricing
- Estimate at **min** and **max** instance counts for autoscale budgets
- Use **Spot instances** in VMSS for up to 90% savings on interruptible workloads
- Combine with **Reservations** or **Savings Plans** on the baseline instance count
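
Estimating at the minimum and maximum instance counts is simple arithmetic. The sketch below assumes a pay-as-you-go hourly price and 730 hours per month; the function name is hypothetical and the price in the usage line is a placeholder, not a real quote.

```python
def vmss_monthly_range(
    hourly_price: float,
    min_instances: int,
    max_instances: int,
    hours_per_month: int = 730,
) -> tuple:
    """Return (low, high) monthly cost for an autoscaling scale set."""
    low = round(hourly_price * hours_per_month * min_instances, 2)
    high = round(hourly_price * hours_per_month * max_instances, 2)
    return low, high


# Placeholder price of 0.25 per hour, autoscaling between 2 and 10 instances.
budget = vmss_monthly_range(0.25, min_instances=2, max_instances=10)
```

The low figure is the always-on baseline, which is the portion that Reservations or Savings Plans would typically cover.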

## Key VMSS Limits

| Limit                                  | Value                              |
| -------------------------------------- | ---------------------------------- |
| Max instances per scale set            | 1,000 (marketplace/gallery images) |
| Max instances (managed image)          | 600                                |
| Scale sets per subscription per region | 2,500                              |
| Scale operations concurrency           | Up to 1,000 VMs in a single batch  |

## Further Reading

- [VMSS orchestration modes](https://learn.microsoft.com/en-us/azure/virtual-machine-scale-sets/virtual-machine-scale-sets-orchestration-modes)
- [Autoscale best practices](https://learn.microsoft.com/en-us/azure/azure-monitor/autoscale/autoscale-best-practices)
- [VMSS networking](https://learn.microsoft.com/en-us/azure/virtual-machine-scale-sets/virtual-machine-scale-sets-networking)
- [VMSS Flexible portal quickstart](https://learn.microsoft.com/en-us/azure/virtual-machine-scale-sets/flexible-virtual-machine-scale-sets-portal)
workflows/vm-recommender/
vm-recommender.md 9.3 KB
# Azure VM Recommender

Recommend Azure VM sizes, VM Scale Sets (VMSS), and configurations by analyzing workload type, performance requirements, scaling needs, and budget. No Azure subscription required: all data comes from public Microsoft documentation and the unauthenticated Retail Prices API.

## When to Use This Skill

- User asks which Azure VM or VMSS to choose for a workload
- User needs VM size recommendations for web, database, ML, batch, HPC, or other workloads
- User wants to compare VM families, sizes, or pricing tiers
- User asks about trade-offs between VM options (cost vs performance)
- User needs a cost estimate for Azure VMs without an Azure account
- User asks whether to use a single VM or a scale set
- User needs autoscaling, high availability, or load-balanced VM recommendations
- User asks about VMSS orchestration modes (Flexible vs Uniform)

## Workflow

> Use the reference files for initial filtering.

> **CRITICAL:** Always verify against live documentation from learn.microsoft.com before making final recommendations. If `web_fetch` fails, fall back to the reference files, but warn the user that the information may be stale.

### Step 1: Gather Requirements

Ask the user for (infer when possible):

| Requirement            | Examples                                                           |
| ---------------------- | ------------------------------------------------------------------ |
| **Workload type**      | Web server, relational DB, ML training, batch processing, dev/test |
| **vCPU / RAM needs**   | "4 cores, 16 GB RAM" or "lightweight" / "heavy"                    |
| **GPU needed?**        | Yes → GPU families; No → general/compute/memory                    |
| **Storage needs**      | High IOPS, large temp disk, premium SSD                            |
| **Budget priority**    | Cost-sensitive, performance-first, balanced                        |
| **OS**                 | Linux or Windows (affects pricing)                                 |
| **Region**             | Affects availability and price                                     |
| **Instance count**     | Single instance, fixed count, or variable/dynamic                  |
| **Scaling needs**      | None, manual scaling, autoscale based on metrics or schedule       |
| **Availability needs** | Best-effort, fault-domain isolation, cross-zone HA                 |
| **Load balancing**     | Not needed, Azure Load Balancer (L4), Application Gateway (L7)     |

### Step 2: Determine VM vs VMSS

**Workflow:**

1. Review [VMSS Guide](../../references/vmss-guide.md) to understand when VMSS vs single VM is appropriate
2. Use the gathered requirements to decide which approach fits best
3. **REQUIRED: If recommending VMSS**, fetch current documentation to verify capabilities:
   ```bash
   web_fetch https://learn.microsoft.com/en-us/azure/virtual-machine-scale-sets/overview
   web_fetch https://learn.microsoft.com/en-us/azure/virtual-machine-scale-sets/virtual-machine-scale-sets-autoscale-overview
   ```
4. **If `web_fetch` fails**, proceed with reference file guidance but include this warning:
   > Unable to verify against latest Azure documentation. Recommendation based on reference material that may not reflect recent updates.

```text
Needs autoscaling?
├─ Yes → VMSS
├─ No
│  ├─ Multiple identical instances needed?
│  │  ├─ Yes → VMSS
│  │  └─ No
│  │     ├─ High availability across fault domains / zones?
│  │     │  ├─ Yes, many instances → VMSS
│  │     │  └─ Yes, 1-2 instances → VM + Availability Zone
│  │     └─ Single instance sufficient? → VM

| Signal                                        | Recommendation                | Why                                                                   |
| --------------------------------------------- | ----------------------------- | --------------------------------------------------------------------- |
| Autoscale on CPU, memory, or schedule         | **VMSS**                      | Built-in autoscale; no custom automation needed                       |
| Stateless web/API tier behind a load balancer | **VMSS**                      | Homogeneous fleet with automatic distribution                         |
| Batch / parallel processing across many nodes | **VMSS**                      | Scale out on demand, scale to zero when idle                          |
| Mixed VM sizes in one group                   | **VMSS (Flexible)**           | Flexible orchestration supports mixed SKUs                            |
| Single long-lived server (jumpbox, AD DC)     | **VM**                        | No scaling benefit; simpler management                                |
| Unique per-instance config required           | **VM**                        | Scale sets assume homogeneous configuration                           |
| Stateful workload, tightly-coupled cluster    | **VM** (or VMSS case-by-case) | Evaluate carefully; VMSS Flexible can work for some stateful patterns |

> **Warning:** If the user is unsure, default to **single VM** for simplicity. Recommend VMSS only when scaling, HA, or fleet management is clearly needed.
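
The VM-versus-VMSS decision, including the single-VM default, can be sketched as a function. Parameter names are hypothetical, `None` stands for "the user did not say", and treating "many instances" as more than two is an assumption drawn from the "1-2 instances" branch.

```python
def choose_hosting_model(
    needs_autoscale=None,
    identical_instances=None,
    needs_ha=None,
    instance_count: int = 1,
) -> str:
    """Unanswered questions (None) fall through to a single VM, the stated default."""
    if needs_autoscale:
        return "VMSS"
    if identical_instances:
        return "VMSS"
    if needs_ha:
        # "Many instances" is interpreted here as more than two.
        return "VMSS" if instance_count > 2 else "VM + Availability Zone"
    return "VM"
```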

### Step 3: Select VM Family

**Workflow:**

1. Review [VM Family Guide](../../references/vm-families.md) to identify 2-3 candidate VM families that match the workload requirements
2. **REQUIRED: verify specifications** for your chosen candidates by fetching current documentation:
   ```bash
   web_fetch https://learn.microsoft.com/en-us/azure/virtual-machines/sizes/<family-category>/<series-name>
   ```
   
   Examples:
   - B-series: `https://learn.microsoft.com/en-us/azure/virtual-machines/sizes/general-purpose/b-family`
   - D-series: `https://learn.microsoft.com/en-us/azure/virtual-machines/sizes/general-purpose/ddsv5-series`
   - GPU: `https://learn.microsoft.com/en-us/azure/virtual-machines/sizes/gpu-accelerated/nc-family`

3. **If considering Spot VMs**, also fetch:
   ```bash
   web_fetch https://learn.microsoft.com/en-us/azure/virtual-machine-scale-sets/use-spot
   ```

4. **If `web_fetch` fails**, proceed with reference file guidance but include this warning:
   > Unable to verify against latest Azure documentation. Recommendation based on reference material that may not reflect recent updates or limitations (e.g., Spot VM compatibility).

This step applies to both single VMs and VMSS since scale sets use the same VM SKUs.
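
The verify-then-fall-back pattern used in Steps 2 and 3 can be sketched as follows. `web_fetch` is an agent tool rather than a Python function, so the `fetch` parameter here is a stand-in for it, and `load_guidance` is a hypothetical name.

```python
from typing import Callable

STALE_WARNING = (
    "Unable to verify against latest Azure documentation. Recommendation based on "
    "reference material that may not reflect recent updates."
)


def load_guidance(url: str, fetch: Callable[[str], str], reference_text: str) -> tuple:
    """Return (text, warning); warning is None when the live fetch succeeded."""
    try:
        return fetch(url), None
    except Exception:  # any fetch failure triggers the fallback
        return reference_text, STALE_WARNING
```

Returning the warning alongside the text keeps the caller responsible for surfacing it to the user, which is the part the workflow marks as required.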

### Step 4: Look Up Pricing

Query the Azure Retail Prices API; see the [Retail Prices API Guide](../../references/retail-prices-api.md).

> **Tip:** VMSS has no extra charge; pricing is per VM instance. Use the same VM pricing from the API and multiply by the expected instance count to estimate VMSS cost. For autoscaling workloads, estimate cost at both the minimum and maximum instance count.

### Step 5: Present Recommendations

Provide **2–3 options** with trade-offs:

| Column         | Purpose                                         |
| -------------- | ----------------------------------------------- |
| Hosting Model  | VM or VMSS (with orchestration mode if VMSS)    |
| VM Size        | ARM SKU name (e.g., `Standard_D4s_v5`)          |
| vCPUs / RAM    | Core specs                                      |
| Instance Count | 1 for VM; min–max range for VMSS with autoscale |
| Estimated $/hr | Per-instance pay-as-you-go from API             |
| Why            | Fit for the workload                            |
| Trade-off      | What the user gives up                          |

> **Tip:** Always explain *why* a family fits and what the user trades off (cost vs cores, burstable vs dedicated, single VM simplicity vs VMSS scalability, etc.).

For VMSS recommendations, also mention:
- Recommended orchestration mode (Flexible for most new workloads)
- Autoscale strategy (metric-based, schedule-based, or both)
- Load balancer type (Azure Load Balancer for L4, Application Gateway for L7/TLS)
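
One way to keep the recommendation columns consistent is to render them from a small record type. The `Option` class and `render_table` function are hypothetical, and the price in any sample row should come from the Retail Prices API rather than being hard-coded.

```python
from dataclasses import dataclass


@dataclass
class Option:
    hosting_model: str
    vm_size: str
    specs: str
    instance_count: str
    price_per_hour: float
    why: str
    trade_off: str


HEADERS = ["Hosting Model", "VM Size", "vCPUs / RAM", "Instance Count",
           "Estimated $/hr", "Why", "Trade-off"]


def render_table(options: list) -> str:
    """Render options as a Markdown table using the columns listed above."""
    rows = ["| " + " | ".join(HEADERS) + " |", "|" + " --- |" * len(HEADERS)]
    for o in options:
        cells = [o.hosting_model, f"`{o.vm_size}`", o.specs, o.instance_count,
                 f"{o.price_per_hour:.4f}", o.why, o.trade_off]
        rows.append("| " + " | ".join(cells) + " |")
    return "\n".join(rows)
```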

### Step 6: Offer Next Steps

- Compare reservation / savings plan pricing (query API with `priceType eq 'Reservation'`)
- Suggest [Azure Pricing Calculator](https://azure.microsoft.com/pricing/calculator/) for full estimates
- For VMSS: suggest reviewing [autoscale best practices](https://learn.microsoft.com/en-us/azure/azure-monitor/autoscale/autoscale-best-practices) and [VMSS networking](https://learn.microsoft.com/en-us/azure/virtual-machine-scale-sets/virtual-machine-scale-sets-networking)

## Error Handling

| Scenario                        | Action                                                                         |
| ------------------------------- | ------------------------------------------------------------------------------ |
| API returns empty results       | Broaden filters: check `armRegionName`, `serviceName`, `armSkuName` spelling   |
| User unsure of workload type    | Ask clarifying questions; default to General Purpose D-series                  |
| Region not specified            | Use `eastus` as default; note prices vary by region                            |
| Unclear if VM or VMSS needed    | Ask about scaling and instance count; default to single VM if unsure           |
| User asks VMSS pricing directly | Use same VM pricing API: VMSS has no extra charge; multiply by instance count  |

## References

- [VM Family Guide](../../references/vm-families.md): Family-to-workload mapping and selection
- [Retail Prices API Guide](../../references/retail-prices-api.md): Query patterns, filters, and examples
- [VMSS Guide](../../references/vmss-guide.md): When to use VMSS, orchestration modes, and autoscale patterns
workflows/vm-troubleshooter/references/
cannot-connect-to-vm.md 28.5 KB
# Cannot Connect to VM

Use this reference file when the user is facing connectivity issues with their Azure VM, such as:

- Unable to RDP/SSH into the VM
- Network connectivity problems (NSG rules, firewall, routing)
- VM agent not responding
- Credential or authentication errors
- Black screen or RDP disconnections
- RDP service or configuration issues

## Workflow

1. Identify the specific connectivity issue from the symptom categories below
2. Narrow down to a specific solution item
3. Fetch the relevant troubleshooting URL for the latest guidance
4. Summarize the key steps to diagnose and resolve, referencing the official documentation

---

## Unable to RDP into the VM

User is trying to RDP into a Windows VM but the connection fails (timeout, refused, or error dialog).

### Symptoms → Solutions

| Symptom                                                                     | Solution                                                   | Documentation                                                                                                                                                                            |
| --------------------------------------------------------------------------- | ---------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Connection times out, no response at all                                    | NSG missing allow rule for port 3389                       | [NSG blocking RDP](https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/windows/troubleshoot-rdp-nsg-problem)                                                           |
| Connection times out, NSG rules look correct                                | Guest OS firewall is blocking inbound RDP                  | [Guest OS firewall blocking](https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/windows/guest-os-firewall-blocking-inbound-traffic)                                   |
| "Your credentials did not work"                                             | Wrong password or username format                          | [Credentials error](https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/windows/troubleshoot-specific-rdp-errors#windows-security-error-your-credentials-did-not-work) |
| "An internal error has occurred"                                            | RDP service, TLS certificate, or security layer issue      | [RDP internal error](https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/windows/troubleshoot-rdp-internal-error)                                                      |
| Black screen after login                                                    | Explorer.exe crash, GPU driver, or GPO stuck               | [Black screen troubleshooting](https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/windows/troubleshoot-rdp-black-screen)                                              |
| "No Remote Desktop License Servers available"                               | RDS licensing grace period expired                         | [Specific RDP errors](https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/windows/troubleshoot-specific-rdp-errors#rdplicense)                                         |
| "Remote Desktop can't find the computer"                                    | VM has no public IP, DNS issue, or VM is deallocated       | [Specific RDP errors](https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/windows/troubleshoot-specific-rdp-errors#rdpname)                                            |
| "An authentication error has occurred / LSA"                                | NLA/CredSSP mismatch, clock skew, or wrong username format | [Specific RDP errors](https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/windows/troubleshoot-specific-rdp-errors#rdpauth)                                            |
| "Remote Desktop can't connect to the remote computer"                       | Generic - multiple possible causes                         | [Specific RDP errors](https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/windows/troubleshoot-specific-rdp-errors#rdpconnect)                                         |
| "Because of a security error"                                               | TLS certificate or version mismatch                        | [RDP general error](https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/windows/troubleshoot-rdp-general-error)                                                        |
| RDP connects then disconnects immediately                                   | Session limits, idle timeout, or resource exhaustion       | [RDP disconnections](https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/windows/troubleshoot-rdp-connection)                                                          |
| Works from some IPs but not others                                          | NSG source IP restriction too narrow                       | [NSG blocking RDP](https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/windows/troubleshoot-rdp-nsg-problem)                                                           |
| Event log shows specific RDP error Event IDs                                | Match Event ID to known cause (e.g., 1058, 36870)          | [RDP issues by Event ID](https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/windows/event-id-troubleshoot-vm-rdp-connecton)                                           |
| "Authentication error has occurred" / "function requested is not supported" | CredSSP, NLA, or certificate issue                         | [Authentication errors on RDP](https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/windows/cannot-connect-rdp-azure-vm)                                                |
| Guest NIC is disabled inside the VM                                         | Enable NIC via Run Command or Serial Console               | [Troubleshoot RDP - NIC disabled](https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/windows/troubleshoot-rdp-nic-disabled)                                           |

### Quick Commands: RDP

```bash
# Check VM power state
az vm get-instance-view --name <vm-name> -g <resource-group> \
  --query "instanceView.statuses[1].displayStatus" -o tsv

# Check NSG rules
az network nsg rule list --nsg-name <nsg-name> -g <resource-group> -o table

# Reset RDP configuration to defaults (re-enables RDP, resets port, restarts TermService)
az vm user reset-remote-desktop --name <vm-name> -g <resource-group>

# Reset VM password
az vm user update --name <vm-name> -g <resource-group> -u <username> -p '<new-password>'

# IP Flow Verify: test whether the NSGs allow the traffic
az network watcher test-ip-flow --direction Inbound --protocol TCP \
  --local <vm-private-ip>:3389 --remote <your-public-ip>:* \
  --vm <vm-name> -g <resource-group>
```
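The first two checks above can be combined into a small pre-check, sketched here under stated assumptions. The function names `vm_power_state` and `rdp_precheck` are hypothetical helpers, not Azure CLI commands, and the query selects the power state by its `PowerState/` code prefix rather than by list position, which is an alternative to the positional query shown above.

```bash
#!/usr/bin/env bash
# Sketch only: vm_power_state and rdp_precheck are hypothetical helper names.

# Print the display status of the PowerState/* entry (for example "VM running").
vm_power_state() {
  local vm="$1" rg="$2"
  az vm get-instance-view --name "$vm" -g "$rg" \
    --query "instanceView.statuses[?starts_with(code, 'PowerState/')].displayStatus | [0]" \
    -o tsv
}

# Stop early if the VM is not running; NSG checks are pointless on a stopped VM.
rdp_precheck() {
  local vm="$1" rg="$2" state
  state="$(vm_power_state "$vm" "$rg")"
  if [ "$state" != "VM running" ]; then
    echo "VM is '$state'; start it before checking NSG rules"
    return 1
  fi
  echo "VM is running; continue with NSG and IP Flow Verify checks"
}
```

Usage: `rdp_precheck <vm-name> <resource-group>`. A non-zero exit code means the VM should be started before any network diagnosis.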

### General RDP Troubleshooting

If the symptom doesn't match a specific row above, follow Microsoft's systematic approach:

- [Troubleshoot RDP connections to an Azure VM](https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/windows/troubleshoot-rdp-connection)
- [Detailed RDP troubleshooting steps](https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/windows/detailed-troubleshoot-rdp)

---

## Unable to SSH into the VM

User is trying to SSH into a Linux VM but the connection fails.

### Symptoms → Solutions

| Symptom                                           | Solution                                                                                   | Documentation                                                                                                                                    |
| ------------------------------------------------- | ------------------------------------------------------------------------------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------ |
| "Connection refused" on port 22                   | SSH service not running or listening on a different port                                   | [Troubleshoot SSH connection](https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/linux/troubleshoot-ssh-connection)           |
| "Connection timed out"                            | NSG blocking port 22, VM not running, or no public IP                                      | [Troubleshoot SSH connection](https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/linux/troubleshoot-ssh-connection)           |
| "Permission denied (publickey)"                   | Wrong SSH key, wrong user, or key not in authorized_keys                                   | [Detailed SSH troubleshooting](https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/linux/detailed-troubleshoot-ssh-connection) |
| "Permission denied (password)"                    | Wrong password or password auth disabled in sshd_config                                    | [Detailed SSH troubleshooting](https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/linux/detailed-troubleshoot-ssh-connection) |
| "Host key verification failed"                    | VM was redeployed and got a new host key                                                   | Remove old entry from `~/.ssh/known_hosts`                                                                                                       |
| "Server unexpectedly closed connection"           | Disk full, SSH config error, or PAM issue                                                  | [Detailed SSH troubleshooting](https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/linux/detailed-troubleshoot-ssh-connection) |
| SSH hangs with no response                        | Firewall (iptables/firewalld), routing, or NIC issue                                       | [Troubleshoot SSH connection](https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/linux/troubleshoot-ssh-connection)           |
| Cannot SSH into Debian Linux VM                   | Debian-specific network or sshd config issue                                               | [Cannot connect to Debian Linux VM](https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/linux/cannot-connect-debian-linux)     |
| SSH blocked after SELinux policy change           | SELinux misconfigured - blocking sshd                                                      | [SELinux troubleshooting](https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/linux/linux-selinux-troubleshooting)             |
| "Permission denied" with Entra ID (AAD) SSH login | Missing role assignment: Virtual Machine Administrator Login or Virtual Machine User Login | [Troubleshoot SSH connection](https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/linux/troubleshoot-ssh-connection)           |
| Linux VM not booting - UEFI boot failure          | Gen2 VM UEFI boot issue preventing SSH                                                     | [Linux VM UEFI boot failures](https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/linux/azure-linux-vm-uefi-boot-failures)     |

### Quick Commands: SSH

```bash
# Reset SSH configuration to defaults (resets sshd_config, restarts sshd)
az vm user reset-ssh --name <vm-name> -g <resource-group>

# Reset SSH public key for a user
az vm user update --name <vm-name> -g <resource-group> \
  -u <username> --ssh-key-value "<ssh-public-key>"

# Reset password for Linux VM
az vm user update --name <vm-name> -g <resource-group> \
  -u <username> -p '<new-password>'

# Check if sshd is running via Run Command
az vm run-command invoke --name <vm-name> -g <resource-group> \
  --command-id RunShellScript --scripts "systemctl status sshd"

# Check SELinux status via Run Command
az vm run-command invoke --name <vm-name> -g <resource-group> \
  --command-id RunShellScript --scripts "getenforce"

# Set SELinux to permissive mode (temporary; reverts at the next reboot)
az vm run-command invoke --name <vm-name> -g <resource-group> \
  --command-id RunShellScript --scripts "setenforce 0"
```
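For the "Host key verification failed" row, the stale entry is removed on the client machine, not on the VM. A minimal sketch, assuming OpenSSH's `ssh-keygen` is installed on the client; `forget_host_key` is a hypothetical helper name:

```bash
#!/usr/bin/env bash
# Sketch only: forget_host_key is a hypothetical helper name.
# Removes every key recorded for the host so the next SSH connection
# can record the redeployed VM's new host key.
forget_host_key() {
  local host="$1" known_hosts="${2:-$HOME/.ssh/known_hosts}"
  # ssh-keygen -R rewrites the file and keeps the previous contents as <file>.old
  ssh-keygen -R "$host" -f "$known_hosts"
}
```

Usage: `forget_host_key <vm-public-ip-or-dns-name>`. Verify the new host key fingerprint through a trusted channel before accepting it on the next connection.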

### General SSH Troubleshooting

If the symptom doesn't match a specific row above, follow Microsoft's systematic approach:

- [Troubleshoot SSH connections to an Azure Linux VM](https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/linux/troubleshoot-ssh-connection)
- [Detailed SSH troubleshooting steps](https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/linux/detailed-troubleshoot-ssh-connection)

---

## Network Connectivity Problems

User's VM is running but unreachable due to network-level issues (NSG, routing, NIC, DNS).

### Symptoms → Solutions

| Symptom                                   | Solution                                                | Documentation                                                                                                                                  |
| ----------------------------------------- | ------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------- |
| NSG has no allow rule for RDP/SSH port    | Add inbound allow rule for TCP 3389 or 22               | [NSG blocking RDP](https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/windows/troubleshoot-rdp-nsg-problem)                 |
| NSG at both NIC and subnet level blocking | Traffic must pass both NSGs - check effective rules     | [Diagnose VM traffic filtering](https://learn.microsoft.com/en-us/azure/network-watcher/diagnose-vm-network-traffic-filtering-problem)         |
| Custom route (UDR) sending traffic to NVA | Check effective routes, verify NVA is forwarding        | [Diagnose VM routing](https://learn.microsoft.com/en-us/azure/network-watcher/diagnose-vm-network-routing-problem)                             |
| VM has no public IP                       | Add a public IP or connect via Azure Bastion            | [Public IP addresses](https://learn.microsoft.com/en-us/azure/virtual-network/ip-services/public-ip-addresses)                                 |
| NIC is disabled inside guest OS (Windows) | Enable NIC via Run Command or Serial Console            | [Troubleshoot RDP - NIC disabled](https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/windows/troubleshoot-rdp-nic-disabled) |
| Static IP misconfiguration inside guest   | Azure VMs should use DHCP; reset NIC to restore         | [Reset network interface](https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/windows/reset-network-interface)               |
| Ghost NIC after disk swap or resize       | Old NIC holds IP config, new NIC can't get IP           | [Reset network interface](https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/windows/reset-network-interface)               |
| DNS resolution failure                    | Check DNS server config; Azure default is 168.63.129.16 | [DHCP troubleshooting](https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/windows/troubleshoot-dhcp-failed-to-configure)    |

### Quick Commands: Network

```bash
# Reset NIC (restores DHCP, removes stale config)
az vm repair reset-nic --name <vm-name> -g <resource-group> --yes

# Check effective NSG rules on a NIC
az network nic list-effective-nsg --name <nic-name> -g <resource-group>

# Check effective routes
az network nic show-effective-route-table --name <nic-name> -g <resource-group> -o table

# Check if VM has a public IP
az vm list-ip-addresses --name <vm-name> -g <resource-group> -o table

# Test connectivity from VM to a destination
az network watcher test-connectivity --source-resource <vm-resource-id> \
  --dest-address <destination-ip> --dest-port 3389 -g <resource-group>
```
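Because traffic has to pass both the subnet-level and NIC-level NSGs, IP Flow Verify is a convenient single check of the combined result. The sketch below picks the management port from the OS type (TCP 3389 for RDP, TCP 22 for SSH, as in the table above) and reuses the `test-ip-flow` invocation from the RDP quick commands. `check_inbound_mgmt_port` is a hypothetical helper name, and the `windows`/`linux` argument values are an assumption of this sketch.

```bash
#!/usr/bin/env bash
# Sketch only: check_inbound_mgmt_port is a hypothetical helper name.
# Arguments: <vm> <resource-group> <windows|linux> <vm-private-ip> <client-public-ip>
check_inbound_mgmt_port() {
  local vm="$1" rg="$2" os="$3" private_ip="$4" client_ip="$5" port
  case "$os" in
    windows) port=3389 ;;  # RDP
    linux)   port=22 ;;    # SSH
    *) echo "unknown OS type: $os" >&2; return 2 ;;
  esac
  # The remote port is quoted so the shell does not expand the wildcard.
  az network watcher test-ip-flow --direction Inbound --protocol TCP \
    --local "${private_ip}:${port}" --remote "${client_ip}:*" \
    --vm "$vm" -g "$rg"
}
```

Usage: `check_inbound_mgmt_port <vm-name> <resource-group> linux <vm-private-ip> <your-public-ip>`.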

---

## Firewall Blocking Connectivity

Guest OS firewall (Windows Firewall or Linux iptables/firewalld) is blocking inbound connections even though NSG allows them.

### Symptoms → Solutions

| Symptom                                           | Solution                                             | Documentation                                                                                                                                          |
| ------------------------------------------------- | ---------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------ |
| Windows Firewall blocking RDP                     | Re-enable "Remote Desktop" firewall rule group       | [Guest OS firewall blocking](https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/windows/guest-os-firewall-blocking-inbound-traffic) |
| Firewall policy set to BlockInboundAlways         | Reset to `blockinbound,allowoutbound` policy         | [Enable/disable firewall rule](https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/windows/enable-disable-firewall-rule-guest-os)    |
| Third-party AV/firewall blocking                  | Stop the third-party service, test, then reconfigure | [Guest OS firewall blocking](https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/windows/guest-os-firewall-blocking-inbound-traffic) |
| Linux iptables/firewalld blocking SSH             | Add allow rule for port 22                           | [Troubleshoot SSH connection](https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/linux/troubleshoot-ssh-connection)                 |
| Cannot access firewall settings (no connectivity) | Use offline repair VM to modify registry             | [Disable guest OS firewall offline](https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/windows/disable-guest-os-firewall-windows)   |

### Quick Commands: Firewall

```bash
# Reset RDP config (re-enables RDP, creates firewall rule for 3389)
az vm user reset-remote-desktop --name <vm-name> -g <resource-group>

# Query Windows Firewall rules via Run Command
az vm run-command invoke --name <vm-name> -g <resource-group> \
  --command-id RunPowerShellScript \
  --scripts "netsh advfirewall firewall show rule name='Remote Desktop - User Mode (TCP-In)'"

# Enable Remote Desktop firewall rule via Run Command
az vm run-command invoke --name <vm-name> -g <resource-group> \
  --command-id RunPowerShellScript \
  --scripts "netsh advfirewall firewall set rule group='Remote Desktop' new enable=yes"
```
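The commands above cover Windows only. For the "Linux iptables/firewalld blocking SSH" row, one possible equivalent is sketched below. It assumes the guest uses firewalld, that sshd listens on the default port covered by firewalld's `ssh` service, and that the VM agent is healthy enough for Run Command to work. `allow_ssh_firewalld` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Sketch only: allow_ssh_firewalld is a hypothetical helper name.
# Adds a permanent firewalld rule for the ssh service and reloads the firewall.
allow_ssh_firewalld() {
  local vm="$1" rg="$2"
  az vm run-command invoke --name "$vm" -g "$rg" \
    --command-id RunShellScript \
    --scripts "firewall-cmd --permanent --add-service=ssh && firewall-cmd --reload"
}
```

Usage: `allow_ssh_firewalld <vm-name> <resource-group>`. On distributions that use plain iptables or another firewall front end, the script line would need to change.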

---

## VM Agent Not Responding

Run Command and password reset depend on the VM agent. If the agent is unhealthy, alternative methods are needed.

### Symptoms → Solutions

| Symptom                                                          | Solution                                                 | Documentation                                                                                                                                                       |
| ---------------------------------------------------------------- | -------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Run Command times out                                            | VM agent may be down - use Serial Console instead        | [Serial Console overview](https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/windows/serial-console-overview)                                    |
| Password reset fails via Portal/CLI                              | VMAccess extension can't communicate - use offline reset | [Reset password without agent](https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/windows/reset-local-password-without-agent)                    |
| VM not booting (Boot Diagnostics shows BSOD/stuck)               | OS-level issue - use repair VM for offline fix           | [Repair VM commands](https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/windows/repair-windows-vm-using-azure-virtual-machine-repair-commands)   |
| VMAccess extension error on domain controller                    | VMAccess doesn't support DCs - use Serial Console        | [Serial Console overview](https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/windows/serial-console-overview)                                    |
| VM agent not responding on Linux VM                              | Use Serial Console for Linux to access the VM            | [Serial Console for Linux](https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/linux/serial-console-linux-overview)                               |
| Linux VM not booting (Boot Diagnostics shows kernel panic/stuck) | Use repair VM for offline Linux disk fix                 | [Repair Linux VM commands](https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/linux/repair-linux-vm-using-azure-virtual-machine-repair-commands) |

### Quick Commands: Diagnostic Tools

```bash
# Connect to Serial Console via CLI
az serial-console connect --name <vm-name> -g <resource-group>

# Enable boot diagnostics (required for Serial Console)
az vm boot-diagnostics enable --name <vm-name> -g <resource-group>

# Get the boot diagnostics serial log
az vm boot-diagnostics get-boot-log --name <vm-name> -g <resource-group>

# Create repair VM for offline fixes
az vm repair create --name <vm-name> -g <resource-group> \
  --repair-username repairadmin --repair-password '<password>'

# Restore after offline fix
az vm repair restore --name <vm-name> -g <resource-group>
```
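Before choosing between agent-based tools (Run Command, password reset) and the alternatives above, it can help to read the agent status from the instance view. The sketch below assumes a healthy agent reports the display status `Ready`; `vm_agent_ready` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Sketch only: vm_agent_ready is a hypothetical helper name.
# Assumption: a healthy VM agent reports displayStatus "Ready".
vm_agent_ready() {
  local vm="$1" rg="$2" status
  status="$(az vm get-instance-view --name "$vm" -g "$rg" \
    --query "instanceView.vmAgent.statuses[0].displayStatus" -o tsv)"
  if [ "$status" = "Ready" ]; then
    echo "agent ready: Run Command and VMAccess should work"
  else
    echo "agent status '${status:-unknown}': prefer Serial Console or a repair VM"
    return 1
  fi
}
```

Usage: `vm_agent_ready <vm-name> <resource-group>`. A non-zero exit code suggests going straight to Serial Console or the repair VM workflow.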

---

## Credential and Authentication Errors

User can reach the VM but authentication fails.

### Symptoms → Solutions

| Symptom                                                    | Solution                                                                      | Documentation                                                                                                                                 |
| ---------------------------------------------------------- | ----------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------- |
| "Your credentials did not work"                            | Reset password via Portal or CLI                                              | [Reset RDP service or password](https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/windows/reset-rdp)                      |
| "Must change password before logging on"                   | Reset password via Portal (bypasses the requirement)                          | [Reset RDP service or password](https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/windows/reset-rdp)                      |
| "This user account has expired"                            | Extend account via Run Command: `net user <user> /expires:never`              | [Reset RDP service or password](https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/windows/reset-rdp)                      |
| "Trust relationship between workstation and domain failed" | Reset machine account or rejoin domain                                        | [Troubleshoot RDP connection](https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/windows/troubleshoot-rdp-connection)      |
| "Access is denied" / "Connection was denied"               | Add user to Remote Desktop Users group                                        | [Specific RDP errors](https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/windows/troubleshoot-specific-rdp-errors#wincred) |
| Wrong username format                                      | Use `VMNAME\user` for local, `DOMAIN\user` for domain accounts                | [Specific RDP errors](https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/windows/troubleshoot-specific-rdp-errors#wincred) |
| CredSSP "encryption oracle" error                          | Temporary: set AllowEncryptionOracle=2 on client; permanent: patch both sides | [CredSSP remediation](https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/windows/credssp-encryption-oracle-remediation)    |

### Quick Commands: Credentials

```bash
# Reset password
az vm user update --name <vm-name> -g <resource-group> -u <username> -p '<new-password>'

# Reset RDP configuration (also re-enables NLA)
az vm user reset-remote-desktop --name <vm-name> -g <resource-group>
```
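The "This user account has expired" row runs `net user <user> /expires:never` through Run Command. A minimal sketch of that call, assuming a Windows VM with a healthy VM agent; `clear_account_expiry` is a hypothetical helper name:

```bash
#!/usr/bin/env bash
# Sketch only: clear_account_expiry is a hypothetical helper name.
# Removes the expiry date from a local Windows account via Run Command.
clear_account_expiry() {
  local vm="$1" rg="$2" user="$3"
  az vm run-command invoke --name "$vm" -g "$rg" \
    --command-id RunPowerShellScript \
    --scripts "net user $user /expires:never"
}
```

Usage: `clear_account_expiry <vm-name> <resource-group> <username>`.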

---

## RDP Service and Configuration Issues

VM is reachable but the RDP service itself is broken or misconfigured.

### Symptoms → Solutions

| Symptom                                | Solution                                                               | Documentation                                                                                                                                    |
| -------------------------------------- | ---------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------ |
| TermService not running                | Start the service and set to Automatic                                 | [Reset RDP service](https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/windows/reset-rdp)                                     |
| RDP port changed from 3389             | Reset port or update NSG to allow the custom port                      | [Detailed RDP troubleshooting](https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/windows/detailed-troubleshoot-rdp)          |
| RDP disabled (fDenyTSConnections = 1)  | Reset RDP config via CLI or Portal                                     | [Reset RDP service](https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/windows/reset-rdp)                                     |
| TLS/SSL certificate expired or corrupt | Delete cert and restart TermService to regenerate                      | [RDP internal error](https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/windows/troubleshoot-rdp-internal-error)              |
| NLA/Security Layer mismatch            | Temporarily disable NLA for recovery                                   | [RDP general error](https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/windows/troubleshoot-rdp-general-error)                |
| GPO overriding local RDP settings      | Check `HKLM:\SOFTWARE\Policies\Microsoft\Windows NT\Terminal Services` | [Detailed RDP troubleshooting](https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/windows/detailed-troubleshoot-rdp)          |
| RDS licensing expired                  | Remove RDSH role or configure license server                           | [Specific RDP errors](https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/windows/troubleshoot-specific-rdp-errors#rdplicense) |

### Quick Commands: RDP Service

```bash
# Reset all RDP configuration to defaults
az vm user reset-remote-desktop --name <vm-name> -g <resource-group>

# Check TermService status via Run Command
az vm run-command invoke --name <vm-name> -g <resource-group> \
  --command-id RunPowerShellScript --scripts "Get-Service TermService | Select-Object Status, StartType"

# Restart VM (if RDP service is unrecoverable)
az vm restart --name <vm-name> -g <resource-group>

# Redeploy VM (moves to a new host; last resort)
az vm redeploy --name <vm-name> -g <resource-group>
```
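To confirm the "RDP disabled" and "RDP port changed" rows before resetting anything, the relevant registry values can be read through Run Command. This is a sketch under the assumption that the VM agent is responsive; `rdp_registry_check` is a hypothetical helper name, and the registry paths are the standard Terminal Server locations rather than something defined in this file.

```bash
#!/usr/bin/env bash
# Sketch only: rdp_registry_check is a hypothetical helper name.
# Reads fDenyTSConnections (1 means RDP is disabled) and the RDP listener port.
rdp_registry_check() {
  local vm="$1" rg="$2"
  az vm run-command invoke --name "$vm" -g "$rg" \
    --command-id RunPowerShellScript \
    --scripts \
      "Get-ItemProperty 'HKLM:\SYSTEM\CurrentControlSet\Control\Terminal Server' | Select-Object fDenyTSConnections" \
      "Get-ItemProperty 'HKLM:\SYSTEM\CurrentControlSet\Control\Terminal Server\WinStations\RDP-Tcp' | Select-Object PortNumber"
}
```

Usage: `rdp_registry_check <vm-name> <resource-group>`. If the port differs from 3389, either reset the RDP configuration or allow the custom port in the NSG.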

---

## Escalation

If the issue doesn't match any symptom above, or if the documented solutions don't resolve it:

1. **Check Azure Resource Health**: Portal > VM > Resource health (checks for platform-level issues)
2. **Restart the VM**: `az vm restart --name <vm-name> -g <resource-group>`
3. **Redeploy the VM**: `az vm redeploy --name <vm-name> -g <resource-group>` (moves to new host)
4. **Comprehensive troubleshooting:**
   - Windows: [Troubleshoot RDP connections](https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/windows/troubleshoot-rdp-connection)
   - Linux: [Troubleshoot SSH connections](https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/linux/troubleshoot-ssh-connection)
   - Windows hub: [All Windows VM troubleshooting docs](https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/windows/welcome-virtual-machines-windows)
   - Linux hub: [All Linux VM troubleshooting docs](https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/linux/welcome-virtual-machines-linux)
workflows/vm-troubleshooter/
vm-troubleshooter.md 6.4 KB
# Azure VM Connectivity Troubleshooting

> Diagnose and resolve Azure VM connectivity failures (RDP/SSH) by identifying symptoms, routing to the right solution, fetching the latest Microsoft documentation, and guiding the user through resolution.

## Quick Reference

| Property      | Details                                                                                                           |
| ------------- | ----------------------------------------------------------------------------------------------------------------- |
| Best for      | RDP/SSH connection failures, NSG/firewall misconfig, credential resets, NIC issues                                |
| Primary tools | Azure CLI, Azure PowerShell, Serial Console, Boot Diagnostics, Run Command                                        |
| Reference     | [references/cannot-connect-to-vm.md](references/cannot-connect-to-vm.md) |

## MCP Tools

| Tool            | Purpose                                                | Parameters                                                                                                           |
| --------------- | ------------------------------------------------------ | -------------------------------------------------------------------------------------------------------------------- |
| `fetch_webpage` | Fetch latest Microsoft troubleshooting docs at runtime | `urls` (Required): Array of doc URLs from reference file; `query` (Optional): User's symptom for relevant extraction |

## Triggers

Activate this skill when user mentions:

- "can't connect to my VM" / "can't RDP" / "can't SSH"
- "RDP not working" / "SSH refused" / "connection timed out"
- "black screen" on VM
- "reset VM password" / "forgot password"
- "NSG blocking" / "firewall blocking" / "port 3389"
- "serial console" access
- "internal error" on RDP
- "VM not reachable" / "public IP not working"
- "RDP disconnects" / "session dropped"

---

## Workflow

### Phase 1: Determine User Intent

Infer the connectivity issue from the user's message. If the issue is clear, proceed to Phase 2. If ambiguous, ask **one** clarifying question:

| Signal in User Message                                                    | Inferred Category  |
| ------------------------------------------------------------------------- | ------------------ |
| "can't RDP", "RDP timeout", "RDP error", "black screen", "internal error" | Unable to RDP      |
| "can't SSH", "SSH refused", "permission denied", "publickey"              | Unable to SSH      |
| "NSG", "firewall", "port blocked", "no public IP", "NIC disabled"         | Network / Firewall |
| "credentials", "password", "wrong password", "access denied"              | Credential / Auth  |
| "VM agent", "Run Command not working", "Serial Console"                   | VM Agent / Tools   |

If unclear, ask: **"Are you trying to connect via RDP (Windows) or SSH (Linux), and what error message or behavior are you seeing?"**

If the user shares an Azure VM name or resource ID, attempt to use the azure-resource-lookup skill if available. If it is not available, attempt to use the Azure CLI.
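The signal table above can be read as a first-match keyword lookup. The sketch below is one possible illustration of that reading, not how the skill actually infers intent; `infer_category` is a hypothetical function name, and rows are checked in table order, so a message that mentions both RDP and NSG is classified as "Unable to RDP".

```bash
#!/usr/bin/env bash
# Sketch only: infer_category is a hypothetical function name.
# Maps a user message to the first matching category from the signal table.
infer_category() {
  local msg
  # Lowercase the message so matching is case-insensitive.
  msg="$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"
  case "$msg" in
    *rdp*|*"black screen"*|*"internal error"*)       echo "Unable to RDP" ;;
    *ssh*|*"permission denied"*|*publickey*)         echo "Unable to SSH" ;;
    *nsg*|*firewall*|*"port blocked"*|*"no public ip"*|*"nic disabled"*)
                                                     echo "Network / Firewall" ;;
    *credentials*|*password*|*"access denied"*)      echo "Credential / Auth" ;;
    *"vm agent"*|*"run command"*|*"serial console"*) echo "VM Agent / Tools" ;;
    *)                                               echo "Unclear" ;;
  esac
}
```

Usage: `infer_category "RDP shows a black screen after login"` prints `Unable to RDP`. An `Unclear` result corresponds to asking the clarifying question.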

### Phase 2: Route to Solution

Open [references/cannot-connect-to-vm.md](references/cannot-connect-to-vm.md) and find the **Symptoms → Solutions** table that matches the user's category. Narrow down to the specific row matching their symptom.

If additional details are needed to narrow to a specific solution row, ask the user. For example:
- "What error message do you see in the RDP dialog?"
- "Does the connection time out, or do you get an error immediately?"
- "Is this a Windows or Linux VM?"

### Phase 3: Fetch Documentation

Once you've identified the specific solution row, fetch the linked Microsoft documentation URL for the latest troubleshooting guidance:

```javascript
fetch_webpage({
  urls: ["<documentation-url-from-solution-row>"],
  query: "<user's specific symptom or error message>"
})
```

This ensures the user gets current guidance even if Microsoft updates their docs.
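
As a rough sketch, the fetch can be wrapped so that a failed or empty result falls back to the reference file's quick commands, as the Error Handling table describes. Everything here is illustrative: `fetchGuidance` is a hypothetical helper, the fetch tool is passed in as a function, and the sketch assumes the tool returns the page text as a string, which this document does not specify.

```javascript
// Hypothetical wrapper around the fetch_webpage tool call shown above.
// `fetchWebpage` is injected so that nothing is assumed about how the tool is exposed.
async function fetchGuidance(fetchWebpage, url, symptom) {
  try {
    const result = await fetchWebpage({ urls: [url], query: symptom });
    // Assumption: a successful fetch yields non-empty text.
    if (typeof result === "string" && result.trim() !== "") {
      return { source: "docs", content: result };
    }
  } catch (err) {
    // A failed fetch is handled the same way as an empty one below.
  }
  // Fall back to the quick commands in the reference file.
  return { source: "reference-file", content: null };
}
```

A caller would check `source` and, on `"reference-file"`, answer from the quick commands and suggest that the user check the URL manually.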

### Phase 4: Diagnose and Respond

Combine the fetched documentation with the quick commands from the reference file to give the user a response:

1. **Explain the likely cause** based on their symptom
2. **Provide the immediate diagnostic/fix commands** from the reference file's Quick Commands section
3. **Summarize the key resolution steps** from the fetched documentation
4. **If the user is logged into Azure**, offer to run diagnostic CLI commands to confirm the root cause before applying fixes
5. **Recommend next steps**: what to verify after the fix, and what to do if it doesn't work

### Phase 5: Escalation (if needed)

If the symptom doesn't match any solution in the reference file, or the fix doesn't resolve the issue:

1. Check the VM's instance view (power and provisioning status): `az vm get-instance-view --name <vm> -g <rg> --query "instanceView.statuses" -o table`
2. Try restart: `az vm restart --name <vm> -g <rg>`
3. Try redeploy: `az vm redeploy --name <vm> -g <rg>`
4. Fetch the comprehensive guide: [Troubleshoot RDP connections](https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/windows/troubleshoot-rdp-connection) or [Troubleshoot SSH connections](https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/linux/troubleshoot-ssh-connection)
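
The first three escalation steps can be assembled in order for a given VM. The sketch below only builds the command strings listed above and runs nothing; `shellQuote` and `escalationCommands` are hypothetical helper names, and the single-quote escaping assumes a POSIX shell.

```javascript
// Wrap a value in single quotes for a POSIX shell, escaping embedded quotes.
function shellQuote(value) {
  return "'" + String(value).replace(/'/g, "'\\''") + "'";
}

// Returns the escalation commands in the order given above:
// instance view first, then restart, then redeploy.
function escalationCommands(vm, rg) {
  const target = `--name ${shellQuote(vm)} -g ${shellQuote(rg)}`;
  return [
    `az vm get-instance-view ${target} --query "instanceView.statuses" -o table`,
    `az vm restart ${target}`,
    `az vm redeploy ${target}`,
  ];
}
```

Keeping the steps as an ordered list lets the agent offer them one at a time and stop as soon as connectivity is restored, since restart and redeploy both interrupt the VM.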

---

## Error Handling

| Error                                  | Likely Cause                    | Action                                                                             |
| -------------------------------------- | ------------------------------- | ---------------------------------------------------------------------------------- |
| `fetch_webpage` fails or returns empty | URL may have changed            | Fall back to quick commands in reference file; suggest user check the URL manually |
| CLI command fails with "not found"     | VM name or resource group wrong | Ask user to verify VM name and resource group                                      |
| Run Command times out                  | VM agent not responding         | Route to "VM Agent Not Responding" section in reference file                       |
| Serial Console not available           | Boot diagnostics not enabled    | Run `az vm boot-diagnostics enable` first                                          |
| Password reset fails                   | VMAccess extension error        | Check reference file for VMAccess alternatives (offline reset, Serial Console)     |

---

## References

- [Cannot Connect to VM - Symptom Router](references/cannot-connect-to-vm.md)

License (MIT)

MIT License

Copyright 2025 (c) Microsoft Corporation.

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.