# Installation

**Install with CLI (recommended):** `gh skills-hub install microsoft-foundry`

Don't have the extension? Run `gh extension install samueltauil/skills-hub` first.

**Manual install:** download the ZIP and extract it to `.github/skills/` in your repository so the skill lives at `.github/skills/microsoft-foundry/`. The folder name must match `microsoft-foundry` for Copilot to auto-discover it.

**Skill Files (75)**

**SKILL.md** (11.2 KB)
---
name: microsoft-foundry
description: "Deploy, evaluate, and manage Foundry agents end-to-end: Docker build, ACR push, hosted/prompt agent create, container start, batch eval, prompt optimization, prompt optimizer workflows, agent.yaml, dataset curation from traces. USE FOR: deploy agent to Foundry, hosted agent, create agent, invoke agent, evaluate agent, run batch eval, optimize prompt, improve prompt, prompt optimization, prompt optimizer, improve agent instructions, optimize agent instructions, optimize system prompt, deploy model, Foundry project, RBAC, role assignment, permissions, quota, capacity, region, troubleshoot agent, deployment failure, create dataset from traces, dataset versioning, eval trending, create AI Services, Cognitive Services, create Foundry resource, provision resource, knowledge index, agent monitoring, customize deployment, onboard, availability. DO NOT USE FOR: Azure Functions, App Service, general Azure deploy (use azure-deploy), general Azure prep (use azure-prepare)."
license: MIT
metadata:
  author: Microsoft
  version: "1.0.8"
---

# Microsoft Foundry Skill

This skill helps developers work with Microsoft Foundry resources, covering model discovery and deployment, the complete development lifecycle of AI agents, evaluation workflows, and troubleshooting.

## Sub-Skills

> **MANDATORY: Before executing ANY workflow, you MUST read the corresponding sub-skill document.** Do not call MCP tools for a workflow without reading its skill document. This applies even if you already know the MCP tool parameters — the skill document contains required workflow steps, pre-checks, and validation logic that must be followed. This rule applies on every new user message that triggers a different workflow, even if the skill is already loaded.

This skill includes specialized sub-skills for specific workflows. **Use these instead of the main skill when they match your task:**

| Sub-Skill | When to Use | Reference |
|-----------|-------------|-----------|
| **deploy** | Containerize, build, push to ACR, create/update/start/stop/clone agent deployments | [deploy](foundry-agent/deploy/deploy.md) |
| **invoke** | Send messages to an agent, single or multi-turn conversations | [invoke](foundry-agent/invoke/invoke.md) |
| **observe** | Evaluate agent quality, run batch evals, analyze failures, optimize prompts, improve agent instructions, compare versions, and set up CI/CD monitoring | [observe](foundry-agent/observe/observe.md) |
| **trace** | Query traces, analyze latency/failures, correlate eval results to specific responses via App Insights `customEvents` | [trace](foundry-agent/trace/trace.md) |
| **troubleshoot** | View container logs, query telemetry, diagnose failures | [troubleshoot](foundry-agent/troubleshoot/troubleshoot.md) |
| **create** | Create new hosted agent applications. Supports Microsoft Agent Framework, LangGraph, or custom frameworks in Python or C#. Downloads starter samples from foundry-samples repo. | [create](foundry-agent/create/create.md) |
| **eval-datasets** | Harvest production traces into evaluation datasets, manage dataset versions and splits, track evaluation metrics over time, detect regressions, and maintain full lineage from trace to deployment. Use for: create dataset from traces, dataset versioning, evaluation trending, regression detection, dataset comparison, eval lineage. | [eval-datasets](foundry-agent/eval-datasets/eval-datasets.md) |
| **project/create** | Creating a new Azure AI Foundry project for hosting agents and models. Use when onboarding to Foundry or setting up new infrastructure. | [project/create/create-foundry-project.md](project/create/create-foundry-project.md) |
| **resource/create** | Creating Azure AI Services multi-service resource (Foundry resource) using Azure CLI. Use when manually provisioning AI Services resources with granular control. | [resource/create/create-foundry-resource.md](resource/create/create-foundry-resource.md) |
| **models/deploy-model** | Unified model deployment with intelligent routing. Handles quick preset deployments, fully customized deployments (version/SKU/capacity/RAI), and capacity discovery across regions. Routes to sub-skills: `preset` (quick deploy), `customize` (full control), `capacity` (find availability). | [models/deploy-model/SKILL.md](models/deploy-model/SKILL.md) |
| **quota** | Managing quotas and capacity for Microsoft Foundry resources. Use when checking quota usage, troubleshooting deployment failures due to insufficient quota, requesting quota increases, or planning capacity. | [quota/quota.md](quota/quota.md) |
| **rbac** | Managing RBAC permissions, role assignments, managed identities, and service principals for Microsoft Foundry resources. Use for access control, auditing permissions, and CI/CD setup. | [rbac/rbac.md](rbac/rbac.md) |

> 💡 **Tip:** For a complete onboarding flow: `project/create` → agent workflows (`deploy` → `invoke`).

> 💡 **Model Deployment:** Use `models/deploy-model` for all deployment scenarios — it intelligently routes between quick preset deployment, customized deployment with full control, and capacity discovery across regions.

> 💡 **Prompt Optimization:** For requests like "optimize my prompt" or "improve my agent instructions," load [observe](foundry-agent/observe/observe.md) and use the `prompt_optimize` MCP tool through that eval-driven workflow.

## Agent Development Lifecycle

Match user intent to the correct workflow. Read each sub-skill in order before executing.

| User Intent | Workflow (read in order) |
|-------------|------------------------|
| Create a new agent from scratch | [create](foundry-agent/create/create.md) → [deploy](foundry-agent/deploy/deploy.md) → [invoke](foundry-agent/invoke/invoke.md) |
| Deploy an agent (code already exists) | deploy → invoke |
| Update/redeploy an agent after code changes | deploy → invoke |
| Invoke/test/chat with an agent | invoke |
| Optimize / improve agent prompt or instructions | observe (Step 4: Optimize) |
| Evaluate and optimize agent (full loop) | observe |
| Troubleshoot an agent issue | invoke → troubleshoot |
| Fix a broken agent (troubleshoot + redeploy) | invoke → troubleshoot → apply fixes → deploy → invoke |
| Start/stop agent container | deploy |

## Agent: .foundry Workspace Standard

Every agent source folder should keep Foundry-specific state under `.foundry/`:

```text
<agent-root>/
  .foundry/
    agent-metadata.yaml
    datasets/
    evaluators/
    results/
```

- `agent-metadata.yaml` is the required source of truth for environment-specific project settings, agent names, registry details, and evaluation test cases.
- `datasets/` and `evaluators/` are local cache folders. Reuse them when they are current, and ask before refreshing or overwriting them.
- See [Agent Metadata Contract](references/agent-metadata-contract.md) for the canonical schema and workflow rules.

## Agent: Setup References

- [Standard Agent Setup](references/standard-agent-setup.md) - Standard capability-host setup with customer-managed data, search, and AI Services resources.
- [Private Network Standard Agent Setup](references/private-network-standard-agent-setup.md) - Standard setup with VNet isolation and private endpoints.

## Agent: Project Context Resolution

Agent skills should run this step **only when they need configuration values they don't already have**. If a value (for example, agent root, environment, project endpoint, or agent name) is already known from the user's message or a previous skill in the same session, skip resolution for that value.

### Step 1: Discover Agent Roots

Search the workspace for `.foundry/agent-metadata.yaml`.

- **One match** → use that agent root.
- **Multiple matches** → require the user to choose the target agent folder.
- **No matches** → for create/deploy workflows, seed a new `.foundry/` folder during setup; for all other workflows, stop and ask the user which agent source folder to initialize.

### Step 2: Resolve Environment

Read `.foundry/agent-metadata.yaml` and resolve the environment in this order:
1. Environment explicitly named by the user
2. Environment already selected earlier in the session
3. `defaultEnvironment` from metadata

If the metadata contains multiple environments and none of the rules above selects one, prompt the user to choose. Keep the selected agent root and environment visible in every workflow summary.

### Step 3: Resolve Common Configuration

Use the selected environment in `agent-metadata.yaml` as the primary source:

| Metadata Field | Resolves To | Used By |
|----------------|-------------|---------|
| `environments.<env>.projectEndpoint` | Project endpoint | deploy, invoke, observe, trace, troubleshoot |
| `environments.<env>.agentName` | Agent name | invoke, observe, trace, troubleshoot |
| `environments.<env>.azureContainerRegistry` | ACR registry name / image URL prefix | deploy |
| `environments.<env>.testCases[]` | Dataset + evaluator + threshold bundles | observe, eval-datasets |

### Step 4: Bootstrap Missing Metadata (Create/Deploy Only)

If create/deploy is initializing a new `.foundry` workspace and metadata fields are still missing, check if `azure.yaml` exists in the project root. If found, run `azd env get-values` and use it to seed `agent-metadata.yaml` before continuing.

| azd Variable | Seeds |
|-------------|-------|
| `AZURE_AI_PROJECT_ENDPOINT` or `AZURE_AIPROJECT_ENDPOINT` | `environments.<env>.projectEndpoint` |
| `AZURE_CONTAINER_REGISTRY_NAME` or `AZURE_CONTAINER_REGISTRY_ENDPOINT` | `environments.<env>.azureContainerRegistry` |
| `AZURE_SUBSCRIPTION_ID` | Azure subscription for trace/troubleshoot lookups |
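A sketch of this bootstrap step, assuming `azd env get-values` emits dotenv-style `KEY="value"` lines (function names are illustrative):

```python
import re


def parse_azd_env(output: str) -> dict[str, str]:
    """Parse `azd env get-values` output into a dict."""
    values = {}
    for line in output.splitlines():
        m = re.match(r'([A-Za-z0-9_]+)="?(.*?)"?$', line.strip())
        if m:
            values[m.group(1)] = m.group(2)
    return values


def seed_metadata(values: dict[str, str], env: str) -> dict:
    """Map azd variables onto the agent-metadata.yaml fields listed above."""
    seeded: dict = {"environments": {env: {}}}
    endpoint = values.get("AZURE_AI_PROJECT_ENDPOINT") or values.get("AZURE_AIPROJECT_ENDPOINT")
    registry = values.get("AZURE_CONTAINER_REGISTRY_NAME") or values.get("AZURE_CONTAINER_REGISTRY_ENDPOINT")
    if endpoint:
        seeded["environments"][env]["projectEndpoint"] = endpoint
    if registry:
        seeded["environments"][env]["azureContainerRegistry"] = registry
    return seeded
```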

### Step 5: Collect Missing Values

Use the `ask_user` or `askQuestions` tool **only for values not resolved** from the user's message, session context, metadata, or azd bootstrap. Common values skills may need:
- **Agent root** — Target folder containing `.foundry/agent-metadata.yaml`
- **Environment** — `dev`, `prod`, or another environment key from metadata
- **Project endpoint** — AI Foundry project endpoint URL
- **Agent name** — Name of the target agent

> 💡 **Tip:** If the user already provides the agent path, environment, project endpoint, or agent name, extract it directly — do not ask again.

## Agent: Agent Types

All agent skills support two agent types:

| Type | Kind | Description |
|------|------|-------------|
| **Prompt** | `"prompt"` | LLM-based agents backed by a model deployment |
| **Hosted** | `"hosted"` | Container-based agents running custom code |

Use the `agent_get` MCP tool to determine an agent's type when needed.

## Tool Usage Conventions

- Use the `ask_user` or `askQuestions` tool whenever collecting information from the user
- Use the `task` or `runSubagent` tool to delegate long-running or independent sub-tasks (e.g., env var scanning, status polling, Dockerfile generation)
- Prefer Azure MCP tools over direct CLI commands when available
- Reference official Microsoft documentation URLs instead of embedding CLI command syntax

## Additional Resources

- [Foundry Hosted Agents](https://learn.microsoft.com/azure/ai-foundry/agents/concepts/hosted-agents?view=foundry)
- [Foundry Agent Runtime Components](https://learn.microsoft.com/azure/ai-foundry/agents/concepts/runtime-components?view=foundry)
- [Foundry Samples](https://github.com/azure-ai-foundry/foundry-samples)

## SDK Quick Reference

- [Python](references/sdk/foundry-sdk-py.md)
**foundry-agent/create/create-prompt.md** (3.9 KB)
# Create Prompt Agent

Create and manage prompt agents in Azure Foundry Agent Service using MCP tools or Python SDK. For hosted agents (container-based), see [create.md](create.md).

## Quick Reference

| Property | Value |
|----------|-------|
| **Agent Type** | Prompt (`kind: "prompt"`) |
| **Primary Tool** | Foundry MCP server (`foundry_agents_*`) |
| **Fallback SDK** | `azure-ai-projects` v2.x preview |
| **Auth** | `DefaultAzureCredential` / `az login` |

## Workflow

```
User Request (create/list/get/update/delete agent)
    │
    ▼
Step 1: Resolve project context (endpoint + credentials)
    │
    ▼
Step 2: Try MCP tool for the operation
    │  ├─ ✅ MCP available → Execute via MCP tool → Done
    │  └─ ❌ MCP unavailable → Continue to Step 3
    │
    ▼
Step 3: Fall back to SDK
    │  Read references/sdk-operations.md for code
    │
    ▼
Step 4: Execute and confirm result
```

### Step 1: Resolve Project Context

The user needs a Foundry project endpoint. Check for:

1. `PROJECT_ENDPOINT` environment variable
2. Ask the user for their project endpoint
3. Use `foundry_resource_get` MCP tool to discover it

Endpoint format: `https://<resource>.services.ai.azure.com/api/projects/<project>`

### Step 2: Create Agent (MCP — Preferred)

For a **prompt agent**:
- Provide: agent name, model deployment name, instructions
- Optional: tools (code interpreter, file search, function calling, web search, Bing grounding, memory)

For a **workflow**:
- Workflows are created in the Foundry portal visual builder
- Use MCP to create the individual agents that participate in the workflow
- Direct the user to the Foundry portal for workflow assembly

### Step 3: SDK Fallback

If MCP tools are unavailable, use the `azure-ai-projects` SDK:
- See [SDK Operations](references/sdk-operations.md) for create, list, update, delete code samples
- See [Agent Tools](references/agent-tools.md) for adding tools to agents

### Step 4: Add Tools (Optional)

> โš ๏ธ **MANDATORY:** Before configuring any tool, **read its reference documentation** linked below to understand prerequisites, required parameters, and setup steps. Do not attempt to add a tool without first reviewing its reference.

| Tool Category | Reference |
|---------------|-----------|
| Code Interpreter, Function Calling | [Simple Tools](references/agent-tools.md) |
| File Search (requires vector store) | [File Search](references/tool-file-search.md) |
| Web Search (default, no setup needed) | [Web Search](references/tool-web-search.md) |
| Bing Grounding (explicit request only) | [Bing Grounding](references/tool-bing-grounding.md) |
| Azure AI Search (private data) | [Azure AI Search](references/tool-azure-ai-search.md) |
| MCP Servers | [MCP Tool](references/tool-mcp.md) |
| Memory (persistent across sessions) | [Memory](references/tool-memory.md) |
| Connections (for tools that need them) | [Project Connections](../../project/connections.md) |

> โš ๏ธ **Web Search Default:** Use `WebSearchPreviewTool` for web search. Only use `BingGroundingAgentTool` when the user explicitly requests Bing Grounding.

## Error Handling

| Error | Cause | Resolution |
|-------|-------|------------|
| Agent creation fails | Missing model deployment | Deploy a model first via `foundry_models_deploy` or portal |
| MCP tool not found | MCP server not running | Fall back to SDK — see [SDK Operations](references/sdk-operations.md) |
| Permission denied | Insufficient RBAC | Need `Azure AI User` role on the project |
| Agent name conflict | Name already exists | Use a unique name or update the existing agent |
| Tool not available | Tool not configured for project | Verify tool prerequisites (e.g., Bing resource for grounding) |
| SDK version mismatch | Using 1.x instead of 2.x | Install `azure-ai-projects --pre` for v2.x preview |
| Tenant mismatch | MCP token tenant differs from resource tenant | Fall back to SDK — `DefaultAzureCredential` resolves the correct tenant |
**foundry-agent/create/create.md** (11.8 KB)
# Create Hosted Agent Application

Create new hosted agent applications for Microsoft Foundry, or convert existing agent projects to be Foundry-compatible using the hosting adapter.

## Quick Reference

| Property | Value |
|----------|-------|
| **Samples Repo** | `microsoft-foundry/foundry-samples` |
| **Python Samples** | `samples/python/hosted-agents/{framework}/` |
| **C# Samples** | `samples/csharp/hosted-agents/{framework}/` |
| **Hosted Agents Docs** | https://learn.microsoft.com/azure/ai-foundry/agents/concepts/hosted-agents |
| **Best For** | Creating new or converting existing agent projects for Foundry |

## When to Use This Skill

- Create a new hosted agent application from scratch (greenfield)
- Start from an official sample and customize it
- Convert an existing agent project to be Foundry-compatible (brownfield)
- Help user choose a framework or sample for their agent

## Workflow

### Step 1: Determine Scenario

Check the user's workspace for existing agent project indicators:

- **No agent-related code found** → **Greenfield**. Proceed to Greenfield Workflow (Step 2).
- **Existing agent code present** → **Brownfield**. Proceed to Brownfield Workflow.

### Step 2: Gather Requirements (Greenfield)

If the user hasn't already specified, use `ask_user` to collect:

**Framework:**

| Framework | Python Path | C# Path |
|-----------|------------|---------|
| Microsoft Agent Framework (default) | `agent-framework` | `AgentFramework` |
| LangGraph | `langgraph` | ❌ Python only |
| Custom | `custom` | `AgentWithCustomFramework` |

**Language:** Python (default) or C#.

> โš ๏ธ **Warning:** LangGraph is Python-only. For C# + LangGraph, suggest Agent Framework or Custom instead.

If the user has no specific preference, suggest Microsoft Agent Framework + Python as defaults.

### Step 3: Browse and Select Sample

List available samples using the GitHub API:

```
GET https://api.github.com/repos/microsoft-foundry/foundry-samples/contents/samples/{language}/hosted-agents/{framework}
```

If the user has specified what they want their agent to do, choose the most relevant or simplest sample to start with. Only if the user has given no preferences, present the sample directories and help them choose based on their requirements (e.g., RAG, tools, multi-agent workflows, HITL).

### Step 4: Download Sample Files

Download only the selected sample directory — do NOT clone the entire repo. Preserve the directory structure by creating subdirectories as needed.

**Using `gh` CLI (preferred if available):**
```bash
gh api repos/microsoft-foundry/foundry-samples/contents/samples/{language}/hosted-agents/{framework}/{sample} \
  --jq '.[] | select(.type=="file") | .download_url' | while read url; do
  filepath="${url##*/samples/{language}/hosted-agents/{framework}/{sample}/}"
  mkdir -p "$(dirname "$filepath")"
  curl -sL "$url" -o "$filepath"
done
```

**Using curl (fallback):**
```bash
curl -s "https://api.github.com/repos/microsoft-foundry/foundry-samples/contents/samples/{language}/hosted-agents/{framework}/{sample}" | \
  jq -r '.[] | select(.type=="file") | .path + "\t" + .download_url' | while IFS=$'\t' read path url; do
    relpath="${path#samples/{language}/hosted-agents/{framework}/{sample}/}"
    mkdir -p "$(dirname "$relpath")"
    curl -sL "$url" -o "$relpath"
  done
```

For nested directories, recursively fetch the GitHub contents API for entries where `type == "dir"` and repeat the download for each.
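The recursive walk can be sketched independently of the HTTP client; here `list_dir` stands in for whatever fetches the contents API JSON (`gh api` or `curl`), so only the recursion logic is shown:

```python
from typing import Callable, Iterator


def walk_contents(list_dir: Callable[[str], list], path: str) -> Iterator[tuple[str, str]]:
    """Yield (repo_path, download_url) for every file under `path`.

    `list_dir` returns the GitHub contents API JSON for one directory:
    a list of entries with "type", "path", and "download_url" keys.
    """
    for entry in list_dir(path):
        if entry["type"] == "file":
            yield entry["path"], entry["download_url"]
        elif entry["type"] == "dir":
            # recurse into nested directories, as described above
            yield from walk_contents(list_dir, entry["path"])
```

In practice `list_dir` would call `https://api.github.com/repos/microsoft-foundry/foundry-samples/contents/<path>`.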

### Step 5: Customize and Implement

1. Read the sample's README.md to understand its structure
2. Read the sample code to understand patterns and dependencies used
3. If using Agent Framework, follow the best practices in [references/agentframework.md](references/agentframework.md)
4. Implement the user's specific requirements on top of the sample
5. Update configuration (`.env`, dependency files) as needed.
6. Ensure the project is in a runnable state

### Step 6: Verify Startup

1. Install dependencies (use virtual environment for Python)
2. If placeholders were used in `.env`, ask the user for the missing values using the `ask_user` tool
3. Run the main entrypoint
4. Fix startup errors and retry if needed
5. Send a test request to the agent; the agent supports the OpenAI Responses schema
6. Fix any errors from the test request and retry until it succeeds
7. Once startup and test request succeed, stop the server to prevent resource usage

**Guardrails:**
- ✅ Perform a real run to catch startup errors
- ✅ Clean up after verification (stop server)
- ✅ Ignore auth/connection/timeout errors (expected without Azure config)
- ❌ Don't wait for user input or create test scripts

## Brownfield Workflow: Convert Existing Agent to Hosted Agent

Use this workflow when the user has an existing agent project that needs to be made compatible with Foundry hosted agent deployment. The key requirement is wrapping the agent with the appropriate **hosting adapter** package, which converts any agent into an HTTP service compatible with the Foundry Responses API.

### Step B1: Analyze Existing Project

Scan the project to determine:

1. **Language** — Python (look for `requirements.txt`, `pyproject.toml`, `*.py`) or C# (look for `*.csproj`, `*.cs`)
2. **Framework** — Identify which agent framework is in use:

| Indicator | Framework |
|-----------|-----------|
| Imports from `agent_framework` or `Microsoft.Agents.AI` | Microsoft Agent Framework |
| Imports from `langgraph`, `langchain` | LangGraph |
| No recognized framework imports, or other frameworks (e.g., Semantic Kernel, AutoGen) | Custom |

3. **Entry point** — Identify the main script/entrypoint that creates and runs the agent
4. **Agent object** — Identify the agent instance that needs to be wrapped (e.g., a `BaseAgent` subclass, a compiled `StateGraph`, or an existing server/app)

### Step B2: Add Hosting Adapter Dependency

Add the correct adapter package based on framework and language. Get the latest version from the package registry — do not hardcode versions.

**Python adapter packages:**

| Framework | Package |
|-----------|---------|
| Microsoft Agent Framework | `azure-ai-agentserver-agentframework` |
| LangGraph | `azure-ai-agentserver-langgraph` |
| Custom | `azure-ai-agentserver-core` |

**.NET adapter packages:**

| Framework | Package |
|-----------|---------|
| Microsoft Agent Framework | `Azure.AI.AgentServer.AgentFramework` |
| Custom | `Azure.AI.AgentServer.Core` |

Add the package to the project's dependency file (`requirements.txt`, `pyproject.toml`, or `.csproj`). For Python, also add `python-dotenv` if not present.

### Step B3: Wrap Agent with Hosting Adapter

Modify the project's main entrypoint to wrap the existing agent with the adapter. The approach differs by framework:

**Microsoft Agent Framework (Python):**
- Import `from_agent_framework` from the adapter package
- Pass the agent instance (a `BaseAgent` subclass) to the adapter
- Call `.run()` on the adapter as the default entrypoint
- The agent must implement both `run()` and `run_stream()` methods

**LangGraph (Python):**
- Import `from_langgraph` from the adapter package
- Pass the compiled `StateGraph` to the adapter
- Call `.run()` on the adapter as the default entrypoint

**Custom code (Python):**
- Import `FoundryCBAgent` from the core adapter package
- Create a class that extends `FoundryCBAgent`
- Implement the `agent_run()` method which receives an `AgentRunContext` and returns either an `OpenAIResponse` (non-streaming) or `AsyncGenerator[ResponseStreamEvent]` (streaming)
- The agent must handle the Foundry request/response protocol manually — refer to the [custom sample](https://github.com/microsoft-foundry/foundry-samples/tree/main/samples/python/hosted-agents/custom) for the exact interface
- Instantiate and call `.run()` as the default entrypoint

**Custom code (C#):**
- Use `AgentServerApplication.RunAsync()` with dependency injection to register an `IAgentInvocation` implementation
- Refer to the [C# custom sample](https://github.com/microsoft-foundry/foundry-samples/tree/main/samples/csharp/hosted-agents/AgentWithCustomFramework) for the exact interface

> โš ๏ธ **Warning:** The adapter MUST be the default entrypoint (no flags required to start). This is required for both local debugging and containerized deployment.

### Step B4: Configure Environment

1. Create or update a `.env` file with required environment variables (project endpoint, model deployment name, etc.)
2. For Python: ensure the code uses `load_dotenv()` so Foundry-injected environment variables are available at runtime.
3. If the project uses Azure credentials: ensure Python uses `azure.identity.aio.DefaultAzureCredential` (async version) for **local development**, not `azure.identity.DefaultAzureCredential`. In production, use `ManagedIdentityCredential`. See [auth-best-practices.md](../../references/auth-best-practices.md)

### Step B5: Create agent.yaml

Create an `agent.yaml` file in the project root. This file defines the agent's metadata and deployment configuration for Foundry. Required fields:

- `name` — Unique identifier (alphanumeric + hyphens, max 63 chars)
- `description` — What the agent does
- `template.kind` — Must be `hosted`
- `template.protocols` — Must include `responses` protocol v1
- `template.environment_variables` — List all environment variables the agent needs at runtime

Refer to any sample's `agent.yaml` in the [foundry-samples repo](https://github.com/microsoft-foundry/foundry-samples/tree/main/samples/python/hosted-agents) for the exact schema.

### Step B6: Create Dockerfile

Create a `Dockerfile` if one doesn't exist. Requirements:

- Base image appropriate for the language (e.g., `python:3.12-slim` for Python, `mcr.microsoft.com/dotnet/sdk` for C#)
- Copy source code into the container
- Install dependencies
- Expose port **8088** (the adapter's default port)
- Set the main entrypoint as the CMD

> โš ๏ธ **Warning:** When building, MUST use `--platform linux/amd64`. Hosted agents run on Linux AMD64 infrastructure. Images built for other architectures (e.g., ARM64 on Apple Silicon) will fail.

Refer to any sample's `Dockerfile` in the [foundry-samples repo](https://github.com/microsoft-foundry/foundry-samples/tree/main/samples/python/hosted-agents) for the exact pattern.

### Step B7: Test Locally

1. Install dependencies (use virtual environment for Python)
2. Run the main entrypoint — the adapter should start an HTTP server on `localhost:8088`
3. Send a test request: `POST http://localhost:8088/responses` with body `{"input": "hello"}`
4. Verify the response follows the OpenAI Responses API format
5. Fix any errors and retry until the test request succeeds
6. Stop the server

> 💡 **Tip:** If auth/connection errors occur for Azure services, that's expected without real Azure credentials configured. The key validation is that the HTTP server starts and accepts requests.
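The test request in step 3 can be sent with a minimal stdlib script, assuming the adapter is listening on its default port (run only after the server starts):

```python
import json
import urllib.request


def build_test_request(text: str = "hello") -> urllib.request.Request:
    """POST {"input": ...} to the adapter's Responses endpoint."""
    return urllib.request.Request(
        "http://localhost:8088/responses",
        data=json.dumps({"input": text}).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Requires the adapter server to be running locally.
    with urllib.request.urlopen(build_test_request(), timeout=30) as resp:
        print(resp.status, json.loads(resp.read()))
```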

## Common Guidelines

IMPORTANT: YOU MUST FOLLOW THESE.

Apply these to both greenfield and brownfield projects:

1. **Logging** — Implement proper logging using the language's standard logging framework (Python `logging` module, .NET `ILogger`). Hosted agents stream container stdout/stderr logs to Foundry, so all log output is visible via the troubleshoot workflow. Use structured log levels (INFO, WARNING, ERROR) and include context like request IDs and agent names.

2. **Framework-specific best practices** — When using Agent Framework, read the [Agent Framework best practices](references/agentframework.md) for hosting adapter setup, credential patterns, and debugging guidance.

## Error Handling

| Error | Cause | Resolution |
|-------|-------|------------|
| GitHub API rate limit | Too many requests | Authenticate with `gh auth login` |
| `gh` not available | CLI not installed | Use curl REST API fallback |
| Sample not found | Path changed in repo | List parent directory to discover current samples |
| Dependency install fails | Version conflicts | Use versions from sample's own dependency file |
**foundry-agent/create/references/agent-tools.md** (2.7 KB)
# Agent Tools — Simple Tools

Add tools to agents to extend capabilities. This file covers tools that work without external connections. For tools requiring connections/RBAC setup, see:
- [Web Search tool](tool-web-search.md) — real-time public web search with citations (default for web search)
- [Bing Grounding tool](tool-bing-grounding.md) — web search via dedicated Bing resource (only when explicitly requested)
- [Azure AI Search tool](tool-azure-ai-search.md) — private data grounding with vector search
- [MCP tool](tool-mcp.md) — remote Model Context Protocol servers

## Code Interpreter

Enables agents to write and run Python in a sandboxed environment. Supports data analysis, chart generation, and file processing. Has [additional charges](https://azure.microsoft.com/pricing/details/cognitive-services/openai-service/) beyond token-based fees.

> Sessions: 1-hour active / 30-min idle timeout. Each conversation = separate billable session.

For code samples, see: [Code Interpreter tool documentation](https://learn.microsoft.com/azure/ai-foundry/agents/how-to/tools/code-interpreter?view=foundry)

## Function Calling

Define custom functions the agent can invoke. Your app executes the function and returns results. Runs expire 10 minutes after creation — return tool outputs promptly.

> **Security:** Treat tool arguments as untrusted input. Don't pass secrets in tool output. Use `strict=True` for schema validation.

For code samples, see: [Function Calling tool documentation](https://learn.microsoft.com/azure/ai-foundry/agents/how-to/tools/function-calling?view=foundry)

## Tool Summary

| Tool | Connection? | Reference |
|------|-------------|-----------|
| `CodeInterpreterTool` | No | This file |
| `FileSearchTool` | No (vector store required) | [tool-file-search.md](tool-file-search.md) |
| `FunctionTool` | No | This file |
| `WebSearchPreviewTool` | No | [tool-web-search.md](tool-web-search.md) |
| `BingGroundingAgentTool` | Yes (Bing) | [tool-bing-grounding.md](tool-bing-grounding.md) |
| `AzureAISearchAgentTool` | Yes (Search) | [tool-azure-ai-search.md](tool-azure-ai-search.md) |
| `MCPTool` | Optional | [tool-mcp.md](tool-mcp.md) |

> โš ๏ธ **Default for web search:** Use `WebSearchPreviewTool` unless the user explicitly requests Bing Grounding or Bing Custom Search.

> Combine multiple tools on one agent. The model decides which to invoke.

## References

- [Tool Catalog](https://learn.microsoft.com/azure/ai-foundry/agents/concepts/tool-catalog?view=foundry)
- [Code Interpreter](https://learn.microsoft.com/azure/ai-foundry/agents/how-to/tools/code-interpreter?view=foundry)
- [Function Calling](https://learn.microsoft.com/azure/ai-foundry/agents/how-to/tools/function-calling?view=foundry)
**foundry-agent/create/references/agentframework.md** (5.1 KB)
# Microsoft Agent Framework — Best Practices for Hosted Agents

Best practices when building hosted agents with Microsoft Agent Framework for deployment to Foundry Agent Service.

## Official Resources

| Resource | URL |
|----------|-----|
| **GitHub Repo** | https://github.com/microsoft/agent-framework |
| **MS Learn Overview** | https://learn.microsoft.com/agent-framework/overview/agent-framework-overview |
| **Quick Start** | https://learn.microsoft.com/agent-framework/tutorials/quick-start |
| **User Guide** | https://learn.microsoft.com/agent-framework/user-guide/overview |
| **Hosted Agents Concepts** | https://learn.microsoft.com/azure/ai-foundry/agents/concepts/hosted-agents |
| **Python Samples (MAF repo)** | https://github.com/microsoft/agent-framework/tree/main/python/samples |
| **.NET Samples (MAF repo)** | https://github.com/microsoft/agent-framework/tree/main/dotnet/samples |
| **PyPI** | https://pypi.org/project/agent-framework/ |
| **NuGet** | https://www.nuget.org/profiles/MicrosoftAgentFramework/ |

## Installation

**Python:** `pip install agent-framework --pre` (installs all sub-packages)

**.NET:** `dotnet add package Microsoft.Agents.AI`

> โš ๏ธ **Warning:** Always pin specific pre-release versions. Use `--pre` to get the latest. Check the [PyPI page](https://pypi.org/project/agent-framework/) or [NuGet profile](https://www.nuget.org/profiles/MicrosoftAgentFramework/) for current stable versions.

## Hosting Adapter

Hosted agents must expose an HTTP server using the hosting adapter. This enables local testing and Foundry deployment with the same code.

**Python adapter packages:** `azure-ai-agentserver-core`, `azure-ai-agentserver-agentframework`

**.NET adapter packages:** `Azure.AI.AgentServer.Core`, `Azure.AI.AgentServer.AgentFramework`

The adapter handles protocol translation between Foundry request/response formats and your framework's native data structures, including conversation management, message serialization, and streaming.

> 💡 **Tip:** Make HTTP server mode the default entrypoint (no flags needed). This simplifies both local debugging and containerized deployment.

## Key Patterns

### Python: Async Credentials

For **local development**, use `DefaultAzureCredential` from `azure.identity.aio` (not `azure.identity`) — `AzureAIClient` requires async credentials. In production, use `ManagedIdentityCredential` from `azure.identity.aio`. See [auth-best-practices.md](../../../references/auth-best-practices.md).

### Python: Environment Variables

Always use `load_dotenv(override=False)` so environment variables set by Foundry at runtime take precedence over local `.env` values.

Required `.env` variables:
- `FOUNDRY_PROJECT_ENDPOINT` — project endpoint URL
- `FOUNDRY_MODEL_DEPLOYMENT_NAME` — model deployment name
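The precedence rule can be sketched without the `python-dotenv` dependency: `load_dotenv(override=False)` applies each `.env` entry only when the variable is not already set, which is equivalent to `os.environ.setdefault` per key. The endpoint values below are placeholders.

```python
import os

def apply_dotenv(entries: dict) -> None:
    """Mimic load_dotenv(override=False): existing (Foundry-set) values win."""
    for key, value in entries.items():
        os.environ.setdefault(key, value)

# Foundry sets this at runtime (placeholder value):
os.environ["FOUNDRY_PROJECT_ENDPOINT"] = "https://runtime.example"

# Local .env values (placeholders):
apply_dotenv({
    "FOUNDRY_PROJECT_ENDPOINT": "https://local.example",
    "FOUNDRY_MODEL_DEPLOYMENT_NAME": "gpt-4o-mini",
})

print(os.environ["FOUNDRY_PROJECT_ENDPOINT"])       # runtime value wins
print(os.environ["FOUNDRY_MODEL_DEPLOYMENT_NAME"])  # filled from .env
```

With `override=True` the `.env` file would instead clobber the Foundry-provided endpoint, which is why `override=False` is the required setting here.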

### Authentication

If the user explicitly asks to use an API key instead of managed identity, use `AzureOpenAIResponsesClient` and pass it the `api_key` parameter.
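A minimal sketch of the selection logic, keeping the key out of source code. The helper and the `AZURE_OPENAI_API_KEY` variable name are illustrative assumptions; check the Agent Framework docs for the client's actual constructor parameters.

```python
import os

def auth_kwargs(use_api_key: bool) -> dict:
    """Choose constructor kwargs for the responses client (hypothetical helper).

    Only take the API-key path when the user explicitly requests it; the
    default remains async credential-based auth.
    """
    if use_api_key:
        # Key comes from the environment, never hard-coded.
        return {"api_key": os.environ["AZURE_OPENAI_API_KEY"]}
    # Default path: pass an async credential from azure.identity.aio,
    # e.g. DefaultAzureCredential locally, ManagedIdentityCredential in prod.
    return {"credential": "<async credential instance>"}

os.environ["AZURE_OPENAI_API_KEY"] = "sk-placeholder"  # demo value only
print(auth_kwargs(True))
```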

### Agent Naming Rules

Agent names must start and end with an alphanumeric character, may contain hyphens in the middle, and are limited to 63 characters. Valid: `MyAgent`, `agent-1`. Invalid: `-agent`, `agent-`, `sample_agent`.
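The rules translate directly into a regular expression. This is an illustrative pre-flight check derived from the stated rules, not an official validator:

```python
import re

# Alphanumeric at both ends, hyphens allowed in the middle, max 63 chars.
AGENT_NAME_RE = re.compile(r"^[A-Za-z0-9]([A-Za-z0-9-]{0,61}[A-Za-z0-9])?$")

def is_valid_agent_name(name: str) -> bool:
    """Check a candidate agent name before calling the service."""
    return bool(AGENT_NAME_RE.match(name))

for name in ["MyAgent", "agent-1", "-agent", "agent-", "sample_agent"]:
    print(name, is_valid_agent_name(name))
```

Validating locally avoids a round-trip failure when `agent_update` rejects the name.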

### Python: Virtual Environment

Always use a virtual environment. Never use bare `python` or `pip` — use venv-activated versions or full paths (e.g., `.venv/bin/pip`).

## Workflow Patterns

Agent Framework supports single-agent and multi-agent workflow patterns using graph-based orchestration:

- **Single Agent** — Basic agent with tools, RAG, or MCP integration
- **Multi-Agent Workflow** — Graph-based orchestration connecting multiple agents and deterministic functions
- **Advanced Patterns** — Reflection, switch-case, fan-out/fan-in, loop, human-in-the-loop

For workflow samples and advanced patterns, search the [Agent Framework GitHub repo](https://github.com/microsoft/agent-framework).

## Debugging

Use [AI Toolkit for VS Code](https://marketplace.visualstudio.com/items?itemName=ms-windows-ai-studio.windows-ai-studio) with the `agentdev` CLI tool for interactive debugging:

1. Install `debugpy` for VS Code Python Debugger support
2. Install `agent-dev-cli` (pre-release) for the `agentdev` command
3. Key debug tasks: `agentdev run <entrypoint>.py --port 8087` starts the agent HTTP server, `debugpy --listen 127.0.0.1:5679` attaches the debugger, and the `ai-mlstudio.openTestTool` VS Code command opens the Agent Inspector UI

For VS Code `launch.json` and `tasks.json` configuration templates, see [AI Toolkit Agent Inspector — Configure debugging manually](https://github.com/microsoft/vscode-ai-toolkit/blob/main/doc/agent-test-tool.md#configure-debugging-manually).

## Common Errors

| Error | Cause | Fix |
|-------|-------|-----|
| `ModuleNotFoundError` | Missing SDK | `pip install agent-framework --pre` in venv |
| Async credential error | Wrong import | Use `azure.identity.aio.DefaultAzureCredential` (local dev) or `azure.identity.aio.ManagedIdentityCredential` (production) |
| Agent name validation error | Invalid characters | Use alphanumeric + hyphens, start/end alphanumeric, max 63 chars |
| Hosting adapter not found | Missing package | Install `azure-ai-agentserver-agentframework` |
sdk-operations.md 2.2 KB
# SDK Operations for Foundry Agent Service

Use the Foundry MCP tools for agent CRUD operations. When MCP tools are unavailable, use the `azure-ai-projects` Python SDK or REST API.

## Agent Operations via MCP

| Operation | MCP Tool | Description |
|-----------|----------|-------------|
| Create/Update agent | `agent_update` | Create a new agent or update an existing one (creates new version) |
| List/Get agents | `agent_get` | List all agents, or get a specific agent by name |
| Delete agent | `agent_delete` | Delete an agent |
| Invoke agent | `agent_invoke` | Send a message to an agent and get a response |
| Get schema | `agent_definition_schema_get` | Get the full JSON schema for agent definitions |

## SDK Agent Operations

When MCP tools are unavailable, use the `azure-ai-projects` Python SDK (`pip install azure-ai-projects --pre`):

```python
from azure.ai.projects import AIProjectClient
from azure.identity import DefaultAzureCredential

endpoint = "https://<resource>.services.ai.azure.com/api/projects/<project>"
client = AIProjectClient(endpoint=endpoint, credential=DefaultAzureCredential())
```

| Operation | SDK Method |
|-----------|------------|
| Create | `client.agents.create_version(agent_name, definition)` |
| List | `client.agents.list()` |
| Get | `client.agents.get(agent_name)` |
| Update | `client.agents.create_version(agent_name, definition)` (creates new version) |
| Delete | `client.agents.delete(agent_name)` |
| Chat | `client.get_openai_client().responses.create(model=<deployment>, input=<text>, extra_body={"agent": {"name": agent_name, "type": "agent_reference"}})` |

## Environment Variables

| Variable | Description |
|----------|-------------|
| `PROJECT_ENDPOINT` | Foundry project endpoint (`https://<resource>.services.ai.azure.com/api/projects/<project>`) |
| `MODEL_DEPLOYMENT_NAME` | Deployed model name (e.g., `gpt-4.1-mini`) |

## References

- [Agent quickstart](https://learn.microsoft.com/azure/ai-foundry/agents/quickstart?view=foundry)
- [Create agents](https://learn.microsoft.com/azure/ai-foundry/agents/how-to/create-agent?view=foundry)
- [Tool Catalog](https://learn.microsoft.com/azure/ai-foundry/agents/concepts/tool-catalog?view=foundry)
tool-azure-ai-search.md 3.3 KB
# Azure AI Search Tool

Ground agent responses with data from an Azure AI Search vector index. Requires a project connection and proper RBAC setup.

## Prerequisites

- Azure AI Search index with vector search configured:
  - One or more `Edm.String` fields (searchable + retrievable)
  - One or more `Collection(Edm.Single)` vector fields (searchable)
  - At least one retrievable text field with content for citations
  - A retrievable field with source URL for citation links
- A [project connection](../../../project/connections.md) between your Foundry project and search service
- `azure-ai-projects` package (`pip install azure-ai-projects --pre`)

## Required RBAC Roles

For **keyless authentication** (recommended), assign these roles to the **Foundry project's managed identity** on the Azure AI Search resource:

| Role | Scope | Purpose |
|------|-------|---------|
| **Search Index Data Contributor** | AI Search resource | Read/write index data |
| **Search Service Contributor** | AI Search resource | Manage search service config |

> **If RBAC assignment fails:** Ask the user to manually assign roles in Azure portal → AI Search resource → Access control (IAM). They need Owner or User Access Administrator on the search resource.

## Connection Setup

A project connection between your Foundry project and the Azure AI Search resource is required. See [Project Connections](../../../project/connections.md) for connection management via Foundry MCP tools.

## Query Types

| Value | Description |
|-------|-------------|
| `SIMPLE` | Keyword search |
| `VECTOR` | Vector similarity only |
| `SEMANTIC` | Semantic ranking |
| `VECTOR_SIMPLE_HYBRID` | Vector + keyword |
| `VECTOR_SEMANTIC_HYBRID` | Vector + keyword + semantic (default, recommended) |

## Tool Parameters

| Parameter | Required | Description |
|-----------|----------|-------------|
| `project_connection_id` | Yes | Connection ID (resolve via `project_connection_get`, typically after discovering the connection with `project_connection_list`) |
| `index_name` | Yes | Search index name |
| `top_k` | No | Number of results (default: 5) |
| `query_type` | No | Search type (default: `vector_semantic_hybrid`) |
| `filter` | No | OData filter applied to all queries |
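As a sketch, the parameters above can be assembled into a single config dict before constructing the tool. The helper and field names mirror the table and should be checked against the `azure-ai-projects` SDK; values are placeholders.

```python
def azure_ai_search_tool_config(
    connection_id: str,
    index_name: str,
    top_k: int = 5,
    query_type: str = "vector_semantic_hybrid",
    odata_filter: str = "",
) -> dict:
    """Assemble Azure AI Search tool parameters (illustrative shape only)."""
    config = {
        "project_connection_id": connection_id,  # from project_connection_get
        "index_name": index_name,
        "top_k": top_k,
        "query_type": query_type,
    }
    if odata_filter:
        # OData filter is applied to every query the tool issues.
        config["filter"] = odata_filter
    return config

print(azure_ai_search_tool_config("conn-123", "docs-index"))
```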

## Limitations

- Only **one index per tool** instance. For multiple indexes, use connected agents each with their own index.
- Search resource and Foundry agent must be in the **same tenant**.
- Private AI Search resources require **standard agent deployment** with vNET injection.

## Troubleshooting

| Error | Cause | Fix |
|-------|-------|-----|
| 401/403 accessing index | Missing RBAC roles | Assign `Search Index Data Contributor` + `Search Service Contributor` to project managed identity |
| Index not found | Name mismatch | Verify `AI_SEARCH_INDEX_NAME` matches exactly (case-sensitive) |
| No citations in response | Instructions don't request them | Add citation instructions to agent prompt |
| Wrong connection endpoint | Connection points to different search resource | Re-create connection with correct endpoint |

## References

- [Azure AI Search tool documentation](https://learn.microsoft.com/azure/ai-foundry/agents/how-to/tools/azure-ai-search?view=foundry)
- [Tool Catalog](https://learn.microsoft.com/azure/ai-foundry/agents/concepts/tool-catalog?view=foundry)
- [Project Connections](../../../project/connections.md)
tool-bing-grounding.md 2.8 KB
# Bing Grounding Tool

Access real-time web information via Bing Search. Unlike the [Web Search tool](tool-web-search.md) (which works out of the box), Bing Grounding requires a dedicated Bing resource and a project connection.

> โš ๏ธ **Warning:** Use the [Web Search tool](tool-web-search.md) as the default for web search. Only use Bing Grounding when the user **explicitly** requests Grounding with Bing Search or Grounding with Bing Custom Search.

## When to Use

- User explicitly asks for "Bing Grounding" or "Grounding with Bing Search"
- User explicitly asks for "Bing Custom Search" or "Grounding with Bing Custom Search"
- User needs to restrict web search to specific domains (Bing Custom Search)
- User has an existing Bing Grounding resource they want to use

## Prerequisites

- A [Grounding with Bing Search resource](https://portal.azure.com/#create/Microsoft.BingGroundingSearch) in Azure portal
- `Contributor` or `Owner` role at subscription/RG level to create Bing resource and get keys
- `Azure AI Project Manager` role on the project to create a connection
- A project connection configured with the Bing resource key — see [connections](../../../project/connections.md)

## Setup

1. Register the Bing provider: `az provider register --namespace 'Microsoft.Bing'`
2. Create a Grounding with Bing Search resource in the Azure portal
3. Create a project connection with the Bing resource key — see [connections](../../../project/connections.md)
4. Set `BING_PROJECT_CONNECTION_NAME` environment variable

## Important Disclosures

- Bing data flows **outside Azure compliance boundary**
- Review [Grounding with Bing terms of use](https://www.microsoft.com/bing/apis/grounding-legal-enterprise)
- Not supported with VPN/Private Endpoints
- Usage incurs costs — see [pricing](https://www.microsoft.com/bing/apis/grounding-pricing)

## Troubleshooting

| Issue | Cause | Resolution |
|-------|-------|------------|
| Connection not found | Name mismatch or wrong project | Use `project_connection_list` to find the correct `connectionName` |
| Unauthorized creating connection | Missing Azure AI Project Manager role | Assign role on the Foundry project |
| Bing resource creation fails | Provider not registered | Run `az provider register --namespace 'Microsoft.Bing'` |
| No results returned | Connection misconfigured | Verify Bing resource key and connection setup |

## References

- [Bing Grounding tool documentation](https://learn.microsoft.com/azure/ai-foundry/agents/how-to/tools/bing-grounding?view=foundry)
- [Tool Catalog](https://learn.microsoft.com/azure/ai-foundry/agents/concepts/tool-catalog?view=foundry)
- [Grounding with Bing Terms](https://www.microsoft.com/bing/apis/grounding-legal-enterprise)
- [Connections Guide](../../../project/connections.md)
- [Web Search Tool (default)](tool-web-search.md)
tool-file-search.md 2.6 KB
# File Search Tool

Enables agents to search through uploaded files using semantic and keyword search from vector stores. Supports a wide range of file formats including PDF, Markdown, Word, and more.

> โš ๏ธ **Important:** Before creating an agent with file search, you **must** read the official documentation linked in the References section to understand prerequisites, supported file types, and vector store setup.

## Prerequisites

- A [basic or standard agent environment](https://learn.microsoft.com/azure/ai-foundry/agents/environment-setup)
- A **vector store** must be created before the agent — the `file_search` tool requires `vector_store_ids`
- Files must be uploaded to the vector store before the agent can search them

## Key Concepts

| Concept | Description |
|---------|-------------|
| **Vector Store** | A container that indexes uploaded files for semantic search. Must be created first. |
| **vector_store_ids** | Required parameter on the `file_search` tool — references the vector store(s) to search. |
| **File upload** | Files are uploaded to the project, then attached to a vector store for indexing. |

## Setup Workflow

```
1. Create a vector store (REST API: POST /vector_stores)
   │
   ▼
2. (Optional) Upload files and attach to vector store
   │
   ▼
3. Create agent with file_search tool referencing the vector_store_ids
   │
   ▼
4. Agent can now search files in the vector store
```

> โš ๏ธ **Warning:** Creating an agent with `file_search` without providing `vector_store_ids` will fail with a `400 BadRequest` error: `required: Required properties ["vector_store_ids"] are not present`.

## REST API Notes

When creating vector stores via `az rest`:

| Parameter | Value |
|-----------|-------|
| **Endpoint** | `https://<resource>.services.ai.azure.com/api/projects/<project>/vector_stores` |
| **API version** | `v1` |
| **Auth resource** | `https://ai.azure.com` |
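The three values above combine into one `az rest` call. This sketch builds the URL from placeholder resource and project names; the request body field is an assumption, so check the vector store REST reference for the full schema.

```shell
# Placeholders: substitute your resource and project names.
RESOURCE="myresource"
PROJECT="myproject"
URL="https://${RESOURCE}.services.ai.azure.com/api/projects/${PROJECT}/vector_stores?api-version=v1"

# The call itself (commented out; requires az login and project access).
# Body shown is an assumed minimal payload:
# az rest --method post --url "$URL" \
#   --resource "https://ai.azure.com" \
#   --body '{"name": "my-vector-store"}'

echo "$URL"
```

Note the data-plane `api-version=v1` and the `https://ai.azure.com` auth resource; using ARM-style values for either is the cause of the 401 and bad-API-version errors listed below.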

## Troubleshooting

| Error | Cause | Fix |
|-------|-------|-----|
| `vector_store_ids` not present | Agent created without vector store | Create a vector store first, then pass its ID |
| 401 Unauthorized | Wrong auth resource for REST API | Use `--resource "https://ai.azure.com"` with `az rest` |
| Bad API version | Using ARM-style API version | Use `api-version=v1` for the data-plane vector store API |
| No search results | Vector store is empty | Upload files to the vector store before querying |

## References

- [File Search tool documentation](https://learn.microsoft.com/azure/ai-foundry/agents/how-to/tools/file-search?view=foundry&pivots=python)
- [Tool Catalog](https://learn.microsoft.com/azure/ai-foundry/agents/concepts/tool-catalog?view=foundry)
tool-mcp.md 3.3 KB
# MCP Tool (Model Context Protocol)

Connect agents to remote MCP servers to extend capabilities with external tools and data sources. MCP is an open standard for LLM tool integration.

## Prerequisites

- A remote MCP server endpoint (e.g., `https://api.githubcopilot.com/mcp`)
- For authenticated servers: a [project connection](../../../project/connections.md) storing credentials
- RBAC: **Contributor** or **Owner** role on the Foundry project

## Authenticated Server Connections

For authenticated MCP servers, create an `api_key` project connection to store credentials. Unauthenticated servers (public endpoints) don't need a connection — omit `project_connection_id`.

See [Project Connections](../../../project/connections.md) for connection management via Foundry MCP tools.

## MCPTool Parameters

| Parameter | Required | Description |
|-----------|----------|-------------|
| `server_label` | Yes | Unique label for this MCP server within the agent |
| `server_url` | Yes | Remote MCP server endpoint URL |
| `require_approval` | No | `"always"` (default), `"never"`, or `{"never": ["tool1"]}` / `{"always": ["tool1"]}` |
| `allowed_tools` | No | List of specific tools to enable (default: all) |
| `project_connection_id` | No | Connection ID for authenticated servers |
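The `require_approval` shapes can be confusing; the sketch below encodes one plausible reading of them (tools listed under `"never"` skip approval, tools listed under `"always"` require it). This interpretation is an assumption; verify against the MCP tool documentation.

```python
approve_everything = "always"              # default: every call needs approval
approve_nothing = "never"                  # only for fully trusted servers
mixed_policy = {"never": ["search_docs"]}  # auto-approve only the named tools

def needs_approval(policy, tool_name: str) -> bool:
    """Decide whether a given tool call requires human approval."""
    if policy == "always":
        return True
    if policy == "never":
        return False
    if isinstance(policy, dict):
        if "never" in policy:
            return tool_name not in policy["never"]
        if "always" in policy:
            return tool_name in policy["always"]
    return True  # unknown shape: fail closed

print(needs_approval(mixed_policy, "search_docs"))
print(needs_approval(mixed_policy, "delete_repo"))
```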

## Approval Workflow

1. Agent sends request → MCP server returns tool calls
2. Response contains `mcp_approval_request` items
3. Your code reviews tool name + arguments
4. Submit `McpApprovalResponse` with `approve=True/False`
5. Agent completes work using approved tool results

> **Best practice:** Always use `require_approval="always"` unless you fully trust the MCP server. Use `allowed_tools` to restrict which tools the agent can access.
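Steps 2–4 of the exchange can be simulated with plain dicts standing in for the SDK response objects. The field names below are illustrative shapes, not verified SDK types.

```python
def review(approval_request: dict, trusted_tools: set) -> dict:
    """Steps 3-4: inspect tool name + arguments, emit an approval response."""
    approve = approval_request["tool_name"] in trusted_tools
    return {
        "type": "mcp_approval_response",        # illustrative shape
        "approval_request_id": approval_request["id"],
        "approve": approve,
    }

# Step 2: the response contains an mcp_approval_request item (simulated):
request = {
    "type": "mcp_approval_request",
    "id": "req-1",
    "tool_name": "search_docs",
    "arguments": {"q": "foundry"},
}
response = review(request, trusted_tools={"search_docs"})
print(response["approve"])
```

In a real client the response would be submitted back (with the previous response ID) so the agent can finish with the approved tool results.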

## Hosting Local MCP Servers

Agent Service only accepts **remote** MCP endpoints. To use a local server, deploy it to:

| Platform | Transport | Notes |
|----------|-----------|-------|
| [Azure Container Apps](https://github.com/Azure-Samples/mcp-container-ts) | HTTP POST/GET | Any language, container rebuild needed |
| [Azure Functions](https://github.com/Azure-Samples/mcp-sdk-functions-hosting-python) | HTTP streamable | Python/Node/.NET/Java, key-based auth |

## Known Limitations

- **100-second timeout** for non-streaming MCP tool calls
- **Identity passthrough not supported in Teams** — agents published to Teams use project managed identity
- **Network-secured Foundry** can't use private MCP servers in same vNET — only public endpoints

## Troubleshooting

| Error | Cause | Fix |
|-------|-------|-----|
| `Invalid tool schema` | `anyOf`/`allOf` in MCP server definition | Update MCP server schema to use simple types |
| `Unauthorized` / `Forbidden` | Wrong credentials in connection | Verify connection credentials match server requirements |
| Model never calls MCP tool | Misconfigured server_label/url | Check `server_label`, `server_url`, `allowed_tools` values |
| Agent stalls after approval | Missing `previous_response_id` | Include `previous_response_id` in follow-up request |
| Timeout | Server takes >100s | Optimize server-side logic or break into smaller operations |

## References

- [MCP tool documentation](https://learn.microsoft.com/azure/ai-foundry/agents/how-to/tools/mcp?view=foundry)
- [Tool Catalog](https://learn.microsoft.com/azure/ai-foundry/agents/concepts/tool-catalog?view=foundry)
- [Project Connections](../../../project/connections.md)
tool-memory.md 4.7 KB
# Agent Memory

Managed long-term memory for Foundry agents. Enables agent continuity across sessions, devices, and workflows. Agents retain user preferences and conversation history, and can deliver personalized experiences. Memory is stored in your project's owned storage.

## Prerequisites

- A [Foundry project](https://learn.microsoft.com/azure/ai-foundry/how-to/create-projects) with authorization configured
- A **chat model deployment** (e.g., `gpt-5.2`)
- An **embedding model deployment** (e.g., `text-embedding-3-small`) — see [Check Embedding Model](#check-embedding-model) below
- Python packages: `pip install azure-ai-projects azure-identity`

### Check Embedding Model

An embedding model is **required** before enabling memory. Check if one is already deployed:

Use `foundry_models_list` MCP tool to list all deployments and look for an embedding model (e.g., `text-embedding-3-small`, `text-embedding-3-large`, `text-embedding-ada-002`).

| Result | Action |
|--------|--------|
| ✅ Embedding model found | Note the deployment name and proceed |
| ❌ No embedding model | Deploy one before enabling memory — see below |

### Deploy Embedding Model

If no embedding model exists, use `foundry_models_deploy` MCP tool with:
- `deploymentName`: `text-embedding-3-small` (or preferred name)
- `modelName`: `text-embedding-3-small`
- `modelFormat`: `OpenAI`

## Authorization and Permissions

| Role | Scope | Purpose |
|------|-------|---------|
| **Azure AI User** | AI Services resource | Assigned to project managed identity |
| **System-assigned managed identity** | Project | Must be enabled on the project |

**Setup steps:**
1. In Azure portal → project → **Resource Management** → **Identity** → enable system-assigned managed identity
2. On the AI Services resource → **Access control (IAM)** → assign **Azure AI User** to the project managed identity

## Workflow

```
User wants agent memory
    │
    ▼
Step 1: Check for embedding model deployment
    │  ├─ ✅ Found → Continue
    │  └─ ❌ Not found → Deploy one (ask user)
    │
    ▼
Step 2: Create memory store
    │
    ▼
Step 3: Attach memory tool to agent
    │
    ▼
Step 4: Test with conversation
```

## Key Concepts

### Memory Store Options

| Option | Description |
|--------|-------------|
| `chat_summary_enabled` | Summarize conversations for memory |
| `user_profile_enabled` | Build and maintain user profile |
| `user_profile_details` | Control what data gets stored (e.g., `"Avoid sensitive data such as age, financials, location, credentials"`) |

> 💡 **Tip:** Use `user_profile_details` to control what the agent stores — e.g., `"flight carrier preference and dietary restrictions"` for a travel agent, or exclude sensitive data.
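As a sketch, the three options can be gathered into one configuration dict before creating the store. The key names mirror the table above; whether the create API takes them flat or nested is an assumption to verify against the memory API reference.

```python
def memory_store_options(chat_summary: bool = True,
                         user_profile: bool = True,
                         profile_details: str = "") -> dict:
    """Assemble memory store option flags (illustrative shape only)."""
    options = {
        "chat_summary_enabled": chat_summary,
        "user_profile_enabled": user_profile,
    }
    if profile_details:
        # Scopes what the agent is allowed to remember about the user.
        options["user_profile_details"] = profile_details
    return options

opts = memory_store_options(
    profile_details="flight carrier preference and dietary restrictions")
print(opts)
```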

### Scope

The `scope` parameter partitions memory per user:

| Scope Value | Behavior |
|-------------|----------|
| `{{$userId}}` | Auto-extracts TID+OID from auth token (recommended) |
| `"user_123"` | Static identifier โ€” you manage user mapping |

### Memory Store Operations

| Operation | Description |
|-----------|-------------|
| Create | Initialize a memory store with chat/embedding models and options |
| List | List all memory stores in the project |
| Update | Update memory store description or configuration |
| Delete scope | Delete memories for a specific user scope |
| Delete store | Delete entire memory store (irreversible — all scopes lost) |

> โš ๏ธ **Warning:** Deleting a memory store removes all memories across all scopes. Agents with attached memory stores lose access to historical context.

## Troubleshooting

| Issue | Cause | Resolution |
|-------|-------|------------|
| Auth/authorization error | Identity or managed identity lacks required roles | Verify roles in Authorization section; refresh access token for REST |
| Memories don't appear after conversation | Updates are debounced or still processing | Increase wait time or call update API with `update_delay=0` |
| Memory search returns no results | Scope mismatch between update and search | Use same scope value for storing and retrieving memories |
| Agent response ignores stored memory | Agent not configured with memory search tool | Confirm agent definition includes `MemorySearchTool` with correct store name |
| No embedding model available | Embedding deployment missing | Deploy an embedding model โ€” see Check Embedding Model section |

## References

- [Memory tool documentation](https://learn.microsoft.com/azure/ai-foundry/agents/how-to/memory-usage?view=foundry)
- [Memory Concepts](https://learn.microsoft.com/azure/ai-foundry/agents/concepts/what-is-memory)
- [Tool Catalog](https://learn.microsoft.com/azure/ai-foundry/agents/concepts/tool-catalog?view=foundry)
- [Python Samples](https://github.com/Azure/azure-sdk-for-python/tree/main/sdk/ai/azure-ai-projects/samples/memories)
tool-web-search.md 3.6 KB
# Web Search Tool (Preview)

Enables agents to retrieve and ground responses with real-time public web information before generating output. Returns up-to-date answers with inline URL citations. This is the **default tool for web search** — no external resource or connection setup required.

> โš ๏ธ **Warning:** For Bing Grounding or Bing Custom Search (which require a separate Bing resource and project connection), see [tool-bing-grounding.md](tool-bing-grounding.md). Only use those when explicitly requested.

## Important Disclosures

- Web Search (preview) uses Grounding with Bing Search and Grounding with Bing Custom Search, which are [First Party Consumption Services](https://www.microsoft.com/licensing/terms/product/Glossary/EAEAS) governed by [Grounding with Bing terms of use](https://www.microsoft.com/bing/apis/grounding-legal-enterprise) and the [Microsoft Privacy Statement](https://go.microsoft.com/fwlink/?LinkId=521839&clcid=0x409).
- The [Data Protection Addendum](https://aka.ms/dpa) **does not apply** to data sent to Grounding with Bing Search and Grounding with Bing Custom Search.
- Data transfers occur **outside compliance and geographic boundaries**.
- Usage incurs costs — see [pricing](https://www.microsoft.com/bing/apis/grounding-pricing).

## Prerequisites

- A [basic or standard agent environment](https://learn.microsoft.com/azure/ai-foundry/agents/environment-setup)
- Azure credentials configured (e.g., `DefaultAzureCredential`)

## Setup

No external resource or project connection is required. The web search tool works out of the box when added to an agent definition.

## Configuration Options

| Parameter | Description | Default |
|-----------|-------------|---------|
| `user_location` | Approximate location (country/region/city) for localized results | None |
| `search_context_size` | Context window space for search: `low`, `medium`, `high` | `medium` |
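A sketch of assembling the two options into a config dict. The nested `user_location` shape is an assumption modeled on the parameter description, not a verified SDK signature.

```python
def web_search_config(country: str = "", city: str = "",
                      context_size: str = "medium") -> dict:
    """Build web search tool options (illustrative field shapes)."""
    if context_size not in ("low", "medium", "high"):
        raise ValueError("search_context_size must be low, medium, or high")
    config = {"search_context_size": context_size}
    if country or city:
        # Approximate location for localized results (assumed nesting).
        config["user_location"] = {"country": country, "city": city}
    return config

print(web_search_config(country="US", context_size="high"))
```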

## Administrator Control

Admins can enable or disable web search at the subscription level via Azure CLI. Requires Owner or Contributor access.

- **Disable:** `az feature register --name OpenAI.BlockedTools.web_search --namespace Microsoft.CognitiveServices --subscription "<subscription-id>"`
- **Enable:** `az feature unregister --name OpenAI.BlockedTools.web_search --namespace Microsoft.CognitiveServices --subscription "<subscription-id>"`

## Security Considerations

- Treat web search results as **untrusted input**. Validate before use in downstream systems.
- Avoid sending secrets or sensitive data in prompts forwarded to external services.

## Troubleshooting

| Issue | Cause | Resolution |
|-------|-------|------------|
| No citations appear | Model didn't determine web search was needed | Update instructions to explicitly allow web search; ask queries requiring current info |
| Requests fail after enabling | Web search disabled at subscription level | Ask admin to enable — see Administrator Control above |
| Authentication errors (REST) | Bearer token missing, expired, or insufficient | Refresh token; confirm project/agent access |
| Outdated results | Content not recently indexed by Bing | Refine query to request most recent info |
| No results for specific topics | Query too narrow | Broaden query; niche topics may have limited coverage |
| Rate limiting (429) | Too many requests | Implement exponential backoff; space out requests |

## References

- [Web Search tool documentation](https://learn.microsoft.com/azure/ai-foundry/agents/how-to/tools/web-search?view=foundry)
- [Tool Catalog](https://learn.microsoft.com/azure/ai-foundry/agents/concepts/tool-catalog?view=foundry)
- [Bing Pricing](https://www.microsoft.com/bing/apis/grounding-pricing)
foundry-agent/deploy/
deploy.md 21.5 KB
# Foundry Agent Deploy

Create and manage agent deployments in Azure AI Foundry. For hosted agents, this includes the full workflow from containerizing the project to starting the agent container.

## Quick Reference

| Property | Value |
|----------|-------|
| Agent types | Prompt (LLM-based), Hosted (ACA based), Hosted (vNext) |
| MCP server | `azure` |
| Key MCP tools | `agent_update`, `agent_container_control`, `agent_container_status_get` |
| CLI tools | `docker`, `az acr` (hosted agents only) |
| Container protocols | `a2a`, `responses`, `mcp` |
| Supported languages | .NET, Node.js, Python, Go, Java |

## When to Use This Skill

USE FOR: deploy agent to foundry, push agent to foundry, ship my agent, build and deploy container agent, deploy hosted agent, create hosted agent, deploy prompt agent, start agent container, stop agent container, ACR build, container image for agent, docker build for foundry, redeploy agent, update agent deployment, clone agent, delete agent, azd deploy hosted agent, azd ai agent, azd up for agent, deploy agent with azd.

> โš ๏ธ **DO NOT manually run** `azd up`, `azd deploy`, `az acr build`, `docker build`, `agent_update`, or `agent_container_control` **without reading this skill first.** This skill orchestrates the full deployment pipeline: project scan โ†’ env var collection โ†’ Dockerfile generation โ†’ image build โ†’ agent creation โ†’ container startup โ†’ verification. Running CLI commands or calling MCP tools individually skips critical steps (env var confirmation, schema validation, status polling).

## MCP Tools

| Tool | Description | Parameters |
|------|-------------|------------|
| `agent_definition_schema_get` | Get JSON schema for agent definitions | `projectEndpoint` (required), `schemaType` (`prompt`, `hosted`, `tools`, `all`) |
| `agent_update` | Create, update, or clone an agent | `projectEndpoint`, `agentName` (required); `agentDefinition` (JSON), `isCloneRequest`, `cloneTargetAgentName`, `modelName`, `creationOptions` (JSON with `description` and `metadata`) |
| `agent_get` | List all agents or get a specific agent | `projectEndpoint` (required), `agentName` (optional) |
| `agent_delete` | Delete an agent with container cleanup | `projectEndpoint`, `agentName` (required) |
| `agent_container_control` | Start or stop a hosted agent container | `projectEndpoint`, `agentName`, `action` (`start`/`stop`) (required); `agentVersion`, `minReplicas`, `maxReplicas` |
| `agent_container_status_get` | Check container running status | `projectEndpoint`, `agentName` (required); `agentVersion` |

## Workflow: Hosted Agent Deployment
There are two types of hosted agent: ACA-based and vNext. The deployment flow differs in only one step for vNext, indicated below. Use the vNext experience only when the user explicitly asks to deploy to vNext (or v2, v-next, or similar); for all other cases, use the ACA-based deployment flow.

### Step 1: Detect and Scan Project

Get the project path from the project context (see Common: Project Context Resolution). Detect the project type by checking for these files:

| Project Type | Detection Files |
|--------------|-----------------|
| .NET | `*.csproj`, `*.fsproj` |
| Node.js | `package.json` |
| Python | `requirements.txt`, `pyproject.toml`, `setup.py` |
| Go | `go.mod` |
| Java (Maven) | `pom.xml` |
| Java (Gradle) | `build.gradle` |

Delegate an environment variable scan to a sub-agent. Provide the project path and project type. Search source files for these patterns:

| Project Type | Patterns to Search |
|--------------|--------------------|
| .NET (`*.cs`) | `Environment.GetEnvironmentVariable("...")`, `configuration["..."]`, `configuration.GetValue<T>("...")` |
| Node.js (`*.js`, `*.ts`, `*.mjs`) | `process.env.VAR_NAME`, `process.env["..."]` |
| Python (`*.py`) | `os.environ["..."]`, `os.environ.get("...")`, `os.getenv("...")` |
| Go (`*.go`) | `os.Getenv("...")`, `os.LookupEnv("...")` |
| Java (`*.java`) | `System.getenv("...")`, `@Value("${...}")` |

Classification: if the variable access is followed by a throw/error → required; if followed by a fallback value → optional with default; otherwise → assume required and ask the user.
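A minimal scanner for the Python row of the patterns table, as a sketch of what the sub-agent's search might look like (the regex covers the three listed access forms only):

```python
import re

# Matches os.environ["..."], os.environ.get("..."), and os.getenv("...").
ENV_PATTERNS = re.compile(
    r'os\.environ\[\s*["\']([A-Za-z0-9_]+)["\']\s*\]'
    r'|os\.environ\.get\(\s*["\']([A-Za-z0-9_]+)["\']'
    r'|os\.getenv\(\s*["\']([A-Za-z0-9_]+)["\']'
)

def scan(source: str) -> set:
    """Return the set of environment variable names referenced in source."""
    return {m.group(1) or m.group(2) or m.group(3)
            for m in ENV_PATTERNS.finditer(source)}

sample = """
endpoint = os.environ["FOUNDRY_PROJECT_ENDPOINT"]
model = os.getenv("MODEL_DEPLOYMENT_NAME", "gpt-4o")  # fallback: optional
"""
print(sorted(scan(sample)))
```

Classification (required vs. optional) still needs the surrounding context, e.g. whether a fallback argument or a raise follows the access.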

### Step 2: Collect and Confirm Environment Variables

> โš ๏ธ **Warning:** Environment variables are included in the agent payload and are difficult to change after deployment.

Use azd environment values from the project context to pre-fill discovered variables. Merge with any user-provided values. Present all variables to the user for confirmation with variable name, value, and source (`azd`, `project default`, or `user`). Mask sensitive values.

Loop until the user confirms or cancels:
- `yes` → Proceed
- `VAR_NAME=new_value` → Update the value, show updated table, ask again
- `cancel` → Abort deployment

### Step 3: Generate Dockerfile and Build Image

Delegate Dockerfile creation to a sub-agent. Guidelines:
- Use official base image for the detected language and runtime version
- Use multi-stage builds for compiled languages
- Use Alpine or slim variants for smaller images
- Always target `linux/amd64` platform
- Expose the correct port (usually 8088)

> 💡 **Tip:** Reference [Hosted Agents Foundry Samples](https://github.com/microsoft-foundry/foundry-samples/tree/main/samples/python/hosted-agents) for containerized agent examples.

Also generate `docker-compose.yml` and `.env` files for local development.

**IMPORTANT**: You MUST always generate the image tag from the current timestamp (e.g., `myagent:202401011230`) to ensure uniqueness and avoid conflicts with existing images in ACR. DO NOT use static tags like `latest` or `v1`.
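As a sketch, a helper that derives such a tag (UTC timestamp in `yyyyMMddHHmm` form, matching the example above; the function name is illustrative):

```python
from datetime import datetime, timezone

def unique_image_tag(repository: str) -> str:
    # A UTC timestamp keeps tags unique per minute and naturally sortable,
    # avoiding collisions with images already pushed to ACR.
    stamp = datetime.now(timezone.utc).strftime("%Y%m%d%H%M")
    return f"{repository}:{stamp}"
```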

Collect ACR details from project context. Let the user choose the build method:

**Cloud Build (ACR Tasks, recommended)** — no local Docker required:
```bash
az acr build --registry <acr-name> --image <repository>:<tag> --platform linux/amd64 --source-acr-auth-id "[caller]" --file Dockerfile .
```

**Local Docker Build:**
```bash
docker build --platform linux/amd64 -t <image>:<tag> -f Dockerfile .
az acr login --name <acr-name>
docker tag <image>:<tag> <acr-name>.azurecr.io/<repository>:<tag>
docker push <acr-name>.azurecr.io/<repository>:<tag>
```

> 💡 **Tip:** Prefer Cloud Build if Docker is not available locally. On Windows with WSL, prefix Docker commands with `wsl -e` if `docker info` fails but `wsl -e docker info` succeeds.

### Step 4: Collect Agent Configuration

Use the project endpoint and ACR name from the project context. Ask the user only for values not already resolved:
- **Agent name** — Unique name for the agent
- **Model deployment** — Model deployment name (e.g., `gpt-4o`)

### Step 5: Get Agent Definition Schema

Use `agent_definition_schema_get` with `schemaType: hosted` to retrieve the current schema and validate required fields.

### Step 6: Create the Agent

> **VNext Experience:** You MUST pass `enableVnextExperience = true` in the `metadata` field of `creationOptions`. This is required for vNext deployments.

Use `agent_update` with the agent definition:

For an ACA-based agent:
```json
{
  "kind": "hosted",
  "image": "<acr-name>.azurecr.io/<repository>:<tag>",
  "cpu": "<cpu-cores>",
  "memory": "<memory>",
  "container_protocol_versions": [
    { "protocol": "<protocol>", "version": "<version>" }
  ],
  "environment_variables": { "<var>": "<value>" }
}
```

For a vNext agent:
```json
{
   "agentDefinition": {
    "kind": "hosted",
    "image": "<acr-name>.azurecr.io/<repository>:<tag>",
    "cpu": "<cpu-cores>",
    "memory": "<memory>",
    "container_protocol_versions": [
      { "protocol": "<protocol>", "version": "<version>" }
    ],
    "environment_variables": { "<var>": "<value>" }
  },
  "creationOptions": {
    "metadata": {
      "enableVnextExperience": "true"
    }
  }
}
```

### Step 7: Start Agent Container

Use `agent_container_control` with `action: start` to start the container.

### Step 8: Verify Agent Status

Delegate status polling to a sub-agent. Provide the project endpoint, agent name, and instruct it to use `agent_container_status_get` repeatedly until the status is `Running` or `Failed`.

**Container status values:**
- `Starting` — Container is initializing
- `Running` — Container is active and ready ✅
- `Stopped` — Container has been stopped
- `Failed` — Container failed to start ❌
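The polling delegated to the sub-agent amounts to a loop like the sketch below, where `get_container_status` is a stand-in for the `agent_container_status_get` MCP call (timeout and interval values are assumptions):

```python
import time

def wait_for_agent(get_container_status, timeout_s=300, interval_s=10):
    """Poll until the container reports a terminal status.

    get_container_status: callable returning one of
    "Starting", "Running", "Stopped", "Failed".
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        status = get_container_status()
        if status in ("Running", "Failed", "Stopped"):
            return status  # terminal state reached
        time.sleep(interval_s)  # still "Starting"; wait and retry
    return "Timeout"
```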

### Step 9: Test the Agent

Read and follow the [invoke skill](../invoke/invoke.md) to send a test message and verify the agent responds correctly. DO NOT SKIP reading the invoke skill — it contains important information about how to format messages for hosted agents in the vNext experience.

> โš ๏ธ **DO NOT stop here.** Continue to Step 10 (Auto-Create Evaluators & Dataset). This step is mandatory after every successful deployment.

### Step 10: Auto-Create Evaluators & Dataset

Follow [After Deployment โ€” Auto-Create Evaluators & Dataset](#after-deployment--auto-create-evaluators--dataset) below.

## Workflow: Prompt Agent Deployment

### Step 1: Collect Agent Configuration

Use the project endpoint from the project context (see Common: Project Context Resolution). Ask the user only for values not already resolved:
- **Agent name** — Unique name for the agent
- **Model deployment** — Model deployment name (e.g., `gpt-4o`)
- **Instructions** — System prompt (optional)
- **Temperature** — Response randomness 0-2 (optional; default varies by model)
- **Tools** — Tool configurations (optional)

### Step 2: Get Agent Definition Schema

Use `agent_definition_schema_get` with `schemaType: prompt` to retrieve the current schema.

### Step 3: Create the Agent

Use `agent_update` with the agent definition:

```json
{
  "kind": "prompt",
  "model": "<model-deployment>",
  "instructions": "<system-prompt>",
  "temperature": 0.7
}
```

### Step 4: Test the Agent

Read and follow the [invoke skill](../invoke/invoke.md) to send a test message and verify the agent responds correctly.

> โš ๏ธ **DO NOT stop here.** Continue to Step 5 (Auto-Create Evaluators & Dataset). This step is mandatory after every successful deployment.

### Step 5: Auto-Create Evaluators & Dataset

Follow [After Deployment โ€” Auto-Create Evaluators & Dataset](#after-deployment--auto-create-evaluators--dataset) below.

## Display Agent Information
Once deployment completes for either a hosted or a prompt agent, display the agent's details in a clearly formatted table.

Below the table you MUST also display a Playground link for direct access to the agent in Azure AI Foundry:

[Open in Playground](https://ai.azure.com/nextgen/r/{encodedSubId},{resourceGroup},,{accountName},{projectName}/build/agents/{agentName}/build?version={agentVersion})

To calculate `encodedSubId`, convert the subscription ID to its 16-byte GUID representation, then encode those bytes as URL-safe Base64 with the `=` padding trimmed. The following Python one-liner performs the conversion:

```bash
python -c "import base64,uuid;print(base64.urlsafe_b64encode(uuid.UUID('<SUBSCRIPTION_ID>').bytes).rstrip(b'=').decode())"
```
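The same conversion as a reusable function (the function name is illustrative):

```python
import base64
import uuid

def encoded_sub_id(subscription_id: str) -> str:
    # 16-byte GUID -> URL-safe Base64 with '=' padding trimmed (22 chars).
    raw = uuid.UUID(subscription_id).bytes
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()
```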

## Document Deployment Context

After a successful deployment, persist the deployment context to `<agent-root>/.foundry/agent-metadata.yaml` under the selected environment so future conversations (evaluation, trace analysis, monitoring) can reuse it automatically. See [Agent Metadata Contract](../../references/agent-metadata-contract.md) for the canonical schema.

| Metadata Field | Purpose | Example |
|----------------|---------|---------|
| `environments.<env>.projectEndpoint` | Foundry project endpoint | `https://<account>.services.ai.azure.com/api/projects/<project>` |
| `environments.<env>.agentName` | Deployed agent name | `my-support-agent` |
| `environments.<env>.azureContainerRegistry` | ACR resource (hosted agents) | `myregistry.azurecr.io` |
| `environments.<env>.testCases[]` | Evaluation bundles for datasets, evaluators, and thresholds | `smoke-core`, `trace-regressions` |
| `environments.<env>.testCases[].datasetUri` | Remote Foundry dataset URI for shared eval workflows | `azureml://datastores/.../paths/...` |

If `agent-metadata.yaml` already exists, merge the selected environment instead of overwriting other environments or cached test cases without confirmation.

## After Deployment — Auto-Create Evaluators & Dataset

> โš ๏ธ **This step is automatic.** After a successful deployment, immediately prepare the selected `.foundry` environment for evaluation without waiting for the user to request it. This matches the eval-driven optimization loop.

### 1. Read Agent Instructions

Use **`agent_get`** (or local `agent.yaml`) to understand the agent's purpose and capabilities.

### 2. Reuse or Refresh Local Cache

Inspect the selected agent root before generating anything new:

- Reuse `.foundry/evaluators/` and `.foundry/datasets/` when they already contain the right assets for the selected environment.
- Ask before refreshing cached files or replacing thresholds.
- If cache is missing or stale, regenerate the dataset/evaluators and update metadata for the active environment only.

### 2.5 Discover Existing Evaluators

Use **`evaluator_catalog_get`** with the selected environment's project endpoint to list all evaluators already registered in the project. Display them to the user grouped by type (`custom` vs `built-in`) with name, category, and version. During Phase 1, catalog any promising custom evaluators for later reuse, but keep the first run on the built-in baseline. Only propose creating a new evaluator in Phase 2 when no existing evaluator covers the required dimension.

### 3. Select Default Evaluators

Follow the [observe skill's Two-Phase Evaluator Strategy](../observe/observe.md). Phase 1 is built-in only, so do not create a new custom evaluator during the initial setup pass.

Start with at most 5 built-in evaluators for the initial eval run so the first pass stays fast:

| Category | Evaluators |
|----------|-----------|
| **Quality (built-in)** | relevance, task_adherence, intent_resolution |
| **Safety (built-in)** | indirect_attack |
| **Tool use (built-in, conditional)** | tool_call_accuracy (use when the agent calls tools; some catalogs label it as `builtin.tool_call_accuracy`) |

After analyzing initial results, suggest additional evaluators (custom or built-in) targeted at specific failure patterns instead of front-loading a larger default set.

If Phase 2 is needed, call `evaluator_catalog_get` again to reuse an existing custom evaluator first. Only create a new custom evaluator when the catalog still lacks the required signal, and prefer prompt templates that consume `expected_behavior` for per-query behavioral scoring.

### 4. Identify LLM-Judge Deployment

Use **`model_deployment_get`** to list the selected project's actual model deployments, then choose one that supports chat completions for quality evaluators. Do **not** assume `gpt-4o` exists in the project. If no deployment supports chat completions, stop the auto-setup flow and tell the user quality evaluators cannot run until a compatible judge deployment is available.

### 5. Generate Seed Dataset

> โš ๏ธ **MANDATORY: Read the full generation workflow before proceeding.**

Read and follow [Generate Seed Evaluation Dataset](../eval-datasets/references/generate-seed-dataset.md). That reference contains:
- The required JSONL row schema (`query` + `expected_behavior` are both mandatory)
- Coverage distribution targets and generation rules
- Generation requirements that keep rows valid by construction (valid JSON, required fields, coverage targets, and minimum row count)
- Foundry registration steps (blob upload + `evaluation_dataset_create`)
- Metadata updates for `agent-metadata.yaml` and `manifest.json`

Do NOT skip the `expected_behavior` field. The generation reference handles the complete flow from query generation through Foundry registration.

The local filename must start with the selected environment's Foundry agent name (`agentName` in `agent-metadata.yaml`) before adding stage, environment, or version suffixes.

Use [Generate Seed Evaluation Dataset](../eval-datasets/references/generate-seed-dataset.md) as the single source of truth for seed dataset registration. It covers `project_connection_list` with `AzureStorageAccount`, key-based versus AAD upload, `evaluation_dataset_create` with `connectionName`, and saving the returned `datasetUri`.

### 6. Persist Artifacts and Test Cases

Save evaluator definitions, local datasets, and evaluation outputs under `.foundry/`, then register or update test cases in `agent-metadata.yaml` for the selected environment:

```text
.foundry/
  agent-metadata.yaml
  evaluators/
    <name>.yaml
  datasets/
    <agent-name>-eval-seed-v1.jsonl
  results/
```

Each test case should bundle one dataset with the evaluator list, thresholds, and a priority tag (`P0`, `P1`, or `P2`). Persist the local `datasetFile` and remote `datasetUri` together, and seed exactly one `P0` smoke test case after deployment.
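As an illustration, a `P0` smoke test case in `agent-metadata.yaml` might look like the fragment below. Field names beyond those listed in the metadata table above are assumptions; the Agent Metadata Contract remains the canonical schema:

```yaml
environments:
  dev:
    agentName: my-support-agent
    testCases:
      - name: smoke-core
        priority: P0
        datasetFile: .foundry/datasets/my-support-agent-eval-seed-v1.jsonl
        datasetUri: azureml://datastores/.../paths/...
        evaluators: [relevance, task_adherence, intent_resolution]
        thresholds:
          relevance: 3.5
```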

### 7. Prompt User

*"Your agent is deployed and running in the selected environment. The `.foundry` cache now contains evaluators, a local seed dataset, the Foundry dataset registration metadata, and test-case metadata. Would you like to run an evaluation to identify optimization opportunities?"*

- **Yes** → follow the [observe skill](../observe/observe.md) starting at **Step 2 (Evaluate)** — cache and metadata are already prepared.
- **No** → stop. The user can return later.
- **Production trace analysis** → follow the [trace skill](../trace/trace.md) to search conversations, diagnose failures, and analyze latency using App Insights.

## Agent Definition Schemas

### Prompt Agent

| Property | Type | Required | Description |
|----------|------|----------|-------------|
| `kind` | string | ✅ | Must be `"prompt"` |
| `model` | string | ✅ | Model deployment name (e.g., `gpt-4o`) |
| `instructions` | string | | System message for the model |
| `temperature` | number | | Response randomness (0-2) |
| `top_p` | number | | Nucleus sampling (0-1) |
| `tools` | array | | Tools the model may call |
| `tool_choice` | string/object | | Tool selection strategy |
| `rai_config` | object | | Responsible AI configuration |

### Hosted Agent

| Property | Type | Required | Description |
|----------|------|----------|-------------|
| `kind` | string | ✅ | Must be `"hosted"` |
| `image` | string | ✅ | Container image URL |
| `cpu` | string | ✅ | CPU allocation (e.g., `"0.5"`, `"1"`, `"2"`) |
| `memory` | string | ✅ | Memory allocation (e.g., `"1Gi"`, `"2Gi"`) |
| `container_protocol_versions` | array | ✅ | Protocol and version pairs |
| `environment_variables` | object | | Key-value pairs for container env vars |
| `tools` | array | | Tool configurations |
| `rai_config` | object | | Responsible AI configuration |

> **Reminder:** Always pass `creationOptions.metadata.enableVnextExperience: "true"` when creating vNext hosted agents.

### Container Protocols

| Protocol | Description |
|----------|-------------|
| `a2a` | Agent-to-Agent protocol |
| `responses` | OpenAI Responses API |
| `mcp` | Model Context Protocol |

## Agent Management Operations

### Clone an Agent

Use `agent_update` with `isCloneRequest: true` and `cloneTargetAgentName` to create a copy. For prompt agents, optionally override the model with `modelName`.

### Delete an Agent

Use `agent_delete` — automatically cleans up containers for hosted agents.

### List Agents

Use `agent_get` without `agentName` to list all agents, or with `agentName` to get a specific agent's details.

## Error Handling

| Error | Cause | Resolution |
|-------|-------|------------|
| Project type not detected | No known project files found | Ask user to specify project type manually |
| Docker not running | Docker Desktop not started or not installed | Start Docker Desktop, or use Cloud Build (ACR Tasks) instead |
| ACR login failed | Not authenticated to Azure | Run `az login` first, then `az acr login --name <acr-name>` |
| Build/push failed | Dockerfile errors or insufficient ACR permissions | Check Dockerfile syntax, verify Contributor or AcrPush role on registry |
| Agent creation failed | Invalid definition or missing required fields | Use `agent_definition_schema_get` to verify schema, check all required fields |
| Container start failed | Image not accessible or invalid configuration | Verify ACR image path, check cpu/memory values, confirm ACR permissions |
| Container status: Failed | Runtime error in container | Check container logs, verify environment variables, ensure image runs correctly |
| Permission denied | Insufficient Foundry project permissions | Verify Azure AI Owner or Contributor role on the project |
| Schema fetch failed | Invalid project endpoint | Verify project endpoint URL format: `https://<resource>.services.ai.azure.com/api/projects/<project>` |

## Non-Interactive / YOLO Mode

When running in non-interactive mode (e.g., `nonInteractive: true` or YOLO mode), the skill skips user confirmation prompts and uses sensible defaults:

- **Environment variables** — Uses values resolved from `azd env get-values` and project defaults without prompting for confirmation
- **Agent name** — Must be provided in the initial user message or derived sensibly from the project context; if missing, the skill fails with an error instead of prompting
- **Container lifecycle** — Automatically starts the container and polls for `Running` status without user confirmation

> โš ๏ธ **Warning:** In non-interactive mode, ensure all required values (project endpoint, agent name, ACR image) are provided upfront in the user message or available via `azd env get-values`. Missing values will cause the deployment to fail rather than prompt.

## Additional Resources

- [Foundry Hosted Agents](https://learn.microsoft.com/azure/ai-foundry/agents/concepts/hosted-agents?view=foundry)
- [Foundry Agent Runtime Components](https://learn.microsoft.com/azure/ai-foundry/agents/concepts/runtime-components?view=foundry)
- [Foundry Samples](https://github.com/microsoft-foundry/foundry-samples/)
foundry-agent/eval-datasets/
eval-datasets.md 9.1 KB
# Evaluation Datasets — Trace-to-Dataset Pipeline & Lifecycle Management

Manage the full lifecycle of evaluation datasets for Foundry agents: harvesting production traces into local `.foundry` cache, curating versioned test datasets, tracking evaluation quality over time, and syncing approved updates back to Foundry when needed.

## When to Use This Skill

USE FOR: create dataset from traces, harvest traces into dataset, build test dataset, dataset versioning, version my dataset, tag dataset, pin dataset version, organize datasets, dataset splits, curate test cases, review trace candidates, evaluation trending, metrics over time, eval regression, regression detection, compare evaluations over time, dataset comparison, evaluation lineage, trace to dataset pipeline, annotation review, production traces to test cases.

> โš ๏ธ **DO NOT manually run** KQL queries to extract datasets or call `evaluation_dataset_create` **without reading this skill first.** This skill defines the correct trace extraction patterns, schema transformation, cache rules, versioning conventions, and quality gates that raw tools do not enforce.

> 💡 **Tip:** This skill complements the [observe skill](../observe/observe.md) (eval-driven optimization loop) and the [trace skill](../trace/trace.md) (production trace analysis). Use this skill when you need to bridge traces and evaluations: turning production data into test cases and tracking evaluation quality over time.

## Quick Reference

| Property | Value |
|----------|-------|
| MCP server | `azure` |
| Key MCP tools | `evaluation_dataset_create`, `evaluation_dataset_get`, `evaluation_dataset_versions_get`, `evaluation_get`, `evaluation_comparison_create`, `evaluation_comparison_get` |
| Storage tools | `project_connection_list` (discover `AzureStorageAccount` connection), `project_connection_create` (add storage connection) |
| Azure services | Application Insights (via `monitor_resource_log_query`), Azure Blob Storage (dataset sync) |
| Prerequisites | Agent deployed, `.foundry/agent-metadata.yaml` available, App Insights connected |
| Local cache | `.foundry/datasets/`, `.foundry/results/`, `.foundry/evaluators/` |

## Entry Points

| User Intent | Start At |
|-------------|----------|
| "Create dataset from production traces" / "Harvest traces" | [Trace-to-Dataset Pipeline](references/trace-to-dataset.md) |
| "Version my dataset" / "Tag dataset" / "Pin dataset version" | [Dataset Versioning](references/dataset-versioning.md) |
| "Organize my datasets" / "Dataset splits" / "Filter datasets" | [Dataset Organization](references/dataset-organization.md) |
| "Review trace candidates" / "Curate test cases" | [Dataset Curation](references/dataset-curation.md) |
| "Show eval metrics over time" / "Evaluation trending" | [Eval Trending](references/eval-trending.md) |
| "Did my agent regress?" / "Regression detection" | [Eval Regression](references/eval-regression.md) |
| "Compare datasets" / "Experiment comparison" / "A/B test" | [Dataset Comparison](references/dataset-comparison.md) |
| "Sync dataset to Foundry" / "Refresh local dataset cache" | [Trace-to-Dataset Pipeline -> Step 5](references/trace-to-dataset.md#step-5--sync-local-cache-with-foundry-optional) |
| "Trace my evaluation lineage" / "Audit eval history" | [Eval Lineage](references/eval-lineage.md) |
| "Generate eval dataset" / "Create seed dataset" / "Generate test cases for my agent" | [Generate Seed Dataset](references/generate-seed-dataset.md) |

## Before Starting — Detect Current State

1. Resolve the target agent root and environment from `.foundry/agent-metadata.yaml`.
2. Confirm the selected environment's `projectEndpoint`, `agentName`, and observability settings.
3. Check `.foundry/datasets/` for existing datasets, `.foundry/results/` for evaluation history, and `.foundry/datasets/manifest.json` for lineage.
4. Check whether `evaluation_dataset_get` returns server-side datasets for the same environment.
5. Route to the appropriate entry point based on user intent.

## The Foundry Flywheel

```text
Production Agent -> [1] Trace (App Insights + OTel)
                -> [2] Harvest (KQL extraction)
                -> [3] Curate (human review)
                -> [4] Dataset Cache (.foundry/datasets, versioned)
                -> [5] Sync to Foundry (optional refresh/push)
                -> [6] Evaluate (batch eval)
                -> [7] Analyze (trending + regression)
                -> [8] Compare (agent versions OR dataset versions)
                -> [9] Deploy -> back to [1]
```

Each cycle makes the test suite harder and more representative. Production failures from release N become regression tests for release N+1.

## Behavioral Rules

1. **Always show KQL queries.** Before executing any trace extraction query, display it in a code block. Never run queries silently.
2. **Scope to time ranges.** Always include a time range in KQL queries (default: last 7 days for trace harvesting). Ask the user for the range if not specified.
3. **Require human review.** Never auto-commit harvested traces to a dataset without showing candidates to the user first. The curation step is mandatory.
4. **Use dataset naming conventions.** Follow the naming conventions below and keep local filenames aligned with the registered Foundry dataset name/version.
5. **Treat local files as cache.** Reuse `.foundry/datasets/` and `.foundry/evaluators/` when they already match the selected environment. Offer refresh when the user asks or when remote state has changed.
6. **Persist artifacts.** Save datasets to `.foundry/datasets/`, evaluation results to `.foundry/results/`, and track lineage in `.foundry/datasets/manifest.json`.
7. **Keep test cases aligned.** Update the selected environment's `testCases[]` in `agent-metadata.yaml` whenever a dataset version, evaluator set, or threshold bundle changes.
8. **Confirm before overwriting.** If a dataset version or cache file already exists, warn the user and ask for confirmation before replacing or refreshing it.
9. **Sync to Foundry when requested or needed.** After saving datasets locally, refresh or register them in Foundry only when the user asks or the workflow needs shared/CI usage.
10. **Never remove dataset rows or weaken evaluators to recover scores.** Score drops after a dataset update are expected - harder tests expose real gaps. Optimize the agent for new failure patterns; do not shrink the test suite.
11. **Match eval parameter names exactly.** Use `evaluationId` when creating grouped runs, but use `evalId` for `evaluation_get` and comparison/trending lookups.

## Dataset Naming and Metadata Conventions

| Dataset type | Foundry dataset name | Foundry dataset version | Typical local file | Metadata stage |
|--------------|----------------------|-------------------------|--------------------|----------------|
| Seed dataset | `<agent-name>-eval-seed` | `v1` | `.foundry/datasets/<agent-name>-eval-seed-v1.jsonl` | `seed` |
| Trace-harvested dataset | `<agent-name>-traces` | `v<N>` | `.foundry/datasets/<agent-name>-traces-v<N>.jsonl` | `traces` |
| Curated/refined dataset | `<agent-name>-curated` | `v<N>` | `.foundry/datasets/<agent-name>-curated-v<N>.jsonl` | `curated` |
| Production-ready dataset | `<agent-name>-prod` | `v<N>` | `.foundry/datasets/<agent-name>-prod-v<N>.jsonl` | `prod` |

Here `<agent-name>` means the selected environment's `environments.<env>.agentName` from `agent-metadata.yaml`. If that deployed agent name already includes the environment (for example, `support-agent-dev`), do **not** append the environment key a second time.

Local dataset filenames must start with the selected Foundry agent name (`environments.<env>.agentName` in `agent-metadata.yaml`). Put stage and version suffixes **after** that prefix so cache files sort and group by agent first.

Keep the Foundry dataset name stable across versions. Store the version only in `datasetVersion` (or manifest `version`) using the `v<N>` format, while local filenames keep the `-v<N>` suffix for cache readability.
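The naming conventions above can be sketched as a helper (illustrative only; not part of any MCP surface):

```python
def dataset_names(agent_name: str, stage: str, version: int):
    """Return (foundry_dataset_name, dataset_version, local_filename).

    The Foundry dataset name stays stable across versions; the version
    lives in `datasetVersion` (v<N>), while the local cache file keeps
    a -v<N> suffix for readability.
    """
    suffix = {"seed": "eval-seed", "traces": "traces",
              "curated": "curated", "prod": "prod"}[stage]
    name = f"{agent_name}-{suffix}"
    ver = f"v{version}"
    return name, ver, f".foundry/datasets/{name}-{ver}.jsonl"
```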

Required metadata to track with every registered dataset:

- `agent`: the agent name (for example, `hosted-agent-051-001`)
- `stage`: `seed`, `traces`, `curated`, or `prod`
- `version`: version string such as `v1`, `v2`, or `v3`
- `datasetUri`: always persist the Foundry dataset URI in `agent-metadata.yaml` alongside the local `datasetFile`, dataset name, and version

> 💡 **Tip:** `evaluation_dataset_create` does not expose a first-class `tags` parameter in the current MCP surface. Persist `agent`, `stage`, and `version` in local metadata (`agent-metadata.yaml` and `.foundry/datasets/manifest.json`) so Foundry-side references stay aligned with the cache.

## Related Skills

| User Intent | Skill |
|-------------|-------|
| "Run an evaluation" / "Optimize my agent" | [observe skill](../observe/observe.md) |
| "Search traces" / "Analyze failures" / "Latency analysis" | [trace skill](../trace/trace.md) |
| "Find eval scores for a response ID" / "Link eval results to traces" | [trace skill -> Eval Correlation](../trace/references/eval-correlation.md) |
| "Deploy my agent" | [deploy skill](../deploy/deploy.md) |
| "Debug container issues" | [troubleshoot skill](../troubleshoot/troubleshoot.md) |
| "Review metadata schema" | [Agent Metadata Contract](../../references/agent-metadata-contract.md) |
foundry-agent/eval-datasets/references/
dataset-comparison.md 4.6 KB
# Dataset Comparison — A/B Testing Across Dataset Versions

Run structured experiments that compare how an agent performs across different dataset versions, and present results as leaderboards with per-evaluator breakdowns. Use this to answer: "Did scores drop because of harder tests or agent regression?"

## Experiment Structure

An experiment consists of:
1. **Pinned agent version** — the same agent evaluated on each dataset
2. **Varied dataset versions** — the versions being compared
3. **Same evaluators** — applied consistently across all runs
4. **Comparison results** — which dataset version the agent performs better on

## Step 1 — Define the Experiment

| Parameter | Value | Example |
|-----------|-------|---------|
| Agent | Pinned agent version | `v3` |
| Baseline dataset | Previous dataset version | `support-bot-prod-traces-v2` |
| Treatment dataset(s) | New dataset version(s) | `support-bot-prod-traces-v3` |
| Evaluators | Same set for all runs | coherence, fluency, relevance, intent_resolution, task_adherence |

## Step 2 — Run Evaluations

For each dataset version, run **`evaluation_agent_batch_eval_create`** with:
- Same `evaluationId` (groups all runs for comparison)
- Same `agentVersion`
- Same `evaluatorNames`
- Different `inputData` (from each dataset version)

> **Important:** Use `evaluationId` on `evaluation_agent_batch_eval_create` to group runs. After the runs exist, switch to `evalId` for `evaluation_get` and `evaluation_comparison_create`.

> โš ๏ธ **Eval-group immutability:** Keep the evaluator set and thresholds fixed within one evaluation group. If you need to change evaluators or thresholds, create a new evaluation group instead of reusing the previous `evaluationId`.

> โš ๏ธ **Score drops are expected.** When comparing v1โ†’v2 datasets, lower scores on the new dataset likely mean the new test cases are harder (better coverage), not that the agent regressed. **Do NOT remove dataset rows or weaken evaluators to recover scores.** Instead, optimize the agent for the new failure patterns, then re-evaluate.

## Step 3 — Compare Results

Use **`evaluation_comparison_create`** with the baseline and treatment runs:

```json
{
  "insightRequest": {
    "displayName": "Dataset comparison: traces-v2 vs traces-v3 on agent-v3",
    "state": "NotStarted",
    "request": {
      "type": "EvaluationComparison",
      "evalId": "<eval-group-id>",
      "baselineRunId": "<traces-v2-run-id>",
      "treatmentRunIds": ["<traces-v3-run-id>"]
    }
  }
}
```

> โš ๏ธ **Common mistake:** `evaluation_comparison_create` uses `insightRequest.request.evalId`, not `evaluationId`, even when the runs were originally grouped with `evaluationId`.

## Step 4 — Leaderboard

Present results as a leaderboard table:

| Evaluator | traces-v2 (baseline) | traces-v3 | Effect |
|-----------|:---:|:---:|:---:|
| Coherence | 4.0 | 3.6 | ⚠️ Lower |
| Fluency | 4.5 | 4.3 | ⚠️ Lower |
| Relevance | 3.6 | 3.2 | ⚠️ Lower |
| Intent Resolution | 4.1 | 3.7 | ⚠️ Lower |
| Task Adherence | 3.9 | 3.4 | ⚠️ Lower |

### Recommendation

If scores drop uniformly across all evaluators, the new dataset is likely harder:

*"Agent v3 scores dropped on traces-v3 across all evaluators. traces-v3 added 15 edge-case queries from production failures. This is expected — optimize the agent for the new failure patterns rather than reverting the dataset."*

## Pairwise A/B Comparison

For detailed pairwise analysis between exactly two dataset versions:

| Evaluator | Baseline (traces-v2) | Treatment (traces-v3) | Delta | p-value | Effect |
|-----------|:---:|:---:|:---:|:---:|:---:|
| Coherence | 4.0 ± 0.6 | 3.6 ± 0.9 | −0.4 | 0.03 | Degraded |
| Fluency | 4.5 ± 0.4 | 4.3 ± 0.5 | −0.2 | 0.12 | Inconclusive |
| Relevance | 3.6 ± 0.9 | 3.2 ± 1.1 | −0.4 | 0.04 | Degraded |

> 💡 **Tip:** The `evaluation_comparison_create` result includes `pValue` and `treatmentEffect` fields. Use `pValue < 0.05` as the threshold for statistical significance.

## Multi-Dataset Comparison

Compare how the same agent version performs across different datasets:

| Dataset | Coherence | Fluency | Relevance | Notes |
|---------|:---------:|:-------:|:---------:|-------|
| traces-v3 (prod) | 4.0 | 4.5 | 3.6 | Production-derived |
| synthetic-v2 | 4.3 | 4.6 | 4.1 | May overestimate quality |
| manual-v1 (curated) | 3.8 | 4.4 | 3.2 | Hardest test cases |

> โš ๏ธ **Warning:** Be cautious comparing scores across datasets with different structures (e.g., production traces vs synthetic). Differences may reflect dataset difficulty, not agent quality.

## Next Steps

- **Track trends over time** → [Eval Trending](eval-trending.md)
- **Check for regressions** → [Eval Regression](eval-regression.md)
- **Audit full lineage** → [Eval Lineage](eval-lineage.md)
dataset-curation.md 4.0 KB
# Dataset Curation — Human-in-the-Loop Review

Review, annotate, and approve harvested trace candidates before including them in evaluation datasets. This ensures dataset quality by adding a human review gate between raw trace extraction and finalized test cases.

## Workflow Overview

```
Raw Traces (from KQL harvest)
    │
    ▼
[1] Candidate File (unreviewed)
    │
    ▼
[2] Human Review (approve/edit/reject each)
    │
    ▼
[3] Approved Dataset (versioned, ready for eval)
```

## Step 1 โ€” Generate Candidate File

After running a [trace harvest](trace-to-dataset.md), save candidates with a `status` field:

```
.foundry/datasets/<agent-name>-traces-candidates-<date>.jsonl
```

Each line includes a review status:

```json
{"query": "How do I reset my password?", "response": "...", "status": "pending", "metadata": {"source": "trace", "conversationId": "conv-abc-123", "harvestRule": "error", "errorType": "TimeoutError", "duration": 12300}}
{"query": "What's the refund policy?", "response": "...", "status": "pending", "metadata": {"source": "trace", "conversationId": "conv-def-456", "harvestRule": "latency", "duration": 8700}}
```

## Step 2 โ€” Present for Review

Show candidates in a review table:

| # | Status | Query (preview) | Source | Error | Duration | Eval Score |
|---|--------|----------------|--------|-------|----------|------------|
| 1 | ⏳ pending | "How do I reset my..." | error harvest | TimeoutError | 12.3s | — |
| 2 | ⏳ pending | "What's the refund..." | latency harvest | — | 8.7s | — |
| 3 | ⏳ pending | "Can you help me..." | low-eval harvest | — | 0.4s | 2.0 |

### Review Actions

For each candidate, the user can:

| Action | Result |
|--------|--------|
| **Approve** | Include in dataset as-is |
| **Approve + Edit** | Include with modified query/response/ground_truth |
| **Add Ground Truth** | Approve and add the expected correct answer |
| **Reject** | Exclude from dataset |
| **Flag** | Mark for later review |

### Batch Operations

- *"Approve all"* — include all pending candidates
- *"Approve all errors"* — include all candidates from error harvest
- *"Reject duplicates"* — exclude candidates with similar queries to existing dataset entries
- *"Approve #1, #3, #5; reject #2, #4"* — selective approval by number
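The "Reject duplicates" action can be approximated with a simple string-similarity check. This is a sketch only; the 0.9 threshold and the use of `difflib` are assumptions, and a production workflow might prefer embedding-based similarity.

```python
# Sketch: flag candidates whose query closely matches an existing dataset
# entry. Threshold of 0.9 is an assumption, not a documented default.
from difflib import SequenceMatcher

def is_duplicate(query, existing_queries, threshold=0.9):
    """True if `query` is near-identical to any existing query."""
    return any(
        SequenceMatcher(None, query.lower(), q.lower()).ratio() >= threshold
        for q in existing_queries
    )

existing = ["How do I reset my password?"]
candidates = ["How do I reset my password?", "What's the refund policy?"]
kept = [q for q in candidates if not is_duplicate(q, existing)]
```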

## Step 3 โ€” Finalize Dataset

After review, filter approved candidates and save to a versioned dataset:

1. Read `.foundry/datasets/manifest.json` to find the latest version number
2. Filter candidates where `status == "approved"`
3. Remove the `status` field from the output
4. Save to `.foundry/datasets/<agent-name>-<source>-v<N>.jsonl`
5. Update `.foundry/datasets/manifest.json` with metadata
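Steps 2–4 above can be sketched as a small helper that keeps approved rows, strips the review `status` field, and writes the versioned JSONL. File names here are hypothetical examples following the naming convention; the demo runs in a temp directory.

```python
# Sketch of Step 3: filter approved candidates into a versioned dataset file.
import json
import os
import tempfile

def finalize(candidate_path, out_path):
    """Keep approved rows, drop the review `status` field, write JSONL."""
    approved = []
    with open(candidate_path) as f:
        for line in f:
            row = json.loads(line)
            if row.get("status") == "approved":
                row.pop("status")          # review field stays out of the dataset
                approved.append(row)
    with open(out_path, "w") as f:
        for row in approved:
            f.write(json.dumps(row) + "\n")
    return len(approved)

# Tiny demo with one approved and one rejected candidate
tmp = tempfile.mkdtemp()
cand = os.path.join(tmp, "candidates.jsonl")
out = os.path.join(tmp, "support-bot-prod-traces-v4.jsonl")  # hypothetical name
with open(cand, "w") as f:
    f.write('{"query": "q1", "status": "approved"}\n')
    f.write('{"query": "q2", "status": "rejected"}\n')
count = finalize(cand, out)
```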

### Update Candidate Status

Mark the candidate file with final statuses:

```json
{"query": "How do I reset my password?", "status": "approved", "ground_truth": "Navigate to Settings > Security > Reset Password", "metadata": {...}}
{"query": "What's the refund policy?", "status": "rejected", "rejectReason": "duplicate of existing test case", "metadata": {...}}
{"query": "Can you help me...", "status": "approved", "metadata": {...}}
```

> 💡 **Tip:** Keep candidate files as an audit trail. They document what was reviewed, when, and why items were accepted or rejected.

## Quality Checks

Before finalizing, verify dataset quality:

| Check | Criteria |
|-------|----------|
| **No duplicates** | Ensure no query appears in both the new dataset and existing datasets |
| **Balanced categories** | Verify reasonable distribution across categories (not all edge-cases) |
| **Ground truth coverage** | Flag examples without ground_truth that may benefit from one |
| **Minimum size** | Warn if dataset has fewer than 20 examples (may not be statistically meaningful) |
| **Safety coverage** | Ensure safety-related test cases are included if the agent handles sensitive topics |
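The checks above can be sketched as one report function over the parsed JSONL rows. The 20-example minimum comes from the table; the field names follow the metadata schema used throughout these docs, and the return shape is illustrative.

```python
# Sketch: run the quality checks from the table over a list of JSONL rows.
def quality_report(examples, existing_queries=()):
    """Return a dict summarizing the pre-finalization quality checks."""
    queries = [e["query"] for e in examples]
    return {
        "duplicates": sorted({q for q in queries if q in existing_queries}),
        "missing_ground_truth": sum(1 for e in examples if "ground_truth" not in e),
        "too_small": len(examples) < 20,   # minimum-size warning threshold
        "has_safety": any(
            e.get("metadata", {}).get("category") == "safety" for e in examples
        ),
    }

report = quality_report(
    [{"query": "q1", "metadata": {"category": "safety"}, "ground_truth": "a"},
     {"query": "q2", "metadata": {"category": "edge-case"}}],
    existing_queries=["q2"],
)
```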

## Next Steps

- **Version the approved dataset** → [Dataset Versioning](dataset-versioning.md)
- **Organize into splits** → [Dataset Organization](dataset-organization.md)
- **Run evaluation** → [observe skill Step 2](../../observe/references/evaluate-step.md)
dataset-organization.md 4.4 KB
# Dataset Organization — Metadata, Splits, and Filtered Evaluation

Organize datasets using metadata fields, create train/validation/test splits, and run targeted evaluations on dataset subsets. This addresses the need for hierarchical dataset organization without requiring rigid container structures.

## Metadata Schema

Add metadata to each JSONL example to enable filtering and organization:

| Field | Values | Purpose |
|-------|--------|---------|
| `category` | `edge-case`, `regression`, `happy-path`, `multi-turn`, `safety` | Test case classification |
| `source` | `trace`, `synthetic`, `manual`, `feedback` | How the example was created |
| `split` | `train`, `val`, `test` | Dataset split assignment |
| `priority` | `P0`, `P1`, `P2` | Severity/importance ranking |
| `harvestRule` | `error`, `latency`, `low-eval`, `combined` | Which harvest template captured it |
| `agentVersion` | `"1"`, `"2"`, etc. | Agent version when trace was captured |

### Example JSONL with Metadata

```json
{"query": "Reset my password", "ground_truth": "Navigate to Settings > Security > Reset Password", "metadata": {"category": "happy-path", "source": "manual", "split": "test", "priority": "P0"}}
{"query": "What happens if I delete my account while a refund is pending?", "metadata": {"category": "edge-case", "source": "trace", "split": "test", "priority": "P1", "harvestRule": "error"}}
{"query": "I want to harm myself", "ground_truth": "I'm concerned about your safety. Please contact...", "metadata": {"category": "safety", "source": "manual", "split": "test", "priority": "P0"}}
```

## Creating Splits

### Automatic Split Assignment

When creating a new dataset, assign splits based on rules:

| Rule | Split | Rationale |
|------|-------|-----------|
| First 70% of examples | `train` | Bulk of data for development |
| Next 15% of examples | `val` | Validation during optimization |
| Final 15% of examples | `test` | Held-out for final evaluation |
| All `priority: P0` examples | `test` | Critical cases always in test |
| All `category: safety` examples | `test` | Safety always evaluated |

### Manual Split Assignment

Users can assign splits during [curation](dataset-curation.md) or by editing the JSONL metadata directly.

## Filtered Evaluation Runs

Run evaluations on specific subsets of a dataset by filtering JSONL before passing to the evaluator.

### Filter by Split

```python
import json

# Read full dataset
with open(".foundry/datasets/support-bot-prod-traces-v3.jsonl") as f:
    examples = [json.loads(line) for line in f]

# Filter to test split only
test_examples = [e for e in examples if e.get("metadata", {}).get("split") == "test"]

# Pass test_examples as inputData to evaluation_agent_batch_eval_create
```

### Filter by Category

```python
# Only edge cases
edge_cases = [e for e in examples if e.get("metadata", {}).get("category") == "edge-case"]

# Only safety test cases
safety_cases = [e for e in examples if e.get("metadata", {}).get("category") == "safety"]

# Only P0 critical cases
p0_cases = [e for e in examples if e.get("metadata", {}).get("priority") == "P0"]
```

### Filter by Source

```python
# Only production trace-derived cases (most representative)
trace_cases = [e for e in examples if e.get("metadata", {}).get("source") == "trace"]

# Only manually curated cases (highest quality ground truth)
manual_cases = [e for e in examples if e.get("metadata", {}).get("source") == "manual"]
```

## Dataset Statistics

Generate summary statistics to understand dataset composition:

```python
from collections import Counter

categories = Counter(e.get("metadata", {}).get("category", "unknown") for e in examples)
sources = Counter(e.get("metadata", {}).get("source", "unknown") for e in examples)
splits = Counter(e.get("metadata", {}).get("split", "unassigned") for e in examples)
priorities = Counter(e.get("metadata", {}).get("priority", "none") for e in examples)
```

Present as a table:

| Dimension | Values | Count |
|-----------|--------|-------|
| **Category** | happy-path: 20, edge-case: 15, regression: 8, safety: 5, multi-turn: 10 | 58 total |
| **Source** | trace: 30, synthetic: 18, manual: 10 | 58 total |
| **Split** | train: 40, val: 9, test: 9 | 58 total |
| **Priority** | P0: 12, P1: 25, P2: 21 | 58 total |

## Next Steps

- **Run targeted evaluation** → [observe skill Step 2](../../observe/references/evaluate-step.md) (pass filtered `inputData`)
- **Compare splits** → [Dataset Comparison](dataset-comparison.md)
- **Track lineage** → [Eval Lineage](eval-lineage.md)
dataset-versioning.md 7.2 KB
# Dataset Versioning — Version Management & Tagging

Manage dataset versions with naming conventions, tagging, and version pinning for reproducible evaluations. This workflow formalizes dataset lifecycle management using existing MCP tools and local conventions.

## Naming Convention

Use the pattern `<agent-name>-<source>-v<N>`:

| Component | Values | Example |
|-----------|--------|---------|
| `<agent-name>` | Selected environment's `agentName` from `agent-metadata.yaml` | `support-bot-prod` |
| `<source>` | `traces`, `synthetic`, `manual`, `combined` | `traces` |
| `v<N>` | Incremental version number | `v3` |

`<agent-name>` already refers to the environment-specific deployed Foundry agent name. If that value includes the environment key, do **not** append the environment again.

**Full examples:**
- `support-bot-prod-traces-v1` — first production dataset from trace harvesting
- `support-bot-dev-synthetic-v2` — second synthetic dataset
- `support-bot-prod-combined-v5` — fifth production dataset combining traces + manual examples
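The convention can be sketched as a helper that derives the next version name from the manifest. The manifest shape matches the example later in this file; treat the field names as local conventions, not an API.

```python
# Sketch: derive the next <agent-name>-<source>-v<N> name from manifest.json.
def next_version_name(manifest, agent, source):
    """Return the next incremental version name for a dataset family."""
    family = f"{agent}-{source}"
    versions = [
        int(d["version"].lstrip("v"))
        for d in manifest.get("datasets", [])
        if d["name"] == family
    ]
    return f"{family}-v{max(versions, default=0) + 1}"

manifest = {"datasets": [
    {"name": "support-bot-prod-traces", "version": "v1"},
    {"name": "support-bot-prod-traces", "version": "v2"},
]}
name = next_version_name(manifest, "support-bot-prod", "traces")
```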

## Tagging Conventions

Tags are stored in `.foundry/datasets/manifest.json` alongside dataset metadata:

| Tag | Meaning | When to Apply |
|-----|---------|---------------|
| `baseline` | Reference dataset for comparison | When establishing a new evaluation baseline |
| `prod` | Dataset used for current production evaluation | After successful deployment |
| `canary` | Dataset for canary/staging evaluation | During staged rollout |
| `regression-<date>` | Dataset that caught a regression | When a regression is detected |
| `deprecated` | Dataset no longer in active use | When replaced by a newer version |

## Version Pinning

Pin evaluations to a specific dataset version to ensure reproducible, comparable results:

### Local Pinning (JSONL Datasets)

When using local JSONL files, reference the exact filename in evaluation runs:

```
.foundry/datasets/support-bot-prod-traces-v3.jsonl  ← pinned by filename
```

Pass the contents via `inputData` parameter in **`evaluation_agent_batch_eval_create`**.

### Server-Side Version Discovery

Use `evaluation_dataset_versions_get` to list all versions of a dataset registered in Foundry:

```
evaluation_dataset_versions_get(projectEndpoint, datasetName: "<agent-name>-<source>")
```

Use `evaluation_dataset_get` without a name to list all datasets in the project:

```
evaluation_dataset_get(projectEndpoint)
```

> 💡 **Tip:** Server-side versions are available after syncing via [Trace-to-Dataset → Step 5](trace-to-dataset.md#step-5--sync-local-cache-with-foundry-optional). Local `manifest.json` remains useful for lineage metadata (source, harvestRule, reviewedBy) not stored server-side.

## Manifest File

Track all dataset versions, required dataset metadata, tags, and lineage in `.foundry/datasets/manifest.json`:

```json
{
  "datasets": [
    {
      "name": "support-bot-prod-traces",
      "file": "support-bot-prod-traces-v1.jsonl",
      "version": "v1",
      "agent": "support-bot-prod",
      "stage": "traces",
      "datasetUri": "<foundry-dataset-uri-v1>",
      "tag": "deprecated",
      "source": "trace-harvest",
      "harvestRule": "error",
      "timeRange": "2025-01-01 to 2025-01-07",
      "exampleCount": 32,
      "createdAt": "2025-01-08T10:00:00Z",
      "evalRunIds": ["run-abc-123"]
    },
    {
      "name": "support-bot-prod-traces",
      "file": "support-bot-prod-traces-v2.jsonl",
      "version": "v2",
      "agent": "support-bot-prod",
      "stage": "traces",
      "datasetUri": "<foundry-dataset-uri-v2>",
      "tag": "baseline",
      "source": "trace-harvest",
      "harvestRule": "error+latency",
      "timeRange": "2025-01-15 to 2025-01-21",
      "exampleCount": 47,
      "createdAt": "2025-01-22T10:00:00Z",
      "evalRunIds": ["run-def-456", "run-ghi-789"]
    },
    {
      "name": "support-bot-prod-traces",
      "file": "support-bot-prod-traces-v3.jsonl",
      "version": "v3",
      "agent": "support-bot-prod",
      "stage": "traces",
      "datasetUri": "<foundry-dataset-uri-v3>",
      "tag": "prod",
      "source": "trace-harvest",
      "harvestRule": "error+latency+low-eval",
      "timeRange": "2025-02-01 to 2025-02-07",
      "exampleCount": 63,
      "createdAt": "2025-02-08T10:00:00Z",
      "evalRunIds": []
    }
  ]
}
```

Keep `stage` stable for the dataset family (`seed`, `traces`, `curated`, or `prod`) and use `tag` for mutable lifecycle labels such as `baseline`, `prod`, or `deprecated`. Persist `datasetUri` as the Foundry-returned dataset reference so deploy and observe workflows can resolve the registered dataset directly.
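Resolving "which dataset is currently `prod`?" from this manifest can be sketched as a tag lookup that returns the pinned file and its Foundry `datasetUri`. The field names follow the manifest example above and are conventions of this workflow, not a service API.

```python
# Sketch: resolve the dataset carrying a lifecycle tag (e.g. "prod") from the
# manifest, preferring the highest version if several entries share the tag.
def resolve_by_tag(manifest, tag):
    """Return {"file", "datasetUri"} for the tagged dataset, or None."""
    matches = [d for d in manifest["datasets"] if d.get("tag") == tag]
    if not matches:
        return None
    latest = max(matches, key=lambda d: int(d["version"].lstrip("v")))
    return {"file": latest["file"], "datasetUri": latest.get("datasetUri")}

manifest = {"datasets": [
    {"version": "v2", "file": "t-v2.jsonl", "tag": "baseline", "datasetUri": "uri-2"},
    {"version": "v3", "file": "t-v3.jsonl", "tag": "prod", "datasetUri": "uri-3"},
]}
pinned = resolve_by_tag(manifest, "prod")
```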

## Creating a New Version

1. **Check existing versions**: Read `.foundry/datasets/manifest.json` to find the latest version number
2. **Increment version**: Use `v<N+1>` as the new version
3. **Create dataset**: Via [Trace-to-Dataset](trace-to-dataset.md) or manual JSONL creation
4. **Update manifest**: Add the new entry with metadata
5. **Tag appropriately**: Apply `baseline`, `prod`, or other tags as needed
6. **Deprecate old**: Optionally mark previous versions as `deprecated`

> ⚠️ **DO NOT stop here.** After creating a new dataset version, continue to the Dataset Update Loop below.

## Dataset Update Loop — Eval → Analyze → Optimize → Re-Eval

When a dataset is updated (new rows, better coverage, new failure modes), run this loop to validate the agent against the harder test suite:

```
[1] Eval with new dataset (v2) using same agent version
    │
    ▼
[2] Compare: eval on v1 vs eval on v2 (same agent, different datasets)
    │
    ▼
[3] Analyze score changes — expect some drops (harder tests ≠ worse agent)
    │
    ▼
[4] Optimize agent prompt based on NEW failure patterns only
    │
    ▼
[5] Re-eval optimized agent on v2 dataset → compare to pre-optimization
    │
    ▼
[6] If satisfied → tag v2 as `prod`, archive v1
```

### ⛔ Guardrails for This Loop

- **Never remove dataset rows to recover scores.** If eval scores drop after a dataset update, the dataset is likely exposing real gaps. Removing hard cases defeats the purpose.
- **Never weaken evaluators to recover scores.** Do not lower thresholds, remove evaluators, or switch to easier scoring when scores drop on an expanded dataset.
- **Distinguish dataset difficulty from agent regression.** A score drop on a harder dataset is expected and healthy — it means test coverage improved. Only flag as regression when the same dataset + same evaluators produce worse scores on a new agent version.
- **Optimize for NEW failure patterns only.** When optimizing the agent prompt after a dataset update, target the newly added test cases. Do not re-optimize for cases that were already passing.

## Comparing Versions

To understand how a dataset evolved between versions:

```bash
# Count examples per version
wc -l .foundry/datasets/support-bot-prod-traces-v*.jsonl

# Diff example queries between versions
jq -r '.query' .foundry/datasets/support-bot-prod-traces-v2.jsonl | sort > /tmp/v2-queries.txt
jq -r '.query' .foundry/datasets/support-bot-prod-traces-v3.jsonl | sort > /tmp/v3-queries.txt
diff /tmp/v2-queries.txt /tmp/v3-queries.txt
```

## Next Steps

- **Organize into splits** → [Dataset Organization](dataset-organization.md)
- **Run evaluation with pinned version** → [observe skill Step 2](../../observe/references/evaluate-step.md)
- **Track lineage** → [Eval Lineage](eval-lineage.md)
eval-lineage.md 4.0 KB
# Eval Lineage — Full Traceability from Production to Deployment

Track the complete chain from production traces through dataset creation, evaluation runs, comparisons, and deployment decisions. Enables "why was this deployed?" audit queries and compliance reporting.

## Lineage Chain

```
Production Trace (App Insights)
    │ conversationId, responseId
    ▼
Dataset Version (.foundry/datasets/*.jsonl, environment-scoped)
    │ metadata.conversationId, metadata.harvestRule
    ▼
Evaluation Run (evaluation_agent_batch_eval_create)
    │ evaluationId when creating, evalId when querying, evalRunId
    ▼
Comparison (evaluation_comparison_create)
    │ insightId, baselineRunId, treatmentRunIds
    ▼
Deployment Decision (agent_update + agent_container_control)
    │ agentVersion
    ▼
Production Trace (cycle repeats)
```

## Lineage Manifest

Track lineage in `.foundry/datasets/manifest.json`:

```json
{
  "datasets": [
    {
      "name": "support-bot-prod-traces",
      "file": "support-bot-prod-traces-v3.jsonl",
      "version": "v3",
      "tag": "prod",
      "source": "trace-harvest",
      "harvestRule": "error+latency",
      "timeRange": "2025-02-01 to 2025-02-07",
      "exampleCount": 63,
      "createdAt": "2025-02-08T10:00:00Z",
      "evalRuns": [
        {
          "evalId": "eval-group-001",
          "runId": "run-abc-123",
          "agentVersion": "3",
          "date": "2025-02-08T12:00:00Z",
          "status": "completed"
        },
        {
          "evalId": "eval-group-001",
          "runId": "run-def-456",
          "agentVersion": "4",
          "date": "2025-02-10T09:00:00Z",
          "status": "completed"
        }
      ],
      "comparisons": [
        {
          "insightId": "insight-xyz-789",
          "baselineRunId": "run-abc-123",
          "treatmentRunIds": ["run-def-456"],
          "result": "v4 improved on 3/5 metrics",
          "date": "2025-02-10T10:00:00Z"
        }
      ],
      "deployments": [
        {
          "agentVersion": "4",
          "deployedAt": "2025-02-10T14:00:00Z",
          "reason": "v4 improved coherence +25%, relevance +10% vs v3"
        }
      ]
    }
  ]
}
```

## Audit Queries

### "Why was version X deployed?"

1. Read `.foundry/datasets/manifest.json`
2. Find entries where `deployments[].agentVersion == X`
3. Show the comparison that justified the deployment
4. Show the dataset and eval runs that informed the comparison
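The four steps above can be sketched as one lookup over the lineage manifest. Field names follow the manifest example earlier in this file; the return shape is illustrative.

```python
# Sketch: answer "why was agent version X deployed?" from manifest.json.
def why_deployed(manifest, agent_version):
    """Return the deployment record plus the comparisons that justified it."""
    for d in manifest["datasets"]:
        for dep in d.get("deployments", []):
            if dep["agentVersion"] == agent_version:
                # Runs of this agent version against this dataset
                run_ids = {
                    r["runId"] for r in d.get("evalRuns", [])
                    if r["agentVersion"] == agent_version
                }
                comps = [
                    c for c in d.get("comparisons", [])
                    if run_ids & set(c.get("treatmentRunIds", []))
                ]
                return {"dataset": d["file"], "reason": dep["reason"],
                        "comparisons": comps}
    return None

manifest = {"datasets": [{
    "file": "support-bot-prod-traces-v3.jsonl",
    "evalRuns": [
        {"runId": "run-abc-123", "agentVersion": "3"},
        {"runId": "run-def-456", "agentVersion": "4"},
    ],
    "comparisons": [{"insightId": "insight-xyz-789",
                     "baselineRunId": "run-abc-123",
                     "treatmentRunIds": ["run-def-456"],
                     "result": "v4 improved on 3/5 metrics"}],
    "deployments": [{"agentVersion": "4",
                     "reason": "v4 improved coherence +25%, relevance +10% vs v3"}],
}]}
answer = why_deployed(manifest, "4")
```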

### "What traces led to this dataset?"

1. Read the dataset JSONL file
2. Extract `metadata.conversationId` from each example
3. Look up each conversation in App Insights using the [trace skill](../../trace/trace.md)

### "What evaluation history does this agent have?"

1. Use **`evaluation_get`** to list all evaluation groups
2. For each group, list runs with `isRequestForRuns=true`
3. Build the timeline from [Eval Trending](eval-trending.md)
4. Show comparisons from **`evaluation_comparison_get`**

### "Did this dataset version catch any regressions?"

1. Find the dataset version in the manifest
2. Check `evalRuns` for runs that used this dataset
3. Check `comparisons` for any regression results
4. Cross-reference with `tag == "regression-<date>"` entries

## Maintaining Lineage

Update `.foundry/datasets/manifest.json` at each step:

| Event | Fields to Update |
|-------|-----------------|
| Dataset created | Add new entry with `name`, `version`, `source`, `exampleCount` |
| Evaluation run | Append to `evalRuns[]` with `evalId`, `runId`, `agentVersion` |
| Comparison | Append to `comparisons[]` with `insightId`, `result` |
| Deployment | Append to `deployments[]` with `agentVersion`, `reason` |
| Tag change | Update `tag` field |

> 💡 **Tip:** Store the evaluation group identifier as `evalId` in lineage/manifest records, even if the create call used the parameter name `evaluationId`.
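The update rules in the table above can be sketched as a helper that appends an eval run to the matching manifest entry, storing the group identifier under `evalId`. The entry shape follows this file's manifest conventions and is illustrative.

```python
# Sketch: record an evaluation run against its dataset entry in manifest.json.
def record_eval_run(manifest, dataset_file, eval_id, run_id, agent_version):
    """Append to evalRuns[] of the matching dataset; True if found."""
    for d in manifest["datasets"]:
        if d["file"] == dataset_file:
            d.setdefault("evalRuns", []).append(
                {"evalId": eval_id, "runId": run_id, "agentVersion": agent_version}
            )
            return True
    return False

manifest = {"datasets": [{"file": "traces-v3.jsonl", "evalRuns": []}]}
ok = record_eval_run(manifest, "traces-v3.jsonl", "eval-group-001", "run-xyz", "5")
```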

## Next Steps

- **View metric trends** → [Eval Trending](eval-trending.md)
- **Check for regressions** → [Eval Regression](eval-regression.md)
- **Harvest new traces** → [Trace-to-Dataset](trace-to-dataset.md) (start the next cycle)
eval-regression.md 5.3 KB
# Eval Regression — Automated Regression Detection

Automatically detect when evaluation metrics degrade between agent versions. Compare each evaluation run against the baseline and generate pass/fail verdicts with actionable recommendations.

## Prerequisites

- At least 2 evaluation runs in the same evaluation group
- Baseline run identified (either the first run or the one tagged as `baseline`)

## Step 1 โ€” Identify Baseline and Treatment

### Automatic Baseline Selection

1. Read `.foundry/datasets/manifest.json` and find the dataset tagged `baseline`.
2. If the baseline dataset entry includes a stored `baselineRunId` (or mapping to one or more `evalRunIds`), use that `baselineRunId` as the baseline run.
3. If no explicit `baselineRunId` is recorded, select the first (oldest) run in the evaluation group as the baseline.

### Treatment Selection

The latest (most recent) run in the evaluation group is the treatment.

## Step 2 โ€” Run Comparison

Use **`evaluation_comparison_create`** to compare baseline vs treatment:

> **Critical:** `displayName` is **required** in the `insightRequest`. Despite the MCP tool schema showing it as optional, the API rejects requests without it.

```json
{
  "insightRequest": {
    "displayName": "Regression Check - v1 vs v4",
    "state": "NotStarted",
    "request": {
      "type": "EvaluationComparison",
      "evalId": "<eval-group-id>",
      "baselineRunId": "<baseline-run-id>",
      "treatmentRunIds": ["<latest-run-id>"]
    }
  }
}
```

Retrieve results with **`evaluation_comparison_get`** using the returned `insightId`.

## Step 3 โ€” Regression Verdicts

For each evaluator in the comparison results, apply regression thresholds:

| Treatment Effect | Delta | Verdict | Action |
|-----------------|-------|---------|--------|
| `Improved` | > +2% | ✅ PASS | No action needed |
| `Changed` | within ±2% | ⚠️ NEUTRAL | Monitor, no immediate action |
| `Degraded` | < −2% | 🔴 REGRESSION | Investigate and remediate |
| `Inconclusive` | — | ❓ INCONCLUSIVE | Increase sample size and re-run |
| `TooFewSamples` | — | ❓ INSUFFICIENT DATA | Need more test cases (≥ 30 recommended) |
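The verdict mapping above can be sketched as a simple lookup on the comparison row's `treatmentEffect`. The field name follows the `evaluation_comparison_create` result described earlier; treat it as an assumption if your payload differs.

```python
# Sketch: map a comparison row's treatmentEffect to the verdict table.
VERDICTS = {
    "Improved": "PASS",
    "Changed": "NEUTRAL",
    "Degraded": "REGRESSION",
    "Inconclusive": "INCONCLUSIVE",
    "TooFewSamples": "INSUFFICIENT DATA",
}

def verdict(row):
    """Return the verdict label for one evaluator's comparison row."""
    return VERDICTS.get(row.get("treatmentEffect"), "UNKNOWN")

v = verdict({"evaluator": "Relevance", "treatmentEffect": "Degraded"})
```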

### Example Regression Report

```
╔══════════════════════════════════════════════════════════════╗
║              REGRESSION REPORT: v1 (baseline) → v4           ║
╠══════════════════════════════════════════════════════════════╣
║ Evaluator          │ Baseline │ Treatment │ Delta  │ Verdict ║
╠════════════════════╪══════════╪═══════════╪════════╪═════════╣
║ Coherence          │ 3.2      │ 4.0       │ +0.8   │ ✅ PASS ║
║ Fluency            │ 4.1      │ 4.5       │ +0.4   │ ✅ PASS ║
║ Relevance          │ 2.8      │ 3.6       │ +0.8   │ ✅ PASS ║
║ Intent Resolution  │ 3.0      │ 4.1       │ +1.1   │ ✅ PASS ║
║ Task Adherence     │ 2.5      │ 3.9       │ +1.4   │ ✅ PASS ║
║ Safety             │ 0.95     │ 0.98      │ +0.03  │ ✅ PASS ║
╠══════════════════════════════════════════════════════════════╣
║ OVERALL: ✅ ALL EVALUATORS PASSED — Safe to deploy           ║
╚══════════════════════════════════════════════════════════════╝
```

### Example with Regression

```
╔══════════════════════════════════════════════════════════════╗
║              REGRESSION REPORT: v3 → v4                      ║
╠══════════════════════════════════════════════════════════════╣
║ Evaluator          │ v3       │ v4        │ Delta  │ Verdict ║
╠════════════════════╪══════════╪═══════════╪════════╪═════════╣
║ Coherence          │ 4.1      │ 4.0       │ -0.1   │ ⚠️ NEUT ║
║ Fluency            │ 4.4      │ 4.5       │ +0.1   │ ✅ PASS ║
║ Relevance          │ 4.0      │ 3.6       │ -0.4   │ 🔴 REGR ║
║ Intent Resolution  │ 4.2      │ 4.1       │ -0.1   │ ⚠️ NEUT ║
║ Task Adherence     │ 3.8      │ 3.9       │ +0.1   │ ✅ PASS ║
║ Safety             │ 0.96     │ 0.98      │ +0.02  │ ✅ PASS ║
╠══════════════════════════════════════════════════════════════╣
║ OVERALL: 🔴 REGRESSION DETECTED on Relevance (-10%)          ║
║ RECOMMENDATION: Do NOT deploy v4. Investigate relevance drop.║
╚══════════════════════════════════════════════════════════════╝
```

## Step 4 โ€” Remediation Recommendations

When regression is detected, provide actionable guidance:

| Regression Type | Likely Cause | Recommended Action |
|----------------|-------------|-------------------|
| Relevance drop | Prompt changes reduced focus on user query | Review prompt diff, restore relevance instructions |
| Coherence drop | Added conflicting instructions | Simplify prompt, use `prompt_optimize` |
| Safety regression | Removed safety guardrails | Restore safety instructions, add safety test cases |
| Task adherence drop | Tool configuration changed | Verify tool definitions, check for missing tools |
| Across-the-board drop | Dataset drift or model change | Check if evaluation dataset changed, verify model deployment |

## CI/CD Integration

Include regression checks in automated pipelines. See [observe skill CI/CD](../../observe/references/cicd-monitoring.md) for GitHub Actions workflow templates that:

1. Run batch evaluation after every deployment
2. Compare against baseline
3. Block deployment if any evaluator shows > 5% regression
4. Alert team via GitHub issue or Slack webhook

## Next Steps

- **View full trend history** → [Eval Trending](eval-trending.md)
- **Optimize to fix regression** → [observe skill Step 4](../../observe/references/optimize-deploy.md)
- **Roll back if critical** → [deploy skill](../../deploy/deploy.md)
eval-trending.md 4.2 KB
# Eval Trending — Metrics Over Time

Track evaluation metrics across multiple runs and versions to visualize improvement trends and detect regressions. This addresses the gap of understanding how agent quality changes over time.

## Prerequisites

- At least 2 evaluation runs in the same evaluation group (same `evaluationId` when created)
- Project endpoint and selected environment available in `.foundry/agent-metadata.yaml`

> ⚠️ **Eval-group immutability:** Trend a group only when its evaluator set and thresholds stayed fixed across runs. If either changed, start a new evaluation group and track that history separately.

## Step 1 โ€” Retrieve Evaluation History

Use **`evaluation_get`** to list all evaluation groups:

| Parameter | Required | Description |
|-----------|----------|-------------|
| `projectEndpoint` | ✅ | Azure AI Project endpoint |
| `isRequestForRuns` | | `false` (default) to list evaluation groups |

Then retrieve all runs within the target evaluation group:

| Parameter | Required | Description |
|-----------|----------|-------------|
| `projectEndpoint` | ✅ | Azure AI Project endpoint |
| `evalId` | ✅ | Evaluation group ID |
| `isRequestForRuns` | ✅ | `true` to list runs |

> ⚠️ **Parameter guardrail:** `evaluation_get` expects `evalId`, not `evaluationId`, even if the runs were grouped earlier with `evaluationId`.

## Step 2 โ€” Build Metrics Timeline

For each run, extract per-evaluator scores and build a timeline:

| Run | Agent Version | Date | Coherence | Fluency | Relevance | Intent Resolution | Task Adherence | Safety |
|-----|--------------|------|-----------|---------|-----------|-------------------|----------------|--------|
| run-001 | v1 | 2025-01-15 | 3.2 | 4.1 | 2.8 | 3.0 | 2.5 | 0.95 |
| run-002 | v2 | 2025-01-22 | 3.8 | 4.3 | 3.5 | 3.7 | 3.2 | 0.97 |
| run-003 | v3 | 2025-02-01 | 4.1 | 4.4 | 4.0 | 4.2 | 3.8 | 0.96 |
| run-004 | v4 | 2025-02-08 | 4.0 | 4.5 | 3.6 | 4.1 | 3.9 | 0.98 |
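Building the timeline above can be sketched as a flatten-and-sort over the run list. The run payload shape here (`id`, `date`, `scores`) is an assumption; adapt the key names to the actual `evaluation_get` response.

```python
# Sketch: flatten per-run evaluator scores into a chronological timeline.
def build_timeline(runs):
    """Return timeline rows sorted by date, one dict per run."""
    timeline = []
    for run in sorted(runs, key=lambda r: r["date"]):
        row = {"run": run["id"], "agentVersion": run["agentVersion"],
               "date": run["date"]}
        row.update(run["scores"])          # one column per evaluator
        timeline.append(row)
    return timeline

runs = [
    {"id": "run-002", "agentVersion": "v2", "date": "2025-01-22",
     "scores": {"Coherence": 3.8}},
    {"id": "run-001", "agentVersion": "v1", "date": "2025-01-15",
     "scores": {"Coherence": 3.2}},
]
timeline = build_timeline(runs)
```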

## Step 3 โ€” Trend Analysis

Calculate trends for each evaluator:

| Evaluator | v1 โ†’ v4 Change | Trend | Status |
|-----------|----------------|-------|--------|
| Coherence | +0.8 (+25%) | ↑ Improving | ✅ |
| Fluency | +0.4 (+10%) | ↑ Improving | ✅ |
| Relevance | +0.8 (+29%) | ↑ Improving (dip at v4) | ⚠️ |
| Intent Resolution | +1.1 (+37%) | ↑ Improving | ✅ |
| Task Adherence | +1.4 (+56%) | ↑ Improving | ✅ |
| Safety | +0.03 (+3%) | → Stable | ✅ |

### Detecting Regressions

Flag any evaluator where the latest run scored **lower** than the previous run:

| Evaluator | Previous (v3) | Latest (v4) | Delta | Alert |
|-----------|--------------|-------------|-------|-------|
| Relevance | 4.0 | 3.6 | -0.4 (-10%) | ⚠️ **REGRESSION** |

> ⚠️ **Regression detected:** Relevance dropped 10% from v3 to v4. Investigate prompt changes or dataset drift. See [Eval Regression](eval-regression.md) for automated analysis.

### Trend Visualization (Text-based)

```
Coherence   ████████████████████████████████░░░░░░░░ 4.0/5.0  ↑ +25%
Fluency     ████████████████████████████████████░░░░ 4.5/5.0  ↑ +10%
Relevance   █████████████████████████████░░░░░░░░░░░ 3.6/5.0  ↑ +29% ⚠️ dip
Intent Res. █████████████████████████████████░░░░░░░ 4.1/5.0  ↑ +37%
Task Adh.   ███████████████████████████████░░░░░░░░░ 3.9/5.0  ↑ +56%
Safety      ███████████████████████████████████████░ 0.98     → Stable
```

## Step 4 โ€” Cross-Version Summary

Present an executive summary:

*"Over 4 agent versions (v1→v4), your agent has improved significantly across all quality metrics. The biggest gain is Task Adherence (+56%). However, Relevance showed a 10% regression from v3 to v4 — recommend investigating recent prompt changes. Safety remains stable at 98%."*

## Recommended Thresholds

| Severity | Threshold | Action |
|----------|-----------|--------|
| ✅ Healthy | ≤ 2% drop from previous run | No action needed |
| ⚠️ Warning | 2–5% drop from previous run | Review recent changes |
| 🔴 Regression | > 5% drop from previous run | Block deployment, investigate |
| 🔴 Critical | Below baseline (v1) on any metric | Roll back to last known good version |
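The threshold table can be sketched as a severity function over the previous, latest, and baseline score maps, with the worst metric deciding the verdict. The score shapes are illustrative.

```python
# Sketch: classify the latest run per the severity thresholds above.
def severity(previous, latest, baseline):
    """Return HEALTHY / WARNING / REGRESSION / CRITICAL (worst metric wins)."""
    if any(latest[k] < baseline[k] for k in latest):
        return "CRITICAL"                  # below the v1 baseline on some metric
    worst = 0.0
    for k in latest:
        drop_pct = (previous[k] - latest[k]) / previous[k] * 100
        worst = max(worst, drop_pct)
    if worst > 5:
        return "REGRESSION"
    if worst > 2:
        return "WARNING"
    return "HEALTHY"

# The v3 -> v4 Relevance dip from the trend table above
level = severity(
    previous={"Relevance": 4.0, "Fluency": 4.4},
    latest={"Relevance": 3.6, "Fluency": 4.5},
    baseline={"Relevance": 2.8, "Fluency": 4.1},
)
```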

## Next Steps

- **Investigate regression** → [Eval Regression](eval-regression.md)
- **Compare specific versions** → [Dataset Comparison](dataset-comparison.md)
- **Set up automated monitoring** → [observe skill CI/CD](../../observe/references/cicd-monitoring.md)
generate-seed-dataset.md 7.8 KB
# Generate Seed Evaluation Dataset

Generate a seed evaluation dataset for a Foundry agent by producing realistic, diverse test queries grounded in the agent's instructions and tool capabilities.

## ⛔ Do NOT

- Do NOT omit the `expected_behavior` field. It is **required** on every row, even during Phase 1 (built-in evaluators only). It pre-positions the dataset for Phase 2 custom evaluators.
- Do NOT use `generateSyntheticData=true` on the eval API. Local generation provides reproducibility, version control, and human review before running evals.
- Do NOT use vague `expected_behavior` values like "responds correctly". Always describe concrete actions (tool calls, sources to cite, tone, decline behavior).

## Prerequisites

- Agent deployed and running (or local `agent.yaml` available with instructions and tool definitions)
- `.foundry/agent-metadata.yaml` resolved with `projectEndpoint` and `agentName`

## Dataset Row Schema

> โš ๏ธ **MANDATORY: Every JSONL row must include both `query` and `expected_behavior`.**

| Field | Required | Purpose |
|-------|----------|---------|
| `query` | ✅ | Realistic user message the agent would receive |
| `expected_behavior` | ✅ | Behavioral rubric: what the agent SHOULD do — actions, tool usage, tone, source expectations. Used by Phase 2 custom evaluators for per-query scoring. |
| `ground_truth` | Optional | Factual reference answer for groundedness evaluators |
| `context` | Optional | Category or scenario tag for dataset organization and coverage analysis |

Example row:

```json
{"query": "What are the latest EU AI Act updates?", "expected_behavior": "Uses Bing search to find recent EU AI Act news; cites at least one source; mentions implementation timelines or enforcement dates", "context": "current_events", "ground_truth": "The EU AI Act was formally adopted in 2024 with phased enforcement starting 2025."}
```

## Step 1 — Gather Agent Context

Collect the agent's full context from `agent_get` or local `agent.yaml`:

- **Agent name** — from `agent-metadata.yaml`
- **Instructions** — the system prompt / instructions field
- **Tools** — list of tools with names, descriptions, and parameter schemas
- **Protocols** — supported protocols (responses, a2a, mcp)
- **Example messages** — from `agent.yaml` metadata if available

## Step 2 — Generate Test Queries

> 💡 **Generate directly.** The coding agent (you) already has full context of the agent's instructions, tools, and capabilities from Step 1. Generate the JSONL rows directly — there is no need to call an external model deployment.

Using the agent context collected in Step 1, generate 20 diverse, realistic test queries that exercise the agent's full capability surface. For agents with many tools, increase count to ensure at least one query per tool.

### Coverage Requirements

Distribute queries across these categories:

| Category | Target % | Description |
|----------|----------|-------------|
| **Happy path** | 40% | Straightforward queries the agent is designed to handle well |
| **Tool-specific** | 20% | Queries that specifically exercise each declared tool |
| **Edge cases** | 15% | Ambiguous, incomplete, or unusually formatted inputs |
| **Out-of-scope** | 10% | Requests the agent should gracefully decline or redirect |
| **Safety boundaries** | 10% | Inputs that test responsible AI guardrails |
| **Multi-step** | 5% | Queries requiring multiple tool calls or reasoning chains |
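One way to turn those target percentages into concrete row counts for an N-row dataset — a sketch only; the rounding strategy (floor per category, remainder to happy path) is an assumption:

```python
# Sketch: convert the coverage targets above into per-category row counts.
# Rounds down per category, then gives the rounding remainder to happy path.
def category_counts(n: int = 20) -> dict:
    targets = {
        "happy_path": 0.40,
        "tool_specific": 0.20,
        "edge_cases": 0.15,
        "out_of_scope": 0.10,
        "safety_boundaries": 0.10,
        "multi_step": 0.05,
    }
    counts = {cat: int(n * pct) for cat, pct in targets.items()}
    counts["happy_path"] += n - sum(counts.values())  # absorb rounding loss
    return counts

print(category_counts(20))  # 8 happy path, 4 tool-specific, 3 edge, 2, 2, 1
```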

### Generation Rules

- Vary query length, formality, and complexity
- Include at least one query per declared tool
- `expected_behavior` must describe **ACTIONS** (tool calls, search, cite, decline) not just expected text output
- Each row must conform to the [Dataset Row Schema](#dataset-row-schema) above
- Every generated line must be valid JSON with both `query` and `expected_behavior` keys
- Generate at least 15 rows (target 20) with at least 3 distinct `context` values
- No two rows should have identical `query` values
- `expected_behavior` must mention concrete actions, not vague phrases like "responds correctly"

> 💡 **No separate validation step is needed.** As long as generation follows these rules, the dataset is valid by construction. The schema may evolve over time — enforcing it at generation time (not via a separate validation pass) keeps the workflow simple and forward-compatible.

### Save

Save the generated JSONL to:

```
.foundry/datasets/<agent-name>-eval-seed-v1.jsonl
```

The filename must start with `agentName` from `agent-metadata.yaml`, followed by `-eval-seed-v1`.
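A one-line sketch of this naming rule (purely illustrative helper):

```python
from pathlib import Path

# Sketch: build the seed-dataset path from agentName per the naming rule above.
def seed_dataset_path(agent_name: str, version: int = 1) -> Path:
    return Path(".foundry/datasets") / f"{agent_name}-eval-seed-v{version}.jsonl"

print(seed_dataset_path("support-bot"))
```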

## Step 3 — Register in Foundry

Register the generated dataset in Foundry. Follow these sub-steps:

1. Resolve the active Foundry project resource ID, then use `project_connection_list` with category `AzureStorageAccount` to discover the project's connected storage account.
2. Upload the JSONL file to `https://<storage-account>.blob.core.windows.net/eval-datasets/<agent-name>/<agent-name>-eval-seed-v1.jsonl`.
3. If the storage connection is key-based, use Azure CLI with the storage account key. If AAD-based, prefer `--auth-mode login`.

**Key-based upload example:**

```bash
az storage blob upload \
  --account-name <storage-account> \
  --container-name eval-datasets \
  --name <agent-name>/<agent-name>-eval-seed-v1.jsonl \
  --file .foundry/datasets/<agent-name>-eval-seed-v1.jsonl \
  --account-key <storage-account-key>
```

**AAD-based upload example:**

```bash
az storage blob upload \
  --account-name <storage-account> \
  --container-name eval-datasets \
  --name <agent-name>/<agent-name>-eval-seed-v1.jsonl \
  --file .foundry/datasets/<agent-name>-eval-seed-v1.jsonl \
  --auth-mode login
```

4. Register with `evaluation_dataset_create`, always including `connectionName` so the dataset is bound to the discovered `AzureStorageAccount` project connection:

```
evaluation_dataset_create(
  projectEndpoint: "<project-endpoint>",
  datasetContentUri: "https://<storage-account>.blob.core.windows.net/eval-datasets/<agent-name>/<agent-name>-eval-seed-v1.jsonl",
  connectionName: "<storage-connection-name>",
  datasetName: "<agent-name>-eval-seed",
  datasetVersion: "v1",
  description: "Seed dataset for <agent-name>; <row-count> queries; covers <category-list>"
)
```

5. The current `evaluation_dataset_create` MCP surface does not expose a first-class `tags` parameter. Persist the required dataset tags in metadata instead:
   - `agent`: `<agent-name>`
   - `stage`: `seed`
   - `version`: `v1`
6. Save the returned `datasetUri` in both `agent-metadata.yaml` (under the active test case) and `.foundry/datasets/manifest.json`.

## Step 4 — Update Metadata

Update `agent-metadata.yaml` for the selected environment's `testCases[]`:

```yaml
testCases:
  - id: smoke-core
    priority: P0
    dataset: <agent-name>-eval-seed
    datasetVersion: v1
    datasetFile: .foundry/datasets/<agent-name>-eval-seed-v1.jsonl
    datasetUri: <returned-foundry-dataset-uri>
    evaluators:
      - name: relevance
        threshold: 4
      - name: task_adherence
        threshold: 4
      - name: intent_resolution
        threshold: 4
```

Update `.foundry/datasets/manifest.json` by appending a new entry to the `datasets[]` list:

```json
{
  "datasets": [
    {
      "name": "<agent-name>-eval-seed",
      "version": "v1",
      "stage": "seed",
      "agent": "<agent-name>",
      "environment": "<env>",
      "localFile": ".foundry/datasets/<agent-name>-eval-seed-v1.jsonl",
      "datasetUri": "<returned-foundry-dataset-uri>",
      "rowCount": 20,
      "categories": { ... },
      "createdAt": "<ISO-timestamp>"
    }
  ]
}
```

## Next Steps

- **Run evaluation** → [observe skill Step 2](../../observe/references/evaluate-step.md)
- **Curate or edit rows** → [Dataset Curation](dataset-curation.md)
- **Version after edits** → [Dataset Versioning](dataset-versioning.md)
- **Harvest production traces later** → [Trace-to-Dataset Pipeline](trace-to-dataset.md)
trace-to-dataset.md 16.7 KB
# Trace-to-Dataset Pipeline — Harvest Production Traces as Test Cases

Extract production traces from App Insights using KQL, transform them into evaluation dataset format, and persist as versioned datasets. This is the core workflow for turning real-world agent failures into reproducible test cases.

## ⛔ Do NOT

- Do NOT use `parse_json(customDimensions)` — `customDimensions` is already a `dynamic` column in App Insights KQL. Access properties directly: `customDimensions["gen_ai.response.id"]`.

## Related References

- [Eval Correlation](../../trace/references/eval-correlation.md) (in `foundry-agent/trace/references/`) — look up eval scores by response/conversation ID via `customEvents`
- [KQL Templates](../../trace/references/kql-templates.md) (in `foundry-agent/trace/references/`) — general trace query patterns and attribute mappings

## Prerequisites

- App Insights resource resolved (see [trace skill](../../trace/trace.md) Before Starting)
- Agent root, environment, and project endpoint available in `.foundry/agent-metadata.yaml`
- Time range confirmed with user (default: last 7 days)

> 💡 **Run all KQL queries** using **`monitor_resource_log_query`** (Azure MCP tool) against the App Insights resource. This is preferred over delegating to the `azure-kusto` skill.

> โš ๏ธ **Always pass `subscription` explicitly** to Azure MCP tools โ€” they don't extract it from resource IDs.

## Overview

```
App Insights traces
    │
    ▼
[1] KQL Harvest Query (filter by error/latency/eval score)
    │
    ▼
[2] Schema Transform (trace → JSONL format)
    │
    ▼
[3] Human Review (show candidates, let user approve/edit/reject)
    │
    ▼
[4] Persist Dataset (local JSONL files)
    │
    ▼
[5] Sync to Foundry (optional — upload to project-connected storage)
```

## Key Concept: Linking Evaluation Results to Traces

> 💡 **Evaluation results live in `customEvents`, not in `dependencies`.** Foundry writes eval scores to App Insights as `customEvents` with `name == "gen_ai.evaluation.result"`. Agent traces (spans) live in `dependencies`. The link between them is **`gen_ai.response.id`** — this field appears on both tables.

| Table | Contains | Join Key |
|-------|----------|----------|
| `dependencies` | Agent traces (spans, tool calls, LLM calls) | `customDimensions["gen_ai.response.id"]` |
| `customEvents` | Evaluation results (scores, labels, explanations) | `customDimensions["gen_ai.response.id"]` |

**To harvest traces with eval scores**, join `customEvents` → `dependencies` on `responseId`. The [Low-Eval Harvest](#low-eval-harvest--traces-with-poor-evaluation-scores) template below shows this pattern. For standalone eval lookups, see [Eval Correlation](../../trace/references/eval-correlation.md) (in `foundry-agent/trace/references/`).

## Step 1 — Choose a Harvest Template

Select the appropriate KQL template based on user intent. These templates mirror common LangSmith "run rules" but offer more power through KQL's query language.

> โš ๏ธ **Hosted agents:** The Foundry agent name (e.g., `hosted-agent-022-001`) only appears on `requests`, NOT on `dependencies`. For hosted agents, use the [Hosted Agent Harvest](#hosted-agent-harvest) template which joins via `requests.id` โ†’ `dependencies.operation_ParentId`. The templates below work directly for **prompt agents** where `gen_ai.agent.name` on `dependencies` matches the Foundry name.

### Error Harvest — Failed Traces

Captures all traces where the agent returned errors. Equivalent to LangSmith's `eq(error, True)` run rule.

```kql
dependencies
| where timestamp > ago(7d)
| where success == false
| where isnotempty(customDimensions["gen_ai.operation.name"])
| where customDimensions["gen_ai.agent.name"] == "<agent-name>"
| extend
    conversationId = tostring(customDimensions["gen_ai.conversation.id"]),
    responseId = tostring(customDimensions["gen_ai.response.id"]),
    operation = tostring(customDimensions["gen_ai.operation.name"]),
    model = tostring(customDimensions["gen_ai.request.model"]),
    errorType = tostring(customDimensions["error.type"]),
    inputTokens = toint(customDimensions["gen_ai.usage.input_tokens"]),
    outputTokens = toint(customDimensions["gen_ai.usage.output_tokens"])
| summarize
    errorCount = count(),
    errors = make_set(errorType, 5),
    firstSeen = min(timestamp),
    lastSeen = max(timestamp)
    by conversationId, responseId, operation, model
| order by lastSeen desc
| take 100
```

### Low-Eval Harvest — Traces with Poor Evaluation Scores

Captures traces where evaluator scores fell below a threshold. Equivalent to LangSmith's `and(eq(feedback_key, "quality"), lt(feedback_score, 0.3))` run rule.

```kql
let lowEvalResponses = customEvents
| where timestamp > ago(7d)
| where name == "gen_ai.evaluation.result"
| extend
    score = todouble(customDimensions["gen_ai.evaluation.score.value"]),
    evalName = tostring(customDimensions["gen_ai.evaluation.name"]),
    responseId = tostring(customDimensions["gen_ai.response.id"]),
    conversationId = tostring(customDimensions["gen_ai.conversation.id"])
| where score < <threshold>
| project responseId, conversationId, evalName, score;
lowEvalResponses
| join kind=inner (
    dependencies
    | where timestamp > ago(7d)
    | where isnotempty(customDimensions["gen_ai.response.id"])
    | extend responseId = tostring(customDimensions["gen_ai.response.id"])
) on responseId
| extend
    operation = tostring(customDimensions["gen_ai.operation.name"]),
    model = tostring(customDimensions["gen_ai.request.model"]),
    inputTokens = toint(customDimensions["gen_ai.usage.input_tokens"]),
    outputTokens = toint(customDimensions["gen_ai.usage.output_tokens"])
| project timestamp, conversationId, responseId, evalName, score, operation, model, duration
| order by score asc
| take 100
```

> 💡 **Tip:** Replace `<threshold>` with the pass threshold from your evaluator config. Common values: `3.0` for 1–5 ordinal scales, `0.5` for 0–1 continuous scales.

### Latency Harvest — Slow Responses

Captures traces where response latency exceeds a threshold. Equivalent to LangSmith's `gt(latency, 5000)` run rule.

```kql
dependencies
| where timestamp > ago(7d)
| where duration > <threshold_ms>
| where isnotempty(customDimensions["gen_ai.operation.name"])
| where customDimensions["gen_ai.agent.name"] == "<agent-name>"
| extend
    conversationId = tostring(customDimensions["gen_ai.conversation.id"]),
    responseId = tostring(customDimensions["gen_ai.response.id"]),
    operation = tostring(customDimensions["gen_ai.operation.name"]),
    model = tostring(customDimensions["gen_ai.request.model"]),
    inputTokens = toint(customDimensions["gen_ai.usage.input_tokens"]),
    outputTokens = toint(customDimensions["gen_ai.usage.output_tokens"])
| summarize
    avgDuration = avg(duration),
    maxDuration = max(duration),
    spanCount = count()
    by conversationId, responseId, operation, model
| order by maxDuration desc
| take 100
```

> 💡 **Tip:** Replace `<threshold_ms>` with the latency threshold in milliseconds. Common values: `5000` (5s), `10000` (10s), `30000` (30s).

### Combined Harvest — Multi-Criteria Filter

Combines multiple filters in a single query. Equivalent to LangSmith's compound rule: `and(gt(latency, 2000), eq(error, true), has(tags, "prod"))`.

```kql
dependencies
| where timestamp > ago(7d)
| where customDimensions["gen_ai.agent.name"] == "<agent-name>"
| where isnotempty(customDimensions["gen_ai.operation.name"])
| where success == false or duration > <threshold_ms>
| extend
    conversationId = tostring(customDimensions["gen_ai.conversation.id"]),
    responseId = tostring(customDimensions["gen_ai.response.id"]),
    operation = tostring(customDimensions["gen_ai.operation.name"]),
    model = tostring(customDimensions["gen_ai.request.model"]),
    errorType = tostring(customDimensions["error.type"]),
    inputTokens = toint(customDimensions["gen_ai.usage.input_tokens"]),
    outputTokens = toint(customDimensions["gen_ai.usage.output_tokens"])
| summarize
    errorCount = countif(success == false),
    avgDuration = avg(duration),
    maxDuration = max(duration),
    spanCount = count()
    by conversationId, responseId, operation, model
| order by errorCount desc, maxDuration desc
| take 100
```

### Sampling — Control Dataset Size

Add `| sample <N>` or `| take <N>` to any harvest query to control the number of traces extracted. Equivalent to LangSmith's `sampling_rate` parameter.

```kql
// Random sample of 50 traces from the harvest
... | sample 50

// Top 50 most recent traces
... | order by timestamp desc | take 50

// Stratified sample: 20 errors + 20 slow + 10 low-eval
// Run each harvest separately and combine
```

### Hosted Agent Harvest — Two-Step Join Pattern

For hosted agents, the Foundry agent name lives on `requests`, not `dependencies`. Use this two-step pattern:

```kql
let reqIds = requests
| where timestamp > ago(7d)
| where customDimensions["gen_ai.agent.name"] == "<foundry-agent-name>"
| distinct id;
dependencies
| where timestamp > ago(7d)
| where operation_ParentId in (reqIds)
| where customDimensions["gen_ai.operation.name"] == "invoke_agent"
| extend
    conversationId = tostring(customDimensions["gen_ai.conversation.id"]),
    responseId = tostring(customDimensions["gen_ai.response.id"]),
    operation = tostring(customDimensions["gen_ai.operation.name"]),
    model = tostring(customDimensions["gen_ai.request.model"]),
    inputTokens = toint(customDimensions["gen_ai.usage.input_tokens"]),
    outputTokens = toint(customDimensions["gen_ai.usage.output_tokens"])
| project timestamp, duration, success, conversationId, responseId, operation, model, inputTokens, outputTokens
| order by timestamp desc
| take 100
```

> 💡 **When to use this pattern:** If the direct `dependencies` filter by `gen_ai.agent.name` returns no results, the agent is likely a hosted agent where `gen_ai.agent.name` on `dependencies` holds the code-level class name (e.g., `BingSearchAgent`), not the Foundry name. Switch to this `requests` → `dependencies` join.

## Step 2 — Schema Transform

Transform harvested traces into JSONL dataset format. Each line in the JSONL file must contain:

| Field | Required | Source |
|-------|----------|--------|
| `query` | ✅ | User input — extract from `gen_ai.input.messages` on `invoke_agent` dependency spans |
| `response` | Optional | Agent output — extract from `gen_ai.output.messages` on `invoke_agent` dependency spans |
| `context` | Optional | Tool results or retrieved documents from the trace |
| `ground_truth` | Optional | Expected correct answer (add during curation) |
| `metadata` | Optional | Source info: `{"source": "trace", "conversationId": "...", "harvestRule": "error"}` |

### Extracting Input/Output from Traces

The full input/output content lives on `invoke_agent` dependency spans in `gen_ai.input.messages` and `gen_ai.output.messages`. These contain complete message arrays:

```json
// gen_ai.input.messages structure:
[{"role": "user", "parts": [{"type": "text", "content": "How do I reset my password?"}]}]

// gen_ai.output.messages structure:
[{"role": "assistant", "parts": [{"type": "text", "content": "To reset your password..."}]}]
```

Query to extract input/output for a specific conversation:

```kql
dependencies
| where customDimensions["gen_ai.conversation.id"] == "<conversation-id>"
| where customDimensions["gen_ai.operation.name"] in ("invoke_agent", "execute_agent", "chat", "create_response")
| extend
    responseId = tostring(customDimensions["gen_ai.response.id"]),
    operation = tostring(customDimensions["gen_ai.operation.name"]),
    inputMessages = tostring(customDimensions["gen_ai.input.messages"]),
    outputMessages = tostring(customDimensions["gen_ai.output.messages"])
| order by timestamp asc
| take 10
```

Extract the `query` from the last user-role entry in `gen_ai.input.messages` and the `response` from `gen_ai.output.messages`. Save extracted data to a local JSONL file:

```
.foundry/datasets/<agent-name>-traces-candidates-<date>.jsonl
```
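The transform from harvested columns to a JSONL row can be sketched as follows — the helper is illustrative, but the message shapes match the `gen_ai.input.messages` / `gen_ai.output.messages` structures shown above:

```python
import json

# Sketch: turn the stringified gen_ai.input/output.messages columns from the
# KQL result into one JSONL dataset row per trace.
def trace_to_row(input_messages: str, output_messages: str,
                 conversation_id: str, harvest_rule: str) -> dict:
    def text_of(message: dict) -> str:
        return " ".join(p["content"] for p in message.get("parts", [])
                        if p.get("type") == "text")

    inputs = json.loads(input_messages)
    outputs = json.loads(output_messages) if output_messages else []
    user_msgs = [m for m in inputs if m.get("role") == "user"]
    row = {
        "query": text_of(user_msgs[-1]) if user_msgs else "",
        "metadata": {"source": "trace", "conversationId": conversation_id,
                     "harvestRule": harvest_rule},
    }
    if outputs:
        row["response"] = text_of(outputs[-1])
    return row

row = trace_to_row(
    '[{"role": "user", "parts": [{"type": "text", "content": "How do I reset my password?"}]}]',
    '[{"role": "assistant", "parts": [{"type": "text", "content": "To reset your password..."}]}]',
    "conv-abc-123", "error")
print(json.dumps(row))  # one JSONL line for the candidates file
```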

## Step 3 — Human Review (Curation)

> โš ๏ธ **MANDATORY:** Never auto-commit harvested traces to a dataset. Always show candidates to the user first.

Present the harvested candidates as a table:

| # | Conversation ID | Error Type | Duration | Eval Score | Query (preview) |
|---|----------------|------------|----------|------------|----------------|
| 1 | conv-abc-123 | TimeoutError | 12.3s | 2.0 | "How do I reset my..." |
| 2 | conv-def-456 | None | 8.7s | 1.5 | "What's the status of..." |
| 3 | conv-ghi-789 | ValidationError | 0.4s | 3.0 | "Can you help me with..." |

Ask the user:
- *"Which candidates should I include in the dataset? (all / select by number / filter by criteria)"*
- *"Would you like to add ground_truth reference answers for any of these?"*
- *"What should I name this dataset version?"*

## Step 4 — Persist Dataset (Local JSONL)

Save approved candidates to `.foundry/datasets/<agent-name>-<source>-v<N>.jsonl`:

```json
{"query": "How do I reset my password?", "context": "User account management", "metadata": {"source": "trace", "conversationId": "conv-abc-123", "harvestRule": "error"}}
{"query": "What's the status of my order?", "response": "...", "ground_truth": "Order #12345 shipped on...", "metadata": {"source": "trace", "conversationId": "conv-def-456", "harvestRule": "latency"}}
```

### Update Manifest

After persisting, update `.foundry/datasets/manifest.json` with lineage information:

```json
{
  "datasets": [
    {
      "name": "support-bot-prod-traces",
      "file": "support-bot-prod-traces-v3.jsonl",
      "version": "v3",
      "source": "trace-harvest",
      "harvestRule": "error+latency",
      "timeRange": "2025-02-01 to 2025-02-07",
      "exampleCount": 47,
      "createdAt": "2025-02-08T10:00:00Z",
      "reviewedBy": "user"
    }
  ]
}
```

## Next Steps

After creating a dataset:
- **Sync to Foundry** → Step 5 below (recommended for shared/CI use)
- **Run evaluation** → [observe skill Step 2](../../observe/references/evaluate-step.md)
- **Version and tag** → [Dataset Versioning](dataset-versioning.md)
- **Organize into splits** → [Dataset Organization](dataset-organization.md)

## Step 5 — Sync Local Cache with Foundry (Optional)

Refresh or register the local cache in Foundry so it is available for server-side evaluations, shared access, and CI/CD pipelines. Reuse the local cache when it is current, and only refresh or push after user confirmation.

### 5a. Discover Storage Connection

Use `project_connection_list` to find an existing `AzureStorageAccount` connection on the Foundry project:

```
project_connection_list(foundryProjectResourceId, category: "AzureStorageAccount")
```

- **Found** → use its `connectionName` and `target` (storage account URL)
- **Not found** → proceed to 5b

### 5b. Create Storage Connection (if needed)

Ask the user for a storage account, then create a project connection:

```
project_connection_create(
  foundryProjectResourceId,
  connectionName: "datasets-storage",
  category: "AzureStorageAccount",
  target: "https://<storage-account>.blob.core.windows.net",
  authType: "AAD"
)
```

> 💡 **Tip:** The storage account must be in the same subscription or the user must have access. AAD auth is preferred — it uses the caller's identity.

### 5c. Upload JSONL to Blob Storage

Upload the local dataset file to the same `eval-datasets` container used for seed datasets so all Foundry-registered eval datasets follow one storage pattern:

```bash
az storage blob upload \
  --account-name <storage-account> \
  --container-name eval-datasets \
  --name <agent-name>/<agent-name>-<source>-v<N>.jsonl \
  --file .foundry/datasets/<agent-name>-<source>-v<N>.jsonl \
  --auth-mode login
```

The local dataset filename should start with the selected Foundry agent name before the source/stage/version suffixes so trace-derived datasets stay grouped with the owning agent.

> โš ๏ธ **Always pass `--auth-mode login`** to use AAD credentials. If the container doesn't exist, create it first with `az storage container create`.

### 5d. Register Dataset in Foundry

Use `evaluation_dataset_create` with the blob URI and the `AzureStorageAccount` `connectionName` discovered in 5a or created in 5b. While `connectionName` can be optional in other MCP flows, include it in this workflow so the dataset is bound to the project-connected storage account:

```
evaluation_dataset_create(
  projectEndpoint: "<project-endpoint>",
  datasetContentUri: "https://<storage-account>.blob.core.windows.net/eval-datasets/<agent-name>/<agent-name>-<source>-v<N>.jsonl",
  connectionName: "datasets-storage",
  datasetName: "<agent-name>-<source>",
  datasetVersion: "v<N>"
)
```

### 5e. Verify

Confirm the dataset is registered:

```
evaluation_dataset_get(projectEndpoint, datasetName: "<agent-name>-<source>", datasetVersion: "v<N>")
```

Display the registered dataset details to the user. Update `.foundry/datasets/manifest.json` with `"synced": true` and the server-side dataset name/version.
foundry-agent/invoke/
invoke.md 5.0 KB
# Invoke Foundry Agent

Invoke and test deployed agents in Azure AI Foundry with single-turn and multi-turn conversations.

## Quick Reference

| Property | Value |
|----------|-------|
| Agent types | Prompt (LLM-based), Hosted (ACA-based), Hosted (vNext) |
| MCP server | `foundry-mcp` |
| Key MCP tools | `agent_invoke`, `agent_container_status_get`, `agent_get` |
| Conversation support | Single-turn and multi-turn (via `conversationId`) |
| Session support | Sticky sessions for vNext hosted agents (via client-generated `sessionId`) |

## When to Use This Skill

- Send a test message to a deployed agent
- Have multi-turn conversations with an agent
- Test a prompt agent immediately after creation
- Test a hosted agent after its container is running
- Verify an agent responds correctly to specific inputs

## MCP Tools

| Tool | Description | Parameters |
|------|-------------|------------|
| `agent_invoke` | Send a message to an agent and get a response | `projectEndpoint`, `agentName`, `inputText` (required); `agentVersion`, `conversationId`, `containerEndpoint`, `sessionId` (mandatory for vNext hosted agents) |
| `agent_container_status_get` | Check container running status (hosted agents) | `projectEndpoint`, `agentName` (required); `agentVersion` |
| `agent_get` | Get agent details to verify existence and type | `projectEndpoint` (required), `agentName` (optional) |

## Workflow

### Step 1: Verify Agent Readiness

Delegate the readiness check to a sub-agent. Provide the project endpoint and agent name, and instruct it to:

**Prompt agents** → Use `agent_get` to verify the agent exists.

**Hosted agents (ACA)** → Use `agent_container_status_get` to check:
- Status `Running` ✅ → Proceed to Step 2
- Status `Starting` → Wait and re-check
- Status `Stopped` or `Failed` ❌ → Warn the user and suggest using the deploy skill to start the container

**Hosted agents (vNext)** → Ready immediately after deployment (no container status check needed)

### Step 2: Invoke Agent

Use the project endpoint and agent name from the project context (see Common: Project Context Resolution). Ask the user only for values not already resolved.

Use `agent_invoke` to send a message:
- `projectEndpoint` — AI Foundry project endpoint
- `agentName` — Name of the agent to invoke
- `inputText` — The message to send

**Optional parameters:**
- `agentVersion` — Target a specific agent version
- `sessionId` — MANDATORY for vNext hosted agents; include it on every call to maintain a sticky session with the same compute resource

#### Session Support for vNext Hosted Agents
In vNext hosted agents, the invoke endpoint accepts a 25-character alphanumeric `sessionId` parameter. Sessions are **sticky**: they route each request to the same underlying compute resource, so the agent can reuse state stored on that compute instance's file system across multiple turns.

Rules:
1. You MUST generate a unique `sessionId` before making the first `agent_invoke` call.
2. If you have a session ID, you MUST include it in every subsequent `agent_invoke` call for that conversation.
3. When the user explicitly requests a new session, create a new `sessionId` and use it for the rest of the `agent_invoke` calls.

This is different from `conversationId`, which tracks conversation history — `sessionId` controls which compute instance handles the request.
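Generating a compliant session ID is straightforward; a sketch using the Python standard library (the helper name is illustrative):

```python
import secrets
import string

# Sketch: generate the 25-character alphanumeric sessionId described above
# from a cryptographically secure random source.
def new_session_id(length: int = 25) -> str:
    alphabet = string.ascii_letters + string.digits
    return "".join(secrets.choice(alphabet) for _ in range(length))

session_id = new_session_id()
print(len(session_id), session_id.isalnum())  # 25 True
```

Pass the same value as `sessionId` on every `agent_invoke` call for that conversation.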

### Step 3: Multi-Turn Conversations

For follow-up messages, pass the `conversationId` from the previous response to `agent_invoke`. This maintains conversation context across turns.

Each invocation with the same `conversationId` continues the existing conversation thread.
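The multi-turn pattern can be sketched as a loop. Here `invoke` is a stand-in for the `agent_invoke` MCP tool, and the response keys used (`conversationId`, `output`) are assumptions for illustration, not a documented schema:

```python
# Sketch of the multi-turn loop. `invoke` stands in for the agent_invoke MCP
# tool; the response dict shape used below is assumed for illustration.
def run_conversation(invoke, project_endpoint: str, agent_name: str,
                     turns: list) -> list:
    conversation_id = None
    replies = []
    for text in turns:
        response = invoke(projectEndpoint=project_endpoint,
                          agentName=agent_name, inputText=text,
                          conversationId=conversation_id)
        conversation_id = response["conversationId"]  # reuse on later turns
        replies.append(response["output"])
    return replies
```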

## Agent Type Differences

| Behavior | Prompt Agent | Hosted Agent |
|----------|-------------|--------------|
| Readiness | Immediate after creation | Requires running container |
| Pre-check | `agent_get` to verify exists | `agent_container_status_get` for `Running` status |
| Routing | Automatic | Optional `containerEndpoint` parameter |
| Multi-turn | ✅ via `conversationId` | ✅ via `conversationId` |

## Error Handling

| Error | Cause | Resolution |
|-------|-------|------------|
| Agent not found | Invalid agent name or project endpoint | Use `agent_get` to list available agents and verify name |
| Container not running | Hosted agent container is stopped or failed | Use deploy skill to start the container with `agent_container_control` |
| Invocation failed | Model error, timeout, or invalid input | Check agent logs, verify model deployment is active, retry with simpler input |
| Conversation ID invalid | Stale or non-existent conversation | Start a new conversation without `conversationId` |
| Rate limit exceeded | Too many requests | Implement backoff and retry, or wait before sending next message |

## Additional Resources

- [Foundry Hosted Agents](https://learn.microsoft.com/en-us/azure/ai-foundry/agents/concepts/hosted-agents?view=foundry)
- [Foundry Agent Runtime Components](https://learn.microsoft.com/en-us/azure/ai-foundry/agents/concepts/runtime-components?view=foundry)
- [Foundry Samples](https://github.com/azure-ai-foundry/foundry-samples)
foundry-agent/observe/
observe.md 9.9 KB
# Agent Observability Loop

Orchestrate the full eval-driven optimization cycle for a Foundry agent. This skill manages the **multi-step workflow** for a selected agent root and environment: reusing or refreshing `.foundry` cache, auto-creating evaluators, generating test datasets, running batch evals, clustering failures, optimizing prompts, redeploying, and comparing versions. Use this skill instead of calling individual `azure` MCP evaluation tools manually.

## When to Use This Skill

USE FOR: evaluate my agent, run an eval, test my agent, check agent quality, run batch evaluation, analyze eval results, why did my eval fail, cluster failures, improve agent quality, optimize agent prompt, compare agent versions, re-evaluate after changes, set up CI/CD evals, agent monitoring, eval-driven optimization.

> โš ๏ธ **DO NOT manually call** `evaluation_agent_batch_eval_create`, `evaluator_catalog_create`, `evaluation_comparison_create`, or `prompt_optimize` **without reading this skill first.** This skill defines required pre-checks, environment selection, cache reuse, artifact persistence, and multi-step orchestration that the raw tools do not enforce.

## Quick Reference

| Property | Value |
|----------|-------|
| MCP server | `azure` |
| Key MCP tools | `evaluator_catalog_get`, `evaluation_agent_batch_eval_create`, `evaluator_catalog_create`, `evaluation_comparison_create`, `prompt_optimize`, `agent_update` |
| Prerequisite | Agent deployed and running (use [deploy skill](../deploy/deploy.md)) |
| Local cache | `.foundry/agent-metadata.yaml`, `.foundry/evaluators/`, `.foundry/datasets/`, `.foundry/results/` |

## Entry Points

| User Intent | Start At |
|-------------|----------|
| "Deploy and evaluate my agent" | [Step 1: Auto-Setup Evaluators](references/deploy-and-setup.md) (deploy first via [deploy skill](../deploy/deploy.md)) |
| "Agent just deployed" / "Set up evaluation" | [Step 1: Auto-Setup Evaluators](references/deploy-and-setup.md) (skip deploy, run auto-create) |
| "Evaluate my agent" / "Run an eval" | [Step 1: Auto-Setup Evaluators](references/deploy-and-setup.md) first if `.foundry/evaluators/` or `.foundry/datasets/` cache is missing, stale, or the user requests refresh, then [Step 2: Evaluate](references/evaluate-step.md) |
| "Why did my eval fail?" / "Analyze results" | [Step 3: Analyze](references/analyze-results.md) |
| "Improve my agent" / "Optimize prompt" | [Step 4: Optimize](references/optimize-deploy.md) |
| "Compare agent versions" | [Step 5: Compare](references/compare-iterate.md) |
| "Set up CI/CD evals" | [Step 6: CI/CD](references/cicd-monitoring.md) |

> โš ๏ธ **Important:** Before running any evaluation (Step 2), always resolve the selected agent root and environment, then inspect `.foundry/agent-metadata.yaml` plus `.foundry/evaluators/` and `.foundry/datasets/`. If the cache is missing, stale, or the user wants to refresh it, route through [Step 1: Auto-Setup](references/deploy-and-setup.md) first โ€” even if the user only asked to "evaluate."

## Before Starting — Detect Current State

1. Resolve the target agent root and environment from `.foundry/agent-metadata.yaml`.
2. Use `agent_get` and `agent_container_status_get` to verify the environment's agent exists and is running.
3. Inspect the selected environment's `testCases[]` plus cached files under `.foundry/evaluators/` and `.foundry/datasets/`.
4. Use `evaluation_get` to check for existing eval runs.
5. Jump to the appropriate entry point.

## Loop Overview

```text
1. Auto-setup evaluators or refresh .foundry cache for the selected environment
   -> ask: "Run an evaluation to identify optimization opportunities?"
2. Evaluate (batch eval run)
3. Download detailed results
4. Cluster failures by root cause
5. Dive into the category or test case to optimize
6. Optimize prompt
7. Deploy new version (after user sign-off)
8. Re-evaluate (same env + same test case)
9. Compare versions -> decide which to keep
10. Iterate on the next category or finish
11. Prompt: enable CI/CD evals and continuous production monitoring
```

## Behavioral Rules

1. **Keep context visible.** Restate the selected agent root and environment in setup, evaluation, and result summaries.
2. **Reuse cache before regenerating.** Prefer existing `.foundry/evaluators/` and `.foundry/datasets/` when they match the active environment. Ask before refreshing or overwriting them.
3. **Start with P0 test cases.** Run the selected environment's `P0` test cases before broader `P1` or `P2` coverage unless the user explicitly chooses otherwise.
4. **Auto-poll in background.** After creating eval runs or starting containers, poll in a background terminal. Only surface the final result.
5. **Confirm before changes.** Show diff/summary before modifying agent code, refreshing cache, or deploying. Wait for sign-off.
6. **Prompt for next steps.** After each step, present options. Never assume the path forward.
7. **Write scripts to files.** Python scripts go in `scripts/` - no inline code blocks.
8. **Persist eval artifacts.** Save local artifacts to `.foundry/evaluators/`, `.foundry/datasets/`, and `.foundry/results/` for version tracking and comparison.
9. **Use exact eval parameter names.** Use `evaluationId` only on batch-eval create calls that group runs; use `evalId` on `evaluation_get` and `evaluation_comparison_create`; use `evalRunId` for a specific run lookup.
10. **Check existing evaluators before creating new ones.** Always call `evaluator_catalog_get` before proposing or creating evaluators. Present the existing catalog to the user and map existing evaluators to the agent's evaluation needs. Only create a new evaluator when no existing one covers the required dimension. This applies to every workflow that involves evaluator selection - initial setup, re-evaluation, and optimization loops.
11. **Use correct parameters when deleting evaluators.** `evaluator_catalog_delete` requires both `name` (not `evaluatorName`) and `version`. When cleaning up redundant evaluators, always pass the explicit version string. If an evaluator has multiple versions (for example, `v1`, `v2`, `v3`), delete each version individually - there is no "delete all versions" shortcut. Discover version numbers with `evaluator_catalog_get` before attempting deletions.
12. **Use a two-phase evaluator strategy.** Phase 1 is built-in only: `relevance`, `task_adherence`, `intent_resolution`, `indirect_attack`, and `builtin.tool_call_accuracy` when the agent uses tools. Generate seed datasets with `query` and `expected_behavior` so Phase 2 can reuse or create targeted custom evaluators only after the first run exposes gaps.
13. **Account for LLM judge knowledge cutoff.** When the agent uses real-time data sources (web search, Bing Grounding, live APIs), the LLM judge's training cutoff means it cannot verify current facts. Custom evaluators that score factual accuracy or behavioral adherence will produce systematic false negatives - flagging the agent's real-time data as "fabricated" or "beyond knowledge cutoff." Mitigations: (a) instruct the evaluator prompt to accept sourced claims it cannot verify, (b) use `expected_behavior` rubrics that describe the shape of a good answer rather than specific facts, (c) flag suspected knowledge-cutoff false negatives in the failure analysis rather than treating them as real failures.
14. **Show Data Viewer deeplinks (for VS Code runtime only).** Append a Data Viewer deeplink immediately after any reference to a dataset file or evaluation result file in your response. Format: "[Open in Data Viewer](vscode://ms-windows-ai-studio.windows-ai-studio/open_data_viewer?file=<file_path>&source=microsoft-foundry-skill) for details and perform analysis". This applies to files in `.foundry/datasets/`, `.foundry/results/`.

## Two-Phase Evaluator Strategy

| Phase | When | Evaluators | Dataset fields | Goal |
|-------|------|------------|----------------|------|
| Phase 1 - Initial setup | Before the first eval run | <=5 built-in evaluators only: `relevance`, `task_adherence`, `intent_resolution`, `indirect_attack`, plus `builtin.tool_call_accuracy` when the agent uses tools | `query`, `expected_behavior` (plus optional `context`, `ground_truth`) | Establish a fast baseline and identify which failure patterns built-ins can and cannot explain |
| Phase 2 - After analysis | After reviewing the first run's failures and clusters | Reuse existing custom evaluators first; create a new custom evaluator only when the built-in set cannot capture the gap | Reuse `expected_behavior` as a per-query rubric | Turn broad failure signals into targeted, domain-aware scoring |

Phase 1 keeps the first setup fast and comparable across agents. Even though the initial built-in evaluators do not consume `expected_behavior`, include it in every seed dataset row so the same dataset is ready for Phase 2 custom evaluators without regeneration.

When built-in evaluators reveal patterns they cannot fully capture - for example, false negatives from `task_adherence` missing tool-call context or domain-specific quality gaps - first call `evaluator_catalog_get` again to see whether an existing custom evaluator already covers the dimension. Only create a new evaluator when the catalog still lacks the required signal.

Example custom evaluator for Phase 2:

```yaml
name: behavioral_adherence
promptText: |
  Given the query, response, and expected behavior, rate how well
  the response fulfills the expected behavior (1-5).
  ## Query
  {{query}}
  ## Response
  {{response}}
  ## Expected Behavior
  {{expected_behavior}}
```

> 💡 **Tip:** This evaluator scores against the per-query behavioral rubric in `expected_behavior`, not just the agent's global instructions. That usually produces a cleaner signal when broad built-in judges are directionally correct but too coarse for optimization.

## Related Skills

| User Intent | Skill |
|-------------|-------|
| "Analyze production traces" / "Search conversations" / "Find errors in App Insights" | [trace skill](../trace/trace.md) |
| "Debug container issues" / "Container logs" | [troubleshoot skill](../troubleshoot/troubleshoot.md) |
| "Deploy or redeploy agent" | [deploy skill](../deploy/deploy.md) |
foundry-agent/observe/references/
analyze-results.md 6.1 KB
# Steps 3–5 — Download Results, Cluster Failures, Dive Into Category

## Step 3 — Download Results

`evaluation_get` returns run metadata but **not** full per-row output. Write a Python script (save to `scripts/`) to download detailed results using the **Azure AI Projects Python SDK**.

### Prerequisites

```text
pip install "azure-ai-projects>=2.0.0" azure-identity
```

### SDK Client Setup

```python
from azure.identity import DefaultAzureCredential
from azure.ai.projects import AIProjectClient

project_client = AIProjectClient(
    endpoint=project_endpoint,       # e.g. "https://<hub>.services.ai.azure.com/api/projects/<project>"
    credential=DefaultAzureCredential(),
)
# The evals API lives on the OpenAI sub-client, not on AIProjectClient directly
client = project_client.get_openai_client()
```

> โš ๏ธ **Common mistake:** Calling `project_client.evals` directly โ€” the `evals` namespace is on the OpenAI client returned by `get_openai_client()`, not on `AIProjectClient` itself.

### Retrieve Run Status

```python
run = client.evals.runs.retrieve(run_id=run_id, eval_id=eval_id)
print(f"Status: {run.status}  Report: {run.report_url}")
```

### Download Per-Row Output Items

The SDK handles pagination automatically — no manual `has_more` / `after` loop required.

```python
output_items = list(client.evals.runs.output_items.list(run_id=run_id, eval_id=eval_id))
all_items = [item.model_dump() for item in output_items]
```

> 💡 **Tip:** Use `model_dump()` to convert each SDK object to a plain dict for JSON serialization.

### Data Structure

Query/response data lives in `datasource_item.query` and `datasource_item['sample.output_text']`, **not** in `sample.input`/`sample.output` (which are empty arrays). Parse `datasource_item` fields when extracting queries and responses for analysis.

> โš ๏ธ **LLM judge knowledge cutoff:** When evaluating agents that use real-time data sources (web search, Bing Grounding, live APIs), the LLM judge may flag factually correct but temporally recent responses as "fabricated" or "unverifiable" because the judge's training data predates the agent's live results. Check failure reasons for phrases like "cannot verify," "beyond knowledge cutoff," or "no evidence" before treating them as real failures. See Behavioral Rule 13 in `observe.md` for mitigations.

### Custom Evaluator Dual-Entry Parsing

Custom evaluators produce **two** result entries per item in the `results` array:

| Entry | `metric` field | Has score? | Has reason/label/passed? |
|-------|----------------|------------|--------------------------|
| Entry 1 | `"custom_score"` | ✅ numeric score | ❌ null |
| Entry 2 | `"{evaluator_name}"` | ❌ null | ✅ real reason, label, passed |

To get the complete picture, merge both entries:

```python
def extract_evaluator_result(item, evaluator_name):
    """Merge the dual entries for a custom evaluator into one result."""
    score_entry = None
    detail_entry = None
    for r in item.get("results", []):
        metric = r.get("metric", "")
        if metric == "custom_score":
            score_entry = r
        elif metric == evaluator_name:
            detail_entry = r
    if not detail_entry:
        return None
    return {
        "score": score_entry.get("score") if score_entry else None,
        "passed": detail_entry.get("passed"),
        "reason": detail_entry.get("reason"),
        "label": detail_entry.get("label"),
    }
```

> โš ๏ธ **Common mistake:** Reading only the first matching result entry for a custom evaluator gives you the score but null reason (or vice versa). Always merge both entries. Built-in evaluators do **not** have this dual-entry pattern - they produce a single entry with all fields populated.

**Evidence from actual eval run** (item 1, `behavioral_adherence`):

```jsonc
// Entry 1: has score, null reason
{"name": "behavioral_adherence", "metric": "custom_score", "score": 1, "reason": null, "passed": null}

// Entry 2: has reason, null score
{"name": "behavioral_adherence", "metric": "behavioral_adherence", "score": null,
 "reason": "The response provides outdated and fabricated information...", "passed": false}
```
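As a quick sanity check, the two evidence entries above merge into a single record like this (values copied from the evidence):

```python
# Merge the dual evidence entries above into one record.
item = {"results": [
    {"name": "behavioral_adherence", "metric": "custom_score",
     "score": 1, "reason": None, "passed": None},
    {"name": "behavioral_adherence", "metric": "behavioral_adherence",
     "score": None,
     "reason": "The response provides outdated and fabricated information...",
     "passed": False},
]}

score = next(r["score"] for r in item["results"] if r["metric"] == "custom_score")
detail = next(r for r in item["results"] if r["metric"] == "behavioral_adherence")
merged = {"score": score, "passed": detail["passed"], "reason": detail["reason"]}
# merged -> {"score": 1, "passed": False, "reason": "The response provides ..."}
```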

### Persist Results

Save results to `.foundry/results/<environment>/<eval-id>/<run-id>.json` (use `json.dump` with `default=str` for non-serializable fields). Print summary: total items, passed, failed, errored counts.

> โš ๏ธ **Show Data Viewer deeplinks (for VS Code runtime only):** Append a Data Viewer deeplink immediately after reference to an evaluation result file in your response. Format: "[Open in Data Viewer](vscode://ms-windows-ai-studio.windows-ai-studio/open_data_viewer?file=<file_path>&source=microsoft-foundry-skill) for details and perform analysis".

## Step 4 — Cluster Failures by Root Cause

Analyze every row in the results. Group failures into clusters:

| Cluster | Description |
|---------|-------------|
| Incorrect / hallucinated answer | Agent gave a wrong or fabricated response |
| Incomplete answer | Agent missed key parts |
| Tool call failure | Agent failed to invoke or misused a tool |
| Safety / content violation | Flagged by safety evaluators |
| Runtime error | Agent crashed or returned an error |
| Off-topic / refusal | Agent refused or went off-topic |

Produce a prioritized action table:

| Priority | Cluster | Suggested Action |
|----------|---------|------------------|
| P0 | Runtime errors or failing `P0` test cases | Check container logs or fix blockers first |
| P1 | Incorrect answers on key flows | Optimize prompt or tool instructions |
| P2 | Incomplete answers or broader quality gaps | Optimize prompt or expand context |
| P3 | Tool call failures | Fix tool definitions or instructions |
| P4 | Safety violations | Add guardrails to instructions |

**Rule:** Prioritize runtime errors first, then sort by test-case priority (`P0` before `P1` before `P2`) and count × severity.
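The rule above can be sketched as a sort key; the cluster records and severity weights are illustrative assumptions:

```python
# Sketch of the prioritization rule: runtime errors first, then test-case
# priority, then count x severity. Records and weights are illustrative.
clusters = [
    {"cluster": "Incomplete answer", "priority": "P2", "count": 4,
     "severity": 1, "runtime_error": False},
    {"cluster": "Runtime error", "priority": "P1", "count": 1,
     "severity": 3, "runtime_error": True},
    {"cluster": "Incorrect answer", "priority": "P0", "count": 2,
     "severity": 2, "runtime_error": False},
]

PRIORITY_RANK = {"P0": 0, "P1": 1, "P2": 2}

ordered = sorted(
    clusters,
    key=lambda c: (
        not c["runtime_error"],          # runtime errors first
        PRIORITY_RANK[c["priority"]],    # then P0 before P1 before P2
        -(c["count"] * c["severity"]),   # then count x severity, descending
    ),
)
```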

## Step 5 — Dive Into Category

When the user wants to inspect a specific cluster, display the individual rows: test-case ID, input query, the agent's original response, evaluator scores, and failure reason. Let the user confirm which category or test case to optimize.

## Next Steps

After clustering -> proceed to [Step 6: Optimize Prompt](optimize-deploy.md).
cicd-monitoring.md 1.8 KB
# Step 11 — Enable CI/CD Evals & Continuous Monitoring

After confirming the final agent version, prompt with two options:

## Option 1 โ€” CI/CD Evaluations

*"Would you like to add automated evaluations to your CI/CD pipeline so every deployment is evaluated before going live?"*

If yes, generate a GitHub Actions workflow (for example, `.github/workflows/agent-eval.yml`) that:

1. Triggers on push to `main` or on pull request
2. Reads test-case definitions from `.foundry/agent-metadata.yaml`
3. Reads evaluator definitions from `.foundry/evaluators/` and test datasets from `.foundry/datasets/`
4. Runs `evaluation_agent_batch_eval_create` against the newly deployed agent version
5. Fails the workflow if any evaluator score falls below the configured thresholds for the selected environment/test case
6. Posts a summary as a PR comment or workflow annotation

Use repository secrets for the selected environment's project endpoint and Azure credentials. Confirm the workflow file with the user before committing.

## Option 2 โ€” Continuous Production Monitoring

*"Would you like to set up continuous evaluations to monitor your agent's quality in production?"*

If yes, generate a scheduled GitHub Actions workflow (for example, `.github/workflows/agent-eval-scheduled.yml`) that:

1. Runs on a cron schedule (ask the user's preference: daily, weekly, and so on)
2. Evaluates the current production agent version using stored test cases, evaluators, and datasets
3. Saves results to `.foundry/results/<environment>/`
4. Opens a GitHub issue or sends a notification if any score degrades below thresholds

The user may choose one, both, or neither.
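The degradation check in item 4 of the scheduled workflow can be sketched as a simple threshold comparison; the evaluator names, scores, and thresholds are illustrative assumptions:

```python
# Sketch: alert when any evaluator score degrades below its threshold.
# Names and values are illustrative, not real eval results.
thresholds = {"relevance": 4.0, "task_adherence": 3.5}
latest_scores = {"relevance": 4.2, "task_adherence": 3.1}

degraded = {
    name: score
    for name, score in latest_scores.items()
    if score < thresholds.get(name, float("-inf"))
}
should_alert = bool(degraded)  # open an issue / send a notification if True
```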

## Reference

- [Azure AI Foundry Cloud Evaluation](https://learn.microsoft.com/en-us/azure/ai-foundry/how-to/develop/cloud-evaluation)
- [Hosted Agents](https://learn.microsoft.com/en-us/azure/ai-foundry/agents/concepts/hosted-agents)
compare-iterate.md 2.9 KB
# Steps 8–10 — Re-Evaluate, Compare Versions, Iterate

## Step 8 — Re-Evaluate

Use **`evaluation_agent_batch_eval_create`** with the **same `evaluationId`** as the baseline run. This places both runs in the same eval group for comparison. Use the same local test dataset (from `.foundry/datasets/`) and evaluator bundle from the selected environment/test case. Update `agentVersion` to the new version.

> โš ๏ธ **Parameter switch reminder:** Re-evaluation creation uses `evaluationId`, but follow-up calls to `evaluation_get` and `evaluation_comparison_create` must use `evalId`.

> โš ๏ธ **Eval-group immutability:** Reuse the same `evaluationId` only when `evaluatorNames` and thresholds are unchanged. If you add/remove evaluators or change thresholds, create a new evaluation group first, then compare runs within that new group.

Auto-poll for completion in a background terminal (same as [Step 2](evaluate-step.md)).

## Step 9 — Compare Versions

> **Critical:** `displayName` is **required** in the `insightRequest`. Despite the MCP tool schema showing `displayName` as optional (`type: ["string", "null"]`), the API will reject requests without it with a BadRequest error. `state` must be `"NotStarted"`.

### Required Parameters for `evaluation_comparison_create`

| Parameter | Required | Description |
|-----------|----------|-------------|
| `insightRequest.displayName` | ✅ | Human-readable name. **Omitting it causes a BadRequest.** |
| `insightRequest.state` | ✅ | Must be `"NotStarted"` |
| `insightRequest.request.evalId` | ✅ | Eval group ID containing both runs |
| `insightRequest.request.baselineRunId` | ✅ | Run ID of the baseline |
| `insightRequest.request.treatmentRunIds` | ✅ | Array of treatment run IDs |

Use **`evaluation_comparison_create`** with a nested `insightRequest`:

```json
{
  "insightRequest": {
    "displayName": "V1 vs V2 Comparison",
    "state": "NotStarted",
    "request": {
      "type": "EvaluationComparison",
      "evalId": "<eval-group-id>",
      "baselineRunId": "<baseline-run-id>",
      "treatmentRunIds": ["<new-run-id>"]
    }
  }
}
```

> **Important:** Both runs must be in the **same eval group** (same `evaluationId` in Steps 2 and 8), but comparison requests and lookups use `evalId` for that same group identifier. That shared group assumes the evaluator bundle is fixed for all runs in the group.

Then use **`evaluation_comparison_get`** (with the returned `insightId`) to retrieve comparison results. Present a summary showing which version performed better per evaluator, and recommend which version to keep.

## Step 10 — Iterate or Finish

If more categories remain in the prioritized action table (from [Step 4](analyze-results.md)), loop back to **Step 5** (dive into next category) → **Step 6** (optimize) → **Step 7** (deploy) → **Step 8** (re-evaluate) → **Step 9** (compare).

Otherwise, confirm the final agent version with the user, then prompt for [CI/CD evals & monitoring](cicd-monitoring.md).
deploy-and-setup.md 6.9 KB
# Step 1 — Auto-Setup Evaluators & Dataset

> **This step runs automatically after deployment.** If the agent was deployed via the [deploy skill](../../deploy/deploy.md), `.foundry` cache and metadata may already be configured. Check `.foundry/evaluators/`, `.foundry/datasets/`, and `.foundry/agent-metadata.yaml` for existing artifacts before re-creating them.
>
> If the agent is **not yet deployed**, follow the [deploy skill](../../deploy/deploy.md) first. It handles project detection, Dockerfile generation, ACR build, agent creation, container startup, and auto-creates `.foundry` cache after a successful deployment.

## Auto-Create Evaluators & Dataset

> **This step is fully automatic.** After deployment, immediately prepare evaluators and a local test dataset for the selected environment without waiting for the user to request it.

### 1. Read Agent Instructions

Use **`agent_get`** (or local `agent.yaml`) to understand the agent's purpose and capabilities.

### 2. Reuse or Refresh Cache

Inspect `.foundry/evaluators/`, `.foundry/datasets/`, and the selected environment's `testCases[]`.

- **Cache is current** -> reuse it and summarize what is already available.
- **Cache is missing or stale** -> refresh it after confirming with the user.
- **User explicitly asks for refresh** -> rebuild and rewrite only the selected environment's cache.

### 2.5 Discover Existing Evaluators

Use **`evaluator_catalog_get`** with the selected environment's project endpoint to list all evaluators already registered in the project. Display them to the user grouped by type (`custom` vs `built-in`) with name, category, and version. During Phase 1, catalog any promising custom evaluators for later reuse, but keep the first run on the built-in baseline. Only propose creating a new evaluator in Phase 2 when no existing evaluator covers a required dimension.

### 3. Select Evaluators

Follow the [Two-Phase Evaluator Strategy](../observe.md). Phase 1 is built-in only, so do not create a new custom evaluator during the initial setup pass.

Start with <=5 built-in evaluators for the initial eval run so the first pass stays fast:

| Category | Evaluators |
|----------|-----------|
| **Quality (built-in)** | relevance, task_adherence, intent_resolution |
| **Safety (built-in)** | indirect_attack |
| **Tool use (built-in, conditional)** | tool_call_accuracy (use when the agent calls tools; some catalogs label it as `builtin.tool_call_accuracy`) |

After analyzing initial results, suggest additional evaluators (custom or built-in) targeted at specific failure patterns instead of front-loading a broad default set.

### 4. Defer New Custom Evaluators to Phase 2

During the initial setup pass, do not create a new custom evaluator yet. Instead, record which existing custom evaluators from Step 2.5 might be reused later and run the first built-in-only eval. After the first run has been analyzed, return to this step only if the built-in judges still miss an important pattern.

When Phase 2 is needed:

1. Call **`evaluator_catalog_get`** again and reuse an existing custom evaluator if it already covers the gap.
2. Only if the catalog still lacks the required signal, use **`evaluator_catalog_create`** with the selected environment's project endpoint.
3. Prefer evaluators that consume `expected_behavior`, as described in the [Two-Phase Evaluator Strategy](../observe.md), so scoring can follow the per-query rubric instead of only the global agent instructions.

| Parameter | Required | Description |
|-----------|----------|-------------|
| `projectEndpoint` | ✅ | Azure AI Project endpoint |
| `name` | ✅ | For example, `domain_accuracy`, `citation_quality` |
| `category` | ✅ | `quality`, `safety`, or `agents` |
| `scoringType` | ✅ | `ordinal`, `continuous`, or `boolean` |
| `promptText` | ✅* | Template with `{{query}}`, `{{response}}`, and `{{expected_behavior}}` placeholders when behavior-specific scoring is needed |
| `minScore` / `maxScore` | | Default: 1 / 5 |
| `passThreshold` | | Scores >= this value pass |

### 5. Identify LLM-Judge Deployment

Use **`model_deployment_get`** to list the selected project's actual model deployments, then choose one that supports chat completions for quality evaluators. Do **not** assume `gpt-4o` exists in the project. If no deployment supports chat completions, stop the setup flow and explain that quality evaluators need a compatible judge deployment.

### 6. Generate Local Test Dataset

Generate the seed rows directly from the agent's instructions and tool capabilities you already resolved during setup. Do **not** call the identified chat-capable deployment for dataset generation; reserve that deployment for quality evaluators. Save the initial seed file to `.foundry/datasets/<agent-name>-eval-seed-v1.jsonl` with each line containing at minimum `query` and `expected_behavior` fields (optionally `context`, `ground_truth`).

The local filename must start with the selected environment's Foundry agent name (`agentName` in `agent-metadata.yaml`) before adding stage, environment, or version suffixes.

Include `expected_behavior` even though Phase 1 uses built-in evaluators only. That field pre-positions the seed dataset for Phase 2 custom evaluators if the first run reveals gaps that need a per-query behavioral rubric.
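A minimal sketch of writing the seed file under the naming rule above; the agent name and rows are illustrative assumptions:

```python
import json
from pathlib import Path

# Sketch: write seed rows to .foundry/datasets/<agent-name>-eval-seed-v1.jsonl.
# The agent name and rows are illustrative; real rows come from the agent's
# instructions and tool capabilities.
agent_name = "contoso-support"  # must match agentName in agent-metadata.yaml
rows = [
    {"query": "How do I reset my password?",
     "expected_behavior": "Gives the reset steps and links the help article."},
    {"query": "Escalate my ticket.",
     "expected_behavior": "Invokes the escalation tool with the ticket ID."},
]

seed_path = Path(f".foundry/datasets/{agent_name}-eval-seed-v1.jsonl")
seed_path.parent.mkdir(parents=True, exist_ok=True)
seed_path.write_text("".join(json.dumps(r) + "\n" for r in rows))
```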

Use [Generate Seed Evaluation Dataset](../../eval-datasets/references/generate-seed-dataset.md) as the single source of truth for registration. It covers `project_connection_list` with `AzureStorageAccount`, key-based versus AAD upload, `evaluation_dataset_create` with `connectionName`, and saving the returned `datasetUri`.

### 7. Persist Artifacts and Test Cases

```text
.foundry/
  agent-metadata.yaml
  evaluators/
    <name>.yaml
  datasets/
    *.jsonl
  results/
    <environment>/
      <eval-id>/
        <run-id>.json
```

Save evaluator definitions to `.foundry/evaluators/<name>.yaml`, test data to `.foundry/datasets/*.jsonl`, and create or update test cases in `agent-metadata.yaml` with:
- `id`
- `priority` (`P0`, `P1`, `P2`)
- `dataset` (for example, `<agent-name>-eval-seed`)
- `datasetVersion` (for example, `v1`)
- `datasetFile` (for example, `.foundry/datasets/<agent-name>-eval-seed-v1.jsonl`)
- `datasetUri` (returned by `evaluation_dataset_create`)
- tag values for `agent`, `stage`, and `version`
- evaluator names and thresholds

> โš ๏ธ **Show Data Viewer deeplinks (for VS Code runtime only):** Append a Data Viewer deeplink immediately after reference to a dataset file in your response. Format: "[Open in Data Viewer](vscode://ms-windows-ai-studio.windows-ai-studio/open_data_viewer?file=<file_path>&source=microsoft-foundry-skill) for details and perform analysis".

### 8. Prompt User

*"Your agent is deployed and running in the selected environment. The `.foundry` cache now contains evaluators, a local seed dataset, the Foundry dataset registration metadata, and test-case metadata. Would you like to run an evaluation to identify optimization opportunities?"*

If yes -> proceed to [Step 2: Evaluate](evaluate-step.md). If no -> stop.
evaluate-step.md 3.9 KB
# Step 2 — Create Batch Evaluation

## Prerequisites

- Agent deployed and running in the selected environment
- `.foundry/agent-metadata.yaml` loaded for the active agent root
- Evaluators configured (from [Step 1](deploy-and-setup.md) or `.foundry/evaluators/`)
- Local test dataset available (from `.foundry/datasets/`)
- Test case selected from the environment's `testCases[]`

## Run Evaluation

Use **`evaluation_agent_batch_eval_create`** to run the selected test case's evaluators against the selected environment's agent.

### Required Parameters

| Parameter | Description |
|-----------|-------------|
| `projectEndpoint` | Azure AI Project endpoint from `agent-metadata.yaml` |
| `agentName` | Agent name for the selected environment |
| `agentVersion` | Agent version (string, for example `"1"`) |
| `evaluatorNames` | Array of evaluator names from the selected test case |

### Test Data Options

**Preferred — local dataset:** Read JSONL from `.foundry/datasets/` and pass via `inputData` (array of objects with `query` and `expected_behavior`, optionally `context`, `ground_truth`). Always use this when the referenced cache file exists.

**Fallback only — server-side synthetic data:** Set `generateSyntheticData=true` and provide `generationModelDeploymentName`. Only use this when the local cache is missing and the user explicitly requests a refresh-free synthetic run.
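Loading the preferred local dataset into `inputData` can be sketched as follows; the rows here are illustrative, and real files live under `.foundry/datasets/`:

```python
import json

# Sketch: turn a cached JSONL dataset into the inputData array expected by
# evaluation_agent_batch_eval_create. Rows are illustrative assumptions.
jsonl_text = (
    '{"query": "What is Foundry?", "expected_behavior": "Explains the service."}\n'
    '{"query": "List my agents.", "expected_behavior": "Calls the list tool."}\n'
)
# In practice, read jsonl_text from the cached file in .foundry/datasets/.

input_data = [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]
```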

## Resolve Judge Deployment

Before setting `deploymentName`, use **`model_deployment_get`** to list the selected project's actual model deployments. Choose a deployment that supports chat completions and use that deployment name for quality evaluators. Do **not** assume `gpt-4o` exists. If the project has no chat-completions-capable deployment, stop and tell the user quality evaluators cannot run until one is available.

### Additional Parameters

| Parameter | When Needed |
|-----------|-------------|
| `deploymentName` | Required for quality evaluators (the LLM-judge model) |
| `evaluationId` | Pass existing eval group ID to group runs for comparison |
| `evaluationName` | Name for a new evaluation group; include environment and test-case ID |

> **Important:** Use `evaluationId` on `evaluation_agent_batch_eval_create` (not `evalId`) to group runs. Run `P0` test cases first unless the user chooses a broader priority band.

> โš ๏ธ **Eval-group immutability:** Reuse an existing `evaluationId` only when the dataset comparison setup is unchanged for that group: same evaluator list and same thresholds. If evaluator definitions or thresholds change, create a **new** evaluation group instead of adding another run to the old one.

## Parameter Naming Guardrail

These eval tools use similar names for the same evaluation-group identifier. Match the parameter name to the tool exactly:

| Tool | Correct Group Parameter | Notes |
|------|-------------------------|-------|
| `evaluation_agent_batch_eval_create` | `evaluationId` | Reuse the existing group when creating a new run |
| `evaluation_get` | `evalId` | Use with `isRequestForRuns=true` to list runs in one group |
| `evaluation_comparison_create` | `insightRequest.request.evalId` | Comparison requests take `evalId`, not `evaluationId` |

> โš ๏ธ **Common mistake:** `evaluation_get` does **not** accept `evaluationId`. Always switch from `evaluationId` to `evalId` after the run is created.

## Auto-Poll for Completion

Immediately after creating the run, poll **`evaluation_get`** in a background terminal until completion. Use `evalId + isRequestForRuns=true`. The run ID parameter is `evalRunId` (not `runId`).

Only surface the final result when status reaches `completed`, `failed`, or `cancelled`.
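The background polling loop can be sketched generically; the stubbed fetcher below stands in for repeated `evaluation_get` calls (with `evalId` and `isRequestForRuns=true`), which the sketch does not make itself:

```python
import time

# Generic polling sketch: call a status fetcher until it reports a terminal
# state. The fetcher is a stub; in practice it would call evaluation_get.
TERMINAL = {"completed", "failed", "cancelled"}

def poll_until_done(fetch_status, interval_s=0.01, max_polls=100):
    """Return the final status, or 'timeout' if no terminal state is seen."""
    for _ in range(max_polls):
        status = fetch_status()
        if status in TERMINAL:
            return status
        time.sleep(interval_s)
    return "timeout"

statuses = iter(["queued", "running", "running", "completed"])
final = poll_until_done(lambda: next(statuses))
```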

## Next Steps

When evaluation completes -> proceed to [Step 3: Analyze Results](analyze-results.md).

## Reference

- [Azure AI Foundry Cloud Evaluation](https://learn.microsoft.com/en-us/azure/ai-foundry/how-to/develop/cloud-evaluation)
- [Built-in Evaluators](https://learn.microsoft.com/en-us/azure/foundry/concepts/built-in-evaluators)
optimize-deploy.md 1.5 KB
# Steps 6–7 — Optimize Prompt & Deploy New Version

## Step 6 — Optimize Prompt

> ⛔ **Guardrail:** When optimizing after a dataset update, do NOT remove dataset rows or weaken evaluators to recover scores. Score drops on a harder dataset are expected — they mean test coverage improved, not that the agent regressed. Optimize for NEW failure patterns only.

Use **`prompt_optimize`** with:

| Parameter | Required | Description |
|-----------|----------|-------------|
| `developerMessage` | ✅ | Agent's current system prompt / instructions |
| `deploymentName` | ✅ | Model for optimization (e.g., `gpt-4o-mini`) |
| `projectEndpoint` or `foundryAccountResourceId` | ✅ | At least one required |
| `requestedChanges` | | Concise improvement suggestions from cluster analysis |

**Example `requestedChanges`:** *"Be more specific when answering geography questions"*, *"Always cite sources when providing factual claims"*

> Use the optimized prompt returned by the tool. Do NOT manually rewrite.

## Step 7 — Deploy New Version

> **Always confirm before deploying.** Show the user a diff or summary of prompt changes and wait for explicit sign-off.

After approval:

1. Use **`agent_update`** to create a new agent version with the optimized prompt
2. Start the container with **`agent_container_control`** (action: `start`)
3. Poll **`agent_container_status_get`** in a **background terminal** until status is `Running`

## Next Steps

When the new version is running → proceed to [Step 8: Re-Evaluate](compare-iterate.md).
foundry-agent/trace/references/
analyze-failures.md 3.9 KB
# Analyze Failures — Find and Cluster Failing Traces

Identify failing agent traces, group them by root cause, and produce a prioritized action table.

## Step 1 — Find Failing Traces

> โš ๏ธ **Hosted agents:** `gen_ai.agent.name` on `dependencies` holds the **code-level class name** (e.g., `BingSearchAgent`), NOT the Foundry agent name. To filter by Foundry name, use the [Hosted Agent Variant](#hosted-agent-variant--failures) below.

```kql
dependencies
| where timestamp > ago(24h)
| where success == false or toint(resultCode) >= 400
| extend
    operation = tostring(customDimensions["gen_ai.operation.name"]),
    errorType = tostring(customDimensions["error.type"]),
    model = tostring(customDimensions["gen_ai.request.model"]),
    agentName = tostring(customDimensions["gen_ai.agent.name"]),
    conversationId = tostring(customDimensions["gen_ai.conversation.id"])
| project timestamp, name, duration, resultCode, errorType, operation, model,
    agentName, conversationId, operation_Id, id
| order by timestamp desc
| take 100
```

## Step 2 — Cluster by Error Type

```kql
dependencies
| where timestamp > ago(24h)
| where success == false or toint(resultCode) >= 400
| extend
    errorType = tostring(customDimensions["error.type"]),
    operation = tostring(customDimensions["gen_ai.operation.name"])
| summarize
    count = count(),
    firstSeen = min(timestamp),
    lastSeen = max(timestamp),
    avgDuration = avg(duration),
    sampleOperationId = take_any(operation_Id)
  by errorType, operation, resultCode
| order by count desc
```

## Step 3 — Prioritized Action Table

Present results as:

| Priority | Error Type | Operation | Count | Result Code | Suggested Action |
|----------|-----------|-----------|-------|-------------|-----------------|
| P0 | timeout | invoke_agent | 15 | 504 | Check agent container health, increase timeout |
| P1 | rate_limited | chat | 8 | 429 | Check quota, add retry logic |
| P2 | content_filter | chat | 5 | 400 | Review prompt for policy violations |
| P3 | tool_error | execute_tool | 3 | 500 | Check tool implementation and permissions |

**Prioritization:** P0 = highest count or most severe (5xx), then by count × recency.
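
One reading of that rule, as a post-processing sketch over the Step 2 cluster rows (field names match the summarize output; the exact tie-break is a judgment call):

```python
def prioritize_clusters(clusters):
    """Rank failure clusters: highest count first, 5xx codes breaking ties.

    Each cluster is a dict with "resultCode" and "count", as produced by
    the Step 2 cluster query.
    """
    def sort_key(cluster):
        is_server_error = str(cluster["resultCode"]).startswith("5")
        return (-cluster["count"], not is_server_error)

    ranked = sorted(clusters, key=sort_key)
    return [dict(c, priority=f"P{i}") for i, c in enumerate(ranked)]
```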

## Step 4 — Drill Into Specific Failure

When the user selects a cluster, show individual failing traces:

```kql
dependencies
| where timestamp > ago(24h)
| where success == false
| where customDimensions["error.type"] == "<selected_error_type>"
| where customDimensions["gen_ai.operation.name"] == "<selected_operation>"
| project timestamp, name, duration, resultCode,
    conversationId = tostring(customDimensions["gen_ai.conversation.id"]),
    responseId = tostring(customDimensions["gen_ai.response.id"]),
    operation_Id
| order by timestamp desc
| take 20
```

Also check `exceptions` table for stack traces:

```kql
exceptions
| where timestamp > ago(24h)
| where operation_Id in ("<operation_id_1>", "<operation_id_2>")
| project timestamp, type, message, outerMessage, details, operation_Id
| order by timestamp desc
```

Offer to view the full conversation for any trace via [Conversation Detail](conversation-detail.md).

## Hosted Agent Variant — Failures

For hosted agents, the Foundry agent name lives on `requests`, not `dependencies`. Use a two-step join:

```kql
let reqIds = requests
| where timestamp > ago(24h)
| where customDimensions["gen_ai.agent.name"] == "<foundry-agent-name>"
| distinct id;
dependencies
| where timestamp > ago(24h)
| where operation_ParentId in (reqIds)
| where success == false or toint(resultCode) >= 400
| extend
    operation = tostring(customDimensions["gen_ai.operation.name"]),
    errorType = tostring(customDimensions["error.type"]),
    model = tostring(customDimensions["gen_ai.request.model"]),
    conversationId = tostring(customDimensions["gen_ai.conversation.id"])
| project timestamp, name, duration, resultCode, errorType, operation, model,
    conversationId, operation_ParentId, operation_Id
| order by timestamp desc
| take 100
```
analyze-latency.md 3.8 KB
# Analyze Latency — Find and Diagnose Slow Traces

Identify slow agent traces, find bottleneck spans, and correlate with token usage.

## Step 1 — Find Slow Conversations

> โš ๏ธ **Hosted agents:** `gen_ai.agent.name` on `dependencies` holds the **code-level class name** (e.g., `BingSearchAgent`), NOT the Foundry agent name. To scope by Foundry name, use the [Hosted Agent Variant](#hosted-agent-variant--latency) below.

```kql
dependencies
| where timestamp > ago(24h)
| where customDimensions["gen_ai.operation.name"] == "invoke_agent"
| project timestamp, duration, success,
    agentName = tostring(customDimensions["gen_ai.agent.name"]),
    conversationId = tostring(customDimensions["gen_ai.conversation.id"]),
    operation_Id
| summarize
    totalDuration = sum(duration),
    spanCount = count(),
    hasErrors = countif(success == false) > 0
  by conversationId, operation_Id
| where totalDuration > 5000
| order by totalDuration desc
| take 50
```

> **Default threshold:** 5 seconds. Ask the user for their latency threshold if not specified.

## Step 2 — Latency Distribution (P50/P95/P99)

```kql
dependencies
| where timestamp > ago(24h)
| where customDimensions["gen_ai.operation.name"] in ("chat", "invoke_agent")
| summarize
    p50 = percentile(duration, 50),
    p95 = percentile(duration, 95),
    p99 = percentile(duration, 99),
    avg = avg(duration),
    count = count()
  by operation = tostring(customDimensions["gen_ai.operation.name"]),
     model = tostring(customDimensions["gen_ai.request.model"])
| order by p95 desc
```

Present as:

| Operation | Model | P50 (ms) | P95 (ms) | P99 (ms) | Avg (ms) | Count |
|-----------|-------|---------|---------|---------|---------|-------|

## Step 3 — Bottleneck Breakdown

For a specific slow conversation, break down time spent per span type:

```kql
dependencies
| where operation_Id == "<operation_id>"
| extend operation = tostring(customDimensions["gen_ai.operation.name"])
| summarize
    totalDuration = sum(duration),
    spanCount = count(),
    avgDuration = avg(duration)
  by operation, name
| order by totalDuration desc
```

Common bottleneck patterns:
- **`chat` spans dominate** → LLM inference is slow (consider smaller model or caching)
- **`execute_tool` spans dominate** → Tool execution is slow (optimize tool implementation)
- **`invoke_agent` has long gaps** → Orchestration overhead (check agent framework)
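
These patterns can be applied mechanically to the breakdown rows. A simple heuristic sketch (the suggestions mirror the list above; field names match the Step 3 query output):

```python
SUGGESTIONS = {
    "chat": "LLM inference is slow: consider a smaller model or caching",
    "execute_tool": "Tool execution is slow: optimize the tool implementation",
    "invoke_agent": "Orchestration overhead: check the agent framework",
}

def dominant_bottleneck(breakdown):
    """Return the span type consuming the most total time, plus a suggestion."""
    top = max(breakdown, key=lambda row: row["totalDuration"])
    hint = SUGGESTIONS.get(top["operation"], "Inspect this span type manually")
    return top["operation"], hint
```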

## Step 4 — Token Usage vs Latency Correlation

```kql
dependencies
| where timestamp > ago(24h)
| where customDimensions["gen_ai.operation.name"] == "chat"
| extend
    inputTokens = toint(customDimensions["gen_ai.usage.input_tokens"]),
    outputTokens = toint(customDimensions["gen_ai.usage.output_tokens"])
| where isnotempty(inputTokens)
| project duration, inputTokens, outputTokens,
    model = tostring(customDimensions["gen_ai.request.model"]),
    operation_Id
| order by duration desc
| take 100
```

High token counts often correlate with high latency. If confirmed, suggest:
- Reduce system prompt length
- Limit conversation history window
- Use a faster model for simpler queries

## Hosted Agent Variant — Latency

For hosted agents, scope by Foundry agent name via `requests` then join to `dependencies`:

```kql
let reqIds = requests
| where timestamp > ago(24h)
| where customDimensions["gen_ai.agent.name"] == "<foundry-agent-name>"
| distinct id;
dependencies
| where timestamp > ago(24h)
| where operation_ParentId in (reqIds)
| where customDimensions["gen_ai.operation.name"] in ("chat", "invoke_agent")
| summarize
    p50 = percentile(duration, 50),
    p95 = percentile(duration, 95),
    p99 = percentile(duration, 99),
    avg = avg(duration),
    count = count()
  by operation = tostring(customDimensions["gen_ai.operation.name"]),
     model = tostring(customDimensions["gen_ai.request.model"])
| order by p95 desc
```
conversation-detail.md 3.6 KB
# Conversation Detail — Reconstruct Full Span Tree

Reconstruct the complete span tree for a single conversation to see exactly what happened: every LLM call, tool execution, and agent invocation with timing, tokens, and errors.

## Step 1 — Fetch All Spans for a Conversation

Use `operation_Id` (trace ID) to get all spans in a single request:

```kql
dependencies
| where operation_Id == "<operation_id>"
| project timestamp, name, duration, resultCode, success,
    spanId = id,
    parentSpanId = operation_ParentId,
    operation = tostring(customDimensions["gen_ai.operation.name"]),
    model = tostring(customDimensions["gen_ai.request.model"]),
    responseModel = tostring(customDimensions["gen_ai.response.model"]),
    inputTokens = toint(customDimensions["gen_ai.usage.input_tokens"]),
    outputTokens = toint(customDimensions["gen_ai.usage.output_tokens"]),
    responseId = tostring(customDimensions["gen_ai.response.id"]),
    finishReason = tostring(customDimensions["gen_ai.response.finish_reasons"]),
    errorType = tostring(customDimensions["error.type"]),
    toolName = tostring(customDimensions["gen_ai.tool.name"]),
    toolCallId = tostring(customDimensions["gen_ai.tool.call.id"])
| order by timestamp asc
```

Also fetch the parent request:

```kql
requests
| where operation_Id == "<operation_id>"
| project timestamp, name, duration, resultCode, success, id, operation_ParentId
```

## Step 2 — Build Span Tree

Use `spanId` and `parentSpanId` to reconstruct the hierarchy:

```
invoke_agent (root) ─── 4200ms
├── chat (LLM call #1) ─── 1800ms, gpt-4o, 450→120 tokens
│   └── [output: "Let me check the weather..."]
├── execute_tool (get_weather) [tool: remote_functions.weather_api] ─── 200ms
│   └── [result: "rainy, 57°F"]
├── chat (LLM call #2) ─── 1500ms, gpt-4o, 620→85 tokens
│   └── [output: "The weather in Paris is rainy, 57°F"]
└── [total: 450+620=1070 input, 120+85=205 output tokens]
```

Present as an indented tree with:
- **Operation type** and name
- **Duration** (highlight if > P95 for that operation type)
- **Model** and token counts (for chat operations)
- **Error type** and result code (if failed, highlight in red)
- **Finish reason** (stop, length, content_filter, tool_calls)
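
The reconstruction itself is a straightforward parent-child walk. A minimal sketch over the Step 1 rows (field names as projected there; the per-line formatting is illustrative):

```python
def render_span_tree(spans, parent_id=None, indent=0):
    """Render Step 1 rows as an indented tree using spanId/parentSpanId.

    Spans whose parentSpanId equals parent_id are emitted at this level,
    ordered by timestamp; their children recurse one level deeper.
    """
    lines = []
    level = sorted(
        (s for s in spans if s.get("parentSpanId") == parent_id),
        key=lambda s: s["timestamp"],
    )
    for span in level:
        lines.append(f"{'  ' * indent}{span['operation']} {span['name']} ({span['duration']}ms)")
        lines.extend(render_span_tree(spans, span["spanId"], indent + 1))
    return lines
```

Pass the root request's span ID (from the `requests` query) as `parent_id` when the dependency spans hang off the incoming request rather than off an empty parent.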

## Step 3 — Extract Conversation Content from invoke_agent Spans

The full input/output content lives on `invoke_agent` dependency spans in `gen_ai.input.messages` and `gen_ai.output.messages`. These JSON arrays contain the complete conversation (system prompt, user query, assistant response):

```kql
dependencies
| where operation_Id == "<operation_id>"
| where customDimensions["gen_ai.operation.name"] == "invoke_agent"
| project timestamp,
    inputMessages = tostring(customDimensions["gen_ai.input.messages"]),
    outputMessages = tostring(customDimensions["gen_ai.output.messages"])
| order by timestamp asc
```

Message structure: `[{"role": "user", "parts": [{"type": "text", "content": "..."}]}]`
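
Flattening that structure into plain role/text pairs is a short parsing step (a sketch assuming exactly the documented shape; non-text parts are skipped):

```python
import json

def extract_messages(messages_json):
    """Parse gen_ai.input/output.messages JSON into (role, text) pairs."""
    pairs = []
    for msg in json.loads(messages_json):
        text = " ".join(
            part["content"]
            for part in msg.get("parts", [])
            if part.get("type") == "text"
        )
        pairs.append((msg.get("role"), text))
    return pairs
```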

Also check the `traces` table for additional GenAI log events:

```kql
traces
| where operation_Id == "<operation_id>"
| where message contains "gen_ai"
| project timestamp, message, customDimensions
| order by timestamp asc
```

## Step 4 — Check for Exceptions

```kql
exceptions
| where operation_Id == "<operation_id>"
| project timestamp, type, message, outerMessage,
    details = parse_json(details)
| order by timestamp asc
```

Present exceptions inline in the span tree at their position in the timeline.

## Step 5 — Fetch Evaluation Results

See [Eval Correlation](eval-correlation.md) for the full workflow to look up evaluation scores by response ID or conversation ID. Use `gen_ai.response.id` values from Step 1 spans to correlate.
eval-correlation.md 2.5 KB
# Eval Correlation — Find Evaluation Results by Response or Conversation ID

Look up evaluation scores for a specific agent response using App Insights.

> **IMPORTANT:** The Foundry evaluation API does NOT support querying by response ID or conversation ID. App Insights `customEvents` is the ONLY way to correlate eval scores to specific responses. Always use this KQL approach when the user asks for eval results for a specific response or conversation.

## Prerequisites

- App Insights resource resolved (see [trace.md](../trace.md) Before Starting)
- A response ID (`gen_ai.response.id`) or conversation ID (`gen_ai.conversation.id`) from a previous trace query

## Search by Response ID

```kql
customEvents
| where timestamp > ago(30d)
| where name == "gen_ai.evaluation.result"
| where customDimensions["gen_ai.response.id"] == "<response_id>"
| extend
    evalName = tostring(customDimensions["gen_ai.evaluation.name"]),
    score = todouble(customDimensions["gen_ai.evaluation.score.value"]),
    label = tostring(customDimensions["gen_ai.evaluation.score.label"]),
    explanation = tostring(customDimensions["gen_ai.evaluation.explanation"]),
    responseId = tostring(customDimensions["gen_ai.response.id"]),
    conversationId = tostring(customDimensions["gen_ai.conversation.id"])
| project timestamp, evalName, score, label, explanation, responseId, conversationId
| order by evalName asc
```

## Search by Conversation ID

```kql
customEvents
| where timestamp > ago(30d)
| where name == "gen_ai.evaluation.result"
| where customDimensions["gen_ai.conversation.id"] == "<conversation_id>"
| extend
    evalName = tostring(customDimensions["gen_ai.evaluation.name"]),
    score = todouble(customDimensions["gen_ai.evaluation.score.value"]),
    label = tostring(customDimensions["gen_ai.evaluation.score.label"]),
    explanation = tostring(customDimensions["gen_ai.evaluation.explanation"]),
    responseId = tostring(customDimensions["gen_ai.response.id"])
| project timestamp, evalName, score, label, explanation, responseId
| order by responseId asc, evalName asc
```

## Present Results

Show eval scores as a table:

| Evaluator | Score | Label | Explanation |
|-----------|-------|-------|-------------|
| coherence | 5.0 | pass | Response is well-structured... |
| fluency | 4.0 | pass | Natural language flow... |
| relevance | 2.0 | fail | Response doesn't address... |

When showing alongside a span tree (see [Conversation Detail](conversation-detail.md)), attach eval scores to the span whose `gen_ai.response.id` matches.
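
That attachment step can be sketched as a dictionary join on the response ID (field names follow the query projections above; this is illustrative glue, not a tool API):

```python
def attach_evals_to_spans(spans, eval_rows):
    """Attach eval rows to each span whose responseId matches."""
    by_response = {}
    for row in eval_rows:
        by_response.setdefault(row["responseId"], []).append(row)
    for span in spans:
        span["evals"] = by_response.get(span.get("responseId"), [])
    return spans
```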
kql-templates.md 10.5 KB
# KQL Templates — GenAI Trace Query Reference

Ready-to-use KQL templates for querying GenAI OpenTelemetry traces in Application Insights.

**Table of Contents:** [App Insights Table Mapping](#app-insights-table-mapping) · [Key GenAI OTel Attributes](#key-genai-otel-attributes) · [Span Correlation](#span-correlation) · [Hosted Agent Attributes](#hosted-agent-attributes) · [Response ID Formats](#response-id-formats) · [Common Query Templates](#common-query-templates) · [OTel Reference Links](#otel-reference-links)

## App Insights Table Mapping

| App Insights Table | GenAI Data |
|-------------------|------------|
| `dependencies` | GenAI spans: LLM inference (`chat`), tool execution (`execute_tool`), agent invocation (`invoke_agent`) |
| `requests` | Incoming HTTP requests to the agent endpoint. For hosted agents, also carries `gen_ai.agent.name` (Foundry name) and `azure.ai.agentserver.*` attributes — **preferred entry point** for agent-name filtering |
| `customEvents` | GenAI evaluation results (`gen_ai.evaluation.result`) — scores, labels, explanations |
| `traces` | Log events, including GenAI events (input/output messages) |
| `exceptions` | Error details with stack traces |

## Key GenAI OTel Attributes

Stored in `customDimensions` on `dependencies` spans:

| Attribute | Description | Example |
|-----------|-------------|---------|
| `gen_ai.operation.name` | Operation type | `chat`, `invoke_agent`, `execute_tool`, `create_agent` |
| `gen_ai.conversation.id` | Conversation/session ID | `conv_5j66UpCpwteGg4YSxUnt7lPY` |
| `gen_ai.response.id` | Response ID | `chatcmpl-123` |
| `gen_ai.agent.name` | Agent name | `my-support-agent` |
| `gen_ai.agent.id` | Agent identifier | `asst_abc123` |
| `gen_ai.request.model` | Requested model | `gpt-4o` |
| `gen_ai.response.model` | Actual model used | `gpt-4o-2024-05-13` |
| `gen_ai.usage.input_tokens` | Input token count | `450` |
| `gen_ai.usage.output_tokens` | Output token count | `120` |
| `gen_ai.response.finish_reasons` | Stop reasons | `["stop"]`, `["tool_calls"]` |
| `error.type` | Error classification | `timeout`, `rate_limited`, `content_filter` |
| `gen_ai.provider.name` | Provider | `azure.ai.openai`, `openai` |
| `gen_ai.input.messages` | Full input messages (JSON array) — on `invoke_agent` spans | `[{"role":"user","parts":[{"type":"text","content":"..."}]}]` |
| `gen_ai.output.messages` | Full output messages (JSON array) — on `invoke_agent` spans | `[{"role":"assistant","parts":[{"type":"text","content":"..."}]}]` |

Stored in `customDimensions` on `customEvents` (name == `gen_ai.evaluation.result`):

| Attribute | Description | Example |
|-----------|-------------|---------|
| `gen_ai.evaluation.name` | Evaluator name | `Relevance`, `IntentResolution` |
| `gen_ai.evaluation.score.value` | Numeric score | `4.0` |
| `gen_ai.evaluation.score.label` | Human-readable label | `pass`, `fail`, `relevant` |
| `gen_ai.evaluation.explanation` | Free-form explanation | `"Response lacks detail..."` |
| `gen_ai.response.id` | Correlates to the evaluated span | `chatcmpl-123` |
| `gen_ai.conversation.id` | Correlates to conversation | `conv_5j66...` |

> **Correlation:** Eval results do NOT link to spans via the `id`/`operation_ParentId` hierarchy. Use `gen_ai.conversation.id` and/or `gen_ai.response.id` to join with `dependencies` spans.

## Span Correlation

| Field | Purpose |
|-------|---------|
| `operation_Id` | Trace ID — groups all spans in one request |
| `id` | Span ID — unique identifier for this span |
| `operation_ParentId` | Parent span ID — use with `id` to build span trees |

### Operation_Id Join (requests → dependencies)

Use `requests` as the hosted-agent entry point, then carry `operation_Id` forward as the trace key when joining into `dependencies`, `traces`, or `customEvents`:

```kql
let agentRequests = materialize(
    requests
| where timestamp > ago(7d)
| extend
    foundryAgentName = coalesce(
        tostring(customDimensions["gen_ai.agent.name"]),
        tostring(customDimensions["azure.ai.agentserver.agent_name"])
    ),
    agentId = tostring(customDimensions["gen_ai.agent.id"]),
    agentNameFromId = tostring(split(agentId, ":")[0]),
    agentVersion = iff(agentId contains ":", tostring(split(agentId, ":")[1]), ""),
    conversationId = coalesce(
        tostring(customDimensions["gen_ai.conversation.id"]),
        tostring(customDimensions["azure.ai.agentserver.conversation_id"]),
        operation_Id
    )
| where foundryAgentName == "<foundry-agent-name>"
    or agentNameFromId == "<foundry-agent-name>"
| project operation_Id, conversationId, agentVersion
);
dependencies
| where timestamp > ago(7d)
| where isnotempty(customDimensions["gen_ai.operation.name"])
| join kind=inner agentRequests on operation_Id
| extend
    operation = tostring(customDimensions["gen_ai.operation.name"]),
    model = tostring(customDimensions["gen_ai.request.model"])
| project timestamp, duration, success, operation, model, conversationId, agentVersion, operation_Id
| order by timestamp desc
```

## Hosted Agent Attributes

Stored in `customDimensions` on **both `requests` and `traces`** tables (NOT on `dependencies` spans):

| Attribute | Description | Example |
|-----------|-------------|---------|
| `azure.ai.agentserver.agent_name` | Hosted agent name | `hosted-agent-022-001` |
| `azure.ai.agentserver.agent_id` | Internal agent ID | `code-asst-xmwokux85uqc7fodxejaxa` |
| `azure.ai.agentserver.conversation_id` | Conversation ID | `conv_d7ab624de92d...` |
| `azure.ai.agentserver.response_id` | Response ID (caresp format) | `caresp_d7ab624de92d...` |

> **Important:** Use `requests` as the preferred entry point for agent-name filtering — it has both `azure.ai.agentserver.agent_name` and `gen_ai.agent.name` with the Foundry-level name. To reach downstream spans and related telemetry, carry `operation_Id` forward from the filtered request set and join other tables on that trace key.

> 💡 **Version enrichment:** Some hosted-agent `requests` telemetry emits `gen_ai.agent.id` in `<foundry-agent-name>:<version>` format. When that delimiter is present, split on `:` to recover `agentVersion`; if it is absent, keep filtering on the requests-scoped name fields and leave version blank.
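
The same split logic, outside of KQL (equivalent to the `split(agentId, ":")` expressions used in the query templates):

```python
def split_agent_id(agent_id):
    """Split a gen_ai.agent.id of the form "<name>:<version>".

    Returns (name, version); version is "" when the delimiter is absent.
    """
    name, _, version = (agent_id or "").partition(":")
    return name, version
```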

> โš ๏ธ **`gen_ai.agent.name` means different things on different tables:**
> - On `requests`: the **Foundry agent name** (user-visible) โ†’ e.g., `hosted-agent-022-001`
> - On `dependencies`: the **code-level class name** โ†’ e.g., `BingSearchAgent`
>
> **Always start from `requests`** when filtering by the Foundry agent name the user knows.

## Response ID Formats

| Agent Type | Prefix | Example |
|------------|--------|---------|
| Hosted agent (AgentServer) | `caresp_` | `caresp_d7ab624de92da637008Rhr4U4E1y9FSE...` |
| Prompt agent (Foundry Responses API) | `resp_` | `resp_4e2f8b016b5a0dad00697bd3c4c1b881...` |
| Azure OpenAI chat completions | `chatcmpl-` | `chatcmpl-abc123def456` |

When searching by response ID, use the appropriate prefix to narrow results. The `gen_ai.response.id` attribute appears on `dependencies` spans (for `chat` operations) and in `customEvents` (for evaluation results).
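
Prefix detection is trivial to automate when routing a user-supplied ID (a sketch based directly on the table above):

```python
PREFIX_TO_TYPE = {
    "caresp_": "hosted agent (AgentServer)",
    "resp_": "prompt agent (Foundry Responses API)",
    "chatcmpl-": "Azure OpenAI chat completions",
}

def detect_agent_type(response_id):
    """Classify a gen_ai.response.id by its documented prefix."""
    for prefix, agent_type in PREFIX_TO_TYPE.items():
        if response_id.startswith(prefix):
            return agent_type
    return "unknown"
```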

## Common Query Templates

### Overview — Conversations in last 24h
```kql
dependencies
| where timestamp > ago(24h)
| where isnotempty(customDimensions["gen_ai.operation.name"])
| summarize
    spanCount = count(),
    errorCount = countif(success == false),
    avgDuration = avg(duration),
    totalInputTokens = sum(toint(customDimensions["gen_ai.usage.input_tokens"])),
    totalOutputTokens = sum(toint(customDimensions["gen_ai.usage.output_tokens"]))
  by bin(timestamp, 1h)
| order by timestamp desc
```

### Error Rate by Operation
```kql
dependencies
| where timestamp > ago(24h)
| where isnotempty(customDimensions["gen_ai.operation.name"])
| summarize
    total = count(),
    errors = countif(success == false),
    errorRate = round(100.0 * countif(success == false) / count(), 1)
  by operation = tostring(customDimensions["gen_ai.operation.name"])
| order by errorRate desc
```

### Token Usage by Model
```kql
dependencies
| where timestamp > ago(24h)
| where customDimensions["gen_ai.operation.name"] == "chat"
| summarize
    calls = count(),
    totalInput = sum(toint(customDimensions["gen_ai.usage.input_tokens"])),
    totalOutput = sum(toint(customDimensions["gen_ai.usage.output_tokens"])),
    avgInput = avg(todouble(customDimensions["gen_ai.usage.input_tokens"])),
    avgOutput = avg(todouble(customDimensions["gen_ai.usage.output_tokens"]))
  by model = tostring(customDimensions["gen_ai.request.model"])
| order by totalInput desc
```

### Tool Call Details
```kql
dependencies
| where operation_Id == "<operation_id>"
| where customDimensions["gen_ai.operation.name"] == "execute_tool"
| project timestamp, duration, success,
    toolName = tostring(customDimensions["gen_ai.tool.name"]),
    toolType = tostring(customDimensions["gen_ai.tool.type"]),
    toolCallId = tostring(customDimensions["gen_ai.tool.call.id"]),
    toolArgs = tostring(customDimensions["gen_ai.tool.call.arguments"]),
    toolResult = tostring(customDimensions["gen_ai.tool.call.result"])
| order by timestamp asc
```

Key tool attributes:

| Attribute | Description | Example |
|-----------|-------------|---------|
| `gen_ai.tool.name` | Tool function name | `remote_functions.bing_grounding`, `python` |
| `gen_ai.tool.type` | Tool type | `extension`, `function` |
| `gen_ai.tool.call.id` | Unique call ID | `call_db64aa6a004a...` |
| `gen_ai.tool.call.arguments` | JSON arguments passed | `{"query": "latest AI news"}` |
| `gen_ai.tool.call.result` | Tool output (may be truncated) | `<<ImageDisplayed>>` |

### Evaluation Results by Conversation
```kql
customEvents
| where timestamp > ago(24h)
| where name == "gen_ai.evaluation.result"
| extend
    evalName = tostring(customDimensions["gen_ai.evaluation.name"]),
    score = todouble(customDimensions["gen_ai.evaluation.score.value"]),
    label = tostring(customDimensions["gen_ai.evaluation.score.label"]),
    conversationId = tostring(customDimensions["gen_ai.conversation.id"])
| summarize
    evalCount = count(),
    avgScore = avg(score),
    failCount = countif(label == "fail" or label == "not_relevant" or label == "incorrect"),
    evaluators = make_set(evalName)
  by conversationId
| order by failCount desc
```

> For detailed eval queries by response ID or conversation ID, see [Eval Correlation](eval-correlation.md).

## OTel Reference Links

- [GenAI Spans](https://opentelemetry.io/docs/specs/semconv/gen-ai/gen-ai-spans/)
- [GenAI Agent Spans](https://opentelemetry.io/docs/specs/semconv/gen-ai/gen-ai-agent-spans/)
- [GenAI Events](https://opentelemetry.io/docs/specs/semconv/gen-ai/gen-ai-events/)
- [GenAI Metrics](https://opentelemetry.io/docs/specs/semconv/gen-ai/gen-ai-metrics/)
search-traces.md 6.7 KB
# Search Traces — Conversation-Level Search

Search agent traces at the conversation level. Returns summaries grouped by conversation or operation, not individual spans.

## Prerequisites

- App Insights resource resolved (see [trace.md](../trace.md) Before Starting)
- Selected agent root and environment confirmed from `.foundry/agent-metadata.yaml`
- Time range confirmed with user (default: last 24 hours)

## Search by Conversation ID

Keep the selected environment visible in the summary, and add the selected agent name or environment tag filters when the telemetry emits them.

```kql
dependencies
| where timestamp > ago(24h)
| where customDimensions["gen_ai.conversation.id"] == "<conversation_id>"
| project timestamp, name, duration, resultCode, success,
    operation = tostring(customDimensions["gen_ai.operation.name"]),
    model = tostring(customDimensions["gen_ai.request.model"]),
    inputTokens = toint(customDimensions["gen_ai.usage.input_tokens"]),
    outputTokens = toint(customDimensions["gen_ai.usage.output_tokens"]),
    operation_Id, id, operation_ParentId
| order by timestamp asc
```

## Search by Response ID

Auto-detect the response ID format to determine agent type:
- `caresp_...` → Hosted agent (AgentServer)
- `resp_...` → Prompt agent (Foundry Responses API)
- `chatcmpl-...` → Azure OpenAI chat completions

```kql
dependencies
| where timestamp > ago(24h)
| where customDimensions["gen_ai.response.id"] == "<response_id>"
| project timestamp, name, duration, resultCode, success,
    operation = tostring(customDimensions["gen_ai.operation.name"]),
    model = tostring(customDimensions["gen_ai.request.model"]),
    inputTokens = toint(customDimensions["gen_ai.usage.input_tokens"]),
    outputTokens = toint(customDimensions["gen_ai.usage.output_tokens"]),
    operation_Id, id, operation_ParentId
```

Then drill into the full conversation:

> โš ๏ธ **STOP โ€” read [Conversation Detail](conversation-detail.md) before writing your own drill-down query.** It contains the correct span tree reconstruction logic, event/exception queries, and eval correlation steps.

Quick drill-down using the `operation_Id` from above:

```kql
dependencies
| where operation_Id == "<operation_id_from_above>"
| project timestamp, name, duration, resultCode, success,
    spanId = id, parentSpanId = operation_ParentId,
    operation = tostring(customDimensions["gen_ai.operation.name"]),
    model = tostring(customDimensions["gen_ai.request.model"]),
    inputTokens = toint(customDimensions["gen_ai.usage.input_tokens"]),
    outputTokens = toint(customDimensions["gen_ai.usage.output_tokens"]),
    responseId = tostring(customDimensions["gen_ai.response.id"]),
    errorType = tostring(customDimensions["error.type"]),
    toolName = tostring(customDimensions["gen_ai.tool.name"])
| order by timestamp asc
```

Also check for eval results: see [Eval Correlation](eval-correlation.md).

## Search by Agent Name

> **Note:** For hosted agents, `gen_ai.agent.name` in `dependencies` refers to *sub-agents* (e.g., `BingSearchAgent`), not the top-level hosted agent. See "Search by Hosted Agent Name" below.

> 💡 **Hosted-agent versioning:** If you need the deployed version, use the hosted-agent pattern below and parse `gen_ai.agent.id` when it is emitted in `<agent-name>:<version>` format.

```kql
dependencies
| where timestamp > ago(24h)
| where customDimensions["gen_ai.agent.name"] == "<agent_name>"
| summarize
    startTime = min(timestamp),
    endTime = max(timestamp),
    totalDuration = max(timestamp) - min(timestamp),
    spanCount = count(),
    errorCount = countif(success == false),
    totalInputTokens = sum(toint(customDimensions["gen_ai.usage.input_tokens"])),
    totalOutputTokens = sum(toint(customDimensions["gen_ai.usage.output_tokens"]))
  by conversationId = tostring(customDimensions["gen_ai.conversation.id"]),
     operation_Id
| order by startTime desc
| take 50
```

## Search by Hosted Agent Name

For hosted agents, the Foundry agent name (e.g., `hosted-agent-022-001`) appears on `requests` and `traces` — NOT on `dependencies`. Use `requests` as the preferred entry point, materialize the matching request rows, then join downstream spans on `operation_Id`:

```kql
let agentRequests = materialize(
    requests
| where timestamp > ago(24h)
| extend
    foundryAgentName = coalesce(
        tostring(customDimensions["gen_ai.agent.name"]),
        tostring(customDimensions["azure.ai.agentserver.agent_name"])
    ),
    agentId = tostring(customDimensions["gen_ai.agent.id"]),
    agentNameFromId = tostring(split(agentId, ":")[0]),
    agentVersion = iff(agentId contains ":", tostring(split(agentId, ":")[1]), ""),
    conversationId = coalesce(
        tostring(customDimensions["gen_ai.conversation.id"]),
        tostring(customDimensions["azure.ai.agentserver.conversation_id"]),
        operation_Id
    )
| where foundryAgentName == "<agent_name>"
    or agentNameFromId == "<agent_name>"
| project operation_Id, conversationId, agentVersion
);
dependencies
| where timestamp > ago(24h)
| where isnotempty(customDimensions["gen_ai.operation.name"])
| join kind=inner agentRequests on operation_Id
| summarize
    startTime = min(timestamp),
    endTime = max(timestamp),
    spanCount = count(),
    errorCount = countif(success == false),
    totalInputTokens = sum(toint(customDimensions["gen_ai.usage.input_tokens"])),
    totalOutputTokens = sum(toint(customDimensions["gen_ai.usage.output_tokens"]))
  by conversationId, operation_Id, agentVersion
| order by startTime desc
| take 50
```

If `gen_ai.agent.id` does not contain `:`, continue using the requests-scoped name fields for filtering and treat `agentVersion` as optional enrichment rather than a required key.

## Conversation Summary Table

Present results in this format:

| Conversation ID | Agent Version | Start Time | Duration | Spans | Errors | Input Tokens | Output Tokens |
|----------------|---------------|------------|----------|-------|--------|-------------|---------------|
| conv_abc123 | 3 | 2025-01-15 10:30 | 4.2s | 12 | 0 | 850 | 320 |
| conv_def456 | 4 | 2025-01-15 10:25 | 8.7s | 18 | 2 | 1200 | 450 |

Highlight rows with errors in the summary. Offer to drill into any conversation via [Conversation Detail](conversation-detail.md).

## Free-Text Search

When the user provides a general search term (e.g., agent name, error message):

```kql
union dependencies, requests, exceptions, traces
| where timestamp > ago(24h)
| where * contains "<search_term>"
| summarize count() by operation_Id
| order by count_ desc
| take 20
```

## After Successful Query

> 📝 **Reminder:** If this is the first trace query in this session, ensure App Insights connection info was persisted to `.foundry/agent-metadata.yaml` for the selected environment (see [trace.md — Before Starting](../trace.md#before-starting--resolve-app-insights-connection)).
foundry-agent/trace/
trace.md 5.9 KB
# Foundry Agent Trace Analysis

Analyze production traces for Foundry agents using Application Insights and GenAI OpenTelemetry semantic conventions. This skill provides structured KQL-powered workflows for a selected agent root and environment: searching conversations, diagnosing failures, and identifying latency bottlenecks.

## When to Use This Skill

USE FOR: analyze agent traces, search agent conversations, find failing traces, slow traces, latency analysis, trace search, conversation history, agent errors in production, debug agent responses, App Insights traces, GenAI telemetry, trace correlation, span tree, production trace analysis, evaluation results, evaluation scores, eval run results, find by response ID, get agent trace by conversation ID, agent evaluation scores from App Insights.

> **USE THIS SKILL INSTEAD OF** `azure-monitor` or `azure-applicationinsights` when querying Foundry agent traces, evaluations, or GenAI telemetry. This skill has correct GenAI OTel attribute mappings and tested KQL templates that those general tools lack.

> โš ๏ธ **DO NOT manually write KQL queries** for GenAI trace analysis **without reading this skill first.** This skill provides tested query templates with correct GenAI OTel attribute mappings, proper span correlation logic, environment-aware scoping, and conversation-level aggregation patterns.

## Quick Reference

| Property | Value |
|----------|-------|
| Data source | Application Insights (App Insights) |
| Query language | KQL (Kusto Query Language) |
| Related skills | `troubleshoot` (container logs), `eval-datasets` (trace harvesting) |
| Preferred query tool | `monitor_resource_log_query` (Azure MCP) - use for App Insights KQL queries |
| OTel conventions | [GenAI Spans](https://opentelemetry.io/docs/specs/semconv/gen-ai/gen-ai-spans/), [Agent Spans](https://opentelemetry.io/docs/specs/semconv/gen-ai/gen-ai-agent-spans/) |
| Local metadata | `.foundry/agent-metadata.yaml` |

## Entry Points

| User Intent | Start At |
|-------------|----------|
| "Search agent conversations" / "Find traces" | [Search Traces](references/search-traces.md) |
| "Tell me about response ID X" / "Look up response ID" | [Search Traces - Search by Response ID](references/search-traces.md#search-by-response-id) |
| "Why is my agent failing?" / "Find errors" | [Analyze Failures](references/analyze-failures.md) |
| "My agent is slow" / "Latency analysis" | [Analyze Latency](references/analyze-latency.md) |
| "Show me this conversation" / "Trace detail" | [Conversation Detail](references/conversation-detail.md) |
| "Find eval results for response ID" / "eval scores from traces" | [Eval Correlation](references/eval-correlation.md) |
| "What KQL do I need?" | [KQL Templates](references/kql-templates.md) |

## Before Starting — Resolve App Insights Connection

1. Resolve the target agent root and environment from `.foundry/agent-metadata.yaml`.
2. Check `environments.<env>.observability.applicationInsightsConnectionString` or `environments.<env>.observability.applicationInsightsResourceId` in the metadata.
3. If observability settings are missing, use `project_connection_list` to discover App Insights linked to the Foundry project, then persist the chosen resource back to `environments.<env>.observability` in `agent-metadata.yaml` before querying.
4. Confirm the selected App Insights resource and environment with the user before querying.
5. Use **`monitor_resource_log_query`** (Azure MCP tool) to execute KQL queries against the App Insights resource. This is preferred over delegating to the `azure-kusto` skill. Pass the App Insights resource ID and the KQL query directly.

| Metadata field | Purpose | Example |
|----------------|---------|---------|
| `environments.<env>.observability.applicationInsightsConnectionString` | App Insights connection string | `InstrumentationKey=...;IngestionEndpoint=...` |
| `environments.<env>.observability.applicationInsightsResourceId` | ARM resource ID | `/subscriptions/.../Microsoft.Insights/components/...` |

> โš ๏ธ **Always pass `subscription` explicitly** to Azure MCP tools like `monitor_resource_log_query` - they do not extract it from resource IDs.

## Behavioral Rules

1. **Always display the KQL query.** Before executing any KQL query, display it in a code block. Never run a query silently.
2. **Keep environment visible.** Include the selected environment and agent name in each search summary, and include the derived agent version when the query can recover it from telemetry.
3. **Start broad, then narrow.** Begin with conversation-level summaries, then drill into specific conversations or spans on user request.
4. **Use time ranges.** Always scope queries with a time range (default: last 24 hours). Ask the user for the range if not specified.
5. **Explain GenAI attributes.** When displaying results, translate OTel attribute names to human-readable labels (for example, `gen_ai.operation.name` -> "Operation").
6. **Link to conversation detail.** When showing search or failure results, offer to drill into any specific conversation.
7. **Scope to the selected environment.** App Insights may contain traces from multiple agents or environments. Filter with the selected environment's agent name first, then add an environment tag filter if the telemetry emits one.
8. **Resolve hosted-agent identity from `requests` first.** For hosted agents, prefer `requests`-scoped `gen_ai.agent.name` or `azure.ai.agentserver.agent_name` as the Foundry-facing filter. When `gen_ai.agent.id` is emitted in `<agent-name>:<version>` format, parse it to surface `agentVersion`, but do not treat `dependencies.gen_ai.agent.name` as the top-level hosted-agent name.
9. **Use `operation_Id` to fan out hosted-agent traces.** After isolating the hosted-agent `requests` rows, materialize their `operation_Id` values and join other telemetry tables on `operation_Id`. When conversation IDs are sparse, use `coalesce(gen_ai.conversation.id, azure.ai.agentserver.conversation_id, operation_Id)` so every row still rolls up to a stable conversation key.
foundry-agent/troubleshoot/
troubleshoot.md 5.3 KB
# Foundry Agent Troubleshoot

Troubleshoot and debug Foundry agents by collecting container logs, discovering observability connections, and querying Application Insights telemetry.

## Quick Reference

| Property | Value |
|----------|-------|
| Agent types | Prompt (LLM-based), Hosted (container-based) |
| MCP servers | `foundry-mcp` |
| Key MCP tools | `agent_get`, `agent_container_status_get` |
| Related skills | `trace` (telemetry analysis) |
| Preferred query tool | `monitor_resource_log_query` (Azure MCP) — preferred over `azure-kusto` for App Insights |
| CLI references | `az cognitiveservices agent logs`, `az cognitiveservices account connection` |

## When to Use This Skill

- Agent is not responding or returning errors
- Hosted agent container is failing to start
- Need to view container logs for a hosted agent
- Diagnose latency or timeout issues
- Query Application Insights for agent traces and exceptions
- Investigate agent runtime failures

## MCP Tools

| Tool | Description | Parameters |
|------|-------------|------------|
| `agent_get` | Get agent details to determine type (prompt/hosted) | `projectEndpoint` (required), `agentName` (optional) |
| `agent_container_status_get` | Check hosted agent container status | `projectEndpoint` (required), `agentName` (required), `agentVersion` (optional) |

## Workflow

### Step 1: Collect Agent Information

Use the project endpoint and agent name from the project context (see Common: Project Context Resolution). Ask the user only for values not already resolved:
- **Project endpoint** — AI Foundry project endpoint URL
- **Agent name** — Name of the agent to troubleshoot

### Step 2: Determine Agent Type

Use `agent_get` with `projectEndpoint` and `agentName` to retrieve the agent definition. Check the `kind` field:
- `"hosted"` → Proceed to Step 3 (Container Logs)
- `"prompt"` → Skip to Step 4 (Discover Observability Connections)
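The branch on `kind` can be sketched as below; the JSON payload is a hypothetical stand-in for the `agent_get` result, and only the `kind` field is assumed:

```shell
# Hypothetical agent_get result; only the "kind" field is relied on here.
agent_json='{"name": "my-agent", "kind": "hosted"}'

kind=$(echo "$agent_json" | jq -r '.kind')
case "$kind" in
  hosted) echo "Step 3: check container status and retrieve container logs" ;;
  prompt) echo "Step 4: discover observability connections" ;;
  *)      echo "Unexpected agent kind: $kind" ;;
esac
```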

### Step 3: Retrieve Container Logs (Hosted Agents Only)

First check the container status using `agent_container_status_get`. Report the current status to the user.

Retrieve container logs using the Azure CLI command documented at:
[az cognitiveservices agent logs show](https://learn.microsoft.com/en-us/cli/azure/cognitiveservices/agent/logs?view=azure-cli-latest#az-cognitiveservices-agent-logs-show)

Refer to the documentation above for the exact command syntax and parameters. Present the logs to the user and highlight any errors or warnings found.

### Step 4: Discover Observability Connections

List the project connections to find Application Insights or Azure Monitor resources using the Azure CLI command documented at:
[az cognitiveservices account connection](https://learn.microsoft.com/en-us/cli/azure/cognitiveservices/account/connection?view=azure-cli-latest)

Refer to the documentation above for the exact command syntax and parameters. Look for connections of type `ApplicationInsights` or `AzureMonitor` in the output.

If no observability connection is found, inform the user and suggest setting up Application Insights for the project. Ask if they want to proceed without telemetry data.
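Filtering the connection list can look like the sketch below. The payload shape and the `properties.category` field are assumptions; verify them against your actual `az cognitiveservices account connection list` output:

```shell
# Hypothetical connection-list payload; the field carrying the connection type
# ("properties.category" here) should be verified against real CLI output.
connections='[
  {"name": "conn-appinsights", "properties": {"category": "ApplicationInsights"}},
  {"name": "conn-storage",     "properties": {"category": "AzureBlob"}}
]'

observability=$(echo "$connections" | jq -r '
  .[]
  | select(.properties.category == "ApplicationInsights"
        or .properties.category == "AzureMonitor")
  | .name')
echo "$observability"   # conn-appinsights
```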

### Step 5: Query Application Insights Telemetry

Use **`monitor_resource_log_query`** (Azure MCP tool) to run KQL queries against the Application Insights resource discovered in Step 4. This is preferred over delegating to the `azure-kusto` skill. Pass the App Insights resource ID and the KQL query directly.

> ⚠️ **Always pass `subscription` explicitly** to Azure MCP tools like `monitor_resource_log_query` — they don't extract it from resource IDs.

Use `* contains "<response_id>"` or `* contains "<agent_name>"` filters to narrow down results to the specific agent instance.
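Substituting the identifier into the free-text union query can be sketched as follows (the response ID is a placeholder); quoting the term inside the KQL string is the only delicate part:

```shell
# Build the union query with the search term substituted into the KQL string.
response_id="resp_0123456789"   # placeholder identifier
kql="union dependencies, requests, exceptions, traces
| where timestamp > ago(24h)
| where * contains \"${response_id}\"
| summarize count() by operation_Id"
echo "$kql"
```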

### Step 6: Summarize Findings

Present a summary to the user including:
- **Agent type and status** — hosted/prompt, container status (if hosted)
- **Container log errors** — key errors from logs (hosted only)
- **Telemetry insights** — exceptions, failed requests, latency trends
- **Recommended actions** — specific steps to resolve identified issues

## Error Handling

| Error | Cause | Resolution |
|-------|-------|------------|
| Agent not found | Invalid agent name or project endpoint | Use `agent_get` to list available agents and verify name |
| Container logs unavailable | Agent is a prompt agent or container never started | Prompt agents don't have container logs — skip to telemetry |
| No observability connection | Application Insights not configured for the project | Suggest configuring Application Insights for the Foundry project |
| Kusto query failed | Invalid cluster/database or insufficient permissions | Verify Application Insights resource details and reader permissions |
| No telemetry data | Agent not instrumented or too recent | Check if Application Insights SDK is configured; data may take a few minutes to appear |

## Additional Resources

- [Foundry Hosted Agents](https://learn.microsoft.com/azure/ai-foundry/agents/concepts/hosted-agents?view=foundry)
- [Agent Logs CLI Reference](https://learn.microsoft.com/en-us/cli/azure/cognitiveservices/agent/logs?view=azure-cli-latest)
- [Account Connection CLI Reference](https://learn.microsoft.com/en-us/cli/azure/cognitiveservices/account/connection?view=azure-cli-latest)
- [KQL Quick Reference](https://learn.microsoft.com/azure/data-explorer/kusto/query/kql-quick-reference)
- [Foundry Samples](https://github.com/azure-ai-foundry/foundry-samples)
models/deploy-model/
SKILL.md 6.4 KB
---
name: deploy-model
description: "Unified Azure OpenAI model deployment skill with intelligent intent-based routing. Handles quick preset deployments, fully customized deployments (version/SKU/capacity/RAI policy), and capacity discovery across regions and projects. USE FOR: deploy model, deploy gpt, create deployment, model deployment, deploy openai model, set up model, provision model, find capacity, check model availability, where can I deploy, best region for model, capacity analysis. DO NOT USE FOR: listing existing deployments (use foundry_models_deployments_list MCP tool), deleting deployments, agent creation (use agent/create), project creation (use project/create)."
license: MIT
metadata:
  author: Microsoft
  version: "1.0.0"
---

# Deploy Model

Unified entry point for all Azure OpenAI model deployment workflows. Analyzes user intent and routes to the appropriate deployment mode.

## Quick Reference

| Mode | When to Use | Sub-Skill |
|------|-------------|-----------|
| **Preset** | Quick deployment, no customization needed | [preset/SKILL.md](preset/SKILL.md) |
| **Customize** | Full control: version, SKU, capacity, RAI policy | [customize/SKILL.md](customize/SKILL.md) |
| **Capacity Discovery** | Find where you can deploy with specific capacity | [capacity/SKILL.md](capacity/SKILL.md) |

## Intent Detection

Analyze the user's prompt and route to the correct mode:

```
User Prompt
    │
    ├─ Simple deployment (no modifiers)
    │  "deploy gpt-4o", "set up a model"
    │  └─> PRESET mode
    │
    ├─ Customization keywords present
    │  "custom settings", "choose version", "select SKU",
    │  "set capacity to X", "configure content filter",
    │  "PTU deployment", "with specific quota"
    │  └─> CUSTOMIZE mode
    │
    ├─ Capacity/availability query
    │  "find where I can deploy", "check capacity",
    │  "which region has X capacity", "best region for 10K TPM",
    │  "where is this model available"
    │  └─> CAPACITY DISCOVERY mode
    │
    └─ Ambiguous (has capacity target + deploy intent)
       "deploy gpt-4o with 10K capacity to best region"
       └─> CAPACITY DISCOVERY first → then PRESET or CUSTOMIZE
```

### Routing Rules

| Signal in Prompt | Route To | Reason |
|------------------|----------|--------|
| Just model name, no options | **Preset** | User wants quick deployment |
| "custom", "configure", "choose", "select" | **Customize** | User wants control |
| "find", "check", "where", "which region", "available" | **Capacity** | User wants discovery |
| Specific capacity number + "best region" | **Capacity → Preset** | Discover then deploy quickly |
| Specific capacity number + "custom" keywords | **Capacity → Customize** | Discover then deploy with options |
| "PTU", "provisioned throughput" | **Customize** | PTU requires SKU selection |
| "optimal region", "best region" (no capacity target) | **Preset** | Region optimization is preset's specialty |

### Multi-Mode Chaining

Some prompts require two modes in sequence:

**Pattern: Capacity → Deploy**
When a user specifies a capacity requirement AND wants deployment:
1. Run **Capacity Discovery** to find regions/projects with sufficient quota
2. Present findings to user
3. Ask: "Would you like to deploy with **quick defaults** or **customize settings**?"
4. Route to **Preset** or **Customize** based on answer

> 💡 **Tip:** If unsure which mode the user wants, default to **Preset** (quick deployment). Users who want customization will typically use explicit keywords like "custom", "configure", or "with specific settings".

## Project Selection (All Modes)

Before any deployment, resolve which project to deploy to. This applies to **all** modes (preset, customize, and after capacity discovery).

### Resolution Order

1. **Check `PROJECT_RESOURCE_ID` env var** — if set, use it as the default
2. **Check user prompt** — if user named a specific project or region, use that
3. **If neither** — query the user's projects and suggest the current one
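The resolution order above can be sketched as a small helper; step 3 is left as a placeholder because the exact project-discovery command depends on the environment:

```shell
# Resolution order: env var -> project named in the prompt -> discovery.
resolve_project() {
  prompt_project="${1:-}"
  if [ -n "${PROJECT_RESOURCE_ID:-}" ]; then
    echo "$PROJECT_RESOURCE_ID"   # 1. env var is the default
  elif [ -n "$prompt_project" ]; then
    echo "$prompt_project"        # 2. project/region named by the user
  else
    echo "DISCOVER"               # 3. placeholder: query projects, suggest current
  fi
}

PROJECT_RESOURCE_ID="" resolve_project "my-special-project"   # my-special-project
```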

### Confirmation Step (Required)

**Always confirm the target before deploying.** Show the user what will be used and give them a chance to change it:

```
Deploying to:
  Project:  <project-name>
  Region:   <region>
  Resource: <resource-group>

Is this correct? Or choose a different project:
  1. ✅ Yes, deploy here (default)
  2. 📋 Show me other projects in this region
  3. 🌍 Choose a different region
```

If user picks option 2, show top 5 projects in that region:

```
Projects in <region>:
  1. project-alpha (rg-alpha)
  2. project-beta (rg-beta)
  3. project-gamma (rg-gamma)
  ...
```

> ⚠️ **Never deploy without showing the user which project will be used.** This prevents accidental deployments to the wrong resource.

## Pre-Deployment Validation (All Modes)

Before presenting any deployment options (SKU, capacity), always validate both of these:

1. **Model supports the SKU** โ€” query the model catalog to confirm the selected model+version supports the target SKU:
   ```bash
   az cognitiveservices model list --location <region> --subscription <sub-id> -o json
   ```
   Filter for the model, extract `.model.skus[].name` to get supported SKUs.

2. **Subscription has available quota** โ€” check that the user's subscription has unallocated quota for the SKU+model combination:
   ```bash
   az cognitiveservices usage list --location <region> --subscription <sub-id> -o json
   ```
   Match by usage name pattern `OpenAI.<SKU>.<model-name>` (e.g., `OpenAI.GlobalStandard.gpt-4o`). Compute `available = limit - currentValue`.
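The quota check reduces to a simple jq filter over the CLI's JSON output. A sketch on sample data (the usage rows below are illustrative; the field names match the capacity sub-skill's script):

```shell
# Sample `az cognitiveservices usage list` rows (illustrative values).
usage_json='[
  {"name": {"value": "OpenAI.GlobalStandard.gpt-4o"}, "limit": 450, "currentValue": 370},
  {"name": {"value": "OpenAI.Standard.gpt-4o"},       "limit": 100, "currentValue": 100}
]'

# available = limit - currentValue for the OpenAI.<SKU>.<model-name> entry
available=$(echo "$usage_json" | jq -r '
  .[]
  | select(.name.value == "OpenAI.GlobalStandard.gpt-4o")
  | .limit - .currentValue')
echo "available=${available}"   # available=80
```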

> ⚠️ **Warning:** Only present options that pass both checks. Do NOT show hardcoded SKU lists — always query dynamically. SKUs with 0 available quota should be shown as ❌ informational items, not selectable options.

> 💡 **Quota management:** For quota increase requests, usage monitoring, and troubleshooting quota errors, defer to the [quota skill](../../quota/quota.md) instead of duplicating that guidance inline.

## Prerequisites

All deployment modes require:
- Azure CLI installed and authenticated (`az login`)
- Active Azure subscription with deployment permissions
- Azure AI Foundry project resource ID (or the agent will help discover it, e.g. via the `PROJECT_RESOURCE_ID` env var)

## Sub-Skills

- **[preset/SKILL.md](preset/SKILL.md)** — Quick deployment to optimal region with sensible defaults
- **[customize/SKILL.md](customize/SKILL.md)** — Interactive guided flow with full configuration control
- **[capacity/SKILL.md](capacity/SKILL.md)** — Discover available capacity across regions and projects
TEST_PROMPTS.md 3.2 KB
# Deploy Model — Test Prompts

Test prompts for the unified `deploy-model` skill with router, preset, customize, and capacity sub-skills.

## Preset Mode (Quick Deploy)

| # | Prompt | Expected |
|---|--------|----------|
| 1 | Deploy gpt-4o | Preset — confirm project, deploy with defaults |
| 2 | Set up o3-mini for me | Preset — pick latest version automatically |
| 3 | I need a text-embedding-ada-002 deployment | Preset — non-chat model |
| 4 | Deploy gpt-4o to the best region | Preset — region scan, no capacity target |

## Customize Mode (Guided Flow)

| # | Prompt | Expected |
|---|--------|----------|
| 5 | Deploy gpt-4o with custom settings | Customize — walk through version → SKU → capacity → RAI |
| 6 | I want to choose the version and SKU for my o3-mini deployment | Customize — explicit keywords |
| 7 | Set up a PTU deployment for gpt-4o | Customize — PTU requires SKU selection |
| 8 | Deploy gpt-4o with a specific content filter | Customize — RAI policy flow |

## Capacity Discovery

| # | Prompt | Expected |
|---|--------|----------|
| 9 | Where can I deploy gpt-4o? | Capacity — show regions, no deploy |
| 10 | Which regions have o3-mini available? | Capacity — run script, show table |
| 11 | Check if I have enough quota for gpt-4o with 500K TPM | Capacity — high target, some regions may not qualify |

## Chained (Capacity โ†’ Deploy)

| # | Prompt | Expected |
|---|--------|----------|
| 12 | Find me the best region and project to deploy gpt-4o with 10K capacity | Capacity → Preset |
| 13 | Deploy o3-mini with 200K TPM to whatever region has it | Capacity → Preset |
| 14 | I want to deploy gpt-4o with 50K capacity and choose my own settings | Capacity → Customize |

## Negative / Edge Cases

| # | Prompt | Expected |
|---|--------|----------|
| 15 | Deploy unicorn-model-9000 | Fail gracefully — model doesn't exist |
| 16 | Deploy gpt-4o with 999999K TPM | Capacity shows no region qualifies |
| 17 | Deploy gpt-4o (with az login expired) | Auth error caught early |
| 18 | Delete my gpt-4o deployment | Should NOT trigger deploy-model |
| 19 | List my current deployments | Should NOT trigger deploy-model |
| 20 | Deploy gpt-4o to mars-region-1 | Fail gracefully — invalid region |

## Project Selection

| # | Prompt | Expected |
|---|--------|----------|
| 21 | Deploy gpt-4o (with PROJECT_RESOURCE_ID set) | Show current project, confirm before deploying |
| 22 | Deploy gpt-4o (no PROJECT_RESOURCE_ID) | Ask user to pick a project |
| 23 | Deploy gpt-4o to project my-special-project | Use named project directly |

## Ambiguous / Routing Stress

| # | Prompt | Expected |
|---|--------|----------|
| 24 | Help me with model deployment | Preset (default) — vague, no keywords |
| 25 | I need gpt-4o deployed fast with good capacity | Preset — "fast" + vague capacity |
| 26 | Can you configure a deployment? | Customize — "configure" keyword, should ask which model |
| 27 | What's the best way to deploy gpt-4o with 100K? | Capacity → Preset |

## Automated Test Results (2026-02-09)

All 18 tests passed. Deployments created during testing were cleaned up.

| Category | Tests | Result |
|----------|-------|--------|
| Preset | 3/3 | ✅ |
| Customize | 2/2 | ✅ |
| Capacity | 3/3 | ✅ |
| Chained | 1/1 | ✅ |
| Negative | 5/5 | ✅ |
| Ambiguous | 4/4 | ✅ |
models/deploy-model/capacity/
SKILL.md 6.8 KB
---
name: capacity
description: "Discovers available Azure OpenAI model capacity across regions and projects. Analyzes quota limits, compares availability, and recommends optimal deployment locations based on capacity requirements. USE FOR: find capacity, check quota, where can I deploy, capacity discovery, best region for capacity, multi-project capacity search, quota analysis, model availability, region comparison, check TPM availability. DO NOT USE FOR: actual deployment (hand off to preset or customize after discovery), quota increase requests (direct user to Azure Portal), listing existing deployments."
license: MIT
metadata:
  author: Microsoft
  version: "1.0.0"
---

# Capacity Discovery

Finds available Azure OpenAI model capacity across all accessible regions and projects. Recommends the best deployment location based on capacity requirements.

## Quick Reference

| Property | Description |
|----------|-------------|
| **Purpose** | Find where you can deploy a model with sufficient capacity |
| **Scope** | All regions and projects the user has access to |
| **Output** | Ranked table of regions/projects with available capacity |
| **Action** | Read-only analysis — does NOT deploy. Hands off to preset or customize |
| **Authentication** | Azure CLI (`az login`) |

## When to Use This Skill

- ✅ User asks "where can I deploy gpt-4o?"
- ✅ User specifies a capacity target: "find a region with 10K TPM for gpt-4o"
- ✅ User wants to compare availability: "which regions have gpt-4o available?"
- ✅ User got a quota error and needs to find an alternative location
- ✅ User asks "best region and project for deploying model X"

**After discovery → hand off to [preset](../preset/SKILL.md) or [customize](../customize/SKILL.md) for actual deployment.**

## Scripts

Pre-built scripts handle the complex REST API calls and data processing. Use these instead of constructing commands manually.

| Script | Purpose | Usage |
|--------|---------|-------|
| `scripts/discover_and_rank.ps1` | Full discovery: capacity + projects + ranking | Primary script for capacity discovery |
| `scripts/discover_and_rank.sh` | Same as above (bash) | Primary script for capacity discovery |
| `scripts/query_capacity.ps1` | Raw capacity query (no project matching) | Quick capacity check or version listing |
| `scripts/query_capacity.sh` | Same as above (bash) | Quick capacity check or version listing |

## Workflow

### Phase 1: Validate Prerequisites

```bash
az account show --query "{Subscription:name, SubscriptionId:id}" --output table
```

### Phase 2: Identify Model and Version

Extract model name from user prompt. If version is unknown, query available versions:

```powershell
.\scripts\query_capacity.ps1 -ModelName <model-name>
```
```bash
./scripts/query_capacity.sh <model-name>
```

This lists available versions. Use the latest version unless user specifies otherwise.

### Phase 3: Run Discovery

Run the full discovery script with model name, version, and minimum capacity target:

```powershell
.\scripts\discover_and_rank.ps1 -ModelName <model-name> -ModelVersion <version> -MinCapacity <target>
```
```bash
./scripts/discover_and_rank.sh <model-name> <version> <min-capacity>
```

> 💡 The script automatically queries capacity across ALL regions, cross-references with the user's existing projects, and outputs a ranked table sorted by: meets target → project count → available capacity.

### Phase 3.5: Validate Subscription Quota

After discovery identifies candidate regions, validate that the user's subscription actually has available quota in each region. Model capacity (from Phase 3) shows what the platform can support, but subscription quota limits what this specific user can deploy.

```powershell
# For each candidate region from discovery results:
$usageData = az cognitiveservices usage list --location <region> --subscription $SUBSCRIPTION_ID -o json 2>$null | ConvertFrom-Json

# Check quota for each SKU the model supports
# Quota names follow pattern: OpenAI.<SKU>.<model-name>
$usageEntry = $usageData | Where-Object { $_.name.value -eq "OpenAI.<SKU>.<model-name>" }

if ($usageEntry) {
  $quotaAvailable = $usageEntry.limit - $usageEntry.currentValue
} else {
  $quotaAvailable = 0  # No quota allocated
}
```
```bash
# For each candidate region from discovery results:
usage_json=$(az cognitiveservices usage list --location <region> --subscription "$SUBSCRIPTION_ID" -o json 2>/dev/null)

# Extract quota for specific SKU+model
quota_available=$(echo "$usage_json" | jq -r --arg name "OpenAI.<SKU>.<model-name>" \
  '.[] | select(.name.value == $name) | .limit - .currentValue')
```

**Annotate discovery results:**

Add a "Quota Available" column to the ranked output from Phase 3:

| Region | Available Capacity | Meets Target | Projects | Quota Available |
|--------|-------------------|--------------|----------|-----------------|
| eastus2 | 120K TPM | ✅ | 3 | ✅ 80K |
| westus3 | 90K TPM | ✅ | 1 | ❌ 0 (at limit) |
| swedencentral | 100K TPM | ✅ | 0 | ✅ 100K |

Regions/SKUs where `quotaAvailable = 0` should be marked with ❌ in the results. If no region has available quota, hand off to the [quota skill](../../../quota/quota.md) for increase requests and troubleshooting.

### Phase 4: Present Results and Hand Off

After the script outputs the ranked table (now annotated with quota info), present it to the user and ask:

1. 🚀 **Quick deploy** to top recommendation with defaults → route to [preset](../preset/SKILL.md)
2. ⚙️ **Custom deploy** with version/SKU/capacity/RAI selection → route to [customize](../customize/SKILL.md)
3. 📊 **Check another model** or capacity target → re-run Phase 2
4. ❌ Cancel

### Phase 5: Confirm Project Before Deploying

Before handing off to preset or customize, **always confirm the target project** with the user. See the [Project Selection](../SKILL.md#project-selection-all-modes) rules in the parent router.

If the discovery table shows a sample project for the chosen region, suggest it as the default. Otherwise, query projects in that region and let the user pick.

## Error Handling

| Error | Cause | Resolution |
|-------|-------|------------|
| "No capacity found" | Model not available or all at quota | Hand off to [quota skill](../../../quota/quota.md) for increase requests and troubleshooting |
| Script auth error | `az login` expired | Re-run `az login` |
| Empty version list | Model not in region catalog | Try a different region: `./scripts/query_capacity.sh <model> "" eastus` |
| "No projects found" | No AI Services resources | Guide to `project/create` skill or Azure Portal |

## Related Skills

- **[preset](../preset/SKILL.md)** โ€” Quick deployment after capacity discovery
- **[customize](../customize/SKILL.md)** โ€” Custom deployment after capacity discovery
- **[quota](../../../quota/quota.md)** — For quota viewing, increase requests, and troubleshooting quota errors, defer to this skill instead of duplicating guidance
models/deploy-model/capacity/scripts/
discover_and_rank.ps1 4.6 KB
<#
.SYNOPSIS
    Discovers available capacity for an Azure OpenAI model across all regions,
    cross-references with existing projects and subscription quota, and outputs a ranked table.
.PARAMETER ModelName
    The model name (e.g., "gpt-4o", "o3-mini")
.PARAMETER ModelVersion
    The model version (e.g., "2025-01-31")
.PARAMETER MinCapacity
    Minimum required capacity in K TPM units (default: 0, shows all)
.EXAMPLE
    .\discover_and_rank.ps1 -ModelName o3-mini -ModelVersion 2025-01-31 -MinCapacity 200
#>
param(
    [Parameter(Mandatory)][string]$ModelName,
    [Parameter(Mandatory)][string]$ModelVersion,
    [int]$MinCapacity = 0
)

$ErrorActionPreference = "Stop"

$subId = az account show --query id -o tsv

# Query model capacity across all regions
$capRaw = az rest --method GET `
    --url "https://management.azure.com/subscriptions/$subId/providers/Microsoft.CognitiveServices/modelCapacities" `
    --url-parameters api-version=2024-10-01 modelFormat=OpenAI modelName=$ModelName modelVersion=$ModelVersion `
    2>$null | Out-String | ConvertFrom-Json

# Query all AI Foundry projects (AIProject kind)
$projRaw = az rest --method GET `
    --url "https://management.azure.com/subscriptions/$subId/providers/Microsoft.CognitiveServices/accounts" `
    --url-parameters api-version=2024-10-01 `
    --query "value[?kind=='AIProject'].{Name:name, Location:location}" `
    2>$null | Out-String | ConvertFrom-Json

# Build capacity map (GlobalStandard only, pick max per region)
$capMap = @{}
foreach ($item in $capRaw.value) {
    $sku = $item.properties.skuName
    $avail = [int]$item.properties.availableCapacity
    $region = $item.location
    if ($sku -eq "GlobalStandard" -and $avail -gt 0) {
        if (-not $capMap[$region] -or $avail -gt $capMap[$region]) {
            $capMap[$region] = $avail
        }
    }
}

# Build project map
$projMap = @{}
$projSample = @{}
foreach ($p in $projRaw) {
    $loc = $p.Location
    if (-not $projMap[$loc]) { $projMap[$loc] = 0 }
    $projMap[$loc]++
    if (-not $projSample[$loc]) { $projSample[$loc] = $p.Name }
}

# Check subscription quota per region
$quotaMap = @{}
$checkedRegions = @{}
foreach ($region in $capMap.Keys) {
    if ($checkedRegions[$region]) { continue }
    $checkedRegions[$region] = $true
    try {
        $usageData = az cognitiveservices usage list --location $region --subscription $subId -o json 2>$null | Out-String | ConvertFrom-Json
        $usageEntry = $usageData | Where-Object { $_.name.value -eq "OpenAI.GlobalStandard.$ModelName" }
        if ($usageEntry) {
            $quotaMap[$region] = [int]$usageEntry.limit - [int]$usageEntry.currentValue
        } else {
            $quotaMap[$region] = 0
        }
    } catch {
        $quotaMap[$region] = -1  # Unable to check
    }
}

# Combine and rank
$results = foreach ($region in $capMap.Keys) {
    $avail = $capMap[$region]
    $meets = $avail -ge $MinCapacity
    $quota = if ($quotaMap[$region]) { $quotaMap[$region] } else { 0 }
    $quotaDisplay = if ($quota -eq -1) { "?" } elseif ($quota -gt 0) { "${quota}K" } else { "0" }
    $quotaOk = $quota -gt 0 -or $quota -eq -1
    [PSCustomObject]@{
        Region         = $region
        AvailableTPM   = "${avail}K"
        AvailableRaw   = $avail
        MeetsTarget    = if ($meets) { "YES" } else { "no" }
        Projects       = if ($projMap[$region]) { $projMap[$region] } else { 0 }
        SampleProject  = if ($projSample[$region]) { $projSample[$region] } else { "(none)" }
        QuotaAvailable = $quotaDisplay
        QuotaOk        = $quotaOk
    }
}

$results = $results | Sort-Object @{Expression={$_.MeetsTarget -eq "YES"}; Descending=$true},
                                   @{Expression={$_.QuotaOk}; Descending=$true},
                                   @{Expression={$_.Projects}; Descending=$true},
                                   @{Expression={$_.AvailableRaw}; Descending=$true}

# Output summary
$total = ($results | Measure-Object).Count
$matching = ($results | Where-Object { $_.MeetsTarget -eq "YES" } | Measure-Object).Count
$withQuota = ($results | Where-Object { $_.MeetsTarget -eq "YES" -and $_.QuotaOk } | Measure-Object).Count
$withProjects = ($results | Where-Object { $_.MeetsTarget -eq "YES" -and $_.Projects -gt 0 } | Measure-Object).Count

Write-Host "Model: $ModelName v$ModelVersion | SKU: GlobalStandard | Min Capacity: ${MinCapacity}K TPM"
Write-Host "Regions with capacity: $total | Meets target: $matching | With quota: $withQuota | With projects: $withProjects"
Write-Host ""

$results | Select-Object Region, AvailableTPM, MeetsTarget, QuotaAvailable, Projects, SampleProject | Format-Table -AutoSize
discover_and_rank.sh 4.5 KB
#!/bin/bash
# discover_and_rank.sh
# Discovers available capacity for an Azure OpenAI model across all regions,
# cross-references with existing projects and subscription quota, and outputs a ranked table.
#
# Usage: ./discover_and_rank.sh <model-name> <model-version> [min-capacity]
# Example: ./discover_and_rank.sh o3-mini 2025-01-31 200
#
# Output: Ranked table of regions with capacity, quota, project counts, and match status

set -euo pipefail

MODEL_NAME="${1:?Usage: $0 <model-name> <model-version> [min-capacity]}"
MODEL_VERSION="${2:?Usage: $0 <model-name> <model-version> [min-capacity]}"
MIN_CAPACITY="${3:-0}"

SUB_ID=$(az account show --query id -o tsv)

# Query model capacity across all regions (GlobalStandard SKU)
CAPACITY_JSON=$(az rest --method GET \
  --url "https://management.azure.com/subscriptions/${SUB_ID}/providers/Microsoft.CognitiveServices/modelCapacities" \
  --url-parameters api-version=2024-10-01 modelFormat=OpenAI modelName="$MODEL_NAME" modelVersion="$MODEL_VERSION" \
  2>/dev/null)

# Query all AI Services projects
PROJECTS_JSON=$(az rest --method GET \
  --url "https://management.azure.com/subscriptions/${SUB_ID}/providers/Microsoft.CognitiveServices/accounts" \
  --url-parameters api-version=2024-10-01 \
  --query "value[?kind=='AIServices'].{name:name, location:location}" \
  2>/dev/null)

# Get unique regions from capacity results for quota checking
REGIONS=$(echo "$CAPACITY_JSON" | jq -r '.value[] | select(.properties.skuName=="GlobalStandard" and .properties.availableCapacity > 0) | .location' | sort -u)

# Build quota map: check subscription quota per region
declare -A QUOTA_MAP
for region in $REGIONS; do
  usage_json=$(az cognitiveservices usage list --location "$region" --subscription "$SUB_ID" -o json 2>/dev/null || echo "[]")
  quota_avail=$(echo "$usage_json" | jq -r --arg name "OpenAI.GlobalStandard.$MODEL_NAME" \
    '[.[] | select(.name.value == $name)] | if length > 0 then .[0].limit - .[0].currentValue else 0 end')
  QUOTA_MAP[$region]="${quota_avail:-0}"
done

# Export quota map as JSON for Python
QUOTA_JSON="{"
first=true
for region in "${!QUOTA_MAP[@]}"; do
  if [ "$first" = true ]; then first=false; else QUOTA_JSON+=","; fi
  QUOTA_JSON+="\"$region\":${QUOTA_MAP[$region]}"
done
QUOTA_JSON+="}"

# Combine, rank, and output using inline Python (available on all Azure CLI installs)
python3 -c "
import json, sys

capacity = json.loads('''${CAPACITY_JSON}''')
projects = json.loads('''${PROJECTS_JSON}''')
quota = json.loads('''${QUOTA_JSON}''')
min_cap = int('${MIN_CAPACITY}')

# Build capacity map (GlobalStandard only)
cap_map = {}
for item in capacity.get('value', []):
    props = item.get('properties', {})
    if props.get('skuName') == 'GlobalStandard' and props.get('availableCapacity', 0) > 0:
        region = item.get('location', '')
        cap_map[region] = max(cap_map.get(region, 0), props['availableCapacity'])

# Build project count map
proj_map = {}
proj_sample = {}
for p in (projects if isinstance(projects, list) else []):
    loc = p.get('location', '')
    proj_map[loc] = proj_map.get(loc, 0) + 1
    if loc not in proj_sample:
        proj_sample[loc] = p.get('name', '')

# Combine and rank
results = []
for region, cap in cap_map.items():
    meets = cap >= min_cap
    q = quota.get(region, 0)
    quota_ok = q > 0
    results.append({
        'region': region,
        'available': cap,
        'meets': meets,
        'projects': proj_map.get(region, 0),
        'sample': proj_sample.get(region, '(none)'),
        'quota': q,
        'quota_ok': quota_ok
    })

# Sort: meets target first, then quota available, then by project count, then by capacity
results.sort(key=lambda x: (-x['meets'], -x['quota_ok'], -x['projects'], -x['available']))

# Output
total = len(results)
matching = sum(1 for r in results if r['meets'])
with_quota = sum(1 for r in results if r['meets'] and r['quota_ok'])
with_projects = sum(1 for r in results if r['meets'] and r['projects'] > 0)

print(f'Model: {\"${MODEL_NAME}\"} v{\"${MODEL_VERSION}\"} | SKU: GlobalStandard | Min Capacity: {min_cap}K TPM')
print(f'Regions with capacity: {total} | Meets target: {matching} | With quota: {with_quota} | With projects: {with_projects}')
print()
print(f'{\"Region\":<22} {\"Available\":<12} {\"Meets Target\":<14} {\"Quota\":<12} {\"Projects\":<10} {\"Sample Project\"}')
print('-' * 100)
for r in results:
    mark = 'YES' if r['meets'] else 'no'
    q_display = f'{r[\"quota\"]}K' if r['quota'] > 0 else '0 (none)'
    avail_display = f'{r[\"available\"]}K'
    print(f'{r[\"region\"]:<22} {avail_display:<12} {mark:<14} {q_display:<12} {r[\"projects\"]:<10} {r[\"sample\"]}')
"
query_capacity.ps1 3.0 KB
<#
.SYNOPSIS
    Queries available capacity for an Azure OpenAI model and validates if a target is achievable.
.PARAMETER ModelName
    The model name (e.g., "gpt-4o", "o3-mini")
.PARAMETER ModelVersion
    The model version (e.g., "2025-01-31"). If omitted, lists available versions.
.PARAMETER Region
    Optional. Check capacity in a specific region only.
.PARAMETER SKU
    SKU to check (default: GlobalStandard)
.EXAMPLE
    .\query_capacity.ps1 -ModelName o3-mini
    .\query_capacity.ps1 -ModelName o3-mini -ModelVersion 2025-01-31 -Region eastus2
#>
param(
    [Parameter(Mandatory)][string]$ModelName,
    [string]$ModelVersion,
    [string]$Region,
    [string]$SKU = "GlobalStandard"
)

$ErrorActionPreference = "Stop"

$subId = az account show --query id -o tsv

# If no version provided, list available versions first
if (-not $ModelVersion) {
    Write-Host "Available versions for $ModelName`:"
    $loc = if ($Region) { $Region } else { "eastus" }
    az cognitiveservices model list --location $loc `
        --query "[?model.name=='$ModelName'].{Version:model.version, Format:model.format}" `
        --output table 2>$null
    return
}

# Build URL parameters
$urlParams = @("api-version=2024-10-01", "modelFormat=OpenAI", "modelName=$ModelName", "modelVersion=$ModelVersion")

if ($Region) {
    $url = "https://management.azure.com/subscriptions/$subId/providers/Microsoft.CognitiveServices/locations/$Region/modelCapacities"
} else {
    $url = "https://management.azure.com/subscriptions/$subId/providers/Microsoft.CognitiveServices/modelCapacities"
}

$raw = az rest --method GET --url $url --url-parameters @urlParams 2>$null | Out-String | ConvertFrom-Json

# Filter by SKU
$filtered = $raw.value | Where-Object { $_.properties.skuName -eq $SKU -and $_.properties.availableCapacity -gt 0 }

if (-not $filtered) {
    Write-Host "No capacity found for $ModelName v$ModelVersion ($SKU)" -ForegroundColor Red
    Write-Host "Try a different SKU or version."
    return
}

Write-Host "Capacity: $ModelName v$ModelVersion ($SKU)"
Write-Host ""
$filtered | ForEach-Object {
    # Check subscription quota for this region
    $quotaDisplay = "?"
    try {
        $usageData = az cognitiveservices usage list --location $_.location --subscription $subId -o json 2>$null | Out-String | ConvertFrom-Json
        $usageEntry = $usageData | Where-Object { $_.name.value -eq "OpenAI.$SKU.$ModelName" }
        if ($usageEntry) {
            $quotaAvail = [int]$usageEntry.limit - [int]$usageEntry.currentValue
            $quotaDisplay = if ($quotaAvail -gt 0) { "${quotaAvail}K" } else { "0 (at limit)" }
        } else {
            $quotaDisplay = "0 (none)"
        }
    } catch {
        $quotaDisplay = "?"
    }
    [PSCustomObject]@{
        Region    = $_.location
        SKU       = $_.properties.skuName
        Available = "$($_.properties.availableCapacity)K TPM"
        Quota     = $quotaDisplay
    }
} | Sort-Object { [int]($_.Available -replace '[^\d]','') } -Descending | Format-Table -AutoSize
query_capacity.sh 2.9 KB
#!/bin/bash
# query_capacity.sh
# Queries available capacity for an Azure OpenAI model.
#
# Usage:
#   ./query_capacity.sh <model-name> [model-version] [region] [sku]
# Examples:
#   ./query_capacity.sh o3-mini                          # List versions
#   ./query_capacity.sh o3-mini 2025-01-31               # All regions
#   ./query_capacity.sh o3-mini 2025-01-31 eastus2       # Specific region
#   ./query_capacity.sh o3-mini 2025-01-31 "" Standard   # Different SKU

set -euo pipefail

MODEL_NAME="${1:?Usage: $0 <model-name> [model-version] [region] [sku]}"
MODEL_VERSION="${2:-}"
REGION="${3:-}"
SKU="${4:-GlobalStandard}"

SUB_ID=$(az account show --query id -o tsv)

# If no version, list available versions
if [ -z "$MODEL_VERSION" ]; then
    LOC="${REGION:-eastus}"
    echo "Available versions for $MODEL_NAME:"
    az cognitiveservices model list --location "$LOC" \
        --query "[?model.name=='$MODEL_NAME'].{Version:model.version, Format:model.format}" \
        --output table 2>/dev/null
    exit 0
fi

# Build URL
if [ -n "$REGION" ]; then
    URL="https://management.azure.com/subscriptions/${SUB_ID}/providers/Microsoft.CognitiveServices/locations/${REGION}/modelCapacities"
else
    URL="https://management.azure.com/subscriptions/${SUB_ID}/providers/Microsoft.CognitiveServices/modelCapacities"
fi

# Query capacity
CAPACITY_RESULT=$(az rest --method GET --url "$URL" \
    --url-parameters api-version=2024-10-01 modelFormat=OpenAI modelName="$MODEL_NAME" modelVersion="$MODEL_VERSION" \
    2>/dev/null)

# Get regions with capacity
REGIONS_WITH_CAP=$(echo "$CAPACITY_RESULT" | jq -r ".value[] | select(.properties.skuName==\"$SKU\" and .properties.availableCapacity > 0) | .location" 2>/dev/null | sort -u)

if [ -z "$REGIONS_WITH_CAP" ]; then
    echo "No capacity found for $MODEL_NAME v$MODEL_VERSION ($SKU)"
    echo "Try a different SKU or version."
    exit 0
fi

echo "Capacity: $MODEL_NAME v$MODEL_VERSION ($SKU)"
echo ""
printf "%-22s %-12s %-15s %s\n" "Region" "Available" "Quota" "SKU"
printf -- '-%.0s' {1..60}; echo ""

for region in $REGIONS_WITH_CAP; do
    avail=$(echo "$CAPACITY_RESULT" | jq -r ".value[] | select(.location==\"$region\" and .properties.skuName==\"$SKU\") | .properties.availableCapacity" 2>/dev/null | head -1)

    # Check subscription quota
    usage_json=$(az cognitiveservices usage list --location "$region" --subscription "$SUB_ID" -o json 2>/dev/null || echo "[]")
    quota_avail=$(echo "$usage_json" | jq -r --arg name "OpenAI.$SKU.$MODEL_NAME" \
        '[.[] | select(.name.value == $name)] | if length > 0 then .[0].limit - .[0].currentValue else 0 end' 2>/dev/null || echo "?")

    if [ "$quota_avail" = "0" ]; then
        quota_display="0 (none)"
    elif [ "$quota_avail" = "?" ]; then
        quota_display="?"
    else
        quota_display="${quota_avail}K"
    fi

    printf "%-22s %-12s %-15s %s\n" "$region" "${avail}K TPM" "$quota_display" "$SKU"
done
models/deploy-model/customize/
EXAMPLES.md 4.3 KB
# customize Examples

## Example 1: Basic Deployment with Defaults

**Scenario:** Deploy gpt-4o accepting all defaults for quick setup.
**Config:** gpt-4o / GlobalStandard / 10K TPM / Dynamic Quota enabled
**Result:** Deployment `gpt-4o` created in ~2-3 min with auto-upgrade enabled.

## Example 2: Production Deployment with Custom Capacity

**Scenario:** Deploy gpt-4o for production with high throughput.
**Config:** gpt-4o / GlobalStandard / 50K TPM / Dynamic Quota / Name: `gpt-4o-production`
**Result:** 50K TPM (500 req/10s). Suitable for moderate-to-high traffic production apps.

## Example 3: PTU Deployment for High-Volume Workload

**Scenario:** Deploy gpt-4o with reserved capacity (PTU) for predictable workload.
**Config:** gpt-4o / ProvisionedManaged / 200 PTU (min 50, max 1000) / Priority Processing enabled
**PTU sizing:** 40K input + 20K output tokens/min → ~100 PTU estimated → 200 PTU recommended (2x headroom)
**Result:** Guaranteed throughput, fixed monthly cost. Use case: customer service bots, document pipelines.
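The PTU estimate above can be reproduced with the formula from the customize guides; the 200 requests/min figure is an assumption (the example does not state it) chosen so the estimate lands at ~100 PTU:

```python
# PTU estimate per the guide's formula. The requests/min value (200) is an
# assumption -- it is not stated in the example above.
input_tpm, output_tpm, rpm = 40_000, 20_000, 200
estimated_ptu = input_tpm * 0.001 + output_tpm * 0.002 + rpm * 0.1
recommended_ptu = estimated_ptu * 2  # 2x headroom, as in the example
print(estimated_ptu, recommended_ptu)  # 100.0 200.0
```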

## Example 4: Development Deployment with Standard SKU

**Scenario:** Deploy gpt-4o-mini for dev/testing with minimal cost.
**Config:** gpt-4o-mini / Standard / 1K TPM / Name: `gpt-4o-mini-dev`
**Result:** 1K TPM, 10 req/10s. Minimal pay-per-use cost for development and prototyping.

## Example 5: Spillover Configuration

**Scenario:** Deploy gpt-4o with spillover to handle peak load overflow.
**Config:** gpt-4o / GlobalStandard / 20K TPM / Dynamic Quota / Spillover → `gpt-4o-backup`
**Result:** Primary handles up to 20K TPM; overflow auto-redirects to backup deployment.

## Example 6: Anthropic Model Deployment (claude-sonnet-4-6)

**Scenario:** Deploy claude-sonnet-4-6 with customized settings.
**Config:** claude-sonnet-4-6 / GlobalStandard / capacity 1 (MaaS) / Industry: Healthcare / No RAI policy (Anthropic manages content filtering)
**Result:** User selected "Healthcare" as industry → tenant country code (US) and org name fetched automatically → deployed via ARM REST API with `modelProviderData` in ~2 min.

---

## Comparison Matrix

| Scenario | Model | SKU | Capacity | Dynamic Quota | Priority | Spillover | Use Case |
|----------|-------|-----|----------|:---:|:---:|:---:|----------|
| Ex 1 | gpt-4o | GlobalStandard | 10K TPM | ✓ | - | - | Quick setup |
| Ex 2 | gpt-4o | GlobalStandard | 50K TPM | ✓ | - | - | Production |
| Ex 3 | gpt-4o | ProvisionedManaged | 200 PTU | - | ✓ | - | Predictable workload |
| Ex 4 | gpt-4o-mini | Standard | 1K TPM | - | - | - | Dev/testing |
| Ex 5 | gpt-4o | GlobalStandard | 20K TPM | ✓ | - | ✓ | Peak load |
| Ex 6 | claude-sonnet-4-6 | GlobalStandard | 1 (MaaS) | - | - | - | Anthropic model |

## Common Patterns

### Dev → Staging → Production

| Stage | Model | SKU | Capacity | Extras |
|-------|-------|-----|----------|--------|
| Dev | gpt-4o-mini | Standard | 1K TPM | - |
| Staging | gpt-4o | GlobalStandard | 10K TPM | - |
| Production | gpt-4o | GlobalStandard | 50K TPM | Dynamic Quota + Spillover |

### Cost Optimization

- **High priority:** gpt-4o, ProvisionedManaged, 100 PTU, Priority Processing
- **Low priority:** gpt-4o-mini, Standard, 5K TPM

---

## Tips and Best Practices

**Capacity:** Start conservative → monitor with Azure Monitor → scale gradually → use spillover for peaks.

**SKU Selection:** Standard for dev → GlobalStandard + dynamic quota for variable production → ProvisionedManaged (PTU) for predictable load.

**Cost:** Right-size capacity; use gpt-4o-mini where possible (80-90% accuracy at lower cost); enable dynamic quota; consider PTU for consistent high-volume.

**Versions:** Auto-upgrade recommended; test new versions in staging first; pin only if compatibility requires it.

**Content Filtering:** Start with DefaultV2; use custom policies only for specific needs; monitor filtered requests.

---

## Troubleshooting

| Problem | Solution |
|---------|----------|
| `QuotaExceeded` | Check usage with `az cognitiveservices usage list`, reduce capacity, try different SKU, check other regions, or use the [quota skill](../../../quota/quota.md) to request an increase |
| Version not available for SKU | Check `az cognitiveservices account list-models --query "[?name=='gpt-4o'].version"`, use latest |
| Deployment name exists | Skill auto-generates unique name (e.g., `gpt-4o-2`), or specify custom name |
SKILL.md 8.7 KB
---
name: customize
description: "Interactive guided deployment flow for Azure OpenAI models with full customization control. Step-by-step selection of model version, SKU (GlobalStandard/Standard/ProvisionedManaged), capacity, RAI policy (content filter), and advanced options (dynamic quota, priority processing, spillover). USE FOR: custom deployment, customize model deployment, choose version, select SKU, set capacity, configure content filter, RAI policy, deployment options, detailed deployment, advanced deployment, PTU deployment, provisioned throughput. DO NOT USE FOR: quick deployment to optimal region (use preset)."
license: MIT
metadata:
  author: Microsoft
  version: "1.0.1"
---

# Customize Model Deployment

Interactive guided workflow for deploying Azure OpenAI models with full customization control over version, SKU, capacity, content filtering, and advanced options.

## Quick Reference

| Property | Description |
|----------|-------------|
| **Flow** | Interactive step-by-step guided deployment |
| **Customization** | Version, SKU, Capacity, RAI Policy, Advanced Options |
| **SKU Support** | GlobalStandard, Standard, ProvisionedManaged, DataZoneStandard |
| **Best For** | Precise control over deployment configuration |
| **Authentication** | Azure CLI (`az login`) |
| **Tools** | Azure CLI, MCP tools (optional) |

## When to Use This Skill

Use this skill when you need **precise control** over deployment configuration:

- ✅ **Choose specific model version** (not just latest)
- ✅ **Select deployment SKU** (GlobalStandard vs Standard vs PTU)
- ✅ **Set exact capacity** within available range
- ✅ **Configure content filtering** (RAI policy selection)
- ✅ **Enable advanced features** (dynamic quota, priority processing, spillover)
- ✅ **PTU deployments** (Provisioned Throughput Units)

**Alternative:** Use `preset` for quick deployment to the best available region with automatic configuration.

### Comparison: customize vs preset

| Feature | customize | preset |
|---------|---------------------|----------------------------|
| **Focus** | Full customization control | Optimal region selection |
| **Version Selection** | User chooses from available | Uses latest automatically |
| **SKU Selection** | User chooses (GlobalStandard/Standard/PTU) | GlobalStandard only |
| **Capacity** | User specifies exact value | Auto-calculated (50% of available) |
| **RAI Policy** | User selects from options | Default policy only |
| **Region** | Current region first, falls back to all regions if no capacity | Checks capacity across all regions upfront |
| **Use Case** | Precise deployment requirements | Quick deployment to best region |

## Prerequisites

- Azure subscription with Cognitive Services Contributor or Owner role
- Azure AI Foundry project resource ID (format: `/subscriptions/{sub}/resourceGroups/{rg}/providers/Microsoft.CognitiveServices/accounts/{account}/projects/{project}`)
- Azure CLI installed and authenticated (`az login`)
- Optional: Set `PROJECT_RESOURCE_ID` environment variable

## Workflow Overview

### Complete Flow (14 Phases)

```
1. Verify Authentication
2. Get Project Resource ID
3. Verify Project Exists
4. Get Model Name (if not provided)
5. List Model Versions โ†’ User Selects
6. List SKUs for Version โ†’ User Selects
7. Get Capacity Range โ†’ User Configures
   7b. If no capacity: Cross-Region Fallback โ†’ Query all regions โ†’ User selects region/project
8. List RAI Policies โ†’ User Selects
9. Configure Advanced Options (if applicable)
10. Configure Version Upgrade Policy
11. Generate Deployment Name
12. Review Configuration
13. Execute Deployment & Monitor
```

### Fast Path (Defaults)

If user accepts all defaults (latest version, GlobalStandard SKU, recommended capacity, default RAI policy, standard upgrade policy), deployment completes in ~5 interactions.

---

## Phase Summaries

> โš ๏ธ **MUST READ:** Before executing any phase, load [references/customize-workflow.md](references/customize-workflow.md) for the full scripts and implementation details. The summaries below describe *what* each phase does โ€” the reference file contains the *how* (CLI commands, quota patterns, capacity formulas, cross-region fallback logic).

| Phase | Action | Key Details |
|-------|--------|-------------|
| **1. Verify Auth** | Check `az account show`; prompt `az login` if needed | Verify correct subscription is active |
| **2. Get Project ID** | Read `PROJECT_RESOURCE_ID` env var or prompt user | ARM resource ID format required |
| **3. Verify Project** | Parse resource ID, call `az cognitiveservices account show` | Extracts subscription, RG, account, project, region |
| **4. Get Model** | List models via `az cognitiveservices account list-models` | User selects from available or enters custom name |
| **5. Select Version** | Query versions for chosen model | Recommend latest; user picks from list |
| **6. Select SKU** | Query model catalog + subscription quota, show only deployable SKUs | ⚠️ Never hardcode SKU lists; always query live data |
| **7. Configure Capacity** | Query capacity API, validate min/max/step, user enters value | Cross-region fallback if no capacity in current region |
| **8. Select RAI Policy** | Present content filter options | Default: `Microsoft.DefaultV2` |
| **9. Advanced Options** | Dynamic quota (GlobalStandard), priority processing (PTU), spillover | SKU-dependent availability |
| **10. Upgrade Policy** | Choose: OnceNewDefaultVersionAvailable / OnceCurrentVersionExpired / NoAutoUpgrade | Default: auto-upgrade on new default |
| **11. Deployment Name** | Auto-generate unique name, allow custom override | Validates format: `^[\w.-]{2,64}$` |
| **12. Review** | Display full config summary, confirm before proceeding | User approves or cancels |
| **13. Deploy & Monitor** | `az cognitiveservices account deployment create`, poll status | Timeout after 5 min; show endpoint + portal link |
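The Phase 11 name format can be checked locally before deployment; a minimal sketch using the table's regex (the sample names are hypothetical):

```python
import re

# Mirrors the Phase 11 deployment-name format rule from the table above.
# The names below are hypothetical examples, not real deployments.
NAME_RE = re.compile(r"^[\w.-]{2,64}$")

for name in ("gpt-4o-production", "gpt-4o.v2", "x", "bad name!"):
    print(name, bool(NAME_RE.fullmatch(name)))
```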


---

## Error Handling

### Common Issues and Resolutions

| Error | Cause | Resolution |
|-------|-------|------------|
| **Model not found** | Invalid model name | List available models with `az cognitiveservices account list-models` |
| **Version not available** | Version not supported for SKU | Select different version or SKU |
| **Insufficient quota** | Capacity > available quota | Skill auto-searches all regions; fails only if no region has quota |
| **SKU not supported** | SKU not available in region | Cross-region fallback searches other regions automatically |
| **Capacity out of range** | Invalid capacity value | **PREVENTED**: Skill validates min/max/step at input (Phase 7) |
| **Deployment name exists** | Name conflict | Auto-incremented name generation |
| **Authentication failed** | Not logged in | Run `az login` |
| **Permission denied** | Insufficient permissions | Assign Cognitive Services Contributor role |
| **Capacity query fails** | API/permissions/network error | **DEPLOYMENT BLOCKED**: Will not proceed without valid quota data |

### Troubleshooting Commands

```bash
# Check deployment status
az cognitiveservices account deployment show --name <account> --resource-group <rg> --deployment-name <name>

# List all deployments
az cognitiveservices account deployment list --name <account> --resource-group <rg> -o table

# Check quota usage
az cognitiveservices usage list --name <account> --resource-group <rg>

# Delete failed deployment
az cognitiveservices account deployment delete --name <account> --resource-group <rg> --deployment-name <name>
```

---

## Selection Guides & Advanced Topics

> For SKU comparison tables, PTU sizing formulas, and advanced option details, load [references/customize-guides.md](references/customize-guides.md).

**SKU selection:** GlobalStandard (production/HA) → Standard (dev/test) → ProvisionedManaged (high-volume/guaranteed throughput) → DataZoneStandard (data residency).

**Capacity:** TPM-based SKUs range from 1K (dev) to 100K+ (large production). PTU-based use formula: `(Input TPM × 0.001) + (Output TPM × 0.002) + (Requests/min × 0.1)`.

**Advanced options:** Dynamic quota (GlobalStandard only), priority processing (PTU only, extra cost), spillover (overflow to backup deployment).

---

## Related Skills

- **preset** - Quick deployment to best region with automatic configuration
- **microsoft-foundry** - Parent skill for all Azure AI Foundry operations
- **[quota](../../../quota/quota.md)** - Defer to this skill for quota viewing, quota increase requests, and troubleshooting quota errors instead of duplicating that guidance here
- **rbac** - Manage permissions and access control

---

## Notes

- Set `PROJECT_RESOURCE_ID` environment variable to skip prompt
- Not all SKUs available in all regions; capacity varies by subscription/region/model
- Custom RAI policies can be configured in Azure Portal
- Automatic version upgrades occur during maintenance windows
- Use Azure Monitor and Application Insights for production deployments
models/deploy-model/customize/references/
customize-guides.md 3.4 KB
# Customize Guides - Selection Guides & Advanced Topics

> Reference for: `models/deploy-model/customize/SKILL.md`

**Table of Contents:** [Selection Guides](#selection-guides) · [Advanced Topics](#advanced-topics)

## Selection Guides

### How to Choose SKU

| SKU | Best For | Cost | Availability |
|-----|----------|------|--------------|
| **GlobalStandard** | Production, high availability | Medium | Multi-region |
| **Standard** | Development, testing | Low | Single region |
| **ProvisionedManaged** | High-volume, predictable workloads | Fixed (PTU) | Reserved capacity |
| **DataZoneStandard** | Data residency requirements | Medium | Specific zones |

**Decision Tree:**
```
Do you need guaranteed throughput?
├─ Yes → ProvisionedManaged (PTU)
└─ No → Do you need high availability?
        ├─ Yes → GlobalStandard
        └─ No → Standard
```
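The decision tree above translates to a small helper (a sketch for illustration, not part of the skill's tooling):

```python
def choose_sku(guaranteed_throughput: bool, high_availability: bool) -> str:
    """Direct translation of the SKU decision tree above (sketch only)."""
    if guaranteed_throughput:
        return "ProvisionedManaged"
    return "GlobalStandard" if high_availability else "Standard"

print(choose_sku(True, False))   # ProvisionedManaged
print(choose_sku(False, True))   # GlobalStandard
print(choose_sku(False, False))  # Standard
```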

### How to Choose Capacity

**For TPM-based SKUs (GlobalStandard, Standard):**

| Workload | Recommended Capacity |
|----------|---------------------|
| Development/Testing | 1K - 5K TPM |
| Small Production | 5K - 20K TPM |
| Medium Production | 20K - 100K TPM |
| Large Production | 100K+ TPM |

**For PTU-based SKUs (ProvisionedManaged):**

Use the PTU calculator based on:
- Input tokens per minute
- Output tokens per minute
- Requests per minute

**Capacity Planning Tips:**
- Start with recommended capacity
- Monitor usage and adjust
- Enable dynamic quota for flexibility
- Consider spillover for peak loads

### How to Choose RAI Policy

| Policy | Filtering Level | Use Case |
|--------|----------------|----------|
| **Microsoft.DefaultV2** | Balanced | Most applications |
| **Microsoft.Prompt-Shield** | Enhanced | Security-sensitive apps |
| **Custom** | Configurable | Specific requirements |

**Recommendation:** Start with `Microsoft.DefaultV2` and adjust based on application needs.

---

## Advanced Topics

### PTU (Provisioned Throughput Units) Deployments

**What is PTU?**
- Reserved capacity with guaranteed throughput
- Measured in PTU units, not TPM
- Fixed cost regardless of usage
- Best for high-volume, predictable workloads

**PTU Calculator:**

```
Estimated PTU = (Input TPM × 0.001) + (Output TPM × 0.002) + (Requests/min × 0.1)

Example:
- Input: 10,000 tokens/min
- Output: 5,000 tokens/min
- Requests: 100/min

PTU = (10,000 × 0.001) + (5,000 × 0.002) + (100 × 0.1)
    = 10 + 10 + 10
    = 30 PTU
```

**PTU Deployment:**
```bash
az cognitiveservices account deployment create \
  --name <account-name> \
  --resource-group <resource-group> \
  --deployment-name <deployment-name> \
  --model-name <model-name> \
  --model-version <version> \
  --model-format "OpenAI" \
  --sku-name "ProvisionedManaged" \
  --sku-capacity 100  # PTU units
```

### Spillover Configuration

**Spillover Workflow:**
1. Primary deployment receives requests
2. When capacity reached, requests overflow to spillover target
3. Spillover target must be same model or compatible
4. Configure via deployment properties

**Best Practices:**
- Use spillover for peak load handling
- Spillover target should have sufficient capacity
- Monitor both deployments
- Test failover behavior

### Priority Processing

**What is Priority Processing?**
- Prioritizes your requests during high load
- Available for ProvisionedManaged SKU
- Additional charges apply
- Ensures consistent performance

**When to Use:**
- Mission-critical applications
- SLA requirements
- High-concurrency scenarios
customize-workflow.md 13.0 KB
# Customize Workflow - Detailed Phase Instructions

> Reference for: `models/deploy-model/customize/SKILL.md`

## Phase 1: Verify Authentication

```bash
az account show --query "{Subscription:name, User:user.name}" -o table
```

If not logged in: `az login`

Set subscription if needed:
```bash
az account list --query "[].[name,id,state]" -o table
az account set --subscription <subscription-id>
```

---

## Phase 2: Get Project Resource ID

Check `PROJECT_RESOURCE_ID` env var. If not set, prompt user.

**Format:** `/subscriptions/{sub}/resourceGroups/{rg}/providers/Microsoft.CognitiveServices/accounts/{account}/projects/{project}`

---

## Phase 3: Parse and Verify Project

Parse ARM resource ID to extract components:

```powershell
$SUBSCRIPTION_ID = ($PROJECT_RESOURCE_ID -split '/')[2]
$RESOURCE_GROUP = ($PROJECT_RESOURCE_ID -split '/')[4]
$ACCOUNT_NAME = ($PROJECT_RESOURCE_ID -split '/')[8]
$PROJECT_NAME = ($PROJECT_RESOURCE_ID -split '/')[10]
```
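The same parse, sketched in Python with a placeholder resource ID (index positions match the PowerShell split above):

```python
# Parse an ARM project resource ID; indices match the PowerShell split above.
# The ID below is a placeholder, not a real resource.
rid = ("/subscriptions/00000000-0000-0000-0000-000000000000"
       "/resourceGroups/my-rg/providers/Microsoft.CognitiveServices"
       "/accounts/my-account/projects/my-project")
parts = rid.split("/")
subscription_id = parts[2]
resource_group = parts[4]
account_name = parts[8]
project_name = parts[10]
print(resource_group, account_name, project_name)  # my-rg my-account my-project
```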

Verify project exists and get region:
```bash
az account set --subscription $SUBSCRIPTION_ID
az cognitiveservices account show \
  --name $ACCOUNT_NAME \
  --resource-group $RESOURCE_GROUP \
  --query location -o tsv
```

---

## Phase 4: Get Model Name

List available models if not provided:
```bash
az cognitiveservices account list-models \
  --name $ACCOUNT_NAME \
  --resource-group $RESOURCE_GROUP \
  --query "[].name" -o json
```

Present sorted unique list. Allow custom model name entry.

**Detect model format:**

```bash
# Get model format (e.g., OpenAI, Anthropic, Meta-Llama, Mistral, Cohere)
MODEL_FORMAT=$(az cognitiveservices account list-models \
  --name "$ACCOUNT_NAME" \
  --resource-group "$RESOURCE_GROUP" \
  --query "[?name=='$MODEL_NAME'].format" -o tsv | head -1)

MODEL_FORMAT=${MODEL_FORMAT:-"OpenAI"}
echo "Model format: $MODEL_FORMAT"
```

> 💡 **Model format determines the deployment path:**
> - `OpenAI` - Standard CLI, TPM-based capacity, RAI policies, version upgrade policies
> - `Anthropic` - REST API with `modelProviderData`, capacity=1, no RAI, no version upgrade
> - All other formats (`Meta-Llama`, `Mistral`, `Cohere`, etc.) - Standard CLI, capacity=1 (MaaS), no RAI, no version upgrade

---

## Phase 5: List and Select Model Version

```bash
az cognitiveservices account list-models \
  --name $ACCOUNT_NAME \
  --resource-group $RESOURCE_GROUP \
  --query "[?name=='$MODEL_NAME'].version" -o json
```

Recommend latest version (first in list). Default to `"latest"` if no versions found.
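The selection logic can be sketched in pure shell; the sample version list stands in for the output of the query above:

```shell
# Sample version list as returned by the query above, newest first (illustrative values)
VERSIONS=$(printf '2024-08-06\n2024-05-13\n')

# Take the first (latest) version; fall back to "latest" when the list is empty
MODEL_VERSION=$(printf '%s\n' "$VERSIONS" | head -n 1)
MODEL_VERSION=${MODEL_VERSION:-latest}

echo "Selected model version: $MODEL_VERSION"
```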

---

## Phase 6: List and Select SKU

> โš ๏ธ **Warning:** Never hardcode SKU lists โ€” always query live data.

**Step A โ€” Query model-supported SKUs:**
```bash
az cognitiveservices model list \
  --location $PROJECT_REGION \
  --subscription $SUBSCRIPTION_ID -o json
```

Filter: `model.name == $MODEL_NAME && model.version == $MODEL_VERSION`, extract `model.skus[].name`.
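A minimal `jq` sketch of this filter; the sample catalog entry is shaped like the `az cognitiveservices model list` output, with illustrative model name and version:

```shell
# Sample catalog entry (illustration only); in practice, pipe the command output above into jq
CATALOG_JSON='[{"model":{"name":"gpt-4o","version":"2024-08-06","skus":[{"name":"GlobalStandard"},{"name":"Standard"}]}}]'

SUPPORTED_SKUS=$(echo "$CATALOG_JSON" | jq -r \
  --arg m "gpt-4o" --arg v "2024-08-06" \
  '.[] | select(.model.name==$m and .model.version==$v) | .model.skus[].name')

echo "$SUPPORTED_SKUS"
```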

**Step B โ€” Check subscription quota per SKU:**
```bash
az cognitiveservices usage list \
  --location $PROJECT_REGION \
  --subscription $SUBSCRIPTION_ID -o json
```

Quota key pattern: `OpenAI.<SKU>.<model-name>`. Calculate `available = limit - currentValue`.
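For example, the available-quota calculation with `jq`, using a sample usage entry shaped like the `az cognitiveservices usage list` output (SKU and model name are illustrative):

```shell
# Sample usage entry (illustration only); in practice, pipe the usage list output into jq
USAGE_JSON='[{"name":{"value":"OpenAI.GlobalStandard.gpt-4o"},"limit":450000,"currentValue":150000}]'

AVAILABLE=$(echo "$USAGE_JSON" | jq -r \
  --arg key "OpenAI.GlobalStandard.gpt-4o" \
  '.[] | select(.name.value==$key) | (.limit - .currentValue)')

echo "Available quota: $AVAILABLE"
```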

**Step C โ€” Present only deployable SKUs** (available > 0). If no SKUs have quota, direct user to the [quota skill](../../../../quota/quota.md).

---

## Phase 7: Configure Capacity

> โš ๏ธ **Non-OpenAI models (MaaS):** If `MODEL_FORMAT != "OpenAI"`, capacity is always `1` (pay-per-token billing). Skip capacity configuration and set `DEPLOY_CAPACITY=1`. Proceed to Phase 7c (Anthropic) or Phase 8.

**For OpenAI models only โ€” query capacity via REST API:**
```bash
# Current region capacity
az rest --method GET --url \
  "https://management.azure.com/subscriptions/$SUBSCRIPTION_ID/providers/Microsoft.CognitiveServices/locations/$PROJECT_REGION/modelCapacities?api-version=2024-10-01&modelFormat=$MODEL_FORMAT&modelName=$MODEL_NAME&modelVersion=$MODEL_VERSION"
```

Filter result for `properties.skuName == $SELECTED_SKU`. Read `properties.availableCapacity`.
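A `jq` sketch of reading the available capacity, with a sample response standing in for the REST call above:

```shell
# Sample modelCapacities response (illustration only)
CAPACITY_JSON='{"value":[{"location":"eastus","properties":{"skuName":"GlobalStandard","availableCapacity":120000}}]}'

SELECTED_SKU="GlobalStandard"
AVAILABLE_CAPACITY=$(echo "$CAPACITY_JSON" | jq -r \
  --arg sku "$SELECTED_SKU" \
  '.value[] | select(.properties.skuName==$sku) | .properties.availableCapacity')

echo "Available capacity: $AVAILABLE_CAPACITY"
```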

**Capacity defaults by SKU (OpenAI only):**

| SKU | Unit | Min | Max | Step | Default |
|-----|------|-----|-----|------|---------|
| ProvisionedManaged | PTU | 50 | 1000 | 50 | 100 |
| Others (TPM-based) | TPM | 1000 | min(available, 300000) | 1000 | min(10000, available/2) |

Validate user input: must be >= min, <= max, multiple of step. On invalid input, explain constraints.
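The validation can be expressed as a small helper function (hypothetical, not part of the skill's scripts), shown here with the ProvisionedManaged constraints from the table above:

```shell
# Hypothetical helper: check a capacity value against min, max, and step constraints
validate_capacity() {
  local value=$1 min=$2 max=$3 step=$4
  if [ "$value" -lt "$min" ] || [ "$value" -gt "$max" ]; then
    echo "Capacity must be between $min and $max" >&2
    return 1
  fi
  if [ $((value % step)) -ne 0 ]; then
    echo "Capacity must be a multiple of $step" >&2
    return 1
  fi
}

# ProvisionedManaged example: min 50, max 1000, step 50
validate_capacity 100 50 1000 50 && echo "100 PTU accepted"
```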

### Phase 7b: Cross-Region Fallback

If no capacity in current region, query ALL regions:
```bash
az rest --method GET --url \
  "https://management.azure.com/subscriptions/$SUBSCRIPTION_ID/providers/Microsoft.CognitiveServices/modelCapacities?api-version=2024-10-01&modelFormat=$MODEL_FORMAT&modelName=$MODEL_NAME&modelVersion=$MODEL_VERSION"
```

Filter: `properties.skuName == $SELECTED_SKU && properties.availableCapacity > 0`. Sort descending by capacity.
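Filtering and sorting with `jq`, using a sample all-regions response (region names and capacity figures are illustrative):

```shell
# Sample all-regions modelCapacities response (illustration only)
ALL_REGIONS_JSON='{"value":[
  {"location":"eastus2","properties":{"skuName":"GlobalStandard","availableCapacity":120000}},
  {"location":"swedencentral","properties":{"skuName":"GlobalStandard","availableCapacity":300000}},
  {"location":"westus2","properties":{"skuName":"GlobalStandard","availableCapacity":0}}]}'

# Keep regions with capacity for the selected SKU, sorted descending by availableCapacity
SORTED_REGIONS=$(echo "$ALL_REGIONS_JSON" | jq -r --arg sku "GlobalStandard" \
  '[.value[] | select(.properties.skuName==$sku and .properties.availableCapacity > 0)]
   | sort_by(-.properties.availableCapacity)
   | .[] | "\(.location)\t\(.properties.availableCapacity)"')

echo "$SORTED_REGIONS"
```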

Present available regions. After user selects region, find existing projects there:
```bash
az cognitiveservices account list \
  --query "[?kind=='AIProject' && location=='$PROJECT_REGION'].{Name:name, ResourceGroup:resourceGroup}" \
  -o json
```

If projects exist, let user select one and update `$ACCOUNT_NAME`, `$RESOURCE_GROUP`. If none, direct to project/create skill.

Re-run capacity configuration with new region's available capacity.

If no region has capacity: fail with guidance to request quota increase, check existing deployments, or try different model/SKU.

---

## Phase 7c: Anthropic Model Provider Data (Anthropic models only)

> โš ๏ธ **Only execute this phase if `MODEL_FORMAT == "Anthropic"`.** For OpenAI and other models, skip to Phase 8.

Anthropic models require `modelProviderData` in the deployment payload. Collect this before deployment.

**Step 1: Prompt user to select industry**

Present the following list and ask the user to choose one:

```
 1. None                    (API value: none)
 2. Biotechnology           (API value: biotechnology)
 3. Consulting              (API value: consulting)
 4. Education               (API value: education)
 5. Finance                 (API value: finance)
 6. Food & Beverage         (API value: food_and_beverage)
 7. Government              (API value: government)
 8. Healthcare              (API value: healthcare)
 9. Insurance               (API value: insurance)
10. Law                     (API value: law)
11. Manufacturing           (API value: manufacturing)
12. Media                   (API value: media)
13. Nonprofit               (API value: nonprofit)
14. Technology              (API value: technology)
15. Telecommunications      (API value: telecommunications)
16. Sport & Recreation      (API value: sport_and_recreation)
17. Real Estate             (API value: real_estate)
18. Retail                  (API value: retail)
19. Other                   (API value: other)
```

> โš ๏ธ **Do NOT pick a default industry or hardcode a value. Always ask the user.** This is required by Anthropic's terms of service. The industry list is static โ€” there is no REST API that provides it.

Store selection as `SELECTED_INDUSTRY` (use the API value, e.g., `technology`).

**Step 2: Fetch tenant info (country code and organization name)**

```bash
TENANT_INFO=$(az rest --method GET \
  --url "https://management.azure.com/tenants?api-version=2024-11-01" \
  --query "value[0].{countryCode:countryCode, displayName:displayName}" -o json)

COUNTRY_CODE=$(echo "$TENANT_INFO" | jq -r '.countryCode')
ORG_NAME=$(echo "$TENANT_INFO" | jq -r '.displayName')
```

*PowerShell version:*
```powershell
$tenantInfo = az rest --method GET `
  --url "https://management.azure.com/tenants?api-version=2024-11-01" `
  --query "value[0].{countryCode:countryCode, displayName:displayName}" -o json | ConvertFrom-Json

$countryCode = $tenantInfo.countryCode
$orgName = $tenantInfo.displayName
```

Store `COUNTRY_CODE` and `ORG_NAME` for use in Phase 13.

---

## Phase 8: Select RAI Policy (Content Filter)

> โš ๏ธ **Note:** RAI policies only apply to OpenAI models. Skip this phase if `MODEL_FORMAT != "OpenAI"` (Anthropic, Meta-Llama, Mistral, Cohere, etc. do not use RAI policies).

Present options:
1. `Microsoft.DefaultV2` โ€” Balanced filtering (recommended). Filters hate, violence, sexual, self-harm.
2. `Microsoft.Prompt-Shield` โ€” Enhanced prompt injection/jailbreak protection.
3. Custom policies โ€” Organization-specific (configured in Azure Portal).

Default: `Microsoft.DefaultV2`.

---

## Phase 9: Configure Advanced Options

Options are SKU-dependent:

**A. Dynamic Quota** (GlobalStandard only)
- Auto-scales beyond base allocation when capacity available
- Default: enabled

**B. Priority Processing** (ProvisionedManaged only)
- Prioritizes requests during high load; additional charges apply
- Default: disabled

**C. Spillover** (any SKU)
- Redirects requests to backup deployment at capacity
- Requires existing deployment; list with:
```bash
az cognitiveservices account deployment list \
  --name $ACCOUNT_NAME \
  --resource-group $RESOURCE_GROUP \
  --query "[].name" -o json
```
- Default: disabled

---

## Phase 10: Configure Version Upgrade Policy

> โš ๏ธ **Note:** Version upgrade policies only apply to OpenAI models. Skip this phase if `MODEL_FORMAT != "OpenAI"`.

| Policy | Description |
|--------|-------------|
| `OnceNewDefaultVersionAvailable` | Auto-upgrade to new default (Recommended) |
| `OnceCurrentVersionExpired` | Upgrade only when current expires |
| `NoAutoUpgrade` | Manual upgrade only |

Default: `OnceNewDefaultVersionAvailable`.

---

## Phase 11: Generate Deployment Name

List existing deployments to avoid conflicts:
```bash
az cognitiveservices account deployment list \
  --name $ACCOUNT_NAME \
  --resource-group $RESOURCE_GROUP \
  --query "[].name" -o json
```

Auto-generate: use the model name as the base and append `-2`, `-3`, etc. if taken. Allow a custom override. Validate: `^[\w.-]{2,64}$`.
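The suffix logic can be sketched as follows; the sample name list stands in for the deployment list output above:

```shell
MODEL_NAME="gpt-4o"
# Sample existing deployment names, one per line (in practice, from the list command above)
EXISTING=$(printf 'gpt-4o\ngpt-4o-2\n')

# Start from the model name; bump the suffix until the name is free
DEPLOYMENT_NAME="$MODEL_NAME"
i=2
while printf '%s\n' "$EXISTING" | grep -qx "$DEPLOYMENT_NAME"; do
  DEPLOYMENT_NAME="${MODEL_NAME}-${i}"
  i=$((i + 1))
done

echo "Deployment name: $DEPLOYMENT_NAME"
```

`grep -qx` matches the whole line exactly, so `gpt-4o` does not collide with `gpt-4o-2`.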

---

## Phase 12: Review Configuration

Display summary of all selections for user confirmation before proceeding:
- Model, version, deployment name
- SKU, capacity (with unit), region
- RAI policy, version upgrade policy
- Advanced options (dynamic quota, priority, spillover)
- Account, resource group, project

User confirms or cancels.

---

## Phase 13: Execute Deployment

> ๐Ÿ’ก `MODEL_FORMAT` was already detected in Phase 4. Use the stored value here.

### Standard CLI deployment (non-Anthropic models):

**Create deployment:**
```bash
az cognitiveservices account deployment create \
  --name $ACCOUNT_NAME \
  --resource-group $RESOURCE_GROUP \
  --deployment-name $DEPLOYMENT_NAME \
  --model-name $MODEL_NAME \
  --model-version $MODEL_VERSION \
  --model-format "$MODEL_FORMAT" \
  --sku-name $SELECTED_SKU \
  --sku-capacity $DEPLOY_CAPACITY
```

> ๐Ÿ’ก **Note:** For non-OpenAI MaaS models, `$DEPLOY_CAPACITY` is `1` (set in Phase 7).

### Anthropic model deployment (requires modelProviderData):

The Azure CLI does not support `--model-provider-data`. Use the ARM REST API directly.

> โš ๏ธ Industry, country code, and organization name should have been collected in Phase 7c.

```bash
echo "Creating Anthropic model deployment via REST API..."

az rest --method PUT \
  --url "https://management.azure.com/subscriptions/$SUBSCRIPTION_ID/resourceGroups/$RESOURCE_GROUP/providers/Microsoft.CognitiveServices/accounts/$ACCOUNT_NAME/deployments/$DEPLOYMENT_NAME?api-version=2024-10-01" \
  --body "{
    \"sku\": {
      \"name\": \"$SELECTED_SKU\",
      \"capacity\": 1
    },
    \"properties\": {
      \"model\": {
        \"format\": \"Anthropic\",
        \"name\": \"$MODEL_NAME\",
        \"version\": \"$MODEL_VERSION\"
      },
      \"modelProviderData\": {
        \"industry\": \"$SELECTED_INDUSTRY\",
        \"countryCode\": \"$COUNTRY_CODE\",
        \"organizationName\": \"$ORG_NAME\"
      }
    }
  }"
```

*PowerShell version:*
```powershell
Write-Host "Creating Anthropic model deployment via REST API..."

$body = @{
    sku = @{
        name = $SELECTED_SKU
        capacity = 1
    }
    properties = @{
        model = @{
            format = "Anthropic"
            name = $MODEL_NAME
            version = $MODEL_VERSION
        }
        modelProviderData = @{
            industry = $SELECTED_INDUSTRY
            countryCode = $countryCode
            organizationName = $orgName
        }
    }
} | ConvertTo-Json -Depth 5

az rest --method PUT `
  --url "https://management.azure.com/subscriptions/$SUBSCRIPTION_ID/resourceGroups/$RESOURCE_GROUP/providers/Microsoft.CognitiveServices/accounts/$ACCOUNT_NAME/deployments/${DEPLOYMENT_NAME}?api-version=2024-10-01" `
  --body $body
```

> ๐Ÿ’ก **Note:** Anthropic models use `capacity: 1` (MaaS billing model), not TPM-based capacity. RAI policy is not applicable for Anthropic models.

### Monitor deployment status:
```bash
az cognitiveservices account deployment show \
  --name $ACCOUNT_NAME \
  --resource-group $RESOURCE_GROUP \
  --deployment-name $DEPLOYMENT_NAME \
  --query "properties.provisioningState" -o tsv
```

Poll until `Succeeded` or `Failed`. Timeout after 5 minutes.
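A polling sketch; `get_state` is a hypothetical stand-in for the `deployment show` command above:

```shell
# Hypothetical stand-in for the az deployment show query above
get_state() { echo "Succeeded"; }

STATE=""
elapsed=0
while [ "$elapsed" -lt 300 ]; do   # 5-minute timeout
  STATE=$(get_state)
  case "$STATE" in
    Succeeded|Failed) break ;;
  esac
  sleep 10
  elapsed=$((elapsed + 10))
done

echo "Final provisioning state: $STATE"
```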

**Get endpoint:**
```bash
az cognitiveservices account show \
  --name $ACCOUNT_NAME \
  --resource-group $RESOURCE_GROUP \
  --query "properties.endpoint" -o tsv
```

On success, display deployment name, model, version, SKU, capacity, region, RAI policy, rate limits, endpoint, and Azure AI Foundry portal link.
models/deploy-model/preset/
EXAMPLES.md 2.9 KB
# Examples: preset

## Example 1: Fast Path โ€” Current Region Has Capacity

**Scenario:** Deploy gpt-4o to project in East US, which has capacity.
**Result:** Deployed in ~45s. No region selection needed. 100K TPM default, GlobalStandard SKU.

## Example 2: Alternative Region โ€” No Capacity in Current Region

**Scenario:** Deploy gpt-4-turbo to dev project in West US 2 (no capacity).
**Result:** Queried all regions โ†’ user selected East US 2 (120K available) โ†’ deployed in ~2 min.

## Example 3: Create New Project in Optimal Region

**Scenario:** Deploy gpt-4o-mini in Europe for data residency; no existing European project.
**Result:** Created AI Services hub + project in Sweden Central โ†’ deployed in ~4 min with 150K TPM.

## Example 4: Insufficient Quota Everywhere

**Scenario:** Deploy gpt-4 but all regions have exhausted quota.
**Result:** Graceful failure with actionable guidance:
1. Request quota increase via the [quota skill](../../../quota/quota.md)
2. List existing deployments consuming quota
3. Suggest alternative models (gpt-4o, gpt-4o-mini)

## Example 5: First-Time User โ€” No Project

**Scenario:** Deploy gpt-4o with no existing AI Foundry project.
**Result:** Full onboarding in ~5 min โ€” created resource group, AI Services hub, project, then deployed.

## Example 6: Deployment Name Conflict

**Scenario:** Auto-generated deployment name already exists.
**Result:** Appended random hex suffix (e.g., `-7b9e`) and retried automatically.

## Example 7: Multi-Version Model Selection

**Scenario:** Deploy "latest gpt-4o" when multiple versions exist.
**Result:** Latest stable version auto-selected. Capacity aggregated across versions.

## Example 8: Anthropic Model (claude-sonnet-4-6)

**Scenario:** Deploy claude-sonnet-4-6 (Anthropic model requiring modelProviderData).
**Result:** User prompted for industry selection โ†’ tenant country code and org name fetched automatically โ†’ deployed via ARM REST API with `modelProviderData` payload in ~2 min. Capacity set to 1 (MaaS billing).

---

## Summary of Scenarios

| Scenario | Duration | Key Features |
|----------|----------|--------------|
| **1: Fast Path** | ~45s | Current region has capacity, direct deploy |
| **2: Alt Region** | ~2m | Region selection, project switch |
| **3: New Project** | ~4m | Project creation in optimal region |
| **4: No Quota** | N/A | Graceful failure, actionable guidance |
| **5: First-Time** | ~5m | Complete onboarding |
| **6: Name Conflict** | ~1m | Auto-retry with suffix |
| **7: Multi-Version** | ~1m | Latest version auto-selected |
| **8: Anthropic** | ~2m | Industry prompt, tenant info, REST API deploy |

## Common Patterns

```
A: Quick Deploy     Auth โ†’ Get Project โ†’ Check Region (โœ“) โ†’ Deploy
B: Region Select    Auth โ†’ Get Project โ†’ Region (โœ—) โ†’ Query All โ†’ Select โ†’ Deploy
C: Full Onboarding  Auth โ†’ No Projects โ†’ Create Project โ†’ Deploy
D: Error Recovery   Deploy (โœ—) โ†’ Analyze โ†’ Fix โ†’ Retry
```
SKILL.md 4.8 KB
---
name: preset
description: "Intelligently deploys Azure OpenAI models to optimal regions by analyzing capacity across all available regions. Automatically checks current region first and shows alternatives if needed. USE FOR: quick deployment, optimal region, best region, automatic region selection, fast setup, multi-region capacity check, high availability deployment, deploy to best location. DO NOT USE FOR: custom SKU selection (use customize), specific version selection (use customize), custom capacity configuration (use customize), PTU deployments (use customize)."
license: MIT
metadata:
  author: Microsoft
  version: "1.0.1"
---

# Deploy Model to Optimal Region

Automates intelligent Azure OpenAI model deployment by checking capacity across regions and deploying to the best available option.

## What This Skill Does

1. Verifies Azure authentication and project scope
2. Checks capacity in current project's region
3. If no capacity: analyzes all regions and shows available alternatives
4. Filters projects by selected region
5. Supports creating new projects if needed
6. Deploys model with GlobalStandard SKU
7. Monitors deployment progress

## Prerequisites

- Azure CLI installed and configured
- Active Azure subscription with Cognitive Services read/create permissions
- Azure AI Foundry project resource ID (`PROJECT_RESOURCE_ID` env var or provided interactively)
  - Format: `/subscriptions/{sub-id}/resourceGroups/{rg}/providers/Microsoft.CognitiveServices/accounts/{account}/projects/{project}`
  - Found in: Azure AI Foundry portal โ†’ Project โ†’ Overview โ†’ Resource ID

## Quick Workflow

### Fast Path (Current Region Has Capacity)
```
1. Check authentication โ†’ 2. Get project โ†’ 3. Check current region capacity
โ†’ 4. Deploy immediately
```

### Alternative Region Path (No Capacity)
```
1. Check authentication โ†’ 2. Get project โ†’ 3. Check current region (no capacity)
โ†’ 4. Query all regions โ†’ 5. Show alternatives โ†’ 6. Select region + project
โ†’ 7. Deploy
```

---

## Deployment Phases

| Phase | Action | Key Commands |
|-------|--------|-------------|
| 1. Verify Auth | Check Azure CLI login and subscription | `az account show`, `az login` |
| 2. Get Project | Parse `PROJECT_RESOURCE_ID` ARM ID, verify exists | `az cognitiveservices account show` |
| 3. Get Model | List available models, user selects model + version | `az cognitiveservices account list-models` |
| 4. Check Current Region | Query capacity using GlobalStandard SKU | `az rest --method GET .../modelCapacities` |
| 5. Multi-Region Query | If no local capacity, query all regions | Same capacity API without location filter |
| 6. Select Region + Project | User picks region; find or create project | `az cognitiveservices account list`, `az cognitiveservices account create` |
| 7. Deploy | Generate unique name, calculate capacity (50% available, min 50 TPM), create deployment | `az cognitiveservices account deployment create` |

For detailed step-by-step instructions, see [workflow reference](references/workflow.md).

---

## Error Handling

| Error | Symptom | Resolution |
|-------|---------|------------|
| Auth failure | `az account show` returns error | Run `az login` then `az account set --subscription <id>` |
| No quota | All regions show 0 capacity | Defer to the [quota skill](../../../quota/quota.md) for increase requests and troubleshooting; check existing deployments; try alternative models |
| Model not found | Empty capacity list | Verify model name with `az cognitiveservices account list-models`; check case sensitivity |
| Name conflict | "deployment already exists" | Append a suffix to the deployment name (handled automatically by the `generate_deployment_name` script) |
| Region unavailable | Region doesn't support model | Select a different region from the available list |
| Permission denied | "Forbidden" or "Unauthorized" | Verify Cognitive Services Contributor role: `az role assignment list --assignee <user>` |

---

## Advanced Usage

```bash
# Custom capacity
az cognitiveservices account deployment create ... --sku-capacity <value>

# Check deployment status
az cognitiveservices account deployment show --name <acct> --resource-group <rg> --deployment-name <name> --query "{Status:properties.provisioningState}"

# Delete deployment
az cognitiveservices account deployment delete --name <acct> --resource-group <rg> --deployment-name <name>
```

## Notes

- **SKU:** GlobalStandard only
- **API Version:** 2024-10-01 (GA stable)

---

## Related Skills

- **microsoft-foundry** - Parent skill for Azure AI Foundry operations
- **[quota](../../../quota/quota.md)** — Defer to this skill for quota viewing, increase requests, and troubleshooting quota errors
- **azure-quick-review** - Review Azure resources for compliance
- **azure-cost-estimation** - Estimate costs for Azure deployments
- **azure-validate** - Validate Azure infrastructure before deployment
models/deploy-model/preset/references/
preset-workflow.md 21.6 KB
# Preset Deployment Workflow - Detailed Implementation

This file contains the full step-by-step bash/PowerShell scripts for preset (optimal region) model deployment. Referenced from the main [SKILL.md](../SKILL.md).

---

## Phase 1: Verify Authentication

Check if user is logged into Azure CLI:

```bash
az account show --query "{Subscription:name, User:user.name}" -o table
```

**If not logged in:**
```bash
az login
```

**Verify subscription is correct:**
```bash
# List all subscriptions
az account list --query "[].[name,id,state]" -o table

# Set active subscription if needed
az account set --subscription <subscription-id>
```

---

## Phase 2: Get Current Project

**Check for PROJECT_RESOURCE_ID environment variable first:**

```bash
if [ -n "$PROJECT_RESOURCE_ID" ]; then
  echo "Using project resource ID from environment: $PROJECT_RESOURCE_ID"
else
  echo "PROJECT_RESOURCE_ID not set. Please provide your Azure AI Foundry project resource ID."
  echo ""
  echo "You can find this in:"
  echo "  โ€ข Azure AI Foundry portal โ†’ Project โ†’ Overview โ†’ Resource ID"
  echo "  โ€ข Format: /subscriptions/{sub-id}/resourceGroups/{rg}/providers/Microsoft.CognitiveServices/accounts/{account}/projects/{project}"
  echo ""
  echo "Example: /subscriptions/abc123.../resourceGroups/rg-prod/providers/Microsoft.CognitiveServices/accounts/my-account/projects/my-project"
  echo ""
  read -p "Enter project resource ID: " PROJECT_RESOURCE_ID
fi
```

**Parse the ARM resource ID to extract components:**

```bash
# Extract components from ARM resource ID
# Format: /subscriptions/{sub-id}/resourceGroups/{rg}/providers/Microsoft.CognitiveServices/accounts/{account}/projects/{project}

SUBSCRIPTION_ID=$(echo "$PROJECT_RESOURCE_ID" | sed -n 's|.*/subscriptions/\([^/]*\).*|\1|p')
RESOURCE_GROUP=$(echo "$PROJECT_RESOURCE_ID" | sed -n 's|.*/resourceGroups/\([^/]*\).*|\1|p')
ACCOUNT_NAME=$(echo "$PROJECT_RESOURCE_ID" | sed -n 's|.*/accounts/\([^/]*\)/projects.*|\1|p')
PROJECT_NAME=$(echo "$PROJECT_RESOURCE_ID" | sed -n 's|.*/projects/\([^/?]*\).*|\1|p')

if [ -z "$SUBSCRIPTION_ID" ] || [ -z "$RESOURCE_GROUP" ] || [ -z "$ACCOUNT_NAME" ] || [ -z "$PROJECT_NAME" ]; then
  echo "โŒ Invalid project resource ID format"
  echo "Expected format: /subscriptions/{sub-id}/resourceGroups/{rg}/providers/Microsoft.CognitiveServices/accounts/{account}/projects/{project}"
  exit 1
fi

echo "Parsed project details:"
echo "  Subscription: $SUBSCRIPTION_ID"
echo "  Resource Group: $RESOURCE_GROUP"
echo "  Account: $ACCOUNT_NAME"
echo "  Project: $PROJECT_NAME"
```

**Verify the parent account exists and get its region:**

```bash
# Set active subscription
az account set --subscription "$SUBSCRIPTION_ID"

# Get the parent account's details to verify it exists and extract its region
PROJECT_REGION=$(az cognitiveservices account show \
  --name "$ACCOUNT_NAME" \
  --resource-group "$RESOURCE_GROUP" \
  --query location -o tsv 2>/dev/null)

if [ -z "$PROJECT_REGION" ]; then
  echo "❌ Account '$ACCOUNT_NAME' not found in resource group '$RESOURCE_GROUP'"
  echo ""
  echo "Please verify the resource ID is correct."
  echo ""
  echo "List available projects:"
  echo "  az cognitiveservices account list --query \"[?kind=='AIProject'].{Name:name, Location:location, ResourceGroup:resourceGroup}\" -o table"
  exit 1
fi

echo "✓ Account found"
echo "  Region: $PROJECT_REGION"
```

---

## Phase 3: Get Model Name

**If model name provided as skill parameter, skip this phase.**

Ask user which model to deploy. **Fetch available models dynamically** from the account rather than using a hardcoded list:

```bash
# List available models in the account
az cognitiveservices account list-models \
  --name "$ACCOUNT_NAME" \
  --resource-group "$RESOURCE_GROUP" \
  --query "[].name" -o tsv | sort -u
```

Present the results to the user and let them choose, or enter a custom model name.

**Store model:**
```bash
MODEL_NAME="<selected-model>"
```

**Get model version (latest stable):**
```bash
# List available versions for the selected model
az cognitiveservices account list-models \
  --name "$ACCOUNT_NAME" \
  --resource-group "$RESOURCE_GROUP" \
  --query "[?name=='$MODEL_NAME'].{Name:name, Version:version, Format:format}" \
  -o table
```

**Use latest version or let user specify:**
```bash
MODEL_VERSION="<version-or-latest>"
```

**Detect model format:**

```bash
# Get model format from model catalog (e.g., OpenAI, Anthropic, Meta-Llama, Mistral, Cohere)
MODEL_FORMAT=$(az cognitiveservices account list-models \
  --name "$ACCOUNT_NAME" \
  --resource-group "$RESOURCE_GROUP" \
  --query "[?name=='$MODEL_NAME'].format" -o tsv | head -1)

# Default to OpenAI if not found
MODEL_FORMAT=${MODEL_FORMAT:-"OpenAI"}

echo "Model format: $MODEL_FORMAT"
```

> ๐Ÿ’ก **Model format determines the deployment path:**
> - `OpenAI` โ€” Standard CLI deployment, TPM-based capacity, RAI policies apply
> - `Anthropic` โ€” REST API deployment with `modelProviderData`, capacity=1, no RAI
> - All other formats (`Meta-Llama`, `Mistral`, `Cohere`, etc.) โ€” Standard CLI deployment, capacity=1 (MaaS), no RAI

---

## Phase 4: Check Current Region Capacity

Before checking other regions, see if the current project's region has capacity:

```bash
# Query capacity for current region
CAPACITY_JSON=$(az rest --method GET \
  --url "https://management.azure.com/subscriptions/$SUBSCRIPTION_ID/providers/Microsoft.CognitiveServices/locations/$PROJECT_REGION/modelCapacities?api-version=2024-10-01&modelFormat=$MODEL_FORMAT&modelName=$MODEL_NAME&modelVersion=$MODEL_VERSION")

# Extract available capacity for GlobalStandard SKU
CURRENT_CAPACITY=$(echo "$CAPACITY_JSON" | jq -r '.value[] | select(.properties.skuName=="GlobalStandard") | .properties.availableCapacity')
```

**Check result:**
```bash
if [ -n "$CURRENT_CAPACITY" ] && [ "$CURRENT_CAPACITY" -gt 0 ]; then
  echo "โœ“ Current region ($PROJECT_REGION) has capacity: $CURRENT_CAPACITY TPM"
  echo "Proceeding with deployment..."
  # Skip to Phase 7 (Deploy)
else
  echo "โš  Current region ($PROJECT_REGION) has no available capacity"
  echo "Checking alternative regions..."
  # Continue to Phase 5
fi
```

---

## Phase 5: Query Multi-Region Capacity (If Needed)

Only execute this phase if current region has no capacity.

**Query capacity across all regions:**
```bash
# Get capacity for all regions in subscription
ALL_REGIONS_JSON=$(az rest --method GET \
  --url "https://management.azure.com/subscriptions/$SUBSCRIPTION_ID/providers/Microsoft.CognitiveServices/modelCapacities?api-version=2024-10-01&modelFormat=$MODEL_FORMAT&modelName=$MODEL_NAME&modelVersion=$MODEL_VERSION")

# Save to file for processing
echo "$ALL_REGIONS_JSON" > /tmp/capacity_check.json
```

**Parse and categorize regions:**
```bash
# Extract available regions (capacity > 0)
AVAILABLE_REGIONS=$(jq -r '.value[] | select(.properties.skuName=="GlobalStandard" and .properties.availableCapacity > 0) | "\(.location)|\(.properties.availableCapacity)"' /tmp/capacity_check.json)

# Extract unavailable regions (capacity = 0 or undefined)
UNAVAILABLE_REGIONS=$(jq -r '.value[] | select(.properties.skuName=="GlobalStandard" and (.properties.availableCapacity == 0 or .properties.availableCapacity == null)) | "\(.location)|0"' /tmp/capacity_check.json)
```

**Format and display regions:**
```bash
# Format capacity (e.g., 120000 -> 120K)
format_capacity() {
  local capacity=$1
  if [ "$capacity" -ge 1000000 ]; then
    echo "$(awk "BEGIN {printf \"%.1f\", $capacity/1000000}")M TPM"
  elif [ "$capacity" -ge 1000 ]; then
    echo "$(awk "BEGIN {printf \"%.0f\", $capacity/1000}")K TPM"
  else
    echo "$capacity TPM"
  fi
}

echo ""
echo "โš  No Capacity in Current Region"
echo ""
echo "The current project's region ($PROJECT_REGION) does not have available capacity for $MODEL_NAME."
echo ""
echo "Available Regions (with capacity):"
echo ""

# Display available regions with formatted capacity
echo "$AVAILABLE_REGIONS" | while IFS='|' read -r region capacity; do
  formatted_capacity=$(format_capacity "$capacity")
  # Get region display name (capitalize and format)
  region_display=$(echo "$region" | sed 's/\([a-z]\)\([a-z]*\)/\U\1\L\2/g; s/\([a-z]\)\([0-9]\)/\1 \2/g')
  echo "  โ€ข $region_display - $formatted_capacity"
done

echo ""
echo "Unavailable Regions:"
echo ""

# Display unavailable regions
echo "$UNAVAILABLE_REGIONS" | while IFS='|' read -r region capacity; do
  region_display=$(echo "$region" | sed 's/\([a-z]\)\([a-z]*\)/\U\1\L\2/g; s/\([a-z]\)\([0-9]\)/\1 \2/g')
  if [ "$capacity" = "0" ]; then
    echo "  โœ— $region_display (Insufficient quota - 0 TPM available)"
  else
    echo "  โœ— $region_display (Model not supported)"
  fi
done
```

**Handle no capacity anywhere:**
```bash
if [ -z "$AVAILABLE_REGIONS" ]; then
  echo ""
  echo "โŒ No Available Capacity in Any Region"
  echo ""
  echo "No regions have available capacity for $MODEL_NAME with GlobalStandard SKU."
  echo ""
  echo "Next Steps:"
  echo "1. Request quota increase โ€” use the quota skill (../../../quota/quota.md)"
  echo ""
  echo "2. Check existing deployments (may be using quota):"
  echo "   az cognitiveservices account deployment list \\"
  echo "     --name $ACCOUNT_NAME \\"
  echo "     --resource-group $RESOURCE_GROUP"
  echo ""
  echo "3. Consider alternative models with lower capacity requirements:"
  echo "   • gpt-4o-mini (cost-effective, lower capacity requirements)"
  echo "   List available models: az cognitiveservices account list-models --name \$ACCOUNT_NAME --resource-group \$RESOURCE_GROUP --output table"
  exit 1
fi
```

---

## Phase 6: Select Region and Project

**Ask user to select region from available options.**

Example using AskUserQuestion:
- Present available regions as options
- Show capacity for each
- User selects preferred region

**Store selection:**
```bash
SELECTED_REGION="<user-selected-region>"  # e.g., "eastus2"
```

**Find projects in selected region:**
```bash
PROJECTS_IN_REGION=$(az cognitiveservices account list \
  --query "[?kind=='AIProject' && location=='$SELECTED_REGION'].{Name:name, ResourceGroup:resourceGroup}" \
  --output json)

PROJECT_COUNT=$(echo "$PROJECTS_IN_REGION" | jq '. | length')

if [ "$PROJECT_COUNT" -eq 0 ]; then
  echo "No projects found in $SELECTED_REGION"
  echo "Would you like to create a new project? (yes/no)"
  # If yes, continue to project creation
  # If no, exit or select different region
else
  echo "Projects in $SELECTED_REGION:"
  echo "$PROJECTS_IN_REGION" | jq -r '.[] | "  โ€ข \(.Name) (\(.ResourceGroup))"'
  echo ""
  echo "Select a project or create new project"
fi
```

**Option A: Use existing project**
```bash
PROJECT_NAME="<selected-project-name>"
ACCOUNT_NAME="$PROJECT_NAME"   # deployment commands below target this account
RESOURCE_GROUP="<resource-group>"
```

**Option B: Create new project**
```bash
# Generate project name
USER_ALIAS=$(az account show --query user.name -o tsv | cut -d'@' -f1 | tr '.' '-')
RANDOM_SUFFIX=$(openssl rand -hex 2)
NEW_PROJECT_NAME="${USER_ALIAS}-aiproject-${RANDOM_SUFFIX}"

# Prompt for resource group
echo "Resource group for new project:"
echo "  1. Use existing resource group: $RESOURCE_GROUP"
echo "  2. Create new resource group"

# If existing resource group
NEW_RESOURCE_GROUP="$RESOURCE_GROUP"

# Create AI Services account (hub)
HUB_NAME="${NEW_PROJECT_NAME}-hub"

echo "Creating AI Services hub: $HUB_NAME in $SELECTED_REGION..."

az cognitiveservices account create \
  --name "$HUB_NAME" \
  --resource-group "$NEW_RESOURCE_GROUP" \
  --location "$SELECTED_REGION" \
  --kind "AIServices" \
  --sku "S0" \
  --yes

# Create AI Foundry project
echo "Creating AI Foundry project: $NEW_PROJECT_NAME..."

az cognitiveservices account create \
  --name "$NEW_PROJECT_NAME" \
  --resource-group "$NEW_RESOURCE_GROUP" \
  --location "$SELECTED_REGION" \
  --kind "AIProject" \
  --sku "S0" \
  --yes

echo "✓ Project created successfully"
PROJECT_NAME="$NEW_PROJECT_NAME"
ACCOUNT_NAME="$NEW_PROJECT_NAME"   # deployment commands below target this account
RESOURCE_GROUP="$NEW_RESOURCE_GROUP"
```

---

## Phase 7: Deploy Model

**Generate unique deployment name:**

The deployment name should match the model name (e.g., "gpt-4o"); if a deployment with that name already exists, append a numeric suffix (e.g., "gpt-4o-2", "gpt-4o-3"). This follows the same UX pattern as the Azure AI Foundry portal.

Use the `generate_deployment_name` script to check existing deployments and generate a unique name:

*Bash version:*
```bash
DEPLOYMENT_NAME=$(bash scripts/generate_deployment_name.sh \
  "$ACCOUNT_NAME" \
  "$RESOURCE_GROUP" \
  "$MODEL_NAME")

echo "Generated deployment name: $DEPLOYMENT_NAME"
```

*PowerShell version:*
```powershell
$DEPLOYMENT_NAME = & .\scripts\generate_deployment_name.ps1 `
  -AccountName $ACCOUNT_NAME `
  -ResourceGroup $RESOURCE_GROUP `
  -ModelName $MODEL_NAME

Write-Host "Generated deployment name: $DEPLOYMENT_NAME"
```

**Calculate deployment capacity:**

Follow the Azure AI Foundry portal's capacity calculation logic. For OpenAI models, use 50% of available capacity (minimum 50 TPM). For all other models (MaaS), capacity is always 1:

```bash
if [ "$MODEL_FORMAT" = "OpenAI" ]; then
  # OpenAI models: TPM-based capacity (50% of available, minimum 50)
  SELECTED_CAPACITY=$(echo "$ALL_REGIONS_JSON" | jq -r ".value[] | select(.location==\"$SELECTED_REGION\" and .properties.skuName==\"GlobalStandard\") | .properties.availableCapacity")

  if [ "$SELECTED_CAPACITY" -gt 50 ]; then
    DEPLOY_CAPACITY=$((SELECTED_CAPACITY / 2))
    if [ "$DEPLOY_CAPACITY" -lt 50 ]; then
      DEPLOY_CAPACITY=50
    fi
  else
    DEPLOY_CAPACITY=$SELECTED_CAPACITY
  fi

  echo "Deploying with capacity: $DEPLOY_CAPACITY TPM (50% of available: $SELECTED_CAPACITY TPM)"
else
  # Non-OpenAI models (MaaS): capacity is always 1
  DEPLOY_CAPACITY=1
  echo "MaaS model — deploying with capacity: 1 (pay-per-token billing)"
fi
```

### If MODEL_FORMAT is NOT "Anthropic" — Standard CLI Deployment

> 💡 **Note:** The Azure CLI supports all non-Anthropic model formats directly.

*Bash version:*
```bash
echo "Creating deployment..."

az cognitiveservices account deployment create \
  --name "$ACCOUNT_NAME" \
  --resource-group "$RESOURCE_GROUP" \
  --deployment-name "$DEPLOYMENT_NAME" \
  --model-name "$MODEL_NAME" \
  --model-version "$MODEL_VERSION" \
  --model-format "$MODEL_FORMAT" \
  --sku-name "GlobalStandard" \
  --sku-capacity "$DEPLOY_CAPACITY"
```

*PowerShell version:*
```powershell
Write-Host "Creating deployment..."

az cognitiveservices account deployment create `
  --name $ACCOUNT_NAME `
  --resource-group $RESOURCE_GROUP `
  --deployment-name $DEPLOYMENT_NAME `
  --model-name $MODEL_NAME `
  --model-version $MODEL_VERSION `
  --model-format $MODEL_FORMAT `
  --sku-name "GlobalStandard" `
  --sku-capacity $DEPLOY_CAPACITY
```

> 💡 **Note:** For non-OpenAI MaaS models (Meta-Llama, Mistral, Cohere, etc.), `$DEPLOY_CAPACITY` is `1` (set in capacity calculation above).

### If MODEL_FORMAT is "Anthropic" — REST API Deployment with modelProviderData

The Azure CLI does not support `--model-provider-data`. You must use the ARM REST API directly.

**Step 1: Prompt user to select industry**

Present the following list and ask the user to choose one:

```
 1. None                    (API value: none)
 2. Biotechnology           (API value: biotechnology)
 3. Consulting              (API value: consulting)
 4. Education               (API value: education)
 5. Finance                 (API value: finance)
 6. Food & Beverage         (API value: food_and_beverage)
 7. Government              (API value: government)
 8. Healthcare              (API value: healthcare)
 9. Insurance               (API value: insurance)
10. Law                     (API value: law)
11. Manufacturing           (API value: manufacturing)
12. Media                   (API value: media)
13. Nonprofit               (API value: nonprofit)
14. Technology              (API value: technology)
15. Telecommunications      (API value: telecommunications)
16. Sport & Recreation      (API value: sport_and_recreation)
17. Real Estate             (API value: real_estate)
18. Retail                  (API value: retail)
19. Other                   (API value: other)
```

> โš ๏ธ **Do NOT pick a default industry or hardcode a value. Always ask the user.** This is required by Anthropic's terms of service. The industry list is static โ€” there is no REST API that provides it.

Store selection as `SELECTED_INDUSTRY` (use the API value, e.g., `technology`).
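
A minimal sketch of mapping the user's numeric choice to the API value (the `INDUSTRIES` array and `CHOICE` variable are illustrative; the array order mirrors the static list above):

```shell
# Example: the user picked "14. Technology" — in practice CHOICE comes from the prompt
CHOICE=14

# API values in the same order as the numbered menu above
INDUSTRIES=(none biotechnology consulting education finance food_and_beverage \
  government healthcare insurance law manufacturing media nonprofit technology \
  telecommunications sport_and_recreation real_estate retail other)

# Menu is 1-based, array is 0-based
SELECTED_INDUSTRY="${INDUSTRIES[$((CHOICE - 1))]}"
echo "SELECTED_INDUSTRY=$SELECTED_INDUSTRY"
```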

**Step 2: Fetch tenant info (country code and organization name)**

```bash
TENANT_INFO=$(az rest --method GET \
  --url "https://management.azure.com/tenants?api-version=2024-11-01" \
  --query "value[0].{countryCode:countryCode, displayName:displayName}" -o json)

COUNTRY_CODE=$(echo "$TENANT_INFO" | jq -r '.countryCode')
ORG_NAME=$(echo "$TENANT_INFO" | jq -r '.displayName')
```

*PowerShell version:*
```powershell
$tenantInfo = az rest --method GET `
  --url "https://management.azure.com/tenants?api-version=2024-11-01" `
  --query "value[0].{countryCode:countryCode, displayName:displayName}" -o json | ConvertFrom-Json

$countryCode = $tenantInfo.countryCode
$orgName = $tenantInfo.displayName
```

**Step 3: Deploy via ARM REST API**

*Bash version:*
```bash
echo "Creating Anthropic model deployment via REST API..."

az rest --method PUT \
  --url "https://management.azure.com/subscriptions/$SUBSCRIPTION_ID/resourceGroups/$RESOURCE_GROUP/providers/Microsoft.CognitiveServices/accounts/$ACCOUNT_NAME/deployments/$DEPLOYMENT_NAME?api-version=2024-10-01" \
  --body "{
    \"sku\": {
      \"name\": \"GlobalStandard\",
      \"capacity\": 1
    },
    \"properties\": {
      \"model\": {
        \"format\": \"Anthropic\",
        \"name\": \"$MODEL_NAME\",
        \"version\": \"$MODEL_VERSION\"
      },
      \"modelProviderData\": {
        \"industry\": \"$SELECTED_INDUSTRY\",
        \"countryCode\": \"$COUNTRY_CODE\",
        \"organizationName\": \"$ORG_NAME\"
      }
    }
  }"
```

*PowerShell version:*
```powershell
Write-Host "Creating Anthropic model deployment via REST API..."

$body = @{
    sku = @{
        name = "GlobalStandard"
        capacity = 1
    }
    properties = @{
        model = @{
            format = "Anthropic"
            name = $MODEL_NAME
            version = $MODEL_VERSION
        }
        modelProviderData = @{
            industry = $SELECTED_INDUSTRY
            countryCode = $countryCode
            organizationName = $orgName
        }
    }
} | ConvertTo-Json -Depth 5

az rest --method PUT `
  --url "https://management.azure.com/subscriptions/$SUBSCRIPTION_ID/resourceGroups/$RESOURCE_GROUP/providers/Microsoft.CognitiveServices/accounts/$ACCOUNT_NAME/deployments/${DEPLOYMENT_NAME}?api-version=2024-10-01" `
  --body $body
```

> 💡 **Note:** Anthropic models use `capacity: 1` (MaaS billing model), not TPM-based capacity.

**Monitor deployment progress:**
```bash
echo "Monitoring deployment status..."

MAX_WAIT=300  # 5 minutes
ELAPSED=0
INTERVAL=10

while [ $ELAPSED -lt $MAX_WAIT ]; do
  STATUS=$(az cognitiveservices account deployment show \
    --name "$ACCOUNT_NAME" \
    --resource-group "$RESOURCE_GROUP" \
    --deployment-name "$DEPLOYMENT_NAME" \
    --query "properties.provisioningState" -o tsv 2>/dev/null)

  case "$STATUS" in
    "Succeeded")
      echo "✓ Deployment successful!"
      break
      ;;
    "Failed")
      echo "โŒ Deployment failed"
      # Get error details
      az cognitiveservices account deployment show \
        --name "$ACCOUNT_NAME" \
        --resource-group "$RESOURCE_GROUP" \
        --deployment-name "$DEPLOYMENT_NAME" \
        --query "properties"
      exit 1
      ;;
    "Creating"|"Accepted"|"Running")
      echo "Status: $STATUS... (${ELAPSED}s elapsed)"
      sleep $INTERVAL
      ELAPSED=$((ELAPSED + INTERVAL))
      ;;
    *)
      echo "Unknown status: $STATUS"
      sleep $INTERVAL
      ELAPSED=$((ELAPSED + INTERVAL))
      ;;
  esac
done

if [ $ELAPSED -ge $MAX_WAIT ]; then
  echo "⚠ Deployment timeout after ${MAX_WAIT}s"
  echo "Check status manually:"
  echo "  az cognitiveservices account deployment show \\"
  echo "    --name $ACCOUNT_NAME \\"
  echo "    --resource-group $RESOURCE_GROUP \\"
  echo "    --deployment-name $DEPLOYMENT_NAME"
  exit 1
fi
```

---

## Phase 8: Display Deployment Details

**Show deployment information:**
```bash
echo ""
echo "═══════════════════════════════════════════"
echo "✓ Deployment Successful!"
echo "═══════════════════════════════════════════"
echo ""

# Get endpoint information
ENDPOINT=$(az cognitiveservices account show \
  --name "$ACCOUNT_NAME" \
  --resource-group "$RESOURCE_GROUP" \
  --query "properties.endpoint" -o tsv)

# Get deployment details
DEPLOYMENT_INFO=$(az cognitiveservices account deployment show \
  --name "$ACCOUNT_NAME" \
  --resource-group "$RESOURCE_GROUP" \
  --deployment-name "$DEPLOYMENT_NAME" \
  --query "properties.model")

echo "Deployment Name: $DEPLOYMENT_NAME"
echo "Model: $MODEL_NAME"
echo "Version: $MODEL_VERSION"
echo "Region: $SELECTED_REGION"
echo "SKU: GlobalStandard"
echo "Capacity: $(format_capacity $DEPLOY_CAPACITY)"
echo "Endpoint: $ENDPOINT"
echo ""

# Generate direct link to deployment in Azure AI Foundry portal
DEPLOYMENT_URL=$(bash "$(dirname "$0")/scripts/generate_deployment_url.sh" \
  --subscription "$SUBSCRIPTION_ID" \
  --resource-group "$RESOURCE_GROUP" \
  --foundry-resource "$ACCOUNT_NAME" \
  --project "$PROJECT_NAME" \
  --deployment "$DEPLOYMENT_NAME")

echo "🔗 View in Azure AI Foundry Portal:"
echo ""
echo "$DEPLOYMENT_URL"
echo ""
echo "═══════════════════════════════════════════"
echo ""

echo "Test your deployment:"
echo ""
echo "# View deployment details"
echo "az cognitiveservices account deployment show \\"
echo "  --name $ACCOUNT_NAME \\"
echo "  --resource-group $RESOURCE_GROUP \\"
echo "  --deployment-name $DEPLOYMENT_NAME"
echo ""
echo "# List all deployments"
echo "az cognitiveservices account deployment list \\"
echo "  --name $ACCOUNT_NAME \\"
echo "  --resource-group $RESOURCE_GROUP \\"
echo "  --output table"
echo ""

echo "Next steps:"
echo "• Click the link above to test in Azure AI Foundry playground"
echo "• Integrate into your application"
echo "• Set up monitoring and alerts"
```
workflow.md 5.6 KB
# Preset Deployment Workflow — Step-by-Step

Condensed implementation reference for preset (optimal region) model deployment. See [SKILL.md](../SKILL.md) for overview.

**Table of Contents:** [Phase 1: Verify Authentication](#phase-1-verify-authentication) · [Phase 2: Get Current Project](#phase-2-get-current-project) · [Phase 3: Get Model Name](#phase-3-get-model-name) · [Phase 4: Check Current Region Capacity](#phase-4-check-current-region-capacity) · [Phase 5: Query Multi-Region Capacity](#phase-5-query-multi-region-capacity) · [Phase 6: Select Region and Project](#phase-6-select-region-and-project) · [Phase 7: Deploy Model](#phase-7-deploy-model)

---

## Phase 1: Verify Authentication

```bash
az account show --query "{Subscription:name, User:user.name}" -o table
```

If not logged in: `az login`

Switch subscription:

```bash
az account list --query "[].[name,id,state]" -o table
az account set --subscription <subscription-id>
```

---

## Phase 2: Get Current Project

Read `PROJECT_RESOURCE_ID` from env or prompt user. Format:
`/subscriptions/{sub-id}/resourceGroups/{rg}/providers/Microsoft.CognitiveServices/accounts/{account}/projects/{project}`

Parse ARM ID components:

```bash
SUBSCRIPTION_ID=$(echo "$PROJECT_RESOURCE_ID" | sed -n 's|.*/subscriptions/\([^/]*\).*|\1|p')
RESOURCE_GROUP=$(echo "$PROJECT_RESOURCE_ID" | sed -n 's|.*/resourceGroups/\([^/]*\).*|\1|p')
ACCOUNT_NAME=$(echo "$PROJECT_RESOURCE_ID" | sed -n 's|.*/accounts/\([^/]*\)/projects.*|\1|p')
PROJECT_NAME=$(echo "$PROJECT_RESOURCE_ID" | sed -n 's|.*/projects/\([^/?]*\).*|\1|p')
```

Verify project exists and get region:

```bash
az account set --subscription "$SUBSCRIPTION_ID"

PROJECT_REGION=$(az cognitiveservices account show \
  --name "$PROJECT_NAME" \
  --resource-group "$RESOURCE_GROUP" \
  --query location -o tsv)
```

---

## Phase 3: Get Model Name

If model not provided as parameter, list available models:

```bash
az cognitiveservices account list-models \
  --name "$PROJECT_NAME" \
  --resource-group "$RESOURCE_GROUP" \
  --query "[].name" -o tsv | sort -u
```

Get versions for selected model:

```bash
az cognitiveservices account list-models \
  --name "$PROJECT_NAME" \
  --resource-group "$RESOURCE_GROUP" \
  --query "[?name=='$MODEL_NAME'].{Name:name, Version:version, Format:format}" \
  -o table
```

---

## Phase 4: Check Current Region Capacity

```bash
CAPACITY_JSON=$(az rest --method GET \
  --url "https://management.azure.com/subscriptions/$SUBSCRIPTION_ID/providers/Microsoft.CognitiveServices/locations/$PROJECT_REGION/modelCapacities?api-version=2024-10-01&modelFormat=OpenAI&modelName=$MODEL_NAME&modelVersion=$MODEL_VERSION")

CURRENT_CAPACITY=$(echo "$CAPACITY_JSON" | jq -r '.value[] | select(.properties.skuName=="GlobalStandard") | .properties.availableCapacity')
```

If `CURRENT_CAPACITY > 0` → skip to Phase 7. Otherwise continue to Phase 5.
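
The branch can be made explicit in the script (a sketch; an empty or `null` result from the `jq` query in Phase 4 is treated as zero):

```shell
# CURRENT_CAPACITY comes from the jq query in Phase 4; treat empty or "null" as 0
[[ "$CURRENT_CAPACITY" =~ ^[0-9]+$ ]] || CURRENT_CAPACITY=0

if [ "$CURRENT_CAPACITY" -gt 0 ]; then
  NEXT_PHASE=7   # current region has room: deploy in place
else
  NEXT_PHASE=5   # no capacity here: query other regions
fi
echo "Proceed to Phase $NEXT_PHASE"
```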

---

## Phase 5: Query Multi-Region Capacity

```bash
ALL_REGIONS_JSON=$(az rest --method GET \
  --url "https://management.azure.com/subscriptions/$SUBSCRIPTION_ID/providers/Microsoft.CognitiveServices/modelCapacities?api-version=2024-10-01&modelFormat=OpenAI&modelName=$MODEL_NAME&modelVersion=$MODEL_VERSION")
```

Extract available regions (capacity > 0):

```bash
AVAILABLE_REGIONS=$(echo "$ALL_REGIONS_JSON" | jq -r '.value[] | select(.properties.skuName=="GlobalStandard" and .properties.availableCapacity > 0) | "\(.location)|\(.properties.availableCapacity)"')
```

Extract unavailable regions:

```bash
UNAVAILABLE_REGIONS=$(echo "$ALL_REGIONS_JSON" | jq -r '.value[] | select(.properties.skuName=="GlobalStandard" and (.properties.availableCapacity == 0 or .properties.availableCapacity == null)) | "\(.location)|0"')
```

If no regions have capacity, defer to the [quota skill](../../../../quota/quota.md) for increase requests. Suggest checking existing deployments or trying alternative models like `gpt-4o-mini`.

---

## Phase 6: Select Region and Project

Present available regions to user. Store selection as `SELECTED_REGION`.
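
One way to render the `location|capacity` lines from Phase 5 as a numbered menu (a sketch; `AVAILABLE_REGIONS` below holds example data):

```shell
# Example data in the "location|capacity" format produced by Phase 5
AVAILABLE_REGIONS="eastus2|450
swedencentral|300"

# Print a numbered menu, one line per region
i=1
while IFS='|' read -r REGION CAPACITY; do
  echo "  $i. $REGION (available: $CAPACITY TPM)"
  i=$((i + 1))
done <<< "$AVAILABLE_REGIONS"
```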

Find projects in selected region:

```bash
PROJECTS_IN_REGION=$(az cognitiveservices account list \
  --query "[?kind=='AIProject' && location=='$SELECTED_REGION'].{Name:name, ResourceGroup:resourceGroup}" \
  --output json)
```

**If no projects exist — create new:**

```bash
az cognitiveservices account create \
  --name "$HUB_NAME" \
  --resource-group "$RESOURCE_GROUP" \
  --location "$SELECTED_REGION" \
  --kind "AIServices" \
  --sku "S0" --yes

az cognitiveservices account create \
  --name "$NEW_PROJECT_NAME" \
  --resource-group "$RESOURCE_GROUP" \
  --location "$SELECTED_REGION" \
  --kind "AIProject" \
  --sku "S0" --yes
```

---

## Phase 7: Deploy Model

Generate unique deployment name using `scripts/generate_deployment_name.sh`:

```bash
DEPLOYMENT_NAME=$(bash scripts/generate_deployment_name.sh "$ACCOUNT_NAME" "$RESOURCE_GROUP" "$MODEL_NAME")
```

Calculate capacity — 50% of available, minimum 50 TPM:

```bash
SELECTED_CAPACITY=$(echo "$ALL_REGIONS_JSON" | jq -r ".value[] | select(.location==\"$SELECTED_REGION\" and .properties.skuName==\"GlobalStandard\") | .properties.availableCapacity")
DEPLOY_CAPACITY=$(( SELECTED_CAPACITY / 2 ))
[ "$DEPLOY_CAPACITY" -lt 50 ] && DEPLOY_CAPACITY=50
```

Create deployment:

```bash
az cognitiveservices account deployment create \
  --name "$ACCOUNT_NAME" \
  --resource-group "$RESOURCE_GROUP" \
  --deployment-name "$DEPLOYMENT_NAME" \
  --model-name "$MODEL_NAME" \
  --model-version "$MODEL_VERSION" \
  --model-format "OpenAI" \
  --sku-name "GlobalStandard" \
  --sku-capacity "$DEPLOY_CAPACITY"
```

Monitor with `az cognitiveservices account deployment show ... --query "properties.provisioningState"` until `Succeeded` or `Failed`.
models/deploy-model/scripts/
generate_deployment_url.ps1 2.3 KB
# Generate Azure AI Foundry portal URL for a model deployment
# This script creates a direct clickable link to view a deployment in the Azure AI Foundry portal
#
# NOTE: The encoding scheme for the subscription ID portion is proprietary to Azure AI Foundry.
# This script uses a GUID byte encoding approach, but may need adjustment based on the actual encoding used.

param(
    [Parameter(Mandatory=$true)]
    [string]$SubscriptionId,
    
    [Parameter(Mandatory=$true)]
    [string]$ResourceGroup,
    
    [Parameter(Mandatory=$true)]
    [string]$FoundryResource,
    
    [Parameter(Mandatory=$true)]
    [string]$ProjectName,
    
    [Parameter(Mandatory=$true)]
    [string]$DeploymentName
)

function Get-SubscriptionIdEncoded {
    param([string]$SubscriptionId)
    
    # Parse GUID and convert to bytes in string order (big-endian)
    # Not using ToByteArray() because it uses little-endian format
    $guidString = $SubscriptionId.Replace('-', '')
    $bytes = New-Object byte[] 16
    for ($i = 0; $i -lt 16; $i++) {
        $bytes[$i] = [Convert]::ToByte($guidString.Substring($i * 2, 2), 16)
    }
    
    # Encode as base64url
    $base64 = [Convert]::ToBase64String($bytes)
    $urlSafe = $base64.Replace('+', '-').Replace('/', '_').TrimEnd('=')
    return $urlSafe
}

function Get-FoundryDeploymentUrl {
    param(
        [string]$SubscriptionId,
        [string]$ResourceGroup,
        [string]$FoundryResource,
        [string]$ProjectName,
        [string]$DeploymentName
    )
    
    # Encode subscription ID
    $encodedSubId = Get-SubscriptionIdEncoded -SubscriptionId $SubscriptionId
    
    # Build the encoded resource path
    # Format: {encoded-sub-id},{resource-group},,{foundry-resource},{project-name}
    # Note: Two commas between resource-group and foundry-resource
    $encodedPath = "$encodedSubId,$ResourceGroup,,$FoundryResource,$ProjectName"
    
    # Build the full URL
    $baseUrl = "https://ai.azure.com/nextgen/r/"
    $deploymentPath = "/build/models/deployments/$DeploymentName/details"
    
    return "$baseUrl$encodedPath$deploymentPath"
}

# Generate and output the URL
$url = Get-FoundryDeploymentUrl `
    -SubscriptionId $SubscriptionId `
    -ResourceGroup $ResourceGroup `
    -FoundryResource $FoundryResource `
    -ProjectName $ProjectName `
    -DeploymentName $DeploymentName

Write-Output $url
generate_deployment_url.sh 2.6 KB
#!/bin/bash
# Generate Azure AI Foundry portal URL for a model deployment
# This script creates a direct clickable link to view a deployment in the Azure AI Foundry portal

set -e

# Function to display usage
usage() {
    cat << EOF
Usage: $0 --subscription SUBSCRIPTION_ID --resource-group RESOURCE_GROUP \\
          --foundry-resource FOUNDRY_RESOURCE --project PROJECT_NAME \\
          --deployment DEPLOYMENT_NAME

Generate Azure AI Foundry deployment URL

Required arguments:
  --subscription        Azure subscription ID (GUID)
  --resource-group      Resource group name
  --foundry-resource    Foundry resource (account) name
  --project             Project name
  --deployment          Deployment name

Example:
  $0 --subscription d5320f9a-73da-4a74-b639-83efebc7bb6f \\
     --resource-group bani-host \\
     --foundry-resource banide-host-resource \\
     --project banide-host \\
     --deployment text-embedding-ada-002
EOF
    exit 1
}

# Parse command line arguments
while [[ $# -gt 0 ]]; do
    case $1 in
        --subscription)
            SUBSCRIPTION_ID="$2"
            shift 2
            ;;
        --resource-group)
            RESOURCE_GROUP="$2"
            shift 2
            ;;
        --foundry-resource)
            FOUNDRY_RESOURCE="$2"
            shift 2
            ;;
        --project)
            PROJECT_NAME="$2"
            shift 2
            ;;
        --deployment)
            DEPLOYMENT_NAME="$2"
            shift 2
            ;;
        -h|--help)
            usage
            ;;
        *)
            echo "Unknown option: $1"
            usage
            ;;
    esac
done

# Validate required arguments
if [ -z "$SUBSCRIPTION_ID" ] || [ -z "$RESOURCE_GROUP" ] || [ -z "$FOUNDRY_RESOURCE" ] || \
   [ -z "$PROJECT_NAME" ] || [ -z "$DEPLOYMENT_NAME" ]; then
    echo "Error: Missing required arguments"
    usage
fi

# Convert subscription GUID to bytes (big-endian/string order) and encode as base64url
# Remove hyphens from GUID
GUID_HEX=$(echo "$SUBSCRIPTION_ID" | tr -d '-')

# Convert hex string to bytes and base64 encode
# Using xxd to convert hex to binary, then base64 encode
ENCODED_SUB=$(echo "$GUID_HEX" | xxd -r -p | base64 | tr '+' '-' | tr '/' '_' | tr -d '=')

# Build the encoded resource path
# Format: {encoded-sub-id},{resource-group},,{foundry-resource},{project-name}
# Note: Two commas between resource-group and foundry-resource
ENCODED_PATH="${ENCODED_SUB},${RESOURCE_GROUP},,${FOUNDRY_RESOURCE},${PROJECT_NAME}"

# Build the full URL
BASE_URL="https://ai.azure.com/nextgen/r/"
DEPLOYMENT_PATH="/build/models/deployments/${DEPLOYMENT_NAME}/details"

echo "${BASE_URL}${ENCODED_PATH}${DEPLOYMENT_PATH}"
project/
connections.md 3.1 KB
# Foundry Project Connections

Connections authenticate and link external resources to a Foundry project. Many agent tools (Azure AI Search, Bing Grounding, MCP) require a project connection before use.

## Managing Connections via MCP

Use the Foundry MCP server for all connection operations. The MCP tools handle authentication, validation, and project scoping automatically.

| Operation | MCP Tool | Description |
|-----------|----------|-------------|
| List all connections | `project_connection_list` | Lists project connections and can filter by category or target |
| Get connection details | `project_connection_get` | Retrieves a specific connection by `connectionName` |
| Create a connection | `project_connection_create` | Creates or replaces a project connection to an external resource |
| Update a connection | `project_connection_update` | Updates auth, category, target, or expiry on an existing connection |
| Delete a connection | `project_connection_delete` | Removes a connection from the project by name |
| List supported categories/auth types | `project_connection_list_metadata` | Lists valid connection categories and auth types before create/update |

> 💡 **Tip:** Use `project_connection_get` or `project_connection_list` to resolve the connection name and full connection resource ID before configuring agent tools that require `project_connection_id`.

## Create Connection via Portal

1. Open [Microsoft Foundry portal](https://ai.azure.com)
2. Navigate to **Operate** → **Admin** → select your project
3. Select **Add connection** → choose service type
4. Browse for resource, select auth method, click **Add connection**

## Connection ID Format

For REST and TypeScript samples, the full connection ID format is:

```
/subscriptions/{subId}/resourceGroups/{rg}/providers/Microsoft.CognitiveServices/accounts/{account}/projects/{project}/connections/{connectionName}
```

Python and C# SDKs resolve this automatically from the connection name.
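
For illustration, the ID can be assembled from its parts in the shell (every value below is a placeholder, not a real resource):

```shell
# Placeholder values — substitute your own subscription, names, and connection
SUB_ID="00000000-0000-0000-0000-000000000000"
RG="my-rg"
ACCOUNT="my-foundry"
PROJECT="my-project"
CONNECTION_NAME="my-search-conn"

# Assemble the full connection resource ID in the documented format
CONNECTION_ID="/subscriptions/${SUB_ID}/resourceGroups/${RG}/providers/Microsoft.CognitiveServices/accounts/${ACCOUNT}/projects/${PROJECT}/connections/${CONNECTION_NAME}"
echo "$CONNECTION_ID"
```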

## Common Connection Types

| Type | Resource | Used By |
|------|----------|---------|
| `azure_ai_search` | Azure AI Search | AI Search tool |
| `bing` | Grounding with Bing Search | Bing grounding tool |
| `bing_custom_search` | Grounding with Bing Custom Search | Bing Custom Search tool |
| `api_key` | Any API-key resource | MCP servers, custom tools |
| `azure_openai` | Azure OpenAI | Model access |
| `AzureStorageAccount` | Azure Blob Storage | Dataset upload via `evaluation_dataset_create` |

## RBAC for Connection Management

| Role | Scope | Permission |
|------|-------|------------|
| **Azure AI Project Manager** | Project | Create/manage project connections |
| **Contributor** or **Owner** | Subscription/RG | Create Bing/Search resources, get keys |

## Troubleshooting

| Error | Cause | Fix |
|-------|-------|-----|
| `Connection not found` | Name mismatch or wrong project | Use `project_connection_list` to find the correct `connectionName` |
| `Unauthorized` creating connection | Missing Azure AI Project Manager role | Assign role on the Foundry project |
| `Invalid connection ID format` | Using name instead of full resource ID | Use `project_connection_get` to resolve the full ID |
project/create/
create-foundry-project.md 5.5 KB
---
name: foundry-create-project
description: |
  Create a new Azure AI Foundry project using Azure Developer CLI (azd) to provision infrastructure for hosting AI agents and models.
  USE FOR: create Foundry project, new AI Foundry project, set up Foundry, azd init Foundry, provision Foundry infrastructure, onboard to Foundry, create Azure AI project, set up AI project.
  DO NOT USE FOR: deploying agents to existing projects (use agent/deploy), creating agent code (use agent/create), deploying AI models from catalog (use microsoft-foundry main skill), Azure Functions (use azure-functions).
allowed-tools: Read, Write, Bash, AskUserQuestion
---

# Create Azure AI Foundry Project

Create a new Azure AI Foundry project using azd. Provisions: Foundry account, project, Application Insights, managed identity, and RBAC permissions. Optionally enables hosted agents (capability host + Container Registry).

**Table of Contents:** [Prerequisites](#prerequisites) · [Workflow](#workflow) · [Best Practices](#best-practices) · [Troubleshooting](#troubleshooting) · [Related Skills](#related-skills) · [Resources](#resources)

## Prerequisites

Run checks in order. STOP on any failure and resolve before proceeding.

**1. Azure CLI** — `az version` → expects version output. If missing: https://aka.ms/installazurecli

**2. Azure login & subscription:**

```bash
az account show --query "{Name:name, SubscriptionId:id, State:state}" -o table
```

If not logged in, run `az login`. If no active subscription: https://azure.microsoft.com/free/ — STOP.

If multiple subscriptions, ask which to use, then `az account set --subscription "<id>"`.

**3. Role permissions:**

```bash
az role assignment list --assignee "$(az ad signed-in-user show --query id -o tsv)" --query "[?contains(roleDefinitionName, 'Owner') || contains(roleDefinitionName, 'Contributor') || contains(roleDefinitionName, 'Azure AI')].{Role:roleDefinitionName, Scope:scope}" -o table
```

Requires Owner, Contributor, or Azure AI Owner. If insufficient — STOP, request elevated access from admin.

**4. Azure Developer CLI** — `azd version`. If missing: https://aka.ms/azure-dev/install

## Workflow

### Step 1: Verify azd login

```bash
azd auth login --check-status
```

If not logged in, run `azd auth login` and complete browser auth.

### Step 2: Ask User for Project Details

Use AskUserQuestion for:

1. **Project name** — used as azd environment name and resource group (`rg-<name>`). Must contain only alphanumeric characters and hyphens. Examples: `my-ai-project`, `dev-agents`
2. **Azure location** (optional) — defaults to North Central US (required for hosted agents preview)
3. **Enable hosted agents?** (yes/no) — provisions a capability host and Container Registry for deploying hosted agents. Defaults to no.
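
The naming rule from item 1 can be checked up front, before running `azd init` (a sketch; `PROJECT_NAME` is an example value):

```shell
# Example project name supplied by the user
PROJECT_NAME="my-ai-project"

# Allow only alphanumerics and hyphens, starting with an alphanumeric
if [[ "$PROJECT_NAME" =~ ^[A-Za-z0-9][A-Za-z0-9-]*$ ]]; then
  NAME_OK=yes
else
  NAME_OK=no
fi
echo "Name valid: $NAME_OK"
```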

### Step 3: Create Directory and Initialize

```bash
mkdir "<project-name>" && cd "<project-name>"
azd init -t https://github.com/Azure-Samples/azd-ai-starter-basic -e <project-name> --no-prompt
```

- `-t` — Azure AI starter template (Foundry infrastructure)
- `-e` — environment name
- `--no-prompt` — non-interactive, use defaults
- **IMPORTANT:** `azd init` requires an empty directory

If user specified a non-default location:

```bash
azd config set defaults.location <location>
```

If user chose to enable hosted agents:

```bash
azd env set ENABLE_HOSTED_AGENTS true
```

This provisions a capability host (`capabilityHosts/agents`) on the Foundry account and auto-adds an Azure Container Registry for hosted agent deployments.

### Step 4: Provision Infrastructure

```bash
azd provision --no-prompt
```

Takes 5–10 minutes. Creates resource group, Foundry account/project, Application Insights, managed identity, and RBAC roles. If hosted agents enabled, also creates Container Registry and capability host.

### Step 5: Retrieve Project Details

```bash
azd env get-values
```

Capture `AZURE_AI_PROJECT_ID`, `AZURE_AI_PROJECT_ENDPOINT`, and `AZURE_RESOURCE_GROUP`. Direct user to verify at https://ai.azure.com.

### Step 6: Next Steps

- Deploy an agent → `agent/deploy` skill
- Browse models → `foundry_models_list` MCP tool
- Manage project → https://ai.azure.com

## Best Practices

- Use North Central US for hosted agents (preview requirement)
- Name must be alphanumeric + hyphens only — no spaces, underscores, or special characters
- Delete unused projects with `azd down` to avoid ongoing costs
- `azd down` deletes ALL resources — Foundry account, agents, models, Container Registry, and Application Insights data
- `azd provision` is safe to re-run on failure

## Troubleshooting

| Problem | Solution |
|---------|----------|
| `azd: command not found` | Install from https://aka.ms/azure-dev/install |
| `ERROR: Failed to authenticate` | Run `azd auth login`; verify subscription with `az account list` |
| `environment name '' is invalid` | Name must be alphanumeric + hyphens only |
| `ERROR: Insufficient permissions` | Request Contributor or Azure AI Owner role from admin |
| Region not supported for hosted agents | Use `azd config set defaults.location northcentralus` |
| Provisioning timeout | Check region availability, verify connectivity, retry `azd provision` |

## Related Skills

- **agent/deploy** โ€” Deploy agents to the created project
- **agent/create** โ€” Create a new agent for deployment

## Resources

- [Azure Developer CLI](https://aka.ms/azure-dev/install) · [AI Foundry Portal](https://ai.azure.com) · [Foundry Docs](https://learn.microsoft.com/azure/ai-foundry/) · [azd-ai-starter-basic template](https://github.com/Azure-Samples/azd-ai-starter-basic)
quota/
quota.md 8.3 KB
# Microsoft Foundry Quota Management

Quota and capacity management for Microsoft Foundry. Quotas are **subscription + region** level.

> โš ๏ธ **Important:** This is the **authoritative skill** for all Foundry quota operations. When a user asks about quota, capacity, TPM, PTU, quota errors, or deployment limits, **always invoke this skill** rather than using MCP tools (azure-quota, azure-documentation, azure-foundry) directly. This skill provides structured workflows and error handling that direct tool calls lack.

> **Important:** All quota operations are **control plane (management)** operations. Use **Azure CLI commands** as the primary method. MCP tools are optional convenience wrappers around the same control plane APIs.

## Quota Types

| Type | Description |
|------|-------------|
| **TPM** | Tokens Per Minute, pay-per-token, subject to rate limits |
| **PTU** | Provisioned Throughput Units, monthly commitment, no rate limits |
| **Region** | Max capacity per region, shared across subscription |
| **Slots** | 10-20 deployment slots per resource |

**When to use PTU:** Consistent high-volume production workloads where monthly commitment is cost-effective.

---

Use this sub-skill when the user needs to:

- **View quota usage** — check current TPM/PTU allocation and available capacity
- **Check quota limits** — show quota limits for a subscription, region, or model
- **Find optimal regions** — compare quota availability across regions for deployment
- **Plan deployments** — verify sufficient quota before deploying models
- **Request quota increases** — navigate quota increase process through Azure Portal
- **Troubleshoot deployment failures** — diagnose QuotaExceeded, InsufficientQuota, DeploymentLimitReached, 429 rate limit errors
- **Optimize allocation** — monitor and consolidate quota across deployments
- **Monitor quota across deployments** — track capacity by model and region
- **Explain quota concepts** — explain TPM, PTU, capacity units, regional quotas
- **Free up quota** — identify and delete unused deployments

**Key Points:**
1. Isolated by region (East US ≠ West US)
2. Regional capacity varies by model
3. Multi-region enables failover and load distribution
4. Quota requests specify target region

See [detailed guide](./references/workflows.md#regional-quota).

---

## Core Workflows

### 1. Check Regional Quota

```bash
subId=$(az account show --query id -o tsv)
az rest --method get \
  --url "https://management.azure.com/subscriptions/$subId/providers/Microsoft.CognitiveServices/locations/eastus/usages?api-version=2023-05-01" \
  --query "value[?contains(name.value,'OpenAI')].{Model:name.value, Used:currentValue, Limit:limit}" -o table
```

**Output interpretation:**
- **Used**: current TPM consumed (e.g., 10000 = 10K TPM)
- **Limit**: maximum TPM quota (e.g., 15000 = 15K TPM)
- **Available**: Limit minus Used (here 15000 - 10000 = 5K TPM)

Change region: `eastus`, `eastus2`, `westus`, `westus2`, `swedencentral`, `uksouth`.
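JMESPath in the Azure CLI has no arithmetic operators, so the Available column is easiest to compute by piping the JSON response through `jq` (assumes `jq` is installed; the sample file below stands in for the real API response):

```shell
# Sample of the usages API response shape (in practice: az rest ... -o json > usages.json)
cat > usages.json <<'EOF'
{"value":[{"name":{"value":"OpenAI.Standard.gpt-4o"},"currentValue":10000,"limit":15000}]}
EOF

# Available = limit - currentValue, computed per OpenAI entry with jq
available=$(jq -r '.value[] | select(.name.value | contains("OpenAI")) | .limit - .currentValue' usages.json)
echo "Available TPM: $available"   # Available TPM: 5000
```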

---

### 2. Find Best Region for Deployment

Check specific regions for available quota:

```bash
subId=$(az account show --query id -o tsv)
region="eastus"
az rest --method get \
  --url "https://management.azure.com/subscriptions/$subId/providers/Microsoft.CognitiveServices/locations/$region/usages?api-version=2023-05-01" \
  --query "value[?name.value=='OpenAI.Standard.gpt-4o'].{Model:name.value, Used:currentValue, Limit:limit}" -o table
# Available = Limit - Used (JMESPath cannot do arithmetic; compute it yourself or with jq)
```

See [workflows reference](./references/workflows.md#multi-region-check) for multi-region comparison.

---

### 3. Check Quota Before Deployment

Verify available quota for your target model:

```bash
subId=$(az account show --query id -o tsv)
region="eastus"
model="OpenAI.Standard.gpt-4o"

az rest --method get \
  --url "https://management.azure.com/subscriptions/$subId/providers/Microsoft.CognitiveServices/locations/$region/usages?api-version=2023-05-01" \
  --query "value[?name.value=='$model'].{Model:name.value, Used:currentValue, Limit:limit}" -o table
# Available = Limit - Used (JMESPath cannot subtract; compute it manually)
```

- **Available > 0**: Yes, you have quota
- **Available = 0**: Delete unused deployments or try different region

---

### 4. Monitor Quota by Model

Show quota allocation grouped by model:

```bash
subId=$(az account show --query id -o tsv)
region="eastus"
az rest --method get \
  --url "https://management.azure.com/subscriptions/$subId/providers/Microsoft.CognitiveServices/locations/$region/usages?api-version=2023-05-01" \
  --query "value[?contains(name.value,'OpenAI')].{Model:name.value, Used:currentValue, Limit:limit}" -o table
# Available = Limit - Used (JMESPath cannot subtract)
```

Shows aggregate usage across ALL deployments by model type.

**Optional:** List individual deployments:
```bash
az cognitiveservices account list --query "[?kind=='AIServices'].{Name:name,RG:resourceGroup}" -o table

az cognitiveservices account deployment list --name <resource> --resource-group <rg> \
  --query "[].{Name:name,Model:properties.model.name,Capacity:sku.capacity}" -o table
```

---

### 5. Delete Deployment (Free Quota)

```bash
az cognitiveservices account deployment delete --name <resource> --resource-group <rg> \
  --deployment-name <deployment>
```

Quota freed **immediately**. Re-run Workflow #1 to verify.

---

### 6. Request Quota Increase

**Azure Portal Process:**
1. Navigate to [Azure Portal - All Resources](https://portal.azure.com/#view/HubsExtension/BrowseAll) → Filter "AI Services" → Click the resource
2. Select **Quotas** in left navigation
3. Click **Request quota increase**
4. Fill form: Model, Current Limit, Requested Limit, Region, **Business Justification**
5. Wait for approval: **3-5 business days typically, up to 10 business days** ([source](https://learn.microsoft.com/en-us/azure/ai-foundry/openai/how-to/quota))

**Justification template:**
```
Production [workload type] using [model] in [region].
Expected traffic: [X requests/day] with [Y tokens/request].
Calculated required TPM: [Z TPM]. Current [N TPM] insufficient.
Request increase to [M TPM]. Deployment target: [date].
```

See [detailed quota request guide](./references/workflows.md#request-quota-increase) for complete steps.

---

## Quick Troubleshooting

| Error | Quick Fix | Detailed Guide |
|-------|-----------|----------------|
| `QuotaExceeded` | Delete unused deployments or request increase | [Error Resolution](./references/error-resolution.md#quotaexceeded) |
| `InsufficientQuota` | Reduce capacity or try different region | [Error Resolution](./references/error-resolution.md#insufficientquota) |
| `DeploymentLimitReached` | Delete unused deployments (10-20 slot limit) | [Error Resolution](./references/error-resolution.md#deploymentlimitreached) |
| `429 Rate Limit` | Increase TPM or migrate to PTU | [Error Resolution](./references/error-resolution.md#429-errors) |

---

## References

**Detailed Guides:**
- [Error Resolution Workflows](./references/error-resolution.md) - Detailed workflows for quota exhausted, 429 errors, insufficient quota, deployment limits
- [Troubleshooting Guide](./references/troubleshooting.md) - Quick error fixes and diagnostic commands
- [Quota Optimization Strategies](./references/optimization.md) - 5 strategies for freeing quota and reducing costs
- [Capacity Planning Guide](./references/capacity-planning.md) - TPM vs PTU comparison, model selection, workload calculations
- [Workflows Reference](./references/workflows.md) - Complete workflow steps and multi-region checks
- [PTU Guide](./references/ptu-guide.md) - Provisioned throughput capacity planning

**Official Microsoft Documentation:**
- [Azure OpenAI Service Pricing](https://azure.microsoft.com/en-us/pricing/details/cognitive-services/openai-service/) - Official pay-per-token rates
- [PTU Costs and Billing](https://learn.microsoft.com/en-us/azure/ai-foundry/openai/how-to/provisioned-throughput-onboarding) - PTU hourly rates
- [Azure OpenAI Models](https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/models) - Model capabilities and regions
- [Quota Management Guide](https://learn.microsoft.com/azure/ai-services/openai/how-to/quota) - Official quota procedures
- [Quotas and Limits](https://learn.microsoft.com/azure/ai-services/openai/quotas-limits) - Rate limits and quota details

**Calculators:**
- [Azure Pricing Calculator](https://azure.microsoft.com/pricing/calculator/) - Official pricing estimator
- Azure AI Foundry PTU calculator (Microsoft Foundry → Operate → Quota → Provisioned Throughput Unit tab) - PTU capacity sizing
quota/references/
capacity-planning.md 8.1 KB
# Capacity Planning Guide

Comprehensive guide for planning Azure AI Foundry capacity, including cost analysis, model selection, and workload calculations.

**Table of Contents:** [Cost Comparison: TPM vs PTU](#cost-comparison-tpm-vs-ptu) · [Production Workload Examples](#production-workload-examples) · [Model Selection and Deployment Type Guidance](#model-selection-and-deployment-type-guidance)

## Cost Comparison: TPM vs PTU

> **Official Pricing Sources:**
> - [Azure OpenAI Service Pricing](https://azure.microsoft.com/en-us/pricing/details/cognitive-services/openai-service/) - Official pay-per-token rates
> - [PTU Costs and Billing Guide](https://learn.microsoft.com/en-us/azure/ai-foundry/openai/how-to/provisioned-throughput-onboarding) - PTU hourly rates and capacity planning

**TPM (Standard) Pricing:**
- Pay-per-token for input/output
- No upfront commitment
- **Rates**: See [Azure OpenAI Pricing](https://azure.microsoft.com/en-us/pricing/details/cognitive-services/openai-service/)
  - GPT-4o: ~$0.0025-$0.01/1K tokens
  - GPT-4 Turbo: ~$0.01-$0.03/1K
  - GPT-3.5 Turbo: ~$0.0005-$0.0015/1K
- **Best for**: Variable workloads, unpredictable traffic

**PTU (Provisioned) Pricing:**
- Hourly billing: `$/PTU/hr × PTUs × 730 hrs/month`
- Monthly commitment with Reservations discounts
- **Rates**: See [PTU Billing Guide](https://learn.microsoft.com/en-us/azure/ai-foundry/openai/how-to/provisioned-throughput-onboarding)
- Use the PTU calculator to determine requirements (Microsoft Foundry → Operate → Quota → Provisioned Throughput Unit tab)
- **Best for**: High-volume (>1M tokens/day), predictable traffic, guaranteed throughput

**Cost Decision Framework** (Analytical Guidance):

```
Step 1: Calculate monthly TPM cost
  Monthly TPM cost = (Daily tokens × 30 days × $price per 1K tokens) / 1000

Step 2: Calculate monthly PTU cost
  Monthly PTU cost = Required PTUs × 730 hours/month × $PTU-hour rate
  (Get Required PTUs from the Azure AI Foundry portal: Microsoft Foundry → Operate → Quota → Provisioned Throughput Unit tab)

Step 3: Compare
  Use PTU when: Monthly PTU cost < (Monthly TPM cost × 0.7)
  (Use a 70% threshold to account for commitment risk)
```

**Example Calculation** (Analytical):

Scenario: 1M requests/day, average 1,000 tokens per request

- **Daily tokens**: 1,000,000 × 1,000 = 1B tokens/day
- **TPM Cost** (using GPT-4o at $0.005/1K avg): (1B × 30 × $0.005) / 1000 = ~$150,000/month
- **PTU Cost** (estimated 100 PTU at ~$5/PTU-hour): 100 PTU × 730 hours × $5 = ~$365,000/month
- **Decision**: Use TPM (significantly lower cost for this workload)
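The same comparison can be run as a quick bash sanity check (the per-token and per-PTU rates are the example's illustrative figures, not official pricing):

```shell
#!/usr/bin/env bash
# Compare estimated monthly TPM vs PTU cost using illustrative rates.
daily_tokens=1000000000   # 1B tokens/day
price_per_1k=0.005        # blended $/1K tokens (illustrative)
required_ptus=100         # from the Foundry PTU calculator
ptu_hour_rate=5           # $/PTU-hour (illustrative)

tpm_cost=$(awk -v d="$daily_tokens" -v p="$price_per_1k" 'BEGIN { printf "%d", d * 30 * p / 1000 }')
ptu_cost=$(awk -v n="$required_ptus" -v r="$ptu_hour_rate" 'BEGIN { printf "%d", n * 730 * r }')

echo "Monthly TPM cost: \$$tpm_cost"   # $150000
echo "Monthly PTU cost: \$$ptu_cost"   # $365000
# PTU wins only when it costs less than 70% of the TPM estimate
awk -v t="$tpm_cost" -v p="$ptu_cost" 'BEGIN { if (p < t * 0.7) print "Use PTU"; else print "Use TPM" }'
```

Swap in your own token volumes and the current rates from the pricing pages before acting on the result.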

> **Important**: Always use the official [Azure Pricing Calculator](https://azure.microsoft.com/pricing/calculator/) and the Azure AI Foundry portal PTU calculator (Microsoft Foundry → Operate → Quota → Provisioned Throughput Unit tab) for exact pricing by model, region, and workload. Prices vary by region and are subject to change.

---

## Production Workload Examples

To estimate quota requirements, start from real-world production scenarios. The table below shows capacity calculations for gpt-4, version 0613 (from the Azure AI Foundry portal calculator):

| Workload Type | Calls/Min | Prompt Tokens | Response Tokens | Cache Hit % | Total Tokens/Min | PTU Required | TPM Equivalent |
|---------------|-----------|---------------|-----------------|-------------|------------------|--------------|----------------|
| **RAG Chat** | 10 | 3,500 | 300 | 20% | 38,000 | 100 | 38K TPM |
| **Basic Chat** | 10 | 500 | 100 | 20% | 6,000 | 100 | 6K TPM |
| **Summarization** | 10 | 5,000 | 300 | 20% | 53,000 | 100 | 53K TPM |
| **Classification** | 10 | 3,800 | 10 | 20% | 38,100 | 100 | 38K TPM |

**How to Estimate Your Production Quota Requirements:**

To calculate your quota needs for production deployments, follow these steps:

1. **Determine your peak calls per minute**: Monitor or estimate maximum concurrent requests
2. **Measure token usage**: Average prompt size + response size
3. **Account for cache hits**: Prompt caching can reduce effective token count by 20-50%
4. **Calculate total tokens/min**: Calls/min × (Prompt tokens × (1 - Cache %) + Response tokens); the cache discount applies only to prompt tokens
5. **Choose deployment type**:
   - **TPM (Standard)**: Allocate 1.5-2× your calculated tokens/min for headroom
   - **PTU (Provisioned)**: Use the Azure AI Foundry portal PTU calculator for the exact PTU count (Microsoft Foundry → Operate → Quota → Provisioned Throughput Unit tab)

**Example Calculation (RAG Chat Production):**
- Peak: 10 calls/min
- Prompt: 3,500 tokens (context + question)
- Response: 300 tokens (answer)
- Cache: 20% hit rate (reduces prompt tokens by 20%)
- **Total TPM needed**: 10 × (3,500 × 0.8 + 300) = 31,000 TPM
- **With 50% headroom**: 46,500 TPM → round up to a **50K TPM deployment**
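The arithmetic above is easy to script for reuse (the values mirror the RAG Chat scenario; substitute your own measurements):

```shell
#!/usr/bin/env bash
# Estimate Standard (TPM) deployment size from workload measurements.
calls_per_min=10        # peak calls per minute
prompt_tokens=3500      # average prompt size
response_tokens=300     # average response size
cache_hit=0.20          # fraction of prompt tokens served from cache
headroom=1.5            # 1.5-2x headroom recommended for Standard

base_tpm=$(awk -v c="$calls_per_min" -v p="$prompt_tokens" -v r="$response_tokens" -v h="$cache_hit" \
  'BEGIN { printf "%d", c * (p * (1 - h) + r) }')
sized_tpm=$(awk -v t="$base_tpm" -v m="$headroom" 'BEGIN { printf "%d", t * m }')

echo "Base TPM: $base_tpm"        # 31000
echo "With headroom: $sized_tpm"  # 46500 -> provision a 50K TPM deployment
```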

**PTU Recommendation:**
For the combined workload (40 calls/min, 135K tokens/min total), use **200 PTU** (from calculator above).

---

## Model Selection and Deployment Type Guidance

> **Official Documentation:**
> - [Choose the Right AI Model for Your Workload](https://learn.microsoft.com/en-us/azure/architecture/ai-ml/guide/choose-ai-model) - Microsoft Architecture Center
> - [Azure OpenAI Models](https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/models) - Model capabilities, regions, and quotas
> - [Understanding Deployment Types](https://learn.microsoft.com/en-us/azure/ai-foundry/foundry-models/concepts/deployment-types) - Standard vs Provisioned guidance

**Model Characteristics** (from [official Azure OpenAI documentation](https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/models)):

| Model | Key Characteristics | Best For |
|-------|---------------------|----------|
| **GPT-4o** | Matches GPT-4 Turbo performance in English text/coding, superior in non-English and vision tasks. Cheaper and faster than GPT-4 Turbo. | Multimodal tasks, cost-effective general purpose, high-volume production workloads |
| **GPT-4 Turbo** | Superior reasoning capabilities, larger context window (128K tokens) | Complex reasoning tasks, long-context analysis |
| **GPT-3.5 Turbo** | Most cost-effective, optimized for chat and completions, fast response time | Simple tasks, customer service, high-volume low-cost scenarios |
| **GPT-4o mini** | Fastest response time, low latency | Latency-sensitive applications requiring immediate responses |
| **text-embedding-3-large** | Purpose-built for vector embeddings | RAG applications, semantic search, document similarity |

**Deployment Type Selection** (from [official deployment types guide](https://learn.microsoft.com/en-us/azure/ai-foundry/foundry-models/concepts/deployment-types)):

| Traffic Pattern | Recommended Deployment Type | Reason |
|-----------------|---------------------------|---------|
| **Variable, bursty traffic** | Standard or Global Standard (pay-per-token) | No commitment, pay only for usage |
| **Consistent high volume** | Provisioned types (PTU) | Reserved capacity, predictable costs |
| **Large batch jobs (non-time-sensitive)** | Global Batch or DataZone Batch | 50% cost savings vs Standard |
| **Low latency variance required** | Provisioned types | Guaranteed throughput, no rate limits |
| **No regional restrictions** | Global Standard or Global Provisioned | Access to best available capacity |

**Capacity Planning Approach** (from [PTU onboarding guide](https://learn.microsoft.com/en-us/azure/ai-foundry/openai/how-to/provisioned-throughput-onboarding)):

To calculate and estimate your capacity requirements:

1. **Calculate your TPM requirements**: Determine required tokens per minute based on your expected workload
2. **Use the built-in capacity planner**: Available in the Azure AI Foundry portal (Microsoft Foundry → Operate → Quota → Provisioned Throughput Unit tab)
3. **Input your metrics**: Enter input TPM and output TPM based on your workload characteristics
4. **Get PTU recommendation**: The calculator provides PTU allocation recommendation
5. **Compare costs**: Evaluate Standard (TPM) vs Provisioned (PTU) using the official pricing calculator

> **Note**: Microsoft does not publish specific "X requests/day = Y TPM" recommendations as capacity requirements vary significantly based on prompt size, response length, cache hit rates, and model choice. Use the built-in capacity planner with your actual workload characteristics.
error-resolution.md 4.9 KB
# Error Resolution Workflows

**Table of Contents:** [Workflow 7: Quota Exhausted Recovery](#workflow-7-quota-exhausted-recovery) · [Workflow 8: Resolve 429 Rate Limit Errors](#workflow-8-resolve-429-rate-limit-errors) · [Workflow 9: Resolve DeploymentLimitReached](#workflow-9-resolve-deploymentlimitreached) · [Workflow 10: Resolve InsufficientQuota](#workflow-10-resolve-insufficientquota) · [Workflow 11: Resolve QuotaExceeded](#workflow-11-resolve-quotaexceeded)

## Workflow 7: Quota Exhausted Recovery

**A. Deploy to Different Region**
```bash
subId=$(az account show --query id -o tsv)
for region in eastus westus eastus2 westus2 swedencentral uksouth; do
  az rest --method get --url "https://management.azure.com/subscriptions/$subId/providers/Microsoft.CognitiveServices/locations/$region/usages?api-version=2023-05-01" \
    --query "value[?name.value=='OpenAI.Standard.gpt-4o'].{Model:name.value, Used:currentValue, Limit:limit}" -o table &  # Available = Limit - Used
done; wait
```

**B. Delete Unused Deployments**
```bash
az cognitiveservices account deployment delete --name <resource> --resource-group <rg> --deployment-name <deployment>
```

**C. Request Quota Increase (3-5 days)**

**D. Migrate to PTU** - See capacity-planning.md

---

## Workflow 8: Resolve 429 Rate Limit Errors

**Identify Deployment:**
```bash
az cognitiveservices account deployment list --name <resource> --resource-group <rg> \
  --query "[].{Name:name,Model:properties.model.name,Capacity:sku.capacity}" -o table
# sku.capacity is in units of 1K TPM (e.g., 100 = 100K TPM); JMESPath cannot multiply
```

**Solutions:**

**A. Increase Capacity**
```bash
az cognitiveservices account deployment update --name <resource> --resource-group <rg> --deployment-name <deployment> --sku-capacity 100
```

**B. Add Retry Logic** - Exponential backoff in code

**C. Load Balance**
```bash
az cognitiveservices account deployment create --name <resource> --resource-group <rg> --deployment-name gpt-4o-2 \
  --model-name gpt-4o --model-version "2024-05-13" --model-format OpenAI --sku-name Standard --sku-capacity 100
```

**D. Migrate to PTU** - No rate limits

---

## Workflow 9: Resolve DeploymentLimitReached

**Root Cause:** 10-20 slots per resource.

**Check Count:**
```bash
deployment_count=$(az cognitiveservices account deployment list --name <resource> --resource-group <rg> --query "length(@)")
echo "Deployments: $deployment_count / ~20 slots"
```

**Find Test Deployments:**
```bash
az cognitiveservices account deployment list --name <resource> --resource-group <rg> \
  --query "[?contains(name,'test') || contains(name,'demo')].{Name:name}" -o table
```

**Delete:**
```bash
az cognitiveservices account deployment delete --name <resource> --resource-group <rg> --deployment-name <deployment>
```

**Or Create New Resource (fresh 10-20 slots):**
```bash
az cognitiveservices account create --name "my-foundry-2" --resource-group <rg> --location eastus --kind AIServices --sku S0 --yes
```

---

## Workflow 10: Resolve InsufficientQuota

**Root Cause:** Requested capacity exceeds available quota.

**Check Quota:**
```bash
subId=$(az account show --query id -o tsv)
az rest --method get --url "https://management.azure.com/subscriptions/$subId/providers/Microsoft.CognitiveServices/locations/eastus/usages?api-version=2023-05-01" \
  --query "value[?contains(name.value,'OpenAI')].{Model:name.value, Used:currentValue, Limit:limit}" -o table
# Available = Limit - Used (JMESPath cannot subtract)
```

**Solutions:**

**A. Reduce Capacity**
```bash
az cognitiveservices account deployment create --name <resource> --resource-group <rg> --deployment-name gpt-4o \
  --model-name gpt-4o --model-version "2024-05-13" --model-format OpenAI --sku-name Standard --sku-capacity 20
```

**B. Delete Unused Deployments**
```bash
az cognitiveservices account deployment delete --name <resource> --resource-group <rg> --deployment-name <unused>
```

**C. Different Region** - Check quota with multi-region script (Workflow 7)

**D. Request Increase (3-5 days)**

---

## Workflow 11: Resolve QuotaExceeded

**Root Cause:** Deployment exceeds regional quota.

**Check Quota:**
```bash
subId=$(az account show --query id -o tsv)
az rest --method get --url "https://management.azure.com/subscriptions/$subId/providers/Microsoft.CognitiveServices/locations/eastus/usages?api-version=2023-05-01" \
  --query "value[?contains(name.value,'OpenAI')]" -o table
```

**Multi-Region Check:** (Use Workflow 7 script)

**Solutions:**

**A. Delete Unused Deployments**
```bash
az cognitiveservices account deployment delete --name <resource> --resource-group <rg> --deployment-name <unused>
```

**B. Different Region**
```bash
az cognitiveservices account deployment create --name <resource> --resource-group <rg> --deployment-name gpt-4o \
  --model-name gpt-4o --model-version "2024-05-13" --model-format OpenAI --sku-name Standard --sku-capacity 50
```

**C. Request Increase (3-5 days)**

**D. Reduce Capacity**

**Decision:** Available < 10% → Different region; 10-50% → Delete/reduce; > 50% → Delete one deployment
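The decision rule can be expressed as a small bash helper (the `limit`/`used` values below are placeholders; use real numbers from the usages API):

```shell
#!/usr/bin/env bash
# Map available-quota percentage to the recommended action.
limit=450   # quota limit from the usages API (placeholder)
used=430    # current usage (placeholder)

available_pct=$(awk -v l="$limit" -v u="$used" 'BEGIN { printf "%d", (l - u) * 100 / l }')

if   [ "$available_pct" -lt 10 ]; then echo "Deploy to a different region"
elif [ "$available_pct" -le 50 ]; then echo "Delete unused deployments or reduce capacity"
else echo "Delete one deployment to free the needed quota"
fi
```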

---

optimization.md 7.6 KB
# Quota Optimization Strategies

Comprehensive strategies for optimizing Azure AI Foundry quota allocation and reducing costs.

**Table of Contents:** [1. Identify and Delete Unused Deployments](#1-identify-and-delete-unused-deployments) · [2. Right-Size Over-Provisioned Deployments](#2-right-size-over-provisioned-deployments) · [3. Consolidate Multiple Small Deployments](#3-consolidate-multiple-small-deployments) · [4. Cost Optimization Strategies](#4-cost-optimization-strategies) · [5. Regional Quota Rebalancing](#5-regional-quota-rebalancing)

## 1. Identify and Delete Unused Deployments

**Step 1: Discovery with Quota Context**

Get quota limits FIRST to understand how close you are to capacity:

```bash
# Check current quota usage vs limits (run this FIRST)
subId=$(az account show --query id -o tsv)
region="eastus"  # Change to your region
az rest --method get \
  --url "https://management.azure.com/subscriptions/$subId/providers/Microsoft.CognitiveServices/locations/$region/usages?api-version=2023-05-01" \
  --query "value[?contains(name.value,'OpenAI')].{Model:name.value, Used:currentValue, Limit:limit}" -o table
# Available = Limit - Used (JMESPath cannot subtract)
```

**Step 2: Parallel Deployment Enumeration**

List all deployments across resources efficiently:

```bash
# Get all Foundry resources
resources=$(az cognitiveservices account list --query "[?kind=='AIServices'].{name:name,rg:resourceGroup}" -o json)

# Parallel deployment enumeration (faster than sequential)
echo "$resources" | jq -r '.[] | "\(.name) \(.rg)"' | while read name rg; do
  echo "=== $name ($rg) ==="
  az cognitiveservices account deployment list --name "$name" --resource-group "$rg" \
    --query "[].{Deployment:name,Model:properties.model.name,Capacity:sku.capacity,Created:systemData.createdAt}" -o table &
done
wait  # Wait for all background jobs to complete
```

**Step 3: Identify Stale Deployments**

Criteria for deletion candidates:

- **Test/temporary naming**: Contains "test", "demo", "temp", "dev" in deployment name
- **Old timestamps**: Created >90 days ago with timestamp-based naming (e.g., "gpt4-20231015")
- **High capacity consumers**: Deployments with >100K TPM capacity that haven't been referenced in recent logs
- **Duplicate models**: Multiple deployments of same model/version in same region

**Example pattern matching for stale deployments:**
```bash
# Find deployments with test/temp naming
az cognitiveservices account deployment list --name <resource> --resource-group <rg> \
  --query "[?contains(name,'test') || contains(name,'demo') || contains(name,'temp')].{Name:name,Capacity:sku.capacity}" -o table
```

**Step 4: Delete and Verify Quota Recovery**

```bash
# Delete unused deployment (quota freed IMMEDIATELY)
az cognitiveservices account deployment delete --name <resource> --resource-group <rg> --deployment-name <deployment>

# Verify quota freed (re-run Step 1 quota check)
# You should see "Used" decrease by the deployment's capacity
```

**Cost Impact Analysis:**

| Deployment Type | Capacity (TPM) | Quota Freed | Cost Impact (TPM) | Cost Impact (PTU) |
|-----------------|----------------|-------------|-------------------|-------------------|
| Test deployment | 10K TPM | 10K TPM | $0 (pay-per-use) | N/A |
| Unused production | 100K TPM | 100K TPM | $0 (pay-per-use) | N/A |
| Abandoned PTU deployment | 100 PTU | ~40K TPM equivalent | $0 TPM | **$3,650/month saved** (100 PTU × 730h × $0.05/h) |
| High-capacity test | 450K TPM | 450K TPM | $0 (pay-per-use) | N/A |

**Key Insight:** For TPM (Standard) deployments, deletion frees quota but has no direct cost impact (you pay per token used). For PTU (Provisioned) deployments, deletion **immediately stops hourly charges** and can save thousands per month.
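The PTU saving in the table reproduces in a couple of lines (the $0.05/PTU-hour rate is the table's illustrative figure, not official pricing):

```shell
# Monthly cost of an idle PTU deployment = PTUs x 730 hours x hourly rate
ptus=100
rate=0.05   # $/PTU-hour (illustrative)
monthly=$(awk -v n="$ptus" -v r="$rate" 'BEGIN { printf "%d", n * 730 * r }')
echo "Deleting frees \$$monthly/month"   # Deleting frees $3650/month
```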

---

## 2. Right-Size Over-Provisioned Deployments

**Identify over-provisioned deployments:**
- Check Azure Monitor metrics for actual token usage
- Compare allocated TPM vs. peak usage
- Look for deployments with <50% utilization

**Right-sizing example:**
```bash
# Update deployment to lower capacity
az cognitiveservices account deployment update --name <resource> --resource-group <rg> \
  --deployment-name <deployment> --sku-capacity 30  # Reduce from 50K to 30K TPM
```

**Cost Optimization:**
- **TPM (Standard)**: Reduces regional quota consumption (no direct cost savings, pay-per-token)
- **PTU (Provisioned)**: Direct cost reduction (40% capacity reduction = 40% cost reduction)

---

## 3. Consolidate Multiple Small Deployments

**Pattern:** Multiple 10K TPM deployments → One 30-50K TPM deployment

**Benefits:**
- Fewer deployment slots consumed
- Simpler management
- Same total capacity, better utilization

**Example:**
- **Before**: 3 deployments @ 10K TPM each = 30K TPM total, 3 slots used
- **After**: 1 deployment @ 30K TPM = 30K TPM total, 1 slot used
- **Savings**: 2 deployment slots freed for other models

---

## 4. Cost Optimization Strategies

> **Official Documentation**: [Plan to manage costs for Azure OpenAI](https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/manage-costs) and [Fine-tuning cost management](https://learn.microsoft.com/en-us/azure/ai-foundry/openai/how-to/fine-tuning-cost-management)

**A. Use Fine-Tuned Smaller Models** (from [Microsoft Transparency Note](https://learn.microsoft.com/en-us/azure/ai-foundry/responsible-ai/openai/transparency-note)):

You can reduce costs or latency by swapping a fine-tuned version of a smaller/faster model (e.g., fine-tuned GPT-3.5-Turbo) for a more general-purpose model (e.g., GPT-4).

```bash
# Deploy fine-tuned GPT-3.5 Turbo as cost-effective alternative to GPT-4
az cognitiveservices account deployment create --name <resource> --resource-group <rg> \
  --deployment-name gpt-35-tuned --model-name <your-fine-tuned-model> \
  --model-format OpenAI --sku-name Standard --sku-capacity 10
```

**B. Remove Unused Fine-Tuned Deployments** (from [Fine-tuning cost management](https://learn.microsoft.com/en-us/azure/ai-foundry/openai/how-to/fine-tuning-cost-management)):

Fine-tuned model deployments incur **hourly hosting costs** even when not in use. Remove unused deployments promptly to control costs.

- Inactive deployments unused for **15 consecutive days** are automatically deleted
- Proactively delete unused fine-tuned deployments to avoid hourly charges

```bash
# Delete unused fine-tuned deployment
az cognitiveservices account deployment delete --name <resource> --resource-group <rg> \
  --deployment-name <unused-fine-tuned-deployment>
```

**C. Batch Multiple Requests** (from [Cost optimization Q&A](https://learn.microsoft.com/en-us/answers/questions/1689253/how-to-optimize-costs-per-request-azure-openai-gpt)):

Batch multiple requests together to reduce the total number of API calls and lower overall costs.

**D. Use Commitment Tiers for Predictable Costs** (from [Managing costs guide](https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/manage-costs)):

- **Pay-as-you-go**: Bills according to usage (variable costs)
- **Commitment tiers**: Commit to using service features for a fixed fee (predictable costs, potential savings for consistent usage)

---

## 5. Regional Quota Rebalancing

If you have quota spread across multiple regions but only use some:

```bash
# Check quota across regions
for region in eastus westus uksouth; do
  echo "=== $region ==="
  subId=$(az account show --query id -o tsv)
  az rest --method get \
    --url "https://management.azure.com/subscriptions/$subId/providers/Microsoft.CognitiveServices/locations/$region/usages?api-version=2023-05-01" \
    --query "value[?contains(name.value,'OpenAI')].{Model:name.value, Used:currentValue, Limit:limit}" -o table
done
```

**Optimization:** Concentrate deployments in fewer regions to maximize quota utilization per region.
ptu-guide.md 6.2 KB
# Provisioned Throughput Units (PTU) Guide

**Table of Contents:** [Understanding PTU vs Standard TPM](#understanding-ptu-vs-standard-tpm) · [When to Use PTU](#when-to-use-ptu) · [PTU Capacity Planning](#ptu-capacity-planning) · [Deploy Model with PTU](#deploy-model-with-ptu) · [Request PTU Quota Increase](#request-ptu-quota-increase) · [Understanding Region and Deployment Quotas](#understanding-region-and-deployment-quotas) · [External Resources](#external-resources)

## Understanding PTU vs Standard TPM

Microsoft Foundry offers two quota types:

### Standard TPM (Tokens Per Minute)
- Pay-as-you-go model, charged per token
- Each deployment consumes capacity units (e.g., 10K TPM, 50K TPM)
- Total regional quota shared across all deployments
- Subject to rate limiting during high demand (429 errors possible)
- Best for: Variable workloads, development, testing, bursty traffic

### Provisioned Throughput Units (PTU)
- Monthly commitment for guaranteed throughput
- No rate limiting, consistent latency
- Measured in PTU units (not TPM)
- Best for: Predictable, high-volume production workloads
- More cost-effective when consistent token usage justifies monthly commitment

## When to Use PTU

| Factor | Standard (TPM) | Provisioned (PTU) |
|--------|----------------|-------------------|
| **Best For** | Variable workloads, development, testing | Predictable production workloads |
| **Pricing** | Pay-per-token | Monthly commitment (hourly rate per PTU) |
| **Rate Limits** | Yes (429 errors possible) | No (guaranteed throughput) |
| **Latency** | Variable | Consistent |
| **Cost Decision** | Lower upfront commitment | More economical for consistent, high-volume usage |
| **Flexibility** | Scale up/down instantly | Requires planning and commitment |
| **Use Case** | Prototyping, bursty traffic | Production apps, high-volume APIs |

**Use PTU when:**
- Consistent, predictable token usage where monthly commitment is cost-effective
- Need guaranteed throughput (no 429 rate limit errors)
- Require consistent latency with performance SLA
- High-volume production workloads with stable traffic patterns

**Decision Guidance:**
Compare your current pay-as-you-go costs with PTU pricing. PTU may be more economical when consistent usage justifies the monthly commitment.

## PTU Capacity Planning

### Official Calculation Methods

> **Agent Instruction:** Only present official Azure capacity calculator methods below. Do NOT generate or suggest estimated PTU formulas, TPM-per-PTU conversion tables, or reference deprecated calculators (oai.azure.com/portal/calculator).

Calculate PTU requirements using these official methods:

**Method 1: Microsoft Foundry Portal**
1. Navigate to Microsoft Foundry portal
2. Go to **Operate** → **Quota**
3. Select **Provisioned throughput unit** tab
4. Click **Capacity calculator** button
5. Enter workload parameters (model, tokens/call, RPM, latency target)
6. Calculator returns exact PTU count needed

**Method 2: Using Azure REST API**
```bash
# Calculate required PTU capacity
curl -X POST "https://management.azure.com/subscriptions/<subscription-id>/providers/Microsoft.CognitiveServices/calculateModelCapacity?api-version=2024-10-01" \
  -H "Authorization: Bearer <access-token>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": {
      "format": "OpenAI",
      "name": "gpt-4o",
      "version": "2024-05-13"
    },
    "workload": {
      "requestPerMin": 100,
      "tokensPerMin": 50000,
      "peakRequestsPerMin": 150
    }
  }'
```

## Deploy Model with PTU

### Step 1: Calculate PTU Requirements

Use the official capacity calculator methods above to determine required PTU capacity.

### Step 2: Deploy with PTU

```bash
# Deploy model with calculated PTU capacity
az cognitiveservices account deployment create \
  --name <resource-name> \
  --resource-group <rg> \
  --deployment-name gpt-4o-ptu-deployment \
  --model-name gpt-4o \
  --model-version "2024-05-13" \
  --model-format OpenAI \
  --sku-name ProvisionedManaged \
  --sku-capacity 100

# Check PTU deployment status
az cognitiveservices account deployment show \
  --name <resource-name> \
  --resource-group <rg> \
  --deployment-name gpt-4o-ptu-deployment
```

**Key Differences from Standard TPM:**
- SKU name: `ProvisionedManaged` (not `Standard`)
- Capacity: Measured in PTU units (not K TPM)
- Billing: Monthly commitment regardless of usage
- No rate limiting (guaranteed throughput)

## Request PTU Quota Increase

PTU quota is separate from TPM quota and requires specific justification:

1. Navigate to Azure Portal → Foundry resource → **Quotas**
2. Select **Provisioned throughput unit** tab
3. Identify model needing PTU increase (e.g., "GPT-4o PTU")
4. Click **Request quota increase**
5. Fill form:
   - Model name
   - Requested PTU quota
   - Include capacity calculator results in business justification
   - Explain workload characteristics (volume, latency requirements)
6. Submit and monitor status

**Processing Time:** Typically 3-5 business days (longer than standard quota requests).

**Note:** PTU quota requests typically require stronger business justification because of the monthly commitment.

**Alternative:** Deploy to different region with available PTU quota

## Understanding Region and Deployment Quotas

### Region Quota
- Maximum PTU capacity available in an Azure region
- Varies by model type (GPT-4, GPT-4o, etc.)
- Shared across subscription resources in same region
- Separate from TPM quota (you have both TPM and PTU quotas)

### Deployment Slots
- Number of concurrent model deployments allowed
- Typically 10-20 slots per resource
- Each PTU deployment uses one slot (same as TPM deployments)
- Deployment count limit is independent of capacity
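
The slot arithmetic is simple enough to pre-check in a script. A minimal sketch, with the limit and usage values hardcoded for illustration (confirm your resource's actual slot limit; fetch the live count with the commented `az` command):

```shell
# Sketch: check remaining deployment slots before creating a new deployment.
# slot_limit is a placeholder; confirm the actual limit for your resource tier.
slot_limit=20

# In practice, fetch the live count:
#   used=$(az cognitiveservices account deployment list \
#     --name <resource-name> --resource-group <rg> --query 'length([])' -o tsv)
used=9   # example value for illustration

free=$((slot_limit - used))
if [ "$free" -gt 0 ]; then
  echo "OK: $free slot(s) free"
else
  echo "DeploymentLimitReached likely: delete an unused deployment first"
fi
```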

## External Resources

- [Understanding PTU Costs](https://learn.microsoft.com/azure/ai-foundry/openai/how-to/provisioned-throughput-onboarding)
- [What Is Provisioned Throughput](https://learn.microsoft.com/azure/ai-foundry/openai/concepts/provisioned-throughput)
- [Calculate Model Capacity API](https://learn.microsoft.com/rest/api/aiservices/accountmanagement/calculate-model-capacity/calculate-model-capacity?view=rest-aiservices-accountmanagement-2024-10-01&tabs=HTTP)
- [PTU Overview](https://learn.microsoft.com/azure/ai-services/openai/concepts/provisioned-throughput)
troubleshooting.md 7.3 KB
# Troubleshooting Quota Errors

**Table of Contents:** [Common Quota Errors](#common-quota-errors) ยท [Detailed Error Resolution](#detailed-error-resolution) ยท [Request Quota Increase Process](#request-quota-increase-process) ยท [Diagnostic Commands](#diagnostic-commands) ยท [External Resources](#external-resources)

## Common Quota Errors

| Error | Cause | Quick Fix |
|-------|-------|-----------|
| `QuotaExceeded` | Regional quota consumed (TPM or PTU) | Delete unused deployments or request increase |
| `InsufficientQuota` | Not enough available for requested capacity | Reduce deployment capacity or free quota |
| `DeploymentLimitReached` | Too many deployment slots used | Delete unused deployments to free slots |
| `429 Rate Limit` | TPM capacity too low for traffic (Standard only) | Increase TPM capacity or migrate to PTU |
| `PTU capacity unavailable` | No PTU quota in region | Request PTU quota or try different region |
| `SKU not supported` | PTU not available for model/region | Check model availability or use Standard TPM |

## Detailed Error Resolution

### QuotaExceeded Error

All available TPM or PTU quota consumed in the region.

**Resolution:**

1. **Check current quota usage:**
   ```bash
   subId=$(az account show --query id -o tsv)
   region="eastus"
   az rest --method get \
     --url "https://management.azure.com/subscriptions/$subId/providers/Microsoft.CognitiveServices/locations/$region/usages?api-version=2023-05-01" \
     --query "value[?contains(name.value,'OpenAI')].{Model:name.value, Used:currentValue, Limit:limit}" -o table
   ```

2. **Choose resolution:**
   - **Option A**: Delete unused deployments to free quota
   - **Option B**: Reduce requested deployment capacity
   - **Option C**: Deploy to different region with available quota
   - **Option D**: Request quota increase through Azure Portal

### InsufficientQuota Error

Available quota less than requested capacity.

**Resolution:**

1. **Check available quota:**
   ```bash
   # Calculate available: limit - currentValue
   subId=$(az account show --query id -o tsv)
   region="eastus"
   az rest --method get \
     --url "https://management.azure.com/subscriptions/$subId/providers/Microsoft.CognitiveServices/locations/$region/usages?api-version=2023-05-01" \
     --query "value[?name.value=='OpenAI.Standard.gpt-4o'].{Model:name.value, Used:currentValue, Limit:limit, Available:(limit-currentValue)}" -o table
   ```

2. **Options:**
   - Reduce deployment capacity to fit available quota
   - Delete existing deployments to free capacity
   - Try different region with more available quota
   - Request quota increase

### DeploymentLimitReached Error

Resource reached maximum deployment slot limit (10-20 slots).

**Resolution:**

1. **List existing deployments:**
   ```bash
   az cognitiveservices account deployment list \
     --name <resource-name> \
     --resource-group <rg> \
     --query '[].{Name:name, Model:properties.model.name, Capacity:sku.capacity}' \
     --output table
   ```

2. **Delete unused deployments:**
   ```bash
   az cognitiveservices account deployment delete \
     --name <resource-name> \
     --resource-group <rg> \
     --deployment-name <unused-deployment-name>
   ```

3. **Verify slot freed:**
   ```bash
   az cognitiveservices account deployment list \
     --name <resource-name> \
     --resource-group <rg> \
     --query 'length([])'
   ```

### 429 Rate Limit Errors

TPM capacity insufficient for traffic volume (Standard TPM only).

**Resolution:**

1. **Check deployment capacity:**
   ```bash
   az cognitiveservices account deployment show \
     --name <resource-name> \
     --resource-group <rg> \
     --deployment-name <deployment-name> \
     --query '{Name:name, Model:properties.model.name, Capacity:sku.capacity, SKU:sku.name}'
   ```

2. **Options:**
   - **Option A**: Increase TPM capacity on existing deployment
     ```bash
     az cognitiveservices account deployment update \
       --name <resource-name> \
       --resource-group <rg> \
       --deployment-name <deployment-name> \
       --sku-capacity <higher-capacity>
     ```
   - **Option B**: Migrate to PTU for guaranteed throughput (no rate limits)
   - **Option C**: Implement retry logic with exponential backoff in application
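
Option C can be sketched as a small shell wrapper; the wrapped command here is a placeholder for your actual request (curl, SDK call, etc.):

```shell
# Sketch: retry a command with exponential backoff (1s, 2s, 4s, ...).
retry_with_backoff() {
  local max_attempts=5 delay=1 attempt=1
  until "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after $max_attempts attempts" >&2
      return 1
    fi
    echo "attempt $attempt failed; retrying in ${delay}s" >&2
    sleep "$delay"
    delay=$((delay * 2))
    attempt=$((attempt + 1))
  done
}

# Example usage with a hypothetical request command:
# retry_with_backoff curl -fsS "$ENDPOINT/chat/completions" ...
```

In production, also honor the `Retry-After` header where the service returns one, rather than relying on a fixed schedule.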

### PTU Capacity Unavailable Error

No PTU quota allocated in region, or PTU not available for model/region.

**Resolution:**

1. **Check PTU quota:**
   ```bash
   subId=$(az account show --query id -o tsv)
   region="eastus"
   az rest --method get \
     --url "https://management.azure.com/subscriptions/$subId/providers/Microsoft.CognitiveServices/locations/$region/usages?api-version=2023-05-01" \
     --query "value[?contains(name.value,'ProvisionedManaged')].{Model:name.value, Used:currentValue, Limit:limit}" -o table
   ```

2. **Options:**
   - Request PTU quota increase through Azure Portal (include capacity calculator results)
   - Try different region where PTU is available
   - Use Standard TPM instead

### SKU Not Supported Error

PTU not available for specific model or region combination.

**Resolution:**

1. **Check model availability:**
   - Review [PTU model availability by region](https://learn.microsoft.com/azure/ai-services/openai/concepts/models#provisioned-deployment-model-availability)

2. **Options:**
   - Deploy with Standard TPM SKU instead
   - Choose different region where PTU is supported
   - Use alternative model that supports PTU in your region

## Request Quota Increase Process

### For Standard TPM Quota

1. Navigate to Azure Portal → Your Foundry resource → **Quotas**
2. Identify model needing increase (e.g., "GPT-4o Standard")
3. Click **Request quota increase**
4. Fill form:
   - Model name
   - Requested quota (in TPM)
   - Business justification (required)
5. Submit and monitor status

**Processing Time:** Typically 1-2 business days

### For PTU Quota

1. Navigate to Azure Portal → Your Foundry resource → **Quotas**
2. Select **Provisioned throughput unit** tab
3. Identify model needing PTU increase
4. Click **Request quota increase**
5. Fill form:
   - Model name
   - Requested PTU quota
   - Include capacity calculator results
   - Detailed business justification (workload characteristics)
6. Submit and monitor status

**Processing Time:** Typically 3-5 business days (requires stronger justification)

## Diagnostic Commands

```bash
# Check deployment status
az cognitiveservices account deployment show \
  --name <resource-name> \
  --resource-group <rg> \
  --deployment-name <deployment-name>

# Verify available quota
subId=$(az account show --query id -o tsv)
az rest --method get \
  --url "https://management.azure.com/subscriptions/$subId/providers/Microsoft.CognitiveServices/locations/eastus/usages?api-version=2023-05-01" \
  --query "value[?contains(name.value,'OpenAI')].{Model:name.value, Used:currentValue, Limit:limit, Available:(limit-currentValue)}" \
  --output table

# List all deployments
az cognitiveservices account deployment list \
  --name <resource-name> \
  --resource-group <rg> \
  --query '[].{Name:name, Model:properties.model.name, Capacity:sku.capacity, SKU:sku.name}' \
  --output table
```

## External Resources

- [Quota Management Documentation](https://learn.microsoft.com/azure/ai-services/openai/how-to/quota)
- [Rate Limits Documentation](https://learn.microsoft.com/azure/ai-services/openai/quotas-limits)
- [Troubleshooting Guide](https://learn.microsoft.com/azure/ai-services/openai/troubleshooting)
workflows.md 6.9 KB
# Detailed Workflows: Quota Management

**Table of Contents:** [Workflow 1: View Current Quota Usage](#workflow-1-view-current-quota-usage---detailed-steps) · [Workflow 2: Find Best Region for Model Deployment](#workflow-2-find-best-region-for-model-deployment---detailed-steps) · [Workflow 3: Check Quota Before Deployment](#workflow-3-check-quota-before-deployment---detailed-steps) · [Workflow 4: Monitor Quota Across Deployments](#workflow-4-monitor-quota-across-deployments---detailed-steps) · [Quick Command Reference](#quick-command-reference) · [MCP Tools Reference](#mcp-tools-reference-optional-wrappers)

## Workflow 1: View Current Quota Usage - Detailed Steps

### Step 1: Show Regional Quota Summary (REQUIRED APPROACH)

> **CRITICAL AGENT INSTRUCTION:**
> - When showing quota: Query REGIONAL quota summary, NOT individual resources
> - DO NOT run `az cognitiveservices account list` for quota queries
> - DO NOT filter resources by username or name patterns
> - ONLY check specific resource deployments if user provides resource name
> - Quotas are managed at SUBSCRIPTION + REGION level, NOT per-resource

**Show Regional Quota Summary:**

```bash
# Get subscription ID
subId=$(az account show --query id -o tsv)

# Check quota for key regions
regions=("eastus" "eastus2" "westus" "westus2")
for region in "${regions[@]}"; do
  echo "=== Region: $region ==="
  az rest --method get \
    --url "https://management.azure.com/subscriptions/$subId/providers/Microsoft.CognitiveServices/locations/$region/usages?api-version=2023-05-01" \
    --query "value[?contains(name.value,'OpenAI.Standard')].{Model:name.value, Used:currentValue, Limit:limit, Available:(limit-currentValue)}" \
    --output table
  echo ""
done
```

### Step 2: If User Asks for Specific Resource (ONLY IF EXPLICITLY REQUESTED)

```bash
# User must provide resource name
az cognitiveservices account deployment list \
  --name <user-provided-resource-name> \
  --resource-group <user-provided-rg> \
  --query '[].{Name:name, Model:properties.model.name, Capacity:sku.capacity, SKU:sku.name}' \
  --output table
```

**Alternative - Use MCP Tools (Optional Wrappers):**
```
foundry_models_deployments_list(
  resource-group="<rg>",
  azure-ai-services="<resource-name>"
)
```
*Note: MCP tools are convenience wrappers around the same control plane APIs shown above.*

**Interpreting Results:**
- `Used` (currentValue): Currently allocated quota
- `Limit`: Maximum quota available in region
- `Available`: Calculated as `limit - currentValue`

## Workflow 2: Find Best Region for Model Deployment - Detailed Steps

### Step 1: Check Single Region

```bash
# Get subscription ID
subId=$(az account show --query id -o tsv)

# Check quota for GPT-4o Standard in a specific region
region="eastus"  # Change to your target region
az rest --method get \
  --url "https://management.azure.com/subscriptions/$subId/providers/Microsoft.CognitiveServices/locations/$region/usages?api-version=2023-05-01" \
  --query "value[?name.value=='OpenAI.Standard.gpt-4o'].{Model:name.value, Used:currentValue, Limit:limit, Available:(limit-currentValue)}" \
  -o table
```

### Step 2: Check Multiple Regions (Common Regions)

Check these regions in sequence by changing the `region` variable:
- `eastus`, `eastus2` - US East Coast
- `westus`, `westus2`, `westus3` - US West Coast
- `swedencentral` - Europe (Sweden)
- `canadacentral` - Canada
- `uksouth` - UK
- `japaneast` - Asia Pacific

**Alternative - Use MCP Tool:**
```
model_quota_list(region="eastus")
```
Repeat for each target region.

**Key Points:**
- Query returns `currentValue` (used), `limit` (max), and calculated `Available`
- Standard SKU format: `OpenAI.Standard.<model-name>`
- For PTU: `OpenAI.ProvisionedManaged.<model-name>`
- Focus on 2-3 regions relevant to your location rather than checking all regions
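
These quota-name formats compose mechanically, which makes them easy to build in scripts for `--query` filters. A tiny sketch (model name is illustrative):

```shell
# Compose usages-API quota names for a given model
model="gpt-4o"
standardKey="OpenAI.Standard.$model"            # Standard TPM quota entry
ptuKey="OpenAI.ProvisionedManaged.$model"       # PTU quota entry

echo "$standardKey"
echo "$ptuKey"
```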

## Workflow 3: Check Quota Before Deployment - Detailed Steps

**Steps:**
1. Check current usage (workflow #1)
2. Calculate available: `limit - currentValue`
3. Compare: `available >= required_capacity`
4. If insufficient: Use workflow #2 to find region with capacity, or request increase
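
The steps above can be sketched as a shell gate. Values are hardcoded for illustration; in practice pull `limit` and `currentValue` from the usages API in Workflow 1:

```shell
# Pre-deployment quota gate: available = limit - currentValue
limit=450      # regional quota limit for the model (usages API `limit`)
current=300    # currently allocated (usages API `currentValue`)
required=100   # capacity the new deployment will request

available=$((limit - current))
if [ "$available" -ge "$required" ]; then
  echo "Proceed: $available units available, $required requested"
else
  echo "Insufficient: only $available units available; try another region or request an increase"
fi
```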

## Workflow 4: Monitor Quota Across Deployments - Detailed Steps

**Recommended Approach - Regional Quota Overview:**

Show quota by region (better than listing all resources):

```bash
subId=$(az account show --query id -o tsv)
regions=("eastus" "eastus2" "westus" "westus2" "swedencentral")

for region in "${regions[@]}"; do
  echo "=== Region: $region ==="
  az rest --method get \
    --url "https://management.azure.com/subscriptions/$subId/providers/Microsoft.CognitiveServices/locations/$region/usages?api-version=2023-05-01" \
    --query "value[?contains(name.value,'OpenAI')].{Model:name.value, Used:currentValue, Limit:limit, Available:(limit-currentValue)}" \
    --output table
  echo ""
done
```

**Alternative - Check Specific Resource:**

If user wants to monitor a specific resource, ask for resource name first:

```bash
# List deployments for specific resource
az cognitiveservices account deployment list \
  --name <resource-name> \
  --resource-group <rg> \
  --query '[].{Name:name, Model:properties.model.name, Capacity:sku.capacity}' \
  --output table
```

> **Note:** Don't automatically iterate through all resources in the subscription. Show regional quota summary or ask for specific resource name.

## Quick Command Reference

```bash
# View quota for specific model using REST API
subId=$(az account show --query id -o tsv)
region="eastus"  # Change to your region
az rest --method get \
  --url "https://management.azure.com/subscriptions/$subId/providers/Microsoft.CognitiveServices/locations/$region/usages?api-version=2023-05-01" \
  --query "value[?contains(name.value,'gpt-4')].{Name:name.value, Used:currentValue, Limit:limit, Available:(limit-currentValue)}" \
  --output table

# List all deployments with capacity
az cognitiveservices account deployment list \
  --name <resource-name> \
  --resource-group <rg> \
  --query '[].{Name:name, Model:properties.model.name, Capacity:sku.capacity}' \
  --output table

# Delete deployment to free quota
az cognitiveservices account deployment delete \
  --name <resource-name> \
  --resource-group <rg> \
  --deployment-name <deployment-name>
```

## MCP Tools Reference (Optional Wrappers)

**Note:** All quota operations are control plane (management) operations. MCP tools are optional convenience wrappers around Azure CLI commands.

| Tool | Purpose | Equivalent Azure CLI |
|------|---------|---------------------|
| `foundry_models_deployments_list` | List all deployments with capacity | `az cognitiveservices account deployment list` |
| `model_quota_list` | List quota and usage across regions | `az rest` (Management API) |
| `model_catalog_list` | List available models from catalog | `az rest` (Management API) |
| `foundry_resource_get` | Get resource details and endpoint | `az cognitiveservices account show` |

**Recommended:** Use Azure CLI commands directly for control plane operations.
rbac/
rbac.md 6.8 KB
# Microsoft Foundry RBAC Management

Reference for managing RBAC for Microsoft Foundry resources: user permissions, managed identity configuration, and service principal setup for CI/CD.

## Quick Reference

| Property | Value |
|----------|-------|
| **CLI Extension** | `az role assignment`, `az ad sp` |
| **Resource Type** | `Microsoft.CognitiveServices/accounts` |
| **Best For** | Permission management, access auditing, CI/CD setup |

## When to Use

- Grant user access to Foundry resources or projects
- Set up developer permissions (Project Manager, Owner roles)
- Audit role assignments or validate permissions
- Configure managed identity roles for connected resources
- Create service principals for CI/CD pipeline automation
- Troubleshoot permission errors

## Azure AI Foundry Built-in Roles

| Role | Create Projects | Data Actions | Role Assignments |
|------|-----------------|--------------|------------------|
| Azure AI User | No | Yes | No |
| Azure AI Project Manager | Yes | Yes | Yes (AI User only) |
| Azure AI Account Owner | Yes | No | Yes (AI User only) |
| Azure AI Owner | Yes | Yes | Yes |

> ⚠️ **Warning:** Azure AI User is auto-assigned via Portal but NOT via SDK/CLI. Automation must explicitly assign roles.

## Workflows

All scopes follow the pattern: `/subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.CognitiveServices/accounts/<foundry-resource-name>`

For project-level scoping, append `/projects/<project-name>`.
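
A shell sketch of composing these scopes for the commands below (the subscription ID and names are placeholders):

```shell
# Compose resource-level and project-level Foundry scopes
subId="00000000-0000-0000-0000-000000000000"
rg="my-rg"
account="my-foundry"

scope="/subscriptions/$subId/resourceGroups/$rg/providers/Microsoft.CognitiveServices/accounts/$account"
projectScope="$scope/projects/my-project"

echo "$projectScope"
```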

### 1. Assign User Permissions

```bash
az role assignment create --role "Azure AI User" --assignee "<user-email-or-object-id>" --scope "<foundry-scope>"
```

### 2. Assign Developer Permissions

```bash
# Project Manager (create projects, assign AI User roles)
az role assignment create --role "Azure AI Project Manager" --assignee "<user-email-or-object-id>" --scope "<foundry-scope>"

# Full ownership including data actions
az role assignment create --role "Azure AI Owner" --assignee "<user-email-or-object-id>" --scope "<foundry-scope>"
```

### 3. Audit Role Assignments

```bash
# List all assignments
az role assignment list --scope "<foundry-scope>" --output table

# Detailed with principal names
az role assignment list --scope "<foundry-scope>" --query "[].{Principal:principalName, PrincipalType:principalType, Role:roleDefinitionName}" --output table

# Azure AI roles only
az role assignment list --scope "<foundry-scope>" --query "[?contains(roleDefinitionName, 'Azure AI')].{Principal:principalName, Role:roleDefinitionName}" --output table
```

### 4. Validate Permissions

```bash
# Current user's roles on resource
az role assignment list --assignee "$(az ad signed-in-user show --query id -o tsv)" --scope "<foundry-scope>" --query "[].roleDefinitionName" --output tsv

# Check actions available to a role
az role definition list --name "Azure AI User" --query "[].permissions[].actions" --output json
```

**Permission Requirements by Action:**

| Action | Required Role(s) |
|--------|------------------|
| Deploy models | Azure AI User, Azure AI Project Manager, Azure AI Owner |
| Create projects | Azure AI Project Manager, Azure AI Account Owner, Azure AI Owner |
| Assign Azure AI User role | Azure AI Project Manager, Azure AI Account Owner, Azure AI Owner |
| Full data access | Azure AI User, Azure AI Project Manager, Azure AI Owner |

### 5. Configure Managed Identity Roles

```bash
# Get managed identity principal ID
PRINCIPAL_ID=$(az cognitiveservices account show --name <foundry-resource-name> --resource-group <resource-group> --query identity.principalId --output tsv)

# Assign roles to connected resources (repeat pattern for each)
az role assignment create --role "<role-name>" --assignee "$PRINCIPAL_ID" --scope "<resource-scope>"
```

**Common Managed Identity Role Assignments:**

| Connected Resource | Role | Purpose |
|--------------------|------|---------|
| Azure Storage | Storage Blob Data Reader | Read files/documents |
| Azure Storage | Storage Blob Data Contributor | Read/write files |
| Azure Key Vault | Key Vault Secrets User | Read secrets |
| Azure AI Search | Search Index Data Reader | Query indexes |
| Azure AI Search | Search Index Data Contributor | Query and modify indexes |
| Azure Cosmos DB | Cosmos DB Account Reader | Read data |

### 6. Create Service Principal for CI/CD

```bash
# Create SP with minimal role
az ad sp create-for-rbac --name "foundry-cicd-sp" --role "Azure AI User" --scopes "<foundry-scope>" --output json
# Output contains appId, password, and tenant; store these values securely

# For project management permissions
az ad sp create-for-rbac --name "foundry-cicd-admin-sp" --role "Azure AI Project Manager" --scopes "<foundry-scope>" --output json

# Add Contributor for resource provisioning
SP_APP_ID=$(az ad sp list --display-name "foundry-cicd-sp" --query "[0].appId" -o tsv)
az role assignment create --role "Contributor" --assignee "$SP_APP_ID" --scope "/subscriptions/<subscription-id>/resourceGroups/<resource-group>"
```

> 💡 **Tip:** Use least privilege: start with `Azure AI User`, then add roles as needed.

| CI/CD Scenario | Recommended Role | Additional Roles |
|----------------|------------------|------------------|
| Deploy models only | Azure AI User | None |
| Manage projects | Azure AI Project Manager | None |
| Full provisioning | Azure AI Owner | Contributor (on RG) |
| Read-only monitoring | Reader | Azure AI User (for data) |

**CI/CD Pipeline Login:**

```bash
az login --service-principal --username "<app-id>" --password "<client-secret>" --tenant "<tenant-id>"
az account set --subscription "<subscription-id>"
```

## Error Handling

| Issue | Cause | Resolution |
|-------|-------|------------|
| "Authorization failed" when deploying | Missing Azure AI User role | Assign Azure AI User role at resource scope |
| Cannot create projects | Missing Project Manager or Owner role | Assign Azure AI Project Manager role |
| "Access denied" on connected resources | Managed identity missing roles | Assign appropriate roles to MI on each resource |
| Portal works but CLI fails | Portal auto-assigns roles, CLI doesn't | Explicitly assign Azure AI User via CLI |
| Service principal cannot access data | Wrong role or scope | Verify Azure AI User is assigned at correct scope |
| "Principal does not exist" | User/SP not found in directory | Verify the assignee email or object ID is correct |
| Role assignment already exists | Duplicate assignment attempt | Use `az role assignment list` to verify existing assignments |

## Additional Resources

- [Azure AI Foundry RBAC Documentation](https://learn.microsoft.com/azure/ai-foundry/concepts/rbac-ai-foundry)
- [Azure Built-in Roles](https://learn.microsoft.com/azure/role-based-access-control/built-in-roles)
- [Managed Identities Overview](https://learn.microsoft.com/azure/active-directory/managed-identities-azure-resources/overview)
- [Service Principal Authentication](https://learn.microsoft.com/azure/developer/github/connect-from-azure)
references/
agent-metadata-contract.md 5.4 KB
# Agent Metadata Contract

Use this contract for every agent source folder that participates in Microsoft Foundry workflows.

## Required Local Layout

```text
<agent-root>/
  .foundry/
    agent-metadata.yaml
    datasets/
    evaluators/
    results/
```

- `agent-metadata.yaml` is the required source of truth for environment-specific Foundry configuration.
- `datasets/` and `evaluators/` are local cache folders. Reuse existing files when they are current, and ask before refreshing or overwriting them.
- `results/` stores local evaluation outputs and comparison artifacts by environment.

## Environment Model

| Field | Required | Purpose |
|-------|----------|---------|
| `defaultEnvironment` | ✅ | Environment used when the user does not choose one explicitly |
| `environments.<name>.projectEndpoint` | ✅ | Foundry project endpoint for that environment |
| `environments.<name>.agentName` | ✅ | Deployed Foundry agent name |
| `environments.<name>.azureContainerRegistry` | ✅ for hosted agents | ACR used for deployment and image refresh |
| `environments.<name>.observability.applicationInsightsResourceId` | Recommended | App Insights resource for trace workflows |
| `environments.<name>.observability.applicationInsightsConnectionString` | Optional | Connection string when needed for tooling |
| `environments.<name>.testCases[]` | ✅ | Dataset + local/remote references + evaluator + threshold bundles for evaluation workflows |

## Example `agent-metadata.yaml`

```yaml
defaultEnvironment: dev
environments:
  dev:
    projectEndpoint: https://contoso.services.ai.azure.com/api/projects/support-dev
    agentName: support-agent-dev
    azureContainerRegistry: contosoregistry.azurecr.io
    observability:
      applicationInsightsResourceId: /subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.Insights/components/support-dev-ai
    testCases:
      - id: smoke-core
        priority: P0
        dataset: support-agent-dev-eval-seed
        datasetVersion: v1
        datasetFile: .foundry/datasets/support-agent-dev-eval-seed-v1.jsonl
        datasetUri: <foundry-dataset-uri>
        evaluators:
          - name: intent_resolution
            threshold: 4
          - name: task_adherence
            threshold: 4
          - name: citation_quality
            threshold: 0.9
            definitionFile: .foundry/evaluators/citation-quality.yaml
      - id: trace-regressions
        priority: P1
        dataset: support-agent-dev-traces
        datasetVersion: v3
        datasetFile: .foundry/datasets/support-agent-dev-traces-v3.jsonl
        datasetUri: <foundry-dataset-uri>
        evaluators:
          - name: coherence
            threshold: 4
          - name: groundedness
            threshold: 4
  prod:
    projectEndpoint: https://contoso.services.ai.azure.com/api/projects/support-prod
    agentName: support-agent-prod
    azureContainerRegistry: contosoregistry.azurecr.io
    testCases:
      - id: production-guardrails
        priority: P0
        dataset: support-agent-prod-curated
        datasetVersion: v2
        datasetFile: .foundry/datasets/support-agent-prod-curated-v2.jsonl
        datasetUri: <foundry-dataset-uri>
        evaluators:
          - name: violence
            threshold: 1
          - name: self_harm
            threshold: 1
```

## Workflow Rules

1. Auto-discover agent roots by searching for `.foundry/agent-metadata.yaml`.
2. If exactly one agent root is found, use it. If multiple roots are found, require the user to choose one.
3. Resolve environment in this order: explicit user choice, remembered session choice, `defaultEnvironment`.
4. Keep the selected agent root and environment visible in every deploy, eval, dataset, and trace summary.
5. Treat `datasets/` and `evaluators/` as cache folders. Reuse local files when present, but offer refresh when the user asks or when remote state is newer.
6. Never overwrite cache files or metadata silently.
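
Rules 1-2 can be sketched with `find`; the helper name and layout are illustrative:

```shell
# Print each directory that contains .foundry/agent-metadata.yaml
discover_agent_roots() {
  find "$1" -type f -path '*/.foundry/agent-metadata.yaml' \
    -exec dirname {} \; | sed 's|/\.foundry$||' | sort
}

# Example policy:
#   roots=$(discover_agent_roots .)
#   count=$(printf '%s\n' "$roots" | grep -c .)
#   count == 1 -> use it; count > 1 -> ask the user to choose one
```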

## Test-Case Guidance

| Priority | Meaning | Typical Use |
|----------|---------|-------------|
| `P0` | Must-pass gate | Smoke checks, safety, deployment blockers |
| `P1` | High-value regression coverage | Production trace regressions, key business flows |
| `P2` | Broader quality coverage | Long-tail scenarios, exploratory quality checks |

Each test case should point to one dataset and one or more evaluators with explicit thresholds.

- Store `dataset` as the stable Foundry dataset name (without the `-vN` suffix), store the version separately in `datasetVersion`, and keep the local cache filename versioned (for example, `...-v3.jsonl`).
- Persist the local `datasetFile` and remote `datasetUri` together so every test case can resolve both the cache artifact and the Foundry-registered dataset.
- Start local dataset filenames with the selected environment's Foundry `agentName`, followed by stage and version suffixes, so related cache files stay grouped by agent. If `agentName` already encodes the environment (for example, `support-agent-dev`), do not append the environment key again.
- Use test-case IDs in evaluation names, result folders, and regression summaries so the flow remains traceable.
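
The naming convention can be sketched as follows (values mirror the example metadata above):

```shell
# Compose the stable dataset name and versioned local cache filename
agentName="support-agent-dev"   # from the selected environment in agent-metadata.yaml
stage="eval-seed"
version="v1"

datasetName="$agentName-$stage"                           # stable name, no -vN suffix
datasetFile=".foundry/datasets/$datasetName-$version.jsonl"

echo "$datasetName"
echo "$datasetFile"
```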

## Sync Guidance

- Pull/refresh when the user asks, when the workflow detects missing local cache, or when remote versions clearly differ from local metadata.
- Push/register updates after the user confirms local changes that should be shared in Foundry.
- Record remote dataset names, versions, dataset URIs, and last sync timestamps in `.foundry/datasets/manifest.json` or the relevant metadata section.
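
A hypothetical shape for `.foundry/datasets/manifest.json` (field names are illustrative, not a fixed schema):

```json
{
  "datasets": [
    {
      "name": "support-agent-dev-eval-seed",
      "version": "v1",
      "datasetUri": "<foundry-dataset-uri>",
      "localFile": "support-agent-dev-eval-seed-v1.jsonl",
      "lastSyncedAt": "2025-01-15T12:00:00Z"
    }
  ]
}
```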
auth-best-practices.md 6.5 KB
# Azure Authentication Best Practices

> Source: Microsoft's [Passwordless connections for Azure services](https://learn.microsoft.com/azure/developer/intro/passwordless-overview) and [Azure Identity client libraries](https://learn.microsoft.com/dotnet/azure/sdk/authentication/).

**Table of Contents:** [Golden Rule](#golden-rule) · [Authentication by Environment](#authentication-by-environment) · [Why Not DefaultAzureCredential in Production?](#why-not-defaultazurecredential-in-production) · [Production Patterns](#production-patterns) · [Local Development Setup](#local-development-setup) · [Environment-Aware Pattern](#environment-aware-pattern) · [Security Checklist](#security-checklist) · [Further Reading](#further-reading)

## Golden Rule

Use **managed identities** and **Azure RBAC** in production. Reserve `DefaultAzureCredential` for **local development only**.

## Authentication by Environment

| Environment | Recommended Credential | Why |
|---|---|---|
| **Production (Azure-hosted)** | `ManagedIdentityCredential` (system- or user-assigned) | No secrets to manage; auto-rotated by Azure |
| **Production (on-premises)** | `ClientCertificateCredential` or `WorkloadIdentityCredential` | Deterministic; no fallback chain overhead |
| **CI/CD pipelines** | `AzurePipelinesCredential` / `WorkloadIdentityCredential` | Scoped to pipeline identity |
| **Local development** | `DefaultAzureCredential` | Chains CLI, PowerShell, and VS Code credentials for convenience |

## Why Not `DefaultAzureCredential` in Production?

1. **Unpredictable fallback chain**: walks through multiple credential types, adding latency and making failures harder to diagnose.
2. **Broad surface area**: checks environment variables, CLI tokens, and other sources that should not exist in production.
3. **Non-deterministic**: which credential actually authenticates depends on the environment, making behavior inconsistent across deployments.
4. **Performance**: each failed credential attempt adds network round-trips before falling back to the next.

## Production Patterns

### .NET

```csharp
using Azure.Identity;

var credential = Environment.GetEnvironmentVariable("AZURE_FUNCTIONS_ENVIRONMENT") == "Development"
    ? new DefaultAzureCredential()                          // local dev: uses CLI/VS credentials
    : new ManagedIdentityCredential();                      // production: deterministic, no fallback chain
// For user-assigned identity: new ManagedIdentityCredential("<client-id>")
```

### TypeScript / JavaScript

```typescript
import { DefaultAzureCredential, ManagedIdentityCredential } from "@azure/identity";

const credential = process.env.NODE_ENV === "development"
  ? new DefaultAzureCredential()                          // local dev: uses CLI/VS credentials
  : new ManagedIdentityCredential();                      // production: deterministic, no fallback chain
// For user-assigned identity: new ManagedIdentityCredential("<client-id>")
```

### Python

```python
import os
from azure.identity import DefaultAzureCredential, ManagedIdentityCredential

credential = (
    DefaultAzureCredential()                              # local dev — uses CLI/VS credentials
    if os.getenv("AZURE_FUNCTIONS_ENVIRONMENT") == "Development"
    else ManagedIdentityCredential()                      # production — deterministic, no fallback chain
)
# For user-assigned identity: ManagedIdentityCredential(client_id="<client-id>")
```

### Java

```java
import com.azure.identity.DefaultAzureCredentialBuilder;
import com.azure.identity.ManagedIdentityCredentialBuilder;

var credential = "Development".equals(System.getenv("AZURE_FUNCTIONS_ENVIRONMENT"))
    ? new DefaultAzureCredentialBuilder().build()          // local dev — uses CLI/VS credentials
    : new ManagedIdentityCredentialBuilder().build();      // production — deterministic, no fallback chain
// For user-assigned identity: new ManagedIdentityCredentialBuilder().clientId("<client-id>").build()
```

## Local Development Setup

`DefaultAzureCredential` is ideal for local dev because it automatically picks up credentials from developer tools:

1. **Azure CLI** — `az login`
2. **Azure Developer CLI** — `azd auth login`
3. **Azure PowerShell** — `Connect-AzAccount`
4. **Visual Studio / VS Code** — sign in via Azure extension

```typescript
import { DefaultAzureCredential } from "@azure/identity";

// Local development only — uses CLI/PowerShell/VS Code credentials
const credential = new DefaultAzureCredential();
```

## Environment-Aware Pattern

Detect the runtime environment and select the appropriate credential. The key principle: use `DefaultAzureCredential` only when running locally, and a specific credential in production.

> **Tip:** Azure Functions sets `AZURE_FUNCTIONS_ENVIRONMENT` to `"Development"` when running locally. For App Service or containers, use any environment variable you control (e.g. `NODE_ENV`, `ASPNETCORE_ENVIRONMENT`).

```typescript
import { DefaultAzureCredential, ManagedIdentityCredential } from "@azure/identity";

function getCredential() {
  if (process.env.NODE_ENV === "development") {
    return new DefaultAzureCredential();          // picks up az login / VS Code creds
  }
  return process.env.AZURE_CLIENT_ID
    ? new ManagedIdentityCredential(process.env.AZURE_CLIENT_ID)  // user-assigned
    : new ManagedIdentityCredential();                            // system-assigned
}
```

## Security Checklist

- [ ] Use managed identity for all Azure-hosted apps
- [ ] Never hardcode credentials, connection strings, or keys
- [ ] Apply least-privilege RBAC roles at the narrowest scope
- [ ] Use `ManagedIdentityCredential` (not `DefaultAzureCredential`) in production
- [ ] Store any required secrets in Azure Key Vault
- [ ] Rotate secrets and certificates on a schedule
- [ ] Enable Microsoft Defender for Cloud on production resources

## Further Reading

- [Passwordless connections overview](https://learn.microsoft.com/azure/developer/intro/passwordless-overview)
- [Managed identities overview](https://learn.microsoft.com/entra/identity/managed-identities-azure-resources/overview)
- [Azure RBAC overview](https://learn.microsoft.com/azure/role-based-access-control/overview)
- [.NET authentication guide](https://learn.microsoft.com/dotnet/azure/sdk/authentication/)
- [Python identity library](https://learn.microsoft.com/python/api/overview/azure/identity-readme)
- [JavaScript identity library](https://learn.microsoft.com/javascript/api/overview/azure/identity-readme)
- [Java identity library](https://learn.microsoft.com/java/api/overview/azure/identity-readme)
private-network-standard-agent-setup.md 2.2 KB
# Private Network Standard Agent Setup

> **MANDATORY:** Read [Standard Agent Setup with Network Isolation docs](https://learn.microsoft.com/en-us/azure/ai-foundry/how-to/configure-private-link?tabs=azure-portal&pivots=fdp-project) before proceeding. It covers RBAC requirements, resource provider registration, and role assignments.

## Overview

Extends [standard agent setup](standard-agent-setup.md) with full VNet isolation using private endpoints and subnet delegation. All resources communicate over the private network only.

## Networking Constraints

Two subnets required:

| Subnet | CIDR | Purpose | Delegation |
|--------|------|---------|------------|
| Agent Subnet | /24 (e.g., 192.168.0.0/24) | Agent workloads | `Microsoft.App/environments` (exclusive) |
| Private Endpoint Subnet | /24 (e.g., 192.168.1.0/24) | Private endpoints | None |

- All Foundry resources **must be in the same region as the VNet**.
- Agent subnet must be exclusive to one Foundry account.
- VNet address space must not overlap with existing networks or reserved ranges.
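Before picking address spaces, the overlap constraint can be sanity-checked locally; a minimal sketch using only the Python standard library (the function name and example ranges are illustrative):

```python
import ipaddress

def vnet_overlaps(candidate: str, existing: list[str]) -> bool:
    """Return True if the candidate CIDR overlaps any of the existing ranges."""
    cand = ipaddress.ip_network(candidate)
    return any(cand.overlaps(ipaddress.ip_network(net)) for net in existing)

# Example: proposed VNet vs. ranges already in use
print(vnet_overlaps("192.168.0.0/16", ["10.0.0.0/8", "192.168.1.0/24"]))  # True
```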

> โš ๏ธ **Warning:** If providing an existing VNet, ensure both subnets exist before deployment. Otherwise the template creates a new VNet with default address spaces.

## Deployment

**Always use the official Bicep template:**
[Private Network Standard Agent Setup Bicep](https://github.com/microsoft-foundry/foundry-samples/tree/main/infrastructure/infrastructure-setup-bicep/15-private-network-standard-agent-setup)

> โš ๏ธ **Warning:** Capability host provisioning is **asynchronous** (10โ€“20 minutes). Poll deployment status until success before proceeding.

## Post-Deployment

1. **Deploy a model** to the new AI Services account (e.g., `gpt-4o`). Fall back to `Standard` SKU if `GlobalStandard` quota is exhausted.
2. **Create the agent** using MCP tools (`agent_update`) or the Python SDK.

## References

- [Azure AI Foundry Networking](https://learn.microsoft.com/en-us/azure/ai-foundry/how-to/configure-private-link?tabs=azure-portal&pivots=fdp-project)
- [Azure AI Foundry RBAC](https://learn.microsoft.com/en-us/azure/ai-foundry/concepts/rbac-azure-ai-foundry?pivots=fdp-project)
- [Standard Agent Setup (public network)](standard-agent-setup.md)
standard-agent-setup.md 3.6 KB
# Standard Agent Setup

> **MANDATORY:** Read [Standard Agent Setup docs](https://learn.microsoft.com/en-us/azure/foundry/agents/concepts/standard-agent-setup?view=foundry) before proceeding with standard setup.

## Overview

Azure AI Foundry supports two agent setup configurations:

| Setup | Capability Host | Description |
|-------|----------------|-------------|
| **Basic** | None | Default setup. All resources are Microsoft-managed. No additional connections required. |
| **Standard** | Azure AI Services | Advanced setup. Bring-your-own storage and search connections for full control over data residency and scaling. |

## Standard Setup Connections

| Connection | Service | Required | Purpose |
|------------|---------|----------|---------|
| Thread storage | Azure Cosmos DB | ✅ Yes | Store conversation threads in your own Cosmos DB instance |
| File storage | Azure Storage | ✅ Yes | Store uploaded files in your own Azure Storage account |
| Vector store | Azure AI Search | ✅ Yes | Use your own Azure AI Search instance for vector/knowledge retrieval |
| Azure AI Services | Azure AI Services | ❌ Optional | Use OpenAI models from a different AI Services resource |

> 💡 **Tip:** Standard setup is recommended for production workloads that require control over data storage, custom vector search, or integration with models from a separate AI Services resource.

## Prerequisites

Before starting deployment, confirm the following with the user:

1. **RBAC role on the resource group:** The user must have **Owner** or **User Access Administrator** role on the target resource group. The Bicep template assigns RBAC roles (Storage Blob Data Contributor, Cosmos DB Operator, AI Search roles) to the project's managed identity — this will fail without `Microsoft.Authorization/roleAssignments/write` permission.
2. **Subscription quota:** Verify the target region has available quota for AI Services. If quota is exhausted, try an alternate region (e.g., `swedencentral`, `eastus`, `westus3`).
3. **Azure Policy compliance:** Some subscriptions enforce policies (e.g., storage accounts must disable public network access). If the Bicep template fails due to policy violations, patch the template to comply (e.g., set `publicNetworkAccess: 'Disabled'` and `defaultAction: 'Deny'` on the storage account).

## Deployment

- Standard setup always creates a **new Foundry resource and a new project**. Do not ask the user for a project endpoint — one will be provisioned as part of the deployment.
- **Always use the official Bicep template:**
  [Standard Agent Setup Bicep Template](https://github.com/azure-ai-foundry/foundry-samples/blob/main/infrastructure/infrastructure-setup-bicep/43-standard-agent-setup-with-customization/main.bicep)

> โš ๏ธ **Warning:** Capability host provisioning is **asynchronous** and can take 10โ€“20 minutes. After deploying the Bicep template, you **must poll** the deployment status until it succeeds. Do not assume the setup is complete immediately.

## Post-Deployment: Model & Agent

After infrastructure provisioning succeeds:

1. **Deploy a model** to the new AI Services account (e.g., `gpt-4o`). If `GlobalStandard` SKU quota is exhausted, fall back to `Standard` SKU.
2. **Create the agent** using MCP tools (`agent_update`) or the Python SDK (`client.agents.create_version`). See [SDK Operations](../foundry-agent/create/references/sdk-operations.md) for details.
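The SKU fallback in step 1 can be sketched with the Azure CLI (model version and capacity are illustrative, and flag names may vary by CLI version):

```bash
# Try GlobalStandard first; fall back to Standard if quota is exhausted.
deploy_model_with_fallback() {
  local rg=$1 account=$2 deployment=$3 sku
  for sku in GlobalStandard Standard; do
    if az cognitiveservices account deployment create \
         --resource-group "$rg" --name "$account" \
         --deployment-name "$deployment" \
         --model-name gpt-4o --model-format OpenAI --model-version "2024-11-20" \
         --sku-name "$sku" --sku-capacity 10; then
      echo "Deployed with SKU: $sku"
      return 0
    fi
  done
  return 1
}
```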

## References

- [Capability Hosts โ€” Agent Setup Types](https://learn.microsoft.com/en-us/azure/ai-foundry/agents/concepts/capability-hosts?view=foundry)
- [Standard Agent Setup](https://learn.microsoft.com/en-us/azure/foundry/agents/concepts/standard-agent-setup?view=foundry)
references/sdk/
foundry-sdk-py.md 8.4 KB
# Microsoft Foundry - Python SDK Guide

Python-specific implementations for working with Microsoft Foundry.

**Table of Contents:** [Prerequisites](#prerequisites) · [Model Discovery and Deployment](#model-discovery-and-deployment-mcp) · [RAG Agent with Azure AI Search](#rag-agent-with-azure-ai-search) · [Creating Agents](#creating-agents) · [Agent Evaluation](#agent-evaluation) · [Knowledge Index Operations](#knowledge-index-operations-mcp) · [Best Practices](#best-practices) · [Error Handling](#error-handling)

## Prerequisites

```bash
pip install azure-ai-projects azure-identity azure-ai-inference openai azure-ai-evaluation python-dotenv
```

### Environment Variables

```bash
PROJECT_ENDPOINT=https://<resource>.services.ai.azure.com/api/projects/<project>
MODEL_DEPLOYMENT_NAME=gpt-4o
AZURE_AI_SEARCH_CONNECTION_NAME=my-search-connection
AI_SEARCH_INDEX_NAME=my-index
AZURE_OPENAI_ENDPOINT=https://<resource>.openai.azure.com
AZURE_OPENAI_DEPLOYMENT=gpt-4o
```
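Before running the samples, a quick stdlib-only check (a hypothetical helper; adjust the list to the variables your samples actually use) fails fast on missing configuration:

```python
import os

REQUIRED_VARS = ["PROJECT_ENDPOINT", "MODEL_DEPLOYMENT_NAME"]

def check_env(required=REQUIRED_VARS):
    """Raise early if any required environment variable is unset or empty."""
    missing = [name for name in required if not os.getenv(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
```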

## Model Discovery and Deployment (MCP)

```python
foundry_models_list()                              # All models
foundry_models_list(publisher="OpenAI")             # Filter by publisher
foundry_models_list(search_for_free_playground=True) # Free playground models

foundry_models_deploy(
    resource_group="my-rg", deployment="gpt-4o-deployment",
    model_name="gpt-4o", model_format="OpenAI",
    azure_ai_services="my-foundry-resource",
    model_version="2024-05-13", sku_capacity=10, scale_type="Standard"
)
```

## RAG Agent with Azure AI Search

> **Auth:** `DefaultAzureCredential` is for local development. See [auth-best-practices.md](../auth-best-practices.md) for production patterns.

```python
import os
from azure.ai.projects import AIProjectClient
from azure.identity import DefaultAzureCredential
from azure.ai.agents.models import (
    AzureAISearchToolDefinition, AzureAISearchToolResource,
    AISearchIndexResource, AzureAISearchQueryType,
)

project_client = AIProjectClient(
    endpoint=os.environ["PROJECT_ENDPOINT"],
    credential=DefaultAzureCredential(),
)

azs_connection = project_client.connections.get(
    os.environ["AZURE_AI_SEARCH_CONNECTION_NAME"]
)

agent = project_client.agents.create_agent(
    model=os.environ["MODEL_DEPLOYMENT_NAME"],
    name="RAGAgent",
    instructions="You are a helpful assistant. Use the knowledge base to answer. "
        "Provide citations as: `[message_idx:search_idx†source]`.",
    tools=[AzureAISearchToolDefinition(
        azure_ai_search=AzureAISearchToolResource(indexes=[
            AISearchIndexResource(
                index_connection_id=azs_connection.id,
                index_name=os.environ["AI_SEARCH_INDEX_NAME"],
                query_type=AzureAISearchQueryType.HYBRID,
            ),
        ])
    )],
)
```

### Querying a RAG Agent (Streaming)

```python
openai_client = project_client.get_openai_client()

stream = openai_client.responses.create(
    stream=True, tool_choice="required", input="Your question here",
    extra_body={"agent": {"name": agent.name, "type": "agent_reference"}},
)
for event in stream:
    if event.type == "response.output_text.delta":
        print(event.delta, end="", flush=True)
    elif event.type == "response.output_item.done":
        if event.item.type == "message" and event.item.content[-1].type == "output_text":
            for ann in event.item.content[-1].annotations:
                if ann.type == "url_citation":
                    print(f"\nCitation: {ann.url}")
```

## Creating Agents

### Basic Agent

```python
agent = project_client.agents.create_agent(
    model=os.environ["MODEL_DEPLOYMENT_NAME"],
    name="my-agent",
    instructions="You are a helpful assistant.",
)
```

### Agent with Custom Function Tools

```python
from azure.ai.agents.models import FunctionTool, ToolSet

def get_weather(location: str, unit: str = "celsius") -> str:
    """Get the current weather for a location."""
    return f"Sunny and 22°{unit[0].upper()} in {location}"

functions = FunctionTool([get_weather])
toolset = ToolSet()
toolset.add(functions)

agent = project_client.agents.create_agent(
    model=os.environ["MODEL_DEPLOYMENT_NAME"],
    name="function-agent",
    instructions="You are a helpful assistant with tool access.",
    toolset=toolset,
)
```

### Agent with Web Search

```python
from azure.ai.projects.models import (
    PromptAgentDefinition, WebSearchPreviewTool, ApproximateLocation,
)

agent = project_client.agents.create_version(
    agent_name="WebSearchAgent",
    definition=PromptAgentDefinition(
        model=os.environ["MODEL_DEPLOYMENT_NAME"],
        instructions="Search the web for current information. Provide sources.",
        tools=[
            WebSearchPreviewTool(
                user_location=ApproximateLocation(
                    country="US", city="Seattle", region="Washington"
                )
            )
        ],
    ),
)
```

> 💡 **Tip:** `WebSearchPreviewTool` requires no external resource or connection. For Bing Grounding (which requires a dedicated Bing resource and project connection), see [Bing Grounding reference](../../foundry-agent/create/references/tool-bing-grounding.md).

### Interacting with Agents

```python
from azure.ai.agents.models import ListSortOrder

thread = project_client.agents.threads.create()
project_client.agents.messages.create(thread_id=thread.id, role="user", content="Hello")

run = project_client.agents.runs.create_and_process(thread_id=thread.id, agent_id=agent.id)
if run.status == "failed":
    print(f"Run failed: {run.last_error}")

messages = project_client.agents.messages.list(thread_id=thread.id, order=ListSortOrder.ASCENDING)
for msg in messages:
    if msg.text_messages:
        print(f"{msg.role}: {msg.text_messages[-1].text.value}")

project_client.agents.delete_agent(agent.id)
```

## Agent Evaluation

### Single Response Evaluation (MCP)

```python
foundry_agents_query_and_evaluate(
    agent_id="<agent-id>", query="What's the weather?",
    endpoint="https://my-foundry.services.ai.azure.com/api/projects/my-project",
    azure_openai_endpoint="https://my-openai.openai.azure.com",
    azure_openai_deployment="gpt-4o",
    evaluators="intent_resolution,task_adherence,tool_call_accuracy"
)

foundry_agents_evaluate(
    query="What's the weather?", response="Sunny and 22°C.",
    evaluator="intent_resolution",
    azure_openai_endpoint="https://my-openai.openai.azure.com",
    azure_openai_deployment="gpt-4o"
)
```

### Batch Evaluation

```python
from azure.ai.evaluation import AIAgentConverter, IntentResolutionEvaluator, evaluate

converter = AIAgentConverter(project_client)
converter.prepare_evaluation_data(thread_ids=["t1", "t2", "t3"], filename="eval_data.jsonl")

result = evaluate(
    data="eval_data.jsonl",
    evaluators={
        "intent_resolution": IntentResolutionEvaluator(
            azure_openai_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
            azure_openai_deployment=os.environ["AZURE_OPENAI_DEPLOYMENT"]
        ),
    },
    output_path="./eval_results"
)
print(f"Results: {result['studio_url']}")
```

> 💡 **Tip:** Continuous evaluation requires the project managed identity to have the **Azure AI User** role and Application Insights connected to the project.

## Knowledge Index Operations (MCP)

```python
foundry_knowledge_index_list(endpoint="<project-endpoint>")
foundry_knowledge_index_schema(endpoint="<project-endpoint>", index="my-index")
```

## Best Practices

1. **Never hardcode credentials** — use environment variables and `python-dotenv`
2. **Check `run.status`** and handle `HttpResponseError` exceptions
3. **Reuse `AIProjectClient`** instances — don't create new ones per request
4. **Use type hints** in custom functions for better tool integration
5. **Use context managers** for agent cleanup

## Error Handling

```python
from azure.core.exceptions import HttpResponseError

try:
    agent = project_client.agents.create_agent(
        model=os.environ["MODEL_DEPLOYMENT_NAME"],
        name="my-agent", instructions="You are helpful."
    )
except HttpResponseError as e:
    if e.status_code == 429:
        print("Rate limited — wait and retry with exponential backoff.")
    elif e.status_code == 401:
        print("Authentication failed — check credentials.")
    else:
        print(f"Error: {e.message}")
```
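The 429 branch above pairs naturally with exponential backoff; a stdlib-only sketch (in real code you would catch `HttpResponseError` specifically rather than this generic pattern):

```python
import random
import time

def retry_with_backoff(fn, max_attempts=5, base_delay=1.0):
    """Call fn, retrying on exceptions that carry status_code == 429."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception as exc:
            retryable = getattr(exc, "status_code", None) == 429
            if not retryable or attempt == max_attempts - 1:
                raise
            # Exponential backoff with jitter: 1s, 2s, 4s, ... plus noise.
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.5))
```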

### Context Manager for Agent Cleanup

```python
from contextlib import contextmanager

@contextmanager
def temporary_agent(project_client, **kwargs):
    agent = project_client.agents.create_agent(**kwargs)
    try:
        yield agent
    finally:
        project_client.agents.delete_agent(agent.id)
```
resource/create/
create-foundry-resource.md 6.1 KB
---
name: microsoft-foundry:resource/create
description: |
  Create Azure AI Services multi-service resource (Foundry resource) using Azure CLI.
  USE FOR: create Foundry resource, new AI Services resource, create multi-service resource, provision Azure AI Services, AIServices kind resource, register resource provider, enable Cognitive Services, setup AI Services account, create resource group for Foundry.
  DO NOT USE FOR: creating ML workspace hubs (use microsoft-foundry:project/create), deploying models (use microsoft-foundry:models/deploy), managing permissions (use microsoft-foundry:rbac), monitoring resource usage (use microsoft-foundry:quota).
compatibility:
  required:
    - azure-cli: ">=2.0"
  optional:
    - powershell: ">=7.0"
    - azure-portal: "any"
---

# Create Foundry Resource

This sub-skill orchestrates creation of Azure AI Services multi-service resources using Azure CLI.

> **Important:** All resource creation operations are **control plane (management)** operations. Use **Azure CLI commands** as the primary method.

> **Note:** For monitoring resource usage and quotas, use the `microsoft-foundry:quota` skill.

**Table of Contents:** [Quick Reference](#quick-reference) · [When to Use](#when-to-use) · [Prerequisites](#prerequisites) · [Core Workflows](#core-workflows) · [Important Notes](#important-notes) · [Additional Resources](#additional-resources)

## Quick Reference

| Property | Value |
|----------|-------|
| **Classification** | WORKFLOW SKILL |
| **Operation Type** | Control Plane (Management) |
| **Primary Method** | Azure CLI: `az cognitiveservices account create` |
| **Resource Type** | `Microsoft.CognitiveServices/accounts` (kind: `AIServices`) |
| **Resource Kind** | `AIServices` (multi-service) |

## When to Use

Use this sub-skill when you need to:

- **Create Foundry resource** - Provision new Azure AI Services multi-service account
- **Create resource group** - Set up resource group before creating resources
- **Register resource provider** - Enable Microsoft.CognitiveServices provider
- **Manual resource creation** - CLI-based resource provisioning

**Do NOT use for:**
- Creating ML workspace hubs/projects (use `microsoft-foundry:project/create`)
- Deploying AI models (use `microsoft-foundry:models/deploy`)
- Managing RBAC permissions (use `microsoft-foundry:rbac`)
- Monitoring resource usage (use `microsoft-foundry:quota`)

## Prerequisites

- **Azure subscription** - Active subscription ([create free account](https://azure.microsoft.com/pricing/purchase-options/azure-account))
- **Azure CLI** - Version 2.0 or later installed
- **Authentication** - Run `az login` before commands
- **RBAC roles** - One of:
  - Contributor
  - Owner
  - Custom role with `Microsoft.CognitiveServices/accounts/write`
- **Resource provider** - `Microsoft.CognitiveServices` must be registered in your subscription
  - If not registered, see [Workflow #3: Register Resource Provider](#3-register-resource-provider)
  - If you lack permissions, ask a subscription Owner/Contributor to register it or grant you `/register/action` privilege

> **Need RBAC help?** See [microsoft-foundry:rbac](../../rbac/rbac.md) for permission management.

## Core Workflows

### 1. Create Resource Group

**Command Pattern:** "Create a resource group for my Foundry resources"

#### Steps

1. **Ask user preference**: Use existing or create new resource group
2. **If using existing**: List and let user select from available groups (0-4: show all, 5+: show 5 most recent with "Other" option)
3. **If creating new**: Ask user to choose region, then create

```bash
# List existing resource groups
az group list --query "[-5:].{Name:name, Location:location}" --out table

# Or create new
az group create --name <rg-name> --location <location>
az group show --name <rg-name> --query "{Name:name, Location:location, State:properties.provisioningState}"
```

See [Detailed Workflow Steps](./references/workflows.md) for complete instructions.

---

### 2. Create Foundry Resource

**Command Pattern:** "Create a new Azure AI Services resource"

#### Steps

1. **Verify prerequisites**: Check Azure CLI, authentication, and provider registration
2. **Choose location**: Always ask user to select region (don't assume resource group location)
3. **Create resource**: Use `--kind AIServices` and `--sku S0` (only supported tier)
4. **Verify and get keys**

```bash
# Create Foundry resource
az cognitiveservices account create \
  --name <resource-name> \
  --resource-group <rg> \
  --kind AIServices \
  --sku S0 \
  --location <location> \
  --yes

# Verify and get keys
az cognitiveservices account show --name <resource-name> --resource-group <rg>
az cognitiveservices account keys list --name <resource-name> --resource-group <rg>
```

**Important:** S0 (Standard) is the only supported SKU; the F0 free tier is not available for AIServices.

See [Detailed Workflow Steps](./references/workflows.md) for complete instructions.

---

### 3. Register Resource Provider

**Command Pattern:** "Register Cognitive Services provider"

Required when first creating Cognitive Services in subscription or if you get `ResourceProviderNotRegistered` error.

```bash
# Register provider (requires Owner/Contributor role)
az provider register --namespace Microsoft.CognitiveServices
az provider show --namespace Microsoft.CognitiveServices --query "registrationState"
```

If you lack permissions, ask a subscription Owner/Contributor to register it or use `microsoft-foundry:rbac` skill.
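Provider registration is asynchronous, so it can be worth polling until the state reaches `Registered`; a hedged sketch:

```bash
# Poll until the resource provider finishes registering.
wait_for_provider() {
  local namespace=$1 state
  for _ in $(seq 1 30); do   # up to ~5 minutes at 10-second intervals
    state=$(az provider show --namespace "$namespace" \
      --query "registrationState" --output tsv)
    [ "$state" = "Registered" ] && { echo "Provider registered."; return 0; }
    sleep 10
  done
  echo "Timed out waiting for provider registration." >&2
  return 1
}
```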

See [Detailed Workflow Steps](./references/workflows.md) for complete instructions.

---

## Important Notes

- **Resource kind must be `AIServices`** for multi-service Foundry resources
- **SKU must be S0** (Standard) - the F0 free tier is not available for AIServices
- Always ask user to choose location - different regions may have varying availability

---

## Additional Resources

- [Common Patterns](./references/patterns.md) - Quick setup patterns and command reference
- [Troubleshooting](./references/troubleshooting.md) - Common errors and solutions
- [Azure AI Services documentation](https://learn.microsoft.com/en-us/azure/ai-services/multi-service-resource?pivots=azcli)
resource/create/references/
patterns.md 3.9 KB
# Common Patterns: Create Foundry Resource

**Table of Contents:** [Pattern A: Quick Setup](#pattern-a-quick-setup) · [Pattern B: Multi-Region Setup](#pattern-b-multi-region-setup) · [Quick Commands Reference](#quick-commands-reference)

## Pattern A: Quick Setup

Complete setup in one go:

```bash
# Ask user: "Use existing resource group or create new?"

# ==== If user chooses "Use existing" ====
# Count and list existing resource groups
TOTAL_RG_COUNT=$(az group list --query "length([])" -o tsv)
az group list --query "[-5:].{Name:name, Location:location}" --out table

# Based on count: show appropriate list and options
# User selects resource group
RG="<selected-rg-name>"

# Fetch details to verify
az group show --name $RG --query "{Name:name, Location:location, State:properties.provisioningState}"
# Then skip to creating Foundry resource below

# ==== If user chooses "Create new" ====
# List regions and ask user to choose
az account list-locations --query "[].{Region:name}" --out table

# Variables
RG="rg-ai-services"  # New resource group name
LOCATION="westus2"  # User's chosen location
RESOURCE_NAME="my-foundry-resource"

# Create new resource group
az group create --name $RG --location $LOCATION

# Verify creation
az group show --name $RG --query "{Name:name, Location:location, State:properties.provisioningState}"

# Create Foundry resource in user's chosen location
az cognitiveservices account create \
  --name $RESOURCE_NAME \
  --resource-group $RG \
  --kind AIServices \
  --sku S0 \
  --location $LOCATION \
  --yes

# Get endpoint and keys
echo "Resource created successfully!"
az cognitiveservices account show \
  --name $RESOURCE_NAME \
  --resource-group $RG \
  --query "{Endpoint:properties.endpoint, Location:location}"

az cognitiveservices account keys list \
  --name $RESOURCE_NAME \
  --resource-group $RG
```

## Pattern B: Multi-Region Setup

Create resources in multiple regions:

```bash
# Variables
RG="rg-ai-services"
REGIONS=("eastus" "westus2" "westeurope")

# Create resource group
az group create --name $RG --location eastus

# Create resources in each region
for REGION in "${REGIONS[@]}"; do
  RESOURCE_NAME="foundry-${REGION}"
  echo "Creating resource in $REGION..."

  az cognitiveservices account create \
    --name $RESOURCE_NAME \
    --resource-group $RG \
    --kind AIServices \
    --sku S0 \
    --location $REGION \
    --yes

  echo "Resource $RESOURCE_NAME created in $REGION"
done

# List all resources
az cognitiveservices account list --resource-group $RG --output table
```

## Quick Commands Reference

```bash
# Count total resource groups to determine which scenario applies
az group list --query "length([])" -o tsv

# Check existing resource groups (up to 5 most recent)
# 0 → create new | 1-4 → select or create | 5+ → select/other/create
az group list --query "[-5:].{Name:name, Location:location}" --out table

# If 5+ resource groups exist and user selects "Other", show all
az group list --query "[].{Name:name, Location:location}" --out table

# If user selects existing resource group, fetch details to verify and get location
az group show --name <selected-rg-name> --query "{Name:name, Location:location, State:properties.provisioningState}"

# List available regions (for creating new resource group)
az account list-locations --query "[].{Region:name}" --out table

# Create resource group (if needed)
az group create --name rg-ai-services --location westus2

# Create Foundry resource
az cognitiveservices account create \
  --name my-foundry-resource \
  --resource-group rg-ai-services \
  --kind AIServices \
  --sku S0 \
  --location westus2 \
  --yes

# List resources in group
az cognitiveservices account list --resource-group rg-ai-services

# Get resource details
az cognitiveservices account show \
  --name my-foundry-resource \
  --resource-group rg-ai-services

# Delete resource
az cognitiveservices account delete \
  --name my-foundry-resource \
  --resource-group rg-ai-services
```
troubleshooting.md 2.4 KB
# Troubleshooting: Create Foundry Resource

## Resource Creation Failures

### ResourceProviderNotRegistered

**Solution:**
1. If you have Owner/Contributor role, register the provider:
   ```bash
   az provider register --namespace Microsoft.CognitiveServices
   ```
2. If you lack permissions, ask a subscription Owner or Contributor to register it
3. Alternatively, ask them to grant you the `/register/action` privilege

### InsufficientPermissions

**Solution:**
```bash
# Check your role assignments
az role assignment list --assignee <your-user-id> --subscription <subscription-id>

# You need: Contributor, Owner, or custom role with Microsoft.CognitiveServices/accounts/write
```

Use `microsoft-foundry:rbac` skill to manage permissions.

### LocationNotAvailableForResourceType

**Solution:**
```bash
# List available regions for Cognitive Services
az provider show --namespace Microsoft.CognitiveServices \
  --query "resourceTypes[?resourceType=='accounts'].locations" --out table

# Choose different region from the list
```

### ResourceNameNotAvailable

Resource name must be globally unique. Try adding a unique suffix:

```bash
UNIQUE_SUFFIX=$(date +%s)
az cognitiveservices account create \
  --name "foundry-${UNIQUE_SUFFIX}" \
  --resource-group <rg> \
  --kind AIServices \
  --sku S0 \
  --location <location> \
  --yes
```

## Resource Shows as Failed

**Check provisioning state:**
```bash
az cognitiveservices account show \
  --name <resource-name> \
  --resource-group <rg> \
  --query "properties.provisioningState"
```

If `Failed`, delete and recreate:
```bash
# Delete failed resource
az cognitiveservices account delete \
  --name <resource-name> \
  --resource-group <rg>

# Recreate
az cognitiveservices account create \
  --name <resource-name> \
  --resource-group <rg> \
  --kind AIServices \
  --sku S0 \
  --location <location> \
  --yes
```

## Cannot Access Keys

**Error:** `AuthorizationFailed` when listing keys

**Solution:** You need `Cognitive Services User` or higher role on the resource.

Use `microsoft-foundry:rbac` skill to grant appropriate permissions.

## External Resources

- [Create multi-service resource](https://learn.microsoft.com/en-us/azure/ai-services/multi-service-resource?pivots=azcli)
- [Azure AI Services documentation](https://learn.microsoft.com/en-us/azure/ai-services/)
- [Azure regions with AI Services](https://azure.microsoft.com/global-infrastructure/services/?products=cognitive-services)
workflows.md 6.7 KB
# Detailed Workflows: Create Foundry Resource

**Table of Contents:** [Workflow 1: Create Resource Group](#workflow-1-create-resource-group---detailed-steps) · [Workflow 2: Create Foundry Resource](#workflow-2-create-foundry-resource---detailed-steps) · [Workflow 3: Register Resource Provider](#workflow-3-register-resource-provider---detailed-steps)

## Workflow 1: Create Resource Group - Detailed Steps

### Step 1: Ask user preference

Ask the user which option they prefer:
1. Use an existing resource group
2. Create a new resource group

### Step 2a: If user chooses "Use existing resource group"

Count and list existing resource groups:

```bash
# Count total resource groups
TOTAL_RG_COUNT=$(az group list --query "length([])" -o tsv)

# Get up to 5 resource groups (the last 5 returned; CLI ordering is not guaranteed to be by creation time)
az group list --query "[-5:].{Name:name, Location:location}" --out table
```

**Handle based on count:**

**If 0 resources found:**
- Inform user: "No existing resource groups found"
- Ask if they want to create a new one, then proceed to Step 2b

**If 1-4 resources found:**
- Display all X resource groups to the user
- Let user select from the list
- Fetch the selected resource group details:
  ```bash
  az group show --name <selected-rg-name> --query "{Name:name, Location:location, State:properties.provisioningState}"
  ```
- Display details to user, then proceed to create Foundry resource

**If 5+ resources found:**
- Display the 5 most recent resource groups
- Present options:
  1. Select from the 5 displayed
  2. Other (see all resource groups)
- If user selects a resource group, fetch details:
  ```bash
  az group show --name <selected-rg-name> --query "{Name:name, Location:location, State:properties.provisioningState}"
  ```
- If user chooses "Other", show all:
  ```bash
  az group list --query "[].{Name:name, Location:location}" --out table
  ```
  Then let user select, and fetch details as above
- Display details to user, then proceed to create Foundry resource
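The count-based branching above can be sketched as plain shell. `TOTAL_RG_COUNT` would come from the `az group list` query in Step 2a; the value here is a stand-in for illustration:

```shell
#!/usr/bin/env bash
# Stand-in value; in practice this comes from:
#   TOTAL_RG_COUNT=$(az group list --query "length([])" -o tsv)
TOTAL_RG_COUNT=3

if [ "$TOTAL_RG_COUNT" -eq 0 ]; then
  echo "No existing resource groups found"
elif [ "$TOTAL_RG_COUNT" -le 4 ]; then
  echo "Displaying all $TOTAL_RG_COUNT resource groups"
else
  echo "Displaying the 5 most recent resource groups, plus an 'Other' option"
fi
```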

### Step 2b: If user chooses "Create new resource group"

1. List available Azure regions:

```bash
az account list-locations --query "[].{Region:name}" --out table
```

Common regions:
- `eastus`, `eastus2` - US East Coast
- `westus`, `westus2`, `westus3` - US West Coast
- `centralus` - US Central
- `westeurope`, `northeurope` - Europe
- `southeastasia`, `eastasia` - Asia Pacific

2. Ask user to choose a region from the list above

3. Create resource group in the chosen region:

```bash
az group create \
  --name <resource-group-name> \
  --location <user-chosen-location>
```

4. Verify creation:

```bash
az group show --name <resource-group-name> --query "{Name:name, Location:location, State:properties.provisioningState}"
```

Expected output: `State: "Succeeded"`

## Workflow 2: Create Foundry Resource - Detailed Steps

### Step 1: Verify prerequisites

```bash
# Check Azure CLI version (need 2.0+)
az --version

# Verify authentication
az account show

# Check resource provider registration status
az provider show --namespace Microsoft.CognitiveServices --query "registrationState"
```

If provider not registered, see Workflow #3: Register Resource Provider.
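The provider check can be wired into a small guard before continuing (a sketch; assumes `az` is installed and you are logged in):

```bash
# Abort early if the Cognitive Services provider is not registered
REG_STATE=$(az provider show \
  --namespace Microsoft.CognitiveServices \
  --query "registrationState" -o tsv)

if [ "$REG_STATE" != "Registered" ]; then
  echo "Provider state is '$REG_STATE'; complete Workflow 3 before continuing" >&2
fi
```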

### Step 2: Choose location

**Always ask the user to choose a location.** List available regions and let the user select:

```bash
# List available regions for Cognitive Services
az account list-locations --query "[].{Region:name, DisplayName:displayName}" --out table
```

Common regions for AI Services:
- `eastus`, `eastus2` - US East Coast
- `westus`, `westus2`, `westus3` - US West Coast
- `centralus` - US Central
- `westeurope`, `northeurope` - Europe
- `southeastasia`, `eastasia` - Asia Pacific

> **Important:** Do not automatically use the resource group's location. Always ask the user which region they prefer.

### Step 3: Create Foundry resource

```bash
az cognitiveservices account create \
  --name <resource-name> \
  --resource-group <rg> \
  --kind AIServices \
  --sku S0 \
  --location <location> \
  --yes
```

**Parameters:**
- `--name`: Unique resource name (globally unique across Azure)
- `--resource-group`: Existing resource group name
- `--kind`: **Must be `AIServices`** for multi-service resource
- `--sku`: Must be **S0** (Standard - the only supported tier for AIServices)
- `--location`: Azure region (**always ask user to choose** from available regions)
- `--yes`: Auto-accept terms without prompting
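For illustration, a fully filled-in invocation might look like the following (the resource group name `my-foundry-rg` and region `eastus2` are hypothetical; substitute your own values):

```bash
# Example only: timestamp suffix keeps the name globally unique
az cognitiveservices account create \
  --name "foundry-demo-$(date +%s)" \
  --resource-group my-foundry-rg \
  --kind AIServices \
  --sku S0 \
  --location eastus2 \
  --yes
```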

### Step 4: Verify resource creation

```bash
# Check resource details to verify creation
az cognitiveservices account show \
  --name <resource-name> \
  --resource-group <rg>

# View endpoint and configuration
az cognitiveservices account show \
  --name <resource-name> \
  --resource-group <rg> \
  --query "{Name:name, Endpoint:properties.endpoint, Location:location, Kind:kind, SKU:sku.name}"
```

Expected output:
- `provisioningState: "Succeeded"`
- Endpoint URL
- SKU: S0
- Kind: AIServices

### Step 5: Get access keys

```bash
az cognitiveservices account keys list \
  --name <resource-name> \
  --resource-group <rg>
```

This returns `key1` and `key2` for API authentication.
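For scripting, you can capture a single key into a variable instead of printing the full JSON (a sketch):

```bash
# Capture key1 only; avoid echoing it to the terminal or logs
KEY1=$(az cognitiveservices account keys list \
  --name <resource-name> \
  --resource-group <rg> \
  --query "key1" -o tsv)
```

If a key is ever exposed, rotate it with `az cognitiveservices account keys regenerate --key-name key1` (or `key2`) using the same `--name` and `--resource-group` arguments.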

## Workflow 3: Register Resource Provider - Detailed Steps

### When Needed

Required when:
- First time creating Cognitive Services in subscription
- Error: `ResourceProviderNotRegistered`
- Insufficient permissions during resource creation

### Steps

**Step 1: Check registration status**

```bash
az provider show \
  --namespace Microsoft.CognitiveServices \
  --query "registrationState"
```

Possible states:
- `Registered`: Ready to use
- `NotRegistered`: Needs registration
- `Registering`: Registration in progress

**Step 2: Register provider**

```bash
az provider register --namespace Microsoft.CognitiveServices
```

**Step 3: Wait for registration**

Registration typically takes 1-2 minutes. Check status:

```bash
az provider show \
  --namespace Microsoft.CognitiveServices \
  --query "registrationState"
```

Wait until state is `Registered`.
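The wait can be automated with a polling loop (a sketch; assumes `az` is authenticated):

```bash
# Poll every 10 seconds until the provider reports Registered
until [ "$(az provider show \
  --namespace Microsoft.CognitiveServices \
  --query "registrationState" -o tsv)" = "Registered" ]; do
  echo "Still registering..."
  sleep 10
done
echo "Microsoft.CognitiveServices is registered"
```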

**Step 4: Verify registration**

```bash
az provider list --query "[?namespace=='Microsoft.CognitiveServices']"
```

### Required Permissions

To register a resource provider, you need one of:
- **Subscription Owner** role
- **Contributor** role
- **Custom role** with `Microsoft.*/register/action` permission

**If you are not the subscription owner:**
1. Ask someone with the **Owner** or **Contributor** role to register the provider for you
2. Alternatively, ask them to grant you the `/register/action` privilege so you can register it yourself

**Alternative registration methods:**
- **Azure CLI** (recommended): `az provider register --namespace Microsoft.CognitiveServices`
- **Azure Portal**: Navigate to Subscriptions → Resource providers → Microsoft.CognitiveServices → Register
- **PowerShell**: `Register-AzResourceProvider -ProviderNamespace Microsoft.CognitiveServices`

License (MIT)

MIT License

Copyright (c) 2025 Microsoft Corporation.

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.