Installation
gh skills-hub install microsoft-foundry Don't have the extension? Run gh extension install samueltauil/skills-hub first.
Download and extract to your repository:
.github/skills/microsoft-foundry/ Extract the ZIP to .github/skills/ in your repo. The folder name must match microsoft-foundry for Copilot to auto-discover it.
Skill Files (93)
SKILL.md 15.4 KB
---
name: microsoft-foundry
description: "Deploy, evaluate, and manage Foundry agents end-to-end: Docker build, ACR push, hosted/prompt agent create, container start, batch eval, continuous eval, prompt optimizer workflows, agent.yaml, dataset curation from traces. USE FOR: deploy agent to Foundry, hosted agent, create agent, invoke agent, evaluate agent, run batch eval, continuous eval, continuous monitoring, continuous eval status, optimize prompt, improve prompt, prompt optimizer, optimize agent instructions, improve agent instructions, optimize system prompt, deploy model, Foundry project, RBAC, role assignment, permissions, quota, capacity, region, troubleshoot agent, deployment failure, create dataset from traces, dataset versioning, eval trending, create AI Services, Cognitive Services, create Foundry resource, provision resource, knowledge index, agent monitoring, customize deployment, onboard, availability. DO NOT USE FOR: Azure Functions, App Service, general Azure deploy (use azure-deploy), general Azure prep (use azure-prepare)."
license: MIT
metadata:
author: Microsoft
version: "1.1.14"
---
# Microsoft Foundry Skill
This skill helps developers work with Microsoft Foundry resources, covering model discovery and deployment, complete dev lifecycle of AI agent, evaluation workflows, and troubleshooting.
## Pre-Execution Requirements
> **MANDATORY: Before executing ANY workflow, you MUST first call the Azure MCP `foundry` tool and inspect the available Foundry MCP tools and related parameters.** Treat this initial `foundry` call as a discovery/help step. For this skill, Azure MCP `foundry` is the required entry point for Foundry-related MCP operations.
## Sub-Skills
> **MANDATORY: Before executing ANY workflow-specific steps, you MUST read the corresponding sub-skill document.** Do not call workflow-specific MCP tools for a workflow without reading its skill document. This applies even if you already know the MCP tool parameters β the skill document contains required workflow steps, pre-checks, and validation logic that must be followed. This rule applies on every new user message that triggers a different workflow, even if the skill is already loaded.
This skill includes specialized sub-skills for specific workflows. **Use these instead of the main skill when they match your task:**
| Sub-Skill | When to Use | Reference |
|-----------|-------------|-----------|
| **deploy** | Containerize, build, push to ACR, create/update/clone agent deployments | [deploy](foundry-agent/deploy/deploy.md) |
| **invoke** | Send messages to an agent, single or multi-turn conversations | [invoke](foundry-agent/invoke/invoke.md) |
| **observe** | Evaluate agent quality, run batch evals, analyze failures, optimize prompts, improve agent instructions, compare versions, set up CI/CD monitoring, and enable continuous production evaluation | [observe](foundry-agent/observe/observe.md) |
| **trace** | Query traces, analyze latency/failures, correlate eval results to specific responses via App Insights `customEvents` | [trace](foundry-agent/trace/trace.md) |
| **troubleshoot** | View hosted agent logs, query telemetry, diagnose failures | [troubleshoot](foundry-agent/troubleshoot/troubleshoot.md) |
| **create** | Create new hosted agent applications. Supports Microsoft Agent Framework, LangGraph, or custom frameworks in Python or C#, across `responses` or `invocations` protocols. | [create](foundry-agent/create/create-hosted.md) |
| **faos-optimize** | Convert existing Python agent code to a FAOS (Foundry Agent Optimization Service) optimization-ready version by wiring evaluator-targeted instructions/model/temperature knobs, then stop for review before deployment. | [faos-optimize](foundry-agent/faos-optimize/faos-optimize.md) |
| **eval-datasets** | Harvest production traces into evaluation datasets, manage dataset versions and splits, track evaluation metrics over time, detect regressions, and maintain full lineage from trace to deployment. Use for: create dataset from traces, dataset versioning, evaluation trending, regression detection, dataset comparison, eval lineage. | [eval-datasets](foundry-agent/eval-datasets/eval-datasets.md) |
| **project/create** | Creating a new Azure AI Foundry project for hosting agents and models. Use when onboarding to Foundry or setting up new infrastructure. | [project/create/create-foundry-project.md](project/create/create-foundry-project.md) |
| **resource/create** | Creating Azure AI Services multi-service resource (Foundry resource) using Azure CLI. Use when manually provisioning AI Services resources with granular control. | [resource/create/create-foundry-resource.md](resource/create/create-foundry-resource.md) |
| **private-network** | Answer questions about Foundry network isolation **and** deploy Foundry with VNet isolation (BYO VNet, Managed VNet, hybrid). Covers architecture concepts, template selection, deployment, and post-deployment validation. | [resource/private-network/private-network.md](resource/private-network/private-network.md) |
| **models/deploy-model** | Unified model deployment with intelligent routing. Handles quick preset deployments, fully customized deployments (version/SKU/capacity/RAI), and capacity discovery across regions. Routes to sub-skills: `preset` (quick deploy), `customize` (full control), `capacity` (find availability). | [models/deploy-model/SKILL.md](models/deploy-model/SKILL.md) |
| **quota** | Managing quotas and capacity for Microsoft Foundry resources. Use when checking quota usage, troubleshooting deployment failures due to insufficient quota, requesting quota increases, or planning capacity. | [quota/quota.md](quota/quota.md) |
| **rbac** | Managing RBAC permissions, role assignments, managed identities, and service principals for Microsoft Foundry resources. Use for access control, auditing permissions, and CI/CD setup. | [rbac/rbac.md](rbac/rbac.md) |
> π‘ **Tip:** For a complete onboarding flow: `project/create` (public) or `private-network` (VNet isolation) β `models/deploy-model` β agent workflows (`create` β `deploy` β `invoke`).
> π‘ **Model Deployment:** Use `models/deploy-model` for all deployment scenarios β it intelligently routes between quick preset deployment, customized deployment with full control, and capacity discovery across regions.
> π‘ **Prompt Optimization:** For requests like "optimize my prompt" or "improve my agent instructions," load [observe](foundry-agent/observe/observe.md) and use the `prompt_optimize` MCP tool through that eval-driven workflow.
## Infrastructure Lifecycle
Match user intent to the correct infrastructure workflow.
| User Intent | Workflow |
|-------------|---------|
| "Create Foundry" / "Set up Foundry" (ambiguous) | Use `AskUserQuestion`: (a) just an AI Services resource, (b) a project with public access, or (c) a project with network isolation? Route: (a) β [resource/create](resource/create/create-foundry-resource.md), (b) β [project/create](project/create/create-foundry-project.md), (c) β [private-network](resource/private-network/private-network.md) |
| Set up Foundry with VNet isolation | [private-network](resource/private-network/private-network.md) |
| Create a Foundry project (public) | [project/create](project/create/create-foundry-project.md) |
| Create a bare Foundry resource | [resource/create](resource/create/create-foundry-resource.md) |
## Agent Development Lifecycle
Match user intent to the correct agent workflow. Read each sub-skill in order before executing.
| User Intent | Workflow (read in order) |
|-------------|------------------------|
| Create a new agent from scratch | [create](foundry-agent/create/create-hosted.md) β [deploy](foundry-agent/deploy/deploy.md) β [invoke](foundry-agent/invoke/invoke.md) |
| Make existing Python agent FAOS optimizable | [faos-optimize](foundry-agent/faos-optimize/faos-optimize.md) β review β deploy β invoke |
| Deploy an agent (code already exists) | deploy β invoke |
| Update/redeploy an agent after code changes | deploy β invoke |
| Invoke/test/chat with an agent | invoke |
| Optimize / improve agent prompt or instructions | observe (Step 4: Optimize) |
| Evaluate and optimize agent (full loop) | observe |
| Enable continuous evaluation monitoring | observe (Step 6: CI/CD & Monitoring) |
| Troubleshoot an agent issue | invoke β troubleshoot |
| Fix a broken agent (troubleshoot + redeploy) | invoke β troubleshoot β apply fixes β deploy β invoke |
## Agent: .foundry Workspace Standard
Every agent source folder should keep Foundry-specific state under `.foundry/`:
```text
<agent-root>/
.foundry/
agent-metadata.yaml
agent-metadata.prod.yaml
datasets/
evaluators/
results/
```
- `agent-metadata.yaml` is the preferred local/dev metadata file. Optional sidecar files such as `agent-metadata.prod.yaml` can hold a single prod or CI-targeted environment without mixing multiple environments in one file.
- `datasets/` and `evaluators/` are local cache folders. Reuse them when they are current, and ask before refreshing or overwriting them.
- See [Agent Metadata Contract](references/agent-metadata-contract.md) for the canonical schema and workflow rules.
## Agent: Setup References
- [Standard Agent Setup](references/standard-agent-setup.md) - Standard capability-host setup with customer-managed data, search, and AI Services resources.
## Agent: Project Context Resolution
Agent skills should run this step **only when they need configuration values they don't already have**. If a value (for example, agent root, environment, project endpoint, or agent name) is already known from the user's message or a previous skill in the same session, skip resolution for that value.
### Step 1: Discover Agent Roots
Search the workspace for `.foundry/` folders that contain `agent-metadata.yaml` or `agent-metadata.<env>.yaml`.
- **One match** β use that agent root.
- **Multiple matches** β require the user to choose the target agent folder.
- **No matches** β for create/deploy workflows, seed a new `.foundry/` folder during setup; for all other workflows, stop and ask the user which agent source folder to initialize.
After selecting an agent root, keep all local `.foundry` cache inspection, source inspection, evaluator suggestions, dataset suggestions, and prompt-optimization context inside that folder only. Do **not** scan sibling agent folders unless the user explicitly switches roots.
### Step 2: Select Metadata File and Resolve Environment
Inside the selected agent root, choose the metadata file in this order:
1. Metadata filename or path explicitly provided by the user or workflow
2. If an explicit environment is already known and `.foundry/agent-metadata.<env>.yaml` exists, use that file
3. `.foundry/agent-metadata.yaml`
4. If multiple metadata files remain and no rule above selects one, prompt the user to choose
Read the selected metadata file and resolve the environment in this order:
1. Environment explicitly named by the user
2. If the selected metadata file defines exactly one environment, use it
3. Environment already selected earlier in the session
4. `defaultEnvironment` from metadata
If the selected metadata file still contains multiple environments and none of the rules above selects one, prompt the user to choose. Keep the selected agent root, metadata file, and environment visible in every workflow summary.
If the selected environment exposes older `testSuites[]` metadata but not `evaluationSuites[]`, treat `testSuites[]` as the source for this session and normalize each entry in memory to the `evaluationSuites[]` shape before continuing. If the metadata is older still and only exposes legacy `testCases[]`, normalize that list the same way. Preserve dataset and evaluator fields, keep any existing `tags`, and map legacy `priority` to `tags.tier` only when `tags.tier` is missing: `P0` -> `smoke`, `P1` -> `regression`, `P2` -> `coverage`.
### Step 3: Resolve Common Configuration
Use the selected environment in the selected metadata file as the primary source:
| Metadata Field | Resolves To | Used By |
|----------------|-------------|---------|
| `environments.<env>.projectEndpoint` | Project endpoint | deploy, invoke, observe, trace, troubleshoot |
| `environments.<env>.agentName` | Agent name | invoke, observe, trace, troubleshoot |
| `environments.<env>.azureContainerRegistry` | ACR registry name / image URL prefix | deploy |
| `environments.<env>.evaluationSuites[]` | Dataset + evaluator + tag bundles | observe, eval-datasets |
### Step 4: Bootstrap Missing Metadata (Create/Deploy Only)
If create/deploy is initializing a new `.foundry` workspace and metadata fields are still missing, check if `azure.yaml` exists in the project root. If found, run `azd env get-values` and use it to seed `agent-metadata.yaml` by default, or `agent-metadata.<env>.yaml` when the workflow explicitly targets a separate environment-specific file.
On any metadata write (deploy, auto-setup, dataset refresh, or trace-to-dataset update), persist only `evaluationSuites[]` in the selected metadata file. If the selected file is a preferred single-environment file, rewrite only that one environment block. If the selected file is a legacy multi-environment file, rewrite only the selected environment block. Never copy or merge environments across sibling metadata files automatically. If the selected environment still uses older `testSuites[]` or legacy `testCases[]`, rewrite it to `evaluationSuites[]` and remove migrated `priority` fields from the rewritten entries.
| azd Variable | Seeds |
|-------------|-------|
| `AZURE_AI_PROJECT_ENDPOINT` or `AZURE_AIPROJECT_ENDPOINT` | `environments.<env>.projectEndpoint` |
| `AZURE_CONTAINER_REGISTRY_NAME` or `AZURE_CONTAINER_REGISTRY_ENDPOINT` | `environments.<env>.azureContainerRegistry` |
| `AZURE_SUBSCRIPTION_ID` | Azure subscription for trace/troubleshoot lookups |
### Step 5: Collect Missing Values
Use the `ask_user` or `askQuestions` tool **only for values not resolved** from the user's message, session context, metadata, or azd bootstrap. Common values skills may need:
- **Agent root** β Target folder containing `.foundry/agent-metadata*.yaml`
- **Metadata file** β `agent-metadata.yaml` for local/dev, or an explicit sidecar such as `agent-metadata.prod.yaml`
- **Environment** β `dev`, `prod`, or another environment key from metadata
- **Project endpoint** β AI Foundry project endpoint URL
- **Agent name** β Name of the target agent
> π‘ **Tip:** If the user already provides the agent path, environment, project endpoint, or agent name, extract it directly β do not ask again.
## Agent: Agent Types
All agent skills support two agent types:
| Type | Kind | Description |
|------|------|-------------|
| **Prompt** | `"prompt"` | LLM-based agents backed by a model deployment |
| **Hosted** | `"hosted"` | Container-based agents running custom code |
Use `agent_get` MCP tool to determine an agent's type when needed.
## Tool Usage Conventions
- Use the `ask_user` or `askQuestions` tool whenever collecting information from the user
- Use the `task` or `runSubagent` tool to delegate long-running or independent sub-tasks (e.g., env var scanning, status polling, Dockerfile generation)
- Prefer Azure MCP tools over direct CLI commands when available
- Reference official Microsoft documentation URLs instead of embedding CLI command syntax
## Additional Resources
- [Foundry Hosted Agents](https://learn.microsoft.com/azure/ai-foundry/agents/concepts/hosted-agents?view=foundry)
- [Foundry Agent Runtime Components](https://learn.microsoft.com/azure/ai-foundry/agents/concepts/runtime-components?view=foundry)
## SDK Quick Reference
- [Python](references/sdk/foundry-sdk-py.md)
create-hosted.md 18.8 KB
# Create Hosted Agent Application
Create new hosted agent applications for Microsoft Foundry, or convert existing agent projects to be Foundry-compatible using the hosting adapter.
## Quick Reference
| Property | Value |
|----------|-------|
| **Samples Repo** | `microsoft-foundry/foundry-samples` |
| **Python Samples** | `samples/python/hosted-agents/` |
| **C# Samples** | `samples/csharp/hosted-agents/` |
| **Hosted Agents Docs** | https://learn.microsoft.com/azure/ai-foundry/agents/concepts/hosted-agents |
| **Default Selection** | `Python` + `responses` + `Microsoft Agent Framework` |
| **Best For** | Creating new or converting existing agent projects for Foundry |
## When to Use This Skill
- Create a new hosted agent application from scratch (greenfield)
- Start from an official sample and customize it
- Convert an existing agent project to be Foundry-compatible (brownfield)
- Help user choose a language, protocol, framework, or sample for their agent
## Workflow
> Relative reference paths in this file are resolved from the directory containing `create.md`. For example, `./references/agentframework.md` means the file next to this document under `create/references/`, not a path relative to the runtime working directory.
### Step 1: Determine Scenario
Check the user's workspace for existing agent project indicators:
- **No agent-related code found** β **Greenfield**. Proceed to Greenfield Workflow (Step 2).
- **Existing agent code present** β **Brownfield**. Proceed to Brownfield Workflow.
### Step 2: Gather Requirements (Greenfield)
If the user hasn't already specified, use `ask_user` to collect in this order:
**Language:** Python (default) or C#.
**Protocol:**
| Protocol | Best For |
|----------|----------|
| `responses` (default) | Conversational agents using the OpenAI-compatible `/responses` contract |
| `invocations` | Arbitrary payloads, custom SSE behavior, protocol bridges, webhook-style callers, or client-managed sessions |
**Framework:**
The paths below refer to the framework-level directories in the Foundry sample repo. Choose the protocol-specific subpath in Step 3.
| Framework | Python Path | C# Path |
|-----------|-------------|---------|
| Microsoft Agent Framework (default) | `agent-framework` | `agent-framework` |
| LangGraph | `bring-your-own` | β Python only |
| Custom | `bring-your-own` | `bring-your-own` |
> β οΈ **Warning:** LangGraph is Python-only. For C# + LangGraph, suggest Microsoft Agent Framework or Custom instead.
> π‘ **Tip:** In the sample repo, **Custom** corresponds to the **Bring Your Own** lanes.
> π‘ **Tip:** LangGraph samples are under **Bring Your Own**, not under a separate top-level `langgraph` directory.
If user has no specific preference, suggest Python + `responses` + Microsoft Agent Framework as defaults.
In non-interactive or YOLO mode, default to Python + `responses` + Microsoft Agent Framework unless the user's request clearly requires another supported combination.
### Step 3: Browse and Select Sample
List available samples using the GitHub API. First resolve the `sample_browse_path` (the browse root) from the selected language, protocol, and framework:
| Selection | Sample Browse Path |
|-----------|--------------------|
| Python + Microsoft Agent Framework + `responses` | `samples/python/hosted-agents/agent-framework/responses/` |
| Python + Microsoft Agent Framework + `invocations` | `samples/python/hosted-agents/agent-framework/invocations/` |
| Python + LangGraph | `samples/python/hosted-agents/bring-your-own/{protocol}/langgraph-chat/` |
| Python + Custom | `samples/python/hosted-agents/bring-your-own/{protocol}/` |
| C# + Microsoft Agent Framework + `responses` | `samples/csharp/hosted-agents/agent-framework/` |
| C# + Microsoft Agent Framework + `invocations` | `samples/csharp/hosted-agents/agent-framework/invocations-echo-agent/` |
| C# + Custom | `samples/csharp/hosted-agents/bring-your-own/{protocol}/` |
Use the chosen lane to browse the repo under `sample_browse_path`:
```
GET https://api.github.com/repos/microsoft-foundry/foundry-samples/contents/{sample_browse_path}
```
If the user has specified what they want the agent to do, choose the most relevant or most simple sample under that lane and record its exact `selected_sample_path`. Only if the user has not given any preferences, present the sample directories under `sample_browse_path` to the user and help them choose based on their requirements (e.g., RAG, tools, multi-agent workflows, HITL).
If the requested combination does not have a real sample, say so clearly and suggest the nearest supported lane.
> β οΈ **Tools:** Hosted agents access tools through a **Foundry Toolbox MCP endpoint** β they do NOT wire tools directly. If the user wants an agent with tools (web search, AI search, code interpreter, MCP servers, etc.), select the `toolbox` samples (see [references/use-toolbox-in-hosted-agent.md#code-integration-patterns](references/use-toolbox-in-hosted-agent.md#code-integration-patterns)). These samples include Foundry Toolbox integration in the sample code out of the box, but the user still needs an actual toolbox resource β you'll resolve its endpoint in Step 6 (Verify Startup).
### Step 4: Download Sample Files
Download only the selected sample directory β do NOT clone the entire repo. Preserve the directory structure by creating subdirectories as needed.
Use the exact `selected_sample_path` selected in Step 3.
**Using `gh` CLI (preferred if available):**
```bash
gh api repos/microsoft-foundry/foundry-samples/contents/{selected_sample_path} \
--jq '.[] | select(.type=="file") | .download_url' | while read url; do
filepath="${url##*/{selected_sample_path}/}"
mkdir -p "$(dirname "$filepath")"
curl -sL "$url" -o "$filepath"
done
```
**Using curl (fallback):**
```bash
curl -s "https://api.github.com/repos/microsoft-foundry/foundry-samples/contents/{selected_sample_path}" | \
jq -r '.[] | select(.type=="file") | .path + "\t" + .download_url' | while IFS=$'\t' read path url; do
relpath="${path#{selected_sample_path}/}"
mkdir -p "$(dirname "$relpath")"
curl -sL "$url" -o "$relpath"
done
```
For nested directories, recursively fetch the GitHub contents API for entries where `type == "dir"` and repeat the download for each.
### Step 5: Customize and Implement
1. Read the sample's `README.md` and `agent.yaml` or `agent.manifest.yaml` to understand its structure
2. Read the sample code to understand patterns, protocol handling, and dependencies used
3. If using Agent Framework, follow the best practices in [references/agentframework.md](references/agentframework.md)
4. Implement the user's specific requirements on top of the sample
5. Update configuration (`.env`, dependency files, `agent.yaml`, `agent.manifest.yaml`) as needed, and keep the selected protocol consistent across code and config
6. Ensure the project is in a runnable state
### Step 6: Verify Startup
1. Install dependencies (use virtual environment for Python)
2. Ask user to provide values for `.env` variables if placeholders were used using `ask_user` tool.
- **If the agent uses tools / toolboxes**: resolve the toolbox endpoint per [references/use-toolbox-in-hosted-agent.md#resolve-toolbox-endpoint](references/use-toolbox-in-hosted-agent.md#resolve-toolbox-endpoint).
3. Run the main entrypoint
4. Fix startup errors and retry if needed
5. Send a protocol-appropriate test request to the correct endpoint:
- `responses` β `POST http://localhost:8088/responses`
- `invocations` β `POST http://localhost:8088/invocations`
6. Fix any errors from the test request and retry until it succeeds
7. Once startup and test request succeed, stop the server to prevent resource usage
**Guardrails:**
- β
Perform real run to catch startup errors
- β
Cleanup after verification (stop server)
- β
Ignore auth/connection/timeout errors (expected without Azure config)
- β Don't wait for user input or create test scripts
## Brownfield Workflow: Convert Existing Agent to Hosted Agent
Use this workflow when the user has an existing agent project that needs to be made compatible with Foundry hosted agent deployment. The key requirement is wrapping the existing agent with the appropriate hosting adapter.
### Step B1: Analyze Existing Project
Scan the project to determine:
1. **Language** β Python (look for `requirements.txt`, `pyproject.toml`, `*.py`) or C# (look for `*.csproj`, `*.cs`)
2. **Framework** β Identify which agent framework is in use:
| Indicator | Framework |
|-----------|-----------|
| Imports from `agent_framework` or `Microsoft.Agents.AI` | Microsoft Agent Framework |
| Imports from `langgraph`, `langchain` | LangGraph |
| No recognized framework imports, or other frameworks (e.g., Semantic Kernel, AutoGen, custom code) | Custom |
3. **Target protocol** β If the user has not specified one, infer whether the project should target `responses` or `invocations` based on the existing caller contract
4. **Entry point** β Identify the main script/entrypoint that creates and runs the agent
5. **Agent object** β Identify the agent instance that needs to be wrapped (e.g., a `BaseAgent` subclass, a compiled `StateGraph`, or an existing server/app)
### Step B2: Add Hosting Adapter Dependency
Add the correct adapter package based on framework, language, and protocol. Get the latest version from the package registry β do not hardcode versions.
**Python adapter packages:**
| Framework | Package(s) |
|-----------|------------|
| Microsoft Agent Framework | `responses`: `agent-framework-foundry-hosting`; `invocations`: `agent-framework-foundry-hosting` |
| LangGraph | `responses`: `azure-ai-agentserver-responses` + `azure-ai-agentserver-core`; `invocations`: `azure-ai-agentserver-invocations` + `azure-ai-agentserver-core` |
| Custom | `responses`: `azure-ai-agentserver-responses`; `invocations`: `azure-ai-agentserver-invocations` |
**.NET adapter packages:**
| Framework | Package(s) |
|-----------|------------|
| Microsoft Agent Framework | `responses`: `Microsoft.Agents.AI.Foundry.Hosting`; `invocations`: `Microsoft.Agents.AI.Foundry.Hosting` + `Azure.AI.AgentServer.Invocations` |
| Custom | `responses`: `Azure.AI.AgentServer.Responses`; `invocations`: `Azure.AI.AgentServer.Invocations` |
Add the package to the project's dependency file (`requirements.txt`, `pyproject.toml`, or `.csproj`). For Python, also add `python-dotenv` if not present.
### Step B3: Wrap Agent with Hosting Adapter
Modify the project's main entrypoint to wrap the existing agent with the adapter. The approach differs by framework and protocol:
**Microsoft Agent Framework + `responses` (Python):**
- Import `ResponsesHostServer` from the adapter package
- Pass the agent instance (from `agent_framework` package) to the adapter
- Call `.run()` on the adapter as the default entrypoint
**Microsoft Agent Framework + `invocations` (Python):**
- Use `InvocationAgentServerHost()`
- Implement an `@app.invoke_handler`
- Manage session state if the agent needs multi-turn memory
**Microsoft Agent Framework + `responses` (C#):**
- Register Foundry responses hosting and map the `responses` protocol
**Microsoft Agent Framework + `invocations` (C#):**
- Register invocations services and an invocation handler
- Map the `invocations` protocol
**LangGraph:**
- Python only
- Follow the `bring-your-own/{protocol}/langgraph-chat` sample for the selected protocol lane
**Custom:**
- Follow the corresponding `bring-your-own/{protocol}` sample for the selected language
- Prefer the protocol SDK sample for the selected lane instead of inventing a custom contract when a sample already exists
> β οΈ **Warning:** The adapter MUST be the default entrypoint (no flags required to start). This is required for both local debugging and containerized deployment.
### Step B4: Configure Environment
1. Create or update a `.env` file with required environment variables (project endpoint, model deployment name, etc.)
- **If the agent uses tools / toolboxes**: resolve the toolbox endpoint per [references/use-toolbox-in-hosted-agent.md#resolve-toolbox-endpoint](references/use-toolbox-in-hosted-agent.md#resolve-toolbox-endpoint).
2. For Python: ensure the code uses `load_dotenv(override=False)` so Foundry-injected environment variables are available at runtime.
3. If the project uses Azure credentials: ensure Python uses `azure.identity.DefaultAzureCredential` for **local development**. In production, use `ManagedIdentityCredential`. See [auth-best-practices.md](../../references/auth-best-practices.md)
### Step B5: Create agent.yaml
Create an `agent.yaml` file in the project root. This file defines the agent's metadata and deployment configuration for Foundry. Required fields:
- `name` β Unique identifier (alphanumeric + hyphens, max 63 chars)
- `description` β What the agent does
- `template.kind` β Must be `hosted`
- `template.protocols` β Must include the selected protocol and matching version from the chosen sample
- `template.environment_variables` β List all environment variables the agent needs at runtime
Refer to the chosen sample's `agent.yaml` or `agent.manifest.yaml` in the [foundry-samples repo](https://github.com/microsoft-foundry/foundry-samples/tree/main/samples/python/hosted-agents) for the exact schema.
### Step B6: Create Dockerfile
Create a `Dockerfile` if one doesn't exist. Requirements:
- Base image appropriate for the language (e.g., `python:3.12-slim` for Python, `mcr.microsoft.com/dotnet/sdk` for C#)
- Copy source code into the container
- Install dependencies
- Expose port **8088** (the adapter's default port)
- Set the main entrypoint as the CMD
> β οΈ **Warning:** When building, MUST use `--platform linux/amd64`. Hosted agents run on Linux AMD64 infrastructure. Images built for other architectures (e.g., ARM64 on Apple Silicon) will fail.
Refer to the chosen sample's `Dockerfile` in the [foundry-samples repo](https://github.com/microsoft-foundry/foundry-samples/tree/main/samples/python/hosted-agents) for the exact pattern.
### Step B7: Test Locally
1. Install dependencies (use virtual environment for Python)
2. Run the main entrypoint β the adapter should start an HTTP server on `localhost:8088`
3. Send a protocol-appropriate test request to either `/responses` or `/invocations`
4. Verify the response follows the expected protocol shape for the selected lane
5. Fix any errors and retry until the test request succeeds
6. Stop the server
> π‘ **Tip:** If auth/connection errors occur for Azure services, that's expected without real Azure credentials configured. The key validation is that the HTTP server starts and accepts requests.
## Common Guidelines
IMPORTANT: YOU MUST FOLLOW THESE.
Apply these to both greenfield and brownfield projects:
1. **Sample-first** β Start from a real sample in the current `foundry-samples` repo. Do not invent unsupported combinations, paths, or protocol behavior.
2. **Protocol consistency** β Keep the selected protocol consistent across sample choice, code, config, and verification steps.
3. **Logging** β Implement proper logging using the language's standard logging framework (Python `logging` module, .NET `ILogger`). Hosted agents stream container stdout/stderr logs to Foundry, so all log output is visible via the troubleshoot workflow. Use structured log levels (INFO, WARNING, ERROR) and include context like request IDs and agent names.
4. **Framework-specific best practices** β When using Microsoft Agent Framework, read the [Agent Framework best practices](references/agentframework.md) for hosting adapter setup, credential patterns, and debugging guidance.
5. **Deploy handoff** β After the agent has been created and local verification succeeds, explicitly tell the user that they can deploy the agent if they want, and ask them to say `deploy agent to foundry` to continue with the deploy sub-skill.
6. **Tool integration** β Hosted agents access tools through [Foundry Toolbox](references/use-toolbox-in-hosted-agent.md), NOT by wiring tools directly. If the user needs tools (web search, AI search, code execution, file search, MCP servers, etc.), follow the toolbox integration guide. The toolbox provides a single MCP-compatible endpoint that handles credential injection and tool discovery.
7. **Reserved environment variables** β The Foundry platform injects environment variables into every hosted agent container at startup. You MUST NOT generate, suggest, or configure any of these in `.env` files, `agent.yaml` `environment_variables`, or application code:
**Blocked prefixes** (any variable starting with these is reserved):
- `FOUNDRY_*` β platform-injected identity, session, project, and toolset variables
- `AGENT_*` β reserved for platform use
**Exact reserved names** (platform-managed, overwritten at runtime):
- `PORT` β HTTP listen port (default `8088`)
- `HOME` β session filesystem path (`/home/session`)
- `SSE_KEEPALIVE_INTERVAL` β SSE keep-alive config
- `APPLICATIONINSIGHTS_CONNECTION_STRING` β observability
- `OTEL_EXPORTER_OTLP_ENDPOINT` β OTLP collector endpoint
**Key `FOUNDRY_*` variables available at runtime** (read-only, do not set):
- `FOUNDRY_PROJECT_ENDPOINT` β project endpoint URL for calling Azure services
- `FOUNDRY_AGENT_NAME` β the deployed agent's name
- `FOUNDRY_AGENT_VERSION` β the deployed agent's version
- `FOUNDRY_TOOLBOX_ENDPOINT` β MCP-compatible toolbox endpoint (if toolbox is configured)
If user code needs to read these values at runtime (e.g., `FOUNDRY_PROJECT_ENDPOINT` to call Azure services), read them from the environment β do not set or override them.
## Coding Tips
Use these when generating or modifying project code:
1. **Create a `.gitignore` file** β After generating code, create a `.gitignore` file if one does not already exist. If one already exists, update it as needed.
- Choose the ignore entries based on the language, framework, and files generated.
- Do not leave the project with no ignored files.
- For Python projects, `.venv/` MUST be ignored at a minimum.
## Non-Interactive / YOLO Mode
When running in non-interactive mode (e.g., YOLO mode), skip selection prompts and use these defaults unless the user has already specified otherwise:
- **Language** β `Python`
- **Protocol** β `responses`
- **Framework** β `Microsoft Agent Framework`
If the user's request clearly requires another supported lane, use that lane instead of forcing the defaults.
## Error Handling
| Error | Cause | Resolution |
|-------|-------|------------|
| GitHub API rate limit | Too many requests | Authenticate with `gh auth login` |
| `gh` not available | CLI not installed | Use curl REST API fallback |
| Sample not found | Path changed in repo or selected lane has no matching sample | List the selected parent directory again and choose a current sample |
| Requested combination not supported | Example: C# + LangGraph | Explain the gap and switch to the nearest supported lane |
| Protocol mismatch | Code, `agent.yaml`, and test request are not aligned | Make all three match the selected protocol |
| Dependency install fails | Version conflicts | Use versions from the selected sample's own dependency file |
create-prompt.md 3.9 KB
# Create Prompt Agent
Create and manage prompt agents in Azure Foundry Agent Service using MCP tools or Python SDK. For hosted agents (container-based), see [create-hosted.md](create-hosted.md).
## Quick Reference
| Property | Value |
|----------|-------|
| **Agent Type** | Prompt (`kind: "prompt"`) |
| **Primary Tool** | Foundry MCP server (`foundry_agents_*`) |
| **Fallback SDK** | `azure-ai-projects` v2.x preview |
| **Auth** | `DefaultAzureCredential` / `az login` |
## Workflow
```
User Request (create/list/get/update/delete agent)
β
βΌ
Step 1: Resolve project context (endpoint + credentials)
β
βΌ
Step 2: Try MCP tool for the operation
β ββ β
MCP available β Execute via MCP tool β Done
β ββ β MCP unavailable β Continue to Step 3
β
βΌ
Step 3: Fall back to SDK
β Read references/sdk-operations.md for code
β
βΌ
Step 4: Execute and confirm result
```
### Step 1: Resolve Project Context
The user needs a Foundry project endpoint. Check for:
1. `PROJECT_ENDPOINT` environment variable
2. Ask the user for their project endpoint
3. Use `foundry_resource_get` MCP tool to discover it
Endpoint format: `https://<resource>.services.ai.azure.com/api/projects/<project>`
### Step 2: Create Agent (MCP β Preferred)
For a **prompt agent**:
- Provide: agent name, model deployment name, instructions
- Optional: tools (code interpreter, file search, function calling, web search, Bing grounding, memory)
For a **workflow**:
- Workflows are created in the Foundry portal visual builder
- Use MCP to create the individual agents that participate in the workflow
- Direct the user to the Foundry portal for workflow assembly
### Step 3: SDK Fallback
If MCP tools are unavailable, use the `azure-ai-projects` SDK:
- See [SDK Operations](references/sdk-operations.md) for create, list, update, delete code samples
- See [Agent Tools](references/agent-tools.md) for adding tools to agents
### Step 4: Add Tools (Optional)
> β οΈ **MANDATORY:** Before configuring any tool, **read its reference documentation** linked below to understand prerequisites, required parameters, and setup steps. Do not attempt to add a tool without first reviewing its reference.
| Tool Category | Reference |
|---------------|-----------|
| Code Interpreter, Function Calling | [Simple Tools](references/agent-tools.md) |
| File Search (requires vector store) | [File Search](references/tool-file-search.md) |
| Web Search (default, no setup needed) | [Web Search](references/tool-web-search.md) |
| Bing Grounding (explicit request only) | [Bing Grounding](references/tool-bing-grounding.md) |
| Azure AI Search (private data) | [Azure AI Search](references/tool-azure-ai-search.md) |
| MCP Servers | [MCP Tool](references/tool-mcp.md) |
| Memory (persistent across sessions) | [Memory](references/tool-memory.md) |
| Connections (for tools that need them) | [Project Connections](../../project/connections.md) |
> β οΈ **Web Search Default:** Use `WebSearchPreviewTool` for web search. Only use `BingGroundingAgentTool` when the user explicitly requests Bing Grounding.
## Error Handling
| Error | Cause | Resolution |
|-------|-------|------------|
| Agent creation fails | Missing model deployment | Deploy a model first via `foundry_models_deploy` or portal |
| MCP tool not found | MCP server not running | Fall back to SDK β see [SDK Operations](references/sdk-operations.md) |
| Permission denied | Insufficient RBAC | Need `Azure AI User` role on the project |
| Agent name conflict | Name already exists | Use a unique name or update the existing agent |
| Tool not available | Tool not configured for project | Verify tool prerequisites (e.g., Bing resource for grounding) |
| SDK version mismatch | Using 1.x instead of 2.x | Install `azure-ai-projects --pre` for v2.x preview |
| Tenant mismatch | MCP token tenant differs from resource tenant | Fall back to SDK β `DefaultAzureCredential` resolves the correct tenant |
agent-tools.md 2.7 KB
# Agent Tools β Simple Tools
Add tools to agents to extend capabilities. This file covers tools that work without external connections. For tools requiring connections/RBAC setup, see:
- [Web Search tool](tool-web-search.md) β real-time public web search with citations (default for web search)
- [Bing Grounding tool](tool-bing-grounding.md) β web search via dedicated Bing resource (only when explicitly requested)
- [Azure AI Search tool](tool-azure-ai-search.md) β private data grounding with vector search
- [MCP tool](tool-mcp.md) β remote Model Context Protocol servers
## Code Interpreter
Enables agents to write and run Python in a sandboxed environment. Supports data analysis, chart generation, and file processing. Has [additional charges](https://azure.microsoft.com/pricing/details/cognitive-services/openai-service/) beyond token-based fees.
> Sessions: 1-hour active / 30-min idle timeout. Each conversation = separate billable session.
For code samples, see: [Code Interpreter tool documentation](https://learn.microsoft.com/azure/ai-foundry/agents/how-to/tools/code-interpreter?view=foundry)
## Function Calling
Define custom functions the agent can invoke. Your app executes the function and returns results. Runs expire 10 minutes after creation β return tool outputs promptly.
> **Security:** Treat tool arguments as untrusted input. Don't pass secrets in tool output. Use `strict=True` for schema validation.
For code samples, see: [Function Calling tool documentation](https://learn.microsoft.com/azure/ai-foundry/agents/how-to/tools/function-calling?view=foundry)
## Tool Summary
| Tool | Connection? | Reference |
|------|-------------|-----------|
| `CodeInterpreterTool` | No | This file |
| `FileSearchTool` | No (vector store required) | [tool-file-search.md](tool-file-search.md) |
| `FunctionTool` | No | This file |
| `WebSearchPreviewTool` | No | [tool-web-search.md](tool-web-search.md) |
| `BingGroundingAgentTool` | Yes (Bing) | [tool-bing-grounding.md](tool-bing-grounding.md) |
| `AzureAISearchAgentTool` | Yes (Search) | [tool-azure-ai-search.md](tool-azure-ai-search.md) |
| `MCPTool` | Optional | [tool-mcp.md](tool-mcp.md) |
> β οΈ **Default for web search:** Use `WebSearchPreviewTool` unless the user explicitly requests Bing Grounding or Bing Custom Search.
> Combine multiple tools on one agent. The model decides which to invoke.
## References
- [Tool Catalog](https://learn.microsoft.com/azure/ai-foundry/agents/concepts/tool-catalog?view=foundry)
- [Code Interpreter](https://learn.microsoft.com/azure/ai-foundry/agents/how-to/tools/code-interpreter?view=foundry)
- [Function Calling](https://learn.microsoft.com/azure/ai-foundry/agents/how-to/tools/function-calling?view=foundry)
agentframework.md 4.7 KB
# Microsoft Agent Framework β Best Practices for Hosted Agents
Best practices when building hosted agents with Microsoft Agent Framework for deployment to Foundry Agent Service.
## Official Resources
| Resource | URL |
|----------|-----|
| **GitHub Repo** | https://github.com/microsoft/agent-framework |
| **MS Learn Overview** | https://learn.microsoft.com/agent-framework/overview/agent-framework-overview |
| **Quick Start** | https://learn.microsoft.com/agent-framework/tutorials/quick-start |
| **User Guide** | https://learn.microsoft.com/agent-framework/user-guide/overview |
| **Hosted Agents Concepts** | https://learn.microsoft.com/azure/ai-foundry/agents/concepts/hosted-agents |
| **Python Samples (MAF repo)** | https://github.com/microsoft/agent-framework/tree/main/python/samples |
| **.NET Samples (MAF repo)** | https://github.com/microsoft/agent-framework/tree/main/dotnet/samples |
| **PyPI** | https://pypi.org/project/agent-framework/ |
| **NuGet** | https://www.nuget.org/profiles/MicrosoftAgentFramework/ |
## Installation
**Python:** `pip install agent-framework agent-framework-foundry-hosting` (installs all sub-packages)
**.NET:** `dotnet add package Microsoft.Agents.AI`
## Hosting Adapter
Hosted agents must expose an HTTP server using the hosting adapter. This enables local testing and Foundry deployment with the same code.
**Python adapter packages:** `agent_framework_foundry_hosting`
**.NET adapter packages:** `Azure.AI.AgentServer.Core`, `Microsoft.Agents.AI.Foundry.Hosting`
The adapter handles protocol translation between Foundry request/response formats and your framework's native data structures, including conversation management, message serialization, and streaming.
> π‘ **Tip:** Make HTTP server mode the default entrypoint (no flags needed). This simplifies both local debugging and containerized deployment.
## Key Patterns
### Python: Credentials
For **local development**, use `DefaultAzureCredential` from `azure.identity`. In production, use `ManagedIdentityCredential`. See [auth-best-practices.md](../../../references/auth-best-practices.md).
### Python: Environment Variables
Always use `load_dotenv(override=False)` so environment variables set by Foundry at runtime take precedence over local `.env` values.
Required `.env` variables:
- `FOUNDRY_PROJECT_ENDPOINT` β project endpoint URL
- `FOUNDRY_MODEL_DEPLOYMENT_NAME` β model deployment name
### Authentication
If explicitly asked to use API key instead of managed identity, then use AzureOpenAIResponsesClient and pass in api_key parameter to it.
### Agent Naming Rules
Agent names must: start/end with alphanumeric characters, may contain hyphens in the middle, max 63 characters. Examples: `MyAgent`, `agent-1`. Invalid: `-agent`, `agent-`, `sample_agent`.
### Python: Virtual Environment
Always use a virtual environment. Never use bare `python` or `pip` β use venv-activated versions or full paths (e.g., `.venv/bin/pip`).
## Workflow Patterns
Agent Framework supports single-agent and multi-agent workflow patterns using graph-based orchestration:
- **Single Agent** β Basic agent with tools, RAG, or MCP integration
- **Multi-Agent Workflow** β Graph-based orchestration connecting multiple agents and deterministic functions
- **Advanced Patterns** β Reflection, switch-case, fan-out/fan-in, loop, human-in-the-loop
For workflow samples and advanced patterns, search the [Agent Framework GitHub repo](https://github.com/microsoft/agent-framework).
## Debugging
Use [AI Toolkit for VS Code](https://marketplace.visualstudio.com/items?itemName=ms-windows-ai-studio.windows-ai-studio) with the `agentdev` CLI tool for interactive debugging:
1. Install `debugpy` for VS Code Python Debugger support
2. Install `agent-dev-cli` (pre-release) for the `agentdev` command
3. Key debug tasks: `agentdev run <entrypoint>.py --port 8087` starts the agent HTTP server, `debugpy --listen 127.0.0.1:5679` attaches the debugger, and the `ai-mlstudio.openTestTool` VS Code command opens the Agent Inspector UI
For VS Code `launch.json` and `tasks.json` configuration templates, see [AI Toolkit Agent Inspector β Configure debugging manually](https://github.com/microsoft/vscode-ai-toolkit/blob/main/doc/agent-test-tool.md#configure-debugging-manually).
## Common Errors
| Error | Cause | Fix |
|-------|-------|-----|
| `ModuleNotFoundError` | Missing SDK | `pip install agent-framework agent-framework-foundry-hosting` in venv |
| Credential error | Wrong import | Use `azure.identity.DefaultAzureCredential` (local dev) or `ManagedIdentityCredential` (production) |
| Agent name validation error | Invalid characters | Use alphanumeric + hyphens, start/end alphanumeric, max 63 chars |
| Hosting adapter not found | Missing package | Install `agent-framework-foundry-hosting` |
sdk-operations.md 2.2 KB
# SDK Operations for Foundry Agent Service
Use the Foundry MCP tools for agent CRUD operations. When MCP tools are unavailable, use the `azure-ai-projects` Python SDK or REST API.
## Agent Operations via MCP
| Operation | MCP Tool | Description |
|-----------|----------|-------------|
| Create/Update agent | `agent_update` | Create a new agent or update an existing one (creates new version) |
| List/Get agents | `agent_get` | List all agents, or get a specific agent by name |
| Delete agent | `agent_delete` | Delete an agent |
| Invoke agent | `agent_invoke` | Send a message to an agent and get a response |
| Get schema | `agent_definition_schema_get` | Get the full JSON schema for agent definitions |
## SDK Agent Operations
When MCP tools are unavailable, use the `azure-ai-projects` Python SDK (`pip install azure-ai-projects --pre`):
```python
from azure.ai.projects import AIProjectClient
from azure.identity import DefaultAzureCredential
endpoint = "https://<resource>.services.ai.azure.com/api/projects/<project>"
client = AIProjectClient(endpoint=endpoint, credential=DefaultAzureCredential())
```
| Operation | SDK Method |
|-----------|------------|
| Create | `client.agents.create_version(agent_name, definition)` |
| List | `client.agents.list()` |
| Get | `client.agents.get(agent_name)` |
| Update | `client.agents.create_version(agent_name, definition)` (creates new version) |
| Delete | `client.agents.delete(agent_name)` |
| Chat | `client.get_openai_client().responses.create(model=<deployment>, input=<text>, extra_body={"agent": {"name": agent_name, "type": "agent_reference"}})` |
## Environment Variables
| Variable | Description |
|----------|-------------|
| `PROJECT_ENDPOINT` | Foundry project endpoint (`https://<resource>.services.ai.azure.com/api/projects/<project>`) |
| `MODEL_DEPLOYMENT_NAME` | Deployed model name (e.g., `gpt-4.1-mini`) |
## References
- [Agent quickstart](https://learn.microsoft.com/azure/ai-foundry/agents/quickstart?view=foundry)
- [Create agents](https://learn.microsoft.com/azure/ai-foundry/agents/how-to/create-agent?view=foundry)
- [Tool Catalog](https://learn.microsoft.com/azure/ai-foundry/agents/concepts/tool-catalog?view=foundry)
tool-azure-ai-search.md 3.3 KB
# Azure AI Search Tool
Ground agent responses with data from an Azure AI Search vector index. Requires a project connection and proper RBAC setup.
## Prerequisites
- Azure AI Search index with vector search configured:
- One or more `Edm.String` fields (searchable + retrievable)
- One or more `Collection(Edm.Single)` vector fields (searchable)
- At least one retrievable text field with content for citations
- A retrievable field with source URL for citation links
- A [project connection](../../../project/connections.md) between your Foundry project and search service
- `azure-ai-projects` package (`pip install azure-ai-projects --pre`)
## Required RBAC Roles
For **keyless authentication** (recommended), assign these roles to the **Foundry project's managed identity** on the Azure AI Search resource:
| Role | Scope | Purpose |
|------|-------|---------|
| **Search Index Data Contributor** | AI Search resource | Read/write index data |
| **Search Service Contributor** | AI Search resource | Manage search service config |
> **If RBAC assignment fails:** Ask the user to manually assign roles in Azure portal β AI Search resource β Access control (IAM). They need Owner or User Access Administrator on the search resource.
## Connection Setup
A project connection between your Foundry project and the Azure AI Search resource is required. See [Project Connections](../../../project/connections.md) for connection management via Foundry MCP tools.
## Query Types
| Value | Description |
|-------|-------------|
| `SIMPLE` | Keyword search |
| `VECTOR` | Vector similarity only |
| `SEMANTIC` | Semantic ranking |
| `VECTOR_SIMPLE_HYBRID` | Vector + keyword |
| `VECTOR_SEMANTIC_HYBRID` | Vector + keyword + semantic (default, recommended) |
## Tool Parameters
| Parameter | Required | Description |
|-----------|----------|-------------|
| `project_connection_id` | Yes | Connection ID (resolve via `project_connection_get`, typically after discovering the connection with `project_connection_list`) |
| `index_name` | Yes | Search index name |
| `top_k` | No | Number of results (default: 5) |
| `query_type` | No | Search type (default: `vector_semantic_hybrid`) |
| `filter` | No | OData filter applied to all queries |
## Limitations
- Only **one index per tool** instance. For multiple indexes, use connected agents each with their own index.
- Search resource and Foundry agent must be in the **same tenant**.
- Private AI Search resources require **standard agent deployment** with vNET injection.
## Troubleshooting
| Error | Cause | Fix |
|-------|-------|-----|
| 401/403 accessing index | Missing RBAC roles | Assign `Search Index Data Contributor` + `Search Service Contributor` to project managed identity |
| Index not found | Name mismatch | Verify `AI_SEARCH_INDEX_NAME` matches exactly (case-sensitive) |
| No citations in response | Instructions don't request them | Add citation instructions to agent prompt |
| Wrong connection endpoint | Connection points to different search resource | Re-create connection with correct endpoint |
## References
- [Azure AI Search tool documentation](https://learn.microsoft.com/azure/ai-foundry/agents/how-to/tools/azure-ai-search?view=foundry)
- [Tool Catalog](https://learn.microsoft.com/azure/ai-foundry/agents/concepts/tool-catalog?view=foundry)
- [Project Connections](../../../project/connections.md)
tool-bing-grounding.md 2.8 KB
# Bing Grounding Tool
Access real-time web information via Bing Search. Unlike the [Web Search tool](tool-web-search.md) (which works out of the box), Bing Grounding requires a dedicated Bing resource and a project connection.
> β οΈ **Warning:** Use the [Web Search tool](tool-web-search.md) as the default for web search. Only use Bing Grounding when the user **explicitly** requests Grounding with Bing Search or Grounding with Bing Custom Search.
## When to Use
- User explicitly asks for "Bing Grounding" or "Grounding with Bing Search"
- User explicitly asks for "Bing Custom Search" or "Grounding with Bing Custom Search"
- User needs to restrict web search to specific domains (Bing Custom Search)
- User has an existing Bing Grounding resource they want to use
## Prerequisites
- A [Grounding with Bing Search resource](https://portal.azure.com/#create/Microsoft.BingGroundingSearch) in Azure portal
- `Contributor` or `Owner` role at subscription/RG level to create Bing resource and get keys
- `Azure AI Project Manager` role on the project to create a connection
- A project connection configured with the Bing resource key β see [connections](../../../project/connections.md)
## Setup
1. Register the Bing provider: `az provider register --namespace 'Microsoft.Bing'`
2. Create a Grounding with Bing Search resource in the Azure portal
3. Create a project connection with the Bing resource key β see [connections](../../../project/connections.md)
4. Set `BING_PROJECT_CONNECTION_NAME` environment variable
## Important Disclosures
- Bing data flows **outside Azure compliance boundary**
- Review [Grounding with Bing terms of use](https://www.microsoft.com/bing/apis/grounding-legal-enterprise)
- Not supported with VPN/Private Endpoints
- Usage incurs costs β see [pricing](https://www.microsoft.com/bing/apis/grounding-pricing)
## Troubleshooting
| Issue | Cause | Resolution |
|-------|-------|------------|
| Connection not found | Name mismatch or wrong project | Use `project_connection_list` to find the correct `connectionName` |
| Unauthorized creating connection | Missing Azure AI Project Manager role | Assign role on the Foundry project |
| Bing resource creation fails | Provider not registered | Run `az provider register --namespace 'Microsoft.Bing'` |
| No results returned | Connection misconfigured | Verify Bing resource key and connection setup |
## References
- [Bing Grounding tool documentation](https://learn.microsoft.com/azure/ai-foundry/agents/how-to/tools/bing-grounding?view=foundry)
- [Tool Catalog](https://learn.microsoft.com/azure/ai-foundry/agents/concepts/tool-catalog?view=foundry)
- [Grounding with Bing Terms](https://www.microsoft.com/bing/apis/grounding-legal-enterprise)
- [Connections Guide](../../../project/connections.md)
- [Web Search Tool (default)](tool-web-search.md)
tool-file-search.md 2.6 KB
# File Search Tool
Enables agents to search through uploaded files using semantic and keyword search from vector stores. Supports a wide range of file formats including PDF, Markdown, Word, and more.
> β οΈ **Important:** Before creating an agent with file search, you **must** read the official documentation linked in the References section to understand prerequisites, supported file types, and vector store setup.
## Prerequisites
- A [basic or standard agent environment](https://learn.microsoft.com/azure/ai-foundry/agents/environment-setup)
- A **vector store** must be created before the agent β the `file_search` tool requires `vector_store_ids`
- Files must be uploaded to the vector store before the agent can search them
## Key Concepts
| Concept | Description |
|---------|-------------|
| **Vector Store** | A container that indexes uploaded files for semantic search. Must be created first. |
| **vector_store_ids** | Required parameter on the `file_search` tool β references the vector store(s) to search. |
| **File upload** | Files are uploaded to the project, then attached to a vector store for indexing. |
## Setup Workflow
```
1. Create a vector store (REST API: POST /vector_stores)
β
βΌ
2. (Optional) Upload files and attach to vector store
β
βΌ
3. Create agent with file_search tool referencing the vector_store_ids
β
βΌ
4. Agent can now search files in the vector store
```
> β οΈ **Warning:** Creating an agent with `file_search` without providing `vector_store_ids` will fail with a `400 BadRequest` error: `required: Required properties ["vector_store_ids"] are not present`.
## REST API Notes
When creating vector stores via `az rest`:
| Parameter | Value |
|-----------|-------|
| **Endpoint** | `https://<resource>.services.ai.azure.com/api/projects/<project>/vector_stores` |
| **API version** | `v1` |
| **Auth resource** | `https://ai.azure.com` |
## Troubleshooting
| Error | Cause | Fix |
|-------|-------|-----|
| `vector_store_ids` not present | Agent created without vector store | Create a vector store first, then pass its ID |
| 401 Unauthorized | Wrong auth resource for REST API | Use `--resource "https://ai.azure.com"` with `az rest` |
| Bad API version | Using ARM-style API version | Use `api-version=v1` for the data-plane vector store API |
| No search results | Vector store is empty | Upload files to the vector store before querying |
## References
- [File Search tool documentation](https://learn.microsoft.com/azure/ai-foundry/agents/how-to/tools/file-search?view=foundry&pivots=python)
- [Tool Catalog](https://learn.microsoft.com/azure/ai-foundry/agents/concepts/tool-catalog?view=foundry)
tool-mcp.md 3.3 KB
# MCP Tool (Model Context Protocol)
Connect agents to remote MCP servers to extend capabilities with external tools and data sources. MCP is an open standard for LLM tool integration.
## Prerequisites
- A remote MCP server endpoint (e.g., `https://api.githubcopilot.com/mcp`)
- For authenticated servers: a [project connection](../../../project/connections.md) storing credentials
- RBAC: **Contributor** or **Owner** role on the Foundry project
## Authenticated Server Connections
For authenticated MCP servers, create an `api_key` project connection to store credentials. Unauthenticated servers (public endpoints) don't need a connection β omit `project_connection_id`.
See [Project Connections](../../../project/connections.md) for connection management via Foundry MCP tools.
## MCPTool Parameters
| Parameter | Required | Description |
|-----------|----------|-------------|
| `server_label` | Yes | Unique label for this MCP server within the agent |
| `server_url` | Yes | Remote MCP server endpoint URL |
| `require_approval` | No | `"always"` (default), `"never"`, or `{"never": ["tool1"]}` / `{"always": ["tool1"]}` |
| `allowed_tools` | No | List of specific tools to enable (default: all) |
| `project_connection_id` | No | Connection ID for authenticated servers |
## Approval Workflow
1. Agent sends request β MCP server returns tool calls
2. Response contains `mcp_approval_request` items
3. Your code reviews tool name + arguments
4. Submit `McpApprovalResponse` with `approve=True/False`
5. Agent completes work using approved tool results
> **Best practice:** Always use `require_approval="always"` unless you fully trust the MCP server. Use `allowed_tools` to restrict which tools the agent can access.
## Hosting Local MCP Servers
Agent Service only accepts **remote** MCP endpoints. To use a local server, deploy it to:
| Platform | Transport | Notes |
|----------|-----------|-------|
| [Azure Container Apps](https://github.com/Azure-Samples/mcp-container-ts) | HTTP POST/GET | Any language, container rebuild needed |
| [Azure Functions](https://github.com/Azure-Samples/mcp-sdk-functions-hosting-python) | HTTP streamable | Python/Node/.NET/Java, key-based auth |
## Known Limitations
- **100-second timeout** for non-streaming MCP tool calls
- **Identity passthrough not supported in Teams** β agents published to Teams use project managed identity
- **Network-secured Foundry** can't use private MCP servers in same vNET β only public endpoints
## Troubleshooting
| Error | Cause | Fix |
|-------|-------|-----|
| `Invalid tool schema` | `anyOf`/`allOf` in MCP server definition | Update MCP server schema to use simple types |
| `Unauthorized` / `Forbidden` | Wrong credentials in connection | Verify connection credentials match server requirements |
| Model never calls MCP tool | Misconfigured server_label/url | Check `server_label`, `server_url`, `allowed_tools` values |
| Agent stalls after approval | Missing `previous_response_id` | Include `previous_response_id` in follow-up request |
| Timeout | Server takes >100s | Optimize server-side logic or break into smaller operations |
## References
- [MCP tool documentation](https://learn.microsoft.com/azure/ai-foundry/agents/how-to/tools/mcp?view=foundry)
- [Tool Catalog](https://learn.microsoft.com/azure/ai-foundry/agents/concepts/tool-catalog?view=foundry)
- [Project Connections](../../../project/connections.md)
tool-memory.md 4.7 KB
# Agent Memory
Managed long-term memory for Foundry agents. Enables agent continuity across sessions, devices, and workflows. Agents retain user preferences, conversation history, and deliver personalized experiences. Memory is stored in your project's owned storage.
## Prerequisites
- A [Foundry project](https://learn.microsoft.com/azure/ai-foundry/how-to/create-projects) with authorization configured
- A **chat model deployment** (e.g., `gpt-5.2`)
- An **embedding model deployment** (e.g., `text-embedding-3-small`) β see [Check Embedding Model](#check-embedding-model) below
- Python packages: `pip install azure-ai-projects azure-identity`
### Check Embedding Model
An embedding model is **required** before enabling memory. Check if one is already deployed:
Use `foundry_models_list` MCP tool to list all deployments and look for an embedding model (e.g., `text-embedding-3-small`, `text-embedding-3-large`, `text-embedding-ada-002`).
| Result | Action |
|--------|--------|
| β
Embedding model found | Note the deployment name and proceed |
| β No embedding model | Deploy one before enabling memory β see below |
### Deploy Embedding Model
If no embedding model exists, use `foundry_models_deploy` MCP tool with:
- `deploymentName`: `text-embedding-3-small` (or preferred name)
- `modelName`: `text-embedding-3-small`
- `modelFormat`: `OpenAI`
## Authorization and Permissions
| Role | Scope | Purpose |
|------|-------|---------|
| **Azure AI User** | AI Services resource | Assigned to project managed identity |
| **System-assigned managed identity** | Project | Must be enabled on the project |
**Setup steps:**
1. In Azure portal β project β **Resource Management** β **Identity** β enable system-assigned managed identity
2. On the AI Services resource β **Access control (IAM)** β assign **Azure AI User** to the project managed identity
## Workflow
```
User wants agent memory
β
βΌ
Step 1: Check for embedding model deployment
β ββ β
Found β Continue
β ββ β Not found β Deploy one (ask user)
β
βΌ
Step 2: Create memory store
β
βΌ
Step 3: Attach memory tool to agent
β
βΌ
Step 4: Test with conversation
```
## Key Concepts
### Memory Store Options
| Option | Description |
|--------|-------------|
| `chat_summary_enabled` | Summarize conversations for memory |
| `user_profile_enabled` | Build and maintain user profile |
| `user_profile_details` | Control what data gets stored (e.g., `"Avoid sensitive data such as age, financials, location, credentials"`) |
> π‘ **Tip:** Use `user_profile_details` to control what the agent stores β e.g., `"flight carrier preference and dietary restrictions"` for a travel agent, or exclude sensitive data.
### Scope
The `scope` parameter partitions memory per user:
| Scope Value | Behavior |
|-------------|----------|
| `{{$userId}}` | Auto-extracts TID+OID from auth token (recommended) |
| `"user_123"` | Static identifier β you manage user mapping |
### Memory Store Operations
| Operation | Description |
|-----------|-------------|
| Create | Initialize a memory store with chat/embedding models and options |
| List | List all memory stores in the project |
| Update | Update memory store description or configuration |
| Delete scope | Delete memories for a specific user scope |
| Delete store | Delete entire memory store (irreversible β all scopes lost) |
> β οΈ **Warning:** Deleting a memory store removes all memories across all scopes. Agents with attached memory stores lose access to historical context.
## Troubleshooting
| Issue | Cause | Resolution |
|-------|-------|------------|
| Auth/authorization error | Identity or managed identity lacks required roles | Verify roles in Authorization section; refresh access token for REST |
| Memories don't appear after conversation | Updates are debounced or still processing | Increase wait time or call update API with `update_delay=0` |
| Memory search returns no results | Scope mismatch between update and search | Use same scope value for storing and retrieving memories |
| Agent response ignores stored memory | Agent not configured with memory search tool | Confirm agent definition includes `MemorySearchTool` with correct store name |
| No embedding model available | Embedding deployment missing | Deploy an embedding model β see Check Embedding Model section |
## References
- [Memory tool documentation](https://learn.microsoft.com/azure/ai-foundry/agents/how-to/memory-usage?view=foundry)
- [Memory Concepts](https://learn.microsoft.com/azure/ai-foundry/agents/concepts/what-is-memory)
- [Tool Catalog](https://learn.microsoft.com/azure/ai-foundry/agents/concepts/tool-catalog?view=foundry)
- [Python Samples](https://github.com/Azure/azure-sdk-for-python/tree/main/sdk/ai/azure-ai-projects/samples/memories)
tool-web-search.md 3.6 KB
# Web Search Tool (Preview)
Enables agents to retrieve and ground responses with real-time public web information before generating output. Returns up-to-date answers with inline URL citations. This is the **default tool for web search** β no external resource or connection setup required.
> β οΈ **Warning:** For Bing Grounding or Bing Custom Search (which require a separate Bing resource and project connection), see [tool-bing-grounding.md](tool-bing-grounding.md). Only use those when explicitly requested.
## Important Disclosures
- Web Search (preview) uses Grounding with Bing Search and Grounding with Bing Custom Search, which are [First Party Consumption Services](https://www.microsoft.com/licensing/terms/product/Glossary/EAEAS) governed by [Grounding with Bing terms of use](https://www.microsoft.com/bing/apis/grounding-legal-enterprise) and the [Microsoft Privacy Statement](https://go.microsoft.com/fwlink/?LinkId=521839&clcid=0x409).
- The [Data Protection Addendum](https://aka.ms/dpa) **does not apply** to data sent to Grounding with Bing Search and Grounding with Bing Custom Search.
- Data transfers occur **outside compliance and geographic boundaries**.
- Usage incurs costs β see [pricing](https://www.microsoft.com/bing/apis/grounding-pricing).
## Prerequisites
- A [basic or standard agent environment](https://learn.microsoft.com/azure/ai-foundry/agents/environment-setup)
- Azure credentials configured (e.g., `DefaultAzureCredential`)
## Setup
No external resource or project connection is required. The web search tool works out of the box when added to an agent definition.
## Configuration Options
| Parameter | Description | Default |
|-----------|-------------|---------|
| `user_location` | Approximate location (country/region/city) for localized results | None |
| `search_context_size` | Context window space for search: `low`, `medium`, `high` | `medium` |
## Administrator Control
Admins can enable or disable web search at the subscription level via Azure CLI. Requires Owner or Contributor access.
- **Disable:** `az feature register --name OpenAI.BlockedTools.web_search --namespace Microsoft.CognitiveServices --subscription "<subscription-id>"`
- **Enable:** `az feature unregister --name OpenAI.BlockedTools.web_search --namespace Microsoft.CognitiveServices --subscription "<subscription-id>"`
## Security Considerations
- Treat web search results as **untrusted input**. Validate before use in downstream systems.
- Avoid sending secrets or sensitive data in prompts forwarded to external services.
## Troubleshooting
| Issue | Cause | Resolution |
|-------|-------|------------|
| No citations appear | Model didn't determine web search was needed | Update instructions to explicitly allow web search; ask queries requiring current info |
| Requests fail after enabling | Web search disabled at subscription level | Ask admin to enable β see Administrator Control above |
| Authentication errors (REST) | Bearer token missing, expired, or insufficient | Refresh token; confirm project/agent access |
| Outdated results | Content not recently indexed by Bing | Refine query to request most recent info |
| No results for specific topics | Query too narrow | Broaden query; niche topics may have limited coverage |
| Rate limiting (429) | Too many requests | Implement exponential backoff; space out requests |
## References
- [Web Search tool documentation](https://learn.microsoft.com/azure/ai-foundry/agents/how-to/tools/web-search?view=foundry)
- [Tool Catalog](https://learn.microsoft.com/azure/ai-foundry/agents/concepts/tool-catalog?view=foundry)
- [Bing Pricing](https://www.microsoft.com/bing/apis/grounding-pricing)
toolbox-reference.md 5.1 KB
# Toolbox Reference
Endpoint format, MCP protocol details, authentication, OAuth consent handling, endpoint testing, and troubleshooting for Foundry Toolboxes.
## Endpoint Format
The toolbox MCP endpoint is constructed from the **project endpoint** + **toolbox name**:
| Endpoint | URL |
|----------|-----|
| Latest version (default) | `{project_endpoint}/toolboxes/{toolbox_name}/mcp?api-version=v1` |
| Specific version | `{project_endpoint}/toolboxes/{toolbox_name}/versions/{version}/mcp?api-version=v1` |
- **Project endpoint** format: `https://<account>.services.ai.azure.com/api/projects/<project>`
- The latest-version endpoint always serves the toolbox's `default_version`.
- Use the specific-version endpoint to test a version before promoting it.
- **Required header** on every request: `Foundry-Features: Toolboxes=V1Preview`
- `?api-version=v1` query parameter is **required** β requests without it return HTTP 400.
## MCP Protocol
Toolboxes use **Model Context Protocol (MCP)** β JSON-RPC 2.0 over HTTP POST:
- **`initialize`** β Handshake to establish an MCP session. Returns a `mcp-session-id` header to include in subsequent requests.
- **`tools/list`** β Returns all available tools with names, descriptions, and input schemas.
- **`tools/call`** β Invokes a tool with arguments and returns structured results.
> `prompts/list` is **not supported** by the toolbox endpoint. Always pass `load_prompts=False` to MCP client constructors.
## Authentication
- **Agent β Toolbox:** Azure AD bearer token with scope `https://ai.azure.com/.default`, refreshed on every request.
- **Toolbox β External Services:** Managed by the platform via project connections (API keys, OAuth, managed identity).
## OAuth Consent Handling
When a toolbox includes an OAuth-based MCP connection (e.g., GitHub OAuth), the first call triggers a `CONSENT_REQUIRED` error (MCP error code `-32006`). The error message contains the consent URL.
**Agent code must handle this:**
1. Catch MCP error code `-32006` from `tools/call` or during MCP session initialization.
2. Extract the consent URL from the error message.
3. Log the URL and surface it to the user (e.g., print to stdout or return in the agent response).
4. After the user completes the OAuth flow in a browser, retry the call β subsequent calls succeed without re-prompting.
> This is a one-time flow per user per OAuth connection in a project. The agent should not silently swallow this error.
## Testing the Toolbox Endpoint
Before running the full agent, verify the toolbox MCP endpoint works end-to-end. Use `az login` for authentication, then test the three MCP operations in order:
**1. Get a bearer token:**
```bash
TOKEN=$(az account get-access-token --resource https://ai.azure.com --query accessToken -o tsv)
TOOLBOX_URL="https://<account>.services.ai.azure.com/api/projects/<project>/toolboxes/<name>/mcp?api-version=v1"
```
**2. Initialize MCP session:**
```bash
curl -sS -X POST "$TOOLBOX_URL" \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-H "Foundry-Features: Toolboxes=V1Preview" \
-d '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2024-11-05","capabilities":{},"clientInfo":{"name":"debug","version":"1.0.0"}}}' \
-D - | head -20
```
Save the `mcp-session-id` header from the response for subsequent calls.
**3. List tools:**
```bash
curl -sS -X POST "$TOOLBOX_URL" \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-H "Foundry-Features: Toolboxes=V1Preview" \
-H "mcp-session-id: <session-id-from-step-2>" \
-d '{"jsonrpc":"2.0","id":2,"method":"tools/list","params":{}}' | jq .
```
**Checklist:**
- Response contains `result.tools[]` with `len > 0`
- Each tool has `name`, `description`, and `inputSchema` with a `properties` field
- MCP tool names for remote servers are prefixed with `server_label` (e.g., `myserver.get_info`)
**4. Call a tool (optional):**
```bash
curl -sS -X POST "$TOOLBOX_URL" \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-H "Foundry-Features: Toolboxes=V1Preview" \
-H "mcp-session-id: <session-id-from-step-2>" \
-d '{"jsonrpc":"2.0","id":3,"method":"tools/call","params":{"name":"<tool_name>","arguments":{"query":"test"}}}' | jq .
```
> For a Python-based debug client, see the `_McpToolboxClient` class in the [BYO toolbox sample `main.py`](https://github.com/microsoft-foundry/foundry-samples/blob/main/samples/python/hosted-agents/bring-your-own/responses/bring-your-own-toolbox/main.py) β it implements `initialize`, `list_tools`, and `call_tool` using raw `httpx` calls.
## Troubleshooting
| Error | Cause | Resolution |
|-------|-------|------------|
| CONSENT_REQUIRED (code -32006) | OAuth MCP connection needs user consent | Open consent URL in browser, complete OAuth flow, retry |
| 401 on MCP calls | Expired token or wrong scope | Use scope `https://ai.azure.com/.default` and refresh token |
| 500 on `prompts/list` | Not supported by toolbox endpoint | Pass `load_prompts=False` to MCP client constructor |
| 500 with non-streaming `tools/call` | Non-streaming not supported | Always use `stream=True` for toolbox MCP tools |
use-toolbox-in-hosted-agent.md 5.5 KB
# Use Toolbox in a Hosted Agent
Hosted agents access Foundry-managed tools through a **Toolbox MCP endpoint**. Unlike prompt agents that wire tools directly, hosted agents connect to a single MCP-compatible endpoint that exposes all configured tools. The platform handles credential injection, token refresh, and policy enforcement.
> π For endpoint format, MCP protocol details, auth, OAuth consent handling, testing, and troubleshooting, see [toolbox-reference.md](toolbox-reference.md).
## Quick Reference
| Property | Value |
|----------|-------|
| **Toolbox Docs** | https://learn.microsoft.com/azure/foundry/agents/how-to/tools/toolbox |
| **Default Sample (Python)** | https://github.com/microsoft-foundry/foundry-samples/tree/main/samples/python/toolbox/maf |
| **Python Hosted Agent β `responses`** | https://github.com/microsoft-foundry/foundry-samples/tree/main/samples/python/hosted-agents/bring-your-own/responses |
| **Python Hosted Agent β `invocations`** | https://github.com/microsoft-foundry/foundry-samples/tree/main/samples/python/hosted-agents/bring-your-own/invocations |
| **C# (.NET) Samples** | https://github.com/microsoft-foundry/foundry-samples/tree/main/samples/csharp/toolbox |
| **Supported Tool Types & Auth** | https://github.com/microsoft-foundry/foundry-samples/blob/main/samples/python/toolbox/SUPPORTED_TOOLBOX_TOOLS.md |
## Resolve Toolbox Endpoint
If the user provides a toolbox name or endpoint URL, or the project already references a toolbox (e.g., in `.env` or `agent.manifest.yaml`) β use it directly.
Otherwise, ask one question:
> _"Would you like to provide your toolbox endpoint? (you can create one with the [Foundry Toolkit in VS Code](https://code.visualstudio.com/docs/intelligentapps/tool-catalog) or the [Foundry Portal](https://ai.azure.com/))"_
Once the user supplies the toolbox name/endpoint β either an existing one or a new one they create via the Foundry Toolkit or Foundry Portal β set it on the agent (e.g., `FOUNDRY_TOOLBOX_ENDPOINT` in `.env`) and continue with verification.
> **When asking the question, always include the doc links inline** for the manual options β the [Foundry Toolkit in VS Code](https://code.visualstudio.com/docs/intelligentapps/tool-catalog) and the [Foundry Portal](https://ai.azure.com/) β so the user knows where to go to create a tool/toolbox themselves. Don't just name the options; render them as clickable links every time.
> **Before printing out any step-by-step guidance** for the Foundry Toolkit (VS Code) path, fetch and read [Use Tool Catalog to connect tools and Toolboxes in Foundry Toolkit](https://code.visualstudio.com/docs/intelligentapps/tool-catalog) first, then summarize the relevant steps for them. Don't paraphrase from memory β the Toolkit UI changes; quote the current doc.
> **Available tool types** (for context when discussing what the toolbox will contain): Web Search, Azure AI Search, Code Interpreter, File Search, MCP Server (third-party MCP servers, e.g. GitHub, and Microsoft first-party MCP servers, e.g. WorkIQ), OpenAPI, Agent-to-Agent (A2A). See [Configure tools](https://learn.microsoft.com/azure/foundry/agents/how-to/tools/toolbox#configure-tools).
## Code Integration Patterns
The sample repo provides integration patterns for both Python and C#. Read the sample code and adapt it to the user's project.
**Python samples:**
| Sample | Framework | Protocol | When to use |
|--------|-----------|----------|-------------|
| [`toolbox/maf/`](https://github.com/microsoft-foundry/foundry-samples/tree/main/samples/python/toolbox/maf) β recommended | Agent Framework (MAF) | Responses | **Default choice** |
| [`bring-your-own/responses/langgraph-toolbox/`](https://github.com/microsoft-foundry/foundry-samples/tree/main/samples/python/hosted-agents/bring-your-own/responses/langgraph-toolbox) | LangGraph (BYO) | Responses | LangGraph hosted agent with toolbox |
| [`toolbox/copilot-sdk/`](https://github.com/microsoft-foundry/foundry-samples/tree/main/samples/python/toolbox/copilot-sdk) | GitHub Copilot SDK | Responses | Copilot SDK with toolbox tools |
| [`bring-your-own/responses/bring-your-own-toolbox/`](https://github.com/microsoft-foundry/foundry-samples/tree/main/samples/python/hosted-agents/bring-your-own/responses/bring-your-own-toolbox) | Generic MCP (BYO) | Responses | Raw `httpx` MCP client β works with any framework |
| [`bring-your-own/invocations/toolbox/`](https://github.com/microsoft-foundry/foundry-samples/tree/main/samples/python/hosted-agents/bring-your-own/invocations/toolbox) | Generic MCP (BYO) | Invocations | Toolbox via Invocations protocol |
**C# (.NET) samples:**
| Sample | Description |
|--------|-------------|
| [`csharp/toolbox/maf/`](https://github.com/microsoft-foundry/foundry-samples/tree/main/samples/csharp/toolbox/maf) β recommended | Agent Framework agent with toolbox MCP (Responses protocol) |
**Notes:** (apply to all patterns, both Python and C#):
- Auth: Inject a bearer token with scope `https://ai.azure.com/.default` on every request (Python: `httpx.Auth` subclass; C#: `DefaultAzureCredential` + `BearerTokenAuthenticationPolicy`).
- Header: Always include `Foundry-Features: Toolboxes=V1Preview`.
- MCP client: Pass `load_prompts=False` β the toolbox endpoint does not support `prompts/list`.
- Endpoint: Construct from `{project_endpoint}/toolboxes/{toolbox_name}/mcp?api-version=v1`.
> π‘ **Tip:** If MCP tools have `require_approval: "always"` in `_meta.tool_configuration`, the agent runtime must ask the user for confirmation before invoking. The toolbox endpoint does not enforce this β your agent code is responsible.
deploy.md 24.2 KB
# Foundry Agent Deploy
Create and manage agent deployments in Azure AI Foundry. For hosted agents, this includes the full workflow from containerizing the project to verifying the deployed agent.
## Quick Reference
| Property | Value |
|----------|-------|
| Agent types | Prompt (LLM-based), Hosted |
| MCP server | `azure` |
| Key Foundry MCP tools | `agent_definition_schema_get`, `agent_update`, `agent_get` |
| CLI tools | `docker`, `az acr` (hosted agents only) |
| Container protocols | `a2a`, `responses`, `invocations`, `mcp` |
| Supported languages | .NET, Node.js, Python, Go, Java |
## When to Use This Skill
USE FOR: deploy agent to foundry, push agent to foundry, ship my agent, build and deploy container agent, deploy hosted agent, create hosted agent, deploy prompt agent, ACR build, container image for agent, docker build for foundry, redeploy agent, update agent deployment, clone agent, delete agent, azd deploy hosted agent, azd ai agent, azd up for agent, deploy agent with azd.
> β οΈ **DO NOT manually run** `azd up`, `azd deploy`, `az acr build`, `docker build`, or `agent_update` **without reading this skill first.** This skill orchestrates the full deployment pipeline: project scan β env var collection β Dockerfile generation β image build β agent creation β verification. Running CLI commands or calling MCP tools individually skips critical steps (env var confirmation, schema validation, RBAC setup, invocation verification).
## MCP Tools
| Tool | Description | Parameters |
|------|-------------|------------|
| `agent_definition_schema_get` | Get JSON schema for agent definitions | `projectEndpoint` (required), `schemaType` (`prompt`, `hosted`, `tools`, `all`) |
| `agent_update` | Create, update, or clone an agent | `projectEndpoint`, `agentName` (required); `agentDefinition` (JSON), `isCloneRequest`, `cloneTargetAgentName`, `modelName` |
| `agent_get` | List all agents or get a specific agent | `projectEndpoint` (required), `agentName` (optional) |
| `agent_delete` | Delete an agent and clean up hosted-agent runtime resources | `projectEndpoint`, `agentName` (required) |
## Workflow: Hosted Agent Deployment
### Step 1: Detect and Scan Project
Get the project path from the selected agent root in the project context (see Common: Project Context Resolution). Detect the project type by checking for these files. Do **not** scan sibling agent folders.
| Project Type | Detection Files |
|--------------|-----------------|
| .NET | `*.csproj`, `*.fsproj` |
| Node.js | `package.json` |
| Python | `requirements.txt`, `pyproject.toml`, `setup.py` |
| Go | `go.mod` |
| Java (Maven) | `pom.xml` |
| Java (Gradle) | `build.gradle` |
Delegate an environment variable scan to a sub-agent. Provide the selected agent root path and project type. Search source files inside that folder only for these patterns:
| Project Type | Patterns to Search |
|--------------|--------------------|
| .NET (`*.cs`) | `Environment.GetEnvironmentVariable("...")`, `configuration["..."]`, `configuration.GetValue<T>("...")` |
| Node.js (`*.js`, `*.ts`, `*.mjs`) | `process.env.VAR_NAME`, `process.env["..."]` |
| Python (`*.py`) | `os.environ["..."]`, `os.environ.get("...")`, `os.getenv("...")` |
| Go (`*.go`) | `os.Getenv("...")`, `os.LookupEnv("...")` |
| Java (`*.java`) | `System.getenv("...")`, `@Value("${...}")` |
Classification: if followed by a throw/error β required; if followed by a fallback value β optional with default; otherwise β assume required, ask user.
### Step 2: Collect and Confirm Environment Variables
> β οΈ **Warning:** Environment variables are included in the agent payload and are difficult to change after deployment.
Use azd environment values from the project context to pre-fill discovered variables. Merge with any user-provided values. Present all variables to the user for confirmation with variable name, value, and source (`azd`, `project default`, or `user`). Mask sensitive values.
Loop until the user confirms or cancels:
- `yes` β Proceed
- `VAR_NAME=new_value` β Update the value, show updated table, ask again
- `cancel` β Abort deployment
### Step 3: Generate Dockerfile and Build Image
Delegate Dockerfile creation to a sub-agent. Guidelines:
- Use official base image for the detected language and runtime version
- Use multi-stage builds for compiled languages
- Use Alpine or slim variants for smaller images
- Always target `linux/amd64` platform
- Expose the correct port (usually 8088)
> π‘ **Tip:** Reference [Hosted Agents Foundry Samples](https://github.com/microsoft-foundry/foundry-samples/tree/main/samples/python/hosted-agents) for containerized agent examples.
Also generate `docker-compose.yml` and `.env` files for local development.
**IMPORTANT**: You MUST always generate image tag as current timestamp (e.g., `myagent:202401011230`) to ensure uniqueness and avoid conflicts with existing images in ACR. DO NOT use static tags like `latest` or `v1`.
Collect ACR details from project context.
- If an ACR already exists, use it, then verify that the Foundry project managed identity has pull permissions (for example, `Container Registry Repository Reader` or equivalent) on the target repository/registry. If the role assignment is missing, add it.
- If no ACR exists, create a new one with ABAC repository permissions mode, and assign `Container Registry Repository Reader` to the Foundry project managed identity. Foundry hosted agents use ABAC mode that requires repository-scoped roles, not the registry-level `AcrPull` role.
Let the user choose the build method:
**Cloud Build (ACR Tasks) (Recommended)** β no local Docker required:
```bash
az acr build --registry <acr-name> --image <repository>:<tag> --platform linux/amd64 --source-acr-auth-id "[caller]" --file Dockerfile .
```
> β οΈ **Mandatory:** The `--source-acr-auth-id "[caller]"` parameter is required. Do NOT omit it β without this flag the build will fail due to missing authentication context.
**Local Docker Build:**
```bash
docker build --platform linux/amd64 -t <image>:<tag> -f Dockerfile .
az acr login --name <acr-name>
docker tag <image>:<tag> <acr-name>.azurecr.io/<repository>:<tag>
docker push <acr-name>.azurecr.io/<repository>:<tag>
```
> π‘ **Tip:** Prefer Cloud Build if Docker is not available locally. On Windows with WSL, prefix Docker commands with `wsl -e` if `docker info` fails but `wsl -e docker info` succeeds.
### Step 4: Collect Agent Configuration
Use the project endpoint and ACR name from the project context. Ask the user only for values not already resolved:
- **Agent name** β Unique name for the agent
- **Model deployment** β Model deployment name (e.g., `gpt-4o`)
### Step 5: Get Agent Definition Schema
Use `agent_definition_schema_get` with `schemaType: hosted` to retrieve the current schema and validate required fields.
### Step 6: Create the Agent
Use `agent_update` with the agent definition:
> β οΈ **Protocol version source of truth:** Do NOT copy the protocol version from `agent_definition_schema_get` examples. Use the protocol version declared by the agent source itself (for example, `agent.yaml` or `agent.manifest.yaml`).
```json
{
"command": "agent_update",
"intent": "Update a hosted agent with a new docker image",
"parameters": {
"projectEndpoint": "<project-endpoint>",
"agentName": "<agent-name>",
"agentDefinition": {
"kind": "hosted",
"image": "<acr-name>.azurecr.io/<repository>:<tag>",
"cpu": "<cpu-cores>",
"memory": "<memory>",
"container_protocol_versions": [
{ "protocol": "<protocol>", "version": "<version>" }
],
"environment_variables": { "<var>": "<value>" }
}
}
}
```
Capture the per-agent identity from the agent creation response, then retrieve the project-level agent identity from the project resource after creation. You will need both identities to assign the minimum RBAC required for invocation before running invoke tests.
### Step 7: Test the Agent
For a newly deployed hosted agent, before invocation testing, first check whether the per-agent identity and project-level agent identity already have the minimum RBAC required for invocation.
Required role assignment:
- `Azure AI User`
Required scope: the Cognitive Services account, not the project.
Check existing assignments before creating any new assignment. If the required role assignment is missing for either identity, assign it before invocation testing.
If the current user account does not have permission to create a missing role assignment, stop the deployment workflow here. Explain to the user that hosted-agent invocation requires `Azure AI User` on the per-agent identity and project-level agent identity at the Cognitive Services account scope, and the deployment cannot be treated as complete until someone with RBAC assignment permission grants the missing role.
After this RBAC check is complete, read and follow the [invoke skill](../invoke/invoke.md) to send a test message and verify the agent responds correctly. DO NOT SKIP reading the invoke skill β it contains important information about required hosted-agent session handling.
If invocation testing still fails after this RBAC check, immediately read and follow the [troubleshoot skill](../troubleshoot/troubleshoot.md). Do not treat the deployment as fully successful until invocation succeeds.
> β οΈ **DO NOT stop here.** Continue to Step 8 (Auto-Create Evaluators & Dataset). This step is mandatory after every successful deployment.
### Step 8: Auto-Create Evaluators & Dataset
Follow [After Deployment β Auto-Create Evaluators & Dataset](#after-deployment--auto-create-evaluators--dataset) below.
## Workflow: Prompt Agent Deployment
### Step 1: Collect Agent Configuration
Use the project endpoint from the project context (see Common: Project Context Resolution). Ask the user only for values not already resolved:
- **Agent name** β Unique name for the agent
- **Model deployment** β Model deployment name (e.g., `gpt-4o`)
- **Instructions** β System prompt (optional)
- **Temperature** β Response randomness 0-2 (optional, default varies by model)
- **Tools** β Tool configurations (optional)
### Step 2: Get Agent Definition Schema
Use `agent_definition_schema_get` with `schemaType: prompt` to retrieve the current schema.
### Step 3: Create the Agent
Use `agent_update` with the agent definition:
```json
{
"kind": "prompt",
"model": "<model-deployment>",
"instructions": "<system-prompt>",
"temperature": 0.7
}
```
### Step 4: Test the Agent
Read and follow the [invoke skill](../invoke/invoke.md) to send a test message and verify the agent responds correctly.
> β οΈ **DO NOT stop here.** Continue to Step 5 (Auto-Create Evaluators & Dataset). This step is mandatory after every successful deployment.
### Step 5: Auto-Create Evaluators & Dataset
Follow [After Deployment β Auto-Create Evaluators & Dataset](#after-deployment--auto-create-evaluators--dataset) below.
## Display Agent Information
Once deployment is done for either hosted or prompt agent, display the agent's details in a nicely formatted table.
Below the table you MUST also display a Playground link for direct access to the agent in Azure AI Foundry:
[Open in Playground](https://ai.azure.com/nextgen/r/{encodedSubId},{resourceGroup},,{accountName},{projectName}/build/agents/{agentName}/build?version={agentVersion})
To calculate the encodedSubId, you need to take subscription id and convert it into its 16-byte GUID, then encode it as URL-safe base64 without padding (= characters trimmed). You can use the following Python code to do this conversion:
```
python -c "import base64,uuid;print(base64.urlsafe_b64encode(uuid.UUID('<SUBSCRIPTION_ID>').bytes).rstrip(b'=').decode())"
```
## Document Deployment Context
After a successful deployment, persist the deployment context to the selected metadata file under `<agent-root>/.foundry/` so future conversations (evaluation, trace analysis, monitoring) can reuse it automatically. Local/dev flows should default to `agent-metadata.yaml`; prod or CI-targeted flows can point at `agent-metadata.prod.yaml` or another explicit sidecar file. See [Agent Metadata Contract](../../references/agent-metadata-contract.md) for the canonical schema.
| Metadata Field | Purpose | Example |
|----------------|---------|---------|
| `environments.<env>.projectEndpoint` | Foundry project endpoint | `https://<account>.services.ai.azure.com/api/projects/<project>` |
| `environments.<env>.agentName` | Deployed agent name | `my-support-agent` |
| `environments.<env>.azureContainerRegistry` | ACR resource (hosted agents) | `myregistry.azurecr.io` |
| `environments.<env>.evaluationSuites[]` | Evaluation bundles for datasets, evaluators, tags, and thresholds | `smoke-core`, `trace-regression-suite` |
| `environments.<env>.evaluationSuites[].datasetUri` | Remote Foundry dataset URI for shared eval workflows | `azureml://datastores/.../paths/...` |
If the selected metadata file is a preferred single-environment file, update only that one environment block and leave sibling metadata files untouched. If the selected metadata file is a legacy multi-environment file, merge the selected environment instead of overwriting other environments or cached evaluation suites without confirmation. If the selected environment still uses older `testSuites[]` or legacy `testCases[]`, rewrite that environment to `evaluationSuites[]` when you persist deployment metadata.
## After Deployment β Auto-Create Evaluators & Dataset
> β οΈ **This step is automatic.** After a successful deployment, immediately prepare the selected `.foundry` environment for evaluation without waiting for the user to request it. This matches the eval-driven optimization loop.
### 1. Read Agent Instructions
Use **`agent_get`** (or local `agent.yaml`) to understand the agent's purpose and capabilities.
### 2. Reuse or Refresh Local Cache
Inspect the selected agent root before generating anything new:
- Reuse `.foundry/evaluators/` and `.foundry/datasets/` when they already contain the right assets for the selected environment.
- Ask before refreshing cached files or replacing thresholds.
- If cache is missing or stale, regenerate the dataset/evaluators and update metadata for the active environment only.
### 2.5 Discover Existing Evaluators
Use **`evaluator_catalog_get`** with the selected environment's project endpoint to list all evaluators already registered in the project. Display them to the user grouped by type (`custom` vs `built-in`) with name, category, and version. During Phase 1, catalog any promising custom evaluators for later reuse, but keep the first run on the built-in baseline. Only propose creating a new evaluator in Phase 2 when no existing evaluator covers the required dimension.
### 3. Select Default Evaluators
Follow the [observe skill's Two-Phase Evaluator Strategy](../observe/observe.md). Phase 1 is built-in only, so do not create a new custom evaluator during the initial setup pass.
Start with <=5 built-in evaluators for the initial eval run so the first pass stays fast:
| Category | Evaluators |
|----------|-----------|
| **Quality (built-in)** | relevance, task_adherence, intent_resolution |
| **Safety (built-in)** | indirect_attack |
| **Tool use (built-in, conditional)** | tool_call_accuracy (use when the agent calls tools; some catalogs label it as `builtin.tool_call_accuracy`) |
After analyzing initial results, suggest additional evaluators (custom or built-in) targeted at specific failure patterns instead of front-loading a larger default set.
If Phase 2 is needed, call `evaluator_catalog_get` again to reuse an existing custom evaluator first. Only create a new custom evaluator when the catalog still lacks the required signal, and prefer prompt templates that consume `expected_behavior` for per-query behavioral scoring. When creating custom evaluator `promptText`, preserve the rubric but remove or rewrite user-provided output-format instructions that conflict with the runtime-enforced `result`/`reason` JSON contract (for example, `score`/`reasoning` schemas or duplicate `OUTPUT FORMAT` blocks).
### 4. Identify LLM-Judge Deployment
Use **`model_deployment_get`** to list the selected project's actual model deployments, then choose one that supports chat completions for quality evaluators. Do **not** assume `gpt-4o` exists in the project. If no deployment supports chat completions, stop the auto-setup flow and tell the user quality evaluators cannot run until a compatible judge deployment is available.
### 5. Generate Seed Dataset
> β οΈ **MANDATORY: Read the full generation workflow before proceeding.**
Read and follow [Generate Seed Evaluation Dataset](../eval-datasets/references/generate-seed-dataset.md). That reference contains:
- The required JSONL row schema (`query` + `expected_behavior` are both mandatory)
- Coverage distribution targets and generation rules
- Generation requirements that keep rows valid by construction (valid JSON, required fields, coverage targets, and minimum row count)
- Foundry registration steps (blob upload + `evaluation_dataset_create`)
- Metadata updates for the selected metadata file and `manifest.json`
Do NOT skip the `expected_behavior` field. The generation reference handles the complete flow from query generation through Foundry registration.
The local filename must start with the selected environment's Foundry agent name (`agentName` in the selected metadata file) before adding stage, environment, or version suffixes.
Use [Generate Seed Evaluation Dataset](../eval-datasets/references/generate-seed-dataset.md) as the single source of truth for seed dataset registration. It covers `project_connection_list` with `AzureStorageAccount`, key-based versus AAD upload, `evaluation_dataset_create` with `connectionName`, and saving the returned `datasetUri`.
### 6. Persist Artifacts and Evaluation Suites
Save evaluator definitions, local datasets, and evaluation outputs under `.foundry/`, then register or update evaluation suites in the selected metadata file for the selected environment:
```text
.foundry/
agent-metadata.yaml
agent-metadata.prod.yaml
evaluators/
<name>.yaml
datasets/
<agent-name>-eval-seed-v1.jsonl
results/
```
Each evaluation suite should bundle one dataset with the evaluator list, thresholds, and a `tags` map (for example, `tier: smoke`, `purpose: baseline`, `stage: seed`). Persist the local `datasetFile` and remote `datasetUri` together, and seed exactly one smoke suite after deployment. If the selected environment still uses older `testSuites[]` or legacy `testCases[]`, replace that list with `evaluationSuites[]` in the rewritten metadata and map legacy `priority` to `tags.tier` only when `tags.tier` is missing.
### 7. Prompt User
*"Your agent is deployed and running in the selected environment. The `.foundry` cache now contains evaluators, a local seed dataset, the Foundry dataset registration metadata, and evaluation-suite metadata. Would you like to run an evaluation to identify optimization opportunities?"*
- **Yes** β follow the [observe skill](../observe/observe.md) starting at **Step 2 (Evaluate)** β cache and metadata are already prepared.
- **No** β stop. The user can return later.
- **Production trace analysis** β follow the [trace skill](../trace/trace.md) to search conversations, diagnose failures, and analyze latency using App Insights.
## Agent Definition Schemas
### Prompt Agent
| Property | Type | Required | Description |
|----------|------|----------|-------------|
| `kind` | string | β
| Must be `"prompt"` |
| `model` | string | β
| Model deployment name (e.g., `gpt-4o`) |
| `instructions` | string | | System message for the model |
| `temperature` | number | | Response randomness (0-2) |
| `top_p` | number | | Nucleus sampling (0-1) |
| `tools` | array | | Tools the model may call |
| `tool_choice` | string/object | | Tool selection strategy |
| `rai_config` | object | | Responsible AI configuration |
### Hosted Agent
| Property | Type | Required | Description |
|----------|------|----------|-------------|
| `kind` | string | β
| Must be `"hosted"` |
| `image` | string | β
| Container image URL |
| `cpu` | string | β
| CPU allocation (e.g., `"0.5"`, `"1"`, `"2"`) |
| `memory` | string | β
| Memory allocation (e.g., `"1Gi"`, `"2Gi"`) |
| `container_protocol_versions` | array | β
| Protocol and version pairs |
| `environment_variables` | object | | Key-value pairs for container env vars |
| `tools` | array | | Tool configurations |
| `rai_config` | object | | Responsible AI configuration |
### Container Protocols
| Protocol | Description |
|----------|-------------|
| `a2a` | Agent-to-Agent protocol |
| `responses` | OpenAI Responses API |
| `invocations` | Invocation payload protocol for arbitrary request bodies and custom SSE behavior |
| `mcp` | Model Context Protocol |
## Agent Management Operations
### Clone an Agent
Use `agent_update` with `isCloneRequest: true` and `cloneTargetAgentName` to create a copy. For prompt agents, optionally override the model with `modelName`.
### Delete an Agent
Use `agent_delete` β automatically cleans up hosted-agent runtime resources.
### List Agents
Use `agent_get` without `agentName` to list all agents, or with `agentName` to get a specific agent's details.
## Error Handling
| Error | Cause | Resolution |
|-------|-------|------------|
| Project type not detected | No known project files found | Ask user to specify project type manually |
| Docker not running | Docker Desktop not started or not installed | Start Docker Desktop, or use Cloud Build (ACR Tasks) instead |
| ACR login failed | Not authenticated to Azure | Run `az login` first, then `az acr login --name <acr-name>` |
| Build/push failed | Dockerfile errors or insufficient ACR permissions | Check Dockerfile syntax, verify Contributor or AcrPush role on registry |
| ACR build log crash | `UnicodeEncodeError` when `az acr build` streams remote logs | The remote build continues independently β do not assume failure. Get the `<run-id>` from the earlier `az acr build` output and check status with `az acr task show-run -r <acr-name> --run-id <run-id> --query status`. |
| Agent creation failed | Invalid definition or missing required fields | Use `agent_definition_schema_get` to verify schema, check all required fields |
| Hosted agent not running after creation | Provisioning failed or the image is not usable | Verify ACR image path, check cpu/memory values, confirm ACR permissions, then inspect hosted-agent logs with the troubleshoot skill |
| Role assignment failed | The required invocation RBAC was not granted | Stop the deployment workflow and explain that hosted-agent invocation requires `Azure AI User` on the per-agent identity and project-level agent identity at the Cognitive Services account scope |
| Invocation test failed after deployment | Missing or incorrect invocation RBAC for the per-agent identity or project-level agent identity | Check whether `Azure AI User` is assigned to the per-agent identity and project-level agent identity at the Cognitive Services account scope; assign missing role assignments, then retry invocation |
| Permission denied | Insufficient Foundry project permissions | Verify Azure AI Owner or Contributor role on the project |
| Schema fetch failed | Invalid project endpoint | Verify project endpoint URL format: `https://<resource>.services.ai.azure.com/api/projects/<project>` |
## Non-Interactive / YOLO Mode
When running in non-interactive mode (e.g., `nonInteractive: true` or YOLO mode), the skill skips user confirmation prompts and uses sensible defaults:
- **Environment variables** β Uses values resolved from `azd env get-values` and project defaults without prompting for confirmation
- **Agent name** β Must be provided in the initial user message or derived sensibly from the project context; if missing, the skill fails with an error instead of prompting
- **Hosted agent verification** β Automatically continues into RBAC and invocation verification without additional prompts once deployment succeeds
> β οΈ **Warning:** In non-interactive mode, ensure all required values (project endpoint, agent name, ACR image) are provided upfront in the user message or available via `azd env get-values`. Missing values will cause the deployment to fail rather than prompt.
## Additional Resources
- [Foundry Hosted Agents](https://learn.microsoft.com/azure/ai-foundry/agents/concepts/hosted-agents?view=foundry)
- [Foundry Agent Runtime Components](https://learn.microsoft.com/azure/ai-foundry/agents/concepts/runtime-components?view=foundry)
- [Foundry Samples](https://github.com/microsoft-foundry/foundry-samples/)
eval-datasets.md 9.7 KB
# Evaluation Datasets β Trace-to-Dataset Pipeline & Lifecycle Management
Manage the full lifecycle of evaluation datasets for a Foundry agent: harvesting production traces into the selected agent root's local `.foundry` cache, curating versioned test datasets, tracking evaluation quality over time, and syncing approved updates back to Foundry when needed.
## When to Use This Skill
USE FOR: create dataset from traces, harvest traces into dataset, build test dataset, dataset versioning, version my dataset, tag dataset, pin dataset version, organize datasets, dataset splits, curate test cases, review trace candidates, evaluation trending, metrics over time, eval regression, regression detection, compare evaluations over time, dataset comparison, evaluation lineage, trace to dataset pipeline, annotation review, production traces to test cases.
> β οΈ **DO NOT manually run** KQL queries to extract datasets or call `evaluation_dataset_create` **without reading this skill first.** This skill defines the correct trace extraction patterns, schema transformation, cache rules, versioning conventions, and quality gates that raw tools do not enforce.
> π‘ **Tip:** This skill complements the [observe skill](../observe/observe.md) (eval-driven optimization loop) and the [trace skill](../trace/trace.md) (production trace analysis). Use this skill when you need to bridge traces and evaluations: turning production data into test cases and tracking evaluation quality over time.
## Quick Reference
| Property | Value |
|----------|-------|
| MCP server | `azure` |
| Key Foundry MCP tools | `evaluation_dataset_create`, `evaluation_dataset_get`, `evaluation_dataset_versions_get`, `evaluation_get`, `evaluation_comparison_create`, `evaluation_comparison_get` |
| Storage tools | `project_connection_list` (discover `AzureStorageAccount` connection), `project_connection_create` (add storage connection) |
| Azure services | Application Insights (via `monitor_resource_log_query`), Azure Blob Storage (dataset sync) |
| Prerequisites | Agent deployed, selected `.foundry/agent-metadata*.yaml` file available, App Insights connected |
| Local cache | `.foundry/datasets/`, `.foundry/results/`, `.foundry/evaluators/` |
## Entry Points
| User Intent | Start At |
|-------------|----------|
| "Create dataset from production traces" / "Harvest traces" | [Trace-to-Dataset Pipeline](references/trace-to-dataset.md) |
| "Version my dataset" / "Tag dataset" / "Pin dataset version" | [Dataset Versioning](references/dataset-versioning.md) |
| "Organize my datasets" / "Dataset splits" / "Filter datasets" | [Dataset Organization](references/dataset-organization.md) |
| "Review trace candidates" / "Curate test cases" | [Dataset Curation](references/dataset-curation.md) |
| "Show eval metrics over time" / "Evaluation trending" | [Eval Trending](references/eval-trending.md) |
| "Did my agent regress?" / "Regression detection" | [Eval Regression](references/eval-regression.md) |
| "Compare datasets" / "Experiment comparison" / "A/B test" | [Dataset Comparison](references/dataset-comparison.md) |
| "Sync dataset to Foundry" / "Refresh local dataset cache" | [Trace-to-Dataset Pipeline -> Step 5](references/trace-to-dataset.md#step-5--sync-local-cache-with-foundry-optional) |
| "Trace my evaluation lineage" / "Audit eval history" | [Eval Lineage](references/eval-lineage.md) |
| "Generate eval dataset" / "Create seed dataset" / "Generate test cases for my agent" | [Generate Seed Dataset](references/generate-seed-dataset.md) |
## Before Starting β Detect Current State
1. Resolve the target agent root, selected metadata file, and environment from `.foundry/agent-metadata*.yaml`.
2. Confirm the selected environment's `projectEndpoint`, `agentName`, and observability settings.
3. Check `.foundry/datasets/`, `.foundry/results/`, and `.foundry/datasets/manifest.json` in the selected agent root only.
4. Check whether `evaluation_dataset_get` returns server-side datasets for the same environment.
5. Route to the appropriate entry point based on user intent.
## The Foundry Flywheel
```text
Production Agent -> [1] Trace (App Insights + OTel)
-> [2] Harvest (KQL extraction)
-> [3] Curate (human review)
-> [4] Dataset Cache (.foundry/datasets, versioned)
-> [5] Sync to Foundry (optional refresh/push)
-> [6] Evaluate (batch eval)
-> [7] Analyze (trending + regression)
-> [8] Compare (agent versions OR dataset versions)
-> [9] Deploy -> back to [1]
```
Each cycle makes the test suite harder and more representative. Production failures from release N become regression tests for release N+1.
## Behavioral Rules
1. **Always show KQL queries.** Before executing any trace extraction query, display it in a code block. Never run queries silently.
2. **Scope to time ranges.** Always include a time range in KQL queries (default: last 7 days for trace harvesting). Ask the user for the range if not specified.
3. **Require human review.** Never auto-commit harvested traces to a dataset without showing candidates to the user first. The curation step is mandatory.
4. **Use dataset naming conventions.** Follow the naming conventions below and keep local filenames aligned with the registered Foundry dataset name/version.
5. **Treat local files as cache.** Reuse `.foundry/datasets/` and `.foundry/evaluators/` when they already match the selected environment in the selected agent root. Offer refresh when the user asks or when remote state has changed.
6. **Stay inside the selected agent root.** After resolving the agent root, inspect only that folder's `.foundry/` cache and source context. Never merge sibling agent folders.
7. **Persist artifacts.** Save datasets to `.foundry/datasets/`, evaluation results to `.foundry/results/`, and track lineage in `.foundry/datasets/manifest.json`.
8. **Keep evaluation suites aligned.** Update the selected environment's `evaluationSuites[]` in the selected metadata file whenever a dataset version, evaluator set, or suite tags change. Local flows should default to `agent-metadata.yaml`; prod or CI-targeted flows can use `agent-metadata.<env>.yaml`. If the environment still uses older `testSuites[]` or legacy `testCases[]`, treat that list as the current suite source for this session and rewrite it as `evaluationSuites[]` on the next metadata save.
9. **Confirm before overwriting.** If a dataset version or cache file already exists, warn the user and ask for confirmation before replacing or refreshing it.
10. **Sync to Foundry when requested or needed.** After saving datasets locally, refresh or register them in Foundry only when the user asks or the workflow needs shared/CI usage.
11. **Never remove dataset rows or weaken evaluators to recover scores.** Score drops after a dataset update are expected - harder tests expose real gaps. Optimize the agent for new failure patterns; do not shrink the test suite.
12. **Match eval parameter names exactly.** Use `evaluationId` when creating grouped runs, but use `evalId` for `evaluation_get` and comparison/trending lookups.
## Dataset Naming and Metadata Conventions
| Dataset type | Foundry dataset name | Foundry dataset version | Typical local file | Metadata stage |
|--------------|----------------------|-------------------------|--------------------|----------------|
| Seed dataset | `<agent-name>-eval-seed` | `v1` | `.foundry/datasets/<agent-name>-eval-seed-v1.jsonl` | `seed` |
| Trace-harvested dataset | `<agent-name>-traces` | `v<N>` | `.foundry/datasets/<agent-name>-traces-v<N>.jsonl` | `traces` |
| Curated/refined dataset | `<agent-name>-curated` | `v<N>` | `.foundry/datasets/<agent-name>-curated-v<N>.jsonl` | `curated` |
| Production-ready dataset | `<agent-name>-prod` | `v<N>` | `.foundry/datasets/<agent-name>-prod-v<N>.jsonl` | `prod` |
Here `<agent-name>` means the selected environment's `environments.<env>.agentName` from the selected metadata file. If that deployed agent name already includes the environment (for example, `support-agent-dev`), do **not** append the environment key a second time.
Local dataset filenames must start with the selected Foundry agent name (`environments.<env>.agentName` in the selected metadata file). Put stage and version suffixes **after** that prefix so cache files sort and group by agent first.
Keep the Foundry dataset name stable across versions. Store the version only in `datasetVersion` (or manifest `version`) using the `v<N>` format, while local filenames keep the `-v<N>` suffix for cache readability.
Required metadata to track with every registered dataset:
- `agent`: the agent name (for example, `hosted-agent-051-001`)
- `stage`: `seed`, `traces`, `curated`, or `prod`
- `version`: version string such as `v1`, `v2`, or `v3`
- `datasetUri`: always persist the Foundry dataset URI in the selected metadata file alongside the local `datasetFile`, dataset name, and version
> π‘ **Tip:** `evaluation_dataset_create` does not expose a first-class `tags` parameter in the current MCP surface. Persist `agent`, `stage`, and `version` in local metadata (the selected metadata file plus `.foundry/datasets/manifest.json`) so Foundry-side references stay aligned with the cache.
## Related Skills
| User Intent | Skill |
|-------------|-------|
| "Run an evaluation" / "Optimize my agent" | [observe skill](../observe/observe.md) |
| "Search traces" / "Analyze failures" / "Latency analysis" | [trace skill](../trace/trace.md) |
| "Find eval scores for a response ID" / "Link eval results to traces" | [trace skill -> Eval Correlation](../trace/references/eval-correlation.md) |
| "Deploy my agent" | [deploy skill](../deploy/deploy.md) |
| "Debug container issues" | [troubleshoot skill](../troubleshoot/troubleshoot.md) |
| "Review metadata schema" | [Agent Metadata Contract](../../references/agent-metadata-contract.md) |
dataset-comparison.md 4.6 KB
# Dataset Comparison β A/B Testing Across Dataset Versions
Run structured experiments that compare how an agent performs across different dataset versions, and present results as leaderboards with per-evaluator breakdowns. Use this to answer: "Did scores drop because of harder tests or agent regression?"
## Experiment Structure
An experiment consists of:
1. **Pinned agent version** β the same agent evaluated on each dataset
2. **Varied dataset versions** β the versions being compared
3. **Same evaluators** β applied consistently across all runs
4. **Comparison results** β which dataset version the agent performs better on
## Step 1 β Define the Experiment
| Parameter | Value | Example |
|-----------|-------|---------|
| Agent | Pinned agent version | `v3` |
| Baseline dataset | Previous dataset version | `support-bot-prod-traces-v2` |
| Treatment dataset(s) | New dataset version(s) | `support-bot-prod-traces-v3` |
| Evaluators | Same set for all runs | coherence, fluency, relevance, intent_resolution, task_adherence |
## Step 2 β Run Evaluations
For each dataset version, run **`evaluation_agent_batch_eval_create`** with:
- Same `evaluationId` (groups all runs for comparison)
- Same `agentVersion`
- Same `evaluatorNames`
- Different `inputData` (from each dataset version)
> **Important:** Use `evaluationId` on `evaluation_agent_batch_eval_create` to group runs. After the runs exist, switch to `evalId` for `evaluation_get` and `evaluation_comparison_create`.
> β οΈ **Eval-group immutability:** Keep the evaluator set and thresholds fixed within one evaluation group. If you need to change evaluators or thresholds, create a new evaluation group instead of reusing the previous `evaluationId`.
> β οΈ **Score drops are expected.** When comparing v1βv2 datasets, lower scores on the new dataset likely mean the new test cases are harder (better coverage), not that the agent regressed. **Do NOT remove dataset rows or weaken evaluators to recover scores.** Instead, optimize the agent for the new failure patterns, then re-evaluate.
## Step 3 β Compare Results
Use **`evaluation_comparison_create`** with the baseline and treatment runs:
```json
{
"insightRequest": {
"displayName": "Dataset comparison: traces-v2 vs traces-v3 on agent-v3",
"state": "NotStarted",
"request": {
"type": "EvaluationComparison",
"evalId": "<eval-group-id>",
"baselineRunId": "<traces-v2-run-id>",
"treatmentRunIds": ["<traces-v3-run-id>"]
}
}
}
```
> β οΈ **Common mistake:** `evaluation_comparison_create` uses `insightRequest.request.evalId`, not `evaluationId`, even when the runs were originally grouped with `evaluationId`.
## Step 4 β Leaderboard
Present results as a leaderboard table:
| Evaluator | traces-v2 (baseline) | traces-v3 | Effect |
|-----------|:---:|:---:|:---:|
| Coherence | 4.0 | 3.6 | β οΈ Lower |
| Fluency | 4.5 | 4.3 | β οΈ Lower |
| Relevance | 3.6 | 3.2 | β οΈ Lower |
| Intent Resolution | 4.1 | 3.7 | β οΈ Lower |
| Task Adherence | 3.9 | 3.4 | β οΈ Lower |
### Recommendation
If scores drop uniformly across all evaluators, the new dataset is likely harder:
*"Agent v3 scores dropped on traces-v3 across all evaluators. traces-v3 added 15 edge-case queries from production failures. This is expected β optimize the agent for the new failure patterns rather than reverting the dataset."*
## Pairwise A/B Comparison
For detailed pairwise analysis between exactly two dataset versions:
| Evaluator | Baseline (traces-v2) | Treatment (traces-v3) | Delta | p-value | Effect |
|-----------|:---:|:---:|:---:|:---:|:---:|
| Coherence | 4.0 Β± 0.6 | 3.6 Β± 0.9 | β0.4 | 0.03 | Degraded |
| Fluency | 4.5 Β± 0.4 | 4.3 Β± 0.5 | β0.2 | 0.12 | Inconclusive |
| Relevance | 3.6 Β± 0.9 | 3.2 Β± 1.1 | β0.4 | 0.04 | Degraded |
> π‘ **Tip:** The `evaluation_comparison_create` result includes `pValue` and `treatmentEffect` fields. Use `pValue < 0.05` as the threshold for statistical significance.
## Multi-Dataset Comparison
Compare how the same agent version performs across different datasets:
| Dataset | Coherence | Fluency | Relevance | Notes |
|---------|:---------:|:-------:|:---------:|-------|
| traces-v3 (prod) | 4.0 | 4.5 | 3.6 | Production-derived |
| synthetic-v2 | 4.3 | 4.6 | 4.1 | May overestimate quality |
| manual-v1 (curated) | 3.8 | 4.4 | 3.2 | Hardest test cases |
> β οΈ **Warning:** Be cautious comparing scores across datasets with different structures (e.g., production traces vs synthetic). Differences may reflect dataset difficulty, not agent quality.
## Next Steps
- **Track trends over time** β [Eval Trending](eval-trending.md)
- **Check for regressions** β [Eval Regression](eval-regression.md)
- **Audit full lineage** β [Eval Lineage](eval-lineage.md)
dataset-curation.md 4.0 KB
# Dataset Curation β Human-in-the-Loop Review
Review, annotate, and approve harvested trace candidates before including them in evaluation datasets. This ensures dataset quality by adding a human review gate between raw trace extraction and finalized test cases.
## Workflow Overview
```
Raw Traces (from KQL harvest)
β
βΌ
[1] Candidate File (unreviewed)
β
βΌ
[2] Human Review (approve/edit/reject each)
β
βΌ
[3] Approved Dataset (versioned, ready for eval)
```
## Step 1 β Generate Candidate File
After running a [trace harvest](trace-to-dataset.md), save candidates with a `status` field:
```
.foundry/datasets/<agent-name>-traces-candidates-<date>.jsonl
```
Each line includes a review status:
```json
{"query": "How do I reset my password?", "response": "...", "status": "pending", "metadata": {"source": "trace", "conversationId": "conv-abc-123", "harvestRule": "error", "errorType": "TimeoutError", "duration": 12300}}
{"query": "What's the refund policy?", "response": "...", "status": "pending", "metadata": {"source": "trace", "conversationId": "conv-def-456", "harvestRule": "latency", "duration": 8700}}
```
## Step 2 β Present for Review
Show candidates in a review table:
| # | Status | Query (preview) | Source | Error | Duration | Eval Score |
|---|--------|----------------|--------|-------|----------|------------|
| 1 | β³ pending | "How do I reset my..." | error harvest | TimeoutError | 12.3s | β |
| 2 | β³ pending | "What's the refund..." | latency harvest | β | 8.7s | β |
| 3 | β³ pending | "Can you help me..." | low-eval harvest | β | 0.4s | 2.0 |
### Review Actions
For each candidate, the user can:
| Action | Result |
|--------|--------|
| **Approve** | Include in dataset as-is |
| **Approve + Edit** | Include with modified query/response/ground_truth |
| **Add Ground Truth** | Approve and add the expected correct answer |
| **Reject** | Exclude from dataset |
| **Flag** | Mark for later review |
### Batch Operations
- *"Approve all"* β include all pending candidates
- *"Approve all errors"* β include all candidates from error harvest
- *"Reject duplicates"* β exclude candidates with similar queries to existing dataset entries
- *"Approve #1, #3, #5; reject #2, #4"* β selective approval by number
## Step 3 β Finalize Dataset
After review, filter approved candidates and save to a versioned dataset:
1. Read `.foundry/datasets/manifest.json` to find the latest version number
2. Filter candidates where `status == "approved"`
3. Remove the `status` field from the output
4. Save to `.foundry/datasets/<agent-name>-<source>-v<N>.jsonl`
5. Update `.foundry/datasets/manifest.json` with metadata
### Update Candidate Status
Mark the candidate file with final statuses:
```json
{"query": "How do I reset my password?", "status": "approved", "ground_truth": "Navigate to Settings > Security > Reset Password", "metadata": {...}}
{"query": "What's the refund policy?", "status": "rejected", "rejectReason": "duplicate of existing test case", "metadata": {...}}
{"query": "Can you help me...", "status": "approved", "metadata": {...}}
```
> π‘ **Tip:** Keep candidate files as an audit trail. They document what was reviewed, when, and why items were accepted or rejected.
## Quality Checks
Before finalizing, verify dataset quality:
| Check | Criteria |
|-------|----------|
| **No duplicates** | Ensure no query appears in both the new dataset and existing datasets |
| **Balanced categories** | Verify reasonable distribution across categories (not all edge-cases) |
| **Ground truth coverage** | Flag examples without ground_truth that may benefit from one |
| **Minimum size** | Warn if dataset has fewer than 20 examples (may not be statistically meaningful) |
| **Safety coverage** | Ensure safety-related test cases are included if the agent handles sensitive topics |
## Next Steps
- **Version the approved dataset** β [Dataset Versioning](dataset-versioning.md)
- **Organize into splits** β [Dataset Organization](dataset-organization.md)
- **Run evaluation** β [observe skill Step 2](../../observe/references/evaluate-step.md)
dataset-organization.md 4.7 KB
# Dataset Organization β Metadata, Splits, and Filtered Evaluation
Organize datasets using metadata fields, create train/validation/test splits, and run targeted evaluations on dataset subsets. This addresses the need for hierarchical dataset organization without requiring rigid container structures.
## Metadata Schema
Add metadata to each JSONL example to enable filtering and organization:
| Field | Values | Purpose |
|-------|--------|---------|
| `category` | `edge-case`, `regression`, `happy-path`, `multi-turn`, `safety` | Test case classification |
| `source` | `trace`, `synthetic`, `manual`, `feedback` | How the example was created |
| `split` | `train`, `val`, `test` | Dataset split assignment |
| `tags` | key/value object such as `{"tier": "smoke", "purpose": "baseline"}` | Flexible suite-alignment and filtering labels |
| `harvestRule` | `error`, `latency`, `low-eval`, `combined` | Which harvest template captured it |
| `agentVersion` | `"1"`, `"2"`, etc. | Agent version when trace was captured |
### Example JSONL with Metadata
```json
{"query": "Reset my password", "ground_truth": "Navigate to Settings > Security > Reset Password", "metadata": {"category": "happy-path", "source": "manual", "split": "test", "tags": {"tier": "smoke", "purpose": "baseline"}}}
{"query": "What happens if I delete my account while a refund is pending?", "metadata": {"category": "edge-case", "source": "trace", "split": "test", "tags": {"tier": "regression", "purpose": "coverage"}, "harvestRule": "error"}}
{"query": "I want to harm myself", "ground_truth": "I'm concerned about your safety. Please contact...", "metadata": {"category": "safety", "source": "manual", "split": "test", "tags": {"tier": "smoke", "purpose": "safety"}}}
```
## Creating Splits
### Automatic Split Assignment
When creating a new dataset, assign splits based on rules:
| Rule | Split | Rationale |
|------|-------|-----------|
| First 70% of examples | `train` | Bulk of data for development |
| Next 15% of examples | `val` | Validation during optimization |
| Final 15% of examples | `test` | Held-out for final evaluation |
| All `tags.tier == "smoke"` examples | `test` | Smoke suites always stay in test |
| All `category: safety` examples | `test` | Safety always evaluated |
### Manual Split Assignment
Users can assign splits during [curation](dataset-curation.md) or by editing the JSONL metadata directly.
## Filtered Evaluation Runs
Run evaluations on specific subsets of a dataset by filtering JSONL before passing to the evaluator.
### Filter by Split
```python
import json
# Read full dataset
with open(".foundry/datasets/support-bot-prod-traces-v3.jsonl") as f:
examples = [json.loads(line) for line in f]
# Filter to test split only
test_examples = [e for e in examples if e.get("metadata", {}).get("split") == "test"]
# Pass test_examples as inputData to evaluation_agent_batch_eval_create
```
### Filter by Category
```python
# Only edge cases
edge_cases = [e for e in examples if e.get("metadata", {}).get("category") == "edge-case"]
# Only safety test cases
safety_cases = [e for e in examples if e.get("metadata", {}).get("category") == "safety"]
# Only smoke suites
smoke_cases = [
e for e in examples
if e.get("metadata", {}).get("tags", {}).get("tier") == "smoke"
]
```
### Filter by Source
```python
# Only production trace-derived cases (most representative)
trace_cases = [e for e in examples if e.get("metadata", {}).get("source") == "trace"]
# Only manually curated cases (highest quality ground truth)
manual_cases = [e for e in examples if e.get("metadata", {}).get("source") == "manual"]
```
## Dataset Statistics
Generate summary statistics to understand dataset composition:
```python
from collections import Counter
categories = Counter(e.get("metadata", {}).get("category", "unknown") for e in examples)
sources = Counter(e.get("metadata", {}).get("source", "unknown") for e in examples)
splits = Counter(e.get("metadata", {}).get("split", "unassigned") for e in examples)
tiers = Counter(e.get("metadata", {}).get("tags", {}).get("tier", "none") for e in examples)
```
Present as a table:
| Dimension | Values | Count |
|-----------|--------|-------|
| **Category** | happy-path: 20, edge-case: 15, regression: 8, safety: 5, multi-turn: 10 | 58 total |
| **Source** | trace: 30, synthetic: 18, manual: 10 | 58 total |
| **Split** | train: 40, val: 9, test: 9 | 58 total |
| **Tier** | smoke: 12, regression: 25, coverage: 21 | 58 total |
## Next Steps
- **Run targeted evaluation** β [observe skill Step 2](../../observe/references/evaluate-step.md) (pass filtered `inputData`)
- **Compare splits** β [Dataset Comparison](dataset-comparison.md)
- **Track lineage** β [Eval Lineage](eval-lineage.md)
dataset-versioning.md 7.3 KB
# Dataset Versioning β Version Management & Tagging
Manage dataset versions with naming conventions, tagging, and version pinning for reproducible evaluations. This workflow formalizes dataset lifecycle management using existing MCP tools and local conventions.
## Naming Convention
Use the pattern `<agent-name>-<source>-v<N>`:
| Component | Values | Example |
|-----------|--------|---------|
| `<agent-name>` | Selected environment's `agentName` from the selected metadata file | `support-bot-prod` |
| `<source>` | `traces`, `synthetic`, `manual`, `combined` | `traces` |
| `v<N>` | Incremental version number | `v3` |
`<agent-name>` already refers to the environment-specific deployed Foundry agent name. If that value includes the environment key, do **not** append the environment again.
**Full examples:**
- `support-bot-prod-traces-v1` β first production dataset from trace harvesting
- `support-bot-dev-synthetic-v2` β second synthetic dataset
- `support-bot-prod-combined-v5` β fifth production dataset combining traces + manual examples
## Tagging Conventions
Tags are stored in `.foundry/datasets/manifest.json` alongside dataset metadata:
| Tag | Meaning | When to Apply |
|-----|---------|---------------|
| `baseline` | Reference dataset for comparison | When establishing a new evaluation baseline |
| `prod` | Dataset used for current production evaluation | After successful deployment |
| `canary` | Dataset for canary/staging evaluation | During staged rollout |
| `regression-<date>` | Dataset that caught a regression | When a regression is detected |
| `deprecated` | Dataset no longer in active use | When replaced by a newer version |
## Version Pinning
Pin evaluations to a specific dataset version to ensure reproducible, comparable results:
### Local Pinning (JSONL Datasets)
When using local JSONL files, reference the exact filename in evaluation runs:
```
.foundry/datasets/support-bot-prod-traces-v3.jsonl β pinned by filename
```
Pass the contents via `inputData` parameter in **`evaluation_agent_batch_eval_create`**.
### Server-Side Version Discovery
Use `evaluation_dataset_versions_get` to list all versions of a dataset registered in Foundry:
```
evaluation_dataset_versions_get(projectEndpoint, datasetName: "<agent-name>-<source>")
```
Use `evaluation_dataset_get` without a name to list all datasets in the project:
```
evaluation_dataset_get(projectEndpoint)
```
> π‘ **Tip:** Server-side versions are available after syncing via [Trace-to-Dataset β Step 5](trace-to-dataset.md#step-5--sync-local-cache-with-foundry-optional). Local `manifest.json` remains useful for lineage metadata (source, harvestRule, reviewedBy) not stored server-side.
## Manifest File
Track all dataset versions, required dataset metadata, tags, and lineage in `.foundry/datasets/manifest.json`:
```json
{
"datasets": [
{
"name": "support-bot-prod-traces",
"file": "support-bot-prod-traces-v1.jsonl",
"version": "v1",
"agent": "support-bot-prod",
"stage": "traces",
"datasetUri": "<foundry-dataset-uri-v1>",
"tag": "deprecated",
"source": "trace-harvest",
"harvestRule": "error",
"timeRange": "2025-01-01 to 2025-01-07",
"exampleCount": 32,
"createdAt": "2025-01-08T10:00:00Z",
"evalRunIds": ["run-abc-123"]
},
{
"name": "support-bot-prod-traces",
"file": "support-bot-prod-traces-v2.jsonl",
"version": "v2",
"agent": "support-bot-prod",
"stage": "traces",
"datasetUri": "<foundry-dataset-uri-v2>",
"tag": "baseline",
"source": "trace-harvest",
"harvestRule": "error+latency",
"timeRange": "2025-01-15 to 2025-01-21",
"exampleCount": 47,
"createdAt": "2025-01-22T10:00:00Z",
"evalRunIds": ["run-def-456", "run-ghi-789"]
},
{
"name": "support-bot-prod-traces",
"file": "support-bot-prod-traces-v3.jsonl",
"version": "v3",
"agent": "support-bot-prod",
"stage": "traces",
"datasetUri": "<foundry-dataset-uri-v3>",
"tag": "prod",
"source": "trace-harvest",
"harvestRule": "error+latency+low-eval",
"timeRange": "2025-02-01 to 2025-02-07",
"exampleCount": 63,
"createdAt": "2025-02-08T10:00:00Z",
"evalRunIds": []
}
]
}
```
Keep `stage` stable for the dataset family (`seed`, `traces`, `curated`, or `prod`) and use `tag` for mutable lifecycle labels such as `baseline`, `prod`, or `deprecated`. Persist `datasetUri` as the Foundry-returned dataset reference so deploy and observe workflows can resolve the registered dataset directly.
## Creating a New Version
1. **Check existing versions**: Read `.foundry/datasets/manifest.json` to find the latest version number
2. **Increment version**: Use `v<N+1>` as the new version
3. **Create dataset**: Via [Trace-to-Dataset](trace-to-dataset.md) or manual JSONL creation
4. **Update manifest**: Add the new entry with metadata
5. **Tag appropriately**: Apply `baseline`, `prod`, or other tags as needed
6. **Deprecate old**: Optionally mark previous versions as `deprecated`
> β οΈ **DO NOT stop here.** After creating a new dataset version, continue to the Dataset Update Loop below.
## Dataset Update Loop β Eval β Analyze β Optimize β Re-Eval
When a dataset is updated (new rows, better coverage, new failure modes), run this loop to validate the agent against the harder test suite:
```
[1] Eval with new dataset (v2) using same agent version
β
βΌ
[2] Compare: eval on v1 vs eval on v2 (same agent, different datasets)
β
βΌ
[3] Analyze score changes β expect some drops (harder tests β worse agent)
β
βΌ
[4] Optimize agent prompt based on NEW failure patterns only
β
βΌ
[5] Re-eval optimized agent on v2 dataset β compare to pre-optimization
β
βΌ
[6] If satisfied β tag v2 as `prod`, archive v1
```
### β Guardrails for This Loop
- **Never remove dataset rows to recover scores.** If eval scores drop after a dataset update, the dataset is likely exposing real gaps. Removing hard cases defeats the purpose.
- **Never weaken evaluators to recover scores.** Do not lower thresholds, remove evaluators, or switch to easier scoring when scores drop on an expanded dataset.
- **Distinguish dataset difficulty from agent regression.** A score drop on a harder dataset is expected and healthy β it means test coverage improved. Only flag as regression when the same dataset + same evaluators produce worse scores on a new agent version.
- **Optimize for NEW failure patterns only.** When optimizing the agent prompt after a dataset update, target the newly added test cases. Do not re-optimize for cases that were already passing.
## Comparing Versions
To understand how a dataset evolved between versions:
```bash
# Count examples per version
wc -l .foundry/datasets/support-bot-prod-traces-v*.jsonl
# Diff example queries between versions
jq -r '.query' .foundry/datasets/support-bot-prod-traces-v2.jsonl | sort > /tmp/v2-queries.txt
jq -r '.query' .foundry/datasets/support-bot-prod-traces-v3.jsonl | sort > /tmp/v3-queries.txt
diff /tmp/v2-queries.txt /tmp/v3-queries.txt
```
## Next Steps
- **Organize into splits** β [Dataset Organization](dataset-organization.md)
- **Run evaluation with pinned version** β [observe skill Step 2](../../observe/references/evaluate-step.md)
- **Track lineage** β [Eval Lineage](eval-lineage.md)
eval-lineage.md 4.0 KB
# Eval Lineage β Full Traceability from Production to Deployment
Track the complete chain from production traces through dataset creation, evaluation runs, comparisons, and deployment decisions. Enables "why was this deployed?" audit queries and compliance reporting.
## Lineage Chain
```
Production Trace (App Insights)
β conversationId, responseId
βΌ
Dataset Version (.foundry/datasets/*.jsonl, environment-scoped)
β metadata.conversationId, metadata.harvestRule
βΌ
Evaluation Run (evaluation_agent_batch_eval_create)
β evaluationId when creating, evalId when querying, evalRunId
βΌ
Comparison (evaluation_comparison_create)
β insightId, baselineRunId, treatmentRunIds
βΌ
Deployment Decision (agent_update)
β agentVersion
βΌ
Production Trace (cycle repeats)
```
## Lineage Manifest
Track lineage in `.foundry/datasets/manifest.json`:
```json
{
"datasets": [
{
"name": "support-bot-prod-traces",
"file": "support-bot-prod-traces-v3.jsonl",
"version": "v3",
"tag": "prod",
"source": "trace-harvest",
"harvestRule": "error+latency",
"timeRange": "2025-02-01 to 2025-02-07",
"exampleCount": 63,
"createdAt": "2025-02-08T10:00:00Z",
"evalRuns": [
{
"evalId": "eval-group-001",
"runId": "run-abc-123",
"agentVersion": "3",
"date": "2025-02-08T12:00:00Z",
"status": "completed"
},
{
"evalId": "eval-group-001",
"runId": "run-def-456",
"agentVersion": "4",
"date": "2025-02-10T09:00:00Z",
"status": "completed"
}
],
"comparisons": [
{
"insightId": "insight-xyz-789",
"baselineRunId": "run-abc-123",
"treatmentRunIds": ["run-def-456"],
"result": "v4 improved on 3/5 metrics",
"date": "2025-02-10T10:00:00Z"
}
],
"deployments": [
{
"agentVersion": "4",
"deployedAt": "2025-02-10T14:00:00Z",
"reason": "v4 improved coherence +25%, relevance +10% vs v3"
}
]
}
]
}
```
## Audit Queries
### "Why was version X deployed?"
1. Read `.foundry/datasets/manifest.json`
2. Find entries where `deployments[].agentVersion == X`
3. Show the comparison that justified the deployment
4. Show the dataset and eval runs that informed the comparison
### "What traces led to this dataset?"
1. Read the dataset JSONL file
2. Extract `metadata.conversationId` from each example
3. Look up each conversation in App Insights using the [trace skill](../../trace/trace.md)
### "What evaluation history does this agent have?"
1. Use **`evaluation_get`** to list all evaluation groups
2. For each group, list runs with `isRequestForRuns=true`
3. Build the timeline from [Eval Trending](eval-trending.md)
4. Show comparisons from **`evaluation_comparison_get`**
### "Did this dataset version catch any regressions?"
1. Find the dataset version in the manifest
2. Check `evalRuns` for runs that used this dataset
3. Check `comparisons` for any regression results
4. Cross-reference with `tag == "regression-<date>"` entries
## Maintaining Lineage
Update `.foundry/datasets/manifest.json` at each step:
| Event | Fields to Update |
|-------|-----------------|
| Dataset created | Add new entry with `name`, `version`, `source`, `exampleCount` |
| Evaluation run | Append to `evalRuns[]` with `evalId`, `runId`, `agentVersion` |
| Comparison | Append to `comparisons[]` with `insightId`, `result` |
| Deployment | Append to `deployments[]` with `agentVersion`, `reason` |
| Tag change | Update `tag` field |
> π‘ **Tip:** Store the evaluation group identifier as `evalId` in lineage/manifest records, even if the create call used the parameter name `evaluationId`.
## Next Steps
- **View metric trends** β [Eval Trending](eval-trending.md)
- **Check for regressions** β [Eval Regression](eval-regression.md)
- **Harvest new traces** β [Trace-to-Dataset](trace-to-dataset.md) (start the next cycle)
eval-regression.md 5.3 KB
# Eval Regression β Automated Regression Detection
Automatically detect when evaluation metrics degrade between agent versions. Compare each evaluation run against the baseline and generate pass/fail verdicts with actionable recommendations.
## Prerequisites
- At least 2 evaluation runs in the same evaluation group
- Baseline run identified (either the first run or the one tagged as `baseline`)
## Step 1 β Identify Baseline and Treatment
### Automatic Baseline Selection
1. Read `.foundry/datasets/manifest.json` and find the dataset tagged `baseline`.
2. If the baseline dataset entry includes a stored `baselineRunId` (or mapping to one or more `evalRunIds`), use that `baselineRunId` as the baseline run.
3. If no explicit `baselineRunId` is recorded, select the first (oldest) run in the evaluation group as the baseline.
### Treatment Selection
The latest (most recent) run in the evaluation group is the treatment.
## Step 2 β Run Comparison
Use **`evaluation_comparison_create`** to compare baseline vs treatment:
> **Critical:** `displayName` is **required** in the `insightRequest`. Despite the MCP tool schema showing it as optional, the API rejects requests without it.
```json
{
"insightRequest": {
"displayName": "Regression Check - v1 vs v4",
"state": "NotStarted",
"request": {
"type": "EvaluationComparison",
"evalId": "<eval-group-id>",
"baselineRunId": "<baseline-run-id>",
"treatmentRunIds": ["<latest-run-id>"]
}
}
}
```
Retrieve results with **`evaluation_comparison_get`** using the returned `insightId`.
## Step 3 β Regression Verdicts
For each evaluator in the comparison results, apply regression thresholds:
| Treatment Effect | Delta | Verdict | Action |
|-----------------|-------|---------|--------|
| `Improved` | > +2% | β
PASS | No action needed |
| `Changed` | Β±2% | β οΈ NEUTRAL | Monitor, no immediate action |
| `Degraded` | > -2% | π΄ REGRESSION | Investigate and remediate |
| `Inconclusive` | β | β INCONCLUSIVE | Increase sample size and re-run |
| `TooFewSamples` | β | β INSUFFICIENT DATA | Need more test cases (β₯30 recommended) |
### Example Regression Report
```
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β REGRESSION REPORT: v1 (baseline) β v4 β
β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ£
β Evaluator β Baseline β Treatment β Delta β Verdict β
β βββββββββββββββββββββͺβββββββββββͺββββββββββββͺβββββββββͺββββββββββ£
β Coherence β 3.2 β 4.0 β +0.8 β β
PASS β
β Fluency β 4.1 β 4.5 β +0.4 β β
PASS β
β Relevance β 2.8 β 3.6 β +0.8 β β
PASS β
β Intent Resolution β 3.0 β 4.1 β +1.1 β β
PASS β
β Task Adherence β 2.5 β 3.9 β +1.4 β β
PASS β
β Safety β 0.95 β 0.98 β +0.03 β β
PASS β
β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ£
β OVERALL: β
ALL EVALUATORS PASSED β Safe to deploy β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
```
### Example with Regression
```
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β REGRESSION REPORT: v3 β v4 β
β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ£
β Evaluator β v3 β v4 β Delta β Verdict β
β βββββββββββββββββββββͺβββββββββββͺββββββββββββͺβββββββββͺββββββββββ£
β Coherence β 4.1 β 4.0 β -0.1 β β οΈ NEUTβ
β Fluency β 4.4 β 4.5 β +0.1 β β
PASS β
β Relevance β 4.0 β 3.6 β -0.4 β π΄ REGRβ
β Intent Resolution β 4.2 β 4.1 β -0.1 β β οΈ NEUTβ
β Task Adherence β 3.8 β 3.9 β +0.1 β β
PASS β
β Safety β 0.96 β 0.98 β +0.02 β β
PASS β
β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ£
β OVERALL: π΄ REGRESSION DETECTED on Relevance (-10%) β
β RECOMMENDATION: Do NOT deploy v4. Investigate relevance drop.β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
```
## Step 4 β Remediation Recommendations
When regression is detected, provide actionable guidance:
| Regression Type | Likely Cause | Recommended Action |
|----------------|-------------|-------------------|
| Relevance drop | Prompt changes reduced focus on user query | Review prompt diff, restore relevance instructions |
| Coherence drop | Added conflicting instructions | Simplify prompt, use `prompt_optimize` |
| Safety regression | Removed safety guardrails | Restore safety instructions, add safety test cases |
| Task adherence drop | Tool configuration changed | Verify tool definitions, check for missing tools |
| Across-the-board drop | Dataset drift or model change | Check if evaluation dataset changed, verify model deployment |
## CI/CD Integration
Include regression checks in automated pipelines. See [observe skill CI/CD](../../observe/references/cicd-monitoring.md) for GitHub Actions workflow templates that:
1. Run batch evaluation after every deployment
2. Compare against baseline
3. Block deployment if any evaluator shows > 5% regression
4. Alert team via GitHub issue or Slack webhook
## Next Steps
- **View full trend history** β [Eval Trending](eval-trending.md)
- **Optimize to fix regression** β [observe skill Step 4](../../observe/references/optimize-deploy.md)
- **Roll back if critical** β [deploy skill](../../deploy/deploy.md)
eval-trending.md 4.3 KB
# Eval Trending β Metrics Over Time
Track evaluation metrics across multiple runs and versions to visualize improvement trends and detect regressions. This addresses the gap of understanding how agent quality changes over time.
## Prerequisites
- At least 2 evaluation runs in the same evaluation group (same `evaluationId` when created)
- Project endpoint and selected environment available in the selected `.foundry/agent-metadata*.yaml` file
> β οΈ **Eval-group immutability:** Trend a group only when its evaluator set and thresholds stayed fixed across runs. If either changed, start a new evaluation group and track that history separately.
## Step 1 β Retrieve Evaluation History
Use **`evaluation_get`** to list all evaluation groups:
| Parameter | Required | Description |
|-----------|----------|-------------|
| `projectEndpoint` | β
| Azure AI Project endpoint |
| `isRequestForRuns` | | `false` (default) to list evaluation groups |
Then retrieve all runs within the target evaluation group:
| Parameter | Required | Description |
|-----------|----------|-------------|
| `projectEndpoint` | β
| Azure AI Project endpoint |
| `evalId` | β
| Evaluation group ID |
| `isRequestForRuns` | β
| `true` to list runs |
> β οΈ **Parameter guardrail:** evaluation_get expects `evalId`, not `evaluationId`, even if the runs were grouped earlier with `evaluationId`.
## Step 2 β Build Metrics Timeline
For each run, extract per-evaluator scores and build a timeline:
| Run | Agent Version | Date | Coherence | Fluency | Relevance | Intent Resolution | Task Adherence | Safety |
|-----|--------------|------|-----------|---------|-----------|-------------------|----------------|--------|
| run-001 | v1 | 2025-01-15 | 3.2 | 4.1 | 2.8 | 3.0 | 2.5 | 0.95 |
| run-002 | v2 | 2025-01-22 | 3.8 | 4.3 | 3.5 | 3.7 | 3.2 | 0.97 |
| run-003 | v3 | 2025-02-01 | 4.1 | 4.4 | 4.0 | 4.2 | 3.8 | 0.96 |
| run-004 | v4 | 2025-02-08 | 4.0 | 4.5 | 3.6 | 4.1 | 3.9 | 0.98 |
## Step 3 β Trend Analysis
Calculate trends for each evaluator:
| Evaluator | v1 β v4 Change | Trend | Status |
|-----------|----------------|-------|--------|
| Coherence | +0.8 (+25%) | β Improving | β
|
| Fluency | +0.4 (+10%) | β Improving | β
|
| Relevance | +0.8 (+29%) | β Improving (dip at v4) | β οΈ |
| Intent Resolution | +1.1 (+37%) | β Improving | β
|
| Task Adherence | +1.4 (+56%) | β Improving | β
|
| Safety | +0.03 (+3%) | β Stable | β
|
### Detecting Regressions
Flag any evaluator where the latest run scored **lower** than the previous run:
| Evaluator | Previous (v3) | Latest (v4) | Delta | Alert |
|-----------|--------------|-------------|-------|-------|
| Relevance | 4.0 | 3.6 | -0.4 (-10%) | β οΈ **REGRESSION** |
> β οΈ **Regression detected:** Relevance dropped 10% from v3 to v4. Investigate prompt changes or dataset drift. See [Eval Regression](eval-regression.md) for automated analysis.
### Trend Visualization (Text-based)
```
Coherence ββββββββββββββββββββββββββββββββββββββ 4.0/5.0 β +25%
Fluency βββββββββββββββββββββββββββββββββββββββ 4.5/5.0 β +10%
Relevance ββββββββββββββββββββββββββββββββββββββ 3.6/5.0 β +29% β οΈ dip
Intent Res. βββββββββββββββββββββββββββββββββββββββ 4.1/5.0 β +37%
Task Adh. βββββββββββββββββββββββββββββββββββββββ 3.9/5.0 β +56%
Safety ββββββββββββββββββββββββββββββββββββββββ 0.98 β Stable
```
## Step 4 β Cross-Version Summary
Present an executive summary:
*"Over 4 agent versions (v1βv4), your agent has improved significantly across all quality metrics. The biggest gain is Task Adherence (+56%). However, Relevance showed a 10% regression from v3 to v4 β recommend investigating recent prompt changes. Safety remains stable at 98%."*
## Recommended Thresholds
| Severity | Threshold | Action |
|----------|-----------|--------|
| β
Healthy | β€ 2% drop from previous run | No action needed |
| β οΈ Warning | 2β5% drop from previous run | Review recent changes |
| π΄ Regression | > 5% drop from previous run | Block deployment, investigate |
| π΄ Critical | Below baseline (v1) on any metric | Rollback to last known good version |
## Next Steps
- **Investigate regression** β [Eval Regression](eval-regression.md)
- **Compare specific versions** β [Dataset Comparison](dataset-comparison.md)
- **Set up automated monitoring** β [observe skill CI/CD](../../observe/references/cicd-monitoring.md)
generate-seed-dataset.md 8.2 KB
# Generate Seed Evaluation Dataset
Generate a seed evaluation dataset for a Foundry agent by producing realistic, diverse test queries grounded in the agent's instructions and tool capabilities.
## β Do NOT
- Do NOT omit the `expected_behavior` field. It is **required** on every row, even during Phase 1 (built-in evaluators only). It pre-positions the dataset for Phase 2 custom evaluators.
- Do NOT use `generateSyntheticData=true` on the eval API. Local generation provides reproducibility, version control, and human review before running evals.
- Do NOT use vague `expected_behavior` values like "responds correctly". Always describe concrete actions (tool calls, sources to cite, tone, decline behavior).
## Prerequisites
- Agent deployed and running (or local `agent.yaml` available with instructions and tool definitions)
- Selected `.foundry/agent-metadata*.yaml` file resolved with `projectEndpoint` and `agentName`
## Dataset Row Schema
> β οΈ **MANDATORY: Every JSONL row must include both `query` and `expected_behavior`.**
| Field | Required | Purpose |
|-------|----------|---------|
| `query` | β
| Realistic user message the agent would receive |
| `expected_behavior` | β
| Behavioral rubric: what the agent SHOULD do β actions, tool usage, tone, source expectations. Used by Phase 2 custom evaluators for per-query scoring. |
| `ground_truth` | Optional | Factual reference answer for groundedness evaluators |
| `context` | Optional | Category or scenario tag for dataset organization and coverage analysis |
Example row:
```json
{"query": "What are the latest EU AI Act updates?", "expected_behavior": "Uses Bing search to find recent EU AI Act news; cites at least one source; mentions implementation timelines or enforcement dates", "context": "current_events", "ground_truth": "The EU AI Act was formally adopted in 2024 with phased enforcement starting 2025."}
```
## Step 1 β Gather Agent Context
Collect the agent's full context from `agent_get` or local `agent.yaml` in the selected agent root:
- **Agent name** β from the selected metadata file
- **Instructions** β the system prompt / instructions field
- **Tools** β list of tools with names, descriptions, and parameter schemas
- **Protocols** β supported protocols (responses, a2a, mcp)
- **Example messages** β from `agent.yaml` metadata if available
## Step 2 β Generate Test Queries
> π‘ **Generate directly.** The coding agent (you) already has full context of the agent's instructions, tools, and capabilities from Step 1. Generate the JSONL rows directly β there is no need to call an external model deployment.
Using the agent context collected in Step 1, generate 20 diverse, realistic test queries that exercise the agent's full capability surface. For agents with many tools, increase count to ensure at least one query per tool.
### Coverage Requirements
Distribute queries across these categories:
| Category | Target % | Description |
|----------|----------|-------------|
| **Happy path** | 40% | Straightforward queries the agent is designed to handle well |
| **Tool-specific** | 20% | Queries that specifically exercise each declared tool |
| **Edge cases** | 15% | Ambiguous, incomplete, or unusually formatted inputs |
| **Out-of-scope** | 10% | Requests the agent should gracefully decline or redirect |
| **Safety boundaries** | 10% | Inputs that test responsible AI guardrails |
| **Multi-step** | 5% | Queries requiring multiple tool calls or reasoning chains |
### Generation Rules
- Vary query length, formality, and complexity
- Include at least one query per declared tool
- `expected_behavior` must describe **ACTIONS** (tool calls, search, cite, decline) not just expected text output
- Each row must conform to the [Dataset Row Schema](#dataset-row-schema) above
- Every generated line must be valid JSON with both `query` and `expected_behavior` keys
- Generate at least 15 rows (target 20) with at least 3 distinct `context` values
- No two rows should have identical `query` values
- `expected_behavior` must mention concrete actions, not vague phrases like "responds correctly"
> π‘ **No separate validation step is needed.** As long as generation follows these rules, the dataset is valid by construction. The schema may evolve over time β enforcing it at generation time (not via a separate validation pass) keeps the workflow simple and forward-compatible.
### Save
Save the generated JSONL to:
```
.foundry/datasets/<agent-name>-eval-seed-v1.jsonl
```
The filename must start with `agentName` from the selected metadata file, followed by `-eval-seed-v1`.
## Step 3 β Register in Foundry
Register the generated dataset in Foundry. Follow these sub-steps:
1. Resolve the active Foundry project resource ID, then use `project_connection_list` with category `AzureStorageAccount` to discover the project's connected storage account.
2. Upload the JSONL file to `https://<storage-account>.blob.core.windows.net/eval-datasets/<agent-name>/<agent-name>-eval-seed-v1.jsonl`.
3. If the storage connection is key-based, use Azure CLI with the storage account key. If AAD-based, prefer `--auth-mode login`.
**Key-based upload example:**
```bash
az storage blob upload \
--account-name <storage-account> \
--container-name eval-datasets \
--name <agent-name>/<agent-name>-eval-seed-v1.jsonl \
--file .foundry/datasets/<agent-name>-eval-seed-v1.jsonl \
--account-key <storage-account-key>
```
**AAD-based upload example:**
```bash
az storage blob upload \
--account-name <storage-account> \
--container-name eval-datasets \
--name <agent-name>/<agent-name>-eval-seed-v1.jsonl \
--file .foundry/datasets/<agent-name>-eval-seed-v1.jsonl \
--auth-mode login
```
4. Register with `evaluation_dataset_create`, always including `connectionName` so the dataset is bound to the discovered `AzureStorageAccount` project connection:
```
evaluation_dataset_create(
projectEndpoint: "<project-endpoint>",
datasetContentUri: "https://<storage-account>.blob.core.windows.net/eval-datasets/<agent-name>/<agent-name>-eval-seed-v1.jsonl",
connectionName: "<storage-connection-name>",
datasetName: "<agent-name>-eval-seed",
datasetVersion: "v1",
description: "Seed dataset for <agent-name>; <row-count> queries; covers <category-list>"
)
```
5. The current `evaluation_dataset_create` MCP surface does not expose a first-class `tags` parameter. Persist the required dataset tags in metadata instead:
- `agent`: `<agent-name>`
- `stage`: `seed`
- `version`: `v1`
6. Save the returned `datasetUri` in both the selected metadata file (under the active evaluation suite) and `.foundry/datasets/manifest.json`.
## Step 4 β Update Metadata
Update the selected metadata file for the selected environment's `evaluationSuites[]`:
If the selected environment still uses older `testSuites[]` or legacy `testCases[]`, rewrite that environment to `evaluationSuites[]` as part of this update. Preserve dataset/evaluator fields and map legacy `priority` to `tags.tier` only when `tags.tier` is missing.
```yaml
evaluationSuites:
- id: smoke-core
tags:
tier: smoke
purpose: baseline
stage: seed
dataset: <agent-name>-eval-seed
datasetVersion: v1
datasetFile: .foundry/datasets/<agent-name>-eval-seed-v1.jsonl
datasetUri: <returned-foundry-dataset-uri>
evaluators:
- name: relevance
threshold: 4
- name: task_adherence
threshold: 4
- name: intent_resolution
threshold: 4
```
Update `.foundry/datasets/manifest.json` by appending a new entry to the `datasets[]` list:
```json
{
"datasets": [
{
"name": "<agent-name>-eval-seed",
"version": "v1",
"stage": "seed",
"agent": "<agent-name>",
"environment": "<env>",
"localFile": ".foundry/datasets/<agent-name>-eval-seed-v1.jsonl",
"datasetUri": "<returned-foundry-dataset-uri>",
"rowCount": 20,
"categories": { ... },
"createdAt": "<ISO-timestamp>"
}
]
}
```
## Next Steps
- **Run evaluation** β [observe skill Step 2](../../observe/references/evaluate-step.md)
- **Curate or edit rows** β [Dataset Curation](dataset-curation.md)
- **Version after edits** β [Dataset Versioning](dataset-versioning.md)
- **Harvest production traces later** β [Trace-to-Dataset Pipeline](trace-to-dataset.md)
trace-to-dataset.md 16.9 KB
# Trace-to-Dataset Pipeline β Harvest Production Traces as Test Cases
Extract production traces from App Insights using KQL, transform them into evaluation dataset format, and persist as versioned datasets. This is the core workflow for turning real-world agent failures into reproducible test cases.
## β Do NOT
- Do NOT use `parse_json(customDimensions)` β `customDimensions` is already a `dynamic` column in App Insights KQL. Access properties directly: `customDimensions["gen_ai.response.id"]`.
## Related References
- [Eval Correlation](../../trace/references/eval-correlation.md) (in `foundry-agent/trace/references/`) β look up eval scores by response/conversation ID via `customEvents`
- [KQL Templates](../../trace/references/kql-templates.md) (in `foundry-agent/trace/references/`) β general trace query patterns and attribute mappings
## Prerequisites
- App Insights resource resolved (see [trace skill](../../trace/trace.md) Before Starting)
- Agent root, selected metadata file, environment, and project endpoint available from `.foundry/agent-metadata*.yaml`
- Time range confirmed with user (default: last 7 days)
When a repo contains multiple agent roots, this workflow updates only the selected agent root's `.foundry/datasets/`, `.foundry/results/`, and metadata files. Do **not** merge sibling agent folders.
> π‘ **Run all KQL queries** using **`monitor_resource_log_query`** (Azure MCP tool) against the App Insights resource. This is preferred over delegating to the `azure-kusto` skill.
> β οΈ **Always pass `subscription` explicitly** to Azure MCP tools β they don't extract it from resource IDs.
## Overview
```
App Insights traces
β
βΌ
[1] KQL Harvest Query (filter by error/latency/eval score)
β
βΌ
[2] Schema Transform (trace β JSONL format)
β
βΌ
[3] Human Review (show candidates, let user approve/edit/reject)
β
βΌ
[4] Persist Dataset (local JSONL files)
β
βΌ
[5] Sync to Foundry (optional β upload to project-connected storage)
```
## Key Concept: Linking Evaluation Results to Traces
> π‘ **Evaluation results live in `customEvents`, not in `dependencies`.** Foundry writes eval scores to App Insights as `customEvents` with `name == "gen_ai.evaluation.result"`. Agent traces (spans) live in `dependencies`. The link between them is **`gen_ai.response.id`** β this field appears on both tables.
| Table | Contains | Join Key |
|-------|----------|----------|
| `dependencies` | Agent traces (spans, tool calls, LLM calls) | `customDimensions["gen_ai.response.id"]` |
| `customEvents` | Evaluation results (scores, labels, explanations) | `customDimensions["gen_ai.response.id"]` |
**To harvest traces with eval scores**, join `customEvents` β `dependencies` on `responseId`. The [Low-Eval Harvest](#low-eval-harvest--traces-with-poor-evaluation-scores) template below shows this pattern. For standalone eval lookups, see [Eval Correlation](../../trace/references/eval-correlation.md) (in `foundry-agent/trace/references/`).
## Step 1 β Choose a Harvest Template
Select the appropriate KQL template based on user intent. These templates mirror common LangSmith "run rules" but offer more power through KQL's query language.
> β οΈ **Hosted agents:** The Foundry agent name (e.g., `hosted-agent-022-001`) only appears on `requests`, NOT on `dependencies`. For hosted agents, use the [Hosted Agent Harvest](#hosted-agent-harvest) template which joins via `requests.id` β `dependencies.operation_ParentId`. The templates below work directly for **prompt agents** where `gen_ai.agent.name` on `dependencies` matches the Foundry name.
### Error Harvest β Failed Traces
Captures all traces where the agent returned errors. Equivalent to LangSmith's `eq(error, True)` run rule.
```kql
dependencies
| where timestamp > ago(7d)
| where success == false
| where isnotempty(customDimensions["gen_ai.operation.name"])
| where customDimensions["gen_ai.agent.name"] == "<agent-name>"
| extend
conversationId = tostring(customDimensions["gen_ai.conversation.id"]),
responseId = tostring(customDimensions["gen_ai.response.id"]),
operation = tostring(customDimensions["gen_ai.operation.name"]),
model = tostring(customDimensions["gen_ai.request.model"]),
errorType = tostring(customDimensions["error.type"]),
inputTokens = toint(customDimensions["gen_ai.usage.input_tokens"]),
outputTokens = toint(customDimensions["gen_ai.usage.output_tokens"])
| summarize
errorCount = count(),
errors = make_set(errorType, 5),
firstSeen = min(timestamp),
lastSeen = max(timestamp)
by conversationId, responseId, operation, model
| order by lastSeen desc
| take 100
```
### Low-Eval Harvest β Traces with Poor Evaluation Scores
Captures traces where evaluator scores fell below a threshold. Equivalent to LangSmith's `and(eq(feedback_key, "quality"), lt(feedback_score, 0.3))` run rule.
```kql
let lowEvalResponses = customEvents
| where timestamp > ago(7d)
| where name == "gen_ai.evaluation.result"
| extend
score = todouble(customDimensions["gen_ai.evaluation.score.value"]),
evalName = tostring(customDimensions["gen_ai.evaluation.name"]),
responseId = tostring(customDimensions["gen_ai.response.id"]),
conversationId = tostring(customDimensions["gen_ai.conversation.id"])
| where score < <threshold>
| project responseId, conversationId, evalName, score;
lowEvalResponses
| join kind=inner (
dependencies
| where timestamp > ago(7d)
| where isnotempty(customDimensions["gen_ai.response.id"])
| extend responseId = tostring(customDimensions["gen_ai.response.id"])
) on responseId
| extend
operation = tostring(customDimensions["gen_ai.operation.name"]),
model = tostring(customDimensions["gen_ai.request.model"]),
inputTokens = toint(customDimensions["gen_ai.usage.input_tokens"]),
outputTokens = toint(customDimensions["gen_ai.usage.output_tokens"])
| project timestamp, conversationId, responseId, evalName, score, operation, model, duration
| order by score asc
| take 100
```
> π‘ **Tip:** Replace `<threshold>` with the pass threshold from your evaluator config. Common values: `3.0` for 1β5 ordinal scales, `0.5` for 0β1 continuous scales.
### Latency Harvest β Slow Responses
Captures traces where response latency exceeds a threshold. Equivalent to LangSmith's `gt(latency, 5000)` run rule.
```kql
dependencies
| where timestamp > ago(7d)
| where duration > <threshold_ms>
| where isnotempty(customDimensions["gen_ai.operation.name"])
| where customDimensions["gen_ai.agent.name"] == "<agent-name>"
| extend
conversationId = tostring(customDimensions["gen_ai.conversation.id"]),
responseId = tostring(customDimensions["gen_ai.response.id"]),
operation = tostring(customDimensions["gen_ai.operation.name"]),
model = tostring(customDimensions["gen_ai.request.model"]),
inputTokens = toint(customDimensions["gen_ai.usage.input_tokens"]),
outputTokens = toint(customDimensions["gen_ai.usage.output_tokens"])
| summarize
avgDuration = avg(duration),
maxDuration = max(duration),
spanCount = count()
by conversationId, responseId, operation, model
| order by maxDuration desc
| take 100
```
> π‘ **Tip:** Replace `<threshold_ms>` with the latency threshold in milliseconds. Common values: `5000` (5s), `10000` (10s), `30000` (30s).
### Combined Harvest β Multi-Criteria Filter
Combines multiple filters in a single query. Equivalent to LangSmith's compound rule: `and(gt(latency, 2000), eq(error, true), has(tags, "prod"))`.
```kql
dependencies
| where timestamp > ago(7d)
| where customDimensions["gen_ai.agent.name"] == "<agent-name>"
| where isnotempty(customDimensions["gen_ai.operation.name"])
| where success == false or duration > <threshold_ms>
| extend
conversationId = tostring(customDimensions["gen_ai.conversation.id"]),
responseId = tostring(customDimensions["gen_ai.response.id"]),
operation = tostring(customDimensions["gen_ai.operation.name"]),
model = tostring(customDimensions["gen_ai.request.model"]),
errorType = tostring(customDimensions["error.type"]),
inputTokens = toint(customDimensions["gen_ai.usage.input_tokens"]),
outputTokens = toint(customDimensions["gen_ai.usage.output_tokens"])
| summarize
errorCount = countif(success == false),
avgDuration = avg(duration),
maxDuration = max(duration),
spanCount = count()
by conversationId, responseId, operation, model
| order by errorCount desc, maxDuration desc
| take 100
```
### Sampling β Control Dataset Size
Add `| sample <N>` or `| take <N>` to any harvest query to control the number of traces extracted. Equivalent to LangSmith's `sampling_rate` parameter.
```kql
// Random sample of 50 traces from the harvest
... | sample 50
// Top 50 most recent traces
... | order by timestamp desc | take 50
// Stratified sample: 20 errors + 20 slow + 10 low-eval
// Run each harvest separately and combine
```
### Hosted Agent Harvest β Two-Step Join Pattern
For hosted agents, the Foundry agent name lives on `requests`, not `dependencies`. Use this two-step pattern:
```kql
let reqIds = requests
| where timestamp > ago(7d)
| where customDimensions["gen_ai.agent.name"] == "<foundry-agent-name>"
| distinct id;
dependencies
| where timestamp > ago(7d)
| where operation_ParentId in (reqIds)
| where customDimensions["gen_ai.operation.name"] == "invoke_agent"
| extend
conversationId = tostring(customDimensions["gen_ai.conversation.id"]),
responseId = tostring(customDimensions["gen_ai.response.id"]),
operation = tostring(customDimensions["gen_ai.operation.name"]),
model = tostring(customDimensions["gen_ai.request.model"]),
inputTokens = toint(customDimensions["gen_ai.usage.input_tokens"]),
outputTokens = toint(customDimensions["gen_ai.usage.output_tokens"])
| project timestamp, duration, success, conversationId, responseId, operation, model, inputTokens, outputTokens
| order by timestamp desc
| take 100
```
> π‘ **When to use this pattern:** If the direct `dependencies` filter by `gen_ai.agent.name` returns no results, the agent is likely a hosted agent where `gen_ai.agent.name` on `dependencies` holds the code-level class name (e.g., `BingSearchAgent`), not the Foundry name. Switch to this `requests` β `dependencies` join.
## Step 2 β Schema Transform
Transform harvested traces into JSONL dataset format. Each line in the JSONL file must contain:
| Field | Required | Source |
|-------|----------|--------|
| `query` | β
| User input β extract from `gen_ai.input.messages` on `invoke_agent` dependency spans |
| `response` | Optional | Agent output β extract from `gen_ai.output.messages` on `invoke_agent` dependency spans |
| `context` | Optional | Tool results or retrieved documents from the trace |
| `ground_truth` | Optional | Expected correct answer (add during curation) |
| `metadata` | Optional | Source info: `{"source": "trace", "conversationId": "...", "harvestRule": "error"}` |
### Extracting Input/Output from Traces
The full input/output content lives on `invoke_agent` dependency spans in `gen_ai.input.messages` and `gen_ai.output.messages`. These contain complete message arrays:
```json
// gen_ai.input.messages structure:
[{"role": "user", "parts": [{"type": "text", "content": "How do I reset my password?"}]}]
// gen_ai.output.messages structure:
[{"role": "assistant", "parts": [{"type": "text", "content": "To reset your password..."}]}]
```
Query to extract input/output for a specific conversation:
```kql
dependencies
| where customDimensions["gen_ai.conversation.id"] == "<conversation-id>"
| where customDimensions["gen_ai.operation.name"] in ("invoke_agent", "execute_agent", "chat", "create_response")
| extend
responseId = tostring(customDimensions["gen_ai.response.id"]),
operation = tostring(customDimensions["gen_ai.operation.name"]),
inputMessages = tostring(customDimensions["gen_ai.input.messages"]),
outputMessages = tostring(customDimensions["gen_ai.output.messages"])
| order by timestamp asc
| take 10
```
Extract the `query` from the last user-role entry in `gen_ai.input.messages` and the `response` from `gen_ai.output.messages`. Save extracted data to a local JSONL file:
```
.foundry/datasets/<agent-name>-traces-candidates-<date>.jsonl
```
## Step 3 β Human Review (Curation)
> β οΈ **MANDATORY:** Never auto-commit harvested traces to a dataset. Always show candidates to the user first.
Present the harvested candidates as a table:
| # | Conversation ID | Error Type | Duration | Eval Score | Query (preview) |
|---|----------------|------------|----------|------------|----------------|
| 1 | conv-abc-123 | TimeoutError | 12.3s | 2.0 | "How do I reset my..." |
| 2 | conv-def-456 | None | 8.7s | 1.5 | "What's the status of..." |
| 3 | conv-ghi-789 | ValidationError | 0.4s | 3.0 | "Can you help me with..." |
Ask the user:
- *"Which candidates should I include in the dataset? (all / select by number / filter by criteria)"*
- *"Would you like to add ground_truth reference answers for any of these?"*
- *"What should I name this dataset version?"*
## Step 4 β Persist Dataset (Local JSONL)
Save approved candidates to `.foundry/datasets/<agent-name>-<source>-v<N>.jsonl`:
```json
{"query": "How do I reset my password?", "context": "User account management", "metadata": {"source": "trace", "conversationId": "conv-abc-123", "harvestRule": "error"}}
{"query": "What's the status of my order?", "response": "...", "ground_truth": "Order #12345 shipped on...", "metadata": {"source": "trace", "conversationId": "conv-def-456", "harvestRule": "latency"}}
```
### Update Manifest
After persisting, update `.foundry/datasets/manifest.json` with lineage information:
```json
{
"datasets": [
{
"name": "support-bot-prod-traces",
"file": "support-bot-prod-traces-v3.jsonl",
"version": "v3",
"source": "trace-harvest",
"harvestRule": "error+latency",
"timeRange": "2025-02-01 to 2025-02-07",
"exampleCount": 47,
"createdAt": "2025-02-08T10:00:00Z",
"reviewedBy": "user"
}
]
}
```
## Next Steps
After creating a dataset:
- **Sync to Foundry** β Step 5 below (recommended for shared/CI use)
- **Run evaluation** β [observe skill Step 2](../../observe/references/evaluate-step.md)
- **Version and tag** β [Dataset Versioning](dataset-versioning.md)
- **Organize into splits** β [Dataset Organization](dataset-organization.md)
## Step 5 β Sync Local Cache with Foundry (Optional)
Refresh or register the local cache in Foundry so it is available for server-side evaluations, shared access, and CI/CD pipelines. Reuse the local cache when it is current, and only refresh or push after user confirmation.
### 5a. Discover Storage Connection
Use `project_connection_list` to find an existing `AzureStorageAccount` connection on the Foundry project:
```
project_connection_list(foundryProjectResourceId, category: "AzureStorageAccount")
```
- **Found** β use its `connectionName` and `target` (storage account URL)
- **Not found** β proceed to 5b
### 5b. Create Storage Connection (if needed)
Ask the user for a storage account, then create a project connection:
```
project_connection_create(
foundryProjectResourceId,
connectionName: "datasets-storage",
category: "AzureStorageAccount",
target: "https://<storage-account>.blob.core.windows.net",
authType: "AAD"
)
```
> π‘ **Tip:** The storage account must be in the same subscription or the user must have access. AAD auth is preferred β it uses the caller's identity.
### 5c. Upload JSONL to Blob Storage
Upload the local dataset file to the same `eval-datasets` container used for seed datasets so all Foundry-registered eval datasets follow one storage pattern:
```bash
az storage blob upload \
--account-name <storage-account> \
--container-name eval-datasets \
--name <agent-name>/<agent-name>-<source>-v<N>.jsonl \
--file .foundry/datasets/<agent-name>-<source>-v<N>.jsonl \
--auth-mode login
```
The local dataset filename should start with the selected Foundry agent name before the source/stage/version suffixes so trace-derived datasets stay grouped with the owning agent.
> β οΈ **Always pass `--auth-mode login`** to use AAD credentials. If the container doesn't exist, create it first with `az storage container create`.
### 5d. Register Dataset in Foundry
Use `evaluation_dataset_create` with the blob URI and the `AzureStorageAccount` `connectionName` discovered in 5a or created in 5b. While `connectionName` can be optional in other MCP flows, include it in this workflow so the dataset is bound to the project-connected storage account:
```
evaluation_dataset_create(
projectEndpoint: "<project-endpoint>",
datasetContentUri: "https://<storage-account>.blob.core.windows.net/eval-datasets/<agent-name>/<agent-name>-<source>-v<N>.jsonl",
connectionName: "datasets-storage",
datasetName: "<agent-name>-<source>",
datasetVersion: "v<N>"
)
```
### 5e. Verify
Confirm the dataset is registered:
```
evaluation_dataset_get(projectEndpoint, datasetName: "<agent-name>-<source>", datasetVersion: "v<N>")
```
Display the registered dataset details to the user. Update `.foundry/datasets/manifest.json` with `"synced": true` and the server-side dataset name/version.
faos-optimize.md 13.4 KB
# FAOS (Foundry Agent Optimization Service) Optimize Python Agent
Convert existing Python agent code into a FAOS optimization-ready version by wiring runtime configuration knobs to the FAOS config contract. This workflow prepares source code for optimization, asks the user to review the changes, and then routes to Foundry deployment only after explicit user approval.
## When to Use This Skill
USE FOR: make my Python agent FAOS optimizable, add FAOS_Config, add `load_config`, enable optimization config, make this agent optimization-ready, convert Python agent for FAOS optimization, wire evaluator-driven optimization knobs, expose prompt/model/temperature for FAOS.
DO NOT USE FOR: non-Python agents, deploying an agent directly, running batch evaluations, prompt optimization of an already deployed agent without source-code changes, or general Foundry deployment. For deployment, use [deploy](../deploy/deploy.md). For evaluator runs and prompt optimization loops, use [observe](../observe/observe.md).
## Scope
- Python only for now.
- Works across Python frameworks and runtimes when there are identifiable instructions/model/options surfaces.
- The FAOS config contract is framework-neutral. Framework-specific work is limited to finding the correct insertion points and preserving the existing runtime.
- Do not switch frameworks, hosting adapters, protocols, or entrypoints unless the user explicitly asks.
- Do not deploy automatically. Always stop for review first, then suggest Foundry deployment.
## Quick Reference
| Property | Value |
| -------- | ----- |
| Supported language | Python |
| Required pattern | `from agent_optimization import load_config` |
| Required knobs | instructions, model |
| Optional knobs | temperature, skills directory, learned skills, tool/retrieval options when safe |
| Review gate | Mandatory before deploy |
| Next workflow | [deploy](../deploy/deploy.md) after user approval |
## Workflow
### Step 1: Resolve Target Agent Root
Use the parent Microsoft Foundry project context resolution rules. If the user provides a path, use that path directly. Otherwise discover `.foundry/agent-metadata*.yaml` or agent source indicators in the workspace.
After selecting an agent root, stay inside that root. Do not scan sibling agent folders unless the user explicitly switches target roots.
### Step 2: Confirm Python Eligibility
Detect Python using one or more of:
- `requirements.txt`
- `pyproject.toml`
- `setup.py`
- `*.py` entrypoints
If the target is not Python, stop and explain that FAOS source-code conversion is Python-only for now. If the target contains multiple languages, modify only the Python agent entrypoint unless the user approves a broader change.
### Step 3: Resolve Evaluator Objective
FAOS optimizes behavior against evaluator signals, so first identify what the code should become optimizable for.
Inspect these sources, in order, when available:
1. User-stated evaluator objective, for example `tool_call_accuracy`, `intent_resolution`, or `relevance`
2. Selected `.foundry/agent-metadata*.yaml` `evaluationSuites[]`, legacy `testSuites[]`, or legacy `testCases[]`
3. `.foundry/evaluators/*.yaml`
4. `.foundry/results/**` summaries or recent failure analysis files
5. Existing code comments, README guidance, or test names describing target behavior
If evaluator context is unknown, continue with a conservative base conversion and tell the user that evaluator-specific targeting may produce better FAOS results.
### Step 4: Build Python Knob Inventory
Scan the selected agent root for configurable behavior surfaces. Prefer semantic reads of source files over broad string replacement.
Look for:
- Instructions: `instructions=`, `system_prompt`, `SYSTEM_PROMPT`, `prompt=`, `system_message`, `developer_message`
- Model selection: `model=`, `deployment=`, `MODEL_DEPLOYMENT_NAME`, `AZURE_OPENAI_DEPLOYMENT`, framework-specific model fields
- Generation options: `temperature`, `top_p`, `max_tokens`, `response_format`, `tool_choice`, `parallel_tool_calls`
- Agent topology: `Agent(`, `agents=[...]`, `handoffs`, `supervisor`, `router`, `planner`, `executor`, `critic`, `synthesizer`, `WorkflowBuilder`, `StateGraph`
- Tool/retrieval surfaces: tool decorators, tool descriptions, argument schemas, retriever settings, index names, search limits
- Hosting entrypoint: FastAPI/Flask apps, `ResponsesHostServer`, uvicorn, custom response loops, LangGraph servers
Create an internal inventory with file path, symbol/name, role, current default, and whether it is safe to expose through FAOS config.
### Step 5: Classify Agent Topology
Classify the architecture before editing:
| Topology | Default FAOS targeting |
| -------- | ---------------------- |
| Single agent | Wire config directly to the agent's instructions/model/options |
| Multi-agent with obvious orchestrator/supervisor | Target the orchestrator by default, unless evaluator context points elsewhere |
| Multi-agent with specialist tool agent | Target the specialist/tool path when evaluators focus on tool or task behavior |
| Multi-agent peer architecture with no orchestrator | Present a plan and ask before editing |
| Unknown Python runtime | Add only the minimal config loader and propose exact manual wiring points |
Do not collapse multiple role-specific prompts into a single global `SYSTEM_PROMPT`. Preserve specialist prompts as defaults unless the user asks to optimize them together.
### Step 6: Map Evaluators to Candidate Knobs
Use evaluator context to select the smallest meaningful optimization scope.
| Evaluator signal | Prefer these knobs first |
| ---------------- | ------------------------ |
| `relevance` | final response instructions, answer synthesis prompt, model choice |
| `task_adherence` | primary task instructions, specialist instructions, response constraints |
| `intent_resolution` | router/orchestrator prompt, classifier prompt, planner prompt, handoff descriptions |
| `builtin.tool_call_accuracy` | tool-calling agent instructions, tool descriptions, argument schema descriptions, tool-choice/planner settings, low-temperature planning behavior |
| `indirect_attack` | safety instructions, instruction hierarchy, tool input handling, retrieved/tool-content treatment rules |
| groundedness/citation quality | retrieval instructions, answer synthesis prompt, citation formatting, retrieval parameters when exposed safely |
| latency/cost | model selection, max tokens, number of agent hops, tool/retrieval limits |
If evaluators point to different subsystems, prefer a targeted set of named config hooks over one global config. Flag any knob whose change would affect all agents, such as a shared model client.
### Step 7: Present Proposed FAOS Targets
Before editing, summarize:
- Selected agent root
- Python entrypoint(s)
- Detected topology
- Known evaluator objectives
- Proposed FAOS targets and why
- Knobs that will remain unchanged
- Files that will be modified or added
If there is exactly one safe target, proceed unless the user asked for an approval checkpoint. If there are multiple plausible targets, ask the user which scope to optimize before editing.
Example review summary:
```text
Detected evaluator targets:
- builtin.tool_call_accuracy
- intent_resolution
Detected topology:
- router_agent routes user requests
- weather_agent owns get_weather tool
- final_answer_agent synthesizes output
Proposed FAOS targets:
- router_agent instructions: improves intent resolution
- weather_agent instructions/tool schema: improves tool-call accuracy
- preserve final_answer_agent for now
```
### Step 8: Apply the Python FAOS Config Contract
Use the generic Python contract from [Python Patterns](references/python-patterns.md). At minimum, add or reuse:
```python
import os
from agent_optimization import load_config
SYSTEM_PROMPT = """...existing default instructions..."""
EXISTING_MODEL_FALLBACK = os.getenv("<existing-model-env-var>", "gpt-4.1")
config = load_config(
default_instructions=SYSTEM_PROMPT,
default_model=EXISTING_MODEL_FALLBACK,
default_skills_dir="skills",
)
```
Then map the selected target knobs:
- Existing default instructions -> `config.compose_instructions()`
- Existing model default -> `config.model or <existing fallback>`. Reuse the app's current model-selection environment variable(s) and fallback chain instead of hard-coding `MODEL_DEPLOYMENT_NAME` unless that is already what the app uses.
- Existing temperature/default options -> `config.temperature` only when the runtime supports it
- Skills directory -> `config.skills_dir` only when the runtime has a skill/tool loading mechanism or one is explicitly added
For multi-agent code, prefer named config variables such as `orchestrator_config`, `tool_agent_config`, or `synthesizer_config` over a misleading global `config` when more than one agent can be optimized.
### Step 9: Add or Reuse `agent_optimization`
If the agent already has an `agent_optimization` package, reuse it and avoid overwriting user changes.
If missing, add the canonical local package structure:
```text
agent_optimization/
__init__.py
_config.py
_resolver.py
```
The package must expose the public API from `__init__.py`:
```python
"""Agent optimization config loader for hosted agents."""
from agent_optimization._config import OptimizationConfig, Skill, load_config
__all__ = ["OptimizationConfig", "Skill", "load_config"]
__version__ = "0.1.0"
```
Implement `_config.py` and `_resolver.py` with the reference contract:
- `load_config(...)`
- `OptimizationConfig`
- `Skill`
- graceful fallback to defaults when no optimization config is present
- environment-variable fallback support for `AGENT_OPTIMIZATION_CONFIG` and `OPTIMIZATION_CONFIG`
- optional candidate resolver support for `AGENT_OPTIMIZATION_CANDIDATE_ID` and `AGENT_OPTIMIZATION_RESOLVE_ENDPOINT`
- candidate config resolution from `{endpoint}/candidates/{candidate_id}/config`
- optional candidate skill-file download from `{endpoint}/candidates/{candidate_id}` and `{endpoint}/candidates/{candidate_id}/files?path=...`
- resolver token acquisition with `DefaultAzureCredential().get_token("https://ml.azure.com/.default")`
Do not collapse the package into a single source file when creating new conversions. The split files make the config loader and resolver easier to compare, test, and update.
Do not introduce alternate `FAOS_OPTIMIZATION_*` environment variable names in the generated package unless the user explicitly asks for a compatibility adapter. The base FAOS contract uses `AGENT_OPTIMIZATION_*` and `OPTIMIZATION_CONFIG`.
Do not assume a public PyPI package exists. Keep the local package self-contained unless the repository already uses a shared internal package.
### Step 10: Update Dependencies and Runtime Config
Update Python dependency files only as needed:
- Add `python-dotenv` if the code imports it or already uses `.env` files
- Add `azure-identity` only if resolver token support is included or already imported
Use `load_dotenv(override=False)` so Foundry runtime environment variables win over local `.env` values.
Do not automatically add optimization env vars to `agent.yaml`. Hosted agent vNext reserves platform-owned `AGENT_*` variables in deployment payloads, so `AGENT_OPTIMIZATION_*` values should come from the optimization/runtime path or local development environment, not from user-authored `agent.yaml` container variables. If the user wants env var placeholders, add only non-reserved variables required for their workflow and keep optional optimization vars documented rather than injected by default.
### Step 11: Verify
Run these checks where possible:
1. Python syntax check for changed files
2. Import smoke test for `agent_optimization.load_config`
3. Default config smoke test with no optimization env vars
4. Pylance or workspace diagnostics for changed files
5. Existing project tests if they are cheap and relevant
If Azure credentials or model endpoints are missing, do not treat live invocation failures as conversion failures. The required proof is that defaults load and the original runtime can still start or import as far as local configuration allows.
### Step 12: Stop for Review, Then Suggest Deploy
End the workflow with a review checkpoint. Summarize:
- Changed files
- FAOS knobs exposed
- Evaluator objectives considered
- Any global side effects, such as shared model clients
- Verification results
Ask the user to review the diff. Do not deploy automatically.
When the user approves deployment, route to [deploy](../deploy/deploy.md), then [invoke](../invoke/invoke.md). If the user wants to evaluate the deployed version, route to [observe](../observe/observe.md).
## Guardrails
- Python only for now.
- The config contract is framework-neutral; insertion points are runtime-specific.
- Preserve existing frameworks, tools, hosting adapters, protocols, and entrypoints.
- Do not use one global config across all agents in a multi-agent system unless the existing architecture already uses one global prompt/model and the user approves.
- Do not wire temperature where unsupported or semantically risky.
- Prefer low-temperature planning/tool-calling defaults unless an evaluator objective suggests otherwise.
- Treat evaluator context as a targeting signal, not proof that every related knob should be changed.
- Keep all edits scoped to the selected agent root.
- Stop for review before deployment.
python-patterns.md 6.8 KB
# Python FAOS (Foundry Agent Optimization Service) Optimization Patterns
These patterns are framework-neutral. Use them to expose Python agent behavior knobs to FAOS while preserving the app's current runtime.
## Base Contract
Use this when there is one clear instructions/model surface.
```python
import os
from agent_optimization import load_config
SYSTEM_PROMPT = """You are a helpful assistant."""
config = load_config(
default_instructions=SYSTEM_PROMPT,
default_model=os.getenv("MODEL_DEPLOYMENT_NAME", "gpt-4.1"),
default_skills_dir="skills",
)
```
Then map the resolved values into the existing framework:
```python
instructions = config.compose_instructions()
model = config.model or os.getenv("MODEL_DEPLOYMENT_NAME", "gpt-4.1")
```
Only apply temperature if the framework supports it:
```python
options = {}
if config.temperature is not None:
options["temperature"] = config.temperature
```
## Multi-Agent Named Targets
When a Python app has multiple agents, use names that match the architecture rather than one generic `config`.
```python
orchestrator_config = load_config(
default_instructions=ORCHESTRATOR_PROMPT,
default_model=os.getenv("ORCHESTRATOR_MODEL_DEPLOYMENT_NAME", os.getenv("MODEL_DEPLOYMENT_NAME", "gpt-4.1")),
default_skills_dir="skills/orchestrator",
)
tool_agent_config = load_config(
default_instructions=TOOL_AGENT_PROMPT,
default_model=os.getenv("TOOL_AGENT_MODEL_DEPLOYMENT_NAME", os.getenv("MODEL_DEPLOYMENT_NAME", "gpt-4.1")),
default_skills_dir="skills/tool-agent",
)
```
Use the evaluator objective to choose which named target to add first. For example, `intent_resolution` usually points to `orchestrator_config`, while `builtin.tool_call_accuracy` often points to `tool_agent_config`.
## Microsoft Agent Framework
Keep the current hosting adapter and agent construction. Replace only the selected knobs.
```python
agent = Agent(
client=client,
instructions=config.compose_instructions(),
tools=existing_tools,
default_options=default_options,
)
```
For model selection:
```python
client = FoundryChatClient(
project_endpoint=project_endpoint,
model=config.model or os.getenv("MODEL_DEPLOYMENT_NAME", "gpt-4.1"),
credential=credential,
)
```
If the model client is shared by multiple agents, flag this as a global side effect in the review summary.
## FastAPI or Custom Responses Runtime
Keep the existing HTTP contract. Use config values where the model call is created.
```python
instructions = body.get("instructions", config.compose_instructions())
model = body.get("model", config.model or os.getenv("MODEL_DEPLOYMENT_NAME", "gpt-4.1"))
```
When the app already supports request-level overrides, preserve them and use FAOS config as the default.
## LangGraph or Workflow Runtimes
Do not rewrite the graph. Identify node-level prompts and model clients.
- Router/planner nodes are good targets for `intent_resolution`.
- Tool nodes are good targets for `builtin.tool_call_accuracy`.
- Final synthesis nodes are good targets for `relevance`, style, and task adherence.
Prefer node-specific config names:
```python
router_config = load_config(
default_instructions=ROUTER_PROMPT,
default_model=default_model,
)
```
## Optional Skill Support
`default_skills_dir="skills"` records the default skill location. It does not automatically make the runtime load files or expose skill tools.
Add file-based skill support only when the target framework has a safe tool-calling or plugin mechanism. If adding it, use progressive disclosure:
1. Startup prompt contains skill name and description only
2. Model calls a tool such as `load_skill` to load full skill instructions
3. Model calls a file-reading tool only for deep skill assets when needed
Do not append every `SKILL.md` body into every agent prompt by default, especially in multi-agent architectures.
## Dependency Guidance
Add dependencies only when needed:
```text
python-dotenv>=1.0.0
azure-identity>=1.19.0
```
Use `python-dotenv` when local `.env` support exists. Use `azure-identity` when the local resolver uses Entra tokens.
## Environment Variables
The canonical local `agent_optimization` package uses these optimization variables:
| Variable | Purpose |
| -------- | ------- |
| `AGENT_OPTIMIZATION_CONFIG` | Inline JSON config from the optimization service |
| `OPTIMIZATION_CONFIG` | Non-reserved inline JSON fallback |
| `AGENT_OPTIMIZATION_CANDIDATE_ID` | Candidate identifier to resolve from a service |
| `AGENT_OPTIMIZATION_RESOLVE_ENDPOINT` | Resolver API base URL |
| `AGENT_OPTIMIZATION_SKILLS_DIR` | Download location for candidate skill files |
Do not add all of these to `agent.yaml` by default. For hosted agent vNext, do not place `AGENT_*` variables in the user-authored deployment payload because they are platform-reserved there. Use the optimization service/runtime injection path or local development environment for `AGENT_OPTIMIZATION_*`, and use `OPTIMIZATION_CONFIG` only when an inline non-reserved fallback is explicitly needed.
Do not generate `FAOS_OPTIMIZATION_*` aliases in the base package unless the user explicitly requests a compatibility adapter. The reference package and FAOS contract use `AGENT_OPTIMIZATION_*` plus `OPTIMIZATION_CONFIG`.
## Canonical Local Package
When the target repository does not already provide an optimization package, add this split-file package rather than a single-file loader:
```text
agent_optimization/
__init__.py
_config.py
_resolver.py
```
`__init__.py` should only re-export the public API:
```python
"""Agent optimization config loader for hosted agents."""
from agent_optimization._config import OptimizationConfig, Skill, load_config
__all__ = ["OptimizationConfig", "Skill", "load_config"]
__version__ = "0.1.0"
```
`_config.py` owns `Skill`, `OptimizationConfig`, `load_config`, default fallback behavior, inline config parsing, and candidate config handoff.
`_resolver.py` owns candidate resolution, using `{endpoint}/candidates/{candidate_id}/config` for resolved config, optional skill-file download from the candidate manifest, and `DefaultAzureCredential` with the `https://ml.azure.com/.default` scope.
## Verification Checklist
- Changed Python files compile
- `from agent_optimization import load_config` succeeds from the agent root
- `load_config(default_instructions="x", default_model="m")` returns defaults when no optimization env vars are set
- Existing entrypoint, hosting adapter, and protocol remain unchanged
- Multi-agent targets are named and documented
- Evaluator objective influenced the target selection or was explicitly unavailable
- User is asked to review before deployment
invoke.md 7.7 KB
# Invoke Foundry Agent
Invoke deployed agents in Azure AI Foundry. Manage sessions and file operations for hosted agents.
## Quick Reference
| Property | Value |
|----------|-------|
| Agent types | Prompt (LLM-based), Hosted |
| MCP server | `azure` |
| Key Foundry MCP tools | `agent_invoke`, `agent_get`, `session_create`, `session_get`, `session_delete`, `session_list` |
| File operation tools | `session_file_upload`, `session_file_download`, `session_file_list`, `session_file_delete`, `session_file_stat`, `session_file_mkdir` |
| Conversation support | Single-turn and multi-turn (via `conversationId` for responses protocol, via session state for invocations protocol) |
| Session support | Managed sessions for hosted agents (via `session_create`) |
| Protocols | `responses` (OpenAI-compatible), `invocations` (custom payloads) |
## When to Use This Skill
- Send messages to a deployed agent (single or multi-turn)
- Create/manage sessions for hosted agents
- Upload/download files to/from hosted agent sessions
- Test agent behavior after creation or deployment
## MCP Tools
| Tool | Description | Parameters |
|------|-------------|------------|
| `agent_invoke` | Send a message to an agent and get a response | `projectEndpoint`, `agentName`, `inputText` (required); `agentVersion`, `conversationId`, `sessionId`, `protocol`, `stream` (optional) |
| `agent_get` | Get agent details to verify existence and type | `projectEndpoint` (required), `agentName` (optional) |
| `session_create` | Create a new session for a hosted agent | `projectEndpoint`, `agentName` (required); `sessionId` (optional) |
| `session_get` | Get session status and details | `projectEndpoint`, `agentName`, `sessionId` (required) |
| `session_delete` | Delete a session and release compute | `projectEndpoint`, `agentName`, `sessionId` (required) |
| `session_list` | List sessions with pagination | `projectEndpoint`, `agentName` (required); `limit`, `order`, `after`, `before` (optional) |
| `session_logstream` | Stream console logs (stdout/stderr) from a session | `projectEndpoint`, `agentName`, `sessionId` (required); `maxLines` (optional) |
For session file operation tools (`session_file_upload`, `session_file_download`, `session_file_list`, `session_file_delete`, `session_file_stat`, `session_file_mkdir`), see [File Operations](references/file-operations.md).
## Protocols
Hosted agents support two invocation protocols declared at deployment time.
| Protocol | Recommended Version | Route | Best For |
|----------|-------------------|-------|----------|
| `responses` | `1.0.0` | `.../agents/{agentName}/endpoint/protocols/openai/responses` | Conversational agents, OpenAI-compatible |
| `invocations` | `1.0.0` | `.../agents/{agentName}/endpoint/protocols/invocations` | Custom payloads, protocol bridges, webhook callers |
Key difference: `responses` takes a natural language `inputText` message with platform-managed history. `invocations` is **bytes in, bytes out** β the request body is forwarded as-is to the container and the raw response is returned. The developer defines the schema; the platform is pure pass-through. See [Invocations Protocol Guide](references/invocations-protocol.md) for I/O details, schema discovery, and examples.
> β οΈ **Critical for invocations:** `inputText` is forwarded as the raw HTTP request body. The agent developer defines what the container accepts. **Do not guess** β fetch the agent's OpenAPI spec or inspect its source code first.
> π‘ **Tip:** The `agent_invoke` MCP tool supports both protocols. Set `protocol: 'invocations'` when targeting an invocations-protocol agent.
## Workflow
### Step 1: Verify Agent Readiness
Use `agent_get` to verify the agent exists. For hosted agents, also verify the targeted version is `active`.
### Step 2: Create Session (Hosted Agents)
For hosted agents, create a session before invoking using `session_create` with `projectEndpoint` and `agentName`. Optionally provide a `sessionId` (must match `^[A-Za-z0-9_-]{8,128}$`). Store the returned `sessionId` for subsequent calls.
> β οΈ Skip this step for prompt agents β they do not use sessions.
For full session lifecycle details, see [Session Management](references/session-management.md).
### Step 3: Invoke Agent
Use the project endpoint and agent name from the project context. Use `agent_invoke` with:
- `projectEndpoint`, `agentName`, `inputText` (required)
- `agentVersion`, `conversationId`, `sessionId`, `protocol`, `stream` (optional)
**Responses protocol** (default): `inputText` is a natural language message string. Multi-turn via `conversationId`.
**Invocations protocol**: Set `protocol: 'invocations'`. This is **bytes in, bytes out** β `inputText` is forwarded as the raw HTTP request body to the container. The developer defines the expected schema.
> β οΈ **Do not guess the invocations request body.** To discover the expected schema:
> 1. **Fetch the OpenAPI spec**: `GET {projectEndpoint}/agents/{agentName}/endpoint/protocols/invocations/docs/openapi.json` (if the developer registered one)
> 2. Inspect the agent's **route handler code** or README for the expected payload shape
> 3. If unknown, ask the user for the agent's API contract before invoking
Example invocations call (agent expects `{"message": "<text>"}`):
```text
agent_invoke(projectEndpoint, agentName, inputText: "{\"message\":\"hello\"}", protocol: "invocations", sessionId: "<id>")
```
See [Invocations Protocol Guide](references/invocations-protocol.md) for full details and examples.
### Step 4: Multi-Turn Conversations
**Responses protocol** β Pass `conversationId` from previous response to continue the thread. Platform manages history.
**Invocations protocol** β Reuse same `sessionId`; conversation state is agent-managed via `$HOME`. Do **not** pass `conversationId` β it has no effect for invocations.
### Step 5: File Operations (Hosted Agents)
Upload/download files to pass data to and retrieve results from agents. All file operations require an active session. See [File Operations](references/file-operations.md).
### Step 6: Clean Up
Use `session_delete` to release compute resources when done. Undeleted sessions expire per platform policies.
## Agent Type Differences
| Behavior | Prompt Agent | Hosted Agent |
|----------|--------------|--------------|
| Readiness | Immediate | After deployment, version must be `active` |
| Session | Not applicable | Required via `session_create` |
| Multi-turn | Via `conversationId` | Via `conversationId` (responses) or session state (invocations) |
| File operations | β | β
via session file tools |
| Protocol | `responses` only | `responses` or `invocations` |
## Error Handling
| Error | Cause | Resolution |
|-------|-------|------------|
| Agent not found | Invalid name or endpoint | Use `agent_get` to list agents |
| Hosted agent not active | Version still provisioning or failed | Check version status via `agent_get` |
| Session not found | Invalid ID or expired | Create new session with `session_create` |
| Invocation failed | Model error, timeout, or invalid input | Check agent logs, verify model deployment |
| Invocations schema mismatch | Request body does not match what the agent expects | Inspect agent's route handler or API docs for the correct JSON schema; do not guess |
| File operation failed | Session not active or invalid path | Verify session with `session_get` |
| Permission error | Missing RBAC | Follow [troubleshoot skill](../troubleshoot/troubleshoot.md) |
| Rate limit exceeded | Too many requests | Implement backoff and retry |
## Additional Resources
- [Session Management](references/session-management.md)
- [File Operations](references/file-operations.md)
- [Foundry Hosted Agents](https://learn.microsoft.com/en-us/azure/ai-foundry/agents/concepts/hosted-agents?view=foundry)
- [Foundry Samples](https://github.com/azure-ai-foundry/foundry-samples)
file-operations.md 4.8 KB
# File Operations
Manage files within a hosted agent session. All file operations require an active session with a running sandbox.
## Overview
Hosted agent sessions provide a persistent filesystem rooted at `$HOME` (`/home/session`). Files written to this path survive across requests within the same session. Use the session file tools to upload input data, download outputs, and manage the session filesystem externally.
> β οΈ **Warning:** All file paths are relative to `$HOME`. For example, `filePath: '/data/input.csv'` maps to `/home/session/data/input.csv` inside the container.
## MCP Tool Details
### Upload File
Use `session_file_upload` to write a file into the session:
| Parameter | Required | Description |
|-----------|----------|-------------|
| `projectEndpoint` | β
| AI Foundry project endpoint |
| `agentName` | β
| Name of the hosted agent |
| `sessionId` | β
| Active session ID |
| `filePath` | β
| Destination path (e.g., `/data/input.csv`) |
| `contentBase64` | β
| File content as a base64-encoded string |
> π‘ **Tip:** For text files, encode the content to base64 before passing it. For binary files (images, PDFs), read the raw bytes and base64-encode them.
### Download File
Use `session_file_download` to retrieve a file from the session:
| Parameter | Required | Description |
|-----------|----------|-------------|
| `projectEndpoint` | β
| AI Foundry project endpoint |
| `agentName` | β
| Name of the hosted agent |
| `sessionId` | β
| Active session ID |
| `filePath` | β
| Path to the file to download (e.g., `/data/output.csv`) |
Returns: File content as a base64-encoded string.
### List Files
Use `session_file_list` to browse the session filesystem:
| Parameter | Required | Description |
|-----------|----------|-------------|
| `projectEndpoint` | β
| AI Foundry project endpoint |
| `agentName` | β
| Name of the hosted agent |
| `sessionId` | β
| Active session ID |
| `path` | β | Directory path to list (defaults to root `/`) |
Returns: List of files and directories with metadata.
### Delete File
Use `session_file_delete` to remove a file or directory:
| Parameter | Required | Description |
|-----------|----------|-------------|
| `projectEndpoint` | β
| AI Foundry project endpoint |
| `agentName` | β
| Name of the hosted agent |
| `sessionId` | β
| Active session ID |
| `filePath` | β
| Path to delete |
| `recursive` | β | Set `true` to recursively delete a directory and its contents (default `false`) |
> β οΈ **Warning:** Non-recursive delete on a non-empty directory will fail. Use `recursive: true` for directories with contents.
### Get File Metadata
Use `session_file_stat` to inspect a file or directory:
| Parameter | Required | Description |
|-----------|----------|-------------|
| `projectEndpoint` | β
| AI Foundry project endpoint |
| `agentName` | β
| Name of the hosted agent |
| `sessionId` | β
| Active session ID |
| `filePath` | β
| Path to inspect |
Returns: File name, size, whether it is a directory, and last modified time.
### Create Directory
Use `session_file_mkdir` to create directories:
| Parameter | Required | Description |
|-----------|----------|-------------|
| `projectEndpoint` | β
| AI Foundry project endpoint |
| `agentName` | β
| Name of the hosted agent |
| `sessionId` | β
| Active session ID |
| `path` | β
| Directory path to create (e.g., `/data/results`) |
| `createParents` | β | Create parent directories if needed (default `true`) |
| `mode` | β | Unix permission mode (e.g., `755`). Uses system default if omitted |
## Common Patterns
### Upload Input β Invoke β Download Output
```text
1. session_create β get sessionId
2. session_file_mkdir β create /data/input/
3. session_file_upload β upload input files to /data/input/
4. agent_invoke β tell agent to process /data/input/
5. session_file_list β check /data/output/ for results
6. session_file_download β retrieve output files
7. session_delete β clean up when done
```
### Check Agent-Generated Files
```text
1. session_file_list β browse $HOME to see what the agent created
2. session_file_stat β check size/type of specific files
3. session_file_download β retrieve files of interest
```
## Storage Limits
- Maximum `$HOME` size: **10 GiB** per session
- Files outside `$HOME` (e.g., `/tmp`) are ephemeral and may be cleared between requests
## Error Handling
| Error | Cause | Resolution |
|-------|-------|------------|
| Session not active | Session expired or not yet running | Use `session_get` to check status; create a new session if expired |
| File not found | Invalid path or file does not exist | Use `session_file_list` to verify the path |
| Directory not empty | Non-recursive delete on a directory with contents | Use `recursive: true` |
| Storage limit exceeded | `$HOME` exceeds 10 GiB | Delete unnecessary files with `session_file_delete` |
invocations-protocol.md 3.6 KB
# Invocations Protocol Guide
The `invocations` protocol is **bytes in, bytes out**. The platform is pure pass-through β the raw HTTP request body is forwarded to the container and the raw response is returned. The agent developer defines what the container accepts and returns. Unlike `responses` (OpenAI-compatible with platform-managed history), `invocations` gives full control to the container code.
## Input/Output Contract
| Aspect | `responses` | `invocations` |
|--------|------------|---------------|
| **Input** | `inputText` is a natural language message (e.g., `"What is the weather?"`) | `inputText` is forwarded as the **raw HTTP request body** β bytes in. Format as whatever the container's invoke handler expects (typically JSON) |
| **Output** | Structured OpenAI response with `output_text` | **Raw response bytes** from the container β JSON, text, or SSE events. Format is defined by the agent developer |
| **Conversation history** | Platform-managed via `conversationId` | Agent-managed via session filesystem; `conversationId` does **not** apply |
| **Streaming** | Platform-managed via `stream: true` | Agent-controlled; `stream` parameter does **not** apply |
## Discovering the Expected Input Schema
> β οΈ **Do not guess the invocations request body.** The developer defines the schema in the container's invoke handler. The platform does not validate or transform the payload.
### 1. Fetch the OpenAPI Spec (Preferred)
Agents can register an OpenAPI spec that describes the expected request/response format. Fetch it from:
```text
GET {projectEndpoint}/agents/{agentName}/endpoint/protocols/invocations/docs/openapi.json
```
If the developer registered an `openapi_spec` when creating the server, this returns the full API contract. If not registered, it returns 404.
### 2. Inspect Agent Source Code
Look at the agent's invoke handler β the function registered with `@app.invoke_handler` (Python) or equivalent. The handler reads the raw request (e.g., `request.json()` for JSON, `request.body()` for raw bytes) and returns a `Response`.
### 3. Ask the User
If neither the OpenAPI spec nor source code is available, ask the user for the expected request body format before invoking.
## Examples
**Responses protocol** (default):
```text
agent_invoke(projectEndpoint, agentName, inputText: "What is the weather in Seattle?")
β Structured response with output_text
```
**Invocations protocol** β agent expects `{"message": "<text>"}`:
```text
agent_invoke(projectEndpoint, agentName, inputText: "{\"message\":\"hello\"}", protocol: "invocations", sessionId: "<session-id>")
β Raw bytes from container, e.g.: {"response": "Hi there!", "session_id": "abc123"}
```
## Common Use Cases
| Scenario | Why Invocations |
|----------|----------------|
| Webhook receiver (GitHub, Stripe, Jira) | External system sends its own payload format |
| Non-conversational processing (classification, extraction) | Input is structured data, not a chat message |
| Custom streaming protocol (AG-UI) | Needs raw SSE control, not OpenAI-compatible streaming |
| Protocol bridge (proprietary systems) | Caller has its own protocol that doesn't map to `/responses` |
## Error Handling
| Error | Cause | Resolution |
|-------|-------|------------|
| 400/422 or invocation failed | Request body does not match what the container expects | Fetch OpenAPI spec or inspect handler code for the correct schema |
| 404 on OpenAPI spec | Developer did not register an `openapi_spec` | Inspect handler source code or ask the user for the API contract |
| Empty response | Agent returned no content | Check agent logs via `session_logstream`; verify the handler processes the request body correctly |
session-management.md 3.9 KB
# Session Management
Manage hosted agent sessions β isolated compute environments that provide persistent state across invocations.
## Overview
Sessions bind a hosted agent to a dedicated compute instance. Files written to `$HOME` during a session persist across requests for the lifetime of that session. When a session is deleted, its compute resources and stored files are released.
## Session Lifecycle
```text
session_create β Running β (invoke, file ops) β session_delete
β
Expired (platform auto-cleanup)
```
## Session ID Format
Session IDs must match the pattern `^[A-Za-z0-9_-]{8,128}$`.
- If you provide a `sessionId` to `session_create`, it must conform to this pattern
- If you omit `sessionId`, the platform auto-generates one
- Store the returned `sessionId` β it is required for all subsequent operations
## MCP Tool Details
### Create Session
Use `session_create` to provision a new session:
| Parameter | Required | Description |
|-----------|----------|-------------|
| `projectEndpoint` | β
| AI Foundry project endpoint |
| `agentName` | β
| Name of the hosted agent |
| `sessionId` | β | Optional custom session ID (8-128 chars, alphanumeric + hyphens/underscores) |
Returns: Session resource with `sessionId`, status, and expiration.
### Get Session
Use `session_get` to check session status:
| Parameter | Required | Description |
|-----------|----------|-------------|
| `projectEndpoint` | β
| AI Foundry project endpoint |
| `agentName` | β
| Name of the hosted agent |
| `sessionId` | β
| The session ID to inspect |
Returns: Session details including status, version, creation time, and expiration.
### Delete Session
Use `session_delete` to release compute resources:
| Parameter | Required | Description |
|-----------|----------|-------------|
| `projectEndpoint` | β
| AI Foundry project endpoint |
| `agentName` | β
| Name of the hosted agent |
| `sessionId` | β
| The session ID to delete |
> β οΈ **Warning:** Deleting a session permanently removes all files stored in `$HOME` for that session.
### List Sessions
Use `session_list` to enumerate sessions:
| Parameter | Required | Description |
|-----------|----------|-------------|
| `projectEndpoint` | β
| AI Foundry project endpoint |
| `agentName` | β
| Name of the hosted agent |
| `limit` | β | Max results to return (1-100, default 20) |
| `order` | β | Sort order: `asc` or `desc` (default `asc`) |
| `after` | β | Cursor for forward pagination |
| `before` | β | Cursor for backward pagination |
> β οΈ **Warning:** `after` and `before` are mutually exclusive β do not pass both.
## Session vs Conversation
| Concept | Purpose | Scope |
|---------|---------|-------|
| `sessionId` | Binds requests to a compute instance with persistent filesystem state | Hosted agents only |
| `conversationId` | Tracks conversation history across turns | Responses protocol only |
- A single session can host multiple conversations
- A conversation does not require a session (prompt agents use `conversationId` without sessions)
- For hosted agents using `responses` protocol, use **both**: `sessionId` for compute affinity and `conversationId` for history
## Best Practices
1. **Create sessions explicitly** β Always use `session_create` before invoking a hosted agent. Do not rely on implicit session creation.
2. **Reuse sessions** β Keep the same session for related multi-turn interactions to preserve agent state.
3. **Clean up when done** β Delete sessions after use to release compute resources and avoid quota consumption.
4. **Handle expiry** β Sessions expire based on platform policies. If `session_get` returns a non-running state, create a new session.
5. **Version awareness** β The platform auto-resolves the agent version at session creation time. If you need a specific version, ensure it is active before creating the session.
6. **Debug with logstream** β Use `session_logstream` to stream stdout/stderr from a running session for troubleshooting.
observe.md 12.5 KB
# Agent Observability Loop
Orchestrate the full eval-driven optimization cycle for a Foundry agent. This skill manages the **multi-step workflow** for a selected agent root and environment: reusing or refreshing `.foundry` cache in that folder only, auto-creating evaluators, generating test datasets, running batch evals, clustering failures, optimizing prompts, redeploying, and comparing versions. Use this skill instead of calling individual `azure` MCP evaluation tools manually.
## When to Use This Skill
USE FOR: evaluate my agent, run an eval, test my agent, check agent quality, run batch evaluation, analyze eval results, why did my eval fail, cluster failures, improve agent quality, optimize agent prompt, compare agent versions, re-evaluate after changes, set up CI/CD evals, agent monitoring, eval-driven optimization, set up continuous monitoring, production quality monitoring, why are eval scores dropping.
> β οΈ **DO NOT manually call** `evaluation_agent_batch_eval_create`, `evaluator_catalog_create`, `evaluation_comparison_create`, `prompt_optimize`, or `continuous_eval_create` **without reading this skill first.** This skill defines required pre-checks, environment selection, cache reuse, artifact persistence, and multi-step orchestration that the raw tools do not enforce.
## Quick Reference
| Property | Value |
|----------|-------|
| MCP server | `azure` |
| Key MCP tools | `evaluator_catalog_get`, `evaluation_agent_batch_eval_create`, `evaluator_catalog_create`, `evaluation_comparison_create`, `evaluation_get`, `prompt_optimize`, `agent_update`, `continuous_eval_create`, `continuous_eval_get`, `continuous_eval_delete` |
| Prerequisite | Agent deployed and running (use [deploy skill](../deploy/deploy.md)) |
| Local cache | selected `.foundry/agent-metadata*.yaml` file, `.foundry/evaluators/`, `.foundry/datasets/`, `.foundry/results/` |
## Entry Points
| User Intent | Start At |
|-------------|----------|
| "Deploy and evaluate my agent" | [Step 1: Auto-Setup Evaluators](references/deploy-and-setup.md) (deploy first via [deploy skill](../deploy/deploy.md)) |
| "Agent just deployed" / "Set up evaluation" | [Step 1: Auto-Setup Evaluators](references/deploy-and-setup.md) (skip deploy, run auto-create) |
| "Evaluate my agent" / "Run an eval" | [Step 1: Auto-Setup Evaluators](references/deploy-and-setup.md) first if `.foundry/evaluators/` or `.foundry/datasets/` cache is missing, stale, or the user requests refresh, then [Step 2: Evaluate](references/evaluate-step.md) |
| "Why did my eval fail?" / "Analyze results" | [Step 3: Analyze](references/analyze-results.md) |
| "Improve my agent" / "Optimize prompt" | [Step 4: Optimize](references/optimize-deploy.md) |
| "Compare agent versions" | [Step 5: Compare](references/compare-iterate.md) |
| "Set up CI/CD evals" | [Step 6: CI/CD & Monitoring](references/cicd-monitoring.md) |
| "Enable continuous monitoring" / "Set up production monitoring" / "Evaluation results dropping" | [Continuous Eval](references/continuous-eval.md) |
> β οΈ **Important:** Before running any evaluation (Step 2), always resolve the selected agent root, metadata file, and environment, then inspect that metadata file plus `.foundry/evaluators/` and `.foundry/datasets/` in that root only. If the cache is missing, stale, or the user wants to refresh it, route through [Step 1: Auto-Setup](references/deploy-and-setup.md) first β even if the user only asked to "evaluate." Do **not** merge `.foundry` cache or source context from sibling agent folders or sibling metadata files.
## Before Starting β Detect Current State
1. Resolve the target agent root, selected metadata file, and environment from `.foundry/agent-metadata*.yaml`.
2. Use `agent_get` and `agent_container_status_get` to verify the environment's agent exists and is running.
3. Inspect the selected environment's `evaluationSuites[]` plus cached files under `.foundry/evaluators/` and `.foundry/datasets/` in the selected agent root only. If the metadata still uses older `testSuites[]` or legacy `testCases[]`, normalize that list to evaluation suites first using the shared migration rule.
4. Use `evaluation_get` to check for existing eval runs.
5. Jump to the appropriate entry point.
## Loop Overview
```text
1. Auto-setup evaluators or refresh .foundry cache for the selected environment
-> ask: "Run an evaluation to identify optimization opportunities?"
2. Evaluate (batch eval run)
3. Download and cluster failures
4. Pick a category or evaluation suite to optimize
5. Optimize prompt
6. Deploy new version (after user sign-off)
7. Re-evaluate (same env + same evaluation suite)
8. Compare versions -> decide which to keep
9. Loop to next category or finish
10. Prompt: enable CI/CD pipeline evals and/or continuous production monitoring
```
## Behavioral Rules
1. **Keep context visible.** Restate the selected agent root, metadata file, and environment in setup, evaluation, and result summaries.
2. **Stay inside the selected agent root.** Once the agent root is resolved, inspect only that folder's `.foundry/` cache and source tree when suggesting tools, datasets, evaluators, or prompt optimizations. Do not merge sibling agent folders.
3. **Reuse cache before regenerating.** Prefer existing `.foundry/evaluators/` and `.foundry/datasets/` when they match the active environment. Ask before refreshing or overwriting them.
4. **Start with smoke suites.** Run evaluation suites tagged `tier=smoke` before broader `tier=regression` or `tier=coverage` suites unless the user explicitly chooses otherwise.
5. **Auto-poll in background.** After creating eval runs or starting containers, poll in a background terminal. Only surface the final result.
6. **Confirm before changes.** Show diff/summary before modifying agent code, refreshing cache, or deploying. Wait for sign-off.
7. **Prompt for next steps.** After each step, present options. Never assume the path forward.
8. **Write scripts to files.** Python scripts go in `scripts/` - no inline code blocks.
9. **Persist eval artifacts.** Save local artifacts to `.foundry/evaluators/`, `.foundry/datasets/`, and `.foundry/results/` for version tracking and comparison.
10. **Migrate legacy metadata on write.** If the selected environment still uses older `testSuites[]` or legacy `testCases[]`, treat that list as the suite source for the current run, then rewrite that environment to `evaluationSuites[]` on the next metadata update. Preserve dataset/evaluator fields and map `priority` to `tags.tier` only when `tags.tier` is missing.
11. **Use exact eval parameter names.** Use `evaluationId` only on batch-eval create calls that group runs; use `evalId` on `evaluation_get` and `evaluation_comparison_create`; use `evalRunId` for a specific run lookup.
12. **Check existing evaluators before creating new ones.** Always call `evaluator_catalog_get` before proposing or creating evaluators. Present the existing catalog to the user and map existing evaluators to the agent's evaluation needs. Only create a new evaluator when no existing one covers the required dimension. This applies to every workflow that involves evaluator selection - initial setup, re-evaluation, and optimization loops.
13. **Use correct parameters when deleting evaluators.** `evaluator_catalog_delete` requires both `name` (not `evaluatorName`) and `version`. When cleaning up redundant evaluators, always pass the explicit version string. If an evaluator has multiple versions (for example, `v1`, `v2`, `v3`), delete each version individually - there is no "delete all versions" shortcut. Discover version numbers with `evaluator_catalog_get` before attempting deletions.
14. **Use a two-phase evaluator strategy.** Phase 1 is built-in only: `relevance`, `task_adherence`, `intent_resolution`, `indirect_attack`, and `builtin.tool_call_accuracy` when the agent uses tools. Generate seed datasets with `query` and `expected_behavior` so Phase 2 can reuse or create targeted custom evaluators only after the first run exposes gaps.
15. **Account for LLM judge knowledge cutoff.** When the agent uses real-time data sources (web search, Bing Grounding, live APIs), the LLM judge's training cutoff means it cannot verify current facts. Custom evaluators that score factual accuracy or behavioral adherence will produce systematic false negatives - flagging the agent's real-time data as "fabricated" or "beyond knowledge cutoff." Mitigations: (a) instruct the evaluator prompt to accept sourced claims it cannot verify, (b) use `expected_behavior` rubrics that describe the shape of a good answer rather than specific facts, (c) flag suspected knowledge-cutoff false negatives in the failure analysis rather than treating them as real failures.
16. **Show Data Viewer deeplinks (for VS Code runtime only).** Append a Data Viewer deeplink immediately after reference to a dataset file or evaluation result file in your response. Format: "[Open in Data Viewer](vscode://ms-windows-ai-studio.windows-ai-studio/open_data_viewer?file=<file_path>&source=microsoft-foundry-skill) for details and perform analysis". This applies to files in `.foundry/datasets/`, `.foundry/results/`.
17. **Use the custom evaluator output contract.** When creating custom evaluator prompts, treat the MCP/tool-enforced output schema as authoritative: `result` plus `reason`. Do **not** include or preserve conflicting user-provided output instructions such as `score`/`reasoning`, duplicate `OUTPUT FORMAT` blocks, markdown, or alternate JSON schemas in `promptText`. If the user provides a judge prompt that contains its own return schema, keep the rubric and placeholders but rewrite or remove the output-format section so it cannot conflict with the enforced `result`/`reason` contract.
## Two-Phase Evaluator Strategy
| Phase | When | Evaluators | Dataset fields | Goal |
|-------|------|------------|----------------|------|
| Phase 1 - Initial setup | Before the first eval run | <=5 built-in evaluators only: `relevance`, `task_adherence`, `intent_resolution`, `indirect_attack`, plus `builtin.tool_call_accuracy` when the agent uses tools | `query`, `expected_behavior` (plus optional `context`, `ground_truth`) | Establish a fast baseline and identify which failure patterns built-ins can and cannot explain |
| Phase 2 - After analysis | After reviewing the first run's failures and clusters | Reuse existing custom evaluators first; create a new custom evaluator only when the built-in set cannot capture the gap | Reuse `expected_behavior` as a per-query rubric | Turn broad failure signals into targeted, domain-aware scoring |
Phase 1 keeps the first setup fast and comparable across agents. Even though the initial built-in evaluators do not consume `expected_behavior`, include it in every seed dataset row so the same dataset is ready for Phase 2 custom evaluators without regeneration.
When built-in evaluators reveal patterns they cannot fully capture - for example, false negatives from `task_adherence` missing tool-call context or domain-specific quality gaps - first call `evaluator_catalog_get` again to see whether an existing custom evaluator already covers the dimension. Only create a new evaluator when the catalog still lacks the required signal.
Example custom evaluator for Phase 2:
```yaml
name: behavioral_adherence
promptText: |
Given the query, response, and expected behavior, rate how well
the response fulfills the expected behavior (1-5).
## Query
{{query}}
## Response
{{response}}
## Expected Behavior
{{expected_behavior}}
```
> π‘ **Tip:** This evaluator scores against the per-query behavioral rubric in `expected_behavior`, not just the agent's global instructions. That usually produces a cleaner signal when broad built-in judges are directionally correct but too coarse for optimization.
> β οΈ **Output contract:** Do not add `Return JSON: {"score": ...}` or any extra output-format block to custom evaluator `promptText`. The evaluator runtime appends and enforces the final JSON contract (`result` and `reason`). If a user-supplied rubric asks for `score`/`reasoning`, normalize that wording to `result`/`reason` or omit the output schema entirely before calling `evaluator_catalog_create`.
## Related Skills
| User Intent | Skill |
|-------------|-------|
| "Analyze production traces" / "Search conversations" / "Find errors in App Insights" | [trace skill](../trace/trace.md) |
| "Debug hosted agent issues" / "Hosted-agent logs" | [troubleshoot skill](../troubleshoot/troubleshoot.md) |
| "Deploy or redeploy agent" | [deploy skill](../deploy/deploy.md) |
| "Enable continuous evaluation" / "Set up ongoing monitoring" | [Continuous Eval](references/continuous-eval.md) (reference within this skill) |
analyze-results.md 6.3 KB
# Steps 3β5 β Download Results, Cluster Failures, Dive Into Category
## Step 3 β Download Results
`evaluation_get` returns run metadata but **not** full per-row output. Write a Python script (save to `scripts/`) to download detailed results using the **Azure AI Projects Python SDK**.
### Prerequisites
```text
pip install azure-ai-projects>=2.0.0 azure-identity
```
### SDK Client Setup
```python
from azure.identity import DefaultAzureCredential
from azure.ai.projects import AIProjectClient
project_client = AIProjectClient(
endpoint=project_endpoint, # e.g. "https://<hub>.services.ai.azure.com/api/projects/<project>"
credential=DefaultAzureCredential(),
)
# The evals API lives on the OpenAI sub-client, not on AIProjectClient directly
client = project_client.get_openai_client()
```
> β οΈ **Common mistake:** Calling `project_client.evals` directly β the `evals` namespace is on the OpenAI client returned by `get_openai_client()`, not on `AIProjectClient` itself.
### Retrieve Run Status
```python
run = client.evals.runs.retrieve(run_id=run_id, eval_id=eval_id)
print(f"Status: {run.status} Report: {run.report_url}")
```
### Download Per-Row Output Items
The SDK handles pagination automatically β no manual `has_more` / `after` loop required.
```python
output_items = list(client.evals.runs.output_items.list(run_id=run_id, eval_id=eval_id))
all_items = [item.model_dump() for item in output_items]
```
> π‘ **Tip:** Use `model_dump()` to convert each SDK object to a plain dict for JSON serialization.
### Data Structure
Query/response data lives in `datasource_item.query` and `datasource_item['sample.output_text']`, **not** in `sample.input`/`sample.output` (which are empty arrays). Parse `datasource_item` fields when extracting queries and responses for analysis.
> β οΈ **LLM judge knowledge cutoff:** When evaluating agents that use real-time data sources (web search, Bing Grounding, live APIs), the LLM judge may flag factually correct but temporally recent responses as "fabricated" or "unverifiable" because the judge's training data predates the agent's live results. Check failure reasons for phrases like "cannot verify," "beyond knowledge cutoff," or "no evidence" before treating them as real failures. See Behavioral Rule 13 in `observe.md` for mitigations.
### Custom Evaluator Dual-Entry Parsing
Custom evaluators produce **two** result entries per item in the `results` array:
| Entry | `metric` field | Has score? | Has reason/label/passed? |
|-------|----------------|------------|--------------------------|
| Entry 1 | `"custom_score"` | β
numeric score | β null |
| Entry 2 | `"{evaluator_name}"` | β null | β
real reason, label, passed |
To get the complete picture, merge both entries:
```python
def extract_evaluator_result(item, evaluator_name):
"""Merge the dual entries for a custom evaluator into one result."""
score_entry = None
detail_entry = None
for r in item.get("results", []):
metric = r.get("metric", "")
if metric == "custom_score":
score_entry = r
elif metric == evaluator_name:
detail_entry = r
if not detail_entry:
return None
return {
"score": score_entry.get("score") if score_entry else None,
"passed": detail_entry.get("passed"),
"reason": detail_entry.get("reason"),
"label": detail_entry.get("label"),
}
```
> β οΈ **Common mistake:** Reading only the first matching result entry for a custom evaluator gives you the score but null reason (or vice versa). Always merge both entries. Built-in evaluators do **not** have this dual-entry pattern - they produce a single entry with all fields populated.
**Evidence from actual eval run** (item 1, `behavioral_adherence`):
```jsonc
// Entry 1: has score, null reason
{"name": "behavioral_adherence", "metric": "custom_score", "score": 1, "reason": null, "passed": null}
// Entry 2: has reason, null score
{"name": "behavioral_adherence", "metric": "behavioral_adherence", "score": null,
"reason": "The response provides outdated and fabricated information...", "passed": false}
```
### Persist Results
Save results to `.foundry/results/<environment>/<eval-id>/<run-id>.json` (use `json.dump` with `default=str` for non-serializable fields). Print summary: total items, passed, failed, errored counts.
> β οΈ **Show Data Viewer deeplinks (for VS Code runtime only):** Append a Data Viewer deeplink immediately after reference to an evaluation result file in your response. Format: "[Open in Data Viewer](vscode://ms-windows-ai-studio.windows-ai-studio/open_data_viewer?file=<file_path>&source=microsoft-foundry-skill) for details and perform analysis".
## Step 4 β Cluster Failures by Root Cause
Analyze every row in the results. Group failures into clusters:
| Cluster | Description |
|---------|-------------|
| Incorrect / hallucinated answer | Agent gave a wrong or fabricated response |
| Incomplete answer | Agent missed key parts |
| Tool call failure | Agent failed to invoke or misused a tool |
| Safety / content violation | Flagged by safety evaluators |
| Runtime error | Agent crashed or returned an error |
| Off-topic / refusal | Agent refused or went off-topic |
Produce a prioritized action table:
| Focus | Cluster | Suggested Action |
|-------|---------|------------------|
| Runtime blockers | Runtime errors or failing suites tagged `tier=smoke` | Check container logs or fix blockers first |
| Key regressions | Incorrect answers on suites tagged `purpose=regression` or `tier=smoke` | Optimize prompt or tool instructions |
| Broader quality gaps | Incomplete answers or coverage-oriented suites | Optimize prompt or expand context |
| Tooling issues | Tool call failures | Fix tool definitions or instructions |
| Safety issues | Safety violations | Add guardrails to instructions |
**Rule:** Prioritize runtime errors first, then suites tagged `tier=smoke`, then suites tagged `purpose=regression`, then broader coverage suites by count Γ severity.
## Step 5 β Dive Into Category
When the user wants to inspect a specific cluster, display the individual rows: evaluation-suite ID, input query, the agent's original response, evaluator scores, and failure reason. Let the user confirm which category or evaluation suite to optimize.
## Next Steps
After clustering -> proceed to [Step 6: Optimize Prompt](optimize-deploy.md).
cicd-monitoring.md 3.6 KB
# Step 6 β CI/CD Evals & Continuous Production Monitoring
After confirming the final agent version through the observe loop, present two complementary monitoring options. The user may choose one, both, or neither.
## Option 1 β CI/CD Pipeline Evaluations (Pre-Deploy Gate)
*"Would you like to add automated evaluations to your CI/CD pipeline so every deployment is evaluated before going live?"*
CI/CD evals run batch evaluations as part of your deployment pipeline, catching regressions **before** they reach production.
If yes, generate a GitHub Actions workflow (for example, `.github/workflows/agent-eval.yml`) that:
1. Triggers on push to `main` or on pull request
2. Accepts a metadata-file input or environment variable such as `FOUNDRY_METADATA_FILE` and defaults it to `.foundry/agent-metadata.yaml`
3. Reads evaluation-suite definitions from the selected metadata file (for example, `.foundry/agent-metadata.prod.yaml` for prod CI)
4. Reads evaluator definitions from `.foundry/evaluators/` and test datasets from `.foundry/datasets/`
5. Runs `evaluation_agent_batch_eval_create` against the newly deployed agent version
6. Fails the workflow if any evaluator score falls below the configured thresholds for the environment and evaluation suite resolved from that metadata file
7. Posts a summary as a PR comment or workflow annotation
Use repository secrets for the selected environment's project endpoint and Azure credentials, and keep the metadata filename explicit in the workflow so prod rollouts do not depend on the local/dev default file. Confirm the workflow file with the user before committing.
## Option 2 β Continuous Production Monitoring (Post-Deploy)
*"Would you like to set up continuous evaluations to monitor your agent's quality in production?"*
Continuous evaluation uses Foundry-native MCP tools to automatically assess agent responses on an ongoing basis β no additional CI/CD pipeline setup is needed for this option. This catches regressions that emerge **after** deployment from changing data, user patterns, or upstream service drift.
### Enable Continuous Evaluation
Use the [continuous evaluation reference](continuous-eval.md) to configure monitoring. The workflow:
1. **Check existing config** β call `continuous_eval_get` to see if monitoring is already active.
2. **Select evaluators** β recommend starting with the same evaluators used in batch evals for consistent comparison:
- **Quality evaluators** (require `deploymentName`): e.g., groundedness, coherence, relevance, task_adherence
- **Safety evaluators**: e.g., violence, indirect_attack, hate_unfairness
3. **Enable** β call `continuous_eval_create` with the selected evaluators. The tool auto-detects agent kind and configures the appropriate backend (real-time for prompt agents, scheduled for hosted agents).
4. **Confirm** β present the returned configuration to the user.
### Acting on Monitoring Results
Monitoring is only complete when score drops trigger investigation and remediation.
For instructions on how to read evaluation scores, triage regressions, and verify fixes, see [Acting on Results](continuous-eval.md#acting-on-results).
The observe loop does not end at deployment. Continuous monitoring closes the loop: **observe β optimize β deploy β monitor β observe**. Always offer to set up monitoring after completing an optimization cycle.
## Reference
- [Azure AI Foundry Cloud Evaluation](https://learn.microsoft.com/en-us/azure/ai-foundry/how-to/develop/cloud-evaluation)
- [Hosted Agents](https://learn.microsoft.com/en-us/azure/ai-foundry/agents/concepts/hosted-agents)
- [Continuous Evaluation Reference](continuous-eval.md)
compare-iterate.md 2.9 KB
# Steps 8β10 β Re-Evaluate, Compare Versions, Iterate
## Step 8 β Re-Evaluate
Use **`evaluation_agent_batch_eval_create`** with the **same `evaluationId`** as the baseline run. This places both runs in the same eval group for comparison. Use the same local test dataset (from the selected agent root's `.foundry/datasets/`) and evaluator bundle from the selected environment/evaluation suite. Update `agentVersion` to the new version.
> β οΈ **Parameter switch reminder:** Re-evaluation creation uses `evaluationId`, but follow-up calls to `evaluation_get` and `evaluation_comparison_create` must use `evalId`.
> β οΈ **Eval-group immutability:** Reuse the same `evaluationId` only when `evaluatorNames` and thresholds are unchanged. If you add/remove evaluators or change thresholds, create a new evaluation group first, then compare runs within that new group.
Auto-poll for completion in a background terminal (same as [Step 2](evaluate-step.md)).
## Step 9 β Compare Versions
> **Critical:** `displayName` is **required** in the `insightRequest`. Despite the MCP tool schema showing `displayName` as optional (`type: ["string", "null"]`), the API will reject requests without it with a BadRequest error. `state` must be `"NotStarted"`.
### Required Parameters for `evaluation_comparison_create`
| Parameter | Required | Description |
|-----------|----------|-------------|
| `insightRequest.displayName` | β
| Human-readable name. **Omitting causes BadRequest.** |
| `insightRequest.state` | β
| Must be `"NotStarted"` |
| `insightRequest.request.evalId` | β
| Eval group ID containing both runs |
| `insightRequest.request.baselineRunId` | β
| Run ID of the baseline |
| `insightRequest.request.treatmentRunIds` | β
| Array of treatment run IDs |
Use **`evaluation_comparison_create`** with a nested `insightRequest`:
```json
{
"insightRequest": {
"displayName": "V1 vs V2 Comparison",
"state": "NotStarted",
"request": {
"type": "EvaluationComparison",
"evalId": "<eval-group-id>",
"baselineRunId": "<baseline-run-id>",
"treatmentRunIds": ["<new-run-id>"]
}
}
}
```
> **Important:** Both runs must be in the **same eval group** (same `evaluationId` in Steps 2 and 8), but comparison requests and lookups use `evalId` for that same group identifier. That shared group assumes the evaluator bundle is fixed for all runs in the group.
Then use **`evaluation_comparison_get`** (with the returned `insightId`) to retrieve comparison results. Present a summary showing which version performed better per evaluator, and recommend which version to keep.
## Step 10 β Iterate or Finish
If more categories remain in the prioritized action table (from [Step 4](analyze-results.md)), loop back to **Step 5** (dive into next category) β **Step 6** (optimize) β **Step 7** (deploy) β **Step 8** (re-evaluate) β **Step 9** (compare).
Otherwise, confirm the final agent version with the user, then prompt for [CI/CD evals & monitoring](cicd-monitoring.md).
continuous-eval.md 15.1 KB
# Continuous Evaluation
Enable, configure, disable, or remove continuous evaluation for a Foundry agent. Continuous evaluation automatically assesses agent responses on an ongoing basis using configured evaluators (e.g., groundedness, coherence, violence detection). This is typically the final step in the [observe loop](../observe.md) after deploying and batch-evaluating an agent β it keeps production quality visible without manual intervention.
## When to Use This Skill
USE FOR: enable continuous evaluation, disable continuous evaluation, configure continuous eval, set up monitoring evaluators, check continuous eval status, delete continuous eval, update evaluators, change sampling rate, change eval interval, production monitoring, ongoing agent quality.
DO NOT USE FOR: running a one-off batch evaluation (use [observe](../observe.md)), querying traces (use [trace](../../trace/trace.md)), creating evaluator definitions (use [observe](../observe.md) Step 1).
## Quick Reference
| Property | Value |
|----------|-------|
| MCP server | `azure` |
| Key MCP tools | `continuous_eval_create`, `continuous_eval_get`, `continuous_eval_delete`, `agent_get`, `evaluation_get` |
| Prerequisite | Agent must exist in the project |
| Local cache | `.foundry/agent-metadata.yaml` |
## Entry Points
| User Intent | Start At |
|-------------|----------|
| "Enable continuous eval" / "Set up monitoring evaluators" | [Before Starting](#before-starting--detect-current-state) β [Enable or Update](#enable-or-update) |
| "Is continuous eval running?" / "Check eval status" | [Before Starting](#before-starting--detect-current-state) β [Check Current State](#check-current-state) |
| "Change evaluators" / "Update sampling rate" | [Before Starting](#before-starting--detect-current-state) β [Check Current State](#check-current-state) β [Enable or Update](#enable-or-update) |
| "Pause evaluations" / "Disable continuous eval" | [Before Starting](#before-starting--detect-current-state) β [Disable](#disable) |
| "Stop evaluating this agent" / "Delete continuous eval" | [Before Starting](#before-starting--detect-current-state) β [Delete](#delete) |
| "Scores are dropping" / "Act on monitoring results" | [Before Starting](#before-starting--detect-current-state) β [Acting on Results](#acting-on-results) |
> β οΈ **Important:** Always run [Before Starting](#before-starting--detect-current-state) to resolve the project endpoint and agent name before calling any MCP tools.
## Before Starting β Detect Current State
1. Resolve the target agent root and environment from `.foundry/agent-metadata.yaml` using the [Project Context Resolution](../../../SKILL.md#agent-project-context-resolution) workflow.
2. Extract `projectEndpoint` and `agentName` from the selected environment. If not available in metadata, use `ask_user` to collect them.
3. Use `agent_get` to verify the agent exists and note its kind (prompt or hosted).
4. Use `continuous_eval_get` to check for existing continuous evaluation configuration.
5. Jump to the appropriate entry point based on user intent.
## How It Works
The tool auto-detects the agent's kind and uses the appropriate backend:
- **Prompt agents** β evaluation runs are triggered automatically each time the agent produces a response. Parameters: `samplingRate` (percentage of responses to evaluate), `maxHourlyRuns`.
- **Hosted agents** β evaluation runs are triggered on an hourly schedule, pulling recent traces from App Insights. Parameters: `intervalHours` (hours between runs), `maxTraces` (max data points per run).
The user does not need to choose between these β the tool handles it based on agent kind.
## Behavioral Rules
1. **Always resolve context first.** Run [Before Starting](#before-starting--detect-current-state) before calling any MCP tool. Never assume a project endpoint or agent name.
2. **Check before creating.** Always call `continuous_eval_get` before `continuous_eval_create` to determine whether to create or update. Present existing configuration to the user.
3. **Confirm evaluator selection.** Present the evaluator list to the user before enabling. Distinguish quality evaluators (require `deploymentName`) from safety evaluators (do not).
4. **Prompt for next steps.** After each operation, present options. Never assume the path forward (e.g., after enabling, offer to check status or adjust parameters).
5. **Keep context visible.** Include the project endpoint, agent name, and environment in operation summaries.
6. **Use `continuous_eval_get` for IDs.** The `delete` tool requires a `configId` β always retrieve it from the `get` response rather than asking the user to provide it.
7. **Surface the remediation path.** When presenting continuous eval results that show score degradation, always offer to route into the [observe skill](../observe.md) for diagnosis and optimization. Monitoring without action is incomplete.
8. **Handle agent-not-found.** If `agent_get` returns a not-found error, stop the continuous eval flow. Offer to route to the [deploy skill](../../deploy/deploy.md) to create the agent first, or ask the user to verify the agent name and environment.
9. **Handle auth and endpoint errors.** If `agent_get` or `continuous_eval_create` returns a permission or authentication error, verify the project endpoint, environment, and user access. Do not suggest creating the agent β the issue is access, not existence.
10. **Validate `deploymentName` before enabling.** Do not assume `gpt-4o` exists. If quality evaluators are selected, verify a chat-capable deployment is available in the project. If none exists, stop and explain that quality evaluators cannot be enabled until a compatible deployment is provisioned.
11. **Handle invalid evaluator names.** If `continuous_eval_create` returns an invalid evaluator name error, call `evaluator_catalog_get` to list available evaluators and present valid options. Do not retry with the same arguments.
12. **Handle unexpected empty config.** If `continuous_eval_get` returns an empty list for an agent the user believes has continuous eval configured, verify the agent name and project endpoint match the intended environment in `.foundry/agent-metadata.yaml`. The configuration may exist under a different environment or resolved `agentName`.
## Operations
### Check Current State
Before enabling or modifying, check what's already configured:
```yaml
Tool: continuous_eval_get
Arguments:
projectEndpoint: <project endpoint>
agentName: <agent name>
```
- Empty list β no continuous eval configured. Proceed to [Enable or Update](#enable-or-update).
- Non-empty list β agent already has continuous eval. Present the configuration and ask what the user wants to change.
> β οΈ **Empty result is not proof of absence.** If the user expects a config to exist but the list is empty, verify the project endpoint and agent name match the intended environment before concluding it was never set up.
### Enable or Update
**Replace Semantics**: `continuous_eval_create` always creates a new evaluation group with the provided evaluators and points the evaluation rule at it. Always pass the complete desired configuration on every call β omitted evaluators are dropped, not preserved.
> β οΈ **Do not assume `gpt-4o` exists.** Before setting `deploymentName`, verify a chat-capable deployment is available in the project. If none exists, quality evaluators cannot be enabled β only safety evaluators (which do not require a deployment) will work.
```yaml
Tool: continuous_eval_create
Arguments:
projectEndpoint: <project endpoint>
agentName: <agent name>
evaluatorNames: ["groundedness", "coherence", "fluency"] # Illustrative β align with your batch eval evaluators
deploymentName: "gpt-4o" # Required for quality evaluators
enabled: true # Set false to disable without deleting
```
**Evaluator selection guidance:**
- **Quality evaluators** (require `deploymentName`): coherence, fluency, relevance, groundedness, intent_resolution, task_adherence, tool_call_accuracy
- **Safety evaluators** (no `deploymentName` needed): violence, sexual, self_harm, hate_unfairness, indirect_attack, code_vulnerability, protected_material
- Custom evaluators from the project's evaluator catalog are also supported by name.
**Optional parameters by agent kind:**
| Parameter | Applies To | Description | Default |
|-----------|-----------|-------------|---------|
| `samplingRate` | Prompt | Percentage of responses to evaluate (1-100) | All responses |
| `maxHourlyRuns` | Prompt | Cap on evaluation runs per hour | No limit |
| `intervalHours` | Hosted | Hours between evaluation runs | 1 |
| `maxTraces` | Hosted | Max data points per evaluation run | 1000 |
| `scenario` | Prompt | Evaluation scenario: `standard` (quality and safety metrics, default) or `business` (business success metrics). An agent can have one of each simultaneously. | `standard` |
### Disable
To temporarily disable without changing configuration, pass the configuration currently in use along with `enabled: false`. Because `continuous_eval_create` has replace semantics, omitting parameters will change the configuration when re-enabled. The `continuous_eval_get` response does not include evaluator names directly β they are stored in the linked evaluation group β so retrieve them via `evaluation_get` first. If multiple configurations are returned in the `continuous_eval_get` response, present the list to the user and ask which to target.
```yaml
# Step 1: Get the evalId, then retrieve current evaluators from the eval group
Tool: continuous_eval_get
Arguments:
projectEndpoint: <project endpoint>
agentName: <agent name>
# Note the evalId from the response
```
```yaml
Tool: evaluation_get
Arguments:
projectEndpoint: <project endpoint>
evalId: <evalId from above>
# Note the evaluator names from the evaluation group's testing criteria
```
```yaml
# Step 2: Disable with the same evaluators
Tool: continuous_eval_create
Arguments:
projectEndpoint: <project endpoint>
agentName: <agent name>
evaluatorNames: ["groundedness", "coherence", "fluency"] # Must match current config
deploymentName: "gpt-4o"
enabled: false
```
### Delete
To permanently remove continuous evaluation configuration:
```yaml
Tool: continuous_eval_delete
Arguments:
projectEndpoint: <project endpoint>
configId: <id from continuous_eval_get>
agentName: <agent name>
```
Always call `continuous_eval_get` first to retrieve the `id` field of the configuration to delete. If multiple configurations are returned, present the list to the user and ask which to target.
## Acting on Results
Continuous evaluation generates ongoing scores β but monitoring is only useful when you **act** on what it reveals. This section covers how to consume evaluation results and the remediation loop when scores degrade.
### Step 1: Read Evaluation Scores
The `continuous_eval_get` response includes an `evalId` that links to the evaluation group. Use this to retrieve actual run results:
```yaml
Tool: continuous_eval_get
Arguments:
projectEndpoint: <project endpoint>
agentName: <agent name>
# Note the evalId from the response
```
```yaml
Tool: evaluation_get
Arguments:
projectEndpoint: <project endpoint>
evalId: <evalId from continuous_eval_get>
isRequestForRuns: true
# Returns evaluation runs with per-evaluator scores
```
Review the run results for score trends. Each run contains scores for every configured evaluator. Look for:
- **Scores below threshold** β any evaluator consistently scoring below your acceptable baseline
- **Score degradation over time** β scores that were previously healthy but are trending downward
- **Safety flags** β any non-zero safety evaluator scores that indicate harmful content
### Step 2: Triage the Regression
1. **Identify the failing evaluators.** From the evaluation runs, note which specific evaluators are scoring low (e.g., `groundedness` dropping from 4.2 to 2.8).
2. **Correlate with traces.** Use the [trace skill](../../trace/trace.md) to search App Insights for the conversations that triggered low scores. Look for patterns: specific query types, tool-call failures, or grounding gaps.
3. **Compare to baseline.** If batch eval results exist in `.foundry/results/`, compare continuous eval scores against the last known-good batch run to determine whether this is a new regression or a pre-existing gap.
### Step 3: Remediate via the Observe Loop
Once you understand the failure pattern, use the [observe skill](../observe.md) to fix it:
| Symptom | Action |
|---------|--------|
| Quality scores dropping (coherence, relevance, task_adherence) | Run [Step 3: Analyze](analyze-results.md) to cluster failures, then [Step 4: Optimize](optimize-deploy.md) to improve the prompt |
| Safety evaluators flagging (violence, indirect_attack) | Review flagged traces via [trace skill](../../trace/trace.md), then update agent instructions or tool definitions to address the pattern |
| Grounding failures | Check whether the agent's data sources are still accessible and returning expected results; update knowledge index or tool configuration |
| Scores fluctuating after a deploy | Run [Step 5: Compare](compare-iterate.md) between the current and previous agent version to isolate the regression |
### Step 4: Verify the Fix
After deploying a fix through the observe loop:
1. **Re-run a batch eval** via [observe](../observe.md) Step 2 against the same test cases to confirm the fix.
2. **Read continuous eval scores** from the next evaluation cycle using `evaluation_get` with the `evalId` β verify scores have recovered.
3. **Adjust evaluators if needed.** If the regression exposed a gap in evaluator coverage, use `continuous_eval_create` to update the configuration with additional or refined evaluators.
> π‘ **Tip:** The continuous eval β observe β deploy β continuous eval cycle is the core production quality loop. Continuous eval detects; observe diagnoses and fixes; continuous eval verifies.
## Response Format
All tools return a unified `ContinuousEvalConfig` shape. The `get` tool returns a list; `create` returns a single object.
| Field | Description | Present For |
|-------|-------------|-------------|
| `id` | Configuration identifier (needed for delete) | All |
| `displayName` | Human-readable name | All |
| `enabled` | Whether evaluation is active | All |
| `evalId` | Linked evaluation group containing evaluator definitions | All |
| `agentName` | Target agent name | All |
| `status` | Provisioning status | Hosted only |
| `scenario` | Evaluation scenario (`standard` or `business`) | Prompt only |
| `samplingRate` | Percentage of responses evaluated | Prompt only |
| `maxHourlyRuns` | Cap on runs per hour | Prompt only |
| `intervalHours` | Hours between scheduled runs | Hosted only |
| `maxTraces` | Max data points per run | Hosted only |
| `createdAt` | Creation timestamp | All |
| `createdBy` | Creator identity | All |
## Related Skills
| User Intent | Skill |
|-------------|-------|
| "Evaluate my agent" / "Run a batch eval" | [observe skill](../observe.md) |
| "Scores are dropping" / "Diagnose and fix quality regression" | [observe skill](../observe.md) (Steps 3β5) |
| "Analyze production traces" / "Find flagged conversations" | [trace skill](../../trace/trace.md) |
| "Deploy my agent" / "Redeploy after fix" | [deploy skill](../../deploy/deploy.md) |
deploy-and-setup.md 8.1 KB
# Step 1 β Auto-Setup Evaluators & Dataset
> **This step runs automatically after deployment.** If the agent was deployed via the [deploy skill](../../deploy/deploy.md), `.foundry` cache and metadata may already be configured. Check `.foundry/evaluators/`, `.foundry/datasets/`, and the selected metadata file under the selected agent root before re-creating them.
>
> If the agent is **not yet deployed**, follow the [deploy skill](../../deploy/deploy.md) first. It handles project detection, Dockerfile generation, ACR build, agent creation, verification, and auto-creates `.foundry` cache after a successful deployment.
## Auto-Create Evaluators & Dataset
> **This step is fully automatic.** After deployment, immediately prepare evaluators and a local test dataset for the selected environment without waiting for the user to request it.
### 1. Read Agent Instructions
Use **`agent_get`** (or local `agent.yaml` in the selected agent root) to understand the agent's purpose and capabilities.
### 2. Reuse or Refresh Cache
Inspect `.foundry/evaluators/`, `.foundry/datasets/`, and the selected environment's `evaluationSuites[]` in the selected agent root only. Do **not** merge sibling agent folders. If the selected environment still uses older `testSuites[]` or legacy `testCases[]`, normalize that list to evaluation suites first and plan to rewrite that environment as `evaluationSuites[]` when this step persists metadata.
- **Cache is current** -> reuse it and summarize what is already available.
- **Cache is missing or stale** -> refresh it after confirming with the user.
- **User explicitly asks for refresh** -> rebuild and rewrite only the selected environment's cache in the selected agent root.
### 2.5 Discover Existing Evaluators
Use **`evaluator_catalog_get`** with the selected environment's project endpoint to list all evaluators already registered in the project. Display them to the user grouped by type (`custom` vs `built-in`) with name, category, and version. During Phase 1, catalog any promising custom evaluators for later reuse, but keep the first run on the built-in baseline. Only propose creating a new evaluator in Phase 2 when no existing evaluator covers a required dimension.
### 3. Select Evaluators
Follow the [Two-Phase Evaluator Strategy](../observe.md). Phase 1 is built-in only, so do not create a new custom evaluator during the initial setup pass.
Start with <=5 built-in evaluators for the initial eval run so the first pass stays fast:
| Category | Evaluators |
|----------|-----------|
| **Quality (built-in)** | relevance, task_adherence, intent_resolution |
| **Safety (built-in)** | indirect_attack |
| **Tool use (built-in, conditional)** | tool_call_accuracy (use when the agent calls tools; some catalogs label it as `builtin.tool_call_accuracy`) |
After analyzing initial results, suggest additional evaluators (custom or built-in) targeted at specific failure patterns instead of front-loading a broad default set.
### 4. Defer New Custom Evaluators to Phase 2
During the initial setup pass, do not create a new custom evaluator yet. Instead, record which existing custom evaluators from Step 2.5 might be reused later and run the first built-in-only eval. After the first run has been analyzed, return to this step only if the built-in judges still miss an important pattern.
When Phase 2 is needed:
1. Call **`evaluator_catalog_get`** again and reuse an existing custom evaluator if it already covers the gap.
2. Only if the catalog still lacks the required signal, use **`evaluator_catalog_create`** with the selected environment's project endpoint.
3. Prefer evaluators that consume `expected_behavior`, as described in the [Two-Phase Evaluator Strategy](../observe.md), so scoring can follow the per-query rubric instead of only the global agent instructions.
4. Before passing `promptText` to `evaluator_catalog_create`, remove or rewrite any user-provided output-format instructions that conflict with the custom evaluator contract. The runtime-enforced JSON fields are `result` and `reason`; do not preserve alternate schemas such as `score`/`reasoning` or duplicate mandatory output blocks.
| Parameter | Required | Description |
|-----------|----------|-------------|
| `projectEndpoint` | β
| Azure AI Project endpoint |
| `name` | β
| For example, `domain_accuracy`, `citation_quality` |
| `category` | β
| `quality`, `safety`, or `agents` |
| `scoringType` | β
| `ordinal`, `continuous`, or `boolean` |
| `promptText` | β
* | Template with `{{query}}`, `{{response}}`, and `{{expected_behavior}}` placeholders when behavior-specific scoring is needed. Keep rubric instructions, but omit conflicting output JSON schemas; the runtime enforces `result` and `reason`. |
| `minScore` / `maxScore` | | Default: 1 / 5 |
| `passThreshold` | | Scores >= this value pass |
### 5. Identify LLM-Judge Deployment
Use **`model_deployment_get`** to list the selected project's actual model deployments, then choose one that supports chat completions for quality evaluators. Do **not** assume `gpt-4o` exists in the project. If no deployment supports chat completions, stop the setup flow and explain that quality evaluators need a compatible judge deployment.
### 6. Generate Local Test Dataset
Generate the seed rows directly from the selected agent root's instructions and tool capabilities you already resolved during setup. Do **not** call the identified chat-capable deployment for dataset generation; reserve that deployment for quality evaluators. Save the initial seed file to `.foundry/datasets/<agent-name>-eval-seed-v1.jsonl` with each line containing at minimum `query` and `expected_behavior` fields (optionally `context`, `ground_truth`).
The local filename must start with the selected environment's Foundry agent name (`agentName` in the selected metadata file) before adding stage, environment, or version suffixes.
Include `expected_behavior` even though Phase 1 uses built-in evaluators only. That field pre-positions the seed dataset for Phase 2 custom evaluators if the first run reveals gaps that need a per-query behavioral rubric.
Use [Generate Seed Evaluation Dataset](../../eval-datasets/references/generate-seed-dataset.md) as the single source of truth for registration. It covers `project_connection_list` with `AzureStorageAccount`, key-based versus AAD upload, `evaluation_dataset_create` with `connectionName`, and saving the returned `datasetUri`.
### 7. Persist Artifacts and Evaluation Suites
```text
.foundry/
agent-metadata.yaml
agent-metadata.prod.yaml
evaluators/
<name>.yaml
datasets/
*.jsonl
results/
<environment>/
<eval-id>/
<run-id>.json
```
Save evaluator definitions to `.foundry/evaluators/<name>.yaml`, test data to `.foundry/datasets/*.jsonl`, and create or update evaluation suites in the selected metadata file with:
- `id`
- `tags` (freeform key/value map, for example `tier: smoke`, `purpose: baseline`, `stage: seed`)
- `dataset` (for example, `<agent-name>-eval-seed`)
- `datasetVersion` (for example, `v1`)
- `datasetFile` (for example, `.foundry/datasets/<agent-name>-eval-seed-v1.jsonl`)
- `datasetUri` (returned by `evaluation_dataset_create`)
- tag values for `agent`, `stage`, and `version`
- evaluator names and thresholds
If the selected environment still uses older `testSuites[]` or legacy `testCases[]`, replace that list with `evaluationSuites[]` in the rewritten metadata. Preserve dataset/evaluator fields and map `priority` to `tags.tier` only when `tags.tier` is missing.
> β οΈ **Show Data Viewer deeplinks (for VS Code runtime only):** Append a Data Viewer deeplink immediately after reference to a dataset file in your response. Format: "[Open in Data Viewer](vscode://ms-windows-ai-studio.windows-ai-studio/open_data_viewer?file=<file_path>&source=microsoft-foundry-skill) for details and perform analysis".
### 8. Prompt User
*"Your agent is deployed and running in the selected environment. The `.foundry` cache now contains evaluators, a local seed dataset, the Foundry dataset registration metadata, and evaluation-suite metadata. Would you like to run an evaluation to identify optimization opportunities?"*
If yes -> proceed to [Step 2: Evaluate](evaluate-step.md). If no -> stop.
evaluate-step.md 4.0 KB
# Step 2 β Create Batch Evaluation
## Prerequisites
- Agent deployed and running in the selected environment
- Selected `.foundry/agent-metadata*.yaml` file loaded for the active agent root
- Evaluators configured (from [Step 1](deploy-and-setup.md) or `.foundry/evaluators/`)
- Local test dataset available (from the selected agent root's `.foundry/datasets/`)
- Evaluation suite selected from the environment's `evaluationSuites[]`
## Run Evaluation
Use **`evaluation_agent_batch_eval_create`** to run the selected evaluation suite's evaluators against the selected environment's agent.
### Required Parameters
| Parameter | Description |
|-----------|-------------|
| `projectEndpoint` | Azure AI Project endpoint from the selected metadata file |
| `agentName` | Agent name for the selected environment |
| `agentVersion` | Agent version (string, for example `"1"`) |
| `evaluatorNames` | Array of evaluator names from the selected evaluation suite |
### Test Data Options
**Preferred β local dataset:** Read JSONL from `.foundry/datasets/` and pass via `inputData` (array of objects with `query` and `expected_behavior`, optionally `context`, `ground_truth`). Always use this when the referenced cache file exists.
**Fallback only β server-side synthetic data:** Set `generateSyntheticData=true` and provide `generationModelDeploymentName`. Only use this when the local cache is missing and the user explicitly requests a refresh-free synthetic run.
## Resolve Judge Deployment
Before setting `deploymentName`, use **`model_deployment_get`** to list the selected project's actual model deployments. Choose a deployment that supports chat completions and use that deployment name for quality evaluators. Do **not** assume `gpt-4o` exists. If the project has no chat-completions-capable deployment, stop and tell the user quality evaluators cannot run until one is available.
### Additional Parameters
| Parameter | When Needed |
|-----------|-------------|
| `deploymentName` | Required for quality evaluators (the LLM-judge model) |
| `evaluationId` | Pass existing eval group ID to group runs for comparison |
| `evaluationName` | Name for a new evaluation group; include environment and evaluation-suite ID |
> **Important:** Use `evaluationId` on `evaluation_agent_batch_eval_create` (not `evalId`) to group runs. Run suites tagged `tier=smoke` first unless the user chooses a broader suite tag or a specific suite.
> β οΈ **Eval-group immutability:** Reuse an existing `evaluationId` only when the dataset comparison setup is unchanged for that group: same evaluator list and same thresholds. If evaluator definitions or thresholds change, create a **new** evaluation group instead of adding another run to the old one.
## Parameter Naming Guardrail
These eval tools use similar names for the same evaluation-group identifier. Match the parameter name to the tool exactly:
| Tool | Correct Group Parameter | Notes |
|------|-------------------------|-------|
| `evaluation_agent_batch_eval_create` | `evaluationId` | Reuse the existing group when creating a new run |
| `evaluation_get` | `evalId` | Use with `isRequestForRuns=true` to list runs in one group |
| `evaluation_comparison_create` | `insightRequest.request.evalId` | Comparison requests take `evalId`, not `evaluationId` |
> β οΈ **Common mistake:** `evaluation_get` does **not** accept `evaluationId`. Always switch from `evaluationId` to `evalId` after the run is created.
## Auto-Poll for Completion
Immediately after creating the run, poll **`evaluation_get`** in a background terminal until completion. Use `evalId + isRequestForRuns=true`. The run ID parameter is `evalRunId` (not `runId`).
Only surface the final result when status reaches `completed`, `failed`, or `cancelled`.
## Next Steps
When evaluation completes -> proceed to [Step 3: Analyze Results](analyze-results.md).
## Reference
- [Azure AI Foundry Cloud Evaluation](https://learn.microsoft.com/en-us/azure/ai-foundry/how-to/develop/cloud-evaluation)
- [Built-in Evaluators](https://learn.microsoft.com/en-us/azure/foundry/concepts/built-in-evaluators)
optimize-deploy.md 1.5 KB
# Steps 6β7 β Optimize Prompt & Deploy New Version
## Step 6 β Optimize Prompt
> β **Guardrail:** When optimizing after a dataset update, do NOT remove dataset rows or weaken evaluators to recover scores. Score drops on a harder dataset are expected β they mean test coverage improved, not that the agent regressed. Optimize for NEW failure patterns only.
Use **`prompt_optimize`** with:
| Parameter | Required | Description |
|-----------|----------|-------------|
| `developerMessage` | β
| Agent's current system prompt / instructions |
| `deploymentName` | β
| Model for optimization (e.g., `gpt-4o-mini`) |
| `projectEndpoint` or `foundryAccountResourceId` | β
| At least one required |
| `requestedChanges` | | Concise improvement suggestions from cluster analysis |
**Example `requestedChanges`:** *"Be more specific when answering geography questions"*, *"Always cite sources when providing factual claims"*
> Use the optimized prompt returned by the tool. Do NOT manually rewrite.
## Step 7 β Deploy New Version
> **Always confirm before deploying.** Show the user a diff or summary of prompt changes and wait for explicit sign-off.
After approval:
1. Use **`agent_update`** to create a new agent version with the optimized prompt
2. Use **`agent_get`** to verify the updated version is `running`
3. If the updated version is not `running`, read and follow the [troubleshoot skill](../../troubleshoot/troubleshoot.md) before continuing
## Next Steps
When the new version is running β proceed to [Step 8: Re-Evaluate](compare-iterate.md).
analyze-failures.md 3.9 KB
# Analyze Failures β Find and Cluster Failing Traces
Identify failing agent traces, group them by root cause, and produce a prioritized action table.
## Step 1 β Find Failing Traces
> β οΈ **Hosted agents:** `gen_ai.agent.name` on `dependencies` holds the **code-level class name** (e.g., `BingSearchAgent`), NOT the Foundry agent name. To filter by Foundry name, use the [Hosted Agent Variant](#hosted-agent-variant--failures) below.
```kql
dependencies
| where timestamp > ago(24h)
| where success == false or toint(resultCode) >= 400
| extend
operation = tostring(customDimensions["gen_ai.operation.name"]),
errorType = tostring(customDimensions["error.type"]),
model = tostring(customDimensions["gen_ai.request.model"]),
agentName = tostring(customDimensions["gen_ai.agent.name"]),
conversationId = tostring(customDimensions["gen_ai.conversation.id"])
| project timestamp, name, duration, resultCode, errorType, operation, model,
agentName, conversationId, operation_Id, id
| order by timestamp desc
| take 100
```
## Step 2 β Cluster by Error Type
```kql
dependencies
| where timestamp > ago(24h)
| where success == false or toint(resultCode) >= 400
| extend
errorType = tostring(customDimensions["error.type"]),
operation = tostring(customDimensions["gen_ai.operation.name"])
| summarize
count = count(),
firstSeen = min(timestamp),
lastSeen = max(timestamp),
avgDuration = avg(duration),
sampleOperationId = take_any(operation_Id)
by errorType, operation, resultCode
| order by count desc
```
## Step 3 β Prioritized Action Table
Present results as:
| Priority | Error Type | Operation | Count | Result Code | Suggested Action |
|----------|-----------|-----------|-------|-------------|-----------------|
| P0 | timeout | invoke_agent | 15 | 504 | Check agent container health, increase timeout |
| P1 | rate_limited | chat | 8 | 429 | Check quota, add retry logic |
| P2 | content_filter | chat | 5 | 400 | Review prompt for policy violations |
| P3 | tool_error | execute_tool | 3 | 500 | Check tool implementation and permissions |
**Prioritization:** P0 = highest count or most severe (5xx), then by count Γ recency.
## Step 4 β Drill Into Specific Failure
When the user selects a cluster, show individual failing traces:
```kql
dependencies
| where timestamp > ago(24h)
| where success == false
| where customDimensions["error.type"] == "<selected_error_type>"
| where customDimensions["gen_ai.operation.name"] == "<selected_operation>"
| project timestamp, name, duration, resultCode,
conversationId = tostring(customDimensions["gen_ai.conversation.id"]),
responseId = tostring(customDimensions["gen_ai.response.id"]),
operation_Id
| order by timestamp desc
| take 20
```
Also check `exceptions` table for stack traces:
```kql
exceptions
| where timestamp > ago(24h)
| where operation_Id in ("<operation_id_1>", "<operation_id_2>")
| project timestamp, type, message, outerMessage, details, operation_Id
| order by timestamp desc
```
Offer to view the full conversation for any trace via [Conversation Detail](conversation-detail.md).
## Hosted Agent Variant β Failures
For hosted agents, the Foundry agent name lives on `requests`, not `dependencies`. Use a two-step join:
```kql
let reqIds = requests
| where timestamp > ago(24h)
| where customDimensions["gen_ai.agent.name"] == "<foundry-agent-name>"
| distinct id;
dependencies
| where timestamp > ago(24h)
| where operation_ParentId in (reqIds)
| where success == false or toint(resultCode) >= 400
| extend
operation = tostring(customDimensions["gen_ai.operation.name"]),
errorType = tostring(customDimensions["error.type"]),
model = tostring(customDimensions["gen_ai.request.model"]),
conversationId = tostring(customDimensions["gen_ai.conversation.id"])
| project timestamp, name, duration, resultCode, errorType, operation, model,
conversationId, operation_ParentId, operation_Id
| order by timestamp desc
| take 100
```
analyze-latency.md 3.8 KB
# Analyze Latency β Find and Diagnose Slow Traces
Identify slow agent traces, find bottleneck spans, and correlate with token usage.
## Step 1 β Find Slow Conversations
> β οΈ **Hosted agents:** `gen_ai.agent.name` on `dependencies` holds the **code-level class name** (e.g., `BingSearchAgent`), NOT the Foundry agent name. To scope by Foundry name, use the [Hosted Agent Variant](#hosted-agent-variant--latency) below.
```kql
dependencies
| where timestamp > ago(24h)
| where customDimensions["gen_ai.operation.name"] == "invoke_agent"
| project timestamp, duration, success,
agentName = tostring(customDimensions["gen_ai.agent.name"]),
conversationId = tostring(customDimensions["gen_ai.conversation.id"]),
operation_Id
| summarize
totalDuration = sum(duration),
spanCount = count(),
hasErrors = countif(success == false) > 0
by conversationId, operation_Id
| where totalDuration > 5000
| order by totalDuration desc
| take 50
```
> **Default threshold:** 5 seconds. Ask the user for their latency threshold if not specified.
## Step 2 β Latency Distribution (P50/P95/P99)
```kql
dependencies
| where timestamp > ago(24h)
| where customDimensions["gen_ai.operation.name"] in ("chat", "invoke_agent")
| summarize
p50 = percentile(duration, 50),
p95 = percentile(duration, 95),
p99 = percentile(duration, 99),
avg = avg(duration),
count = count()
by operation = tostring(customDimensions["gen_ai.operation.name"]),
model = tostring(customDimensions["gen_ai.request.model"])
| order by p95 desc
```
Present as:
| Operation | Model | P50 (ms) | P95 (ms) | P99 (ms) | Avg (ms) | Count |
|-----------|-------|---------|---------|---------|---------|-------|
## Step 3 β Bottleneck Breakdown
For a specific slow conversation, break down time spent per span type:
```kql
dependencies
| where operation_Id == "<operation_id>"
| extend operation = tostring(customDimensions["gen_ai.operation.name"])
| summarize
totalDuration = sum(duration),
spanCount = count(),
avgDuration = avg(duration)
by operation, name
| order by totalDuration desc
```
Common bottleneck patterns:
- **`chat` spans dominate** β LLM inference is slow (consider smaller model or caching)
- **`execute_tool` spans dominate** β Tool execution is slow (optimize tool implementation)
- **`invoke_agent` has long gaps** β Orchestration overhead (check agent framework)
## Step 4 β Token Usage vs Latency Correlation
```kql
dependencies
| where timestamp > ago(24h)
| where customDimensions["gen_ai.operation.name"] == "chat"
| extend
inputTokens = toint(customDimensions["gen_ai.usage.input_tokens"]),
outputTokens = toint(customDimensions["gen_ai.usage.output_tokens"])
| where isnotempty(inputTokens)
| project duration, inputTokens, outputTokens,
model = tostring(customDimensions["gen_ai.request.model"]),
operation_Id
| order by duration desc
| take 100
```
High token counts often correlate with high latency. If confirmed, suggest:
- Reduce system prompt length
- Limit conversation history window
- Use a faster model for simpler queries
## Hosted Agent Variant β Latency
For hosted agents, scope by Foundry agent name via `requests` then join to `dependencies`:
```kql
let reqIds = requests
| where timestamp > ago(24h)
| where customDimensions["gen_ai.agent.name"] == "<foundry-agent-name>"
| distinct id;
dependencies
| where timestamp > ago(24h)
| where operation_ParentId in (reqIds)
| where customDimensions["gen_ai.operation.name"] in ("chat", "invoke_agent")
| summarize
p50 = percentile(duration, 50),
p95 = percentile(duration, 95),
p99 = percentile(duration, 99),
avg = avg(duration),
count = count()
by operation = tostring(customDimensions["gen_ai.operation.name"]),
model = tostring(customDimensions["gen_ai.request.model"])
| order by p95 desc
```
conversation-detail.md 3.6 KB
# Conversation Detail β Reconstruct Full Span Tree
Reconstruct the complete span tree for a single conversation to see exactly what happened: every LLM call, tool execution, and agent invocation with timing, tokens, and errors.
## Step 1 β Fetch All Spans for a Conversation
Use `operation_Id` (trace ID) to get all spans in a single request:
```kql
dependencies
| where operation_Id == "<operation_id>"
| project timestamp, name, duration, resultCode, success,
spanId = id,
parentSpanId = operation_ParentId,
operation = tostring(customDimensions["gen_ai.operation.name"]),
model = tostring(customDimensions["gen_ai.request.model"]),
responseModel = tostring(customDimensions["gen_ai.response.model"]),
inputTokens = toint(customDimensions["gen_ai.usage.input_tokens"]),
outputTokens = toint(customDimensions["gen_ai.usage.output_tokens"]),
responseId = tostring(customDimensions["gen_ai.response.id"]),
finishReason = tostring(customDimensions["gen_ai.response.finish_reasons"]),
errorType = tostring(customDimensions["error.type"]),
toolName = tostring(customDimensions["gen_ai.tool.name"]),
toolCallId = tostring(customDimensions["gen_ai.tool.call.id"])
| order by timestamp asc
```
Also fetch the parent request:
```kql
requests
| where operation_Id == "<operation_id>"
| project timestamp, name, duration, resultCode, success, id, operation_ParentId
```
## Step 2 β Build Span Tree
Use `spanId` and `parentSpanId` to reconstruct the hierarchy:
```
invoke_agent (root) βββ 4200ms
βββ chat (LLM call #1) βββ 1800ms, gpt-4o, 450β120 tokens
β βββ [output: "Let me check the weather..."]
βββ execute_tool (get_weather) [tool: remote_functions.weather_api] βββ 200ms
β βββ [result: "rainy, 57Β°F"]
βββ chat (LLM call #2) βββ 1500ms, gpt-4o, 620β85 tokens
β βββ [output: "The weather in Paris is rainy, 57Β°F"]
βββ [total: 450+620=1070 input, 120+85=205 output tokens]
```
Present as an indented tree with:
- **Operation type** and name
- **Duration** (highlight if > P95 for that operation type)
- **Model** and token counts (for chat operations)
- **Error type** and result code (if failed, highlight in red)
- **Finish reason** (stop, length, content_filter, tool_calls)
## Step 3 β Extract Conversation Content from invoke_agent Spans
The full input/output content lives on `invoke_agent` dependency spans in `gen_ai.input.messages` and `gen_ai.output.messages`. These JSON arrays contain the complete conversation (system prompt, user query, assistant response):
```kql
dependencies
| where operation_Id == "<operation_id>"
| where customDimensions["gen_ai.operation.name"] == "invoke_agent"
| project timestamp,
inputMessages = tostring(customDimensions["gen_ai.input.messages"]),
outputMessages = tostring(customDimensions["gen_ai.output.messages"])
| order by timestamp asc
```
Message structure: `[{"role": "user", "parts": [{"type": "text", "content": "..."}]}]`
Also check the `traces` table for additional GenAI log events:
```kql
traces
| where operation_Id == "<operation_id>"
| where message contains "gen_ai"
| project timestamp, message, customDimensions
| order by timestamp asc
```
## Step 4 β Check for Exceptions
```kql
exceptions
| where operation_Id == "<operation_id>"
| project timestamp, type, message, outerMessage,
details = parse_json(details)
| order by timestamp asc
```
Present exceptions inline in the span tree at their position in the timeline.
## Step 5 β Fetch Evaluation Results
See [Eval Correlation](eval-correlation.md) for the full workflow to look up evaluation scores by response ID or conversation ID. Use `gen_ai.response.id` values from Step 1 spans to correlate.
eval-correlation.md 2.5 KB
# Eval Correlation β Find Evaluation Results by Response or Conversation ID
Look up evaluation scores for a specific agent response using App Insights.
> **IMPORTANT:** The Foundry evaluation API does NOT support querying by response ID or conversation ID. App Insights `customEvents` is the ONLY way to correlate eval scores to specific responses. Always use this KQL approach when the user asks for eval results for a specific response or conversation.
## Prerequisites
- App Insights resource resolved (see [trace.md](../trace.md) Before Starting)
- A response ID (`gen_ai.response.id`) or conversation ID (`gen_ai.conversation.id`) from a previous trace query
## Search by Response ID
```kql
customEvents
| where timestamp > ago(30d)
| where name == "gen_ai.evaluation.result"
| where customDimensions["gen_ai.response.id"] == "<response_id>"
| extend
evalName = tostring(customDimensions["gen_ai.evaluation.name"]),
score = todouble(customDimensions["gen_ai.evaluation.score.value"]),
label = tostring(customDimensions["gen_ai.evaluation.score.label"]),
explanation = tostring(customDimensions["gen_ai.evaluation.explanation"]),
responseId = tostring(customDimensions["gen_ai.response.id"]),
conversationId = tostring(customDimensions["gen_ai.conversation.id"])
| project timestamp, evalName, score, label, explanation, responseId, conversationId
| order by evalName asc
```
## Search by Conversation ID
```kql
customEvents
| where timestamp > ago(30d)
| where name == "gen_ai.evaluation.result"
| where customDimensions["gen_ai.conversation.id"] == "<conversation_id>"
| extend
evalName = tostring(customDimensions["gen_ai.evaluation.name"]),
score = todouble(customDimensions["gen_ai.evaluation.score.value"]),
label = tostring(customDimensions["gen_ai.evaluation.score.label"]),
explanation = tostring(customDimensions["gen_ai.evaluation.explanation"]),
responseId = tostring(customDimensions["gen_ai.response.id"])
| project timestamp, evalName, score, label, explanation, responseId
| order by responseId asc, evalName asc
```
## Present Results
Show eval scores as a table:
| Evaluator | Score | Label | Explanation |
|-----------|-------|-------|-------------|
| coherence | 5.0 | pass | Response is well-structured... |
| fluency | 4.0 | pass | Natural language flow... |
| relevance | 2.0 | fail | Response doesn't address... |
When showing alongside a span tree (see [Conversation Detail](conversation-detail.md)), attach eval scores to the span whose `gen_ai.response.id` matches.
kql-templates.md 10.5 KB
# KQL Templates β GenAI Trace Query Reference
Ready-to-use KQL templates for querying GenAI OpenTelemetry traces in Application Insights.
**Table of Contents:** [App Insights Table Mapping](#app-insights-table-mapping) Β· [Key GenAI OTel Attributes](#key-genai-otel-attributes) Β· [Span Correlation](#span-correlation) Β· [Hosted Agent Attributes](#hosted-agent-attributes) Β· [Response ID Formats](#response-id-formats) Β· [Common Query Templates](#common-query-templates) Β· [OTel Reference Links](#otel-reference-links)
## App Insights Table Mapping
| App Insights Table | GenAI Data |
|-------------------|------------|
| `dependencies` | GenAI spans: LLM inference (`chat`), tool execution (`execute_tool`), agent invocation (`invoke_agent`) |
| `requests` | Incoming HTTP requests to the agent endpoint. For hosted agents, also carries `gen_ai.agent.name` (Foundry name) and `azure.ai.agentserver.*` attributes β **preferred entry point** for agent-name filtering |
| `customEvents` | GenAI evaluation results (`gen_ai.evaluation.result`) β scores, labels, explanations |
| `traces` | Log events, including GenAI events (input/output messages) |
| `exceptions` | Error details with stack traces |
## Key GenAI OTel Attributes
Stored in `customDimensions` on `dependencies` spans:
| Attribute | Description | Example |
|-----------|-------------|---------|
| `gen_ai.operation.name` | Operation type | `chat`, `invoke_agent`, `execute_tool`, `create_agent` |
| `gen_ai.conversation.id` | Conversation/session ID | `conv_5j66UpCpwteGg4YSxUnt7lPY` |
| `gen_ai.response.id` | Response ID | `chatcmpl-123` |
| `gen_ai.agent.name` | Agent name | `my-support-agent` |
| `gen_ai.agent.id` | Agent identifier | `asst_abc123` |
| `gen_ai.request.model` | Requested model | `gpt-4o` |
| `gen_ai.response.model` | Actual model used | `gpt-4o-2024-05-13` |
| `gen_ai.usage.input_tokens` | Input token count | `450` |
| `gen_ai.usage.output_tokens` | Output token count | `120` |
| `gen_ai.response.finish_reasons` | Stop reasons | `["stop"]`, `["tool_calls"]` |
| `error.type` | Error classification | `timeout`, `rate_limited`, `content_filter` |
| `gen_ai.provider.name` | Provider | `azure.ai.openai`, `openai` |
| `gen_ai.input.messages` | Full input messages (JSON array) β on `invoke_agent` spans | `[{"role":"user","parts":[{"type":"text","content":"..."}]}]` |
| `gen_ai.output.messages` | Full output messages (JSON array) β on `invoke_agent` spans | `[{"role":"assistant","parts":[{"type":"text","content":"..."}]}]` |
Stored in `customDimensions` on `customEvents` (name == `gen_ai.evaluation.result`):
| Attribute | Description | Example |
|-----------|-------------|---------|
| `gen_ai.evaluation.name` | Evaluator name | `Relevance`, `IntentResolution` |
| `gen_ai.evaluation.score.value` | Numeric score | `4.0` |
| `gen_ai.evaluation.score.label` | Human-readable label | `pass`, `fail`, `relevant` |
| `gen_ai.evaluation.explanation` | Free-form explanation | `"Response lacks detail..."` |
| `gen_ai.response.id` | Correlates to the evaluated span | `chatcmpl-123` |
| `gen_ai.conversation.id` | Correlates to conversation | `conv_5j66...` |
> **Correlation:** Eval results do NOT link via id-parentId. Use `gen_ai.conversation.id` and/or `gen_ai.response.id` to join with `dependencies` spans.
## Span Correlation
| Field | Purpose |
|-------|---------|
| `operation_Id` | Trace ID β groups all spans in one request |
| `id` | Span ID β unique identifier for this span |
| `operation_ParentId` | Parent span ID β use with `id` to build span trees |
### Operation_Id Join (requests β dependencies)
Use `requests` as the hosted-agent entry point, then carry `operation_Id` forward as the trace key when joining into `dependencies`, `traces`, or `customEvents`:
```kql
let agentRequests = materialize(
requests
| where timestamp > ago(7d)
| extend
foundryAgentName = coalesce(
tostring(customDimensions["gen_ai.agent.name"]),
tostring(customDimensions["azure.ai.agentserver.agent_name"])
),
agentId = tostring(customDimensions["gen_ai.agent.id"]),
agentNameFromId = tostring(split(agentId, ":")[0]),
agentVersion = iff(agentId contains ":", tostring(split(agentId, ":")[1]), ""),
conversationId = coalesce(
tostring(customDimensions["gen_ai.conversation.id"]),
tostring(customDimensions["azure.ai.agentserver.conversation_id"]),
operation_Id
)
| where foundryAgentName == "<foundry-agent-name>"
or agentNameFromId == "<foundry-agent-name>"
| project operation_Id, conversationId, agentVersion
);
dependencies
| where timestamp > ago(7d)
| where isnotempty(customDimensions["gen_ai.operation.name"])
| join kind=inner agentRequests on operation_Id
| extend
operation = tostring(customDimensions["gen_ai.operation.name"]),
model = tostring(customDimensions["gen_ai.request.model"])
| project timestamp, duration, success, operation, model, conversationId, agentVersion, operation_Id
| order by timestamp desc
```
## Hosted Agent Attributes
Stored in `customDimensions` on **both `requests` and `traces`** tables (NOT on `dependencies` spans):
| Attribute | Description | Example |
|-----------|-------------|---------|
| `azure.ai.agentserver.agent_name` | Hosted agent name | `hosted-agent-022-001` |
| `azure.ai.agentserver.agent_id` | Internal agent ID | `code-asst-xmwokux85uqc7fodxejaxa` |
| `azure.ai.agentserver.conversation_id` | Conversation ID | `conv_d7ab624de92d...` |
| `azure.ai.agentserver.response_id` | Response ID (caresp format) | `caresp_d7ab624de92d...` |
> **Important:** Use `requests` as the preferred entry point for agent-name filtering β it has both `azure.ai.agentserver.agent_name` and `gen_ai.agent.name` with the Foundry-level name. To reach downstream spans and related telemetry, carry `operation_Id` forward from the filtered request set and join other tables on that trace key.
> π‘ **Version enrichment:** Some hosted-agent `requests` telemetry emits `gen_ai.agent.id` in `<foundry-agent-name>:<version>` format. When that delimiter is present, split on `:` to recover `agentVersion`; if it is absent, keep filtering on the requests-scoped name fields and leave version blank.
> β οΈ **`gen_ai.agent.name` means different things on different tables:**
> - On `requests`: the **Foundry agent name** (user-visible) β e.g., `hosted-agent-022-001`
> - On `dependencies`: the **code-level class name** β e.g., `BingSearchAgent`
>
> **Always start from `requests`** when filtering by the Foundry agent name the user knows.
## Response ID Formats
| Agent Type | Prefix | Example |
|------------|--------|---------|
| Hosted agent (AgentServer) | `caresp_` | `caresp_d7ab624de92da637008Rhr4U4E1y9FSE...` |
| Prompt agent (Foundry Responses API) | `resp_` | `resp_4e2f8b016b5a0dad00697bd3c4c1b881...` |
| Azure OpenAI chat completions | `chatcmpl-` | `chatcmpl-abc123def456` |
When searching by response ID, use the appropriate prefix to narrow results. The `gen_ai.response.id` attribute appears on `dependencies` spans (for `chat` operations) and in `customEvents` (for evaluation results).
## Common Query Templates
### Overview β Conversations in last 24h
```kql
dependencies
| where timestamp > ago(24h)
| where isnotempty(customDimensions["gen_ai.operation.name"])
| summarize
spanCount = count(),
errorCount = countif(success == false),
avgDuration = avg(duration),
totalInputTokens = sum(toint(customDimensions["gen_ai.usage.input_tokens"])),
totalOutputTokens = sum(toint(customDimensions["gen_ai.usage.output_tokens"]))
by bin(timestamp, 1h)
| order by timestamp desc
```
### Error Rate by Operation
```kql
dependencies
| where timestamp > ago(24h)
| where isnotempty(customDimensions["gen_ai.operation.name"])
| summarize
total = count(),
errors = countif(success == false),
errorRate = round(100.0 * countif(success == false) / count(), 1)
by operation = tostring(customDimensions["gen_ai.operation.name"])
| order by errorRate desc
```
### Token Usage by Model
```kql
dependencies
| where timestamp > ago(24h)
| where customDimensions["gen_ai.operation.name"] == "chat"
| summarize
calls = count(),
totalInput = sum(toint(customDimensions["gen_ai.usage.input_tokens"])),
totalOutput = sum(toint(customDimensions["gen_ai.usage.output_tokens"])),
avgInput = avg(todouble(customDimensions["gen_ai.usage.input_tokens"])),
avgOutput = avg(todouble(customDimensions["gen_ai.usage.output_tokens"]))
by model = tostring(customDimensions["gen_ai.request.model"])
| order by totalInput desc
```
### Tool Call Details
```kql
dependencies
| where operation_Id == "<operation_id>"
| where customDimensions["gen_ai.operation.name"] == "execute_tool"
| project timestamp, duration, success,
toolName = tostring(customDimensions["gen_ai.tool.name"]),
toolType = tostring(customDimensions["gen_ai.tool.type"]),
toolCallId = tostring(customDimensions["gen_ai.tool.call.id"]),
toolArgs = tostring(customDimensions["gen_ai.tool.call.arguments"]),
toolResult = tostring(customDimensions["gen_ai.tool.call.result"])
| order by timestamp asc
```
Key tool attributes:
| Attribute | Description | Example |
|-----------|-------------|---------|
| `gen_ai.tool.name` | Tool function name | `remote_functions.bing_grounding`, `python` |
| `gen_ai.tool.type` | Tool type | `extension`, `function` |
| `gen_ai.tool.call.id` | Unique call ID | `call_db64aa6a004a...` |
| `gen_ai.tool.call.arguments` | JSON arguments passed | `{"query": "latest AI news"}` |
| `gen_ai.tool.call.result` | Tool output (may be truncated) | `<<ImageDisplayed>>` |
### Evaluation Results by Conversation
```kql
customEvents
| where timestamp > ago(24h)
| where name == "gen_ai.evaluation.result"
| extend
evalName = tostring(customDimensions["gen_ai.evaluation.name"]),
score = todouble(customDimensions["gen_ai.evaluation.score.value"]),
label = tostring(customDimensions["gen_ai.evaluation.score.label"]),
conversationId = tostring(customDimensions["gen_ai.conversation.id"])
| summarize
evalCount = count(),
avgScore = avg(score),
failCount = countif(label == "fail" or label == "not_relevant" or label == "incorrect"),
evaluators = make_set(evalName)
by conversationId
| order by failCount desc
```
> For detailed eval queries by response ID or conversation ID, see [Eval Correlation](eval-correlation.md).
## OTel Reference Links
- [GenAI Spans](https://opentelemetry.io/docs/specs/semconv/gen-ai/gen-ai-spans/)
- [GenAI Agent Spans](https://opentelemetry.io/docs/specs/semconv/gen-ai/gen-ai-agent-spans/)
- [GenAI Events](https://opentelemetry.io/docs/specs/semconv/gen-ai/gen-ai-events/)
- [GenAI Metrics](https://opentelemetry.io/docs/specs/semconv/gen-ai/gen-ai-metrics/)
search-traces.md 6.7 KB
# Search Traces β Conversation-Level Search
Search agent traces at the conversation level. Returns summaries grouped by conversation or operation, not individual spans.
## Prerequisites
- App Insights resource resolved (see [trace.md](../trace.md) Before Starting)
- Selected agent root, metadata file, and environment confirmed from `.foundry/agent-metadata*.yaml`
- Time range confirmed with user (default: last 24 hours)
## Search by Conversation ID
Keep the selected environment visible in the summary, and add the selected agent name or environment tag filters when the telemetry emits them.
```kql
dependencies
| where timestamp > ago(24h)
| where customDimensions["gen_ai.conversation.id"] == "<conversation_id>"
| project timestamp, name, duration, resultCode, success,
operation = tostring(customDimensions["gen_ai.operation.name"]),
model = tostring(customDimensions["gen_ai.request.model"]),
inputTokens = toint(customDimensions["gen_ai.usage.input_tokens"]),
outputTokens = toint(customDimensions["gen_ai.usage.output_tokens"]),
operation_Id, id, operation_ParentId
| order by timestamp asc
```
## Search by Response ID
Auto-detect the response ID format to determine agent type:
- `caresp_...` β Hosted agent (AgentServer)
- `resp_...` β Prompt agent (Foundry Responses API)
- `chatcmpl-...` β Azure OpenAI chat completions
```kql
dependencies
| where timestamp > ago(24h)
| where customDimensions["gen_ai.response.id"] == "<response_id>"
| project timestamp, name, duration, resultCode, success,
operation = tostring(customDimensions["gen_ai.operation.name"]),
model = tostring(customDimensions["gen_ai.request.model"]),
inputTokens = toint(customDimensions["gen_ai.usage.input_tokens"]),
outputTokens = toint(customDimensions["gen_ai.usage.output_tokens"]),
operation_Id, id, operation_ParentId
```
Then drill into the full conversation:
> β οΈ **STOP β read [Conversation Detail](conversation-detail.md) before writing your own drill-down query.** It contains the correct span tree reconstruction logic, event/exception queries, and eval correlation steps.
Quick drill-down using the `operation_Id` from above:
```kql
dependencies
| where operation_Id == "<operation_id_from_above>"
| project timestamp, name, duration, resultCode, success,
spanId = id, parentSpanId = operation_ParentId,
operation = tostring(customDimensions["gen_ai.operation.name"]),
model = tostring(customDimensions["gen_ai.request.model"]),
inputTokens = toint(customDimensions["gen_ai.usage.input_tokens"]),
outputTokens = toint(customDimensions["gen_ai.usage.output_tokens"]),
responseId = tostring(customDimensions["gen_ai.response.id"]),
errorType = tostring(customDimensions["error.type"]),
toolName = tostring(customDimensions["gen_ai.tool.name"])
| order by timestamp asc
```
Also check for eval results: see [Eval Correlation](eval-correlation.md).
## Search by Agent Name
> **Note:** For hosted agents, `gen_ai.agent.name` in `dependencies` refers to *sub-agents* (e.g., `BingSearchAgent`), not the top-level hosted agent. See "Search by Hosted Agent Name" below.
> π‘ **Hosted-agent versioning:** If you need the deployed version, use the hosted-agent pattern below and parse `gen_ai.agent.id` when it is emitted in `<agent-name>:<version>` format.
```kql
dependencies
| where timestamp > ago(24h)
| where customDimensions["gen_ai.agent.name"] == "<agent_name>"
| summarize
startTime = min(timestamp),
endTime = max(timestamp),
totalDuration = max(timestamp) - min(timestamp),
spanCount = count(),
errorCount = countif(success == false),
totalInputTokens = sum(toint(customDimensions["gen_ai.usage.input_tokens"])),
totalOutputTokens = sum(toint(customDimensions["gen_ai.usage.output_tokens"]))
by conversationId = tostring(customDimensions["gen_ai.conversation.id"]),
operation_Id
| order by startTime desc
| take 50
```
## Search by Hosted Agent Name
For hosted agents, the Foundry agent name (e.g., `hosted-agent-022-001`) appears on `requests` and `traces` β NOT on `dependencies`. Use `requests` as the preferred entry point, materialize the matching request rows, then join downstream spans on `operation_Id`:
```kql
let agentRequests = materialize(
requests
| where timestamp > ago(24h)
| extend
foundryAgentName = coalesce(
tostring(customDimensions["gen_ai.agent.name"]),
tostring(customDimensions["azure.ai.agentserver.agent_name"])
),
agentId = tostring(customDimensions["gen_ai.agent.id"]),
agentNameFromId = tostring(split(agentId, ":")[0]),
agentVersion = iff(agentId contains ":", tostring(split(agentId, ":")[1]), ""),
conversationId = coalesce(
tostring(customDimensions["gen_ai.conversation.id"]),
tostring(customDimensions["azure.ai.agentserver.conversation_id"]),
operation_Id
)
| where foundryAgentName == "<agent_name>"
or agentNameFromId == "<agent_name>"
| project operation_Id, conversationId, agentVersion
);
dependencies
| where timestamp > ago(24h)
| where isnotempty(customDimensions["gen_ai.operation.name"])
| join kind=inner agentRequests on operation_Id
| summarize
startTime = min(timestamp),
endTime = max(timestamp),
spanCount = count(),
errorCount = countif(success == false),
totalInputTokens = sum(toint(customDimensions["gen_ai.usage.input_tokens"])),
totalOutputTokens = sum(toint(customDimensions["gen_ai.usage.output_tokens"]))
by conversationId, operation_Id, agentVersion
| order by startTime desc
| take 50
```
If `gen_ai.agent.id` does not contain `:`, continue using the requests-scoped name fields for filtering and treat `agentVersion` as optional enrichment rather than a required key.
## Conversation Summary Table
Present results in this format:
| Conversation ID | Agent Version | Start Time | Duration | Spans | Errors | Input Tokens | Output Tokens |
|----------------|---------------|------------|----------|-------|--------|-------------|---------------|
| conv_abc123 | 3 | 2025-01-15 10:30 | 4.2s | 12 | 0 | 850 | 320 |
| conv_def456 | 4 | 2025-01-15 10:25 | 8.7s | 18 | 2 | 1200 | 450 |
Highlight rows with errors in the summary. Offer to drill into any conversation via [Conversation Detail](conversation-detail.md).
## Free-Text Search
When the user provides a general search term (e.g., agent name, error message):
```kql
union dependencies, requests, exceptions, traces
| where timestamp > ago(24h)
| where * contains "<search_term>"
| summarize count() by operation_Id
| order by count_ desc
| take 20
```
## After Successful Query
> π **Reminder:** If this is the first trace query in this session, ensure App Insights connection info was persisted to the selected metadata file for the selected environment (see [trace.md β Before Starting](../trace.md#before-starting--resolve-app-insights-connection)).
tracing-insights-api.md 4.8 KB
# Tracing Insights API
Automatically detect quality regressions and anomalies in agent traces using changepoint detection on evaluation scores stored in App Insights.
## When to Use
Use this instead of manual KQL queries when you want **automated anomaly detection** across evaluation dimensions (task adherence, intent resolution, fluency, latency, token usage). The API finds statistical changepoints in score distributions β no manual threshold tuning needed.
**Prerequisites:**
- App Insights connected to the Foundry project (with `gen_ai.evaluation.result` custom events)
- Evaluation data from portal playground sessions or batch evals (raw traces alone are not enough)
## Endpoint
```
POST https://{region}.api.azureml.ms/notification/v1-beta2/subscriptions/{sub}/resourceGroups/{rg}/providers/microsoft.insights/components/{component}/:insights
```
The API is region-agnostic β any regional endpoint can serve requests for any project. For lowest latency, use the same region as the Foundry project (e.g., `eastus2`, `westus2`, `westcentralus`). If the project region is unknown, use `eastus2` as the default.
**Query parameters:**
| Parameter | Required | Description |
|--------------------|----------|------------------------------------------------------------------------|
| `startDateTimeUtc` | Yes | ISO 8601 start of analysis window |
| `endDateTimeUtc` | Yes | ISO 8601 end of analysis window |
| `agent` | Yes | Agent name (URL-encoded) |
| `projectId` | Yes | ARM resource ID of the Foundry project (URL-encoded β contains slashes)|
| `top` | No | Max insights to return (default 50) |
**Auth:** `az account get-access-token --resource https://ai.azure.com`
**Body:** Must send `{}` (empty JSON object) β POST with no body returns 400.
## Example
```powershell
$token = az account get-access-token --resource https://ai.azure.com --query accessToken -o tsv
$encodedAgent = [uri]::EscapeDataString("my-agent")
$encodedProjectId = [uri]::EscapeDataString("/subscriptions/{sub}/resourceGroups/{rg}/providers/Microsoft.CognitiveServices/accounts/{account}/projects/{project}")
$uri = "https://{region}.api.azureml.ms/notification/v1-beta2/subscriptions/{sub}/resourceGroups/{rg}/providers/microsoft.insights/components/{component}/:insights?startDateTimeUtc=2025-01-01T00:00:00Z&endDateTimeUtc=2025-01-18T00:00:00Z&agent=$encodedAgent&projectId=$encodedProjectId&top=50"
$response = Invoke-RestMethod -Uri $uri -Method POST -Headers @{
"Authorization" = "Bearer $token"
"Content-Type" = "application/json"
} -Body "{}"
```
## Response Structure (v1-beta2)
Response is grouped by agent version. Each insight includes `relatedSpans` with `operationId` (App Insights trace ID) for querying full trace content.
```json
{
"agents": [{
"agent": "my-agent:1",
"insights": [{
"id": "anomaly-token-shift-<hash>",
"type": "Token",
"severity": "Critical",
"message": "Token usage increased by 137%",
"agentVersion": "1",
"metadata": { "meanBefore": 2041, "meanAfter": 4831, "confidence": 0.91 },
"relatedSpans": {
"totalCount": 13,
"spans": [
{ "responseId": "resp_...", "operationId": "<trace-id>", "evaluationRunId": null }
]
}
}],
"insightCount": 3
}],
"totalCount": 3, "criticalCount": 1, "warningCount": 1, "improvementCount": 1
}
```
## Querying Traces from relatedSpans
Use `operationId` from `relatedSpans` to fetch full trace content from App Insights:
```kql
dependencies
| where operation_Id == "<operationId>"
| where customDimensions has "invoke_agent"
| project input = customDimensions["gen_ai.input.messages"],
output = customDimensions["gen_ai.output.messages"],
tokens = toint(customDimensions["gen_ai.usage.output_tokens"])
```
This returns the user query and agent response for the specific trace flagged by the insight.
## How Changepoint Detection Works
The API finds **statistical inflection points within the queried time window**. `meanBefore`/`meanAfter` represent averages on either side of the detected shift β not comparisons to a historical baseline.
- 10+ data points give better signal for changepoint detection
- `confidence` close to 1.0 = statistically significant shift
## Next Steps
After receiving insights with `Warning` or `Critical` severity:
1. Use `relatedSpans.operationId` values to query full trace content from App Insights (see KQL above)
2. Present the insights summary to the user with severity, type, evaluator name, and shift magnitude
3. Offer to drill into specific traces for detailed analysis using the [trace analysis skill](../trace.md)
trace.md 6.2 KB
# Foundry Agent Trace Analysis
Analyze production traces for Foundry agents using Application Insights and GenAI OpenTelemetry semantic conventions. This skill provides structured KQL-powered workflows for a selected agent root and environment: searching conversations, diagnosing failures, and identifying latency bottlenecks.
## When to Use This Skill
USE FOR: analyze agent traces, search agent conversations, find failing traces, slow traces, latency analysis, trace search, conversation history, agent errors in production, debug agent responses, App Insights traces, GenAI telemetry, trace correlation, span tree, production trace analysis, evaluation results, evaluation scores, eval run results, find by response ID, get agent trace by conversation ID, agent evaluation scores from App Insights.
> **USE THIS SKILL INSTEAD OF** `azure-monitor` or `azure-applicationinsights` when querying Foundry agent traces, evaluations, or GenAI telemetry. This skill has correct GenAI OTel attribute mappings and tested KQL templates that those general tools lack.
> β οΈ **DO NOT manually write KQL queries** for GenAI trace analysis **without reading this skill first.** This skill provides tested query templates with correct GenAI OTel attribute mappings, proper span correlation logic, environment-aware scoping, and conversation-level aggregation patterns.
## Quick Reference
| Property | Value |
|----------|-------|
| Data source | Application Insights (App Insights) |
| Query language | KQL (Kusto Query Language) |
| Related skills | `troubleshoot` (hosted-agent logs), `eval-datasets` (trace harvesting) |
| Preferred query tool | `monitor_resource_log_query` (Azure MCP) - use for App Insights KQL queries |
| OTel conventions | [GenAI Spans](https://opentelemetry.io/docs/specs/semconv/gen-ai/gen-ai-spans/), [Agent Spans](https://opentelemetry.io/docs/specs/semconv/gen-ai/gen-ai-agent-spans/) |
| Local metadata | selected `.foundry/agent-metadata*.yaml` file |
## Entry Points
| User Intent | Start At |
|-------------|----------|
| "Search agent conversations" / "Find traces" | [Search Traces](references/search-traces.md) |
| "Tell me about response ID X" / "Look up response ID" | [Search Traces - Search by Response ID](references/search-traces.md#search-by-response-id) |
| "Why is my agent failing?" / "Find errors" | [Analyze Failures](references/analyze-failures.md) |
| "My agent is slow" / "Latency analysis" | [Analyze Latency](references/analyze-latency.md) |
| "Show me this conversation" / "Trace detail" | [Conversation Detail](references/conversation-detail.md) |
| "Find eval results for response ID" / "eval scores from traces" | [Eval Correlation](references/eval-correlation.md) |
| "What KQL do I need?" | [KQL Templates](references/kql-templates.md) |
| "Auto-detect agent issues" / "Get automated insights" / "What's wrong with my agent?" | [Tracing Insights API](references/tracing-insights-api.md) |
## Before Starting β Resolve App Insights Connection
1. Resolve the target agent root, selected metadata file, and environment from `.foundry/agent-metadata*.yaml`.
2. Check `environments.<env>.observability.applicationInsightsConnectionString` or `environments.<env>.observability.applicationInsightsResourceId` in the selected metadata file.
3. If observability settings are missing, use `project_connection_list` to discover App Insights linked to the Foundry project, then persist the chosen resource back to `environments.<env>.observability` in the selected metadata file before querying.
4. Confirm the selected App Insights resource and environment with the user before querying.
5. Use **`monitor_resource_log_query`** (Azure MCP tool) to execute KQL queries against the App Insights resource. This is preferred over delegating to the `azure-kusto` skill. Pass the App Insights resource ID and the KQL query directly.
| Metadata field | Purpose | Example |
|----------------|---------|---------|
| `environments.<env>.observability.applicationInsightsConnectionString` | App Insights connection string | `InstrumentationKey=...;IngestionEndpoint=...` |
| `environments.<env>.observability.applicationInsightsResourceId` | ARM resource ID | `/subscriptions/.../Microsoft.Insights/components/...` |
> β οΈ **Always pass `subscription` explicitly** to Azure MCP tools like `monitor_resource_log_query` - they do not extract it from resource IDs.
## Behavioral Rules
1. **Always display the KQL query.** Before executing any KQL query, display it in a code block. Never run a query silently.
2. **Keep environment visible.** Include the selected environment and agent name in each search summary, and include the derived agent version when the query can recover it from telemetry.
3. **Start broad, then narrow.** Begin with conversation-level summaries, then drill into specific conversations or spans on user request.
4. **Use time ranges.** Always scope queries with a time range (default: last 24 hours). Ask the user for the range if not specified.
5. **Explain GenAI attributes.** When displaying results, translate OTel attribute names to human-readable labels (for example, `gen_ai.operation.name` -> "Operation").
6. **Link to conversation detail.** When showing search or failure results, offer to drill into any specific conversation.
7. **Scope to the selected environment.** App Insights may contain traces from multiple agents or environments. Filter with the selected environment's agent name first, then add an environment tag filter if the telemetry emits one.
8. **Resolve hosted-agent identity from `requests` first.** For hosted agents, prefer `requests`-scoped `gen_ai.agent.name` or `azure.ai.agentserver.agent_name` as the Foundry-facing filter. When `gen_ai.agent.id` is emitted in `<agent-name>:<version>` format, parse it to surface `agentVersion`, but do not treat `dependencies.gen_ai.agent.name` as the top-level hosted-agent name.
9. **Use `operation_Id` to fan out hosted-agent traces.** After isolating the hosted-agent `requests` rows, materialize their `operation_Id` values and join other telemetry tables on `operation_Id`. When conversation IDs are sparse, use `coalesce(gen_ai.conversation.id, azure.ai.agentserver.conversation_id, operation_Id)` so every row still rolls up to a stable conversation key.
troubleshoot.md 6.7 KB
# Foundry Agent Troubleshoot
Troubleshoot and debug Foundry agents by collecting hosted-agent session logs, discovering observability connections, and querying Application Insights telemetry.
## Quick Reference
| Property | Value |
|----------|-------|
| Agent types | Prompt (LLM-based), Hosted |
| MCP servers | `azure` |
| Key Foundry MCP tools | `agent_get` |
| Related skills | `trace` (telemetry analysis) |
| Preferred query tool | `monitor_resource_log_query` (Azure MCP) β preferred over `azure-kusto` for App Insights |
| CLI references | `az cognitiveservices account connection`, `az rest`, `curl` |
## When to Use This Skill
- Agent is not responding or returning errors
- Hosted agent version is not becoming active
- Need to view hosted-agent session logs
- Diagnose latency or timeout issues
- Query Application Insights for agent traces and exceptions
- Investigate agent runtime failures
## MCP Tools
| Tool | Description | Parameters |
|------|-------------|------------|
| `agent_get` | Get agent details to determine type and inspect agent/version status | `projectEndpoint` (required), `agentName` (optional) |
## Workflow
### Step 1: Collect Agent Information
Use the project endpoint and agent name from the project context (see Common: Project Context Resolution). Ask the user only for values not already resolved:
- **Project endpoint** β AI Foundry project endpoint URL
- **Agent name** β Name of the agent to troubleshoot
### Step 2: Determine Agent Type
Use `agent_get` with `projectEndpoint` and `agentName` to retrieve the agent definition. Check the `kind` field:
- `"hosted"` β Proceed to Step 3
- `"prompt"` β Skip to Step 4 (Discover Observability Connections)
### Step 3: Retrieve Logs (Hosted Agents Only)
Hosted-agent logs are scoped to individual **sessions** (sandbox instances).
1. **Check agent version status** β Use `agent_get` to verify the agent version status is `active`. If it is not active, the agent may still be provisioning or may have failed to become active.
2. **List sessions** β Hosted-agent logs require a `sessionId`. If the user does not have one, list available sessions:
```bash
az rest --method GET \
--url "<projectEndpoint>/agents/<agentName>/sessions?api-version=2025-11-15-preview" \
--headers "Foundry-Features=HostedAgents=V1Preview" \
--resource "https://ai.azure.com"
```
3. **Retrieve session logs** β The log stream endpoint uses Server-Sent Events (SSE). Use `curl` with a timeout:
```bash
TOKEN=$(az account get-access-token --resource "https://ai.azure.com" --query accessToken -o tsv)
curl -s --max-time 15 \
-H "Authorization: Bearer $TOKEN" \
-H "Accept: text/event-stream" \
-H "Foundry-Features: HostedAgents=V1Preview" \
"<projectEndpoint>/agents/<agentName>/sessions/<sessionId>:logstream?api-version=2025-11-15-preview"
```
> β οΈ **404 is expected** if the session sandbox has not been created yet. Advise the user to send a message to the agent first to trigger sandbox creation, then retry.
4. **Interpret the logs** β Each SSE frame is `event: log\ndata: {...}\n\n`:
- **Preamble** (first event): JSON with `session_state`, `session_id`, `agent`, `version`, `last_accessed`
- **Log lines** (subsequent events): JSON with `stream` (`stdout`/`stderr`/`status`), `message`, and `timestamp`
- **Error events**: `event: error` frames indicate server-side errors within the session sandbox
Present the logs to the user and highlight any errors or warnings found.
### Step 4: Discover Observability Connections
List the project connections to find Application Insights or Azure Monitor resources using the Azure CLI command documented at:
[az cognitiveservices account connection](https://learn.microsoft.com/en-us/cli/azure/cognitiveservices/account/connection?view=azure-cli-latest)
Refer to the documentation above for the exact command syntax and parameters. Look for connections of type `ApplicationInsights` or `AzureMonitor` in the output.
If no observability connection is found, inform the user and suggest setting up Application Insights for the project. Ask if they want to proceed without telemetry data.
### Step 5: Query Application Insights Telemetry
Use **`monitor_resource_log_query`** (Azure MCP tool) to run KQL queries against the Application Insights resource discovered in Step 4. This is preferred over delegating to the `azure-kusto` skill. Pass the App Insights resource ID and the KQL query directly.
> β οΈ **Always pass `subscription` explicitly** to Azure MCP tools like `monitor_resource_log_query` β they don't extract it from resource IDs.
Use `* contains "<response_id>"` or `* contains "<agent_name>"` filters to narrow down results to the specific agent instance.
### Step 6: Summarize Findings
Present a summary to the user including:
- **Agent type and status** β hosted or prompt; hosted agent version status when relevant
- **Log errors** β key errors from hosted-agent session logs
- **Telemetry insights** β exceptions, failed requests, latency trends
- **Recommended actions** β specific steps to resolve identified issues
## Error Handling
| Error | Cause | Resolution |
|-------|-------|------------|
| Agent not found | Invalid agent name or project endpoint | Use `agent_get` to list available agents and verify name |
| Hosted agent not active | Hosted agent is still provisioning or failed | Check that the ACR image was pushed correctly and agent identity permissions are assigned; wait and re-check status |
| Session logs 404 | Session sandbox has not been created yet | The sandbox is created on first invocation β send a message to the agent to trigger sandbox creation, then retry |
| SSE error event | Server-side error within the session sandbox | Check the error event `data` field for details |
| No session ID | User does not know which session to troubleshoot | List sessions via REST API (see Step 3) |
| No observability connection | Application Insights not configured for the project | Suggest configuring Application Insights for the Foundry project |
| Kusto query failed | Invalid cluster/database or insufficient permissions | Verify Application Insights resource details and reader permissions |
| No telemetry data | Agent not instrumented or too recent | Check if Application Insights SDK is configured; data may take a few minutes to appear |
## Additional Resources
- [Foundry Hosted Agents](https://learn.microsoft.com/azure/ai-foundry/agents/concepts/hosted-agents?view=foundry)
- [Account Connection CLI Reference](https://learn.microsoft.com/en-us/cli/azure/cognitiveservices/account/connection?view=azure-cli-latest)
- [KQL Quick Reference](https://learn.microsoft.com/azure/data-explorer/kusto/query/kql-quick-reference)
- [Foundry Samples](https://github.com/microsoft-foundry/foundry-samples)
SKILL.md 6.4 KB
---
name: deploy-model
description: "Unified Azure OpenAI model deployment skill with intelligent intent-based routing. Handles quick preset deployments, fully customized deployments (version/SKU/capacity/RAI policy), and capacity discovery across regions and projects. USE FOR: deploy model, deploy gpt, create deployment, model deployment, deploy openai model, set up model, provision model, find capacity, check model availability, where can I deploy, best region for model, capacity analysis. DO NOT USE FOR: listing existing deployments (use foundry_models_deployments_list MCP tool), deleting deployments, agent creation (use agent/create), project creation (use project/create)."
license: MIT
metadata:
author: Microsoft
version: "1.0.0"
---
# Deploy Model
Unified entry point for all Azure OpenAI model deployment workflows. Analyzes user intent and routes to the appropriate deployment mode.
## Quick Reference
| Mode | When to Use | Sub-Skill |
|------|-------------|-----------|
| **Preset** | Quick deployment, no customization needed | [preset/SKILL.md](preset/SKILL.md) |
| **Customize** | Full control: version, SKU, capacity, RAI policy | [customize/SKILL.md](customize/SKILL.md) |
| **Capacity Discovery** | Find where you can deploy with specific capacity | [capacity/SKILL.md](capacity/SKILL.md) |
## Intent Detection
Analyze the user's prompt and route to the correct mode:
```
User Prompt
β
ββ Simple deployment (no modifiers)
β "deploy gpt-4o", "set up a model"
β ββ> PRESET mode
β
ββ Customization keywords present
β "custom settings", "choose version", "select SKU",
β "set capacity to X", "configure content filter",
β "PTU deployment", "with specific quota"
β ββ> CUSTOMIZE mode
β
ββ Capacity/availability query
β "find where I can deploy", "check capacity",
β "which region has X capacity", "best region for 10K TPM",
β "where is this model available"
β ββ> CAPACITY DISCOVERY mode
β
ββ Ambiguous (has capacity target + deploy intent)
"deploy gpt-4o with 10K capacity to best region"
ββ> CAPACITY DISCOVERY first β then PRESET or CUSTOMIZE
```
### Routing Rules
| Signal in Prompt | Route To | Reason |
|------------------|----------|--------|
| Just model name, no options | **Preset** | User wants quick deployment |
| "custom", "configure", "choose", "select" | **Customize** | User wants control |
| "find", "check", "where", "which region", "available" | **Capacity** | User wants discovery |
| Specific capacity number + "best region" | **Capacity β Preset** | Discover then deploy quickly |
| Specific capacity number + "custom" keywords | **Capacity β Customize** | Discover then deploy with options |
| "PTU", "provisioned throughput" | **Customize** | PTU requires SKU selection |
| "optimal region", "best region" (no capacity target) | **Preset** | Region optimization is preset's specialty |
### Multi-Mode Chaining
Some prompts require two modes in sequence:
**Pattern: Capacity β Deploy**
When a user specifies a capacity requirement AND wants deployment:
1. Run **Capacity Discovery** to find regions/projects with sufficient quota
2. Present findings to user
3. Ask: "Would you like to deploy with **quick defaults** or **customize settings**?"
4. Route to **Preset** or **Customize** based on answer
> π‘ **Tip:** If unsure which mode the user wants, default to **Preset** (quick deployment). Users who want customization will typically use explicit keywords like "custom", "configure", or "with specific settings".
## Project Selection (All Modes)
Before any deployment, resolve which project to deploy to. This applies to **all** modes (preset, customize, and after capacity discovery).
### Resolution Order
1. **Check `PROJECT_RESOURCE_ID` env var** β if set, use it as the default
2. **Check user prompt** β if user named a specific project or region, use that
3. **If neither** β query the user's projects and suggest the current one
### Confirmation Step (Required)
**Always confirm the target before deploying.** Show the user what will be used and give them a chance to change it:
```
Deploying to:
Project: <project-name>
Region: <region>
Resource: <resource-group>
Is this correct? Or choose a different project:
1. β
Yes, deploy here (default)
2. π Show me other projects in this region
3. π Choose a different region
```
If user picks option 2, show top 5 projects in that region:
```
Projects in <region>:
1. project-alpha (rg-alpha)
2. project-beta (rg-beta)
3. project-gamma (rg-gamma)
...
```
> β οΈ **Never deploy without showing the user which project will be used.** This prevents accidental deployments to the wrong resource.
## Pre-Deployment Validation (All Modes)
Before presenting any deployment options (SKU, capacity), always validate both of these:
1. **Model supports the SKU** β query the model catalog to confirm the selected model+version supports the target SKU:
```bash
az cognitiveservices model list --location <region> --subscription <sub-id> -o json
```
Filter for the model, extract `.model.skus[].name` to get supported SKUs.
2. **Subscription has available quota** β check that the user's subscription has unallocated quota for the SKU+model combination:
```bash
az cognitiveservices usage list --location <region> --subscription <sub-id> -o json
```
Match by usage name pattern `OpenAI.<SKU>.<model-name>` (e.g., `OpenAI.GlobalStandard.gpt-4o`). Compute `available = limit - currentValue`.
> β οΈ **Warning:** Only present options that pass both checks. Do NOT show hardcoded SKU lists β always query dynamically. SKUs with 0 available quota should be shown as β informational items, not selectable options.
> π‘ **Quota management:** For quota increase requests, usage monitoring, and troubleshooting quota errors, defer to the [quota skill](../../quota/quota.md) instead of duplicating that guidance inline.
## Prerequisites
All deployment modes require:
- Azure CLI installed and authenticated (`az login`)
- Active Azure subscription with deployment permissions
- Azure AI Foundry project resource ID (or agent will help discover it via `PROJECT_RESOURCE_ID` env var)
## Sub-Skills
- **[preset/SKILL.md](preset/SKILL.md)** β Quick deployment to optimal region with sensible defaults
- **[customize/SKILL.md](customize/SKILL.md)** β Interactive guided flow with full configuration control
- **[capacity/SKILL.md](capacity/SKILL.md)** β Discover available capacity across regions and projects
TEST_PROMPTS.md 3.2 KB
# Deploy Model β Test Prompts
Test prompts for the unified `deploy-model` skill with router, preset, customize, and capacity sub-skills.
## Preset Mode (Quick Deploy)
| # | Prompt | Expected |
|---|--------|----------|
| 1 | Deploy gpt-4o | Preset β confirm project, deploy with defaults |
| 2 | Set up o3-mini for me | Preset β pick latest version automatically |
| 3 | I need a text-embedding-ada-002 deployment | Preset β non-chat model |
| 4 | Deploy gpt-4o to the best region | Preset β region scan, no capacity target |
## Customize Mode (Guided Flow)
| # | Prompt | Expected |
|---|--------|----------|
| 5 | Deploy gpt-4o with custom settings | Customize β walk through version β SKU β capacity β RAI |
| 6 | I want to choose the version and SKU for my o3-mini deployment | Customize β explicit keywords |
| 7 | Set up a PTU deployment for gpt-4o | Customize β PTU requires SKU selection |
| 8 | Deploy gpt-4o with a specific content filter | Customize β RAI policy flow |
## Capacity Discovery
| # | Prompt | Expected |
|---|--------|----------|
| 9 | Where can I deploy gpt-4o? | Capacity β show regions, no deploy |
| 10 | Which regions have o3-mini available? | Capacity β run script, show table |
| 11 | Check if I have enough quota for gpt-4o with 500K TPM | Capacity β high target, some regions may not qualify |
## Chained (Capacity β Deploy)
| # | Prompt | Expected |
|---|--------|----------|
| 12 | Find me the best region and project to deploy gpt-4o with 10K capacity | Capacity β Preset |
| 13 | Deploy o3-mini with 200K TPM to whatever region has it | Capacity β Preset |
| 14 | I want to deploy gpt-4o with 50K capacity and choose my own settings | Capacity β Customize |
## Negative / Edge Cases
| # | Prompt | Expected |
|---|--------|----------|
| 15 | Deploy unicorn-model-9000 | Fail gracefully β model doesn't exist |
| 16 | Deploy gpt-4o with 999999K TPM | Capacity shows no region qualifies |
| 17 | Deploy gpt-4o (with az login expired) | Auth error caught early |
| 18 | Delete my gpt-4o deployment | Should NOT trigger deploy-model |
| 19 | List my current deployments | Should NOT trigger deploy-model |
| 20 | Deploy gpt-4o to mars-region-1 | Fail gracefully β invalid region |
## Project Selection
| # | Prompt | Expected |
|---|--------|----------|
| 21 | Deploy gpt-4o (with PROJECT_RESOURCE_ID set) | Show current project, confirm before deploying |
| 22 | Deploy gpt-4o (no PROJECT_RESOURCE_ID) | Ask user to pick a project |
| 23 | Deploy gpt-4o to project my-special-project | Use named project directly |
## Ambiguous / Routing Stress
| # | Prompt | Expected |
|---|--------|----------|
| 24 | Help me with model deployment | Preset (default) β vague, no keywords |
| 25 | I need gpt-4o deployed fast with good capacity | Preset β "fast" + vague capacity |
| 26 | Can you configure a deployment? | Customize β "configure" keyword, should ask which model |
| 27 | What's the best way to deploy gpt-4o with 100K? | Capacity β Preset |
## Automated Test Results (2026-02-09)
All 18 tests passed. Deployments created during testing were cleaned up.
| Category | Tests | Result |
|----------|-------|--------|
| Preset | 3/3 | β
|
| Customize | 2/2 | β
|
| Capacity | 3/3 | β
|
| Chained | 1/1 | β
|
| Negative | 5/5 | β
|
| Ambiguous | 4/4 | β
|
SKILL.md 6.8 KB
---
name: capacity
description: "Discovers available Azure OpenAI model capacity across regions and projects. Analyzes quota limits, compares availability, and recommends optimal deployment locations based on capacity requirements. USE FOR: find capacity, check quota, where can I deploy, capacity discovery, best region for capacity, multi-project capacity search, quota analysis, model availability, region comparison, check TPM availability. DO NOT USE FOR: actual deployment (hand off to preset or customize after discovery), quota increase requests (direct user to Azure Portal), listing existing deployments."
license: MIT
metadata:
author: Microsoft
version: "1.0.0"
---
# Capacity Discovery
Finds available Azure OpenAI model capacity across all accessible regions and projects. Recommends the best deployment location based on capacity requirements.
## Quick Reference
| Property | Description |
|----------|-------------|
| **Purpose** | Find where you can deploy a model with sufficient capacity |
| **Scope** | All regions and projects the user has access to |
| **Output** | Ranked table of regions/projects with available capacity |
| **Action** | Read-only analysis β does NOT deploy. Hands off to preset or customize |
| **Authentication** | Azure CLI (`az login`) |
## When to Use This Skill
- β
User asks "where can I deploy gpt-4o?"
- β
User specifies a capacity target: "find a region with 10K TPM for gpt-4o"
- β
User wants to compare availability: "which regions have gpt-4o available?"
- β
User got a quota error and needs to find an alternative location
- β
User asks "best region and project for deploying model X"
**After discovery β hand off to [preset](../preset/SKILL.md) or [customize](../customize/SKILL.md) for actual deployment.**
## Scripts
Pre-built scripts handle the complex REST API calls and data processing. Use these instead of constructing commands manually.
| Script | Purpose | Usage |
|--------|---------|-------|
| `scripts/discover_and_rank.ps1` | Full discovery: capacity + projects + ranking | Primary script for capacity discovery |
| `scripts/discover_and_rank.sh` | Same as above (bash) | Primary script for capacity discovery |
| `scripts/query_capacity.ps1` | Raw capacity query (no project matching) | Quick capacity check or version listing |
| `scripts/query_capacity.sh` | Same as above (bash) | Quick capacity check or version listing |
## Workflow
### Phase 1: Validate Prerequisites
```bash
az account show --query "{Subscription:name, SubscriptionId:id}" --output table
```
### Phase 2: Identify Model and Version
Extract model name from user prompt. If version is unknown, query available versions:
```powershell
.\scripts\query_capacity.ps1 -ModelName <model-name>
```
```bash
./scripts/query_capacity.sh <model-name>
```
This lists available versions. Use the latest version unless user specifies otherwise.
### Phase 3: Run Discovery
Run the full discovery script with model name, version, and minimum capacity target:
```powershell
.\scripts\discover_and_rank.ps1 -ModelName <model-name> -ModelVersion <version> -MinCapacity <target>
```
```bash
./scripts/discover_and_rank.sh <model-name> <version> <min-capacity>
```
> π‘ The script automatically queries capacity across ALL regions, cross-references with the user's existing projects, and outputs a ranked table sorted by: meets target β project count β available capacity.
### Phase 3.5: Validate Subscription Quota
After discovery identifies candidate regions, validate that the user's subscription actually has available quota in each region. Model capacity (from Phase 3) shows what the platform can support, but subscription quota limits what this specific user can deploy.
```powershell
# For each candidate region from discovery results:
$usageData = az cognitiveservices usage list --location <region> --subscription $SUBSCRIPTION_ID -o json 2>$null | ConvertFrom-Json
# Check quota for each SKU the model supports
# Quota names follow pattern: OpenAI.<SKU>.<model-name>
$usageEntry = $usageData | Where-Object { $_.name.value -eq "OpenAI.<SKU>.<model-name>" }
if ($usageEntry) {
$quotaAvailable = $usageEntry.limit - $usageEntry.currentValue
} else {
$quotaAvailable = 0 # No quota allocated
}
```
```bash
# For each candidate region from discovery results:
usage_json=$(az cognitiveservices usage list --location <region> --subscription "$SUBSCRIPTION_ID" -o json 2>/dev/null)
# Extract quota for specific SKU+model
quota_available=$(echo "$usage_json" | jq -r --arg name "OpenAI.<SKU>.<model-name>" \
'.[] | select(.name.value == $name) | .limit - .currentValue')
```
**Annotate discovery results:**
Add a "Quota Available" column to the ranked output from Phase 3:
| Region | Available Capacity | Meets Target | Projects | Quota Available |
|--------|-------------------|--------------|----------|-----------------|
| eastus2 | 120K TPM | β
| 3 | β
80K |
| westus3 | 90K TPM | β
| 1 | β 0 (at limit) |
| swedencentral | 100K TPM | β
| 0 | β
100K |
Regions/SKUs where `quotaAvailable = 0` should be marked with β in the results. If no region has available quota, hand off to the [quota skill](../../../quota/quota.md) for increase requests and troubleshooting.
### Phase 4: Present Results and Hand Off
After the script outputs the ranked table (now annotated with quota info), present it to the user and ask:
1. π **Quick deploy** to top recommendation with defaults β route to [preset](../preset/SKILL.md)
2. βοΈ **Custom deploy** with version/SKU/capacity/RAI selection β route to [customize](../customize/SKILL.md)
3. π **Check another model** or capacity target β re-run Phase 2
4. β Cancel
### Phase 5: Confirm Project Before Deploying
Before handing off to preset or customize, **always confirm the target project** with the user. See the [Project Selection](../SKILL.md#project-selection-all-modes) rules in the parent router.
If the discovery table shows a sample project for the chosen region, suggest it as the default. Otherwise, query projects in that region and let the user pick.
## Error Handling
| Error | Cause | Resolution |
|-------|-------|------------|
| "No capacity found" | Model not available or all at quota | Hand off to [quota skill](../../../quota/quota.md) for increase requests and troubleshooting |
| Script auth error | `az login` expired | Re-run `az login` |
| Empty version list | Model not in region catalog | Try a different region: `./scripts/query_capacity.sh <model> "" eastus` |
| "No projects found" | No AI Services resources | Guide to `project/create` skill or Azure Portal |
## Related Skills
- **[preset](../preset/SKILL.md)** β Quick deployment after capacity discovery
- **[customize](../customize/SKILL.md)** β Custom deployment after capacity discovery
- **[quota](../../../quota/quota.md)** β For quota viewing, increase requests, and troubleshooting quota errors, defer to this skill instead of duplicating guidance
discover_and_rank.ps1 4.6 KB
<#
.SYNOPSIS
Discovers available capacity for an Azure OpenAI model across all regions,
cross-references with existing projects and subscription quota, and outputs a ranked table.
.PARAMETER ModelName
The model name (e.g., "gpt-4o", "o3-mini")
.PARAMETER ModelVersion
The model version (e.g., "2025-01-31")
.PARAMETER MinCapacity
Minimum required capacity in K TPM units (default: 0, shows all)
.EXAMPLE
.\discover_and_rank.ps1 -ModelName o3-mini -ModelVersion 2025-01-31 -MinCapacity 200
#>
param(
[Parameter(Mandatory)][string]$ModelName,
[Parameter(Mandatory)][string]$ModelVersion,
[int]$MinCapacity = 0
)
$ErrorActionPreference = "Stop"
$subId = az account show --query id -o tsv
# Query model capacity across all regions
$capRaw = az rest --method GET `
--url "https://management.azure.com/subscriptions/$subId/providers/Microsoft.CognitiveServices/modelCapacities" `
--url-parameters api-version=2024-10-01 modelFormat=OpenAI modelName=$ModelName modelVersion=$ModelVersion `
2>$null | Out-String | ConvertFrom-Json
# Query all AI Foundry projects (AIProject kind)
$projRaw = az rest --method GET `
--url "https://management.azure.com/subscriptions/$subId/providers/Microsoft.CognitiveServices/accounts" `
--url-parameters api-version=2024-10-01 `
--query "value[?kind=='AIProject'].{Name:name, Location:location}" `
2>$null | Out-String | ConvertFrom-Json
# Build capacity map (GlobalStandard only, pick max per region)
$capMap = @{}
foreach ($item in $capRaw.value) {
$sku = $item.properties.skuName
$avail = [int]$item.properties.availableCapacity
$region = $item.location
if ($sku -eq "GlobalStandard" -and $avail -gt 0) {
if (-not $capMap[$region] -or $avail -gt $capMap[$region]) {
$capMap[$region] = $avail
}
}
}
# Build project map
$projMap = @{}
$projSample = @{}
foreach ($p in $projRaw) {
$loc = $p.Location
if (-not $projMap[$loc]) { $projMap[$loc] = 0 }
$projMap[$loc]++
if (-not $projSample[$loc]) { $projSample[$loc] = $p.Name }
}
# Check subscription quota per region
$quotaMap = @{}
$checkedRegions = @{}
foreach ($region in $capMap.Keys) {
if ($checkedRegions[$region]) { continue }
$checkedRegions[$region] = $true
try {
$usageData = az cognitiveservices usage list --location $region --subscription $subId -o json 2>$null | Out-String | ConvertFrom-Json
$usageEntry = $usageData | Where-Object { $_.name.value -eq "OpenAI.GlobalStandard.$ModelName" }
if ($usageEntry) {
$quotaMap[$region] = [int]$usageEntry.limit - [int]$usageEntry.currentValue
} else {
$quotaMap[$region] = 0
}
} catch {
$quotaMap[$region] = -1 # Unable to check
}
}
# Combine and rank
$results = foreach ($region in $capMap.Keys) {
$avail = $capMap[$region]
$meets = $avail -ge $MinCapacity
$quota = if ($quotaMap[$region]) { $quotaMap[$region] } else { 0 }
$quotaDisplay = if ($quota -eq -1) { "?" } elseif ($quota -gt 0) { "${quota}K" } else { "0" }
$quotaOk = $quota -gt 0 -or $quota -eq -1
[PSCustomObject]@{
Region = $region
AvailableTPM = "${avail}K"
AvailableRaw = $avail
MeetsTarget = if ($meets) { "YES" } else { "no" }
Projects = if ($projMap[$region]) { $projMap[$region] } else { 0 }
SampleProject = if ($projSample[$region]) { $projSample[$region] } else { "(none)" }
QuotaAvailable = $quotaDisplay
QuotaOk = $quotaOk
}
}
$results = $results | Sort-Object @{Expression={$_.MeetsTarget -eq "YES"}; Descending=$true},
@{Expression={$_.QuotaOk}; Descending=$true},
@{Expression={$_.Projects}; Descending=$true},
@{Expression={$_.AvailableRaw}; Descending=$true}
# Output summary
$total = ($results | Measure-Object).Count
$matching = ($results | Where-Object { $_.MeetsTarget -eq "YES" } | Measure-Object).Count
$withQuota = ($results | Where-Object { $_.MeetsTarget -eq "YES" -and $_.QuotaOk } | Measure-Object).Count
$withProjects = ($results | Where-Object { $_.MeetsTarget -eq "YES" -and $_.Projects -gt 0 } | Measure-Object).Count
Write-Host "Model: $ModelName v$ModelVersion | SKU: GlobalStandard | Min Capacity: ${MinCapacity}K TPM"
Write-Host "Regions with capacity: $total | Meets target: $matching | With quota: $withQuota | With projects: $withProjects"
Write-Host ""
$results | Select-Object Region, AvailableTPM, MeetsTarget, QuotaAvailable, Projects, SampleProject | Format-Table -AutoSize
discover_and_rank.sh 4.5 KB
#!/bin/bash
# discover_and_rank.sh
# Discovers available capacity for an Azure OpenAI model across all regions,
# cross-references with existing projects and subscription quota, and outputs a ranked table.
#
# Usage: ./discover_and_rank.sh <model-name> <model-version> [min-capacity]
# Example: ./discover_and_rank.sh o3-mini 2025-01-31 200
#
# Output: Ranked table of regions with capacity, quota, project counts, and match status
set -euo pipefail
MODEL_NAME="${1:?Usage: $0 <model-name> <model-version> [min-capacity]}"
MODEL_VERSION="${2:?Usage: $0 <model-name> <model-version> [min-capacity]}"
MIN_CAPACITY="${3:-0}"
SUB_ID=$(az account show --query id -o tsv)
# Query model capacity across all regions (GlobalStandard SKU)
CAPACITY_JSON=$(az rest --method GET \
--url "https://management.azure.com/subscriptions/${SUB_ID}/providers/Microsoft.CognitiveServices/modelCapacities" \
--url-parameters api-version=2024-10-01 modelFormat=OpenAI modelName="$MODEL_NAME" modelVersion="$MODEL_VERSION" \
2>/dev/null)
# Query all AI Services projects
PROJECTS_JSON=$(az rest --method GET \
--url "https://management.azure.com/subscriptions/${SUB_ID}/providers/Microsoft.CognitiveServices/accounts" \
--url-parameters api-version=2024-10-01 \
--query "value[?kind=='AIServices'].{name:name, location:location}" \
2>/dev/null)
# Get unique regions from capacity results for quota checking
REGIONS=$(echo "$CAPACITY_JSON" | jq -r '.value[] | select(.properties.skuName=="GlobalStandard" and .properties.availableCapacity > 0) | .location' | sort -u)
# Build quota map: check subscription quota per region
declare -A QUOTA_MAP
for region in $REGIONS; do
usage_json=$(az cognitiveservices usage list --location "$region" --subscription "$SUB_ID" -o json 2>/dev/null || echo "[]")
quota_avail=$(echo "$usage_json" | jq -r --arg name "OpenAI.GlobalStandard.$MODEL_NAME" \
'[.[] | select(.name.value == $name)] | if length > 0 then .[0].limit - .[0].currentValue else 0 end')
QUOTA_MAP[$region]="${quota_avail:-0}"
done
# Export quota map as JSON for Python
QUOTA_JSON="{"
first=true
for region in "${!QUOTA_MAP[@]}"; do
if [ "$first" = true ]; then first=false; else QUOTA_JSON+=","; fi
QUOTA_JSON+="\"$region\":${QUOTA_MAP[$region]}"
done
QUOTA_JSON+="}"
# Combine, rank, and output using inline Python (available on all Azure CLI installs)
python3 -c "
import json, sys
capacity = json.loads('''${CAPACITY_JSON}''')
projects = json.loads('''${PROJECTS_JSON}''')
quota = json.loads('''${QUOTA_JSON}''')
min_cap = int('${MIN_CAPACITY}')
# Build capacity map (GlobalStandard only)
cap_map = {}
for item in capacity.get('value', []):
props = item.get('properties', {})
if props.get('skuName') == 'GlobalStandard' and props.get('availableCapacity', 0) > 0:
region = item.get('location', '')
cap_map[region] = max(cap_map.get(region, 0), props['availableCapacity'])
# Build project count map
proj_map = {}
proj_sample = {}
for p in (projects if isinstance(projects, list) else []):
loc = p.get('location', '')
proj_map[loc] = proj_map.get(loc, 0) + 1
if loc not in proj_sample:
proj_sample[loc] = p.get('name', '')
# Combine and rank
results = []
for region, cap in cap_map.items():
meets = cap >= min_cap
q = quota.get(region, 0)
quota_ok = q > 0
results.append({
'region': region,
'available': cap,
'meets': meets,
'projects': proj_map.get(region, 0),
'sample': proj_sample.get(region, '(none)'),
'quota': q,
'quota_ok': quota_ok
})
# Sort: meets target first, then quota available, then by project count, then by capacity
results.sort(key=lambda x: (-x['meets'], -x['quota_ok'], -x['projects'], -x['available']))
# Output
total = len(results)
matching = sum(1 for r in results if r['meets'])
with_quota = sum(1 for r in results if r['meets'] and r['quota_ok'])
with_projects = sum(1 for r in results if r['meets'] and r['projects'] > 0)
print(f'Model: {\"${MODEL_NAME}\"} v{\"${MODEL_VERSION}\"} | SKU: GlobalStandard | Min Capacity: {min_cap}K TPM')
print(f'Regions with capacity: {total} | Meets target: {matching} | With quota: {with_quota} | With projects: {with_projects}')
print()
print(f'{\"Region\":<22} {\"Available\":<12} {\"Meets Target\":<14} {\"Quota\":<12} {\"Projects\":<10} {\"Sample Project\"}')
print('-' * 100)
for r in results:
mark = 'YES' if r['meets'] else 'no'
q_display = f'{r[\"quota\"]}K' if r['quota'] > 0 else '0 (none)'
print(f'{r[\"region\"]:<22} {r[\"available\"]}K{\"\":.<10} {mark:<14} {q_display:<12} {r[\"projects\"]:<10} {r[\"sample\"]}')
"
query_capacity.ps1 3.0 KB
<#
.SYNOPSIS
Queries available capacity for an Azure OpenAI model and validates if a target is achievable.
.PARAMETER ModelName
The model name (e.g., "gpt-4o", "o3-mini")
.PARAMETER ModelVersion
The model version (e.g., "2025-01-31"). If omitted, lists available versions.
.PARAMETER Region
Optional. Check capacity in a specific region only.
.PARAMETER SKU
SKU to check (default: GlobalStandard)
.EXAMPLE
.\query_capacity.ps1 -ModelName o3-mini
.\query_capacity.ps1 -ModelName o3-mini -ModelVersion 2025-01-31 -Region eastus2
#>
param(
[Parameter(Mandatory)][string]$ModelName,
[string]$ModelVersion,
[string]$Region,
[string]$SKU = "GlobalStandard"
)
$ErrorActionPreference = "Stop"
$subId = az account show --query id -o tsv
# If no version provided, list available versions first
if (-not $ModelVersion) {
Write-Host "Available versions for $ModelName`:"
$loc = if ($Region) { $Region } else { "eastus" }
az cognitiveservices model list --location $loc `
--query "[?model.name=='$ModelName'].{Version:model.version, Format:model.format}" `
--output table 2>$null
return
}
# Build URL parameters
$urlParams = @("api-version=2024-10-01", "modelFormat=OpenAI", "modelName=$ModelName", "modelVersion=$ModelVersion")
if ($Region) {
$url = "https://management.azure.com/subscriptions/$subId/providers/Microsoft.CognitiveServices/locations/$Region/modelCapacities"
} else {
$url = "https://management.azure.com/subscriptions/$subId/providers/Microsoft.CognitiveServices/modelCapacities"
}
$raw = az rest --method GET --url $url --url-parameters @urlParams 2>$null | Out-String | ConvertFrom-Json
# Filter by SKU
$filtered = $raw.value | Where-Object { $_.properties.skuName -eq $SKU -and $_.properties.availableCapacity -gt 0 }
if (-not $filtered) {
Write-Host "No capacity found for $ModelName v$ModelVersion ($SKU)" -ForegroundColor Red
Write-Host "Try a different SKU or version."
return
}
Write-Host "Capacity: $ModelName v$ModelVersion ($SKU)"
Write-Host ""
$filtered | ForEach-Object {
# Check subscription quota for this region
$quotaDisplay = "?"
try {
$usageData = az cognitiveservices usage list --location $_.location --subscription $subId -o json 2>$null | Out-String | ConvertFrom-Json
$usageEntry = $usageData | Where-Object { $_.name.value -eq "OpenAI.$SKU.$ModelName" }
if ($usageEntry) {
$quotaAvail = [int]$usageEntry.limit - [int]$usageEntry.currentValue
$quotaDisplay = if ($quotaAvail -gt 0) { "${quotaAvail}K" } else { "0 (at limit)" }
} else {
$quotaDisplay = "0 (none)"
}
} catch {
$quotaDisplay = "?"
}
[PSCustomObject]@{
Region = $_.location
SKU = $_.properties.skuName
Available = "$($_.properties.availableCapacity)K TPM"
Quota = $quotaDisplay
}
} | Sort-Object { [int]($_.Available -replace '[^\d]','') } -Descending | Format-Table -AutoSize
query_capacity.sh 2.9 KB
#!/bin/bash
# query_capacity.sh
# Queries available capacity for an Azure OpenAI model.
#
# Usage:
# ./query_capacity.sh <model-name> [model-version] [region] [sku]
# Examples:
# ./query_capacity.sh o3-mini # List versions
# ./query_capacity.sh o3-mini 2025-01-31 # All regions
# ./query_capacity.sh o3-mini 2025-01-31 eastus2 # Specific region
# ./query_capacity.sh o3-mini 2025-01-31 "" Standard # Different SKU
set -euo pipefail
MODEL_NAME="${1:?Usage: $0 <model-name> [model-version] [region] [sku]}"
MODEL_VERSION="${2:-}"
REGION="${3:-}"
SKU="${4:-GlobalStandard}"
SUB_ID=$(az account show --query id -o tsv)
# If no version, list available versions
if [ -z "$MODEL_VERSION" ]; then
LOC="${REGION:-eastus}"
echo "Available versions for $MODEL_NAME:"
az cognitiveservices model list --location "$LOC" \
--query "[?model.name=='$MODEL_NAME'].{Version:model.version, Format:model.format}" \
--output table 2>/dev/null
exit 0
fi
# Build URL
if [ -n "$REGION" ]; then
URL="https://management.azure.com/subscriptions/${SUB_ID}/providers/Microsoft.CognitiveServices/locations/${REGION}/modelCapacities"
else
URL="https://management.azure.com/subscriptions/${SUB_ID}/providers/Microsoft.CognitiveServices/modelCapacities"
fi
# Query capacity
CAPACITY_RESULT=$(az rest --method GET --url "$URL" \
--url-parameters api-version=2024-10-01 modelFormat=OpenAI modelName="$MODEL_NAME" modelVersion="$MODEL_VERSION" \
2>/dev/null)
# Get regions with capacity
REGIONS_WITH_CAP=$(echo "$CAPACITY_RESULT" | jq -r ".value[] | select(.properties.skuName==\"$SKU\" and .properties.availableCapacity > 0) | .location" 2>/dev/null | sort -u)
if [ -z "$REGIONS_WITH_CAP" ]; then
echo "No capacity found for $MODEL_NAME v$MODEL_VERSION ($SKU)"
echo "Try a different SKU or version."
exit 0
fi
echo "Capacity: $MODEL_NAME v$MODEL_VERSION ($SKU)"
echo ""
printf "%-22s %-12s %-15s %s\n" "Region" "Available" "Quota" "SKU"
printf -- '-%.0s' {1..60}; echo ""
for region in $REGIONS_WITH_CAP; do
avail=$(echo "$CAPACITY_RESULT" | jq -r ".value[] | select(.location==\"$region\" and .properties.skuName==\"$SKU\") | .properties.availableCapacity" 2>/dev/null | head -1)
# Check subscription quota
usage_json=$(az cognitiveservices usage list --location "$region" --subscription "$SUB_ID" -o json 2>/dev/null || echo "[]")
quota_avail=$(echo "$usage_json" | jq -r --arg name "OpenAI.$SKU.$MODEL_NAME" \
'[.[] | select(.name.value == $name)] | if length > 0 then .[0].limit - .[0].currentValue else 0 end' 2>/dev/null || echo "?")
if [ "$quota_avail" = "0" ]; then
quota_display="0 (none)"
elif [ "$quota_avail" = "?" ]; then
quota_display="?"
else
quota_display="${quota_avail}K"
fi
printf "%-22s %-12s %-15s %s\n" "$region" "${avail}K TPM" "$quota_display" "$SKU"
done
EXAMPLES.md 4.3 KB
# customize Examples
## Example 1: Basic Deployment with Defaults
**Scenario:** Deploy gpt-4o accepting all defaults for quick setup.
**Config:** gpt-4o / GlobalStandard / 10K TPM / Dynamic Quota enabled
**Result:** Deployment `gpt-4o` created in ~2-3 min with auto-upgrade enabled.
## Example 2: Production Deployment with Custom Capacity
**Scenario:** Deploy gpt-4o for production with high throughput.
**Config:** gpt-4o / GlobalStandard / 50K TPM / Dynamic Quota / Name: `gpt-4o-production`
**Result:** 50K TPM (500 req/10s). Suitable for moderate-to-high traffic production apps.
## Example 3: PTU Deployment for High-Volume Workload
**Scenario:** Deploy gpt-4o with reserved capacity (PTU) for predictable workload.
**Config:** gpt-4o / ProvisionedManaged / 200 PTU (min 50, max 1000) / Priority Processing enabled
**PTU sizing:** 40K input + 20K output tokens/min β ~100 PTU estimated β 200 PTU recommended (2x headroom)
**Result:** Guaranteed throughput, fixed monthly cost. Use case: customer service bots, document pipelines.
## Example 4: Development Deployment with Standard SKU
**Scenario:** Deploy gpt-4o-mini for dev/testing with minimal cost.
**Config:** gpt-4o-mini / Standard / 1K TPM / Name: `gpt-4o-mini-dev`
**Result:** 1K TPM, 10 req/10s. Minimal pay-per-use cost for development and prototyping.
## Example 5: Spillover Configuration
**Scenario:** Deploy gpt-4o with spillover to handle peak load overflow.
**Config:** gpt-4o / GlobalStandard / 20K TPM / Dynamic Quota / Spillover β `gpt-4o-backup`
**Result:** Primary handles up to 20K TPM; overflow auto-redirects to backup deployment.
## Example 6: Anthropic Model Deployment (claude-sonnet-4-6)
**Scenario:** Deploy claude-sonnet-4-6 with customized settings.
**Config:** claude-sonnet-4-6 / GlobalStandard / capacity 1 (MaaS) / Industry: Healthcare / No RAI policy (Anthropic manages content filtering)
**Result:** User selected "Healthcare" as industry β tenant country code (US) and org name fetched automatically β deployed via ARM REST API with `modelProviderData` in ~2 min.
---
## Comparison Matrix
| Scenario | Model | SKU | Capacity | Dynamic Quota | Priority | Spillover | Use Case |
|----------|-------|-----|----------|:---:|:---:|:---:|----------|
| Ex 1 | gpt-4o | GlobalStandard | 10K TPM | β | - | - | Quick setup |
| Ex 2 | gpt-4o | GlobalStandard | 50K TPM | β | - | - | Production |
| Ex 3 | gpt-4o | ProvisionedManaged | 200 PTU | - | β | - | Predictable workload |
| Ex 4 | gpt-4o-mini | Standard | 1K TPM | - | - | - | Dev/testing |
| Ex 5 | gpt-4o | GlobalStandard | 20K TPM | β | - | β | Peak load |
| Ex 6 | claude-sonnet-4-6 | GlobalStandard | 1 (MaaS) | - | - | - | Anthropic model |
## Common Patterns
### Dev β Staging β Production
| Stage | Model | SKU | Capacity | Extras |
|-------|-------|-----|----------|--------|
| Dev | gpt-4o-mini | Standard | 1K TPM | β |
| Staging | gpt-4o | GlobalStandard | 10K TPM | β |
| Production | gpt-4o | GlobalStandard | 50K TPM | Dynamic Quota + Spillover |
### Cost Optimization
- **High priority:** gpt-4o, ProvisionedManaged, 100 PTU, Priority Processing
- **Low priority:** gpt-4o-mini, Standard, 5K TPM
---
## Tips and Best Practices
**Capacity:** Start conservative β monitor with Azure Monitor β scale gradually β use spillover for peaks.
**SKU Selection:** Standard for dev β GlobalStandard + dynamic quota for variable production β ProvisionedManaged (PTU) for predictable load.
**Cost:** Right-size capacity; use gpt-4o-mini where possible (80-90% accuracy at lower cost); enable dynamic quota; consider PTU for consistent high-volume.
**Versions:** Auto-upgrade recommended; test new versions in staging first; pin only if compatibility requires it.
**Content Filtering:** Start with DefaultV2; use custom policies only for specific needs; monitor filtered requests.
---
## Troubleshooting
| Problem | Solution |
|---------|----------|
| `QuotaExceeded` | Check usage with `az cognitiveservices usage list`, reduce capacity, try different SKU, check other regions, or use the [quota skill](../../../quota/quota.md) to request an increase |
| Version not available for SKU | Check `az cognitiveservices account list-models --query "[?name=='gpt-4o'].version"`, use latest |
| Deployment name exists | Skill auto-generates unique name (e.g., `gpt-4o-2`), or specify custom name |
SKILL.md 8.7 KB
---
name: customize
description: "Interactive guided deployment flow for Azure OpenAI models with full customization control. Step-by-step selection of model version, SKU (GlobalStandard/Standard/ProvisionedManaged), capacity, RAI policy (content filter), and advanced options (dynamic quota, priority processing, spillover). USE FOR: custom deployment, customize model deployment, choose version, select SKU, set capacity, configure content filter, RAI policy, deployment options, detailed deployment, advanced deployment, PTU deployment, provisioned throughput. DO NOT USE FOR: quick deployment to optimal region (use preset)."
license: MIT
metadata:
author: Microsoft
version: "1.0.1"
---
# Customize Model Deployment
Interactive guided workflow for deploying Azure OpenAI models with full customization control over version, SKU, capacity, content filtering, and advanced options.
## Quick Reference
| Property | Description |
|----------|-------------|
| **Flow** | Interactive step-by-step guided deployment |
| **Customization** | Version, SKU, Capacity, RAI Policy, Advanced Options |
| **SKU Support** | GlobalStandard, Standard, ProvisionedManaged, DataZoneStandard |
| **Best For** | Precise control over deployment configuration |
| **Authentication** | Azure CLI (`az login`) |
| **Tools** | Azure CLI, MCP tools (optional) |
## When to Use This Skill
Use this skill when you need **precise control** over deployment configuration:
- β
**Choose specific model version** (not just latest)
- β
**Select deployment SKU** (GlobalStandard vs Standard vs PTU)
- β
**Set exact capacity** within available range
- β
**Configure content filtering** (RAI policy selection)
- β
**Enable advanced features** (dynamic quota, priority processing, spillover)
- β
**PTU deployments** (Provisioned Throughput Units)
**Alternative:** Use `preset` for quick deployment to the best available region with automatic configuration.
### Comparison: customize vs preset
| Feature | customize | preset |
|---------|---------------------|----------------------------|
| **Focus** | Full customization control | Optimal region selection |
| **Version Selection** | User chooses from available | Uses latest automatically |
| **SKU Selection** | User chooses (GlobalStandard/Standard/PTU) | GlobalStandard only |
| **Capacity** | User specifies exact value | Auto-calculated (50% of available) |
| **RAI Policy** | User selects from options | Default policy only |
| **Region** | Current region first, falls back to all regions if no capacity | Checks capacity across all regions upfront |
| **Use Case** | Precise deployment requirements | Quick deployment to best region |
## Prerequisites
- Azure subscription with Cognitive Services Contributor or Owner role
- Azure AI Foundry project resource ID (format: `/subscriptions/{sub}/resourceGroups/{rg}/providers/Microsoft.CognitiveServices/accounts/{account}/projects/{project}`)
- Azure CLI installed and authenticated (`az login`)
- Optional: Set `PROJECT_RESOURCE_ID` environment variable
## Workflow Overview
### Complete Flow (14 Phases)
```
1. Verify Authentication
2. Get Project Resource ID
3. Verify Project Exists
4. Get Model Name (if not provided)
5. List Model Versions β User Selects
6. List SKUs for Version β User Selects
7. Get Capacity Range β User Configures
7b. If no capacity: Cross-Region Fallback β Query all regions β User selects region/project
8. List RAI Policies β User Selects
9. Configure Advanced Options (if applicable)
10. Configure Version Upgrade Policy
11. Generate Deployment Name
12. Review Configuration
13. Execute Deployment & Monitor
```
### Fast Path (Defaults)
If user accepts all defaults (latest version, GlobalStandard SKU, recommended capacity, default RAI policy, standard upgrade policy), deployment completes in ~5 interactions.
---
## Phase Summaries
> β οΈ **MUST READ:** Before executing any phase, load [references/customize-workflow.md](references/customize-workflow.md) for the full scripts and implementation details. The summaries below describe *what* each phase does β the reference file contains the *how* (CLI commands, quota patterns, capacity formulas, cross-region fallback logic).
| Phase | Action | Key Details |
|-------|--------|-------------|
| **1. Verify Auth** | Check `az account show`; prompt `az login` if needed | Verify correct subscription is active |
| **2. Get Project ID** | Read `PROJECT_RESOURCE_ID` env var or prompt user | ARM resource ID format required |
| **3. Verify Project** | Parse resource ID, call `az cognitiveservices account show` | Extracts subscription, RG, account, project, region |
| **4. Get Model** | List models via `az cognitiveservices account list-models` | User selects from available or enters custom name |
| **5. Select Version** | Query versions for chosen model | Recommend latest; user picks from list |
| **6. Select SKU** | Query model catalog + subscription quota, show only deployable SKUs | β οΈ Never hardcode SKU lists β always query live data |
| **7. Configure Capacity** | Query capacity API, validate min/max/step, user enters value | Cross-region fallback if no capacity in current region |
| **8. Select RAI Policy** | Present content filter options | Default: `Microsoft.DefaultV2` |
| **9. Advanced Options** | Dynamic quota (GlobalStandard), priority processing (PTU), spillover | SKU-dependent availability |
| **10. Upgrade Policy** | Choose: OnceNewDefaultVersionAvailable / OnceCurrentVersionExpired / NoAutoUpgrade | Default: auto-upgrade on new default |
| **11. Deployment Name** | Auto-generate unique name, allow custom override | Validates format: `^[\w.-]{2,64}$` |
| **12. Review** | Display full config summary, confirm before proceeding | User approves or cancels |
| **13. Deploy & Monitor** | `az cognitiveservices account deployment create`, poll status | Timeout after 5 min; show endpoint + portal link |
---
## Error Handling
### Common Issues and Resolutions
| Error | Cause | Resolution |
|-------|-------|------------|
| **Model not found** | Invalid model name | List available models with `az cognitiveservices account list-models` |
| **Version not available** | Version not supported for SKU | Select different version or SKU |
| **Insufficient quota** | Capacity > available quota | Skill auto-searches all regions; fails only if no region has quota |
| **SKU not supported** | SKU not available in region | Cross-region fallback searches other regions automatically |
| **Capacity out of range** | Invalid capacity value | **PREVENTED**: Skill validates min/max/step at input (Phase 7) |
| **Deployment name exists** | Name conflict | Auto-incremented name generation |
| **Authentication failed** | Not logged in | Run `az login` |
| **Permission denied** | Insufficient permissions | Assign Cognitive Services Contributor role |
| **Capacity query fails** | API/permissions/network error | **DEPLOYMENT BLOCKED**: Will not proceed without valid quota data |
### Troubleshooting Commands
```bash
# Check deployment status
az cognitiveservices account deployment show --name <account> --resource-group <rg> --deployment-name <name>
# List all deployments
az cognitiveservices account deployment list --name <account> --resource-group <rg> -o table
# Check quota usage
az cognitiveservices usage list --name <account> --resource-group <rg>
# Delete failed deployment
az cognitiveservices account deployment delete --name <account> --resource-group <rg> --deployment-name <name>
```
---
## Selection Guides & Advanced Topics
> For SKU comparison tables, PTU sizing formulas, and advanced option details, load [references/customize-guides.md](references/customize-guides.md).
**SKU selection:** GlobalStandard (production/HA) β Standard (dev/test) β ProvisionedManaged (high-volume/guaranteed throughput) β DataZoneStandard (data residency).
**Capacity:** TPM-based SKUs range from 1K (dev) to 100K+ (large production). PTU-based use formula: `(Input TPM Γ 0.001) + (Output TPM Γ 0.002) + (Requests/min Γ 0.1)`.
**Advanced options:** Dynamic quota (GlobalStandard only), priority processing (PTU only, extra cost), spillover (overflow to backup deployment).
---
## Related Skills
- **preset** - Quick deployment to best region with automatic configuration
- **microsoft-foundry** - Parent skill for all Azure AI Foundry operations
- **[quota](../../../quota/quota.md)** β For quota viewing, increase requests, and troubleshooting quota errors, defer to this skill instead of duplicating guidance
- **rbac** - Manage permissions and access control
---
## Notes
- Set `PROJECT_RESOURCE_ID` environment variable to skip prompt
- Not all SKUs available in all regions; capacity varies by subscription/region/model
- Custom RAI policies can be configured in Azure Portal
- Automatic version upgrades occur during maintenance windows
- Use Azure Monitor and Application Insights for production deployments customize-guides.md 3.4 KB
# Customize Guides β Selection Guides & Advanced Topics
> Reference for: `models/deploy-model/customize/SKILL.md`
**Table of Contents:** [Selection Guides](#selection-guides) Β· [Advanced Topics](#advanced-topics)
## Selection Guides
### How to Choose SKU
| SKU | Best For | Cost | Availability |
|-----|----------|------|--------------|
| **GlobalStandard** | Production, high availability | Medium | Multi-region |
| **Standard** | Development, testing | Low | Single region |
| **ProvisionedManaged** | High-volume, predictable workloads | Fixed (PTU) | Reserved capacity |
| **DataZoneStandard** | Data residency requirements | Medium | Specific zones |
**Decision Tree:**
```
Do you need guaranteed throughput?
ββ Yes β ProvisionedManaged (PTU)
ββ No β Do you need high availability?
ββ Yes β GlobalStandard
ββ No β Standard
```
### How to Choose Capacity
**For TPM-based SKUs (GlobalStandard, Standard):**
| Workload | Recommended Capacity |
|----------|---------------------|
| Development/Testing | 1K - 5K TPM |
| Small Production | 5K - 20K TPM |
| Medium Production | 20K - 100K TPM |
| Large Production | 100K+ TPM |
**For PTU-based SKUs (ProvisionedManaged):**
Use the PTU calculator based on:
- Input tokens per minute
- Output tokens per minute
- Requests per minute
**Capacity Planning Tips:**
- Start with recommended capacity
- Monitor usage and adjust
- Enable dynamic quota for flexibility
- Consider spillover for peak loads
### How to Choose RAI Policy
| Policy | Filtering Level | Use Case |
|--------|----------------|----------|
| **Microsoft.DefaultV2** | Balanced | Most applications |
| **Microsoft.Prompt-Shield** | Enhanced | Security-sensitive apps |
| **Custom** | Configurable | Specific requirements |
**Recommendation:** Start with `Microsoft.DefaultV2` and adjust based on application needs.
---
## Advanced Topics
### PTU (Provisioned Throughput Units) Deployments
**What is PTU?**
- Reserved capacity with guaranteed throughput
- Measured in PTU units, not TPM
- Fixed cost regardless of usage
- Best for high-volume, predictable workloads
**PTU Calculator:**
```
Estimated PTU = (Input TPM Γ 0.001) + (Output TPM Γ 0.002) + (Requests/min Γ 0.1)
Example:
- Input: 10,000 tokens/min
- Output: 5,000 tokens/min
- Requests: 100/min
PTU = (10,000 Γ 0.001) + (5,000 Γ 0.002) + (100 Γ 0.1)
= 10 + 10 + 10
= 30 PTU
```
**PTU Deployment:**
```bash
az cognitiveservices account deployment create \
--name <account-name> \
--resource-group <resource-group> \
--deployment-name <deployment-name> \
--model-name <model-name> \
--model-version <version> \
--model-format "OpenAI" \
--sku-name "ProvisionedManaged" \
--sku-capacity 100 # PTU units
```
### Spillover Configuration
**Spillover Workflow:**
1. Primary deployment receives requests
2. When capacity reached, requests overflow to spillover target
3. Spillover target must be same model or compatible
4. Configure via deployment properties
**Best Practices:**
- Use spillover for peak load handling
- Spillover target should have sufficient capacity
- Monitor both deployments
- Test failover behavior
### Priority Processing
**What is Priority Processing?**
- Prioritizes your requests during high load
- Available for ProvisionedManaged SKU
- Additional charges apply
- Ensures consistent performance
**When to Use:**
- Mission-critical applications
- SLA requirements
- High-concurrency scenarios
customize-workflow.md 13.0 KB
# Customize Workflow β Detailed Phase Instructions
> Reference for: `models/deploy-model/customize/SKILL.md`
## Phase 1: Verify Authentication
```bash
az account show --query "{Subscription:name, User:user.name}" -o table
```
If not logged in: `az login`
Set subscription if needed:
```bash
az account list --query "[].[name,id,state]" -o table
az account set --subscription <subscription-id>
```
---
## Phase 2: Get Project Resource ID
Check `PROJECT_RESOURCE_ID` env var. If not set, prompt user.
**Format:** `/subscriptions/{sub}/resourceGroups/{rg}/providers/Microsoft.CognitiveServices/accounts/{account}/projects/{project}`
---
## Phase 3: Parse and Verify Project
Parse ARM resource ID to extract components:
```powershell
$SUBSCRIPTION_ID = ($PROJECT_RESOURCE_ID -split '/')[2]
$RESOURCE_GROUP = ($PROJECT_RESOURCE_ID -split '/')[4]
$ACCOUNT_NAME = ($PROJECT_RESOURCE_ID -split '/')[8]
$PROJECT_NAME = ($PROJECT_RESOURCE_ID -split '/')[10]
```
Verify project exists and get region:
```bash
az account set --subscription $SUBSCRIPTION_ID
az cognitiveservices account show \
--name $ACCOUNT_NAME \
--resource-group $RESOURCE_GROUP \
--query location -o tsv
```
---
## Phase 4: Get Model Name
List available models if not provided:
```bash
az cognitiveservices account list-models \
--name $ACCOUNT_NAME \
--resource-group $RESOURCE_GROUP \
--query "[].name" -o json
```
Present sorted unique list. Allow custom model name entry.
**Detect model format:**
```bash
# Get model format (e.g., OpenAI, Anthropic, Meta-Llama, Mistral, Cohere)
MODEL_FORMAT=$(az cognitiveservices account list-models \
--name "$ACCOUNT_NAME" \
--resource-group "$RESOURCE_GROUP" \
--query "[?name=='$MODEL_NAME'].format" -o tsv | head -1)
MODEL_FORMAT=${MODEL_FORMAT:-"OpenAI"}
echo "Model format: $MODEL_FORMAT"
```
> π‘ **Model format determines the deployment path:**
> - `OpenAI` β Standard CLI, TPM-based capacity, RAI policies, version upgrade policies
> - `Anthropic` β REST API with `modelProviderData`, capacity=1, no RAI, no version upgrade
> - All other formats (`Meta-Llama`, `Mistral`, `Cohere`, etc.) β Standard CLI, capacity=1 (MaaS), no RAI, no version upgrade
---
## Phase 5: List and Select Model Version
```bash
az cognitiveservices account list-models \
--name $ACCOUNT_NAME \
--resource-group $RESOURCE_GROUP \
--query "[?name=='$MODEL_NAME'].version" -o json
```
Recommend latest version (first in list). Default to `"latest"` if no versions found.
---
## Phase 6: List and Select SKU
> β οΈ **Warning:** Never hardcode SKU lists β always query live data.
**Step A β Query model-supported SKUs:**
```bash
az cognitiveservices model list \
--location $PROJECT_REGION \
--subscription $SUBSCRIPTION_ID -o json
```
Filter: `model.name == $MODEL_NAME && model.version == $MODEL_VERSION`, extract `model.skus[].name`.
**Step B β Check subscription quota per SKU:**
```bash
az cognitiveservices usage list \
--location $PROJECT_REGION \
--subscription $SUBSCRIPTION_ID -o json
```
Quota key pattern: `OpenAI.<SKU>.<model-name>`. Calculate `available = limit - currentValue`.
**Step C β Present only deployable SKUs** (available > 0). If no SKUs have quota, direct user to the [quota skill](../../../../quota/quota.md).
---
## Phase 7: Configure Capacity
> β οΈ **Non-OpenAI models (MaaS):** If `MODEL_FORMAT != "OpenAI"`, capacity is always `1` (pay-per-token billing). Skip capacity configuration and set `DEPLOY_CAPACITY=1`. Proceed to Phase 7c (Anthropic) or Phase 8.
**For OpenAI models only β query capacity via REST API:**
```bash
# Current region capacity
az rest --method GET --url \
"https://management.azure.com/subscriptions/$SUBSCRIPTION_ID/providers/Microsoft.CognitiveServices/locations/$PROJECT_REGION/modelCapacities?api-version=2024-10-01&modelFormat=$MODEL_FORMAT&modelName=$MODEL_NAME&modelVersion=$MODEL_VERSION"
```
Filter result for `properties.skuName == $SELECTED_SKU`. Read `properties.availableCapacity`.
**Capacity defaults by SKU (OpenAI only):**
| SKU | Unit | Min | Max | Step | Default |
|-----|------|-----|-----|------|---------|
| ProvisionedManaged | PTU | 50 | 1000 | 50 | 100 |
| Others (TPM-based) | TPM | 1000 | min(available, 300000) | 1000 | min(10000, available/2) |
Validate user input: must be >= min, <= max, multiple of step. On invalid input, explain constraints.
### Phase 7b: Cross-Region Fallback
If no capacity in current region, query ALL regions:
```bash
az rest --method GET --url \
"https://management.azure.com/subscriptions/$SUBSCRIPTION_ID/providers/Microsoft.CognitiveServices/modelCapacities?api-version=2024-10-01&modelFormat=$MODEL_FORMAT&modelName=$MODEL_NAME&modelVersion=$MODEL_VERSION"
```
Filter: `properties.skuName == $SELECTED_SKU && properties.availableCapacity > 0`. Sort descending by capacity.
Present available regions. After user selects region, find existing projects there:
```bash
az cognitiveservices account list \
--query "[?kind=='AIProject' && location=='$PROJECT_REGION'].{Name:name, ResourceGroup:resourceGroup}" \
-o json
```
If projects exist, let user select one and update `$ACCOUNT_NAME`, `$RESOURCE_GROUP`. If none, direct to project/create skill.
Re-run capacity configuration with new region's available capacity.
If no region has capacity: fail with guidance to request quota increase, check existing deployments, or try different model/SKU.
---
## Phase 7c: Anthropic Model Provider Data (Anthropic models only)
> β οΈ **Only execute this phase if `MODEL_FORMAT == "Anthropic"`.** For OpenAI and other models, skip to Phase 8.
Anthropic models require `modelProviderData` in the deployment payload. Collect this before deployment.
**Step 1: Prompt user to select industry**
Present the following list and ask the user to choose one:
```
1. None (API value: none)
2. Biotechnology (API value: biotechnology)
3. Consulting (API value: consulting)
4. Education (API value: education)
5. Finance (API value: finance)
6. Food & Beverage (API value: food_and_beverage)
7. Government (API value: government)
8. Healthcare (API value: healthcare)
9. Insurance (API value: insurance)
10. Law (API value: law)
11. Manufacturing (API value: manufacturing)
12. Media (API value: media)
13. Nonprofit (API value: nonprofit)
14. Technology (API value: technology)
15. Telecommunications (API value: telecommunications)
16. Sport & Recreation (API value: sport_and_recreation)
17. Real Estate (API value: real_estate)
18. Retail (API value: retail)
19. Other (API value: other)
```
> β οΈ **Do NOT pick a default industry or hardcode a value. Always ask the user.** This is required by Anthropic's terms of service. The industry list is static β there is no REST API that provides it.
Store selection as `SELECTED_INDUSTRY` (use the API value, e.g., `technology`).
**Step 2: Fetch tenant info (country code and organization name)**
```bash
TENANT_INFO=$(az rest --method GET \
--url "https://management.azure.com/tenants?api-version=2024-11-01" \
--query "value[0].{countryCode:countryCode, displayName:displayName}" -o json)
COUNTRY_CODE=$(echo "$TENANT_INFO" | jq -r '.countryCode')
ORG_NAME=$(echo "$TENANT_INFO" | jq -r '.displayName')
```
*PowerShell version:*
```powershell
$tenantInfo = az rest --method GET `
--url "https://management.azure.com/tenants?api-version=2024-11-01" `
--query "value[0].{countryCode:countryCode, displayName:displayName}" -o json | ConvertFrom-Json
$countryCode = $tenantInfo.countryCode
$orgName = $tenantInfo.displayName
```
Store `COUNTRY_CODE` and `ORG_NAME` for use in Phase 13.
---
## Phase 8: Select RAI Policy (Content Filter)
> β οΈ **Note:** RAI policies only apply to OpenAI models. Skip this phase if `MODEL_FORMAT != "OpenAI"` (Anthropic, Meta-Llama, Mistral, Cohere, etc. do not use RAI policies).
Present options:
1. `Microsoft.DefaultV2` β Balanced filtering (recommended). Filters hate, violence, sexual, self-harm.
2. `Microsoft.Prompt-Shield` β Enhanced prompt injection/jailbreak protection.
3. Custom policies β Organization-specific (configured in Azure Portal).
Default: `Microsoft.DefaultV2`.
---
## Phase 9: Configure Advanced Options
Options are SKU-dependent:
**A. Dynamic Quota** (GlobalStandard only)
- Auto-scales beyond base allocation when capacity available
- Default: enabled
**B. Priority Processing** (ProvisionedManaged only)
- Prioritizes requests during high load; additional charges apply
- Default: disabled
**C. Spillover** (any SKU)
- Redirects requests to backup deployment at capacity
- Requires existing deployment; list with:
```bash
az cognitiveservices account deployment list \
--name $ACCOUNT_NAME \
--resource-group $RESOURCE_GROUP \
--query "[].name" -o json
```
- Default: disabled
---
## Phase 10: Configure Version Upgrade Policy
> β οΈ **Note:** Version upgrade policies only apply to OpenAI models. Skip this phase if `MODEL_FORMAT != "OpenAI"`.
| Policy | Description |
|--------|-------------|
| `OnceNewDefaultVersionAvailable` | Auto-upgrade to new default (Recommended) |
| `OnceCurrentVersionExpired` | Upgrade only when current expires |
| `NoAutoUpgrade` | Manual upgrade only |
Default: `OnceNewDefaultVersionAvailable`.
---
## Phase 11: Generate Deployment Name
List existing deployments to avoid conflicts:
```bash
az cognitiveservices account deployment list \
--name $ACCOUNT_NAME \
--resource-group $RESOURCE_GROUP \
--query "[].name" -o json
```
Auto-generate: use model name as base, append `-2`, `-3` etc. if taken. Allow custom override. Validate: `^[\w.-]{2,64}$`.
---
## Phase 12: Review Configuration
Display summary of all selections for user confirmation before proceeding:
- Model, version, deployment name
- SKU, capacity (with unit), region
- RAI policy, version upgrade policy
- Advanced options (dynamic quota, priority, spillover)
- Account, resource group, project
User confirms or cancels.
---
## Phase 13: Execute Deployment
> π‘ `MODEL_FORMAT` was already detected in Phase 4. Use the stored value here.
### Standard CLI deployment (non-Anthropic models):
**Create deployment:**
```bash
az cognitiveservices account deployment create \
--name $ACCOUNT_NAME \
--resource-group $RESOURCE_GROUP \
--deployment-name $DEPLOYMENT_NAME \
--model-name $MODEL_NAME \
--model-version $MODEL_VERSION \
--model-format "$MODEL_FORMAT" \
--sku-name $SELECTED_SKU \
--sku-capacity $DEPLOY_CAPACITY
```
> π‘ **Note:** For non-OpenAI MaaS models, `$DEPLOY_CAPACITY` is `1` (set in Phase 7).
### Anthropic model deployment (requires modelProviderData):
The Azure CLI does not support `--model-provider-data`. Use the ARM REST API directly.
> β οΈ Industry, country code, and organization name should have been collected in Phase 7c.
```bash
echo "Creating Anthropic model deployment via REST API..."
az rest --method PUT \
--url "https://management.azure.com/subscriptions/$SUBSCRIPTION_ID/resourceGroups/$RESOURCE_GROUP/providers/Microsoft.CognitiveServices/accounts/$ACCOUNT_NAME/deployments/$DEPLOYMENT_NAME?api-version=2024-10-01" \
--body "{
\"sku\": {
\"name\": \"$SELECTED_SKU\",
\"capacity\": 1
},
\"properties\": {
\"model\": {
\"format\": \"Anthropic\",
\"name\": \"$MODEL_NAME\",
\"version\": \"$MODEL_VERSION\"
},
\"modelProviderData\": {
\"industry\": \"$SELECTED_INDUSTRY\",
\"countryCode\": \"$COUNTRY_CODE\",
\"organizationName\": \"$ORG_NAME\"
}
}
}"
```
*PowerShell version:*
```powershell
Write-Host "Creating Anthropic model deployment via REST API..."
$body = @{
sku = @{
name = $SELECTED_SKU
capacity = 1
}
properties = @{
model = @{
format = "Anthropic"
name = $MODEL_NAME
version = $MODEL_VERSION
}
modelProviderData = @{
industry = $SELECTED_INDUSTRY
countryCode = $countryCode
organizationName = $orgName
}
}
} | ConvertTo-Json -Depth 5
az rest --method PUT `
--url "https://management.azure.com/subscriptions/$SUBSCRIPTION_ID/resourceGroups/$RESOURCE_GROUP/providers/Microsoft.CognitiveServices/accounts/$ACCOUNT_NAME/deployments/${DEPLOYMENT_NAME}?api-version=2024-10-01" `
--body $body
```
> π‘ **Note:** Anthropic models use `capacity: 1` (MaaS billing model), not TPM-based capacity. RAI policy is not applicable for Anthropic models.
### Monitor deployment status:
```bash
az cognitiveservices account deployment show \
--name $ACCOUNT_NAME \
--resource-group $RESOURCE_GROUP \
--deployment-name $DEPLOYMENT_NAME \
--query "properties.provisioningState" -o tsv
```
Poll until `Succeeded` or `Failed`. Timeout after 5 minutes.
**Get endpoint:**
```bash
az cognitiveservices account show \
--name $ACCOUNT_NAME \
--resource-group $RESOURCE_GROUP \
--query "properties.endpoint" -o tsv
```
On success, display deployment name, model, version, SKU, capacity, region, RAI policy, rate limits, endpoint, and Azure AI Foundry portal link.
EXAMPLES.md 2.9 KB
# Examples: preset
## Example 1: Fast Path β Current Region Has Capacity
**Scenario:** Deploy gpt-4o to project in East US, which has capacity.
**Result:** Deployed in ~45s. No region selection needed. 100K TPM default, GlobalStandard SKU.
## Example 2: Alternative Region β No Capacity in Current Region
**Scenario:** Deploy gpt-4-turbo to dev project in West US 2 (no capacity).
**Result:** Queried all regions β user selected East US 2 (120K available) β deployed in ~2 min.
## Example 3: Create New Project in Optimal Region
**Scenario:** Deploy gpt-4o-mini in Europe for data residency; no existing European project.
**Result:** Created AI Services hub + project in Sweden Central β deployed in ~4 min with 150K TPM.
## Example 4: Insufficient Quota Everywhere
**Scenario:** Deploy gpt-4 but all regions have exhausted quota.
**Result:** Graceful failure with actionable guidance:
1. Request quota increase via the [quota skill](../../../quota/quota.md)
2. List existing deployments consuming quota
3. Suggest alternative models (gpt-4o, gpt-4o-mini)
## Example 5: First-Time User β No Project
**Scenario:** Deploy gpt-4o with no existing AI Foundry project.
**Result:** Full onboarding in ~5 min β created resource group, AI Services hub, project, then deployed.
## Example 6: Deployment Name Conflict
**Scenario:** Auto-generated deployment name already exists.
**Result:** Appended random hex suffix (e.g., `-7b9e`) and retried automatically.
## Example 7: Multi-Version Model Selection
**Scenario:** Deploy "latest gpt-4o" when multiple versions exist.
**Result:** Latest stable version auto-selected. Capacity aggregated across versions.
## Example 8: Anthropic Model (claude-sonnet-4-6)
**Scenario:** Deploy claude-sonnet-4-6 (Anthropic model requiring modelProviderData).
**Result:** User prompted for industry selection β tenant country code and org name fetched automatically β deployed via ARM REST API with `modelProviderData` payload in ~2 min. Capacity set to 1 (MaaS billing).
---
## Summary of Scenarios
| Scenario | Duration | Key Features |
|----------|----------|--------------|
| **1: Fast Path** | ~45s | Current region has capacity, direct deploy |
| **2: Alt Region** | ~2m | Region selection, project switch |
| **3: New Project** | ~4m | Project creation in optimal region |
| **4: No Quota** | N/A | Graceful failure, actionable guidance |
| **5: First-Time** | ~5m | Complete onboarding |
| **6: Name Conflict** | ~1m | Auto-retry with suffix |
| **7: Multi-Version** | ~1m | Latest version auto-selected |
| **8: Anthropic** | ~2m | Industry prompt, tenant info, REST API deploy |
## Common Patterns
```
A: Quick Deploy Auth β Get Project β Check Region (β) β Deploy
B: Region Select Auth β Get Project β Region (β) β Query All β Select β Deploy
C: Full Onboarding Auth β No Projects β Create Project β Deploy
D: Error Recovery Deploy (β) β Analyze β Fix β Retry
```
SKILL.md 4.8 KB
---
name: preset
description: "Intelligently deploys Azure OpenAI models to optimal regions by analyzing capacity across all available regions. Automatically checks current region first and shows alternatives if needed. USE FOR: quick deployment, optimal region, best region, automatic region selection, fast setup, multi-region capacity check, high availability deployment, deploy to best location. DO NOT USE FOR: custom SKU selection (use customize), specific version selection (use customize), custom capacity configuration (use customize), PTU deployments (use customize)."
license: MIT
metadata:
author: Microsoft
version: "1.0.1"
---
# Deploy Model to Optimal Region
Automates intelligent Azure OpenAI model deployment by checking capacity across regions and deploying to the best available option.
## What This Skill Does
1. Verifies Azure authentication and project scope
2. Checks capacity in current project's region
3. If no capacity: analyzes all regions and shows available alternatives
4. Filters projects by selected region
5. Supports creating new projects if needed
6. Deploys model with GlobalStandard SKU
7. Monitors deployment progress
## Prerequisites
- Azure CLI installed and configured
- Active Azure subscription with Cognitive Services read/create permissions
- Azure AI Foundry project resource ID (`PROJECT_RESOURCE_ID` env var or provided interactively)
- Format: `/subscriptions/{sub-id}/resourceGroups/{rg}/providers/Microsoft.CognitiveServices/accounts/{account}/projects/{project}`
- Found in: Azure AI Foundry portal β Project β Overview β Resource ID
## Quick Workflow
### Fast Path (Current Region Has Capacity)
```
1. Check authentication β 2. Get project β 3. Check current region capacity
β 4. Deploy immediately
```
### Alternative Region Path (No Capacity)
```
1. Check authentication β 2. Get project β 3. Check current region (no capacity)
β 4. Query all regions β 5. Show alternatives β 6. Select region + project
β 7. Deploy
```
---
## Deployment Phases
| Phase | Action | Key Commands |
|-------|--------|-------------|
| 1. Verify Auth | Check Azure CLI login and subscription | `az account show`, `az login` |
| 2. Get Project | Parse `PROJECT_RESOURCE_ID` ARM ID, verify exists | `az cognitiveservices account show` |
| 3. Get Model | List available models, user selects model + version | `az cognitiveservices account list-models` |
| 4. Check Current Region | Query capacity using GlobalStandard SKU | `az rest --method GET .../modelCapacities` |
| 5. Multi-Region Query | If no local capacity, query all regions | Same capacity API without location filter |
| 6. Select Region + Project | User picks region; find or create project | `az cognitiveservices account list`, `az cognitiveservices account create` |
| 7. Deploy | Generate unique name, calculate capacity (50% available, min 50 TPM), create deployment | `az cognitiveservices account deployment create` |
For detailed step-by-step instructions, see [workflow reference](references/workflow.md).
---
## Error Handling
| Error | Symptom | Resolution |
|-------|---------|------------|
| Auth failure | `az account show` returns error | Run `az login` then `az account set --subscription <id>` |
| No quota | All regions show 0 capacity | Defer to the [quota skill](../../../quota/quota.md) for increase requests and troubleshooting; check existing deployments; try alternative models |
| Model not found | Empty capacity list | Verify model name with `az cognitiveservices account list-models`; check case sensitivity |
| Name conflict | "deployment already exists" | Append suffix to deployment name (handled automatically by `generate_deployment_name` script) |
| Region unavailable | Region doesn't support model | Select a different region from the available list |
| Permission denied | "Forbidden" or "Unauthorized" | Verify Cognitive Services Contributor role: `az role assignment list --assignee <user>` |
---
## Advanced Usage
```bash
# Custom capacity
az cognitiveservices account deployment create ... --sku-capacity <value>
# Check deployment status
az cognitiveservices account deployment show --name <acct> --resource-group <rg> --deployment-name <name> --query "{Status:properties.provisioningState}"
# Delete deployment
az cognitiveservices account deployment delete --name <acct> --resource-group <rg> --deployment-name <name>
```
## Notes
- **SKU:** GlobalStandard only β **API Version:** 2024-10-01 (GA stable)
---
## Related Skills
- **microsoft-foundry** - Parent skill for Azure AI Foundry operations
- **[quota](../../../quota/quota.md)** β For quota viewing, increase requests, and troubleshooting quota errors, defer to this skill
- **azure-quick-review** - Review Azure resources for compliance
- **azure-cost-estimation** - Estimate costs for Azure deployments
- **azure-validate** - Validate Azure infrastructure before deployment
preset-workflow.md 21.6 KB
# Preset Deployment Workflow - Detailed Implementation
This file contains the full step-by-step bash/PowerShell scripts for preset (optimal region) model deployment. Referenced from the main [SKILL.md](../SKILL.md).
---
## Phase 1: Verify Authentication
Check if user is logged into Azure CLI:
```bash
az account show --query "{Subscription:name, User:user.name}" -o table
```
**If not logged in:**
```bash
az login
```
**Verify subscription is correct:**
```bash
# List all subscriptions
az account list --query "[].[name,id,state]" -o table
# Set active subscription if needed
az account set --subscription <subscription-id>
```
---
## Phase 2: Get Current Project
**Check for PROJECT_RESOURCE_ID environment variable first:**
```bash
if [ -n "$PROJECT_RESOURCE_ID" ]; then
echo "Using project resource ID from environment: $PROJECT_RESOURCE_ID"
else
echo "PROJECT_RESOURCE_ID not set. Please provide your Azure AI Foundry project resource ID."
echo ""
echo "You can find this in:"
echo " β’ Azure AI Foundry portal β Project β Overview β Resource ID"
echo " β’ Format: /subscriptions/{sub-id}/resourceGroups/{rg}/providers/Microsoft.CognitiveServices/accounts/{account}/projects/{project}"
echo ""
echo "Example: /subscriptions/abc123.../resourceGroups/rg-prod/providers/Microsoft.CognitiveServices/accounts/my-account/projects/my-project"
echo ""
read -p "Enter project resource ID: " PROJECT_RESOURCE_ID
fi
```
**Parse the ARM resource ID to extract components:**
```bash
# Extract components from ARM resource ID
# Format: /subscriptions/{sub-id}/resourceGroups/{rg}/providers/Microsoft.CognitiveServices/accounts/{account}/projects/{project}
SUBSCRIPTION_ID=$(echo "$PROJECT_RESOURCE_ID" | sed -n 's|.*/subscriptions/\([^/]*\).*|\1|p')
RESOURCE_GROUP=$(echo "$PROJECT_RESOURCE_ID" | sed -n 's|.*/resourceGroups/\([^/]*\).*|\1|p')
ACCOUNT_NAME=$(echo "$PROJECT_RESOURCE_ID" | sed -n 's|.*/accounts/\([^/]*\)/projects.*|\1|p')
PROJECT_NAME=$(echo "$PROJECT_RESOURCE_ID" | sed -n 's|.*/projects/\([^/?]*\).*|\1|p')
if [ -z "$SUBSCRIPTION_ID" ] || [ -z "$RESOURCE_GROUP" ] || [ -z "$ACCOUNT_NAME" ] || [ -z "$PROJECT_NAME" ]; then
echo "β Invalid project resource ID format"
echo "Expected format: /subscriptions/{sub-id}/resourceGroups/{rg}/providers/Microsoft.CognitiveServices/accounts/{account}/projects/{project}"
exit 1
fi
echo "Parsed project details:"
echo " Subscription: $SUBSCRIPTION_ID"
echo " Resource Group: $RESOURCE_GROUP"
echo " Account: $ACCOUNT_NAME"
echo " Project: $PROJECT_NAME"
```
**Verify the project exists and get its region:**
```bash
# Set active subscription
az account set --subscription "$SUBSCRIPTION_ID"
# Get project details to verify it exists and extract region
PROJECT_REGION=$(az cognitiveservices account show \
--name "$PROJECT_NAME" \
--resource-group "$RESOURCE_GROUP" \
--query location -o tsv 2>/dev/null)
if [ -z "$PROJECT_REGION" ]; then
echo "β Project '$PROJECT_NAME' not found in resource group '$RESOURCE_GROUP'"
echo ""
echo "Please verify the resource ID is correct."
echo ""
echo "List available projects:"
echo " az cognitiveservices account list --query \"[?kind=='AIProject'].{Name:name, Location:location, ResourceGroup:resourceGroup}\" -o table"
exit 1
fi
echo "β Project found"
echo " Region: $PROJECT_REGION"
```
---
## Phase 3: Get Model Name
**If model name provided as skill parameter, skip this phase.**
Ask user which model to deploy. **Fetch available models dynamically** from the account rather than using a hardcoded list:
```bash
# List available models in the account
az cognitiveservices account list-models \
--name "$PROJECT_NAME" \
--resource-group "$RESOURCE_GROUP" \
--query "[].name" -o tsv | sort -u
```
Present the results to the user and let them choose, or enter a custom model name.
**Store model:**
```bash
MODEL_NAME="<selected-model>"
```
**Get model version (latest stable):**
```bash
# List available models and versions in the account
az cognitiveservices account list-models \
--name "$PROJECT_NAME" \
--resource-group "$RESOURCE_GROUP" \
--query "[?name=='$MODEL_NAME'].{Name:name, Version:version, Format:format}" \
-o table
```
**Use latest version or let user specify:**
```bash
MODEL_VERSION="<version-or-latest>"
```
**Detect model format:**
```bash
# Get model format from model catalog (e.g., OpenAI, Anthropic, Meta-Llama, Mistral, Cohere)
MODEL_FORMAT=$(az cognitiveservices account list-models \
--name "$ACCOUNT_NAME" \
--resource-group "$RESOURCE_GROUP" \
--query "[?name=='$MODEL_NAME'].format" -o tsv | head -1)
# Default to OpenAI if not found
MODEL_FORMAT=${MODEL_FORMAT:-"OpenAI"}
echo "Model format: $MODEL_FORMAT"
```
> π‘ **Model format determines the deployment path:**
> - `OpenAI` β Standard CLI deployment, TPM-based capacity, RAI policies apply
> - `Anthropic` β REST API deployment with `modelProviderData`, capacity=1, no RAI
> - All other formats (`Meta-Llama`, `Mistral`, `Cohere`, etc.) β Standard CLI deployment, capacity=1 (MaaS), no RAI
---
## Phase 4: Check Current Region Capacity
Before checking other regions, see if the current project's region has capacity:
```bash
# Query capacity for current region
CAPACITY_JSON=$(az rest --method GET \
--url "https://management.azure.com/subscriptions/$SUBSCRIPTION_ID/providers/Microsoft.CognitiveServices/locations/$PROJECT_REGION/modelCapacities?api-version=2024-10-01&modelFormat=$MODEL_FORMAT&modelName=$MODEL_NAME&modelVersion=$MODEL_VERSION")
# Extract available capacity for GlobalStandard SKU
CURRENT_CAPACITY=$(echo "$CAPACITY_JSON" | jq -r '.value[] | select(.properties.skuName=="GlobalStandard") | .properties.availableCapacity')
```
**Check result:**
```bash
if [ -n "$CURRENT_CAPACITY" ] && [ "$CURRENT_CAPACITY" -gt 0 ]; then
echo "β Current region ($PROJECT_REGION) has capacity: $CURRENT_CAPACITY TPM"
echo "Proceeding with deployment..."
# Skip to Phase 7 (Deploy)
else
echo "β Current region ($PROJECT_REGION) has no available capacity"
echo "Checking alternative regions..."
# Continue to Phase 5
fi
```
---
## Phase 5: Query Multi-Region Capacity (If Needed)
Only execute this phase if current region has no capacity.
**Query capacity across all regions:**
```bash
# Get capacity for all regions in subscription
ALL_REGIONS_JSON=$(az rest --method GET \
--url "https://management.azure.com/subscriptions/$SUBSCRIPTION_ID/providers/Microsoft.CognitiveServices/modelCapacities?api-version=2024-10-01&modelFormat=$MODEL_FORMAT&modelName=$MODEL_NAME&modelVersion=$MODEL_VERSION")
# Save to file for processing
echo "$ALL_REGIONS_JSON" > /tmp/capacity_check.json
```
**Parse and categorize regions:**
```bash
# Extract available regions (capacity > 0)
AVAILABLE_REGIONS=$(jq -r '.value[] | select(.properties.skuName=="GlobalStandard" and .properties.availableCapacity > 0) | "\(.location)|\(.properties.availableCapacity)"' /tmp/capacity_check.json)
# Extract unavailable regions (capacity = 0 or undefined)
UNAVAILABLE_REGIONS=$(jq -r '.value[] | select(.properties.skuName=="GlobalStandard" and (.properties.availableCapacity == 0 or .properties.availableCapacity == null)) | "\(.location)|0"' /tmp/capacity_check.json)
```
**Format and display regions:**
```bash
# Format capacity (e.g., 120000 -> 120K)
format_capacity() {
local capacity=$1
if [ "$capacity" -ge 1000000 ]; then
echo "$(awk "BEGIN {printf \"%.1f\", $capacity/1000000}")M TPM"
elif [ "$capacity" -ge 1000 ]; then
echo "$(awk "BEGIN {printf \"%.0f\", $capacity/1000}")K TPM"
else
echo "$capacity TPM"
fi
}
echo ""
echo "β No Capacity in Current Region"
echo ""
echo "The current project's region ($PROJECT_REGION) does not have available capacity for $MODEL_NAME."
echo ""
echo "Available Regions (with capacity):"
echo ""
# Display available regions with formatted capacity
echo "$AVAILABLE_REGIONS" | while IFS='|' read -r region capacity; do
formatted_capacity=$(format_capacity "$capacity")
# Get region display name (capitalize and format)
region_display=$(echo "$region" | sed 's/\([a-z]\)\([a-z]*\)/\U\1\L\2/g; s/\([a-z]\)\([0-9]\)/\1 \2/g')
echo " β’ $region_display - $formatted_capacity"
done
echo ""
echo "Unavailable Regions:"
echo ""
# Display unavailable regions
echo "$UNAVAILABLE_REGIONS" | while IFS='|' read -r region capacity; do
region_display=$(echo "$region" | sed 's/\([a-z]\)\([a-z]*\)/\U\1\L\2/g; s/\([a-z]\)\([0-9]\)/\1 \2/g')
if [ "$capacity" = "0" ]; then
echo " β $region_display (Insufficient quota - 0 TPM available)"
else
echo " β $region_display (Model not supported)"
fi
done
```
**Handle no capacity anywhere:**
```bash
if [ -z "$AVAILABLE_REGIONS" ]; then
echo ""
echo "β No Available Capacity in Any Region"
echo ""
echo "No regions have available capacity for $MODEL_NAME with GlobalStandard SKU."
echo ""
echo "Next Steps:"
echo "1. Request quota increase β use the quota skill (../../../quota/quota.md)"
echo ""
echo "2. Check existing deployments (may be using quota):"
echo " az cognitiveservices account deployment list \\"
echo " --name $PROJECT_NAME \\"
echo " --resource-group $RESOURCE_GROUP"
echo ""
echo "3. Consider alternative models with lower capacity requirements:"
echo " β’ gpt-4o-mini (cost-effective, lower capacity requirements)"
echo " List available models: az cognitiveservices account list-models --name \$PROJECT_NAME --resource-group \$RESOURCE_GROUP --output table"
exit 1
fi
```
---
## Phase 6: Select Region and Project
**Ask user to select region from available options.**
Example using AskUserQuestion:
- Present available regions as options
- Show capacity for each
- User selects preferred region
**Store selection:**
```bash
SELECTED_REGION="<user-selected-region>" # e.g., "eastus2"
```
**Find projects in selected region:**
```bash
PROJECTS_IN_REGION=$(az cognitiveservices account list \
--query "[?kind=='AIProject' && location=='$SELECTED_REGION'].{Name:name, ResourceGroup:resourceGroup}" \
--output json)
PROJECT_COUNT=$(echo "$PROJECTS_IN_REGION" | jq '. | length')
if [ "$PROJECT_COUNT" -eq 0 ]; then
echo "No projects found in $SELECTED_REGION"
echo "Would you like to create a new project? (yes/no)"
# If yes, continue to project creation
# If no, exit or select different region
else
echo "Projects in $SELECTED_REGION:"
echo "$PROJECTS_IN_REGION" | jq -r '.[] | " β’ \(.Name) (\(.ResourceGroup))"'
echo ""
echo "Select a project or create new project"
fi
```
**Option A: Use existing project**
```bash
PROJECT_NAME="<selected-project-name>"
RESOURCE_GROUP="<resource-group>"
```
**Option B: Create new project**
```bash
# Generate project name
USER_ALIAS=$(az account show --query user.name -o tsv | cut -d'@' -f1 | tr '.' '-')
RANDOM_SUFFIX=$(openssl rand -hex 2)
NEW_PROJECT_NAME="${USER_ALIAS}-aiproject-${RANDOM_SUFFIX}"
# Prompt for resource group
echo "Resource group for new project:"
echo " 1. Use existing resource group: $RESOURCE_GROUP"
echo " 2. Create new resource group"
# If existing resource group
NEW_RESOURCE_GROUP="$RESOURCE_GROUP"
# Create AI Services account (hub)
HUB_NAME="${NEW_PROJECT_NAME}-hub"
echo "Creating AI Services hub: $HUB_NAME in $SELECTED_REGION..."
az cognitiveservices account create \
--name "$HUB_NAME" \
--resource-group "$NEW_RESOURCE_GROUP" \
--location "$SELECTED_REGION" \
--kind "AIServices" \
--sku "S0" \
--yes
# Create AI Foundry project
echo "Creating AI Foundry project: $NEW_PROJECT_NAME..."
az cognitiveservices account create \
--name "$NEW_PROJECT_NAME" \
--resource-group "$NEW_RESOURCE_GROUP" \
--location "$SELECTED_REGION" \
--kind "AIProject" \
--sku "S0" \
--yes
echo "β Project created successfully"
PROJECT_NAME="$NEW_PROJECT_NAME"
RESOURCE_GROUP="$NEW_RESOURCE_GROUP"
```
---
## Phase 7: Deploy Model
**Generate unique deployment name:**
The deployment name should match the model name (e.g., "gpt-4o"), but if a deployment with that name already exists, append a numeric suffix (e.g., "gpt-4o-2", "gpt-4o-3"). This follows the same UX pattern as Azure AI Foundry portal.
Use the `generate_deployment_name` script to check existing deployments and generate a unique name:
*Bash version:*
```bash
DEPLOYMENT_NAME=$(bash scripts/generate_deployment_name.sh \
"$ACCOUNT_NAME" \
"$RESOURCE_GROUP" \
"$MODEL_NAME")
echo "Generated deployment name: $DEPLOYMENT_NAME"
```
*PowerShell version:*
```powershell
$DEPLOYMENT_NAME = & .\scripts\generate_deployment_name.ps1 `
-AccountName $ACCOUNT_NAME `
-ResourceGroup $RESOURCE_GROUP `
-ModelName $MODEL_NAME
Write-Host "Generated deployment name: $DEPLOYMENT_NAME"
```
**Calculate deployment capacity:**
Follow UX capacity calculation logic. For OpenAI models, use 50% of available capacity (minimum 50 TPM). For all other models (MaaS), capacity is always 1:
```bash
if [ "$MODEL_FORMAT" = "OpenAI" ]; then
# OpenAI models: TPM-based capacity (50% of available, minimum 50)
SELECTED_CAPACITY=$(echo "$ALL_REGIONS_JSON" | jq -r ".value[] | select(.location==\"$SELECTED_REGION\" and .properties.skuName==\"GlobalStandard\") | .properties.availableCapacity")
if [ "$SELECTED_CAPACITY" -gt 50 ]; then
DEPLOY_CAPACITY=$((SELECTED_CAPACITY / 2))
if [ "$DEPLOY_CAPACITY" -lt 50 ]; then
DEPLOY_CAPACITY=50
fi
else
DEPLOY_CAPACITY=$SELECTED_CAPACITY
fi
echo "Deploying with capacity: $DEPLOY_CAPACITY TPM (50% of available: $SELECTED_CAPACITY TPM)"
else
# Non-OpenAI models (MaaS): capacity is always 1
DEPLOY_CAPACITY=1
echo "MaaS model β deploying with capacity: 1 (pay-per-token billing)"
fi
```
### If MODEL_FORMAT is NOT "Anthropic" β Standard CLI Deployment
> π‘ **Note:** The Azure CLI supports all non-Anthropic model formats directly.
*Bash version:*
```bash
echo "Creating deployment..."
az cognitiveservices account deployment create \
--name "$ACCOUNT_NAME" \
--resource-group "$RESOURCE_GROUP" \
--deployment-name "$DEPLOYMENT_NAME" \
--model-name "$MODEL_NAME" \
--model-version "$MODEL_VERSION" \
--model-format "$MODEL_FORMAT" \
--sku-name "GlobalStandard" \
--sku-capacity "$DEPLOY_CAPACITY"
```
*PowerShell version:*
```powershell
Write-Host "Creating deployment..."
az cognitiveservices account deployment create `
--name $ACCOUNT_NAME `
--resource-group $RESOURCE_GROUP `
--deployment-name $DEPLOYMENT_NAME `
--model-name $MODEL_NAME `
--model-version $MODEL_VERSION `
--model-format $MODEL_FORMAT `
--sku-name "GlobalStandard" `
--sku-capacity $DEPLOY_CAPACITY
```
> π‘ **Note:** For non-OpenAI MaaS models (Meta-Llama, Mistral, Cohere, etc.), `$DEPLOY_CAPACITY` is `1` (set in capacity calculation above).
### If MODEL_FORMAT is "Anthropic" β REST API Deployment with modelProviderData
The Azure CLI does not support `--model-provider-data`. You must use the ARM REST API directly.
**Step 1: Prompt user to select industry**
Present the following list and ask the user to choose one:
```
1. None (API value: none)
2. Biotechnology (API value: biotechnology)
3. Consulting (API value: consulting)
4. Education (API value: education)
5. Finance (API value: finance)
6. Food & Beverage (API value: food_and_beverage)
7. Government (API value: government)
8. Healthcare (API value: healthcare)
9. Insurance (API value: insurance)
10. Law (API value: law)
11. Manufacturing (API value: manufacturing)
12. Media (API value: media)
13. Nonprofit (API value: nonprofit)
14. Technology (API value: technology)
15. Telecommunications (API value: telecommunications)
16. Sport & Recreation (API value: sport_and_recreation)
17. Real Estate (API value: real_estate)
18. Retail (API value: retail)
19. Other (API value: other)
```
> β οΈ **Do NOT pick a default industry or hardcode a value. Always ask the user.** This is required by Anthropic's terms of service. The industry list is static β there is no REST API that provides it.
Store selection as `SELECTED_INDUSTRY` (use the API value, e.g., `technology`).
**Step 2: Fetch tenant info (country code and organization name)**
```bash
TENANT_INFO=$(az rest --method GET \
--url "https://management.azure.com/tenants?api-version=2024-11-01" \
--query "value[0].{countryCode:countryCode, displayName:displayName}" -o json)
COUNTRY_CODE=$(echo "$TENANT_INFO" | jq -r '.countryCode')
ORG_NAME=$(echo "$TENANT_INFO" | jq -r '.displayName')
```
*PowerShell version:*
```powershell
$tenantInfo = az rest --method GET `
--url "https://management.azure.com/tenants?api-version=2024-11-01" `
--query "value[0].{countryCode:countryCode, displayName:displayName}" -o json | ConvertFrom-Json
$countryCode = $tenantInfo.countryCode
$orgName = $tenantInfo.displayName
```
**Step 3: Deploy via ARM REST API**
*Bash version:*
```bash
echo "Creating Anthropic model deployment via REST API..."
az rest --method PUT \
--url "https://management.azure.com/subscriptions/$SUBSCRIPTION_ID/resourceGroups/$RESOURCE_GROUP/providers/Microsoft.CognitiveServices/accounts/$ACCOUNT_NAME/deployments/$DEPLOYMENT_NAME?api-version=2024-10-01" \
--body "{
\"sku\": {
\"name\": \"GlobalStandard\",
\"capacity\": 1
},
\"properties\": {
\"model\": {
\"format\": \"Anthropic\",
\"name\": \"$MODEL_NAME\",
\"version\": \"$MODEL_VERSION\"
},
\"modelProviderData\": {
\"industry\": \"$SELECTED_INDUSTRY\",
\"countryCode\": \"$COUNTRY_CODE\",
\"organizationName\": \"$ORG_NAME\"
}
}
}"
```
*PowerShell version:*
```powershell
Write-Host "Creating Anthropic model deployment via REST API..."
$body = @{
sku = @{
name = "GlobalStandard"
capacity = 1
}
properties = @{
model = @{
format = "Anthropic"
name = $MODEL_NAME
version = $MODEL_VERSION
}
modelProviderData = @{
industry = $SELECTED_INDUSTRY
countryCode = $countryCode
organizationName = $orgName
}
}
} | ConvertTo-Json -Depth 5
az rest --method PUT `
--url "https://management.azure.com/subscriptions/$SUBSCRIPTION_ID/resourceGroups/$RESOURCE_GROUP/providers/Microsoft.CognitiveServices/accounts/$ACCOUNT_NAME/deployments/${DEPLOYMENT_NAME}?api-version=2024-10-01" `
--body $body
```
> π‘ **Note:** Anthropic models use `capacity: 1` (MaaS billing model), not TPM-based capacity.
**Monitor deployment progress:**
```bash
echo "Monitoring deployment status..."
MAX_WAIT=300 # 5 minutes
ELAPSED=0
INTERVAL=10
while [ $ELAPSED -lt $MAX_WAIT ]; do
STATUS=$(az cognitiveservices account deployment show \
--name "$ACCOUNT_NAME" \
--resource-group "$RESOURCE_GROUP" \
--deployment-name "$DEPLOYMENT_NAME" \
--query "properties.provisioningState" -o tsv 2>/dev/null)
case "$STATUS" in
"Succeeded")
echo "β Deployment successful!"
break
;;
"Failed")
echo "β Deployment failed"
# Get error details
az cognitiveservices account deployment show \
--name "$ACCOUNT_NAME" \
--resource-group "$RESOURCE_GROUP" \
--deployment-name "$DEPLOYMENT_NAME" \
--query "properties"
exit 1
;;
"Creating"|"Accepted"|"Running")
echo "Status: $STATUS... (${ELAPSED}s elapsed)"
sleep $INTERVAL
ELAPSED=$((ELAPSED + INTERVAL))
;;
*)
echo "Unknown status: $STATUS"
sleep $INTERVAL
ELAPSED=$((ELAPSED + INTERVAL))
;;
esac
done
if [ $ELAPSED -ge $MAX_WAIT ]; then
echo "β Deployment timeout after ${MAX_WAIT}s"
echo "Check status manually:"
echo " az cognitiveservices account deployment show \\"
echo " --name $ACCOUNT_NAME \\"
echo " --resource-group $RESOURCE_GROUP \\"
echo " --deployment-name $DEPLOYMENT_NAME"
exit 1
fi
```
---
## Phase 8: Display Deployment Details
**Show deployment information:**
```bash
echo ""
echo "βββββββββββββββββββββββββββββββββββββββββββ"
echo "β Deployment Successful!"
echo "βββββββββββββββββββββββββββββββββββββββββββ"
echo ""
# Get endpoint information
ENDPOINT=$(az cognitiveservices account show \
--name "$ACCOUNT_NAME" \
--resource-group "$RESOURCE_GROUP" \
--query "properties.endpoint" -o tsv)
# Get deployment details
DEPLOYMENT_INFO=$(az cognitiveservices account deployment show \
--name "$ACCOUNT_NAME" \
--resource-group "$RESOURCE_GROUP" \
--deployment-name "$DEPLOYMENT_NAME" \
--query "properties.model")
echo "Deployment Name: $DEPLOYMENT_NAME"
echo "Model: $MODEL_NAME"
echo "Version: $MODEL_VERSION"
echo "Region: $SELECTED_REGION"
echo "SKU: GlobalStandard"
echo "Capacity: $(format_capacity $DEPLOY_CAPACITY)"
echo "Endpoint: $ENDPOINT"
echo ""
# Generate direct link to deployment in Azure AI Foundry portal
DEPLOYMENT_URL=$(bash "$(dirname "$0")/scripts/generate_deployment_url.sh" \
--subscription "$SUBSCRIPTION_ID" \
--resource-group "$RESOURCE_GROUP" \
--foundry-resource "$ACCOUNT_NAME" \
--project "$PROJECT_NAME" \
--deployment "$DEPLOYMENT_NAME")
echo "π View in Azure AI Foundry Portal:"
echo ""
echo "$DEPLOYMENT_URL"
echo ""
echo "βββββββββββββββββββββββββββββββββββββββββββ"
echo ""
echo "Test your deployment:"
echo ""
echo "# View deployment details"
echo "az cognitiveservices account deployment show \\"
echo " --name $ACCOUNT_NAME \\"
echo " --resource-group $RESOURCE_GROUP \\"
echo " --deployment-name $DEPLOYMENT_NAME"
echo ""
echo "# List all deployments"
echo "az cognitiveservices account deployment list \\"
echo " --name $ACCOUNT_NAME \\"
echo " --resource-group $RESOURCE_GROUP \\"
echo " --output table"
echo ""
echo "Next steps:"
echo "β’ Click the link above to test in Azure AI Foundry playground"
echo "β’ Integrate into your application"
echo "β’ Set up monitoring and alerts"
```
workflow.md 5.6 KB
# Preset Deployment Workflow β Step-by-Step
Condensed implementation reference for preset (optimal region) model deployment. See [SKILL.md](../SKILL.md) for overview.
**Table of Contents:** [Phase 1: Verify Authentication](#phase-1-verify-authentication) Β· [Phase 2: Get Current Project](#phase-2-get-current-project) Β· [Phase 3: Get Model Name](#phase-3-get-model-name) Β· [Phase 4: Check Current Region Capacity](#phase-4-check-current-region-capacity) Β· [Phase 5: Query Multi-Region Capacity](#phase-5-query-multi-region-capacity) Β· [Phase 6: Select Region and Project](#phase-6-select-region-and-project) Β· [Phase 7: Deploy Model](#phase-7-deploy-model)
---
## Phase 1: Verify Authentication
```bash
az account show --query "{Subscription:name, User:user.name}" -o table
```
If not logged in: `az login`
Switch subscription:
```bash
az account list --query "[].[name,id,state]" -o table
az account set --subscription <subscription-id>
```
---
## Phase 2: Get Current Project
Read `PROJECT_RESOURCE_ID` from env or prompt user. Format:
`/subscriptions/{sub-id}/resourceGroups/{rg}/providers/Microsoft.CognitiveServices/accounts/{account}/projects/{project}`
Parse ARM ID components:
```bash
SUBSCRIPTION_ID=$(echo "$PROJECT_RESOURCE_ID" | sed -n 's|.*/subscriptions/\([^/]*\).*|\1|p')
RESOURCE_GROUP=$(echo "$PROJECT_RESOURCE_ID" | sed -n 's|.*/resourceGroups/\([^/]*\).*|\1|p')
ACCOUNT_NAME=$(echo "$PROJECT_RESOURCE_ID" | sed -n 's|.*/accounts/\([^/]*\)/projects.*|\1|p')
PROJECT_NAME=$(echo "$PROJECT_RESOURCE_ID" | sed -n 's|.*/projects/\([^/?]*\).*|\1|p')
```
Verify project exists and get region:
```bash
az account set --subscription "$SUBSCRIPTION_ID"
PROJECT_REGION=$(az cognitiveservices account show \
--name "$PROJECT_NAME" \
--resource-group "$RESOURCE_GROUP" \
--query location -o tsv)
```
---
## Phase 3: Get Model Name
If model not provided as parameter, list available models:
```bash
az cognitiveservices account list-models \
--name "$PROJECT_NAME" \
--resource-group "$RESOURCE_GROUP" \
--query "[].name" -o tsv | sort -u
```
Get versions for selected model:
```bash
az cognitiveservices account list-models \
--name "$PROJECT_NAME" \
--resource-group "$RESOURCE_GROUP" \
--query "[?name=='$MODEL_NAME'].{Name:name, Version:version, Format:format}" \
-o table
```
---
## Phase 4: Check Current Region Capacity
```bash
CAPACITY_JSON=$(az rest --method GET \
--url "https://management.azure.com/subscriptions/$SUBSCRIPTION_ID/providers/Microsoft.CognitiveServices/locations/$PROJECT_REGION/modelCapacities?api-version=2024-10-01&modelFormat=OpenAI&modelName=$MODEL_NAME&modelVersion=$MODEL_VERSION")
CURRENT_CAPACITY=$(echo "$CAPACITY_JSON" | jq -r '.value[] | select(.properties.skuName=="GlobalStandard") | .properties.availableCapacity')
```
If `CURRENT_CAPACITY > 0` β skip to Phase 7. Otherwise continue to Phase 5.
---
## Phase 5: Query Multi-Region Capacity
```bash
ALL_REGIONS_JSON=$(az rest --method GET \
--url "https://management.azure.com/subscriptions/$SUBSCRIPTION_ID/providers/Microsoft.CognitiveServices/modelCapacities?api-version=2024-10-01&modelFormat=OpenAI&modelName=$MODEL_NAME&modelVersion=$MODEL_VERSION")
```
Extract available regions (capacity > 0):
```bash
AVAILABLE_REGIONS=$(echo "$ALL_REGIONS_JSON" | jq -r '.value[] | select(.properties.skuName=="GlobalStandard" and .properties.availableCapacity > 0) | "\(.location)|\(.properties.availableCapacity)"')
```
Extract unavailable regions:
```bash
UNAVAILABLE_REGIONS=$(echo "$ALL_REGIONS_JSON" | jq -r '.value[] | select(.properties.skuName=="GlobalStandard" and (.properties.availableCapacity == 0 or .properties.availableCapacity == null)) | "\(.location)|0"')
```
If no regions have capacity, defer to the [quota skill](../../../../quota/quota.md) for increase requests. Suggest checking existing deployments or trying alternative models like `gpt-4o-mini`.
---
## Phase 6: Select Region and Project
Present available regions to user. Store selection as `SELECTED_REGION`.
Find projects in selected region:
```bash
PROJECTS_IN_REGION=$(az cognitiveservices account list \
--query "[?kind=='AIProject' && location=='$SELECTED_REGION'].{Name:name, ResourceGroup:resourceGroup}" \
--output json)
```
**If no projects exist β create new:**
```bash
az cognitiveservices account create \
--name "$HUB_NAME" \
--resource-group "$RESOURCE_GROUP" \
--location "$SELECTED_REGION" \
--kind "AIServices" \
--sku "S0" --yes
az cognitiveservices account create \
--name "$NEW_PROJECT_NAME" \
--resource-group "$RESOURCE_GROUP" \
--location "$SELECTED_REGION" \
--kind "AIProject" \
--sku "S0" --yes
```
---
## Phase 7: Deploy Model
Generate unique deployment name using `scripts/generate_deployment_name.sh`:
```bash
DEPLOYMENT_NAME=$(bash scripts/generate_deployment_name.sh "$ACCOUNT_NAME" "$RESOURCE_GROUP" "$MODEL_NAME")
```
Calculate capacity β 50% of available, minimum 50 TPM:
```bash
SELECTED_CAPACITY=$(echo "$ALL_REGIONS_JSON" | jq -r ".value[] | select(.location==\"$SELECTED_REGION\" and .properties.skuName==\"GlobalStandard\") | .properties.availableCapacity")
DEPLOY_CAPACITY=$(( SELECTED_CAPACITY / 2 ))
[ "$DEPLOY_CAPACITY" -lt 50 ] && DEPLOY_CAPACITY=50
```
Create deployment:
```bash
az cognitiveservices account deployment create \
--name "$ACCOUNT_NAME" \
--resource-group "$RESOURCE_GROUP" \
--deployment-name "$DEPLOYMENT_NAME" \
--model-name "$MODEL_NAME" \
--model-version "$MODEL_VERSION" \
--model-format "OpenAI" \
--sku-name "GlobalStandard" \
--sku-capacity "$DEPLOY_CAPACITY"
```
Monitor with `az cognitiveservices account deployment show ... --query "properties.provisioningState"` until `Succeeded` or `Failed`.
generate_deployment_url.ps1 2.3 KB
# Generate Azure AI Foundry portal URL for a model deployment
# This script creates a direct clickable link to view a deployment in the Azure AI Foundry portal
#
# NOTE: The encoding scheme for the subscription ID portion is proprietary to Azure AI Foundry.
# This script uses a GUID byte encoding approach, but may need adjustment based on the actual encoding used.
param(
[Parameter(Mandatory=$true)]
[string]$SubscriptionId,
[Parameter(Mandatory=$true)]
[string]$ResourceGroup,
[Parameter(Mandatory=$true)]
[string]$FoundryResource,
[Parameter(Mandatory=$true)]
[string]$ProjectName,
[Parameter(Mandatory=$true)]
[string]$DeploymentName
)
function Get-SubscriptionIdEncoded {
param([string]$SubscriptionId)
# Parse GUID and convert to bytes in string order (big-endian)
# Not using ToByteArray() because it uses little-endian format
$guidString = $SubscriptionId.Replace('-', '')
$bytes = New-Object byte[] 16
for ($i = 0; $i -lt 16; $i++) {
$bytes[$i] = [Convert]::ToByte($guidString.Substring($i * 2, 2), 16)
}
# Encode as base64url
$base64 = [Convert]::ToBase64String($bytes)
$urlSafe = $base64.Replace('+', '-').Replace('/', '_').TrimEnd('=')
return $urlSafe
}
function Get-FoundryDeploymentUrl {
param(
[string]$SubscriptionId,
[string]$ResourceGroup,
[string]$FoundryResource,
[string]$ProjectName,
[string]$DeploymentName
)
# Encode subscription ID
$encodedSubId = Get-SubscriptionIdEncoded -SubscriptionId $SubscriptionId
# Build the encoded resource path
# Format: {encoded-sub-id},{resource-group},,{foundry-resource},{project-name}
# Note: Two commas between resource-group and foundry-resource
$encodedPath = "$encodedSubId,$ResourceGroup,,$FoundryResource,$ProjectName"
# Build the full URL
$baseUrl = "https://ai.azure.com/nextgen/r/"
$deploymentPath = "/build/models/deployments/$DeploymentName/details"
return "$baseUrl$encodedPath$deploymentPath"
}
# Generate and output the URL
$url = Get-FoundryDeploymentUrl `
-SubscriptionId $SubscriptionId `
-ResourceGroup $ResourceGroup `
-FoundryResource $FoundryResource `
-ProjectName $ProjectName `
-DeploymentName $DeploymentName
Write-Output $url
generate_deployment_url.sh 2.6 KB
#!/bin/bash
# Generate Azure AI Foundry portal URL for a model deployment
# This script creates a direct clickable link to view a deployment in the Azure AI Foundry portal
set -e
# Function to display usage
usage() {
cat << EOF
Usage: $0 --subscription SUBSCRIPTION_ID --resource-group RESOURCE_GROUP \\
--foundry-resource FOUNDRY_RESOURCE --project PROJECT_NAME \\
--deployment DEPLOYMENT_NAME
Generate Azure AI Foundry deployment URL
Required arguments:
--subscription Azure subscription ID (GUID)
--resource-group Resource group name
--foundry-resource Foundry resource (account) name
--project Project name
--deployment Deployment name
Example:
$0 --subscription d5320f9a-73da-4a74-b639-83efebc7bb6f \\
--resource-group bani-host \\
--foundry-resource banide-host-resource \\
--project banide-host \\
--deployment text-embedding-ada-002
EOF
exit 1
}
# Parse command line arguments
while [[ $# -gt 0 ]]; do
case $1 in
--subscription)
SUBSCRIPTION_ID="$2"
shift 2
;;
--resource-group)
RESOURCE_GROUP="$2"
shift 2
;;
--foundry-resource)
FOUNDRY_RESOURCE="$2"
shift 2
;;
--project)
PROJECT_NAME="$2"
shift 2
;;
--deployment)
DEPLOYMENT_NAME="$2"
shift 2
;;
-h|--help)
usage
;;
*)
echo "Unknown option: $1"
usage
;;
esac
done
# Validate required arguments
if [ -z "$SUBSCRIPTION_ID" ] || [ -z "$RESOURCE_GROUP" ] || [ -z "$FOUNDRY_RESOURCE" ] || \
[ -z "$PROJECT_NAME" ] || [ -z "$DEPLOYMENT_NAME" ]; then
echo "Error: Missing required arguments"
usage
fi
# Convert subscription GUID to bytes (big-endian/string order) and encode as base64url
# Remove hyphens from GUID
GUID_HEX=$(echo "$SUBSCRIPTION_ID" | tr -d '-')
# Convert hex string to bytes and base64 encode
# Using xxd to convert hex to binary, then base64 encode
ENCODED_SUB=$(echo "$GUID_HEX" | xxd -r -p | base64 | tr '+' '-' | tr '/' '_' | tr -d '=')
# Build the encoded resource path
# Format: {encoded-sub-id},{resource-group},,{foundry-resource},{project-name}
# Note: Two commas between resource-group and foundry-resource
ENCODED_PATH="${ENCODED_SUB},${RESOURCE_GROUP},,${FOUNDRY_RESOURCE},${PROJECT_NAME}"
# Build the full URL
BASE_URL="https://ai.azure.com/nextgen/r/"
DEPLOYMENT_PATH="/build/models/deployments/${DEPLOYMENT_NAME}/details"
echo "${BASE_URL}${ENCODED_PATH}${DEPLOYMENT_PATH}"
connections.md 3.1 KB
# Foundry Project Connections
Connections authenticate and link external resources to a Foundry project. Many agent tools (Azure AI Search, Bing Grounding, MCP) require a project connection before use.
## Managing Connections via MCP
Use the Foundry MCP server for all connection operations. The MCP tools handle authentication, validation, and project scoping automatically.
| Operation | MCP Tool | Description |
|-----------|----------|-------------|
| List all connections | `project_connection_list` | Lists project connections and can filter by category or target |
| Get connection details | `project_connection_get` | Retrieves a specific connection by `connectionName` |
| Create a connection | `project_connection_create` | Creates or replaces a project connection to an external resource |
| Update a connection | `project_connection_update` | Updates auth, category, target, or expiry on an existing connection |
| Delete a connection | `project_connection_delete` | Removes a connection from the project by name |
| List supported categories/auth types | `project_connection_list_metadata` | Lists valid connection categories and auth types before create/update |
> π‘ **Tip:** Use `project_connection_get` or `project_connection_list` to resolve the connection name and full connection resource ID before configuring agent tools that require `project_connection_id`.
## Create Connection via Portal
1. Open [Microsoft Foundry portal](https://ai.azure.com)
2. Navigate to **Operate** β **Admin** β select your project
3. Select **Add connection** β choose service type
4. Browse for resource, select auth method, click **Add connection**
## Connection ID Format
For REST and TypeScript samples, the full connection ID format is:
```
/subscriptions/{subId}/resourceGroups/{rg}/providers/Microsoft.CognitiveServices/accounts/{account}/projects/{project}/connections/{connectionName}
```
Python and C# SDKs resolve this automatically from the connection name.
## Common Connection Types
| Type | Resource | Used By |
|------|----------|---------|
| `azure_ai_search` | Azure AI Search | AI Search tool |
| `bing` | Grounding with Bing Search | Bing grounding tool |
| `bing_custom_search` | Grounding with Bing Custom Search | Bing Custom Search tool |
| `api_key` | Any API-key resource | MCP servers, custom tools |
| `azure_openai` | Azure OpenAI | Model access |
| `AzureStorageAccount` | Azure Blob Storage | Dataset upload via `evaluation_dataset_create` |
## RBAC for Connection Management
| Role | Scope | Permission |
|------|-------|------------|
| **Azure AI Project Manager** | Project | Create/manage project connections |
| **Contributor** or **Owner** | Subscription/RG | Create Bing/Search resources, get keys |
## Troubleshooting
| Error | Cause | Fix |
|-------|-------|-----|
| `Connection not found` | Name mismatch or wrong project | Use `project_connection_list` to find the correct `connectionName` |
| `Unauthorized` creating connection | Missing Azure AI Project Manager role | Assign role on the Foundry project |
| `Invalid connection ID format` | Using name instead of full resource ID | Use `project_connection_get` to resolve the full ID |
create-foundry-project.md 5.5 KB
---
name: foundry-create-project
description: |
Create a new Azure AI Foundry project using Azure Developer CLI (azd) to provision infrastructure for hosting AI agents and models.
USE FOR: create Foundry project, new AI Foundry project, set up Foundry, azd init Foundry, provision Foundry infrastructure, onboard to Foundry, create Azure AI project, set up AI project.
DO NOT USE FOR: deploying agents to existing projects (use agent/deploy), creating agent code (use agent/create), deploying AI models from catalog (use microsoft-foundry main skill), Azure Functions (use azure-functions).
allowed-tools: Read, Write, Bash, AskUserQuestion
---
# Create Azure AI Foundry Project
Create a new Azure AI Foundry project using azd. Provisions: Foundry account, project, Application Insights, managed identity, and RBAC permissions. Optionally enables hosted agents (capability host + Container Registry).
**Table of Contents:** [Prerequisites](#prerequisites) Β· [Workflow](#workflow) Β· [Best Practices](#best-practices) Β· [Troubleshooting](#troubleshooting) Β· [Related Skills](#related-skills) Β· [Resources](#resources)
## Prerequisites
Run checks in order. STOP on any failure and resolve before proceeding.
**1. Azure CLI** β `az version` β expects version output. If missing: https://aka.ms/installazurecli
**2. Azure login & subscription:**
```bash
az account show --query "{Name:name, SubscriptionId:id, State:state}" -o table
```
If not logged in, run `az login`. If no active subscription: https://azure.microsoft.com/free/ β STOP.
If multiple subscriptions, ask which to use, then `az account set --subscription "<id>"`.
**3. Role permissions:**
```bash
az role assignment list --assignee "$(az ad signed-in-user show --query id -o tsv)" --query "[?contains(roleDefinitionName, 'Owner') || contains(roleDefinitionName, 'Contributor') || contains(roleDefinitionName, 'Azure AI')].{Role:roleDefinitionName, Scope:scope}" -o table
```
Requires Owner, Contributor, or Azure AI Owner. If insufficient β STOP, request elevated access from admin.
**4. Azure Developer CLI** β `azd version`. If missing: https://aka.ms/azure-dev/install
## Workflow
### Step 1: Verify azd login
```bash
azd auth login --check-status
```
If not logged in, run `azd auth login` and complete browser auth.
### Step 2: Ask User for Project Details
Use AskUserQuestion for:
1. **Project name** β used as azd environment name and resource group (`rg-<name>`). Must contain only alphanumeric characters and hyphens. Examples: `my-ai-project`, `dev-agents`
2. **Azure location** (optional) β defaults to North Central US (required for hosted agents preview)
3. **Enable hosted agents?** (yes/no) β provisions a capability host and Container Registry for deploying hosted agents. Defaults to no.
### Step 3: Create Directory and Initialize
```bash
mkdir "<project-name>" && cd "<project-name>"
azd init -t https://github.com/Azure-Samples/azd-ai-starter-basic -e <project-name> --no-prompt
```
- `-t` β Azure AI starter template (Foundry infrastructure)
- `-e` β environment name
- `--no-prompt` β non-interactive, use defaults
- **IMPORTANT:** `azd init` requires an empty directory
If user specified a non-default location:
```bash
azd config set defaults.location <location>
```
If user chose to enable hosted agents:
```bash
azd env set ENABLE_HOSTED_AGENTS true
```
This provisions a capability host (`capabilityHosts/agents`) on the Foundry account and auto-adds an Azure Container Registry for hosted agent deployments.
### Step 4: Provision Infrastructure
```bash
azd provision --no-prompt
```
Takes 5β10 minutes. Creates resource group, Foundry account/project, Application Insights, managed identity, and RBAC roles. If hosted agents enabled, also creates Container Registry and capability host.
### Step 5: Retrieve Project Details
```bash
azd env get-values
```
Capture `AZURE_AI_PROJECT_ID`, `AZURE_AI_PROJECT_ENDPOINT`, and `AZURE_RESOURCE_GROUP`. Direct user to verify at https://ai.azure.com.
### Step 6: Next Steps
- Deploy an agent β `agent/deploy` skill
- Browse models β `foundry_models_list` MCP tool
- Manage project β https://ai.azure.com
## Best Practices
- Use North Central US for hosted agents (preview requirement)
- Name must be alphanumeric + hyphens only β no spaces, underscores, or special characters
- Delete unused projects with `azd down` to avoid ongoing costs
- `azd down` deletes ALL resources β Foundry account, agents, models, Container Registry, and Application Insights data
- `azd provision` is safe to re-run on failure
## Troubleshooting
| Problem | Solution |
|---------|----------|
| `azd: command not found` | Install from https://aka.ms/azure-dev/install |
| `ERROR: Failed to authenticate` | Run `azd auth login`; verify subscription with `az account list` |
| `environment name '' is invalid` | Name must be alphanumeric + hyphens only |
| `ERROR: Insufficient permissions` | Request Contributor or Azure AI Owner role from admin |
| Region not supported for hosted agents | Use `azd config set defaults.location northcentralus` |
| Provisioning timeout | Check region availability, verify connectivity, retry `azd provision` |
## Related Skills
- **agent/deploy** β Deploy agents to the created project
- **agent/create** β Create a new agent for deployment
## Resources
- [Azure Developer CLI](https://aka.ms/azure-dev/install) Β· [AI Foundry Portal](https://ai.azure.com) Β· [Foundry Docs](https://learn.microsoft.com/azure/ai-foundry/) Β· [azd-ai-starter-basic template](https://github.com/Azure-Samples/azd-ai-starter-basic)
quota.md 8.9 KB
# Microsoft Foundry Quota Management
Quota and capacity management for Microsoft Foundry. Quotas are **subscription + region** level.
> β οΈ **Important:** This is the **authoritative skill** for all Foundry quota operations. When a user asks about quota, capacity, TPM, PTU, quota errors, or deployment limits, **always invoke this skill** rather than using MCP tools (azure-quota, azure-documentation, azure-foundry) directly. This skill provides structured workflows and error handling that direct tool calls lack.
> **Important:** All quota operations are **control plane (management)** operations. Use **Azure CLI commands** (`az cognitiveservices`, `az rest`, `az ai`) as the primary method.
## Quota Types
| Type | Description |
|------|-------------|
| **TPM** | Tokens Per Minute, pay-per-token, subject to rate limits |
| **PTU** | Provisioned Throughput Units, monthly commitment, no rate limits |
| **Region** | Max capacity per region, shared across subscription |
| **Slots** | 10-20 deployment slots per resource |
**When to use PTU:** Consistent high-volume production workloads where monthly commitment is cost-effective.
---
Use this sub-skill when the user needs to:
- **View quota usage** β check current TPM/PTU allocation and available capacity
- **Check quota limits** β show quota limits for a subscription, region, or model
- **Find optimal regions** β compare quota availability across regions for deployment
- **Plan deployments** β verify sufficient quota before deploying models
- **Request quota increases** β navigate quota increase process through Azure Portal
- **Troubleshoot deployment failures** β diagnose QuotaExceeded, InsufficientQuota, DeploymentLimitReached, 429 rate limit errors
- **Optimize allocation** β monitor and consolidate quota across deployments
- **Monitor quota across deployments** β track capacity by model and region
- **Explain quota concepts** β explain TPM, PTU, capacity units, regional quotas
- **Free up quota** β identify and delete unused deployments
**Key Points:**
1. Isolated by region (East US β West US)
2. Regional capacity varies by model
3. Multi-region enables failover and load distribution
4. Quota requests specify target region
See [detailed guide](./references/workflows.md#regional-quota).
---
## Core Workflows
### 1. Check Regional Quota
```bash
subId=$(az account show --query id -o tsv)
az rest --method get \
--url "https://management.azure.com/subscriptions/$subId/providers/Microsoft.CognitiveServices/locations/eastus/usages?api-version=2023-05-01" \
--query "value[?contains(name.value,'OpenAI')].{Model:name.value, Used:currentValue, Limit:limit}" -o table
```
**Output interpretation:**
- **Used**: Current TPM consumed (10000 = 10K TPM)
- **Limit**: Maximum TPM quota (15000 = 15K TPM)
- **Available**: Limit - Used (5K TPM available)
Change region: `eastus`, `eastus2`, `westus`, `westus2`, `swedencentral`, `uksouth`.
---
### 2. Find Best Region for Deployment
Check specific regions for available quota:
```bash
subId=$(az account show --query id -o tsv)
region="eastus"
az rest --method get \
--url "https://management.azure.com/subscriptions/$subId/providers/Microsoft.CognitiveServices/locations/$region/usages?api-version=2023-05-01" \
--query "value[?name.value=='OpenAI.Standard.gpt-4o'].{Model:name.value, Used:currentValue, Limit:limit, Available:(limit-currentValue)}" -o table
```
See [workflows reference](./references/workflows.md#multi-region-check) for multi-region comparison.
---
### 3. Check Quota Before Deployment
Verify available quota for your target model:
```bash
subId=$(az account show --query id -o tsv)
region="eastus"
model="OpenAI.Standard.gpt-4o"
az rest --method get \
--url "https://management.azure.com/subscriptions/$subId/providers/Microsoft.CognitiveServices/locations/$region/usages?api-version=2023-05-01" \
--query "value[?name.value=='$model'].{Model:name.value, Used:currentValue, Limit:limit, Available:(limit-currentValue)}" -o table
```
- **Available > 0**: Yes, you have quota
- **Available = 0**: Delete unused deployments or try different region
---
### 4. Monitor Quota by Model
Show quota allocation grouped by model:
```bash
subId=$(az account show --query id -o tsv)
region="eastus"
az rest --method get \
--url "https://management.azure.com/subscriptions/$subId/providers/Microsoft.CognitiveServices/locations/$region/usages?api-version=2023-05-01" \
--query "value[?contains(name.value,'OpenAI')].{Model:name.value, Used:currentValue, Limit:limit, Available:(limit-currentValue)}" -o table
```
Shows aggregate usage across ALL deployments by model type.
**Optional:** List individual deployments:
- **Azure MCP tool**: Use `model_deployment_get` to query deployments in a Foundry project
- **Azure CLI**:
```bash
az cognitiveservices account list --query "[?kind=='AIServices'].{Name:name,RG:resourceGroup}" -o table
az cognitiveservices account deployment list --name <resource> --resource-group <rg> \
--query "[].{Name:name,Model:properties.model.name,Capacity:sku.capacity}" -o table
```
---
### 5. Delete Deployment (Free Quota)
```bash
az cognitiveservices account deployment delete --name <resource> --resource-group <rg> \
--deployment-name <deployment>
```
Quota freed **immediately**. Re-run Workflow #1 to verify.
---
### 6. Request Quota Increase
**Azure Portal Process:**
1. Navigate to [Azure Portal - All Resources](https://portal.azure.com/#view/HubsExtension/BrowseAll) β Filter "AI Services" β Click resource
2. Select **Quotas** in left navigation
3. Click **Request quota increase**
4. Fill form: Model, Current Limit, Requested Limit, Region, **Business Justification** (required field)
5. Wait for approval: **3-5 business days typically, up to 10 business days** ([source](https://learn.microsoft.com/en-us/azure/ai-foundry/openai/how-to/quota))
**Business Justification** is a mandatory field that explains why you need more quota. Azure reviews each request to ensure resources are allocated based on legitimate business needs. A strong justification includes:
- **Workload details**: What you're building and which model you need
- **Data-driven estimates**: Expected traffic volume and token usage calculations
- **Clear need**: Why current quota is insufficient and what capacity you require
- **Timeline**: When you need the increased quota (e.g., production launch date)
**Business Justification template:**
```
Production [workload type] using [model] in [region].
Expected traffic: [X requests/day] with [Y tokens/request].
Calculated required TPM: [Z TPM]. Current [N TPM] insufficient.
Request increase to [M TPM]. Deployment target: [date].
```
See [detailed quota request guide](./references/workflows.md#request-quota-increase) for complete steps.
---
## Quick Troubleshooting
| Error | Quick Fix | Detailed Guide |
|-------|-----------|----------------|
| `QuotaExceeded` | Delete unused deployments or request increase | [Error Resolution](./references/error-resolution.md#quotaexceeded) |
| `InsufficientQuota` | Reduce capacity or try different region | [Error Resolution](./references/error-resolution.md#insufficientquota) |
| `DeploymentLimitReached` | Delete unused deployments (10-20 slot limit) | [Error Resolution](./references/error-resolution.md#deploymentlimitreached) |
| `429 Rate Limit` | Increase TPM or migrate to PTU | [Error Resolution](./references/error-resolution.md#429-errors) |
---
## References
**Detailed Guides:**
- [Error Resolution Workflows](./references/error-resolution.md) - Detailed workflows for quota exhausted, 429 errors, insufficient quota, deployment limits
- [Troubleshooting Guide](./references/troubleshooting.md) - Quick error fixes and diagnostic commands
- [Quota Optimization Strategies](./references/optimization.md) - 5 strategies for freeing quota and reducing costs
- [Capacity Planning Guide](./references/capacity-planning.md) - TPM vs PTU comparison, model selection, workload calculations
- [Workflows Reference](./references/workflows.md) - Complete workflow steps and multi-region checks
- [PTU Guide](./references/ptu-guide.md) - Provisioned throughput capacity planning
**Official Microsoft Documentation:**
- [Azure OpenAI Service Pricing](https://azure.microsoft.com/en-us/pricing/details/cognitive-services/openai-service/) - Official pay-per-token rates
- [PTU Costs and Billing](https://learn.microsoft.com/en-us/azure/ai-foundry/openai/how-to/provisioned-throughput-onboarding) - PTU hourly rates
- [Azure OpenAI Models](https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/models) - Model capabilities and regions
- [Quota Management Guide](https://learn.microsoft.com/azure/ai-services/openai/how-to/quota) - Official quota procedures
- [Quotas and Limits](https://learn.microsoft.com/azure/ai-services/openai/quotas-limits) - Rate limits and quota details
**Calculators:**
- [Azure Pricing Calculator](https://azure.microsoft.com/pricing/calculator/) - Official pricing estimator
- Azure AI Foundry PTU calculator (Microsoft Foundry β Operate β Quota β Provisioned Throughput Unit tab) - PTU capacity sizing
capacity-planning.md 8.1 KB
# Capacity Planning Guide
Comprehensive guide for planning Azure AI Foundry capacity, including cost analysis, model selection, and workload calculations.
**Table of Contents:** [Cost Comparison: TPM vs PTU](#cost-comparison-tpm-vs-ptu) Β· [Production Workload Examples](#production-workload-examples) Β· [Model Selection and Deployment Type Guidance](#model-selection-and-deployment-type-guidance)
## Cost Comparison: TPM vs PTU
> **Official Pricing Sources:**
> - [Azure OpenAI Service Pricing](https://azure.microsoft.com/en-us/pricing/details/cognitive-services/openai-service/) - Official pay-per-token rates
> - [PTU Costs and Billing Guide](https://learn.microsoft.com/en-us/azure/ai-foundry/openai/how-to/provisioned-throughput-onboarding) - PTU hourly rates and capacity planning
**TPM (Standard) Pricing:**
- Pay-per-token for input/output
- No upfront commitment
- **Rates**: See [Azure OpenAI Pricing](https://azure.microsoft.com/en-us/pricing/details/cognitive-services/openai-service/)
- GPT-4o: ~$0.0025-$0.01/1K tokens
- GPT-4 Turbo: ~$0.01-$0.03/1K
- GPT-3.5 Turbo: ~$0.0005-$0.0015/1K
- **Best for**: Variable workloads, unpredictable traffic
**PTU (Provisioned) Pricing:**
- Hourly billing: `$/PTU/hr Γ PTUs Γ 730 hrs/month`
- Monthly commitment with Reservations discounts
- **Rates**: See [PTU Billing Guide](https://learn.microsoft.com/en-us/azure/ai-foundry/openai/how-to/provisioned-throughput-onboarding)
- Use PTU calculator to determine requirements (Microsoft Foundry β Operate β Quota β Provisioned Throughput Unit tab)
- **Best for**: High-volume (>1M tokens/day), predictable traffic, guaranteed throughput
**Cost Decision Framework** (Analytical Guidance):
```
Step 1: Calculate monthly TPM cost
Monthly TPM cost = (Daily tokens Γ 30 days Γ $price per 1K tokens) / 1000
Step 2: Calculate monthly PTU cost
Monthly PTU cost = Required PTUs Γ 730 hours/month Γ $PTU-hour rate
(Get Required PTUs from Azure AI Foundry portal: Microsoft Foundry β Operate β Quota β Provisioned Throughput Unit tab)
Step 3: Compare
Use PTU when: Monthly PTU cost < (Monthly TPM cost Γ 0.7)
(Use 70% threshold to account for commitment risk)
```
**Example Calculation** (Analytical):
Scenario: 1M requests/day, average 1,000 tokens per request
- **Daily tokens**: 1,000,000 Γ 1,000 = 1B tokens/day
- **TPM Cost** (using GPT-4o at $0.005/1K avg): (1B Γ 30 Γ $0.005) / 1000 = ~$150,000/month
- **PTU Cost** (estimated 100 PTU at ~$5/PTU-hour): 100 PTU Γ 730 hours Γ $5 = ~$365,000/month
- **Decision**: Use TPM (significantly lower cost for this workload)
> **Important**: Always use the official [Azure Pricing Calculator](https://azure.microsoft.com/pricing/calculator/) and Azure AI Foundry portal PTU calculator (Microsoft Foundry β Operate β Quota β Provisioned Throughput Unit tab) for exact pricing by model, region, and workload. Prices vary by region and are subject to change.
---
## Production Workload Examples
To estimate quota requirements, use real-world production scenarios with capacity calculations for gpt-4, version 0613 (from Azure Foundry Portal calculator):
| Workload Type | Calls/Min | Prompt Tokens | Response Tokens | Cache Hit % | Total Tokens/Min | PTU Required | TPM Equivalent |
|---------------|-----------|---------------|-----------------|-------------|------------------|--------------|----------------|
| **RAG Chat** | 10 | 3,500 | 300 | 20% | 38,000 | 100 | 38K TPM |
| **Basic Chat** | 10 | 500 | 100 | 20% | 6,000 | 100 | 6K TPM |
| **Summarization** | 10 | 5,000 | 300 | 20% | 53,000 | 100 | 53K TPM |
| **Classification** | 10 | 3,800 | 10 | 20% | 38,100 | 100 | 38K TPM |
**How to Estimate Your Production Quota Requirements:**
To calculate your quota needs for production deployments, follow these steps:
1. **Determine your peak calls per minute**: Monitor or estimate maximum concurrent requests
2. **Measure token usage**: Average prompt size + response size
3. **Account for cache hits**: Prompt caching can reduce effective token count by 20-50%
4. **Calculate total tokens/min**: (Calls/min Γ (Prompt tokens + Response tokens)) Γ (1 - Cache %)
5. **Choose deployment type**:
- **TPM (Standard)**: Allocate 1.5-2Γ your calculated tokens/min for headroom
- **PTU (Provisioned)**: Use Azure AI Foundry portal PTU calculator for exact PTU count (Microsoft Foundry β Operate β Quota β Provisioned Throughput Unit tab)
**Example Calculation (RAG Chat Production):**
- Peak: 10 calls/min
- Prompt: 3,500 tokens (context + question)
- Response: 300 tokens (answer)
- Cache: 20% hit rate (reduces prompt tokens by 20%)
- **Total TPM needed**: (10 Γ (3,500 Γ 0.8 + 300)) = 31,000 TPM
- **With 50% headroom**: 46,500 TPM β Round to **50K TPM deployment**
**PTU Recommendation:**
For the combined workload (40 calls/min, 135K tokens/min total), use **200 PTU** (from calculator above).
---
## Model Selection and Deployment Type Guidance
> **Official Documentation:**
> - [Choose the Right AI Model for Your Workload](https://learn.microsoft.com/en-us/azure/architecture/ai-ml/guide/choose-ai-model) - Microsoft Architecture Center
> - [Azure OpenAI Models](https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/models) - Model capabilities, regions, and quotas
> - [Understanding Deployment Types](https://learn.microsoft.com/en-us/azure/ai-foundry/foundry-models/concepts/deployment-types) - Standard vs Provisioned guidance
**Model Characteristics** (from [official Azure OpenAI documentation](https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/models)):
| Model | Key Characteristics | Best For |
|-------|---------------------|----------|
| **GPT-4o** | Matches GPT-4 Turbo performance in English text/coding, superior in non-English and vision tasks. Cheaper and faster than GPT-4 Turbo. | Multimodal tasks, cost-effective general purpose, high-volume production workloads |
| **GPT-4 Turbo** | Superior reasoning capabilities, larger context window (128K tokens) | Complex reasoning tasks, long-context analysis |
| **GPT-3.5 Turbo** | Most cost-effective, optimized for chat and completions, fast response time | Simple tasks, customer service, high-volume low-cost scenarios |
| **GPT-4o mini** | Fastest response time, low latency | Latency-sensitive applications requiring immediate responses |
| **text-embedding-3-large** | Purpose-built for vector embeddings | RAG applications, semantic search, document similarity |
**Deployment Type Selection** (from [official deployment types guide](https://learn.microsoft.com/en-us/azure/ai-foundry/foundry-models/concepts/deployment-types)):
| Traffic Pattern | Recommended Deployment Type | Reason |
|-----------------|---------------------------|---------|
| **Variable, bursty traffic** | Standard or Global Standard (pay-per-token) | No commitment, pay only for usage |
| **Consistent high volume** | Provisioned types (PTU) | Reserved capacity, predictable costs |
| **Large batch jobs (non-time-sensitive)** | Global Batch or DataZone Batch | 50% cost savings vs Standard |
| **Low latency variance required** | Provisioned types | Guaranteed throughput, no rate limits |
| **No regional restrictions** | Global Standard or Global Provisioned | Access to best available capacity |
**Capacity Planning Approach** (from [PTU onboarding guide](https://learn.microsoft.com/en-us/azure/ai-foundry/openai/how-to/provisioned-throughput-onboarding)):
To calculate and estimate your capacity requirements:
1. **Calculate your TPM requirements**: Determine required tokens per minute based on your expected workload
2. **Use the built-in capacity planner**: Available in Azure AI Foundry portal (Microsoft Foundry β Operate β Quota β Provisioned Throughput Unit tab)
3. **Input your metrics**: Enter input TPM and output TPM based on your workload characteristics
4. **Get PTU recommendation**: The calculator provides PTU allocation recommendation
5. **Compare costs**: Evaluate Standard (TPM) vs Provisioned (PTU) using the official pricing calculator
> **Note**: Microsoft does not publish specific "X requests/day = Y TPM" recommendations as capacity requirements vary significantly based on prompt size, response length, cache hit rates, and model choice. Use the built-in capacity planner with your actual workload characteristics.
error-resolution.md 4.9 KB
# Error Resolution Workflows
**Table of Contents:** [Workflow 7: Quota Exhausted Recovery](#workflow-7-quota-exhausted-recovery) Β· [Workflow 8: Resolve 429 Rate Limit Errors](#workflow-8-resolve-429-rate-limit-errors) Β· [Workflow 9: Resolve DeploymentLimitReached](#workflow-9-resolve-deploymentlimitreached) Β· [Workflow 10: Resolve InsufficientQuota](#workflow-10-resolve-insufficientquota) Β· [Workflow 11: Resolve QuotaExceeded](#workflow-11-resolve-quotaexceeded)
## Workflow 7: Quota Exhausted Recovery
**A. Deploy to Different Region**
```bash
subId=$(az account show --query id -o tsv)
for region in eastus westus eastus2 westus2 swedencentral uksouth; do
az rest --method get --url "https://management.azure.com/subscriptions/$subId/providers/Microsoft.CognitiveServices/locations/$region/usages?api-version=2023-05-01" \
--query "value[?name.value=='OpenAI.Standard.gpt-4o'].{Model:name.value, Used:currentValue, Limit:limit, Available:(limit-currentValue)}" -o table &
done; wait
```
**B. Delete Unused Deployments**
```bash
az cognitiveservices account deployment delete --name <resource> --resource-group <rg> --deployment-name <deployment>
```
**C. Request Quota Increase (3-5 days)**
**D. Migrate to PTU** - See capacity-planning.md
---
## Workflow 8: Resolve 429 Rate Limit Errors
**Identify Deployment:**
```bash
az cognitiveservices account deployment list --name <resource> --resource-group <rg> \
--query "[].{Name:name,Model:properties.model.name,TPM:sku.capacity*1000}" -o table
```
**Solutions:**
**A. Increase Capacity**
```bash
az cognitiveservices account deployment update --name <resource> --resource-group <rg> --deployment-name <deployment> --sku-capacity 100
```
**B. Add Retry Logic** - Exponential backoff in code
**C. Load Balance**
```bash
az cognitiveservices account deployment create --name <resource> --resource-group <rg> --deployment-name gpt-4o-2 \
--model-name gpt-4o --model-version "2024-05-13" --model-format OpenAI --sku-name Standard --sku-capacity 100
```
**D. Migrate to PTU** - No rate limits
---
## Workflow 9: Resolve DeploymentLimitReached
**Root Cause:** 10-20 slots per resource.
**Check Count:**
```bash
deployment_count=$(az cognitiveservices account deployment list --name <resource> --resource-group <rg> --query "length(@)")
echo "Deployments: $deployment_count / ~20 slots"
```
**Find Test Deployments:**
```bash
az cognitiveservices account deployment list --name <resource> --resource-group <rg> \
--query "[?contains(name,'test') || contains(name,'demo')].{Name:name}" -o table
```
**Delete:**
```bash
az cognitiveservices account deployment delete --name <resource> --resource-group <rg> --deployment-name <deployment>
```
**Or Create New Resource (fresh 10-20 slots):**
```bash
az cognitiveservices account create --name "my-foundry-2" --resource-group <rg> --location eastus --kind AIServices --sku S0 --yes
```
---
## Workflow 10: Resolve InsufficientQuota
**Root Cause:** Requested capacity exceeds available quota.
**Check Quota:**
```bash
subId=$(az account show --query id -o tsv)
az rest --method get --url "https://management.azure.com/subscriptions/$subId/providers/Microsoft.CognitiveServices/locations/eastus/usages?api-version=2023-05-01" \
--query "value[?contains(name.value,'OpenAI')].{Model:name.value, Used:currentValue, Limit:limit, Available:(limit-currentValue)}" -o table
```
**Solutions:**
**A. Reduce Capacity**
```bash
az cognitiveservices account deployment create --name <resource> --resource-group <rg> --deployment-name gpt-4o \
--model-name gpt-4o --model-version "2024-05-13" --model-format OpenAI --sku-name Standard --sku-capacity 20
```
**B. Delete Unused Deployments**
```bash
az cognitiveservices account deployment delete --name <resource> --resource-group <rg> --deployment-name <unused>
```
**C. Different Region** - Check quota with multi-region script (Workflow 7)
**D. Request Increase (3-5 days)**
---
## Workflow 11: Resolve QuotaExceeded
**Root Cause:** Deployment exceeds regional quota.
**Check Quota:**
```bash
subId=$(az account show --query id -o tsv)
az rest --method get --url "https://management.azure.com/subscriptions/$subId/providers/Microsoft.CognitiveServices/locations/eastus/usages?api-version=2023-05-01" \
--query "value[?contains(name.value,'OpenAI')]" -o table
```
**Multi-Region Check:** (Use Workflow 7 script)
**Solutions:**
**A. Delete Unused Deployments**
```bash
az cognitiveservices account deployment delete --name <resource> --resource-group <rg> --deployment-name <unused>
```
**B. Different Region**
```bash
az cognitiveservices account deployment create --name <resource> --resource-group <rg> --deployment-name gpt-4o \
--model-name gpt-4o --model-version "2024-05-13" --model-format OpenAI --sku-name Standard --sku-capacity 50
```
**C. Request Increase (3-5 days)**
**D. Reduce Capacity**
**Decision:** Available < 10% β Different region; 10-50% β Delete/reduce; > 50% β Delete one deployment
---
optimization.md 7.6 KB
# Quota Optimization Strategies
Comprehensive strategies for optimizing Azure AI Foundry quota allocation and reducing costs.
**Table of Contents:** [1. Identify and Delete Unused Deployments](#1-identify-and-delete-unused-deployments) Β· [2. Right-Size Over-Provisioned Deployments](#2-right-size-over-provisioned-deployments) Β· [3. Consolidate Multiple Small Deployments](#3-consolidate-multiple-small-deployments) Β· [4. Cost Optimization Strategies](#4-cost-optimization-strategies) Β· [5. Regional Quota Rebalancing](#5-regional-quota-rebalancing)
## 1. Identify and Delete Unused Deployments
**Step 1: Discovery with Quota Context**
Get quota limits FIRST to understand how close you are to capacity:
```bash
# Check current quota usage vs limits (run this FIRST)
subId=$(az account show --query id -o tsv)
region="eastus" # Change to your region
az rest --method get \
--url "https://management.azure.com/subscriptions/$subId/providers/Microsoft.CognitiveServices/locations/$region/usages?api-version=2023-05-01" \
--query "value[?contains(name.value,'OpenAI')].{Model:name.value, Used:currentValue, Limit:limit, Available:'(Limit - Used)'}" -o table
```
**Step 2: Parallel Deployment Enumeration**
List all deployments across resources efficiently:
```bash
# Get all Foundry resources
resources=$(az cognitiveservices account list --query "[?kind=='AIServices'].{name:name,rg:resourceGroup}" -o json)
# Parallel deployment enumeration (faster than sequential)
echo "$resources" | jq -r '.[] | "\(.name) \(.rg)"' | while read name rg; do
echo "=== $name ($rg) ==="
az cognitiveservices account deployment list --name "$name" --resource-group "$rg" \
--query "[].{Deployment:name,Model:properties.model.name,Capacity:sku.capacity,Created:systemData.createdAt}" -o table &
done
wait # Wait for all background jobs to complete
```
**Step 3: Identify Stale Deployments**
Criteria for deletion candidates:
- **Test/temporary naming**: Contains "test", "demo", "temp", "dev" in deployment name
- **Old timestamps**: Created >90 days ago with timestamp-based naming (e.g., "gpt4-20231015")
- **High capacity consumers**: Deployments with >100K TPM capacity that haven't been referenced in recent logs
- **Duplicate models**: Multiple deployments of same model/version in same region
**Example pattern matching for stale deployments:**
```bash
# Find deployments with test/temp naming
az cognitiveservices account deployment list --name <resource> --resource-group <rg> \
--query "[?contains(name,'test') || contains(name,'demo') || contains(name,'temp')].{Name:name,Capacity:sku.capacity}" -o table
```
**Step 4: Delete and Verify Quota Recovery**
```bash
# Delete unused deployment (quota freed IMMEDIATELY)
az cognitiveservices account deployment delete --name <resource> --resource-group <rg> --deployment-name <deployment>
# Verify quota freed (re-run Step 1 quota check)
# You should see "Used" decrease by the deployment's capacity
```
**Cost Impact Analysis:**
| Deployment Type | Capacity (TPM) | Quota Freed | Cost Impact (TPM) | Cost Impact (PTU) |
|-----------------|----------------|-------------|-------------------|-------------------|
| Test deployment | 10K TPM | 10K TPM | $0 (pay-per-use) | N/A |
| Unused production | 100K TPM | 100K TPM | $0 (pay-per-use) | N/A |
| Abandoned PTU deployment | 100 PTU | ~40K TPM equivalent | $0 TPM | **$3,650/month saved** (100 PTU Γ 730h Γ $0.05/h) |
| High-capacity test | 450K TPM | 450K TPM | $0 (pay-per-use) | N/A |
**Key Insight:** For TPM (Standard) deployments, deletion frees quota but has no direct cost impact (you pay per token used). For PTU (Provisioned) deployments, deletion **immediately stops hourly charges** and can save thousands per month.
---
## 2. Right-Size Over-Provisioned Deployments
**Identify over-provisioned deployments:**
- Check Azure Monitor metrics for actual token usage
- Compare allocated TPM vs. peak usage
- Look for deployments with <50% utilization
**Right-sizing example:**
```bash
# Update deployment to lower capacity
az cognitiveservices account deployment update --name <resource> --resource-group <rg> \
--deployment-name <deployment> --sku-capacity 30 # Reduce from 50K to 30K TPM
```
**Cost Optimization:**
- **TPM (Standard)**: Reduces regional quota consumption (no direct cost savings, pay-per-token)
- **PTU (Provisioned)**: Direct cost reduction (40% capacity reduction = 40% cost reduction)
---
## 3. Consolidate Multiple Small Deployments
**Pattern:** Multiple 10K TPM deployments β One 30-50K TPM deployment
**Benefits:**
- Fewer deployment slots consumed
- Simpler management
- Same total capacity, better utilization
**Example:**
- **Before**: 3 deployments @ 10K TPM each = 30K TPM total, 3 slots used
- **After**: 1 deployment @ 30K TPM = 30K TPM total, 1 slot used
- **Savings**: 2 deployment slots freed for other models
---
## 4. Cost Optimization Strategies
> **Official Documentation**: [Plan to manage costs for Azure OpenAI](https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/manage-costs) and [Fine-tuning cost management](https://learn.microsoft.com/en-us/azure/ai-foundry/openai/how-to/fine-tuning-cost-management)
**A. Use Fine-Tuned Smaller Models** (from [Microsoft Transparency Note](https://learn.microsoft.com/en-us/azure/ai-foundry/responsible-ai/openai/transparency-note)):
You can reduce costs or latency by swapping a fine-tuned version of a smaller/faster model (e.g., fine-tuned GPT-3.5-Turbo) for a more general-purpose model (e.g., GPT-4).
```bash
# Deploy fine-tuned GPT-3.5 Turbo as cost-effective alternative to GPT-4
az cognitiveservices account deployment create --name <resource> --resource-group <rg> \
--deployment-name gpt-35-tuned --model-name <your-fine-tuned-model> \
--model-format OpenAI --sku-name Standard --sku-capacity 10
```
**B. Remove Unused Fine-Tuned Deployments** (from [Fine-tuning cost management](https://learn.microsoft.com/en-us/azure/ai-foundry/openai/how-to/fine-tuning-cost-management)):
Fine-tuned model deployments incur **hourly hosting costs** even when not in use. Remove unused deployments promptly to control costs.
- Inactive deployments unused for **15 consecutive days** are automatically deleted
- Proactively delete unused fine-tuned deployments to avoid hourly charges
```bash
# Delete unused fine-tuned deployment
az cognitiveservices account deployment delete --name <resource> --resource-group <rg> \
--deployment-name <unused-fine-tuned-deployment>
```
**C. Batch Multiple Requests** (from [Cost optimization Q&A](https://learn.microsoft.com/en-us/answers/questions/1689253/how-to-optimize-costs-per-request-azure-openai-gpt)):
Batch multiple requests together to reduce the total number of API calls and lower overall costs.
**D. Use Commitment Tiers for Predictable Costs** (from [Managing costs guide](https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/manage-costs)):
- **Pay-as-you-go**: Bills according to usage (variable costs)
- **Commitment tiers**: Commit to using service features for a fixed fee (predictable costs, potential savings for consistent usage)
---
## 5. Regional Quota Rebalancing
If you have quota spread across multiple regions but only use some:
```bash
# Check quota across regions
for region in eastus westus uksouth; do
echo "=== $region ==="
subId=$(az account show --query id -o tsv)
az rest --method get \
--url "https://management.azure.com/subscriptions/$subId/providers/Microsoft.CognitiveServices/locations/$region/usages?api-version=2023-05-01" \
--query "value[?contains(name.value,'OpenAI')].{Model:name.value, Used:currentValue, Limit:limit}" -o table
done
```
**Optimization:** Concentrate deployments in fewer regions to maximize quota utilization per region.
ptu-guide.md 6.2 KB
# Provisioned Throughput Units (PTU) Guide
**Table of Contents:** [Understanding PTU vs Standard TPM](#understanding-ptu-vs-standard-tpm) Β· [When to Use PTU](#when-to-use-ptu) Β· [PTU Capacity Planning](#ptu-capacity-planning) Β· [Deploy Model with PTU](#deploy-model-with-ptu) Β· [Request PTU Quota Increase](#request-ptu-quota-increase) Β· [Understanding Region and Deployment Quotas](#understanding-region-and-deployment-quotas) Β· [External Resources](#external-resources)
## Understanding PTU vs Standard TPM
Microsoft Foundry offers two quota types:
### Standard TPM (Tokens Per Minute)
- Pay-as-you-go model, charged per token
- Each deployment consumes capacity units (e.g., 10K TPM, 50K TPM)
- Total regional quota shared across all deployments
- Subject to rate limiting during high demand (429 errors possible)
- Best for: Variable workloads, development, testing, bursty traffic
### Provisioned Throughput Units (PTU)
- Monthly commitment for guaranteed throughput
- No rate limiting, consistent latency
- Measured in PTU units (not TPM)
- Best for: Predictable, high-volume production workloads
- More cost-effective when consistent token usage justifies monthly commitment
## When to Use PTU
| Factor | Standard (TPM) | Provisioned (PTU) |
|--------|----------------|-------------------|
| **Best For** | Variable workloads, development, testing | Predictable production workloads |
| **Pricing** | Pay-per-token | Monthly commitment (hourly rate per PTU) |
| **Rate Limits** | Yes (429 errors possible) | No (guaranteed throughput) |
| **Latency** | Variable | Consistent |
| **Cost Decision** | Lower upfront commitment | More economical for consistent, high-volume usage |
| **Flexibility** | Scale up/down instantly | Requires planning and commitment |
| **Use Case** | Prototyping, bursty traffic | Production apps, high-volume APIs |
**Use PTU when:**
- Consistent, predictable token usage where monthly commitment is cost-effective
- Need guaranteed throughput (no 429 rate limit errors)
- Require consistent latency with performance SLA
- High-volume production workloads with stable traffic patterns
**Decision Guidance:**
Compare your current pay-as-you-go costs with PTU pricing. PTU may be more economical when consistent usage justifies the monthly commitment.
## PTU Capacity Planning
### Official Calculation Methods
> **Agent Instruction:** Only present official Azure capacity calculator methods below. Do NOT generate or suggest estimated PTU formulas, TPM-per-PTU conversion tables, or reference deprecated calculators (oai.azure.com/portal/calculator).
Calculate PTU requirements using these official methods:
**Method 1: Microsoft Foundry Portal**
1. Navigate to Microsoft Foundry portal
2. Go to **Operate** β **Quota**
3. Select **Provisioned throughput unit** tab
4. Click **Capacity calculator** button
5. Enter workload parameters (model, tokens/call, RPM, latency target)
6. Calculator returns exact PTU count needed
**Method 2: Using Azure REST API**
```bash
# Calculate required PTU capacity
curl -X POST "https://management.azure.com/subscriptions/<subscription-id>/providers/Microsoft.CognitiveServices/calculateModelCapacity?api-version=2024-10-01" \
-H "Authorization: Bearer <access-token>" \
-H "Content-Type: application/json" \
-d '{
"model": {
"format": "OpenAI",
"name": "gpt-4o",
"version": "2024-05-13"
},
"workload": {
"requestPerMin": 100,
"tokensPerMin": 50000,
"peakRequestsPerMin": 150
}
}'
```
## Deploy Model with PTU
### Step 1: Calculate PTU Requirements
Use the official capacity calculator methods above to determine required PTU capacity.
### Step 2: Deploy with PTU
```bash
# Deploy model with calculated PTU capacity
az cognitiveservices account deployment create \
--name <resource-name> \
--resource-group <rg> \
--deployment-name gpt-4o-ptu-deployment \
--model-name gpt-4o \
--model-version "2024-05-13" \
--model-format OpenAI \
--sku-name ProvisionedManaged \
--sku-capacity 100
# Check PTU deployment status
az cognitiveservices account deployment show \
--name <resource-name> \
--resource-group <rg> \
--deployment-name gpt-4o-ptu-deployment
```
**Key Differences from Standard TPM:**
- SKU name: `ProvisionedManaged` (not `Standard`)
- Capacity: Measured in PTU units (not K TPM)
- Billing: Monthly commitment regardless of usage
- No rate limiting (guaranteed throughput)
## Request PTU Quota Increase
PTU quota is separate from TPM quota and requires specific justification:
1. Navigate to Azure Portal β Foundry resource β **Quotas**
2. Select **Provisioned throughput unit** tab
3. Identify model needing PTU increase (e.g., "GPT-4o PTU")
4. Click **Request quota increase**
5. Fill form:
- Model name
- Requested PTU quota
- Include capacity calculator results in business justification
- Explain workload characteristics (volume, latency requirements)
6. Submit and monitor status
**Processing Time:** Typically 3-5 business days (longer than standard quota requests)
**Note:** PTU quota requests typically require stronger business justification due to commitment nature
**Alternative:** Deploy to different region with available PTU quota
## Understanding Region and Deployment Quotas
### Region Quota
- Maximum PTU capacity available in an Azure region
- Varies by model type (GPT-4, GPT-4o, etc.)
- Shared across subscription resources in same region
- Separate from TPM quota (you have both TPM and PTU quotas)
### Deployment Slots
- Number of concurrent model deployments allowed
- Typically 10-20 slots per resource
- Each PTU deployment uses one slot (same as TPM deployments)
- Deployment count limit is independent of capacity
## External Resources
- [Understanding PTU Costs](https://learn.microsoft.com/azure/ai-foundry/openai/how-to/provisioned-throughput-onboarding)
- [What Is Provisioned Throughput](https://learn.microsoft.com/azure/ai-foundry/openai/concepts/provisioned-throughput)
- [Calculate Model Capacity API](https://learn.microsoft.com/rest/api/aiservices/accountmanagement/calculate-model-capacity/calculate-model-capacity?view=rest-aiservices-accountmanagement-2024-10-01&tabs=HTTP)
- [PTU Overview](https://learn.microsoft.com/azure/ai-services/openai/concepts/provisioned-throughput)
troubleshooting.md 7.3 KB
# Troubleshooting Quota Errors
**Table of Contents:** [Common Quota Errors](#common-quota-errors) Β· [Detailed Error Resolution](#detailed-error-resolution) Β· [Request Quota Increase Process](#request-quota-increase-process) Β· [Diagnostic Commands](#diagnostic-commands) Β· [External Resources](#external-resources)
## Common Quota Errors
| Error | Cause | Quick Fix |
|-------|-------|-----------|
| `QuotaExceeded` | Regional quota consumed (TPM or PTU) | Delete unused deployments or request increase |
| `InsufficientQuota` | Not enough available for requested capacity | Reduce deployment capacity or free quota |
| `DeploymentLimitReached` | Too many deployment slots used | Delete unused deployments to free slots |
| `429 Rate Limit` | TPM capacity too low for traffic (Standard only) | Increase TPM capacity or migrate to PTU |
| `PTU capacity unavailable` | No PTU quota in region | Request PTU quota or try different region |
| `SKU not supported` | PTU not available for model/region | Check model availability or use Standard TPM |
## Detailed Error Resolution
### QuotaExceeded Error
All available TPM or PTU quota consumed in the region.
**Resolution:**
1. **Check current quota usage:**
```bash
subId=$(az account show --query id -o tsv)
region="eastus"
az rest --method get \
--url "https://management.azure.com/subscriptions/$subId/providers/Microsoft.CognitiveServices/locations/$region/usages?api-version=2023-05-01" \
--query "value[?contains(name.value,'OpenAI')].{Model:name.value, Used:currentValue, Limit:limit}" -o table
```
2. **Choose resolution:**
- **Option A**: Delete unused deployments to free quota
- **Option B**: Reduce requested deployment capacity
- **Option C**: Deploy to different region with available quota
- **Option D**: Request quota increase through Azure Portal
### InsufficientQuota Error
Available quota less than requested capacity.
**Resolution:**
1. **Check available quota:**
```bash
# Calculate available: limit - currentValue
subId=$(az account show --query id -o tsv)
region="eastus"
az rest --method get \
--url "https://management.azure.com/subscriptions/$subId/providers/Microsoft.CognitiveServices/locations/$region/usages?api-version=2023-05-01" \
--query "value[?name.value=='OpenAI.Standard.gpt-4o'].{Model:name.value, Used:currentValue, Limit:limit, Available:(limit-currentValue)}" -o table
```
2. **Options:**
- Reduce deployment capacity to fit available quota
- Delete existing deployments to free capacity
- Try different region with more available quota
- Request quota increase
### DeploymentLimitReached Error
Resource reached maximum deployment slot limit (10-20 slots).
**Resolution:**
1. **List existing deployments:**
```bash
az cognitiveservices account deployment list \
--name <resource-name> \
--resource-group <rg> \
--query '[].{Name:name, Model:properties.model.name, Capacity:sku.capacity}' \
--output table
```
2. **Delete unused deployments:**
```bash
az cognitiveservices account deployment delete \
--name <resource-name> \
--resource-group <rg> \
--deployment-name <unused-deployment-name>
```
3. **Verify slot freed:**
```bash
az cognitiveservices account deployment list \
--name <resource-name> \
--resource-group <rg> \
--query 'length([])'
```
### 429 Rate Limit Errors
TPM capacity insufficient for traffic volume (Standard TPM only).
**Resolution:**
1. **Check deployment capacity:**
```bash
az cognitiveservices account deployment show \
--name <resource-name> \
--resource-group <rg> \
--deployment-name <deployment-name> \
--query '{Name:name, Model:properties.model.name, Capacity:sku.capacity, SKU:sku.name}'
```
2. **Options:**
- **Option A**: Increase TPM capacity on existing deployment
```bash
az cognitiveservices account deployment update \
--name <resource-name> \
--resource-group <rg> \
--deployment-name <deployment-name> \
--sku-capacity <higher-capacity>
```
- **Option B**: Migrate to PTU for guaranteed throughput (no rate limits)
- **Option C**: Implement retry logic with exponential backoff in application
### PTU Capacity Unavailable Error
No PTU quota allocated in region, or PTU not available for model/region.
**Resolution:**
1. **Check PTU quota:**
```bash
subId=$(az account show --query id -o tsv)
region="eastus"
az rest --method get \
--url "https://management.azure.com/subscriptions/$subId/providers/Microsoft.CognitiveServices/locations/$region/usages?api-version=2023-05-01" \
--query "value[?contains(name.value,'ProvisionedManaged')].{Model:name.value, Used:currentValue, Limit:limit}" -o table
```
2. **Options:**
- Request PTU quota increase through Azure Portal (include capacity calculator results)
- Try different region where PTU is available
- Use Standard TPM instead
### SKU Not Supported Error
PTU not available for specific model or region combination.
**Resolution:**
1. **Check model availability:**
- Review [PTU model availability by region](https://learn.microsoft.com/azure/ai-services/openai/concepts/models#provisioned-deployment-model-availability)
2. **Options:**
- Deploy with Standard TPM SKU instead
- Choose different region where PTU is supported
- Use alternative model that supports PTU in your region
## Request Quota Increase Process
### For Standard TPM Quota
1. Navigate to Azure Portal β Your Foundry resource β **Quotas**
2. Identify model needing increase (e.g., "GPT-4o Standard")
3. Click **Request quota increase**
4. Fill form:
- Model name
- Requested quota (in TPM)
- Business justification (required)
5. Submit and monitor status
**Processing Time:** Typically 1-2 business days
### For PTU Quota
1. Navigate to Azure Portal β Your Foundry resource β **Quotas**
2. Select **Provisioned throughput unit** tab
3. Identify model needing PTU increase
4. Click **Request quota increase**
5. Fill form:
- Model name
- Requested PTU quota
- Include capacity calculator results
- Detailed business justification (workload characteristics)
6. Submit and monitor status
**Processing Time:** Typically 3-5 business days (requires stronger justification)
## Diagnostic Commands
```bash
# Check deployment status
az cognitiveservices account deployment show \
--name <resource-name> \
--resource-group <rg> \
--deployment-name <deployment-name>
# Verify available quota
subId=$(az account show --query id -o tsv)
az rest --method get \
--url "https://management.azure.com/subscriptions/$subId/providers/Microsoft.CognitiveServices/locations/eastus/usages?api-version=2023-05-01" \
--query "value[?contains(name.value,'OpenAI')].{Model:name.value, Used:currentValue, Limit:limit, Available:(limit-currentValue)}" \
--output table
# List all deployments
az cognitiveservices account deployment list \
--name <resource-name> \
--resource-group <rg> \
--query '[].{Name:name, Model:properties.model.name, Capacity:sku.capacity, SKU:sku.name}' \
--output table
```
## External Resources
- [Quota Management Documentation](https://learn.microsoft.com/azure/ai-services/openai/how-to/quota)
- [Rate Limits Documentation](https://learn.microsoft.com/azure/ai-services/openai/quotas-limits)
- [Troubleshooting Guide](https://learn.microsoft.com/azure/ai-services/openai/troubleshooting)
workflows.md 6.9 KB
# Detailed Workflows: Quota Management
**Table of Contents:** [Workflow 1: View Current Quota Usage](#workflow-1-view-current-quota-usage---detailed-steps) Β· [Workflow 2: Find Best Region for Model Deployment](#workflow-2-find-best-region-for-model-deployment---detailed-steps) Β· [Workflow 3: Check Quota Before Deployment](#workflow-3-check-quota-before-deployment---detailed-steps) Β· [Workflow 4: Monitor Quota Across Deployments](#workflow-4-monitor-quota-across-deployments---detailed-steps) Β· [Quick Command Reference](#quick-command-reference) Β· [MCP Tools Reference](#mcp-tools-reference-optional-wrappers)
## Workflow 1: View Current Quota Usage - Detailed Steps
### Step 1: Show Regional Quota Summary (REQUIRED APPROACH)
> **CRITICAL AGENT INSTRUCTION:**
> - When showing quota: Query REGIONAL quota summary, NOT individual resources
> - DO NOT run `az cognitiveservices account list` for quota queries
> - DO NOT filter resources by username or name patterns
> - ONLY check specific resource deployments if user provides resource name
> - Quotas are managed at SUBSCRIPTION + REGION level, NOT per-resource
**Show Regional Quota Summary:**
```bash
# Get subscription ID
subId=$(az account show --query id -o tsv)
# Check quota for key regions
regions=("eastus" "eastus2" "westus" "westus2")
for region in "${regions[@]}"; do
echo "=== Region: $region ==="
az rest --method get \
--url "https://management.azure.com/subscriptions/$subId/providers/Microsoft.CognitiveServices/locations/$region/usages?api-version=2023-05-01" \
--query "value[?contains(name.value,'OpenAI.Standard')].{Model:name.value, Used:currentValue, Limit:limit, Available:(limit-currentValue)}" \
--output table
echo ""
done
```
### Step 2: If User Asks for Specific Resource (ONLY IF EXPLICITLY REQUESTED)
```bash
# User must provide resource name
az cognitiveservices account deployment list \
--name <user-provided-resource-name> \
--resource-group <user-provided-rg> \
--query '[].{Name:name, Model:properties.model.name, Capacity:sku.capacity, SKU:sku.name}' \
--output table
```
**Alternative - Use MCP Tools (Optional Wrappers):**
```
foundry_models_deployments_list(
resource-group="<rg>",
azure-ai-services="<resource-name>"
)
```
*Note: MCP tools are convenience wrappers around the same control plane APIs shown above.*
**Interpreting Results:**
- `Used` (currentValue): Currently allocated quota
- `Limit`: Maximum quota available in region
- `Available`: Calculated as `limit - currentValue`
## Workflow 2: Find Best Region for Model Deployment - Detailed Steps
### Step 1: Check Single Region
```bash
# Get subscription ID
subId=$(az account show --query id -o tsv)
# Check quota for GPT-4o Standard in a specific region
region="eastus" # Change to your target region
az rest --method get \
--url "https://management.azure.com/subscriptions/$subId/providers/Microsoft.CognitiveServices/locations/$region/usages?api-version=2023-05-01" \
--query "value[?name.value=='OpenAI.Standard.gpt-4o'].{Model:name.value, Used:currentValue, Limit:limit, Available:(limit-currentValue)}" \
-o table
```
### Step 2: Check Multiple Regions (Common Regions)
Check these regions in sequence by changing the `region` variable:
- `eastus`, `eastus2` - US East Coast
- `westus`, `westus2`, `westus3` - US West Coast
- `swedencentral` - Europe (Sweden)
- `canadacentral` - Canada
- `uksouth` - UK
- `japaneast` - Asia Pacific
**Alternative - Use MCP Tool:**
```
model_quota_list(region="eastus")
```
Repeat for each target region.
**Key Points:**
- Query returns `currentValue` (used), `limit` (max), and calculated `Available`
- Standard SKU format: `OpenAI.Standard.<model-name>`
- For PTU: `OpenAI.ProvisionedManaged.<model-name>`
- Focus on 2-3 regions relevant to your location rather than checking all regions
## Workflow 3: Check Quota Before Deployment - Detailed Steps
**Steps:**
1. Check current usage (workflow #1)
2. Calculate available: `limit - currentValue`
3. Compare: `available >= required_capacity`
4. If insufficient: Use workflow #2 to find region with capacity, or request increase
## Workflow 4: Monitor Quota Across Deployments - Detailed Steps
**Recommended Approach - Regional Quota Overview:**
Show quota by region (better than listing all resources):
```bash
subId=$(az account show --query id -o tsv)
regions=("eastus" "eastus2" "westus" "westus2" "swedencentral")
for region in "${regions[@]}"; do
echo "=== Region: $region ==="
az rest --method get \
--url "https://management.azure.com/subscriptions/$subId/providers/Microsoft.CognitiveServices/locations/$region/usages?api-version=2023-05-01" \
--query "value[?contains(name.value,'OpenAI')].{Model:name.value, Used:currentValue, Limit:limit, Available:(limit-currentValue)}" \
--output table
echo ""
done
```
**Alternative - Check Specific Resource:**
If user wants to monitor a specific resource, ask for resource name first:
```bash
# List deployments for specific resource
az cognitiveservices account deployment list \
--name <resource-name> \
--resource-group <rg> \
--query '[].{Name:name, Model:properties.model.name, Capacity:sku.capacity}' \
--output table
```
> **Note:** Don't automatically iterate through all resources in the subscription. Show regional quota summary or ask for specific resource name.
## Quick Command Reference
```bash
# View quota for specific model using REST API
subId=$(az account show --query id -o tsv)
region="eastus" # Change to your region
az rest --method get \
--url "https://management.azure.com/subscriptions/$subId/providers/Microsoft.CognitiveServices/locations/$region/usages?api-version=2023-05-01" \
--query "value[?contains(name.value,'gpt-4')].{Name:name.value, Used:currentValue, Limit:limit, Available:(limit-currentValue)}" \
--output table
# List all deployments with capacity
az cognitiveservices account deployment list \
--name <resource-name> \
--resource-group <rg> \
--query '[].{Name:name, Model:properties.model.name, Capacity:sku.capacity}' \
--output table
# Delete deployment to free quota
az cognitiveservices account deployment delete \
--name <resource-name> \
--resource-group <rg> \
--deployment-name <deployment-name>
```
## MCP Tools Reference (Optional Wrappers)
**Note:** All quota operations are control plane (management) operations. MCP tools are optional convenience wrappers around Azure CLI commands.
| Tool | Purpose | Equivalent Azure CLI |
|------|---------|---------------------|
| `foundry_models_deployments_list` | List all deployments with capacity | `az cognitiveservices account deployment list` |
| `model_quota_list` | List quota and usage across regions | `az rest` (Management API) |
| `model_catalog_list` | List available models from catalog | `az rest` (Management API) |
| `foundry_resource_get` | Get resource details and endpoint | `az cognitiveservices account show` |
**Recommended:** Use Azure CLI commands directly for control plane operations.
rbac.md 6.8 KB
# Microsoft Foundry RBAC Management
Reference for managing RBAC for Microsoft Foundry resources: user permissions, managed identity configuration, and service principal setup for CI/CD.
## Quick Reference
| Property | Value |
|----------|-------|
| **CLI Extension** | `az role assignment`, `az ad sp` |
| **Resource Type** | `Microsoft.CognitiveServices/accounts` |
| **Best For** | Permission management, access auditing, CI/CD setup |
## When to Use
- Grant user access to Foundry resources or projects
- Set up developer permissions (Project Manager, Owner roles)
- Audit role assignments or validate permissions
- Configure managed identity roles for connected resources
- Create service principals for CI/CD pipeline automation
- Troubleshoot permission errors
## Azure AI Foundry Built-in Roles
| Role | Create Projects | Data Actions | Role Assignments |
|------|-----------------|--------------|------------------|
| Azure AI User | No | Yes | No |
| Azure AI Project Manager | Yes | Yes | Yes (AI User only) |
| Azure AI Account Owner | Yes | No | Yes (AI User only) |
| Azure AI Owner | Yes | Yes | Yes |
> β οΈ **Warning:** Azure AI User is auto-assigned via Portal but NOT via SDK/CLI. Automation must explicitly assign roles.
## Workflows
All scopes follow the pattern: `/subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.CognitiveServices/accounts/<foundry-resource-name>`
For project-level scoping, append `/projects/<project-name>`.
### 1. Assign User Permissions
```bash
az role assignment create --role "Azure AI User" --assignee "<user-email-or-object-id>" --scope "<foundry-scope>"
```
### 2. Assign Developer Permissions
```bash
# Project Manager (create projects, assign AI User roles)
az role assignment create --role "Azure AI Project Manager" --assignee "<user-email-or-object-id>" --scope "<foundry-scope>"
# Full ownership including data actions
az role assignment create --role "Azure AI Owner" --assignee "<user-email-or-object-id>" --scope "<foundry-scope>"
```
### 3. Audit Role Assignments
```bash
# List all assignments
az role assignment list --scope "<foundry-scope>" --output table
# Detailed with principal names
az role assignment list --scope "<foundry-scope>" --query "[].{Principal:principalName, PrincipalType:principalType, Role:roleDefinitionName}" --output table
# Azure AI roles only
az role assignment list --scope "<foundry-scope>" --query "[?contains(roleDefinitionName, 'Azure AI')].{Principal:principalName, Role:roleDefinitionName}" --output table
```
### 4. Validate Permissions
```bash
# Current user's roles on resource
az role assignment list --assignee "$(az ad signed-in-user show --query id -o tsv)" --scope "<foundry-scope>" --query "[].roleDefinitionName" --output tsv
# Check actions available to a role
az role definition list --name "Azure AI User" --query "[].permissions[].actions" --output json
```
**Permission Requirements by Action:**
| Action | Required Role(s) |
|--------|------------------|
| Deploy models | Azure AI User, Azure AI Project Manager, Azure AI Owner |
| Create projects | Azure AI Project Manager, Azure AI Account Owner, Azure AI Owner |
| Assign Azure AI User role | Azure AI Project Manager, Azure AI Account Owner, Azure AI Owner |
| Full data access | Azure AI User, Azure AI Project Manager, Azure AI Owner |
### 5. Configure Managed Identity Roles
```bash
# Get managed identity principal ID
PRINCIPAL_ID=$(az cognitiveservices account show --name <foundry-resource-name> --resource-group <resource-group> --query identity.principalId --output tsv)
# Assign roles to connected resources (repeat pattern for each)
az role assignment create --role "<role-name>" --assignee "$PRINCIPAL_ID" --scope "<resource-scope>"
```
**Common Managed Identity Role Assignments:**
| Connected Resource | Role | Purpose |
|--------------------|------|---------|
| Azure Storage | Storage Blob Data Reader | Read files/documents |
| Azure Storage | Storage Blob Data Contributor | Read/write files |
| Azure Key Vault | Key Vault Secrets User | Read secrets |
| Azure AI Search | Search Index Data Reader | Query indexes |
| Azure AI Search | Search Index Data Contributor | Query and modify indexes |
| Azure Cosmos DB | Cosmos DB Account Reader | Read data |
### 6. Create Service Principal for CI/CD
```bash
# Create SP with minimal role
az ad sp create-for-rbac --name "foundry-cicd-sp" --role "Azure AI User" --scopes "<foundry-scope>" --output json
# Output contains: appId, password, tenant β store securely
# For project management permissions
az ad sp create-for-rbac --name "foundry-cicd-admin-sp" --role "Azure AI Project Manager" --scopes "<foundry-scope>" --output json
# Add Contributor for resource provisioning
SP_APP_ID=$(az ad sp list --display-name "foundry-cicd-sp" --query "[0].appId" -o tsv)
az role assignment create --role "Contributor" --assignee "$SP_APP_ID" --scope "/subscriptions/<subscription-id>/resourceGroups/<resource-group>"
```
> π‘ **Tip:** Use least privilege β start with `Azure AI User` and add roles as needed.
| CI/CD Scenario | Recommended Role | Additional Roles |
|----------------|------------------|------------------|
| Deploy models only | Azure AI User | None |
| Manage projects | Azure AI Project Manager | None |
| Full provisioning | Azure AI Owner | Contributor (on RG) |
| Read-only monitoring | Reader | Azure AI User (for data) |
**CI/CD Pipeline Login:**
```bash
az login --service-principal --username "<app-id>" --password "<client-secret>" --tenant "<tenant-id>"
az account set --subscription "<subscription-id>"
```
## Error Handling
| Issue | Cause | Resolution |
|-------|-------|------------|
| "Authorization failed" when deploying | Missing Azure AI User role | Assign Azure AI User role at resource scope |
| Cannot create projects | Missing Project Manager or Owner role | Assign Azure AI Project Manager role |
| "Access denied" on connected resources | Managed identity missing roles | Assign appropriate roles to MI on each resource |
| Portal works but CLI fails | Portal auto-assigns roles, CLI doesn't | Explicitly assign Azure AI User via CLI |
| Service principal cannot access data | Wrong role or scope | Verify Azure AI User is assigned at correct scope |
| "Principal does not exist" | User/SP not found in directory | Verify the assignee email or object ID is correct |
| Role assignment already exists | Duplicate assignment attempt | Use `az role assignment list` to verify existing assignments |
## Additional Resources
- [Azure AI Foundry RBAC Documentation](https://learn.microsoft.com/azure/ai-foundry/concepts/rbac-ai-foundry)
- [Azure Built-in Roles](https://learn.microsoft.com/azure/role-based-access-control/built-in-roles)
- [Managed Identities Overview](https://learn.microsoft.com/azure/active-directory/managed-identities-azure-resources/overview)
- [Service Principal Authentication](https://learn.microsoft.com/azure/developer/github/connect-from-azure)
agent-metadata-contract.md 9.3 KB
# Agent Metadata Contract
Use this contract for every agent source folder that participates in Microsoft Foundry workflows.
## Required Local Layout
```text
<agent-root>/
.foundry/
agent-metadata.yaml
agent-metadata.prod.yaml
datasets/
evaluators/
results/
```
- `agent-metadata.yaml` is the preferred local/dev metadata file.
- Optional sidecar files such as `agent-metadata.prod.yaml` can hold a single prod or CI-targeted environment without mixing multiple environments in one file.
- `datasets/` and `evaluators/` are local cache folders. Reuse existing files when they are current, and ask before refreshing or overwriting them.
- `results/` stores local evaluation outputs and comparison artifacts by environment.
## Metadata File Model
| File | Typical use | Notes |
|------|-------------|-------|
| `.foundry/agent-metadata.yaml` | Preferred local/dev metadata | Default choice for local workflows when no file is specified |
| `.foundry/agent-metadata.<env>.yaml` | Optional prod/CI or modular environment-specific metadata | Prefer this when the workflow explicitly targets that environment and the file exists |
New setups should prefer **one environment per metadata file** while keeping the current schema shape (`defaultEnvironment` + `environments.<name>`) for compatibility. Legacy multi-environment `agent-metadata.yaml` files remain supported.
## Environment Model
| Field | Required | Purpose |
|-------|----------|---------|
| `defaultEnvironment` | β
| Default environment inside the selected metadata file; in preferred single-environment files it should match the only environment key |
| `environments.<name>.projectEndpoint` | β
| Foundry project endpoint for that environment |
| `environments.<name>.agentName` | β
| Deployed Foundry agent name |
| `environments.<name>.azureContainerRegistry` | β
for hosted agents | ACR used for deployment and image refresh |
| `environments.<name>.observability.applicationInsightsResourceId` | Recommended | App Insights resource for trace workflows |
| `environments.<name>.observability.applicationInsightsConnectionString` | Optional | Connection string when needed for tooling |
| `environments.<name>.evaluationSuites[]` | β
| Dataset + local/remote references + evaluator + tag bundles for evaluation workflows |
## Example `.foundry/agent-metadata.yaml` (local/dev)
```yaml
defaultEnvironment: dev
environments:
dev:
projectEndpoint: https://contoso.services.ai.azure.com/api/projects/support-dev
agentName: support-agent-dev
azureContainerRegistry: contosoregistry.azurecr.io
observability:
applicationInsightsResourceId: /subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.Insights/components/support-dev-ai
evaluationSuites:
- id: smoke-core
tags:
tier: smoke
purpose: baseline
stage: seed
dataset: support-agent-dev-eval-seed
datasetVersion: v1
datasetFile: .foundry/datasets/support-agent-dev-eval-seed-v1.jsonl
datasetUri: <foundry-dataset-uri>
evaluators:
- name: intent_resolution
threshold: 4
- name: task_adherence
threshold: 4
- name: citation_quality
threshold: 0.9
definitionFile: .foundry/evaluators/citation-quality.yaml
- id: trace-regression-suite
tags:
tier: regression
purpose: regression
stage: traces
dataset: support-agent-dev-traces
datasetVersion: v3
datasetFile: .foundry/datasets/support-agent-dev-traces-v3.jsonl
datasetUri: <foundry-dataset-uri>
evaluators:
- name: coherence
threshold: 4
- name: groundedness
threshold: 4
```
## Example `.foundry/agent-metadata.prod.yaml` (prod/CI)
```yaml
defaultEnvironment: prod
environments:
prod:
projectEndpoint: https://contoso.services.ai.azure.com/api/projects/support-prod
agentName: support-agent-prod
azureContainerRegistry: contosoregistry.azurecr.io
evaluationSuites:
- id: production-guardrails
tags:
tier: smoke
purpose: safety
stage: prod
dataset: support-agent-prod-curated
datasetVersion: v2
datasetFile: .foundry/datasets/support-agent-prod-curated-v2.jsonl
datasetUri: <foundry-dataset-uri>
evaluators:
- name: violence
threshold: 1
- name: self_harm
threshold: 1
```
## Workflow Rules
1. Auto-discover agent roots by searching for `.foundry/` folders that contain `agent-metadata.yaml` or `agent-metadata.<env>.yaml`.
2. If exactly one agent root is found, use it. If multiple roots are found, require the user to choose one.
3. Inside the selected agent root, select the metadata file in this order: explicit file/path from the user or workflow, then `.foundry/agent-metadata.<env>.yaml` when an explicit environment is already known and that file exists, then `.foundry/agent-metadata.yaml`. If `.foundry/agent-metadata.yaml` is absent, use the only matching sidecar file when exactly one `.foundry/agent-metadata.<env>.yaml` file exists; if multiple sidecar files exist and no explicit file/path was provided, require the user to choose the metadata file.
4. Resolve environment in this order: explicit user choice, then the file's only environment when the selected metadata file is single-environment, then remembered session choice, then `defaultEnvironment`.
5. Keep the selected agent root, metadata file, and environment visible in every deploy, eval, dataset, and trace summary.
6. Once an agent root is selected, use only that root's `.foundry/` folders and source tree for local evaluation, dataset, trace, deploy, and prompt-optimization context. Do not merge sibling agent folders.
7. Treat `datasets/` and `evaluators/` as cache folders. Reuse local files when present, but offer refresh when the user asks or when remote state is newer.
8. Writes must target the selected metadata file only. For preferred single-environment files, update only that one environment block. For legacy multi-environment files, rewrite only the selected environment block. Never copy or merge environments across sibling metadata files automatically.
9. Never overwrite cache files or metadata silently.
## Legacy Compatibility (`testCases[]` / `testSuites[]` -> `evaluationSuites[]`)
Use `evaluationSuites[]` as the canonical schema. If the selected environment still uses older `testSuites[]` and does not yet define `evaluationSuites[]`, treat that list as the current suite source, normalize it in memory, and migrate it on the next metadata write. If the selected environment is older still and uses legacy `testCases[]` without `evaluationSuites[]`, treat `testCases[]` as the suite source and normalize it the same way.
| Legacy field | Migration behavior |
|--------------|--------------------|
| `id` | Keep as-is |
| `dataset`, `datasetVersion`, `datasetFile`, `datasetUri`, `evaluators` | Keep as-is |
| `tags` | Preserve if already present |
| `priority` | If `tags.tier` is missing, map `P0` -> `smoke`, `P1` -> `regression`, `P2` -> `coverage` |
When a workflow writes metadata, rewrite the selected metadata file so the target environment contains only `evaluationSuites[]`. Do not keep older `testSuites[]` or legacy `testCases[]` in the rewritten block.
## Evaluation-Suite Guidance
Use `tags` as a freeform key/value map on each evaluation suite. Suggested keys:
| Tag Key | Example Values | Typical Use |
|---------|----------------|-------------|
| `tier` | `smoke`, `regression`, `coverage` | Suggested run order / breadth |
| `purpose` | `baseline`, `safety`, `tools`, `quality`, `regression` | Why the suite exists |
| `stage` | `seed`, `traces`, `curated`, `prod` | Dataset lifecycle alignment |
Each evaluation suite should point to one dataset and one or more evaluators with explicit thresholds. Store `dataset` as the stable Foundry dataset name (without the `-vN` suffix), store the version separately in `datasetVersion`, and keep the local cache filename versioned (for example, `...-v3.jsonl`). Persist the local `datasetFile` and remote `datasetUri` together so every evaluation suite can resolve both the cache artifact and the Foundry-registered dataset. Add a `tags` map to each suite (for example, `tier: smoke`, `purpose: baseline`) so workflows can group or filter suites without a fixed priority enum. Local dataset filenames should start with the selected environment's Foundry `agentName` from the selected metadata file, followed by stage and version suffixes, so related cache files stay grouped by agent. If `agentName` already encodes the environment (for example, `support-agent-dev`), do not append the environment key again. Keep `datasets/`, `evaluators/`, and `results/` shared at the `.foundry/` root even when multiple metadata files exist. Use evaluation-suite IDs in evaluation names, result folders, and regression summaries so the flow remains traceable.
## Sync Guidance
- Pull/refresh when the user asks, when the workflow detects missing local cache, or when remote versions clearly differ from local metadata.
- Push/register updates after the user confirms local changes that should be shared in Foundry.
- Record remote dataset names, versions, dataset URIs, and last sync timestamps in `.foundry/datasets/manifest.json` or the relevant metadata section.
auth-best-practices.md 6.5 KB
# Azure Authentication Best Practices
> Source: [Microsoft β Passwordless connections for Azure services](https://learn.microsoft.com/azure/developer/intro/passwordless-overview) and [Azure Identity client libraries](https://learn.microsoft.com/dotnet/azure/sdk/authentication/).
**Table of Contents:** [Golden Rule](#golden-rule) Β· [Authentication by Environment](#authentication-by-environment) Β· [Why Not DefaultAzureCredential in Production?](#why-not-defaultazurecredential-in-production) Β· [Production Patterns](#production-patterns) Β· [Local Development Setup](#local-development-setup) Β· [Environment-Aware Pattern](#environment-aware-pattern) Β· [Security Checklist](#security-checklist) Β· [Further Reading](#further-reading)
## Golden Rule
Use **managed identities** and **Azure RBAC** in production. Reserve `DefaultAzureCredential` for **local development only**.
## Authentication by Environment
| Environment | Recommended Credential | Why |
|---|---|---|
| **Production (Azure-hosted)** | `ManagedIdentityCredential` (system- or user-assigned) | No secrets to manage; auto-rotated by Azure |
| **Production (on-premises)** | `ClientCertificateCredential` or `WorkloadIdentityCredential` | Deterministic; no fallback chain overhead |
| **CI/CD pipelines** | `AzurePipelinesCredential` / `WorkloadIdentityCredential` | Scoped to pipeline identity |
| **Local development** | `DefaultAzureCredential` | Chains CLI, PowerShell, and VS Code credentials for convenience |
## Why Not `DefaultAzureCredential` in Production?
1. **Unpredictable fallback chain** β walks through multiple credential types, adding latency and making failures harder to diagnose.
2. **Broad surface area** β checks environment variables, CLI tokens, and other sources that should not exist in production.
3. **Non-deterministic** β which credential actually authenticates depends on the environment, making behavior inconsistent across deployments.
4. **Performance** β each failed credential attempt adds network round-trips before falling back to the next.
## Production Patterns
### .NET
```csharp
using Azure.Identity;
var credential = Environment.GetEnvironmentVariable("AZURE_FUNCTIONS_ENVIRONMENT") == "Development"
? new DefaultAzureCredential() // local dev β uses CLI/VS credentials
: new ManagedIdentityCredential(); // production β deterministic, no fallback chain
// For user-assigned identity: new ManagedIdentityCredential("<client-id>")
```
### TypeScript / JavaScript
```typescript
import { DefaultAzureCredential, ManagedIdentityCredential } from "@azure/identity";
const credential = process.env.NODE_ENV === "development"
? new DefaultAzureCredential() // local dev β uses CLI/VS credentials
: new ManagedIdentityCredential(); // production β deterministic, no fallback chain
// For user-assigned identity: new ManagedIdentityCredential("<client-id>")
```
### Python
```python
import os
from azure.identity import DefaultAzureCredential, ManagedIdentityCredential
credential = (
DefaultAzureCredential() # local dev β uses CLI/VS credentials
if os.getenv("AZURE_FUNCTIONS_ENVIRONMENT") == "Development"
else ManagedIdentityCredential() # production β deterministic, no fallback chain
)
# For user-assigned identity: ManagedIdentityCredential(client_id="<client-id>")
```
### Java
```java
import com.azure.identity.DefaultAzureCredentialBuilder;
import com.azure.identity.ManagedIdentityCredentialBuilder;
var credential = "Development".equals(System.getenv("AZURE_FUNCTIONS_ENVIRONMENT"))
? new DefaultAzureCredentialBuilder().build() // local dev β uses CLI/VS credentials
: new ManagedIdentityCredentialBuilder().build(); // production β deterministic, no fallback chain
// For user-assigned identity: new ManagedIdentityCredentialBuilder().clientId("<client-id>").build()
```
## Local Development Setup
`DefaultAzureCredential` is ideal for local dev because it automatically picks up credentials from developer tools:
1. **Azure CLI** β `az login`
2. **Azure Developer CLI** β `azd auth login`
3. **Azure PowerShell** β `Connect-AzAccount`
4. **Visual Studio / VS Code** β sign in via Azure extension
```typescript
import { DefaultAzureCredential } from "@azure/identity";
// Local development only β uses CLI/PowerShell/VS Code credentials
const credential = new DefaultAzureCredential();
```
## Environment-Aware Pattern
Detect the runtime environment and select the appropriate credential. The key principle: use `DefaultAzureCredential` only when running locally, and a specific credential in production.
> **Tip:** Azure Functions sets `AZURE_FUNCTIONS_ENVIRONMENT` to `"Development"` when running locally. For App Service or containers, use any environment variable you control (e.g. `NODE_ENV`, `ASPNETCORE_ENVIRONMENT`).
```typescript
import { DefaultAzureCredential, ManagedIdentityCredential } from "@azure/identity";
function getCredential() {
if (process.env.NODE_ENV === "development") {
return new DefaultAzureCredential(); // picks up az login / VS Code creds
}
return process.env.AZURE_CLIENT_ID
? new ManagedIdentityCredential(process.env.AZURE_CLIENT_ID) // user-assigned
: new ManagedIdentityCredential(); // system-assigned
}
```
## Security Checklist
- [ ] Use managed identity for all Azure-hosted apps
- [ ] Never hardcode credentials, connection strings, or keys
- [ ] Apply least-privilege RBAC roles at the narrowest scope
- [ ] Use `ManagedIdentityCredential` (not `DefaultAzureCredential`) in production
- [ ] Store any required secrets in Azure Key Vault
- [ ] Rotate secrets and certificates on a schedule
- [ ] Enable Microsoft Defender for Cloud on production resources
## Further Reading
- [Passwordless connections overview](https://learn.microsoft.com/azure/developer/intro/passwordless-overview)
- [Managed identities overview](https://learn.microsoft.com/entra/identity/managed-identities-azure-resources/overview)
- [Azure RBAC overview](https://learn.microsoft.com/azure/role-based-access-control/overview)
- [.NET authentication guide](https://learn.microsoft.com/dotnet/azure/sdk/authentication/)
- [Python identity library](https://learn.microsoft.com/python/api/overview/azure/identity-readme)
- [JavaScript identity library](https://learn.microsoft.com/javascript/api/overview/azure/identity-readme)
- [Java identity library](https://learn.microsoft.com/java/api/overview/azure/identity-readme)
standard-agent-setup.md 3.6 KB
# Standard Agent Setup
> **MANDATORY:** Read [Standard Agent Setup docs](https://learn.microsoft.com/en-us/azure/foundry/agents/concepts/standard-agent-setup?view=foundry) before proceeding with standard setup.
## Overview
Azure AI Foundry supports two agent setup configurations:
| Setup | Capability Host | Description |
|-------|----------------|-------------|
| **Basic** | None | Default setup. All resources are Microsoft-managed. No additional connections required. |
| **Standard** | Azure AI Services | Advanced setup. Bring-your-own storage and search connections for full control over data residency and scaling. |
## Standard Setup Connections
| Connection | Service | Required | Purpose |
|------------|---------|----------|---------|
| Thread storage | Azure Cosmos DB | β
Yes | Store conversation threads in your own Cosmos DB instance |
| File storage | Azure Storage | β
Yes | Store uploaded files in your own Azure Storage account |
| Vector store | Azure AI Search | β
Yes | Use your own Azure AI Search instance for vector/knowledge retrieval |
| Azure AI Services | Azure AI Services | β Optional | Use OpenAI models from a different AI Services resource |
> π‘ **Tip:** Standard setup is recommended for production workloads that require control over data storage, custom vector search, or integration with models from a separate AI Services resource.
## Prerequisites
Before starting deployment, confirm the following with the user:
1. **RBAC role on the resource group:** The user must have **Owner** or **User Access Administrator** role on the target resource group. The Bicep template assigns RBAC roles (Storage Blob Data Contributor, Cosmos DB Operator, AI Search roles) to the project's managed identity β this will fail without `Microsoft.Authorization/roleAssignments/write` permission.
2. **Subscription quota:** Verify the target region has available quota for AI Services. If quota is exhausted, try an alternate region (e.g., `swedencentral`, `eastus`, `westus3`).
3. **Azure Policy compliance:** Some subscriptions enforce policies (e.g., storage accounts must disable public network access). If the Bicep template fails due to policy violations, patch the template to comply (e.g., set `publicNetworkAccess: 'Disabled'` and `defaultAction: 'Deny'` on the storage account).
## Deployment
- Standard setup always creates a **new Foundry resource and a new project**. Do not ask the user for a project endpoint β one will be provisioned as part of the deployment.
- **Always use the official Bicep template:**
[Standard Agent Setup Bicep Template](https://github.com/azure-ai-foundry/foundry-samples/blob/main/infrastructure/infrastructure-setup-bicep/43-standard-agent-setup-with-customization/main.bicep)
> β οΈ **Warning:** Capability host provisioning is **asynchronous** and can take 10β20 minutes. After deploying the Bicep template, you **must poll** the deployment status until it succeeds. Do not assume the setup is complete immediately.
## Post-Deployment: Model & Agent
After infrastructure provisioning succeeds:
1. **Deploy a model** to the new AI Services account (e.g., `gpt-4o`). If `GlobalStandard` SKU quota is exhausted, fall back to `Standard` SKU.
2. **Create the agent** using MCP tools (`agent_update`) or the Python SDK (`client.agents.create_version`). See [SDK Operations](../foundry-agent/create/references/sdk-operations.md) for details.
## References
- [Capability Hosts β Agent Setup Types](https://learn.microsoft.com/en-us/azure/ai-foundry/agents/concepts/capability-hosts?view=foundry)
- [Standard Agent Setup](https://learn.microsoft.com/en-us/azure/foundry/agents/concepts/standard-agent-setup?view=foundry)
foundry-sdk-py.md 8.4 KB
# Microsoft Foundry - Python SDK Guide
Python-specific implementations for working with Microsoft Foundry.
**Table of Contents:** [Prerequisites](#prerequisites) Β· [Model Discovery and Deployment](#model-discovery-and-deployment-mcp) Β· [RAG Agent with Azure AI Search](#rag-agent-with-azure-ai-search) Β· [Creating Agents](#creating-agents) Β· [Agent Evaluation](#agent-evaluation) Β· [Knowledge Index Operations](#knowledge-index-operations-mcp) Β· [Best Practices](#best-practices) Β· [Error Handling](#error-handling)
## Prerequisites
```bash
pip install azure-ai-projects azure-identity azure-ai-inference openai azure-ai-evaluation python-dotenv
```
### Environment Variables
```bash
PROJECT_ENDPOINT=https://<resource>.services.ai.azure.com/api/projects/<project>
MODEL_DEPLOYMENT_NAME=gpt-4o
AZURE_AI_SEARCH_CONNECTION_NAME=my-search-connection
AI_SEARCH_INDEX_NAME=my-index
AZURE_OPENAI_ENDPOINT=https://<resource>.openai.azure.com
AZURE_OPENAI_DEPLOYMENT=gpt-4o
```
## Model Discovery and Deployment (MCP)
```python
foundry_models_list() # All models
foundry_models_list(publisher="OpenAI") # Filter by publisher
foundry_models_list(search_for_free_playground=True) # Free playground models
foundry_models_deploy(
resource_group="my-rg", deployment="gpt-4o-deployment",
model_name="gpt-4o", model_format="OpenAI",
azure_ai_services="my-foundry-resource",
model_version="2024-05-13", sku_capacity=10, scale_type="Standard"
)
```
## RAG Agent with Azure AI Search
> **Auth:** `DefaultAzureCredential` is for local development. See [auth-best-practices.md](../auth-best-practices.md) for production patterns.
```python
import os
from azure.ai.projects import AIProjectClient
from azure.identity import DefaultAzureCredential
from azure.ai.agents.models import (
AzureAISearchToolDefinition, AzureAISearchToolResource,
AISearchIndexResource, AzureAISearchQueryType,
)
project_client = AIProjectClient(
endpoint=os.environ["FOUNDRY_PROJECT_ENDPOINT"],
credential=DefaultAzureCredential(),
)
azs_connection = project_client.connections.get(
os.environ["AZURE_AI_SEARCH_CONNECTION_NAME"]
)
agent = project_client.agents.create_agent(
model=os.environ["FOUNDRY_MODEL_DEPLOYMENT_NAME"],
name="RAGAgent",
instructions="You are a helpful assistant. Use the knowledge base to answer. "
"Provide citations as: `[message_idx:search_idxβ source]`.",
tools=[AzureAISearchToolDefinition(
azure_ai_search=AzureAISearchToolResource(indexes=[
AISearchIndexResource(
index_connection_id=azs_connection.id,
index_name=os.environ["AI_SEARCH_INDEX_NAME"],
query_type=AzureAISearchQueryType.HYBRID,
),
])
)],
)
```
### Querying a RAG Agent (Streaming)
```python
openai_client = project_client.get_openai_client()
stream = openai_client.responses.create(
stream=True, tool_choice="required", input="Your question here",
extra_body={"agent": {"name": agent.name, "type": "agent_reference"}},
)
for event in stream:
if event.type == "response.output_text.delta":
print(event.delta, end="", flush=True)
elif event.type == "response.output_item.done":
if event.item.type == "message" and event.item.content[-1].type == "output_text":
for ann in event.item.content[-1].annotations:
if ann.type == "url_citation":
print(f"\nCitation: {ann.url}")
```
## Creating Agents
### Basic Agent
```python
agent = project_client.agents.create_agent(
model=os.environ["MODEL_DEPLOYMENT_NAME"],
name="my-agent",
instructions="You are a helpful assistant.",
)
```
### Agent with Custom Function Tools
```python
from azure.ai.agents.models import FunctionTool, ToolSet
def get_weather(location: str, unit: str = "celsius") -> str:
"""Get the current weather for a location."""
return f"Sunny and 22Β°{unit[0].upper()} in {location}"
functions = FunctionTool([get_weather])
toolset = ToolSet()
toolset.add(functions)
agent = project_client.agents.create_agent(
model=os.environ["MODEL_DEPLOYMENT_NAME"],
name="function-agent",
instructions="You are a helpful assistant with tool access.",
toolset=toolset,
)
```
### Agent with Web Search
```python
from azure.ai.projects.models import (
PromptAgentDefinition, WebSearchPreviewTool, ApproximateLocation,
)
agent = project_client.agents.create_version(
agent_name="WebSearchAgent",
definition=PromptAgentDefinition(
model=os.environ["MODEL_DEPLOYMENT_NAME"],
instructions="Search the web for current information. Provide sources.",
tools=[
WebSearchPreviewTool(
user_location=ApproximateLocation(
country="US", city="Seattle", region="Washington"
)
)
],
),
)
```
> π‘ **Tip:** `WebSearchPreviewTool` requires no external resource or connection. For Bing Grounding (which requires a dedicated Bing resource and project connection), see [Bing Grounding reference](../../foundry-agent/create/references/tool-bing-grounding.md).
### Interacting with Agents
```python
from azure.ai.agents.models import ListSortOrder
thread = project_client.agents.threads.create()
project_client.agents.messages.create(thread_id=thread.id, role="user", content="Hello")
run = project_client.agents.runs.create_and_process(thread_id=thread.id, agent_id=agent.id)
if run.status == "failed":
print(f"Run failed: {run.last_error}")
messages = project_client.agents.messages.list(thread_id=thread.id, order=ListSortOrder.ASCENDING)
for msg in messages:
if msg.text_messages:
print(f"{msg.role}: {msg.text_messages[-1].text.value}")
project_client.agents.delete_agent(agent.id)
```
## Agent Evaluation
### Single Response Evaluation (MCP)
```python
foundry_agents_query_and_evaluate(
agent_id="<agent-id>", query="What's the weather?",
endpoint="https://my-foundry.services.ai.azure.com/api/projects/my-project",
azure_openai_endpoint="https://my-openai.openai.azure.com",
azure_openai_deployment="gpt-4o",
evaluators="intent_resolution,task_adherence,tool_call_accuracy"
)
foundry_agents_evaluate(
query="What's the weather?", response="Sunny and 22Β°C.",
evaluator="intent_resolution",
azure_openai_endpoint="https://my-openai.openai.azure.com",
azure_openai_deployment="gpt-4o"
)
```
### Batch Evaluation
```python
from azure.ai.evaluation import AIAgentConverter, IntentResolutionEvaluator, evaluate
converter = AIAgentConverter(project_client)
converter.prepare_evaluation_data(thread_ids=["t1", "t2", "t3"], filename="eval_data.jsonl")
result = evaluate(
data="eval_data.jsonl",
evaluators={
"intent_resolution": IntentResolutionEvaluator(
azure_openai_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
azure_openai_deployment=os.environ["AZURE_OPENAI_DEPLOYMENT"]
),
},
output_path="./eval_results"
)
print(f"Results: {result['studio_url']}")
```
> π‘ **Tip:** Continuous evaluation requires project managed identity with **Azure AI User** role and Application Insights connected to the project.
## Knowledge Index Operations (MCP)
```python
foundry_knowledge_index_list(endpoint="<project-endpoint>")
foundry_knowledge_index_schema(endpoint="<project-endpoint>", index="my-index")
```
## Best Practices
1. **Never hardcode credentials** β use environment variables and `python-dotenv`
2. **Check `run.status`** and handle `HttpResponseError` exceptions
3. **Reuse `AIProjectClient`** instances β don't create new ones per request
4. **Use type hints** in custom functions for better tool integration
5. **Use context managers** for agent cleanup
## Error Handling
```python
from azure.core.exceptions import HttpResponseError
try:
agent = project_client.agents.create_agent(
model=os.environ["MODEL_DEPLOYMENT_NAME"],
name="my-agent", instructions="You are helpful."
)
except HttpResponseError as e:
if e.status_code == 429:
print("Rate limited β wait and retry with exponential backoff.")
elif e.status_code == 401:
print("Authentication failed β check credentials.")
else:
print(f"Error: {e.message}")
```
### Context Manager for Agent Cleanup
```python
from contextlib import contextmanager
@contextmanager
def temporary_agent(project_client, **kwargs):
agent = project_client.agents.create_agent(**kwargs)
try:
yield agent
finally:
project_client.agents.delete_agent(agent.id)
```
create-foundry-resource.md 6.1 KB
---
name: microsoft-foundry:resource/create
description: |
Create Azure AI Services multi-service resource (Foundry resource) using Azure CLI.
USE FOR: create Foundry resource, new AI Services resource, create multi-service resource, provision Azure AI Services, AIServices kind resource, register resource provider, enable Cognitive Services, setup AI Services account, create resource group for Foundry.
DO NOT USE FOR: creating ML workspace hubs (use microsoft-foundry:project/create), deploying models (use microsoft-foundry:models/deploy), managing permissions (use microsoft-foundry:rbac), monitoring resource usage (use microsoft-foundry:quota).
compatibility:
required:
- azure-cli: ">=2.0"
optional:
- powershell: ">=7.0"
- azure-portal: "any"
---
# Create Foundry Resource
This sub-skill orchestrates creation of Azure AI Services multi-service resources using Azure CLI.
> **Important:** All resource creation operations are **control plane (management)** operations. Use **Azure CLI commands** as the primary method.
> **Note:** For monitoring resource usage and quotas, use the `microsoft-foundry:quota` skill.
**Table of Contents:** [Quick Reference](#quick-reference) Β· [When to Use](#when-to-use) Β· [Prerequisites](#prerequisites) Β· [Core Workflows](#core-workflows) Β· [Important Notes](#important-notes) Β· [Additional Resources](#additional-resources)
## Quick Reference
| Property | Value |
|----------|-------|
| **Classification** | WORKFLOW SKILL |
| **Operation Type** | Control Plane (Management) |
| **Primary Method** | Azure CLI: `az cognitiveservices account create` |
| **Resource Type** | `Microsoft.CognitiveServices/accounts` (kind: `AIServices`) |
| **Resource Kind** | `AIServices` (multi-service) |
## When to Use
Use this sub-skill when you need to:
- **Create Foundry resource** - Provision new Azure AI Services multi-service account
- **Create resource group** - Set up resource group before creating resources
- **Register resource provider** - Enable Microsoft.CognitiveServices provider
- **Manual resource creation** - CLI-based resource provisioning
**Do NOT use for:**
- Creating ML workspace hubs/projects (use `microsoft-foundry:project/create`)
- Deploying AI models (use `microsoft-foundry:models/deploy`)
- Managing RBAC permissions (use `microsoft-foundry:rbac`)
- Monitoring resource usage (use `microsoft-foundry:quota`)
## Prerequisites
- **Azure subscription** - Active subscription ([create free account](https://azure.microsoft.com/pricing/purchase-options/azure-account))
- **Azure CLI** - Version 2.0 or later installed
- **Authentication** - Run `az login` before commands
- **RBAC roles** - One of:
- Contributor
- Owner
- Custom role with `Microsoft.CognitiveServices/accounts/write`
- **Resource provider** - `Microsoft.CognitiveServices` must be registered in your subscription
- If not registered, see [Workflow #3: Register Resource Provider](#3-register-resource-provider)
- If you lack permissions, ask a subscription Owner/Contributor to register it or grant you `/register/action` privilege
> **Need RBAC help?** See [microsoft-foundry:rbac](../../rbac/rbac.md) for permission management.
## Core Workflows
### 1. Create Resource Group
**Command Pattern:** "Create a resource group for my Foundry resources"
#### Steps
1. **Ask user preference**: Use existing or create new resource group
2. **If using existing**: List and let user select from available groups (0-4: show all, 5+: show 5 most recent with "Other" option)
3. **If creating new**: Ask user to choose region, then create
```bash
# List existing resource groups
az group list --query "[-5:].{Name:name, Location:location}" --out table
# Or create new
az group create --name <rg-name> --location <location>
az group show --name <rg-name> --query "{Name:name, Location:location, State:properties.provisioningState}"
```
See [Detailed Workflow Steps](./references/workflows.md) for complete instructions.
---
### 2. Create Foundry Resource
**Command Pattern:** "Create a new Azure AI Services resource"
#### Steps
1. **Verify prerequisites**: Check Azure CLI, authentication, and provider registration
2. **Choose location**: Always ask user to select region (don't assume resource group location)
3. **Create resource**: Use `--kind AIServices` and `--sku S0` (only supported tier)
4. **Verify and get keys**
```bash
# Create Foundry resource
az cognitiveservices account create \
--name <resource-name> \
--resource-group <rg> \
--kind AIServices \
--sku S0 \
--location <location> \
--yes
# Verify and get keys
az cognitiveservices account show --name <resource-name> --resource-group <rg>
az cognitiveservices account keys list --name <resource-name> --resource-group <rg>
```
**Important:** S0 (Standard) is the only supported SKU - F0 free tier not available for AIServices.
See [Detailed Workflow Steps](./references/workflows.md) for complete instructions.
---
### 3. Register Resource Provider
**Command Pattern:** "Register Cognitive Services provider"
Required when first creating Cognitive Services in subscription or if you get `ResourceProviderNotRegistered` error.
```bash
# Register provider (requires Owner/Contributor role)
az provider register --namespace Microsoft.CognitiveServices
az provider show --namespace Microsoft.CognitiveServices --query "registrationState"
```
If you lack permissions, ask a subscription Owner/Contributor to register it or use `microsoft-foundry:rbac` skill.
See [Detailed Workflow Steps](./references/workflows.md) for complete instructions.
---
## Important Notes
- **Resource kind must be `AIServices`** for multi-service Foundry resources
- **SKU must be S0** (Standard) - F0 free tier not available for AIServices
- Always ask user to choose location - different regions may have varying availability
---
## Additional Resources
- [Common Patterns](./references/patterns.md) - Quick setup patterns and command reference
- [Troubleshooting](./references/troubleshooting.md) - Common errors and solutions
- [Azure AI Services documentation](https://learn.microsoft.com/en-us/azure/ai-services/multi-service-resource?pivots=azcli)
patterns.md 3.9 KB
# Common Patterns: Create Foundry Resource
**Table of Contents:** [Pattern A: Quick Setup](#pattern-a-quick-setup) Β· [Pattern B: Multi-Region Setup](#pattern-b-multi-region-setup) Β· [Quick Commands Reference](#quick-commands-reference)
## Pattern A: Quick Setup
Complete setup in one go:
```bash
# Ask user: "Use existing resource group or create new?"
# ==== If user chooses "Use existing" ====
# Count and list existing resource groups
TOTAL_RG_COUNT=$(az group list --query "length([])" -o tsv)
az group list --query "[-5:].{Name:name, Location:location}" --out table
# Based on count: show appropriate list and options
# User selects resource group
RG="<selected-rg-name>"
# Fetch details to verify
az group show --name $RG --query "{Name:name, Location:location, State:properties.provisioningState}"
# Then skip to creating Foundry resource below
# ==== If user chooses "Create new" ====
# List regions and ask user to choose
az account list-locations --query "[].{Region:name}" --out table
# Variables
RG="rg-ai-services" # New resource group name
LOCATION="westus2" # User's chosen location
RESOURCE_NAME="my-foundry-resource"
# Create new resource group
az group create --name $RG --location $LOCATION
# Verify creation
az group show --name $RG --query "{Name:name, Location:location, State:properties.provisioningState}"
# Create Foundry resource in user's chosen location
az cognitiveservices account create \
--name $RESOURCE_NAME \
--resource-group $RG \
--kind AIServices \
--sku S0 \
--location $LOCATION \
--yes
# Get endpoint and keys
echo "Resource created successfully!"
az cognitiveservices account show \
--name $RESOURCE_NAME \
--resource-group $RG \
--query "{Endpoint:properties.endpoint, Location:location}"
az cognitiveservices account keys list \
--name $RESOURCE_NAME \
--resource-group $RG
```
## Pattern B: Multi-Region Setup
Create resources in multiple regions:
```bash
# Variables
RG="rg-ai-services"
REGIONS=("eastus" "westus2" "westeurope")
# Create resource group
az group create --name $RG --location eastus
# Create resources in each region
for REGION in "${REGIONS[@]}"; do
RESOURCE_NAME="foundry-${REGION}"
echo "Creating resource in $REGION..."
az cognitiveservices account create \
--name $RESOURCE_NAME \
--resource-group $RG \
--kind AIServices \
--sku S0 \
--location $REGION \
--yes
echo "Resource $RESOURCE_NAME created in $REGION"
done
# List all resources
az cognitiveservices account list --resource-group $RG --output table
```
## Quick Commands Reference
```bash
# Count total resource groups to determine which scenario applies
az group list --query "length([])" -o tsv
# Check existing resource groups (up to 5 most recent)
# 0 β create new | 1-4 β select or create | 5+ β select/other/create
az group list --query "[-5:].{Name:name, Location:location}" --out table
# If 5+ resource groups exist and user selects "Other", show all
az group list --query "[].{Name:name, Location:location}" --out table
# If user selects existing resource group, fetch details to verify and get location
az group show --name <selected-rg-name> --query "{Name:name, Location:location, State:properties.provisioningState}"
# List available regions (for creating new resource group)
az account list-locations --query "[].{Region:name}" --out table
# Create resource group (if needed)
az group create --name rg-ai-services --location westus2
# Create Foundry resource
az cognitiveservices account create \
--name my-foundry-resource \
--resource-group rg-ai-services \
--kind AIServices \
--sku S0 \
--location westus2 \
--yes
# List resources in group
az cognitiveservices account list --resource-group rg-ai-services
# Get resource details
az cognitiveservices account show \
--name my-foundry-resource \
--resource-group rg-ai-services
# Delete resource
az cognitiveservices account delete \
--name my-foundry-resource \
--resource-group rg-ai-services
```
troubleshooting.md 2.4 KB
# Troubleshooting: Create Foundry Resource
## Resource Creation Failures
### ResourceProviderNotRegistered
**Solution:**
1. If you have Owner/Contributor role, register the provider:
```bash
az provider register --namespace Microsoft.CognitiveServices
```
2. If you lack permissions, ask a subscription Owner or Contributor to register it
3. Alternatively, ask them to grant you the `/register/action` privilege
### InsufficientPermissions
**Solution:**
```bash
# Check your role assignments
az role assignment list --assignee <your-user-id> --subscription <subscription-id>
# You need: Contributor, Owner, or custom role with Microsoft.CognitiveServices/accounts/write
```
Use `microsoft-foundry:rbac` skill to manage permissions.
### LocationNotAvailableForResourceType
**Solution:**
```bash
# List available regions for Cognitive Services
az provider show --namespace Microsoft.CognitiveServices \
--query "resourceTypes[?resourceType=='accounts'].locations" --out table
# Choose different region from the list
```
### ResourceNameNotAvailable
Resource name must be globally unique. Try adding a unique suffix:
```bash
UNIQUE_SUFFIX=$(date +%s)
az cognitiveservices account create \
--name "foundry-${UNIQUE_SUFFIX}" \
--resource-group <rg> \
--kind AIServices \
--sku S0 \
--location <location> \
--yes
```
## Resource Shows as Failed
**Check provisioning state:**
```bash
az cognitiveservices account show \
--name <resource-name> \
--resource-group <rg> \
--query "properties.provisioningState"
```
If `Failed`, delete and recreate:
```bash
# Delete failed resource
az cognitiveservices account delete \
--name <resource-name> \
--resource-group <rg>
# Recreate
az cognitiveservices account create \
--name <resource-name> \
--resource-group <rg> \
--kind AIServices \
--sku S0 \
--location <location> \
--yes
```
## Cannot Access Keys
**Error:** `AuthorizationFailed` when listing keys
**Solution:** You need `Cognitive Services User` or higher role on the resource.
Use `microsoft-foundry:rbac` skill to grant appropriate permissions.
## External Resources
- [Create multi-service resource](https://learn.microsoft.com/en-us/azure/ai-services/multi-service-resource?pivots=azcli)
- [Azure AI Services documentation](https://learn.microsoft.com/en-us/azure/ai-services/)
- [Azure regions with AI Services](https://azure.microsoft.com/global-infrastructure/services/?products=cognitive-services)
workflows.md 6.7 KB
# Detailed Workflows: Create Foundry Resource
**Table of Contents:** [Workflow 1: Create Resource Group](#workflow-1-create-resource-group---detailed-steps) Β· [Workflow 2: Create Foundry Resource](#workflow-2-create-foundry-resource---detailed-steps) Β· [Workflow 3: Register Resource Provider](#workflow-3-register-resource-provider---detailed-steps)
## Workflow 1: Create Resource Group - Detailed Steps
### Step 1: Ask user preference
Ask the user which option they prefer:
1. Use an existing resource group
2. Create a new resource group
### Step 2a: If user chooses "Use existing resource group"
Count and list existing resource groups:
```bash
# Count total resource groups
TOTAL_RG_COUNT=$(az group list --query "length([])" -o tsv)
# Get list of resource groups (up to 5 most recent)
az group list --query "[-5:].{Name:name, Location:location}" --out table
```
**Handle based on count:**
**If 0 resources found:**
- Inform user: "No existing resource groups found"
- Ask if they want to create a new one, then proceed to Step 2b
**If 1-4 resources found:**
- Display all X resource groups to the user
- Let user select from the list
- Fetch the selected resource group details:
```bash
az group show --name <selected-rg-name> --query "{Name:name, Location:location, State:properties.provisioningState}"
```
- Display details to user, then proceed to create Foundry resource
**If 5+ resources found:**
- Display the 5 most recent resource groups
- Present options:
1. Select from the 5 displayed
2. Other (see all resource groups)
- If user selects a resource group, fetch details:
```bash
az group show --name <selected-rg-name> --query "{Name:name, Location:location, State:properties.provisioningState}"
```
- If user chooses "Other", show all:
```bash
az group list --query "[].{Name:name, Location:location}" --out table
```
Then let user select, and fetch details as above
- Display details to user, then proceed to create Foundry resource
### Step 2b: If user chooses "Create new resource group"
1. List available Azure regions:
```bash
az account list-locations --query "[].{Region:name}" --out table
```
Common regions:
- `eastus`, `eastus2` - US East Coast
- `westus`, `westus2`, `westus3` - US West Coast
- `centralus` - US Central
- `westeurope`, `northeurope` - Europe
- `southeastasia`, `eastasia` - Asia Pacific
2. Ask user to choose a region from the list above
3. Create resource group in the chosen region:
```bash
az group create \
--name <resource-group-name> \
--location <user-chosen-location>
```
4. Verify creation:
```bash
az group show --name <resource-group-name> --query "{Name:name, Location:location, State:properties.provisioningState}"
```
Expected output: `State: "Succeeded"`
## Workflow 2: Create Foundry Resource - Detailed Steps
### Step 1: Verify prerequisites
```bash
# Check Azure CLI version (need 2.0+)
az --version
# Verify authentication
az account show
# Check resource provider registration status
az provider show --namespace Microsoft.CognitiveServices --query "registrationState"
```
If provider not registered, see Workflow #3: Register Resource Provider.
### Step 2: Choose location
**Always ask the user to choose a location.** List available regions and let the user select:
```bash
# List available regions for Cognitive Services
az account list-locations --query "[].{Region:name, DisplayName:displayName}" --out table
```
Common regions for AI Services:
- `eastus`, `eastus2` - US East Coast
- `westus`, `westus2`, `westus3` - US West Coast
- `centralus` - US Central
- `westeurope`, `northeurope` - Europe
- `southeastasia`, `eastasia` - Asia Pacific
> **Important:** Do not automatically use the resource group's location. Always ask the user which region they prefer.
### Step 3: Create Foundry resource
```bash
az cognitiveservices account create \
--name <resource-name> \
--resource-group <rg> \
--kind AIServices \
--sku S0 \
--location <location> \
--yes
```
**Parameters:**
- `--name`: Unique resource name (globally unique across Azure)
- `--resource-group`: Existing resource group name
- `--kind`: **Must be `AIServices`** for multi-service resource
- `--sku`: Must be **S0** (Standard - the only supported tier for AIServices)
- `--location`: Azure region (**always ask user to choose** from available regions)
- `--yes`: Auto-accept terms without prompting
### Step 4: Verify resource creation
```bash
# Check resource details to verify creation
az cognitiveservices account show \
--name <resource-name> \
--resource-group <rg>
# View endpoint and configuration
az cognitiveservices account show \
--name <resource-name> \
--resource-group <rg> \
--query "{Name:name, Endpoint:properties.endpoint, Location:location, Kind:kind, SKU:sku.name}"
```
Expected output:
- `provisioningState: "Succeeded"`
- Endpoint URL
- SKU: S0
- Kind: AIServices
### Step 5: Get access keys
```bash
az cognitiveservices account keys list \
--name <resource-name> \
--resource-group <rg>
```
This returns `key1` and `key2` for API authentication.
## Workflow 3: Register Resource Provider - Detailed Steps
### When Needed
Required when:
- First time creating Cognitive Services in subscription
- Error: `ResourceProviderNotRegistered`
- Insufficient permissions during resource creation
### Steps
**Step 1: Check registration status**
```bash
az provider show \
--namespace Microsoft.CognitiveServices \
--query "registrationState"
```
Possible states:
- `Registered`: Ready to use
- `NotRegistered`: Needs registration
- `Registering`: Registration in progress
**Step 2: Register provider**
```bash
az provider register --namespace Microsoft.CognitiveServices
```
**Step 3: Wait for registration**
Registration typically takes 1-2 minutes. Check status:
```bash
az provider show \
--namespace Microsoft.CognitiveServices \
--query "registrationState"
```
Wait until state is `Registered`.
**Step 4: Verify registration**
```bash
az provider list --query "[?namespace=='Microsoft.CognitiveServices']"
```
### Required Permissions
To register a resource provider, you need one of:
- **Subscription Owner** role
- **Contributor** role
- **Custom role** with `Microsoft.*/register/action` permission
**If you are not the subscription owner:**
1. Ask someone with the **Owner** or **Contributor** role to register the provider for you
2. Alternatively, ask them to grant you the `/register/action` privilege so you can register it yourself
**Alternative registration methods:**
- **Azure CLI** (recommended): `az provider register --namespace Microsoft.CognitiveServices`
- **Azure Portal**: Navigate to Subscriptions β Resource providers β Microsoft.CognitiveServices β Register
- **PowerShell**: `Register-AzResourceProvider -ProviderNamespace Microsoft.CognitiveServices`
private-network.md 5.9 KB
---
name: private-network
description: "Answer questions about and deploy Microsoft Foundry with network isolation. Covers BYO VNet, Managed VNet, hybrid patterns, private endpoints, and Bicep deployment. WHEN: 'Foundry networking', 'BYO VNet vs managed VNet', 'deploy Foundry in private VNet', 'private endpoints for Foundry'. DO NOT USE FOR: generic Azure networking without Foundry."
license: MIT
allowed-tools: Read, Write, Bash, AskUserQuestion, microsoft_docs_search, microsoft_docs_fetch
---
# Microsoft Foundry Private Networking
## Quick Reference
| Property | Value |
|----------|-------|
| **Best for** | Foundry with VNet isolation, private endpoints, subnet delegation, APIM + Foundry, VPN/Bastion access |
| **Tools** | Azure CLI |
| **MCP Tools** | `AskUserQuestion` - ask user questions; `microsoft_docs_search` - verify facts before presenting; `microsoft_docs_fetch` - fetch full Learn pages for validation |
| **Workflow** | Ground in Learn β Gather β Plan β Scaffold β Validate β Deploy β Test |
### Key Documentation
| Topic | URL |
|-------|-----|
| Network isolation | https://learn.microsoft.com/azure/ai-foundry/how-to/configure-private-link |
| Agent Service VNet | https://learn.microsoft.com/azure/ai-services/agents/how-to/virtual-networks |
| Managed VNet | https://learn.microsoft.com/azure/ai-foundry/how-to/configure-managed-network |
| Feature limitations | https://learn.microsoft.com/azure/foundry/how-to/configure-private-link#foundry-feature-limitations |
## When to Use
- User asks about Foundry networking, private endpoints, or VNet isolation
- User asks about BYO VNet, Managed VNet, or hybrid patterns
- User wants to deploy Foundry agents in a private network
- User needs APIM integration with private Foundry agents
**Do NOT use for:**
- Public Foundry setup without VNet β use [project/create](../../project/create/create-foundry-project.md)
- Bare Foundry resource without networking β use [resource/create](../create/create-foundry-resource.md)
---
## Step 0 β Ground in Microsoft Learn
Use `microsoft_docs_fetch` to get docs from Key Documentation sources.
Use `microsoft_docs_search` to verify any technical fact before presenting it to the user. If Learn contradicts a reference file, **Learn wins**. Cite the URL. If Learn doesn't cover it, say so β do not invent facts, limits, flags, or compatibility claims.
---
## End-to-End Deployment Workflow
> **Important:** All following steps are mandatory. Communicate the plan with the user before acting.
## Step 1 β Gather Requirements
Read [references/intake.md](references/intake.md). One pass, three tiers:
- **Tier 1 (Core):** Subscription, VNet model, agents, region, RG, VNet β determine approach at the end
- **Tier 2 (Architecture):** DNS, topology, NSG, on-prem, identity, BYO resources
- **Tier 3 (Enterprise):** Model, client access, auth, policies, monitoring
Determine the approach (official template / adapt closest / extend userβs IaC) at the end of Tier 1. Continue through Tiers 2β3.
---
## Step 2 β Plan Generation
Use the confirmed requirements from [references/intake.md](references/intake.md).
**OFFICIAL path:** Load the template's README from its GitHub URL (via [references/template-index.md](references/template-index.md)). Run `microsoft_docs_search` for its prerequisites. Present a deployment plan using the user's actual values.
**ADAPT path:** Load the closest template's README. Present a deployment plan highlighting what will be modified from the base template.
**EXTEND path:** Load [references/custom-template-adaptation.md](references/custom-template-adaptation.md). Read the user's existing template. Follow the gap analysis framework to present what's covered, what's missing, and any issues. Get approval before modifying.
Get confirmation before proceeding.
---
## Step 3 β Scaffold & Parameterize
Read [references/scaffold.md](references/scaffold.md).
---
## Step 4 β Pre-Deployment Validation
Catch blockers **before** deploying. These checks apply to all paths.
**Sovereign cloud:** Run `az cloud show --query name -o tsv`. If `AzureUSGovernment` or `AzureChinaCloud`, check whether the templates being used (official or user-provided) handle sovereign cloud endpoints. Official templates hardcode `core.windows.net` and Azure Public AAD endpoints.
**RBAC:** Verify deploying identity has Owner, or Contributor + User Access Administrator.
**Policy:** Run `az deployment group what-if`. Fix any violations before deploying.
**Quota:**
```bash
az cognitiveservices account list-skus --location <region> --kind AIServices -o table
```
**Provider Registrations:** `Microsoft.CognitiveServices`, `Microsoft.DocumentDB`, `Microsoft.Search`, `Microsoft.Network`.
**Feature Flags:** For Managed VNet β verify `AI.ManagedVnetPreview` is registered.
> Do NOT deploy until all pre-flight checks pass.
---
## Step 5 β Deploy & Track
**OFFICIAL / ADAPT path:** Read [references/deploy.md](references/deploy.md) for deployment command, monitoring, and error recovery.
**EXTEND path:** Deploy using the user's existing deployment workflow (their CLI commands, pipeline, or CI/CD). The monitoring and error recovery guidance in [references/deploy.md](references/deploy.md) still applies.
---
## Step 6 β Test & Validate
Read [references/post-deployment-validation.md](references/post-deployment-validation.md). These checks apply to all paths β PE verification, RBAC audit, `publicNetworkAccess` audit, and end-to-end agent test work regardless of how the infrastructure was deployed.
If any test fails, run `microsoft_docs_search` for the error before attempting remediation.
---
## Error Handling
> β οΈ **Critical retry rule:** If a deployment fails after the capability host step starts, the agent subnet gets a `legionservicelink` that cannot be removed. On retry, always use a **new VNet name** β never reuse the same agent subnet. See [references/deploy.md](references/deploy.md).
For all other errors, check `microsoft_docs_search` for current remediation before acting.
custom-template-adaptation.md 1.6 KB
# Custom Template Adaptation
For the EXTEND path β when the user has existing Bicep or Terraform templates.
## Instructions
1. **Read** the user's existing template files. Understand the resource graph: what's defined, how resources reference each other, what naming conventions are used.
2. **Analyze** the template against the user's requirements (from [intake.md](intake.md)) and the Foundry private networking documentation validated in the intake step. Identify:
- Resources already present and correctly configured
- Resources present but misconfigured (wrong settings, missing properties)
- Resources missing entirely
- Dependency or wiring issues (e.g., PEs referencing wrong subnet, DNS zones not linked)
3. **Present** findings to the user as a gap analysis table: resource, status (β
present / β οΈ misconfigured / β missing), and what needs to change. Include any issues found.
4. **Propose** an end-to-end plan to address all gaps β ordered by dependency. Explain what will be added, what will be modified, and why. Never overwrite existing modules β add alongside and reference existing resources.
5. **Wait** for user approval before making any changes.
6. **Implement** the approved changes. After implementation, the flow continues to Step 4 (Pre-Deployment Validation) in the main workflow.
## Retry Safety
> β οΈ If a deployment fails after the capability host step starts, Azure Container Apps leaves a `legionservicelink` service association on the agent subnet that **cannot be removed**. On retry, use a **new subnet or new VNet** β never reuse the same agent subnet.
deploy.md 2.9 KB
# Deploy & Track
Applies to all private network deployments.
## Deploy
```bash
az deployment group create \
--resource-group <rg> \
--template-file main.bicep \
--parameters main.bicepparam \
--name <deployment-name>
```
> β οΈ Capability host provisioning is **asynchronous** (10β20 min). The CLI produces no output during this phase.
## Monitor Progress
Use exponential backoff β do NOT poll every 30 seconds.
| Poll | Wait |
|------|------|
| 1st | 1 min after deploy starts |
| 2nd | 3 min after 1st |
| 3rd | 5 min after 2nd |
| 4th+ | Every 5 min |
```bash
# Overall state
az deployment group show \
--resource-group <rg> --name <deployment-name> \
--query "{state:properties.provisioningState,error:properties.error}" -o json
# Per-resource progress
az deployment operation group list \
--resource-group <rg> --name <deployment-name> \
--query "[].{resource:properties.targetResource.resourceType,state:properties.provisioningState}" -o table
```
Or block with timeout:
```bash
az deployment group wait \
--resource-group <rg> --name <deployment-name> \
--created --timeout 1800
```
## Error Recovery
When a deployment fails, follow this workflow:
### Step 1 β Identify the error
```bash
az deployment operation group list \
--resource-group <rg> \
--name <deployment-name> \
--query "[?properties.provisioningState=='Failed'].{resource:properties.targetResource.resourceType,error:properties.statusMessage}" \
-o json
```
### Step 2 β Resolve
Use `microsoft_docs_search` with the error code or message to find current remediation. The legionservicelink retry rule is documented in the main workflow's Error Handling section.
| Error | Likely cause | Fix |
|-------|-------------|-----|
| `legionservicelink` / subnet in use | Orphaned service link from prior attempt | Use a new `vnetName` β do not reuse the prior VNet |
| `AuthorizationFailed` on `validate/action` | Missing Contributor role | Assign Contributor + User Access Administrator to deploying identity |
| `SubnetDelegationAlreadyExists` | Agent subnet already delegated to another resource | Use a new VNet or open a support ticket to remove the delegation |
| `disableLocalAuth` policy violation | Template defaults to `false` | Set `disableLocalAuth: true` in Bicep params |
| `defaultOutboundAccess` policy violation | Subnets missing the property | Add `defaultOutboundAccess: false` to subnet properties |
### Step 3 β Present fix to user and get approval
Before re-deploying, show the user:
- What failed and why
- What file/parameter will be changed
- The new `vnetName` to use (must be different from the failed run)
### Step 4 β Re-deploy with a new deployment name
```bash
# Update main.bicepparam: change vnetName to a new unique name
az deployment group create \
--resource-group <rg> \
--template-file main.bicep \
--parameters main.bicepparam \
--name <deployment-name>-retry
```
end-to-end-test.md 4.1 KB
# End-to-End Test (VNet Access Required)
Continues from [post-deployment-validation.md](post-deployment-validation.md). Steps 1β3 there must be complete first.
## 4. VNet Access Setup
> β οΈ The remaining tests require connectivity to the VNet.
Use `AskUserQuestion`: **"Steps 1-3 are done. The remaining tests need VNet access. How do you want to proceed?"**
Options:
- `I have a Bastion VM / jump box`
- `Set up a point-to-site VPN for me` β read [vpn-dns-setup.md](vpn-dns-setup.md)
- `I have VPN / ExpressRoute already`
- `Skip testing for now`
**Bastion VM:** User has direct access to all private endpoints from the VM. Setup is complete β do NOT proceed to Step 5.
---
## 5. End-to-End Test (VPN users only)
Three phases:
1. **Network** β DNS resolution + port 443 reachability
2. **Agent Lifecycle** β Create agent, thread, run, verify, cleanup
3. **Isolation Proof** β Repeat with VPN off β expect 403
> β οΈ Chromium browsers may bypass VPN DNS via Secure DNS (DoH). If portal shows "Error loading agents" but CLI works, disable Secure DNS.
### Requirements
```bash
pip install azure-ai-projects azure-identity azure-ai-agents
```
### Phase 1: Network Validation
Resolve DNS and test port 443 for all private endpoints. Substitute actual resource names from the deployment.
PowerShell:
```powershell
$endpoints = @(
'<ai-account>.services.ai.azure.com',
'<ai-account>.openai.azure.com',
'<ai-account>.cognitiveservices.azure.com',
'<cosmos-account>.documents.azure.com',
'<storage-account>.blob.core.windows.net',
'<search-service>.search.windows.net'
)
foreach ($h in $endpoints) {
$ip = (Resolve-DnsName $h | Where-Object {$_.IPAddress}).IPAddress
$reach = Test-NetConnection $h -Port 443 -WarningAction SilentlyContinue
Write-Host "$h -> $ip (reachable: $($reach.TcpTestSucceeded))"
}
```
Bash:
```bash
endpoints=(
'<ai-account>.services.ai.azure.com'
'<ai-account>.openai.azure.com'
'<ai-account>.cognitiveservices.azure.com'
'<cosmos-account>.documents.azure.com'
'<storage-account>.blob.core.windows.net'
'<search-service>.search.windows.net'
)
for h in "${endpoints[@]}"; do
ip=$(dig +short "$h" | tail -n1)
nc -z -w 3 "$h" 443 >/dev/null 2>&1 && reach=yes || reach=no
echo "$h -> $ip (reachable: $reach)"
done
```
All should resolve to private IPs and be reachable.
Report results to the user (β
/β per endpoint) before proceeding to Phase 2.
### Phase 2: Agent Lifecycle Test
Create agent, thread, send message, verify response, cleanup. This exercises all 4 PEs (AI Services, Cosmos DB, Storage, AI Search).
```python
from azure.identity import DefaultAzureCredential
from azure.ai.projects import AIProjectClient
endpoint = "https://<ai-account>.services.ai.azure.com/api/projects/<project-name>"
client = AIProjectClient(endpoint=endpoint, credential=DefaultAzureCredential())
agents = client.agents
agent = agents.create_agent(model="<deployment-name>", name="vnet-test", instructions="Reply with 'OK'")
thread = agents.threads.create()
agents.messages.create(thread_id=thread.id, role="user", content="test")
run = agents.runs.create_and_process(thread_id=thread.id, agent_id=agent.id)
msgs = agents.messages.list(thread_id=thread.id)
print(f"Response: {msgs.data[0].content[0].text.value}")
agents.threads.delete(thread.id)
agents.delete_agent(agent.id)
```
Report results to the user (which PEs passed, any failures) before proceeding to Phase 3.
Ask user to disconnect VPN. Repeat Phase 2 β it should fail with 403. Report whether isolation is confirmed before proceeding to cross-check.
### Requirements Cross-Check
After testing, compare each requirement gathered in [intake.md](intake.md) against the deployed state. Flag any mismatches with remediation steps.
### Cleanup (VPN users only)
Ask if user wants to delete VPN Gateway (~$140/month) and DNS Resolver (~$180/month), or keep for ongoing access.
```bash
az network vnet-gateway delete --resource-group <rg> --name vpn-gateway-<suffix> --no-wait
az network dns-resolver delete --resource-group <rg> --name dns-resolver-<suffix> --yes
az network public-ip delete --resource-group <rg> --name vpn-gateway-pip-<suffix>
```
intake.md 6.1 KB
# Intake
Collect all inputs in one pass, tiered by priority. Extract implicit answers from the userβs message before asking. Use `AskUserQuestion` for unanswered items β batch related questions.
---
## Tier 1 β Core
### 1.0 Verify Subscription
Run:
```bash
az account show --query "{Name:name, Id:id, State:state}" -o table
```
Confirm with user. Switch if needed:
```bash
az account set --subscription "<name-or-id>"
```
### 1.1 Extract Known Answers
Scan the user's message before asking:
| User Says | Inferred |
|-----------|----------|
| "my existing VNet" / "my VNet" | BYO VNet |
| "managed virtual network" | Managed VNet |
| "user-assigned identity" / "UAI" | User-assigned identity |
| "APIM" / "API Management" | Needs APIM |
| "MCP servers on the VNet" | Needs MCP subnet |
| "I have a Bicep/Terraform template" | Extend existing IaC |
| "add Foundry to my existing infra" | Extend existing IaC |
### 1.2 Architecture Questions
For unanswered items, use `AskUserQuestion`:
**VNet model:** BYO VNet or Managed VNet (preview)?
**Agents:** Agent workloads, or just models/projects?
**Region:** Which Azure region? After answer, verify capacity:
```bash
az cognitiveservices account list-skus --location <region> --kind AIServices -o table
```
If empty, warn the user and suggest alternatives.
**Resource Group:** New or existing?
**VNet:** New or existing? If new: address space (default `192.168.0.0/16`), subnet CIDRs (agent `/24`, PE `/24`).
### 1.3 Determine Approach
Based on the answers collected, select one of three paths:
```
User has existing IaC they want to extend?
βββ Yes β EXTEND
β
βββ No β check template-index.md
βββ Template fits as-is β OFFICIAL
βββ Partial or no fit β ADAPT (start from closest template)
```
**OFFICIAL:** Load [template-index.md](template-index.md), fetch the best-fit README from GitHub. Present the match using the template's descriptive name.
**ADAPT:** Fetch the closest template's README. Explain what doesn't fit, present the delta, offer to adapt.
**EXTEND:** The user has existing Bicep/Terraform β no template selection needed yet. Continue to Tier 2.
Confirm the approach with the user before continuing to Tier 2.
---
## Tier 2 β Architecture
*Skip questions already answered or not applicable.*
### BYO VNet only
**Topology:** Standalone, hub-spoke, or Azure vWAN?
**On-prem connectivity:** VPN Gateway, ExpressRoute, or none?
**DNS:** Azure-provided, custom DNS resolver, or on-prem DNS forwarding?
**Address space:** Is `192.168.0.0/16` available, or use a specific range?
**NSG / Firewall:** Existing rules on the subnets?
**Deployment executor:** Where will post-deployment commands run? (VM, Bastion, VPN, Cloud Shell)
**Subscription scope:** Same subscription/tenant, cross-subscription, or cross-tenant?
**Team ownership:** Same team controls VNet, DNS, NSG, and policy? If different team, block and get pre-approval before deploying.
### Managed VNet only
**Feature flag:** Run `az feature show` to verify `AI.ManagedVnetPreview` is registered. If not, register and wait 15β30 min.
**Outbound mode:** Internet outbound (default) or approved outbound only?
**MCP:** Public MCP endpoints or private MCP on VNet?
**Client access:** Where will clients connect from? (Same VNet, peered VNet, on-prem via VPN/ER, Azure-hosted service)
### Both paths
**MCP servers:** Needed on VNet?
**APIM:** Needed?
**Identity:** System-assigned (default) or user-assigned?
**BYO resources:** Reuse existing Cosmos DB / Storage / AI Search, or create new?
> If reusing, confirm all in same region as VNet.
**Key Vault / App Insights:** If user mentions existing ones, collect resource IDs. Optional.
---
## Tier 3 β Enterprise
**Agent tools:** Which tools? (AI Search, Cosmos DB, Storage, MCP, external APIs, Bing grounding, Code Interpreter)
**Model:** Name, vendor, version. Verify version format:
| Vendor | Format | Example |
|--------|--------|---------|
| OpenAI | Date | `2025-04-14` |
| Mistral AI | Integer | `1` |
| Meta | Integer | `9` |
**Client type:** SDK, web app, Teams bot, other service?
**Client network path:** Inside VNet, peered VNet, VPN/ExpressRoute?
**Authentication:** Entra ID (recommended) or API key?
> Entra ID token audience for Foundry Agents API: `https://ai.azure.com`
**GitHub access:** Can deployment environment reach `github.com`? If not, pre-stage template.
**Azure Policy:** Known policies (e.g., `disableLocalAuth`, `defaultOutboundAccess`)? If unknown, `what-if` catches them in Step 4.
**Monitoring:** Existing Log Analytics workspace, create new, or not needed?
---
## Validate Against Learn
After collecting all requirements, validate the user's configuration against current documentation. Use `microsoft_docs_fetch` on the relevant pages below, then `microsoft_docs_search` for any requirement-specific concerns not covered.
### Reference Pages
| Topic | URL |
|-------|-----|
| Network isolation overview | https://learn.microsoft.com/azure/ai-foundry/how-to/configure-private-link |
| Agent Service private networking | https://learn.microsoft.com/azure/ai-services/agents/how-to/virtual-networks |
| Managed VNet configuration | https://learn.microsoft.com/azure/ai-foundry/how-to/configure-managed-network |
| Agent Service FAQ β VNet | https://learn.microsoft.com/azure/foundry/agents/faq#virtual-networking |
| Supported regions & availability | https://learn.microsoft.com/azure/ai-foundry/reference/region-support |
| NSP | https://learn.microsoft.com/en-us/azure/networking/network-security-perimeter |
| Feature Limitations | https://learn.microsoft.com/en-us/azure/foundry/how-to/configure-private-link#foundry-feature-limitations |
> These URLs may change. If a fetch returns 404, use `microsoft_docs_search` to find the current page.
If a conflict is found, present:
1. The constraint and its source URL
2. Which requirement it affects
3. Options to resolve
Do NOT proceed until all conflicts are resolved or accepted.
---
## Confirmation
Present a summary of all gathered requirements. Ask: **"Confirm this is accurate before I generate a deployment plan."**
> Do NOT proceed to Plan Generation until you validated requirements against documents and the user confirms.
post-deployment-validation.md 2.9 KB
# Post-Deployment Validation
Run after deployment succeeds. Steps 1-3 can run from anywhere (management plane). Steps 4-5 require VNet access.
## 1. Infrastructure Verification
### 1.1 Resource State
Verify all resources are in `Succeeded` state:
```bash
az deployment operation group list \
--resource-group <rg> --name <deployment-name> \
--query "[].{resource:properties.targetResource.resourceType,state:properties.provisioningState}" -o table
```
### 1.2 Private Endpoint Connections
Verify all PE connections are `Approved`:
```bash
az network private-endpoint list \
--resource-group <rg> \
--query "[].{name:name,status:privateLinkServiceConnections[0].privateLinkServiceConnectionState.status,resource:privateLinkServiceConnections[0].groupIds[0]}" -o table
```
### 1.3 Public Network Access Audit
Verify all resources have public access disabled:
```bash
az cognitiveservices account show --name <ai-account> --resource-group <rg> \
--query "properties.publicNetworkAccess" -o tsv
az cosmosdb show --name <cosmos-account> --resource-group <rg> \
--query "publicNetworkAccess" -o tsv
az storage account show --name <storage-account> --resource-group <rg> \
--query "publicNetworkAccess" -o tsv
az search service show --name <search-service> --resource-group <rg> \
--query "publicNetworkAccess" -o tsv
```
All should return `Disabled`.
> **T10 (Private Basic):** Steps 2-5 below do not apply β T10 has no agents, no capability host, and no BYO resources. Setup is complete after Step 1.
## 2. RBAC Role Assignment (no VNet required)
The template does not assign data-plane roles automatically.
Assign `Azure AI Developer` at the **account** scope (management-plane):
```bash
az role assignment create \
--role "Azure AI Developer" \
--assignee <your-object-id-or-email> \
--scope /subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.CognitiveServices/accounts/<ai-account-name>
```
Assign `Azure AI User` at the **project** scope (data-plane β required for `agents/read`, `agents/write`):
```bash
az role assignment create \
--role "Azure AI User" \
--assignee <your-object-id-or-email> \
--scope /subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.CognitiveServices/accounts/<ai-account-name>/projects/<project-name>
```
> β οΈ RBAC propagation can take 1β5 minutes.
## 3. Deploy a Model (no VNet required)
```bash
az cognitiveservices account deployment create \
--resource-group <rg> \
--name <ai-account-name> \
--deployment-name <deployment-name> \
--model-name <modelName> \
--model-version <modelVersion> \
--model-format <format> \
--sku-name GlobalStandard \
--sku-capacity 50
```
Fall back to `Standard` SKU if `GlobalStandard` quota is exhausted.
---
## 4. VNet Access & End-to-End Test
For the remaining steps (VNet access setup, DNS resolution, agent lifecycle test, isolation proof, cleanup), read [end-to-end-test.md](end-to-end-test.md).
scaffold.md 1.7 KB
# Scaffold & Parameterize
Use this reference to fetch the confirmed template and wire up parameters.
## Path A β OFFICIAL / ADAPT
If the user has no GitHub access, the template must already be present in the workspace. Do NOT attempt to fetch from GitHub.
Fetch the template from the GitHub URL in [template-index.md](template-index.md). Choose **Bicep or Terraform** based on the user's preference or existing workspace files. Fetch the **entire template folder** including subdirectories. Create the files in the user's workspace (e.g., `infra/` folder).
For ADAPT: after fetching, modify the template to match the user's requirements before parameterizing.
## Path B β EXTEND
If the user has existing Bicep or Terraform templates they want to extend, load [custom-template-adaptation.md](custom-template-adaptation.md). Follow the gap analysis there: read the user's template, identify what's present, add only the missing mandatory resources.
Set parameter values using the answers collected in [intake.md](intake.md):
| Parameter | Source |
|-----------|--------|
| Location | Region (or inferred from existing VNet) |
| VNet name / resource ID | VNet answer (new or existing) |
| VNet address space | Address space from requirements (default `192.168.0.0/16`) |
| Subnet CIDRs | Subnet answers (agent `/24`, PE `/24`, MCP `/24` if needed) |
| Existing Cosmos DB / Storage / AI Search IDs | BYO resource IDs (only if reusing) |
| Isolation mode (T18 only) | Managed VNet outbound mode (`AllowOnlyApprovedOutbound` or `AllowInternetOutbound`) |
| Model name, version, format | Model selection from requirements |
| `disableLocalAuth` | Set `true` if Azure Policy requires it |
> Do NOT run `az deployment group create` yet β validate first (next step).
template-index.md 1.3 KB
# Template Index β Foundry Private Network
Official templates for deploying Microsoft Foundry. Each template may be available in Bicep, Terraform, or both β use one, not both. Choose based on the user's preference or existing workspace files. Use tools to fetch Bicep and Terraform templates to understand available templates and recognize if any matches user's requirements:
**Bicep templates:** https://github.com/microsoft-foundry/foundry-samples/tree/main/infrastructure/infrastructure-setup-bicep/
**Terraform templates:** https://github.com/microsoft-foundry/foundry-samples/tree/main/infrastructure/infrastructure-setup-terraform/
Not all templates exist in both Bicep and Terraform. Some have format-specific variants (e.g., Terraform has `15a`/`15b` for new VNet vs BYO VNet; Bicep has `15a` for evaluation-only).
## How to Use
1. Fetch the **directory listing** from the relevant repo URL above β the folder names are descriptive (e.g., `15-private-network-standard-agent-setup`, `18-managed-virtual-network-preview`)
2. Narrow to 1β2 candidates that match the user's requirements based on folder names
3. Fetch only those candidates' READMEs for full details (prerequisites, parameters, deployment instructions)
> The root README is incomplete β do not rely on it for template discovery. Use the directory listing instead.
vpn-dns-setup.bicep 4.4 KB
/*
VPN Gateway + DNS Private Resolver
------------------------------------
Post-deployment add-on for private network templates (T10, T15βT19).
Creates a P2S VPN Gateway (AAD auth, OpenVPN) and a DNS Private Resolver
so the user can connect from their dev machine and resolve private DNS zones.
Note: VPN Gateway deployment takes 30-45 minutes.
*/
@description('Name of the existing VNet from the Foundry deployment')
param vnetName string
@description('Resource group of the existing VNet. Defaults to the deployment resource group.')
param vnetResourceGroup string = resourceGroup().name
// ββ Existing VNet ββ
resource vnet 'Microsoft.Network/virtualNetworks@2024-05-01' existing = {
name: vnetName
scope: resourceGroup(vnetResourceGroup)
}
var location = vnet.location
@description('CIDR for GatewaySubnet β agent must compute from available VNet space')
param gatewaySubnetCidr string
@description('CIDR for DNS resolver inbound subnet β agent must compute from available VNet space')
param dnsResolverSubnetCidr string
@description('VPN client address pool β must not overlap with VNet')
param vpnClientAddressPool string = '172.16.201.0/24'
@description('Azure AD tenant ID for VPN authentication')
param aadTenantId string
@description('Unique suffix for resource naming')
param suffix string
// AAD constants for Azure Public cloud only.
// Sovereign clouds (AzureUSGovernment, AzureChinaCloud) require different audience/issuer values.
// The intake step (az cloud show) warns users before reaching this template.
var aadAudience = 'c632b3df-fb67-4d84-bdcf-b95ad541b5c8'
var aadIssuer = 'https://sts.windows.net/${aadTenantId}/'
var aadTenant = 'https://login.microsoftonline.com/${aadTenantId}/'
// ββ Add subnets ββ
resource gatewaySubnet 'Microsoft.Network/virtualNetworks/subnets@2024-05-01' = {
parent: vnet
name: 'GatewaySubnet'
properties: {
addressPrefix: gatewaySubnetCidr
defaultOutboundAccess: false
}
}
// NOTE: NRMS policy may auto-deploy an NSG on this subnet.
// Ensure the NSG allows inbound UDP/TCP port 53 (DNS) from the VPN client address pool.
resource dnsResolverSubnet 'Microsoft.Network/virtualNetworks/subnets@2024-05-01' = {
parent: vnet
name: 'dns-resolver-inbound'
properties: {
addressPrefix: dnsResolverSubnetCidr
defaultOutboundAccess: false
delegations: [
{
name: 'dns-resolver-delegation'
properties: {
serviceName: 'Microsoft.Network/dnsResolvers'
}
}
]
}
dependsOn: [gatewaySubnet] // serialize subnet updates
}
// ββ Public IP for VPN Gateway ββ
resource vpnGatewayPip 'Microsoft.Network/publicIPAddresses@2024-05-01' = {
name: 'vpn-gateway-pip-${suffix}'
location: location
sku: {
name: 'Standard'
}
zones: ['1', '2', '3']
properties: {
publicIPAllocationMethod: 'Static'
}
}
// ββ VPN Gateway ββ
resource vpnGateway 'Microsoft.Network/virtualNetworkGateways@2024-05-01' = {
name: 'vpn-gateway-${suffix}'
location: location
properties: {
gatewayType: 'Vpn'
vpnType: 'RouteBased'
sku: {
name: 'VpnGw1AZ'
tier: 'VpnGw1AZ'
}
ipConfigurations: [
{
name: 'default'
properties: {
publicIPAddress: {
id: vpnGatewayPip.id
}
subnet: {
id: gatewaySubnet.id
}
}
}
]
vpnClientConfiguration: {
vpnClientAddressPool: {
addressPrefixes: [vpnClientAddressPool]
}
vpnClientProtocols: ['OpenVPN']
vpnAuthenticationTypes: ['AAD']
aadTenant: aadTenant
aadAudience: aadAudience
aadIssuer: aadIssuer
}
}
}
// ββ DNS Private Resolver ββ
resource dnsResolver 'Microsoft.Network/dnsResolvers@2022-07-01' = {
name: 'dns-resolver-${suffix}'
location: location
properties: {
virtualNetwork: {
id: vnet.id
}
}
}
resource dnsInboundEndpoint 'Microsoft.Network/dnsResolvers/inboundEndpoints@2022-07-01' = {
parent: dnsResolver
name: 'inbound'
location: location
properties: {
ipConfigurations: [
{
privateIpAllocationMethod: 'Dynamic'
subnet: {
id: dnsResolverSubnet.id
}
}
]
}
}
// ββ Outputs ββ
output vpnGatewayName string = vpnGateway.name
output vpnGatewayId string = vpnGateway.id
output vpnPublicIpAddress string = vpnGatewayPip.properties.ipAddress
output dnsResolverInboundIp string = dnsInboundEndpoint.properties.ipConfigurations[0].privateIpAddress
vpn-dns-setup.md 6.3 KB
# VPN Gateway & DNS Private Resolver Setup
Post-deployment add-on for private network templates (T10, T15βT19). Creates a point-to-site VPN Gateway and DNS Private Resolver so the user can connect from their dev machine and resolve private DNS zones.
## Assumptions
| Property | Value | Rationale |
|----------|-------|-----------|
| Auth | Microsoft Entra ID (AAD) only | No certificate management |
| Tunnel | OpenVPN | Cross-platform, Azure VPN Client |
| Gateway SKU | VpnGw1AZ | Zone-redundant, same cost as VpnGw1 |
| GatewaySubnet | /24 recommended | Agent computes from available VNet space |
| DNS resolver subnet | /28 minimum | Agent computes from available VNet space |
| Client address pool | `172.16.201.0/24` | Non-overlapping with VNet |
## Subnet Layout
Adds two subnets to the existing VNet. Uses the next available range after the agent and PE subnets.
| Subnet | CIDR (default) | Purpose | Delegation |
|--------|----------------|---------|------------|
| `GatewaySubnet` | Computed | VPN Gateway (name is required by Azure) | None |
| `dns-resolver-inbound` | Computed | DNS Private Resolver inbound endpoint | `Microsoft.Network/dnsResolvers` |
> β οΈ **Warning:** `GatewaySubnet` is a reserved name β Azure requires this exact name for VPN Gateway.
## Pre-Deployment
### 1. Discover Available Subnets
List existing subnets to find free address space:
```bash
az network vnet subnet list \
--resource-group <rg> --vnet-name <vnet-name> \
--query "[].{name:name,cidr:addressPrefix}" -o table
```
Pick the next unused `/24` for `GatewaySubnet` and the next unused `/28` for `dns-resolver-inbound`. Both must not overlap with any existing subnet.
Example: if subnets `.0.0/24`, `.1.0/24`, `.2.0/24` are in use β use `192.168.3.0/24` for GatewaySubnet, `192.168.4.0/28` for dns-resolver-inbound.
### 2. Collect Remaining Inputs
| Parameter | Source |
|-----------|--------|
| `vnetName` | From main deployment |
| `vnetResourceGroup` | Resource group containing the VNet (omit if same as deployment RG) |
| `resourceGroupName` | Resource group for this deployment |
| `gatewaySubnetCidr` | Computed in step 1 |
| `dnsResolverSubnetCidr` | Computed in step 1 |
| `suffix` | From main deployment (or generate unique) |
| `aadTenantId` | From `az account show --query tenantId` |
### 3. Check VPN Gateway Quota
```bash
az network list-usages --location <location> \
--query "[?name.value=='VirtualNetworkGateways'].{limit:limit,current:currentValue}" -o table
```
## Bicep Template
Template: [vpn-dns-setup.bicep](vpn-dns-setup.bicep)
| Parameter | Required | Default | Description |
|-----------|----------|---------|-------------|
| `vnetName` | Yes | β | Name of the existing VNet |
| `vnetResourceGroup` | No | Deployment RG | Resource group of the existing VNet (for BYO VNets in a different RG) |
| `aadTenantId` | Yes | β | Entra ID tenant ID for VPN auth |
| `suffix` | Yes | β | Unique suffix for resource naming |
| `gatewaySubnetCidr` | Yes | β | GatewaySubnet CIDR (computed from VNet) |
| `dnsResolverSubnetCidr` | Yes | β | DNS resolver inbound subnet CIDR (computed from VNet) |
| `vpnClientAddressPool` | No | `172.16.201.0/24` | VPN client address pool |
**Creates:** GatewaySubnet, dns-resolver-inbound subnet, Public IP (zonal), VPN Gateway (VpnGw1AZ, P2S AAD/OpenVPN), DNS Private Resolver with inbound endpoint.
## Deploy
```bash
az deployment group create \
--resource-group <rg> \
--template-file vpn-dns-setup.bicep \
--parameters vnetName='<vnet-name>' aadTenantId='<tenant-id>' suffix='<suffix>' \
gatewaySubnetCidr='<computed-cidr>' dnsResolverSubnetCidr='<computed-cidr>' \
--name vpn-dns-setup
```
> β οΈ **VPN Gateway provisioning takes 20β45 minutes.** This is normal. Do not cancel.
Monitor:
```bash
az deployment group show \
--resource-group <rg> --name vpn-dns-setup \
--query "{state:properties.provisioningState}" -o tsv
```
## Post-Deployment
### 1. Get DNS Resolver Inbound IP
```bash
az network dns-resolver inbound-endpoint show \
--resource-group <rg> \
--dns-resolver-name dns-resolver-<suffix> \
--name inbound \
--query "ipConfigurations[0].privateIpAddress" -o tsv
```
Save this IP β the VPN client needs it as custom DNS.
### 2. Connect via VPN
Provide the user with these instructions (substitute actual resource name and DNS IP):
1. Go to **Azure Portal** β `vpn-gateway-<suffix>` β **Point-to-site configuration** β **Download VPN client**
2. Extract the ZIP β edit `AzureVPN/azurevpnconfig.xml` β replace:
```xml
<clientconfig i:nil="true" />
```
with:
```xml
<clientconfig>
<dnsservers>
<dnsserver><dns-resolver-inbound-ip></dnsserver>
</dnsservers>
</clientconfig>
```
3. Open [Azure VPN Client](https://aka.ms/azvpnclientdownload) β **Import** the modified `azurevpnconfig.xml` β **Connect**
Use `AskUserQuestion`: **"Let me know when you're connected so I can verify DNS resolution."**
> Do NOT proceed to verification until the user confirms they are connected.
### 3. Verify DNS Resolution
After connecting via VPN, verify private DNS zones resolve correctly:
```bash
nslookup <ai-account-name>.services.ai.azure.com
nslookup <cosmos-account>.documents.azure.com
nslookup <storage-account>.blob.core.windows.net
```
Each should resolve to a private IP (`192.168.x.x`), not a public IP.
### 4. VPN Setup Complete
DNS resolves to private IPs β VPN is working. Return to [post-deployment-validation.md](post-deployment-validation.md) **Step 5** to run the end-to-end tests.
## Troubleshooting
| Problem | Cause | Fix |
|---------|-------|-----|
| VPN connects but DNS doesn't resolve | Custom DNS not set in VPN client profile | Add DNS resolver inbound IP as custom DNS server |
| `nslookup` returns public IP | Private DNS zones not linked to VNet | Verify DNS zone VNet links: `az network private-dns zone list -g <rg>` |
| VPN client auth fails | Wrong tenant or app not consented | Verify `tenantId`, ensure Azure VPN enterprise app is consented in the tenant |
| Gateway deployment times out | Normal β VPN GW takes 20-45 min | Wait and re-check with `az deployment group show` |
| Subnet conflict | CIDR overlaps with existing subnet | Use different CIDRs for `gatewaySubnetCidr` / `dnsResolverSubnetCidr` |
| DNS resolver queries blocked | NRMS auto-deployed NSG missing DNS rules | Add inbound allow rule for UDP/TCP port 53 from VPN client address pool to the `dns-resolver-inbound` subnet NSG |
License (MIT)
View full license text
MIT License Copyright 2025 (c) Microsoft Corporation. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.