Installation

Install with CLI (recommended)
gh skills-hub install azure-aigateway

Don't have the extension? Run `gh extension install samueltauil/skills-hub` first.

Or download the ZIP manually and extract it to your repository:

.github/skills/azure-aigateway/

Extract the ZIP into `.github/skills/` in your repo; the folder name must match `azure-aigateway` for Copilot to auto-discover it.

Skill Files (9)

SKILL.md 4.9 KB
---
name: azure-aigateway
description: "Configure Azure API Management as an AI Gateway for AI models, MCP tools, and agents. WHEN: semantic caching, token limit, content safety, load balancing, AI model governance, MCP rate limiting, jailbreak detection, add Azure OpenAI backend, add AI Foundry model, test AI gateway, LLM policies, configure AI backend, token metrics, AI cost control, convert API to MCP, import OpenAPI to gateway."
license: MIT
metadata:
  author: Microsoft
  version: "3.0.1"
compatibility: Requires Azure CLI (az) for configuration and testing
---

# Azure AI Gateway

Configure Azure API Management (APIM) as an AI Gateway for governing AI models, MCP tools, and agents.

> **To deploy APIM**, use the **azure-prepare** skill. See [APIM deployment guide](https://learn.microsoft.com/azure/api-management/get-started-create-service-instance).

## When to Use This Skill

| Category | Triggers |
|----------|----------|
| **Model Governance** | "semantic caching", "token limits", "load balance AI", "track token usage" |
| **Tool Governance** | "rate limit MCP", "protect my tools", "configure my tool", "convert API to MCP" |
| **Agent Governance** | "content safety", "jailbreak detection", "filter harmful content" |
| **Configuration** | "add Azure OpenAI backend", "configure my model", "add AI Foundry model" |
| **Testing** | "test AI gateway", "call OpenAI through gateway" |

---

## Quick Reference

| Policy | Purpose | Details |
|--------|---------|---------|
| `azure-openai-token-limit` | Cost control | [Model Policies](references/policies.md#token-rate-limiting) |
| `azure-openai-semantic-cache-lookup/store` | 60-80% cost savings | [Model Policies](references/policies.md#semantic-caching) |
| `azure-openai-emit-token-metric` | Observability | [Model Policies](references/policies.md#token-metrics) |
| `llm-content-safety` | Safety & compliance | [Agent Policies](references/policies.md#content-safety) |
| `rate-limit-by-key` | MCP/tool protection | [Tool Policies](references/policies.md#request-rate-limiting) |

---

## Get Gateway Details

```bash
# Get gateway URL
az apim show --name <apim-name> --resource-group <rg> --query "gatewayUrl" -o tsv

# List backends (AI models)
az apim backend list --service-name <apim-name> --resource-group <rg> \
  --query "[].{id:name, url:url}" -o table

# Get subscription key
az apim subscription keys list \
  --service-name <apim-name> --resource-group <rg> --subscription-id <sub-id>
```

---

## Test AI Endpoint

```bash
GATEWAY_URL=$(az apim show --name <apim-name> --resource-group <rg> --query "gatewayUrl" -o tsv)

curl -X POST "${GATEWAY_URL}/openai/deployments/<deployment>/chat/completions?api-version=2024-02-01" \
  -H "Content-Type: application/json" \
  -H "Ocp-Apim-Subscription-Key: <key>" \
  -d '{"messages": [{"role": "user", "content": "Hello"}], "max_tokens": 100}'
```

---

## Common Tasks

### Add AI Backend

See [references/patterns.md](references/patterns.md#pattern-1-add-ai-model-backend) for full steps.

```bash
# Discover AI resources
az cognitiveservices account list --query "[?kind=='OpenAI']" -o table

# Create backend
az apim backend create --service-name <apim> --resource-group <rg> \
  --backend-id openai-backend --protocol http --url "https://<aoai>.openai.azure.com/openai"

# Grant access (managed identity)
az role assignment create --assignee <apim-principal-id> \
  --role "Cognitive Services User" --scope <aoai-resource-id>
```

### Apply AI Governance Policy

Recommended policy order in `<inbound>`:

1. **Authentication** - Managed identity to backend
2. **Semantic Cache Lookup** - Check cache before calling AI
3. **Token Limits** - Cost control
4. **Content Safety** - Filter harmful content
5. **Backend Selection** - Load balancing
6. **Metrics** - Token usage tracking

See [references/policies.md](references/policies.md#combining-policies) for complete example.

---

## Troubleshooting

| Issue | Solution |
|-------|----------|
| Token limit 429 | Increase `tokens-per-minute` or add load balancing |
| No cache hits | Lower `score-threshold` to 0.7 |
| Content false positives | Increase category thresholds (5-6) |
| Backend auth 401 | Grant APIM "Cognitive Services User" role |

See [references/troubleshooting.md](references/troubleshooting.md) for details.

---

## References

- [**Detailed Policies**](references/policies.md) - Full policy examples
- [**Configuration Patterns**](references/patterns.md) - Step-by-step patterns
- [**Troubleshooting**](references/troubleshooting.md) - Common issues
- [AI-Gateway Samples](https://github.com/Azure-Samples/AI-Gateway)
- [GenAI Gateway Docs](https://learn.microsoft.com/azure/api-management/genai-gateway-capabilities)

## SDK Quick References

- **Content Safety**: [Python](references/sdk/azure-ai-contentsafety-py.md) | [TypeScript](references/sdk/azure-ai-contentsafety-ts.md)
- **API Management**: [Python](references/sdk/azure-mgmt-apimanagement-py.md) | [.NET](references/sdk/azure-mgmt-apimanagement-dotnet.md)
references/
auth-best-practices.md 6.0 KB
# Azure Authentication Best Practices

> Source: [Microsoft - Passwordless connections for Azure services](https://learn.microsoft.com/azure/developer/intro/passwordless-overview) and [Azure Identity client libraries](https://learn.microsoft.com/dotnet/azure/sdk/authentication/).

## Golden Rule

Use **managed identities** and **Azure RBAC** in production. Reserve `DefaultAzureCredential` for **local development only**.

## Authentication by Environment

| Environment | Recommended Credential | Why |
|---|---|---|
| **Production (Azure-hosted)** | `ManagedIdentityCredential` (system- or user-assigned) | No secrets to manage; auto-rotated by Azure |
| **Production (on-premises)** | `ClientCertificateCredential` or `WorkloadIdentityCredential` | Deterministic; no fallback chain overhead |
| **CI/CD pipelines** | `AzurePipelinesCredential` / `WorkloadIdentityCredential` | Scoped to pipeline identity |
| **Local development** | `DefaultAzureCredential` | Chains CLI, PowerShell, and VS Code credentials for convenience |

## Why Not `DefaultAzureCredential` in Production?

1. **Unpredictable fallback chain** - walks through multiple credential types, adding latency and making failures harder to diagnose.
2. **Broad surface area** - checks environment variables, CLI tokens, and other sources that should not exist in production.
3. **Non-deterministic** - which credential actually authenticates depends on the environment, making behavior inconsistent across deployments.
4. **Performance** - each failed credential attempt adds network round-trips before falling back to the next.

## Production Patterns

### .NET

```csharp
using Azure.Identity;

var credential = Environment.GetEnvironmentVariable("AZURE_FUNCTIONS_ENVIRONMENT") == "Development"
    ? new DefaultAzureCredential()                          // local dev - uses CLI/VS credentials
    : new ManagedIdentityCredential();                      // production - deterministic, no fallback chain
// For user-assigned identity: new ManagedIdentityCredential("<client-id>")
```

### TypeScript / JavaScript

```typescript
import { DefaultAzureCredential, ManagedIdentityCredential } from "@azure/identity";

const credential = process.env.NODE_ENV === "development"
  ? new DefaultAzureCredential()                          // local dev - uses CLI/VS credentials
  : new ManagedIdentityCredential();                      // production - deterministic, no fallback chain
// For user-assigned identity: new ManagedIdentityCredential("<client-id>")
```

### Python

```python
import os
from azure.identity import DefaultAzureCredential, ManagedIdentityCredential

credential = (
    DefaultAzureCredential()                              # local dev - uses CLI/VS credentials
    if os.getenv("AZURE_FUNCTIONS_ENVIRONMENT") == "Development"
    else ManagedIdentityCredential()                      # production - deterministic, no fallback chain
)
# For user-assigned identity: ManagedIdentityCredential(client_id="<client-id>")
```

### Java

```java
import com.azure.identity.DefaultAzureCredentialBuilder;
import com.azure.identity.ManagedIdentityCredentialBuilder;

var credential = "Development".equals(System.getenv("AZURE_FUNCTIONS_ENVIRONMENT"))
    ? new DefaultAzureCredentialBuilder().build()          // local dev - uses CLI/VS credentials
    : new ManagedIdentityCredentialBuilder().build();      // production - deterministic, no fallback chain
// For user-assigned identity: new ManagedIdentityCredentialBuilder().clientId("<client-id>").build()
```

## Local Development Setup

`DefaultAzureCredential` is ideal for local dev because it automatically picks up credentials from developer tools:

1. **Azure CLI** - `az login`
2. **Azure Developer CLI** - `azd auth login`
3. **Azure PowerShell** - `Connect-AzAccount`
4. **Visual Studio / VS Code** - sign in via the Azure extension

```typescript
import { DefaultAzureCredential } from "@azure/identity";

// Local development only - uses CLI/PowerShell/VS Code credentials
const credential = new DefaultAzureCredential();
```

## Environment-Aware Pattern

Detect the runtime environment and select the appropriate credential. The key principle: use `DefaultAzureCredential` only when running locally, and a specific credential in production.

> **Tip:** Azure Functions sets `AZURE_FUNCTIONS_ENVIRONMENT` to `"Development"` when running locally. For App Service or containers, use any environment variable you control (e.g. `NODE_ENV`, `ASPNETCORE_ENVIRONMENT`).

```typescript
import { DefaultAzureCredential, ManagedIdentityCredential } from "@azure/identity";

function getCredential() {
  if (process.env.NODE_ENV === "development") {
    return new DefaultAzureCredential();          // picks up az login / VS Code creds
  }
  return process.env.AZURE_CLIENT_ID
    ? new ManagedIdentityCredential(process.env.AZURE_CLIENT_ID)  // user-assigned
    : new ManagedIdentityCredential();                            // system-assigned
}
```

## Security Checklist

- [ ] Use managed identity for all Azure-hosted apps
- [ ] Never hardcode credentials, connection strings, or keys
- [ ] Apply least-privilege RBAC roles at the narrowest scope
- [ ] Use `ManagedIdentityCredential` (not `DefaultAzureCredential`) in production
- [ ] Store any required secrets in Azure Key Vault
- [ ] Rotate secrets and certificates on a schedule
- [ ] Enable Microsoft Defender for Cloud on production resources

## Further Reading

- [Passwordless connections overview](https://learn.microsoft.com/azure/developer/intro/passwordless-overview)
- [Managed identities overview](https://learn.microsoft.com/entra/identity/managed-identities-azure-resources/overview)
- [Azure RBAC overview](https://learn.microsoft.com/azure/role-based-access-control/overview)
- [.NET authentication guide](https://learn.microsoft.com/dotnet/azure/sdk/authentication/)
- [Python identity library](https://learn.microsoft.com/python/api/overview/azure/identity-readme)
- [JavaScript identity library](https://learn.microsoft.com/javascript/api/overview/azure/identity-readme)
- [Java identity library](https://learn.microsoft.com/java/api/overview/azure/identity-readme)
patterns.md 6.6 KB
# AI Gateway Configuration Patterns

Step-by-step patterns for configuring Azure API Management as an AI Gateway.

---

## Pattern 1: Add AI Model Backend

Connect Azure OpenAI or AI Foundry models to your APIM instance.

### Prerequisites

- APIM instance deployed (use the **azure-prepare** skill to deploy APIM; see the [APIM deployment guide](https://learn.microsoft.com/azure/api-management/get-started-create-service-instance))
- Azure OpenAI or AI Foundry resource provisioned
- System-assigned or user-assigned managed identity enabled on APIM

### Steps

#### 1. Discover AI Resources

```bash
# Find Azure OpenAI resources
az cognitiveservices account list --query "[?kind=='OpenAI'].{name:name, rg:resourceGroup, endpoint:properties.endpoint}" -o table

# Find AI Foundry resources (if using)
az cognitiveservices account list --query "[?kind=='AIServices'].{name:name, rg:resourceGroup}" -o table
```

#### 2. Enable Managed Identity on APIM

```bash
# Enable system-assigned identity
az apim update --name <apim-name> --resource-group <rg> --set identity.type=SystemAssigned

# Get principal ID
PRINCIPAL_ID=$(az apim show --name <apim-name> --resource-group <rg> --query "identity.principalId" -o tsv)
```

#### 3. Grant RBAC Access

```bash
AOAI_ID=$(az cognitiveservices account show --name <aoai-name> --resource-group <rg> --query id -o tsv)

az role assignment create \
  --assignee "$PRINCIPAL_ID" \
  --role "Cognitive Services User" \
  --scope "$AOAI_ID"
```

#### 4. Create Backend

```bash
az apim backend create \
  --service-name <apim-name> \
  --resource-group <rg> \
  --backend-id openai-backend \
  --protocol http \
  --url "https://<aoai-name>.openai.azure.com/openai"
```

#### 5. Import API (OpenAPI Spec)

```bash
# Import the Azure OpenAI API specification
az apim api import \
  --service-name <apim-name> \
  --resource-group <rg> \
  --api-id azure-openai-api \
  --path "openai" \
  --specification-format OpenApi \
  --specification-url "https://raw.githubusercontent.com/Azure/azure-rest-api-specs/main/specification/cognitiveservices/data-plane/AzureOpenAI/inference/stable/2024-02-01/inference.json" \
  --service-url "https://<aoai-name>.openai.azure.com/openai"
```

#### 6. Set Backend Policy

Add managed identity authentication in `<inbound>`:

```xml
<inbound>
    <base />
    <set-backend-service backend-id="openai-backend" />
    <authentication-managed-identity resource="https://cognitiveservices.azure.com" />
</inbound>
```

---

## Pattern 2: Load Balance Across Multiple AI Backends

Distribute requests across multiple Azure OpenAI instances for higher throughput.

### Steps

#### 1. Create Multiple Backends

```bash
# Primary region
az apim backend create --service-name <apim> --resource-group <rg> \
  --backend-id openai-eastus --protocol http \
  --url "https://<aoai-eastus>.openai.azure.com/openai"

# Secondary region
az apim backend create --service-name <apim> --resource-group <rg> \
  --backend-id openai-westus --protocol http \
  --url "https://<aoai-westus>.openai.azure.com/openai"
```

#### 2. Create Backend Pool

Using APIM backend pool (preview) or policy-based load balancing:

```xml
<inbound>
    <base />
    <set-variable name="backendUrl" value="@{
        var backends = new [] {
            "https://aoai-eastus.openai.azure.com",
            "https://aoai-westus.openai.azure.com"
        };
        var hash = Math.Abs(context.RequestId.GetHashCode());
        var index = hash % backends.Length;
        return backends[index];
    }" />
    <set-backend-service base-url="@((string)context.Variables["backendUrl"] + "/openai")" />
    <authentication-managed-identity resource="https://cognitiveservices.azure.com" />
</inbound>
```
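
The policy expression above is plain hash-modulo selection. The same idea in Python (the `backends` list and `pick_backend` helper are illustrative, not part of APIM):

```python
# Hash-modulo backend selection, equivalent to the policy expression above.
backends = [
    "https://aoai-eastus.openai.azure.com",
    "https://aoai-westus.openai.azure.com",
]

def pick_backend(request_id: str) -> str:
    """Deterministically map a request id to one backend."""
    return backends[abs(hash(request_id)) % len(backends)]
```

Each request id always lands on the same backend, so load spreads roughly evenly across regions without any shared state.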

#### 3. Add Circuit Breaker (Retry on 429)

```xml
<retry condition="@(context.Response.StatusCode == 429)" count="3" interval="10" delta="5" max-interval="30" first-fast-retry="false">
    <set-variable name="backendUrl" value="@{
        var backends = new [] {
            "https://aoai-eastus.openai.azure.com",
            "https://aoai-westus.openai.azure.com"
        };
        var currentIndex = Array.IndexOf(backends, (string)context.Variables["backendUrl"]);
        return backends[(currentIndex + 1) % backends.Length];
    }" />
    <set-backend-service base-url="@((string)context.Variables["backendUrl"] + "/openai")" />
    <forward-request />
</retry>
```
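
The retry policy rotates to the other backend each time a call returns 429. A hypothetical Python rendering of that failover loop (`call` stands in for forwarding the request and returning the HTTP status):

```python
# Failover loop mirroring the <retry> policy: on 429, rotate to the
# next backend and retry; give up after `retries` extra attempts.
def forward_with_failover(call, backends, retries=3):
    index = 0
    status = None
    for _ in range(retries + 1):
        status = call(backends[index])
        if status != 429:
            break
        index = (index + 1) % len(backends)  # try the other region next
    return backends[index], status
```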

---

## Pattern 3: Convert API to MCP Tool

Expose an existing API through APIM as an MCP-compatible tool for AI agents.

### Steps

1. **Import API** into APIM using OpenAPI spec
2. **Add rate limiting** to protect the tool endpoint
3. **Add content safety** to filter harmful inputs
4. **Generate MCP manifest** pointing to the APIM endpoint

```xml
<!-- Rate limit MCP tool calls -->
<inbound>
    <base />
    <rate-limit-by-key calls="10" renewal-period="60"
        counter-key="@(context.Request.Headers.GetValueOrDefault("X-Agent-Id", "anonymous"))" />
</inbound>
```

---

## Pattern 4: Add Streaming Support

Configure APIM to properly handle Server-Sent Events (SSE) for streaming AI responses.

```xml
<inbound>
    <base />
    <set-backend-service backend-id="openai-backend" />
    <authentication-managed-identity resource="https://cognitiveservices.azure.com" />
</inbound>
<outbound>
    <base />
    <set-header name="Content-Type" exists-action="override">
        <value>@(context.Request.Body.As<JObject>(preserveContent: true)["stream"]?.Value<bool>() == true
            ? "text/event-stream" : "application/json")</value>
    </set-header>
</outbound>
```

> **Note**: Semantic caching and token metrics policies are NOT compatible with streaming responses. Use non-streaming for cost control scenarios.

---

## Pattern 5: Multi-Tenant AI Gateway

Isolate tenants with per-client rate limiting and tracking.

```xml
<inbound>
    <base />
    <!-- Extract tenant from subscription or header -->
    <set-variable name="tenantId" value="@(context.Subscription.Id)" />

    <!-- Per-tenant token limit -->
    <azure-openai-token-limit
        tokens-per-minute="10000"
        counter-key="@((string)context.Variables["tenantId"])"
        estimate-prompt-tokens="true" />

    <!-- Per-tenant metrics -->
    <azure-openai-emit-token-metric namespace="ai-gateway">
        <dimension name="Tenant" value="@((string)context.Variables["tenantId"])" />
        <dimension name="API" value="@(context.Api.Name)" />
    </azure-openai-emit-token-metric>

    <set-backend-service backend-id="openai-backend" />
    <authentication-managed-identity resource="https://cognitiveservices.azure.com" />
</inbound>
```

---

## Next Steps

- Apply [governance policies](policies.md) to your configured backends
- Review [troubleshooting](troubleshooting.md) for common configuration issues
policies.md 9.1 KB
# AI Gateway Policies

Complete reference for Azure API Management AI governance policies.

---

## Policy Placement Order

Recommended order in `<inbound>` section:

```
1. Authentication (managed identity)
2. Semantic Cache Lookup
3. Token Rate Limiting
4. Content Safety
5. Backend Selection / Load Balancing
6. Token Metrics
```

---

## Model Policies

### Token Rate Limiting

Control costs by limiting token consumption per minute.

```xml
<azure-openai-token-limit
    tokens-per-minute="50000"
    counter-key="@(context.Subscription.Id)"
    estimate-prompt-tokens="true"
    tokens-consumed-header-name="x-tokens-consumed"
    remaining-tokens-header-name="x-tokens-remaining" />
```

| Attribute | Purpose | Default |
|-----------|---------|---------|
| `tokens-per-minute` | Max tokens per counter window | Required |
| `counter-key` | Grouping key (subscription, IP, custom) | Required |
| `estimate-prompt-tokens` | Count prompt tokens toward limit | `true` |
| `tokens-consumed-header-name` | Response header with consumed count | (none) |
| `remaining-tokens-header-name` | Response header with remaining count | (none) |

**Usage tiers example:**

```xml
<!-- Free tier: 5K TPM -->
<azure-openai-token-limit tokens-per-minute="5000"
    counter-key="@("free-" + context.Subscription.Id)"
    estimate-prompt-tokens="true" />

<!-- Premium tier: 100K TPM -->
<azure-openai-token-limit tokens-per-minute="100000"
    counter-key="@("premium-" + context.Subscription.Id)"
    estimate-prompt-tokens="true" />
```

---

### Semantic Caching

Cache AI responses for semantically similar prompts. Saves 60-80% on repeated queries.

**Lookup** (in `<inbound>`):

```xml
<azure-openai-semantic-cache-lookup
    score-threshold="0.8"
    embeddings-backend-id="embeddings-backend"
    embeddings-backend-auth="system-assigned" />
```

**Store** (in `<outbound>`):

```xml
<azure-openai-semantic-cache-store duration="3600" />
```

| Attribute | Purpose | Recommended |
|-----------|---------|-------------|
| `score-threshold` | Similarity threshold (0-1) | 0.8 (lower = more cache hits) |
| `embeddings-backend-id` | Backend for embedding generation | Required |
| `embeddings-backend-auth` | Auth to embeddings backend | `system-assigned` |
| `duration` | Cache TTL in seconds | 3600 (1 hour) |

**Prerequisites:**
- An embeddings model deployed (e.g., `text-embedding-ada-002`)
- A separate backend pointing to the embeddings endpoint
- Azure Cache for Redis Enterprise with RediSearch module (for vector storage)

```bash
# Create embeddings backend
az apim backend create --service-name <apim> --resource-group <rg> \
  --backend-id embeddings-backend --protocol http \
  --url "https://<aoai>.openai.azure.com/openai"
```

> **Note**: Semantic caching is NOT compatible with streaming responses (`"stream": true`).
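
Conceptually, the lookup embeds the prompt and serves a cached response when cosine similarity against a cached entry meets `score-threshold`. A simplified sketch (illustrative only; the real policy calls the embeddings deployment and searches Redis):

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms

def cache_lookup(prompt_vec, cache, threshold=0.8):
    # Find the most similar cached entry and serve it if similar enough.
    best = max(cache, key=lambda e: cosine(prompt_vec, e["vec"]), default=None)
    if best is not None and cosine(prompt_vec, best["vec"]) >= threshold:
        return best["response"]  # cache hit
    return None  # miss: forward to the model, then cache-store
```

Lowering `threshold` widens what counts as "similar", which is why the fix for zero cache hits is to lower `score-threshold`.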

---

### Token Metrics

Emit token usage metrics for monitoring and chargeback.

```xml
<azure-openai-emit-token-metric namespace="ai-gateway">
    <dimension name="Subscription" value="@(context.Subscription.Id)" />
    <dimension name="API" value="@(context.Api.Name)" />
    <dimension name="Model" value="@(context.Request.Headers.GetValueOrDefault("x-model", "unknown"))" />
    <dimension name="Operation" value="@(context.Operation.Id)" />
</azure-openai-emit-token-metric>
```

Emits to Azure Monitor with these metrics:
- `Total Tokens` - prompt + completion combined
- `Prompt Tokens` - input tokens
- `Completion Tokens` - output tokens

**Query token usage (KQL):**

```kql
customMetrics
| where name == "Total Tokens"
| extend Subscription = tostring(customDimensions["Subscription"])
| summarize TotalTokens = sum(value) by Subscription, bin(timestamp, 1h)
| order by TotalTokens desc
```

---

## Agent Policies

### Content Safety

Filter harmful, violent, or inappropriate content from AI inputs and outputs.

```xml
<!-- In <inbound> -->
<llm-content-safety backend-id="contentsafety-backend">
    <category name="Hate" threshold="4" />
    <category name="Sexual" threshold="4" />
    <category name="SelfHarm" threshold="4" />
    <category name="Violence" threshold="4" />
</llm-content-safety>
```

| Category | Description | Threshold Range |
|----------|-------------|-----------------|
| Hate | Discrimination, slurs | 0 (block all) - 6 (allow most) |
| Sexual | Explicit content | 0-6 |
| SelfHarm | Self-injury content | 0-6 |
| Violence | Violent content | 0-6 |
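
As a mental model, assuming content is blocked whenever a detected category severity meets or exceeds its configured threshold, the decision reduces to (illustrative sketch, not the service's actual algorithm):

```python
# Assumed decision rule: block when any detected severity reaches its
# category threshold. Mirrors the thresholds in the policy above.
thresholds = {"Hate": 4, "Sexual": 4, "SelfHarm": 4, "Violence": 4}

def is_blocked(detected: dict) -> bool:
    return any(
        severity >= thresholds.get(category, 7)  # unknown categories pass
        for category, severity in detected.items()
    )
```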

**Prerequisites:**
- Azure AI Content Safety resource deployed
- Backend configured for the Content Safety endpoint:

```bash
az apim backend create --service-name <apim> --resource-group <rg> \
  --backend-id contentsafety-backend --protocol http \
  --url "https://<contentsafety>.cognitiveservices.azure.com"
```

---

### Jailbreak Detection

Block prompt injection attacks that attempt to bypass AI safety guardrails.

```xml
<llm-content-safety backend-id="contentsafety-backend">
    <category name="Hate" threshold="4" />
    <category name="Sexual" threshold="4" />
    <category name="SelfHarm" threshold="4" />
    <category name="Violence" threshold="4" />
    <!-- Jailbreak detection is automatic when content safety is enabled -->
</llm-content-safety>
```

Custom response for blocked content:

```xml
<on-error>
    <base />
    <choose>
        <when condition="@(context.LastError.Source == "llm-content-safety")">
            <return-response>
                <set-status code="400" reason="Content Filtered" />
                <set-body>{"error": "Request blocked by content safety policy"}</set-body>
            </return-response>
        </when>
    </choose>
</on-error>
```

---

## Tool Policies

### Request Rate Limiting

Protect MCP tools and API endpoints from excessive requests.

```xml
<!-- Per-agent rate limiting -->
<rate-limit-by-key calls="30" renewal-period="60"
    counter-key="@(context.Request.Headers.GetValueOrDefault("X-Agent-Id", "anonymous"))"
    remaining-calls-header-name="x-ratelimit-remaining"
    retry-after-header-name="Retry-After" />
```

```xml
<!-- Per-subscription rate limiting -->
<rate-limit-by-key calls="100" renewal-period="60"
    counter-key="@(context.Subscription.Id)" />
```
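
In effect this is a fixed-window request counter per key. A rough Python approximation (`RateLimiter` is illustrative; APIM maintains its own distributed counters):

```python
from collections import defaultdict

# Fixed-window request counter per counter-key, approximating
# rate-limit-by-key.
class RateLimiter:
    def __init__(self, calls: int, renewal_period: int):
        self.calls = calls
        self.period = renewal_period  # seconds, as in renewal-period
        self.counters = defaultdict(int)

    def check(self, key: str, now: float) -> tuple:
        """Return (status_code, remaining_calls) for one request."""
        window = (key, int(now // self.period))
        if self.counters[window] >= self.calls:
            return 429, 0  # caller should honor Retry-After
        self.counters[window] += 1
        return 200, self.calls - self.counters[window]
```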

---

## Combining Policies

Complete inbound policy example with all governance layers:

```xml
<policies>
    <inbound>
        <base />

        <!-- 1. Authentication -->
        <authentication-managed-identity resource="https://cognitiveservices.azure.com" />

        <!-- 2. Semantic Cache Lookup -->
        <azure-openai-semantic-cache-lookup
            score-threshold="0.8"
            embeddings-backend-id="embeddings-backend"
            embeddings-backend-auth="system-assigned" />

        <!-- 3. Token Rate Limiting -->
        <azure-openai-token-limit
            tokens-per-minute="50000"
            counter-key="@(context.Subscription.Id)"
            estimate-prompt-tokens="true" />

        <!-- 4. Content Safety -->
        <llm-content-safety backend-id="contentsafety-backend">
            <category name="Hate" threshold="4" />
            <category name="Sexual" threshold="4" />
            <category name="SelfHarm" threshold="4" />
            <category name="Violence" threshold="4" />
        </llm-content-safety>

        <!-- 5. Backend Selection -->
        <set-backend-service backend-id="openai-backend" />

        <!-- 6. Token Metrics -->
        <azure-openai-emit-token-metric namespace="ai-gateway">
            <dimension name="Subscription" value="@(context.Subscription.Id)" />
            <dimension name="API" value="@(context.Api.Name)" />
        </azure-openai-emit-token-metric>
    </inbound>

    <backend>
        <forward-request timeout="120" />
    </backend>

    <outbound>
        <base />
        <!-- Cache store (after successful response) -->
        <azure-openai-semantic-cache-store duration="3600" />
    </outbound>

    <on-error>
        <base />
        <choose>
            <when condition="@(context.LastError.Source == "azure-openai-token-limit")">
                <return-response>
                    <set-status code="429" reason="Token Limit Exceeded" />
                    <set-header name="Retry-After" exists-action="override">
                        <value>60</value>
                    </set-header>
                    <set-body>{"error": "Token rate limit exceeded. Try again later."}</set-body>
                </return-response>
            </when>
        </choose>
    </on-error>
</policies>
```

---

## Policy Quick-Decision Table

| Need | Policy | Section |
|------|--------|---------|
| Control token spend | `azure-openai-token-limit` | `<inbound>` |
| Cache similar prompts | `azure-openai-semantic-cache-lookup/store` | `<inbound>` / `<outbound>` |
| Track token usage | `azure-openai-emit-token-metric` | `<inbound>` |
| Block harmful content | `llm-content-safety` | `<inbound>` |
| Rate limit API calls | `rate-limit-by-key` | `<inbound>` |
| Authenticate to backend | `authentication-managed-identity` | `<inbound>` |
| Load balance backends | `set-backend-service` + retry | `<inbound>` |

---

## References

- [GenAI Gateway Capabilities](https://learn.microsoft.com/azure/api-management/genai-gateway-capabilities)
- [APIM Policy Reference](https://learn.microsoft.com/azure/api-management/api-management-policies)
- [AI-Gateway Samples](https://github.com/Azure-Samples/AI-Gateway)
troubleshooting.md 7.7 KB
# AI Gateway Troubleshooting

Common issues when using Azure API Management as an AI Gateway.

---

## Authentication Issues

### 401 Unauthorized from Backend

**Symptom**: APIM returns `401` when calling Azure OpenAI.

**Causes & Solutions**:

| Cause | Fix |
|-------|-----|
| Managed identity not enabled on APIM | `az apim update --name <apim> --resource-group <rg> --set identity.type=SystemAssigned` |
| Missing RBAC role | `az role assignment create --assignee <apim-principal-id> --role "Cognitive Services User" --scope <aoai-resource-id>` |
| Wrong auth resource | Ensure `resource="https://cognitiveservices.azure.com"` (not the endpoint URL) |
| RBAC propagation delay | Wait 5-10 minutes after role assignment |

**Diagnostic**:

```bash
# Verify identity is enabled
az apim show --name <apim> --resource-group <rg> --query "identity" -o json

# Check role assignments
AOAI_ID=$(az cognitiveservices account show --name <aoai> --resource-group <rg> --query id -o tsv)
az role assignment list --scope "$AOAI_ID" --query "[?principalType=='ServicePrincipal'].{role:roleDefinitionName, principal:principalId}" -o table
```

---

## Rate Limiting Issues

### 429 Token Limit Exceeded

**Symptom**: Requests blocked with `429 Too Many Requests` from `azure-openai-token-limit` policy.

**Solutions**:

1. **Increase limit**: Raise `tokens-per-minute` value
2. **Add more backends**: Load balance across regions for higher aggregate TPM
3. **Enable semantic caching**: Reduce actual token consumption by serving cached responses
4. **Switch counter-key**: Use per-user instead of global to prevent one user from exhausting the pool

```xml
<!-- Per-user instead of global -->
<azure-openai-token-limit
    tokens-per-minute="50000"
    counter-key="@(context.Request.Headers.GetValueOrDefault("X-User-Id", context.Subscription.Id))"
    estimate-prompt-tokens="true" />
```

### 429 from Azure OpenAI (Not APIM)

**Symptom**: Backend returns `429` even though APIM token limits are not exceeded.

**Cause**: Azure OpenAI's own TPM quota is exhausted.

**Solutions**:

1. Increase Azure OpenAI deployment TPM quota in the portal
2. Add load balancing across multiple Azure OpenAI instances
3. Use retry with backoff:

```xml
<retry condition="@(context.Response.StatusCode == 429)" count="3" interval="10">
    <forward-request />
</retry>
```

---

## Semantic Caching Issues

### No Cache Hits

**Symptom**: Semantic cache is configured but cache hit rate is 0%.

**Causes & Solutions**:

| Cause | Fix |
|-------|-----|
| `score-threshold` too high | Lower from 0.9 to 0.7 (more matches) |
| Embeddings backend misconfigured | Verify backend URL and auth |
| Redis not configured | Deploy Azure Cache for Redis Enterprise with RediSearch |
| Streaming requests | Semantic caching doesn't work with `"stream": true` |

**Verify caching is working**:

```bash
# Check cache-related headers in response
curl -v -X POST "${GATEWAY_URL}/openai/deployments/<deployment>/chat/completions?api-version=2024-02-01" \
  -H "Content-Type: application/json" \
  -H "Ocp-Apim-Subscription-Key: <key>" \
  -d '{"messages": [{"role": "user", "content": "What is Azure?"}], "max_tokens": 100}'

# Look for: x-cache-status header in response
```

### Cache Returns Stale Data

**Solution**: Reduce `duration` in `azure-openai-semantic-cache-store`:

```xml
<!-- Shorter TTL for frequently changing knowledge -->
<azure-openai-semantic-cache-store duration="300" />  <!-- 5 minutes -->
```

---

## Content Safety Issues

### False Positives (Legitimate Content Blocked)

**Symptom**: Normal business content is being blocked by content safety policy.

**Solutions**:

1. **Increase thresholds** (less strict):

```xml
<llm-content-safety backend-id="contentsafety-backend">
    <category name="Hate" threshold="5" />      <!-- Was 4, now less strict -->
    <category name="Sexual" threshold="5" />
    <category name="SelfHarm" threshold="5" />
    <category name="Violence" threshold="5" />
</llm-content-safety>
```

2. **Log blocked content** for review:

```xml
<on-error>
    <choose>
        <when condition="@(context.LastError.Source == "llm-content-safety")">
            <trace source="content-safety" severity="warning">
                @{
                    return new JObject(
                        new JProperty("blocked", true),
                        new JProperty("subscription", context.Subscription.Id),
                        new JProperty("timestamp", DateTime.UtcNow)
                    ).ToString();
                }
            </trace>
            <return-response>
                <set-status code="400" reason="Content Filtered" />
                <set-body>{"error": "Content filtered by safety policy"}</set-body>
            </return-response>
        </when>
    </choose>
</on-error>
```
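
The effect of raising thresholds can be illustrated with a small sketch of the decision logic. This is a conceptual model, not the policy's actual implementation: the severity values are hypothetical, and the exact comparison APIM performs is internal.

```python
def blocked_categories(severities, thresholds):
    """Return categories whose detected severity meets or exceeds the configured threshold."""
    return [cat for cat, sev in severities.items() if sev >= thresholds.get(cat, 7)]

# Hypothetical analysis result for one request
severities = {"Hate": 4, "Sexual": 0, "SelfHarm": 0, "Violence": 2}

strict = {"Hate": 4, "Sexual": 4, "SelfHarm": 4, "Violence": 4}
relaxed = {"Hate": 5, "Sexual": 5, "SelfHarm": 5, "Violence": 5}

print(blocked_categories(severities, strict))   # ['Hate'] - severity 4 meets a threshold of 4
print(blocked_categories(severities, relaxed))  # [] - the same content passes at threshold 5
```

Raising a category's threshold by one level admits content at the previous boundary severity, so increase thresholds one step at a time and review the logged blocks between changes.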

### Content Safety Backend Error

**Symptom**: `500` error from `llm-content-safety` policy.

**Causes**:

| Cause | Fix |
|-------|-----|
| Content Safety resource not deployed | Deploy Azure AI Content Safety resource |
| Backend URL wrong | Check `contentsafety-backend` URL matches resource endpoint |
| Missing RBAC | Grant APIM "Cognitive Services User" on the Content Safety resource |
| Region mismatch | Content Safety must be in a supported region |

---

## Backend Configuration Issues

### Backend Not Found

**Symptom**: `500` error with "Backend not found" message.

```bash
# Verify backend exists
az apim backend list --service-name <apim> --resource-group <rg> \
  --query "[].{id:name, url:url}" -o table

# Check backend ID matches policy reference
```

### Timeout on AI Requests

**Symptom**: Requests timeout, especially for large context windows or complex prompts.

**Solution**: Increase timeout in `<backend>`:

```xml
<backend>
    <!-- Increase the timeout (in seconds) for large AI requests -->
    <forward-request timeout="120" />
</backend>
```

---

## Diagnostic Tools

### APIM Tracing

Enable request tracing for debugging policy flow:

```bash
# Get tracing subscription key
az apim subscription list --service-name <apim> --resource-group <rg> \
  --query "[?displayName=='Built-in all-access subscription'].primaryKey" -o tsv

# Send request with tracing
curl -X POST "${GATEWAY_URL}/..." \
  -H "Ocp-Apim-Trace: true" \
  -H "Ocp-Apim-Subscription-Key: <built-in-key>"
```

### Application Insights

If APIM is connected to Application Insights:

```kql
// Failed AI gateway requests
requests
| where success == false
| where url contains "openai"
| project timestamp, resultCode, duration, url
| order by timestamp desc
| take 20

// Token metrics over time
customMetrics
| where name == "Total Tokens"
| summarize TotalTokens = sum(value) by bin(timestamp, 1h)
| render timechart

// Content safety blocks
traces
| where message contains "content-safety"
| project timestamp, message, customDimensions
| order by timestamp desc
```

### Health Check

Quick validation that the AI Gateway is functioning:

```bash
# 1. Check APIM is running
az apim show --name <apim> --resource-group <rg> --query "provisioningState" -o tsv
# Expected: Succeeded

# 2. Check backends
az apim backend list --service-name <apim> --resource-group <rg> -o table

# 3. Test endpoint
curl -s -o /dev/null -w "%{http_code}" "${GATEWAY_URL}/openai/deployments/<deployment>/chat/completions?api-version=2024-02-01" \
  -H "Ocp-Apim-Subscription-Key: <key>" \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "ping"}], "max_tokens": 5}'
# Expected: 200
```

---

## References

- [APIM Diagnostics](https://learn.microsoft.com/azure/api-management/diagnose-solve-problems)
- [AI Gateway Monitoring](https://learn.microsoft.com/azure/api-management/genai-gateway-capabilities#monitoring-and-analytics)
- [APIM Error Handling](https://learn.microsoft.com/azure/api-management/api-management-error-handling-policies)
references/sdk/
azure-ai-contentsafety-py.md 1.3 KB
# Azure AI Content Safety – Python SDK Quick Reference

> Condensed from **azure-ai-contentsafety-py**. Full patterns (blocklist management, image analysis, 8-severity mode)
> in the **azure-ai-contentsafety-py** plugin skill if installed.

## Install
```bash
pip install azure-ai-contentsafety
```

## Quick Start
```python
from azure.ai.contentsafety import ContentSafetyClient, BlocklistClient
from azure.ai.contentsafety.models import AnalyzeTextOptions, TextCategory
client = ContentSafetyClient(endpoint=endpoint, credential=credential)
```

## Non-Obvious Patterns
- Two clients: `ContentSafetyClient` (analyze) and `BlocklistClient` (blocklist management)
- Image from file: base64-encode bytes, pass via `ImageData(content=base64_str)`
- 8-severity mode: `AnalyzeTextOptions(text=..., output_type=AnalyzeTextOutputType.EIGHT_SEVERITY_LEVELS)`
- Blocklist analyze: `AnalyzeTextOptions(text=..., blocklist_names=[...], halt_on_blocklist_hit=True)`

## Best Practices
1. Use blocklists for domain-specific terms
2. Set severity thresholds appropriate for your use case
3. Handle multiple categories – content can be harmful in multiple ways
4. Use `halt_on_blocklist_hit` for immediate rejection
5. Log analysis results for audit and improvement
6. Consider 8-severity mode for finer-grained control
7. Pre-moderate AI outputs before showing to users
azure-ai-contentsafety-ts.md 1.4 KB
# Azure AI Content Safety – TypeScript SDK Quick Reference

> Condensed from **azure-ai-contentsafety-ts**. Full patterns (blocklist CRUD, image moderation, severity thresholds)
> in the **azure-ai-contentsafety-ts** plugin skill if installed.

## Install
```bash
npm install @azure-rest/ai-content-safety @azure/identity @azure/core-auth
```

## Quick Start
```typescript
import ContentSafetyClient, { isUnexpected } from "@azure-rest/ai-content-safety";
import { AzureKeyCredential } from "@azure/core-auth";
const client = ContentSafetyClient(endpoint, new AzureKeyCredential(key));
```

## Non-Obvious Patterns
- REST client – `ContentSafetyClient` is a function, not a class
- Text: `client.path("/text:analyze").post({ body: { text, categories: [...] } })`
- Image: `client.path("/image:analyze").post({ body: { image: { content: base64 } } })`
- Blocklist create: `.path("/text/blocklists/{blocklistName}", name).patch({...})`
- API key import: `AzureKeyCredential` from `@azure/core-auth` (not `@azure/identity`)

## Best Practices
1. Always use `isUnexpected()` – type guard for error handling
2. Set appropriate thresholds – different categories may need different severity levels
3. Use blocklists for domain-specific terms to supplement AI detection
4. Log moderation decisions – keep audit trail for compliance
5. Handle edge cases – empty text, very long text, unsupported image formats
azure-mgmt-apimanagement-dotnet.md 1.3 KB
# API Management – .NET SDK Quick Reference

> Condensed from **azure-mgmt-apimanagement-dotnet**. Full patterns (service
> creation, APIs, products, policies, users, gateways, backends)
> in the **azure-mgmt-apimanagement-dotnet** plugin skill if installed.

## Install
```bash
dotnet add package Azure.ResourceManager.ApiManagement
dotnet add package Azure.Identity
```

## Quick Start
> **Auth:** `DefaultAzureCredential` is for local development. See [auth-best-practices.md](../auth-best-practices.md) for production patterns.

```csharp
using Azure.ResourceManager;
using Azure.Identity;
var armClient = new ArmClient(new DefaultAzureCredential());
```

## Best Practices
- Use `WaitUntil.Completed` for operations that must finish before proceeding
- Use `WaitUntil.Started` for long operations like service creation (30+ min)
- Use DefaultAzureCredential for **local development only**. In production, use ManagedIdentityCredential – see [auth-best-practices.md](../auth-best-practices.md)
- Handle `RequestFailedException` for ARM API errors
- Use `CreateOrUpdateAsync` for idempotent operations
- Navigate hierarchy via `Get*` methods (e.g., `service.GetApis()`)
- Policy format – use XML format for policies; JSON is also supported
- Service creation – Developer SKU is fastest for testing (~15-30 min)
azure-mgmt-apimanagement-py.md 1.0 KB
# API Management – Python SDK Quick Reference

> Condensed from **azure-mgmt-apimanagement-py**. Full patterns (APIs,
> products, subscriptions, policies, backends, named values)
> in the **azure-mgmt-apimanagement-py** plugin skill if installed.

## Install
```bash
pip install azure-mgmt-apimanagement azure-identity
```

## Quick Start
> **Auth:** `DefaultAzureCredential` is for local development. See [auth-best-practices.md](../auth-best-practices.md) for production patterns.

```python
import os
from azure.mgmt.apimanagement import ApiManagementClient
from azure.identity import DefaultAzureCredential
client = ApiManagementClient(DefaultAzureCredential(), os.environ["AZURE_SUBSCRIPTION_ID"])
```

## Best Practices
- Use named values for secrets and configuration
- Apply policies at appropriate scopes (global, product, API, operation)
- Use products to bundle APIs and manage access
- Enable Application Insights for monitoring
- Use backends to abstract backend services
- Version your APIs using APIM's versioning features

License (MIT)

MIT License

Copyright (c) 2025 Microsoft Corporation.

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.