---
title: "Fallbacks and Tracking"
description: "Configure model fallbacks for reliability and track token usage for cost management in production AI applications."
canonical_url: "https://vercel.com/academy/svelte-on-vercel/fallbacks-and-tracking"
md_url: "https://vercel.com/academy/svelte-on-vercel/fallbacks-and-tracking.md"
docset_id: "vercel-academy"
doc_version: "1.0"
last_updated: "2026-04-11T10:32:53.177Z"
content_type: "lesson"
course: "svelte-on-vercel"
course_title: "Svelte on Vercel"
prerequisites: []
---

<agent-instructions>
Vercel Academy — structured learning, not reference docs.
Lessons are sequenced.
Adapt commands to the human's actual environment (OS, package manager, shell, editor) — detect from project context or ask, don't assume.
The lesson shows one path; if the human's project diverges, adapt concepts to their setup.
Preserve the learning goal over literal steps.
Quizzes are pedagogical — engage, don't spoil.
Quiz answers are included for your reference.
</agent-instructions>

# Fallbacks and Tracking

A single model going down shouldn't take your app with it. And if you don't track token usage, you'll find out about costs from your invoice instead of your dashboard.

## Outcome

Create a centralized AI provider configuration with automatic model fallbacks and token usage logging.

## Fast Track

1. Configure fallback models in the AI Gateway dashboard (the preferred approach)
2. Create `src/lib/ai/provider.ts` with a shared AI Gateway client
3. Add a `wrapLanguageModel` middleware that logs token usage

## Gateway-Level Fallbacks (Preferred)

The easiest way to add model fallbacks is in the AI Gateway itself, not in your code. In the Vercel dashboard:

1. Go to your project **Settings** → **AI Gateway**
2. Select your primary model (`anthropic/claude-sonnet-4`)
3. Add a fallback model (`anthropic/claude-haiku-4.5`)
4. Set conditions: timeout threshold, error codes that trigger fallback

When the primary model is unavailable or slow, the gateway automatically routes to the fallback. Your application code doesn't change at all. No try/catch, no retry logic, no second model configuration. The gateway handles it at the infrastructure level.

This is the approach we recommend for most production apps. Code-level fallbacks (shown later in the Advanced section) are there for cases where you need fine-grained control, like falling back only for specific endpoints or adjusting the prompt for a different model.

## Why Centralize the Provider?

Right now, each endpoint creates its own gateway client:

```typescript
// In api/chat/+server.ts
const gateway = createGateway({ apiKey: AI_GATEWAY_API_KEY });

// In api/parse-alert/+server.ts (same thing, duplicated)
const gateway = createGateway({ apiKey: AI_GATEWAY_API_KEY });
```

A centralized provider means one place to:

- Configure the API key
- Add usage tracking middleware
- Define fallback models
- Adjust settings across all endpoints

## Hands-on exercise 2.4

Let's create a centralized AI provider with usage tracking and model fallbacks:

**Requirements:**

1. Configure fallback models in the AI Gateway dashboard
2. Create `src/lib/ai/provider.ts`
3. Use `wrapLanguageModel` to add middleware that logs input/output token counts
4. Export a `getModel()` function that returns a wrapped model instance
5. Update the chat and parse-alert endpoints to use the shared provider

**Implementation hints:**

- Set up gateway-level fallbacks first (dashboard config, no code needed)
- `wrapLanguageModel` from the `ai` package wraps any model with middleware hooks
- The middleware object needs `specificationVersion: 'v3'`
- `wrapGenerate` intercepts `generateText()` calls, `wrapStream` intercepts `streamText()` calls. You need both to cover all endpoints
- Token usage is available as `result.usage.inputTokens.total` and `result.usage.outputTokens.total`

## Try It

1. **Send a chat message and check server logs:**

   ```
   What's the weather like at Mammoth?
   ```

   Server logs should show:

   ```
   [AI Usage] Model: anthropic/claude-sonnet-4
   [AI Usage] Input tokens: 245
   [AI Usage] Output tokens: 89
   [AI Usage] Total tokens: 334
   ```

2. **Test the parse-alert endpoint (also uses the shared provider):**

   ```bash
   curl -X POST http://localhost:5173/api/parse-alert \
     -H "Content-Type: application/json" \
     -d '{"query": "powder at Grand Targhee"}'
   ```

   Server logs should show usage for this request too.

3. **Verify fallback behavior:**

   Temporarily change the primary model to an invalid name and verify the fallback model handles the request.

## Commit

```bash
git add -A
git commit -m "feat(ai): centralize provider with usage tracking and fallbacks"
git push
```

## Done-When

- [ ] Fallback model is configured in the AI Gateway dashboard
- [ ] `src/lib/ai/provider.ts` exports a `getModel()` function
- [ ] Token usage is logged for every AI request
- [ ] Chat and parse-alert endpoints use the shared provider instead of their own clients

## Solution

```typescript title="src/lib/ai/provider.ts"
import { wrapLanguageModel } from 'ai';
import { createGateway } from '@ai-sdk/gateway';
import { AI_GATEWAY_API_KEY } from '$env/static/private';

const gateway = createGateway({
  apiKey: AI_GATEWAY_API_KEY
});

const PRIMARY_MODEL = 'anthropic/claude-sonnet-4';
const FALLBACK_MODEL = 'anthropic/claude-haiku-4.5';

function logUsage(
  modelId: string,
  usage: { inputTokens: { total?: number }; outputTokens: { total?: number } }
) {
  const input = usage.inputTokens.total ?? 0;
  const output = usage.outputTokens.total ?? 0;
  console.log(`[AI Usage] Model: ${modelId}`);
  console.log(`[AI Usage] Input tokens: ${input}`);
  console.log(`[AI Usage] Output tokens: ${output}`);
  console.log(`[AI Usage] Total tokens: ${input + output}`);
}

function withUsageTracking(model: ReturnType<typeof gateway>) {
  return wrapLanguageModel({
    model,
    middleware: {
      specificationVersion: 'v3',
      wrapGenerate: async ({ doGenerate }) => {
        const result = await doGenerate();
        if (result.usage) logUsage(model.modelId, result.usage);
        return result;
      },
      wrapStream: async ({ doStream }) => {
        const { stream, ...rest } = await doStream();
        let usage: Parameters<typeof logUsage>[1] | undefined;

        return {
          stream: stream.pipeThrough(
            new TransformStream({
              transform(chunk, controller) {
                if (chunk.type === 'usage') usage = chunk.value;
                controller.enqueue(chunk);
              },
              flush() {
                // Source stream has ended; log whatever usage we captured
                if (usage) logUsage(model.modelId, usage);
              }
            })
          ),
          ...rest
        };
      }
    }
  });
}

export function getModel() {
  return withUsageTracking(gateway(PRIMARY_MODEL));
}

export function getFallbackModel() {
  return withUsageTracking(gateway(FALLBACK_MODEL));
}

export { PRIMARY_MODEL, FALLBACK_MODEL };
```

**Updated chat endpoint using the shared provider:**

```typescript title="src/routes/api/chat/+server.ts" {1,6}
import { getModel } from '$lib/ai/provider';
import { streamText, tool, stepCountIs } from 'ai';
import { valibotSchema } from '@ai-sdk/valibot';
import { resorts } from '$lib/data/resorts';
import { CreateAlertToolInputSchema } from '$lib/schemas/alert';
import type { RequestHandler } from './$types';

// Remove the local anthropic client. Use getModel() instead

export const POST: RequestHandler = async ({ request }) => {
  const { message } = await request.json();

  const resortList = resorts
    .map((r) => `- ${r.name} (id: ${r.id})`)
    .join('\n');

  const result = streamText({
    model: getModel(), // Uses the centralized, tracked model
    system: `You are a helpful ski conditions assistant...`,
    messages: [{ role: 'user', content: message }],
    tools: {
      create_alert: tool({ /* ... same as before */ })
    },
    stopWhen: stepCountIs(3)
  });

  // ... rest of the SSE stream logic unchanged
};
```

**Updated parse-alert endpoint:** The same change applies. Replace the local `createGateway` and `gateway(...)` call with `getModel()`:

```typescript title="src/routes/api/parse-alert/+server.ts" {1,7}
import { generateText, Output } from 'ai';
import { valibotSchema } from '@ai-sdk/valibot';
import * as v from 'valibot';
import { resorts } from '$lib/data/resorts';
import { CreateAlertToolInputSchema, AlertConditionSchema } from '$lib/schemas/alert';
import type { RequestHandler } from './$types';
import { getModel } from '$lib/ai/provider';

// Remove the local gateway client. Use getModel() in the generateText call:
//   model: getModel(),
```

The gateway handles fallbacks at the infrastructure level (configured in the dashboard earlier). The `getFallbackModel()` export is available for cases where you need explicit code-level control, covered in the Advanced section below.

`wrapLanguageModel` intercepts the model's lifecycle. `wrapGenerate` handles `generateText()` calls (like the parse-alert endpoint), while `wrapStream` handles `streamText()` calls (like the chat endpoint). Both hooks need to be present to track usage across all endpoints. The middleware needs `specificationVersion: 'v3'` in the v6 SDK. Usage data lives on `result.usage` with `inputTokens.total` and `outputTokens.total`. Since all endpoints now use `getModel()`, tracking and config changes apply everywhere from one file.
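The stream-side capture is the least obvious part, so here is the same pattern as a standalone sketch. The `Chunk` type is illustrative, not the real AI SDK stream-part types: a `TransformStream` passes every chunk through untouched while stashing the usage chunk on the side.

```typescript
// Standalone sketch of the wrapStream capture pattern. Chunk shapes
// here are simplified stand-ins for the SDK's real stream parts.
type Chunk =
  | { type: 'text'; value: string }
  | { type: 'usage'; value: { inputTokens: number; outputTokens: number } };

async function captureUsage(source: ReadableStream<Chunk>) {
  let usage: { inputTokens: number; outputTokens: number } | undefined;

  const tapped = source.pipeThrough(
    new TransformStream<Chunk, Chunk>({
      transform(chunk, controller) {
        if (chunk.type === 'usage') usage = chunk.value; // stash, don't consume
        controller.enqueue(chunk); // every chunk still reaches the caller
      }
    })
  );

  // Drain the tapped stream the way a consumer would
  const reader = tapped.getReader();
  const received: Chunk[] = [];
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    received.push(value);
  }
  return { received, usage };
}
```

The caller still receives every chunk; the usage value is captured as a side effect, which is what the middleware's `flush()` hook relies on.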

## Troubleshooting

**Warning: Token counts show 0 for both input and output**

Your middleware may not be wired up correctly. Verify that `withUsageTracking` is being called and that `getModel()` returns the wrapped model, not the raw gateway model.

**Warning: `result.usage` is undefined**

Check that `specificationVersion: 'v3'` is set in the middleware object. Without it, the v6 SDK won't pass usage data to your hooks.

## Advanced: Code-Level Fallbacks

If you need fallback behavior that's more nuanced than the gateway dashboard allows, like adjusting the prompt for a different model or falling back only for specific endpoints, handle it in code. The `getFallbackModel()` export from the provider gives you a cheaper, faster model:

```typescript
import { streamText } from 'ai';
import { getModel, getFallbackModel } from '$lib/ai/provider';

// In your stream's start() function:
async start(controller) {
  try {
    for await (const part of result.fullStream) {
      // ... handle parts
    }
  } catch (error) {
    console.warn('[AI Fallback] Primary stream failed, retrying with fallback');
    const fallbackResult = streamText({ ...options, model: getFallbackModel() });
    for await (const part of fallbackResult.fullStream) {
      // ... handle parts
    }
  }
}
```
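The same try-once-then-fall-back idea can be factored into a small generic helper, sketched here independent of streaming. The helper name is ours, not an AI SDK export:

```typescript
// Hypothetical helper (not part of the SDK): run the primary call,
// and on any thrown error run the fallback exactly once.
async function withFallback<T>(
  primary: () => Promise<T>,
  fallback: () => Promise<T>
): Promise<T> {
  try {
    return await primary();
  } catch (error) {
    console.warn('[AI Fallback] Primary call failed, retrying with fallback', error);
    return fallback();
  }
}

// Usage sketch for a non-streaming endpoint:
// const { text } = await withFallback(
//   () => generateText({ model: getModel(), prompt }),
//   () => generateText({ model: getFallbackModel(), prompt })
// );
```

Because the helper is generic over the return type, it works for `generateText()` results as well as anything else that returns a promise.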

For most apps, the gateway-level approach is simpler and sufficient. Use code-level fallbacks when you need the extra control.

## Advanced: Cost Estimation

Add per-request cost estimates to your logs:

```typescript
// Approximate pricing (check anthropic.com/pricing for current rates)
const PRICING = {
  'anthropic/claude-sonnet-4': { input: 3.0, output: 15.0 }, // per million tokens
  'anthropic/claude-haiku-4.5': { input: 0.25, output: 1.25 }
};

function estimateCost(
  modelId: string,
  inputTokens: number,
  outputTokens: number
): string {
  const prices = PRICING[modelId as keyof typeof PRICING];
  if (!prices) return 'unknown';
  const cost =
    (inputTokens / 1_000_000) * prices.input +
    (outputTokens / 1_000_000) * prices.output;
  return `$${cost.toFixed(6)}`;
}
```
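To sanity-check the arithmetic, here is the Try It request's cost worked out inline, with the rates copied from the illustrative `PRICING` table above so it runs standalone (these are not live prices):

```typescript
// Worked example using the Try It token counts and the illustrative
// sonnet rates from the PRICING table — not live pricing.
const inputTokens = 245;
const outputTokens = 89;
const cost =
  (inputTokens / 1_000_000) * 3.0 + // input: $3.00 per million tokens
  (outputTokens / 1_000_000) * 15.0; // output: $15.00 per million tokens
console.log(`$${cost.toFixed(6)}`); // → $0.002070
```

Individual requests cost fractions of a cent; the logs start to matter once requests number in the thousands.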

In production, you'd send these metrics to a monitoring service rather than just logging them.


---

[Full course index](/academy/llms.txt) · [Sitemap](/academy/sitemap.md)
