Cost & Token Tracking

Monitor your LLM spending in real time with automatic token counting and cost calculation across all providers.

How It Works

Opswald tracks costs server-side during ingestion. The proxy and SDKs capture raw token counts; the API backend calculates costs using a centralized pricing table.

Input/output tokens captured at the proxy and SDK level
Cost calculation performed server-side during span ingestion (not at the edge)
Provider-specific metrics (OpenAI vs Anthropic vs Google)
Per-session aggregation for conversation-level costs
Fractional cent precision — costs stored as NUMERIC(18,6), e.g. 0.0375 cents
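
The calculation step above can be sketched in a few lines. This is an illustration only, not Opswald's implementation: the `PRICES_CENTS_PER_MILLION` table, its rates, and the `cost_cents` helper are hypothetical names and placeholder values chosen to make the arithmetic easy to follow. `Decimal` quantized to six decimal places stands in for the NUMERIC(18,6) column.

```python
from decimal import Decimal, ROUND_HALF_UP

# Placeholder rates in cents per 1M tokens (illustrative only, not real pricing).
PRICES_CENTS_PER_MILLION = {
    "example-model": {"input": Decimal("250"), "output": Decimal("1000")},
}

SIX_PLACES = Decimal("0.000001")  # same scale as NUMERIC(18,6)


def cost_cents(model: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Return the cost of one span in cents, rounded to six decimal places."""
    rates = PRICES_CENTS_PER_MILLION[model]
    raw = (
        Decimal(input_tokens) * rates["input"]
        + Decimal(output_tokens) * rates["output"]
    ) / Decimal(1_000_000)
    return raw.quantize(SIX_PLACES, rounding=ROUND_HALF_UP)


# 50 input tokens and 25 output tokens at the placeholder rates
print(cost_cents("example-model", 50, 25))  # 0.037500
```

Using `Decimal` rather than `float` keeps sub-cent amounts exact when many small spans are summed.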

Real-Time Monitoring

View costs as they accumulate:

# Check current session costs
curl -H "X-Opswald-Key: your-token" \
     https://proxy.opswald.com/v1/sessions/current/cost

{
  "sessionId": "chat-user-123",
  "totalCost": 0.0245,
  "totalTokens": {
    "input": 1250,
    "output": 850
  },
  "breakdown": {
    "gpt-4o": {"cost": 0.0180, "requests": 3},
    "claude-3-sonnet": {"cost": 0.0065, "requests": 1}
  }
}
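
To show how per-span data could roll up into a response of this shape, here is a minimal aggregation sketch. The span layout and the `summarize` helper are assumptions for illustration, and the per-span values are made-up sample inputs chosen so the totals match the response above.

```python
from collections import defaultdict
from decimal import Decimal

# Made-up sample spans for one session (illustrative values only).
SPANS = [
    {"model": "gpt-4o", "cost": "0.0070", "inputTokens": 400, "outputTokens": 300},
    {"model": "gpt-4o", "cost": "0.0060", "inputTokens": 350, "outputTokens": 200},
    {"model": "gpt-4o", "cost": "0.0050", "inputTokens": 250, "outputTokens": 150},
    {"model": "claude-3-sonnet", "cost": "0.0065", "inputTokens": 250, "outputTokens": 200},
]


def summarize(spans):
    """Group spans by model and total their cost, requests, and tokens."""
    breakdown = defaultdict(lambda: {"cost": Decimal("0"), "requests": 0})
    total_in = total_out = 0
    for span in spans:
        entry = breakdown[span["model"]]
        entry["cost"] += Decimal(span["cost"])
        entry["requests"] += 1
        total_in += span["inputTokens"]
        total_out += span["outputTokens"]
    total_cost = sum((e["cost"] for e in breakdown.values()), Decimal("0"))
    return {
        "totalCost": total_cost,
        "totalTokens": {"input": total_in, "output": total_out},
        "breakdown": dict(breakdown),
    }


summary = summarize(SPANS)
print(summary["totalCost"], summary["totalTokens"])
```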

Cost Enrichment

Cost is calculated on the backend during span ingestion, not at the proxy. The proxy sends costCents: 0 and the API backend enriches each span with the calculated cost based on the model and token counts.

Recalculation trigger: costCents === 0 AND (inputTokens > 0 OR outputTokens > 0)

Spans from older SDK versions that already include a non-zero costCents are accepted as-is (backward compatible).
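
The trigger condition can be expressed as a small predicate. This is a sketch of the rule as stated above, not the backend's code: `needs_enrichment` is a hypothetical helper, and representing a span as a plain dict is an assumption.

```python
def needs_enrichment(span: dict) -> bool:
    """True when the backend should calculate the cost for this span."""
    has_tokens = span.get("inputTokens", 0) > 0 or span.get("outputTokens", 0) > 0
    return span.get("costCents", 0) == 0 and has_tokens


# Proxy span: zero cost with tokens present, so it is enriched.
print(needs_enrichment({"costCents": 0, "inputTokens": 120, "outputTokens": 40}))
# Older SDK span: non-zero cost, accepted as-is.
print(needs_enrichment({"costCents": 1.25, "inputTokens": 120, "outputTokens": 40}))
# No tokens recorded: nothing to price.
print(needs_enrichment({"costCents": 0, "inputTokens": 0, "outputTokens": 0}))
```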

Budget Controls

Set spending limits to prevent runaway costs:

Per-Session Budgets

import openai

client = openai.OpenAI(
    base_url="https://proxy.opswald.com/openai",
    default_headers={
        "X-Opswald-Key": "your-token",
        "X-Opswald-Session-Budget": "10.00",  # $10 max per session
        "X-Opswald-Session": "user-conversation-456"
    }
)

# Requests are blocked once the session exceeds $10
response = client.chat.completions.create(...)

Global Budgets

# Daily budget limit
client = openai.OpenAI(
    base_url="https://proxy.opswald.com/openai",
    default_headers={
        "X-Opswald-Key": "your-token",
        "X-Opswald-Daily-Budget": "100.00"  # $100/day max
    }
)

Budget Exceeded Response

When a budget is exceeded, the proxy returns:

{
  "error": {
    "type": "budget_exceeded",
    "message": "Session budget of $10.00 exceeded. Current: $10.23",
    "budget_type": "session",
    "limit": 10.00,
    "current": 10.23
  }
}
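
A client can detect this payload and react to it, for example by ending the conversation instead of retrying. The sketch below only parses the response body shown above. The `BudgetExceeded` class and `parse_budget_error` function are hypothetical names, and the HTTP status code that accompanies the error is not covered here, so the check relies on the body alone.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class BudgetExceeded:
    budget_type: str
    limit: float
    current: float

    @property
    def overage(self) -> float:
        """Amount spent beyond the limit, rounded to cents."""
        return round(self.current - self.limit, 2)


def parse_budget_error(body: str) -> Optional[BudgetExceeded]:
    """Return a BudgetExceeded if the body is a budget_exceeded error, else None."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return None
    if not isinstance(payload, dict):
        return None
    error = payload.get("error")
    if not isinstance(error, dict) or error.get("type") != "budget_exceeded":
        return None
    return BudgetExceeded(
        budget_type=error["budget_type"],
        limit=float(error["limit"]),
        current=float(error["current"]),
    )


body = (
    '{"error": {"type": "budget_exceeded", "message": "...", '
    '"budget_type": "session", "limit": 10.00, "current": 10.23}}'
)
result = parse_budget_error(body)
print(result.budget_type, result.overage)
```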

Pricing Table

The pricing table lives in apps/api/src/services/pricing.ts and is updated via code PRs. It supports prefix-matching for version-stamped model names (e.g. claude-3-5-sonnet-20241022 matches claude-3-5-sonnet).

Supported models include:

OpenAI: gpt-4o, gpt-4o-mini, gpt-4-turbo, gpt-4, gpt-3.5-turbo, o1, o1-mini, o3-mini, embeddings
Anthropic: claude-opus-4, claude-sonnet-4, claude-3-5-sonnet, claude-3-5-haiku, claude-3-opus, claude-3-sonnet, claude-3-haiku

Unknown models receive costCents: 0 and a warning in the debug log; no error is raised.

Note: Cache tokens (cacheCreationInputTokens, cacheReadInputTokens) are not priced in this iteration.
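
The prefix-matching behaviour can be sketched as follows. This is a Python illustration, not the contents of pricing.ts: `KNOWN_MODELS` is a small subset of the list above and `resolve_model` is a hypothetical helper. Picking the longest matching prefix is an assumption made here so that a name such as gpt-4o-mini-2024-07-18 resolves to gpt-4o-mini rather than gpt-4 or gpt-4o.

```python
import logging
from typing import Optional

logger = logging.getLogger("pricing")

# Subset of the supported model names; a real table would map each to rates.
KNOWN_MODELS = ["gpt-4", "gpt-4o", "gpt-4o-mini", "claude-3-5-sonnet", "claude-3-sonnet"]


def resolve_model(name: str) -> Optional[str]:
    """Return the longest known model name that is a prefix of `name`, or None."""
    matches = [known for known in KNOWN_MODELS if name.startswith(known)]
    if not matches:
        # Unknown model: log at debug level and let the caller keep cost at 0.
        logger.debug("no pricing entry for model %r; cost stays 0", name)
        return None
    return max(matches, key=len)


print(resolve_model("claude-3-5-sonnet-20241022"))  # claude-3-5-sonnet
print(resolve_model("gpt-4o-mini-2024-07-18"))      # gpt-4o-mini
print(resolve_model("my-custom-model"))             # None
```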

Cost Analytics

Dashboard Views

Access detailed cost analytics at app.opswald.com:

Daily spending trends
Cost per model breakdown
Most expensive sessions
Token efficiency metrics

API Analytics

# Get cost analytics
curl -H "Authorization: Bearer your-api-key" \
     "https://api.opswald.com/v1/analytics/cost?period=7d"

{
  "period": "7d",
  "totalCost": 145.32,
  "totalTokens": 2850000,
  "averageCostPerRequest": 0.18,
  "topModels": [
    {"model": "gpt-4o", "cost": 89.21, "share": 0.614},
    {"model": "claude-3-sonnet", "cost": 56.11, "share": 0.386}
  ],
  "daily": [
    {"date": "2026-03-10", "cost": 22.14, "requests": 145},
    {"date": "2026-03-11", "cost": 19.88, "requests": 132}
  ]
}
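
In the sample response, each model's share works out to its cost divided by totalCost, rounded to three decimals. The snippet below checks that reading against the sample figures. `model_shares` is a hypothetical helper, and the interpretation of share is inferred from the numbers rather than from a documented definition.

```python
def model_shares(analytics: dict) -> dict:
    """Compute each model's share of total cost, rounded to three decimals."""
    total = analytics["totalCost"]
    if total == 0:
        return {entry["model"]: 0.0 for entry in analytics["topModels"]}
    return {
        entry["model"]: round(entry["cost"] / total, 3)
        for entry in analytics["topModels"]
    }


# Relevant fields copied from the sample response above.
response = {
    "totalCost": 145.32,
    "topModels": [
        {"model": "gpt-4o", "cost": 89.21, "share": 0.614},
        {"model": "claude-3-sonnet", "cost": 56.11, "share": 0.386},
    ],
}

print(model_shares(response))
```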

Cost Optimization

Model Selection

Choose cost-effective models for different use cases:

# Route by cost-effectiveness
client = openai.OpenAI(
    base_url="https://proxy.opswald.com/openai",
    default_headers={
        "X-Opswald-Key": "your-token",
        "X-Opswald-Route-By": "cost",        # Prefer cheaper models
        "X-Opswald-Quality-Threshold": "0.8" # But maintain 80% quality
    }
)

# Simple tasks → GPT-4o mini
# Complex tasks → GPT-4o
# The proxy automatically routes based on request complexity

Caching

Reduce costs with intelligent caching:

client = openai.OpenAI(
    base_url="https://proxy.opswald.com/openai",
    default_headers={
        "X-Opswald-Key": "your-token",
        "X-Opswald-Cache": "aggressive",  # Cache similar requests
        "X-Opswald-Cache-TTL": "3600"    # 1 hour cache
    }
)

# Identical or similar requests return cached responses
# Significant cost savings for repeated patterns

Export & Reporting

CSV Export

# Export cost data for accounting
curl -H "Authorization: Bearer your-api-key" \
     "https://api.opswald.com/v1/export/costs?format=csv&period=1m" \
     > costs-march-2026.csv

Usage Reports

from opswald import OpsClient

client = OpsClient("your-api-key")

# Generate monthly report
report = client.costs.monthly_report(
    month="2026-03",
    group_by=["model", "session", "user"]
)

print(f"Total: ${report.total}")
print(f"Most expensive session: {report.top_session}")

Integration Examples

Cost Alerts

import openai
from opswald import CostMonitor

# Set up cost monitoring
monitor = CostMonitor("your-api-key")

@monitor.on_budget_warning(threshold=0.8)
def warn_budget(session_id, current, limit):
    print(f"Session {session_id}: ${current:.2f} of ${limit:.2f} budget used")

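# Stand-in for your application's own session store (not part of opswald)
paused_sessions = set()

@monitor.on_budget_exceeded
def block_session(session_id, current, limit):
    # Custom blocking logic: here the session is only marked as paused
    paused_sessions.add(session_id)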

Department Budgets

# Track costs per department
departments = {
    "engineering": {"budget": 500, "sessions": []},
    "marketing": {"budget": 200, "sessions": []},
    "support": {"budget": 300, "sessions": []}
}

# Keep one client per department so each request carries that department's headers
clients = {}
for dept, config in departments.items():
    clients[dept] = openai.OpenAI(
        base_url="https://proxy.opswald.com/openai",
        default_headers={
            "X-Opswald-Key": "your-token",
            "X-Opswald-Department": dept,
            "X-Opswald-Monthly-Budget": str(config["budget"])
        }
    )

Subscription Pricing

To track flat-rate subscriptions such as Claude Pro or ChatGPT Plus:

# Track effective cost per token for subscriptions
client = openai.OpenAI(
    base_url="https://proxy.opswald.com/openai",
    default_headers={
        "X-Opswald-Key": "your-token",
        "X-Opswald-Subscription": "chatgpt-plus",  # $20/month flat rate
        "X-Opswald-Subscription-Limit": "40"       # 40 requests/3 hours
    }
)

# Dashboard shows:
# - Effective cost per token based on subscription fee
# - Usage against rate limits
# - Savings vs pay-per-token pricing
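
The arithmetic behind an effective per-token cost is simple enough to show directly. This is a sketch under stated assumptions, not the dashboard's formula: the function names are hypothetical, and the token volume and the pay-per-token rate are placeholder figures.

```python
def effective_cost_per_million(monthly_fee: float, tokens_used: int) -> float:
    """Flat fee spread over the tokens actually used, in dollars per 1M tokens."""
    if tokens_used <= 0:
        raise ValueError("tokens_used must be positive")
    return monthly_fee * 1_000_000 / tokens_used


def savings_vs_metered(monthly_fee: float, tokens_used: int, metered_rate: float) -> float:
    """Metered cost minus the flat fee; a negative result means the subscription costs more."""
    metered_cost = tokens_used / 1_000_000 * metered_rate
    return metered_cost - monthly_fee


fee = 20.00              # $20/month flat rate
tokens = 4_000_000       # placeholder monthly usage
metered_rate = 8.00      # placeholder dollars per 1M tokens

effective = effective_cost_per_million(fee, tokens)
savings = savings_vs_metered(fee, tokens, metered_rate)
print(f"Effective: ${effective:.2f} per 1M tokens, savings: ${savings:.2f}")
```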

Best Practices

Cost Management

Set conservative budgets initially and increase based on usage
Monitor daily for unexpected spikes
Use caching for repeated patterns
Route strategically between models based on task complexity
Track per-user costs for chargeback or optimization

Budget Planning

# Estimate costs before deployment
from opswald import CostEstimator

estimator = CostEstimator()

# Estimate based on expected usage
monthly_cost = estimator.estimate(
    requests_per_day=1000,
    avg_input_tokens=500,
    avg_output_tokens=200,
    models={"gpt-4o": 0.7, "gpt-4o-mini": 0.3}  # Distribution
)

print(f"Estimated monthly cost: ${monthly_cost:.2f}")
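
For readers who want to see the arithmetic such an estimate rests on, here is a standalone sketch that does not use the SDK. The `RATES` table holds placeholder prices under placeholder model names, `estimate_monthly_cost` is a hypothetical function, and a 30-day month is assumed; none of this describes how `CostEstimator` is implemented.

```python
# Placeholder rates in dollars per 1M tokens (illustrative only, not real pricing).
RATES = {
    "model-large": {"input": 2.00, "output": 8.00},
    "model-small": {"input": 0.20, "output": 0.80},
}


def estimate_monthly_cost(requests_per_day, avg_input_tokens, avg_output_tokens, mix, days=30):
    """Weighted per-request cost across the model mix, scaled to a month."""
    per_request = 0.0
    for model, fraction in mix.items():
        rate = RATES[model]
        cost = (
            avg_input_tokens * rate["input"] + avg_output_tokens * rate["output"]
        ) / 1_000_000
        per_request += fraction * cost
    return requests_per_day * days * per_request


monthly = estimate_monthly_cost(
    requests_per_day=1000,
    avg_input_tokens=500,
    avg_output_tokens=200,
    mix={"model-large": 0.7, "model-small": 0.3},
)
print(f"Estimated monthly cost: ${monthly:.2f}")  # $56.94 at the placeholder rates
```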

Troubleshooting High Costs

Common causes of unexpected costs:

Runaway loops: an agent keeps calling the LLM without a termination condition
Large context windows: large documents are included in prompts by accident
High-cost models: GPT-4o is used where GPT-4o mini would be sufficient
No caching: identical requests are sent repeatedly

The Opswald dashboard highlights these patterns automatically.