Cost & Token Tracking

Monitor your LLM spending in real time with automatic token counting and cost calculation across all providers.

How It Works

Opswald tracks costs server-side during ingestion. The proxy and SDKs capture raw token counts; the API backend calculates costs using a centralized pricing table.

  • Input/output tokens captured at the proxy and SDK level
  • Cost calculation performed server-side during span ingestion (not at the edge)
  • Provider-specific metrics (OpenAI vs Anthropic vs Google)
  • Per-session aggregation for conversation-level costs
  • Fractional-cent precision: costs are stored as NUMERIC(18,6), e.g. 0.0375 cents
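
The server-side calculation described above can be sketched as follows. This is a minimal illustration, not Opswald's implementation: the `PRICES_USD_PER_MILLION` table, its placeholder rates, the `cost_cents` helper, and the rounding mode are all assumptions made for the example. The only details taken from this page are that cost is derived from the model and token counts and is kept to six decimal places.

```python
from decimal import Decimal, ROUND_HALF_UP

# Placeholder rates in USD per million tokens. These are NOT real prices;
# the actual pricing table lives in the API backend.
PRICES_USD_PER_MILLION = {
    "example-model": {"input": Decimal("2.00"), "output": Decimal("8.00")},
}

SIX_PLACES = Decimal("0.000001")  # same scale as NUMERIC(18,6)


def cost_cents(model: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Return the cost in cents, rounded to six decimal places."""
    rates = PRICES_USD_PER_MILLION.get(model)
    if rates is None:
        # Mirrors the documented behavior for unknown models: cost of 0, no error.
        return Decimal("0")
    usd = (
        Decimal(input_tokens) * rates["input"]
        + Decimal(output_tokens) * rates["output"]
    ) / Decimal(1_000_000)
    return (usd * 100).quantize(SIX_PLACES, rounding=ROUND_HALF_UP)


print(cost_cents("example-model", 125, 25))  # prints 0.045000
```

Using `Decimal` rather than `float` keeps sub-cent amounts exact, which matters when many small spans are summed into a session total.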

Real-Time Monitoring

View costs as they accumulate:

# Check current session costs
curl -H "X-Opswald-Key: your-token" \
  https://proxy.opswald.com/v1/sessions/current/cost

Example response:

{
  "sessionId": "chat-user-123",
  "totalCost": 0.0245,
  "totalTokens": {
    "input": 1250,
    "output": 850
  },
  "breakdown": {
    "gpt-4o": {"cost": 0.0180, "requests": 3},
    "claude-3-sonnet": {"cost": 0.0065, "requests": 1}
  }
}
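
A response like the one above can be checked on the client side. The sketch below is illustrative only: it uses the sample payload from this page, and the `summarize` helper is a hypothetical name. It parses amounts as `Decimal` so that the per-model breakdown can be compared to `totalCost` without floating-point drift.

```python
import json
from decimal import Decimal

# Sample payload copied from the example response on this page.
RESPONSE = """
{
  "sessionId": "chat-user-123",
  "totalCost": 0.0245,
  "totalTokens": {"input": 1250, "output": 850},
  "breakdown": {
    "gpt-4o": {"cost": 0.0180, "requests": 3},
    "claude-3-sonnet": {"cost": 0.0065, "requests": 1}
  }
}
"""


def summarize(payload: str) -> dict:
    """Total the tokens and verify that the breakdown adds up to totalCost."""
    data = json.loads(payload, parse_float=Decimal)
    breakdown_total = sum(
        (entry["cost"] for entry in data["breakdown"].values()), Decimal("0")
    )
    return {
        "total_tokens": data["totalTokens"]["input"] + data["totalTokens"]["output"],
        "breakdown_total": breakdown_total,
        "consistent": breakdown_total == data["totalCost"],
    }


print(summarize(RESPONSE))
```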

Cost Enrichment

Cost is calculated on the backend during span ingestion, not at the proxy. The proxy sends costCents: 0 and the API backend enriches each span with the calculated cost based on the model and token counts.

Recalculation trigger: costCents === 0 AND (inputTokens > 0 OR outputTokens > 0)

Spans from older SDK versions that already include a non-zero costCents are accepted as-is (backward compatible).
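
The recalculation trigger can be expressed as a small predicate. This is a sketch of the stated rule only: the `Span` dataclass and the `needs_cost_enrichment` name are hypothetical, and the real check runs in the API backend.

```python
from dataclasses import dataclass


@dataclass
class Span:
    """Hypothetical, minimal view of an ingested span."""
    cost_cents: float
    input_tokens: int
    output_tokens: int


def needs_cost_enrichment(span: Span) -> bool:
    """True when cost is zero but the span reports at least one token."""
    return span.cost_cents == 0 and (span.input_tokens > 0 or span.output_tokens > 0)


# A proxy span arrives with cost 0 and token counts, so it is enriched.
print(needs_cost_enrichment(Span(cost_cents=0, input_tokens=1250, output_tokens=850)))
# A span from an older SDK already carries a cost, so it is accepted as-is.
print(needs_cost_enrichment(Span(cost_cents=1.8, input_tokens=1250, output_tokens=850)))
```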

Budget Controls

Set spending limits to prevent runaway costs:

Per-Session Budgets

import openai

client = openai.OpenAI(
    base_url="https://proxy.opswald.com/openai",
    default_headers={
        "X-Opswald-Key": "your-token",
        "X-Opswald-Session-Budget": "10.00",  # $10 max per session
        "X-Opswald-Session": "user-conversation-456",
    },
)

# Requests are blocked once the session exceeds $10
response = client.chat.completions.create(...)

Global Budgets

# Daily budget limit
client = openai.OpenAI(
    base_url="https://proxy.opswald.com/openai",
    default_headers={
        "X-Opswald-Key": "your-token",
        "X-Opswald-Daily-Budget": "100.00",  # $100/day max
    },
)

Budget Exceeded Response

When a budget is exceeded, the proxy returns:

{
  "error": {
    "type": "budget_exceeded",
    "message": "Session budget of $10.00 exceeded. Current: $10.23",
    "budget_type": "session",
    "limit": 10.00,
    "current": 10.23
  }
}
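
A client can detect this error by inspecting the response body. The sketch below is one possible approach, not part of any Opswald SDK: `BudgetExceeded` and `parse_budget_error` are hypothetical names. Because this page does not state which HTTP status code accompanies the error, the example relies only on the documented JSON fields.

```python
import json
from typing import Optional


class BudgetExceeded(Exception):
    """Raised by this sketch when a response body reports budget_exceeded."""

    def __init__(self, message: str, budget_type: str, limit: float, current: float):
        super().__init__(message)
        self.budget_type = budget_type
        self.limit = limit
        self.current = current


def parse_budget_error(body: str) -> Optional[BudgetExceeded]:
    """Return a BudgetExceeded for a budget error body, otherwise None."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return None
    error = payload.get("error") if isinstance(payload, dict) else None
    if not isinstance(error, dict) or error.get("type") != "budget_exceeded":
        return None
    return BudgetExceeded(
        message=error.get("message", "budget exceeded"),
        budget_type=error.get("budget_type", "unknown"),
        limit=float(error.get("limit", 0)),
        current=float(error.get("current", 0)),
    )


# Body copied from the example response on this page.
SAMPLE = (
    '{"error": {"type": "budget_exceeded", '
    '"message": "Session budget of $10.00 exceeded. Current: $10.23", '
    '"budget_type": "session", "limit": 10.00, "current": 10.23}}'
)

err = parse_budget_error(SAMPLE)
if err is not None:
    print(f"{err.budget_type} budget hit: {err.current:.2f} of {err.limit:.2f}")
```

Returning the exception instead of raising it lets the caller decide whether to stop the session, fall back to a cheaper model, or surface the message to the user.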

Pricing Table

The pricing table lives in apps/api/src/services/pricing.ts and is updated via code PRs. It supports prefix-matching for version-stamped model names (e.g. claude-3-5-sonnet-20241022 matches claude-3-5-sonnet).

Supported models include:

  • OpenAI: gpt-4o, gpt-4o-mini, gpt-4-turbo, gpt-4, gpt-3.5-turbo, o1, o1-mini, o3-mini, embeddings
  • Anthropic: claude-opus-4, claude-sonnet-4, claude-3-5-sonnet, claude-3-5-haiku, claude-3-opus, claude-3-sonnet, claude-3-haiku

Unknown models receive costCents: 0 and a warning in the debug log; no error is raised.

Note: Cache tokens (cacheCreationInputTokens, cacheReadInputTokens) are not priced in this iteration.
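
The prefix matching described in this section can be illustrated with a short sketch. It is an assumption-laden approximation: the actual logic lives in pricing.ts and is not shown on this page, `match_model` is a hypothetical name, and choosing the longest matching prefix is a design choice made here so that `gpt-4o-mini` is not mistaken for `gpt-4o` or `gpt-4`.

```python
from typing import Optional

# Model keys taken from the supported-models list on this page
# (the generic "embeddings" entry is omitted because it is not a model name).
KNOWN_MODELS = [
    "gpt-4o", "gpt-4o-mini", "gpt-4-turbo", "gpt-4", "gpt-3.5-turbo",
    "o1", "o1-mini", "o3-mini",
    "claude-opus-4", "claude-sonnet-4", "claude-3-5-sonnet", "claude-3-5-haiku",
    "claude-3-opus", "claude-3-sonnet", "claude-3-haiku",
]


def match_model(name: str) -> Optional[str]:
    """Return the longest known key that prefixes `name`, or None if unknown."""
    candidates = [key for key in KNOWN_MODELS if name.startswith(key)]
    return max(candidates, key=len) if candidates else None


print(match_model("claude-3-5-sonnet-20241022"))  # claude-3-5-sonnet
print(match_model("gpt-4o-mini-2024-07-18"))      # gpt-4o-mini
print(match_model("some-unlisted-model"))         # None
```

A `None` result corresponds to the unknown-model case, where the span keeps a cost of 0 instead of raising an error.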

Cost Analytics

Dashboard Views

Access detailed cost analytics at app.opswald.com:

  • Daily spending trends
  • Cost per model breakdown
  • Most expensive sessions
  • Token efficiency metrics

API Analytics

# Get cost analytics (the URL is quoted so the shell does not interpret "?")
curl -H "Authorization: Bearer your-api-key" \
  "https://api.opswald.com/v1/analytics/cost?period=7d"

Example response:

{
  "period": "7d",
  "totalCost": 145.32,
  "totalTokens": 2850000,
  "averageCostPerRequest": 0.18,
  "topModels": [
    {"model": "gpt-4o", "cost": 89.21, "share": 0.614},
    {"model": "claude-3-sonnet", "cost": 56.11, "share": 0.386}
  ],
  "daily": [
    {"date": "2026-03-10", "cost": 22.14, "requests": 145},
    {"date": "2026-03-11", "cost": 19.88, "requests": 132}
  ]
}
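
The `share` values in the sample response are consistent with each model's cost divided by `totalCost`. The sketch below reproduces that arithmetic from the sample numbers. The `model_shares` helper is a hypothetical name, and the formula is inferred from the example, not taken from a documented definition.

```python
# Relevant fields copied from the sample analytics response on this page.
ANALYTICS = {
    "totalCost": 145.32,
    "topModels": [
        {"model": "gpt-4o", "cost": 89.21},
        {"model": "claude-3-sonnet", "cost": 56.11},
    ],
}


def model_shares(analytics: dict) -> dict:
    """Return each model's fraction of total cost, rounded to three places."""
    total = analytics["totalCost"]
    if total == 0:
        return {entry["model"]: 0.0 for entry in analytics["topModels"]}
    return {
        entry["model"]: round(entry["cost"] / total, 3)
        for entry in analytics["topModels"]
    }


print(model_shares(ANALYTICS))  # {'gpt-4o': 0.614, 'claude-3-sonnet': 0.386}
```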

Cost Optimization

Model Selection

Choose cost-effective models for different use cases:

# Route by cost-effectiveness
client = openai.OpenAI(
    base_url="https://proxy.opswald.com/openai",
    default_headers={
        "X-Opswald-Key": "your-token",
        "X-Opswald-Route-By": "cost",  # Prefer cheaper models
        "X-Opswald-Quality-Threshold": "0.8",  # But maintain 80% quality
    },
)

# Simple tasks → GPT-4o mini
# Complex tasks → GPT-4o
# The proxy automatically routes based on request complexity

Caching

Reduce costs with intelligent caching:

client = openai.OpenAI(
    base_url="https://proxy.opswald.com/openai",
    default_headers={
        "X-Opswald-Key": "your-token",
        "X-Opswald-Cache": "aggressive",  # Cache similar requests
        "X-Opswald-Cache-TTL": "3600",  # 1 hour cache
    },
)

# Identical or similar requests return cached responses
# Significant cost savings for repeated patterns

Export & Reporting

CSV Export

# Export cost data for accounting
curl -H "Authorization: Bearer your-api-key" \
  "https://api.opswald.com/v1/export/costs?format=csv&period=1m" \
  > costs-march-2026.csv

Usage Reports

from opswald import OpsClient

client = OpsClient("your-api-key")

# Generate monthly report
report = client.costs.monthly_report(
    month="2026-03",
    group_by=["model", "session", "user"],
)

print(f"Total: ${report.total}")
print(f"Most expensive session: {report.top_session}")

Integration Examples

Cost Alerts

from opswald import CostMonitor

# Set up cost monitoring
monitor = CostMonitor("your-api-key")

@monitor.on_budget_warning(threshold=0.8)
def warn_budget(session_id, current, limit):
    print(f"Session {session_id}: ${current:.2f} of ${limit:.2f} budget used")

@monitor.on_budget_exceeded
def block_session(session_id, current, limit):
    # Implement custom blocking logic. `sessions` stands in for your
    # application's own session manager and is not defined here.
    sessions.pause(session_id)

Department Budgets

import openai

# Track costs per department
departments = {
    "engineering": {"budget": 500, "sessions": []},
    "marketing": {"budget": 200, "sessions": []},
    "support": {"budget": 300, "sessions": []},
}

# Keep one client per department so each keeps its own headers
clients = {}
for dept, config in departments.items():
    clients[dept] = openai.OpenAI(
        base_url="https://proxy.opswald.com/openai",
        default_headers={
            "X-Opswald-Key": "your-token",
            "X-Opswald-Department": dept,
            "X-Opswald-Monthly-Budget": str(config["budget"]),
        },
    )

Subscription Pricing

For Claude Pro or ChatGPT Plus subscriptions:

# Track effective cost per token for subscriptions
client = openai.OpenAI(
    base_url="https://proxy.opswald.com/openai",
    default_headers={
        "X-Opswald-Key": "your-token",
        "X-Opswald-Subscription": "chatgpt-plus",  # $20/month flat rate
        "X-Opswald-Subscription-Limit": "40",  # 40 requests/3 hours
    },
)

# Dashboard shows:
# - Effective cost per token based on subscription fee
# - Usage against rate limits
# - Savings vs pay-per-token pricing

Best Practices

Cost Management

  1. Set conservative budgets initially and increase based on usage
  2. Monitor daily for unexpected spikes
  3. Use caching for repeated patterns
  4. Route strategically between models based on task complexity
  5. Track per-user costs for chargeback or optimization

Budget Planning

# Estimate costs before deployment
from opswald import CostEstimator

estimator = CostEstimator()

# Estimate based on expected usage
monthly_cost = estimator.estimate(
    requests_per_day=1000,
    avg_input_tokens=500,
    avg_output_tokens=200,
    models={"gpt-4o": 0.7, "gpt-4o-mini": 0.3},  # Distribution
)

print(f"Estimated monthly cost: ${monthly_cost:.2f}")
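
The token volume behind such an estimate can be worked out by hand. The sketch below is independent of `CostEstimator`, whose internals are not documented on this page. The `monthly_token_volume` helper and the 30-day month are assumptions, and no prices are applied: multiply the resulting volumes by current per-token rates to get a cost.

```python
def monthly_token_volume(
    requests_per_day: int,
    avg_input_tokens: int,
    avg_output_tokens: int,
    model_mix: dict,
    days: int = 30,  # assumed month length
) -> dict:
    """Split expected monthly input/output tokens across models by request share."""
    volumes = {}
    for model, fraction in model_mix.items():
        requests = requests_per_day * days * fraction
        volumes[model] = {
            "input": round(requests * avg_input_tokens),
            "output": round(requests * avg_output_tokens),
        }
    return volumes


volumes = monthly_token_volume(
    requests_per_day=1000,
    avg_input_tokens=500,
    avg_output_tokens=200,
    model_mix={"gpt-4o": 0.7, "gpt-4o-mini": 0.3},
)
for model, tokens in volumes.items():
    print(f"{model}: {tokens['input']:,} input / {tokens['output']:,} output tokens")
```

With the inputs above, 30,000 monthly requests split 70/30 yield 10.5M input and 4.2M output tokens for gpt-4o, and 4.5M input and 1.8M output tokens for gpt-4o-mini.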

Troubleshooting High Costs

Common causes of unexpected costs:

  1. Runaway loops: an agent keeps calling the LLM without a termination condition
  2. Large context windows: large documents are accidentally included in the prompt
  3. High-cost models: GPT-4o is used where GPT-4o-mini would work
  4. No caching: identical requests are repeated

The Opswald dashboard highlights these patterns automatically.
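
Runaway loops, the first cause listed above, can also be bounded on the client side. The following is a generic sketch that is not tied to any Opswald API: `run_agent`, `StepLimitExceeded`, and the `step` callback are hypothetical names, and the callback stands in for whatever function makes a single LLM call in your agent.

```python
from typing import Callable, Optional


class StepLimitExceeded(RuntimeError):
    """Raised when the agent fails to finish within the allowed steps."""


def run_agent(step: Callable[[int], Optional[str]], max_steps: int = 10) -> str:
    """Call `step` until it returns a final answer or the step cap is reached."""
    for index in range(max_steps):
        result = step(index)  # one LLM call per iteration in a real agent
        if result is not None:
            return result
    raise StepLimitExceeded(f"no final answer after {max_steps} steps")


# Stand-in step function: finishes on the third call.
def fake_step(index: int) -> Optional[str]:
    return "done" if index == 2 else None


print(run_agent(fake_step))  # prints done
```

A hard cap like this complements session budgets: the budget limits spend, while the step limit stops the loop before the budget is reached.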