Cost & Token Tracking
Monitor your LLM spending in real time with automatic token counting and cost calculation across all providers.
How It Works
Opswald tracks costs server-side during ingestion. The proxy and SDKs capture raw token counts; the API backend calculates costs using a centralized pricing table.
- Input/output tokens captured at the proxy and SDK level
- Cost calculation performed server-side during span ingestion (not at the edge)
- Provider-specific metrics (OpenAI vs Anthropic vs Google)
- Per-session aggregation for conversation-level costs
- Fractional cent precision — costs stored as NUMERIC(18,6), e.g. 0.0375 cents
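As a rough illustration of per-session aggregation with fractional-cent precision, the sketch below sums span costs by session using Python's `Decimal` and keeps six decimal places to match the scale of NUMERIC(18,6). The span records and the `aggregate_by_session` helper are hypothetical; the field names simply mirror those used elsewhere on this page.

```python
from collections import defaultdict
from decimal import Decimal

# Hypothetical span records for illustration only.
spans = [
    {"sessionId": "chat-user-123", "inputTokens": 500, "outputTokens": 300, "costCents": Decimal("0.0375")},
    {"sessionId": "chat-user-123", "inputTokens": 750, "outputTokens": 550, "costCents": Decimal("0.0125")},
    {"sessionId": "chat-user-456", "inputTokens": 100, "outputTokens": 50, "costCents": Decimal("0.004")},
]

SIX_PLACES = Decimal("0.000001")  # same scale as NUMERIC(18,6)


def aggregate_by_session(spans):
    """Sum tokens and cost per session, keeping six decimal places."""
    totals = defaultdict(lambda: {"input": 0, "output": 0, "costCents": Decimal("0")})
    for span in spans:
        entry = totals[span["sessionId"]]
        entry["input"] += span["inputTokens"]
        entry["output"] += span["outputTokens"]
        entry["costCents"] += span["costCents"]
    return {
        session_id: {**entry, "costCents": entry["costCents"].quantize(SIX_PLACES)}
        for session_id, entry in totals.items()
    }


totals = aggregate_by_session(spans)
print(totals["chat-user-123"])
```

Using `Decimal` rather than `float` avoids binary rounding drift when many sub-cent values are added together.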
Real-Time Monitoring
View costs as they accumulate:
# Check current session costs
curl -H "X-Opswald-Key: your-token" \
  https://proxy.opswald.com/v1/sessions/current/cost
{
  "sessionId": "chat-user-123",
  "totalCost": 0.0245,
  "totalTokens": {
    "input": 1250,
    "output": 850
  },
  "breakdown": {
    "gpt-4o": {"cost": 0.0180, "requests": 3},
    "claude-3-sonnet": {"cost": 0.0065, "requests": 1}
  }
}
Cost Enrichment
Cost is calculated on the backend during span ingestion, not at the proxy. The proxy sends costCents: 0 and the API backend enriches each span with the calculated cost based on the model and token counts.
Recalculation trigger: costCents === 0 AND (inputTokens > 0 OR outputTokens > 0)
Spans from older SDK versions that already include a non-zero costCents are accepted as-is (backward compatible).
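The enrichment rule above can be sketched as follows. This is a minimal illustration, not the backend's actual code: the `EXAMPLE_PRICES` table, its per-million-token USD unit, and the `enrich` helper are all assumptions made for the example.

```python
from decimal import Decimal

# Hypothetical prices in USD per million tokens. The real values live in the
# server-side pricing table and are not reproduced here.
EXAMPLE_PRICES = {
    "example-model": {"input": Decimal("2.50"), "output": Decimal("10.00")},
}


def needs_enrichment(span: dict) -> bool:
    """Mirror the documented trigger: zero cost but non-zero token usage."""
    return span["costCents"] == 0 and (span["inputTokens"] > 0 or span["outputTokens"] > 0)


def enrich(span: dict) -> dict:
    """Return a copy of the span with costCents filled in when the trigger fires."""
    if not needs_enrichment(span):
        return dict(span)  # non-zero costs from older SDKs pass through unchanged
    price = EXAMPLE_PRICES.get(span["model"])
    if price is None:
        return dict(span)  # unknown model: cost stays at 0
    usd = (
        span["inputTokens"] * price["input"] + span["outputTokens"] * price["output"]
    ) / Decimal(1_000_000)
    return {**span, "costCents": (usd * 100).quantize(Decimal("0.000001"))}


proxy_span = {"model": "example-model", "inputTokens": 1000, "outputTokens": 500, "costCents": 0}
legacy_span = {"model": "example-model", "inputTokens": 1000, "outputTokens": 500, "costCents": Decimal("1.2")}

print(enrich(proxy_span)["costCents"])
print(enrich(legacy_span)["costCents"])
```

The first span is recalculated because it arrives with `costCents: 0` and non-zero tokens; the second keeps the value it was sent with.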
Budget Controls
Set spending limits to prevent runaway costs:
Per-Session Budgets
import openai
client = openai.OpenAI(
    base_url="https://proxy.opswald.com/openai",
    default_headers={
        "X-Opswald-Key": "your-token",
        "X-Opswald-Session-Budget": "10.00",  # $10 max per session
        "X-Opswald-Session": "user-conversation-456",
    },
)

# Requests will be blocked if the session exceeds $10
response = client.chat.completions.create(...)
Global Budgets
# Daily budget limit
client = openai.OpenAI(
    base_url="https://proxy.opswald.com/openai",
    default_headers={
        "X-Opswald-Key": "your-token",
        "X-Opswald-Daily-Budget": "100.00",  # $100/day max
    },
)
Budget Exceeded Response
When a budget is exceeded, the proxy returns:
{
  "error": {
    "type": "budget_exceeded",
    "message": "Session budget of $10.00 exceeded. Current: $10.23",
    "budget_type": "session",
    "limit": 10.00,
    "current": 10.23
  }
}
Pricing Table
The pricing table lives in apps/api/src/services/pricing.ts and is updated via code PRs. It supports prefix-matching for version-stamped model names (e.g. claude-3-5-sonnet-20241022 matches claude-3-5-sonnet).
Supported models include:
- OpenAI: gpt-4o, gpt-4o-mini, gpt-4-turbo, gpt-4, gpt-3.5-turbo, o1, o1-mini, o3-mini, embeddings
- Anthropic: claude-opus-4, claude-sonnet-4, claude-3-5-sonnet, claude-3-5-haiku, claude-3-opus, claude-3-sonnet, claude-3-haiku
Unknown models receive costCents: 0 with a debug log warning — no error is raised.
Note: Cache tokens (cacheCreationInputTokens, cacheReadInputTokens) are not priced in this iteration.
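To illustrate how prefix matching might resolve version-stamped model names, here is a small sketch. It is not the code in pricing.ts: the `KNOWN_MODELS` list is a subset of the names listed above, and picking the longest matching prefix is an assumption made so that a name such as `gpt-4o-mini-2024-07-18` resolves to `gpt-4o-mini` rather than `gpt-4o`.

```python
import logging
from typing import Optional

logger = logging.getLogger("pricing")

# Illustrative subset of the model names listed above.
KNOWN_MODELS = ["gpt-4o", "gpt-4o-mini", "claude-3-5-sonnet", "claude-3-sonnet"]


def resolve_model(name: str) -> Optional[str]:
    """Return the pricing key for a model name, or None if nothing matches."""
    if name in KNOWN_MODELS:
        return name
    # Assumption: prefer the longest prefix so more specific entries win.
    candidates = [key for key in KNOWN_MODELS if name.startswith(key)]
    if not candidates:
        # Unknown models are logged at debug level and raise no error.
        logger.debug("no pricing entry for model %r; cost left at 0", name)
        return None
    return max(candidates, key=len)


print(resolve_model("claude-3-5-sonnet-20241022"))
print(resolve_model("gpt-4o-mini-2024-07-18"))
print(resolve_model("some-unlisted-model"))
```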
Cost Analytics
Dashboard Views
Access detailed cost analytics at app.opswald.com:
- Daily spending trends
- Cost per model breakdown
- Most expensive sessions
- Token efficiency metrics
API Analytics
# Get cost analytics (the URL is quoted so the shell does not treat "?" as a glob character)
curl -H "Authorization: Bearer your-api-key" \
  "https://api.opswald.com/v1/analytics/cost?period=7d"
{
  "period": "7d",
  "totalCost": 145.32,
  "totalTokens": 2850000,
  "averageCostPerRequest": 0.18,
  "topModels": [
    {"model": "gpt-4o", "cost": 89.21, "share": 0.614},
    {"model": "claude-3-sonnet", "cost": 56.11, "share": 0.386}
  ],
  "daily": [
    {"date": "2026-03-10", "cost": 22.14, "requests": 145},
    {"date": "2026-03-11", "cost": 19.88, "requests": 132}
  ]
}
Cost Optimization
Model Selection
Choose cost-effective models for different use cases:
# Route by cost-effectiveness
client = openai.OpenAI(
    base_url="https://proxy.opswald.com/openai",
    default_headers={
        "X-Opswald-Key": "your-token",
        "X-Opswald-Route-By": "cost",  # Prefer cheaper models
        "X-Opswald-Quality-Threshold": "0.8",  # But maintain 80% quality
    },
)

# Simple tasks → GPT-4o mini
# Complex tasks → GPT-4o
# The proxy automatically routes based on request complexity
Caching
Reduce costs with intelligent caching:
client = openai.OpenAI(
    base_url="https://proxy.opswald.com/openai",
    default_headers={
        "X-Opswald-Key": "your-token",
        "X-Opswald-Cache": "aggressive",  # Cache similar requests
        "X-Opswald-Cache-TTL": "3600",  # 1 hour cache
    },
)

# Identical or similar requests return cached responses
# Significant cost savings for repeated patterns
Export & Reporting
CSV Export
# Export cost data for accounting
curl -H "Authorization: Bearer your-api-key" \
  "https://api.opswald.com/v1/export/costs?format=csv&period=1m" \
  > costs-march-2026.csv
Usage Reports
from opswald import OpsClient
client = OpsClient("your-api-key")
# Generate monthly report
report = client.costs.monthly_report(
    month="2026-03",
    group_by=["model", "session", "user"],
)

print(f"Total: ${report.total}")
print(f"Most expensive session: {report.top_session}")
Integration Examples
Cost Alerts
import openai
from opswald import CostMonitor

# Set up cost monitoring
monitor = CostMonitor("your-api-key")

@monitor.on_budget_warning(threshold=0.8)
def warn_budget(session_id, current, limit):
    print(f"Session {session_id}: ${current:.2f} of ${limit:.2f} budget used")

@monitor.on_budget_exceeded
def block_session(session_id, current, limit):
    # Implement custom blocking logic.
    # `sessions` is not defined in this snippet; it stands in for your
    # application's own session manager.
    sessions.pause(session_id)
Department Budgets
# Track costs per department
departments = {
    "engineering": {"budget": 500, "sessions": []},
    "marketing": {"budget": 200, "sessions": []},
    "support": {"budget": 300, "sessions": []},
}

# Keep one client per department so each loop iteration does not
# overwrite the previous client.
clients = {}
for dept, config in departments.items():
    clients[dept] = openai.OpenAI(
        base_url="https://proxy.opswald.com/openai",
        default_headers={
            "X-Opswald-Key": "your-token",
            "X-Opswald-Department": dept,
            "X-Opswald-Monthly-Budget": str(config["budget"]),
        },
    )
Subscription Pricing
For Claude Pro or ChatGPT Plus subscriptions:
# Track effective cost per token for subscriptions
client = openai.OpenAI(
    base_url="https://proxy.opswald.com/openai",
    default_headers={
        "X-Opswald-Key": "your-token",
        "X-Opswald-Subscription": "chatgpt-plus",  # $20/month flat rate
        "X-Opswald-Subscription-Limit": "40",  # 40 requests/3 hours
    },
)

# Dashboard shows:
# - Effective cost per token based on subscription fee
# - Usage against rate limits
# - Savings vs pay-per-token pricing
Best Practices
Cost Management
- Set conservative budgets initially and increase based on usage
- Monitor daily for unexpected spikes
- Use caching for repeated patterns
- Route strategically between models based on task complexity
- Track per-user costs for chargeback or optimization
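As one way to act on the "monitor daily" advice, the sketch below flags days whose cost is well above the average of the days before them. It assumes input shaped like the `daily` array in the analytics response shown earlier; the `find_spikes` helper, the 2x threshold, and the third data point are made up for illustration.

```python
from statistics import mean


def find_spikes(daily, factor=2.0):
    """Return dates whose cost exceeds `factor` times the mean of all preceding days."""
    spikes = []
    for i in range(1, len(daily)):
        baseline = mean(day["cost"] for day in daily[:i])
        if baseline > 0 and daily[i]["cost"] > factor * baseline:
            spikes.append(daily[i]["date"])
    return spikes


daily = [
    {"date": "2026-03-10", "cost": 22.14, "requests": 145},
    {"date": "2026-03-11", "cost": 19.88, "requests": 132},
    {"date": "2026-03-12", "cost": 61.50, "requests": 140},  # made-up spike
]

print(find_spikes(daily))  # ['2026-03-12']
```

A fixed multiple of the running mean is deliberately simple; a real deployment would likely tune the threshold or use a rolling window.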
Budget Planning
# Estimate costs before deployment
from opswald import CostEstimator

estimator = CostEstimator()

# Estimate based on expected usage
monthly_cost = estimator.estimate(
    requests_per_day=1000,
    avg_input_tokens=500,
    avg_output_tokens=200,
    models={"gpt-4o": 0.7, "gpt-4o-mini": 0.3},  # Distribution
)

print(f"Estimated monthly cost: ${monthly_cost:.2f}")
Troubleshooting High Costs
Common causes of unexpected costs:
- Runaway loops: an agent keeps calling the LLM without a termination condition
- Large context windows: large documents are accidentally included in the prompt
- High-cost models: GPT-4o is used where GPT-4o mini would work
- No caching: identical requests are repeated
The Opswald dashboard highlights these patterns automatically.
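Runaway loops can also be stopped on the client side before they reach a budget limit. The sketch below is a generic guard that is independent of Opswald: `LoopGuard` and `LoopBudgetExceeded` are hypothetical names, and the token counts passed to `record` are stand-ins for whatever usage figures your LLM client reports.

```python
class LoopBudgetExceeded(RuntimeError):
    """Raised when an agent loop exceeds its call or token allowance."""


class LoopGuard:
    def __init__(self, max_calls: int, max_tokens: int):
        self.max_calls = max_calls
        self.max_tokens = max_tokens
        self.calls = 0
        self.tokens = 0

    def record(self, input_tokens: int, output_tokens: int) -> None:
        """Count one LLM call and raise once either limit is passed."""
        self.calls += 1
        self.tokens += input_tokens + output_tokens
        if self.calls > self.max_calls:
            raise LoopBudgetExceeded(f"{self.calls} calls exceeds limit of {self.max_calls}")
        if self.tokens > self.max_tokens:
            raise LoopBudgetExceeded(f"{self.tokens} tokens exceeds limit of {self.max_tokens}")


guard = LoopGuard(max_calls=3, max_tokens=10_000)
stopped_at = None
try:
    for step in range(100):  # stand-in for an agent loop with no exit condition
        guard.record(input_tokens=500, output_tokens=200)
except LoopBudgetExceeded as exc:
    stopped_at = guard.calls
    print(f"stopped: {exc}")
```

Here the loop is cut off on the fourth call, long before the hundredth iteration.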