AWS AI/ML Token Burn Plan

3-month runway after $100K AWS Activate credit expires. Budget: $45,000 for AI/ML services alone (months 13-15).

3-Month Post-Credit AI/ML Spend

Period	AI/ML Spend
Month 13 (transition)	$11,200
Month 14 (optimise)	$14,500
Month 15 (scale)	$19,300
Quarter Total	$45,000
Avg Monthly	$15,000
Year-2 Steady State	$10-14K/mo

Why this is aggressive: AWS gave us $100K in credits to prove production load. The minute the credits end, AWS expects to see $10-20K/mo in real PAYG spend. Dropping to $2K/mo after the credits run out looks like we were just freeloading. The plan below assumes we keep building and accept the burn as customer acquisition cost.

3-Month Burn Trajectory

MONTH 13 (Credit +0)

$11,200

Switch to PAYG on day 1. Keep Bedrock Provisioned Throughput (commit month-to-month). Activate 1-yr Savings Plans to lock rates. Aggressive cache warmup.

MONTH 14 (Crawl)

$14,500

Customers signing up. Token volume spikes 40%. Add 2nd Bedrock model unit. SageMaker endpoints scaled up. No Prometheus data yet to optimise - just absorb.

MONTH 15 (Walk)

$19,300

First renewal cohort + new customers. Production traffic at 80% of credit-era peak. Add Kinesis + SageMaker Batch for scale. Lock in 1-yr RIs.

Bedrock Token Budget Breakdown

Month 15 Token Allocation: ~3.8B input + 1.2B output tokens

Claude Sonnet 35%

Claude Haiku 30%

Nova/Llama 15%

Embeddings 10%

Batch API 10%

Mix designed to push hot-path queries through cheap models first, escalate to Sonnet only when confidence is low.
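The cheap-first escalation described above can be sketched as follows. This is a minimal illustration, not the platform's router: `ModelReply`, `route`, the 0.8 threshold, and the stub models are all hypothetical names and values, and each "model" is just a callable standing in for a Bedrock invocation. How confidence is scored is left to the caller.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class ModelReply:
    text: str
    confidence: float  # 0.0-1.0; the scoring method is an assumption left to the caller


# Any callable from prompt to ModelReply (hypothetical stand-in for a Bedrock call).
Model = Callable[[str], ModelReply]


def route(prompt: str,
          tiers: Sequence[tuple[str, Model]],
          threshold: float = 0.8) -> tuple[str, ModelReply]:
    """Try tiers cheapest-first and stop at the first reply at or above threshold.

    If no tier clears the threshold, the last (most capable) tier's reply is returned.
    """
    if not tiers:
        raise ValueError("at least one tier is required")
    for name, model in tiers:
        reply = model(prompt)
        if reply.confidence >= threshold:
            break
    return name, reply


# Stub models for illustration only: the cheap one is unsure about long prompts.
def cheap_stub(prompt: str) -> ModelReply:
    return ModelReply("quick answer", 0.9 if len(prompt) < 40 else 0.5)


def strong_stub(prompt: str) -> ModelReply:
    return ModelReply("careful answer", 0.95)


TIERS = [("haiku", cheap_stub), ("sonnet", strong_stub)]

if __name__ == "__main__":
    print(route("classify this alert", TIERS)[0])
    print(route("x" * 60, TIERS)[0])
```

Ordering the tiers as a list keeps the mix adjustable: adding a Nova or Llama tier ahead of Haiku is a one-line change.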

Service-by-Service Plan

1. Amazon Bedrock (Foundation Models)

Service / Model	Configuration	Monthly Cost	Purpose
Bedrock Claude Sonnet 4.5 (on-demand)	800M input + 200M output tokens/mo	~$3,000	Production risk analysis, report generation, complex reasoning
Bedrock Claude Sonnet 4.5 (1-yr Provisioned)	1 model unit, no upfront, 1-yr commit	~$1,950	Baseline customer-facing workload, predictable latency
Bedrock Claude Haiku 4.5	1.2B input + 400M output tokens/mo	~$2,100	High-volume: alert categorisation, claim summaries, notifications
Bedrock Amazon Nova Pro / Lite	600M tokens/mo (low-cost alternative)	~$450	Bulk classification, embedding auxiliary, multi-language
Bedrock Llama 3.1 70B / Mistral Large	300M tokens/mo (cost-sensitive workloads)	~$300	Open-weight tasks, batch processing, fine-tuned variants
Bedrock Titan Text Embeddings v2	200M tokens/mo (RAG + semantic search)	~$50	Vector embeddings for evidence retrieval (replaces sentence-transformers)
Bedrock Batch API	300M tokens/mo (50% off)	~$200	Nightly portfolio risk recompute, bulk evidence processing
Bedrock Custom Model Import (2 models)	Custom inference, 50M tokens/mo each	~$400	Industry-specific fine-tuned: commercial property, fleet risk
Bedrock Guardrails	20M policy units/mo (PII, hallucination, topic filters)	~$200	Compliance + insurance regulatory content filtering

Bedrock Subtotal: ~$8,650/mo — this is the single largest AI line item. Intelligent prompt routing (Haiku -> Sonnet escalation) saves 30-40% versus using Sonnet for everything.
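To see what escalation rate the 30-40% saving implies, here is a rough worked calculation. It assumes every query pays for the cheap model and escalated queries additionally pay for Sonnet, with the same token count on both. The per-MTok prices (`HAIKU`, `SONNET`) are illustrative assumptions, not figures from this plan; substitute current Bedrock pricing.

```python
def routing_savings(cheap_price: float, strong_price: float,
                    escalation_rate: float) -> float:
    """Fractional saving versus sending every query to the strong model.

    Every query is billed at the cheap price; the escalated share is
    billed again at the strong price.
    """
    if not 0.0 <= escalation_rate <= 1.0:
        raise ValueError("escalation_rate must be between 0 and 1")
    blended = cheap_price + escalation_rate * strong_price
    return 1.0 - blended / strong_price


# Assumed input prices in $/MTok (hypothetical values for illustration).
HAIKU, SONNET = 1.00, 3.00

if __name__ == "__main__":
    for rate in (0.27, 0.37):
        saved = routing_savings(HAIKU, SONNET, rate)
        print(f"{rate:.0%} escalated: {saved:.1%} saved")
```

Under these assumed prices the saving is 2/3 minus the escalation rate, so a 30-40% saving corresponds to roughly 27-37% of queries escalating. If escalation climbs past about two thirds, routing costs more than calling Sonnet directly.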

2. Amazon SageMaker (Custom ML)

Service	Configuration	Monthly Cost	Purpose
SageMaker Real-Time Endpoints	4x ml.m5.xlarge with auto-scaling (1-8 instances)	~$1,600	Custom claims classifier, anomaly detector, risk scoring ensemble
SageMaker Serverless Inference	10M invocations/mo, 4GB mem, 5s max latency	~$400	Spiky workloads: occasional inference, no idle cost
SageMaker Batch Transform	ml.m5.4xlarge, 300 hours/mo	~$900	Nightly portfolio risk recompute, model scoring at scale, bulk reports
SageMaker Training Jobs (GPU)	ml.p3.2xlarge (V100), 150 hours/mo	~$1,950	Weekly model retraining, drift adaptation, A/B challenger training
SageMaker Processing Jobs	ml.m5.2xlarge, 200 hours/mo	~$300	Feature engineering, data prep, label generation
SageMaker Studio	5 users, ml.m5.4xlarge notebooks	~$200	Data scientist workspace, experiment tracking
SageMaker Feature Store	10M records, 100K reads/mo	~$150	Online + offline feature store for risk models
SageMaker Model Registry + Pipelines	50 model versions, 100 pipeline runs/mo	~$100	MLOps: model versioning, A/B testing, automated retraining

3. Other AI/ML Services

Service	Configuration	Monthly Cost	Purpose
Comprehend (NER + Sentiment)	10M characters/mo (claims + incident text)	~$30	Extract entities, sentiment from claim descriptions
Comprehend Medical	8M characters/mo (medical claims)	~$400	PHI extraction, medical entity recognition for injury claims
Rekognition Image Analysis	3M images/mo (property damage, incident photos)	~$300	Visual damage triage, label detection, PPE compliance
Rekognition Custom Labels	5M inference units/mo (custom damage classes)	~$200	Industry-specific visual damage classification
Forecast Time-Series Forecasting	20M predictions/mo (telemetry forecasting)	~$180	Sensor drift prediction, anomaly forecasting, capacity planning
Textract Document Extraction	50K pages/mo (insurance docs, certificates)	~$75	OCR + structured extraction from policy documents, certificates of insurance
Translate Real-Time Translation	5M characters/mo (multi-region support)	~$75	Multi-language evidence, customer comms in EU/Asia markets
Polly Text-to-Speech	20M characters/mo (alert voice calls)	~$60	Voice alert synthesis for emergency callouts
Lex Conversational AI Bot	5K text requests + 2K speech requests/mo	~$50	Customer support chatbot for risk queries
Kendra Intelligent Search	Developer Edition, 5M queries/mo	~$810	Enterprise search across evidence, reports, policies (replaces OpenSearch for RAG)

4. AI-Support Storage & Data

Service	Configuration	Monthly Cost	Purpose
S3 Standard (AI training data)	3TB active training + model artefacts	~$70	Training datasets, model checkpoints, evaluation results
S3 Intelligent-Tiering	10TB (prompt logs, completions, eval sets)	~$260	Long-term LLM conversation logs for fine-tuning + audit
OpenSearch Serverless (vector store)	100GB vectors, 10M queries/mo	~$700	Managed vector DB for RAG (replaces self-hosted pgvector)
Kinesis Data Streams (token events)	20 shards, prompt + completion event stream	~$440	Real-time token usage metering + anomaly detection

3-Month Totals

AI/ML Burn by Month

Line Item	Amount
Bedrock (models)	$8,650
SageMaker (custom ML)	$5,600
Comprehend / Rekognition / Forecast	$1,370
Kendra / Lex / Polly / Translate / Textract	$1,070
AI Storage (S3 + Vector + Kinesis)	$1,470
Monthly Subtotal (steady)	$18,160
Month 13 (transition dip)	$11,200
Quarter Total	$45,000
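The totals can be cross-checked against the line items in the service tables. The sketch below copies the monthly costs from those tables and compares each group's sum with the figure stated in this summary. The names (`LINE_ITEMS`, `STATED`, `reconcile`) are illustrative. On these numbers, every group matches except Comprehend / Rekognition / Forecast, whose rows sum to $1,110 against the $1,370 shown, so one of the two likely needs reconciling.

```python
# Monthly costs in USD, copied from the service-by-service tables.
LINE_ITEMS = {
    "Bedrock": [3000, 1950, 2100, 450, 300, 50, 200, 400, 200],
    "SageMaker": [1600, 400, 900, 1950, 300, 200, 150, 100],
    "Comprehend / Rekognition / Forecast": [30, 400, 300, 200, 180],
    "Kendra / Lex / Polly / Translate / Textract": [810, 50, 60, 75, 75],
    "AI Storage": [70, 260, 700, 440],
}

# Group totals as stated in the summary.
STATED = {
    "Bedrock": 8650,
    "SageMaker": 5600,
    "Comprehend / Rekognition / Forecast": 1370,
    "Kendra / Lex / Polly / Translate / Textract": 1070,
    "AI Storage": 1470,
}

# Planned spend per month of the post-credit quarter.
MONTHS = {13: 11200, 14: 14500, 15: 19300}


def reconcile(items: dict[str, list[int]],
              stated: dict[str, int]) -> dict[str, tuple[int, int, int]]:
    """Return {group: (computed, stated, stated - computed)}."""
    return {g: (sum(v), stated[g], stated[g] - sum(v)) for g, v in items.items()}


if __name__ == "__main__":
    for group, (computed, shown, diff) in reconcile(LINE_ITEMS, STATED).items():
        flag = "" if diff == 0 else f"  <- differs by ${diff:,}"
        print(f"{group}: computed ${computed:,}, stated ${shown:,}{flag}")
    print(f"Stated steady subtotal: ${sum(STATED.values()):,}")
    print(f"Quarter total: ${sum(MONTHS.values()):,}")
```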

Token Optimisation Playbook (Post-Credit)

Bedrock Intelligent Prompt Routing — route cheap models first, escalate to Sonnet only on low confidence (saves 30-40%)
Prompt Caching - cache system prompts + repeated context (cache reads bill at roughly 10% of the base input rate, e.g. $0.30/MTok against $3/MTok for Sonnet input)
Batch API for non-realtime workloads (50% off, 24hr SLA)
Model distillation — fine-tune Haiku on Sonnet outputs for 90% of common queries
1-year Provisioned Throughput for known baseline (committed discount vs on-demand)
Token quotas per tenant — slowapi-based limit on input tokens per minute (prevents runaway costs)
Response length capping — `max_tokens` discipline (currently 2048, drop to 1024 for Haiku)
Streaming responses with early stop — abort if confidence drops below threshold
RAG first, LLM second — check evidence store before invoking Bedrock (saves full inference cost on 60% of queries)
Guardrails for compliance only on PII queries, not all queries
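Two of the items above, "RAG first, LLM second" and `max_tokens` capping, combine naturally in one request path. The following is a sketch under stated assumptions: `answer`, `lookup_evidence`, and `invoke_model` are hypothetical names, the evidence store is any callable that returns a stored answer or `None`, and the model call is injected so no Bedrock API is assumed. The caps are the 2048 and 1024 values from the playbook.

```python
from typing import Callable, Optional

# Response caps from the playbook; the model keys are hypothetical labels.
MAX_TOKENS = {"sonnet": 2048, "haiku": 1024}


def answer(query: str,
           lookup_evidence: Callable[[str], Optional[str]],
           invoke_model: Callable[[str, str, int], str],
           model: str = "haiku") -> tuple[str, str]:
    """Return (source, text), consulting the evidence store before any model call."""
    stored = lookup_evidence(query)
    if stored is not None:
        # Served from the evidence store: no inference cost at all.
        return "evidence_store", stored
    # Unknown models fall back to the tightest cap rather than an unbounded reply.
    cap = MAX_TOKENS.get(model, min(MAX_TOKENS.values()))
    return model, invoke_model(model, query, cap)


if __name__ == "__main__":
    store = {"policy 42 expiry": "Expires 2026-01-31 (from certificate)"}

    def fake_invoke(model: str, query: str, max_tokens: int) -> str:
        # Stand-in for a real model call; echoes the cap it was given.
        return f"[{model}, max_tokens={max_tokens}] reply to: {query}"

    print(answer("policy 42 expiry", store.get, fake_invoke))
    print(answer("summarise claim 7", store.get, fake_invoke))
```

Keeping the cap lookup next to the model call means a tenant-specific override can later replace the `MAX_TOKENS` table without touching callers.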

Tenant Token Quota Strategy

Per the platform's rate limiting implementation, each tenant gets a tiered token budget. The default tiers (overridable per-tenant):

Tier	RPM	Tokens/min	Target Customer
Free	10	50K	Trial signups, sandbox tenants
Starter	60	500K	SMB insurers, < 100 assets
Pro	300	5M	Mid-market, 100-1K assets
Enterprise	Unlimited	Negotiated	Large insurers, > 1K assets, custom SLAs

This is enforced via a Redis-backed token bucket (key: `tquota:{tenant_id}:{model}:{window}`) and surfaced as HTTP 429 responses with `Retry-After` headers when a tenant exceeds its budget.
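A minimal sketch of the quota check follows, with loud caveats: it uses a fixed-window counter (simpler than a true token bucket), and `QuotaStore` is an in-memory stand-in for Redis so the example runs on its own. A real deployment would presumably use an atomic increment with a key expiry instead. `check_quota` and `QuotaStore` are hypothetical names; the key format and tier limits come from the text above. Enterprise is omitted because its limit is negotiated.

```python
import time
from typing import Optional

# Tokens-per-minute budgets from the tier table (Enterprise is negotiated).
TIER_TOKENS_PER_MIN = {"free": 50_000, "starter": 500_000, "pro": 5_000_000}
WINDOW_SECONDS = 60


class QuotaStore:
    """In-memory stand-in for Redis counters (assumption for this sketch)."""

    def __init__(self) -> None:
        self.counters: dict[str, int] = {}

    def incrby(self, key: str, amount: int) -> int:
        self.counters[key] = self.counters.get(key, 0) + amount
        return self.counters[key]


def check_quota(store: QuotaStore, tenant_id: str, model: str, tier: str,
                tokens: int, now: Optional[float] = None) -> tuple[int, dict[str, str]]:
    """Return (status, headers): 200 within budget, else 429 with Retry-After."""
    now = time.time() if now is None else now
    window = int(now // WINDOW_SECONDS)
    key = f"tquota:{tenant_id}:{model}:{window}"
    used = store.incrby(key, tokens)
    if used > TIER_TOKENS_PER_MIN[tier]:
        store.incrby(key, -tokens)  # rejected requests are not charged
        retry_after = WINDOW_SECONDS - int(now % WINDOW_SECONDS)
        return 429, {"Retry-After": str(retry_after)}
    return 200, {}


if __name__ == "__main__":
    store = QuotaStore()
    print(check_quota(store, "t1", "haiku", "free", 40_000, now=120.0))
    print(check_quota(store, "t1", "haiku", "free", 20_000, now=130.0))
    print(check_quota(store, "t1", "haiku", "free", 20_000, now=180.0))
```

Because the window index is part of the key, counters for past minutes are simply never read again; with Redis they would be given a short expiry so they clean themselves up.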

Why Burn $15K/mo After Credits?

Three reasons:

Stickiness = future EDP negotiation power. $180K/yr post-credit burn puts us in the top decile of InsurTech AWS customers. That earns a custom Enterprise Discount Program contract at 15-25% off list, saving $27-45K/yr going forward.
Customers expect production-grade AI. If we cut Sonnet access to save $2K/mo, our risk analysis drops in quality, customers churn, and the whole burn plan collapses. $15K/mo is the cost of being taken seriously.
ML training compounds value. Weekly retraining on SageMaker is what makes the platform smarter. Cut that and we lose the "predictive" differentiator that justifies the price tag.

Funding Source for Post-Credit Burn

Series Seed extension: $250-500K at 18-24mo runway. $45K/quarter on AWS is ~9% of capital at a $500K raise (18% at $250K). Defensible.
Revenue: At 50 paying customers @ $500/mo = $25K MRR. AI/ML is 60% of COGS = $15K/mo. Self-funding at month 14.
AWS EDP: 15-25% discount after proving $180K/yr run rate — saves $27-45K/yr going forward.
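The funding figures above reduce to a few lines of arithmetic, sketched here so the ranges can be re-run with different inputs. All values are taken from this section; the function names are illustrative.

```python
QUARTER_AWS_SPEND = 45_000          # post-credit quarter, USD
ANNUAL_RUN_RATE = 15_000 * 12       # $15K/mo average, USD per year


def share_of_capital(raise_usd: int) -> float:
    """Fraction of a raise consumed by one $45K AWS quarter."""
    return QUARTER_AWS_SPEND / raise_usd


def edp_saving(discount_pct: int) -> int:
    """Annual saving in USD for a whole-percent EDP discount on the run rate."""
    return ANNUAL_RUN_RATE * discount_pct // 100


def mrr(customers: int, price_per_month: int) -> int:
    """Monthly recurring revenue in USD."""
    return customers * price_per_month


if __name__ == "__main__":
    for raised in (250_000, 500_000):
        print(f"${raised:,} raise: {share_of_capital(raised):.0%} of capital")
    for pct in (15, 25):
        print(f"{pct}% EDP discount: ${edp_saving(pct):,}/yr saved")
    revenue = mrr(50, 500)
    print(f"MRR ${revenue:,}, less $15,000 AI/ML spend: ${revenue - 15_000:,}")
```

The capital share depends heavily on the size of the raise, which is why it is worth computing at both ends of the $250-500K range.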

Bottom line: We need to show AWS that we can sustain $10-20K/mo in real spend. The $100K credit was the down payment on proving production load. The next $45K is us putting our own capital where our roadmap is. If we can't afford that, we shouldn't be building this.