This plan covers the 3-month runway after the $100K AWS Activate credit expires (months 13-15). Budget: $45,000 for AI/ML services alone.
The mix is designed to route hot-path queries through cheaper models first and escalate to Sonnet only when confidence is low.
| Service / Model | Configuration | Monthly Cost | Purpose |
|---|---|---|---|
| Bedrock Claude Sonnet 4.5 (on-demand) | 800M input + 200M output tokens/mo | ~$3,000 | Production risk analysis, report generation, complex reasoning |
| Bedrock Claude Sonnet 4.5 (1-yr Provisioned) | 1 model unit, no upfront, 1-yr commit | ~$1,950 | Baseline customer-facing workload, predictable latency |
| Bedrock Claude Haiku 4.5 | 1.2B input + 400M output tokens/mo | ~$2,100 | High-volume: alert categorisation, claim summaries, notifications |
| Bedrock Amazon Nova Pro / Lite | 600M tokens/mo (open-weight alternative) | ~$450 | Bulk classification, embedding auxiliary, multi-language |
| Bedrock Llama 3.1 70B / Mistral Large | 300M tokens/mo (cost-sensitive workloads) | ~$300 | Open-weight tasks, batch processing, fine-tuned variants |
| Bedrock Titan Text Embeddings v2 | 200M tokens/mo (RAG + semantic search) | ~$50 | Vector embeddings for evidence retrieval (replaces sentence-transformers) |
| Bedrock Batch API | 300M tokens/mo (50% off) | ~$200 | Nightly portfolio risk recompute, bulk evidence processing |
| Bedrock Custom Model Import (2 models) | Custom inference, 50M tokens/mo each | ~$400 | Industry-specific fine-tuned: commercial property, fleet risk |
| Bedrock Guardrails | 20M policy units/mo (PII, hallucination, topic filters) | ~$200 | Compliance + insurance regulatory content filtering |
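
The cheap-first escalation described above can be sketched as a simple cascade. This is a minimal illustration, not the platform's router: the `Tier` type, the `route` function, the 0.8 threshold, and the stub model functions are all hypothetical, and the source does not say how confidence is measured.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple

# A model call returns (answer, confidence in [0, 1]).
# How confidence is derived is an assumption; the source does not specify it.
ModelFn = Callable[[str], Tuple[str, float]]


@dataclass
class Tier:
    name: str
    call: ModelFn


def route(prompt: str, tiers: Sequence[Tier], threshold: float = 0.8) -> Tuple[str, str]:
    """Try tiers cheapest-first and return (tier_name, answer).

    The first tier whose confidence meets the threshold wins; the last
    (most capable) tier is always accepted as the fallback.
    """
    if not tiers:
        raise ValueError("at least one tier is required")
    for index, tier in enumerate(tiers):
        answer, confidence = tier.call(prompt)
        is_last = index == len(tiers) - 1
        if confidence >= threshold or is_last:
            return tier.name, answer
    raise AssertionError("unreachable")


# Stub models standing in for real Bedrock calls (hypothetical behaviour).
def fake_haiku(prompt: str) -> Tuple[str, float]:
    confidence = 0.95 if len(prompt) < 40 else 0.4  # short prompts count as "easy"
    return f"haiku:{prompt[:10]}", confidence


def fake_sonnet(prompt: str) -> Tuple[str, float]:
    return f"sonnet:{prompt[:10]}", 0.9


TIERS = [Tier("haiku", fake_haiku), Tier("sonnet", fake_sonnet)]
```

Calling `route("Categorise alert 17", TIERS)` stays on the cheap tier, while a longer prompt falls through to the Sonnet stub because the Haiku stub reports low confidence.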

| Service | Configuration | Monthly Cost | Purpose |
|---|---|---|---|
| SageMaker Real-Time Endpoints | 4x ml.m5.xlarge with auto-scaling (1-8 instances) | ~$1,600 | Custom claims classifier, anomaly detector, risk scoring ensemble |
| SageMaker Serverless Inference | 10M invocations/mo, 4GB mem, 5s max latency | ~$400 | Spiky workloads: occasional inference, no idle cost |
| SageMaker Batch Transform | ml.m5.4xlarge, 300 hours/mo | ~$900 | Nightly portfolio risk recompute, model scoring at scale, bulk reports |
| SageMaker Training Jobs (GPU) | ml.p3.2xlarge (V100), 150 hours/mo | ~$1,950 | Weekly model retraining, drift adaptation, A/B challenger training |
| SageMaker Processing Jobs | ml.m5.2xlarge, 200 hours/mo | ~$300 | Feature engineering, data prep, label generation |
| SageMaker Studio | 5 users, ml.m5.4xlarge notebooks | ~$200 | Data scientist workspace, experiment tracking |
| SageMaker Feature Store | 10M records, 100K reads/mo | ~$150 | Online + offline feature store for risk models |
| SageMaker Model Registry + Pipelines | 50 model versions, 100 pipeline runs/mo | ~$100 | MLOps: model versioning, A/B testing, automated retraining |

| Service | Configuration | Monthly Cost | Purpose |
|---|---|---|---|
| Comprehend (NER + Sentiment) | 10M characters/mo (claims + incident text) | ~$30 | Extract entities, sentiment from claim descriptions |
| Comprehend Medical | 8M characters/mo (medical claims) | ~$400 | PHI extraction, medical entity recognition for injury claims |
| Rekognition Image Analysis | 3M images/mo (property damage, incident photos) | ~$300 | Visual damage triage, label detection, PPE compliance |
| Rekognition Custom Labels | 5M inference units/mo (custom damage classes) | ~$200 | Industry-specific visual damage classification |
| Forecast Time-Series Forecasting | 20M predictions/mo (telemetry forecasting) | ~$180 | Sensor drift prediction, anomaly forecasting, capacity planning |
| Textract Document Extraction | 50K pages/mo (insurance docs, certificates) | ~$75 | OCR + structured extraction from policy documents, certificates of insurance |
| Translate Real-Time Translation | 5M characters/mo (multi-region support) | ~$75 | Multi-language evidence, customer comms in EU/Asia markets |
| Polly Text-to-Speech | 20M characters/mo (alert voice calls) | ~$60 | Voice alert synthesis for emergency callouts |
| Lex Conversational AI Bot | 5K text requests + 2K speech requests/mo | ~$50 | Customer support chatbot for risk queries |
| Kendra Intelligent Search | Developer Edition, 5M queries/mo | ~$810 | Enterprise search across evidence, reports, policies (replaces OpenSearch for RAG) |

| Service | Configuration | Monthly Cost | Purpose |
|---|---|---|---|
| S3 Standard (AI training data) | 3TB active training + model artefacts | ~$70 | Training datasets, model checkpoints, evaluation results |
| S3 Intelligent-Tiering | 10TB (prompt logs, completions, eval sets) | ~$260 | Long-term LLM conversation logs for fine-tuning + audit |
| S3 Vector Store (OpenSearch Serverless) | 100GB vectors, 10M queries/mo | ~$700 | Managed vector DB for RAG (replaces self-hosted pgvector) |
| Kinesis Data Streams (token events) | 20 shards, prompt + completion event stream | ~$440 | Real-time token usage metering + anomaly detection |
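
As a quick sanity check, the approximate monthly figures in the tables above can be tallied against the stated $45,000 three-month budget. This is a rough sketch: the group labels are hypothetical (the tables carry no headings), and the amounts are the "~" estimates exactly as listed, ignoring any discounts or items outside this section.

```python
# Approximate monthly costs in USD, copied row by row from the tables above.
# Group names are illustrative labels, not headings from the source.
monthly_costs = {
    "Bedrock": [3000, 1950, 2100, 450, 300, 50, 200, 400, 200],
    "SageMaker": [1600, 400, 900, 1950, 300, 200, 150, 100],
    "AI services": [30, 400, 300, 200, 180, 75, 75, 60, 50, 810],
    "Storage and streaming": [70, 260, 700, 440],
}

BUDGET_USD = 45_000  # stated budget for months 13-15
MONTHS = 3

subtotals = {group: sum(costs) for group, costs in monthly_costs.items()}
monthly_total = sum(subtotals.values())
runway_total = monthly_total * MONTHS
headroom = BUDGET_USD - runway_total  # negative means over budget

if __name__ == "__main__":
    for group, subtotal in subtotals.items():
        print(f"{group:<22} ${subtotal:>7,}/mo")
    print(f"{'Total':<22} ${monthly_total:>7,}/mo")
    print(f"Three-month total: ${runway_total:,} (headroom: ${headroom:,})")
```

With the figures as listed, the line items sum to about $17,900 per month, or about $53,700 over three months, so the estimates would need trimming or offsetting to fit within $45,000.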
Per the platform's rate-limiting implementation, each tenant gets a tiered token budget. The default tiers (overridable per tenant) are:
| Tier | RPM | Tokens/min | Target Customer |
|---|---|---|---|
| Free | 10 | 50K | Trial signups, sandbox tenants |
| Starter | 60 | 500K | SMB insurers, < 100 assets |
| Pro | 300 | 5M | Mid-market, 100-1K assets |
| Enterprise | Unlimited | Negotiated | Large insurers, > 1K assets, custom SLAs |
These limits are enforced with a Redis-backed token bucket (key: `tquota:{tenant_id}:{model}:{window}`). Requests that exceed the budget receive a 429 response with a `Retry-After` header.
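
The token-bucket check can be sketched as follows. To stay self-contained, this illustration swaps Redis for an in-memory dict and covers only the tokens-per-minute limit (not RPM). The `TokenQuota` class, the `check` method, the `60s` window label, and the treatment of Enterprise as unthrottled are assumptions for illustration, not the platform's actual code.

```python
import math
import time
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

# Default tokens/min per tier, from the table above. Enterprise is
# "Negotiated", so this sketch simply does not throttle it (assumption).
TIER_TOKENS_PER_MIN = {"free": 50_000, "starter": 500_000, "pro": 5_000_000}


@dataclass
class Bucket:
    capacity: float  # maximum tokens the bucket holds (one minute's budget)
    tokens: float    # tokens currently available
    updated: float   # clock reading at the last refill


class TokenQuota:
    """In-memory stand-in for the Redis-backed bucket store."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._buckets: Dict[str, Bucket] = {}
        self._clock = clock

    def check(self, tenant_id: str, model: str, tier: str, cost: int) -> Tuple[int, Dict[str, str]]:
        """Return (HTTP status, headers) for a request costing `cost` tokens."""
        if tier == "enterprise":
            return 200, {}
        capacity = TIER_TOKENS_PER_MIN[tier]
        key = f"tquota:{tenant_id}:{model}:60s"  # window label is hypothetical
        now = self._clock()
        bucket = self._buckets.setdefault(key, Bucket(capacity, capacity, now))
        # Refill continuously at capacity tokens per 60 seconds, capped at capacity.
        refill = (now - bucket.updated) * bucket.capacity / 60.0
        bucket.tokens = min(bucket.capacity, bucket.tokens + refill)
        bucket.updated = now
        if cost <= bucket.tokens:
            bucket.tokens -= cost
            return 200, {}
        # Seconds until enough tokens have accumulated for this request.
        wait = math.ceil((cost - bucket.tokens) * 60.0 / bucket.capacity)
        return 429, {"Retry-After": str(max(1, wait))}
```

A production version would need the refill-and-deduct step to be atomic across workers (for example, a server-side script in Redis) and would have to reject requests whose cost exceeds the bucket capacity outright, since they can never succeed.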