Last updated: 2026-04-11

Resilience

How SpendLil ensures your AI never goes down because of us.

SpendLil is built around one principle: your AI must never stop working because of us. Every architectural decision serves this goal.

The Resilience Model

| State | What Happens |
| --- | --- |
| Normal | Full proxy: requests logged, spend tracked, alerts fire, response returned |
| DynamoDB degraded | Request forwarded to provider, failed writes queued for retry, response returned |
| Lambda error | API Gateway returns 502; your app can retry or fall back to calling the provider directly |
| Complete outage | DNS failover: your requests go direct to your AI provider using the key already in your request |

Why This Works

The key insight is that your provider API key is always in your request. SpendLil never strips it, never replaces it, never stores it. This means at every failure point, you still have everything you need to call your AI provider directly.

Compare this with proxies that store your keys and inject them on your behalf. If that proxy goes down, your requests fail because the key isn't in them. With SpendLil, the worst case is a gap in your spend data — never a gap in your AI service.

Fire-and-Forget Logging

When the proxy processes a request, it returns the provider's response without waiting for confirmation that the usage record was written to DynamoDB. If the write fails, you still get your response. The failure is logged and the write is queued for retry, but it never blocks the critical path.
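The pattern described above can be sketched as follows. This is a minimal illustration, not SpendLil's actual code: the names `handle_request`, `flush_usage`, `pending`, and `retry`, and the in-memory queues standing in for the DynamoDB write path, are all assumptions made for the example.

```python
from collections import deque

# Hypothetical in-memory queues; a real system would use durable storage.
pending = deque()  # usage records waiting to be written
retry = deque()    # usage records whose write failed


def handle_request(request, forward):
    """Forward the request and return the provider's response.

    The usage record is only queued here; the write happens later,
    so a failing write cannot delay or break the response.
    """
    response = forward(request)  # critical path: the provider call only
    pending.append({"request": request, "response": response})
    return response


def flush_usage(write_usage):
    """Write queued usage records, parking failures for a later retry."""
    while pending:
        record = pending.popleft()
        try:
            write_usage(record)
        except Exception:
            # The caller already has its response; only tracking is delayed.
            retry.append(record)
```

In this sketch, `forward` and `write_usage` are passed in as plain callables so the ordering is easy to see: the response is returned before any write is attempted, and a failed write ends up in `retry` instead of raising into the request path.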

Nothing in the critical path can cause a permanent failure

The provider's response is returned to your app regardless of whether SpendLil successfully logged the usage. A gap in tracking is temporary; logging never causes a loss of service.

Dashboard Independence

The dashboard (app.spendlil.ai) is completely decoupled from the proxy (gateway.spendlil.ai). They share a DynamoDB table but run on independent infrastructure. If the dashboard goes down, the proxy keeps working. If the proxy has issues, the dashboard still shows your historical data.

Infrastructure

The proxy runs on AWS Lambda + API Gateway, which provides built-in multi-availability-zone redundancy managed by AWS. There are no single points of failure: no single server, no single container, no Redis instance, no RDS database. DynamoDB is serverless with point-in-time recovery enabled.

Building Fallback Into Your App

For maximum resilience, implement a simple fallback in your application code that calls your provider directly if SpendLil returns a 502 or 503. See the Error Handling guide for a complete code example.
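As a rough idea of what such a fallback might look like, here is a minimal sketch using only the Python standard library. It is not the example from the Error Handling guide. The function names `post` and `call_with_fallback` are hypothetical, and the proxy and provider URLs are parameters you would supply yourself. It relies on the fact that the same headers, including your provider API key, are valid for both targets.

```python
import urllib.error
import urllib.request

# Status codes that trigger the direct call, per the guidance above.
FALLBACK_STATUSES = {502, 503}


def post(url, body, headers, timeout=30):
    """POST bytes to url and return (status_code, response_bytes)."""
    req = urllib.request.Request(url, data=body, headers=headers, method="POST")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status, resp.read()
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx; convert back to a status code.
        return err.code, err.read()


def call_with_fallback(proxy_url, provider_url, body, headers, send=post):
    """Try the proxy first; on 502/503, resend the same request directly."""
    status, data = send(proxy_url, body, headers)
    if status in FALLBACK_STATUSES:
        # The provider key is already in the headers, so nothing changes
        # except the destination.
        status, data = send(provider_url, body, headers)
    return status, data
```

This sketch only reacts to HTTP 502 and 503 responses. Network-level failures such as timeouts or DNS errors raise `urllib.error.URLError` and are left unhandled here; whether to fall back on those as well is a choice for your application.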