Architecture
Tech Stack
| Component | Technology | Purpose |
|---|---|---|
| Framework | Fastify 5 (slim-bp-v2) | REST API |
| Persistence | MongoDB / Mongoose | Canvases, widgets, conversations |
| Query Engine | DuckDB | .duckdb file queries (energy audit data) |
| AI (Primary) | Configurable via DEFAULT_LLM | SQL + chart config generation |
| AI (Fallback) | Configurable via FALLBACK_LLM | Guards against transient API errors |
| Auth | JWT (Ed25519) + Cerbos | Session tokens + role-based policies |
AI Strategy
Model Selection
- Primary: Configurable via
DEFAULT_LLMenv var — proven 97% success rate (31/32 queries) on energy audit SQL generation - Fallback: Configurable via
FALLBACK_LLMenv var — triggered if primary returns no tool use or a transient API error (2-attempt flow: primary → fallback) - Store which model generated each widget's SQL (useful for quality debugging)
No Streaming
- Simple JSON request/response — no SSE/WebSocket streaming
- AI responses are short (sentence + chips, or SQL + config) — ~50-200 tokens
- Streaming complexity not justified for this use case
- Can be added later if latency becomes an issue
Structured Output
AI responses follow a structured format via tool use:
// Clarification response
{
"type": "clarification",
"message": "What month range should this cover?"
}
// Chart generation response
{
"type": "chart",
"title": "TC Trends — Line Chart",
"conversationName": "TC trends over time",
"chartType": "line",
"sql": "SELECT ...",
"chartConfig": { "xAxis": "month", "yAxis": ["audited", "total", "unaudited"] }
}
// Table generation response
{
"type": "table",
"title": "Section-wise Loss Breakdown",
"conversationName": "Loss percentage breakdown",
"sql": "SELECT ...",
"columns": [{ "key": "section", "label": "Section" }, ...]
}
Data Limits
- Charts: Max 10 data points per axis
- Prompt engineering: AI is instructed to write SQL that returns max 10 data points
- Backend safety net: After DuckDB execution, if rows > 10 for a chart, do not create the widget. Return a clarification to the user explaining the data has too many points for a chart, suggest narrowing the request or switching to a table. Conversation stays active so the user can adjust
- Tables: Server-side pagination (10 rows per page via
LIMIT/OFFSETin DuckDB). No AI-side limit needed
DuckDB Error Handling
- If AI-generated SQL fails DuckDB execution (syntax error, invalid column, etc.), do 1 retry — send the DuckDB error message back to AI so it can fix the SQL
- If the retry also fails, return an error to the user
Role-Based Scoping
- AI determines data scope based on user type/hierarchy
- WHERE clause injection for role-based filtering
- AEs: scoped to their section
- CIO and above: can span multiple sections, divisions, circles
DuckDB Integration
Architecture
- DuckDB runs in-process (Node.js native binding)
- Pre-built
.duckdbfile opened in read-only mode (tables/views already exist) - Read-only queries only (no writes)
- Connection pool managed per Cloud Run instance
Performance Benchmarks
Tested against ~43M rows (Sep-Dec 2025 data, months 2509-2512)
| Complexity | Queries | Avg Cold | Avg Warm | Max Cold |
|---|---|---|---|---|
| Simple | 12 | ~15ms | ~13ms | 27ms |
| Moderate | 10 | ~70ms | ~62ms | 180ms |
| Complex | 10 | ~160ms | ~150ms | 637ms |
- 31/32 queries succeeded (1 failed due to VARCHAR→DOUBLE cast edge case)
- SQL generated by Claude Haiku 4.5 (benchmark model — app now uses configurable
DEFAULT_LLM)
Benchmark Gaps (Follow-up)
- Concurrent query load (multi-user simulation)
- Memory/RAM consumption per query (Cloud Run sizing)
- Large result sets (10K+ rows for tables)
- Data scaling (12+ months of data)
- Role-scoped query overhead (additional WHERE/JOIN)
Latency Considerations
| Hop | Latency |
|---|---|
| Mumbai → Claude API (US) | ~200-300ms network RTT |
| LLM response (DEFAULT_LLM) | ~1-3s (50-200 tokens) |
| DuckDB query execution | ~15-200ms (typical) |
| Total per AI turn | ~1.5-3.5s |
- For a 2-3 turn conversation: ~5-10s total before widget appears
- Network latency is a small fraction of total wait
- If problematic: consider Bedrock in ap-south-1 (Mumbai) or ap-northeast-1 (Tokyo)
Data Flow
Table Pagination
- Tables re-execute the stored SQL with
LIMIT/OFFSETappended - Backend wraps stored SQL:
SELECT * FROM ({stored_sql}) LIMIT {limit} OFFSET {offset} - Also returns total count for frontend pagination UI