Architecture

Tech Stack

Component	Technology	Purpose
Framework	Fastify 5 (slim-bp-v2)	REST API
Persistence	MongoDB / Mongoose	Canvases, widgets, conversations
Query Engine	DuckDB	.duckdb file queries (energy audit data)
AI (Primary)	Configurable via DEFAULT_LLM	SQL + chart config generation
AI (Fallback)	Configurable via FALLBACK_LLM	Guards against transient API errors
Auth	JWT (Ed25519) + Cerbos	Session tokens + role-based policies

AI Strategy

Model Selection

Primary: Configurable via DEFAULT_LLM env var — proven 97% success rate (31/32 queries) on energy audit SQL generation
Fallback: Configurable via FALLBACK_LLM env var — triggered if primary returns no tool use or a transient API error (2-attempt flow: primary → fallback)
Store which model generated each widget's SQL (useful for quality debugging)

No Streaming

Simple JSON request/response — no SSE/WebSocket streaming
AI responses are short (sentence + chips, or SQL + config) — ~50-200 tokens
Streaming complexity not justified for this use case
Can be added later if latency becomes an issue

Structured Output

AI responses follow a structured format via tool use:

// Clarification response
{
  "type": "clarification",
  "message": "What month range should this cover?"
}

// Chart generation response
{
  "type": "chart",
  "title": "TC Trends — Line Chart",
  "conversationName": "TC trends over time",
  "chartType": "line",
  "sql": "SELECT ...",
  "chartConfig": { "xAxis": "month", "yAxis": ["audited", "total", "unaudited"] }
}

// Table generation response
{
  "type": "table",
  "title": "Section-wise Loss Breakdown",
  "conversationName": "Loss percentage breakdown",
  "sql": "SELECT ...",
  "columns": [{ "key": "section", "label": "Section" }, ...]
}

Data Limits

Charts: Max 10 data points per axis
- Prompt engineering: AI is instructed to write SQL that returns max 10 data points
- Backend safety net: After DuckDB execution, if rows > 10 for a chart, do not create the widget. Return a clarification to the user explaining the data has too many points for a chart, suggest narrowing the request or switching to a table. Conversation stays active so the user can adjust
Tables: Server-side pagination (10 rows per page via LIMIT/OFFSET in DuckDB). No AI-side limit needed

DuckDB Error Handling

If AI-generated SQL fails DuckDB execution (syntax error, invalid column, etc.), do 1 retry — send the DuckDB error message back to AI so it can fix the SQL
If the retry also fails, return an error to the user

Role-Based Scoping

AI determines data scope based on user type/hierarchy
WHERE clause injection for role-based filtering
AEs: scoped to their section
CIO and above: can span multiple sections, divisions, circles

DuckDB Integration

Architecture

DuckDB runs in-process (Node.js native binding)
Pre-built .duckdb file opened in read-only mode (tables/views already exist)
Read-only queries only (no writes)
Connection pool managed per Cloud Run instance

Performance Benchmarks

Tested against ~43M rows (Sep-Dec 2025 data, months 2509-2512)

Complexity	Queries	Avg Cold	Avg Warm	Max Cold
Simple	12	~15ms	~13ms	27ms
Moderate	10	~70ms	~62ms	180ms
Complex	10	~160ms	~150ms	637ms

31/32 queries succeeded (1 failed due to VARCHAR→DOUBLE cast edge case)
SQL generated by Claude Haiku 4.5 (benchmark model — app now uses configurable DEFAULT_LLM)

Benchmark Gaps (Follow-up)

Concurrent query load (multi-user simulation)
Memory/RAM consumption per query (Cloud Run sizing)
Large result sets (10K+ rows for tables)
Data scaling (12+ months of data)
Role-scoped query overhead (additional WHERE/JOIN)

Latency Considerations

Hop	Latency
Mumbai → Claude API (US)	~200-300ms network RTT
LLM response (DEFAULT_LLM)	~1-3s (50-200 tokens)
DuckDB query execution	~15-200ms (typical)
Total per AI turn	~1.5-3.5s

For a 2-3 turn conversation: ~5-10s total before widget appears
Network latency is a small fraction of total wait
If problematic: consider Bedrock in ap-south-1 (Mumbai) or ap-northeast-1 (Tokyo)

Data Flow

Table Pagination

Tables re-execute the stored SQL with LIMIT / OFFSET appended
Backend wraps stored SQL: SELECT * FROM ({stored_sql}) LIMIT {limit} OFFSET {offset}
Also returns total count for frontend pagination UI

Architecture

Tech Stack​

AI Strategy​

Model Selection​

No Streaming​

Structured Output​

Data Limits​

DuckDB Error Handling​

Role-Based Scoping​

DuckDB Integration​

Architecture​

Performance Benchmarks​

Benchmark Gaps (Follow-up)​

Latency Considerations​

Data Flow​

Table Pagination​

Tech Stack

AI Strategy

Model Selection

No Streaming

Structured Output

Data Limits

DuckDB Error Handling

Role-Based Scoping

DuckDB Integration

Architecture

Performance Benchmarks

Benchmark Gaps (Follow-up)

Latency Considerations

Data Flow

Table Pagination