Reference AI-Native GovTech Architecture

Reference AI-Native GovTech Architecture

This article describes a reference architecture for a distributed system designed for typical GovTech projects. The key feature of the architecture is the AI-Native approach: sovereign artificial intelligence is embedded as a cross-cutting component at every layer – from data enrichment to analytics and decision-making.

Sovereign AI – the ability of a state to develop, deploy, and control artificial intelligence technologies using its own infrastructure, data, personnel, and business ecosystems that are independent of external platforms.

System Requirements

Functional Requirements

FR-1: Continuous Data Ingestion. The system ingests data from heterogeneous sources (State Information System, State Information Resource, external APIs, file storage, IoT sensors, manual input). Supported modes include batch loading, streaming, and incremental synchronization (Change Data Capture). Sources may have limited bandwidth and unstable connectivity.

FR-2: Unified Data Storage. All collected data is consolidated in a single data lake to leverage the benefits of shared data usage. The lake supports structured, semi-structured, and unstructured formats. The storage layer provides ACID transactions, versioning (time-travel), and access control.

FR-3: Multi-Tier Processing Pipelines. Data passes through three quality tiers: Bronze (raw data as-is), Silver (cleansed, deduplicated, standardized), and Gold (aggregated, enriched, ready for consumption). Sovereign AI may be applied at each tier: classification, entity extraction, metadata generation, anomaly detection.

FR-4: AI Enrichment and Target Logic. The platform enables the application of a sovereign LLM and proprietary ML models to solve applied tasks: semantic search (RAG), document classification, text recognition (OCR), response generation, predictive analytics, and decision support.

FR-5: Analytics and Visualization. The system provides OLAP analytics, interactive dashboards, and report generation for leadership. Supported capabilities include ad-hoc SQL queries, multidimensional analysis, drill-down, and export to standard formats. An AI assistant helps formulate queries in natural language.

FR-6: Model Training and Management. The platform provides a full MLOps lifecycle: experiment versioning, model registry, fine-tuning (LoRA), A/B testing, drift monitoring (data drift monitoring tracks changes in the properties of incoming data), and automatic rollback.

Non-Functional Requirements

NFR-1: Deployment Mode. On-premise, air-gapped environment (full or partial internet isolation). Updates are delivered via Data Diode or local repository. Controlled through architectural audit.

NFR-2: Availability and Consistency. 99.9% for analytics and dashboard services; 99.5% for AI assistants. Strong consistency for mission-critical documents and eventual consistency for pipeline-generated data. Controlled via Prometheus, Grafana, and regular SLA reporting.

NFR-3: Response Time. Dashboards and OLAP queries: < 3 sec; RAG queries to LLM: < 5 sec to first token; Batch processing: < 4 hours for a daily batch. Controlled via OpenTelemetry and Jaeger.

NFR-4: Throughput. ≥ 100 RPS (requests per second) at API Gateway; ≥ 500 concurrent users; ≥ 10 million documents uploaded per month. Controlled via k6 or Locust.
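
For illustration, a minimal Locust sketch that exercises NFR-4; the host and endpoint paths are assumptions, not part of the reference architecture.

```python
# locustfile.py: minimal load profile for NFR-4 (hypothetical endpoint paths).
# Run: locust -f locustfile.py --host https://api.example.gov --users 500 --spawn-rate 50
from locust import HttpUser, task, between

class DashboardUser(HttpUser):
    # Simulated think time between requests per user.
    wait_time = between(1, 3)

    @task(3)
    def query_dashboard(self):
        # Assumed analytics endpoint; replace with the real API Gateway route.
        self.client.get("/api/v1/dashboards/appeals-by-region")

    @task(1)
    def upload_document(self):
        # Assumed ingestion endpoint backing the 10M documents/month requirement.
        self.client.post("/api/v1/documents", files={"file": ("scan.pdf", b"%PDF-1.4 ...")})
```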

NFR-5: Scalability. Horizontal scaling of all stateless components. Linear throughput growth when adding nodes. Controlled via Kubernetes HPA and utilization monitoring.

NFR-6: Fault Tolerance. RPO (Recovery Point Objective) ≤ 1 hour; RTO (Recovery Time Objective) ≤ 4 hours. Data is replicated in 3 copies. Controlled via quarterly DR testing.

NFR-7: Security. Data encryption at-rest (AES-256) and in-transit (TLS 1.3). RBAC at row and column level. Audit of all data operations. Controlled via penetration testing and log auditing.

NFR-8: Resilience Under Resource Constraints. System capacity is expanded in increments (procurement occurs once a year). The system operates under partial GPU node loss (graceful degradation): AI functions degrade, but analytics remains available. Controlled via chaos engineering (Litmus).

NFR-9: Observability. Logs, metrics, traces. Unified monitoring console. Alerts with escalation. Controlled via ELK/Loki + Prometheus + Jaeger.

NFR-10: Compatibility. API – RESTful (OpenAPI 3.0) and gRPC. Data formats are Parquet, JSON, Avro. Storage is S3-compatible. SQL is ANSI SQL via Trino. Controlled via contract testing (Pact).

General Architecture Concept

The proposed architecture is a multi-tier platform that supports the full lifecycle of AI solutions – from data collection and processing to model inference and integration with application systems.

Note: The specific technology stack may vary depending on project requirements. The key invariant is the architectural layers, not the specific frameworks, which may be changed or replaced with proprietary solutions. The emphasis is on the use of open-source technologies.

Diagram 1: Architectural layers and technology stack of the GovTech AI-Native platform

Architecture Layer Descriptions

L1. Data Ingestion Layer

This layer addresses FR-1: continuous collection of heterogeneous data from multiple government and external systems under conditions of limited bandwidth and unstable connectivity.

| Ingestion Mode | Technology | When Applied |
|---|---|---|
| Batch | Apache Spark + Airflow | Scheduled exports from SIS, SIR. File arrays. Historical loads. |
| Streaming | Kafka Connect | Event streams: transactions, logs, IoT sensor data in real time. |
| CDC (Change Data Capture) | Debezium → Kafka | Incremental synchronization with source relational DBs without adding load to them (WAL reading). |
| Files | SFTP / NFS → Spark | File exchanges with third-party systems. |
| Manual | Web forms → API Gateway → Kafka | Manual data entry by operators, document uploads. |

For sources with unstable connectivity, the following mechanisms are in place: a guaranteed-delivery queue (Kafka with acknowledgement), request retries (retry with exponential backoff), and a source-side buffer (lightweight agent based on Filebeat). When the receiver is temporarily unavailable, data accumulates in the queue and is loaded upon recovery.
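
As a sketch of the guaranteed-delivery mechanics described above, the following producer (using the confluent-kafka client; broker address and topic name are assumptions) requests acknowledgement from all replicas and retries idempotently:

```python
# Minimal sketch of guaranteed delivery from an unstable source (assumed topic/broker names).
from confluent_kafka import Producer

producer = Producer({
    "bootstrap.servers": "kafka-1:9092",  # assumption: local broker address
    "acks": "all",                        # wait for all in-sync replicas
    "enable.idempotence": True,           # no duplicates on retry
    "retries": 10,
    "retry.backoff.ms": 500,              # backoff between retries
})

def on_delivery(err, msg):
    if err is not None:
        # On permanent failure the record stays in the source-side buffer (e.g. Filebeat spool).
        print(f"delivery failed: {err}")

producer.produce("ingest.documents", value=b'{"doc_id": "..."}', callback=on_delivery)
producer.flush()  # block until the local queue is drained or errors are reported
```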

L2. Data Storage Layer

This layer addresses FR-2: consolidation of all data types in a single managed storage.

| Component | Technology | Purpose |
|---|---|---|
| Object Storage | SeaweedFS (S3-compatible) | Unstructured data: documents, PDF scans, audio, video, ML model weights. Scalable storage for the Bronze layer. |
| Data Lakehouse | Apache Iceberg + SeaweedFS | Primary analytical storage. ACID transactions, time-travel (return to any data version), hidden partitioning. Contains Bronze, Silver, and Gold layer tables. |
| RDBMS | PostgreSQL | Structured Gold-level data, application and low-code solution metadata. |
| Vector Database | Qdrant | Vector embeddings of documents and texts for semantic search (RAG). Metadata filtering. |
| Message Queue | Apache Kafka | Event bus covering all inter-component interactions, CDC streams, and notifications. Provides loose coupling and replay capability. |

SeaweedFS stores data, Iceberg manages table metadata, Spark and Trino perform computations. This allows storage volume and processing capacity to be scaled independently.
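
A minimal sketch of this separation, assuming a file-based Iceberg catalog named `lake` and a SeaweedFS S3 endpoint; all names and paths are illustrative only:

```python
# Sketch: Spark writes an Iceberg table to S3-compatible storage (SeaweedFS).
# Assumes the Iceberg Spark runtime jar is on the classpath.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .config("spark.sql.catalog.lake", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.lake.type", "hadoop")                 # file-based catalog for brevity
    .config("spark.sql.catalog.lake.warehouse", "s3a://lakehouse/")  # warehouse on SeaweedFS
    .config("spark.hadoop.fs.s3a.endpoint", "http://seaweedfs:8333") # assumed S3 endpoint
    .getOrCreate()
)

# Bronze table: raw documents plus technical metadata; ACID and time-travel come from Iceberg.
spark.sql("CREATE NAMESPACE IF NOT EXISTS lake.bronze")
spark.sql("""
    CREATE TABLE IF NOT EXISTS lake.bronze.documents (
        doc_id STRING, source STRING, load_ts TIMESTAMP, payload BINARY
    ) USING iceberg
""")
```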

L3. Data Processing Layer

This layer addresses FR-3 and FR-4: multi-tier data transformation pipelines with sovereign AI integration.

| Tier | Actions | Technologies |
|---|---|---|
| Bronze | Data is saved as-is; an exact copy of the source is made. Minimal processing: adding technical metadata (load date, source, batch ID). Schema-on-read approach. | Spark, Flink, Iceberg |
| Silver | Data cleansing: deduplication, format standardization (dates, addresses, names), type casting. Quality validation. Text extraction from documents. AI is used for OCR, named entity recognition (NER), document type classification, and data normalization. Data quality is ensured via the Great Expectations framework. | Spark, Flink, Airflow, LLM/VLM |
| Gold | Aggregation, business metric calculation, building analytical data marts and OLAP cubes. Data enrichment for decision-making. Sovereign AI and proprietary ML solutions are applied for citizen appeal sentiment analysis, risk scoring, anomaly detection, predictive models, and embedding generation for RAG. | Spark, Trino, MLflow |

Diagram 2: Medallion Architecture pipeline

Apache Airflow is a stateless orchestrator that manages scheduling and dependencies of batch pipelines (DAGs). Temporal is used for long-running processes with state persistence and the ability to resume from the last successful step. Both orchestrators provide retry logic and failure notifications. In general, Airflow is better suited for orchestrating Bronze→Silver→Gold transformations, while Temporal is better for reliable event-driven Bronze-layer data consumption and critical downstream notifications.
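
A minimal Airflow sketch of the daily Bronze→Silver→Gold chain; DAG and task names are illustrative, and the task bodies stand in for Spark job submissions:

```python
# Sketch of the daily medallion DAG (task bodies are placeholders).
from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

def promote(tier: str) -> None:
    # Placeholder: in practice each step submits a Spark job against Iceberg tables.
    print(f"promoting data to {tier}")

with DAG(
    dag_id="medallion_daily",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    bronze = PythonOperator(task_id="bronze", python_callable=promote, op_args=["bronze"])
    silver = PythonOperator(task_id="silver", python_callable=promote, op_args=["silver"])
    gold = PythonOperator(task_id="gold", python_callable=promote, op_args=["gold"])
    bronze >> silver >> gold  # dependency chain; NFR-3 bounds the whole run at 4 hours
```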

Resilience under resource constraints (NFR-8) is achieved through graceful degradation. If GPU nodes are temporarily unavailable, AI pipeline stages enter graceful degradation mode: documents are saved in Bronze and Silver layers, AI enrichment is deferred and executed upon GPU recovery. Analytics on the Gold layer continues to operate on previously prepared data.

Diagram 3: OCR pipeline structure

OCR process sequence:

  1. Document scans for digitization are placed into the Data Lakehouse.

  2. The OCR Producer service periodically retrieves file metadata from S3 and sends it to Kafka.

  3. The OCR Workflow service, acting as a process orchestrator, consumes messages from Kafka and initiates digitization.

  4. In the Preprocessing stage, the scan is split into pages and each page is standardized.

  5. In the Classification stage, the document type is determined using a predefined classifier.

  6. In the Extraction stage, document content is extracted using the most suitable model for that document type.

  7. In the Validation stage, the result is verified via Pydantic schemas, checksums, LLM, and Guardrails AI (a schema sketch follows this list).

  8. In the Fixing Errors stage, a human corrects mistakes by annotating data in a labeling client such as Label Studio.

  9. In the Save Data stage, the result is saved to the database and vector store.

  10. In the Transform stage, the text is transformed according to the project's business logic.
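
A minimal sketch of the Validation stage from step 7, assuming Pydantic v2; the document fields and rules are hypothetical:

```python
# Sketch of the Validation stage (step 7): field names and rules are hypothetical.
from datetime import date
from pydantic import BaseModel, Field, ValidationError

class CitizenAppeal(BaseModel):
    appeal_id: str = Field(pattern=r"^AP-\d{8}$")  # assumed ID format
    region: str
    submitted_on: date
    text: str = Field(min_length=10)

def validate_extraction(raw: dict) -> CitizenAppeal | None:
    try:
        return CitizenAppeal.model_validate(raw)
    except ValidationError:
        # Invalid extractions are routed to the Fixing Errors stage (Label Studio queue).
        return None
```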

L4. AI & ML Layer

This layer addresses FR-4: target data processing logic via sovereign AI and proprietary ML models, as well as FR-6: the full MLOps lifecycle.

| Subsystem | Components | Purpose |
|---|---|---|
| LLM Inference | vLLM (production), llama.cpp (edge / fallback), NeMo Guardrails (generation safety) | LLM inference, including national models. Response generation, summarization, classification. Guardrails control output safety. |
| RAG Pipeline | LangChain / LangGraph, Embedding Model, Reranker, Semantic Cache (Redis / Valkey), Langfuse | Semantic document search: query → embedding → vector search (Qdrant) → reranking → context assembly → LLM → response. Semantic Cache reduces GPU load by 30–40% for repeated queries. |
| ML Training & Serving | MLflow (Tracking, Registry, Serving), Kubeflow (distributed training), Feast (Feature Store) | Experiment versioning, model registry, LoRA fine-tuning orchestration on national data, ML feature management. |
| ML Observability | Evidently AI | Data drift and model drift monitoring, automatic quality report generation, retraining triggers. |

Diagram 4: RAG pipeline structure

RAG pipeline step-by-step (a code sketch follows the list):

  1. The user submits a natural language query via UI or API.

  2. The AI Gateway authenticates the request, applies rate limiting, and routes it to the RAG service.

  3. A check is performed to see whether a semantically similar query already exists in the cache: the query is embedded, and a nearest-neighbor search is run against cached queries.

  4. If a cache hit occurs, the response is returned directly to the client (3a → 3b); otherwise, control is passed to the RAG pipeline.

  5. The user query is converted to an embedding, using the same model that was used to index the documents.

  6. A vector search is performed to identify the Top-K nearest documents by cosine similarity. Optional metadata filtering is applied.

  7. The Top-K candidates are passed through a cross-encoder model that more accurately evaluates the relevance of each query-document pair. The Top-N best are selected.

  8. A prompt is assembled: system prompt + context from Top-N documents + user query. Prompt templates from the Prompt Registry are applied.

  9. The assembled prompt is sent to the LLM. vLLM performs inference with streaming (token-by-token).

  10. The LLM response passes through safety filters: toxicity check, hallucination detection, policy compliance. If a violation is detected, the response is blocked or modified.

  11. The response is saved to cache with a defined TTL.

  12. The final response is returned to the user.
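
Condensing steps 5–8 into code, a hedged sketch using qdrant-client and sentence-transformers; the embedding model, collection name, and payload fields are assumptions, and the cross-encoder reranking of step 7 is omitted for brevity:

```python
# Condensed sketch of steps 5-8: embed -> vector search -> prompt assembly.
from qdrant_client import QdrantClient
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("intfloat/multilingual-e5-large")  # assumption: same model as indexing
qdrant = QdrantClient(url="http://qdrant:6333")

def build_prompt(question: str, top_k: int = 20, top_n: int = 5) -> str:
    vector = embedder.encode(question).tolist()                   # step 5
    hits = qdrant.search(collection_name="documents",
                         query_vector=vector, limit=top_k)        # step 6
    # Step 7 (cross-encoder reranking) omitted; we simply take the best top_n hits.
    context = "\n\n".join(h.payload["text"] for h in hits[:top_n])  # assumes "text" in payload
    return f"Answer strictly from the context.\n\nContext:\n{context}\n\nQuestion: {question}"  # step 8
```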

It should be noted that NeMo Guardrails and Guardrails AI can perform validation not only after the main LLM request, but at all stages of the request lifecycle; a configuration sketch follows the stage-by-stage breakdown below.

Stage 1. User Request:

  • NeMo – Input Rails: jailbreak attempt detection; toxic and offensive content filtering; off-topic request blocking; personal data detection and masking in the query.

  • Guardrails AI – Input Validation: prompt injection checks; input length and format validation; input schema compliance verification; personal data detection in user query.

Stages 5–7. Vector Search over Knowledge Base:

  • NeMo – Retrieval Rails: filtering of irrelevant chunks from results; source credibility and admissibility verification; context restriction to permitted documents only; blocking transmission of chunks with prohibited content to LLM.

  • NeMo – Dialog Rails: dialog flow management via Colang scripts; deterministic responses to specific phrases; fallback scenario handling (what to do when LLM doesn't know); topic control throughout the session.

Stage 8. Prompt Construction for LLM:

  • NeMo – Execution Rails: control of permitted tool calls; whitelist of allowed external APIs; blocking of unauthorized agent actions; logging of all tool calls for audit.

Stages 9–10. Response Generation:

  • NeMo – Output Rails: hallucination check; personal data removal from final response; toxic content filtering; off-topic response blocking.

  • Guardrails AI – Output Validation: parsing LLM response into structured format; response schema validation; toxicity, relevance, and factual accuracy checks; automatic query retry upon validation failure.
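
A minimal sketch of wiring these rails in, assuming a NeMo Guardrails configuration directory with config.yml and Colang files (contents not shown):

```python
# Sketch: wrapping the LLM call with NeMo Guardrails (config directory contents are assumed).
from nemoguardrails import LLMRails, RailsConfig

# The directory holds config.yml plus Colang files defining input/dialog/output rails.
config = RailsConfig.from_path("./guardrails_config")
rails = LLMRails(config)

response = rails.generate(messages=[
    {"role": "user", "content": "Summarize the appeals statistics for the last quarter."}
])
print(response["content"])  # output rails have already filtered PII and toxicity here
```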

Diagram 5: Model Training pipeline

Model Training pipeline step-by-step (an MLflow logging sketch follows the list):

  1. Create experiment. Name, hyperparameters, and dataset reference are defined.

  2. Launch training pipeline as a DAG in Kubernetes.

  3. Request training features. Feast ensures point-in-time correct data sampling, preventing data leakage.

  4. Read historical features from the offline store (Iceberg) and fresh features from the online store (PostgreSQL or Redis).

  5. Prepare data by forming train/validation/test splits, augmentation, normalization, and class balancing.

  6. Train the model. For LLMs – LoRA fine-tuning of the base model. For ML – training via XGBoost / CatBoost / scikit-learn.

  7. Log metrics at each epoch (loss, accuracy, F1), hyperparameters, and artifacts (checkpoints, charts).

  8. Evaluate the model on the test set. Calculate target metrics. For LLMs – MMLU benchmarks and user dataset evaluations.

  9. If metrics exceed the threshold, the model is registered in the registry (version, metadata, metrics) with Staging status.

  10. Promote the model from Staging to Production after review and/or A/B testing.

  11. Deploy the new version: LLM to vLLM (LoRA adapter update), ML to MLflow Serving or Triton. Rolling update with zero downtime.

  12. Continuous monitoring by comparing the distribution of input data and predictions against a reference dataset.

  13. Upon detection of data drift or model drift, an automatic retraining process is triggered. Data drift refers to changes in the statistical characteristics of input data over time, which can negatively impact model performance. Model drift refers to changes in model behavior or quality over time.
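
A condensed MLflow sketch of steps 7–9 (logging and threshold-gated registration); the experiment name, metrics, and threshold are illustrative:

```python
# Sketch of steps 7-9: metric logging and conditional model registration.
import mlflow

mlflow.set_tracking_uri("http://mlflow:5000")  # assumption: on-prem tracking server
mlflow.set_experiment("appeal-classifier")

with mlflow.start_run() as run:
    mlflow.log_params({"lr": 3e-4, "lora_rank": 16})
    for epoch, (loss, f1) in enumerate([(0.61, 0.78), (0.44, 0.85)]):  # stand-in training loop
        mlflow.log_metrics({"loss": loss, "f1": f1}, step=epoch)       # step 7

    if f1 > 0.8:  # step 9: register only above the quality threshold
        # Assumes the trained model was logged to this run under the "model" artifact path.
        mlflow.register_model(f"runs:/{run.info.run_id}/model", "appeal-classifier")
```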

L5. Presentation Layer

This layer addresses FR-5: analytics, OLAP, and leadership dashboards.

| Component | Technology | Purpose |
|---|---|---|
| Web Client with AI Assistant | React / Angular, Chat UI | Interface for dialogue with the corporate LLM: data queries, SQL generation in natural language, report summarization, etc. |
| AI / API Gateway | Kong AI Gateway | Single entry point for external consumers: authentication, authorization, rate limiting, routing. OpenAPI 3.0. Centralized limit management and token quota distribution, protection against cascading failures. |
| BI & Dashboards | Apache Superset | Interactive dashboards for leadership: KPIs, trends, drill-down. Connections to Trino (Gold layer) and PostgreSQL. Auto-refresh scheduling. Automatic PDF/Excel report generation and distribution on schedule. |
| OLAP | Trino | Federated SQL queries to the Gold layer (Iceberg), PostgreSQL, S3. Multidimensional analysis, pivot tables. |

The AI assistant allows leaders to formulate analytical queries in natural language: "Show the trend of citizen appeals by region for the last quarter." The LLM translates the query into SQL, Trino executes it on the Gold layer, and Superset visualizes the result.
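
A hedged sketch of this text-to-SQL flow, assuming vLLM's OpenAI-compatible endpoint and the trino Python client; host names, schema, and the system prompt are illustrative:

```python
# Sketch: LLM generates SQL, Trino executes it on the Gold layer (names are assumptions).
import trino
from openai import OpenAI

llm = OpenAI(base_url="http://vllm:8000/v1", api_key="unused")

def ask(question: str):
    sql = llm.chat.completions.create(
        model="sovereign-llm",
        messages=[
            {"role": "system",
             "content": "Translate the question into ANSI SQL over gold.appeals(region, created_at)."},
            {"role": "user", "content": question},
        ],
    ).choices[0].message.content

    conn = trino.dbapi.connect(host="trino", port=8080, user="analyst",
                               catalog="lake", schema="gold")
    cur = conn.cursor()
    cur.execute(sql)  # in production the generated SQL must be validated and sandboxed first
    return cur.fetchall()
```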

L6. Security, Governance & Observability

This cross-cutting layer ensures compliance with non-functional requirements NFR-1, NFR-7, NFR-9.

| Domain | Components | Purpose |
|---|---|---|
| Identity & Access | Keycloak (IAM), RBAC/ABAC | Unified authentication (SSO), row- and column-level authorization, LDAP integration. |
| Secrets & Encryption | HashiCorp Vault | Key, certificate, and secret management. Key rotation. At-rest encryption. |
| Network Security | Calico / Cilium | Kubernetes network policies: micro-segmentation, zero-trust between pods. |
| Data Catalog & Lineage | DataHub | Cataloging of all data, lineage tracking from source through Bronze → Silver → Gold to dashboard. Metadata search. |
| Data Quality | Great Expectations | Automated validation at the Bronze/Silver boundary: completeness, format, ranges, business rules. Alerting on violations. |
| Observability | ELK / Loki (logs), Prometheus + Grafana (metrics), OpenTelemetry + Jaeger (traces) | Logs, metrics, traces. Unified console. SLA dashboard with availability, latency, and error metrics. |
| CI/CD & GitOps | ArgoCD (GitOps), Harbor (Container Registry), Ansible / Terraform (IaC) | Declarative configuration management. Local image registry for air-gapped environment. Reproducible deployments. |

Key Architectural Decisions

Event-driven architecture on Kafka. All inter-component interactions – including data ingestion, index updates, and notifications – are implemented through a message queue. This ensures loose coupling and a complete audit trail of all operations.

Storage-Compute Separation. The Data Lakehouse based on Apache Iceberg and S3-compatible storage allows storage and processing to be scaled independently. This is critical as the data lake grows to millions of documents per month, where analytical compute capacity must not compete with inference capacity.

Medallion Architecture. The quality tier separation ensures reproducibility (Bronze – source copy), manageability (Silver – cleansed data), and readiness for use (Gold – analytics). AI enrichment is embedded at the transitions between tiers.

Graceful Degradation. When GPUs are unavailable, AI pipeline stages are deferred, but core data processing and analytics continue to operate. The system automatically resumes AI enrichment upon resource recovery.

Caching. Session caching, caching of intermediate LLM results, and rate limiting at the gateway. Caching of semantically similar LLM queries reduces GPU load by 30–40% and decreases latency for common questions.
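
A minimal in-memory sketch of semantic caching; the similarity threshold and embedding model are assumptions, and a production deployment would use Redis/Valkey with a vector index rather than a Python list:

```python
# Minimal semantic-cache sketch (threshold and model are assumptions).
import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("intfloat/multilingual-e5-large")
cache: list[tuple[np.ndarray, str]] = []  # (query embedding, cached LLM response)

def cached_answer(query: str, threshold: float = 0.92) -> str | None:
    v = embedder.encode(query, normalize_embeddings=True)
    for emb, response in cache:
        if float(np.dot(v, emb)) >= threshold:  # cosine similarity of normalized vectors
            return response                     # cache hit: skip the GPU entirely
    return None

def remember(query: str, response: str) -> None:
    cache.append((embedder.encode(query, normalize_embeddings=True), response))
```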

Air-gapped CI/CD. ArgoCD + Harbor + local Nexus provide a complete deployment cycle without internet access. Updates are delivered via Data Diode and undergo security verification.

Data Lineage (DataHub). End-to-end tracking of data origin from source through all Medallion Architecture layers to a specific dashboard. Required for audit and regulatory compliance.

Architecture Decision Records

ADR-001: Apache Kafka as the Unified Event Bus

Status: Accepted

Context
The system integrates dozens of heterogeneous data sources. Components must be loosely coupled. Environment: air-gapped, on-premise.

| Option | Pros | Cons |
|---|---|---|
| Apache Kafka | Event replay, high throughput, CDC integration via Debezium, mature ecosystem | Operational complexity |
| RabbitMQ | Simplicity, low latency | No replay, weak CDC support |

Decision: Kafka

Rationale
Event replay capability is critical for the Bronze layer: if AI enrichment fails, data is not lost and the pipeline restarts from the correct point. Debezium is built on Kafka Connect, so CDC integrates natively with Kafka.

Consequences

➕ Loose coupling of all components, full audit trail of operations

➕ CDC without load on source databases

➖ Requires a dedicated cluster and operational expertise

ADR-002: Apache Iceberg as the Data Lakehouse Format

Status: Accepted

Context
A storage format is needed for Medallion Architecture with ACID support, time-travel, and independent scaling of storage and compute. On-premise, S3-compatible storage.

| Option | Pros | Cons |
|---|---|---|
| Apache Iceberg | ACID, time-travel, hidden partitioning, support for Spark, Trino, and Flink | Younger format, more complex than Delta Lake in some scenarios |
| Delta Lake | Mature, excellent Spark integration | Tied to Databricks ecosystem, weaker with Trino |
| Apache Hudi | Good for CDC upserts | More complex to operate, smaller community |
| Parquet + Hive | Simplicity | No ACID, no time-travel |

Decision: Apache Iceberg

Rationale
Iceberg is an engine-agnostic standard: it is simultaneously read by Spark (processing), Trino (analytics), and Flink (streaming) without data copying. Delta Lake creates a dependency on Databricks, which is unacceptable for a sovereign environment.

Consequences

➕ Storage-Compute Separation: storage and compute scale independently

➕ Time-travel for audit and reproducibility

➖ Requires a metadata catalog (Hive Metastore or Nessie)

ADR-003: SeaweedFS as Object Storage

Status: Accepted

Context
An S3-compatible, open-source object storage is needed for on-premise deployment. It stores documents, PDF scans, ML model weights, and Bronze-layer data. Volume: tens of millions of files.

| Option | Pros | Cons |
|---|---|---|
| SeaweedFS | Efficient small-file storage, built-in Filer, low metadata overhead, S3-compatible | Smaller community, less documentation |
| MinIO | De-facto S3 on-premise standard, large community, excellent documentation | High overhead with millions of small files; developer discontinued Community Edition development |
| Ceph | Versatility (block + object + file) | High operational complexity |

Decision: SeaweedFS

Rationale
Fully open-source. SeaweedFS is architecturally optimized for storing large numbers of small files.

Consequences

➕ Efficiency when storing millions of documents

➖ Smaller community and ecosystem compared to MinIO

ADR-004: vLLM + llama.cpp as Two-Tier LLM Inference

Status: Accepted

Context
LLM inference is needed in an air-gapped environment. GPU resources are limited and expanded in increments. Graceful degradation is required upon GPU node loss.

| Option | Pros | Cons |
|---|---|---|
| vLLM (primary) | PagedAttention, continuous batching, high throughput, streaming | Requires GPU, high memory consumption |
| llama.cpp (fallback) | CPU inference, minimal requirements, quantization | Low speed on large models |
| Triton Inference Server | Versatility, multi-model | Configuration complexity, overkill for LLM |
| Ollama | Simplicity | Not production-ready for high load |

Decision: vLLM as primary + llama.cpp as edge/fallback

Rationale
The two-tier approach addresses NFR-8 (resilience under resource constraints): when GPU is unavailable, the system does not fail but degrades to CPU inference at reduced speed. vLLM provides production throughput via PagedAttention and continuous batching.
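
Because both vLLM and llama.cpp's server expose OpenAI-compatible APIs, failover can be sketched as a change of base URL; endpoints and the model name are assumptions:

```python
# Sketch of two-tier failover across OpenAI-compatible endpoints (names are assumptions).
from openai import OpenAI, APIConnectionError

PRIMARY = OpenAI(base_url="http://vllm:8000/v1", api_key="unused")       # GPU tier
FALLBACK = OpenAI(base_url="http://llamacpp:8080/v1", api_key="unused")  # CPU tier

def complete(prompt: str) -> str:
    for client in (PRIMARY, FALLBACK):
        try:
            resp = client.chat.completions.create(
                model="sovereign-llm",
                messages=[{"role": "user", "content": prompt}],
                timeout=30,
            )
            return resp.choices[0].message.content
        except APIConnectionError:
            continue  # GPU tier down: degrade to CPU inference (NFR-8)
    raise RuntimeError("all inference tiers unavailable")
```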

Consequences

➕ Graceful degradation: if GPU is unavailable, llama.cpp continues operation

➕ LoRA adapters are updated in vLLM without restart

➖ Two different runtimes require maintaining two configurations

ADR-005: Langfuse for RAG Observability

Status: Accepted

Context
A tool is needed for tracing and quality evaluation of the RAG pipeline: call chain logging, relevance assessment, prompt A/B testing. Environment: air-gapped.

| Option | Pros | Cons |
|---|---|---|
| Langfuse | Open-source, self-hosted, full tracing functionality | Fewer integrations than LangSmith |
| LangSmith | Deep LangChain integration, rich UI | Cloud service, incompatible with air-gapped |

Decision: Langfuse

Rationale
LangSmith is a cloud service, fundamentally incompatible with an air-gapped environment and data sovereignty requirements. Langfuse is deployed on-premise, has open-source code, and covers all required scenarios: tracing, evaluation, and Prompt Registry.

Consequences

➕ Full control over tracing data

➕ Prompt Registry for prompt version management

➖ Requires self-deployment and maintenance

ADR-006: Airflow and Temporal for Orchestration

Status: Accepted

Context
The system contains two classes of processes: regular batch data transformations (Bronze→Silver→Gold) and long-running event-driven processes (OCR, downstream notifications). A single orchestrator handles both scenarios poorly.

| Option | Pros | Cons |
|---|---|---|
| Airflow only | Single tool, mature, large community | Stateless: no resume-from-failure for long-running processes |
| Temporal only | Stateful, durable execution, retry from any point | Overkill for simple DAG scheduling |
| Airflow + Temporal | Each tool optimized for its task | Two tools in the stack |
| Prefect | Modern, simpler than Airflow | Smaller community, weaker for stateful workflows |

Decision: Airflow for batch pipelines + Temporal for event-driven processes

Rationale
The OCR pipeline can run for hours and must resume from the point of failure without losing progress – this is Temporal's domain. Bronze→Silver→Gold transformations are classic scheduled DAGs – Airflow's domain. Separation of concerns eliminates compromises.
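
A minimal Temporal sketch of the durable OCR workflow; stage and activity names are illustrative, and activity bodies are omitted:

```python
# Sketch of a durable OCR workflow in Temporal (names are assumptions).
from datetime import timedelta
from temporalio import activity, workflow
from temporalio.common import RetryPolicy

@activity.defn
async def run_ocr_stage(stage: str, doc_id: str) -> None:
    ...  # calls the actual Preprocessing / Classification / Extraction services

@workflow.defn
class OcrWorkflow:
    @workflow.run
    async def run(self, doc_id: str) -> None:
        # Each stage is retried independently; completed stages are never re-executed
        # after a crash, because Temporal persists workflow state.
        for stage in ("preprocess", "classify", "extract", "validate"):
            await workflow.execute_activity(
                run_ocr_stage, args=[stage, doc_id],
                start_to_close_timeout=timedelta(minutes=30),
                retry_policy=RetryPolicy(maximum_attempts=5),
            )
```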

Consequences

➕ Each orchestrator is optimal for its class of tasks

➕ Reliability for long-running processes (OCR, notifications)

➖ The team must be proficient in both tools

ADR-007: Qdrant as the Vector Database

Status: Accepted

Context
The RAG pipeline requires storage and search of document vector embeddings. Volume: tens of millions of chunks. On-premise, air-gapped. Metadata filtering is required.

| Option | Pros | Cons |
|---|---|---|
| Qdrant | Rust implementation (performance), rich payload filtering, on-premise, active development | Smaller ecosystem than Weaviate |
| pgvector | PostgreSQL already in stack, simplicity | Degrades at large volumes (>10M vectors) |
| Weaviate | Rich functionality, GraphQL API | Go implementation, higher memory consumption |
| Milvus | High performance | Deployment complexity, etcd dependency |
| ElasticSearch (kNN) | Already in stack | Not optimized for vector search |

Decision: Qdrant

Rationale
At 10M+ documents per month, pgvector loses performance. Qdrant is written in Rust, demonstrates the best latency/throughput ratio in benchmarks, supports metadata filtering (critical for document-level RBAC), and deploys as a simple single binary.
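
A sketch of the document-level RBAC filtering mentioned above, using qdrant-client; the payload field and collection name are assumptions:

```python
# Sketch of document-level access control via Qdrant payload filtering (field names assumed).
from qdrant_client import QdrantClient
from qdrant_client.models import FieldCondition, Filter, MatchAny

qdrant = QdrantClient(url="http://qdrant:6333")

def search_for_user(vector: list[float], allowed_departments: list[str]):
    # Only chunks whose payload department is in the user's allowed list are returned,
    # enforcing RBAC at retrieval time rather than after the LLM has seen the context.
    return qdrant.search(
        collection_name="documents",
        query_vector=vector,
        query_filter=Filter(must=[
            FieldCondition(key="department", match=MatchAny(any=allowed_departments))
        ]),
        limit=20,
    )
```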

Consequences

➕ High performance at large volumes

➕ Metadata filtering for access control

➖ Additional component in the stack (PostgreSQL cannot be reused)

ADR-008: Medallion Architecture

Status: Accepted

Context
Data arrives from dozens of heterogeneous sources of varying quality. A storage strategy is needed that ensures reproducibility, quality manageability, and analytics readiness. AI enrichment may be unavailable (no GPU).

| Option | Pros | Cons |
|---|---|---|
| Medallion (Bronze/Silver/Gold) | Reproducibility, quality isolation, AI graceful degradation | Data stored in multiple copies |
| Data Vault | Auditability, flexibility | High modeling complexity |
| Kimball DWH | Mature approach, familiar to analysts | Rigid schema, poor fit for unstructured data |
| Single storage | Simplicity | No quality isolation, difficult to roll back |

Decision: Medallion Architecture

Rationale
Bronze as an immutable source copy is insurance against transformation and AI enrichment errors. When GPU is unavailable, documents accumulate in Bronze/Silver, and AI enrichment is executed upon resource recovery. The Gold layer is always available for analytics on previously prepared data.

Consequences

➕ Reproducibility: any stage can be restarted

➕ Graceful degradation: analytics does not depend on AI availability

➖ Storing data in three copies increases disk space requirements

ADR-009: NeMo Guardrails + Guardrails AI as Two-Tier LLM Protection

Status: Accepted

Context
The government system handles citizens' personal data. The LLM may generate toxic content, hallucinations, or personal data leaks. Protection is needed at all stages of the request lifecycle.

| Option | Pros | Cons |
|---|---|---|
| NeMo Guardrails | Colang scripts for dialog flow; input/output/retrieval/execution rails; from NVIDIA | Ties to NVIDIA ecosystem |
| Guardrails AI | Structured output validation, Pydantic schemas, retry on failure | Does not cover dialog flow |
| Prompt engineering only | Simplicity | Unreliable, easily bypassed |
| Custom filters | Full control | High development and maintenance cost |

Decision: NeMo Guardrails (dialog/flow protection) + Guardrails AI (structural validation)

Rationale
The tools cover different aspects: NeMo manages dialog logic and blocks undesirable scenarios at the flow level; Guardrails AI validates the structure and content of output against schemas. Together they form a layered defense critical for a government system handling personal data.

Consequences

➕ Protection at all stages: input → retrieval → prompt → output

➕ Full auditability of all blocking decisions

➖ Latency overhead at each stage (~100–200ms total)

➖ Requires maintaining Colang scripts when business logic changes