Reference AI-Native GovTech Architecture

This article describes a reference architecture for a distributed system designed for typical GovTech projects. The key feature of the architecture is the AI-Native approach: sovereign artificial intelligence is embedded as a cross-cutting component at every layer – from data enrichment to analytics and decision-making.
Sovereign AI is the ability of a state to develop, deploy, and control artificial intelligence technologies using its own infrastructure, data, personnel, and business ecosystems, independent of external platforms.
System Requirements
Functional Requirements
FR-1: Continuous Data Ingestion. The system ingests data from heterogeneous sources (State Information System, State Information Resource, external APIs, file storage, IoT sensors, manual input). Supported modes include batch loading, streaming, and incremental synchronization (Change Data Capture). Sources may have limited bandwidth and unstable connectivity.
FR-2: Unified Data Storage. All collected data is consolidated in a single data lake to leverage the benefits of shared data usage. The lake supports structured, semi-structured, and unstructured formats. The storage layer provides ACID transactions, versioning (time-travel), and access control.
FR-3: Multi-Tier Processing Pipelines. Data passes through three quality tiers: Bronze (raw data as-is), Silver (cleansed, deduplicated, standardized), and Gold (aggregated, enriched, ready for consumption). Sovereign AI may be applied at each tier: classification, entity extraction, metadata generation, anomaly detection.
FR-4: AI Enrichment and Target Logic. The platform enables the application of a sovereign LLM and proprietary ML models to solve applied tasks: semantic search (RAG), document classification, text recognition (OCR), response generation, predictive analytics, and decision support.
FR-5: Analytics and Visualization. The system provides OLAP analytics, interactive dashboards, and report generation for leadership. Supported capabilities include ad-hoc SQL queries, multidimensional analysis, drill-down, and export to standard formats. An AI assistant helps formulate queries in natural language.
FR-6: Model Training and Management. The platform provides a full MLOps lifecycle: experiment versioning, model registry, fine-tuning (LoRA), A/B testing, drift monitoring (tracking changes in the statistical properties of incoming data), and automatic rollback.
Non-Functional Requirements
NFR-1: Deployment Mode. On-premise, air-gapped environment (full or partial internet isolation). Updates are delivered via Data Diode or local repository. Controlled through architectural audit.
NFR-2: Availability and Consistency. 99.9% for analytics and dashboard services; 99.5% for AI assistants. Strong consistency for mission-critical documents and eventual consistency for pipeline-generated data. Controlled via Prometheus, Grafana, and regular SLA reporting.
NFR-3: Response Time. Dashboards and OLAP queries: < 3 sec; RAG queries to LLM: < 5 sec to first token; Batch processing: < 4 hours for a daily batch. Controlled via OpenTelemetry and Jaeger.
NFR-4: Throughput. ≥ 100 RPS (requests per second) at API Gateway; ≥ 500 concurrent users; ≥ 10 million documents uploaded per month. Controlled via k6 or Locust.
NFR-5: Scalability. Horizontal scaling of all stateless components. Linear throughput growth when adding nodes. Controlled via Kubernetes HPA and utilization monitoring.
NFR-6: Fault Tolerance. RPO (Recovery Point Objective) ≤ 1 hour; RTO (Recovery Time Objective) ≤ 4 hours. Data is replicated in 3 copies. Controlled via quarterly DR testing.
NFR-7: Security. Data encryption at-rest (AES-256) and in-transit (TLS 1.3). RBAC at row and column level. Audit of all data operations. Controlled via penetration testing and log auditing.
NFR-8: Resilience Under Resource Constraints. System capacity is expanded in increments (procurement occurs once a year). The system operates under partial GPU node loss (graceful degradation): AI functions degrade, but analytics remains available. Controlled via chaos engineering (Litmus).
NFR-9: Observability. Logs, metrics, traces. Unified monitoring console. Alerts with escalation. Controlled via ELK/Loki + Prometheus + Jaeger.
NFR-10: Compatibility. API – RESTful (OpenAPI 3.0) and gRPC. Data formats are Parquet, JSON, Avro. Storage is S3-compatible. SQL is ANSI SQL via Trino. Controlled via contract testing (Pact).
General Architecture Concept
The proposed architecture is a multi-tier platform that supports the full lifecycle of AI solutions – from data collection and processing to model inference and integration with application systems.
Note: The specific technology stack may vary depending on project requirements. The key invariant is the architectural layers, not the specific frameworks, which may be changed or replaced with proprietary solutions. The emphasis is on the use of open-source technologies.
Diagram 1: Architectural layers and technology stack of the GovTech AI-Native platform
Architecture Layer Descriptions
L1. Data Ingestion Layer
This layer addresses FR-1: continuous collection of heterogeneous data from multiple government and external systems under conditions of limited bandwidth and unstable connectivity.
| Ingestion Mode | Technology | When Applied |
|---|---|---|
| Batch | Apache Spark + Airflow | Scheduled exports from SIS, SIR. File arrays. Historical loads. |
| Streaming | Kafka Connect | Event streams: transactions, logs, IoT sensor data in real time. |
| CDC (Change Data Capture) | Debezium → Kafka | Incremental synchronization with source relational DBs without additional load on them (reading the write-ahead log). |
| Files | SFTP / NFS → Spark | File exchanges with third-party systems. |
| Manual | Web forms → API Gateway → Kafka | Manual data entry by operators, document uploads. |
For sources with unstable connectivity, the following mechanisms are in place: a guaranteed-delivery queue (Kafka with acknowledgement), request retries (retry with exponential backoff), and a source-side buffer (lightweight agent based on Filebeat). When the receiver is temporarily unavailable, data accumulates in the queue and is loaded upon recovery.
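The retry mechanism can be sketched as follows, assuming a hypothetical `send` callable in place of a real Kafka producer configured with `acks=all`; the `flaky_send` receiver simulates an unstable source link:

```python
import time

def send_with_retry(send, payload, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Deliver `payload` via `send`, retrying with exponential backoff.
    On final failure the payload stays in the source-side buffer for replay."""
    for attempt in range(max_attempts):
        try:
            return send(payload)
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise  # keep the payload buffered (e.g. by the Filebeat agent)
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, 8s, ...

# Simulated flaky receiver: rejects the first two attempts, then accepts.
attempts = []
def flaky_send(msg):
    attempts.append(msg)
    if len(attempts) < 3:
        raise ConnectionError("receiver unavailable")
    return "ack"
```

In production the same behavior comes from the Kafka producer's built-in retry settings; the sketch only illustrates why delivery survives a temporarily unavailable receiver.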
L2. Data Storage Layer
This layer addresses FR-2: consolidation of all data types in a single managed storage.
| Component | Technology | Purpose |
|---|---|---|
| Object Storage | SeaweedFS (S3-compatible) | Unstructured data: documents, PDF scans, audio, video, ML model weights. Scalable storage for the Bronze layer. |
| Data Lakehouse | Apache Iceberg + SeaweedFS | Primary analytical storage. ACID transactions, time-travel (return to any data version), hidden partitioning. Contains Bronze, Silver, and Gold layer tables. |
| RDBMS | PostgreSQL | Structured Gold-level data, application and low-code solution metadata. |
| Vector Database | Qdrant | Vector embeddings of documents and texts for semantic search (RAG). Metadata filtering. |
| Message Queue | Apache Kafka | Event bus covering all inter-component interactions, CDC streams, and notifications. Provides loose coupling and replay capability. |
SeaweedFS stores data, Iceberg manages table metadata, Spark and Trino perform computations. This allows storage volume and processing capacity to be scaled independently.
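As an illustration, the storage/compute split reduces to a handful of Spark settings that point an Iceberg catalog at S3-compatible storage; the catalog name, warehouse path, and endpoint below are hypothetical:

```python
# Hypothetical catalog name, warehouse path, and endpoint -- adjust per environment.
spark_conf = {
    # Iceberg catalog backed by a metastore; tables live as files in object storage.
    "spark.sql.catalog.lake": "org.apache.iceberg.spark.SparkCatalog",
    "spark.sql.catalog.lake.type": "hive",
    "spark.sql.catalog.lake.warehouse": "s3a://lakehouse/warehouse",
    # Point the S3A connector at the SeaweedFS S3 gateway instead of AWS.
    "spark.hadoop.fs.s3a.endpoint": "http://seaweedfs-s3:8333",
    "spark.hadoop.fs.s3a.path.style.access": "true",
}
```

Because the warehouse is just an object-store prefix, adding Spark or Trino workers changes nothing on the storage side, and vice versa.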
L3. Data Processing Layer
This layer addresses FR-3 and FR-4: multi-tier data transformation pipelines with sovereign AI integration.
| Tier | Actions | Technologies |
|---|---|---|
| Bronze | Data is saved as-is, an exact copy of the source is made. Minimal processing: adding technical metadata (load date, source, batch ID). Schema-on-read approach. | Spark, Flink, Iceberg |
| Silver | Data cleansing: deduplication, format standardization (dates, addresses, names), type casting. Quality validation. Text extraction from documents. AI is used for OCR, named entity extraction (NER), document type classification, data normalization. Data quality is ensured via Great Expectations Framework. | Spark, Flink, Airflow, LLM/VLM |
| Gold | Aggregation, business metric calculation, building analytical data marts and OLAP cubes. Data enrichment for decision-making. Sovereign AI and proprietary ML solutions are applied for citizen appeal sentiment analysis, risk scoring, anomaly detection, predictive models, and embedding generation for RAG. | Spark, Trino, MLflow |
Diagram 2: Medallion Architecture pipeline
Apache Airflow manages scheduling and dependencies of batch pipelines (DAGs) but keeps no durable task state of its own. Temporal is used for long-running processes with state persistence and the ability to resume from the last successful step. Both orchestrators provide retry logic and failure notifications. In general, Airflow is better suited for orchestrating Bronze→Silver→Gold transformations, while Temporal is better for reliable event-driven Bronze-layer data consumption and critical downstream notifications.
Resilience under resource constraints (NFR-8) is achieved through graceful degradation. If GPU nodes are temporarily unavailable, AI pipeline stages enter graceful degradation mode: documents are saved in Bronze and Silver layers, AI enrichment is deferred and executed upon GPU recovery. Analytics on the Gold layer continues to operate on previously prepared data.
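The deferral logic can be sketched in a few lines; `run_pipeline`, `on_gpu_recovery`, and the record fields are hypothetical simplifications of the real pipeline stages:

```python
def run_pipeline(doc, gpu_available, enrich_queue):
    """Land the document in Bronze/Silver regardless; enrich only if a GPU
    is free, otherwise defer the document for later replay."""
    record = {"id": doc["id"], "layer": "silver", "enriched": False}
    if gpu_available:
        record["enriched"] = True   # OCR / NER / classification would run here
        record["layer"] = "gold"
    else:
        enrich_queue.append(doc["id"])  # deferred until GPU nodes return
    return record

def on_gpu_recovery(enrich_queue, enrich):
    """Drain the deferred-enrichment queue once GPU capacity is back."""
    while enrich_queue:
        enrich(enrich_queue.pop(0))
```

In the platform itself the "queue" is a Kafka topic and the drain is triggered by GPU-node health checks; the sketch only shows why no document is lost while AI stages are paused.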
Diagram 3: OCR pipeline structure
OCR process sequence:
1. Document scans for digitization are placed into the Data Lakehouse.
2. The OCR Producer service periodically retrieves file metadata from S3 and sends it to Kafka.
3. The OCR Workflow service, acting as a process orchestrator, consumes messages from Kafka and initiates digitization.
4. In the Preprocessing stage, the scan is split into pages and each page is standardized.
5. In the Classification stage, the document type is determined using a predefined classifier.
6. In the Extraction stage, document content is extracted using the most suitable model for that document type.
7. In the Validation stage, the result is verified via Pydantic schemas, checksums, LLM, and Guardrails AI.
8. In the Fixing Errors stage, a human corrects mistakes by annotating data in a labeling client such as Label Studio.
9. In the Save Data stage, the result is saved to the database and vector store.
10. In the Transform stage, the text is transformed according to the project's business logic.
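A stdlib-only sketch of the Validation-stage checks (in the pipeline itself Pydantic schemas and Guardrails AI play this role); the field set, checksum field name, and error routing are illustrative assumptions:

```python
import hashlib

REQUIRED_FIELDS = {"doc_type", "text", "pages"}  # illustrative schema

def validate_extraction(result, source_bytes):
    """Return a list of validation errors for one extraction result.
    An empty list means the result may proceed to the Save Data stage;
    a non-empty list routes the document to the Fixing Errors stage."""
    errors = []
    missing = REQUIRED_FIELDS - result.keys()
    if missing:
        errors.append(f"missing fields: {sorted(missing)}")
    if result.get("pages", 0) < 1:
        errors.append("page count must be positive")
    # Checksum ties the extraction result back to the exact source scan.
    if result.get("source_sha256") != hashlib.sha256(source_bytes).hexdigest():
        errors.append("source checksum mismatch")
    return errors
```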
L4. AI & ML Layer
This layer addresses FR-4: target data processing logic via sovereign AI and proprietary ML models, as well as FR-6: the full MLOps lifecycle.
| Subsystem | Components | Purpose |
|---|---|---|
| LLM Inference | vLLM (production), llama.cpp (edge / fallback), NeMo Guardrails (generation safety) | LLM inference, including national models. Response generation, summarization, classification. Guardrails control output safety. |
| RAG Pipeline | LangChain / LangGraph, Embedding Model, Reranker, Semantic Cache (Redis / Valkey), Langfuse | Semantic document search: query → embedding → vector search (Qdrant) → reranking → context assembly → LLM → response. Semantic Cache reduces GPU load by 30–40% for repeated queries. |
| ML Training & Serving | MLflow (Tracking, Registry, Serving), Kubeflow (distributed training), Feast (Feature Store) | Experiment versioning, model registry, LoRA fine-tuning orchestration on national data, ML feature management. |
| ML Observability | Evidently AI | Data drift and model drift monitoring, automatic quality report generation, retraining triggers. |
Diagram 4: RAG pipeline structure
RAG pipeline step-by-step:
1. The user submits a natural language query via UI or API.
2. The AI Gateway authenticates the request, applies rate limiting, and routes it to the RAG service.
3. A check is performed to see if a semantically similar query exists in the cache. The query is hashed via embedding and a nearest neighbor is searched in the cache.
4. If a cache hit occurs, the response is returned directly to the client (3a → 3b); otherwise, control is passed to the RAG pipeline.
5. The user query is converted to an embedding, using the same model that was used to index the documents.
6. A vector search is performed to identify the Top-K nearest documents by cosine similarity. Optional metadata filtering is applied.
7. The Top-K candidates are passed through a cross-encoder model that more accurately evaluates the relevance of each query-document pair. The Top-N best are selected.
8. A prompt is assembled: system prompt + context from Top-N documents + user query. Prompt templates from the Prompt Registry are applied.
9. The assembled prompt is sent to the LLM. vLLM performs inference with streaming (token-by-token).
10. The LLM response passes through safety filters: toxicity check, hallucination detection, policy compliance. If a violation is detected, the response is blocked or modified.
11. The response is saved to cache with a defined TTL.
12. The final response is returned to the user.
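The retrieval core of the steps above (vector search, reranking, prompt assembly) can be sketched in plain Python; `retrieve`, `build_prompt`, and the toy two-dimensional vectors are illustrative stand-ins for Qdrant and the cross-encoder:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def retrieve(query_vec, index, top_k=3, top_n=2, rerank=None):
    """Vector search (Top-K by cosine similarity), then optional reranking
    down to Top-N. `index` is a list of (doc_id, vector, text) tuples;
    `rerank` stands in for a cross-encoder scoring function."""
    candidates = sorted(index, key=lambda d: cosine(query_vec, d[1]),
                        reverse=True)[:top_k]
    if rerank is not None:
        candidates = sorted(candidates, key=lambda d: rerank(d[2]), reverse=True)
    return candidates[:top_n]

def build_prompt(system, contexts, question):
    """Prompt assembly: system prompt + Top-N context + user query."""
    return f"{system}\n\nContext:\n" + "\n\n".join(contexts) + f"\n\nQuestion: {question}"
```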
It should be noted that NeMo Guardrails and Guardrails AI can perform validation not only after the main LLM request, but at all stages of the request lifecycle.
Stage 1. User Request:
NeMo – Input Rails: jailbreak attempt detection; toxic and offensive content filtering; off-topic request blocking; personal data detection and masking in the query.
Guardrails AI – Input Validation: prompt injection checks; input length and format validation; input schema compliance verification; personal data detection in user query.
Stages 5–7. Vector Search over Knowledge Base:
NeMo – Retrieval Rails: filtering of irrelevant chunks from results; source credibility and admissibility verification; context restriction to permitted documents only; blocking transmission of chunks with prohibited content to LLM.
NeMo – Dialog Rails: dialog flow management via Colang scripts; deterministic responses to specific phrases; fallback scenario handling (what to do when LLM doesn't know); topic control throughout the session.
Stage 8. Prompt Construction for LLM:
NeMo – Execution Rails: control of permitted tool calls; whitelist of allowed external APIs; blocking of unauthorized agent actions; logging of all tool calls for audit.
Stages 9–10. Response Generation:
NeMo – Output Rails: hallucination check; personal data removal from final response; toxic content filtering; off-topic response blocking.
Guardrails AI – Output Validation: parsing LLM response into structured format; response schema validation; toxicity, relevance, and factual accuracy checks; automatic query retry upon validation failure.
Diagram 5: Model Training pipeline
Model Training pipeline step-by-step:
1. Create experiment. Name, hyperparameters, and dataset reference are defined.
2. Launch training pipeline as a DAG in Kubernetes.
3. Request training features. Feast ensures point-in-time correct data sampling, preventing data leakage.
4. Read historical features from the offline store (Iceberg) and fresh features from the online store (PostgreSQL or Redis).
5. Prepare data by forming train/validation/test splits, augmentation, normalization, and class balancing.
6. Train the model. For LLMs – LoRA fine-tuning of the base model. For ML – training via XGBoost / CatBoost / scikit-learn.
7. Log metrics at each epoch (loss, accuracy, F1), hyperparameters, and artifacts (checkpoints, charts).
8. Evaluate the model on the test set. Calculate target metrics. For LLMs – MMLU benchmarks and user dataset evaluations.
9. If metrics exceed the threshold, the model is registered in the registry (version, metadata, metrics) with Staging status.
10. Promote the model from Staging to Production after review and/or A/B testing.
11. Deploy the new version: LLM to vLLM (LoRA adapter update), ML to MLflow Serving or Triton. Rolling update with zero downtime.
12. Continuous monitoring by comparing the distribution of input data and predictions against a reference dataset.
13. Upon detection of data drift or model drift, an automatic retraining process is triggered.

Data drift refers to changes in the statistical characteristics of input data over time, which can negatively impact model performance. Model drift refers to changes in model behavior or quality over time.
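A minimal, stdlib-only sketch of a drift check of this kind (in the stack itself Evidently AI computes far richer metrics); the bin count, the smoothing constant, and the 0.2 threshold are illustrative assumptions:

```python
import math

def psi(reference, current, bins=4):
    """Population Stability Index over equal-width bins -- a common data
    drift statistic. Bin edges come from the reference sample."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0

    def share(xs):
        counts = [0] * bins
        for x in xs:
            i = min(max(int((x - lo) / width), 0), bins - 1)
            counts[i] += 1
        return [(c or 0.5) / len(xs) for c in counts]  # smooth empty bins

    return sum((c - r) * math.log(c / r)
               for r, c in zip(share(reference), share(current)))

def should_retrain(reference, current, threshold=0.2):
    """PSI above ~0.2 is a conventional 'significant drift' trigger."""
    return psi(reference, current) > threshold
```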
L5. Presentation Layer
This layer addresses FR-5: analytics, OLAP, and leadership dashboards.
| Component | Technology | Purpose |
|---|---|---|
| Web Client with AI Assistant | React / Angular, Chat UI | Interface for dialogue with the corporate LLM: data queries, SQL generation in natural language, report summarization, etc. |
| AI / API Gateway | Kong AI Gateway | Single entry point for external consumers: authentication, authorization, rate limiting, routing. OpenAPI 3.0. Centralized limit management and token quota distribution, protection against cascading failures. |
| BI & Dashboards | Apache Superset | Interactive dashboards for leadership: KPIs, trends, drill-down. Connections to Trino (Gold layer) and PostgreSQL. Auto-refresh scheduling. Automatic PDF/Excel report generation and distribution on schedule. |
| OLAP | Trino | Federated SQL queries to the Gold layer (Iceberg), PostgreSQL, S3. Multidimensional analysis, pivot tables. |
The AI assistant allows leaders to formulate analytical queries in natural language: "Show the trend of citizen appeals by region for the last quarter." The LLM translates the query into SQL, Trino executes it on the Gold layer, and Superset visualizes the result.
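The flow can be sketched as follows; `llm` and `run_sql` are hypothetical stand-ins for the sovereign LLM endpoint and a Trino client, and the SELECT-only guard is an assumed safety measure, not a documented feature of either:

```python
def answer_question(question, llm, run_sql):
    """Natural-language question -> LLM-generated SQL -> execution on the
    Gold layer. Only read-only SELECT statements are allowed through."""
    prompt = (
        "Translate the question into ANSI SQL over the Gold-layer tables.\n"
        f"Question: {question}\nSQL:"
    )
    sql = llm(prompt).strip()
    if not sql.lower().startswith("select"):
        raise ValueError("only read-only SELECT queries are permitted")
    return run_sql(sql)  # in the platform: Trino, then Superset visualizes
```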
L6. Security, Governance & Observability
This cross-cutting layer ensures compliance with non-functional requirements NFR-1, NFR-7, NFR-9.
| Domain | Components | Purpose |
|---|---|---|
| Identity & Access | Keycloak (IAM), RBAC/ABAC | Unified authentication (SSO), row- and column-level authorization, LDAP integration. |
| Secrets & Encryption | HashiCorp Vault | Key, certificate, and secret management. Key rotation. At-rest encryption. |
| Network Security | Calico / Cilium | Kubernetes network policies: micro-segmentation, zero-trust between pods. |
| Data Catalog & Lineage | DataHub | Cataloging of all data, lineage tracking from source through Bronze → Silver → Gold to dashboard. Metadata search. |
| Data Quality | Great Expectations | Automated validation at the Bronze/Silver boundary: completeness, format, ranges, business rules. Alerting on violations. |
| Observability | ELK / Loki (logs), Prometheus + Grafana (metrics), OpenTelemetry + Jaeger (traces) | Logs, metrics, traces. Unified console. SLA dashboard with availability, latency, and error metrics. |
| CI/CD & GitOps | ArgoCD (GitOps), Harbor (Container Registry), Ansible / Terraform (IaC) | Declarative configuration management. Local image registry for air-gapped environment. Reproducible deployments. |
Key Architectural Decisions
Event-driven architecture on Kafka. All inter-component interactions – including data ingestion, index updates, and notifications – are implemented through a message queue. This ensures loose coupling and a complete audit trail of all operations.
Storage-Compute Separation. The Data Lakehouse based on Apache Iceberg and S3-compatible storage allows storage and processing to be scaled independently. This is critical as the data lake grows to millions of documents per month, where analytical compute capacity must not compete with inference capacity.
Medallion Architecture. The quality tier separation ensures reproducibility (Bronze – source copy), manageability (Silver – cleansed data), and readiness for use (Gold – analytics). AI enrichment is embedded at the transitions between tiers.
Graceful Degradation. When GPUs are unavailable, AI pipeline stages are deferred, but core data processing and analytics continue to operate. The system automatically resumes AI enrichment upon resource recovery.
Caching. Session caching, caching of intermediate LLM results, and rate limiting at the gateway. Caching of semantically similar LLM queries reduces GPU load by 30–40% and decreases latency for common questions.
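A minimal sketch of semantic caching, assuming a hypothetical `embed` function and an in-memory store in place of Redis/Valkey; the 0.95 similarity threshold and the omitted TTL handling are illustrative:

```python
import math

class SemanticCache:
    """Nearest-neighbour cache over query embeddings; a hit above
    `threshold` returns the stored response and skips LLM inference."""

    def __init__(self, embed, threshold=0.95):
        self.embed, self.threshold = embed, threshold
        self._entries = []  # list of (embedding, response); TTL omitted

    def get(self, query):
        v = self.embed(query)
        best = max(self._entries, key=lambda e: self._cos(v, e[0]), default=None)
        if best is not None and self._cos(v, best[0]) >= self.threshold:
            return best[1]
        return None  # cache miss -> run the full RAG pipeline

    def put(self, query, response):
        self._entries.append((self.embed(query), response))

    @staticmethod
    def _cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return dot / (na * nb)
```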
Air-gapped CI/CD. ArgoCD + Harbor + local Nexus provide a complete deployment cycle without internet access. Updates are delivered via Data Diode and undergo security verification.
Data Lineage (DataHub). End-to-end tracking of data origin from source through all Medallion Architecture layers to a specific dashboard. Required for audit and regulatory compliance.
Architecture Decision Records
ADR-001: Apache Kafka as the Unified Event Bus
Status: Accepted
Context
The system integrates dozens of heterogeneous data sources. Components must be loosely coupled. Environment: air-gapped, on-premise.
| Option | Pros | Cons |
|---|---|---|
| Apache Kafka | Event replay, high throughput, CDC integration via Debezium, mature ecosystem | Operational complexity |
| RabbitMQ | Simplicity, low latency | No replay, weak CDC support |
Decision: Kafka
Rationale
Event replay capability is critical for the Bronze layer: if AI enrichment fails, data is not lost and the pipeline restarts from the correct point. Debezium integrates natively with Kafka through Kafka Connect.
Consequences
➕ Loose coupling of all components, full audit trail of operations
➕ CDC without load on source databases
➖ Requires a dedicated cluster and operational expertise
ADR-002: Apache Iceberg as the Data Lakehouse Format
Status: Accepted
Context
A storage format is needed for Medallion Architecture with ACID support, time-travel, and independent scaling of storage and compute. On-premise, S3-compatible storage.
| Option | Pros | Cons |
|---|---|---|
| Apache Iceberg | ACID, time-travel, hidden partitioning, support for Spark, Trino, and Flink | Younger format, more complex than Delta Lake in some scenarios |
| Delta Lake | Mature, excellent Spark integration | Tied to Databricks ecosystem, weaker with Trino |
| Apache Hudi | Good for CDC upserts | More complex to operate, smaller community |
| Parquet + Hive | Simplicity | No ACID, no time-travel |
Decision: Apache Iceberg
Rationale
Iceberg is an engine-agnostic standard: it is simultaneously read by Spark (processing), Trino (analytics), and Flink (streaming) without data copying. Delta Lake creates a dependency on Databricks, which is unacceptable for a sovereign environment.
Consequences
➕ Storage-Compute Separation: storage and compute scale independently
➕ Time-travel for audit and reproducibility
➖ Requires a metadata catalog (Hive Metastore or Nessie)
ADR-003: SeaweedFS as Object Storage
Status: Accepted
Context
An S3-compatible, open-source object storage is needed for on-premise deployment. It stores documents, PDF scans, ML model weights, and Bronze-layer data. Volume: tens of millions of files.
| Option | Pros | Cons |
|---|---|---|
| SeaweedFS | Efficient small file storage, built-in Filer, low metadata overhead, S3-compatible | Smaller community, less documentation |
| MinIO | De-facto S3 on-premise standard, large community, excellent documentation | High overhead with millions of small files; the developer has stopped maintaining the Community Edition |
| Ceph | Versatility (block + object + file) | High operational complexity |
Decision: SeaweedFS
Rationale
Fully open-source. SeaweedFS is architecturally optimized for storing large numbers of small files.
Consequences
➕ Efficiency when storing millions of documents
➖ Smaller community and ecosystem compared to MinIO
ADR-004: vLLM + llama.cpp as Two-Tier LLM Inference
Status: Accepted
Context
LLM inference is needed in an air-gapped environment. GPU resources are limited and expanded in increments. Graceful degradation is required upon GPU node loss.
| Option | Pros | Cons |
|---|---|---|
| vLLM (primary) | PagedAttention, continuous batching, high throughput, streaming | Requires GPU, high memory consumption |
| llama.cpp (fallback) | CPU inference, minimal requirements, quantization | Low speed on large models |
| Triton Inference Server | Versatility, multi-model | Configuration complexity, overkill for LLM |
| Ollama | Simplicity | Not production-ready for high load |
Decision: vLLM as primary + llama.cpp as edge/fallback
Rationale
The two-tier approach addresses NFR-8 (resilience under resource constraints): when GPU is unavailable, the system does not fail but degrades to CPU inference at reduced speed. vLLM provides production throughput via PagedAttention and continuous batching.
Consequences
➕ Graceful degradation: if GPU is unavailable, llama.cpp continues operation
➕ LoRA adapters are updated in vLLM without restart
➖ Two different runtimes require maintaining two configurations
ADR-005: Langfuse for RAG Observability
Status: Accepted
Context
A tool is needed for tracing and quality evaluation of the RAG pipeline: call chain logging, relevance assessment, prompt A/B testing. Environment: air-gapped.
| Option | Pros | Cons |
|---|---|---|
| Langfuse | Open-source, self-hosted, full tracing functionality | Fewer integrations than LangSmith |
| LangSmith | Deep LangChain integration, rich UI | Cloud service – incompatible with air-gapped |
Decision: Langfuse
Rationale
LangSmith is a cloud service, fundamentally incompatible with an air-gapped environment and data sovereignty requirements. Langfuse is deployed on-premise, has open-source code, and covers all required scenarios: tracing, evaluation, and Prompt Registry.
Consequences
➕ Full control over tracing data
➕ Prompt Registry for prompt version management
➖ Requires self-deployment and maintenance
ADR-006: Airflow and Temporal for Orchestration
Status: Accepted
Context
The system contains two classes of processes: regular batch data transformations (Bronze→Silver→Gold) and long-running event-driven processes (OCR, downstream notifications). A single orchestrator handles both scenarios poorly.
| Option | Pros | Cons |
|---|---|---|
| Airflow only | Single tool, mature, large community | Stateless: no resume-from-failure for long-running processes |
| Temporal only | Stateful, durable execution, retry from any point | Overkill for simple DAG scheduling |
| Airflow + Temporal | Each tool optimized for its task | Two tools in the stack |
| Prefect | Modern, simpler than Airflow | Smaller community, weaker for stateful workflows |
Decision: Airflow for batch pipelines + Temporal for event-driven processes
Rationale
The OCR pipeline can run for hours and must resume from the point of failure without losing progress – this is Temporal's domain. Bronze→Silver→Gold transformations are classic scheduled DAGs – Airflow's domain. Separation of concerns eliminates compromises.
Consequences
➕ Each orchestrator is optimal for its class of tasks
➕ Reliability for long-running processes (OCR, notifications)
➖ The team must be proficient in both tools
ADR-007: Qdrant as the Vector Database
Status: Accepted
Context
The RAG pipeline requires storage and search of document vector embeddings. Volume: tens of millions of chunks. On-premise, air-gapped. Metadata filtering is required.
| Option | Pros | Cons |
|---|---|---|
| Qdrant | Rust implementation (performance), rich payload filtering, on-premise, active development | Smaller ecosystem than Weaviate |
| pgvector | PostgreSQL already in stack, simplicity | Degrades at large volumes (>10M vectors) |
| Weaviate | Rich functionality, GraphQL API | Go implementation, higher memory consumption |
| Milvus | High performance | Deployment complexity, etcd dependency |
| ElasticSearch (kNN) | Already in stack | Not optimized for vector search |
Decision: Qdrant
Rationale
At 10M+ documents per month, pgvector loses performance. Qdrant is written in Rust, demonstrates the best latency/throughput ratio in benchmarks, supports metadata filtering (critical for document-level RBAC), and deploys as a simple single binary.
Consequences
➕ High performance at large volumes
➕ Metadata filtering for access control
➖ Additional component in the stack (PostgreSQL cannot be reused)
ADR-008: Medallion Architecture
Status: Accepted
Context
Data arrives from dozens of heterogeneous sources of varying quality. A storage strategy is needed that ensures reproducibility, quality manageability, and analytics readiness. AI enrichment may be unavailable (no GPU).
| Option | Pros | Cons |
|---|---|---|
| Medallion (Bronze/Silver/Gold) | Reproducibility, quality isolation, AI graceful degradation | Data stored in multiple copies |
| Data Vault | Auditability, flexibility | High modeling complexity |
| Kimball DWH | Mature approach, familiar to analysts | Rigid schema, poor fit for unstructured data |
| Single storage | Simplicity | No quality isolation, difficult to roll back |
Decision: Medallion Architecture
Rationale
Bronze as an immutable source copy is insurance against transformation and AI enrichment errors. When GPU is unavailable, documents accumulate in Bronze/Silver, and AI enrichment is executed upon resource recovery. The Gold layer is always available for analytics on previously prepared data.
Consequences
➕ Reproducibility: any stage can be restarted
➕ Graceful degradation: analytics does not depend on AI availability
➖ Storing data in three copies increases disk space requirements
ADR-009: NeMo Guardrails + Guardrails AI as Two-Tier LLM Protection
Status: Accepted
Context
The government system handles citizens' personal data. The LLM may generate toxic content, hallucinations, or personal data leaks. Protection is needed at all stages of the request lifecycle.
| Option | Pros | Cons |
|---|---|---|
| NeMo Guardrails | Colang scripts for dialog flow, input/output/retrieval/execution rails, from NVIDIA | Ties to NVIDIA ecosystem |
| Guardrails AI | Structured output validation, Pydantic schemas, retry on failure | Does not cover dialog flow |
| Prompt engineering only | Simplicity | Unreliable, easily bypassed |
| Custom filters | Full control | High development and maintenance cost |
Decision: NeMo Guardrails (dialog/flow protection) + Guardrails AI (structural validation)
Rationale
The tools cover different aspects: NeMo manages dialog logic and blocks undesirable scenarios at the flow level; Guardrails AI validates the structure and content of output against schemas. Together they form a layered defense critical for a government system handling personal data.
Consequences
➕ Protection at all stages: input → retrieval → prompt → output
➕ Full auditability of all blocking decisions
➖ Latency overhead at each stage (~100–200ms total)
➖ Requires maintaining Colang scripts when business logic changes