My Self-Hosted AI Stack: Infrastructure Deep Dive (Part 2)

This is Part 2 of a multi-part series on my self-hosted AI stack. Part 1 covers the application architecture and a layer-by-layer walkthrough of every service. If you haven’t read it yet, I’d recommend starting there — it gives the context that makes Part 2 make sense.

Part 2 goes deeper into the infrastructure side of things: how traffic is designed to route, how containers are sized and constrained, and what databases and data stores are running under the hood.

Specifically, this post covers:

Resource limits and sizing: CPU and memory constraints for every container, and the reasoning behind them
Infrastructure and routing: the dual-network design, Traefik reverse proxy configuration, Cloudflare Tunnels, and how external traffic is isolated from internal service communication
Data layer: every database and persistent store in the stack, including PostgreSQL, ClickHouse, Redis, MinIO, Qdrant, MongoDB, and SQLite
Backups: the sidecar backup strategy for Open WebUI, Qdrant, PostgreSQL, and n8n, with data landing on a NAS over SMB

Self-Hosted AI Stack: Overview

In this part I’ll walk through the network topology, container sizing, and every database in this self-hosted AI stack.

Resource Limits & Sizing

If you read Part 1, you will have seen that a lot of components make up this AI stack. Every container in the stack has explicit CPU and memory limits and reservations set, which prevents a single runaway container or model from starving other services of resources. These limits are based on my production deployment: a host with an NVIDIA A10 (24 GB VRAM), presented to a VM allocated 24 CPU cores, 96 GB of RAM, and 500 GB of NVMe storage. Based on that profile, I have allocated container resources as follows:

Container	Role	Memory Limit	CPU Limit (cores)
ollama	LLM inference (GPU)	24G	12.0
comfyui	Image generation (GPU)	16G	6.0
whishper	Speech-to-text (GPU)	4G	4.0
mongo	DB for Whishper only	1G	1.0
ollama-exporter	Metrics scraper	128M	0.25
postgres	Shared relational DB	4G	4.0
qdrant	Vector DB for RAG	6G	4.0
tika	Document parsing (JVM)	2G	2.0
searxng	Metasearch engine	1G	1.0
qdrant-backup	Periodic backup sidecar	256M	0.5
langfuse-web	LLM observability UI	2G	2.0
langfuse-worker	Background trace processing	2G	2.0
clickhouse	Analytics DB for Langfuse	6G	4.0
redis	Job queue for Langfuse	512M	0.5
minio	S3-compatible object store	1G	1.0
minio-init	One-shot bucket creation	–	–
jaeger	Distributed tracing	4G	2.0
otel-collector	Telemetry aggregation	1G	1.0
nvidia-gpu-exporter	GPU metrics	256M	0.5
prometheus	Metrics TSDB	2G	2.0
grafana	Dashboards	2G	2.0
smarterrouter	LLM routing	2G	2.0
open-webui	Chat UI	2G	2.0
open-webui-backup	Periodic SQLite backup	256M	0.25
pipelines	Filter/plugin framework	2G	2.0
open-terminal	Terminal server	1G	1.0
n8n	Workflow automation	4G	4.0

As the table shows, the resource requirements are not trivial: the memory limits alone add up to roughly 90 GB.
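To put numbers on that, here is a small sketch that totals the limits from the table. It treats G and M as binary units (an assumption about how Docker interprets them) and skips minio-init, which has no limits set.

```python
# Limits copied from the table above: (container, memory limit, CPU limit).
LIMITS = [
    ("ollama", "24G", 12.0), ("comfyui", "16G", 6.0), ("whishper", "4G", 4.0),
    ("mongo", "1G", 1.0), ("ollama-exporter", "128M", 0.25),
    ("postgres", "4G", 4.0), ("qdrant", "6G", 4.0), ("tika", "2G", 2.0),
    ("searxng", "1G", 1.0), ("qdrant-backup", "256M", 0.5),
    ("langfuse-web", "2G", 2.0), ("langfuse-worker", "2G", 2.0),
    ("clickhouse", "6G", 4.0), ("redis", "512M", 0.5), ("minio", "1G", 1.0),
    ("jaeger", "4G", 2.0), ("otel-collector", "1G", 1.0),
    ("nvidia-gpu-exporter", "256M", 0.5), ("prometheus", "2G", 2.0),
    ("grafana", "2G", 2.0), ("smarterrouter", "2G", 2.0),
    ("open-webui", "2G", 2.0), ("open-webui-backup", "256M", 0.25),
    ("pipelines", "2G", 2.0), ("open-terminal", "1G", 1.0), ("n8n", "4G", 4.0),
]


def to_gib(limit: str) -> float:
    """Convert a size such as '24G' or '128M' to GiB (binary units assumed)."""
    units = {"G": 1.0, "M": 1.0 / 1024}
    return float(limit[:-1]) * units[limit[-1]]


total_mem = sum(to_gib(mem) for _, mem, _ in LIMITS)
total_cpu = sum(cpu for _, _, cpu in LIMITS)

print(f"memory limits: {total_mem:.3f} GiB")  # memory limits: 90.375 GiB
print(f"cpu limits:    {total_cpu:.2f}")      # cpu limits:    63.00
```

The memory total fits inside the 96 GB VM. The CPU total is well above 24 cores, which is fine as long as the CPU limits are read as per-container ceilings rather than guaranteed shares; presumably not every container peaks at the same time.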

I would recommend the following VM configuration as a minimum, with some tweaks to the Docker and Ansible configuration to accommodate the reduced footprint. I would also highly recommend putting the data disk on NVMe.

Resource	Allocation
CPU	16
Memory (GB)	18
GPU vRAM (GB)	8
OS Disk (GB)	50
Data Disk (GB)	200

Infrastructure & Routing

All of the containers sit behind Traefik as the reverse proxy, with TLS certificates issued via the Cloudflare DNS challenge. Some public-facing services are exposed through Cloudflare Tunnels with Cloudflare Access policies enforcing authentication. Internal services that don’t need to be externally accessible stay isolated on the internal network with no Traefik labels and no external routes.

Network Isolation

The stack uses a dual-network design. An external traefik network (Blue Zone) connects the services that need to be reachable via Traefik (Open WebUI, Grafana, Jaeger, SmarterRouter, etc.), and a separate aistack-internal bridge network (Green Zone) handles all inter-service communication. The internal network is configured as internal-only with a dedicated subnet (172.30.0.0/24), which means containers on it have no outbound internet access by default; they can only talk to each other.

Services that need both external routing and internal communication (like Open WebUI, which needs Traefik for HTTPS but also needs to reach Ollama, Qdrant, and Tika) sit on both networks. Purely backend services — PostgreSQL, Redis, ClickHouse, Qdrant, the OTel Collector — only get the internal network, with traefik.enable=false on their labels so they’re never accidentally exposed. This limits the blast radius if any single container is compromised.
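The rule described above is easy to check mechanically. The sketch below is illustrative only: the SERVICES dictionary is a hypothetical, trimmed-down stand-in for the real compose file, and exposure_problems is a made-up helper name. It flags any backend-only service that has ended up on the traefik network or is missing traefik.enable=false.

```python
EDGE_NET = "traefik"
INTERNAL_NET = "aistack-internal"

# Services that should never be routable from outside (from the text above).
BACKEND_ONLY = {"postgres", "redis", "clickhouse", "qdrant", "otel-collector"}

# Hypothetical, simplified view of the compose services.
SERVICES = {
    "open-webui": {
        "networks": {EDGE_NET, INTERNAL_NET},
        "labels": {"traefik.enable": "true"},
    },
    "postgres": {
        "networks": {INTERNAL_NET},
        "labels": {"traefik.enable": "false"},
    },
    "qdrant": {
        "networks": {INTERNAL_NET},
        "labels": {"traefik.enable": "false"},
    },
}


def exposure_problems(services: dict) -> list:
    """Return a description of every backend service that could be exposed."""
    problems = []
    for name in sorted(BACKEND_ONLY.intersection(services)):
        svc = services[name]
        if EDGE_NET in svc["networks"]:
            problems.append(f"{name}: attached to the {EDGE_NET} network")
        if svc["labels"].get("traefik.enable") != "false":
            problems.append(f"{name}: traefik.enable is not 'false'")
    return problems


print(exposure_problems(SERVICES))  # []
```

A check like this could run in CI against the rendered compose configuration, so an accidental network or label change is caught before deployment.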

Domain Portability

The deployment is designed so that the URL can be adapted as needed. The compose file uses the pattern ${DOMAIN:-jameskilby.cloud} rather than a hardcoded hostname. Traefik labels, environment variables, and webhook URLs all resolve through a single DOMAIN variable in the .env file. If you fork the repo and deploy the stack under your own domain, you change one variable and every service picks it up: chat.yourdomain.com, grafana.yourdomain.com, langfuse.yourdomain.com, and so on. The :-jameskilby.cloud default means the compose file still works if the variable is unset, which keeps docker compose config clean for local testing.
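For anyone unfamiliar with the ${VAR:-default} form, the sketch below mimics how it resolves: the default is used when the variable is unset or empty. This is a simplified model written for illustration, not Compose's actual implementation, and the Host rule is an example label value rather than one copied from the repo.

```python
import re

# Matches ${NAME:-default}; other interpolation forms are ignored here.
_PATTERN = re.compile(
    r"\$\{(?P<name>[A-Za-z_][A-Za-z0-9_]*):-(?P<default>[^}]*)\}"
)


def expand(template: str, env: dict) -> str:
    """Replace ${NAME:-default} with env[NAME], or default if unset/empty."""
    def repl(match):
        value = env.get(match.group("name"))
        return value if value else match.group("default")
    return _PATTERN.sub(repl, template)


rule = "Host(`chat.${DOMAIN:-jameskilby.cloud}`)"

print(expand(rule, {}))                         # Host(`chat.jameskilby.cloud`)
print(expand(rule, {"DOMAIN": "example.org"}))  # Host(`chat.example.org`)
```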

A .env.example file is included in the repo with placeholder values and generation instructions for every required secret. Copy it to .env, fill in your values, and the stack is ready to deploy.

Persistent data that needs to survive host failures (Prometheus TSDB, Jaeger traces) is stored on an SMB share. Data that needs fast local I/O (Ollama models, Open WebUI state, databases) uses local Docker volumes.

Data Layer: Databases in the Stack

The stack uses a number of different data stores, each chosen to match the access pattern of the service it supports. They fall into three persistence tiers: local Docker volumes for latency-sensitive transactional workloads, SMB-mounted NAS volumes for append-heavy observability data, and an S3-compatible object store for blob storage.

PostgreSQL 16 – The Shared Relational Core

A single PostgreSQL 16.13 (Alpine) instance hosts three databases: langfuse for LLM observability, n8n for workflow definitions and execution history, and grafana for dashboard and user config. The Ansible playbook creates all three databases automatically at deploy time under the aistack user. Data lives on a local Docker volume (postgres_data) for write performance, and Postgres handles its own WAL-based crash recovery.

ClickHouse 24.12 – Columnar Analytics for Langfuse

Langfuse generates high volumes of trace and event data that need fast analytical queries: aggregations, filtering by time range, counting tokens across thousands of requests. ClickHouse’s columnar storage engine is purpose-built for this. It runs alongside Langfuse’s other dependencies on a local Docker volume (clickhouse_data).

Redis 7 – Job Queue & Cache

Redis serves as Langfuse’s job queue and caching layer. It’s configured with RDB snapshotting (--save 60 1, which writes a snapshot once 60 seconds have passed and at least one key has changed) so queued jobs survive a container restart. It runs on a local Docker volume (redis_data) on the internal network only.
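The save rule is worth spelling out, because it determines how much queued work could be lost. The function below is a simplified model of the "seconds, changes" check, written for illustration; it is not Redis code, and snapshot_due is a made-up name.

```python
def snapshot_due(seconds_since_last_save: float,
                 changes_since_last_save: int,
                 rules=((60, 1),)) -> bool:
    """Return True if any (seconds, changes) rule is satisfied.

    The default rule mirrors --save 60 1: at least 60 seconds elapsed
    and at least 1 key changed since the last snapshot.
    """
    return any(
        seconds_since_last_save >= seconds
        and changes_since_last_save >= changes
        for seconds, changes in rules
    )


print(snapshot_due(61, 1))   # True: enough time and at least one change
print(snapshot_due(30, 5))   # False: changes exist, but under 60 seconds
print(snapshot_due(120, 0))  # False: nothing changed, nothing to save
```

Under this model, a hard crash could lose up to roughly a minute of queue activity, which seems an acceptable trade-off for an observability job queue.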

MinIO – S3-Compatible Object Storage

MinIO provides S3-compatible blob storage for Langfuse’s event uploads. A one-shot minio-init container creates the langfuse bucket on first startup using mc mb. Data lives on a local Docker volume (minio_data). This avoids any dependency on external cloud storage while giving Langfuse the S3 API it expects.

Qdrant v1.17 – Vector Database for RAG

Qdrant powers Open WebUI’s retrieval-augmented generation pipeline, storing document embeddings and handling similarity search. It exposes HTTP on port 6333 and gRPC on 6334, both on the internal network only. Apache Tika handles document parsing upstream, and the resulting vectors land in Qdrant’s local storage volume (qdrant_storage).

MongoDB 7 – Whishper Transcription Store

Whishper (the speech-to-text service) uses MongoDB to store transcription metadata and results. It runs on a local Docker volume (mongo_data) with a straightforward connection string: mongodb://root:example@mongo:27017/whishper?authSource=admin. The credentials are currently hardcoded in the compose file—a candidate for moving into .env in a future cleanup pass.
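As a sketch of what that cleanup could look like on the application side, the helper below builds the same connection string from environment variables instead of literals. The variable names MONGO_USER, MONGO_PASSWORD, and MONGO_HOST are hypothetical and do not come from the repo; the credentials are percent-encoded so that characters such as @ or / cannot break the URI.

```python
import os
from urllib.parse import quote_plus


def mongo_uri(env=os.environ) -> str:
    """Build the Whishper MongoDB URI from environment variables.

    MONGO_USER / MONGO_PASSWORD / MONGO_HOST are assumed names,
    chosen for this example only.
    """
    user = quote_plus(env["MONGO_USER"])
    password = quote_plus(env["MONGO_PASSWORD"])
    host = env.get("MONGO_HOST", "mongo")
    return f"mongodb://{user}:{password}@{host}:27017/whishper?authSource=admin"


example_env = {"MONGO_USER": "root", "MONGO_PASSWORD": "p@ss/word"}
print(mongo_uri(example_env))
# mongodb://root:p%40ss%2Fword@mongo:27017/whishper?authSource=admin
```

Raising a KeyError when a credential is missing is deliberate here: failing at startup is easier to diagnose than silently falling back to a default password.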

SQLite – Embedded in Open WebUI & SmarterRouter

Two services use SQLite as an embedded database. Open WebUI stores everything (users, conversations, RAG collections, model configs) in webui.db on a local Docker volume. SmarterRouter keeps its routing config and model performance cache in router.db on a separate volume. Both need POSIX file-locking, which rules out running them directly on SMB. The backup sidecar (covered in the backups section below) handles off-host durability for webui.db.

Prometheus v3.10 & Jaeger/Badger – Observability Stores on NAS

Unlike the transactional databases above, the two observability stores mount directly to the NAS via CIFS. Prometheus writes its time-series data (TSDB) to an SMB-backed volume with a 30-day retention policy (--storage.tsdb.retention.time=30d). Jaeger uses Badger as its embedded key-value store for trace data, also writing to a CIFS volume. The rationale: metrics and traces are append-heavy, sequential workloads that tolerate network storage latency, while the NAS provides built-in RAID protection and snapshot capabilities without needing a separate backup strategy.

Persistence Summary

Database	Version	Used By	Storage
PostgreSQL	16.13	Langfuse, n8n, Grafana	Local volume
ClickHouse	24.12	Langfuse (analytics)	Local volume
Redis	7 Alpine	Langfuse (queue/cache)	Local volume (RDB snapshots)
MinIO	latest	Langfuse (event uploads)	Local volume
Qdrant	1.17.0	Open WebUI (RAG vectors)	Local volume
MongoDB	7	Whishper (transcriptions)	Local volume
SQLite	embedded	Open WebUI, SmarterRouter	Local volume + NAS backup
Prometheus	3.10.0	Metrics (Grafana)	CIFS/NAS
Jaeger/Badger	2.15.0	Traces (Grafana)	CIFS/NAS

This split—local volumes for transactional databases, NAS for observability—keeps write-sensitive workloads fast while ensuring metrics and traces survive host failures without a separate backup pipeline. The only exception is Open WebUI’s SQLite database, which gets its own backup sidecar because it’s both write-sensitive and too important to leave without off-host copies.

Backups

The self-hosted AI stack contains data in multiple locations. Some of this can be recreated or re-downloaded easily, and some of it needs protecting.

Open WebUI Backups

Open WebUI is the front end to most of the AI stack, and it can end up storing a lot of data from various sources. This data is mostly stored in a SQLite database. Because SQLite relies on POSIX file locking, it cannot safely run on the SMB share, so it runs on local NVMe storage attached to the VM. To enable backups of this data, I have a second container that runs alongside Open WebUI and shares access to the same volumes. Every 6 hours it performs a crash-consistent database backup and writes it out to the persistent SMB share. It also enforces retention, keeping only the last 7 copies of the database. This gives me the ability to restore:

All user config
Every conversation and the associated message history
Model configurations
Knowledge base metadata and document collections

It also creates a tarball of the uploads directory for any documents added for RAG purposes, retaining the last 3 copies on the NAS.
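One way to take a consistent copy of a live SQLite file is the online backup API, which Python exposes as Connection.backup. The sketch below shows that technique under the assumption that the copy is written to a separate destination path; it is an illustration of the approach, not the sidecar's actual script, and backup_sqlite is a made-up name.

```python
import sqlite3
from pathlib import Path


def backup_sqlite(live_db: Path, dest: Path) -> None:
    """Copy a live SQLite database to dest using the online backup API.

    Unlike a plain file copy, this produces a consistent snapshot even
    if another process writes to the database during the copy.
    """
    src = sqlite3.connect(live_db)
    dst = sqlite3.connect(dest)
    try:
        src.backup(dst)
    finally:
        dst.close()
        src.close()
```

A call such as backup_sqlite(Path("webui.db"), Path("webui-backup.db")) would then be followed by moving the copy to the NAS; the file names in that call are placeholders.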

Qdrant Backups

With only the Open WebUI backup restored, RAG searches will silently fail because the embeddings generated from the documents will not be available. To fix this, I need a backup and restore process for the Qdrant data as well. I handle this in a similar way to the Open WebUI backups.

A sidecar container performs a periodic snapshot. It iterates over all collections, snapshots each one, bundles the snapshots as a tarball, and writes the tarball out to the same SMB-backed storage, retaining the last 7 copies.
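The "iterate and snapshot" step maps onto two Qdrant REST endpoints: listing collections and creating a snapshot per collection. The sketch below shows that loop using only the standard library. The function names are made up for this example, the base URL is assumed from the container name and port mentioned earlier, and downloading, bundling, and retention are left out.

```python
import json
import urllib.request


def http_json(method: str, url: str) -> dict:
    """Send a request with no body and decode the JSON reply."""
    req = urllib.request.Request(url, method=method)
    with urllib.request.urlopen(req, timeout=300) as resp:
        return json.load(resp)


def snapshot_all(base_url: str, call=http_json) -> dict:
    """Create one snapshot per collection.

    Returns a mapping of collection name to the snapshot name that
    Qdrant reports. `call` is injectable so the loop can be tested
    without a running server.
    """
    listing = call("GET", f"{base_url}/collections")
    created = {}
    for collection in listing["result"]["collections"]:
        name = collection["name"]
        reply = call("POST", f"{base_url}/collections/{name}/snapshots")
        created[name] = reply["result"]["name"]
    return created
```

Against the stack described here, the call would look like snapshot_all("http://qdrant:6333"), run from a container on the internal network.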

PostgreSQL Backups

I have also deployed a backup sidecar container to handle PostgreSQL following the same principles. It runs pg_dumpall every 6 hours using the same postgres:16.13-alpine image as the server — keeping the pg_dump version in lockstep prevents any version mismatch on restore. Each run produces a gzip-compressed SQL dump (postgres-YYYYMMDD-HHMMSS.sql.gz) covering all three databases — langfuse, n8n, and grafana — written to the SMB-backed NAS volume. The last 7 dumps are retained. The sidecar only starts once PostgreSQL passes its healthcheck, so it never attempts a dump against an uninitialised server.
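The naming and retention scheme can be sketched in a few lines. Because the timestamp in the file name sorts lexicographically in chronological order, "keep the last 7" reduces to sorting the names and deleting everything except the tail. The helper names below are hypothetical and the real sidecar may do this differently.

```python
from datetime import datetime
from pathlib import Path

KEEP = 7  # number of dumps to retain, as described above


def dump_name(now: datetime) -> str:
    """Build a name in the postgres-YYYYMMDD-HHMMSS.sql.gz format."""
    return now.strftime("postgres-%Y%m%d-%H%M%S.sql.gz")


def prune(backup_dir: Path, keep: int = KEEP) -> list:
    """Delete all but the newest `keep` dumps; return the deleted names."""
    dumps = sorted(backup_dir.glob("postgres-*.sql.gz"))
    stale = dumps[:-keep] if keep > 0 else dumps
    for path in stale:
        path.unlink()
    return [path.name for path in stale]


print(dump_name(datetime(2025, 1, 2, 3, 4, 5)))
# postgres-20250102-030405.sql.gz
```

With a dump every 6 hours, 7 retained copies cover a little under two days of history.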

n8n Backups

The n8n container also holds a lot of data. A dedicated backup sidecar uses the n8n REST API (authenticated via N8N_BACKUP_API_KEY) to export data every 6 hours, committing changes to a local git repository for point-in-time history. To push changes to a remote GitHub repository, also define N8N_BACKUP_GIT_REPO and N8N_BACKUP_GIT_TOKEN in the .env file, where N8N_BACKUP_GIT_REPO is the GitHub repo URL (e.g. https://github.com/jameskilbynet/n8n-backups.git). The export covers:

Workflows – Full workflow definitions including nodes, connections, and settings, exported as individual files for easy diffing and selective restore.
Credentials – Metadata only (name, type, ID). The n8n API does not expose secret values — credential secrets remain encrypted in PostgreSQL and are never written to the backup.
Tags – All workflow tags, exported as a single JSON file.

If the Git credentials are not set, it will back up to the SMB share like the other containers.
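Exporting one file per workflow is what makes git diffs useful. A minimal sketch of that step follows, assuming the workflows have already been fetched from the API as a list of dictionaries that each carry an id field. The function name and the choice of the id as file name are assumptions made for this example.

```python
import json
from pathlib import Path


def write_workflows(workflows: list, out_dir: Path) -> list:
    """Write each workflow to its own JSON file; return the file names.

    Sorted keys and fixed indentation keep the output stable between
    runs, so a git diff only shows real changes to a workflow.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for workflow in workflows:
        path = out_dir / f"{workflow['id']}.json"
        text = json.dumps(workflow, indent=2, sort_keys=True) + "\n"
        path.write_text(text, encoding="utf-8")
        written.append(path.name)
    return sorted(written)
```

After a run, committing the output directory records exactly which workflows changed since the previous export.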

Continue Reading

← Part 1: Architecture Overview – The big picture, layer-by-layer walkthrough from inference to observability.
Part 3: Operations & Maintenance → – Ansible deployment, monitoring with Uptime Kuma, and the Open WebUI backup strategy — coming soon
