
Frank, the Talos Cluster: Overview & Roadmap
This is the overview post for the Frank, the Talos Cluster series — a tutorial-style walkthrough of building an AI-hybrid Kubernetes homelab from scratch.
This post is a living document: it gets updated as new technologies and capabilities are added to the cluster.
Roadmap
1 Hardware — 7 Nodes, 3 Zones
3x Intel NUC (Core zone)
1x GPU tower — RTX 5070
1x Legacy desktop
2x Raspberry Pi 4
2 OS & Bootstrap
Talos Linux (immutable)
Sidero Omni (lifecycle)
Declarative machine config
Rolling upgrades
3 Networking — Cilium CNI
eBPF kube-proxy replacement
L2 LoadBalancer (ARP)
Hubble observability
Network policy
4 Storage — Longhorn
Distributed 3-replica block storage
GPU-local StorageClass
2x 4TB SSD on gpu-1
iSCSI via Talos extensions
5 GPU Compute
NVIDIA GPU Operator (RTX 5070)
Intel DRA driver (Arc iGPU)
Dynamic Resource Allocation
CDI device injection
6 GitOps — ArgoCD
App-of-Apps pattern
Multi-source Applications
Self-healing + drift detection
Zero-downtime adoption
7 Fun Stuff
OpenRGB via USB HID
DaemonSet + ConfigMap
Custom container build (GitHub Actions)
IT5701 firmware lock (in progress)
8 Observability
VictoriaMetrics (metrics + alerts)
VictoriaLogs (log aggregation)
Grafana dashboards
Fluent Bit log shipping
Blackbox Exporter (endpoint probes)
Pushgateway (heartbeat ingestion)
Telegram alerting
Health Bridge (GitHub lifecycle)
9 Backup
Longhorn → Cloudflare R2
Daily + weekly recurring jobs
SOPS-encrypted credentials
NAS target (pending Longhorn 1.13)
10 Secrets Management
Infisical (self-hosted vault)
External Secrets Operator
ClusterSecretStore
ExternalSecret → K8s Secret
11 Local Inference
Ollama (gpu-1, RTX 5070)
LiteLLM (unified gateway)
OpenRouter (free cloud models)
OpenAI-compatible API
12 Agentic Control Plane
Sympozium (K8s-native agents)
n8n (per-user workflow automation)
VK Remote (self-hosted kanban API)
ElectricSQL real-time sync
13 Unified Auth
Authentik IdP (OIDC + proxy)
SSO for ArgoCD, Grafana, Infisical
Forward auth for Longhorn, Hubble, Sympozium
OIDC-backed kubectl via apiserver
14 Multi-tenancy
vCluster (K8s-in-K8s)
Disposable experiment clusters
Resource quotas + network policies
GitOps-provisioned via ArgoCD
15 AI Agent Orchestrator
Paperclip (org-chart agents)
Virtual companies + budgets
Delegation chains + governance
LiteLLM gateway integration
16 Media Generation
ComfyUI (diffusion models)
LTX-2.3 video, SDXL image, Stable Audio
GPU Switcher dashboard (Go)
Time-sharing via replica scaling
17 Public Edge — Hop
Hetzner CX23 (single-node Talos)
Headscale mesh + Tailscale
Caddy reverse proxy + TLS
Split-DNS (MagicDNS)
18 Persistent Agent
Kali Linux workstation
Always-on Claude Code agent
SSH remote access
50Gi persistent /root
19 Progressive Delivery
Argo Rollouts controller
LiteLLM canary (Cilium traffic split)
Sympozium blue-green
VictoriaMetrics analysis gates
21 Secure Agent Pod
Hardened non-root Kali container
Cilium egress allowlist
VibeKanban agent orchestration
VK Relay (WebSocket tunnel to browser)
24 In-Cluster Ingress
Traefik v3 on raspi edge nodes
Wildcard TLS (*.cluster.derio.net)
Authentik forward-auth (12 services)
Homepage dashboard
25 CI/CD Platform
Gitea (GitHub mirror forge)
Tekton Pipelines + Triggers
Zot OCI registry (cosign signed)
Webhook-driven CI on pc-1
26 Agent Images and the VK-Local Sidecar
agent-images repo (shared base + children)
Matrix CI with cross-repo
repository_dispatch
VK-local sidecar (shared /home/claude PVC)
Lockstep bumper PR in frank— Virtual Machines — upcoming
KubeVirt (VMs as pods)
CDI disk image import
KubeVirt Manager UI
Longhorn-backed DataVolumes
Technology → Capability Map
| Technology | Capabilities Unlocked |
|---|---|
| Talos Linux + Omni | Immutable OS, declarative machine config, secure bootstrap |
| Cilium (eBPF) | Kube-proxy replacement, L2 LoadBalancer, Hubble UI (192.168.55.202) |
| Longhorn | Distributed block storage, GPU-local StorageClass, 3-replica HA, UI (192.168.55.201) |
| ArgoCD | GitOps, App-of-Apps, self-healing, drift detection |
| NVIDIA GPU Operator | GPU scheduling, AI/ML workloads, container toolkit |
| Intel GPU DRA Driver | iGPU sharing via DRA, namespace-scoped GPU access |
| OpenRGB | LED control from K8s (just for fun) |
| VictoriaMetrics + Grafana | Cluster-wide metrics, alerting, dashboards, Grafana UI (192.168.55.203) |
| VictoriaLogs + Fluent Bit | Centralised log aggregation and querying |
| Longhorn Backup + Cloudflare R2 | PVC backup/restore, daily + weekly schedules, offsite storage |
| Infisical + External Secrets Operator | Secret management with audit trail, ExternalSecret → K8s Secret sync (192.168.55.204) |
| Ollama | Local LLM inference on gpu-1’s RTX 5070 (qwen3.5:9b, deepseek-coder:6.7b) |
| LiteLLM | Unified OpenAI-compatible gateway, virtual keys, spend tracking (192.168.55.206) |
| OpenRouter | Free-tier cloud model aggregation (DeepSeek R1, Gemini Flash, Llama 3.3 70B) |
| Sympozium | Kubernetes-native agentic control plane — agent=Pod, policy=CRD, execution=Job (192.168.55.207) |
| cert-manager | Automated TLS certificate lifecycle for webhooks and internal services |
| Authentik | Unified SSO — OIDC for ArgoCD, Grafana, Infisical; forward-auth proxy for Longhorn, Hubble, Sympozium (192.168.55.211) |
| vCluster | Virtual K8s clusters inside Frank — disposable sandboxes with own API server, resource quotas, network policies |
| Paperclip | AI agent orchestrator — virtual companies with org charts, budgets, and delegation chains; complements Sympozium (192.168.55.212) |
| ComfyUI | Diffusion model serving — video (LTX-2.3), image (SDXL), audio (Stable Audio), node-based workflow editor (192.168.55.213) |
| GPU Switcher | Custom Go dashboard for GPU time-sharing — one-click switching between Ollama and ComfyUI (192.168.55.214) |
| Hop (Hetzner Edge) | Public-facing single-node Talos cluster — Headscale mesh, Caddy reverse proxy, blog hosting, split-DNS |
| Headscale + Tailscale | WireGuard mesh networking — remote homelab access from any device, MagicDNS for split-DNS |
| Caddy | Automatic TLS (Cloudflare DNS challenge), public/mesh routing, path rewriting |
| Secure Agent Pod | Hardened non-root coding agent workstation — Cilium egress, dropped capabilities, VibeKanban orchestration, SSH (192.168.55.215) + UI (192.168.55.218) |
| Argo Rollouts | Progressive delivery — canary (Cilium traffic splitting + VictoriaMetrics analysis) and blue-green (preview + atomic cutover) |
| n8n | Per-user workflow automation — 400+ integrations, visual node editor, webhook triggers, Authentik forward-auth (192.168.55.216) |
| Blackbox Exporter + Pushgateway | Feature-level health monitoring — HTTP endpoint probes, cron heartbeat ingestion, Grafana alerting to Telegram |
| Health Bridge | Grafana alert → GitHub Project lifecycle state bridge — automatic degraded/dead/healthy transitions, issue comments, bug issue creation |
| Traefik (in-cluster) | In-cluster ingress controller, wildcard TLS (*.cluster.derio.net), ACME via Cloudflare DNS-01, Authentik forward-auth for 12 services (192.168.55.220) |
| VK Remote (self-hosted) | Self-hosted VibeKanban kanban API — PostgreSQL 16, ElectricSQL real-time sync, Rust/Axum server, local JWT auth, Authentik SSO ingress (vk.cluster.derio.net) |
| VK Relay | WebSocket relay sidecar tunneling browser API calls to local VK agent server via yamux multiplexing, SPAKE2 pairing, Ed25519 request signing |
| gethomepage.dev | Cluster dashboard at master.cluster.derio.net — service catalog with HTTP health indicators, custom bookmarks |
| Gitea | Self-hosted git forge with GitHub pull-mirror, Authentik OIDC SSO (192.168.55.209) |
| Tekton | K8s-native CI/CD pipelines — webhook-driven clone, test, build, sign, report status on pc-1 |
| Zot | OCI container/artifact registry with cert-manager TLS and cosign image signing (192.168.55.210) |
| agent-images | Shared base image + per-pod children repo — agent-base toolchain + secure-agent-kali / vk-local children, matrix CI, cross-repo repository_dispatch, lockstep bumper PR |
Cluster State
| Node | Zone | Role | Hardware |
|---|---|---|---|
| mini-1/2/3 | Core (B) | Control-plane + Worker | Intel Ultra 5, 64GB RAM, 1TB NVMe, Arc iGPU |
| gpu-1 | AI Compute (C) | Worker | i9, 128GB RAM, RTX 5070, 2x4TB SSD |
| pc-1 | Edge (D) | Worker | Legacy desktop, 64GB SSD + 3x HDD |
| raspi-1/2 | Edge (D) | Worker | Raspberry Pi 4, 32GB SD |
Series Index
- Introduction — Why Build a Kubernetes Homelab?
- Building the Foundation — Talos, Nodes, and Cilium
- Persistent Storage with Longhorn
- GPU Compute — NVIDIA and Intel
- GitOps Everything with ArgoCD
- Fun Stuff — Controlling Case LEDs from Kubernetes
- Observability — VictoriaMetrics, Grafana, and Fluent Bit
- Backup — Longhorn to Cloudflare R2
- Secrets Management — Infisical + External Secrets Operator
- Local Inference — Ollama, LiteLLM, and OpenRouter
- Agentic Control Plane — Sympozium
- GPU Containers on Talos — The Validation Fix
- Unified Auth — Authentik SSO for the Entire Cluster
- Multi-tenancy — Disposable Kubernetes Clusters with vCluster
- Paperclip — An AI Agent Orchestrator on Frank
- Media Generation — ComfyUI and GPU Time-Sharing
- Hopping Through the Portal — A Public Edge Cluster
- Persistent Agent — A Kali Workstation on Kubernetes
- Progressive Delivery with Argo Rollouts
- Workflow Automation with n8n
- Secure Agent Pod — Hardening an AI Coding Workstation
- Health Monitoring — Feature Probes, Heartbeats, and Telegram Alerts
- Health Bridge — Closing the Loop from Grafana Alerts to GitHub Issues
- In-Cluster Ingress — Traefik, Wildcard TLS, and a Homepage Dashboard
- VK Relay — Tunneling the Browser to a Local Agent Server
- VK Remote — Self-Hosting the Kanban Backend Before the Cloud Dies
- CI/CD Platform — Gitea, Tekton, Zot, and Cosign
- Agent Images and the VK-Local Sidecar — Unbaking VibeKanban
- Virtual Machines with KubeVirt (planned)
Operating on Frank — Series Index
Companion series with day-to-day commands, health checks, and debugging guides.
- Operating on Cluster & Nodes
- Operating on Storage & Backups
- Operating on GitOps
- Operating on GPU Compute
- Operating on Observability
- Operating on Secrets
- Operating on Local Inference
- Operating on Authentication
- Operating on Multi-tenancy
- Operating on Media Generation
- Operating on Hop — Single-Node Talos Edge Cluster
- Operating on Progressive Delivery
- Operating on Workflow Automation
- Operating on Secure Agent Pod
- Operating on Health Monitoring
- Operating on Health Bridge
- Operating on In-Cluster Ingress
- Operating on Paperclip
- Git Credentials Without a Shell
- Operating on VK Relay
- Operating on VK Remote
- Operating on CI/CD Platform
- Operating on ArgoCD Drift