Frank, the Talos Cluster: Overview & Roadmap

This is the overview post for the Frank, the Talos Cluster series — a tutorial-style walkthrough of building an AI-hybrid Kubernetes homelab from scratch.

This post is a living document: it gets updated as new technologies and capabilities are added to the cluster.

Roadmap

1 Hardware — 7 Nodes, 3 Zones
  • 3x Intel NUC (Core zone)
  • 1x GPU tower (RTX 5070)
  • 1x legacy desktop
  • 2x Raspberry Pi 4
  Tags: x86_64, arm64, heterogeneous
2 OS & Bootstrap
  • Talos Linux (immutable)
  • Sidero Omni (lifecycle)
  • Declarative machine config
  • Rolling upgrades
  Tags: no SSH, API-driven, reproducible
3 Networking — Cilium CNI
  • eBPF kube-proxy replacement
  • L2 LoadBalancer (ARP)
  • Hubble observability
  • Network policy
  Tags: eBPF, 192.168.55.200-254
4 Storage — Longhorn
  • Distributed 3-replica block storage
  • GPU-local StorageClass
  • 2x 4TB SSD on gpu-1
  • iSCSI via Talos extensions
  Tags: strict-local, best-effort, all 7 nodes
5 GPU Compute
  • NVIDIA GPU Operator (RTX 5070)
  • Intel DRA driver (Arc iGPU)
  • Dynamic Resource Allocation
  • CDI device injection
  Tags: K8s 1.35, DRA, ResourceClaim, DeviceClass
6 GitOps — ArgoCD
  • App-of-Apps pattern
  • Multi-source Applications
  • Self-healing + drift detection
  • Zero-downtime adoption
  Tags: single repo, annotation tracking
7 Fun Stuff
  • OpenRGB via USB HID
  • DaemonSet + ConfigMap
  • Custom container build (GitHub Actions)
  • IT5701 firmware lock (in progress)
  Tags: completely unnecessary, fans still rainbow
8 Observability
  • VictoriaMetrics (metrics + alerts)
  • VictoriaLogs (log aggregation)
  • Grafana dashboards
  • Fluent Bit log shipping
  • Blackbox Exporter (endpoint probes)
  • Pushgateway (heartbeat ingestion)
  • Telegram alerting
  • Health Bridge (GitHub lifecycle)
  Tags: VMSingle, Alertmanager, Feature Health, health-bridge, 192.168.55.203
9 Backup
  • Longhorn → Cloudflare R2
  • Daily + weekly recurring jobs
  • SOPS-encrypted credentials
  • NAS target (pending Longhorn 1.13)
  Tags: S3-compatible, 7-day RPO
10 Secrets Management
  • Infisical (self-hosted vault)
  • External Secrets Operator
  • ClusterSecretStore
  • ExternalSecret → K8s Secret
  Tags: audit trail, Universal Auth, 192.168.55.204
11 Local Inference
  • Ollama (gpu-1, RTX 5070)
  • LiteLLM (unified gateway)
  • OpenRouter (free cloud models)
  • OpenAI-compatible API
  Tags: ollama, litellm, 192.168.55.206
12 Agentic Control Plane
  • Sympozium (K8s-native agents)
  • n8n (per-user workflow automation)
  • VK Remote (self-hosted kanban API)
  • ElectricSQL real-time sync
  Tags: agent=Pod, n8n, vibekanban, 192.168.55.207, 192.168.55.216
13 Unified Auth
  • Authentik IdP (OIDC + proxy)
  • SSO for ArgoCD, Grafana, Infisical
  • Forward auth for Longhorn, Hubble, Sympozium
  • OIDC-backed kubectl via apiserver
  Tags: OIDC, forward-auth, 192.168.55.211
14 Multi-tenancy
  • vCluster (K8s-in-K8s)
  • Disposable experiment clusters
  • Resource quotas + network policies
  • GitOps-provisioned via ArgoCD
  Tags: vcluster, multi-tenant, SQLite
15 AI Agent Orchestrator
  • Paperclip (org-chart agents)
  • Virtual companies + budgets
  • Delegation chains + governance
  • LiteLLM gateway integration
  Tags: paperclip, company model, 192.168.55.212
16 Media Generation
  • ComfyUI (diffusion models)
  • LTX-2.3 video, SDXL image, Stable Audio
  • GPU Switcher dashboard (Go)
  • Time-sharing via replica scaling
  Tags: comfyui, gpu-switcher, 192.168.55.213
17 Public Edge — Hop
  • Hetzner CX23 (single-node Talos)
  • Headscale mesh + Tailscale
  • Caddy reverse proxy + TLS
  • Split-DNS (MagicDNS)
  Tags: edge, WireGuard, blog.derio.net
18 Persistent Agent
  • Kali Linux workstation
  • Always-on Claude Code agent
  • SSH remote access
  • 50Gi persistent /root
  Tags: kali, claude --remote, 192.168.55.215
19 Progressive Delivery
  • Argo Rollouts controller
  • LiteLLM canary (Cilium traffic split)
  • Sympozium blue-green
  • VictoriaMetrics analysis gates
  Tags: canary, blue-green, workloadRef
21 Secure Agent Pod
  • Hardened non-root Kali container
  • Cilium egress allowlist
  • VibeKanban agent orchestration
  • VK Relay (WebSocket tunnel to browser)
  Tags: security, vibekanban, relay, 192.168.55.215
24 In-Cluster Ingress
  • Traefik v3 on raspi edge nodes
  • Wildcard TLS (*.cluster.derio.net)
  • Authentik forward-auth (12 services)
  • Homepage dashboard
  Tags: traefik, acme, 192.168.55.220
25 CI/CD Platform
  • Gitea (GitHub mirror forge)
  • Tekton Pipelines + Triggers
  • Zot OCI registry (cosign signed)
  • Webhook-driven CI on pc-1
  Tags: gitea, tekton, zot, 192.168.55.209
26 Agent Images and the VK-Local Sidecar
  • agent-images repo (shared base + children)
  • Matrix CI with cross-repo repository_dispatch
  • VK-local sidecar (shared /home/claude PVC)
  • Lockstep bumper PR in frank
  Tags: docker, github-actions, sidecar
Virtual Machines — upcoming
  • KubeVirt (VMs as pods)
  • CDI disk image import
  • KubeVirt Manager UI
  • Longhorn-backed DataVolumes
  Tags: KVM, 192.168.55.205
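
The Cilium card tags the range 192.168.55.200-254, and individual LoadBalancer IPs are scattered across the other cards and the capability map. Assuming that range is the Cilium L2 LoadBalancer pool and the addresses below are the current assignments (the service labels are illustrative, not actual Service names), a minimal Python sketch can check that every IP sits inside the pool, that no two services share one, and how much headroom is left:

```python
import ipaddress

# Assumption: the "192.168.55.200-254" tag on the Cilium card is the L2 LoadBalancer pool.
POOL_START = ipaddress.IPv4Address("192.168.55.200")
POOL_END = ipaddress.IPv4Address("192.168.55.254")

# LoadBalancer IPs as listed in this post; the keys are illustrative labels only.
ASSIGNED = {
    "longhorn-ui": "192.168.55.201",
    "hubble-ui": "192.168.55.202",
    "grafana": "192.168.55.203",
    "infisical": "192.168.55.204",
    "kubevirt": "192.168.55.205",  # upcoming
    "litellm": "192.168.55.206",
    "sympozium": "192.168.55.207",
    "gitea": "192.168.55.209",
    "zot": "192.168.55.210",
    "authentik": "192.168.55.211",
    "paperclip": "192.168.55.212",
    "comfyui": "192.168.55.213",
    "gpu-switcher": "192.168.55.214",
    "secure-agent-ssh": "192.168.55.215",
    "n8n": "192.168.55.216",
    "secure-agent-ui": "192.168.55.218",
    "traefik": "192.168.55.220",
}


def pool_size(start: ipaddress.IPv4Address, end: ipaddress.IPv4Address) -> int:
    """Number of addresses in an inclusive start..end range."""
    return int(end) - int(start) + 1


def check_assignments(assigned: dict[str, str]) -> list[str]:
    """Return human-readable problems: IPs outside the pool or used twice."""
    problems = []
    owners: dict[ipaddress.IPv4Address, str] = {}
    for service, raw_ip in assigned.items():
        ip = ipaddress.IPv4Address(raw_ip)
        if not POOL_START <= ip <= POOL_END:
            problems.append(f"{service}: {ip} is outside the pool")
        if ip in owners:
            problems.append(f"{service}: {ip} already used by {owners[ip]}")
        else:
            owners[ip] = service
    return problems


if __name__ == "__main__":
    total = pool_size(POOL_START, POOL_END)
    used = len(set(ASSIGNED.values()))
    print(f"pool size: {total}, used: {used}, free: {total - used}")
    for problem in check_assignments(ASSIGNED):
        print(problem)
```

Run as a script, it prints `pool size: 55, used: 17, free: 38` and no problem lines, so the pool still has plenty of room for new LoadBalancer Services.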

Technology → Capability Map

| Technology | Capabilities Unlocked |
| --- | --- |
| Talos Linux + Omni | Immutable OS, declarative machine config, secure bootstrap |
| Cilium (eBPF) | Kube-proxy replacement, L2 LoadBalancer, Hubble UI (192.168.55.202) |
| Longhorn | Distributed block storage, GPU-local StorageClass, 3-replica HA, UI (192.168.55.201) |
| ArgoCD | GitOps, App-of-Apps, self-healing, drift detection |
| NVIDIA GPU Operator | GPU scheduling, AI/ML workloads, container toolkit |
| Intel GPU DRA Driver | iGPU sharing via DRA, namespace-scoped GPU access |
| OpenRGB | LED control from K8s (just for fun) |
| VictoriaMetrics + Grafana | Cluster-wide metrics, alerting, dashboards, Grafana UI (192.168.55.203) |
| VictoriaLogs + Fluent Bit | Centralised log aggregation and querying |
| Longhorn Backup + Cloudflare R2 | PVC backup/restore, daily + weekly schedules, offsite storage |
| Infisical + External Secrets Operator | Secret management with audit trail, ExternalSecret → K8s Secret sync (192.168.55.204) |
| Ollama | Local LLM inference on gpu-1’s RTX 5070 (qwen3.5:9b, deepseek-coder:6.7b) |
| LiteLLM | Unified OpenAI-compatible gateway, virtual keys, spend tracking (192.168.55.206) |
| OpenRouter | Free-tier cloud model aggregation (DeepSeek R1, Gemini Flash, Llama 3.3 70B) |
| Sympozium | Kubernetes-native agentic control plane: agent=Pod, policy=CRD, execution=Job (192.168.55.207) |
| cert-manager | Automated TLS certificate lifecycle for webhooks and internal services |
| Authentik | Unified SSO: OIDC for ArgoCD, Grafana, Infisical; forward-auth proxy for Longhorn, Hubble, Sympozium (192.168.55.211) |
| vCluster | Virtual K8s clusters inside Frank: disposable sandboxes with their own API server, resource quotas, network policies |
| Paperclip | AI agent orchestrator: virtual companies with org charts, budgets, and delegation chains; complements Sympozium (192.168.55.212) |
| ComfyUI | Diffusion model serving: video (LTX-2.3), image (SDXL), audio (Stable Audio), node-based workflow editor (192.168.55.213) |
| GPU Switcher | Custom Go dashboard for GPU time-sharing: one-click switching between Ollama and ComfyUI (192.168.55.214) |
| Hop (Hetzner Edge) | Public-facing single-node Talos cluster: Headscale mesh, Caddy reverse proxy, blog hosting, split-DNS |
| Headscale + Tailscale | WireGuard mesh networking: remote homelab access from any device, MagicDNS for split-DNS |
| Caddy | Automatic TLS (Cloudflare DNS challenge), public/mesh routing, path rewriting |
| Secure Agent Pod | Hardened non-root coding agent workstation: Cilium egress, dropped capabilities, VibeKanban orchestration, SSH (192.168.55.215) + UI (192.168.55.218) |
| Argo Rollouts | Progressive delivery: canary (Cilium traffic splitting + VictoriaMetrics analysis) and blue-green (preview + atomic cutover) |
| n8n | Per-user workflow automation: 400+ integrations, visual node editor, webhook triggers, Authentik forward-auth (192.168.55.216) |
| Blackbox Exporter + Pushgateway | Feature-level health monitoring: HTTP endpoint probes, cron heartbeat ingestion, Grafana alerting to Telegram |
| Health Bridge | Grafana alert → GitHub Project lifecycle state bridge: automatic degraded/dead/healthy transitions, issue comments, bug issue creation |
| Traefik (in-cluster) | In-cluster ingress controller, wildcard TLS (*.cluster.derio.net), ACME via Cloudflare DNS-01, Authentik forward-auth for 12 services (192.168.55.220) |
| VK Remote (self-hosted) | Self-hosted VibeKanban kanban API: PostgreSQL 16, ElectricSQL real-time sync, Rust/Axum server, local JWT auth, Authentik SSO ingress (vk.cluster.derio.net) |
| VK Relay | WebSocket relay sidecar tunneling browser API calls to the local VK agent server via yamux multiplexing, SPAKE2 pairing, Ed25519 request signing |
| gethomepage.dev | Cluster dashboard at master.cluster.derio.net: service catalog with HTTP health indicators, custom bookmarks |
| Gitea | Self-hosted git forge with GitHub pull-mirror, Authentik OIDC SSO (192.168.55.209) |
| Tekton | K8s-native CI/CD pipelines: webhook-driven clone, test, build, sign, and status reporting on pc-1 |
| Zot | OCI container/artifact registry with cert-manager TLS and cosign image signing (192.168.55.210) |
| agent-images | Shared base image + per-pod children repo: agent-base toolchain + secure-agent-kali / vk-local children, matrix CI, cross-repo repository_dispatch, lockstep bumper PR |
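
Traefik terminates TLS for the in-cluster services with a *.cluster.derio.net wildcard certificate. Under the usual wildcard rule for TLS server names (RFC 6125: a `*` in the left-most label stands for exactly one DNS label), that certificate covers vk.cluster.derio.net and master.cluster.derio.net, but not the bare cluster.derio.net, deeper names such as a.b.cluster.derio.net, or blog.derio.net, which Hop serves through Caddy. A minimal Python sketch of the rule; `wildcard_matches` is a hypothetical helper, not a Traefik or cert-manager API, and it deliberately ignores partial-label wildcards such as `f*.example.com`:

```python
def wildcard_matches(pattern: str, hostname: str) -> bool:
    """Return True if `hostname` is covered by the certificate name `pattern`.

    Handles exact names and a single full-label wildcard in the left-most
    position ("*.cluster.derio.net"), which matches exactly one label.
    Comparison is case-insensitive and ignores a trailing dot.
    """
    pattern_labels = pattern.lower().rstrip(".").split(".")
    host_labels = hostname.lower().rstrip(".").split(".")
    if len(pattern_labels) != len(host_labels):
        # A wildcard never stands for zero labels or for several labels.
        return False
    first, *rest = pattern_labels
    if first == "*":
        # The wildcard needs a non-empty host label; the remaining labels must be equal.
        return host_labels[0] != "" and host_labels[1:] == rest
    return pattern_labels == host_labels


if __name__ == "__main__":
    cert = "*.cluster.derio.net"
    for host in ["vk.cluster.derio.net", "master.cluster.derio.net",
                 "cluster.derio.net", "a.b.cluster.derio.net", "blog.derio.net"]:
        print(f"{host:28} {wildcard_matches(cert, host)}")
```

This is why a new in-cluster service only needs a one-label name under cluster.derio.net to reuse the existing certificate, while anything published on the public edge needs its own certificate on Hop.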

Cluster State

| Node | Zone | Role | Hardware |
| --- | --- | --- | --- |
| mini-1/2/3 | Core (B) | Control-plane + Worker | Intel Core Ultra 5, 64GB RAM, 1TB NVMe, Arc iGPU |
| gpu-1 | AI Compute (C) | Worker | Intel Core i9, 128GB RAM, RTX 5070, 2x 4TB SSD |
| pc-1 | Edge (D) | Worker | Legacy desktop, 64GB SSD + 3x HDD |
| raspi-1/2 | Edge (D) | Worker | Raspberry Pi 4, 32GB SD |
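
Only the three mini nodes carry the control-plane role, and on Talos the control-plane nodes are also the etcd members. etcd accepts writes only while a majority of its members is reachable, so the control-plane count decides how many node failures the cluster can absorb. A short Python worked example of the quorum arithmetic, quorum = floor(n/2) + 1:

```python
def etcd_quorum(members: int) -> int:
    """Smallest majority of an etcd cluster with `members` voting members."""
    if members < 1:
        raise ValueError("an etcd cluster needs at least one member")
    return members // 2 + 1


def tolerated_failures(members: int) -> int:
    """How many members can be lost while a majority is still reachable."""
    return members - etcd_quorum(members)


if __name__ == "__main__":
    for n in (1, 2, 3, 4, 5):
        print(f"{n} control-plane nodes: quorum {etcd_quorum(n)}, "
              f"tolerates {tolerated_failures(n)} failure(s)")
```

For mini-1/2/3 this gives a quorum of 2, so any single mini node can go down without losing the API server's datastore. A fourth control-plane node would raise the quorum to 3 without tolerating a second failure, which is why odd member counts are the norm.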

Series Index

  1. Introduction — Why Build a Kubernetes Homelab?
  2. Building the Foundation — Talos, Nodes, and Cilium
  3. Persistent Storage with Longhorn
  4. GPU Compute — NVIDIA and Intel
  5. GitOps Everything with ArgoCD
  6. Fun Stuff — Controlling Case LEDs from Kubernetes
  7. Observability — VictoriaMetrics, Grafana, and Fluent Bit
  8. Backup — Longhorn to Cloudflare R2
  9. Secrets Management — Infisical + External Secrets Operator
  10. Local Inference — Ollama, LiteLLM, and OpenRouter
  11. Agentic Control Plane — Sympozium
  12. GPU Containers on Talos — The Validation Fix
  13. Unified Auth — Authentik SSO for the Entire Cluster
  14. Multi-tenancy — Disposable Kubernetes Clusters with vCluster
  15. Paperclip — An AI Agent Orchestrator on Frank
  16. Media Generation — ComfyUI and GPU Time-Sharing
  17. Hopping Through the Portal — A Public Edge Cluster
  18. Persistent Agent — A Kali Workstation on Kubernetes
  19. Progressive Delivery with Argo Rollouts
  20. Workflow Automation with n8n
  21. Secure Agent Pod — Hardening an AI Coding Workstation
  22. Health Monitoring — Feature Probes, Heartbeats, and Telegram Alerts
  23. Health Bridge — Closing the Loop from Grafana Alerts to GitHub Issues
  24. In-Cluster Ingress — Traefik, Wildcard TLS, and a Homepage Dashboard
  25. VK Relay — Tunneling the Browser to a Local Agent Server
  26. VK Remote — Self-Hosting the Kanban Backend Before the Cloud Dies
  27. CI/CD Platform — Gitea, Tekton, Zot, and Cosign
  28. Agent Images and the VK-Local Sidecar — Unbaking VibeKanban
  • Virtual Machines with KubeVirt (planned)

Operating on Frank — Series Index

Companion series with day-to-day commands, health checks, and debugging guides.

  1. Operating on Cluster & Nodes
  2. Operating on Storage & Backups
  3. Operating on GitOps
  4. Operating on GPU Compute
  5. Operating on Observability
  6. Operating on Secrets
  7. Operating on Local Inference
  8. Operating on Authentication
  9. Operating on Multi-tenancy
  10. Operating on Media Generation
  11. Operating on Hop — Single-Node Talos Edge Cluster
  12. Operating on Progressive Delivery
  13. Operating on Workflow Automation
  14. Operating on Secure Agent Pod
  15. Operating on Health Monitoring
  16. Operating on Health Bridge
  17. Operating on In-Cluster Ingress
  18. Operating on Paperclip
  19. Git Credentials Without a Shell
  20. Operating on VK Relay
  21. Operating on VK Remote
  22. Operating on CI/CD Platform
  23. Operating on ArgoCD Drift