Skip to content
Operating on Paperclip
Operating on Paperclip

Operating on Paperclip

This is the operational companion to Paperclip — AI Agent Orchestrator. That post explains the architecture and deployment. This one covers health checks, database access, secret management, and common failure modes.

What “Healthy” Looks Like

Paperclip is healthy when:

  • The paperclip pod is 1/1 Running in the paperclip-system namespace
  • The web UI responds at http://192.168.55.212:3100
  • PostgreSQL is 1/1 Running with the metrics sidecar
  • All four ExternalSecrets show SecretSynced
  • The PVC is Bound

Quick health check:

# All-in-one status
kubectl get pods,pvc,externalsecret -n paperclip-system

Expected output: one paperclip pod, one paperclip-db pod, one 2Gi PVC bound, four ExternalSecrets synced.

$ kubectl get pods,pvc,externalsecret -n paperclip-system
NAME                             READY   STATUS    RESTARTS   AGE
pod/paperclip-78cfb8db86-z7z4n   1/1     Running   0          12d
pod/paperclip-db-postgresql-0    2/2     Running   0          28d

NAME                                                   STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   VOLUMEATTRIBUTESCLASS   AGE
persistentvolumeclaim/data-paperclip-db-postgresql-0   Bound    pvc-1929c98e-6a59-4eec-8c41-353833f43dec   5Gi        RWO            longhorn       <unset>                 37d
persistentvolumeclaim/paperclip-data                   Bound    pvc-1ded449d-e2bc-4e38-b7c9-c5d5ee264294   2Gi        RWO            longhorn       <unset>                 37d

NAME                                                     STORETYPE            STORE       REFRESH INTERVAL   STATUS         READY
externalsecret.external-secrets.io/paperclip-anthropic   ClusterSecretStore   infisical   5m                 SecretSynced   True
externalsecret.external-secrets.io/paperclip-auth        ClusterSecretStore   infisical   5m                 SecretSynced   True
externalsecret.external-secrets.io/paperclip-ghcr        ClusterSecretStore   infisical   5m                 SecretSynced   True
externalsecret.external-secrets.io/paperclip-llm-key     ClusterSecretStore   infisical   5m                 SecretSynced   True

Observing State

Pod Health

# Check pod status and restarts
kubectl get pods -n paperclip-system -o wide

# Verify the web UI is responding
curl -s -o /dev/null -w "%{http_code}" http://192.168.55.212:3100/
# Expected: 200 (or 403 in private mode — either means the app is up)

# Check startup logs (migrations, Agent JWT, backup schedule)
kubectl logs -n paperclip-system -l app.kubernetes.io/name=paperclip | head -30

# Tail logs in real-time
kubectl logs -n paperclip-system -l app.kubernetes.io/name=paperclip -f --tail=50

Database Health

# Check PostgreSQL pod
kubectl get pods -n paperclip-system -l app.kubernetes.io/instance=paperclip-db

# Connect to the database
kubectl exec -it -n paperclip-system \
  $(kubectl get pod -n paperclip-system -l app.kubernetes.io/instance=paperclip-db -o name) \
  -- psql -U paperclip -d paperclip

# Quick table count (inside psql)
SELECT schemaname, count(*) FROM pg_tables GROUP BY schemaname;

ExternalSecret Sync

# Check all secrets are synced from Infisical
kubectl get externalsecret -n paperclip-system

# Detailed sync status for a specific secret
kubectl describe externalsecret paperclip-llm-key -n paperclip-system

Four ExternalSecrets exist:

  • paperclip-llm-key — OPENAI_API_KEY and OPENAI_BASE_URL (points to LiteLLM)
  • paperclip-auth — BETTER_AUTH_SECRET for session signing
  • paperclip-anthropic — ANTHROPIC_API_KEY (optional, marked optional: true)
  • paperclip-ghcr — Image pull credentials for ghcr.io

Common Operations

Restarting Paperclip

The Deployment uses strategy: Recreate because the PVC is ReadWriteOnce. A rolling update would deadlock — the new pod can’t mount the volume while the old pod holds it. Recreate kills the old pod first, then starts the new one.

# Restart (zero-downtime is not possible with RWO PVC)
kubectl rollout restart deployment/paperclip -n paperclip-system

# Watch the restart
kubectl get pods -n paperclip-system -w

Expect a brief gap (10-30s) where Paperclip is unavailable while the old pod terminates and the new one starts.

Updating the Image

Paperclip uses a custom-built image pushed to GHCR. To deploy a new version:

# Update the image tag
kubectl set image deployment/paperclip \
  paperclip=ghcr.io/derio-net/paperclip:v2026.325.0-derio.2 \
  -n paperclip-system

# Or edit the manifest and let ArgoCD sync
# apps/paperclip/manifests/deployment.yaml → image tag

Database Backup and Restore

PostgreSQL data lives on a Longhorn PVC backed up by the cluster-wide recurring backup job.

# Check Longhorn backup status for the paperclip-db volume
kubectl get volume -n longhorn-system | grep paperclip

# Manual backup via Longhorn UI
# Navigate to http://192.168.55.201 → Volumes → paperclip-db → Create Backup

Troubleshooting

Pod Stuck in CrashLoopBackOff

Check the logs first:

kubectl logs -n paperclip-system -l app.kubernetes.io/name=paperclip --previous
kubectl describe pod -n paperclip-system -l app.kubernetes.io/name=paperclip

Common causes:

  • Database not ready — paperclip-db pod must be Running before paperclip starts. Check kubectl get pods -n paperclip-system.
  • Missing secret — if a non-optional ExternalSecret fails to sync, the pod hits CreateContainerConfigError. Check kubectl get externalsecret -n paperclip-system.
  • Port conflict — another process on the node binding port 3100 (unlikely with Cilium LB, but check events).

Multi-Attach Error on PVC

If you see Multi-Attach error for volume in events, the old pod didn’t release the volume before the new one started. This shouldn’t happen with Recreate strategy, but if it does:

# Force-delete the stuck pod
kubectl delete pod <old-pod-name> -n paperclip-system --grace-period=0 --force

# The new pod will mount the PVC and start
kubectl get pods -n paperclip-system -w

ExternalSecret Not Syncing

# Check the ExternalSecret status
kubectl describe externalsecret paperclip-llm-key -n paperclip-system

# Common issue: Infisical secret path changed
# Verify the secret exists in Infisical under the expected path
# Then check the ClusterSecretStore is healthy
kubectl get clustersecretstore infisical

LoadBalancer IP Not Assigned

# Check service status
kubectl get svc paperclip-lb -n paperclip-system

# If <pending>, check Cilium L2 IPAM
kubectl get ciliumpoolipaddress -A | grep 192.168.55.212

Gotchas

  • No Argo Rollouts for Paperclip. The RWO PVC makes it incompatible with blue-green and canary strategies. It runs as a plain Deployment with Recreate strategy. See Operating on Progressive Delivery for context on the Phase 3 revert.

  • TCP probes, not HTTP. In private mode, the root path returns 403 to non-localhost requests. Probes use tcpSocket on port http instead of httpGet.

  • PostgreSQL image uses GCR mirror. Bitnami no longer serves named tags on Docker Hub. The chart uses mirror.gcr.io/bitnamilegacy/* images. If the mirror goes down, you’ll need to find another source for the 14.1.10-debian-11-r16 tag.

  • Optional Anthropic secret. paperclip-anthropic ExternalSecret is marked optional: true. If the key doesn’t exist in Infisical, the pod starts fine without it — Paperclip falls back to LiteLLM for all model access.

References