clustered-fucks/docs/control-server-guide.md

Control Server Operations Guide

Host: control (CT 127)
IP: 192.168.1.127
Location: pve2
User: maddox
Last Updated: January 23, 2026


Overview

The control server is the centralized command center for managing the Proxmox cluster infrastructure. It provides:

  • Passwordless SSH to all 13 managed hosts
  • Ansible automation for cluster-wide operations
  • tmux sessions for multi-host management
  • Git-based configuration synced to Forgejo

Quick Start

Launch Interactive Menu

~/scripts/control-menu.sh

Launch Multi-Host SSH Session

~/scripts/ssh-manager.sh

Run Ansible Ad-Hoc Command

cd ~/clustered-fucks
ansible all -m ping
ansible docker_hosts -m shell -a "docker ps --format 'table {{.Names}}\t{{.Status}}'"

Directory Structure

/home/maddox/
├── .ssh/
│   ├── config              # SSH host definitions
│   ├── tmux-hosts.conf     # tmux session configuration
│   ├── id_ed25519          # SSH private key
│   └── id_ed25519.pub      # SSH public key (add to new hosts)
│
├── clustered-fucks/        # Git repo (synced to Forgejo)
│   ├── ansible.cfg         # Ansible configuration
│   ├── inventory/
│   │   ├── hosts.yml       # Host inventory
│   │   └── group_vars/
│   │       └── all.yml     # Global variables
│   └── playbooks/
│       ├── check-status.yml
│       ├── docker-prune.yml
│       ├── restart-utils.yml
│       ├── update-all.yml
│       └── deploy-utils.yml
│
└── scripts/
    ├── ssh-manager.sh      # tmux multi-host launcher
    ├── control-menu.sh     # Interactive Ansible menu
    └── add-host.sh         # New host onboarding

Managed Hosts

All hosts sit on 192.168.1.0/24; the IP column shows the last octet only.

Host              IP     User    Port   Type     Group
pve2              .3     root    22     Proxmox  proxmox_nodes
pve-dell          .4     root    22     Proxmox  proxmox_nodes
replicant         .80    maddox  22     VM       docker_hosts
databases         .81    root    22     VM       docker_hosts
immich            .82    root    22     VM       docker_hosts
media-transcode   .120   root    22     LXC      docker_hosts
network-services  .121   root    22     LXC      docker_hosts
download-stack    .122   root    22     LXC      docker_hosts
docker666         .123   root    22     LXC      docker_hosts
tailscale-home    .124   root    22     LXC      docker_hosts
dns-lxc           .125   root    22     LXC      infrastructure
nas               .251   maddox  44822  NAS      legacy
alien             .252   maddox  22     Docker   legacy

Ansible Host Groups

Group           Members                       Use Case
all             All 13 hosts                  Connectivity tests
docker_hosts    8 hosts                       Docker operations
all_managed     11 hosts                      System updates
proxmox_nodes   pve2, pve-dell                Node-level ops
infrastructure  dns-lxc                       Non-Docker infra
legacy          nas, alien                    Manual operations
vms             replicant, databases, immich  VM-specific
lxcs            6 LXC containers              LXC-specific
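
These groups map onto the inventory tree in inventory/hosts.yml. An abbreviated sketch of that structure (trimmed for illustration; the real file lists all 13 hosts and may nest differently):

```yaml
# Abbreviated sketch of inventory/hosts.yml - the real file lists every host
all:
  children:
    proxmox_nodes:
      hosts:
        pve2:
          ansible_host: 192.168.1.3
        pve-dell:
          ansible_host: 192.168.1.4
    docker_hosts:
      hosts:
        replicant:
          ansible_host: 192.168.1.80
          ansible_user: maddox
        # ... seven more docker hosts ...
    infrastructure:
      hosts:
        dns-lxc:
          ansible_host: 192.168.1.125
    legacy:
      hosts:
        nas:
        alien:
```

Derived groups such as all_managed, vms, and lxcs can be expressed as additional children groups over the same hosts.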

Playbooks Reference

check-status.yml

Reports disk usage, memory usage, and container counts.

ansible-playbook playbooks/check-status.yml

Target: all_managed
Output: Per-host status line (Disk=X% Mem=X% Containers=X)
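
The playbook's contents aren't reproduced here; a minimal sketch with equivalent behavior (assumed task names and registers, not the exact file) would be:

```yaml
# Hypothetical sketch of check-status.yml
- hosts: all_managed
  tasks:
    - name: Read root filesystem usage
      shell: df -h / --output=pcent | tail -1
      register: disk
      changed_when: false

    - name: Count running containers (0 where Docker is absent)
      shell: docker ps -q 2>/dev/null | wc -l
      register: containers
      changed_when: false
      failed_when: false

    - name: Print one status line per host
      debug:
        msg: "Disk={{ disk.stdout | trim }} Mem={{ ansible_memory_mb.real.used }}/{{ ansible_memory_mb.real.total }}MB Containers={{ containers.stdout }}"
```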


update-all.yml

Runs apt update and upgrade on all Docker hosts.

ansible-playbook playbooks/update-all.yml

# With reboot if required:
ansible-playbook playbooks/update-all.yml -e "reboot=true"

Target: docker_hosts
Note: Checks for reboot requirement, notifies but doesn't auto-reboot unless -e "reboot=true"
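
The update-then-gate-the-reboot pattern described above can be sketched as follows (assumed structure; reboot_flag is a hypothetical register name):

```yaml
# Hypothetical sketch of the conditional-reboot pattern
- hosts: docker_hosts
  tasks:
    - name: Update cache and upgrade packages
      apt:
        update_cache: yes
        upgrade: dist

    - name: Check whether the host wants a reboot
      stat:
        path: /var/run/reboot-required
      register: reboot_flag

    - name: Reboot only when -e "reboot=true" was passed
      reboot:
      when: reboot_flag.stat.exists and (reboot | default(false) | bool)
```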


docker-prune.yml

Cleans unused Docker resources (images, networks, build cache).

ansible-playbook playbooks/docker-prune.yml

Target: docker_hosts
Note: any host without Docker will fail this playbook - which is why dns-lxc sits in the infrastructure group rather than docker_hosts
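
A one-task equivalent of the prune looks roughly like this (a sketch; the real playbook may differ):

```yaml
# Hypothetical sketch of docker-prune.yml
- hosts: docker_hosts
  tasks:
    - name: Remove unused images, networks, and build cache
      shell: docker system prune -af
      register: prune
      changed_when: "'Total reclaimed space' in prune.stdout"
```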


restart-utils.yml

Restarts the utils stack (watchtower, autoheal, docker-proxy) on all hosts.

ansible-playbook playbooks/restart-utils.yml

Target: docker_hosts
Note: Uses host-specific docker_appdata variable for non-standard paths
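
How a playbook can consume docker_appdata with a fallback (a sketch; the utils subdirectory name is an assumption):

```yaml
# Hypothetical sketch: restart using the per-host appdata path
- hosts: docker_hosts
  tasks:
    - name: Restart the utils stack from the host's appdata directory
      shell: docker compose restart
      args:
        chdir: "{{ docker_appdata | default('/home/docker/appdata') }}/utils"
```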


deploy-utils.yml

Deploys standardized utils stack to a new host.

ansible-playbook playbooks/deploy-utils.yml --limit new-host

Target: docker_hosts
Note: Creates directory structure and .env file only; compose file must be added separately
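
The directory-plus-.env creation can be sketched as below (hypothetical tasks; the .env content here is a placeholder, not the real file):

```yaml
# Hypothetical sketch of deploy-utils.yml
- hosts: docker_hosts
  tasks:
    - name: Create the utils directory
      file:
        path: "{{ docker_appdata | default('/home/docker/appdata') }}/utils"
        state: directory
        mode: "0755"

    - name: Seed a .env if one does not exist (placeholder content)
      copy:
        dest: "{{ docker_appdata | default('/home/docker/appdata') }}/utils/.env"
        content: |
          TZ=Etc/UTC
        force: false
```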


Scripts Reference

ssh-manager.sh

Launches a tmux session with SSH connections to all hosts.

~/scripts/ssh-manager.sh

Features:

  • Window 0: Control (local shell)
  • Windows 1-13: Individual host SSH sessions
  • Final window: Multi-View (all hosts in split panes)

Navigation:

  • Ctrl+b then window number to switch
  • Ctrl+b d to detach (keeps session running)
  • tmux attach -t cluster to reattach

control-menu.sh

Interactive menu for common operations.

~/scripts/control-menu.sh

Menu Options:

[1] Ping All        - Test connectivity
[2] Check Status    - Disk/memory/containers
[3] Update All      - apt upgrade docker hosts
[4] Docker Prune    - Clean unused resources
[5] Restart Utils   - Restart utils stack everywhere

[A] Ad-hoc Command  - Run custom command
[I] Inventory       - Show host list
[S] SSH Manager     - Launch tmux session

[Q] Quit

add-host.sh

Wizard for onboarding new hosts.

~/scripts/add-host.sh

Steps:

  1. Prompts for hostname, IP, user, port, description
  2. Tests SSH connectivity
  3. Copies SSH key if needed
  4. Adds to ~/.ssh/config
  5. Adds to ~/.ssh/tmux-hosts.conf

Note: the Ansible inventory must still be edited manually (see "Adding a New Host" below).


Common Operations

SSH to a Specific Host

ssh replicant
ssh databases
ssh nas  # Uses port 44822 automatically

Run Command on All Docker Hosts

cd ~/clustered-fucks
ansible docker_hosts -m shell -a "docker ps -q | wc -l"

Run Command on Specific Host

ansible replicant -m shell -a "df -h"

Copy File to All Hosts

ansible docker_hosts -m copy -a "src=/path/to/file dest=/path/to/dest"

Check Specific Service

ansible docker_hosts -m shell -a "docker ps --filter name=watchtower --format '{{.Status}}'"

View Ansible Inventory

ansible-inventory --graph
ansible-inventory --list

Git Workflow

Repository Location

~/clustered-fucks (origin points at the Forgejo remote)

Standard Workflow

cd ~/clustered-fucks

# Make changes to playbooks/inventory
vim playbooks/new-playbook.yml

# Commit and push
git add -A
git commit -m "Add new playbook"
git push origin main

Pull Latest Changes

cd ~/clustered-fucks
git pull origin main

Adding a New Host

1. Run Onboarding Script

~/scripts/add-host.sh

2. Edit Ansible Inventory

vim ~/clustered-fucks/inventory/hosts.yml

Add under appropriate group:

    new-host:
      ansible_host: 192.168.1.XXX
      ansible_user: root

If non-standard appdata path:

    new-host:
      ansible_host: 192.168.1.XXX
      ansible_user: root
      docker_appdata: /custom/path/appdata

3. Test Connection

ansible new-host -m ping

4. Commit Changes

cd ~/clustered-fucks
git add -A
git commit -m "Add new-host to inventory"
git push origin main

Troubleshooting

SSH Connection Refused

# Check if SSH is running on target
ssh -v hostname

# If connection refused, access via Proxmox console:
# For LXC: pct enter <CT_ID>
# For VM: qm terminal <VM_ID>

# Inside container/VM:
apt install openssh-server
systemctl enable ssh
systemctl start ssh

SSH Permission Denied

# Check key is in authorized_keys on target
ssh-copy-id hostname

# If still failing, check permissions on target:
# (via Proxmox console)
chmod 700 ~
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys
chown -R root:root ~/.ssh  # or appropriate user

Ansible "Missing sudo password"

The host is configured with ansible_become: yes but no password is set.

Fix: Either remove ansible_become: yes from inventory, or set up passwordless sudo on target:

echo "username ALL=(ALL) NOPASSWD:ALL" > /etc/sudoers.d/username
chmod 440 /etc/sudoers.d/username
visudo -cf /etc/sudoers.d/username   # validate the file before relying on it

Playbook Skips Host

Check if host is in the correct group:

ansible-inventory --graph

Check host variables:

ansible-inventory --host hostname

Docker Command Not Found

Host is in docker_hosts but doesn't have Docker. Move to infrastructure group:

    infrastructure:
      hosts:
        hostname:
          ansible_host: 192.168.1.XXX

Non-Standard Configurations

Hosts with Different Appdata Paths

Host Path
replicant /home/maddox/docker/appdata
docker666 /root/docker/appdata
All others /home/docker/appdata

These are handled via docker_appdata variable in inventory.
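
In inventory terms this means a cluster-wide default plus per-host overrides, roughly as follows (a sketch of the relevant fragments):

```yaml
# inventory/group_vars/all.yml - cluster-wide default (sketch)
docker_appdata: /home/docker/appdata

# inventory/hosts.yml - per-host overrides (sketch)
replicant:
  ansible_host: 192.168.1.80
  ansible_user: maddox
  docker_appdata: /home/maddox/docker/appdata
docker666:
  ansible_host: 192.168.1.123
  docker_appdata: /root/docker/appdata
```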

Hosts with Non-Standard SSH

Host Port User
nas 44822 maddox

Configured in both ~/.ssh/config and inventory/hosts.yml.
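
The inventory side of the nas entry looks roughly like this (a sketch; ansible_port is the standard variable for a non-default SSH port):

```yaml
# Sketch of the nas inventory entry with its non-standard port
nas:
  ansible_host: 192.168.1.251
  ansible_user: maddox
  ansible_port: 44822
```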

Hosts Without Utils Stack

Host Reason
tailscale-home Only runs Headscale, no utils needed
dns-lxc No Docker installed

Maintenance

Update Ansible

sudo apt update
sudo apt install --only-upgrade ansible

Regenerate SSH Keys (if compromised)

# Generate new key
ssh-keygen -t ed25519 -N "" -f ~/.ssh/id_ed25519

# Distribute to all hosts (will prompt for passwords)
for host in pve2 pve-dell replicant databases immich media-transcode network-services download-stack docker666 tailscale-home dns-lxc alien; do
  ssh-copy-id $host
done

# NAS requires special handling
ssh-copy-id -p 44822 maddox@192.168.1.251

# Finally, remove the old public key from each host's ~/.ssh/authorized_keys

Backup Configuration

cd ~/clustered-fucks
git add -A
git commit -m "Backup: $(date +%Y-%m-%d)"
git push origin main

Reference Files

~/.ssh/config

Host *
    StrictHostKeyChecking accept-new
    ServerAliveInterval 60
    ServerAliveCountMax 3

Host pve2
    HostName 192.168.1.3
    User root

Host pve-dell
    HostName 192.168.1.4
    User root

Host replicant
    HostName 192.168.1.80
    User maddox

Host databases
    HostName 192.168.1.81
    User root

Host immich
    HostName 192.168.1.82
    User root

Host media-transcode
    HostName 192.168.1.120
    User root

Host network-services
    HostName 192.168.1.121
    User root

Host download-stack
    HostName 192.168.1.122
    User root

Host docker666
    HostName 192.168.1.123
    User root

Host tailscale-home
    HostName 192.168.1.124
    User root

Host dns-lxc
    HostName 192.168.1.125
    User root

Host nas
    HostName 192.168.1.251
    User maddox
    Port 44822

Host alien
    HostName 192.168.1.252
    User maddox

~/clustered-fucks/ansible.cfg

[defaults]
inventory = inventory/hosts.yml
remote_user = root
host_key_checking = False
retry_files_enabled = False
gathering = smart
fact_caching = jsonfile
fact_caching_connection = /tmp/ansible_facts
fact_caching_timeout = 86400
stdout_callback = yaml
forks = 10

[privilege_escalation]
become = False

[ssh_connection]
pipelining = True
ssh_args = -o ControlMaster=auto -o ControlPersist=60s

Changelog

Date        Change
2026-01-23  Initial deployment, all hosts connected, playbooks tested