Add control server operations guide

Maddox 2026-01-23 18:47:58 +00:00
parent 14f6348bf4
commit 4cb3a41f1c

# Control Server Operations Guide
**Host:** control (CT 127)
**IP:** 192.168.1.127
**Location:** pve2
**User:** maddox
**Last Updated:** January 23, 2026
---
## Overview
The control server is the centralized command center for managing the Proxmox cluster infrastructure. It provides:
- **Passwordless SSH** to all 13 managed hosts
- **Ansible automation** for cluster-wide operations
- **tmux sessions** for multi-host management
- **Git-based configuration** synced to Forgejo
---
## Quick Start
### Launch Interactive Menu
```bash
~/scripts/control-menu.sh
```
### Launch Multi-Host SSH Session
```bash
~/scripts/ssh-manager.sh
```
### Run Ansible Ad-Hoc Command
```bash
cd ~/clustered-fucks
ansible all -m ping
ansible docker_hosts -m shell -a "docker ps --format 'table {{.Names}}\t{{.Status}}'"
```
---
## Directory Structure
```
/home/maddox/
├── .ssh/
│   ├── config               # SSH host definitions
│   ├── tmux-hosts.conf      # tmux session configuration
│   ├── id_ed25519           # SSH private key
│   └── id_ed25519.pub       # SSH public key (add to new hosts)
├── clustered-fucks/         # Git repo (synced to Forgejo)
│   ├── ansible.cfg          # Ansible configuration
│   ├── inventory/
│   │   ├── hosts.yml        # Host inventory
│   │   └── group_vars/
│   │       └── all.yml      # Global variables
│   └── playbooks/
│       ├── check-status.yml
│       ├── docker-prune.yml
│       ├── restart-utils.yml
│       ├── update-all.yml
│       └── deploy-utils.yml
└── scripts/
    ├── ssh-manager.sh       # tmux multi-host launcher
    ├── control-menu.sh      # Interactive Ansible menu
    └── add-host.sh          # New host onboarding
```
---
## Managed Hosts
| Host | IP | User | Port | Type | Group |
|------|-----|------|------|------|-------|
| pve2 | .3 | root | 22 | Proxmox | proxmox_nodes |
| pve-dell | .4 | root | 22 | Proxmox | proxmox_nodes |
| replicant | .80 | maddox | 22 | VM | docker_hosts |
| databases | .81 | root | 22 | VM | docker_hosts |
| immich | .82 | root | 22 | VM | docker_hosts |
| media-transcode | .120 | root | 22 | LXC | docker_hosts |
| network-services | .121 | root | 22 | LXC | docker_hosts |
| download-stack | .122 | root | 22 | LXC | docker_hosts |
| docker666 | .123 | root | 22 | LXC | docker_hosts |
| tailscale-home | .124 | root | 22 | LXC | docker_hosts |
| dns-lxc | .125 | root | 22 | LXC | infrastructure |
| nas | .251 | maddox | 44822 | NAS | legacy |
| alien | .252 | maddox | 22 | Docker | legacy |
---
## Ansible Host Groups
| Group | Members | Use Case |
|-------|---------|----------|
| `all` | All 13 hosts | Connectivity tests |
| `docker_hosts` | 8 hosts | Docker operations |
| `all_managed` | 11 hosts | System updates |
| `proxmox_nodes` | pve2, pve-dell | Node-level ops |
| `infrastructure` | dns-lxc | Non-Docker infra |
| `legacy` | nas, alien | Manual operations |
| `vms` | replicant, databases, immich | VM-specific |
| `lxcs` | 6 LXC containers | LXC-specific |
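Groups can also be combined on the command line using Ansible's standard host patterns, for example:
```bash
cd ~/clustered-fucks
# Union of two groups
ansible 'vms:lxcs' -m ping
# One group minus a host (quote the pattern so the shell ignores the !)
ansible 'docker_hosts:!tailscale-home' -m ping
```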
---
## Playbooks Reference
### check-status.yml
Reports disk usage, memory usage, and container counts.
```bash
ansible-playbook playbooks/check-status.yml
```
**Target:** all_managed
**Output:** Per-host status line (Disk=X% Mem=X% Containers=X)
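The run can be narrowed to a single group or host with `--limit`, e.g.:
```bash
ansible-playbook playbooks/check-status.yml --limit lxcs
ansible-playbook playbooks/check-status.yml --limit replicant
```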
---
### update-all.yml
Runs apt update and upgrade on all Docker hosts.
```bash
ansible-playbook playbooks/update-all.yml
# With reboot if required:
ansible-playbook playbooks/update-all.yml -e "reboot=true"
```
**Target:** docker_hosts
**Note:** Checks whether a reboot is required and reports it, but only reboots when `-e "reboot=true"` is passed
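To preview pending changes without applying them, the playbook can be run in check mode (assuming its tasks support it; the standard apt modules do):
```bash
ansible-playbook playbooks/update-all.yml --check --diff
```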
---
### docker-prune.yml
Cleans unused Docker resources (images, networks, build cache).
```bash
ansible-playbook playbooks/docker-prune.yml
```
**Target:** docker_hosts
**Note:** dns-lxc will fail (no Docker) - this is expected
---
### restart-utils.yml
Restarts the utils stack (watchtower, autoheal, docker-proxy) on all hosts.
```bash
ansible-playbook playbooks/restart-utils.yml
```
**Target:** docker_hosts
**Note:** Uses host-specific `docker_appdata` variable for non-standard paths
---
### deploy-utils.yml
Deploys standardized utils stack to a new host.
```bash
ansible-playbook playbooks/deploy-utils.yml --limit new-host
```
**Target:** docker_hosts
**Note:** Creates the directory structure and `.env` file only; the compose file must be added separately (see the sketch below)
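A minimal sketch of that manual follow-up, assuming a local compose template at `templates/utils-compose.yml` (hypothetical path) and the default appdata layout:
```bash
# Hypothetical paths - adjust to where the compose file actually lives
scp ~/clustered-fucks/templates/utils-compose.yml new-host:/home/docker/appdata/utils/docker-compose.yml
ssh new-host 'cd /home/docker/appdata/utils && docker compose up -d'
```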
---
## Scripts Reference
### ssh-manager.sh
Launches a tmux session with SSH connections to all hosts.
```bash
~/scripts/ssh-manager.sh
```
**Features:**
- Window 0: Control (local shell)
- Windows 1-13: Individual host SSH sessions
- Final window: Multi-View (all hosts in split panes)
**Navigation:**
- `Ctrl+b` then window number to switch
- `Ctrl+b d` to detach (keeps session running)
- `tmux attach -t cluster` to reattach
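In the Multi-View window, tmux's `synchronize-panes` option mirrors keystrokes to every pane at once (window name assumed to match the Multi-View label above):
```bash
# Toggle synchronized input on, run commands everywhere, then toggle it back off
tmux set-window-option -t cluster:Multi-View synchronize-panes on
tmux set-window-option -t cluster:Multi-View synchronize-panes off
```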
---
### control-menu.sh
Interactive menu for common operations.
```bash
~/scripts/control-menu.sh
```
**Menu Options:**
```
[1] Ping All - Test connectivity
[2] Check Status - Disk/memory/containers
[3] Update All - apt upgrade docker hosts
[4] Docker Prune - Clean unused resources
[5] Restart Utils - Restart utils stack everywhere
[A] Ad-hoc Command - Run custom command
[I] Inventory - Show host list
[S] SSH Manager - Launch tmux session
[Q] Quit
```
---
### add-host.sh
Wizard for onboarding new hosts.
```bash
~/scripts/add-host.sh
```
**Steps:**
1. Prompts for hostname, IP, user, port, description
2. Tests SSH connectivity
3. Copies SSH key if needed
4. Adds to `~/.ssh/config`
5. Adds to `~/.ssh/tmux-hosts.conf`
**Note:** The Ansible inventory must still be edited manually (see "Adding a New Host" below).
---
## Common Operations
### SSH to a Specific Host
```bash
ssh replicant
ssh databases
ssh nas # Uses port 44822 automatically
```
### Run Command on All Docker Hosts
```bash
cd ~/clustered-fucks
ansible docker_hosts -m shell -a "docker ps -q | wc -l"
```
### Run Command on Specific Host
```bash
ansible replicant -m shell -a "df -h"
```
### Copy File to All Hosts
```bash
ansible docker_hosts -m copy -a "src=/path/to/file dest=/path/to/dest"
```
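Pulling a file back from every host works the same way with the `fetch` module, which writes one copy per host under `dest` (paths here are examples):
```bash
ansible docker_hosts -m fetch -a "src=/var/log/syslog dest=fetched/"
```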
### Check Specific Service
```bash
ansible docker_hosts -m shell -a "docker ps --filter name=watchtower --format '{{.Status}}'"
```
### View Ansible Inventory
```bash
ansible-inventory --graph
ansible-inventory --list
```
---
## Git Workflow
### Repository Location
- **Local:** `~/clustered-fucks/`
- **Remote:** `ssh://git@192.168.1.81:2222/maddox/clustered-fucks.git`
- **Web:** https://git.3ddbrewery.com/maddox/clustered-fucks
### Standard Workflow
```bash
cd ~/clustered-fucks
# Make changes to playbooks/inventory
vim playbooks/new-playbook.yml
# Commit and push
git add -A
git commit -m "Add new playbook"
git push origin main
```
### Pull Latest Changes
```bash
cd ~/clustered-fucks
git pull origin main
```
---
## Adding a New Host
### 1. Run Onboarding Script
```bash
~/scripts/add-host.sh
```
### 2. Edit Ansible Inventory
```bash
vim ~/clustered-fucks/inventory/hosts.yml
```
Add under appropriate group:
```yaml
new-host:
  ansible_host: 192.168.1.XXX
  ansible_user: root
```
If non-standard appdata path:
```yaml
new-host:
  ansible_host: 192.168.1.XXX
  ansible_user: root
  docker_appdata: /custom/path/appdata
```
### 3. Test Connection
```bash
ansible new-host -m ping
```
### 4. Commit Changes
```bash
cd ~/clustered-fucks
git add -A
git commit -m "Add new-host to inventory"
git push origin main
```
---
## Troubleshooting
### SSH Connection Refused
```bash
# Check if SSH is running on target
ssh -v hostname
# If connection refused, access via Proxmox console:
# For LXC: pct enter <CT_ID>
# For VM: qm terminal <VM_ID>
# Inside container/VM:
apt install openssh-server
systemctl enable ssh
systemctl start ssh
```
### SSH Permission Denied
```bash
# Check key is in authorized_keys on target
ssh-copy-id hostname
# If still failing, check permissions on target:
# (via Proxmox console)
chmod 700 ~
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys
chown -R root:root ~/.ssh # or appropriate user
```
### Ansible "Missing sudo password"
The host has `ansible_become: yes` set in the inventory, but no become password is supplied.
Fix: either remove `ansible_become: yes` from the inventory, or set up passwordless sudo on the target:
```bash
echo "username ALL=(ALL) NOPASSWD:ALL" >> /etc/sudoers.d/username
```
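After creating the drop-in, validate it and tighten its permissions; a malformed sudoers file can lock out sudo entirely:
```bash
visudo -cf /etc/sudoers.d/username
chmod 440 /etc/sudoers.d/username
```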
### Playbook Skips Host
Check if host is in the correct group:
```bash
ansible-inventory --graph
```
Check host variables:
```bash
ansible-inventory --host hostname
```
### Docker Command Not Found
The host is in `docker_hosts` but doesn't have Docker installed. Move it to the `infrastructure` group:
```yaml
infrastructure:
  hosts:
    hostname:
      ansible_host: 192.168.1.XXX
```
---
## Non-Standard Configurations
### Hosts with Different Appdata Paths
| Host | Path |
|------|------|
| replicant | `/home/maddox/docker/appdata` |
| docker666 | `/root/docker/appdata` |
| All others | `/home/docker/appdata` |
These are handled via the `docker_appdata` variable in the inventory.
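A quick way to check what each host has set (hosts reporting the variable as undefined fall back to the default path):
```bash
cd ~/clustered-fucks
ansible docker_hosts -m debug -a "var=docker_appdata"
```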
### Hosts with Non-Standard SSH
| Host | Port | User |
|------|------|------|
| nas | 44822 | maddox |
Configured in both `~/.ssh/config` and `inventory/hosts.yml`.
### Hosts Without Utils Stack
| Host | Reason |
|------|--------|
| tailscale-home | Only runs Headscale, no utils needed |
| dns-lxc | No Docker installed |
---
## Maintenance
### Update Ansible
```bash
sudo apt update
sudo apt upgrade ansible
```
### Regenerate SSH Keys (if compromised)
```bash
# Generate new key
ssh-keygen -t ed25519 -N "" -f ~/.ssh/id_ed25519
# Distribute to all hosts (will prompt for passwords)
for host in pve2 pve-dell replicant databases immich media-transcode network-services download-stack docker666 tailscale-home dns-lxc alien; do
  ssh-copy-id "$host"
done
# NAS requires special handling
ssh-copy-id -p 44822 maddox@192.168.1.251
```
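Once the new key is distributed, confirm it works everywhere before retiring the old one:
```bash
cd ~/clustered-fucks
ansible all -m ping
```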
### Backup Configuration
```bash
cd ~/clustered-fucks
git add -A
git commit -m "Backup: $(date +%Y-%m-%d)"
git push origin main
```
---
## Reference Files
### ~/.ssh/config
```
Host *
    StrictHostKeyChecking accept-new
    ServerAliveInterval 60
    ServerAliveCountMax 3

Host pve2
    HostName 192.168.1.3
    User root

Host pve-dell
    HostName 192.168.1.4
    User root

Host replicant
    HostName 192.168.1.80
    User maddox

Host databases
    HostName 192.168.1.81
    User root

Host immich
    HostName 192.168.1.82
    User root

Host media-transcode
    HostName 192.168.1.120
    User root

Host network-services
    HostName 192.168.1.121
    User root

Host download-stack
    HostName 192.168.1.122
    User root

Host docker666
    HostName 192.168.1.123
    User root

Host tailscale-home
    HostName 192.168.1.124
    User root

Host dns-lxc
    HostName 192.168.1.125
    User root

Host nas
    HostName 192.168.1.251
    User maddox
    Port 44822

Host alien
    HostName 192.168.1.252
    User maddox
```
### ~/clustered-fucks/ansible.cfg
```ini
[defaults]
inventory = inventory/hosts.yml
remote_user = root
host_key_checking = False
retry_files_enabled = False
gathering = smart
fact_caching = jsonfile
fact_caching_connection = /tmp/ansible_facts
fact_caching_timeout = 86400
stdout_callback = yaml
forks = 10
[privilege_escalation]
become = False
[ssh_connection]
pipelining = True
ssh_args = -o ControlMaster=auto -o ControlPersist=60s
```
---
## Changelog
| Date | Change |
|------|--------|
| 2026-01-23 | Initial deployment, all hosts connected, playbooks tested |