Mateusz Suski 0d3905b8a1
Add operational cheatsheets across repository
2026-05-09 09:41:55 +00:00

Lab Cheatsheet

Quick-reference notes for experiments, rebuilds, and short-lived troubleshooting. Expect rough edges. Capture what worked, what broke, and what should not be repeated in production.

K3s Lab

sudo systemctl status k3s --no-pager
sudo journalctl -u k3s -n 100 --no-pager
kubectl get nodes -o wide
kubectl get pods -A
kubectl get events -A --sort-by=.lastTimestamp | tail -30
sudo k3s kubectl get pods -A

Quick reset:

sudo /usr/local/bin/k3s-uninstall.sh   # destructive lab reset: removes K3s and its data (agent nodes: k3s-agent-uninstall.sh)

Proxmox Lab

pvesh get /nodes
pvesh get /cluster/resources
qm list
qm config <vmid>
pct list
ha-manager status

Checks before changes:

zpool status
pvesm status
ip -br addr

GPU Passthrough

lspci -nn | grep -Ei 'vga|3d|nvidia'
nvidia-smi
dmesg -T | grep -Ei 'vfio|iommu|nvidia'
find /sys/kernel/iommu_groups/ -type l | sort
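
The raw `find` listing is easier to review once it is reduced to group and device pairs. A minimal sketch, assuming the usual `/sys/kernel/iommu_groups/<group>/devices/<pci-address>` path layout; the function name `iommu_group_table` is made up for this note.

```shell
# Hypothetical helper: turn iommu_groups symlink paths (stdin) into
# "<group> <pci-address>" lines, sorted numerically by group.
iommu_group_table() {
  sed -nE 's#^.*/iommu_groups/([0-9]+)/devices/([^/]+)$#\1 \2#p' | sort -n -k1,1
}

# Usage (on the host):
#   find /sys/kernel/iommu_groups/ -type l | iommu_group_table
```

Devices that share a group number generally have to be passed through together, so a GPU that shares its group with unrelated devices is worth noticing before any VM config changes.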

Good sanity check:

lsmod | grep -E 'vfio|kvm'

Terraform Experiments

terraform fmt -recursive
terraform init
terraform validate
terraform plan
terraform state list

Scratch workflow:

terraform plan -out=tfplan
terraform show tfplan
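
If the plan result is going to be checked by a script rather than read by eye, `terraform plan -detailed-exitcode` reports the outcome through its exit status (0 = no changes, 1 = error, 2 = changes present). A small sketch of that mapping; `plan_status` is a hypothetical helper name, not part of Terraform.

```shell
# Hypothetical helper: translate the exit status of
# `terraform plan -detailed-exitcode` into a short label.
plan_status() {
  case "$1" in
    0) echo "no changes" ;;
    1) echo "plan failed" ;;
    2) echo "changes pending" ;;
    *) echo "unexpected exit code: $1" ;;
  esac
}

# Usage:
#   terraform plan -detailed-exitcode -out=tfplan; plan_status "$?"
```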

Networking Labs

ip -br addr
ip route
bridge link
ss -ltnp
tcpdump -ni any port 53
dig +short example.com
mtr -rwzc 10 1.1.1.1

Ansible Testing

ansible-inventory -i inventory/hosts.yml --graph
ansible-playbook -i inventory/hosts.yml playbook.yml --syntax-check
ansible-playbook -i inventory/hosts.yml playbook.yml --check --diff
ansible all -i inventory/hosts.yml -m ping

Docker Testing

docker ps -a
docker logs --tail 100 <container>
docker exec -it <container> sh
docker inspect <container> | jq '.[0].NetworkSettings'
docker system df

Useful Temporary Commands

watch -n2 'kubectl get pods -A'
watch -n2 'nvidia-smi'
watch -n2 'ip -br addr'
while true; do date -u; curl -fsS http://127.0.0.1:8080/health; sleep 2; done
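
The `while true` loop above runs until interrupted. When a health check should give up after a while (for example inside a rebuild script), a bounded variant can be sketched as follows; `retry` is a hypothetical helper name and the attempt count and delay are placeholders.

```shell
# Hypothetical helper: run a command up to <attempts> times, sleeping
# <delay> seconds between tries. Returns 0 on the first success,
# 1 if every attempt failed.
retry() {
  local attempts="$1" delay="$2" i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    "$@" && return 0
    # Skip the sleep after the final failed attempt.
    [ "$i" -lt "$attempts" ] && sleep "$delay"
    i=$((i + 1))
  done
  return 1
}

# Usage:
#   retry 30 2 curl -fsS http://127.0.0.1:8080/health
```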

Quick PoC Commands

python3 -m http.server 8080
openssl req -x509 -newkey rsa:2048 -nodes -days 3 -subj "/CN=localhost" -keyout key.pem -out cert.pem   # -subj skips the interactive prompts; CN is a placeholder
curl -vk https://127.0.0.1:8443/
nc -lvkp 9000   # flag support differs between netcat variants (ncat, OpenBSD nc)

Troubleshooting Notes

  • If K3s pods fail after host reboot, check time sync before chasing cert or API errors.
  • If PVCs stay pending in lab clusters, inspect the default storage class first.
  • If Docker networking looks broken, compare bridge subnet overlaps with the host route table.
  • If GPU pods see no devices, validate driver, toolkit, and device plugin in that order.
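
For the Docker networking note, the subnet comparison can be done mechanically. The sketch below checks whether two IPv4 CIDRs overlap by comparing them under the shorter of the two prefix lengths. The function names are hypothetical, and it assumes well-formed IPv4 input without leading zeros in the octets.

```shell
# Convert a dotted-quad IPv4 address to a 32-bit integer.
# Runs in a subshell so the IFS change does not leak.
ip_to_int() (
  IFS=.
  set -- $1
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
)

# Hypothetical helper: return 0 if two IPv4 CIDRs overlap, 1 otherwise.
# Two networks overlap exactly when they agree on the shorter prefix.
cidr_overlap() {
  local a_ip="${1%/*}" a_len="${1#*/}" b_ip="${2%/*}" b_len="${2#*/}"
  local len="$a_len" mask a b
  [ "$b_len" -lt "$len" ] && len="$b_len"
  mask=$(( (0xFFFFFFFF << (32 - len)) & 0xFFFFFFFF ))
  a=$(ip_to_int "$a_ip")
  b=$(ip_to_int "$b_ip")
  [ $(( a & mask )) -eq $(( b & mask )) ]
}

# Usage:
#   cidr_overlap 172.17.0.0/16 172.17.5.0/24 && echo "overlap"
```

The Docker side of the comparison can come from `docker network inspect <network>` (the `IPAM.Config` subnets) and the host side from the destinations listed by `ip route`.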

Useful One-liners

kubectl get pods -A -o wide | grep -E 'CrashLoopBackOff|Error|Pending'
journalctl -p err -S today
find /var/log -type f -mtime -1 -ls | sort -k7,7n
ps -eo pid,%cpu,%mem,cmd --sort=-%cpu | head
grep -RniE 'error|failed|timeout' .
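
Grepping the whole pod listing also matches status words that happen to appear in pod names. A possible alternative, assuming the default `kubectl get pods -A` column order (NAMESPACE, NAME, READY, STATUS, ...), is to filter on the STATUS column only; `unhealthy_pods` is a hypothetical helper name.

```shell
# Hypothetical helper: read `kubectl get pods -A --no-headers` output on
# stdin and print rows whose STATUS column (field 4) is neither
# Running nor Completed.
unhealthy_pods() {
  awk '$4 != "Running" && $4 != "Completed"'
}

# Usage:
#   kubectl get pods -A --no-headers | unhealthy_pods
```

This only looks at STATUS, so a pod that is Running with unready containers still passes the filter.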

Things Worth Remembering

  • Pre-checks still matter in labs. Capture state before trying the risky thing.
  • Keep a copy of working configs before rapid iteration.
  • Short-lived labs still produce useful evidence; save command output when a fix works.
  • If a PoC needs repeated manual repair, turn the repair steps into a script or note.
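
One way to make the "capture state" and "save command output" habits cheap is a tiny wrapper that writes each command's output to a labelled file. This is only a sketch; the helper name `capture` and the directory naming in the usage lines are assumptions, not an existing convention in this repository.

```shell
# Hypothetical helper: run a command and save its combined stdout/stderr
# to <dir>/<label>.txt, with the command line recorded as the first line.
# Returns the exit status of the command itself.
capture() {
  local dir="$1" label="$2"
  shift 2
  mkdir -p "$dir"
  {
    printf '$ %s\n' "$*"
    "$@" 2>&1
  } > "$dir/$label.txt"
}

# Usage (directory name is a placeholder):
#   snap="precheck-$(date -u +%Y%m%dT%H%M%SZ)"
#   capture "$snap" zpool zpool status
#   capture "$snap" addr ip -br addr
```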