# Lab Cheatsheet

Quick-reference notes for experiments, rebuilds, and short-lived troubleshooting. Expect rough edges. Capture what worked, what broke, and what should not be repeated in production.

## K3s Lab

```bash
sudo systemctl status k3s --no-pager
sudo journalctl -u k3s -n 100 --no-pager
kubectl get nodes -o wide
kubectl get pods -A
kubectl get events -A --sort-by=.lastTimestamp | tail -30
sudo k3s kubectl get pods -A
```

Quick reset:

```bash
sudo /usr/local/bin/k3s-uninstall.sh # destructive lab reset
```

## Proxmox Lab

```bash
pvesh get /nodes
pvesh get /cluster/resources
qm list
qm config <vmid>
pct list
ha-manager status
```

Checks before changes:

```bash
zpool status
pvesm status
ip -br addr
```

## GPU Passthrough

```bash
lspci -nn | grep -Ei 'vga|3d|nvidia'
nvidia-smi
dmesg -T | grep -Ei 'vfio|iommu|nvidia'
find /sys/kernel/iommu_groups/ -type l | sort
```

Good sanity check:

```bash
lsmod | grep -E 'vfio|kvm'
```

## Terraform Experiments

```bash
terraform fmt -recursive
terraform init
terraform validate
terraform plan
terraform state list
```

Scratch workflow:

```bash
terraform plan -out=tfplan
terraform show tfplan
```

## Networking Labs

```bash
ip -br addr
ip route
bridge link
ss -ltnp
tcpdump -ni any port 53
dig +short example.com
mtr -rwzc 10 1.1.1.1
```

## Ansible Testing

```bash
ansible-inventory -i inventory/hosts.yml --graph
ansible-playbook -i inventory/hosts.yml playbook.yml --syntax-check
ansible-playbook -i inventory/hosts.yml playbook.yml --check --diff
ansible all -i inventory/hosts.yml -m ping
```

## Docker Testing

```bash
docker ps -a
docker logs --tail 100 <container>
docker exec -it <container> sh
docker inspect <container> | jq '.[0].NetworkSettings'
docker system df
```

## Useful Temporary Commands

```bash
watch -n2 'kubectl get pods -A'
watch -n2 'nvidia-smi'
watch -n2 'ip -br addr'
while true; do date -u; curl -fsS http://127.0.0.1:8080/health; sleep 2; done
```
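
The `while true` health loop above runs until interrupted. When a check needs to give up on its own, a bounded retry can help. A minimal sketch, assuming a hypothetical helper named `wait_for` (not part of any tool listed here) that retries a command a fixed number of times:

```bash
# wait_for TRIES DELAY CMD...: run CMD until it succeeds or TRIES attempts
# have been made. Hypothetical helper name, for illustration only.
wait_for() {
  local tries delay i
  tries=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$tries" ]; do
    if "$@"; then
      return 0
    fi
    # Sleep only between attempts, not after the last one.
    if [ "$i" -lt "$tries" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# Usage (assumes the same local health endpoint as the loop above):
# wait_for 30 2 curl -fsS http://127.0.0.1:8080/health && echo "healthy"
```

The function returns 0 on the first success and 1 once the attempts are exhausted, so it can gate a following step with `&&`.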

## Quick PoC Commands

```bash
python3 -m http.server 8080
openssl req -x509 -newkey rsa:2048 -nodes -days 3 -keyout key.pem -out cert.pem
curl -vk https://127.0.0.1:8443/
nc -lvkp 9000
```

## Troubleshooting Notes

- If K3s pods fail after a host reboot, check time sync before chasing cert or API errors.
- If PVCs stay Pending in lab clusters, inspect the default storage class first.
- If Docker networking looks broken, check whether the bridge subnets overlap with routes in the host route table.
- If GPU pods see no devices, validate the driver, the container toolkit, and the device plugin, in that order.
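
The Docker subnet check in the notes above comes down to comparing two IPv4 CIDR ranges. A rough sketch of that comparison in plain shell arithmetic; the function names `ip_to_int`, `cidr_range`, and `cidr_overlap` are made up for this example, and the code handles IPv4 only:

```bash
# ip_to_int A.B.C.D: print the IPv4 address as an integer.
ip_to_int() {
  local IFS=.
  set -- $1
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# cidr_range ADDR/BITS: print "first last" (as integers) for the block.
cidr_range() {
  local addr bits base size
  addr=${1%/*}
  bits=${1#*/}
  base=$(ip_to_int "$addr")
  size=$(( 1 << (32 - bits) ))
  base=$(( base / size * size ))   # drop host bits
  echo "$base $(( base + size - 1 ))"
}

# cidr_overlap A B: exit 0 if the two blocks share at least one address.
cidr_overlap() {
  # After set: $1/$2 = first/last of A, $3/$4 = first/last of B.
  set -- $(cidr_range "$1") $(cidr_range "$2")
  [ "$1" -le "$4" ] && [ "$3" -le "$2" ]
}

# Usage (subnet values here are placeholders; read the real ones from
# `docker network inspect -f '{{range .IPAM.Config}}{{.Subnet}}{{end}}' bridge`
# and `ip route`):
# cidr_overlap 172.17.0.0/16 172.17.5.0/24 && echo "overlap"
```

Two ranges overlap when each one starts at or before the point where the other ends, which is the pair of comparisons in `cidr_overlap`.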

## Useful One-liners

```bash
kubectl get pods -A -o wide | grep -E 'CrashLoopBackOff|Error|Pending'
journalctl -p err -S today
find /var/log -type f -mtime -1 -ls | sort -k7,7n
ps -eo pid,%cpu,%mem,cmd --sort=-%cpu | head
grep -RniE 'error|failed|timeout' .
```

## Things Worth Remembering

- Pre-checks still matter in labs. Capture state before trying the risky thing.
- Keep a copy of working configs before rapid iteration.
- Short-lived labs still produce useful evidence; save command output when a fix works.
- If a PoC needs repeated manual repair, turn the repair steps into a script or note.
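
One way to act on the "capture state" and "save command output" points above is a small wrapper that writes each command's output to a timestamped file. This is only a sketch: the function name `snap` and the `SNAP_DIR` variable are hypothetical, and the default directory is an arbitrary choice.

```bash
# snap LABEL CMD...: run CMD, save stdout and stderr to a timestamped file
# under $SNAP_DIR (hypothetical variable, defaults to ./lab-snapshots),
# print the file path, and return CMD's exit status.
snap() {
  local label dir out rc
  label=$1
  shift
  dir=${SNAP_DIR:-./lab-snapshots}
  mkdir -p "$dir" || return 1
  out="$dir/$(date -u +%Y%m%dT%H%M%SZ)-$label.txt"
  # Record the command line first so the file is self-describing.
  {
    echo "# $*"
    "$@"
  } >"$out" 2>&1
  rc=$?
  echo "$out"
  return "$rc"
}

# Usage:
# snap nodes kubectl get nodes -o wide
# snap zpool zpool status
```

Two calls with the same label inside the same second write to the same file, so use distinct labels when capturing several things in a row.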