# Lab Cheatsheet

Quick-reference notes for experiments, rebuilds, and short-lived troubleshooting. Expect rough edges. Capture what worked, what broke, and what should not be repeated in production.

## K3s Lab

```bash
sudo systemctl status k3s --no-pager        # service state (the unit is k3s-agent on agent nodes)
sudo journalctl -u k3s -n 100 --no-pager    # last 100 lines of the k3s unit log
kubectl get nodes -o wide
kubectl get pods -A
kubectl get events -A --sort-by=.lastTimestamp | tail -30
sudo k3s kubectl get pods -A                # bundled kubectl; works before ~/.kube/config is set up
```
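
A rough filter for the `kubectl get pods -A` output above, as a sketch: it assumes the default column order (NAMESPACE NAME READY STATUS ...) and flags anything that is not fully ready, skipping `Completed` Job pods. The name `unready_pods` is made up for this note.

```bash
# Hypothetical helper: read `kubectl get pods -A` output on stdin and print pods
# that are not fully ready. Assumes columns NAMESPACE NAME READY STATUS ...
unready_pods() {
  awk 'NR > 1 {
    if ($4 == "Completed") next          # finished Job pods report 0/N and are fine
    split($3, r, "/")                    # READY is "ready/total"
    if (r[1] != r[2] || $4 != "Running") print $1 "/" $2 " " $3 " " $4
  }'
}
```

Usage: `kubectl get pods -A | unready_pods`, or feed it `sudo k3s kubectl get pods -A` on a fresh node.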

Quick reset:

```bash
sudo /usr/local/bin/k3s-uninstall.sh           # destructive lab reset (server node)
# sudo /usr/local/bin/k3s-agent-uninstall.sh   # same reset on an agent node
```
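
Because the uninstaller is destructive, a small guard helps when several lab nodes share one shell history. A sketch, assuming `uname -n` is a good enough identity check; `confirm_node_name` and `lab_reset` are hypothetical names.

```bash
# Hypothetical guard: only run the K3s uninstaller after the operator types this node's name.
confirm_node_name() {
  expected=$(uname -n)
  printf 'Type the node name (%s) to confirm the reset: ' "$expected" >&2
  read -r answer || return 1               # EOF counts as "no"
  [ "$answer" = "$expected" ]
}

lab_reset() {
  if confirm_node_name; then
    sudo /usr/local/bin/k3s-uninstall.sh   # server path; agents use k3s-agent-uninstall.sh
  else
    echo "aborted: node name did not match" >&2
    return 1
  fi
}
```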

## Proxmox Lab

```bash
pvesh get /nodes
pvesh get /cluster/resources    # VMs, containers, and storage across the cluster
qm list                         # QEMU VMs on this node
qm config <vmid>
pct list                        # LXC containers on this node
ha-manager status
```
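
To keep a copy of VM configs before changing anything, the `qm list` / `qm config` pair above can be looped. A sketch that assumes the default `qm list` column layout (VMID NAME STATUS MEM(MB) BOOTDISK(GB) PID); the function names and the output directory naming are made up.

```bash
# Hypothetical helpers: list VMIDs from `qm list` output, then save each VM's config.
vmids() {
  # $1 is an optional status filter, e.g. "running"; the header line is skipped
  awk -v want="${1:-}" 'NR > 1 && (want == "" || $3 == want) { print $1 }'
}

backup_vm_configs() {
  dir=${1:-vm-configs-$(date -u +%Y%m%dT%H%M%SZ)}
  mkdir -p "$dir"
  for id in $(qm list | vmids); do
    qm config "$id" >"$dir/$id.conf" || echo "qm config $id failed" >&2
  done
  echo "$dir"                              # print where the configs went
}
```

Usage: `qm list | vmids running` to see what is up, `backup_vm_configs` before the risky change.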

Checks before changes:

```bash
zpool status    # pool health, errors, scrub/resilver state
pvesm status    # storage availability and usage
ip -br addr
```
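
Pre-checks are most useful when their output is kept. A minimal sketch that writes each check's output and exit status into a timestamped directory; the `prechecks/` layout and the `capture`/`snapshot_state` names are assumptions for this note. A tool that is missing on the host just records a non-zero exit.

```bash
# Hypothetical helpers: save each pre-check's output and exit status before a change.
capture() {
  dir=$1; shift
  name=$(printf '%s' "$*" | sed 's/[^A-Za-z0-9._-]/_/g')   # "zpool status" -> zpool_status
  "$@" >"$dir/$name.txt" 2>&1
  echo "exit=$?" >>"$dir/$name.txt"
}

snapshot_state() {
  dir=${1:-prechecks/$(date -u +%Y%m%dT%H%M%SZ)}
  mkdir -p "$dir"
  capture "$dir" zpool status
  capture "$dir" pvesm status
  capture "$dir" ip -br addr
  echo "$dir"
}
```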

## GPU Passthrough

```bash
lspci -nn | grep -Ei 'vga|3d|nvidia'            # PCI addresses and [vendor:device] IDs
nvidia-smi                                      # a GPU bound to vfio-pci will not show here
sudo dmesg -T | grep -Ei 'vfio|iommu|nvidia'    # dmesg may be restricted to root
find /sys/kernel/iommu_groups/ -type l | sort -V   # -V puts group 2 before group 10
```
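
The `find` output above is one path per device. Grouping it shows what shares an IOMMU group with the GPU; devices in the same group generally have to go to the same VM. A sketch; `iommu_groups` is a made-up name, and the optional `root` argument exists only so it can be pointed at a test tree.

```bash
# Hypothetical helper: print "group N: <PCI address>" for every device, ordered by group number.
# Pass an address to `lspci -nns <address>` to see what each device is.
iommu_groups() {
  root=${1:-/sys/kernel/iommu_groups}
  for g in "$root"/*; do
    for d in "$g"/devices/*; do
      [ -e "$d" ] || continue              # skip unmatched globs
      printf 'group %s: %s\n' "${g##*/}" "${d##*/}"
    done
  done | sort -t ' ' -k2,2n
}
```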

Good sanity check:

```bash
lsmod | grep -E 'vfio|kvm'
```
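
A stricter version of the same check reports which expected modules are absent instead of showing what matched. A sketch; `missing_modules` is a made-up name, and the module list in the usage line is an assumption for a typical VFIO host. Modules built into the kernel never appear in `lsmod`, so a "missing" result is a hint, not proof.

```bash
# Hypothetical helper: read `lsmod` output on stdin and print each named module that is not listed.
missing_modules() {
  loaded=$(awk 'NR > 1 { print $1 }')      # first column, header skipped
  for m in "$@"; do
    printf '%s\n' "$loaded" | grep -qxF "$m" || echo "$m"
  done
}
```

Usage: `lsmod | missing_modules vfio vfio_pci vfio_iommu_type1 kvm`.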

## Terraform Experiments

```bash
terraform fmt -recursive
terraform init
terraform validate
terraform plan
terraform state list
```
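
For repeated runs, the first three commands can be chained so the loop stops at the first problem. A sketch; `tf_precheck` is a made-up name, and `fmt -check` is used so formatting problems are reported (non-zero exit) instead of silently rewritten.

```bash
# Hypothetical wrapper: fail fast on formatting, init, or validation problems.
tf_precheck() {
  terraform fmt -check -recursive || { echo "fmt: files need formatting" >&2; return 1; }
  terraform init -input=false     || { echo "init failed" >&2; return 1; }
  terraform validate              || { echo "validate failed" >&2; return 1; }
}
```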

Scratch workflow:

```bash
terraform plan -out=tfplan
terraform show tfplan
```
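
In scripts it helps to know whether the saved plan changes anything. With `-detailed-exitcode`, `terraform plan` exits 0 for no changes, 1 for an error, and 2 when changes are present. A sketch; `tf_plan_status` is a made-up name.

```bash
# Hypothetical wrapper: save a plan to tfplan and report whether it contains changes.
tf_plan_status() {
  terraform plan -input=false -detailed-exitcode -out=tfplan
  case $? in
    0) echo "no changes" ;;
    2) echo "changes pending; review with: terraform show tfplan" ;;
    *) echo "plan failed" >&2; return 1 ;;
  esac
}
```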

## Networking Labs

```bash
ip -br addr
ip route
bridge link
sudo ss -ltnp                  # -p shows other users' processes only as root
sudo tcpdump -ni any port 53   # packet capture needs root (or capture capabilities)
dig +short example.com
mtr -rwzc 10 1.1.1.1
```

## Ansible Testing

```bash
ansible-inventory -i inventory/hosts.yml --graph
ansible-playbook -i inventory/hosts.yml playbook.yml --syntax-check
ansible-playbook -i inventory/hosts.yml playbook.yml --check --diff
ansible all -i inventory/hosts.yml -m ping
```
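
The syntax check and the dry run are usually run back to back; a small wrapper keeps the inventory path in one place and can scope the dry run with `--limit`. A sketch: `ansible_dry_run` and the `INVENTORY` variable are hypothetical, and the default inventory path matches the commands above.

```bash
# Hypothetical wrapper: syntax-check a playbook, then dry-run it with diffs.
# Usage: ansible_dry_run playbook.yml [host-or-group]
ansible_dry_run() {
  inv=${INVENTORY:-inventory/hosts.yml}
  pb=$1
  ansible-playbook -i "$inv" "$pb" --syntax-check || return 1
  if [ -n "${2:-}" ]; then
    ansible-playbook -i "$inv" "$pb" --check --diff --limit "$2"
  else
    ansible-playbook -i "$inv" "$pb" --check --diff
  fi
}
```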

## Docker Testing

```bash
docker ps -a
docker logs --tail 100 <container>
docker exec -it <container> sh
docker inspect <container> | jq '.[0].NetworkSettings'
docker system df
```

## Useful Temporary Commands

```bash
watch -n2 'kubectl get pods -A'
watch -n2 'nvidia-smi'
watch -n2 'ip -br addr'
# echo puts each health response on its own line
while true; do date -u; curl -fsS http://127.0.0.1:8080/health; echo; sleep 2; done
```
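
The `while true` loop above never stops on its own. For scripted checks, a bounded retry is often handier. A sketch; `wait_until` is a made-up name, and it relies on `date +%s`, which GNU and BSD `date` support.

```bash
# Hypothetical helper: retry a command until it succeeds or LIMIT seconds have passed.
# Usage: wait_until LIMIT INTERVAL command [args...]
wait_until() {
  limit=$1; interval=$2; shift 2
  deadline=$(( $(date +%s) + limit ))
  until "$@"; do
    if [ "$(date +%s)" -ge "$deadline" ]; then
      echo "gave up after ${limit}s: $*" >&2
      return 1
    fi
    sleep "$interval"
  done
}
```

Usage: `wait_until 60 2 curl -fsS http://127.0.0.1:8080/health`.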
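
## Quick PoC Commands

```bash
python3 -m http.server 8080
# -subj skips the interactive certificate prompts
openssl req -x509 -newkey rsa:2048 -nodes -days 3 -subj '/CN=localhost' -keyout key.pem -out cert.pem
# a throwaway TLS endpoint on 8443: openssl s_server -key key.pem -cert cert.pem -accept 8443 -www
curl -vk https://127.0.0.1:8443/   # -k skips verification of the self-signed cert
nc -lvkp 9000                      # flag support differs between netcat variants
```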

## Troubleshooting Notes

- If K3s pods fail after a host reboot, check time sync (`timedatectl`) before chasing certificate or API errors.
- If PVCs stay Pending in lab clusters, inspect the default storage class first (`kubectl get storageclass`). K3s ships `local-path`, which leaves a PVC Pending until a pod that uses it is scheduled.
- If Docker networking looks broken, check whether Docker's bridge subnets overlap routes in the host route table.
- If GPU pods see no devices, validate the driver, the container toolkit, and the device plugin, in that order.
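
For the Docker/host overlap check: two IPv4 CIDR blocks overlap exactly when their network addresses agree on the bits of the shorter prefix (one block then contains the other). A sketch in plain shell arithmetic, assuming well-formed IPv4 input without leading zeros; `ip_to_int` and `cidr_overlap` are hypothetical names.

```bash
# Hypothetical helpers: IPv4 only, no input validation.
ip_to_int() (
  IFS=.                                  # subshell body, so the IFS change stays local
  set -- $1                              # split "a.b.c.d" into $1..$4
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
)

# Succeeds (exit 0) if the two CIDR blocks share any address.
cidr_overlap() {
  n1=$(ip_to_int "${1%/*}"); p1=${1#*/}
  n2=$(ip_to_int "${2%/*}"); p2=${2#*/}
  p=$(( p1 < p2 ? p1 : p2 ))             # compare on the shorter prefix
  mask=$(( (0xFFFFFFFF << (32 - p)) & 0xFFFFFFFF ))
  [ $(( n1 & mask )) -eq $(( n2 & mask )) ]
}
```

Usage: compare the subnet from `docker network inspect bridge --format '{{range .IPAM.Config}}{{.Subnet}}{{end}}'` against the prefixes in `ip route`, e.g. `cidr_overlap 172.17.0.0/16 172.16.0.0/12 && echo overlap`.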

## Useful One-liners

```bash
kubectl get pods -A -o wide | grep -E 'CrashLoopBackOff|Error|Pending'   # grep -E replaces deprecated egrep
journalctl -p err -S today                          # priority err and more severe, since midnight
find /var/log -type f -mtime -1 -ls | sort -k7,7n   # changed in the last 24h, sorted by size
ps -eo pid,%cpu,%mem,cmd --sort=-%cpu | head
grep -RniE 'error|failed|timeout' .
```
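
The last one-liner gets noisy on a large tree; counting matches per file first shows where to look. A sketch that assumes file names without `:` (it splits `grep -c` output on that character); `error_hotspots` is a made-up name. Files with zero matches are dropped before sorting.

```bash
# Hypothetical helper: rank files by lines matching error/failed/timeout (case-insensitive).
# Usage: error_hotspots [dir] [top-n]
error_hotspots() {
  grep -RciE 'error|failed|timeout' "${1:-.}" 2>/dev/null |
    awk -F: '$NF > 0' |
    sort -t: -k2,2nr |
    head -n "${2:-10}"
}
```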

## Things Worth Remembering

- Pre-checks still matter in labs. Capture state before trying the risky thing.
- Keep a copy of working configs before rapid iteration.
- Short-lived labs still produce useful evidence; save command output when a fix works.
- If a PoC needs repeated manual repair, turn the repair steps into a script or note.
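
Turning repair steps into a script is easier when every step logs what it ran and where it stopped. A minimal sketch; `step` and `REPAIR_LOG` are hypothetical names.

```bash
# Hypothetical helper: run one repair step, append the command and its output to a log,
# and report the exit status if it fails.
log=${REPAIR_LOG:-./repair.log}

step() {
  printf '+ %s\n' "$*" >>"$log"          # record the command before running it
  "$@" >>"$log" 2>&1 || {
    rc=$?
    printf 'step failed (exit %s): %s\n' "$rc" "$*" | tee -a "$log" >&2
    return "$rc"
  }
}
```

In a script that starts with `set -eu`, a failing `step` ends the run, so a sequence like `step sudo systemctl restart k3s` then `step kubectl get nodes` stops at the first broken step and leaves the evidence in the log.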