Add Linux fresh setup toolkit
lint / shell-yaml-ansible (push) Failing after 16s

This commit is contained in:
Mateusz Suski
2026-06-06 00:23:11 +00:00
parent 8cb92de06f
commit 4e739c5c99
25 changed files with 1646 additions and 0 deletions
+276
View File
@@ -0,0 +1,276 @@
# Linux Fresh Setup Toolkit
## Executive summary
The Linux Fresh Setup Toolkit is day-0 bootstrap automation for a clean Ubuntu
lab server or workstation. It prepares a host for routine administration,
Cockpit, Docker workloads, libvirt/KVM virtual machines, optional NVIDIA
diagnostics, bounded logging, practical kernel tuning, and a conservative
security baseline.
The scripts are modular and safe to rerun. Optional components remain optional,
UFW is not enabled without a specific flag, and an NVIDIA driver is never
installed without an explicit version. This is a portfolio and homelab
implementation, not a production-certified build standard.
## Scope and non-goals
The toolkit supports Ubuntu 24.04 and newer and assumes a systemd-based host
with APT package management. It is suitable for a host such as `ailab` that may
run WebODM, Open WebUI, Homepage, NVIDIA workloads, or test virtual machines.
It does not:
- Deploy applications, containers, or virtual machines.
- Configure GPU passthrough, VFIO bindings, bridges, or Windows guests.
- Select an NVIDIA driver automatically.
- Define a complete firewall policy or compliance baseline.
- Replace backup, monitoring, patching, or ongoing maintenance processes.
- Claim live validation against every future Ubuntu release.
## Why this is separate from ailab-maintenance
This project establishes a fresh host. The sibling
[AI Lab Maintenance Toolkit](../ailab-maintenance/) handles day-2 health
checks, scheduled cleanup, configuration backup, disk monitoring, and VM
inventory after a host is operating.
Keeping bootstrap and maintenance separate makes the change boundary clear:
this toolkit installs platform capabilities and baseline configuration, while
the maintenance toolkit manages recurring operational tasks.
## Directory layout
```text
setup/
├── README.md
├── install.sh
├── scripts/
│ ├── 00-preflight.sh
│ ├── 00-platform-guard.inc
│ ├── 01-base-packages.sh
│ ├── 02-shell-profile.sh
│ ├── 03-cockpit.sh
│ ├── 04-docker.sh
│ ├── 05-libvirt.sh
│ ├── 06-nvidia-tools.sh
│ ├── 07-tuning.sh
│ ├── 08-security-baseline.sh
│ └── 99-postcheck.sh
├── files/
│ ├── bashrc.d/ailab.sh
│ ├── docker/daemon.json
│ ├── sysctl/99-ailab.conf
│ └── systemd/journald-ailab-limits.conf
└── docs/
├── fresh-install-checklist.md
├── cockpit.md
├── docker.md
├── libvirt.md
├── nvidia.md
└── bash-shell.md
```
`00-platform-guard.inc` is an internal sourced helper used by mutating
component scripts; it is not an executable profile.
## Supported profiles and flags
| Flag | Result |
| --- | --- |
| `--base` | Install operational CLI, diagnostic, storage, and network packages |
| `--shell` | Install the root AI lab Bash profile |
| `--cockpit` | Install and enable Cockpit |
| `--docker` | Install Docker and bounded JSON-file logging |
| `--libvirt` | Install and enable libvirt/KVM |
| `--nvidia-tools` | Install NVIDIA and OpenCL diagnostics without a driver |
| `--install-nvidia-driver VERSION` | Install diagnostics and the named Ubuntu driver package |
| `--tuning` | Apply journald, sysctl, sensor, and sysstat settings |
| `--security` | Install and enable fail2ban; install but do not enable UFW |
| `--enable-ufw` | Run security setup and explicitly enable UFW |
| `--all` | Run every standard profile without UFW enablement or driver installation |
`--install-nvidia-driver` implies `--nvidia-tools`. `--enable-ufw` implies
`--security`. With no flags, the installer prints help and makes no changes.
## Installation examples
Review the scripts and current host access path before execution:
```bash
cd labs/linux/setup
./install.sh
sudo ./install.sh --base --shell
sudo ./install.sh --cockpit --docker --libvirt
sudo ./install.sh --all
```
Explicit high-impact options can be combined with `--all`:
```bash
sudo ./install.sh --all --enable-ufw
sudo ./install.sh --all --install-nvidia-driver 550
```
The installer runs the read-only preflight once before selected profiles and a
postcheck after all successful profile steps.
## Fresh host workflow
1. Patch the base Ubuntu installation and confirm console or out-of-band access.
2. Review [the fresh install checklist](docs/fresh-install-checklist.md).
3. Run `sudo ./install.sh --base --shell`.
4. Add only the platform profiles needed by the host.
5. Review service state, listening ports, storage, networking, and warnings in
the postcheck.
6. Reboot if a driver or kernel-related package requires it.
7. Capture host-specific configuration and backup requirements separately.
## AI lab workflow
A general AI lab host can start with:
```bash
sudo ./install.sh --base --shell --cockpit --docker --nvidia-tools --tuning --security
```
This installs GPU diagnostics but leaves driver choice to the operator. Add
libvirt only when the host will run VMs. Enable UFW only after confirming SSH,
Cockpit, application, bridge, and VM networking requirements.
## Safety model
- Mutating profiles require root and refuse non-Ubuntu systems or Ubuntu older
than 24.04.
- Component profiles install their own direct prerequisites.
- Existing managed configuration is changed only when content differs.
- Changed root shell, Docker, journald, and sysctl files receive timestamped
backups.
- Existing valid Docker JSON is merged so unrelated settings survive.
- Invalid Docker JSON stops configuration rather than being overwritten.
- UFW and NVIDIA driver installation require explicit flags.
- Package and service failures are not hidden.
- Postcheck warnings report optional or inactive components without masking a
successfully completed diagnostic script.
APT installation and service restarts are real system changes. Test first on a
disposable host and maintain a console path when changing remote access policy.
## Bash shell profile
The shell profile is installed as `/root/.bashrc.d/ailab.sh`, and one exact
source line is maintained in `/root/.bashrc`. It adds concise helpers for
systemd, journals, Docker, libvirt, NVIDIA, ports, archives, and disk usage.
See [Bash shell profile](docs/bash-shell.md) for command details and cautions.
## Cockpit setup
Cockpit provides browser-based host, storage, network, package, VM, metrics,
and support-report views. The installer enables `cockpit.socket` and reports
`https://HOSTNAME:9090`. `cockpit-files` is optional because it is not
available in every enabled Ubuntu repository.
See [Cockpit setup](docs/cockpit.md).
## Docker setup
The Ubuntu `docker.io` package path is preferred. The Docker official
repository is configured only when `docker.io` is unavailable. The daemon uses
the `json-file` log driver with five 50 MB files per container.
The toolkit configures log retention only. It does not prune data, deploy
Compose applications, or configure an NVIDIA container runtime.
See [Docker setup](docs/docker.md).
## libvirt/KVM setup
The libvirt profile installs QEMU, OVMF, software TPM support, virt-install,
virt-manager, bridge utilities, and libvirt clients and services. It enables
`libvirtd` and prints existing guests and networks.
See [libvirt/KVM setup](docs/libvirt.md).
## NVIDIA tooling
The default NVIDIA profile installs `nvtop`, `clinfo`, and PCI diagnostics.
It reports detected NVIDIA devices, `nvidia-smi`, and DKMS state when those
commands exist.
Driver installation requires a numeric version that maps to an available
Ubuntu package, for example `nvidia-driver-550`. Secure Boot enrollment,
driver suitability, CUDA, container runtime support, and passthrough remain
operator decisions.
See [NVIDIA tooling](docs/nvidia.md).
## Tuning
The tuning profile bounds persistent journal use, raises inotify limits for
development and container workloads, reduces swappiness, enables sysstat, and
runs automatic sensor detection when available.
Review these values against available memory, storage, monitoring retention,
and workload behavior before deployment beyond a lab.
## Security baseline
The security profile installs UFW and fail2ban and enables fail2ban. It leaves
UFW disabled unless `--enable-ufw` is present. Explicit UFW enablement permits
OpenSSH and TCP port 9090 before activation.
This is a minimal access-preservation baseline, not a complete host firewall or
hardening standard. Application and VM networking may require additional
reviewed rules.
## Postcheck
The final script reports:
- Failed systemd units.
- Cockpit, Docker, libvirt, and fail2ban status when installed.
- Running Docker containers and defined virtual machines.
- NVIDIA runtime state.
- Filesystem usage and listening ports.
Warnings require operator review but optional component absence does not cause
the postcheck itself to fail.
## Troubleshooting
Run individual read-only checks after correcting a failed profile:
```bash
sudo ./scripts/00-preflight.sh
sudo ./scripts/99-postcheck.sh
systemctl --failed
journalctl -u docker -u libvirtd -u cockpit.socket -u fail2ban
```
Common failure areas are unavailable APT repositories, unsupported package
names on a future Ubuntu release, invalid pre-existing Docker JSON, Secure Boot
module signing, disabled CPU virtualization, and remote firewall assumptions.
To roll back a managed configuration, compare the current file with its
timestamped `.bak` copy, restore the reviewed backup, and restart or reload the
owning service. Package removal is intentionally not automated because it may
affect workloads and dependencies.
## Interview talking points
- Why day-0 bootstrap and day-2 maintenance have separate ownership.
- How explicit flags protect firewall and GPU driver decisions.
- Why Docker JSON is validated, backed up, and merged.
- How idempotent content checks prevent backup and restart churn.
- Why preflight and postcheck evidence surround mutating profiles.
- Which virtualization, Secure Boot, IOMMU, and GPU decisions remain manual.
## Future improvements
- Add automated tests using disposable Ubuntu VMs.
- Add a documented NVIDIA Container Toolkit profile.
- Add optional non-root administrative user and group membership management.
- Add bridge and VFIO planning checks without applying passthrough changes.
- Add package compatibility matrices after validating future Ubuntu releases.
- Export postcheck results in a structured format for evidence collection.