277 lines
10 KiB
Markdown
277 lines
10 KiB
Markdown
|
|
# Linux Fresh Setup Toolkit
|
||
|
|
|
||
|
|
## Executive summary
|
||
|
|
|
||
|
|
The Linux Fresh Setup Toolkit is day-0 bootstrap automation for a clean Ubuntu
|
||
|
|
lab server or workstation. It prepares a host for routine administration,
|
||
|
|
Cockpit, Docker workloads, libvirt/KVM virtual machines, optional NVIDIA
|
||
|
|
diagnostics, bounded logging, practical kernel tuning, and a conservative
|
||
|
|
security baseline.
|
||
|
|
|
||
|
|
The scripts are modular and safe to rerun. Optional components remain optional,
|
||
|
|
UFW is not enabled without a specific flag, and an NVIDIA driver is never
|
||
|
|
installed without an explicit version. This is a portfolio and homelab
|
||
|
|
implementation, not a production-certified build standard.
|
||
|
|
|
||
|
|
## Scope and non-goals
|
||
|
|
|
||
|
|
The toolkit supports Ubuntu 24.04 and newer and assumes a systemd-based host
|
||
|
|
with APT package management. It is suitable for a host such as `ailab` that may
|
||
|
|
run WebODM, Open WebUI, Homepage, NVIDIA workloads, or test virtual machines.
|
||
|
|
|
||
|
|
It does not:
|
||
|
|
|
||
|
|
- Deploy applications, containers, or virtual machines.
|
||
|
|
- Configure GPU passthrough, VFIO bindings, bridges, or Windows guests.
|
||
|
|
- Select an NVIDIA driver automatically.
|
||
|
|
- Define a complete firewall policy or compliance baseline.
|
||
|
|
- Replace backup, monitoring, patching, or ongoing maintenance processes.
|
||
|
|
- Claim live validation against every future Ubuntu release.
|
||
|
|
|
||
|
|
## Why this is separate from ailab-maintenance
|
||
|
|
|
||
|
|
This project establishes a fresh host. The sibling
|
||
|
|
[AI Lab Maintenance Toolkit](../ailab-maintenance/) handles day-2 health
|
||
|
|
checks, scheduled cleanup, configuration backup, disk monitoring, and VM
|
||
|
|
inventory after a host is operating.
|
||
|
|
|
||
|
|
Keeping bootstrap and maintenance separate makes the change boundary clear:
|
||
|
|
this toolkit installs platform capabilities and baseline configuration, while
|
||
|
|
the maintenance toolkit manages recurring operational tasks.
|
||
|
|
|
||
|
|
## Directory layout
|
||
|
|
|
||
|
|
```text
|
||
|
|
setup/
|
||
|
|
├── README.md
|
||
|
|
├── install.sh
|
||
|
|
├── scripts/
|
||
|
|
│ ├── 00-preflight.sh
|
||
|
|
│ ├── 00-platform-guard.inc
|
||
|
|
│ ├── 01-base-packages.sh
|
||
|
|
│ ├── 02-shell-profile.sh
|
||
|
|
│ ├── 03-cockpit.sh
|
||
|
|
│ ├── 04-docker.sh
|
||
|
|
│ ├── 05-libvirt.sh
|
||
|
|
│ ├── 06-nvidia-tools.sh
|
||
|
|
│ ├── 07-tuning.sh
|
||
|
|
│ ├── 08-security-baseline.sh
|
||
|
|
│ └── 99-postcheck.sh
|
||
|
|
├── files/
|
||
|
|
│ ├── bashrc.d/ailab.sh
|
||
|
|
│ ├── docker/daemon.json
|
||
|
|
│ ├── sysctl/99-ailab.conf
|
||
|
|
│ └── systemd/journald-ailab-limits.conf
|
||
|
|
└── docs/
|
||
|
|
├── fresh-install-checklist.md
|
||
|
|
├── cockpit.md
|
||
|
|
├── docker.md
|
||
|
|
├── libvirt.md
|
||
|
|
├── nvidia.md
|
||
|
|
└── bash-shell.md
|
||
|
|
```
|
||
|
|
|
||
|
|
`00-platform-guard.inc` is an internal sourced helper used by mutating
|
||
|
|
component scripts; it is not an executable profile.
|
||
|
|
|
||
|
|
## Supported profiles and flags
|
||
|
|
|
||
|
|
| Flag | Result |
|
||
|
|
| --- | --- |
|
||
|
|
| `--base` | Install operational CLI, diagnostic, storage, and network packages |
|
||
|
|
| `--shell` | Install the root AI lab Bash profile |
|
||
|
|
| `--cockpit` | Install and enable Cockpit |
|
||
|
|
| `--docker` | Install Docker and bounded JSON-file logging |
|
||
|
|
| `--libvirt` | Install and enable libvirt/KVM |
|
||
|
|
| `--nvidia-tools` | Install NVIDIA and OpenCL diagnostics without a driver |
|
||
|
|
| `--install-nvidia-driver VERSION` | Install diagnostics and the named Ubuntu driver package |
|
||
|
|
| `--tuning` | Apply journald, sysctl, sensor, and sysstat settings |
|
||
|
|
| `--security` | Install and enable fail2ban; install but do not enable UFW |
|
||
|
|
| `--enable-ufw` | Run security setup and explicitly enable UFW |
|
||
|
|
| `--all` | Run every standard profile without UFW enablement or driver installation |
|
||
|
|
|
||
|
|
`--install-nvidia-driver` implies `--nvidia-tools`. `--enable-ufw` implies
|
||
|
|
`--security`. With no flags, the installer prints help and makes no changes.
|
||
|
|
|
||
|
|
## Installation examples
|
||
|
|
|
||
|
|
Review the scripts and current host access path before execution:
|
||
|
|
|
||
|
|
```bash
|
||
|
|
cd labs/linux/setup
|
||
|
|
./install.sh
|
||
|
|
sudo ./install.sh --base --shell
|
||
|
|
sudo ./install.sh --cockpit --docker --libvirt
|
||
|
|
sudo ./install.sh --all
|
||
|
|
```
|
||
|
|
|
||
|
|
Explicit high-impact options can be combined with `--all`:
|
||
|
|
|
||
|
|
```bash
|
||
|
|
sudo ./install.sh --all --enable-ufw
|
||
|
|
sudo ./install.sh --all --install-nvidia-driver 550
|
||
|
|
```
|
||
|
|
|
||
|
|
The installer runs the read-only preflight once before selected profiles and a
|
||
|
|
postcheck after all successful profile steps.
|
||
|
|
|
||
|
|
## Fresh host workflow
|
||
|
|
|
||
|
|
1. Patch the base Ubuntu installation and confirm console or out-of-band access.
|
||
|
|
2. Review [the fresh install checklist](docs/fresh-install-checklist.md).
|
||
|
|
3. Run `sudo ./install.sh --base --shell`.
|
||
|
|
4. Add only the platform profiles needed by the host.
|
||
|
|
5. Review service state, listening ports, storage, networking, and warnings in
|
||
|
|
the postcheck.
|
||
|
|
6. Reboot if a driver or kernel-related package requires it.
|
||
|
|
7. Capture host-specific configuration and backup requirements separately.
|
||
|
|
|
||
|
|
## AI lab workflow
|
||
|
|
|
||
|
|
A general AI lab host can start with:
|
||
|
|
|
||
|
|
```bash
|
||
|
|
sudo ./install.sh --base --shell --cockpit --docker --nvidia-tools --tuning --security
|
||
|
|
```
|
||
|
|
|
||
|
|
This installs GPU diagnostics but leaves driver choice to the operator. Add
|
||
|
|
libvirt only when the host will run VMs. Enable UFW only after confirming SSH,
|
||
|
|
Cockpit, application, bridge, and VM networking requirements.
|
||
|
|
|
||
|
|
## Safety model
|
||
|
|
|
||
|
|
- Mutating profiles require root and refuse non-Ubuntu systems or Ubuntu older
|
||
|
|
than 24.04.
|
||
|
|
- Component profiles install their own direct prerequisites.
|
||
|
|
- Existing managed configuration is changed only when content differs.
|
||
|
|
- Changed root shell, Docker, journald, and sysctl files receive timestamped
|
||
|
|
backups.
|
||
|
|
- Existing valid Docker JSON is merged so unrelated settings survive.
|
||
|
|
- Invalid Docker JSON stops configuration rather than being overwritten.
|
||
|
|
- UFW and NVIDIA driver installation require explicit flags.
|
||
|
|
- Package and service failures are not hidden.
|
||
|
|
- Postcheck warnings report optional or inactive components without masking a
|
||
|
|
successfully completed diagnostic script.
|
||
|
|
|
||
|
|
APT installation and service restarts are real system changes. Test first on a
|
||
|
|
disposable host and maintain a console path when changing remote access policy.
|
||
|
|
|
||
|
|
## Bash shell profile
|
||
|
|
|
||
|
|
The shell profile is installed as `/root/.bashrc.d/ailab.sh`, and one exact
|
||
|
|
source line is maintained in `/root/.bashrc`. It adds concise helpers for
|
||
|
|
systemd, journals, Docker, libvirt, NVIDIA, ports, archives, and disk usage.
|
||
|
|
|
||
|
|
See [Bash shell profile](docs/bash-shell.md) for command details and cautions.
|
||
|
|
|
||
|
|
## Cockpit setup
|
||
|
|
|
||
|
|
Cockpit provides browser-based host, storage, network, package, VM, metrics,
|
||
|
|
and support-report views. The installer enables `cockpit.socket` and reports
|
||
|
|
`https://HOSTNAME:9090`. `cockpit-files` is optional because it is not
|
||
|
|
available in every enabled Ubuntu repository.
|
||
|
|
|
||
|
|
See [Cockpit setup](docs/cockpit.md).
|
||
|
|
|
||
|
|
## Docker setup
|
||
|
|
|
||
|
|
The Ubuntu `docker.io` package path is preferred. The Docker official
|
||
|
|
repository is configured only when `docker.io` is unavailable. The daemon uses
|
||
|
|
the `json-file` log driver with five 50 MB files per container.
|
||
|
|
|
||
|
|
The toolkit configures log retention only. It does not prune data, deploy
|
||
|
|
Compose applications, or configure an NVIDIA container runtime.
|
||
|
|
|
||
|
|
See [Docker setup](docs/docker.md).
|
||
|
|
|
||
|
|
## libvirt/KVM setup
|
||
|
|
|
||
|
|
The libvirt profile installs QEMU, OVMF, software TPM support, virt-install,
|
||
|
|
virt-manager, bridge utilities, and libvirt clients and services. It enables
|
||
|
|
`libvirtd` and prints existing guests and networks.
|
||
|
|
|
||
|
|
See [libvirt/KVM setup](docs/libvirt.md).
|
||
|
|
|
||
|
|
## NVIDIA tooling
|
||
|
|
|
||
|
|
The default NVIDIA profile installs `nvtop`, `clinfo`, and PCI diagnostics.
|
||
|
|
It reports detected NVIDIA devices, `nvidia-smi`, and DKMS state when those
|
||
|
|
commands exist.
|
||
|
|
|
||
|
|
Driver installation requires a numeric version that maps to an available
|
||
|
|
Ubuntu package, for example `nvidia-driver-550`. Secure Boot enrollment,
|
||
|
|
driver suitability, CUDA, container runtime support, and passthrough remain
|
||
|
|
operator decisions.
|
||
|
|
|
||
|
|
See [NVIDIA tooling](docs/nvidia.md).
|
||
|
|
|
||
|
|
## Tuning
|
||
|
|
|
||
|
|
The tuning profile bounds persistent journal use, raises inotify limits for
|
||
|
|
development and container workloads, reduces swappiness, enables sysstat, and
|
||
|
|
runs automatic sensor detection when available.
|
||
|
|
|
||
|
|
Review these values against available memory, storage, monitoring retention,
|
||
|
|
and workload behavior before deployment beyond a lab.
|
||
|
|
|
||
|
|
## Security baseline
|
||
|
|
|
||
|
|
The security profile installs UFW and fail2ban and enables fail2ban. It leaves
|
||
|
|
UFW disabled unless `--enable-ufw` is present. Explicit UFW enablement permits
|
||
|
|
OpenSSH and TCP port 9090 before activation.
|
||
|
|
|
||
|
|
This is a minimal access-preservation baseline, not a complete host firewall or
|
||
|
|
hardening standard. Application and VM networking may require additional
|
||
|
|
reviewed rules.
|
||
|
|
|
||
|
|
## Postcheck
|
||
|
|
|
||
|
|
The final script reports:
|
||
|
|
|
||
|
|
- Failed systemd units.
|
||
|
|
- Cockpit, Docker, libvirt, and fail2ban status when installed.
|
||
|
|
- Running Docker containers and defined virtual machines.
|
||
|
|
- NVIDIA runtime state.
|
||
|
|
- Filesystem usage and listening ports.
|
||
|
|
|
||
|
|
Warnings require operator review but optional component absence does not cause
|
||
|
|
the postcheck itself to fail.
|
||
|
|
|
||
|
|
## Troubleshooting
|
||
|
|
|
||
|
|
Run individual read-only checks after correcting a failed profile:
|
||
|
|
|
||
|
|
```bash
|
||
|
|
sudo ./scripts/00-preflight.sh
|
||
|
|
sudo ./scripts/99-postcheck.sh
|
||
|
|
systemctl --failed
|
||
|
|
journalctl -u docker -u libvirtd -u cockpit.socket -u fail2ban
|
||
|
|
```
|
||
|
|
|
||
|
|
Common failure areas are unavailable APT repositories, unsupported package
|
||
|
|
names on a future Ubuntu release, invalid pre-existing Docker JSON, Secure Boot
|
||
|
|
module signing, disabled CPU virtualization, and remote firewall assumptions.
|
||
|
|
|
||
|
|
To roll back a managed configuration, compare the current file with its
|
||
|
|
timestamped `.bak` copy, restore the reviewed backup, and restart or reload the
|
||
|
|
owning service. Package removal is intentionally not automated because it may
|
||
|
|
affect workloads and dependencies.
|
||
|
|
|
||
|
|
## Interview talking points
|
||
|
|
|
||
|
|
- Why day-0 bootstrap and day-2 maintenance have separate ownership.
|
||
|
|
- How explicit flags protect firewall and GPU driver decisions.
|
||
|
|
- Why Docker JSON is validated, backed up, and merged.
|
||
|
|
- How idempotent content checks prevent backup and restart churn.
|
||
|
|
- Why preflight and postcheck evidence surround mutating profiles.
|
||
|
|
- Which virtualization, Secure Boot, IOMMU, and GPU decisions remain manual.
|
||
|
|
|
||
|
|
## Future improvements
|
||
|
|
|
||
|
|
- Add automated tests using disposable Ubuntu VMs.
|
||
|
|
- Add a documented NVIDIA Container Toolkit profile.
|
||
|
|
- Add optional non-root administrative user and group membership management.
|
||
|
|
- Add bridge and VFIO planning checks without applying passthrough changes.
|
||
|
|
- Add package compatibility matrices after validating future Ubuntu releases.
|
||
|
|
- Export postcheck results in a structured format for evidence collection.
|