This commit is contained in:
@@ -0,0 +1,276 @@
|
||||
# Linux Fresh Setup Toolkit
|
||||
|
||||
## Executive summary
|
||||
|
||||
The Linux Fresh Setup Toolkit is day-0 bootstrap automation for a clean Ubuntu
|
||||
lab server or workstation. It prepares a host for routine administration,
|
||||
Cockpit, Docker workloads, libvirt/KVM virtual machines, optional NVIDIA
|
||||
diagnostics, bounded logging, practical kernel tuning, and a conservative
|
||||
security baseline.
|
||||
|
||||
The scripts are modular and safe to rerun. Optional components remain optional,
|
||||
UFW is not enabled without a specific flag, and an NVIDIA driver is never
|
||||
installed without an explicit version. This is a portfolio and homelab
|
||||
implementation, not a production-certified build standard.
|
||||
|
||||
## Scope and non-goals
|
||||
|
||||
The toolkit supports Ubuntu 24.04 and newer and assumes a systemd-based host
|
||||
with APT package management. It is suitable for a host such as `ailab` that may
|
||||
run WebODM, Open WebUI, Homepage, NVIDIA workloads, or test virtual machines.
|
||||
|
||||
It does not:
|
||||
|
||||
- Deploy applications, containers, or virtual machines.
|
||||
- Configure GPU passthrough, VFIO bindings, bridges, or Windows guests.
|
||||
- Select an NVIDIA driver automatically.
|
||||
- Define a complete firewall policy or compliance baseline.
|
||||
- Replace backup, monitoring, patching, or ongoing maintenance processes.
|
||||
- Claim live validation against every future Ubuntu release.
|
||||
|
||||
## Why this is separate from ailab-maintenance
|
||||
|
||||
This project establishes a fresh host. The sibling
|
||||
[AI Lab Maintenance Toolkit](../ailab-maintenance/) handles day-2 health
|
||||
checks, scheduled cleanup, configuration backup, disk monitoring, and VM
|
||||
inventory after a host is operating.
|
||||
|
||||
Keeping bootstrap and maintenance separate makes the change boundary clear:
|
||||
this toolkit installs platform capabilities and baseline configuration, while
|
||||
the maintenance toolkit manages recurring operational tasks.
|
||||
|
||||
## Directory layout
|
||||
|
||||
```text
|
||||
setup/
|
||||
├── README.md
|
||||
├── install.sh
|
||||
├── scripts/
|
||||
│ ├── 00-preflight.sh
|
||||
│ ├── 00-platform-guard.inc
|
||||
│ ├── 01-base-packages.sh
|
||||
│ ├── 02-shell-profile.sh
|
||||
│ ├── 03-cockpit.sh
|
||||
│ ├── 04-docker.sh
|
||||
│ ├── 05-libvirt.sh
|
||||
│ ├── 06-nvidia-tools.sh
|
||||
│ ├── 07-tuning.sh
|
||||
│ ├── 08-security-baseline.sh
|
||||
│ └── 99-postcheck.sh
|
||||
├── files/
|
||||
│ ├── bashrc.d/ailab.sh
|
||||
│ ├── docker/daemon.json
|
||||
│ ├── sysctl/99-ailab.conf
|
||||
│ └── systemd/journald-ailab-limits.conf
|
||||
└── docs/
|
||||
├── fresh-install-checklist.md
|
||||
├── cockpit.md
|
||||
├── docker.md
|
||||
├── libvirt.md
|
||||
├── nvidia.md
|
||||
└── bash-shell.md
|
||||
```
|
||||
|
||||
`00-platform-guard.inc` is an internal sourced helper used by mutating
|
||||
component scripts; it is not an executable profile.
|
||||
|
||||
## Supported profiles and flags
|
||||
|
||||
| Flag | Result |
|
||||
| --- | --- |
|
||||
| `--base` | Install operational CLI, diagnostic, storage, and network packages |
|
||||
| `--shell` | Install the root AI lab Bash profile |
|
||||
| `--cockpit` | Install and enable Cockpit |
|
||||
| `--docker` | Install Docker and bounded JSON-file logging |
|
||||
| `--libvirt` | Install and enable libvirt/KVM |
|
||||
| `--nvidia-tools` | Install NVIDIA and OpenCL diagnostics without a driver |
|
||||
| `--install-nvidia-driver VERSION` | Install diagnostics and the named Ubuntu driver package |
|
||||
| `--tuning` | Apply journald, sysctl, sensor, and sysstat settings |
|
||||
| `--security` | Install and enable fail2ban; install but do not enable UFW |
|
||||
| `--enable-ufw` | Run security setup and explicitly enable UFW |
|
||||
| `--all` | Run every standard profile without UFW enablement or driver installation |
|
||||
|
||||
`--install-nvidia-driver` implies `--nvidia-tools`. `--enable-ufw` implies
|
||||
`--security`. With no flags, the installer prints help and makes no changes.
|
||||
|
||||
## Installation examples
|
||||
|
||||
Review the scripts and current host access path before execution:
|
||||
|
||||
```bash
|
||||
cd labs/linux/setup
|
||||
./install.sh
|
||||
sudo ./install.sh --base --shell
|
||||
sudo ./install.sh --cockpit --docker --libvirt
|
||||
sudo ./install.sh --all
|
||||
```
|
||||
|
||||
Explicit high-impact options can be combined with `--all`:
|
||||
|
||||
```bash
|
||||
sudo ./install.sh --all --enable-ufw
|
||||
sudo ./install.sh --all --install-nvidia-driver 550
|
||||
```
|
||||
|
||||
The installer runs the read-only preflight once before selected profiles and a
|
||||
postcheck after all successful profile steps.
|
||||
|
||||
## Fresh host workflow
|
||||
|
||||
1. Patch the base Ubuntu installation and confirm console or out-of-band access.
|
||||
2. Review [the fresh install checklist](docs/fresh-install-checklist.md).
|
||||
3. Run `sudo ./install.sh --base --shell`.
|
||||
4. Add only the platform profiles needed by the host.
|
||||
5. Review service state, listening ports, storage, networking, and warnings in
|
||||
the postcheck.
|
||||
6. Reboot if a driver or kernel-related package requires it.
|
||||
7. Capture host-specific configuration and backup requirements separately.
|
||||
|
||||
## AI lab workflow
|
||||
|
||||
A general AI lab host can start with:
|
||||
|
||||
```bash
|
||||
sudo ./install.sh --base --shell --cockpit --docker --nvidia-tools --tuning --security
|
||||
```
|
||||
|
||||
This installs GPU diagnostics but leaves driver choice to the operator. Add
|
||||
libvirt only when the host will run VMs. Enable UFW only after confirming SSH,
|
||||
Cockpit, application, bridge, and VM networking requirements.
|
||||
|
||||
## Safety model
|
||||
|
||||
- Mutating profiles require root and refuse non-Ubuntu systems or Ubuntu older
|
||||
than 24.04.
|
||||
- Component profiles install their own direct prerequisites.
|
||||
- Existing managed configuration is changed only when content differs.
|
||||
- Changed root shell, Docker, journald, and sysctl files receive timestamped
|
||||
backups.
|
||||
- Existing valid Docker JSON is merged so unrelated settings survive.
|
||||
- Invalid Docker JSON stops configuration rather than being overwritten.
|
||||
- UFW and NVIDIA driver installation require explicit flags.
|
||||
- Package and service failures are not hidden.
|
||||
- Postcheck warnings report optional or inactive components without masking a
|
||||
successfully completed diagnostic script.
|
||||
|
||||
APT installation and service restarts are real system changes. Test first on a
|
||||
disposable host and maintain a console path when changing remote access policy.
|
||||
|
||||
## Bash shell profile
|
||||
|
||||
The shell profile is installed as `/root/.bashrc.d/ailab.sh`, and one exact
|
||||
source line is maintained in `/root/.bashrc`. It adds concise helpers for
|
||||
systemd, journals, Docker, libvirt, NVIDIA, ports, archives, and disk usage.
|
||||
|
||||
See [Bash shell profile](docs/bash-shell.md) for command details and cautions.
|
||||
|
||||
## Cockpit setup
|
||||
|
||||
Cockpit provides browser-based host, storage, network, package, VM, metrics,
|
||||
and support-report views. The installer enables `cockpit.socket` and reports
|
||||
`https://HOSTNAME:9090`. `cockpit-files` is optional because it is not
|
||||
available in every enabled Ubuntu repository.
|
||||
|
||||
See [Cockpit setup](docs/cockpit.md).
|
||||
|
||||
## Docker setup
|
||||
|
||||
The Ubuntu `docker.io` package path is preferred. The Docker official
|
||||
repository is configured only when `docker.io` is unavailable. The daemon uses
|
||||
the `json-file` log driver with five 50 MB files per container.
|
||||
|
||||
The toolkit configures log retention only. It does not prune data, deploy
|
||||
Compose applications, or configure an NVIDIA container runtime.
|
||||
|
||||
See [Docker setup](docs/docker.md).
|
||||
|
||||
## libvirt/KVM setup
|
||||
|
||||
The libvirt profile installs QEMU, OVMF, software TPM support, virt-install,
|
||||
virt-manager, bridge utilities, and libvirt clients and services. It enables
|
||||
`libvirtd` and prints existing guests and networks.
|
||||
|
||||
See [libvirt/KVM setup](docs/libvirt.md).
|
||||
|
||||
## NVIDIA tooling
|
||||
|
||||
The default NVIDIA profile installs `nvtop`, `clinfo`, and PCI diagnostics.
|
||||
It reports detected NVIDIA devices, `nvidia-smi`, and DKMS state when those
|
||||
commands exist.
|
||||
|
||||
Driver installation requires a numeric version that maps to an available
|
||||
Ubuntu package, for example `nvidia-driver-550`. Secure Boot enrollment,
|
||||
driver suitability, CUDA, container runtime support, and passthrough remain
|
||||
operator decisions.
|
||||
|
||||
See [NVIDIA tooling](docs/nvidia.md).
|
||||
|
||||
## Tuning
|
||||
|
||||
The tuning profile bounds persistent journal use, raises inotify limits for
|
||||
development and container workloads, reduces swappiness, enables sysstat, and
|
||||
runs automatic sensor detection when available.
|
||||
|
||||
Review these values against available memory, storage, monitoring retention,
|
||||
and workload behavior before deployment beyond a lab.
|
||||
|
||||
## Security baseline
|
||||
|
||||
The security profile installs UFW and fail2ban and enables fail2ban. It leaves
|
||||
UFW disabled unless `--enable-ufw` is present. Explicit UFW enablement permits
|
||||
OpenSSH and TCP port 9090 before activation.
|
||||
|
||||
This is a minimal access-preservation baseline, not a complete host firewall or
|
||||
hardening standard. Application and VM networking may require additional
|
||||
reviewed rules.
|
||||
|
||||
## Postcheck
|
||||
|
||||
The final script reports:
|
||||
|
||||
- Failed systemd units.
|
||||
- Cockpit, Docker, libvirt, and fail2ban status when installed.
|
||||
- Running Docker containers and defined virtual machines.
|
||||
- NVIDIA runtime state.
|
||||
- Filesystem usage and listening ports.
|
||||
|
||||
Warnings require operator review but optional component absence does not cause
|
||||
the postcheck itself to fail.
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
Run individual read-only checks after correcting a failed profile:
|
||||
|
||||
```bash
|
||||
sudo ./scripts/00-preflight.sh
|
||||
sudo ./scripts/99-postcheck.sh
|
||||
systemctl --failed
|
||||
journalctl -u docker -u libvirtd -u cockpit.socket -u fail2ban
|
||||
```
|
||||
|
||||
Common failure areas are unavailable APT repositories, unsupported package
|
||||
names on a future Ubuntu release, invalid pre-existing Docker JSON, Secure Boot
|
||||
module signing, disabled CPU virtualization, and remote firewall assumptions.
|
||||
|
||||
To roll back a managed configuration, compare the current file with its
|
||||
timestamped `.bak` copy, restore the reviewed backup, and restart or reload the
|
||||
owning service. Package removal is intentionally not automated because it may
|
||||
affect workloads and dependencies.
|
||||
|
||||
## Interview talking points
|
||||
|
||||
- Why day-0 bootstrap and day-2 maintenance have separate ownership.
|
||||
- How explicit flags protect firewall and GPU driver decisions.
|
||||
- Why Docker JSON is validated, backed up, and merged.
|
||||
- How idempotent content checks prevent backup and restart churn.
|
||||
- Why preflight and postcheck evidence surround mutating profiles.
|
||||
- Which virtualization, Secure Boot, IOMMU, and GPU decisions remain manual.
|
||||
|
||||
## Future improvements
|
||||
|
||||
- Add automated tests using disposable Ubuntu VMs.
|
||||
- Add a documented NVIDIA Container Toolkit profile.
|
||||
- Add optional non-root administrative user and group membership management.
|
||||
- Add bridge and VFIO planning checks without applying passthrough changes.
|
||||
- Add package compatibility matrices after validating future Ubuntu releases.
|
||||
- Export postcheck results in a structured format for evidence collection.
|
||||
Reference in New Issue
Block a user