Mateusz Suski 4e739c5c99
Add Linux fresh setup toolkit
2026-06-06 00:23:11 +00:00

Linux Fresh Setup Toolkit

Executive summary

The Linux Fresh Setup Toolkit is day-0 bootstrap automation for a clean Ubuntu lab server or workstation. It prepares a host for routine administration, Cockpit, Docker workloads, libvirt/KVM virtual machines, optional NVIDIA diagnostics, bounded logging, practical kernel tuning, and a conservative security baseline.

The scripts are modular and safe to rerun. Optional components remain optional, UFW is not enabled without a specific flag, and an NVIDIA driver is never installed without an explicit version. This is a portfolio and homelab implementation, not a production-certified build standard.

Scope and non-goals

The toolkit supports Ubuntu 24.04 and newer and assumes a systemd-based host with APT package management. It is suitable for a host such as ailab that may run WebODM, Open WebUI, Homepage, NVIDIA workloads, or test virtual machines.

It does not:

  • Deploy applications, containers, or virtual machines.
  • Configure GPU passthrough, VFIO bindings, bridges, or Windows guests.
  • Select an NVIDIA driver automatically.
  • Define a complete firewall policy or compliance baseline.
  • Replace backup, monitoring, patching, or ongoing maintenance processes.
  • Claim live validation against every future Ubuntu release.

Why this is separate from ailab-maintenance

This project establishes a fresh host. The sibling AI Lab Maintenance Toolkit handles day-2 health checks, scheduled cleanup, configuration backup, disk monitoring, and VM inventory after a host is operating.

Keeping bootstrap and maintenance separate makes the change boundary clear: this toolkit installs platform capabilities and baseline configuration, while the maintenance toolkit manages recurring operational tasks.

Directory layout

setup/
├── README.md
├── install.sh
├── scripts/
│   ├── 00-preflight.sh
│   ├── 00-platform-guard.inc
│   ├── 01-base-packages.sh
│   ├── 02-shell-profile.sh
│   ├── 03-cockpit.sh
│   ├── 04-docker.sh
│   ├── 05-libvirt.sh
│   ├── 06-nvidia-tools.sh
│   ├── 07-tuning.sh
│   ├── 08-security-baseline.sh
│   └── 99-postcheck.sh
├── files/
│   ├── bashrc.d/ailab.sh
│   ├── docker/daemon.json
│   ├── sysctl/99-ailab.conf
│   └── systemd/journald-ailab-limits.conf
└── docs/
    ├── fresh-install-checklist.md
    ├── cockpit.md
    ├── docker.md
    ├── libvirt.md
    ├── nvidia.md
    └── bash-shell.md

00-platform-guard.inc is an internal sourced helper used by mutating component scripts; it is not an executable profile.
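The guard's checks follow the safety model described later: mutating profiles require root and refuse non-Ubuntu systems or Ubuntu older than 24.04. A minimal sketch of such a sourced helper is shown here. The function names and the optional os-release path argument are hypothetical (the argument exists only so the logic can be exercised against a sample file), and this is not the contents of the real 00-platform-guard.inc.

```shell
# Hypothetical sketch of a sourced platform guard (not the real 00-platform-guard.inc).
# Functions return non-zero with a message instead of calling exit, so the
# sourcing script decides how to stop.

ailab_require_root() {
    if [ "$(id -u)" -ne 0 ]; then
        echo "error: this profile must run as root (use sudo)" >&2
        return 1
    fi
}

ailab_version_ge() {
    # Succeeds when dotted version $1 >= $2 (major.minor); awk turns "04" into 4.
    awk -v a="$1" -v b="$2" 'BEGIN {
        split(a, x, "."); split(b, y, ".")
        exit !((x[1] + 0 > y[1] + 0) || (x[1] + 0 == y[1] + 0 && x[2] + 0 >= y[2] + 0))
    }'
}

ailab_require_ubuntu() {
    local release_file="${1:-/etc/os-release}" min_version="${2:-24.04}"
    local os_id os_version
    if [ ! -r "$release_file" ]; then
        echo "error: cannot read $release_file" >&2
        return 1
    fi
    # os-release uses shell syntax; read it in subshells so ID and VERSION_ID
    # do not leak into (or get inherited from) the calling script.
    os_id=$(unset ID; . "$release_file" && printf '%s' "${ID:-}")
    os_version=$(unset VERSION_ID; . "$release_file" && printf '%s' "${VERSION_ID:-}")
    if [ "$os_id" != ubuntu ]; then
        echo "error: unsupported distribution '${os_id:-unknown}'" >&2
        return 1
    fi
    if ! ailab_version_ge "$os_version" "$min_version"; then
        echo "error: Ubuntu ${os_version:-unknown} is older than $min_version" >&2
        return 1
    fi
}

# Entry point for mutating component scripts.
ailab_platform_guard() {
    ailab_require_root && ailab_require_ubuntu "$@"
}
```

A component script would source the helper and call `ailab_platform_guard || exit 1` before making changes. Returning a status rather than exiting inside the helper keeps that decision with the sourcing script.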

Supported profiles and flags

Flag                              Result
--base                            Install operational CLI, diagnostic, storage, and network packages
--shell                           Install the root AI lab Bash profile
--cockpit                         Install and enable Cockpit
--docker                          Install Docker and bounded JSON-file logging
--libvirt                         Install and enable libvirt/KVM
--nvidia-tools                    Install NVIDIA and OpenCL diagnostics without a driver
--install-nvidia-driver VERSION   Install diagnostics and the named Ubuntu driver package
--tuning                          Apply journald, sysctl, sensor, and sysstat settings
--security                        Install and enable fail2ban; install but do not enable UFW
--enable-ufw                      Run security setup and explicitly enable UFW
--all                             Run every standard profile without UFW enablement or driver installation

--install-nvidia-driver implies --nvidia-tools. --enable-ufw implies --security. With no flags, the installer prints help and makes no changes.
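The flag rules above can be sketched as a small parser. This is an illustration, not the toolkit's install.sh: the function name `parse_profiles`, the global `PROFILES`, `ENABLE_UFW`, and `DRIVER_VERSION` variables, and the usage text are assumptions. Selected profiles are deduplicated and emitted in the numbered script order from the directory layout.

```shell
# Hypothetical sketch of install.sh flag handling (not the toolkit's real code).
# Results are left in globals: PROFILES (space-separated, in script order),
# ENABLE_UFW (0 or 1), and DRIVER_VERSION (empty unless requested).
parse_profiles() {
    SELECTED="" PROFILES="" ENABLE_UFW=0 DRIVER_VERSION=""
    if [ "$#" -eq 0 ]; then
        echo "usage: install.sh [profile flags]; no flags given, nothing changed"
        return 0
    fi
    while [ "$#" -gt 0 ]; do
        case $1 in
            --base|--shell|--cockpit|--docker|--libvirt|--nvidia-tools|--tuning|--security)
                SELECTED="$SELECTED ${1#--}" ;;
            --install-nvidia-driver)
                if [ "$#" -lt 2 ]; then
                    echo "error: --install-nvidia-driver requires VERSION" >&2
                    return 2
                fi
                DRIVER_VERSION=$2
                SELECTED="$SELECTED nvidia-tools"   # implies --nvidia-tools
                shift ;;
            --enable-ufw)
                ENABLE_UFW=1
                SELECTED="$SELECTED security" ;;    # implies --security
            --all)
                # Standard profiles only: no UFW enablement, no driver install.
                SELECTED="$SELECTED base shell cockpit docker libvirt nvidia-tools tuning security" ;;
            *)
                echo "error: unknown flag: $1" >&2
                return 2 ;;
        esac
        shift
    done
    # Emit each selected profile once, in numbered-script order.
    for p in base shell cockpit docker libvirt nvidia-tools tuning security; do
        case " $SELECTED " in
            *" $p "*) PROFILES="${PROFILES:+$PROFILES }$p" ;;
        esac
    done
    printf '%s\n' "$PROFILES"
}
```

For example, `parse_profiles --docker --base --docker` prints `base docker`, and `--all` on its own leaves `ENABLE_UFW=0` and `DRIVER_VERSION` empty.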

Installation examples

Review the scripts and confirm how you currently reach the host before running them:

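cd labs/linux/setup
./install.sh                                     # no flags: print help, change nothing
sudo ./install.sh --base --shell                 # operational packages and root Bash profile
sudo ./install.sh --cockpit --docker --libvirt   # add platform profiles as needed
sudo ./install.sh --all                          # every standard profile; no UFW, no driver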

Explicit high-impact options can be combined with --all:

sudo ./install.sh --all --enable-ufw                 # also enable UFW after allowing OpenSSH and 9090/tcp
sudo ./install.sh --all --install-nvidia-driver 550  # also install the nvidia-driver-550 package

The installer runs the read-only preflight once before the selected profiles, and runs the postcheck after all selected profile steps have succeeded.
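A minimal sketch of that ordering, assuming each profile maps to one numbered script under scripts/. The `SCRIPT_DIR` variable and the `script_for` and `run_selected` helpers are hypothetical, and flag arguments such as the driver version are left out for brevity.

```shell
# Hypothetical orchestration sketch; the profile-to-script mapping mirrors the
# numbered files in scripts/ but is an assumption, not install.sh itself.
SCRIPT_DIR=${SCRIPT_DIR:-./scripts}

script_for() {
    case $1 in
        base)         echo 01-base-packages.sh ;;
        shell)        echo 02-shell-profile.sh ;;
        cockpit)      echo 03-cockpit.sh ;;
        docker)       echo 04-docker.sh ;;
        libvirt)      echo 05-libvirt.sh ;;
        nvidia-tools) echo 06-nvidia-tools.sh ;;
        tuning)       echo 07-tuning.sh ;;
        security)     echo 08-security-baseline.sh ;;
        *)            return 1 ;;
    esac
}

run_selected() {
    local profile script
    # Scripts are invoked through bash, so the exec bit is not required.
    # Read-only preflight runs exactly once, before any mutating profile.
    bash "$SCRIPT_DIR/00-preflight.sh" || return 1
    for profile in "$@"; do
        script=$(script_for "$profile") || {
            echo "error: unknown profile: $profile" >&2
            return 1
        }
        echo "==> $profile"
        # Stop at the first failing profile; the failure is reported, not hidden.
        bash "$SCRIPT_DIR/$script" || {
            echo "error: profile $profile failed" >&2
            return 1
        }
    done
    # Postcheck runs only after every selected profile step succeeded.
    bash "$SCRIPT_DIR/99-postcheck.sh"
}
```

Stopping at the first failure keeps a partially configured host from being reported as healthy. The operator corrects the failing profile and reruns, which the modular, rerunnable scripts allow.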

Fresh host workflow

  1. Patch the base Ubuntu installation and confirm console or out-of-band access.
  2. Review the fresh install checklist.
  3. Run sudo ./install.sh --base --shell.
  4. Add only the platform profiles needed by the host.
  5. Review service state, listening ports, storage, networking, and warnings in the postcheck.
  6. Reboot if a driver or kernel-related package requires it.
  7. Capture host-specific configuration and backup requirements separately.

AI lab workflow

A general AI lab host can start with:

sudo ./install.sh --base --shell --cockpit --docker --nvidia-tools --tuning --security

This installs GPU diagnostics but leaves driver choice to the operator. Add libvirt only when the host will run VMs. Enable UFW only after confirming SSH, Cockpit, application, bridge, and VM networking requirements.

Safety model

  • Mutating profiles require root and refuse non-Ubuntu systems or Ubuntu older than 24.04.
  • Component profiles install their own direct prerequisites.
  • Existing managed configuration is changed only when content differs.
  • Changed root shell, Docker, journald, and sysctl files receive timestamped backups.
  • Existing valid Docker JSON is merged so unrelated settings survive.
  • Invalid Docker JSON stops configuration rather than being overwritten.
  • UFW and NVIDIA driver installation require explicit flags.
  • Package and service failures are not hidden.
  • Postcheck warnings flag optional or inactive components without turning a successfully completed diagnostic run into a failure.

APT installation and service restarts are real system changes. Test first on a disposable host and maintain a console path when changing remote access policy.
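The "change only when content differs" and "timestamped backup" rules above can be sketched as a single helper. The function name, the `FILE.YYYYmmdd-HHMMSS.bak` backup naming, and the `changed`/`unchanged` status words are assumptions for illustration; the toolkit's actual naming may differ.

```shell
# Hypothetical helper: install SOURCE as TARGET only when the content differs.
# Prints "changed" or "unchanged"; returns non-zero only on real errors, so a
# failure is never confused with "nothing to do".
install_managed_file() {
    local src="$1" dst="$2"
    if [ ! -r "$src" ]; then
        echo "error: cannot read $src" >&2
        return 1
    fi
    if [ -f "$dst" ] && cmp -s "$src" "$dst"; then
        echo unchanged    # identical: no backup, no service restart needed
        return 0
    fi
    if [ -f "$dst" ]; then
        # Keep the previous version with a sortable timestamp for rollback.
        cp -p "$dst" "$dst.$(date +%Y%m%d-%H%M%S).bak" || return 1
    fi
    cp "$src" "$dst" || return 1
    echo changed
}
```

A caller can then reload the owning service only on a real change, for example `state=$(install_managed_file files/sysctl/99-ailab.conf /etc/sysctl.d/99-ailab.conf) || exit 1`, followed by `sysctl --system` only when `$state` is `changed`. The destination path in that example is assumed.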

Bash shell profile

The shell profile is installed as /root/.bashrc.d/ailab.sh, and one exact source line is maintained in /root/.bashrc. It adds concise helpers for systemd, journals, Docker, libvirt, NVIDIA, ports, archives, and disk usage.

See Bash shell profile for command details and cautions.
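Keeping exactly one source line is usually done by appending the line only when an identical whole line is absent. A sketch, where `ensure_line` and the guarded form of the source line are assumptions rather than the toolkit's exact text:

```shell
# Hypothetical sketch: make sure RC_FILE contains LINE exactly once.
ensure_line() {
    local line="$1" rc_file="$2"
    touch "$rc_file" || return 1
    # -x: whole-line match, -F: fixed string (no regex), -q: no output.
    if grep -qxF -- "$line" "$rc_file"; then
        return 0    # already present: leave the file untouched
    fi
    # Add a newline first if the file does not end with one.
    if [ -s "$rc_file" ] && [ -n "$(tail -c 1 "$rc_file")" ]; then
        printf '\n' >> "$rc_file"
    fi
    printf '%s\n' "$line" >> "$rc_file"
}

# Assumed form of the source line; the -f test avoids errors if the profile is removed.
AILAB_SOURCE_LINE='[ -f /root/.bashrc.d/ailab.sh ] && . /root/.bashrc.d/ailab.sh'
```

Because the match is exact, a hand-edited variant of the line is not recognized and the canonical line is appended alongside it; this sketch accepts that trade-off in exchange for a strictly managed line.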

Cockpit setup

Cockpit provides browser-based host, storage, network, package, VM, metrics, and support-report views. The installer enables cockpit.socket and reports https://HOSTNAME:9090. cockpit-files is optional because it is not available in the enabled repositories of every Ubuntu release.

See Cockpit setup.
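One way to keep cockpit-files optional is to install it only when APT reports an install candidate. The sketch below reads `apt-cache policy` output on standard input; the helper name is hypothetical and the parsing assumes apt's usual `Candidate:` line.

```shell
# Succeeds when `apt-cache policy PKG` output (read on stdin) shows a candidate.
# apt prints "Candidate: (none)" for known packages without an installable
# version, and no Candidate line at all for packages it does not know.
has_install_candidate() {
    local candidate
    candidate=$(awk '$1 == "Candidate:" { print $2; exit }')
    [ -n "$candidate" ] && [ "$candidate" != "(none)" ]
}

# Example use on a real host (requires root):
# systemctl enable --now cockpit.socket
# if apt-cache policy cockpit-files | has_install_candidate; then
#     apt-get install -y cockpit-files
# else
#     echo "warning: cockpit-files not available; skipping" >&2
# fi
```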

Docker setup

The Ubuntu docker.io package is preferred; the official Docker repository is configured only when docker.io is unavailable. The daemon uses the json-file log driver and keeps at most five 50 MB log files per container, roughly 250 MB per container.

The toolkit configures log retention only. It does not prune data, deploy Compose applications, or configure an NVIDIA container runtime.

See Docker setup.
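The validate, merge, and fail-on-invalid behavior described in the safety model can be sketched with an embedded Python step, assuming python3 is available (it ships with standard Ubuntu installations). The function name is hypothetical. It only prints the merged document; writing it with a backup and restarting Docker are left to the caller.

```shell
# Hypothetical sketch: print daemon.json with bounded json-file logging merged in.
# Unrelated keys in an existing valid file are preserved; invalid JSON or a
# non-object document makes the function fail without printing a result.
merge_docker_logging() {
    python3 - "$1" <<'PY'
import json
import os
import sys

path = sys.argv[1]
config = {}
if os.path.exists(path) and os.path.getsize(path) > 0:
    try:
        with open(path) as fh:
            config = json.load(fh)
    except ValueError as exc:  # JSONDecodeError and decoding errors
        sys.exit(f"error: {path} is not valid JSON: {exc}")
    if not isinstance(config, dict):
        sys.exit(f"error: {path} must contain a JSON object")

opts = config.get("log-opts", {})
if not isinstance(opts, dict):
    sys.exit(f"error: log-opts in {path} must be an object")
config["log-driver"] = "json-file"
# Docker expects log-opts values as strings: up to five files of 50 MB each.
opts.update({"max-size": "50m", "max-file": "5"})
config["log-opts"] = opts
print(json.dumps(config, indent=2, sort_keys=True))
PY
}
```

A caller might run `merged=$(merge_docker_logging /etc/docker/daemon.json) || exit 1` so that invalid JSON stops the profile before anything is written.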

libvirt/KVM setup

The libvirt profile installs QEMU, OVMF, software TPM support, virt-install, virt-manager, bridge utilities, and libvirt clients and services. It enables libvirtd and prints existing guests and networks.

See libvirt/KVM setup.

NVIDIA tooling

The default NVIDIA profile installs nvtop, clinfo, and PCI diagnostics. It reports detected NVIDIA devices, nvidia-smi output, and DKMS state when the corresponding commands exist.

Driver installation requires a numeric version that maps to an available Ubuntu package, for example nvidia-driver-550. Secure Boot enrollment, driver suitability, CUDA, container runtime support, and passthrough remain operator decisions.

See NVIDIA tooling.
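The numeric-version rule can be sketched as a small mapping function, with the APT availability check and installation shown as comments because they require root. The helper name is hypothetical.

```shell
# Hypothetical sketch: map an operator-supplied driver VERSION to an Ubuntu
# package name, rejecting anything that is not purely numeric.
driver_package_for() {
    case $1 in
        ''|*[!0-9]*)
            echo "error: driver version must be numeric, e.g. 550 (got '$1')" >&2
            return 1 ;;
    esac
    printf 'nvidia-driver-%s\n' "$1"
}

# On a real host (requires root), confirm the package exists before installing:
# pkg=$(driver_package_for "$DRIVER_VERSION") || exit 1
# apt-cache show "$pkg" >/dev/null 2>&1 || { echo "error: $pkg not available" >&2; exit 1; }
# apt-get install -y "$pkg"
```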

Tuning

The tuning profile bounds persistent journal use, raises inotify limits for development and container workloads, reduces swappiness, enables sysstat, and runs automatic sensor detection when available.

Review these values against available memory, storage, monitoring retention, and workload behavior before deployment beyond a lab.
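For that review, the kind of drop-in files the tuning profile manages can be rendered as below. The values (inotify limits, swappiness, journal size) and the journald drop-in path are illustrative assumptions, not the contents of files/sysctl/99-ailab.conf or files/systemd/journald-ailab-limits.conf.

```shell
# Hypothetical sketch: render tuning drop-ins with illustrative default values.
# Override any value through the environment before calling.
render_sysctl_conf() {
    cat <<EOF
# Illustrative values; review against memory and workload before use
fs.inotify.max_user_watches = ${INOTIFY_WATCHES:-524288}
fs.inotify.max_user_instances = ${INOTIFY_INSTANCES:-512}
vm.swappiness = ${SWAPPINESS:-10}
EOF
}

render_journald_conf() {
    cat <<EOF
[Journal]
SystemMaxUse=${JOURNAL_MAX_USE:-1G}
EOF
}

# On a real host (requires root; destination paths are assumptions):
# render_sysctl_conf > /etc/sysctl.d/99-ailab.conf && sysctl --system
# render_journald_conf > /etc/systemd/journald.conf.d/ailab-limits.conf
# systemctl restart systemd-journald
```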

Security baseline

The security profile installs UFW and fail2ban and enables fail2ban. It leaves UFW disabled unless --enable-ufw is present. With --enable-ufw, the profile allows OpenSSH and TCP port 9090 (Cockpit) before activating the firewall.

This is a minimal access-preservation baseline, not a complete host firewall or hardening standard. Application and VM networking may require additional reviewed rules.
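The explicit-enable ordering can be sketched with a dry-run wrapper so the sequence is visible before anything changes. `run`, `enable_ufw_baseline`, and the `DRY_RUN` variable are hypothetical; `OpenSSH` is the UFW application profile shipped with Ubuntu's openssh-server package.

```shell
# Hypothetical sketch of explicit UFW enablement. With DRY_RUN=1 (the default
# here) commands are printed instead of executed, so the order can be reviewed.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

enable_ufw_baseline() {
    # Allow the remote access paths first so activation cannot cut off SSH or Cockpit.
    run ufw allow OpenSSH || return 1
    run ufw allow 9090/tcp || return 1
    # --force skips the interactive confirmation prompt.
    run ufw --force enable || return 1
    run ufw status verbose
}
```

Run it as root with `DRY_RUN=0` to apply. The allow rules come before `ufw --force enable` because `--force` suppresses UFW's warning that enabling may disrupt existing SSH connections.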

Postcheck

The final script reports:

  • Failed systemd units.
  • Cockpit, Docker, libvirt, and fail2ban status when installed.
  • Running Docker containers and defined virtual machines.
  • NVIDIA runtime state.
  • Filesystem usage and listening ports.

Warnings require operator review, but the absence of an optional component does not cause the postcheck itself to fail.
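The warning-versus-failure distinction can be sketched as a counter that is reported but does not change the exit status. The helper names are hypothetical, and the systemd check is shown as a comment because it needs a running systemd.

```shell
# Hypothetical sketch of postcheck warning accounting: missing or inactive
# optional components are counted and reported, but the check still exits 0.
WARNINGS=0

warn() {
    WARNINGS=$((WARNINGS + 1))
    printf 'WARN: %s\n' "$*"
}

check_optional_cmd() {
    # $1: command name, $2: human-readable component name
    if command -v "$1" >/dev/null 2>&1; then
        printf 'OK:   %s (%s found)\n' "$2" "$1"
    else
        warn "$2 not installed ($1 missing)"
    fi
}

postcheck_summary() {
    printf 'postcheck finished with %d warning(s)\n' "$WARNINGS"
    return 0    # warnings need review but do not fail the postcheck
}

# A real postcheck might also run, for example:
# check_optional_cmd docker Docker; check_optional_cmd virsh libvirt
# if systemctl --failed --no-legend | grep -q .; then warn "failed systemd units present"; fi
```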

Troubleshooting

Run individual read-only checks after correcting a failed profile:

sudo ./scripts/00-preflight.sh
sudo ./scripts/99-postcheck.sh
systemctl --failed
journalctl -u docker -u libvirtd -u cockpit.socket -u fail2ban

Common failure areas are unavailable APT repositories, unsupported package names on a future Ubuntu release, invalid pre-existing Docker JSON, Secure Boot module signing, disabled CPU virtualization, and remote firewall assumptions.
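For the disabled-virtualization case, a quick check looks for the Intel VT-x (`vmx`) or AMD-V (`svm`) CPU flag and for `/dev/kvm`. The helpers are hypothetical; the optional cpuinfo path argument exists only so the logic can be tested against a sample file.

```shell
# Hypothetical troubleshooting helper: print the hardware virtualization flag
# (vmx for Intel VT-x, svm for AMD-V) or fail when neither is present.
cpu_virt_flag() {
    local cpuinfo="${1:-/proc/cpuinfo}"
    if grep -qw vmx "$cpuinfo"; then
        echo vmx
    elif grep -qw svm "$cpuinfo"; then
        echo svm
    else
        return 1
    fi
}

check_kvm() {
    local flag
    if flag=$(cpu_virt_flag "$@"); then
        echo "CPU virtualization flag: $flag"
    else
        echo "warning: no vmx/svm flag; virtualization may be disabled in firmware" >&2
        return 1
    fi
    if [ -e /dev/kvm ]; then
        echo "/dev/kvm present"
    else
        echo "warning: /dev/kvm missing; check that the kvm modules are loaded" >&2
        return 1
    fi
}
```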

To roll back a managed configuration, compare the current file with its timestamped .bak copy, restore the reviewed backup, and restart or reload the owning service. Package removal is intentionally not automated because it may affect workloads and dependencies.
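A sketch of finding the newest backup for that comparison. It assumes backups are named `FILE.<timestamp>.bak` with a lexically sortable timestamp; the helper name and the naming scheme are assumptions, so adjust the glob to the real format.

```shell
# Hypothetical rollback aid: print the newest FILE.<timestamp>.bak for FILE.
# Assumes sortable timestamps such as 20250101-120000.
latest_backup() {
    local newest="" candidate
    for candidate in "$1".*.bak; do
        [ -e "$candidate" ] || continue    # glob matched nothing
        newest=$candidate                  # glob results are sorted, so the last wins
    done
    if [ -z "$newest" ]; then
        echo "no backup found for $1" >&2
        return 1
    fi
    printf '%s\n' "$newest"
}

# Review first, restore only after reading the diff, then restart the owner:
# bak=$(latest_backup /etc/docker/daemon.json) && diff -u "$bak" /etc/docker/daemon.json
# cp -p "$bak" /etc/docker/daemon.json && systemctl restart docker
```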

Interview talking points

  • Why day-0 bootstrap and day-2 maintenance have separate ownership.
  • How explicit flags protect firewall and GPU driver decisions.
  • Why Docker JSON is validated, backed up, and merged.
  • How idempotent content checks prevent backup and restart churn.
  • Why preflight and postcheck evidence surround mutating profiles.
  • Which virtualization, Secure Boot, IOMMU, and GPU decisions remain manual.

Future improvements

  • Add automated tests using disposable Ubuntu VMs.
  • Add a documented NVIDIA Container Toolkit profile.
  • Add optional non-root administrative user and group membership management.
  • Add bridge and VFIO planning checks without applying passthrough changes.
  • Add package compatibility matrices after validating future Ubuntu releases.
  • Export postcheck results in a structured format for evidence collection.