Compare commits
2 Commits
1843796e92
...
main
| Author | SHA1 | Date | |
|---|---|---|---|
| 4e739c5c99 | |||
| 8cb92de06f |
@@ -4,6 +4,8 @@
|
|||||||
|
|
||||||
### Added
|
### Added
|
||||||
|
|
||||||
|
- Added Linux Fresh Setup Toolkit under `labs/linux/setup` for day-0 Ubuntu lab host bootstrap automation.
|
||||||
|
- Added AI Lab Maintenance Toolkit with systemd-based Linux maintenance automation.
|
||||||
- Python tooling validation for operational scripts.
|
- Python tooling validation for operational scripts.
|
||||||
- `incident-log-summary` for general incident log summarization.
|
- `incident-log-summary` for general incident log summarization.
|
||||||
- `log-diff-checker` for pre-change and post-change log comparison.
|
- `log-diff-checker` for pre-change and post-change log comparison.
|
||||||
|
|||||||
@@ -10,6 +10,11 @@ Current subdirectories are planning areas unless their own README documents a ru
|
|||||||
- `ci-cd`
|
- `ci-cd`
|
||||||
- `docker`
|
- `docker`
|
||||||
|
|
||||||
|
## Linux operations labs
|
||||||
|
|
||||||
|
- [Linux Fresh Setup Toolkit](./linux/setup/) - Bootstrap automation for fresh Ubuntu lab hosts, including shell profile, Cockpit, Docker, libvirt/KVM, NVIDIA diagnostics, tuning and safe baseline defaults.
|
||||||
|
- [AI Lab Maintenance Toolkit](./linux/ailab-maintenance/) - Homelab-safe Linux maintenance automation for an Ubuntu AI infrastructure host, covering cleanup, health checks, config backup, Docker hygiene, kernel safety and systemd timers.
|
||||||
|
|
||||||
Lab content should document prerequisites, topology, validation, cleanup, and what remains untested. Do not present lab behavior as production-ready.
|
Lab content should document prerequisites, topology, validation, cleanup, and what remains untested. Do not present lab behavior as production-ready.
|
||||||
|
|
||||||
Planned lab topics are tracked in [ROADMAP.md](../ROADMAP.md). For Codex-driven changes, use [AGENTS.md](../AGENTS.md) and the templates under [docs/codex](../docs/codex/).
|
Planned lab topics are tracked in [ROADMAP.md](../ROADMAP.md). For Codex-driven changes, use [AGENTS.md](../AGENTS.md) and the templates under [docs/codex](../docs/codex/).
|
||||||
|
|||||||
@@ -0,0 +1,308 @@
|
|||||||
|
# AI Lab Maintenance Toolkit
|
||||||
|
|
||||||
|
## Executive summary
|
||||||
|
|
||||||
|
The AI Lab Maintenance Toolkit is a Bash and systemd operations lab for an
|
||||||
|
Ubuntu AI infrastructure host named `ailab`. It combines repeatable health
|
||||||
|
reporting, disk monitoring, conservative package cleanup, Docker hygiene,
|
||||||
|
configuration backup, and non-destructive VM inventory into a small toolkit
|
||||||
|
that is readable enough for review and guarded enough for homelab use.
|
||||||
|
|
||||||
|
This is a portfolio and lab implementation, not evidence of production
|
||||||
|
certification. Review package policy, backup coverage, maintenance windows, and
|
||||||
|
application impact before deploying it to another host.
|
||||||
|
|
||||||
|
## Problem solved
|
||||||
|
|
||||||
|
AI lab hosts accumulate operating system packages, kernel packages, container
|
||||||
|
images, build cache, journals, and configuration changes while also carrying
|
||||||
|
stateful workloads. Manual maintenance is easy to defer and risky to perform
|
||||||
|
without evidence. This project provides scheduled, logged tasks with explicit
|
||||||
|
safety boundaries and separate read-only audit commands.
|
||||||
|
|
||||||
|
## What this demonstrates
|
||||||
|
|
||||||
|
- Bash strict mode, input validation, dependency checks, and operational exit
|
||||||
|
codes.
|
||||||
|
- Dry-run-first maintenance with explicit authorization for changes.
|
||||||
|
- systemd oneshot services and persistent calendar timers.
|
||||||
|
- APT-managed kernel cleanup suitable for HWE, NVIDIA, DKMS, and VFIO review.
|
||||||
|
- Docker cleanup that preserves volumes.
|
||||||
|
- Configuration-focused backups with bounded retention.
|
||||||
|
- Optional discovery for Docker, libvirt, NVIDIA, SMART, and systemd.
|
||||||
|
- Idempotent installation and guarded JSON configuration updates.
|
||||||
|
|
||||||
|
## Architecture and directory layout
|
||||||
|
|
||||||
|
```text
|
||||||
|
ailab-maintenance/
|
||||||
|
├── README.md
|
||||||
|
├── install.sh
|
||||||
|
├── scripts/
|
||||||
|
│ ├── ailab-healthcheck.sh
|
||||||
|
│ ├── ailab-disk-watch.sh
|
||||||
|
│ ├── ailab-apt-cleanup.sh
|
||||||
|
│ ├── ailab-kernel-cleanup.sh
|
||||||
|
│ ├── ailab-docker-cleanup.sh
|
||||||
|
│ ├── ailab-config-backup.sh
|
||||||
|
│ └── ailab-vm-audit.sh
|
||||||
|
└── systemd/
|
||||||
|
├── ailab-apt-cleanup.service
|
||||||
|
├── ailab-apt-cleanup.timer
|
||||||
|
├── ailab-kernel-cleanup.service
|
||||||
|
├── ailab-kernel-cleanup.timer
|
||||||
|
├── ailab-docker-cleanup.service
|
||||||
|
├── ailab-docker-cleanup.timer
|
||||||
|
├── ailab-config-backup.service
|
||||||
|
├── ailab-config-backup.timer
|
||||||
|
├── ailab-disk-watch.service
|
||||||
|
└── ailab-disk-watch.timer
|
||||||
|
```
|
||||||
|
|
||||||
|
The installer deploys scripts to `/usr/local/sbin` and units to
|
||||||
|
`/etc/systemd/system`. Scripts run directly as root from systemd rather than
|
||||||
|
through an additional framework.
|
||||||
|
|
||||||
|
## Maintenance tasks
|
||||||
|
|
||||||
|
| Command | Purpose | Change behavior |
|
||||||
|
| --- | --- | --- |
|
||||||
|
| `ailab-healthcheck.sh` | Host, storage, service, container, VM, GPU, and SMART report | Read-only |
|
||||||
|
| `ailab-disk-watch.sh` | Filesystem threshold check | Read-only |
|
||||||
|
| `ailab-apt-cleanup.sh` | APT metadata refresh and unused package cleanup | Dry-run by default |
|
||||||
|
| `ailab-kernel-cleanup.sh` | APT-managed kernel package cleanup | Dry-run by default |
|
||||||
|
| `ailab-docker-cleanup.sh` | Unused Docker object and build-cache cleanup | Dry-run by default |
|
||||||
|
| `ailab-config-backup.sh` | Configuration archive and retention | Dry-run by default |
|
||||||
|
| `ailab-vm-audit.sh` | VM, pool, volume, and image-file inventory | Read-only |
|
||||||
|
|
||||||
|
## Safety model
|
||||||
|
|
||||||
|
Change-capable scripts default to dry-run behavior. Manual execution requires
|
||||||
|
`--execute` and an interactive `EXECUTE` confirmation. The systemd services
|
||||||
|
use `--execute --non-interactive`; installing and enabling those reviewed unit
|
||||||
|
files is the explicit authorization for scheduled maintenance.
|
||||||
|
|
||||||
|
Exit codes follow the repository convention:
|
||||||
|
|
||||||
|
- `0`: completed successfully or an optional component was absent.
|
||||||
|
- `1`: an operational check or maintenance action failed.
|
||||||
|
- `2`: invalid input, missing required dependency, or insufficient privilege.
|
||||||
|
|
||||||
|
The scripts do not bypass APT or Docker locks, delete VM resources, manually
|
||||||
|
select kernel names for removal, or hide command failures.
|
||||||
|
|
||||||
|
## Installation
|
||||||
|
|
||||||
|
Review every script and unit first. Installation changes package state,
|
||||||
|
journald settings, Docker daemon settings when Docker exists, and enabled timer
|
||||||
|
state.
|
||||||
|
|
||||||
|
```bash
|
||||||
|
cd labs/linux/ailab-maintenance
|
||||||
|
sudo ./install.sh
|
||||||
|
```
|
||||||
|
|
||||||
|
The installer:
|
||||||
|
|
||||||
|
1. Installs the documented Ubuntu utilities.
|
||||||
|
2. Deploys scripts and systemd units with fixed permissions.
|
||||||
|
3. Writes `/etc/systemd/journald.conf.d/ailab-limits.conf`.
|
||||||
|
4. Restarts `systemd-journald`.
|
||||||
|
5. Validates and backs up an existing Docker `daemon.json`, merges log limits
|
||||||
|
with `jq`, and attempts a Docker restart.
|
||||||
|
6. Enables all five timers.
|
||||||
|
7. Writes an initial report to `/root/ailab-healthcheck-now.txt`.
|
||||||
|
|
||||||
|
The installer is intended for Ubuntu 26.04. It is not run automatically by
|
||||||
|
repository validation.
|
||||||
|
|
||||||
|
## Manual commands
|
||||||
|
|
||||||
|
Read-only reports:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
sudo /usr/local/sbin/ailab-healthcheck.sh
|
||||||
|
sudo /usr/local/sbin/ailab-disk-watch.sh
|
||||||
|
sudo /usr/local/sbin/ailab-vm-audit.sh
|
||||||
|
```
|
||||||
|
|
||||||
|
Preview maintenance:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
sudo /usr/local/sbin/ailab-apt-cleanup.sh
|
||||||
|
sudo /usr/local/sbin/ailab-kernel-cleanup.sh
|
||||||
|
sudo /usr/local/sbin/ailab-docker-cleanup.sh
|
||||||
|
sudo /usr/local/sbin/ailab-config-backup.sh
|
||||||
|
```
|
||||||
|
|
||||||
|
Apply reviewed maintenance interactively:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
sudo /usr/local/sbin/ailab-apt-cleanup.sh --execute
|
||||||
|
sudo /usr/local/sbin/ailab-kernel-cleanup.sh --execute
|
||||||
|
sudo /usr/local/sbin/ailab-docker-cleanup.sh --execute
|
||||||
|
sudo /usr/local/sbin/ailab-config-backup.sh --execute
|
||||||
|
```
|
||||||
|
|
||||||
|
`--non-interactive` is reserved for reviewed automation and is rejected unless
|
||||||
|
`--execute` is also present.
|
||||||
|
|
||||||
|
## Systemd timers
|
||||||
|
|
||||||
|
| Timer | Schedule |
|
||||||
|
| --- | --- |
|
||||||
|
| `ailab-config-backup.timer` | Daily at 03:30 |
|
||||||
|
| `ailab-disk-watch.timer` | Hourly |
|
||||||
|
| `ailab-apt-cleanup.timer` | Sunday at 04:00 |
|
||||||
|
| `ailab-kernel-cleanup.timer` | Sunday at 04:20 |
|
||||||
|
| `ailab-docker-cleanup.timer` | Sunday at 04:40 |
|
||||||
|
|
||||||
|
All timers use `Persistent=true`, so a missed event runs after the host becomes
|
||||||
|
available. Inspect timer and service evidence with:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
systemctl list-timers --all | grep ailab-
|
||||||
|
systemctl status ailab-config-backup.timer
|
||||||
|
journalctl -u ailab-kernel-cleanup.service
|
||||||
|
```
|
||||||
|
|
||||||
|
## Logs
|
||||||
|
|
||||||
|
Scheduled and manual maintenance writes to:
|
||||||
|
|
||||||
|
```text
|
||||||
|
/var/log/ailab-apt-cleanup.log
|
||||||
|
/var/log/ailab-kernel-cleanup.log
|
||||||
|
/var/log/ailab-docker-cleanup.log
|
||||||
|
/var/log/ailab-config-backup.log
|
||||||
|
/var/log/ailab-disk-watch.log
|
||||||
|
```
|
||||||
|
|
||||||
|
systemd also records service output in the journal. Logrotate is installed as a
|
||||||
|
dependency, but this lab does not create a custom rotation policy for these
|
||||||
|
small maintenance logs.
|
||||||
|
|
||||||
|
## Docker policy
|
||||||
|
|
||||||
|
Docker cleanup runs `docker system prune -af` and removes build cache older
|
||||||
|
than seven days. It never passes `--volumes`. Named and anonymous volumes
|
||||||
|
remain outside this automated policy and require application-aware review.
|
||||||
|
|
||||||
|
The installer configures the `json-file` driver with a maximum size of `50m`
|
||||||
|
and five files. Existing valid JSON is backed up and merged. Invalid JSON
|
||||||
|
causes installation to stop rather than overwrite operator configuration.
|
||||||
|
|
||||||
|
## Kernel policy
|
||||||
|
|
||||||
|
Kernel removal is delegated to `apt autoremove --purge`; package names are not
|
||||||
|
constructed or purged with regular expressions. Before execution, the script
|
||||||
|
logs the APT simulation and refuses cleanup unless at least two installed
|
||||||
|
versioned kernel image packages remain after simulated removals.
|
||||||
|
|
||||||
|
This protects a fallback kernel while preserving Ubuntu dependency policy.
|
||||||
|
Operators must still review DKMS builds, NVIDIA compatibility, VFIO bindings,
|
||||||
|
Secure Boot state, and the simulated removal set before manual execution.
|
||||||
|
|
||||||
|
## Backup policy
|
||||||
|
|
||||||
|
Backups are written to `/srv/backups/ailab-config` as
|
||||||
|
`ailab-config-YYYYMMDD-HHMMSS.tar.gz`. Matching archives older than 30 days are
|
||||||
|
deleted only after a new archive is created.
|
||||||
|
|
||||||
|
The backup covers `/etc`, selected root shell configuration,
|
||||||
|
`/opt/ailab-maintenance` when present, and libvirt configuration under
|
||||||
|
`/var/lib/libvirt/qemu`. It does not include `/var/lib/docker`, WebODM data,
|
||||||
|
Ollama models, VM disk images, or other large application datasets. Because
|
||||||
|
`/etc` is included, explicitly listed configuration subdirectories are already
|
||||||
|
covered even when optional-path reporting mentions them separately.
|
||||||
|
|
||||||
|
This is a local configuration backup, not a disaster-recovery design. A real
|
||||||
|
deployment should copy archives to independently protected storage and test
|
||||||
|
restoration.
|
||||||
|
|
||||||
|
## Journald policy
|
||||||
|
|
||||||
|
The installer applies:
|
||||||
|
|
||||||
|
```ini
|
||||||
|
[Journal]
|
||||||
|
SystemMaxUse=1G
|
||||||
|
SystemKeepFree=2G
|
||||||
|
MaxRetentionSec=14day
|
||||||
|
Compress=yes
|
||||||
|
```
|
||||||
|
|
||||||
|
These settings bound journal growth while retaining useful troubleshooting
|
||||||
|
evidence. Capacity and retention should be adjusted to the host's disk size
|
||||||
|
and incident-response requirements.
|
||||||
|
|
||||||
|
## Disk watch policy
|
||||||
|
|
||||||
|
The disk check uses `df -P`, defaults to an 85 percent threshold, and returns
|
||||||
|
`1` when any checked filesystem meets or exceeds the threshold. Override the
|
||||||
|
threshold for a manual or unit invocation with:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
sudo AILAB_DISK_THRESHOLD=90 /usr/local/sbin/ailab-disk-watch.sh
|
||||||
|
```
|
||||||
|
|
||||||
|
The script reports every filesystem as `OK` or `WARNING`; it does not delete
|
||||||
|
data or attempt remediation.
|
||||||
|
|
||||||
|
## Example operational workflows
|
||||||
|
|
||||||
|
### Weekly maintenance review
|
||||||
|
|
||||||
|
```bash
|
||||||
|
sudo /usr/local/sbin/ailab-healthcheck.sh
|
||||||
|
sudo /usr/local/sbin/ailab-kernel-cleanup.sh
|
||||||
|
sudo /usr/local/sbin/ailab-docker-cleanup.sh
|
||||||
|
systemctl list-timers --all | grep ailab-
|
||||||
|
```
|
||||||
|
|
||||||
|
Review the kernel simulation, Docker usage, failed units, backup freshness, and
|
||||||
|
disk warnings before approving manual changes.
|
||||||
|
|
||||||
|
### Disk pressure investigation
|
||||||
|
|
||||||
|
```bash
|
||||||
|
sudo AILAB_DISK_THRESHOLD=80 /usr/local/sbin/ailab-disk-watch.sh
|
||||||
|
sudo docker system df
|
||||||
|
sudo journalctl --disk-usage
|
||||||
|
sudo /usr/local/sbin/ailab-vm-audit.sh
|
||||||
|
```
|
||||||
|
|
||||||
|
Use the evidence to identify ownership. Do not treat Docker pruning or file
|
||||||
|
deletion as a substitute for application-specific retention policy.
|
||||||
|
|
||||||
|
### Post-maintenance evidence
|
||||||
|
|
||||||
|
```bash
|
||||||
|
sudo /usr/local/sbin/ailab-healthcheck.sh \
|
||||||
|
| sudo tee /root/ailab-healthcheck-after-maintenance.txt
|
||||||
|
journalctl --since today -u 'ailab-*.service'
|
||||||
|
```
|
||||||
|
|
||||||
|
## Interview talking points
|
||||||
|
|
||||||
|
- Why timer units explicitly carry the non-interactive execution boundary.
|
||||||
|
- Why APT dependency policy is safer than regex-based kernel deletion.
|
||||||
|
- How Docker volume preservation separates platform hygiene from application
|
||||||
|
data lifecycle decisions.
|
||||||
|
- How optional dependency handling keeps one health command useful across
|
||||||
|
container, GPU, and virtualization host variants.
|
||||||
|
- Why configuration backup and application-data backup are separate concerns.
|
||||||
|
- How exit codes, persistent timers, logs, and post-checks support operations.
|
||||||
|
|
||||||
|
## Future improvements
|
||||||
|
|
||||||
|
- Add a dedicated logrotate policy after measuring log growth.
|
||||||
|
- Export disk-watch status to a monitoring system instead of relying only on
|
||||||
|
timer failure state.
|
||||||
|
- Add automated archive integrity checks and off-host replication.
|
||||||
|
- Add Bats tests using mocked `apt`, `docker`, `virsh`, and `systemctl`
|
||||||
|
commands.
|
||||||
|
- Add package-lock detection with bounded retry policy if recurring contention
|
||||||
|
is observed.
|
||||||
|
- Validate NVIDIA DKMS state and libvirt GPU passthrough configuration in a
|
||||||
|
dedicated read-only audit.
|
||||||
Executable
+103
@@ -0,0 +1,103 @@
|
|||||||
|
#!/usr/bin/env bash
|
||||||
|
set -euo pipefail
|
||||||
|
|
||||||
|
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
|
||||||
|
JOURNALD_DROP_IN="/etc/systemd/journald.conf.d/ailab-limits.conf"
|
||||||
|
DOCKER_CONFIG="/etc/docker/daemon.json"
|
||||||
|
packages=(
|
||||||
|
logrotate
|
||||||
|
needrestart
|
||||||
|
smartmontools
|
||||||
|
nvme-cli
|
||||||
|
sysstat
|
||||||
|
iotop
|
||||||
|
ncdu
|
||||||
|
duf
|
||||||
|
jq
|
||||||
|
lsof
|
||||||
|
psmisc
|
||||||
|
tar
|
||||||
|
gzip
|
||||||
|
)
|
||||||
|
timers=(
|
||||||
|
ailab-apt-cleanup.timer
|
||||||
|
ailab-kernel-cleanup.timer
|
||||||
|
ailab-docker-cleanup.timer
|
||||||
|
ailab-config-backup.timer
|
||||||
|
ailab-disk-watch.timer
|
||||||
|
)
|
||||||
|
|
||||||
|
if ((EUID != 0)); then
|
||||||
|
printf 'CRITICAL: install.sh must run as root\n' >&2
|
||||||
|
exit 2
|
||||||
|
fi
|
||||||
|
for command_name in apt-get install systemctl; do
|
||||||
|
if ! command -v "$command_name" >/dev/null 2>&1; then
|
||||||
|
printf 'CRITICAL: required command is missing: %s\n' "$command_name" >&2
|
||||||
|
exit 2
|
||||||
|
fi
|
||||||
|
done
|
||||||
|
|
||||||
|
printf 'Installing maintenance dependencies...\n'
|
||||||
|
apt-get update
|
||||||
|
DEBIAN_FRONTEND=noninteractive apt-get install -y "${packages[@]}"
|
||||||
|
|
||||||
|
printf 'Installing scripts and systemd units...\n'
|
||||||
|
for script in "$SCRIPT_DIR"/scripts/*.sh; do
|
||||||
|
install -m 0755 "$script" "/usr/local/sbin/$(basename "$script")"
|
||||||
|
done
|
||||||
|
for unit in "$SCRIPT_DIR"/systemd/*.{service,timer}; do
|
||||||
|
install -m 0644 "$unit" "/etc/systemd/system/$(basename "$unit")"
|
||||||
|
done
|
||||||
|
|
||||||
|
install -d -m 0755 "$(dirname "$JOURNALD_DROP_IN")"
|
||||||
|
tmp_journald="$(mktemp)"
|
||||||
|
trap 'rm -f "$tmp_journald" "${tmp_docker:-}"' EXIT
|
||||||
|
cat >"$tmp_journald" <<'EOF'
|
||||||
|
[Journal]
|
||||||
|
SystemMaxUse=1G
|
||||||
|
SystemKeepFree=2G
|
||||||
|
MaxRetentionSec=14day
|
||||||
|
Compress=yes
|
||||||
|
EOF
|
||||||
|
install -m 0644 "$tmp_journald" "$JOURNALD_DROP_IN"
|
||||||
|
systemctl restart systemd-journald
|
||||||
|
|
||||||
|
if command -v docker >/dev/null 2>&1; then
|
||||||
|
printf 'Configuring Docker log rotation limits...\n'
|
||||||
|
install -d -m 0755 /etc/docker
|
||||||
|
tmp_docker="$(mktemp)"
|
||||||
|
|
||||||
|
if [[ -f "$DOCKER_CONFIG" ]]; then
|
||||||
|
if ! jq empty "$DOCKER_CONFIG" >/dev/null 2>&1; then
|
||||||
|
printf 'CRITICAL: %s is not valid JSON; refusing to overwrite it\n' "$DOCKER_CONFIG" >&2
|
||||||
|
exit 1
|
||||||
|
fi
|
||||||
|
backup="$DOCKER_CONFIG.$(date '+%Y%m%d-%H%M%S').bak"
|
||||||
|
install -m 0644 "$DOCKER_CONFIG" "$backup"
|
||||||
|
jq '. + {
|
||||||
|
"log-driver": "json-file",
|
||||||
|
"log-opts": ((."log-opts" // {}) + {"max-size": "50m", "max-file": "5"})
|
||||||
|
}' "$DOCKER_CONFIG" >"$tmp_docker"
|
||||||
|
else
|
||||||
|
jq -n '{
|
||||||
|
"log-driver": "json-file",
|
||||||
|
"log-opts": {"max-size": "50m", "max-file": "5"}
|
||||||
|
}' >"$tmp_docker"
|
||||||
|
fi
|
||||||
|
|
||||||
|
jq empty "$tmp_docker"
|
||||||
|
install -m 0644 "$tmp_docker" "$DOCKER_CONFIG"
|
||||||
|
systemctl restart docker || true
|
||||||
|
else
|
||||||
|
printf 'INFO: Docker is not installed; Docker daemon configuration was skipped\n'
|
||||||
|
fi
|
||||||
|
|
||||||
|
systemctl daemon-reload
|
||||||
|
systemctl enable --now "${timers[@]}"
|
||||||
|
|
||||||
|
printf '\nEnabled AI Lab timers:\n'
|
||||||
|
systemctl list-timers --all --no-pager | grep 'ailab-' || true
|
||||||
|
|
||||||
|
/usr/local/sbin/ailab-healthcheck.sh > /root/ailab-healthcheck-now.txt
|
||||||
|
printf '\nOK: installation complete; initial health report: /root/ailab-healthcheck-now.txt\n'
|
||||||
@@ -0,0 +1,66 @@
|
|||||||
|
#!/usr/bin/env bash
|
||||||
|
set -o errexit
|
||||||
|
set -o nounset
|
||||||
|
set -o pipefail
|
||||||
|
|
||||||
|
LOG_FILE="/var/log/ailab-apt-cleanup.log"
|
||||||
|
execute=false
|
||||||
|
non_interactive=false
|
||||||
|
|
||||||
|
usage() {
|
||||||
|
printf 'Usage: %s [--execute [--non-interactive]]\n' "$(basename "$0")"
|
||||||
|
}
|
||||||
|
|
||||||
|
while (($# > 0)); do
|
||||||
|
case "$1" in
|
||||||
|
--execute) execute=true ;;
|
||||||
|
--non-interactive) non_interactive=true ;;
|
||||||
|
-h|--help) usage; exit 0 ;;
|
||||||
|
*) printf 'CRITICAL: unknown argument: %s\n' "$1" >&2; usage >&2; exit 2 ;;
|
||||||
|
esac
|
||||||
|
shift
|
||||||
|
done
|
||||||
|
|
||||||
|
if [[ "$non_interactive" == true && "$execute" != true ]]; then
|
||||||
|
printf 'CRITICAL: --non-interactive requires --execute\n' >&2
|
||||||
|
exit 2
|
||||||
|
fi
|
||||||
|
if ((EUID != 0)); then
|
||||||
|
printf 'CRITICAL: this script must run as root\n' >&2
|
||||||
|
exit 2
|
||||||
|
fi
|
||||||
|
if ! command -v apt >/dev/null 2>&1; then
|
||||||
|
printf 'CRITICAL: apt is required\n' >&2
|
||||||
|
exit 2
|
||||||
|
fi
|
||||||
|
|
||||||
|
exec > >(tee -a "$LOG_FILE") 2>&1
|
||||||
|
printf '\n[%s] APT cleanup\n' "$(date --iso-8601=seconds)"
|
||||||
|
|
||||||
|
if [[ "$execute" != true ]]; then
|
||||||
|
printf 'INFO: dry-run mode; apt update, autoremove, autoclean, and needrestart are not executed\n'
|
||||||
|
printf 'INFO: simulated autoremove follows\n'
|
||||||
|
LC_ALL=C apt -s autoremove --purge
|
||||||
|
printf 'INFO: rerun with --execute and confirm to apply changes\n'
|
||||||
|
exit 0
|
||||||
|
fi
|
||||||
|
|
||||||
|
if [[ "$non_interactive" != true ]]; then
|
||||||
|
printf 'WARNING: this will update APT metadata and remove packages marked as automatically installed and unused.\n'
|
||||||
|
printf 'Type EXECUTE to continue: '
|
||||||
|
read -r confirmation
|
||||||
|
if [[ "$confirmation" != "EXECUTE" ]]; then
|
||||||
|
printf 'CRITICAL: confirmation failed; no changes made\n'
|
||||||
|
exit 2
|
||||||
|
fi
|
||||||
|
fi
|
||||||
|
|
||||||
|
apt update
|
||||||
|
apt autoremove --purge -y
|
||||||
|
apt autoclean -y
|
||||||
|
if command -v needrestart >/dev/null 2>&1; then
|
||||||
|
needrestart -b || true
|
||||||
|
else
|
||||||
|
printf 'WARNING: needrestart is not installed\n'
|
||||||
|
fi
|
||||||
|
printf 'OK: APT cleanup completed\n'
|
||||||
@@ -0,0 +1,90 @@
|
|||||||
|
#!/usr/bin/env bash
|
||||||
|
set -o errexit
|
||||||
|
set -o nounset
|
||||||
|
set -o pipefail
|
||||||
|
|
||||||
|
LOG_FILE="/var/log/ailab-config-backup.log"
|
||||||
|
BACKUP_DIR="/srv/backups/ailab-config"
|
||||||
|
RETENTION_DAYS=30
|
||||||
|
execute=false
|
||||||
|
non_interactive=false
|
||||||
|
|
||||||
|
usage() {
|
||||||
|
printf 'Usage: %s [--execute [--non-interactive]]\n' "$(basename "$0")"
|
||||||
|
}
|
||||||
|
|
||||||
|
while (($# > 0)); do
|
||||||
|
case "$1" in
|
||||||
|
--execute) execute=true ;;
|
||||||
|
--non-interactive) non_interactive=true ;;
|
||||||
|
-h|--help) usage; exit 0 ;;
|
||||||
|
*) printf 'CRITICAL: unknown argument: %s\n' "$1" >&2; usage >&2; exit 2 ;;
|
||||||
|
esac
|
||||||
|
shift
|
||||||
|
done
|
||||||
|
|
||||||
|
if [[ "$non_interactive" == true && "$execute" != true ]]; then
|
||||||
|
printf 'CRITICAL: --non-interactive requires --execute\n' >&2
|
||||||
|
exit 2
|
||||||
|
fi
|
||||||
|
if ((EUID != 0)); then
|
||||||
|
printf 'CRITICAL: this script must run as root\n' >&2
|
||||||
|
exit 2
|
||||||
|
fi
|
||||||
|
for command_name in tar gzip find; do
|
||||||
|
if ! command -v "$command_name" >/dev/null 2>&1; then
|
||||||
|
printf 'CRITICAL: required command is missing: %s\n' "$command_name" >&2
|
||||||
|
exit 2
|
||||||
|
fi
|
||||||
|
done
|
||||||
|
|
||||||
|
exec > >(tee -a "$LOG_FILE") 2>&1
|
||||||
|
timestamp="$(date '+%Y%m%d-%H%M%S')"
|
||||||
|
archive="$BACKUP_DIR/ailab-config-$timestamp.tar.gz"
|
||||||
|
candidate_paths=(
|
||||||
|
/etc
|
||||||
|
/root/.bashrc
|
||||||
|
/root/.bashrc.d
|
||||||
|
/opt/ailab-maintenance
|
||||||
|
/var/lib/libvirt/qemu
|
||||||
|
)
|
||||||
|
source_paths=()
|
||||||
|
|
||||||
|
printf '\n[%s] Configuration backup\n' "$(date --iso-8601=seconds)"
|
||||||
|
for path in "${candidate_paths[@]}"; do
|
||||||
|
if [[ -e "$path" ]]; then
|
||||||
|
source_paths+=("${path#/}")
|
||||||
|
printf 'OK: include %s\n' "$path"
|
||||||
|
else
|
||||||
|
printf 'INFO: optional path is absent: %s\n' "$path"
|
||||||
|
fi
|
||||||
|
done
|
||||||
|
|
||||||
|
if ((${#source_paths[@]} == 0)); then
|
||||||
|
printf 'CRITICAL: no backup source paths are present\n'
|
||||||
|
exit 1
|
||||||
|
fi
|
||||||
|
|
||||||
|
printf 'Backup destination: %s\n' "$archive"
|
||||||
|
printf 'Retention: matching archives older than %d days\n' "$RETENTION_DAYS"
|
||||||
|
printf 'Configuration beneath /etc includes libvirt, Docker, and systemd when present\n'
|
||||||
|
printf 'Excluded by policy: Docker data, application data, model data, and VM disk images\n'
|
||||||
|
|
||||||
|
if [[ "$execute" != true ]]; then
|
||||||
|
printf 'INFO: dry-run mode; no archive or directory was created and no retention deletion ran\n'
|
||||||
|
exit 0
|
||||||
|
fi
|
||||||
|
|
||||||
|
if [[ "$non_interactive" != true ]]; then
|
||||||
|
printf 'Type EXECUTE to create the archive and apply retention: '
|
||||||
|
read -r confirmation
|
||||||
|
if [[ "$confirmation" != "EXECUTE" ]]; then
|
||||||
|
printf 'CRITICAL: confirmation failed; no changes made\n'
|
||||||
|
exit 2
|
||||||
|
fi
|
||||||
|
fi
|
||||||
|
|
||||||
|
install -d -m 0750 "$BACKUP_DIR"
|
||||||
|
tar --create --gzip --file "$archive" --ignore-failed-read --directory / -- "${source_paths[@]}"
|
||||||
|
find "$BACKUP_DIR" -maxdepth 1 -type f -name 'ailab-config-*.tar.gz' -mtime "+$RETENTION_DAYS" -print -delete
|
||||||
|
printf 'OK: configuration backup created: %s\n' "$archive"
|
||||||
@@ -0,0 +1,38 @@
|
|||||||
|
#!/usr/bin/env bash
|
||||||
|
set -o errexit
|
||||||
|
set -o nounset
|
||||||
|
set -o pipefail
|
||||||
|
|
||||||
|
LOG_FILE="/var/log/ailab-disk-watch.log"
|
||||||
|
threshold="${AILAB_DISK_THRESHOLD:-85}"
|
||||||
|
|
||||||
|
if ((EUID != 0)); then
|
||||||
|
printf 'CRITICAL: this script must run as root to write %s\n' "$LOG_FILE" >&2
|
||||||
|
exit 2
|
||||||
|
fi
|
||||||
|
|
||||||
|
if [[ ! "$threshold" =~ ^[0-9]+$ ]] || ((threshold < 1 || threshold > 100)); then
|
||||||
|
printf 'CRITICAL: AILAB_DISK_THRESHOLD must be an integer from 1 to 100\n' >&2
|
||||||
|
exit 2
|
||||||
|
fi
|
||||||
|
|
||||||
|
exec > >(tee -a "$LOG_FILE") 2>&1
|
||||||
|
printf '\n[%s] Disk usage check; threshold=%s%%\n' "$(date --iso-8601=seconds)" "$threshold"
|
||||||
|
|
||||||
|
status=0
|
||||||
|
while read -r filesystem _blocks _used available use_percent mountpoint; do
|
||||||
|
usage="${use_percent%\%}"
|
||||||
|
|
||||||
|
if [[ ! "$usage" =~ ^[0-9]+$ ]]; then
|
||||||
|
printf 'WARNING: unable to parse usage for %s mounted on %s\n' "$filesystem" "$mountpoint"
|
||||||
|
status=1
|
||||||
|
elif ((usage >= threshold)); then
|
||||||
|
printf 'WARNING: %s mounted on %s is %s used; threshold=%s%%; available=%s KB\n' \
|
||||||
|
"$filesystem" "$mountpoint" "$use_percent" "$threshold" "$available"
|
||||||
|
status=1
|
||||||
|
else
|
||||||
|
printf 'OK: %s mounted on %s is %s used\n' "$filesystem" "$mountpoint" "$use_percent"
|
||||||
|
fi
|
||||||
|
done < <(df -P -x tmpfs -x devtmpfs | awk 'NR > 1 {print $1, $2, $3, $4, $5, $6}')
|
||||||
|
|
||||||
|
exit "$status"
|
||||||
@@ -0,0 +1,70 @@
|
|||||||
|
#!/usr/bin/env bash
|
||||||
|
set -o errexit
|
||||||
|
set -o nounset
|
||||||
|
set -o pipefail
|
||||||
|
|
||||||
|
LOG_FILE="/var/log/ailab-docker-cleanup.log"
|
||||||
|
execute=false
|
||||||
|
non_interactive=false
|
||||||
|
|
||||||
|
usage() {
|
||||||
|
printf 'Usage: %s [--execute [--non-interactive]]\n' "$(basename "$0")"
|
||||||
|
}
|
||||||
|
|
||||||
|
while (($# > 0)); do
|
||||||
|
case "$1" in
|
||||||
|
--execute) execute=true ;;
|
||||||
|
--non-interactive) non_interactive=true ;;
|
||||||
|
-h|--help) usage; exit 0 ;;
|
||||||
|
*) printf 'CRITICAL: unknown argument: %s\n' "$1" >&2; usage >&2; exit 2 ;;
|
||||||
|
esac
|
||||||
|
shift
|
||||||
|
done
|
||||||
|
|
||||||
|
if [[ "$non_interactive" == true && "$execute" != true ]]; then
|
||||||
|
printf 'CRITICAL: --non-interactive requires --execute\n' >&2
|
||||||
|
exit 2
|
||||||
|
fi
|
||||||
|
if ((EUID != 0)); then
|
||||||
|
printf 'CRITICAL: this script must run as root\n' >&2
|
||||||
|
exit 2
|
||||||
|
fi
|
||||||
|
|
||||||
|
exec > >(tee -a "$LOG_FILE") 2>&1
|
||||||
|
printf '\n[%s] Docker cleanup\n' "$(date --iso-8601=seconds)"
|
||||||
|
|
||||||
|
if ! command -v docker >/dev/null 2>&1; then
|
||||||
|
printf 'INFO: Docker is not installed; nothing to do\n'
|
||||||
|
exit 0
|
||||||
|
fi
|
||||||
|
if command -v systemctl >/dev/null 2>&1 && ! systemctl is-active --quiet docker; then
|
||||||
|
printf 'INFO: docker.service is inactive; nothing to do\n'
|
||||||
|
exit 0
|
||||||
|
fi
|
||||||
|
|
||||||
|
printf '\nDocker disk usage before cleanup:\n'
|
||||||
|
docker system df
|
||||||
|
|
||||||
|
if [[ "$execute" != true ]]; then
|
||||||
|
printf 'INFO: dry-run mode; would run docker system prune -af\n'
|
||||||
|
printf 'INFO: dry-run mode; would run docker builder prune -af --filter until=168h\n'
|
||||||
|
printf 'INFO: Docker volumes are never included in this cleanup\n'
|
||||||
|
exit 0
|
||||||
|
fi
|
||||||
|
|
||||||
|
if [[ "$non_interactive" != true ]]; then
|
||||||
|
printf 'WARNING: this removes unused containers, networks, images, and old build cache, but not volumes.\n'
|
||||||
|
printf 'Type EXECUTE to continue: '
|
||||||
|
read -r confirmation
|
||||||
|
if [[ "$confirmation" != "EXECUTE" ]]; then
|
||||||
|
printf 'CRITICAL: confirmation failed; no changes made\n'
|
||||||
|
exit 2
|
||||||
|
fi
|
||||||
|
fi
|
||||||
|
|
||||||
|
docker system prune -af
|
||||||
|
docker builder prune -af --filter "until=168h"
|
||||||
|
|
||||||
|
printf '\nDocker disk usage after cleanup:\n'
|
||||||
|
docker system df
|
||||||
|
printf 'OK: Docker cleanup completed; volumes were not pruned\n'
|
||||||
+111
@@ -0,0 +1,111 @@
|
|||||||
|
#!/usr/bin/env bash
|
||||||
|
set -o errexit
|
||||||
|
set -o nounset
|
||||||
|
set -o pipefail
|
||||||
|
|
||||||
|
section() {
|
||||||
|
printf '\n== %s ==\n' "$1"
|
||||||
|
}
|
||||||
|
|
||||||
|
run_optional() {
|
||||||
|
local description="$1"
|
||||||
|
shift
|
||||||
|
|
||||||
|
if "$@"; then
|
||||||
|
return 0
|
||||||
|
fi
|
||||||
|
|
||||||
|
printf 'WARNING: %s failed\n' "$description"
|
||||||
|
return 0
|
||||||
|
}
|
||||||
|
|
||||||
|
section "Host identity"
|
||||||
|
if command -v hostnamectl >/dev/null 2>&1; then
|
||||||
|
run_optional "hostnamectl" hostnamectl
|
||||||
|
else
|
||||||
|
run_optional "hostname" hostname
|
||||||
|
fi
|
||||||
|
run_optional "kernel information" uname -a
|
||||||
|
run_optional "uptime" uptime
|
||||||
|
|
||||||
|
section "Memory"
|
||||||
|
if command -v free >/dev/null 2>&1; then
|
||||||
|
run_optional "memory report" free -h
|
||||||
|
else
|
||||||
|
printf 'WARNING: free is not available\n'
|
||||||
|
fi
|
||||||
|
|
||||||
|
section "Filesystems"
|
||||||
|
if command -v df >/dev/null 2>&1; then
|
||||||
|
run_optional "filesystem report" df -hT
|
||||||
|
printf '\nKey mountpoints present:\n'
|
||||||
|
for mountpoint in / /boot /var /srv /opt /home; do
|
||||||
|
if findmnt -rn --target "$mountpoint" >/dev/null 2>&1; then
|
||||||
|
run_optional "filesystem report for $mountpoint" df -hT "$mountpoint"
|
||||||
|
fi
|
||||||
|
done
|
||||||
|
else
|
||||||
|
printf 'WARNING: df is not available\n'
|
||||||
|
fi
|
||||||
|
|
||||||
|
section "Journal usage"
|
||||||
|
if command -v journalctl >/dev/null 2>&1; then
|
||||||
|
run_optional "journal disk usage" journalctl --disk-usage
|
||||||
|
else
|
||||||
|
printf 'WARNING: journalctl is not available\n'
|
||||||
|
fi
|
||||||
|
|
||||||
|
section "Docker"
|
||||||
|
if command -v docker >/dev/null 2>&1; then
|
||||||
|
if command -v systemctl >/dev/null 2>&1; then
|
||||||
|
run_optional "Docker service state" systemctl is-active docker
|
||||||
|
fi
|
||||||
|
run_optional "Docker container list" docker ps --format 'table {{.Names}}\t{{.Image}}\t{{.Status}}\t{{.Ports}}'
|
||||||
|
run_optional "Docker disk usage" docker system df
|
||||||
|
else
|
||||||
|
printf 'INFO: Docker is not installed\n'
|
||||||
|
fi
|
||||||
|
|
||||||
|
section "Libvirt"
|
||||||
|
if command -v virsh >/dev/null 2>&1; then
|
||||||
|
if command -v systemctl >/dev/null 2>&1; then
|
||||||
|
run_optional "libvirtd service state" systemctl is-active libvirtd
|
||||||
|
fi
|
||||||
|
run_optional "libvirt guest list" virsh list --all
|
||||||
|
else
|
||||||
|
printf 'INFO: virsh is not installed\n'
|
||||||
|
fi
|
||||||
|
|
||||||
|
section "NVIDIA"
|
||||||
|
if command -v nvidia-smi >/dev/null 2>&1; then
|
||||||
|
run_optional "NVIDIA status" nvidia-smi
|
||||||
|
else
|
||||||
|
printf 'INFO: nvidia-smi is not installed\n'
|
||||||
|
fi
|
||||||
|
|
||||||
|
section "Failed systemd units"
|
||||||
|
if command -v systemctl >/dev/null 2>&1; then
|
||||||
|
run_optional "failed systemd unit report" systemctl --failed --no-pager
|
||||||
|
else
|
||||||
|
printf 'WARNING: systemctl is not available\n'
|
||||||
|
fi
|
||||||
|
|
||||||
|
section "SMART quick health"
|
||||||
|
if command -v smartctl >/dev/null 2>&1; then
|
||||||
|
shopt -s nullglob
|
||||||
|
devices=(/dev/sd? /dev/nvme?n?)
|
||||||
|
shopt -u nullglob
|
||||||
|
|
||||||
|
if ((${#devices[@]} == 0)); then
|
||||||
|
printf 'INFO: no matching SATA/SCSI or NVMe devices found\n'
|
||||||
|
else
|
||||||
|
for device in "${devices[@]}"; do
|
||||||
|
printf '\n-- %s --\n' "$device"
|
||||||
|
run_optional "SMART health check for $device" smartctl -H "$device"
|
||||||
|
done
|
||||||
|
fi
|
||||||
|
else
|
||||||
|
printf 'INFO: smartctl is not installed\n'
|
||||||
|
fi
|
||||||
|
|
||||||
|
exit 0
|
||||||
@@ -0,0 +1,117 @@
|
|||||||
|
#!/usr/bin/env bash
|
||||||
|
set -o errexit
|
||||||
|
set -o nounset
|
||||||
|
set -o pipefail
|
||||||
|
|
||||||
|
# APT autoremove respects package dependencies and kernel protection rules. That
|
||||||
|
# is safer than name-based purging on HWE hosts using NVIDIA, DKMS, or VFIO.
|
||||||
|
|
||||||
|
LOG_FILE="/var/log/ailab-kernel-cleanup.log"
|
||||||
|
execute=false
|
||||||
|
non_interactive=false
|
||||||
|
|
||||||
|
usage() {
|
||||||
|
printf 'Usage: %s [--execute [--non-interactive]]\n' "$(basename "$0")"
|
||||||
|
}
|
||||||
|
|
||||||
|
kernel_packages() {
|
||||||
|
dpkg-query -W -f='${db:Status-Abbrev} ${binary:Package}\n' \
|
||||||
|
'linux-image*' 'linux-headers*' 'linux-modules*' 2>/dev/null \
|
||||||
|
| awk '$1 ~ /^ii/ {print $2}' \
|
||||||
|
| sort -u || true
|
||||||
|
}
|
||||||
|
|
||||||
|
versioned_kernel_images() {
|
||||||
|
dpkg-query -W -f='${db:Status-Abbrev} ${binary:Package}\n' 'linux-image-[0-9]*' 2>/dev/null \
|
||||||
|
| awk '$1 ~ /^ii/ {sub(/:.*/, "", $2); print $2}' \
|
||||||
|
| sort -u || true
|
||||||
|
}
|
||||||
|
|
||||||
|
while (($# > 0)); do
|
||||||
|
case "$1" in
|
||||||
|
--execute) execute=true ;;
|
||||||
|
--non-interactive) non_interactive=true ;;
|
||||||
|
-h|--help) usage; exit 0 ;;
|
||||||
|
*) printf 'CRITICAL: unknown argument: %s\n' "$1" >&2; usage >&2; exit 2 ;;
|
||||||
|
esac
|
||||||
|
shift
|
||||||
|
done
|
||||||
|
|
||||||
|
if [[ "$non_interactive" == true && "$execute" != true ]]; then
|
||||||
|
printf 'CRITICAL: --non-interactive requires --execute\n' >&2
|
||||||
|
exit 2
|
||||||
|
fi
|
||||||
|
if ((EUID != 0)); then
|
||||||
|
printf 'CRITICAL: this script must run as root\n' >&2
|
||||||
|
exit 2
|
||||||
|
fi
|
||||||
|
for command_name in apt dpkg-query uname; do
|
||||||
|
if ! command -v "$command_name" >/dev/null 2>&1; then
|
||||||
|
printf 'CRITICAL: required command is missing: %s\n' "$command_name" >&2
|
||||||
|
exit 2
|
||||||
|
fi
|
||||||
|
done
|
||||||
|
|
||||||
|
exec > >(tee -a "$LOG_FILE") 2>&1
|
||||||
|
printf '\n[%s] Kernel cleanup\n' "$(date --iso-8601=seconds)"
|
||||||
|
printf 'Running kernel: %s\n' "$(uname -r)"
|
||||||
|
printf '\nInstalled kernel-related packages before cleanup:\n'
|
||||||
|
kernel_packages
|
||||||
|
|
||||||
|
simulation="$(LC_ALL=C apt -s autoremove --purge)"
|
||||||
|
printf '\nAPT autoremove simulation:\n%s\n' "$simulation"
|
||||||
|
|
||||||
|
mapfile -t installed_images < <(versioned_kernel_images)
|
||||||
|
mapfile -t removed_images < <(
|
||||||
|
awk '$1 == "Remv" && $2 ~ /^linux-image-[0-9]/ {sub(/:.*/, "", $2); print $2}' <<<"$simulation" | sort -u
|
||||||
|
)
|
||||||
|
|
||||||
|
remaining_images=0
|
||||||
|
for image in "${installed_images[@]}"; do
|
||||||
|
remove_image=false
|
||||||
|
for removed in "${removed_images[@]}"; do
|
||||||
|
if [[ "$image" == "$removed" ]]; then
|
||||||
|
remove_image=true
|
||||||
|
break
|
||||||
|
fi
|
||||||
|
done
|
||||||
|
if [[ "$remove_image" != true ]]; then
|
||||||
|
remaining_images=$((remaining_images + 1))
|
||||||
|
fi
|
||||||
|
done
|
||||||
|
|
||||||
|
printf 'Kernel image safety check: installed=%d simulated-removals=%d remaining=%d\n' \
|
||||||
|
"${#installed_images[@]}" "${#removed_images[@]}" "$remaining_images"
|
||||||
|
|
||||||
|
if ((${#installed_images[@]} < 2 || remaining_images < 2)); then
|
||||||
|
printf 'CRITICAL: cleanup would not leave at least two versioned kernel images; refusing execution\n'
|
||||||
|
exit 1
|
||||||
|
fi
|
||||||
|
|
||||||
|
if [[ "$execute" != true ]]; then
|
||||||
|
printf 'INFO: dry-run mode; no packages were removed\n'
|
||||||
|
printf 'INFO: rerun with --execute and confirm to apply the simulated cleanup\n'
|
||||||
|
exit 0
|
||||||
|
fi
|
||||||
|
|
||||||
|
if [[ "$non_interactive" != true ]]; then
|
||||||
|
printf 'WARNING: APT will remove the packages shown in the simulation above.\n'
|
||||||
|
printf 'Type EXECUTE to continue: '
|
||||||
|
read -r confirmation
|
||||||
|
if [[ "$confirmation" != "EXECUTE" ]]; then
|
||||||
|
printf 'CRITICAL: confirmation failed; no changes made\n'
|
||||||
|
exit 2
|
||||||
|
fi
|
||||||
|
fi
|
||||||
|
|
||||||
|
apt autoremove --purge -y
|
||||||
|
apt autoclean -y
|
||||||
|
if command -v update-grub >/dev/null 2>&1; then
|
||||||
|
update-grub || true
|
||||||
|
else
|
||||||
|
printf 'WARNING: update-grub is not installed\n'
|
||||||
|
fi
|
||||||
|
|
||||||
|
printf '\nInstalled kernel-related packages after cleanup:\n'
|
||||||
|
kernel_packages
|
||||||
|
printf 'OK: kernel cleanup completed with APT-managed package selection\n'
|
||||||
+42
@@ -0,0 +1,42 @@
|
|||||||
|
#!/usr/bin/env bash
|
||||||
|
set -o errexit
|
||||||
|
set -o nounset
|
||||||
|
set -o pipefail
|
||||||
|
|
||||||
|
section() {
|
||||||
|
printf '\n== %s ==\n' "$1"
|
||||||
|
}
|
||||||
|
|
||||||
|
if ! command -v virsh >/dev/null 2>&1; then
|
||||||
|
printf 'INFO: virsh is not installed; VM audit skipped\n'
|
||||||
|
exit 0
|
||||||
|
fi
|
||||||
|
|
||||||
|
section "Virtual machines"
|
||||||
|
virsh list --all || printf 'WARNING: unable to list virtual machines\n'
|
||||||
|
|
||||||
|
section "Storage pools"
|
||||||
|
virsh pool-list --all || printf 'WARNING: unable to list storage pools\n'
|
||||||
|
|
||||||
|
mapfile -t pools < <(virsh pool-list --all --name 2>/dev/null | sed '/^[[:space:]]*$/d' || true)
|
||||||
|
for pool in "${pools[@]}"; do
|
||||||
|
section "Volumes in pool: $pool"
|
||||||
|
virsh vol-list "$pool" || printf 'WARNING: unable to list volumes in pool %s\n' "$pool"
|
||||||
|
done
|
||||||
|
|
||||||
|
section "Possible VM disk and installation images"
|
||||||
|
search_roots=()
|
||||||
|
for path in /var/lib/libvirt /srv /opt; do
|
||||||
|
[[ -d "$path" ]] && search_roots+=("$path")
|
||||||
|
done
|
||||||
|
|
||||||
|
if ((${#search_roots[@]} == 0)); then
|
||||||
|
printf 'INFO: no configured search roots are present\n'
|
||||||
|
else
|
||||||
|
find "${search_roots[@]}" -xdev -type f \
|
||||||
|
\( -iname '*.qcow2' -o -iname '*.raw' -o -iname '*.iso' \) \
|
||||||
|
-printf '%12s bytes %p\n' 2>/dev/null \
|
||||||
|
| sort -nr || true
|
||||||
|
fi
|
||||||
|
|
||||||
|
printf '\nINFO: audit complete; no files or libvirt resources were modified\n'
|
||||||
@@ -0,0 +1,8 @@
|
|||||||
|
[Unit]
|
||||||
|
Description=AI Lab safe APT cleanup
|
||||||
|
After=network-online.target
|
||||||
|
Wants=network-online.target
|
||||||
|
|
||||||
|
[Service]
|
||||||
|
Type=oneshot
|
||||||
|
ExecStart=/usr/local/sbin/ailab-apt-cleanup.sh --execute --non-interactive
|
||||||
@@ -0,0 +1,9 @@
|
|||||||
|
[Unit]
|
||||||
|
Description=Run AI Lab APT cleanup weekly
|
||||||
|
|
||||||
|
[Timer]
|
||||||
|
OnCalendar=Sun *-*-* 04:00:00
|
||||||
|
Persistent=true
|
||||||
|
|
||||||
|
[Install]
|
||||||
|
WantedBy=timers.target
|
||||||
@@ -0,0 +1,6 @@
|
|||||||
|
[Unit]
|
||||||
|
Description=AI Lab configuration backup
|
||||||
|
|
||||||
|
[Service]
|
||||||
|
Type=oneshot
|
||||||
|
ExecStart=/usr/local/sbin/ailab-config-backup.sh --execute --non-interactive
|
||||||
@@ -0,0 +1,9 @@
|
|||||||
|
[Unit]
|
||||||
|
Description=Run AI Lab configuration backup daily
|
||||||
|
|
||||||
|
[Timer]
|
||||||
|
OnCalendar=*-*-* 03:30:00
|
||||||
|
Persistent=true
|
||||||
|
|
||||||
|
[Install]
|
||||||
|
WantedBy=timers.target
|
||||||
@@ -0,0 +1,6 @@
|
|||||||
|
[Unit]
|
||||||
|
Description=AI Lab disk usage check
|
||||||
|
|
||||||
|
[Service]
|
||||||
|
Type=oneshot
|
||||||
|
ExecStart=/usr/local/sbin/ailab-disk-watch.sh
|
||||||
@@ -0,0 +1,9 @@
|
|||||||
|
[Unit]
|
||||||
|
Description=Run AI Lab disk usage check hourly
|
||||||
|
|
||||||
|
[Timer]
|
||||||
|
OnCalendar=hourly
|
||||||
|
Persistent=true
|
||||||
|
|
||||||
|
[Install]
|
||||||
|
WantedBy=timers.target
|
||||||
@@ -0,0 +1,8 @@
|
|||||||
|
[Unit]
|
||||||
|
Description=AI Lab safe Docker cleanup
|
||||||
|
Requires=docker.service
|
||||||
|
After=docker.service
|
||||||
|
|
||||||
|
[Service]
|
||||||
|
Type=oneshot
|
||||||
|
ExecStart=/usr/local/sbin/ailab-docker-cleanup.sh --execute --non-interactive
|
||||||
@@ -0,0 +1,9 @@
|
|||||||
|
[Unit]
|
||||||
|
Description=Run AI Lab Docker cleanup weekly
|
||||||
|
|
||||||
|
[Timer]
|
||||||
|
OnCalendar=Sun *-*-* 04:40:00
|
||||||
|
Persistent=true
|
||||||
|
|
||||||
|
[Install]
|
||||||
|
WantedBy=timers.target
|
||||||
@@ -0,0 +1,8 @@
|
|||||||
|
[Unit]
|
||||||
|
Description=AI Lab safe kernel cleanup
|
||||||
|
After=network-online.target ailab-apt-cleanup.service
|
||||||
|
Wants=network-online.target
|
||||||
|
|
||||||
|
[Service]
|
||||||
|
Type=oneshot
|
||||||
|
ExecStart=/usr/local/sbin/ailab-kernel-cleanup.sh --execute --non-interactive
|
||||||
@@ -0,0 +1,9 @@
|
|||||||
|
[Unit]
|
||||||
|
Description=Run AI Lab kernel cleanup weekly
|
||||||
|
|
||||||
|
[Timer]
|
||||||
|
OnCalendar=Sun *-*-* 04:20:00
|
||||||
|
Persistent=true
|
||||||
|
|
||||||
|
[Install]
|
||||||
|
WantedBy=timers.target
|
||||||
@@ -0,0 +1,276 @@
|
|||||||
|
# Linux Fresh Setup Toolkit
|
||||||
|
|
||||||
|
## Executive summary
|
||||||
|
|
||||||
|
The Linux Fresh Setup Toolkit is day-0 bootstrap automation for a clean Ubuntu
|
||||||
|
lab server or workstation. It prepares a host for routine administration,
|
||||||
|
Cockpit, Docker workloads, libvirt/KVM virtual machines, optional NVIDIA
|
||||||
|
diagnostics, bounded logging, practical kernel tuning, and a conservative
|
||||||
|
security baseline.
|
||||||
|
|
||||||
|
The scripts are modular and safe to rerun. Optional components remain optional,
|
||||||
|
UFW is not enabled without a specific flag, and an NVIDIA driver is never
|
||||||
|
installed without an explicit version. This is a portfolio and homelab
|
||||||
|
implementation, not a production-certified build standard.
|
||||||
|
|
||||||
|
## Scope and non-goals
|
||||||
|
|
||||||
|
The toolkit supports Ubuntu 24.04 and newer and assumes a systemd-based host
|
||||||
|
with APT package management. It is suitable for a host such as `ailab` that may
|
||||||
|
run WebODM, Open WebUI, Homepage, NVIDIA workloads, or test virtual machines.
|
||||||
|
|
||||||
|
It does not:
|
||||||
|
|
||||||
|
- Deploy applications, containers, or virtual machines.
|
||||||
|
- Configure GPU passthrough, VFIO bindings, bridges, or Windows guests.
|
||||||
|
- Select an NVIDIA driver automatically.
|
||||||
|
- Define a complete firewall policy or compliance baseline.
|
||||||
|
- Replace backup, monitoring, patching, or ongoing maintenance processes.
|
||||||
|
- Claim live validation against every future Ubuntu release.
|
||||||
|
|
||||||
|
## Why this is separate from ailab-maintenance
|
||||||
|
|
||||||
|
This project establishes a fresh host. The sibling
|
||||||
|
[AI Lab Maintenance Toolkit](../ailab-maintenance/) handles day-2 health
|
||||||
|
checks, scheduled cleanup, configuration backup, disk monitoring, and VM
|
||||||
|
inventory after a host is operating.
|
||||||
|
|
||||||
|
Keeping bootstrap and maintenance separate makes the change boundary clear:
|
||||||
|
this toolkit installs platform capabilities and baseline configuration, while
|
||||||
|
the maintenance toolkit manages recurring operational tasks.
|
||||||
|
|
||||||
|
## Directory layout
|
||||||
|
|
||||||
|
```text
|
||||||
|
setup/
|
||||||
|
├── README.md
|
||||||
|
├── install.sh
|
||||||
|
├── scripts/
|
||||||
|
│ ├── 00-preflight.sh
|
||||||
|
│ ├── 00-platform-guard.inc
|
||||||
|
│ ├── 01-base-packages.sh
|
||||||
|
│ ├── 02-shell-profile.sh
|
||||||
|
│ ├── 03-cockpit.sh
|
||||||
|
│ ├── 04-docker.sh
|
||||||
|
│ ├── 05-libvirt.sh
|
||||||
|
│ ├── 06-nvidia-tools.sh
|
||||||
|
│ ├── 07-tuning.sh
|
||||||
|
│ ├── 08-security-baseline.sh
|
||||||
|
│ └── 99-postcheck.sh
|
||||||
|
├── files/
|
||||||
|
│ ├── bashrc.d/ailab.sh
|
||||||
|
│ ├── docker/daemon.json
|
||||||
|
│ ├── sysctl/99-ailab.conf
|
||||||
|
│ └── systemd/journald-ailab-limits.conf
|
||||||
|
└── docs/
|
||||||
|
├── fresh-install-checklist.md
|
||||||
|
├── cockpit.md
|
||||||
|
├── docker.md
|
||||||
|
├── libvirt.md
|
||||||
|
├── nvidia.md
|
||||||
|
└── bash-shell.md
|
||||||
|
```
|
||||||
|
|
||||||
|
`00-platform-guard.inc` is an internal sourced helper used by mutating
|
||||||
|
component scripts; it is not an executable profile.
|
||||||
|
|
||||||
|
## Supported profiles and flags
|
||||||
|
|
||||||
|
| Flag | Result |
|
||||||
|
| --- | --- |
|
||||||
|
| `--base` | Install operational CLI, diagnostic, storage, and network packages |
|
||||||
|
| `--shell` | Install the root AI lab Bash profile |
|
||||||
|
| `--cockpit` | Install and enable Cockpit |
|
||||||
|
| `--docker` | Install Docker and bounded JSON-file logging |
|
||||||
|
| `--libvirt` | Install and enable libvirt/KVM |
|
||||||
|
| `--nvidia-tools` | Install NVIDIA and OpenCL diagnostics without a driver |
|
||||||
|
| `--install-nvidia-driver VERSION` | Install diagnostics and the named Ubuntu driver package |
|
||||||
|
| `--tuning` | Apply journald, sysctl, sensor, and sysstat settings |
|
||||||
|
| `--security` | Install and enable fail2ban; install but do not enable UFW |
|
||||||
|
| `--enable-ufw` | Run security setup and explicitly enable UFW |
|
||||||
|
| `--all` | Run every standard profile without UFW enablement or driver installation |
|
||||||
|
|
||||||
|
`--install-nvidia-driver` implies `--nvidia-tools`. `--enable-ufw` implies
|
||||||
|
`--security`. With no flags, the installer prints help and makes no changes.
|
||||||
|
|
||||||
|
## Installation examples
|
||||||
|
|
||||||
|
Review the scripts and current host access path before execution:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
cd labs/linux/setup
|
||||||
|
./install.sh
|
||||||
|
sudo ./install.sh --base --shell
|
||||||
|
sudo ./install.sh --cockpit --docker --libvirt
|
||||||
|
sudo ./install.sh --all
|
||||||
|
```
|
||||||
|
|
||||||
|
Explicit high-impact options can be combined with `--all`:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
sudo ./install.sh --all --enable-ufw
|
||||||
|
sudo ./install.sh --all --install-nvidia-driver 550
|
||||||
|
```
|
||||||
|
|
||||||
|
The installer runs the read-only preflight once before selected profiles and a
|
||||||
|
postcheck after all successful profile steps.
|
||||||
|
|
||||||
|
## Fresh host workflow
|
||||||
|
|
||||||
|
1. Patch the base Ubuntu installation and confirm console or out-of-band access.
|
||||||
|
2. Review [the fresh install checklist](docs/fresh-install-checklist.md).
|
||||||
|
3. Run `sudo ./install.sh --base --shell`.
|
||||||
|
4. Add only the platform profiles needed by the host.
|
||||||
|
5. Review service state, listening ports, storage, networking, and warnings in
|
||||||
|
the postcheck.
|
||||||
|
6. Reboot if a driver or kernel-related package requires it.
|
||||||
|
7. Capture host-specific configuration and backup requirements separately.
|
||||||
|
|
||||||
|
## AI lab workflow
|
||||||
|
|
||||||
|
A general AI lab host can start with:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
sudo ./install.sh --base --shell --cockpit --docker --nvidia-tools --tuning --security
|
||||||
|
```
|
||||||
|
|
||||||
|
This installs GPU diagnostics but leaves driver choice to the operator. Add
|
||||||
|
libvirt only when the host will run VMs. Enable UFW only after confirming SSH,
|
||||||
|
Cockpit, application, bridge, and VM networking requirements.
|
||||||
|
|
||||||
|
## Safety model
|
||||||
|
|
||||||
|
- Mutating profiles require root and refuse non-Ubuntu systems or Ubuntu older
|
||||||
|
than 24.04.
|
||||||
|
- Component profiles install their own direct prerequisites.
|
||||||
|
- Existing managed configuration is changed only when content differs.
|
||||||
|
- Changed root shell, Docker, journald, and sysctl files receive timestamped
|
||||||
|
backups.
|
||||||
|
- Existing valid Docker JSON is merged so unrelated settings survive.
|
||||||
|
- Invalid Docker JSON stops configuration rather than being overwritten.
|
||||||
|
- UFW and NVIDIA driver installation require explicit flags.
|
||||||
|
- Package and service failures are not hidden.
|
||||||
|
- Postcheck warnings report optional or inactive components without masking a
|
||||||
|
successfully completed diagnostic script.
|
||||||
|
|
||||||
|
APT installation and service restarts are real system changes. Test first on a
|
||||||
|
disposable host and maintain a console path when changing remote access policy.
|
||||||
|
|
||||||
|
## Bash shell profile
|
||||||
|
|
||||||
|
The shell profile is installed as `/root/.bashrc.d/ailab.sh`, and one exact
|
||||||
|
source line is maintained in `/root/.bashrc`. It adds concise helpers for
|
||||||
|
systemd, journals, Docker, libvirt, NVIDIA, ports, archives, and disk usage.
|
||||||
|
|
||||||
|
See [Bash shell profile](docs/bash-shell.md) for command details and cautions.
|
||||||
|
|
||||||
|
## Cockpit setup
|
||||||
|
|
||||||
|
Cockpit provides browser-based host, storage, network, package, VM, metrics,
|
||||||
|
and support-report views. The installer enables `cockpit.socket` and reports
|
||||||
|
`https://HOSTNAME:9090`. `cockpit-files` is optional because it is not
|
||||||
|
available in every enabled Ubuntu repository.
|
||||||
|
|
||||||
|
See [Cockpit setup](docs/cockpit.md).
|
||||||
|
|
||||||
|
## Docker setup
|
||||||
|
|
||||||
|
The Ubuntu `docker.io` package path is preferred. The Docker official
|
||||||
|
repository is configured only when `docker.io` is unavailable. The daemon uses
|
||||||
|
the `json-file` log driver with five 50 MB files per container.
|
||||||
|
|
||||||
|
The toolkit configures log retention only. It does not prune data, deploy
|
||||||
|
Compose applications, or configure an NVIDIA container runtime.
|
||||||
|
|
||||||
|
See [Docker setup](docs/docker.md).
|
||||||
|
|
||||||
|
## libvirt/KVM setup
|
||||||
|
|
||||||
|
The libvirt profile installs QEMU, OVMF, software TPM support, virt-install,
|
||||||
|
virt-manager, bridge utilities, and libvirt clients and services. It enables
|
||||||
|
`libvirtd` and prints existing guests and networks.
|
||||||
|
|
||||||
|
See [libvirt/KVM setup](docs/libvirt.md).
|
||||||
|
|
||||||
|
## NVIDIA tooling
|
||||||
|
|
||||||
|
The default NVIDIA profile installs `nvtop`, `clinfo`, and PCI diagnostics.
|
||||||
|
It reports detected NVIDIA devices, `nvidia-smi`, and DKMS state when those
|
||||||
|
commands exist.
|
||||||
|
|
||||||
|
Driver installation requires a numeric version that maps to an available
|
||||||
|
Ubuntu package, for example `nvidia-driver-550`. Secure Boot enrollment,
|
||||||
|
driver suitability, CUDA, container runtime support, and passthrough remain
|
||||||
|
operator decisions.
|
||||||
|
|
||||||
|
See [NVIDIA tooling](docs/nvidia.md).
|
||||||
|
|
||||||
|
## Tuning
|
||||||
|
|
||||||
|
The tuning profile bounds persistent journal use, raises inotify limits for
|
||||||
|
development and container workloads, reduces swappiness, enables sysstat, and
|
||||||
|
runs automatic sensor detection when available.
|
||||||
|
|
||||||
|
Review these values against available memory, storage, monitoring retention,
|
||||||
|
and workload behavior before deployment beyond a lab.
|
||||||
|
|
||||||
|
## Security baseline
|
||||||
|
|
||||||
|
The security profile installs UFW and fail2ban and enables fail2ban. It leaves
|
||||||
|
UFW disabled unless `--enable-ufw` is present. Explicit UFW enablement permits
|
||||||
|
OpenSSH and TCP port 9090 before activation.
|
||||||
|
|
||||||
|
This is a minimal access-preservation baseline, not a complete host firewall or
|
||||||
|
hardening standard. Application and VM networking may require additional
|
||||||
|
reviewed rules.
|
||||||
|
|
||||||
|
## Postcheck
|
||||||
|
|
||||||
|
The final script reports:
|
||||||
|
|
||||||
|
- Failed systemd units.
|
||||||
|
- Cockpit, Docker, libvirt, and fail2ban status when installed.
|
||||||
|
- Running Docker containers and defined virtual machines.
|
||||||
|
- NVIDIA runtime state.
|
||||||
|
- Filesystem usage and listening ports.
|
||||||
|
|
||||||
|
Warnings require operator review but optional component absence does not cause
|
||||||
|
the postcheck itself to fail.
|
||||||
|
|
||||||
|
## Troubleshooting
|
||||||
|
|
||||||
|
Run individual read-only checks after correcting a failed profile:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
sudo ./scripts/00-preflight.sh
|
||||||
|
sudo ./scripts/99-postcheck.sh
|
||||||
|
systemctl --failed
|
||||||
|
journalctl -u docker -u libvirtd -u cockpit.socket -u fail2ban
|
||||||
|
```
|
||||||
|
|
||||||
|
Common failure areas are unavailable APT repositories, unsupported package
|
||||||
|
names on a future Ubuntu release, invalid pre-existing Docker JSON, Secure Boot
|
||||||
|
module signing, disabled CPU virtualization, and remote firewall assumptions.
|
||||||
|
|
||||||
|
To roll back a managed configuration, compare the current file with its
|
||||||
|
timestamped `.bak` copy, restore the reviewed backup, and restart or reload the
|
||||||
|
owning service. Package removal is intentionally not automated because it may
|
||||||
|
affect workloads and dependencies.
|
||||||
|
|
||||||
|
## Interview talking points
|
||||||
|
|
||||||
|
- Why day-0 bootstrap and day-2 maintenance have separate ownership.
|
||||||
|
- How explicit flags protect firewall and GPU driver decisions.
|
||||||
|
- Why Docker JSON is validated, backed up, and merged.
|
||||||
|
- How idempotent content checks prevent backup and restart churn.
|
||||||
|
- Why preflight and postcheck evidence surround mutating profiles.
|
||||||
|
- Which virtualization, Secure Boot, IOMMU, and GPU decisions remain manual.
|
||||||
|
|
||||||
|
## Future improvements
|
||||||
|
|
||||||
|
- Add automated tests using disposable Ubuntu VMs.
|
||||||
|
- Add a documented NVIDIA Container Toolkit profile.
|
||||||
|
- Add optional non-root administrative user and group membership management.
|
||||||
|
- Add bridge and VFIO planning checks without applying passthrough changes.
|
||||||
|
- Add package compatibility matrices after validating future Ubuntu releases.
|
||||||
|
- Export postcheck results in a structured format for evidence collection.
|
||||||
@@ -0,0 +1,53 @@
|
|||||||
|
# Bash Shell Profile
|
||||||
|
|
||||||
|
## Installation
|
||||||
|
|
||||||
|
The shell profile is installed for root:
|
||||||
|
|
||||||
|
```text
|
||||||
|
/root/.bashrc.d/ailab.sh
|
||||||
|
```
|
||||||
|
|
||||||
|
The installer maintains one exact source line in `/root/.bashrc` and backs up
|
||||||
|
changed files. Start a new Bash session or run:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
source /root/.bashrc
|
||||||
|
```
|
||||||
|
|
||||||
|
## Aliases
|
||||||
|
|
||||||
|
| Alias | Purpose |
|
||||||
|
| --- | --- |
|
||||||
|
| `ll`, `la` | Detailed and hidden-file directory listings |
|
||||||
|
| `ports` | Listening TCP/UDP sockets and processes |
|
||||||
|
| `dus`, `dufh` | Directory and filesystem usage |
|
||||||
|
| `failed`, `jerr`, `timers` | systemd failure, journal error, and timer views |
|
||||||
|
| `dps`, `ddf`, `dcu` | Docker containers, disk use, and Compose startup |
|
||||||
|
| `vms` | All libvirt guests |
|
||||||
|
| `gpu`, `gpuloop` | NVIDIA status once or refreshed every two seconds |
|
||||||
|
| `now` | Current timestamp and timezone |
|
||||||
|
|
||||||
|
`dcu` runs `docker compose up -d` in the current directory and therefore may
|
||||||
|
create or start resources. Review the Compose project before using it.
|
||||||
|
|
||||||
|
## Functions
|
||||||
|
|
||||||
|
- `svc_status SERVICE`
|
||||||
|
- `svc_logs SERVICE [LINES]`
|
||||||
|
- `docker_logs CONTAINER [LINES]`
|
||||||
|
- `docker_restart CONTAINER`
|
||||||
|
- `vm_autostart VM`
|
||||||
|
- `vm_no_autostart VM`
|
||||||
|
- `path_backup PATH`
|
||||||
|
- `extract ARCHIVE`
|
||||||
|
|
||||||
|
Functions validate argument counts, and Docker, libvirt, and NVIDIA helpers
|
||||||
|
report missing commands clearly. `path_backup` creates a timestamped adjacent
|
||||||
|
copy and can consume substantial space for large paths.
|
||||||
|
|
||||||
|
## Rollback
|
||||||
|
|
||||||
|
Review timestamped backups under `/root`, restore the desired `.bashrc` or
|
||||||
|
profile copy, and start a new shell. Avoid restoring a backup without checking
|
||||||
|
for unrelated shell changes made after bootstrap.
|
||||||
@@ -0,0 +1,41 @@
|
|||||||
|
# Cockpit
|
||||||
|
|
||||||
|
## Purpose
|
||||||
|
|
||||||
|
The Cockpit profile installs browser-based host administration modules for
|
||||||
|
system state, storage, networking, packages, virtual machines, metrics, and
|
||||||
|
support reports. It enables the socket-activated service.
|
||||||
|
|
||||||
|
## Installation and validation
|
||||||
|
|
||||||
|
```bash
|
||||||
|
sudo ./install.sh --cockpit
|
||||||
|
systemctl status cockpit.socket
|
||||||
|
ss -ltnp | grep ':9090'
|
||||||
|
```
|
||||||
|
|
||||||
|
Connect to `https://HOSTNAME:9090`. A browser warning is expected when the
|
||||||
|
default host certificate is not trusted.
|
||||||
|
|
||||||
|
`cockpit-files` is installed when available and skipped with a warning
|
||||||
|
otherwise.
|
||||||
|
|
||||||
|
## Access and firewall
|
||||||
|
|
||||||
|
The Cockpit profile does not change UFW. Explicit toolkit UFW enablement allows
|
||||||
|
TCP 9090, but upstream firewalls and network ACLs remain external concerns.
|
||||||
|
Use normal Linux accounts and review which users may administer the host.
|
||||||
|
|
||||||
|
## Troubleshooting and rollback
|
||||||
|
|
||||||
|
```bash
|
||||||
|
journalctl -u cockpit.socket -u cockpit.service
|
||||||
|
systemctl restart cockpit.socket
|
||||||
|
apt-cache policy cockpit cockpit-machines cockpit-files
|
||||||
|
```
|
||||||
|
|
||||||
|
To disable remote access without removing packages:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
sudo systemctl disable --now cockpit.socket
|
||||||
|
```
|
||||||
@@ -0,0 +1,56 @@
|
|||||||
|
# Docker
|
||||||
|
|
||||||
|
## Package policy
|
||||||
|
|
||||||
|
The profile prefers Ubuntu's `docker.io` package. If that package is
|
||||||
|
unavailable after an APT refresh, it configures Docker's official Ubuntu
|
||||||
|
repository and installs Docker Engine, containerd, Buildx, and Compose plugins.
|
||||||
|
|
||||||
|
This fallback requires network access to `download.docker.com`.
|
||||||
|
|
||||||
|
## Daemon configuration
|
||||||
|
|
||||||
|
The managed settings are:
|
||||||
|
|
||||||
|
```json
|
||||||
|
{
|
||||||
|
"log-driver": "json-file",
|
||||||
|
"log-opts": {
|
||||||
|
"max-size": "50m",
|
||||||
|
"max-file": "5"
|
||||||
|
}
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
Existing valid `/etc/docker/daemon.json` content is preserved and merged with
|
||||||
|
these log settings. A changed file is backed up with a timestamp. Invalid JSON
|
||||||
|
causes the profile to stop rather than overwrite operator configuration.
|
||||||
|
|
||||||
|
Log limits apply to newly created containers. Existing containers may retain
|
||||||
|
their original logging configuration until recreated.
|
||||||
|
|
||||||
|
## Validation
|
||||||
|
|
||||||
|
```bash
|
||||||
|
docker version
|
||||||
|
docker compose version
|
||||||
|
docker info
|
||||||
|
docker ps
|
||||||
|
docker system df
|
||||||
|
jq . /etc/docker/daemon.json
|
||||||
|
```
|
||||||
|
|
||||||
|
## Troubleshooting and rollback
|
||||||
|
|
||||||
|
```bash
|
||||||
|
systemctl status docker
|
||||||
|
journalctl -u docker
|
||||||
|
jq empty /etc/docker/daemon.json
|
||||||
|
```
|
||||||
|
|
||||||
|
To restore a previous daemon configuration, review a timestamped backup,
|
||||||
|
replace the current file, validate it with `jq empty`, and restart Docker.
|
||||||
|
Do not restore blindly when workloads depend on newer daemon settings.
|
||||||
|
|
||||||
|
The profile does not configure Docker data roots, prune objects, deploy
|
||||||
|
applications, or install the NVIDIA Container Toolkit.
|
||||||
@@ -0,0 +1,47 @@
|
|||||||
|
# Fresh Install Checklist
|
||||||
|
|
||||||
|
## Before bootstrap
|
||||||
|
|
||||||
|
- Confirm Ubuntu 24.04 or newer and record the release and kernel.
|
||||||
|
- Apply firmware settings for virtualization, IOMMU, or Secure Boot as needed.
|
||||||
|
- Confirm console or out-of-band access before firewall work.
|
||||||
|
- Record interfaces, addresses, routes, DNS, storage, and intended mountpoints.
|
||||||
|
- Patch the base system and reboot if required.
|
||||||
|
- Decide whether the host needs Docker, libvirt, Cockpit, or NVIDIA support.
|
||||||
|
- Review application ports and VM networking before enabling UFW.
|
||||||
|
- Confirm backups exist for any pre-existing host configuration.
|
||||||
|
|
||||||
|
## Bootstrap
|
||||||
|
|
||||||
|
Start with the least capability required:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
sudo ./install.sh --base --shell
|
||||||
|
```
|
||||||
|
|
||||||
|
Add reviewed platform profiles:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
sudo ./install.sh --cockpit --docker --libvirt --nvidia-tools --tuning --security
|
||||||
|
```
|
||||||
|
|
||||||
|
Do not select `--enable-ufw` until remote access and application rules are
|
||||||
|
understood. Do not install an NVIDIA driver until hardware, kernel, Secure Boot,
|
||||||
|
and workload compatibility are known.
|
||||||
|
|
||||||
|
## Post-bootstrap evidence
|
||||||
|
|
||||||
|
- Review all installer warnings.
|
||||||
|
- Run `systemctl --failed`.
|
||||||
|
- Confirm expected services with `systemctl status`.
|
||||||
|
- Review `ss -tulpn`, `df -hT`, `ip -brief address`, and `ip route`.
|
||||||
|
- Confirm Docker with `docker version` and `docker compose version`.
|
||||||
|
- Confirm libvirt with `virsh list --all` and `virsh net-list --all`.
|
||||||
|
- Confirm GPU state with `lspci -nn | grep -i nvidia` and `nvidia-smi`.
|
||||||
|
- Reboot after driver installation and repeat the postcheck.
|
||||||
|
|
||||||
|
## Handover
|
||||||
|
|
||||||
|
Document host-specific storage, network, firewall, backup, application, GPU,
|
||||||
|
and VM decisions. Install the separate `ailab-maintenance` toolkit only after
|
||||||
|
reviewing its scheduled day-2 behavior.
|
||||||
@@ -0,0 +1,54 @@
|
|||||||
|
# libvirt and KVM
|
||||||
|
|
||||||
|
## Purpose
|
||||||
|
|
||||||
|
The libvirt profile installs QEMU/KVM administration, UEFI firmware, software
|
||||||
|
TPM support, VM creation tools, bridge utilities, and the libvirt daemon. This
|
||||||
|
supports later Linux or Windows 11 VM work without defining guests.
|
||||||
|
|
||||||
|
## Firmware pre-checks
|
||||||
|
|
||||||
|
Confirm CPU virtualization is enabled:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
lscpu | grep -E 'Virtualization|Hypervisor'
|
||||||
|
grep -Eom1 '(vmx|svm)' /proc/cpuinfo
|
||||||
|
```
|
||||||
|
|
||||||
|
IOMMU and GPU passthrough require separate firmware, kernel command-line,
|
||||||
|
device isolation, driver binding, and recovery planning. This toolkit reports
|
||||||
|
hints but does not apply those changes.
|
||||||
|
|
||||||
|
## Validation
|
||||||
|
|
||||||
|
```bash
|
||||||
|
systemctl status libvirtd
|
||||||
|
virsh list --all
|
||||||
|
virsh net-list --all
|
||||||
|
virsh pool-list --all
|
||||||
|
```
|
||||||
|
|
||||||
|
Use `virt-host-validate` when available for a broader host capability report.
|
||||||
|
Desktop use of `virt-manager` requires a graphical environment or remote
|
||||||
|
display strategy.
|
||||||
|
|
||||||
|
## Networking and Windows 11
|
||||||
|
|
||||||
|
The default libvirt NAT network is distinct from host bridge networking. Review
|
||||||
|
DHCP, DNS, forwarding, and firewall behavior before changing it.
|
||||||
|
|
||||||
|
Windows 11 typically needs UEFI and a TPM device. The installed OVMF and swtpm
|
||||||
|
packages provide those building blocks, but guest creation and licensing remain
|
||||||
|
manual.
|
||||||
|
|
||||||
|
## Troubleshooting
|
||||||
|
|
||||||
|
```bash
|
||||||
|
journalctl -u libvirtd
|
||||||
|
virsh net-info default
|
||||||
|
virsh pool-list --all
|
||||||
|
lsmod | grep kvm
|
||||||
|
```
|
||||||
|
|
||||||
|
Disabling `libvirtd` does not remove VM disks or definitions. Package removal
|
||||||
|
and VM data deletion are intentionally outside this toolkit.
|
||||||
@@ -0,0 +1,52 @@
|
|||||||
|
# NVIDIA Tooling
|
||||||
|
|
||||||
|
## Diagnostic-only default
|
||||||
|
|
||||||
|
The normal NVIDIA profile installs `nvtop`, `clinfo`, and PCI utilities. It
|
||||||
|
does not install or select a driver:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
sudo ./install.sh --nvidia-tools
|
||||||
|
```
|
||||||
|
|
||||||
|
Review hardware and current module state:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
lspci -nn | grep -i nvidia
|
||||||
|
nvidia-smi
|
||||||
|
dkms status
|
||||||
|
mokutil --sb-state
|
||||||
|
```
|
||||||
|
|
||||||
|
## Explicit driver installation
|
||||||
|
|
||||||
|
Install only a reviewed Ubuntu driver package version:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
sudo ./install.sh --install-nvidia-driver 550
|
||||||
|
```
|
||||||
|
|
||||||
|
The numeric value maps directly to `nvidia-driver-VERSION`. The profile refuses
|
||||||
|
an unavailable package. Reboot after installation, then validate `nvidia-smi`,
|
||||||
|
kernel logs, DKMS state, and application behavior.
|
||||||
|
|
||||||
|
## Selection considerations
|
||||||
|
|
||||||
|
- GPU generation and supported driver branch.
|
||||||
|
- Ubuntu release, kernel, and HWE stack.
|
||||||
|
- Secure Boot module enrollment.
|
||||||
|
- CUDA or application compatibility.
|
||||||
|
- Docker NVIDIA Container Toolkit requirements.
|
||||||
|
- Whether the device will be bound to VFIO instead of the host driver.
|
||||||
|
|
||||||
|
## Troubleshooting
|
||||||
|
|
||||||
|
```bash
|
||||||
|
journalctl -k | grep -Ei 'nvidia|nouveau|NVRM'
|
||||||
|
lsmod | grep -E 'nvidia|nouveau'
|
||||||
|
dkms status
|
||||||
|
apt-cache policy 'nvidia-driver-*'
|
||||||
|
```
|
||||||
|
|
||||||
|
Driver rollback is environment-specific and is not automated. Preserve console
|
||||||
|
access and a known-good kernel before changing GPU or Secure Boot configuration.
|
||||||
@@ -0,0 +1,133 @@
|
|||||||
|
#!/usr/bin/env bash
|
||||||
|
# AI lab operational shell helpers. This file is intended to be sourced.
|
||||||
|
|
||||||
|
alias ll='ls -alF'
|
||||||
|
alias la='ls -A'
|
||||||
|
alias ports='ss -tulpn'
|
||||||
|
alias dus='du -xhd1 2>/dev/null | sort -h'
|
||||||
|
alias dufh='df -hT'
|
||||||
|
alias failed='systemctl --failed --no-pager'
|
||||||
|
alias jerr='journalctl -p err -b --no-pager'
|
||||||
|
alias timers='systemctl list-timers --all --no-pager'
|
||||||
|
alias dps='command -v docker >/dev/null 2>&1 && docker ps --format "table {{.Names}}\t{{.Image}}\t{{.Status}}\t{{.Ports}}" || printf "Docker is not installed\n"'
|
||||||
|
alias ddf='command -v docker >/dev/null 2>&1 && docker system df || printf "Docker is not installed\n"'
|
||||||
|
alias dcu='command -v docker >/dev/null 2>&1 && docker compose up -d || printf "Docker Compose is not available\n"'
|
||||||
|
alias vms='command -v virsh >/dev/null 2>&1 && virsh list --all || printf "virsh is not installed\n"'
|
||||||
|
alias gpu='command -v nvidia-smi >/dev/null 2>&1 && nvidia-smi || printf "nvidia-smi is not installed\n"'
|
||||||
|
alias gpuloop='command -v nvidia-smi >/dev/null 2>&1 && watch -n 2 nvidia-smi || printf "nvidia-smi is not installed\n"'
|
||||||
|
alias now='date "+%Y-%m-%d %H:%M:%S %Z"'
|
||||||
|
|
||||||
|
svc_status() {
|
||||||
|
if (($# != 1)); then
|
||||||
|
printf 'Usage: svc_status SERVICE\n' >&2
|
||||||
|
return 2
|
||||||
|
fi
|
||||||
|
systemctl status "$1" --no-pager
|
||||||
|
}
|
||||||
|
|
||||||
|
svc_logs() {
|
||||||
|
if (($# < 1 || $# > 2)); then
|
||||||
|
printf 'Usage: svc_logs SERVICE [LINES]\n' >&2
|
||||||
|
return 2
|
||||||
|
fi
|
||||||
|
local lines="${2:-100}"
|
||||||
|
[[ "$lines" =~ ^[0-9]+$ ]] || {
|
||||||
|
printf 'LINES must be numeric\n' >&2
|
||||||
|
return 2
|
||||||
|
}
|
||||||
|
journalctl -u "$1" -n "$lines" --no-pager
|
||||||
|
}
|
||||||
|
|
||||||
|
docker_logs() {
|
||||||
|
if (($# < 1 || $# > 2)); then
|
||||||
|
printf 'Usage: docker_logs CONTAINER [LINES]\n' >&2
|
||||||
|
return 2
|
||||||
|
fi
|
||||||
|
command -v docker >/dev/null 2>&1 || {
|
||||||
|
printf 'Docker is not installed\n' >&2
|
||||||
|
return 1
|
||||||
|
}
|
||||||
|
local lines="${2:-100}"
|
||||||
|
[[ "$lines" =~ ^[0-9]+$ ]] || {
|
||||||
|
printf 'LINES must be numeric\n' >&2
|
||||||
|
return 2
|
||||||
|
}
|
||||||
|
docker logs --tail "$lines" "$1"
|
||||||
|
}
|
||||||
|
|
||||||
|
docker_restart() {
|
||||||
|
if (($# != 1)); then
|
||||||
|
printf 'Usage: docker_restart CONTAINER\n' >&2
|
||||||
|
return 2
|
||||||
|
fi
|
||||||
|
command -v docker >/dev/null 2>&1 || {
|
||||||
|
printf 'Docker is not installed\n' >&2
|
||||||
|
return 1
|
||||||
|
}
|
||||||
|
docker restart "$1"
|
||||||
|
}
|
||||||
|
|
||||||
|
vm_autostart() {
|
||||||
|
if (($# != 1)); then
|
||||||
|
printf 'Usage: vm_autostart VM\n' >&2
|
||||||
|
return 2
|
||||||
|
fi
|
||||||
|
command -v virsh >/dev/null 2>&1 || {
|
||||||
|
printf 'virsh is not installed\n' >&2
|
||||||
|
return 1
|
||||||
|
}
|
||||||
|
virsh autostart "$1"
|
||||||
|
}
|
||||||
|
|
||||||
|
vm_no_autostart() {
|
||||||
|
if (($# != 1)); then
|
||||||
|
printf 'Usage: vm_no_autostart VM\n' >&2
|
||||||
|
return 2
|
||||||
|
fi
|
||||||
|
command -v virsh >/dev/null 2>&1 || {
|
||||||
|
printf 'virsh is not installed\n' >&2
|
||||||
|
return 1
|
||||||
|
}
|
||||||
|
virsh autostart --disable "$1"
|
||||||
|
}
|
||||||
|
|
||||||
|
path_backup() {
|
||||||
|
if (($# != 1)); then
|
||||||
|
printf 'Usage: path_backup PATH\n' >&2
|
||||||
|
return 2
|
||||||
|
fi
|
||||||
|
if [[ ! -e "$1" ]]; then
|
||||||
|
printf 'Path does not exist: %s\n' "$1" >&2
|
||||||
|
return 1
|
||||||
|
fi
|
||||||
|
local destination
|
||||||
|
destination="${1%/}.$(date '+%Y%m%d-%H%M%S').bak"
|
||||||
|
cp -a -- "$1" "$destination"
|
||||||
|
printf 'Backup created: %s\n' "$destination"
|
||||||
|
}
|
||||||
|
|
||||||
|
extract() {
|
||||||
|
if (($# != 1)); then
|
||||||
|
printf 'Usage: extract ARCHIVE\n' >&2
|
||||||
|
return 2
|
||||||
|
fi
|
||||||
|
if [[ ! -f "$1" ]]; then
|
||||||
|
printf 'Archive does not exist: %s\n' "$1" >&2
|
||||||
|
return 1
|
||||||
|
fi
|
||||||
|
case "$1" in
|
||||||
|
*.tar.bz2|*.tbz2) tar xjf "$1" ;;
|
||||||
|
*.tar.gz|*.tgz) tar xzf "$1" ;;
|
||||||
|
*.tar.xz|*.txz) tar xJf "$1" ;;
|
||||||
|
*.tar) tar xf "$1" ;;
|
||||||
|
*.bz2) bunzip2 "$1" ;;
|
||||||
|
*.gz) gunzip "$1" ;;
|
||||||
|
*.zip) unzip "$1" ;;
|
||||||
|
*.7z) 7z x "$1" ;;
|
||||||
|
*.rar) unrar x "$1" ;;
|
||||||
|
*)
|
||||||
|
printf 'Unsupported archive type: %s\n' "$1" >&2
|
||||||
|
return 2
|
||||||
|
;;
|
||||||
|
esac
|
||||||
|
}
|
||||||
@@ -0,0 +1,7 @@
|
|||||||
|
{
|
||||||
|
"log-driver": "json-file",
|
||||||
|
"log-opts": {
|
||||||
|
"max-size": "50m",
|
||||||
|
"max-file": "5"
|
||||||
|
}
|
||||||
|
}
|
||||||
@@ -0,0 +1,3 @@
|
|||||||
|
fs.inotify.max_user_watches=1048576
|
||||||
|
fs.inotify.max_user_instances=1024
|
||||||
|
vm.swappiness=10
|
||||||
@@ -0,0 +1,5 @@
|
|||||||
|
[Journal]
|
||||||
|
SystemMaxUse=1G
|
||||||
|
SystemKeepFree=2G
|
||||||
|
MaxRetentionSec=14day
|
||||||
|
Compress=yes
|
||||||
Executable
+182
@@ -0,0 +1,182 @@
|
|||||||
|
#!/usr/bin/env bash
|
||||||
|
set -o errexit
|
||||||
|
set -o nounset
|
||||||
|
set -o pipefail
|
||||||
|
|
||||||
|
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
|
||||||
|
|
||||||
|
run_base=0
|
||||||
|
run_shell=0
|
||||||
|
run_cockpit=0
|
||||||
|
run_docker=0
|
||||||
|
run_libvirt=0
|
||||||
|
run_nvidia=0
|
||||||
|
run_tuning=0
|
||||||
|
run_security=0
|
||||||
|
enable_ufw=0
|
||||||
|
nvidia_driver_version=""
|
||||||
|
|
||||||
|
usage() {
|
||||||
|
cat <<'EOF'
|
||||||
|
Usage: sudo ./install.sh [OPTIONS]
|
||||||
|
|
||||||
|
Day-0 bootstrap automation for Ubuntu 24.04 or newer.
|
||||||
|
|
||||||
|
Profiles:
|
||||||
|
--base Install baseline operational packages
|
||||||
|
--shell Install the root shell profile
|
||||||
|
--cockpit Install and enable Cockpit
|
||||||
|
--docker Install and configure Docker
|
||||||
|
--libvirt Install and enable libvirt/KVM
|
||||||
|
--nvidia-tools Install NVIDIA diagnostic tools only
|
||||||
|
--install-nvidia-driver VERSION
|
||||||
|
Install diagnostic tools and the explicit driver
|
||||||
|
--tuning Install journald and sysctl tuning
|
||||||
|
--security Install fail2ban and UFW; do not enable UFW
|
||||||
|
--enable-ufw Run security profile and explicitly enable UFW
|
||||||
|
--all Run every profile without enabling UFW or
|
||||||
|
installing an NVIDIA driver
|
||||||
|
-h, --help Show this help
|
||||||
|
|
||||||
|
Examples:
|
||||||
|
sudo ./install.sh --base --shell
|
||||||
|
sudo ./install.sh --all
|
||||||
|
sudo ./install.sh --all --enable-ufw
|
||||||
|
sudo ./install.sh --nvidia-tools --install-nvidia-driver 550
|
||||||
|
EOF
|
||||||
|
}
|
||||||
|
|
||||||
|
require_supported_ubuntu() {
|
||||||
|
if [[ ! -r /etc/os-release ]]; then
|
||||||
|
printf 'CRITICAL: /etc/os-release is unavailable; refusing system changes\n' >&2
|
||||||
|
exit 2
|
||||||
|
fi
|
||||||
|
|
||||||
|
# shellcheck disable=SC1091
|
||||||
|
source /etc/os-release
|
||||||
|
if [[ "${ID:-}" != "ubuntu" ]]; then
|
||||||
|
printf 'CRITICAL: this toolkit supports Ubuntu only; detected %s\n' "${ID:-unknown}" >&2
|
||||||
|
exit 2
|
||||||
|
fi
|
||||||
|
if ! dpkg --compare-versions "${VERSION_ID:-0}" ge "24.04"; then
|
||||||
|
printf 'CRITICAL: Ubuntu 24.04 or newer is required; detected %s\n' \
|
||||||
|
"${VERSION_ID:-unknown}" >&2
|
||||||
|
exit 2
|
||||||
|
fi
|
||||||
|
}
|
||||||
|
|
||||||
|
if (($# == 0)); then
|
||||||
|
usage
|
||||||
|
exit 0
|
||||||
|
fi
|
||||||
|
|
||||||
|
while (($# > 0)); do
|
||||||
|
case "$1" in
|
||||||
|
--base)
|
||||||
|
run_base=1
|
||||||
|
;;
|
||||||
|
--shell)
|
||||||
|
run_shell=1
|
||||||
|
;;
|
||||||
|
--cockpit)
|
||||||
|
run_cockpit=1
|
||||||
|
;;
|
||||||
|
--docker)
|
||||||
|
run_docker=1
|
||||||
|
;;
|
||||||
|
--libvirt)
|
||||||
|
run_libvirt=1
|
||||||
|
;;
|
||||||
|
--nvidia-tools)
|
||||||
|
run_nvidia=1
|
||||||
|
;;
|
||||||
|
--install-nvidia-driver)
|
||||||
|
if (($# < 2)); then
|
||||||
|
printf 'CRITICAL: --install-nvidia-driver requires a VERSION\n' >&2
|
||||||
|
exit 2
|
||||||
|
fi
|
||||||
|
nvidia_driver_version="$2"
|
||||||
|
if [[ ! "$nvidia_driver_version" =~ ^[0-9]+$ ]]; then
|
||||||
|
printf 'CRITICAL: NVIDIA driver VERSION must contain digits only\n' >&2
|
||||||
|
exit 2
|
||||||
|
fi
|
||||||
|
run_nvidia=1
|
||||||
|
shift
|
||||||
|
;;
|
||||||
|
--tuning)
|
||||||
|
run_tuning=1
|
||||||
|
;;
|
||||||
|
--security)
|
||||||
|
run_security=1
|
||||||
|
;;
|
||||||
|
--enable-ufw)
|
||||||
|
enable_ufw=1
|
||||||
|
run_security=1
|
||||||
|
;;
|
||||||
|
--all)
|
||||||
|
run_base=1
|
||||||
|
run_shell=1
|
||||||
|
run_cockpit=1
|
||||||
|
run_docker=1
|
||||||
|
run_libvirt=1
|
||||||
|
run_nvidia=1
|
||||||
|
run_tuning=1
|
||||||
|
run_security=1
|
||||||
|
;;
|
||||||
|
-h|--help)
|
||||||
|
usage
|
||||||
|
exit 0
|
||||||
|
;;
|
||||||
|
*)
|
||||||
|
printf 'CRITICAL: unknown option: %s\n\n' "$1" >&2
|
||||||
|
usage >&2
|
||||||
|
exit 2
|
||||||
|
;;
|
||||||
|
esac
|
||||||
|
shift
|
||||||
|
done
|
||||||
|
|
||||||
|
if ((EUID != 0)); then
|
||||||
|
printf 'CRITICAL: install.sh must run as root for selected profiles\n' >&2
|
||||||
|
exit 2
|
||||||
|
fi
|
||||||
|
|
||||||
|
for required_command in bash dpkg; do
|
||||||
|
if ! command -v "$required_command" >/dev/null 2>&1; then
|
||||||
|
printf 'CRITICAL: required command is missing: %s\n' "$required_command" >&2
|
||||||
|
exit 2
|
||||||
|
fi
|
||||||
|
done
|
||||||
|
|
||||||
|
require_supported_ubuntu
|
||||||
|
|
||||||
|
printf 'INFO: running read-only preflight\n'
|
||||||
|
"$SCRIPT_DIR/scripts/00-preflight.sh"
|
||||||
|
|
||||||
|
((run_base == 0)) || "$SCRIPT_DIR/scripts/01-base-packages.sh"
|
||||||
|
((run_shell == 0)) || "$SCRIPT_DIR/scripts/02-shell-profile.sh"
|
||||||
|
((run_cockpit == 0)) || "$SCRIPT_DIR/scripts/03-cockpit.sh"
|
||||||
|
((run_docker == 0)) || "$SCRIPT_DIR/scripts/04-docker.sh"
|
||||||
|
((run_libvirt == 0)) || "$SCRIPT_DIR/scripts/05-libvirt.sh"
|
||||||
|
|
||||||
|
if ((run_nvidia == 1)); then
|
||||||
|
if [[ -n "$nvidia_driver_version" ]]; then
|
||||||
|
"$SCRIPT_DIR/scripts/06-nvidia-tools.sh" --install-driver "$nvidia_driver_version"
|
||||||
|
else
|
||||||
|
"$SCRIPT_DIR/scripts/06-nvidia-tools.sh"
|
||||||
|
fi
|
||||||
|
fi
|
||||||
|
|
||||||
|
((run_tuning == 0)) || "$SCRIPT_DIR/scripts/07-tuning.sh"
|
||||||
|
|
||||||
|
if ((run_security == 1)); then
|
||||||
|
if ((enable_ufw == 1)); then
|
||||||
|
"$SCRIPT_DIR/scripts/08-security-baseline.sh" --enable-ufw
|
||||||
|
else
|
||||||
|
"$SCRIPT_DIR/scripts/08-security-baseline.sh"
|
||||||
|
fi
|
||||||
|
fi
|
||||||
|
|
||||||
|
printf '\nINFO: running post-install checks\n'
|
||||||
|
"$SCRIPT_DIR/scripts/99-postcheck.sh"
|
||||||
|
printf '\nOK: selected Linux setup profiles completed\n'
|
||||||
@@ -0,0 +1,20 @@
|
|||||||
|
# shellcheck shell=bash
|
||||||
|
|
||||||
|
require_supported_ubuntu() {
|
||||||
|
if [[ ! -r /etc/os-release ]] || ! command -v dpkg >/dev/null 2>&1; then
|
||||||
|
printf 'CRITICAL: Ubuntu release detection requires /etc/os-release and dpkg\n' >&2
|
||||||
|
exit 2
|
||||||
|
fi
|
||||||
|
|
||||||
|
# shellcheck disable=SC1091
|
||||||
|
source /etc/os-release
|
||||||
|
if [[ "${ID:-}" != "ubuntu" ]]; then
|
||||||
|
printf 'CRITICAL: this toolkit supports Ubuntu only; detected %s\n' "${ID:-unknown}" >&2
|
||||||
|
exit 2
|
||||||
|
fi
|
||||||
|
if ! dpkg --compare-versions "${VERSION_ID:-0}" ge "24.04"; then
|
||||||
|
printf 'CRITICAL: Ubuntu 24.04 or newer is required; detected %s\n' \
|
||||||
|
"${VERSION_ID:-unknown}" >&2
|
||||||
|
exit 2
|
||||||
|
fi
|
||||||
|
}
|
||||||
Executable
+124
@@ -0,0 +1,124 @@
|
|||||||
|
#!/usr/bin/env bash
|
||||||
|
set -o errexit
|
||||||
|
set -o nounset
|
||||||
|
set -o pipefail
|
||||||
|
|
||||||
|
section() {
|
||||||
|
printf '\n== %s ==\n' "$1"
|
||||||
|
}
|
||||||
|
|
||||||
|
run_optional() {
|
||||||
|
local description="$1"
|
||||||
|
shift
|
||||||
|
|
||||||
|
if "$@"; then
|
||||||
|
return 0
|
||||||
|
fi
|
||||||
|
printf 'WARNING: %s failed\n' "$description"
|
||||||
|
return 0
|
||||||
|
}
|
||||||
|
|
||||||
|
section "Operating system"
|
||||||
|
if [[ -r /etc/os-release ]]; then
|
||||||
|
run_optional "OS release report" cat /etc/os-release
|
||||||
|
else
|
||||||
|
printf 'WARNING: /etc/os-release is unavailable\n'
|
||||||
|
fi
|
||||||
|
run_optional "kernel report" uname -a
|
||||||
|
|
||||||
|
section "Host"
|
||||||
|
run_optional "hostname report" hostname
|
||||||
|
run_optional "uptime report" uptime
|
||||||
|
|
||||||
|
section "CPU and virtualization"
|
||||||
|
if command -v lscpu >/dev/null 2>&1; then
|
||||||
|
run_optional "CPU report" lscpu
|
||||||
|
printf '\nVirtualization flags:\n'
|
||||||
|
lscpu | grep -E 'Virtualization|Hypervisor vendor' || \
|
||||||
|
printf 'INFO: no virtualization summary reported by lscpu\n'
|
||||||
|
else
|
||||||
|
printf 'WARNING: lscpu is unavailable\n'
|
||||||
|
fi
|
||||||
|
if grep -Eqm1 '(^|[[:space:]])(vmx|svm)([[:space:]]|$)' /proc/cpuinfo; then
|
||||||
|
printf 'OK: CPU virtualization flags detected\n'
|
||||||
|
else
|
||||||
|
printf 'WARNING: CPU virtualization flags were not detected\n'
|
||||||
|
fi
|
||||||
|
|
||||||
|
section "Memory"
|
||||||
|
if command -v free >/dev/null 2>&1; then
|
||||||
|
run_optional "memory report" free -h
|
||||||
|
else
|
||||||
|
run_optional "memory report" cat /proc/meminfo
|
||||||
|
fi
|
||||||
|
|
||||||
|
section "Disks"
|
||||||
|
if command -v lsblk >/dev/null 2>&1; then
|
||||||
|
run_optional "block device report" lsblk -o NAME,TYPE,SIZE,FSTYPE,MOUNTPOINTS,MODEL
|
||||||
|
else
|
||||||
|
printf 'WARNING: lsblk is unavailable\n'
|
||||||
|
fi
|
||||||
|
run_optional "filesystem report" df -hT
|
||||||
|
|
||||||
|
section "Network"
|
||||||
|
if command -v ip >/dev/null 2>&1; then
|
||||||
|
run_optional "network interface report" ip -brief address
|
||||||
|
run_optional "route report" ip route
|
||||||
|
else
|
||||||
|
printf 'WARNING: ip is unavailable\n'
|
||||||
|
fi
|
||||||
|
|
||||||
|
section "Firmware and Secure Boot"
|
||||||
|
if [[ -d /sys/firmware/efi ]]; then
|
||||||
|
printf 'OK: boot mode is UEFI\n'
|
||||||
|
else
|
||||||
|
printf 'INFO: boot mode appears to be legacy BIOS\n'
|
||||||
|
fi
|
||||||
|
if command -v mokutil >/dev/null 2>&1; then
|
||||||
|
run_optional "Secure Boot report" mokutil --sb-state
|
||||||
|
else
|
||||||
|
printf 'INFO: mokutil is unavailable; Secure Boot state not queried\n'
|
||||||
|
fi
|
||||||
|
|
||||||
|
section "IOMMU"
|
||||||
|
if [[ -r /proc/cmdline ]]; then
|
||||||
|
printf 'Kernel command line:\n'
|
||||||
|
cat /proc/cmdline
|
||||||
|
if grep -Eq '(^|[[:space:]])(intel_iommu=on|amd_iommu=on|iommu=)' /proc/cmdline; then
|
||||||
|
printf 'OK: IOMMU-related kernel arguments detected\n'
|
||||||
|
else
|
||||||
|
printf 'INFO: no explicit IOMMU kernel argument detected\n'
|
||||||
|
fi
|
||||||
|
fi
|
||||||
|
if command -v dmesg >/dev/null 2>&1; then
|
||||||
|
dmesg 2>/dev/null | grep -Ei 'DMAR|IOMMU|AMD-Vi' | tail -n 30 || \
|
||||||
|
printf 'INFO: no readable IOMMU hints found in dmesg\n'
|
||||||
|
fi
|
||||||
|
|
||||||
|
section "NVIDIA hardware"
|
||||||
|
if command -v lspci >/dev/null 2>&1; then
|
||||||
|
lspci -nn | grep -i nvidia || printf 'INFO: no NVIDIA PCI devices detected\n'
|
||||||
|
else
|
||||||
|
printf 'INFO: lspci is unavailable\n'
|
||||||
|
fi
|
||||||
|
|
||||||
|
section "Existing platform components"
|
||||||
|
for command_name in docker virsh cockpit-bridge; do
|
||||||
|
if command -v "$command_name" >/dev/null 2>&1; then
|
||||||
|
printf 'OK: %s is installed at %s\n' "$command_name" "$(command -v "$command_name")"
|
||||||
|
else
|
||||||
|
printf 'INFO: %s is not installed\n' "$command_name"
|
||||||
|
fi
|
||||||
|
done
|
||||||
|
if command -v systemctl >/dev/null 2>&1; then
|
||||||
|
for unit in docker.service libvirtd.service cockpit.socket; do
|
||||||
|
if systemctl cat "$unit" >/dev/null 2>&1; then
|
||||||
|
state="$(systemctl is-active "$unit" 2>/dev/null || true)"
|
||||||
|
printf 'INFO: %-20s state=%s\n' "$unit" "${state:-unknown}"
|
||||||
|
else
|
||||||
|
printf 'INFO: %s is not installed\n' "$unit"
|
||||||
|
fi
|
||||||
|
done
|
||||||
|
fi
|
||||||
|
|
||||||
|
printf '\nOK: preflight completed without modifying the host\n'
|
||||||
Executable
+41
@@ -0,0 +1,41 @@
|
|||||||
|
#!/usr/bin/env bash
|
||||||
|
set -o errexit
|
||||||
|
set -o nounset
|
||||||
|
set -o pipefail
|
||||||
|
|
||||||
|
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
|
||||||
|
# shellcheck source=00-platform-guard.inc
|
||||||
|
source "$SCRIPT_DIR/00-platform-guard.inc"
|
||||||
|
|
||||||
|
packages=(
|
||||||
|
curl wget git vim nano tmux byobu htop btop glances
|
||||||
|
jq unzip zip rsync tree ncdu duf
|
||||||
|
lsof strace tcpdump nmap dnsutils net-tools iperf3 ethtool
|
||||||
|
smartmontools nvme-cli lm-sensors pciutils usbutils hwinfo
|
||||||
|
sysstat iotop iftop nload
|
||||||
|
ca-certificates gnupg software-properties-common apt-transport-https
|
||||||
|
needrestart unattended-upgrades logrotate
|
||||||
|
)
|
||||||
|
|
||||||
|
if ((EUID != 0)); then
|
||||||
|
printf 'CRITICAL: base package setup must run as root\n' >&2
|
||||||
|
exit 2
|
||||||
|
fi
|
||||||
|
require_supported_ubuntu
|
||||||
|
if ! command -v apt-get >/dev/null 2>&1; then
|
||||||
|
printf 'CRITICAL: apt-get is required\n' >&2
|
||||||
|
exit 2
|
||||||
|
fi
|
||||||
|
|
||||||
|
printf 'INFO: refreshing APT metadata\n'
|
||||||
|
apt-get update
|
||||||
|
printf 'INFO: installing baseline operational packages\n'
|
||||||
|
DEBIAN_FRONTEND=noninteractive apt-get install -y "${packages[@]}"
|
||||||
|
|
||||||
|
if command -v systemctl >/dev/null 2>&1; then
|
||||||
|
systemctl enable --now sysstat
|
||||||
|
else
|
||||||
|
printf 'WARNING: systemctl is unavailable; sysstat was not enabled\n'
|
||||||
|
fi
|
||||||
|
|
||||||
|
printf 'OK: baseline operational packages are installed\n'
|
||||||
Executable
+60
@@ -0,0 +1,60 @@
|
|||||||
|
#!/usr/bin/env bash
|
||||||
|
set -o errexit
|
||||||
|
set -o nounset
|
||||||
|
set -o pipefail
|
||||||
|
|
||||||
|
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
|
||||||
|
# shellcheck source=00-platform-guard.inc
|
||||||
|
source "$SCRIPT_DIR/00-platform-guard.inc"
|
||||||
|
SOURCE_FILE="$SCRIPT_DIR/../files/bashrc.d/ailab.sh"
|
||||||
|
PROFILE_DIR="/root/.bashrc.d"
|
||||||
|
PROFILE_FILE="$PROFILE_DIR/ailab.sh"
|
||||||
|
BASHRC="/root/.bashrc"
|
||||||
|
SOURCE_LINE='[[ -f /root/.bashrc.d/ailab.sh ]] && source /root/.bashrc.d/ailab.sh'
|
||||||
|
|
||||||
|
backup_file() {
|
||||||
|
local path="$1"
|
||||||
|
local backup
|
||||||
|
|
||||||
|
backup="${path}.$(date '+%Y%m%d-%H%M%S').bak"
|
||||||
|
install -m 0644 "$path" "$backup"
|
||||||
|
printf 'INFO: backed up %s to %s\n' "$path" "$backup"
|
||||||
|
}
|
||||||
|
|
||||||
|
if ((EUID != 0)); then
|
||||||
|
printf 'CRITICAL: shell profile setup must run as root\n' >&2
|
||||||
|
exit 2
|
||||||
|
fi
|
||||||
|
require_supported_ubuntu
|
||||||
|
if [[ ! -r "$SOURCE_FILE" ]]; then
|
||||||
|
printf 'CRITICAL: shell profile source is missing: %s\n' "$SOURCE_FILE" >&2
|
||||||
|
exit 2
|
||||||
|
fi
|
||||||
|
|
||||||
|
install -d -m 0755 "$PROFILE_DIR"
|
||||||
|
if [[ ! -f "$PROFILE_FILE" ]] || ! cmp -s "$SOURCE_FILE" "$PROFILE_FILE"; then
|
||||||
|
if [[ -f "$PROFILE_FILE" ]]; then
|
||||||
|
backup_file "$PROFILE_FILE"
|
||||||
|
fi
|
||||||
|
install -m 0644 "$SOURCE_FILE" "$PROFILE_FILE"
|
||||||
|
printf 'OK: installed %s\n' "$PROFILE_FILE"
|
||||||
|
else
|
||||||
|
printf 'OK: shell profile is already current\n'
|
||||||
|
fi
|
||||||
|
|
||||||
|
if [[ ! -f "$BASHRC" ]]; then
|
||||||
|
install -m 0644 /dev/null "$BASHRC"
|
||||||
|
fi
|
||||||
|
|
||||||
|
source_count="$(grep -Fxc "$SOURCE_LINE" "$BASHRC" || true)"
|
||||||
|
if [[ "$source_count" != "1" ]]; then
|
||||||
|
tmp_bashrc="$(mktemp)"
|
||||||
|
trap 'rm -f "$tmp_bashrc"' EXIT
|
||||||
|
grep -Fvx "$SOURCE_LINE" "$BASHRC" >"$tmp_bashrc" || true
|
||||||
|
printf '\n%s\n' "$SOURCE_LINE" >>"$tmp_bashrc"
|
||||||
|
backup_file "$BASHRC"
|
||||||
|
install -m 0644 "$tmp_bashrc" "$BASHRC"
|
||||||
|
printf 'OK: configured %s to source the AI lab profile exactly once\n' "$BASHRC"
|
||||||
|
else
|
||||||
|
printf 'OK: %s already sources the AI lab profile exactly once\n' "$BASHRC"
|
||||||
|
fi
|
||||||
Executable
+36
@@ -0,0 +1,36 @@
|
|||||||
|
#!/usr/bin/env bash
|
||||||
|
set -o errexit
|
||||||
|
set -o nounset
|
||||||
|
set -o pipefail
|
||||||
|
|
||||||
|
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
|
||||||
|
# shellcheck source=00-platform-guard.inc
|
||||||
|
source "$SCRIPT_DIR/00-platform-guard.inc"
|
||||||
|
|
||||||
|
required_packages=(
|
||||||
|
cockpit cockpit-system cockpit-storaged cockpit-networkmanager
|
||||||
|
cockpit-packagekit cockpit-machines cockpit-sosreport cockpit-pcp
|
||||||
|
)
|
||||||
|
|
||||||
|
if ((EUID != 0)); then
|
||||||
|
printf 'CRITICAL: Cockpit setup must run as root\n' >&2
|
||||||
|
exit 2
|
||||||
|
fi
|
||||||
|
require_supported_ubuntu
|
||||||
|
if ! command -v apt-get >/dev/null 2>&1; then
|
||||||
|
printf 'CRITICAL: apt-get is required\n' >&2
|
||||||
|
exit 2
|
||||||
|
fi
|
||||||
|
|
||||||
|
apt-get update
|
||||||
|
DEBIAN_FRONTEND=noninteractive apt-get install -y "${required_packages[@]}"
|
||||||
|
|
||||||
|
if apt-cache show cockpit-files >/dev/null 2>&1; then
|
||||||
|
DEBIAN_FRONTEND=noninteractive apt-get install -y cockpit-files
|
||||||
|
printf 'OK: installed optional cockpit-files package\n'
|
||||||
|
else
|
||||||
|
printf 'WARNING: cockpit-files is unavailable; continuing without it\n'
|
||||||
|
fi
|
||||||
|
|
||||||
|
systemctl enable --now cockpit.socket
|
||||||
|
printf 'OK: Cockpit is enabled at https://%s:9090\n' "$(hostname)"
|
||||||
Executable
+136
@@ -0,0 +1,136 @@
|
|||||||
|
#!/usr/bin/env bash
|
||||||
|
set -o errexit
|
||||||
|
set -o nounset
|
||||||
|
set -o pipefail
|
||||||
|
|
||||||
|
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
|
||||||
|
# shellcheck source=00-platform-guard.inc
|
||||||
|
source "$SCRIPT_DIR/00-platform-guard.inc"
|
||||||
|
SOURCE_CONFIG="$SCRIPT_DIR/../files/docker/daemon.json"
|
||||||
|
DOCKER_CONFIG="/etc/docker/daemon.json"
|
||||||
|
temporary_files=()
|
||||||
|
|
||||||
|
cleanup() {
|
||||||
|
local path
|
||||||
|
|
||||||
|
for path in "${temporary_files[@]}"; do
|
||||||
|
rm -f "$path"
|
||||||
|
done
|
||||||
|
}
|
||||||
|
|
||||||
|
trap cleanup EXIT
|
||||||
|
|
||||||
|
backup_file() {
|
||||||
|
local path="$1"
|
||||||
|
local backup
|
||||||
|
|
||||||
|
backup="${path}.$(date '+%Y%m%d-%H%M%S').bak"
|
||||||
|
install -m 0644 "$path" "$backup"
|
||||||
|
printf 'INFO: backed up %s to %s\n' "$path" "$backup"
|
||||||
|
}
|
||||||
|
|
||||||
|
if ((EUID != 0)); then
|
||||||
|
printf 'CRITICAL: Docker setup must run as root\n' >&2
|
||||||
|
exit 2
|
||||||
|
fi
|
||||||
|
require_supported_ubuntu
|
||||||
|
for command_name in apt-get apt-cache; do
|
||||||
|
if ! command -v "$command_name" >/dev/null 2>&1; then
|
||||||
|
printf 'CRITICAL: required command is missing: %s\n' "$command_name" >&2
|
||||||
|
exit 2
|
||||||
|
fi
|
||||||
|
done
|
||||||
|
|
||||||
|
apt-get update
|
||||||
|
DEBIAN_FRONTEND=noninteractive apt-get install -y ca-certificates curl gnupg jq
|
||||||
|
|
||||||
|
if apt-cache show docker.io >/dev/null 2>&1; then
|
||||||
|
packages=(docker.io)
|
||||||
|
if apt-cache show docker-compose-v2 >/dev/null 2>&1; then
|
||||||
|
packages+=(docker-compose-v2)
|
||||||
|
else
|
||||||
|
printf 'WARNING: docker-compose-v2 is unavailable from Ubuntu repositories\n'
|
||||||
|
fi
|
||||||
|
else
|
||||||
|
printf 'WARNING: docker.io is unavailable; configuring Docker official repository\n'
|
||||||
|
install -d -m 0755 /etc/apt/keyrings
|
||||||
|
tmp_key="$(mktemp)"
|
||||||
|
temporary_files+=("$tmp_key")
|
||||||
|
curl -fsSL https://download.docker.com/linux/ubuntu/gpg \
|
||||||
|
| gpg --dearmor --yes -o "$tmp_key"
|
||||||
|
if [[ ! -f /etc/apt/keyrings/docker.gpg ]] || \
|
||||||
|
! cmp -s "$tmp_key" /etc/apt/keyrings/docker.gpg; then
|
||||||
|
if [[ -f /etc/apt/keyrings/docker.gpg ]]; then
|
||||||
|
backup_file /etc/apt/keyrings/docker.gpg
|
||||||
|
fi
|
||||||
|
install -m 0644 "$tmp_key" /etc/apt/keyrings/docker.gpg
|
||||||
|
fi
|
||||||
|
|
||||||
|
# shellcheck disable=SC1091
|
||||||
|
source /etc/os-release
|
||||||
|
architecture="$(dpkg --print-architecture)"
|
||||||
|
tmp_repository="$(mktemp)"
|
||||||
|
temporary_files+=("$tmp_repository")
|
||||||
|
printf 'deb [arch=%s signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu %s stable\n' \
|
||||||
|
"$architecture" "${VERSION_CODENAME:?}" \
|
||||||
|
>"$tmp_repository"
|
||||||
|
if [[ ! -f /etc/apt/sources.list.d/docker.list ]] || \
|
||||||
|
! cmp -s "$tmp_repository" /etc/apt/sources.list.d/docker.list; then
|
||||||
|
if [[ -f /etc/apt/sources.list.d/docker.list ]]; then
|
||||||
|
backup_file /etc/apt/sources.list.d/docker.list
|
||||||
|
fi
|
||||||
|
install -m 0644 "$tmp_repository" /etc/apt/sources.list.d/docker.list
|
||||||
|
fi
|
||||||
|
apt-get update
|
||||||
|
packages=(docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin)
|
||||||
|
fi
|
||||||
|
|
||||||
|
DEBIAN_FRONTEND=noninteractive apt-get install -y "${packages[@]}"
|
||||||
|
install -d -m 0755 /etc/docker
|
||||||
|
|
||||||
|
if [[ ! -r "$SOURCE_CONFIG" ]]; then
|
||||||
|
printf 'CRITICAL: Docker configuration template is missing: %s\n' "$SOURCE_CONFIG" >&2
|
||||||
|
exit 2
|
||||||
|
fi
|
||||||
|
jq empty "$SOURCE_CONFIG"
|
||||||
|
|
||||||
|
tmp_config="$(mktemp)"
|
||||||
|
temporary_files+=("$tmp_config")
|
||||||
|
if [[ -f "$DOCKER_CONFIG" ]]; then
|
||||||
|
if ! jq empty "$DOCKER_CONFIG" >/dev/null 2>&1; then
|
||||||
|
printf 'CRITICAL: %s is invalid JSON; refusing to overwrite it\n' "$DOCKER_CONFIG" >&2
|
||||||
|
exit 1
|
||||||
|
fi
|
||||||
|
jq '. + {
|
||||||
|
"log-driver": "json-file",
|
||||||
|
"log-opts": ((."log-opts" // {}) + {"max-size": "50m", "max-file": "5"})
|
||||||
|
}' "$DOCKER_CONFIG" >"$tmp_config"
|
||||||
|
else
|
||||||
|
install -m 0644 "$SOURCE_CONFIG" "$tmp_config"
|
||||||
|
fi
|
||||||
|
jq empty "$tmp_config"
|
||||||
|
|
||||||
|
config_changed=0
|
||||||
|
if [[ ! -f "$DOCKER_CONFIG" ]] || ! cmp -s "$tmp_config" "$DOCKER_CONFIG"; then
|
||||||
|
if [[ -f "$DOCKER_CONFIG" ]]; then
|
||||||
|
backup_file "$DOCKER_CONFIG"
|
||||||
|
fi
|
||||||
|
install -m 0644 "$tmp_config" "$DOCKER_CONFIG"
|
||||||
|
config_changed=1
|
||||||
|
printf 'OK: installed Docker daemon log limits\n'
|
||||||
|
else
|
||||||
|
printf 'OK: Docker daemon configuration is already current\n'
|
||||||
|
fi
|
||||||
|
|
||||||
|
systemctl enable --now docker
|
||||||
|
if ((config_changed == 1)); then
|
||||||
|
systemctl restart docker
|
||||||
|
fi
|
||||||
|
|
||||||
|
docker version
|
||||||
|
if docker compose version >/dev/null 2>&1; then
|
||||||
|
docker compose version
|
||||||
|
else
|
||||||
|
printf 'WARNING: Docker Compose v2 is unavailable\n'
|
||||||
|
fi
|
||||||
|
printf 'OK: Docker setup completed\n'
|
||||||
Executable
+33
@@ -0,0 +1,33 @@
|
|||||||
|
#!/usr/bin/env bash
|
||||||
|
set -o errexit
|
||||||
|
set -o nounset
|
||||||
|
set -o pipefail
|
||||||
|
|
||||||
|
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
|
||||||
|
# shellcheck source=00-platform-guard.inc
|
||||||
|
source "$SCRIPT_DIR/00-platform-guard.inc"
|
||||||
|
|
||||||
|
packages=(
|
||||||
|
qemu-system-x86 qemu-utils libvirt-daemon-system libvirt-clients
|
||||||
|
virtinst virt-manager bridge-utils ovmf swtpm swtpm-tools dnsmasq-base
|
||||||
|
)
|
||||||
|
|
||||||
|
if ((EUID != 0)); then
|
||||||
|
printf 'CRITICAL: libvirt setup must run as root\n' >&2
|
||||||
|
exit 2
|
||||||
|
fi
|
||||||
|
require_supported_ubuntu
|
||||||
|
if ! command -v apt-get >/dev/null 2>&1; then
|
||||||
|
printf 'CRITICAL: apt-get is required\n' >&2
|
||||||
|
exit 2
|
||||||
|
fi
|
||||||
|
|
||||||
|
apt-get update
|
||||||
|
DEBIAN_FRONTEND=noninteractive apt-get install -y "${packages[@]}"
|
||||||
|
systemctl enable --now libvirtd
|
||||||
|
|
||||||
|
printf '\n== Virtual machines ==\n'
|
||||||
|
virsh list --all || printf 'WARNING: unable to list virtual machines\n'
|
||||||
|
printf '\n== Virtual networks ==\n'
|
||||||
|
virsh net-list --all || printf 'WARNING: unable to list virtual networks\n'
|
||||||
|
printf 'OK: libvirt/KVM setup completed\n'
|
||||||
Executable
+88
@@ -0,0 +1,88 @@
|
|||||||
|
#!/usr/bin/env bash
|
||||||
|
set -o errexit
|
||||||
|
set -o nounset
|
||||||
|
set -o pipefail
|
||||||
|
|
||||||
|
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
|
||||||
|
# shellcheck source=00-platform-guard.inc
|
||||||
|
source "$SCRIPT_DIR/00-platform-guard.inc"
|
||||||
|
|
||||||
|
driver_version=""
|
||||||
|
|
||||||
|
usage() {
|
||||||
|
cat <<'EOF'
|
||||||
|
Usage: sudo ./06-nvidia-tools.sh [--install-driver VERSION]
|
||||||
|
|
||||||
|
Without --install-driver, only non-driver diagnostic tools are installed.
|
||||||
|
EOF
|
||||||
|
}
|
||||||
|
|
||||||
|
while (($# > 0)); do
|
||||||
|
case "$1" in
|
||||||
|
--install-driver)
|
||||||
|
if (($# < 2)); then
|
||||||
|
printf 'CRITICAL: --install-driver requires a VERSION\n' >&2
|
||||||
|
exit 2
|
||||||
|
fi
|
||||||
|
driver_version="$2"
|
||||||
|
if [[ ! "$driver_version" =~ ^[0-9]+$ ]]; then
|
||||||
|
printf 'CRITICAL: NVIDIA driver VERSION must contain digits only\n' >&2
|
||||||
|
exit 2
|
||||||
|
fi
|
||||||
|
shift
|
||||||
|
;;
|
||||||
|
-h|--help)
|
||||||
|
usage
|
||||||
|
exit 0
|
||||||
|
;;
|
||||||
|
*)
|
||||||
|
printf 'CRITICAL: unknown option: %s\n' "$1" >&2
|
||||||
|
exit 2
|
||||||
|
;;
|
||||||
|
esac
|
||||||
|
shift
|
||||||
|
done
|
||||||
|
|
||||||
|
if ((EUID != 0)); then
|
||||||
|
printf 'CRITICAL: NVIDIA tooling setup must run as root\n' >&2
|
||||||
|
exit 2
|
||||||
|
fi
|
||||||
|
require_supported_ubuntu
|
||||||
|
if ! command -v apt-get >/dev/null 2>&1; then
|
||||||
|
printf 'CRITICAL: apt-get is required\n' >&2
|
||||||
|
exit 2
|
||||||
|
fi
|
||||||
|
|
||||||
|
apt-get update
|
||||||
|
DEBIAN_FRONTEND=noninteractive apt-get install -y nvtop clinfo pciutils
|
||||||
|
|
||||||
|
printf '\n== NVIDIA PCI devices ==\n'
|
||||||
|
lspci -nn | grep -i nvidia || printf 'INFO: no NVIDIA PCI devices detected\n'
|
||||||
|
|
||||||
|
printf '\n== NVIDIA runtime ==\n'
|
||||||
|
if command -v nvidia-smi >/dev/null 2>&1; then
|
||||||
|
nvidia-smi || printf 'WARNING: nvidia-smi returned an error\n'
|
||||||
|
else
|
||||||
|
printf 'INFO: nvidia-smi is not installed\n'
|
||||||
|
fi
|
||||||
|
|
||||||
|
printf '\n== DKMS ==\n'
|
||||||
|
if command -v dkms >/dev/null 2>&1; then
|
||||||
|
dkms status || printf 'WARNING: dkms status returned an error\n'
|
||||||
|
else
|
||||||
|
printf 'INFO: dkms is not installed\n'
|
||||||
|
fi
|
||||||
|
|
||||||
|
if [[ -n "$driver_version" ]]; then
|
||||||
|
driver_package="nvidia-driver-$driver_version"
|
||||||
|
if ! apt-cache show "$driver_package" >/dev/null 2>&1; then
|
||||||
|
printf 'CRITICAL: requested NVIDIA driver package is unavailable: %s\n' \
|
||||||
|
"$driver_package" >&2
|
||||||
|
exit 1
|
||||||
|
fi
|
||||||
|
DEBIAN_FRONTEND=noninteractive apt-get install -y "$driver_package"
|
||||||
|
printf 'WARNING: NVIDIA driver %s was installed; reboot before validation\n' \
|
||||||
|
"$driver_version"
|
||||||
|
else
|
||||||
|
printf 'OK: NVIDIA diagnostic tools installed; no driver was installed\n'
|
||||||
|
fi
|
||||||
Executable
+67
@@ -0,0 +1,67 @@
|
|||||||
|
#!/usr/bin/env bash
|
||||||
|
set -o errexit
|
||||||
|
set -o nounset
|
||||||
|
set -o pipefail
|
||||||
|
|
||||||
|
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
|
||||||
|
# shellcheck source=00-platform-guard.inc
|
||||||
|
source "$SCRIPT_DIR/00-platform-guard.inc"
|
||||||
|
JOURNAL_SOURCE="$SCRIPT_DIR/../files/systemd/journald-ailab-limits.conf"
|
||||||
|
JOURNAL_DEST="/etc/systemd/journald.conf.d/ailab-limits.conf"
|
||||||
|
SYSCTL_SOURCE="$SCRIPT_DIR/../files/sysctl/99-ailab.conf"
|
||||||
|
SYSCTL_DEST="/etc/sysctl.d/99-ailab.conf"
|
||||||
|
|
||||||
|
install_config() {
|
||||||
|
local source_path="$1"
|
||||||
|
local destination_path="$2"
|
||||||
|
local mode="$3"
|
||||||
|
local backup
|
||||||
|
|
||||||
|
install -d -m 0755 "$(dirname "$destination_path")"
|
||||||
|
if [[ -f "$destination_path" ]] && cmp -s "$source_path" "$destination_path"; then
|
||||||
|
printf 'OK: %s is already current\n' "$destination_path"
|
||||||
|
return 0
|
||||||
|
fi
|
||||||
|
if [[ -f "$destination_path" ]]; then
|
||||||
|
backup="${destination_path}.$(date '+%Y%m%d-%H%M%S').bak"
|
||||||
|
install -m "$mode" "$destination_path" "$backup"
|
||||||
|
printf 'INFO: backed up %s to %s\n' "$destination_path" "$backup"
|
||||||
|
fi
|
||||||
|
install -m "$mode" "$source_path" "$destination_path"
|
||||||
|
printf 'OK: installed %s\n' "$destination_path"
|
||||||
|
}
|
||||||
|
|
||||||
|
if ((EUID != 0)); then
|
||||||
|
printf 'CRITICAL: tuning setup must run as root\n' >&2
|
||||||
|
exit 2
|
||||||
|
fi
|
||||||
|
require_supported_ubuntu
|
||||||
|
for source_path in "$JOURNAL_SOURCE" "$SYSCTL_SOURCE"; do
|
||||||
|
if [[ ! -r "$source_path" ]]; then
|
||||||
|
printf 'CRITICAL: required configuration is missing: %s\n' "$source_path" >&2
|
||||||
|
exit 2
|
||||||
|
fi
|
||||||
|
done
|
||||||
|
|
||||||
|
if ! command -v sysctl >/dev/null 2>&1 || ! command -v systemctl >/dev/null 2>&1; then
|
||||||
|
printf 'CRITICAL: sysctl and systemctl are required\n' >&2
|
||||||
|
exit 2
|
||||||
|
fi
|
||||||
|
|
||||||
|
if ! command -v sensors-detect >/dev/null 2>&1 || \
|
||||||
|
! systemctl cat sysstat.service >/dev/null 2>&1; then
|
||||||
|
apt-get update
|
||||||
|
DEBIAN_FRONTEND=noninteractive apt-get install -y lm-sensors sysstat
|
||||||
|
fi
|
||||||
|
|
||||||
|
install_config "$JOURNAL_SOURCE" "$JOURNAL_DEST" 0644
|
||||||
|
install_config "$SYSCTL_SOURCE" "$SYSCTL_DEST" 0644
|
||||||
|
|
||||||
|
sysctl --system
|
||||||
|
systemctl restart systemd-journald
|
||||||
|
systemctl enable --now sysstat
|
||||||
|
|
||||||
|
if command -v sensors-detect >/dev/null 2>&1; then
|
||||||
|
sensors-detect --auto || printf 'WARNING: sensors-detect did not complete successfully\n'
|
||||||
|
fi
|
||||||
|
printf 'OK: host tuning completed\n'
|
||||||
+61
@@ -0,0 +1,61 @@
|
|||||||
|
#!/usr/bin/env bash
|
||||||
|
set -o errexit
|
||||||
|
set -o nounset
|
||||||
|
set -o pipefail
|
||||||
|
|
||||||
|
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
|
||||||
|
# shellcheck source=00-platform-guard.inc
|
||||||
|
source "$SCRIPT_DIR/00-platform-guard.inc"
|
||||||
|
|
||||||
|
enable_ufw=0
|
||||||
|
|
||||||
|
usage() {
|
||||||
|
cat <<'EOF'
|
||||||
|
Usage: sudo ./08-security-baseline.sh [--enable-ufw]
|
||||||
|
|
||||||
|
Installs fail2ban and UFW. UFW is enabled only with the explicit flag.
|
||||||
|
EOF
|
||||||
|
}
|
||||||
|
|
||||||
|
while (($# > 0)); do
|
||||||
|
case "$1" in
|
||||||
|
--enable-ufw)
|
||||||
|
enable_ufw=1
|
||||||
|
;;
|
||||||
|
-h|--help)
|
||||||
|
usage
|
||||||
|
exit 0
|
||||||
|
;;
|
||||||
|
*)
|
||||||
|
printf 'CRITICAL: unknown option: %s\n' "$1" >&2
|
||||||
|
exit 2
|
||||||
|
;;
|
||||||
|
esac
|
||||||
|
shift
|
||||||
|
done
|
||||||
|
|
||||||
|
if ((EUID != 0)); then
|
||||||
|
printf 'CRITICAL: security baseline setup must run as root\n' >&2
|
||||||
|
exit 2
|
||||||
|
fi
|
||||||
|
require_supported_ubuntu
|
||||||
|
if ! command -v apt-get >/dev/null 2>&1; then
|
||||||
|
printf 'CRITICAL: apt-get is required\n' >&2
|
||||||
|
exit 2
|
||||||
|
fi
|
||||||
|
|
||||||
|
apt-get update
|
||||||
|
DEBIAN_FRONTEND=noninteractive apt-get install -y fail2ban ufw
|
||||||
|
systemctl enable --now fail2ban
|
||||||
|
|
||||||
|
if ((enable_ufw == 1)); then
|
||||||
|
printf 'WARNING: UFW was explicitly requested; SSH and Cockpit rules will be added before enablement\n'
|
||||||
|
ufw allow OpenSSH
|
||||||
|
ufw allow 9090/tcp comment 'Cockpit'
|
||||||
|
ufw --force enable
|
||||||
|
else
|
||||||
|
printf 'WARNING: UFW is installed but was not enabled; use --enable-ufw after reviewing access requirements\n'
|
||||||
|
fi
|
||||||
|
|
||||||
|
ufw status verbose || printf 'WARNING: unable to read UFW status\n'
|
||||||
|
printf 'OK: security baseline completed\n'
|
||||||
Executable
+69
@@ -0,0 +1,69 @@
|
|||||||
|
#!/usr/bin/env bash
|
||||||
|
set -o errexit
|
||||||
|
set -o nounset
|
||||||
|
set -o pipefail
|
||||||
|
|
||||||
|
section() {
|
||||||
|
printf '\n== %s ==\n' "$1"
|
||||||
|
}
|
||||||
|
|
||||||
|
run_optional() {
|
||||||
|
local description="$1"
|
||||||
|
shift
|
||||||
|
|
||||||
|
if "$@"; then
|
||||||
|
return 0
|
||||||
|
fi
|
||||||
|
printf 'WARNING: %s failed\n' "$description"
|
||||||
|
return 0
|
||||||
|
}
|
||||||
|
|
||||||
|
section "Failed systemd units"
|
||||||
|
if command -v systemctl >/dev/null 2>&1; then
|
||||||
|
run_optional "failed systemd unit report" systemctl --failed --no-pager
|
||||||
|
|
||||||
|
section "Selected service status"
|
||||||
|
for unit in cockpit.socket docker.service libvirtd.service fail2ban.service; do
|
||||||
|
if systemctl cat "$unit" >/dev/null 2>&1; then
|
||||||
|
run_optional "$unit status" systemctl status "$unit" --no-pager
|
||||||
|
else
|
||||||
|
printf 'INFO: %s is not installed\n' "$unit"
|
||||||
|
fi
|
||||||
|
done
|
||||||
|
else
|
||||||
|
printf 'WARNING: systemctl is unavailable\n'
|
||||||
|
fi
|
||||||
|
|
||||||
|
section "Docker"
|
||||||
|
if command -v docker >/dev/null 2>&1; then
|
||||||
|
run_optional "Docker container list" docker ps
|
||||||
|
else
|
||||||
|
printf 'INFO: Docker is not installed\n'
|
||||||
|
fi
|
||||||
|
|
||||||
|
section "Libvirt"
|
||||||
|
if command -v virsh >/dev/null 2>&1; then
|
||||||
|
run_optional "libvirt guest list" virsh list --all
|
||||||
|
else
|
||||||
|
printf 'INFO: virsh is not installed\n'
|
||||||
|
fi
|
||||||
|
|
||||||
|
section "NVIDIA"
|
||||||
|
if command -v nvidia-smi >/dev/null 2>&1; then
|
||||||
|
run_optional "NVIDIA status" nvidia-smi
|
||||||
|
else
|
||||||
|
printf 'INFO: nvidia-smi is not installed\n'
|
||||||
|
fi
|
||||||
|
|
||||||
|
section "Filesystems"
|
||||||
|
run_optional "filesystem report" df -hT
|
||||||
|
|
||||||
|
section "Listening ports"
|
||||||
|
if command -v ss >/dev/null 2>&1; then
|
||||||
|
run_optional "listening port report" ss -tulpn
|
||||||
|
else
|
||||||
|
printf 'WARNING: ss is unavailable\n'
|
||||||
|
fi
|
||||||
|
|
||||||
|
printf '\nOK: postcheck completed; review warnings above\n'
|
||||||
|
exit 0
|
||||||
Reference in New Issue
Block a user