@@ -0,0 +1,53 @@
# Bash Shell Profile

## Installation

The shell profile is installed for root:

```text
/root/.bashrc.d/ailab.sh
```

The installer maintains one exact source line in `/root/.bashrc` and backs up
changed files. Start a new Bash session or run:

```bash
source /root/.bashrc
```
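
Keeping "one exact source line" usually means appending it only when it is
missing, so repeated installer runs do not duplicate it. The sketch below
illustrates that idea. The function name `ensure_source_line`, the backup
suffix, and the line text in the usage note are assumptions, not the
installer's actual code.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: append one exact line to an rc file only if it is missing.
ensure_source_line() {
    local rc_file=$1 line=$2

    # -x matches the whole line and -F treats the text literally, so only an
    # exact copy of the line counts as already present.
    if grep -qxF -- "$line" "$rc_file" 2>/dev/null; then
        return 0
    fi

    # Back up an existing file before changing it (timestamp format is assumed).
    if [ -f "$rc_file" ]; then
        cp -p -- "$rc_file" "$rc_file.bak.$(date +%Y%m%d-%H%M%S)"
    fi

    printf '%s\n' "$line" >> "$rc_file"
}
```

For example, `ensure_source_line /root/.bashrc '[ -f /root/.bashrc.d/ailab.sh ] && . /root/.bashrc.d/ailab.sh'`
can be run any number of times and leaves a single copy of the line.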

## Aliases

| Alias | Purpose |
| --- | --- |
| `ll`, `la` | Detailed and hidden-file directory listings |
| `ports` | Listening TCP/UDP sockets and processes |
| `dus`, `dufh` | Directory and filesystem usage |
| `failed`, `jerr`, `timers` | systemd failure, journal error, and timer views |
| `dps`, `ddf`, `dcu` | Docker containers, disk use, and Compose startup |
| `vms` | All libvirt guests |
| `gpu`, `gpuloop` | NVIDIA status once or refreshed every two seconds |
| `now` | Current timestamp and timezone |

`dcu` runs `docker compose up -d` in the current directory and therefore may
create or start resources. Review the Compose project before using it.
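
One way to review a project before `dcu` is to confirm which Compose file sits
in the current directory and then render it without starting anything. The
helper name `compose_file_here` is hypothetical, and the file-name order in the
loop is an assumption based on Compose's documented default names.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the first default Compose file found in the
# current directory, or fail if there is none.
compose_file_here() {
    local f
    for f in compose.yaml compose.yml docker-compose.yaml docker-compose.yml; do
        if [ -f "$f" ]; then
            printf '%s\n' "$f"
            return 0
        fi
    done
    echo "no Compose file in $PWD" >&2
    return 1
}

# Example review step (commented out so the sketch has no side effects).
# `docker compose config` renders the resolved project and starts nothing.
# compose_file_here && docker compose config
```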

## Functions

- `svc_status SERVICE`
- `svc_logs SERVICE [LINES]`
- `docker_logs CONTAINER [LINES]`
- `docker_restart CONTAINER`
- `vm_autostart VM`
- `vm_no_autostart VM`
- `path_backup PATH`
- `extract ARCHIVE`

Functions validate argument counts, and Docker, libvirt, and NVIDIA helpers
report missing commands clearly. `path_backup` creates a timestamped adjacent
copy and can consume substantial space for large paths.
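
The behaviors described above (argument-count validation, clear reports for
missing commands, and a timestamped adjacent copy) could look roughly like
this. The names `need_cmd` and `path_backup_sketch`, the messages, and the
`.bak.TIMESTAMP` suffix are assumptions; the shipped functions may differ.

```bash
#!/usr/bin/env bash
# Hypothetical sketches; not the profile's actual implementation.

# Report a missing command clearly and fail.
need_cmd() {
    if ! command -v "$1" >/dev/null 2>&1; then
        echo "error: required command '$1' not found" >&2
        return 127
    fi
}

# Copy PATH to an adjacent, timestamped sibling and print the copy's path.
path_backup_sketch() {
    if [ "$#" -ne 1 ]; then
        echo "usage: path_backup_sketch PATH" >&2
        return 2
    fi

    # Drop one trailing slash so the copy lands next to a directory, not inside it.
    local src=${1%/}
    if [ ! -e "$src" ]; then
        echo "error: '$src' does not exist" >&2
        return 1
    fi

    local dest
    dest="$src.bak.$(date +%Y%m%d-%H%M%S)"
    # -a preserves mode, ownership, and timestamps, and recurses into directories.
    cp -a -- "$src" "$dest" && printf '%s\n' "$dest"
}
```

Because the copy is a full duplicate, checking the size first with
`du -sh PATH` is a cheap precaution for large directories.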

## Rollback

Review timestamped backups under `/root`, restore the desired `.bashrc` or
profile copy, and start a new shell. Avoid restoring a backup without checking
for unrelated shell changes made after bootstrap.
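
Checking for unrelated changes can be as simple as diffing the backup against
the current file before restoring it. The helper name `review_restore` is
hypothetical, and the backup file naming is not specified here, so the paths
are passed in explicitly.

```bash
#!/usr/bin/env bash
# Hypothetical helper: show what restoring BACKUP over CURRENT would change.
# Lines starting with "-" exist only in CURRENT and would be lost;
# lines starting with "+" exist only in BACKUP and would come back.
review_restore() {
    if [ "$#" -ne 2 ]; then
        echo "usage: review_restore BACKUP CURRENT" >&2
        return 2
    fi

    diff -u -- "$2" "$1"
    local rc=$?

    # diff exits 1 when the files differ; only statuses above 1 are real errors.
    [ "$rc" -le 1 ]
}
```

Any `-` line that was added deliberately after bootstrap needs to be carried
over by hand once the backup is restored.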
@@ -0,0 +1,41 @@
# Cockpit

## Purpose

The Cockpit profile installs browser-based host administration modules for
system state, storage, networking, packages, virtual machines, metrics, and
support reports. It enables the socket-activated service.

## Installation and validation

```bash
sudo ./install.sh --cockpit
systemctl status cockpit.socket
ss -ltnp | grep ':9090'
```

Connect to `https://HOSTNAME:9090`. A browser warning is expected when the
default host certificate is not trusted.

`cockpit-files` is installed when available and skipped with a warning
otherwise.
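
For scripted validation, a plain `grep ':9090'` also matches longer ports that
merely begin with 9090 and addresses in the peer column. The sketch below
anchors the match to the local address column instead. The helper name
`has_listener` is hypothetical, and it assumes the column layout printed by
`ss -ltnH`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read `ss -ltnH` output on stdin and succeed only if a
# listener is bound to exactly the given TCP port.
has_listener() {
    local port=$1
    # Column 4 of `ss -ltn` is the local address and port, such as *:9090.
    awk -v port="$port" '
        $4 ~ (":" port "$") { found = 1 }
        END { exit found ? 0 : 1 }
    '
}

# Example (requires ss from iproute2 on the host):
# ss -ltnH | has_listener 9090 && echo "Cockpit socket is listening"
```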

## Access and firewall

The Cockpit profile does not change UFW. Enabling UFW explicitly through the
toolkit allows TCP 9090, but upstream firewalls and network ACLs remain
external concerns. Use normal Linux accounts and review which users may
administer the host.

## Troubleshooting and rollback

```bash
journalctl -u cockpit.socket -u cockpit.service
systemctl restart cockpit.socket
apt-cache policy cockpit cockpit-machines cockpit-files
```

To disable remote access without removing packages:

```bash
sudo systemctl disable --now cockpit.socket
```
@@ -0,0 +1,56 @@
# Docker

## Package policy

The profile prefers Ubuntu's `docker.io` package. If that package is
unavailable after an APT refresh, it configures Docker's official Ubuntu
repository and installs Docker Engine, containerd, Buildx, and Compose plugins.

This fallback requires network access to `download.docker.com`.

## Daemon configuration

The managed settings are:

```json
{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "50m",
    "max-file": "5"
  }
}
```

Existing valid `/etc/docker/daemon.json` content is preserved and merged with
these log settings. A changed file is backed up with a timestamp. Invalid JSON
causes the profile to stop rather than overwrite operator configuration.
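
A validate, back up, and merge flow of this kind can be sketched with `jq`.
The function name `merge_log_settings`, the backup suffix, and the choice to
let the managed keys win on conflict are assumptions, not a description of the
profile's actual code.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: merge the managed log settings into a daemon.json file.
merge_log_settings() {
    local target=$1
    local managed='{"log-driver":"json-file","log-opts":{"max-size":"50m","max-file":"5"}}'

    # No existing configuration: write the managed settings as-is.
    if [ ! -s "$target" ]; then
        printf '%s\n' "$managed" | jq . > "$target"
        return
    fi

    # Stop on invalid JSON rather than overwrite operator configuration.
    if ! jq empty "$target" 2>/dev/null; then
        echo "error: $target is not valid JSON; leaving it untouched" >&2
        return 1
    fi

    # `*` merges objects recursively; keys from the right-hand side win.
    local merged
    merged=$(jq --argjson managed "$managed" '. * $managed' "$target") || return 1

    # Back up and rewrite only when the content would actually change.
    if [ "$merged" != "$(jq . "$target")" ]; then
        cp -p -- "$target" "$target.bak.$(date +%Y%m%d-%H%M%S)"
        printf '%s\n' "$merged" > "$target"
    fi
}
```

Unrelated keys such as a custom data root pass through unchanged, while the
log keys listed above are set to the managed values.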

Log limits apply to newly created containers. Existing containers may retain
their original logging configuration until recreated.

## Validation

```bash
docker version
docker compose version
docker info
docker ps
docker system df
jq . /etc/docker/daemon.json
```

## Troubleshooting and rollback

```bash
systemctl status docker
journalctl -u docker
jq empty /etc/docker/daemon.json
```

To restore a previous daemon configuration, review a timestamped backup,
replace the current file, validate it with `jq empty`, and restart Docker.
Do not restore blindly when workloads depend on newer daemon settings.

The profile does not configure Docker data roots, prune objects, deploy
applications, or install the NVIDIA Container Toolkit.
@@ -0,0 +1,47 @@
# Fresh Install Checklist

## Before bootstrap

- Confirm Ubuntu 24.04 or newer and record the release and kernel.
- Apply firmware settings for virtualization, IOMMU, or Secure Boot as needed.
- Confirm console or out-of-band access before firewall work.
- Record interfaces, addresses, routes, DNS, storage, and intended mountpoints.
- Patch the base system and reboot if required.
- Decide whether the host needs Docker, libvirt, Cockpit, or NVIDIA support.
- Review application ports and VM networking before enabling UFW.
- Confirm backups exist for any pre-existing host configuration.
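
The "record" items above can be captured in one file so the pre-bootstrap
state is easy to compare later. This is only a sketch: the function names
`section` and `record_baseline`, the output layout, and the particular
commands chosen for each item are assumptions.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: write a pre-bootstrap baseline to a text file.

# Print a titled section, skipping commands that are not installed.
section() {
    local title=$1
    shift
    echo "## $title"
    if command -v "$1" >/dev/null 2>&1; then
        "$@" 2>&1 || echo "(command exited with status $?)"
    else
        echo "(skipped: $1 not installed)"
    fi
    echo
}

record_baseline() {
    local out=$1
    {
        section "kernel"      uname -r
        section "release"     cat /etc/os-release
        section "addresses"   ip -brief address
        section "routes"      ip route
        section "dns"         cat /etc/resolv.conf
        section "block"       lsblk -f
        section "filesystems" df -hT
    } > "$out"
}

# Example: record_baseline "/root/baseline-$(date +%Y%m%d).txt"
```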

## Bootstrap

Start with the least capability required:

```bash
sudo ./install.sh --base --shell
```

Add reviewed platform profiles:

```bash
sudo ./install.sh --cockpit --docker --libvirt --nvidia-tools --tuning --security
```

Do not select `--enable-ufw` until remote access and application rules are
understood. Do not install an NVIDIA driver until hardware, kernel, Secure Boot,
and workload compatibility are known.

## Post-bootstrap evidence

- Review all installer warnings.
- Run `systemctl --failed`.
- Confirm expected services with `systemctl status`.
- Review `ss -tulpn`, `df -hT`, `ip -brief address`, and `ip route`.
- Confirm Docker with `docker version` and `docker compose version`.
- Confirm libvirt with `virsh list --all` and `virsh net-list --all`.
- Confirm GPU state with `lspci -nn | grep -i nvidia` and `nvidia-smi`.
- Reboot after driver installation and repeat the postcheck.
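
Because not every host installs every profile, a repeatable postcheck needs to
tell a failing tool apart from one that was never installed. The sketch below
shows one way to do that. The names `run_check` and `postcheck` and the output
format are assumptions, and the runner only handles simple commands, so
pipelines such as the `lspci` check still need to be run by hand.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: run a check and report OK, FAIL, or SKIP.
run_check() {
    local label=$1
    shift
    if ! command -v "$1" >/dev/null 2>&1; then
        printf 'SKIP  %s (%s not installed)\n' "$label" "$1"
        return 0
    fi
    if "$@" >/dev/null 2>&1; then
        printf 'OK    %s\n' "$label"
    else
        printf 'FAIL  %s (exit %d)\n' "$label" "$?"
        return 1
    fi
}

# Run the command-based checks from the list above and fail if any check failed.
postcheck() {
    local failures=0
    run_check "docker engine"    docker version         || failures=$((failures + 1))
    run_check "docker compose"   docker compose version || failures=$((failures + 1))
    run_check "libvirt guests"   virsh list --all       || failures=$((failures + 1))
    run_check "libvirt networks" virsh net-list --all   || failures=$((failures + 1))
    run_check "nvidia driver"    nvidia-smi             || failures=$((failures + 1))
    [ "$failures" -eq 0 ]
}

# Example: postcheck || echo "review the FAIL lines above"
```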

## Handover

Document host-specific storage, network, firewall, backup, application, GPU,
and VM decisions. Install the separate `ailab-maintenance` toolkit only after
reviewing its scheduled day-2 behavior.
@@ -0,0 +1,54 @@
# libvirt and KVM

## Purpose

The libvirt profile installs QEMU/KVM administration, UEFI firmware, software
TPM support, VM creation tools, bridge utilities, and the libvirt daemon. This
supports later Linux or Windows 11 VM work without defining guests.

## Firmware pre-checks

Confirm CPU virtualization is enabled:

```bash
lscpu | grep -E 'Virtualization|Hypervisor'
grep -Eom1 '(vmx|svm)' /proc/cpuinfo
```
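
In a script, the same flag check can return a clear result instead of raw grep
output. The helper name `cpu_virt_flag` and its messages are hypothetical; the
file argument defaults to `/proc/cpuinfo`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report which hardware virtualization flag a cpuinfo
# file exposes, or fail if neither is present.
cpu_virt_flag() {
    local cpuinfo=${1:-/proc/cpuinfo}
    local flag

    # -w matches whole words, so related flags such as svm_lock do not count.
    flag=$(grep -Eowm1 'vmx|svm' "$cpuinfo" | head -n 1)

    case $flag in
        vmx) echo "vmx (Intel VT-x)" ;;
        svm) echo "svm (AMD-V)" ;;
        *)
            echo "no vmx/svm flag found; check firmware virtualization settings" >&2
            return 1
            ;;
    esac
}
```

A missing flag can also mean the check is running inside a guest without
nested virtualization, so treat the failure as a prompt to investigate.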

IOMMU and GPU passthrough require separate firmware, kernel command-line,
device isolation, driver binding, and recovery planning. This toolkit reports
hints but does not apply those changes.
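
A read-only hint of this kind can be as small as counting the IOMMU groups the
kernel exposes under `/sys/kernel/iommu_groups`. The function name
`iommu_group_count` is hypothetical and this is not the toolkit's own check.

```bash
#!/usr/bin/env bash
# Hypothetical hint: count IOMMU groups exposed by the kernel. Read-only.
iommu_group_count() {
    local base=${1:-/sys/kernel/iommu_groups}
    local count=0 dir

    # Each group appears as a numbered subdirectory; an unmatched glob is
    # left as a literal pattern and fails the -d test.
    for dir in "$base"/*/; do
        [ -d "$dir" ] && count=$((count + 1))
    done

    echo "$count"
}

# Example: echo "IOMMU groups: $(iommu_group_count)"
```

A count of zero usually indicates that the IOMMU is disabled in firmware or
not enabled on the kernel command line.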

## Validation

```bash
systemctl status libvirtd
virsh list --all
virsh net-list --all
virsh pool-list --all
```

Use `virt-host-validate` when available for a broader host capability report.
Desktop use of `virt-manager` requires a graphical environment or remote
display strategy.

## Networking and Windows 11

The default libvirt NAT network is distinct from host bridge networking. Review
DHCP, DNS, forwarding, and firewall behavior before changing it.

Windows 11 typically needs UEFI and a TPM device. The installed OVMF and swtpm
packages provide those building blocks, but guest creation and licensing remain
manual.

## Troubleshooting

```bash
journalctl -u libvirtd
virsh net-info default
virsh pool-list --all
lsmod | grep kvm
```

Disabling `libvirtd` does not remove VM disks or definitions. Package removal
and VM data deletion are intentionally outside this toolkit.
@@ -0,0 +1,52 @@
# NVIDIA Tooling

## Diagnostic-only default

The normal NVIDIA profile installs `nvtop`, `clinfo`, and PCI utilities. It
does not install or select a driver:

```bash
sudo ./install.sh --nvidia-tools
```

Review hardware and current module state:

```bash
lspci -nn | grep -i nvidia
nvidia-smi
dkms status
mokutil --sb-state
```

## Explicit driver installation

Install only a reviewed Ubuntu driver package version:

```bash
sudo ./install.sh --install-nvidia-driver 550
```

The numeric value maps directly to `nvidia-driver-VERSION`. The profile refuses
an unavailable package. Reboot after installation, then validate `nvidia-smi`,
kernel logs, DKMS state, and application behavior.
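
The mapping from a numeric value to a package name, and the refusal of an
unavailable package, could be sketched as follows. The function names and
messages are hypothetical, and using `apt-cache show` as the availability test
is an assumption about how such a check might be done.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: build the package name from a purely numeric version.
nvidia_driver_package() {
    case ${1:-} in
        '' | *[!0-9]*)
            echo "error: driver version must be numeric, for example 550" >&2
            return 2
            ;;
    esac
    printf 'nvidia-driver-%s\n' "$1"
}

# Succeed only if APT knows the package (requires apt-cache on the host).
nvidia_driver_available() {
    local pkg
    pkg=$(nvidia_driver_package "$1") || return
    apt-cache show "$pkg" >/dev/null 2>&1
}

# Example: nvidia_driver_available 550 || echo "nvidia-driver-550 is not available"
```

Rejecting non-numeric input keeps the value from being used to name any
package other than `nvidia-driver-VERSION`.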

## Selection considerations

- GPU generation and supported driver branch.
- Ubuntu release, kernel, and HWE stack.
- Secure Boot module enrollment.
- CUDA or application compatibility.
- Docker NVIDIA Container Toolkit requirements.
- Whether the device will be bound to VFIO instead of the host driver.

## Troubleshooting

```bash
journalctl -k | grep -Ei 'nvidia|nouveau|NVRM'
lsmod | grep -E 'nvidia|nouveau'
dkms status
apt-cache policy 'nvidia-driver-*'
```

Driver rollback is environment-specific and is not automated. Preserve console
access and a known-good kernel before changing GPU or Secure Boot configuration.