# Changelog
## [Unreleased]
### Added
- Linux Fresh Setup Toolkit under `labs/linux/setup` for day-0 Ubuntu lab host bootstrap automation.
- AI Lab Maintenance Toolkit with systemd-based Linux maintenance automation.
- Python tooling validation for operational scripts.
- `incident-log-summary` for general incident log summarization.
- `log-diff-checker` for pre-change and post-change log comparison.
- `auth-log-audit` for Linux authentication log review.
- `jvm-log-analyzer` for JVM application log summaries.
- `journal-analyzer` for exported `journalctl` log review.
- `known-error-matcher` with JSON-based known error patterns.
- Standalone Bash incident checks for CPU, memory/OOM, service restart loops, failed SSH logins, certificate expiry, DNS connectivity, NTP drift, read-only filesystems, inode usage, and JVM process diagnostics.
- `incident_triage_report.sh` for L2 Markdown incident handover reports built from existing Bash incident checks.
- Repository-level Codex guidance:
  - `AGENTS.md`
  - `docs/codex/README.md`
  - `docs/codex/review-checklist.md`
  - `docs/codex/task-template.md`
  - `docs/codex/plans-template.md`
- Lightweight validation helpers:
  - `scripts/validate-repo.sh`
  - `scripts/check-bash.sh`
  - `scripts/check-ansible.sh`
  - `scripts/check-docs.sh`
- Cross-repository operational documentation structure:
  - `infra-run/docs/operations-cheatsheet.md`
  - `platform-projects/docs/platform-cheatsheet.md`
  - `labs/docs/lab-cheatsheet.md`
- Production-oriented Linux/Unix operations reference with incident workflows, storage and networking checks, SSL/TLS notes, AIX commands, automation safety patterns, Ansible operational usage, and observability quick-reference.
- SELinux operational coverage for mode checks, context inspection, AVC audit review, persistent relabel workflow, booleans, and SELinux-specific incident response.
- Selected baseline Ansible hardening automation:
  - RHEL 9 role and playbook.
  - Debian 13 / Ubuntu 26.04 role and playbook.
  - IBM AIX 7 role and playbook.
  - Shared sanitized Ansible inventory defaults for Linux and AIX examples.
  - Role-level task structure covering pre-checks, SSH, sudo, auditing, logging, services, filesystem controls, platform-specific settings, handlers, and post-check validation.
- Slurm AI/HPC Cluster Automation Lab under `platform-projects`, covering Ansible-managed Slurm operations, GPU scheduling, cgroup enforcement, SlurmDBD accounting, QOS/fairshare, lifecycle workflows, rolling upgrades, and health remediation.
### Changed
- Updated root, `infra-run`, Bash, Ansible, platform, and lab README guidance for safety-first usage, validation, and future Codex-driven work.
- Updated repository and `infra-run` README files to surface the new documentation structure and operational cheatsheets.
- Updated repository, `infra-run`, and Ansible README files to describe the new hardening automation instead of placeholder-only Ansible structure.
- Updated Python tooling documentation and repository roadmap.
- Integrated Python syntax validation into repository validation workflow and CI.
### Notes
- Hardening content covers selected baseline controls and is intended for portfolio/lab use; live use requires environment-specific review and validation.
## [Initial Version]
### Added
- Repository structure:
  - `infra-run`
  - `platform-projects`
  - `labs`
- Linux operations Bash toolkit under `infra-run/scripts/bash/os-healthcheck/`:
  - healthcheck
  - disk usage checks
  - service checks
  - system reporting
- Disk full incident toolkit:
  - disk analysis
  - large files detection
  - deleted open files detection
  - safe cleanup suggestions
- Network troubleshooting script under `infra-run/scripts/bash/os-healthcheck/`:
  - interface, routing, DNS, connectivity checks
- Veritas storage toolkit:
  - VxVM disk detection
  - diskgroup extension
  - volume/filesystem resize
  - VCS freeze/unfreeze workflow
- GPFS storage toolkit:
  - cluster validation
  - NSD planning
  - filesystem expansion
  - rebalance
- Runbook-style structure and step-based execution.
### Changed
- Moved Linux operations scripts into `infra-run/scripts/bash/os-healthcheck/` to keep host health and troubleshooting checks grouped together.
### Notes
- All scripts default to dry-run where change actions are present.
- Designed for safety and readability.
- No destructive actions without explicit confirmation.
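
The safety notes above (dry-run by default, explicit confirmation before changes) can be illustrated with a minimal Bash sketch. This is not taken from the repository's scripts: the `run_change` function and the `DRY_RUN` and `ASSUME_YES` variables are hypothetical names chosen for illustration, and the actual toolkit may implement the pattern differently.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of a dry-run-by-default wrapper.
# run_change, DRY_RUN and ASSUME_YES are illustrative names, not the repo's API.
set -eu

# DRY_RUN defaults to 1, so nothing is changed unless the caller opts in.
DRY_RUN="${DRY_RUN:-1}"
# ASSUME_YES=1 skips the interactive prompt (e.g. for non-interactive runs).
ASSUME_YES="${ASSUME_YES:-0}"

run_change() {
  # In dry-run mode, only print the command that would have been executed.
  if [ "$DRY_RUN" != "0" ]; then
    printf '[dry-run] %s\n' "$*"
    return 0
  fi

  # Outside dry-run mode, require an explicit "y" unless ASSUME_YES=1.
  if [ "$ASSUME_YES" != "1" ]; then
    local answer
    printf 'Run "%s"? [y/N] ' "$*" >&2
    read -r answer || answer=""
    case "$answer" in
      y|Y|yes|YES) ;;
      *) printf '[skipped] %s\n' "$*"; return 0 ;;
    esac
  fi

  # Execute the command exactly as passed, preserving argument boundaries.
  "$@"
}
```

Under these assumptions, a script would call `run_change systemctl restart example.service` for every change action; the command is only printed unless the operator sets `DRY_RUN=0` and confirms the prompt.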