# platform-projects

This directory holds larger infrastructure platform topics and case studies. Implemented projects are listed first; most subdirectories are still planning areas.

## Implemented platform projects

- [hpc-slurm-ai-cluster](./hpc-slurm-ai-cluster/): Slurm AI/HPC cluster automation covering:
  - Ansible-managed Slurm operations
  - GPU scheduling with GRES
  - cgroup enforcement
  - SlurmDBD accounting
  - QOS, fairshare, and priority
  - node lifecycle operations
  - rolling upgrades
  - health remediation

## Planning areas
These subdirectories are intentionally light and should be read as planning areas unless their own README says otherwise:
- `monitoring-zabbix`
- `elk-log-analysis`
- `storage`
- `clustering`
- `virtualization`

Planned platform topics are tracked in [ROADMAP.md](../ROADMAP.md). Keep future additions operationally focused: scope, topology, validation steps, known limitations, and runbook links matter more than diagrams or buzzwords.

For Codex-driven changes, follow [AGENTS.md](../AGENTS.md) and the templates under [docs/codex](../docs/codex/).