# platform-projects
This directory contains larger infrastructure platform topics and case studies. Most subdirectories are planning areas unless their own README says otherwise.

## Implemented platform projects
- [hpc-slurm-ai-cluster](./hpc-slurm-ai-cluster/) - Slurm AI/HPC cluster automation covering Ansible-managed Slurm operations, GPU scheduling with GRES, cgroup enforcement, SlurmDBD accounting, QOS/fairshare/priority, node lifecycle operations, rolling upgrades, and health remediation.

## Planning areas
These subdirectories are intentionally light and should be read as planning areas unless their own README says otherwise:

- `monitoring-zabbix`
- `elk-log-analysis`
- `storage`
- `clustering`
- `virtualization`

Planned platform topics are tracked in [ROADMAP.md](../ROADMAP.md). Keep future additions operational: scope, topology, validation, limitations, and runbook links should matter more than diagrams or buzzwords.

For Codex-driven changes, use [AGENTS.md](../AGENTS.md) and the templates under [docs/codex](../docs/codex/).