portfolio/platform-projects/README.md
Mateusz Suski 1843796e92
Document Slurm AI/HPC cluster project
2026-06-05 15:39:24 +00:00


platform-projects

This directory contains larger infrastructure platform topics and case studies. Most subdirectories are planning areas unless their own README says otherwise.

Implemented platform projects

  • hpc-slurm-ai-cluster - Slurm AI/HPC cluster automation covering Ansible-managed Slurm operations, GPU scheduling with GRES, cgroup enforcement, SlurmDBD accounting, QOS/fairshare/priority, node lifecycle operations, rolling upgrades, and health remediation.

Planning areas

These subdirectories are intentionally light. Treat each one as a planning area unless its own README says otherwise:

  • monitoring-zabbix
  • elk-log-analysis
  • storage
  • clustering
  • virtualization

Planned platform topics are tracked in ROADMAP.md. Keep future additions operational: scope, topology, validation, limitations, and runbook links should matter more than diagrams or buzzwords.
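One way to keep additions operational is to check that a project README actually covers the items listed above. The following is a minimal sketch, not part of this repository: the function name `missing_sections` and the assumption that each item appears as a Markdown heading containing the keyword are both hypothetical.

```python
import re

# Assumed keywords, taken from the list above; adjust to the headings
# the project READMEs really use.
REQUIRED_SECTIONS = ("scope", "topology", "validation", "limitations", "runbook")


def missing_sections(readme_text: str) -> list[str]:
    """Return the required keywords that no Markdown heading mentions."""
    headings = [
        match.group(1).strip().lower()
        for match in re.finditer(
            r"^#{1,6}[ \t]+(.+?)[ \t]*$", readme_text, flags=re.MULTILINE
        )
    ]
    return [
        keyword
        for keyword in REQUIRED_SECTIONS
        if not any(keyword in heading for heading in headings)
    ]
```

Calling `missing_sections(text)` on a README's contents returns the keywords still to be covered; an empty list means every item has a heading.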

For Codex-driven changes, use AGENTS.md and the templates under docs/codex.