2026-05-10 11:11:03 +00:00
# Linux/Unix Infrastructure Engineering Portfolio
This repository contains sanitized infrastructure automation examples based on Linux/Unix operations and infrastructure workflows. The focus is on incident response, troubleshooting, pre-checks, dry-run behavior, controlled execution, post-checks, and readable operational evidence.
It is a technical portfolio, not a production toolkit. The examples show how operational work is structured: understand the current state, make changes only with explicit controls, verify the result, and leave enough evidence for review.
## What This Repo Is
- Practical Linux/Unix operations examples.
- Safe Bash and Ansible patterns for lab and review.
- Runbook-driven examples for incident response, storage operations, hardening, and observability.
- A place for platform and lab topics to grow without pretending unfinished areas are complete.
## What This Repo Is Not
- It is not a compliance benchmark implementation.
- It is not a drop-in change automation framework.
- It is not proof that these exact scripts ran in any production environment.
- It does not replace change review, peer review, backups, monitoring, or platform-specific runbooks.
## Repository Layout
- [infra-run](./infra-run/) - core operational tooling and automation.
- [platform-projects](./platform-projects/) - larger platform topics and case-study areas.
- [labs](./labs/) - experimental/lab environments and notes.
- [docs/codex](./docs/codex/) - guidance for future Codex-driven changes.
- [scripts](./scripts/) - lightweight repository validation helpers.
## Usable Now
- [infra-run](./infra-run/) - the main implemented project in this repository.
- [Linux healthcheck scripts](./infra-run/scripts/bash/os-healthcheck/) - host, disk, service, network, and report helpers.
- [Disk full workflow](./infra-run/scripts/bash/disk-full/) - triage scripts for usage, inode pressure, deleted open files, large files, log cleanup review, and postchecks.
- [Veritas examples](./infra-run/scripts/bash/veritas/) - dry-run-first VxVM/VCS storage expansion workflow examples.
- [GPFS examples](./infra-run/scripts/bash/gpfs/) - dry-run-first IBM Spectrum Scale expansion workflow examples.
- [Ansible hardening examples](./infra-run/ansible/) - selected Linux and AIX baseline hardening tasks organized as lab-safe roles.
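
As a rough illustration of the read-only triage style used in the disk full workflow, the sketch below flags filesystems at or above a usage threshold and then lists deleted-but-open files. It is not copied from the repository: `flag_over_threshold` is a hypothetical helper name, the `-i` inode option assumes GNU `df`, and mount points containing spaces are not handled.

```bash
#!/usr/bin/env bash
# Sketch only: flag_over_threshold is a hypothetical helper, not a script
# from this repository. Every command here is read-only.

# Reads `df -P` style output on stdin and prints mount points whose
# use percentage (column 5) is at or above the given threshold.
flag_over_threshold() {
  local threshold="$1"
  awk -v t="$threshold" 'NR > 1 {
    use = $5
    sub(/%/, "", use)
    if (use + 0 >= t) print $6 ": " use "%"
  }'
}

# Block usage, then inode usage (the -i flag assumes GNU df).
df -P  | flag_over_threshold 90
df -Pi | flag_over_threshold 90

# Deleted files still held open by a process keep consuming space.
# `lsof +L1` selects open files with a link count below 1.
if command -v lsof >/dev/null 2>&1; then
  lsof +L1 2>/dev/null | head -n 20
fi
```

Because the helper reads from standard input, it can also be run against `df -P` output saved during an incident.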
## Planned Areas
The `labs` and `platform-projects` trees are intentionally thin. They are kept as planning areas for future lab notes and case studies, not as completed projects. Current planned topics are tracked in [ROADMAP.md](./ROADMAP.md).
## Documentation
### Production Operations
- [infra-run/docs/operations-cheatsheet.md](./infra-run/docs/operations-cheatsheet.md) - production-focused Linux/Unix operations reference for incident handling, validation, storage, networking, Ansible, observability, and safety-first change execution.
### Platform Engineering
- [platform-projects/docs/platform-cheatsheet.md](./platform-projects/docs/platform-cheatsheet.md) - platform operations reference for Kubernetes, Helm, containers, Terraform, CI/CD, observability, and GPU-backed infrastructure troubleshooting.
### Labs & Experiments
- [labs/docs/lab-cheatsheet.md](./labs/docs/lab-cheatsheet.md) - quick-reference scratchpad for K3s, Proxmox, Terraform, Docker, networking, and short-lived lab troubleshooting work.
### Codex and Review Guidance
- [AGENTS.md](./AGENTS.md) - repository rules for automated and assisted changes.
- [docs/codex/README.md](./docs/codex/README.md) - Codex workflow and expected final response format.
- [docs/codex/review-checklist.md](./docs/codex/review-checklist.md) - safety, Bash, Ansible, docs, and validation review checklist.
- [docs/codex/task-template.md](./docs/codex/task-template.md) - reusable scoped task templates.
## Safety-First Usage
Read scripts and playbooks before running them. Operational examples are sanitized and may need adaptation for a real system.
- Prefer read-only commands first.
- Use dry-run/check mode before execution.
- Treat `--execute` as a change-control boundary.
- Confirm backups, monitoring, application impact, and rollback steps before live use.
- Do not run platform-specific storage commands without a matching Veritas, GPFS, or AIX lab.
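
The dry-run-first rule above can be sketched as a small wrapper that only prints commands unless `--execute` is passed. This is a minimal illustration under stated assumptions, not the repository's implementation: `run_step` and the `EXECUTE` variable are hypothetical names, and the wrapped command is a harmless placeholder.

```bash
#!/usr/bin/env bash
# Sketch only: run_step and EXECUTE are illustrative names, not taken
# from the scripts in this repository.
set -euo pipefail

EXECUTE=0   # dry-run is the default; nothing runs unless --execute is given

for arg in "$@"; do
  case "$arg" in
    --execute) EXECUTE=1 ;;
  esac
done

# In dry-run mode, print the command. With --execute, log it and run it.
run_step() {
  if [ "$EXECUTE" -eq 1 ]; then
    echo "EXECUTE: $*"
    "$@"
  else
    echo "DRY-RUN: $*"
  fi
}

# Placeholder for a real change command.
run_step echo "pretend change applied"
```

Run without arguments, the script only reports what it would do. Passing `--execute` is the single, explicit switch that allows the command to run, which keeps the change-control boundary visible in shell history and review logs.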
## Validation
Basic local validation:
```bash
./scripts/validate-repo.sh
./scripts/check-bash.sh
./scripts/check-ansible.sh
./scripts/check-docs.sh
```
The validation helpers run required lightweight checks and use optional tools such as `shellcheck`, `yamllint`, `ansible-playbook`, `ansible-lint`, and `markdownlint` when available. Set `STRICT=1` to fail when optional tools are missing.
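
One way such optional-tool handling could look is sketched below. It is an assumption about the pattern, not the actual content of the helper scripts: `check_optional_tool` is a hypothetical function name, and only the `STRICT` variable and the tool names come from the description above.

```bash
#!/usr/bin/env bash
# Sketch only: check_optional_tool is a hypothetical helper, not the
# repository's actual implementation.
STRICT="${STRICT:-0}"

# Returns 0 if the tool exists, or if it is missing and STRICT is not 1.
# Returns 1 only when the tool is missing and STRICT=1.
check_optional_tool() {
  local tool="$1"
  if command -v "$tool" >/dev/null 2>&1; then
    echo "OK: $tool found"
    return 0
  fi
  if [ "$STRICT" = "1" ]; then
    echo "FAIL: $tool missing (STRICT=1)"
    return 1
  fi
  echo "SKIP: $tool missing"
  return 0
}

status=0
for tool in shellcheck yamllint ansible-lint markdownlint; do
  check_optional_tool "$tool" || status=1
done
echo "overall status: $status"
```

With this shape, a missing linter is reported as a skip on a workstation, while a CI job can set `STRICT=1` to turn the same gap into a failure.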
Some scripts depend on platform tools such as `vxdisk`, `hagrp`, `mmcrnsd`, and `mmlscluster`. Those commands are not expected to exist on a normal workstation, so functional testing against Veritas or GPFS requires a real lab environment.
See [infra-run/TESTED.md](./infra-run/TESTED.md) and [infra-run/KNOWN_LIMITATIONS.md](./infra-run/KNOWN_LIMITATIONS.md) for the current validation status.
## Operational Areas Demonstrated
- Linux operations triage and reporting.
- Disk pressure and deleted-file incident analysis.
- Dry-run-first Bash automation.
- Controlled storage change workflow design.
- Veritas VxVM/VCS operational awareness.
- GPFS / IBM Spectrum Scale operational awareness.
- Ansible role organization for selected hardening controls.
- Clear documentation of what was tested and what still needs a real system.