# Linux/Unix Infrastructure Engineering Portfolio

This repository contains sanitized infrastructure automation examples based on Linux/Unix operations and infrastructure workflows. The focus is on incident response, troubleshooting, pre-checks, dry-run behavior, controlled execution, post-checks, and readable operational evidence.

It is a technical portfolio, not a production toolkit. The examples show how operational work is structured: understand the current state, make changes only with explicit controls, verify the result, and leave enough evidence for review.

## What This Repo Is

- Practical Linux/Unix operations examples.
- Safe Bash and Ansible patterns for lab and review.
- Runbook-driven examples for incident response, storage operations, hardening, and observability.
- A place for platform and lab topics to grow without pretending unfinished areas are complete.

## What This Repo Is Not

- It is not a compliance benchmark implementation.
- It is not a drop-in change automation framework.
- It is not proof that these exact scripts ran in any production environment.
- It does not replace change review, peer review, backups, monitoring, or platform-specific runbooks.

## Repository Layout

- [infra-run](./infra-run/) - core operational tooling and automation.
- [platform-projects](./platform-projects/) - larger platform topics and case-study areas.
- [labs](./labs/) - experimental/lab environments and notes.
- [docs/codex](./docs/codex/) - guidance for future Codex-driven changes.
- [scripts](./scripts/) - lightweight repository validation helpers.

## Usable Now

- [infra-run](./infra-run/) - the main implemented project in this repository.
- [Linux healthcheck scripts](./infra-run/scripts/bash/os-healthcheck/) - host, disk, service, network, and report helpers.
- [Bash incident checks](./infra-run/scripts/bash/incident-checks/) - standalone read-only checks for common Linux incidents, designed for copy-to-server triage and ticket evidence.
- [Disk full workflow](./infra-run/scripts/bash/disk-full/) - triage scripts for usage, inode pressure, deleted open files, large files, log cleanup review, and postchecks.
- [Veritas examples](./infra-run/scripts/bash/veritas/) - dry-run-first VxVM/VCS storage expansion workflow examples.
- [GPFS examples](./infra-run/scripts/bash/gpfs/) - dry-run-first IBM Spectrum Scale expansion workflow examples.
- [Incident log summary](./infra-run/scripts/python/incident-log-summary/) - read-only Python helper for local incident log pattern summaries.
- [Log diff checker](./infra-run/scripts/python/log-diff-checker/) - read-only Python helper for before/after change log comparison.
- [Auth log audit](./infra-run/scripts/python/auth-log-audit/) - read-only Python helper for local authentication log review.
- [JVM log analyzer](./infra-run/scripts/python/jvm-log-analyzer/) - read-only Python helper for local JVM and Java application log review.
- [Journal analyzer](./infra-run/scripts/python/journal-analyzer/) - read-only Python helper for exported `journalctl` text review.
- [Known error matcher](./infra-run/scripts/python/known-error-matcher/) - read-only Python helper for matching logs against a JSON known-error catalog with runbook references.
- [Python operational log analysis tools](./infra-run/scripts/python/) - small standard-library helpers for local log summaries, before/after comparisons, and evidence reports.
- [Ansible hardening examples](./infra-run/ansible/) - selected Linux and AIX baseline hardening tasks organized as lab-safe roles.

## Planned Areas

The `labs` and `platform-projects` trees are intentionally thin. They are kept as planning areas for future lab notes and case studies, not as completed projects. Current planned topics are tracked in [ROADMAP.md](./ROADMAP.md).

## Documentation

### Production Operations

- [infra-run/docs/operations-cheatsheet.md](./infra-run/docs/operations-cheatsheet.md) - production-focused Linux/Unix operations reference for incident handling, validation, storage, networking, Ansible, observability, and safety-first change execution.

### Platform Engineering

- [platform-projects/docs/platform-cheatsheet.md](./platform-projects/docs/platform-cheatsheet.md) - platform operations reference for Kubernetes, Helm, containers, Terraform, CI/CD, observability, and GPU-backed infrastructure troubleshooting.

### Labs & Experiments

- [labs/docs/lab-cheatsheet.md](./labs/docs/lab-cheatsheet.md) - quick-reference scratchpad for K3s, Proxmox, Terraform, Docker, networking, and short-lived lab troubleshooting work.

### Codex and Review Guidance

- [AGENTS.md](./AGENTS.md) - repository rules for automated and assisted changes.
- [docs/codex/README.md](./docs/codex/README.md) - Codex workflow and expected final response format.
- [docs/codex/review-checklist.md](./docs/codex/review-checklist.md) - safety, Bash, Ansible, docs, and validation review checklist.
- [docs/codex/task-template.md](./docs/codex/task-template.md) - reusable scoped task templates.

## Safety-First Usage

Read scripts and playbooks before running them. Operational examples are sanitized and may need adaptation for a real system.

- Prefer read-only commands first.
- Use dry-run/check mode before execution.
- Treat `--execute` as a change-control boundary.
- Confirm backups, monitoring, application impact, and rollback steps before live use.
- Do not run platform-specific storage commands without a matching Veritas, GPFS, or AIX lab.

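
As a minimal sketch of the dry-run-first rule above (an illustration, not the repository's actual implementation), a Bash script could default to printing what it would do and only act when `--execute` is passed. The function names `parse_args` and `run_change` are hypothetical and exist only for this example.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: dry-run by default, change only with --execute.

EXECUTE=0   # 0 = dry-run (default), 1 = really run the command

# Parse the command line; --execute is the only flag that enables changes.
parse_args() {
  EXECUTE=0
  local arg
  for arg in "$@"; do
    case "$arg" in
      --execute) EXECUTE=1 ;;
      *)
        printf 'unknown option: %s\n' "$arg" >&2
        return 2
        ;;
    esac
  done
}

# Print the command in dry-run mode; print and run it in execute mode.
run_change() {
  if [ "$EXECUTE" -eq 1 ]; then
    printf 'EXECUTE: %s\n' "$*"
    "$@"
  else
    printf 'DRY-RUN: %s\n' "$*"
  fi
}
```

With these functions defined, calling `parse_args "$@"` and then `run_change systemctl restart example.service` would only print a `DRY-RUN:` line unless the caller passed `--execute`. The service name is a placeholder.
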
## Validation

Basic local validation:

```bash
./scripts/validate-repo.sh
./scripts/check-bash.sh
./scripts/check-ansible.sh
./scripts/check-python.sh
./scripts/check-docs.sh
```

The validation helpers run required lightweight checks and use optional tools such as `shellcheck`, `yamllint`, `ansible-playbook`, `ansible-lint`, and `markdownlint` when available. Python checks use `python3 -m py_compile` and do not require external Python tooling. Set `STRICT=1` to fail when optional tools are missing.

Some scripts depend on platform tools such as `vxdisk`, `hagrp`, `mmcrnsd`, and `mmlscluster`. Those commands are not expected to exist on a normal workstation, so functional testing against Veritas or GPFS requires a real lab environment.

See [infra-run/TESTED.md](./infra-run/TESTED.md) and [infra-run/KNOWN_LIMITATIONS.md](./infra-run/KNOWN_LIMITATIONS.md) for the current validation status.

## Operational Areas Demonstrated

- Linux operations triage and reporting.
- Local operational log analysis with read-only Python helpers.
- Disk pressure and deleted-file incident analysis.
- Dry-run-first Bash automation.
- Controlled storage change workflow design.
- Veritas VxVM/VCS operational awareness.
- GPFS / IBM Spectrum Scale operational awareness.
- Ansible role organization for selected hardening controls.
- Clear documentation of what was tested and what still needs a real system.

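
The optional-tool behavior described under Validation (skip a missing tool by default, fail when `STRICT=1`) could look roughly like the sketch below. This is an assumption about the general shape, not a copy of the repository's helpers; the function name `run_optional` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: run an optional tool when installed; otherwise
# skip it, or fail when STRICT=1 is set in the environment.

run_optional() {
  local tool="$1"
  shift

  # Tool is available: run it and propagate its exit status.
  if command -v "$tool" >/dev/null 2>&1; then
    "$tool" "$@"
    return
  fi

  # Tool is missing: STRICT=1 turns the skip into a failure.
  if [ "${STRICT:-0}" = "1" ]; then
    printf 'FAIL: optional tool missing: %s\n' "$tool" >&2
    return 1
  fi

  printf 'SKIP: optional tool missing: %s\n' "$tool" >&2
  return 0
}
```

A call such as `run_optional shellcheck path/to/script.sh` would then lint the file where `shellcheck` is installed, print a `SKIP:` line where it is not, and return non-zero under `STRICT=1`. The script path is a placeholder.
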