# Linux/Unix Infrastructure Engineering Portfolio

This repository contains sanitized infrastructure automation examples based on Linux/Unix operations and infrastructure workflows. The focus is on incident response, troubleshooting, pre-checks, dry-run behavior, controlled execution, post-checks, and readable operational evidence.

It is a technical portfolio, not a production toolkit. The examples show how operational work is structured: understand the current state, make changes only with explicit controls, verify the result, and leave enough evidence for review.

## What This Repo Is

- Practical Linux/Unix operations examples.
- Safe Bash and Ansible patterns for lab and review.
- Runbook-driven examples for incident response, storage operations, hardening, and observability.
- A place for platform and lab topics to grow without pretending unfinished areas are complete.

## What This Repo Is Not

- It is not a compliance benchmark implementation.
- It is not a drop-in change automation framework.
- It is not proof that these exact scripts ran in any production environment.
- It does not replace change review, peer review, backups, monitoring, or platform-specific runbooks.

## Repository Layout

- [infra-run](./infra-run/) - core operational tooling and automation.
- [platform-projects](./platform-projects/) - larger platform topics and case-study areas.
- [labs](./labs/) - experimental/lab environments and notes.
- [docs/codex](./docs/codex/) - guidance for future Codex-driven changes.
- [scripts](./scripts/) - lightweight repository validation helpers.

## Usable Now

- [infra-run](./infra-run/) - the main implemented project in this repository.
- [Linux healthcheck scripts](./infra-run/scripts/bash/os-healthcheck/) - host, disk, service, network, and report helpers.
- [Disk full workflow](./infra-run/scripts/bash/disk-full/) - triage scripts for usage, inode pressure, deleted open files, large files, log cleanup review, and postchecks.
- [Veritas examples](./infra-run/scripts/bash/veritas/) - dry-run-first VxVM/VCS storage expansion workflow examples.
- [GPFS examples](./infra-run/scripts/bash/gpfs/) - dry-run-first IBM Spectrum Scale expansion workflow examples.
- [Ansible hardening examples](./infra-run/ansible/) - selected Linux and AIX baseline hardening tasks organized as lab-safe roles.
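
As a rough illustration of the read-only triage that the disk-full workflow covers, the sketch below checks block usage, inode pressure, and deleted-but-open files. The function names are hypothetical and are not the scripts under `infra-run/scripts/bash/disk-full/`. The last helper assumes `lsof` is installed.

```bash
# Sketch only: hypothetical read-only helpers, none of which modify the filesystem.

# Percentage of blocks used on the filesystem that holds path $1.
# df -P keeps each filesystem on a single line, so field 5 is the "Capacity" column.
block_usage_pct() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Percentage of inodes used on the filesystem that holds path $1.
# Some filesystems do not report inode counts and print "-" here.
inode_usage_pct() {
    df -Pi "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Open files whose link count is below 1, i.e. deleted but still held open.
# Usually needs root to see every process.
deleted_open_files() {
    lsof +L1 2>/dev/null
}
```

A filesystem can be full on inodes while block usage looks healthy, so both percentages are worth checking before looking for deleted files that still hold space.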

## Planned Areas

The `labs` and `platform-projects` trees are intentionally thin. They are kept as planning areas for future lab notes and case studies, not as completed projects. Current planned topics are tracked in [ROADMAP.md](./ROADMAP.md).

## Documentation

### Production Operations

- [infra-run/docs/operations-cheatsheet.md](./infra-run/docs/operations-cheatsheet.md) - production-focused Linux/Unix operations reference for incident handling, validation, storage, networking, Ansible, observability, and safety-first change execution.

### Platform Engineering

- [platform-projects/docs/platform-cheatsheet.md](./platform-projects/docs/platform-cheatsheet.md) - platform operations reference for Kubernetes, Helm, containers, Terraform, CI/CD, observability, and GPU-backed infrastructure troubleshooting.

### Labs & Experiments

- [labs/docs/lab-cheatsheet.md](./labs/docs/lab-cheatsheet.md) - quick-reference scratchpad for K3s, Proxmox, Terraform, Docker, networking, and short-lived lab troubleshooting work.

### Codex and Review Guidance

- [AGENTS.md](./AGENTS.md) - repository rules for automated and assisted changes.
- [docs/codex/README.md](./docs/codex/README.md) - Codex workflow and expected final response format.
- [docs/codex/review-checklist.md](./docs/codex/review-checklist.md) - safety, Bash, Ansible, docs, and validation review checklist.
- [docs/codex/task-template.md](./docs/codex/task-template.md) - reusable scoped task templates.

## Safety-First Usage

Read scripts and playbooks before running them. Operational examples are sanitized and may need adaptation for a real system.

- Prefer read-only commands first.
- Use dry-run/check mode before execution.
- Treat `--execute` as a change-control boundary.
- Confirm backups, monitoring, application impact, and rollback steps before live use.
- Do not run platform-specific storage commands without a matching Veritas, GPFS, or AIX lab.
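
The dry-run-first rule above can be sketched as follows. This is a minimal illustration, not the code used in this repository: `run_cmd`, `parse_args`, and the `EXECUTE` variable are hypothetical names, and the only behavior taken from this README is that `--execute` is the switch that allows changes.

```bash
#!/usr/bin/env bash
# Sketch only: run_cmd, parse_args, and EXECUTE are hypothetical names.
set -eu

EXECUTE=0   # dry-run unless --execute is passed

# Only --execute crosses the change-control boundary; anything else is rejected.
parse_args() {
    for arg in "$@"; do
        case "$arg" in
            --execute) EXECUTE=1 ;;
            *) printf 'unknown option: %s\n' "$arg" >&2; return 2 ;;
        esac
    done
}

# Print the command in dry-run mode; print it and run it when EXECUTE=1.
run_cmd() {
    if [ "$EXECUTE" -eq 1 ]; then
        printf 'EXEC: %s\n' "$*"
        "$@"
    else
        printf 'DRY-RUN: %s\n' "$*"
    fi
}
```

A script built this way would call `parse_args "$@"` first and then route every changing command through `run_cmd`. Without `--execute` it only prints what it would do, which also leaves a reviewable record of the planned commands.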

## Validation

Basic local validation:

```bash
./scripts/validate-repo.sh
./scripts/check-bash.sh
./scripts/check-ansible.sh
./scripts/check-docs.sh
```

The validation helpers run required lightweight checks and use optional tools such as `shellcheck`, `yamllint`, `ansible-playbook`, `ansible-lint`, and `markdownlint` when available. Set `STRICT=1` to fail when optional tools are missing.
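
One way the optional-tool behavior could work is sketched below. The `run_optional` function is a hypothetical helper, not the code in `./scripts/`. It assumes only what is stated above: a missing optional tool is skipped by default and becomes a failure when `STRICT=1`.

```bash
# Sketch only: run_optional is a hypothetical helper name.
STRICT="${STRICT:-0}"

# Run "$@" if its first word is an available command.
# Otherwise skip it, or fail when STRICT=1.
run_optional() {
    local tool="$1"
    if command -v "$tool" >/dev/null 2>&1; then
        "$@"
        return
    fi
    if [ "$STRICT" = "1" ]; then
        printf 'FAIL: %s not found (STRICT=1)\n' "$tool" >&2
        return 1
    fi
    printf 'SKIP: %s not found\n' "$tool"
    return 0
}
```

With this shape, a call such as `run_optional shellcheck some-script.sh` (the file name is a placeholder) passes quietly on a workstation without `shellcheck`, while a CI job that exports `STRICT=1` treats the missing tool as an error.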

Some scripts depend on platform tools such as `vxdisk`, `hagrp`, `mmcrnsd`, and `mmlscluster`. Those commands are not expected to exist on a normal workstation, so functional testing against Veritas or GPFS requires a real lab environment.
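
A pre-check along these lines could make the dependency explicit before any functional test starts. The `missing_tools` and `require_tools` functions are hypothetical names used for illustration, and the sketch only tests whether the commands are on `PATH`. It does not verify that a working Veritas or GPFS cluster is behind them.

```bash
# Sketch only: missing_tools and require_tools are hypothetical helper names.

# Print each command from "$@" that is not available on PATH, one per line.
missing_tools() {
    local tool
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Return non-zero and list the gaps on stderr if any required command is missing.
require_tools() {
    local missing
    missing="$(missing_tools "$@")"
    if [ -n "$missing" ]; then
        printf 'Missing platform tools, skipping functional test:\n%s\n' "$missing" >&2
        return 1
    fi
}
```

For example, `require_tools vxdisk hagrp || exit 0` at the top of a Veritas functional test would let the test exit cleanly on a workstation instead of failing partway through.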

See [infra-run/TESTED.md](./infra-run/TESTED.md) and [infra-run/KNOWN_LIMITATIONS.md](./infra-run/KNOWN_LIMITATIONS.md) for the current validation status.

## Operational Areas Demonstrated

- Linux operations triage and reporting.
- Disk pressure and deleted-file incident analysis.
- Dry-run-first Bash automation.
- Controlled storage change workflow design.
- Veritas VxVM/VCS operational awareness.
- GPFS / IBM Spectrum Scale operational awareness.
- Ansible role organization for selected hardening controls.
- Clear documentation of what was tested and what still needs a real system.