Files
portfolio/README.md
T
Mateusz Suski e851568c8c
lint / shell-yaml-ansible (push) Failing after 16s
Add standalone Bash incident check scripts
2026-05-11 18:49:00 +00:00

110 lines
6.7 KiB
Markdown

# Linux/Unix Infrastructure Engineering Portfolio
This repository contains sanitized infrastructure automation examples based on Linux/Unix operations and infrastructure workflows. The focus is on incident response, troubleshooting, pre-checks, dry-run behavior, controlled execution, post-checks, and readable operational evidence.
It is a technical portfolio, not a production toolkit. The examples show how operational work is structured: understand the current state, make changes only with explicit controls, verify the result, and leave enough evidence for review.
## What This Repo Is
- Practical Linux/Unix operations examples.
- Safe Bash and Ansible patterns for lab and review.
- Runbook-driven examples for incident response, storage operations, hardening, and observability.
- A place for platform and lab topics to grow without pretending unfinished areas are complete.
## What This Repo Is Not
- It is not a compliance benchmark implementation.
- It is not a drop-in change automation framework.
- It is not proof that these exact scripts ran in any production environment.
- It does not replace change review, peer review, backups, monitoring, or platform-specific runbooks.
## Repository Layout
- [infra-run](./infra-run/) - core operational tooling and automation.
- [platform-projects](./platform-projects/) - larger platform topics and case-study areas.
- [labs](./labs/) - experimental/lab environments and notes.
- [docs/codex](./docs/codex/) - guidance for future Codex-driven changes.
- [scripts](./scripts/) - lightweight repository validation helpers.
## Usable Now
- [infra-run](./infra-run/) - the main implemented project in this repository.
- [Linux healthcheck scripts](./infra-run/scripts/bash/os-healthcheck/) - host, disk, service, network, and report helpers.
- [Bash incident checks](./infra-run/scripts/bash/incident-checks/) - standalone read-only checks for common Linux incidents, designed for copy-to-server triage and ticket evidence.
- [Disk full workflow](./infra-run/scripts/bash/disk-full/) - triage scripts for usage, inode pressure, deleted open files, large files, log cleanup review, and postchecks.
- [Veritas examples](./infra-run/scripts/bash/veritas/) - dry-run-first VxVM/VCS storage expansion workflow examples.
- [GPFS examples](./infra-run/scripts/bash/gpfs/) - dry-run-first IBM Spectrum Scale expansion workflow examples.
- [Incident log summary](./infra-run/scripts/python/incident-log-summary/) - read-only Python helper for local incident log pattern summaries.
- [Log diff checker](./infra-run/scripts/python/log-diff-checker/) - read-only Python helper for before/after change log comparison.
- [Auth log audit](./infra-run/scripts/python/auth-log-audit/) - read-only Python helper for local authentication log review.
- [JVM log analyzer](./infra-run/scripts/python/jvm-log-analyzer/) - read-only Python helper for local JVM and Java application log review.
- [Journal analyzer](./infra-run/scripts/python/journal-analyzer/) - read-only Python helper for exported `journalctl` text review.
- [Known error matcher](./infra-run/scripts/python/known-error-matcher/) - read-only Python helper for matching logs against a JSON known-error catalog with runbook references.
- [Python operational log analysis tools](./infra-run/scripts/python/) - small standard-library helpers for local log summaries, before/after comparisons, and evidence reports.
- [Ansible hardening examples](./infra-run/ansible/) - selected Linux and AIX baseline hardening tasks organized as lab-safe roles.
## Planned Areas
The `labs` and `platform-projects` trees are intentionally thin. They are kept as planning areas for future lab notes and case studies, not as completed projects. Current planned topics are tracked in [ROADMAP.md](./ROADMAP.md).
## Documentation
### Production Operations
- [infra-run/docs/operations-cheatsheet.md](./infra-run/docs/operations-cheatsheet.md) - production-focused Linux/Unix operations reference for incident handling, validation, storage, networking, Ansible, observability, and safety-first change execution.
### Platform Engineering
- [platform-projects/docs/platform-cheatsheet.md](./platform-projects/docs/platform-cheatsheet.md) - platform operations reference for Kubernetes, Helm, containers, Terraform, CI/CD, observability, and GPU-backed infrastructure troubleshooting.
### Labs & Experiments
- [labs/docs/lab-cheatsheet.md](./labs/docs/lab-cheatsheet.md) - quick-reference scratchpad for K3s, Proxmox, Terraform, Docker, networking, and short-lived lab troubleshooting work.
### Codex and Review Guidance
- [AGENTS.md](./AGENTS.md) - repository rules for automated and assisted changes.
- [docs/codex/README.md](./docs/codex/README.md) - Codex workflow and expected final response format.
- [docs/codex/review-checklist.md](./docs/codex/review-checklist.md) - safety, Bash, Ansible, docs, and validation review checklist.
- [docs/codex/task-template.md](./docs/codex/task-template.md) - reusable scoped task templates.
## Safety-First Usage
Read scripts and playbooks before running them. Operational examples are sanitized and may need adaptation for a real system.
- Prefer read-only commands first.
- Use dry-run/check mode before execution.
- Treat `--execute` as a change-control boundary.
- Confirm backups, monitoring, application impact, and rollback steps before live use.
- Do not run platform-specific storage commands without a matching Veritas, GPFS, or AIX lab.
## Validation
Basic local validation:
```bash
./scripts/validate-repo.sh
./scripts/check-bash.sh
./scripts/check-ansible.sh
./scripts/check-python.sh
./scripts/check-docs.sh
```
The validation helpers run required lightweight checks and use optional tools such as `shellcheck`, `yamllint`, `ansible-playbook`, `ansible-lint`, and `markdownlint` when available. Python checks use `python3 -m py_compile` and do not require external Python tooling. Set `STRICT=1` to fail when optional tools are missing.
Some scripts depend on platform tools such as `vxdisk`, `hagrp`, `mmcrnsd`, and `mmlscluster`. Those commands are not expected to exist on a normal workstation, so functional testing against Veritas or GPFS requires a real lab environment.
See [infra-run/TESTED.md](./infra-run/TESTED.md) and [infra-run/KNOWN_LIMITATIONS.md](./infra-run/KNOWN_LIMITATIONS.md) for the current validation status.
## Operational Areas Demonstrated
- Linux operations triage and reporting.
- Local operational log analysis with read-only Python helpers.
- Disk pressure and deleted-file incident analysis.
- Dry-run-first Bash automation.
- Controlled storage change workflow design.
- Veritas VxVM/VCS operational awareness.
- GPFS / IBM Spectrum Scale operational awareness.
- Ansible role organization for selected hardening controls.
- Clear documentation of what was tested and what still needs a real system.