diff --git a/AGENTS.md b/AGENTS.md
new file mode 100644
index 0000000..9e9218e
--- /dev/null
+++ b/AGENTS.md
@@ -0,0 +1,116 @@
+# AGENTS.md
+
+Guidance for Codex and other automated agents working in this repository.
+
+## Purpose
+
+This repository is a Linux/Unix infrastructure engineering portfolio. It shows practical operational work: incident response, troubleshooting, safe Bash tooling, Ansible hardening examples, storage workflows, runbooks, and platform/lab notes.
+
+Treat it like internal operations tooling maintained by an infrastructure engineer. Preserve operational realism and avoid generic tutorial or template filler.
+
+## Layout
+
+- `infra-run/` - core operational tooling, Ansible, Bash scripts, runbooks, examples, and operations docs.
+- `platform-projects/` - larger platform topics such as monitoring, storage, clustering, virtualization, and observability.
+- `labs/` - experimental/lab environments for Kubernetes, Terraform, networking, CI/CD, Docker, and related work.
+- `docs/codex/` - Codex workflow guidance, task templates, review checklist, and planning template.
+- `scripts/` - repository validation helpers.
+
+## Inspect First
+
+Before editing, inspect the affected tree and nearby README files. Prefer:
+
+```bash
+rg --files
+git status --short
+sed -n '1,220p'
+```
+
+Check existing style before introducing new structure. Keep changes small and reviewable.
+
+## Validation
+
+Run the broad repo check when practical:
+
+```bash
+./scripts/validate-repo.sh
+```
+
+Focused checks:
+
+```bash
+./scripts/check-bash.sh
+./scripts/check-ansible.sh
+./scripts/check-docs.sh
+```
+
+Optional strict mode fails when optional tools are missing:
+
+```bash
+STRICT=1 ./scripts/validate-repo.sh
+```
+
+Also run targeted checks for changed files, such as `bash -n`, `ansible-playbook --syntax-check`, or link checks when relevant.
+
+## Bash Standards
+
+- Use `#!/usr/bin/env bash`.
+- Use `set -o errexit`, `set -o nounset`, and `set -o pipefail`.
+- Validate input before using it.
+- Handle missing commands clearly.
+- Default to read-only or dry-run behavior.
+- Require explicit `--execute` plus confirmation for destructive operations.
+- Use clear `OK`, `WARNING`, and `CRITICAL` output.
+- Exit codes: `0` OK, `1` operational issue, `2` invalid input or missing dependency.
+- Keep scripts readable; separate discovery, pre-check, change, post-check, and reporting when it helps.
+
+## Ansible Standards
+
+- Keep playbooks short and roles simple.
+- Prefer modules over `shell` or `command`.
+- Use `shell` or `command` only when the module set cannot express the operation, and document why if risk is not obvious.
+- Preserve check-mode and diff-mode friendliness where possible.
+- Use handlers, tags, defaults, and validation tasks when they clarify operations.
+- Keep inventory under `inventory/hosts.yml`, `group_vars/`, and `host_vars/`.
+- Do not present selected hardening examples as complete compliance certification.
+
+## Documentation Standards
+
+- Explain what exists, what is planned, and what is intentionally not supported.
+- Prefer runbook style: scope, pre-checks, execution guardrails, rollback thinking, post-checks, and evidence.
+- Avoid marketing language, fake enterprise wording, and tutorial bloat.
+- Update README files and `CHANGELOG.md` when adding meaningful behavior or structure.
+
+## Safety Rules
+
+- Do not run destructive commands.
+- Do not rename large directories unless the benefit is clear and low-risk.
+- Do not hide validation failures.
+- Do not claim live production validation for sanitized examples.
+- Do not add secrets, real hostnames, customer identifiers, or private infrastructure details.
+- Do not turn placeholders into fake completed projects.
+
+## PR and Review Expectations
+
+- State the operational risk of the change.
+- Include commands run and whether tools were missing.
+- Review scripts for dry-run behavior, input validation, dependency handling, and rollback path.
+- Review Ansible for idempotency, check-mode behavior, inventory targeting, tags, handlers, and module choice.
+- Keep diffs focused.
+
+## Definition of Done
+
+- The change preserves the repository intent.
+- Relevant docs are updated.
+- Changed Bash scripts pass `bash -n`.
+- Available validation helpers were run.
+- Missing optional tools are reported.
+- Any remaining risk or follow-up is documented.
+
+## Do Not
+
+- Do not add an "ultimate DevOps template" structure.
+- Do not replace working simple Bash with unnecessary abstractions.
+- Do not make examples appear production-certified.
+- Do not add destructive behavior without `--execute`, confirmation, and clear rollback notes.
+- Do not delete useful content unless it is clearly duplicate, broken, or misleading.
diff --git a/CHANGELOG.md b/CHANGELOG.md
index 5bb9d5c..9435e14 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -4,6 +4,17 @@
 
 ### Added
 
+- Repository-level Codex guidance:
+  - `AGENTS.md`
+  - `docs/codex/README.md`
+  - `docs/codex/review-checklist.md`
+  - `docs/codex/task-template.md`
+  - `docs/codex/plans-template.md`
+- Lightweight validation helpers:
+  - `scripts/validate-repo.sh`
+  - `scripts/check-bash.sh`
+  - `scripts/check-ansible.sh`
+  - `scripts/check-docs.sh`
 - Cross-repository operational documentation structure:
   - `infra-run/docs/operations-cheatsheet.md`
   - `platform-projects/docs/platform-cheatsheet.md`
@@ -19,6 +30,7 @@
 
 ### Changed
 
+- Updated root, `infra-run`, Bash, Ansible, platform, and lab README guidance for safety-first usage, validation, and future Codex-driven work.
 - Updated repository and `infra-run` README files to surface the new documentation structure and operational cheatsheets.
 - Updated repository, `infra-run`, and Ansible README files to describe the new hardening automation instead of placeholder-only Ansible structure.
 
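
The Bash standards in AGENTS.md call for dry-run by default, explicit `--execute` plus confirmation, `OK`/`WARNING`/`CRITICAL` output, and exit codes `0`/`1`/`2`. A small shared helper file can illustrate those conventions. This is a minimal sketch and does not exist in this repository. `ops-common.sh`, the function names, and the `EXECUTE` variable are hypothetical. A real script would still own its argument parsing and set `EXECUTE=1` only after it sees `--execute`.

```shell
#!/usr/bin/env bash
# ops-common.sh (hypothetical): shared helpers for the AGENTS.md Bash standards.
# Meant to be sourced by an operational script; not an existing repo file.
set -o errexit
set -o nounset
set -o pipefail

# Exit-code convention: 0 OK, 1 operational issue, 2 invalid input/dependency.
readonly EXIT_ISSUE=1
readonly EXIT_INVALID=2

# Dry-run unless the caller sets EXECUTE=1 after parsing an explicit --execute.
EXECUTE="${EXECUTE:-0}"

log_ok()       { printf 'OK: %s\n' "$*"; }
log_warning()  { printf 'WARNING: %s\n' "$*"; }
log_critical() { printf 'CRITICAL: %s\n' "$*" >&2; }

# Return 2 with a clear message when a required command is missing.
require_cmd() {
  local cmd
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      log_critical "required command not found: ${cmd}"
      return "$EXIT_INVALID"
    fi
  done
}

# Accept an integer percentage from 1 to 100 (no leading zeros, no signs).
validate_percent() {
  local value="${1:-}"
  local re='^[1-9][0-9]{0,2}$'
  if [[ ! "$value" =~ $re ]] || (( value > 100 )); then
    log_critical "invalid percentage: '${value}'"
    return "$EXIT_INVALID"
  fi
}

# Print the shell-quoted command in dry-run mode; run it only when EXECUTE=1.
run_change() {
  if [[ "$EXECUTE" != "1" ]]; then
    printf 'DRY-RUN:'
    printf ' %q' "$@"
    printf '\n'
    return 0
  fi
  "$@"
}

# In execute mode, require the operator to type "yes" on stdin.
# Returns 1 (operational issue) when not confirmed; callers should stop.
confirm_execute() {
  local prompt="${1:-Proceed with live change?}"
  local reply=""
  [[ "$EXECUTE" == "1" ]] || return 0
  printf '%s Type yes to continue: ' "$prompt" >&2
  read -r reply || true
  if [[ "$reply" != "yes" ]]; then
    log_warning "operator did not confirm; no changes made"
    return "$EXIT_ISSUE"
  fi
}
```

A calling script would source this file and parse its own flags. It would run `require_cmd` and input validation before discovery, and call `confirm_execute` once before the change section. Wrapping each state-changing command in `run_change` keeps the dry-run plan and the live change on the same code path.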
diff --git a/README.md b/README.md
index 6c7c737..026fc5c 100644
--- a/README.md
+++ b/README.md
@@ -1,19 +1,41 @@
-# Portfolio
+# Linux/Unix Infrastructure Engineering Portfolio
 
-This repository contains sanitized infrastructure automation examples based on Linux operations and enterprise infrastructure workflows. The focus is on precheck, dry-run, controlled execution, postcheck, troubleshooting, and clear operational reporting.
+This repository contains sanitized infrastructure automation examples based on Linux/Unix operations and infrastructure workflows. The focus is on incident response, troubleshooting, pre-checks, dry-run behavior, controlled execution, post-checks, and readable operational evidence.
 
-It is a technical portfolio, not a production toolkit. The examples are meant to show how I structure operational work: understand the current state, make changes only with explicit controls, verify the result, and leave readable evidence for review.
+It is a technical portfolio, not a production toolkit. The examples show how operational work is structured: understand the current state, make changes only with explicit controls, verify the result, and leave enough evidence for review.
 
-## What Is Usable Now
+## What This Repo Is
 
-- [infra-run](./infra-run/) - the main project in this repository.
+- Practical Linux/Unix operations examples.
+- Safe Bash and Ansible patterns for lab and review.
+- Runbook-driven examples for incident response, storage operations, hardening, and observability.
+- A place for platform and lab topics to grow without pretending unfinished areas are complete.
+
+## What This Repo Is Not
+
+- It is not a compliance benchmark implementation.
+- It is not a drop-in change automation framework.
+- It is not proof that these exact scripts ran in any production environment.
+- It does not replace change review, peer review, backups, monitoring, or platform-specific runbooks.
+
+## Repository Layout
+
+- [infra-run](./infra-run/) - core operational tooling and automation.
+- [platform-projects](./platform-projects/) - larger platform topics and case-study areas.
+- [labs](./labs/) - experimental/lab environments and notes.
+- [docs/codex](./docs/codex/) - guidance for future Codex-driven changes.
+- [scripts](./scripts/) - lightweight repository validation helpers.
+
+## Usable Now
+
+- [infra-run](./infra-run/) - the main implemented project in this repository.
 - [Linux healthcheck scripts](./infra-run/scripts/bash/os-healthcheck/) - host, disk, service, network, and report helpers.
 - [Disk full workflow](./infra-run/scripts/bash/disk-full/) - triage scripts for usage, inode pressure, deleted open files, large files, log cleanup review, and postchecks.
 - [Veritas examples](./infra-run/scripts/bash/veritas/) - dry-run-first VxVM/VCS storage expansion workflow examples.
 - [GPFS examples](./infra-run/scripts/bash/gpfs/) - dry-run-first IBM Spectrum Scale expansion workflow examples.
 - [Ansible hardening examples](./infra-run/ansible/) - selected Linux and AIX baseline hardening tasks organized as lab-safe roles.
 
-## What Is Planned
+## Planned Areas
 
 The `labs` and `platform-projects` trees are intentionally thin. They are kept as planning areas for future lab notes and case studies, not as completed projects. Current planned topics are tracked in [ROADMAP.md](./ROADMAP.md).
 
@@ -31,28 +53,41 @@ The `labs` and `platform-projects` trees are intentionally thin. They are kept a
 
 - [labs/docs/lab-cheatsheet.md](./labs/docs/lab-cheatsheet.md) - quick-reference scratchpad for K3s, Proxmox, Terraform, Docker, networking, and short-lived lab troubleshooting work.
 
-## What This Repo Is Not
+### Codex and Review Guidance
 
-- It is not a compliance benchmark implementation.
-- It is not a drop-in change automation framework.
-- It is not proof that these exact scripts ran in any production environment.
-- It does not replace change review, peer review, backups, monitoring, or platform-specific runbooks.
+- [AGENTS.md](./AGENTS.md) - repository rules for automated and assisted changes.
+- [docs/codex/README.md](./docs/codex/README.md) - Codex workflow and expected final response format.
+- [docs/codex/review-checklist.md](./docs/codex/review-checklist.md) - safety, Bash, Ansible, docs, and validation review checklist.
+- [docs/codex/task-template.md](./docs/codex/task-template.md) - reusable scoped task templates.
+
+## Safety-First Usage
+
+Read scripts and playbooks before running them. Operational examples are sanitized and may need adaptation for a real system.
+
+- Prefer read-only commands first.
+- Use dry-run/check mode before execution.
+- Treat `--execute` as a change-control boundary.
+- Confirm backups, monitoring, application impact, and rollback steps before live use.
+- Do not run platform-specific storage commands without a matching Veritas, GPFS, or AIX lab.
 
 ## Validation
 
 Basic local validation:
 
 ```bash
-find infra-run/scripts/bash -name '*.sh' -print0 | xargs -0 shellcheck -x -P infra-run/scripts/bash/disk-full -P infra-run/scripts/bash/gpfs -P infra-run/scripts/bash/veritas
-yamllint .
-cd infra-run/ansible && ansible-lint playbooks roles
+./scripts/validate-repo.sh
+./scripts/check-bash.sh
+./scripts/check-ansible.sh
+./scripts/check-docs.sh
 ```
 
+The validation helpers run required lightweight checks and use optional tools such as `shellcheck`, `yamllint`, `ansible-playbook`, `ansible-lint`, and `markdownlint` when available. Set `STRICT=1` to fail when optional tools are missing.
+
 Some scripts depend on platform tools such as `vxdisk`, `hagrp`, `mmcrnsd`, and `mmlscluster`. Those commands are not expected to exist on a normal workstation, so functional testing against Veritas or GPFS requires a real lab environment.
 See [infra-run/TESTED.md](./infra-run/TESTED.md) and [infra-run/KNOWN_LIMITATIONS.md](./infra-run/KNOWN_LIMITATIONS.md) for the current validation status.
 
-## Skills Demonstrated
+## Operational Areas Demonstrated
 
 - Linux operations triage and reporting.
 - Disk pressure and deleted-file incident analysis.
diff --git a/docs/codex/README.md b/docs/codex/README.md
new file mode 100644
index 0000000..2838993
--- /dev/null
+++ b/docs/codex/README.md
@@ -0,0 +1,55 @@
+# Codex Workflow
+
+This directory keeps future Codex sessions consistent when working in this infrastructure portfolio.
+
+## How To Start
+
+1. Read [AGENTS.md](../../AGENTS.md).
+2. Inspect the affected tree and nearby README files.
+3. Check `git status --short` so existing user work is preserved.
+4. Decide whether a plan is needed before editing.
+5. Make small, reviewable changes.
+6. Run focused validation plus `./scripts/validate-repo.sh` when practical.
+
+## When To Plan First
+
+Plan before editing when a task touches more than one subsystem, changes operational behavior, adds or modifies destructive actions, changes Ansible targeting, or updates repository conventions.
+
+For small typo fixes, narrow README updates, or obvious syntax fixes, inspect first and then make the change directly.
+
+Use [plans-template.md](./plans-template.md) for larger changes.
+
+## Scoped Tasks
+
+Good tasks name the operational goal, affected directories, constraints, validation commands, and what "done" means. Use [task-template.md](./task-template.md) for reusable prompts.
+
+Keep scope tied to real operations:
+
+- Bash tool: discovery, pre-check, dry-run, execute, post-check, report.
+- Ansible change: inventory target, role/playbook scope, check mode, idempotency, validation.
+- Runbook: incident signal, triage, decision points, rollback, evidence.
+- Lab/platform project: status, prerequisites, validation, limitations.
+
+## Validation
+
+Prefer the repository helpers:
+
+```bash
+./scripts/check-bash.sh
+./scripts/check-ansible.sh
+./scripts/check-docs.sh
+./scripts/validate-repo.sh
+```
+
+If optional tools are missing, report that clearly and continue with available checks. Do not claim skipped checks passed.
+
+## Final Response Format
+
+End with:
+
+1. Summary of what changed.
+2. Files created or modified.
+3. Validation commands run and results.
+4. Skipped checks and why.
+5. Risks or follow-ups.
+6. Whether the repo is ready for future Codex-driven work.
diff --git a/docs/codex/plans-template.md b/docs/codex/plans-template.md
new file mode 100644
index 0000000..dc35cc0
--- /dev/null
+++ b/docs/codex/plans-template.md
@@ -0,0 +1,35 @@
+# Implementation Plan Template
+
+Use this for changes that touch multiple files, alter operational behavior, or add new repository conventions.
+
+## Goal
+
+State the operational or maintenance outcome.
+
+## Current State
+
+Summarize the directories and conventions inspected.
+
+## Scope
+
+List files or directories expected to change.
+
+## Non-Goals
+
+Name what will not be redesigned, renamed, deleted, or claimed as complete.
+
+## Plan
+
+1. Inspect relevant scripts, playbooks, docs, and examples.
+2. Make the smallest structural or documentation changes needed.
+3. Update validation or runbook guidance.
+4. Run focused checks.
+5. Summarize residual risk and follow-ups.
+
+## Validation
+
+List commands to run, including fallback behavior for missing tools.
+
+## Risks
+
+Call out destructive operations, platform assumptions, missing lab environments, or checks that require real systems.
diff --git a/docs/codex/review-checklist.md b/docs/codex/review-checklist.md
new file mode 100644
index 0000000..7c771c6
--- /dev/null
+++ b/docs/codex/review-checklist.md
@@ -0,0 +1,52 @@
+# Review Checklist
+
+Use this checklist for repository reviews and pull requests.
+
+## Safety
+
+- Destructive actions default to dry-run or read-only.
+- Real changes require explicit `--execute` and operator confirmation.
+- Inputs are validated before use.
+- Paths, service names, disks, volumes, and inventory targets are constrained.
+- Rollback or recovery thinking is documented where the operation can change state.
+
+## Bash
+
+- Uses `#!/usr/bin/env bash`.
+- Uses `set -o errexit`, `set -o nounset`, and `set -o pipefail`.
+- Missing commands return a clear warning or invalid-input/dependency exit.
+- Output uses `OK`, `WARNING`, and `CRITICAL` consistently.
+- Exit codes follow repo convention: `0` OK, `1` operational issue, `2` invalid input or missing dependency.
+- Help output exists for scripts that accept arguments.
+
+## Ansible
+
+- Target hosts are explicit and appropriate for the role.
+- Modules are preferred over `shell` or `command`.
+- Check mode and diff mode are considered.
+- Tasks are idempotent or clearly documented when a check is inherently read-only or platform-specific.
+- Handlers, tags, defaults, and validation tasks are used where useful.
+- Inventory, vars, and role defaults do not contain secrets or real environment data.
+
+## Documentation
+
+- README files explain current state without overstating completeness.
+- Runbooks include scope, pre-checks, execution controls, post-checks, and evidence.
+- Docs avoid tutorial filler and fake enterprise complexity.
+- Important limitations are linked or documented.
+- `CHANGELOG.md` is updated for meaningful repo changes.
+
+## Operational Realism
+
+- The change reflects RHEL/Oracle Linux, Debian/Ubuntu, AIX, Veritas, GPFS, Zabbix, ELK, Docker, Kubernetes/K3s, Terraform, VMware, or Proxmox operations accurately.
+- Examples remain sanitized.
+- Placeholder projects are identified as placeholders.
+- There is no unnecessary abstraction or invented complexity.
+
+## Validation
+
+- Changed Bash scripts pass `bash -n`.
+- `shellcheck` was run if available, or its absence was reported.
+- Ansible syntax/lint checks were run if available and relevant.
+- YAML/Markdown sanity checks were run if available.
+- Failures and skipped checks are visible in the final summary.
diff --git a/docs/codex/task-template.md b/docs/codex/task-template.md
new file mode 100644
index 0000000..fa499aa
--- /dev/null
+++ b/docs/codex/task-template.md
@@ -0,0 +1,276 @@
+# Task Templates
+
+Copy the relevant section into a future Codex request and fill in the blanks.
+
+## Operational Bash Tool
+
+### Goal
+
+Build or improve a Bash tool for:
+
+### Context
+
+Affected platform, incident, or operational workflow:
+
+### Constraints
+
+- Default to dry-run/read-only.
+- Require `--execute` for changes.
+- Use `OK`, `WARNING`, and `CRITICAL`.
+- Exit `0` OK, `1` operational issue, `2` invalid input or missing dependency.
+
+### Files/directories to inspect
+
+- `infra-run/scripts/bash/`
+- Relevant runbook or README:
+
+### Implementation steps
+
+1. Inspect neighboring scripts and shared helpers.
+2. Add or adjust usage/help output.
+3. Add discovery, pre-check, guarded change, post-check, and reporting sections where useful.
+4. Update README or runbook notes.
+
+### Validation commands
+
+```bash
+bash -n