Clean up Python log analysis documentation

2026-05-11 17:10:10 +00:00
parent 1636f46f81
commit 8a7b7c5abc
13 changed files with 158 additions and 20 deletions
@@ -4,6 +4,13 @@

 ### Added

+- Python tooling validation for operational scripts.
+- `incident-log-summary` for general incident log summarization.
+- `log-diff-checker` for pre-change and post-change log comparison.
+- `auth-log-audit` for Linux authentication log review.
+- `jvm-log-analyzer` for JVM application log summaries.
+- `journal-analyzer` for exported `journalctl` log review.
+- `known-error-matcher` with JSON-based known error patterns.
 - Repository-level Codex guidance:
  - `AGENTS.md`
  - `docs/codex/README.md`
@@ -33,6 +40,8 @@
 - Updated root, `infra-run`, Bash, Ansible, platform, and lab README guidance for safety-first usage, validation, and future Codex-driven work.
 - Updated repository and `infra-run` README files to surface the new documentation structure and operational cheatsheets.
 - Updated repository, `infra-run`, and Ansible README files to describe the new hardening automation instead of placeholder-only Ansible structure.
+- Updated Python tooling documentation and repository roadmap.
+- Integrated Python syntax validation into repository validation workflow and CI.

 ### Notes

@@ -33,6 +33,13 @@ It is a technical portfolio, not a production toolkit. The examples show how ope
 - [Disk full workflow](./infra-run/scripts/bash/disk-full/) - triage scripts for usage, inode pressure, deleted open files, large files, log cleanup review, and postchecks.
 - [Veritas examples](./infra-run/scripts/bash/veritas/) - dry-run-first VxVM/VCS storage expansion workflow examples.
 - [GPFS examples](./infra-run/scripts/bash/gpfs/) - dry-run-first IBM Spectrum Scale expansion workflow examples.
+- [Incident log summary](./infra-run/scripts/python/incident-log-summary/) - read-only Python helper for local incident log pattern summaries.
+- [Log diff checker](./infra-run/scripts/python/log-diff-checker/) - read-only Python helper for before/after change log comparison.
+- [Auth log audit](./infra-run/scripts/python/auth-log-audit/) - read-only Python helper for local authentication log review.
+- [JVM log analyzer](./infra-run/scripts/python/jvm-log-analyzer/) - read-only Python helper for local JVM and Java application log review.
+- [Journal analyzer](./infra-run/scripts/python/journal-analyzer/) - read-only Python helper for exported `journalctl` text review.
+- [Known error matcher](./infra-run/scripts/python/known-error-matcher/) - read-only Python helper for matching logs against a JSON known-error catalog with runbook references.
+- [Python operational log analysis tools](./infra-run/scripts/python/) - small standard-library helpers for local log summaries, before/after comparisons, and evidence reports.
 - [Ansible hardening examples](./infra-run/ansible/) - selected Linux and AIX baseline hardening tasks organized as lab-safe roles.

 ## Planned Areas
@@ -78,10 +85,11 @@ Basic local validation:
 ./scripts/validate-repo.sh
 ./scripts/check-bash.sh
 ./scripts/check-ansible.sh
+./scripts/check-python.sh
 ./scripts/check-docs.sh
 ```

-The validation helpers run required lightweight checks and use optional tools such as `shellcheck`, `yamllint`, `ansible-playbook`, `ansible-lint`, and `markdownlint` when available. Set `STRICT=1` to fail when optional tools are missing.
+The validation helpers run required lightweight checks and use optional tools such as `shellcheck`, `yamllint`, `ansible-playbook`, `ansible-lint`, and `markdownlint` when available. Python checks use `python3 -m py_compile` and do not require external Python tooling. Set `STRICT=1` to fail when optional tools are missing.

 Some scripts depend on platform tools such as `vxdisk`, `hagrp`, `mmcrnsd`, and `mmlscluster`. Those commands are not expected to exist on a normal workstation, so functional testing against Veritas or GPFS requires a real lab environment.

@@ -90,6 +98,7 @@ See [infra-run/TESTED.md](./infra-run/TESTED.md) and [infra-run/KNOWN_LIMITATION
 ## Operational Areas Demonstrated

 - Linux operations triage and reporting.
+- Local operational log analysis with read-only Python helpers.
 - Disk pressure and deleted-file incident analysis.
 - Dry-run-first Bash automation.
 - Controlled storage change workflow design.
@@ -16,6 +16,22 @@ This file keeps future portfolio ideas in one place so empty folders do not look
 - Clustering: service group checks, failover review, and operational checklists.
 - Monitoring: Zabbix-oriented alert review and host onboarding notes.
 - Virtualization: VM lifecycle and platform operations examples.
-- Log analysis: ELK-style search examples for incident review.
+- Log analysis: optional ELK-style search case study under `platform-projects`, separate from current local Python helpers.

-Nothing in this roadmap should be read as completed implementation.
+## Implemented Portfolio Additions
+
+- Python operational log analysis suite under `infra-run/scripts/python/`:
+  - `incident-log-summary`
+  - `log-diff-checker`
+  - `auth-log-audit`
+  - `jvm-log-analyzer`
+  - `journal-analyzer`
+  - `known-error-matcher`
+
+## Future Python Tooling Ideas
+
+- Real-world sample report examples using sanitized evidence.
+- Integration examples that combine log summaries with change evidence collection.
+- A shared Python helper library only if the standalone tools begin duplicating enough stable behavior to justify it.
+
+Planned sections remain future work unless listed as implemented.
@@ -1,16 +1,34 @@
 # infra-run

-`infra-run` is a sanitized infrastructure operations project. It contains Bash and Ansible examples based on Linux administration, incident response, storage operations, hardening, prechecks, postchecks, and controlled change workflows.
+`infra-run` is a sanitized infrastructure operations project. It contains Bash, Ansible, Python, and documentation examples based on Linux administration, incident response, storage operations, hardening, prechecks, postchecks, and controlled change workflows.

 The goal is to show operational judgment, not to ship a universal automation product.

 ## Current Contents

+### Bash Operational Scripts
+
 - [scripts/bash/os-healthcheck](./scripts/bash/os-healthcheck/) - general Linux health, service, disk, network, and report scripts.
 - [scripts/bash/disk-full](./scripts/bash/disk-full/) - disk-full triage and cleanup review workflow.
 - [scripts/bash/veritas](./scripts/bash/veritas/) - Veritas VxVM/VCS storage expansion workflow examples.
 - [scripts/bash/gpfs](./scripts/bash/gpfs/) - GPFS / IBM Spectrum Scale expansion workflow examples.
+
+### Python Log And Reporting Tools
+
+- [scripts/python](./scripts/python/) - read-only Python operational helpers using the standard library only.
+- [scripts/python/incident-log-summary](./scripts/python/incident-log-summary/) - read-only Python log summary helper for incident pattern review.
+- [scripts/python/log-diff-checker](./scripts/python/log-diff-checker/) - read-only Python before/after log comparison helper for change review.
+- [scripts/python/auth-log-audit](./scripts/python/auth-log-audit/) - read-only Python authentication log audit helper for SSH, sudo, su, and PAM review.
+- [scripts/python/jvm-log-analyzer](./scripts/python/jvm-log-analyzer/) - read-only Python JVM and Java application log analyzer for exception, stack trace, HTTP 5xx, database, and TLS review.
+- [scripts/python/journal-analyzer](./scripts/python/journal-analyzer/) - read-only Python exported journal analyzer for failed units, restart patterns, OOM events, and service warnings.
+- [scripts/python/known-error-matcher](./scripts/python/known-error-matcher/) - read-only Python matcher for local logs and JSON known-error catalogs with runbook references.
+
+### Ansible Automation
+
 - [ansible](./ansible/) - selected baseline hardening examples for RHEL-like Linux, Debian/Ubuntu, and AIX.
+
+### Runbooks And Documentation
+
 - [examples](./examples/) - sanitized sample command outputs and incident notes.

 ## Documentation
@@ -36,6 +54,7 @@ The goal is to show operational judgment, not to ship a universal automation pro
 - Bash syntax can be checked locally.
 - Shell scripts can be reviewed and partially exercised on a Linux workstation when platform commands are available or mocked.
 - Disk-full read-only scripts can be run against local paths for basic behavior checks.
+- Python log analysis examples can be run against sanitized sample logs under each tool directory.
 - Ansible YAML and role structure can be linted locally.

 ## Running Safely
@@ -70,7 +89,7 @@ From the repository root:
 ./scripts/validate-repo.sh
 ```

-Focused checks are available in `scripts/check-bash.sh`, `scripts/check-ansible.sh`, and `scripts/check-docs.sh`. If `ansible-lint` reports collection-related issues, install the collections listed in [ansible/collections/requirements.yml](./ansible/collections/requirements.yml) and rerun it. Treat lint as a starting point; platform testing still requires actual target systems.
+Focused checks are available in `scripts/check-bash.sh`, `scripts/check-ansible.sh`, `scripts/check-python.sh`, and `scripts/check-docs.sh`. If `ansible-lint` reports collection-related issues, install the collections listed in [ansible/collections/requirements.yml](./ansible/collections/requirements.yml) and rerun it. Treat lint as a starting point; platform testing still requires actual target systems.

 ## Supporting Notes

@@ -8,6 +8,20 @@ This file tracks planned `infra-run` additions without presenting them as comple
 - A small Python parser for converting script output into a markdown change note.
 - Additional Ansible molecule or container-based syntax checks where platform support is realistic.
 - Standalone runbooks that reference the existing Bash workflows.
+- Shared known-error pattern catalog review.
+- Additional links between Python findings and existing runbooks.
+- Change evidence collector for pre-check and post-check notes.
+- Report examples suitable for incident and change tickets.
+- Optional wrapper command only after the standalone Python tools stabilize.
+
+## Implemented Additions
+
+- `infra-run/scripts/python/incident-log-summary/` - first read-only Python log analysis helper for summarizing configured incident patterns from local log files.
+- `infra-run/scripts/python/log-diff-checker/` - read-only before/after log comparison helper for post-change pattern review.
+- `infra-run/scripts/python/auth-log-audit/` - read-only authentication log audit helper for local SSH, sudo, su, and PAM review.
+- `infra-run/scripts/python/jvm-log-analyzer/` - read-only JVM and Java application log analyzer for exceptions, stack traces, HTTP 5xx entries, database issues, TLS failures, and JVM failure symptoms.
+- `infra-run/scripts/python/journal-analyzer/` - read-only exported `journalctl` text analyzer for summarizing failed units, dependency issues, restart patterns, OOM findings, disk/filesystem symptoms, and related service warnings.
+- `infra-run/scripts/python/known-error-matcher/` - read-only known-error matcher for local logs and JSON pattern catalogs with severity, category, samples, and runbook references.
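The known-error matching described above can be sketched roughly as follows. This is an illustration only: the real `patterns.json` schema is not shown in this commit, so the catalog fields (`id`, `pattern`, `severity`, `category`, `runbook`), the sample entries, and the function name `match_known_errors` are all assumptions.

```python
import json
import re

# Hypothetical catalog; the real patterns.json schema is not shown in this commit.
CATALOG_JSON = """
[
  {"id": "KE-001", "pattern": "Out of memory: Killed process",
   "severity": "CRITICAL", "category": "memory",
   "runbook": "runbooks/oom.md"},
  {"id": "KE-002", "pattern": "Connection refused",
   "severity": "WARNING", "category": "network",
   "runbook": "runbooks/connectivity.md"}
]
"""


def match_known_errors(lines, catalog, max_samples=2):
    """Return one finding per catalog entry that matched at least one line."""
    findings = []
    for entry in catalog:
        regex = re.compile(entry["pattern"])
        hits = [line.rstrip("\n") for line in lines if regex.search(line)]
        if hits:
            findings.append({
                "id": entry["id"],
                "severity": entry["severity"],
                "category": entry["category"],
                "runbook": entry["runbook"],
                "count": len(hits),
                "samples": hits[:max_samples],  # keep only a few sample lines
            })
    return findings


if __name__ == "__main__":
    sample_log = [
        "app[101]: connect to db01:5432 failed: Connection refused",
        "kernel: Out of memory: Killed process 4242 (java)",
        "app[101]: connect to db01:5432 failed: Connection refused",
    ]
    for finding in match_known_errors(sample_log, json.loads(CATALOG_JSON)):
        print(finding["severity"], finding["id"], finding["count"], finding["runbook"])
```

Keeping the catalog as data means new known errors can be reviewed as JSON changes without touching the matching code.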

 ## Not Planned

@@ -1,6 +1,6 @@
 # infra-run/scripts

-This directory groups executable tooling used across the `infra-run` project. It separates shell-first operational scripts from future Python-based utilities while keeping both under one automation entry point.
+This directory groups executable tooling used across the `infra-run` project. It separates shell-first operational scripts from Python-based analysis utilities while keeping both under one automation entry point.

 ## Diagram

@@ -9,16 +9,17 @@ flowchart TD
  A["scripts"] --> B["bash"]
  A --> C["python"]
  B --> D["Operational toolkits"]
-  C --> E["Future helper utilities"]
+  C --> E["Analysis helper utilities"]
 ```

 ## Scope

-- `bash` - current implementation area with operations toolkits.
-- `python` - reserved space for future supporting utilities.
+- [bash](./bash/) - operational toolkits for host health checks, disk-full triage, Veritas examples, and GPFS examples.
+- [python](./python/) - read-only tools for local log parsing, reporting, and structured operational analysis.

 ## Notes

-- The repository currently emphasizes Bash because it maps directly to day-to-day Linux operations.
-- The structure leaves room for higher-level helpers without mixing concerns.
+- Bash remains the right default for direct host checks and operational wrappers.
+- Python is used where parsing, report generation, comparison, or JSON output is clearer than shell.
 - Bash tooling should remain safe by default, readable, and validated with `../../scripts/check-bash.sh` from the repository root.
+- Python tooling should remain read-only by default, standard-library based, and validated with `../../scripts/check-python.sh` from the repository root.
@@ -1,5 +1,69 @@
-# python
+# Python Operational Tools

-Planned area for small Python helpers.
+This directory contains small Python utilities that support operational analysis in `infra-run`.

-No Python tooling is implemented in `infra-run` yet.
+Python is used here only when it adds practical value over Bash: parsing structured or noisy input, producing repeatable reports, comparing evidence, or emitting machine-readable output for later automation. Shell remains the default choice for direct host checks and simple command wrappers.
+
+## Tools
+
+| Tool | Path | Purpose | Typical use | Example command |
+| --- | --- | --- | --- | --- |
+| incident-log-summary | [incident-log-summary](./incident-log-summary/) | Summarize configured incident patterns from one local log file. | First-pass incident notes from system or application logs. | `python3 incident_log_summary.py --file examples/system-messages.log` |
+| log-diff-checker | [log-diff-checker](./log-diff-checker/) | Compare configured patterns before and after a change. | Post-change review for new, increased, decreased, resolved, or unchanged log symptoms. | `python3 log_diff_checker.py --before examples/pre-change.log --after examples/post-change.log` |
+| auth-log-audit | [auth-log-audit](./auth-log-audit/) | Summarize SSH, sudo, su, and PAM findings from local authentication logs. | Authentication incident review or access-control evidence gathering. | `python3 auth_log_audit.py --file examples/sample-auth.log` |
+| jvm-log-analyzer | [jvm-log-analyzer](./jvm-log-analyzer/) | Summarize JVM exceptions, stack traces, HTTP 5xx entries, database issues, and TLS symptoms. | Java application support, restart review, or incident handoff evidence. | `python3 jvm_log_analyzer.py --file examples/sample-jvm-app.log` |
+| journal-analyzer | [journal-analyzer](./journal-analyzer/) | Summarize exported `journalctl` text for failed units, restart loops, OOM events, and service warnings. | Linux service incident review or patching/change evidence. | `python3 journal_analyzer.py --file examples/sample-journal.log` |
+| known-error-matcher | [known-error-matcher](./known-error-matcher/) | Match local logs against a JSON known-error catalog. | Connect known symptoms to severity, category, samples, and runbook references. | `python3 known_error_matcher.py --file examples/sample-app.log --patterns patterns.json` |
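The `log-diff-checker` row mentions new, increased, decreased, resolved, and unchanged symptoms. A minimal sketch of that classification follows. It assumes simple per-pattern line counts; the pattern names, the regexes, and the functions `count_patterns`, `classify`, and `compare` are hypothetical and not taken from the tool itself.

```python
import re
from collections import Counter

# Hypothetical pattern set; the real tool's configured patterns are not shown here.
PATTERNS = {
    "error": re.compile(r"\bERROR\b"),
    "timeout": re.compile(r"timed? ?out", re.IGNORECASE),
    "oom": re.compile(r"Out of memory"),
}


def count_patterns(lines):
    """Count how many lines match each configured pattern."""
    counts = Counter()
    for line in lines:
        for name, regex in PATTERNS.items():
            if regex.search(line):
                counts[name] += 1
    return counts


def classify(before, after):
    """Label a pattern by comparing its before and after counts."""
    if before == 0 and after > 0:
        return "new"
    if before > 0 and after == 0:
        return "resolved"
    if after > before:
        return "increased"
    if after < before:
        return "decreased"
    return "unchanged"


def compare(before_lines, after_lines):
    """Return {pattern name: label} for every configured pattern."""
    before = count_patterns(before_lines)
    after = count_patterns(after_lines)
    # Counter returns 0 for patterns that never matched.
    return {name: classify(before[name], after[name]) for name in PATTERNS}


if __name__ == "__main__":
    pre = ["ERROR db retry", "request timed out"]
    post = ["ERROR db retry", "ERROR db retry", "Out of memory"]
    for name, label in compare(pre, post).items():
        print(f"{name}: {label}")
```

In this sketch a pattern that is absent in both logs is reported as `unchanged`; a real tool might prefer to omit such patterns from the report.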
+
+## Expected Use Cases
+
+- Log parsing for incident review.
+- Markdown or text report generation from collected evidence.
+- Change evidence helpers for pre-check and post-check notes.
+- Incident summary builders from sanitized inputs.
+- Structured output for automation, such as JSON where useful.
+
+## Standards
+
+- Use the Python standard library only unless a later tool clearly justifies another dependency.
+- Keep tools read-only by default.
+- Do not perform destructive actions.
+- Use `argparse` for command-line interfaces.
+- Produce predictable text output suitable for terminal review and change notes.
+- Support text, Markdown, and JSON output where useful for terminal review, tickets, or local automation.
+- Classify findings with a four-level status model: `OK`, `WARNING`, `CRITICAL`, and `UNKNOWN`.
+- Handle malformed input, permission problems, and runtime errors defensively.
+- Return meaningful exit codes.
+- Keep each tool small, focused, and easy to review.
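Several of these standards can be shown together in one small skeleton: `argparse`, read-only file access, defensive error handling, and text or JSON output. The sketch below is not the code of any tool in this commit. `--file` mirrors the example commands in the table above, while `--format`, `summarize`, and `render` are hypothetical names chosen for illustration.

```python
import argparse
import json
import sys


def build_parser():
    parser = argparse.ArgumentParser(description="Read-only log summary sketch.")
    parser.add_argument("--file", required=True, help="local log file to read")
    # --format is a hypothetical flag used only in this sketch.
    parser.add_argument("--format", choices=("text", "json"), default="text",
                        help="output format")
    return parser


def summarize(lines):
    """Count lines and lines containing ERROR; stands in for real pattern logic."""
    lines = list(lines)
    return {"lines": len(lines), "errors": sum("ERROR" in line for line in lines)}


def render(summary, fmt):
    """Render the summary as sorted JSON or as 'key: value' text lines."""
    if fmt == "json":
        return json.dumps(summary, sort_keys=True)
    return "\n".join(f"{key}: {value}" for key, value in sorted(summary.items()))


def main(argv=None):
    args = build_parser().parse_args(argv)
    try:
        # Open read-only; errors="replace" tolerates malformed bytes in the log.
        with open(args.file, "r", encoding="utf-8", errors="replace") as handle:
            summary = summarize(handle)
    except OSError as exc:
        # Missing file or permission problem: report and return the error code.
        print(f"UNKNOWN: cannot read {args.file}: {exc}", file=sys.stderr)
        return 2
    print(render(summary, args.format))
    return 1 if summary["errors"] else 0

# A real script would end with: raise SystemExit(main())
```

Returning an integer from `main` instead of calling `sys.exit` inside it keeps the function easy to call from tests with an explicit argument list.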
+
+## Exit Codes
+
+- `0` - OK, no findings, or successful validation.
+- `1` - Operational findings detected.
+- `2` - Invalid input, missing dependency, permission issue, or runtime error.
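One way to connect the `OK`/`WARNING`/`CRITICAL`/`UNKNOWN` status model to these exit codes is sketched below. The diff does not show how the tools do this, so the function names are hypothetical, and mapping `UNKNOWN` to exit code `2` is an assumption.

```python
# Severity order for statuses that represent real findings.
SEVERITY_ORDER = ("OK", "WARNING", "CRITICAL")


def overall_status(finding_statuses):
    """Return the worst status among findings; an empty list is OK."""
    worst = "OK"
    for status in finding_statuses:
        if status not in SEVERITY_ORDER:
            # Anything unrecognized (including UNKNOWN) makes the result UNKNOWN.
            return "UNKNOWN"
        if SEVERITY_ORDER.index(status) > SEVERITY_ORDER.index(worst):
            worst = status
    return worst


def exit_code(status):
    """Map an overall status to the documented 0/1/2 exit codes."""
    if status == "OK":
        return 0
    if status in ("WARNING", "CRITICAL"):
        return 1
    return 2  # UNKNOWN or unexpected values are treated as errors (assumption)


if __name__ == "__main__":
    for statuses in ([], ["OK", "WARNING"], ["WARNING", "CRITICAL"], ["UNKNOWN"]):
        status = overall_status(statuses)
        print(statuses, "->", status, "exit", exit_code(status))
```

Because `WARNING` and `CRITICAL` share exit code `1`, a calling script that needs to tell them apart would have to read the report rather than the exit code.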
+
+## Validation
+
+From the repository root:
+
+```bash
+bash scripts/check-python.sh
+bash scripts/validate-repo.sh
+```
+
+The checks use `python3 -m py_compile` and do not require external Python dependencies.
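The contents of `scripts/check-python.sh` are not part of this diff. As a rough equivalent, the same syntax check can be sketched in Python with the standard-library `py_compile` module. The function name `check_python_syntax` is hypothetical, and the sketch writes compiled output to a temporary directory on the assumption that a validation run should not leave `__pycache__` files in the repository.

```python
import os
import py_compile
import tempfile
from pathlib import Path


def check_python_syntax(root):
    """Compile every .py file under root; return a list of (path, error) tuples."""
    failures = []
    with tempfile.TemporaryDirectory() as scratch:
        # Send bytecode to a scratch file instead of __pycache__ next to the source.
        target = os.path.join(scratch, "check.pyc")
        for path in sorted(Path(root).rglob("*.py")):
            try:
                # doraise=True raises PyCompileError instead of printing to stderr.
                py_compile.compile(str(path), cfile=target, doraise=True)
            except py_compile.PyCompileError as exc:
                failures.append((str(path), exc.msg))
    return failures


if __name__ == "__main__":
    problems = check_python_syntax(".")
    for path, message in problems:
        print(f"FAIL {path}: {message}")
    print("syntax check:", "failed" if problems else "ok")
```

Like `python3 -m py_compile`, this only confirms that each file parses; it does not import modules or run any tool logic.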
+
+## Expected Tool Structure
+
+Future tools should use a small self-contained layout:
+
+```text
+tool-name/
+  tool_name.py
+  README.md
+  examples/
+    sample-input.log
+    sample-report.md
+```
+
+Do not add package metadata, framework scaffolding, or external dependency files unless a future tool has a specific operational reason.
@@ -184,6 +184,7 @@ Review the report before attaching it. A `WARNING` or `CRITICAL` result should b
 ## Safety Notes

 - The tool only reads the input log and optionally writes a separate report.
+- The implementation uses the Python standard library only and does not require package installation.
 - It does not require elevated privileges unless the chosen log path requires them.
 - Do not include secrets, customer data, private hostnames, or unsanitized production details in portfolio examples.
-- Treat findings as prompts for operator review, not automated remediation instructions.
+- Treat operational findings as prompts that require review; the tool does not prove compromise or determine root cause automatically.
@@ -153,6 +153,7 @@ Review the report before attaching it. The output is evidence for triage; it is
 ## Safety Notes

 - The tool only reads the input log and optionally writes a separate report.
+- The implementation uses the Python standard library only and does not require package installation.
 - It does not require elevated privileges unless the chosen log path requires them.
 - Do not include secrets, customer data, private hostnames, or unsanitized production details in portfolio examples.
-- Treat findings as prompts for operator review, not automated remediation instructions.
+- Treat operational findings as prompts that require review; the tool does not determine root cause automatically.
@@ -209,6 +209,7 @@ Review the report before attaching it. Use it as a concise summary of exported j
 ## Safety Notes

 - The tool only reads the input journal export and optionally writes a separate report.
+- The implementation uses the Python standard library only and does not require package installation.
 - It does not require root privileges unless the chosen log path requires them.
 - Do not include secrets, private hostnames, customer identifiers, or unsanitized production details in portfolio examples.
-- Treat the output as triage evidence that requires operator review, not an automated remediation decision.
+- Treat operational findings as triage evidence that requires review; the tool does not determine root cause automatically.
@@ -212,6 +212,7 @@ Review the report before attaching it. A `WARNING` or `CRITICAL` result should b
 ## Safety Notes

 - The tool only reads the input log and optionally writes a separate report.
+- The implementation uses the Python standard library only and does not require package installation.
 - It does not require elevated privileges unless the chosen log path requires them.
 - Do not include secrets, customer data, private hostnames, tokens, or unsanitized production details in portfolio examples.
-- Treat findings as prompts for operator review, not automated remediation instructions.
+- Treat operational findings as prompts that require review; the tool does not determine root cause automatically.
@@ -193,6 +193,7 @@ Review the report before attaching it. A `WARNING` or `CRITICAL` result should b
 ## Safety Notes

 - The tool only reads the input log and pattern catalog and optionally writes a separate report.
+- The implementation uses the Python standard library only and does not require package installation.
 - It does not require elevated privileges unless the chosen log path requires them.
 - Do not include secrets, private hostnames, customer identifiers, tokens, or unsanitized production details in portfolio examples.
-- Treat matches as prompts for operator review, not automated remediation instructions.
+- Treat operational findings as prompts that require review; the tool does not determine root cause automatically.
@@ -158,6 +158,7 @@ Use the report as a log perspective on the change. A `CRITICAL` or `WARNING` res
 ## Safety Notes

 - The tool only reads the input logs and optionally writes a separate report.
+- The implementation uses the Python standard library only and does not require package installation.
 - It does not require elevated privileges unless the chosen log path requires them.
 - Do not include secrets, customer data, private hostnames, or unsanitized production details in portfolio examples.
-- Treat findings as prompts for operator review, not automated remediation instructions.
+- Treat operational findings as prompts that require review; the tool does not determine root cause automatically.