Add log diff checker tool

2026-05-11 17:04:10 +00:00
parent 5dde403ce3
commit 452ff4fac1
5 changed files with 774 additions and 0 deletions
@@ -0,0 +1,163 @@
+# log-diff-checker
+
+`log-diff-checker` is a read-only Python CLI for comparing configured operational log patterns before and after a change. It is intended to help an infrastructure engineer decide whether a patch, deployment, configuration change, or service restart introduced new log risk or reduced existing noise.
+
+The tool compares local pre-change and post-change log extracts. It does not modify input logs or system state.
+
+## When To Use
+
+- After a planned change when pre-check and post-check log extracts are available.
+- During change validation when the question is whether errors increased, disappeared, or stayed flat.
+- Before attaching log evidence to a change, incident, or problem ticket.
+- When predictable text, Markdown, or JSON output is useful for local review.
+
+## What It Does
+
+- Reads two local text log files supplied with `--before` and `--after`.
+- Scans both files for configured critical and warning patterns.
+- Compares before and after counts for each detected pattern.
+- Classifies patterns as `NEW`, `INCREASED`, `DECREASED`, `RESOLVED`, or `UNCHANGED`.
+- Sets an overall status of `OK`, `WARNING`, or `CRITICAL`.
+- Includes sample log lines from whichever file best explains the change, for example the after log for a `NEW` finding.
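+
+The change classes and the overall status are not defined precisely above. The sketch below assumes the natural reading: a pattern absent before and present after is `NEW`, one present before and absent after is `RESOLVED`, and otherwise the two counts are compared directly. It also assumes that only `NEW` or `INCREASED` findings raise the overall status, which matches the exit-code rules, and that any critical-tier finding makes it `CRITICAL`. `classify` and `overall_status` are hypothetical names, not the tool's own functions.
+
+```python
+from typing import Iterable, Tuple
+
+
+def classify(before: int, after: int) -> str:
+    """Classify one pattern's count change (assumed rules, not the tool's code)."""
+    if before == 0 and after > 0:
+        return "NEW"
+    if before > 0 and after == 0:
+        return "RESOLVED"
+    if after > before:
+        return "INCREASED"
+    if after < before:
+        return "DECREASED"
+    return "UNCHANGED"
+
+
+def overall_status(findings: Iterable[Tuple[str, str]]) -> str:
+    """Roll (severity, change) pairs up into OK, WARNING, or CRITICAL.
+
+    Assumption: only NEW or INCREASED findings raise the status.
+    """
+    status = "OK"
+    for severity, change in findings:
+        if change not in ("NEW", "INCREASED"):
+            continue  # decreases, resolutions, and flat counts do not escalate
+        if severity == "CRITICAL":
+            return "CRITICAL"
+        status = "WARNING"
+    return status
+
+
+print(classify(0, 1))  # NEW
+print(overall_status([("WARNING", "INCREASED"), ("CRITICAL", "RESOLVED")]))  # WARNING
+```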
+
+## What It Does Not Do
+
+- It does not read remote systems.
+- It does not modify logs, services, or host state.
+- It does not query ELK, Zabbix, SIEM, journald, or application APIs.
+- It does not prove root cause or change safety.
+- It does not replace service-specific post-change checks.
+- It does not classify every possible vendor or application error.
+
+## Supported Input
+
+- Two local text log files:
+  - `--before` for the pre-change log extract.
+  - `--after` for the post-change log extract.
+- UTF-8 input is expected. Invalid byte sequences are replaced during read so review can continue.
+- Empty, missing, unreadable, or non-file paths are rejected with exit code `2`.
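+
+A minimal sketch of the input handling described above, using only the standard library. `load_lines` and `load_or_exit` are hypothetical names. The sketch reads "empty" as an empty path argument, since the README does not say whether zero-byte files are also rejected.
+
+```python
+import sys
+from pathlib import Path
+from typing import List
+
+EXIT_INPUT_ERROR = 2  # documented exit code for invalid input or unreadable files
+
+
+def load_lines(raw_path: str) -> List[str]:
+    """Read a log extract as UTF-8, replacing undecodable bytes.
+
+    Raises ValueError for an empty path or a path that is not a regular file.
+    """
+    if not raw_path:
+        raise ValueError("empty path")
+    path = Path(raw_path)
+    if not path.is_file():  # False for missing paths and for directories
+        raise ValueError(f"not a regular file: {raw_path}")
+    # errors="replace" turns invalid byte sequences into U+FFFD instead of raising.
+    with path.open("r", encoding="utf-8", errors="replace") as handle:
+        return handle.read().splitlines()
+
+
+def load_or_exit(raw_path: str) -> List[str]:
+    """Load a file or exit with code 2, covering permission errors via OSError."""
+    try:
+        return load_lines(raw_path)
+    except (ValueError, OSError) as exc:
+        print(f"error: {exc}", file=sys.stderr)
+        sys.exit(EXIT_INPUT_ERROR)
+```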
+
+## Supported Patterns
+
+Critical patterns:
+
+- `CRITICAL`
+- `FATAL`
+- `panic`
+- `kernel panic`
+- `no space left on device`
+- `out of memory`
+- `killed process`
+- `read-only file system`
+- `segmentation fault`
+- `segfault`
+- `certificate expired`
+- `TLS handshake failed`
+- `SSLHandshakeException`
+- `database unavailable`
+- `HTTP 500`
+- `HTTP 502`
+- `HTTP 503`
+- `HTTP 504`
+
+Warning patterns:
+
+- `ERROR`
+- `failed`
+- `failure`
+- `timeout`
+- `connection refused`
+- `connection reset`
+- `permission denied`
+- `authentication failed`
+- `denied`
+- `unavailable`
+- `service restart`
+- `retrying`
+
+By default, matching is case-sensitive. Use `--ignore-case` for case-insensitive matching across all configured patterns.
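+
+A sketch of per-line pattern counting, assuming plain substring matching and that each line counts at most once per pattern; neither detail is confirmed above. `count_patterns` is a hypothetical name, `PATTERNS` is a subset of the documented list, and `str.casefold()` stands in for `--ignore-case`.
+
+```python
+from collections import Counter
+from typing import Iterable
+
+# Illustrative subset of the documented critical and warning patterns.
+PATTERNS = ("CRITICAL", "database unavailable", "ERROR", "timeout", "unavailable")
+
+
+def count_patterns(lines: Iterable[str], ignore_case: bool = False) -> Counter:
+    """Count the lines that contain each pattern as a plain substring.
+
+    Assumption: a line counts once per pattern, and one line may match several patterns.
+    """
+    counts: Counter = Counter()
+    for line in lines:
+        haystack = line.casefold() if ignore_case else line
+        for pattern in PATTERNS:
+            needle = pattern.casefold() if ignore_case else pattern
+            if needle in haystack:
+                counts[pattern] += 1
+    return counts
+
+
+sample = ["app: error talking to db", "app: ERROR timeout after 30s"]
+print(count_patterns(sample)["ERROR"])                    # 1
+print(count_patterns(sample, ignore_case=True)["ERROR"])  # 2
+```
+
+Under these assumptions, the sample line in the example text output would count once each for `CRITICAL`, `database unavailable`, and `unavailable`, which is the overlap noted under Operational Limitations.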
+
+## Usage
+
+```bash
+cd infra-run/scripts/python/log-diff-checker
+
+python3 log_diff_checker.py --before examples/pre-change.log --after examples/post-change.log
+python3 log_diff_checker.py --before examples/pre-change.log --after examples/post-change.log --format markdown
+python3 log_diff_checker.py --before examples/pre-change.log --after examples/post-change.log --format markdown --output change-log-diff.md
+python3 log_diff_checker.py --before examples/pre-change.log --after examples/post-change.log --format json
+python3 log_diff_checker.py --before examples/pre-change.log --after examples/post-change.log --ignore-case
+python3 log_diff_checker.py --before examples/pre-change.log --after examples/post-change.log --top 20
+python3 log_diff_checker.py --before examples/pre-change.log --after examples/post-change.log --max-samples 5
+```
+
+## Output Formats
+
+- `text` - default terminal-oriented report.
+- `markdown` - change or incident ticket attachment format.
+- `json` - structured output for local automation.
+
+Use `--output <path>` to write the rendered report to a separate file. Without `--output`, the report is printed to stdout. The tool rejects an output path that resolves to either input log file.
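+
+One way a check like this could work is to compare fully resolved paths. This is a sketch, not the tool's code; `output_collides` is a hypothetical name.
+
+```python
+from pathlib import Path
+
+
+def output_collides(output: str, before: str, after: str) -> bool:
+    """Return True if the report path resolves to either input log.
+
+    Path.resolve() makes the path absolute, follows symlinks, and collapses "..",
+    so "logs/../pre.log" and "pre.log" compare equal.
+    """
+    target = Path(output).resolve()
+    return target in {Path(before).resolve(), Path(after).resolve()}
+```
+
+A resolve-based comparison does not catch hard links. `os.path.samefile` would, but only once the output file already exists.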
+
+## Exit Codes
+
+- `0` - OK, no new or increased findings.
+- `1` - New or increased findings detected.
+- `2` - Invalid input, unreadable file, bad argument, output write failure, or runtime error.
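+
+For local automation, the exit code is the simplest signal to branch on. A minimal sketch, assuming `log_diff_checker.py` sits in the current working directory; `run_checker` and `describe_exit` are hypothetical helpers, and `sys.executable` stands in for `python3`.
+
+```python
+import subprocess
+import sys
+
+# Meaning of each documented exit code.
+EXIT_MEANINGS = {
+    0: "OK: no new or increased findings",
+    1: "review: new or increased findings detected",
+    2: "error: invalid input, bad argument, write failure, or runtime error",
+}
+
+
+def describe_exit(code: int) -> str:
+    """Map an exit code to its documented meaning."""
+    return EXIT_MEANINGS.get(code, f"unexpected exit code {code}")
+
+
+def run_checker(before: str, after: str, *extra: str) -> int:
+    """Run the checker and return its exit code without raising on 1 or 2."""
+    cmd = [sys.executable, "log_diff_checker.py", "--before", before, "--after", after, *extra]
+    return subprocess.run(cmd).returncode
+```
+
+For example, `describe_exit(run_checker("examples/pre-change.log", "examples/post-change.log", "--format", "json"))` runs the comparison and summarises the result. Exit code `1` is a review signal, not a failure of the tool itself.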
+
+## Example Text Output
+
+```text
+Log Diff Checker
+================
+
+[CRITICAL] CRITICAL - NEW
+Before count: 0
+After count: 1
+Delta: +1
+Sample source: after
+Samples:
+  - 2026-05-11 10:14:31 app01 inventory-api[2294]: CRITICAL database unavailable while opening checkout connection
+
+Operational Summary
+-------------------
+Total lines scanned before: 7
+Total lines scanned after: 8
+Total unique patterns compared: 9
+New findings count: 3
+Increased findings count: 3
+Decreased findings count: 0
+Resolved findings count: 2
+Unchanged findings count: 1
+Overall status: CRITICAL
+```
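+
+Each finding block above follows a fixed layout. The sketch below reproduces it from one finding, assuming a hypothetical `Finding` dataclass and `render_text` function. The signed delta uses Python's `+d` format spec, which prints `+1` or `-2`; it would also print `+0` for an unchanged pattern, and the README does not show how the tool renders that case.
+
+```python
+from dataclasses import dataclass, field
+from typing import List
+
+
+@dataclass
+class Finding:
+    severity: str       # "CRITICAL" or "WARNING"
+    pattern: str
+    change: str         # NEW, INCREASED, DECREASED, RESOLVED, or UNCHANGED
+    before: int
+    after: int
+    sample_source: str  # "before" or "after"
+    samples: List[str] = field(default_factory=list)
+
+
+def render_text(finding: Finding) -> str:
+    """Render one finding in the layout shown in the example text output."""
+    lines = [
+        f"[{finding.severity}] {finding.pattern} - {finding.change}",
+        f"Before count: {finding.before}",
+        f"After count: {finding.after}",
+        f"Delta: {finding.after - finding.before:+d}",
+        f"Sample source: {finding.sample_source}",
+        "Samples:",
+    ]
+    lines.extend(f"  - {sample}" for sample in finding.samples)
+    return "\n".join(lines)
+```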
+
+## Markdown Workflow
+
+Generate a Markdown report from collected pre-change and post-change logs, review it, and attach it to the change ticket as supporting evidence:
+
+```bash
+python3 log_diff_checker.py \
+  --before examples/pre-change.log \
+  --after examples/post-change.log \
+  --format markdown \
+  --output change-log-diff.md
+```
+
+Treat the report as one log-based view of the change, not a verdict on its safety. A `CRITICAL` or `WARNING` result should be reviewed with service health checks, monitoring, rollback criteria, and the relevant application owner.
+
+## Operational Limitations
+
+- Pattern matching is intentionally simple and predictable.
+- A single line can match multiple patterns, such as `CRITICAL`, `database unavailable`, and `unavailable`.
+- Case-sensitive default matching can miss lowercase variants unless `--ignore-case` is used.
+- The tool compares counts, not rates, time windows, or request volume.
+- Large log files are read into memory; collect scoped extracts for very large incidents.
+- `--top` limits displayed findings only. The operational summary still reflects all compared patterns.
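+
+A sketch of the `--top` behaviour described above, assuming findings are already in display order (the README does not say how they are ranked). `split_for_display` is a hypothetical name.
+
+```python
+from collections import Counter
+from typing import List, Optional, Tuple
+
+Finding = Tuple[str, str]  # (pattern, change classification)
+
+
+def split_for_display(findings: List[Finding], top: Optional[int]) -> Tuple[List[Finding], Counter]:
+    """Limit displayed findings to `top` while summarising all of them."""
+    shown = findings if top is None else findings[:top]
+    summary = Counter(change for _, change in findings)  # counts every finding, not only shown ones
+    return shown, summary
+```
+
+With the nine compared patterns from the example text output, `--top 2` would display two findings while the summary still reports 3 new, 3 increased, 0 decreased, 2 resolved, and 1 unchanged.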
+
+## Safety Notes
+
+- The tool only reads the input logs and optionally writes a separate report.
+- It does not require elevated privileges unless the chosen log path requires them.
+- Do not include secrets, customer data, private hostnames, or unsanitized production details in portfolio examples.
+- Treat findings as prompts for operator review, not automated remediation instructions.