incident-log-summary

incident-log-summary is a read-only Python CLI for quick incident log review. It scans a local Linux system log or application log for configured operational patterns and reports each matched pattern with its severity, occurrence count, first and last timestamps, and sample lines.

The tool is meant for first-pass triage and incident notes. It does not replace full log search, alert correlation, service-specific runbooks, or review by an operator who understands the affected platform.

When To Use

  • During incident response when a collected log file needs a fast pattern summary.
  • Before attaching evidence to an incident, problem, or change ticket.
  • When checking whether a log contains obvious storage, memory, service, TLS, HTTP, or connectivity failures.
  • When JSON output is useful for later local automation.

What It Does Not Do

  • It does not read remote systems.
  • It does not modify logs or system state.
  • It does not query ELK, Zabbix, SIEM, journald, or application APIs.
  • It does not prove root cause.
  • It does not classify every possible vendor or application error.
  • It does not turn the sanitized example logs into production validation; they are illustrations only.

Supported Input

  • One local text log file provided with --file.
  • UTF-8 input is expected. Invalid byte sequences are replaced during read so review can continue.
  • Empty, missing, unreadable, or non-file paths are rejected with exit code 2.
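
The input rules above can be sketched as follows. This is a minimal illustration, not the tool's actual code: the names read_log_lines and load_or_exit are hypothetical, and it assumes that "empty" refers to an empty path string.

```python
import sys
from pathlib import Path

EXIT_INVALID_INPUT = 2  # exit code this README documents for rejected input


def read_log_lines(path_str: str) -> list:
    """Return the lines of a local log file, or raise ValueError if the path is unusable."""
    if not path_str.strip():
        raise ValueError("empty path")
    path = Path(path_str)
    try:
        # is_file() is False for missing paths, directories, and other non-files.
        if not path.is_file():
            raise ValueError(f"missing or not a regular file: {path}")
        # errors="replace" swaps invalid byte sequences for U+FFFD instead of raising.
        text = path.read_text(encoding="utf-8", errors="replace")
    except OSError as exc:
        raise ValueError(f"cannot read {path}: {exc}") from exc
    return text.splitlines()


def load_or_exit(path_str: str) -> list:
    """CLI-style wrapper (hypothetical): report the problem on stderr and exit with code 2."""
    try:
        return read_log_lines(path_str)
    except ValueError as exc:
        print(f"error: {exc}", file=sys.stderr)
        sys.exit(EXIT_INVALID_INPUT)
```

Replacing invalid bytes keeps the line count and line order intact, so sample lines in a report still correspond to lines in the original file.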

Supported Patterns

Critical patterns:

  • CRITICAL
  • FATAL
  • panic
  • kernel panic
  • no space left on device
  • out of memory
  • killed process
  • read-only file system
  • segmentation fault
  • segfault
  • certificate expired
  • TLS handshake failed
  • SSLHandshakeException
  • database unavailable
  • HTTP 500
  • HTTP 502
  • HTTP 503
  • HTTP 504

Warning patterns:

  • ERROR
  • failed
  • failure
  • timeout
  • connection refused
  • connection reset
  • permission denied
  • authentication failed
  • denied
  • unavailable
  • service restart
  • retrying

By default, matching is case-sensitive. Use --ignore-case for case-insensitive matching across all configured patterns.
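
The matching behavior can be illustrated with a short sketch. It assumes plain substring matching, which this README does not confirm, and it uses abbreviated pattern lists. The names compile_patterns and match_line are hypothetical.

```python
import re

# Abbreviated subsets of the pattern lists above, for illustration only.
CRITICAL_PATTERNS = ["no space left on device", "out of memory", "HTTP 503"]
WARNING_PATTERNS = ["ERROR", "timeout", "unavailable"]


def compile_patterns(ignore_case: bool = False):
    """Return (severity, pattern, compiled regex) tuples for literal matching."""
    flags = re.IGNORECASE if ignore_case else 0
    compiled = []
    for severity, patterns in (("CRITICAL", CRITICAL_PATTERNS), ("WARNING", WARNING_PATTERNS)):
        for pattern in patterns:
            # re.escape keeps each pattern a literal string rather than a regex.
            compiled.append((severity, pattern, re.compile(re.escape(pattern), flags)))
    return compiled


def match_line(line: str, compiled):
    """Return every (severity, pattern) pair found in the line; one line may match several."""
    return [(severity, pattern) for severity, pattern, rx in compiled if rx.search(line)]


line = "2026-05-11 10:15:30 ERROR upstream returned HTTP 503: service unavailable"
print(match_line(line, compile_patterns()))
# [('CRITICAL', 'HTTP 503'), ('WARNING', 'ERROR'), ('WARNING', 'unavailable')]

print(match_line("error: Timeout", compile_patterns()))
# []  (case-sensitive default misses both variants)

print(match_line("error: Timeout", compile_patterns(ignore_case=True)))
# [('WARNING', 'ERROR'), ('WARNING', 'timeout')]
```

With substring matching, a broad pattern such as denied also fires on lines that already matched permission denied, so overlapping counts are expected.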

Timestamp Handling

The scanner attempts to parse:

  • 2026-05-11 10:15:30
  • 2026-05-11T10:15:30
  • May 11 10:15:30

Timestamp parsing is best-effort. Lines with unparseable timestamps are still analyzed, and date filtering keeps those lines by default so potentially important findings are not silently discarded.
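
A minimal sketch of the keep-by-default filter described above. For brevity it handles only the two year-bearing formats, and it assumes inclusive --since and --until bounds, which this README does not specify. The names parse_timestamp and in_window are hypothetical.

```python
import re
from datetime import datetime
from typing import Optional

# First "2026-05-11 10:15:30" or "2026-05-11T10:15:30" style stamp found in the line.
ISO_LIKE = re.compile(r"(\d{4}-\d{2}-\d{2})[ T](\d{2}:\d{2}:\d{2})")


def parse_timestamp(line: str) -> Optional[datetime]:
    """Best-effort parse: return None instead of raising when nothing usable is found."""
    match = ISO_LIKE.search(line)
    if match is None:
        return None
    try:
        return datetime.strptime(f"{match.group(1)} {match.group(2)}", "%Y-%m-%d %H:%M:%S")
    except ValueError:  # digits in the right shape but not a real date, e.g. month 13
        return None


def in_window(line: str, since: Optional[datetime] = None, until: Optional[datetime] = None) -> bool:
    """Keep lines inside [since, until]; lines without a parseable timestamp are kept."""
    stamp = parse_timestamp(line)
    if stamp is None:
        return True  # never silently discard a line just because it has no timestamp
    if since is not None and stamp < since:
        return False
    if until is not None and stamp > until:
        return False
    return True
```

The trade-off is that a filtered report can still contain lines from outside the window, typically stack-trace continuations or lines with unusual prefixes.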

Syslog-style timestamps do not include a year. For filtering, the tool uses the year from --since when present, otherwise the current local year.
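
The year inference can be sketched like this. The names infer_year and parse_syslog_timestamp are hypothetical, and the sketch assumes the timestamp is the first three whitespace-separated fields of the line. Note that %b is locale-dependent, so month names such as May are assumed to be parsed under an English or C locale.

```python
from datetime import datetime
from typing import Optional


def infer_year(since: Optional[datetime] = None, now: Optional[datetime] = None) -> int:
    """Use the year of --since when given, otherwise the current local year."""
    if since is not None:
        return since.year
    return (now or datetime.now()).year


def parse_syslog_timestamp(line: str, year: int) -> Optional[datetime]:
    """Parse a leading 'May 11 10:15:30' stamp and attach the supplied year."""
    parts = line.split(None, 3)  # also copes with space-padded days such as 'May  1'
    if len(parts) < 3:
        return None
    try:
        return datetime.strptime(
            f"{year} {parts[0]} {parts[1]} {parts[2]}", "%Y %b %d %H:%M:%S"
        )
    except ValueError:
        return None


year = infer_year(since=datetime(2026, 5, 11, 10, 0, 0))
print(parse_syslog_timestamp("May 11 10:16:07 ops-node-01 kernel: example", year))
# 2026-05-11 10:16:07
```

Any single inferred year is wrong for part of a log that spans a New Year boundary, so timestamps from such files should be checked by hand.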

Usage

cd infra-run/scripts/python/incident-log-summary

python3 incident_log_summary.py --file examples/system-messages.log
python3 incident_log_summary.py --file examples/app-error.log --format markdown --output incident-report.md
python3 incident_log_summary.py --file examples/app-error.log --format json
python3 incident_log_summary.py --file examples/app-error.log --top 20
python3 incident_log_summary.py --file examples/app-error.log --ignore-case
python3 incident_log_summary.py --file examples/app-error.log --since "2026-05-11 10:00:00"
python3 incident_log_summary.py --file examples/app-error.log --until "2026-05-11 12:00:00"

Output Formats

  • text - default terminal-oriented report.
  • markdown - incident or change ticket attachment format.
  • json - structured output for local automation.

Use --output <path> to write the rendered report to a file. Without --output, the report is printed to stdout.
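
A minimal sketch of the stdout-or-file behavior, using JSON as the example format. The function names and the report field names are hypothetical; the tool's real JSON schema is not documented here.

```python
import json
import sys
from pathlib import Path
from typing import Optional


def render_json(report: dict) -> str:
    """Serialize a report dict with stable key order so output is easy to diff."""
    return json.dumps(report, indent=2, sort_keys=True)


def emit(rendered: str, output_path: Optional[str] = None) -> None:
    """Write the rendered report to output_path, or to stdout when no path is given."""
    if output_path is None:
        sys.stdout.write(rendered + "\n")
    else:
        Path(output_path).write_text(rendered + "\n", encoding="utf-8")


# Field names below are illustrative assumptions, not the tool's actual schema.
example_report = {"overall_status": "CRITICAL", "total_lines_scanned": 7}
emit(render_json(example_report))
```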

Exit Codes

  • 0 - OK, no findings.
  • 1 - Operational findings detected.
  • 2 - Invalid input, unreadable file, bad argument, or runtime error.
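
The exit code contract can be sketched as a small wrapper. The names exit_code_for and run are hypothetical, and analyze stands in for whatever function performs the scan. If the CLI is built on argparse (an assumption), bad arguments already exit with status 2 through its default error handling.

```python
import sys

EXIT_OK = 0        # no findings
EXIT_FINDINGS = 1  # operational findings detected
EXIT_ERROR = 2     # invalid input, unreadable file, bad argument, or runtime error


def exit_code_for(finding_count: int) -> int:
    """Map the number of findings to the documented exit code."""
    return EXIT_FINDINGS if finding_count > 0 else EXIT_OK


def run(analyze, path: str) -> int:
    """Call analyze(path), a hypothetical scan function returning a list of findings."""
    try:
        findings = analyze(path)
    except Exception as exc:  # any failure maps to the error exit code
        print(f"error: {exc}", file=sys.stderr)
        return EXIT_ERROR
    return exit_code_for(len(findings))
```

Because exit code 1 means "findings present" and not "the tool failed", wrapper scripts should branch on the specific value instead of treating every non-zero status as an error.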

Example Text Output

Incident Log Summary
====================

[CRITICAL] no space left on device
Occurrences: 1
First seen: 2026-05-11 10:16:07
Last seen: 2026-05-11 10:16:07
Samples:
  - May 11 10:16:07 ops-node-01 kernel: EXT4-fs warning: no space left on device while writing /var/log/messages

Operational Summary
-------------------
Total lines scanned: 7
Total findings: 7
Critical finding groups: 3
Warning finding groups: 4
Overall status: CRITICAL
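
A finding group like the one shown above could be accumulated as follows. This is a sketch under assumptions: FindingGroup, MAX_SAMPLES, and overall_status are hypothetical names, the sample cap of 3 is arbitrary, and the WARNING and OK status labels are inferred, since the example only shows CRITICAL.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional

MAX_SAMPLES = 3  # hypothetical cap on sample lines kept per group


@dataclass
class FindingGroup:
    severity: str
    pattern: str
    count: int = 0
    first_seen: Optional[datetime] = None
    last_seen: Optional[datetime] = None
    samples: list = field(default_factory=list)

    def add(self, line: str, stamp: Optional[datetime]) -> None:
        """Record one matching line; stamp may be None when the timestamp is unparseable."""
        self.count += 1
        if stamp is not None:
            if self.first_seen is None or stamp < self.first_seen:
                self.first_seen = stamp
            if self.last_seen is None or stamp > self.last_seen:
                self.last_seen = stamp
        if len(self.samples) < MAX_SAMPLES:
            self.samples.append(line)


def overall_status(groups) -> str:
    """Report the highest severity among groups that have at least one occurrence."""
    severities = {g.severity for g in groups if g.count > 0}
    if "CRITICAL" in severities:
        return "CRITICAL"
    if "WARNING" in severities:
        return "WARNING"
    return "OK"
```

Tracking first and last seen by comparison, not by arrival order, keeps the values correct even when log lines are slightly out of order.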

Markdown Workflow

Generate a markdown report from the collected log and attach it to the incident or change ticket as supporting evidence:

python3 incident_log_summary.py \
  --file examples/app-error.log \
  --format markdown \
  --output incident-report.md

Review the report before attaching it. The output is evidence for triage; it is not a final root cause statement.

Operational Limitations

  • Pattern matching is intentionally simple and predictable.
  • A single line can match multiple patterns, such as ERROR, HTTP 503, and unavailable.
  • Case-sensitive default matching can miss lowercase variants unless --ignore-case is used.
  • Syslog timestamps without a year are normalized with an inferred year.
  • Date filters are best-effort: lines without parseable timestamps are retained even when --since or --until is set.
  • Large log files are read into memory; collect a scoped file or time-windowed extract for very large incidents.

Safety Notes

  • The tool only reads the input log and optionally writes a separate report.
  • It does not require elevated privileges unless the chosen log path requires them.
  • Do not include secrets, customer data, private hostnames, or unsanitized production details in portfolio examples.
  • Treat findings as prompts for operator review, not automated remediation instructions.