# incident-log-summary
`incident-log-summary` is a read-only Python CLI for quick incident log review. It scans a local Linux system log or application log for configured operational patterns and reports each matched pattern with its severity, occurrence count, first and last timestamps, and sample lines.
The tool is meant for first-pass triage and incident notes. It does not replace full log search, alert correlation, service-specific runbooks, or review by an operator who understands the affected platform.
## When To Use
- During incident response when a collected log file needs a fast pattern summary.
- Before attaching evidence to an incident, problem, or change ticket.
- When comparing whether a log contains obvious storage, memory, service, TLS, HTTP, or connectivity failures.
- When JSON output is useful for later local automation.
## What It Does Not Do
- It does not read remote systems.
- It does not modify logs or system state.
- It does not query ELK, Zabbix, SIEM, journald, or application APIs.
- It does not prove root cause.
- It does not classify every possible vendor or application error.
- It does not treat sanitized examples as production validation.
## Supported Input
- One local text log file provided with `--file`.
- UTF-8 input is expected. Invalid byte sequences are replaced during read so review can continue.
- Empty, missing, unreadable, or non-file paths are rejected with exit code `2`.
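
The input rules above can be sketched with the standard library. This is a minimal illustration, not the tool's actual code: `read_log` and `InputError` are hypothetical names, and treating an empty path argument as invalid is an assumption about what "empty" means here.

```python
import tempfile
from pathlib import Path


class InputError(Exception):
    """Hypothetical error type for input the CLI would reject with exit code 2."""


def read_log(path_text):
    """Return the log's lines, replacing invalid UTF-8 bytes instead of failing."""
    if not path_text:
        # Assumption: an empty path argument counts as invalid input.
        raise InputError("no path given")
    path = Path(path_text)
    if not path.is_file():
        # Covers missing paths and non-file paths such as directories.
        raise InputError(f"not a regular file: {path}")
    try:
        raw = path.read_bytes()
    except OSError as exc:
        raise InputError(f"cannot read {path}: {exc}") from exc
    # errors="replace" turns undecodable bytes into U+FFFD so review can continue.
    return raw.decode("utf-8", errors="replace").splitlines()


with tempfile.TemporaryDirectory() as tmp:
    log = Path(tmp) / "sample.log"
    log.write_bytes(b"ok line\nbad byte \xff here\n")
    lines = read_log(str(log))
print(lines)
```

Replacing bad bytes keeps line boundaries intact, so a damaged line can still match a pattern and appear as a sample.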
## Supported Patterns
Critical patterns:
- `CRITICAL`
- `FATAL`
- `panic`
- `kernel panic`
- `no space left on device`
- `out of memory`
- `killed process`
- `read-only file system`
- `segmentation fault`
- `segfault`
- `certificate expired`
- `TLS handshake failed`
- `SSLHandshakeException`
- `database unavailable`
- `HTTP 500`
- `HTTP 502`
- `HTTP 503`
- `HTTP 504`
Warning patterns:
- `ERROR`
- `failed`
- `failure`
- `timeout`
- `connection refused`
- `connection reset`
- `permission denied`
- `authentication failed`
- `denied`
- `unavailable`
- `service restart`
- `retrying`
By default, matching is case-sensitive. Use `--ignore-case` for case-insensitive matching across all configured patterns.
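
The difference between the two matching modes can be shown with a short sketch. It assumes plain substring matching and uses a small subset of the patterns listed above; the tool's real pattern table and matching method may differ, and `PATTERNS` and `scan_lines` are hypothetical names.

```python
from collections import defaultdict

# Illustrative subset of the documented patterns, keyed by severity.
PATTERNS = {
    "CRITICAL": ["no space left on device", "out of memory", "HTTP 503"],
    "WARNING": ["ERROR", "timeout", "unavailable"],
}


def scan_lines(lines, ignore_case=False):
    """Group matching lines by (severity, pattern) using substring checks."""
    findings = defaultdict(list)
    for line in lines:
        haystack = line.lower() if ignore_case else line
        for severity, patterns in PATTERNS.items():
            for pattern in patterns:
                needle = pattern.lower() if ignore_case else pattern
                if needle in haystack:
                    # One line may land in several groups.
                    findings[(severity, pattern)].append(line)
    return dict(findings)


sample = [
    "2026-05-11 10:15:30 ERROR gateway returned HTTP 503: upstream unavailable",
    "2026-05-11 10:15:31 error: Timeout talking to cache",
]
strict = scan_lines(sample)
relaxed = scan_lines(sample, ignore_case=True)
print(sorted(strict))
print(sorted(relaxed))
```

In this sketch the second sample line is missed entirely in case-sensitive mode, because it contains `error` and `Timeout` rather than `ERROR` and `timeout`. With `ignore_case=True` it is counted under both patterns.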
## Timestamp Handling
The scanner attempts to parse:
- `2026-05-11 10:15:30`
- `2026-05-11T10:15:30`
- `May 11 10:15:30`
Timestamp parsing is best-effort. Lines with unparseable timestamps are still analyzed, and date filtering keeps those lines by default so potentially important findings are not silently discarded.
Syslog-style timestamps do not include a year. For filtering, the tool uses the year from `--since` when present, otherwise the current local year.
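
A minimal sketch of the parsing and filtering behaviour described above follows. The regular expressions, the function names `parse_timestamp` and `keep_line`, and the inclusive filter bounds are assumptions for illustration, not the tool's implementation.

```python
import re
from datetime import datetime

ISO_RE = re.compile(r"\d{4}-\d{2}-\d{2}[ T]\d{2}:\d{2}:\d{2}")
SYSLOG_RE = re.compile(r"[A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2}")


def _strptime_or_none(text, fmt):
    try:
        return datetime.strptime(text, fmt)
    except ValueError:
        return None


def parse_timestamp(line, default_year):
    """Return a datetime for the first recognised timestamp, else None."""
    match = ISO_RE.search(line)
    if match:
        text = match.group(0).replace("T", " ")
        return _strptime_or_none(text, "%Y-%m-%d %H:%M:%S")
    match = SYSLOG_RE.search(line)
    if match:
        # Syslog has no year, so prepend the inferred one before parsing.
        # %b expects English month names under the default C locale.
        text = f"{default_year} {' '.join(match.group(0).split())}"
        return _strptime_or_none(text, "%Y %b %d %H:%M:%S")
    return None


def keep_line(line, since=None, until=None):
    """Apply the date filter; lines without a parseable timestamp are kept."""
    year = since.year if since else datetime.now().year
    stamp = parse_timestamp(line, year)
    if stamp is None:
        return True
    if since and stamp < since:
        return False
    if until and stamp > until:
        return False
    return True


since = datetime(2026, 5, 11, 10, 0, 0)
print(keep_line("May 11 09:59:59 ops-node-01 kernel: early line", since=since))
print(keep_line("stack trace continuation without a timestamp", since=since))
```

Keeping unparseable lines means a filtered report can still contain entries from outside the requested window, such as multi-line stack traces.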
## Usage
```bash
cd infra-run/scripts/python/incident-log-summary
python3 incident_log_summary.py --file examples/system-messages.log
python3 incident_log_summary.py --file examples/app-error.log --format markdown --output incident-report.md
python3 incident_log_summary.py --file examples/app-error.log --format json
python3 incident_log_summary.py --file examples/app-error.log --top 20
python3 incident_log_summary.py --file examples/app-error.log --ignore-case
python3 incident_log_summary.py --file examples/app-error.log --since "2026-05-11 10:00:00"
python3 incident_log_summary.py --file examples/app-error.log --until "2026-05-11 12:00:00"
```
## Output Formats
- `text` - default terminal-oriented report.
- `markdown` - incident or change ticket attachment format.
- `json` - structured output for local automation.
Use `--output <path>` to write the rendered report to a file. Without `--output`, the report is printed to stdout.
## Exit Codes
- `0` - OK, no findings.
- `1` - Operational findings detected.
- `2` - Invalid input, unreadable file, bad argument, or runtime error.
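
When the CLI is called from local automation, exit code `1` signals findings rather than a crash, so a wrapper should not treat every non-zero status as a failure. The sketch below shows one way to handle that with `subprocess`. The helper names `describe_exit_code` and `run_summary` are hypothetical, and the script path is assumed to be relative to the working directory.

```python
import subprocess
import sys

# Meanings taken from the exit code list above.
EXIT_MEANINGS = {
    0: "ok, no findings",
    1: "operational findings detected",
    2: "invalid input or runtime error",
}


def describe_exit_code(code):
    """Translate an exit status into the documented meaning."""
    return EXIT_MEANINGS.get(code, f"unexpected exit code {code}")


def run_summary(log_path, script="incident_log_summary.py"):
    """Run the CLI with JSON output and return (exit_code, stdout).

    check=True is deliberately not used: it would raise CalledProcessError
    on exit code 1, which here only means that findings were reported.
    """
    result = subprocess.run(
        [sys.executable, script, "--file", log_path, "--format", "json"],
        capture_output=True,
        text=True,
    )
    return result.returncode, result.stdout


if __name__ == "__main__":
    print(describe_exit_code(1))
```

A caller would typically stop only on exit code `2` and pass the JSON from exit codes `0` and `1` to the next step.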
## Example Text Output
```text
Incident Log Summary
====================
[CRITICAL] no space left on device
Occurrences: 1
First seen: 2026-05-11 10:16:07
Last seen: 2026-05-11 10:16:07
Samples:
- May 11 10:16:07 ops-node-01 kernel: EXT4-fs warning: no space left on device while writing /var/log/messages
Operational Summary
-------------------
Total lines scanned: 7
Total findings: 7
Critical finding groups: 3
Warning finding groups: 4
Overall status: CRITICAL
```
## Markdown Workflow
Generate a markdown report from the collected log and attach it to the incident or change ticket as supporting evidence:
```bash
python3 incident_log_summary.py \
--file examples/app-error.log \
--format markdown \
--output incident-report.md
```
Review the report before attaching it. The output is evidence for triage; it is not a final root cause statement.
## Operational Limitations
- Pattern matching is intentionally simple and predictable.
- A single line can match multiple patterns, such as `ERROR`, `HTTP 503`, and `unavailable`.
- Case-sensitive default matching can miss lowercase variants unless `--ignore-case` is used.
- Syslog timestamps without a year are normalized with an inferred year.
- Date filters are best-effort because lines without parseable timestamps are retained.
- Large log files are read into memory; collect a scoped file or time-windowed extract for very large incidents.
## Safety Notes
- The tool only reads the input log and optionally writes a separate report.
- The implementation uses the Python standard library only and does not require package installation.
- It does not require elevated privileges unless the chosen log path requires them.
- Do not include secrets, customer data, private hostnames, or unsanitized production details in portfolio examples.
- Treat operational findings as prompts that require review; the tool does not determine root cause automatically.