# incident-log-summary

`incident-log-summary` is a read-only Python CLI for quick incident log review. It scans a local Linux system log or application log for configured operational patterns and reports each matched pattern with its severity, occurrence count, first- and last-seen timestamps, and sample lines.

The tool is meant for first-pass triage and incident notes. It does not replace full log search, alert correlation, service-specific runbooks, or review by an operator who understands the affected platform.

## When To Use

- During incident response when a collected log file needs a fast pattern summary.
- Before attaching evidence to an incident, problem, or change ticket.
- When comparing whether a log contains obvious storage, memory, service, TLS, HTTP, or connectivity failures.
- When JSON output is useful for later local automation.

## What It Does Not Do

- It does not read remote systems.
- It does not modify logs or system state.
- It does not query ELK, Zabbix, SIEM, journald, or application APIs.
- It does not prove root cause.
- It does not classify every possible vendor or application error.
- It does not treat sanitized examples as production validation.

## Supported Input

- One local text log file provided with `--file`.
- UTF-8 input is expected. Invalid byte sequences are replaced during read so review can continue.
- Empty, missing, unreadable, or non-file paths are rejected with exit code `2`.

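The input rules above can be sketched as follows. This is a minimal illustration, not the tool's actual code: the helper name `read_log_lines`, the error messages, and the use of `SystemExit` are assumptions. Only the UTF-8 replacement behavior and exit code `2` come from this README.

```python
import sys
from pathlib import Path


def read_log_lines(path_text: str) -> list[str]:
    """Return the lines of a local log file, or exit with code 2 (hypothetical helper)."""
    if not path_text.strip():
        print("error: empty path", file=sys.stderr)
        raise SystemExit(2)
    path = Path(path_text)
    if not path.is_file():
        # Covers missing paths and non-file paths such as directories.
        print(f"error: not a file: {path}", file=sys.stderr)
        raise SystemExit(2)
    try:
        # errors="replace" turns invalid UTF-8 bytes into U+FFFD instead of raising.
        text = path.read_text(encoding="utf-8", errors="replace")
    except OSError as exc:
        print(f"error: cannot read {path}: {exc}", file=sys.stderr)
        raise SystemExit(2)
    return text.splitlines()
```

Replacing invalid bytes rather than failing means a partially corrupted log can still be reviewed, at the cost of the replaced characters appearing as `U+FFFD` in sample lines.
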
## Supported Patterns

Critical patterns:

- `CRITICAL`
- `FATAL`
- `panic`
- `kernel panic`
- `no space left on device`
- `out of memory`
- `killed process`
- `read-only file system`
- `segmentation fault`
- `segfault`
- `certificate expired`
- `TLS handshake failed`
- `SSLHandshakeException`
- `database unavailable`
- `HTTP 500`
- `HTTP 502`
- `HTTP 503`
- `HTTP 504`

Warning patterns:

- `ERROR`
- `failed`
- `failure`
- `timeout`
- `connection refused`
- `connection reset`
- `permission denied`
- `authentication failed`
- `denied`
- `unavailable`
- `service restart`
- `retrying`

By default matching is case-sensitive. Use `--ignore-case` for case-insensitive matching across all configured patterns.

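To illustrate the effect of `--ignore-case`, here is a minimal sketch that assumes plain substring matching over a small subset of the patterns above. The function name `find_matches` and the substring approach are assumptions for illustration; the tool's real matching logic may differ.

```python
# A subset of the documented patterns, for illustration only.
CRITICAL_PATTERNS = ["no space left on device", "out of memory", "HTTP 503"]
WARNING_PATTERNS = ["ERROR", "timeout", "unavailable"]


def find_matches(line: str, ignore_case: bool = False) -> list[tuple[str, str]]:
    """Return (severity, pattern) pairs for every pattern found in the line."""
    haystack = line.casefold() if ignore_case else line
    matches = []
    for severity, patterns in (
        ("CRITICAL", CRITICAL_PATTERNS),
        ("WARNING", WARNING_PATTERNS),
    ):
        for pattern in patterns:
            needle = pattern.casefold() if ignore_case else pattern
            if needle in haystack:
                matches.append((severity, pattern))
    return matches


line = "2026-05-11 10:15:30 error: upstream returned http 503"
print(find_matches(line))                    # []
print(find_matches(line, ignore_case=True))  # [('CRITICAL', 'HTTP 503'), ('WARNING', 'ERROR')]
```

The lowercase sample line produces no findings under the case-sensitive default, which is why `--ignore-case` is worth trying when an application logs in mixed or lower case.
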
## Timestamp Handling

The scanner attempts to parse:

- `2026-05-11 10:15:30`
- `2026-05-11T10:15:30`
- `May 11 10:15:30`

Timestamp parsing is best-effort. Lines with unparseable timestamps are still analyzed, and date filtering keeps those lines by default so potentially important findings are not silently discarded.

Syslog-style timestamps do not include a year. For filtering, the tool uses the year from `--since` when present, otherwise the current local year.

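The parsing and filtering rules above could look roughly like this. It is a sketch under stated assumptions: the names `parse_timestamp` and `keep_line`, the regular expressions, and the inclusive window bounds are hypothetical and are not taken from the tool's source. `%b` relies on English month abbreviations, which holds in the default C locale.

```python
import re
from datetime import datetime
from typing import Optional

# Assumed shapes for the three documented timestamp formats.
ISO_RE = re.compile(r"^(\d{4}-\d{2}-\d{2})[ T](\d{2}:\d{2}:\d{2})")
SYSLOG_RE = re.compile(r"^([A-Z][a-z]{2})\s+(\d{1,2})\s+(\d{2}:\d{2}:\d{2})")


def parse_timestamp(line: str, default_year: int) -> Optional[datetime]:
    """Best-effort parse of a leading timestamp; None when nothing parses."""
    try:
        match = ISO_RE.match(line)
        if match:
            return datetime.strptime(
                f"{match.group(1)} {match.group(2)}", "%Y-%m-%d %H:%M:%S"
            )
        match = SYSLOG_RE.match(line)
        if match:
            # Syslog lines carry no year, so one is supplied by the caller.
            month, day, clock = match.groups()
            return datetime.strptime(
                f"{default_year} {month} {day} {clock}", "%Y %b %d %H:%M:%S"
            )
    except ValueError:
        return None  # e.g. month 13 or "Feb 30"
    return None


def keep_line(
    line: str, since: Optional[datetime] = None, until: Optional[datetime] = None
) -> bool:
    """Apply the date window; lines without a parseable timestamp are kept."""
    year = since.year if since else datetime.now().year
    timestamp = parse_timestamp(line, year)
    if timestamp is None:
        return True
    if since and timestamp < since:
        return False
    if until and timestamp > until:
        return False
    return True
```

One consequence of inferring the year is that a syslog extract spanning a year boundary can be filtered incorrectly, so such extracts deserve a manual check.
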
## Usage

```bash
cd infra-run/scripts/python/incident-log-summary

python3 incident_log_summary.py --file examples/system-messages.log
python3 incident_log_summary.py --file examples/app-error.log --format markdown --output incident-report.md
python3 incident_log_summary.py --file examples/app-error.log --format json
python3 incident_log_summary.py --file examples/app-error.log --top 20
python3 incident_log_summary.py --file examples/app-error.log --ignore-case
python3 incident_log_summary.py --file examples/app-error.log --since "2026-05-11 10:00:00"
python3 incident_log_summary.py --file examples/app-error.log --until "2026-05-11 12:00:00"
```

## Output Formats

- `text` - default terminal-oriented report.
- `markdown` - incident or change ticket attachment format.
- `json` - structured output for local automation.

Use `--output <path>` to write the rendered report to a file. Without `--output`, the report is printed to stdout.

## Exit Codes

- `0` - OK, no findings.
- `1` - Operational findings detected.
- `2` - Invalid input, unreadable file, bad argument, or runtime error.

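For local automation, a wrapper can branch on these exit codes. The sketch below is an assumption about how a caller might do that; the helper `classify_run` and the labels in `EXIT_MEANINGS` are hypothetical names, not part of the tool.

```python
import subprocess
import sys

# Labels are illustrative; the numeric codes come from the list above.
EXIT_MEANINGS = {0: "ok", 1: "findings", 2: "error"}


def classify_run(command: list[str]) -> str:
    """Run a command and map its exit code to a short label."""
    # check=True is deliberately not used: exit code 1 means findings
    # were detected, not that the run failed.
    result = subprocess.run(command, capture_output=True, text=True)
    return EXIT_MEANINGS.get(result.returncode, "unknown")


# Example call, assuming the script and sample log exist in the working directory:
# classify_run([sys.executable, "incident_log_summary.py",
#               "--file", "examples/app-error.log"])
```

Treating `1` as a normal outcome keeps a pipeline from aborting on the very result it was run to detect, while `2` still signals that the summary itself could not be produced.
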
## Example Text Output

```text
Incident Log Summary
====================

[CRITICAL] no space left on device
Occurrences: 1
First seen: 2026-05-11 10:16:07
Last seen: 2026-05-11 10:16:07
Samples:
- May 11 10:16:07 ops-node-01 kernel: EXT4-fs warning: no space left on device while writing /var/log/messages

Operational Summary
-------------------
Total lines scanned: 7
Total findings: 7
Critical finding groups: 3
Warning finding groups: 4
Overall status: CRITICAL
```

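Each finding group in the report carries an occurrence count, first- and last-seen timestamps, and sample lines. One way to accumulate those fields is sketched below; the `FindingGroup` class, its field names, and the sample cap of three lines are assumptions made for illustration, not the tool's internal data model.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class FindingGroup:
    """Aggregated view of one pattern (hypothetical structure)."""

    severity: str
    pattern: str
    occurrences: int = 0
    first_seen: Optional[datetime] = None
    last_seen: Optional[datetime] = None
    samples: list[str] = field(default_factory=list)

    def add(
        self, line: str, timestamp: Optional[datetime], max_samples: int = 3
    ) -> None:
        """Record one matching line; timestamp may be None if unparseable."""
        self.occurrences += 1
        if timestamp is not None:
            if self.first_seen is None or timestamp < self.first_seen:
                self.first_seen = timestamp
            if self.last_seen is None or timestamp > self.last_seen:
                self.last_seen = timestamp
        # Keep only the first few lines so the report stays readable.
        if len(self.samples) < max_samples:
            self.samples.append(line)
```

Lines without a parseable timestamp still increase the count in this sketch; they simply leave the first- and last-seen values unchanged.
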
## Markdown Workflow

Generate a markdown report from the collected log and attach it to the incident or change ticket as supporting evidence:

```bash
python3 incident_log_summary.py \
  --file examples/app-error.log \
  --format markdown \
  --output incident-report.md
```

Review the report before attaching it. The output is evidence for triage; it is not a final root cause statement.

## Operational Limitations

- Pattern matching is intentionally simple and predictable.
- A single line can match multiple patterns, such as `ERROR`, `HTTP 503`, and `unavailable`.
- Case-sensitive default matching can miss lowercase variants unless `--ignore-case` is used.
- Syslog timestamps without a year are normalized with an inferred year.
- Date filters are best-effort because lines without parseable timestamps are retained.
- Large log files are read into memory; collect a scoped file or time-windowed extract for very large incidents.

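For the large-file limitation, one possible way to produce a scoped file is to copy only the tail of the log before running the tool. The helper below is a hypothetical companion script, not part of `incident-log-summary`; the name `write_tail_extract` and the default of 50,000 lines are assumptions.

```python
from collections import deque


def write_tail_extract(
    source_path: str, extract_path: str, max_lines: int = 50_000
) -> int:
    """Copy the last max_lines lines of a log to a new file; return lines written."""
    with open(source_path, "r", encoding="utf-8", errors="replace") as source:
        # deque(maxlen=...) drops the oldest line once full, so memory use
        # is bounded by max_lines rather than by the size of the source file.
        tail = deque(source, maxlen=max_lines)
    with open(extract_path, "w", encoding="utf-8") as extract:
        extract.writelines(tail)
    return len(tail)
```

The source log is only read, never modified, and the extract can then be passed to the tool with `--file`.
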
## Safety Notes

- The tool only reads the input log and optionally writes a separate report.
- The implementation uses the Python standard library only and does not require package installation.
- It does not require elevated privileges unless the chosen log path requires them.
- Do not include secrets, customer data, private hostnames, or unsanitized production details in portfolio examples.
- Treat operational findings as prompts that require review; the tool does not determine root cause automatically.