# journal-analyzer

`journal-analyzer` is a read-only Python CLI for reviewing exported `journalctl` text logs. It summarizes systemd, service, and system-level journal findings that require operator review during Linux incident response, post-patching validation, restart troubleshooting, and change evidence collection.

The tool analyzes exported journal text only. It does not call `journalctl` directly, does not modify host state, and does not claim root cause.

## Purpose

- Summarize which units failed and which services appear repeatedly affected.
- Surface dependency failures, restart loops, timeout patterns, OOM symptoms, disk/filesystem errors, TLS/certificate issues, authentication events, and network-related warnings.
- Produce predictable text, Markdown, or JSON output that can be attached to an incident or change ticket.

## When To Use

- After exporting a scoped `journalctl` window during incident response.
- After package patching or service restarts when failed units or degraded services need review.
- During Linux service troubleshooting when repeated restart or dependency messages need a quick grouped summary.
- Before attaching journal evidence to an incident, problem, or change record.

## What It Does Not Do

- It does not call `journalctl` directly in v1.
- It does not modify the input log, systemd state, service state, or host configuration.
- It does not read remote systems or live journal streams.
- It does not query SIEM, ELK, Zabbix, APM, or ticketing systems.
- It does not prove root cause or a service defect.
- It does not classify every vendor-specific journal message.

## Supported Input Type

- One exported local `journalctl` text file supplied with `--file`.
- UTF-8 input is expected. Invalid byte sequences are replaced during read so review can continue.
- Empty, missing, unreadable, or non-file paths are rejected with exit code `2`.
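
The input rules listed above might be implemented along the following lines. This is a minimal sketch, not the tool's actual code: the function name `read_journal_export` and the error messages are hypothetical, and the README does not say whether an empty file (as opposed to an empty path) is rejected, so the sketch only rejects an empty path string.

```python
import sys
from pathlib import Path

EXIT_INVALID = 2  # exit code the README documents for rejected input


def read_journal_export(path_str: str) -> list[str]:
    """Return the lines of an exported journal file, or exit with code 2."""
    if not path_str:
        print("error: empty path", file=sys.stderr)
        raise SystemExit(EXIT_INVALID)
    path = Path(path_str)
    if not path.is_file():
        # Covers missing paths and non-file paths such as directories.
        print(f"error: not a regular file: {path}", file=sys.stderr)
        raise SystemExit(EXIT_INVALID)
    try:
        # errors="replace" swaps invalid UTF-8 bytes for U+FFFD instead of raising.
        text = path.read_text(encoding="utf-8", errors="replace")
    except OSError as exc:
        print(f"error: cannot read {path}: {exc}", file=sys.stderr)
        raise SystemExit(EXIT_INVALID)
    return text.splitlines()
```

Replacing invalid bytes rather than failing keeps a partially corrupted export reviewable, at the cost of a few unreadable characters in the affected lines.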

Example export commands:

```bash
journalctl --since "1 hour ago" > journal.log
journalctl -u nginx --since today > nginx-journal.log
journalctl -p warning..alert --since "24 hours ago" > warnings.log
journalctl --no-pager --since "2026-05-11 10:00:00" > journal.log
```

## Supported Event Categories

Critical-oriented categories:

- Failed unit or failed start findings.
- Dependency failures.
- Kernel panic and panic findings.
- OOM killer and killed process findings.
- Disk and filesystem issues such as `no space left on device`, read-only filesystem, filesystem errors, and I/O errors.
- Service or application crash patterns such as `segfault`.
- TLS and certificate failures.
- Emergency mode findings.

Warning-oriented categories:

- Restart and repeated start request findings.
- Timeout and timed out findings.
- Connection refused and connection reset findings.
- Permission denied and denied findings.
- Authentication failure findings.
- Availability, degraded, failed, and warning findings that still require review.

Matching is practical and pattern-based. Default matching already tolerates case differences in common operational wording; `--ignore-case` is available when case-insensitive matching should be explicit for a filter run. The tool is intended for first-pass operational review, not for proving causality.
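
As an illustration of what pattern-based matching of this kind can look like, the sketch below maps a handful of regular expressions to a category and a severity. The rule table is hypothetical: only the `failed_unit` category name appears in this README's sample output, and the other names and all regexes are assumptions made for the example.

```python
import re
from typing import NamedTuple


class Rule(NamedTuple):
    category: str
    severity: str
    pattern: re.Pattern


# Hypothetical rule table: category names and regexes are illustrative only.
RULES = [
    Rule("failed_unit", "CRITICAL", re.compile(r"failed to start", re.IGNORECASE)),
    Rule("oom", "CRITICAL", re.compile(r"out of memory|oom-killer", re.IGNORECASE)),
    Rule("disk", "CRITICAL", re.compile(r"no space left on device", re.IGNORECASE)),
    Rule("tls", "CRITICAL", re.compile(r"certificate|tls handshake", re.IGNORECASE)),
    Rule("timeout", "WARNING", re.compile(r"timed out|timeout", re.IGNORECASE)),
    Rule("connection", "WARNING", re.compile(r"connection (refused|reset)", re.IGNORECASE)),
]


def classify(line: str) -> list[tuple[str, str]]:
    """Return every (category, severity) pair whose pattern occurs in the line."""
    return [(r.category, r.severity) for r in RULES if r.pattern.search(line)]


print(classify("web01 systemd[1]: Failed to start nginx.service"))
print(classify("app[22]: upstream timed out, connection refused"))
```

Because every rule is checked independently, one line can yield several findings, which is consistent with the multi-match behavior noted under Operational Limitations.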

## Timestamp Support

The analyzer attempts to parse common journal and syslog timestamp formats:

- `May 11 10:15:30`
- `2026-05-11 10:15:30`
- `2026-05-11T10:15:30`
- `2026-05-11 10:15:30.123456`
- `2026-05-11 10:15:30,123`

If a timestamp cannot be parsed:

- the line is still analyzed
- first seen / last seen remain `UNKNOWN` where needed
- time-window filters keep the line by default rather than silently discarding it

Syslog-style timestamps without a year use the current local year internally unless `--since` provides a year context.
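
The parsing and filtering rules in this section can be sketched as follows. The names `parse_timestamp`, `in_window`, and `default_year` are hypothetical, and the regular expressions are one possible way to recognize the listed formats rather than the tool's actual implementation.

```python
import re
from datetime import datetime
from typing import Optional

# ISO-like layouts: date, space or "T", time, optional "." or "," fraction.
ISO_RE = re.compile(r"^(\d{4}-\d{2}-\d{2})[ T](\d{2}:\d{2}:\d{2})(?:[.,](\d{1,6}))?")
# Syslog layout without a year, e.g. "May 11 10:15:30" or "May  1 10:15:30".
SYSLOG_RE = re.compile(r"^([A-Z][a-z]{2}) +(\d{1,2}) (\d{2}:\d{2}:\d{2})")


def parse_timestamp(line: str, default_year: Optional[int] = None) -> Optional[datetime]:
    """Parse a leading timestamp; return None when no known layout fits."""
    try:
        iso = ISO_RE.match(line)
        if iso:
            date, clock, fraction = iso.groups()
            parsed = datetime.strptime(f"{date} {clock}", "%Y-%m-%d %H:%M:%S")
            # Right-pad the fraction so ",123" becomes 123000 microseconds.
            micro = int(fraction.ljust(6, "0")) if fraction else 0
            return parsed.replace(microsecond=micro)
        syslog = SYSLOG_RE.match(line)
        if syslog:
            month, day, clock = syslog.groups()
            # Year context (for example from --since) wins over the current year.
            year = default_year or datetime.now().year
            # %b expects English month abbreviations under the default C locale.
            return datetime.strptime(f"{year} {month} {day} {clock}", "%Y %b %d %H:%M:%S")
    except ValueError:
        return None  # digits matched a layout but are not a real date or time
    return None


def in_window(ts: Optional[datetime], since: Optional[datetime] = None,
              until: Optional[datetime] = None) -> bool:
    """Keep lines inside the window; lines without a timestamp are always kept."""
    if ts is None:
        return True
    if since is not None and ts < since:
        return False
    if until is not None and ts > until:
        return False
    return True
```

Returning `None` instead of raising lets the caller keep analyzing the line and report `UNKNOWN` for first seen and last seen.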

## Service Filtering

Use `--service SERVICE_NAME` to keep findings for a specific service, unit, or process name. Partial matches are allowed.

Examples:

```bash
python3 journal_analyzer.py --file examples/sample-journal.log --service nginx
python3 journal_analyzer.py --file examples/sample-journal.log --service sshd
```

`--service nginx` matches practical variants such as `nginx`, `nginx.service`, and lines where the raw journal text includes `nginx`.
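
A partial-match filter of this kind could be as simple as a substring test across the extracted unit, the extracted process, and the raw line. The sketch below is an assumption-laden illustration: the `Finding` fields are hypothetical, and lowercasing both sides is a choice made for the example, since the README does not state whether `--service` is case-sensitive.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    # Hypothetical record shape; the real tool may store findings differently.
    unit: str
    process: str
    raw: str


def matches_service(finding: Finding, service: str) -> bool:
    """True when the service name occurs in the unit, process, or raw line."""
    needle = service.lower()  # assumption: compare case-insensitively
    fields = (finding.unit, finding.process, finding.raw)
    return any(needle in value.lower() for value in fields)


findings = [
    Finding("nginx.service", "systemd", "systemd[1]: Failed to start nginx.service"),
    Finding("UNKNOWN", "sshd", "sshd[812]: Failed password for invalid user"),
]
print([f.unit for f in findings if matches_service(f, "nginx")])
```

Checking the raw text as a fallback means a finding is still kept when unit or process extraction returned `UNKNOWN`.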

## Severity Filtering

Use `--severity warning` or `--severity critical` to limit the displayed findings.

Examples:

```bash
python3 journal_analyzer.py --file examples/sample-journal.log --severity critical
python3 journal_analyzer.py --file examples/sample-journal.log --severity warning
```

## Severity Model

Overall status is conservative:

- `OK` - no journal findings detected.
- `WARNING` - warning-level findings exist but no critical findings exist.
- `CRITICAL` - one or more critical findings exist.

Critical status is driven by failed units, dependency failures, OOM events, kernel panic findings, disk full or read-only filesystem symptoms, emergency mode, TLS/certificate failures, and I/O or filesystem errors.

Warning status is driven by restart-related findings, timeout patterns, connection issues, permission denied events, authentication failures, degraded messages, and generic warning/failure entries that still require review.

The report summarizes exported journal findings that require review. It does not claim root cause.
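
The "worst finding wins" rule described above reduces to a short function. This is a sketch with a hypothetical name (`overall_status`); it assumes each finding already carries a `WARNING` or `CRITICAL` severity label.

```python
from typing import Iterable


def overall_status(severities: Iterable[str]) -> str:
    """Collapse per-finding severities into OK, WARNING, or CRITICAL."""
    seen = set(severities)
    if "CRITICAL" in seen:
        return "CRITICAL"  # any critical finding dominates
    if "WARNING" in seen:
        return "WARNING"   # warnings only
    return "OK"            # no findings at all


print(overall_status(["WARNING", "CRITICAL", "WARNING"]))
```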

## Usage

```bash
cd infra-run/scripts/python/journal-analyzer

python3 journal_analyzer.py --file examples/sample-journal.log
python3 journal_analyzer.py --file examples/sample-journal.log --format markdown
python3 journal_analyzer.py --file examples/sample-journal.log --format markdown --output journal-report.md
python3 journal_analyzer.py --file examples/sample-journal.log --format json
python3 journal_analyzer.py --file examples/sample-journal.log --service sshd
python3 journal_analyzer.py --file examples/sample-journal.log --service nginx
python3 journal_analyzer.py --file examples/sample-journal.log --severity critical
python3 journal_analyzer.py --file examples/sample-journal.log --top 10
python3 journal_analyzer.py --file examples/sample-journal.log --since "2026-05-11 10:00:00"
python3 journal_analyzer.py --file examples/sample-journal.log --until "2026-05-11 12:00:00"
python3 journal_analyzer.py --file examples/sample-journal.log --ignore-case
```

## Output Formats

- `text` - default terminal-oriented report.
- `markdown` - incident or change ticket attachment format.
- `json` - structured output for local automation.

Use `--output <path>` to write the report to a separate file. Without `--output`, the report is printed to stdout.

## Exit Codes

- `0` - OK, no journal findings.
- `1` - Journal findings detected.
- `2` - Invalid input, unreadable file, bad argument, output write failure, or runtime error.
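
A wrapper script can branch on these exit codes. The sketch below is one possible shape, not part of the tool: `describe_exit` and `run_analyzer` are hypothetical helpers, and the script path is assumed to be `journal_analyzer.py` in the current directory.

```python
import subprocess
import sys

# Meanings taken from the exit code list above.
EXIT_MEANINGS = {
    0: "OK: no journal findings",
    1: "journal findings detected; review the report",
    2: "invalid input or runtime error; the report may be missing",
}


def describe_exit(code: int) -> str:
    """Translate an analyzer exit code into a short operator-facing message."""
    return EXIT_MEANINGS.get(code, f"unexpected exit code {code}")


def run_analyzer(log_path: str, script: str = "journal_analyzer.py") -> str:
    """Run the CLI on one exported log and describe how it exited."""
    result = subprocess.run([sys.executable, script, "--file", log_path])
    return describe_exit(result.returncode)
```

Note that exit code `1` signals findings rather than a tool failure, so automation should treat only `2` as an error of the analyzer itself.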

## Example Text Output

```text
Journal Analyzer
================

Overall status: CRITICAL
Journal findings require review; logs alone do not prove root cause.

[CRITICAL] nginx.service - failed_unit
Pattern: failed to start
Occurrences: 1
Unit: nginx.service
Process: systemd
PID: 1
First seen: May 11 10:16:11
Last seen: May 11 10:16:11
Samples:
- May 11 10:16:11 web01 systemd[1]: Failed to start nginx.service - A high performance web server and a reverse proxy server.

Operational Summary
-------------------
Overall status: CRITICAL
Total lines scanned: 17
Total findings: 13
Critical finding groups: 7
Warning finding groups: 5
Affected services/units count: 9
```
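
The per-group fields in the report above (occurrences, first seen, last seen, samples) suggest an aggregation step keyed on unit and category. The following is a minimal sketch of such a step under stated assumptions: the `Group` class, the tuple layout of `findings`, and the `max_samples` limit are hypothetical, and input lines are assumed to arrive in chronological file order.

```python
from dataclasses import dataclass, field
from typing import Iterable, Optional


@dataclass
class Group:
    unit: str
    category: str
    occurrences: int = 0
    first_seen: str = "UNKNOWN"
    last_seen: str = "UNKNOWN"
    samples: list = field(default_factory=list)


def group_findings(
    findings: Iterable[tuple[str, str, Optional[str], str]],
    max_samples: int = 3,
) -> list[Group]:
    """Group (unit, category, timestamp_text, raw_line) tuples in file order."""
    groups: dict[tuple[str, str], Group] = {}
    for unit, category, seen, raw in findings:
        group = groups.setdefault((unit, category), Group(unit, category))
        group.occurrences += 1
        if seen is not None:
            # The first parseable timestamp sets first_seen; the latest sets last_seen.
            if group.first_seen == "UNKNOWN":
                group.first_seen = seen
            group.last_seen = seen
        if len(group.samples) < max_samples:
            group.samples.append(raw)
    return list(groups.values())
```

Capping the stored samples keeps the report short even when one symptom repeats hundreds of times in the export.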

## Markdown Workflow

Generate a Markdown report from an exported journal and attach it to the incident or change ticket as supporting evidence:

```bash
python3 journal_analyzer.py \
  --file examples/sample-journal.log \
  --format markdown \
  --output journal-report.md
```

Review the report before attaching it. Use it as a concise summary of exported journal findings, then correlate it with service status, monitoring, recent changes, package history, and runbook-specific post-checks.

## Operational Limitations

- Pattern matching is intentionally simple and predictable.
- A single line can match more than one finding when it contains more than one meaningful symptom, such as a TLS failure plus certificate expiry.
- Default matching is already case-tolerant for practical journal review; `--ignore-case` remains available when you want to force case-insensitive operator searches.
- Unit, process, and PID extraction are best-effort and may return `UNKNOWN`.
- Time filtering is best-effort because lines without parseable timestamps are retained.
- Large log files are read into memory; use scoped journal exports for very large review windows.
- The tool does not inspect structured journal fields because v1 works on exported text logs.

## Safety Notes

- The tool only reads the input journal export and optionally writes a separate report.
- It does not require root privileges unless the chosen log path requires them.
- Do not include secrets, private hostnames, customer identifiers, or unsanitized production details in portfolio examples.
- Treat the output as triage evidence that requires operator review, not an automated remediation decision.