# journal-analyzer

`journal-analyzer` is a read-only Python CLI for reviewing exported `journalctl` text logs. It summarizes systemd, service, and system-level journal findings that require operator review during Linux incident response, post-patching validation, restart troubleshooting, and change evidence collection.

The tool analyzes exported journal text only. It does not call `journalctl` directly, does not modify host state, and does not claim root cause.

## Purpose

- Summarize which units failed and which services appear repeatedly affected.
- Surface dependency failures, restart loops, timeout patterns, OOM symptoms, disk/filesystem errors, TLS/certificate issues, authentication events, and network-related warnings.
- Produce predictable text, Markdown, or JSON output that can be attached to an incident or change ticket.

## When To Use

- After exporting a scoped `journalctl` window during incident response.
- After package patching or service restarts when failed units or degraded services need review.
- During Linux service troubleshooting when repeated restart or dependency messages need a quick grouped summary.
- Before attaching journal evidence to an incident, problem, or change record.

## What It Does Not Do

- It does not call `journalctl` directly in v1.
- It does not modify the input log, systemd state, service state, or host configuration.
- It does not read remote systems or live journal streams.
- It does not query SIEM, ELK, Zabbix, APM, or ticketing systems.
- It does not prove root cause or a service defect.
- It does not classify every vendor-specific journal message.

## Supported Input Type

- One exported local `journalctl` text file supplied with `--file`.
- UTF-8 input is expected. Invalid byte sequences are replaced during read so review can continue.
- Empty, missing, unreadable, or non-file paths are rejected with exit code `2`.

Example export commands:

```bash
journalctl --since "1 hour ago" > journal.log
journalctl -u nginx --since today > nginx-journal.log
journalctl -p warning..alert --since "24 hours ago" > warnings.log
journalctl --no-pager --since "2026-05-11 10:00:00" > journal.log
```
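
The input rules above can be pictured with a minimal sketch. This is not taken from `journal_analyzer.py`; the `InputError` class and the `read_journal_export` function are hypothetical names used only for illustration, and it assumes that "empty" means a file with no non-whitespace content.

```python
from pathlib import Path


class InputError(Exception):
    """Hypothetical error type; a CLI would map it to exit code 2."""


def read_journal_export(path_str: str) -> list[str]:
    """Return the lines of an exported journal, tolerating invalid UTF-8."""
    path = Path(path_str)
    if not path.is_file():
        # Covers missing paths and non-file paths such as directories.
        raise InputError(f"not a regular file: {path}")
    try:
        # errors="replace" swaps undecodable bytes for U+FFFD instead of raising.
        text = path.read_text(encoding="utf-8", errors="replace")
    except OSError as exc:
        raise InputError(f"cannot read {path}: {exc}") from exc
    if not text.strip():
        # Assumption: an export with only whitespace counts as empty.
        raise InputError(f"empty export: {path}")
    return text.splitlines()
```

Reading the whole file at once matches the in-memory behavior noted under Operational Limitations, which is why scoped exports are preferable for large windows.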

## Supported Event Categories

Critical-oriented categories:

- Failed unit or failed start findings.
- Dependency failures.
- Kernel panic and panic findings.
- OOM killer and killed process findings.
- Disk and filesystem issues such as `no space left on device`, read-only filesystem, filesystem errors, and I/O errors.
- Service or application crash patterns such as `segfault`.
- TLS and certificate failures.
- Emergency mode findings.

Warning-oriented categories:

- Restart and repeated start request findings.
- Timeout and timed out findings.
- Connection refused and connection reset findings.
- Permission denied and denied findings.
- Authentication failure findings.
- Generic availability, degraded, failed, and warning messages that still require review.

Matching is pattern-based. Default patterns already tolerate case differences in common operational wording; pass `--ignore-case` when you want case-insensitive matching to be explicit. The tool is intended for first-pass operational review, not for proving causality.
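
As an illustration of pattern-based matching, the sketch below classifies a line against a small table of regular expressions. Only the `failed_unit` category name and the `failed to start` pattern appear in this README's example output; every other category name, regex, and the `classify_line` function are assumptions made for the example, not the tool's actual pattern set.

```python
import re

# Hypothetical pattern table: (category, severity, compiled regex).
PATTERNS = [
    ("failed_unit", "critical", re.compile(r"failed to start", re.IGNORECASE)),
    ("oom", "critical", re.compile(r"out of memory|oom-killer", re.IGNORECASE)),
    ("disk", "critical", re.compile(r"no space left on device", re.IGNORECASE)),
    ("timeout", "warning", re.compile(r"timed out|timeout", re.IGNORECASE)),
    ("connection", "warning", re.compile(r"connection (refused|reset)", re.IGNORECASE)),
    ("auth", "warning", re.compile(r"authentication failure", re.IGNORECASE)),
]


def classify_line(line: str) -> list[tuple[str, str]]:
    """Return every (category, severity) whose pattern occurs in the line."""
    return [(cat, sev) for cat, sev, rx in PATTERNS if rx.search(line)]


print(classify_line("web01 systemd[1]: Failed to start nginx.service"))
print(classify_line("app[22]: connection refused after request timed out"))
```

Because each pattern is tested independently, one line can yield more than one finding, which is consistent with the behavior described under Operational Limitations.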

## Timestamp Support

The analyzer attempts to parse common journal and syslog timestamp formats:

- `May 11 10:15:30`
- `2026-05-11 10:15:30`
- `2026-05-11T10:15:30`
- `2026-05-11 10:15:30.123456`
- `2026-05-11 10:15:30,123`

If a timestamp cannot be parsed:

- the line is still analyzed
- first seen / last seen remain `UNKNOWN` where needed
- time-window filters keep the line by default rather than silently discarding it

Syslog-style timestamps without a year use the current local year internally unless `--since` provides a year context.
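
A rough sketch of how the formats and fallback rules above could be handled follows. The function names `parse_timestamp` and `in_window`, the regular expressions, and the `default_year` parameter are illustrative assumptions rather than the tool's real code; a caller could pass the year from `--since` when one is given, or the current local year otherwise.

```python
import re
from datetime import datetime
from typing import Optional

ISO_RE = re.compile(r"^(\d{4}-\d{2}-\d{2})[T ](\d{2}:\d{2}:\d{2})(?:[.,](\d{1,6}))?")
SYSLOG_RE = re.compile(r"^([A-Z][a-z]{2})\s+(\d{1,2})\s+(\d{2}:\d{2}:\d{2})")


def parse_timestamp(line: str, default_year: int) -> Optional[datetime]:
    """Parse a leading timestamp, or return None when nothing recognizable is found."""
    try:
        m = ISO_RE.match(line)
        if m:
            date_part, time_part, frac = m.groups()
            ts = datetime.strptime(f"{date_part} {time_part}", "%Y-%m-%d %H:%M:%S")
            if frac:
                # Right-pad so ",123" means 123 ms, i.e. 123000 microseconds.
                ts = ts.replace(microsecond=int(frac.ljust(6, "0")))
            return ts
        m = SYSLOG_RE.match(line)
        if m:
            month, day, time_part = m.groups()
            # %b relies on English month names (the default C locale).
            return datetime.strptime(
                f"{default_year} {month} {day} {time_part}", "%Y %b %d %H:%M:%S"
            )
    except ValueError:
        return None  # looked like a timestamp but was not a valid date
    return None


def in_window(ts: Optional[datetime], since: Optional[datetime] = None,
              until: Optional[datetime] = None) -> bool:
    """Keep lines inside the window; lines without a timestamp are always kept."""
    if ts is None:
        return True
    if since is not None and ts < since:
        return False
    if until is not None and ts > until:
        return False
    return True
```

Returning `None` instead of raising keeps unparseable lines in the analysis, so a time-window filter never drops them silently.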

## Service Filtering

Use `--service SERVICE_NAME` to keep findings for a specific service, unit, or process name. Partial matches are allowed.

Examples:

```bash
python3 journal_analyzer.py --file examples/sample-journal.log --service nginx
python3 journal_analyzer.py --file examples/sample-journal.log --service sshd
```

`--service nginx` matches practical variants such as `nginx`, `nginx.service`, and lines where the raw journal text includes `nginx`.
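
One way to picture this partial matching is a substring check across the extracted fields and the raw line. The sketch is an assumption about behavior, not the tool's code: the finding dictionary keys (`unit`, `process`, `raw`) and the `matches_service` function are hypothetical, and comparing in lowercase is a choice made for the example.

```python
def matches_service(finding: dict, service: str) -> bool:
    """True when the service name occurs in the unit, process, or raw line."""
    needle = service.lower()  # assumption: comparison ignores case
    for key in ("unit", "process", "raw"):
        value = finding.get(key) or ""
        if needle in value.lower():
            return True
    return False


finding = {
    "unit": "nginx.service",
    "process": "systemd",
    "raw": "May 11 10:16:11 web01 systemd[1]: Failed to start nginx.service",
}
print(matches_service(finding, "nginx"))
print(matches_service(finding, "sshd"))
```

A substring check is broad by design: a short name such as `ssh` would also match `sshd` and `ssh-agent`, so a more specific value narrows the results.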

## Severity Filtering

Use `--severity warning` or `--severity critical` to limit the displayed findings.

Examples:

```bash
python3 journal_analyzer.py --file examples/sample-journal.log --severity critical
python3 journal_analyzer.py --file examples/sample-journal.log --severity warning
```

## Severity Model

Overall status is conservative:

- `OK` - no journal findings detected.
- `WARNING` - warning-level findings exist but no critical findings exist.
- `CRITICAL` - one or more critical findings exist.

Critical status is driven by failed units, dependency failures, OOM events, kernel panic findings, disk full or read-only filesystem symptoms, emergency mode, TLS/certificate failures, and I/O or filesystem errors.

Warning status is driven by restart-related findings, timeout patterns, connection issues, permission denied events, authentication failures, degraded messages, and generic warning/failure entries that still require review.

The report summarizes exported journal findings that require review. It does not claim root cause.
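
The three-level status rule can be expressed in a few lines. This is a sketch of the rule as described above, with a hypothetical `overall_status` function that takes the lowercase severity labels accepted by `--severity`; it is not the tool's actual code.

```python
def overall_status(severities) -> str:
    """Collapse per-finding severities into OK, WARNING, or CRITICAL."""
    seen = set(severities)
    if "critical" in seen:
        return "CRITICAL"  # any critical finding wins
    if "warning" in seen:
        return "WARNING"   # warnings only
    return "OK"            # no findings at all


print(overall_status(["warning", "critical", "warning"]))
print(overall_status(["warning"]))
print(overall_status([]))
```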

## Usage

```bash
cd infra-run/scripts/python/journal-analyzer

python3 journal_analyzer.py --file examples/sample-journal.log
python3 journal_analyzer.py --file examples/sample-journal.log --format markdown
python3 journal_analyzer.py --file examples/sample-journal.log --format markdown --output journal-report.md
python3 journal_analyzer.py --file examples/sample-journal.log --format json
python3 journal_analyzer.py --file examples/sample-journal.log --service sshd
python3 journal_analyzer.py --file examples/sample-journal.log --service nginx
python3 journal_analyzer.py --file examples/sample-journal.log --severity critical
python3 journal_analyzer.py --file examples/sample-journal.log --top 10
python3 journal_analyzer.py --file examples/sample-journal.log --since "2026-05-11 10:00:00"
python3 journal_analyzer.py --file examples/sample-journal.log --until "2026-05-11 12:00:00"
python3 journal_analyzer.py --file examples/sample-journal.log --ignore-case
```

## Output Formats

- `text` - default terminal-oriented report.
- `markdown` - incident or change ticket attachment format.
- `json` - structured output for local automation.

Use `--output <path>` to write the report to a separate file. Without `--output`, the report is printed to stdout.

## Exit Codes

- `0` - OK, no journal findings.
- `1` - Journal findings detected.
- `2` - Invalid input, unreadable file, bad argument, output write failure, or runtime error.
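
For local automation, a wrapper might branch on these exit codes. The sketch below uses only the `--file` and `--format json` options documented in this README; the `run_analyzer` and `describe_exit` helpers, and the assumption that the script sits in the current directory, are illustrative.

```python
import subprocess
import sys

EXIT_MEANINGS = {
    0: "no journal findings",
    1: "journal findings detected",
    2: "invalid input or runtime error",
}


def describe_exit(code: int) -> str:
    """Translate an exit code into the meaning documented above."""
    return EXIT_MEANINGS.get(code, f"unexpected exit code {code}")


def run_analyzer(journal_path: str, script: str = "journal_analyzer.py"):
    """Run the analyzer and return (exit_code, stdout) without raising on 1 or 2."""
    result = subprocess.run(
        [sys.executable, script, "--file", journal_path, "--format", "json"],
        capture_output=True,
        text=True,
    )
    return result.returncode, result.stdout


print(describe_exit(1))
```

Note that exit code `1` signals findings rather than a failed run, so wrappers using `check=True` or shell `set -e` would treat a normal report as an error.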

## Example Text Output

```text
Journal Analyzer
================

Overall status: CRITICAL
Journal findings require review; logs alone do not prove root cause.

[CRITICAL] nginx.service - failed_unit
Pattern: failed to start
Occurrences: 1
Unit: nginx.service
Process: systemd
PID: 1
First seen: May 11 10:16:11
Last seen: May 11 10:16:11
Samples:
  - May 11 10:16:11 web01 systemd[1]: Failed to start nginx.service - A high performance web server and a reverse proxy server.

Operational Summary
-------------------
Overall status: CRITICAL
Total lines scanned: 17
Total findings: 13
Critical finding groups: 7
Warning finding groups: 5
Affected services/units count: 9
```
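
The per-group fields in the example above (occurrences, first seen, last seen, samples) could be accumulated along these lines. This is a sketch under assumptions: findings are plain dictionaries with hypothetical keys `unit`, `category`, `timestamp`, and `raw`, they arrive in file order, and the cap of three samples per group is arbitrary.

```python
def group_findings(findings, max_samples: int = 3):
    """Group findings by (unit, category), keeping counts and a few samples."""
    groups = {}
    for f in findings:
        key = (f["unit"], f["category"])
        group = groups.setdefault(key, {
            "unit": f["unit"],
            "category": f["category"],
            "occurrences": 0,
            "first_seen": "UNKNOWN",
            "last_seen": "UNKNOWN",
            "samples": [],
        })
        group["occurrences"] += 1
        ts = f.get("timestamp")
        if ts:
            # Findings without a parsed timestamp leave the fields untouched.
            if group["first_seen"] == "UNKNOWN":
                group["first_seen"] = ts
            group["last_seen"] = ts
        if len(group["samples"]) < max_samples:
            group["samples"].append(f["raw"])
    return list(groups.values())
```

Grouping by unit and category keeps a restart loop that logs hundreds of identical lines down to a single entry with a count.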

## Markdown Workflow

Generate a Markdown report from an exported journal and attach it to the incident or change ticket as supporting evidence:

```bash
python3 journal_analyzer.py \
  --file examples/sample-journal.log \
  --format markdown \
  --output journal-report.md
```

Review the report before attaching it. Use it as a concise summary of exported journal findings, then correlate it with service status, monitoring, recent changes, package history, and runbook-specific post-checks.

## Operational Limitations

- Pattern matching is intentionally simple and predictable.
- A single line can match more than one finding when it contains more than one meaningful symptom, such as a TLS failure plus certificate expiry.
- Default matching is already case-tolerant for practical journal review; `--ignore-case` remains available when you want to force case-insensitive operator searches.
- Unit, process, and PID extraction are best-effort and may return `UNKNOWN`.
- Time filtering is best-effort because lines without parseable timestamps are retained.
- Large log files are read into memory; use scoped journal exports for very large review windows.
- The tool does not inspect structured journal fields because v1 works on exported text logs.
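
To show why unit, process, and PID extraction from plain text is only best-effort, here is a minimal regex sketch. The expressions, the list of unit suffixes, and the `extract_fields` function are assumptions for illustration and are not the tool's actual extraction logic.

```python
import re

# "process[pid]:" or "process:" preceded by whitespace, as in syslog-style lines.
PROC_RE = re.compile(r"\s(?P<process>[\w.@/-]+?)(?:\[(?P<pid>\d+)\])?:\s")
# A token ending in a common systemd unit suffix (illustrative subset).
UNIT_RE = re.compile(r"(?P<unit>[\w@.-]+\.(?:service|socket|timer|mount|target))\b")


def extract_fields(line: str) -> dict:
    """Return process, pid, and unit, using 'UNKNOWN' for anything not found."""
    proc = PROC_RE.search(line)
    unit = UNIT_RE.search(line)
    return {
        "process": proc.group("process") if proc else "UNKNOWN",
        "pid": proc.group("pid") if proc and proc.group("pid") else "UNKNOWN",
        "unit": unit.group("unit") if unit else "UNKNOWN",
    }


line = "May 11 10:16:11 web01 systemd[1]: Failed to start nginx.service - A high performance web server."
print(extract_fields(line))
```

Text heuristics like these miss lines that name no unit or use unusual prefixes, whereas structured journal fields such as `_SYSTEMD_UNIT` and `_PID` carry the same data reliably; v1 does not read those fields.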

## Safety Notes

- The tool only reads the input journal export and optionally writes a separate report.
- It does not require root privileges unless the chosen log path requires them.
- Do not include secrets, private hostnames, customer identifiers, or unsanitized production details in portfolio examples.
- Treat the output as triage evidence that requires operator review, not an automated remediation decision.