# jvm-log-analyzer

`jvm-log-analyzer` is a read-only Python CLI for reviewing local JVM and Java application logs. It summarizes common Java exceptions, stack trace fragments, JVM failure symptoms, database issues, network/TLS problems, HTTP 5xx entries, and repeated application warning/error patterns that require operator review.

The tool is intended for Linux infrastructure, SRE, and application support workflows where a collected log file needs a quick first-pass operational summary. It does not modify logs or system state.

## When To Use

- During incident response when a JVM application log needs a fast exception and symptom summary.
- During application support handoff when stack traces, HTTP 5xx entries, or database failures need to be attached as evidence.
- After a restart, deployment, certificate change, database incident, or capacity event when local log extracts are available.
- When predictable text, Markdown, or JSON output is useful for local review.

## What It Does

- Reads one local JVM or Java application log supplied with `--file`.
- Detects configured critical and warning JVM/application patterns.
- Extracts timestamps, log levels, thread names, logger/class names, exception types, raw samples, and short stack trace fragments where practical.
- Aggregates top finding groups, exception types, and operational symptoms.
- Produces text, Markdown, or JSON output.

## What It Does Not Do

- It does not read remote systems or live journal streams.
- It does not modify logs, services, application files, JVM flags, certificates, or database state.
- It does not query APM, ELK, SIEM, Zabbix, ticketing systems, or application APIs.
- It does not find root cause automatically.
- It does not prove an application defect.
- It does not classify every vendor-specific Java framework or application message.

## Supported Input Types

- Java / JVM application logs.
- Spring Boot style logs.
- Tomcat-style application logs.
- Generic application logs containing Java exceptions and stack traces.

UTF-8 text input is expected. Invalid byte sequences are replaced during read so review can continue. Empty, missing, unreadable, or non-file paths are rejected with exit code `2`.
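
The input handling described above can be sketched as follows. This is a minimal illustration, not the tool's actual code: `read_log_lines` is a hypothetical helper, and it assumes the caller turns a `ValueError` into exit code `2`.

```python
from pathlib import Path


def read_log_lines(path_text):
    """Return the lines of a log file, or raise ValueError for unusable input."""
    path = Path(path_text)
    if not path.is_file():
        # Covers missing paths and non-file paths such as directories.
        raise ValueError(f"not a regular file: {path_text}")
    try:
        # errors="replace" swaps undecodable bytes for U+FFFD instead of failing.
        text = path.read_text(encoding="utf-8", errors="replace")
    except OSError as exc:
        raise ValueError(f"cannot read {path_text}: {exc}") from exc
    if not text.strip():
        raise ValueError(f"empty log file: {path_text}")
    return text.splitlines()
```

Replacing invalid bytes rather than aborting keeps the rest of a partially corrupted log reviewable, at the cost of a few placeholder characters in samples.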

## Supported JVM/Application Patterns

Critical patterns:

- `OutOfMemoryError`
- `Java heap space`
- `GC overhead limit exceeded`
- `StackOverflowError`
- `NoClassDefFoundError`
- `ClassNotFoundException`
- `ExceptionInInitializerError`
- `SSLHandshakeException`
- `CertificateExpiredException`
- `SQLException`
- `SQLRecoverableException`
- `CommunicationsException`
- `database unavailable`
- `connection pool exhausted`
- `HTTP 500`
- `HTTP 502`
- `HTTP 503`
- `HTTP 504`
- `FATAL`

Warning patterns:

- `NullPointerException`
- `IllegalArgumentException`
- `IllegalStateException`
- `SocketTimeoutException`
- `ConnectException`
- `TimeoutException`
- `connection refused`
- `connection reset`
- `Broken pipe`
- `WARN`
- `ERROR`
- `retrying`
- `slow query`
- `deadlock detected`

By default, matching is case-sensitive. Use `--ignore-case` for case-insensitive matching across all configured patterns.
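
The effect of `--ignore-case` can be illustrated with a small sketch. It assumes patterns are matched as literal substrings; the pattern subset and the `compile_patterns` and `match_line` helpers are hypothetical and may differ from the real implementation.

```python
import re

# Small illustrative subset of the pattern lists above.
CRITICAL_PATTERNS = ["OutOfMemoryError", "connection pool exhausted", "FATAL"]
WARNING_PATTERNS = ["NullPointerException", "connection refused", "WARN"]


def compile_patterns(patterns, ignore_case=False):
    """Compile each pattern as a literal string, optionally case-insensitive."""
    flags = re.IGNORECASE if ignore_case else 0
    # re.escape makes characters such as "." match literally.
    return [(p, re.compile(re.escape(p), flags)) for p in patterns]


def match_line(line, compiled):
    """Return the names of all patterns found in the line, in list order."""
    return [name for name, rx in compiled if rx.search(line)]


line = "2026-05-11 10:15:30 error Connection Refused by db-host"
print(match_line(line, compile_patterns(WARNING_PATTERNS)))
print(match_line(line, compile_patterns(WARNING_PATTERNS, ignore_case=True)))
```

With the default case-sensitive matching the sample line yields no matches; with `ignore_case=True` it matches `connection refused`.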

## Stack Trace Handling

The scanner detects multiline Java stack traces heuristically. A trace is considered to start at lines such as:

- Fully qualified Java exception lines, such as `java.lang.NullPointerException`.
- `Exception in thread "main"`.
- `Caused by:`.
- Application exceptions ending in `Exception` or `Error`.

Subsequent lines are grouped into the same trace when they look like Java stack frames:

- Lines starting with whitespace followed by `at `.
- Lines starting with `Caused by:`.
- Lines containing `... N more`.

Stack traces are associated with the detected exception type where possible. Text and Markdown output include only short sample lines by default. Use `--include-stacktraces` to include capped multiline stack trace fragments.
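
The grouping heuristics above can be sketched roughly as follows. This is an illustration under stated assumptions, not the tool's actual parser: the regular expressions and `group_stack_traces` are hypothetical, and the sketch only recognizes trace starts at the beginning of a line (no timestamp or log-level prefix).

```python
import re

# Heuristic start: optional 'Caused by: ' or 'Exception in thread "..." ' prefix,
# then a (possibly dotted) class name ending in Exception or Error.
TRACE_START = re.compile(
    r'^(?:Caused by: |Exception in thread "[^"]*" )?'
    r'((?:[A-Za-z_$][\w$]*\.)*[A-Za-z_$][\w$]*(?:Exception|Error))\b'
)
# Continuation lines: "    at ...", "Caused by: ...", or "... N more".
TRACE_CONTINUATION = re.compile(r'^(?:\s+at |Caused by: |\s*\.\.\. \d+ more)')


def group_stack_traces(lines):
    """Group consecutive stack trace lines under the exception that starts them."""
    traces = []
    current = None
    for line in lines:
        if current is not None and TRACE_CONTINUATION.match(line):
            current["lines"].append(line)
            continue
        # Any other line ends the open trace and may start a new one.
        current = None
        start = TRACE_START.match(line)
        if start:
            current = {"exception": start.group(1), "lines": [line]}
            traces.append(current)
    return traces
```

In this sketch a `Caused by:` line inside an open trace is kept with that trace, so the group stays associated with the outermost exception type.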

## Timestamp Handling

The scanner attempts to parse:

- `2026-05-11 10:15:30`
- `2026-05-11T10:15:30`
- `2026-05-11 10:15:30,123`
- `2026-05-11 10:15:30.123`
- `May 11 10:15:30`

Timestamp parsing is best-effort. Lines with unparseable timestamps are still analyzed. When `--since` or `--until` is used, lines without parseable timestamps are retained by default so potentially important findings are not silently discarded.
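
A best-effort parser for the formats listed above, plus the "retain unparseable lines" rule, might look like the sketch below. The helper names are hypothetical. Two details are assumptions, not documented behavior: the year applied to syslog-style timestamps (which carry none) and inclusive `--since`/`--until` bounds.

```python
import re
from datetime import datetime

ISO_LIKE = re.compile(r"^(\d{4}-\d{2}-\d{2})[ T](\d{2}:\d{2}:\d{2})(?:[.,](\d{1,6}))?")
SYSLOG_LIKE = re.compile(r"^([A-Z][a-z]{2}) +(\d{1,2}) (\d{2}:\d{2}:\d{2})")


def parse_timestamp(line, default_year=None):
    """Return a datetime parsed from the start of the line, or None."""
    m = ISO_LIKE.match(line)
    if m:
        try:
            ts = datetime.strptime(f"{m.group(1)} {m.group(2)}", "%Y-%m-%d %H:%M:%S")
        except ValueError:
            return None  # looks like a timestamp but is not a valid date
        if m.group(3):
            # Pad fractional seconds to microseconds: "123" -> 123000.
            ts = ts.replace(microsecond=int(m.group(3).ljust(6, "0")))
        return ts
    m = SYSLOG_LIKE.match(line)
    if m:
        year = default_year if default_year is not None else datetime.now().year
        try:
            # %b depends on the locale; English month names are assumed here.
            return datetime.strptime(
                f"{year} {m.group(1)} {m.group(2)} {m.group(3)}", "%Y %b %d %H:%M:%S"
            )
        except ValueError:
            return None
    return None


def in_window(line, since=None, until=None):
    """Keep lines inside the window, and all lines without a parseable timestamp."""
    ts = parse_timestamp(line)
    if ts is None:
        return True  # retained so findings are not silently discarded
    if since is not None and ts < since:
        return False
    if until is not None and ts > until:
        return False
    return True
```

Retaining unparseable lines matters for stack traces in particular, because frame lines such as `    at com.example...` rarely carry their own timestamp.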

## Severity Model

Overall status is conservative:

- `OK` - no JVM/application findings.
- `WARNING` - warning-level findings exist but no critical findings exist.
- `CRITICAL` - one or more critical findings exist.

Critical status is driven by JVM memory failures, fatal JVM symptoms, selected class loading errors, TLS/certificate failures, database unavailable or pool exhaustion symptoms, and HTTP 5xx volume at or above the configured threshold.

Warning status is driven by non-fatal exceptions, `WARN`/`ERROR` entries, timeout/retry patterns, connection refused/reset symptoms, slow query findings, and deadlock patterns.

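Although `HTTP 500`, `HTTP 502`, `HTTP 503`, and `HTTP 504` are listed with the critical patterns, HTTP 5xx findings are treated as warnings until their total reaches `--http-critical-threshold`, which defaults to `5`. The report summarizes findings that require review; it does not claim root cause.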
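
The status rules above can be summarized in a short sketch. It assumes per-pattern occurrence counts are already available; `overall_status` and its arguments are hypothetical names, not the tool's API.

```python
HTTP_5XX = {"HTTP 500", "HTTP 502", "HTTP 503", "HTTP 504"}


def overall_status(counts, critical_patterns, http_critical_threshold=5):
    """Derive OK / WARNING / CRITICAL from a mapping of pattern -> occurrences."""
    http_total = sum(n for p, n in counts.items() if p in HTTP_5XX)
    # HTTP 5xx patterns only escalate through the threshold check below.
    non_http_critical = any(
        n > 0
        for p, n in counts.items()
        if p in critical_patterns and p not in HTTP_5XX
    )
    if non_http_critical or http_total >= http_critical_threshold:
        return "CRITICAL"
    if any(n > 0 for n in counts.values()):
        return "WARNING"
    return "OK"


critical = {"OutOfMemoryError", "FATAL"} | HTTP_5XX
print(overall_status({"HTTP 503": 3, "WARN": 2}, critical))
print(overall_status({"HTTP 503": 3, "HTTP 500": 2}, critical))
```

With the default threshold of `5`, three HTTP 5xx entries produce `WARNING`, while five produce `CRITICAL`.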

## Usage

```bash
cd infra-run/scripts/python/jvm-log-analyzer

python3 jvm_log_analyzer.py --file examples/sample-jvm-app.log
python3 jvm_log_analyzer.py --file examples/sample-jvm-app.log --format markdown
python3 jvm_log_analyzer.py --file examples/sample-jvm-app.log --format markdown --output jvm-report.md
python3 jvm_log_analyzer.py --file examples/sample-jvm-app.log --format json
python3 jvm_log_analyzer.py --file examples/sample-jvm-app.log --top 10
python3 jvm_log_analyzer.py --file examples/sample-jvm-app.log --max-samples 5
python3 jvm_log_analyzer.py --file examples/sample-jvm-app.log --include-stacktraces
python3 jvm_log_analyzer.py --file examples/sample-jvm-app.log --since "2026-05-11 10:00:00"
python3 jvm_log_analyzer.py --file examples/sample-jvm-app.log --until "2026-05-11 12:00:00"
python3 jvm_log_analyzer.py --file examples/sample-jvm-app.log --http-critical-threshold 2
```

## Output Formats

- `text` - default terminal-oriented report.
- `markdown` - incident or application support ticket attachment format.
- `json` - structured output for local automation.

Use `--output <path>` to write the rendered report to a separate file. Without `--output`, the report is printed to stdout. The tool rejects an output path that resolves to the input log file.
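
One way to implement the output-path guard is to compare resolved paths, as in this sketch. `check_output_path` is a hypothetical helper and the real check may differ.

```python
from pathlib import Path


def check_output_path(input_path, output_path):
    """Return the resolved output path, refusing to overwrite the input log."""
    src = Path(input_path).resolve()
    dst = Path(output_path).resolve()
    # resolve() normalizes "." and ".." and follows symlinks, so different
    # spellings of the same file compare equal.
    if src == dst:
        raise ValueError("output path must not be the input log file")
    return dst
```

Comparing resolved paths rather than raw strings catches cases such as `./app.log` versus `app.log`, which would otherwise let the report overwrite the evidence.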

## Exit Codes

- `0` - OK, no JVM/application findings.
- `1` - JVM/application findings detected.
- `2` - Invalid input, unreadable file, bad argument, output write failure, or runtime error.
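
Because exit code `1` signals findings rather than failure, calling scripts should not treat every non-zero code as an error. A minimal wrapper sketch, with hypothetical helper names:

```python
import subprocess


def classify_exit_code(returncode):
    """Map the documented exit codes to a short label."""
    if returncode == 0:
        return "ok"
    if returncode == 1:
        return "findings"
    return "error"  # 2 as documented; anything else is also treated as an error


def run_analyzer(argv):
    """Run the analyzer command and return (label, stdout)."""
    # check=False: with check=True, subprocess.run raises CalledProcessError
    # on any non-zero exit code, including 1 ("findings detected").
    completed = subprocess.run(argv, capture_output=True, text=True, check=False)
    return classify_exit_code(completed.returncode), completed.stdout
```

For example, `run_analyzer(["python3", "jvm_log_analyzer.py", "--file", "examples/sample-jvm-app.log"])` would return the label together with the rendered report text.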

## Example Text Output

```text
JVM Log Analyzer
================

Overall status: CRITICAL
Findings require review; logs alone do not prove root cause.

[CRITICAL] OutOfMemoryError
Occurrences: 1
Symptom: jvm_memory
First seen: UNKNOWN
Last seen: UNKNOWN
Stack traces linked: 1
Samples:
  - Exception in thread "main" java.lang.OutOfMemoryError: Java heap space

Operational Summary
-------------------
Overall status: CRITICAL
Total lines scanned: 33
Total findings: 27
Total stack traces detected: 4
Critical finding groups: 11
Warning finding groups: 8
HTTP 5xx count: 3
Parsed timestamps count: 21
Unknown timestamps count: 12
```

## Markdown Workflow

Generate a Markdown report from a collected JVM application log and attach it to the incident or application support ticket as supporting evidence:

```bash
python3 jvm_log_analyzer.py \
  --file examples/sample-jvm-app.log \
  --format markdown \
  --include-stacktraces \
  --output jvm-report.md
```

Review the report before attaching it. A `WARNING` or `CRITICAL` result should be reviewed with application health checks, JVM memory telemetry, database status, certificate state, recent deployments, and the relevant application owner.

## Operational Limitations

- Pattern matching is intentionally simple and predictable.
- A single log line can match multiple findings, such as `ERROR`, `HTTP 503`, and a Java exception.
- Case-sensitive default matching can miss lowercase variants unless `--ignore-case` is used.
- Stack trace grouping is heuristic; it is not a complete Java stack trace parser.
- Timestamp parsing is best-effort; unparseable lines are retained during time filtering.
- HTTP 5xx counts are raw log counts, not request rates or customer impact.
- Large log files are read into memory; collect scoped extracts for very large incidents.

## Safety Notes

- The tool only reads the input log and optionally writes a separate report.
- The implementation uses the Python standard library only and does not require package installation.
- It does not require elevated privileges unless the chosen log path requires them.
- Do not include secrets, customer data, private hostnames, tokens, or unsanitized production details in portfolio examples.
- Treat operational findings as prompts that require review; the tool does not determine root cause automatically.