Mateusz Suski 89b7fabb96 Add JVM log analyzer tool

2026-05-11 17:05:27 +00:00

8.6 KiB

jvm-log-analyzer

jvm-log-analyzer is a read-only Python CLI for reviewing local JVM and Java application logs. It summarizes common Java exceptions, stack trace fragments, JVM failure symptoms, database issues, network/TLS problems, HTTP 5xx entries, and repeated application warning/error patterns that require operator review.

The tool is intended for Linux infrastructure, SRE, and application support workflows where a collected log file needs a quick first-pass operational summary. It does not modify logs or system state.

When To Use

During incident response when a JVM application log needs a fast exception and symptom summary.
During application support handoff when stack traces, HTTP 5xx entries, or database failures need to be attached as evidence.
After a restart, deployment, certificate change, database incident, or capacity event when local log extracts are available.
When predictable text, Markdown, or JSON output is useful for local review.

What It Does

Reads one local JVM or Java application log supplied with --file.
Detects configured critical and warning JVM/application patterns.
Extracts timestamps, log levels, thread names, logger/class names, exception types, raw samples, and short stack trace fragments where practical.
Aggregates top finding groups, exception types, and operational symptoms.
Produces text, Markdown, or JSON output.
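The aggregation step can be pictured with a small counting sketch. This is only an illustration: the finding dictionaries and the "pattern" key are hypothetical, since the tool's internal data model is not described here.

```python
from collections import Counter


def top_groups(findings, top=10):
    """Return the `top` most frequent finding groups as (pattern, count) pairs.

    `findings` is assumed to be a list of dicts with a "pattern" key;
    that shape is a placeholder, not the tool's documented structure.
    """
    counts = Counter(finding["pattern"] for finding in findings)
    return counts.most_common(top)
```

The top argument mirrors the role of the --top option shown under Usage.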

What It Does Not Do

It does not read remote systems or live journal streams.
It does not modify logs, services, application files, JVM flags, certificates, or database state.
It does not query APM, ELK, SIEM, Zabbix, ticketing systems, or application APIs.
It does not determine root cause automatically.
It does not prove an application defect.
It does not classify every vendor-specific Java framework or application message.

Supported Input Types

Java / JVM application logs.
Spring Boot style logs.
Tomcat-style application logs.
Generic application logs containing Java exceptions and stack traces.

UTF-8 text input is expected. Invalid byte sequences are replaced during read so review can continue. Empty, missing, unreadable, or non-file paths are rejected with exit code 2.
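A minimal sketch of this kind of tolerant read, assuming Python's standard errors="replace" decoding. The function name read_log_lines and the InputError class are hypothetical, and the real tool may validate input differently.

```python
from pathlib import Path


class InputError(Exception):
    """Hypothetical error type; a CLI would map it to exit code 2."""


def read_log_lines(path_str: str) -> list:
    path = Path(path_str)
    # Missing paths, directories, and other non-file paths are rejected.
    if not path.is_file():
        raise InputError(f"not a regular file: {path}")
    try:
        # errors="replace" turns invalid byte sequences into U+FFFD
        # instead of raising UnicodeDecodeError, so review can continue.
        text = path.read_text(encoding="utf-8", errors="replace")
    except OSError as exc:
        raise InputError(f"cannot read {path}: {exc}") from exc
    # Assumption: "empty" means a zero-length file.
    if not text:
        raise InputError(f"input file is empty: {path}")
    return text.splitlines()
```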

Supported JVM/Application Patterns

Critical patterns:

OutOfMemoryError
Java heap space
GC overhead limit exceeded
StackOverflowError
NoClassDefFoundError
ClassNotFoundException
ExceptionInInitializerError
SSLHandshakeException
CertificateExpiredException
SQLException
SQLRecoverableException
CommunicationsException
database unavailable
connection pool exhausted
HTTP 500
HTTP 502
HTTP 503
HTTP 504
FATAL

Warning patterns:

NullPointerException
IllegalArgumentException
IllegalStateException
SocketTimeoutException
ConnectException
TimeoutException
connection refused
connection reset
Broken pipe
WARN
ERROR
retrying
slow query
deadlock detected

By default matching is case-sensitive. Use --ignore-case for case-insensitive matching across configured patterns.
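The effect of --ignore-case can be illustrated with a short sketch. It assumes literal substring-style matching compiled through the re module; whether the tool uses regular expressions or plain substring checks internally is not stated here, and the pattern lists below are shortened excerpts of the lists above.

```python
import re

# Shortened excerpts of the documented pattern lists.
CRITICAL_PATTERNS = ["OutOfMemoryError", "SSLHandshakeException", "connection pool exhausted"]
WARNING_PATTERNS = ["NullPointerException", "connection refused", "slow query"]


def compile_patterns(patterns, ignore_case=False):
    """Compile each pattern as a literal; ignore_case mirrors --ignore-case."""
    flags = re.IGNORECASE if ignore_case else 0
    return [(p, re.compile(re.escape(p), flags)) for p in patterns]


def match_line(line, compiled):
    """Return every pattern name found in the line (a line may match several)."""
    return [name for name, regex in compiled if regex.search(line)]
```

With the default case-sensitive matching, a line containing "Connection Refused" does not match the "connection refused" pattern; with ignore_case=True it does.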

Stack Trace Handling

The scanner detects multiline Java stack traces heuristically, starting from common opening lines such as:

Fully qualified Java exception lines, such as java.lang.NullPointerException.
Lines containing Exception in thread "main".
Lines starting with Caused by:
Application exceptions whose class name ends in Exception or Error.

Subsequent lines are grouped into the same trace when they look like Java stack frames:

Lines starting with whitespace followed by "at " (the word at and a trailing space).
Lines starting with Caused by:
Lines containing "... N more", where N is a number.

Stack traces are associated with the detected exception type where possible. Text and Markdown output include only short sample lines by default. Use --include-stacktraces to include capped multiline stack trace fragments.
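The grouping rules above could be sketched roughly as follows. The regular expressions are assumptions chosen to match the documented line shapes, not the tool's actual expressions, and the choice to keep only traces with at least one continuation line is also an assumption.

```python
import re

# Assumed approximations of the documented trace starts.
TRACE_START = re.compile(
    r'Exception in thread "[^"]*"'
    r"|^\s*Caused by:"
    r"|\b(?:[A-Za-z_$][\w$]*\.)*[A-Z][\w$]*(?:Exception|Error)\b"
)
# Assumed approximations of the documented continuation lines.
CONTINUATION = re.compile(r"^\s+at\s|^\s*Caused by:|\.\.\. \d+ more")
EXC_TYPE = re.compile(r"\b((?:[A-Za-z_$][\w$]*\.)*[A-Z][\w$]*(?:Exception|Error))\b")


def group_stack_traces(lines):
    """Group a trace start and its following frame-like lines."""
    traces = []
    current = None
    for line in lines:
        if current is not None and CONTINUATION.search(line):
            current["lines"].append(line)
            continue
        # The line does not continue the open trace: close it.
        if current is not None:
            if len(current["lines"]) > 1:  # keep only real multiline traces
                traces.append(current)
            current = None
        if TRACE_START.search(line):
            match = EXC_TYPE.search(line)
            current = {
                "exception": match.group(1) if match else "UNKNOWN",
                "lines": [line],
            }
    if current is not None and len(current["lines"]) > 1:
        traces.append(current)
    return traces
```

In this sketch the trace is associated with the first exception type on its opening line; associating it with the innermost Caused by: type would be an equally plausible design.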

Timestamp Handling

The scanner attempts to parse:

2026-05-11 10:15:30
2026-05-11T10:15:30
2026-05-11 10:15:30,123
2026-05-11 10:15:30.123
May 11 10:15:30

Timestamp parsing is best-effort. Lines with unparseable timestamps are still analyzed. When --since or --until is used, lines without parseable timestamps are retained by default so potentially important findings are not silently discarded.
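A best-effort parser for the formats listed above, together with the retain-when-unknown filtering rule, might look like the sketch below. The function names are hypothetical. The syslog-style format carries no year, so this sketch takes a default_year argument; how the tool itself resolves the year is not documented here.

```python
import re
from datetime import datetime

ISO_RE = re.compile(r"(\d{4}-\d{2}-\d{2})[ T](\d{2}:\d{2}:\d{2})(?:[.,](\d{1,6}))?")
SYSLOG_RE = re.compile(r"\b([A-Z][a-z]{2}) +(\d{1,2}) (\d{2}:\d{2}:\d{2})\b")


def parse_timestamp(line, default_year):
    """Return a datetime for the first recognizable timestamp, else None."""
    match = ISO_RE.search(line)
    if match:
        date, clock, frac = match.groups()
        try:
            ts = datetime.strptime(f"{date} {clock}", "%Y-%m-%d %H:%M:%S")
        except ValueError:
            return None
        if frac:
            # Pad fractional seconds (e.g. "123") to microseconds.
            ts = ts.replace(microsecond=int(frac.ljust(6, "0")))
        return ts
    match = SYSLOG_RE.search(line)
    if match:
        month, day, clock = match.groups()
        try:
            # %b expects English month abbreviations in the default C locale.
            return datetime.strptime(
                f"{default_year} {month} {day} {clock}", "%Y %b %d %H:%M:%S"
            )
        except ValueError:
            return None
    return None


def in_window(ts, since=None, until=None):
    """Lines without a parseable timestamp (ts is None) are retained."""
    if ts is None:
        return True
    if since is not None and ts < since:
        return False
    if until is not None and ts > until:
        return False
    return True
```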

Severity Model

Overall status is conservative and reflects the most severe finding level present:

OK - no JVM/application findings.
WARNING - warning-level findings exist but no critical findings exist.
CRITICAL - one or more critical findings exist.

Critical status is driven by JVM memory failures, fatal JVM symptoms, selected class loading errors, TLS/certificate failures, database unavailable or pool exhaustion symptoms, and HTTP 5xx volume at or above the configured threshold.

Warning status is driven by non-fatal exceptions, WARN/ERROR entries, timeout/retry patterns, connection refused/reset symptoms, slow query findings, and deadlock patterns.

HTTP 5xx entries appear in the critical pattern list, but they are reported as warnings until their total reaches --http-critical-threshold, which defaults to 5. The report summarizes findings that require review; it does not claim root cause.
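The status rules can be condensed into a small decision function. This is a sketch under stated assumptions: the function name and arguments are hypothetical, and critical_count is assumed to exclude HTTP 5xx findings, which are counted separately.

```python
def overall_status(critical_count, warning_count, http_5xx_count,
                   http_critical_threshold=5):
    """Derive OK / WARNING / CRITICAL from finding counts.

    critical_count is assumed to cover non-HTTP critical findings only;
    HTTP 5xx entries escalate to CRITICAL once they reach the threshold.
    """
    http_is_critical = http_5xx_count > 0 and http_5xx_count >= http_critical_threshold
    if critical_count > 0 or http_is_critical:
        return "CRITICAL"
    if warning_count > 0 or http_5xx_count > 0:
        return "WARNING"
    return "OK"
```

For example, four HTTP 5xx entries with no other findings yield WARNING under the default threshold of 5, while the same log run with a threshold of 2 yields CRITICAL.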

Usage

cd infra-run/scripts/python/jvm-log-analyzer

python3 jvm_log_analyzer.py --file examples/sample-jvm-app.log
python3 jvm_log_analyzer.py --file examples/sample-jvm-app.log --format markdown
python3 jvm_log_analyzer.py --file examples/sample-jvm-app.log --format markdown --output jvm-report.md
python3 jvm_log_analyzer.py --file examples/sample-jvm-app.log --format json
python3 jvm_log_analyzer.py --file examples/sample-jvm-app.log --top 10
python3 jvm_log_analyzer.py --file examples/sample-jvm-app.log --max-samples 5
python3 jvm_log_analyzer.py --file examples/sample-jvm-app.log --include-stacktraces
python3 jvm_log_analyzer.py --file examples/sample-jvm-app.log --since "2026-05-11 10:00:00"
python3 jvm_log_analyzer.py --file examples/sample-jvm-app.log --until "2026-05-11 12:00:00"
python3 jvm_log_analyzer.py --file examples/sample-jvm-app.log --http-critical-threshold 2

Output Formats

text - default terminal-oriented report.
markdown - incident or application support ticket attachment format.
json - structured output for local automation.

Use --output <path> to write the rendered report to a separate file. Without --output, the report is printed to stdout. The tool rejects an output path that resolves to the input log file.
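One way to implement the input/output collision check is to compare resolved paths, as in the sketch below. The function name is hypothetical and the tool's actual check may differ; the samefile comparison is an extra assumption that also catches hard links.

```python
from pathlib import Path


def output_collides_with_input(input_path, output_path):
    """Return True when the output path resolves to the input log file."""
    src = Path(input_path).resolve()
    dst = Path(output_path).resolve()  # non-strict: the report may not exist yet
    if src == dst:
        return True
    try:
        # samefile also detects hard links pointing at the same inode.
        return dst.exists() and src.samefile(dst)
    except OSError:
        return False
```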

Exit Codes

0 - OK, no JVM/application findings.
1 - JVM/application findings detected.
2 - Invalid input, unreadable file, bad argument, output write failure, or runtime error.
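A wrapper script can branch on these exit codes. The sketch below uses only the flags shown under Usage; the helper names are hypothetical. Note that check=True is deliberately not passed to subprocess.run, because exit code 1 signals findings rather than a failure.

```python
import subprocess


def describe_exit_code(code):
    """Map the documented exit codes to short labels."""
    if code == 0:
        return "ok"
    if code == 1:
        return "findings"
    # 2 is the documented error code; anything else is treated as an error too.
    return "error"


def run_analyzer(log_path, script="jvm_log_analyzer.py"):
    """Run the analyzer on one log file and return (label, stdout)."""
    proc = subprocess.run(
        ["python3", script, "--file", log_path, "--format", "json"],
        capture_output=True,
        text=True,
    )
    return describe_exit_code(proc.returncode), proc.stdout
```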

Example Text Output

JVM Log Analyzer
================

Overall status: CRITICAL
Findings require review; logs alone do not prove root cause.

[CRITICAL] OutOfMemoryError
Occurrences: 1
Symptom: jvm_memory
First seen: UNKNOWN
Last seen: UNKNOWN
Stack traces linked: 1
Samples:
  - Exception in thread "main" java.lang.OutOfMemoryError: Java heap space

Operational Summary
-------------------
Overall status: CRITICAL
Total lines scanned: 33
Total findings: 27
Total stack traces detected: 4
Critical finding groups: 11
Warning finding groups: 8
HTTP 5xx count: 3
Parsed timestamps count: 21
Unknown timestamps count: 12

Markdown Workflow

Generate a Markdown report from a collected JVM application log and attach it to the incident or application support ticket as supporting evidence:

python3 jvm_log_analyzer.py \
  --file examples/sample-jvm-app.log \
  --format markdown \
  --include-stacktraces \
  --output jvm-report.md

Review the report before attaching it. A WARNING or CRITICAL result should be reviewed with application health checks, JVM memory telemetry, database status, certificate state, recent deployments, and the relevant application owner.

Operational Limitations

Pattern matching is intentionally simple and predictable.
A single log line can match multiple findings, such as ERROR, HTTP 503, and a Java exception.
Case-sensitive default matching can miss lowercase variants unless --ignore-case is used.
Stack trace grouping is heuristic; it is not a complete Java stack trace parser.
Timestamp parsing is best-effort; unparseable lines are retained during time filtering.
HTTP 5xx counts are raw log counts, not request rates or customer impact.
Large log files are read into memory; collect scoped extracts for very large incidents.

Safety Notes

The tool only reads the input log and optionally writes a separate report.
It does not require elevated privileges unless the chosen log path requires them.
Do not include secrets, customer data, private hostnames, tokens, or unsanitized production details in portfolio examples.
Treat findings as prompts for operator review, not automated remediation instructions.