Add known error matcher tool

Mateusz Suski
2026-05-11 17:06:46 +00:00
parent 5fc96348c5
commit 1636f46f81
6 changed files with 1096 additions and 0 deletions
@@ -0,0 +1,9 @@
2026-05-11 10:15:30 app01 checkout-api[1842]: INFO request_id=a1 path=/checkout status=200 duration_ms=42
2026-05-11 10:16:02 app01 checkout-api[1842]: ERROR HTTP 500 request_id=a2 path=/checkout customer_id=redacted
2026-05-11 10:16:07 app01 checkout-api[1842]: ERROR database unavailable while opening checkout connection pool
2026-05-11 10:16:11 app01 checkout-api[1842]: WARN upstream inventory-api connection refused at 10.20.30.40:8443
2026-05-11 10:16:15,123 app01 checkout-api[1842]: WARN payment provider request timed out after 5000 ms
2026-05-11T10:16:22 app01 checkout-api[1842]: ERROR javax.net.ssl.SSLHandshakeException: PKIX path building failed
2026-05-11 10:16:31.456 app01 nginx[907]: 198.51.100.25 - - "GET /checkout HTTP/1.1" 503 312 "-" "synthetic-check"
2026-05-11 10:16:40 app01 checkout-api[1842]: FATAL java.lang.OutOfMemoryError: Java heap space
2026-05-11 10:17:03 app01 checkout-api[1842]: INFO healthcheck completed status=degraded
@@ -0,0 +1,97 @@
# Known Error Matcher Report
- Overall status: `CRITICAL`
- Known error pattern matches require operator review; logs alone do not prove root cause.
## Matched Known Errors
### [CRITICAL] database_unavailable - Database unavailable
- Category: `application`
- Occurrences: `1`
- First seen: `2026-05-11 10:16:07`
- Last seen: `2026-05-11 10:16:07`
- Runbook: `infra-run/scripts/python/jvm-log-analyzer/README.md`
- Description: Application logged unavailable database or database connectivity symptoms.
- Samples:
- `2026-05-11 10:16:07 app01 checkout-api[1842]: ERROR database unavailable while opening checkout connection pool`
### [CRITICAL] http_500 - HTTP 500
- Category: `application`
- Occurrences: `1`
- First seen: `2026-05-11 10:16:02`
- Last seen: `2026-05-11 10:16:02`
- Runbook: `infra-run/runbooks/incidents/http-5xx.md`
- Description: Application or proxy logged HTTP 500 responses.
- Samples:
- `2026-05-11 10:16:02 app01 checkout-api[1842]: ERROR HTTP 500 request_id=a2 path=/checkout customer_id=redacted`
### [CRITICAL] http_503 - HTTP 503
- Category: `application`
- Occurrences: `1`
- First seen: `2026-05-11 10:16:31.456`
- Last seen: `2026-05-11 10:16:31.456`
- Runbook: `infra-run/runbooks/incidents/http-5xx.md`
- Description: Application or proxy logged HTTP 503 service unavailable responses.
- Samples:
- `2026-05-11 10:16:31.456 app01 nginx[907]: 198.51.100.25 - - "GET /checkout HTTP/1.1" 503 312 "-" "synthetic-check"`
### [CRITICAL] java_out_of_memory - Java OutOfMemoryError
- Category: `application_jvm`
- Occurrences: `1`
- First seen: `2026-05-11 10:16:40`
- Last seen: `2026-05-11 10:16:40`
- Runbook: `infra-run/scripts/python/jvm-log-analyzer/README.md`
- Description: Java process logged memory exhaustion symptoms.
- Samples:
- `2026-05-11 10:16:40 app01 checkout-api[1842]: FATAL java.lang.OutOfMemoryError: Java heap space`
### [CRITICAL] ssl_handshake_exception - SSLHandshakeException
- Category: `application_jvm`
- Occurrences: `1`
- First seen: `2026-05-11 10:16:22`
- Last seen: `2026-05-11 10:16:22`
- Runbook: `infra-run/scripts/python/jvm-log-analyzer/README.md`
- Description: Java TLS handshake exception was logged.
- Samples:
- `2026-05-11T10:16:22 app01 checkout-api[1842]: ERROR javax.net.ssl.SSLHandshakeException: PKIX path building failed`
### [WARNING] connection_refused - Connection refused
- Category: `network`
- Occurrences: `1`
- First seen: `2026-05-11 10:16:11`
- Last seen: `2026-05-11 10:16:11`
- Runbook: `infra-run/scripts/bash/os-healthcheck/README.md`
- Description: Client connection attempts were refused by the destination service or host.
- Samples:
- `2026-05-11 10:16:11 app01 checkout-api[1842]: WARN upstream inventory-api connection refused at 10.20.30.40:8443`
### [WARNING] timeout - Timeout
- Category: `network`
- Occurrences: `1`
- First seen: `2026-05-11 10:16:15,123`
- Last seen: `2026-05-11 10:16:15,123`
- Runbook: `infra-run/scripts/bash/os-healthcheck/README.md`
- Description: Operation timed out and may require network, service, or dependency review.
- Samples:
- `2026-05-11 10:16:15,123 app01 checkout-api[1842]: WARN payment provider request timed out after 5000 ms`
## Operational Summary
- Overall status: `CRITICAL`
- Total lines scanned: `9`
- Known error matches: `7`
- Matched known error patterns: `7`
- Critical matched patterns: `5`
- Warning matched patterns: `2`
- Top categories: application (3), network (2), application_jvm (2)
- Top matched known errors: database_unavailable (1), http_500 (1), http_503 (1), java_out_of_memory (1), ssl_handshake_exception (1), connection_refused (1), timeout (1)
- Timestamp coverage: parsed=`9`, unknown=`0`
- Filters used: severity=`None`, category=`None`
- Pattern catalog path: `patterns.json`
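
The summary counts above can be reproduced with a small sketch like the following. This is not the tool's actual implementation, which is not shown in this commit. The catalog entries, their field names (`id`, `severity`, `category`, `regex`), the regexes themselves, and the `OK` baseline status are all assumptions made for illustration; the real `patterns.json` schema may differ.

```python
import re
from collections import Counter

# Hypothetical catalog entries. The real patterns.json schema is not shown,
# so these field names and regexes are assumptions for illustration only.
CATALOG = [
    {"id": "http_500", "severity": "CRITICAL", "category": "application",
     "regex": r"\bHTTP 500\b"},
    {"id": "java_out_of_memory", "severity": "CRITICAL", "category": "application_jvm",
     "regex": r"OutOfMemoryError"},
    {"id": "timeout", "severity": "WARNING", "category": "network",
     "regex": r"timed out"},
]

# "OK" as the no-match baseline is an assumption; the report only shows CRITICAL and WARNING.
SEVERITY_RANK = {"OK": 0, "WARNING": 1, "CRITICAL": 2}


def summarize(lines, catalog=CATALOG):
    """Match each line against every catalog regex and aggregate the counts."""
    compiled = [(entry, re.compile(entry["regex"], re.IGNORECASE)) for entry in catalog]
    hits = Counter()        # occurrences per pattern id
    categories = Counter()  # occurrences per category
    overall = "OK"
    scanned = 0
    for line in lines:
        scanned += 1
        for entry, pattern in compiled:
            if pattern.search(line):
                hits[entry["id"]] += 1
                categories[entry["category"]] += 1
                # Overall status is the highest severity among matched patterns.
                if SEVERITY_RANK[entry["severity"]] > SEVERITY_RANK[overall]:
                    overall = entry["severity"]
    return {
        "overall_status": overall,
        "lines_scanned": scanned,
        "matches": sum(hits.values()),
        "matched_patterns": len(hits),
        "hits": dict(hits),
        "categories": dict(categories),
    }
```

In this sketch, "Known error matches" corresponds to the total number of hits and "Matched known error patterns" to the number of distinct pattern ids with at least one hit, which is why the two values coincide when every pattern matches exactly once.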
@@ -0,0 +1,10 @@
May 11 10:15:30 web01 kernel: EXT4-fs warning: No space left on device while writing /var/log/messages
May 11 10:15:35 web01 kernel: EXT4-fs error (device dm-0): Remounting filesystem read-only
May 11 10:15:41 web01 kernel: nginx invoked oom-killer: gfp_mask=0x140cca(GFP_HIGHUSER_MOVABLE)
May 11 10:15:42 web01 kernel: Out of memory: Killed process 2281 (java) total-vm:2097152kB
May 11 10:16:11 web01 systemd[1]: Failed to start nginx.service - A high performance web server and a reverse proxy server.
May 11 10:16:12 web01 systemd[1]: Dependency failed for webapp.service - Local web application.
May 11 10:16:13 web01 systemd[1]: nginx.service: Start request repeated too quickly.
May 11 10:16:25 web01 sshd[3371]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=203.0.113.50 user=deploy
May 11 10:16:31 web01 sudo: deploy : command not allowed ; TTY=pts/0 ; PWD=/srv/app ; USER=root ; COMMAND=/bin/systemctl restart webapp
May 11 10:16:32 web01 sudo: deploy : permission denied while opening /etc/sudoers.d/webapp
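
The two sample logs use several timestamp layouts: `YYYY-MM-DD HH:MM:SS` with an optional `,mmm` or `.mmm` fraction, the `T`-separated variant, and the year-less syslog form `May 11 10:15:30`. A minimal sketch of a parser that accepts all of them might look like the following. It is an illustration only, not the matcher's own code; the function name `parse_timestamp` and the `default_year` parameter are hypothetical, and the month-name parsing assumes an English (C) locale.

```python
import re
from datetime import datetime

# ISO-like prefix: date, space or "T", time, optional fraction after "," or ".".
ISO_LIKE = re.compile(r"^(\d{4}-\d{2}-\d{2})[ T](\d{2}:\d{2}:\d{2})(?:[.,](\d{1,6}))?")
# Classic syslog prefix: abbreviated month, day, time. It carries no year.
SYSLOG = re.compile(r"^([A-Z][a-z]{2}) +(\d{1,2}) (\d{2}:\d{2}:\d{2})")


def parse_timestamp(line, default_year=None):
    """Return a datetime for the leading timestamp of a log line, or None."""
    match = ISO_LIKE.match(line)
    if match:
        date_part, clock, fraction = match.groups()
        try:
            parsed = datetime.strptime(f"{date_part} {clock}", "%Y-%m-%d %H:%M:%S")
        except ValueError:
            return None
        if fraction:
            # Right-pad so "123" means 123 milliseconds, i.e. 123000 microseconds.
            parsed = parsed.replace(microsecond=int(fraction.ljust(6, "0")))
        return parsed

    match = SYSLOG.match(line)
    if match:
        month, day, clock = match.groups()
        # Syslog lines omit the year; falling back to the current year is an assumption.
        year = default_year if default_year is not None else datetime.now().year
        try:
            return datetime.strptime(f"{year} {month} {day} {clock}", "%Y %b %d %H:%M:%S")
        except ValueError:
            return None
    return None
```

Lines for which this returns `None` would be the ones a report counts as having an unknown timestamp.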