Add known error matcher tool
@@ -0,0 +1,9 @@
2026-05-11 10:15:30 app01 checkout-api[1842]: INFO request_id=a1 path=/checkout status=200 duration_ms=42
2026-05-11 10:16:02 app01 checkout-api[1842]: ERROR HTTP 500 request_id=a2 path=/checkout customer_id=redacted
2026-05-11 10:16:07 app01 checkout-api[1842]: ERROR database unavailable while opening checkout connection pool
2026-05-11 10:16:11 app01 checkout-api[1842]: WARN upstream inventory-api connection refused at 10.20.30.40:8443
2026-05-11 10:16:15,123 app01 checkout-api[1842]: WARN payment provider request timed out after 5000 ms
2026-05-11T10:16:22 app01 checkout-api[1842]: ERROR javax.net.ssl.SSLHandshakeException: PKIX path building failed
2026-05-11 10:16:31.456 app01 nginx[907]: 198.51.100.25 - - "GET /checkout HTTP/1.1" 503 312 "-" "synthetic-check"
2026-05-11 10:16:40 app01 checkout-api[1842]: FATAL java.lang.OutOfMemoryError: Java heap space
2026-05-11 10:17:03 app01 checkout-api[1842]: INFO healthcheck completed status=degraded
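The fixture above mixes four timestamp spellings: a plain space-separated form, a `T` separator, and fractional seconds written with either `,` or `.`. The following is a minimal sketch of one way to normalize them, not the tool's actual parser; the names `TIMESTAMP_RE` and `parse_timestamp` are hypothetical and chosen only for illustration.

```python
import re
from datetime import datetime
from typing import Optional

# Hypothetical pattern: ISO-like date, a space or "T", a time, and an
# optional fractional part introduced by "," or ".".
TIMESTAMP_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2})[ T](\d{2}:\d{2}:\d{2})(?:[.,](\d{1,6}))?"
)


def parse_timestamp(line: str) -> Optional[datetime]:
    """Return the leading timestamp of a log line, or None if unrecognized."""
    match = TIMESTAMP_RE.match(line)
    if match is None:
        return None
    date_part, time_part, fraction = match.groups()
    try:
        parsed = datetime.strptime(f"{date_part} {time_part}", "%Y-%m-%d %H:%M:%S")
    except ValueError:
        # Digits matched the shape but not a real date/time (e.g. month 13).
        return None
    if fraction:
        # Right-pad the fraction so "123" becomes 123000 microseconds.
        parsed = parsed.replace(microsecond=int(fraction.ljust(6, "0")))
    return parsed
```

A line whose prefix does not match would return `None`, which a report could count as an unknown timestamp.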
@@ -0,0 +1,97 @@
# Known Error Matcher Report

- Overall status: `CRITICAL`
- Known error pattern matches require operator review; logs alone do not prove root cause.

## Matched Known Errors

### [CRITICAL] database_unavailable - Database unavailable

- Category: `application`
- Occurrences: `1`
- First seen: `2026-05-11 10:16:07`
- Last seen: `2026-05-11 10:16:07`
- Runbook: `infra-run/scripts/python/jvm-log-analyzer/README.md`
- Description: Application logged unavailable database or database connectivity symptoms.
- Samples:
  - `2026-05-11 10:16:07 app01 checkout-api[1842]: ERROR database unavailable while opening checkout connection pool`

### [CRITICAL] http_500 - HTTP 500

- Category: `application`
- Occurrences: `1`
- First seen: `2026-05-11 10:16:02`
- Last seen: `2026-05-11 10:16:02`
- Runbook: `infra-run/runbooks/incidents/http-5xx.md`
- Description: Application or proxy logged HTTP 500 responses.
- Samples:
  - `2026-05-11 10:16:02 app01 checkout-api[1842]: ERROR HTTP 500 request_id=a2 path=/checkout customer_id=redacted`

### [CRITICAL] http_503 - HTTP 503

- Category: `application`
- Occurrences: `1`
- First seen: `2026-05-11 10:16:31.456`
- Last seen: `2026-05-11 10:16:31.456`
- Runbook: `infra-run/runbooks/incidents/http-5xx.md`
- Description: Application or proxy logged HTTP 503 service unavailable responses.
- Samples:
  - `2026-05-11 10:16:31.456 app01 nginx[907]: 198.51.100.25 - - "GET /checkout HTTP/1.1" 503 312 "-" "synthetic-check"`

### [CRITICAL] java_out_of_memory - Java OutOfMemoryError

- Category: `application_jvm`
- Occurrences: `1`
- First seen: `2026-05-11 10:16:40`
- Last seen: `2026-05-11 10:16:40`
- Runbook: `infra-run/scripts/python/jvm-log-analyzer/README.md`
- Description: Java process logged memory exhaustion symptoms.
- Samples:
  - `2026-05-11 10:16:40 app01 checkout-api[1842]: FATAL java.lang.OutOfMemoryError: Java heap space`

### [CRITICAL] ssl_handshake_exception - SSLHandshakeException

- Category: `application_jvm`
- Occurrences: `1`
- First seen: `2026-05-11 10:16:22`
- Last seen: `2026-05-11 10:16:22`
- Runbook: `infra-run/scripts/python/jvm-log-analyzer/README.md`
- Description: Java TLS handshake exception was logged.
- Samples:
  - `2026-05-11T10:16:22 app01 checkout-api[1842]: ERROR javax.net.ssl.SSLHandshakeException: PKIX path building failed`

### [WARNING] connection_refused - Connection refused

- Category: `network`
- Occurrences: `1`
- First seen: `2026-05-11 10:16:11`
- Last seen: `2026-05-11 10:16:11`
- Runbook: `infra-run/scripts/bash/os-healthcheck/README.md`
- Description: Client connection attempts were refused by the destination service or host.
- Samples:
  - `2026-05-11 10:16:11 app01 checkout-api[1842]: WARN upstream inventory-api connection refused at 10.20.30.40:8443`

### [WARNING] timeout - Timeout

- Category: `network`
- Occurrences: `1`
- First seen: `2026-05-11 10:16:15,123`
- Last seen: `2026-05-11 10:16:15,123`
- Runbook: `infra-run/scripts/bash/os-healthcheck/README.md`
- Description: Operation timed out and may require network, service, or dependency review.
- Samples:
  - `2026-05-11 10:16:15,123 app01 checkout-api[1842]: WARN payment provider request timed out after 5000 ms`

## Operational Summary

- Overall status: `CRITICAL`
- Total lines scanned: `9`
- Known error matches: `7`
- Matched known error patterns: `7`
- Critical matched patterns: `5`
- Warning matched patterns: `2`
- Top categories: application (3), network (2), application_jvm (2)
- Top matched known errors: database_unavailable (1), http_500 (1), http_503 (1), java_out_of_memory (1), ssl_handshake_exception (1), connection_refused (1), timeout (1)
- Timestamp coverage: parsed=`9`, unknown=`0`
- Filters used: severity=`None`, category=`None`
- Pattern catalog path: `patterns.json`
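The report implies a simple flow: load a pattern catalog, test each log line against every pattern, count occurrences per pattern, and derive the overall status from the most severe match. The sketch below illustrates that flow under stated assumptions. The catalog schema (`id`, `severity`, `category`, `regex`), the regexes themselves, the `OK` fallback status, and the names `scan` and `overall_status` are all hypothetical; the real `patterns.json` layout is not shown in this commit.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, Iterable, List

# Assumed severity ordering; "OK" is a hypothetical status for "no matches".
SEVERITY_RANK = {"OK": 0, "WARNING": 1, "CRITICAL": 2}

# Hypothetical catalog entries; the real patterns.json schema may differ.
CATALOG = [
    {"id": "database_unavailable", "severity": "CRITICAL",
     "category": "application", "regex": r"database unavailable"},
    {"id": "connection_refused", "severity": "WARNING",
     "category": "network", "regex": r"connection refused"},
    {"id": "timeout", "severity": "WARNING",
     "category": "network", "regex": r"timed out"},
]


@dataclass
class MatchSummary:
    severity: str
    category: str
    occurrences: int = 0
    samples: List[str] = field(default_factory=list)


def scan(lines: Iterable[str], catalog=CATALOG, max_samples: int = 3) -> Dict[str, MatchSummary]:
    """Count matches per catalog entry and keep a few sample lines."""
    compiled = [(entry, re.compile(entry["regex"], re.IGNORECASE)) for entry in catalog]
    results: Dict[str, MatchSummary] = {}
    for line in lines:
        for entry, pattern in compiled:
            if pattern.search(line):
                summary = results.setdefault(
                    entry["id"], MatchSummary(entry["severity"], entry["category"])
                )
                summary.occurrences += 1
                if len(summary.samples) < max_samples:
                    summary.samples.append(line)
    return results


def overall_status(results: Dict[str, MatchSummary]) -> str:
    """Return the highest severity among matched patterns, or "OK" if none."""
    return max(
        (summary.severity for summary in results.values()),
        key=SEVERITY_RANK.__getitem__,
        default="OK",
    )
```

With this shape, a single line can match more than one pattern, so the number of matches may differ from the number of lines scanned.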
@@ -0,0 +1,10 @@
May 11 10:15:30 web01 kernel: EXT4-fs warning: No space left on device while writing /var/log/messages
May 11 10:15:35 web01 kernel: EXT4-fs error (device dm-0): Remounting filesystem read-only
May 11 10:15:41 web01 kernel: nginx invoked oom-killer: gfp_mask=0x140cca(GFP_HIGHUSER_MOVABLE)
May 11 10:15:42 web01 kernel: Out of memory: Killed process 2281 (java) total-vm:2097152kB
May 11 10:16:11 web01 systemd[1]: Failed to start nginx.service - A high performance web server and a reverse proxy server.
May 11 10:16:12 web01 systemd[1]: Dependency failed for webapp.service - Local web application.
May 11 10:16:13 web01 systemd[1]: nginx.service: Start request repeated too quickly.
May 11 10:16:25 web01 sshd[3371]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=203.0.113.50 user=deploy
May 11 10:16:31 web01 sudo: deploy : command not allowed ; TTY=pts/0 ; PWD=/srv/app ; USER=root ; COMMAND=/bin/systemctl restart webapp
May 11 10:16:32 web01 sudo: deploy : permission denied while opening /etc/sudoers.d/webapp
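The syslog-style fixture above uses timestamps such as `May 11 10:15:30`, which carry no year. A minimal sketch of one way to handle that follows, assuming the caller supplies the year; the function name `parse_syslog_timestamp` is hypothetical, and this commit does not show how the tool itself treats these lines. Note that `%b` depends on the process locale, so this assumes English month abbreviations.

```python
from datetime import datetime
from typing import Optional


def parse_syslog_timestamp(line: str, year: int) -> Optional[datetime]:
    """Parse a leading "Mon DD HH:MM:SS" timestamp using a caller-supplied year."""
    # The classic syslog timestamp occupies the first 15 characters of the line.
    prefix = line[:15]
    try:
        # Prepending the year avoids parsing a day of month without a year.
        return datetime.strptime(f"{year} {prefix}", "%Y %b %d %H:%M:%S")
    except ValueError:
        return None
```

Choosing the year is a policy decision, for example the year of the file's modification time, and it matters for logs that span a year boundary.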