Add known error matcher tool

2026-05-11 17:06:46 +00:00
parent 5fc96348c5
commit 1636f46f81
6 changed files with 1096 additions and 0 deletions
@@ -0,0 +1,198 @@
+# known-error-matcher
+
+`known-error-matcher` is a read-only Python CLI for scanning local log files against a JSON catalog of known operational error patterns. It connects matched log symptoms with severity, category, sample lines, and runbook references so an infrastructure engineer can decide what needs review next.
+
+The tool matches known operational error patterns that require review. It does not prove an incident, identify root cause automatically, or replace service-specific runbooks.
+
+## Purpose
+
+- Identify which cataloged operational problems are visible in a collected log.
+- Count how often each known error pattern appears.
+- Surface warning and critical matches conservatively.
+- Point operators toward relevant runbooks or supporting local tools.
+- Produce predictable text, Markdown, or JSON output for incident notes.
+
+## When To Use
+
+- During incident response when a collected application, system, or journal extract needs quick known-error matching.
+- Before attaching log evidence to an incident, problem, or change ticket.
+- When teams maintain a small local catalog of operational patterns and runbook links.
+- When JSON output is useful for later local automation.
+
+## What It Does Not Do
+
+- It does not read remote systems or live streams.
+- It does not modify logs, services, applications, accounts, or host state.
+- It does not query ELK, SIEM, APM, Zabbix, ticketing systems, or external services.
+- It does not find root cause automatically.
+- It does not prove an incident or confirm customer impact.
+- It does not classify every vendor-specific log message.
+
+## Pattern Catalog Format
+
+Patterns are defined in JSON because the Python standard library can parse JSON without third-party dependencies.
+
+```json
+{
+  "patterns": [
+    {
+      "id": "disk_full",
+      "name": "Disk full",
+      "severity": "CRITICAL",
+      "regex": "No space left on device|disk full",
+      "category": "storage",
+      "runbook": "infra-run/scripts/bash/disk-full/README.md",
+      "description": "Filesystem or application failed because free space was exhausted."
+    }
+  ]
+}
+```
+
+Required fields per pattern:
+
+- `id` - stable non-empty identifier.
+- `name` - human-readable finding name.
+- `severity` - `WARNING` or `CRITICAL`. The value is upper-cased before validation, so `warning` is also accepted.
+- `regex` - Python regular expression used for matching.
+
+Optional fields:
+
+- `category` - operational grouping such as `storage`, `network`, `security`, `application`, or `systemd`. Missing values are reported as `UNKNOWN`.
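+- `runbook` - repository path to review when the pattern matches. Missing or empty values are shown as `None` in text and Markdown reports and as an empty string in JSON output.
+- `description` - short operator-facing explanation. Missing or empty values are shown as `None` in text and Markdown reports and as an empty string in JSON output.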
+
+The catalog is validated before scanning starts. Invalid JSON, missing required fields, duplicate IDs, invalid severity values, and invalid regexes fail with exit code `2`.
+
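The validation rules above can be sketched roughly as follows. This is a simplified illustration rather than the tool's own `load_pattern_catalog`; `validate_entry` is a hypothetical helper name, and it checks a single entry instead of the whole catalog.

```python
import re

VALID_SEVERITIES = {"WARNING", "CRITICAL"}


def validate_entry(entry: dict, seen_ids: set) -> list:
    """Return the problems found in one catalog entry (empty list if valid)."""
    problems = []

    # All four required fields must be non-empty strings.
    for field in ("id", "name", "severity", "regex"):
        value = entry.get(field)
        if not isinstance(value, str) or not value.strip():
            problems.append(f"{field} is required and must be non-empty")

    # IDs must be unique across the catalog.
    pattern_id = entry.get("id")
    if isinstance(pattern_id, str) and pattern_id.strip():
        pattern_id = pattern_id.strip()
        if pattern_id in seen_ids:
            problems.append(f"duplicate id {pattern_id}")
        seen_ids.add(pattern_id)

    # Severity is compared after upper-casing.
    severity = entry.get("severity")
    if isinstance(severity, str) and severity.strip():
        if severity.strip().upper() not in VALID_SEVERITIES:
            problems.append("severity must be WARNING or CRITICAL")

    # The regex must compile with Python's re module.
    regex = entry.get("regex")
    if isinstance(regex, str) and regex.strip():
        try:
            re.compile(regex)
        except re.error as exc:
            problems.append(f"invalid regex: {exc}")

    return problems
```

Collecting every problem before failing, instead of stopping at the first one, lets an operator fix a catalog in a single pass.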
+## Adding A Known Error Pattern
+
+Add a new object under `patterns` in `patterns.json`:
+
+```json
+{
+  "id": "example_dependency_failure",
+  "name": "Example dependency failure",
+  "severity": "WARNING",
+  "regex": "dependency request failed|upstream dependency unavailable",
+  "category": "application",
+  "runbook": "infra-run/runbooks/incidents/dependency-failure.md",
+  "description": "Application logged a dependency failure that requires review."
+}
+```
+
+Use a stable `id`, choose the lowest severity that still reflects operational risk, and keep the regex specific enough to avoid noisy generic matches. Prefer a runbook path that already exists; otherwise use a plausible future path under `infra-run/runbooks/incidents/` or leave it empty.
+
+## Severity Model
+
+Overall status is conservative:
+
+- `OK` - no known error patterns matched.
+- `WARNING` - one or more warning patterns matched and no critical patterns matched.
+- `CRITICAL` - one or more critical patterns matched.
+
+The status means known error patterns require review. It is not a final root-cause statement.
+
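As a minimal sketch of this rule, the overall status can be derived from the severities of the matched patterns. The function name `overall_status` is illustrative only; the tool computes the same result from per-severity counts.

```python
def overall_status(matched_severities) -> str:
    """Derive the conservative overall status from matched pattern severities."""
    severities = set(matched_severities)
    # Any critical match dominates, regardless of how many warnings exist.
    if "CRITICAL" in severities:
        return "CRITICAL"
    if "WARNING" in severities:
        return "WARNING"
    return "OK"
```

For example, `overall_status(["WARNING", "CRITICAL", "WARNING"])` evaluates to `"CRITICAL"`, because a single critical match outranks any number of warnings.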
+## Category Filtering
+
+Use `--category CATEGORY` to include only patterns whose category exactly matches the provided value. The comparison is case-sensitive, and patterns without a category can be selected with `UNKNOWN`.
+
+Examples:
+
+```bash
+python3 known_error_matcher.py --file examples/sample-system.log --patterns patterns.json --category storage
+python3 known_error_matcher.py --file examples/sample-app.log --patterns patterns.json --category application
+```
+
+## Usage
+
+```bash
+cd infra-run/scripts/python/known-error-matcher
+
+python3 known_error_matcher.py --file examples/sample-app.log --patterns patterns.json
+python3 known_error_matcher.py --file examples/sample-system.log --patterns patterns.json
+python3 known_error_matcher.py --file examples/sample-app.log --patterns patterns.json --format markdown
+python3 known_error_matcher.py --file examples/sample-app.log --patterns patterns.json --format markdown --output known-error-report.md
+python3 known_error_matcher.py --file examples/sample-app.log --patterns patterns.json --format json
+python3 known_error_matcher.py --file examples/sample-app.log --patterns patterns.json --ignore-case
+python3 known_error_matcher.py --file examples/sample-app.log --patterns patterns.json --severity critical
+python3 known_error_matcher.py --file examples/sample-app.log --patterns patterns.json --top 10
+python3 known_error_matcher.py --file examples/sample-app.log --patterns patterns.json --max-samples 5
+```
+
+## Output Formats
+
+- `text` - default terminal-oriented report.
+- `markdown` - incident, problem, or change ticket attachment format.
+- `json` - structured output for local automation.
+
+Use `--output <path>` to write the rendered report to a separate file. Without `--output`, the report is printed to stdout. The tool rejects an output path that resolves to the input log file or pattern catalog file.
+
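For local automation, the JSON report can be post-processed with the standard library. The sketch below assumes the report has already been captured as text (for example from `--format json --output report.json`) and relies only on the `findings`, `severity`, and `runbook` keys that the tool emits. `critical_runbooks` is a hypothetical helper, not part of the tool.

```python
from __future__ import annotations

import json


def critical_runbooks(report_text: str) -> list[str]:
    """Return the distinct runbook paths referenced by CRITICAL findings."""
    report = json.loads(report_text)
    runbooks: list[str] = []
    for finding in report.get("findings", []):
        runbook = finding.get("runbook")
        # An empty string means the catalog entry had no runbook.
        if finding.get("severity") == "CRITICAL" and runbook:
            if runbook not in runbooks:
                runbooks.append(runbook)
    return runbooks
```

Because `--top` truncates `findings`, a script like this sees only the displayed findings; raise `--top` if every matched pattern is needed.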
+## Exit Codes
+
+- `0` - OK, no known error matches.
+- `1` - Known error matches detected.
+- `2` - Invalid input, unreadable file, invalid JSON, invalid pattern catalog, invalid regex, bad argument, output write failure, or runtime error.
+
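A wrapper script can translate these exit codes into a short message before deciding what to do next. This is only a sketch based on the table above; `describe_exit_code` and the message wording are assumptions, not part of the tool.

```python
EXIT_MEANINGS = {
    0: "OK: no known error matches",
    1: "REVIEW: known error matches detected",
    2: "ERROR: invalid input, catalog, argument, or runtime failure",
}


def describe_exit_code(code: int) -> str:
    """Map a known-error-matcher exit code to a short operator message."""
    return EXIT_MEANINGS.get(code, f"UNEXPECTED: exit code {code}")
```

Treating exit code `1` as "review needed" rather than as a failure keeps the wrapper aligned with the tool's intent: matches are prompts for operator review, not proof of an incident.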
+## Example Text Output
+
+```text
+Known Error Matcher
+===================
+
+Overall status: CRITICAL
+Known error pattern matches require operator review; logs alone do not prove root cause.
+
+[CRITICAL] database_unavailable - Database unavailable
+Category: application
+Occurrences: 1
+First seen: 2026-05-11 10:16:07
+Last seen: 2026-05-11 10:16:07
+Runbook: infra-run/scripts/python/jvm-log-analyzer/README.md
+Description: Application logged unavailable database or database connectivity symptoms.
+Samples:
+  - 2026-05-11 10:16:07 app01 checkout-api[1842]: ERROR database unavailable while opening checkout connection pool
+
+Operational Summary
+-------------------
+Overall status: CRITICAL
+Total lines scanned: 9
+Known error matches: 7
+Matched known error patterns: 7
+Critical matched patterns: 5
+Warning matched patterns: 2
+Top categories: application (3), network (2), application_jvm (2)
+Top matched known errors: database_unavailable (1), http_500 (1), http_503 (1), java_out_of_memory (1), ssl_handshake_exception (1), connection_refused (1), timeout (1)
+Timestamp coverage: parsed=9, unknown=0
+Filters used: severity=None, category=None
+Pattern catalog path: patterns.json
+```
+
+## Markdown Workflow
+
+Generate a Markdown report from the collected log and attach it to the incident or problem ticket as supporting evidence:
+
+```bash
+python3 known_error_matcher.py \
+  --file examples/sample-app.log \
+  --patterns patterns.json \
+  --format markdown \
+  --output known-error-report.md
+```
+
+Review the report before attaching it. A `WARNING` or `CRITICAL` result should be correlated with service health, monitoring, recent changes, dependency status, and the referenced runbook.
+
+## Operational Limitations
+
+- Pattern matching is intentionally simple and predictable.
+- A single log line can match multiple known error patterns.
+- Matching is case-sensitive by default and can miss differently cased variants unless `--ignore-case` is used.
+- Timestamp parsing is best-effort; unparseable timestamps are reported as `UNKNOWN`. Syslog-style timestamps carry no year, so the current year is assumed.
+- Counts are raw log-line matches, not request rates, incident duration, or customer impact.
+- `--top` limits the displayed findings and the top-category and top-error lists in the summary. The summary counts still reflect all matched patterns after filters.
+- Large log files are read into memory; use scoped extracts for very large incidents.
+
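One way to produce a scoped extract for a very large log is to keep only the lines inside an incident time window before running the matcher. The sketch below is an assumption-laden illustration: `scoped_extract` is a hypothetical helper, it handles only lines that begin with an ISO-style `YYYY-MM-DD HH:MM:SS` timestamp, and it drops lines without one.

```python
import re
from datetime import datetime

# Matches a leading "YYYY-MM-DD HH:MM:SS" or "YYYY-MM-DDTHH:MM:SS" timestamp.
ISO_PREFIX = re.compile(r"^(\d{4}-\d{2}-\d{2})[ T](\d{2}:\d{2}:\d{2})")


def scoped_extract(lines, start: datetime, end: datetime):
    """Yield only the lines whose leading timestamp falls within [start, end]."""
    for line in lines:
        match = ISO_PREFIX.match(line)
        if not match:
            continue  # no leading ISO timestamp: skipped in this sketch
        stamp = datetime.strptime(
            f"{match.group(1)} {match.group(2)}", "%Y-%m-%d %H:%M:%S"
        )
        if start <= stamp <= end:
            yield line
```

Passing an open file object as `lines` streams the log line by line, so the extract can be written to a smaller file without loading the whole log into memory.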
+## Safety Notes
+
+- The tool only reads the input log and pattern catalog and optionally writes a separate report.
+- It does not require elevated privileges unless the chosen log path requires them.
+- Do not include secrets, private hostnames, customer identifiers, tokens, or unsanitized production details in portfolio examples.
+- Treat matches as prompts for operator review, not automated remediation instructions.
@@ -0,0 +1,9 @@
+2026-05-11 10:15:30 app01 checkout-api[1842]: INFO request_id=a1 path=/checkout status=200 duration_ms=42
+2026-05-11 10:16:02 app01 checkout-api[1842]: ERROR HTTP 500 request_id=a2 path=/checkout customer_id=redacted
+2026-05-11 10:16:07 app01 checkout-api[1842]: ERROR database unavailable while opening checkout connection pool
+2026-05-11 10:16:11 app01 checkout-api[1842]: WARN upstream inventory-api connection refused at 10.20.30.40:8443
+2026-05-11 10:16:15,123 app01 checkout-api[1842]: WARN payment provider request timed out after 5000 ms
+2026-05-11T10:16:22 app01 checkout-api[1842]: ERROR javax.net.ssl.SSLHandshakeException: PKIX path building failed
+2026-05-11 10:16:31.456 app01 nginx[907]: 198.51.100.25 - - "GET /checkout HTTP/1.1" 503 312 "-" "synthetic-check"
+2026-05-11 10:16:40 app01 checkout-api[1842]: FATAL java.lang.OutOfMemoryError: Java heap space
+2026-05-11 10:17:03 app01 checkout-api[1842]: INFO healthcheck completed status=degraded
@@ -0,0 +1,97 @@
+# Known Error Matcher Report
+
+- Overall status: `CRITICAL`
+- Known error pattern matches require operator review; logs alone do not prove root cause.
+
+## Matched Known Errors
+
+### [CRITICAL] database_unavailable - Database unavailable
+
+- Category: `application`
+- Occurrences: `1`
+- First seen: `2026-05-11 10:16:07`
+- Last seen: `2026-05-11 10:16:07`
+- Runbook: `infra-run/scripts/python/jvm-log-analyzer/README.md`
+- Description: Application logged unavailable database or database connectivity symptoms.
+- Samples:
+  - `2026-05-11 10:16:07 app01 checkout-api[1842]: ERROR database unavailable while opening checkout connection pool`
+
+### [CRITICAL] http_500 - HTTP 500
+
+- Category: `application`
+- Occurrences: `1`
+- First seen: `2026-05-11 10:16:02`
+- Last seen: `2026-05-11 10:16:02`
+- Runbook: `infra-run/runbooks/incidents/http-5xx.md`
+- Description: Application or proxy logged HTTP 500 responses.
+- Samples:
+  - `2026-05-11 10:16:02 app01 checkout-api[1842]: ERROR HTTP 500 request_id=a2 path=/checkout customer_id=redacted`
+
+### [CRITICAL] http_503 - HTTP 503
+
+- Category: `application`
+- Occurrences: `1`
+- First seen: `2026-05-11 10:16:31.456`
+- Last seen: `2026-05-11 10:16:31.456`
+- Runbook: `infra-run/runbooks/incidents/http-5xx.md`
+- Description: Application or proxy logged HTTP 503 service unavailable responses.
+- Samples:
+  - `2026-05-11 10:16:31.456 app01 nginx[907]: 198.51.100.25 - - "GET /checkout HTTP/1.1" 503 312 "-" "synthetic-check"`
+
+### [CRITICAL] java_out_of_memory - Java OutOfMemoryError
+
+- Category: `application_jvm`
+- Occurrences: `1`
+- First seen: `2026-05-11 10:16:40`
+- Last seen: `2026-05-11 10:16:40`
+- Runbook: `infra-run/scripts/python/jvm-log-analyzer/README.md`
+- Description: Java process logged memory exhaustion symptoms.
+- Samples:
+  - `2026-05-11 10:16:40 app01 checkout-api[1842]: FATAL java.lang.OutOfMemoryError: Java heap space`
+
+### [CRITICAL] ssl_handshake_exception - SSLHandshakeException
+
+- Category: `application_jvm`
+- Occurrences: `1`
+- First seen: `2026-05-11 10:16:22`
+- Last seen: `2026-05-11 10:16:22`
+- Runbook: `infra-run/scripts/python/jvm-log-analyzer/README.md`
+- Description: Java TLS handshake exception was logged.
+- Samples:
+  - `2026-05-11T10:16:22 app01 checkout-api[1842]: ERROR javax.net.ssl.SSLHandshakeException: PKIX path building failed`
+
+### [WARNING] connection_refused - Connection refused
+
+- Category: `network`
+- Occurrences: `1`
+- First seen: `2026-05-11 10:16:11`
+- Last seen: `2026-05-11 10:16:11`
+- Runbook: `infra-run/scripts/bash/os-healthcheck/README.md`
+- Description: Client connection attempts were refused by the destination service or host.
+- Samples:
+  - `2026-05-11 10:16:11 app01 checkout-api[1842]: WARN upstream inventory-api connection refused at 10.20.30.40:8443`
+
+### [WARNING] timeout - Timeout
+
+- Category: `network`
+- Occurrences: `1`
+- First seen: `2026-05-11 10:16:15,123`
+- Last seen: `2026-05-11 10:16:15,123`
+- Runbook: `infra-run/scripts/bash/os-healthcheck/README.md`
+- Description: Operation timed out and may require network, service, or dependency review.
+- Samples:
+  - `2026-05-11 10:16:15,123 app01 checkout-api[1842]: WARN payment provider request timed out after 5000 ms`
+
+## Operational Summary
+
+- Overall status: `CRITICAL`
+- Total lines scanned: `9`
+- Known error matches: `7`
+- Matched known error patterns: `7`
+- Critical matched patterns: `5`
+- Warning matched patterns: `2`
+- Top categories: application (3), network (2), application_jvm (2)
+- Top matched known errors: database_unavailable (1), http_500 (1), http_503 (1), java_out_of_memory (1), ssl_handshake_exception (1), connection_refused (1), timeout (1)
+- Timestamp coverage: parsed=`9`, unknown=`0`
+- Filters used: severity=`None`, category=`None`
+- Pattern catalog path: `patterns.json`
@@ -0,0 +1,10 @@
+May 11 10:15:30 web01 kernel: EXT4-fs warning: No space left on device while writing /var/log/messages
+May 11 10:15:35 web01 kernel: EXT4-fs error (device dm-0): Remounting filesystem read-only
+May 11 10:15:41 web01 kernel: nginx invoked oom-killer: gfp_mask=0x140cca(GFP_HIGHUSER_MOVABLE)
+May 11 10:15:42 web01 kernel: Out of memory: Killed process 2281 (java) total-vm:2097152kB
+May 11 10:16:11 web01 systemd[1]: Failed to start nginx.service - A high performance web server and a reverse proxy server.
+May 11 10:16:12 web01 systemd[1]: Dependency failed for webapp.service - Local web application.
+May 11 10:16:13 web01 systemd[1]: nginx.service: Start request repeated too quickly.
+May 11 10:16:25 web01 sshd[3371]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=203.0.113.50 user=deploy
+May 11 10:16:31 web01 sudo: deploy : command not allowed ; TTY=pts/0 ; PWD=/srv/app ; USER=root ; COMMAND=/bin/systemctl restart webapp
+May 11 10:16:32 web01 sudo: deploy : permission denied while opening /etc/sudoers.d/webapp
@@ -0,0 +1,562 @@
+#!/usr/bin/env python3
+"""Match local logs against a JSON catalog of known operational error patterns."""
+
+from __future__ import annotations
+
+import argparse
+import json
+import re
+import sys
+from collections import Counter
+from datetime import datetime
+from pathlib import Path
+from typing import Any
+
+
+EXIT_OK = 0
+EXIT_FINDINGS = 1
+EXIT_INVALID = 2
+
+UNKNOWN = "UNKNOWN"
+VALID_SEVERITIES = {"WARNING", "CRITICAL"}
+SEVERITY_ORDER = {"CRITICAL": 0, "WARNING": 1}
+
+ISO_TIMESTAMP_RE = re.compile(
+    r"\b(\d{4}-\d{2}-\d{2})[ T](\d{2}:\d{2}:\d{2})([,.]\d{1,6})?\b"
+)
+SYSLOG_TIMESTAMP_RE = re.compile(r"^([A-Z][a-z]{2}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})\b")
+
+
+def build_parser() -> argparse.ArgumentParser:
+    parser = argparse.ArgumentParser(
+        description="Scan a local log file for known operational error patterns."
+    )
+    parser.add_argument("--file", required=True, help="Local log file to scan.")
+    parser.add_argument("--patterns", required=True, help="JSON known error pattern catalog.")
+    parser.add_argument(
+        "--format",
+        choices=("text", "markdown", "json"),
+        default="text",
+        help="Report format. Default: text.",
+    )
+    parser.add_argument("--output", help="Write report to this path instead of stdout.")
+    parser.add_argument(
+        "--severity",
+        choices=("warning", "critical"),
+        help="Only include findings with this severity.",
+    )
+    parser.add_argument(
+        "--category",
+        help="Only include findings from this exact category.",
+    )
+    parser.add_argument(
+        "--top",
+        type=positive_int,
+        default=10,
+        help="Number of matched known errors and summary entries to display. Default: 10.",
+    )
+    parser.add_argument(
+        "--max-samples",
+        type=non_negative_int,
+        default=3,
+        help="Maximum sample lines per matched known error. Default: 3.",
+    )
+    parser.add_argument(
+        "--ignore-case",
+        action="store_true",
+        help="Compile catalog regex patterns case-insensitively.",
+    )
+    return parser
+
+
+def positive_int(value: str) -> int:
+    try:
+        number = int(value)
+    except ValueError as exc:
+        raise argparse.ArgumentTypeError("must be a positive integer") from exc
+    if number <= 0:
+        raise argparse.ArgumentTypeError("must be a positive integer")
+    return number
+
+
+def non_negative_int(value: str) -> int:
+    try:
+        number = int(value)
+    except ValueError as exc:
+        raise argparse.ArgumentTypeError("must be zero or a positive integer") from exc
+    if number < 0:
+        raise argparse.ArgumentTypeError("must be zero or a positive integer")
+    return number
+
+
+def read_text_file(path: Path, label: str) -> str:
+    if not path.exists():
+        raise OSError(f"{label} does not exist: {path}")
+    if not path.is_file():
+        raise OSError(f"{label} is not a regular file: {path}")
+    try:
+        text = path.read_text(encoding="utf-8", errors="replace")
+    except PermissionError as exc:
+        raise OSError(f"{label} is not readable: {path}") from exc
+    except OSError as exc:
+        raise OSError(f"unable to read {label} {path}: {exc}") from exc
+    if text == "":
+        raise ValueError(f"{label} is empty: {path}")
+    return text
+
+
+def load_pattern_catalog(path: Path, ignore_case: bool) -> list[dict[str, Any]]:
+    text = read_text_file(path, "pattern catalog")
+    try:
+        catalog = json.loads(text)
+    except json.JSONDecodeError as exc:
+        raise ValueError(f"invalid JSON in pattern catalog {path}: {exc}") from exc
+
+    errors: list[str] = []
+    if not isinstance(catalog, dict):
+        raise ValueError("invalid pattern catalog: top-level JSON value must be an object")
+    if "patterns" not in catalog:
+        raise ValueError('invalid pattern catalog: missing top-level "patterns" field')
+    if not isinstance(catalog["patterns"], list):
+        raise ValueError('invalid pattern catalog: "patterns" must be a list')
+
+    seen_ids: set[str] = set()
+    compiled_patterns: list[dict[str, Any]] = []
+    flags = re.IGNORECASE if ignore_case else 0
+
+    for index, item in enumerate(catalog["patterns"], start=1):
+        if not isinstance(item, dict):
+            errors.append(f"pattern #{index}: must be an object")
+            continue
+
+        pattern_id = normalize_required_text(item, "id")
+        name = normalize_required_text(item, "name")
+        severity = normalize_required_text(item, "severity").upper()
+        regex_text = normalize_required_text(item, "regex")
+
+        if not pattern_id:
+            errors.append(f"pattern #{index}: id is required and must be non-empty")
+        elif pattern_id in seen_ids:
+            errors.append(f"pattern #{index}: duplicate id {pattern_id}")
+        else:
+            seen_ids.add(pattern_id)
+
+        if not name:
+            errors.append(f"pattern {pattern_id or f'#{index}'}: name is required and must be non-empty")
+        if severity not in VALID_SEVERITIES:
+            errors.append(
+                f"pattern {pattern_id or f'#{index}'}: severity must be WARNING or CRITICAL"
+            )
+        if not regex_text:
+            errors.append(f"pattern {pattern_id or f'#{index}'}: regex is required and must be non-empty")
+
+        compiled_regex = None
+        if regex_text:
+            try:
+                compiled_regex = re.compile(regex_text, flags)
+            except re.error as exc:
+                errors.append(f"pattern {pattern_id or f'#{index}'}: invalid regex: {exc}")
+
+        if pattern_id and name and severity in VALID_SEVERITIES and regex_text and compiled_regex:
+            compiled_patterns.append(
+                {
+                    "id": pattern_id,
+                    "name": name,
+                    "severity": severity,
+                    "regex_text": regex_text,
+                    "regex": compiled_regex,
+                    "category": normalize_optional_text(item, "category", UNKNOWN),
+                    "runbook": normalize_optional_text(item, "runbook", ""),
+                    "description": normalize_optional_text(item, "description", ""),
+                }
+            )
+
+    if errors:
+        raise ValueError("invalid pattern catalog:\n- " + "\n- ".join(errors))
+    if not compiled_patterns:
+        raise ValueError("invalid pattern catalog: no patterns configured")
+    return compiled_patterns
+
+
+def normalize_required_text(item: dict[str, Any], field: str) -> str:
+    value = item.get(field)
+    if not isinstance(value, str):
+        return ""
+    return value.strip()
+
+
+def normalize_optional_text(item: dict[str, Any], field: str, default: str) -> str:
+    value = item.get(field, default)
+    if not isinstance(value, str):
+        return default
+    value = value.strip()
+    return value if value else default
+
+
+def parse_line_timestamp(line: str, syslog_year: int) -> tuple[datetime | None, str]:
+    iso_match = ISO_TIMESTAMP_RE.search(line)
+    if iso_match:
+        fraction = iso_match.group(3) or ""
+        raw = f"{iso_match.group(1)} {iso_match.group(2)}"
+        parse_value = raw
+        fmt = "%Y-%m-%d %H:%M:%S"
+        if fraction:
+            parse_value = f"{raw}.{fraction[1:].ljust(6, '0')[:6]}"
+            fmt = "%Y-%m-%d %H:%M:%S.%f"
+        try:
+            return datetime.strptime(parse_value, fmt), raw + fraction
+        except ValueError:
+            return None, UNKNOWN
+
+    syslog_match = SYSLOG_TIMESTAMP_RE.search(line)
+    if syslog_match:
+        raw = syslog_match.group(1)
+        try:
+            parsed = datetime.strptime(f"{syslog_year} {raw}", "%Y %b %d %H:%M:%S")
+        except ValueError:
+            return None, UNKNOWN
+        return parsed, raw
+
+    return None, UNKNOWN
+
+
+def severity_filter_matches(selected: str | None, severity: str) -> bool:
+    if selected is None:
+        return True
+    return selected.upper() == severity
+
+
+def category_filter_matches(selected: str | None, category: str) -> bool:
+    if selected is None:
+        return True
+    return selected == category
+
+
+def update_seen(group: dict[str, Any], parsed_at: datetime | None, rendered_at: str) -> None:
+    if parsed_at is None:
+        return
+    if group["first_seen"] is None or parsed_at < group["first_seen"][0]:
+        group["first_seen"] = (parsed_at, rendered_at)
+    if group["last_seen"] is None or parsed_at > group["last_seen"][0]:
+        group["last_seen"] = (parsed_at, rendered_at)
+
+
+def render_seen(value: tuple[datetime, str] | None) -> str:
+    if value is None:
+        return UNKNOWN
+    return value[1] or value[0].strftime("%Y-%m-%d %H:%M:%S")
+
+
+def append_limited(items: list[str], value: str, limit: int) -> None:
+    if limit == 0:
+        return
+    if value in items:
+        return
+    if len(items) < limit:
+        items.append(value)
+
+
+def analyze_log(
+    lines: list[str],
+    patterns: list[dict[str, Any]],
+    severity_filter: str | None,
+    category_filter: str | None,
+    top: int,
+    max_samples: int,
+    pattern_catalog_path: Path,
+) -> dict[str, Any]:
+    syslog_year = datetime.now().year
+    groups: dict[str, dict[str, Any]] = {}
+    top_categories = Counter()
+    total_lines_scanned = 0
+    parsed_timestamps = 0
+    unknown_timestamps = 0
+
+    for line in lines:
+        total_lines_scanned += 1
+        parsed_at, rendered_at = parse_line_timestamp(line, syslog_year)
+        if parsed_at is None:
+            unknown_timestamps += 1
+        else:
+            parsed_timestamps += 1
+
+        for pattern in patterns:
+            if not severity_filter_matches(severity_filter, pattern["severity"]):
+                continue
+            if not category_filter_matches(category_filter, pattern["category"]):
+                continue
+            if not pattern["regex"].search(line):
+                continue
+
+            group = groups.setdefault(
+                pattern["id"],
+                {
+                    "id": pattern["id"],
+                    "name": pattern["name"],
+                    "severity": pattern["severity"],
+                    "category": pattern["category"],
+                    "runbook": pattern["runbook"],
+                    "description": pattern["description"],
+                    "regex": pattern["regex_text"],
+                    "occurrences": 0,
+                    "first_seen": None,
+                    "last_seen": None,
+                    "samples": [],
+                },
+            )
+            group["occurrences"] += 1
+            update_seen(group, parsed_at, rendered_at)
+            append_limited(group["samples"], line, max_samples)
+            top_categories[pattern["category"]] += 1
+
+    findings = sorted(
+        groups.values(),
+        key=lambda item: (
+            SEVERITY_ORDER[item["severity"]],
+            -item["occurrences"],
+            item["id"],
+        ),
+    )
+
+    rendered_findings = [
+        {
+            **finding,
+            "first_seen": render_seen(finding["first_seen"]),
+            "last_seen": render_seen(finding["last_seen"]),
+        }
+        for finding in findings
+    ]
+
+    critical_patterns = sum(1 for item in rendered_findings if item["severity"] == "CRITICAL")
+    warning_patterns = sum(1 for item in rendered_findings if item["severity"] == "WARNING")
+    total_matches = sum(item["occurrences"] for item in rendered_findings)
+
+    overall_status = "OK"
+    if critical_patterns > 0:
+        overall_status = "CRITICAL"
+    elif warning_patterns > 0:
+        overall_status = "WARNING"
+
+    return {
+        "overall_status": overall_status,
+        "total_lines_scanned": total_lines_scanned,
+        "total_known_error_matches": total_matches,
+        "matched_pattern_count": len(rendered_findings),
+        "critical_matched_pattern_count": critical_patterns,
+        "warning_matched_pattern_count": warning_patterns,
+        "top_categories": [
+            {"category": name, "count": count}
+            for name, count in top_categories.most_common(top)
+        ],
+        "top_known_errors": [
+            {"id": item["id"], "name": item["name"], "severity": item["severity"], "count": item["occurrences"]}
+            for item in rendered_findings[:top]
+        ],
+        "timestamp_coverage": {
+            "parsed_timestamps_count": parsed_timestamps,
+            "unknown_timestamps_count": unknown_timestamps,
+        },
+        "filters_used": {
+            "severity": severity_filter.lower() if severity_filter else None,
+            "category": category_filter,
+        },
+        "pattern_catalog_path": str(pattern_catalog_path),
+        "findings": rendered_findings[:top],
+        "findings_total": len(rendered_findings),
+    }
+
+
+def render_top_pairs(items: list[dict[str, Any]], key: str) -> str:
+    if not items:
+        return "None"
+    return ", ".join(f"{item[key]} ({item['count']})" for item in items)
+
+
+def render_text(report: dict[str, Any]) -> str:
+    lines = [
+        "Known Error Matcher",
+        "===================",
+        "",
+        f"Overall status: {report['overall_status']}",
+        "Known error pattern matches require operator review; logs alone do not prove root cause.",
+        "",
+    ]
+
+    if report["findings"]:
+        for finding in report["findings"]:
+            lines.extend(
+                [
+                    f"[{finding['severity']}] {finding['id']} - {finding['name']}",
+                    f"Category: {finding['category']}",
+                    f"Occurrences: {finding['occurrences']}",
+                    f"First seen: {finding['first_seen']}",
+                    f"Last seen: {finding['last_seen']}",
+                    f"Runbook: {finding['runbook'] or 'None'}",
+                    f"Description: {finding['description'] or 'None'}",
+                    "Samples:",
+                ]
+            )
+            if finding["samples"]:
+                for sample in finding["samples"]:
+                    lines.append(f"  - {sample}")
+            else:
+                lines.append("  - None")
+            lines.append("")
+    else:
+        lines.extend(["No known error patterns matched for the selected filters.", ""])
+
+    lines.extend(render_summary_lines(report, markdown=False))
+    return "\n".join(lines)
+
+
+def render_summary_lines(report: dict[str, Any], markdown: bool) -> list[str]:
+    if markdown:
+        return [
+            "## Operational Summary",
+            "",
+            f"- Overall status: `{report['overall_status']}`",
+            f"- Total lines scanned: `{report['total_lines_scanned']}`",
+            f"- Known error matches: `{report['total_known_error_matches']}`",
+            f"- Matched known error patterns: `{report['matched_pattern_count']}`",
+            f"- Critical matched patterns: `{report['critical_matched_pattern_count']}`",
+            f"- Warning matched patterns: `{report['warning_matched_pattern_count']}`",
+            "- Top categories: " + render_top_pairs(report["top_categories"], "category"),
+            "- Top matched known errors: " + render_top_pairs(report["top_known_errors"], "id"),
+            "- Timestamp coverage: "
+            f"parsed=`{report['timestamp_coverage']['parsed_timestamps_count']}`, "
+            f"unknown=`{report['timestamp_coverage']['unknown_timestamps_count']}`",
+            "- Filters used: "
+            f"severity=`{report['filters_used']['severity'] or 'None'}`, "
+            f"category=`{report['filters_used']['category'] or 'None'}`",
+            f"- Pattern catalog path: `{report['pattern_catalog_path']}`",
+        ]
+    return [
+        "Operational Summary",
+        "-------------------",
+        f"Overall status: {report['overall_status']}",
+        f"Total lines scanned: {report['total_lines_scanned']}",
+        f"Known error matches: {report['total_known_error_matches']}",
+        f"Matched known error patterns: {report['matched_pattern_count']}",
+        f"Critical matched patterns: {report['critical_matched_pattern_count']}",
+        f"Warning matched patterns: {report['warning_matched_pattern_count']}",
+        "Top categories: " + render_top_pairs(report["top_categories"], "category"),
+        "Top matched known errors: " + render_top_pairs(report["top_known_errors"], "id"),
+        "Timestamp coverage: "
+        f"parsed={report['timestamp_coverage']['parsed_timestamps_count']}, "
+        f"unknown={report['timestamp_coverage']['unknown_timestamps_count']}",
+        "Filters used: "
+        f"severity={report['filters_used']['severity'] or 'None'}, "
+        f"category={report['filters_used']['category'] or 'None'}",
+        f"Pattern catalog path: {report['pattern_catalog_path']}",
+    ]
+
+
+def render_markdown(report: dict[str, Any]) -> str:
+    """Render the report as Markdown: header, one section per finding, then the summary."""
+    lines = [
+        "# Known Error Matcher Report",
+        "",
+        f"- Overall status: `{report['overall_status']}`",
+        "- Known error pattern matches require operator review; logs alone do not prove root cause.",
+        "",
+    ]
+
+    if report["findings"]:
+        lines.extend(["## Matched Known Errors", ""])
+        for finding in report["findings"]:
+            lines.extend(
+                [
+                    f"### [{finding['severity']}] {finding['id']} - {finding['name']}",
+                    "",
+                    f"- Category: `{finding['category']}`",
+                    f"- Occurrences: `{finding['occurrences']}`",
+                    f"- First seen: `{finding['first_seen']}`",
+                    f"- Last seen: `{finding['last_seen']}`",
+                    f"- Runbook: `{finding['runbook'] or 'None'}`",
+                    f"- Description: {finding['description'] or 'None'}",
+                    "- Samples:",
+                ]
+            )
+            if finding["samples"]:
+                for sample in finding["samples"]:
+                    lines.append(f"  - `{sample}`")
+            else:
+                lines.append("  - `None`")
+            lines.append("")
+    else:
+        lines.extend(["## Matched Known Errors", "", "No known error patterns matched for the selected filters.", ""])
+
+    lines.extend(render_summary_lines(report, markdown=True))
+    return "\n".join(lines)
+
+
+def render_json(report: dict[str, Any]) -> str:
+    return json.dumps(report, indent=2)
+
+
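+def write_output(text: str, output_path: str | None, protected_inputs: list[Path]) -> None:
+    """Print the report to stdout, or write it to output_path.
+
+    Raises OSError if output_path resolves to one of protected_inputs or cannot be written.
+    """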
+    if output_path is None:
+        print(text)
+        return
+
+    destination = Path(output_path)
+    try:
+        destination_resolved = destination.resolve()
+        for input_path in protected_inputs:
+            if input_path.resolve() == destination_resolved:
+                raise OSError("output path must not overwrite an input file")
+    except FileNotFoundError as exc:
+        raise OSError(f"unable to resolve output path {destination}: {exc}") from exc
+
+    try:
+        destination.write_text(text + ("\n" if not text.endswith("\n") else ""), encoding="utf-8")
+    except OSError as exc:
+        raise OSError(f"unable to write report to {destination}: {exc}") from exc
+
+
+def determine_exit_code(report: dict[str, Any]) -> int:
+    """Return EXIT_FINDINGS if any known error pattern matched, otherwise EXIT_OK."""
+    if report["total_known_error_matches"] > 0:
+        return EXIT_FINDINGS
+    return EXIT_OK
+
+
+def main() -> int:
+    """Run the CLI: scan the log, render the report, and return the process exit code."""
+    parser = build_parser()
+    args = parser.parse_args()
+
+    try:
+        log_path = Path(args.file)
+        pattern_path = Path(args.patterns)
+        log_text = read_text_file(log_path, "log file")
+        lines = log_text.splitlines()
+        patterns = load_pattern_catalog(pattern_path, args.ignore_case)
+        severity_filter = args.severity.upper() if args.severity else None
+
+        report = analyze_log(
+            lines=lines,
+            patterns=patterns,
+            severity_filter=severity_filter,
+            category_filter=args.category,
+            top=args.top,
+            max_samples=args.max_samples,
+            pattern_catalog_path=pattern_path,
+        )
+
+        if args.format == "text":
+            rendered = render_text(report)
+        elif args.format == "markdown":
+            rendered = render_markdown(report)
+        else:
+            rendered = render_json(report)
+
+        write_output(rendered, args.output, [log_path, pattern_path])
+        return determine_exit_code(report)
+    except (OSError, ValueError) as exc:
+        print(f"ERROR: {exc}", file=sys.stderr)
+        return EXIT_INVALID
+    except Exception as exc:  # pragma: no cover - defensive operational fallback
+        print(f"ERROR: unexpected runtime failure: {exc}", file=sys.stderr)
+        return EXIT_INVALID
+
+
+if __name__ == "__main__":
+    sys.exit(main())
@@ -0,0 +1,220 @@
+{
+  "patterns": [
+    {
+      "id": "disk_full",
+      "name": "Disk full",
+      "severity": "CRITICAL",
+      "regex": "No space left on device|disk full|filesystem full",
+      "category": "storage",
+      "runbook": "infra-run/scripts/bash/disk-full/README.md",
+      "description": "Filesystem or application failed because free space was exhausted."
+    },
+    {
+      "id": "inode_exhaustion",
+      "name": "Inode exhaustion",
+      "severity": "CRITICAL",
+      "regex": "No space left on device.*inode|inode.*exhaust|free inodes.*0",
+      "category": "storage",
+      "runbook": "infra-run/scripts/bash/disk-full/README.md",
+      "description": "Filesystem may still have free blocks but has run out of available inodes."
+    },
+    {
+      "id": "read_only_filesystem",
+      "name": "Read-only filesystem",
+      "severity": "CRITICAL",
+      "regex": "read-only file system|read-only filesystem|Remounting filesystem read-only",
+      "category": "storage",
+      "runbook": "infra-run/runbooks/incidents/read-only-filesystem.md",
+      "description": "Filesystem writes failed because the mount was read-only or remounted read-only."
+    },
+    {
+      "id": "io_error",
+      "name": "I/O error",
+      "severity": "CRITICAL",
+      "regex": "\\bI/O error\\b|Buffer I/O error|blk_update_request.*I/O error",
+      "category": "storage",
+      "runbook": "infra-run/runbooks/incidents/storage-io-error.md",
+      "description": "Kernel or application reported storage I/O errors that require device and filesystem review."
+    },
+    {
+      "id": "out_of_memory",
+      "name": "Out of memory",
+      "severity": "CRITICAL",
+      "regex": "\\bout of memory\\b|Cannot allocate memory",
+      "category": "memory",
+      "runbook": "infra-run/runbooks/incidents/memory-pressure.md",
+      "description": "Process or host reported memory exhaustion symptoms."
+    },
+    {
+      "id": "oom_killer",
+      "name": "OOM killer invoked",
+      "severity": "CRITICAL",
+      "regex": "oom-killer|Killed process \\d+|Out of memory: Killed process",
+      "category": "memory",
+      "runbook": "infra-run/runbooks/incidents/oom-killer.md",
+      "description": "Kernel OOM killer activity was logged and affected processes should be reviewed."
+    },
+    {
+      "id": "segmentation_fault",
+      "name": "Segmentation fault",
+      "severity": "CRITICAL",
+      "regex": "segmentation fault|segfault",
+      "category": "process",
+      "runbook": "infra-run/runbooks/incidents/process-crash.md",
+      "description": "A process logged a segmentation fault, indicating a crash from an invalid memory access."
+    },
+    {
+      "id": "connection_refused",
+      "name": "Connection refused",
+      "severity": "WARNING",
+      "regex": "connection refused|ConnectException: Connection refused",
+      "category": "network",
+      "runbook": "infra-run/scripts/bash/os-healthcheck/README.md",
+      "description": "Client connection attempts were refused by the destination service or host."
+    },
+    {
+      "id": "connection_reset",
+      "name": "Connection reset",
+      "severity": "WARNING",
+      "regex": "connection reset|Connection reset by peer",
+      "category": "network",
+      "runbook": "infra-run/scripts/bash/os-healthcheck/README.md",
+      "description": "Established network connections were reset and require endpoint review."
+    },
+    {
+      "id": "timeout",
+      "name": "Timeout",
+      "severity": "WARNING",
+      "regex": "\\btimeout\\b|timed out|TimeoutException|SocketTimeoutException",
+      "category": "network",
+      "runbook": "infra-run/scripts/bash/os-healthcheck/README.md",
+      "description": "Operation timed out and may require network, service, or dependency review."
+    },
+    {
+      "id": "dns_resolution_failure",
+      "name": "DNS resolution failure",
+      "severity": "WARNING",
+      "regex": "Temporary failure in name resolution|Name or service not known|NXDOMAIN|UnknownHostException|could not resolve host",
+      "category": "network",
+      "runbook": "infra-run/runbooks/incidents/dns-resolution.md",
+      "description": "Name resolution failed for a host or service dependency."
+    },
+    {
+      "id": "certificate_expired",
+      "name": "Certificate expired",
+      "severity": "CRITICAL",
+      "regex": "certificate expired|CertificateExpiredException|certificate has expired",
+      "category": "tls",
+      "runbook": "infra-run/runbooks/incidents/certificate-expired.md",
+      "description": "TLS certificate expiry was logged and certificate state should be reviewed."
+    },
+    {
+      "id": "tls_handshake_failed",
+      "name": "TLS handshake failed",
+      "severity": "WARNING",
+      "regex": "TLS handshake failed|SSL handshake failed|handshake_failure",
+      "category": "tls",
+      "runbook": "infra-run/runbooks/incidents/tls-handshake.md",
+      "description": "TLS handshake failed and may require certificate, protocol, or trust-store review."
+    },
+    {
+      "id": "authentication_failure",
+      "name": "Authentication failure",
+      "severity": "WARNING",
+      "regex": "authentication failure|Failed password|authentication failed",
+      "category": "security",
+      "runbook": "infra-run/scripts/python/auth-log-audit/README.md",
+      "description": "Authentication failures were logged and may require access review."
+    },
+    {
+      "id": "permission_denied",
+      "name": "Permission denied",
+      "severity": "WARNING",
+      "regex": "permission denied|access denied|denied by policy",
+      "category": "security",
+      "runbook": "infra-run/runbooks/incidents/permission-denied.md",
+      "description": "Access or permission denial was logged."
+    },
+    {
+      "id": "invalid_user",
+      "name": "Invalid user",
+      "severity": "WARNING",
+      "regex": "Invalid user|invalid user|user unknown|User not known",
+      "category": "security",
+      "runbook": "infra-run/scripts/python/auth-log-audit/README.md",
+      "description": "Log contains attempts involving invalid or unknown users."
+    },
+    {
+      "id": "java_out_of_memory",
+      "name": "Java OutOfMemoryError",
+      "severity": "CRITICAL",
+      "regex": "OutOfMemoryError|Java heap space|GC overhead limit exceeded",
+      "category": "application_jvm",
+      "runbook": "infra-run/scripts/python/jvm-log-analyzer/README.md",
+      "description": "Java process logged memory exhaustion symptoms."
+    },
+    {
+      "id": "ssl_handshake_exception",
+      "name": "SSLHandshakeException",
+      "severity": "CRITICAL",
+      "regex": "SSLHandshakeException|javax\\.net\\.ssl\\.SSLHandshakeException",
+      "category": "application_jvm",
+      "runbook": "infra-run/scripts/python/jvm-log-analyzer/README.md",
+      "description": "Java TLS handshake exception was logged."
+    },
+    {
+      "id": "database_unavailable",
+      "name": "Database unavailable",
+      "severity": "CRITICAL",
+      "regex": "database unavailable|database is unavailable|SQLRecoverableException|CommunicationsException|connection pool exhausted",
+      "category": "application",
+      "runbook": "infra-run/scripts/python/jvm-log-analyzer/README.md",
+      "description": "Application logged unavailable database or database connectivity symptoms."
+    },
+    {
+      "id": "http_500",
+      "name": "HTTP 500",
+      "severity": "CRITICAL",
+      "regex": "\\bHTTP\\s+500\\b|\\bstatus=500\\b|\\s500\\s",
+      "category": "application",
+      "runbook": "infra-run/runbooks/incidents/http-5xx.md",
+      "description": "Application or proxy logged HTTP 500 responses."
+    },
+    {
+      "id": "http_503",
+      "name": "HTTP 503",
+      "severity": "CRITICAL",
+      "regex": "\\bHTTP\\s+503\\b|\\bstatus=503\\b|\\s503\\s|Service Unavailable",
+      "category": "application",
+      "runbook": "infra-run/runbooks/incidents/http-5xx.md",
+      "description": "Application or proxy logged HTTP 503 service unavailable responses."
+    },
+    {
+      "id": "service_failed",
+      "name": "Systemd service failed",
+      "severity": "CRITICAL",
+      "regex": "Failed to start .*\\.service|entered failed state|Unit .*\\.service failed|Main process exited.*status=",
+      "category": "systemd",
+      "runbook": "infra-run/scripts/python/journal-analyzer/README.md",
+      "description": "Systemd logged a failed service or failed service start."
+    },
+    {
+      "id": "dependency_failed",
+      "name": "Systemd dependency failed",
+      "severity": "CRITICAL",
+      "regex": "Dependency failed for|dependency failed",
+      "category": "systemd",
+      "runbook": "infra-run/scripts/python/journal-analyzer/README.md",
+      "description": "Systemd logged a unit dependency failure."
+    },
+    {
+      "id": "start_request_repeated",
+      "name": "Start request repeated too quickly",
+      "severity": "WARNING",
+      "regex": "Start request repeated too quickly|start request repeated too quickly",
+      "category": "systemd",
+      "runbook": "infra-run/scripts/python/journal-analyzer/README.md",
+      "description": "Systemd throttled service restarts after repeated start failures."
+    }
+  ]
+}