@@ -0,0 +1,20 @@
WARNING: Certificate for app.example.com:443 expires in 18 day(s)

Certificate details:
Subject: CN = app.example.com
Issuer: C = US, O = Example CA, CN = Example Intermediate CA
notBefore: Apr 11 00:00:00 2026 GMT
notAfter: May 29 23:59:59 2026 GMT
SAN/CN: DNS:app.example.com, DNS:api.example.com

Evidence:
Target: app.example.com:443
SNI: app.example.com
Thresholds: warning=30 days critical=7 days

Recommended next steps:
- Renew certificate before the operational threshold is breached
- Check the full chain and intermediate certificates
- Check the load balancer, ingress, or reverse proxy serving this certificate
- Verify monitoring threshold and alert ownership
- Attach this output to incident or change ticket
@@ -0,0 +1,23 @@
OK: DNS=OK ping=OK tcp_443=OK

DNS result:
93.184.216.34 example.com

Ping result:
3 packets transmitted, 3 received, 0% packet loss, time 2002ms

TCP port result:
OK: TCP connection to example.com:443 succeeded

Local network hints:
default via 10.0.2.1 dev eth0 proto dhcp src 10.0.2.15

Evidence:
Host: example.com count=3 timeout=3s port=443

Recommended next steps:
- Verify the DNS record and resolver path
- Check firewall, routing, security group, or proxy policy
- Compare results from another host or network segment
- Check application endpoint health after network reachability is confirmed
- Attach this output to incident ticket
@@ -0,0 +1,26 @@
CRITICAL: Found 73 failed SSH login attempt(s) for requested window

Top source IPs:
52 203.0.113.44
12 198.51.100.20
9 192.0.2.10

Top attempted users:
31 admin
24 oracle
18 root

Sample recent lines:
May 11 10:01:02 host sshd[2201]: Failed password for invalid user admin from 203.0.113.44 port 51240 ssh2
May 11 10:01:06 host sshd[2205]: Invalid user oracle from 198.51.100.20

Evidence:
Thresholds: warning=20 critical=50 since="1 hour ago"
Log source: journalctl

Recommended next steps:
- Verify source IPs against expected scanners, admins, or automation
- Check firewall, fail2ban, or security tooling state
- Confirm whether the attempts are expected for this host
- Review successful logins too, not only failures
- Attach this output to incident ticket
@@ -0,0 +1,16 @@
CRITICAL: Found 1 read-only filesystem(s)

Read-only filesystems:
MOUNT_POINT  SOURCE                       FSTYPE  OPTIONS
/data        /dev/mapper/vg_data-lv_data  xfs     ro,relatime,seclabel,attr2,inode64

Evidence:
include_system=0
Collector: findmnt

Recommended next steps:
- Check dmesg or journal logs for I/O errors and filesystem remount events
- Check storage path, multipath, SAN, cloud volume, or underlying disk health
- Check filesystem health with the platform-approved procedure
- Do not remount read-write before understanding the cause
- Attach this output to incident ticket
@@ -0,0 +1,22 @@
WARNING: 1-minute load is 7.82 across 8 CPU(s) (97% of CPU count)

Load average:
1m=7.82 5m=6.91 15m=5.40

CPU count:
8

Top CPU processes:
PID   PPID  USER  %CPU  %MEM  COMMAND       COMMAND
2314  1     app   245   12.1  java          java -jar order-api.jar
991   1     root  38    0.4   backup-agent  backup-agent --scan

Evidence:
WARNING: load is close to online CPU count; runnable task saturation is possible

Recommended next steps:
- Check process ownership and whether the top process is expected
- Check recent deployments, cron jobs, batch jobs, or maintenance activity
- Review logs for the top CPU-consuming process
- Compare with longer trend data from monitoring before taking action
- Attach this output to the incident ticket
@@ -0,0 +1,25 @@
WARNING: Memory usage is 84% and swap usage is 12%

Memory summary:
              total        used        free      shared  buff/cache   available
Mem:          15934       13386         512         121        2036        2101
Swap:          4095         512        3583

Top memory processes:
PID   RSS_MB  COMMAND
1234  2048    java
987   812     postgres

OOM events since 24 hours ago:
2026-05-11 08:42:13 kernel: Out of memory: Killed process 1234 (java)

Evidence:
Thresholds: warning=80% critical=90% since="24 hours ago"
OOM evidence source: journalctl

Recommended next steps:
- Check application memory trend
- Review JVM heap settings if process is Java
- Verify swap pressure and paging activity
- Confirm whether OOM events align with application impact
- Attach this output to incident ticket
@@ -0,0 +1,22 @@
WARNING: Highest inode usage is 87%

Filesystems above threshold:
/dev/mapper/vg_var-lv_var 1310720 1140326 170394 87% /var

Inode usage table:
Filesystem                   Inodes   IUsed    IFree   IUse%  Mounted on
/dev/mapper/vg_root-lv_root  524288   91300    432988  18%    /
/dev/mapper/vg_var-lv_var    1310720  1140326  170394  87%    /var

Top affected mount points:
87% /var /dev/mapper/vg_var-lv_var inodes=1310720 used=1140326 free=170394

Evidence:
Thresholds: warning=80% critical=90%

Recommended next steps:
- Find directories with many small files under affected mount points
- Check logs, cache, spool, session, and temporary directories
- Avoid deleting blindly; confirm ownership and application impact first
- Confirm whether inode exhaustion is causing write or deploy failures
- Attach this output to incident ticket
@@ -0,0 +1,30 @@
OK: JVM diagnostics collected for PID 1234

Detected JVM process:
PID   USER  RSS_MB  CPU   COMMAND
1234  app   2048    42.1  java -Xms2g -Xmx2g -jar order-api.jar
Thread count: 188

Heap and JVM evidence:

[jcmd VM.flags]
1234:
-XX:InitialHeapSize=2147483648 -XX:MaxHeapSize=2147483648

[jcmd GC.heap_info]
garbage-first heap total 2097152K, used 1521000K

[jcmd Thread.print summary]
102 java.lang.Thread.State: WAITING
53 java.lang.Thread.State: RUNNABLE
33 java.lang.Thread.State: TIMED_WAITING

Evidence:
PID=1234 thread_count=188 top=10

Recommended next steps:
- Review GC logs and recent application errors
- Check JVM heap sizing against container or host memory limits
- Check thread count trend in monitoring before concluding a leak
- Capture jstack only if approved by operational process
- Attach this output to incident ticket
@@ -0,0 +1,23 @@
WARNING: Time sync status=yes offset_ms=812

Time status:
System time: 2026-05-11 10:18:01 UTC +0000
Timezone: UTC +0000
Detected tool: chronyc
NTP synchronized: yes
Offset ms: 812

Tool evidence:
Reference ID : 203.0.113.10
System time : 0.812345 seconds fast of NTP time
Last offset : +0.812345 seconds

Evidence:
Thresholds: warning=500ms critical=5000ms

Recommended next steps:
- Verify chrony or ntpd service status and configuration
- Check NTP sources and reachability
- Check virtualization host time if this is a VM
- Avoid restarting time services blindly in production
- Attach this output to incident ticket
@@ -0,0 +1,27 @@
CRITICAL: Service app.service state=failed substate=failed restarts=12

Service state:
app.service - Example application
Loaded: loaded (/etc/systemd/system/app.service; enabled)
Active: failed (Result: exit-code)

Systemd properties:
Id=app.service
ActiveState=failed
SubState=failed
Result=exit-code
NRestarts=12

Recent start/stop/failure log lines since 1 hour ago:
May 11 09:05:01 host systemd[1]: app.service: Main process exited, status=1/FAILURE
May 11 09:05:01 host systemd[1]: app.service: Failed with result 'exit-code'.

Evidence:
Thresholds: warning=3 restarts critical=10 restarts since="1 hour ago"

Recommended next steps:
- Inspect the unit file and drop-in overrides
- Review application logs around the restart timestamps
- Check dependencies such as network, storage, database, or secrets
- Verify recent configuration or package changes
- Do not restart blindly; attach this output to the incident ticket