SSH cannot connect to the DPU host node
Problem information
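- Time of failure: 2025-05-06 10:00
- Affected nodes: Inner Mongolia 08 multi-AZ test, 94.106 and 94.236
- Symptom: SSH cannot connect to the physical node. The soft lockup stack on the physical node differs every time, and the RIP register in the vmcore holds a different address each time.
- Operating system: ctyunos3, kernel-5.10 90
- CPU: Hygon C86-4G (OPN: 7493)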
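Problem analysis
/var/lib/systemd/coredump
After core dumps appeared on 94.106, the compute node could no longer be logged in to. After the compute node was rebooted, login worked normally again.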
Log analysis shows that systemd-hostname, systemd-journal, systemd-logind, systemd-machine and systemd-udevd all dumped core for the same reason: their watchdog timed out and they were forcibly terminated by SIGABRT.
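A quick way to see which services dumped core, and how often, is to count the files in the coredump directory by command name. The sketch below assumes that systemd-coredump names its files `core.<comm>.<uid>.<boot-id>.<pid>.<timestamp>` plus a compression suffix, and that `<comm>` contains no dot. The function name `count_cores_by_comm` is made up for this example.

```shell
# Count core files per command name in a systemd-coredump directory.
# Assumption: file names look like core.<comm>.<uid>.<boot-id>.<pid>.<timestamp>[.ext]
# and <comm> itself contains no dot.
count_cores_by_comm() {
    dir="${1:-/var/lib/systemd/coredump}"
    for f in "$dir"/core.*; do
        [ -e "$f" ] || continue        # skip the unexpanded glob when nothing matches
        name=${f##*/}                  # strip the directory part
        name=${name#core.}             # strip the "core." prefix
        printf '%s\n' "${name%%.*}"    # keep the text up to the next dot (the comm)
    done | sort | uniq -c | awk '{print $2, $1}'
}
```

Running `count_cores_by_comm` with no argument reads /var/lib/systemd/coredump and prints one `comm count` pair per line. To confirm the terminating signal of a single dump, `coredumpctl info <PID>` can be used on the node.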
The /var/log/messages log on 94.106 shows that the physical machine reported soft lockups many times.
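To check whether the soft lockups concentrate on particular CPUs, the reports in the log can be counted per CPU. This is a minimal sketch that assumes the usual 5.10 message form, `watchdog: BUG: soft lockup - CPU#<n> stuck for <t>s! [<comm>:<pid>]`. The function name `softlockup_by_cpu` is hypothetical.

```shell
# Summarize soft lockup reports per CPU from a kernel log file.
# Assumption: reports contain the text "soft lockup - CPU#<n>".
softlockup_by_cpu() {
    log="${1:-/var/log/messages}"
    grep -o 'soft lockup - CPU#[0-9]*' "$log" \
        | sed 's/.*CPU#//' \
        | sort -n | uniq -c \
        | awk '{print "CPU " $2 ": " $1}'
}
```

Called without arguments it reads /var/log/messages. A rotated log can be passed as the first argument.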
Please enable panic on soft lockup and then reproduce the problem. Once it reproduces, a kernel core dump (vmcore) is generated:
    sysctl -w kernel.softlockup_panic=1
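Because `sysctl -w` only changes the running kernel, the value falls back to its default after the reboot that follows each panic unless it is also written to a file under /etc/sysctl.d/. Before each reproduction attempt it may be worth confirming the live settings. The sketch below prints `kernel.softlockup_panic` together with `kernel.watchdog_thresh` (the soft lockup threshold is twice that value). The function name and its optional root-directory argument are assumptions added for illustration and dry runs.

```shell
# Print the soft lockup related sysctls as seen by the running kernel.
# The optional first argument overrides the /proc/sys root (hypothetical, for dry runs).
show_softlockup_settings() {
    root="${1:-/proc/sys}"
    for key in softlockup_panic watchdog_thresh; do
        file="$root/kernel/$key"
        if [ -r "$file" ]; then
            printf 'kernel.%s = %s\n' "$key" "$(cat "$file")"
        else
            printf 'kernel.%s = (not available)\n' "$key"
        fi
    done
}
```

If `kernel.softlockup_panic` prints 0 after a reboot, the `sysctl -w` command above has to be run again before the next attempt.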
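Crash error analysis
vmcore-dmesg.txt shows that the kernel panicked with "softlockup: hung tasks". The soft lockup stack is different every time, and the RIP register in the vmcore points to a different address each time.
What the stacks have in common is that they are all related to softirq context.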
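To compare the panics side by side, each vmcore-dmesg.txt can be reduced to one line holding the panicking CPU, the task name and the first kernel RIP symbol printed after the panic message. This is only a sketch: `summarize_panics` is a made-up name, and it assumes the "CPU: ... PID: ... Comm: ..." and "RIP: 0010:..." lines follow the "Kernel panic - not syncing" line, as they do in the dumps collected here.

```shell
# Print one summary line per vmcore-dmesg.txt found under a crash directory:
# <dump directory> cpu=<n> comm=<task> rip=<symbol+offset/size>
summarize_panics() {
    root="${1:-/var/crash}"
    find "$root" -type f -name 'vmcore-dmesg.txt' | sort | while IFS= read -r f; do
        awk -v dump="$(basename "$(dirname "$f")")" '
            # Only look at lines that come after the panic message.
            /Kernel panic - not syncing/ { seen = 1; next }
            seen && cpu == ""  && match($0, /CPU: [0-9]+/)      { cpu  = substr($0, RSTART + 5,  RLENGTH - 5) }
            seen && comm == "" && match($0, /Comm: [^ ]+/)      { comm = substr($0, RSTART + 6,  RLENGTH - 6) }
            seen && rip == ""  && match($0, /RIP: 0010:[^ ]+/)  { rip  = substr($0, RSTART + 10, RLENGTH - 10) }
            END { printf "%s cpu=%s comm=%s rip=%s\n", dump, cpu, comm, rip }
        ' "$f"
    done
}
```

Running `summarize_panics /var/crash` on the node gives one line per dump, which makes it easy to see at a glance which tasks and functions were interrupted by the watchdog.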
| [root@nm08-az2-compute-hcm3ne-10e8e94e236 crash]
Processing file: /var/crash/127.0.0.1-2025-05-08-13:13:40/vmcore-dmesg.txt /var/crash/127.0.0.1-2025-05-08-13:13:40/vmcore-dmesg.txt:[ 9487.514630] Kernel panic - not syncing: softlockup: hung tasks /var/crash/127.0.0.1-2025-05-08-13:13:40/vmcore-dmesg.txt:[ 9487.521917] CPU: 123 PID: 637 Comm: ksoftirqd/123 Kdump: loaded Tainted: G L 5.10.0-136.12.0.90.ctl3.x86_64 /var/crash/127.0.0.1-2025-05-08-13:13:40/vmcore-dmesg.txt:[ 9487.535305] Hardware name: Suma MH221/62DC24-C, BIOS 08.03.04.71.14 03/14/2025 /var/crash/127.0.0.1-2025-05-08-13:13:40/vmcore-dmesg.txt:[ 9487.544135] Call Trace: /var/crash/127.0.0.1-2025-05-08-13:13:40/vmcore-dmesg.txt:[ 9487.547641] <IRQ> /var/crash/127.0.0.1-2025-05-08-13:13:40/vmcore-dmesg.txt:[ 9487.550656] dump_stack+0x57/0x6e /var/crash/127.0.0.1-2025-05-08-13:13:40/vmcore-dmesg.txt:[ 9487.555136] panic+0xfb/0x2dc /var/crash/127.0.0.1-2025-05-08-13:13:40/vmcore-dmesg.txt:[ 9487.559227] watchdog_timer_fn.cold+0xc/0x16 /var/crash/127.0.0.1-2025-05-08-13:13:40/vmcore-dmesg.txt:[ 9487.564776] __run_hrtimer+0x5e/0x190 /var/crash/127.0.0.1-2025-05-08-13:13:40/vmcore-dmesg.txt:[ 9487.569643] __hrtimer_run_queues+0x81/0xf0 /var/crash/127.0.0.1-2025-05-08-13:13:40/vmcore-dmesg.txt:[ 9487.575088] hrtimer_interrupt+0x110/0x2c0 /var/crash/127.0.0.1-2025-05-08-13:13:40/vmcore-dmesg.txt:[ 9487.580433] __sysvec_apic_timer_interrupt+0x5f/0xe0 /var/crash/127.0.0.1-2025-05-08-13:13:40/vmcore-dmesg.txt:[ 9487.586755] asm_call_irq_on_stack+0x12/0x20 /var/crash/127.0.0.1-2025-05-08-13:13:40/vmcore-dmesg.txt:[ 9487.592297] </IRQ> /var/crash/127.0.0.1-2025-05-08-13:13:40/vmcore-dmesg.txt:[ 9487.595417] sysvec_apic_timer_interrupt+0x72/0x80 /var/crash/127.0.0.1-2025-05-08-13:13:40/vmcore-dmesg.txt:[ 9487.601544] asm_sysvec_apic_timer_interrupt+0x12/0x20 /var/crash/127.0.0.1-2025-05-08-13:13:40/vmcore-dmesg.txt:[ 9487.608059] RIP: 0010:packet_rcv+0x35e/0x3d0 /var/crash/127.0.0.1-2025-05-08-13:13:40/vmcore-dmesg.txt:[ 9487.613599] Code: 00 0f 84 01 fd ff ff 8b 
85 f4 00 00 00 83 f8 01 0f 84 f2 fc ff ff 8b 04 24 4c 89 b5 e8 00 00 00 89 45 70 e9 e0 fc ff ff 3c 04 <0f> 85 4b fd ff ff 0f b7 b5 b4 00 00 00 48 03 b5 e0 00 00 00 48 89 /var/crash/127.0.0.1-2025-05-08-13:13:40/vmcore-dmesg.txt:[ 9487.635335] RSP: 0018:ffffc3e1daa27c30 EFLAGS: 00000293 /var/crash/127.0.0.1-2025-05-08-13:13:40/vmcore-dmesg.txt:[ 9487.641944] RAX: 0000000000000001 RBX: ffffa10c55cf2000 RCX: ffffffffc09125c0 /var/crash/127.0.0.1-2025-05-08-13:13:40/vmcore-dmesg.txt:[ 9487.650677] RDX: ffffa0ec593d2e00 RSI: 000000000000001c RDI: ffffa04ceaf37f00 /var/crash/127.0.0.1-2025-05-08-13:13:40/vmcore-dmesg.txt:[ 9487.659417] RBP: ffffa04ceaf37f00 R08: ffffa04ceaf37f00 R09: ffffc3e1daa27ca0 /var/crash/127.0.0.1-2025-05-08-13:13:40/vmcore-dmesg.txt:[ 9487.668159] R10: 00000000e0ccdeeb R11: 0000000000000001 R12: ffffa0ec593d2800 /var/crash/127.0.0.1-2025-05-08-13:13:40/vmcore-dmesg.txt:[ 9487.676901] R13: ffffa0ec53d90000 R14: ffffa04cdc202652 R15: 000000000000001c /var/crash/127.0.0.1-2025-05-08-13:13:40/vmcore-dmesg.txt:[ 9487.685647] __netif_receive_skb_list_core+0x27b/0x2e0 /var/crash/127.0.0.1-2025-05-08-13:13:40/vmcore-dmesg.txt:[ 9487.692160] __netif_receive_skb_list+0xfd/0x190 /var/crash/127.0.0.1-2025-05-08-13:13:40/vmcore-dmesg.txt:[ 9487.698081] netif_receive_skb_list_internal+0xfc/0x1e0 /var/crash/127.0.0.1-2025-05-08-13:13:40/vmcore-dmesg.txt:[ 9487.704681] napi_complete_done+0x6f/0x190 /var/crash/127.0.0.1-2025-05-08-13:13:40/vmcore-dmesg.txt:[ 9487.710032] virtnet_poll+0x14e/0x210 [virtio_net] /var/crash/127.0.0.1-2025-05-08-13:13:40/vmcore-dmesg.txt:[ 9487.716158] napi_poll+0x95/0x1c0 /var/crash/127.0.0.1-2025-05-08-13:13:40/vmcore-dmesg.txt:[ 9487.720633] net_rx_action+0xaa/0x1b0 /var/crash/127.0.0.1-2025-05-08-13:13:40/vmcore-dmesg.txt:[ 9487.725491] __do_softirq+0xc4/0x280 /var/crash/127.0.0.1-2025-05-08-13:13:40/vmcore-dmesg.txt:[ 9487.730260] run_ksoftirqd+0x1e/0x30 /var/crash/127.0.0.1-2025-05-08-13:13:40/vmcore-dmesg.txt:[ 9487.735026] 
smpboot_thread_fn+0xc5/0x160 /var/crash/127.0.0.1-2025-05-08-13:13:40/vmcore-dmesg.txt:[ 9487.740277] ? sort_range+0x20/0x20 /var/crash/127.0.0.1-2025-05-08-13:13:40/vmcore-dmesg.txt:[ 9487.744940] kthread+0xfe/0x140 /var/crash/127.0.0.1-2025-05-08-13:13:40/vmcore-dmesg.txt:[ 9487.749220] ? kthread_park+0x90/0x90 /var/crash/127.0.0.1-2025-05-08-13:13:40/vmcore-dmesg.txt:[ 9487.754083] ret_from_fork+0x22/0x30 /var/crash/127.0.0.1-2025-05-08-13:13:40/vmcore-dmesg.txt:[ 9487.761662] kexec: Bye!
Processing file: /var/crash/127.0.0.1-2025-05-08-16:07:16/vmcore-dmesg.txt /var/crash/127.0.0.1-2025-05-08-16:07:16/vmcore-dmesg.txt:[ 1967.954987] Kernel panic - not syncing: softlockup: hung tasks /var/crash/127.0.0.1-2025-05-08-16:07:16/vmcore-dmesg.txt:[ 1967.962276] CPU: 5 PID: 40 Comm: ksoftirqd/5 Kdump: loaded Tainted: G L 5.10.0-136.12.0.90.ctl3.x86_64 /var/crash/127.0.0.1-2025-05-08-16:07:16/vmcore-dmesg.txt:[ 1967.975187] Hardware name: Suma MH221/62DC24-C, BIOS 08.03.04.71.14 03/14/2025 /var/crash/127.0.0.1-2025-05-08-16:07:16/vmcore-dmesg.txt:[ 1967.984024] Call Trace: /var/crash/127.0.0.1-2025-05-08-16:07:16/vmcore-dmesg.txt:[ 1967.987532] <IRQ> /var/crash/127.0.0.1-2025-05-08-16:07:16/vmcore-dmesg.txt:[ 1967.990554] dump_stack+0x57/0x6e /var/crash/127.0.0.1-2025-05-08-16:07:16/vmcore-dmesg.txt:[ 1967.995034] panic+0xfb/0x2dc /var/crash/127.0.0.1-2025-05-08-16:07:16/vmcore-dmesg.txt:[ 1967.999127] watchdog_timer_fn.cold+0xc/0x16 /var/crash/127.0.0.1-2025-05-08-16:07:16/vmcore-dmesg.txt:[ 1968.004674] __run_hrtimer+0x5e/0x190 /var/crash/127.0.0.1-2025-05-08-16:07:16/vmcore-dmesg.txt:[ 1968.009539] __hrtimer_run_queues+0x81/0xf0 /var/crash/127.0.0.1-2025-05-08-16:07:16/vmcore-dmesg.txt:[ 1968.014985] hrtimer_interrupt+0x110/0x2c0 /var/crash/127.0.0.1-2025-05-08-16:07:16/vmcore-dmesg.txt:[ 1968.020336] ? 
handle_irq_event+0x76/0xc0 /var/crash/127.0.0.1-2025-05-08-16:07:16/vmcore-dmesg.txt:[ 1968.025595] __sysvec_apic_timer_interrupt+0x5f/0xe0 /var/crash/127.0.0.1-2025-05-08-16:07:16/vmcore-dmesg.txt:[ 1968.031917] sysvec_apic_timer_interrupt+0x31/0x80 /var/crash/127.0.0.1-2025-05-08-16:07:16/vmcore-dmesg.txt:[ 1968.038040] asm_sysvec_apic_timer_interrupt+0x12/0x20 /var/crash/127.0.0.1-2025-05-08-16:07:16/vmcore-dmesg.txt:[ 1968.044544] RIP: 0010:_raw_spin_trylock+0x14/0x30 /var/crash/127.0.0.1-2025-05-08-16:07:16/vmcore-dmesg.txt:[ 1968.050571] Code: 84 00 00 00 00 00 0f 1f 44 00 00 c6 07 00 56 9d e9 01 67 32 00 90 0f 1f 44 00 00 8b 07 85 c0 75 15 ba 01 00 00 00 f0 0f b1 17 <75> 0a b8 01 00 00 00 e9 e0 66 32 00 31 c0 e9 d9 66 32 00 66 0f 1f /var/crash/127.0.0.1-2025-05-08-16:07:16/vmcore-dmesg.txt:[ 1968.072296] RSP: 0018:ffffbcb00057cf30 EFLAGS: 00000246 /var/crash/127.0.0.1-2025-05-08-16:07:16/vmcore-dmesg.txt:[ 1968.078906] RAX: 0000000000000000 RBX: 000000010018d79c RCX: 00000000000006ff /var/crash/127.0.0.1-2025-05-08-16:07:16/vmcore-dmesg.txt:[ 1968.087648] RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffffffff9995ac54 /var/crash/127.0.0.1-2025-05-08-16:07:16/vmcore-dmesg.txt:[ 1968.096381] RBP: ffffa02a860b5540 R08: 0000000000000000 R09: 0000000000000000 /var/crash/127.0.0.1-2025-05-08-16:07:16/vmcore-dmesg.txt:[ 1968.105115] R10: 00000000ffffffff R11: 0000000000000000 R12: 000000000006a5ab /var/crash/127.0.0.1-2025-05-08-16:07:16/vmcore-dmesg.txt:[ 1968.113854] R13: 0000000000000001 R14: 0000000000000001 R15: ffffa00b9980b400 /var/crash/127.0.0.1-2025-05-08-16:07:16/vmcore-dmesg.txt:[ 1968.122601] rebalance_domains+0x2a6/0x3b0 /var/crash/127.0.0.1-2025-05-08-16:07:16/vmcore-dmesg.txt:[ 1968.127952] __do_softirq+0xc4/0x280 /var/crash/127.0.0.1-2025-05-08-16:07:16/vmcore-dmesg.txt:[ 1968.132720] asm_call_irq_on_stack+0x12/0x20 /var/crash/127.0.0.1-2025-05-08-16:07:16/vmcore-dmesg.txt:[ 1968.138261] </IRQ> 
/var/crash/127.0.0.1-2025-05-08-16:07:16/vmcore-dmesg.txt:[ 1968.141382] do_softirq_own_stack+0x37/0x50 /var/crash/127.0.0.1-2025-05-08-16:07:16/vmcore-dmesg.txt:[ 1968.146831] irq_exit_rcu+0xbe/0x100 /var/crash/127.0.0.1-2025-05-08-16:07:16/vmcore-dmesg.txt:[ 1968.151599] common_interrupt+0x74/0x130 /var/crash/127.0.0.1-2025-05-08-16:07:16/vmcore-dmesg.txt:[ 1968.156758] asm_common_interrupt+0x1e/0x40 /var/crash/127.0.0.1-2025-05-08-16:07:16/vmcore-dmesg.txt:[ 1968.162205] RIP: 0010:smpboot_thread_fn+0x8b/0x160 /var/crash/127.0.0.1-2025-05-08-16:07:16/vmcore-dmesg.txt:[ 1968.168329] Code: 75 1e 8b 7d 00 65 8b 15 83 38 b0 68 39 d7 0f 85 e4 00 00 00 e8 b6 c2 ce 00 c7 45 04 02 00 00 00 e8 4a 76 ff ff eb 95 8b 7d 00 <65> 8b 05 5e 38 b0 68 39 c7 75 6e 8b 45 04 85 c0 74 2d 83 f8 02 74 /var/crash/127.0.0.1-2025-05-08-16:07:16/vmcore-dmesg.txt:[ 1968.190065] RSP: 0018:ffffbcb0195b3ef0 EFLAGS: 00000246 /var/crash/127.0.0.1-2025-05-08-16:07:16/vmcore-dmesg.txt:[ 1968.196674] RAX: 0000000000000000 RBX: ffffa00b98f4c000 RCX: 0000000000000000 /var/crash/127.0.0.1-2025-05-08-16:07:16/vmcore-dmesg.txt:[ 1968.205416] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000005 /var/crash/127.0.0.1-2025-05-08-16:07:16/vmcore-dmesg.txt:[ 1968.214158] RBP: ffffa00b98d2d9e0 R08: ffffa00bdabb80e8 R09: ffffa00bdabb80e8 /var/crash/127.0.0.1-2025-05-08-16:07:16/vmcore-dmesg.txt:[ 1968.222899] R10: 0000000000000308 R11: 00000000e0ccdeeb R12: ffffffff98c4c300 /var/crash/127.0.0.1-2025-05-08-16:07:16/vmcore-dmesg.txt:[ 1968.231641] R13: ffffbcb000147cb0 R14: ffffa00b98d2d9e0 R15: ffffa00b98f4c000 /var/crash/127.0.0.1-2025-05-08-16:07:16/vmcore-dmesg.txt:[ 1968.240381] ? sort_range+0x20/0x20 /var/crash/127.0.0.1-2025-05-08-16:07:16/vmcore-dmesg.txt:[ 1968.245052] kthread+0xfe/0x140 /var/crash/127.0.0.1-2025-05-08-16:07:16/vmcore-dmesg.txt:[ 1968.249336] ? 
kthread_park+0x90/0x90 /var/crash/127.0.0.1-2025-05-08-16:07:16/vmcore-dmesg.txt:[ 1968.254200] ret_from_fork+0x22/0x30 /var/crash/127.0.0.1-2025-05-08-16:07:16/vmcore-dmesg.txt:[ 1968.261766] kexec: Bye!
Processing file: /var/crash/127.0.0.1-2025-05-08-13:34:06/vmcore-dmesg.txt /var/crash/127.0.0.1-2025-05-08-13:34:06/vmcore-dmesg.txt:[ 967.047867] Kernel panic - not syncing: softlockup: hung tasks /var/crash/127.0.0.1-2025-05-08-13:34:06/vmcore-dmesg.txt:[ 967.055158] CPU: 119 PID: 617 Comm: ksoftirqd/119 Kdump: loaded Tainted: G L 5.10.0-136.12.0.90.ctl3.x86_64 /var/crash/127.0.0.1-2025-05-08-13:34:06/vmcore-dmesg.txt:[ 967.068554] Hardware name: Suma MH221/62DC24-C, BIOS 08.03.04.71.14 03/14/2025 /var/crash/127.0.0.1-2025-05-08-13:34:06/vmcore-dmesg.txt:[ 967.077383] Call Trace: /var/crash/127.0.0.1-2025-05-08-13:34:06/vmcore-dmesg.txt:[ 967.080892] <IRQ> /var/crash/127.0.0.1-2025-05-08-13:34:06/vmcore-dmesg.txt:[ 967.083921] dump_stack+0x57/0x6e /var/crash/127.0.0.1-2025-05-08-13:34:06/vmcore-dmesg.txt:[ 967.088401] panic+0xfb/0x2dc /var/crash/127.0.0.1-2025-05-08-13:34:06/vmcore-dmesg.txt:[ 967.092495] watchdog_timer_fn.cold+0xc/0x16 /var/crash/127.0.0.1-2025-05-08-13:34:06/vmcore-dmesg.txt:[ 967.098042] __run_hrtimer+0x5e/0x190 /var/crash/127.0.0.1-2025-05-08-13:34:06/vmcore-dmesg.txt:[ 967.102907] __hrtimer_run_queues+0x81/0xf0 /var/crash/127.0.0.1-2025-05-08-13:34:06/vmcore-dmesg.txt:[ 967.108354] hrtimer_interrupt+0x110/0x2c0 /var/crash/127.0.0.1-2025-05-08-13:34:06/vmcore-dmesg.txt:[ 967.113710] __sysvec_apic_timer_interrupt+0x5f/0xe0 /var/crash/127.0.0.1-2025-05-08-13:34:06/vmcore-dmesg.txt:[ 967.120035] asm_call_irq_on_stack+0x12/0x20 /var/crash/127.0.0.1-2025-05-08-13:34:06/vmcore-dmesg.txt:[ 967.125582] </IRQ> /var/crash/127.0.0.1-2025-05-08-13:34:06/vmcore-dmesg.txt:[ 967.128701] sysvec_apic_timer_interrupt+0x72/0x80 /var/crash/127.0.0.1-2025-05-08-13:34:06/vmcore-dmesg.txt:[ 967.134826] asm_sysvec_apic_timer_interrupt+0x12/0x20 /var/crash/127.0.0.1-2025-05-08-13:34:06/vmcore-dmesg.txt:[ 967.141342] RIP: 0010:ipt_do_table+0x23c/0x4a0 [ip_tables] /var/crash/127.0.0.1-2025-05-08-13:34:06/vmcore-dmesg.txt:[ 967.148243] Code: 24 28 4c 89 fd 49 89 df 4c 
89 f3 49 89 d6 49 8b 46 08 49 8d 56 20 48 89 de 48 89 ef 48 89 54 24 50 48 89 44 24 48 48 8b 40 30 <e8> df 9e 02 d8 84 c0 0f 84 22 01 00 00 41 0f b7 06 49 01 c6 41 0f /var/crash/127.0.0.1-2025-05-08-13:34:06/vmcore-dmesg.txt:[ 967.169977] RSP: 0018:ffff9c06da9777f8 EFLAGS: 00000283 /var/crash/127.0.0.1-2025-05-08-13:34:06/vmcore-dmesg.txt:[ 967.176587] RAX: ffffffffc0801000 RBX: ffff9c06da977840 RCX: 0000000000000000 /var/crash/127.0.0.1-2025-05-08-13:34:06/vmcore-dmesg.txt:[ 967.185329] RDX: ffff8de3eab904e8 RSI: ffff9c06da977840 RDI: ffff8de3df166000 /var/crash/127.0.0.1-2025-05-08-13:34:06/vmcore-dmesg.txt:[ 967.194063] RBP: ffff8de3df166000 R08: ffff8ec2cfbccc80 R09: 0000000000000001 /var/crash/127.0.0.1-2025-05-08-13:34:06/vmcore-dmesg.txt:[ 967.202804] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffffc07dc3d8 /var/crash/127.0.0.1-2025-05-08-13:34:06/vmcore-dmesg.txt:[ 967.211547] R13: ffff8ea35ccb1000 R14: ffff8de3eab904c8 R15: ffff8de3eab90458 /var/crash/127.0.0.1-2025-05-08-13:34:06/vmcore-dmesg.txt:[ 967.220298] ? 0xffffffffc0801000 /var/crash/127.0.0.1-2025-05-08-13:34:06/vmcore-dmesg.txt:[ 967.224777] ? ipt_do_table+0x326/0x4a0 [ip_tables] /var/crash/127.0.0.1-2025-05-08-13:34:06/vmcore-dmesg.txt:[ 967.231007] nf_hook_slow+0x3f/0xb0 /var/crash/127.0.0.1-2025-05-08-13:34:06/vmcore-dmesg.txt:[ 967.235679] ip_output+0xdf/0x120 /var/crash/127.0.0.1-2025-05-08-13:34:06/vmcore-dmesg.txt:[ 967.240157] ? 
__ip_finish_output+0x160/0x160 /var/crash/127.0.0.1-2025-05-08-13:34:06/vmcore-dmesg.txt:[ 967.245797] __ip_queue_xmit+0x196/0x420 /var/crash/127.0.0.1-2025-05-08-13:34:06/vmcore-dmesg.txt:[ 967.250956] __tcp_transmit_skb+0x8d0/0x990 /var/crash/127.0.0.1-2025-05-08-13:34:06/vmcore-dmesg.txt:[ 967.256402] tcp_v4_do_rcv+0x140/0x1f0 /var/crash/127.0.0.1-2025-05-08-13:34:06/vmcore-dmesg.txt:[ 967.261364] tcp_v4_rcv+0xdf7/0x10d0 /var/crash/127.0.0.1-2025-05-08-13:34:06/vmcore-dmesg.txt:[ 967.266131] ip_protocol_deliver_rcu+0xb2/0x190 /var/crash/127.0.0.1-2025-05-08-13:34:06/vmcore-dmesg.txt:[ 967.271963] ip_local_deliver_finish+0x44/0x60 /var/crash/127.0.0.1-2025-05-08-13:34:06/vmcore-dmesg.txt:[ 967.277700] ip_sublist_rcv_finish+0x57/0x70 /var/crash/127.0.0.1-2025-05-08-13:34:06/vmcore-dmesg.txt:[ 967.283242] ip_list_rcv_finish.constprop.0+0x190/0x1c0 /var/crash/127.0.0.1-2025-05-08-13:34:06/vmcore-dmesg.txt:[ 967.289853] ip_list_rcv+0x135/0x160 /var/crash/127.0.0.1-2025-05-08-13:34:06/vmcore-dmesg.txt:[ 967.294627] __netif_receive_skb_list_core+0x2b0/0x2e0 /var/crash/127.0.0.1-2025-05-08-13:34:06/vmcore-dmesg.txt:[ 967.301146] __netif_receive_skb_list+0xfd/0x190 /var/crash/127.0.0.1-2025-05-08-13:34:06/vmcore-dmesg.txt:[ 967.307078] netif_receive_skb_list_internal+0xfc/0x1e0 /var/crash/127.0.0.1-2025-05-08-13:34:06/vmcore-dmesg.txt:[ 967.313688] gro_normal_one+0x77/0xa0 /var/crash/127.0.0.1-2025-05-08-13:34:06/vmcore-dmesg.txt:[ 967.318550] napi_gro_receive+0x152/0x190 /var/crash/127.0.0.1-2025-05-08-13:34:06/vmcore-dmesg.txt:[ 967.323808] virtnet_receive+0x85/0x1f0 [virtio_net] /var/crash/127.0.0.1-2025-05-08-13:34:06/vmcore-dmesg.txt:[ 967.330129] virtnet_poll+0x54/0x210 [virtio_net] /var/crash/127.0.0.1-2025-05-08-13:34:06/vmcore-dmesg.txt:[ 967.336158] napi_poll+0x95/0x1c0 /var/crash/127.0.0.1-2025-05-08-13:34:06/vmcore-dmesg.txt:[ 967.340636] net_rx_action+0xaa/0x1b0 /var/crash/127.0.0.1-2025-05-08-13:34:06/vmcore-dmesg.txt:[ 967.345505] __do_softirq+0xc4/0x280 
/var/crash/127.0.0.1-2025-05-08-13:34:06/vmcore-dmesg.txt:[ 967.350278] run_ksoftirqd+0x1e/0x30 /var/crash/127.0.0.1-2025-05-08-13:34:06/vmcore-dmesg.txt:[ 967.355048] smpboot_thread_fn+0xc5/0x160 /var/crash/127.0.0.1-2025-05-08-13:34:06/vmcore-dmesg.txt:[ 967.360304] ? sort_range+0x20/0x20 /var/crash/127.0.0.1-2025-05-08-13:34:06/vmcore-dmesg.txt:[ 967.364975] kthread+0xfe/0x140 /var/crash/127.0.0.1-2025-05-08-13:34:06/vmcore-dmesg.txt:[ 967.369258] ? kthread_park+0x90/0x90 /var/crash/127.0.0.1-2025-05-08-13:34:06/vmcore-dmesg.txt:[ 967.374125] ret_from_fork+0x22/0x30 /var/crash/127.0.0.1-2025-05-08-13:34:06/vmcore-dmesg.txt:[ 967.381718] kexec: Bye!
Processing file: /var/crash/127.0.0.1-2025-05-07-23:32:03/vmcore-dmesg.txt /var/crash/127.0.0.1-2025-05-07-23:32:03/vmcore-dmesg.txt:[30618.876250] Kernel panic - not syncing: softlockup: hung tasks /var/crash/127.0.0.1-2025-05-07-23:32:03/vmcore-dmesg.txt:[30618.883545] CPU: 6 PID: 45 Comm: ksoftirqd/6 Kdump: loaded Tainted: G L 5.10.0-136.12.0.90.ctl3.x86_64 /var/crash/127.0.0.1-2025-05-07-23:32:03/vmcore-dmesg.txt:[30618.896463] Hardware name: Suma MH221/62DC24-C, BIOS 08.03.04.71.14 03/14/2025 /var/crash/127.0.0.1-2025-05-07-23:32:03/vmcore-dmesg.txt:[30618.905310] Call Trace: /var/crash/127.0.0.1-2025-05-07-23:32:03/vmcore-dmesg.txt:[30618.908820] <IRQ> /var/crash/127.0.0.1-2025-05-07-23:32:03/vmcore-dmesg.txt:[30618.911852] dump_stack+0x57/0x6e /var/crash/127.0.0.1-2025-05-07-23:32:03/vmcore-dmesg.txt:[30618.916332] panic+0xfb/0x2dc /var/crash/127.0.0.1-2025-05-07-23:32:03/vmcore-dmesg.txt:[30618.920427] watchdog_timer_fn.cold+0xc/0x16 /var/crash/127.0.0.1-2025-05-07-23:32:03/vmcore-dmesg.txt:[30618.925979] __run_hrtimer+0x5e/0x190 /var/crash/127.0.0.1-2025-05-07-23:32:03/vmcore-dmesg.txt:[30618.930847] __hrtimer_run_queues+0x81/0xf0 /var/crash/127.0.0.1-2025-05-07-23:32:03/vmcore-dmesg.txt:[30618.936298] hrtimer_interrupt+0x110/0x2c0 /var/crash/127.0.0.1-2025-05-07-23:32:03/vmcore-dmesg.txt:[30618.941656] __sysvec_apic_timer_interrupt+0x5f/0xe0 /var/crash/127.0.0.1-2025-05-07-23:32:03/vmcore-dmesg.txt:[30618.947983] asm_call_irq_on_stack+0x12/0x20 /var/crash/127.0.0.1-2025-05-07-23:32:03/vmcore-dmesg.txt:[30618.953527] </IRQ> /var/crash/127.0.0.1-2025-05-07-23:32:03/vmcore-dmesg.txt:[30618.956652] sysvec_apic_timer_interrupt+0x72/0x80 /var/crash/127.0.0.1-2025-05-07-23:32:03/vmcore-dmesg.txt:[30618.962781] asm_sysvec_apic_timer_interrupt+0x12/0x20 /var/crash/127.0.0.1-2025-05-07-23:32:03/vmcore-dmesg.txt:[30618.969301] RIP: 0010:expire_timers+0xa3/0x110 /var/crash/127.0.0.1-2025-05-07-23:32:03/vmcore-dmesg.txt:[30618.975040] Code: 74 04 48 89 50 08 48 c7 45 
08 00 00 00 00 48 8b 75 18 4c 89 e7 4c 89 75 00 f6 45 22 20 75 9b c6 07 00 0f 1f 40 00 fb 4c 89 ea <48> 89 ef e8 55 fe ff ff 4c 89 e7 e8 2d 2e 95 00 49 c7 44 24 08 00 /var/crash/127.0.0.1-2025-05-07-23:32:03/vmcore-dmesg.txt:[30618.996778] RSP: 0018:ffffab60595dbe00 EFLAGS: 00000246 /var/crash/127.0.0.1-2025-05-07-23:32:03/vmcore-dmesg.txt:[30619.003390] RAX: 0000000000000000 RBX: ffffab60595dbe30 RCX: ffff9bf206124980 /var/crash/127.0.0.1-2025-05-07-23:32:03/vmcore-dmesg.txt:[30619.012134] RDX: 0000000101ce0168 RSI: ffffffffb93947d0 RDI: ffff9bf206124980 /var/crash/127.0.0.1-2025-05-07-23:32:03/vmcore-dmesg.txt:[30619.020876] RBP: ffff9c52a3d4e670 R08: 0000000000000000 R09: 0000000000000002 /var/crash/127.0.0.1-2025-05-07-23:32:03/vmcore-dmesg.txt:[30619.029619] R10: 0000000000000202 R11: ffffab60595dbe38 R12: ffff9bf206124980 /var/crash/127.0.0.1-2025-05-07-23:32:03/vmcore-dmesg.txt:[30619.038361] R13: 0000000101ce0168 R14: dead000000000122 R15: 0000000000000202 /var/crash/127.0.0.1-2025-05-07-23:32:03/vmcore-dmesg.txt:[30619.047111] ? tcp_delack_timer_handler+0x170/0x170 /var/crash/127.0.0.1-2025-05-07-23:32:03/vmcore-dmesg.txt:[30619.053334] run_timer_softirq+0x165/0x1d0 /var/crash/127.0.0.1-2025-05-07-23:32:03/vmcore-dmesg.txt:[30619.058691] ? asm_common_interrupt+0x1e/0x40 /var/crash/127.0.0.1-2025-05-07-23:32:03/vmcore-dmesg.txt:[30619.064329] ? sched_clock+0x5/0x10 /var/crash/127.0.0.1-2025-05-07-23:32:03/vmcore-dmesg.txt:[30619.069003] ? sched_clock_cpu+0xc/0xb0 /var/crash/127.0.0.1-2025-05-07-23:32:03/vmcore-dmesg.txt:[30619.074066] __do_softirq+0xc4/0x280 /var/crash/127.0.0.1-2025-05-07-23:32:03/vmcore-dmesg.txt:[30619.078841] run_ksoftirqd+0x1e/0x30 /var/crash/127.0.0.1-2025-05-07-23:32:03/vmcore-dmesg.txt:[30619.083613] smpboot_thread_fn+0xc5/0x160 /var/crash/127.0.0.1-2025-05-07-23:32:03/vmcore-dmesg.txt:[30619.088867] ? 
sort_range+0x20/0x20 /var/crash/127.0.0.1-2025-05-07-23:32:03/vmcore-dmesg.txt:[30619.093540] kthread+0xfe/0x140 /var/crash/127.0.0.1-2025-05-07-23:32:03/vmcore-dmesg.txt:[30619.097825] ? kthread_park+0x90/0x90 /var/crash/127.0.0.1-2025-05-07-23:32:03/vmcore-dmesg.txt:[30619.102694] ret_from_fork+0x22/0x30 /var/crash/127.0.0.1-2025-05-07-23:32:03/vmcore-dmesg.txt:[30619.110293] kexec: Bye!
Processing file: /var/crash/127.0.0.1-2025-05-08-10:31:14/vmcore-dmesg.txt /var/crash/127.0.0.1-2025-05-08-10:31:14/vmcore-dmesg.txt:[39291.316139] Kernel panic - not syncing: softlockup: hung tasks /var/crash/127.0.0.1-2025-05-08-10:31:14/vmcore-dmesg.txt:[39291.316139] CPU: 98 PID: 247602 Comm: ps Kdump: loaded Tainted: G L 5.10.0-136.12.0.90.ctl3.x86_64 /var/crash/127.0.0.1-2025-05-08-10:31:14/vmcore-dmesg.txt:[39291.316139] Hardware name: Suma MH221/62DC24-C, BIOS 08.03.04.71.14 03/14/2025 /var/crash/127.0.0.1-2025-05-08-10:31:14/vmcore-dmesg.txt:[39291.316139] Call Trace: /var/crash/127.0.0.1-2025-05-08-10:31:14/vmcore-dmesg.txt:[39291.316139] <IRQ> /var/crash/127.0.0.1-2025-05-08-10:31:14/vmcore-dmesg.txt:[39291.316205] dump_stack+0x57/0x6e /var/crash/127.0.0.1-2025-05-08-10:31:14/vmcore-dmesg.txt:[39291.316205] panic+0xfb/0x2dc /var/crash/127.0.0.1-2025-05-08-10:31:14/vmcore-dmesg.txt:[39291.316205] watchdog_timer_fn.cold+0xc/0x16 /var/crash/127.0.0.1-2025-05-08-10:31:14/vmcore-dmesg.txt:[39291.316205] __run_hrtimer+0x5e/0x190 /var/crash/127.0.0.1-2025-05-08-10:31:14/vmcore-dmesg.txt:[39291.316205] __hrtimer_run_queues+0x81/0xf0 /var/crash/127.0.0.1-2025-05-08-10:31:14/vmcore-dmesg.txt:[39291.316205] hrtimer_interrupt+0x110/0x2c0 /var/crash/127.0.0.1-2025-05-08-10:31:14/vmcore-dmesg.txt:[39291.316205] __sysvec_apic_timer_interrupt+0x5f/0xe0 /var/crash/127.0.0.1-2025-05-08-10:31:14/vmcore-dmesg.txt:[39291.316222] asm_call_irq_on_stack+0x12/0x20 /var/crash/127.0.0.1-2025-05-08-10:31:14/vmcore-dmesg.txt:[39291.316222] </IRQ> /var/crash/127.0.0.1-2025-05-08-10:31:14/vmcore-dmesg.txt:[39291.316222] sysvec_apic_timer_interrupt+0x72/0x80 /var/crash/127.0.0.1-2025-05-08-10:31:14/vmcore-dmesg.txt:[39291.316222] asm_sysvec_apic_timer_interrupt+0x12/0x20 /var/crash/127.0.0.1-2025-05-08-10:31:14/vmcore-dmesg.txt:[39291.316222] RIP: 0010:smp_call_function_single+0x9b/0x120 /var/crash/127.0.0.1-2025-05-08-10:31:14/vmcore-dmesg.txt:[39291.316222] Code: be e7 7d a9 00 01 ff 
00 0f 85 8e 00 00 00 85 c9 75 4c 48 c7 c6 80 67 03 00 65 48 03 35 e6 5f e7 7d 8b 46 08 a8 01 74 09 f3 90 <8b> 46 08 a8 01 75 f7 83 4e 08 01 4c 89 46 10 48 89 56 18 e8 8d fe /var/crash/127.0.0.1-2025-05-08-10:31:14/vmcore-dmesg.txt:[39291.316222] RSP: 0018:ffffbe79f6cdbc60 EFLAGS: 00000202 /var/crash/127.0.0.1-2025-05-08-10:31:14/vmcore-dmesg.txt:[39291.316222] RAX: 0000000000000001 RBX: 000023b608e9f502 RCX: 0000000000000000 /var/crash/127.0.0.1-2025-05-08-10:31:14/vmcore-dmesg.txt:[39291.316222] RDX: 0000000000000000 RSI: ffffa0ad4d436780 RDI: 000000000000007d /var/crash/127.0.0.1-2025-05-08-10:31:14/vmcore-dmesg.txt:[39291.316222] RBP: ffffbe79f6cdbca8 R08: ffffffff8203e8a0 R09: 000000000000007d /var/crash/127.0.0.1-2025-05-08-10:31:14/vmcore-dmesg.txt:[39291.316222] R10: 0000000000010000 R11: 0000000000000007 R12: 0000000000000001 /var/crash/127.0.0.1-2025-05-08-10:31:14/vmcore-dmesg.txt:[39291.316222] R13: 000000000001f8a8 R14: 0000000000000001 R15: 000000000eaa1b4d /var/crash/127.0.0.1-2025-05-08-10:31:14/vmcore-dmesg.txt:[39291.316222] ? cpu_show_retbleed+0x50/0x50 /var/crash/127.0.0.1-2025-05-08-10:31:14/vmcore-dmesg.txt:[39291.316222] ? ktime_get+0x38/0xa0 /var/crash/127.0.0.1-2025-05-08-10:31:14/vmcore-dmesg.txt:[39291.316222] arch_freq_prepare_all+0x8b/0xe0 /var/crash/127.0.0.1-2025-05-08-10:31:14/vmcore-dmesg.txt:[39291.316222] ? proc_reg_release+0x90/0x90 /var/crash/127.0.0.1-2025-05-08-10:31:14/vmcore-dmesg.txt:[39291.316222] cpuinfo_open+0xe/0x20 /var/crash/127.0.0.1-2025-05-08-10:31:14/vmcore-dmesg.txt:[39291.316222] do_dentry_open+0x14b/0x370 /var/crash/127.0.0.1-2025-05-08-10:31:14/vmcore-dmesg.txt:[39291.316222] do_open+0x1dc/0x320 /var/crash/127.0.0.1-2025-05-08-10:31:14/vmcore-dmesg.txt:[39291.316222] path_openat+0x10b/0x1d0 /var/crash/127.0.0.1-2025-05-08-10:31:14/vmcore-dmesg.txt:[39291.316222] do_filp_open+0x90/0x140 /var/crash/127.0.0.1-2025-05-08-10:31:14/vmcore-dmesg.txt:[39291.316222] ? 
rcu_nocb_try_bypass+0x1f3/0x370 /var/crash/127.0.0.1-2025-05-08-10:31:14/vmcore-dmesg.txt:[39291.316222] ? page_counter_try_charge+0x2f/0xc0 /var/crash/127.0.0.1-2025-05-08-10:31:14/vmcore-dmesg.txt:[39291.316222] ? files_cgroup_alloc_fd+0x5c/0x70 /var/crash/127.0.0.1-2025-05-08-10:31:14/vmcore-dmesg.txt:[39291.316222] do_sys_openat2+0x207/0x2e0 /var/crash/127.0.0.1-2025-05-08-10:31:14/vmcore-dmesg.txt:[39291.316222] __x64_sys_openat+0x54/0xa0 /var/crash/127.0.0.1-2025-05-08-10:31:14/vmcore-dmesg.txt:[39291.316222] do_syscall_64+0x40/0x80 /var/crash/127.0.0.1-2025-05-08-10:31:14/vmcore-dmesg.txt:[39291.316222] entry_SYSCALL_64_after_hwframe+0x61/0xc6 /var/crash/127.0.0.1-2025-05-08-10:31:14/vmcore-dmesg.txt:[39291.316222] RIP: 0033:0x7f0af54fc8eb /var/crash/127.0.0.1-2025-05-08-10:31:14/vmcore-dmesg.txt:[39291.316222] Code: 25 00 00 41 00 3d 00 00 41 00 74 4b 64 8b 04 25 18 00 00 00 85 c0 75 67 44 89 e2 48 89 ee bf 9c ff ff ff b8 01 01 00 00 0f 05 <48> 3d 00 f0 ff ff 0f 87 91 00 00 00 48 8b 4c 24 28 64 48 2b 0c 25 /var/crash/127.0.0.1-2025-05-08-10:31:14/vmcore-dmesg.txt:[39291.316222] RSP: 002b:00007ffd915fe0d0 EFLAGS: 00000246 ORIG_RAX: 0000000000000101 /var/crash/127.0.0.1-2025-05-08-10:31:14/vmcore-dmesg.txt:[39291.316222] RAX: ffffffffffffffda RBX: 000056199e62e2d0 RCX: 00007f0af54fc8eb /var/crash/127.0.0.1-2025-05-08-10:31:14/vmcore-dmesg.txt:[39291.316222] RDX: 0000000000000000 RSI: 00007f0af56eb7f0 RDI: 00000000ffffff9c /var/crash/127.0.0.1-2025-05-08-10:31:14/vmcore-dmesg.txt:[39291.316222] RBP: 00007f0af56eb7f0 R08: 0000000000000008 R09: 0000000000000001 /var/crash/127.0.0.1-2025-05-08-10:31:14/vmcore-dmesg.txt:[39291.316222] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 /var/crash/127.0.0.1-2025-05-08-10:31:14/vmcore-dmesg.txt:[39291.316222] R13: 000056199e62e2d0 R14: 0000000000000001 R15: 000056199e650e30 /var/crash/127.0.0.1-2025-05-08-10:31:14/vmcore-dmesg.txt:[39291.316383] kexec: Bye!
Processing file: /var/crash/127.0.0.1-2025-05-07-14:57:26/vmcore-dmesg.txt /var/crash/127.0.0.1-2025-05-07-14:57:26/vmcore-dmesg.txt:[12355.525521] Kernel panic - not syncing: softlockup: hung tasks /var/crash/127.0.0.1-2025-05-07-14:57:26/vmcore-dmesg.txt:[12355.532809] CPU: 10 PID: 65 Comm: ksoftirqd/10 Kdump: loaded Tainted: G L 5.10.0-136.12.0.90.ctl3.x86_64 /var/crash/127.0.0.1-2025-05-07-14:57:26/vmcore-dmesg.txt:[12355.545913] Hardware name: Suma MH221/62DC24-C, BIOS 08.03.04.71.14 03/14/2025 /var/crash/127.0.0.1-2025-05-07-14:57:26/vmcore-dmesg.txt:[12355.554750] Call Trace: /var/crash/127.0.0.1-2025-05-07-14:57:26/vmcore-dmesg.txt:[12355.558256] <IRQ> /var/crash/127.0.0.1-2025-05-07-14:57:26/vmcore-dmesg.txt:[12355.561271] dump_stack+0x57/0x6e /var/crash/127.0.0.1-2025-05-07-14:57:26/vmcore-dmesg.txt:[12355.565750] panic+0xfb/0x2dc /var/crash/127.0.0.1-2025-05-07-14:57:26/vmcore-dmesg.txt:[12355.569840] watchdog_timer_fn.cold+0xc/0x16 /var/crash/127.0.0.1-2025-05-07-14:57:26/vmcore-dmesg.txt:[12355.575386] __run_hrtimer+0x5e/0x190 /var/crash/127.0.0.1-2025-05-07-14:57:26/vmcore-dmesg.txt:[12355.580251] __hrtimer_run_queues+0x81/0xf0 /var/crash/127.0.0.1-2025-05-07-14:57:26/vmcore-dmesg.txt:[12355.585695] hrtimer_interrupt+0x110/0x2c0 /var/crash/127.0.0.1-2025-05-07-14:57:26/vmcore-dmesg.txt:[12355.591051] __sysvec_apic_timer_interrupt+0x5f/0xe0 /var/crash/127.0.0.1-2025-05-07-14:57:26/vmcore-dmesg.txt:[12355.597372] asm_call_irq_on_stack+0x12/0x20 /var/crash/127.0.0.1-2025-05-07-14:57:26/vmcore-dmesg.txt:[12355.602915] </IRQ> /var/crash/127.0.0.1-2025-05-07-14:57:26/vmcore-dmesg.txt:[12355.606035] sysvec_apic_timer_interrupt+0x72/0x80 /var/crash/127.0.0.1-2025-05-07-14:57:26/vmcore-dmesg.txt:[12355.612158] asm_sysvec_apic_timer_interrupt+0x12/0x20 /var/crash/127.0.0.1-2025-05-07-14:57:26/vmcore-dmesg.txt:[12355.618662] RIP: 0010:update_blocked_averages+0xd5/0x120 /var/crash/127.0.0.1-2025-05-07-14:57:26/vmcore-dmesg.txt:[12355.625358] Code: 1e 83 bd b8 09 
00 00 01 49 8b 5d 00 76 3b 48 8b b5 c0 09 00 00 31 d2 4c 89 ef e8 c6 84 cd 00 48 89 ef e8 8e 2c ff ff 41 54 9d <48> 8b 44 24 08 65 48 2b 04 25 28 00 00 00 75 2f 48 83 c4 10 5b 5d /var/crash/127.0.0.1-2025-05-07-14:57:26/vmcore-dmesg.txt:[12355.647092] RSP: 0018:ffffa2a799683e48 EFLAGS: 00000246 /var/crash/127.0.0.1-2025-05-07-14:57:26/vmcore-dmesg.txt:[12355.653701] RAX: ffff91b346320510 RBX: 0000000000000001 RCX: 0000000000000050 /var/crash/127.0.0.1-2025-05-07-14:57:26/vmcore-dmesg.txt:[12355.662442] RDX: 000000000000000a RSI: ffff91b3463355c0 RDI: ffff91b346335540 /var/crash/127.0.0.1-2025-05-07-14:57:26/vmcore-dmesg.txt:[12355.671183] RBP: ffff91b346335540 R08: 000008f4f22c8a69 R09: 000008f4f22c8a69 /var/crash/127.0.0.1-2025-05-07-14:57:26/vmcore-dmesg.txt:[12355.679922] R10: 0000000000000322 R11: 00000000d744fcc9 R12: 0000000000000246 /var/crash/127.0.0.1-2025-05-07-14:57:26/vmcore-dmesg.txt:[12355.688662] R13: 0000000000000000 R14: ffffffff936060f8 R15: 00000000000000a0 /var/crash/127.0.0.1-2025-05-07-14:57:26/vmcore-dmesg.txt:[12355.697404] run_rebalance_domains+0x3e/0x60 /var/crash/127.0.0.1-2025-05-07-14:57:26/vmcore-dmesg.txt:[12355.702935] __do_softirq+0xc4/0x280 /var/crash/127.0.0.1-2025-05-07-14:57:26/vmcore-dmesg.txt:[12355.707706] run_ksoftirqd+0x1e/0x30 /var/crash/127.0.0.1-2025-05-07-14:57:26/vmcore-dmesg.txt:[12355.712471] smpboot_thread_fn+0xc5/0x160 /var/crash/127.0.0.1-2025-05-07-14:57:26/vmcore-dmesg.txt:[12355.717720] ? sort_range+0x20/0x20 /var/crash/127.0.0.1-2025-05-07-14:57:26/vmcore-dmesg.txt:[12355.722389] kthread+0xfe/0x140 /var/crash/127.0.0.1-2025-05-07-14:57:26/vmcore-dmesg.txt:[12355.726669] ? kthread_park+0x90/0x90 /var/crash/127.0.0.1-2025-05-07-14:57:26/vmcore-dmesg.txt:[12355.731533] ret_from_fork+0x22/0x30 /var/crash/127.0.0.1-2025-05-07-14:57:26/vmcore-dmesg.txt:[12355.739114] kexec: Bye!
Processing file: /var/crash/127.0.0.1-2025-05-08-15:30:09/vmcore-dmesg.txt
/var/crash/127.0.0.1-2025-05-08-15:30:09/vmcore-dmesg.txt:[ 6703.222135] Kernel panic - not syncing: softlockup: hung tasks
/var/crash/127.0.0.1-2025-05-08-15:30:09/vmcore-dmesg.txt:[ 6703.229422] CPU: 13 PID: 79 Comm: migration/13 Kdump: loaded Tainted: G L 5.10.0-136.12.0.90.ctl3.x86_64
/var/crash/127.0.0.1-2025-05-08-15:30:09/vmcore-dmesg.txt:[ 6703.242519] Hardware name: Suma MH221/62DC24-C, BIOS 08.03.04.71.14 03/14/2025
/var/crash/127.0.0.1-2025-05-08-15:30:09/vmcore-dmesg.txt:[ 6703.251348] Call Trace:
/var/crash/127.0.0.1-2025-05-08-15:30:09/vmcore-dmesg.txt:[ 6703.254854] <IRQ>
/var/crash/127.0.0.1-2025-05-08-15:30:09/vmcore-dmesg.txt:[ 6703.257871] dump_stack+0x57/0x6e
/var/crash/127.0.0.1-2025-05-08-15:30:09/vmcore-dmesg.txt:[ 6703.262350] panic+0xfb/0x2dc
/var/crash/127.0.0.1-2025-05-08-15:30:09/vmcore-dmesg.txt:[ 6703.266442] watchdog_timer_fn.cold+0xc/0x16
/var/crash/127.0.0.1-2025-05-08-15:30:09/vmcore-dmesg.txt:[ 6703.271991] __run_hrtimer+0x5e/0x190
/var/crash/127.0.0.1-2025-05-08-15:30:09/vmcore-dmesg.txt:[ 6703.276856] __hrtimer_run_queues+0x81/0xf0
/var/crash/127.0.0.1-2025-05-08-15:30:09/vmcore-dmesg.txt:[ 6703.282302] hrtimer_interrupt+0x110/0x2c0
/var/crash/127.0.0.1-2025-05-08-15:30:09/vmcore-dmesg.txt:[ 6703.287645] __sysvec_apic_timer_interrupt+0x5f/0xe0
/var/crash/127.0.0.1-2025-05-08-15:30:09/vmcore-dmesg.txt:[ 6703.293964] asm_call_irq_on_stack+0x12/0x20
/var/crash/127.0.0.1-2025-05-08-15:30:09/vmcore-dmesg.txt:[ 6703.299506] </IRQ>
/var/crash/127.0.0.1-2025-05-08-15:30:09/vmcore-dmesg.txt:[ 6703.302627] sysvec_apic_timer_interrupt+0x72/0x80
/var/crash/127.0.0.1-2025-05-08-15:30:09/vmcore-dmesg.txt:[ 6703.308755] asm_sysvec_apic_timer_interrupt+0x12/0x20
/var/crash/127.0.0.1-2025-05-08-15:30:09/vmcore-dmesg.txt:[ 6703.315270] RIP: 0010:finish_task_switch+0x86/0x290
/var/crash/127.0.0.1-2025-05-08-15:30:09/vmcore-dmesg.txt:[ 6703.321491] Code: 00 0f 1f 44 00 00 0f 1f 44 00 00 41 c7 45 3c 00 00 00 00 0f 1f 44 00 00 4c 89 e7 e8 c4 fe ff ff fb 65 48 8b 04 25 80 f8 01 00 <e9> 45 00 00 00 4d 85 f6 74 21 65 48 8b 04 25 80 f8 01 00 4c 3b b0
/var/crash/127.0.0.1-2025-05-08-15:30:09/vmcore-dmesg.txt:[ 6703.343214] RSP: 0018:ffffb296d96ffe48 EFLAGS: 00000246
/var/crash/127.0.0.1-2025-05-08-15:30:09/vmcore-dmesg.txt:[ 6703.349824] RAX: ffff8b7ad90f8000 RBX: ffff8b7ad90f8000 RCX: 0000000000000000
/var/crash/127.0.0.1-2025-05-08-15:30:09/vmcore-dmesg.txt:[ 6703.358565] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8b99c64b5540
/var/crash/127.0.0.1-2025-05-08-15:30:09/vmcore-dmesg.txt:[ 6703.367307] RBP: ffffb296d96ffe70 R08: 0000000000000047 R09: 0000000000000047
/var/crash/127.0.0.1-2025-05-08-15:30:09/vmcore-dmesg.txt:[ 6703.376046] R10: ffff8b7ad91040c0 R11: 0000000000000000 R12: ffff8b99c64b5540
/var/crash/127.0.0.1-2025-05-08-15:30:09/vmcore-dmesg.txt:[ 6703.384785] R13: ffff8b7ad9104000 R14: 0000000000000000 R15: 0000000000000000
/var/crash/127.0.0.1-2025-05-08-15:30:09/vmcore-dmesg.txt:[ 6703.393529] ? finish_task_switch+0x7c/0x290
/var/crash/127.0.0.1-2025-05-08-15:30:09/vmcore-dmesg.txt:[ 6703.399072] __schedule+0x2f2/0x640
/var/crash/127.0.0.1-2025-05-08-15:30:09/vmcore-dmesg.txt:[ 6703.403734] schedule+0x46/0xb0
/var/crash/127.0.0.1-2025-05-08-15:30:09/vmcore-dmesg.txt:[ 6703.408021] smpboot_thread_fn+0x10b/0x160
/var/crash/127.0.0.1-2025-05-08-15:30:09/vmcore-dmesg.txt:[ 6703.413360] ? sort_range+0x20/0x20
/var/crash/127.0.0.1-2025-05-08-15:30:09/vmcore-dmesg.txt:[ 6703.418021] kthread+0xfe/0x140
/var/crash/127.0.0.1-2025-05-08-15:30:09/vmcore-dmesg.txt:[ 6703.422303] ? kthread_park+0x90/0x90
/var/crash/127.0.0.1-2025-05-08-15:30:09/vmcore-dmesg.txt:[ 6703.427167] ret_from_fork+0x22/0x30
/var/crash/127.0.0.1-2025-05-08-15:30:09/vmcore-dmesg.txt:[ 6703.434780] kexec: Bye!
|
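- find /var/crash -type f -name "vmcore-dmesg.txt": finds every vmcore-dmesg.txt file.
- -exec sh -c '...': runs a shell script for each file found.
- printf "\n\n\nProcessing file: %s\n" "$1":
  - printf "\n\n\n": emits three newlines (each \n is one line break).
  - Processing file: %s\n: prints the file name ($1 is the file path).
- awk "/Kernel panic/{p=1} p {print \"$1:\" \$0}" "$1": prints the line containing "Kernel panic" and every line after it, each prefixed with the file name.
- _ {}: _ is a placeholder that fills $0 of the inline script, and {} is the file path, which becomes $1.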
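Putting the pieces described above back together gives roughly the following command. This is a reconstruction from the breakdown, not a copy of the command that was actually run; the `dump_panics` wrapper and its directory argument are additions made here so the crash directory can be changed.

```shell
# Print everything from the first "Kernel panic" line onward for every
# vmcore-dmesg.txt under a crash directory, prefixing each line with its file.
# $1: crash directory (default /var/crash). The wrapper name is hypothetical.
dump_panics() {
    crash_dir=${1:-/var/crash}
    find "$crash_dir" -type f -name "vmcore-dmesg.txt" -exec sh -c '
        printf "\n\n\nProcessing file: %s\n" "$1"
        awk "/Kernel panic/{p=1} p {print \"$1:\" \$0}" "$1"
    ' _ {} \;
}

# Example: dump_panics /var/crash
```

Because the file path is pasted into the awk program text, this form assumes paths without double quotes or backslashes, which holds for kdump's default directory names.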
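Test scenario analysis
iperf traffic inside the VMs never reaches the host: it leaves directly through the net-vf. fio traffic inside the VMs goes through QEMU to the host's PF.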
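Traffic | Via QEMU | VF passthrough | Via host PF | Path
--- | --- | --- | --- | ---
iperf | No | Yes | No | net-vf passthrough, bypasses QEMU
fio (Ceph) | Yes | No | Yes | QEMU issues RADOS over TCP through the PF NIC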
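Many VMs run on the host. iperf traffic alone between VMs does not crash it, and fio traffic alone does not crash it either; once iperf and fio traffic are mixed, the host crashes.
[Core hypothesis] The mixed load causes resource contention or a softirq storm that crashes the host (for example, via soft lockup).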
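I. Possible causes
1. Kernel softirq saturation (ksoftirqd saturating the CPU)
- iperf is heavy UDP/TCP traffic passed straight through the VF → host PF.
- fio is QEMU user-space I/O → librbd → host kernel TCP stack → PF.
- Both load the PF NIC; the fio path also runs through the host TCP stack and softirq.
- Heavy mixed traffic may keep ksoftirqd running for long stretches, trigger a soft lockup, and eventually bring the host down.
2. The host PF NIC or its driver (e.g. mlx5_core) lacks processing capacity
- net-vf and QEMU + RBD traffic pile up on the PF at the same time.
- If the driver does not service interrupts in time, the NAPI receive backlog may grow without bound.
- Once the backlog fills the softnet structures, the result is packet drops, uninterruptible blocking, and CPU hangs.
3. Ceph librbd is coupled too tightly to the network stack / softirq path
- librbd may call RADOS synchronously (blocking socket send/recv);
- under heavy simultaneous send and receive → sockets block → QEMU stalls → the VM hangs;
- the host may then hit a soft lockup through process pile-up plus delayed kernel softirq handling.
4. Poor NUMA / IRQ affinity drags down the bottleneck CPUs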
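One way to test the backlog hypothesis in cause 2 is to read the kernel's per-CPU softnet counters. The sketch below decodes the first three hexadecimal columns of /proc/net/softnet_stat (packets processed, packets dropped because the backlog was full, and time_squeeze, the number of times the NET_RX budget ran out with work remaining). The function name is made up for this example, and rows are numbered by position, which is assumed to match the online CPU order.

```shell
# Decode the first three columns of a softnet_stat-format file.
# $1: file to read (default /proc/net/softnet_stat).
softnet_summary() {
    file=${1:-/proc/net/softnet_stat}
    row=0
    while read -r processed dropped squeezed _rest; do
        # Each column is a hexadecimal counter without a 0x prefix.
        printf 'row%d processed=%d dropped=%d time_squeeze=%d\n' \
            "$row" "$((0x$processed))" "$((0x$dropped))" "$((0x$squeezed))"
        row=$((row + 1))
    done < "$file"
}

# Example: softnet_summary
```

Steadily rising dropped or time_squeeze values on the CPUs that also carry the softirq load would support the backlog explanation; flat counters would argue against it.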
IRQ-off duration statistics
/proc/softirqs
The figure below shows that softirqs are heavily concentrated on a subset of CPUs.
17467712465771
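The same concentration can be read from /proc/softirqs without a screenshot. The following is a minimal sketch that ranks CPUs by the counter of one softirq type; `top_softirq_cpus` is a name chosen for this example, and NET_RX is assumed to be the row of interest.

```shell
# Print the busiest CPUs for one softirq type from a /proc/softirqs-format file.
# $1: softirq name (default NET_RX)
# $2: file to read (default /proc/softirqs)
# $3: number of rows to show (default 10)
top_softirq_cpus() {
    name=${1:-NET_RX}
    file=${2:-/proc/softirqs}
    rows=${3:-10}
    awk -v want="$name:" '
        NR == 1 { for (i = 1; i <= NF; i++) cpu[i] = $i; next }   # header: CPU0 CPU1 ...
        $1 == want { for (i = 2; i <= NF; i++) print cpu[i - 1], $i }
    ' "$file" | sort -k2,2nr | head -n "$rows"
}

# Example: top_softirq_cpus NET_RX /proc/softirqs 8
```

The counters are cumulative since boot, so two runs a few seconds apart give a better picture of the current rate than a single snapshot.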
ksoftirqd threads
trace-irqoff shows that the ksoftirqd threads keep softirqs disabled for long periods.
17467660274205
1746766102899
17467664157822
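mpstat
mpstat shows more CPU time spent in hard interrupts than in softirqs. That is abnormal: a hard interrupt handler only runs the top half, which should be very fast.
17467813142241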
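The mpstat observation can be cross-checked against the raw counters in /proc/stat, where each cpuN line lists user, nice, system, idle, iowait, irq and softirq time in clock ticks. This sketch flags the CPUs whose cumulative hard-interrupt time exceeds their softirq time; the function name is hypothetical.

```shell
# Compare hard-IRQ and softirq time per CPU from a /proc/stat-format file.
# $1: file to read (default /proc/stat).
irq_vs_softirq() {
    file=${1:-/proc/stat}
    awk '/^cpu[0-9]/ {
        # Field 7 is irq time and field 8 is softirq time, both in ticks.
        flag = ($7 + 0 > $8 + 0) ? " irq>softirq" : ""
        printf "%s irq=%d softirq=%d%s\n", $1, $7, $8, flag
    }' "$file"
}

# Example: irq_vs_softirq | grep 'irq>softirq'
```

These values are totals since boot, so they confirm the trend but not the moment it started.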
vmstat
vfio interrupt registration
17467813142242
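Intel appears to implement posted interrupts, which seem able to deliver VF interrupts straight into the VM; Hygon 4 apparently cannot.
This concerns an advanced optimization for SR-IOV VF interrupts: Posted Interrupts (PI), also described as interrupt remapping with posted interrupts.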
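1. What is a Posted Interrupt (PI)?
Posted Interrupts are an optimization supported by Intel VT-d that delivers device interrupts directly to the VM without going through the host's interrupt handling path. In other words:
- Traditional path: VF → host CPU → KVM/vfio → injected into the VM
- Posted Interrupt path: VF → delivered directly to the VM vCPU → guest kernel handles the interrupt
This greatly reduces host CPU overhead, especially with many NICs or high interrupt rates.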
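2. Requirements for Posted Interrupts
PI requires all of the following:
Requirement | Description
--- | ---
Intel VT-d with PI support | The hardware IOMMU must support Posted Interrupts
CPU with APICv | APIC virtualization is supported
KVM + vfio-pci | KVM has interrupt remapping enabled and vfio uses MSI/MSI-X
Intel architecture | VT-d Posted Interrupts are Intel-only (not available on AMD, Hygon, or Phytium; AMD's own counterpart is AVIC)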
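3. Why does Hygon 4 not support it?
Hygon 4 (x86, Hygon Dhyana) is based on the AMD Zen microarchitecture family, but so far it does not expose support for the Intel-style VT-d Posted Interrupt mechanism:
- Hygon supports SR-IOV and an IOMMU (via AMD-Vi);
- but Posted Interrupts are not implemented (or the kernel may simply lack support for them);
- VF interrupts still take the traditional path and still consume host CPU.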
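4. How to check whether Posted Interrupts are enabled
On an Intel platform, check with:
    dmesg | grep -i 'posted interrupt'
or inspect:
    cat /sys/module/kvm_intel/parameters/enable_apicv
On the Hygon platform these entries generally do not appear, which indicates no support.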
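A slightly more portable check looks at both KVM vendor modules. The sketch below prints whichever of the kvm_intel enable_apicv and kvm_amd avic module parameters exists. It assumes that the Hygon host loads kvm_amd, and it only reports the module setting: a value of Y or 1 does not by itself prove that device interrupts are being posted to guests. The function name and the optional sysfs root argument are illustrative.

```shell
# Report the APIC virtualization module parameters of KVM, if present.
# $1: root of the module tree (default /sys/module).
apicv_status() {
    root=${1:-/sys/module}
    found=0
    for p in kvm_intel/parameters/enable_apicv kvm_amd/parameters/avic; do
        if [ -r "$root/$p" ]; then
            printf '%s=%s\n' "$p" "$(cat "$root/$p")"
            found=1
        fi
    done
    [ "$found" -eq 1 ] || echo "no APICv/AVIC parameter found under $root"
}

# Example: apicv_status
```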
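5. Impact summary
Platform | VF interrupt path | Host CPU overhead | PI supported
--- | --- | --- | ---
Intel (PI supported) | VF → VM vCPU | Low | Yes
AMD / Hygon (without AVIC) | VF → Host CPU → VM | High | No
So what is observed on Hygon 4 (VF interrupts still burden the host CPU and can even trigger soft lockups) is expected behavior, and it follows from the lack of Posted Interrupt support.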
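Do VM VF interrupts preempt ksoftirqd?
Core question: how are VF interrupts and fio traffic related?
First, the two paths:
① iperf (VF network) traffic path inside the VM
- The VF (net-vf) is an SR-IOV device passed through to the VM;
- VF interrupts land directly on a host CPU and are handled by the host IRQ handler;
- the traffic does not go through QEMU or the host kernel network stack;
- it produces no ksoftirqd (NET_RX) load on the host;
- unless the VF interrupts are also configured as MSI-X and frequently interrupt the host's ksoftirqd.
② fio (virtio-blk) traffic path inside the VM
- QEMU emulates the disk through virtio-blk → Ceph/rbd/PF physical device on the host;
- the I/O request ends up going through the host block layer → network layer → PF NIC;
- this requires host softirqs (such as BLOCK, NET_TX, NET_RX) along the way;
- so fio generates real softirq and network interrupt pressure on the host.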
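The key is the preemption: a VF interrupt can interrupt ksoftirqd in the middle of softirq processing
VF traffic does not follow the fio path, but when a VF interrupt arrives, the following can happen:
- the VF interrupt fires on some host CPU (the core selected by its interrupt affinity);
- that core happens to be running ksoftirqd/n, processing PF softirqs (for example, Ceph transmits);
- the VF interrupt preempts it, softirqs pile up, and ksoftirqd/n is delayed;
- the softirq backlog grows and eventually triggers a soft lockup warning.
The symptoms described earlier indicate that:
- VF interrupts concentrated on certain host cores keep interrupting ksoftirqd;
- meanwhile, the softirqs (net_tx/net_rx) that fio's PF path needs cannot be processed in time;
- result: fio slows down + ksoftirqd times out + soft lockup.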
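Whether VF interrupts really pile onto a few host cores can be checked from /proc/interrupts. The sketch below sums the per-CPU counts of all vfio interrupts, assuming they appear there under the vfio-pci naming `vfio-msi[N](BDF)` or `vfio-msix[N](BDF)`; the function name is hypothetical.

```shell
# Sum vfio interrupt counts per CPU from a /proc/interrupts-format file
# and list the most heavily hit CPUs first.
# $1: file to read (default /proc/interrupts).
vfio_irqs_per_cpu() {
    file=${1:-/proc/interrupts}
    awk '
        NR == 1 { n = NF; for (i = 1; i <= n; i++) cpu[i] = $i; next }  # header row
        /vfio-msi/ { for (i = 1; i <= n; i++) sum[i] += $(i + 1) }      # skip the "NN:" column
        END { for (i = 1; i <= n; i++) printf "%s %d\n", cpu[i], sum[i] }
    ' "$file" | sort -k2,2nr
}

# Example: vfio_irqs_per_cpu | head
```

If the CPUs at the top of this list are the same ones that dominate the NET_RX and NET_TX rows of /proc/softirqs, that overlap is consistent with the preemption scenario described above.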
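Summary:
Source | Interrupts the host CPU | Runs host softirqs | Uses the host network path
--- | --- | --- | ---
VF interrupts | Yes, frequently | No | No
fio blk traffic | No, but depends on softirq | Yes | Yes (e.g. Ceph → PF)
So the core issue is not that VF interrupts "notify" fio, but that VF interrupts disturb the host softirq execution that fio depends on.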
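Solutions
Testing confirmed two workable solutions:
1. Set the FPGA interrupt coalescing register 0x4770 to 0x8f, 0x87, or 0x85. All three values tested clean: 0x8f ran for 20 hours, 0x87 for a little over an hour, and 0x85 for 11 hours (register 0x20a80 was left at its default of 0).
2. Manually set the CPU affinity of the vfio interrupts so that they are balanced. The test script then ran overnight without the original problem recurring, and the IRQ load on the node0 CPUs dropped.
17470987709495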
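The manual balancing in the second solution could be scripted along these lines. This is a sketch under assumptions rather than the script used in the test: it assigns the vfio IRQs round-robin to a CPU list supplied by the operator, `plan_vfio_affinity` is a made-up name, and the vfio interrupts are again assumed to be identifiable by `vfio-msi` in /proc/interrupts.

```shell
# Print "IRQ CPU" pairs that spread vfio interrupts round-robin over a CPU list.
# $1: space-separated CPU list (must not be empty), e.g. "0 1 2 3"
# $2: file to read (default /proc/interrupts)
plan_vfio_affinity() {
    cpus=$1
    file=${2:-/proc/interrupts}
    awk -v cpus="$cpus" '
        BEGIN { n = split(cpus, cpu, " ") }
        /vfio-msi/ {
            irq = $1
            sub(/:$/, "", irq)               # "123:" -> "123"
            print irq, cpu[(k++ % n) + 1]    # next CPU in the list, wrapping around
        }' "$file"
}

# Applying the plan (as root, with irqbalance stopped so it cannot undo it):
#   systemctl stop irqbalance
#   plan_vfio_affinity "0 1 2 3" | while read -r irq cpu; do
#       echo "$cpu" > "/proc/irq/$irq/smp_affinity_list"
#   done
```

Separating the plan from the write step makes it possible to review the assignment before touching /proc/irq.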
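Outcome
When vfio registers an interrupt, it sets the interrupt's smp_affinity to all CPUs of the NUMA node that hosts the VF device (node0). irqbalance rewrites this smp_affinity, but it was observed to move it only to a single CPU on node0. Setting smp_affinity manually requires disabling irqbalance first.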
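The affinity behavior described here can be observed directly by listing each vfio IRQ together with its current smp_affinity_list. The sketch assumes the `vfio-msi` naming in /proc/interrupts; the function name and the optional procfs root argument are illustrative.

```shell
# List each vfio MSI/MSI-X IRQ with the CPUs it is currently allowed to hit.
# $1: procfs root (default /proc); a saved copy can be passed for offline analysis.
vfio_irq_affinity() {
    root=${1:-/proc}
    awk '/vfio-msi/ { sub(/:$/, "", $1); print $1 }' "$root/interrupts" |
    while read -r irq; do
        printf 'irq %s -> cpus %s\n' "$irq" "$(cat "$root/irq/$irq/smp_affinity_list")"
    done
}

# Example: vfio_irq_affinity
```

Running it once right after the VM starts and again after irqbalance has had time to act should show the move from the whole node0 CPU range to a single CPU.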