kernel NULL deref regression with linux kernel 6.1.0-34-amd64
This was the scenario earlier today:
root@dal-node-01:~# gnt-node list
Node DTotal DFree MTotal MNode MFree Pinst Sinst
dal-node-01.torproject.org 5.2T 783.5G 503.5G 54.1G 224.2G 11 10
dal-node-02.torproject.org ? ? ? ? ? 14 12
dal-node-03.torproject.org 5.2T 2.8T 503.5G 13.2G 406.8G 13 12
After rebooting -02
, it came back to the cluster, but then several drdb disks were not syncing properly (WFConnection
and WFBitMapT
in /proc/drbd
) and there were kernel errors all around, for example:
[ 3461.048094] BUG: kernel NULL pointer dereference, address: 0000000000000000
[ 3461.055550] #PF: supervisor instruction fetch in kernel mode
[ 3461.061610] #PF: error_code(0x0010) - not-present page
We noticed there was a kernel upgrade recently and reverted from 6.1.0-34
to -33
, and now we don't see sync issues or kernel errors anymore.
Update: it also happens on tb-build-03.
Edited by anarcat