Skip to content

kernel NULL deref regression with linux kernel 6.1.0-34-amd64

This was the scenario earlier today:

root@dal-node-01:~# gnt-node list
Node                       DTotal  DFree MTotal MNode  MFree Pinst Sinst
dal-node-01.torproject.org   5.2T 783.5G 503.5G 54.1G 224.2G    11    10
dal-node-02.torproject.org      ?      ?      ?     ?      ?    14    12
dal-node-03.torproject.org   5.2T   2.8T 503.5G 13.2G 406.8G    13    12

After rebooting -02, it came back to the cluster, but then several drdb disks were not syncing properly (WFConnection and WFBitMapT in /proc/drbd) and there were kernel errors all around, for example:

[ 3461.048094] BUG: kernel NULL pointer dereference, address: 0000000000000000
[ 3461.055550] #PF: supervisor instruction fetch in kernel mode
[ 3461.061610] #PF: error_code(0x0010) - not-present page

We noticed there was a kernel upgrade recently and reverted from 6.1.0-34 to -33, and now we don't see sync issues or kernel errors anymore.

Update: it also happens on tb-build-03.

Edited by anarcat
To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information