disk failure on dal-rescue-01
First diagnostic
looks like a disk failed on dal-rescue-01.
follow the raid howto (the failed-disk playbook linked in the alert below), consider cross-shipping dal-rescue-02.
there was also a warning about HTTPS not being reachable
Alerts and mails
Date: Sun, 04 Jan 2026 19:14:21 +0000
From: alertmanager@prometheus-03.torproject.org
To: torproject-admin@torproject.org
Reply-To: tpa-team@lists.torproject.org
Subject: RAIDDegraded RAID array on dal-rescue-01.torproject.org is degraded
Total firing alerts: 1
## Firing Alerts
-----
Time: 2026-01-04 19:13:51.784 +0000 UTC
Summary: RAID array on dal-rescue-01.torproject.org is degraded
Description: The md1 RAID array on dal-rescue-01.torproject.org has failed: 1 disks failed in device md1
playbook: https://gitlab.torproject.org/tpo/tpa/team/-/wikis/howto/raid#failed-disk
-----
Date: Sun, 04 Jan 2026 19:12:57 +0000
From: mdadm monitoring <root@dal-rescue-01>
To: root@dal-rescue-01.torproject.org
Subject: Fail event on /dev/md/1:dal-rescue-01
This is an automatically generated mail message.
Fail event detected on md device /dev/md/1, component device /dev/sda3
The /proc/mdstat file currently contains the following:
Personalities : [raid1] [raid0] [raid6] [raid5] [raid4] [raid10]
md0 : active raid1 sda2[3] mmcblk0p2[2]
306176 blocks super 1.2 [2/2] [UU]
md1 : active raid1 sda3[3](F) mmcblk0p3[2]
3564544 blocks super 1.2 [2/1] [U_]
unused devices: <none>
Date: Sun, 04 Jan 2026 20:15:24 +0000
From: mdadm monitoring <root@dal-rescue-01>
To: root@dal-rescue-01.torproject.org
Subject: Fail event on /dev/md/0:dal-rescue-01
This is an automatically generated mail message.
Fail event detected on md device /dev/md/0, component device /dev/sda2
The /proc/mdstat file currently contains the following:
Personalities : [raid1] [raid0] [raid6] [raid5] [raid4] [raid10]
md0 : active raid1 mmcblk0p2[2]
306176 blocks super 1.2 [2/1] [U_]
md1 : active raid1 sda3[3](F) mmcblk0p3[2]
3564544 blocks super 1.2 [2/1] [U_]
unused devices: <none>
Current status
Roles
- Lead: unless otherwise noted, the issue assignee
- Operations:
- Communications:
- Planning:
Next steps
- inspect state of ejected disk. try re-adding disk to array to see if it is accepted
- if problem persists, ship dal-rescue-02 to the Dallas DC. once rescue-02 is at the datacenter, get rescue-01 shipped back so we can get it fixed locally
  - in this case, the network configuration of dal-rescue-02 needs to be adjusted before it is shipped
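
A rough sketch of the first step, in case it helps: the snippet below parses the `/proc/mdstat` text from the second mdadm mail above, lists which arrays are degraded and which members are marked failed, and prints the mdadm commands one would probably run to inspect and re-add a failed member. The function names (`parse_mdstat`, `degraded`, `readd_commands`) are made up for this example, and the `/dev/md1` and `/dev/sda3` style paths are assumed from the mdstat names; check them on the host before running anything.

```python
import re

# Sample copied from the second mdadm mail in this issue.
MDSTAT = """\
Personalities : [raid1] [raid0] [raid6] [raid5] [raid4] [raid10]
md0 : active raid1 mmcblk0p2[2]
306176 blocks super 1.2 [2/1] [U_]
md1 : active raid1 sda3[3](F) mmcblk0p3[2]
3564544 blocks super 1.2 [2/1] [U_]
unused devices: <none>
"""

ARRAY_RE = re.compile(r"^(md\d+) : (\w+) (\w+) (.*)$")
MEMBER_RE = re.compile(r"(\w+)\[\d+\](?:\((\w)\))?")
COUNT_RE = re.compile(r"\[(\d+)/(\d+)\]")


def parse_mdstat(text):
    """Return {array: {level, active, failed, expected, working}}."""
    arrays = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        header = ARRAY_RE.match(line)
        if header:
            name, _state, level, members = header.groups()
            current = {"level": level, "active": [], "failed": [],
                       "expected": None, "working": None}
            for m in MEMBER_RE.finditer(members):
                # Only the (F) flag is treated specially; other flags
                # (for example spares) are lumped in with "active" here.
                key = "failed" if m.group(2) == "F" else "active"
                current[key].append(m.group(1))
            arrays[name] = current
            continue
        count = COUNT_RE.search(line)
        if current is not None and count and current["expected"] is None:
            # "[2/1]" means 2 devices expected, 1 currently working.
            current["expected"] = int(count.group(1))
            current["working"] = int(count.group(2))
    return arrays


def degraded(arrays):
    """Keep only arrays with fewer working devices than expected."""
    return {name: info for name, info in arrays.items()
            if info["working"] is not None
            and info["working"] < info["expected"]}


def readd_commands(array, device):
    """Commands to inspect a failed member and try re-adding it.

    Assumption: device nodes are /dev/<array> and /dev/<device>.
    """
    md, dev = f"/dev/{array}", f"/dev/{device}"
    return [
        f"mdadm --detail {md}",        # current view of the array
        f"mdadm --examine {dev}",      # superblock on the ejected member
        f"mdadm {md} --remove {dev}",  # drop the faulty entry first
        f"mdadm {md} --re-add {dev}",  # then see if it is accepted again
    ]


if __name__ == "__main__":
    for name, info in degraded(parse_mdstat(MDSTAT)).items():
        print(f"{name}: {info['working']}/{info['expected']} working, "
              f"failed members: {info['failed'] or 'none listed'}")
        for device in info["failed"]:
            for command in readd_commands(name, device):
                print("  " + command)
```

Note that in that mail `sda2` had already dropped out of the `md0` line entirely, so the parser reports `md0` as degraded without naming a failed member; the device name for `md0` has to come from the mail text itself.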
Dashboards
Post-mortem
Edited by lelutin