/srv filesystem failure on tb-build-03
today i heard this from @richard:
13:49:14 <+richard> anarcat, lavamind: so i'm trying to do a build on tb-build-03
13:49:23 <+richard> and i can't seem to create any files?
13:49:35 <+richard> eg touch foo -> touch: cannot touch 'foo': Read-only file system
that started happening "very shortly before" that message, according to richard. so this is recent.
filesystem was remounted read-only:
13:55:10 <+anarcat> /dev/sdc on /srv type ext4 (ro,relatime)
13:55:10 <+anarcat> /dev/sdc on /home type ext4 (ro,relatime)
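to see every filesystem that got flipped to read-only in one go, something like this could work. it's a minimal sketch: `list_ro_mounts` is a made-up helper name, and it assumes the usual /proc/mounts column layout (device, mount point, type, options).

```shell
# list_ro_mounts: print the mount point of every filesystem whose option
# list contains the exact option "ro". reads /proc/mounts by default, or
# a file in the same format given as the first argument (hypothetical helper).
list_ro_mounts() {
    awk '{
        n = split($4, opts, ",")
        for (i = 1; i <= n; i++)
            if (opts[i] == "ro") { print $2; break }
    }' "${1:-/proc/mounts}"
}
```

matching the option exactly avoids false positives on healthy mounts that merely carry `errors=remount-ro`.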
and there were errors like this in dmesg:
13:55:44 <+anarcat> [960186.849508] EXT4-fs warning (device sdc): ext4_end_bio:323: I/O error 5 writing to inode 28189202 (offset 8388608 size 4567040 starting block 159154688)
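(error 5 is EIO on Linux, so the write failed below the filesystem layer.) to pull out which devices ext4 is complaining about, a filter along these lines might help. it's only a sketch: `ext4_error_devices` is a hypothetical name, and it assumes the kernel messages keep the `EXT4-fs warning (device X)` / `EXT4-fs error (device X)` shape seen above.

```shell
# ext4_error_devices: read kernel log lines on stdin and print the unique
# device names mentioned in EXT4-fs warning/error messages (hypothetical helper).
ext4_error_devices() {
    sed -nE 's/.*EXT4-fs (error|warning) \(device ([^)]+)\).*/\2/p' | sort -u
}

# intended use:
#   dmesg | ext4_error_devices
```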
i naively rebooted the box, and now it fails to boot with this on the console:
[FAILED] Failed to start File Syste…5-6f79-40cf-8406-8d484132ffc6.
See 'systemctl status "systemd-fsck@dev…\x2d8d484132ffc6.service"' for details.
[DEPEND] Dependency failed for /srv.
[DEPEND] Dependency failed for Local File Systems.
[DEPEND] Dependency failed for /home.
in the rescue shell, trying to mount the filesystems gives me:
root@tb-build-03:~# mount -a
mount: /srv: wrong fs type, bad option, bad superblock on /dev/sdc, missing codepage or helper program, or other error.
mount: /home: special device /srv/home does not exist.
(ignore the /home error, it's a bind mount.)
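before trying any repair from the rescue shell, a read-only check with `e2fsck -n` (opens the filesystem read-only and answers "no" to everything) would show how bad it is without writing to the disk. the exit status is a bitmask documented in e2fsck(8); the decoder below is a sketch, `describe_fsck_status` is a name i made up, and it only covers the four lowest bits.

```shell
# describe_fsck_status: turn an e2fsck exit status (a bitmask, see e2fsck(8))
# into a short human-readable summary. hypothetical helper, covers bits 1-8 only.
describe_fsck_status() {
    status=$1
    if [ "$status" -eq 0 ]; then
        echo "no errors"
        return 0
    fi
    msg=""
    if [ $(( status & 1 )) -ne 0 ]; then msg="$msg, errors corrected"; fi
    if [ $(( status & 2 )) -ne 0 ]; then msg="$msg, reboot needed"; fi
    if [ $(( status & 4 )) -ne 0 ]; then msg="$msg, errors left uncorrected"; fi
    if [ $(( status & 8 )) -ne 0 ]; then msg="$msg, operational error"; fi
    if [ -z "$msg" ]; then msg=", other (see e2fsck(8))"; fi
    echo "${msg#, }"
}

# intended use (read-only, makes no changes to the filesystem):
#   e2fsck -n /dev/sdc; describe_fsck_status $?
```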
so something is up with that filesystem. apparently:
14:04:12 <+richard> fwiw this happened in the middle of a build
14:04:49 <+richard> so it just broke (or read/writing to disk exposed something that broke awhile ago)
action plan:
- reboot in case that just fixes it (nope, it won't boot now)
- investigate state on the SAN
- compare with the state on the ganeti node...
- ... and from inside the VM
- possibly: recreate the LUN on the SAN?
- alternative: recreate the box from scratch, it doesn't have backups anyway (and the drives were set up oddly to begin with)
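for the "from inside the VM" part of the plan, one cheap comparison (my assumption, nothing in the logs above confirms it is relevant) is whether the ext4 superblock claims more bytes than the block device currently offers. the sketch below assumes the superblock is still readable by `dumpe2fs -h` and that its output has the usual `Block count:` and `Block size:` lines; `fs_bytes_from_dumpe2fs` is a hypothetical helper name.

```shell
# fs_bytes_from_dumpe2fs: read `dumpe2fs -h` output on stdin and print the
# filesystem size in bytes (block count times block size). prints 0 if
# either field is missing. hypothetical helper.
fs_bytes_from_dumpe2fs() {
    hdr=$(cat)
    count=$(printf '%s\n' "$hdr" | awk -F: '/^Block count:/ { gsub(/[ \t]/, "", $2); print $2 }')
    size=$(printf '%s\n' "$hdr" | awk -F: '/^Block size:/ { gsub(/[ \t]/, "", $2); print $2 }')
    echo $(( ${count:-0} * ${size:-0} ))
}

# intended use (needs root, both commands only read):
#   fs=$(dumpe2fs -h /dev/sdc 2>/dev/null | fs_bytes_from_dumpe2fs)
#   dev=$(blockdev --getsize64 /dev/sdc)
#   [ "$fs" -le "$dev" ] || echo "filesystem is larger than the device"
```

the same `blockdev --getsize64` number can then be compared against what the ganeti node and the SAN report for the LUN.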