Commit e465b607 (verified), authored 2 years ago by anarcat
document the ghost disk error that occurred after team#40910
parent ff88340e
howto/ganeti.md: 52 additions, 0 deletions
@@ -1956,6 +1956,58 @@ to remove the logical volumes on the target node:
    ssh fsn-node-06.torproject.org -tt lvremove vg_ganeti/abf0eeac-55a0-4ccc-b8a0-adb0d8d67cf7.disk1_meta
    ssh fsn-node-06.torproject.org -tt lvremove vg_ganeti/abf0eeac-55a0-4ccc-b8a0-adb0d8d67cf7.disk1_data

### Cleaning up ghost disks

Under certain circumstances, you might end up with "ghost" disks, for
example:

    Tue Oct 4 13:24:07 2022 - ERROR: cluster : ghost disk 'ed225e68-83af-40f7-8d8c-cf7e46adad54' in temporary DRBD map

It's unclear how this happens, but in this specific case it is
believed the problem occurred because a disk failed to be added to an
instance being resized.

It's *possible* this is a situation similar to the one above, in which
case you must first find *where* the ghost disk is, with something
like:

    gnt-cluster command 'lvs --noheadings' | grep 'ed225e68-83af-40f7-8d8c-cf7e46adad54'

If this finds a device, you can remove it as normal:

    ssh fsn-node-06.torproject.org -tt lvremove vg_ganeti/ed225e68-83af-40f7-8d8c-cf7e46adad54.disk1_data

... but in this case, the DRBD map is *not* associated with a logical
volume. You can also check the `dmsetup` output for a match:

    gnt-cluster command 'dmsetup ls' | grep 'ed225e68-83af-40f7-8d8c-cf7e46adad54'
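
When `gnt-cluster verify` reports several ghost disks, the two searches above can be repeated for each UUID. The following is only a minimal sketch, not a tested procedure: the function names `ghost_disk_uuids` and `find_ghost_disk` and the `GNT_COMMAND` override are made up for illustration, and it assumes the error lines look exactly like the example above.

```shell
# Print the unique ghost disk UUIDs found in `gnt-cluster verify` output
# read from standard input. Assumes the message format shown above.
ghost_disk_uuids() {
    sed -n "s/.*ghost disk '\([0-9a-f-]*\)' in temporary DRBD map.*/\1/p" | sort -u
}

# Search logical volumes and device mapper entries on all nodes for one
# UUID. Returns 0 if at least one match was printed, 1 otherwise.
# GNT_COMMAND is a hypothetical override (handy for dry runs); by default
# the same `gnt-cluster command` calls as above are used.
find_ghost_disk() {
    ghost_uuid="$1"
    ghost_found=1
    for ghost_probe in 'lvs --noheadings' 'dmsetup ls'; do
        # stdin is redirected so a surrounding `while read` loop is not drained
        if ${GNT_COMMAND:-gnt-cluster command} "$ghost_probe" </dev/null \
            | grep -F -- "$ghost_uuid"; then
            ghost_found=0
        fi
    done
    return "$ghost_found"
}

# Example, assuming the errors are written to standard output:
#   gnt-cluster verify | ghost_disk_uuids | while read -r uuid; do
#       find_ghost_disk "$uuid" || echo "$uuid: not found on any node"
#   done
```

A UUID reported as not found on any node is a candidate for the stray-data case described below.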

According to [this discussion](https://groups.google.com/g/ganeti/c/s5qoh26T1yA),
restarting Ganeti on all nodes might clear out the issue:

    gnt-cluster command 'service ganeti restart'

If *none* of the "ghost" disks mentioned are actually found anywhere
in the cluster, either in the device mapper or in logical volumes, they
might just be stray data left over in the data file.

So it *looks* like the proper way to fix this is to *remove* the
temporary file where this data is stored:

    gnt-cluster command 'grep ed225e68-83af-40f7-8d8c-cf7e46adad54 /var/lib/ganeti/tempres.data'
    ssh ... service ganeti stop
    ssh ... rm /var/lib/ganeti/tempres.data
    ssh ... service ganeti start
    gnt-cluster verify
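
The stop, remove, start sequence above could be wrapped in a small helper. This is a hedged sketch only: the function name `reset_tempres`, the `DRY_RUN` switch and the `.bak` suffix are hypothetical, and it deliberately moves the file aside instead of deleting it, which is a more cautious variation on the `rm` above rather than what was actually run. Which nodes to pass is left to the operator.

```shell
# Reset the tempres.data file on each node given as an argument.
# Set DRY_RUN=1 to print the ssh commands instead of running them.
# Assumption: the file is moved to a .bak copy rather than removed.
reset_tempres() {
    for node in "$@"; do
        for step in \
            'service ganeti stop' \
            'mv /var/lib/ganeti/tempres.data /var/lib/ganeti/tempres.data.bak' \
            'service ganeti start'
        do
            if [ "${DRY_RUN:-0}" = 1 ]; then
                echo "ssh $node $step"
            else
                # Stop at the first failure so the problem can be inspected;
                # note that Ganeti may be left stopped on that node.
                ssh "$node" "$step" || return 1
            fi
        done
    done
}
```

Running it first with `DRY_RUN=1` shows exactly what would be executed. As above, finish with `gnt-cluster verify` to check whether the warning is gone.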

That solution was proposed in [this discussion](https://groups.google.com/g/ganeti/c/SMR3yNek3Js).
Anarcat toured the Ganeti source code and found that the
`ComputeDRBDMap` function, in the Haskell codebase, basically just
sucks the data out of that `tempres.data` JSON file, and dumps it into
the Python side of things. Then the Python code looks for those disks
in its internal disk list and compares. It is therefore pretty
unlikely that the warning would happen with the disks still being
around.
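
The comparison described above boils down to a set difference: a disk is a "ghost" when it appears in the temporary DRBD map but not in the list of known disks. The snippet below only illustrates that idea and is not Ganeti's code. The function name `ghost_candidates` and both input files (one UUID per line) are hypothetical, and how such lists would be produced is left open.

```shell
# Print the UUIDs listed in the first file but missing from the second.
# $1: UUIDs present in the temporary DRBD map (hypothetical input file)
# $2: UUIDs of disks known to the cluster (hypothetical input file)
ghost_candidates() {
    # -F fixed strings, -x whole-line match, -v keep only non-matching lines
    grep -vxFf "$2" "$1"
}
```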

### Fixing inconsistent disks

Sometimes `gnt-cluster verify` will give this error:
...
...