Skip to content
Snippets Groups Projects
Closed colchicifolium backups are barely functional
  • View options
  • colchicifolium backups are barely functional

  • View options
  • Closed Issue created by anarcat

    colchicifolium's backup frequently come under the attention of Nagios:

    2022-02-07 09:25:38 <nsa> tor-nagios: [colchicifolium] backup - bacula - last backup is WARNING: WARN: Last backup of colchicifolium.torproject.org/any is 1.27 days old.
    2022-02-09 06:25:48 <nsa> tor-nagios: [colchicifolium] backup - bacula - last backup is CRITICAL: CRITICAL: Last backup of colchicifolium.torproject.org/any is 3.14 days old.
    2022-02-11 08:25:44 <nsa> tor-nagios: [colchicifolium] backup - bacula - last backup is OK: OK: Last backup of colchicifolium.torproject.org/any is 0.04 days old.
    2022-02-17 00:40:40 <nsa> tor-nagios: [colchicifolium] backup - bacula - last backup is WARNING: WARN: Last backup of colchicifolium.torproject.org/any is 1.42 days old.
    2022-02-18 16:40:38 <nsa> tor-nagios: [colchicifolium] backup - bacula - last backup is CRITICAL: CRITICAL: Last backup of colchicifolium.torproject.org/any is 3.08 days old.
    2022-02-19 02:40:45 <nsa> tor-nagios: [colchicifolium] backup - bacula - last backup is OK: OK: Last backup of colchicifolium.torproject.org/any is 0.03 days old.
    2022-02-26 15:55:43 <nsa> tor-nagios: [colchicifolium] backup - bacula - last backup is WARNING: WARN: Last backup of colchicifolium.torproject.org/any is 1.35 days old.
    2022-02-28 07:55:50 <nsa> tor-nagios: [colchicifolium] backup - bacula - last backup is CRITICAL: CRITICAL: Last backup of colchicifolium.torproject.org/any is 3.02 days old.
    2022-03-01 08:25:47 <nsa> tor-nagios: [colchicifolium] backup - bacula - last backup is CRITICAL: CRITICAL: Last backup of colchicifolium.torproject.org/any is 3.85 days old.

    The current backup has been running since 2022-02-27 18:37:53, about 4 days ago at the time of writing.

    This is the timing of the known jobs on that server in the current Bacula director:

    bacula=# SELECT level, jobstatus, starttime, endtime, (CASE WHEN endtime IS NULL THEN NOW() ELSE endtime END)-starttime AS duration, jobfiles, jobbytes FROM job WHERE name='colchicifolium.torproject.org' ORDER by starttime;
     level | jobstatus |      starttime      |       endtime       |        duration        | jobfiles |   jobbytes   
    -------+-----------+---------------------+---------------------+------------------------+----------+--------------
     I     | f         | 2015-12-03 09:57:02 | 2015-12-03 09:57:02 | 00:00:00               |        0 |            0
     D     | f         | 2017-12-09 00:35:08 | 2017-12-09 00:35:08 | 00:00:00               |        0 |            0
     D     | f         | 2019-03-05 18:15:28 | 2019-03-05 18:15:28 | 00:00:00               |        0 |            0
     F     | f         | 2019-07-22 14:06:13 | 2019-07-22 14:06:13 | 00:00:00               |        0 |            0
     I     | f         | 2019-09-07 20:02:52 | 2019-09-07 20:02:52 | 00:00:00               |        0 |            0
     I     | f         | 2020-12-11 02:06:57 | 2020-12-11 02:06:57 | 00:00:00               |        0 |            0
     F     | T         | 2021-10-30 04:18:48 | 2021-10-31 05:32:59 | 1 day 01:14:11         |  2973523 | 409597402632
     F     | T         | 2021-12-10 06:06:18 | 2021-12-12 01:41:37 | 1 day 19:35:19         |  3404504 | 456273938172
     D     | E         | 2022-01-12 15:03:53 | 2022-01-14 21:57:32 | 2 days 06:53:39        |  5029124 | 123658942337
     D     | T         | 2022-01-15 01:57:38 | 2022-01-17 17:24:20 | 2 days 15:26:42        |  5457677 | 130269432219
     F     | T         | 2022-01-19 22:33:54 | 2022-01-22 14:41:49 | 2 days 16:07:55        |  4336473 | 516207537019
     I     | T         | 2022-01-26 14:12:52 | 2022-01-26 16:25:40 | 02:12:48               |   185016 |   7712392837
     I     | T         | 2022-01-27 14:06:35 | 2022-01-27 16:47:50 | 02:41:15               |   188625 |   8433225061
     D     | T         | 2022-01-28 06:21:56 | 2022-01-28 18:13:24 | 11:51:28               |  1364571 |  28815354895
     I     | T         | 2022-01-29 06:41:31 | 2022-01-29 10:12:46 | 03:31:15               |   178896 |  33790932680
     I     | T         | 2022-01-30 04:46:21 | 2022-01-30 07:10:41 | 02:24:20               |   177074 |   7298789209
     I     | T         | 2022-01-31 04:19:19 | 2022-01-31 13:18:59 | 08:59:40               |   203085 |  37604120762
     I     | T         | 2022-02-01 04:11:16 | 2022-02-01 07:11:08 | 02:59:52               |   195922 |  41592974842
     I     | T         | 2022-02-02 04:30:15 | 2022-02-02 06:39:15 | 02:09:00               |   190243 |   8548513453
     I     | T         | 2022-02-03 02:55:37 | 2022-02-03 06:25:57 | 03:30:20               |   186250 |   6138223644
     I     | T         | 2022-02-04 01:06:54 | 2022-02-04 04:19:46 | 03:12:52               |   187868 |   8892468359
     I     | T         | 2022-02-05 01:46:11 | 2022-02-05 04:09:50 | 02:23:39               |   194623 |   8754299644
     I     | T         | 2022-02-06 01:45:57 | 2022-02-06 08:02:29 | 06:16:32               |   208416 |   9582975941
     D     | T         | 2022-02-06 21:07:00 | 2022-02-11 12:31:37 | 4 days 15:24:37        |  3428690 |  57424284749
     I     | T         | 2022-02-11 12:38:30 | 2022-02-11 18:52:52 | 06:14:22               |   590289 |  18987945922
     I     | T         | 2022-02-12 14:03:10 | 2022-02-12 16:36:49 | 02:33:39               |   190798 |   6760825592
     I     | T         | 2022-02-13 13:45:42 | 2022-02-13 15:34:05 | 01:48:23               |   189130 |   7132469485
     I     | T         | 2022-02-14 15:19:05 | 2022-02-14 18:58:24 | 03:39:19               |   199895 |   6797607219
     I     | T         | 2022-02-15 15:25:05 | 2022-02-15 19:40:27 | 04:15:22               |   199052 |   8115940960
     D     | T         | 2022-02-15 20:24:17 | 2022-02-19 06:54:49 | 3 days 10:30:32        |  4967994 |  77854030910
     I     | T         | 2022-02-19 07:02:32 | 2022-02-19 18:23:59 | 11:21:27               |   496812 |  24270098875
     I     | T         | 2022-02-20 07:45:46 | 2022-02-20 10:45:13 | 02:59:27               |   174086 |   7179666980
     I     | T         | 2022-02-21 06:57:49 | 2022-02-21 11:51:18 | 04:53:29               |   182035 |  15512560970
     I     | T         | 2022-02-22 05:10:39 | 2022-02-22 07:57:01 | 02:46:22               |   172397 |   7210544658
     I     | T         | 2022-02-23 06:36:44 | 2022-02-23 13:17:10 | 06:40:26               |   211809 |  29150059606
     I     | T         | 2022-02-24 05:39:43 | 2022-02-24 09:57:25 | 04:17:42               |   179419 |   7469834934
     I     | T         | 2022-02-25 05:30:58 | 2022-02-25 12:32:09 | 07:01:11               |   202945 |  30792174057
     D     | f         | 2022-02-25 12:33:48 | 2022-02-25 12:33:48 | 00:00:00               |        0 |            0
     D     | R         | 2022-02-27 18:37:53 |                     | 4 days 02:16:38.136257 |        0 |            0
    (39 rows)

    In there, the current backup has been running for 4 days, and is nowhere near done, as it has "only" backed up 1.8M files so far:

     JobId  Type Level     Files     Bytes  Name              Status
    ======================================================================
    187892  Back Diff  1,799,185    56.09 G colchicifolium.torproject.org is running

    In comparison, the last differential backed up 4.9M files, so this could go on for a long time.

    So we need to find a solution to fix collector's backup. A few observations:

    • almost every file seems to change about every other week, a full backup being ~5M files and the last differential being also ~5M files

    • maybe it's just Bacula that is slow at crawling: backing up 5M files and 500GB shouldn't be that slow... at 2.5days, it's "only" 18mbps, far from the line rate of the network (gigabit) or disk (200+MB/s AKA 1600Mbps on the backup server disk and the ganeti node)

    • in general, incrementals are somewhat fast enough (a handful of hours), but full backups and differentials are slow enough to block backups for a number of days and trigger those warnings

    • during that time, needless to say the backup itself is inconsistent: files get added, removed, or changed constantly during that time and Bacula is not taking this into account

    • the current backup is particularly suspicious in that it has "only" backed up 1.8M files in four days while transfering a comparatively large 56GB... typically, when we transfer ~50GB, we end up with 3.6M files, or, inversely, 1.5M files is closer to 29GB of content

    • we also recently had to add disk space on that server (#40535 (closed))

    /cc @hiro

    Edited by anarcat

    Linked items ... 0

  • Activity

    • All activity
    • Comments only
    • History only
    • Newest first
    • Oldest first
    Loading Loading Loading Loading Loading Loading Loading Loading Loading Loading