feat: Additional Prometheus metrics

stephen requested to merge additional-prometheus-metrics into main

As per #77 (closed), a few additional pieces of data should be available for monitoring via our Prometheus endpoint (/metrics).

This MR addresses this need as follows:

  • Depending on the donation type, either tpa_crm_donate_ext_transaction_count or tpa_crm_donate_ext_recurring_transaction_count is incremented, grouped by a status of success or failure.

  • Incoming webhook messages which fail validation generate webhook_message_rejected metrics with type matching the rejection reason. Under Stripe this is a limited set: signature_not_present, signature_bad or payload_parse_error. Under PayPal it will most likely always be VALIDATION_ERROR, but it can also be UNAUTHORIZED or INTERNAL_SERVER_ERROR in some instances.

  • Incoming webhook messages which pass validation generate webhook_message_received metrics with vendor matching stripe or paypal and type matching the event type of the webhook. At present, we only generate metrics for the limited subset of event types we use to track individual donations. We could also pass more webhook event types to this endpoint and simply track more labels, but since we've been trying to keep labels to a minimum, I erred on the side of prudence.

  • When a donation message is successfully handed off to CiviCRM, we call civicrm.repository.donation_transaction_counter() with the donation type and status; it is just a handler for generating the various Prometheus metrics. At present, failed donations are tracked at this point in the process under tpa_crm_donate_ext_other_error_count with the label rejected_by_vendor; successful donations are counted under either tpa_crm_donate_ext_transaction_count or tpa_crm_donate_ext_recurring_transaction_count, depending on the donation type. (Tracking failed donations by vendor may be accomplished more easily by filtering webhook_message_received messages.)

  • If the act of handing a donation message to CiviCRM fails, we generate tpa_crm_donate_ext_other_error_count with a type of reporting_failed.

  • Speaking of tpa_crm_donate_ext_other_error_count, we now generate it from several places inside the donation form's validation flow if validation doesn't pass cleanly:

    • amount_not_an_integer, if the donation value is not an int
    • minimum_donation_not_met, if the donation value is below the minimum allowed
    • minimum_perk_donation_not_met, if the donation is below the minimum allowed for the selected perk
    • no_perk_with_gift, if no perk was selected, but the "no gifts for me, thanks" checkbox wasn't selected
    • perk_with_no_gift, if a perk was selected, but so was the "no gifts for me, thanks" checkbox
    • captcha, if the regular four-character CAPTCHA was entered incorrectly
  • Comments were also added to the CAPTCHA metric generation, since the Django plugin validates itself elsewhere in the codebase, and we instead check for it among all the valid values to log whether validation passed.

  • CivicrmRepositoryProtocol was updated to reflect the contents of CivicrmRepositoryMock.
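
For reviewers who want a feel for how the form-validation labels listed above fit together, here is a rough sketch. It is not the code in this MR: the function names, the parameters, and the choice to report every failed rule (rather than only the first) are all assumptions made for illustration, and a plain collections.Counter stands in for the labelled Prometheus counter. Only the metric name and the type values come from the description above.

```python
from collections import Counter
from typing import List, Optional

# Stand-in for a labelled Prometheus counter (assumption for illustration):
# keys are (metric name, value of the "type" label).
METRICS: Counter = Counter()

OTHER_ERROR = "tpa_crm_donate_ext_other_error_count"


def validation_error_types(
    amount: object,
    minimum: int,
    perk_minimum: Optional[int],  # None means no perk was selected
    no_gift_checked: bool,        # the "no gifts for me, thanks" checkbox
    captcha_ok: bool,
) -> List[str]:
    """Return the `type` label for every validation rule the form breaks.

    Hypothetical helper: the real form may stop at the first failure.
    """
    errors: List[str] = []
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(amount, int) or isinstance(amount, bool):
        errors.append("amount_not_an_integer")
    else:
        if amount < minimum:
            errors.append("minimum_donation_not_met")
        if perk_minimum is not None and amount < perk_minimum:
            errors.append("minimum_perk_donation_not_met")
    perk_selected = perk_minimum is not None
    if not perk_selected and not no_gift_checked:
        errors.append("no_perk_with_gift")
    if perk_selected and no_gift_checked:
        errors.append("perk_with_no_gift")
    if not captcha_ok:
        errors.append("captcha")
    return errors


def record_validation_errors(error_types: List[str]) -> None:
    """Increment the other-error counter once per failed rule."""
    for error_type in error_types:
        METRICS[(OTHER_ERROR, error_type)] += 1
```

With a real Prometheus client, the Counter lookup would presumably become a call on a counter with a type label; the classification logic would stay the same.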

Closes #77 (closed).

Edited by anarcat
