automate deployment of grafana dashboards
Right now, Grafana dashboards are managed in a rather haphazard way: some dashboards are managed by Puppet and "provisioned" (i.e. deployed automatically), which makes them uneditable from the web UI. To save changes, you need to save the JSON file, put it in the right location in Puppet, commit, push, and have Puppet run on the host again.
To make matters worse, only some dashboards are provisioned, and those are published in a personal repository of mine: https://gitlab.com/anarcat/grafana-dashboards
Surely there must be a better way. I had high hopes that https://github.com/Beam-Connectivity/grafana-dashboard-manager could solve this problem, but it doesn't seem very well maintained: a bunch of bugfix PRs have been waiting in the queue for more than a year, and it may be outright incompatible with recent Grafana versions. The gdg project may be a better alternative. grizzly and others take the inverse approach of writing dashboards as code and loading them into Grafana, but I think that's much harder.
Finally, we now have lots of dashboards, and it's really hard to find "that right one". We should use the folder structure to sort through those (or possibly labels?), but we can't actually move provisioned dashboards directly; at least, I failed to do so right off the bat.
In general, we should probably push our configuration management of Grafana a little further. There Must Be A Better Way.
Requirements:
- automatically save dashboards to configuration management or at least git versioning (fix the "oh, darn it, I need to save this dashboard to Puppet" and "oops, this is not versioned in git" pain points)
- sort dashboards through folders (or labels?)
- cover the grafana2 server and allow collaboration with service admins
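To make the first two requirements more concrete, here is a rough sketch of what "save dashboards to git, sorted by folder" could look like. It is not what any of the tools above do, just an illustration. It assumes the Grafana HTTP API search endpoint (`/api/search?type=dash-db`) and dashboard endpoint (`/api/dashboards/uid/<uid>`), and that the latter reports the folder name in `meta.folderTitle`. `GRAFANA_URL`, `API_TOKEN` and all function names are placeholders I made up.

```python
import json
import re
import urllib.request
from pathlib import Path

# Placeholders (assumptions): not our real server or credentials.
GRAFANA_URL = "https://grafana.example.org"
API_TOKEN = "changeme"


def slugify(name):
    """Turn a dashboard or folder title into a safe file name component."""
    slug = re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")
    return slug or "unnamed"


def export_path(root, folder_title, dashboard_title):
    """Mirror the Grafana folder structure on disk: <root>/<folder>/<dashboard>.json"""
    folder = slugify(folder_title or "General")
    return Path(root) / folder / (slugify(dashboard_title) + ".json")


def normalize(dashboard):
    """Serialize a dashboard with sorted keys, without the per-server "id"
    and the per-save "version" counter, to keep git diffs small."""
    cleaned = dict(dashboard)
    cleaned.pop("id", None)
    cleaned.pop("version", None)
    return json.dumps(cleaned, indent=2, sort_keys=True) + "\n"


def api_get(path):
    """GET a JSON document from the Grafana HTTP API using a bearer token."""
    request = urllib.request.Request(
        GRAFANA_URL + path,
        headers={"Authorization": "Bearer " + API_TOKEN},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def export_all(root):
    """Write every dashboard found by the search API under root, one file each."""
    for hit in api_get("/api/search?type=dash-db"):
        full = api_get("/api/dashboards/uid/" + hit["uid"])
        path = export_path(
            root, full["meta"].get("folderTitle"), full["dashboard"]["title"]
        )
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(normalize(full["dashboard"]))
```

Running `export_all("dashboards")` from a cron job or timer inside a git checkout, followed by a commit, would cover the "oops, this is not versioned in git" case. It does not solve loading dashboards back into Grafana, which is the part the tools above are supposed to handle.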
Nice to have:
- public repository for our dashboards, to share with others: done in https://gitlab.torproject.org/tpo/tpa/grafana-dashboards
- automatically upgrade dashboard versions to newer Grafana releases (to reduce diff noise like https://gitlab.com/anarcat/grafana-dashboards/-/commit/22640fff18ef3235130d74456e4c3eb75863f44d)
- figure out the "datasource mess", where the datasource fields get recursively expanded as some dashboards are saved
- remove the duplicate data sources (we have three Prometheus datasources, all pointing to the same server, on grafana1)
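As a first step on the datasource items, we could at least audit which datasources each exported dashboard refers to. The sketch below only counts references and does not fix anything. It assumes references live under a `datasource` key anywhere in the dashboard JSON, either as a plain name or as an object with a `uid` field. The function names are made up for illustration.

```python
from collections import Counter


def datasource_refs(node):
    """Yield every non-null value stored under a "datasource" key, at any depth."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "datasource" and value is not None:
                yield value
            else:
                yield from datasource_refs(value)
    elif isinstance(node, list):
        for item in node:
            yield from datasource_refs(item)


def ref_label(ref):
    """Reduce a reference to a comparable label: the uid (or type) for
    object-style references, the name itself for plain strings."""
    if isinstance(ref, dict):
        return ref.get("uid") or ref.get("type") or "unknown"
    return str(ref)


def count_datasources(dashboard):
    """Count how often each datasource label appears in one dashboard."""
    return Counter(ref_label(ref) for ref in datasource_refs(dashboard))
```

Summing those counters over all exported files would show which of the three Prometheus datasources are actually in use before any of them is removed.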
Next steps:
- evaluate grafana dashboard manager project
- review upstream literature on how to provision / version dashboards
- ask around in the community for tips and ideas