Implement Alert System for logcollector
To implement an alert system for resources monitored by logcollector, a standardized result reporting system should report a subset of data to a metrics monitoring system (Prometheus), which will in turn generate alerts (Grafana alert manager).
This requires adding a state system to logcollector to keep essential data across restarts, as well as a Prometheus data exporter for some data.
State system
A JSON file with all persistent state will be written to disk every second if the state has changed. The file will be named status.{start_time}.{time_after_start}.json, where {start_time} is the system wall-clock time at which the process started, in Unix (epoch) time format, and {time_after_start} is the number of seconds since the process started, measured with the monotonic clock. (The current wall-clock time is not used, to avoid problems with leap seconds or, more likely, NTP time adjustments.)
A new file is always created for each save, and logcollector is designed to never overwrite any state file. An external script will be used to delete old state files.
Each exporter has its own state files.
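The naming and no-overwrite rules above could be sketched roughly as follows in Python. This is only an illustration under assumptions: the StateWriter class, its save_if_changed method, and the injectable clock parameters are hypothetical names, not part of logcollector, and the sketch assumes the caller invokes it at most once per second.

```python
import json
import os
import time


class StateWriter:
    """Hypothetical helper: writes a new state file whenever the state changed."""

    def __init__(self, directory, wall_clock=time.time, monotonic=time.monotonic):
        self.directory = directory
        self._monotonic = monotonic
        # Wall-clock time is read once, at startup, as whole Unix seconds.
        self.start_time = int(wall_clock())
        # All later timestamps come from the monotonic clock, so NTP
        # adjustments or leap seconds cannot reorder the file names.
        self._start_mono = monotonic()
        self._last_saved = None

    def save_if_changed(self, state):
        """Write the state to a new file; return its path, or None if unchanged."""
        serialized = json.dumps(state, sort_keys=True)
        if serialized == self._last_saved:
            return None
        elapsed = int(self._monotonic() - self._start_mono)
        name = f"status.{self.start_time}.{elapsed}.json"
        path = os.path.join(self.directory, name)
        # Mode "x" raises FileExistsError instead of overwriting a file.
        with open(path, "x", encoding="utf-8") as fh:
            fh.write(serialized)
        self._last_saved = serialized
        return path
```

Opening the file in exclusive-create mode makes the "never overwrite" rule an error rather than a convention: a second save within the same elapsed second would fail loudly instead of replacing the earlier file.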
Prometheus data exporter
To facilitate alerts, logcollector will expose a standard Prometheus data exporter endpoint.
Initially, it will only expose vantage point last-seen information, as vantage_point_lastseen(site=XXX).
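As a rough sketch of what such an endpoint might serve, the following Python uses only the standard library to render the metric in the Prometheus text exposition format. Several details are assumptions rather than decisions from this issue: the metric value is taken to be a Unix timestamp in seconds, the metric is typed as a gauge, the path is /metrics, and the names LAST_SEEN, render_metrics, MetricsHandler, and serve, as well as the port, are hypothetical.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Assumption: maps site name -> Unix timestamp (seconds) of the last sighting.
LAST_SEEN = {}


def escape_label(value):
    """Escape a label value as required by the Prometheus text format."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_metrics(last_seen):
    """Render one vantage_point_lastseen sample per site, sorted by site."""
    lines = [
        "# HELP vantage_point_lastseen Unix time the vantage point was last seen.",
        "# TYPE vantage_point_lastseen gauge",
    ]
    for site in sorted(last_seen):
        label = escape_label(site)
        lines.append(f'vantage_point_lastseen{{site="{label}"}} {last_seen[site]}')
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Only the conventional /metrics path is served in this sketch.
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(LAST_SEEN).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Blocking call; the port number is an arbitrary placeholder."""
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

With a last-seen timestamp exported per site, an alert rule on the monitoring side could compare the current time against the metric to detect vantage points that have gone quiet.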
(See also: logcollector-admin#6)