reliability issues in puppet catalog runs
i'm not sure we should worry about this too much, but today i started seeing some reliability issues when running catalogs. i am not sure it's specific to any given host, but right now we have a SystemdFailedUnits alert on tbb-nightlies-master. the failed unit is puppet-run, and the error is:
Nov 07 19:24:42 tbb-nightlies-master puppet-agent[787337]: Could not retrieve catalog from remote server: Request to https://puppet:8140/puppet/v3/catalog/tbb-nightlies-master.torproject.org?environment=production failed after 3.155 seconds: SSL_read: unexpected eof while reading
Nov 07 19:24:42 tbb-nightlies-master puppet-agent[787337]: Wrapped exception:
Nov 07 19:24:42 tbb-nightlies-master puppet-agent[787337]: SSL_read: unexpected eof while reading
Nov 07 19:24:42 tbb-nightlies-master puppet-agent[787337]: Not using cache on failed catalog
Nov 07 19:24:42 tbb-nightlies-master puppet-agent[787337]: Could not retrieve catalog; skipping run
"unexpected eof while reading".
i've also seen issues talking with PuppetDB. not sure what's going on, or even if we should worry about this before we upgrade to Puppet server 7 (#41819 (closed)), but i thought i should mention it because it's getting more and more disruptive.
also, @lelutin, we probably have some double-alerting going on here between the puppet monitoring and the systemd monitoring... at least in this case the puppet alerts didn't fire, but the systemd one did, and it fires first... not sure how to handle that.
restarting puppet on that host fixed it. it might be worth checking our puppet metrics to see what else is affected, or if it's specific to that host.
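as a rough starting point for checking whether other hosts are affected, here is a small sketch that counts failed catalog retrievals per host from journal-style log lines. it assumes the lines look like the excerpt above (timestamp, hostname, `puppet-agent[pid]:`, message); the function name and the idea of feeding it logs collected from several hosts are hypothetical, not something we have in place.

```python
import re
from collections import Counter

# Matches the log prefix seen in the excerpt above, e.g.
# "Nov 07 19:24:42 tbb-nightlies-master puppet-agent[787337]: <message>"
LINE_RE = re.compile(
    r"^\w{3} \d{2} \d{2}:\d{2}:\d{2} (?P<host>\S+) puppet-agent\[\d+\]: (?P<msg>.*)$"
)

# Only the first line of each failure is counted, so one failed run
# contributes exactly one hit even though it logs several lines.
FAILURE_PREFIX = "Could not retrieve catalog from remote server"


def count_catalog_failures(lines):
    """Return a Counter mapping hostname to number of failed catalog retrievals."""
    failures = Counter()
    for line in lines:
        match = LINE_RE.match(line.rstrip("\n"))
        if match and match.group("msg").startswith(FAILURE_PREFIX):
            failures[match.group("host")] += 1
    return failures
```

the input could be, for example, the output of `journalctl -u puppet-run` gathered from each host and concatenated, assuming the default short output format matches the excerpt. a host with a much higher count than the others would suggest a host-specific problem, while similar counts everywhere would point at the server side.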