FEATURE: detect collector needs restarting if pid file changes

Previously, if the pid file was removed for any reason, the collector could be left in a bad state where it would constantly restart.

This can get very costly.
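
The check this change adds to the collector can be sketched in isolation as follows. This is a minimal sketch, not the plugin's code: the helper name `collector_current?` and its `own_pid` parameter are hypothetical, and it rescues only `SystemCallError`, whereas the collector uses a bare `rescue`.

```ruby
require 'tempfile'

# Hypothetical helper (not part of the plugin): true only when the parent
# process is alive AND the pid file still names this process.
def collector_current?(parent_pid, pid_file, own_pid = Process.pid)
  Process.kill(0, parent_pid)          # signal 0 only probes; raises if the parent is gone
  File.read(pid_file).to_i == own_pid  # raises Errno::ENOENT if the file was removed
rescue SystemCallError
  false
end

Tempfile.create('collector.pid') do |f|
  # Pid file names this process: the collector should keep running.
  File.write(f.path, Process.pid.to_s)
  puts collector_current?(Process.pid, f.path)

  # Pid file now names some other pid: the collector should exit.
  File.write(f.path, '0')
  puts collector_current?(Process.pid, f.path)
end

# Pid file is missing entirely: the collector should exit as well.
puts collector_current?(Process.pid, '/nonexistent/collector.pid')
```

A collector that sees `false` here can exit on its own instead of lingering while the master keeps trying to start a replacement.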

Additionally:

  • Liveness checks now happen only every 5 seconds (previously every 1 second)
  • The collector now runs directly under the master process (without sh in between)
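
The second point relies on Ruby skipping the shell when a command is passed as separate arguments rather than as one string. The sketch below illustrates that with `Process.spawn` so it can be run and observed; the plugin itself uses `exec`. The helper `spawn_direct_child` and the `DEMO_SETTING` variable are hypothetical names used only for illustration.

```ruby
require 'rbconfig'

# Hypothetical helper: start a child from an argument list (no shell involved)
# and return its pid together with whatever it wrote to stdout.
def spawn_direct_child(env, *argv)
  reader, writer = IO.pipe
  pid = Process.spawn(env, *argv, out: writer)
  writer.close            # the parent must close its copy so read sees EOF
  output = reader.read
  Process.wait(pid)
  [pid, output]
ensure
  reader&.close
end

# The child reports its parent pid and one environment variable.
script = 'print Process.ppid, "|", ENV["DEMO_SETTING"]'
_pid, output = spawn_direct_child({ 'DEMO_SETTING' => '2048' },
                                  RbConfig.ruby, '-e', script)
ppid, setting = output.split('|', 2)

puts "child is a direct child of this process: #{ppid.to_i == Process.pid}"
puts "environment passed without shell quoting: #{setting}"
```

Because no shell sits in between, the pid the master tracks is the collector's own pid, and arguments and environment values need no shell quoting.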
diff --git a/.rubocop.yml b/.rubocop.yml
index 838b6c1..eeb56e6 100644
--- a/.rubocop.yml
+++ b/.rubocop.yml
@@ -2,4 +2,4 @@ inherit_gem:
   rubocop-discourse: default.yml
 
 Style/GlobalVars:
-  AllowedVariables: [$prometheus_client, $parent_pid, $port]
+  AllowedVariables: [$prometheus_client, $parent_pid, $port, $pid_file]
diff --git a/bin/collector b/bin/collector
index 3bb0ada..2f76cc3 100755
--- a/bin/collector
+++ b/bin/collector
@@ -8,6 +8,7 @@ spec = Gem::Specification.load spec_file
 spec.activate
 
 require 'thread'
+require 'set'
 require 'oj'
 require 'prometheus_exporter'
 require 'prometheus_exporter/server'
@@ -25,6 +26,7 @@ require_relative '../lib/collector'
 
 $port = ARGV[0].to_i
 $parent_pid = ARGV[1].to_i
+$pid_file = ARGV[2]
 
 STDERR.puts "#{Time.now}: Starting Prometheus Collector pid: #{Process.pid} port: #{$port}"
 
@@ -33,7 +35,7 @@ if $parent_pid > 0
   Thread.new do
     def alive?(pid)
       Process.kill(0, pid)
-      true
+      File.read($pid_file).to_i == Process.pid
     rescue
       false
     end
@@ -49,7 +51,7 @@ if $parent_pid > 0
       rescue => e
         STDERR.puts "URGENT monitoring thread had an exception #{e}"
       end
-      sleep 1
+      sleep 5
     end
   end
 end
diff --git a/lib/demon.rb b/lib/demon.rb
index c3a99a7..cf1ab27 100644
--- a/lib/demon.rb
+++ b/lib/demon.rb
@@ -15,9 +15,9 @@ class DiscoursePrometheus::Demon < ::Demon::Base
 
     collector = File.expand_path("../../bin/collector", __FILE__)
 
-    env = "RUBY_GLOBAL_METHOD_CACHE_SIZE=2048 " \
-      "RUBY_GC_HEAP_INIT_SLOTS=10000 "
+    ENV["RUBY_GLOBAL_METHOD_CACHE_SIZE"] = "2048"
+    ENV["RUBY_GC_HEAP_INIT_SLOTS"] = "10000"
 
-    exec "#{env} #{collector} #{GlobalSetting.prometheus_collector_port} #{parent_pid}"
+    exec collector, GlobalSetting.prometheus_collector_port.to_s, parent_pid.to_s, pid_file
   end
 end
diff --git a/plugin.rb b/plugin.rb
index 5eb4f25..ee16572 100644
--- a/plugin.rb
+++ b/plugin.rb
@@ -56,7 +56,7 @@ after_initialize do
         DiscoursePrometheus::Demon.start
         while true
           DiscoursePrometheus::Demon.ensure_running
-          sleep 1
+          sleep 5
         end
       rescue => e
         STDERR.puts "Failed to initialize prometheus web server from pid: #{Process.pid} #{e}"

GitHub sha: db513500