PERF: optimise preloading application

We preload so that as much memory as possible is shared, via copy-on-write, between the unicorn master and the workers it forks (sidekiq, unicorn).

This moves the preloading code into the Discourse module for easier reuse and adds 3 notable preloading changes:

  1. We attempt to localize a string on each site, ensuring we warm up i18n

  2. We preload all our templates (compiling .erb to class)

  3. We warm up our search tokenizer, which uses cppjieba, a large memory consumer. This only triggers a warm-up on CJK sites or on sites with the special site setting enabled.
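
To illustrate why the warm-up has to happen in the master before forking, here is a minimal sketch. It is not the Discourse code: `Preloader` and `fork_and_report` are hypothetical names, the "expensive" data is a stand-in for i18n tables or compiled templates, and it assumes a platform where `Process.fork` is available (Linux, macOS).

```ruby
# Hypothetical stand-in for an expensive warm-up (i18n, templates, tokenizer).
module Preloader
  # Build the data once; later calls are no-ops.
  def self.preload!
    @data ||= (1..1000).map { |i| "string-#{i}" }.freeze
  end

  def self.loaded?
    !@data.nil?
  end
end

# Fork a child and have it report whether the data was already loaded
# at the moment it was forked.
def fork_and_report
  reader, writer = IO.pipe
  pid = Process.fork do
    reader.close
    writer.write(Preloader.loaded? ? "warm" : "cold")
    writer.close
    exit!(0) # skip at_exit handlers in the child
  end
  writer.close
  result = reader.read
  reader.close
  Process.wait(pid)
  result
end

cold = fork_and_report # forked before preloading: child starts without the data
Preloader.preload!
warm = fork_and_report # forked after preloading: child inherits the data

puts "before preload: #{cold}, after preload: #{warm}"
```

A child forked after the preload inherits the already-built objects, and the operating system keeps those pages shared with the parent until one side writes to them. A child forked before the preload would have to build its own private copy.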

diff --git a/Gemfile b/Gemfile
index 8c54f88..e9c9af1 100644
--- a/Gemfile
+++ b/Gemfile
@@ -26,6 +26,10 @@ else
   gem 'sprockets-rails'
 end
 
+# this will eventually be added to rails,
+# allows us to precompile all our templates in the unicorn master
+gem 'actionview_precompiler', require: false
+
 gem 'seed-fu'
 
 gem 'mail', require: false
diff --git a/Gemfile.lock b/Gemfile.lock
index d7fcb79..ac2a8ca 100644
--- a/Gemfile.lock
+++ b/Gemfile.lock
@@ -20,6 +20,8 @@ GEM
       erubi (~> 1.4)
       rails-dom-testing (~> 2.0)
       rails-html-sanitizer (~> 1.1, >= 1.2.0)
+    actionview_precompiler (0.2.1)
+      actionview (>= 6.0.a)
     active_model_serializers (0.8.4)
       activemodel (>= 3.0)
     activejob (6.0.0)
@@ -428,6 +430,7 @@ DEPENDENCIES
   actionmailer (= 6.0.0)
   actionpack (= 6.0.0)
   actionview (= 6.0.0)
+  actionview_precompiler
   active_model_serializers (~> 0.8.3)
   activemodel (= 6.0.0)
   activerecord (= 6.0.0)
diff --git a/config/unicorn.conf.rb b/config/unicorn.conf.rb
index 9b140ea..e38f345 100644
--- a/config/unicorn.conf.rb
+++ b/config/unicorn.conf.rb
@@ -53,43 +53,15 @@ initialized = false
 before_fork do |server, worker|
 
   unless initialized
-    # load up the yaml for the localization bits, in master process
-    I18n.t(:posts)
-
-    # load up all models and schema
-    (ActiveRecord::Base.connection.tables - %w[schema_migrations versions]).each do |table|
-      table.classify.constantize.first rescue nil
-    end
-
-    # ensure we have a full schema cache in case we missed something above
-    ActiveRecord::Base.connection.data_sources.each do |table|
-      ActiveRecord::Base.connection.schema_cache.add(table)
-    end
-
-    schema_cache = ActiveRecord::Base.connection.schema_cache
-
-    # load up schema cache for all multisite assuming all dbs have
-    # an identical schema
-    RailsMultisite::ConnectionManagement.each_connection do
-      dup_cache = schema_cache.dup
-      # this line is not really needed, but just in case the
-      # underlying implementation changes lets give it a shot
-      dup_cache.connection = nil
-      ActiveRecord::Base.connection.schema_cache = dup_cache
-    end
-
-    # router warm up
-    Rails.application.routes.recognize_path('abc') rescue nil
-
-    # preload discourse version
-    Discourse.git_version
-    Discourse.git_branch
-    Discourse.full_version
+    Discourse.preload_rails!
 
     # V8 does not support forking, make sure all contexts are disposed
     ObjectSpace.each_object(MiniRacer::Context) { |c| c.dispose }
 
     # get rid of rubbish so we don't share it
+    # longer term we will use compact! here
+    GC.start
+    GC.start
     GC.start
 
     initialized = true
diff --git a/lib/discourse.rb b/lib/discourse.rb
index ca603be..0295d92 100644
--- a/lib/discourse.rb
+++ b/lib/discourse.rb
@@ -764,4 +764,50 @@ module Discourse
   def self.skip_post_deployment_migrations?
     ['1', 'true'].include?(ENV["SKIP_POST_DEPLOYMENT_MIGRATIONS"]&.to_s)
   end
+
+  # this is used to preload as much stuff as possible prior to forking
+  # in turn this can conserve large amounts of memory on forking servers
+  def self.preload_rails!
+    return if @preloaded_rails
+
+    # load up all models and schema
+    (ActiveRecord::Base.connection.tables - %w[schema_migrations versions]).each do |table|
+      table.classify.constantize.first rescue nil
+    end
+
+    # ensure we have a full schema cache in case we missed something above
+    ActiveRecord::Base.connection.data_sources.each do |table|
+      ActiveRecord::Base.connection.schema_cache.add(table)
+    end
+
+    schema_cache = ActiveRecord::Base.connection.schema_cache
+
+    # load up schema cache for all multisite assuming all dbs have
+    # an identical schema
+    RailsMultisite::ConnectionManagement.each_connection do
+      dup_cache = schema_cache.dup
+      # this line is not really needed, but just in case the
+      # underlying implementation changes lets give it a shot
+      dup_cache.connection = nil
+      ActiveRecord::Base.connection.schema_cache = dup_cache
+      I18n.t(:posts)
+
+      # this will force Cppjieba to preload if any site has it
+      # enabled allowing it to be reused between all child processes
+      Search.prepare_data("test")
+    end
+
+    # router warm up
+    Rails.application.routes.recognize_path('abc') rescue nil
+
+    # preload discourse version
+    Discourse.git_version
+    Discourse.git_branch
+    Discourse.full_version
+
+    require 'actionview_precompiler'
+    ActionviewPrecompiler.precompile
+  ensure
+    @preloaded_rails = true
+  end
 end

GitHub sha: 8d5f47dd

Why 3 GC.start?

Discovered this here ^^: running GC.start once does not clear absolutely everything. You need to run it a few times to pick up all the loose objects.
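
For anyone who wants to observe this on their own app, here is a rough sketch that runs several GC passes and prints the live slot count after each. The helper name `run_gc_passes` is made up for this example, and the numbers will vary by Ruby version and workload, so it is a way to look at the behaviour rather than proof of it.

```ruby
# Run GC.start a number of times and record the live slot count after each pass.
def run_gc_passes(times)
  Array.new(times) do
    GC.start
    GC.stat(:heap_live_slots)
  end
end

# Create some short-lived garbage so there is something to collect.
100_000.times { Object.new }

gc_runs_before = GC.count
live_after_each_pass = run_gc_passes(3)
gc_runs = GC.count - gc_runs_before

live_after_each_pass.each_with_index do |live, i|
  puts "pass #{i + 1}: #{live} live slots"
end
puts "GC ran #{gc_runs} times"
```

If the live slot count keeps dropping after the first pass, the extra passes are picking up objects that the first one left behind.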
