FEATURE: allow a huge number of users to share common prefix

FEATURE: allow a huge number of users to share common prefix

Previously username suggester would give up after 100 attempts at getting a username and fallback to random string.

This amends the logic so we do all the work of figuring out a good username in SQL and avoids a large amount of queries in cases where a lot of usernames were used up.

This corrects an issue on sites with large numbers of anon users

diff --git a/lib/user_name_suggester.rb b/lib/user_name_suggester.rb
index f414351..12dd7df 100644
--- a/lib/user_name_suggester.rb
+++ b/lib/user_name_suggester.rb
@@ -23,14 +23,46 @@ module UserNameSuggester
 
   def self.find_available_username_based_on(name, allowed_username = nil)
     name = fix_username(name)
+    offset = nil
     i = 1
     attempt = name
+
     until attempt == allowed_username || User.username_available?(attempt) || i > 100
-      suffix = i.to_s
+
+      if offset.nil?
+        normalized = User.normalize_username(name)
+        similar = "#{normalized}(0|1|2|3|4|5|6|7|8|9)+"
+
+        count = DB.query_single(<<~SQL, like: "#{normalized}%", similar: similar).first
+          SELECT count(*)  FROM users
+          WHERE username_lower LIKE :like AND
+            username_lower SIMILAR TO :similar
+        SQL
+
+        if count > 0
+          available = DB.query_single(<<~SQL, count: count, name: normalized).first
+            WITH numbers AS (SELECT generate_series(1, :count) AS n)
+
+            SELECT n FROM numbers
+            LEFT JOIN users ON username_lower = :name || n::varchar
+            WHERE users.id IS NULL
+            ORDER by n ASC
+            LIMIT 1
+          SQL
+
+          # we start at 1
+          offset = available - 1
+        else
+          offset = 0
+        end
+      end
+
+      suffix = (i + offset).to_s
       max_length = User.username_length.end - suffix.length
       attempt = "#{truncate(name, max_length)}#{suffix}"
       i += 1
     end
+
     until attempt == allowed_username || User.username_available?(attempt) || i > 200
       attempt = SecureRandom.hex[1..SiteSetting.max_username_length]
       i += 1
diff --git a/spec/components/user_name_suggester_spec.rb b/spec/components/user_name_suggester_spec.rb
index 7965458..ae9c88b 100644
--- a/spec/components/user_name_suggester_spec.rb
+++ b/spec/components/user_name_suggester_spec.rb
@@ -10,6 +10,15 @@ describe UserNameSuggester do
       SiteSetting.max_username_length = 15
     end
 
+    it "keeps adding numbers to the username" do
+      Fabricate(:user, username: 'sam')
+      Fabricate(:user, username: 'sAm1')
+      Fabricate(:user, username: 'sam2')
+      Fabricate(:user, username: 'sam4')
+
+      expect(UserNameSuggester.suggest('saM')).to eq('saM3')
+    end
+
     it "doesn't raise an error on nil username" do
       expect(UserNameSuggester.suggest(nil)).to eq(nil)
     end

GitHub sha: a8fbb19e

This commit has been mentioned on Discourse Meta. There might be relevant details there:

Followup to a8fbb19e

Why not use

similar = "#{normalized}\d+"

NEVERMIND, we’re using it in a LIKE

1 Like

It is actually similar to, which is basically regex light

1 Like

Why not use the regex instead of SIMILAR TO. Are there any performance differences?

I think like is fastest, similar to second and regex slowest, but there is an aspect of micro optimizing here for sure

1 Like