FIX: use ordered_posts for last post check, not the posts relation

FIX: use ordered_posts for last post check, not the posts relation

The posts relation on Topic is not ordered. Using Topic.posts.first is basically the same as asking for a random post, it will depend on DB order. This breaks on Topic merge and split for example.

Additionally, a huge problem with that is that it forces active record down a slow path. Topic.posts.first is extremely slow on giant topics, since it has no default ordering it appears AR materializes the entire set prior to doing first.

This commit also illustrates the importance of testing, initially I only fixed the second instance of the problem in post_validator.rb but testing revealed that the problem was repeated at the top of the file.

Longer term we should consider a larger change of default ordering the posts relations so people do not fall down this trap anymore.

diff --git a/lib/validators/post_validator.rb b/lib/validators/post_validator.rb
index 9c1e7fe..b18abd9 100644
--- a/lib/validators/post_validator.rb
+++ b/lib/validators/post_validator.rb
@@ -146,7 +146,7 @@ class Validators::PostValidator < ActiveModel::Validator
     return if SiteSetting.max_consecutive_replies == 0 || post.id || post.acting_user&.staff? || private_message?(post)
 
     topic = post.topic
-    return if topic&.posts&.first&.user == post.user
+    return if topic&.ordered_posts&.first&.user == post.user
 
     last_posts_count = DB.query_single(<<~SQL, topic_id: post.topic_id, user_id: post.acting_user.id, max_replies: SiteSetting.max_consecutive_replies).first
       SELECT COUNT(*)
@@ -164,7 +164,7 @@ class Validators::PostValidator < ActiveModel::Validator
     return if last_posts_count < SiteSetting.max_consecutive_replies
 
     guardian = Guardian.new(post.acting_user)
-    if guardian.can_edit?(topic.posts.last)
+    if guardian.can_edit?(topic.ordered_posts.last)
       post.errors.add(:base, I18n.t(:max_consecutive_replies, count: SiteSetting.max_consecutive_replies))
     end
   end
diff --git a/spec/components/validators/post_validator_spec.rb b/spec/components/validators/post_validator_spec.rb
index f368daf..91f36ab 100644
--- a/spec/components/validators/post_validator_spec.rb
+++ b/spec/components/validators/post_validator_spec.rb
@@ -239,11 +239,11 @@ describe Validators::PostValidator do
     end
 
     it "should not allow posting more than 2 consecutive replies" do
-      Post.create(user: other_user, topic: topic, raw: "post number 0")
-      Post.create(user: user, topic: topic, raw: "post number 1")
-      Post.create(user: user, topic: topic, raw: "post number 2")
+      Post.create!(user: user, topic: topic, raw: "post number 2", post_number: 2)
+      Post.create!(user: user, topic: topic, raw: "post number 3", post_number: 3)
+      Post.create!(user: other_user, topic: topic, raw: "post number 1", post_number: 1)
 
-      post = Post.new(user: user, topic: topic, raw: "post number 3")
+      post = Post.new(user: user, topic: topic, raw: "post number 4", post_number: 4)
       validator.force_edit_last_validator(post)
       expect(post.errors.count).to eq(1)
     end

GitHub sha: a7628c1d

1 Like

Clarification. AR does not load the entire set, but default behavior is ORDER BY id ASC LIMIT 1 for .first and ORDER BY id DESC LIMIT 1 for .last.

Both are nonsensical results as they do not matter we only care about post_number ordering.

Giant performance hit is cause it also has a very big perf hit is cause we do not have a topic_id, id index it simply makes no sense to add.

This commit has been mentioned on Discourse Meta. There might be relevant details there:

Why not use topic&.first_post here?

2 Likes

This is actually super subtle in the larger context (I did a grep for all usages of topic.ordered_posts.first, there are lots)

topic.ordered_posts.each do |p|
   # do something with post
end

# no query is issued to DB
first = topic.ordered_posts.first

# query is issued to DB
first = topic.first_post

We can fix this so it is defined as

def first_post
   post = topic.ordered_posts.first
   post if post&.post_number == 1
end

And give up on:

has_one :first_post, -> { where post_number: 1 }, class_name: 'Post'

We would need to be a bit careful here to make sure there are no regressions with current expected side effects due to the has_one relation.

This also leaves the question of last_post cause we have not defined that and the code uses last_post a few lines down which I think is our first usage in the code base.

But … this all leaves this little bitter taste for me:

We currently have this is the code:

# TRAP: posts in "random" order
has_many :posts

# LESS TRAP: posts in an "ordered" fashion
has_many :ordered_posts, -> { order(post_number: :asc) }, class_name: "Post"

So I guess my big objection is around the double definition of the relation here, I find it a bit too trappy for my liking.

I could, I guess be swayed to deprecate ordered_posts relation and just move to first_post, last_post helpers and then explicit orderings when we need more than 1 post.

This I think is the only spot that requires extreme care with removal of ordered_posts:

    post_actions = actions.order('post_actions.created_at DESC')
      .includes(related_post: { topic: { ordered_posts: :user } })
      .where(post_id: post_ids)

But this still kind of leaves a trap for new devs where they need to remember to always use helpers or explicit ordering when asking for posts.

Eg:

first_3 = topic.posts.order(:post_number).limit(3)
1 Like

This commit has been mentioned on Discourse Meta. There might be relevant details there: