In the previous commit 5222247746340a2f915b17ae15f156e6324ae79a we added a topic_id column to EmailLog. This simply backfills it and gets rid of the topic method defined on EmailLog in favour of belongs_to
We need to do this in a post migration since we need the updated code to be deployed before running this migration to ensure we are backfilling correctly.
I’m alittle uneasy about updating so many columns at once since we may end up locking the table and temporary cause email sending to fail. On mailing list enabled sites, the email logs table is huge. Are we able to update in batches instead?
Dang it I second guessed myself here…will switch to a post migrate
I am going to test this query on a couple of sites first (with ROLLBACK) to see the impact…I have not been able to yet because the deploy keeps failing
I will look into the locking…would really prefer not to overcomplicate this query, I have done big UPDATE FROM queries in the past without issue.
@martin-brennan Btw I think I remember why I dropped the
topic_id column previously. We can basically get the
topic_id via the
post association. Is there another need for the
@tgxworld relying on an association is kind of a pain because you have to join across tables just to get the topic ID. Would rather just have the topic ID on this table like we do for
IncomingEmail, it will make things a lot easier. Filtering all outbound emails for a topic becomes a lot easier.
I still haven’t gotten around to testing out this migration, doing that today and will batch if needed.
@tgxworld Moved this update into batches and I tested on Meta with a rollback; this is what I did there:
Benchmark.measure do ActiveRecord::Base.transaction do BATCH_SIZE = 30000 offset = 0 email_log_count = EmailLog.count loop do DB.exec(<<~SQL, offset: offset, batch_size: BATCH_SIZE) WITH cte AS ( SELECT topic_id, post_id FROM email_logs ORDER BY id LIMIT :batch_size OFFSET :offset ) UPDATE email_logs SET topic_id = posts.topic_id FROM cte INNER JOIN posts ON posts.id = cte.post_id WHERE email_logs.post_id = cte.post_id SQL offset += BATCH_SIZE break if offset > (email_log_count + BATCH_SIZE * 2) end raise ActiveRecord::Rollback end end
The results were that it took 34s to run this on Meta. Meta has
506230 email log records right now, so that’s about 2s per batch of 30,000. I am actually a little wary of removing the
topic method in this PR, it leaves us with no wiggle room if this migration goes bad, I think I will remove that in another PR.
What are your thoughts on this now?
We need to avoid using ActiveRecord in migrations
I think it is fine to keep this method but I don’t think we need to change the commit history here.
Oh I must have put it in the wrong place sorry
Argh every time!! Will fix
@tgxworld any more changes here?
It seems like we do not have to select
topic_id here although I’m wondering if we’re missing a filter to not update rows
WHERE topic_id IS NOT NULL
Good point, we do not need topic_id here and for the update I will do this:
WHERE email_logs.post_id = cte.post_id AND email_logs.topic_id IS NULL
I think we can add the
email_logs.topic_id IS NULL query to the CTE instead.
Oh right, done
O no we can’t because the rows may change between queries. haha ignore my comment here.
Okay I will remove the additional WHERE lol