FEATURE: Add more columns to outbound EmailLog (PR #13449)

This adds the following columns to EmailLog:

  • cc_addresses
  • cc_user_ids
  • topic_id
  • raw

This is to bring the EmailLog table closer in parity to IncomingEmail so it can be better utilized for Group SMTP and IMAP mailing.

The raw column contains the full content of the outbound email, but only if the new hidden site setting enable_raw_outbound_email_logging is enabled. Most sites do not need it, and it’s mostly required for IMAP and SMTP sending.

In the next pull request, there will be a migration to backfill topic_id on the EmailLog table, at which point we can remove the topic fallback method on EmailLog.

GitHub

Temporary until the next PR where I will backfill this column.

Are we able to add this as a partial index on topic_id IS NOT NULL? From memory, the email_logs table is one of the largest table on some sites so we don’t want to be carrying a big index around.

I think we can just raise an irreversible migration error here.

We should be careful here, Topic.find will raise an error if topic_id is no longer valid.

Will this methods be used in a future PR? Seems odd to me that we’re adding methods which are only used in tests.

I’m not sure if we should be storing this in the table because I don’t think this site settings can be enabled for a long duration. On sites where alot of emails are sent, storing the raw in the database is going to make the email_logs table take up more disk space. Instead, could we just log the raw if a dev really needs to look at the raw message for debugging purposes?

It’s not irreversible though? Can easily just remove the columns…

Yes they will be used later.

The code here is only useful in development but once we decide on the data model, I feel like we don’t need to carry this code long term since we will never rollback in production.

We log all of the IncomingEmail raw emails unconditionally, so I think this is a good compromise. Leaving it up to a dev enable it after the fact is no good…the horse has already bolted at that point, you can’t log the raw of an email that has already been sent.

Now that I think about it though, I think this should just log unconditionally if the email is being sent via group SMTP. That way the impact will be minimal (only one or two groups will ever have this enabled on a site, and we are emailing multiple people with one EmailLog record now that we are going to be doing the CC addresses). Will change to do that and get rid of the setting.

Also, we delete all EmailLog records after 90 days so that should lessen the impact even more.

Ah right, I didn’t think of the case where the topic is completely deleted, will use find_by(id: self.topic_id) instead. I am going to change this to belongs_to :topic in the next PR after I backfill anyway.

Doesn’t that just happen automatically if you are adding an index to a NULLable column? Why would it index the NULL values?

null values are still indexed by default to support queries like where topic_id IS NULL. See discourse/post.rb at 53dab8cf1ecf08979c0f4c17fdd8f2181b77b5ba · discourse/discourse · GitHub for the partial indexes we use in Discourse even if a column is nullable

Leaving it up to a dev enable it after the fact is no good…the horse has already bolted at that point, you can’t log the raw of an email that has already been sent.

I understand the concern here but I guess this is a tradeoff we make. If problems with SMTP and IMAP sending rarely happens, we’re using up more resources for little benefit.

I think this should just log unconditionally if the email is being sent via group SMTP. That way the impact will be minimal (only one or two groups will ever have this enabled on a site, and we are emailing multiple people with one EmailLog record now that we are going to be doing the CC addresses).

I guess if this isn’t a widely used feature we can just log it in the DB when SMTP and IMAP is enabled but I’ll keep an eye on the disk usage. The comparison to IncomingEmail isn’t really fair because we send out way more emails than we receive.

DB when SMTP and IMAP is enabled

It’s even less than that; it will only be if SMTP/IMAP is configured for a group. So far we’ve sent 403 emails to non-Group users (I am changing it so the outbound group SMTP only sends to topic allowed users), and out of those many will be compressed into one email with several CC addresses in a future change, so I think this logging is acceptable, especially when we are deleting the logs after 90 days. We can revisit if it really is taking up heaps of space in the DB but I doubt that.

We could also make it a group-level setting at some point that is default off if needed.

Thank you for the clarification, lets just log in the DB as it is and keep an :eye: on the disk space usage.

Hmm I don’t really agree with this; sure we don’t rollback in production, but we don’t worry about this when we do change migrations that have automatic rollbacks e.g. FEATURE: lets users favorite 2 badges to show on user-card (#13151) · discourse/discourse@1cd0424 · GitHub

I think we need a tiebreaker :stuck_out_tongue:

The assertions here seems to be testing for the same thing.