REFACTOR: use tables instead of custom fields for polls (PR #6359)

This changes the “data layer” of the poll plugin from custom fields to tables.

I also vastly simplified the rules for updating a poll. We can now only update a poll within the edit window. Any votes casted during that period will be deleted if the poll is changed in any way.

I also added the possibility to show the results on vote or when the poll is closed.

I paid attention not to introduce any N+1 but I’m not a :robot:.

:warning: The MigratePollsData query is a monster :ghost: .

GitHub

You’ve signed the CLA, ZogStriP. Thank you! This pull request is ready for review.

what we can do here is piggy back on a custom field. So you would store a single custom field on a post saying “hey I have a poll” and then if that hits go to the polls table. This means that we would only run the complex poll query if one of the posts has a poll. This is much more efficient cause we run the “fetch these custom fields query” for multiple plugins. 99% of posts have no polls.

I am somewhat worried about this, in particular how does this handle corrupt data, we need to make sure that there is no case where we run this migration and just explode. I would kind of prefer if this iterated through all polls and perform the conversion in ruby just cause of the safety aspect.

Iterating through all polls is fine, there’s no way any site will have a pathological number of polls (say more than 1000) … not really possible in my opinion.

On Tue, Sep 4, 2018 at 7:17 PM, Sam notifications@github.com wrote:

@SamSaffron commented on this pull request.

In plugins/poll/db/migrate/20180820080623_migrate_polls_data.rb https://github.com/discourse/discourse/pull/6359#discussion_r215118886:

@@ -0,0 +1,82 @@ +class MigratePollsData < ActiveRecord::Migration[5.2]

  • def up
  • execute <<~SQL

I am somewhat worried about this, in particular how does this handle corrupt data, we need to make sure that there is no case where we run this migration and just explode. I would kind of prefer if this iterated through all polls and perform the conversion in ruby just cause of the safety aspect.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/discourse/discourse/pull/6359#pullrequestreview-152314890, or mute the thread https://github.com/notifications/unsubscribe-auth/ABc7VQlLRjsSMoFI51cWqdQMpc6p9Tulks5uXzQvgaJpZM4WaAwt .

Maybe we don’t need this? poll_options owns the poll_votes and has a dependent: :delete_all

Perhaps we can store the strings above in an enum so that we only have to store an integer in the columns?

Perhaps we can declare the has_many relationship in the serializer instead?

Perhaps we can just do t.references :post, index: true, foreign_key: true?

We’ll also need to think about how we’re going to handle the deployment process. After this migration has run, the old custom fields are still writable which means that a poll created or vote made before the app servers have been deployed with the new code will be lost. Iterating through the polls and doing the migration in Ruby is safer too as we can acquire a lock around the poll while the migration is happening.

I think we still need this test case to ensure that staff members are able to edit options.

Are the tests above no longer required as well?

Some test cases have been removed here and the test cases are still relevant IMO.

Ditto https://github.com/discourse/discourse/pull/6359/files#r215156653

My bad I just read that we simplified the update logic. Is there any previous context of why we are doing so?

This allows me to count the # of vote®s directly on the Poll object instead of going through all the PollOption.

@SamSaffron I was worried about that too and tried my best to handle this in the query. But I guess doing it in Ruby is safer. Will change.

@tgxworld I personally am fine losing votes during the migration and the odds of people making a poll while the migration is running are very very low. I’d much rather lose a bit of data than add a lot more complexity to this migration.

I have grown to hate integer enums. I feel the storage space “gained” is not worth the clarity of

SELECT * FROM polls WHERE status = 'closed'

Which is IMHO 10000x better than

SELECT * FROM polls WHERE status = 0

But I sure can create SQL types to enforce the values on these columns.

A single digit integer takes up 2 bytes while the shortest string here takes up 5 bytes. The longest string in this case takes up 9 bytes. On top of that, we don’t know if we’ll eventually introduce type/status/results that are longer in length. The other benefit is that indexing an integer column is usually faster and smaller in size. The gains are quite significant IMO and is a tradeoff that I think we should take and has already taken in other places of our app.

On the Ruby side of things, using an enum allows us to avoid specifying a common string in multiple places. I really like the pattern of Poll.types[:regular] or Poll.status[:closed] so that we only need to declare the string once and avoid any possibility of making a typo in multiple places.

during the migration and the odds of people making a poll while the migration is running are very very low

It isn’t the duration of the migration that results in the window of the data loss. The window of data loss is open when the migration has run but the app servers have not been redeployed with the latest code. That window can be quite significant depending on the different build processes we have in place.