Group membership via google hd authentication (PR #14835)

We should check that groups are removed correctly (right now, they aren’t - see my other suggestions)


          # Second login, one group removed
          mock_omniauth_for_groups([group1])
          get "/auth/google_oauth2/callback.json", params: {
            scope: groups_scope.split(' '),
            code: 'abcde',
            hd: domain
          }
          expect(response.status).to eq(302)
          expect(UserAssociatedGroup.where(user_id: user.id).count).to eq(1)

          # Third login, all groups removed
          mock_omniauth_for_groups([])
          get "/auth/google_oauth2/callback.json", params: {
            scope: groups_scope.split(' '),
            code: 'abcde',
            hd: domain
          }
          expect(response.status).to eq(302)
          expect(UserAssociatedGroup.where(user_id: user.id).count).to eq(0)
        end

We should process an empty array, otherwise a user will never be ‘removed’ from their final group (see spec addition)

    if authenticator&.provides_groups? && !associated_groups.nil?

We should process an empty array, otherwise a user will never be ‘removed’ from their final group (see spec addition)

    if provides_groups? && (groups = auth_token[:extra][:raw_groups])

Sorry for dropping in from the sidelines, but since you’re mentioning the next release: do you think you’ll be able to make this work with other non-Google Open ID providers as well by then or is this gonna take more time? Angus mentioned on Discourse Meta that this wouldn’t be too difficult once the heavy-lifting is done.

In any case, thanks for all of your efforts!

@davidtaylorhq Thanks, will process this shortly.

@burnoutberni Note that David mentioned merging this after the next beta. We’ll only be including google groups in the initial release. This is a big change already and including other providers would be too much all in one go.

I too want other providers in there, but we can’t rush a change like this.

Note that this one is already _fabricator.

@davidtaylorhq One thing we should consider is garbage collection of unused associated groups. Say for example that someone is setting up a google group structure: they name a group in google, log in, rename the group in google, log in…and do that a few times. You could end up with a lot of similar associated group names in the group admin UI, i.e. the dropdown could look something like

google_oauth2:my-cool-group
google_oauth2:my-coool-group
google_oauth2:my-coool-groups

I was thinking of handling that with a weekly cleanup job looking for “stale” associated groups, i.e. something like this on the AG model

def self.destroy_stale_groups!
    AssociatedGroup.joins("AS ag LEFT JOIN user_associated_groups AS uag ON ag.id = uag.associated_group_id")
      .joins("user_histories as uh ON uh.subject = (ag.name||':'||ag.provider_name||coalesce((':'||ag.provider_id), ''))")
      .where("
        uag.id IS NULL AND
        uh.updated_at < NOW() - INTERVAL '30 days';
      ").destroy_all
 end

This also raises the utiity of having the group label as the GroupHistory subject. It would be simpler to have the associated group id for a query like the above, however that would also mean less if that associated group was subsequently removed (i.e. the subject would then become useless). On balance I think it still makes sense to use the label as the subject, however it would be nice to have an associated_group_id column in group_histories. Do you think it’s worth adding one?

Moreover, looking at the label again now, I’m not sure the difference between the label on the server (name:provider_name:provider_id) and the client (provider_name:name) makes sense. I think my original thinking behind that was something like the id would be messy on the client, but now I’m not sure. Two issues to resolve:

  1. the (name:provider_name) order should be consistent. What order would you prefer?
  2. do you think the non-inclusion of the provider_id on the client makes sense?

Regarding garbage-collecting old AssociatedGroups, I think we should only do it automatically if

  1. The AssociatedGroup has 0 UserAssociatedGroups and 0 GroupAssociatedGroups
  2. It was ‘last seen’ more than 1 week ago (or similar)

For (2), I don’t think we should join on the user_histories table. That could be extremely slow on large sites. Instead we could add a last_seen column to AssociatedGroups (similar to the last_seen column on UserAssociatedAccounts). That column can be bumped every time we receive a response from the identity provider which contains that group. Then the query to lookup stale groups should be something like

AssociatedGroup.left_joins(:group_associated_groups, :user_associated_groups)
               .where("group_associated_groups.id IS NULL AND user_associated_groups.id IS NULL")
               .where("last_seen_at < ?", 1.week.ago)

For the other questions:

  1. I think I prefer provider_name:name, so that an alphabetical list will group by provider. But if you prefer otherwise, I don’t mind too much
  2. I think I prefer excluding it. If we do include it, it should be the last item, and probably in brackets to avoid confusion. I imagine some providers could have very long (e.g. >20 char) group IDs. Again, I don’t mind too much either way. We can always add/remove it later based on user feedback

@angusmcleod Totally understand, thanks for the update <3

@davidtaylorhq Ok, I’ve added garbage collection, and addressed the label issue. Ready for another look.

Looks great, thanks @angusmcleod - I’ll aim to get this merged in the next couple of weeks :rocket: