DEV: replaces huge generated emoji list by a simpler regex (PR #11053)

This commit brings the following changes:

  • uses \p{Emoji} instead of a complex generated string of 50kb+, this is necessary as it’s currently not possible to add more data to this string or it’s causing crashes
  • fetches latest updated images from various sets
  • fixes a bug where frowning_face was colliding with slightly_frowning_face
  • ensures javascript:update_constants rake task is called after updating emojis
  • also fixes lefthook ember-template-lint crashing as it was attempting to parse a huge number of files (images)

GitHub

@eviltrout @gschlager note that the commit is huge but, there’s actually only few files relevant to review:

  • app/assets/javascripts/pretty-text/addon/emoji.js
  • lib/pretty_text/shims.js
  • lib/tasks/emoji.rake

Regarding the failed test: I believe using only \p{Emoji} won’t be enough.

Per unicode spec (see UTS #51: Unicode Emoji), a more exhaustive expression might be needed:

"☺️😃👨‍🌾😜".match(/\p{Emoji}/gu)
// ["☺", "😃", "👨", "🌾", "😜"]

"☺️😃👨‍🌾😜".match(/\p{RI}\p{RI}|\p{Emoji}(\p{Emoji_Modifier}|\u{FE0F}\u{20E3}?|[\u{E0020}-\u{E007E}]+\u{E007F})?(\u{200D}\p{Emoji}(\p{Emoji_Modifier}|\u{FE0F}\u{20E3}?|[\u{E0020}-\u{E007E}]+\u{E007F})?)*/gu)
// ["☺️", "😃", "👨‍🌾", "😜"]

Regarding the failed test: I believe using only \p{Emoji} won’t be enough.

Per unicode spec (see http://unicode.org/reports/tr51/#EBNF_and_Regex), a more exhaustive expression might be needed:

"☺️😃👨‍🌾😜".match(/\p{Emoji}/gu)
// ["☺", "😃", "👨", "🌾", "😜"]

"☺️😃👨‍🌾😜".match(/\p{RI}\p{RI}|\p{Emoji}(\p{Emoji_Modifier}|\u{FE0F}\u{20E3}?|[\u{E0020}-\u{E007E}]+\u{E007F})?(\u{200D}\p{Emoji}(\p{Emoji_Modifier}|\u{FE0F}\u{20E3}?|[\u{E0020}-\u{E007E}]+\u{E007F})?)*/gu)
// ["☺️", "😃", "👨‍🌾", "😜"]

Yes I have seen the failure I think we will have to go for more complex indeed, not a big deal, still way simpler than what we had

This pull request introduces 1 alert when merging e8f11a852d9d471f92418009ab7c0c8b6b9a0ede into dbec3792b7331a79078faa2b8f0def339ea7e912 - view on LGTM.com

new alerts:

  • 1 for Useless regular-expression character escape

@jjaffeux where are we on this? has stuff drifted since this PR was made (do you need another emoji update)

@SamSaffron there’s multiple subjects at once now:

  • we need to update how we match emojis as our current way is buggy and more importantly will crash if we add more emojis and is very slow and adds to the payload
  • we need to drop joypixels set and replace it with open emoji
  • we need to update to support new emojis, and things have changed a lot since I done it first and I would want to update the script to make all of this even easier in the future
  • I would want to investigate using svgs for emojis

So yes lots of things going on, I’m currently investigating various things to try to split this in different tasks. We can close this one for now if you want.

1 Like

I would want to investigate using svgs for emojis

:heart_eyes: :heart_eyes: :heart_eyes:

can we do a new update @jjaffeux and merge this week ?

I can try yes, this is really touchy so not 100% sure, bit will try

The title of this pull request changed from “DEV: emoji update and fixes” to "DEV: replaces huge generated emoji list by a simpler regex

@SamSaffron I will re-read this, do more checks and look at few places for improvements on Monday but it’s almost done.

2 Likes

@SamSaffron I had quite a lot of things to adjust, I will merge it tomorrow morning first thing.

1 Like