FIX: reword whitelist to allowlist (#430)

The version was bumped to 2.0.0 because this change is not backward compatible: the engine class and its file are renamed.
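
For downstream code, the practical effect is that references to `Onebox::Engine::WhitelistedGenericOnebox` and to `engine/whitelisted_generic_onebox` need to point at the allowlisted names instead. The sketch below is an illustration only: `AllowlistedGenericOneboxSketch` is a hypothetical stand-in that copies just the accessor pattern visible in the diff so it runs without the gem, and `forum.example.org` is a made-up domain.

```ruby
# Hypothetical stand-in for Onebox::Engine::AllowlistedGenericOnebox.
# Only the accessor pattern shown in the diff is reproduced here.
class AllowlistedGenericOneboxSketch
  def self.allowed_domains=(list)
    @allowed_domains = list
  end

  # Memoizes a copy, so edits do not touch the defaults.
  def self.allowed_domains
    @allowed_domains ||= default_allowed_domains.dup
  end

  def self.default_allowed_domains
    %w(23hq.com 500px.com 8tracks.com) # first three entries of the real list
  end
end

# Append a (made-up) domain to the memoized copy.
AllowlistedGenericOneboxSketch.allowed_domains << "forum.example.org"
```

Because the getter memoizes a `dup` of the defaults, appending a domain this way leaves `default_allowed_domains` unchanged.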

diff --git a/README.md b/README.md
index daa6479..a4d9823 100644
--- a/README.md
+++ b/README.md
@@ -75,7 +75,7 @@ Adding Support for a new URL
 ----------------------------
 
   1. Check if the site supports [oEmbed](http://oembed.com/) or [Open Graph](https://developers.facebook.com/docs/opengraph/).
-     If it does, you can probably get away with just whitelisting the URL in `Onebox::Engine::WhitelistedGenericOnebox` (see: [Whitelisted Generic Onebox caveats](#user-content-whitelisted-generic-onebox-caveats)).
+     If it does, you can probably get away with just allowing the URL in `Onebox::Engine::AllowlistedGenericOnebox` (see: [Allowlisted Generic Onebox caveats](#user-content-allowlisted-generic-onebox-caveats)).
      If the site does not support open standards, you can create a new engine.
 
   2. Create new onebox engine
@@ -163,12 +163,12 @@ Adding Support for a new URL
      require_relative "engine/name_onebox"
      `‍``
 
-Whitelisted Generic Onebox caveats
+Allowlisted Generic Onebox caveats
 ----------------------------------
 
-The Whitelisted Generic Onebox has some caveats for its use, beyond simply whitelisting the domain.
+The Allowlisted Generic Onebox has some caveats for its use, beyond simply allowlisting the domain.
 
-  1. The domain must be whitelisted
+  1. The domain must be allowlisted
   2. The URL you're oneboxing cannot be a root url (e.g. `http://example.com` won't work, but `http://example.com/page` will)
   3. If the oneboxed URL responds with oEmbed and has a `rich` type: the `html` content must contain an `<iframe>`. Responses without an iframe will not be oneboxed.
 
diff --git a/lib/onebox/engine.rb b/lib/onebox/engine.rb
index 9d42cf8..33f7807 100644
--- a/lib/onebox/engine.rb
+++ b/lib/onebox/engine.rb
@@ -141,7 +141,7 @@ require_relative "engine/wikimedia_onebox"
 require_relative "engine/wikipedia_onebox"
 require_relative "engine/youtube_onebox"
 require_relative "engine/youku_onebox"
-require_relative "engine/whitelisted_generic_onebox"
+require_relative "engine/allowlisted_generic_onebox"
 require_relative "engine/pubmed_onebox"
 require_relative "engine/soundcloud_onebox"
 require_relative "engine/imgur_onebox"
diff --git a/lib/onebox/engine/allowlisted_generic_onebox.rb b/lib/onebox/engine/allowlisted_generic_onebox.rb
new file mode 100644
index 0000000..d10c2e8
--- /dev/null
+++ b/lib/onebox/engine/allowlisted_generic_onebox.rb
@@ -0,0 +1,375 @@
+# frozen_string_literal: true
+
+require 'htmlentities'
+
+module Onebox
+  module Engine
+    class AllowlistedGenericOnebox
+      include Engine
+      include StandardEmbed
+      include LayoutSupport
+
+      def self.allowed_domains=(list)
+        @allowed_domains = list
+      end
+
+      def self.allowed_domains
+        @allowed_domains ||= default_allowed_domains.dup
+      end
+
+      def self.default_allowed_domains
+        %w(
+          23hq.com
+          500px.com
+          8tracks.com
+          abc.net.au
+          about.com
+          answers.com
+          arstechnica.com
+          ask.com
+          battle.net
+          bbc.co.uk
+          bbs.boingboing.net
+          bestbuy.ca
+          bestbuy.com
+          blip.tv
+          bloomberg.com
+          businessinsider.com
+          change.org
+          clikthrough.com
+          cnet.com
+          cnn.com
+          codepen.io
+          collegehumor.com
+          consider.it
+          coursera.org
+          cracked.com
+          dailymail.co.uk
+          dailymotion.com
+          deadline.com
+          dell.com
+          deviantart.com
+          digg.com
+          dotsub.com
+          ebay.ca
+          ebay.co.uk
+          ebay.com
+          ehow.com
+          espn.go.com
+          etsy.com
+          facebook.com
+          findery.com
+          folksy.com
+          forbes.com
+          foxnews.com
+          funnyordie.com
+          gifs.com
+          groupon.com
+          howtogeek.com
+          huffingtonpost.ca
+          huffingtonpost.com
+          hulu.com
+          ign.com
+          ikea.com
+          imdb.com
+          indiatimes.com
+          itunes.apple.com
+          khanacademy.org
+          kickstarter.com
+          kinomap.com
+          lessonplanet.com
+          linkedin.com
+          liveleak.com
+          livestream.com
+          mashable.com
+          medium.com
+          meetup.com
+          mixcloud.com
+          mlb.com
+          myshopify.com
+          myspace.com
+          nba.com
+          npr.org
+          nytimes.com
+          photobucket.com
+          pinterest.com
+          reference.com
+          revision3.com
+          rottentomatoes.com
+          samsung.com
+          screenr.com
+          scribd.com
+          slideshare.net
+          sourceforge.net
+          speakerdeck.com
+          spotify.com
+          squidoo.com
+          streamable.com
+          techcrunch.com
+          ted.com
+          thefreedictionary.com
+          theglobeandmail.com
+          thenextweb.com
+          theonion.com
+          thestar.com
+          thesun.co.uk
+          thinkgeek.com
+          tmz.com
+          torontosun.com
+          tumblr.com
+          twitpic.com
+          usatoday.com
+          viddler.com
+          videojug.com
+          vine.co
+          walmart.com
+          washingtonpost.com
+          wi.st
+          wikia.com
+          wikihow.com
+          wired.com
+          wistia.com
+          wonderhowto.com
+          wsj.com
+          zappos.com
+          zillow.com
+        )
+      end
+
+      # Often using the `html` attribute is not what we want, like for some blogs that
+      # include the entire page HTML. However for some providers like Flickr it allows us
+      # to return gifv and galleries.
+      def self.default_html_providers
+        ['Flickr', 'Meetup']
+      end
+
+      def self.html_providers
+        @html_providers ||= default_html_providers.dup
+      end
+
+      def self.html_providers=(new_provs)
+        @html_providers = new_provs
+      end
+
+      # A re-written URL converts http:// -> https://
+      def self.rewrites
+        @rewrites ||= https_hosts.dup
+      end
+
+      def self.rewrites=(new_list)
+        @rewrites = new_list
+      end
+
+      def self.https_hosts
+        %w(slideshare.net dailymotion.com livestream.com imgur.com flickr.com)
+      end
+
+      def self.host_matches(uri, list)
+        !!list.find { |h| %r((^|\.)#{Regexp.escape(h)}$).match(uri.host) }
+      end
+
+      def self.probable_discourse(uri)
+        !!(uri.path =~ /\/t\/[^\/]+\/\d+(\/\d+)?(\?.*)?$/)
+      end
+
+      def self.probable_wordpress(uri)
+        !!(uri.path =~ /\d{4}\/\d{2}\//)
+      end
+
+      def self.allowed_twitter_labels
+        ['brand', 'price', 'usd', 'cad', 'reading time', 'likes']
+      end
+
+      def self.===(other)
+        other.kind_of?(URI) ?
+          host_matches(other, allowed_domains) || probable_wordpress(other) || probable_discourse(other) :
+          super
+      end
+
+      def to_html
+        rewrite_https(generic_html)
+      end
+
+      def placeholder_html
+        return article_html if is_article?
+        return image_html if is_image?
+        return Onebox::Helpers.video_placeholder_html if is_video? || is_card?
+        return Onebox::Helpers.generic_placeholder_html if is_embedded?
+        to_html
+      end
+
+      def data
+        @data ||= begin
+          html_entities = HTMLEntities.new
+          d = { link: link }.merge(raw)
+
+          if !Onebox::Helpers.blank?(d[:title])
+            d[:title] = html_entities.decode(Onebox::Helpers.truncate(d[:title], 80))
+          end
+
+          d[:description] ||= d[:summary]
+          if !Onebox::Helpers.blank?(d[:description])
+            d[:description] = html_entities.decode(Onebox::Helpers.truncate(d[:description], 250))
+          end
+
+          if !Onebox::Helpers.blank?(d[:site_name])

[... diff too long, it was truncated ...]

GitHub sha: c27aebce

This commit appears in #430 which was approved by techAPJ. It was merged by lis2.
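
As a reading aid for the matching logic in the diff, here is a minimal standalone sketch of how `self.===` decides whether a URI is a candidate for this engine. The three helper bodies are copied from the diff; `generic_candidate?`, the `ALLOWED` constant, and the `example.org` hosts are hypothetical names introduced only for this illustration.

```ruby
require "uri"

# Same regex as `host_matches` in the diff: the entry itself or any subdomain.
def host_matches(uri, list)
  !!list.find { |h| %r((^|\.)#{Regexp.escape(h)}$).match(uri.host) }
end

# Same path heuristics as `probable_discourse` and `probable_wordpress`.
def probable_discourse(uri)
  !!(uri.path =~ /\/t\/[^\/]+\/\d+(\/\d+)?(\?.*)?$/)
end

def probable_wordpress(uri)
  !!(uri.path =~ /\d{4}\/\d{2}\//)
end

ALLOWED = %w(bbc.co.uk medium.com).freeze # two entries from the default list

# Hypothetical wrapper mirroring the URI branch of `self.===`.
def generic_candidate?(url)
  uri = URI(url)
  host_matches(uri, ALLOWED) || probable_wordpress(uri) || probable_discourse(uri)
end
```

Note that a URL on a domain outside the allowlist is still a candidate when its path looks like a WordPress date path or a Discourse topic path, so the allowlist is not the only way in.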