DEV: Replace custom uri_encode logic with Addressable (#420)

DEV: Replace custom uri_encode logic with Addressable (#420)

Since 8ff3fff introduced the addressable gem as a runtime dependency, it’s now possible to replace all custom URI encoding logic with addressable’s methods.

It does handle some characters differently - e.g. it now doesn’t escape URLs in URI fragments (see the added test). Previous escaping behavior was reported as an issue in Discourse.

diff --git a/lib/onebox/helpers.rb b/lib/onebox/helpers.rb
index aa38bff..9ebc86e 100644
--- a/lib/onebox/helpers.rb
+++ b/lib/onebox/helpers.rb
@@ -190,54 +190,21 @@ module Onebox
       src
     end
 
-    RFC_3986_URI_REGEX ||= /^(?<scheme>([^:\/?#]+):)?(?<authority>\/\/([^\/?#]*))?(?<path>[^?#]*)(\?(?<query>[^#]*))?(#(?<fragment>.*))?$/
-    DOUBLE_ESCAPED_REGEXP ||= /%25([0-9a-f]{2})/i
-
-    # Percent-encodes a URI query parameter per RFC3986 - https://tools.ietf.org/html/rfc3986
-    def self.uri_query_encode(query_string)
-      return "" unless query_string
-
-      # query can encode space to %20 OR +
-      # + MUST be encoded as %2B
-      # in RFC3968 both query and fragment are defined as:
-      # = *( pchar / "/" / "?" )
-      # CGI.escape turns space into + which is the most backward compatible
-      # however it doesn't roundtrip through URI.unescape which prefers %20
-      CGI.escape(query_string).gsub('%25', '%').gsub('+', '%20')
-    end
-
     # Percent-encodes a URI string per RFC3986 - https://tools.ietf.org/html/rfc3986
     def self.uri_encode(url)
       return "" unless url
 
-      # parse uri into named matches, then reassemble properly encoded
-      parts = url.match(RFC_3986_URI_REGEX)
-
-      encoded = ""
-      encoded += parts[:scheme] unless parts[:scheme].nil?
-      encoded += parts[:authority] unless parts[:authority].nil?
-
-      # path requires space to be encoded as %20 (NEVER +)
-      # + should be left unencoded
-      encoded += Addressable::URI.encode(parts[:path]) unless parts[:path].nil?
-      encoded.gsub!(DOUBLE_ESCAPED_REGEXP, '%\1')
-
-      # each query parameter
-      if !parts[:query].nil?
-        query_string = parts[:query].split('&').map do |pair|
-          # can optionally be separated by an =
-          pair.split('=').map do |v|
-            uri_query_encode(v)
-          end.join('=')
-        end.join('&')
-        encoded += '?' + query_string
-      end
+      uri = Addressable::URI.parse(url)
 
-      unless parts[:fragment].nil?
-        encoded += '#' + uri_query_encode(parts[:fragment])&.gsub('%21%2F', '!/')
-      end
+      encoded_uri = Addressable::URI.new(
+        scheme: Addressable::URI.encode_component(uri.scheme, Addressable::URI::CharacterClasses::SCHEME),
+        authority: Addressable::URI.encode_component(uri.authority, Addressable::URI::CharacterClasses::AUTHORITY),
+        path: Addressable::URI.encode_component(uri.path, Addressable::URI::CharacterClasses::PATH + "\\%"),
+        query: Addressable::URI.encode_component(uri.query, "a-zA-Z0-9\\-\\.\\_\\~\\$\\&\\*\\,\\=\\:\\@\\?\\%"),
+        fragment: Addressable::URI.encode_component(uri.fragment, "a-zA-Z0-9\\-\\.\\_\\~\\!\\$\\&\\'\\(\\)\\*\\+\\,\\;\\=\\:\\/\\?\\%")
+      )
 
-      encoded
+      encoded_uri.to_s
     end
 
     def self.uri_unencode(url)
diff --git a/spec/lib/onebox/helpers_spec.rb b/spec/lib/onebox/helpers_spec.rb
index afa35eb..df2e82a 100644
--- a/spec/lib/onebox/helpers_spec.rb
+++ b/spec/lib/onebox/helpers_spec.rb
@@ -93,5 +93,6 @@ RSpec.describe Onebox::Helpers do
 
     it { expect(described_class.uri_encode("https://example.com/random%2Bpath?q=random%2Bquery")).to eq("https://example.com/random%2Bpath?q=random%2Bquery") }
     it { expect(described_class.uri_encode("https://glitch.com/edit/#!/equinox-watch")).to eq("https://glitch.com/edit/#!/equinox-watch") }
+    it { expect(described_class.uri_encode("https://gitpod.io/#https://github.com/eclipse-theia/theia")).to eq("https://gitpod.io/#https://github.com/eclipse-theia/theia") }
   end
 end

GitHub sha: de1ba8c3