FIX: Return consistent character encodings when downloading S3 uploads

approved
#1

FIX: Return consistent character encodings when downloading S3 uploads

Net::HTTP always returns ASCII-8BIT encoding. File.read auto-detects the encoding. This leads to an encoding inconsistency between a fresh download, and a cached download. This commit ensures all downloaded files are treated equally, by always returning the cached version from the filesystem, even during initial download.

One symptom of this problem is during theme exports: https://meta.discourse.org/t/116907

Related ruby ticket: https://bugs.ruby-lang.org/issues/2567

diff --git a/lib/file_store/base_store.rb b/lib/file_store/base_store.rb
index 85c3be2..15b85cb 100644
--- a/lib/file_store/base_store.rb
+++ b/lib/file_store/base_store.rb
@@ -85,6 +85,7 @@ module FileStore
             follow_redirect: true
           )
           cache_file(file, filename)
+          file = get_from_cache(filename)
         end
 
         file
diff --git a/spec/components/file_store/base_store_spec.rb b/spec/components/file_store/base_store_spec.rb
index ac2aebd..5ef2579 100644
--- a/spec/components/file_store/base_store_spec.rb
+++ b/spec/components/file_store/base_store_spec.rb
@@ -45,4 +45,33 @@ RSpec.describe FileStore::BaseStore do
       expect(FileStore::BaseStore.new.get_path_for_optimized_image(optimized)).to eq(optimized_path)
     end
   end
+
+  describe '#download' do
+    before do
+      `rm -rf #{FileStore::BaseStore::CACHE_DIR}`
+
+      SiteSetting.enable_s3_uploads = true
+      SiteSetting.s3_upload_bucket = "s3-upload-bucket"
+      SiteSetting.s3_access_key_id = "some key"
+      SiteSetting.s3_secret_access_key = "some secret key"
+
+      stub_request(:get, upload_s3.url).to_return(status: 200, body: "Hello world")
+    end
+
+    let(:upload_s3) { Fabricate(:upload_s3) }
+
+    it "should return consistent encodings for fresh and cached downloads" do
+      # Net::HTTP always returns binary ASCII-8BIT encoding. File.read auto-detects the encoding
+      # Make sure we File.read after downloading a file for consistency
+
+      store = FileStore::BaseStore.new
+
+      first_encoding = store.download(upload_s3).read.encoding
+
+      second_encoding = store.download(upload_s3).read.encoding
+
+      expect(first_encoding).to eq(Encoding::UTF_8)
+      expect(second_encoding).to eq(Encoding::UTF_8)
+    end
+  end
 end

GitHub sha: ef660d5a

1 Like
#2

This commit has been mentioned on Discourse Meta. There might be relevant details there:

Approved #3