FIX: vBulletin importer should import unreferenced attachments (PR #12187)

The regular vBulletin importer does not import attachments that are not referenced inside the post body. vBulletin supports this way of attaching files and simply shows such attachments at the end of the post. After our migration we had lost roughly 25% of attachments because of this. I developed this fix and successfully ran it on our production environment.

Fixed things:

  • Imports all unreferenced attachments
  • Allows resuming and will not append attachments twice
  • Adds a code path that speeds up resuming before falling back to upload deduplication for the remaining attachments
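
To illustrate the first two points, here is a minimal standalone sketch of the idea, not the actual code from the PR. It assumes attachments are referenced inline with `[ATTACH]id[/ATTACH]` tags, and that each attachment has already been turned into a markdown snippet. The helper names and the `{ id:, markdown: }` hash shape are hypothetical and exist only for this example.

```ruby
require "set"

# Assumed form of vBulletin inline references, e.g. [ATTACH]123[/ATTACH]
# or [ATTACH=CONFIG]123[/ATTACH].
ATTACH_TAG = /\[attach[^\]]*\](\d+)\[\/attach\]/i

# Returns the set of attachment IDs referenced inline in the raw post body.
def referenced_attachment_ids(raw)
  raw.scan(ATTACH_TAG).flatten.map(&:to_i).to_set
end

# attachments: array of { id: Integer, markdown: String } (hypothetical shape).
# Appends every attachment that is not referenced inline to the end of the
# post, skipping snippets that are already present so a resumed run does not
# append them a second time.
def append_unreferenced(raw, attachments)
  referenced = referenced_attachment_ids(raw)
  result = raw.dup

  attachments.each do |att|
    next if referenced.include?(att[:id])     # handled by the inline tag
    next if result.include?(att[:markdown])   # appended by an earlier run
    result << "\n\n" << att[:markdown]
  end

  result
end
```

Running `append_unreferenced` a second time on its own output returns the text unchanged, which is the property that makes resuming safe in this sketch.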

Why is this a draft pull request?

  • I want to gather feedback on whether the code/fix looks good to you before also porting it to the bulk importer
  • I want to port this to the vBulletin bulk importer, but the bulk importer is broken at the moment, which I need to fix first
  • I want to run a full test. At the moment I have only tested it on my “broken” import

Is there any testing infrastructure for importers?

FYI: The vBulletin 5 importer does not seem to be affected. I opted not to fully reuse the vB5 code, but to fix the old code as simply as possible. (During research I found someone working on a vB5 bulk importer: Comparing discourse:master...justindirose:vb5-bulk · discourse/discourse · GitHub)


This pull request has been mentioned on Discourse Meta. There might be relevant details there: