FEATURE: Replace SimpleRSS with Ruby RSS module (PR #5311)

DISCUSSION: https://meta.discourse.org/t/simple-rss-is-buggy-and-should-be-forked-or-replaced/27529/23?u=xrav3nz

This PR got a bit chubby, so here’s a brief breakdown:

  1. More tests: PollFeedJob should correctly parse ATOM feeds

    #5199 adds the integration tests for parsing and creating topics from RSS feed, this adds the tests for ATOM feed

  2. FeedItemAccessor

    TL;DR this provides a consistent interface for accessing the content of a feed item element.

    Say we have a rss_item and an atom_entry, both parsed by the built-in Ruby RSS module. The interface for accessing the title are quite different: rss_item.title and atom_entry.title.content respectively.

    FeedItemAccessor acts as a general wrapper that provides a consistent interface: accessor.element_content(:title) will extract the content of the title element, and thus saves the pain from dealing with two different type of feed elements.

  3. FeedElementInstaller

    It ‘installs’ new elements so that RSS::Parser will accept and parses non-standard RSS/ATOM elements. EDIT: See https://github.com/discourse/discourse/pull/5311#issuecomment-345152707.

  4. replace SimpleRSS with Ruby RSS module

    Having the aforementioned points already in-place, this is actually the easier part. :dark_sunglasses:

Open to suggestions :muscle: Cheers


You’ve signed the CLA, xrav3nz. Thank you! This pull request is ready for review.

This pull request has been mentioned on Discourse Meta. There might be relevant details there:


This feels a bit unsafe due to infinite redirection potential, should probably use final destination here to be consistent and probably excon to download (we do want a handover method on final destination so maybe add that as well)

should probably use final destination here to be consistent and probably excon to download

Didn’t know about the latter, will do!

a handover method on final destination

Could you explain this a bit more? Specifically what is a handover method and what is it expected to do?

specifically final destination is a bit wasteful cause it does a HEAD only to do a GET afterwards to get the actual data, in this case a single GET would do, so it would be nice if you could use it to handover the “open” GET request and avoid the HEAD req.

Thanks for taking the time to explain that, but I am still a bit confused :cold_sweat: on the following point

in this case a single GET would do

The proposed flow to resolve the potential infinite redirection is to: get the final destination of the feed url (with HEAD, internally), and then excon GET’s the resolved url.

The HEAD request is unneeded when there are no (infinite) redirection, of which we cannot be sure without FinalDestination (that sends HEADs). What am I missing here?

Thing is FinalDestination could just use GET and hand over the get something like this could work

txt = FinalDestination.new(url).get

Then it can internally be smart enough to do GETs instead of HEAD

This optimises for the general case where a single GET is good enough to find the final destination, redirects are pretty much the same cost be they HEAD or GET so overall cost is unconditionally cheaper.

I am scope creeping here, so for starters just implement final destination as it is implemented elsewhere, I can merge it in and then we can do another pr for any final destination optimisations.

Will look at the FinalDestination optimizations and open another PR when ready!

EDIT: See https://github.com/discourse/discourse/pull/5311#issuecomment-345152707.

As per RSS 2.0 Specifications#Extending RSS:

A RSS feed may contain [non-standard elements], only if those elements are defined in a namespace

FeedElementInstaller only installs valid (i.e. namespaced) elements. Let me know if this may not be acceptable.

I think this is ok, @eviltrout what is your call here?

I vote we merge this @samsaffron if you are Ok with it. It’s now or never (post 2.0)

OK, merging this, thanks @xrav3nz