Fix handling of RSS feeds with spaces in URLs
The following feed does not work properly in Feedly:
This is because the URLs have spaces in them. Now, this isn't strictly correct -- they should be encoded as %20 -- but Feedly's behavior of truncating them at the first space isn't (I think) correct either.
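Both behaviors can be illustrated with Python's standard library. This is only a sketch: the URL below is a made-up stand-in, not one of the actual feed links.

```python
from urllib.parse import quote

# Hypothetical item link containing an unescaped space, as in the broken feeds.
url = "http://example.com/comics/Muertitos Resurrection"

# Truncating at the first space (the behavior described above):
truncated = url.split(" ", 1)[0]
print(truncated)  # -> http://example.com/comics/Muertitos

# The correctly encoded form, with the space escaped as %20:
encoded = quote(url, safe=":/")
print(encoded)    # -> http://example.com/comics/Muertitos%20Resurrection
```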
Toland H commented
Here's another broken feed: http://randomc.net/feed
I've already gone back and forth over the course of 15 emails trying to convince the site owner to fix this, but he wouldn't budge and instead pointed to this thread.
I understand their feed is technically what's broken, but Feedly's behavior defeats the workaround that browsers implement.
Please fix this.
Chris Hubbell commented
Please fix this.
As an alternative, send ninja teams to beat down site owners who put unescaped spaces in URLs.
NetNewsWire also does this. You should properly encode your URLs.
Kisai Yuki commented
The Pixietrix sites' RSS feeds were generated from the titles of the comics. We've switched to using the article ID numbers, but the way Feedly was handling this is still invalid according to the XML specification, which RSS feeds are subject to.
As for the space-escaping issue, when I originally wrote the RSS output script, nobody ever reported a problem with it until Google shut down Google Reader. So who knows what else Feedly is broken on.
http://www.rssboard.org/rss-profile#data-types-url references http://www.apps.ietf.org/rfc/rfc3987.html, which is correct when dealing with Unicode.
But the overall problem is that RSS puts links in element data rather than in an attribute, so the escaping that would normally be applied to an attribute value is never done. Whitespace is therefore passed through untouched by a processor that is actually following the XML rules, which creates a contradiction. The Pixietrix RSS generator was written against what validated at the time (most webcomics' RSS feeds don't validate at all, because they dump HTML directly into the feed).
At any rate, Feedly is still broken, and what has been put in place is a band-aid until there is a way to support Feedly without creating an MSIE-style mess.
> Just because your web browser fixes it doesn't mean it's required to. The web browser is only doing it to be nice because it's a common mistake. But it's definitely a mistake, and I guarantee you feedly won't be the only service that has a problem with it.
That's not really an excuse (and Reader, which I expect many switched from, was able to read these feeds).
Hell, if you go with a strict interpretation, your feed reader will sadly handle maybe half the feeds out there, if that. Feed reading is a wild west of broken standards (Atom aside) and even more broken implementations. And this is generally a situation where the feed provider either has no channel for feedback or lacks the technical chops to fix it.
Either way, I expect Feedly already applies a number of fixes to broken feeds, this is but one more.
Just because your web browser fixes it doesn't mean it's required to. The web browser is only doing it to be nice because it's a common mistake. But it's definitely a mistake, and I guarantee you feedly won't be the only service that has a problem with it.

If Feedly were to pass the entire thing as-is to the HTTP GET request, as was suggested in the above comment, it would break, because spaces ARE NOT LEGAL in the URL part of an HTTP GET. The first space in the URL is treated as a separator by the protocol, and whatever comes after it is treated as a protocol version identifier, not as part of the URL. Most web servers will look at that, go "I have no clue what protocol this is because I don't recognize that name", and simply reject the request.

The only way for Feedly to pull this off is to do the escaping itself before passing the URL through, and that would itself violate the "always pass anything that's not markup through to the application" rule you just invoked.
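The request-line breakage described here is easy to demonstrate: an HTTP/1.1 request line is tokenized on spaces into method, target, and version, so an unescaped space truncates the target. A minimal sketch (the path is made up):

```python
# A request line a naive client would emit for a URL with an unescaped space.
raw = "GET /comics/some comic.jpg HTTP/1.1"

# Servers split the request line on spaces: method, target, version.
method, target, version = raw.split(" ", 2)

print(target)   # -> /comics/some  (truncated at the first space)
print(version)  # -> comic.jpg HTTP/1.1  (not a protocol version any server recognizes)
```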
If you're aiming for maximum compatibility then you need to get your site software to escape the spaces in the URLs. If it's not escaping those, who knows what else it isn't escaping, and you could be exposed to all kinds of security issues such as cross-site scripting.
For more information, see the URL RFC (http://www.ietf.org/rfc/rfc1738.txt) and search for 'Unsafe'.
Unsafe characters in URLs need to be escaped, and space is an 'unsafe' character.
The browser is correcting a common issue. URLs should not have spaces in them (the XML spec only guarantees that the document is valid XML, not that a URL inside it is valid). The RSS feed from Pixietrix should encode the spaces in its URLs as either %20 or + (plus sign).
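Both encodings mentioned here are available in Python's standard library; as a sketch of the producer-side fix (the title is a hypothetical example):

```python
from urllib.parse import quote, quote_plus

title = "Muertitos Resurrection"  # hypothetical comic title embedded in a URL

print(quote(title))       # -> Muertitos%20Resurrection  (%20 form, safe anywhere)
print(quote_plus(title))  # -> Muertitos+Resurrection    (+ form, query strings only)
```

Note that the + encoding is only understood in the query component of a URL, so %20 is the safer choice for path segments.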
Kisai Yuki commented
Feedly does not appear to be following the XML rules, which is why people are having this problem. Spaces inside an attribute's quoted value are part of the attribute value, not line termination.
"An XML processor must always pass all characters in a document that are not markup through to the application."
Hence Feedly is not passing the whitespace through to the underlying HTTP GET request.
If you type any of the Pixietrixcomix URLs by hand into your web browser, you will see that the browser itself converts the spaces to %20s. Feedly is not doing this, so the problem is in Feedly, not the RSS feed.
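The browser behavior described here could be approximated reader-side by escaping raw spaces before issuing the request. A minimal sketch of such a workaround (the function name and URL are illustrative, not Feedly's actual code):

```python
def normalize_feed_link(url: str) -> str:
    """Mimic browser leniency: escape raw spaces before making the request.

    Only spaces are replaced; running the whole URL through a general
    percent-encoder would double-encode any %XX sequences already present.
    """
    return url.replace(" ", "%20")

# Hypothetical link taken from a feed with unescaped spaces.
link = "http://example.com/comic/Muertitos Return"
print(normalize_feed_link(link))  # -> http://example.com/comic/Muertitos%20Return
```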