The syndication situation, as I see it.

I'm usually a big fan of off-the-shelf web development. For example, if I need a search engine for a site, I'll happily download somebody else's script if it lets me save my time for more important jobs, like developing the site content. One of the main benefits of Open Source software is that it lowers the cost of entry into a market: by letting developers get basic tools cheap, it frees development resources for "big ideas". With that in mind, I don't usually bother writing my own scripts unless I've got a "big idea" (big for me, anyway) that requires a custom programming job (like cymk2rgb, for instance).

I had to break my own rule when I started blogging, because the off-the-shelf solutions seemed to have different priorities than mine. (Most weblog software creates, or encourages creating, bad websites. More on that another day.) The Perl script used for this blog is an "as I go, when I have the spare time" project, so it's not feature-complete; there are standard tools (like comments and search boxes) that probably won't be added until I've decided the blog needs them, and some common features that will never be added. (The "mood icon" is one of the stupidest things I've ever seen. If you want people to know what your mood is, write about it. Jeez.) Content syndication (via RSS) fell somewhere in between: It wasn't a real necessity until the blog had some traffic (and even that's arguable), but I already had some code written to produce RSS 0.91 files, left over from the days I managed a My Netscape channel on another site. Integrating that code into the blogging script was trivial, so I added an old-fashioned "netscape.xml" file to the blog, linked to it with the ubiquitous "XML" icon, and figured that was an acceptable stop-gap until I felt like reading the newer RSS specifications.
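(For readers who haven't seen one, producing an RSS 0.91 file really is trivial; the blog's actual code is Perl and isn't shown here, but a minimal sketch in Python, with entirely hypothetical feed contents, looks like this:)

```python
# Minimal sketch of RSS 0.91 generation, assuming the entries are
# simple (title, link, description) tuples. All names and URLs here
# are hypothetical; the blog's real script is written in Perl.
from xml.sax.saxutils import escape

def rss091(channel_title, channel_link, channel_desc, items):
    # Build the <item> elements first
    item_xml = ""
    for title, link, desc in items:
        item_xml += (
            "    <item>\n"
            f"      <title>{escape(title)}</title>\n"
            f"      <link>{escape(link)}</link>\n"
            f"      <description>{escape(desc)}</description>\n"
            "    </item>\n"
        )
    # Wrap the items in the channel and rss envelope
    return (
        '<?xml version="1.0"?>\n'
        '<rss version="0.91">\n'
        "  <channel>\n"
        f"    <title>{escape(channel_title)}</title>\n"
        f"    <link>{escape(channel_link)}</link>\n"
        f"    <description>{escape(channel_desc)}</description>\n"
        + item_xml +
        "  </channel>\n"
        "</rss>\n"
    )

feed = rss091("Example Blog", "http://example.com/",
              "A hypothetical feed.",
              [("First post", "http://example.com/1", "Hello.")])
```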

That was the plan, until I bothered to run Analog and read the access report for this blog. In the ten days following my first ping of Weblogs.com, various robots and aggregators attempted to access the following syndication files: backend.php, b2rdf.php, b2rss.php, b2rss.xml, default.rss, index.rdf, index.rss, index.xml, netscape.xml, rss, rss.php, rss.rdf, and rss.xml. Clearly, some people think syndication is more important than I do. I've spent the last few days reading specifications, mangling code, playing with aggregators, and catching up on the state of the art in syndication. I can't say I'm really impressed with what I've seen so far. It's a field beset with multiple problems.

The biggest problem with content syndication (and virtually everyone involved in the subject admits this) is the excess of syndication formats. There are two different major formats using the same name (RSS 1.0 and RSS 2.0), a pair of annoying ancestor languages that refuse to die (RSS 0.9 is the ancestor of RSS 1.0, but RSS 0.91 is the base for RSS 2.0), and an upstart heir waiting in the wings (Atom, formerly known as Echo, formerly known as Pie). My blogging script is now producing files in three versions of RSS, and I'll probably have to include Atom at some point as well.

A related, but less critical, issue is the lack of standard naming conventions for syndication files, as evidenced by the thirteen different file names that robots are looking for. After creating this blog's RSS files, I felt obligated to set up server-side redirects from the "imaginary" files to the "real" files, so that the wayward robots would stop getting "File Not Found" errors. (I'll bet money, however, that some aggregators aren't designed to understand all the HTTP status codes.) Before I could do that, however, I had to use Google to find out which file names were associated with which versions. (For instance, all the "backend.php" scripts seem to produce RSS 0.91, so "backend.php" gets redirected to "netscape.xml".) And then I had to decide what MIME media types to use for the three files I created. That's where things get really bad, because now I'm making decisions that affect end-users' interaction with the blog.
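(The redirects themselves are a few lines of server configuration. This is a sketch assuming an Apache server with mod_alias; aside from netscape.xml, the "real" file names below are placeholders, not this blog's actual files:)

```apache
# Old-school RSS 0.91 requests go to the existing 0.91 file
Redirect permanent /blog/backend.php /blog/netscape.xml
Redirect permanent /blog/b2rss.xml   /blog/netscape.xml

# RDF-flavored requests go to the RSS 1.0 file (placeholder name)
Redirect permanent /blog/index.rdf   /blog/rss10.rdf
Redirect permanent /blog/rss.rdf     /blog/rss10.rdf
```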

It's like this: Most sites syndicating themselves via RSS still use the "application/xml" media type, which, while technically correct, makes implementing RSS readers for end-users tricky. Normally, when a user selects a hyperlink in a web page, his browser uses the Content-Type header provided by the web server to determine two things: whether the browser can display the file by itself, and, if it can't, what "helper application" (often a browser plug-in) should be used to display the file. This is a form of content negotiation. The "application/xml" content type is too general for negotiation/syndication purposes, because it covers dozens of XML-derived languages: a generalized XML reader may display RSS in a user-readable manner, but it doesn't know how to schedule updates, while an RSS reader configured to intercept all "application/xml" files would be trying to read files that aren't syndication-related. That's a no-win situation for end-users and developers, so neither group bothers to configure RSS readers as "application/xml" helpers.
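(The browser's side of that negotiation amounts to a simple lookup from media type to handler. A sketch, with entirely hypothetical handler names, of why "application/xml" is too coarse a key:)

```python
# Sketch of helper-application dispatch: the browser maps the
# response's Content-Type to a handler. The handler names are
# hypothetical; the point is that the media type is the lookup key,
# so a type shared by dozens of languages can't pick a feed reader.
HELPERS = {
    "text/html":           "render in browser",
    "application/rss+xml": "launch RSS aggregator",
    "application/xml":     "generic XML viewer",  # shared by many XML languages
}

def dispatch(content_type):
    # Strip parameters like "; charset=utf-8" before the lookup
    media_type = content_type.split(";")[0].strip().lower()
    return HELPERS.get(media_type, "prompt user to save file")
```

With a distinct type like "application/rss+xml", `dispatch` can hand the file to an aggregator; with plain "application/xml", the best it can do is a generic XML viewer.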

(In contrast to the format/filename/content-type chaos of the RSS scene, a bunch of search engine developers agreed to follow the "robots.txt" standard back in 1996, because getting a functional standard to market was more important to them than making the standard feature-rich. They picked a format, a standard name, and a MIME media type, then they all stuck to it. The file may not be fancy, but everyone knows what it is and where to find it.)
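(A complete, functional robots.txt can be this small; the record below is a generic example, not this site's actual file:)

```
# One record applying to all robots, served from the fixed
# location /robots.txt as text/plain
User-agent: *
Disallow: /cgi-bin/
```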

In order for RSS-reading software to integrate into a web browser as well as other file readers (like Shockwave) do, RSS files need a unique content type. Right now, there are proposals for "application/rss+xml", "application/rdf+xml" (which, like "application/xml", is too general), and "application/atom+xml", but they're not in widespread use, and they're not yet IANA-recognized. They also break a few stubborn RSS readers that only accept "application/xml". (Some programmers creating RSS readers seem to have forgotten the old maxim (Postel's Law), "Be liberal in what you accept and conservative in what you create" -- if more than one media type can describe a document, then aggregators should be willing to accept all applicable types.) That creates a dilemma for webmasters: If we use the IANA-recognized content type, we support a wider variety of RSS readers, but create links that discourage better integration of syndication software into the browsing experience.
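(A Postel's-Law-friendly aggregator would sidestep the dilemma by accepting every media type that can legitimately describe a feed, rather than just one. A sketch of that check, using the types discussed above:)

```python
# Be liberal in what you accept: recognize every media type that a
# feed might plausibly be served under, not just "application/xml".
ACCEPTABLE_FEED_TYPES = {
    "application/rss+xml",   # proposed for RSS 0.91/2.0
    "application/rdf+xml",   # RSS 1.0 (RDF-based)
    "application/atom+xml",  # proposed for Atom
    "application/xml",       # technically correct, still common
    "text/xml",              # legacy servers
}

def is_feed(content_type):
    # Ignore parameters like "; charset=utf-8"
    return content_type.split(";")[0].strip().lower() in ACCEPTABLE_FEED_TYPES
```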

Which leads us to the single greatest problem with syndicating websites through RSS: Much of the desktop software available for reading syndicated content isn't very smart. I can't find any RSS-reading programs for MS Windows that install themselves as helper applications to intercept "application/rss+xml" files. Most RSS-reading software takes the form of stand-alone aggregators: If a user surfing the web finds a syndication feed she wants to subscribe to, she has to launch a separate aggregator and copy the feed's URI from browser to aggregator. The user shouldn't be required to activate that second program, let alone copy the URI manually. She should be able to click the feed link, let the browser start the aggregator, and have the aggregator ask a yes-or-no question: "Do you want to subscribe to this news feed?" Using a website shouldn't require two separate software programs. It's essentially a usability issue: adding a helper app to the browser is more usable than running two programs.

(The "Subscribe to this site in Radio"-type buttons that various aggregators use are a step in the right direction, usability-wise, but they're still a cheap hack that depends on proprietary protocols instead of standards-based content negotiation, and they're creating a proliferation of proprietary subscription buttons. To me, one of the oddest things about the developers of Radio and other aggregators is their apparent disdain for content negotiation. Instead of advocating a standard that leverages existing protocols and could work now, they're continuing to advocate client-side standards that slow the development of a universal button. Cheap proprietary hacks only succeed for companies that have market dominance -- something independent developers should keep in mind now that Microsoft is adding an RSS reader to the next version of Windows. Microsoft is likely to use the sledgehammer solution to finding syndication feeds -- request "/rss.xml" from every site the user visits, and have the web browser pop up the yes-or-no question when it finds a feed. Then it's all over, because Microsoft will have set the de facto standard for feed file name, location, and format.)

There are some syndication advocates who talk about the syndicated web as something separate from, or superior to, the traditional web (some bloggers are calling it the Metaweb), thinking that will help promote the use of syndicated feeds. That's a mistake, because normal users don't want another web in their life; they want the web they've got now to do more. The original strength of the World Wide Web was that it made web browsers the central platform for integrating different protocols (like HTTP, FTP, gopher, and e-mail) and accessing different file types (like HTML, GIF, and MPEG). Users didn't need to know what file type they were accessing, or which software tool should access which protocol; they clicked on something and something useful happened. Most current RSS readers don't integrate the "syndication experience" with the "browsing experience", and that's why most users don't want them.

If RSS-based syndication is going to succeed with normal users, three things need to happen. First, developers have to create software that integrates into the browser better, so that users don't have to consciously run two programs. Second, for that to work, content providers have to start supporting a helper-based architecture by using syndication-specific media types like "application/rss+xml". Third, both sides need to stop talking about syndication like it's separate from the Web at large, and work to convince users that syndication is an improvement to the existing Web, not a replacement or a strange cousin.

I'm just a content provider, so all I can do is label properly and advocate better. The first part is easiest: For now, I'm identifying this blog's RSS 1.0 file as "application/rdf+xml", its RSS 2.0 file (the default syndication file for this blog) as "application/rss+xml", and leaving the RSS 0.91 file as an "application/xml" sop for the really old-school aggregators to deal with. (If it turns out these file associations break the entire Web, I'll have to reconsider them.) Advocacy is more difficult, because I haven't found any truly great aggregators I can refer users to. (Right now, I'm testing Awasu, which integrates with Internet Explorer better than most programs. Too bad I use Mozilla.) When I find one I'm confident in recommending, I'll pretty much have to give it a free advertisement on this site, so that users know why they should download it.
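(On an Apache server, that labeling takes only a few lines of configuration. This sketch assumes Apache; the feed file names below are placeholders for this blog's actual files:)

```apache
# RSS 1.0 (RDF-based) feed
<Files "rss10.rdf">
  ForceType application/rdf+xml
</Files>

# RSS 2.0 feed (the default syndication file)
<Files "rss20.xml">
  ForceType application/rss+xml
</Files>

# The RSS 0.91 file is left as the server's default, application/xml
```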

Posted at 11:58:16 PM EST on 17 December 2003 from Trenton, MI