A couple of months or so ago, I made a conscious shift in my focus with the DiSo Project. Instead of continuing to concentrate on some of the higher level deliverables like WordPress plugins, I decided it was time to step back and evaluate where the development community (specifically the PHP development community) is with the Open Stack. For the purposes of this discussion, I’m going to use Johannes Ernst’s redux of John McCrea’s Open Stack graphic. I’m also only going to concentrate on three of the middle components: Metadata Discovery, Authentication, and Access Control.
First a quick note, to make sure this discussion does not get derailed. There is a time and a place to talk about these topics in the abstract. That is incredibly important work, especially in the development of these specifications, but that’s not what I’m currently interested in. I’m focused on developing solid PHP libraries to implement these technologies. Why PHP? Because that’s what WordPress uses, which is the current platform I’m targeting with the work I’m doing in DiSo. I know that PHP isn’t as sexy as Python or Ruby, but it’s what we’re using. I agree that we need solid libraries written in these other languages as well, but that’s not my focus. PHP is widely deployed and used, including companies very involved in implementing the Open Stack like Facebook and Plaxo (Luke, Joseph – I’m expecting some help from you guys :) ).
I’ll also note that I’m specifically targeting PHP 5. PHP 4 is no longer supported, and maintaining backwards compatibility (especially when talking about XML parsing) is a complete pain. This creates a problem with getting code into WordPress core, but I’m okay with that… they’ll move to PHP 5 eventually.
Let’s start with the most mature library we’ve got. JanRain made a huge name for themselves in the OpenID community a couple of years ago by providing open source libraries in a number of different languages, including of course PHP. Like any library, there are a few weird things here and there, but by and large it is an excellent implementation that has served the community (including this developer) very well. Last week, JanRain announced that they were restructuring the development process of the PHP library to make it more open to developers. The code itself has moved from their internal darcs repository to github, they’ve added Luke Shepard of Facebook and myself as committers, and releases, bug tracking, etc will eventually be moved to the Google Code project. Going forward, we’ll be looking at trimming down the library a bit, removing support in core for older protocol versions and edge cases that weren’t really used, and overall making it easier for developers to use.
There are two OAuth PHP libraries that I’m aware of, the “official” library stored in the OAuth Google Code project, and the Mediamatic library from Marc Worrell. The former library seems to have more users because of it’s exposure from the OAuth website, and is much lighter weight than the Mediamatic library (too much so for my taste). I initially chose the Mediamatic library for my work in getting OAuth working with WordPress, but eventually found some problems with the general library architecture. After some discussion with developers of both libraries, I’ve begun work on a new OAuth library. I re-architected the library from scratch, and then used a combination of the two libraries for much of the actual implementations. It’s probably about 80+ percent done, and should hopefully provide something both communities can work with.
Discovery has certainly received the least amount of love from the development community, which is a bit ironic given that it’s a foundational part of almost every application of the Open Stack. There’s no shortage of metadata discovery and parsing libraries: Joseph Smarr contributed one to the xrds-simple Google Code repository, the OpenID library has its own, and the Mediamatic OAuth library has its own. Yet amazingly, none of these help you at all if you’re wanting to manipulate or publish a metadata document. They’re all half-baked, each written for a very specific use-case. What we need is a full implementation of the discovery protocols. And that, of course, is where it gets a little more complicated…
Disclaimer: If you really want everything there is to know about this subject, go read the writings of Eran Hammer-Lahav… I’m just going to gloss over it a bit.
Metadata discovery includes two steps: you need to know how to get the metadata about a resource, and you need to know what format that metadata is in so that you can parse it and make sense of it. OpenID uses a technology known as Yadis to retrieve the metadata document, which is in an XML language known as XRDS (Extensible Resource Descriptor Sequence). OAuth Discovery uses a combined and simplified version of these two known as XRDS-Simple. Discovery for OpenID and OAuth is more-or-less compatible.
Now, there is also work being done in the OASIS XRI TC (of which I’m a member) to develop the simpler, and more uniform successor to these protocols. Retrieval of the metadata will use a collection of methods known as LRDD (pronounced “lard”), while the metadata itself will be in a much simpler format known as XRD. While identical in spirit, these are complete rewrites of the previous specs. The new specs are not compatible with the old, but they are also designed so that they do not conflict either, so that both may be used simultaneously. Shifting to these new discovery protocols will certainly not be easy, but believe me when I tell you that it will be worth it. In fact, it’s absolutely essential for players like Google to implement OP-driven identifier selection (allowing users to login with OpenID by simply entering “gmail.com”).
So as I said earlier, we don’t have any real good discovery libraries for PHP. As part of my work on WordPress, I started development on a XRDS-Simple library in PHP. More recently, I created a separate branch of the code which implements LRDD+XRD exclusively. Realistically, we’ll probably need a library which handles both the old and new protocols for a while. The idea would be that none of the higher level libraries like OpenID or OAuth need worry about metadata discovery, except for maybe a lightweight wrapper around the discovery library. The new OAuth library I’m working on will do this from day one; the existing OpenID library will take a little while, but I think we’ll eventually see it rely on a separate library for discovery.
Feedback and Help
First of all, I welcome any feedback on the implementations that currently exist, especially the OAuth and discovery libraries I’m working on. They are not complete and most certainly not production ready, but they’re getting close. I’d also like to solicit development help, especially from people with larger deployments and/or a vested interest in this technology. All the new development is happening on github, so creating a clone to hack on is incredibly simple. Even if you don’t have development cycles you can put into this, I’ve already got at least one technical decision I need to make that I’d love feedback on, which I’ll be covering in my next post: “Why Does HTTP Suck So Much in PHP”.