<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:creativeCommons="http://backend.userland.com/creativeCommonsRssModule"
>

<channel>
	<title>willnorris.com &#187; directed identity</title>
	<atom:link href="http://willnorris.com/tag/directed-identity/feed" rel="self" type="application/rss+xml" />
	<link>http://willnorris.com</link>
	<description>there&#039;s more to life than this</description>
	<lastBuildDate>Tue, 15 May 2012 21:57:32 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.4-beta3-20574</generator>
<creativeCommons:license>http://creativecommons.org/licenses/by-nc-sa/3.0/</creativeCommons:license>
		<item>
		<title>A New Kind of OpenID Proxy</title>
		<link>http://willnorris.com/2009/08/a-new-kind-of-openid-proxy</link>
		<comments>http://willnorris.com/2009/08/a-new-kind-of-openid-proxy#comments</comments>
		<pubDate>Mon, 03 Aug 2009 19:21:57 +0000</pubDate>
		<dc:creator>Will Norris</dc:creator>
				<category><![CDATA[identity]]></category>
		<category><![CDATA[technology]]></category>
		<category><![CDATA[directed identity]]></category>
		<category><![CDATA[openid]]></category>
		<category><![CDATA[privacy]]></category>
		<category><![CDATA[proxy]]></category>

		<guid isPermaLink="false">http://willnorris.com/?p=840</guid>
		<description><![CDATA[After writing last week&#8217;s post on directed identity, I got to thinking about the topic a little more. One of the things that has always bothered me about the prospect of sites requiring the use of directed identity is that it means I can no longer use my self-hosted OpenID to log in. Currently, I [...]]]></description>
			<content:encoded><![CDATA[<p>After writing last week&#8217;s post on <a href="http://willnorris.com/2009/08/best-practices-with-directed-identity">directed identity</a>, I got to thinking about the topic a little more.  One of the things that has always bothered me about the prospect of sites <strong>requiring</strong> the use of directed identity is that it means I can no longer use my self-hosted OpenID to log in.  Currently, I have the <a href="http://wordpress.org/extend/plugins/openid/">WordPress OpenID plugin</a> installed, and I use it as <a href="http://willnorris.com/wordpress/index.php/openid/server">my sole OpenID provider</a>.  While the plugin does support identifier select (mainly for blogs that have multiple authors), it does not support directed identity.  But even if it did, it wouldn&#8217;t make much sense&#8230; an OpenID URL of the form &#8220;http://willnorris.com/openid/9eb4d59c1d488a4&#8221; doesn&#8217;t do a very good job of masking who it belongs to.  No matter how opaque the path, the URL is still rooted at the authority &#8220;willnorris.com&#8221;, which means it can belong to only one person &#8212; me.</p>

<p>So what am I to do?  The most obvious answer is to simply use one of the larger hosted OpenID providers that support such a feature, like Google.  But I have some real problems with that, both philosophically and practically.  Fortunately, I think I may also have a better solution.</p>

<p><span id="more-840"></span></p>

<h2>Maintaining Control</h2>

<p>My first, knee-jerk objection to being forced to use an OpenID provider like Google is purely philosophical.  If the whole point of OpenID was to create a completely distributed identity ecosystem, then isn&#8217;t this a huge step backwards if we&#8217;re no longer allowing for self-hosted OpenIDs in certain cases?  While I think it&#8217;s an unfortunate situation, it is a completely understandable one.  There really is no good way to maintain the privacy of a user who has a self-hosted OpenID, as I demonstrated above.  Remember that OpenID, unlike SAML, was never designed to mask the identity of the user&#8230; quite the opposite, in fact!  When you take any technology and apply it in situations it was not originally intended for, you have to be prepared for the reality that it&#8217;s not going to fit quite as nicely as one might like.</p>

<p>So philosophical arguments aside (as valid as they may be), what other reasons might one have for not wanting to use a third-party OpenID provider?  One of the biggest in my mind is privacy.  Have you ever looked at the &#8220;Visited Sites&#8221; page on <a href="http://myopenid.com/">MyOpenID</a> or a similar provider?  It shows which sites you&#8217;ve logged into, how many times, when the last login was, etc.  Pretty neat data to have at your disposal as a user, right?  But also potentially pretty scary data for JanRain to have for every user.  Now, I&#8217;m not suggesting that JanRain or any other OpenID provider purge this data&#8230; it&#8217;s actually a really valuable feature and I think it&#8217;s in the user&#8217;s best interest to have access to it.  But let&#8217;s remember what started this conversation&#8230; directed identity.</p>

<p><img src="http://willnorris.com/wordpress-content/uploads/2009/08/myopenid-visited-sites.png" alt="myopenid-visited-sites" title="myopenid-visited-sites" width="638" height="139" class="alignright size-full wp-image-842" style="border: 1px solid #000; padding: 1px;" /></p>

<p>While directed identity can be used for everyday logins, it really shines when maintaining user privacy becomes very important.  What if I&#8217;m a whistleblower who wants to report unsafe business practices to the government?  What if I participate in discussions of some taboo subject online that I want to keep hidden from family and friends?  What if I am a political dissident, and revelation of my identity could result in imprisonment or even death?  What if I simply want to exercise my constitutional right to privacy?  While some of these are rather exotic examples, they illustrate my point pretty well.  There are cases where privacy is truly important.  As I demonstrated in my last post on <a href="http://willnorris.com/2009/08/best-practices-with-directed-identity">directed identity implementations</a>, there are algorithms that can adequately protect the identity of the user from the site they are logging in to.  But remember, those algorithms do absolutely nothing for protecting my identity from my identity provider.  If privacy is my goal, then why would I trust Google or JanRain with my daily activities any more than anyone else?</p>

<p>In Andy Oram&#8217;s post on this topic &#8220;<a href="http://broadcast.oreilly.com/2009/07/shortening-cookies-using-openi.html">Shortening cookies: Using OpenID to improve government privacy online</a>&#8221; and his follow-up &#8220;<a href="http://radar.oreilly.com/2009/08/privacy-and-open-government-co.html">Privacy and open government: conversations with EPIC and others about OpenID</a>&#8221;, he talks about the possibility of the federal government running an OpenID provider which is used to authenticate users to other government websites and services.  First of all, I don&#8217;t think the government needs to stand up yet another OpenID provider.  The private sector has done a pretty good job so far of making sure people have portable identifiers, whether they actually realize it or not.  But more importantly, I simply don&#8217;t trust the government to <a href="http://www.google.com/search?q=warrantless+wiretapping">maintain and respect my privacy</a>.  In discussing whether the government is actually the right party to run this service, Oram mentions:</p>

<blockquote>
  <p>The problem is whether visitors can trust any particular server 1) to stay up, 2) not to go out of business, 3) not to leak information, 4) not to abuse the information for private gain, and 5) not to cave in to government pressure and release information outside of the scope of the law.</p>
</blockquote>

<p>It&#8217;s really the last point that bothers me.  This is a system that, by design, holds the links between the private identities of citizens and the anonymous identifiers they are using to communicate with the government.  Not only are these individuals potentially screwed if the system is broken into, but also if some judge decides there is justifiable cause for revealing these identities.  In such a system, privacy is a matter of policy rather than technical design, and that is not sufficient.  I think we need a system that protects user privacy as a technical feature, not simply as a policy decision.  We need a system that couldn&#8217;t reveal the identity of a user even if the Chief Justice himself were to order it.</p>

<h2>OpenID Proxy</h2>

<p>My idea for such a system is actually very simple in its design.  At a high level, it&#8217;s not unlike <a href="http://emailtoid.net/">Emailtoid</a>, JanRain&#8217;s <a href="https://rpxnow.com/">RPX</a>, or the never really launched <a href="http://vidoop.com/vidoopconnect/">Vidoop Connect</a>.  It is an OpenID proxy that stands between the actual relying party and OpenID provider, so that the two never communicate directly with one another.  The proxy faces both parties and therefore implements both sides of OpenID &#8212; it provides an OpenID provider implementing directed identity, which communicates with the actual relying party, and it also implements an OpenID relying party, which communicates with the actual OpenID provider.  Let&#8217;s look at the basic user flow&#8230;</p>

<p>Our user, Alice, goes to OSHA.gov to report the unsafe working conditions at her job.  Fearing retribution from her employer if they were to find out she reported them, Alice wants to remain anonymous in her communication with OSHA.  When logging in to OSHA&#8217;s whistleblower site, Alice enters the URL of an OpenID proxy, &#8220;proxy.example.net&#8221;.  This proxy could be run by the government itself or by a private citizens&#8217; rights organization; it doesn&#8217;t really matter.  OSHA&#8217;s OpenID relying party communicates with the OpenID proxy&#8217;s server and begins an identifier select OpenID flow, which results in Alice being sent over to the proxy.  Now, instead of having to create a new account at the proxy, Alice uses her real OpenID URL, &#8220;alice.example.org&#8221;, to log in.  After she successfully authenticates, the OpenID proxy uses a hashing algorithm to generate an opaque URL for Alice and returns that identifier to OSHA.  So Alice was able to use her own self-hosted OpenID URL in a privacy-preserving manner &#8212; OSHA has no way of identifying her based on the resulting directed identity issued by the proxy.</p>

<p>But what about the proxy itself?  As we&#8217;ve already mentioned, the OpenID provider which <strong>generates</strong> the directed identity is able to determine the user the identifier belongs to, even if a secret salt was used as part of the hashing algorithm.  In order to protect Alice&#8217;s identity from even the OpenID proxy, we need to do two more things.  Let&#8217;s look again at our hashing algorithm for generating directed identities:</p>

<pre><code>md5( username + openid_provider + relying_party + secret_salt )
</code></pre>

<p>For OpenID, that username would simply be Alice&#8217;s OpenID, &#8220;alice.example.org&#8221;.  If subpoenaed, the OpenID proxy would be able to determine whether a given directed identity indeed belonged to &#8220;alice.example.org&#8221; simply by running &#8220;alice.example.org&#8221; through the above algorithm and comparing the result.  Remember that the secret salt is what prevents the relying party from using this same technique to identify the user?  Well, what if we add a little secret salt of our own to the username portion of the algorithm?  What if we replaced &#8220;username&#8221; with &#8220;username + user_salt&#8221;?  That way, even the proxy itself wouldn&#8217;t be able to replay the hashing algorithm without actually knowing the correct salt value to plug in there.</p>
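<p>To make the effect of the user salt concrete, here is a minimal Python sketch of the proxy&#8217;s hash with the user salt appended to the username. It follows the MD5 formula above using the standard <code>hashlib</code> module. The salt values and the OSHA realm string are made up for illustration, and plain string concatenation is assumed as the way the inputs are joined.  A subpoenaed proxy that replays the algorithm without Alice&#8217;s salt gets a value that does not match.</p>

<pre><code>import hashlib


def directed_id(claimed_id, user_salt, openid_provider, relying_party, secret_salt):
    """md5(username + user_salt + openid_provider + relying_party + secret_salt)"""
    material = claimed_id + user_salt + openid_provider + relying_party + secret_salt
    return hashlib.md5(material.encode("utf-8")).hexdigest()


PROXY = "http://proxy.example.net/"  # the proxy acts as the OpenID provider here
RP = "http://osha.gov/"              # hypothetical realm for OSHA
SECRET = "proxy-secret-salt"         # hypothetical proxy salt

# What OSHA receives when Alice logs in through the proxy.
issued = directed_id("http://alice.example.org/", "alice-user-salt", PROXY, RP, SECRET)

# A subpoenaed proxy knows its own salt but not Alice's user salt,
# so replaying the algorithm without it does not reproduce the identifier.
replayed = directed_id("http://alice.example.org/", "", PROXY, RP, SECRET)
print(issued == replayed)  # False
</code></pre>

<p>Because every input is supplied fresh at login, the same inputs always give Alice the same identifier at OSHA, without the proxy ever needing to keep her OpenID or salt.</p>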

<p>Now, before I talk about the user salt itself, there is one caveat that must be pointed out&#8230; this is the second of the &#8220;two more things&#8221; I mentioned above.  The user salt is not exactly secret, because it must be provided to the OpenID proxy so that it can be used to generate the directed identifier.  The thing that protects user privacy is that <strong>the proxy must not record the value of the user salt or the user OpenID</strong>.  It can&#8217;t log the values anywhere and it can&#8217;t store them in a database.  It must use the OpenID and salt to generate the directed identifier, and then get rid of them.  Otherwise, it would still be technically possible to trace the identifier back to the actual user.  This is one reason why it might make sense for such a system to be run by a citizens&#8217; rights organization that is generally more trusted to do everything possible to protect user privacy.  It&#8217;s also worth noting that nothing in this architecture suggests that there would be only one OpenID proxy&#8230; there can, and should, be several proxies for users to choose from, all using these same (or similar) techniques.</p>

<h2>Adding user salt</h2>

<p>There are two methods I can think of to add the user salt mentioned above.  The first is certainly the more straightforward, but also a little more difficult for some users.  If Alice&#8217;s OpenID provider were to itself return a directed identity to the OpenID proxy, that would be sufficient for salting the proxy&#8217;s hashing algorithm.  So long as that directed identity is not being stored by the proxy, there would be no way to trace back any identifiers the proxy generates.  It&#8217;s worth noting that in this case, a self-hosted directed identity actually would be sufficient.  Remember earlier when I mentioned that the URL &#8220;http://willnorris.com/openid/9eb4d59c1d488a4&#8221; wouldn&#8217;t do much for protecting my identity?  That is true, but for the purposes of salting the proxy&#8217;s hashing algorithm, it would work just fine.  But this of course still requires that the user have access to an OpenID provider that supports directed identity.  The WordPress OpenID plugin that I use does not currently support it, so this wouldn&#8217;t work for me.</p>

<p>The second method of providing a user salt requires a slight modification to the OpenID provider, but should be a bit easier to do.  The salt could be provided separately by the OpenID provider as an extension to the OpenID transaction itself.  To make things easier, it could simply be a new Attribute Exchange attribute.  No need for new OpenID extensions, just a new attribute that represents a &#8220;user salt&#8221;.  The OpenID proxy would then combine the user&#8217;s OpenID with the salt value, and use that to generate the final directed identity that is returned to the relying party.  If the proxy were subpoenaed to verify whether a given directed identifier belonged to &#8220;alice.example.org&#8221;, it would be unable to do so without also knowing the user salt.  And as long as the salt was not being recorded anywhere, the proxy would be completely incapable of determining which user an opaque identifier belonged to.</p>
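<p>As a rough sketch of the Attribute Exchange variant, the proxy, acting as a relying party, would add an AX fetch request to its authentication request and read the value back from the provider&#8217;s response. The message keys follow the AX 1.0 fetch request and fetch response format. The type URI <code>http://proxy.example.net/schema/user-salt</code> is a placeholder invented for this example, and the sketch assumes two more things: the response signature has already been checked by the proxy&#8217;s OpenID library, and the provider uses the &#8220;ax&#8221; namespace alias.</p>

<pre><code># Attribute Exchange 1.0 namespace.
AX_NS = "http://openid.net/srv/ax/1.0"
# Hypothetical type URI for the proposed "user salt" attribute.
USER_SALT_TYPE = "http://proxy.example.net/schema/user-salt"


def user_salt_fetch_request():
    """Extension arguments the proxy adds to its request to Alice's provider."""
    return {
        "openid.ns.ax": AX_NS,
        "openid.ax.mode": "fetch_request",
        "openid.ax.type.usersalt": USER_SALT_TYPE,
        "openid.ax.required": "usersalt",
    }


def read_user_salt(response_args):
    """Return the user salt from an already-verified AX fetch response, or None.

    Assumes the provider used the "ax" namespace alias.
    """
    if response_args.get("openid.ax.mode") != "fetch_response":
        return None
    # The provider may choose its own attribute alias, so match on the type URI.
    prefix = "openid.ax.type."
    for key, value in response_args.items():
        if key.startswith(prefix) and value == USER_SALT_TYPE:
            alias = key[len(prefix):]
            return response_args.get("openid.ax.value." + alias)
    return None


response = {
    "openid.ns.ax": AX_NS,
    "openid.ax.mode": "fetch_response",
    "openid.ax.type.salt": USER_SALT_TYPE,
    "openid.ax.value.salt": "opaque-salt-from-alices-provider",
}
print(read_user_salt(response))  # opaque-salt-from-alices-provider
</code></pre>

<p>The proxy would then pass the returned value into its hash and discard it, in keeping with the no-storage rule above.</p>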

<p>So what about using something else as the user salt?  How about the association used in the OpenID transaction?  Remember that the point is still for Alice to be able to log in to OSHA.gov over time and be recognized as the same (possibly anonymous) user.  In order for this to work, she must return with the same directed identifier from the OpenID proxy each time.  And in order for the proxy to ensure that it generates the same directed identifier, it must use the same input values.  While an OpenID association <strong>is</strong> intended to be relatively long-lived, it can, and often does, change over time.  When this happens, the proxy would end up generating a new directed identifier for Alice, and she would no longer appear as the same user at OSHA.gov.</p>
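<p>A small Python sketch shows why a rotating value makes a poor user salt. It uses the MD5 construction from the formula above. The association handles are invented strings, not the format any particular provider uses.</p>

<pre><code>import hashlib

PROXY = "http://proxy.example.net/"
RP = "http://osha.gov/"
SECRET = "proxy-secret-salt"  # hypothetical proxy salt
ALICE = "http://alice.example.org/"


def directed_id(claimed_id, user_salt):
    """md5(username + user_salt + openid_provider + relying_party + secret_salt)"""
    material = claimed_id + user_salt + PROXY + RP + SECRET
    return hashlib.md5(material.encode("utf-8")).hexdigest()


# A salt that never changes gives Alice the same identifier on every visit.
print(directed_id(ALICE, "alice-user-salt") == directed_id(ALICE, "alice-user-salt"))  # True

# An association handle is replaced when the association expires, so the
# identifier changes too and OSHA.gov sees what looks like a new user.
before = directed_id(ALICE, "assoc-handle-august")
after = directed_id(ALICE, "assoc-handle-september")
print(before == after)  # False
</code></pre>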

<h2>Some practical matters</h2>

<p>So there are some additional practical matters that I think must be considered for this solution to truly work.  First of all, <em>what happens if the proxy&#8217;s secret salt is compromised?</em>  This would certainly be a bad thing, but not quite as bad as it would be in a standard directed identity situation.  Remember that in our final algorithm from <a href="http://willnorris.com/2009/08/best-practices-with-directed-identity">last time</a>, the secret salt was the one and only key that protected the privacy of the user in a generated directed identifier.  If another party obtains the salt, they can brute-force the hashing algorithm and discover the true identity of the user associated with a directed identifier.  But remember that one of the goals of the proxy is to protect the true identity of the user from even the proxy itself&#8230; that&#8217;s why we added the user salt.  So even if the secret salt of the proxy were compromised, an attacker would be unable to identify the user associated with a given directed identifier.  Additionally, if the proxy were aware of the compromise, it could transition over to a new secret salt using the same method I mentioned last time that USC used to notify relying parties of new USCID values.  The proxy would simply inform the relying party that user X, identified by this opaque identifier, was previously identified by this other opaque identifier, and their records should be updated.  Nothing about the user herself is revealed, only the transition from one opaque identifier to another.</p>
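<p>One way the transition could work is sketched below in Python. The sketch assumes the proxy keeps a list of its own retired secret salts, but never user data. At each login it recomputes both the current and the historical identifiers from the values Alice supplies, in the spirit of the USC &#8220;historical&#8221; attribute. The attribute names <code>current_id</code> and <code>historical_ids</code> are made up for this example.</p>

<pre><code>import hashlib

PROXY = "http://proxy.example.net/"


def directed_id(claimed_id, user_salt, relying_party, secret_salt):
    """md5(username + user_salt + openid_provider + relying_party + secret_salt)"""
    material = claimed_id + user_salt + PROXY + relying_party + secret_salt
    return hashlib.md5(material.encode("utf-8")).hexdigest()


def login_attributes(claimed_id, user_salt, relying_party, current_secret, retired_secrets):
    """Identifiers to send the relying party after a secret-salt rotation.

    Everything is recomputed from values supplied during this login; the
    proxy keeps only its own salts, never the user's OpenID or user salt.
    """
    current = directed_id(claimed_id, user_salt, relying_party, current_secret)
    historical = [
        directed_id(claimed_id, user_salt, relying_party, old_secret)
        for old_secret in retired_secrets
    ]
    return {"current_id": current, "historical_ids": historical}


attrs = login_attributes(
    "http://alice.example.org/", "alice-user-salt", "http://osha.gov/",
    current_secret="new-proxy-salt", retired_secrets=["compromised-proxy-salt"],
)
print(attrs["current_id"] == attrs["historical_ids"][0])  # False
</code></pre>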

<p><em>What if the user still wants to maintain different personas for different relying parties?</em>  For example, what if Alice wants to reveal the name of her employer via an AX attribute when communicating with OSHA.gov, but not when communicating with WhiteHouse.gov?  Technically, the solution is pretty simple, but it&#8217;s not as user-friendly as I would like.  The OpenID proxy would simply need to embed the trust root of the <strong>actual</strong> relying party inside its own trust root that it presents to Alice&#8217;s OpenID provider.  Part of the OpenID flow is that a provider asks the user whether they trust the site that is asking them to authenticate, and whether they want to release any additional attributes to it.  The relying party is identified using a URL known as the &#8220;trust root&#8221; (renamed the &#8220;realm&#8221; in OpenID 2.0).  So a user may see a prompt (borrowing from MyOpenID&#8217;s language) along the lines of:</p>

<blockquote>
  <p>You are signing in to <strong>proxy.example.net/</strong> as <strong>http://alice.example.org/</strong>.</p>
</blockquote>

<p>Traditionally, the decision that Alice makes here is recorded so that she is not prompted each time she logs in.  So if she wants to make a different decision depending on which site she is logging in to, then the OpenID proxy would need to change its trust root.  One example might result in a prompt that looks like:</p>

<blockquote>
  <p>You are signing in to <strong>proxy.example.net/site/osha.gov</strong> as <strong>http://alice.example.org/</strong>.</p>
</blockquote>

<p>This would record her decision specifically for &#8220;OSHA.gov via proxy.example.net&#8221;.  Using this method, she could easily release different attributes for different relying parties, without any changes to her OpenID provider whatsoever.  While I don&#8217;t like the idea of the user having to parse that URL to figure out what it means, it&#8217;s at least workable.  As a long-term solution, a new OpenID extension could be defined that provides a more human-friendly version of the trust root, explaining more clearly to the user where her data is going.  In fact, I think something along those lines has already been proposed and is being worked on.</p>
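<p>Building the per-site trust root is straightforward. The Python sketch below assumes a hypothetical <code>/site/</code> path layout on the proxy and uses only the standard <code>urllib.parse</code> module. One detail to keep in mind: OpenID 2.0 checks that the <code>return_to</code> URL falls under the realm. That means the proxy&#8217;s return URL for such a request would also need to live beneath that path.</p>

<pre><code>from urllib.parse import urlsplit

# Hypothetical layout: one trust root per relying party behind the proxy.
PROXY_SITE_BASE = "http://proxy.example.net/site/"


def proxy_trust_root(actual_realm):
    """Embed the real relying party's host in the trust root the proxy presents."""
    host = urlsplit(actual_realm).hostname
    if not host:
        raise ValueError("realm has no host: " + actual_realm)
    return PROXY_SITE_BASE + host


print(proxy_trust_root("http://osha.gov/"))             # http://proxy.example.net/site/osha.gov
print(proxy_trust_root("https://www.whitehouse.gov/"))  # http://proxy.example.net/site/www.whitehouse.gov
</code></pre>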

<h2>Proof of concept</h2>

<p>Though the explanation of the concept ends up being a little wordy, the design itself is actually quite simple.  Because of the strict requirement that identifiers <strong>not</strong> be stored, there is actually not much to it.  The only database storage would be for OpenID associations and nonces&#8230; everything else is calculated in real time.  I&#8217;ll be working with <a href="http://mtrichardson.com/">Michael Richardson</a> this week to build a working implementation that demonstrates the idea.  I&#8217;m also beginning conversations with <a href="http://epic.org/">EPIC</a> to help advise them on their recommendations to OMB regarding federal website cookie policies.  I welcome any questions or comments on this approach to privacy with OpenID.  I only really worked through some of the specifics this past weekend, so I may be overlooking a few things.  But the overall architecture is nothing that hasn&#8217;t been done before, and has a strong, proven track record.</p>
]]></content:encoded>
			<wfw:commentRss>http://willnorris.com/2009/08/a-new-kind-of-openid-proxy/feed</wfw:commentRss>
		<slash:comments>16</slash:comments>
		</item>
		<item>
		<title>Best Practices with Directed Identity</title>
		<link>http://willnorris.com/2009/08/best-practices-with-directed-identity</link>
		<comments>http://willnorris.com/2009/08/best-practices-with-directed-identity#comments</comments>
		<pubDate>Mon, 03 Aug 2009 00:27:43 +0000</pubDate>
		<dc:creator>Will Norris</dc:creator>
				<category><![CDATA[identity]]></category>
		<category><![CDATA[technology]]></category>
		<category><![CDATA[directed identity]]></category>
		<category><![CDATA[openid]]></category>
		<category><![CDATA[saml]]></category>
		<category><![CDATA[shibboleth]]></category>

		<guid isPermaLink="false">http://willnorris.com/?p=831</guid>
		<description><![CDATA[Given the ongoing discussion around federal website cookie policies, and the good response I got from my last post, I wanted to continue talking about directed identity a little bit. In this post, I want to talk about how directed identity has actually been implemented in projects I&#8217;ve been involved with, and [...]]]></description>
			<content:encoded><![CDATA[<p>Given the ongoing discussion around federal website <a href="http://blog.ostp.gov/2009/07/24/cookiepolicy/">cookie policies</a>, and the good response I got from my <a href="http://willnorris.com/2009/07/openid-directed-identity-identifier-select">last post</a>, I wanted to continue talking about directed identity a little bit.  In this post, I want to talk about how directed identity has actually been implemented in projects I&#8217;ve been involved with, and what lessons can be learned from that.</p>

<p><span id="more-831"></span></p>

<h2>Shibboleth and SAML</h2>

<p><a href="http://shibboleth.internet2.edu/">Shibboleth</a> is an open source web single sign-on product that is very popular with universities around the world.  It is primarily an implementation of the <a href="http://saml.xml.org/saml-specifications">Security Assertion Markup Language</a> (SAML), but supports other identity protocols as well.  SAML v2 defines a specific type of identifier, known as a <em>Persistent Identifier</em>, that may be used to identify a user within a transaction.  The name itself does not convey its full meaning, but it is defined as &#8220;a persistent opaque identifier for a principal that is specific to an identity provider and a service provider or affiliation of service providers.&#8221;  This is SAML&#8217;s <em>directed identity</em>.</p>

<p>Shibboleth&#8217;s implementation of persistent identifiers has matured a bit over the years, but the main algorithm remains the same.  At its simplest, the identifier is a hash of three things: an identifier for the user, an identifier for the identity provider, and an identifier for the relying party.  Using MD5 as our hashing algorithm, and a few made-up values, this would look like:</p>

<pre><code>md5( "jsmith" + "http://idp.example.com/" + "http://sp.example.com/" ) = 
    2f6bc52c7527747eff263f4183c7f402
</code></pre>

<p>When a relying party receives the string &#8220;2f6bc52c7527747eff263f4183c7f402&#8221;, it is completely opaque and reveals nothing about the identity of the user.</p>
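<p>For readers who want to experiment, the three-part formula translates directly into a few lines of Python using the standard <code>hashlib</code> module. This is a sketch of the simplified formula above, not Shibboleth&#8217;s actual code. It assumes the three values are joined by plain string concatenation.</p>

<pre><code>import hashlib


def persistent_id(user_id, idp_id, sp_id):
    """Simplified persistent identifier: MD5 over user, IdP, and SP identifiers."""
    material = user_id + idp_id + sp_id
    return hashlib.md5(material.encode("utf-8")).hexdigest()


IDP = "http://idp.example.com/"
print(persistent_id("jsmith", IDP, "http://sp.example.com/"))
# The same user gets an unrelated value at a different service provider.
print(persistent_id("jsmith", IDP, "http://sp.example.org/"))
</code></pre>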

<h2>Working with known algorithms</h2>

<p>As I mentioned before, Shibboleth is open source software.  All of our code is publicly available, including <a href="http://svn.middleware.georgetown.edu/view/java-shib-common/branches/REL_1/src/main/java/edu/internet2/middleware/shibboleth/common/attribute/resolver/provider/attributeDefinition/TransientIdAttributeDefinition.java?view=markup">the way that we calculate persistent identifiers</a>.  So how could a relying party use this knowledge to take the above opaque identifier and decipher the identity of the user it belongs to?  Pretty simply, if they have a list of possible usernames or can reasonably guess them, and such lists are far easier to get than you might imagine, for example from unprotected university LDAP directories.  Then you simply iterate through the list and run the same algorithm until you find the matching hash value.  Even given a population of 30,000 usernames (the approximate number of students at <a href="http://www.usc.edu/">USC</a>, where I worked), the following shell script, which hashes 30,000 dictionary words as a stand-in for usernames, churns through them in under two minutes:</p>

<pre><code>~$ date &amp;&amp; for i in `head -n 30000 /usr/share/dict/web2`; 
   do echo $i | md5 &gt;/dev/null; done &amp;&amp; date
Sun Aug  2 16:25:53 PDT 2009
Sun Aug  2 16:27:35 PDT 2009
</code></pre>
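<p>The same attack is easy to express in Python. In this sketch the &#8220;directory&#8221; is a short, made-up list standing in for a scraped LDAP listing. The relying party simply recomputes the unsalted hash for each candidate until one matches.</p>

<pre><code>import hashlib


def persistent_id(user_id, idp_id, sp_id):
    """Unsalted persistent identifier, as in the simplified formula."""
    return hashlib.md5((user_id + idp_id + sp_id).encode("utf-8")).hexdigest()


def unmask(target, candidates, idp_id, sp_id):
    """Return the candidate username whose hash matches target, else None."""
    for username in candidates:
        if persistent_id(username, idp_id, sp_id) == target:
            return username
    return None


IDP = "http://idp.example.com/"
SP = "http://sp.example.com/"
directory = ["ajones", "bwu", "jsmith", "mgarcia"]  # made-up usernames

observed = persistent_id("jsmith", IDP, SP)  # the value the SP was sent
print(unmask(observed, directory, IDP, SP))  # jsmith
</code></pre>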

<p>To protect against such a brute-force attack on the identity of a user, we added a secret salt to the hashing algorithm.  That way, you can&#8217;t regenerate the hash without also knowing the salt value:</p>

<pre><code>md5( "jsmith" + "http://idp.example.com/" + "http://sp.example.com/" 
    + "my-secret-salt" ) = cf79282c587897fb733d8338fe7bc9c2
</code></pre>

<p>It&#8217;s worth pointing out that, while use of a secret salt prevents a relying party from brute-forcing the hash, it in no way prevents the identity provider from doing so, since the provider of course knows the secret salt.  Though we never ended up needing to do this at USC, we built the tools that would enable us to identify the user a persistent identifier belonged to.  In the event that a relying party reported that a particular user was abusing their system, we wanted to make sure we could identify who it was.</p>
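<p>Adding a salt parameter to a simple dictionary attack shows both halves of this. A relying party without the salt comes up empty, while an identity provider holding the salt can still trace the identifier. The usernames and the salt are illustrative values only.</p>

<pre><code>import hashlib


def persistent_id(user_id, idp_id, sp_id, secret_salt=""):
    """MD5 over user, IdP, SP, and an optional secret salt."""
    material = user_id + idp_id + sp_id + secret_salt
    return hashlib.md5(material.encode("utf-8")).hexdigest()


def unmask(target, candidates, idp_id, sp_id, secret_salt=""):
    """Dictionary attack: return the candidate whose hash matches, or None."""
    for username in candidates:
        if persistent_id(username, idp_id, sp_id, secret_salt) == target:
            return username
    return None


IDP = "http://idp.example.com/"
SP = "http://sp.example.com/"
SALT = "my-secret-salt"
directory = ["ajones", "bwu", "jsmith", "mgarcia"]  # made-up usernames

observed = persistent_id("jsmith", IDP, SP, SALT)

# The relying party runs the attack without the salt and finds nothing.
print(unmask(observed, directory, IDP, SP))        # None
# The identity provider knows the salt, so it can still trace the value.
print(unmask(observed, directory, IDP, SP, SALT))  # jsmith
</code></pre>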

<h2>The realities of deployment</h2>

<p>The above algorithm actually works pretty well for generating unique and secure opaque values for tuples of user, identity provider, and service provider.  But what happens when one of those values changes?  At USC, users could request that their username be changed for a variety of reasons, and approximately 300 such requests were made each year.  Using the above algorithm, a username change would change every one of the user&#8217;s generated persistent identifiers.  And the reality is, businesses are often bought out by other businesses.  What happens when &#8220;http://sp.example.com/&#8221; needs to change to &#8220;http://sp.new-company.com/&#8221;?  That will affect the generated persistent identifier for <strong>every</strong> user.  How do you deal with these realities?  There are a number of ways, and I&#8217;ll outline just a few.  There is no one <em>right</em> solution, as the policy and practices of a particular institution will greatly impact its decision of how to address these situations.</p>

<p>Perhaps the simplest option in some respects would be to keep hashing the original identifiers and instead <strong>map the new identifiers</strong> to the old values.  Even if a user&#8217;s username has changed to &#8220;jjones&#8221;, you map it back to &#8220;jsmith&#8221; for the purpose of generating persistent identifiers.  While this allows a deployment to continue using the same hashes, it does introduce the burden of maintaining this map of identifiers.  And what is the institution&#8217;s policy with regard to re-issuing identifiers?  If &#8220;jsmith&#8221; is allowed to be re-issued to a different user at some point in the future, that is going to create problems.</p>
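<p>Here is a minimal Python sketch of the mapping approach. It assumes a hypothetical <code>HASH_ALIASES</code> table that maps current usernames to the original ones. The last lines also show the re-issue hazard: a new user who is later given &#8220;jsmith&#8221; would receive exactly the same persistent identifiers as the original owner.</p>

<pre><code>import hashlib

# Hypothetical rename table: current username -> username used for hashing.
HASH_ALIASES = {"jjones": "jsmith"}
IDP = "http://idp.example.com/"
SP = "http://sp.example.com/"
SALT = "my-secret-salt"


def persistent_id(username):
    """Hash the original username so renamed users keep their identifiers."""
    original = HASH_ALIASES.get(username, username)
    material = original + IDP + SP + SALT
    return hashlib.md5(material.encode("utf-8")).hexdigest()


old_value = persistent_id("jsmith")  # issued before the rename
new_value = persistent_id("jjones")  # issued after the rename
print(old_value == new_value)  # True

# A different person who is later issued "jsmith" hashes to the very same
# value, silently inheriting the original owner's identity at every SP.
print(persistent_id("jsmith") == old_value)  # True
</code></pre>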

<p>An alternative solution would be to simply <strong>use better identifiers</strong>.  At USC, we had another user attribute called the &#8220;uscPVID&#8221; which we used instead of username.  This was itself an opaque identifier that effectively served as the primary key within the enterprise directory.  Even if all the other data for a user changed, such as their name and username, the uscPVID would remain the same.  The same kind of persistent key can be obtained for relying parties by creating a lookup table that maps the relying party&#8217;s public ID to some more persistent internal ID.  If and when a relying party changes their public ID, you simply modify the existing entry, or add a new one, in your lookup table.</p>
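<p>Sketched in Python, the &#8220;better identifiers&#8221; approach hashes stable internal keys instead of public names. The directory entry, the <code>pvid</code> value, and the relying party lookup table below are all made up. The point is that renaming the user or the service provider leaves the hash unchanged.</p>

<pre><code>import hashlib

IDP = "http://idp.example.com/"
SALT = "my-secret-salt"

# Hypothetical directory keyed by current username; "pvid" never changes.
directory = {"jsmith": {"pvid": "P000123"}}

# Hypothetical lookup table from public SP identifiers to stable internal keys.
sp_keys = {"http://sp.example.com/": "sp-0001"}


def persistent_id(username, sp_entity_id):
    """Hash stable internal keys rather than the public username and SP ID."""
    material = directory[username]["pvid"] + IDP + sp_keys[sp_entity_id] + SALT
    return hashlib.md5(material.encode("utf-8")).hexdigest()


before = persistent_id("jsmith", "http://sp.example.com/")

# The user is renamed and the SP is acquired; only the lookup data changes.
directory["jjones"] = directory.pop("jsmith")
sp_keys["http://sp.new-company.com/"] = sp_keys["http://sp.example.com/"]

after = persistent_id("jjones", "http://sp.new-company.com/")
print(before == after)  # True
</code></pre>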

<p>Finally, an identity provider could <strong>migrate from old to new identifiers</strong>, either with a one-time update or gradually.  For a one-time update, the identity provider would generate the old and new identifiers for a user or set of users, and provide that information to the relying party out of band.  The relying party would then be responsible for updating their database accordingly.  Alternatively, the mapping from old to new could be provided gradually at the time of user login by using attributes.  This was the technique used at USC for updating relying parties of changes to a different user attribute known as the USCID.  For reasons I won&#8217;t bother explaining here, this 10-digit ID could change for a user when certain events occurred.  To alert relying parties, we would include two attributes &#8212; the current USCID for the user, along with a list of &#8220;historical&#8221; USCIDs for that user.  Relying parties were then responsible for updating any records they had for one of the historical USCIDs to the current USCID of the user.</p>
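<p>On the relying party&#8217;s side, handling the &#8220;historical&#8221; attribute can be as simple as re-keying stored records at login. Here is a minimal Python sketch, with made-up 10-digit values and a plain dictionary standing in for the relying party&#8217;s database. How conflicts are resolved is an assumption of this sketch, not something the USC setup prescribed.</p>

<pre><code>def apply_login(records, current_id, historical_ids):
    """Re-key records stored under a historical ID to the user's current ID.

    records maps identifier to account data. If a record already exists under
    the current ID, the old record is left in place for manual review.
    """
    for old_id in historical_ids:
        if old_id in records and current_id not in records:
            records[current_id] = records.pop(old_id)
    return records


accounts = {"1234567890": {"name": "J. Smith"}}  # made-up USCID-style key
apply_login(accounts, "0987654321", ["1234567890"])
print(accounts)  # {'0987654321': {'name': 'J. Smith'}}
</code></pre>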

<h2>Application to OpenID</h2>

<p>The above hashing method could be used with little or no modification to create directed identity URLs for OpenID users.  You could of course simply generate completely random IDs and store them in the database&#8230; that works too.  In my next post however, I&#8217;ll be talking about a new kind of OpenID service I&#8217;ve been doing a lot of thinking about, an OpenID proxy which works to protect the privacy of OpenIDs that don&#8217;t support directed identity themselves.  We&#8217;ll be using the above hashing algorithm, but doing some interesting things with the user identifier.</p>
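<p>For comparison, the random-ID alternative mentioned above might look like this in Python. It relies on the standard <code>secrets</code> module, and the base URL is hypothetical. The dictionary stands in for the database table that must now be kept, since a random value cannot be recomputed the way a hash can.</p>

<pre><code>import secrets

BASE = "http://idp.example.com/id/"  # hypothetical URL prefix

# Stand-in for a database table: (user, relying party) -> directed identity URL.
issued_ids = {}


def directed_identity_url(user, relying_party):
    """Return the stored random identifier for this pair, creating it on first use."""
    key = (user, relying_party)
    if key not in issued_ids:
        issued_ids[key] = BASE + secrets.token_hex(16)
    return issued_ids[key]


first = directed_identity_url("jsmith", "http://sp.example.com/")
print(first == directed_identity_url("jsmith", "http://sp.example.com/"))  # True
# A different relying party gets an independent random value.
print(first == directed_identity_url("jsmith", "http://sp.example.org/"))
</code></pre>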
]]></content:encoded>
			<wfw:commentRss>http://willnorris.com/2009/08/best-practices-with-directed-identity/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Directed Identity vs Identifier Select</title>
		<link>http://willnorris.com/2009/07/openid-directed-identity-identifier-select</link>
		<comments>http://willnorris.com/2009/07/openid-directed-identity-identifier-select#comments</comments>
		<pubDate>Fri, 31 Jul 2009 10:29:03 +0000</pubDate>
		<dc:creator>Will Norris</dc:creator>
				<category><![CDATA[identity]]></category>
		<category><![CDATA[technology]]></category>
		<category><![CDATA[directed identity]]></category>
		<category><![CDATA[openid]]></category>

		<guid isPermaLink="false">http://willnorris.com/?p=797</guid>
		<description><![CDATA[I initially started writing this post a couple months ago in response to the common misuse of the term &#8220;directed identity&#8221; I was seeing in the OpenID community. After reading Dirk Balfanz&#8217;s guest post &#8220;Users vs. Identity Providers in OpenID&#8221; on Eran Hammer-Lahav&#8217;s blog, I decided it was important to get this posted. I think [...]]]></description>
			<content:encoded><![CDATA[<p>I initially started writing this post a couple months ago in response to the common misuse of the term &#8220;directed identity&#8221; I was seeing in the OpenID community.  After reading Dirk Balfanz&#8217;s guest post &#8220;<a href="http://www.hueniverse.com/hueniverse/2009/07/users-vs-identity-providers-in-openid.html">Users vs. Identity Providers in OpenID</a>&#8221; on <a href="http://www.hueniverse.com/">Eran Hammer-Lahav&#8217;s blog</a>, I decided it was important to get this posted.  I think some people who are relatively new to certain identity concepts genuinely misunderstand what is meant by &#8220;directed identity&#8221;, while others know the difference but are simply loose with the term.  As the term is most often used in the OpenID community, the exact distinction between &#8220;directed identity&#8221; and other related concepts is not terribly important for now.  But as we start seeing OpenID used in higher value transactions, and where higher degrees of privacy are required, not understanding the difference can lead to great confusion.</p>

<p><span id="more-797"></span></p>

<h2>OpenID History</h2>

<p>It&#8217;s important to understand a little bit of the history of OpenID.  The technology <a href="http://www.sixapart.com/labs/openid/">was created in 2005</a> by <a href="http://bradfitz.com/">Brad Fitzpatrick</a> while working at <a href="http://www.sixapart.com/">Six Apart</a> as a means for bloggers to leave authenticated comments on each other&#8217;s blogs without having to create new accounts.  Since the target community was bloggers, everyone already had a blog URL that they identified with, which made for a convenient, portable, identifying, and globally unique identifier for users.  In slightly more technical terms, we would refer to these identifiers as &#8220;omnidirectional&#8221; and &#8220;readable&#8221;.</p>

<p>The <em>readability</em> of an identifier is exactly what you would think, and can be determined quite easily.  Does the identifier itself give any clues as to the object it identifies?  Blog URLs were intentionally chosen as identifiers because they were easily recognizable.  A non-human-readable identifier is generally referred to as being <em>opaque</em>.  You are not able to discern anything about the resource simply by looking at its identifier.  Take a UPC barcode for example &#8212; while it uniquely identifies a product in a store, the number itself is meaningless without looking it up in a database.</p>

<p>The second important property of an OpenID is that it is <em>omnidirectional</em>.  The direction of an identifier really just refers to the contexts in which that identifier is used.  One of the original goals of OpenID was to have a single portable identifier that users could use on many different sites across the web.  By contrast, a &#8220;directed&#8221; identifier is one that can only be used in certain contexts (generally, just one).</p>

<h2>New Use Cases</h2>

<p>Over time, the use cases in which OpenID was applied expanded, and the technology was forced to mature.  Despite <a href="http://web.archive.org/web/20050716234818/http://openid.net/">deliberate attempts</a> NOT to include profile data in the original protocol, a new extension was soon created that would allow <a href="http://openid.net/specs/openid-simple-registration-extension-1_0.html">basic registration data</a> like name and email address to be passed along on top of OpenID, revealing additional data about the user.  Just as people are not one-dimensional in real life, it was quickly apparent that there was value in allowing users to maintain multiple sets of identity data, generally called &#8220;personas&#8221;, that could be presented to different websites.  I could have a &#8220;personal&#8221; persona which included different data than my &#8220;work&#8221; persona.  If I really wanted to keep these parts of my life separated, I could even use different OpenIDs for different sites.</p>

<p>I think you could say that this is where many of the current usability problems with OpenID began.  As users found themselves managing multiple identities, and sometimes forgetting their URL entirely, a new mechanism was devised to make things easier for them.  Instead of typing in their full OpenID URL at a consumer site, the user could simply enter the URL of their <strong>OpenID Provider</strong>.  This would allow the consumer site to start the OpenID authentication flow and send the user over to the right OpenID provider.  The user could then authenticate to the provider, select a particular OpenID URL and persona if they have more than one, and the correct OpenID and data would be returned to the consumer site.  This new flow, introduced in OpenID 2.0, was never really given a good name, but is <a href="http://openid.net/specs/openid-authentication-2_0.html#responding_to_authentication">referred to in the spec</a> as &#8220;OpenID Provider driven identifier selection&#8221;.  Just rolls right off the tongue, doesn&#8217;t it?  It&#8217;s such a mouthful that almost no one calls it by the right name.  But it is at least accurate &#8212; the OpenID Provider is responsible for having the user select the appropriate identifier for that particular transaction.</p>
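<p>On the wire, the OpenID 2.0 spec has the relying party signal identifier select by sending the special value <code>http://specs.openid.net/auth/2.0/identifier_select</code> as both <code>openid.claimed_id</code> and <code>openid.identity</code> in its authentication request.  The following minimal Python sketch builds such a request URL.  It assumes the provider&#8217;s endpoint has already been discovered from the URL the user typed (discovery is omitted), and the function name and example URLs are hypothetical.</p>

```python
from urllib.parse import urlencode

# Special value defined by OpenID Authentication 2.0 for identifier select.
IDENTIFIER_SELECT = "http://specs.openid.net/auth/2.0/identifier_select"


def identifier_select_request(op_endpoint: str, return_to: str, realm: str) -> str:
    """Build a checkid_setup request asking the provider to let the user pick an identifier."""
    params = {
        "openid.ns": "http://specs.openid.net/auth/2.0",
        "openid.mode": "checkid_setup",
        # Both fields carry identifier_select: the relying party does not know who the user is yet.
        "openid.claimed_id": IDENTIFIER_SELECT,
        "openid.identity": IDENTIFIER_SELECT,
        "openid.return_to": return_to,
        "openid.realm": realm,
    }
    # Append with "&" if the endpoint URL already carries a query string.
    separator = "&" if "?" in op_endpoint else "?"
    return op_endpoint + separator + urlencode(params)


# Hypothetical provider endpoint and relying-party URLs.
print(identifier_select_request(
    "https://op.example.com/openid/server",
    "https://rp.example.com/openid/return",
    "https://rp.example.com/",
))
```

<p>The provider then lets the user choose an identifier and returns that choice in the <code>openid.claimed_id</code> of its positive assertion, so the relying party only learns the URL at the end of the flow.</p>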

<h2>OpenID and Privacy</h2>

<p>But there was another, perhaps more pressing, use case that led to the development of the &#8220;identifier select&#8221; flow.  While I as a user can maintain different personas which I use at different sites, what if I want to remain completely anonymous?  What if I don&#8217;t want to reveal <strong>anything</strong> about myself, yet still want to be recognized as the same user each time I log in to a particular site with my OpenID?  Have you ever used Yahoo!&#8217;s OpenID provider by simply typing &#8220;yahoo.com&#8221; into an OpenID field?  If you have, you may have noticed that Yahoo! gives you a choice of what OpenID URL you want to use, including one that is completely opaque (remember our discussion of &#8220;readability&#8221; earlier).  I am given two choices when I log in:</p>

<ul>
<li><a href="http://www.flickr.com/photos/wnorris">http://www.flickr.com/photos/wnorris</a></li>
<li><a href="https://me.yahoo.com/a/YN.TrVBnuIAvmAk7teEzbLW_MQ-">https://me.yahoo.com/a/YN.TrVBnuIAvmAk7teEzbLW_MQ-</a></li>
</ul>

<p>I can use my Flickr URL, which links to my photo stream and, from there, to lots of other information about me, or I can use this opaque URL that reveals nothing about me.  If I were to log in to an OpenID-enabled site using the second URL, there would be no way to identify which Yahoo! user it belongs to, or anything else about me &#8212; it is completely opaque.  Well&#8230; except for the fact that I&#8217;ve just publicly revealed what my &#8220;secret&#8221; Yahoo! OpenID URL is.  This means that anywhere I have previously used this URL can now be linked back to me.  Not to worry, I&#8217;m pretty sure I&#8217;ve never used it anywhere except for testing.</p>

<p>But even without me revealing what my URL is, you could begin to build a profile of me.  Knowing nothing but my OpenID, you could search for that URL on various websites and piece together different things I may have said or done.  Maybe I mentioned my city on one site, and part of my name on another.  The more I use that &#8220;secret&#8221; OpenID, the more I reveal about myself, and the less &#8220;secret&#8221; it becomes.  Even if my data on these sites was not publicly accessible, what if multiple site owners were to start building such a profile about me behind the scenes without my knowing?  What if multiple government websites were to build such a profile about me?  This is known as <em>collusion</em>, and is precisely what our next and final measure is intended to protect against.</p>
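<p>Collusion of this kind needs nothing more sophisticated than a join on the identifier.  This minimal Python sketch assumes each site keeps a hypothetical log mapping the OpenID URL it saw to whatever facts it learned; the function <code>link_profiles</code> and all of the example data are made up for illustration.</p>

```python
def link_profiles(*site_logs):
    """Join several sites' logs on the OpenID URL each user presented."""
    merged = {}
    seen_at = {}
    for log in site_logs:
        for openid_url, facts in log.items():
            merged.setdefault(openid_url, {}).update(facts)
            seen_at[openid_url] = seen_at.get(openid_url, 0) + 1
    # Only identifiers that appear at more than one site actually link records together.
    return {url: facts for url, facts in merged.items() if seen_at[url] > 1}


# Hypothetical logs: the same opaque URL was used at both sites.
opaque = "https://op.example.com/a/9fQ2xL"
site_a = {opaque: {"city": "Springfield"}}
site_b = {opaque: {"first_name": "Alex"}}
print(link_profiles(site_a, site_b))
# {'https://op.example.com/a/9fQ2xL': {'city': 'Springfield', 'first_name': 'Alex'}}
```

<p>Opacity does not help here: the join works just as well on a meaningless string.  Had each site been given a different identifier for the same user, the result would have been empty.</p>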

<h2>Directed Identity</h2>

<p>As we&#8217;ve seen above, OpenID Provider driven identifier selection does not necessarily imply anything about the readability of the subsequent URL that is returned.  We&#8217;ve also seen that just because an identifier is opaque does not necessarily guarantee our privacy online.  The final piece of our puzzle takes us right back to where this discussion started &#8212; directed identity.  If you recall, the &#8220;direction&#8221; of an identifier refers to the context in which it is used.  And a <em>directed identifier</em> is one that is typically used within a single context, or with a single party.  So when we talk about &#8220;directed identity&#8221; in terms of OpenID, we mean that a <strong>different</strong> OpenID URL is used for <strong>every</strong> website you log in to.  While this does not necessarily mean that the identifier is also opaque, it&#8217;s pretty useless if it isn&#8217;t.</p>
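<p>One way a provider could satisfy this definition is to mint a random, opaque identifier the first time a user logs in to a given site and hand back that same identifier on every later visit to that site.  The class below is a minimal Python sketch of that idea; <code>DirectedIdentityStore</code>, its base URL, and keying sites by their OpenID realm are all assumptions, and a real provider would persist the table rather than keep it in memory.</p>

```python
import secrets


class DirectedIdentityStore:
    """Hypothetical provider-side table: one random, opaque OpenID URL per (user, site) pair."""

    def __init__(self, base_url: str = "https://openid.example.com/a/"):
        self.base_url = base_url
        self._table = {}  # (user_id, realm) -> opaque OpenID URL

    def identifier_for(self, user_id: str, realm: str) -> str:
        key = (user_id, realm)
        if key not in self._table:
            # First login at this site: 16 random bytes, URL-safe base64 encoded,
            # so the URL reveals nothing about the user.
            self._table[key] = self.base_url + secrets.token_urlsafe(16)
        # Later logins at the same site get the same URL back.
        return self._table[key]
```

<p>The same site always sees the same URL, so it can still recognize a returning user, while two sites comparing notes find nothing in common to join on.</p>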

<p>This concept isn&#8217;t new to identity or to OpenID.  Sxip, a very early identity company, <a href="http://lists.danga.com/pipermail/yadis/2006-August/002778.html">provided exactly this feature</a>:</p>

<blockquote>
  <p>In our implementations of a Homesite, we let the user select which  persona they want to be at a new site. One of those is an &#8220;anonymous&#8221;  persona that will have a unique URL for each site.</p>
  
  <p>This lets the user decide on a site by site basis what is disclosed.</p>
  
  <p>&#8212; Dick Hardt</p>
</blockquote>

<p>To my knowledge, the only OpenID provider that implements true directed identity today is Google (and Sxip still, I assume).  If there are others I&#8217;m not aware of, please leave a comment and let me know.  Remember that Yahoo! doesn&#8217;t implement directed identity because, even though they use &#8220;identifier select&#8221; along with an opaque identifier, that identifier is <strong>not</strong> unique for each website you log in to.</p>

<h2>Recap and Why this Matters</h2>

<p>So to recap the three basic terms, and the way in which they build upon each other:</p>

<ul>
<li><p><em>OpenID Provider driven identifier selection</em> (or <em>identifier select</em> for short) refers to the ability for a user to enter the URL of their OpenID Provider into an OpenID field rather than their personal OpenID URL.  This is a feature of OpenID 2.0, and will result in an actual user OpenID URL being returned to the consuming site.  This says nothing about the nature of that URL, and can be implemented simply as a user convenience.</p></li>
<li><p>An <em>opaque</em> URL is one that does not itself reveal any information about the user it identifies.  Any practical use of opaque identifiers necessitates the use of identifier select, since it is not realistic to have a user remember and enter a long and meaningless OpenID URL.  Opaque identifiers protect user privacy.</p></li>
<li><p>A <em>directed identifier</em> is an opaque identifier which is unique for a given site.  The same OpenID URL is continually returned to a given consuming site, but no two consuming sites are ever given the same OpenID URL for a user.  Directed identity protects against collusion.</p></li>
</ul>

<p>Today, identity data is being thrown around pretty loosely without much regard to how it is being used, but that is quickly changing.  Slowly but surely, we are seeing reputable companies involved in high value transactions express interest in what federated identity can offer.  Remember how much buzz the administration of then-President-elect Obama created when <a href="http://www.readwriteweb.com/archives/barack_obamas_changegov_adds_o.php">they implemented OpenID</a> via Intense Debate on Change.gov?  Just look at last week&#8217;s <a href="http://www.whitehouse.gov/blog/Federal-Websites-Cookie-Policy/">announcement from the White House</a> regarding HTTP cookie policies on federal websites.  Now tell me they are not going to be looking at directed identity if and when they ever implement federated identity for real.  On that day, it will become <strong>very</strong> important that the community (and especially OpenID Providers) understand the difference between &#8220;identifier select&#8221; and &#8220;directed identity&#8221; if they want to play ball with the government.</p>
]]></content:encoded>
			<wfw:commentRss>http://willnorris.com/2009/07/openid-directed-identity-identifier-select/feed</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
	</channel>
</rss>

