DevDisasters

A More Unique Identifier

"Oh for crying out loud," Jeremy heard his cubicle-neighbor Andy shout, followed by a string of not-so-family-friendly expletives. "It's yet another duplicate GUID!"

Jeremy was intrigued. "Duplicate" is perhaps the least likely problem for a Globally Unique Identifier. With more than 340 billion trillion quadrillion (and that's no typo) possible values, the probability of having two identical GUIDs is basically non-existent. The probability of having multiple duplicate GUIDs is smaller than winning the lottery twice. On the same day. For every lottery held in the world.

"Duplicate GUIDs?" Jeremy stood up and asked over the cubicle wall. "How is that even possible?"

"Obviously it's bound to happen sooner or later," Andy responded. "I mean, we generate a lot of GUIDs. And I mean a lot. We really should have used a more unique identifier, like I had suggested earlier."

That last sentence, especially delivered with the told-you-so inflection, was the only clue Jeremy needed to know exactly what Andy was referring to. Months earlier, the development team was presented with a bit of a unique challenge.

Unique Requirement
An automated data collection and processing application they were building required that a "dataset ID" be returned for every dataset that was uploaded to the Web service. This "dataset ID" could then be used by the consuming application to check on the processing status, cancel the processing request and, once processing was completed, retrieve the "processed dataset ID."

The tricky part in all this was that the processing application would never know how many IDs were issued or what IDs had been issued: It would somehow have to provide an ID that was always unique.

Given the globally unique requirement, the solution was obvious to Jeremy: Simply generate a GUID using the Windows API. Andy, on the other hand, hadn't used GUIDs in the past and didn't quite trust an algorithm to be smart enough to generate such an identifier. He didn't have a better idea, but was confident that, given enough time, he could cobble something together that utilized the computer's serial number, CPU footprint and a number of other factors.
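For illustration only, here is a minimal sketch of what Jeremy's suggestion might look like, assuming the service is written in C# on .NET (as the code later in this story suggests). The class and method names are hypothetical and do not come from the team's application; Guid.NewGuid() is the standard .NET call for creating a new GUID.

```csharp
using System;

// Hypothetical helper, not taken from the application in the story.
public static class DatasetIds
{
    // Returns a fresh GUID for each uploaded dataset.
    // No record of previously issued IDs is needed.
    public static Guid IssueDatasetId()
    {
        return Guid.NewGuid();
    }
}
```

A caller would simply invoke DatasetIds.IssueDatasetId() once per upload and hand the result back to the consuming application.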

"We're not generating that many GUIDs," Jeremy defended. "A thousand a day, tops. Statistically, we'd need to generate a hundred trillion every day for a mill-"

Andy cut him off. "Yeah, yeah, I remember your whole spiel. A billion, gazillion, fafillion, shabolubalu jillion zillion yillion ... Whatever. The fact is, we've got duplicates. It's causing all sorts of problems, and I'm going to have to spend all afternoon cleaning the mess for just this one duplicate."

Shifty Characters
Baffled, Jeremy decided to peek at the source code to see the problem. Perhaps it was a variable that was getting reused? Or maybe something in the cache?

After all of 10 minutes, Jeremy discovered the root of the problem:

// Swap two chars of dataset ID 
// to create processed ID
var dsID = dataSetGuid.ToString();
var pdsID = new StringBuilder();
pdsID.Append(dsID[1]); 
pdsID.Append(dsID[0]); 
pdsID.Append(dsID.Substring(2));
return new Guid(pdsID.ToString());

The code was checked in by Andy. In fairness, it will generate a new, unique GUID, provided that the first two characters of the GUID aren't the same.
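The failure is easy to reproduce. The sketch below wraps the same swap logic in a hypothetical method and runs it against a made-up GUID whose first two characters match; the class name, method names and sample value are assumptions for illustration, not part of the original application. It also counts how many of the 256 possible two-character hex prefixes survive the swap unchanged.

```csharp
using System;
using System.Text;

// Hypothetical demo class; only the swap logic mirrors the snippet above.
public static class ProcessedIdDemo
{
    // Swaps the first two characters of the GUID's string form.
    public static Guid SwapFirstTwoChars(Guid dataSetGuid)
    {
        var dsID = dataSetGuid.ToString();
        var pdsID = new StringBuilder();
        pdsID.Append(dsID[1]);
        pdsID.Append(dsID[0]);
        pdsID.Append(dsID.Substring(2));
        return new Guid(pdsID.ToString());
    }

    // Counts the two-hex-digit prefixes that the swap leaves unchanged.
    public static int CountUnchangedPrefixes()
    {
        const string hex = "0123456789abcdef";
        int unchanged = 0;
        foreach (char a in hex)
            foreach (char b in hex)
                if (a == b) unchanged++;
        return unchanged;
    }

    public static void Main()
    {
        // Made-up sample value whose first two characters are identical.
        var id = new Guid("66000000-0000-0000-0000-000000000000");
        Console.WriteLine(SwapFirstTwoChars(id) == id);  // True: "processed" ID equals the dataset ID
        Console.WriteLine(CountUnchangedPrefixes());     // 16 of the 256 possible prefixes
    }
}
```

If the first two characters are uniformly random, that works out to a collision for roughly one dataset in 16. Generating a second, independent GUID for the processed dataset would avoid the problem, at the cost of having to store the mapping between the two IDs.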

Jeremy explained the problem to Andy, who was still working on cleaning up 664591c8-1985-4071-a4ab-ec87f1e9af1.

"Oh," Andy said, embarrassed. "I see. But what are the chances of that?"

About the Author

Alex Papadimoulis is a managing partner at Inedo LLC and publisher of the Web site "Worse Than Failure" (WorseThanFailure.com). He writes the DevDisasters page in every issue of Redmond Developer News.

Reader Comments:

Sun, Feb 14, 2010 Russ Painter Ireland

Uhh one in sixteen. Yea, I've had that battle before. My boss insisted that such a compact number could never be unique enough for our tables which may have up to a million entries. It took hours of explaining the statistics and the inner workings of guid. Even then he only reluctantly accepted it, reserving the right to say "I told you so" whenever we would eventually hit a conflict (which obviously never happened).

Tue, Dec 1, 2009 tim

i enjoyed this story
