From UUID to Infinite Loops

By Susam Pal on 10 Apr 2015

Most people involved in software development have probably come across UUIDs. UUID stands for universally unique identifier. It is also sometimes called a GUID, which stands for globally unique identifier. Quoting from RFC 4122:

This specification defines a Uniform Resource Name namespace for UUIDs (Universally Unique IDentifier), also known as GUIDs (Globally Unique IDentifier). A UUID is 128 bits long, and requires no central registration process.

These 128-bit identifiers are typically represented as 32 hexadecimal digits, displayed in five groups of 8, 4, 4, 4, and 12 digits separated by hyphens. There are several variants and versions of UUIDs, which differ in how the identifiers are encoded in binary and how they are generated. In this post, we are going to focus only on variant 1 of version 4 UUIDs, also known simply as version 4 UUIDs or random UUIDs. Here are a couple of examples of version 4 UUIDs generated using Python:

>>> import uuid
>>> str(uuid.uuid4())
>>> str(uuid.uuid4())
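
The layout described above can be checked directly. The following is a minimal sketch using only the standard uuid module; the variable names are chosen here for illustration.

```python
import uuid

u = uuid.uuid4()
s = str(u)

# Lengths of the hyphen-separated groups in the string form.
group_lengths = [len(g) for g in s.split('-')]
# Number of hexadecimal digits once the hyphens are removed.
hex_digit_count = len(u.hex)

print(s)
print(group_lengths)    # [8, 4, 4, 4, 12]
print(hex_digit_count)  # 32
```

The printed UUID differs on every run, but the group lengths and the digit count do not.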

Version 4 is one of the most popular types of UUID in use today. Unlike the other versions, this version does not require external inputs such as a MAC address, sequence number, or current time. All but 6 of the 128 bits in a version 4 UUID are generated randomly, which leaves 122 random bits. This leads to some fun results. For example, the following piece of Python code is an infinite loop:

while str(uuid.uuid4())[14] == '4': pass

So is this:

while str(uuid.uuid4())[19] in ['8', '9', 'a', 'b']: pass

The above infinite loops show that the hexadecimal digit at index 14 of the string representation must always be 4. Similarly, the hexadecimal digit at index 19 must always be one of 8, 9, a, and b. We can look at the two examples of version 4 UUIDs mentioned earlier and confirm that this is indeed the case. Here are a few more examples that illustrate this pattern:


The digit after the second hyphen is at index 14 and indeed this digit is always 4. Similarly, the hexadecimal digit after the third hyphen is at index 19 and indeed it is always one of 8, 9, a, and b.
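
Since the loops above never terminate, a bounded variant is easier to run. The following is a minimal sketch that checks both positions over a fixed number of freshly generated UUIDs; the function name check_fixed_digits and the default sample size are assumptions made for this illustration.

```python
import uuid

def check_fixed_digits(n=10000):
    """Return True if n fresh version 4 UUIDs all show the fixed digits."""
    for _ in range(n):
        s = str(uuid.uuid4())
        # Index 14 follows the second hyphen; index 19 follows the third.
        if s[14] != '4' or s[19] not in '89ab':
            return False
    return True

print(check_fixed_digits())  # True
```

A finite sample cannot prove the property, of course. It only fails to find a counterexample, which is consistent with the explanation that follows.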

Let us number the octets in the identifiers as 0, 1, 2, etc. where 0 represents the most significant octet (the leftmost pair of hexadecimal digits in the string representation). A careful study of section 4.1.1 of RFC 4122 then tells us that the two most significant bits of octet 8 represent the variant number. Since we are working with variant 1 of version 4 UUIDs, these two bits must be 1 and 0. As a result, octet 8 must be of the form 10xx xxxx in binary where each x represents an independent random bit. Thus, in binary, the four most significant bits of octet 8 must be one of 1000, 1001, 1010, and 1011. This explains why we always see the hexadecimal digit 8, 9, a, or b at index 19, the first digit of octet 8 in the string representation.
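
The variant bits can be inspected directly through the bytes attribute of a UUID object, which holds the 16 octets with the most significant octet first. This is a small sketch, with variable names chosen for illustration.

```python
import uuid

u = uuid.uuid4()
octet8 = u.bytes[8]          # octets are numbered from 0, most significant first
top_two_bits = octet8 >> 6   # keep only the two most significant bits
high_nibble = octet8 >> 4    # the four most significant bits of octet 8

print(format(octet8, '08b'))                   # always starts with 10
print(top_two_bits == 0b10)                    # True
print(format(high_nibble, 'x') == str(u)[19])  # True
```

The last line confirms that the high nibble of octet 8 is exactly the hexadecimal digit that appears at index 19 of the string.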

Similarly, a study of section 4.1.2 and section 4.1.3 of the RFC shows that the four most significant bits of octet 6 must be set to 0100 to represent the version number 4. This explains why we always see the hexadecimal digit 4 at index 14, the first digit of octet 6 in the string representation.
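
The same kind of check works for the version bits. This sketch compares the four most significant bits of octet 6 with the version attribute that Python's UUID objects expose.

```python
import uuid

u = uuid.uuid4()
octet6 = u.bytes[6]
version_bits = octet6 >> 4   # the four most significant bits of octet 6

print(format(octet6, '08b'))  # always starts with 0100
print(version_bits)           # 4
print(u.version)              # 4
```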

Section 4.4 of RFC 4122 further summarizes these points. Together, the fixed variant and version bits explain why the two little Python snippets mentioned above amount to infinite loops.
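
To make this concrete, here is a rough sketch of how a version 4 UUID could be assembled by hand along the lines described above: take 16 random octets, then overwrite the version and variant bits. The helper name make_random_uuid is hypothetical, and this is only an illustration, not a description of how Python's uuid.uuid4() is implemented internally.

```python
import os
import uuid

def make_random_uuid():
    """Build a version 4 UUID by hand from 16 random octets (illustrative helper)."""
    octets = bytearray(os.urandom(16))
    octets[6] = (octets[6] & 0x0f) | 0x40   # top four bits of octet 6 become 0100 (version 4)
    octets[8] = (octets[8] & 0x3f) | 0x80   # top two bits of octet 8 become 10 (variant)
    return uuid.UUID(bytes=bytes(octets))

u = make_random_uuid()
print(u)
print(u.version, u.variant == uuid.RFC_4122)  # 4 True
```

Only 6 of the 128 bits are overwritten here, so the remaining 122 bits stay random, and the resulting string shows the same fixed digits at indices 14 and 19.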