Fixed Bits of Version 4 UUID

By Susam Pal on 10 Apr 2015

Universally Unique Identifiers or UUIDs are a popular way of creating identifiers that are unique for practical purposes. Quoting from RFC 4122:

This specification defines a Uniform Resource Name namespace for UUIDs (Universally Unique IDentifier), also known as GUIDs (Globally Unique IDentifier). A UUID is 128 bits long, and requires no central registration process.

These 128-bit identifiers are typically represented as 32 hexadecimal digits, displayed in five groups separated by hyphens. UUIDs come in several variants and versions, which differ in how the identifiers are encoded in binary and how they are generated. In this post, we are going to focus only on variant 1 of version 4 UUIDs, also known simply as version 4 UUIDs or random UUIDs. Here are a couple of examples of version 4 UUIDs generated using Python:

>>> import uuid
>>> str(uuid.uuid4())
'980ddc6a-2c56-44da-ac71-9e6bfc924e25'
>>> str(uuid.uuid4())
'10c3fcde-96a0-4c9e-905b-443b00ceeb01'

Version 4 UUIDs are one of the most popular types of UUIDs in use today. Unlike the other versions, this version does not require external inputs such as a MAC address, sequence number, or current time. All except six bits are generated randomly in version 4 UUIDs. The six non-random bits are fixed. They represent the version and variant of the UUID. Here is a tiny Python program that demonstrates the first set of fixed bits:

import uuid
while str(uuid.uuid4())[14] == '4': pass

The above program is an infinite loop. So is this:

while str(uuid.uuid4())[19] in ['8', '9', 'a', 'b']: pass

The above infinite loops show that the hexadecimal digit at index 14 must always be 4. Similarly, the hexadecimal digit at index 19 must always be one of 8, 9, a, and b. We can look at the two examples of version 4 UUIDs mentioned earlier and confirm that this is indeed the case. Here are a few more examples that illustrate this pattern:

527218be-a09e-4d0e-86ce-c39d1348d953
14163389-2eea-4e30-9124-fcf2451eb9fc
c21b57cc-2a4e-4425-a2f4-129256562599
37700270-6deb-4a73-bbcd-d47c6e20b567

The digit after the second hyphen is at index 14 and indeed this digit is always 4. Similarly, the hexadecimal digit after the third hyphen is at index 19 and indeed it is always one of 8, 9, a, and b.
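
As a quick check that terminates, here is a small sketch that tests the four identifiers listed above for this pattern. The helper name has_v4_fixed_digits is made up for this illustration and is not part of the uuid module. It assumes the lowercase string form that str() produces for UUIDs in Python.

```python
samples = [
    "527218be-a09e-4d0e-86ce-c39d1348d953",
    "14163389-2eea-4e30-9124-fcf2451eb9fc",
    "c21b57cc-2a4e-4425-a2f4-129256562599",
    "37700270-6deb-4a73-bbcd-d47c6e20b567",
]

def has_v4_fixed_digits(s):
    # Hypothetical helper: index 14 is the digit after the second hyphen,
    # index 19 is the digit after the third hyphen.
    return s[14] == "4" and s[19] in "89ab"

results = [has_v4_fixed_digits(s) for s in samples]
print(results)
```

All four entries in results should be True.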

If we number the octets in the identifiers as 0, 1, 2, etc. where 0 represents the most significant octet (the leftmost pair of hexadecimal digits in the string representations above), then a careful study of section 4.1.1 of RFC 4122 shows that the two most significant bits of octet 8 represent the variant number. Since we are working with variant 1 of version 4 UUIDs, these two bits must be 1 and 0. As a result, octet 8 must be of the form 10xx xxxx in binary where each x represents an independent random bit. Thus, in binary, the four most significant bits of octet 8 must be one of 1000, 1001, 1010, and 1011. This explains why we always see the hexadecimal digit 8, 9, a, or b at this position.
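
The variant bits can also be inspected directly on the raw octets rather than on the string form. The following is a minimal sketch; variant_bits is a name chosen for this illustration, and it relies on the bytes attribute of uuid.UUID, which holds the 16 octets with the most significant octet first.

```python
import uuid

def variant_bits(u):
    # Hypothetical helper: shift octet 8 right by six bits so that only
    # its two most significant bits remain.
    return u.bytes[8] >> 6

# Sample a bounded number of random UUIDs instead of looping forever.
checked = [variant_bits(uuid.uuid4()) for _ in range(1000)]
print(all(bits == 0b10 for bits in checked))
```

Every sampled value should be 0b10, which matches the 10xx xxxx form of octet 8.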

Similarly, a study of section 4.1.2 and section 4.1.3 of the RFC shows that the four most significant bits of octet 6 must be set to 0100 to represent the version number 4. This explains why we always see the hexadecimal digit 4 here.
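
The same kind of check works for the version bits. This sketch takes one of the identifiers shown earlier in this post and extracts the four most significant bits of octet 6. The name version_nibble is made up for this illustration. The version attribute of uuid.UUID is printed alongside for comparison.

```python
import uuid

def version_nibble(u):
    # Hypothetical helper: shift octet 6 right by four bits so that only
    # its four most significant bits remain.
    return u.bytes[6] >> 4

u = uuid.UUID("10c3fcde-96a0-4c9e-905b-443b00ceeb01")
print(version_nibble(u), u.version)
```

Here octet 6 is 4c in hexadecimal, so both printed values should be 4.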

Section 4.4 of RFC 4122 further summarizes these points. In short, version 4 UUIDs, although 128 bits in length, have 122 bits of randomness. They have six fixed bits that represent their version and variant.
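
To make the 122-bit claim concrete, here is a rough sketch of how a version 4 UUID could be assembled from 16 random octets by overwriting only the six fixed bits. The function make_uuid4 is a hypothetical helper written for this post. It is not how the uuid module is implemented internally, although the result should be equivalent in form to what uuid.uuid4() returns.

```python
import os
import uuid

def make_uuid4(random_bytes):
    # Hypothetical helper: takes exactly 16 octets of random data.
    b = bytearray(random_bytes)
    b[6] = (b[6] & 0x0F) | 0x40  # octet 6 becomes 0100 xxxx (version 4)
    b[8] = (b[8] & 0x3F) | 0x80  # octet 8 becomes 10xx xxxx (variant)
    return uuid.UUID(bytes=bytes(b))

u = make_uuid4(os.urandom(16))
print(u, u.version, u.variant == uuid.RFC_4122)
```

Only 4 + 2 = 6 bits are overwritten by the two masks, so the remaining 128 - 6 = 122 bits come straight from the random input.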
