Of Course "changeme" Is Valid Base64

By Susam Pal on 24 Oct 2020

Today, I came across this blog post regarding how the author of the post used the string "changeme" as test data while testing a Base64 decoding functionality in their application. However, the author incorrectly believed that this test data is not a valid Base64-encoded string and therefore would fail to decode successfully when decoded as Base64. To their surprise, they found that this string "changeme" does in fact decode successfully.

The post did not go any further into understanding why indeed "changeme" is a valid Base64-encoded string and why it can successfully be decoded into binary data. It appears that the author was using Base64 encoding scheme as a black box.

I think it is worth noting and illustrating that any alphanumeric string with a length that is a multiple of 4 is a valid Base64-encoded string. Here are some examples that illustrate this:

$ printf AAAA | base64 --decode | od -tx1
0000000    00  00  00
0000003
$ printf AAAAAAAA | base64 --decode | od -tx1
0000000    00  00  00  00  00  00
0000006
$ printf AQEB | base64 --decode | od -tx1
0000000    01  01  01
0000003
$ printf AQID | base64 --decode | od -tx1
0000000    01  02  03
0000003
$ printf main | base64 --decode | od -tx1
0000000    99  a8  a7
0000003
$ printf scrabble | base64 --decode | od -tx1
0000000    b1  ca  da  6d  b9  5e
0000006
$ printf 12345678 | base64 --decode | od -tx1
0000000    d7  6d  f8  e7  ae  fc
0000006

Further, since + and / are also used as symbols in Base64 encoding (for binary 111110 and 111111, respectively), we also have a few more intersting examples:

$ printf 1+2+3+4+5/11 | base64 --decode | od -tx1
0000000    d7  ed  be  df  ee  3e  e7  fd  75
0000011
$ printf "\xd7\xed\xbe\xdf\xee\x3e\xe7\xfd\x75" | base64
1+2+3+4+5/11

I think it is good to understand why any string with a length that is a multiple of 4 turns out to be a valid Base64-encoded string. The Base64 encoding scheme encodes each group of 6 bits in the binary input with a chosen ASCII character. For every possible 6-bit binary value, we have assigned an ASCII character that appears in the Base64-encoded string. Each output ASCII character can be one of the 64 carefully chosen ASCII characters: lowercase and uppercase letters from the English alphabet, the ten digits from the Arabic numerals, the plus sign (+) and the forward slash (/). For example, the bits 000000 is encoded as A, the bits 000001 is encoded as B, and so on. The equals sign (=) is used for padding but that is not something we will discuss in detail in this post.

The smallest positive multiple of 6 that is also a multiple of 8 is 24. Thus every group of 3 bytes (24 bits) of binary data is translated to 4 ASCII characters in its Base64-encoded string. Thus the entire input data is divided into groups of 3 bytes each and then each group of 3 bytes is encoded into 4 ASCII characters. What if the last group is less than 3 bytes long? There are certain padding rules for such cases but I will not discuss them right now in this post. For more details on the padding rules, see RFC 4648.

Now as a natural result of the encoding scheme, it turns out that any 4 alphanumeric characters is a valid Base64 encoding of some binary data. That's because for every alphanumeric character, we can find some 6-bit binary data that would be translated to it during Base64 encoding. This is the reason why any alphanumeric string with a length that is a multiple of 4 is a valid Base64-encoded string and can be successfully decoded to some binary data.

Comments | #unix | #shell | #technology