Of Course "changeme" Is Valid Base64
Today, I came across a post in which the author used the string
"changeme" as test data while testing Base64 decoding
functionality in their application. The author incorrectly believed
that this string is not a valid Base64-encoded string and would
therefore fail to decode. To their surprise, they found that
"changeme" does in fact decode successfully.
The post did not go any further into understanding why
"changeme" is a valid Base64-encoded string and
why it can be decoded successfully into binary data. It appears that
the author was using the Base64 encoding scheme as a black box.
I think it is worth noting and illustrating that any alphanumeric string with a length that is a multiple of 4 is a valid Base64-encoded string. Here are some examples that illustrate this:
    $ printf AAAA | base64 --decode | od -tx1
    0000000 00 00 00
    0000003
    $ printf AAAAAAAA | base64 --decode | od -tx1
    0000000 00 00 00 00 00 00
    0000006
    $ printf AQEB | base64 --decode | od -tx1
    0000000 01 01 01
    0000003
    $ printf AQID | base64 --decode | od -tx1
    0000000 01 02 03
    0000003
    $ printf main | base64 --decode | od -tx1
    0000000 99 a8 a7
    0000003
    $ printf scrabble | base64 --decode | od -tx1
    0000000 b1 ca da 6d b9 5e
    0000006
    $ printf 12345678 | base64 --decode | od -tx1
    0000000 d7 6d f8 e7 ae fc
    0000006
Since + and / are also used as
symbols in Base64 encoding (for binary 111110 and
111111, respectively), we also have a few more
examples:
    $ printf 1+2+3+4+5/11 | base64 --decode | od -tx1
    0000000 d7 ed be df ee 3e e7 fd 75
    0000011
    $ printf "\xd7\xed\xbe\xdf\xee\x3e\xe7\xfd\x75" | base64
    1+2+3+4+5/11
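The same checks can be reproduced outside the shell. Here is a minimal sketch in Python, assuming only the standard base64 module; the helper name decode_hex is mine and is used purely for illustration. It decodes a few of the strings shown above, along with "changeme" itself, and prints the resulting bytes in hexadecimal.

```python
import base64


def decode_hex(text):
    """Decode a Base64 string and return its bytes as space-separated hex."""
    # validate=True makes b64decode raise binascii.Error on any character
    # outside the Base64 alphabet instead of silently discarding it.
    return base64.b64decode(text, validate=True).hex(" ")


# Strings taken from the shell examples, plus "changeme".
for sample in ["AAAA", "AQID", "main", "scrabble", "changeme", "1+2+3+4+5/11"]:
    print(f"{sample:>12} -> {decode_hex(sample)}")
```

None of these calls raises an error, which is the strict-mode equivalent of base64 --decode succeeding.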
I think it is good to understand why any alphanumeric string with a
length that is a multiple of 4 turns out to be a valid Base64-encoded
string.
The Base64 encoding scheme encodes each group of 6 bits in the
binary input with a chosen ASCII character. For every possible 6-bit
binary value, we have assigned an ASCII character that appears in
the Base64-encoded string. Each output ASCII character can be one of
the 64 carefully chosen ASCII characters: lowercase and uppercase
letters from the English alphabet, the ten digits from the Arabic
numerals, the plus sign (+) and the forward slash
(/). For example, the bits 000000 are encoded
as A, the bits 000001 are encoded
as B, and so on. The equals sign (=) is
used for padding but that is not something we will discuss in detail
in this post.
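To make the mapping concrete, here is a small Python sketch that builds the 64-symbol alphabet in the order given by RFC 4648 and looks up the 6-bit group behind a symbol. The names ALPHABET and sextet are illustrative choices of mine, not part of any library.

```python
import string

# Standard Base64 alphabet (RFC 4648): A-Z, a-z, 0-9, then '+' and '/'.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"


def sextet(symbol):
    """Return the 6-bit group, as a string of 0s and 1s, encoded by symbol."""
    # The position of a symbol in the alphabet is the 6-bit value it stands for.
    return format(ALPHABET.index(symbol), "06b")


for symbol in "AB+/":
    print(symbol, sextet(symbol))
```

Every letter and digit has a position in this table, so every alphanumeric character stands for some 6-bit value.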
The smallest positive multiple of 6 that is also a multiple of 8 is 24. Thus the input data is divided into groups of 3 bytes (24 bits) each, and each group is encoded as 4 ASCII characters. What if the last group is less than 3 bytes long? There are padding rules for such cases, but I will not discuss them in this post. For more details on the padding rules, see RFC 4648.
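The grouping can also be run in reverse by hand. The following Python sketch, written under the assumption that the input has no padding, packs the four 6-bit values of each group into one 24-bit integer and splits it into 3 bytes. The function names decode_group and decode_unpadded are hypothetical helpers of mine; the standard library's base64.b64decode appears only as a cross-check.

```python
import base64
import string

# Standard Base64 alphabet (RFC 4648): A-Z, a-z, 0-9, then '+' and '/'.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"


def decode_group(four_symbols):
    """Decode exactly 4 Base64 symbols into 3 bytes (padding is not handled)."""
    if len(four_symbols) != 4:
        raise ValueError("expected exactly 4 symbols")
    value = 0
    for symbol in four_symbols:
        # Shift in 6 bits per symbol; index() raises ValueError for a
        # character that is not in the alphabet.
        value = (value << 6) | ALPHABET.index(symbol)
    # 4 symbols * 6 bits = 24 bits = 3 bytes.
    return value.to_bytes(3, "big")


def decode_unpadded(text):
    """Decode an unpadded Base64 string whose length is a multiple of 4."""
    if len(text) % 4 != 0:
        raise ValueError("length must be a multiple of 4")
    return b"".join(decode_group(text[i:i + 4]) for i in range(0, len(text), 4))


print(decode_unpadded("main").hex(" "))
print(decode_unpadded("changeme") == base64.b64decode("changeme"))
```

Nothing in decode_group can fail for a letter or a digit, because each one has a position in the alphabet.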
Now as a natural result of the encoding scheme, it turns out that any 4 alphanumeric characters are a valid Base64 encoding of some binary data. That's because for every alphanumeric character, there is a 6-bit value that is translated to it during Base64 encoding, so any 4 such characters correspond to exactly 24 bits, that is, 3 bytes. This is the reason why any alphanumeric string with a length that is a multiple of 4 is a valid Base64-encoded string and can be successfully decoded to some binary data.
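This claim is easy to spot-check. The sketch below is an informal experiment rather than a proof: it generates random alphanumeric strings whose lengths are multiples of 4, decodes each one in strict mode, and confirms that encoding the result gives back the original string. The names random_alnum and roundtrips are illustrative, and the fixed seed is only there to make runs repeatable.

```python
import base64
import random
import string

ALNUM = string.ascii_letters + string.digits
rng = random.Random(0)  # fixed seed so that repeated runs behave the same


def random_alnum(groups):
    """Return a random alphanumeric string made of `groups` 4-character groups."""
    return "".join(rng.choice(ALNUM) for _ in range(4 * groups))


def roundtrips(text):
    """Check that text decodes strictly to 3 bytes per 4 characters and re-encodes to itself."""
    decoded = base64.b64decode(text, validate=True)
    reencoded = base64.b64encode(decoded).decode("ascii")
    return len(decoded) == len(text) // 4 * 3 and reencoded == text


samples = [random_alnum(groups) for groups in range(1, 6) for _ in range(200)]
print(all(roundtrips(sample) for sample in samples))
print(roundtrips("changeme"))
```

The round trip works because, without padding, each group of 4 symbols and each group of 3 bytes carry the same 24 bits.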