A no-magic base64 implementation in F#
2021-10-02Today I got lost on Wikipedia (again) and ended up trying to naively implement base64 encoding and decoding in F#...
Understanding how to encode some bytes into base64 is relatively simple:
- Convert the input bytes into a long uninterrupted bit string.
- Split the bit string into sextets (6 bits), padding the last sextet with zeros if necessary.
- Using the base64 indexed symbols table, translate each sextet into a character.
Adding padding characters (=
) is not strictly necessary.
Decoding is basically the opposite:
- Map each input character to its corresponding index from the base64 symbols table.
- Convert each index to its bit representation, and concatenate the sextets to get an uninterrupted bit string.
- Split the bit string into octets/bytes (8 bits).
Without further delay, here is what I came up with. Notice how it closely follows the bullet points above:
let base64Encode (bytes: byte array) =
bytes
|> List.ofArray
|> List.collect ( // Make an uninterrupted string of bits from the input bytes:
int // Cast bytes to ints
>> splitNumerals 2 // Convert input bytes to bits (binary)
>> padList Left 8 0) // Pad with zeros so we have 8 bits for all input octets
|> List.chunkBySize 6 // Split into sextets
|> List.map (
padList Right 6 0 // Pad incomplete sextets with trailing zeros
>> unsplitNumerals 2 // Convert binary back to ints
>> (Array.get base64Charset)) // Map ints to chars
|> System.String.Concat
let base64Decode (str: string) =
str.TrimEnd('=').ToCharArray() // Remove trailing padding and split chars
|> List.ofArray
|> List.map (fun char ->
base64CharsetIndices
|> Map.find char) // Map chars back to ints
|> List.collect ( // Recreate the uninterrupted bit string:
splitNumerals 2 // Convert ints to binary
>> padList Left 6 0) // Pad with zeros so we have 6 bits for all input sextets
|> List.chunkBySize 8 // Regroup octets
|> List.map (
unsplitNumerals 2 // Convert binary back to ints
>> byte) // Cast ints to bytes
|> Array.ofList
And here is the complete code:
open System.Text
/// Decompose a number in numerals according to the desired base (radix).
let splitNumerals radix (x: int) =
let rec splitNumerals' =
function
| head :: tail ->
if head < radix then head :: tail
else head % radix :: (splitNumerals' (head / radix :: tail))
| _ -> failwith "Don't call this function with an empty list!"
splitNumerals' [ x ] |> List.rev
/// Returns the exponentiation result of a list of positional numerals
/// in the desired base (radix).
let unsplitNumerals radix (xs: int list) =
(0, xs |> List.rev |> List.indexed)
||> List.fold (fun state (i, x) -> state + x * (pown radix i))
/// Using the provided charset, returns a map of char -> index in the charset.
let getCharsetIndexMap =
Array.indexed
>> Array.map (fun (x, y) -> y, x)
>> Map.ofArray
type Pad = | Left | Right
let padList padType targetSize padValue (list: 'a list) =
if list.Length >= targetSize then list
else
let padding = List.init (targetSize - list.Length) (fun _ -> padValue)
match padType with
| Left -> padding @ list
| Right -> list @ padding
// Standard base64 charset:
let base64Charset =
[ 'A' .. 'Z' ]
@ [ 'a' .. 'z' ]
@ [ '0' .. '9' ]
@ [ '+'; '/' ]
|> Array.ofList
let base64CharsetIndices = getCharsetIndexMap base64Charset
/// Encodes an array of bytes to a base64 string.
/// (Note we don't bother adding padding chars)
let base64Encode (bytes: byte array) =
bytes
|> List.ofArray
|> List.collect ( // Make an uninterrupted string of bits from the input bytes:
int // Cast bytes to ints
>> splitNumerals 2 // Convert input bytes to bits (binary)
>> padList Left 8 0) // Pad with zeros so we have 8 bits for all input octets
|> List.chunkBySize 6 // Split into sextets
|> List.map (
padList Right 6 0 // Pad incomplete sextets with trailing zeros
>> unsplitNumerals 2 // Convert binary back to ints
>> (Array.get base64Charset)) // Map ints to chars
|> System.String.Concat
/// Decodes a base64 string into an array of bytes.
let base64Decode (str: string) =
str.TrimEnd('=').ToCharArray() // Remove trailing padding and split chars
|> List.ofArray
|> List.map (fun char ->
base64CharsetIndices
|> Map.find char) // Map chars back to ints
|> List.collect ( // Recreate the uninterrupted bit string:
splitNumerals 2 // Convert ints to binary
>> padList Left 6 0) // Pad with zeros so we have 6 bits for all input sextets
|> List.chunkBySize 8 // Regroup octets
|> List.map (
unsplitNumerals 2 // Convert binary back to ints
>> byte) // Cast ints to bytes
|> Array.ofList
System.Convert.ToBase64String(Encoding.UTF8.GetBytes("Many hands make light work."))
|> printf "%A"
base64Encode (Encoding.UTF8.GetBytes("Many hands make light work."))
|> printf "%A"
base64Decode "TWFueSBoYW5kcyBtYWtlIGxpZ2h0IHdvcmsu"
|> Encoding.UTF8.GetString
|> printfn "%A"
System.Convert.FromBase64String("TWFueSBoYW5kcyBtYWtlIGxpZ2h0IHdvcmsu")
|> Encoding.UTF8.GetString
|> printf "%A"
It's not very efficient, but it's easy to follow!
Compare this to the C# reference implementation for example (The decoding is even worse), or with this F# implementation...
Benchmarking
Just for fun, I did a benchmark to compare my code to the reference BCL implementation, as well as the F# implementation found on FsSnip:
I had no hope of matching the BCL, but I didn't expect to be faster than the F# FsSnip implementation on the decoding part! I guess its author did not profile their code... π
The whole code, including the benchmark, is available on github.com/mlaily/base64.fs
The comment is shown highlighted below in context.
JavaScript is required to see the comments. Sorry...