
Marshalling a native structure containing two fixed-length strings with a different charset

2019-12-08

What's the problem?

I want to use the following C structure from C#:

typedef struct
{
    char sessionKey[32];
    wchar_t userName[64];
} ScrobblerConfig;

Notice the types of the two arrays are different.

I should be able to do something like this:

[StructLayout(LayoutKind.Sequential)]
public class ScrobblerConfig
{
    // Does not compile
    [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 32, CharSet = CharSet.Ansi)]
    public string sessionKey;

    [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 64, CharSet = CharSet.Unicode)]
    public string userName;
}

But the above code does not compile, because CharSet cannot be specified at the field level, and has to be specified on the containing structure, in the StructLayout attribute.

That's a problem, because I have two different charsets on my two strings!

Manual marshalling

To spoil the surprise, my solution looks like this:

[StructLayout(LayoutKind.Sequential)]
public class ScrobblerConfig
{
    [MarshalAs(UnmanagedType.ByValArray, SizeConst = 32)]
    public readonly byte[] sessionKey = new byte[32];

    [MarshalAs(UnmanagedType.ByValArray, SizeConst = 128)]
    public readonly byte[] userName = new byte[128];

    public string SessionKey
    {
        get => InteropHelper.DecodeNativeBuffer(sessionKey, Encoding.ASCII);
        set => InteropHelper.EncodeToNativeBuffer(sessionKey, Encoding.ASCII, value);
    }

    public string UserName
    {
        get => InteropHelper.DecodeNativeBuffer(userName, Encoding.Unicode);
        set => InteropHelper.EncodeToNativeBuffer(userName, Encoding.Unicode, value);
    }
}

InteropHelper is discussed in the rest of this post.

This approach has a few advantages:

The fields are marshalled as fixed-length byte arrays, held in readonly fields
It's clear the arrays themselves cannot and should not be replaced (only their content changes)
There is no doubt about the actual size of the fields (e.g.: where is the null terminator?)

This is a matter of opinion, but I find this even clearer than the default .NET string marshalling for fixed-length strings!
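As a sanity check on the two SizeConst values, here is a minimal sketch that asks the marshaller for the unmanaged size of such a class. It assumes a Windows target, where wchar_t is 2 bytes, so the 64-character userName takes 64 × 2 = 128 bytes. ScrobblerConfigLayout and LayoutCheck are hypothetical names used only for this illustration (a fields-only copy of the class above).

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical, fields-only copy of the class above, for illustration.
[StructLayout(LayoutKind.Sequential)]
public class ScrobblerConfigLayout
{
    // char sessionKey[32]: 32 bytes
    [MarshalAs(UnmanagedType.ByValArray, SizeConst = 32)]
    public readonly byte[] sessionKey = new byte[32];

    // wchar_t userName[64]: 64 * 2 = 128 bytes (assuming a 2-byte wchar_t)
    [MarshalAs(UnmanagedType.ByValArray, SizeConst = 128)]
    public readonly byte[] userName = new byte[128];
}

public static class LayoutCheck
{
    // Returns the size the marshaller will use for the native structure.
    public static int NativeSize()
    {
        return Marshal.SizeOf<ScrobblerConfigLayout>();
    }

    public static void Main()
    {
        // Under the assumption above, this should match sizeof(ScrobblerConfig)
        // on the C side: 32 + 128 bytes.
        Console.WriteLine(NativeSize());
    }
}
```

If the value printed here differs from sizeof(ScrobblerConfig) in the native code, the two layouts do not agree and the SizeConst values need another look.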

Setter implementation (`EncodeToNativeBuffer()`)

Since we can't replace the array (only change its content), we cannot represent a null string. A null value is therefore treated the same as an empty string.

In case the buffer is too small for the value we want to encode, we cannot just truncate the bytes: if an encoded character's bytes are truncated, it's not the same character. It might not even be a valid character anymore...

In this situation, I chose to set the buffer to an empty string, and optionally throw an exception.

Encoding.GetBytes() does not encode a null terminator, so after the value is encoded, extra bytes at the end of the buffer should be cleared.
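To illustrate why truncating at the byte level is not an option, here is a small sketch. TruncationDemo and NaiveTruncate are hypothetical names, and UTF-8 is used only because it makes the problem easy to see. This is the naive approach that the setter deliberately avoids.

```csharp
using System;
using System.Text;

public static class TruncationDemo
{
    // Encodes value as UTF-8, keeps only the first maxBytes bytes,
    // then decodes what is left. This is the naive truncation.
    public static string NaiveTruncate(string value, int maxBytes)
    {
        byte[] encoded = Encoding.UTF8.GetBytes(value);
        int kept = Math.Min(maxBytes, encoded.Length);
        return Encoding.UTF8.GetString(encoded, 0, kept);
    }

    public static void Main()
    {
        // "café": the final 'é' (U+00E9) takes two bytes in UTF-8 (0xC3 0xA9),
        // so the whole string is 5 bytes long.
        string original = "caf\u00E9";

        // Keeping 4 bytes cuts the last character in half.
        string truncated = NaiveTruncate(original, 4);

        Console.WriteLine(truncated.StartsWith("caf"));   // True
        Console.WriteLine(truncated == "caf");            // False: a broken character remains
        Console.WriteLine(truncated.Contains('\uFFFD'));  // True: it decodes to a replacement character
    }
}
```

The result is neither the original string nor a clean prefix of it, which is why the setter clears the buffer (or throws) instead.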

Getter implementation (`DecodeNativeBuffer()`)

Encoding.GetString() does not stop reading when it finds a null character in the buffer. It includes the null characters in its result, so we have to remove them with TrimEnd('\0').

As mentioned on the web, it seems it would be more efficient to find the null terminator first, and pass a length limit to Encoding.GetString().

Unfortunately, it's way more complicated than it sounds:

In UTF-16, the string abc would be {97, 0, 98, 0, 99, 0}.

We can't search for the first null byte, because the search would stop in the middle of the first character
We can't search for the last non-null byte, because that would yield the wrong {97, 0, 98, 0, 99} (c is not encoded properly)

Depending on the encoding, a character could be one or multiple bytes long. For variable-length encodings like UTF-8, two different characters might not be encoded with the same number of bytes.

So it's actually better to let the decoder do its work, and trim null characters after decoding the string...
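The UTF-16 example above can be sketched as follows. NullTrimDemo and DecodeThenTrim are hypothetical names for this illustration, and the buffer is the abc example padded with four zero bytes.

```csharp
using System;
using System.Text;

public static class NullTrimDemo
{
    // Decodes the whole buffer first, then trims trailing null characters.
    public static string DecodeThenTrim(byte[] buffer, Encoding encoding)
    {
        return encoding.GetString(buffer).TrimEnd('\0');
    }

    public static void Main()
    {
        // "abc" in UTF-16 (little-endian), followed by four bytes of zero padding.
        byte[] buffer = { 97, 0, 98, 0, 99, 0, 0, 0, 0, 0 };

        // Searching for the first zero byte stops inside the first character.
        Console.WriteLine(Array.IndexOf(buffer, (byte)0)); // 1

        // GetString() decodes the padding too: 'a', 'b', 'c' and two '\0'.
        string raw = Encoding.Unicode.GetString(buffer);
        Console.WriteLine(raw.Length); // 5

        // Trimming after decoding gives the expected string.
        Console.WriteLine(DecodeThenTrim(buffer, Encoding.Unicode)); // abc
    }
}
```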

Another thing to note: since we use a fixed-length buffer, a string might fit in the buffer, but the remaining null bytes might not represent a proper null-terminator in the chosen encoding.

For example, if we use UTF-16 with a 3-byte buffer, we can encode one character, but there is not enough room left for a 2-byte null terminator.

{97, 0, 0} is not a valid UTF-16 string.

This results in a '�' (REPLACEMENT CHARACTER (U+FFFD)) character being output when decoding the buffer.

For this reason, TrimEnd('\0') is not enough, and should actually be TrimEnd('\0', '�').
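Here is a small sketch of the 3-byte case. ReplacementCharDemo and its methods are hypothetical names, and the sketch relies on the default Encoding.Unicode behaviour described above, where the dangling byte is decoded as a replacement character.

```csharp
using System;
using System.Text;

public static class ReplacementCharDemo
{
    // Trims only null characters after decoding.
    public static string TrimNullsOnly(byte[] buffer)
    {
        return Encoding.Unicode.GetString(buffer).TrimEnd('\0');
    }

    // Trims null characters and U+FFFD replacement characters after decoding.
    public static string TrimNullsAndReplacement(byte[] buffer)
    {
        return Encoding.Unicode.GetString(buffer).TrimEnd('\0', '\uFFFD');
    }

    public static void Main()
    {
        // 'a' in UTF-16, followed by a single dangling zero byte.
        byte[] buffer = { 97, 0, 0 };

        // Print the code point of each decoded character.
        foreach (char c in Encoding.Unicode.GetString(buffer))
        {
            Console.WriteLine($"U+{(int)c:X4}");
        }

        // The first call leaves the replacement character in place,
        // the second one removes it.
        Console.WriteLine(TrimNullsOnly(buffer).Length);
        Console.WriteLine(TrimNullsAndReplacement(buffer).Length);
    }
}
```

Note that this trimming cannot tell a decoding artifact apart from a replacement character that really was at the end of the stored string.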

`InteropHelper` complete implementation

using System;
using System.Text;

public static class InteropHelper
{
    public static string DecodeNativeBuffer(byte[] source, Encoding encoding)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (encoding == null) throw new ArgumentNullException(nameof(encoding));

        // GetString() includes null characters, but we don't want them.
        // We can't trim at the byte level beforehand, because that could
        // cut an encoded character in half and corrupt the decoded string.
        var decoded = encoding.GetString(source);
        // Also trim Unicode replacement characters (U+FFFD) that can appear
        // when the buffer is too small to encode the last null character
        // (which can take more than one byte in some encodings).
        return decoded.TrimEnd('\0', '\uFFFD');
    }

    /// <summary>
    /// Writes <paramref name="value"/> to <paramref name="backingField"/>, using the specified <paramref name="encoding"/>.
    /// If <paramref name="throwOnBufferTooSmall"/> is false and the buffer is too small for <paramref name="value"/>,
    /// <paramref name="backingField"/> is cleared.
    /// </summary>
    public static void EncodeToNativeBuffer(byte[] backingField, Encoding encoding, string value, bool throwOnBufferTooSmall = false)
    {
        if (backingField == null) throw new ArgumentNullException(nameof(backingField));
        if (encoding == null) throw new ArgumentNullException(nameof(encoding));

        // By design, we can't change the buffer length or set it to null
        if (value == null)
        {
            Array.Clear(backingField, 0, backingField.Length);
        }
        else
        {
            int requiredSize = encoding.GetByteCount(value);
            if (backingField.Length < requiredSize)
            {
                if (throwOnBufferTooSmall)
                {
                    throw new ArgumentException(
                        $"Buffer too small ({backingField.Length}) to fit the required {requiredSize} byte(s).",
                        nameof(backingField));
                }
                else
                {
                    // If throwOnBufferTooSmall is false, values too large result in an empty string.
                    Array.Clear(backingField, 0, backingField.Length);
                }
            }
            else // buffer large enough
            {
                int written = encoding.GetBytes(value, 0, value.Length, backingField, 0);
                Array.Clear(backingField, written, backingField.Length - written);
            }
        }
    }
}

The code discussed in this post is actually used in my XMPlay Sharp Scrobbler plugin.

This code also has a comprehensive unit test suite, which is not discussed in this post.
