Global BSTRs

27 09 2012

Working with COM from C++ is pretty natural, after all COM’s interfaces are binary compatible with C++ virtual tables. However there are some places in which the fact that COM was designed to work with C leaks through and causes some friction.

A notable example of this is BSTR, under the covers all BSTR is is a typedef to wchar_t* however a BSTR is conceptually different from a plain string, a BSTR may contain embed Nul characters and is therefore not Nul terminated, instead the length is stored in the memory that precedes the actual string data. BSTRs also resist the concept of const correctness, you can’t have a const BSTR which can be a bit of a pain when you want to have BSTR constants.

What should one do when two different pieces of code want to share the value of a BSTR? There are several options.

1. #define a string literal in a header

#define XPATH_CHEAP_BOOKS L"//genre/book[price<7]/title"

One can pass a (wide) string literal to a function expecting a BSTR and this will even work …if you don’t use BSTR specific methods or marshal the parameter.

void Foo(BSTR bs)
    wcout << L"The length of '" << bs << "' is " << SysStringLen(bs) << endl;

Foo(L"Your guess is as good as mine");

Code that ran correctly (by accident) for years and years can suddenly start to fail if the class implementing an interface changes from C++ to .NET, if there’s an interop call between C++ and .NET the marshaller go into play. Since the marshaller knows that the correct way to determine the length of a BSTR is to call SysStringLen it will do so and quite probably get an enormous value, this will cause an out of memory exception when trying to allocate the .NET string.

Obviously the same problems happen with const wchar_t*.

Global CComBSTR

CComBSTR XPATH_CHEAP_BOOKS = L"//genre/book[price<7]/title";

Having a global CComBSTR  (in a header file) has several problems

  1. you have to choose it’s storage class
    •  it can’t be extern (which is the default) because then you’ll get a link-time multiple definition error
    • if it’s static there will be an instance for every translation unit (ie .cpp file), which makes point 2 worse
  2. Each constant will allocate a BSTR when the binary (dll or exe) is loaded, even if it’s never used and even if COM isn’t yet ready to successfully run SysAllocString.
  3. Since CComBSTRs can’t be const someone may change them during the application’s lifetime
  4. When the process terminates the destructors will call SysFreeString, this is just a waste of time, the process is going to return all its memory to the OS anyway, who cares if it’s freed or un-freed memory?

A solution

The first hint at a solution came from  __declspec( selectany), this solves problem #1, an extern global object with __declspec( selectany) will not cause a multiple definition error at link time, instead the linker will arbitrarily select one of the symbols and discard the rest.

The next problem (#2) is that we don’t want to initialize each global too early (or indeed at all in case it isn’t used). In order to do this we can just create a class that has a cast operator to BSTR. This cast operator will use lazy evaluation to create the BSTR the first time it is used and cache the value.

Since the cast operator can be called on different threads some protection must be used, thankfully InterlockedCompareExchangePointer is a perfect fit for our needs.

The rest is pretty straight forward.

  • We can avoid #4 (waste of time in process destruction) by neglecting to write a destructor
  • Although we can’t prevent the clients of our code from changing the BSTR at compile time we can verify that nobody has at runtime (I chose to do so only in debug builds)

We want the class to be light weight therefore it’s best if it is initialized with a string literal so that we can grab the pointer and be sure it will be valid throughout the lifetime of the program and not have to copy it. In order to ensure that we really got a string literal we can make the constructor private and make a macro that creates these objects, in the macro we can verify that the string is a literal by concatenating another (empty) string to it (this will fail at compile time for non literal strings). The macro will use a public member factory function that is named in a way that makes it obvious that it should only be passed a string literal (we are protecting against Murphy, not Machiavelli after all).

Warning: The following code seems to work as expected but has not been rigorously tested, I welcome any corrections.

#pragma once

#include "WTypes.h"// BSTR
#include "crtdbg.h"
#ifdef _DEBUG
#include <cwchar>// for ASSERT in LeakBSTR

// LeakBSTR is for creating globals with the GLOBAL_BSTR macro therefore there is no need 
// to free the BSTR at destruction (hence no destructor)
class LeakBSTR { 
    BSTR bs;
    // It's safe to hold on to the string pointer in the constructor since
    // the private ctor & macro enforce initialization by string literal
    const wchar_t* str; 

    LeakBSTR(const wchar_t* s)
        :  bs(NULL)
        , str(s)
    { }

    static LeakBSTR constructFromStringLiteral(const wchar_t* s)
        return s;

    operator BSTR() 
        if (bs == NULL) {
            BSTR tmp = SysAllocString(str);
            if (InterlockedCompareExchangePointer((void**)&bs, tmp, NULL) != NULL) 
                SysFreeString(tmp); // Another thread already initialized 'bs'

        // make sure nobody messed with the BSTR since it was created
        _ASSERTE(std::wcscmp(str, bs) == 0); 
        return bs;

#define GLOBAL_BSTR(name, value) __declspec(selectany) LeakBSTR \
    name(LeakBSTR::constructFromStringLiteral(L"" value L""))





Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s

%d bloggers like this: