Visual Studio debugger visualizer for JsonCPP

6 06 2015

We’ve been working with the JsonCPP library, and while it’s very nice to use, it’s a bit of a pain to debug. For example, consider this simple JSON value:

var json = [ 
  42, 
  { 
    "name": "hello", 
    "pi": 3.1415926, 
    "valid": true 
  }
]
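
For concreteness, here is a minimal sketch of how such a value might be parsed and queried with JsonCPP (ReadPi and jsonText are illustrative names, not from the original post):

#include <json/json.h>
#include <string>

// Illustrative only: parse the JSON shown above and pull out "pi".
double ReadPi(const std::string& jsonText)
{
    Json::Value root;
    Json::Reader reader;
    if (!reader.parse(jsonText, root))
        return 0.0;
    return root[1]["pi"].asDouble(); // the value we want to inspect in the debugger
}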

If you want to see the value of pi in the debugger, it would look (after some digging) something like this:

JSON with no visualizer

After enduring this for a bit too long I decided to look for a debugger visualizer for JsonCPP but couldn’t find one, so as a last resort I decided to write one myself. I have to say I was pleasantly surprised to find that this was pretty simple, and after a little work I got to the point where my debugger window looked much more manageable:

JSON with visualizer

If you want to use this visualizer you can find it in GitHub’s visualstudio-debugger repository.





Converting Unicode to Unicode

11 11 2014

Recently my matchmaker called me over for a consultation. He was facing some trouble with text encoding, and since I once read Joel’s The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!), I’m considered an expert (rather than barely competent, which is also an overstatement).

From the get-go it was obvious that the problem was in converting UTF-8 strings to UTF-16. Two main methods were used for this: the CW2A classes and CComBSTR’s constructor that accepts a const char*. Both use the CP_THREAD_ACP code page when converting strings, and you cannot set the thread locale to UTF-8.
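
As a sketch of the kind of fix (the helper names are mine, not from the actual code base), the idea is to pass CP_UTF8 explicitly instead of relying on the thread code page:

#include <atlbase.h>  // CA2W, CComBSTR
#include <string>

// Hypothetical helpers: convert UTF-8 to UTF-16 by passing the code page
// explicitly rather than using the CP_THREAD_ACP default.
std::wstring Utf8ToUtf16(const std::string& utf8)
{
    return std::wstring(CA2W(utf8.c_str(), CP_UTF8));
}

CComBSTR BstrFromUtf8(const std::string& utf8)
{
    return CComBSTR(CA2W(utf8.c_str(), CP_UTF8));
}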

After introducing a fix we inspected the results in the debugger and were confused by what we saw in the watch window. We therefore decided to have a look at a toy example.

Analyzing the problem

Consider the string “Bugs Я Us” which contains the Russian letter “Я” (ya).

#include <atlbase.h>  // CW2A, CA2W, CComBSTR
#include <string>
using namespace std;

int main(int argc, char* argv[])
{
    const wchar_t* wide = L"Bugs Я Us";
    CW2A cw2a(wide);
    CW2A cw2a8(wide, CP_UTF8);
    string str = CW2A(wide);
    string str8 = CW2A(wide, CP_UTF8);
    CComBSTR bs(str8.c_str());
    CComBSTR bs8(CA2W(str8.c_str(), CP_UTF8));
}

Our toy example gave almost the expected results:

Type          Default      CP_UTF8
CW2A          Bugs ? Us    Bugs Я Us
std::string   Bugs ? Us    Bugs Я Us
CComBSTR      Bugs Я Us    Bugs Я Us

The things that surprised me were the cells in red; surely those should have contained the correct string?

Then I remembered the s8 format specifier, which instructs Visual Studio to display strings as UTF-8; perhaps the strings are correct but Visual Studio is misleading us! After adding s8 to the watch window things look marginally better: now only the std::string differs from my expectations.

Type          Default      CP_UTF8
CW2A          Bugs ? Us    Bugs Я Us
std::string   Bugs ? Us    Bugs Я Us
CComBSTR      Bugs Я Us    Bugs Я Us

A bit more poking around showed that the reason for this is that std::string’s visualizer uses the s specifier.

You can find the visualizer in:
<VS Install Directory>\Common7\Packages\Debugger\Visualizers\stl.natvis

I added the 8 suffixes shown below to the file (you have to do this as administrator).

<Type Name="std::basic_string&lt;char,*&gt;">
  <DisplayString Condition="_Myres &lt; _BUF_SIZE">{_Bx._Buf,s8}</DisplayString>
  <DisplayString Condition="_Myres &gt;= _BUF_SIZE">{_Bx._Ptr,s8}</DisplayString>
  <StringView Condition="_Myres &lt; _BUF_SIZE">_Bx._Buf,s8</StringView>
  <StringView Condition="_Myres &gt;= _BUF_SIZE">_Bx._Ptr,s8</StringView>
</Type>

Now, std::string, at least, defaults to a UTF-8 representation in the debugger visualizer.


You may be asking yourself why there are two lines each for DisplayString and StringView; this is because Visual C++’s string uses the short string optimization, which avoids dynamic allocations for short strings.

I personally think that Visual Studio should allow configuring the default encoding it uses to display strings, much as it allows displaying numbers in hexadecimal format.


Detecting Additional Offenders

After fixing the original bug we tried to find other locations that may be harbouring similar bugs.

Finding all instances of CW2A is easy: just grep for it. But finding places that use a specific overload of CComBSTR’s constructor or assignment operator is more of a problem.

One way to do this is to mark the offending methods as deprecated. Using #pragma deprecated would allow us to deprecate a method without editing VC’s headers, but since we want to deprecate a specific overload this is not an option. I had to use my administrator rights again to edit atlcomcli.h.

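The edit looks roughly like this (a sketch only; the exact constructor signature in atlcomcli.h varies between ATL versions, and the deprecation message is mine):

// In atlcomcli.h (editing it requires administrator rights): mark the
// narrow-string constructor so that every use triggers a compiler warning.
__declspec(deprecated("CComBSTR(LPCSTR) converts via the thread ANSI code page"))
CComBSTR(_In_opt_z_ LPCSTR pSrc); // original body left unchanged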

Now we get a warning for every use of the deprecated method and can decide whether we’ve found a lurking bug.






Press Cancel to cancel (or Cancelable Asserts)

2 10 2013

Back when I developed on Unix, asserts were simple: if an assert fired, the application would abort with a nice core dump which could then be debugged at leisure. When I moved to Windows development, one of the changes I had to get used to was that Visual Studio’s _ASSERTs were not fatal.

Regular VS ASSERTE

At first glance this looks like an improvement: you can choose which asserts are fatal and which are not.

There is the obvious wart of “Press Retry to debug”; I had to read that line several hundred times before it became automatic. Still, all in all, an improvement. However, the situation on one of our mega-lines-of-code projects was not so good. It’s a bit embarrassing to admit this (if others have faced the same situation please comment below, misery loves company), but the debug build of the application became very cumbersome to use. This was probably the result of having some teams work only with production builds and some with debug builds, and of the fact that some flows in a specific team’s code would only happen when used in a particular way from other teams’ modules. Whatever the cause, the result was that when working with the debug build you would have to press Ignore many, many times.

The sane thing to do would have been to treat an ASSERT being activated as a critical defect and either fix the calling code or, if the assert was a false positive, remove the assert. Politics got in the way of sanity and changing the code that contained the assert would often be more effort than clicking Ignore a couple of times.

Usually the situation wasn’t so dire; after all, a code base does not typically contain that many asserts, since for an assert to exist someone had to write it explicitly. It got worse when the (false positive) assert was in a loop or part of a recursive call; in these cases you would have to Ignore the same assert dozens if not hundreds of times. In fact it got so bad that the debug version of the application had a menu item which turned off assertions using _CrtSetReportMode(_CRT_ASSERT, _CRTDBG_MODE_DEBUG).

I had a small insight: it doesn’t really matter how many times an assert fires. Optimally it should not fire at all, but if it does, it doesn’t matter whether it’s once or a thousand times. If a false-positive assert does creep into your code, the short-term goal is to get it to stop bothering you. For this purpose I wrote the CANCLEABLE_ASSERT macro. This macro never got committed to our code base since it’s obviously not the right thing to do; the right thing to do is fix all the asserts. But perhaps it would have been the pragmatic thing to do, since this product eventually got to the state where almost nobody used the debug build.

I retouched the macro a bit for the purposes of this post, and here it is in all its glory (or lack thereof).

#ifdef _DEBUG
#include <windows.h>
#include <intrin.h>

#define CANCLEABLE_ASSERT(expr)              /*1*/ \
  do {                                       /*2*/ \
    static bool check(true);                 /*3*/ \
    if(check &&!(expr)) {                    /*4*/ \
      switch(MessageBoxA(NULL,                     \
      "Assertion failed: "#expr                    \
      "\n\nFile: " __FILE__                        \
      "\nLine: " _CRT_STRINGIZE(__LINE__)          \
      "\n\nDo you want to debug?"                  \
      "\n(Cancel means don't assert here again)",  \
      "Debug?", MB_YESNOCANCEL | MB_ICONHAND)){	   \
      case IDYES:                                  \
        __debugbreak();                      /*5*/ \
        break;                                     \
      case IDCANCEL:                               \
        check = false;                             \
      }                                            \
   }                                               \
} while((void)false, false)                  /*6*/
#else
#define CANCLEABLE_ASSERT(x)((void)0)
#endif

A bit of explanation for anyone interested.

  1. If this is the first time you’ve seen multi-line macros: a backslash at the end of a line pulls the next line into the macro. This is why I used C-style comments and not C++ ones for the numbering; otherwise the rest of the macro would also be commented out.
  2. The whole thing is wrapped by a do { } while in order to make this a single statement (here’s why).
  3. The do { } while also introduces a new scope which makes the static boolean very localized and we don’t have to worry about giving it a unique name.
  4. The condition is only evaluated if this specific assert hasn’t been canceled, thanks to logical-and short-circuiting. This is a nice feature since expr may be arbitrarily costly to compute (this is also why expr is parenthesised).
  5. In case we choose to debug the code, the int 3 assembly instruction breaks into the debugger (note that in assembly int stands for interrupt, not integer). Edit: Thanks to Ofek for suggesting I use __debugbreak() instead.
  6. The while condition is ugly in order to prevent compiler warnings.
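
Usage is the same as any other assert; a hypothetical call site in a loop (where Cancel earns its keep) might look like this:

// Hypothetical call site: if the assert is a false positive, pressing Cancel
// once silences it for the rest of the run instead of once per iteration.
for (size_t i = 0; i < items.size(); ++i)
    CANCLEABLE_ASSERT(items[i].isConsistent());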

Cancelable asserts

As an added benefit this dialog has more intuitive buttons: Yes, I want to debug; No, I do not want to debug; please Cancel this assert for the duration of this program.

As I mentioned above, this code did not make it into our code base so I can’t vouch for it 100%, but I’m hereby placing it in the public domain. I hope you find it useful (or at least mildly interesting).





Global BSTRs

27 09 2012

Working with COM from C++ is pretty natural; after all, COM’s interfaces are binary compatible with C++ virtual tables. However, there are some places in which the fact that COM was designed to work with C leaks through and causes some friction.

A notable example of this is BSTR. Under the covers, BSTR is just a typedef for wchar_t*, but a BSTR is conceptually different from a plain string: a BSTR may contain embedded NUL characters, so you cannot rely on NUL termination to find its end; instead the length is stored in the memory that precedes the actual string data. BSTRs also resist the concept of const correctness: you can’t have a truly read-only BSTR, which can be a bit of a pain when you want BSTR constants.
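
To make the length-prefix point concrete, here is a minimal sketch (illustrative only, not from the original post):

#include <windows.h>  // BSTR, SysAllocStringLen, SysStringLen, SysFreeString
#include <cwchar>     // wcslen

void LengthDemo()
{
    // A BSTR whose payload contains an embedded NUL character.
    BSTR b = SysAllocStringLen(L"ab\0cd", 5);
    UINT viaPrefix = SysStringLen(b); // 5 - read from the length prefix
    size_t viaNul = wcslen(b);        // 2 - stops at the embedded NUL
    SysFreeString(b);
}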

What should one do when two different pieces of code want to share the value of a BSTR? There are several options.

1. #define a string literal in a header

#define XPATH_CHEAP_BOOKS L"//genre/book[price<7]/title"

One can pass a (wide) string literal to a function expecting a BSTR and this will even work… as long as you don’t use BSTR-specific functions or marshal the parameter.

#include <iostream>
#include <windows.h> // BSTR, SysStringLen
using namespace std;

void Foo(BSTR bs)
{
    wcout << L"The length of '" << bs << L"' is " << SysStringLen(bs) << endl;
}

Foo(L"Your guess is as good as mine"); // a plain literal is not a real BSTR

Code that ran correctly (by accident) for years and years can suddenly start to fail if the class implementing an interface changes from C++ to .NET: once there’s an interop call between C++ and .NET, the marshaller comes into play. Since the marshaller knows that the correct way to determine the length of a BSTR is to call SysStringLen, it will do so and quite probably get an enormous value, which will cause an out-of-memory exception when trying to allocate the .NET string.

Obviously the same problems happen with const wchar_t*.

2. Global CComBSTR

CComBSTR XPATH_CHEAP_BOOKS = L"//genre/book[price<7]/title";

Having a global CComBSTR (in a header file) has several problems:

  1. you have to choose its storage class
    •  it can’t be extern (which is the default) because then you’ll get a link-time multiple definition error
    • if it’s static there will be an instance for every translation unit (i.e. .cpp file), which makes point 2 worse
  2. Each constant will allocate a BSTR when the binary (dll or exe) is loaded, even if it’s never used and even if COM isn’t yet ready to successfully run SysAllocString.
  3. Since CComBSTRs can’t be const someone may change them during the application’s lifetime
  4. When the process terminates the destructors will call SysFreeString; this is just a waste of time, since the process is going to return all its memory to the OS anyway, so who cares whether it’s freed or un-freed memory?

A solution

The first hint at a solution came from __declspec(selectany), which solves problem #1: an extern global object marked __declspec(selectany) will not cause a multiple definition error at link time; instead the linker will arbitrarily select one of the symbols and discard the rest.

The next problem (#2) is that we don’t want to initialize each global too early (or indeed at all if it isn’t used). To achieve this we can create a class that has a cast operator to BSTR. The cast operator will lazily create the BSTR the first time it is used and cache the value.

Since the cast operator can be called on different threads some protection must be used, thankfully InterlockedCompareExchangePointer is a perfect fit for our needs.

The rest is pretty straightforward.

  • We can avoid #4 (wasting time at process shutdown) simply by not writing a destructor
  • Although we can’t prevent clients of our code from changing the BSTR at compile time, we can verify at runtime that nobody has (I chose to do so only in debug builds)

We want the class to be lightweight, so it’s best if it is initialized with a string literal: we can then grab the pointer, be sure it will remain valid throughout the lifetime of the program, and avoid copying it. To ensure that we really got a string literal we can make the constructor private and provide a macro that creates these objects; in the macro we verify that the string is a literal by concatenating another (empty) literal to it (this fails at compile time for non-literal strings). The macro uses a public member factory function that is named in a way that makes it obvious it should only be passed a string literal (we are protecting against Murphy, not Machiavelli, after all).

Warning: The following code seems to work as expected but has not been rigorously tested; I welcome any corrections.

#ifndef GLOBAL_BSTR_HEADER_PROTECTION_
#define GLOBAL_BSTR_HEADER_PROTECTION_
#pragma once

#include <windows.h>  // SysAllocString, SysFreeString, InterlockedCompareExchangePointer
#include "WTypes.h"   // BSTR
#include "crtdbg.h"   // _ASSERTE
#ifdef _DEBUG
#include <cwchar>     // std::wcscmp for the ASSERT in LeakBSTR
#endif

// LeakBSTR is for creating globals with the GLOBAL_BSTR macro therefore there is no need 
// to free the BSTR at destruction (hence no destructor)
class LeakBSTR { 
    BSTR bs;
    // It's safe to hold on to the string pointer in the constructor since
    // the private ctor & macro enforce initialization by string literal
    const wchar_t* str; 

    LeakBSTR(const wchar_t* s)
        :  bs(NULL)
        , str(s)
    { }

public:
    static LeakBSTR constructFromStringLiteral(const wchar_t* s)
    {
        return s;
    }

    operator BSTR() 
    {
        if (bs == NULL) {
            BSTR tmp = SysAllocString(str);
            if (InterlockedCompareExchangePointer((void**)&bs, tmp, NULL) != NULL) 
                SysFreeString(tmp); // Another thread already initialized 'bs'
        }

        // make sure nobody messed with the BSTR since it was created
        _ASSERTE(std::wcscmp(str, bs) == 0); 
        return bs;
    }
};

#define GLOBAL_BSTR(name, value) __declspec(selectany) LeakBSTR \
    name(LeakBSTR::constructFromStringLiteral(L"" value L""))

GLOBAL_BSTR(EmptyBSTR, L"");

#endif
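
For completeness, a hypothetical usage sketch (Foo and Bar are placeholders, and the XPath constant echoes the earlier example):

// In a shared header: define the constant once; the BSTR is allocated lazily.
GLOBAL_BSTR(XPATH_CHEAP_BOOKS, L"//genre/book[price<7]/title");

void Foo(BSTR bs); // some function that expects a real BSTR

void Bar()
{
    Foo(XPATH_CHEAP_BOOKS); // operator BSTR() allocates on first use, then caches
}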




A const BSTR isn’t

11 06 2012

If you’ve programmed any COM you’ve come across the ubiquitous BSTR string type. You may even have been tempted to get a read-only BSTR by using a const BSTR. Unfortunately this is not to be. If we look at WTypes.h we’ll see that BSTR is defined thus:

typedef WCHAR OLECHAR;
typedef OLECHAR* BSTR;

Basically a BSTR is a wchar_t*[1], but what happens when we try using a const BSTR?

The rule of thumb when trying to understand what is const in a C++ declaration is to push the const as far as you can to the right without passing an asterisk and then read from right to left.

const int i = 0;           // int const i => i is a const-int
const int *p = &i;         // int const * p => p is a pointer to a const-int
int * const q = NULL;      // q is a const-pointer to int
const int * const pq = &i; // int const * const pq => pq is a const-pointer to a const-int

This implies that const BSTR is the same as BSTR const: since the typedef captured the asterisk as part of the type, the const can travel over it and produce a “wchar_t* const”, i.e. the pointer is immutable, not the individual characters.

const BSTR deceptive = SysAllocString(L"Read only!");
deceptive[9] = '?'; // deceptive = "Read only?"
deceptive = NULL; // Compile error, deceptive is const

This is probably the exact opposite of the intended purpose.

The same translates to CComBSTR (and _bstr_t): having a const CComBSTR (or passing parameters by const reference) probably doesn’t do what you think it should.
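
For example, here is a sketch of why a const CComBSTR parameter doesn’t protect the characters (Tamper is a made-up function; assume the string is non-empty):

#include <atlbase.h>

void Tamper(const CComBSTR& s)
{
    // operator BSTR() is a const member, but the BSTR it returns
    // points at perfectly mutable characters.
    BSTR raw = s;  // compiles even though s is const
    raw[0] = L'X'; // the "const" CComBSTR has just been modified
}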

The only advantage a const CComBSTR has is that it gets internal linkage, but if that’s what you want you should just say static CComBSTR.


[1] At least at the static type level, at the semantic level it’s another story.





Make_shared, almost a silver bullet

22 04 2012

Herb Sutter just published a new Guru of the Week in which he claims one should always use make_shared when creating shared pointers.

This is very good advice (as one can expect from Herb) but I can’t resist the urge to nitpick.

As Herb states, the advantage of using make_shared is that when you use plain new, the shared_ptr object must allocate the reference-counting block separately from the (existing) tracked object:

Memory layout when using new

On the other hand, when using make_shared, both the tracked object and the reference-counting block can be allocated together:

Memory layout when using make_shared

This has the following advantages:

  • One less memory allocation (and deallocation); this is both faster and takes up less memory (saving the allocator’s size-tracking overhead)
  • Increased data locality: the ref counts and the object are probably in the same memory page (perhaps even the same cache line)
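
To make the difference concrete, a minimal sketch (Widget is a placeholder type):

#include <memory>

struct Widget { /* ... */ };

void Demo()
{
    auto p1 = std::shared_ptr<Widget>(new Widget); // two allocations: Widget + control block
    auto p2 = std::make_shared<Widget>();          // one allocation holding both
}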

However there could be rare cases in which the make_shared implementation has a drawback. Consider what happens if we use weak_ptrs to break cycles in our ownership graph and the object’s strong references expire while some weak references hold on. This means that the pointed at object must be destroyed but we still need the reference counts (specifically the weak reference count).

So in the case where we used plain-old new we get:

void releaseStrongRef()
{
    if (--data_->strongRefs_ == 0) {
        delete data_->value_; // destroy the object and free its memory
        if (data_->weakRefs_ == 0)
            delete data_;     // no weak_ptrs either, free the control block too
    }
}

void releaseWeakRef()
{
    if (--data_->weakRefs_ == 0 && data_->strongRefs_ == 0)
        delete data_;
}

While the make_shared implementation will have to hold on to the memory till the weak references are released:

void releaseStrongRef()
{
    if (--data_->strongRefs_ == 0) {
        data_->value_->~T(); // explicitly destroy the object, memory isn't freed
        if (data_->weakRefs_ == 0)
            delete data_;    // object and counts share one block, freed together
    }
}

void releaseWeakRef()
{
    if (--data_->weakRefs_ == 0 && data_->strongRefs_ == 0)
        delete data_;
}

So if you’re expecting to use weak_ptrs that will outlive the pointed-to object and you need your memory back ASAP, consider not using make_shared.





Who has access to your private parts?

17 12 2011

A famous quip says that C++ is the language in which friends have access to your private parts.

But is it only friends?

When writing an RAII object I was feeling a bit lazy (which as we know is a virtue) and was surprised to discover that a local class (a class defined in a function body) has access to its enclosing class’s private members.

#include <iostream>
using namespace std;

class outer {
    int private_;
public:
    outer() : private_(0) {}

    void print() {   cout << private_ << endl; }

    void foo()
    {
        class local { // RAII object
            outer* outer_;
        public:
            local(outer* p)
                : outer_(p)
            {
                outer_->private_++;
            }

            ~local() { outer_->private_--; }
        };

        print(); // -> 0

        {// Scope for RAII object 

            local in(this);
            print(); // -> 1
        }

        print(); // -> 0
    }
};

A quick look through the standard didn’t show where such behaviour is specified but it compiled with all the compilers I threw it at[1].

When I revealed this discovery to my cow-orkers they responded with an emphatic “DUH!” Oh well, there goes my C++ geek-cred :o(

Allowing local classes access to private members makes a lot of sense on the language design front; after all, if you can create a class in a method you should be able to do anything the method can do. I just never thought of it before, and apparently I’m not the only one. When looking at C++11’s standard (well, the final draft actually) I found the following in section 11 (Member access control):

A member of a class can also access all the names to which the class has access. A local class of a member function may access the same names that the member function itself may access.

Since this sentence does not appear in the original ’98 standard I can only assume that greater minds than mine had also missed this point, at least at first.

Having classes which can access your private parts can be very useful: they can be used for temporarily changing state (as in the RAII class above) or as parameters to STL algorithms[2]. The problem is that a local class breaks the flow of the function, while a nested class must be defined in the header, so something which is an implementation detail becomes globally visible.

But wait, the section in the standard had a footnote. “Access permissions are thus transitive and cumulative to nested and local classes.”

Transitive. That implies more than one level, which gave me an idea for a new(?) C++ pattern. I don’t know if I’m the first to think of it and I doubt it’s very useful, but here goes: I’m very proud to introduce the nested² pattern.

For every class we can create a backdoor class that allows creation of quasi-friend classes. Then if the need arises we can add these quasi-friend classes in the .cpp file.

// .h file
class troy {
    class wooden_horse;
    int private_;
public:
    troy();
    void foo();
};

// .cpp file
#include <iostream>
using namespace std;

class troy::wooden_horse {
public:
    struct odysseus {
        troy* t_;
        odysseus(troy* p) : t_(p)
        {
            t_->private_++;
        }

        ~odysseus()
        {
            t_->private_--;
        }
    };

    // Add Greek warriors as needed
};

troy::troy()
    : private_(41)
{
}

void troy::foo()
{
    wooden_horse::odysseus odysseus(this);
    cout << private_ << endl;
}

By making the backdoor class private we can be sure that the quasi-friend classes won’t be abused since nobody but our class will be able to create an object of this type.


Edit: Ever since I posted this I couldn’t sleep at night; I knew I would never forgive myself if I didn’t do this:

:%s/troy/french_castle/g
:%s/wooden_horse/large_wooden_badger/g
:%s/odysseus/bedemir/g
:%s/Greek warriors/Knights of the round table/g
:wq

[1] VC8, VC10, gcc, clang and Comeau (yes I am a bit obsessive, why do you ask?)

[2] Although this is less compelling since the introduction of lambda expressions in C++11. On second thought, since lambda functions are implemented as unnameable classes perhaps this is the reason for the change in the standard, to allow lambda functions the same access rights as the method that defined them.