Cactus shirt

17 05 2015

This is the second post in my shirts series. In my first post I wrote about a hobby I had twenty years ago: drawing on shirts. Since they have started falling to pieces and I can’t make myself throw them out, I decided to write about them so that they can live on in digital form and I can reclaim some wardrobe space.

This is one of my first shirts so there are a few unforced errors which I attempted to cover up.

The Front

Cactus-front

On the front I have my Abrahamic character drinking from a decapitated cactus with the caption “Enjoy Cactus Cola, the Sheik’s thing”. This is a play on Coca Cola’s slogan and the similarity between Sheik and Chic. Since this is a cactus the sheik’s hand is obviously bleeding. One of the reasons I’ve stopped wearing this shirt at home is that my children find the blood very disturbing and can’t help but comment about it whenever they see the shirt.

I then got a stain in the centre of the shirt and had to cover it up, so I chose a skull and crossbones with the warning:

Use of this product may prove hazardous to haemophiliacs

The Back

 Cactus-back

The back of the shirt is a bit of a hodgepodge of desert-based jokes.

I have the character from the front of the shirt crawling towards a mirage. You’ll notice that he has a star and crescent armband. This is to cover up my second error, where I initially drew the arm as some kind of Möbius strip.

Next to the mirage is the skeleton of a fish (complete with the skeleton of bubbles coming out of its mouth). The idea was that the fish came to live in the mirage and died since there wasn’t really any water there (what can I say I thought it was amusing at the time).

The sun is wearing sunglasses, as is its wont, and drinking from a can of Mercury with a straw (mercury being both a liquid and a celestial body). The can of mercury is labelled with both its astrological and chemical symbols. I should note that this was before I heard of the Mercury company I later worked for.

Next to that is the skeleton of Joe Camel who died of lung cancer (I thought that was edgy at the time) and a dog gnawing on one of its bones.

And to complete the plethora of pathetic puns, there is the Cacti family with the mother cactus, the father cactus and their son showing off his muscles.





Slicing up a UTF-8 string

30 03 2015

A couple of years ago I had to deal with some low-level code that sent a UTF-8 encoded string as packets of bytes. At first I converted each packet to a string and stored a concatenation of the results, but I got a defect saying that we would sometimes get funny strings that contained a � character. I recognized the Unicode replacement character and quickly figured out the cause: a multi-byte UTF-8 character was split between two packets and thus could not be correctly converted to a string. The solution was simple: accumulate the data as bytes and only convert to a string when all the data has been received.
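To make the failure concrete, here is a minimal sketch of both approaches in JavaScript. The packet contents and the function names decodePerPacket and decodeAccumulated are made up for illustration and are not the original code; TextDecoder is the standard decoding API available in browsers and in Node.

```javascript
// Simulated packets (made-up data): "Я" (U+042F) is encoded in UTF-8 as the
// two bytes 0xD0 0xAF, and the packet boundary falls between them.
const packets = [
  new Uint8Array([0x42, 0x75, 0x67, 0x73, 0x20, 0xD0]), // "Bugs " + first byte of "Я"
  new Uint8Array([0xAF, 0x20, 0x55, 0x73]),             // second byte of "Я" + " Us"
];

// Buggy approach: convert every packet to a string on its own and concatenate.
function decodePerPacket(chunks) {
  return chunks.map((c) => new TextDecoder('utf-8').decode(c)).join('');
}

// The fix: accumulate the raw bytes and decode once when everything has arrived.
function decodeAccumulated(chunks) {
  const total = chunks.reduce((n, c) => n + c.length, 0);
  const all = new Uint8Array(total);
  let offset = 0;
  for (const c of chunks) {
    all.set(c, offset);
    offset += c.length;
  }
  return new TextDecoder('utf-8').decode(all);
}

const broken = decodePerPacket(packets);  // contains U+FFFD where "Я" should be
const fixed = decodeAccumulated(packets); // the intact string
console.log(JSON.stringify(broken), JSON.stringify(fixed));
```

If the data has to be decoded as it arrives, the same TextDecoder instance can be reused with decode(chunk, { stream: true }), which makes it hold on to an incomplete trailing sequence until the next chunk.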

This memory surfaced when I performed a code review for a colleague who was facing a 1 MiB size limitation when using Chrome’s Native Messaging. His solution was to cut the message into chunks and send them one after the other.

I warned him about the danger of arbitrarily splitting a UTF-8 string without checking if you’re at a character boundary.

As mentioned in Wikipedia’s entry for UTF-8, one of the main advantages of UTF-8 is that it is backwards compatible with ASCII: every ASCII character has the same meaning in UTF-8. Since ASCII uses only 7 bits, its MSB is always 0, so in UTF-8 a 0 MSB denotes a single-byte character. The first byte of a multi-byte character begins with as many 1 bits as there are bytes in the character, followed by a 0 (e.g. a three-byte character starts with 1110). All the other bytes in the character (known as continuation bytes) begin with 10.

Here’s a summary table:

First bit(s) | Condition | It is a | Rule
0 | (byte & 0x80) == 0 | Single-byte character | It’s OK to cut before or after it
10 | (byte & 0xC0) == 0x80 | Continuation byte | Do not cut before it; cutting after it is OK only if the next byte is not a continuation byte
11 | (byte & 0xC0) == 0xC0 | First byte of a multi-byte character | It’s OK to cut before it but not after it
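The rules in the table boil down to one check: it is safe to cut immediately before any byte that is not a continuation byte. Here is a rough sketch of a chunker built on that check. The names isContinuationByte, safeCutIndex and splitUtf8 are hypothetical, and the code assumes the input is valid UTF-8 and that maxLen is at least 4 (the longest UTF-8 character).

```javascript
// True when `byte` is a continuation byte (bit pattern 10xxxxxx).
function isContinuationByte(byte) {
  return (byte & 0xC0) === 0x80;
}

// Returns the largest cut position in (start, start + maxLen] that does not
// fall inside a character, or `start` itself if no such position exists.
function safeCutIndex(bytes, start, maxLen) {
  let end = Math.min(start + maxLen, bytes.length);
  // Cutting before a continuation byte would split a character, so back up
  // until the byte after the cut starts a new character (or we hit the end).
  while (end > start && end < bytes.length && isContinuationByte(bytes[end])) {
    end--;
  }
  return end;
}

// Splits UTF-8 bytes into chunks of at most maxLen bytes, cutting only at
// character boundaries so that every chunk decodes correctly on its own.
function splitUtf8(bytes, maxLen) {
  const chunks = [];
  let start = 0;
  while (start < bytes.length) {
    const end = safeCutIndex(bytes, start, maxLen);
    if (end === start) {
      // No progress is possible: maxLen is smaller than the next character.
      throw new RangeError('maxLen is too small to hold a whole character');
    }
    chunks.push(bytes.subarray(start, end));
    start = end;
  }
  return chunks;
}

const message = 'Bugs Я Us';
const chunks = splitUtf8(new TextEncoder().encode(message), 6);
const decoded = chunks.map((c) => new TextDecoder('utf-8').decode(c));
console.log(decoded);
```

Note that a chunk may come out shorter than maxLen when the limit lands in the middle of a character, so the number of chunks cannot be computed from the byte length alone.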




OCD is the path to the dark side

6 01 2015

A while back I had to wrap a built-in JavaScript function. This is pretty simple thanks to the fact that JavaScript is a dynamic, prototype-based language. Here’s an example of how this can be done (not the actual function or functionality in question):

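(function wrapAddEventListener() {
  var orig = HTMLElement.prototype.addEventListener;
  function wrapper(name, handler, capture) {
    console.log("Added a handler for " + name + " on " + this);
    // Register a proxy that logs the event and then forwards it to the real handler.
    return orig.call(this, name, function(ev) {
      console.log("Got Event " + ev.type);
      // Keep `this` (the element) and the return value intact for the handler.
      return handler.call(this, ev);
    }, capture);
  }

  HTMLElement.prototype.addEventListener = wrapper;
})();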

The problem was that my OCD then kicked in, because now if I type document.body.addEventListener in the console I get the wrapper function’s body instead of function addEventListener() { [native code] }. For some reason this bothered me (why?) enough to add the following lines to the function wrapping code:

wrapper.toString = function() { 
    return orig.toString() 
}

Now this is deceitful and worthless since it doesn’t really achieve anything, debugging into the function will show the wrapper code. Still I felt that for aesthetic reasons this is preferable.
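As a rough illustration of how thin the disguise is, here is a sketch that can be run outside the browser. The helper wrapAndDisguise and the target object are hypothetical stand-ins, not the code from the project; Math.max simply plays the part of the built-in function.

```javascript
// Hypothetical helper: wraps obj[name] and makes the wrapper's own toString()
// report the original function's source text.
function wrapAndDisguise(obj, name, before) {
  const orig = obj[name];
  function wrapper(...args) {
    before(name, args);
    return orig.apply(this, args);
  }
  wrapper.toString = function () {
    return orig.toString();
  };
  obj[name] = wrapper;
  return orig;
}

const calls = [];
const target = { max: Math.max }; // stand-in for a built-in method
wrapAndDisguise(target, 'max', (name, args) => {
  calls.push(name + '(' + args.join(', ') + ')');
});

const result = target.max(1, 3, 2);
// String() goes through the overridden toString, so it looks native...
const disguised = String(target.max);
// ...but calling Function.prototype.toString directly reveals the wrapper.
const revealed = Function.prototype.toString.call(target.max);
console.log(result, calls, disguised, revealed);
```

Whether the console shows the disguised or the real text depends on the tool, which is one more reason to treat this as cosmetic only.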

I’m not sure if covering your tracks like this is evil (since it’s deceitful) or acceptable (since it isn’t hiding any semantic changes). I’ll just hope it’s the worst of my sins for the upcoming year…





Converting Unicode to Unicode

11 11 2014

Recently my matchmaker called me over for a consultation. He was facing some trouble with text encoding, and since I once read Joel’s The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!), I’m considered an expert (rather than barely competent, which is also an overstatement).

From the get-go it was obvious that the problem was in converting UTF-8 strings to UTF-16. Two main methods were used for this: the CW2A classes and CComBSTR’s constructor that accepts a const char*. Both use the CP_THREAD_ACP code page when converting strings, and you cannot set the thread locale to be UTF-8.

After introducing a fix we inspected the results in the debugger and were confused by what we saw in the watch window. We therefore decided to have a look at a toy example.

Analyzing the problem

Consider the string “Bugs Я Us” which contains the Russian letter “Я” (ya).

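#include <atlbase.h>   // CComBSTR
#include <atlconv.h>   // CW2A, CA2W
#include <string>
using std::string;

int main(int argc, char* argv[])
{
	const wchar_t * wide = L"Bugs Я Us";
	CW2A cw2a(wide);                            // default code page
	CW2A cw2a8(wide, CP_UTF8);                  // explicit UTF-8
	string str = CW2A(wide);                    // default code page
	string str8 = CW2A(wide, CP_UTF8);          // explicit UTF-8
	CComBSTR bs(str8.c_str());                  // const char* constructor, default code page
	CComBSTR bs8(CA2W(str8.c_str(), CP_UTF8));  // explicit UTF-8 to UTF-16
}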

Our toy example gave almost the expected results:

Type | Default | CP_UTF8
CW2A | Bugs ? Us | Bugs Я Us
std::string | Bugs ? Us | Bugs Я Us
CComBSTR | Bugs Я Us | Bugs Я Us

The things that surprised me were the cells in red, those should have the correct string surely?

Then I remembered the s8 format specifier, which instructs Visual Studio to display strings as UTF-8. Perhaps the strings are correct but Visual Studio is misleading us! After adding s8 to the watch window things look marginally better. Now only the std::string differs from my expectations.

Type | Default | CP_UTF8
CW2A | Bugs ? Us | Bugs Я Us
std::string | Bugs ? Us | Bugs Я Us
CComBSTR | Bugs Я Us | Bugs Я Us

A bit more poking around showed that the reason for this is that std::string’s visualizer uses the s specifier.

You can find the visualizer in:
<VS Install Directory>\Common7\Packages\Debugger\Visualizers\stl.natvis

I added an 8 after each s specifier in the file (you have to do this as administrator).

<Type Name="std::basic_string&lt;char,*&gt;">
  <DisplayString Condition="_Myres &lt; _BUF_SIZE">{_Bx._Buf,s8}</DisplayString>
  <DisplayString Condition="_Myres &gt;= _BUF_SIZE">{_Bx._Ptr,s8}</DisplayString>
  <StringView Condition="_Myres &lt; _BUF_SIZE">_Bx._Buf,s8</StringView>
  <StringView Condition="_Myres &gt;= _BUF_SIZE">_Bx._Ptr,s8</StringView>

 

Now std::string, at least, defaults to a UTF-8 representation in the debugger visualizer.

watch8

You may be asking yourself why there are two lines each for DisplayString and StringView. This is because Visual C++’s string uses the Short String Optimization, which avoids dynamic allocations for short strings: a short string lives in the in-place buffer (_Bx._Buf), while a longer one is reached through the pointer (_Bx._Ptr).

I personally think that Visual Studio should allow configuring the default encoding it uses to display strings, much as it allows displaying numbers in hexadecimal format.

hex

Detecting Additional Offenders

After fixing the original bug we tried to find other locations that may be harbouring similar bugs.

Finding all instances of CW2A is easy, just grep for it, but finding places that use a specific overload of CComBSTR’s constructor or assignment operator is more of a problem.

One way to do this is to mark the offending methods as deprecated. Using #pragma deprecated would allow us to deprecate a method without editing VC’s headers but since we want to deprecate a specific overload this is not an option. I had to use my administrator rights again to edit atlcomcli.h.

declspec

Now we get a warning for every use of the deprecated method and can decide whether we’ve found a lurking bug.

warning

 

 





Big in Liechtenstein

30 09 2014

I’ve been neglecting this blog recently but I still pop in once in a while to see if anyone is interested. Since there’s been nothing new in months it’s not surprising that the views are pretty consistently in the low double digits.

One thing that did surprise me was discovering that last month the fifth country in terms of visits to my blog was Liechtenstein! 28 views; I didn’t even know Liechtenstein had that many residents.

Liechtenstein

Even more surprising is that when searching for Liechtenstein, the country is the third Wikipedia result.





Wedding wishes

6 07 2014

Just a quick post since I’m on my way to a cousin’s wedding. This time I won’t be putting on the greeting card what I consider to be the ultimate wedding wish.

May you both be as happy in your marriage as my wife and I thought we would be.

 

The last time I used this I was later confronted by a worried looking co-worker asking me if everything was OK at home.





Punishments are a Poor Parenting Practice

6 02 2014

They say that you shouldn’t threaten children with punishments; it’s more empowering for children to have the consequences of their actions explained to them.

Say for example that you put your child’s clothes on the radiator so they’re warm and cozy when he gets up.

Now if the child finds it hard to get up in the morning you can tell him:

The radiator has gone off, you should get dressed quickly before your clothes cool down.

And that’s considered good parenting.

If on the other hand, he stays in bed till well after the clothes have reached room temperature and you say:

If you don’t get a bloody move on and get dressed right now I’ll put your clothes in the fridge!

Well in that case, some people may claim, you’re doing things sub-optimally.







