Should you use std::string, std::u16string, or std::u32string?

November 17, 2014

C++11 introduced a couple of new string classes on top of std::string:

  1. u16string
  2. u32string

“Finally”, you must think, “C++ has addressed the sorry state of Unicode development in portable code! All I have to do is choose one of these classes and I’m all set!”.

Well, you might want to rethink that. To see why, let’s take a look at some definitions:

typedef basic_string<char> string;

typedef basic_string<char16_t> u16string;

typedef basic_string<char32_t> u32string;

As you can see, they all use the same exact template class. In other words, there is nothing Unicode-aware, or anything special at all for that matter, with the new classes. You don’t get “Unicode for free” or anything like that. We do see however an important difference between them – each class uses a different type as an underlying “character”.

Why do I say “character” with double quotes? Well, when used correctly, these underlying data types should actually represent code units (minimal Unicode encoding blocks) – not characters! For example, suppose you have a UTF-8 encoded std::string containing the Hebrew word “שלום”. Since Hebrew letters require two bytes each in UTF-8, the string will actually contain 8 char “characters” – not 4!

And this is not only true for variable-length encodings such as UTF-8 (and indeed, UTF-16). Suppose your UTF-32 encoded std::u32string contains the grapheme cluster (what we normally think of as a “character”) ў. That cluster is actually a combination of the Cyrillic у character and the breve diacritic (a combining code point), so your string will actually contain 2 char32_t “characters” – not 1!

In other words, these strings should really be thought of as sequences of bytes, where each string type is more suitable for a different Unicode encoding:

  • std::string is suitable for UTF-8
  • std::u16string is suitable for UTF-16
  • std::u32string is suitable for UTF-32

Unfortunately, after all this talk we’re back to square one – what string class should we use? Well, since we now understand this is a question of encoding, the question becomes what encoding we should use. Fortunately, even though this is somewhat of a religious war, “the internet” has all but declared UTF-8 as the winner. Here’s what renowned Perl/Unicode expert Tom Christiansen had to say about UTF-16 (emphasis mine):

I yesterday just found a bug in the Java core String class’s equalsIgnoreCase method (also others in the string class) that would never have been there had Java used either UTF-8 or UTF-32. There are millions of these sleeping bombshells in any code that uses UTF-16, and I am sick and tired of them. UTF-16 is a vicious pox that plagues our software with insidious bugs forever and ever. It is clearly harmful, and should be deprecated and banned.

Other experts, such as the author of Boost.Locale, have a similar view. The key arguments follow (for many more see the links above):

  1. Most people who work with UTF-16 assume it is a fixed-width encoding (2 bytes per code point). It is not (and even if it were, as we already saw, code points are not characters). This can be a source of hard-to-find bugs that may very well creep into production and only surface when some Korean user spells his name with characters outside the Basic Multilingual Plane (BMP). In UTF-8 these things pop up far sooner, as you’ll run into multi-byte code points very quickly (e.g. Arabic).
  2. UTF-16 is not backward-compatible with ASCII. UTF-8 is, since any ASCII string can be encoded with the exact same bytes in UTF-8 (I say can because Unicode allows multiple byte sequences that define the exact same grapheme clusters – I’m not actually sure whether different forms can exist for the same ASCII string, but disclaimers such as these are usually due when dealing with Unicode :) )
  3. UTF-16 has endianness issues. UTF-8 is endianness independent.
  4. UTF-8 favors efficiency for English letters and other ASCII characters (one byte per character). Since a lot of strings are inherently English (code, xml, etc.) this tradeoff makes sense in most scenarios.
  5. The World Wide Web is almost universally UTF-8.

So now that we know what string class we should use (std::string) and what encoding we should use with it (UTF-8), you may be wondering how we should deal with these beasts. For example – how do we count grapheme clusters?

Unfortunately, the answer depends on your use case and can be extremely complex. A couple of good places to start would be UTF8-CPP and Boost.Locale. Good luck :)

Some WinDbg tips

October 30, 2014

I’ve gathered some WinDbg tips over time (mostly for managed dump analysis) and this seems like as good a place as any to share them, so here you go.

Preparation (one time)

  • Install the latest debugging tools from the Dev Center
    • Let’s assume you install them to c:\debuggers
  • Download sosex.dll and place it in c:\debuggers
  • Create an environment variable named _NT_SYMBOL_PATH with the value C:\Symbols;srv*C:\symbols*
    • Most Microsoft symbols should be found in the symbol server and will be cached in C:\Symbols
    • You can copy any other symbols (PDBs) you have to the C:\Symbols folder as well
  • Create an environment variable named _NT_SOURCE_PATH with the value srv* 
    • This will enable source serving (when source indexing was included in the build) – simply double-click the relevant line in the Call Stack (calls) window and you should jump straight to the relevant line in the source code.
    • You might wonder how that’s possible when no server was specified. The answer is a bit surprising – strictly speaking, there is no such thing as a source server! The name is a bit misleading. What actually happens is that the source-indexed PDB contains the proper command(s) to retrieve the relevant source files. By default this “command” would be something like C:\src\foo.cs, and it would only work if the file in the correct version is actually there. However, if source-indexing was enabled in TFS build, it would look something like tf get MyClass.cs /version:C8 (i.e. the file will be retrieved directly from source, with the correct version).

Preparation (per debugging session)

  • Open the dump file in WinDbg
    • Be sure to match the architecture to the analyzed process – use WinDbg x86 for 32-bit processes and WinDbg x64 for 64-bit processes
  • Enable Debugger Markup Language (DML) by issuing .prefer_dml 1
    • This will make the output of some commands contain convenient hyperlinks
  • Load the SOSEX extension by issuing the command .load sosex.dll
    • Don’t load SOS.dll manually – the first command below (!analyze -v) will load it with the correct version automatically

Analysis commands

  • Automatic exception analysis: !analyze -v
    • In many cases this will suffice to find the root cause!
  • Threads
    • !Threads – lists all the managed threads in the process
  • Stack commands
    • !clrstack – provides a true stack trace for managed code only
    • !dumpstack – provides a verbose stack trace
    • !eestack – runs !DumpStack on all threads in the process
    • !dso – displays any managed objects within the bounds of the current stack
    • !sosex.mk – produces and displays a merged stack trace of managed and unmanaged frames. Note that in addition to the native offset, a managed (IL) offset is specified – this is extremely useful for debugging NullReferenceExceptions, in that the exact offending IL instruction is indicated (actually it will be one instruction before the offending one in the specific case of NullReferenceException)
    • !sosex.mdso – dumps object references on the stack and in CPU registers in the current context
    • !sosex.mdv – displays argument and local variable information for the current frame
    • !sosex.mframe – displays or sets the current managed frame for the !mdt and !mdv commands
  • Heap commands
    • !eeheap – enumerates process memory consumed by internal CLR data structures
    • !DumpHeap – traverses the garbage collected heap
  • Object commands
    • !do – allows you to examine the fields of an object, as well as learn important properties of the object
    • !dumparray – examines elements of an array object
    • !dumpvc – examines the fields of a value class
    • !sosex.mdt – displays the fields of the specified object or type
  • Method commands
    • !dumpmt – examines a MethodTable
    • !DumpIL – prints the IL code associated with a managed method
    • !U – presents an annotated disassembly of a managed method
    • !sosex.muf – disassembles the method specified by the given MD or code address with interleaved source, IL, and assembly code
  • Exception commands
    • !pe exceptionAddress - formats fields of any object derived from System.Exception
  • GC commands
    • !GCRoot – looks for references (or roots) to an object
  • SOS Help
    • !help
    • !help faq
  • SOSEX Help
    • !sosexhelp or !sosex.help

Mismatched SOS.dll versions

!analyze -v should get the correct SOS.dll version for you. Even if for some reason it doesn’t, SOS.dll warnings can be ignored most of the time, so long as the mscordacwks.dll version is correct (see the next section if that’s not the case). However, there may be cases where the correct SOS.dll version is needed (or perhaps you just want to get rid of the warning). Here are some places to look for it (once you find it, copy it to your debugger folder and issue .load sos.dll):

  • Your best bet is the machine on which the dump was taken. Provided that it’s accessible and wasn’t patched since the time the dump was taken, the correct version should be found at C:\Windows\Microsoft.NET\Framework64\v4.0.30319\SOS.dll. Of course it should be there on your machine as well, and your local version may happen to match (or be close enough).
  • You may find the version you’re looking for here (you should be able to extract the file from the update package itself using 7-zip).
  • You may also want to try Psscor4.dll instead of SOS.dll (Psscor is a superset of SOS) – its version need not match the dump (aside from being .NET 4.0). Note that it is less maintained than SOS.
  • For more information see

Mismatched mscordacwks.dll versions

WinDbg should find the correct mscordacwks.dll automatically and download it from the symbol server. If that doesn’t happen, try to do it explicitly by running .cordll -ve -u -l (you might want to first run !sym noisy and/or !symfix in order to troubleshoot better – see the next section for details). Failing that, try to get the correct version from the following places, and run .cordll -u -ve -lp PathToFolderContainingMscorDAC once you have it.

  • Again, your best bet is the machine on which the dump was taken (under the same caveats as above). It should be found at C:\Windows\Microsoft.NET\Framework64\v4.0.30319\mscordacwks.dll. As before, a copy will be on your machine as well, so you might luck out.
  • The following post lists many versions of CLR 4.0, you may be able to extract the correct version with the same method as above (use 7-zip to open the cab files inside the archive).
  • For more information see

Troubleshooting missing symbols and sources

  • !sym noisy – increases symbol verbosity (always enable this when troubleshooting symbol issues)
  • .srcnoisy 3 – increases source resolution verbosity (use it when double-clicking lines in the call stack window doesn’t bring up the correct sources)
  • lm – displays the specified loaded modules
    • lme displays only modules that have a symbol problem (very useful)
  • .reload /f – forces the reloading of all symbols
    • .reload /f SomeAssembly.dll – forces the reloading of a specified DLL

For further reading try to hit F1 inside WinDbg – the documentation is very good!

Getting started with ETW using .NET’s EventSource

October 13, 2014

.NET 4.5 introduced the EventSource class, allowing convenient access to Event Tracing for Windows (ETW) from managed code. This is a boon for enterprise developers, and I encourage you to go read up on it at MSDN. Also, be sure to check out Vance Morrison’s EventSource blog entries. Another useful blog by MS MVP Kathleen Dollard is Leaning into Windows, and Muhammad Shujaat Siddiqi’s blog is worth checking out as well.

Once you’ve got the hang of EventSource, take it to the next level with the Enterprise Library Semantic Logging Application Block (SLAB). You can do some pretty cool stuff with it, including verification of your EventSource class validity, and automatic routing of ETW events to different storage platforms (such as databases, Azure tables, and text files). Start with the developer’s guide.

Finally, if you take a close look at the documentation of the EventSource class, you’ll notice the following note:

There is a NuGet version of the EventSource class that provides more features. For more information, see Microsoft EventSource Library 1.0.16.

I couldn’t find any documentation for this mysterious NuGet package, but I did find its samples package. Once you install it you can take a look at the extensively documented code as well as debug and step into the parts you’re interested in. Be sure to read the guides that accompany the samples: _EventRegisterUsersGuide.docx and _EventSourceUsersGuide.docx - they pertain to the vanilla library that comes with .NET as well (not just the NuGet package).

One thing I couldn’t find is a list of differences between the vanilla .NET EventSource and the NuGet package. Going by the signatures, I only saw a couple more WriteEvent overloads. If you’ve read the documentation, you should know by now that that’s a good thing, however digging a little into the code I found another important difference – event type support.

The .NET version currently supports the following types (taken from ManifestBuilder.GetTypeName):

  1. Enum
  2. Boolean
  3. SByte
  4. Byte
  5. Int16
  6. UInt16
  7. Int32
  8. UInt32
  9. Int64
  10. UInt64
  11. Single
  12. Double
  13. DateTime
  14. String
  15. Guid

The NuGet package adds support for the following:

  1. Char
  2. IntPtr
  3. Byte*
  4. Byte[]

Finally, the beta pre-relase includes a very interesting overload: public void Write<T>(string eventName, T data). The data parameter is documented as follows:

The object containing the event payload data. The type T must be an anonymous type or a type with an [EventData] attribute. The public instance properties of data will be written recursively to create the fields of the event.

In other words, we should be able to use annotated custom types (in a similar fashion to WCF DataContract-annotated objects). But even better than that, we should be able to say:

public void Bar(int i, string s, DateTime d)
{
    Write(null, new { I = i, S = s, D = d });
}

Notice I said should, because I could make neither work (beta after all). But why do we even need this when we already have the WriteEvent(int eventId, params Object[]) overload? Well, let’s look at the documentation of the latter:

By default, the compiler calls this overload if the parameters for the call do not match one of the other method overloads. This overload is much slower than the other overloads, because it does the following:

  1. It allocates an array to hold the variable argument.
  2. It casts each parameter to an object (which causes allocations for primitive types).
  3. It assigns these objects to the array.
  4. It calls the function, which then determines the type of each argument so it can be serialized for ETW.

Using anonymous types, we save the boxing (bullet 2) and unboxing (after bullet 4). I’m not sure how big of a benefit that is, so we may still be forced to use the WriteEventCore method, but I suppose every little bit helps!

Controlling the physical storage location of the Azure Storage Emulator

September 25, 2014

The Azure Storage Emulator is a very convenient tool for working against mocked Azure Storage services – blobs, queues and tables. However, it appears to have no visible means of setting the physical location used for storage, or even determining it. Luckily, both are easily achieved.

Blobs

Blobs are stored on the file system, and that location is very easy to control:

  1. Navigate to C:\Users\<YourUserName>\AppData\Local\WAStorageEmulator
  2. Open WAStorageEmulator.<EmulatorVersion>.config with the text editor of your choice
  3. Change the PageBlobRoot and BlockBlobRoot to the location you desire (the default is something like C:\Users\<UserName>\AppData\Local\WAStorageEmulator\PageBlobRoot)

Tables and queues

Tables and queues are stored in a SQL server database (LocalDB by default), so in order to set the physical storage location, we simply need to set the physical storage location of the database (mdf file). We’ll use the sqlcmd utility to do that:

  1. Close all programs that may be using the storage emulator database (the storage emulator itself, management studio, visual studio, etc.)
  2. Run sqlcmd -S instancePath
    • By default the storage emulator uses LocalDB, so the above would be sqlcmd -S (localdb)\v11.0
    • If you configured a different SQL Server instance you’ll have to use that instead
    • You can always determine the instance path used by the storage emulator by examining the SQLInstance element in the WAStorageEmulator.<EmulatorVersion>.config file mentioned in the Blobs section above
  3. Type the following commands with [enter] after each line:
    1. SELECT name, physical_name AS CurrentLocation, state_desc FROM sys.master_files;
    2. GO
  4. You will now see the list of all database files. Pick the file you want to move and note its name and current physical location. For example, suppose you want to move the WAStorageEmulatorDb33 file from its current location to E:\StorageEmulator\LocalDB\WAStorageEmulatorDb33.mdf.
  5. Type the following commands with [enter] after each line:
    1. ALTER DATABASE WAStorageEmulatorDb33 SET OFFLINE with no_wait;
    2. GO
  6. Move the mdf file from its current location to E:\StorageEmulator\LocalDB\WAStorageEmulatorDb33.mdf
  7. Type the following commands with [enter] after each line:
    1. ALTER DATABASE WAStorageEmulatorDb33 MODIFY FILE ( NAME = WAStorageEmulatorDb33, FILENAME = 'E:\StorageEmulator\LocalDB\WAStorageEmulatorDb33.mdf');
    2. ALTER DATABASE WAStorageEmulatorDb33 SET ONLINE;
    3. GO
  8. All done. You can verify the result of the move by re-running the commands in step (3) and observing the new file location.

Knowing where the storage emulator stores its data and how to change that location can be useful. For example, if your C: drive is a small SSD, space may be running low. Conversely, your C: drive may be a slow hard disk drive, and you wish to use a faster SSD for storage emulation.

Obtaining old Mono and MonoDevelop Mac versions

August 28, 2014

I was recently looking for old Mono and MonoDevelop Mac versions and realised they weren’t trivial to find. So for the benefit of all mankind, here are the links:

For more information see:

Happy coding :)

Getting spinner gifs from a CDN

August 16, 2014

Using animated gifs for spinner / loader animations can be quite convenient, and sites such as AjaxLoad allow you to create them to your liking. However, being the premature-optimization evil person that I am, I was wondering if I could get those off a CDN. There doesn’t seem to be any official support for this, and you won’t get custom-tailored gifs to your exact liking, but a sneaky web search reveals quite a few possibilities. For example:

Enjoy :)

TvGameLauncherGUI

July 12, 2014

A couple of months ago I blogged about TvGameLauncher, a command line tool to help you launch your favorite games on your HDMI-connected TV (or any other connected display) with all the necessary steps carried out for you (change primary display, change default audio endpoint, prevent sleep).

The tool works great (at least for me :) ), but its command-line nature leaves it inaccessible to users who aren’t comfortable with the command line, and the overall experience isn’t that much fun.

Enter TvGameLauncherGUI. I’ve created a nice(?) WPF GUI front-end for TvGameLauncher, and also improved the latter for good measure:

  1. There is now a useful “darken non-primary displays” option that darkens all displays except the one where the game takes place, for improved gaming immersion.
  2. TvGameLauncher now employs the excellent NirCmd instead of the previous relatively unknown (and less reliable and updated) tools.
  3. Improved logging, error handling, and more.

Get it at SourceForge, and be sure to check out the 5-minute tutorial on YouTube.



The Windows API Code Pack – the case of the missing samples

June 13, 2014

The Windows API Code Pack can be a boon for managed code developers wanting to access Windows functionality such as the TaskBar, Windows Shell, DirectX, Aero, Sensors, and more (see the article for a more complete list).

Unfortunately, the original link is dead (as is the entire MSDN Archive Gallery, may it rest in peace) – you can try the Wayback Machine but the download link won’t work. As it turns out, the code pack has found a new home in the NuGet repository:

This is all well and good, but these are just the binaries – where are the original code samples? The documentation consists of thin XML API coverage – not nearly enough for a developer wanting to get started quickly. The original download contained many useful samples, and these don’t seem to be available anymore.

Fortunately, a quick web search revealed quite a few mirrors of the original package (in its latest 1.1 version):

And just because I’m paranoid, I’ve uploaded a copy to my OneDrive.

But wait, there’s more! The Windows 7 Training Kit for Developers contains yet more samples using the Windows API Code Pack!

Happy coding :)


Creating managed wrappers for COM interfaces

May 31, 2014

COM interop can be a very useful tool, but it requires the definitions of the unmanaged interfaces one wants to use. This may be a little tricky, so here’s a small guide to help you out. Note that much of the advice below is applicable to P/Invoke as well.

Make sure it’s necessary

Many COM APIs have managed counterparts, so make sure you don’t waste your time doing what the BCL team already did for you. A quick search would usually lead you to the right StackOverflow / MSDN page. Make sure you check the Windows API Code Pack too.

See if it’s been done

Many times, the interface you want to use has already been written in managed form. For example, IPersistFile is already defined in mscorlib.dll (hint: you already have this referenced in your project by default) under the System.Runtime.InteropServices.ComTypes namespace.

A quick MSDN search should start you off in the right direction – you’d be surprised where you find some of them… For example, Microsoft.VisualStudio.TestTools.UITest.Extension contains IUniformResourceLocator, and Microsoft.VisualStudio.OLE.Interop contains IPropertyStorage. Searches beyond MSDN could be worthwhile as well – another useful source is Ohloh code search (note that you can filter by language, e.g. C#).

If you don’t want to take a dependency on the assembly that has the definition you want, you can always copy the definitions you need to your source (e.g. from Reflector).

Another repository of managed code definitions is embedded in the P/Invoke Interop Assistant (more on this tool later).

Use TlbImp

Sometimes you won’t find the interface you want defined anywhere. Fortunately, TlbImp can automatically create the needed managed COM definitions for you, provided that you can feed it with the appropriate type library (.tlb) files (note that these could be embedded in EXE and DLL files as resources).

That would have been straightforward had the OS / SDK actually come with tlb files for the entire Win32 COM API, but that is not always the case. Instead, you can usually find an interface definition (.idl) file containing the interface you desire (use findstr or something like Find in Files in Notepad++).

Once you find the relevant IDL file, you will need to compile it into a TLB file, which in turn could be fed to TlbImp. The midl compiler does just that, but it will only produce tlb files for Library definitions, which may not be present. Fortunately, we can simply inject some Library definition with the interfaces we want into the IDL file and thus “trick” midl into producing the tlb.

For example, suppose you want to get the managed definition of the IPropertySetStorage interface, which resides in [SDKFolder]\Include\um\PropIdl.Idl.

  1. Open PropIdl.Idl in your favorite text editor
  2. Add the following after the definition of IPropertySetStorage:
    [ uuid( 03383777-9430-4A45-9417-38B4B5CB4143 ) ] // (use guidgen.exe to generate this guid)
    library TempLib {
        interface IPropertySetStorage;
    }
  3. Run midl PropIdl.Idl
  4. Run tlbimp PropIdl.tlb
  5. Don’t forget to undo your changes to PropIdl.Idl !

You will now have TempLib.dll containing all the required definitions for the interface.

Use the P/Invoke Interop Assistant

Sometimes you can’t even find IDL definitions for the interfaces you want, and in those cases you have to go hardcore and define them yourself. Note that you should look at the unmanaged definitions in the header (.h) file itself (again find it in the SDK folder with findstr or a similar tool) rather than the MSDN documentation, as definition order is very important and the docs don’t preserve it. MSDN may also present the signatures of the ANSI interface, whereas you’d likely want the Unicode variety (in the header file, they will be defined as InterfaceA for ANSI and InterfaceW for Unicode).

As for the signature conversion itself, the CLR Interop team has released the P/Invoke Interop Assistant to, well, assist you with that. Basically, you paste in the unmanaged signature of the interface / struct you want to use (grab it from MSDN), and it generates the managed definitions for you. Note that it may need your help sometimes for typedefs it doesn’t know (in these cases you need to “unwrap” the typedefs until you reach something it knows, usually a basic type), so browsing the actual header files in Visual Studio may be more convenient (simply start a new Win32 project and include the desired header).

For more information on marshaling data between managed and unmanaged code see Marshaling Data with Platform Invoke. The P/Invoke Data Types table and the COM Data Types table are great cheat-sheets to have handy, too.

Don’t let Work Guy screw Home Guy over

May 1, 2014

A great man once said:

I never get enough sleep. I stay up late at night, cause I’m Night Guy. Night Guy wants to stay up late. ‘What about getting up after five hours sleep?’, oh that’s Morning Guy’s problem… That’s not my problem, I’m Night Guy – I stay up as late as I want! So you get up in the morning, you’re exhausted, groggy… oooh I hate that Night Guy!

See, Night Guy always screws Morning Guy. There’s nothing Morning Guy can do. The only thing Morning Guy can do is try and oversleep often enough so that Day Guy loses his job and Night Guy has no money to go out anymore.

Today, after tackling some annoying WCF issues at a late hour, I realized Work Guy and Home Guy have a very similar relationship.

When I’m at work, I stay late cause I’m Work Guy. I want to track down that bug, I want to finish that piece of code, I want to get that test to pass, I want to study that interesting .NET topic a little further.

“What about getting home at 10:00 PM, missing the gym session you wanted to have, skipping that UFC title fight event you wanted to watch, and delaying that level you wanted to complete in your latest video game?”, oh that’s Home Guy’s problem - I’m Work Guy, I stay at work as late as I want!

So you get home late, and by the time you eat, shower, and write a small blog post you have to go to bed… ooh, I hate that Work Guy! It’s a good thing that by that time you’re already Night Guy, so you can stay up late :)
