Wednesday, May 23, 2012

gSOAP Unicode


It has been a while since my last post. For the last two years I have been busy with low-level C work related to Bluetooth and WiFi. I made a couple of findings during that time that I want to share, and I hope they will be useful to others as well.

Let's get down to the topic. We have been using gSOAP (C++) to handle our web service calls. On Windows, we could tweak gSOAP to support Unicode by modifying the typemap.dat file:

xsd__string   = | wchar_t* | wchar_t*

(Note: some might wonder why we don't use std::string or std::wstring. One of the requirements given to us was to use primitive data types, so we didn't really have a choice here.)

On Windows, char is only 8 bits and wchar_t is 16 bits. Things were good: we used wchar_t* to store all our strings. Then we migrated to Linux. On Linux we use char, because it can already carry Unicode as UTF-8 (char is still 8 bits on Linux, but UTF-8 encodes each code point as one to four bytes, while wchar_t is 32 bits there).
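To illustrate how an 8-bit char string carries Unicode as UTF-8, here is a minimal sketch. The helper name utf8_code_points is hypothetical and is not part of gSOAP or of our code. It assumes the input is valid, NUL-terminated UTF-8.

```cpp
#include <climits>
#include <cstddef>

static_assert(CHAR_BIT == 8, "this sketch assumes 8-bit char");

// Hypothetical helper: counts the code points in a NUL-terminated UTF-8
// string by skipping continuation bytes (bytes of the form 10xxxxxx).
// It does not validate the input.
std::size_t utf8_code_points(const char* s)
{
    std::size_t n = 0;
    for (; *s != '\0'; ++s) {
        if ((static_cast<unsigned char>(*s) & 0xC0) != 0x80) {
            ++n;  // lead byte or plain ASCII byte starts a new code point
        }
    }
    return n;
}
```

For example, the euro sign is the three bytes E2 82 AC in UTF-8, so the string occupies three char elements but counts as one code point.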

Under Linux, however, gSOAP did not handle Unicode for us when char was used, and we found it too troublesome to change all of our string types to wchar_t*.

I finally came up with the idea of hacking the header file generated by gSOAP's wsdl2h: I changed only the strings that need to be Unicode to wchar_t* and left the rest as they were.

Next, I ran into a problem converting wchar_t* to char*. Using wcstombs didn't help. I had UTF-32 stored in the wchar_t* from gSOAP, and I was trying to convert it to UTF-8 stored in a char*.

I finally found some handy help on CodeProject. It pointed me to a unicode.org library file; however, the link was broken. I managed to find the file hosted on Google Code. Together, the CodeProject help and the unicode.org library let me convert wchar_t* to char* (UTF-32 to UTF-8) successfully. It was a big relief.
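For readers who just want to see the shape of a UTF-32 to UTF-8 conversion, here is a minimal sketch. It is not the unicode.org code and not the CodeProject code. The function name utf32_to_utf8 and its error convention are my own assumptions for illustration. It assumes wchar_t is 32 bits and holds one code point per element, as on Linux, and it sticks to primitive types.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumption: wchar_t is 32 bits (Linux/glibc). This fails to compile on Windows.
static_assert(sizeof(wchar_t) == 4, "this sketch assumes 32-bit wchar_t");

// Hypothetical helper: encodes a NUL-terminated UTF-32 string as UTF-8.
// Writes a NUL-terminated result into dst and returns the number of bytes
// written (excluding the NUL), or (size_t)-1 on invalid input or if dst
// is too small.
std::size_t utf32_to_utf8(const wchar_t* src, char* dst, std::size_t dst_size)
{
    const std::size_t kError = static_cast<std::size_t>(-1);
    if (src == nullptr || dst == nullptr || dst_size == 0) return kError;

    std::size_t out = 0;
    for (; *src != L'\0'; ++src) {
        const std::uint32_t cp = static_cast<std::uint32_t>(*src);
        // Reject surrogates and values beyond the Unicode range.
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return kError;

        const std::size_t len = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
        if (out + len >= dst_size) return kError;  // keep room for the NUL

        if (len == 1) {
            dst[out++] = static_cast<char>(cp);
        } else if (len == 2) {
            dst[out++] = static_cast<char>(0xC0 | (cp >> 6));
            dst[out++] = static_cast<char>(0x80 | (cp & 0x3F));
        } else if (len == 3) {
            dst[out++] = static_cast<char>(0xE0 | (cp >> 12));
            dst[out++] = static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            dst[out++] = static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            dst[out++] = static_cast<char>(0xF0 | (cp >> 18));
            dst[out++] = static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            dst[out++] = static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            dst[out++] = static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    assert(out < dst_size);  // guaranteed by the size check above
    dst[out] = '\0';
    return out;
}
```

A caller would pass the wchar_t* field from the gSOAP-generated struct along with a char buffer of up to four bytes per code point plus one for the terminator, and check the return value before using the result.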

That's it for now.