  * [[http://openhome.cc/Gossip/Programmer/Encoding1.html|程式人專欄: 亂碼 1／2（上）頻遭模糊的字元集與編碼基礎]]
  * [[http://openhome.cc/Gossip/Programmer/Encoding2.html|程式人專欄: 亂碼 1／2（下）常見字元編碼處理方式]]
  * [[http://www.luanxiang.org/blog/archives/1271.html|浅谈编码]]

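  * Wide character ([[wp>Wide character]], [[http://www.cplusplus.com/reference/cwchar/wchar_t/|wchar_t]])
  * Multibyte character
  * Character
  * Code point
    * The number assigned to a character.
  * Character set
    * The mapping between characters and code points.
  * Encoding
    * How a code point is actually stored as bytes in memory or on disk.
    * UTF-8 is one of the encodings of Unicode characters.
    * [[wp>ASCII]] is an encoding.
    * A code point and an encoding are two different concepts.
  * [[wp>Byte order mark]]
    * Indicates whether the encoded data that follows is big-endian or little-endian.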
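As a rough illustration of how a byte order mark is read, the sketch below inspects the first bytes of a buffer and names the encoding they signal. The helper name ''detect_bom'' and its string return values are hypothetical choices made for this example, not part of any library.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: return the encoding signalled by a leading BOM,
// or an empty string when no BOM is present.
std::string detect_bom(const unsigned char* data, std::size_t size) {
    assert(data != nullptr || size == 0);
    // UTF-32 is checked first because the UTF-32LE mark (FF FE 00 00)
    // begins with the UTF-16LE mark (FF FE).
    if (size >= 4 && data[0] == 0x00 && data[1] == 0x00 &&
        data[2] == 0xFE && data[3] == 0xFF) return "UTF-32BE";
    if (size >= 4 && data[0] == 0xFF && data[1] == 0xFE &&
        data[2] == 0x00 && data[3] == 0x00) return "UTF-32LE";
    if (size >= 3 && data[0] == 0xEF && data[1] == 0xBB &&
        data[2] == 0xBF) return "UTF-8";
    if (size >= 2 && data[0] == 0xFE && data[1] == 0xFF) return "UTF-16BE";
    if (size >= 2 && data[0] == 0xFF && data[1] == 0xFE) return "UTF-16LE";
    return "";
}
```

A BOM is only a hint: UTF-16LE text whose first character is U+0000 would be misread as UTF-32LE by this check, and many files carry no BOM at all.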

  * Code page
    * A table that maps characters to their encoded values (encoding).

  * Byte string, i.e. char*.
  * Character string, i.e. wchar_t*.
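The difference between the two string kinds shows up as soon as lengths are measured. The sketch below stores the same character, U+4E2D, once as UTF-8 bytes in a char string and once as a wide string. The constant and function names are hypothetical, and the choice of UTF-8 for the byte string is an assumption of this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <cwchar>

// U+4E2D stored as a byte string: its UTF-8 encoding takes three bytes.
const char* const kByteString = "\xE4\xB8\xAD";
// The same character stored as a wide string: one wchar_t element.
const wchar_t* const kWideString = L"\u4E2D";

// strlen counts bytes (char elements) up to the terminator.
std::size_t byte_length(const char* s) {
    assert(s != nullptr);
    return std::strlen(s);
}

// wcslen counts wchar_t elements up to the terminator.
std::size_t wide_length(const wchar_t* s) {
    assert(s != nullptr);
    return std::wcslen(s);
}
```

Here ''byte_length(kByteString)'' is 3 while ''wide_length(kWideString)'' is 1. Note that one wchar_t element is not always one character either: where wchar_t is 16 bits wide, characters outside the Basic Multilingual Plane take two elements.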

  * Summary:
    * Unicode is a character set.
    * Unicode has the following encodings:
      * UCS-2, UCS-4, UTF-8, UTF-16 (used by Windows), and UTF-32.
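To make the split between code point and encoding concrete, here is a minimal sketch that turns one code point into its UTF-8 bytes and its UTF-16 code units. The function names ''encode_utf8'' and ''encode_utf16'' are hypothetical; the bit layouts follow the published UTF-8 and UTF-16 definitions.

```cpp
#include <cassert>
#include <string>

// Encode one Unicode code point as UTF-8 (1 to 4 bytes).
std::string encode_utf8(char32_t cp) {
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    std::string out;
    if (cp < 0x80) {                       // 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {               // 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {             // 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                               // 11110xxx + three continuation bytes
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Encode one Unicode code point as UTF-16 (1 or 2 code units).
std::u16string encode_utf16(char32_t cp) {
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    std::u16string out;
    if (cp < 0x10000) {
        out += static_cast<char16_t>(cp);  // BMP: same value as in UCS-2
    } else {
        const char32_t v = cp - 0x10000;   // 20 bits split across a surrogate pair
        out += static_cast<char16_t>(0xD800 | (v >> 10));
        out += static_cast<char16_t>(0xDC00 | (v & 0x3FF));
    }
    return out;
}
```

For U+4E2D this gives the bytes E4 B8 AD in UTF-8 and the single unit 4E2D in UTF-16; for U+1F600 it gives F0 9F 98 80 and the surrogate pair D83D DE00. The second case is where UTF-16 and UCS-2 part ways, since UCS-2 has no surrogate pairs.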
====== Windows ======
  * [[http://unicodebook.readthedocs.org/en/latest/operating_systems.html#operating-systems|Programming with Unicode (Windows)]]

  * [[http://msdn.microsoft.com/en-us/library/06b9yaeb.aspx|Text and Strings in Visual C++]]
    * [[http://msdn.microsoft.com/en-us/library/cwe8bzh0.aspx|Unicode and MBCS]]
      * <q>Unicode is a 16-bit character encoding, providing enough encodings for all languages. All ASCII characters are included in Unicode as widened characters.</q>
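      * Windows uses UTF-16 as its encoding of the Unicode character set.
    * Unicode: not supported on Windows Me/Windows 98 and earlier platforms.
    * MBCS: an alternative to Unicode that is supported on all Windows platforms. MBCS is not recommended for newly developed software; use Unicode directly.
    * SBCS: a single-byte character set, for example ASCII.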

  * [[http://msdn.microsoft.com/en-us/library/c426s321.aspx|Generic-Text Mappings in Tchar.h]]
    * The _TCHAR macro maps as follows:
      * Unicode (UTF-16): wchar_t. This is mandated by Windows.
      * MBCS: char
      * SBCS: char
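The mapping above is done with the preprocessor. The sketch below is a simplified, portable imitation of the idea and is not the contents of Tchar.h: every name in it (''DEMO_UNICODE'', ''demo_tchar'', ''DEMO_T'', ''demo_tcslen'') is hypothetical and stands in for the real ''_UNICODE'', ''_TCHAR'', ''_T'' and ''_tcslen''.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <cwchar>

// Hypothetical stand-in for the _UNICODE switch; set to 0 for the char build.
#define DEMO_UNICODE 1

#if DEMO_UNICODE
typedef wchar_t demo_tchar;               // character type becomes wchar_t
#define DEMO_T(x) L##x                    // literals become wide literals
inline std::size_t demo_tcslen(const demo_tchar* s) {
    assert(s != nullptr);
    return std::wcslen(s);                // routine maps to the wide variant
}
#else
typedef char demo_tchar;                  // character type becomes char
#define DEMO_T(x) x                       // literals stay narrow
inline std::size_t demo_tcslen(const demo_tchar* s) {
    assert(s != nullptr);
    return std::strlen(s);                // routine maps to the byte variant
}
#endif

// The same source line compiles to either string type.
const demo_tchar* const kGreeting = DEMO_T("Hello");
```

Code written against the generic names compiles in both configurations, which is the whole point of the mechanism and also the reason some guides consider it unnecessary once only the Unicode build is kept.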

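  * Summary:
    * Always set the project's character set to "Use Unicode Character Set".
    * Declare string constants and variables with TCHAR.
    * Use the functions whose names start with _t.
    * [[http://utf8everywhere.org/#how|How to do text on Windows]] advises against the last two practices (TCHAR and the _t functions).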
===== Conversion =====
  * [[http://msdn.microsoft.com/zh-tw/library/ms235631.aspx|How to: Convert Between Various String Types]]
    * <q>A char * string (also known as a C style string) uses a null character to indicate the end of the string. C style strings usually require one byte per character, but can also use two bytes. In the examples below, char * strings are sometimes referred to as multibyte character strings because of the string data that results from converting from Unicode strings. </q>  
    * [[http://stackoverflow.com/questions/8032080/how-to-convert-char-to-wchar-t|How to convert char* to wchar_t*?]]

  * Multibyte character string to wide character string
    * [[http://msdn.microsoft.com/zh-tw/library/eyktyxsx.aspx|mbstowcs_s]]<code cpp>
#include <cstdlib>   // mbstowcs_s, _TRUNCATE (Microsoft CRT)
#include <cstring>   // strlen
#include <iostream>

int main()
{
    // Create and display a C style string, and then use it
    // to create different kinds of strings.
    const char *orig = "Hello, World!";
    std::cout << orig << " (char *)" << std::endl;

    // newsize describes the length of the
    // wchar_t string called wcstring in terms of the number
    // of wide characters, not the number of bytes.
    size_t newsize = strlen(orig) + 1;

    // The following creates a buffer large enough to contain
    // the exact number of characters in the original string
    // in the new format. If you want to add more characters
    // to the end of the string, increase the value of newsize
    // to increase the size of the buffer.
    wchar_t *wcstring = new wchar_t[newsize];

    // Convert char* string to a wchar_t* string.
    size_t convertedChars = 0;
    mbstowcs_s(&convertedChars, wcstring, newsize, orig, _TRUNCATE);

    // Display the result and indicate the type of string that it is.
    // A wide literal (L"...") is used because wcout expects wchar_t text
    // whatever the project's character-set setting is.
    std::wcout << wcstring << L" (wchar_t *)" << std::endl;

    delete[] wcstring;  // release the buffer allocated above
    return 0;
}
</code><code c>
errno_t mbstowcs_s(
   size_t *pReturnValue,
   wchar_t *wcstr,
   size_t sizeInWords,
   const char *mbstr,
   size_t count 
);
</code>
      * <q>mbstowcs_s uses the current locale for any locale-dependent behavior; _mbstowcs_s_l is identical except that it uses the locale passed in instead.</q>
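    * [[http://msdn.microsoft.com/en-us/library/windows/desktop/dd319072(v=vs.85).aspx|MultiByteToWideChar]]<code cpp>
    // First call: with cchWideChar set to 0, the return value is the buffer size
    // needed for the output, counted in wide characters (not bytes). Because
    // cbMultiByte is -1, the count includes the terminating null character.
    int n = ::MultiByteToWideChar(CP_ACP, 0, (const char *)pszValue, -1, NULL, 0);

    // A return value of 0 means the call failed (see GetLastError).
    if (n > 0) {
        // Allocate the output buffer.
        wchar_t* buffer = new wchar_t[n];

        // Second call: convert the input string (pszValue) into the output buffer (buffer).
        ::MultiByteToWideChar(CP_ACP, 0, (const char *)pszValue, -1, buffer, n);

        // Keep a copy of the converted data.
        m_strValue = tstring(buffer);

        // Release the buffer.
        delete [] buffer;
    }
</code><code c>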
int MultiByteToWideChar(
  _In_       UINT CodePage,
  _In_       DWORD dwFlags,
  _In_       LPCSTR lpMultiByteStr,
  _In_       int cbMultiByte,
  _Out_opt_  LPWSTR lpWideCharStr,
  _In_       int cchWideChar
);
</code>
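      * CodePage
        * The code page (encoding) of the input string.
      * lpMultiByteStr
        * Pointer to the input string.
      * cbMultiByte
        * The number of bytes of the input string to process.
        * If -1, the input string is treated as null-terminated and the whole string, including the terminating null character, is processed. The return value then counts the wide characters of the output, including the terminating null character.
      * lpWideCharStr
        * The output string buffer. May be NULL when cchWideChar is 0.
      * cchWideChar
        * The size of the output string buffer, in wide characters.
        * If 0, the return value of MultiByteToWideChar is the number of wide characters required for the output buffer (lpWideCharStr).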
    * [[http://www.cplusplus.com/reference/sstream/wstringstream/|wstringstream]]<code cpp>
#include <sstream>
#include "tstring.h"

std::wstringstream wss;
wss << pszValue;
m_strValue = tstring(wss.str().c_str());
</code>
      * [[http://blog.163.com/tianshi_17th/blog/static/4856418920085209414977/|也谈C++中char*与wchar_t*之间的转换]]
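      * Correctness not yet verified. Note that inserting a char* into a wide stream widens each char individually, so multibyte sequences are not decoded; it can only be expected to work for plain ASCII input.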
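For comparison with the Windows-specific routes above, the following is a minimal portable sketch built on the standard ''std::mbsrtowcs''. It uses the same measure-then-convert pattern as the MultiByteToWideChar sample. The helper name ''to_wide'' and its policy of returning an empty string on invalid input are assumptions of this example; the conversion depends on the current C locale, so only ASCII input is safe to rely on without calling ''setlocale'' first.

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>
#include <string>

// Convert a multibyte string to a wide string using the current C locale.
// Hypothetical policy: an invalid input sequence yields an empty string.
std::wstring to_wide(const char* mbstr) {
    assert(mbstr != nullptr);

    // Pass 1: a null destination makes mbsrtowcs return the number of
    // wide characters needed, excluding the terminating null.
    std::mbstate_t state = std::mbstate_t();
    const char* src = mbstr;
    const std::size_t len = std::mbsrtowcs(nullptr, &src, 0, &state);
    if (len == static_cast<std::size_t>(-1)) {
        return std::wstring();
    }

    // Pass 2: convert into a buffer of exactly that size.
    std::wstring result(len, L'\0');
    state = std::mbstate_t();
    src = mbstr;
    if (len > 0) {
        std::mbsrtowcs(&result[0], &src, len, &state);
    }
    return result;
}
```

Returning a ''std::wstring'' avoids the manual ''new[]''/''delete[]'' pair of the samples above.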

  * Wide character string to multibyte character string
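The reverse direction can be sketched with the standard ''std::wcsrtombs''; on Windows the corresponding routines are ''wcstombs_s'' and ''WideCharToMultiByte''. The helper name ''to_multibyte'' and its empty-string error policy are assumptions of this example, and the output encoding is whatever the current C locale uses.

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>
#include <string>

// Convert a wide string to a multibyte string using the current C locale.
// Hypothetical policy: an unconvertible character yields an empty string.
std::string to_multibyte(const wchar_t* wcstr) {
    assert(wcstr != nullptr);

    // Pass 1: a null destination makes wcsrtombs return the number of
    // bytes needed, excluding the terminating null.
    std::mbstate_t state = std::mbstate_t();
    const wchar_t* src = wcstr;
    const std::size_t len = std::wcsrtombs(nullptr, &src, 0, &state);
    if (len == static_cast<std::size_t>(-1)) {
        return std::string();
    }

    // Pass 2: convert into a buffer of exactly that size.
    std::string result(len, '\0');
    state = std::mbstate_t();
    src = wcstr;
    if (len > 0) {
        std::wcsrtombs(&result[0], &src, len, &state);
    }
    return result;
}
```

The required size is measured in bytes here, not characters, because one wide character may expand to several bytes in the multibyte encoding.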

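  * UTF-8 can generally be stored in a char*, because UTF-8 code units are 8 bits wide and a char holds at least 8 bits. Whether char is signed or unsigned does not affect storage; cast to unsigned char before inspecting byte values.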
    * [[http://stackoverflow.com/questions/148403/utf8-to-from-wide-char-conversion-in-stl|UTF8 to/from wide char conversion in STL]]
      * [[http://llvm.org/svn/llvm-project/llvm/trunk/include/llvm/Support/ConvertUTF.h|ConvertUTF.h]]
    * [[http://stackoverflow.com/questions/8818652/how-can-char-represent-an-utf-8-string|How can char[] represent an UTF-8 string?]]
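As a small illustration of working with UTF-8 held in a char*, the sketch below counts code points by skipping continuation bytes (those of the form 10xxxxxx). The function name ''utf8_length'' is hypothetical and the input is assumed to be well-formed UTF-8. The cast to unsigned char is what makes the signedness of char irrelevant.

```cpp
#include <cassert>
#include <cstddef>

// Count the code points in a null-terminated UTF-8 string.
// Assumes well-formed input; no validation is performed.
std::size_t utf8_length(const char* s) {
    assert(s != nullptr);
    std::size_t count = 0;
    for (; *s != '\0'; ++s) {
        // Inspect the byte as unsigned so the result does not depend on
        // whether plain char is signed on this platform.
        const unsigned char byte = static_cast<unsigned char>(*s);
        if ((byte & 0xC0) != 0x80) {
            ++count;  // lead byte or ASCII byte: starts a new code point
        }
    }
    return count;
}
```

For the six bytes E4 B8 AD E6 96 87 (two CJK characters) this returns 2, whereas ''strlen'' would report 6.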

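  * Summary:
    * Applying encodings in a compiler: internally the compiler should use wide character strings, i.e. the Unicode that Windows supports natively; all external input and output should be treated as multibyte character strings, which may be ASCII or UTF-8.
    * The lexer (Lex) used by the compiler should treat its input as UTF-8 and convert it to wide characters before handing it to later stages.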
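A lexer front end along these lines could decode its UTF-8 input into code points before tokenizing. The sketch below is one possible shape under stated assumptions: the name ''decode_utf8'' is hypothetical, the output is UTF-32 in a ''std::u32string'' rather than wchar_t (which is only 16 bits on Windows), malformed bytes are replaced with U+FFFD, and overlong forms and encoded surrogates are not rejected.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Decode UTF-8 into a sequence of code points.
// Hypothetical error policy: malformed input becomes U+FFFD.
std::u32string decode_utf8(const std::string& in) {
    std::u32string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        char32_t cp = 0;
        std::size_t extra = 0;  // number of continuation bytes expected
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else {
            out += char32_t(0xFFFD);  // stray continuation or invalid lead byte
            ++i;
            continue;
        }
        ++i;
        bool ok = true;
        for (std::size_t k = 0; k < extra; ++k) {
            // Each continuation byte must look like 10xxxxxx.
            if (i >= in.size() ||
                (static_cast<unsigned char>(in[i]) & 0xC0) != 0x80) {
                ok = false;  // truncated sequence; the byte is re-read as a lead
                break;
            }
            cp = (cp << 6) | (static_cast<unsigned char>(in[i]) & 0x3F);
            ++i;
        }
        out += ok ? cp : char32_t(0xFFFD);
    }
    assert(out.size() <= in.size());  // every code point consumed at least one byte
    return out;
}
```

Later stages can then work on one element per character. On Windows, a further step from these code points to UTF-16 would be needed to obtain wchar_t strings.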
===== Miscellaneous =====
  * [[http://stackoverflow.com/questions/11107608/whats-wrong-with-c-wchar-t-and-wstrings-what-are-some-alternatives-to-wide|What's “wrong” with C++ wchar_t and wstrings? What are some alternatives to wide characters?]]
  * [[http://stackoverflow.com/questions/13087219/what-exactly-is-the-l-prefix-in-c|What exactly is the L prefix in C++?]]
  * [[http://stackoverflow.com/questions/17103925/how-well-is-unicode-supported-in-c11|How well is unicode supported in C++11?]]

  * [[http://stackoverflow.com/questions/16167305/why-does-my-application-require-visual-c-redistributable-package|Why does my application require Visual C++ Redistributable package]]

  * [[http://www.codeproject.com/Articles/76252/What-are-TCHAR-WCHAR-LPSTR-LPWSTR-LPCTSTR-etc|What are TCHAR, WCHAR, LPSTR, LPWSTR, LPCTSTR (etc.)?]]
  * [[http://www.cppblog.com/seahouse/archive/2011/01/13/137571.aspx|VSVC编译选项/MDd与/MTd]]
  * [[http://stackoverflow.com/questions/604484/linker-errors-between-multiple-projects-in-visual-c|Linker errors between multiple projects in Visual C++]]

  * Converting char* to LPCTSTR
    * [[http://blog.chinaunix.net/uid-11143705-id-90369.html|char 转wchar_t 及wchar_t转char]]
    * [[http://blog.csdn.net/fengshalangzi/article/details/5815073|wchar_t与char转换(总结）]]
    * [[http://www.cnblogs.com/gdutbean/archive/2012/03/31/2427609.html|char 转wchar_t 及wchar_t转char]]
  
====== Linux ======
  * [[http://unicodebook.readthedocs.org/en/latest/index.html|Programming with Unicode]]
  * [[http://www.cprogramming.com/tutorial/unicode.html|Unicode in C and C++: What You Can Do About It Today]]
  * [[http://linuxprograms.wordpress.com/tag/unicode/|C: Using scanf and wchar_t to read and print UTF-8 strings]]
  * [[http://www.codeproject.com/Articles/38242/Reading-UTF-with-C-streams|Reading UTF-8 with C++ streams]]
====== Chinese Encodings ======
  * [[wpzh>中日韓統一表意文字]]
  * [[wpzh>大五碼]]
====== References ======
  * [[http://www.utf-8.com/|UTF-8 and Unicode]]
  * [[http://www.joelonsoftware.com/articles/Unicode.html|The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)]]
    * [[http://www.csie.ntu.edu.tw/~p92005/Joel/Unicode.html|每個軟體開發者都絕對一定要會的Unicode及字元集必備知識(沒有藉口！)]]
    * [[http://stackoverflow.com/questions/20942469/clarification-on-joel-spolskys-unicode-article|Clarification on Joel Spolsky's Unicode Article]]
  * [[http://www.utf8everywhere.org/|UTF-8 Everywhere]]
  * [[https://docs.python.org/release/3.2/howto/unicode.html|Unicode HOWTO]]
  * Unicode quick lookup
    * [[http://www.scarfboy.com/coding/unicode-tool|Unicode lookup/search tool]]
    * [[http://www.fileformat.info/info/unicode/char/search.htm|Unicode Character Search]]
  * 自製編程語言 (book), Chapter 5