fokifantasy.blogg.se - Codepoints bmp

CODEPOINTS BMP CODE

Using the surrogate mechanism, UTF-16 can support all 1,114,112 potential Unicode characters. For more details about supplementary characters, surrogates, and surrogate pairs, refer to The Unicode Standard.

Windows 2000 introduces support for basic input, output, and simple sorting of supplementary characters. However, not all system components are compatible with supplementary characters. The operating system supports supplementary characters in the following ways:

- Format 12 of the OpenType font cmap table directly supports the 4-byte character code. For more information, see the OpenType font specification.
- Windows supports surrogate-enabled input method editors (IMEs).
- The Windows GDI API supports format 12 cmap tables in fonts so that surrogates can be displayed correctly.
- The Uniscribe API supports supplementary characters.

The first (high) surrogate is a 16-bit code value in the range U+D800 to U+DBFF. The second (low) surrogate is a 16-bit code value in the range U+DC00 to U+DFFF.
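To make the two ranges concrete, here is a minimal Python sketch of how a code point beyond the BMP maps to a high and low surrogate and back. The function names are hypothetical, chosen for illustration only. The arithmetic (subtract 0x10000, then split the remaining 20 bits into two 10-bit halves) is the UTF-16 encoding form as defined in The Unicode Standard; it is not spelled out in the text above.

```python
# Surrogate ranges as given in the text.
HIGH_START, HIGH_END = 0xD800, 0xDBFF
LOW_START, LOW_END = 0xDC00, 0xDFFF


def to_surrogate_pair(code_point):
    """Split a supplementary code point into (high, low) surrogates.

    Hypothetical helper for illustration; not part of any Windows API.
    """
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    offset = code_point - 0x10000          # 20-bit value
    high = HIGH_START + (offset >> 10)     # top 10 bits
    low = LOW_START + (offset & 0x3FF)     # bottom 10 bits
    return high, low


def from_surrogate_pair(high, low):
    """Combine a (high, low) surrogate pair back into one code point."""
    if not (HIGH_START <= high <= HIGH_END and LOW_START <= low <= LOW_END):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - HIGH_START) << 10) + (low - LOW_START)


high, low = to_surrogate_pair(0x1F600)
print(f"U+1F600 -> {high:04X} {low:04X}")
```

The result can be cross-checked against Python's built-in codec: encoding "\U0001F600" as "utf-16-be" should yield the same two 16-bit values.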

Windows applications normally use UTF-16 to represent Unicode character data. The use of 16 bits allows direct representation of 65,536 unique characters, but this Basic Multilingual Plane (BMP) is not nearly enough to cover all the symbols used in human languages. Unicode version 4.1 includes over 97,000 characters, with over 70,000 characters for Chinese alone.

The Unicode Standard has established 16 additional "planes" of characters, each the same size as the BMP. Naturally, most code points beyond the BMP do not yet have characters assigned to them, but definition of the planes gives Unicode the potential to define 1,114,112 characters (that is, 2^16 * 17 characters) within the code point range U+0000 to U+10FFFF. For UTF-16 to represent this larger set of characters, the Unicode Standard defines "supplementary characters".

About Supplementary Characters

A supplementary character is a character located beyond the BMP, and a "surrogate" is a UTF-16 code value. For UTF-16, a "surrogate pair" is required to represent a single supplementary character.
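The plane arithmetic can be checked with a short Python sketch. The helper names below are hypothetical and exist only for this illustration. The sketch assumes that the plane number is the code point divided by the plane size of 2^16, and it counts UTF-16 code units by encoding with Python's built-in "utf-16-le" codec.

```python
PLANE_SIZE = 2 ** 16        # 65,536 code points per plane, the size of the BMP
PLANE_COUNT = 17            # the BMP plus 16 additional planes
MAX_CODE_POINT = 0x10FFFF


def plane_of(code_point):
    """Return the plane number (0 is the BMP) of a code point."""
    if not 0 <= code_point <= MAX_CODE_POINT:
        raise ValueError("outside the Unicode code point range")
    return code_point >> 16


def is_supplementary(code_point):
    """True when the code point lies beyond the BMP."""
    return plane_of(code_point) > 0


def utf16_code_units(text):
    """Count the 16-bit code units needed to store text as UTF-16."""
    return len(text.encode("utf-16-le")) // 2


print(PLANE_SIZE * PLANE_COUNT)          # total code points across all planes
print(plane_of(0x1F600))                 # plane of a supplementary code point
print(utf16_code_units("\U0001F600"))    # one character, two code units
```

A BMP character such as "A" occupies one code unit, while a supplementary character occupies two, which is the surrogate pair.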