Recognize wide Unicode point on Windows narrow Python build
I have a narrow Python 2.7.6 build on Windows. I have a string containing both "narrow" (< 0x10000) and "wide" (> 0xFFFF) Unicode code points:
>>> wide1 = u'\U0002b740'
>>> wide2 = u'\ud86d\udf40'
>>> wide1 == wide2
True
>>> narrow = u'\ud86d'
>>> s = wide1 + narrow
But when I iterate over the string, it doesn't recognize the wide code points:
>>> for c in s:
...     c
...
u'\ud86d'
u'\udf40'
u'\ud86d'
And it becomes impossible to find out whether a char is a narrow code point or part of a wide code point.
You cannot. On a narrow build, high Unicode codepoints (above U+FFFF) are represented internally as UTF-16 surrogates.
The U+D86D and U+DF40 codepoints are such surrogates, and you should never see them in normal Unicode text usage anyway. Quoting the Wikipedia article on UTF-16:
"The Unicode standard permanently reserves these code point values for UTF-16 encoding of the lead and trail surrogates, and they will never be assigned a character, so there should be no reason to encode them. The official Unicode standard says that no UTF forms, including UTF-16, can encode these code points."
As such, the U+D800 through U+DFFF codepoints should not be treated as narrow points; each one is always one half of a wide codepoint, on purpose.
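If every surrogate is taken to be half of a pair, the code units can be recombined while iterating. The following is only a minimal sketch under that assumption, not a built-in facility: the name iter_code_points is hypothetical, and passing an unpaired surrogate through unchanged is a design choice made here for illustration. It yields integer code points rather than characters.

```python
def iter_code_points(text):
    """Yield integer code points, joining UTF-16 surrogate pairs.

    Hypothetical helper: unpaired surrogates are yielded unchanged.
    """
    high = None  # pending lead (high) surrogate, if any
    for unit in text:
        value = ord(unit)
        if high is not None:
            if 0xDC00 <= value <= 0xDFFF:
                # Lead + trail surrogate: rebuild the wide code point.
                yield 0x10000 + ((high - 0xD800) << 10) + (value - 0xDC00)
                high = None
                continue
            # The pending lead surrogate had no trail surrogate after it.
            yield high
            high = None
        if 0xD800 <= value <= 0xDBFF:
            high = value  # wait to see whether a trail surrogate follows
        else:
            yield value
    if high is not None:
        yield high  # lead surrogate at the very end of the string


s = u'\ud86d\udf40' + u'\ud86d'
print([hex(cp) for cp in iter_code_points(s)])
```

For the string from the question, this gives 0x2b740 followed by the leftover 0xd86d. Integers are used because, on a narrow build, unichr() rejects values above 0xFFFF.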
python windows unicode