Recognize wide Unicode code point on Windows narrow Python build

I have a narrow Python 2.7.6 build on Windows, and a string containing both "narrow" (< 0x10000) and "wide" (> 0xFFFF) Unicode code points:

    >>> wide1 = u'\U0002b740'
    >>> wide2 = u'\ud86d\udf40'
    >>> wide1 == wide2
    True
    >>> narrow = u'\ud86d'
    >>> s = wide1 + narrow

But when I iterate over the string, Python doesn't recognize the wide code points:

    >>> for c in s:
    ...     c
    ...
    u'\ud86d'
    u'\udf40'
    u'\ud86d'

This makes it impossible to tell whether a character is a narrow code point or part of a wide code point.

You cannot. On a narrow build, code points above U+FFFF are represented internally as UTF-16 surrogate pairs.
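The mapping between a wide code point and its surrogate pair is fixed by UTF-16: subtract 0x10000, then split the remaining 20 bits into two 10-bit halves, added to 0xD800 (lead) and 0xDC00 (trail). The minimal sketch below assumes Python 3 (for `print`), and the names `to_surrogates` and `from_surrogates` are hypothetical helpers, not part of any library. It shows that U+2B740 from the question maps to exactly the pair U+D86D U+DF40, which is why `wide1 == wide2` is `True` on a narrow build.

```python
def to_surrogates(cp):
    """Split a code point above U+FFFF into its UTF-16 (high, low) surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("code point must be in U+10000..U+10FFFF")
    offset = cp - 0x10000              # 20-bit value
    high = 0xD800 + (offset >> 10)     # top 10 bits -> lead (high) surrogate
    low = 0xDC00 + (offset & 0x3FF)    # bottom 10 bits -> trail (low) surrogate
    return high, low


def from_surrogates(high, low):
    """Recombine a (high, low) surrogate pair into the original code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


high, low = to_surrogates(0x2B740)
print(hex(high), hex(low))               # 0xd86d 0xdf40
print(hex(from_surrogates(high, low)))   # 0x2b740
```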

The U+D86D and U+DF40 code points are such surrogates, and you should never see them on their own in normal Unicode text anyway. Quoting the Wikipedia article on UTF-16:

"The Unicode standard permanently reserves these code point values for UTF-16 encoding of the lead and trail surrogates, and they will never be assigned a character, so there should be no reason to encode them. The official Unicode standard says that no UTF forms, including UTF-16, can encode these code points."

As such, the code points U+D800 to U+DFFF should not be treated as narrow code points: each one is half of a wide code point, for all intents and purposes.
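One way to apply that rule is to walk the string by index instead of character by character, joining a high surrogate (U+D800 to U+DBFF) with an immediately following low surrogate (U+DC00 to U+DFFF). This is a minimal sketch assuming Python 3 (for `print`); `iter_code_points` is a hypothetical helper name. The indexing logic itself is the same on a narrow 2.7 build, where `u'\U0002b740'` is stored as two units. On Python 3.3+ there are no narrow builds, so a literal `u'\U0002b740'` is already a single code point and simply passes through unchanged.

```python
def iter_code_points(s):
    """Yield the code points of s, joining each high+low surrogate pair.

    A lone surrogate (one without its partner) is yielded as-is so the
    caller can decide whether to treat it as an error.
    """
    i, n = 0, len(s)
    while i < n:
        cp = ord(s[i])
        if 0xD800 <= cp <= 0xDBFF and i + 1 < n:
            nxt = ord(s[i + 1])
            if 0xDC00 <= nxt <= 0xDFFF:
                # High surrogate followed by low surrogate: one wide code point.
                yield 0x10000 + ((cp - 0xD800) << 10) + (nxt - 0xDC00)
                i += 2
                continue
        # BMP character, or an unpaired surrogate.
        yield cp
        i += 1


s = u'\ud86d\udf40' + u'\ud86d'   # the question's wide1 + narrow, as UTF-16 units
print([hex(cp) for cp in iter_code_points(s)])   # ['0x2b740', '0xd86d']
```

The trailing `0xd86d` comes out unpaired, which is the point of the answer: it is not a narrow character but an incomplete half of a wide one.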
