"Kelvin Moss" <km**********@yahoo.com> wrote:
#
# SM Ryan wrote:
# # Hi all,
# #
# # How could one write an strstr function to work with unicode characters?
# #
# # Are there existing implementations/solutions/api for doing so?
#
# String functions should work just fine on UTF-8 encoded Unicode
# characters - minding that non-ASCII characters will have codes greater
# than 127 (or less than zero) and might be represented by multiple bytes.
# For something like strstr, which should only be looking for byte
# sequences without embedded zeros, it should be fine, while strchr
# can be problematic.
#
# Yes, I am dealing with UTF8 encoded Unicode characters. So you mean to
# say that as long as I don't have embedded zeroes in the strings strstr
# should be fine. Right? I think this assumption may not work quite well
# in real applications. Your thoughts?
UTF-8 encodes each ASCII character as itself and every other character
as a sequence of bytes with the high bit set; it never inserts a zero
byte where none existed before. As long as all you're doing is
shuffling bytes around, you can use most str* functions.
Functions like strchr, which expect one char to be one character,
will only work on the ASCII subset.
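
To make the strstr/strchr difference concrete, here is a minimal sketch in C. The helper names (utf8_find, utf8_length, byte_offset_of) are made up for illustration and are not part of any library. It assumes both strings are valid, NUL-terminated UTF-8, and that the source spells non-ASCII characters as explicit byte escapes.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: byte offset of needle in haystack, or -1.
 * Plain strstr is enough because it only compares bytes, and valid
 * UTF-8 never contains a zero byte inside a character. */
ptrdiff_t utf8_find(const char *haystack, const char *needle)
{
    const char *hit = strstr(haystack, needle);
    return hit ? hit - haystack : -1;
}

/* Hypothetical helper: number of code points, counted by skipping
 * continuation bytes (bit pattern 10xxxxxx). strlen would return
 * the number of bytes instead. */
size_t utf8_length(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    }
    return n;
}

/* Hypothetical helper: byte offset of the first occurrence of a single
 * byte, or -1. This is all strchr can do: it matches one char, so it
 * cannot search for a character that takes several bytes. */
ptrdiff_t byte_offset_of(const char *s, int byte)
{
    const char *hit = strchr(s, byte);
    return hit ? hit - s : -1;
}
```

With the string "na\xC3\xAFve caf\xC3\xA9" (the UTF-8 bytes of "naïve café"), utf8_find for "caf\xC3\xA9" gives byte offset 7, strlen gives 12 and utf8_length gives 10. byte_offset_of with 0xC3 gives 2, which is the lead byte of the "ï" and not the "é" you might have been looking for: both characters start with the same byte, so strchr cannot tell them apart. The offsets are all in bytes, not characters.
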
In FILEs, you have to negotiate with other programs how they will
interpret byte sequences. If all the applications assume UTF-8
encodings in FILEs, and they handle UTF-8 internally, then everything
will be fine.
--
SM Ryan
http://www.rawbw.com/~wyrmwif/
I love the smell of commerce in the morning.