mblen and mbrlen

I've been getting inconsistent results with mblen and mbrlen on
Solaris.

Although mblen accepts a multibyte string, mbrlen always rejects it,
reporting an encoding error. The mbstate_t variable is valid (it has
been zeroed with memset, and I have confirmed that it is in a valid
initial state with mbsinit directly before the call to mbrlen).

Can anyone shed any light on why this could be happening?
Nov 13 '05 #1
In article <c6**************************@posting.google.com>, Paul King wrote:
> Although mblen accepts a multibyte string, mbrlen always rejects it,
> reporting an encoding error.

Maybe you could provide a trimmed down runnable program that
exhibits the behaviour that you describe?
--
Andreas Kähäri
Nov 13 '05 #2
In <sl**********************@otaku.freeshell.org> Andreas Kahari <ak*******@freeshell.org> writes:
> In article <c6**************************@posting.google.com>, Paul King wrote:
>> Although mblen accepts a multibyte string, mbrlen always rejects it,
>> reporting an encoding error.
>
> Maybe you could provide a trimmed down runnable program that
> exhibits the behaviour that you describe?

Even then, it would be problematic: mbrlen is not a C89 function and
the Solaris libraries do not claim C99 conformance. Posting to a Sun
newsgroup may be a better idea.

Dan
--
Dan Pop
DESY Zeuthen, RZ group
Email: Da*****@ifh.de
Nov 13 '05 #3
Da*****@cern.ch (Dan Pop) wrote in message news:<bp**********@sunnews.cern.ch>...

>> Maybe you could provide a trimmed down runnable program that
>> exhibits the behaviour that you describe?
>
> Even then, it would be problematic: mbrlen is not a C89 function and
> the Solaris libraries do not claim C99 conformance. Posting to a Sun
> newsgroup may be a better idea.

Thanks. I could probably trim down a program to work, although I
suspect that the locale might be an issue. But I've already tried
every comparison I can think of and it seems to be entirely a
difference between which strings the two routines will accept.

A multibyte (UTF-8 I believe) string that fails on mbrlen but not
mblen is:

"\316\272\316\261\316\273\316\267\316\274\316\255\317\201\316\261\316\272\317\214\317\203\316\274\316\265!"
Nov 13 '05 #4
in comp.lang.c i read:
> I could probably trim down a program to work, although I
> suspect that the locale might be an issue.

absolutely it is involved, LC_CTYPE affects its behavior, so if the
current setting isn't utf-8 and you feed mb(r)len a pointer to a utf-8
string there's no telling just what will be the result. i agree that
it's very odd that they don't return the same value, but merely odd, not
incorrect (since this would be undefined behavior anything is possible).

> A multibyte (UTF-8 I believe) string that fails on mbrlen but not
> mblen is:

mb(r)len doesn't determine the length of a string, only the number of bytes
involved in a single multi-byte character, so ...

> "\316\272\316\261\316\273\316\267\316\274\316\255\317\201\316\261\316\272\317\214\317\203\316\274\316\265!"


if this is indeed utf-8 then mblen and mbrlen are only determining the
length of the first character, which is "\316\272" (i.e., u+03ba -- greek
small letter kappa), so i would expect a return value of 2 from either so
long as LC_CTYPE is set correctly (i.e., you have first called setlocale
with appropriate arguments).

--
a signature
Nov 13 '05 #5
In <m1*************@usa.net> those who know me have no need of my name <no****************@usa.net> writes:
> if this is indeed utf-8 then mblen and mbrlen are only determining the
> length of the first character, which is "\316\272" (i.e., u+03ba -- greek
> small letter kappa), so i would expect a return value of 2 from either so
> long as LC_CTYPE is set correctly (i.e., you have first called setlocale
> with appropriate arguments).


Not necessarily: utf-8 may be the multibyte character encoding used in the
C locale. Only the implementation documentation can tell, but I doubt
that one and the same implementation supports more than one encoding
method for multibyte characters, depending on the locale setting
(although this is allowed by the standard).

In principle, UCS-4 should provide proper support for *any* locale.
That's why it was created in the first place.

Dan
Nov 13 '05 #6
those who know me have no need of my name <no****************@usa.net> wrote in message news:<m1*************@usa.net>...
>> A multibyte (UTF-8 I believe) string that fails on mbrlen but not
>> mblen is:
> mb(r)len doesn't determine the length of a string, only the number of bytes
> involved in a single multi-byte character, so ...

Yes, I am aware of that; however, since we are dealing with a variable-length
coding system, I thought it best to supply the whole string.
>> "\316\272\316\261\316\273\316\267\316\274\316\255\317\201\316\261\316\272\317\214\317\203\316\274\316\265!"


> if this is indeed utf-8 then mblen and mbrlen are only determining the
> length of the first character, which is "\316\272" (i.e., u+03ba -- greek
> small letter kappa), so i would expect a return value of 2 from either so
> long as LC_CTYPE is set correctly (i.e., you have first called setlocale
> with appropriate arguments).


The locale should have been set correctly - although it's not easy to
check and the setup relies on the environment rather than calling
setlocale in the program. I am getting the correct result (as you
say, 2) for mblen - and in fact it walks the whole string. mbrlen
gives up on the first character.
Nov 13 '05 #7
In <c6**************************@posting.google.com> pa*********@convergys.com (Paul King) writes:
> The locale should have been set correctly - although it's not easy to
> check and the setup relies on the environment rather than calling
> setlocale in the program.


If you don't call setlocale, you're in the C locale. All you can control
from the environment is the "" locale, but this locale still has to be
made the current locale with a setlocale call.

Dan
Nov 13 '05 #8
Da*****@cern.ch (Dan Pop) wrote in message news:<bp**********@sunnews.cern.ch>...

> If you don't call setlocale, you're in the C locale. All you can control
> from the environment is the "" locale, but this locale still has to be
> made the current locale with a setlocale call.


My mistake. I've just checked the code again and there is a
setlocale() call (using the locale from the environment variable
LC_CTYPE). The locale used is an alias which SHOULD be pointing at a
suitable locale but I can't remember how to verify that.
Nov 13 '05 #9
In <c6**************************@posting.google.com> pa*********@convergys.com (Paul King) writes:
> My mistake. I've just checked the code again and there is a
> setlocale() call (using the locale from the environment variable
> LC_CTYPE). The locale used is an alias which SHOULD be pointing at a
> suitable locale but I can't remember how to verify that.


For starters, check the return value of setlocale(). If it's a null
pointer, you're still in the "C" locale.

Dan
Nov 13 '05 #10

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

6
by: jalkadir | last post by:
How can I conver a char to std::wstring, for intance time_t Time_t; std::wstring Date; time( &Time_t ); Date = ctime( &Time_t ); or strd::string wstr; str = __FILE__;
18
by: Zygmunt Krynicki | last post by:
Hello I've browsed the FAQ but apparently it lacks any questions concenring wide character strings. I'd like to calculate the length of a multibyte string without converting the whole string. ...
16
by: Daniel Rudy | last post by:
....is that there is no single man page that lists the stdlib functions that I can reference. I'm working in a Unix environment. -- Daniel Rudy Email address has been encoded to reduce spam....
149
by: Christopher Benson-Manica | last post by:
(Followups set to comp.std.c. Apologies if the crosspost is unwelcome.) strchr() is to strrchr() as strstr() is to strrstr(), but strrstr() isn't part of the standard. Why not? --...
3
by: Simon Morgan | last post by:
Hi, The following code is meant to validate a string of multibyte characters by using mbcheck() to call mblen() on each character on the string passed to it. The problem is that it isn't working...
0
by: Kirt Loki Dankmyer | last post by:
So, I download the latest "stable" tar for perl (5.8.7) and try to compile it on the Solaris 8 (SPARC) box that I administrate. I try all sorts of different switches, but I can't get it to compile....
9
by: TheOne | last post by:
Would anyone please point me to a list of reentrant C library functions? I want to know which C library functions are safe to use inside a signal handler across all platforms. Does GNU C library...
10
by: joelagnel | last post by:
hi friends, i've been having this confusion for about a year, i want to know the exact difference between text and binary files. using the fwrite function in c, i wrote 2 bytes of integers in...
1
by: Marcel Ruff | last post by:
Hi, i have the question on how to determine the string length of a wide string and a multibyte string: 1. Number of letters (one letter may use three bytes) 2. Number of bytes In the code...
0
by: taylorcarr | last post by:
A Canon printer is a smart device known for being advanced, efficient, and reliable. It is designed for home, office, and hybrid workspace use and can also be used for a variety of purposes. However,...
0
by: Charles Arthur | last post by:
How do i turn on java script on a villaon, callus and itel keypad mobile phone
0
by: ryjfgjl | last post by:
If we have dozens or hundreds of excel to import into the database, if we use the excel import function provided by database editors such as navicat, it will be extremely tedious and time-consuming...
0
by: ryjfgjl | last post by:
In our work, we often receive Excel tables with data in the same format. If we want to analyze these data, it can be difficult to analyze them because the data is spread across multiple Excel files...
0
by: emmanuelkatto | last post by:
Hi All, I am Emmanuel katto from Uganda. I want to ask what challenges you've faced while migrating a website to cloud. Please let me know. Thanks! Emmanuel
1
by: nemocccc | last post by:
hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?
1
by: Sonnysonu | last post by:
This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...
0
by: Hystou | last post by:
There are some requirements for setting up RAID: 1. The motherboard and BIOS support RAID configuration. 2. The motherboard has 2 or more available SATA protocol SSD/HDD slots (including MSATA, M.2...
0
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.