UTF-8 not decoding

Hi,
I am opening a stream that is UTF-8 encoded. I use fgetc to read the
stream, which is binary safe. I add every character read to a string.
But when I look at the stream, I see some characters with a bunch of
"?" question marks, and then utf8_decode has no effect on it
either.

How do you go about decoding UTF-8? Does adding the characters to the
string somehow mess it up? Please help. Running PHP 4.3.4 on Windows.

--
http://www.dbForumz.com/ This article was posted by author's request
Articles individually checked for conformance to usenet standards
Topic URL: http://www.dbForumz.com/PHP-UTF-deco...ict138860.html
Visit Topic URL to contact author (reg. req'd). Report abuse: http://www.dbForumz.com/eform.php?p=464220
Jul 17 '05 #1
"steve" <Us************@dbForumz.com> wrote in message
news:41**********@news.athenanews.com...
> Hi,
> I am opening a stream that is UTF-8 encoded. I use fgetc to read the
> stream, which is binary safe. I add every character read to a string.
> But when I look at the stream, I see some characters with a bunch of
> "?" question marks, and then utf8_decode has no effect on it
> either.

Question marks mean that there are Unicode characters that aren't found
in the current code page. Basically the characters are there; they're
just displayed as ?s.

utf8_decode() does have an effect: it replaces characters outside of
ISO-8859-1 with question marks.

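A minimal sketch of that behaviour, for illustration only. The two sample byte strings are assumptions and are not taken from the poster's stream. Note that utf8_decode() is deprecated as of PHP 8.2, so the sketch silences deprecation notices; on the PHP 4.3.4 mentioned in the thread that line would not be needed.

```php
<?php
// Silence the PHP 8.2+ deprecation notice for utf8_decode() in this demo.
error_reporting(E_ALL & ~E_DEPRECATED);

// Sample inputs are illustrative, not taken from the original stream.
$eAcute = "\xC3\xA9";     // U+00E9, which exists in ISO-8859-1
$euro   = "\xE2\x82\xAC"; // U+20AC, which does not

// A character inside ISO-8859-1 becomes its single Latin-1 byte.
$latin1 = utf8_decode($eAcute);

// A character outside ISO-8859-1 is replaced by a literal "?".
$replaced = utf8_decode($euro);

echo bin2hex($latin1), "\n"; // e9
echo $replaced, "\n";        // ?
```

So a "?" in the output does not mean the function did nothing; it means the character had no ISO-8859-1 equivalent.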
> How do you go about decoding UTF-8? Does adding the characters to the
> string somehow mess it up? Please help. Running PHP 4.3.4 on Windows.

The question is, what do you mean by decoding UTF-8? Using fgetc on UTF-8 text
is not a good idea, since one Unicode character can span multiple bytes.
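To make the byte-versus-character distinction concrete, here is a small sketch. It assumes a sample string and an in-memory stream rather than the poster's actual stream, and it uses PCRE's /u modifier to split on whole characters. It suggests that fgetc() hands back one byte at a time, and that concatenating those bytes reproduces the data unchanged; the catch is only that a single byte is not necessarily a whole character.

```php
<?php
// Illustrative sample: "cafe" with an accented e, encoded as bytes 0xC3 0xA9.
$text = "caf\xC3\xA9";

// Put the sample into an in-memory stream so the sketch needs no file.
$stream = fopen('php://memory', 'w+b');
fwrite($stream, $text);
rewind($stream);

// fgetc() returns one byte per call, so the accented letter arrives in two pieces.
$pieces = [];
while (($byte = fgetc($stream)) !== false) {
    $pieces[] = $byte;
}
fclose($stream);

// Joining the bytes back together reproduces the original string unchanged.
$rebuilt = implode('', $pieces);

// With the /u modifier PCRE treats the subject as UTF-8 and yields whole characters.
$chars = preg_split('//u', $rebuilt, -1, PREG_SPLIT_NO_EMPTY);

echo count($pieces), " bytes, ", count($chars), " characters\n"; // 5 bytes, 4 characters
```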
Jul 17 '05 #2
"Chung Leong" wrote:
"steve" <Us************@dbForumz.com> wrote in message
news:411ab511 _7@news.athenanews.com...
> > Hi,
> > I am opening a stream that is UTF-8 encoded. I use fgetc to read the
> > stream, which is binary safe. I add every character read to a string.
> > But when I look at the stream, I see some characters with a bunch of
> > "?" question marks, and then utf8_decode has no effect on it
> > either.
>
> Question marks mean that there are Unicode characters that aren't found
> in the current code page. Basically the characters are there; they're
> just displayed as ?s.
>
> utf8_decode() does have an effect: it replaces characters outside of
> ISO-8859-1 with question marks.
>
> > How do you go about decoding UTF-8? Does adding the characters to the
> > string somehow mess it up? Please help. Running PHP 4.3.4 on Windows.
>
> The question is, what do you mean by decoding UTF-8? Using fgetc on UTF-8 text
> is not a good idea, since one Unicode character can span multiple bytes.

Thanks, Chung. I am interested in decoding usenet message headers that
look like this:
"=?Utf-8?B?YmVsZGVyYXo=?="

Jul 17 '05 #3
"steve" wrote:
> Thanks, Chung. I am interested in decoding usenet message headers that
> look like this:
> "=?Utf-8?B?YmVsZGVyYXo=?="

Ok, figured it out. Take a string like this:
$instr = "=?Utf-8?B?YmVsZGVyYXo=?=";

and feed it as the argument to this function:
function decode_subject( $instr ) {
    $enstr = $instr;
    // Decode one encoded word (=?charset?B|Q?text?=) per pass until none remain.
    while( preg_match(
        '/^([^?]+)?=\?[^?]+\?(B|Q)\?([^?]+)=?=?\?=(.+)?$/i', $enstr,
        $match ) ) {
        // Text following the encoded word, if there is any.
        $rest = isset($match[4]) ? $match[4] : '';
        if( $match[2] == 'b' || $match[2] == 'B' )
            $enstr = $match[1] . base64_decode( $match[3] ) . $rest;
        else
            $enstr = $match[1] . quoted_printable_decode( $match[3] ) . $rest;
    }
    return( $enstr );
}

and it will return the decoded text ("belderaz" for this example).

The function is included in: PHP Newsreader
http://pnews.sourceforge.net/
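For comparison, here is a sketch of the same idea written with preg_replace_callback(), so that encoded words anywhere in a header are handled in one pass. This is not the function from PHP Newsreader: decode_encoded_words() is a hypothetical name, the closure syntax assumes PHP 5.3 or later rather than the 4.3.4 mentioned above, and the second sample header is made up. It also only undoes the B or Q transfer encoding as described in RFC 2047; the result stays in whatever charset the header declared.

```php
<?php
// decode_encoded_words() is a hypothetical helper, not the function from the post.
function decode_encoded_words($header)
{
    $decoded = preg_replace_callback(
        // =?charset?B-or-Q?encoded-text?=
        '/=\?([^?\s]+)\?([BQ])\?([^?\s]*)\?=/i',
        function ($m) {
            if (strtoupper($m[2]) === 'B') {
                $bytes = base64_decode($m[3], true);
                // Leave the encoded word untouched if it is not valid base64.
                return $bytes === false ? $m[0] : $bytes;
            }
            // In the "Q" encoding an underscore stands for a space.
            return quoted_printable_decode(str_replace('_', ' ', $m[3]));
        },
        $header
    );
    // preg_replace_callback() returns null on a PCRE error; fall back to the input.
    return $decoded === null ? $header : $decoded;
}

echo decode_encoded_words('=?Utf-8?B?YmVsZGVyYXo=?='), "\n";           // belderaz
echo decode_encoded_words('Re: =?ISO-8859-1?Q?a_b=3Dc?= (fwd)'), "\n"; // Re: a b=c (fwd)
```

If the decoded bytes need to end up in a particular charset, a conversion step would still have to follow; that is outside this sketch.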

Jul 17 '05 #4
