UTF-8 not decoding

Hi,
I am opening a stream that is UTF-8 encoded. I use fgetc to read the
stream, which is binary safe. I add every character read to a string.
But when I look at the stream, I see some characters shown as a bunch of
"?" question marks, and utf8_decode has no effect on it
either.

How do you go about decoding UTF-8? Does adding the characters to the
string somehow mess it up? Please help. Running PHP 4.3.4 on Windows.

--
http://www.dbForumz.com/ This article was posted by author's request
Articles individually checked for conformance to usenet standards
Topic URL: http://www.dbForumz.com/PHP-UTF-deco...ict138860.html
Visit Topic URL to contact author (reg. req'd). Report abuse: http://www.dbForumz.com/eform.php?p=464220
Jul 17 '05 #1
3 Replies


"steve" <Us************@dbForumz.com> wrote in message
news:41**********@news.athenanews.com...
> Hi,
> I am opening a stream that is UTF encoded. I use fgetc to read the
> stream- which is binary safe. I add every character read to a string.
> But when I look at the stream, I see some characters with a bunch of
> "?" question marks, and then utf8_decode has no effect on it
> either.

Question marks mean that there are Unicode characters that aren't found
within the current codepage. Basically the characters are there, they're
just displayed as ?s.

utf8_decode() does have an effect: it converts UTF-8 to ISO-8859-1 and
replaces characters outside of ISO-8859-1 with question marks.

> How do you go about decoding utf. Does adding the characters to the
> string somehow mess it up. Please help. Running 4.3.4 PHP on Win.

The question is, what do you mean by decoding UTF-8? Using fgetc on UTF-8 text
is not a good idea, since fgetc returns one byte at a time and one Unicode
character can span multiple bytes.
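
To illustrate the multi-byte point, here is a minimal sketch. It assumes a PHP version that provides the php://memory stream wrapper, which stands in for the poster's real stream; the variable names are made up for the example. It shows that fgetc hands back single bytes, so a three-byte character takes three reads, and that appending those bytes to a string does not damage them.

```php
<?php
// "\xE2\x82\xAC" is the UTF-8 encoding of the euro sign: one character, three bytes.
$original = "5 \xE2\x82\xAC";

// php://memory is an in-memory stand-in for the real stream (assumption for this sketch).
$stream = fopen('php://memory', 'w+');
fwrite($stream, $original);
rewind($stream);

// fgetc() returns one byte per call, or false at the end of the stream.
$copy = '';
$reads = 0;
while (($byte = fgetc($stream)) !== false) {
    $copy .= $byte;
    $reads++;
}
fclose($stream);

var_dump($copy === $original); // bool(true): the bytes survive concatenation
echo $reads, "\n";             // 5: two ASCII bytes plus three bytes for the euro sign
```

So the concatenation itself is harmless as long as the whole string is treated as UTF-8 afterwards; trouble only starts when a single byte from the middle of a character is handled as if it were a complete character.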
Jul 17 '05 #2

Thanks, Chung. I am interested in decoding usenet message headers that
look like this:
"=?Utf-8?B?YmVsZGVyYXo=?="

Jul 17 '05 #3

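Ok, figured it out. Take a string like this:
$instr = "=?Utf-8?B?YmVsZGVyYXo=?=";

and feed it as argument to this function:
function decode_subject( $instr ) {
    $enstr = $instr;
    // Keep replacing the first encoded word ("=?charset?B|Q?data?=")
    // with its decoded bytes until none are left.
    while( preg_match(
        '/^([^?]+)?=\?[^?]+\?(B|Q)\?([^?]+)=?=?\?=(.+)?$/i', $enstr,
        $match ) ) {
        // Text after the encoded word; the group is unset when nothing follows.
        $rest = isset($match[4]) ? $match[4] : '';
        if( $match[2] == 'b' || $match[2] == 'B' )
            $enstr = $match[1] . base64_decode( $match[3] ) . $rest;
        else
            // Q encoding writes spaces as underscores.
            $enstr = $match[1] . quoted_printable_decode(
                str_replace( '_', ' ', $match[3] ) ) . $rest;
    }
    return( $enstr );
}

and it will return the decoded text ("belderaz" for the string above). The
result is in whatever charset the header names (UTF-8 here), which is not
necessarily plain ASCII.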

The function is included in: PHP Newsreader
http://pnews.sourceforge.net/
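
For comparison, here is a rough sketch of the same idea using preg_replace_callback, so that each encoded word of the form "=?charset?B|Q?text?=" is decoded in one pass. This is not the pnews code: decode_encoded_words() is a hypothetical helper name, the sketch assumes PHP 7.0 or later (type declarations and closures, so it will not run on 4.3.4), and it only decodes the bytes without converting between charsets. Where the iconv or mbstring extension is available, iconv_mime_decode() or mb_decode_mimeheader() cover the same ground and also handle the charset conversion.

```php
<?php
// Sketch only: decode_encoded_words() is a hypothetical name, not a PHP built-in.
function decode_encoded_words(string $header): string
{
    // Whitespace that separates two adjacent encoded words is not part of the text.
    $header = preg_replace('/(\?=)\s+(=\?)/', '$1$2', $header);

    return preg_replace_callback(
        '/=\?([^?\s]+)\?([BQ])\?([^?\s]*)\?=/i',
        function (array $m): string {
            if (strtoupper($m[2]) === 'B') {
                $bytes = base64_decode($m[3], true);
                // Leave the word untouched if the payload is not valid base64.
                return $bytes === false ? $m[0] : $bytes;
            }
            // Q encoding: "_" stands for a space, "=XX" is a hex-encoded byte.
            return quoted_printable_decode(str_replace('_', ' ', $m[3]));
        },
        $header
    );
}

echo decode_encoded_words('=?Utf-8?B?YmVsZGVyYXo=?='), "\n";          // belderaz
echo decode_encoded_words('Re: =?UTF-8?Q?caf=C3=A9_menu?='), "\n";   // Re: café menu
```

The callback form avoids the loop and the anchored pattern, so plain text before, between, or after encoded words is left alone even if it contains a question mark.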

Jul 17 '05 #4
