UTF-8 not decoding

Hi,
I am opening a stream that is UTF-8 encoded. I use fgetc to read the
stream, which is binary safe, and append every character read to a string.
But when I look at the stream, some characters show up as a bunch of
"?" question marks, and utf8_decode has no effect on them
either.

How do you go about decoding UTF-8? Does adding the characters to the
string somehow mess it up? Please help. I am running PHP 4.3.4 on Windows.

--
http://www.dbForumz.com/ This article was posted by author's request
Articles individually checked for conformance to usenet standards
Topic URL: http://www.dbForumz.com/PHP-UTF-deco...ict138860.html
Visit Topic URL to contact author (reg. req'd). Report abuse: http://www.dbForumz.com/eform.php?p=464220
Jul 17 '05 #1
"steve" <Us************@dbForumz.com> wrote in message
news:41**********@news.athenanews.com...
> Hi,
> I am opening a stream that is UTF encoded. I use fgetc to read the
> stream- which is binary safe. I add every character read to a string.
> But when I look at the stream, I see some characters with a bunch of
> "?" question marks, and then utf8_decode has no effect on it
> either.

Question marks mean that there are Unicode characters that aren't found
in the current code page. Basically the characters are there; they're
just displayed as ?s.

utf8_decode() does have an effect: it replaces characters outside of
ISO-8859-1 with question marks.

> How do you go about decoding utf. Does adding the characters to the
> string somehow mess it up. Please help. Running 4.3.4 PHP on Win.

The question is, what do you mean by decoding UTF-8? Using fgetc on UTF-8 text
is not a good idea, since one Unicode character can span multiple bytes.
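
To make the bytes-versus-characters point concrete, here is a minimal sketch. It assumes a PHP version with the php://memory wrapper (5.1 or later, so not the 4.3.4 mentioned above), and the in-memory stream only stands in for the real one. It shows that fgetc() hands back one byte per call, and that appending those bytes to a string leaves them unchanged:

```php
<?php
// "h", "é" (two bytes in UTF-8), "llo": 5 characters but 6 bytes.
$text = "h\xC3\xA9llo";

// An in-memory stream stands in for the real network or file stream.
$stream = fopen('php://memory', 'w+b');
fwrite($stream, $text);
rewind($stream);

$buffer = '';
$reads  = 0;
while (($byte = fgetc($stream)) !== false) {
    $buffer .= $byte;   // appending raw bytes does not alter them
    $reads++;
}
fclose($stream);

var_dump($reads);             // int(6): one fgetc() call per byte, not per character
var_dump($buffer === $text);  // bool(true)
```

So the concatenation itself is harmless; the "?" characters come from displaying or converting the bytes in a character set that lacks those characters.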
Jul 17 '05 #2
Thanks, Chung. I am interested in decoding Usenet message headers that
look like this:
"=?Utf-8?B?YmVsZGVyYXo=?="

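For reference, a header of this shape is a MIME "encoded word" as defined in RFC 2047: =?charset?encoding?data?=, where the encoding is B (base64) or Q (a quoted-printable variant). A rough sketch of pulling the fields apart follows; the variable names are only illustrative, and the sample is written without any embedded whitespace:

```php
<?php
$word = '=?Utf-8?B?YmVsZGVyYXo=?=';

// Splitting on "?" gives: "=", charset, encoding, data, "=".
$parts    = explode('?', $word);
$charset  = $parts[1];
$encoding = strtoupper($parts[2]);
$data     = $parts[3];

// Only the B (base64) case is handled in this sketch.
$decoded = ($encoding === 'B') ? base64_decode($data) : null;

echo "charset:  $charset\n";   // Utf-8
echo "encoding: $encoding\n";  // B
echo "decoded:  $decoded\n";   // belderaz
```

The decoded bytes are in the charset named in the header, so no further UTF-8 "decoding" is needed unless the text has to be shown in a different character set.
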
Jul 17 '05 #3
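OK, figured it out. Take a string like this:

$instr = "=?Utf-8?B?YmVsZGVyYXo=?=";

and feed it as the argument to this function:

function decode_subject( $instr ) {
    $enstr = $instr;
    // Each pass decodes the first remaining =?charset?B|Q?data?= word.
    while( preg_match(
        '/^([^?]+)?=\?[^?]+\?(B|Q)\?([^?]+)=?=?\?=(.+)?$/i', $enstr,
        $match ) ) {
        // Text after the encoded word, if any.
        $rest = isset( $match[4] ) ? $match[4] : '';
        if( $match[2] == 'b' || $match[2] == 'B' )
            $enstr = $match[1] . base64_decode( $match[3] ) . $rest;
        else
            // Q encoding writes spaces as "_".
            $enstr = $match[1] .
                quoted_printable_decode( str_replace( '_', ' ', $match[3] ) ) . $rest;
    }
    return( $enstr );
}

and it will return the decoded text, in the charset named in the header
(plain ASCII for this example).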

The function is included in: PHP Newsreader
http://pnews.sourceforge.net/
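
For anyone on a newer PHP (5.3 or later, for the anonymous function), the same idea can be sketched with preg_replace_callback. This is an alternative written for illustration, not the code from PHP Newsreader, and decode_mime_words() is a made-up name. It decodes every encoded word in one pass and drops the whitespace that separates two adjacent encoded words:

```php
<?php
// Sketch of an RFC 2047 "encoded word" decoder. decode_mime_words() is a
// hypothetical helper, not part of PHP or of PHP Newsreader.
function decode_mime_words($header)
{
    // Whitespace between two adjacent encoded words is not part of the text.
    $header = preg_replace('/(\?=)\s+(=\?)/', '$1$2', $header);

    return preg_replace_callback(
        '/=\?([^?]+)\?([BQ])\?([^?]*)\?=/i',
        function ($m) {
            if (strtoupper($m[2]) === 'B') {
                return base64_decode($m[3]);
            }
            // In Q encoding "_" stands for a space.
            return quoted_printable_decode(str_replace('_', ' ', $m[3]));
        },
        $header
    );
}

echo decode_mime_words('=?Utf-8?B?YmVsZGVyYXo=?='), "\n";  // belderaz
echo decode_mime_words('Re: =?UTF-8?Q?caf=C3=A9_menu?= today'), "\n";
```

As with the function above, the result stays in whatever charset the header names; converting it to another charset would be a separate step.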

Jul 17 '05 #4
