UTF-8 encoding problem

Hi All,

I have a GUI which accepts a Unicode string and searches a given
set of XML files for that string.

Now, I have two XML files, both saved in UTF-8 format and containing
characters from different languages.

Both of them start with a UTF-8 BOM, but only the first file also
declares UTF-8 in the XML declaration at the top of the file.

Now, when I search for a character from another language in that
directory using a third-party GUI for desktop search, it shows that the
character exists in the first file (the one that also has the XML
declaration), but not in the second file (which has only the BOM).

Initially I thought the problem was mainly because UTF-8 supports
both multibyte and Unicode, but I could not find much on it.

Please help.

Regards,
Shreshth

Oct 18 '06 #1
sh*************@gmail.com wrote:
> Initially I thought the problem was mainly because UTF-8 supports
> both multibyte and Unicode, but I could not find much on it.

What does this have to do with C++ at all?
UTF-8 is a multibyte encoding of Unicode (which is effectively
a 32-bit character space), but I doubt that's your problem.
Your problem is that your document doesn't conform to the document
rules that the search program is using.
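
To make "multibyte encoding of Unicode" concrete, here is a minimal Python sketch. The sample characters and the helper name `utf8_lengths` are chosen purely for illustration; they are not taken from the poster's files.

```python
# Illustration only: each Unicode code point becomes 1 to 4 bytes in UTF-8.
# The sample characters are arbitrary (Latin, accented Latin, Devanagari, emoji).
samples = ["A", "\u00e9", "\u0928", "\U0001F600"]

def utf8_lengths(chars):
    """Return (code point, UTF-8 bytes) for each single-character string."""
    return [(ord(ch), ch.encode("utf-8")) for ch in chars]

for code_point, encoded in utf8_lengths(samples):
    # Print the code point, how many bytes UTF-8 needs, and the bytes in hex.
    print(f"U+{code_point:04X} -> {len(encoded)} byte(s): {encoded.hex()}")
```

The same code point always produces the same bytes, so "multibyte" and "Unicode" are not two competing modes of UTF-8.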
Oct 18 '06 #2
I know this has nothing to do with C++ in particular but where better
to ask such a question.

Anyways,
> Your problem is your document isn't conforming with the document
> rules that the search program is using.

I am not able to understand what you are trying to say by this.
Of course I cannot do anything about the search program (which is for
sure using Unicode).

But the question is: if both files are in UTF-8 format, why does the
search program work only for the one that also has UTF-8 in its XML
declaration?
Does it really make any difference in this regard?

Thanks for your reply.

Shreshth
Oct 18 '06 #3
sh*************@gmail.com wrote:
> Both of them start with a UTF-8 BOM, but only the first file also
> declares UTF-8 in the XML declaration at the top of the file.

BOMs are quite useless for UTF-8; they are entirely optional.
And according to the XML spec (AFAIK), the default encoding when no
encoding is declared is UTF-8.
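
For anyone who wants to confirm what each file actually contains, the following is a rough Python sketch that reports whether raw XML bytes start with a UTF-8 BOM and which encoding, if any, the XML declaration names. The function name `describe_xml_encoding` and the two sample documents are made up for illustration, and the check assumes an ASCII-compatible encoding such as UTF-8. It says nothing about how the search tool itself decides.

```python
import codecs
import re

# Matches an encoding pseudo-attribute inside a leading XML declaration.
# Assumes the bytes are ASCII-compatible (true for UTF-8, not for UTF-16).
_DECL = re.compile(
    rb'<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)

def describe_xml_encoding(data: bytes):
    """Return (has_utf8_bom, declared_encoding_or_None) for raw XML bytes."""
    has_bom = data.startswith(codecs.BOM_UTF8)
    body = data[len(codecs.BOM_UTF8):] if has_bom else data
    match = _DECL.match(body)  # the declaration must come first in the file
    declared = match.group(1).decode("ascii") if match else None
    return has_bom, declared

# Hypothetical stand-ins for the two files described in the question.
with_decl = codecs.BOM_UTF8 + b'<?xml version="1.0" encoding="UTF-8"?><a/>'
without_decl = codecs.BOM_UTF8 + b'<?xml version="1.0"?><a/>'

print(describe_xml_encoding(with_decl))     # (True, 'UTF-8')
print(describe_xml_encoding(without_decl))  # (True, None)
```

Both sample documents are valid UTF-8 XML, so a tool that only handles the first one is presumably keying on the declaration rather than on the bytes.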

> Now, when I search for a character from another language in that
> directory using a third-party GUI for desktop search, it shows that the
> character exists in the first file (the one that also has the XML
> declaration), but not in the second file (which has only the BOM).

OK, so you have a problem with your broken third-party application.
How is that related to C++?

> Initially I thought the problem was mainly because UTF-8 supports
> both multibyte and Unicode, but I could not find much on it.

Like most of your message, what you say just doesn't make much sense.

> Please help.

Getting a basic understanding of what Unicode and its encoding formats
are would surely help.

Oct 18 '06 #4
Ron Natalie wrote:
> Unicode (which effectively
> is a 32 bit character space)

Unicode only reserves 2^20 + 2^16 code points (17 planes of 65,536
each, U+0000 through U+10FFFF).
21 bits is more than enough to store that.
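
The arithmetic can be checked with a few lines of Python. The constant names here are illustrative only.

```python
import sys

PLANES = 17                      # plane 0 (BMP) plus 16 supplementary planes
CODE_POINTS = PLANES * 0x10000   # 65,536 code points per plane
MAX_CODE_POINT = 0x10FFFF        # highest code point in the Unicode code space

print(CODE_POINTS)                        # 1114112
print(CODE_POINTS == 2**20 + 2**16)       # True
print(MAX_CODE_POINT.bit_length())        # 21
print(sys.maxunicode == MAX_CODE_POINT)   # True on Python 3.3 and later
```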
Oct 18 '06 #5
sh*************@gmail.com wrote:
> I know this has nothing to do with C++ in particular but where better
> to ask such a question.

The statement above is the best I have seen in a long time here.

If you know your question has "nothing to do with C++ in particular",
then why do you ask in a newsgroup dedicated to the C++ language? That
is like asking for help with your car in a bicycle shop.

You will probably get a much better response if you ask in a forum
dedicated to your problem.

Sincerely,

Peter Jansson
http://www.p-jansson.com/
http://www.jansson.net/
Oct 18 '06 #6


Check your third-party search tool's documentation to see how it
searches XML files.
Oct 19 '06 #7


