XML and "á", "é" character problems

Hello:
I have a problem with encoding. I get information from a web page, and
sometimes it contains "strange" symbols like "á" and "é", which arrive
encoded as "&aacute;", "&eacute;" and similar.
The problem is that I use this information in an XmlDocument object. I load the
string that contains all the information into this object with the method
.LoadXml(string), and this raises an exception. I think it's because it has a
problem with the strange symbols.
I want to know if there is a way to avoid it. I don't mind changing an "á"
to an "a", but I don't want to lose the "a" character.
What can I do? I'm in a hurry; I need to deliver this software next Monday.
Thanks for the information.
Bye.

Nov 12 '05 #1
Javier wrote:
I have a problem with encoding. I get information from a web page, and
sometimes it contains "strange" symbols like "á" and "é", which arrive
encoded as "&aacute;", "&eacute;" and similar.
The problem is that I use this information in an XmlDocument object. I load the
string that contains all the information into this object with the method
.LoadXml(string), and this raises an exception. I think it's because it has a
problem with the strange symbols.


Why guess? Show us which exception you get.
--
Oleg Tkachenko [XML MVP, MCAD]
http://www.xmllab.net
http://blog.tkachenko.com
Nov 12 '05 #2
'System.Xml.XmlException' exception in system.xml.dll
Additional information: System error.

I looked at MSDN, and this is the only type of exception that the LoadXml method
can throw. It happens with strings like the following:
"Aplicacion del Razonamiento Semicualitativo al Modelado y Analisis de
Sistemas Econ&oacute;micos.";
but if I remove the entity and change the string to this:
"Aplicacion del Razonamiento Semicualitativo al Modelado y Analisis de
Sistemas Economicos.";
it works. The problem is that this information is book titles, and I
have to save it without losing characters. I don't mind losing the written
accent, but not the character.

Thanks for the help.
Bye.

Nov 12 '05 #3
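
One workaround that nobody in the thread proposes, but that fits the requirement of keeping every character, is to rewrite the HTML named references as numeric character references before loading the string. Numeric references such as &#243; need no DTD. The sketch below is in Python only for illustration (it is not the .NET API); `named_to_numeric` and `XML_PREDEFINED` are hypothetical names, and the entity table comes from Python's standard `html.entities` module.

```python
import re
import xml.etree.ElementTree as ET
from html.entities import name2codepoint

# The five entities every XML parser already knows; leave them alone.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

# Matches references such as &oacute; or &frac12;
_ENTITY_REF = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def named_to_numeric(text):
    """Replace HTML named entity references with numeric character references."""
    def repl(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)  # predefined or unknown name: keep as written
        return "&#%d;" % name2codepoint[name]
    return _ENTITY_REF.sub(repl, text)


title = "<title>Sistemas Econ&oacute;micos.</title>"
fixed = named_to_numeric(title)
print(fixed)                      # the reference is now numeric
print(ET.fromstring(fixed).text)  # parses without any DTD
```

The accent survives this conversion, so nothing has to be dropped. Names that are not HTML entities are left untouched and would still make the parser fail.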


Javier wrote:

I have a problem with encoding. I get information from a web page, and
sometimes it contains "strange" symbols like "á" and "é", which arrive
encoded as "&aacute;", "&eacute;" and similar.


If your XML uses entity references, then the referenced entities need to
be declared in the DTD (unless they are the predefined entities lt, gt,
amp, apos, and quot).
So you need to make sure that you have, e.g.,
<?xml version="1.0"?>
<!DOCTYPE root [
<!ENTITY aacute "&#225;">
]>
<root>Some text here with a reference: &aacute;</root>


--

Martin Honnen --- MVP XML
http://JavaScript.FAQTs.com/
Nov 12 '05 #4
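
The effect of the declaration described in the previous post can be checked with any XML parser. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`, chosen only for illustration (the thread itself concerns .NET's XmlDocument); `try_parse` is a hypothetical helper. It loads the same text with and without the internal DTD subset.

```python
import xml.etree.ElementTree as ET

# Same content twice: once without a declaration for aacute, once with it.
undeclared = "<root>Some text here with a reference: &aacute;</root>"

declared = """<?xml version="1.0"?>
<!DOCTYPE root [
<!ENTITY aacute "&#225;">
]>
<root>Some text here with a reference: &aacute;</root>"""


def try_parse(xml_text):
    """Return the root element's text, or a short message if parsing fails."""
    try:
        return ET.fromstring(xml_text).text
    except ET.ParseError as exc:
        return "parse error: %s" % exc


print(try_parse(undeclared))  # fails: the entity is not declared
print(try_parse(declared))    # succeeds: the reference expands to the character
```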
Hello:

I have a problem with your solution. The information in the XML files comes
from web pages, and I don't know in advance which entities will appear when I
fetch the information automatically. What I do know is that they are all
HTML entities, so a DTD that declares all the HTML entities would solve the
problem. But I need a ready-made one; I am not going to write a DTD for every
entity that can appear in an HTML document. Does one exist?
Thanks for your help.
Bye.

Nov 12 '05 #5
Javier wrote:
I have a problem with your solution. The information in the XML files comes
from web pages, and I don't know in advance which entities will appear when I
fetch the information automatically. What I do know is that they are all
HTML entities; perhaps if I could find a DTD for all these
HTML entities,


Yep, if you just take a look at the HTML spec you can find them:
http://www.w3.org/TR/html401/sgml/entities.html

--
Oleg Tkachenko [XML MVP, MCAD]
http://www.xmllab.net
http://blog.tkachenko.com
Nov 12 '05 #6


Javier wrote:

What I do know is that they are all
HTML entities, so a DTD that declares all the HTML entities would solve the
problem. But I need a ready-made one; I am not going to write a DTD for every
entity that can appear in an HTML document. Does one exist?


Check the links in the XHTML 1.0 specification here:
<http://www.w3.org/TR/xhtml1/#h-A2>
All the entities defined in HTML 4.01 and XHTML 1.0 are declared there.

--

Martin Honnen --- MVP XML
http://JavaScript.FAQTs.com/
Nov 12 '05 #7
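
If copying the declarations from the specification by hand is too tedious, they can also be generated. The sketch below assumes Python and its standard `html.entities.name2codepoint` table, which covers the HTML 4.01 named entities; `html_entity_subset` and `wrap_with_doctype` are hypothetical helper names, not part of any library. It builds an internal DTD subset and prepends it to a document.

```python
import xml.etree.ElementTree as ET
from html.entities import name2codepoint

# Already predefined in XML, so they are not redeclared here.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}


def html_entity_subset():
    """Return one <!ENTITY ...> declaration per HTML 4.01 entity, one per line."""
    lines = [
        '<!ENTITY %s "&#%d;">' % (name, codepoint)
        for name, codepoint in sorted(name2codepoint.items())
        if name not in XML_PREDEFINED
    ]
    return "\n".join(lines)


def wrap_with_doctype(root_name, body_xml):
    """Prepend a DOCTYPE carrying the declarations.

    body_xml must not start with an XML declaration, because the
    declaration would then no longer be at the start of the document.
    """
    return "<!DOCTYPE %s [\n%s\n]>\n%s" % (root_name, html_entity_subset(), body_xml)


doc = wrap_with_doctype("root", "<root>Sistemas Econ&oacute;micos.</root>")
print(ET.fromstring(doc).text)
```

Generating the subset once and saving it to a file gives the same result as copying the three entity files linked from the XHTML 1.0 specification.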
Hello:
I have solved it. I added the entities found in the links you gave me to
my XML file, and that solves the problem.
It was really hard, because I know almost nothing about XML, but now it
works.
Thank you all for your help.
Bye.
Nov 12 '05 #8
Note that you can copy this file to your hard drive:

http://www.w3.org/TR/xhtml1/DTD/xhtml-lat1.ent

Then you can include it in your XML document with an external parameter
entity:

<!DOCTYPE root [
<!ENTITY % isolat1 SYSTEM "xhtml-lat1.ent">
%isolat1;
]>
<root>
...
</root>

This way you can share all these definitions across your XML documents.

Nov 12 '05 #9
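
Whether the external file is actually fetched depends on the parser and how it is configured. As a rough illustration of the include mechanism from the last post, here is a sketch using Python's standard `xml.parsers.expat` rather than .NET. `parse_text`, `fake_loader`, and `ENT_FILE_TEXT` are hypothetical names; the two declarations stand in for a local copy of xhtml-lat1.ent, and a real loader would read the file from disk instead.

```python
import xml.parsers.expat as expat

# Stand-in for the contents of a local xhtml-lat1.ent (two sample declarations only).
ENT_FILE_TEXT = '<!ENTITY aacute "&#225;">\n<!ENTITY oacute "&#243;">\n'

DOCUMENT = """<!DOCTYPE root [
<!ENTITY % isolat1 SYSTEM "xhtml-lat1.ent">
%isolat1;
]>
<root>Sistemas Econ&oacute;micos.</root>"""


def parse_text(xml_text, load_entity_file):
    """Parse xml_text and return its character data, resolving external
    parameter entities through load_entity_file(system_id)."""
    chunks = []
    parser = expat.ParserCreate()
    # Ask expat to process parameter entities such as %isolat1;
    parser.SetParamEntityParsing(expat.XML_PARAM_ENTITY_PARSING_ALWAYS)

    def on_external_entity(context, base, system_id, public_id):
        # Parse the referenced file with a sub-parser that shares the DTD.
        sub_parser = parser.ExternalEntityParserCreate(context)
        sub_parser.Parse(load_entity_file(system_id), True)
        return 1  # non-zero tells expat the entity was handled

    parser.ExternalEntityRefHandler = on_external_entity
    parser.CharacterDataHandler = chunks.append
    parser.Parse(xml_text, True)
    return "".join(chunks)


def fake_loader(system_id):
    # Hypothetical loader: returns the in-memory stand-in for the file.
    return ENT_FILE_TEXT


print(parse_text(DOCUMENT, fake_loader))
```

Note that xhtml-lat1.ent covers only the Latin-1 entities; the XHTML 1.0 DTDs also provide xhtml-symbol.ent and xhtml-special.ent for the rest, and they can be included the same way.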
