Bytes | Software Development & Data Engineering Community

DOM: accented characters

Using this xml file:

// test.xml
<?xml version="1.0"?>
<test>é</test>

(that's a &eacute;)

and this script:

$doc = DOMDocument::load("test.xml");

outputs

Warning: DOMDocument::load() [function.load]: Input is not proper
UTF-8, indicate encoding ! in file:///test.xml, line: 2 in E:\test.php
on line 3

Warning: DOMDocument::load() [function.load]: Bytes: 0xE9 0x3C 0x2F
0x74 in file:///test.xml, line: 2 in E:\test.php on line 3

I tried using htmlentities(), so the xml file looks like

// test.xml
<?xml version="1.0"?>
<test>&eacute;</test>

The script does not change, the output is

Warning: DOMDocument::load() [function.load]: Entity 'eacute' not
defined in file:///test.xml, line: 2 in E:\test.php on line 3

I tried fiddling with the encoding, but it was more of a guess than
anything. Can anyone solve this problem?

Thank you,
Jonathan

Aug 26 '05 #1
On 2005-08-26, Jonathan Mcdougall <jo************ ***@gmail.com> wrote:
> Using this xml file:
>
> // test.xml
> <?xml version="1.0"?>
> <test>é</test>
>
> (that's a &eacute;)
>
> and this script:
>
> $doc = DOMDocument::load("test.xml");
>
> Warning: DOMDocument::load() [function.load]: Bytes: 0xE9 0x3C 0x2F
> 0x74 in file:///test.xml, line: 2 in E:\test.php on line 3

0xE9 is the ISO-8859-1 encoding for &eacute;, and 0x3C for &lt;.

> <test>&eacute;</test>

Might want to try <test>&#233;</test>
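
A minimal sketch of that suggestion, for anyone trying it today. It uses
the numeric character reference &#233; (the code point of é) and parses
an inline string with loadXML() instead of reading test.xml; the variable
names are only for illustration. Note that the static call used in the
original post no longer works in PHP 8, so an instance is created first.

```php
<?php
// &#233; is plain ASCII in the file, so no encoding declaration is needed.
$xml = '<?xml version="1.0"?><test>&#233;</test>';

// PHP 8 requires an instance; DOMDocument::load() can no longer be called statically.
$doc = new DOMDocument();
$ok = $doc->loadXML($xml);

// DOM returns text as UTF-8, where é is the two bytes 0xC3 0xA9.
$text = $doc->documentElement->textContent;
echo bin2hex($text), "\n"; // c3a9
```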

--
Met vriendelijke groeten,
Tim Van Wassenhove <http://timvw.madoka.be>
Aug 27 '05 #2
Tim Van Wassenhove wrote:
> On 2005-08-26, Jonathan Mcdougall <jo************ ***@gmail.com> wrote:
>> Using this xml file:
>>
>> // test.xml
>> <?xml version="1.0"?>
>> <test>é</test>
>>
>> (that's a &eacute;)
>>
>> and this script:
>>
>> $doc = DOMDocument::load("test.xml");
>>
>> Warning: DOMDocument::load() [function.load]: Bytes: 0xE9 0x3C 0x2F
>> 0x74 in file:///test.xml, line: 2 in E:\test.php on line 3
>
> 0xE9 is the ISO-8859-1 encoding for &eacute;, and 0x3C for &lt;.
>
>> <test>&eacute;</test>
>
> Might want to try <test>&#233;</test>

Is there a PHP function which translates special characters to
numeric character codes?
htmlentities() only translates to named codes (&eacute;).

Jonathan

Aug 27 '05 #3
Jonathan Mcdougall (jo************ ***@gmail.com) wrote:
: Using this xml file:

: // test.xml
: <?xml version=3D"1.0" ?>
: <test>=E9</test>

I assume that the news posting software has encoded a byte as the string
"=E9".

: (that's a &eacute;)

In the correct 8-bit character set that byte could be interpreted that
way.

: and this script:

: $doc =3D DOMDocument::load("test.xml");

: outputs

: Warning: DOMDocument::load() [function.load]: Input is not proper
: UTF-8, indicate encoding ! in file:///test.xml, line: 2 in E:\test.php
: on line 3

When no encoding is declared, the xml parser assumes the data is encoded
in utf-8. In utf-8, the byte with the value 0xE9 would have to be the
start of a multi-byte sequence. By itself that byte is not a character
(in utf-8).

: Warning: DOMDocument::load() [function.load]: Bytes: 0xE9 0x3C 0x2F
: 0x74 in file:///test.xml, line: 2 in E:\test.php on line 3

In utf-8, 0xE9 announces a three-byte sequence whose next two bytes must
each fall in the range 0x80 to 0xBF. Here it is followed by 0x3C and 0x2F
("<" and "/"), so the sequence is not valid utf-8. The four bytes in the
warning are simply what the parser found at that position.
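
A quick way to check this for yourself, as a sketch only: it assumes the
mbstring extension is loaded, and the two sample strings are made up to
mirror the bytes in the warning. The first holds é as the single
ISO-8859-1 byte 0xE9, the second as the utf-8 pair 0xC3 0xA9.

```php
<?php
// The bytes from the warning: 0xE9 followed by "</t" (0x3C 0x2F 0x74).
$latin1 = "\xE9</t";

// The same text with é encoded as UTF-8: lead byte 0xC3, continuation byte 0xA9.
$utf8 = "\xC3\xA9</t";

var_dump(mb_check_encoding($latin1, 'UTF-8')); // bool(false)
var_dump(mb_check_encoding($utf8, 'UTF-8'));   // bool(true)
```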

: I tried using htmlentities(), so the xml file looks like

: // test.xml
: <?xml version=3D"1.0" ?>
: <test>&eacute;</test>

: The script does not change, the output is

: Warning: DOMDocument::load() [function.load]: Entity 'eacute' not
: defined in file:///test.xml, line: 2 in E:\test.php on line 3

No, xml does not know anything about entities, except the minimal set,
which I recall as being &amp; &lt; &gt; &quot; &apos;

Anything else must be declared (which has complications to use), or you
can use the numeric "entity" (strictly, a numeric character reference).
I.e. &#n; where "n" is a decimal number, or &#xh; where "h" is a
hexadecimal number.

: I tried fiddling with the encoding, but it was more of a guess than
: anything. Can anyone solve this problem?

I assume you could define the correct 8-bit character encoding in the
<?xml ...?> declaration, though that requires the parser to know that
character set.

Or use a numeric "entity" to encode the non-ascii characters.

Or encode using utf-8 (not sure which php function does that).
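
For what it's worth, the first and last of these options might look like
this in current PHP. This is a sketch under assumptions: the file
contents are simulated with an inline string, ISO-8859-1 is taken to be
the file's real encoding, and mb_convert_encoding() needs the mbstring
extension (iconv() would be an alternative).

```php
<?php
// Simulated file body: é stored as the single ISO-8859-1 byte 0xE9.
$body = "<test>\xE9</test>";

// Option 1: declare the encoding so the parser knows how to read 0xE9.
$declared = new DOMDocument();
$declared->loadXML('<?xml version="1.0" encoding="ISO-8859-1"?>' . $body);

// Option 3: convert the bytes to UTF-8 before parsing.
$converted = new DOMDocument();
$converted->loadXML(
    '<?xml version="1.0"?>' . mb_convert_encoding($body, 'UTF-8', 'ISO-8859-1')
);

// Either way, DOM hands the text back as UTF-8 (0xC3 0xA9).
echo bin2hex($declared->documentElement->textContent), "\n";  // c3a9
echo bin2hex($converted->documentElement->textContent), "\n"; // c3a9
```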

--

This programmer available for rent.
Aug 27 '05 #4
Jonathan Mcdougall wrote:
> Is there a php function which translates special characters to
> character codes?
> htmlentities() only translates to named codes (&eacute;).


You can use the translation table used by htmlentities() and
htmlspecialchars() as follows:

$table = get_html_translation_table(HTML_ENTITIES);
foreach (array_keys($table) as $k) {
    // ord() looks only at the first byte of $k, so this assumes the
    // table keys are single-byte (ISO-8859-1) characters
    $table[$k] = '&#' . ord($k) . ';';
}

And then apply it as the second argument to strtr():

$text = strtr($text, $table);

JW
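
If the text is UTF-8 rather than a single-byte encoding, a different
route is mb_encode_numericentity(). This is a swapped-in technique, not
the translation-table approach above, and it is only a sketch: it assumes
the mbstring extension is loaded, and the sample string and variable
names are made up.

```php
<?php
// Sample UTF-8 input: "café & crème".
$text = "caf\u{E9} & cr\u{E8}me";

// Conversion map: code points 0x80..0x10FFFF, offset 0, mask 0x1FFFFF.
// Everything outside ASCII becomes a decimal character reference.
$map = [0x80, 0x10FFFF, 0, 0x1FFFFF];
$ascii = mb_encode_numericentity($text, $map, 'UTF-8');

echo $ascii, "\n"; // caf&#233; & cr&#232;me
```

ASCII characters such as & and < are left untouched, so text that may
contain them still needs htmlspecialchars() before it goes into XML.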

Aug 28 '05 #5
Janwillem Borleffs wrote:
> Jonathan Mcdougall wrote:
>> Is there a php function which translates special characters to
>> character codes?
>> htmlentities() only translates to named codes (&eacute;).
>
> You can use the translation table used by htmlentities() and
> htmlspecialchars() as follows:
>
> $table = get_html_translation_table(HTML_ENTITIES);
> foreach (array_keys($table) as $k) {
>     $table[$k] = '&#' . ord($k) . ';';
> }
>
> And then apply it as the second argument to strtr():
>
> $text = strtr($text, $table);


Great, thank you, very good thing to know.

Finally, I decided to go back to xml_parser, which is less convenient, but
accepts htmlentities and allows "selective" loading of an xml document,
which fits my requirements.

Thanks to all,
Jonathan

Aug 28 '05 #6
