Escape to Unicode?

I've begun dealing with PHP's XML functions (puttup!)

I should say: PHP's DEFAULT XML functions, no extensions. Probably not
5.0. I don't care...

The POINT is, they choke on funny characters, even encoded funny
characters. You need to use the Unicode numeric reference (change &ntilde; to &#241;).

Whatevuh.

That's the why. Now ignore that part, because it will distract and
probably cause you to misconstrue the thrust of the question to follow:

Does PHP have a function that will escape all funny characters in a
string (encoded, unencoded, both, either...) to their Unicode
equivalents?

In a string. Ignore the XML parts of this question.

(I'm looking at pre-processing the data coming in from the forms that
will form the offending XML)

-Derik

Jun 14 '06 #1
2 Replies


ReGenesis0 wrote:
The POINT is, they choke on funny characters, even encoded funny
characters. You need to use the Unicode numeric reference (change &ntilde; to &#241;).

There is no such thing as &ntilde; in generic XML. &ntilde; is a purely
HTML concept. There are only five pre-defined entities which XML parsers
are expected to know:

&amp;
&lt;
&gt;
&quot;
&apos;

If PHP understood &ntilde; in generic XML it would be behaving
*incorrectly*. &ntilde; is undefined. (I'm assuming here that you've not
written a DTD that defines what &ntilde; means, which seems like a
reasonable assumption.)
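
A quick way to see the difference for yourself, as a minimal sketch only. It assumes PHP 7+ with the SimpleXML extension enabled, which is not the older xml_* parser functions the original post is about, and the <name> element is made up for illustration.

```php
<?php
// Sketch only: assumes PHP 7+ with SimpleXML (libxml) enabled.
// The <name> element is a made-up example, not taken from the thread.

// Collect parse errors quietly instead of emitting PHP warnings.
libxml_use_internal_errors(true);

// &ntilde; is an HTML entity. With no DTD declaring it, the XML is rejected.
$named = simplexml_load_string('<name>Espa&ntilde;a</name>');
var_dump($named === false);

// A numeric character reference needs no declaration at all.
$numeric = simplexml_load_string('<name>Espa&#241;a</name>');
var_dump($numeric !== false);

// The five predefined entities are always understood.
$predefined = simplexml_load_string('<name>&amp;&lt;&gt;&quot;&apos;</name>');
var_dump((string) $predefined);

// Discard the errors collected from the failed parse.
libxml_clear_errors();
```

The first parse should fail (simplexml_load_string() returns false), while the numeric reference and the five predefined entities parse without complaint.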
Does PHP have a function that will escape all funny characters in a
string (encoded, unencoded, both, either...) to their Unicode
equivalents?

It seems you want some function that converts:

&ntilde; => &#241;
&euro; => &#8364;

You might be able to do this using html_entity_decode() to get everything
in its raw form (e.g. it will convert &euro; to €) and then use a regular
expression to convert things into numeric character references (e.g. €
to &#8364;). Such a regular expression can be found in soapergem at gmail
dot com's 10 May 2006 comment here:
http://uk2.php.net/manual/en/function.htmlentities.php
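
For what it's worth, here is a rough sketch of that decode-then-convert idea. It is not the regular expression from the manual comment (which isn't reproduced here), just one way it might look. It assumes PHP 7+ and UTF-8 input. The function names to_numeric_refs() and utf8_codepoint() are made up for this example, and the htmlspecialchars() step is an added assumption so that a decoded & or < cannot break the XML later.

```php
<?php
// Hypothetical helper (not from the thread): returns the Unicode code point
// of a single UTF-8 encoded character, without needing mbstring.
function utf8_codepoint(string $ch): int
{
    $b = array_values(unpack('C*', $ch)); // the character's raw bytes
    switch (count($b)) {
        case 1:
            return $b[0];
        case 2:
            return (($b[0] & 0x1F) << 6) | ($b[1] & 0x3F);
        case 3:
            return (($b[0] & 0x0F) << 12) | (($b[1] & 0x3F) << 6) | ($b[2] & 0x3F);
        default:
            return (($b[0] & 0x07) << 18) | (($b[1] & 0x3F) << 12)
                 | (($b[2] & 0x3F) << 6) | ($b[3] & 0x3F);
    }
}

// Hypothetical helper: turns named HTML entities and raw non-ASCII
// characters into numeric character references. Assumes UTF-8 input.
function to_numeric_refs(string $s): string
{
    // Step 1: decode HTML entities (&ntilde;, &euro;, ...) to raw characters.
    $raw = html_entity_decode($s, ENT_QUOTES | ENT_HTML401, 'UTF-8');

    // Step 2 (added assumption): re-escape the characters XML treats as special.
    $safe = htmlspecialchars($raw, ENT_QUOTES | ENT_XML1, 'UTF-8');

    // Step 3: replace every non-ASCII character with a &#N; reference.
    return preg_replace_callback(
        '/[^\x00-\x7F]/u',
        function (array $m): string {
            return '&#' . utf8_codepoint($m[0]) . ';';
        },
        $safe
    );
}

echo to_numeric_refs('Espa&ntilde;a cuesta 5 &euro;'), "\n";
// prints: Espa&#241;a cuesta 5 &#8364;
```

Because step 1 decodes everything first, input that is already raw (ñ) and input that is entity-encoded (&ntilde;) end up as the same numeric reference.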

That said, you're better off correcting the root problem -- that &ntilde;
is not correct XML.

--
Toby A Inkster BSc (Hons) ARCS
Contact Me ~ http://tobyinkster.co.uk/contact

Jun 15 '06 #2

Toby Inkster wrote:
You might be able to do this using html_entity_decode() to get everything
in its raw form (e.g. it will convert &euro; to €) and then use a regular
expression to convert things into numeric character references (e.g. €
to &#8364;). Such a regular expression can be found in soapergem at gmail
dot com's 10 May 2006 comment here:
http://uk2.php.net/manual/en/function.htmlentities.php

That said, you're better off correcting the root problem -- that &ntilde;
is not correct XML.


...which is precisely what I intend to do. I want to convert such tags
as they come in as form inputs, before they're sent off to become XML files.

I'm not asking a question sideways of the problem and missing something
obvious, am I? I hate when that happens...

-Derik

Jun 15 '06 #3
