
Debugging charset problems with XSLT and PHP4/MySQL4

Hi!

We've been building a pretty big web app here for internal use. SMS
text messages come in from an aggregator and are stored in a MySQL 4
db. Our operators then deal with them using a web interface. The db is
queried using PHP4 and the results output as XML which is then
transformed using XSLT into XHTML.

Now, in our testing environment everything works just fine. However,
when we try and run it with actual live data, any incoming SMS message
that contains a non-ASCII character breaks the system at the Sablotron
stage (invalid token).

Now, the aggregating service is sending us the incoming messages UTF-8
encoded, and the XML and XSL are all set up to be UTF-8. However, somewhere
along the line something is getting corrupted, so Sablotron barfs
(typical examples are pound signs or euro signs).

I'm having a hard time debugging this because as far as I can tell
everything is set to be using UTF-8 by default. Clearly something isn't
(MySQL possibly). I'd really appreciate some pointers for things to
check.

TIA,

Darren

Jul 17 '05 #1
After spending hours googling and checking mailing list archives I can
see that many have come across this problem, but there is very little by
way of solutions. However, I think I have managed to isolate the cause:
the use of "echo" in various places. I did not know that echo's output
is always ASCII. I'd actually like to rewrite the various parts that
use "echo" in a totally different way, but in the meantime, what is a
multibyte equivalent of "echo"?

Jul 17 '05 #2
**sigh**

My last response was a complete red herring. I got that inaccurate
information from the last comment on this bug report:

http://bugs.php.net/bug.php?id=17792&edit=1

After going up that blind alley I did the sensible thing and tested for
myself whether echo would output UTF-8 by creating a UTF-8 PHP file
with

echo "<long list of random non-ascii unicode characters>";

in it and of course it worked fine (with default_charset = "utf-8" in
php.ini).
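
For anyone who wants to repeat that check without relying on their editor's encoding, here is a minimal sketch. It assumes a current PHP rather than PHP4, and the sample string is my own illustration, not the data from our system. It captures what echo writes and compares it byte for byte with the input:

```php
<?php
// Illustrative sample (an assumption, not live data): the UTF-8 bytes
// for a pound sign, a space, and a euro sign, written as escapes so the
// result does not depend on how this file is saved.
$sample = "\xC2\xA3 \xE2\x82\xAC";

// Capture whatever echo writes instead of sending it to the client.
ob_start();
echo $sample;
$echoed = ob_get_clean();

// echo passes bytes through untouched, so the two strings are identical.
$same = ($echoed === $sample);

// Hex view of the captured output, handy for eyeballing the bytes.
$hex = bin2hex($echoed);
```

Here `$same` is true and `$hex` is `c2a320e282ac`, i.e. the multibyte sequences survive echo intact.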

I believe the problem was to do with our MySQL tables using Latin1. We
seem to have some kind of workaround in place using
mb_convert_encoding($xml, "UTF-8", "Latin1") before sending the XML
through Sablotron. Frankly I'm still confused, but things are at least
working for now.
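
In case it helps someone else, a rough sketch of that kind of workaround follows. The helper name `ensure_utf8` and the sample strings are hypothetical, it assumes a current PHP with the mbstring extension rather than our PHP4 setup, and it is not our production code. It converts only when the input is not already valid UTF-8, so data that arrives correctly encoded is not converted twice:

```php
<?php
// Hypothetical helper (name and behaviour are assumptions for illustration).
// Returns the input unchanged if it is already valid UTF-8; otherwise
// treats it as ISO-8859-1 (Latin1) and converts it to UTF-8.
function ensure_utf8($xml)
{
    if (mb_check_encoding($xml, 'UTF-8')) {
        return $xml;
    }
    return mb_convert_encoding($xml, 'UTF-8', 'ISO-8859-1');
}

// A lone 0xA3 byte is a pound sign in Latin1 but invalid as UTF-8,
// which is the sort of input an XML parser rejects.
$latin1 = "<msg>\xA3" . "5</msg>";
$fixed  = ensure_utf8($latin1);

// Already-valid UTF-8 passes through untouched.
$utf8      = "<msg>\xC2\xA3" . "5</msg>";
$unchanged = ensure_utf8($utf8);
```

The validity check is only a heuristic: some Latin1 byte sequences also happen to be valid UTF-8 and would be left alone. Note too that the euro sign has no code point in ISO-8859-1, so this sketch covers the pound sign case only.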

Best, Darren

Jul 17 '05 #3
