
Debugging charset problems with XSLT and PHP4/MySQL4

Hi!

We've been building a pretty big web app here for internal use. SMS
text messages come in from an aggregator and are stored in a MySQL 4
db. Our operators then deal with them using a web interface. The db is
queried using PHP4 and the results output as XML which is then
transformed using XSLT into XHTML.

Now, in our testing environment everything works just fine. However,
when we try and run it with actual live data, any incoming SMS message
that contains a non-ASCII character breaks the system at the Sablotron
stage (invalid token).

The aggregating service sends us the incoming messages UTF-8 encoded,
and the XML and XSL are all set up to be UTF-8. However, somewhere
along the line something is getting corrupted so that Sablotron barfs
(typical examples are pound signs or euro signs).

I'm having a hard time debugging this because as far as I can tell
everything is set to be using UTF-8 by default. Clearly something isn't
(MySQL possibly). I'd really appreciate some pointers for things to
check.
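
One way to narrow this down is to look at the raw bytes at each stage (straight after the MySQL fetch, after the XML is built, just before Sablotron). The following is only a sketch: describe_bytes() is a hypothetical helper, not part of the app described above, and it assumes the mbstring extension with mb_check_encoding() available (PHP 4.4.3 or later). A pound sign should appear as the two bytes c2 a3 if it is UTF-8, or as the single byte a3 if it is Latin-1.

```php
<?php
// Hypothetical helper: report whether a string is valid UTF-8 and show its raw bytes.
function describe_bytes($label, $s)
{
    $valid = mb_check_encoding($s, 'UTF-8') ? 'valid UTF-8' : 'NOT valid UTF-8';
    return $label . ': ' . $valid . ' [' . bin2hex($s) . ']';
}

// A pound sign as UTF-8 (C2 A3) versus as a single Latin-1 byte (A3).
echo describe_bytes('utf8', "\xC2\xA3"), "\n";   // utf8: valid UTF-8 [c2a3]
echo describe_bytes('latin1', "\xA3"), "\n";     // latin1: NOT valid UTF-8 [a3]
```

The first stage at which a message stops being reported as valid UTF-8 is the place to look.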

TIA,

Darren

Jul 17 '05 #1
2 Replies


After spending hours googling and checking mailing list archives I can
see that many have come across this problem, but there is very little
by way of solutions. However, I think I have managed to isolate the
cause: the use of "echo" in various places. I did not know that echo's
output is always ASCII. I'd actually like to rewrite the various parts
that use "echo" in a totally different way, but in the meantime what is
a multibyte equivalent of "echo"?

Jul 17 '05 #2

**sigh**

My last response was a complete red herring. I got the inaccurate
information from the last comment on this bug report:

http://bugs.php.net/bug.php?id=17792&edit=1

After going up that blind alley I did the sensible thing and tested for
myself whether echo would output utf-8 by creating a utf-8 PHP file
with

echo "<long list of random non-ascii unicode characters>";

in it and of course it worked fine (with default_charset = "utf-8" in
php.ini).
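
A minimal sketch of that kind of test is below. It is not the original test file: the string is written with byte escapes so the result does not depend on how the file is saved, and the output is captured with output buffering so it can be compared with the input. It assumes no output handler is configured that rewrites the output.

```php
<?php
// UTF-8 bytes for a pound sign, a space and a euro sign.
$text = "\xC2\xA3 \xE2\x82\xAC";

// Capture what echo writes so it can be compared with the input.
ob_start();
echo $text;
$written = ob_get_clean();

echo bin2hex($written), "\n";                             // c2a320e282ac
echo ($written === $text) ? "unchanged\n" : "altered\n";  // unchanged
```

PHP strings are plain byte sequences, and echo writes those bytes out as they are, which is why the multibyte characters survive.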

I believe the problem was to do with our MySQL tables using Latin1. We
seem to have a workaround in place: calling
mb_convert_encoding($xml, "UTF-8", "Latin1") before sending the XML
through Sablotron. Frankly I'm still confused, but things are at least
working for now.
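
For anyone finding this later, here is a hedged sketch of a slightly more defensive version of that conversion. ensure_utf8() is a hypothetical name, and the 'Windows-1252' source encoding is an assumption: MySQL's latin1 is documented as cp1252, which has a euro sign at byte 0x80, whereas strict ISO-8859-1 has none. It requires mbstring with mb_check_encoding().

```php
<?php
// Hypothetical helper: return $xml as UTF-8, converting only when it is not
// already valid UTF-8. $assumedSource is a guess at what the database holds.
function ensure_utf8($xml, $assumedSource = 'Windows-1252')
{
    if (mb_check_encoding($xml, 'UTF-8')) {
        return $xml; // already valid UTF-8; converting again would double-encode it
    }
    return mb_convert_encoding($xml, 'UTF-8', $assumedSource);
}

$fromDb = "<msg>\xA3 10</msg>";        // pound sign stored as a single byte
echo bin2hex(ensure_utf8("\xA3")), "\n"; // c2a3
$xml = ensure_utf8($fromDb);           // now safe to hand to the XSLT processor
```

The validity check is only a heuristic: a cp1252 string whose bytes happen to form valid UTF-8 would be passed through unconverted. Storing and fetching the data as UTF-8 in the first place avoids the guesswork.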

Best, Darren

Jul 17 '05 #3
