On Thu, 06 Jul 2006 19:16:53 +0200, Stefan Behnel wrote:
Quote:
Quote:
>>
>Is there a correct way to handle text input from a <FORM> when the page is
>utf-8 and that input is going to be used in SQL statements? I've tried
>things like (with no success):
>sql = u"select * from blah where col='%s'" % input
>
What about " ... % unicode(input, "UTF-8")" ?
>
>
I guess it's similar. I've had partial success with input.decode('utf-8')
before DB usage, and then output.encode('utf-8') for output. But although
this stores and displays newly added utf-8 texts correctly, it
causes other problems when displaying the existing texts. I think
they're suffering from a double encoding issue. It seems rather
strange that the encode/decode appears to be required now, and not before.
Is this how it should be done?
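For anyone following along, here is a minimal sketch of what a double encoding looks like. It assumes the UTF-8 bytes get misread as latin1 somewhere between Python and the database, which is only a guess at what happens to the existing texts. The sample value is made up, and the snippet is written for Python 3, where str is text and bytes is the raw form (in Python 2 those are unicode and str).

```python
# Hypothetical sample value; any non-ASCII text shows the same effect.
original = u"caf\u00e9"

# Correct path: encode exactly once for storage or output.
stored_once = original.encode("utf-8")

# Faulty path: the UTF-8 bytes are mistaken for latin1 text and encoded again.
stored_twice = stored_once.decode("latin-1").encode("utf-8")

# Decoding the doubly encoded bytes once yields mojibake, not the original.
garbled = stored_twice.decode("utf-8")

# Undoing both layers in reverse order recovers the original text.
repaired = stored_twice.decode("utf-8").encode("latin-1").decode("utf-8")

print(repr(stored_once), repr(stored_twice), repr(garbled), repr(repaired))
```

If the existing rows turn out to look like `garbled` here, the fix belongs in the stored data or the connection settings rather than in more encode/decode calls in the page code.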
Quote:
>
You didn't tell us what database you are using, which encoding your
database uses, which Python-DB interface library you deploy, and lots of
other things that might be helpful to solve your problem.
That would be MySQLdb with latin1, but I've tried various methods to make
it utf-8 (lots of guidance online). But this was only after I discovered
the breakage with the newer Python, i.e. it has worked for years on both
machines and various Python versions. I omitted that info because I can
paste the SQL into mysql's shell and it does the expected thing with no
errors, so I assumed the DB itself isn't the cause. I guess it could
be a new MySQLdb issue causing breakage.
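One thing that may be worth checking is the character set of the connection itself. MySQLdb.connect() accepts charset and use_unicode keyword arguments; the sketch below only builds those arguments so it can run without a server. The helper name and the host, user, password and database values are placeholders, and whether this setting cures the existing latin1 data is not something established in this thread.

```python
def utf8_connect_kwargs(host, user, passwd, db):
    """Build keyword arguments for MySQLdb.connect() (hypothetical helper)."""
    return {
        "host": host,
        "user": user,
        "passwd": passwd,
        "db": db,
        "charset": "utf8",    # character set used on the client connection
        "use_unicode": True,  # ask for text columns back as unicode objects
    }

# Placeholder credentials, for illustration only.
kwargs = utf8_connect_kwargs("localhost", "webuser", "secret", "site")

# With MySQLdb installed and a server reachable, this would be used as:
#   import MySQLdb
#   conn = MySQLdb.connect(**kwargs)
print(kwargs["charset"], kwargs["use_unicode"])
```

The tables stay latin1 on disk either way; this only changes how the client and server agree to exchange text.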
I feel I can see part of the light, but if I'm close to what I think
is needed, it's not practical to change everything to handle encode/decode
site-wide, especially as some of the data gets moved to Oracle for other
applications (most is written in Perl).
I'm thinking I need to do this now. Is this the norm?
  get user input from web
  text.decode('utf-8')
  store or use as search in DB
  text.encode('utf-8')
  display page etc
The encode/decode stages have never been required before :-(
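A rough sketch of that flow, decoding incoming bytes once at the edge and encoding outgoing text once at the end, might look like the following. It uses the standard-library sqlite3 module as a stand-in so it runs anywhere; the handle_request function and the sample row are invented for illustration, and the table and column names are borrowed from the query quoted earlier. It is written for Python 3.

```python
import sqlite3

def handle_request(raw_form_value, conn):
    """Hypothetical handler: bytes in from the form, bytes out for the page."""
    # 1. Decode the bytes received from the UTF-8 page into text.
    text = raw_form_value.decode("utf-8")
    # 2. Pass the value as a bound parameter instead of formatting it into SQL.
    rows = conn.execute(
        "SELECT col FROM blah WHERE col = ?", (text,)
    ).fetchall()
    # 3. Encode back to UTF-8 only when writing the response.
    return [row[0].encode("utf-8") for row in rows]

# In-memory stand-in database with one made-up row.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE blah (col TEXT)")
conn.execute("INSERT INTO blah (col) VALUES (?)", (u"caf\u00e9",))

page_bytes = handle_request(b"caf\xc3\xa9", conn)
no_match = handle_request(b"x' OR '1'='1", conn)
print(page_bytes, no_match)
```

With MySQLdb the placeholder is %s rather than ?, as in cursor.execute("... where col = %s", (text,)). Binding the value this way also avoids the quoting problems that come with building the statement via the % operator.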