I have a mysql database with characters like   » in it. I'm
trying to write a python script to remove these, but I'm having a
really hard time.
These strings are coming out as type 'str' not 'unicode' so I tried to
just
record[4].replace('Â', '')
but this does nothing. However the following code works
#!/usr/bin/python
s = 'aaaaa  aaa'
print type(s)
print s
print s.find('Â')
This returns
<type 'str'>
aaaaa  aaa
6
The other odd thing is that the  character shows up as two spaces if
I print it to the terminal from mysql, but it shows up as  when I
print from the simple script above.
What am I doing wrong?
jdonnell wrote: I have a mysql database with characters like   » in it. I'm trying to write a python script to remove these, but I'm having a really hard time.
use the "hammer" recipe. i'm using it to create URL-friendly
fragments from latin-1 album titles:
<http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/251871>
(check the last comment, "a cleaner solution",
for a better implementation).
it basically hammers accented chars like à and Â down
to the nearest ASCII representation.
since you receive string data from mysql as str
objects, first convert them to unicode with:
u = unicode('Â', 'latin-1')
then feed u to the hammer function (the fix_unicode at the
end).
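As a rough illustration of the idea (this is not the cookbook recipe itself, which is not reproduced here), the sketch below decodes the bytes as latin-1 and then strips accents through Unicode decomposition. It is written for Python 3, where the decode step is `raw.decode('latin-1')` rather than the Python 2 `unicode(s, 'latin-1')` shown above. The function name `hammer_to_ascii` and the sample bytes are made up for this example.

```python
import unicodedata

def hammer_to_ascii(raw, encoding="latin-1"):
    """Decode raw bytes, then drop accents and any other non-ASCII characters.

    Hypothetical helper for illustration; not the fix_unicode function
    from the cookbook recipe.
    """
    text = raw.decode(encoding)
    # NFKD splits an accented letter into its base letter plus a combining mark.
    decomposed = unicodedata.normalize("NFKD", text)
    # Encoding to ASCII with 'ignore' discards the combining marks and
    # anything else that has no ASCII equivalent.
    return decomposed.encode("ascii", "ignore").decode("ascii")

# latin-1 bytes for "Âge d'or à Paris" (sample data, assumed for the example)
sample = b"\xc2ge d'or \xe0 Paris"
result = hammer_to_ascii(sample)
print(result)
```

Note that this only helps if the data really is latin-1; if the bytes are UTF-8, decoding them as latin-1 first produces the stray Â characters instead of removing them.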
HTH,
deelan
--
"Però è bello sapere che, di questi tempi spietati, almeno
un mistero sopravvive: l'età di Afef Jnifen." -- dagospia.com
>>s = 'aaaaa  aaa' What am I doing wrong?
First get rid of characters not allowed
in Python code.
Replace  with appropriate escape
sequence: /x## where ## is the
hexadecimal code of the ASCII
character.
Claudio
"Claudio Grondi" <cl************@freenet.de> wrote in message
news:3a*************@individual.net... s = 'aaaaa  aaa' What am I doing wrong?
First get rid of characters not allowed in Python code. Replace Â with appropriate escape sequence: /x## (should be \x##) where ## is the hexadecimal code of the ASCII character.
Claudio
i.e. probably instead of 'aaaaa  aaa'
'aaaaa \xC2 aaa'
In my character table (latin-1; plain ASCII stops at \x7F) 'Â' is '\xC2'
Claudio
aaaaa  aaa'
0123456
It's OK
And this runs OK for me:
s = 'aaaaa  aaa'
print s
print s.replace('Â', '')
In <11*********************@o13g2000cwo.googlegroups.com>, jdonnell wrote: I have a mysql database with characters like   » in it. I'm trying to write a python script to remove these, but I'm having a really hard time.
[...]
The other odd thing is that the  character shows up as two spaces if I print it to the terminal from mysql, but it shows up as  when I print from the simple script above. What am I doing wrong?
Is it possible that your DB stores strings UTF-8 encoded? The
byte sequence '\xc2\xa0', which displays as 'Â ' when read as latin-1, is
the UTF-8 encoding of a non-breaking space character.
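A quick way to check this is to decode the same two bytes both ways. This is a Python 3 sketch (the thread's own snippets are Python 2); the variable names are only for illustration.

```python
raw = b"\xc2\xa0"

# Read as UTF-8, the two bytes form a single character: U+00A0 NO-BREAK SPACE.
as_utf8 = raw.decode("utf-8")

# Read as latin-1, every byte is its own character: U+00C2 (Â) then U+00A0.
as_latin1 = raw.decode("latin-1")

print(len(as_utf8), repr(as_utf8))
print(len(as_latin1), repr(as_latin1))
```

The same bytes give one character under UTF-8 and two under latin-1, which would explain a Â appearing in front of something that looks like a space.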
Ciao,
Marc 'BlackJack' Rintsch
I had this problem recently. It turned out that something
had encoded a unicode string into utf-8. When I found
the culprit and fixed the underlying design issue, it went away.
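To make that failure mode concrete, here is a small Python 3 sketch of how UTF-8 bytes turn into stray Â characters when something later treats them as latin-1, and how reversing the wrong step recovers the text. The sample string is an assumption for the example, not data from my case.

```python
# Text containing a no-break space (U+00A0) and a right guillemet (U+00BB).
original = "extensions \u00a0\u00bb Good"

# Step 1: something encodes the text as UTF-8.
utf8_bytes = original.encode("utf-8")

# Step 2 (the mistake): the bytes are later interpreted as latin-1,
# so each UTF-8 lead byte 0xC2 shows up as its own character, Â.
mojibake = utf8_bytes.decode("latin-1")

# Repair: undo the wrong decode, then decode with the right codec.
repaired = mojibake.encode("latin-1").decode("utf-8")

print(repr(mojibake))
print(repr(repaired))
```

The repair only works when the damage is exactly this one wrong decode; fixing the place where the wrong encoding is applied is the more durable answer.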
John Roth
On Tue, 22 Mar 2005 20:09:55 -0600, "John Roth" <ne********@jhrothjr.com> wrote: I had this problem recently. It turned out that something had encoded a unicode string into utf-8. When I found the culprit and fixed the underlying design issue, it went away.
What encodings are involved?
This is from idle on windows, which seems to display latin-1 source ok:
----
>>> "Latin-1:»\n".decode('latin-1')
u'Latin-1:\xc2\xbb\n'
>>> "Latin-1:»\n".decode('latin-1').encode('cp437', 'replace')
'Latin-1:?\xaf\n'
>>> "Latin-1:»\n".decode('latin-1').encode('cp437', 'ignore')
'Latin-1:\xaf\n'
>>> u'Latin-1:\xc2\xbb\n'.encode('cp437','replace')
'Latin-1:?\xaf\n'
----
Now this is in an NT4 console window with code page 437:
----
>>> u'Latin-1:\xc2\xbb\n'.encode('cp437','replace')
'Latin-1:?\xaf\n'
>>> import sys
>>> sys.stdout.write(u'Latin-1:\xc2\xbb\n'.encode('cp437','replace'))
Latin-1:?»
----
Notice that the interactive output does a repr that creates the \xaf, but
the character is available and can be written non-repr'd via sys.stdout.write.
For the heck of it:
>>> sys.stdout.write(u'Latin-1:\xc2\xbb\n'.encode('cp437','xmlcharrefreplace'))
Latin-1:»
I don't know if this is going to get through to your screen ;-)
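For anyone trying this on Python 3, a rough equivalent of the three error handlers used above looks like the sketch below. The variable names are only illustrative; the codec and handler names are the standard ones.

```python
# U+00C2 (Â) has no cp437 equivalent; U+00BB (») maps to byte 0xAF.
text = "Latin-1:\u00c2\u00bb\n"

# 'replace' substitutes a question mark for the unencodable character.
replaced = text.encode("cp437", "replace")

# 'ignore' silently drops it.
ignored = text.encode("cp437", "ignore")

# 'xmlcharrefreplace' writes a numeric character reference instead.
xmlrefs = text.encode("cp437", "xmlcharrefreplace")

print(replaced)
print(ignored)
print(xmlrefs)
```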
Regards,
Bengt Richter
Thanks for all the replies. I just got in to work so I haven't tried
any of them yet. I see that I wasn't as clear as I should have been so
I'll clarify a little. I'm grabbing some data from msn's rss feed.
Here's an example. http://search.msn.com/results.aspx?q...=rss&FORM=ZZRE
The string ' all domain name extensions » Good' is where I have a
problem. The
' »' shows up as '  »' when I write it to a file or stick
it in mysql. I did a hex dump and this is what I see.
jay@localhost:~/scripts> cat test.txt
extensions » Good
jay@localhost:~/scripts> xxd test.txt
0000000: 6578 7465 6e73 696f 6e73 20c2 a020 c2a0 extensions .. ..
0000010: 20c2 bb20 476f 6f64 0a .. Good
One thing that jumps out is that two of the Â's are c2a0, but one of
them is c2bb. Well, those are the details since I wasn't clear before.
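For what it's worth, the bytes in that dump can be decoded directly to see what they are. The following is a Python 3 sketch under the assumption that the file is UTF-8, which the c2a0 and c2bb pairs suggest; the hex string is copied from the xxd output above, and the clean-up step at the end is just one possible choice.

```python
# Hex bytes copied from the xxd output, with the offsets and ASCII column removed.
dump = "6578 7465 6e73 696f 6e73 20c2 a020 c2a0 20c2 bb20 476f 6f64 0a"
raw = bytes.fromhex(dump.replace(" ", ""))

# Assumption: the data is UTF-8. Then c2 a0 is U+00A0 (no-break space)
# and c2 bb is U+00BB (right-pointing guillemet).
text = raw.decode("utf-8")

# One possible clean-up: turn no-break spaces into ordinary spaces,
# then collapse runs of whitespace into a single space.
cleaned = " ".join(text.replace("\u00a0", " ").split())

print(repr(text))
print(repr(cleaned))
```

Decoded this way there is no Â in the data at all: the two c2a0 pairs are no-break spaces and the c2bb pair is the » character, so the Â only appears when the bytes are displayed as latin-1.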