Bytes | Software Development & Data Engineering Community

problems with Â character

I have a mysql database with characters like Â Â Â» in it. I'm
trying to write a python script to remove these, but I'm having a
really hard time.

These strings are coming out as type 'str' not 'unicode' so I tried to
just

record[4].replace('Â', '')

but this does nothing. However the following code works

#!/usr/bin/python

s = 'aaaaa Â aaa'
print type(s)
print s
print s.find('Â')

This returns
<type 'str'>
aaaaa Â aaa
6

The other odd thing is that the Â character shows up as two spaces if
I print it to the terminal from mysql, but it shows up as Â when I
print from the simple script above.
What am I doing wrong?

Jul 18 '05 #1
jdonnell wrote:
I have a mysql database with characters like Â Â Â» in it. I'm
trying to write a python script to remove these, but I'm having a
really hard time.


use the "hammer" recipe. i'm using it to create URL-friendly
fragments from latin-1 album titles:

<http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/251871>
(check the last comment, "a cleaner solution",
for a better implementation).

it basically hammers accented chars like à and Â down
to the nearest ASCII representation.

since you receive string data from mysql as str
objects, first convert them to unicode with:

u = unicode('Â', 'latin-1')

then feed u to the hammer function (the fix_unicode at the
end).
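
For readers who cannot reach the recipe, here is a minimal sketch of the same "hammer" idea. It is not the recipe's fix_unicode; the name hammer_to_ascii is made up for illustration, and the code assumes Python 3, where str is already Unicode (so the unicode(...) step above is not needed).

```python
import unicodedata


def hammer_to_ascii(text):
    """Reduce accented characters to their closest ASCII letters.

    Hypothetical helper, not the recipe's fix_unicode. Characters with no
    ASCII decomposition (such as the guillemet U+00BB) are simply dropped.
    """
    # NFKD splits 'à' into 'a' plus a combining grave accent, and turns the
    # no-break space U+00A0 into an ordinary space.
    decomposed = unicodedata.normalize("NFKD", text)
    # Encoding to ASCII with 'ignore' throws away the combining marks.
    return decomposed.encode("ascii", "ignore").decode("ascii")


print(hammer_to_ascii("\u00e0 and \u00c2"))  # 'à and Â' hammered down
```

Note that this only gives sensible results if the bytes from the database were decoded with the right encoding first.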

HTH,
deelan

--
"But it's nice to know that, in these ruthless times, at least
one mystery survives: the age of Afef Jnifen." -- dagospia.com
Jul 18 '05 #2
s = 'aaaaa Â aaa'
What am I doing wrong?


First get rid of characters not allowed
in Python code.
Replace Â with appropriate escape
sequence: /x## where ## is the
hexadecimal code of the ASCII
character.

Claudio
Jul 18 '05 #3

"Claudio Grondi" <cl************@freenet.de> wrote in message
news:3a*************@individual.net...
s = 'aaaaa Â aaa'
What am I doing wrong?


First get rid of characters not allowed
in Python code.
Replace Â with appropriate escape
sequence: /x## (should be \x##) where ## is the
hexadecimal code of the ASCII
character.

Claudio


i.e. probably 'aaaaa \xC2 aaa'
instead of 'aaaaa Â aaa'.
In my character table (Latin-1 rather than plain ASCII) 'Â' is '\xC2'.
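
A small sketch of the escape-sequence approach, assuming Python 3, where the 2005-era 'str' corresponds to today's bytes. The variable name s is taken from the original script; everything else is illustrative.

```python
# The byte string from the original script, written with an escape
# sequence instead of a literal non-ASCII character in the source file.
s = b"aaaaa \xc2 aaa"

# Searching for the escaped byte does not depend on the source encoding.
position = s.find(b"\xc2")
print(position)

# Interpreted as Latin-1, byte 0xC2 is the character U+00C2 ('Â').
print(s.decode("latin-1"))
```

Writing the byte as \xc2 sidesteps the question of which encoding the editor used to save the script.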

Claudio
Jul 18 '05 #4
'aaaaa Â aaa'
 0123456
The Â is at index 6, so it's OK

Jul 18 '05 #5
And this runs OK for me:

s = 'aaaaa Â aaa'
print s
print s.replace('Â', '')
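
One possible reason the original record[4].replace(...) call "does nothing", offered as a guess since the thread never shows the full script: replace() returns a new string and leaves the original untouched, so the result has to be stored. The sketch below uses Python 3 and a made-up list named record; a real database row is often a tuple, which would need to be converted to a list first.

```python
# Made-up row standing in for the database record from the question.
record = ["id", "title", "url", "date", "extensions \u00c2 Good"]

# The result of replace() is discarded here, so record[4] is unchanged.
record[4].replace("\u00c2", "")
unchanged = record[4]

# Assigning the result back keeps the cleaned string.
record[4] = record[4].replace("\u00c2", "")

print(repr(unchanged))
print(repr(record[4]))
```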

Jul 18 '05 #6
In <11*********************@o13g2000cwo.googlegroups.com>, jdonnell wrote:
I have a mysql database with characters like Â Â Â» in it. I'm
trying to write a python script to remove these, but I'm having a
really hard time.

[...]

The other odd thing is that the Â character shows up as two spaces if
I print it to the terminal from mysql, but it shows up as Â when I
print from the simple script above.
What am I doing wrong?


Is it possible that your DB stores strings UTF-8 encoded? The
byte sequence '\xc2\xa0', which displays as 'Â ' when interpreted as
latin-1, is a single non-breaking space character in UTF-8.
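
To make the point concrete, here is a short sketch (Python 3 assumed, variable names illustrative) that decodes the same two bytes both ways:

```python
raw = b"\xc2\xa0"  # the two bytes mentioned above

# Read as Latin-1, each byte is its own character: 'Â' then U+00A0.
as_latin1 = raw.decode("latin-1")

# Read as UTF-8, the pair is one character: U+00A0, the no-break space.
as_utf8 = raw.decode("utf-8")

print(len(as_latin1), len(as_utf8))
print(repr(as_utf8))
```

So the "Â" is not really in the data; it only appears when UTF-8 bytes are displayed as if they were Latin-1.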

Ciao,
Marc 'BlackJack' Rintsch

Jul 18 '05 #7
I had this problem recently. It turned out that something
had encoded a unicode string into utf-8. When I found
the culprit and fixed the underlying design issue, it went away.
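
A rough sketch of how that kind of mix-up produces the stray characters, and how it can be undone when the damage is exactly one wrong decode. This assumes Python 3; the sample text is invented for illustration.

```python
original = "extensions \u00a0\u00bb Good"  # contains a no-break space and a guillemet

# Step 1: some layer encodes the text as UTF-8 ...
utf8_bytes = original.encode("utf-8")
# Step 2: ... and another layer wrongly reads those bytes as Latin-1.
garbled = utf8_bytes.decode("latin-1")

# Undoing the mistake: back to bytes via Latin-1, then decode as UTF-8.
repaired = garbled.encode("latin-1").decode("utf-8")

print(repr(garbled))
print(repaired == original)
```

Fixing the layer that mislabels the encoding, as described above, is the cleaner cure; the round trip is only a repair for data that is already damaged.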

John Roth


Jul 18 '05 #8
On Tue, 22 Mar 2005 20:09:55 -0600, "John Roth" <ne********@jhrothjr.com> wrote:
I had this problem recently. It turned out that something
had encoded a unicode string into utf-8. When I found
the culprit and fixed the underlying design issue, it went away.

John Roth

What encodings are involved?

This is from idle on windows, which seems to display latin-1 source ok:
----
>>> "Latin-1:Â»\n".decode('latin-1')
u'Latin-1:\xc2\xbb\n'
>>> "Latin-1:Â»\n".decode('latin-1').encode('cp437', 'replace')
'Latin-1:?\xaf\n'
>>> "Latin-1:Â»\n".decode('latin-1').encode('cp437', 'ignore')
'Latin-1:\xaf\n'
>>> u'Latin-1:\xc2\xbb\n'.encode('cp437','replace')
'Latin-1:?\xaf\n'
----
Now this is in an NT4 console window with code page 437:

----
>>> u'Latin-1:\xc2\xbb\n'.encode('cp437','replace')
'Latin-1:?\xaf\n'
>>> import sys
>>> sys.stdout.write(u'Latin-1:\xc2\xbb\n'.encode('cp437','replace'))
Latin-1:?»
----

Notice that the interactive output does a repr that creates the \xaf, but
the character is available and can be written non-repr'd via sys.stdout.write.

For the heck of it:
>>> sys.stdout.write(u'Latin-1:\xc2\xbb\n'.encode('cp437','xmlcharrefreplace'))

Latin-1:&#194;»
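
The three error handlers used in the sessions above can be compared side by side. This is a sketch for Python 3, where the result of encode() is a bytes object; the dictionary name results is just for illustration.

```python
text = "Latin-1:\u00c2\u00bb\n"  # 'Â' followed by '»'

# cp437 has a code for '»' (0xAF) but none for 'Â', so each handler
# deals with the unencodable 'Â' in its own way.
results = {}
for handler in ("replace", "ignore", "xmlcharrefreplace"):
    results[handler] = text.encode("cp437", handler)
    print(handler, results[handler])
```

'replace' substitutes a question mark, 'ignore' drops the character, and 'xmlcharrefreplace' writes a numeric character reference.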

I don't know if this is going to get through to your screen ;-)

Regards,
Bengt Richter
Jul 18 '05 #9
Thanks for all the replies. I just got in to work so I haven't tried
any of them yet. I see that I wasn't as clear as I should have been so
I'll clarify a little. I'm grabbing some data from msn's rss feed.
Here's an example.
http://search.msn.com/results.aspx?q...=rss&FORM=ZZRE

The string ' all domain name extensions » Good' is where I have a
problem. The ' »' shows up as 'Â Â Â»' when I write it to a file or stick
it in mysql. I did a hex dump and this is what I see.

jay@localhost:~/scripts> cat test.txt
extensions Â Â Â» Good
jay@localhost:~/scripts> xxd test.txt
0000000: 6578 7465 6e73 696f 6e73 20c2 a020 c2a0  extensions .. ..
0000010: 20c2 bb20 476f 6f64 0a                    .. Good.

One thing that jumps out is that two of the Â's are c2a0, but one of
them is c2bb. Well, those are the details since I wasn't clear before.
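
Reading the dumped bytes as UTF-8 accounts for both patterns: c2a0 is a no-break space and c2bb is the » character. Below is a sketch, assuming Python 3, that decodes the bytes from the xxd output above and strips those two characters; the variable names and the choice to collapse the leftover spaces are illustrative only.

```python
# Hex digits copied from the xxd output above.
hex_dump = (
    "6578 7465 6e73 696f 6e73 20c2 a020 c2a0"
    "20c2 bb20 476f 6f64 0a"
)
raw = bytes.fromhex(hex_dump)

# Decode as UTF-8: c2 a0 becomes U+00A0, c2 bb becomes U+00BB.
text = raw.decode("utf-8")
print(repr(text))

# Drop the no-break spaces and the guillemet, then collapse leftover spaces.
stripped = text.replace("\u00a0", "").replace("\u00bb", "")
tidy = " ".join(stripped.split())
print(tidy)
```

If the data is decoded this way before it goes into mysql, there is no Â to remove in the first place.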

Jul 18 '05 #10
