Conditional for...in failing with utf-8, Spanish book translation

Hi all,

This is my first Usenet post!
I've run into a wall with my first Python program. I'm writing some
simple code that takes a UTF-8 text file in Spanish and uses online
translation tools to convert it, word by word, into English. Then
I generate a PDF with both languages.

Most of this is working great, but I get intermittent errors of the form:

---

Translating coche(coche)...
Already cached!
English: car
Translating ahora(ahora)...
tw returned now
English: now
Translating mismo?(mismo)...
Already cached!
English: same
Translating ¡A(�a)...
iconv: illegal input sequence at position 0
tw returned error: the required parameter "srctext" is missing
English: error: the required parameter "srctext" is missing

---

The output should look like:
Translating Raw_Text(lowercaserawtextwithoutpunctuation)...
tw returned englishtranslation
English: englishtranslation

I've narrowed the problem down to a simple test program. Check this out:

---

# -*- coding: utf-8 -*-

acceptable = "abcdefghijklmnopqrstuvwxyzóíñú" # this line will work
acceptable = "abcdefghijklmnopqrstuvwxyzóíñúá" # this line won't
#wtf?

word = "¡A"
word_key = ''.join([c for c in word.lower() if c in acceptable])
print "word_key = " + word_key

---

Any ideas? I'm really stumped!

Thanks,
Hunter
Jun 27 '08 #1
On Mon, 21 Apr 2008 08:33:47 +0200, Hunter wrote:
> I've narrowed the problem down to a simple test program. Check this out:
>
> ---
>
> # -*- coding: utf-8 -*-
>
> acceptable = "abcdefghijklmnopqrstuvwxyzóíñú" # this line will work
> acceptable = "abcdefghijklmnopqrstuvwxyzóíñúá" # this line won't
> #wtf?
>
> word = "¡A"
> word_key = ''.join([c for c in word.lower() if c in acceptable])
> print "word_key = " + word_key
>
> ---
>
> Any ideas? I'm really stumped!

You are not working with unicode strings but with UTF-8 encoded byte
strings. Those are bytes, not letters/characters. Your `word`, for example,
contains three bytes and not the two characters you think it contains:

In [43]: word = "¡A"

In [44]: len(word)
Out[44]: 3

In [45]: for c in word: print repr(c)
....:
'\xc2'
'\xa1'
'A'

So you are *not* testing if ¡ is in `acceptable` but the two byte values
that are the UTF-8 representation of that character.
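
A minimal sketch of that effect, written for Python 3 (an assumption; the thread itself uses Python 2, where a plain string literal is already a byte string). The helper name `filter_bytes` is hypothetical, and the `í` in the strings is assumed to be the character the original post intended. The strings are encoded to UTF-8 by hand so the byte-level comparison is visible:

```python
# -*- coding: utf-8 -*-
# Python 3 sketch: reproduce the byte-level filtering that Python 2 performs
# implicitly on plain (non-unicode) string literals.

ACCEPTABLE_SHORT = "abcdefghijklmnopqrstuvwxyzóíñú".encode("utf-8")
ACCEPTABLE_LONG = "abcdefghijklmnopqrstuvwxyzóíñúá".encode("utf-8")


def filter_bytes(word_bytes, acceptable_bytes):
    """Keep each byte of word_bytes (ASCII-lowercased) that occurs in acceptable_bytes."""
    return bytes(b for b in word_bytes.lower() if b in acceptable_bytes)


word = "¡A".encode("utf-8")  # three bytes: C2 A1 41

short_key = filter_bytes(word, ACCEPTABLE_SHORT)
long_key = filter_bytes(word, ACCEPTABLE_LONG)

print(repr(short_key))  # b'a'
print(repr(long_key))   # b'\xa1a'

# The lone A1 byte is not valid UTF-8 on its own, so decoding needs a
# replacement character.
print(long_key.decode("utf-8", errors="replace"))
```

In UTF-8, `á` is the byte pair C3 A1 and `¡` is C2 A1. Once `á` is in the list, the stray A1 byte from `¡` passes the filter and leaves an invalid sequence, which is consistent with the `�a` and the iconv complaint in the log above.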

Ciao,
Marc 'BlackJack' Rintsch
Jun 27 '08 #2
Hunter wrote:
> I've narrowed the problem down to a simple test program. Check this out:
>
> ---
>
> # -*- coding: utf-8 -*-
>
> acceptable = "abcdefghijklmnopqrstuvwxyzóíñú" # this line will work
> acceptable = "abcdefghijklmnopqrstuvwxyzóíñúá" # this line won't
> [bad words stripped]

this should read

acceptable = u"abcdefghijklmnopqrstuvwxyzóíñú"
acceptable = u"abcdefghijklmnopqrstuvwxyzóíñúá"

Mind the little "u" before the string, which makes it a unicode string instead
of an encoded byte string.
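
A sketch of the same filter working on characters, assuming Python 3, where every string literal is already a unicode string (the `u` prefix is accepted there but optional). The function name `make_key` and the sample bytes are hypothetical. The input is decoded as well: text read from a UTF-8 file arrives as bytes, and by the same reasoning it would need converting to unicode before the membership test compares whole characters:

```python
# -*- coding: utf-8 -*-
# Python 3 sketch: filter whole characters instead of UTF-8 bytes.

ACCEPTABLE = u"abcdefghijklmnopqrstuvwxyzóíñúá"


def make_key(word):
    """Lowercase a unicode word and drop every character not in ACCEPTABLE."""
    return u"".join(c for c in word.lower() if c in ACCEPTABLE)


raw = b"\xc2\xa1A"          # "¡A" as it would arrive from a UTF-8 file
word = raw.decode("utf-8")  # now two characters: "¡" and "A"

print(len(word))            # 2
print(make_key(word))       # a
print(make_key(u"mismo?"))  # mismo
print(make_key(u"MAÑANA"))  # mañana
```

Before sending the key to an external tool, it would be encoded back with `word_key.encode("utf-8")`, so only complete characters ever reach the translator.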

http://docs.python.org/tut/node5.htm...00000000000000

Stefan
Jun 27 '08 #3
