Bytes | Software Development & Data Engineering Community

From Python to LaTeX in Emacs on Windows

Hi group

I hope this is not a FAQ...

I am trying to understand how to use the new way of specifying a file's
encoding, but no matter what I do I get strange characters in the
output.

I have a text file that I generated in Python by parsing some
HTML.

The file contains international characters such as é and ó.
In Emacs I can see that the file is encoded as
mule-utf-8-dos.

When I read the file into Python as a string and print it, these
characters suddenly look strange, and each one appears as two characters.

First problem: How do I avoid this?

The second problem is that I make some string replacements and other
changes to the string in order to write a LaTeX output file. When I open
this file in Emacs, the characters are no longer the same.

Second problem: How do I avoid this?

tia,
--
Brian (remove the sport for mail)
http://www.et.dtu.dk/staff/be
Jul 18 '05 #1
Brian Elmegaard wrote:
> [...]
> First problem: How do I avoid this?
> [...]
> Second problem: How do I avoid this?

When you read the file contents in Python, you get the "raw" byte
sequence; in this case it is the UTF-8 encoding of Unicode text. That is
also why é and ó show up as two strange characters each: UTF-8 stores
each of them as two bytes, and a console that reads the bytes in a
single-byte code page displays two characters. But you probably want a
Unicode string. Use "text = unicode(data, 'utf-8')", where "data" is
the file content you read. After processing, you probably want to write
the text back to a file. Before you do this, you have to convert the
Unicode string back to a byte sequence. Use "data =
text.encode('utf-8')".
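A minimal sketch of the same decode/process/encode cycle in Python 3, where str is already Unicode and "unicode(data, 'utf-8')" becomes "data.decode('utf-8')". The sample string is an assumption standing in for the file contents, and cp1252 is assumed as the "wrong" codec to show where the two strange characters come from.

```python
# Python 3 sketch: bytes at the edges, Unicode (str) in the middle.

sample = "caf\u00e9, canci\u00f3n"      # "café, canción"
raw = sample.encode("utf-8")            # stands in for the bytes read from the UTF-8 file

# é and ó each take two bytes in UTF-8, so there are more bytes than characters.
print(len(sample), len(raw))            # 13 15

# Reading those bytes as a single-byte code page (cp1252 here) turns each
# accented letter into two characters: the "strange characters" symptom.
misread = raw.decode("cp1252")          # "cafÃ©, canciÃ³n"

# Correct: decode with the codec the file was actually written in.
text = raw.decode("utf-8")              # Python 2 spelling: unicode(raw, 'utf-8')

# Do all processing on the Unicode string.
text = text.replace("canci\u00f3n", "chanson")

# Encode back to bytes just before writing.
data = text.encode("utf-8")             # Python 2 spelling: text.encode('utf-8')
```

The point of the design is that replace() and any other string work only ever see whole characters, never half of a two-byte UTF-8 sequence.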

Handling character encodings correctly *is* difficult. It's no shame if
you don't get it right on the first attempt.
Jul 18 '05 #2
Benjamin Niemann <b.*******@betternet.de> writes:

Thanks for the help. I solved the problem by specifying the cp1252
encoding for the Python file with a magic comment, and for the input data file as well.
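The two parts of that fix do different jobs, which a short Python 3 sketch makes visible: the coding comment only tells Python how the source file itself is encoded, while the data file's encoding has to be given when the file is opened. The helper names read_data and write_data, and the assumption that the data file is saved as cp1252, are hypothetical. In the Python 2 of this thread, codecs.open(path, encoding='cp1252') plays the role of open(..., encoding=...).

```python
# -*- coding: cp1252 -*-
# PEP 263 "magic comment": declares the encoding of this *source file* only.
# It matters for non-ASCII literals written directly in the source; it does
# not change how data files are read, so those get their own encoding below.
import os
import tempfile

DATA_ENCODING = "cp1252"  # assumption: the input data file is saved as cp1252


def read_data(path, encoding=DATA_ENCODING):
    """Read a text data file and return a Unicode string (str)."""
    with open(path, encoding=encoding) as f:
        return f.read()


def write_data(path, text, encoding=DATA_ENCODING):
    """Encode a Unicode string with `encoding` and write it to `path`."""
    with open(path, "w", encoding=encoding) as f:
        f.write(text)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "input.txt")
        write_data(path, "\u00e9 and \u00f3")          # e-acute and o-acute
        with open(path, "rb") as f:
            print(f.read())                             # b'\xe9 and \xf3': one byte each in cp1252
        print(read_data(path) == "\u00e9 and \u00f3")   # True
```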
> When you read the filecontents in python, you'll have the "raw" byte
> sequence, in this case it is the UTF-8 encoding of unicode text. But
> you probably want a unicode string. Use "text = unicode(data,
> 'utf-8')" where "data" is the filecontent you read. After processing
> you probably want to write it back to a file. Before you do this, you
> will have to convert the unicode string back to a byte sequence. Use
> "data = text.encode('utf')".

This worked, but when I try to print text I get:
UnicodeEncodeError: 'ascii' codec can't encode characters in position 9-10: ordinal not in range(128)
Why is that?
> Handling character encodings correctly *is* difficult.

What makes it difficult? The OS, the editor, Python, LaTeX?

Jul 18 '05 #3
Brian Elmegaard wrote:
> Benjamin Niemann <b.*******@betternet.de> writes:
>
> Thank for the help. I solved the problem by specifying the cp1252
> encoding for the python file by a magic comment and for the input data file.
>
>> [...] Use
>> "data = text.encode('utf')".
>
> This worked, but when I try to print text I get:
> UnicodeEncodeError: 'ascii' codec can't encode characters in position 9-10: ordinal not in range(128)
> Why is that?

The console only understands byte streams. To print a Unicode string,
Python tries to encode it using the default encoding, which is 'ascii'
in your case. That encoding cannot represent characters like
'ü' or 'ä', which causes the exception. What I usually do is something like:
print text.encode("cp1252", "ignore")

The 'ignore' argument causes all characters that cannot be represented
in cp1252 to be silently dropped, which is OK if the output is only
used, e.g., to track progress.
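The same behaviour can be reproduced in Python 3, where str.encode takes the same codec and error-handler arguments. The sample text is an assumption; the default handler 'strict' raises the kind of error Brian saw, 'ignore' drops characters and 'replace' substitutes '?'.

```python
text = "Gr\u00fc\u00dfe aus K\u00f6ln"   # "Grüße aus Köln"

# The default error handler is "strict": unencodable characters raise an
# error, just as the console's ascii codec did.
try:
    text.encode("ascii")
except UnicodeEncodeError as exc:
    print(exc)   # 'ascii' codec can't encode characters in position 2-3: ...

print(text.encode("ascii", "ignore"))    # b'Gre aus Kln'     (characters dropped)
print(text.encode("ascii", "replace"))   # b'Gr??e aus K?ln'  (replaced with '?')
print(text.encode("cp1252", "ignore"))   # b'Gr\xfc\xdfe aus K\xf6ln' (all representable)
```

With a Western European code page such as cp1252 nothing is lost for this text, which is why the choice of codec matters more than the choice of error handler.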

I don't know if there's a way to get Python to do this automagically for all
Unicode strings passed to stdout...
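One way to get that effect is to wrap the byte stream in a text layer that encodes every string written to it. A Python 3 sketch follows; a BytesIO stands in for the console so the result can be inspected, and the helper name encoding_stream is hypothetical. For the real console, Python 3.7+ offers sys.stdout.reconfigure(encoding=..., errors=...), and the PYTHONIOENCODING environment variable (e.g. cp1252:replace) does the same at startup. In the Python 2 of this thread, the usual equivalent was sys.stdout = codecs.getwriter('cp1252')(sys.stdout, 'replace').

```python
import io


def encoding_stream(binary, encoding="cp1252", errors="replace"):
    """Wrap a binary stream so every str written to it is encoded automatically.

    errors="replace" writes '?' for characters the codec cannot represent,
    instead of raising UnicodeEncodeError.
    """
    # newline="\n" turns off newline translation, so the output bytes are predictable.
    return io.TextIOWrapper(binary, encoding=encoding, errors=errors,
                            newline="\n", write_through=True)


console = io.BytesIO()      # stands in for the console's byte stream
out = encoding_stream(console)

# e-acute and the euro sign exist in cp1252; U+263A (a smiley) does not.
print("caf\u00e9 \u20ac \u263a", file=out)
print(console.getvalue())   # b'caf\xe9 \x80 ?\n'
```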

>> Handling character encodings correctly *is* difficult.
>
> What makes it difficult? The OS, the editor, python, latex?

At least for me it is difficult, because I'm used to thinking "1 byte = 1
character", and when I read/write files I could simply handle the data as
strings. Unless you begin to parse arbitrary data from the internet,
there is little chance that you encounter text encodings different from
your operating system's default, so you start to believe that, e.g.,
"ord('ü') == 252" is a universal rule sent by the gods...
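A short Python 3 check shows why that "rule" only holds for some encodings: ord() on a str gives the Unicode code point, which for ü happens to equal its Latin-1/cp1252 byte, while the bytes that actually represent ü depend on the encoding. The list of codecs is an arbitrary selection; cp850 is included as an OEM code page used by many Western European Windows consoles.

```python
ch = "\u00fc"   # u-umlaut
print(ord(ch))  # 252: the Unicode code point

for codec in ("latin-1", "cp1252", "cp850", "utf-8", "utf-16-le"):
    # Same character, different byte sequences depending on the encoding.
    print(codec, ch.encode(codec))
# latin-1 b'\xfc', cp1252 b'\xfc', cp850 b'\x81', utf-8 b'\xc3\xbc', utf-16-le b'\xfc\x00'
```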
To do it right, convert all data that 'enters' your application to
Unicode as early as possible, and encode it back when you
print/save/send it. This way you only have to deal with Unicode strings
in your application code. The most difficult part is probably changing
old habits ;)
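A sketch of that approach applied to a text-to-LaTeX conversion like Brian's, in Python 3: decode once when reading, work only with str, encode once when writing. The function name text_to_latex, the '&' escaping step and the preamble are hypothetical placeholders. The key assumption is that LaTeX is told the same encoding the file is written in, here UTF-8 via \usepackage[utf8]{inputenc} (recent LaTeX releases assume UTF-8 by default).

```python
import os
import tempfile

PREAMBLE = "\\documentclass{article}\n\\usepackage[utf8]{inputenc}\n\\begin{document}\n"
ENDING = "\n\\end{document}\n"


def text_to_latex(src_path, dst_path, src_encoding="utf-8", dst_encoding="utf-8"):
    """Hypothetical converter: decode on input, process str, encode on output."""
    # 1. Decode at the input boundary: bytes from disk become str immediately.
    with open(src_path, encoding=src_encoding) as f:
        text = f.read()

    # 2. Work only with str here (placeholder processing: escape '&' for LaTeX).
    body = text.replace("&", "\\&")

    # 3. Encode at the output boundary; dst_encoding must match the inputenc option.
    with open(dst_path, "w", encoding=dst_encoding) as f:
        f.write(PREAMBLE + body + ENDING)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "in.txt")
        dst = os.path.join(tmp, "out.tex")
        with open(src, "w", encoding="utf-8") as f:
            f.write("Caf\u00e9 & canci\u00f3n")
        text_to_latex(src, dst)
        with open(dst, "rb") as f:
            print(f.read())
```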
Jul 18 '05 #4