From python to LaTeX in emacs on windows

Brian Elmegaard

Hi group

I hope this is not a faq...

I am trying to understand how to use the new way of specifying a file's
encoding, but no matter what I do I get strange characters in the
output.

I have a text file which I have generated in python by parsing some
html.

In the file there are international characters like é and ó.
I can see in emacs that the file is encoded as
mule-utf-8-dos

I read the file into python as a string and suddenly the characters,
when printed, look strange and each consists of two characters.

First problem: How do I avoid this?

The second problem is that I make some string replacements and more in
the string to write a latex output file. When I open this file in
emacs, the characters are no longer the same.

Second problem: How do I avoid this?

tia,
--
Brian (remove the sport for mail)
http://www.et.dtu.dk/staff/be

Jul 18 '05 #1

Benjamin Niemann

When you read the file contents in python, you'll have the "raw" byte
sequence; in this case it is the UTF-8 encoding of unicode text. But you
probably want a unicode string. Use "text = unicode(data, 'utf-8')"
where "data" is the file content you read. After processing you probably
want to write it back to a file. Before you do this, you will have to
convert the unicode string back to a byte sequence. Use "data =
text.encode('utf')".
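
The decode, process, encode round trip described above can be sketched as follows. This is only an illustration in Python 3 syntax, where "unicode(data, 'utf-8')" is spelled "data.decode('utf-8')" and ordinary strings are already unicode. The sample string, the to_latex helper and its two-entry accent table are made up for the example and do not come from the thread.

```python
# Hypothetical accent table: only the two characters mentioned in the question.
LATEX_ACCENTS = {"é": r"\'e", "ó": r"\'o"}


def to_latex(data: bytes) -> bytes:
    """Decode UTF-8 bytes, replace accented letters, and encode again."""
    text = data.decode("utf-8")              # bytes -> unicode text
    for char, macro in LATEX_ACCENTS.items():
        text = text.replace(char, macro)
    return text.encode("utf-8")              # unicode text -> bytes


raw = "café ó".encode("utf-8")               # what a UTF-8 file holds on disk
byte_count = len(raw)                        # 8 bytes ...
char_count = len(raw.decode("utf-8"))        # ... but only 6 characters

# Decoding with the wrong codec turns each accented letter into two characters.
mojibake = raw.decode("cp1252")

print(byte_count, char_count)
print(to_latex(raw))
```

The two-character symptom from the question is what the wrong-codec decode produces: each accented letter occupies two bytes in UTF-8, and a one-byte codec shows each byte as its own character.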

Handling character encodings correctly *is* difficult. It's no shame if
you don't get it right on the first attempt.

Jul 18 '05 #2

Brian Elmegaard

Thanks for the help. I solved the problem by specifying the cp1252
encoding, both for the python file (by a magic comment) and for the
input data file.

Benjamin Niemann <b.*******@betternet.de> writes:

When you read the filecontents in python, you'll have the "raw" byte
sequence, in this case it is the UTF-8 encoding of unicode text. But
you probably want a unicode string. Use "text = unicode(data,
'utf-8')" where "data" is the filecontent you read. After processing
you probably want to write it back to a file. Before you do this, you
will have to convert the unicode string back to a byte sequence. Use
"data = text.encode('utf')".

This worked, but when I try to print text I get:
UnicodeEncodeError: 'ascii' codec can't encode characters in position 9-10: ordinal not in range(128)
Why is that?

Handling character encodings correctly *is* difficult.

What makes it difficult? The OS, the editor, python, latex?

--
Brian (remove the sport for mail)
http://www.et.dtu.dk/staff/be

Jul 18 '05 #3

Benjamin Niemann

Brian Elmegaard wrote:

This worked, but when I try to print text I get:
UnicodeEncodeError: 'ascii' codec can't encode characters in position 9-10: ordinal not in range(128)
Why is that?

The console only understands "byte streams". To print a unicode string,
python tries to encode it using the default encoding, which is 'ascii'
in your case. That encoding cannot represent characters like 'ü' or
'ä', which causes the exception. What I usually do is something like:
print text.encode("cp1251", "ignore")

The 'ignore' argument causes all characters that cannot be represented
in cp1251 to be silently dropped, which is ok if the output is only
used e.g. to track progress.
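
The effect of the error argument can be seen in a small sketch, written here in Python 3 syntax with a made-up sample string. One thing worth noting: cp1251 is the Windows Cyrillic code page and has no 'ü' or 'é', so with "ignore" those letters are dropped, while cp1252 (the Western European page mentioned earlier in the thread) can represent both.

```python
text = "Müller, café"

try:
    text.encode("ascii")                     # "strict" is the default error mode
    failed = False
except UnicodeEncodeError:
    failed = True                            # same error as in the question

dropped = text.encode("ascii", "ignore")     # unencodable characters vanish
marked = text.encode("ascii", "replace")     # unencodable characters become "?"
western = text.encode("cp1252")              # cp1252 has both ü and é
cyrillic = text.encode("cp1251", "ignore")   # cp1251 has neither, so both vanish

print(failed, dropped, marked, western, cyrillic)
```

For progress output, "replace" may be the friendlier choice, since a "?" at least shows that a character was lost.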

Don't know if there's a way to get python to do this automagically for
all unicode strings passed to stdout...
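
One possible way, sketched for current Python 3 rather than the version used in this thread: wrap the byte stream in an io.TextIOWrapper with a fixed codec and error mode, so that everything written through it is encoded the same way. The in-memory BytesIO below is only a stand-in for the console so the bytes can be inspected, and cp1252 is an assumed console encoding.

```python
import io

console = io.BytesIO()   # stand-in for a console that only accepts bytes
out = io.TextIOWrapper(console, encoding="cp1252", errors="replace", newline="\n")

print("Müller → done", file=out)   # "→" is not in cp1252 and becomes "?"
out.flush()

written = console.getvalue()
print(written)
```

On Python 3.7 and later, sys.stdout.reconfigure(errors="replace") applies the same idea to the real stdout without a wrapper.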

Handling character encodings correctly *is* difficult.

What makes it difficult? The OS, the editor, python, latex?

At least for me it is difficult, because I'm used to thinking "1 byte = 1
character", and when I read/write files I could simply handle the data as
strings. Unless you begin to parse arbitrary data from the internet,
there is little chance that you encounter text encodings different from
your operating system's default, and you start to believe that e.g.
"ord('ü') == 252" is a universal rule sent by the gods...
If you do it right, then you should convert all data that 'enters' your
application to unicode as early as possible and encode it back when you
print/save/send it. This way you'll only have to deal with unicode
strings in your application code. The most difficult part is probably
changing old habits ;)
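
A rough sketch of that "decode early, encode late" habit, in Python 3 syntax. The escape_for_latex function, the file names and the sample text are hypothetical; the ampersand escaping merely stands in for whatever string processing the application does.

```python
import tempfile
from pathlib import Path


def escape_for_latex(src: Path, dst: Path) -> None:
    """Hypothetical converter: decode on input, work on text, encode on output."""
    text = src.read_text(encoding="utf-8")    # decode as early as possible
    text = text.replace("&", r"\&")           # all processing happens on text
    dst.write_text(text, encoding="utf-8")    # encode as late as possible


with tempfile.TemporaryDirectory() as tmp:
    src, dst = Path(tmp, "in.txt"), Path(tmp, "out.tex")
    src.write_bytes("Café & Bar".encode("utf-8"))
    escape_for_latex(src, dst)
    result = dst.read_bytes()

# One character and one code point, but a codec-dependent number of bytes.
code_point = ord("ü")
as_cp1252 = "ü".encode("cp1252")
as_utf8 = "ü".encode("utf-8")

print(result, code_point, as_cp1252, as_utf8)
```

The number 252 is the Unicode code point of 'ü' and also its single byte in cp1252, which is why the "1 byte = 1 character" rule seems to hold until the data arrives as UTF-8, where the same letter takes two bytes.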

Jul 18 '05 #4
