Reading Windows CSV file with LCID entries under Linux.

Dear all,

I've stumbled over a problem with Windows Locale ID information and
codepages. I'm writing a Python application that parses a CSV file;
each line in this file has the format "LCID;Text1;Text2". Each line can
contain a different locale ID (LCID), and the text fields contain data
that is encoded in some codepage which is associated with this LCID. My
current data file contains the codes 1031 for German and 1033 for
English US (as listed in
http://www.microsoft.com/globaldev/r...lcid-all.mspx).
Unfortunately, I cannot find out which codepage (like cp-1252 or
whatever) belongs to which LCID.

My question is: How can I convert this data into something more
reasonable like unicode? Basically, what I want is something like
"Text1;Text2", both fields encoded as UTF-8. Can this be done with
Python? How can I find out which codepage I have to use for 1033 and 1031?

Any help appreciated,
Thomas.
Sep 22 '08 #1

Thomas> My question is: How can I convert this data into something more
Thomas> reasonable like unicode? Basically, what I want is something
Thomas> like "Text1;Text2", both fields encoded as UTF-8. Can this be
Thomas> done with Python? How can I find out which codepage I have to
Thomas> use for 1033 and 1031?

There are examples at the end of the csv module documentation which show how
to create Unicode readers and writers. You can extend the UnicodeReader class
to peek at the LCID field and use the corresponding codepage to decode the
remainder of the line. (This assumes your CSV files don't contain fields with
embedded newlines, so that each line read can be treated as a new record
in the file.)
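
The peek-at-the-LCID idea can be sketched roughly as follows. This is a Python 3 sketch rather than the UnicodeReader recipe from the docs, and the names `LCID_TO_CODEC`, `read_lcid_csv` and `convert` are made up for illustration. The mapping of both LCIDs to cp1252 is an assumption based on Windows-1252 being the usual ANSI codepage for German and US English; other LCIDs would need their own entries.

```python
import csv
import io

# Assumed mapping (hypothetical table): extend it for other LCIDs.
LCID_TO_CODEC = {1031: "cp1252", 1033: "cp1252"}


def read_lcid_csv(binary_file, default_codec="cp1252"):
    """Yield each record as a list of str, without the leading LCID field.

    The file must be opened in binary mode. Every physical line is treated
    as one record, so fields with embedded newlines are not supported.
    """
    for raw_line in binary_file:
        raw_line = raw_line.rstrip(b"\r\n")
        if not raw_line:
            continue
        # Peek at the LCID (plain ASCII digits) before decoding anything else.
        lcid_bytes, _, rest = raw_line.partition(b";")
        lcid = int(lcid_bytes.decode("ascii"))
        codec = LCID_TO_CODEC.get(lcid, default_codec)
        # Decode the remainder with the codec chosen for this line, then let
        # the csv module handle the ";" delimiter and any quoting.
        yield next(csv.reader([rest.decode(codec)], delimiter=";"))


def convert(src_path, dst_path):
    """Rewrite src_path as a UTF-8 encoded "Text1;Text2" file at dst_path."""
    with open(src_path, "rb") as src, \
            open(dst_path, "w", encoding="utf-8", newline="") as dst:
        writer = csv.writer(dst, delimiter=";")
        for row in read_lcid_csv(src):
            writer.writerow(row)
```

Calling `convert("in.csv", "out.csv")` (both file names are placeholders) would then produce the UTF-8 file.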

Skip
Sep 22 '08 #2
Thomas Troeger wrote:
> I've stumbled over a problem with Windows Locale ID information and
> codepages. I'm writing a Python application that parses a CSV file;
> each line in this file has the format "LCID;Text1;Text2". Each line can
> contain a different locale ID (LCID), and the text fields contain data
> that is encoded in some codepage which is associated with this LCID. My
> current data file contains the codes 1031 for German and 1033 for
> English US (as listed in
> http://www.microsoft.com/globaldev/r...lcid-all.mspx).
> Unfortunately, I cannot find out which codepage (like cp-1252 or
> whatever) belongs to which LCID.
>
> My question is: How can I convert this data into something more
> reasonable like unicode? Basically, what I want is something like
> "Text1;Text2", both fields encoded as UTF-8. Can this be done with
> Python? How can I find out which codepage I have to use for 1033 and 1031?

The GetLocaleInfo API call can do that conversion:

http://msdn.microsoft.com/en-us/libr...70(VS.85).aspx

You'll need to use ctypes (or write a C extension) to
use it, and it is only available on Windows. Be aware that
if it doesn't succeed you may need to fall back on
cp 65001 -- UTF-8.
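
A rough sketch of that lookup via ctypes, with the caveats spelled out: `GetLocaleInfoW` and the `LOCALE_IDEFAULTANSICODEPAGE` constant (0x1004) are part of the Win32 API, but the helper names and the `FALLBACK_CODEPAGES` table are made up here. The table is an assumption so that the same function returns something useful on Linux, where the API call does not exist.

```python
import sys

LOCALE_IDEFAULTANSICODEPAGE = 0x1004  # LCTYPE value from the Win32 headers

# Hypothetical fallback table for hosts without the Win32 API (e.g. Linux).
FALLBACK_CODEPAGES = {1031: 1252, 1033: 1252}


def ansi_codepage(lcid):
    """Return the default ANSI codepage number for lcid, or None if unknown."""
    if sys.platform == "win32":
        import ctypes
        buf = ctypes.create_unicode_buffer(7)  # up to 6 chars incl. terminator
        ok = ctypes.windll.kernel32.GetLocaleInfoW(
            lcid, LOCALE_IDEFAULTANSICODEPAGE, buf, len(buf))
        return int(buf.value) if ok else None
    return FALLBACK_CODEPAGES.get(lcid)


def codec_name(lcid):
    """Map an LCID to a Python codec name, falling back on UTF-8."""
    cp = ansi_codepage(lcid)
    # None: lookup failed. 0: locale has no ANSI codepage. 65001: UTF-8.
    if cp in (None, 0, 65001):
        return "utf-8"
    return "cp%d" % cp
```

The returned name can be passed straight to `bytes.decode`, for example `field.decode(codec_name(1031))`.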

TJG
Sep 22 '08 #3
