Detecting filename-encoding (on WinXP)?

Hi,

I have a need to store directory and filenames in a database. For the
database I chose to use UTF-8 encoding; but the actual encoding used is
probably immaterial: whichever coding I take, I'll run into this issue
eventually.

At first my code worked until I ran into a directory full of Cyrillic
characters and my program blew up.

So now what I need to know is, how do I find out in what encoding a
particular filename is? Is there a portable way for doing this? And if
not, then what is the non-portable way for doing this on Windows?
(WinXP)
(If there's only a non-portable way then I'll worry about porting it
later, if and when this program will ever have a need to run on a
Unix-like environment)
Many thanks in advance,

--Tim

Feb 2 '06 #1
Tim N. van der Leeuw wrote:
> Hi,
>
> I have a need to store directory and filenames in a database. For the
> database I chose to use UTF-8 encoding; but the actual encoding used is
> probably immaterial: whichever coding I take, I'll run into this issue
> eventually.
>
> At first my code worked until I ran into a directory full of Cyrillic
> characters and my program blew up.

How did you find the files? Did you pass a Unicode path as argument
to os.listdir()? See http://www.python.org/peps/pep-0277.html
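
A minimal sketch of what passing a Unicode path buys you, written for Python 3, where every str is Unicode and the PEP 277 behaviour is the default on Windows (in the Python 2 of this thread you would pass a u"..." string instead). The helper name names_as_utf8 is hypothetical, not something from the original post.

```python
import os


def names_as_utf8(directory):
    """Return the entries of `directory` as UTF-8 byte strings, sorted."""
    # Assumption: `directory` is a str, so os.listdir() returns str (Unicode)
    # names and no guessing about the on-disk encoding is needed.
    return sorted(name.encode("utf-8") for name in os.listdir(directory))
```

Each returned value is already UTF-8 and can be handed to the database layer as-is.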


Feb 2 '06 #2
Hi Magnus,

I get the filename from a URL, which is probably not any kind of
Unicode string but just a plain ASCII string. It should be possible to
convert this to a Unicode string -- I'll try it right away to see if this
works.

Thanks!

--Tim

Feb 2 '06 #3
On 2 Feb 2006 08:03:14 -0800, rumours say that "Tim N. van der Leeuw"
<ti*************@nl.unisys.com> might have written:
> So now what I need to know is, how do I find out in what encoding a
> particular filename is? Is there a portable way for doing this?


You said the filename comes as data, and not as contents of os.listdir(),
right?

You can only know (almost for certain) which encodings the filename is
*not* in (by looping over encodings and marking those where .decode fails).
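
A rough sketch of that elimination loop, in Python 3 and assuming the filename arrives as raw bytes. The function name candidate_encodings and the list of encodings tried are illustrative choices, not anything prescribed in this thread.

```python
def candidate_encodings(raw, encodings):
    """Return the encodings under which `raw` decodes without error."""
    survivors = []
    for name in encodings:
        try:
            raw.decode(name)
        except UnicodeDecodeError:
            continue  # ruled out: these bytes are invalid in this encoding
        survivors.append(name)
    return survivors


# Stand-in for bytes of unknown origin: "Файл" as written by a cp1251 system.
raw = "Файл".encode("cp1251")
print(candidate_encodings(raw, ["ascii", "utf-8", "cp1251", "koi8_r", "latin-1"]))
```

Here ascii and utf-8 are ruled out, but cp1251, koi8_r and latin-1 all survive, which is exactly the limitation: single-byte encodings accept almost any byte sequence, so a successful decode proves very little.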

If it were textual data, you could guess more successfully by testing
characters in pairs and their frequencies (btw, it's been a long time since
I requested example texts in various encodings for my encoding-guessing app,
but I was sent only one).
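
A toy illustration of the character-pair idea, in Python 3. This is not the encoding-guessing app mentioned above; the names bigrams and best_guess are made up, and the reference sentence is a tiny invented sample where a real tool would need a large corpus per language.

```python
from collections import Counter


def bigrams(text):
    """Count adjacent character pairs in `text`."""
    return Counter(text[i:i + 2] for i in range(len(text) - 1))


def best_guess(raw, encodings, reference_text):
    """Pick the encoding whose decoding shares the most bigrams with the reference."""
    known = bigrams(reference_text)
    scores = {}
    for name in encodings:
        try:
            decoded = raw.decode(name)
        except UnicodeDecodeError:
            continue  # cannot be this encoding at all
        # Weight each bigram of the candidate by how often the reference has it.
        scores[name] = sum(known[pair] * n for pair, n in bigrams(decoded).items())
    return max(scores, key=scores.get) if scores else None


# Assumption: we expect Russian text, so the reference sample is Russian.
reference = "привет мир это простой пример текста на русском языке"
raw = "простой текст".encode("koi8_r")
print(best_guess(raw, ["cp1251", "koi8_r"], reference))
```

Decoding the KOI8-R bytes as cp1251 yields letter pairs that never occur in the reference, so koi8_r scores higher. A short filename gives far fewer pairs to work with, which is why this works better on running text.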
--
TZOTZIOY, I speak England very best.
"Dear Paul,
please stop spamming us."
The Corinthians
Feb 10 '06 #4
Actually, the directory name comes in as a URL, and so far I've had no
problems simply creating a Unicode string from it, which I can pass to
os.walk() to get proper Unicode filenames back.
Then I can encode them into UTF-8 and pass them to the database layer,
and it all works.
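
Roughly, that approach might look like the following in Python 3. It assumes the URL is a file:// URL, which the post does not actually say, and the function name walk_utf8 is hypothetical.

```python
import os
from urllib.parse import urlparse
from urllib.request import url2pathname


def walk_utf8(file_url):
    """Yield the path of every file under a file:// URL as UTF-8 bytes."""
    # url2pathname() undoes the percent-encoding and returns a str path.
    root = url2pathname(urlparse(file_url).path)
    # Given a str root, os.walk() yields str (Unicode) directory and file names.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            yield os.path.join(dirpath, name).encode("utf-8")
```

The encoding to UTF-8 happens only at the last step, just before the values go to the database layer.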

cheers,

--Tim

Feb 10 '06 #5
