
Dealing with textfiles with multiple encodings

I'm planning to write an app that will extract certain messages in
mailboxes stored in Eudora and Thunderbird formats. These are both
"plain text" formats, but the character encoding varies greatly from one
message to the other. I wonder how to deal with that, so that the
messages come out in the right encoding.

I suppose I need to decode each message according to its MIME headers to
get it right, but now I wonder how to store the messages in memory
before decoding them. Would it be right to treat the text as binary data?

Gustaf
Oct 21 '06 #1
Take a look at System.Net.Mime
--
Mihai Nita [Microsoft MVP, Windows - SDK]
http://www.mihai-nita.net
------------------------------------------
Replace _year_ with _ to get the real email
Oct 21 '06 #2
Yes, you can treat it as binary data, i.e. a byte array. You can use the
ASCII encoding to read enough of the message (the headers) to determine
the encoding, and then convert the bytes with the correct encoding.
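
The approach above can be sketched roughly as follows. This is only an illustration in Python, not .NET code from the thread; in .NET the same idea would use a byte[] with Encoding.ASCII for the headers and Encoding.GetEncoding for the body. The sample message, the decode_message name and the latin-1 fallback are assumptions made for the example. It also ignores Content-Transfer-Encoding (quoted-printable, base64) and multipart messages, which a real mailbox reader would need to handle.

```python
import re

# Hypothetical sample: ASCII headers followed by an ISO-8859-1 body.
RAW_MESSAGE = (
    b"From: gustaf@example.com\r\n"
    b"Content-Type: text/plain; charset=iso-8859-1\r\n"
    b"\r\n"
    b"Sm\xf6rg\xe5sbord\r\n"
)

# Matches charset=xyz or charset="xyz" in a Content-Type header.
CHARSET_RE = re.compile(r'charset\s*=\s*"?([^";\s]+)', re.IGNORECASE)


def decode_message(raw: bytes, fallback: str = "latin-1") -> str:
    """Decode the body of one raw message using the charset in its headers."""
    # Keep the message as bytes; headers end at the first blank line.
    parts = re.split(rb"\r?\n\r?\n", raw, maxsplit=1)
    header_bytes = parts[0]
    body_bytes = parts[1] if len(parts) > 1 else b""

    # Read the headers as ASCII just to find the declared charset.
    headers = header_bytes.decode("ascii", errors="replace")
    match = CHARSET_RE.search(headers)
    charset = match.group(1) if match else fallback

    try:
        return body_bytes.decode(charset, errors="replace")
    except LookupError:
        # The declared charset is unknown to Python; use the fallback.
        return body_bytes.decode(fallback, errors="replace")


if __name__ == "__main__":
    print(decode_message(RAW_MESSAGE))
```

Because nothing is decoded until the charset is known, each message in the mailbox can carry a different encoding without affecting the others.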

Oct 21 '06 #3
