Converting text and detecting encoding

Hello.

What I want to do is simple: correctly read a text file whose encoding is
not known (it can be ASCII, UTF-7, UTF-8 or Unicode).

I'm thinking of something like this:

1) Read the text as ASCII:

// requires: using System.IO;
string text = "";
System.Text.Encoding encoding = System.Text.Encoding.ASCII;
using (StreamReader sr = new StreamReader(filePath, encoding))
{
    text = sr.ReadToEnd();
}

2) Implement some kind of static method like the following:

public static System.Text.Encoding GetEncodingFromText(string text)
{
    [...]
}

3) Convert the string "text" into the correct encoding.
I have no idea how to implement points 2 and 3.
Any suggestions are welcome.
Jul 4 '06 #1
"Flix" <wr***@newsgroup.com> wrote:
> What I want to do is simple: correctly read a text file whose encoding is
> not known (it can be ASCII, UTF-7, UTF-8 or Unicode).
It's not really that simple. Text can be UTF-16, in little-endian or
big-endian, without a BOM (byte order mark), for example. Check out
the IsTextUnicode() Win32 API - the functionality in Windows essentially
uses heuristics and guesses.
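
To give a feel for what "heuristics and guesses" can mean for BOM-less UTF-16, here is a minimal sketch. It is not what IsTextUnicode() does internally; it is a much cruder stand-in that assumes the text is mostly Latin characters, so one byte of each 16-bit code unit is zero. The class name, method name, and the "half of the code units" threshold are all made up for illustration.

```csharp
using System;
using System.Text;

static class Utf16Guess
{
    // Hypothetical helper: returns Encoding.Unicode (UTF-16 little-endian),
    // Encoding.BigEndianUnicode, or null when the bytes do not look like
    // BOM-less UTF-16 holding mostly Latin text.
    public static Encoding GuessUtf16WithoutBom(byte[] data)
    {
        // UTF-16 data always has an even number of bytes.
        if (data.Length < 2 || data.Length % 2 != 0)
            return null;

        int zeroAtEven = 0, zeroAtOdd = 0;
        for (int i = 0; i < data.Length; i++)
        {
            if (data[i] != 0) continue;
            if (i % 2 == 0) zeroAtEven++; else zeroAtOdd++;
        }

        int units = data.Length / 2;

        // Little-endian stores the low byte first, so for Latin text the
        // zero high bytes sit at odd offsets. The threshold is an assumption.
        if (zeroAtOdd * 2 >= units && zeroAtEven == 0)
            return Encoding.Unicode;

        // Big-endian is the mirror image: zero high bytes at even offsets.
        if (zeroAtEven * 2 >= units && zeroAtOdd == 0)
            return Encoding.BigEndianUnicode;

        return null;
    }
}
```

A guess like this fails for text that is mostly non-Latin (few zero bytes) and for binary data that happens to fit the pattern, which is why it can only ever be a guess.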

Check out Encoding.GetPreamble() in the docs for other possible clues.
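
As a rough illustration of the GetPreamble() clue: each Unicode encoding object can report the BOM it would write, and the first bytes of the file can be compared against those. The sketch below assumes the file has already been read into a byte array; BomSniffer and DetectFromBom are hypothetical names, and a null result only means "no BOM found", not "not Unicode".

```csharp
using System;
using System.Text;

static class BomSniffer
{
    // Longer preambles are tested first: the UTF-32 LE BOM (FF FE 00 00)
    // begins with the UTF-16 LE BOM (FF FE).
    static readonly Encoding[] Candidates =
    {
        new UTF32Encoding(false, true),   // UTF-32 little-endian, with BOM
        new UTF32Encoding(true, true),    // UTF-32 big-endian, with BOM
        new UTF8Encoding(true),           // UTF-8, with BOM
        new UnicodeEncoding(false, true), // UTF-16 little-endian, with BOM
        new UnicodeEncoding(true, true),  // UTF-16 big-endian, with BOM
    };

    // Returns the first candidate whose preamble matches the start of data,
    // or null when no known BOM is present.
    public static Encoding DetectFromBom(byte[] data)
    {
        foreach (Encoding candidate in Candidates)
        {
            byte[] preamble = candidate.GetPreamble();
            if (preamble.Length == 0 || data.Length < preamble.Length)
                continue;

            bool match = true;
            for (int i = 0; i < preamble.Length; i++)
            {
                if (data[i] != preamble[i]) { match = false; break; }
            }
            if (match) return candidate;
        }
        return null;
    }
}
```

If only BOM detection is needed, the StreamReader(string, Encoding, bool) constructor with detectEncodingFromByteOrderMarks set to true does the same kind of check, and CurrentEncoding reports the result after the first read. One edge case of the sketch: a UTF-16 little-endian file whose first character after the BOM is U+0000 would be reported as UTF-32.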

-- Barry

--
http://barrkel.blogspot.com/
Jul 4 '06 #2

"Barry Kelly" <ba***********@gmail.com> wrote in message
news:mg********************************@4ax.com...
> "Flix" <wr***@newsgroup.com> wrote:
>> What I want to do is simple: correctly read a text file whose encoding
>> is not known (it can be ASCII, UTF-7, UTF-8 or Unicode).
>
> It's not really that simple. Text can be UTF-16, in little-endian or
> big-endian, without a BOM (byte order mark), for example. Check out
> the IsTextUnicode() Win32 API - the functionality in Windows essentially
> uses heuristics and guesses.
>
> Check out Encoding.GetPreamble() in the docs for other possible clues.
>
> -- Barry

Thank you for your reply.
Jul 4 '06 #3
Flix <wr***@newsgroup.com> wrote:
> What I want to do is simple: correctly read a text file whose encoding is
> not known (it can be ASCII, UTF-7, UTF-8 or Unicode).

That's not simple - it's impossible. Every UTF-7 file is a valid ASCII
file, for instance.
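
A small sketch of that ambiguity, under the assumption that the file contains the eight characters "caf+AOk-". In UTF-7, "+AOk-" is the encoded form of é (U+00E9), so the same bytes are either the UTF-7 text "café" or the literal ASCII text "caf+AOk-". The class and method names below are hypothetical, and the sketch deliberately avoids the UTF-7 decoder itself: it only shows that the bytes pass as ASCII and as strict UTF-8 with identical results, so nothing in the data says which reading was intended.

```csharp
using System;
using System.Text;

static class AmbiguityDemo
{
    // True when every byte is below 0x80, i.e. the data is valid 7-bit ASCII.
    public static bool IsPureAscii(byte[] data)
    {
        foreach (byte b in data)
        {
            if (b >= 0x80) return false;
        }
        return true;
    }

    // True when the bytes decode without error as strict UTF-8 and the result
    // equals the ASCII decoding, meaning the two encodings cannot be told apart.
    public static bool DecodesIdenticallyAsAsciiAndUtf8(byte[] data)
    {
        // throwOnInvalidBytes: true makes the decoder throw instead of
        // silently substituting replacement characters.
        var strictUtf8 = new UTF8Encoding(false, true);
        try
        {
            return strictUtf8.GetString(data) == Encoding.ASCII.GetString(data);
        }
        catch (DecoderFallbackException)
        {
            return false;
        }
    }

    // The sample bytes discussed above: valid ASCII, valid UTF-8, valid UTF-7.
    public static byte[] SampleBytes()
    {
        return Encoding.ASCII.GetBytes("caf+AOk-");
    }
}
```

Both checks return true for the sample bytes, which is the point: a detector can rule encodings out, but when several remain valid it has to fall back on a default or on outside knowledge about where the file came from.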

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
Jul 4 '06 #4
