Convert DOS Cyrillic text to Unicode

How can I convert DOS Cyrillic text to Unicode?
Nov 20 '05 #1
* "Nikolay Petrov" <jo******@mail.bg> scripsit:
> How can I convert DOS cyrillic text to Unicode


Take a look at the 'System.Text.Encoding' class.

--
Herfried K. Wagner [MVP]
<URL:http://dotnet.mvps.org/>
<URL:http://dotnet.mvps.org/dotnet/faqs/>
Nov 20 '05 #2
I've been doing this all day ;-)
Still nothing ;-)

Nov 20 '05 #3
LOL

Nov 20 '05 #4
Did I mention that this is going to be my first app? ;-)

Nov 20 '05 #5
Hi Nikolay,

Post some code first; with luck Jay will help you. You can also send it
to the newsgroup

microsoft.public.dotnet.general

where there is a chance that Jon Skeet will help you.

Those two handle most of the encoding problems.

Cor
Nov 20 '05 #6
Nikolay,
In addition to the other comments:

What is the code page for DOS Cyrillic? Quickly checking MSDN, I think it's
866, but you need to double-check!

You would use Encoding.GetEncoding to get the DOS Cyrillic Encoding object.

Imports System.Text

Dim cyrillic As Encoding = Encoding.GetEncoding(866)
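One way to do the double-checking Jay suggests is to ask the Encoding object what it is and decode the byte range where code page 866 puts the capital Cyrillic letters (&H80 to &H9F map to А through Я). This is a minimal sketch, assuming .NET Core 3.0 or later, where legacy code pages such as 866 are only available after registering `CodePagesEncodingProvider`; on the .NET Framework of this thread's era, delete the `RegisterProvider` line. The module and function names are hypothetical.

```vbnet
Imports System
Imports System.Text

Module CodePageCheck
    ' Hypothetical helper: returns the DOS Cyrillic encoding (code page 866).
    ' RegisterProvider is required on .NET Core 3.0+ / .NET 5+; remove it on the .NET Framework.
    Function GetDosCyrillic() As Encoding
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance)
        Return Encoding.GetEncoding(866)
    End Function

    ' Decodes bytes &H80 to &H9F, which code page 866 maps to the capitals А to Я.
    Function DecodeCapitals(cyrillic As Encoding) As String
        Dim bytes(31) As Byte
        For i As Integer = 0 To 31
            bytes(i) = CByte(&H80 + i)
        Next
        Return cyrillic.GetString(bytes)
    End Function

    Sub Main()
        Dim cyrillic As Encoding = GetDosCyrillic()
        Console.WriteLine(cyrillic.CodePage)   ' the numeric code page
        Console.WriteLine(cyrillic.WebName)    ' the name the framework uses for it
        Console.WriteLine(DecodeCapitals(cyrillic))
    End Sub
End Module
```

If the last line prints the 32 Russian capitals in alphabetical order, the code page is the right one.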

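Given an array of Bytes containing DOS Cyrillic, you would use
Encoding.GetString to convert it to a Unicode String.

' For example, the word "Привет" as stored in code page 866
Dim bytes() As Byte = {&H8F, &HE0, &HA8, &HA2, &HA5, &HE2}
Dim s As String = cyrillic.GetString(bytes)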

Given a Unicode String, you would use Encoding.GetBytes to get an array of
Bytes containing DOS Cyrillic.

bytes = cyrillic.GetBytes(s)
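GetString and GetBytes are inverses for text that code page 866 can represent. The sketch below encodes "Привет" to DOS Cyrillic bytes and decodes them again; the expected bytes follow the 866 layout (А to П at &H80 to &H8F, а to п at &HA0 to &HAF, р to я at &HE0 to &HEF). It assumes .NET Core 3.0 or later, where code page 866 must be registered through `CodePagesEncodingProvider`; drop that line on the .NET Framework. Names are hypothetical.

```vbnet
Imports System
Imports System.Text

Module RoundTrip
    ' Hypothetical helper: code page 866 (DOS Cyrillic).
    ' RegisterProvider is required on .NET Core 3.0+ / .NET 5+; drop it on the .NET Framework.
    Function GetDosCyrillic() As Encoding
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance)
        Return Encoding.GetEncoding(866)
    End Function

    Sub Main()
        Dim cyrillic As Encoding = GetDosCyrillic()
        Dim s As String = "Привет"

        ' Unicode String -> DOS Cyrillic bytes
        Dim bytes() As Byte = cyrillic.GetBytes(s)
        Console.WriteLine(BitConverter.ToString(bytes))   ' 8F-E0-A8-A2-A5-E2

        ' DOS Cyrillic bytes -> Unicode String again
        Dim back As String = cyrillic.GetString(bytes)
        Console.WriteLine(back = s)                       ' True
    End Sub
End Module
```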

If your DOS Cyrillic text is in a file, you pass the Encoding object to your
System.IO reader and writer classes:

Imports System.IO

Dim input As New StreamReader("myCyrillic.txt", cyrillic)
' Write to a different file: input still has myCyrillic.txt open
Dim output As New StreamWriter("myCyrillicCopy.txt", False, cyrillic)
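Combining the reader and writer gives a file converter. This sketch reads with code page 866 and writes with UTF-8, so the output is a Unicode file; the file names and helper names are hypothetical, and it assumes .NET Core 3.0 or later (drop the `RegisterProvider` line on the .NET Framework). `Using` needs VB 2005 or later.

```vbnet
Imports System
Imports System.IO
Imports System.Text

Module FileConvert
    ' Hypothetical helper: code page 866 (DOS Cyrillic).
    ' RegisterProvider is required on .NET Core 3.0+ / .NET 5+; drop it on the .NET Framework.
    Function GetDosCyrillic() As Encoding
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance)
        Return Encoding.GetEncoding(866)
    End Function

    ' Reads a DOS Cyrillic file and writes the same text to another file as UTF-8.
    Sub ConvertToUtf8(inputPath As String, outputPath As String)
        Using input As New StreamReader(inputPath, GetDosCyrillic())
            Using output As New StreamWriter(outputPath, False, Encoding.UTF8)
                ' ReadToEnd decodes code page 866; Write re-encodes the String as UTF-8.
                output.Write(input.ReadToEnd())
            End Using
        End Using
    End Sub

    Sub Main()
        ' Placeholder paths; only runs if the input file exists.
        If File.Exists("myCyrillic.txt") Then
            ConvertToUtf8("myCyrillic.txt", "myUnicode.txt")
        End If
    End Sub
End Module
```

Encoding.UTF8 makes StreamWriter emit a byte order mark at the start of the file; pass `New UTF8Encoding(False)` instead if the consumer does not want one.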

For information on Unicode, Encoding, and code pages (such as DOS Cyrillic)
see:

http://www.yoda.arachsys.com/csharp/unicode.html

One last thing: Once you have a String it is Unicode! Only Byte arrays &
Streams contain DOS Cyrillic and other character encodings.
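To make that last point concrete: a String holds Unicode characters, and it only turns into particular bytes when you choose an Encoding. This sketch encodes the single character П (U+041F) three ways. It assumes .NET Core 3.0 or later, where code pages 866 and 1251 need `CodePagesEncodingProvider` registered (drop that line on the .NET Framework); `GetEncodingFor` and `Dump` are hypothetical helper names.

```vbnet
Imports System
Imports System.Text

Module SameStringDifferentBytes
    ' Hypothetical helper; RegisterProvider is needed on .NET Core 3.0+ / .NET 5+
    ' for code pages such as 866 and 1251, and can be removed on the .NET Framework.
    Function GetEncodingFor(codePage As Integer) As Encoding
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance)
        Return Encoding.GetEncoding(codePage)
    End Function

    ' Hex dump of the bytes a given encoding produces for s.
    Function Dump(enc As Encoding, s As String) As String
        Return BitConverter.ToString(enc.GetBytes(s))
    End Function

    Sub Main()
        Dim s As String = "П"   ' one Unicode character, U+041F
        Console.WriteLine(Dump(GetEncodingFor(866), s))    ' DOS Cyrillic:     8F
        Console.WriteLine(Dump(GetEncodingFor(1251), s))   ' Windows Cyrillic: CF
        Console.WriteLine(Dump(Encoding.UTF8, s))          ' UTF-8:            D0-9F
    End Sub
End Module
```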

Hope this helps
Jay

Nov 20 '05 #7
Thanks Cor

Nov 20 '05 #8
That was very helpful.
But I have some problems. Let me first tell you exactly what I want to
achieve.
I've made a simple ASP.NET page with two text boxes and a button.
What I need is for a user to paste DOS Cyrillic text (taken from Notepad)
into the left text box, and when he clicks the button, the text converted
to Unicode should appear in the right box.
So I get the DOS text as a String, not as bytes. How should I proceed in
this case?

Nov 20 '05 #9
Nikolay,
> What I need is, that a user paste DOS Cyrillic text (taken from Notepad) in left text box,

I would expect Notepad to be using Windows Cyrillic or Unicode (or to think
it is), depending on the version of Windows and your regional settings in
Control Panel.

> So I get the DOS text as String, not as bytes. How should I proceed in this case?

No, you don't get DOS text as a String!

Strings in .NET are always Unicode! Period.

Notepad, the browser, and ASP.NET have already converted your "DOS text" into
Unicode for you. As I stated, Notepad made an assumption about what kind of
text it is; then the browser used some encoding, such as UTF-8 or Windows
Cyrillic, to send the request to ASP.NET as a stream of bytes. ASP.NET then
converted this request stream of bytes into a Unicode String. Hence your
program now has a Unicode string!

I've only used the default encoding for requests and responses in ASP.NET, so
I'm not certain how to use a specific encoding for requests and responses.

Unfortunately you will need to ask in one of the ASP.NET newsgroups, such as
microsoft.public.dotnet.framework.aspnet, for specifics on using particular
encodings for requests and responses...

Notice how much converting is going on above! When your user opened the
file in Notepad it was converted: an assumption was made about the type of
text in the file (I strongly suspect the assumption was not DOS Cyrillic).
Then, when you cut and pasted the text from Notepad into your browser, a
conversion may have been made, but more than likely it was done in the code
page of your regional settings in Windows. Then, when you submitted the page
to ASP.NET, a conversion was made from the request encoding into Unicode. So
by the time ASP.NET gives you the text, it has already been converted for
you, and it is nowhere near DOS Cyrillic anymore.
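What a wrong assumption about the text does can be seen in a few lines. This sketch decodes the same code page 866 bytes once correctly and once as Windows Cyrillic (code page 1251); 1251 is an assumption standing in for whatever Notepad actually picked. It assumes .NET Core 3.0 or later for the `RegisterProvider` line (remove it on the .NET Framework), and the names are hypothetical.

```vbnet
Imports System
Imports System.Text

Module WrongCodePage
    ' Hypothetical helper; RegisterProvider is needed on .NET Core 3.0+ / .NET 5+
    ' for legacy code pages, and can be removed on the .NET Framework.
    Function GetEncodingFor(codePage As Integer) As Encoding
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance)
        Return Encoding.GetEncoding(codePage)
    End Function

    Sub Main()
        Dim dos As Encoding = GetEncodingFor(866)
        Dim windows As Encoding = GetEncodingFor(1251)

        ' "Привет" as it would be stored in a DOS Cyrillic file.
        Dim fileBytes() As Byte = {&H8F, &HE0, &HA8, &HA2, &HA5, &HE2}

        ' Decoding with the right code page gives the intended text.
        Console.WriteLine(dos.GetString(fileBytes))

        ' Decoding with the wrong code page gives valid but meaningless Unicode text.
        Dim garbled As String = windows.GetString(fileBytes)
        Console.WriteLine(garbled)

        ' Re-encoding with the same wrong code page recovers the original bytes here,
        ' because 1251 maps each of these particular bytes to a distinct character.
        Console.WriteLine(dos.GetString(windows.GetBytes(garbled)))
    End Sub
End Module
```

That last repair only works when nothing was lost along the way. A byte with no mapping in the wrong code page, or a lossy conversion during copy and paste or the web submission, breaks it, which is why uploading the original bytes is the more reliable route.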

If you have files with DOS Cyrillic in them and you need or want to use
ASP.NET to convert them to Unicode, I would recommend that, rather than
using Notepad, a text box, and cut and paste, you use the input type=file
HTML control to upload your DOS Cyrillic file to the server as a stream of
bytes (preserving the DOS Cyrillic), and then use the Encoding object as I
showed to read this stream, correctly converting it to Unicode.

Hope this helps
Jay

Nov 20 '05 #10
Definitely, Jay. Thank you!
I've got it working already.
Thank you again.


Nov 20 '05 #11