Bytes | Software Development & Data Engineering Community

printing ASCII character greater than 0x7f

I have come across a situation in my project where I read a text file
containing some characters greater than hex 0x7F.

I need to write one of these characters (0xE0) to a new file as an exception.
However, when I attempt to write it via "Console.Write" or "FileStream.Write",
the value seems to change. Most of the output file is in text mode.

If I view the original file in binary mode, I see the character I'm having an
issue with as "E0 00", but when I rewrite it I get "C3 A0".

I just cannot reproduce the information I read in my output file. Is there a
way to do this, and if so, how?
Jun 27 '08 #1
Well, ASCII doesn't define any characters above 0x7F. You need to know
the codepage of the file and use the correct encoding - via (for
example) Encoding.GetEncoding(int codepage) - and pass this encoding into
whichever StreamReader etc. you are using.

Otherwise, translations will occur. And whether they represent your
original data is anyone's guess.
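
A minimal sketch of the suggestion above, not code from the original reply. It passes an explicit Encoding to a StreamReader so that the byte 0xE0 is decoded as a character instead of being translated. The class name CodePageRead, the helper ReadAllText and the temp-file name are made up for illustration. Code page 28591 (ISO-8859-1) is assumed only because it is built into every .NET runtime; on .NET Core and .NET 5+, a Windows code page such as 1250 also needs Encoding.RegisterProvider(CodePagesEncodingProvider.Instance) before GetEncoding(1250) will succeed.

```csharp
using System;
using System.IO;
using System.Text;

static class CodePageRead
{
    // Hypothetical helper: decode a whole file with the encoding the caller names.
    public static string ReadAllText(string path, Encoding encoding)
    {
        using (var reader = new StreamReader(path, encoding))
        {
            return reader.ReadToEnd();
        }
    }

    static void Main()
    {
        // Illustrative input: a file holding the single byte 0xE0.
        string path = Path.Combine(Path.GetTempPath(), "codepage-sample.txt");
        File.WriteAllBytes(path, new byte[] { 0xE0 });

        // Assumption: the file is ISO-8859-1 (code page 28591).
        Encoding latin1 = Encoding.GetEncoding(28591);
        string text = ReadAllText(path, latin1);

        // Print the code point rather than the glyph, so the console's own
        // encoding cannot distort the result.
        Console.WriteLine(((int)text[0]).ToString("X4")); // 00E0
    }
}
```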

Marc
Jun 27 '08 #2
Is there a way to find the codepage? Is there a good reference source?
Jun 27 '08 #3
On Jun 9, 4:10 pm, auldh <au...@discussions.microsoft.com> wrote:
> is there a way to find the codepage? is there a good reference source?

There's no hard and fast rule for determining the codepage of a file.
A single file (i.e. a sequence of bytes) may be valid (but with
different meanings) for several different codepages.
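
To illustrate the point, here is a small sketch (the class and helper names are hypothetical) that decodes the same two bytes, C3 A0, three ways. Every decoding succeeds, and nothing in the bytes themselves says which one was intended. ISO-8859-1 (code page 28591) stands in for a single-byte code page because it is available on every .NET runtime.

```csharp
using System;
using System.Text;

static class AmbiguousBytes
{
    // Hypothetical helper: render each UTF-16 code unit of a string as hex.
    public static string CodeUnits(string s)
    {
        var parts = new string[s.Length];
        for (int i = 0; i < s.Length; i++)
        {
            parts[i] = ((int)s[i]).ToString("X4");
        }
        return string.Join(" ", parts);
    }

    static void Main()
    {
        byte[] bytes = { 0xC3, 0xA0 };

        // UTF-8: one character, U+00E0.
        Console.WriteLine(CodeUnits(Encoding.UTF8.GetString(bytes)));               // 00E0
        // ISO-8859-1: two characters, U+00C3 and U+00A0.
        Console.WriteLine(CodeUnits(Encoding.GetEncoding(28591).GetString(bytes))); // 00C3 00A0
        // UTF-16 little-endian: one character, U+A0C3.
        Console.WriteLine(CodeUnits(Encoding.Unicode.GetString(bytes)));            // A0C3
    }
}
```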

Jon
Jun 27 '08 #4
OK, I got the codepage, and I see in MSDN how to set the codepage.

I see how to write the bytes, but how can I write the character to the output
file and not just the bytes?

Is there a good sample showing how to read a string with codepage 1250 (E0 01)
and then write it via Console.Write and FileStream.Write as UTF-16? The string
is from a registry key and not another file per se.
Jun 27 '08 #5
auldh <au***@discussions.microsoft.com> wrote:
> ok, i got the codepage and i see in MSDN how to set the codepage.
>
> i see how to write the bytes but how can i write the character to the output
> file and not just the bytes?

Use a StreamWriter (either directly or around a stream) and specify
Encoding.GetEncoding(1250).
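
A rough sketch of that advice, under stated assumptions: the class EncodedWrite, the helper WriteAndReadBack and the temp-file name are invented for the example, and ISO-8859-1 (code page 28591) is used in place of 1250 because it needs no extra provider registration on newer .NET runtimes. The writer is given the characters and the encoding decides which bytes land in the file.

```csharp
using System;
using System.IO;
using System.Text;

static class EncodedWrite
{
    // Hypothetical helper: write text through a StreamWriter with the given
    // encoding, then return the bytes that ended up in the file.
    public static byte[] WriteAndReadBack(string path, string text, Encoding encoding)
    {
        using (var writer = new StreamWriter(path, false, encoding))
        {
            writer.Write(text);
        }
        return File.ReadAllBytes(path);
    }

    static void Main()
    {
        string path = Path.Combine(Path.GetTempPath(), "encoded-write.txt");
        string text = "\u00E0";
        Encoding latin1 = Encoding.GetEncoding(28591);

        // The character is stored as the single byte this code page assigns to it.
        Console.WriteLine(BitConverter.ToString(WriteAndReadBack(path, text, latin1))); // E0

        // Reading back with the same encoding restores the original string.
        Console.WriteLine(File.ReadAllText(path, latin1) == text); // True
    }
}
```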

> is there a good sample on how to read a string with codepage 1250 (E0 01).
> then write via Console.Write and FileStream.Write to UTF16? the string is
> from a registry key and not another file per se.

If you're reading the string from the registry, I'd expect it to be in
Unicode already. However, if you read it as bytes, just call
GetString(bytes) on the matching Encoding.

--
Jon Skeet - <sk***@pobox.com>
Web site: http://www.pobox.com/~skeet
Blog: http://www.msmvps.com/jon.skeet
C# in Depth: http://csharpindepth.com
Jun 27 '08 #6
Jon,
The key that I'm reading is "REG_SZ", so it should not be bytes. I guess the
value is corrupted because, like you said, the registry should be in Unicode
already.

And the base language is US English.
The tool I'm building is a registry export to "reg" format. I want to do 2
things:
1) Create an exception report naming the key that is in trouble (I've got that
handled).
2) Write the key and key value to the export file just as they appear in the
registry.

I'm getting the key name and key value as strings. However, I can't get
FileStream.Write to rebuild the value correctly; it is converting to ???

How do I set the string to codepage 1250 on the read?
How do I set FileStream.Write to UTF-16 on the write?
I'm dizzy trying to get this done and not seeing straight.
Jun 27 '08 #7
auldh <au***@discussions.microsoft.com> wrote:
> the key that i'm reading is "reg_sz" so it should not be byte. i guess the
> value is corrupted because like you said the registry should be in Unicode
> already.

Right. Have a look with regedit and see what it shows.

> and the base language is US english.
> the tool i'm building is a registry export to "reg" format i want to do 2
> things:
> 1) create an exception reporting the key that is in trouble. (i got that
> handled)
> 2) write the key and keyvalue to the export file just like it appears in the
> registry.

Well, that's tricky - because as far as I know you'll only get the key
value as a string.

> i'm getting the keyname and keyvalue as string. how ever i can't
> get FileStream.Write to rebuild the value correctly it is converting to???

If you've got garbage in the registry, you'll have a hard time
"fixing" it.

> how do i set the string to codepage 1250 on the read?

You don't - the registry correctly reads whatever is there.

> how do i set the FileStream.Write to codepage UTF16 to write?

Don't use a FileStream; use a StreamWriter and pass in the right
encoding.

> i'm dizzy trying to get this done and not seeing straight.

See if http://pobox.com/~skeet/csharp/unicode.html helps at all.

Jun 27 '08 #8
Looks like you are reading the Unicode character U+00E0, 'LATIN
SMALL LETTER A WITH GRAVE' (http://www.fileformat.info/info/unic...00e0/index.htm),
stored as UTF-16, and then you are writing the character to a file using
the default encoding of UTF-8, which ends up translating the UTF-16 code
unit “00E0” into the UTF-8 bytes “C3 A0”.

If you want to have an exact copy of the bytes you are reading, then
you will need to set your StreamWriter encoding to UTF-16 (.NET calls
it Encoding.Unicode). That way, if you open the file in binary mode you
will see “E0 00” (00E0 in little-endian byte order) and not “C3 A0”.
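
The byte patterns mentioned in this thread can be checked with a few lines. This is only an illustrative sketch (the class and helper names are made up): it encodes the same character with three of the encodings built into .NET and prints the resulting bytes.

```csharp
using System;
using System.Text;

static class SameCharDifferentBytes
{
    // Hypothetical helper: encode text and show the resulting bytes as hex.
    public static string Hex(string text, Encoding encoding)
    {
        return BitConverter.ToString(encoding.GetBytes(text));
    }

    static void Main()
    {
        string aGrave = "\u00E0"; // LATIN SMALL LETTER A WITH GRAVE

        Console.WriteLine(Hex(aGrave, Encoding.Unicode));          // E0-00
        Console.WriteLine(Hex(aGrave, Encoding.BigEndianUnicode)); // 00-E0
        Console.WriteLine(Hex(aGrave, Encoding.UTF8));             // C3-A0
    }
}
```

The first line matches the "E0 00" seen in the original file, and the last matches the "C3 A0" seen in the rewritten one.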

At least I think that is what’s going on…..

Rene
Jun 27 '08 #9
It sounds right, but in the registry the character looks like an "a" with a "`"
over it, and it comes closer to codepage 1250.

I just can't figure out how to copy it exactly to a new output file.

If only I could find a way to write this one character via
"StreamWriter.Write" using a different codepage.
Jun 27 '08 #10
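using System;
using System.IO;
using System.Text;

class Program
{
    static void Main(string[] args)
    {
        // Your non-ASCII character (U+00E0).
        char charFromRegistry = (char)0xE0;

        // Save the char to a file as UTF-16 little-endian: bytes E0 00.
        using (FileStream fs = new FileStream(@"C:\Err.bin", FileMode.Create))
        {
            byte[] uniBytes = Encoding.Unicode.GetBytes(new char[] { charFromRegistry });
            // Write every byte of the encoded char, not just the first one.
            fs.Write(uniBytes, 0, uniBytes.Length);
        }

        // Read the bytes back from the file and decode them to the same char.
        byte[] readBytes = File.ReadAllBytes(@"C:\Err.bin");
        char roundTripped = Encoding.Unicode.GetChars(readBytes)[0];
    }
}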

Perhaps I am missing something????
Jun 27 '08 #11
Rene,
Sorry, I guess this would work if I were creating a ".bin" file.

The output file is a "text" file. The program reads a given registry hive
and enumerates it.

The program emulates the "regedit" export, but it can read both local and
remote machines.

The program creates an output file in "text" format and then in "reg" format.
The latter can be imported via regedit.
It also validates a specific hive to see if there are missing keys, missing
values, or corruption, as it found in this case.

I guess I'm realizing there are too many issues to overcome, unless I'm wrong:
1) In this test run on a machine, I found a key with the wrong codepage
being used.
2) I don't think I can change the output codepage at run time. That is, if I
create the output file in "reg" mode I'm using default ASCII, and the format
of the file cannot be changed to something else for a given line.
3) I need to alter my plan to exclude the corrupted key and create an
error.txt file with exceptions.

If I'm wrong, I look forward to your input.

I would like to thank all of you who volunteered your input. Well done.
Jun 27 '08 #12
auldh submitted this idea :
> Rene,
> sorry i guess this would work if i did create a ".bin" file.

Just rename the file to "something.txt". The
"Encoding.Unicode.GetBytes" part translates the characters in the
string to the bytes that should be in the file according to that
encoding. The filesystem sees no difference between "text" and
"binary" files: they all consist of a lot of "bytes".
You can either use a plain FileStream, where you write bytes that you
got from passing a string through some encoding (as in the example), or
use a StreamWriter with some encoding, where you can "just" write a
string and have the exact same string-to-byte[] conversion take place
"under the covers".

Hans Kesting
Jun 27 '08 #13
OK, I found the difference between my registry output file and the one
created by Windows.

regedit writes its export with a specific encoding, whereas I had just used
StreamWriter without specifying one. It seems regedit uses the equivalent of
the "Encoding.Unicode" specifier.
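
The difference described here can be sketched as follows. This is an illustration under assumptions, not the poster's actual tool: the class DefaultVersusUnicode and the helper Write are hypothetical, and a MemoryStream stands in for the output file so the bytes are easy to inspect.

```csharp
using System;
using System.IO;
using System.Text;

static class DefaultVersusUnicode
{
    // Hypothetical helper: write text with a StreamWriter and return the bytes.
    // Passing null means "let StreamWriter pick its default encoding".
    public static byte[] Write(string text, Encoding encoding)
    {
        var stream = new MemoryStream();
        using (var writer = encoding == null
            ? new StreamWriter(stream)
            : new StreamWriter(stream, encoding))
        {
            writer.Write(text);
        }
        // ToArray still works after the writer has closed the stream.
        return stream.ToArray();
    }

    static void Main()
    {
        string text = "\u00E0";

        // No encoding given: StreamWriter falls back to UTF-8 without a byte order mark.
        Console.WriteLine(BitConverter.ToString(Write(text, null)));             // C3-A0

        // Encoding.Unicode: a UTF-16 little-endian byte order mark, then E0 00.
        Console.WriteLine(BitConverter.ToString(Write(text, Encoding.Unicode))); // FF-FE-E0-00
    }
}
```

The leading FF FE is the byte order mark that StreamWriter emits for Encoding.Unicode when it starts at the beginning of a stream.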

Now I get my character, and a binary compare shows a perfect replica.

Thanks everyone for your help.
Jun 27 '08 #14
