Text Encoding...

I need help. Please bear with this.

I have a program. It takes in files that are delimited.
The delimiters are declared in the file by looking at
fixed positions in the file (If you work with ANSI x12
files, you know what I mean). This normally isn't a
problem, but I'm getting a file that is using some odd
characters as delimiters.

Specifically, a Hex 'BA' is declared as a delimiter. I
read the file into memory using this...
=================================================
fs = New FileStream(InFile, System.IO.FileMode.Open, System.IO.FileAccess.Read)
sr = New StreamReader(fs, System.Text.Encoding.UTF7)

InputString = sr.ReadToEnd
=================================================
At this point I close the file. In the string, I remove
any carriage return and line feed characters.

Then I write the string to a new file with this.
=================================================
fs2 = New FileStream(OutFile, System.IO.FileMode.Create, System.IO.FileAccess.Write, IO.FileShare.Write)
sw = New StreamWriter(fs2, System.Text.Encoding.ASCII)

sw.Write(sANSIString)
=================================================

NOTE: Initially, I don't think my stream writer specified
an encoding.

Anyway here is the problem....

The resulting file ends up with a different value in the
places where the hex 'BA' used to be. I've played with
various combinations of encoding, for both reading and
writing, and I'm not able to maintain the character. I
need to maintain this!

In one case, the single-byte hex 'BA' is actually
replaced with two bytes, but everything else in the file
is as it should be. In another case, the character is
a "?". I don't remember what happens in other
situations, but in no case is the hex 'BA' maintained.

I don't really understand encoding, so that is only
compounding my frustration and confusion.

Any help is greatly appreciated. I could supply more
details, if necessary.

QM.
Jul 21 '05 #1
Not sure here, but here we go anyway...

Stop removing the CRs and LFs from the stream; the A in BA is actually an LF.
When you replace it, you're losing your ability to parse at that position.

cr = Carriage Return
lf = Line Feed

Bryan Martin
Sp**@ahwayside.com
Jul 21 '05 #2
Oh, and BTW, it seems your BA is represented as...

Hex B = Dec 11, which corresponds to a vertical tab (mostly added by word
processors).
Hex A = Dec 10, which corresponds to a line feed.

Bryan Martin
Sp**@ahwayside.com
Jul 21 '05 #3
I wondered about doing that, and I did run the read/write
without removing the crlf.... same problems.

Jul 21 '05 #4
Hex 'BA' is Dec 186. Looks like the little Degree symbol.
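
To make the arithmetic concrete: hex 'BA' names a single byte with value 186 (11 * 16 + 10), not two separate characters B and A. The sketch below uses Python rather than the VB.NET from this thread, purely to inspect the byte. It assumes the file uses a Latin-1 or Windows-1252 style single-byte code page, which nothing in the thread confirms.

```python
import unicodedata

# One byte with value 0xBA, not the two letters "B" and "A".
delimiter = bytes([0xBA])
value = delimiter[0]

# Assumption: the file is in a Latin-1 / Windows-1252 style code page.
as_latin1 = delimiter.decode("latin-1")
as_cp1252 = delimiter.decode("cp1252")

# The degree sign is a different code point, although the glyphs look alike.
degree_sign = "\u00b0"

print(len(delimiter), value)
print(unicodedata.name(as_latin1))
print(unicodedata.name(degree_sign), degree_sign.encode("latin-1"))
```

In both of those code pages the byte maps to the masculine ordinal indicator, which is easy to mistake for the degree sign (byte 0xB0) on screen.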

Jul 21 '05 #5
Try setting the code page for the encoder, for example:

Encoding enc = Encoding.GetEncoding(1252);
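
The idea behind this suggestion is to read and write with the same single-byte code page, so that every byte maps to one character and back again. Here is a minimal Python sketch of that round trip (not the .NET code itself). The function name `strip_line_breaks` and the sample bytes are made up for illustration, and it assumes the input really is Windows-1252 text.

```python
def strip_line_breaks(raw: bytes, encoding: str = "cp1252") -> bytes:
    """Decode, drop CR and LF characters, and re-encode with the SAME encoding."""
    text = raw.decode(encoding)
    text = text.replace("\r", "").replace("\n", "")
    return text.encode(encoding)


# Hypothetical sample: an X12-like fragment that uses 0xBA as a delimiter.
sample = b"ISA\xba00\xba\r\nGS\xbaHC\r\n"
cleaned = strip_line_breaks(sample)

print(cleaned)
print(sample.count(b"\xba"), cleaned.count(b"\xba"))
```

Because the same code page is used in both directions, the 0xBA bytes come out unchanged. In Python, "cp1252" rejects five byte values it leaves undefined, so "latin-1" (which maps all 256 byte values) may be the safer choice if the file can contain arbitrary bytes.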
Jul 21 '05 #6
quincy <qm*******@yahoo.com> wrote:
Specifically, a Hex 'BA' is declared as a delimiter. I
read the file into memory using this...
=================================================
fs = New FileStream(InFile, System.IO.FileMode.Open, System.IO.FileAccess.Read)
sr = New StreamReader(fs, System.Text.Encoding.UTF7)

InputString = sr.ReadToEnd
=================================================


As I said before, if you use UTF7 any bytes which were 0xba won't be
decoded properly, because UTF7 doesn't have any character which is
encoded to hex 0xba.

Please read the full response I posted when you asked the same question
(without the encoding on the writing side) 5 days ago.
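
The symptoms described in the question (a "?" in one run, two bytes in another) can be reproduced with a short Python sketch. This illustrates the general encoding behaviour, not the .NET classes: Python's codecs report errors differently from .NET's, and the helper `try_decode` is a hypothetical name used only here.

```python
DELIM = b"\xba"


def try_decode(raw: bytes, encoding: str):
    """Return the decoded text, or None if the bytes are invalid in that encoding."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return None


# 7-bit encodings have no meaning for a raw byte above 0x7F.
utf7_result = try_decode(DELIM, "utf-7")
ascii_result = try_decode(DELIM, "ascii")

# A single-byte code page (assumed here: Latin-1) does map the byte to a character.
latin1_text = DELIM.decode("latin-1")

# Writing that character back out with different encodings:
as_utf8 = latin1_text.encode("utf-8")                      # two bytes
as_ascii = latin1_text.encode("ascii", errors="replace")   # a question mark
as_latin1 = latin1_text.encode("latin-1")                  # the original byte

print(utf7_result, ascii_result)
print(as_utf8, as_ascii, as_latin1)
```

Only the case where the same single-byte encoding is used for both reading and writing gives back the original 0xBA.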

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
Jul 21 '05 #7
