Bytes | Software Development & Data Engineering Community

Text Encoding...

I need help. Please bear with this.

I have a program. It takes in files that are delimited.
The delimiters are declared in the file by looking at
fixed positions in the file (if you work with ANSI X12
files, you know what I mean). This normally isn't a
problem, but I'm getting a file that is using some odd
characters as delimiters.

Specifically, a Hex 'BA' is declared as a delimiter. I
read the file into memory using this...
=================================================
fs = New FileStream(InFile, System.IO.FileMode.Open, _
    System.IO.FileAccess.Read)
sr = New StreamReader(fs, System.Text.Encoding.UTF7)

InputString = sr.ReadToEnd
=================================================
At this point I close the file. In the string, I remove
any carriage return and line feed characters.

Then I write the string to a new file with this.
=================================================
fs2 = New FileStream(OutFile, System.IO.FileMode.Create, _
    System.IO.FileAccess.Write, IO.FileShare.Write)
sw = New StreamWriter(fs2, System.Text.Encoding.ASCII)

sw.Write(sANSIString)
=================================================

NOTE: Initially, I don't think my stream writer specified
an encoding.

Anyway here is the problem....

The resulting file ends up with a different value in the
places where the hex 'BA' used to be. I've played with
various combinations of encodings for both reading and
writing, and I'm not able to maintain the character.
I need to maintain this!

In one case, the single-byte hex 'BA' is actually
replaced with two bytes, but everything else in the file
is as it should be. In another case, the character is
a "?". I don't remember what happens in other
situations, but in no case is the hex 'BA' maintained.
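To see where the two-byte and "?" results can come from, here is a minimal VB.NET sketch. It assumes the 0xBA byte has been read in as the character U+00BA (which is what a Latin-1 or Windows-1252 read produces; what the UTF-7 reader actually yields is not checked here), then encodes that one character three ways. The module and helper names are hypothetical.

```vbnet
Imports System
Imports System.Text

Module EncodingSymptoms
    ' Hypothetical helper: show bytes as hex pairs, e.g. "C2 BA".
    Function ToHex(bytes As Byte()) As String
        Return BitConverter.ToString(bytes).Replace("-", " ")
    End Function

    Sub Main()
        ' Assumption: the 0xBA byte was read in as the character U+00BA.
        Dim s As String = Convert.ToChar(&HBA).ToString()

        ' UTF-8 (StreamWriter's default) needs two bytes for U+0080..U+07FF.
        Console.WriteLine("UTF-8:  " & ToHex(Encoding.UTF8.GetBytes(s)))
        ' ASCII stops at 0x7F; the default fallback substitutes "?" (0x3F).
        Console.WriteLine("ASCII:  " & ToHex(Encoding.ASCII.GetBytes(s)))
        ' ISO-8859-1 (code page 28591) turns U+00BA back into the single byte 0xBA.
        Console.WriteLine("Latin1: " & ToHex(Encoding.GetEncoding(28591).GetBytes(s)))
    End Sub
End Module
```

If the assumption holds, the UTF-8 line shows C2 BA (two bytes), the ASCII line 3F ("?"), and only the Latin-1 line gives back the original BA.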

I don't really understand encoding, so that is only
compounding my frustration and confusion.

Any help is greatly appreciated. I could supply more
details, if necessary.

QM.
Jul 21 '05 #1
Not sure here, but here we go anyway....

Stop removing the CRs and LFs from the stream; the A in BA is actually an LF.
When you replace it, you're losing your ability to parse at that position.

cr = Carriage Return
lf = Line Feed

Bryan Martin
Sp**@ahwayside.com

Jul 21 '05 #2
Oh, and BTW, it seems your BA is represented as....

Hex B = Dec 11, which corresponds to vertical tab, mostly added by word processors.
Hex A = Dec 10, which corresponds to line feed.

Bryan Martin
Sp**@ahwayside.com


Jul 21 '05 #3
I wondered about doing that, and I did run the read/write
without removing the CR/LF.... same problems.
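As this test suggests, removing CR/LF is not what changes the 0xBA byte. One way around the problem, as an alternative to picking the "right" encoding, is to never decode the file at all: X12 delimiters are single bytes, so the CR (0x0D) and LF (0x0A) bytes can be dropped at the byte level and everything else, including 0xBA, is written back untouched. A minimal VB.NET sketch, assuming the file fits in memory and that CR and LF are the only bytes to remove; the module and function names are hypothetical, and the paths are taken from the command line only for illustration.

```vbnet
Imports System
Imports System.IO
Imports System.Linq

Module StripLineBreaks
    ' Hypothetical helper: remove CR (13) and LF (10) bytes, keep every other byte.
    Function RemoveCrLf(data As Byte()) As Byte()
        Return data.Where(Function(b) b <> 13 AndAlso b <> 10).ToArray()
    End Function

    Sub Main(args As String())
        ' Assumption: input and output paths are passed on the command line.
        Dim inFile As String = args(0)
        Dim outFile As String = args(1)

        ' No Encoding is involved, so a 0xBA delimiter byte is written back as 0xBA.
        Dim raw As Byte() = File.ReadAllBytes(inFile)
        File.WriteAllBytes(outFile, RemoveCrLf(raw))
    End Sub
End Module
```

For example, the input bytes 41 BA 0D 0A 42 come out as 41 BA 42.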


Jul 21 '05 #4
Hex 'BA' is Dec 186. Looks like the little Degree symbol.
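The arithmetic: the two hex digits together form one byte, B × 16 + A = 11 × 16 + 10 = 186, rather than two separate control characters (vertical tab and line feed). A short VB.NET check with a hypothetical module name; it also decodes the byte as ISO-8859-1 (code page 28591), where 186 is U+00BA, the masculine ordinal indicator º, which is easy to mistake for a small degree sign (°, U+00B0).

```vbnet
Imports System
Imports System.Text

Module HexCheck
    Sub Main()
        ' "BA" is one hexadecimal byte: 11 * 16 + 10.
        Dim value As Byte = Convert.ToByte("BA", 16)
        Console.WriteLine(value)                   ' 186
        Console.WriteLine(&HBA = 11 * 16 + 10)     ' True

        ' Decoding that single byte as ISO-8859-1 (code page 28591) gives U+00BA.
        Dim ch As Char = Encoding.GetEncoding(28591).GetString(New Byte() {value})(0)
        Console.WriteLine(Convert.ToInt32(ch).ToString("X4"))   ' 00BA
    End Sub
End Module
```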


Jul 21 '05 #5
Try setting the code page for the encoder, for example:

Encoding enc = Encoding.GetEncoding(1252);
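The snippet above is C#; the same idea in VB.NET, applied to both sides of the copy, might look like the sketch below. It assumes the file should be treated as single-byte text and uses ISO-8859-1 (code page 28591), which maps every byte 0x00-0xFF to the character with the same value, so any byte, including 0xBA, survives a decode/encode round trip as long as reader and writer use the same encoding. Code page 1252 also maps 0xBA to U+00BA; note that on .NET Core and .NET 5+ it is only available after registering CodePagesEncodingProvider. The module and method names are hypothetical.

```vbnet
Imports System
Imports System.IO
Imports System.Text
Imports Microsoft.VisualBasic

Module SingleByteCopy
    ' Hypothetical helper: decode and re-encode with the SAME single-byte encoding.
    Sub CopyWithoutLineBreaks(inFile As String, outFile As String)
        ' ISO-8859-1 maps byte N to character U+00NN for every N from 0 to 255,
        ' so decoding and re-encoding gives back the original bytes.
        Dim latin1 As Encoding = Encoding.GetEncoding(28591)

        Dim text As String
        ' False: do not let a byte-order mark switch the reader to another encoding.
        Using sr As New StreamReader(inFile, latin1, False)
            text = sr.ReadToEnd()
        End Using

        ' Same CR/LF removal as in the original program.
        text = text.Replace(vbCr, "").Replace(vbLf, "")

        ' Write with the same encoding; Latin-1 has no byte-order mark to prepend.
        Using sw As New StreamWriter(outFile, False, latin1)
            sw.Write(text)
        End Using
    End Sub

    Sub Main(args As String())
        ' Assumption: input and output paths are passed on the command line.
        CopyWithoutLineBreaks(args(0), args(1))
    End Sub
End Module
```

The key design point is symmetry: mixing a UTF-7 reader with an ASCII or UTF-8 writer is what lets the byte change on the way through.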


Jul 21 '05 #6
quincy <qm*******@yahoo.com> wrote:
> sr = New StreamReader(fs, System.Text.Encoding.UTF7)
>
> InputString = sr.ReadToEnd


As I said before, if you use UTF7 any bytes which were 0xba won't be
decoded properly, because UTF7 doesn't have any character which is
encoded to hex 0xba.
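A quick way to see this: UTF-7 is designed so that its encoded form uses only 7-bit bytes (0x00-0x7F), so a byte like 0xBA can never appear in valid UTF-7, and a character such as U+00BA is written as an ASCII base64 escape instead. A minimal VB.NET check with a hypothetical module name; Encoding.UTF7 is marked obsolete in .NET 5 and later, so expect a compiler warning there.

```vbnet
Imports System
Imports System.Linq
Imports System.Text

Module Utf7Check
    Sub Main()
        ' "A", then U+00BA, then "B".
        Dim s As String = "A" & Convert.ToChar(&HBA).ToString() & "B"

        ' Encoding.UTF7 is obsolete in .NET 5+; it still compiles, with a warning.
        Dim utf7 As Byte() = Encoding.UTF7.GetBytes(s)

        ' UTF-7 output is 7-bit only, so no byte can ever be 0xBA.
        Console.WriteLine(utf7.All(Function(b) b < &H80))   ' True
        ' The non-ASCII character appears as a base64 escape starting with "+".
        Console.WriteLine(Encoding.ASCII.GetString(utf7))
    End Sub
End Module
```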

Please read the full response I posted when you asked the same question
(without the encoding on the writing side) 5 days ago.

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
Jul 21 '05 #7
