Writing out text with nulls

I have a program in 2005 that reads a text file, removes some text, and then
writes it back out again. It removes lines that start with PRINT.

This program has worked fine for months. Now, all of a sudden, it is reading
a straight text file and adding a null after each character it reads in.
Why is that?

The original file doesn't have nulls in it. The code is:
********************************************
using System;
using System.IO;
using System.Collections.Generic;
using System.Text;

namespace DeletePrintStatements
{
    class Program
    {
        static void Main(string[] args)
        {
            string lineDisplay;
            string oldLineDisplay;
            FileStream fs = null;

            StreamReader sr = null;
            fs = new FileStream(@"D:\Database Scripts\CurrentSchema101408.sql",
                                FileMode.Open, System.IO.FileAccess.Read);
            sr = new StreamReader(fs);

            StreamWriter sw = null;
            sw = File.CreateText(@"D:\Database Scripts\CurrentSchemaNoPrint101408.sql");

            string stemp = null;
            sw.WriteLine("set nocount on");

            while (sr.Peek() >= 0)
            {
                lineDisplay = sr.ReadLine();

                if (lineDisplay.Length >= 4)
                    stemp = lineDisplay.Substring(0, 4);

                if ((lineDisplay.Length < 5) || (lineDisplay.Substring(0, 5) != "PRINT"))
                    sw.WriteLine(lineDisplay);
                else
                {
                    // Since last line was not a Print statement make sure
                    // next line is = "GO" and if so ignore it

                    if (sr.Peek() >= 0)
                    {
                        oldLineDisplay = lineDisplay;
                        lineDisplay = sr.ReadLine();
                        if ((lineDisplay.Length < 2) ||
                            (lineDisplay.Substring(0, 2) != "GO"))
                        {
                            // Should only be the "Update Succeeded" line
                            // or a print statement inside of a SP
                            sw.WriteLine(oldLineDisplay);
                            sw.WriteLine(lineDisplay);
                        }
                    }
                }
                Console.WriteLine(lineDisplay);
            }
            fs.Close();
            sr.Close();
            sw.Close();
            Console.ReadLine();
        }
    }
}
********************************************

I have tried closing an reopening the program but it keeps doing the same
thing.

Thanks,

Tom
Oct 14 '08 #1
7 Replies


The output you describe is what Unicode characters would look like.
Maybe your project changed from multi-byte to Unicode.
Oct 14 '08 #2
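
To illustrate the reply above, here is a minimal sketch (not code from the thread) showing that plain ASCII text encoded as UTF-16 gets a zero byte after every character, while the same text encoded as UTF-8 does not. The class and method names are made up for this example.

```csharp
using System;
using System.Text;

static class EncodingBytesDemo
{
    // Returns the bytes of `text` in the given encoding as space-separated hex.
    public static string ToHex(string text, Encoding encoding)
    {
        byte[] bytes = encoding.GetBytes(text);
        return BitConverter.ToString(bytes).Replace('-', ' ');
    }

    static void Main()
    {
        Console.WriteLine(ToHex("SET", Encoding.UTF8));    // 53 45 54
        Console.WriteLine(ToHex("SET", Encoding.Unicode)); // 53 00 45 00 54 00
    }
}
```

Encoding.Unicode is the .NET name for little-endian UTF-16, which is why the zero byte follows each character rather than preceding it.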

I'm not sure where your text file is coming from, but I've had similar
problems. One of the problems I ran into is that there are characters
which neither Visual Studio (in general) nor many text editors can
render. The way I ended up finding out that my text file was bad was
to open it in Programmer's Notepad or the command prompt edit
window.

To edit these characters out, I had to do a string.Replace for each
of them using its integer value. It's something like
character values 1 - 26 that cannot be rendered by normal text editors.

This may or may not be your problem, but I figured I'd offer at least
an idea.
Oct 14 '08 #3
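
The cleanup described in the reply above could be sketched roughly as follows. This is an illustration under assumptions, not the poster's actual code: it drops every character below 32 except tab, CR and LF (a wider range than the "1 - 26" mentioned), and ControlCharFilter and StripControlChars are hypothetical names.

```csharp
using System;
using System.Text;

static class ControlCharFilter
{
    // Removes control characters (values below 32) except tab, CR and LF.
    // Assumption: the text is otherwise plain ASCII.
    public static string StripControlChars(string line)
    {
        StringBuilder sb = new StringBuilder(line.Length);
        foreach (char c in line)
        {
            if (c >= 32 || c == '\t' || c == '\r' || c == '\n')
                sb.Append(c);
        }
        return sb.ToString();
    }

    static void Main()
    {
        string damaged = "S\0E\0T\0";
        Console.WriteLine(StripControlChars(damaged)); // SET
    }
}
```

Note that this also strips the NUL characters seen in this thread, which hides an encoding mismatch rather than fixing it, so it is probably only appropriate when the stray characters really are garbage.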

Here is the file I am reading:

SET NUMERIC_ROUNDABORT OFF
GO
SET ANSI_PADDING, ANSI_WARNINGS, CONCAT_NULL_YIELDS_NULL, ARITHABORT, QUOTED_IDENTIFIER, ANSI_NULLS ON
GO

Here is what it comes up with:

  0: 73 65 74 20 6E 6F 63 6F 75 6E 74 20 6F 6E 0D 0A  set nocount on..
 10: 53 00 45 00 54 00 20 00 4E 00 55 00 4D 00 45 00  S.E.T. .N.U.M.E.
 20: 52 00 49 00 43 00 5F 00 52 00 4F 00 55 00 4E 00  R.I.C._.R.O.U.N.
 30: 44 00 41 00 42 00 4F 00 52 00 54 00 20 00 4F 00  D.A.B.O.R.T. .O.
 40: 46 00 46 00 0D 0A 00 0D 0A 00 47 00 4F 00 0D 0A  F.F.......G.O...
 50: 00 0D 0A 00 53 00 45 00 54 00 20 00 41 00 4E 00  ....S.E.T. .A.N.
 60: 53 00 49 00 5F 00 50 00 41 00 44 00 44 00 49 00  S.I._.P.A.D.D.I.
 70: 4E 00 47 00 2C 00 20 00 41 00 4E 00 53 00 49 00  N.G.,. .A.N.S.I.
 80: 5F 00 57 00 41 00 52 00 4E 00 49 00 4E 00 47 00  _.W.A.R.N.I.N.G.
 90: 53 00 2C 00 20 00 43 00 4F 00 4E 00 43 00 41 00  S.,. .C.O.N.C.A.
 A0: 54 00 5F 00 4E 00 55 00 4C 00 4C 00 5F 00 59 00  T._.N.U.L.L._.Y.
 B0: 49 00 45 00 4C 00 44 00 53 00 5F 00 4E 00 55 00  I.E.L.D.S._.N.U.
 C0: 4C 00 4C 00 2C 00 20 00 41 00 52 00 49 00 54 00  L.L.,. .A.R.I.T.
 D0: 48 00 41 00 42 00 4F 00 52 00 54 00 2C 00 20 00  H.A.B.O.R.T.,. .
 E0: 51 00 55 00 4F 00 54 00 45 00 44 00 5F 00 49 00  Q.U.O.T.E.D._.I.
 F0: 44 00 45 00 4E 00 54 00 49 00 46 00 49 00 45 00  D.E.N.T.I.F.I.E.
100: 52 00 2C 00 20 00 41 00 4E 00 53 00 49 00 5F 00  R.,. .A.N.S.I._.
110: 4E 00 55 00 4C 00 4C 00 53 00 20 00 4F 00 4E 00  N.U.L.L.S. .O.N.
120: 0D 0A 00 0D 0A 00 47 00 4F 00 0D 0A 00 0D 0A 00  ......G.O.......
130: 0D 0A                                            ..

As you can see, the line that was added (set nocount on) doesn't have nulls, but the lines that were read in do.

What would cause this?

Thanks,

Tom

Oct 14 '08 #4

Please do not post HTML. Use plain text. As for the question...

On Tue, 14 Oct 2008 13:14:59 -0700, tshad <tf*@dslextreme.com> wrote:
> Here is the file I am reading: [...]

Where did that file come from? As Jim suggested, the text with the 0
bytes does in fact look like Unicode characters (UTF-16, to be specific).
The bytes you posted mix UTF-8 and UTF-16 (UTF-8 is the default for
StreamWriter, and as long as the characters are all in the 0-127 range
it will be indistinguishable from ASCII), because you're reading UTF-16 data
from the original file and emitting that data as if it were UTF-8 (along
with the other UTF-8 stuff you've added, such as the first line and the
line breaks).

Whatever the problem is, it's related to whatever outputs the file you're
reading. Somewhere along the line, it apparently got changed to output
UTF-16. You can either fix your program to read the input as UTF-16
instead, or you can go smack upside the head whatever person it was that
changed the output format without consulting the people it would affect
(such as yourself), and then get them to change it back so that they are
writing UTF-8 or ASCII again (whatever it was that was being written in
the first place).

Pete
Oct 14 '08 #5
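
The mixed output explained above can be reproduced in memory. The following is a sketch under assumptions, not the original program: it assumes the input is UTF-16 without a byte order mark, uses in-memory streams in place of the files, and MixedEncodingRepro and RoundTrip are hypothetical names.

```csharp
using System;
using System.IO;
using System.Text;

static class MixedEncodingRepro
{
    // Reads the input bytes line by line using readerEncoding, then writes each
    // line through a StreamWriter with its default encoding (UTF-8, no BOM).
    // Returns the bytes written, formatted as hex.
    public static string RoundTrip(byte[] input, Encoding readerEncoding)
    {
        MemoryStream output = new MemoryStream();
        using (StreamReader sr = new StreamReader(new MemoryStream(input), readerEncoding))
        using (StreamWriter sw = new StreamWriter(output))
        {
            sw.NewLine = "\r\n"; // fixed so the result is the same on every platform
            string line;
            while ((line = sr.ReadLine()) != null)
                sw.WriteLine(line);
        }
        return BitConverter.ToString(output.ToArray()).Replace('-', ' ');
    }

    static void Main()
    {
        // "GO" plus a line break, encoded as UTF-16 without a byte order mark.
        byte[] utf16 = Encoding.Unicode.GetBytes("GO\r\n");

        // Decoded as UTF-8: every 00 byte becomes a NUL character, and CR and LF
        // each end a line on their own because a NUL sits between them.
        Console.WriteLine(RoundTrip(utf16, Encoding.UTF8));
        // 47 00 4F 00 0D 0A 00 0D 0A 00 0D 0A

        // Decoded as UTF-16: the text round-trips cleanly.
        Console.WriteLine(RoundTrip(utf16, Encoding.Unicode));
        // 47 4F 0D 0A
    }
}
```

The first result has the same shape as the tail of the posted dump: NUL bytes survive from the input, while the line breaks are the writer's own two-byte CR LF.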

Found out what was going on. Just not sure why.

The file seems to be written out in Unicode (the hex dump shows it that way), but
the program sees it as ANSI (UTF-8, I assume). And the program handles it fine.

But if I make any change to the file (in TextPad or Notepad), each character
now has a blank character after it when the program writes it out.
When you then look at the output in TextPad, it shows a black box between
characters, and Notepad shows a blank between characters.

I'm not sure why they are different. In both cases, there were nulls between
the characters, but the editors treated them differently.

Tom

Oct 14 '08 #6

On Tue, 14 Oct 2008 14:55:14 -0700
"tshad" <tf*@dslextreme.com> wrote:
> But if I make any change (textpad or notepad) it now shows each
> character as having a blank character after it when it writes it
> out. Then when you look at it in Textpad it shows a black box between
> each character and Notepad shows a blank between each character.
>
> Not sure why they are different. In both cases, there were nulls
> between each character. But the editors treated them differently.

The text editor is probably set up to use UTF-16 encoding for
characters. Per MSDN, UTF-16 is the internal encoding used in Windows
and .NET [1]; Java uses it as well, IIRC. The editor could be saving the
file that way if the system configuration has somehow changed,
but I don't know what would be involved in such a thing.

In any case, if you can manage it, you should probably try to
detect the character set of the file before processing it, so that your
program can handle it appropriately. UTF-16 is pretty easy to detect
for documents whose characters mostly or completely fit in
the ASCII character set, and most ASCII-compatible charsets are detectable
if you know their rules; ASCII-compatible charsets use 0-127
identically to ASCII. You could, in theory, detect UTF-16 and
compensate for that, and otherwise just read bytes in the printable ASCII
range (32-126, plus tabs and line breaks), as a (very simple, but not
terribly robust) way of dealing with files that may have an arbitrary charset.

--- Mike

--
My sigfile ran away and is on hiatus.

Oct 14 '08 #7
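
The detection idea in the reply above might look something like this. It is a rough heuristic sketch, not a robust detector: it only considers little-endian UTF-16, the 90% threshold is an arbitrary assumption, and EncodingSniffer, LooksLikeUtf16Le and Guess are hypothetical names.

```csharp
using System;
using System.Text;

static class EncodingSniffer
{
    // Heuristic: true if the sample starts with the FF FE byte order mark, or
    // if at least 90% of its byte pairs are "non-zero byte, zero byte", which
    // is what mostly-ASCII text looks like in little-endian UTF-16.
    public static bool LooksLikeUtf16Le(byte[] sample)
    {
        if (sample.Length >= 2 && sample[0] == 0xFF && sample[1] == 0xFE)
            return true;

        int pairs = sample.Length / 2;
        if (pairs == 0)
            return false;

        int asciiLikePairs = 0;
        for (int i = 0; i < pairs; i++)
        {
            if (sample[2 * i] != 0 && sample[2 * i + 1] == 0)
                asciiLikePairs++;
        }
        return asciiLikePairs * 10 >= pairs * 9;
    }

    // Falls back to UTF-8 (a superset of ASCII) when the sample does not look like UTF-16.
    public static Encoding Guess(byte[] sample)
    {
        return LooksLikeUtf16Le(sample) ? Encoding.Unicode : Encoding.UTF8;
    }

    static void Main()
    {
        byte[] utf16 = Encoding.Unicode.GetBytes("SET NOCOUNT ON");
        byte[] ascii = Encoding.ASCII.GetBytes("SET NOCOUNT ON");
        Console.WriteLine(LooksLikeUtf16Le(utf16)); // True
        Console.WriteLine(LooksLikeUtf16Le(ascii)); // False
    }
}
```

A caller could read the first few hundred bytes of the file, pass them to Guess, and hand the result to the StreamReader constructor. When the file does carry a byte order mark, StreamReader can detect it without any of this.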

Your problem seems to be the file format.

Try
sw = new StreamWriter(@"D:\Database Scripts\CurrentSchemaNoPrint101408.sql", false, System.Text.Encoding.UTF8);
You can do the same with the reader: always specify the encoding you
are reading when the file is not binary.
System.Text.Encoding also contains other encodings, such as ASCII, Unicode
(UTF-16), and more. Choose one and stick with it.

But those are SQL queries, so they shouldn't be using anything other than
ASCII or UTF-8. And right now the file your code reads seems to be UTF-16.
Oct 15 '08 #8
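
Putting the advice above together, a sketch of the filter with explicit encodings on both ends might look like this. It is not the original poster's fixed program: it assumes the input is UTF-16 and the output should be UTF-8, takes the two paths from the command line instead of hard-coding them, and PrintFilter and Filter are hypothetical names. The filtering rule follows the posted code: a PRINT line is dropped together with a following GO, and otherwise both lines are kept.

```csharp
using System;
using System.IO;
using System.Text;

static class PrintFilter
{
    // Copies lines from reader to writer, dropping each PRINT line that is
    // immediately followed by a GO line (both lines are dropped).
    public static void Filter(TextReader reader, TextWriter writer)
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            if (!line.StartsWith("PRINT", StringComparison.Ordinal))
            {
                writer.WriteLine(line);
                continue;
            }

            string next = reader.ReadLine();
            if (next == null)
                break; // a PRINT on the very last line is dropped, as in the posted code

            if (!next.StartsWith("GO", StringComparison.Ordinal))
            {
                writer.WriteLine(line);
                writer.WriteLine(next);
            }
        }
    }

    static void Main(string[] args)
    {
        if (args.Length != 2)
        {
            Console.Error.WriteLine("usage: PrintFilter <input.sql> <output.sql>");
            return;
        }

        // Explicit encodings: read UTF-16 (assumed), write UTF-8.
        using (StreamReader sr = new StreamReader(args[0], Encoding.Unicode))
        using (StreamWriter sw = new StreamWriter(args[1], false, Encoding.UTF8))
        {
            sw.WriteLine("set nocount on");
            Filter(sr, sw);
        }
    }
}
```

Passing Encoding.UTF8 to StreamWriter writes a byte order mark at the start of the file; if the consumer of the script does not expect one, new UTF8Encoding(false) can be passed instead.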
