Array Problem When Index Value is Nothing

Hi all,

I ran into memory problems while trying to search and replace in a very large
text file. To solve this I break the file up into chunks and run the search
and replace on each chunk. This works fine and has solved the OutOfMemory
problem.

However, on the last loop, when the array c is written to CleanTMX, a number
of 0x00 characters are written at the end of the file. This causes problems
in a later XML transformation, as this character is not allowed in XML. I
looked at the array values. The problem seems to be caused by the elements
at the end of the array being set to Nothing.

Question: How can I get rid of these characters? Or how can I reduce the
array so that it only contains elements that are not Nothing?

Here's the code that writes the CleanTMX file:

    ' ReadChunkSize is a user-defined setting, normally set to 10000
    Dim c(My.Settings.ReadChunkSize) As Char

    Using sr As StreamReader = New StreamReader(OriginalTMX, System.Text.Encoding.UTF8, True)
        Do While sr.Peek() >= 0
            sr.Read(c, 0, c.Length)
            Dim i As Integer
            For i = 0 To arrFind.Length - 1
                c = Regex.Replace(c, arrFind(i), arrReplace(i))
            Next
            Try
                Using sw As StreamWriter = New StreamWriter(CleanTMX, True, System.Text.Encoding.UTF8)
                    sw.Write(c)
                End Using
            Catch ex As Exception
            End Try
        Loop
    End Using

Would really appreciate any help on this one.

Thanx

Rob
Jun 27 '08 #1
3 Replies


As far as I know, sr.Read returns the number of characters it actually read,
which can be less than the size of the array (on the last chunk, for
example). Hence, you have to write only as many characters as have been read:

    Dim charCount As Integer

    charCount = sr.Read(c, 0, c.Length)
    ...
    sw.Write(c, 0, charCount)

I think this explains the additional characters.
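
For reference, a minimal sketch of the whole loop with that change applied. It is an illustration under assumptions, not the original poster's code: CopyInChunks, sourcePath, targetPath and chunkSize are hypothetical names, and the search-and-replace step is left as a comment. The returned count is used to build a String from the filled part of the buffer, because the text length can change once replacements run, and the writer is opened once instead of on every pass.

```vbnet
Imports System
Imports System.IO
Imports System.Text

Module ChunkCopy
    ' Copies sourcePath to targetPath in fixed-size chunks, passing on only
    ' the characters that Read actually placed in the buffer.
    ' All names here are hypothetical and chosen for illustration.
    Sub CopyInChunks(ByVal sourcePath As String, ByVal targetPath As String, ByVal chunkSize As Integer)
        ' VB upper bounds are inclusive, so this array has chunkSize elements.
        Dim buffer(chunkSize - 1) As Char

        Using sr As New StreamReader(sourcePath, Encoding.UTF8, True)
            Using sw As New StreamWriter(targetPath, False, Encoding.UTF8)
                Dim charCount As Integer = sr.Read(buffer, 0, buffer.Length)
                Do While charCount > 0
                    ' Use only the filled part of the buffer.
                    Dim chunk As String = New String(buffer, 0, charCount)
                    ' ... run the search and replace on chunk here ...
                    sw.Write(chunk)
                    charCount = sr.Read(buffer, 0, buffer.Length)
                Loop
            End Using
        End Using
    End Sub

    Sub Main()
        Dim source As String = Path.GetTempFileName()
        Dim target As String = Path.GetTempFileName()
        File.WriteAllText(source, "Robert Bevington", Encoding.UTF8)

        ' With a chunk size of 10 the second read returns only 6 characters.
        CopyInChunks(source, target, 10)
        Console.WriteLine(File.ReadAllText(target, Encoding.UTF8))   ' Robert Bevington

        File.Delete(source)
        File.Delete(target)
    End Sub
End Module
```

Looping on the return value of Read also removes the need for the Peek test, since Read returns 0 at the end of the stream.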

However, reading independent chunks has a second problem: you will not
recognize search strings that are split across chunk boundaries. For example,

chunk #1: "Robert B"
chunk #2: "evington"

Neither chunk contains "Bev", so the search never finds it.

To handle this you would have to reposition the file pointer after reading
a chunk, so that consecutive chunks overlap. I'm not sure if that's possible
using the StreamReader because of its internal buffer, so you might have to
use a BinaryReader and do the UTF-8 decoding on your own, which lets you set
the file pointer backwards.

Armin

Jun 27 '08 #2

Can't you just use ReDim Preserve to reduce the array size and get rid of the
0x00 entries?
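
A small sketch of the ReDim Preserve idea, assuming the count returned by sr.Read has been stored in a variable. Here charCount is a hypothetical name and its value is hard-coded for the demo.

```vbnet
Imports System

Module TrimBuffer
    Sub Main()
        Dim c(9) As Char                 ' 10 elements, all initially ChrW(0)
        Dim sample As String = "Rob "
        Dim charCount As Integer = 4     ' stand-in for the value returned by sr.Read

        ' Fill the first four elements, as a short final read would.
        sample.CopyTo(0, c, 0, charCount)

        ' Keep the first charCount elements and drop the unfilled tail.
        ReDim Preserve c(charCount - 1)

        Console.WriteLine(c.Length)      ' 4
    End Sub
End Module
```

ReDim Preserve allocates a new array and copies the kept elements into it. If the same array is reused for the next read, it stays at the reduced size, so this fits best on the last, partially filled chunk.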

Armin is correct that you'll miss matches on chunk boundaries, BTW. One
solution is to use the 'c' array as a buffer: append newly read characters
to the end, move characters from the beginning to the output stream, and
always leave at least n characters in 'c', where n is the length of the
longest string you are looking for, minus one.
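
The buffering idea above can be sketched as follows. This is a minimal illustration under stated assumptions, not a drop-in fix: it handles one literal search string (not a regular expression, whose match length may not be bounded), and ReplaceInChunks and its parameters are hypothetical names. It takes a TextReader and TextWriter, so a StreamReader and StreamWriter can be passed in.

```vbnet
Imports System
Imports System.IO

Module BoundarySafeReplace
    ' Replaces every occurrence of find (a literal string) while reading the
    ' input in chunks. Up to find.Length - 1 characters are held back after
    ' each chunk, so a match split across two chunks is still seen.
    Sub ReplaceInChunks(ByVal reader As TextReader, ByVal writer As TextWriter, _
                        ByVal find As String, ByVal replacement As String, _
                        ByVal chunkSize As Integer)
        If find.Length = 0 Then Throw New ArgumentException("find must not be empty")

        Dim buffer(chunkSize - 1) As Char
        Dim pending As String = ""       ' held-back tail of the previous chunk
        Dim charCount As Integer = reader.Read(buffer, 0, buffer.Length)

        Do While charCount > 0
            pending &= New String(buffer, 0, charCount)

            ' Replace every match that lies completely inside pending.
            Dim pos As Integer = 0
            Dim hit As Integer = pending.IndexOf(find, pos, StringComparison.Ordinal)
            Do While hit >= 0
                writer.Write(pending.Substring(pos, hit - pos))
                writer.Write(replacement)
                pos = hit + find.Length
                hit = pending.IndexOf(find, pos, StringComparison.Ordinal)
            Loop

            ' Hold back the last find.Length - 1 unmatched characters: they
            ' might be the start of a match that continues in the next chunk.
            Dim keep As Integer = Math.Min(find.Length - 1, pending.Length - pos)
            writer.Write(pending.Substring(pos, pending.Length - pos - keep))
            pending = pending.Substring(pending.Length - keep)

            charCount = reader.Read(buffer, 0, buffer.Length)
        Loop

        writer.Write(pending)            ' flush what is left at end of input
    End Sub

    Sub Main()
        Dim output As New StringWriter()
        ' A chunk size of 8 splits the input as "Robert B" / "evington".
        ReplaceInChunks(New StringReader("Robert Bevington"), output, "Bev", "Kens", 8)
        Console.WriteLine(output.ToString())   ' Robert Kensington
    End Sub
End Module
```

Only text that has not been replaced yet is held back, so replacement output is never scanned a second time. Applying several search strings one after another would need one such pass per string, or a different design.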

--
David Streeter
Synchrotech Software
Sydney Australia
Jun 27 '08 #3

Hi Armin and Surtur,

Thanks, guys, for your replies. Reading that my "great" solution to my
problem didn't really work was a real downer for me :-) I was a broken man
last night and went straight to bed :-) But that's what happens when
beginners start programming, I suppose.

I tried the ReDim Preserve. That might solve the one problem. I just need to
find the correct value for the ReDim.

Surtur's solution sounds interesting too. I'll look into both.

Again thanx

Rob
Jun 27 '08 #4
