Bytes IT Community

extracting the text out of a binary file

RSH
Hi,

I have quite a few .DAT data files that I need to extract the data from.
When I open the files in a text editor I can see all of the text I need to
get at, BUT there are a lot of junk (binary?) characters and whitespace in
illogical positions.

Here is a small sample of what the data looks like:

0~ 0501101010512505011132451235 >    
  X 3> B
 

MICHAEL B SMITH, DVM PC 123 MAIN ST., STE 600 DALLAS, TX 75252
MICHAEL SMITH 
s ' 'MICHAEL B NORTON, DVM PC 123 MAIN ST., STE 600 DALLAS, TX
75252
ZkM   B
< ' '
 ' 
'  ' '  ' '
B
 -D   '
B
Is it possible to filter out all the junk characters and just get at the
text? If so, could you provide a sample of how?

I tried doing a string replace, but the number of characters in the array
was getting way out of control. I was wondering if the BinaryReader class
might be what I'm looking for, but I have no idea what it really does.
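For reference, BinaryReader (in System.IO) reads raw bytes and primitive values such as Int32 from a stream without decoding them as text, so nothing in the file is altered or dropped on the way in. A minimal sketch, assuming the whole file fits in memory; the module and function names are hypothetical:

```vbnet
Imports System
Imports System.IO

Module BinaryDump
    ' Hypothetical helper: read every byte of a file through BinaryReader.
    ' No text decoding happens, so bytes such as 0 or 255 come back unchanged.
    Public Function ReadRawBytes(path As String) As Byte()
        Using reader As New BinaryReader(File.OpenRead(path))
            Return reader.ReadBytes(CInt(reader.BaseStream.Length))
        End Using
    End Function

    ' Hypothetical helper: show the first count bytes as hex, e.g. "4D-00-49".
    Public Function HexPreview(data As Byte(), count As Integer) As String
        Dim n As Integer = Math.Min(count, data.Length)
        Return BitConverter.ToString(data, 0, n)
    End Function
End Module
```

Looking at a hex preview of the first few dozen bytes may help show which of the "junk" characters are nulls, lengths or other binary fields rather than text.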

Thanks!
RSH
Nov 21 '05 #1


You can filter the string so that it only accepts a range of characters,
such as numbers and letters.
ASCII table: http://www.lookuptables.com/
But why are you reading binary files with text-file methods?
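To illustrate the point about text methods: the range filter can be applied to the raw bytes themselves, so no text Encoding gets a chance to replace or merge bytes first. A minimal sketch, assuming the readable data is plain single-byte ASCII; the names are hypothetical, and the kept range (printable ASCII plus CR and LF) is a choice you can adjust:

```vbnet
Imports System
Imports System.Text

Module ByteFilter
    ' Keep only printable ASCII (32-126) plus CR (13) and LF (10).
    ' Working on bytes avoids any text decoding, so no Encoding can
    ' turn a byte into "?" before the filter sees it.
    Public Function KeepPrintableAscii(data As Byte()) As String
        Dim sb As New StringBuilder(data.Length)
        For Each b As Byte In data
            If (b >= 32 AndAlso b <= 126) OrElse b = 10 OrElse b = 13 Then
                sb.Append(Convert.ToChar(b))
            End If
        Next
        Return sb.ToString()
    End Function
End Module
```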
--
Saber S.

Nov 21 '05 #2

RSH
Primarily I just need to get at the data, which is all there, but
unfortunately there are thousands of useless characters mixed in with the
text. I tried using Regex to strip the useless characters, but everything
disappears when I do, which leads me to believe it is an encoding mismatch.
If you look at the sample in my first post, you will see what is happening.
The files are from an old Micro Focus ISAM database, for which no simple
database viewer exists. Many products are out there, but they cost several
thousand dollars, which is way out of our range.
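If an encoding mismatch is the suspect, one way to rule it out is to decode the bytes as ISO-8859-1 (Latin-1), which maps each of the 256 byte values to exactly one character, and only then apply the Regex. A minimal sketch, assuming the wanted text is plain ASCII; the module and function names and the exact character class are assumptions, not a tested fix for these particular files:

```vbnet
Imports System.IO
Imports System.Text
Imports System.Text.RegularExpressions

Module RegexCleaner
    ' ISO-8859-1 (Latin-1) maps every byte 0-255 to exactly one character,
    ' so decoding never fails, merges bytes or substitutes "?".
    Private ReadOnly Latin1 As Encoding = Encoding.GetEncoding("iso-8859-1")

    ' Remove everything except printable ASCII, CR and LF.
    Public Function StripNonPrintable(text As String) As String
        Return Regex.Replace(text, "[^\x20-\x7E\r\n]", "")
    End Function

    ' Hypothetical wrapper: decode the whole file as Latin-1, then clean it.
    Public Function CleanFile(path As String) As String
        Return StripNonPrintable(Latin1.GetString(File.ReadAllBytes(path)))
    End Function
End Module
```

File.ReadAllBytes is used instead of File.ReadAllText(path, encoding) because ReadAllText looks for a byte order mark and may switch encodings if a binary file happens to start with one.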



Nov 21 '05 #3

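I mean using a *range*. For example, imagine you've stored all of the data
in a string named oldStr and you want to drop the ASCII characters other
than numbers and letters. This loop keeps only characters whose code is
between 48 ("0") and 172, so control characters, spaces and the punctuation
below "0" are removed, while a few symbols such as ":" and "@" still get
through:

Dim newStr As New System.Text.StringBuilder(oldStr.Length)
For i As Integer = 0 To oldStr.Length - 1
    Dim c As Char = oldStr.Chars(i)
    ' Keep only characters with codes 48 ("0") through 172
    If AscW(c) >= 48 AndAlso AscW(c) <= 172 Then
        newStr.Append(c)
    End If
Next
' newStr.ToString() now holds the filtered text

You can adjust the If statement to get the result you want.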
--
Saber S.




Nov 21 '05 #4

RSH
Oh, that's good... that worked, BUT is there any way to keep single spaces?
In other words, if there are two or more spaces next to each other they can
be eliminated... and I would like to keep the linefeeds too.

Thanks a lot!
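One way to do all three in a single pass (drop the junk, squeeze runs of spaces down to one, keep CR/LF line breaks) is to remember whether the last character kept was a space. A minimal sketch, assuming the text has already been read into a String and that printable ASCII is what you want to keep; the names are hypothetical:

```vbnet
Imports System
Imports System.Text

Module SpaceCollapser
    ' One pass over the decoded text:
    '  - keeps printable ASCII (32-126), CR (13) and LF (10)
    '  - drops every other character
    '  - writes a space only if the previous kept character was not a space,
    '    so runs of spaces (even with junk between them) shrink to one space
    Public Function CleanKeepingLines(text As String) As String
        Dim sb As New StringBuilder(text.Length)
        Dim lastWasSpace As Boolean = False
        For Each c As Char In text
            Dim code As Integer = Convert.ToInt32(c)
            If c = " "c Then
                If Not lastWasSpace Then sb.Append(c)
                lastWasSpace = True
            ElseIf (code > 32 AndAlso code <= 126) OrElse code = 10 OrElse code = 13 Then
                sb.Append(c)
                lastWasSpace = False
            End If
        Next
        Return sb.ToString()
    End Function
End Module
```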



Nov 21 '05 #5

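I found a tricky way around your "spacing" problem, but there may be better
ways. This version also lets spaces (code 32) through and drops a space
whenever the next character is another space, so each run of spaces shrinks
to one:

Dim newStr As New System.Text.StringBuilder(oldStr.Length)
For i As Integer = 0 To oldStr.Length - 1
    Dim c As Char = oldStr.Chars(i)
    If AscW(c) >= 32 AndAlso AscW(c) <= 172 Then
        ' Bounds check: the last character has no next character
        Dim nextIsSpace As Boolean = i + 1 < oldStr.Length AndAlso oldStr.Chars(i + 1) = " "c
        If Not (c = " "c AndAlso nextIsSpace) Then newStr.Append(c)
    End If
Next

As for the linefeeds: a linefeed is character code 10, so the range check
above removes it along with the other control characters, and I'm not sure
what to do about them. Please send me a sample of your files (if they are
smaller than 1 MB!) and the code you use to read them.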
--
Saber S.



Nov 21 '05 #6

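A better loop is:

Dim newStr As New System.Text.StringBuilder(oldStr.Length)
' Run to Length - 1 so the final character is also checked
For i As Integer = 0 To oldStr.Length - 1
    Dim c As Char = oldStr.Chars(i)
    If AscW(c) >= 32 AndAlso AscW(c) <= 172 Then
        ' Skip c if it starts a pair of spaces (the last character has no pair) *
        Dim startsDoubleSpace As Boolean = i < oldStr.Length - 1 AndAlso oldStr.Substring(i, 2) = "  "
        If Not startsDoubleSpace Then newStr.Append(c)
    End If
Next

* There are two space characters between the double quotes ("  ").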
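As an alternative to index arithmetic, a regular expression can collapse runs of spaces in one call. A minimal sketch; the module and function names are hypothetical. The pattern " {2,}" matches only the space character, so CR and LF are left alone:

```vbnet
Imports System.Text.RegularExpressions

Module RegexSpaces
    ' Replace every run of two or more spaces with a single space.
    ' Tabs, CR and LF are not matched, so line breaks survive.
    Public Function CollapseSpaces(text As String) As String
        Return Regex.Replace(text, " {2,}", " ")
    End Function
End Module
```

The order matters if you combine this with a junk filter: strip the junk first, because spaces separated only by junk characters become adjacent only after the junk is gone.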

--
Saber S.



Nov 21 '05 #7
