Hi,
I have quite a few .DAT data files that I need to extract data from.
When I open the files in a text editor I can see all of the text that I need,
BUT there are a lot of junk (binary?) characters and white space in
illogical positions.
Here is a small sample of what the data looks like:
0~ 0501101010512505011132451235 > ô ô
X 3> Bô
MICHAEL B SMITH, DVM PC 123 MAIN ST., STE 600 DALLAS, TX 75252
MICHAEL SMITH Õ
s ' 'MICHAEL B NORTON, DVM PC 123 MAIN ST., STE 600 DALLAS, TX
75252
ÌZkM Ô Bô
< Ô ' Ò '
'
' ' ' ' â '
Bô
-D Õ â '
Bô
Is it possible to filter out all the junk characters and just get at the
text? If so, could you provide a sample of how?
I tried doing a string replace, but the list of characters to replace was
getting way out of control. I was wondering if the BinaryReader class
might be what I'm looking for, but I have no idea what it really does.
Thanks!
RSH
You can filter the string so that it keeps only a range of characters, such as the numbers and letters.
ASCII table: http://www.lookuptables.com/
But why are you reading binary files with text file methods?
--
Saber S.
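On the BinaryReader question from the first post: BinaryReader hands the file back as raw bytes instead of decoded text, so no text encoding gets a chance to alter the data before it is filtered. What follows is only a minimal sketch, assuming the useful text is plain printable ASCII (byte values 32 to 126). The file name customers.dat and the ExtractPrintable helper are made up for illustration and do not come from the thread.

```vbnet
Imports System
Imports System.IO
Imports System.Text

Module ByteFilterSketch
    ' Keeps only printable ASCII bytes (32..126) and drops everything else.
    Function ExtractPrintable(data As Byte()) As String
        Dim sb As New StringBuilder(data.Length)
        For Each b As Byte In data
            If b >= 32 AndAlso b <= 126 Then
                sb.Append(Convert.ToChar(b))
            End If
        Next
        Return sb.ToString()
    End Function

    Sub Main()
        ' Placeholder file name (an assumption, not from the thread).
        Dim fileName As String = "customers.dat"
        If Not File.Exists(fileName) Then
            Console.WriteLine("File not found: " & fileName)
            Return
        End If
        ' Read the whole file as bytes; fine for files well under 2 GB.
        Using reader As New BinaryReader(File.OpenRead(fileName))
            Dim data As Byte() = reader.ReadBytes(CInt(reader.BaseStream.Length))
            Console.WriteLine(ExtractPrintable(data))
        End Using
    End Sub
End Module
```

Working on bytes also makes the filter independent of whichever code page the text editor happened to use when it displayed the file.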
Primarily I just need to get at the data, which is all there, but
unfortunately there are thousands of useless characters mixed in with the text. I
tried Regex to strip the useless characters, but everything disappears when I
try that, which leads me to believe it is an encoding mismatch. If you look
at the sample in my first post you will see what is happening. The
files are from an old Microfocus ISAM database, for which no simple database
viewer exists. Many products are out there, but they cost several thousand
dollars, which is way out of range for us.
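The thread never shows the Regex that was tried, so the following is only a guess at a variant that might behave better. The idea is to decode the bytes as ISO-8859-1, where each byte becomes the character with the same code, and then replace every run of unwanted characters with a single space. The StripJunk name and the customers.dat file name are assumptions made for illustration.

```vbnet
Imports System
Imports System.IO
Imports System.Text
Imports System.Text.RegularExpressions

Module RegexStripSketch
    ' Replaces each run of characters that are not printable ASCII,
    ' CR or LF with one space.
    Function StripJunk(raw As String) As String
        Return Regex.Replace(raw, "[^\x20-\x7E\r\n]+", " ")
    End Function

    Sub Main()
        ' Placeholder file name (an assumption, not from the thread).
        Dim fileName As String = "customers.dat"
        If Not File.Exists(fileName) Then
            Console.WriteLine("File not found: " & fileName)
            Return
        End If
        ' ISO-8859-1 maps every byte 0..255 to the same code point,
        ' so no byte is lost or merged during decoding.
        Dim latin1 As Encoding = Encoding.GetEncoding("iso-8859-1")
        Dim raw As String = latin1.GetString(File.ReadAllBytes(fileName))
        Console.WriteLine(StripJunk(raw))
    End Sub
End Module
```

Decoding from ReadAllBytes instead of ReadAllText avoids byte-order-mark detection, which could otherwise switch the encoding if the file happens to start with bytes that look like a BOM.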
I mean using a *range*. For example, imagine you've stored the whole data in a
string named oldStr and you want to drop every ASCII character other than
numbers
and letters:
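Dim newStr As String = ""
Dim i As Integer
' Keep characters from Chr(48) ("0") up to Chr(172); everything else is dropped.
For i = 0 To oldStr.Length - 1
    If oldStr.Chars(i) >= Chr(48) AndAlso oldStr.Chars(i) <= Chr(172) Then
        newStr += oldStr.Chars(i)
    End If
Next
This range still lets some punctuation through, so adjust the If condition to get the result you want.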
--
Saber S.
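The loop in the post above builds its result with +=, which creates a new string on every pass. For files holding thousands of junk characters a StringBuilder is usually kinder. Here is a minimal sketch of the same range idea, assuming only ASCII digits and letters should survive. The names IsAsciiLetterOrDigit and KeepLettersAndDigits are illustrative and not from the thread.

```vbnet
Imports System
Imports System.Text
Imports Microsoft.VisualBasic

Module RangeFilterSketch
    ' True for the ASCII digits 0-9 and the letters A-Z and a-z only.
    Function IsAsciiLetterOrDigit(c As Char) As Boolean
        Return (c >= "0"c AndAlso c <= "9"c) OrElse
               (c >= "A"c AndAlso c <= "Z"c) OrElse
               (c >= "a"c AndAlso c <= "z"c)
    End Function

    ' Copies the accepted characters into a StringBuilder in one pass.
    Function KeepLettersAndDigits(oldStr As String) As String
        Dim sb As New StringBuilder(oldStr.Length)
        For Each c As Char In oldStr
            If IsAsciiLetterOrDigit(c) Then sb.Append(c)
        Next
        Return sb.ToString()
    End Function

    Sub Main()
        ' ChrW(244) is the "ô" junk character seen in the sample data.
        Dim sample As String = "TX 75252 " & ChrW(244) & " B" & ChrW(244)
        Console.WriteLine(KeepLettersAndDigits(sample)) ' prints: TX75252B
    End Sub
End Module
```

Explicit character ranges are used here because a single span such as 48 to 172 also covers punctuation like : ; < = > and several non-ASCII codes.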
Oh, that's good... that worked, BUT is there any way to keep single
spaces? In other words, where there are two or more spaces next to each
other, the extras can be eliminated. I would like to keep the line feeds too.
Thanks a lot!
I found a tricky way around your "spacing" problem,
but maybe there are better ways:
For i = 0 To oldStr.Length - 1
    If oldStr.Chars(i) >= Chr(32) AndAlso oldStr.Chars(i) <= Chr(172) Then
        ' Skip a space when the next character is also a space, so each run
        ' collapses to one. The bounds check protects the last index.
        If Not (oldStr.Chars(i) = Chr(32) AndAlso i + 1 < oldStr.Length AndAlso _
                oldStr.Chars(i + 1) = Chr(32)) Then newStr += oldStr.Chars(i)
    End If
Next
As for the line feeds, I'm not sure what to do. Please send me a sample
of your files (if they are smaller than 1 MB!) and the code you use to read
them.
--
Saber S.
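On the line-feed question: one option is to let carriage returns and line feeds through the filter explicitly and to remember whether the last kept character was a space. This is a sketch under the assumption that the wanted text is printable ASCII (codes 32 to 126) plus CR and LF. CleanText is a hypothetical name, not something from the thread.

```vbnet
Imports System
Imports System.Text
Imports Microsoft.VisualBasic

Module CleanTextSketch
    ' Keeps CR/LF and printable ASCII, and writes at most one space
    ' for each run of spaces (junk inside the run is ignored).
    Function CleanText(oldStr As String) As String
        Dim sb As New StringBuilder(oldStr.Length)
        Dim lastWasSpace As Boolean = False
        For Each c As Char In oldStr
            If c = ControlChars.Cr OrElse c = ControlChars.Lf Then
                sb.Append(c)              ' line breaks are always kept
                lastWasSpace = False
            ElseIf c = " "c Then
                If Not lastWasSpace Then sb.Append(c)
                lastWasSpace = True
            ElseIf c > " "c AndAlso c <= "~"c Then
                sb.Append(c)              ' printable ASCII other than space
                lastWasSpace = False
            End If
            ' Anything else is junk: skipped without touching lastWasSpace.
        Next
        Return sb.ToString()
    End Function

    Sub Main()
        Dim sample As String = "MICHAEL   SMITH " & ChrW(213) &
                               ControlChars.Lf & "DALLAS,  TX"
        Console.WriteLine(CleanText(sample))
    End Sub
End Module
```

Because junk characters leave the space flag untouched, spaces separated only by junk still collapse into one.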
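A better loop is:
For i = 0 To oldStr.Length - 1
    If oldStr.Chars(i) >= Chr(32) AndAlso oldStr.Chars(i) <= Chr(172) Then
        ' Drop a space that is followed by another space; the index check keeps the last character.
        If i = oldStr.Length - 1 OrElse Not (oldStr.Substring(i, 2) = "  ") Then newStr += oldStr.Chars(i) '*
    End If
Next
* The string literal compared with Substring(i, 2) contains two space characters.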
--
Saber S.
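If the junk has already been filtered out, collapsing the leftover runs of spaces can also be done in one call. This is a small sketch using Regex.Replace as an alternative to the character loop, not a replacement that the thread itself proposes. CollapseSpaces is an illustrative name.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module CollapseSpacesSketch
    ' Replaces every run of two or more spaces with a single space.
    ' Line feeds are untouched because the pattern matches spaces only.
    Function CollapseSpaces(s As String) As String
        Return Regex.Replace(s, " {2,}", " ")
    End Function

    Sub Main()
        Console.WriteLine(CollapseSpaces("DALLAS,   TX    75252")) ' prints: DALLAS, TX 75252
    End Sub
End Module
```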