Modifying a text file

I want to remove duplicate entries within a text file. So if I had
this within a text file...

Applications/Diabetic Registry/
Applications/Diabetic Registry/
Applications/Diabetic Registry/
Applications/Great Plains/
Applications/Great Plains/
Applications/Great Plains/
Applications/Great Plains/Servers/
Applications/Great Plains/Servers/
Applications/HeartBase/
Applications/HeartBase/
Applications/HeartBase/
Applications/HHC/
Applications/HHC/
Applications/HHC/
Applications/HHC/

I would want the result to be this:

Applications/Diabetic Registry/
Applications/Great Plains/
Applications/Great Plains/Servers/
Applications/HeartBase/
Applications/HHC/

I've tried using StreamReader and StreamWriter simultaneously with no
success...any other ideas?

Jan 23 '06 #1
Use the StreamReader to read the lines into an array of strings. Close the
StreamReader. Loop through the array to eliminate the duplicates by
comparing each string in the array with all of the strings before it. You
can eliminate the duplicates by setting the duplicate entries to a blank
string. Then write the array back to the file using a StreamWriter, skipping
the blank array members.

If your file contains blank lines, use a different string to indicate a
removed string (e.g. "[REMOVED]").
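
For reference, the steps above could be sketched in C# roughly as follows. This is only a minimal sketch under stated assumptions: MarkDuplicates and RemoveDuplicateLines are hypothetical helper names, the demo runs against a temporary file, and the "[REMOVED]" marker assumes no real line contains exactly that text.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper: overwrite each line that already appeared earlier
// in the array with the marker string.
void MarkDuplicates(string[] lines, string marker)
{
    for (int i = 1; i < lines.Length; i++)
    {
        for (int j = 0; j < i; j++)
        {
            if (lines[i] == lines[j])
            {
                lines[i] = marker;
                break;
            }
        }
    }
}

// Hypothetical helper: read, close, de-duplicate in memory, then rewrite.
void RemoveDuplicateLines(string path, string marker)
{
    var buffer = new List<string>();
    using (var reader = new StreamReader(path))
    {
        string text;
        while ((text = reader.ReadLine()) != null)
        {
            buffer.Add(text);
        }
    } // the reader is closed here, before the file is reopened for writing

    string[] lines = buffer.ToArray();
    MarkDuplicates(lines, marker);

    using (var writer = new StreamWriter(path, false)) // false = overwrite
    {
        foreach (string entry in lines)
        {
            if (entry != marker)
            {
                writer.WriteLine(entry);
            }
        }
    }
}

// Demo on a temporary file so nothing real is touched.
string demoPath = Path.GetTempFileName();
File.WriteAllLines(demoPath, new[]
{
    "Applications/HHC/",
    "Applications/HeartBase/",
    "Applications/HHC/",
});
RemoveDuplicateLines(demoPath, "[REMOVED]");
Console.Write(File.ReadAllText(demoPath));
File.Delete(demoPath);
```

The nested loop compares every line with all earlier ones, so the cost grows with the square of the line count. That is fine for a short list like the sample, less so for a very long file.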

--
HTH,

Kevin Spencer
Microsoft MVP
.Net Developer
Who is Mighty Abbott?
A twin turret scalawag.

"soup_nazi" <bc*****@wfs-ops.org> wrote in message
news:11**********************@g44g2000cwa.googlegroups.com...
I want to remove duplicate entries within a text file. So if I had
this within a text file...

Jan 23 '06 #2
If the file is large this might be a drain on resources and cause
performance problems.

Jan 23 '06 #3
Question: will the duplicate entries always be next to each other?

Can you provide some code that shows how you used the reader and writer?
There just might be something wrong with your logic.
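
If the answer to the first question is yes, one pass with a reader and a writer open at the same time is enough, provided they point at different files. The sketch below assumes exactly that: duplicates are always adjacent, and the output goes to a second file. The helper names are hypothetical. Without seeing the original code this is a guess, but opening one file for reading and overwriting at the same moment would by itself explain the failure.

```csharp
using System;
using System.IO;

// Hypothetical helper: copy lines from reader to writer, skipping any line
// that is identical to the line immediately before it.
void CopyWithoutAdjacentDuplicates(TextReader reader, TextWriter writer)
{
    string previous = null;
    string current;
    while ((current = reader.ReadLine()) != null)
    {
        if (current != previous)
        {
            writer.WriteLine(current);
        }
        previous = current;
    }
}

// Hypothetical helper: same idea for files. inputPath and outputPath must
// be different files; the original is only read, never written.
void CopyFileWithoutAdjacentDuplicates(string inputPath, string outputPath)
{
    using (var reader = new StreamReader(inputPath))
    using (var writer = new StreamWriter(outputPath))
    {
        CopyWithoutAdjacentDuplicates(reader, writer);
    }
}

// In-memory demo so no real file is needed.
var demoOutput = new StringWriter();
CopyWithoutAdjacentDuplicates(
    new StringReader("Applications/HHC/\nApplications/HHC/\nApplications/HeartBase/\n"),
    demoOutput);
Console.Write(demoOutput.ToString());
```

Only two lines are held in memory at any time, so file size is not a concern here. A duplicate that is not adjacent to its twin would slip through, though.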

"soup_nazi" <bc*****@wfs-ops.org> wrote in message
news:11**********************@g44g2000cwa.googlegroups.com...
I want to remove duplicate entries within a text file. So if I had
this within a text file...

Jan 23 '06 #4
> If the file is large this might be a drain on resources and cause
> performance problems.

If the file is *very* large, perhaps. However, I have written applications
that load hundreds of MB of data into memory without any performance issues.
Considering the sample he posted, I estimated that the size of the file is
not likely to be very large.

Other solutions that would handle very large files and check for duplicate
lines would definitely slow down performance. Disk IO is costly and slow,
especially in a managed app. When possible, it's best to read an entire file
into memory and work with it from there.

Yes, it would be possible to open a stream to the file, and read a line (or
a chunk of lines) at a time, comparing each line to another line (or chunk
of lines) read from the stream. If it were a very large file, this might be
necessary. But again, it would be costly to do so, because of the constant
disk IO involved. In addition, the constant re-allocation of strings would
consume a lot of managed memory. You'll notice that my solution did not
involve any reallocation of strings, except for the blank strings used to
replace removed strings.

Yes, my solution could be optimized a bit more. For example, rather than
replacing a string with a blank string in the array, removed strings could
be replaced with null, now that I think of it.
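
A small in-memory sketch of the null variant, with a hypothetical helper name. Because null can never be the content of a line that was read from a file, genuinely blank lines are kept (and de-duplicated like any other line) without needing a special marker string.

```csharp
using System;

// Hypothetical helper: set every entry that repeats an earlier entry to null.
void NullOutDuplicates(string[] lines)
{
    for (int i = 1; i < lines.Length; i++)
    {
        for (int j = 0; j < i; j++)
        {
            if (lines[i] == lines[j])
            {
                lines[i] = null;
                break;
            }
        }
    }
}

string[] sample =
{
    "Applications/HHC/",
    "",
    "Applications/HHC/",
    "Applications/HeartBase/",
};
NullOutDuplicates(sample);

// Write only the surviving entries; the blank line is one of them.
foreach (string entry in sample)
{
    if (entry != null)
    {
        Console.WriteLine(entry);
    }
}
```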

If you have a better idea, let's hear it.

--
HTH,

Kevin Spencer
Microsoft MVP
.Net Developer
Who is Mighty Abbott?
A twin turret scalawag.

"Peter Rilling" <pe***@nospam.rilling.net> wrote in message
news:OQ****************@TK2MSFTNGP15.phx.gbl...
If the file is large this might be a drain on resources and cause
performance problems.

Jan 24 '06 #5
On 23 Jan 2006 10:26:02 -0800, "soup_nazi" <bc*****@wfs-ops.org>
wrote:
I want to remove duplicate entries within a text file. So if I had
this within a text file...

The usual way to remove duplicates is to load the file into memory,
sort it then run through it keeping any line that does not match the
previous line.
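
A minimal C# sketch of that sort-then-scan approach, working on lines that are already in memory. SortAndRemoveDuplicates is a hypothetical name, and the ordinal (exact, case-sensitive) comparison is an assumption about what counts as "the same line".

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: sort a copy of the lines, then keep each line only
// if it differs from the last line kept.
List<string> SortAndRemoveDuplicates(IEnumerable<string> lines)
{
    var sorted = new List<string>(lines);
    sorted.Sort(StringComparer.Ordinal);

    var unique = new List<string>();
    foreach (string candidate in sorted)
    {
        if (unique.Count == 0 || unique[unique.Count - 1] != candidate)
        {
            unique.Add(candidate);
        }
    }
    return unique;
}

var demo = SortAndRemoveDuplicates(new[]
{
    "Applications/HeartBase/",
    "Applications/HHC/",
    "Applications/HeartBase/",
    "Applications/HHC/",
});
foreach (string entry in demo)
{
    Console.WriteLine(entry);
}
```

Note that sorting changes the order of the lines. With an ordinal sort, for instance, Applications/HHC/ ends up before Applications/HeartBase/, which is not the order in the original post. If the original order matters, this is not the right technique.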

If the file is too big to load into memory in one piece then you will
have to look at other techniques. Either process the file in chunks
(read up on "merge sort" for ideas) or else use the structure inherent
in the example you showed. You could load the whole thing into a
tree, reducing the amount of memory used:
<ASCII art ahead - monospaced font strongly recommended>

Applications -+-> Diabetic Registry ---> end
              |
              +-> Great Plains -+-> end
              |                 |
              |                 +-> Servers ---> end
              |
              +-> HeartBase ---> end
              |
              +-> HHC ---> end
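
One way the tree could look in C#, as a rough sketch only. PathNode and AddPath are hypothetical names, and the code assumes every line is a "/"-separated path like those in the sample. The IsEnd flag plays the role of "end" in the diagram: it records that a complete line stops at that node, so Applications/Great Plains/ and Applications/Great Plains/Servers/ are kept apart.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: insert one line into the tree. Returns true if the
// line is new and false if the same line was inserted before.
bool AddPath(PathNode tree, string path)
{
    PathNode node = tree;
    string[] segments = path.Split(new[] { '/' }, StringSplitOptions.RemoveEmptyEntries);
    foreach (string segment in segments)
    {
        if (!node.Children.TryGetValue(segment, out PathNode child))
        {
            child = new PathNode();
            node.Children.Add(segment, child);
        }
        node = child;
    }
    bool isNew = !node.IsEnd;
    node.IsEnd = true;
    return isNew;
}

// Demo: print each line the first time it is seen, preserving input order.
var root = new PathNode();
string[] input =
{
    "Applications/Diabetic Registry/",
    "Applications/Diabetic Registry/",
    "Applications/Great Plains/",
    "Applications/Great Plains/Servers/",
    "Applications/Great Plains/Servers/",
    "Applications/HeartBase/",
    "Applications/HHC/",
    "Applications/HHC/",
};
foreach (string item in input)
{
    if (AddPath(root, item))
    {
        Console.WriteLine(item);
    }
}

// Hypothetical node type: children keyed by path segment.
class PathNode
{
    public bool IsEnd;
    public Dictionary<string, PathNode> Children = new Dictionary<string, PathNode>();
}
```

Each shared prefix such as "Applications" is stored once rather than once per line. Whether that actually saves memory depends on the per-node overhead, so it is worth measuring before relying on it.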

rossum
--

The ultimate truth is that there is no ultimate truth
Jan 24 '06 #6
