Modifying a text file

I want to remove duplicate entries within a text file. So if I had
this within a text file...

Applications/Diabetic Registry/
Applications/Diabetic Registry/
Applications/Diabetic Registry/
Applications/Great Plains/
Applications/Great Plains/
Applications/Great Plains/
Applications/Great Plains/Servers/
Applications/Great Plains/Servers/
Applications/HeartBase/
Applications/HeartBase/
Applications/HeartBase/
Applications/HHC/
Applications/HHC/
Applications/HHC/
Applications/HHC/

I would want the result to be this:

Applications/Diabetic Registry/
Applications/Great Plains/
Applications/Great Plains/Servers/
Applications/HeartBase/
Applications/HHC/

I've tried using StreamReader and StreamWriter simultaneously with no
success... any other ideas?

Jan 23 '06 #1
Use the StreamReader to read the lines into an array of strings, then close
the StreamReader. Loop through the array to eliminate the duplicates by
comparing each string in the array with all of the strings before it. You
can mark a duplicate by setting its entry to a blank string. Finally, write
the strings back to the file using a StreamWriter, skipping the blank array
members.

If your file contains blank lines, use a different string to indicate a
removed string (e.g. "[REMOVED]").
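A minimal sketch of the read-all, mark, then write-back approach described above. It is written in Python rather than .NET: Python's `open()` stands in for StreamReader/StreamWriter, and `None` stands in for the blank or "[REMOVED]" marker, so genuine blank lines in the file are not lost. The function names are illustrative, not from the thread.

```python
from typing import List, Optional


def remove_duplicates(lines: List[str]) -> List[str]:
    """Keep the first occurrence of each line, preserving the original order."""
    marked: List[Optional[str]] = list(lines)
    for i in range(1, len(marked)):
        # Compare line i with all of the lines before it. A first occurrence
        # is never marked, so any earlier match means line i is a duplicate.
        if any(marked[j] == marked[i] for j in range(i)):
            marked[i] = None  # None plays the role of the "[REMOVED]" marker
    # Drop the marked entries; genuine blank lines ("") survive.
    return [line for line in marked if line is not None]


def remove_duplicates_in_file(path: str) -> None:
    """Read the whole file, close it, then rewrite it without duplicate lines."""
    with open(path, encoding="utf-8") as reader:
        lines = reader.read().splitlines()
    # The reader is closed here, before the file is reopened for writing.
    with open(path, "w", encoding="utf-8") as writer:
        for line in remove_duplicates(lines):
            writer.write(line + "\n")
```

Closing the reader before reopening the file for writing avoids reading and writing the same file at once. The nested comparison is quadratic in the number of lines, which should be harmless for a file the size of the sample.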

--
HTH,

Kevin Spencer
Microsoft MVP
.Net Developer
Who is Mighty Abbott?
A twin turret scalawag.

Jan 23 '06 #2
If the file is large, this might be a drain on resources and cause
performance problems.

Jan 23 '06 #3
One question: will the duplicate entries always be next to each other?

Can you provide some code that shows how you used the reader and writer?
There just might be something wrong with your logic.
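If the answer to that question is yes (as it is in the sample), the file can be streamed: each line only needs to be compared with the one before it, so just one line is held in memory at a time. A minimal Python sketch under that assumption; it writes to a separate output file instead of reading and writing the same file at once, and the function and parameter names are hypothetical.

```python
from typing import Iterable, Iterator, Optional


def drop_adjacent_duplicates(lines: Iterable[str]) -> Iterator[str]:
    """Yield each line unless it equals the line immediately before it."""
    previous: Optional[str] = None
    for line in lines:
        if line != previous:
            yield line
        previous = line


def copy_without_adjacent_duplicates(src_path: str, dst_path: str) -> None:
    """Stream src_path into dst_path, one line at a time."""
    with open(src_path, encoding="utf-8") as reader, \
            open(dst_path, "w", encoding="utf-8") as writer:
        # Strip the newline so a last line without one still compares equal.
        stripped = (line.rstrip("\n") for line in reader)
        for line in drop_adjacent_duplicates(stripped):
            writer.write(line + "\n")
```

Once the copy succeeds, the output can be moved over the original, for example with `os.replace(dst_path, src_path)`. Duplicates that are not adjacent are not removed by this version.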

Jan 23 '06 #4
> If the file is large this might be a drain on resources and cause
> performance problems.
If the file is *very* large, perhaps. However, I have written applications
that load hundreds of MB of data into memory without any performance issues.
Judging by the sample he posted, I doubt the file is very large.

Other solutions that handle very large files while checking for duplicate
lines would definitely be slower. Disk IO is costly and slow, especially in
a managed app. When possible, it's best to read an entire file into memory
and work with it from there.

Yes, it would be possible to open a stream to the file, and read a line (or
a chunk of lines) at a time, comparing each line to another line (or chunk
of lines) read from the stream. If it were a very large file, this might be
necessary. But again, it would be costly to do so, because of the constant
disk IO involved. In addition, the constant re-allocation of strings would
consume a lot of managed memory. You'll notice that my solution did not
involve any reallocation of strings, except for the blank strings used to
replace removed strings.

Yes, my solution could be optimized a bit more. For example, now that I
think of it, removed strings could be replaced with null rather than with a
blank string in the array.

If you have a better idea, let's hear it.

--
HTH,

Kevin Spencer
Microsoft MVP
.Net Developer
Who is Mighty Abbott?
A twin turret scalawag.

Jan 24 '06 #5
The usual way to remove duplicates is to load the file into memory, sort
it, then run through it keeping any line that does not match the
previous line.
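A minimal Python sketch of the sort-then-compare technique described above (the function name is hypothetical). Note that the output comes back in sorted order rather than the original order: with Python's default ordinal string comparison, "Applications/HHC/" sorts before "Applications/HeartBase/" because 'H' precedes 'e'.

```python
from typing import List


def sort_and_dedupe(lines: List[str]) -> List[str]:
    """Sort the lines, then keep each line that differs from the previous one."""
    result: List[str] = []
    for line in sorted(lines):
        # After sorting, equal lines are adjacent, so one comparison suffices.
        if not result or line != result[-1]:
            result.append(line)
    return result
```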

If the file is too big to load into memory in one piece then you will
have to look at other techniques. Either process the file in chunks
(read up on "merge sort" for ideas) or else use the structure inherent
in the example you showed. You could load the whole thing into a
tree, reducing the amount of memory used:
<ASCII art ahead - monospaced font strongly recommended>

Applications -+-> Diabetic Registry ---> end
              |
              +-> Great Plains -+-> end
              |                 |
              |                 +-> Servers ---> end
              |
              +-> HeartBase ---> end
              |
              +-> HHC ---> end
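A minimal Python sketch of the tree idea, assuming every line is a "/"-separated path like those in the sample (the `Node`, `build_tree` and `walk` names are hypothetical). Each segment is stored once per parent, and the `is_end` flag plays the role of the "end" markers in the diagram, so duplicate lines collapse onto the same node. Walking the tree depth-first yields each path once.

```python
from typing import Dict, Iterable, Iterator, List


class Node:
    """One path segment; shared prefixes such as "Applications" are stored once."""

    def __init__(self) -> None:
        self.children: Dict[str, "Node"] = {}
        self.is_end = False  # the "---> end" marker in the diagram


def build_tree(lines: Iterable[str]) -> Node:
    root = Node()
    for line in lines:
        node = root
        # "Applications/Great Plains/" -> ["Applications", "Great Plains"]
        for segment in line.strip("/").split("/"):
            child = node.children.get(segment)
            if child is None:
                child = node.children[segment] = Node()
            node = child
        node.is_end = True  # a duplicate line just sets the same flag again
    return root


def walk(node: Node, prefix: List[str]) -> Iterator[str]:
    """Depth-first walk that yields each stored path exactly once."""
    for segment, child in node.children.items():
        path = prefix + [segment]
        if child.is_end:
            yield "/".join(path) + "/"
        yield from walk(child, path)
```

For the sample in the question, `list(walk(build_tree(lines), []))` gives the five requested lines in the requested order, because Python dicts keep insertion order (guaranteed since 3.7).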

rossum
--

The ultimate truth is that there is no ultimate truth
Jan 24 '06 #6
