473,395 Members | 1,666 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,395 software developers and data experts.

Union of sets

Hello, I have a problem and I don´t know, how to solve it.
I have two .txt files, there is 1,000,000 numbers in each file, one number on each row. Lists are not sorted ascending or descending. I need to unite these two sets into a new file, delete duplicated numbers and sort ascending. I´m beginner in C++, so I have absolutely no idea, how to solve it. Thanks for any idea.
Apr 21 '10 #1
5 3555
Banfa
9,065 Expert Mod 8TB
1,000,000 numbers if they were ints would take about 4mbyte of RAM. On to-days systems with gigabytes of ram it is probably acceptable to allocate that much memory so for each of the 2 files
  • Load the file into memory
  • Sort the data in memory
  • Remove duplicates, if this didn't happen at stage 2
you can use a std::vector or std::list to hold the data from the file, this will making sorting easy since sorting algorithms are provided for standard containers.

Once you have the data from the 2 files in memory in 2 separate containers sorted and with duplicates removed it is fairly simple to then merge these 2 datasets into a single dataset.

Then finally you can write the resulting dataset out to a result file.
Apr 21 '10 #2
donbock
2,426 Expert 2GB
It would complicate the logic of the merge, but the program might run faster if you didn't worry about removing duplicates until the merge step. That would eliminate the cost of deleting duplicates from the two big arrays.

Hmmm. On the other hand, consider reading the input files directly into a binary tree that you sort as you go, but don't insert duplicate values. Then all you have to do is create the output file by traversing the tree.

The binary tree technique triples the memory usage; and you're totally screwed if you run out of memory before acquiring all the input data.
Apr 21 '10 #3
Banfa
9,065 Expert Mod 8TB
Actually C++ provides an easy method to implement Dons suggestion. The std::set. This is often implemented as a binary search tree, it is stored using a strict weak ordering and it can not contain duplicates.

Just read the 2 files adding all entries to a single std::set, the insert operation will fail in a gracefully for any values already in the std::set and you can just extract the data in the set to get the ordered list once both files have been read.
Apr 21 '10 #4
donbock
2,426 Expert 2GB
Darn! Why is it that all my most insightful design solutions seem to be built in to C++ where they're available to everybody?
Apr 21 '10 #5
Banfa
9,065 Expert Mod 8TB
lol because those pesky C++ designers looked round at all the things that C programmers found useful and then included them in their standard library so that everyone would have easy access to them.
Apr 21 '10 #6

Sign in to post your reply or Sign up for a free account.

Similar topics

3
by: Mickel Grönroos | last post by:
Hi! Are there any standard list methods for getting the intersection and difference of two lists? (The union is easy ("list1.extend(list2)"), unless you want it to contain unique values.) ...
10
by: KENNY L. CHEN | last post by:
Dear experts, I have two tables in my Oracle 8i database: TEST (COL1,COl2,REC_NO) and TEST1 (COL1,COL2,REC_NO). Both tables are unique-indexed on (COL1,COL2,REC_NO). I think the following...
5
by: NAJH | last post by:
I've been trying to do a union with a subquery - I've made a different example which follows the same principles as follows: First bit brings back accounts which are in the top 10 to 15 by...
2
by: mattytee123 | last post by:
I have about 20 tables, of which I would like to do a union query and count of how many of each different code there is? The simplified verson of the table is structured like this. Code ...
6
by: das | last post by:
Hello all, I have a table with thousands of rows and is in this format: id col1 col2 col3 col4 --- ------ ----- ------ ------ 1 nm 78 xyz pir 2 ...
84
by: Peter Olcott | last post by:
Is there anyway of doing this besides making my own string from scratch? union AnyType { std::string String; double Number; };
2
by: Evyn | last post by:
Hi, What is the best way to implement a union for fuzzy sets in C++? I have used the following code for crisp sets, but the trick to converting it for fuzzy sets eludes me. ...
4
by: Theo R. | last post by:
Hi all, I have the following struct defined - #define INTEGER 0 #define STRING 1 typedef struct { char type ; union {
8
by: Latina | last post by:
Hi, I am doing a program of overloaded methods. I want to get what is the union of the user set and S set and the user set and S1 set. I try it but it is not working. Can some one help me please?...
1
by: vekka | last post by:
Hi! Programming language: C I wondered if any of you are able to see what is wrong with my code. What I want to do is to find the union of a set of numbers. I use a singly linked list. and my union...
0
by: Charles Arthur | last post by:
How do i turn on java script on a villaon, callus and itel keypad mobile phone
0
by: ryjfgjl | last post by:
If we have dozens or hundreds of excel to import into the database, if we use the excel import function provided by database editors such as navicat, it will be extremely tedious and time-consuming...
0
by: ryjfgjl | last post by:
In our work, we often receive Excel tables with data in the same format. If we want to analyze these data, it can be difficult to analyze them because the data is spread across multiple Excel files...
0
by: emmanuelkatto | last post by:
Hi All, I am Emmanuel katto from Uganda. I want to ask what challenges you've faced while migrating a website to cloud. Please let me know. Thanks! Emmanuel
1
by: nemocccc | last post by:
hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?
0
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can...
0
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven...
0
by: Hystou | last post by:
Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows...
0
tracyyun
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.