Help with file comparison

Does anyone have any tips for comparing large amounts of data?

The data in question is around 1000 lines, each with up to 400 delimited
key-value pairs. Each line has a unique identifier stored in one of the fields.
Each field needs to be compared to the corresponding line/field in a "master
file". In addition, each field has formatting rules (trimming spaces and
zeroes, ignoring capitalization, etc.), which are stored in a database. I have
done similar things in Java using a HashMap of HashMaps approach, but the
performance was awful.

Would using the STL hash_map in C++ result in the same performance problems?
Would it be better to read the file into memory with mmap() or fopen/fread
and work with individually malloc'ed pieces of data, using memcmp() to
compare, and just store the pointers in an array or map? Also, any hints
for dividing up all those pairs? Is strtok() my best bet?

Thanks for any help or feedback.
--
-chris

*address above may be munged
Jul 22 '05 #1
1 Reply



"chris" <mu***@spamfree.midsouth.rr.com> wrote in message
news:G_*******************@clmboh1-nws5.columbus.rr.com...
| Does anyone have any tips for comparing large amounts of data?
|
| The data in question is around 1000 lines, each with up to 400 delimited
| key-value pairs. Each line has a unique identifier stored in one of the fields.
| Each field needs to be compared to the corresponding line/field in a "master
| file". In addition, each field has formatting rules (trimming spaces and
| zeroes, ignoring capitalization, etc.), which are stored in a database. I have
| done similar things in Java using a HashMap of HashMaps approach, but the
| performance was awful.
|
| Would using the STL hash_map in C++ result in the same performance problems?
| Would it be better to read the file into memory with mmap() or fopen/fread
| and work with individually malloc'ed pieces of data, using memcmp() to
| compare, and just store the pointers in an array or map? Also, any hints
| for dividing up all those pairs? Is strtok() my best bet?

- Read the key-value pairs into 'std::map<>' associative container(s).
- Write a simple trim function.
- Write a simple function to ignore case.
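
A minimal sketch of those three steps, under stated assumptions: the original post does not name the delimiters, so '|' between pairs and '=' between key and value are guesses, and all function names here (parse_line, trim, to_lower, normalize, diff_records) are hypothetical. The single normalize() rule stands in for the per-field formatting rules, which in the real program would be loaded from the database.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// One record: field name -> raw value.
using Record = std::map<std::string, std::string>;

// Split one line into key/value pairs. The '|' pair separator and the '='
// key/value separator are assumptions; the post does not say what they are.
Record parse_line(const std::string& line, char pair_sep = '|', char kv_sep = '=') {
    Record rec;
    std::istringstream in(line);
    std::string pair;
    while (std::getline(in, pair, pair_sep)) {
        const std::string::size_type pos = pair.find(kv_sep);
        if (pos == std::string::npos) continue;  // skip pairs without a separator
        rec[pair.substr(0, pos)] = pair.substr(pos + 1);
    }
    return rec;
}

// Remove leading and trailing characters that appear in `chars`.
std::string trim(const std::string& s, const std::string& chars = " \t") {
    const std::string::size_type first = s.find_first_not_of(chars);
    if (first == std::string::npos) return "";  // nothing but trimmable characters
    const std::string::size_type last = s.find_last_not_of(chars);
    return s.substr(first, last - first + 1);
}

// Return a lower-cased copy so comparisons can ignore capitalization.
std::string to_lower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// Hypothetical stand-in for the per-field rules: trim blanks, ignore case.
std::string normalize(const std::string& value) {
    return to_lower(trim(value));
}

// Names of master fields that are missing from, or differ in, the candidate.
// Fields present only in the candidate are not reported by this sketch.
std::vector<std::string> diff_records(const Record& master, const Record& candidate) {
    std::vector<std::string> mismatches;
    for (const auto& field : master) {
        const auto it = candidate.find(field.first);
        if (it == candidate.end() || normalize(field.second) != normalize(it->second)) {
            mismatches.push_back(field.first);
        }
    }
    return mismatches;
}
```

To compare whole files, one option is to keep the master records in a 'std::map<std::string, Record>' keyed by the unique identifier, then call diff_records() for each line of the other file. Trimming zeroes could reuse trim() with a different character set, for example trim(value, "0"), if that is what the rule for a given field calls for.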

Once you have done that, you're almost finished :-).

Note, however, that C++ itself knows nothing of databases, so loading the
formatting rules will need a separate database library.

Cheers.
Chris Val
Jul 22 '05 #2
