File Merge

Michael R. Copeland

I'm writing an application that requires an "intelligent merge" of 2
files. That is, equal data has a "preferred source" that I want to
write out. What I have works, I believe, but it seems horribly
cumbersome (having to set the input variables to ""...). Is there a
better way? TIA

while ((!feof(wf3)) || (!feof(wf1)))
{
    if (!feof(wf1))
    {
        strcpy(WDBRec, "");
        if (fgets(WDBRec, sizeof(WDBRec), wf1) != NULL)
            nic++;
    }
    if (!feof(wf3))
    {
        strcpy(DBERec, "");
        if (fgets(DBERec, sizeof(DBERec), wf3) != NULL)
            bic++;
    }
    dbeBib = atoi(copy(DBERec, 1, 5));
    wdbBib = atoi(copy(WDBRec, 1, 5));
    if (wdbBib == dbeBib) // records match - defer to old data
    {
        if (dbeBib > 0)
        {
            writeToDBE(DBERec); buc++;
        }
    }
    if (wdbBib < dbeBib) // work file data is new - write it
    {
        if (wdbBib > 0)
        {
            writeToDBE(WDBRec); nuc++;
        }
        else
        {
            if (dbeBib > 0)
            {
                writeToDBE(DBERec); buc++;
            }
        }
    }
    if (wdbBib > dbeBib) // prevailing old data - write it out
    {
        if (dbeBib > 0)
        {
            writeToDBE(DBERec); buc++;
        }
    }
} // while

Nov 15 '05 #1

Martijn

Hi,

I'm writing an application that requires an "intelligent merge" of
2 files. That is, equal data has a "preferred source" that I want to
write out. What I have works, I believe, but it seems horribly
cumbersome (having to set the input variables to ""...). Is there a
better way? TIA

while ((!feof(wf3)) || (!feof(wf1)))
{
if (!feof(wf1))
{
strcpy(WDBRec, "");
if (fgets(WDBRec, sizeof(WDBRec), wf1) != NULL)
nic++;
}
[snipped]

Firstly: your indentation has some room for improvement. Secondly, you are
inconsistent with your syntax of single-line if bodies. But here is my
actual reply:

Assuming you are not working with UNICODE, you can simplify

strcpy(WDBRec, "");

to

WDBRec[0] = '\0';

The code doesn't look that cumbersome. You could rewrite it, which may make
it a little more straightforward:

if (fgets(WDBRec, sizeof(WDBRec), wf1) != NULL)
{
    nic++;
    wdbBib = atoi(copy(WDBRec, 1, 5));
}
else
{
    wdbBib = 0;
}

Somewhere further down in your code you use:
if (wdbBib > 0)
{
writeToDBE(WDBRec); nuc++;
}

Because wdbBib is 0 after a failed read, this test keeps you from writing
out a stale record (WDBRec will still contain the previous value).

I might have left some loose ends, but this should help you along.

Good luck!

--
Martijn
http://www.sereneconcepts.nl

Nov 15 '05 #2

CBFalconer

Martijn wrote:

.... snip ...
Firstly: your indentation has some room for improvement.
Secondly, you are inconsistent with your syntax of single-line
if bodies. But here is my actual reply:

His indentation was fine here. This indicates that indentation
swallowing is taking place somewhere on the path to you, but not to
me. It could be your newsreader, which appears to be a Microsoft
excrescence.

You neglected to attribute the portion you quoted. Please don't do
that.

--
Chuck F (cb********@yahoo.com) (cb********@worldnet.att.net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net> USE worldnet address!

Nov 15 '05 #3

Chris Croughton

On Fri, 05 Aug 2005 07:59:58 GMT, CBFalconer
<cb********@yahoo.com> wrote:

Martijn wrote:

... snip ...

Firstly: your indentation has some room for improvement.
Secondly, you are inconsistent with your syntax of single-line
if bodies. But here is my actual reply:

His indentation was fine here. This indicates that indentation
swallowing is taking place somewhere on the path to you, but not to
me. It could be your newsreader, which appears to be a Microsoft
excrescence.

The original used tabs. Nasty things, they can expand to any number of
spaces including none depending on the system (for me they expanded to 8
spaces, which is excessive but just bearable in that example).

Chris C

Nov 15 '05 #4

Eric Sosman

Michael R. Copeland wrote:

I'm writing an application that requires an "intelligent merge" of 2
files. That is, equal data has a "preferred source" that I want to
write out. What I have works, I believe, but it seems horribly
cumbersome (having to set the input variables to ""...). Is there a
better way? TIA

while ((!feof(wf3)) || (!feof(wf1)))

This is the wrong way to check for end-of-file. Please
see Question 12.2 in the comp.lang.c Frequently Asked Questions
(FAQ) list at

http://www.eskimo.com/~scs/C-faq/top.html

{
if (!feof(wf1))

Ditto.

{
strcpy(WDBRec, "");

This looks pointless. You're about to overwrite the contents
of WDBRec by using fgets() on it, so why do you care what's in it
beforehand? Perhaps this is an attempt to rescue the situation
after the unreliable end-of-file test -- if so, once you fix the
test you won't need this any more.

By the way, you didn't show us what WDBRec is. From the way
you're using it, it should be an array of char; a pointer to a
malloc'ed area would not work here.

if (fgets(WDBRec, sizeof(WDBRec), wf1) != NULL)
nic++;
}
if (!feof(wf3))
{
strcpy(DBERec, "");
if (fgets(DBERec, sizeof(DBERec), wf3) != NULL)
bic++;
}
dbeBib = atoi(copy(DBERec, 1, 5));
You haven't shown us what copy() is. I'm going to assume
that it copies the second through sixth characters (that is,
array elements [1] through [5]) into a six-char array somewhere
and appends a '\0'. Whether this works depends a lot on the
location and nature of that intermediate six-char array; see
Question 7.5 for a description of one all-too-frequent error.
(By the way, if fgets() didn't read anything, the second through
sixth characters will be the leftovers from the record prior to
the current one, if any.)
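The error that Question 7.5 describes is returning a pointer to a local (automatic) array, which no longer exists once the function returns. One way around it, sketched here under assumptions, is to let the caller own the destination buffer. The helper name copy_field and its parameters are hypothetical; the OP's real copy() was never shown, and the start index is assumed to be zero-based, as in the description above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the unseen copy(): copies up to `len`
 * characters of `src`, starting at zero-based index `start`, into the
 * caller's buffer `dst` of size `dstsize`, and always terminates it.
 * Returns dst, so the call can still be nested inside atoi() or strtol(). */
char *copy_field(char *dst, size_t dstsize,
                 const char *src, size_t start, size_t len)
{
    size_t srclen = strlen(src);
    size_t n = 0;

    if (dstsize == 0)
        return dst;               /* no room even for the terminator */
    if (start < srclen) {
        n = srclen - start;       /* characters available from start */
        if (n > len)
            n = len;
        if (n > dstsize - 1)
            n = dstsize - 1;      /* leave room for the '\0' */
        memcpy(dst, src + start, n);
    }
    dst[n] = '\0';
    return dst;
}
```

With a `char field[6];` in the caller, `copy_field(field, sizeof field, DBERec, 1, 5)` yields an empty string for an empty record instead of reading leftovers past the terminator.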

Despite its suggestive name, atoi() is not a very good way
to convert decimal strings to integers, not unless you're very
trusting of the source. The problem is that it will happily
convert "123x5" to 123 and give no indication that the input
is in any way strange. It won't even detect "xyzzy" as in any
way peculiar (indeed, its behavior on "xyzzy" is completely
unpredictable). So unless you are very, very sure that the
input is valid, atoi() is a poor way to convert it. There are
at least three superior ways to proceed:

- Use strtol(), because it will do the conversion *and*
report any oddities it finds, in a predictable way.

- Use sscanf(). It's a little bit trickier than it looks,
but allows you to do without the copy() stuff:
if (sscanf(DBERec+1, "%5d%n", &dbeBib, &len) == 1
&& len == 5) { all's well } else { bad input }
The "%5d" converts no more than five digits (in case
additional digits follow the field of interest). The
"%n" tells you how many digits were actually converted
(it will set len to 3 if the input was "123xy"). And
if the "%5d" finds no digits at all ("xyzzy") sscanf()
will stop and return zero.

- If you're really sure the respective fields contain digits,
you can compare them as characters without converting at
all by using memcmp(DBERec+1, WDBRec+1, 5). However, this
may cause some surprises with non-digits: for example,
"01234" and " 1234" will be treated as unequal, and "-1234"
will be treated as less than "-9999". You'll have to decide
whether this is appropriate for your application.
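As an illustration of the first option, here is a minimal sketch of a strtol()-based conversion. The function name parse_bib is made up for this example, and the record layout (key in elements [1] through [5]) is the same assumption made above, not something the OP confirmed.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Converts the five-character field at rec[1..5] to a number.
 * Returns 1 and stores the value in *out when strtol() consumed the
 * whole field; returns 0 for a short record, a field with no digits
 * ("xyzzy"), or trailing junk ("123x5"). Leading blanks and a sign are
 * accepted, because strtol() itself accepts them. */
int parse_bib(const char *rec, long *out)
{
    char field[6];
    char *end;
    long value;

    if (strlen(rec) < 6)
        return 0;                 /* record too short to hold the field */
    memcpy(field, rec + 1, 5);
    field[5] = '\0';

    errno = 0;
    value = strtol(field, &end, 10);
    if (end == field)             /* nothing was converted */
        return 0;
    if (*end != '\0')             /* conversion stopped early */
        return 0;
    if (errno == ERANGE)          /* cannot happen with five digits, kept for safety */
        return 0;
    *out = value;
    return 1;
}
```

Unlike atoi(), this lets the caller tell "00000" (valid, value zero) apart from a field that holds no number at all.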
wdbBib = atoi(copy(WDBRec, 1, 5));
if (wdbBib == dbeBib) // records match - defer to old data
{
if (dbeBib > 0)

I'm not sure what this test is for, unless perhaps it's
part of the rescue attempt for the incorrect end-of-file test.
If there really are actual non-positive numbers in the input,
it looks like this will eliminate them from the output. But
if you've got purely digit fields that can't be negative (though
"00000" would, of course, be zero), I think this test and the
others like it can simply go away once you fix the EOF handling.

{
writeToDBE(DBERec); buc++;
}
}
if (wdbBib < dbeBib) // work file data is new - write it
{
if (wdbBib > 0)
{
writeToDBE(WDBRec); nuc++;
}
else
{
if (dbeBib > 0)
{
writeToDBE(DBERec); buc++;
}
}
}
if (wdbBib > dbeBib)// prevailing old data - write it out
{
if (dbeBib > 0)
{
writeToDBE(DBERec); buc++;
}
}
} // while

You say you believe this works, but one thing that strikes
me as strange is that you read new input from *both* files every
time through the loop. (Until the botched EOF detection kicks
in, of course.) That doesn't seem right at all: If you get the
sequence "11111" "33333" "55555" from WDB while DBE provides
"22222" "44444", I'd expect you'd want to see all five of these
in the output -- but that's not what you're doing, and I'm not
sure whether it's accidental or intentional. Take another look.
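For comparison, a conventional two-way merge reads a new record only from the file whose record was just consumed. The sketch below is one way to write it, not the OP's code: merge_files, bib_of, next_rec and REC_SIZE are names invented for this example, both inputs are assumed to be sorted by ascending key, the key is assumed to sit in elements [1] through [5], and the key conversion does no validation.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define REC_SIZE 256   /* assumed maximum record length, newline included */

/* Assumed record layout: a five-digit key in rec[1..5]. */
static long bib_of(const char *rec)
{
    char field[6] = "";
    if (strlen(rec) >= 6)
        memcpy(field, rec + 1, 5);
    return strtol(field, NULL, 10);
}

/* Reads the next record; returns 1 on success, 0 on end-of-file or error. */
static int next_rec(FILE *fp, char *rec)
{
    return fgets(rec, REC_SIZE, fp) != NULL;
}

/* Merges two inputs sorted by ascending key into `out`.
 * On equal keys the DBE ("old") record wins and the WDB record is dropped.
 * Returns the number of records written. */
long merge_files(FILE *wdb, FILE *dbe, FILE *out)
{
    char wrec[REC_SIZE], drec[REC_SIZE];
    int have_w = next_rec(wdb, wrec);
    int have_d = next_rec(dbe, drec);
    long written = 0;

    while (have_w || have_d) {
        long wkey = have_w ? bib_of(wrec) : 0;
        long dkey = have_d ? bib_of(drec) : 0;

        if (have_w && (!have_d || wkey < dkey)) {
            fputs(wrec, out);                 /* key present only in WDB */
            have_w = next_rec(wdb, wrec);
        } else if (have_d && (!have_w || dkey < wkey)) {
            fputs(drec, out);                 /* key present only in DBE */
            have_d = next_rec(dbe, drec);
        } else {
            fputs(drec, out);                 /* same key: keep old data */
            have_w = next_rec(wdb, wrec);
            have_d = next_rec(dbe, drec);
        }
        written++;
    }
    return written;
}
```

The have_w and have_d flags take over the job that blanking the buffers did in the original: once a file is exhausted its stale record is never looked at again, so no "> 0" guards are needed.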

--
Eric Sosman
es*****@acm-dot-org.invalid

Nov 15 '05 #5

Alan Balmer

On Fri, 05 Aug 2005 07:59:58 GMT, CBFalconer <cb********@yahoo.com>
wrote:

Martijn wrote:
... snip ...

Firstly: your indentation has some room for improvement.
Secondly, you are inconsistent with your syntax of single-line
if bodies. But here is my actual reply:

His indentation was fine here. This indicates that indentation
swallowing is taking place somewhere on the path to you, but not to
me. It could be your newsreader, which appears to be a Microsoft
excrescence.

The OP used tabs instead of spaces. To the OP: Don't do that.
You neglected to attribute the portion you quoted. Please don't do
that.

--
Al Balmer
Balmer Consulting
re************************@att.net

Nov 15 '05 #6

Martijn

>> Firstly: your indentation has some room for improvement.

Secondly, you are inconsistent with your syntax of single-line
if bodies. But here is my actual reply:
His indentation was fine here. This indicates that indentation
swallowing is taking place somewhere on the path to you, but not to
me. It could be your newsreader, which appears to be a Microsoft
excrescence.

You make it sound like it's a bad thing ;) Given your experience you should
have no problem confirming that fact by looking at the headers.
You neglected to attribute the portion you quoted. Please don't do
that.

I am aware of your pedantic approach towards other people's posting habits
(whether that's a good thing or a bad thing is not up to me), but you lost
me here. Could you rephrase your comment for a non-native English speaking
individual like myself?

But what did you think of the post content-wise?

--
Martijn
http://www.sereneconcepts.nl

Nov 15 '05 #7

Alan Balmer

On Fri, 5 Aug 2005 17:57:03 +0200, "Martijn"
<su*********************@hot-remove-mail.com> wrote:

Firstly: your indentation has some room for improvement.
Secondly, you are inconsistent with your syntax of single-line
if bodies. But here is my actual reply:
His indentation was fine here. This indicates that indentation
swallowing is taking place somewhere on the path to you, but not to
me. It could be your newsreader, which appears to be a Microsoft
excrescence.

You make it sound like it's a bad thing ;)

Yes.
Given your experience you should
have no problem confirming that fact by looking at the headers.
I suppose that's how he knew.
You neglected to attribute the portion you quoted. Please don't do
that.
I am aware of your pedantic approach towards other people's posting habits
(whether that's a good thing or a bad thing is not up to me), but you lost
me here. Could you rephrase your comment for a non-native English speaking
individual like myself?

You quoted parts of other people's writings without any indication of
who the writer was (attribution.) You did it again in this post. Look
at the top of this reply for an example of an attribution (to you).
Look at nearly any other post in this forum for other examples.

A better newsreader would help. Failing that, you should look for a
program called OE-Quotefix, which reportedly makes Outlook Express act
somewhat like a real newsreader.
But what did you think of the post content-wise?

--
Al Balmer
Balmer Consulting
re************************@att.net

Nov 15 '05 #9

Flash Gordon

Martijn wrote:

<snip>

You neglected to attribute the portion you quoted. Please don't do
that.

I am aware of your pedantic approach towards other people's posting habits
(whether that's a good thing or a bad thing is not up to me), but you lost
me here. Could you rephrase your comment for a non-native English speaking
individual like myself?

<snip>

See at the top where it says "Martijn wrote:"? That's called an
attribution. It tells you who wrote the quoted text. You are deleting
all the attributions so I don't know who wrote, "You neglected to..."

So please leave in the bits saying who wrote what (the attributions) for
all the text still quoted.
--
Flash Gordon
Living in interesting times.
Although my email address says spam, it is real and I read it.

Nov 15 '05 #10

Martijn

Alan Balmer wrote:

On Fri, 5 Aug 2005 17:57:03 +0200, "Martijn" wrote:
[snipped]

You neglected to attribute the portion you quoted. Please don't do
that.

I am aware of your pedantic approach towards other people's posting
habits (whether that's a good thing or a bad thing is not up to me),
but you lost me here. Could you rephrase your comment for a
non-native English speaking individual like myself?

You quoted parts of other people's writings without any indication of
who the writer was (attribution.) You did it again in this post. Look
at the top of this reply for an example of an attribution (to you).

Duly noted, thanks for the clarification. I did it again because such is
(or was) my way of doing it, no harm intended. I'll change it, np.
Look at nearly any other post in this forum for other examples.
I'll take your word for it.
A better newsreader would help. Failing that, you should look for a
program called OE-Quotefix, which reportedly makes Outlook Express act
somewhat like a real newsreader.

I did, I use it, and it does.

But what did you think of the post content-wise?

Still, everyone (except Eric and me) is avoiding the subject of the OP's
message. But then again, all these OT threads are much more fun, right ? :P
Unless netiquette or posting conventions all of a sudden have become
on-subject in this group.

--
Martijn
http://www.sereneconcepts.nl

Nov 15 '05 #11

Alan Balmer

On Fri, 5 Aug 2005 23:33:16 +0200, "Martijn"
<su*********************@hot-remove-mail.com> wrote:

But what did you think of the post content-wise?

Still, everyone (except Eric and me) is avoiding the subject of the OP's
message.

At this point, it's appropriate that the OP take some time to act on
the copious advice already received, and come back with his new,
improved version. No point in beating it to death yet ;-)
But then again, all these OT threads are much more fun, right ? :P
Unless netiquette or posting conventions all of a sudden have become
on-subject in this group.

Netiquette, posting conventions, and topicality are traditionally
on-topic in any newsgroup.
--
Al Balmer
Balmer Consulting
re************************@att.net

Nov 15 '05 #12

CBFalconer

Martijn wrote:

.... snip ...
You neglected to attribute the portion you quoted. Please don't
do that.
I am aware of your pedantic approach towards other people's posting
habits (whether that's a good thing or a bad thing is not up to me),
but you lost me here. Could you rephrase your comment for a
non-native English speaking individual like myself?

Your English is so good that I can't tell you are a non-English
speaker. At any rate the attributions are the little pieces at the
head, such as the above "Martijn wrote". These allow the reader to
untangle who said what, and in response to whom. It often affects
whether or not to take that portion seriously.

But what did you think of the post content-wise?

I was only commenting on the indentation comment.

--
Chuck F (cb********@yahoo.com) (cb********@worldnet.att.net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net> USE worldnet address!

Nov 15 '05 #13

Michael R. Copeland

while ((!feof(wf3)) || (!feof(wf1)))

This is the wrong way to check for end-of-file. Please
see Question 12.2 in the comp.lang.c Frequently Asked Questions
(FAQ) list at

http://www.eskimo.com/~scs/C-faq/top.html

Indeed, but that's why I wanted suggestions how I can improve this
kludgey logic. Regardless of the normal method to read a file, I
couldn't work out a clean and simple way to loop through both files
while performing the merge. I guess I'd need to see how the
conventional logic applies to my particular problem (and I couldn't find
anything on google on this...).

{
if (!feof(wf1))
Ditto.
{
strcpy(WDBRec, "");

This looks pointless. You're about to overwrite the contents
of WDBRec by using fgets() on it, so why do you care what's in it
beforehand? Perhaps this is an attempt to rescue the situation
after the unreliable end-of-file test -- if so, once you fix the
test you won't need this any more.

The point of this is to ensure that the "atoi" conversion that
follows doesn't produce a valid number from data residing after the file
had been read. Without it, I was getting the final record from one of
the files written to the output twice! It's gotten to the point that
adding logic/tweaking was getting me more and more away from a clean
solution.
By the way, you didn't show us what WDBRec is. From the way
you're using it, it should be an array of char; a pointer to a
malloc'ed area would not work here.

It's a character array (as it should be)...

if (fgets(WDBRec, sizeof(WDBRec), wf1) != NULL)
nic++;
}
if (!feof(wf3))
{
strcpy(DBERec, "");
if (fgets(DBERec, sizeof(DBERec), wf3) != NULL)
bic++;
}
dbeBib = atoi(copy(DBERec, 1, 5));

You haven't shown us what copy() is. I'm going to assume
that it copies the second through sixth characters (that is,
array elements [1] through [5]) into a six-char array somewhere
and appends a '\0'. Whether this works depends a lot on the
location and nature of that intermediate six-char array; see
Question 7.5 for a description of one all-too-frequent error.

No, it is a "substr" function that copies character 1-5 of the input.
I didn't include some stuff I felt was extraneous to the logic, sorry...

Nov 15 '05 #14

Eric Sosman

Michael R. Copeland wrote:

while ((!feof(wf3)) || (!feof(wf1)))

This is the wrong way to check for end-of-file. Please
see Question 12.2 in the comp.lang.c Frequently Asked Questions
(FAQ) list at

Indeed, but that's why I wanted suggestions how I can improve this
kludgey logic. Regardless of the normal method to read a file, I
couldn't work out a clean and simple way to loop through both files
while performing the merge. I guess I'd need to see how the
conventional logic applies to my particular problem (and I couldn't find
anything on google on this...).

Hmmm. The FAQ seems pretty clear -- but then again, I
have the advantage of already knowing the answer. "It's
elementary," as Sherlock always says *after* explaining how
he figured it out.

Okay: End-of-input is only detected by attempting a read
and having it fail. There's no way to ask "Is there any input
left?" without actually attempting an input operation. This
may seem a silly restriction in connection with disk files,
but consider the situation when input is coming from a keyboard
or a network socket or something of the kind: There is no way
to predict that the user is about to strike ^D or ^Z or whatever
the local end-of-input key sequence is, nor is there any way to
predict that the other end of your socket connection is about
to hang up the phone on you. You cannot know what happened
until it actually happens -- so the only way to know that you
have reached end-of-input is to try to read something and get
a failure.

Now, end-of-file is only one of the reasons an input attempt
might fail: for example, input from a disk could fail in the
event of a head crash, or input from a keyboard could fail if
you spilled Coke Classic into the mechanism and shorted it out
with caramelized sugar. Most of C's input functions report a
kind of "generalized failure" no matter what the cause -- and
the *only* reason feof() exists is to let you figure out that
cause. If the function could read no more input because it
detected end-of-file, feof() will be true; if feof() is false,
the failure was something like a bad disk sector (and ferror()
will be true).

Putting this all together, you should write code that looks
something like

if (fgets(WDBRec, sizeof(WDBRec), wf1) == NULL) {
    /* Woops! Couldn't get any more input. Why not? */
    if (feof(wf1)) {
        /* Aha! We've reached end-of-file, and should
         * remember the fact and not try to read any more.
         */
    }
    else {
        /* Oh, woe! Oh, woe! The Nazgul have eaten
         * the disk controller, and are even now taking
         * the Token Ring back to Sauron! If I were to
         * test ferror(wf1) at this point it would return
         * true, confirming my worst fears.
         */
    }
}
else {
    /* Whoopee! I got some input data! */
}

I hope you understand by now that it is pointless to call
feof() or ferror() *before* an input operation; they only make
sense after an input operation has already failed.
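Applied to a whole file, the same idea makes the read itself the loop condition. This is a minimal sketch with a made-up function name, count_records, and an assumed maximum line length of 255 characters; it only counts records, but the loop shape is the point.

```c
#include <stdio.h>

/* Counts the records in fp, assuming no line is longer than 255
 * characters. Returns the count, or -1 if reading stopped because of
 * an I/O error rather than end-of-file. */
long count_records(FILE *fp)
{
    char rec[256];
    long n = 0;

    /* The read is the loop test; feof() is not consulted beforehand. */
    while (fgets(rec, sizeof rec, fp) != NULL)
        n++;

    /* Only now, after fgets() has failed, is it meaningful to ask why. */
    if (ferror(fp))
        return -1;
    return n;                     /* feof(fp) is true at this point */
}
```

A final line with no trailing newline is still counted, because fgets() returns it successfully and reports failure only on the following call.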

You haven't shown us what copy() is. I'm going to assume
that it copies the second through sixth characters (that is,
array elements [1] through [5]) into a six-char array somewhere
and appends a '\0'. Whether this works depends a lot on the
location and nature of that intermediate six-char array; see
Question 7.5 for a description of one all-too-frequent error.

No, it is a "substr" function that copies character 1-5 of
the input. I didn't include some stuff I felt was extraneous
to the logic, sorry...

Um, er, that's exactly how I described it, is it not? And
all the things I said about it (including the Q7.5 reference and
the stuff about repeating the final record indefinitely after
EOF) still hold.

--
Eric Sosman
es*****@acm-dot-org.invalid

Nov 15 '05 #15
