Bytes | Software Development & Data Engineering Community

Segfault question.

When I started testing the algorithms for my wrap program, I threw together
this snippet of code, which works quite well. Except that it (predictably)
segfaults at the end when it tries to go beyond the file. At some point, I
tried to mend that behavior using feof() but without success. The
functionality is not harmed, but this has started to bug me. What am I
missing here? Sometimes being a code duffer is frustrating!! lol!!!

The code:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main (int argc, char *argv[])
{
    FILE *fp;

    int len;
    char buf[100];

    if ((fp = fopen(argv[1], "r")) == NULL) {
        fprintf(stderr, "can't open fp");
        return EXIT_FAILURE;
    }

    while (((len = strlen(fgets(buf, 80, fp))) != 0)) {
        printf(" %i\t", len);
        printf("%s", buf);
    }

    fclose(fp); /* Nah, no error checking here... */

    return EXIT_SUCCESS;
}

Thanks for reading.
--
Email is wtallman at olypen dot com
Nov 14 '05 #1

"name" <us**@host.domain> wrote in message
news:10*************@corp.supernews.com...
> When I started testing the algorithms for my wrap program, I threw
> together this snippet of code, which works quite well.

No, it doesn't. It invokes undefined behavior.

> Except that it (predictably)
> segfaults at the end when it tries to go beyond the file.

And I can see exactly why. See below.

> At some point, I
> tried to mend that behavior using feof() but without success.

Guessing will rarely fix the real problem.

> The
> functionality is not harmed,

Well, no, you can't kill something that's already dead. :-)

> but this has started to bug me.

Yes, you have a serious, fatal bug.

> What am I
> missing here?

You apparently forgot to check the documentation of a library
function, because you didn't allow for its possible failure.

> Sometimes being a code duffer is frustrating!! lol!!!

Especially when you try to go too fast, as I suspect you've done.

> The code:
>
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
>
> int main (int argc, char *argv[])
> {
>     FILE *fp;
>
>     int len;

I see below that you store the return value from 'strlen()'
in 'len'. This means its type should be 'size_t', not 'int'.

>     char buf[100];
>
>     if ((fp = fopen(argv[1], "r")) == NULL) {

You should check that 'argv[1]' is indeed a valid pointer
(i.e. make sure argc > 1) before passing it to 'fopen()'.
If argc == 1, 'argv[1]' is a null pointer, and passing it to
'fopen()' invokes undefined behavior; if argc == 0, 'argv[1]'
does not exist at all.

>         fprintf(stderr, "can't open fp");
>         return EXIT_FAILURE;
>     }
>
>     while (((len = strlen(fgets(buf, 80, fp))) != 0)) {

If 'fgets()' encounters an error, or reaches end of file before
reading any characters, it will return NULL. If you pass NULL as
the argument to 'strlen()' you get undefined behavior (which could
be manifested as a 'segfault').

>         printf(" %i\t", len);
>         printf("%s", buf);
>     }
>
>     fclose(fp); /* Nah, no error checking here... */

Nor did you do error checking where it really mattered.
Check the return value of *any* function which is documented
to possibly return a 'failure' indication (as does 'fgets()').

>
>     return EXIT_SUCCESS;
> }


-Mike
Nov 14 '05 #2
Look up what fgets returns when it encounters end-of-file or an
i/o error. Then consider what strlen does when you ask it to chew
on that.

<rant> Please get rid of the excessive indentation in your code.
3 or 4 spaces is quite enough. The excessive space makes lines
too long and causes things to disappear over the right margin
(although the above lines are short enough to avoid that). Don't
use tabs. </rant>

--
"A man who is right every time is not likely to do very much."
-- Francis Crick, co-discoverer of DNA
"There is nothing more amazing than stupidity in action."
-- Thomas Matthews
Nov 14 '05 #3
CBFalconer wrote:
> Look up what fgets returns when it encounters end-of-file or an
> i/o error. Then consider what strlen does when you ask it to chew
> on that.

I've never been able to figure out when fgets returns NULL. Please can you
explain it?
For example, if I've got the following file:
aaaaaaaa\n
bbbbbbbbb<EOF>

And I call fgets three times in a row with a big buffer (larger
than 10 bytes or chars), when would fgets return NULL?
> <rant> Please get rid of the excessive indentation in your code.
> 3 or 4 spaces is quite enough. The excessive space makes lines
> too long and causes things to disappear over the right margin
> (although the above lines are short enough to avoid that). Don't
> use tabs. </rant>

IMHO, if lines are too long, it's time to create a new function to
solve that part of the whole task.

"name" wrote:
> while (((len = strlen(fgets(buf, 80, fp))) != 0)) {

I think that functional style is good enough, so I suggest that you
write a wrapper for strlen.
Something like:
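#include <errno.h>
#include <limits.h>
#include <string.h>

/* Like strlen(), but returns -1 for a null pointer instead of
   invoking undefined behavior, and -1 (with errno set) if the
   length does not fit in an int. */
int my_strlen (const char *s)
{
    size_t tmp;

    if (s == NULL)
        return -1;
    tmp = strlen (s);
    if (tmp > INT_MAX) {
        errno = EINVAL;   /* EINVAL comes from POSIX, not ISO C */
        return -1;
    }
    return (int)tmp;
}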

--
vir
Nov 14 '05 #4
Victor Nazarov wrote:
> CBFalconer wrote:
>> Look up what fgets returns when it encounters end-of-file or an
>> i/o error. Then consider what strlen does when you ask it to
>> chew on that.
>
> I've never been able to figure out when fgets returns NULL. Please
> can you explain it?

Look at the last two references in my sig. line.

--
Some useful references:
<http://www.ungerhu.com/jxh/clc.welcome.txt>
<http://www.eskimo.com/~scs/C-faq/top.html>
<http://benpfaff.org/writings/clc/off-topic.html>
<http://anubis.dkuug.dk/jtc1/sc22/wg14/www/docs/n869/> (C99)
<http://www.dinkumware.com/refxc.html> C-library
Nov 14 '05 #5
In article <news:41***************@yahoo.com>
> <rant> Please get rid of the excessive indentation in your code.
> 3 or 4 spaces is quite enough. The excessive space makes lines
> too long and causes things to disappear over the right margin
> (although the above lines are short enough to avoid that). Don't
> use tabs. </rant>


I personally do not mind 8-character-per-lexical-level indentation,
although I do think 4 works better. I do remember hearing, from
the "human/computer interaction" folks and people doing visual
studies, that anything less than three characters is not so good,
because -- depending on one's font -- two-character indentations
may not create sufficient angles to trigger the brain's horizontal
and vertical line detectors. These detectors exist, though, and
run all the time whether we want them to or not; careful indentation
takes advantage of them.

As for tabs: use them or do not, but do not change your system's
interpretation of them. If you want n-character indentation where
"n" differs from the system's interpretation of "hardware" tabs,
just make sure that when you push your "tab" key in your editor,
it inserts spaces and/or tabs in order to get to the n'th column.
(In some cases, this may mean pushing a key other than the one
labelled "tab". For instance, in vi/nvi/vim, use ^T and ^D to
indent and -- assuming you have autoindent set -- de-indent by the
value you have put in the "shiftwidth" setting. In emacs, of
course, the whole thing is fully programmable.) If you do this
*instead* of instructing your editor to re-interpret the hardware
tabs, then anyone using the same underlying system will be able to
edit your code and see the same columnization that you see.
--
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603
email: forget about it http://web.torek.net/torek/index.html
Reading email is like searching for food in the garbage, thanks to spammers.
Nov 14 '05 #6
Oops, included wrong file! My bad!! That was a prototype that didn't yield
correct results, as well as being badly constructed. The user version does
yield correct results but is still badly constructed (natch...), so exhibits
the same behavior.

On 2004-09-05, CBFalconer <cb********@yahoo.com> wrote:

> Look up what fgets returns when it encounters end-of-file or an
> i/o error. Then consider what strlen does when you ask it to chew
> on that.


Okay, fgets returns a null pointer if it encounters either an EOF
immediately, or if it encounters an error. In the latter case, the string
array is undefined, so error checking fgets should be the first thing to do,
I gather. Passing a null pointer to strlen is what causes the segfault?
Does that mean that strlen returns that error because it doesn't recognize
what has been passed and so assumes it's outside of its allotted territory?

Or is something else entirely going on and I'm still at sea? <grin>

Thanks!
--
Email is wtallman at olypen dot com
Nov 14 '05 #7
In article <news:10*************@corp.supernews.com>
name <us**@host.domain> wrote:
> Okay, fgets returns a null pointer if it encounters either an EOF
> immediately, or if it encounters an error.
Yes (although the "error" case is a bit dodgy; some fgets()
implementations will only return NULL on EOF-or-error-at-start,
treating error-in-the-middle as a sign to return a valid C string
that does not end with '\n').
> In the latter case, the string array is undefined, so error
> checking fgets should be the first thing to do, I gather.
Yes. More precisely, check whether fgets() returned its first
argument or NULL (these are the only two possibilities). (You
can also use (feof(fp) || ferror(fp)) to see whether EOF and/or
error were encountered "along the way", but this may interact
badly with fgets() variants that handle partial input lines, as
I described above.)
> Passing a null pointer to strlen is what causes the segfault?
Just so. The effect is officially undefined, but a "nice" system
such as a Linux box will trap the error at runtime and terminate
the program (by default -- programs can override this, and debuggers
can trap the problem before the program sees it). Less-nice systems
might have strlen() return 42.
> Does that mean that strlen returns that error because it doesn't recognize
> what has been passed and so assumes it's outside of its allotted territory?
>
> Or is something else entirely going on and I'm still at sea? <grin>


On your system, strlen() never returns at all -- so it makes no
sense to say "strlen returns that error". You could say "strlen
produces that result", which at least avoids the word "returns". :-)

The method by which Linux detects the problem and aborts the program
is beyond the scope of this newsgroup.[%] Here, it suffices to say
that strlen() requires a C string, and (char *)NULL does not qualify
as one. (A "C string" is a data structure consisting of one or
more "char"s in sequence, beginning with the char whose address is
given as a value of type "char *", and ending with the first '\0'.
Since NULL never points to a valid C data object, it cannot provide
the first byte of a string. Note that the empty string begins and
ends with its '\0' byte, which makes it quite different from NULL:
there is at least one valid C "char" there holding the '\0'.)

[% Still, I will mention that it has to do with "virtual memory"
and the on-chip MMU, which translates "virtual addresses" used by
running programs into "physical addresses" used to locate actual
bytes in RAM. The translation process has several trapping options,
with varying methods of handling them and "degrees of fatality":
areas can be marked entirely off-limits, or "within limits but not
present in RAM at the moment", or "valid but read-only", and so
on. On some CPUs, areas can even be marked execute-only, so that
it is impossible to read CPU instructions as data. Linux reserves
some areas as "not allocated to the program" and sets up the MMU
so that those areas are marked off-limits, then delivers a segmentation
fault if you attempt to read, write, or execute from such an area.]
--
In-Real-Life: Chris Torek, Wind River Systems
Nov 14 '05 #8
On 2004-09-05, Chris Torek <no****@torek.net> wrote:
<saved densely informative post for further study!!>

I gather that, for my purposes, the segfault at the EOF is comfortable,
because the EOF virtually always follows a newline and will be thus at the
beginning of a string. Doesn't bother me, but... suppose I process a file
where the EOF does show up without being preceded by a newline? At that
point, I can't just live with a segfault unless I'm sure of the data I'm
getting.

Certainly I'm not in the market for the solution to Schroedinger's God
Problem as obtained some thousands of centuries hence in a land far, far
away!! LOL!!! Ummm... that was (will have been?) 42, was it not? <grin>

Perhaps I should just use the venerable ((c=getc(fp))!=EOF) approach. I'm
using that to drive the wrap program and, as near as I can tell, it's simple
enough that it should be considered bullet-proof. I understand that's not
the most efficient way of going, but for what I'm doing, that's really not
relevant. I can say that I am not disposed to even touch the scanf family
of functions! <grin>

I must presume that there are other more sophisticated strategies in use,
but I'm going to have to stick with what I think I can manage to understand,
lest I inundate myself unnecessarily!

In any case, thanks for all the info!
--
Email is wtallman at olypen dot com
Nov 14 '05 #9
On Mon, 06 Sep 2004 04:59:41 -0000, name <us**@host.domain> wrote:
> I gather that, for my purposes, the segfault at the EOF is comfortable,
> because the EOF virtually always follows a newline and will be thus at the
> beginning of a string. Doesn't bother me, but... suppose I process a file
> where the EOF does show up without being preceded by a newline? At that
> point, I can't just live with a segfault unless I'm sure of the data I'm
> getting.


Not a good plan. The segfault you are currently experiencing is the
result of undefined behavior. The thing about undefined behavior is
it need not be consistent. Tomorrow it could manifest itself in a
completely different fashion, such as deleting the file you just
finished processing. Upgrading your hardware, OS, or compiler could
also change the behavior.

<<Remove the del for email>>
Nov 14 '05 #10
On 2004-09-06, Barry Schwarz <sc******@deloz.net> wrote:
> On Mon, 06 Sep 2004 04:59:41 -0000, name <us**@host.domain> wrote:
>> I gather that, for my purposes, the segfault at the EOF is comfortable,
>> because the EOF virtually always follows a newline and will be thus at the
>> beginning of a string. Doesn't bother me, but... suppose I process a file
>> where the EOF does show up without being preceded by a newline? At that
>> point, I can't just live with a segfault unless I'm sure of the data I'm
>> getting.
>
> Not a good plan. The segfault you are currently experiencing is the
> result of undefined behavior. The thing about undefined behavior is
> it need not be consistent. Tomorrow it could manifest itself in a
> completely different fashion, such as deleting the file you just
> finished processing. Upgrading your hardware, OS, or compiler could
> also change the behavior.


Okay, thanks.
--
Email is wtallman at olypen dot com
Nov 14 '05 #11
