
How to use scanf() safely?

Hi.
Before I use scanf(), I must malloc the memory for it, like this:

//Start
char * buffer;

buffer = malloc(20);
scanf("%s", &buffer);
//End

As we know, if I type 30 characters in, something bad will happen.
So, how can I solve this problem?
(I mean, so that it works correctly no matter how many characters are typed in.)
Jul 14 '06 #1
iwinux wrote:
> Hi.
> Before I use scanf(), I must malloc the memory for it, like this:
>
> //Start
> char * buffer;
>
> buffer = malloc(20);
> scanf("%s", &buffer);
> //End
>
> As we know, if I type 30 characters in, something bad will happen.
> So, how can I solve this problem?
> (I mean, so that it works correctly no matter how many characters are typed in.)

Don't use scanf; use something like fgets instead.
--
==============
Not a pedant
==============
Jul 14 '06 #2
iwinux wrote:
> Before I use scanf(), I must malloc the memory for it, like this:
>
> //Start
> char * buffer;
>
> buffer = malloc(20);
> scanf("%s", &buffer);
> //End
>
> As we know, if I type 30 characters in, something bad will happen.
> So, how can I solve this problem?
> (I mean, so that it works correctly no matter how many characters are typed in.)

scanf() cannot easily be used in a safe manner.
See past discussions and the FAQ for this.
Usually, one just uses fgets() (or getchar() in a loop).
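For illustration, the getchar()-in-a-loop approach could look roughly like this. It is a sketch, not code posted in the thread: read_line() and its return convention are hypothetical. It reads through the end of one line, stores what fits in a fixed-size buffer, always null-terminates, and throws away characters that don't fit, so the next read starts on a fresh line. It calls getc() on an arbitrary stream, which for stdin is equivalent to getchar().

```c
#include <stdio.h>

/* Hypothetical helper: read one line from `in` into buf (capacity `size`,
 * size >= 1).  At most size - 1 characters are stored, followed by '\0';
 * the newline itself is not stored, and characters that do not fit are
 * read and discarded.  Returns the number of characters stored, or EOF if
 * end of file (or an error) occurred before any character was read. */
int read_line(FILE *in, char *buf, size_t size)
{
    size_t n = 0;
    int c;
    int any = 0;                 /* did we read at least one character? */

    while ((c = getc(in)) != EOF) {
        any = 1;
        if (c == '\n')
            break;
        if (n + 1 < size)        /* keep room for the terminating '\0' */
            buf[n++] = (char)c;
        /* otherwise the character is simply dropped */
    }
    buf[n] = '\0';
    return any ? (int)n : EOF;
}
```

Passing stdin and a 20-byte array gives a bounded, line-at-a-time read; whether silently truncating long lines is acceptable depends on the application.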

Back to scanf():
If the limit is known at compile time, you can put it into the format string with the preprocessor:

#define stringize(s) #s
#define XSTR(s) stringize(s)
#define BUFSIZE 20

char *buffer = malloc(BUFSIZE+1);
if (buffer) {
    if (1 == scanf("%" XSTR(BUFSIZE) "s", buffer)) {
        do_something(buffer);
    }
}

Otherwise, you can build the format string at run time:

/* bufSize: an unsigned long holding the run-time limit (matches %lu) */
int len;
char *format;
char *buffer;

len = 1 + snprintf(0, 0, "%%%lus", bufSize);
if (len > 0) {
    format = malloc(len);
    buffer = malloc(bufSize+1);
    if (format && buffer) {
        snprintf(format, len, "%%%lus", bufSize);
        if (1 == scanf(format, buffer)) {
            do_something(buffer);
        }
    }
}

Cheers
Michael
--
E-Mail: Mine is an /at/ gmx /dot/ de address.
Jul 14 '06 #3
iwinux wrote:
> Hi.
> Before I use scanf(), I must malloc the memory for it, like this:
>
> //Start
> char * buffer;
>
> buffer = malloc(20);

if (buffer == NULL) ...

> scanf("%s", &buffer);

scanf ("%s", buffer); /* no & */

> //End
>
> As we know, if I type 30 characters in, something bad will happen.
> So, how can I solve this problem?
> (I mean, so that it works correctly no matter how many characters are typed in.)

There's a whole suite of different things you can do.
One is to tell scanf() how much space is available:

scanf ("%19s", buffer); /* 19 + 1 == 20 */

This will prevent scanf() from trying to store characters
beyond the end of the allocated memory, but it still isn't
wonderful: If you type "supercalifragilisticexpialidocious"
the buffer will receive "supercalifragilisti" and a zero
byte, and then the next input operation will start with
"cexpial...". If you type "It is an Ancient Mariner" the
buffer will receive "It" and a zero byte, and the next
input operation will start with " is an...".
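One way to make the next input start on a fresh line is to read and discard the leftover characters right after the width-limited conversion. A minimal sketch, assuming line-oriented input; read_word_line() is a hypothetical name, and it takes a FILE * so it works with stdin or any other stream. Note that %19s skips leading white space, newlines included, so a blank line is skipped rather than reported.

```c
#include <stdio.h>

/* Hypothetical helper: read one white-space-delimited word of at most 19
 * characters into word[20], then discard everything up to and including
 * the next newline, so the next input operation starts on a fresh line.
 * Returns what fscanf() returned: 1 on success, EOF at end of input. */
int read_word_line(FILE *in, char word[20])
{
    int rc = fscanf(in, "%19s", word);   /* 19 + 1 == 20 */
    int c;

    if (rc != EOF) {
        while ((c = getc(in)) != '\n' && c != EOF)
            ;                            /* throw away the rest of the line */
    }
    return rc;
}
```

With the two example inputs above, the first call stores "supercalifragilisti" and the second stores "It"; "cexpial..." and " is an..." are consumed instead of leaking into later reads.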

Experience suggests that scanf() is *not* a good
function for interactive input. It is often better to
read a line at a time with fgets() (not with gets(),
mind you!) and then extract data from the complete
line, possibly with sscanf(). fgets() has its own set
of problems, but they are usually easier to deal with
than those of the much more complex scanf().

--
Eric Sosman
es*****@acm-dot-org.invalid
Jul 14 '06 #4
iwinux said:
> Hi.
> Before I use scanf(), I must malloc the memory for it, like this:
>
> //Start
> char * buffer;
>
> buffer = malloc(20);

What if malloc returns NULL?

> scanf("%s", &buffer);

The & is incorrect.

> //End
>
> As we know, if I type 30 characters in, something bad will happen.

Right. Well, it might. Or it might try to lull you into a false sense of
security.

> So, how can I solve this problem?
> (I mean, so that it works correctly no matter how many characters are typed in.)

There is always a limit, of course. But if you are prepared to abandon
scanf, you can make the limit sufficiently large for any practical purpose,
without having stupidly large static arrays around the place.

http://www.cpax.org.uk/prg/writings/fgetdata.php contains an article I wrote
which deals with precisely this problem, and which comes up with some
practical solutions.
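The idea of a limit that is "sufficiently large for any practical purpose" without a huge static array can be illustrated with a buffer that grows as needed. This is a generic sketch of that technique, not the code from the article linked above; read_long_line() and the starting capacity of 64 are hypothetical choices.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not the code from the article linked above:
 * read one line of any length from `in`, calling fgets() repeatedly and
 * doubling the buffer with realloc() whenever it fills up.  Returns a
 * malloc'd string without the trailing newline (the caller must free()
 * it), or NULL at end of file or if memory runs out. */
char *read_long_line(FILE *in)
{
    size_t cap = 64, len = 0;
    char *buf = malloc(cap);
    char *bigger;

    if (buf == NULL)
        return NULL;
    while (fgets(buf + len, (int)(cap - len), in) != NULL) {
        len += strlen(buf + len);
        if (len > 0 && buf[len - 1] == '\n') {
            buf[len - 1] = '\0';      /* complete line: drop the newline */
            return buf;
        }
        if (len + 1 < cap)            /* room left but no newline: the   */
            return buf;               /* file ended without a final '\n' */
        bigger = realloc(buf, cap * 2);
        if (bigger == NULL) {
            free(buf);
            return NULL;
        }
        buf = bigger;
        cap *= 2;
    }
    if (len > 0)                      /* EOF right after a full buffer */
        return buf;
    free(buf);
    return NULL;
}
```

A caller typically loops `while ((line = read_long_line(stdin)) != NULL) { ...; free(line); }`, so the practical limit becomes available memory rather than a compile-time constant.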

--
Richard Heathfield
"Usenet is a strange place" - dmr 29/7/1999
http://www.cpax.org.uk
email: rjh at above domain (but drop the www, obviously)
Jul 14 '06 #5
> There is always a limit, of course. But if you are prepared to abandon
> scanf, you can make the limit sufficiently large for any practical purpose,
> without having stupidly large static arrays around the place.

So it's not easy to deal with a very long string?
Such as in a text editor?
Jul 14 '06 #6
iwinux said:
>> There is always a limit, of course. But if you are prepared to abandon
>> scanf, you can make the limit sufficiently large for any practical
>> purpose, without having stupidly large static arrays around the place.
>
> So it's not easy to deal with a very long string?

Define "easy". It's easy for me. I don't know whether it's easy for you.

> Such as in a text editor?

<shrug> If you're writing a text editor, the ability to handle arbitrarily
long strings is the least of your worries.

--
Richard Heathfield
"Usenet is a strange place" - dmr 29/7/1999
http://www.cpax.org.uk
email: rjh at above domain (but drop the www, obviously)
Jul 14 '06 #7
Eric Sosman wrote:
> There's a whole suite of different things you can do.
> One is to tell scanf() how much space is available:
>
> scanf ("%19s", buffer); /* 19 + 1 == 20 */
>
> This will prevent scanf() from trying to store characters
> beyond the end of the allocated memory, but it still isn't
> wonderful: If you type "supercalifragilisticexpialidocious"
> the buffer will receive "supercalifragilisti" and a zero
> byte, and then the next input operation will start with
> "cexpial...". If you type "It is an Ancient Mariner" the
> buffer will receive "It" and a zero byte, and the next
> input operation will start with " is an...".

Before the next input operation, the program can discard the remaining
characters so that the next read starts from a clean state.

> Experience suggests that scanf() is *not* a good
> function for interactive input. It is often better to
> read a line at a time with fgets() (not with gets(),
> mind you!) and then extract data from the complete
> line, possibly with sscanf(). fgets() has its own set
> of problems, but they are usually easier to deal with
> than those of the much more complex scanf().

Do you think the fgets and sscanf combination is also a good choice
for non-interactive input, e.g. file input? Which functions should
be used for file input? Thank you.

Jul 18 '06 #8
lovecreatesbeauty wrote:
> Eric Sosman wrote:
>>
>> Experience suggests that scanf() is *not* a good
>> function for interactive input. It is often better to
>> read a line at a time with fgets() (not with gets(),
>> mind you!) and then extract data from the complete
>> line, possibly with sscanf(). fgets() has its own set
>> of problems, but they are usually easier to deal with
>> than those of the much more complex scanf().
>
> Do you think the fgets and sscanf combination is also a good choice
> for non-interactive input, e.g. file input? Which functions should
> be used for file input? Thank you.

It depends on the "provenance" of the file. It's perfectly
all right to use fscanf() directly if you're sure that the file
adheres to the expected format (or if you're willing to accept
the consequences of a deviation). If a program writes a file,
rewinds it, and reads it back again, fscanf() seems fine. If
Program A writes the file and a "related" Program B reads it,
fscanf() with bare-bones error-checking may be good enough (one
still needs some error-checking in case A 1.1 writes something
that B 1.0 can't digest).
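For the trusted case, bare-bones error checking can be as simple as comparing fscanf()'s return value with the number of conversions requested, and telling a clean end of file apart from a mismatch. A sketch under assumed details: the two-integers-per-line record format, struct point and read_points() are all hypothetical.

```c
#include <stdio.h>

/* Hypothetical record written by "Program A": two ints per line. */
struct point {
    int x, y;
};

/* Read up to `max` records with fscanf(), doing only bare-bones checking.
 * Returns the number of records read, or -1 if the input stopped matching
 * the expected "int int" format before end of file. */
int read_points(FILE *in, struct point *pts, int max)
{
    int n = 0;
    int rc = 0;

    while (n < max && (rc = fscanf(in, "%d %d", &pts[n].x, &pts[n].y)) == 2)
        n++;
    if (n < max && rc != EOF)
        return -1;   /* mismatch or truncated record, e.g. A 1.1 vs B 1.0 */
    return n;
}
```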

If the file comes from an "unrelated" program, one must be
more cautious when reading it. If you write a program intending
that it be used as "vmstat 10 | myprogram" you must be on guard
against "vmstat -p 10 | myprogram" or "iostat -xn 5 | myprogram"
or even "myprogram < /etc/passwd". It is usually sufficient to
terminate with regrets when unexpected input is detected, but the
detection itself is also usually important ...

For "untrusted" line-oriented files, fgets() is a good place
to start because it captures the notion of "line." (Imperfectly,
in the case of lines too long for the provided buffer, but you
can write a little extra code to deal with that or to detect it
and say "This line of >1023 characters didn't come from vmstat.")
Once you've got the line sitting in a character array, C has a
good assortment of surgical tools for dissecting it: there's
sscanf(), strtok() -- I use it unashamedly, with care -- strchr(),
the <ctype.h> arsenal, strtod(), and all the rest.
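The "little extra code" for over-long lines might look like the following. After fgets() returns, a missing '\n' in the buffer means either that the line was too long or that the file ended without a final newline; reading one more character tells the two apart. A sketch only: get_line() and its 1/0/EOF return convention are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read one line with fgets() and report whether it
 * fit.  Returns 1 if a complete line is now in buf (newline removed),
 * 0 if the line was too long (buf holds its beginning and the rest of the
 * line has been read and discarded), or EOF at end of input. */
int get_line(FILE *in, char *buf, int size)
{
    char *nl;
    int c;

    if (fgets(buf, size, in) == NULL)
        return EOF;
    nl = strchr(buf, '\n');
    if (nl != NULL) {
        *nl = '\0';
        return 1;
    }
    c = getc(in);
    if (c == '\n' || c == EOF)      /* line exactly filled buf, or it was */
        return 1;                   /* the last line and had no newline   */
    while ((c = getc(in)) != '\n' && c != EOF)
        ;                           /* too long: skip the remainder */
    return 0;
}
```

A caller could treat a 0 return as "this line didn't come from vmstat" and stop with an error message.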

In extreme cases, you might even write a full-fledged parser
that recognizes the input as matching (or failing to match) a
formal grammar, and possibly verifies other constraints as well --
the XML fad is founded on the desire to be able to do this sort
of thing in a fairly mechanical fashion. Such a parser might or
might not need the notion of "line;" it depends on the format.

--
Eric Sosman
es*****@acm-dot-org.invalid
Jul 18 '06 #9
Eric Sosman wrote:
> It is often better to
> read a line at a time with fgets() (not with gets(),
> mind you!) and then extract data from the complete
> line, possibly with sscanf().

I once thought fgets and sscanf might be better than scanf alone. Now
I do not feel that way at all. sscanf and scanf come from the same
family, so the defects in scanf remain in sscanf. When a user enters,
e.g., "WHAT_VALUE_ABC", both of these fail:
scanf("%d", &i);
or
sscanf(buf, "%d", &i);

The program should validate the range of the data the user provides and
prompt the user to re-enter proper data after invalid data is entered.
Isn't this the right way?
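sscanf() does share scanf()'s conversions; in particular, %d has undefined behavior when the number does not fit in an int. A common alternative for the range validation described here is strtol(), which reports overflow through errno and tells you exactly where the number ended. A sketch, intended to be applied to a line obtained with fgets(); parse_long_in_range() is a hypothetical name.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: convert the text in s to a long and check that it
 * lies in [lo, hi].  Leading and trailing white space (including the
 * newline fgets() leaves) is allowed; anything else is rejected.
 * Returns 1 and stores the value in *out on success, 0 otherwise. */
int parse_long_in_range(const char *s, long lo, long hi, long *out)
{
    char *end;
    long v;

    errno = 0;
    v = strtol(s, &end, 10);
    if (end == s)                  /* no digits, e.g. "WHAT_VALUE_ABC" */
        return 0;
    if (errno == ERANGE)           /* too large or too small for a long */
        return 0;
    while (isspace((unsigned char)*end))
        end++;
    if (*end != '\0')              /* trailing junk, e.g. "42abc" */
        return 0;
    if (v < lo || v > hi)          /* outside the caller's range */
        return 0;
    *out = v;
    return 1;
}
```

On a 0 return the program can print a message and prompt again, which is the re-entry loop described above.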

Jul 18 '06 #10


lovecreatesbeauty wrote on 07/18/06 11:29:
> Eric Sosman wrote:
>> It is often better to
>> read a line at a time with fgets() (not with gets(),
>> mind you!) and then extract data from the complete
>> line, possibly with sscanf().
>
> I once thought fgets and sscanf might be better than scanf alone. Now
> I do not feel that way at all. sscanf and scanf come from the same
> family, so the defects in scanf remain in sscanf. When a user enters,
> e.g., "WHAT_VALUE_ABC", both of these fail:
> scanf("%d", &i);
> or
> sscanf(buf, "%d", &i);
>
> The program should validate the range of the data the user provides and
> prompt the user to re-enter proper data after invalid data is entered.
> Isn't this the right way?

Try the experiment yourself. For each of these
programs:

/* Program S */
#include <stdio.h>
int main(void) {
    int x;
    for (;;) {
        puts ("Enter a value:");
        if (scanf("%d", &x) == 1)
            break;
        puts ("Try again, please.");
    }
    printf ("The number is %d\n", x);
    return 0;
}

/* Program SS */
#include <stdio.h>
int main(void) {
    int x;
    for (;;) {
        char buff[100];
        puts ("Enter a value:");
        if (fgets(buff, sizeof buff, stdin) == buff
            && sscanf(buff, "%d", &x) == 1)
            break;
        puts ("Try again, please.");
    }
    printf ("The number is %d\n", x);
    return 0;
}

... enter WHAT_VALUE_ABC at the first prompt and 42 at
the second. Are there any differences in behavior? If
so, which behavior do you think is more useful in an
interactive setting? Why?

--
Er*********@sun.com

Jul 18 '06 #11
lovecreatesbeauty wrote:
>
> Eric Sosman wrote:
>> There's a whole suite of different things you can do.
>> One is to tell scanf() how much space is available:
>>
>> scanf ("%19s", buffer); /* 19 + 1 == 20 */
>>
>> This will prevent scanf() from trying to store characters
>> beyond the end of the allocated memory, but it still isn't
>> wonderful: If you type "supercalifragilisticexpialidocious"
>> the buffer will receive "supercalifragilisti" and a zero
>> byte, and then the next input operation will start with
>> "cexpial...". If you type "It is an Ancient Mariner" the
>> buffer will receive "It" and a zero byte, and the next
>> input operation will start with " is an...".
>
> Before the next input operation, the program can discard the remaining
> characters so that the next read starts from a clean state.

scanf can be used more powerfully than that:

/* BEGIN new.c */
/*
** If rc equals 0, then an empty line was entered
** and the array contains garbage.
** If rc equals EOF, then the end of file was reached.
** If rc equals 1, then there is a string in array.
** Up to LENGTH number of characters are read
** from a line of a text file or stream.
** If the line is longer than LENGTH,
** then the extra characters are discarded.
*/
#include <stdio.h>

#define LENGTH 80
#define str(x) # x
#define xstr(x) str(x)

int main(void)
{
    int rc;
    char array[LENGTH + 1];

    puts("The LENGTH macro is " xstr(LENGTH));
    fputs("Enter a string with spaces:", stdout);
    fflush(stdout);
    rc = scanf("%" xstr(LENGTH) "[^\n]%*[^\n]", array);
    if (!feof(stdin)) {
        getchar();
    }
    while (rc == 1) {
        printf("Your string is:%s\n\n"
               "Hit the Enter key to end,\nor enter "
               "another string to continue:", array);
        fflush(stdout);
        rc = scanf("%" xstr(LENGTH) "[^\n]%*[^\n]", array);
        if (!feof(stdin)) {
            getchar();
        }
        if (rc == 0) {
            *array = '\0';
        }
    }
    return 0;
}

/* END new.c */

--
pete
Jul 18 '06 #12
Eric Sosman wrote:
> lovecreatesbeauty wrote on 07/18/06 11:29:
>> Eric Sosman wrote:
>>> It is often better to
>>> read a line at a time with fgets() (not with gets(),
>>> mind you!) and then extract data from the complete
>>> line, possibly with sscanf().
>>
>> I once thought fgets and sscanf might be better than scanf alone. Now
>> I do not feel that way at all. sscanf and scanf come from the same
>> family, so the defects in scanf remain in sscanf. When a user enters,
>> e.g., "WHAT_VALUE_ABC", both of these fail:
>> scanf("%d", &i);
>> or
>> sscanf(buf, "%d", &i);
>>
>> The program should validate the range of the data the user provides and
>> prompt the user to re-enter proper data after invalid data is entered.
>> Isn't this the right way?
>
> Try the experiment yourself. For each of these
> programs:
>
> [Programs S and SS snipped]
>
> ... enter WHAT_VALUE_ABC at the first prompt and 42 at
> the second. Are there any differences in behavior? If
> so, which behavior do you think is more useful in an
> interactive setting? Why?

/* scanf and sscanf are very similar. I can think of two differences
between them: one is that sscanf needs one more argument; the other is the
difference demonstrated by the example code, but that can be fixed (see
line 9). Please correct me if I am wrong. */

/* Program S.2 */
#include <stdio.h>
int main(void) {
    int x;
    for (;;){
        puts("Enter a value:");
        if (scanf("%d", &x) == 1)
            break;
        while ((x = getchar()) != '\n' && x != EOF) ; /*line 9*/
        puts ("Try again, please.");
    }
    printf ("The number is %d\n", x);
    return 0;
}

Jul 19 '06 #13
lovecreatesbeauty wrote:
>
> /* scanf and sscanf are very similar. I can think of two differences
> between them: one is that sscanf needs one more argument; the other is the
> difference demonstrated by the example code, but that can be fixed (see
> line 9). Please correct me if I am wrong. */

> [Program S.2 snipped]

Good: You've spotted the difference -- but you haven't
thought about it enough yet. Exercise: Modify the program
to read an integer from one line and a double from another,
prompting with "Enter an integer" and "Enter a double".
Test it by entering "42" on the first line and "42.0" on
the second. Then run it again, but this time enter "4 2"
on the first line. Run it a third time, entering "42 BAD"
on the first line and "BAD 42.0" on the second. Run it a
fourth time, entering " " at each prompt. Try to emit error
messages that describe as accurately as possible just how the
input differs from what the program expects.
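A sketch of one possible answer to the exercise: fgets() supplies the line, and sscanf() with %n reports where the number ended, so "4 2", "42 BAD", "BAD 42.0" and a blank line each get a distinct diagnosis. The enum and function names are hypothetical. Like the programs above, it does not guard against values too large for an int or double (undefined behavior with %d and %lf); strtol() and strtod() would catch those.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical classification of a line that should hold exactly one number. */
enum parse_result { PARSE_OK, PARSE_EMPTY, PARSE_NOT_A_NUMBER, PARSE_TRAILING_JUNK };

/* Only white space (e.g. the newline from fgets) may follow the number. */
static enum parse_result check_rest(const char *rest)
{
    while (isspace((unsigned char)*rest))
        rest++;
    return *rest == '\0' ? PARSE_OK : PARSE_TRAILING_JUNK;
}

/* Parse a line, as read by fgets(), that should contain one int. */
enum parse_result parse_int_line(const char *line, int *out)
{
    int used = 0;
    char first;

    if (sscanf(line, " %c", &first) != 1)       /* nothing but white space */
        return PARSE_EMPTY;
    if (sscanf(line, "%d%n", out, &used) != 1)  /* %n: chars consumed so far */
        return PARSE_NOT_A_NUMBER;
    return check_rest(line + used);
}

/* The same idea for a line that should contain one double. */
enum parse_result parse_double_line(const char *line, double *out)
{
    int used = 0;
    char first;

    if (sscanf(line, " %c", &first) != 1)
        return PARSE_EMPTY;
    if (sscanf(line, "%lf%n", out, &used) != 1)
        return PARSE_NOT_A_NUMBER;
    return check_rest(line + used);
}
```

A caller would map each result to a message such as "extra characters after the number" and prompt again.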

The fundamental reason that fscanf() is not very good for
interactive input is that much interactive input is line-oriented,
but fscanf() is very nearly oblivious to line boundaries. fgets()
can provide the line awareness and then sscanf() can perform the
parsing, with the knowledge that it's operating on a line and not
on a stream of input that crosses an arbitrary number of line
boundaries, possibly more or fewer than you were expecting.

It is *possible* to do interactive input with fscanf(),
just as it is *possible* to write full-fledged C programs without
for, do, while, and if. Nobody will forbid you to indulge in
self-imposed hardships if that's your pleasure, but many will
wonder why you insist on doing things the hard way.

--
Eric Sosman
es*****@acm-dot-org.invalid
Jul 19 '06 #14
iwinux wrote:
> Before I use scanf(), I must malloc the memory for it, like this:
>
> //Start
> char * buffer;
>
> buffer = malloc(20);
> scanf("%s", &buffer);
> //End
>
> As we know, if I type 30 characters in, something bad will happen.
> So, how can I solve this problem?

As with any problem, to solve it you must first understand the nature
of the problem. scanf() forces all destination variables to be
predeclared before the input starts. So using scanf itself is the
source of the problem. In general it's preferable to obtain the input
by some other method (an iterated fgets is possible, but hardly
ideal) and then use *sscanf()* AFTER deciding how much memory to malloc
for your destinations.

(Another problem is that, more than likely, you don't want scanf()'s
parsing semantics. Strings are terminated by white space with scanf()
for some inexplicable reason.)

> (I mean, so that it works correctly no matter how many characters are typed in.)

Anyhow, let's first start with getting a full line of input safely (C
doesn't have any built-in provisions for doing this):

http://www.pobox.com/~qed/userInput.html

The key point is that, using fgetstralloc(), you know the length of
the input and have the entire contents of the input in one shot (most
other programming languages have a built-in mechanism for doing this,
BTW). From there you can estimate the destination sizes, or use
strcspn() to help you parse before you figure out exactly how much
memory you need for your destination parameters, then use sscanf() or
whatever to extract the exact results.
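The measure-then-allocate step might look like the following for the simplest case, a single field. It is a sketch: the delimiter set (space, tab, CR, newline) and first_field() are assumptions, not part of the userInput.html code, and it copies with memcpy() rather than sscanf() because the length is already known (sscanf() would need a width built at run time, as in the snprintf() example earlier in the thread).

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: find the first field in `line` delimited by space,
 * tab, CR or newline, measure it with strspn()/strcspn(), allocate exactly
 * enough memory for it, and copy it out.  Returns a malloc'd string the
 * caller must free(), or NULL if there is no field or memory runs out. */
char *first_field(const char *line)
{
    const char *ws = " \t\r\n";
    const char *start = line + strspn(line, ws);   /* skip leading blanks */
    size_t len = strcspn(start, ws);               /* length of the field */
    char *field;

    if (len == 0)
        return NULL;
    field = malloc(len + 1);
    if (field == NULL)
        return NULL;
    memcpy(field, start, len);
    field[len] = '\0';
    return field;
}
```

Because the allocation is sized from the measured length, a 34-character word fits as easily as a 3-character one, which is exactly what the fixed 20-byte buffer in the original question could not do.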

--
Paul Hsieh
http://www.pobox.com/~qed/
http://bstring.sf.net/

Jul 19 '06 #15
