How to use scanf() safely?

Hi.
Before I use scanf(), I must malloc the memory for it, like this:

//Start
char * buffer;

buffer = malloc(20);
scanf("%s", &buffer);
//End

As we know, if I type 30 characters in, something bad will happen.
So, how can I solve this problem?
(I mean, no matter how many characters you type in, it can work well.)
Jul 14 '06 #1
iwinux wrote:
> Hi.
> Before I use scanf(), I must malloc the memory for it, like this:
>
> //Start
> char * buffer;
>
> buffer = malloc(20);
> scanf("%s", &buffer);
> //End
>
> As we know, if I type 30 characters in, something bad will happen.
> So, how can I solve this problem?
> (I mean, no matter how many characters you type in, it can work well.)

Don't use scanf, use something like fgets instead.
--
==============
Not a pedant
==============
Jul 14 '06 #2
iwinux wrote:
> Before I use scanf(), I must malloc the memory for it, like this:
>
> //Start
> char * buffer;
>
> buffer = malloc(20);
> scanf("%s", &buffer);
> //End
>
> As we know, if I type 30 characters in, something bad will happen.
> So, how can I solve this problem?
> (I mean, no matter how many characters you type in, it can work well.)

scanf() cannot easily be used in a safe manner.
See past discussions and the FAQ for this.
Usually, one just uses fgets() (or getchar() in a loop).

Back to scanf():
If you have compile-time limits, you can use

#define stringize(s) #s
#define XSTR(s) stringize(s)
#define BUFSIZE 20

char *buffer = malloc(BUFSIZE+1);
if (buffer) {
    /* pass buffer itself, not &buffer: %s expects a char * */
    if (1 == scanf("%" XSTR(BUFSIZE) "s", buffer)) {
        do_something(buffer);
    }
}

Otherwise, you can do

int len;
char *format;
char *buffer;

/* bufSize is assumed to be an unsigned long holding the run-time limit */
len = 1 + snprintf(0, 0, "%%%lus", bufSize);
if (len > 0) {
    format = malloc(len);
    buffer = malloc(bufSize+1);
    if (format && buffer) {
        snprintf(format, len, "%%%lus", bufSize);
        if (1 == scanf(format, buffer)) {
            do_something(buffer);
        }
    }
}

Cheers
Michael
--
E-Mail: Mine is an /at/ gmx /dot/ de address.
Jul 14 '06 #3
iwinux wrote:
> Hi.
> Before I use scanf(), I must malloc the memory for it, like this:
>
> //Start
> char * buffer;
>
> buffer = malloc(20);

if (buffer == NULL) ...

> scanf("%s", &buffer);

scanf ("%s", buffer); /* no & */

> //End
>
> As we know, if I type 30 characters in, something bad will happen.
> So, how can I solve this problem?
> (I mean, no matter how many characters you type in, it can work well.)

There's a whole suite of different things you can do.
One is to tell scanf() how much space is available:

scanf ("%19s", buffer); /* 19 + 1 == 20 */

This will prevent scanf() from trying to store characters
beyond the end of the allocated memory, but it still isn't
wonderful: If you type "supercalifragilisticexpialidocious"
the buffer will receive "supercalifragilisti" and a zero
byte, and then the next input operation will start with
"cexpial...". If you type "It is an Ancient Mariner" the
buffer will receive "It" and a zero byte, and the next
input operation will start with " is an...".
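
Both cases can be reproduced without a keyboard by scanning a string with sscanf(), which applies the same conversion rules as scanf(). This is only a minimal sketch: the helper name first_word and the use of %n to report how far the scan got are illustrative choices, not something from the post above.

```c
#include <stdio.h>

/* Hypothetical helper: copy the first whitespace-delimited word of
 * `input` into `out` (at least 20 bytes), storing at most 19 characters.
 * Returns the number of input characters consumed, or -1 if the input
 * contains no word at all. */
int first_word(const char *input, char *out)
{
    int consumed = 0;

    /* %19s skips leading whitespace, then stores up to 19 non-space
     * characters plus a terminating zero byte. %n records how many
     * characters of the input have been read so far. */
    if (sscanf(input, "%19s%n", out, &consumed) != 1)
        return -1;
    return consumed;
}
```

With the long word, the function consumes 19 characters and leaves "cexpialidocious" unread. With the Ancient Mariner line, it consumes only the two characters of "It".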

Experience suggests that scanf() is *not* a good
function for interactive input. It is often better to
read a line at a time with fgets() (not with gets(),
mind you!) and then extract data from the complete
line, possibly with sscanf(). fgets() has its own set
of problems, but they are usually easier to deal with
than those of the much more complex scanf().

--
Eric Sosman
es*****@acm-dot-org.invalid
Jul 14 '06 #4
iwinux said:
> Hi.
> Before I use scanf(), I must malloc the memory for it, like this:
>
> //Start
> char * buffer;
>
> buffer = malloc(20);

What if malloc returns NULL?

> scanf("%s", &buffer);

The & is incorrect.

> //End
>
> As we know, if I type 30 characters in, something bad will happen.

Right. Well, it might. Or it might try to lull you into a false sense of
security.

> So, how can I solve this problem?
> (I mean, no matter how many characters you type in, it can work well.)

There is always a limit, of course. But if you are prepared to abandon
scanf, you can make the limit sufficiently large for any practical purpose,
without having stupidly large static arrays around the place.

http://www.cpax.org.uk/prg/writings/fgetdata.php contains an article I wrote
which deals with precisely this problem, and which comes up with some
practical solutions.
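
One way to get a large limit without a large static array is to grow a heap buffer as characters arrive. The following is a rough sketch of that general idea, not the code from the linked article: the function name read_long_line, the starting capacity of 16 and the doubling strategy are all assumptions made for illustration.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read one line of up to max_len characters from
 * `in`. Returns a malloc'd string without the newline (the caller frees
 * it), or NULL at end of input, on allocation failure, or when the line
 * is longer than max_len (the rest of that line is left unread). */
char *read_long_line(FILE *in, size_t max_len)
{
    size_t cap = 16, len = 0;
    char *buf = malloc(cap);
    int c;

    if (buf == NULL)
        return NULL;

    while ((c = getc(in)) != EOF && c != '\n') {
        if (len == max_len) {        /* over the caller's limit */
            free(buf);
            return NULL;
        }
        if (len + 1 == cap) {        /* keep one byte for the terminator */
            char *tmp = realloc(buf, cap * 2);
            if (tmp == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }
        buf[len++] = (char)c;
    }

    if (c == EOF && len == 0) {      /* nothing left to read */
        free(buf);
        return NULL;
    }
    buf[len] = '\0';
    return buf;
}
```

The limit is still there, but it is a run-time argument and memory is only used for lines that actually turn out to be long.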

--
Richard Heathfield
"Usenet is a strange place" - dmr 29/7/1999
http://www.cpax.org.uk
email: rjh at above domain (but drop the www, obviously)
Jul 14 '06 #5
> There is always a limit, of course. But if you are prepared to abandon
> scanf, you can make the limit sufficiently large for any practical purpose,
> without having stupidly large static arrays around the place.

So it's not easy to deal with a very long string?
Such as a text editor.
Jul 14 '06 #6
iwinux said:
>> There is always a limit, of course. But if you are prepared to abandon
>> scanf, you can make the limit sufficiently large for any practical
>> purpose, without having stupidly large static arrays around the place.
>
> So it's not easy to deal with a very long string?

Define "easy". It's easy for me. I don't know whether it's easy for you.

> Such as a text editor.

<shrug> If you're writing a text editor, the ability to handle arbitrarily
long strings is the least of your worries.

--
Richard Heathfield
"Usenet is a strange place" - dmr 29/7/1999
http://www.cpax.org.uk
email: rjh at above domain (but drop the www, obviously)
Jul 14 '06 #7
Eric Sosman wrote:
> There's a whole suite of different things you can do.
> One is to tell scanf() how much space is available:
>
> scanf ("%19s", buffer); /* 19 + 1 == 20 */
>
> This will prevent scanf() from trying to store characters
> beyond the end of the allocated memory, but it still isn't
> wonderful: If you type "supercalifragilisticexpialidocious"
> the buffer will receive "supercalifragilisti" and a zero
> byte, and then the next input operation will start with
> "cexpial...". If you type "It is an Ancient Mariner" the
> buffer will receive "It" and a zero byte, and the next
> input operation will start with " is an...".

Before the next input operation, the program can discard the remaining
characters so that it starts from a clean beginning.

> Experience suggests that scanf() is *not* a good
> function for interactive input. It is often better to
> read a line at a time with fgets() (not with gets(),
> mind you!) and then extract data from the complete
> line, possibly with sscanf(). fgets() has its own set
> of problems, but they are usually easier to deal with
> than those of the much more complex scanf().

Do you think the fgets and sscanf combination is also a good candidate
for non-interactive input, e.g. file input? Which functions should
be used for file input? Thank you.

Jul 18 '06 #8
lovecreatesbeauty wrote:
> Eric Sosman wrote:
>> Experience suggests that scanf() is *not* a good
>> function for interactive input. It is often better to
>> read a line at a time with fgets() (not with gets(),
>> mind you!) and then extract data from the complete
>> line, possibly with sscanf(). fgets() has its own set
>> of problems, but they are usually easier to deal with
>> than those of the much more complex scanf().
>
> Do you think the fgets and sscanf combination is also a good candidate
> for non-interactive input, e.g. file input? Which functions should
> be used for file input? Thank you.

It depends on the "provenance" of the file. It's perfectly
all right to use fscanf() directly if you're sure that the file
adheres to the expected format (or if you're willing to accept
the consequences of a deviation). If a program writes a file,
rewinds it, and reads it back again, fscanf() seems fine. If
Program A writes the file and a "related" Program B reads it,
fscanf() with bare-bones error-checking may be good enough (one
still needs some error-checking in case A 1.1 writes something
that B 1.0 can't digest).

If the file comes from an "unrelated" program, one must be
more cautious when reading it. If you write a program intending
that it be used as "vmstat 10 | myprogram" you must be on guard
against "vmstat -p 10 | myprogram" or "iostat -xn 5 | myprogram"
or even "myprogram < /etc/passwd". It is usually sufficient to
terminate with regrets when unexpected input is detected, but the
detection itself is also usually important ...

For "untrusted" line-oriented files, fgets() is a good place
to start because it captures the notion of "line." (Imperfectly,
in the case of lines too long for the provided buffer, but you
can write a little extra code to deal with that or to detect it
and say "This line of >1023 characters didn't come from vmstat.")
Once you've got the line sitting in a character array, C has a
good assortment of surgical tools for dissecting it: there's
sscanf(), strtok() -- I use it unashamedly, with care -- strchr(),
the <ctype.h> arsenal, strtod(), and all the rest.
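
The "little extra code" for over-long lines might look something like this. It is a sketch under assumptions: the names read_line and line_status are invented here, the buffer is assumed to be at least 2 bytes, and discarding the tail of an over-long line is just one possible policy.

```c
#include <stdio.h>
#include <string.h>

enum line_status { LINE_OK, LINE_TOO_LONG, LINE_EOF };

/* Hypothetical helper: read one line into buf (size >= 2 bytes) and
 * strip the newline. If the line does not fit, buf holds its first
 * size - 1 characters, the remainder is discarded, and LINE_TOO_LONG
 * is returned so the caller can reject the input. */
enum line_status read_line(FILE *in, char *buf, size_t size)
{
    size_t len;
    int c;

    if (fgets(buf, (int)size, in) == NULL)
        return LINE_EOF;

    len = strlen(buf);
    if (len > 0 && buf[len - 1] == '\n') {
        buf[len - 1] = '\0';         /* complete line: drop the newline */
        return LINE_OK;
    }

    /* fgets() stopped without storing a newline. Either the input ended,
     * the newline is the very next character, or the line is too long. */
    c = getc(in);
    if (c == EOF || c == '\n')
        return LINE_OK;
    while (c != '\n' && c != EOF)    /* skip the rest of the long line */
        c = getc(in);
    return LINE_TOO_LONG;
}
```

Once a call returns LINE_OK, the array holds exactly one line and can be handed to sscanf(), strtok(), strtod() and friends.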

In extreme cases, you might even write a full-fledged parser
that recognizes the input as matching (or failing to match) a
formal grammar, and possibly verifies other constraints as well --
the XML fad is founded on the desire to be able to do this sort
of thing in a fairly mechanical fashion. Such a parser might or
might not need the notion of "line;" it depends on the format.

--
Eric Sosman
es*****@acm-dot-org.invalid
Jul 18 '06 #9
Eric Sosman wrote:
> It is often better to
> read a line at a time with fgets() (not with gets(),
> mind you!) and then extract data from the complete
> line, possibly with sscanf().

I once thought fgets and sscanf might be better than scanf alone. At
the moment, I do not have that feeling at all. sscanf and scanf come
from the same family; the defects in scanf remain in sscanf. When a
user enters, e.g., "WHAT_VALUE_ABC", both fail:

scanf("%d", &i);
or
sscanf(buf, "%d", &i);

The program validates the range of the data the user provided and prompts
the user to re-enter proper data after invalid data is provided. Isn't this
the right way?

Jul 18 '06 #10


lovecreatesbeauty wrote On 07/18/06 11:29,:
> Eric Sosman wrote:
>> It is often better to
>> read a line at a time with fgets() (not with gets(),
>> mind you!) and then extract data from the complete
>> line, possibly with sscanf().
>
> I once thought fgets and sscanf might be better than scanf alone. At
> the moment, I do not have that feeling at all. sscanf and scanf come
> from the same family; the defects in scanf remain in sscanf. When a
> user enters, e.g., "WHAT_VALUE_ABC", both fail:
> scanf("%d", &i);
> or
> sscanf(buf, "%d", &i);
>
> The program validates the range of the data the user provided and prompts
> the user to re-enter proper data after invalid data is provided. Isn't this
> the right way?

Try the experiment yourself. For each of these
programs:

/* Program S */
#include <stdio.h>
int main(void) {
    int x;
    for (;;) {
        puts ("Enter a value:");
        if (scanf("%d", &x) == 1)
            break;
        puts ("Try again, please.");
    }
    printf ("The number is %d\n", x);
    return 0;
}

/* Program SS */
#include <stdio.h>
int main(void) {
    int x;
    for (;;) {
        char buff[100];
        puts ("Enter a value:");
        if (fgets(buff, sizeof buff, stdin) == buff
            && sscanf(buff, "%d", &x) == 1)
            break;
        puts ("Try again, please.");
    }
    printf ("The number is %d\n", x);
    return 0;
}

... enter WHAT_VALUE_ABC at the first prompt and 42 at
the second. Are there any differences in behavior? If
so, which behavior do you think is more useful in an
interactive setting? Why?

--
Er*********@sun.com

Jul 18 '06 #11
lovecreatesbeauty wrote:
>
> Eric Sosman wrote:
>> There's a whole suite of different things you can do.
>> One is to tell scanf() how much space is available:
>>
>> scanf ("%19s", buffer); /* 19 + 1 == 20 */
>>
>> This will prevent scanf() from trying to store characters
>> beyond the end of the allocated memory, but it still isn't
>> wonderful: If you type "supercalifragilisticexpialidocious"
>> the buffer will receive "supercalifragilisti" and a zero
>> byte, and then the next input operation will start with
>> "cexpial...". If you type "It is an Ancient Mariner" the
>> buffer will receive "It" and a zero byte, and the next
>> input operation will start with " is an...".
>
> Before the next input operation, the program can discard the remaining
> characters so that it starts from a clean beginning.

scanf can be used more powerfully than that:

/* BEGIN new.c */
/*
** If rc equals 0, then an empty line was entered
** and the array contains garbage.
** If rc equals EOF, then the end of file was reached.
** If rc equals 1, then there is a string in array.
** Up to LENGTH number of characters are read
** from a line of a text file or stream.
** If the line is longer than LENGTH,
** then the extra characters are discarded.
*/
#include <stdio.h>

#define LENGTH 80
#define str(x) # x
#define xstr(x) str(x)

int main(void)
{
    int rc;
    char array[LENGTH + 1];

    puts("The LENGTH macro is " xstr(LENGTH));
    fputs("Enter a string with spaces:", stdout);
    fflush(stdout);
    rc = scanf("%" xstr(LENGTH) "[^\n]%*[^\n]", array);
    if (!feof(stdin)) {
        getchar();
    }
    while (rc == 1) {
        printf("Your string is:%s\n\n"
               "Hit the Enter key to end,\nor enter "
               "another string to continue:", array);
        fflush(stdout);
        rc = scanf("%" xstr(LENGTH) "[^\n]%*[^\n]", array);
        if (!feof(stdin)) {
            getchar();
        }
        if (rc == 0) {
            *array = '\0';
        }
    }
    return 0;
}

/* END new.c */

--
pete
Jul 18 '06 #12
Eric Sosman wrote:
> lovecreatesbeauty wrote On 07/18/06 11:29,:
>> Eric Sosman wrote:
>>> It is often better to
>>> read a line at a time with fgets() (not with gets(),
>>> mind you!) and then extract data from the complete
>>> line, possibly with sscanf().
>>
>> I once thought fgets and sscanf might be better than scanf alone. At
>> the moment, I do not have that feeling at all. sscanf and scanf come
>> from the same family; the defects in scanf remain in sscanf. When a
>> user enters, e.g., "WHAT_VALUE_ABC", both fail:
>> scanf("%d", &i);
>> or
>> sscanf(buf, "%d", &i);
>>
>> The program validates the range of the data the user provided and prompts
>> the user to re-enter proper data after invalid data is provided. Isn't this
>> the right way?
>
> Try the experiment yourself. For each of these
> programs:
>
> /* Program S */
> #include <stdio.h>
> int main(void) {
>     int x;
>     for (;;) {
>         puts ("Enter a value:");
>         if (scanf("%d", &x) == 1)
>             break;
>         puts ("Try again, please.");
>     }
>     printf ("The number is %d\n", x);
>     return 0;
> }
>
> /* Program SS */
> #include <stdio.h>
> int main(void) {
>     int x;
>     for (;;) {
>         char buff[100];
>         puts ("Enter a value:");
>         if (fgets(buff, sizeof buff, stdin) == buff
>             && sscanf(buff, "%d", &x) == 1)
>             break;
>         puts ("Try again, please.");
>     }
>     printf ("The number is %d\n", x);
>     return 0;
> }
>
> ... enter WHAT_VALUE_ABC at the first prompt and 42 at
> the second. Are there any differences in behavior? If
> so, which behavior do you think is more useful in an
> interactive setting? Why?

/* scanf and sscanf are very similar. I can think of two differences
between them: one is that sscanf needs one more argument, the other is the
difference demonstrated by the example code. But that can be fixed; see
line 9. Please correct me if I am wrong. */

/* Program S.2 */
#include <stdio.h>
int main(void) {
    int x;
    for (;;){
        puts("Enter a value:");
        if (scanf("%d", &x) == 1)
            break;
        while ((x = getchar()) != '\n' && x != EOF) ; /*line 9*/
        puts ("Try again, please.");
    }
    printf ("The number is %d\n", x);
    return 0;
}

Jul 19 '06 #13
lovecreatesbeauty wrote:
>
> /* scanf and sscanf are very similar. I can think of two differences
> between them: one is that sscanf needs one more argument, the other is the
> difference demonstrated by the example code. But that can be fixed; see
> line 9. Please correct me if I am wrong. */
>
> /* Program S.2 */
> #include <stdio.h>
> int main(void) {
>     int x;
>     for (;;){
>         puts("Enter a value:");
>         if (scanf("%d", &x) == 1)
>             break;
>         while ((x = getchar()) != '\n' && x != EOF) ; /*line 9*/
>         puts ("Try again, please.");
>     }
>     printf ("The number is %d\n", x);
>     return 0;
> }

Good: You've spotted the difference -- but you haven't
thought about it enough yet. Exercise: Modify the program
to read an integer from one line and a double from another,
prompting with "Enter an integer" and "Enter a double".
Test it by entering "42" on the first line and "42.0" on
the second. Then run it again, but this time enter "4 2"
on the first line. Run it a third time, entering "42 BAD"
on the first line and "BAD 42.0" on the second. Run it a
fourth time, entering " " at each prompt. Try to emit error
messages that describe as accurately as possible just how the
input differs from what the program expects.
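
For the integer half of that exercise, one possible starting point is to classify a line that has already been read with fgets(). This is only a sketch: the function parse_int_line and the parse_result codes are invented names, and strtol() is used here in place of sscanf() because its end pointer makes trailing junk easy to spot.

```c
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

enum parse_result { PARSE_OK, PARSE_EMPTY, PARSE_NOT_A_NUMBER,
                    PARSE_TRAILING_JUNK, PARSE_OUT_OF_RANGE };

/* Hypothetical helper: decide whether `line` holds exactly one int,
 * optionally surrounded by whitespace. On PARSE_OK the value is stored
 * in *value; otherwise *value is left untouched. */
enum parse_result parse_int_line(const char *line, int *value)
{
    char *end;
    long n;

    while (isspace((unsigned char)*line))
        line++;
    if (*line == '\0')
        return PARSE_EMPTY;              /* blank line, e.g. " " */

    errno = 0;
    n = strtol(line, &end, 10);
    if (end == line)
        return PARSE_NOT_A_NUMBER;       /* e.g. "BAD 42.0" */
    if (errno == ERANGE || n < INT_MIN || n > INT_MAX)
        return PARSE_OUT_OF_RANGE;

    while (isspace((unsigned char)*end))
        end++;
    if (*end != '\0')
        return PARSE_TRAILING_JUNK;      /* e.g. "4 2" or "42 BAD" */

    *value = (int)n;
    return PARSE_OK;
}
```

Each result code can be mapped to its own error message, which is hard to do when scanf() has consumed an unknown amount of the stream. A strtod() version for the double would follow the same shape.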

The fundamental reason that fscanf() is not very good for
interactive input is that much interactive input is line-oriented,
but fscanf() is very nearly oblivious to line boundaries. fgets()
can provide the line awareness and then sscanf() can perform the
parsing, with the knowledge that it's operating on a line and not
on a stream of input that crosses an arbitrary number of line
boundaries, possibly more or fewer than you were expecting.
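
The line-blindness is easy to see with sscanf(), which follows the same matching rules. In this small sketch (the wrapper name scan_two_ints is made up), the format cannot tell two numbers on one line from two numbers separated by several newlines, because %d skips any leading whitespace and a space in the format matches any run of whitespace.

```c
#include <stdio.h>

/* Hypothetical wrapper: try to read two ints from `text` and return the
 * number of successful conversions, exactly as scanf("%d %d", ...) would
 * report for the same characters arriving on stdin. */
int scan_two_ints(const char *text, int *a, int *b)
{
    return sscanf(text, "%d %d", a, b);
}
```

Called with "4 2\n" or with "4\n\n\n2\n", the function returns 2 and stores 4 and 2 both times, so a program prompting for one value per line has no way of noticing which of the two the user typed.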

It is *possible* to do interactive input with fscanf(),
just as it is *possible* to write full-fledged C programs without
for, do, while, and if. Nobody will forbid you to indulge in
self-imposed hardships if that's your pleasure, but many will
wonder why you insist on doing things the hard way.

--
Eric Sosman
es*****@acm-dot-org.invalid
Jul 19 '06 #14
iwinux wrote:
> Before I use scanf(), I must malloc the memory for it, like this:
>
> //Start
> char * buffer;
>
> buffer = malloc(20);
> scanf("%s", &buffer);
> //End
>
> As we know, if I type 30 characters in, something bad will happen.
> So, how can I solve this problem?

As with any problem, to solve it you must first understand the nature
of the problem. scanf() forces all destination variables to be
predeclared before the input starts. So using scanf itself is the
source of the problem. In general it's preferable to obtain the input
by some other method (an iterated fgets is possible, but hardly
ideal) then use *sscanf()* AFTER deciding on how much memory to malloc
for your destinations.

(Another problem is that more than likely you don't want scanf()'s
parsing semantics. Strings are terminated by white space with scanf()
for some inexplicable reason.)

> (I mean, no matter how many characters you type in, it can work well.)

Anyhow, first let's start with getting a full line of input safely (C
doesn't have any built-in provisions for doing this):

http://www.pobox.com/~qed/userInput.html

The key point being that using fgetstralloc(), you know the length of
the input and have the entire contents of the input in one shot (most
other programming languages have a built-in mechanism for doing this,
BTW). From there you can estimate the destination sizes, or use
strcspn() to help you parse before you figure out exactly how much
memory you need for your destination parameters, then use sscanf() or
whatever to extract the exact results.
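
The measure-then-allocate step could be sketched as follows, assuming the whole line is already in memory. The function name dup_first_word is hypothetical, and the final copy uses memcpy() rather than sscanf(), since the length is already known at that point.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a malloc'd copy of the first
 * whitespace-delimited word in `line` (the caller frees it), or NULL
 * if the line has no word or the allocation fails. */
char *dup_first_word(const char *line)
{
    static const char ws[] = " \t\r\n\v\f";
    size_t len;
    char *word;

    line += strspn(line, ws);        /* skip leading whitespace */
    len = strcspn(line, ws);         /* measure the word before allocating */
    if (len == 0)
        return NULL;

    word = malloc(len + 1);          /* exactly as much as is needed */
    if (word == NULL)
        return NULL;
    memcpy(word, line, len);
    word[len] = '\0';
    return word;
}
```

Because the destination is sized after the input has been seen, there is no fixed 20-byte buffer for a 30-character word to overflow.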

--
Paul Hsieh
http://www.pobox.com/~qed/
http://bstring.sf.net/

Jul 19 '06 #15
