Removing white spaces and tab characters

Hi,

I have some text like this:
"This is a message containing tabs and white spaces"

This text contains tabs and extra white space. I want to replace the tabs
and the repeated spaces between two words with a single space.

Is there a function in C that will find the tabs and white space
and return the text like this:

"This is a message containing tabs and white spaces"

Any help with the code is welcome.

Awaiting your responses.

Gopal Srinivasan
Nov 14 '05 #1
11 Replies


gopal srinivasan wrote:
[original question snipped]


No, you will have to write your own.
--
Thomas Matthews

C++ newsgroup welcome message:
http://www.slack.net/~shiva/welcome.txt
C++ Faq: http://www.parashift.com/c++-faq-lite
C Faq: http://www.eskimo.com/~scs/c-faq/top.html
alt.comp.lang.learn.c-c++ faq:
http://www.comeaucomputing.com/learn/faq/
Other sites:
http://www.josuttis.com -- C++ STL Library book

Nov 14 '05 #2

gopal srinivasan wrote on 03/09/04 :
[original question snipped]


No. A scanning loop and careful use of the isspace() function
(<ctype.h>) can help you write your own.
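A minimal sketch of such a scan loop, assuming the goal is to collapse every run of whitespace into one space, in place. The name squeeze_spaces is hypothetical. The (unsigned char) cast is the careful part: isspace() is only defined for values representable as unsigned char (or EOF), so passing a negative plain char is undefined behavior.

```c
#include <ctype.h>

/* Hypothetical helper: collapse every run of whitespace (as classified
   by isspace) in s into a single space, in place. Leading and trailing
   runs also become a single space; this sketch does not trim them. */
void squeeze_spaces(char *s)
{
    char *dst = s;      /* next position to write */
    int in_space = 0;   /* 1 while inside a whitespace run */

    for (const char *src = s; *src != '\0'; ++src) {
        /* cast: isspace() needs a value representable as unsigned char */
        if (isspace((unsigned char)*src)) {
            if (!in_space) {
                *dst++ = ' ';   /* first whitespace of a run -> one space */
                in_space = 1;
            }
        } else {
            *dst++ = *src;      /* ordinary character: copy it down */
            in_space = 0;
        }
    }
    *dst = '\0';
}
```

The write pointer never gets ahead of the read pointer, so working in place is safe.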

Post your code if you are stuck. We don't do homework (well, not
for free).

--
Emmanuel
The C-FAQ: http://www.eskimo.com/~scs/C-faq/faq.html
The C-library: http://www.dinkumware.com/refxc.html

"C is a sharp tool"

Nov 14 '05 #3

On Fri, 3 Sep 2004, gopal srinivasan wrote:
[original question snipped]


No, there is no standard C function to do this. It shouldn't be too hard
to write a function to do this. Give it a try and post your code if you
are having trouble.

--
Send e-mail to: darrell at cs dot toronto dot edu
Don't send e-mail to vi************@whitehouse.gov
Nov 14 '05 #4

gopal srinivasan wrote:

[original question snipped]


Write a filter. I suggest using getchar and putchar separated by
some minor code.
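One way such a filter could look, as a sketch. It is written as a function over two streams so it can be tested; the name squeeze_stream is an assumption. getc(stdin) and putc(c, stdout) behave like getchar() and putchar(c). Only spaces and tabs count as blanks, so newlines pass through and each line is squeezed separately.

```c
#include <stdio.h>

/* Hypothetical filter: copy in to out, replacing each run of
   spaces and tabs with a single space. Other characters,
   including newlines, are copied unchanged. */
void squeeze_stream(FILE *in, FILE *out)
{
    int c;              /* int, not char, so EOF can be detected */
    int in_blank = 0;   /* 1 while inside a run of spaces/tabs */

    while ((c = getc(in)) != EOF) {
        if (c == ' ' || c == '\t') {
            if (!in_blank)
                putc(' ', out);   /* first blank of a run -> one space */
            in_blank = 1;
        } else {
            putc(c, out);
            in_blank = 0;
        }
    }
}
```

Adding `int main(void) { squeeze_stream(stdin, stdout); return 0; }` turns it into a stdin-to-stdout filter.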

--
"A man who is right every time is not likely to do very much."
-- Francis Crick, co-discoverer of DNA
"There is nothing more amazing than stupidity in action."
-- Thomas Matthews
Nov 14 '05 #5


"gopal srinivasan" <go*********@gmail.com> wrote

[original question snipped]

Your prototype will be

/*
Strip out all whitespace except for one space between words.
Params: str - the string to strip. (Note: it will never be lengthened,
but may be shortened.)
*/
void onespace(char *str);

Basically, we strip leading and trailing whitespace, then go through the
string replacing every whitespace character other than a space (tabs,
newlines and so on) with a space, then replace each run of two or more
spaces with a single space. This last step requires memmove() to shift
the remainder of the string left by one character for each surplus space.
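The steps above might be coded like this. It is a straightforward sketch of onespace() using memmove(); apart from the prototype, every detail is an assumption, including treating every isspace() character (newlines too) as whitespace.

```c
#include <ctype.h>
#include <string.h>

/* Sketch of onespace(): simple, but O(n^2) in the worst case,
   because each memmove() in step 4 shifts the whole tail. */
void onespace(char *str)
{
    size_t len, i;

    /* 1. strip leading whitespace ('\0' is not a space, so this stops) */
    i = 0;
    while (isspace((unsigned char)str[i]))
        i++;
    memmove(str, str + i, strlen(str + i) + 1);

    /* 2. strip trailing whitespace */
    len = strlen(str);
    while (len > 0 && isspace((unsigned char)str[len - 1]))
        len--;
    str[len] = '\0';

    /* 3. turn tabs, newlines, etc. into plain spaces */
    for (i = 0; i < len; i++)
        if (isspace((unsigned char)str[i]))
            str[i] = ' ';

    /* 4. collapse runs of spaces, one surplus space at a time */
    for (i = 0; i + 1 < len; ) {
        if (str[i] == ' ' && str[i + 1] == ' ') {
            /* move str[i+1..len], including the '\0', left by one */
            memmove(str + i, str + i + 1, len - i);
            len--;
        } else {
            i++;
        }
    }
}
```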

Coding this really efficiently is harder, because the simple approach of
going through from start to finish and moving everything up each time
causes a lot of unnecessary copying on large input: each memmove() shifts
the whole tail of the string, so the work can grow with the square of the
length. The answer is to do one pass tagging spaces for deletion, perhaps
by replacing them with a newline (a character that you are changing anyway,
so it cannot appear in the final output), and then a second pass to
consolidate the fragments.
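A sketch of that two-pass idea, under the same assumptions about what counts as whitespace; onespace_two_pass is a hypothetical name. Pass 1 overwrites each whitespace character with either ' ' (the first one after a word) or the '\n' deletion tag. Pass 2 copies the surviving characters down with a read pointer and a write pointer, so each character moves at most once.

```c
#include <ctype.h>

/* Hypothetical two-pass variant of onespace(). '\n' is a safe tag
   because pass 1 rewrites every original whitespace character,
   newlines included, so any '\n' left afterwards is a tag. */
void onespace_two_pass(char *str)
{
    char *p, *dst;
    int after_word = 0;   /* 1 if the previous character was not whitespace */

    /* Pass 1: tag. Leading whitespace and all but the first
       whitespace character of each run become '\n'. */
    for (p = str; *p != '\0'; p++) {
        if (isspace((unsigned char)*p)) {
            *p = after_word ? ' ' : '\n';
            after_word = 0;
        } else {
            after_word = 1;
        }
    }

    /* Pass 2: consolidate, skipping tagged characters. */
    dst = str;
    for (p = str; *p != '\0'; p++)
        if (*p != '\n')
            *dst++ = *p;

    /* One space may survive after the last word; drop it. */
    if (dst > str && dst[-1] == ' ')
        dst--;
    *dst = '\0';
}
```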

Have a bash at it and post your attempt for improvement and comment.
Nov 14 '05 #6

Hi,
With the following code, tab characters can be removed whether they
appear once or more than once:

void main()
{

char source[] = "USA Tops Olympics tally";
char source1[50];
int i=0;
int tab_cnt=0;
while( (source1[i]=source[i]) != '\0')
{
if(source[i]=='\t')
{
tab_cnt++;
if(tab_cnt>=1)
{
source1[i]=' ';
}
}
i++;
}
printf("%s",source1);

}
Nov 14 '05 #7

On 6 Sep 2004 03:11:34 -0700, go*********@gmail.com (gopal srinivasan)
wrote:
[snip]
if(source[i]=='\t')
{
tab_cnt++;
if(tab_cnt>=1)
What purpose do these last two statements serve?
[snip]


<<Remove the del for email>>
Nov 14 '05 #8

go*********@gmail.com (gopal srinivasan) writes:
[code from post #7 snipped]


Please accept the following as *constructive* criticism.

main returns int, not void. Change "void main()" to "int main(void)".

Calling printf() with no prototype in scope invokes undefined
behavior; the best way to get a prototype is "#include <stdio.h>".

The length of the source1 array, 50, is an arbitrary magic number.
Change the value of source, and you get an overflow.

You use tabs in a string literal. That's legal, but inadvisable; I
can't tell by looking at the source whether those are tabs or blanks.
Try "USA\tTops\tOlympics\ttally".

The program does everything in main(). It would be far more useful to
write a function that takes a string and returns (a pointer to) the
string with the tabs replaced. (This raises questions about how to
return a string from a function; there are several ways to do it.) In
real life, you're probably going to want to process input from stdin
or from a file. This would also make it much easier to test your
program without recompiling it to change the input data; for example,
you could have easily tried it with sequences of multiple tabs.
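As one illustration of the "several ways" to return a string, here is a sketch that allocates a new string with malloc() and returns a pointer to it; the caller must free() it. The name squeeze_dup is hypothetical, and it follows the original problem statement (each run of spaces and tabs becomes one space). Other options include writing into a buffer supplied by the caller, as strcpy() does, or modifying the string in place.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a malloc'd copy of src in which each run
   of spaces and tabs is replaced by a single space, or NULL if the
   allocation fails. The caller must free() the result. */
char *squeeze_dup(const char *src)
{
    /* the result is never longer than the input */
    char *result = malloc(strlen(src) + 1);
    char *dst = result;
    int in_blank = 0;   /* 1 while inside a run of spaces/tabs */

    if (result == NULL)
        return NULL;

    for (; *src != '\0'; src++) {
        if (*src == ' ' || *src == '\t') {
            if (!in_blank)
                *dst++ = ' ';
            in_blank = 1;
        } else {
            *dst++ = *src;
            in_blank = 0;
        }
    }
    *dst = '\0';
    return result;
}
```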

You make no real use of tab_cnt; the program merely replaces each tab
with a space. A sequence of two tabs is replaced with two spaces. At
the point where you check whether tab_cnt>=1, that condition will
always be true; tab_cnt is initialized to 0, is incremented just
before the test, and is never set back to 0. I suspect you were
trying to replace each sequence of one or more tabs with a single
space; to do that, you'll need an index to track where you are in
source and another to track where you are in source1. BTW, naming
your variables "source" and "target" would have made for clearer code
than "source" and "source1".
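A sketch of the two-index approach, assuming the intent is to replace each run of one or more tabs with a single space. tabs_to_space is a hypothetical name; it uses the source/target naming suggested above, with i tracking the position in source and j the position in target.

```c
#include <stddef.h>

/* Hypothetical helper: copy source to target, replacing each run of
   one or more tabs with a single space. target must have room for
   strlen(source) + 1 characters. */
void tabs_to_space(const char *source, char *target)
{
    size_t i = 0;   /* position in source */
    size_t j = 0;   /* next free slot in target */

    while (source[i] != '\0') {
        if (source[i] == '\t') {
            target[j++] = ' ';
            /* skip the rest of this run of tabs */
            while (source[i] == '\t')
                i++;
        } else {
            target[j++] = source[i++];
        }
    }
    target[j] = '\0';
}
```

With `char source[] = "USA\tTops\tOlympics\ttally";` the caller can declare `char target[sizeof source];`. That is always large enough, because the output is never longer than the input, and it avoids the magic number 50.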

The program's output is not terminated by a newline. It's
implementation-defined whether the last line of output requires a
newline. On some systems, your program might print nothing at all.

You don't return a value from main() or call exit(). In C99, this is
equivalent to "return 0;", which is ok (or it would be if you had
declared main() correctly); in C90, it returns an undefined
termination status to the host environment.

Your indentation is questionable. You use tab characters for
indentation; it's better to use spaces instead, especially when
posting to Usenet. Assuming 8-column tab stops, many of us find
8-column indentation excessive, though that's not universal. The
entire while loop is indented; for consistency, the while(...) should
be directly under the "int tab_cnt=0;".

I was going to comment that replacing tabs by single spaces is less
useful than replacing tabs with multiple spaces, depending on the
current position and possibly on tab-stop settings, as the Unix
"expand" program does -- but looking upthread I see that what you
tried to do is based on the original problem statement (which was
yours). Note that what you originally asked about was replacing each
sequence of one or more spaces and tabs with a single space; your
program does nothing special with spaces in the input string.
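For comparison, a rough sketch of tab expansion in the style of expand: each tab becomes enough spaces to reach the next tab stop. It assumes fixed tab stops every tabwidth columns (expand's default is every 8 columns), counts every other character as one column, and ignores backspaces; expand_tabs is a hypothetical name.

```c
#include <stdio.h>

/* Hypothetical helper: copy in to out, replacing each tab with spaces
   up to the next multiple of tabwidth. Columns reset after '\n'. */
void expand_tabs(FILE *in, FILE *out, int tabwidth)
{
    int c;
    int col = 0;   /* current output column, 0-based */

    if (tabwidth <= 0)
        tabwidth = 8;   /* assumed fallback, avoids col % 0 */

    while ((c = getc(in)) != EOF) {
        if (c == '\t') {
            /* always emit at least one space, then pad to the stop */
            do {
                putc(' ', out);
                col++;
            } while (col % tabwidth != 0);
        } else {
            putc(c, out);
            col = (c == '\n') ? 0 : col + 1;
        }
    }
}
```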

Just for the sake of stating the problem more concretely, a Perl
solution would be:

perl -pe 's/[ \t]+/ /g'

Ignore this if you don't know Perl (and don't expect to derive a C
solution from it; standard C doesn't directly support regular expressions).
If you happen to have Perl on your system, you can use this to check
the results of your own C program.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.
Nov 14 '05 #9

This was just an initial posting; I was in a hurry. In fact, the best way
is to use pointers!

Rgds
Gopal
Nov 14 '05 #10

Keith,

Thanks for that detailed reply. I have taken note.

Rgds
Gopal
Nov 14 '05 #11

gopal srinivasan wrote:
[code from post #7 snipped]

[Code below not compiled or tested.]
#include <ctype.h>  /* isspace() */

void Trim_Whitespace(char * const text)
{
    char * source = text;
    char * destination = text;
    while (*source)  /* *source != '\0' */
    {
        /* Check for whitespace, use lib. function.
           isspace() needs an unsigned char value, hence the cast. */
        if (isspace((unsigned char) *source))
        {
            /* Copy over a single space first */
            *destination = ' ';
            ++source;
            ++destination;

            /* Span past extra whitespace */
            while (*source && isspace((unsigned char) *source))
            {
                ++source;
            }
        }
        else
        {
            /* Copy non-whitespace to destination */
            *destination = *source;
            ++source;
            ++destination;
        }
    } /* End: while */

    /* Copy over terminating null */
    *destination = *source;
    return;
}

Note that _expanding_ tabs to spaces is a
different function.

--
Thomas Matthews


Nov 14 '05 #12
