
delete comments in .c file

P: n/a
I want to delete all comments in .c file.

Size of .c file is very big.

Any good idea to do this?

Please show me example code.

Nov 13 '05 #1
39 Replies


P: n/a
> I want to delete all comments in .c file.

Size of .c file is very big.

Any good idea to do this?


Assuming you have C comments /* like this, right? */
You can use UNIX sed (stream editor). You're not going to believe this,
but: cat input.c | sed -e 's/\/\*.*\*\///g' > output.c

Do a diff to make sure it's working correctly.

--
Jem Berkes
http://www.sysdesign.ca/
Nov 13 '05 #2

P: n/a


Jem Berkes wrote:
I want to delete all comments in .c file.

Size of .c file is very big.

Any good idea to do this?

Assuming you have C comments /* like this, right? */
You can use UNIX sed (stream editor). You're not going to believe this,
but: cat input.c | sed -e 's/\/\*.*\*\///g' > output.c

Do a diff to make sure it's working correctly.


That would only work if all comments are in the form you describe, which
isn't likely. e.g. it won't work if the comments are:

/* Start of a fairly common form
* of comment block.
*/

or:

/* set x: */ x = 7; /* now more stuff... */

In the first case it won't delete the comment while in the second
unusual case it'll delete both the comments plus the "x = 7;" assignment
between them.

You need to Google around or if you want a UNIX tool solution, post this
to a UNIX NG (e.g. comp.unix.questions or comp.unix.shell).

Ed.

Nov 13 '05 #3

P: n/a
Jem Berkes <je*@users.pc9__org> writes:
I want to delete all comments in .c file.

Size of .c file is very big.

Any good idea to do this?


Assuming you have C comments /* like this, right? */
You can use UNIX sed (stream editor). You're not going to believe this,
but: cat input.c | sed -e 's/\/\*.*\*\///g' > output.c

Do a diff to make sure it's working correctly.


That won't detect multi-line comments. It also fails to properly
ignore comment delimiters inside string and character literals. It
deletes everything from the first "/*" on a line to the last "*/" on
the same line; for example, it transforms this:
x = /* one comment */ 42; /* another comment */
to this:
x =
And it replaces each comment by nothing rather than by a blank, so the
following valid C fragment:
x = sizeof/*comment*/int;
is replaced with this:
x = sizeofint;

If you're dealing with C99 code you'll have to worry about "//"
comments (many pre-C99 compilers support these as an extension).

Stripping C comments is a lot more complex than it looks; you almost
have to duplicate most of the functionality of the preprocessor to get
it right.
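
To make the two one-line failure cases above concrete, here is a minimal C sketch (not taken from any post in this thread). It replaces each /* ... */ comment with exactly one space and stops at the first closing delimiter rather than the last one on the line. The name strip_block_comments and the caller-supplied output buffer are assumptions for illustration. It deliberately ignores string and character literals, // comments and line splicing, so it is far from a complete solution.

```c
#include <assert.h>
#include <string.h>   /* strcmp, only needed by callers that compare results */

/* Hypothetical helper: copy src to dst, replacing each block comment
   with one space. dst needs room for strlen(src) + 1 bytes, because
   the output is never longer than the input. */
void strip_block_comments(const char *src, char *dst)
{
    assert(src != NULL && dst != NULL);
    while (*src != '\0') {
        if (src[0] == '/' && src[1] == '*') {
            src += 2;                       /* skip the opening delimiter */
            while (*src != '\0' && !(src[0] == '*' && src[1] == '/'))
                src++;                      /* skip the comment body */
            if (*src != '\0')
                src += 2;                   /* skip the closing delimiter */
            *dst++ = ' ';                   /* the comment becomes one space */
        } else {
            *dst++ = *src++;                /* ordinary character */
        }
    }
    *dst = '\0';
}
```

With this, "x = sizeof/*comment*/int;" becomes "x = sizeof int;", and the 42 between the two comments on the other line survives.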

I really have to ask the original poster: why do you want to do this?

--
Keith Thompson (The_Other_Keith) ks*@cts.com <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://www.sdsc.edu/~kst>
Schroedinger does Shakespeare: "To be *and* not to be"
Nov 13 '05 #4

P: n/a
"Timex" <su********@hotmail.com> wrote:
# I want to delete all comments in .c file.
#
# Size of .c file is very big.
#
# Any good idea to do this?
#
# Please show me example code.

tclsh <<':eof'
set c [open something.c]
set b [read $c]
close $c

regsub -all {//[^\n]*\n} $b \n b
regsub -all {/[*]([^*]|[*](?!/))*[*]/} $b {} b

set c [open something-1.c w]
puts $c $b
close $c
:eof

--
Derk Gwen http://derkgwen.250free.com/html/index.html
The whole world's against us.
Nov 13 '05 #5

P: n/a
Derk Gwen <de******@HotPOP.com> writes:
"Timex" <su********@hotmail.com> wrote:
# I want to delete all comments in .c file.
#
# Size of .c file is very big.
#
# Any good idea to do this?
#
# Please show me example code.

tclsh <<':eof'
set c [open something.c]
set b [read $c]
close $c

regsub -all {//[^\n]*\n} $b \n b
regsub -all {/[*]([^*]|[*](?!/))*[*]/} $b {} b

set c [open something-1.c w]
puts $c $b
close $c
:eof


This seems to add an extra blank line at the end of the output file.
It transforms "token/**/pasting" to "tokenpasting", which doesn't
violate the original poster's requirements, but it doesn't match the
way comments are treated in C.

It also doesn't ignore comment delimiters in string and character
literals.
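
As an illustration of what "ignoring comment delimiters in literals" involves, here is a rough C sketch, not from the thread. It tracks whether the scanner is inside a string or character literal and copies escape sequences as a pair, so that an escaped quote or a literal ending in an escaped backslash does not confuse it. The function name and the in-memory interface are assumptions. // comments, line splicing and trigraphs are left out on purpose.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: strip block comments from src into dst, but copy
   string and character literals through untouched. dst needs room for
   strlen(src) + 1 bytes. */
void strip_comments_keep_literals(const char *src, char *dst)
{
    char quote = '\0';   /* '"' or '\'' while inside a literal, else 0 */

    assert(src != NULL && dst != NULL);
    while (*src != '\0') {
        if (quote != '\0') {
            if (*src == '\\' && src[1] != '\0') {
                *dst++ = *src++;        /* copy the backslash ... */
                *dst++ = *src++;        /* ... and the escaped character */
                continue;
            }
            if (*src == quote)
                quote = '\0';           /* closing quote ends the literal */
            *dst++ = *src++;
        } else if (*src == '"' || *src == '\'') {
            quote = *src;               /* opening quote starts a literal */
            *dst++ = *src++;
        } else if (src[0] == '/' && src[1] == '*') {
            const char *end = strstr(src + 2, "*/");
            src = end ? end + 2 : src + strlen(src);
            *dst++ = ' ';               /* the comment becomes one space */
        } else {
            *dst++ = *src++;
        }
    }
    *dst = '\0';
}
```

Consuming the backslash together with the character after it is a design choice: a test of the form "previous character is not a backslash" misreads a literal such as "\\", where the closing quote really does follow a backslash.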

--
Keith Thompson (The_Other_Keith) ks*@cts.com <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://www.sdsc.edu/~kst>
Schroedinger does Shakespeare: "To be *and* not to be"
Nov 13 '05 #6

P: n/a
Assuming you have access to the C preprocessor (cpp) you can do a quick hack
to use that. Isolate the file from its include files and run it like this:

cpp -nostdinc file.c > new_file.c

It should warn that it cannot find all the include files. Now put your
#includes back in and you should have a comment free file.

This is a quick and dirty solution because the C preprocessor is trying to
do a whole load of stuff as well as stripping comments. You need to make
sure that there are no #defines because otherwise they will be expanded out.
Put them back in after you have run the preprocessor.

I have only come across the C preprocessor as a separate program called
'cpp' on *nix boxes. It may be available on Windows but I do not know where
or how.
Colin.

"Timex" <su********@hotmail.com> wrote in message
news:bn**********@news.kreonet.re.kr...
I want to delete all comments in .c file.

Size of .c file is very big.

Any good idea to do this?

Please show me example code.

Nov 13 '05 #7

P: n/a
Timex wrote:

I want to delete all comments in .c file.

Size of .c file is very big.

Any good idea to do this?

Please show me example code.


This will replace each /* ... */ style comment with a single space:

#include<stdio.h>
int main(void){i\
nt c,p=-1,k=0,s=0
;while((c=getchar
())!=EOF){if(s==0
){if(p=='/'&&c==
'*'){s=1;k=2;}el\
se if(c=='\"'&&p
!='\\'&&p!='\'')s
=2;}else if(s==1)
{if (p=='*'&&c==
'/')s=0;}else if(
s==2){if(c=='\"'
&&p!='\\')s=0;}if
(k==1)putchar(' '
);if(p>0&&s!=1){
if(!k)putchar(p);
if(--k<0)k=0;}p=c
;}putchar(p);ret\
urn 0;}

--
Tim Hagan
Nov 13 '05 #8

P: n/a
Tim Hagan wrote:
This will replace each /* ... */ style comment with a single space:

#include<stdio.h>
int main(void){i\
nt c,p=-1,k=0,s=0
;while((c=getchar
())!=EOF){if(s==0
){if(p=='/'&&c==
'*'){s=1;k=2;}el\
se if(c=='\"'&&p
!='\\'&&p!='\'')s
=2;}else if(s==1)
{if (p=='*'&&c==
'/')s=0;}else if(
s==2){if(c=='\"'
&&p!='\\')s=0;}if
(k==1)putchar(' '
);if(p>0&&s!=1){
if(!k)putchar(p);
if(--k<0)k=0;}p=c
;}putchar(p);ret\
urn 0;}

It doesn't handle line-splicing. Also, putchar(-1) is not portable.

Jeremy.
Nov 13 '05 #9

P: n/a
> I want to delete all comments in .c file.

Size of .c file is very big.


You work for SCO's Linux division, don't you?
Nov 13 '05 #10

P: n/a
Jeremy Yallop wrote:

Tim Hagan wrote:
This will replace each /* ... */ style comment with a single space:

#include<stdio.h>
int main(void){i\
nt c,p=-1,k=0,s=0
;while((c=getchar
())!=EOF){if(s==0
){if(p=='/'&&c==
'*'){s=1;k=2;}el\
se if(c=='\"'&&p
!='\\'&&p!='\'')s
=2;}else if(s==1)
{if (p=='*'&&c==
'/')s=0;}else if(
s==2){if(c=='\"'
&&p!='\\')s=0;}if
(k==1)putchar(' '
);if(p>0&&s!=1){
if(!k)putchar(p);
if(--k<0)k=0;}p=c
;}putchar(p);ret\
urn 0;}


It doesn't handle line-splicing. Also, putchar(-1) is not portable.


putchar(-1) is never executed in the above code, but you're right
about the line-splicing. Oh, well, back to the drawing board ...

--
Tim Hagan
Nov 13 '05 #11

P: n/a
>> I want to delete all comments in .c file.

Size of .c file is very big.


You work for SCO's Linux division, don't you?


<GRIN>

Nov 13 '05 #12

P: n/a
Tim Hagan wrote:
Jeremy Yallop wrote:

Tim Hagan wrote:
> This will replace each /* ... */ style comment with a single space:
>
> #include<stdio.h>
> int main(void){i\
> nt c,p=-1,k=0,s=0
> ;while((c=getchar
> ())!=EOF){if(s==0
> ){if(p=='/'&&c==
> '*'){s=1;k=2;}el\
> se if(c=='\"'&&p
> !='\\'&&p!='\'')s
>=2;}else if(s==1)
> {if (p=='*'&&c==
> '/')s=0;}else if(
> s==2){if(c=='\"'
> &&p!='\\')s=0;}if
> (k==1)putchar(' '
> );if(p>0&&s!=1){
> if(!k)putchar(p);
> if(--k<0)k=0;}p=c
> ;}putchar(p);ret\
> urn 0;}


It doesn't handle line-splicing. Also, putchar(-1) is not portable.


putchar(-1) is never executed in the above code


putchar(-1) is executed if EOF is encountered immediately.

Jeremy.
Nov 13 '05 #13

P: n/a
Jeremy Yallop <je****@jdyallop.freeserve.co.uk> spoke thus:
putchar(-1) is executed if EOF is encountered immediately.


Why would one want to write putchar(-1) anyway...?

--
Christopher Benson-Manica | I *should* know what I'm talking about - if I
ataru(at)cyberspace.org | don't, I need to know. Flames welcome.
Nov 13 '05 #14

P: n/a
"Timex" <su********@hotmail.com> wrote in
news:bn**********@news.kreonet.re.kr:
I want to delete all comments in .c file.

Size of .c file is very big.

Any good idea to do this?

Please show me example code.


Just grab an evaluation copy of Codewright from Borland. Do a
search-replace on . <-- regexp and restrict "to comments" with a
replacement string of nothing. I just did it to a very large file in about
5 seconds. Now what does this have to do with the C language? The C
language does not specify how to delete comments.

--
- Mark ->
--
Nov 13 '05 #15

P: n/a
Jeremy Yallop wrote:

Tim Hagan wrote:
Jeremy Yallop wrote:

Tim Hagan wrote:
> This will replace each /* ... */ style comment with a single space:
>
> #include<stdio.h>
> int main(void){i\
> nt c,p=-1,k=0,s=0
> ;while((c=getchar
> ())!=EOF){if(s==0
> ){if(p=='/'&&c==
> '*'){s=1;k=2;}el\
> se if(c=='\"'&&p
> !='\\'&&p!='\'')s
>=2;}else if(s==1)
> {if (p=='*'&&c==
> '/')s=0;}else if(
> s==2){if(c=='\"'
> &&p!='\\')s=0;}if
> (k==1)putchar(' '
> );if(p>0&&s!=1){
> if(!k)putchar(p);
> if(--k<0)k=0;}p=c
> ;}putchar(p);ret\
> urn 0;}

It doesn't handle line-splicing. Also, putchar(-1) is not portable.
putchar(-1) is never executed in the above code


.... unless one tries to remove the comments from an empty file. :-)
putchar(-1) is executed if EOF is encountered immediately.


So just insert 'if (p > 0)' before the final putchar.

--
Tim Hagan
Nov 13 '05 #16

P: n/a
Mark A. Odell wrote:
Now what does this have to do with the C language? The C
language does not specify how to delete comments.


The Standard says: "Each comment is replaced by one space character." If
that doesn't specify how to delete comments, I don't know what does.

--
Richard Heathfield : bi****@eton.powernet.co.uk
"Usenet is a strange place." - Dennis M Ritchie, 29 July 1999.
C FAQ: http://www.eskimo.com/~scs/C-faq/top.html
K&R answers, C books, etc: http://users.powernet.co.uk/eton
Nov 13 '05 #17

P: n/a
On Fri, 31 Oct 2003 22:47:18 +0000 (UTC), in comp.lang.c , Richard
Heathfield <do******@address.co.uk.invalid> wrote:
Mark A. Odell wrote:
Now what does this have to do with the C language? The C
language does not specify how to delete comments.


The Standard says: "Each comment is replaced by one space character." If
that doesn't specify how to delete comments, I don't know what does.


I guess Mark's point is that to be sure of being syntactically
identical to the original you should replace all comments by a space.
For instance
int i = 23/* */12;
should still generate a syntax error. :-)

What the OP would expect it to do with
double i = 32 //* */ 4
;
is anyone's guess. I guess you'd have to have a C99 and a C89 mode.
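
A small C sketch of such a mode switch (an illustration only, not code from this thread): the flag allow_eol is a made-up parameter that turns recognition of // comments on or off. Literals, line splicing and backslash-continued // comments are ignored to keep it short.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: strip comments from src into dst, replacing each
   one with a single space. allow_eol != 0 also treats // as the start of
   a comment (C99 mode); allow_eol == 0 gives C89 behaviour.
   dst needs room for strlen(src) + 1 bytes. */
void strip_with_mode(const char *src, char *dst, int allow_eol)
{
    assert(src != NULL && dst != NULL);
    while (*src != '\0') {
        if (allow_eol && src[0] == '/' && src[1] == '/') {
            while (*src != '\0' && *src != '\n')
                src++;                  /* drop up to, not including, newline */
            *dst++ = ' ';
        } else if (src[0] == '/' && src[1] == '*') {
            const char *end = strstr(src + 2, "*/");
            src = end ? end + 2 : src + strlen(src);
            *dst++ = ' ';
        } else {
            *dst++ = *src++;
        }
    }
    *dst = '\0';
}
```

Fed the two-line example above, C89 mode leaves a division (32 / 4) and C99 mode leaves just 32, so the same source text means two different things. In both modes "23/* */12" turns into "23 12" and stays a syntax error.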
--
Mark McIntyre
CLC FAQ <http://www.eskimo.com/~scs/C-faq/top.html>
CLC readme: <http://www.angelfire.com/ms3/bchambless0/welcome_to_clc.html>
Nov 13 '05 #18

P: n/a
Here's a perl script which will handle *MOST* sane C code...

Some things that it will miss (scan manually for them first):

a double quote inside of single quotes (e.g.)
char confusion = '"';

C-99 // comments like this

I'm sure that some people can come up with other convoluted counter-examples.

It reads and plays with the entire file, so it will need to hold
at least two or three copies of it in RAM. (for today's computers,
that would be some number of megabytes).

If you want any of the above fixed, feel free to send me a cheque.
__________________________________________________ __
#!/usr/bin/perl
$s=join("",<>);
# printf "[[%s]]\n\n",$s;
$s=~ s/("(\\\\|\\"|[^"])*")|(\/\*([^*]|\*(?=[^\/]))*\*\/)|(\/\/.*)/[[$1 ]]/g;
printf "[[%s]]\n\n",$s;
__________________________________________________ __
Yep, That's it... 5 lines including the shell header.

Timex wrote:
I want to delete all comments in .c file.

Size of .c file is very big.

Any good idea to do this?

Please show me example code.

--
Stephen Samuel +1(604)876-0426 sa****@bcgreen.com
http://www.bcgreen.com/~samuel/
Powerful committed communication. Transformation touching
the jewel within each person and bringing it to light.
Nov 13 '05 #19

P: n/a
Stephen Samuel <st************@telus.net> wrote:
Here's a perl script which will handle *MOST* sane C code...

<snip>

Since when is perl topical in c.l.c?

BTW:
Does your "solution" account for comment delimiters inside string
literals? (I'm unfortunately unable to decrypt the line-noise
provided.)

Regards
--
Irrwahn
(ir*******@freenet.de)
Nov 13 '05 #20

P: n/a
*** rude top-posting fixed ***

Stephen Samuel wrote:
Timex wrote:
I want to delete all comments in .c file.

Size of .c file is very big.

Any good idea to do this?

Please show me example code.


Here's a perl script which will handle *MOST* sane C code...

Some things that it will miss (scan manually for them first):

a double quote inside of single quotes (e.g.)
char confusion = '"';

C-99 // comments like this

I'm sure that some people can come up with other convoluted
counter-examples.

It reads and plays with the entire file, so it will need to
hold at least two or three copies of it in RAM. (for today's
computers, that would be some number of megabytes).

If you want any of the above fixed, feel free to send me a
cheque.
__________________________________________________ __
#!/usr/bin/perl
$s=join("",<>);
# printf "[[%s]]\n\n",$s;
$s=~ s/("(\\\\|\\"|[^"])*")|(\/\*([^*]|\*(?=[^\/]))*\*\/)|(\/\/.*)/[[$1 ]]/g;
printf "[[%s]]\n\n",$s;
__________________________________________________ __
Yep, That's it... 5 lines including the shell header.


Please do not top-post.

The following AFAIK does not have the above faults, and does not
need to store any file copies, in fact not even any line copies.
It will probably be at least an order of magnitude faster.

/* File uncmntc.c - demo of a text filter
Strips C comments. Tested to strip itself
by C.B. Falconer. 2002-08-15
Public Domain. Attribution appreciated
report bugs to <mailto:cb********@worldnet.att.net>
*/

/* With gcc3.1, must omit -ansi to compile eol comments */

#include <stdio.h>
#include <stdlib.h>

static int ch, lastch;

/* ---------------- */

static void putlast(void)
{
    if (0 != lastch) fputc(lastch, stdout);
    lastch = ch;
    ch = 0;
} /* putlast */

/* ---------------- */

/* gobble chars until star slash appears */
static int stdcomment(void)
{
    int ch, lastch;

    ch = 0;
    do {
        lastch = ch;
        if (EOF == (ch = fgetc(stdin))) return EOF;
    } while (!(('*' == lastch) && ('/' == ch)));
    return ch;
} /* stdcomment */

/* ---------------- */

/* gobble chars until EOLine or EOF. i.e. // comments */
static int eolcomment(void)
{
    int ch, lastch;

    ch = '\0';
    do {
        lastch = ch;
        if (EOF == (ch = fgetc(stdin))) return EOF;
    } while (!(('\n' == ch) && ('\\' != lastch)));
    return ch;
} /* eolcomment */

/* ---------------- */

/* echo chars until '"' or EOF */
static int echostring(void)
{
    putlast();
    if (EOF == (ch = fgetc(stdin))) return EOF;
    do {
        putlast();
        if (EOF == (ch = fgetc(stdin))) return EOF;
    } while (!(('"' == ch) && ('\\' != lastch)));
    return ch;
} /* echostring */

/* ---------------- */

int main(void)
{
    lastch = '\0';
    while (EOF != (ch = fgetc(stdin))) {
        if ('/' == lastch)
            if (ch == '*') {
                lastch = '\0';
                if (EOF == stdcomment()) break;
                ch = ' ';
                putlast();
            }
            else if (ch == '/') {
                lastch = '\0';
                if (EOF == eolcomment()) break;
                ch = '\n';
                putlast(); // Eolcomment here
                // Eolcomment line \
                with continuation line.
            }
            else {
                putlast();
            }
        else if (('"' == ch) && ('\\' != lastch)
                             && ('\'' != lastch)) {
            if ('"' != (ch = echostring())) {
                fputs("\"Unterminated\" string\n", stderr);
                fputs("checking for\
continuation line string\n", stderr);
                fputs("checking for" "concat string\n", stderr);
                return EXIT_FAILURE;
            }
            putlast();
        }
        else {
            putlast();
        }
    } /* while */
    putlast(/* embedded comment */);
    return 0;
} /* main */
--
Chuck F (cb********@yahoo.com) (cb********@worldnet.att.net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net> USE worldnet address!
Nov 13 '05 #21

P: n/a


Timex wrote:
I want to delete all comments in .c file.

Size of .c file is very big.

Any good idea to do this?

Please show me example code.


Try "ncsl": http://www.lucentssg.com/displayProduct.cfm?prodid=33
It strips all comments and indentation so just run an indenter (e.g.
"indent") or a C beautifier (e.g. "cb" - google for "cb download
beautifier" and take your pick) on the output to get it back in readable
format. Disclaimer - I've never used this specific download of "ncsl",
I've just used the version provided on UNIX boxes within Lucent.

Ed.

Nov 13 '05 #22

P: n/a
Here's a perl script which will handle *MOST* sane C code...

Some things that it will miss (scan manually for them first):

a double quote inside of single quotes (e.g.)
char confusion = '"';

C-99 // comments like this

I'm sure that some people can come up with other convoluted counter-examples.

It reads and plays with the entire file, so it will need to hold
at least two or three copies of it in RAM. (for today's computers,
that would be some number of megabytes).

If you want any of the above fixed, feel free to send me a cheque.
__________________________________________________ __
#!/usr/bin/perl
$s=join("",<>);
# printf "[[%s]]\n\n",$s;
$s=~ s/("(\\\\|\\"|[^"])*")|(\/\*([^*]|\*(?=[^\/]))*\*\/)|(\/\/.*)/[[$1 ]]/g;
printf "[[%s]]\n\n",$s;
__________________________________________________ __
Yep, That's it... 5 lines including the shell header.

One bug: Quoted strings have a space inserted after them.
Again: fixable, but not worth the trouble for free.

Timex wrote:
I want to delete all comments in .c file.

Size of .c file is very big.

Any good idea to do this?

Please show me example code.

--
Stephen Samuel +1(604)876-0426 sa****@bcgreen.com
http://www.bcgreen.com/~samuel/
Powerful committed communication. Transformation touching
the jewel within each person and bringing it to light.
Nov 13 '05 #23

P: n/a
Irrwahn Grausewitz wrote:
> Stephen Samuel <st************@telus.net> wrote:
>
>> Here's a perl script which will handle *MOST* sane C code...
>
> <snip>
>
> Since when is perl topical in c.l.c?

It's a C solution .. But Perl is written in C, so if you like,
I can just
#include <perl-source.c>

> BTW:
> Does your "solution" account for comment delimiters inside string
> literals? (I'm unfortunately unable to decrypt the line-noise
> provided.)


Yes. It accounts for comment delimiters in quotes and quote
delimiters in comments (One side effect is that double quote
strings have a space added after them. Given the way that I
wrote it, it was a choice between that, replacing comments with
Nothing (possible to cause syntax errors) or added complexity.)

It also handles quoted double-quotes inside of strings.

It does NOT handle double-quote or comment-start delimiters inside
of single-quotes (char literals), but that would be easy enough to add.
--
Stephen Samuel +1(604)876-0426 sa****@bcgreen.com
http://www.bcgreen.com/~samuel/
Powerful committed communication. Transformation touching
the jewel within each person and bringing it to light.
Nov 13 '05 #25

P: n/a
Stephen Samuel <st************@telus.net> scribbled the following:
Irrwahn Grausewitz wrote:
Stephen Samuel <st************@telus.net> wrote:
Here's a perl script which will handle *MOST* sane C code...


<snip>

Since when is perl topical in c.l.c?

It's a C solution .. But Perl is written in C, so if you like,
I can just
#include <perl-source.c>


Are Perl implementations *required* to be written in C? And are
Perl implementations *required* to ship with the source code?

--
/-- Joona Palaste (pa*****@cc.helsinki.fi) ------------- Finland --------\
\-- http://www.helsinki.fi/~palaste --------------------- rules! --------/
"'So called' means: 'There is a long explanation for this, but I have no
time to explain it here.'"
- JIPsoft
Nov 13 '05 #26

P: n/a
CBFalconer wrote:
*** rude top-posting fixed ***

Hmm.. This must be a relatively recent addition to usenet
etiquette (i.e. in the last decade or so).

Apologies. I'm an old fogey, and it's probably been a decade
since I've posted here.

--
Stephen Samuel +1(604)876-0426 sa****@bcgreen.com
http://www.bcgreen.com/~samuel/
Powerful committed communication. Transformation touching
the jewel within each person and bringing it to light.
Nov 13 '05 #27

P: n/a
"Stephen Samuel" wrote:
> Irrwahn Grausewitz wrote:
>> Since when is perl topical in c.l.c?
>
> It's a C solution

Err, no.

> .. But Perl is written in C, so if you like,
> I can just
> #include <perl-source.c>

Non-standard header file. ;-)

>> Does your "solution" account for comment delimiters inside string
>> literals?
>
> Yes.

Nice.

> It accounts for comment delimiters in quotes and quote
> delimiters in comments (One side effect is that double quote
> strings have a space added after them. Given the way that I
> wrote it, it was a choice between that, replacing comments with
> Nothing (possible to cause syntax errors) or added complexity.)

Hm. AFAICT that shouldn't cause much trouble, OK.

> It also handles quoted double-quotes inside of strings.

ITYM something like "\""?

> It does NOT handle double-quote or comment-start delimiters inside
> of single-quotes (char literals), but that would be easy enough to
> add.

Fair enough.
But there may still be "strange" cases where your script fails.
Consider:

/* gotcha! *\
/

A C preprocessor would have deleted the <backslash><new-line> sequence
in translation phase 2 *before* the tokenization and comment replacement
takes place in phase 3. And if the backslash is written as a trigraph
sequence we need to "fake" translation phase 1 as well... :-(

Admittedly, these are rare situations, but you see: sophisticated
comment replacement in C files isn't /that/ easy after all, you have to
provide quite an amount of preprocessor functionality to get it right.
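
To show what "faking" phases 1 and 2 could look like, here is a small C sketch that is not from the thread. It replaces the nine trigraph sequences and then deletes every backslash-newline pair, working in place on a string. The names trigraph and splice_lines are made up for this example, and a real filter would work on a stream and would have to decide whether the spliced lines should be restored in the output.

```c
#include <assert.h>
#include <string.h>

/* Phase 1 (hypothetical helper): given the third character of a
   candidate trigraph, return its replacement, or 0 if there is none. */
static char trigraph(char c)
{
    static const char from[] = "=/'()!<>-";
    static const char to[]   = "#\\^[]|{}~";
    const char *p = (c != '\0') ? strchr(from, c) : NULL;
    return p ? to[p - from] : '\0';
}

/* Apply phases 1 and 2 in place: trigraph replacement first, then
   deletion of each backslash that is immediately followed by newline. */
void splice_lines(char *s)
{
    char *in = s, *out = s;

    assert(s != NULL);
    while (*in != '\0') {
        char c = *in++;
        if (c == '?' && in[0] == '?' && trigraph(in[1]) != '\0') {
            c = trigraph(in[1]);    /* two question marks plus one char */
            in += 2;
        }
        if (c == '\\' && *in == '\n') {
            in++;                   /* drop the backslash and the newline */
            continue;
        }
        *out++ = c;
    }
    *out = '\0';
}
```

After this pass the "gotcha" example above reads as an ordinary /* gotcha! */ comment, and a comment stripper run on the result can find the closing delimiter. The cost is that the output no longer has the same line structure as the input.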

Best Regards
--
Irrwahn

PS: Please don't email me if you already posted
your reply to the newsgroup; thank you.
Nov 13 '05 #28

P: n/a
Joona I Palaste <pa*****@cc.helsinki.fi> writes:
[...]
Are Perl implementations *required* to be written in C? And are
Perl implementations *required* to ship with the source code?


<OT>
Perl is pretty much defined by its implementation, not by a language
standard. The implementation (there's basically only one) is written
in C. It's distributed under one of two open source licenses, both of
which require the source to be available (but not necessarily shipped
with the binaries).

This is probably incorrect in some minor details. If I had posted to
a more appropriate newsgroup, someone would jump in and correct me.
</OT>

--
Keith Thompson (The_Other_Keith) ks*@cts.com <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://www.sdsc.edu/~kst>
Schroedinger does Shakespeare: "To be *and* not to be"
Nov 13 '05 #29

P: n/a
Keith Thompson <ks*@cts.com> scribbled the following:
> Joona I Palaste <pa*****@cc.helsinki.fi> writes:
> [...]
>> Are Perl implementations *required* to be written in C? And are
>> Perl implementations *required* to ship with the source code?
>
> <OT>
> Perl is pretty much defined by its implementation, not by a language
> standard. The implementation (there's basically only one) is written
> in C. It's distributed under one of two open source licenses, both of
> which require the source to be available (but not necessarily shipped
> with the binaries).
>
> This is probably incorrect in some minor details. If I had posted to
> a more appropriate newsgroup, someone would jump in and correct me.
> </OT>

OK, I have to concede that, but Samuel's answer still wasn't
sufficient. Writing #include <perl_source.h> at the top of the Perl
file will change the program into a mix-and-match of C and Perl,
which will not compile as either language.

--
/-- Joona Palaste (pa*****@cc.helsinki.fi) ------------- Finland --------\
\-- http://www.helsinki.fi/~palaste --------------------- rules! --------/
"Roses are red, violets are blue, I'm a schitzophrenic and so am I."
- Bob Wiley
Nov 13 '05 #30

P: n/a
Irrwahn Grausewitz <ir*******@freenet.de> writes:
Stephen Samuel <st************@telus.net> wrote:
Here's a perl script which will handle *MOST* sane C code...

<snip>

Since when is perl topical in c.l.c?


This is an interesting edge case with respect to topicality. One could
argue that we're talking *about* C (which is clearly topical), but
we're using a mixture of Perl and English to discuss it. Think of the
Perl regular expression as a description of how to strip comments from
C source code.

On the other hand, not everyone here can be expected to speak Perl
regexps fluently.

--
Keith Thompson (The_Other_Keith) ks*@cts.com <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://www.sdsc.edu/~kst>
Schroedinger does Shakespeare: "To be *and* not to be"
Nov 13 '05 #31

P: n/a
Stephen Samuel <st************@telus.net> wrote:
> CBFalconer wrote:
>> *** rude top-posting fixed ***
>
> Hmm.. This must be a relatively recent addition to usenet
> etiquette (i.e. in the last decade or so).

It's a convention in comp.lang.c (and several other technical
newsgroups) to place your comments after the part of the original
post you are responding to, in order to retain context. Thus
top-posting is discouraged in c.l.c.

> Apologies. I'm an old fogey, and it's probably been a decade
> since I've posted here.

Again, please do not send email copies of your replies; thank you.

Regards
--
Irrwahn
(ir*******@freenet.de)
Nov 13 '05 #32

P: n/a
Keith Thompson <ks*@cts.com> wrote:
> Irrwahn Grausewitz <ir*******@freenet.de> writes:
>
>> Since when is perl topical in c.l.c?
>
> This is an interesting edge case with respect to topicality. One could
> argue that we're talking *about* C (which is clearly topical), but
> we're using a mixture of Perl and English to discuss it. Think of the
> Perl regular expression as a description of how to strip comments from
> C source code.

That would make any solution to manipulate C sources implemented in
any language other than C topical in c.l.c. IMHO that would not be a
Good Thing[tm].

> On the other hand, not everyone here can be expected to speak Perl
> regexps fluently.

Indeed.

Regards
--
Irrwahn
(ir*******@freenet.de)
Nov 13 '05 #33

P: n/a

On Sun, 2 Nov 2003, CBFalconer wrote:
Stephen Samuel wrote:
Timex wrote:

I want to delete all comments in .c file.
#!/usr/bin/perl
$s=join("",<>);
# printf "[[%s]]\n\n",$s;
$s=~ s/("(\\\\|\\"|[^"])*")|(\/\*([^*]|\*(?=[^\/]))*\*\/)|(\/\/.*)/[[$1 ]]/g;
printf "[[%s]]\n\n",$s;

/* File uncmntc.c - demo of a text filter
Strips C comments. Tested to strip itself
by C.B. Falconer. 2002-08-15
Public Domain. Attribution appreciated
report bugs to <mailto:cb********@worldnet.att.net>
*/

<snip code>

I ran your program through some hurdles, and found that
it couldn't handle multibyte character constants for some
reason. I didn't bother to track down why; I just re-wrote
the filter from scratch. ;-) Here's my version, whose
algorithm may be completely different from yours.
This algorithm, on the other hand, completely fails to
handle line-splicing in the middle of comment delimiters: /\
* this is a comment */ does not work, nor does /* this either *\
/. Comment removal really is tricky in the most general case!
Proper error-checking on getc() and putc(), and a good
command-line interface, left as exercises for the interested
reader.
/* File uncmntc2.c - demo of a different text filter
Strips C comments. Tested to strip itself
Improves on CBFalconer's design by correctly handling '/*'
and by having a C89/C99 switch, but doesn't handle the /\
* delimiter correctly.
by Arthur O'Dwyer, 2002-11-03
Public Domain. Attribution appreciated
don't bother reporting bugs, just fix 'em...
*/

#include <stdio.h>
#include <stdlib.h>

/* Strip C99-style end-of-line comments? */
int AllowEOLComments = 1;

int strip_comments(FILE *fp, FILE *outfp);
static int put_carefully(int lastch, int ch, FILE *outfp);

int main(void)
{
    strip_comments(stdin, stdout);
    return 0;
}

int strip_comments(FILE *fp, FILE *outfp)
{
    int ch;
    int lastch;
    int inchotes = 0;
    int inquotes = 0;
    int incomment = 0;
    int ineolcomment = 0;

    for (lastch = ' '; (ch = getc(fp)) != EOF; lastch = ch)
    {
        if (!incomment && !ineolcomment)
        {
            if (inquotes || inchotes)
                putc(ch, outfp);
            else
                put_carefully(lastch, ch, outfp);
        }

        if (inchotes) {
            if (ch == '\'' && lastch != '\\')
                inchotes = 0;
        } else if (inquotes) {
            if (ch == '"' && lastch != '\\')
                inquotes = 0;
        } else if (incomment) {
            if (ch == '/' && lastch == '*')
                incomment = 0, ch = ' ';
        } else if (ineolcomment) {
            if (ch == '\n' && lastch != '\\')
                ineolcomment = 0;
        } else {
            if (ch == '\'')
                inchotes = 1;
            else if (ch == '"')
                inquotes = 1;
            else if (lastch == '/' && ch == '*') {
                putc(' ', outfp);
                incomment = 1;
            }
            else if (AllowEOLComments && lastch == '/' && ch == '/')
                ineolcomment = 1;
        }
    }

    if (lastch == '/')
        putc(lastch, outfp);

    return 0;
}

static int put_carefully(int lastch, int ch, FILE *outfp)
{
    /* Print out 'ch', but be very careful not to print
     * any characters that might be part of a comment
     * delimiter. Contrariwise, if 'lastch' is now
     * definitely *not* a comment delimiter, we must now
     * print it, too.
     */

    if (AllowEOLComments) {
        if (lastch == '/' && ch == '/')
            return 0;
    }
    if (lastch == '/' && ch == '*')
        return 0;
    if (lastch == '/')
        putc(lastch, outfp);
    if (ch != '/')
        putc(ch, outfp);
    return 0;
}

Nov 13 '05 #34

P: n/a
"Arthur J. O'Dwyer" wrote:
> On Sun, 2 Nov 2003, CBFalconer wrote:
>
.... snip ...
>> /* File uncmntc.c - demo of a text filter
>> Strips C comments. Tested to strip itself
>> by C.B. Falconer. 2002-08-15
>> Public Domain. Attribution appreciated
>> report bugs to <mailto:cb********@worldnet.att.net>
>> */ <snip code>
>
> I ran your program through some hurdles, and found that
> it couldn't handle multibyte character constants for some
> reason. I didn't bother to track down why; I just re-wrote
> the filter from scratch. ;-) Here's my version, whose
> algorithm may be completely different from yours.

.... snip ...

A known failing. It also fails miserably with trigraphs. The
multibyte char is probably easily handled analogously to handling
quoted strings.
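
A minimal sketch of that idea, assuming a hypothetical helper named literal_end() that is not part of either posted filter: a character constant and a string literal can share one scanning routine, because both end at the first unescaped occurrence of whatever delimiter opened them. A multi-character constant such as 'ab' then needs no special case. Trigraphs are deliberately ignored here.

```c
#include <stddef.h>

/* Hypothetical helper, for illustration only.
 * s must point at an opening ' or ".  Returns the index just past
 * the matching closing delimiter, or the index of the terminating
 * NUL if the literal is never closed. */
size_t literal_end(const char *s)
{
    char quote = s[0];
    size_t i = 1;

    while (s[i] != '\0') {
        if (s[i] == '\\' && s[i + 1] != '\0')
            i += 2;             /* skip the backslash and the character it escapes */
        else if (s[i] == quote)
            return i + 1;       /* unescaped closing delimiter */
        else
            i++;
    }
    return i;                   /* unterminated literal */
}
```

A filter built this way would copy everything from s[0] up to the returned index verbatim, so comment delimiters inside the literal are never examined.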

> /* File uncmntc2.c - demo of a different text filter
> Strips C comments. Tested to strip itself
> Improves on CBFalconer's design by correctly handling '/*'
> and by having a C89/C99 switch, but doesn't handle the /\
> * delimiter correctly.
> by Arthur O'Dwyer, 2002-11-03
                     ^^^^
That is the year I wrote mine :-)

All of which shows that there are multiple ways to implement a
black box. I omitted any reference to cats because I happen to
like them.

--
Chuck F (cb********@yahoo.com) (cb********@worldnet.att.net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net> USE worldnet address!
Nov 13 '05 #35

In <Pi***********************************@unix42.andrew.cmu.edu> "Arthur J. O'Dwyer" <aj*@nospam.andrew.cmu.edu> writes:
> [snip] Comment removal really is tricky in the most general case!


Since this is exercise 1-23 in K&R2, there are several solutions
available at Richard's site:

http://users.powernet.co.uk/eton/kandr2/index.html

including a 556-line entry from Chris Torek that I think also brews
coffee...

Pat

BTW, Richard: Would you consider adding a plaintext version of the
"naming conventions" page to the zipfile as a sort of "README"?
Nov 13 '05 #36

Patrick Foley wrote:
> BTW, Richard: Would you consider adding a plaintext version of the
> "naming conventions" page to the zipfile as a sort of "README"?


I am currently re-evaluating the Answers section of my site. I'll get back
to you when I have a bit more time.

--
Richard Heathfield : bi****@eton.powernet.co.uk
"Usenet is a strange place." - Dennis M Ritchie, 29 July 1999.
C FAQ: http://www.eskimo.com/~scs/C-faq/top.html
K&R answers, C books, etc: http://users.powernet.co.uk/eton
Nov 13 '05 #37

>In <Pi***********************************@unix42.andrew.cmu.edu> "Arthur J. O'Dwyer" <aj*@nospam.andrew.cmu.edu> writes:
>>[snip] Comment removal really is tricky in the most general case!

In article <nc************@myname.my.domain>
Patrick Foley <pf****@earthlink.net> writes:
>Since this is exercise 1-23 in K&R2, there are several solutions
>available at Richard's site:
>
>http://users.powernet.co.uk/eton/kandr2/index.html
>
>including a 556-line entry from Chris Torek that I think also brews
>coffee...


But it has (gasp!) a *bug*. :-) The "level 2 state machine" for
handling comments fails to reconsider characters in a few cases.
I think the main (only?) problem can be fixed without too much
fuss:

        case L2_SLASH:
            if (c == '*')
                l2state = L2_COMM;
            else if (c99 && c == '/')
                l2state = L2_SLASHSLASH;
            else {
                SYNCLINES();
                OUTPUT('/', 0);
-->             if (c != '/') {
-->                 if (c != EOF)
-->                     COPY();
-->                 l2state = L2_NORMAL;
-->             }
            }
            break;

The bug is in the marked lines, which output the first slash
and then change the level-2 state. But the new state should
be "that which results in seeing character c as if the initial
state had been L2_NORMAL", so we could replace all of them with:

    l2state = L2_NORMAL;
    goto l2_normal_case;

and add an "l2_normal_case" label under case L2_NORMAL: above.
Alternatively, the assignment to l2state can be changed to:

    l2state = c == '\'' ? L2_CC :
              c == '"' ? L2_SC : L2_NORMAL;

which avoids the dreaded "goto", and simply duplicates what would
have happened in L2_NORMAL state (except of course that instead of
replacing l2state with L2_SLASH for '/', we have to replace it with
L2_NORMAL for characters that are not in [/'"]).
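
The same idea can be sketched without goto by factoring the NORMAL-state decision into a function and reusing it from the SLASH state. This is only an illustration under assumptions: the state names are borrowed from the fragment above, but from_normal() and from_slash() are hypothetical, and output, EOF handling and the C99 "//" case are all left out.

```c
/* State names follow the quoted fragment; everything else here is
 * a hypothetical illustration, not the real program's code. */
enum l2 { L2_NORMAL, L2_SLASH, L2_COMM, L2_CC, L2_SC };

/* State reached when character c is seen in L2_NORMAL. */
enum l2 from_normal(int c)
{
    switch (c) {
    case '/':  return L2_SLASH;   /* possible start of a comment */
    case '\'': return L2_CC;      /* character constant opens */
    case '"':  return L2_SC;      /* string literal opens */
    default:   return L2_NORMAL;
    }
}

/* State reached when character c is seen in L2_SLASH. */
enum l2 from_slash(int c)
{
    if (c == '*')
        return L2_COMM;           /* the "/" really did open a comment */
    /* The pending "/" was ordinary text, so c must be classified
     * exactly as if the machine had been in L2_NORMAL all along. */
    return from_normal(c);
}
```

With this shape, a quote character arriving right after a lone slash enters L2_CC or L2_SC instead of being swallowed into L2_NORMAL, which is the case the marked lines get wrong.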
--
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603
email: forget about it http://67.40.109.61/torek/index.html (for the moment)
Reading email is like searching for food in the garbage, thanks to spammers.
Nov 13 '05 #38

Timex wrote:
> I want to delete all comments in .c file.
>
> Size of .c file is very big.
>
> Any good idea to do this?
>
> Please show me example code.


Perhaps a better idea is to break the file into
smaller pieces organized around clearer themes.

I believe that deleting all the comments is a crime
against programming ethics. After all, one of
the greatest ideals to achieve is to make a
program readable by a programming-illiterate
person.

--
Thomas Matthews

C++ newsgroup welcome message:
http://www.slack.net/~shiva/welcome.txt
C++ Faq: http://www.parashift.com/c++-faq-lite
C Faq: http://www.eskimo.com/~scs/c-faq/top.html
alt.comp.lang.learn.c-c++ faq:
http://www.raos.demon.uk/acllc-c++/faq.html
Other sites:
http://www.josuttis.com -- C++ STL Library book

Nov 13 '05 #39

In article <Pi***********************************@unix42.andrew.cmu.edu>,
Arthur J. O'Dwyer <aj*@nospam.andrew.cmu.edu> wrote:
> Timex wrote:
> >
> > I want to delete all comments in .c file.


I tested Arthur's program and, despite its claim, it couldn't
even strip its own comments (it left in the comment in
put_carefully()). The bug is that it thought the backslash
meant that '\\' was not a complete character constant (nor
would it think "\\" was a complete string).
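
To make the failure concrete: a quote is escaped only when it is preceded by an odd number of consecutive backslashes, so looking at a single previous character is not enough. A minimal sketch of that parity rule, using a hypothetical function name and working on a string rather than a stream:

```c
#include <stddef.h>

/* Hypothetical helper, for illustration only.
 * Returns 1 if the character at s[pos] is escaped, that is, if it
 * is preceded by an odd number of consecutive backslashes. */
int is_escaped(const char *s, size_t pos)
{
    size_t n = 0;

    /* Count the run of backslashes that ends just before pos. */
    while (pos > n && s[pos - n - 1] == '\\')
        n++;
    return (int)(n % 2);
}
```

For the four characters '\\' the closing quote has two backslashes in front of it, so it is not escaped and the constant is complete; for '\'' the middle quote has one, so it is escaped. A streaming filter can get the same answer by toggling a flag on each backslash instead of counting backwards.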

Is this a complete C99-style comment?
// \\
If it is, a similar fix may be needed in that part of the code.
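
As far as I can tell from the standard's translation phases, the answer is no. Backslash-newline splicing happens in phase 2, before comments are recognized in phase 3, and escape sequences have no meaning inside a comment. So any backslash immediately before the newline continues the comment onto the next line, however many backslashes precede it. A sketch of that rule, with a hypothetical function name and ignoring the ??/ trigraph:

```c
#include <string.h>

/* Hypothetical helper, for illustration only.
 * line holds the text of one source line without its newline.
 * Returns 1 if a "//" comment on this line continues onto the
 * next line because the line ends in a backslash. */
int eol_comment_continues(const char *line)
{
    size_t len = strlen(line);

    /* No parity check: inside a comment a backslash never escapes
     * another backslash; only the final character matters. */
    return len > 0 && line[len - 1] == '\\';
}
```

If that reading is right, the existing test (ch == '\n' && lastch != '\\') in the end-of-line comment branch already behaves correctly for this case and needs no parity fix.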

> Lesson: Comment removal really is tricky in the most general case!

Agreed.

-- Gary

My attempt at a bug fix:
/* File uncmntc2.c - demo of a different text filter
Strips C comments. Tested to strip itself
Improves on CBFalconer's design by correctly handling '/*'
and by having a C89/C99 switch, but doesn't handle the /\
* delimiter correctly.
by Arthur O'Dwyer, 2002-11-03
bug fix by Gary Ansok, 2003-11-06 to handle '\\' and "\\"
Public Domain. Attribution appreciated
don't bother reporting bugs, just fix 'em...
*/

#include <stdio.h>
#include <stdlib.h>

/* Strip C99-style end-of-line comments? */
int AllowEOLComments = 1;

int strip_comments(FILE *fp, FILE *outfp);
static int put_carefully(int lastch, int ch, FILE *outfp);

int main(void)
{
    strip_comments(stdin, stdout);
    return 0;
}

int strip_comments(FILE *fp, FILE *outfp)
{
    int ch;
    int lastch;
    int inchotes = 0;
    int inquotes = 0;
    int incomment = 0;
    int ineolcomment = 0;
    int backslashed = 0;

    for (lastch = ' '; (ch = getc(fp)) != EOF; lastch = ch)
    {
        if (!incomment && !ineolcomment)
        {
            if (inquotes || inchotes)
                putc(ch, outfp);
            else
                put_carefully(lastch, ch, outfp);
        }

        if (inchotes) {
            if (lastch == '\\')
                backslashed ^= 1;
            else
                backslashed = 0;
            if (ch == '\'' && !backslashed)
                inchotes = 0;
        } else if (inquotes) {
            if (lastch == '\\')
                backslashed ^= 1;
            else
                backslashed = 0;
            if (ch == '"' && !backslashed)
                inquotes = 0;
        } else if (incomment) {
            if (ch == '/' && lastch == '*')
                incomment = 0, ch = ' ';
        } else if (ineolcomment) {
            if (ch == '\n' && lastch != '\\')
                ineolcomment = 0;
        } else {
            if (ch == '\'')
                inchotes = 1;
            else if (ch == '"')
                inquotes = 1;
            else if (lastch == '/' && ch == '*') {
                putc(' ', outfp);
                incomment = 1;
            }
            else if (AllowEOLComments && lastch == '/' && ch == '/')
                ineolcomment = 1;
        }
    }

    if (lastch == '/')
        putc(lastch, outfp);

    return 0;
}

static int put_carefully(int lastch, int ch, FILE *outfp)
{
    /* Print out 'ch', but be very careful not to print
     * any characters that might be part of a comment
     * delimiter. Contrariwise, if 'lastch' is now
     * definitely *not* a comment delimiter, we must now
     * print it, too.
     */

    if (AllowEOLComments) {
        if (lastch == '/' && ch == '/')
            return 0;
    }
    if (lastch == '/' && ch == '*')
        return 0;
    if (lastch == '/')
        putc(lastch, outfp);
    if (ch != '/')
        putc(ch, outfp);
    return 0;
}
Nov 13 '05 #40
