Writing single bits to a file

Don Bruder <da****@sonic.net> writes:

In article <11**********************@v23g2000prn.googlegroups.com>,
riva <ra*********@gmail.com> wrote:

>Well, what if I have just 19 bytes to write that means 2 chars and 3
bits!

(Assuming you actually meant 19 BITS, rather than 19 BYTES...)

Then stuff the three leftover bits into a third byte, pad that byte with
5 bits of your choice (personally, I'd go with zeroes, but that's just
me) and write it.

Considering that every currently-in-use OS uses the concept of "blocks",
"chunks", or "clusters" as the smallest possible unit for disk I/O, with
each of them being made up of some multiple of 128 bytes, it's
incredibly unlikely that "saving" 5 bits is going to be meaningful in
any realistic situation.

Sure, but it's not just a matter of saving space. The size of a file
(i.e., the number of bytes that have been written to it) can be an
important piece of information.

Under many operating systems, the size of a file is *recorded* as an
exact number of bytes, even if the physical size on disk is a multiple
of some larger block size. (However, the C standard doesn't guarantee
this; binary files can be padded at the end with null characters.)

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"

Oct 26 '07 #7

Tor Rustad

riva wrote:

I am developing a compression program. Is there any way to write a
data to file in the form of bits, like write bit 0 then bit 1 and then
bit 1 and so on ....

The way this can be done in C compression SW is to provide your own
bit-level I/O functions. However, in standard C those I/O functions cannot
commit anything smaller than one char to a file, and a char is CHAR_BIT
bits wide, which is at least 8.

Standard practice is to pad the last character, just like we do in
communication or encryption SW. For details, look up a data compression
book, or ask in a data compression news group.

--
Tor <torust [at] online [dot] no>

"Technical skill is mastery of complexity, while creativity is mastery
of simplicity"

Oct 26 '07 #8

Eric Sosman

Keith Thompson wrote:

[...]
Sure, but it's not just a matter of saving space. The size of a file
(i.e., the number of bytes that have been written to it) can be an
important piece of information.

That's really not much help for the O.P.'s situation:
He's working on some kind of compression program, which can
generate a stream of output bits that doesn't "chunk" neatly
into an integral number of bytes. No file system I've ever
heard of can record a file length of 10007 bits. And it
gets worse: Some of the best compressors encode their output
in fractions of bits; if a file length of 10007 bits is bad,
a length of 10006.59029663+ bits is *really* bad!

See Walter Roberson's response.
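One common workaround, sketched here without knowing whether it matches the response referred to above, is to store the exact number of valid bits in a small header so that a decoder can ignore the padding in the last byte. The 4-byte big-endian layout and the names write_bit_count, read_bit_count and payload_bytes are assumptions made for illustration, and the sketch assumes 8-bit bytes.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical container layout: a 4-byte big-endian bit count,
   followed by the payload padded up to a whole byte. */
int write_bit_count(FILE *fp, unsigned long nbits)
{
    assert(nbits <= 0xFFFFFFFFul);   /* must fit in the 4-byte header */
    for (int shift = 24; shift >= 0; shift -= 8) {
        if (putc((int)((nbits >> shift) & 0xFFu), fp) == EOF)
            return EOF;
    }
    return 0;
}

/* Reads the header back; returns 0 on success, EOF if it is truncated. */
int read_bit_count(FILE *fp, unsigned long *nbits)
{
    unsigned long n = 0;
    assert(nbits != NULL);
    for (int i = 0; i < 4; i++) {
        int c = getc(fp);
        if (c == EOF)
            return EOF;
        n = (n << 8) | (unsigned long)c;
    }
    *nbits = n;
    return 0;
}

/* Number of payload bytes needed to hold nbits bits (8-bit bytes assumed). */
unsigned long payload_bytes(unsigned long nbits)
{
    return (nbits + 7ul) / 8ul;
}
```

With this layout a stream of 10007 bits occupies 1251 payload bytes, and the header tells the reader that only 7 bits of the last byte are meaningful.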

--
Eric Sosman
es*****@ieee-dot-org.invalid

Oct 27 '07 #9

Malcolm McLean

"Don Bruder" <da****@sonic.net> wrote in message

Considering that every currently-in-use OS uses the concept of "blocks",
"chunks", or "clusters" as the smallest possible unit for disk I/O, with
each of them being made up of some multiple of 128 bytes, it's
incredibly unlikely that "saving" 5 bits is going to be meaningful in
any realistic situation.

Create thousands of such files, then tar them. See if you can detect a
difference between ten thousand 128 byte files and ten thousand 18-byte
files.

--
Free games and programming goodies.
http://www.personal.leeds.ac.uk/~bgy1mm

Oct 27 '07 #10

riva:

I am developing a compression program. Is there any way to write a
data to file in the form of bits, like write bit 0 then bit 1 and then
bit 1 and so on ....

Maybe something like: (Unchecked code)

typedef struct BitWriterFileInfo {
    FILE *pfile;
    unsigned cur_byte_val,
             i_bit;
} BitWriterFileInfo;

void InitBitWriterFileInfo(BitWriterFileInfo *const p, FILE *const pfile)
{
    p->pfile = pfile;
    p->cur_byte_val = 0;
    p->i_bit = 0;
}

void WriteBit(BitWriterFileInfo *const pout,int const val)
{
    if (val)
    {
        p->cur_byte_val |= 1u << p->i_bit;
    }

    if (CHAR_BIT == ++(p->i_bit))
    {
        P->i_bit = 0;
        P->cur_byte_val = 0;

        fprintf(pout->pfile....
    }
}

You'd need a finaliser routine as well if (bits_written % CHAR_BIT) is
anything other than zero.

Object-orientated programming would fit here very well by the way.

Martin

Oct 27 '07 #11

"Martin Wells" <wa****@eircom.net> wrote in message news:
11**********************@o3g2000hsb.googlegroups.com...

riva:

>I am developing a compression program. Is there any way to write a
data to file in the form of bits, like write bit 0 then bit 1 and then
bit 1 and so on ....

Maybe something like: (Unchecked code)

Unchecked indeed ;-)

typedef struct BitWriterFileInfo {
FILE *pfile;
unsigned cur_byte_val,
i_bit;
} BitWriterFileInfo;

void InitBitWriterFileInfo(BitWriterFileInfo *const p, FILE *const
pfile)
{
p->pfile = pfile;
p->cur_byte_val = 0;
p->i_bit = 0;
}
void WriteBit(BitWriterFileInfo *const pout,int const val)
{
if (val)
{
p->cur_byte_val |= 1u << p->i_bit;

p should be pout

}

if (CHAR_BIT == ++(p->i_bit))
{
P->i_bit = 0;
P->cur_byte_val = 0;

P should be pout too ;-)

You should write the byte to the stream before clearing it!

fprintf(pout->pfile....

putc(pout->cur_byte_val, pout->pfile);
pout->cur_byte_val = pout->i_bit = 0;

}
}

You'd need a finaliser routine as well if (bits_written % CHAR_BIT) is
anything other than zero.

Yes.
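Putting the corrections from this exchange together, a bit writer with the finaliser might look like the sketch below. It is not anyone's posted code: the names BitWriter, bw_init, bw_write_bit and bw_finish are made up for illustration, bits are packed least significant bit first as in the snippet above, and the padding bits are zero.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/* Hypothetical bit writer; names are illustrative only. */
typedef struct BitWriter {
    FILE *fp;
    unsigned cur;     /* bits accumulated so far, least significant first */
    unsigned nbits;   /* number of valid bits currently held in cur */
} BitWriter;

void bw_init(BitWriter *bw, FILE *fp)
{
    assert(bw != NULL && fp != NULL);
    bw->fp = fp;
    bw->cur = 0;
    bw->nbits = 0;
}

/* Appends one bit. Returns 0 on success, EOF on a write error. */
int bw_write_bit(BitWriter *bw, int bit)
{
    if (bit)
        bw->cur |= 1u << bw->nbits;
    if (++bw->nbits == CHAR_BIT) {
        /* Write the completed byte first, then clear the accumulator. */
        if (putc((int)bw->cur, bw->fp) == EOF)
            return EOF;
        bw->cur = 0;
        bw->nbits = 0;
    }
    return 0;
}

/* Finaliser: writes a partial last byte, if any, padded with zero bits. */
int bw_finish(BitWriter *bw)
{
    if (bw->nbits != 0) {
        if (putc((int)bw->cur, bw->fp) == EOF)
            return EOF;
        bw->cur = 0;
        bw->nbits = 0;
    }
    return 0;
}
```

Writing 19 one-bits through this and then calling bw_finish should leave three bytes in the file on a platform with 8-bit chars, the last one holding three real bits and five padding zeros.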

Object-orientated programming would fit here very well by the way.

And getting rid of these extra const qualifiers on the function parameters
would improve readability, making parameter name mixup more obvious. A good
illustration of my remarks elsethread.

--
Chqrlie.

Oct 27 '07 #12

Chqrlie:

And getting rid of these extra const qualifiers on the function parameters
would improve readability, making parameter name mixup more obvious.

To be honest I'll use const wherever I can, unless it's stupid (and by
stupid I mean makes no difference whatsoever, or that the difference
has no benefits). An example of such a "stupid" case would be casting
to a const type, reason being that it doesn't make a difference
whether an R-value is const or not because you can't modify it anyway.
If I take a pointer as an argument to a function, and if I don't
intend on changing the pointer at all, then it makes sense to me to
declare it as const. Also, I don't find that it detracts from
readability or that it adds confusion, so I suppose we just have
different opinions on this.

Just an aside. . . in C, the whole BitWriterFileInfo thing would be
implemented as follows:

int main(void)
{
    BitWriterFileInfo obj;

    InitBitWriterFileInfo(&obj, stdout);

    /* Use it */

    FinaliseBitWriterFileInfo(&obj);

    return 0;
}

, whereas in something like C++, it would be:

int main()
{
    BitFileWriter obj;

    /* Use it */
}

Usually, I'm mad for the ol' procedural programming and I use it 9
times out of 10, but this is definitely one of the places where I'd
opt for the object orientated.

Martin

Oct 27 '07 #13

"Martin Wells" <wa****@eircom.net> wrote in message news:
11**********************@v3g2000hsg.googlegroups.com...

Chqrlie:

>And getting rid of these extra const qualifiers on the function
parameters
would improve readability, making parameter name mixup more obvious.

To be honest I'll use const wherever I can, unless it's stupid (and by
stupid I mean makes no difference whatsoever, or that the difference
has no benefits). An example of such a "stupid" case would be casting
to a const type, reason being that it doesn't make a difference
whether an R-value is const or not because you can't modify it anyway.
If I take a pointer as an argument to a function, and if I don't
intend on changing the pointer at all, then it makes sense to me to
declare it as const. Also, I don't find that it detracts from
readability or that it adds confusion, so I suppose we just have
different opinions on this.

Given your choice of field names, you shouldn't be too proud of your coding
conventions, stylistic or otherwise. We do disagree on what you consider
'stupid' as well. IMHO, if the need arises to cast a qualified pointer to a
different type in an expression, needlessly unqualifying it at the same time
*is* stupid.

const qualifying the parameters of a 3 line function is completely useless.
Just like casting strcpy to (void) and casting the result of malloc().

--
Chqrlie.

Oct 27 '07 #14

Chqrlie:

Given your choice of field names, you shouldn't be too proud of your coding
conventions, stylistic or otherwise.

By "field names", do you mean the names I choose for functions and
variables? What in particular is wrong with them? Please take one of
them as an example and point out the (perceived) flaws to me.

We do disagree on what you consider
'stupid' as well. IMHO, if the need arises to cast a qualified pointer to a
different type in an expression, needlessly unqualifying it at the same time
*is* stupid.

That's not what I mean. I meant the likes of:

int i;
double x;

...

i = (int const)x;

Writing "int const" instead of "int" has no merit. (Let's assume that
the cast was present to suppress a compiler warning). But even if we
take a pointer type, we could have:

int i;

char *p = (char*)&i;
char *p = (char const *)&i; /* This is stupid */

const qualifying the parameters of a 3 line function is completely useless.
Just like casting strcpy to (void) and casting the result of malloc().

I disagree. The merit is that you'll get a compiler error if you try
to alter something, which, you didn't intend to alter in the first
place.
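A small sketch of the kind of diagnostic meant here, using a hypothetical helper sum_ints. Both the pointer parameter itself and the ints it points to are const, so either of the commented-out lines should be rejected by a conforming compiler if uncommented.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: sums n ints without modifying anything it is given. */
long sum_ints(const int *const values, size_t const n)
{
    long total = 0;
    assert(values != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        total += values[i];
    /* values = NULL;    rejected: 'values' is a const pointer       */
    /* values[0] = 0;    rejected: the pointed-to ints are const     */
    /* n = 0;            rejected: 'n' is a const parameter          */
    return total;
}
```

The const on the pointed-to type is visible to callers; the top-level const on values and n only constrains the function body, which is the part under dispute in this thread.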

The be-all and end-all of it though is that I do be consistent with
const -- i.e. if I don't plan on changing it, then make it const.

Martin

Oct 27 '07 #15

cr88192

"Charlie Gordon" <ne**@chqrlie.org> wrote in message
news:47***********************@news.free.fr...

"Martin Wells" <wa****@eircom.net> wrote in message news:
11**********************@v3g2000hsg.googlegroups.com...
>Chqrlie:

>>And getting rid of these extra const qualifiers on the function
parameters
would improve readability, making parameter name mixup more obvious.

To be honest I'll use const wherever I can, unless it's stupid (and by
stupid I mean makes no difference whatsoever, or that the difference
has no benefits). An example of such a "stupid" case would be casting
to a const type, reason being that it doesn't make a difference
whether an R-value is const or not because you can't modify it anyway.
If I take a pointer as an argument to a function, and if I don't
intend on changing the pointer at all, then it makes sense to me to
declare it as const. Also, I don't find that it detracts from
readability or that it adds confusion, so I suppose we just have
different opinions on this.

Given your choice of field names, you shouldn't be too proud of your
coding conventions, stylistic or otherwise. We do disagree on what you
consider 'stupid' as well. IMHO, if the need arises to cast a qualified
pointer to a different type in an expression, needlessly unqualifying it
at the same time *is* stupid.

'const' is a keyword I don't really know if I have ever really used...

reason: it doesn't really do anything, beyond telling the compiler to
complain to me about stuff I should sanely know already anyways...

now, of course, there are cases where I do want the compiler to complain:
missing prototypes.

reason: this leads to very real and very messy results.
it also helps me maintain general strict modularity conventions (I can't
call any code I did not include a header for, or for detecting functions
which have ceased to exist, which is useful when reorganizing my codebase).
....

note that I make this tolerable, mostly because I tend to use a special tool
to 'autoheader' my source code (though, in my case, in a few places this
does lead to special comments telling the tool to ignore certain functions,
....).

const qualifying the parameters of a 3 line function is completely
useless. Just like casting strcpy to (void) and casting the result of
malloc().

I will disagree with the casting the result of malloc.

main reason:
it is necessary if one wants their code to compile in both C and C++
compilers, since C++ usually raises a big fuss about uncast conversions
to/from 'void *'...
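To make the difference concrete, here is a sketch with two hypothetical allocation helpers. The first relies on C's implicit conversion from void * and would be rejected by a C++ compiler; the second adds the cast and should be accepted by both.

```c
#include <assert.h>
#include <stdlib.h>

/* Valid C: void * converts implicitly to int *.
   A C++ compiler rejects the initialisation below. */
int *alloc_ints_c_style(size_t n)
{
    assert(n > 0);
    int *p = malloc(n * sizeof *p);
    return p;
}

/* Accepted by both C and C++ compilers because of the explicit cast. */
int *alloc_ints_portable(size_t n)
{
    assert(n > 0);
    int *p = (int *)malloc(n * sizeof *p);
    return p;
}
```

In both versions the caller still has to check for a null return and call free on the result.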


Oct 27 '07 #16

CBFalconer

cr88192 wrote:

>

.... snip ...

>
I will disagree with the casting the result of malloc.

main reason:
it is necessary if one wants their code to compile in both C and
C++ compilers, since C++ usually raises a big fuss about uncast
conversions to/from 'void *'...

If you write code and try to routinely compile it with both C and
C++ compilers you deserve whatever happens to you. The languages
are _different_. The multiple compilers is not normally useful
anyhow, since you can always compile C in such a manner as to be
linkable to C++, but not the reverse.
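The usual way to get that linkability, shown here as a sketch with a hypothetical header bitio.h and function bits_to_bytes, is to wrap the C declarations in an extern "C" block that only a C++ compiler sees. The C source is then compiled once with a C compiler and linked into either kind of program.

```c
/* ---- bitio.h (hypothetical header) ---- */
#ifndef BITIO_H
#define BITIO_H

#ifdef __cplusplus
extern "C" {            /* seen only by C++ compilers: use C linkage */
#endif

unsigned bits_to_bytes(unsigned nbits);

#ifdef __cplusplus
}
#endif

#endif /* BITIO_H */

/* ---- bitio.c, compiled as C ---- */
#include <assert.h>
#include <limits.h>

/* Rounds a bit count up to whole bytes (8-bit bytes assumed). */
unsigned bits_to_bytes(unsigned nbits)
{
    assert(nbits <= UINT_MAX - 7u);   /* guard against wrap-around */
    return (nbits + 7u) / 8u;
}
```

The two parts are shown in one listing for brevity; in a real project bitio.c would start with #include "bitio.h".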

--
Chuck F (cbfalconer at maineline dot net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net>

--
Posted via a free Usenet account from http://www.teranews.com

Oct 28 '07 #17

cr88192

"CBFalconer" <cb********@yahoo.com> wrote in message
news:47***************@yahoo.com...

cr88192 wrote:
>>
... snip ...
>>
I will disagree with the casting the result of malloc.

main reason:
it is necessary if one wants their code to compile in both C and
C++ compilers, since C++ usually raises a big fuss about uncast
conversions to/from 'void *'...

If you write code and try to routinely compile it with both C and
C++ compilers you deserve whatever happens to you. The languages
are _different_. The multiple compilers is not normally useful
anyhow, since you can always compile C in such a manner as to be
linkable to C++, but not the reverse.

at the time, it was an issue of whether or not I wanted C++ style name
mangling in the object files (after all, this mangling gives at least some
useful type info). eventually I decided that I did not (the proposition
posed more problems than it was worth).

as for C and C++ being different:
they are similar enough here that there is not much real problem in making
code that will work on both, as most of the differences at this level are
minor and fairly trivial to deal with.

eventually, I normalized on using good old plain and unmangled names (and,
more so, internally normalizing on not having underscore prefixes, though on
windows, this is still the convention within the object and exe files...).

and so on...


Oct 28 '07 #18

Willem

cr88192 wrote:
) 'const' is a keyword I don't really know if I have ever really used...
)
) reason: it doesn't really do anything, beyond telling the compiler to
) complain to me about stuff I should sanely know already anyways...

And telling the compiler that it can do certain optimizations.
SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT

Oct 28 '07 #19

cr88192

"Willem" <wi****@stack.nl> wrote in message
news:sl********************@snail.stack.nl...

cr88192 wrote:
) 'const' is a keyword I don't really know if I have ever really used...
)
) reason: it doesn't really do anything, beyond telling the compiler to
) complain to me about stuff I should sanely know already anyways...

And telling the compiler that it can do certain optimizations.

potentially, I guess, if the compiler does not figure this out on its own...

I guess it provides a means for constant folding:
const int foo=3;
....

if(foo) //well now, here we know it is 3...
{ ... }

but, I don't really see why one can't do similar by use of a dirty/clean
flag here (where a variable is dirty if not provable clean). this approach
worked in the past when I was writing interpreters (I guess const serves to
indicate that it is always clean...).
actually, in my compiler, some keywords are parsed but ignored.
at present this is the case with const.


Oct 28 '07 #20

James Kuyper

cr88192 wrote:
....

'const' is a keyword I don't really know if I have ever really used...

reason: it doesn't really do anything, beyond telling the compiler to
complain to me about stuff I should sanely know already anyways...

The 'const' keyword provides the same kind of benefit that prototypes
do: you declare something about how an identifier is intended to be
used, enabling the compiler to warn you if it detects the fact that you
accidentally use it in a manner different from the what the declaration
says. Then you get to decide whether it's the declaration or the usage
that is incorrect. This is one of the many things that the compiler can
check far quicker and more reliably than I can.

True, if I were a perfect programmer, I would never need that warning. I
don't know any perfect programmers. I'm certainly not one, and I'll
happily do what's needed to enable this kind of warning.

Oct 28 '07 #21

"Martin Wells" <wa****@eircom.net> wrote in message news:
11*********************@z9g2000hsf.googlegroups.com...

Chqrlie:

>Given your choice of field names, you shouldn't be too proud of your
coding
conventions, stylistic or otherwise.

By "field names", do you mean the names I choose for functions and
variables? What in particular is wrong with them? Please take one of
them as an example and point out the (perceived) flaws to me.

No, I meant struct member names.

>We do disagree on what you consider
'stupid' as well. IMHO, if the need arises to cast a qualified pointer
to a
different type in an expression, needlessly unqualifying it at the same
time
*is* stupid.

That's not what I mean. I meant the likes of:

int i;
double x;

...

i = (int const)x;

Writing "int const" instead of "int" has no merit.

I agree completely.

(Let's assume that the cast was present to suppress a compiler warning).

Stupid compiler in this case.

But even if we take a pointer type, we could have:

int i;

char *p = (char*)&i;
char *p = (char const *)&i; /* This is stupid */

I agree as well.

>const qualifying the parameters of a 3 line function is completely
useless.
Just like casting strcpy to (void) and casting the result of malloc().

I disagree. The merit is that you'll get a compiler error if you try
to alter something, which, you didn't intend to alter in the first
place.

The be-all and end-all of it though is that I do be consistent with
const -- i.e. if I don't plan on changing it, then make it const.

Consistency has its merits. But do you also write:

int main(int const argc, char * const * const argv) { ... }

--
Chqrlie.

Oct 28 '07 #22

cr88192

"James Kuyper" <ja*********@verizon.net> wrote in message
news:vQ_Ui.3729$R%4.932@trnddc05...

cr88192 wrote:
...
>'const' is a keyword I don't really know if I have ever really used...

reason: it doesn't really do anything, beyond telling the compiler to
complain to me about stuff I should sanely know already anyways...

The 'const' keyword provides the same kind of benefit that prototypes do:
you declare something about how an identifier is intended to be used,
enabling the compiler to warn you if it detects the fact that you
accidentally use it in a manner different from the what the declaration
says. Then you get to decide whether it's the declaration or the usage
that is incorrect. This is one of the many things that the compiler can
check far quicker and more reliably than I can.

prototypes provide a lot more:
they actually make the type handling work right...

(well, that and as a side benefit, I use them to help reinforce
modularity...).

True, if I were a perfect programmer, I would never need that warning. I
don't know any perfect programmers. I'm certainly not one, and I'll
happily do what's needed to enable this kind of warning.

and I am also a person who writes some amount of stuff in assembler as well,
where assembler provides no such niceties...
however, it is my belief that what const offers, for the most part, is
something people will have already long-since internalized. unlike some
other errors, these are likely to have a much lower chance of random-chance
incidence, which most often consist of IME missing/mistyped variables, major
type errors (often caused by another error), and missing/mixing function
arguments...

assigning a read-only variable is a little less likely, on the grounds that
this action is far more likely to be deliberate.
or such...

Oct 28 '07 #23

"cr88192" <cr*****@hotmail.com> wrote in message news:
13***************************@saipan.com...

"James Kuyper" <ja*********@verizon.net> wrote in message
news:vQ_Ui.3729$R%4.932@trnddc05...
>cr88192 wrote:
...
>>'const' is a keyword I don't really know if I have ever really used...

reason: it doesn't really do anything, beyond telling the compiler to
complain to me about stuff I should sanely know already anyways...

The 'const' keyword provides the same kind of benefit that prototypes do:
you declare something about how an identifier is intended to be used,
enabling the compiler to warn you if it detects the fact that you
accidentally use it in a manner different from the what the declaration
says. Then you get to decide whether it's the declaration or the usage
that is incorrect. This is one of the many things that the compiler can
check far quicker and more reliably than I can.

prototypes provide a lot more:
they actually make the type handling work right...

(well, that and as a side benefit, I use them to help reinforce
modularity...).

James said "the same kind", not "the same amount".
I do enable a ton of warnings, and use extra tools such as valgrind, sparse,
and custom made ones.
I haven't looked at your compiler yet, I'm willing to bet the code would
benefit from such a treatment.

>True, if I were a perfect programmer, I would never need that warning. I
don't know any perfect programmers. I'm certainly not one, and I'll
happily do what's needed to enable this kind of warning.

and I am also a person who writes some amount of stuff in assembler as
well, where assembler provides no such niceties...

I ride motorbikes, yet I fasten my seat belt in a car. Why take risks all
the time?

however, it is my belief that what const offers, for the most part, is
something people will have already long-since internalized. unlike some
other errors, these are likely to have a much lower chance of
random-chance incidence, which most often consist of IME missing/mistyped
variables, major type errors (often caused by another error), and
missing/mixing function arguments...

assigning a read-only variable is a little less likely, on the grounds
that this action is far more likely to be deliberate.

const correctness, although it requires discipline, pays off.
You are probably a bit young and still remember everything you type, when
you start experiencing memory lapses (from 25 up) you will find all these
little tricks pretty handy.

or such...

What do you mean by that? or is it your signature? or such ...

--
Chqrlie.

Oct 28 '07 #24

Ben Bacarisse

"Charlie Gordon" <ne**@chqrlie.org> writes:

"Martin Wells" <wa****@eircom.net> wrote in message news:
11*********************@z9g2000hsf.googlegroups.com...

<snip>

>The be-all and end-all of it though is that I do be consistent with
const -- i.e. if I don't plan on changing it, then make it const.

Consistency has its merits. But do you also write:

int main(int const argc, char * const * const argv) { ... }

<nit-pick size="micro">
A case could be made that such a definition is not legal. The
standard allows 'int main(int argc, char *argv[])' "or equivalent"
with a footnote that suggests the equivalence is to be exact
(e.g. type synonyms and 'char **' are OK but that is about it).

Since one can not even pass a 'char **' argument to a function that
expects a 'char *const *const' parameter it would be hard to argue
that your example is "equivalent".
</nit-pick>

--
Ben.

Oct 28 '07 #25

Chqrlie:

Consistency has its merits. But do you also write:

int main(int const argc, char * const * const argv) { ... }

Yes, I'd make them const if I didn't plan on changing it.

Martin

Oct 28 '07 #26

"Ben Bacarisse" <be********@bsb.me.uk> wrote in message news:
87************@bsb.me.uk...

"Charlie Gordon" <ne**@chqrlie.org> writes:
>"Martin Wells" <wa****@eircom.net> wrote in message news:
11*********************@z9g2000hsf.googlegroups.com...
<snip>

>>The be-all and end-all of it though is that I do be consistent with
const -- i.e. if I don't plan on changing it, then make it const.

Consistency has its merits. But do you also write:

int main(int const argc, char * const * const argv) { ... }

<nit-pick size="micro">
A case could be made that such a definition is not legal. The
standard allows 'int main(int argc, char *argv[])' "or equivalent"
with a footnote that suggests the equivalence is to be exact
(e.g. type synonyms and 'char **' are OK but that is about it).

Since one can not even pass a 'char **' argument to a function that
expects a 'char *const *const' parameter it would be hard to argue
that your example is "equivalent".
</nit-pick>

The prototype I wrote as a joke for main is compatible with the classic int
main(int argc, char *argv[]); in the sense that passing an int and a char *
array would be OK, but it is incompatible in terms of signatures.
Too bad, there is no way to hint that main does not modify the array (not
the strings it points to).
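The call-compatibility part can be checked with a small sketch. The helper count_args and the wrapper demo are hypothetical; the point is only that, as far as C's conversion rules go, a plain char ** argument may be passed where a char *const *const parameter is expected, because the conversion merely adds a const to the pointed-to type.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper with the fully const-qualified parameter type. */
size_t count_args(char *const *const argv)
{
    size_t n = 0;
    assert(argv != NULL);
    while (argv[n] != NULL)
        n++;
    /* argv[0] = NULL;   rejected: the pointers in the array are const */
    return n;
}

size_t demo(void)
{
    char prog[] = "prog", flag[] = "-v";
    char *args[] = { prog, flag, NULL };
    char **plain = args;         /* same type as main's argv */
    return count_args(plain);    /* implicit char ** to char *const * */
}
```

Whether a definition of main itself with that parameter list counts as "equivalent" under the standard is a separate question, and this sketch does not settle it.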

What about int main(int const argc, char ** const argv) { ... } ? This one
is compatible with the standard.
Is this how you define main ?

I can think of an even more verbose yet compatible one:

signed int main(register signed int const argc, register char ** const
restrict argv) { ... }

Great! it does not even fit on one line.

--
Chqrlie.

Oct 28 '07 #27

"Martin Wells" <wa****@eircom.net> wrote in message news:
11*********************@22g2000hsm.googlegroups.com...

Chqrlie:

>Consistency has its merits. But do you also write:

int main(int const argc, char * const * const argv) { ... }

Yes, I'd make them const if I didn't plan on changing it.

You are one of a kind !

--
Chqrlie.

Oct 28 '07 #28

Ben Bacarisse

"Charlie Gordon" <ne**@chqrlie.org> writes:

"Ben Bacarisse" <be********@bsb.me.uk> wrote in message news:
87************@bsb.me.uk...
>"Charlie Gordon" <ne**@chqrlie.org> writes:
>>"Martin Wells" <wa****@eircom.net> wrote in message news:
11*********************@z9g2000hsf.googlegroups.com...
<snip>
>>>The be-all and end-all of it though is that I do be consistent with
const -- i.e. if I don't plan on changing it, then make it const.

Consistency has its merits. But do you also write:

int main(int const argc, char * const * const argv) { ... }

<nit-pick size="micro">
A case could be made that such a definition is not legal.

<snip>

></nit-pick>

The prototype I wrote as a joke ...

<snip>

What about int main(int const argc, char ** const argv) { ... } ? This one
is compatible with the standard.
Is this how you define main ?

No, I write 'int main(int argc, char *argv[])' -- I did not think you
were making a joke. I thought you were suggesting a legal, but daft,
alternative to make a point.

--
Ben.

Oct 28 '07 #29

"Ben Bacarisse" <be********@bsb.me.uk> wrote in message news:
87************@bsb.me.uk...

"Charlie Gordon" <ne**@chqrlie.org> writes:

>"Ben Bacarisse" <be********@bsb.me.uk> wrote in message news:
87************@bsb.me.uk...
>>"Charlie Gordon" <ne**@chqrlie.org> writes:
"Martin Wells" <wa****@eircom.net> wrote in message news:
11*********************@z9g2000hsf.googlegroups.com...
<snip>
The be-all and end-all of it though is that I do be consistent with
const -- i.e. if I don't plan on changing it, then make it const.

Consistency has its merits. But do you also write:

int main(int const argc, char * const * const argv) { ... }

<nit-pick size="micro">
A case could be made that such a definition is not legal.

<snip>

>></nit-pick>

The prototype I wrote as a joke ...
<snip>
>What about int main(int const argc, char ** const argv) { ... } ? This
one
is compatible with the standard.
Is this how you define main ?

No, I write 'int main(int argc, char *argv[])' -- I did not think you
were making a joke. I thought you were suggesting a legal, but daft,
alternative to make a point.

Well I thought I was making a joke, but Martin Wells does const argc and
argv in his definitions of main when they are not modified.

--
Chqrlie.

Oct 28 '07 #30

cr88192

"Charlie Gordon" <ne**@chqrlie.org> wrote in message
news:47**********************@news.free.fr...

"cr88192" <cr*****@hotmail.com> wrote in message news:
13***************************@saipan.com...
>>
"James Kuyper" <ja*********@verizon.net> wrote in message
news:vQ_Ui.3729$R%4.932@trnddc05...
>>cr88192 wrote:
...
'const' is a keyword I don't really know if I have ever really used...

reason: it doesn't really do anything, beyond telling the compiler to
complain to me about stuff I should sanely know already anyways...

The 'const' keyword provides the same kind of benefit that prototypes
do: you declare something about how an identifier is intended to be
used, enabling the compiler to warn you if it detects the fact that you
accidentally use it in a manner different from the what the declaration
says. Then you get to decide whether it's the declaration or the usage
that is incorrect. This is one of the many things that the compiler can
check far quicker and more reliably than I can.

prototypes provide a lot more:
they actually make the type handling work right...

(well, that and as a side benefit, I use them to help reinforce
modularity...).

James said "the same kind", not "the same amount".
I do enable a ton of warnings, and use extra tools such as valgrind,
sparse, and custom made ones.
I haven't looked at your compiler yet, I'm willing to bet the code would
benefit from such a treatment.

potentially...
actually, I am ending up endlessly debugging and fixing things, but not so
much syntactic, primarily semantic issues.

a recent example was rigging up some 'bypass' code to allow me to more
efficiently move floats and doubles between the FPU and SSE (happens, say,
whenever one uses sin or cos), because, as it was before, this operation
would end up flushing the register allocator (bad...). now, it is just
storing into memory (the bottom of the compiler's notion of the stack) on
one end, and loading from the other (still not sure why x86 lacks opcodes
like 'fld32 xmm3', or 'fstp64 xmm0', these would be useful...).
another recent example was noting that my expression parsing, didn't exactly
closely match the C operator precedence rules (noted in part, because me
typing '*(vec3 *)(&v0)', failed to parse right...).

to a large degree, my parser was just sort of reused from my last scripting
language and beaten into shape, but I had failed to notice that I had not
gone and more correctly fixed up the precedence hierarchy (unary and postfix
operators were the same precedence, bitwise operators were the same as
normal arithmetic operators, ...).
so, now, everything is much more closely in tune with the C standard, for
better or for worse (I don't entirely like C's precedence rules, but then
again, this is partly why my last script lang did them differently, but in
any case conformance forces me to live with them...).

well, at least in the upper-end of my parser (tokenizer mostly), I have gone
and added more operator and brace types (12 new brace types, based on
character combos that should not occur in valid well-formed C, and 22 new
operator tokens). more are possible if one is willing to go into the land of
horrible-looking tokens ('#<. stuff .>' is already pretty bad...).

if I used them, it would be mostly for compiler and language extensions (the
operators specifically to be overloaded...).

most of the operators take forms like '+.' or '.+', and I will define that
they have precedence similar to those of the operators they resemble (unless
defined for something, it will be an error to try to use them though...).

I also added '~' as an infix operator, which I am considering will operate
like an exponent operator ('a~3', since 'a^b' generally means xor, and
'a**b' is ambiguous). it could also serve as an alternative for 'dot
product', which is currently handled with '^', which, sadly, has a very low
precedence (this however, becomes ambiguous for quaternions, which are both
numeric and vector, and thus can have both exponents and dot product...).

could potentially also add `, as an operator, since it is not otherwise used
as a quote ('2`3', 'u`v', ...). likewise for @ and $ (though gcc allows the
latter in names, I may not, but as of yet I am undecided...).
hmm...

but, whatever, all this is non-standard anyways...

(my great cost: before writing a C compiler, I implemented script languages,
I guess I still sort of think in this way...).

>>True, if I were a perfect programmer, I would never need that warning. I
don't know any perfect programmers. I'm certainly not one, and I'll
happily do what's needed to enable this kind of warning.

and I am also a person who writes some amount of stuff in assembler as
well, where assembler provides no such niceties...

I ride motorbikes, yet I fasten my seat belt in a car. Why take risks all
the time?

point is, bugs usually pop up, and with practice, one develops a tendency to
specifically avoid certain kinds of problems (the more painful the problem,
the more highly the user learns to avoid it). as a result, for people using
assembler, they learn to be careful, since even trivial errors will not be
caught by assembler, and will proceed to become potentially hard to track
down bugs (one develops a kind of 'blank stare' code checking ability).

making something easier, just makes it less painful to make errors, and thus
errors become more frequent.

I suspect this is also very likely the case with programmers who primarily
use statically typed languages that go over and use dynamically typed ones.
since they have not really felt the pain of the compiler missing their type
errors, they are a lot more likely to miss them, which is why, I think,
paradoxically, many good old C and C++ programmers experience pain with many
script languages, yet newbs seem a lot more adept at learning them, and old
timers assert that these kind of errors don't really occur...

(many such people also assert that one gets used to lisp style syntax, but I
never really stopped thinking that it looked ugly, nor did I ever really
like having to use emacs to avoid the pain this kind of syntax causes when
edited in notepad...).

meanwhile, in general, I like power and capability, at the possibly
necessary cost of comfort (and stability...).

>however, it is my belief that what const offers, for the most part, is
something people will have already long-since internalized. unlike some
other errors, these are likely to have a much lower chance of
random-chance incidence, which most often consist of IME missing/mistyped
variables, major type errors (often caused by another error), and
missing/mixing function arguments...

assigning a read-only variable is a little less likely, on the grounds
that this action is far more likely to be deliberate.

const correctness, although it requires discipline, pays off.
You are probably a bit young and still remember everything you type. When
you start experiencing memory lapses (from 25 up), you will find all these
little tricks pretty handy.
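
A minimal sketch of what const correctness buys (count_char and check_count_char are made-up names for illustration): the qualifier documents that the function only reads the string, and the compiler rejects an accidental write through the pointer.

```c
#include <assert.h>
#include <stddef.h>

/* The const qualifier promises that count_char() only reads through s. */
size_t count_char(const char *s, char c)
{
    size_t n = 0;
    for (; *s != '\0'; s++) {
        if (*s == c)
            n++;
        /* *s = c;   would not compile: s points to const char */
    }
    return n;
}

/* Small self-check; call it from main() to try the example. */
void check_count_char(void)
{
    assert(count_char("banana", 'a') == 3);
    assert(count_char("", 'a') == 0);
}
```

Uncommenting the assignment inside the loop should produce a diagnostic, which is the kind of check being argued for here.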

well, I don't remember everything I type (there is just too much...).
as for age, I am getting there, sadly...
not 25 yet, but sadly, it is no longer that far away.
I am getting old it seems...

>or such...

What do you mean by that? or is it your signature? or such ...

habit I guess...

--
Chqrlie.

Oct 29 '07 #31

"cr88192" <cr*****@nospam.hotmail.com> writes:

"Charlie Gordon" <ne**@chqrlie.org> wrote in message
news:47***********************@news.free.fr...

[...]

>for exponentiation, I would suggest you do use ** as in Fortran. It
is not ambiguous because a ** b has no meaning for b scalar or
struct type. You would need to bend the grammar a bit, but at least
the precedence would be much better suited than that of ^

I had a '**' operator at one time before, but it kept clashing with
pointers handling. as a result, if it existed, it would have to be
parsed specially, and have rules to avoid accidentally mis-parsing a
pointer operation:
"i**s++", where the intention was actually "i*(*s++)". provably
disambiguating this would require more info than the parser has
available (the only real option would be, for example, requiring
spaces...).

'a^.b' could be another option (where '^.', is given a high rather
than a low precedence).

[...]

"**" can be unambiguous if you allow parsing to be affected by
semantic analysis. But typically the source is tokenized before it's
parsed and analyzed (even though all these things theoretically happen
in translation phase 7). If you see ``x**y'', you can't tell whether
it's ``x ** y'' (an exponentiation) or ``x * *y'' without knowing the
type of y.
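
As a concrete illustration of how standard C reads this today (mul_next and check_mul_next are made-up names): the tokenizer produces two separate * tokens, so "i**s++" is a multiplication by a dereferenced pointer.

```c
#include <assert.h>

/* In standard C, i**s++ is tokenized as  i * * s ++  and parsed as
   i * (*(s++)): multiply i by the int that s points at, then advance s. */
int mul_next(int i, const int **sp)
{
    const int *s = *sp;
    int r = i**s++;
    *sp = s;
    return r;
}

/* Small self-check; call it from main() to try the example. */
void check_mul_next(void)
{
    static const int a[2] = { 7, 9 };
    const int *s = a;
    assert(mul_next(3, &s) == 21);
    assert(s == a + 1);
}
```

Any new "**" token would change the meaning of code like this, which is the compatibility problem under discussion.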

I can imagine ways to tweak the grammar, perhaps requiring white space
between a binary "*" operator and a unary "*" operator, but I wouldn't
recommend it; the result would be incompatible with C. And any such
solution would be complicated to define and to implement, and
therefore complicated to use in at least some cases.

You might consider ^^ as an exponentiation operator. It's not likely
that a future C standard will introduce a short-circuit xor operator.

Or you could use a keyword as an operator symbol (``sizeof'' is a
precedent for this).

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"

Oct 29 '07 #32

"Keith Thompson" <ks***@mib.org> wrote in message news:
ln************@nuthaus.mib.org...

"cr88192" <cr*****@nospam.hotmail.com> writes:
>"Charlie Gordon" <ne**@chqrlie.org> wrote in message
news:47***********************@news.free.fr...
[...]

>>for exponentiation, I would suggest you do use ** as in Fortran. It
is not ambiguous because a ** b has no meaning for b scalar or
struct type. You would need to bend the grammar a bit, but at least
the precedence would be much better suited than that of ^

I had a '**' operator at one time before, but it kept clashing with
pointers handling. as a result, if it existed, it would have to be
parsed specially, and have rules to avoid accidentally mis-parsing a
pointer operation:
"i**s++", where the intention was actually "i*(*s++)". provably
disambiguating this would require more info than the parser has
available (the only real option would be, for example, requiring
spaces...).

'a^.b' could be another option (where '^.', is given a high rather
than a low precedence).
[...]

"**" can be unambiguous if you allow parsing to be affected by
semantic analysis. But typically the source is tokenized before it's
parsed and analyzed (even though all these things theoretically happen
in translation phase 7). If you see ``x**y'', you can't tell whether
it's ``x ** y'' (an exponentiation) or ``x * *y'' without knowing the
type of y.

Of course, I'm not proposing ** to be a token, but x * *y to be
reinterpreted as fexp(x, y) if y is a numeric type. This trick can be
played on the parse tree if you have one, at code generation time, or on the
fly if you generate code directly. The programmer would be more inclined to
write x ** y or x**y, but it is parsed as x * *y. This trick would be more
difficult to play in an interpreter with dynamic typing, but still possible,
by sticking the appropriate behaviour to fexp(x, y) for y pointer type.
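
A rough sketch of that parse-tree trick, under stated assumptions: the node layout, the enum names and rewrite_pow are all hypothetical, and the type field is assumed to have been filled in by an earlier semantic pass.

```c
#include <assert.h>
#include <stddef.h>

enum node_kind { N_VAR, N_DEREF, N_MUL, N_POW };
enum type_kind { T_INT, T_DOUBLE, T_POINTER };

struct node {
    enum node_kind kind;
    enum type_kind type;   /* type of this expression, set by semantic analysis */
    struct node *left;     /* operand of N_DEREF, left operand of binary nodes */
    struct node *right;    /* right operand of binary nodes, else NULL */
};

/* Rewrite MUL(x, DEREF(y)) into POW(x, y) when y is not a pointer. */
void rewrite_pow(struct node *n)
{
    if (n == NULL)
        return;
    rewrite_pow(n->left);
    rewrite_pow(n->right);
    if (n->kind == N_MUL && n->right != NULL
        && n->right->kind == N_DEREF && n->right->left != NULL
        && n->right->left->type != T_POINTER) {
        n->kind = N_POW;
        n->right = n->right->left;   /* drop the DEREF node (not freed here) */
    }
}

/* Small self-check; call it from main() to try the example. */
void check_rewrite_pow(void)
{
    struct node x = { N_VAR, T_DOUBLE, NULL, NULL };
    struct node y = { N_VAR, T_DOUBLE, NULL, NULL };
    struct node d = { N_DEREF, T_DOUBLE, &y, NULL };
    struct node m = { N_MUL, T_DOUBLE, &x, &d };
    rewrite_pow(&m);
    assert(m.kind == N_POW && m.right == &y);
}
```

One consequence worth noting: since the source is still parsed as a multiplication, "a * b ** c" groups as (a * b) * *c, so this rewrite would yield the power of a * b rather than a times a power of b.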

I can imagine ways to tweak the grammar, perhaps requiring white space
between a binary "*" operator and a unary "*" operator, but I wouldn't
recommend it; the result would be incompatible with C. And any such
solution would be complicated to define and to implement, and
therefore complicated to use in at least some cases.

none of this should be needed.

You might consider ^^ as an exponentiation operator. It's not likely
that a future C standard will introduce a short-circuit xor operator.

Or you could use a keyword as an operator symbol (``sizeof'' is a
precedent for this).

These are other directions, but less appealing IMHO.

--
Chqrlie.

Oct 29 '07 #33

cr88192

"Keith Thompson" <ks***@mib.org> wrote in message
news:ln************@nuthaus.mib.org...

"cr88192" <cr*****@nospam.hotmail.com> writes:
>"Charlie Gordon" <ne**@chqrlie.org> wrote in message
news:47***********************@news.free.fr...
[...]

>>for exponentiation, I would suggest you do use ** as in Fortran. It
is not ambiguous because a ** b has no meaning for b scalar or
struct type. You would need to bend the grammar a bit, but at least
the precedence would be much better suited than that of ^

I had a '**' operator at one time before, but it kept clashing with
pointers handling. as a result, if it existed, it would have to be
parsed specially, and have rules to avoid accidentally mis-parsing a
pointer operation:
"i**s++", where the intention was actually "i*(*s++)". provably
disambiguating this would require more info than the parser has
available (the only real option would be, for example, requiring
spaces...).

'a^.b' could be another option (where '^.', is given a high rather
than a low precedence).
[...]

"**" can be unambiguous if you allow parsing to be affected by
semantic analysis. But typically the source is tokenized before it's
parsed and analyzed (even though all these things theoretically happen
in translation phase 7). If you see ``x**y'', you can't tell whether
it's ``x ** y'' (an exponentiation) or ``x * *y'' without knowing the
type of y.

knowing the type of y is the problem, though theoretically it could be
handled by parse-tree tweaking, if I had the parse tree at the same point I
was doing type handling (in my compiler, I do not, since these are handled
as different stages).

a later frontend may also make type info available in more of the upper
compiler, such that such inferences can be made...

I can imagine ways to tweak the grammar, perhaps requiring white space
between a binary "*" operator and a unary "*" operator, but I wouldn't
recommend it; the result would be incompatible with C. And any such
solution would be complicated to define and to implement, and
therefore complicated to use in at least some cases.

yeah, I considered, but did not accept these ideas...
it matters to me that compatibility not be broken.

You might consider ^^ as an exponentiation operator. It's not likely
that a future C standard will introduce a short-circuit xor operator.

however, I may at some point add such an operator (after all, my last script
language had such an operator...).

'^.' still seems like a better option IMO, since it resembles '^', but is a
different operator...
(I can just make it have a very different precedence than '^').

ok, this drops the precedence-similarity idea (if the new operators have
different precedences than the old ones they resemble).

'&.', '|.', and '^.' might be made tightly binding (slightly more tightly
than '*' and '/').
'*.' and '/.' will be the same as '*' and '/'.
'+.' and '-.' will be the same as '+' and '-'.

'*.' could thus be an alternative for dot product, and maybe an additional
multiply form (in some other cases).
'/.' could be used for a 'reverse divide' for types with non-commutative
multiplication and division (such as quaternions, which currently use a
builtin function for this). potentially, it could also serve as a shorthand
for dividing ints and getting a float (aka: cast-free).
or, all this could be misguided, who knows...

Or you could use a keyword as an operator symbol (``sizeof'' is a
precedent for this).

possible, but I don't really like this approach personally...

--
Keith Thompson (The_Other_Keith) ks***@mib.org
<http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*>
<http://users.sdsc.edu/~kst>
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"

Oct 29 '07 #34

"cr88192" <cr*****@nospam.hotmail.com> wrote in message news:
3e***************************@saipan.com...

"Keith Thompson" <ks***@mib.org> wrote in message
news:ln************@nuthaus.mib.org...
>"cr88192" <cr*****@nospam.hotmail.com> writes:
>>"Charlie Gordon" <ne**@chqrlie.org> wrote in message
news:47***********************@news.free.fr...
[...]
>>>for exponentiation, I would suggest you do use ** as in Fortran. It
is not ambiguous because a ** b has no meaning for b scalar or
struct type. You would need to bend the grammar a bit, but at least
the precedence would be much better suited than that of ^

I had a '**' operator at one time before, but it kept clashing with
pointers handling. as a result, if it existed, it would have to be
parsed specially, and have rules to avoid accidentally mis-parsing a
pointer operation:
"i**s++", where the intention was actually "i*(*s++)". provably
disambiguating this would require more info than the parser has
available (the only real option would be, for example, requiring
spaces...).

'a^.b' could be another option (where '^.', is given a high rather
than a low precedence).
[...]

"**" can be unambiguous if you allow parsing to be affected by
semantic analysis. But typically the source is tokenized before it's
parsed and analyzed (even though all these things theoretically happen
in translation phase 7). If you see ``x**y'', you can't tell whether
it's ``x ** y'' (an exponentiation) or ``x * *y'' without knowing the
type of y.

knowing the type of y is the problem, though theoretically it could be
handled by parse-tree tweaking, if I had the parse tree at the same point
I was doing type handling (in my compiler, I do not, since these are
handled as different stages).

a later frontend may also make type info available in more of the upper
compiler, such that such inferences can be made...

Probably your best option.

>I can imagine ways to tweak the grammar, perhaps requiring white space
between a binary "*" operator and a unary "*" operator, but I wouldn't
recommend it; the result would be incompatible with C. And any such
solution would be complicated to define and to implement, and
therefore complicated to use in at least some cases.

yeah, I considered, but did not accept these ideas...
it matters to me that compatibility not be broken.

Wise choice.

>You might consider ^^ as an exponentiation operator. It's not likely
that a future C standard will introduce a short-circuit xor operator.

however, I may at some point add such an operator (after all, my last
script language had such an operator...).

short circuit xor does not get much usage IMHO.

'^.' still seems like a better option IMO, since it resembles '^', but is
a different operator...
(I can just make it have a very different precedence than '^').

ok, this drops the precedence-similarity idea (if the new operators have
different precedences than the old ones they resemble).

'&.', '|.', and '^.' might be made tightly binding (slightly more tightly
than '*' and '/').
'*.' and '/.' will be the same as '*' and '/'.
'+.' and '-.' will be the same as '+' and '-'.

'*.' could thus be an alternative for dot product, and maybe an additional
multiply form (in some other cases).
'/.' could be used for a 'reverse divide' for types with non-commutative
multiplication and division (such as quaternions, which currently use a
builtin function for this). potentially, it could also serve as a
shorthand for dividing ints and getting a float (aka: cast-free).

No, these tokens are really problematic. I pointed at ``1.^2'', which would
become ambiguous if you attach semantics to ^ for floating point values (as
you may have), as it is unequivocally parsed as 1. ^ 2; at least the .^ and
more generally the . prefixed arithmetic operators you are considering would
not cause incompatibilities with current C syntax, just parsing surprises for
programmers trying to use your extensions. Adding tokens with a trailing .
does create incompatibilities with the current C syntax, as it would cause
legitimate expressions to be parsed differently. Consider these:

.5==x^.5==y // x equals 0.5 or y equals 0.5 but not both.
.8<x&.9>x
1.1==x|.9==y
x/.1
y*.2
z-.2
...

I always put around binary operators but a lot of programmers don't.
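
A quick sketch confirming that the arithmetic expressions listed above are ordinary standard C today (the function names are made up for illustration). Each one uses a floating constant that starts with a dot, so a new '/.', '*.' or '-.' token would change how they are read.

```c
#include <assert.h>

/* Each body is one of the unspaced expressions from the list above. */
double div_tenth(double x)   { return x/.1; }   /* x / 0.1 */
double times_fifth(double y) { return y*.2; }   /* y * 0.2 */
double minus_fifth(double z) { return z-.2; }   /* z - 0.2 */

/* Small self-check; ranges are used because 0.1 and 0.2 are not exact. */
void check_dot_constants(void)
{
    assert(div_tenth(0.5) > 4.9 && div_tenth(0.5) < 5.1);
    assert(times_fifth(10.0) > 1.9 && times_fifth(10.0) < 2.1);
    assert(minus_fifth(1.0) > 0.79 && minus_fifth(1.0) < 0.81);
}
```

The comparison examples from the list are left out only because many compilers warn about mixing == with ^, & or | without parentheses.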

--
Chqrlie.

Oct 29 '07 #35

"Charlie Gordon" <ne**@chqrlie.org> wrote in message news:
47**********************@news.free.fr...

"cr88192" <cr*****@nospam.hotmail.com> wrote in message news:
3e***************************@saipan.com...

"Keith Thompson" <ks***@mib.org> wrote in message
news:ln************@nuthaus.mib.org...
>>"cr88192" <cr*****@nospam.hotmail.com> writes:
"Charlie Gordon" <ne**@chqrlie.org> wrote in message
news:47***********************@news.free.fr...
[...]
for exponentiation, I would suggest you do use ** as in Fortran. It
is not ambiguous because a ** b has no meaning for b scalar or
struct type. You would need to bend the grammar a bit, but at least
the precedence would be much better suited than that of ^

I had a '**' operator at one time before, but it kept clashing with
pointers handling. as a result, if it existed, it would have to be
parsed specially, and have rules to avoid accidentally mis-parsing a
pointer operation:
"i**s++", where the intention was actually "i*(*s++)". provably
disambiguating this would require more info than the parser has
available (the only real option would be, for example, requiring
spaces...).

'a^.b' could be another option (where '^.', is given a high rather
than a low precedence).
[...]

"**" can be unambiguous if you allow parsing to be affected by
semantic analysis. But typically the source is tokenized before it's
parsed and analyzed (even though all these things theoretically happen
in translation phase 7). If you see ``x**y'', you can't tell whether
it's ``x ** y'' (an exponentiation) or ``x * *y'' without knowing the
type of y.

knowing the type of y is the problem, though theoretically it could be
handled by parse-tree tweaking, if I had the parse tree at the same point
I was doing type handling (in my compiler, I do not, since these are
handled as different stages).

a later frontend may also make type info available in more of the upper
compiler, such that such inferences can be made...

Probably your best option.

>>I can imagine ways to tweak the grammar, perhaps requiring white space
between a binary "*" operator and a unary "*" operator, but I wouldn't
recommend it; the result would be incompatible with C. And any such
solution would be complicated to define and to implement, and
therefore complicated to use in at least some cases.

yeah, I considered, but did not accept these ideas...
it matters to me that compatibility not be broken.

Wise choice.

>>You might consider ^^ as an exponentiation operator. It's not likely
that a future C standard will introduce a short-circuit xor operator.

however, I may at some point add such an operator (after all, my last
script language had such an operator...).

short circuit xor does not get much usage IMHO.

>'^.' still seems like a better option IMO, since it resembles '^', but is
a different operator...
(I can just make it have a very different precedence than '^').

ok, this drops the precedence-similarity idea (if the new operators have
different precedences than the old ones they resemble).

'&.', '|.', and '^.' might be made tightly binding (slightly more tightly
than '*' and '/').
'*.' and '/.' will be the same as '*' and '/'.
'+.' and '-.' will be the same as '+' and '-'.

'*.' could thus be an alternative for dot product, and maybe an
additional multiply form (in some other cases).
'/.' could be used for a 'reverse divide' for types with non-commutative
multiplication and division (such as quaternions, which currently use a
builtin function for this). potentially, it could also serve as a
shorthand for dividing ints and getting a float (aka: cast-free).

No, these tokens are really problematic. I pointed at ``1.^2'', which would
become ambiguous if you attach semantics to ^ for floating point values
(as you may have), as it is unequivocally parsed as 1. ^ 2; at least the
.^ and more generally the . prefixed arithmetic operators you are considering
would not cause incompatibilities with current C syntax, just parsing
surprises for programmers trying to use your extensions. Adding tokens
with a trailing . does create incompatibilities with the current C syntax,
as it would cause legitimate expressions to be parsed differently. Consider
these:

.5==x^.5==y // x equals 0.5 or y equals 0.5 but not both.
.8<x&.9>x
1.1==x|.9==y
x/.1
y*.2
z-.2
...

I always put around binary operators but a lot of programmers don't.

I always put *spaces* around binary operators but a lot of programmers
don't.

--
Chqrlie.

Oct 29 '07 #36

Ben Bacarisse

"Charlie Gordon" <ne**@chqrlie.org> writes:
<snip>

>I always put around binary operators but a lot of programmers don't.

I always put *spaces* around binary operators but a lot of programmers
don't.

I found your original version delightfully self-referential -- to the
point where if you had written "I put around operators but many put."
I would have thought it deliberate!

--
Ben.

Oct 29 '07 #37

Kenneth Brody

Eric Sosman wrote:
[...]

gets worse: Some of the best compressors encode their output
in fractions of bits; if a file length of 10007 bits is bad,
a length of 10006.59029663+ bits is *really* bad!

[...]

I've never heard of that. What's a fraction of a bit? Is it
something that needs to hold less than two states?

What would you call those states? "True" and "Fal"? :-)

--
+-------------------------+--------------------+-----------------------+
| Kenneth J. Brody        | www.hvcomputer.com | #include              |
| kenbrody/at\spamcop.net | www.fptech.com     |    <std_disclaimer.h> |
+-------------------------+--------------------+-----------------------+
Don't e-mail me at: <mailto:Th*************@gmail.com>

Oct 29 '07 #38

Walter Roberson

In article <47***************@spamcop.net>,
Kenneth Brody <ke******@spamcop.net> wrote:

>Eric Sosman wrote:

>gets worse: Some of the best compressors encode their output
in fractions of bits; if a file length of 10007 bits is bad,
a length of 10006.59029663+ bits is *really* bad!

>I've never heard of that. What's a fraction of a bit? Is it
something that needs to hold less than two states?

[OT]

See "Arithmetic encoding".
http://en.wikipedia.org/wiki/Arithmetic_coding
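
A simpler way to see a symbol costing a fraction of a bit, without going as far as arithmetic coding: a three-valued symbol needs about 1.585 bits, and five of them fit in one 8-bit byte because 3^5 = 243 <= 256. The sketch below is plain base-3 packing, not arithmetic coding itself, and the function names are made up for illustration.

```c
#include <assert.h>

/* Pack five base-3 digits (each 0, 1 or 2) into one byte.
   3^5 = 243 <= 256, so each digit costs 8/5 = 1.6 bits instead of 2. */
unsigned char pack_trits(const unsigned char t[5])
{
    unsigned v = 0;
    for (int i = 0; i < 5; i++)
        v = v * 3 + t[i];
    return (unsigned char)v;
}

/* Inverse of pack_trits(): recover the five digits from the byte. */
void unpack_trits(unsigned char b, unsigned char t[5])
{
    unsigned v = b;
    for (int i = 4; i >= 0; i--) {
        t[i] = (unsigned char)(v % 3);
        v /= 3;
    }
}

/* Small self-check; call it from main() to try the example. */
void check_trits(void)
{
    const unsigned char in[5] = { 2, 0, 1, 2, 1 };
    unsigned char out[5];
    unpack_trits(pack_trits(in), out);
    for (int i = 0; i < 5; i++)
        assert(out[i] == in[i]);
}
```

Arithmetic coding generalizes the same idea to symbols with unequal probabilities, which is where non-integer totals such as the one quoted above come from.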
--
"Beware of bugs in the above code; I have only proved it correct,
not tried it." -- Donald Knuth

Oct 29 '07 #39

"Charlie Gordon" <ne**@chqrlie.org> writes:
[...]

Of course, I'm not proposing ** to be a token, but x * *y to be
reinterpreted as fexp(x, y) if y is a numeric type. This trick can be
played on the parse tree if you have one, at code generation time, or on the
fly if you generate code directly. The programmer would be more inclined to
write x ** y or x**y, but it is parsed as x * *y. This trick would be more
difficult to play in an interpreter with dynamic typing, but still possible,
by sticking the appropriate behaviour to fexp(x, y) for y pointer type.

[...]

My gut reaction to this idea is: Ick.

If I were designing a new language with a "**" operator, I'd just make
"**" a token. If "*" is also a unary operator, then "x * *y" would
require a space. The kind of special-case treatment you suggest is,
in my opinion, just too convoluted.

I like the way tokenization and analysis are separated in C. It makes
the language easier to implement and, more importantly, easier to
describe. A more complicated definition might allow "x+++++y" to be
legal, but at the cost of creating odd corner cases that couldn't be
resolved without detailed analysis of the standard.
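
To spell out that example (sum_post_pre and check_sum_post_pre are made-up names): the tokenizer always takes the longest possible token, so "x+++++y" becomes x ++ ++ + y and is rejected, while the spaced form is fine.

```c
#include <assert.h>

int sum_post_pre(int x, int y)
{
    /* x+++++y would be tokenized as  x ++ ++ + y  and rejected, because
       the result of x++ is not an lvalue. With spaces the intended
       expression is valid: (x++) + (++y). */
    return x++ + ++y;
}

/* Small self-check; call it from main() to try the example. */
void check_sum_post_pre(void)
{
    assert(sum_post_pre(1, 2) == 4);   /* 1 + 3 */
}
```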

And if you're adding extensions to the language, it's not unlikely
that you'd eventually want to add operator overloading. How do you
overload "**" if it's a composite of "*" and "*", and how do you
interpret "x**y" if either interpretation could be correct?

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"

Oct 29 '07 #40

Harald van Dĳk

"cr88192" <cr*****@nospam.hotmail.com> writes:

"Keith Thompson" <ks***@mib.org> wrote in message
news:ln************@nuthaus.mib.org...

[...]

>You might consider ^^ as an exponentiation operator. It's not likely
that a future C standard will introduce a short-circuit xor operator.

however, I may at some point add such an operator (after all, my last
script language had such an operator...).

Um, a short-circuit xor operator is logically impossible; you have to
know the values of both operands to determine the result.

On the other hand, a logical xor might make sense (it would yield 0 or
1 rather than the bitwise result), and "^^" would be a sensible symbol
for it.
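
In current C a logical xor can already be written with existing operators. A minimal sketch (logical_xor and check_logical_xor are made-up names): both operands are normalized to 0 or 1 with ! and then compared, and both are always evaluated.

```c
#include <assert.h>

/* Yields 1 if exactly one of a and b is nonzero, otherwise 0.
   Unlike && and ||, there is nothing to short-circuit. */
int logical_xor(int a, int b)
{
    return !a != !b;
}

/* Small self-check; call it from main() to try the example. */
void check_logical_xor(void)
{
    assert(logical_xor(0, 0) == 0);
    assert(logical_xor(5, 0) == 1);
    assert(logical_xor(0, -3) == 1);
    assert(logical_xor(2, 4) == 0);   /* bitwise 2 ^ 4 would be 6 */
}
```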

Objective-C, if I recall correctly, uses the "@" character for all its
extensions to C, which avoids any incompatibilities. You might
consider a similar approach. For example, you might make "@**" a
token and use it as an exponentiation operator.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"

Oct 29 '07 #41

On Mon, 29 Oct 2007 15:06:56 +1000, cr88192 wrote:

I had a '**' operator at one time before, but it kept clashing with
pointers handling. as a result, if it existed, it would have to be
parsed specially, and have rules to avoid accidentally mis-parsing a
pointer operation: "i**s++", where the intention was actually
"i*(*s++)". provably disambiguating this would require more info than
the parser has available (the only real option would be, for example,
requiring spaces...).

Wouldn't you be able to define the ** operator as either accepting two
arithmetic types, or accepting an arithmetic left operand and a pointer
right operand? This would mean you could write a * *p to mean a * (*p),
but not to mean pow(a, p), and that you could write a**b to mean either
a * (*b) or pow(a, b). Additionally, it would mean you don't have parsing
problems: a**b is always two operands for one operator, regardless of
whether this operator performs one operation or two.
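
A toy sketch of that idea, under stated assumptions: a real compiler would pick the meaning from the static type of the right operand, whereas here a runtime tag stands in for the type, and struct operand, ipow and eval_starstar are all hypothetical names.

```c
#include <assert.h>

/* Hypothetical operand record: the tag stands in for the static type. */
struct operand {
    int is_pointer;     /* nonzero if the right operand is a pointer */
    int value;          /* used when is_pointer == 0 */
    const int *ptr;     /* used when is_pointer != 0 */
};

/* Integer power by repeated multiplication; assumes n >= 0. */
static int ipow(int base, int n)
{
    int r = 1;
    while (n-- > 0)
        r *= base;
    return r;
}

/* One "**" operator, two meanings chosen by the right operand's type. */
int eval_starstar(int a, struct operand b)
{
    if (b.is_pointer)
        return a * *b.ptr;      /* a**p  behaves like  a * (*p) */
    return ipow(a, b.value);    /* a**n  behaves like  pow(a, n) */
}

/* Small self-check; call it from main() to try the example. */
void check_starstar(void)
{
    int seven = 7;
    struct operand p = { 1, 0, &seven };
    struct operand n = { 0, 4, 0 };
    assert(eval_starstar(3, p) == 21);
    assert(eval_starstar(3, n) == 81);
}
```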

Oct 29 '07 #42

cr88192

"Charlie Gordon" <ne**@chqrlie.org> wrote in message
news:47**********************@news.free.fr...

"cr88192" <cr*****@nospam.hotmail.com> wrote in message news:
3e***************************@saipan.com...

"Keith Thompson" <ks***@mib.org> wrote in message
news:ln************@nuthaus.mib.org...
>>"cr88192" <cr*****@nospam.hotmail.com> writes:
"Charlie Gordon" <ne**@chqrlie.org> wrote in message
news:47***********************@news.free.fr...
[...]
for exponentiation, I would suggest you do use ** as in Fortran. It
is not ambiguous because a ** b has no meaning for b scalar or
struct type. You would need to bend the grammar a bit, but at least
the precedence would be much better suited than that of ^

I had a '**' operator at one time before, but it kept clashing with
pointers handling. as a result, if it existed, it would have to be
parsed specially, and have rules to avoid accidentally mis-parsing a
pointer operation:
"i**s++", where the intention was actually "i*(*s++)". provably
disambiguating this would require more info than the parser has
available (the only real option would be, for example, requiring
spaces...).

'a^.b' could be another option (where '^.', is given a high rather
than a low precedence).
[...]

"**" can be unambiguous if you allow parsing to be affected by
semantic analysis. But typically the source is tokenized before it's
parsed and analyzed (even though all these things theoretically happen
in translation phase 7). If you see ``x**y'', you can't tell whether
it's ``x ** y'' (an exponentiation) or ``x * *y'' without knowing the
type of y.

knowing the type of y is the problem, though theoretically it could be
handled by parse-tree tweaking, if I had the parse tree at the same point
I was doing type handling (in my compiler, I do not, since these are
handled as different stages).

a later frontend may also make type info available in more of the upper
compiler, such that such inferences can be made...

Probably your best option.

>>I can imagine ways to tweak the grammar, perhaps requiring white space
between a binary "*" operator and a unary "*" operator, but I wouldn't
recommend it; the result would be incompatible with C. And any such
solution would be complicated to define and to implement, and
therefore complicated to use in at least some cases.

yeah, I considered, but did not accept these ideas...
it matters to me that compatibility not be broken.

Wise choice.

>>You might consider ^^ as an exponentiation operator. It's not likely
that a future C standard will introduce a short-circuit xor operator.

however, I may at some point add such an operator (after all, my last
script language had such an operator...).

short circuit xor does not get much usage IMHO.

has more use, but more as a 'logical xor', since it is not possible to
short-circuit it as is possible with '&&' or '||'.

>'^.' still seems like a better option IMO, since it resembles '^', but is
a different operator...
(I can just make it have a very different precedence than '^').

ok, this drops the precedence-similarity idea (if the new operators have
different precedences than the old ones they resemble).

'&.', '|.', and '^.' might be made tightly binding (slightly more tightly
than '*' and '/').
'*.' and '/.' will be the same as '*' and '/'.
'+.' and '-.' will be the same as '+' and '-'.

'*.' could thus be an alternative for dot product, and maybe an
additional multiply form (in some other cases).
'/.' could be used for a 'reverse divide' for types with non-commutative
multiplication and division (such as quaternions, which currently use a
builtin function for this). potentially, it could also serve as a
shorthand for dividing ints and getting a float (aka: cast-free).

No, these tokens are really problematic. I pointed at ``1.^2'', which would
become ambiguous if you attach semantics to ^ for floating point values
(as you may have), as it is unequivocally parsed as 1. ^ 2; at least the
.^ and more generally the . prefixed arithmetic operators you are considering
would not cause incompatibilities with current C syntax, just parsing
surprises for programmers trying to use your extensions. Adding tokens
with a trailing . does create incompatibilities with the current C syntax,
as it would cause legitimate expressions to be parsed differently. Consider
these:

.5==x^.5==y // x equals 0.5 or y equals 0.5 but not both.
.8<x&.9>x
1.1==x|.9==y
x/.1
y*.2
z-.2
...

I always put around binary operators but a lot of programmers don't.

odd, I always thought the preceding decimal digits were required.
at least in my parser, the number will not be recognized as a number, unless
it starts with a decimal digit, say, '0'...

--
Chqrlie.

Oct 29 '07 #43

On Mon, 29 Oct 2007 15:06:56 +1000, cr88192 wrote:
>I had a '**' operator at one time before, but it kept clashing with
pointers handling. as a result, if it existed, it would have to be
parsed specially, and have rules to avoid accidentally mis-parsing a
pointer operation: "i**s++", where the intention was actually
"i*(*s++)". provably disambiguating this would require more info than
the parser has available (the only real option would be, for example,
requiring spaces...).

Wouldn't you be able to define the ** operator as either accepting two
arithmetic types, or accepting an arithmetic left operand and a pointer
right operand? This would mean you could write a * *p to mean a * (*p),
but not to mean pow(a, p), and that you could write a**b to mean either
a * (*b) or pow(a, b). Additionally, it would mean you don't have parsing
problems: a**b is always two operands for one operator, regardless of
whether this operator performs one operation or two.

Interesting idea, but it would change operator precedence in ways that
I don't want to think about.
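
For reference, this is how current C already reads the expression from the
quoted post. It is only a small sketch, and scale_next is a made-up name:
since there is no '**' token, the lexer produces i, *, *, s, ++ and the
parser groups that as i * (*s++).

```c
/* Multiply i by the int that *cursor points at, then advance the cursor.
   The middle line is written without spaces on purpose. */
int scale_next(int i, const int **cursor)
{
    const int *s = *cursor;
    int product = i**s++;   /* i * (*s++): binary '*', unary '*', postfix '++' */
    *cursor = s;
    return product;
}
```

Under the proposal above the token sequence would stay the same; only the
operand types would decide whether 'a**b' multiplies through a pointer or
raises to a power.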

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <* <http://users.sdsc.edu/~kst>
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"

Oct 29 '07 #44

"cr88192" <cr*****@nospam.hotmail.com> writes:

"Charlie Gordon" <ne**@chqrlie.org> wrote in message
news:47**********************@news.free.fr...

[...]

>.5==x^.5==y // x equals 0.5 or y equals 0.5 but not both.
.8<x&.9>x
1.1==x|.9==y
x/.1
y*.2
z-.2
...

I always put spaces around binary operators but a lot of programmers don't.

odd, I always thought the preceding decimal digits were required.
at least in my parser, the number will not be recognized as a number,
unless it starts with a decimal digit, say, '0'...

Take a look at the syntax for a floating-constant, C99 6.4.4.2.

Just as a matter of style, I never use a leading or trailing decimal
point (I at least prepend or append a 0), but it's permitted.
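
A short illustration of what that grammar allows (the variable names are
just for this example). A fractional-constant is either an optional digit
sequence, a '.', and a digit sequence, or a digit sequence followed by a
bare '.', so all of these are valid:

```c
/* Valid floating constants per the fractional-constant grammar. */
const double leading_dot   = .5;    /* no digits before the point: 0.5 */
const double trailing_dot  = 5.;    /* no digits after the point: 5.0  */
const double with_exponent = .5e1;  /* 0.5 * 10^1 = 5.0                */
const float  with_suffix   = 1.f;   /* float constant 1.0f             */
```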

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <* <http://users.sdsc.edu/~kst>
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"

Oct 29 '07 #45

CBFalconer

Kenneth Brody wrote:

Eric Sosman wrote:
[...]
>gets worse: Some of the best compressors encode their output
in fractions of bits; if a file length of 10007 bits is bad,
a length of 10006.59029663+ bits is *really* bad!
[...]

I've never heard of that. What's a fraction of a bit? Is it
something that needs to hold less than two states?

Look up arithmetic compression.
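
A fraction of a bit makes sense as an average cost per symbol. A real
arithmetic coder is too long to post here, so what follows is a much simpler
stand-in for the same idea, not arithmetic coding itself: a symbol with
three equally likely values carries log2(3), roughly 1.585 bits, and five
such symbols fit in one 8-bit byte because 3^5 = 243 <= 256. That works out
to 1.6 bits per symbol. The names pack5 and unpack5 are invented for this
sketch, and it assumes uint8_t is available.

```c
#include <stdint.h>

/* Pack five symbols, each 0, 1 or 2, into one byte as a base-3 number.
   The largest possible result is 22222 in base 3, which is 242. */
uint8_t pack5(const uint8_t sym[5])
{
    unsigned value = 0;
    for (int i = 0; i < 5; i++)
        value = value * 3 + sym[i];
    return (uint8_t)value;
}

/* Recover the five symbols, least significant base-3 digit first. */
void unpack5(uint8_t byte, uint8_t sym[5])
{
    for (int i = 4; i >= 0; i--) {
        sym[i] = byte % 3;
        byte /= 3;
    }
}
```

An arithmetic coder generalizes this to unequal probabilities, so a very
likely symbol can cost far less than one bit on average.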

--
Chuck F (cbfalconer at maineline dot net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net>


Oct 29 '07 #46

cr88192

"Keith Thompson" <ks***@mib.org> wrote in message
news:ln************@nuthaus.mib.org...

"cr88192" <cr*****@nospam.hotmail.com> writes:
>"Charlie Gordon" <ne**@chqrlie.org> wrote in message
news:47**********************@news.free.fr...
[...]

>>.5==x^.5==y // x equals 0.5 or y equals 0.5 but not both.
.8<x&.9>x
1.1==x|.9==y
x/.1
y*.2
z-.2
...

I always put spaces around binary operators but a lot of programmers don't.

odd, I always thought the preceding decimal digits were required.
at least in my parser, the number will not be recognized as a number,
unless it starts with a decimal digit, say, '0'...

Take a look at the syntax for a floating-constant, C99 6.4.4.2.

Just as a matter of style, I never use a leading or trailing decimal
point (I at least prepend or append a 0), but it's permitted.

ok, I missed that, having assumed the leading digits were required.

I am not sure if in-practice things are done this way, or not.
2 options here:
disallow floating point numbers lacking a numeric prefix ('.5' being
technically invalid);
adding a 'disambiguation rule', such that whitespace is required following
the dot if following the dot could ambiguously be confused as a
fractional-number.

'2^.3' would thus not be read as the new operator (it is parsed as
'2 ^ .3'); the new operator would have to be written:
'2^. 3'.
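
A rough sketch of how a lexer might apply the second option. This is purely
hypothetical: operator_length is an invented helper and the set of dotted
operators is assumed from the earlier posts. It is not taken from any real
compiler.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Return the length of the operator token at the start of s.
   Assumed rule: an operator character followed by '.' forms a dotted
   operator only if the '.' is NOT followed by a digit; otherwise the '.'
   belongs to a floating constant such as .3 and the operator is plain. */
size_t operator_length(const char *s)
{
    if (s[0] == '\0' || strchr("^&|*/+-", s[0]) == NULL)
        return 0;                      /* not one of the operators handled */
    if (s[1] == '.' && !isdigit((unsigned char)s[2]))
        return 2;                      /* dotted operator, e.g. "^." */
    return 1;                          /* plain operator, e.g. "^" */
}
```

With this rule, "^.3" yields a one-character token (so '2^.3' stays
'2 ^ .3'), while "^. 3" yields the two-character '^.' operator.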

'^:', is also possible, but IMO uglier, and has potential implications (the
above operator style at least has precedent in a few certain specific
functional languages...).

'^,' is also possible, since these operators are not allowed standalone or
in suffix position.

(just here looking for a rule I can generalize to create a number of
auxiliary operators is all).


Oct 30 '07 #47