How do I make my own custom C compiler?

OK, I think I am a little more knowledgeable about C and pointers, ughh.

And likewise, I want to fix C... not so much to make a C++, Java, C#,
or even D-like language.

So, if I wanted to make my "custom" C compiler that's different from
the current C99 or ANSI C, where would I start?

Thanks.


Jun 8 '06 #1
"smnoff" <rh******@hotmail.com> writes:
> OK, I think I am a little more knowledgeable about C and pointers, ughh.
> And likewise, I want to fix C... not so much to make a C++, Java, C#,
> or even D-like language.
> So, if I wanted to make my "custom" C compiler that's different from
> the current C99 or ANSI C, where would I start?


Speaking as someone who has never written a compiler, I'd suggest:
(1) The Red Dragon Book
(2) Introduction to Compiler Construction with UNIX, by Axel T. Schreiner
and H. George Friedman, Jr. They take you through the design and
implementation of a compiler for smallC. It was printed in 1985.
You might still be able to get a used copy.
--
Ignorantly,
Allan Adler <ar*@zurich.csail.mit.edu>
* Disclaimer: I am a guest and *not* a member of the MIT CSAIL. My actions and
* comments do not reflect in any way on MIT. Also, I am nowhere near Boston.
Jun 8 '06 #2
"smnoff" <rh******@hotmail.com> writes:
> OK, I think I am a little more knowledgeable about C and pointers, ughh.
>
> And likewise, I want to fix C... not so much to make a C++, Java, C#,
> or even D-like language.
>
> So, if I wanted to make my "custom" C compiler that's different from
> the current C99 or ANSI C, where would I start?


I'd start with an existing open-source compiler, such as gcc or lcc.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.
Jun 8 '06 #3

"smnoff" <rh******@hotmail.com> wrote in message
news:n7Nhg.5643$f76.4621@dukeread06...
> OK, I think I am a little more knowledgeable about C and pointers, ughh.
>
> And likewise, I want to fix C... not so much to make a C++, Java, C#,
> or even D-like language.
>
> So, if I wanted to make my "custom" C compiler that's different from
> the current C99 or ANSI C, where would I start?
>
> Thanks.

Hit my website

www.personal.leeds.ac.uk/~bgy1mm

and look at the MiniBasic section.

Writing a Basic interpreter is not trivial, but it is much easier than
writing a compiler.
Once you understand how to write an interpreter, you will have a good
foundation for moving on to a compiler.
Jun 8 '06 #4
Keith Thompson wrote:
> "smnoff" <rh******@hotmail.com> writes:
>> OK, I think I am a little more knowledgeable about C and pointers, ughh.
>>
>> And likewise, I want to fix C... not so much to make a C++, Java, C#,
>> or even D-like language.
>>
>> So, if I wanted to make my "custom" C compiler that's different from
>> the current C99 or ANSI C, where would I start?
>
> I'd start with an existing open-source compiler, such as gcc or lcc.

Isn't it a bit risky to start with such a behemoth?

--
one's freedom stops where others' begin

Giannis Papadopoulos
Computer and Communications Engineering dept. (CCED)
University of Thessaly
http://dop.freegr.net/
Jun 8 '06 #5
Giannis Papadopoulos wrote:
> Keith Thompson wrote:
>> "smnoff" <rh******@hotmail.com> writes:
>>> OK, I think I am a little more knowledgeable about C and pointers, ughh.
>>>
>>> And likewise, I want to fix C... not so much to make a C++, Java, C#,
>>> or even D-like language.
>>>
>>> So, if I wanted to make my "custom" C compiler that's different from
>>> the current C99 or ANSI C, where would I start?
>>
>> I'd start with an existing open-source compiler, such as gcc or lcc.
>
> Isn't it a bit risky to start with such a behemoth?


gcc is impossible to understand unless you spend at least
2-3 YEARS working on it full time.

There are at most 20 people in the world who can understand
that compiler, and by understanding I mean that they are
able to modify something in it, something basic like
the parser for instance.

I tried something much simpler: to fix a bug.

Under Windows, when a function was _stdcall, it would screw up
the floating-point stack.

I spent two weeks trying to fix it, learning how it works,
etc.

The first problem is knowing RTL (GCC's Register Transfer Language).
You have to completely understand RTL to understand the flow of things.

Second, the sheer size of the code base. There are 13-15 MB
of C source code to understand, and the code is mostly very sparsely
commented. Macros everywhere hide from you what is going on.

Accessing data structures is always done with macros, to ease
things when the structure layout changes, but this makes it very
hard for newcomers to understand what the hell those macros
are DOING...

Third, you have to find your way through a mess of #ifdefs that defies
the imagination. gcc runs on many machines, and "portability"
has been taken to ridiculous extremes (the assembler, for instance).

This means that the same macro can have several interpretations
depending on which combination of machine/OS you are running.

Fourth, as in any beast like this, you are bound to encounter
the horrible hacks that will kill you.

For instance, I am trying to understand the way gcc generates the
DWARF tables for C++ exception handling, and I spent several
days trying to understand why the assembler instructions:

.byte 0x4
.long 1

would produce a single byte 0x41 instead of a byte 0x4 and
a 32-bit integer 1.

At first, most gcc developers told me I was wrong and that it was impossible.
I learned then that most people on the mailing lists do not know what
they are talking about.

You have to find the guy who knows what he/she is talking about. It
took me a week to find him, and then he told me that the assembler,
when assembling the debug_frame section, does not follow what is written
in the assembly directives but "optimizes" it to save space.

Ahhhhhh.

I would never have found it; it just never crossed my mind...
Lesson learned: be prepared to find all possible hacks.

ATTENTION IMPORTANT STUFF
-------------------------

Gcc is a very good compiler. It is a compiler that generates code for
MANY machines, and is therefore very complex. Nowhere do I want to
imply with this message that it's "crap" or "a bad compiler". I just
want to tell people here that it is surely not something you
want to *start* with.

jacob
Jun 8 '06 #6
Giannis Papadopoulos <ip******@inf.uth.gr> writes:
> Keith Thompson wrote:
>> "smnoff" <rh******@hotmail.com> writes:
>>> So, if I wanted to make my "custom" C compiler that's different from
>>> the current C99 or ANSI C, where would I start?
>>
>> I'd start with an existing open-source compiler, such as gcc or lcc.
>
> Isn't it a bit risky to start with such a behemoth?


Why? Hacking simple features into GCC is not that difficult.
I've done it a couple of times and so have my officemates.
--
int main(void){char p[]="ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz.\
\n",*q="kl BIcNBFr.NKEzjwCIxNJC";int i=sizeof p/2;char *strchr();int putchar(\
);while(*q){i+=strchr(p,*q++)-p;if(i>=(int)sizeof p)i-=sizeof p-1;putchar(p[i]\
);}return 0;}
Jun 8 '06 #7
"Ben Pfaff" writes:
>> Isn't it a bit risky to start with such a behemoth?
>
> Why? Hacking simple features into GCC is not that difficult.
> I've done it a couple of times and so have my officemates.


So how is that PhD coming? Is it still in the works or did it already
happen?
Jun 8 '06 #8
smnoff wrote:
> OK, I think I am a little more knowledgeable about C and pointers, ughh.
>
> And likewise, I want to fix C... not so much to make a C++, Java, C#,
> or even D-like language.

By "fixing" C you create a language which can no longer be called C
(as standardised by ISO).

> So, if I wanted to make my "custom" C compiler that's different from
> the current C99 or ANSI C, where would I start?


lcc is said to be an easy compiler to customise and work with.
http://www.cs.princeton.edu/software/lcc/

You might also take a look at the following:
http://fabrice.bellard.free.fr/tcc/

In any case, starting with a monster like gcc is not easy, unless you
already happen to be familiar with its source.

Jun 8 '06 #9
"osmium" <r1********@comcast.net> writes:
>> Why? Hacking simple features into GCC is not that difficult.
>> I've done it a couple of times and so have my officemates.
>
> So how is that PhD coming? Is it still in the works or did it already
> happen?


Still in the works. ETA December 2006, but hard to say with
accuracy...
Jun 8 '06 #10
"smnoff" <rh******@hotmail.com> wrote:
> So, if I wanted to make my "custom" C compiler that's different from
> the current C99 or ANSI C, where would I start?


Others have given you good advice already. Here is a short bibliography
you may find useful; all these books take a practical approach, as
opposed to a theoretical one (the Dragon book).

Holub: "Compiler Design in C"
Wirth: "Compiler Construction" (Free on-line. Oberon subset)
Pemberton & Daniels: "Pascal Implementation: The P4 Compiler and
Interpreter" (Free on-line)
Hendrix: "The Small-C Handbook" (C subset)
Brinch Hansen: "Brinch Hansen on Pascal Compilers" (Pascal subset)
Crenshaw: "Let's Build a Compiler" (Free articles on-line. Basic(?) )
Appel: "Modern Compiler Implementation in C"
Wirth & Gutknecht: "Project Oberon - The Design of an Operating System
and Compiler" (Free on-line)

I agree that gcc is *not* a good choice for a beginning compiler
writer. I would recommend starting with Wirth's or Hansen's books.
They implement compilers for "toy" languages using recursive descent
parsers, so there is no need (at least at this stage) to learn about
additional parsing tools. LCC (a full C compiler) could follow.
Also try posting in comp.compilers.
Jun 8 '06 #11
Ben Pfaff wrote:
> Giannis Papadopoulos <ip******@inf.uth.gr> writes:
>> Keith Thompson wrote:
>>> "smnoff" <rh******@hotmail.com> writes:
>>>> So, if I wanted to make my "custom" C compiler that's different from
>>>> the current C99 or ANSI C, where would I start?
>>> I'd start with an existing open-source compiler, such as gcc or lcc.
>>
>> Isn't it a bit risky to start with such a behemoth?
>
> Why? Hacking simple features into GCC is not that difficult.
> I've done it a couple of times and so have my officemates.


Yes, but since this question is being asked, I'd expect that the OP does
not have the necessary experience to pursue such a quest.

--
one's freedom stops where others' begin

Giannis Papadopoulos
Computer and Communications Engineering dept. (CCED)
University of Thessaly
http://dop.freegr.net/
Jun 8 '06 #12
Giannis Papadopoulos <ip******@inf.uth.gr> writes:
> Ben Pfaff wrote:
>> Giannis Papadopoulos <ip******@inf.uth.gr> writes:
>>> Keith Thompson wrote:
>>>> "smnoff" <rh******@hotmail.com> writes:
>>>>> So, if I wanted to make my "custom" C compiler that's different from
>>>>> the current C99 or ANSI C, where would I start?
>>>> I'd start with an existing open-source compiler, such as gcc or lcc.
>>> Isn't it a bit risky to start with such a behemoth?
>>
>> Why? Hacking simple features into GCC is not that difficult.
>> I've done it a couple of times and so have my officemates.
>
> Yes, but since this question is being asked, I'd expect that the OP does
> not have the necessary experience to pursue such a quest.


I'll concede that hacking gcc is probably not a good starting point
for a beginner. (I've never really looked at the gcc sources.)

As someone else mentioned, lcc is said to be reasonably easy to hack
-- and it even has its own newsgroup.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.
Jun 8 '06 #13
Groovy hepcat smnoff was jivin' on Wed, 7 Jun 2006 22:49:37 -0500 in
comp.lang.c.
How do I make my own custom C compiler?'s a cool scene! Dig it!
> OK, I think I am a little more knowledgeable about C and pointers, ughh.
>
> And likewise, I want to fix C... not so much to make a C++, Java, C#,
> or even D-like language.
>
> So, if I wanted to make my "custom" C compiler that's different from
> the current C99 or ANSI C, where would I start?


This would probably be best asked in comp.compilers. But anyhow...

Writing a C compiler is no mean feat. It is quite a complex language.
My advice is to start with an easier language.

Others have mentioned the "Dragon Book", also known as Compilers:
Principles, Techniques & Tools by Aho, Sethi & Ullman. This is
generally considered *the* book on compiler design, but it is very dry
and technical. I'm currently reading it.

I highly recommend Compiler Construction by Wirth
(http://www.oberon.ethz.ch/books.html). It's an excellent work, and
quite hands-on. Wirth takes you through the construction of a compiler
for a subset of the Oberon language (similar to Pascal). I didn't
really feel fully confident about writing my own compiler until I read
this one. (Actually, it's an assembler I'm writing. I'll write
compilers for high-level languages later.)

Crenshaw's series of articles entitled Let's Build a Compiler (URL
unavailable at this time) is aimed squarely at the rank beginner, and
is intended to get you writing compilers quickly. Unfortunately, it has
its problems. For one thing, the series was never finished. For
another, it's rather haphazard, chopping and changing all over the
place and going over the same ground repeatedly, as if he were making
it all up as he went along. There is much useful information in it,
though. This series takes you through the process of building a
compiler for a subset of a language the author made up, called KISS.

--

Dig the even newer still, yet more improved, sig!

http://alphalink.com.au/~phaywood/
"Ain't I'm a dog?" - Ronny Self, Ain't I'm a Dog, written by G. Sherry & W. Walker.
I know it's not "technically correct" English; but since when was rock & roll "technically correct"?
Jun 11 '06 #14
smnoff (in n7Nhg.5643$f76.4621@dukeread06) said:

| OK, I think I am a little more knowledgeable about C and
| pointers, ughh.
|
| And likewise, I want to fix C... not so much to make a C++,
| Java, C#, or even D-like language.
|
| So, if I wanted to make my "custom" C compiler that's different
| from the current C99 or ANSI C, where would I start?

There are several ways to approach the problem: modify the source
of an existing C compiler, or start from scratch and write the whole
thing in the language of your choosing.

Either way you'll learn much more than you expect. Some time back I
approached a similar goal by creating an intermediate compiler (which
compiled PL/C, a superset of BNF), but by the time the PL/C compiler
was running cleanly, I'd lost interest in the original problem (mostly
because I'd learned enough that the original problem looked trivial).

Go for it. I predict that you won't arrive at the originally intended
destination - but you will have learned a lot getting wherever you do
arrive. :-)
--
Morris Dovey
DeSoto Solar
DeSoto, Iowa USA
http://www.iedu.com/DeSoto
Jun 11 '06 #15
jacob navia <ja***@jacob.remcomp.fr> writes:
> gcc is impossible to understand unless you spend at least 2-3 YEARS working
> on it full time. [...]
> The first problem is knowing RTL (GCC's Register Transfer Language).
> You have to completely understand RTL to understand the flow of things.

I've already pointed out that I am not qualified to give advice about
this, but I will give some anyway.

I spent some time about 20 years ago trying to read some of the
source code for GCC and to configure it for a hypothetical machine.
I was singularly unqualified to do that and am no less so now.
However, it was very educational and I would be glad to have an
excuse to do something like that again. I do remember some of the
things I learned. I thought RTL was a lot of fun since it was
conceptually simple and fairly self-contained. Where I got into
trouble was in filling in the machine description files. To the
extent that it just described hardware and big- vs. little-
endianness, it was no problem, but there are places where you
have to give exact details about the calling sequence the operating
system uses to load a program on the target machine. I didn't know
enough about operating systems to guess what the calling sequence
would be on the machine I was trying to imagine.

Even if you fail to understand the code for GCC, it probably won't
do you any harm to try. You might find yourself going back to the
source code again and again for guidance and inspiration as you learn
more about compilers in other ways.

> Second, the sheer size of the code base. There are 13-15 MB
> of C source code to understand, and the code is mostly very sparsely
> commented. Macros everywhere hide from you what is going on.

One way of getting around that problem is to download an old version
of GCC, before it was ported to so many machines and before it supported
so many languages.

> Accessing data structures is always done with macros, to ease
> things when the structure layout changes, but this makes it very
> hard for newcomers to understand what the hell those macros
> are DOING...

How about this: GCC is full of interesting data structures. You can
just take their definitions in isolation and try to figure out what
to do with them, even if their relevance to compilers is not immediately
apparent. Maybe the original code uses macros for greater efficiency,
but there are certain things you would always want to be able to do
with a given data structure, and you can just write them yourself as
functions. Once you have a set of functions that will create, modify,
or copy one of these data structures, or print one of them out in some
way, you can then try the macros out on them and see exactly what their
effects are, since you will know exactly what the data structure looks
like before you feed it to the macro.

In other words, as long as you are patient and don't mind studying the
code for its own sake, it seems to me that there are a lot of ways to
understand it. If you are in a hurry because you need to use the code
or modify it, or if you want to learn it quickly and then go write your
own, then the code appears as an obstacle and that might get in the way
of studying it. Just get what you can out of it and be glad that you got
that much.

> Third, you have to find your way through a mess of #ifdefs that defies
> the imagination. gcc runs on many machines, and "portability"
> has been taken to ridiculous extremes (the assembler, for instance).
> This means that the same macro can have several interpretations
> depending on which combination of machine/OS you are running.


I am not very good at GCC, but I vaguely recall that it has a lot of options
that let you print out the results of various stages of processing a program.
For example, you can tell GCC to give you RTL output. Maybe if you compile
GCC with GCC and look at the output at the right stage (e.g. after cpp gets
through with it), you can get rid of all the #ifdefs by compiling with all
the things defined that need to be defined. As Jacob Navia points out,
that may not give you the meaning of a given macro on all possible platforms,
but for starters I think one would be happy to know what it means on one
platform.
--
Ignorantly,
Allan Adler <ar*@zurich.csail.mit.edu>
* Disclaimer: I am a guest and *not* a member of the MIT CSAIL. My actions and
* comments do not reflect in any way on MIT. Also, I am nowhere near Boston.
Jun 11 '06 #16

Allan Adler wrote:
>> Third, you have to find your way through a mess of #ifdefs that defies
>> the imagination. gcc runs on many machines, and "portability"
>> has been taken to ridiculous extremes (the assembler, for instance).
>> This means that the same macro can have several interpretations
>> depending on which combination of machine/OS you are running.
>
> I am not very good at GCC, but I vaguely recall that it has a lot of options
> that let you print out the results of various stages of processing a program.
> For example, you can tell GCC to give you RTL output. Maybe if you compile
> GCC with GCC and look at the output at the right stage (e.g. after cpp gets
> through with it), you can get rid of all the #ifdefs by compiling with all
> the things defined that need to be defined. As Jacob Navia points out,
> that may not give you the meaning of a given macro on all possible platforms,
> but for starters I think one would be happy to know what it means on one
> platform.


You can get the output of the preprocessor using the -E option. But the
horrendous format will very likely make this output unreadable by a
human.

By the way, since no one has mentioned it, doesn't one need to be
fairly proficient in the assembly language of some processor before
writing a compiler?

Jun 11 '06 #17
sp****@gmail.com (in
11**********************@u72g2000cwu.googlegroups.com) said:

| By the way, since no one has mentioned it, doesn't one need to be
| fairly proficient in the assembly language of some processor before
| writing a compiler?

Only if the compiler is to output assembly code. :-)

[ Imagine a compiler that translated its source language into C, or
COBOL, or APL... ]

--
Morris Dovey
DeSoto Solar
DeSoto, Iowa USA
http://www.iedu.com/DeSoto
Jun 11 '06 #18
sp****@gmail.com wrote:
> By the way, since no one has mentioned it, doesn't one need to be
> fairly proficient in the assembly language of some processor before
> writing a compiler?

If that one needs a full-featured compiler, yes. But he might stop his
compiler just before the creation of assembly language.
Jun 12 '06 #19
