how to parse an executable in C and find out if there is any return(RET in assembly) or not

Hi,

I was wondering if we could parse or do something with an executable
(whose source language was C). How can I use some scripting language
like perl/python to find out information about the executable? Is
it possible?

Also, how does the compiler add inlining to the program? I know that
whenever it sees "inline" in front of the procedure name, it inlines it.
But if we give the -finline options, does it inline all the procedures? How
does it do that? Does it parse? Is there any good book or article
that I can refer to? I need to build a small compiler. Can I build it?

Thanks for reading all of my questions :)

-priyanka

Jun 1 '06 #1
"priyanka" <pr**********@gmail.com> writes:
I was wondering if we could parse or do something in the executable(
whose source language was C). How can I use some scripting language
like perl/python to find out the information about the executable ? Is
it possible ?
Executable files have different formats on different systems; the C
standard says nothing about any of them. There's no way to do what
you want to do in standard C.
Also, how does the compiler add inling to the program ? I know that
whenever it sees"inline" in front of the procedure name, it inlines it.
But if we give the -finline options, it inline all the procedures ? How
does it do that ? does it parse ? Is there any good book or article
that I can refer to ? I need to build a small compiler ? Can I build it
?


The "inline" keyword is new in C99; not all C compilers support it.
Even for those that do, it's not guaranteed to inline the function.
Quoting the standard:

Making a function an inline function suggests that calls to the
function be as fast as possible. The extent to which such
suggestions are effective is implementation-defined.

The "-finline" option is specific to some particular compiler; it's
not defined by the C standard.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.
Jun 1 '06 #2
In article <11*********************@y43g2000cwc.googlegroups.com>,
priyanka <pr**********@gmail.com> wrote:
I was wondering if we could parse or do something in the executable(
whose source language was C). How can I use some scripting language
like perl/python to find out the information about the executable ? Is
it possible ?
If you knew the exact format of the executable; formats of executables
are not specified by the C standard, and are subject to change
with different compiler options and different compiler patches and
different operating systems.

Could you explain why you want to look for return instructions in
the generated machine code? Everything in C is expressed in
terms of functions, and all functions must return. The only
exception is that if all execution paths in a routine provably
ended up at an exit() call, then the compiler could optimize out
the dead return; some compilers probably do actually bother,
but it is a sufficiently unusual case that surely you would
have phrased the question mentioning exit if you'd been thinking
of that situation...

Also, how does the compiler add inling to the program ? I know that
whenever it sees"inline" in front of the procedure name, it inlines it.
inline is never more than a hint.
But if we give the -finline options
then you are dealing in compiler options that lie outside of the
C standards, and are matters of implementation.

it inline all the procedures ? How
does it do that ? does it parse ? Is there any good book or article
that I can refer to ?
Although you aren't using SGI IRIX, you might find some useful
information about inlining at http://techpubs.sgi.com .
In particular, see SGI's manual page for "ipa" which briefly
describes the process.

There are probably also a number of good papers about inlining
available -- try google scholar .

I need to build a small compiler ?
Inlining... your own compiler... the first question.... I wonder
if what you are trying to figure out is not whether a routine
has *some* return instruction (which they almost all do), but
rather at which points in the logical flow that returns might
occur, so that you can try to inline it? If so, then you are
working at the wrong level: you should be working at the
intermediate representation level, after the parse tree is
generated but before code generation.
Can I build it ?


Small? And complete with inlining? Ummm, that's a non-trivial task
unless the language to be compiled is much much simpler than C,
and you aren't going to be trying to do extensive machine-language
level optimization.
--
"No one has the right to destroy another person's belief by
demanding empirical evidence." -- Ann Landers
Jun 1 '06 #3
Keith Thompson <ks***@mib.org> writes:
"priyanka" <pr**********@gmail.com> writes:
I was wondering if we could parse or do something in the executable(
whose source language was C). How can I use some scripting language
like perl/python to find out the information about the executable ? Is
it possible ?
Executable files have different formats on different systems; the C
standard says nothing about any of them. There's no way to do what
you want to do in standard C.

No way to parse an executable in standard C?
No way to find out if there is a return in generated code in standard C?

A little over zealous trying to protect the role of this newsgroup?

--
Chris.
Jun 1 '06 #4
In article <e5**********@enyo.uwa.edu.au>,
Chris McDonald <ch***@csse.uwa.edu.au> wrote:
Keith Thompson <ks***@mib.org> writes:
Executable files have different formats on different systems; the C
standard says nothing about any of them. There's no way to do what
you want to do in standard C.


No way to parse an executable in standard C?
No way to find out if there is a return in generated code in standard C?


If the file format and OS and language restrictions are such
that it is possible to place into execution a section marked as
data, then figuring out whether there is a machine return instruction
or not is equivalent to solving The Halting Problem. I believe it
is generally agreed that The Halting Problem is not solvable in
standard C.
--
Prototypes are supertypes of their clones. -- maplesoft
Jun 2 '06 #5
Chris McDonald <ch***@csse.uwa.edu.au> writes:
Keith Thompson <ks***@mib.org> writes:
"priyanka" <pr**********@gmail.com> writes:
I was wondering if we could parse or do something in the executable(
whose source language was C). How can I use some scripting language
like perl/python to find out the information about the executable ? Is
it possible ?

Executable files have different formats on different systems; the C
standard says nothing about any of them. There's no way to do what
you want to do in standard C.

No way to parse an executable in standard C?
No way to find out if there is a return in generated code in standard C?

A little over zealous trying to protect the role of this newsgroup?


To be precise, there's no *portable* way to do what the OP wants in
standard C (or, probably, in any other language).

If we had a complete definition of the executable file format, it
would probably be possible to write a standard C program that could
parse it and search for RET instructions (assuming the target
instruction set has a RET instruction at all).
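
As a rough illustration of such a search, here is a minimal Python sketch (Python because the original question asked about a scripting language). It assumes the target is x86, where the one-byte near return is encoded as 0xC3, and that `code` already holds the raw bytes of the code section; the name `find_byte` is made up for this example. Every hit is only a candidate, because the same byte value can also appear inside data or as an operand of another instruction.

```python
X86_RET_NEAR = 0xC3  # assumption: x86, one-byte near return


def find_byte(code: bytes, opcode: int = X86_RET_NEAR) -> list[int]:
    """Return every offset in `code` whose byte equals `opcode`."""
    return [offset for offset, value in enumerate(code) if value == opcode]


# push ebp; mov ebp, esp; pop ebp; ret  (32-bit x86 encoding)
blob = bytes([0x55, 0x89, 0xE5, 0x5D, 0xC3])
print(find_byte(blob))
```

Note that the bytes B8 C3 00 00 00 (mov eax, 0xC3) would also produce a hit at offset 1, even though no return instruction is present when the code is decoded from its start.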

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.
Jun 2 '06 #6
On 1 Jun 2006 15:41:43 -0700, "priyanka" <pr**********@gmail.com>
wrote:
Hi,

I was wondering if we could parse or do something in the executable(
whose source language was C). How can I use some scripting language
Only in a very system dependent way.
like perl/python to find out the information about the executable ? Is
it possible ?
Ask in newsgroups that deal with perl or python.

Also, how does the compiler add inling to the program ? I know that
I imagine it does it the same way it generates any other code but the
real answer is implementation dependent.
whenever it sees"inline" in front of the procedure name, it inlines it.
But if we give the -finline options, it inline all the procedures ? How
Options are compiler specific.
does it do that ? does it parse ? Is there any good book or article
Ask in a newsgroup that deals with your compiler.
that I can refer to ? I need to build a small compiler ? Can I build it
?


With enough experience, I would think so.
Remove del for email
Jun 2 '06 #7
priyanka said:
Hi,

I was wondering if we could parse or do something in the executable(
whose source language was C).
Call fopen to open the file. If that worked, do whatever it is that you want
to do with or to the file, and then call fclose when you're done.
Also, how does the compiler add inling to the program ?


That depends on the compiler.

--
Richard Heathfield
"Usenet is a strange place" - dmr 29/7/1999
http://www.cpax.org.uk
email: rjh at above domain (but drop the www, obviously)
Jun 2 '06 #8
>I was wondering if we could parse or do something in the executable(
whose source language was C). How can I use some scripting language
like perl/python to find out the information about the executable ?
Find out *WHAT* information about the executable?

C does not document any format for an executable; that's
machine-dependent. If you have an executable in a known format
(say, a CP/M executable for a Z80 processor), you can examine the
file by fopen()ing it in binary mode and reading it. If you write
your program portably then you can examine a CP/M Z80 executable on
any machine you choose to run the program on.
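
A minimal sketch of the "open it in binary mode and read it" step, written in Python since the question asked about a scripting language. It only guesses the container format from the leading magic bytes of a few well-known formats; the helper names `sniff_format` and `sniff_file` are invented for this example, and a headerless format would simply come back as "unknown".

```python
# Leading byte signatures of a few common executable formats.
MAGIC = [
    (b"\x7fELF", "ELF"),
    (b"MZ", "PE/DOS (MZ)"),
    (b"\xcf\xfa\xed\xfe", "Mach-O 64-bit (little-endian)"),
    (b"\xce\xfa\xed\xfe", "Mach-O 32-bit (little-endian)"),
]


def sniff_format(header: bytes) -> str:
    """Guess the executable format from the first bytes of the file."""
    for magic, name in MAGIC:
        if header.startswith(magic):
            return name
    return "unknown"


def sniff_file(path: str) -> str:
    """Read the first few bytes of `path` in binary mode and classify them."""
    with open(path, "rb") as handle:
        return sniff_format(handle.read(8))
```

Knowing the container format is only the first step; locating the code section requires parsing that format's headers.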

There is no guarantee that the machine you're running the program
on even HAS a "return instruction". However, for most machines
that have one and static-link executables, there will likely be
oodles of them in library code (used or not).

Determining things like the boundaries of machine instructions,
which bytes are machine instructions and which are data, and whether certain
code is reachable is likely to be equivalent to the halting problem.
Is
it possible ? Also, how does the compiler add inling to the program ?
Isn't that a little bit like "how does the manufacturer add wings
to a school bus to make an AirBus"? Inlining is not something added
after code generation.
I know that
whenever it sees"inline" in front of the procedure name, it inlines it.
I believe a more accurate statement is:

In C89, if inline is NOT specified, the compiler may inline the function.
In C89, if inline IS specified, the compiler may inline the function but
this is a syntax error, so it's unlikely to produce any code at all.
In C99, if inline is NOT specified, the compiler must not inline the function.
In C99, if inline IS specified, the compiler may inline the function.
A compiler may always choose the option of not inlining.
But if we give the -finline options, it inline all the procedures ? How
ANSI C does not specify compiler options, and that one violates the
requirements of ANSI C.

does it do that ? does it parse ?
A compiler parses source code. It does not parse executables, even
if that's an appropriate word to describe interpreting the content
of an executable.
Is there any good book or article
that I can refer to ? I need to build a small compiler ? Can I build it
?


Gordon L. Burditt
Jun 2 '06 #9
priyanka wrote:
Hi,

I was wondering if we could parse or do something in the executable(
whose source language was C). How can I use some scripting language
like perl/python to find out the information about the executable ? Is
it possible ?
It's possible, but not easy. You would probably have to disassemble the
executable to get the correct context for whatever machine code
represents 'RET'.

The language is irrelevant, the problem remains the same.
Also, how does the compiler add inling to the program ? I know that
whenever it sees"inline" in front of the procedure name, it inlines it.
inline is a hint, nothing more.
But if we give the -finline options, it inline all the procedures ? How
does it do that ? does it parse ? Is there any good book or article
that I can refer to ? I need to build a small compiler ? Can I build it
?

These questions are too tool specific to answer here.

--
Ian Collins.
Jun 2 '06 #10
go***********@burditt.org (Gordon Burditt) writes:

(Still snipping attribution lines. *Please* stop doing that.)

[...]
I believe a more accurate statement is:

In C89, if inline is NOT specified, the compiler may inline the function.
True.
In C89, if inline IS specified, the compiler may inline the function but
this is a syntax error, so it's unlikely to produce any code at all.
True, except that a C89 compiler is allowed to support inlining as an
extension. If it uses "inline" as a keyword, it has to have a way to
turn it off, since "inline" is a valid identifier in C89.
In C99, if inline is NOT specified, the compiler must not inline the
function.
False. As long as the inlined function behaves the same way as a
non-inlined function (including the ability to take its address),
inlining is a perfectly valid and common optimization.

If anything takes the address of the function, the compiler must
generate a callable body for it. If the compiler can prove that the
function's address is never taken (except implicitly in an ordinary
function call), it can inline all calls and not generate a body.
In C99, if inline IS specified, the compiler may inline the function.
True.
A compiler may always choose the option of not inlining.


True.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.
Jun 2 '06 #11
priyanka wrote:
Hi,

I was wondering if we could parse or do something in the executable(
whose source language was C). How can I use some scripting language
like perl/python to find out the information about the executable ? Is
it possible ?
Yes, however you need to understand the executable's format and machine
code. This varies from platform to platform and even from compiler to
compiler. Basically, you need to disassemble the executable. Post in a
more appropriate forum like alt.lang.asm.
Also, how does the compiler add inling to the program ? I know that
whenever it sees"inline" in front of the procedure name, it inlines it.
The keyword inline was added to C with the 1999 revision of the
standard. The standard merely states that access to functions qualified
as inline should be made as fast as possible. How it is actually done
is up to the implementation.

Generally though, I would assume that the machine code generated for
the function is made continuous, (i.e. embedded), with the surrounding
machine code. This results in the function's code being duplicated at
each place it's called, instead of having one copy of the function's
code and executing it via the CALL/RET mechanism.
But if we give the -finline options, it inline all the procedures ? How
does it do that ? does it parse ? Is there any good book or article
that I can refer to ? I need to build a small compiler ? Can I build it
?
Is this a college project? If so, it seems to be beyond your abilities.
Constructing a compiler, even a bare-bones one, is rather involved.
These links may get you started...

http://cs.wwc.edu/~aabyan/464/Book/
http://www.scifac.ru.ac.za/compilers/

Also post compiler related questions to comp.compilers.
Thanks for reading all of my questions :)


In the future, please remember to post only standard C related
questions to this group.

Jun 2 '06 #12

priyanka wrote:
Hi,

I was wondering if we could parse or do something in the executable(
whose source language was C).

use _open function and open the file in binary mode then use your logic
to parse the executable.

How can I use some scripting language
like perl/python to find out the information about the executable ? Is
it possible ?

python people will tell you.

Also, how does the compiler add inling to the program ? I know that
whenever it sees"inline" in front of the procedure name, it inlines it.

it is not necessary and depends on compiler.

But if we give the -finline options, it inline all the procedures ? How
does it do that ? does it parse ? Is there any good book or article
that I can refer to ? I need to build a small compiler ? Can I build it
?

Yes you can it depends on your desire and effort.
Thanks for reading all of my questions :)

-priyanka


Jun 2 '06 #13
Haider wrote:
priyanka wrote:
Hi,

I was wondering if we could parse or do something in the executable(
whose source language was C).

use _open function and open the file in binary mode then use your logic
to parse the executable.


There is no standard C function called _open(). fopen() in binary mode
will accomplish the task.

Jun 2 '06 #14

Walter Roberson wrote:
If the file format and OS and language restrictions are such
that it is possible to place into execution a section marked as
data, then figuring out whether there is a machine return instruction
or not is equivalent to solving The Halting Problem. I believe it
is generally agreed that The Halting Problem is not solvable in
standard C.


Hi Walter,

Could you elaborate on what is "The Halting Problem". I'm not able to
remember it, but I did hear of it some time back.

Thanks
Ritesh

Jun 2 '06 #15
Haider wrote:
priyanka wrote:
Hi,

I was wondering if we could parse or do something in the executable(
whose source language was C).
Almost always a bad idea. Optimisers can do very strange things to code
so working out things about the source is not going to be easy.
use _open function and open the file in binary mode then use your logic
to parse the executable.


There is no function named _open in standard C and many common
implementations do not have it. On those few where it does exist I don't
see any benefit for this problem over the standard fopen function.
How can I use some scripting language
like perl/python to find out the information about the executable ? Is
it possible ?

python people will tell you.


About perl as well? I think the OP will need to ask in groups for both
languages.
Also, how does the compiler add inling to the program ? I know that
whenever it sees"inline" in front of the procedure name, it inlines it.

it is not necessary and depends on compiler.
But if we give the -finline options, it inline all the procedures ? How
does it do that ? does it parse ? Is there any good book or article
that I can refer to ? I need to build a small compiler ? Can I build it
?

Yes you can it depends on your desire and effort.


Strange, -finline is rejected by my compiler. Asking in a group
dedicated to the compiler will get more helpful replies.
--
Flash Gordon, living in interesting times.
Web site - http://home.flash-gordon.me.uk/
comp.lang.c posting guidelines and intro:
http://clc-wiki.net/wiki/Intro_to_clc
Jun 2 '06 #16

santosh wrote:
Haider wrote:
priyanka wrote:
Hi,

I was wondering if we could parse or do something in the executable(
whose source language was C).

use _open function and open the file in binary mode then use your logic
to parse the executable.


There is no standard C function called _open(). fopen() in binary mode
will accomplish the task.

sorry for _open please use open it is standard one.

Jun 2 '06 #17
Haider wrote:

santosh wrote:
Haider wrote:
> priyanka wrote:
> > Hi,
> >
> > I was wondering if we could parse or do something in the executable(
> > whose source language was C).
> use _open function and open the file in binary mode then use your logic
> to parse the executable.


There is no standard C function called _open(). fopen() in binary mode
will accomplish the task.

sorry for _open please use open it is standard one.


`open` is not standard in C; `fopen` is.

--
Chris "seeker" Dollin
"We did not have time to find out everything we wanted to know." /A Clash of Cymbals/

Jun 2 '06 #18
ritesh wrote:
Could you elaborate on what is "The Halting Problem". I'm not able to
remember it, but I did hear of it some time back.


<ot>Google is your friend.</>

--
Chris "seeker" Dollin
"No-one here is exactly what he appears." G'kar, /Babylon 5/

Jun 2 '06 #19
"Haider" <hm*****@yahoo.com> wrote:
santosh wrote:
Haider wrote:
priyanka wrote:
> I was wondering if we could parse or do something in the executable(
> whose source language was C).
use _open function and open the file in binary mode then use your logic
to parse the executable.


There is no standard C function called _open(). fopen() in binary mode
will accomplish the task.

sorry for _open please use open it is standard one.


Before you make claims about a standard, you would be wise to know it.
There is no open() function in ISO C, nor is it in any way needed to
solve the OP's problem.

Richard
Jun 2 '06 #20
In article <11**********************@j55g2000cwa.googlegroups.com>,
ritesh <ri**********@gmail.com> wrote:
Walter Roberson wrote:
If the file format and OS and language restrictions are such
that it is possible to place into execution a section marked as
data, then figuring out whether there is a machine return instruction
or not is equivalent to solving The Halting Problem.
Could you elaborate on what is "The Halting Problem". I'm not able to
remember it, but I did hear of it some time back.


Suppose you have program source X available as input ("source" can
include machine language in this context), and you want to determine
whether that program will eventually terminate for all possible inputs
to X, or if X instead might infinite loop on -some- possible input.
Can you write a program H which will read the source X
and tell you whether X will always terminate (or to use another term,
whether X will always halt)?

If you are given X ahead of time, you might be able to write your
halting test program H to determine whether that one program X will
ever halt, and that same program H might work for some other programs
that are very similar to X. With a really clever program H and a big
enough computer, you might even be able to handle a lot of different
programs -- you might even be able to properly answer the question
for trillions of different possible input programs. It turns out,
though, that it is IMPOSSIBLE to write your halting test program H
to handle *all* possible input programs.
If you had an executable image and you knew the exact format of the
image, and if the format told you exactly where the instruction
sequence started and stopped, and if the machine/OS strictly disallowed
executing data, and if you could *reliably* tell what all the
instructions were (i.e., it was somehow impossible to branch into the
"middle" of an instruction), then you could do a mechanical search to
determine whether there was a RETURN instruction somewhere in the
executable stream... but figuring out whether some RETURN will always
be -reached- is *much* harder.

If the machine/OS does allow data to be executed, then again the
problem becomes hard, because you would need to determine whether the
program ever -writes- a RETURN instruction into a location that it will
definitely execute.

If the machine allows you to branch into the middle of an instruction
byte sequence and the machine allows branch addresses to be calculated
then the problem is difficult again, because you would have to figure
out whether a byte sequence that happens to have the same value as
the RETURN instruction might ever be reached by an unusual branch.
For example, if the RETURN instruction is 83 and if "left shift
address register #8 by 3 bits" happens to be 27 83, then the 83
is just "random chance" if you execute in a straight forward manner,
but if there happens to be a branch to the byte past the 27 then
suddenly there's your RETURN instruction. (And it's not as simple
as that, even -- the branch might be to an unexpected boundary somewhere
hundreds of bytes before and all of the reinterpreted instructions
just happen to come out such that the 27 is part of a previous
instruction in the new execution, eventually leading to the 83 RETURN...)
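
The effect of the starting offset can be shown with a toy decoder. This is only a sketch in Python: the instruction set is hypothetical, borrowing the opcode values from the example above (read here as hexadecimal), with an extra one-byte NOP added for padding. The same bytes decode to different instruction streams depending on where decoding begins.

```python
# Hypothetical toy instruction set (not a real CPU):
#   0x27 nn -> SHIFT, one operand byte follows
#   0x83    -> RETURN
#   0x10    -> NOP
LENGTHS = {0x27: 2, 0x83: 1, 0x10: 1}
NAMES = {0x27: "SHIFT", 0x83: "RETURN", 0x10: "NOP"}


def decode_from(code: bytes, start: int) -> list[tuple[int, str]]:
    """Decode instructions one after another, beginning at offset `start`."""
    decoded = []
    pc = start
    while pc < len(code):
        opcode = code[pc]
        if opcode not in LENGTHS:
            decoded.append((pc, "???"))  # unknown byte: skip it
            pc += 1
            continue
        decoded.append((pc, NAMES[opcode]))
        pc += LENGTHS[opcode]  # skip over any operand bytes
    return decoded


code = bytes([0x10, 0x27, 0x83, 0x10])
print(decode_from(code, 0))  # the 0x83 is consumed as SHIFT's operand
print(decode_from(code, 2))  # a branch to offset 2 lands on RETURN
```

Decoding from offset 0 never reports a RETURN, while decoding from offset 2 does, which is why a plain byte search cannot settle the question on such a machine.
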
To expound a bit further: your original question was wide open
and did not place any restrictions as to specific executable format,
machine language, amount of machine memory, available hard disk space,
and so on. That made the question equivalent to asking, "Is it
always possible to do this?", and as I discussed briefly above, the
answer to "always possible?" is NO, even though you might be able to
answer properly for trillions of different executables.

If you pin down to one *particular* set of machine specifications, and
place a fixed pre-determined limit on the amount of memory available to
the program that is to be tested, then the answer reverses itself and
becomes YES: for any pre-specified fixed-resource deterministic
computer system, you *can* always answer the question of whether a
given input program {that is within the pre-specified resource limits}
will always halt or not.
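
The reason the bounded case is decidable can be sketched in a few lines of Python. The idea is to run the machine while remembering every complete state it has been in: a deterministic machine that revisits a state must repeat itself forever. The function name `halts` and the two one-byte "machines" below are invented for illustration; `step` is assumed to return the next state, or None when the machine halts.

```python
from typing import Callable, Hashable, Optional


def halts(step: Callable[[Hashable], Optional[Hashable]], state: Hashable) -> bool:
    """Decide halting for a deterministic machine with finitely many states."""
    seen = set()
    while state is not None:
        if state in seen:
            return False  # same state again: the machine is in an endless loop
        seen.add(state)
        state = step(state)
    return True  # step() returned None, so the machine halted


def countdown(state: int) -> Optional[int]:
    """Hypothetical machine: count down to zero, then halt."""
    return None if state == 0 else state - 1


def spin(state: int) -> Optional[int]:
    """Hypothetical machine: cycle through one-byte values, never halting."""
    return (state + 2) % 256


print(halts(countdown, 200))
print(halts(spin, 1))
```

The `seen` set may need one entry per distinct machine state, which is what makes the approach impractical for all but tiny machines.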

The difficulty, though, is that if the program to be examined is
allowed to use M resource slots (e.g., M bytes of memory including
machine registers), and if each of the resource slots can have S
different states (e.g., each 8-bit byte can represent any of 256
different values), then the resources needed for the testing program
would be at least S**(M-1) resource slots where ** indicates
exponentiation. For example, if the programs to be tested were
restricted to a mere 64 Kbyte of memory, then the testing program would
need access to at least 256**65535 bytes of memory... which is a number
more than 157000 digits long. And there isn't that much memory in the
universe :( [If you were to use every elementary particle in the
universe and were able to extract 2 bits of information from each
of them, then you would be able to run halting tests on programs
that had access to about 34 bytes of memory. With a gigabyte of
memory, you'd be able to run halting tests on programs that
had access to about 12 bytes of memory.]
--
I was very young in those days, but I was also rather dim.
-- Christopher Priest
Jun 2 '06 #21
Hi All,

Thanks for your reply.
I was just asking how people disassemble it. I know that after the
binary is disassembled, we can get the objdump output and I can find out from
there whether my program has RET (or any other instruction) or not. But
my question is how do people disassemble it? I know the commands
to disassemble the binary, but I was asking about the logic. How can
they get the disassembled code with all the right instructions just by
looking at the binary? Because to me all binaries look the same
:). I wanted to know the overall algorithm or logic behind it.

Thank you,
Priyanka

priyanka wrote:
Hi,

I was wondering if we could parse or do something in the executable(
whose source language was C). How can I use some scripting language
like perl/python to find out the information about the executable ? Is
it possible ?

Also, how does the compiler add inling to the program ? I know that
whenever it sees"inline" in front of the procedure name, it inlines it.
But if we give the -finline options, it inline all the procedures ? How
does it do that ? does it parse ? Is there any good book or article
that I can refer to ? I need to build a small compiler ? Can I build it
?

Thanks for reading all of my questions :)

-priyanka


Jun 2 '06 #22
"priyanka" <pr**********@gmail.com> writes:
I was just asking how do people dissassemble it ? I know that after the
binary is dissambled, we can get the objdump and I can find out from
there whether my program has RET(or any other instruction) or not. But
my question is how do people disassemble it ? Now, I know the commands
to dissamble the binary but I was asking for the logic ? Like, how can
they get the dissambled code with all the right instructions by only
looking at the binary ? Because for me all the binaries look like same
:). I wanted to know the overall algorithm or logic behind it.


Please don't top-post. Read <http://www.caliburn.nl/topposting.html>.

Your question has nothing at all to do with the C programming
language. I'm not sure what would be an appropriate place to ask it
(probably in a newsgroup for your platform), but this isn't it.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.
Jun 2 '06 #23
In article <11**********************@f6g2000cwb.googlegroups.com>,
priyanka <pr**********@gmail.com> wrote:
I was just asking how do people dissassemble it ? I know that after the
binary is dissambled, we can get the objdump and I can find out from
there whether my program has RET(or any other instruction) or not. But
my question is how do people disassemble it ? Now, I know the commands
to dissamble the binary but I was asking for the logic ? Like, how can
they get the dissambled code with all the right instructions by only
looking at the binary ? Because for me all the binaries look like same
:). I wanted to know the overall algorithm or logic behind it.


On most systems, an executable program is a very structured binary
file. One part of the information will be either an offset to
where the code begins, or else a marker saying "This next bit is code".

Once the location of the code is known, the disassembler program
just essentially emulates the way the CPU itself would decode
instructions, except that it does not actually take the action
specified by the instruction: instead it just prints out
a representation of the instruction and moves on to the next one.

The actual decoding of instructions can often be handled by lookup
tables that consist of masks (to be AND'd with the instruction)
and comparison values -- when the AND'd value compares equal then
you have found the right decoding class (i.e., which instruction
-format- is being used) and you can then use more specific logic
to get the rest of the information.
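
A minimal Python sketch of such a mask-and-compare table follows. The three 8-bit encodings are invented purely for illustration and do not belong to any real processor; `classify` finds the instruction format, and `disassemble` then applies the format-specific logic to extract the operand fields.

```python
# Hypothetical 8-bit encodings (invented for this example):
#   00rrrsss  MOV  r, s   (two 3-bit register fields)
#   01rrrnnn  ADDI r, n   (register field and 3-bit immediate)
#   11000011  RET
DECODE_TABLE = [
    # (mask, value, name), most specific pattern first
    (0xFF, 0xC3, "RET"),
    (0xC0, 0x00, "MOV"),
    (0xC0, 0x40, "ADDI"),
]


def classify(byte: int) -> str:
    """Return the decoding class whose masked bits match `byte`."""
    for mask, value, name in DECODE_TABLE:
        if (byte & mask) == value:
            return name
    return "unknown"


def disassemble(code: bytes) -> list[str]:
    """Print-style decode: one text line per instruction, nothing is executed."""
    lines = []
    for byte in code:
        name = classify(byte)
        high, low = (byte >> 3) & 7, byte & 7  # the two 3-bit operand fields
        if name == "MOV":
            lines.append(f"MOV r{high}, r{low}")
        elif name == "ADDI":
            lines.append(f"ADDI r{high}, {low}")
        else:
            lines.append(name)
    return lines


print(disassemble(bytes([0x0A, 0x53, 0xC3])))
```

Real instruction sets need wider words and variable lengths, but the table-driven structure stays the same.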
--
I was very young in those days, but I was also rather dim.
-- Christopher Priest
Jun 3 '06 #24
Please don't top-post. Your reply should follow, or be interspersed
with, the material you quote.

priyanka wrote:
priyanka wrote:
Hi,

I was wondering if we could parse or do something in the executable(
whose source language was C). How can I use some scripting language
like perl/python to find out the information about the executable ? Is
it possible ?


Hi All,

Thanks for your reply.
I was just asking how do people dissassemble it ? I know that after the
binary is dissambled, we can get the objdump and I can find out from
there whether my program has RET(or any other instruction) or not. But
my question is how do people disassemble it ? Now, I know the commands
to dissamble the binary but I was asking for the logic ? Like, how can
they get the dissambled code with all the right instructions by only
looking at the binary ? Because for me all the binaries look like same
:). I wanted to know the overall algorithm or logic behind it.


Why don't you read Walter Roberson's excellent article on the halting
problem elsethread. Essentially, disassembly reduces to the halting
problem. Now while you can disassemble an object file to a set of
mnemonics easily enough, deciding whether the result is "correct", i.e.
is semantically equivalent to the original object code, is a *far*
harder proposition.

Moreover, disassembly is a general programming question. Unless you
have a C related question, please post in a more appropriate newsgroup
like comp.programming or alt.lang.asm. If you'll be disassembling x86
object files, comp.lang.asm.x86 is also a better place. Over the years,
there has been a lot of discussion on this issue in alt.lang.asm.

The bottom line is, unless you're very well versed with your target's
instruction set architechture, assembly language, code flow analysis
and so on and so on, (which you don't appear to be), it would be
preferable to use an existing disassembler like IDAPro etc.

Use Google to search for "Disassembly Engines", "Disassemblers",
"Halting problem" etc.

Jun 3 '06 #25
> On most systems, an executable program is a very structured binary
> file. One part of the information will be either an offset to
> where the code begins, or else a marker saying "This next bit is code".
It's often not that simple. The (read-only) code section may contain
a bunch of other stuff that is not code but still lives in the code
section: string literals, branch tables, floating-point constants
on machines that don't have a "load double immediate" instruction,
etc. Some compilers put this in a separate section (read-only
data vs. read-only code), some don't. On some machines there is
an advantage in having the other stuff "near" the code that uses
it (a short offset can be used).
> Once the location of the code is known, the disassembler program
> just essentially emulates the way the CPU itself would decode
> instructions, except that it does not actually take the action
> specified by the instruction: instead it just prints out
> a representation of the instruction and moves on to the next one.
This approach often generates a big confusing mess, largely due to
decoding string literals and other data as machine code.

I once wrote a Z80 multi-pass disassembler that tried to trace code
execution paths. It knew about things like conditional and
unconditional branches, and it decoded as instructions only those
bytes that were referenced as instructions. It also noticed
locations being referenced as 2-byte or 4-byte data values and
displayed them appropriately. It tried to identify references to
strings. Unfortunately, a lot of stuff was still not decoded because
it didn't understand well the many different variants of code
generated through C's switch construct or calling through a function
pointer, and you also had to give it "hints" about entry points to
the program that weren't standard. For other machine architectures,
you might have to track fixed values in registers loaded a few
instructions before to actually figure out where a memory reference
was going.
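
The path-tracing idea can be sketched in C roughly as follows. This is an illustration only, not the Z80 disassembler described above: it knows six 8080 opcodes, assumes the image is loaded at address 0 so that branch targets are plain offsets, and the names trace_code, insn_length and MAX_IMAGE are made up for the sketch. It has the same blind spots mentioned above, since it cannot follow indirect jumps, switch tables or function pointers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { MAX_IMAGE = 256 };   /* arbitrary limit for this sketch */

/* Length of the few 8080 opcodes this sketch knows; 0 means unknown. */
static size_t insn_length(uint8_t op)
{
    switch (op) {
    case 0x00:              /* NOP */
    case 0xC9:              /* RET */
        return 1;
    case 0xDB:              /* IN d8 */
        return 2;
    case 0xC3:              /* JMP a16 */
    case 0xCA:              /* JZ a16 */
    case 0xCD:              /* CALL a16 */
        return 3;
    default:
        return 0;
    }
}

/* Starting at entry, follow every statically visible path and set
 * is_start[i] = 1 for each offset where an instruction begins.
 * Returns the number of instructions found. */
size_t trace_code(const uint8_t *image, size_t len, size_t entry,
                  uint8_t *is_start)
{
    size_t work[MAX_IMAGE];  /* offsets still to be explored */
    size_t top = 0;
    size_t found = 0;

    assert(image != NULL && is_start != NULL);
    assert(len <= MAX_IMAGE && entry < len);
    memset(is_start, 0, len);
    work[top++] = entry;

    while (top > 0) {
        size_t pc = work[--top];

        while (pc < len && !is_start[pc]) {
            uint8_t op = image[pc];
            size_t n = insn_length(op);

            if (n == 0 || pc + n > len)
                break;                  /* unknown or truncated: give up on this path */
            is_start[pc] = 1;
            found++;
            if (n == 3) {               /* JMP, JZ, CALL: queue the target (little-endian) */
                size_t target = (size_t)image[pc + 1]
                              | ((size_t)image[pc + 2] << 8);
                if (target < len && !is_start[target] && top < MAX_IMAGE)
                    work[top++] = target;
            }
            if (op == 0xC3 || op == 0xC9)
                break;                  /* JMP and RET never fall through */
            pc += n;
        }
    }
    return found;
}
```

Bytes that are never marked are candidates for data (or for code reached only through paths the tracer cannot see), which is where the "hints" about extra entry points come in.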

Then there were the odd constructs to deliberately trip up
disassemblers, for example, on the 8080/Z80:

label0: db 0xc3 ; fool disassemblers
label1: in 5

If you disassemble this starting at label0:, you get a branch instruction
which is 3 bytes long. But if something branches *into* the middle
of that instruction, at label1:, it's an I/O instruction. And then there's
self-modifying code. On the 8080, to do I/O to a variable port number, you
*had* to modify the instruction or make a giant switch for all possible
port numbers.
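
To make the label0/label1 trick concrete, here is a small C sketch that decodes the same three bytes (C3 DB 05) from two different starting offsets. The function name decode_one is invented for the example, and it handles only the three opcodes needed here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Decode the single 8080 instruction at code[pc] into out.
 * Returns its length in bytes, or 0 if it is unknown or truncated.
 * Only JMP, IN and RET are handled in this sketch. */
size_t decode_one(const uint8_t *code, size_t len, size_t pc,
                  char *out, size_t outsz)
{
    assert(code != NULL && out != NULL && pc < len);
    switch (code[pc]) {
    case 0xC3:  /* JMP a16: opcode, low byte, high byte */
        if (pc + 2 >= len)
            return 0;
        snprintf(out, outsz, "JMP 0x%04X",
                 (unsigned)(code[pc + 1] | (code[pc + 2] << 8)));
        return 3;
    case 0xDB:  /* IN d8: opcode, port number */
        if (pc + 1 >= len)
            return 0;
        snprintf(out, outsz, "IN 0x%02X", (unsigned)code[pc + 1]);
        return 2;
    case 0xC9:  /* RET */
        snprintf(out, outsz, "RET");
        return 1;
    default:
        return 0;
    }
}
```

With code[] = { 0xC3, 0xDB, 0x05 }, decoding at offset 0 (label0) yields a 3-byte JMP whose "address" is really the IN instruction's bytes, while decoding at offset 1 (label1) yields the 2-byte IN. A disassembler that only sweeps forward from label0 never sees the second reading.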
> The actual decoding of instructions can often be handled by lookup
> tables that consist of masks (to be AND'd with the instruction)
> and comparison values -- when the AND'd value compares equal then
> you have found the right decoding class (i.e., which instruction
> -format- is being used) and you can then use more specific logic
> to get the rest of the information.


Gordon L. Burditt
Jun 3 '06 #26

"Walter Roberson" <ro******@ibd.nrc-cnrc.gc.ca> wrote

If the file format and OS and language restrictions are such
that it is possible to place into execution a section marked as
data, then figuring out whether there is a machine return instruction
or not is equivalent to solving The Halting Problem. I believe it
is generally agreed that The Halting Problem is not solvable in
standard C.

You mean place into data a section marked as execution.
Once you've done that, you can run a C program on the executable file to
find out all sorts of interesting things, provided you know the format.
Maybe even if you don't know the format, if you are a cryptologist or
professional hacker.
The program can be as portable as makes no difference. Technically you could
run on a file system that can't handle the byte size of the original.
--
Buy my book 12 Common Atheist Arguments (refuted)
$1.25 download or $7.20 paper, available www.lulu.com/bgy1mm
Jun 3 '06 #27
Gordon Burditt wrote:
.... snip ...
On the 8080 to do I/O to a variable port number, you *had* to
modify the instruction or make a giant switch for all possible
port numbers.


No you didn't. You could write and execute a baby routine on the
stack, consisting of "in port; ret; nop". The whole thing was
completely reentrant.
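
For readers who want to see what such a routine looks like as bytes, here is a C sketch that merely builds the four-byte sequence in a buffer. The function name build_in_stub and the STUB_SIZE constant are invented for this sketch, and reading the trailing NOP as padding to a whole number of 16-bit pushes is an assumption. Actually executing the bytes is an 8080 matter and is not attempted here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { STUB_SIZE = 4 };

/* Fill stub[0..3] with the 8080 encoding of "IN port; RET; NOP". */
void build_in_stub(uint8_t port, uint8_t *stub, size_t size)
{
    assert(stub != NULL && size >= STUB_SIZE);
    stub[0] = 0xDB;  /* IN d8 */
    stub[1] = port;  /* port number, known only at run time */
    stub[2] = 0xC9;  /* RET */
    stub[3] = 0x00;  /* NOP (assumed to be padding) */
}
```

The port byte never appears in the program image on disk, so a static disassembler has nothing to decode for it.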

You have neglected to include attributions for material you
quoted. Please do not do this again. It is tantamount to
plagiarism.

--
Some informative links:
news:news.announce.newusers
http://www.geocities.com/nnqweb/
http://www.catb.org/~esr/faqs/smart-questions.html
http://www.caliburn.nl/topposting.html
http://www.netmeister.org/news/learn2quote.html

Jun 3 '06 #28
>> On the 8080 to do I/O to a variable port number, you *had* to
>> modify the instruction or make a giant switch for all possible
>> port numbers.
> No you didn't. You could write and execute a baby routine on the
> stack, consisting of "in port; ret; nop". The whole thing was
> completely reentrant.


That's still self-modifying code. It's going to be very, very
difficult to get a disassembler to trace through that.
> You have neglected to include attributions for material you
> quoted. Please do not do this again. It is tantamount to
> plagiarism.


attribution = misattribution, as it's been made apparent recently
in this newsgroup that there is not universal agreement about what
the attributions mean.

Gordon L. Burditt
Jun 3 '06 #29
"priyanka" <pr**********@gmail.com> wrote in message
news:11*********************@y43g2000cwc.googlegroups.com...
Hi,

I was wondering if we could parse or do something in the executable(
whose source language was C). How can I use some scripting language
like perl/python to find out the information about the executable ? Is
it possible ?

In general, for compiled languages, the original source language of an
executable usually has little influence on the final executable.
It's often possible to combine modules written in several languages.
Also, how does the compiler add inling to the program ?
Basically it creates a copy of the procedure...
I know that
whenever it sees"inline" in front of the procedure name, it inlines it.
But if we give the -finline options, it inline all the procedures ?
I doubt if it really inlines all procedures....
How
does it do that ? does it parse ? Is there any good book or article
that I can refer to ?
Probably all very techy,
I need to build a small compiler ? Can I build it?
It depends on what you mean by "small". To get small you have to take things
out. Probably easiest to adapt one of the many "small c" or "tiny c"
variants...

Thanks for reading all of my questions :)

That's OK, it passes the time,
-priyanka


dave
Jun 3 '06 #30
Gordon Burditt wrote:

<snip>
You have neglected to include attributions for material you
quoted. Please do not do this again. It is tantamount to
plagiarism.


attribution = misattribution, as it's been made apparent recently
in this newsgroup that there is not universal agreement about what
the attributions mean.


Only you seem to think so; everyone else who expresses an opinion wants
attributions. There have been one or two cases where mistakes have been
made, yes, but none where it was not sorted out quickly.
--
Flash Gordon, living in interesting times.
Web site - http://home.flash-gordon.me.uk/
comp.lang.c posting guidelines and intro:
http://clc-wiki.net/wiki/Intro_to_clc
Jun 3 '06 #31
Gordon Burditt wrote:
On the 8080 to do I/O to a variable port number, you *had* to
modify the instruction or make a giant switch for all possible
port numbers.
No you didn't. You could write and execute a baby routine on the
stack, consisting of "in port; ret; nop". The whole thing was
completely reentrant.


That's still self-modifying code. It's going to be very, very
difficult to get a disassembler to trace through that.


No it isn't. No code is ever modified. It is written. The trick
does require that you can execute code from the stack, which is no
problem with the 8080.
You have neglected to include attributions for material you
quoted. Please do not do this again. It is tantamount to
plagiarism.


attribution = misattribution, as it's been made apparent recently
in this newsgroup that there is not universal agreement about what
the attributions mean.


As I pointed out, quoting without attribution is effectively
plagiarism. Every newsreader of which I am aware will construct
the attribution lines automatically. There is no need to steal the
words of others.

--
Some informative links:
news:news.announce.newusers
http://www.geocities.com/nnqweb/
http://www.catb.org/~esr/faqs/smart-questions.html
http://www.caliburn.nl/topposting.html
http://www.netmeister.org/news/learn2quote.html
Jun 3 '06 #32
go***********@burditt.org (Gordon Burditt) writes:
[...]
attribution = misattribution, as it's been made apparent recently
in this newsgroup that there is not universal agreement about what
the attributions mean.


There is nearly universal agreement. Without you, there would be
universal agreement.

You're wrong, and the rest of us are right.

Last time we discussed this, you were asked several questions. You
never answered them. My tentative conclusion is that you just don't
know what you're talking about, but I'd be glad to be proven wrong.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.
Jun 3 '06 #33
In article <AP********************@bt.com>,
Malcolm <re*******@btinternet.com> wrote:
"Walter Roberson" <ro******@ibd.nrc-cnrc.gc.ca> wrote
If the file format and OS and language restrictions are such
that it is possible to place into execution a section marked as
data, then figuring out whether there is a machine return instruction
or not is equivalent to solving The Halting Problem.

You mean place into data a section marked as execution.
No I don't. To be able to place a memory section into execution
means that it is possible to (somehow) have the CPU use the
memory section as the source of instructions that are directly
executed. To use other phrasing, to be able to branch into the data.

Being able to branch into data is possible on -most- systems, but
the memory management controls on -some- systems are able to
designate some memory pages as non-executable (in a manner very
similar to being able to mark memory pages as non-writable.)
On such systems, if an attempt is made to branch into the data,
the system detects the attempt and faults the program.

[And there's also the Harvard architecture, in which code and
data are in separate memory address spaces.]

Once you've done that, you can run a C program on the executable file to
find out all sorts of interesting things, provided you know the format.


If you are able to fopen() the executable itself in binary read
mode, you don't need to "place into data a section marked as
execution". I was talking about something else completely: that
if the program is able to start executing data, then you cannot
in the general case statically prove that the program will not
at -some- point synthesize a RETURN instruction to execute.
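
As an illustration of the fopen() route, here is a C sketch that opens a file in binary read mode and counts occurrences of one byte value. The function names are invented for the example. On x86 the one-byte near RET opcode is 0xC3, but a hit only says the byte is present: it may be an operand or plain data, and a count of zero does not rule out a return instruction being synthesized at run time, which is the point made above.

```c
#include <assert.h>
#include <stdio.h>

/* Count how many bytes read from fp (from its current position
 * to end of file) are equal to wanted. */
long count_byte(FILE *fp, unsigned char wanted)
{
    long count = 0;
    int c;

    assert(fp != NULL);
    while ((c = getc(fp)) != EOF) {
        if ((unsigned char)c == wanted)
            count++;
    }
    return count;
}

/* Open path in binary read mode and count matching bytes.
 * Returns -1 if the file cannot be opened. */
long count_byte_in_file(const char *path, unsigned char wanted)
{
    FILE *fp = fopen(path, "rb");
    long n;

    if (fp == NULL)
        return -1;
    n = count_byte(fp, wanted);
    fclose(fp);
    return n;
}
```

A call such as count_byte_in_file("a.out", 0xC3) would give a raw count for a hypothetical x86 executable named a.out; telling which of those bytes are instructions needs the file format and a real decoder.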
--
"law -- it's a commodity"
-- Andrew Ryan (The Globe and Mail, 2005/11/26)
Jun 4 '06 #34
"Walter Roberson" <ro******@ibd.nrc-cnrc.gc.ca> wrote
If you are able to fopen() the executable itself in binary read
mode, you don't need to "place into data a section marked as
execution". I was talking about something else completely: that
if the program is able to start executing data, then you cannot
in the general case statically prove that the program will not
at -some- point synthesize a RETURN instruction to execute.

I misunderstood.
The post is clear now.
--
Buy my book 12 Common Atheist Arguments (refuted)
$1.25 download or $7.20 paper, available www.lulu.com/bgy1mm

Jun 4 '06 #35
