473,387 Members | 1,791 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,387 software developers and data experts.

gcc's nutty Nethack object code

A few months ago I attempted to compile an old version (3.0j) of
Nethack. Once I got a few modules to compile I disassembled the
code of the shortest and am puzzled by a couple of things.

1. Where the C code calls for an operation that algebraically is

(x - 19) * 10,000,000 [i.e. "return (10000000L*(lev-19));"]

gcc generates code which algebraically is

((x - 19) + (x - 19) * 31 * 63 * 8) * 5 * 128

and I'm wondering at such a roundabout method.

See the attached code, especially the disassembler listing at offsets
19 to 30 (hex).

2. I thought the use of NOPs was to facilitate pre-fetching of
branch or jump-to code; but the compiler doesn't align all jump-to
code on word boundaries, so I'm still at a loss to explain it.

See the pseudo NOPs at offsets 12 and 32 (hex) and compare those
to the lack of a NOP to word-align the instruction at offset 42.
Here are the source and object listings.
----------------------------------------

C SOURCE
From exper.c, from the source for the game Nethack, version 3.0.10, (see http://www.juiblex.co.uk/nethack/source.html).
#include "hack.h"

#ifdef LINT
#define NEW_SCORING
#endif
long
newuexp(lev)
register unsigned lev;
{
#ifdef LINT /* long conversion */
return(0L * lev);
#else
if(lev < 10) return (10L*(1L << lev));
if(lev < 20) return (10000L*(1L << (lev-10)));
return (10000000L*(lev-19));
#endif
}
Given a Nethack experience-level number

{1, 2, 3, 4, 5, 6, 7, 8, 9,
10, 11, 12, 13, 14, 15, 16, 17, 18, 19,
20, 21, 22, 23, 24, 25, 26, 27, 28, 29,
30}

newuexp(lev) returns the minimum number of experience points for
the *next higher* Nethack experience-level, except that there isn't
a level higher than 30

{20, 40, 80, 160, 320, 640, 1280, 2560, 5120,
10K, 20K, 40K, 80K, 160K, 320K, 640K, 1280K, 2560K, 5120K,
10M, 20M, 30M, 40M, 50M, 60M, 70M, 80M, 90M, 100M,
110M}

(DIS)ASSEMBLER LISTING

Compiled: 'gcc -ansi -O -I../include' [5/2/04]
Preprocessing: LINT undefined
Version: gcc version egcs-2.91.66 19990314 (egcs-1.1.2 release) [6/18/04]
OS: NetBSD 1.5.4
CPU: cpu0: AMD Athlon Model 1 (686-class), 604.31 MHz [panix3, 6/26/04]

Disassembled with 'objdump -d'.
exper.o: file format elf32-i386

Disassembly of section .text:
Offset Object code Opcode Src, dest operands
---------------------------------------------------------
00000000 <newuexp>:
0: 55 push %ebp
1: 89 e5 mov %esp,%ebp
3: 8b 4d 08 mov 0x8(%ebp),%ecx
Stardard C linkage:
old base pointer pushed on stack
base pointer = top-of-stack pointer
Reg C = lev [arg list begins at offset 8 from stack top].
6: 83 f9 09 cmp $0x9,%ecx
9: 77 09 ja 14 <newuexp+0x14>
If lev > 9 go to offset 14 hex.
b: b8 0a 00 00 00 mov $0xa,%eax
Reg A = 10 dec. Set up for lev 1-9.
10: eb 2e jmp 40 <newuexp+0x40>
Jump to offset 40 hex, where 10 * 2^lev will be done.
12: 89 f6 mov %esi,%esi
To put following jump-to code on word boundary? Instead of 2 NOPs?
14: 83 f9 13 cmp $0x13,%ecx
17: 76 1f jbe 38 <newuexp+0x38>
If lev <= 19 dec go to offset 38 hex, for lev 10-19 setup.
19: 83 c1 ed add $0xffffffed,%ecx
1c: 89 ca mov %ecx,%edx
1e: c1 e2 05 shl $0x5,%edx
21: 29 ca sub %ecx,%edx
23: 89 d0 mov %edx,%eax
25: c1 e0 06 shl $0x6,%eax
28: 29 d0 sub %edx,%eax
2a: 8d 04 c1 lea (%ecx,%eax,8),%eax
2d: 8d 04 80 lea (%eax,%eax,4),%eax
30: c1 e0 07 shl $0x7,%eax
This multiplies (lev - 19) by 10 million in a puzzlingly roundabout way,
for lev 20-30.

Putting the above 10 instructions in algebraic notation:

Reg C = lev - 19 [2's complement]
Reg D = lev - 19
Reg D = (lev - 19) * 32
Reg D = (lev - 19) * 31
Reg A = (lev - 19) * 31
Reg A = (lev - 19) * 31 * 64
Reg A = (lev - 19) * 31 * 63 = (lev - 19) * 1953
Reg A = (lev - 19) + (lev - 19) * 1953 * 8 = (lev - 19) * 15625
Reg A = (lev - 19) * 15625 * 5 = (lev - 19) * 78125
Reg A = (lev - 19) * 78125 * 128 = (lev - 19) * 10000000

Or, to preserve the intermediate steps, the last line of the above
could read:

Reg A = ((lev - 19) + (lev - 19) * 31 * 63 * 8) * 5 * 128
33: eb 0d jmp 42 <newuexp+0x42>
Jump to offset 42 hex, to end, lev 20-30 done.
35: 8d 76 00 lea 0x0(%esi),%esi
Another jump-to word-boundary alignment for following instruction?
Facilitates pre-fetching for superscaling? Use of NOPs so much
slower?
38: 83 c1 f6 add $0xfffffff6,%ecx
Reg C = lev - 10 [2's complement]. Setup for lev 10-19.
3b: b8 10 27 00 00 mov $0x2710,%eax
Reg A = 10,000 dec. Continued lev 10-19 setup.
40: d3 e0 shl %cl,%eax
Confluence of lev 1-9 and 10-19 processing.

Reg A = reg A [ = 10 or 10,000 dec] * 2 ^ reg C [ = lev or lev - 10].
42: c9 leave
Lev 20-30 processing rejoins here.

Offset 42 is jump-to code, but it's not preceded by a word boundary
aligning NOP. Why?
43: c3 ret


Result in reg A, per C linkage.

--
David Turrell Domain: panix.com

Nov 14 '05 #1
4 1762
In article <cg**********@reader1.panix.com>,
David Turrell <sp******@crayne.org> wrote:
Nethack. Once I got a few modules to compile I disassembled the
code of the shortest and am puzzled by a couple of things.
This isn't really on-topic for comp.lang.c (where I'm reading it), and
I'd guess that the folks in rec.games.roguelike.nethack are probably more
interested in dicusssing the game than the code generated by the compiler.
Followups trimmed to comp.lang.asm.x86 only.
1. Where the C code calls for an operation that algebraically is

(x - 19) * 10,000,000 [i.e. "return (10000000L*(lev-19));"]

gcc generates code which algebraically is

((x - 19) + (x - 19) * 31 * 63 * 8) * 5 * 128

and I'm wondering at such a roundabout method.


Note that the code generated by the compiler uses only shifting and
add/subtract. Most likely, the optimizer "knows" that this chunk of
operations is faster than using a divide (which is typically a somewhat
slow operation on modern processors).
(This is why optimizing compilers are Good. You wouldn't want to have
to untangle something like that in the source code, while if it's done
as part of the compilation only people debugging the optimizer ever have
to worry about it.)
dave

--
Dave Vandervies dj******@csclub.uwaterloo.ca
If anybody asks for advice whether to use goto or not, you should tell
them not to use it. People who ask for advice shouldn't use goto.
--Christian Bau in comp.lang.c

Nov 14 '05 #2
sp******@crayne.org (David Turrell) wrote in
news:cg**********@reader1.panix.com:
A few months ago I attempted to compile an old version (3.0j) of
Nethack. Once I got a few modules to compile I disassembled the
code of the shortest and am puzzled by a couple of things.

1. Where the C code calls for an operation that algebraically is

(x - 19) * 10,000,000 [i.e. "return (10000000L*(lev-19));"]

gcc generates code which algebraically is

((x - 19) + (x - 19) * 31 * 63 * 8) * 5 * 128

and I'm wondering at such a roundabout method.

See the attached code, especially the disassembler listing at offsets
19 to 30 (hex).


All of the issues you're having are specific to the compiler optimizations
you've passed to gcc and nothing specific to Nethack code. The code that
gcc generates is optimized; in terms of CPU instructions, it's cheaper to
do the second than the first, no matter how counter-intuitive that may seem
to the user.

<snip the rest of it>
--
~ Cyde Weys ~
Bite my shiny metal ass.

Nov 14 '05 #3
David Turrell wrote:

A few months ago I attempted to compile an old version (3.0j) of
Nethack. Once I got a few modules to compile I disassembled the
code of the shortest and am puzzled by a couple of things.

1. Where the C code calls for an operation that algebraically is

(x - 19) * 10,000,000 [i.e. "return (10000000L*(lev-19));"]

gcc generates code which algebraically is

((x - 19) + (x - 19) * 31 * 63 * 8) * 5 * 128

and I'm wondering at such a roundabout method.


This is to speed up code execution. Multiplication can take a
long time. When one of the operands is a constant the operation
often can be greatly simplified. For example x * 7:

tmp = x * 8; /* via left shift 3 */
tmp = tmp - x; /* via a single subtract */

and tmp holds the answer.

Your example may or may not be optimum on your machine. In the
past I have limited such operations to constants that can be
expressed as either two or fewer 1 bits (i.e. add two shifted
versions of the other operand) or as a string of contiguous 1 bits
(i.e. subtract two shifted versions of the other operand). These
patterns cover a large proportion of the common small multipliers.

--
Chuck F (cb********@yahoo.com) (cb********@worldnet.att.net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net> USE worldnet address!

Nov 14 '05 #4
On Tue, 2004-08-24 at 02:39 +0000, Cyde Weys wrote:
sp******@crayne.org (David Turrell) wrote in
news:cg**********@reader1.panix.com:
A few months ago I attempted to compile an old version (3.0j) of
Nethack. Once I got a few modules to compile I disassembled the
code of the shortest and am puzzled by a couple of things.

1. Where the C code calls for an operation that algebraically is

(x - 19) * 10,000,000 [i.e. "return (10000000L*(lev-19));"]

gcc generates code which algebraically is

((x - 19) + (x - 19) * 31 * 63 * 8) * 5 * 128

and I'm wondering at such a roundabout method.

See the attached code, especially the disassembler listing at offsets
19 to 30 (hex).


All of the issues you're having are specific to the compiler optimizations
you've passed to gcc and nothing specific to Nethack code. The code that
gcc generates is optimized; in terms of CPU instructions, it's cheaper to
do the second than the first, no matter how counter-intuitive that may seem
to the user.


The OP may also want to try passing different -march and -mcpu options
to GCC and see how that affects the compiled code. The default on i386
GCCs, I think, is to optimize for a 80386.

Fredrik Tolf
Nov 14 '05 #5

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

2
by: AK | last post by:
Hi, I coded a little game in curses, loosely based on nethack but where you command the monsters against the hero (and his dog). I don't have time to develop it further, so I thought someone may...
4
by: cpptutor2000 | last post by:
Could some C++ guru help me please? I am trying to build an application using gcc 3.2.3, that has a some classes using functions defined in some C files in the same directory. If inside the C++...
2
GCC
by: Bo Hunter | last post by:
GCC 3.2 Redhat 9 Why does GCC have a problem with this? if( !( s1 >> Num( base, value )).eof() || ... I have to create a named Num object and everthing is fine.
3
by: Georg | last post by:
Hello, I must be doing something wrong, but I don't get it: - compile gcc -c -O -Iinc src/hello.c -o obj/hello.o gcc -c -O -Iinc src/msg_1.c -o obj/msg_1.o gcc -c -O -Iinc src/msg_2.c -o...
204
by: Alexei A. Frounze | last post by:
Hi all, I have a question regarding the gcc behavior (gcc version 3.3.4). On the following test program it emits a warning: #include <stdio.h> int aInt2 = {0,1,2,4,9,16}; int aInt3 =...
3
by: Hallvard B Furuseth | last post by:
I'm getting horribly lost in the strict aliasing rules. Is this code correct? struct A { int x; }; struct B { int x, y; }; int foo( struct A *a ) { struct B *b = (struct B *) a; return b->x...
68
by: James Dow Allen | last post by:
The gcc compiler treats malloc() specially! I have no particular question, but it might be fun to hear from anyone who knows about gcc's special behavior. Some may find this post interesting;...
6
by: nutty | last post by:
Hi all, I have the following problem ( explanation below code ): // main.cpp class noncopyable { protected: noncopyable() {}
27
by: Dave | last post by:
I'm having a hard time tying to build gcc 4.3.1 on Solaris using the GNU compilers. I then decided to try to use Sun's compiler. The Sun Studio 12 compiler reports the following code, which is in...
0
by: ryjfgjl | last post by:
If we have dozens or hundreds of excel to import into the database, if we use the excel import function provided by database editors such as navicat, it will be extremely tedious and time-consuming...
0
by: ryjfgjl | last post by:
In our work, we often receive Excel tables with data in the same format. If we want to analyze these data, it can be difficult to analyze them because the data is spread across multiple Excel files...
0
by: emmanuelkatto | last post by:
Hi All, I am Emmanuel katto from Uganda. I want to ask what challenges you've faced while migrating a website to cloud. Please let me know. Thanks! Emmanuel
0
BarryA
by: BarryA | last post by:
What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...
1
by: nemocccc | last post by:
hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?
0
by: Hystou | last post by:
There are some requirements for setting up RAID: 1. The motherboard and BIOS support RAID configuration. 2. The motherboard has 2 or more available SATA protocol SSD/HDD slots (including MSATA, M.2...
0
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can...
0
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers,...
0
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.