Bytes | Software Development & Data Engineering Community

SSE and MMX support in the JIT compiler

Hi,

I was wondering if anyone knows whether the JIT compiler supports
SSE/SSE2 instructions? Thanks

-Andre

Nov 15 '05 #1
13 replies, 7,366 views
We currently use SSE2 for some things, like the double-to-int cast; it's not
used for general codegen though.

--
David Notario
Software Design Engineer - CLR JIT Compiler
"Andre" <fo********@hotmail.com> wrote in message
news:3F**************@hotmail.com...
Hi,

I was wondering if anyone knows whether the JIT compiler supports
SSE/SSE2 instructions? Thanks

-Andre

Nov 15 '05 #2
Thanks David,

There's one thing I need to ask you - was there a fair amount of
features/improvements in v1.1 of the CLR compared to v1.0? I see that the
total number of bytes JIT'd is noticeably less in v1.1, and some
profiling showed that v1.1 gave better MFLOPs executing some
benchmarking code. Thanks

-Andre

David Notario wrote:
We currently use SSE2 for some things like the double to int cast, it's not
used for general codegen though.


Nov 15 '05 #4
No, we didn't do much optimization work in the JIT for v1.1, except for some
very targeted ones that offered a significant speed boost in exchange for
little dev work (1.1 was mainly a security-fixes-only release for the CLR),
such as the double-to-int cast (a 40x speed increase just from using an SSE2
instruction).
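(Aside, for concreteness: a minimal C# sketch of the cast being discussed. The numeric values are illustrative, not from David's measurements; the point is that the plain `(int)` cast on a double is the operation the v1.1 JIT compiles to a single SSE2 truncating-convert instruction on capable CPUs, instead of the slow x87 rounding-mode save/restore sequence.)

```csharp
using System;

class CastDemo
{
    static void Main()
    {
        // The C# double-to-int cast truncates toward zero. On SSE2-capable
        // hardware the v1.1 JIT can emit a single truncating convert for
        // this, avoiding the expensive x87 control-word dance.
        Console.WriteLine((int)3.7);   // 3
        Console.WriteLine((int)-3.7);  // -3

        // In a hot loop, this cast is where the speedup would show up.
        double[] samples = { 1.9, 2.5, -0.5, 100.99 };
        int total = 0;
        foreach (double d in samples)
            total += (int)d;           // 1 + 2 + 0 + 100
        Console.WriteLine(total);      // 103
    }
}
```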

--
David Notario
Software Design Engineer - CLR JIT Compiler
"Andre" <fo********@hotmail.com> wrote in message
news:3F**************@hotmail.com...
Thanks David,

There's one thing I need to ask you - was there a fair amount of
features/improvements in v1.1 of the CLR compared to v1.0? I see that the
total number of bytes JIT'd is noticeably less in v1.1, and some
profiling showed that v1.1 gave better MFLOPs executing some
benchmarking code. Thanks

-Andre

David Notario wrote:
We currently use SSE2 for some things like the double to int cast, it's not used for general codegen though.

Nov 15 '05 #6
Where I say security fixes for the CLR, I mean for the JIT; there were perf
improvements in areas other than the JIT.

--
David Notario
Software Design Engineer - CLR JIT Compiler
"David Notario" <dn******@online.microsoft.com> wrote in message
news:eZ*************@TK2MSFTNGP12.phx.gbl...
No, we didn't do much optimization work in the JIT for v1.1, except for some
very targeted ones that offered a significant speed boost in exchange for
little dev work (1.1 was mainly a security-fixes-only release for the CLR),
such as the double-to-int cast (a 40x speed increase just from using an SSE2
instruction).

--
David Notario
Software Design Engineer - CLR JIT Compiler
"Andre" <fo********@hotmail.com> wrote in message
news:3F**************@hotmail.com...
Thanks David,

There's one thing I need to ask you - was there a fair amount of
features/improvements in v1.1 of the CLR compared to v1.0? I see that the
total number of bytes JIT'd is noticeably less in v1.1, and some
profiling showed that v1.1 gave better MFLOPs executing some
benchmarking code. Thanks

-Andre

David Notario wrote:
We currently use SSE2 for some things, like the double-to-int cast; it's not
used for general codegen though.


Nov 15 '05 #7
Thanks David,

David Notario wrote:
No, we didn't do much optimization work in the JIT for v1.1, except for some
very targeted ones that offered a significant speed boost in exchange for
little dev work (1.1 was mainly a security-fixes-only release for the CLR),
such as the double-to-int cast (a 40x speed increase just from using an SSE2
instruction).


So does that mean v1.0 didn't use SSE2 at all (and only used SSE)? I
guess that's why I see an increase in the number of FLOPS with v1.1.

If optimizations are being targeted at a particular platform, does
that imply there are other platforms .NET is being ported to? (I'm
only aware of Mono, and that's on x86.) Does Microsoft plan on porting
.NET (or allowing others to) to Sun or any other platform, for instance?

You mentioned that there have been some improvements in areas other than
the JIT - could you name some? I'm trying to write up a report for my
company to convince them to switch completely to .NET from J2EE/J2SE, and
for that I need solid reasoning and accurate measurements showing
improvements in CLR v1.1 over v1.0. After a month's study I'm
personally convinced that the CLR will improve (and some very
interesting features are being added to C# in the next release). I
can't seem to find anything documented on the current implementation of
the CLR, and Rotor, for that matter, is simply not worth studying (as the
optimizing compiler has been stripped out of it). It would really help
me if you could shed a little more light on this, please. Thanks again
for your time David,

-Andre
Nov 15 '05 #8
Andre <fo********@hotmail.com> wrote:
Is this also true for Whidbey, do you know? (And can you say? :)


What's Whidbey? (is that the code name for the next version of C#?)


I believe it's the next version of Visual Studio .NET, including the
next version of .NET itself, which will in turn support the features of
the next version of C# (such as generics).

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet/
If replying to the group, please do not mail me too
Nov 15 '05 #9
Jon Skeet wrote:
Andre <fo********@hotmail.com> wrote:
Is this also true for Whidbey, do you know? (And can you say? :)


What's Whidbey? (is that the code name for the next version of C#?)

I believe it's the next version of Visual Studio .NET, including the
next version of .NET itself, which will in turn support the features of
the next version of C# (such as generics).

Ah.. catchy name :) Thanks Jon

-Andre

Nov 15 '05 #10
We've done more perf work in the JIT for our next version than for our
previous version, but we still won't be generating SSE2 or MMX code in our
codegen.

The rationale behind not doing SSE2 was that we didn't have the time to do
vectorizing optimizations. If you use SSE2 for scalar operations, it's not
always faster than the equivalent x87 code in 'normal' code: adds and muls
have different latencies in SSE2 vs. x87 (mul has lower latency in SSE2, but
add is higher, IIRC), plus some operations (casting from doubles to floats
or floats to doubles) are quite slow in SSE2 compared to x87. We also have
to support processors without SSE2.

So, with all these arguments against it, we decided to focus our work on
improving our x87 codegen and leave the door open for an SSE2
implementation, instead of putting all our eggs in the SSE2 basket.

--
David Notario
Software Design Engineer - CLR JIT Compiler
"Jon Skeet" <sk***@pobox.com> wrote in message
news:MP************************@news.microsoft.com ...
David Notario <dn******@online.microsoft.com> wrote:
No, we didn't do much optimization work in the JIT for v1.1, except for some
very targeted ones that offered a significant speed boost in exchange for
little dev work (1.1 was mainly a security-fixes-only release for the CLR),
such as the double-to-int cast (a 40x speed increase just from using an SSE2
instruction).


Is this also true for Whidbey, do you know? (And can you say? :)

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet/
If replying to the group, please do not mail me too

Nov 15 '05 #11
Hello,
Is there any work being done on using specific features of a processor
to increase performance? For example, on AMD Athlon XPs there are
multiple integer execution pipelines. I can get roughly a 5x speedup if I
do a loop like this:

int sums0=0, sums1=0, sums2=0, sums3=0, sums=0;
for (x = 0; x < nums.Length/4; x += 4)
{
    sums0 += nums[x];
    sums1 += nums[x+1];
    sums2 += nums[x+2];
    sums3 += nums[x+3];
}
sums = (sums0+sums1) + (sums2+sums3);

where nums[] is an array of integers. I know this would be hard to
implement in the JIT, but isn't one of the (main) ideas behind the JIT
is the ability to do run-time optimizations for whatever platform the
code is running on?

Thanks,
Austin Ehlers
On Wed, 6 Aug 2003 00:32:41 -0700, "David Notario"
<dn******@online.microsoft.com> wrote:
We've done more perf work in the JIT for our next version than for our
previous version, but we still won't be generating SSE2 or MMX code in our
codegen.

The rationale behind not doing SSE2 was that we didn't have the time to do
vectorizing optimizations. If you use SSE2 for scalar operations, it's not
always faster than the equivalent x87 code in 'normal' code: adds and muls
have different latencies in SSE2 vs. x87 (mul has lower latency in SSE2, but
add is higher, IIRC), plus some operations (casting from doubles to floats
or floats to doubles) are quite slow in SSE2 compared to x87. We also have
to support processors without SSE2.

So, with all these arguments against it, we decided to focus our work on
improving our x87 codegen and leave the door open for an SSE2
implementation, instead of putting all our eggs in the SSE2 basket.


Nov 15 '05 #12
What's the original code? I think you made a mistake in your unrolling and
you are effectively doing 4 times less work (the loop condition should be
x < nums.Length).
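For reference, a sketch of a corrected unrolling along those lines. The variable names follow Austin's post; the array contents here are illustrative. The loop bound must cover the whole array, and a remainder loop handles lengths that are not a multiple of 4:

```csharp
using System;

class UnrollFix
{
    static void Main()
    {
        // Illustrative data: 1003 ones, so the correct sum is 1003.
        int[] nums = new int[1003];
        for (int i = 0; i < nums.Length; i++) nums[i] = 1;

        // Four independent accumulators let the integer pipelines overlap.
        int sums0 = 0, sums1 = 0, sums2 = 0, sums3 = 0;
        int x;
        int limit = nums.Length - nums.Length % 4;  // largest multiple of 4
        for (x = 0; x < limit; x += 4)
        {
            sums0 += nums[x];
            sums1 += nums[x + 1];
            sums2 += nums[x + 2];
            sums3 += nums[x + 3];
        }
        int sums = (sums0 + sums1) + (sums2 + sums3);
        for (; x < nums.Length; x++)   // leftover 0-3 elements
            sums += nums[x];

        Console.WriteLine(sums);       // 1003: every element counted once
    }
}
```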

We do take advantage of some processor features and generate different code
for different processors. We could get better there, though, but we also
have a finite amount of time. Also, any processor specifics add a lot of
work to our QA process.

--
David Notario
Software Design Engineer - CLR JIT Compiler
"Austin Ehlers" <th***********************@hotmail.com> wrote in message
news:59********************************@4ax.com...
Hello,
Is there any work being done on using specific features of a processor
to increase performance? For example, on AMD Athlon XPs there are
multiple integer execution pipelines. I can get roughly a 5x speedup if I
do a loop like this:

int sums0=0, sums1=0, sums2=0, sums3=0, sums=0;
for(x=0;x<nums.Length/4;x+=4)
{
sums0+=nums[x];
sums1+=nums[x+1];
sums2+=nums[x+2];
sums3+=nums[x+3];
}
sums=(sums0+sums1)+(sums2+sums3);

where nums[] is an array of integers. I know this would be hard to
implement in the JIT, but isn't one of the (main) ideas behind the JIT
is the ability to do run-time optimizations for whatever platform the
code is running on?

Thanks,
Austin Ehlers
On Wed, 6 Aug 2003 00:32:41 -0700, "David Notario"
<dn******@online.microsoft.com> wrote:
We've done more perf work in the JIT for our next version than for our
previous version, but we still won't be generating SSE2 or MMX code in our
codegen.

The rationale behind not doing SSE2 was that we didn't have the time to do
vectorizing optimizations. If you use SSE2 for scalar operations, it's not
always faster than the equivalent x87 code in 'normal' code: adds and muls
have different latencies in SSE2 vs. x87 (mul has lower latency in SSE2, but
add is higher, IIRC), plus some operations (casting from doubles to floats
or floats to doubles) are quite slow in SSE2 compared to x87. We also have
to support processors without SSE2.

So, with all these arguments against it, we decided to focus our work on
improving our x87 codegen and leave the door open for an SSE2
implementation, instead of putting all our eggs in the SSE2 basket.

Nov 15 '05 #13
did you mean:

for (x = 0; x < nums.Length/4; x++)
{
    sum += nums[x];
}

for (x = nums.Length/4; x < nums.Length; x += 4)
{
    sums0 += nums[x];
    sums1 += nums[x+1];
    sums2 += nums[x+2];
    sums3 += nums[x+3];
}

sum = sums0 + sums1 + sums2 + sums3;

-Andre

Austin Ehlers wrote:
Hello,
Is there any work being done on using specific features of a processor
to increase performance? For example, on AMD Athlon XPs there are
multiple integer execution pipelines. I can get roughly a 5x speedup if I
do a loop like this:

int sums0=0, sums1=0, sums2=0, sums3=0, sums=0;
for(x=0;x<nums.Length/4;x+=4)
{
sums0+=nums[x];
sums1+=nums[x+1];
sums2+=nums[x+2];
sums3+=nums[x+3];
}
sums=(sums0+sums1)+(sums2+sums3);

where nums[] is an array of integers. I know this would be hard to
implement in the JIT, but isn't one of the (main) ideas behind the JIT
is the ability to do run-time optimizations for whatever platform the
code is running on?

Thanks,
Austin Ehlers
On Wed, 6 Aug 2003 00:32:41 -0700, "David Notario"
<dn******@online.microsoft.com> wrote:

We've done more perf work in the JIT for our next version than for our
previous version, but we still won't be generating SSE2 or MMX code in our
codegen.

The rationale behind not doing SSE2 was that we didn't have the time to do
vectorizing optimizations. If you use SSE2 for scalar operations, it's not
always faster than the equivalent x87 code in 'normal' code: adds and muls
have different latencies in SSE2 vs. x87 (mul has lower latency in SSE2, but
add is higher, IIRC), plus some operations (casting from doubles to floats
or floats to doubles) are quite slow in SSE2 compared to x87. We also have
to support processors without SSE2.

So, with all these arguments against it, we decided to focus our work on
improving our x87 codegen and leave the door open for an SSE2
implementation, instead of putting all our eggs in the SSE2 basket.



Nov 15 '05 #14

This thread has been closed and replies have been disabled. Please start a new discussion.
