unreal. how much theoretical headroom is left to optimize js compiler performanc...

jerf · on Dec 7, 2010

To a first approximation, my answer would be something like http://shootout.alioth.debian.org/u32/benchmark.php?test=all... .

That's not necessarily the whole answer and I imagine JS can't ever quite go that fast. But still....

Raphael_Amiard · on Dec 7, 2010

> and I imagine JS can't ever quite go that fast

I'm not the most knowledgeable person on the subject, but from what i understand, there is no theoretical reason that JS couldn't go that fast. The two languages are more similar than they are different, even if JavaScript is quite more complex.

I remember Mike Pall saying something similar in an LTU thread some time ago.

grayrest · on Dec 7, 2010

> I remember Mike Pall saying something similar in an LTU thread some time ago.

He did and the mozilla guys pointed out how this wasn't the case. The languages are very similar but JS has some weird semantics due to how things are scoped. (ref. the Chakra optimization brouhaha a month ago)

jules · on Dec 7, 2010

Going the other way: what small changes in languages can we make to make them much more optimize-able while at the same time keeping (most of) the expressiveness?

grayrest · on Dec 7, 2010

It's a tradeoff between compatibility with existing JS versus perf/capability enhancements. The ECMAScript committee goes back and forth on this topic. The biggest stride in that direction was strict mode, which is intended to catch most of the low hanging fruit. I know Brendan Eich has mentioned a number of things that could change to get the language faster. I don't, however, know if I read it all in one place or I'm combining nultiple one-off examples.

I'm not confident enough in my memory to write out a probably incorrect list of things I remember. Here's a list of the various things I've read from Brendan in case you're interested in tracking it down:

http://lambda-the-ultimate.org/node/3851#comment-57671

This is the LtU thread my previous comment referred to. It's a large (and fantastic!) thread, but I believe that comment and children are the most direct comments back and forth between Mike and Brendan. Andreas Gal is also on the thread and from Mozilla.

http://www.aminutewithbrendan.com/

Brendan's weekly JS podcast. They're relatively accessible and generally cover a lot of area. A good way to get in the language zeitgeist.

http://brendaneich.com/

Brendan's blog, mostly focuses on ES Harmony stuff and Mozilla specific topics.

SwellJoe · on Dec 7, 2010

Would this imply that Lua has reached some pinnacle of speed and can't go any faster? That seems to be a side effect of your statement.

I'm not familiar with Lua, beyond reading an article or two about it, but does its simplicity imply some sort of maximal efficiency? Are the developers behind Lua simply the best programmers in the world and already have everything figured out with regard to optimizing a JIT? I'm not arguing with you...it does seem like that's a reasonable goal for JavaScript JITs to strive for in the near future. But, it doesn't really answer the question of how much better performance can get (in JavaScript or Lua or any other language). Past performance is not necessarily indicative of future performance when so many people are working on the problem from so many angles.

jws · on Dec 7, 2010

Browsing the alternatives in the shootout dataset, LuaJIT appears to be the fastest of the dynamic languages and feature-wise matches Javascript well enough to be a fair benchmark.

You can gain another factor of 2 or so in speed by going to a static language like C or Ada, but that isn't really a fair comparison and you can see the price paid in code size.

The good news for the web is that there may be another factor of 2 to 3 available for Javascript speedup.

eru · on Dec 8, 2010

You can also go to a static language like OCaml. They don't blow up your code size, but are also fast.

jerf · on Dec 7, 2010

LuaJIT is a major outlier, probably the most surprising result in the entire shootout. In point of fact there isn't much more LuaJIT can do without flat-out exceeding C, which isn't going to happen on the shootout to any significant degree any time soon.

(The conventional "JIT can be faster than compiled code" argument doesn't apply because the problems are accurately known by the author of the shootout benchmark code in advance, so, for instance, if there's a speed advantage to sticking with 'char' where you might have been tempted to write 'int', the C shootout code already does that.)

jules · on Dec 7, 2010

That argument is often made for JITs, but I have never seen a real world example where extra runtime knowledge that JITs have is used that couldn't be done better by a static compiler, except in the cases where runtime code loading is used.

mikemike · on Dec 8, 2010

Alias analysis is a good example. A JIT compiler may speculatively add dynamic disambiguation guards (p1 != p2 ==> p1[i] cannot alias p2[i]). If the assumption turns out to be wrong, the JIT compiler dynamically attaches a side branch to the guard using the new assumption (p1 == p2 ==> p1[i] == p2[i], which is an even more powerful result).

Doing this in a static compiler is hard, because it would have to compile both paths for every such disambiguation possibility. This quickly leads to a code explosion. You'd need very, very smart static PGO to cover this case: there are no branch probabilities to measure, since the compiler doesn't know that inserting such a branch might be beneficial. It may only derive this by running PGO on code which has these branches, which leads to the code explosion again.

Auto-vectorization is another example: a static compiler may have to cover all possible alignments for N input vectors and M output vectors. This can get very expensive, so most static compilers simply don't do it and generate slower, generic code. A JIT compiler can specialize to the runtime alignment and even compile secondary branches in case the alignment changes later on (e.g. a filter fed with different kernel lengths at runtime).

wmf · on Dec 7, 2010

I agree in general, although I will point out that virtually no C developers use PGO while it's on by default in HotSpot and now V8. (Of course, it looks like Java needs PGO just to try to catch up with gcc -O3.)

nimrody · on Dec 8, 2010

Not strictly the same but take a look at the CPU world:

VLIW (compilers try to optimize processing based on static knowledge - e.g. Intel's Itanium) vs. the current Intel CPUs (based on P3/P4 architecture) which dynamically allocate resources depending on runtime knowledge.

Runtime information can help compilers. Just look at profile guided optimizations in current static compilers.

The real trouble in JIT compilers is usually that the target languages semantics are very high level. For example an integer in C is machine sized and is not expanded in size to fit its value -- unlike some dynamic languages.

nl · on Dec 7, 2010

http://weblogs.java.net/blog/2008/03/30/deep-dive-assembly-c...

This link doesn't quite give what you are after (it's mostly about static compilation in the Java HotSpot compiler), but I believe the lock elision features (http://www.ibm.com/developerworks/java/library/j-jtp10185/in...) have to be done at runtime in the JVM (because of late binding).

Obviously this doesn't totally invalidate your argument ("except in the cases where runtime code loading is used"), but it is worth noting that in many languages late binding is normal, and so this is the general case.

Also, HP's research Dynamo project "inadvertently" became practical. Programs "interpreted" by Dynamo are often faster than if they were run natively. Sometimes by 20% or more. http://arstechnica.com/reviews/1q00/dynamo/dynamo-1.html

mikemike · on Dec 8, 2010

This is a common misinterpretation of the Dynamo paper: they compiled their C code at the _lowest_ optimization level and then ran the (suboptimal) machine code through Dynamo. So there was actually something left to optimize.

Think about it this way: a 20% difference isn't unrealistic if you compare -O1 vs. -O3.

But it's completely unrealistic to expect a 20% improvement if you'd try this with the machine code generated by a modern C compiler at the highest optimization level.

kragen · on Dec 8, 2010

I think http://shootout.alioth.debian.org/u32/benchmark.php?test=all... is probably a better ceiling. LuaJIT is fantastic and already generates better code than GCC in some cases, but in many others it does not.

There's no theoretical limit to how close a compiler can come to a programmer when it comes to generating machine code to do a particular well-defined task.

seanalltogether · on Dec 7, 2010

Javascript execution is very rarely the bottleneck on webpages. The bottleneck is almost 90% render speed of dom updates.