当前视图
Our implementation does not extend at all side exits. It extends
only if the side exit is for a control-flow branch, and only if the side
exit does not leave the loop. In particular we do not want to extend
a trace tree along a path that leads to an outer loop, because we
want to cover such paths in an outer tree through tree
nesting
.
3.3 Blacklisting
Sometimes, a program follows a path that cannot be compiled
into a trace, usually because of limitations in the implementation.
TraceMonkey does not currently support recording throwing and
catching of arbitrary exceptions. This design trade off was chosen,
because exceptions are usually rare in JavaScript. However, if a
program opts to use exceptions intensively, we would suddenly
incur a punishing runtime overhead if we repeatedly try to record
a trace for this path and repeatedly fail to do so, since we abort
tracing every time we observe an exception being thrown.
As a result, if a hot loop contains traces that always fail, the VM
could potentially run much more slowly than the base interpreter:
the VM repeatedly spends time trying to record traces, but is never
able to run any. To avoid this problem, whenever the VM is about
to start tracing, it must try to predict whether it will finish the trace.
Our prediction algorithm is based on
blacklisting
traces that
have been tried and failed. When the VM fails to finish a trace start-
ing at a given point, the VM records that a failure has occurred. The
VM also sets a counter so that it will not try to record a trace starting
at that point until it is passed a few more times (32 in our imple-
mentation). This
backoff
counter gives temporary conditions that
prevent tracing a chance to end. For example, a loop may behave
differently during startup than during its steady-state execution. Af-
ter a given number of failures (2 in our implementation), the VM
marks the fragment as blacklisted, which means the VM will never
again start recording at that point.
After implementing this basic strategy, we observed that for
small loops that get blacklisted, the system can spend a noticeable
amount of time just finding the loop fragment and determining that
it has been blacklisted. We now avoid that problem by patching the
bytecode. We define an extra no-op bytecode that indicates a loop
header. The VM calls into the trace monitor every time the inter-
preter executes a loop header no-op. To blacklist a fragment, we
simply replace the loop header no-op with a regular no-op. Thus,
the interpreter will never again even call into the trace monitor.
There is a related problem we have not yet solved, which occurs
when a loop meets all of these conditions:
The VM can form at least one root trace for the loop.
There is at least one hot side exit for which the VM cannot
complete a trace.
The loop body is short.
In this case, the VM will repeatedly pass the loop header, search
for a trace, find it, execute it, and fall back to the interpreter.
With a short loop body, the overhead of finding and calling the
trace is high, and causes performance to be even slower than the
basic interpreter. So far, in this situation we have improved the
implementation so that the VM can complete the branch trace.
But it is hard to guarantee that this situation will never happen.
As future work, this situation could be avoided by detecting and
blacklisting loops for which the average trace call executes few
bytecodes before returning to the interpreter.
4. Nested Trace Tree Formation
Figure 7 shows basic trace tree compilation (11) applied to a nested
loop where the inner loop contains two paths. Usually, the inner
loop (with header at
i
2
) becomes hot first, and a trace tree is rooted
at that point. For example, the first recorded trace may be a cycle
T
T
runk T
r
ace
T
r
ee Anchor
T
r
ace Anchor
Br
anch T
r
ace
Guar
d
Side Exit
Figure 5.
A tree with two traces, a trunk trace and one branch
trace. The trunk trace contains a guard to which a branch trace was
attached. The branch trace contain a guard that may fail and trigger
a side exit. Both the trunk and the branch trace loop back to the tree
anchor, which is the beginning of the trace tree.
T
r
ace 2
T
r
ace 1
T
r
ace 2
T
r
ace 1
Closed
Link
ed
Link
ed
Link
ed
Number
Number
String
String
String
String
Boolean
T
r
ace 2
T
r
ace 1
T
r
ace 3
Link
ed
Link
ed
Link
ed
Closed
Number
Number
Number
Boolean
Number
Boolean
Number
Boolean
(a)
(b)
(c)
Figure 6.
We handle type-unstable loops by allowing traces to
compile that cannot loop back to themselves due to a type mis-
match. As such traces accumulate, we attempt to connect their loop
edges to form groups of trace trees that can execute without having
to side-exit to the interpreter to cover odd type cases. This is par-
ticularly important for nested trace trees where an outer tree tries to
call an inner tree (or in this case a forest of inner trees), since inner
loops frequently have initially undefined values which change type
to a concrete value after the first iteration.
through the inner loop,
{
i
2
, i
3
, i
5
, α
}
. The
α
symbol is used to
indicate that the trace loops back the tree anchor.
When execution leaves the inner loop, the basic design has two
choices. First, the system can stop tracing and give up on compiling
the outer loop, clearly an undesirable solution. The other choice is
to continue tracing, compiling traces for the outer loop inside the
inner loop’s trace tree.
For example, the program might exit at
i
5
and record a branch
trace that incorporates the outer loop:
{
i
5
, i
7
, i
1
, i
6
, i
7
, i
1
, α
}
.
Later, the program might take the other branch at
i
2
and then
exit, recording another branch trace incorporating the outer loop:
{
i
2
, i
4
, i
5
, i
7
, i
1
, i
6
, i
7
, i
1
, α
}
. Thus, the outer loop is recorded and
compiled twice, and both copies must be retained in the trace cache.
i
2
i
3
i
4
i
5
i
1
i
6
i
7
t
1
t
2
T
r
ee Call
Out
er T
r
ee
Nes
t
ed T
r
ee
Exit Guar
d
(a)
(b)
Figure 7.
Control flow graph of a nested loop with an if statement
inside the inner most loop (a). An inner tree captures the inner
loop, and is nested inside an outer tree which “calls” the inner tree.
The inner tree returns to the outer tree once it exits along its loop
condition guard (b).
In general, if loops are nested to depth
k
, and each loop has
n
paths
(on geometric average), this na
̈
ıve strategy yields
O
(
n
k
)
traces,
which can easily fill the trace cache.
In order to execute programs with nested loops efficiently, a
tracing system needs a technique for covering the nested loops with
native code without exponential trace duplication.
4.1 Nesting Algorithm
The key insight is that if each loop is represented by its own trace
tree, the code for each loop can be contained only in its own tree,
and outer loop paths will not be duplicated. Another key fact is that
we are not tracing arbitrary bytecodes that might have irreduceable
control flow graphs, but rather bytecodes produced by a compiler
for a language with structured control flow. Thus, given two loop
edges, the system can easily determine whether they are nested
and which is the inner loop. Using this knowledge, the system can
compile inner and outer loops separately, and make the outer loop’s
traces
call
the inner loop’s trace tree.
The algorithm for building nested trace trees is as follows. We
start tracing at loop headers exactly as in the basic tracing system.
When we exit a loop (detected by comparing the interpreter PC
with the range given by the loop edge), we stop the trace. The
key step of the algorithm occurs when we are recording a trace
for loop
L
R
(
R
for loop being recorded) and we reach the header
of a different loop
L
O
(
O
for other loop). Note that
L
O
must be an
inner loop of
L
R
because we stop the trace when we exit a loop.
If
L
O
has a type-matching compiled trace tree, we call
L
O
as
a nested trace tree. If the call succeeds, then we record the call
in the trace for
L
R
. On future executions, the trace for
L
R
will
call the inner trace directly.
If
L
O
does not have a type-matching compiled trace tree yet,
we have to obtain it before we are able to proceed. In order
to do this, we simply abort recording the first trace. The trace
monitor will see the inner loop header, and will immediately
start recording the inner loop.
2
If all the loops in a nest are type-stable, then loop nesting creates
no duplication. Otherwise, if loops are nested to a depth
k
, and each
2
Instead of aborting the outer recording, we could principally merely sus-
pend the recording, but that would require the implementation to be able
to record several traces simultaneously, complicating the implementation,
while saving only a few iterations in the interpreter.
i
2
i
3
i
1
i
6
i
4
i
5
t
2
t
1
t
4
Exit Guar
d
Nes
t
ed T
r
ee
Figure 8.
Control flow graph of a loop with two nested loops (left)
and its nested trace tree configuration (right). The outer tree calls
the two inner nested trace trees and places guards at their side exit
locations.
loop is entered with
m
different type maps (on geometric average),
then we compile
O
(
m
k
)
copies of the innermost loop. As long as
m
is close to 1, the resulting trace trees will be tractable.
An important detail is that the call to the inner trace tree must act
like a function call site: it must return to the same point every time.
The goal of nesting is to make inner and outer loops independent;
thus when the inner tree is called, it must exit to the same point
in the outer tree every time with the same type map. Because we
cannot actually guarantee this property, we must guard on it after
the call, and side exit if the property does not hold. A common
reason for the inner tree not to return to the same point would
be if the inner tree took a new side exit for which it had never
compiled a trace. At this point, the interpreter PC is in the inner
tree, so we cannot continue recording or executing the outer tree.
If this happens during recording, we abort the outer trace, to give
the inner tree a chance to finish growing. A future execution of the
outer tree would then be able to properly finish and record a call to
the inner tree. If an inner tree side exit happens during execution of
a compiled trace for the outer tree, we simply exit the outer trace
and start recording a new branch in the inner tree.
4.2 Blacklisting with Nesting
The blacklisting algorithm needs modification to work well with
nesting. The problem is that outer loop traces often abort during
startup (because the inner tree is not available or takes a side exit),
which would lead to their being quickly blacklisted by the basic
algorithm.
The key observation is that when an outer trace aborts because
the inner tree is not ready, this is probably a temporary condition.
Thus, we should not count such aborts toward blacklisting as long
as we are able to build up more traces for the inner tree.
In our implementation, when an outer tree aborts on the inner
tree, we increment the outer tree’s blacklist counter as usual and
back off on compiling it. When the inner tree finishes a trace, we
decrement the blacklist counter on the outer loop, “forgiving” the
outer loop for aborting previously. We also undo the backoff so that
the outer tree can start immediately trying to compile the next time
we reach it.
5. Trace Tree Optimization
This section explains how a recorded trace is translated to an
optimized machine code trace. The trace compilation subsystem,
NANOJIT
, is separate from the VM and can be used for other
applications.