Power optimization of combinational circuits mapped on LUT-based FPGAS.
Bucur, Ion ; Stefanescu, Costin ; Cupcea, Nicolae 等
1. INTRODUCTION
Power consumption is becoming one of the most important
considerations in VLSI design. High power often run hot and high
temperature tends to exacerbate several silicon failure mechanisms. In
90nm technology, dynamic power is still the leading power cause in
FPGAs. Therefore, in addition to performance and area optimization a
great deal of research has been directed towards issues related to the
low power FPGA circuit mapping. Primary focus in this paper is on
searching optimized solutions primarily for depth, and dynamic power at
the logic level. Secondary target is on minimal depth, area and dynamic
power at the same level. It is introduced a new power optimization
mapper tool for K-LUT based FPGA.
2. RELATED WORK
There have been done several works for decreasing the power
consumption in circuits mapped with FPGAs. Farrahi and Sarrafzadeh
studied the technology mapping problem for lookup table-based FPGAs
(Farrahi & Sarrafzadeh, 1994). The problem is formulated as
assigning LUTs to nodes of a circuit so as to minimize the total
estimated power consumption. They did show that the decision version of
this problem is NP-complete, even for simple classes of inputs such as
3-level circuits. Anderson and Najm proposed a power-aware technology
mapping technique for LUT-based FPGAs which aims to keep nets with high
switching activity out of the FPGA routing network and takes an
activity-conscious approach to logic replication (Anderson & Najm,
2004).
3. FPGA SPECIFIC POWER ISSUES
FPGAs circuits are developed in CMOS technology. Power dissipation in CMOS circuits comprises both static (leakage) power and dynamic
(functional and spurious) power. Dynamic power is caused by transitions
on signals of a circuit and is governed by the equation:
[P.sub.d] = 0.5 [[summation].sub.i] [C.sub.i] x [V.sup.2.sub.dd] x
d(i) (1)
Where [C.sub.i] is the capacitance of a signal, i; d(i) is referred
to as "transition activity" of logic signal i and represents
the rate of transitions on signal i (i.e. the number of times that
signal i changes its value in unit time) ; and [V.sub.dd] is the supply
voltage. The programmability and flexibility of FPGAs are making them
less power-efficient than custom ASICs when considering the
implementation of a given logic circuit (Ho et al., 2008).
4. DEFINITIONS AND MAIN MODEL
Combinational part of a general logic network N can be represented
as a direct acyclic graph (DAG) noted as N(V, E) where V is the set of
nodes and E is the set of directed edges. Each node in N(V, E)
represents a logical gate (possible complex), and a direct edge (u, v)
exists if the output of gate u is an input of the gate v. The set of
direct predecessors of gate v is expressed as input(v) and the set of
direct predecessors of a graph H [subset or equal to] N(V, E), is
similar expressed as input(H). The level of each node u is the largest
level of the predecessors of it, incremented. The level of a PI node is
zero and the level (depth) of a network is the largest node level in the
network. Various approaches to computing dynamic switching activity have
been proposed in the literature, and they can be, considered as either
simulation-based approaches, or as probability approaches. Probability
approach, in networks where there are re-convergent fanouts, is
computing an upper bound of line probability. This technique introduces
errors and do not detect logic spurious activity. In this paper is used
simulation-based approach for 15 combinational circuits of the MCNC benchmark. It was designed and implemented the simulator used for
simulation-based logic activity analysis. This simulator has
capabilities for capturing the number of logic transitions on each net
during simulation, as well as the proportion of time each net spends in
the high and low logic states. Circuits are simulated using almost 10
000 randomly generated input vectors. Each time an internal line u is
incrementing the logic activity (the number of changes from 1 to 0 or
from 0 to 1), it is evaluated the expression:
[absolute value of n(u)/N - n(u) + 1/N+M] [less than or equal to]
[epsilon] (2)
In (2) was noted with N the number of simulated vectors after that
the number of logic changes, on an arbitrary line u, became n(u). Let be
N + M the number of simulated vectors when the number of logic changes,
on line w, is incremented (n(u) + 1). From (2) it results that:
[lim.sub.N.[less than or equal to][infinity]] n(u)/N = d(u) (3)
Parameter e (1 >> [epsilon] >0) in (2) is introduced at
beginning of the simulation and it represents the precision of
simulated-based logic activity determination. Simulator is checking when
all internal lines and primary output lines of the circuit fulfill
condition (2) and ends the simulation. Computed dynamic activity is
based on identical switching activities (0.5) for each primary input. In
K-LUT based FPGAs circuits the essential dynamic power consumption is
caused by transitions that take place at the inputs and outputs of LUTs.
Nets having the greatest transition density have to be, mostly, hidden
in LUTs.
5. IMPLEMENTD ALGORITHM
The application, named PwOptMap, is built-up using structures and
routines of SIS-1.2 (*** 1994) and appropriate cost functions for
power-aware minimal depth mapping. It was used the K-feasible-cones
generation method (using K =5). The K-feasible cones generation is
applyed during network traverse from primary inputs to primary outputs
(Bucur 2007).
Next step, same network is traversed from primary outputs to the
primary inputs, starting with the primary outputs having largest mapped
depth. The apropriate selection among the K-feasible cones of each node
is guided using critical path and several appropriate costs. These costs
are using data related to the number of internal nodes of each cone, the
count of internal nodes having fan-outs in other feasible cones, etc.
Depth Metric (DpthMtr) for each node u is computed over the best
depth K-feasible cone of u (denoted c(u)):
DpthMtr(c(u)) = 1 + min (ppthMtr(v|v [member of] c(u))) (4)
This metric is used to quantify the network depth.
The dissipated power is quantified using the estimated power cost
(EPC). It is locally applied for the entire set of K-feasible cones
(denoted C(u)), of each node u and it is computed using the expression:
EPC(c(u))= [min.sub.c(u)[epsilon]c(u){[[summation].sub.v[member
of]Input(c(u))] d(v) x fanout(v) x EPC(C(v))} (5)
Globally used cost, of each node u, is defined as a weight sum of
the depth metric and of the estimated power cost:
globalCost = [[mu].sub.1] x DpthMtr{c{u)) + [[mu].sub.2] x
EPC(c(u)) (6)
6. EXPERIMENTAL RESULTS
The two parameters, [[mu].sub.1] and [[mu].sub.2], in (6) were
experimentally determined. Best results were obtained when [[mu].sub.2]
<< [[mu].sub.1] reflecting the preference for optimizing depth
over power. Our implemented algorithm did run for mapping into 5-LUT
FPGAs several benchmark circuits, listed in Tab. 1. To estimate power
consumption using (1) it is required the capacitance of each net. Our
attempt was to build-up a tool able to evaluate medium-grain different
mapping choices during logic optimization, the estimated dynamic power
(EDP) for each node u was simply computed as the product of the
node's transition density d(u) and the fan-out of it:
EDP(u) = d(u) x fanoutfu) (7)
PwOptMap is an efficient algorithm being able to compute several
low-power optimal options, as can be seen in Table 1. The first option
keeps optimum depth and finds power-aware equivalent solutions.
The second option is searching, on the user's option, one of
the solutions with optimal depth but performing with improved power
consumption. For nodes situated on the critical paths of irrespective
networks the optimal depth was computed using this relation:
[optimalDepth(u).sub.u[member of]crltlcalPath] = optimumDepth +
[delta] (8)
Values listed in the second column of Tab. 1 were computed using
[delta] =1 for nodes belonging to the critical path, while for other
nodes, not situated on critical paths, the optimal depth values were at
most less or equal to the optimal depth of the circuit.
Area minimization is of vital importance for FPGA mapping. Mapping
with optimum depth is searched a power-aware solution having minimal
area (number of used LUTs). The third solution targets an optimal area
and depth while keeping in low margin the dissipated power (illustrated
in the third column of Tab. 1).
7. CONCLUSION AND FUTURE RESEARCH
On average, the detailed experimental results are showing that
power-aware mapping for optimal depth the estimated dissipated power is
7.54% less than mapping for optimum depth. Concluding, relaxing mapping
conditions for circuits' depth it is leading to less dissipated
power. Future research will include implementation of elaborate
capacitance models (Anderson & Najm, 2004) and improved cost
functions.
8. REFERENCES
Anderson, J.H. & Najm, F.N. (2004). Power Estimation Techniques
for FPGAs. IEEE Tran. on VLSI Systems, Vol. 12, No. 10, (Oct. 2004), pp.
1015-1027, ISSN: 1063-8210
Bucur, I. (2007). Performance mapping of k-LUT FPGAs. University
Politehnica of Bucharest Scientific Bulletin, Series C: Electrical
Engineering, Vol. 69, No. 2, (2007), pp. 49-60, ISSN: 1454-234x
Farrahi, A.H. & Sarrafzadeh, M. (1994). FPGA technology mapping
for power minimization, In: Field-Programmable Logic Architectures,
Synthesis and Applications, Hartenstein, R.W. & Servit, M.Z., (Eds),
pp.66-77, Springer-Verlag, ISBN:3-540-58419-6, Berlin / Heidelberg
Ho, C.H.; Leong, P.H.W.; Luk W. & Wilton, S.J.E. (2008). Rapid
estimation of Power Consumption for Hybrid FPGAs, Available from:
http://www.cse.cuhk.edu.hk/~phwl
/mt/public/archives/papers/hpwr_fpl08.pdf Accessed: 2009-07-14
*** (1994). SIS-1.2, Synthesis of both synchronous and asynchronous sequential circuits, Available from:
http://embedded.eecs.berkeley.edu/pubs/downloads/sis/inde x.htm,
Accessed on: 2009-07-14
Tab. 1. Experimental results of PwOptMap mapping tool
Estimated Dynamic Power
Circuit Depth Optimum Optimal Optimal Depth
Depth Depth & Optimal Area
5xp1 3 2,92 2,01 1,56
9symml 5 3,89 2,91 2,65
C499 5 10,76 8,22 7,96
C880 8 16,69 14,67 14,04
alu2 8 14,05 12,89 12,25
apex6 4 22,62 20,58 20,16
apex7 4 10,45 8,92 8,43
count 3 3,52 3,04 2,50
des 5 99,67 96,11 97,44
duke2 4 9,21 8,64 8,93
misex1 2 2,21 2,11 2,19
rd84 4 4,58 4,12 4,47
rot 6 35,51 35,03 34,73
vg2 4 4,32 3,01 3,95
z4ml 3 1,49 1,38 1,32
241,89 223,64 222,58