Abstract: As performance improvements are increasingly sought via coarse-grained parallelism, established expectations of continued sequential performance increases are not being met. Current trends in computing point toward platforms seeking performance improvements through various degrees of parallelism, with coarse-grained parallelism features becoming commonplace in even entry-level systems.

Yet the broad variety of available multiprocessor configurations, which differ in the number of processing elements, will make it difficult to statically create a single parallel version of a program that performs well on the whole range of such hardware. As a result, there will soon be a vast number of multiprocessor systems that are significantly under-utilized for lack of software that harnesses their power effectively. This problem is exacerbated by the growing inventory of legacy programs in binary executable form with possibly unreachable source code.

We present a system that improves the performance of optimized sequential binaries through dynamic recompilation. Leveraging observations made at runtime, a thin software layer recompiles executing code compiled for a uniprocessor and generates parallelized and/or vectorized code segments that exploit available parallel resources. Among the techniques employed are control speculation, loop distribution across several threads, and automatic parallelization of recursive routines.

Our solution is entirely software-based and can be ported to existing hardware platforms that have parallel processing capabilities. Our performance results are obtained on real hardware without using simulation.

In preliminary benchmarks on only modestly parallel (2-way) hardware, our system already provides speedups of up to 40% on SpecCPU benchmarks, and near-optimal speedups on more obviously parallelizable benchmarks.