In this paper, we propose several design approaches to extend useful voltage scaling (i.e. voltage scaling with net energy savings) beyond the conventional limit, which is imposed by the rapid increase of leakage energy overhead in ultra low voltage regimes. We are able to achieve such extra voltage scaling and thus energy savings without compromising performance and variability through minimizing the ratio of leakage to dynamic energy in a circuit. Novel design approaches in pipeline, clocking and architecture optimization are investigated; and applied during the design of a 16b 1024pt complex FFT core. The measurement results from the prototyped FFT core in a 65nm CMOS show the energy consumption of 15.8nF/FFF with the clock frequency of 30MHz and the throughput of 240Msamples/s at the supply voltage of 270mV, which exhibits 2.4× higher energy efficiency and >10× higher throughput than the previous low power FFT designs. Measurement of 60 dies shows modest frequency and energy σ/μ spreads of 7% and 2%, respectively.