This paper presents an ultra-low-power programmable processor architecture for wireless devices that support 4G wireless communications and video decoding. To derive this architecture, we first analyzed the kernel algorithms that constitute these applications. The characteristics of these algorithms led us to a wide-SIMD architecture whose SIMD width can be configured at run time to suit the algorithm being executed. For ultra-low-power operation, we advocate running the processor at near-threshold voltage. Although the combination of near-threshold circuit techniques and parallel SIMD computation achieves excellent energy efficiency, near-threshold operation suffers from large delay variations caused by increased process variability. The paper explores low-overhead architectural techniques to tolerate and mitigate the problems these delay variations cause. The techniques include replicating SIMD functional units to replace faulty ones and using an XRAM crossbar to efficiently set up the new error-free SIMD datapath.
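The idea of replacing faulty SIMD functional units with spares can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function names, the set of faulty units, and the modeling of the crossbar as a simple index list are all assumptions made for illustration. Logical SIMD lanes are assigned to the healthy physical units, and operands are routed through that mapping.

```python
from typing import List, Set


def build_lane_map(num_logical: int, num_physical: int, faulty: Set[int]) -> List[int]:
    """Assign each logical SIMD lane to a healthy physical functional unit.

    Hypothetical helper: the returned list plays the role of a crossbar
    configuration, where entry i is the physical unit serving logical lane i.
    """
    healthy = [fu for fu in range(num_physical) if fu not in faulty]
    if len(healthy) < num_logical:
        raise ValueError("not enough healthy functional units for the requested SIMD width")
    return healthy[:num_logical]


def simd_add(a: List[int], b: List[int], lane_map: List[int], num_physical: int) -> List[int]:
    """Element-wise add routed through the lane map (scatter, compute, gather)."""
    # Scatter: place each logical lane's operands on its assigned physical unit.
    fu_a = [0] * num_physical
    fu_b = [0] * num_physical
    for lane, fu in enumerate(lane_map):
        fu_a[fu] = a[lane]
        fu_b[fu] = b[lane]

    # Compute: only the units named in the lane map do useful work.
    fu_out = [0] * num_physical
    for fu in lane_map:
        fu_out[fu] = fu_a[fu] + fu_b[fu]

    # Gather: return results in logical lane order.
    return [fu_out[fu] for fu in lane_map]


# Assumed example: 4 logical lanes, 6 physical units, units 1 and 4 marked faulty.
lane_map = build_lane_map(num_logical=4, num_physical=6, faulty={1, 4})
result = simd_add([1, 2, 3, 4], [10, 20, 30, 40], lane_map, num_physical=6)
```

In this sketch the two spare units absorb the two faulty ones, so the logical SIMD width stays at four lanes. With more faulty units than spares, the helper raises an error instead of silently narrowing the datapath.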