In recent years, the mobile phone industry has become one of the most dynamic technology sectors. The increasing demands of multimedia services on the cellular networks have accelerated this trend. This paper presents a low power SIMD architecture that has been tailored for efficient implementation of H.264 encoder/decoder kernel algorithms. Several customized features have been added to improve the processing performance and lower the power consumption. These include support for different SIMD widths to increase the SIMD utilization efficiency, diagonal memory organization to support both column and row access, temporary buffer and bypass support to reduce the register file power consumption, fused operation support to increase the processing performance, and a fast programmable crossbar to support complex data permutation patterns. The proposed architecture increases the throughput of H.264 encoder/decoder kernel algorithms by a factor of 2.13 while achieving 29% of energy-delay improvement on average compared to our previous SIMD architecture, SODA.