Horizontally partitioned data caches are a popular architectural feature in which the processor maintains two or more data caches at the same level of the memory hierarchy. Horizontally partitioned caches help reduce cache pollution and thereby improve performance. Consequently, most previous research has focused on exploiting horizontally partitioned data caches to improve performance, achieving energy reduction only as a byproduct of performance improvement. In contrast, in this paper we show that optimizing for performance trades off several opportunities for energy reduction. Our experiments on an HP iPAQ h4300-like memory subsystem demonstrate that optimizing for energy consumption results in up to 127% less memory subsystem energy consumption than the performance-optimal solution. Furthermore, we show that the energy-optimal solution incurs on average only a 1.7% performance penalty. Therefore, with energy consumption becoming a first-class design constraint, there is a need for compilation techniques aimed at energy reduction. To achieve the aforementioned energy savings, we propose and explore several low-complexity algorithms aimed at reducing energy consumption, and show that very simple greedy heuristics achieve 97% of the possible memory subsystem energy savings.