Cache replacement algorithms have been widely used in modern computer systems to reduce the number of cache misses. The LRU algorithm has been shown to be an efficient replacement policy in terms of miss rates. However, most of the processors employ a block replacement algorithm which is very simple to implement in hardware or that is an approximation to the true LRU. In this paper, we propose a new implementation of block replacement algorithms in CPU caches by designing the circuitry required to implement an LRU replacement policy in set associative caches. We propose a simple and efficient architecture, Pseudo-FIFO, such that the true LRU replacement algorithm can be implemented without the disadvantages of the traditional implementations. Experimental results show that the Pseudo-FIFO significantly reduces the number of memory cells needed for hardware implementation. Simulation results reveal that our proposed architecture can provide an average value of 26% improvement in the chip area compared to "Reference Matrix" and "Basic Architecture" circuits. Furthermore, it operates about 2.4 times faster than other architectures.