The increasing demand for programmable platforms that enable high bandwidth communication traffic processing has led to the advent of chip multi-processor (CMP) based multi-threaded network processor (NP) architectures. The CMP based architectures include a multitude of heterogeneous memory units ranging from on-chip register banks, local data memories, and scratch pads to multiple banks of off-chip SRAM and DRAM. Implementation of applications on such complex CMP architectures involves mapping of functionality on processing units, and mapping of data items on the memory units with an objective of maximizing the throughput. This paper presents a system-level methodology that consists of a programming model and optimization techniques for solving the functionality and memory mapping problem on CMP based multi-threaded NP architectures. The proposed techniques are evaluated by implementing three representative NP applications on the Intel IXP2400 processor which belongs to the class of CMP based multi-threaded architectures.