Network processor architectures incorporate block multi-threading to alleviate the performance degradation caused by memory access latencies. Designing applications for such architectures requires determining the number of threads and mapping data items to the various memory elements so that overall throughput is maximized. This paper presents a quasi-polynomial time approximation algorithm for the multi-threading aware data mapping problem, which can be shown to be NP-complete. The algorithm generates solutions with throughput no less than 1/2(1 + ε) of the optimal and data memory requirements no more than (1 + ε) times the memory constraints. Experimental results obtained by mapping applications onto the Intel IXP 2400 network processor demonstrate that the algorithm generates solutions whose throughput is within 80% of the optimal when ε = 0.5.