Multiple-input multiple-output (MIMO) communication systems require sphere decoders that are both fast and accurate. We propose a high-throughput, parallel, soft-output fixed-complexity sphere decoder (PFSD) whose performance is comparable to that of the list fixed-complexity sphere decoder (LFSD) and the K-best sphere decoder. In addition, we propose a parallel QR decomposition algorithm to reduce the preprocessing overhead, and a low-complexity log-likelihood ratio (LLR) algorithm that allows LLR values to be updated in parallel. We demonstrate the bit error rate (BER) and computational complexity advantages of the PFSD algorithm in a 4x4 16-QAM system. The PFSD algorithm has been mapped onto a Xilinx XC4VLX160 FPGA. The resulting decoder produces 8 candidate vectors per clock cycle and achieves up to 75 Mbps throughput for a 4x4 64-QAM configuration at 100 MHz with low control overhead.
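For readers less familiar with fixed-complexity sphere decoding, the following minimal pure-Python sketch illustrates the general idea behind a list-style soft-output FSD in a 4x4 16-QAM setting. It performs QR preprocessing, fully expands the first detected layer, applies single-path successive interference cancellation (SIC) to the remaining layers, and computes max-log LLRs from the resulting candidate list. It is not the PFSD algorithm, its parallel QR decomposition, or its LLR update scheme. The Gray mapping, the natural (unordered) detection order, the modified Gram-Schmidt QR, the noise variance, and the LLR clipping value are all illustrative assumptions, and every function name is hypothetical.

```python
# Hypothetical sketch of generic list-style fixed-complexity sphere decoding
# (FSD) with max-log LLRs for 16-QAM. NOT the PFSD algorithm; the mapping,
# detection order and clipping value are assumptions for illustration.
import itertools

LEVELS = (-3, -1, 1, 3)
# Assumed Gray mapping per real dimension: bit pair -> amplitude
GRAY = {(0, 0): -3, (0, 1): -1, (1, 1): 1, (1, 0): 3}
INV_GRAY = {amp: bits for bits, amp in GRAY.items()}


def qam16(bits):
    """Map 4 bits (b0, b1 -> I; b2, b3 -> Q) to a 16-QAM point."""
    return complex(GRAY[bits[0], bits[1]], GRAY[bits[2], bits[3]])


def bits_of(sym):
    """Inverse mapping: 16-QAM point -> 4-bit tuple."""
    return INV_GRAY[int(sym.real)] + INV_GRAY[int(sym.imag)]


CONSTELLATION = [qam16(b) for b in itertools.product((0, 1), repeat=4)]


def slice_qam(z):
    """Nearest 16-QAM point, decided independently per real dimension."""
    nearest = lambda x: min(LEVELS, key=lambda lvl: abs(x - lvl))
    return complex(nearest(z.real), nearest(z.imag))


def qr_mgs(H):
    """Modified Gram-Schmidt QR of a square complex matrix (list of rows).
    Returns (Qc, R) where Qc[k] is the k-th orthonormal column of Q."""
    n = len(H)
    Qc = [[0j] * n for _ in range(n)]
    R = [[0j] * n for _ in range(n)]
    for k in range(n):
        v = [H[i][k] for i in range(n)]
        for j in range(k):
            R[j][k] = sum(Qc[j][i].conjugate() * v[i] for i in range(n))
            v = [v[i] - R[j][k] * Qc[j][i] for i in range(n)]
        R[k][k] = sum(abs(x) ** 2 for x in v) ** 0.5
        Qc[k] = [x / R[k][k] for x in v]
    return Qc, R


def fsd_llr(H, y, noise_var=1.0, clip=50.0):
    """Return (best candidate vector, max-log LLRs; positive LLR favours bit 0)."""
    n = len(H)
    Qc, R = qr_mgs(H)
    yt = [sum(Qc[k][i].conjugate() * y[i] for i in range(n)) for k in range(n)]
    candidates = []
    for top in CONSTELLATION:            # full expansion of the last layer
        s = [0j] * n
        s[n - 1] = top
        for k in range(n - 2, -1, -1):   # single-path SIC for the other layers
            interf = sum(R[k][j] * s[j] for j in range(k + 1, n))
            s[k] = slice_qam((yt[k] - interf) / R[k][k])
        metric = sum(abs(yt[k] - sum(R[k][j] * s[j] for j in range(k, n))) ** 2
                     for k in range(n))
        candidates.append((metric, s))
    best = min(candidates, key=lambda c: c[0])[1]
    llrs = []
    for ant in range(n):
        for b in range(4):
            d = {0: float("inf"), 1: float("inf")}
            for metric, s in candidates:
                bit = bits_of(s[ant])[b]
                d[bit] = min(d[bit], metric)
            if d[0] == float("inf"):     # no candidate with this bit = 0
                llr = -clip
            elif d[1] == float("inf"):   # no candidate with this bit = 1
                llr = clip
            else:
                llr = max(-clip, min(clip, (d[1] - d[0]) / noise_var))
            llrs.append(llr)
    return best, llrs


# Usage: noiseless 4x4 example with an arbitrary well-conditioned channel
H = [[2.0, 0.3 + 0.1j, 0.2, 0.1j],
     [0.1, 1.8, 0.2j, 0.3],
     [0.2j, 0.1, 2.2, 0.2],
     [0.3, 0.2j, 0.1, 1.9]]
tx_bits = [1, 0, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0]
x = [qam16(tx_bits[4 * a:4 * a + 4]) for a in range(4)]
y = [sum(H[i][j] * x[j] for j in range(4)) for i in range(4)]
best, llrs = fsd_llr(H, y)
```

Because the top layer is always fully expanded and every other layer keeps exactly one survivor, the candidate list has 16 entries for every received vector; this fixed workload, independent of the noise level, is what makes FSD-style detectors attractive for pipelined and parallel hardware.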