Three-dimensional (3D) ultrasound is becoming common for non-invasive medical imaging because of its high accuracy, safety, and ease of use. Unlike other modalities, ultrasound transducers require little power, which makes hand-held imaging platforms possible, and several low-resolution 2D devices are commercially available today. However, the extreme computational requirements (and associated power requirements) of 3D ultrasound image formation have, to date, precluded hand-held 3D-capable devices. We describe the Sonic Millip3De, a new system architecture and accelerator for 3D ultrasound beamformation, the most computationally intensive aspect of image formation. Our three-layer die-stacked design features a custom beamsum accelerator that employs massive data parallelism and a streaming transform-select-reduce pipeline architecture enabled by our new iterative beamsum delay calculation algorithm. Based on RTL-level design and floorplanning for an industrial 45nm process, we show that Sonic Millip3De can enable 3D ultrasound with a fully sampled 128×96 transducer array within a 16W full-system power budget (400× less than a conventional DSP solution) and will meet a 5W safe power target by the 11nm node.