Acoustic scene analysis can extract relevant information in applications such as homeland security, surveillance, and environmental monitoring. Wireless sensor networks are of particular interest for monitoring acoustic scenes. Sensors embedded in such a network typically operate under constraints such as low power and limited bandwidth. In this paper, we consider resource-efficient acoustic sensing tasks that extract relevant information and transmit it to a central station, where information assessment can be conducted. We propose a series of acoustic scene analysis tasks that are performed hierarchically. The tasks include sound and speech discrimination, estimation of the number of speakers from the acquired sound, estimation of speaker gender and emotional state, and ultimately voice monitoring and keyword spotting. We apply support vector machine and Gaussian mixture model algorithms to sound features. A real-time implementation is accomplished using Crossbow motes interfaced with a TI DSP board. A series of experiments is presented to characterize the performance of the algorithms under different conditions.
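
The hierarchical organization can be illustrated with a minimal sketch. It is not the implementation that runs on the motes. It assumes that each task is a stage that returns a label and a flag saying whether the next, costlier stage should run, so a silent frame never reaches the speech-level tasks and only a short report needs to be transmitted. All names here (extract_features, run_cascade, STAGES), the two features, and the thresholds are hypothetical. The simple threshold rules stand in for the support vector machine and Gaussian mixture model classifiers.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Features = Dict[str, float]
# A stage classifier returns (label, proceed); proceed=False ends the cascade early.
Classifier = Callable[[Features], Tuple[str, bool]]


def extract_features(frame: Sequence[float]) -> Features:
    """Compute two cheap frame-level features (assumed stand-ins for the sound features).

    The frame is assumed to be non-empty.
    """
    n = len(frame)
    energy = sum(x * x for x in frame) / n
    # Count sign changes between consecutive samples.
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
    return {"energy": energy, "zcr": crossings / max(n - 1, 1)}


def detect_sound(f: Features) -> Tuple[str, bool]:
    # Hypothetical energy threshold; a trained classifier would replace this rule.
    active = f["energy"] > 1e-3
    return ("sound" if active else "silence", active)


def detect_speech(f: Features) -> Tuple[str, bool]:
    # Hypothetical zero-crossing-rate threshold standing in for an SVM or GMM decision.
    is_speech = f["zcr"] < 0.25
    return ("speech" if is_speech else "non-speech", is_speech)


# Ordered from cheapest to costliest; later tasks (speaker count, gender,
# emotional state, keyword spotting) would be appended in the same way.
STAGES: List[Tuple[str, Classifier]] = [
    ("sound_detection", detect_sound),
    ("speech_discrimination", detect_speech),
]


def run_cascade(frame: Sequence[float],
                stages: List[Tuple[str, Classifier]]) -> Dict[str, str]:
    """Run the stages in order and stop as soon as one says not to proceed."""
    feats = extract_features(frame)
    report: Dict[str, str] = {}
    for name, classify in stages:
        label, proceed = classify(feats)
        report[name] = label
        if not proceed:
            break
    return report
```

Under these assumptions, the dictionary returned by run_cascade is the only data a sensor node would send to the central station for a given frame, which is the sense in which the hierarchy saves bandwidth and power.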