Seeing with the sound: Sound-based human-context recognition using machine learning

Wei Wang

Research output: Thesis › PhD Thesis - Research UT, graduation UT



This thesis presents research on using sound to detect human context, i.e. what people are doing or how many people are present.
Many modern applications, such as smart buildings, healthcare, or retail, depend heavily on such human-context information to make our lives better.
Take smart buildings as an example:
energy can be saved by turning lights off when no one is around, and comfort can be improved when the room temperature is automatically adjusted to people's activities.
Numerous sensors are widely used in this domain, such as simple PIR sensors that detect human presence, camera systems that monitor and recognize detailed human actions, and wearable devices that track user movement and identity. None of these sensors and technologies is perfect; each has pros and cons in different respects. For instance, PIR sensors are cheap and non-intrusive but only give binary presence information. Cameras can recognize human activities at a fine-grained level, but are more expensive, pose privacy risks, and require line of sight. Wearable devices are diverse, but all require careful maintenance and are often perceived by users as intrusive and cumbersome.

Our research addresses the above-mentioned challenges by using sound sensors to detect human-context information indoors.
Sound is everywhere and has many advantages: it is rich in information and has no line-of-sight problem.
Audio sensors or microphones are also very suitable for indoor applications as they are cheap, small and easy to install.
On the other hand, sound also has obvious challenges such as noise interference and the overlap of multiple sounds.
In addition, sound-based applications in buildings require further considerations, such as privacy concerns and the resource constraints of devices.
To study and address the impact of noise and overlapping sounds, our research is conducted across different scenarios and datasets, ranging from clean sounds in quiet environments to overlapping sounds in crowded environments.
In order to tackle the challenges in different scenarios, several methodologies are carefully designed and compared.
Alongside performance or accuracy, we also compare memory and time costs to show how well each approach fits resource-constrained devices.
In our research, an innovative lightweight CNN-based model is also proposed to balance performance and complexity.
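The abstract does not detail the lightweight model itself. As a purely illustrative example of one common way to reduce CNN complexity (an assumption here, not necessarily the thesis's actual design), the back-of-the-envelope comparison below shows how replacing a standard convolution with a depthwise-separable one shrinks the parameter count for a hypothetical 64-to-128-channel, 3×3 layer:

```python
def conv2d_params(in_ch, out_ch, k):
    """Weight count of a standard 2-D convolution (bias ignored)."""
    return in_ch * out_ch * k * k

def depthwise_separable_params(in_ch, out_ch, k):
    """One k x k depthwise filter per input channel, then a 1x1 pointwise conv."""
    return in_ch * k * k + in_ch * out_ch

# Hypothetical layer: 64 -> 128 channels, 3x3 kernels.
standard = conv2d_params(64, 128, 3)                 # 73728 weights
separable = depthwise_separable_params(64, 128, 3)   # 8768 weights
print(standard, separable, round(standard / separable, 1))  # ~8.4x fewer parameters
```

Reductions of this order are why such factorized convolutions are a popular building block for models that must run on resource-constrained devices.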
In some experiments, we strip the voice bands from audio inputs to explore the possibilities of using low privacy-risk data while still maintaining high accuracy.
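A minimal sketch of what stripping the voice band could look like; the 300-3400 Hz range, the filter order, and the use of SciPy are assumptions for illustration, not the thesis's actual pipeline:

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def strip_voice_band(audio, sample_rate, low_hz=300.0, high_hz=3400.0):
    """Attenuate the main speech band (illustrative 300-3400 Hz range)
    with a Butterworth band-stop filter, keeping other sound content."""
    sos = butter(4, [low_hz, high_hz], btype="bandstop",
                 fs=sample_rate, output="sos")
    return sosfiltfilt(sos, audio)

# Example: a 1 kHz tone (inside the speech band) is strongly attenuated.
sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 1000 * t)
filtered = strip_voice_band(tone, sr)
```

Removing the speech band in this way degrades speech intelligibility in the recorded signal while leaving much of the non-speech acoustic content available for recognition.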
Original language: English
Qualification: Doctor of Philosophy
Awarding institution:
  • University of Twente
Supervisor: Havinga, Paul J.M.
Award date: 13 Oct 2021
Place of publication: Enschede
Electronic ISBNs: 978-90-365-5293-6
Publication status: Published - 13 Oct 2021


