Audio-visual object localization and separation using low-rank and sparsity

Jie Pu, Yannis Panagakis, Stavros Petridis, Maja Pantic

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review

    38 Citations (Scopus)
    23 Downloads (Pure)

    Abstract

    The ability to localize visual objects that are associated with an audio source, while at the same time separating the audio signal, is a cornerstone of several audio-visual signal processing applications. Past efforts usually focused on localizing only the visual objects, without audio separation abilities. Moreover, they often rely on computationally expensive pre-processing steps to segment image pixels into object regions before applying localization approaches. We aim to address the problem of audio-visual source localization and separation in an unsupervised manner. The proposed approach employs low-rank modelling to capture the background visual and audio information, and sparsity to extract the sparsely correlated components between the audio and visual modalities. In particular, the model decomposes each dataset into the sum of two terms: low-rank matrices capturing the uncorrelated background information, and sparse correlated components modelling the sound source in the visual modality and the associated sound in the audio modality. To this end, a novel optimization problem involving the minimization of nuclear norms and matrix ℓ1-norms is solved. We evaluated the proposed method on 1) visual localization and audio separation and 2) visual-assisted audio denoising. The experimental results demonstrate the effectiveness of the proposed method.
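    The low-rank plus sparse decomposition described in the abstract rests on two building blocks: singular value thresholding (the proximal operator of the nuclear norm) and elementwise soft thresholding (the proximal operator of the ℓ1-norm). As a minimal sketch, assuming a single data matrix D, the code below solves the standard robust PCA problem, minimize ||L||_* + λ||S||_1 subject to D = L + S, with the inexact augmented Lagrange multiplier method. This is a swapped-in, single-modality illustration, not the authors' joint audio-visual formulation, which additionally couples the sparse components of the two modalities. The function names, the default λ = 1/sqrt(max(m, n)), and the step-size constants are illustrative assumptions.

```python
import numpy as np


def soft_threshold(x, tau):
    # Proximal operator of tau * ||.||_1: shrink every entry towards zero.
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)


def singular_value_threshold(x, tau):
    # Proximal operator of tau * ||.||_*: shrink the singular values.
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    return (u * np.maximum(s - tau, 0.0)) @ vt


def rpca(d, lam=None, tol=1e-7, max_iter=500):
    """Hypothetical helper: split d into low-rank + sparse parts (inexact ALM)."""
    m, n = d.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(m, n))  # common default weight for the l1 term
    norm_d = np.linalg.norm(d, "fro")
    spectral = np.linalg.norm(d, 2)  # largest singular value of d
    # Dual variable initialised as d / max(||d||_2, ||d||_inf / lam).
    y = d / max(spectral, np.abs(d).max() / lam)
    mu, rho = 1.25 / spectral, 1.5
    mu_max = mu * 1e7
    low = np.zeros_like(d)
    sparse = np.zeros_like(d)
    for _ in range(max_iter):
        # Alternate the two proximal steps on the augmented Lagrangian.
        low = singular_value_threshold(d - sparse + y / mu, 1.0 / mu)
        sparse = soft_threshold(d - low + y / mu, lam / mu)
        residual = d - low - sparse
        y = y + mu * residual  # dual ascent on the constraint d = low + sparse
        mu = min(mu * rho, mu_max)
        if np.linalg.norm(residual, "fro") / norm_d < tol:
            break
    return low, sparse
```

    In an audio-visual setting one might, for example, stack vectorized video frames (or spectrogram frames) as the columns of D, so that the static background falls into the low-rank part and a moving or sounding object into the sparse part; this mapping is likewise an assumption for illustration.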

    Original language: English
    Title of host publication: 2017 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2017 - Proceedings
    Publisher: IEEE
    Pages: 2901-2905
    Number of pages: 5
    ISBN (Electronic): 9781509041176
    Publication status: Published - 16 Jun 2017
    Event: 42nd IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2017 - New Orleans, United States
    Duration: 5 Mar 2017 → 9 Mar 2017
    Conference number: 42
    http://www.ieee-icassp2017.org/

    Conference

    Conference: 42nd IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2017
    Abbreviated title: ICASSP
    Country/Territory: United States
    City: New Orleans
    Period: 5/03/17 → 9/03/17

    Keywords

    • Audio separation
    • Audiovisual localization
    • Low-rank
    • Multi-modal analysis
    • Sparsity
