Natural language guided visual relationship detection

Wentong Liao, Bodo Rosenhahn, Ling Shuai, Michael Ying Yang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review

23 Citations (Scopus)
251 Downloads (Pure)


Reasoning about the relationships between object pairs in images is a crucial task for holistic scene understanding. Most existing works treat this task as a pure visual classification problem: each relationship or phrase is classified into a relation category based on the extracted visual features. However, each kind of relationship involves a wide variety of object combinations, and each pair of objects can have diverse interactions. Obtaining sufficient training samples for all possible relationship categories is difficult and expensive. In this work, we propose a natural language guided framework to tackle this problem. We use a generic bi-directional recurrent neural network to predict the semantic connection between the participating objects in a relationship from the perspective of natural language. The proposed simple method achieves state-of-the-art performance on the Visual Relationship Detection (VRD) and Visual Genome datasets, especially when predicting unseen relationships (e.g., recall improves from 76.42% to 89.79% on the VRD zero-shot test set).
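To illustrate the idea described in the abstract, the following is a minimal sketch (not the authors' code) of a bi-directional recurrent pass over the word embeddings of a (subject, object) pair, whose concatenated forward/backward states score candidate predicate categories. All names, dimensions, toy embeddings, and weights below are illustrative assumptions.

```python
# Hedged sketch of language-guided predicate scoring: run a tiny RNN over
# the (subject, object) word embeddings in both directions, then score
# predicates from the concatenated final states. Everything here is a toy
# stand-in for learned embeddings and weights.
import math

EMB = {  # hypothetical 3-d word embeddings
    "person": [0.9, 0.1, 0.0],
    "horse":  [0.8, 0.0, 0.3],
}
PREDICATES = ["ride", "next to", "wear"]

def rnn_pass(seq, w=0.5, u=0.5):
    """Plain tanh RNN: h_t = tanh(w * mean(x_t) + u * h_{t-1})."""
    h = 0.0
    for x in seq:
        h = math.tanh(w * (sum(x) / len(x)) + u * h)
    return h

def predicate_scores(subj, obj):
    seq = [EMB[subj], EMB[obj]]
    h_fwd = rnn_pass(seq)                   # subject -> object direction
    h_bwd = rnn_pass(list(reversed(seq)))   # object -> subject direction
    feat = [h_fwd, h_bwd]                   # concatenated bi-directional state
    # Hypothetical linear scorer: one weight row per predicate category.
    W = [[1.0, 0.2], [0.3, 0.3], [0.1, 1.0]]
    return {p: sum(wi * fi for wi, fi in zip(row, feat))
            for p, row in zip(PREDICATES, W)}

scores = predicate_scores("person", "horse")
```

In the paper's setting the embeddings and recurrent weights would be learned, which is what lets the model generalize to subject-object combinations never seen with a given predicate at training time.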

Original language: English
Title of host publication: Proceedings - 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2019
Number of pages: 10
ISBN (Electronic): 9781728125060
Publication status: Published - Jun 2019
Event: 32nd IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2019 - Long Beach, United States
Duration: 16 Jun 2019 - 20 Jun 2019
Conference number: 32

Publication series

Name: IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops
ISSN (Print): 2160-7508
ISSN (Electronic): 2160-7516


Conference: 32nd IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2019
Abbreviated title: CVPR 2019
Country/Territory: United States
City: Long Beach

