Abstract
Detecting human-related crimes in surveillance videos is an increasingly difficult challenge, especially when the human actions involved closely resemble one another. In this work, we propose a transformer-based model that introduces an inductive bias by incorporating a Tubelet embedder module, a 3D convolutional layer. The aim is to capture spatiotemporal embeddings from skeletal trajectories extracted from videos using 3D convolutional operations. Our experiments on the Human-Related Crime dataset show that tubelet embeddings maintain performance competitive with the state of the art (49% accuracy) while considerably reducing the computational complexity of the model.
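To make the tubelet idea concrete, the following is a minimal NumPy sketch of a tubelet embedding, not the authors' implementation. It relies on the fact that a 3D convolution whose stride equals its kernel size is the same as cutting the input into non-overlapping blocks (tubelets) and applying one shared linear projection to each. The function name `tubelet_embed`, the `(C, T, H, W)` input layout, and all sizes below are assumptions chosen for illustration; the abstract does not say how the skeletal trajectories are arranged into a volume.

```python
import numpy as np


def tubelet_embed(x, weight, bias, tubelet):
    """Non-overlapping 3D convolution (kernel == stride) as reshape + matmul.

    x:       (C, T, H, W) input volume (hypothetical layout)
    weight:  (D, C, t, h, w) convolution kernels, D = embedding dimension
    bias:    (D,)
    tubelet: (t, h, w) tubelet size; must divide (T, H, W)
    returns: (N, D) tokens, with N = (T // t) * (H // h) * (W // w)
    """
    C, T, H, W = x.shape
    t, h, w = tubelet
    if T % t or H % h or W % w:
        raise ValueError("tubelet size must divide the input dimensions")
    nt, nh, nw = T // t, H // h, W // w
    # Split each axis into (number of tubelets, extent within a tubelet).
    patches = x.reshape(C, nt, t, nh, h, nw, w)
    # Move the tubelet indices to the front: (nt, nh, nw, C, t, h, w).
    patches = patches.transpose(1, 3, 5, 0, 2, 4, 6)
    # One flattened row per tubelet.
    patches = patches.reshape(nt * nh * nw, C * t * h * w)
    # Flatten each kernel in the same (C, t, h, w) order.
    kernels = weight.reshape(weight.shape[0], -1)
    return patches @ kernels.T + bias


# Illustrative sizes only (assumed, not taken from the paper).
rng = np.random.default_rng(0)
x = rng.standard_normal((2, 8, 4, 6))          # C=2, T=8, H=4, W=6
weight = rng.standard_normal((32, 2, 2, 2, 3))  # D=32, tubelet 2x2x3
bias = rng.standard_normal(32)
tokens = tubelet_embed(x, weight, bias, (2, 2, 3))
print(tokens.shape)  # (16, 32)
```

With these assumed sizes, the 8 x 4 x 6 = 192 input positions collapse into 16 tokens. Because self-attention cost grows quadratically with the number of tokens, a shorter token sequence is one plausible source of the reduced computational complexity the abstract reports.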
Original language | English |
---|---|
Title of host publication | 2024 IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS) |
Publisher | IEEE |
ISBN (Electronic) | 9798350374285 |
Publication status | Published - 18 Sept 2024 |
Event | 20th IEEE International Conference on Advanced Video and Signal-Based Surveillance, AVSS 2024, Niagara Falls, Canada. Duration: 15 Jul 2024 → 16 Jul 2024. Conference number: 20 |
Conference
Conference | 20th IEEE International Conference on Advanced Video and Signal-Based Surveillance, AVSS 2024 |
---|---|
Abbreviated title | AVSS 2024 |
Country/Territory | Canada |
City | Niagara Falls |
Period | 15/07/24 → 16/07/24 |
Keywords
- 2024 OA procedure