Time Coherent Full-Body Poses Estimated Using Only Five Inertial Sensors: Deep versus Shallow Learning

Frank J. Wouda, Matteo Giuberti, Nina Rudigkeit, Bert-Jan F. van Beijnum, Mannes Poel, Peter H. Veltink

Research output: Contribution to journal › Article › Academic › peer-review

Abstract

Full-body motion capture typically requires sensors/markers to be placed on each rigid body segment, which results in long setup times and is obtrusive. The number of sensors/markers can be reduced using deep learning or offline methods. However, this requires large training datasets and/or sufficient computational resources. Therefore, we investigate the following research question: "What is the performance of a shallow approach, compared to a deep learning one, for estimating time coherent full-body poses using only five inertial sensors?". We propose to incorporate past/future inertial sensor information into a stacked input vector, which is fed to a shallow neural network for estimating full-body poses. Shallow and deep learning approaches are compared using the same input vector configurations. Additionally, the inclusion of acceleration input is evaluated. The results show that a shallow learning approach can estimate full-body poses with a similar accuracy (~6 cm) to that of a deep learning approach (~7 cm). However, the jerk errors are smaller using the deep learning approach, which can be the effect of explicit recurrent modelling. Furthermore, it is shown that the delay using a shallow learning approach (72 ms) is smaller than that of a deep learning approach (117 ms).
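The stacked input vector described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The function names (`stack_windows`, `lookahead_delay_ms`, `jerk`), the 60 Hz sampling rate, the window sizes and the per-sensor feature count are all hypothetical. The sketch shows three things. First, each time step's input concatenates `n_past` past frames, the current frame and `n_future` future frames. Second, needing future frames imposes a buffering delay of `n_future / fs`. This covers look-ahead only, not the total delays reported in the paper. Third, jerk, the third time derivative of position, can be estimated with finite differences.

```python
import numpy as np


def stack_windows(frames: np.ndarray, n_past: int, n_future: int) -> np.ndarray:
    """Stack past, current and future frames into one input vector per time step.

    frames: array of shape (T, D), one row of D sensor features per sample.
    Returns an array of shape (T - n_past - n_future, (n_past + 1 + n_future) * D).
    Row i is centred on original frame t = i + n_past.
    """
    if frames.ndim != 2:
        raise ValueError("frames must have shape (T, D)")
    T = frames.shape[0]
    if T < n_past + 1 + n_future:
        raise ValueError("sequence is shorter than the stacking window")
    # Flatten each window (past..future) into a single row vector.
    rows = [frames[t - n_past : t + n_future + 1].reshape(-1)
            for t in range(n_past, T - n_future)]
    return np.stack(rows)


def lookahead_delay_ms(n_future: int, fs_hz: float) -> float:
    """Buffering delay caused by waiting for n_future future samples.

    Computation time is not included.
    """
    return 1000.0 * n_future / fs_hz


def jerk(positions: np.ndarray, fs_hz: float) -> np.ndarray:
    """Third time derivative of positions (T, 3) via third-order finite differences."""
    # np.diff(n=3) gives the third difference; dividing by dt**3 equals multiplying by fs**3.
    return np.diff(positions, n=3, axis=0) * fs_hz ** 3


if __name__ == "__main__":
    fs_hz = 60.0  # hypothetical sampling rate
    # Hypothetical input: 600 samples, 5 sensors x 12 features per sensor.
    frames = np.random.default_rng(0).normal(size=(600, 5 * 12))
    X = stack_windows(frames, n_past=10, n_future=3)
    print(X.shape)  # (587, 840)
    print(lookahead_delay_ms(3, fs_hz))  # 50.0
```

Stacking makes the temporal context explicit in the input, so a shallow (non-recurrent) network can use it. A recurrent model such as an LSTM instead carries that context in its hidden state.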

Original language: English
Number of pages: 17
Journal: Sensors (Basel, Switzerland)
Volume: 19
DOI: 10.3390/s19173716
Publication status: Published - 27 Aug 2019

Keywords

  • deep learning
  • human movement
  • inertial motion capture
  • LSTM
  • machine learning
  • neural networks
  • pose estimation
  • reduced sensor set
  • time coherence

Cite this

@article{ab883fa903ac41fd97443661a9773b49,
title = "Time Coherent Full-Body Poses Estimated Using Only Five Inertial Sensors: Deep versus Shallow Learning",
abstract = "Full-body motion capture typically requires sensors/markers to be placed on each rigid body segment, which results in long setup times and is obtrusive. The number of sensors/markers can be reduced using deep learning or offline methods. However, this requires large training datasets and/or sufficient computational resources. Therefore, we investigate the following research question: {"}What is the performance of a shallow approach, compared to a deep learning one, for estimating time coherent full-body poses using only five inertial sensors?{"}. We propose to incorporate past/future inertial sensor information into a stacked input vector, which is fed to a shallow neural network for estimating full-body poses. Shallow and deep learning approaches are compared using the same input vector configurations. Additionally, the inclusion of acceleration input is evaluated. The results show that a shallow learning approach can estimate full-body poses with a similar accuracy ($\sim$6 cm) to that of a deep learning approach ($\sim$7 cm). However, the jerk errors are smaller using the deep learning approach, which can be the effect of explicit recurrent modelling. Furthermore, it is shown that the delay using a shallow learning approach (72 ms) is smaller than that of a deep learning approach (117 ms).",
keywords = "deep learning, human movement, inertial motion capture, LSTM, machine learning, neural networks, pose estimation, reduced sensor set, time coherence",
author = "Wouda, {Frank J.} and Matteo Giuberti and Nina Rudigkeit and {van Beijnum}, {Bert-Jan F.} and Mannes Poel and Veltink, {Peter H.}",
year = "2019",
month = "8",
day = "27",
doi = "10.3390/s19173716",
language = "English",
volume = "19",
journal = "Sensors (Switzerland)",
issn = "1424-8220",
publisher = "Multidisciplinary Digital Publishing Institute",

}

Time Coherent Full-Body Poses Estimated Using Only Five Inertial Sensors: Deep versus Shallow Learning. / Wouda, Frank J.; Giuberti, Matteo; Rudigkeit, Nina; van Beijnum, Bert-Jan F.; Poel, Mannes; Veltink, Peter H.

In: Sensors (Basel, Switzerland), Vol. 19, 27.08.2019.

TY  - JOUR

T1  - Time Coherent Full-Body Poses Estimated Using Only Five Inertial Sensors

T2  - Deep versus Shallow Learning

AU  - Wouda, Frank J.

AU  - Giuberti, Matteo

AU  - Rudigkeit, Nina

AU  - van Beijnum, Bert-Jan F.

AU  - Poel, Mannes

AU  - Veltink, Peter H.

PY  - 2019/8/27

Y1  - 2019/8/27

N2  - Full-body motion capture typically requires sensors/markers to be placed on each rigid body segment, which results in long setup times and is obtrusive. The number of sensors/markers can be reduced using deep learning or offline methods. However, this requires large training datasets and/or sufficient computational resources. Therefore, we investigate the following research question: "What is the performance of a shallow approach, compared to a deep learning one, for estimating time coherent full-body poses using only five inertial sensors?". We propose to incorporate past/future inertial sensor information into a stacked input vector, which is fed to a shallow neural network for estimating full-body poses. Shallow and deep learning approaches are compared using the same input vector configurations. Additionally, the inclusion of acceleration input is evaluated. The results show that a shallow learning approach can estimate full-body poses with a similar accuracy (~6 cm) to that of a deep learning approach (~7 cm). However, the jerk errors are smaller using the deep learning approach, which can be the effect of explicit recurrent modelling. Furthermore, it is shown that the delay using a shallow learning approach (72 ms) is smaller than that of a deep learning approach (117 ms).

AB  - Full-body motion capture typically requires sensors/markers to be placed on each rigid body segment, which results in long setup times and is obtrusive. The number of sensors/markers can be reduced using deep learning or offline methods. However, this requires large training datasets and/or sufficient computational resources. Therefore, we investigate the following research question: "What is the performance of a shallow approach, compared to a deep learning one, for estimating time coherent full-body poses using only five inertial sensors?". We propose to incorporate past/future inertial sensor information into a stacked input vector, which is fed to a shallow neural network for estimating full-body poses. Shallow and deep learning approaches are compared using the same input vector configurations. Additionally, the inclusion of acceleration input is evaluated. The results show that a shallow learning approach can estimate full-body poses with a similar accuracy (~6 cm) to that of a deep learning approach (~7 cm). However, the jerk errors are smaller using the deep learning approach, which can be the effect of explicit recurrent modelling. Furthermore, it is shown that the delay using a shallow learning approach (72 ms) is smaller than that of a deep learning approach (117 ms).

KW  - deep learning

KW  - human movement

KW  - inertial motion capture

KW  - LSTM

KW  - machine learning

KW  - neural networks

KW  - pose estimation

KW  - reduced sensor set

KW  - time coherence

UR  - http://www.scopus.com/inward/record.url?scp=85071625409&partnerID=8YFLogxK

U2  - 10.3390/s19173716

DO  - 10.3390/s19173716

M3  - Article

VL  - 19

JO  - Sensors (Switzerland)

JF  - Sensors (Switzerland)

SN  - 1424-8220

ER  - 