Temporal Representation Learning on Monocular Videos for 3D Human Pose Estimation
Honari S, Constantin V, Rhodin H, Salzmann M, Fua P (2023)
IEEE Transactions on Pattern Analysis and Machine Intelligence 45(5): 6415-6427.
Journal Article
| Published | English
Download
No files have been uploaded. Publication record only.
Author(s)
Honari, Sina;
Constantin, Victor;
Rhodin, Helge;
Salzmann, Mathieu;
Fua, Pascal
Abstract / Notes
In this article, we propose an unsupervised feature-extraction method to capture temporal information from monocular videos: we detect and encode the subject of interest in each frame and leverage contrastive self-supervised (CSS) learning to extract rich latent vectors. Instead of simply treating the latent features of nearby frames as positive pairs and those of temporally distant ones as negative pairs, as other CSS approaches do, we explicitly disentangle each latent vector into a time-variant component and a time-invariant one. We then show that applying the contrastive loss only to the time-variant features, encouraging a gradual transition between nearby and distant frames, and reconstructing the input extracts rich temporal features that are well suited to human pose estimation. Our approach reduces error by about 50% compared to standard CSS strategies, outperforms other unsupervised single-view methods, and matches the performance of multi-view techniques. When 2D pose is available, our approach can extract even richer latent features and improve 3D pose-estimation accuracy, outperforming other state-of-the-art weakly supervised methods.
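The core idea of the abstract, applying an InfoNCE-style contrastive loss only to the time-variant half of each latent vector, can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: the split index `k`, the choice of the adjacent frame as the positive, the cosine similarity, and the temperature `tau` are all illustrative assumptions.

```python
import numpy as np

def split_latent(z, k):
    # Hypothetical disentanglement: first k dims are time-variant,
    # the remainder time-invariant (the paper learns this split).
    return z[:k], z[k:]

def cosine(a, b):
    # Cosine similarity with a small epsilon for numerical safety.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def temporal_contrastive_loss(latents, anchor, k, tau=0.1):
    """InfoNCE-style loss applied only to the time-variant components:
    the frame adjacent to the anchor acts as the positive pair, while
    all other frames serve as negatives."""
    u = [split_latent(z, k)[0] for z in latents]  # time-variant parts only
    pos = cosine(u[anchor], u[anchor + 1]) / tau
    sims = np.array([cosine(u[anchor], u[j]) / tau
                     for j in range(len(u)) if j != anchor])
    # Standard InfoNCE: -log( exp(pos) / sum_j exp(sim_j) )
    return -pos + np.log(np.exp(sims).sum())
```

In the full method this term would be combined with a reconstruction loss on the input and a smoothness constraint encouraging a gradual transition of the time-variant features between nearby and distant frames.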
Keywords
IEEE Transactions on Pattern Analysis and Machine Intelligence
Year of Publication
2023
Journal Title
IEEE Transactions on Pattern Analysis and Machine Intelligence
Volume
45
Issue
5
Page(s)
6415-6427
ISSN
0162-8828
eISSN
2160-9292, 1939-3539
Page URI
https://pub.uni-bielefeld.de/record/2991918
Cite
Honari S, Constantin V, Rhodin H, Salzmann M, Fua P. Temporal Representation Learning on Monocular Videos for 3D Human Pose Estimation. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2023;45(5):6415-6427.
Honari, S., Constantin, V., Rhodin, H., Salzmann, M., & Fua, P. (2023). Temporal Representation Learning on Monocular Videos for 3D Human Pose Estimation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(5), 6415-6427. https://doi.org/10.1109/TPAMI.2022.3215307
Honari, Sina, Constantin, Victor, Rhodin, Helge, Salzmann, Mathieu, and Fua, Pascal. 2023. “Temporal Representation Learning on Monocular Videos for 3D Human Pose Estimation”. IEEE Transactions on Pattern Analysis and Machine Intelligence 45 (5): 6415-6427.
Honari, S., Constantin, V., Rhodin, H., Salzmann, M., and Fua, P. (2023). Temporal Representation Learning on Monocular Videos for 3D Human Pose Estimation. IEEE Transactions on Pattern Analysis and Machine Intelligence 45, 6415-6427.
Honari, S., et al., 2023. Temporal Representation Learning on Monocular Videos for 3D Human Pose Estimation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(5), pp. 6415-6427.
S. Honari, et al., “Temporal Representation Learning on Monocular Videos for 3D Human Pose Estimation”, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 45, 2023, pp. 6415-6427.
Honari, S., Constantin, V., Rhodin, H., Salzmann, M., Fua, P.: Temporal Representation Learning on Monocular Videos for 3D Human Pose Estimation. IEEE Transactions on Pattern Analysis and Machine Intelligence. 45, 6415-6427 (2023).
Honari, Sina, Constantin, Victor, Rhodin, Helge, Salzmann, Mathieu, and Fua, Pascal. “Temporal Representation Learning on Monocular Videos for 3D Human Pose Estimation”. IEEE Transactions on Pattern Analysis and Machine Intelligence 45.5 (2023): 6415-6427.