CNN–LSTM with a soft attention mechanism for human action recognition in videos
Abstract
Keywords
DOI: https://doi.org/10.37537/rev.elektron.5.1.130.2021
Copyright (c) 2021 Carlos Ismael Orozco, María Elena Buemi, Julio Jacobo Berlles

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Revista elektron, ISSN-L 2525-0159
Facultad de Ingeniería. Universidad de Buenos Aires