Comparison of Vitis-AI and FINN for implementing convolutional neural networks on FPGA
DOI: https://doi.org/10.37537/rev.elektron.8.2.200.2024
Keywords: FPGA, CNN, FINN, Vitis-AI, Quantization
Abstract
Convolutional neural networks (CNNs) are essential for image classification and detection, and their implementation in embedded systems is becoming increasingly attractive due to their compact size and low power consumption. Field-Programmable Gate Arrays (FPGAs) have emerged as a promising option thanks to their low latency and high energy efficiency. Vitis AI and FINN are two development environments that automate the implementation of CNNs on FPGAs. Vitis AI uses a Deep Learning Processing Unit (DPU) and memory accelerators, while FINN is based on a streaming architecture with fine-grained control over per-layer parallelization. Both environments apply parameter quantization techniques to reduce memory usage. This work extends previous comparisons by implementing four models with different numbers of layers on the Xilinx Kria KV260 FPGA platform. The complete process from training to evaluation on the FPGA, including quantization and hardware implementation, is described in detail. The results show that FINN provides lower latency, higher throughput, and better energy efficiency than Vitis AI, while Vitis AI stands out for its simplicity in model training and ease of implementation on FPGA. The main finding of this study is that as model complexity increases (with more layers in the neural networks), the differences in performance and energy efficiency between FINN and Vitis AI are significantly reduced.
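The parameter quantization both flows rely on can be illustrated with a minimal uniform symmetric quantization sketch. This is illustrative only: Vitis AI and FINN use their own tooling (the Vitis AI quantizer and Brevitas, respectively), and the function names below are ours, not from either framework.

```python
def quantize_symmetric(weights, bits=8):
    """Uniform symmetric quantization: map floats to signed integers.

    Returns (q, scale) where each q[i] lies in [-(2**(bits-1)-1), 2**(bits-1)-1]
    and w[i] is approximately q[i] * scale. Lowering `bits` shrinks the memory
    footprint of the parameters at the cost of precision.
    """
    qmax = 2 ** (bits - 1) - 1
    scale = max(abs(w) for w in weights) / qmax
    q = [round(w / scale) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float weights from the integer representation."""
    return [v * scale for v in q]


# Example: 8-bit quantization of a small weight vector.
weights = [0.31, -0.82, 0.05, 0.47]
q, s = quantize_symmetric(weights, bits=8)
approx = dequantize(q, s)
```

The worst-case reconstruction error of this scheme is half the scale step, which is why aggressive bit-width reduction (down to binary weights, as in binarized networks) requires quantization-aware training rather than a purely post-hoc mapping.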
License
The authors who publish in this journal agree to the terms established by the Attribution-NonCommercial-NoDerivatives 4.0 International license (CC BY-NC-ND 4.0).