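Chinese Abstract |
The recent rise of deep learning has accelerated the development of artificial intelligence (AI) applications. Industry and academia are racing to use the latest high-performance heterogeneous computing systems to rapidly explore the design of deep learning algorithms, to solve important scientific and engineering problems, and to build the many commercial and everyday applications that follow from them. Consequently, using large-scale computing clusters that integrate the latest hardware accelerators to train accurate deep learning models on big data in the shortest possible time, and to deploy those models promptly to data centers and the Internet of Things, has become a core piece of system infrastructure for developing AI applications, and it involves many complex and challenging research issues. This article introduces the performance measurement and analysis techniques developed by our research team, which provide important support for building and optimizing such high-performance distributed deep learning systems. These techniques allow system developers to analyze the distributed training performance of various neural network models on commonly used deep learning frameworks, to investigate the factors that may limit performance, and to offer suggestions for performance optimization in areas such as system architecture design, model parameter update mechanisms, and compiler options.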
Deep learning has recently been widely used to develop artificial intelligence (AI) applications. Academia and industry are racing to design new deep learning algorithms, solve important scientific and engineering problems, and develop business and everyday applications on high-performance heterogeneous computing systems. Hence, the system infrastructure that accelerates the development and deployment of deep learning applications on large-scale computing clusters with state-of-the-art accelerators is not only critical but also presents many complex and challenging research opportunities. In this article, we introduce our research on performance measurement and analysis techniques, which provide essential information for constructing such infrastructure and enable system designers to examine the factors that may affect the performance of various neural network models on mainstream deep learning frameworks. We show how our tools can be used to investigate performance bottlenecks and to optimize system performance by adjusting the system architecture, the parameter update mechanism, and the compiler options. |