English Abstract
Activation functions provide the nonlinear mapping of data in artificial neural network computation. In practice, the choice of activation function and of its gradient and translation factors directly affects network convergence. Usually, these parameters are determined by trial and error. In this work, the Cauchy distribution function (Cauchy), the Laplace distribution function (Laplace), and the Gaussian error function (Erf) were used as new activation functions for the back-propagation (BP) algorithm, and their effects were compared with those of the sigmoid function (Logsig), the hyperbolic tangent function (Tansig), and the normal distribution function (Normal). The XOR problem was used in simulation experiments to evaluate the effects of these six activation functions on network convergence and to determine their optimal gradient and translation factors. The results show that the gradient factor and the initial weights significantly affect convergence. The optimal gradient factors for Laplace, Erf-Logsig, Tansig-Logsig, Logsig, and Normal were 0.5, 0.5, 4, 2, and 1, respectively, and the corresponding optimal intervals were [0.5, 1], [0.5, 2], [2, 6], [1, 4], and [1, 2]. With their optimal gradient factors, convergence speed ranked in the order Laplace, Erf-Logsig, Tansig-Logsig, Logsig, and Normal. Logsig (gradient factor = 2), Tansig-Logsig (gradient factor = 4), Normal (translation factor = 0, gradient factor = 1), Erf-Logsig (gradient factor = 0.5), and Laplace (translation factor = 0, gradient factor = 0.5) were less sensitive to the initial weights, so their convergence performance was less affected by weight initialization. As the slope of the activation function curve increased, network convergence tended to accelerate. These conclusions from the simulation analysis can serve as a reference for selecting activation functions for BP-based feedforward neural networks.
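Since the abstract does not give the exact parameterization, the Python sketch below (not the authors' code) shows one common way to attach a gradient factor `lam` (slope/scale) and a translation factor `c` (horizontal shift) to each of the base activation families named above, followed by a minimal BP run on the XOR problem using Logsig with gradient factor 2. The network size (2-2-1), learning rate, weight range, and the fact that only the base Erf is shown (rather than the combined Erf-Logsig or Tansig-Logsig forms used in the study) are assumptions for illustration only.

```python
# Sketch of gradient/translation-factor parameterized activation functions
# and a tiny BP network on XOR. Assumed forms, not taken from the paper.
import numpy as np
from scipy.special import erf as _erf

def logsig(x, lam=1.0):
    """Sigmoid: 1 / (1 + exp(-lam * x)), range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-lam * x))

def tansig(x, lam=1.0):
    """Hyperbolic tangent, range (-1, 1)."""
    return np.tanh(lam * x)

def normal_cdf(x, lam=1.0, c=0.0):
    """Normal (Gaussian) distribution function."""
    return 0.5 * (1.0 + _erf(lam * (x - c) / np.sqrt(2.0)))

def cauchy_cdf(x, lam=1.0, c=0.0):
    """Cauchy distribution function."""
    return 0.5 + np.arctan(lam * (x - c)) / np.pi

def laplace_cdf(x, lam=1.0, c=0.0):
    """Laplace distribution function."""
    z = lam * (x - c)
    return 0.5 + 0.5 * np.sign(z) * (1.0 - np.exp(-np.abs(z)))

def erf_act(x, lam=1.0):
    """Gaussian error function rescaled to (0, 1)."""
    return 0.5 * (1.0 + _erf(lam * x))

def d_logsig(y, lam=1.0):
    """Derivative of logsig expressed through its output y."""
    return lam * y * (1.0 - y)

# Tiny 2-2-1 BP network on the XOR problem, logsig with gradient factor 2.
rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)
W1, b1 = rng.uniform(-0.5, 0.5, (2, 2)), np.zeros(2)
W2, b2 = rng.uniform(-0.5, 0.5, (2, 1)), np.zeros(1)
lam, eta = 2.0, 0.5
for epoch in range(20000):
    H = logsig(X @ W1 + b1, lam)           # hidden layer output
    Y = logsig(H @ W2 + b2, lam)           # network output
    err = T - Y
    dY = err * d_logsig(Y, lam)            # output-layer delta
    dH = (dY @ W2.T) * d_logsig(H, lam)    # hidden-layer delta
    W2 += eta * H.T @ dY; b2 += eta * dY.sum(axis=0)
    W1 += eta * X.T @ dH; b1 += eta * dH.sum(axis=0)

# For many initializations this approaches [0, 1, 1, 0]; as the abstract
# notes, convergence is sensitive to the initial weights, so some seeds
# may stall in a local minimum.
print(np.round(Y, 3))
```

Swapping `logsig`/`d_logsig` for one of the other functions (with its own derivative) and varying `lam` over the intervals reported in the abstract reproduces the kind of comparison the study describes.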