RL Policy Gradient Methods (13): Actor-Critic using Kronecker-Factored Trust Region (ACKTR)

This column summarizes the algorithms in the order they appear in https://lilianweng.github.io/lil-log/2018/04/08/policy-gradient-algorithms.html.


  • How it works
  • Algorithm implementation
    • Overall procedure
    • Code implementation


ACKTR: [ paper | code ]


(For a more detailed explanation, see https://blog.csdn.net/bbbeoy/article/details/106984109.)
Actor-Critic using Kronecker-factored Trust Region (ACKTR; Yuhuai Wu, et al., 2017) uses Kronecker-factored approximate curvature (K-FAC) to compute the gradient updates for both the actor and the critic. K-FAC improves the computation of the natural gradient, which is quite different from the standard gradient. Here is a good, intuitive explanation of the natural gradient.
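To make the difference from the standard gradient concrete, the textbook definition of the natural gradient step can be written out as follows. This is a general sketch rather than notation taken from the ACKTR paper: the symbols $F$ (Fisher information matrix of the policy), $\eta$ (step size) and $L$ (loss) are assumed here for illustration.

```latex
% Fisher information matrix of the policy \pi_\theta
F(\theta) = \mathbb{E}_{s,\, a \sim \pi_\theta}\!\left[
    \nabla_\theta \log \pi_\theta(a \mid s)\,
    \nabla_\theta \log \pi_\theta(a \mid s)^{\top}
\right]

% Standard gradient step vs. natural gradient step
\theta \leftarrow \theta - \eta\, \nabla_\theta L(\theta)
\qquad \text{vs.} \qquad
\theta \leftarrow \theta - \eta\, F(\theta)^{-1} \nabla_\theta L(\theta)

% Why F appears: locally, the KL divergence between the old and new policy is quadratic in the step
D_{\mathrm{KL}}\!\left(\pi_\theta \,\|\, \pi_{\theta + \Delta\theta}\right)
\approx \tfrac{1}{2}\, \Delta\theta^{\top} F(\theta)\, \Delta\theta
```

The standard gradient measures step length in parameter space, while the natural gradient measures it by how much the policy distribution changes. The cost is the inverse of $F$, which is as large as the number of parameters squared, and this is the part K-FAC approximates.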




“This approximation is built in two stages. In the first, the rows and columns of the Fisher are divided into groups, each of which corresponds to all the weights in a given layer, and this gives rise to a block-partitioning of the matrix. These blocks are then approximated as Kronecker products between much smaller matrices, which we show is equivalent to making certain approximating assumptions regarding the statistics of the network’s gradients.
In the second stage, this matrix is further approximated as having an inverse which is either block-diagonal or block-tridiagonal. We justify this approximation through a careful examination of the relationships between inverse covariances, tree-structured graphical models, and linear regression. Notably, this justification doesn’t apply to the Fisher itself, and our experiments confirm that while the inverse Fisher does indeed possess this structure (approximately), the Fisher itself does not.”
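The first stage described in the quote can be illustrated with a small NumPy sketch for a single fully connected layer. It assumes the usual K-FAC setup in which the layer's Fisher block is approximated as `A ⊗ G`, with `A` the second moment of the layer inputs and `G` the second moment of the gradients with respect to the layer outputs. The function names `kfac_factors` and `kfac_precondition`, the random data, and the simple per-factor damping are hypothetical choices for illustration; they are not taken from the ACKTR code.

```python
import numpy as np


def kfac_factors(acts, grads, damping=1e-3):
    """Estimate the two Kronecker factors for one dense layer.

    acts:  (N, n_in)  layer inputs for N samples
    grads: (N, n_out) gradients of the loss w.r.t. the layer outputs
    The damping added to each factor is an assumed, simplified scheme
    that keeps both factors invertible.
    """
    n = acts.shape[0]
    A = acts.T @ acts / n + damping * np.eye(acts.shape[1])
    G = grads.T @ grads / n + damping * np.eye(grads.shape[1])
    return A, G


def kfac_precondition(V, A, G):
    """Return G^{-1} V A^{-1} for a weight gradient V of shape (n_out, n_in).

    This equals (A kron G)^{-1} vec(V) reshaped back to a matrix,
    but only ever solves against the two small factors.
    """
    right = np.linalg.solve(A, V.T).T   # V A^{-1}, using that A is symmetric
    return np.linalg.solve(G, right)    # G^{-1} V A^{-1}


# Check against the explicit Kronecker product on a tiny layer.
rng = np.random.default_rng(0)
acts = rng.normal(size=(256, 4))
grads = rng.normal(size=(256, 3))
A, G = kfac_factors(acts, grads)
V = rng.normal(size=(3, 4))             # stand-in for a weight gradient

fast = kfac_precondition(V, A, G)

F_block = np.kron(A, G)                 # explicit 12 x 12 Fisher block approximation
vec_V = V.flatten(order="F")            # column-stacking vec(V)
slow = np.linalg.solve(F_block, vec_V).reshape(3, 4, order="F")
```

Here `fast` and `slow` agree up to floating point error, but `fast` only inverts a 4 x 4 and a 3 x 3 matrix instead of a 12 x 12 one. For a layer with thousands of units this gap is what makes an approximate natural gradient step affordable.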



