Report for: Simple statistical gradient-following algorithms for connectionist reinforcement learning

#111論文等共有 395 https://t.co/BANJB8JwpI [Machine Learning '92] The original paper on REINFORCE, a well-known policy gradient method. There is no convergence proof at this point. Judging from the references, TD methods, Q-learning, and actor-critic are older.

31 Oct 2022


REINFARCE! Industrial-strength Monte Carlo Policy Gradient for Crime. @RichardSSutton Special dedication to Ronald J. Williams: "Simple statistical gradient-following algorithms for connectionist reinforcement learning", 1992, https://t.co/5CX8XAFpCg http

25 Mar 2022


@IntuitMachine @skornblith @KordingLab @TonyZador For the saddle points vs local optima, e.g.: https://t.co/cHqxP7hUY2 For optimizing sufficient statistics of discrete variables, e.g.: https://t.co/MJb4vH3zRr

20 Jul 2021


Loss function is from Williams in 1992: Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning https://t.co/7ZwR5M192H #Shef2019

17 Jun 2019


It's important to read the original sources, not just textbooks. R. J. WILLIAMS, Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, Machine Learning, 8, 229-256 (1992) https://t.co/TYTedMpwST

16 Mar 2019

