D4PG

Published on January 31, 2021

https://arxiv.org/pdf/1804.08617.pdf

For all algorithms we initialize the learning rates for both actor and critic updates to the same value. In the next section we will present a suite of simple control problems for which this value corresponds to α₀ = β₀ = 1 × 10⁻⁴; for the following, harder problems we set this to a smaller value of α₀ = β₀ = 5 × 10⁻⁵. Similarly for the control suite we utilize a batch size of M = 256 and for all subsequent problems we will increase this to M = 512.
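To make the two settings easier to compare, here is a minimal sketch that records the quoted values as a small config object. The class name `D4PGHyperparams`, the field names, the task-family keys, and the helper `hyperparams_for` are all hypothetical names chosen for illustration; they do not come from the paper or from any D4PG implementation. Only the numbers are taken from the passage above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class D4PGHyperparams:
    """Hypothetical container for the values quoted from the paper."""
    actor_lr: float   # initial actor learning rate (alpha_0)
    critic_lr: float  # initial critic learning rate (beta_0)
    batch_size: int   # minibatch size (M)


# Values for the simple control suite, as quoted above.
CONTROL_SUITE = D4PGHyperparams(actor_lr=1e-4, critic_lr=1e-4, batch_size=256)

# Values for the subsequent, harder problems, as quoted above.
HARDER_TASKS = D4PGHyperparams(actor_lr=5e-5, critic_lr=5e-5, batch_size=512)

# The keys below are illustrative labels, not names used in the paper.
_PRESETS = {
    "control_suite": CONTROL_SUITE,
    "harder": HARDER_TASKS,
}


def hyperparams_for(task_family: str) -> D4PGHyperparams:
    """Return the preset for a task family, or raise on an unknown label."""
    try:
        return _PRESETS[task_family]
    except KeyError:
        raise ValueError(f"unknown task family: {task_family!r}") from None


if __name__ == "__main__":
    for name in ("control_suite", "harder"):
        print(name, hyperparams_for(name))
```

In both presets the actor and critic share the same initial learning rate, which matches the first sentence of the quoted passage; the harder problems halve the learning rate and double the batch size.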

Tomo's Tech Notes

Tomohiro Nagasaka
