Alex Welcing / Archive

Proximal Policy Optimization: Modern RL Algorithm

2024-11-03

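Proximal Policy Optimization (PPO) is a policy gradient method for reinforcement learning. It keeps updates stable by clipping the ratio between the new and old policies' action probabilities, so a single update cannot move the policy far from the one that collected the data.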
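The core of that stability is PPO's clipped surrogate objective, L^CLIP = mean(min(r * A, clip(r, 1 - eps, 1 + eps) * A)), where r is the new-to-old probability ratio and A is the advantage estimate. Below is a minimal plain-Python sketch of just that objective, not a full training loop. The function name `ppo_clip_objective` and the default `clip_eps = 0.2` are illustrative assumptions. Real implementations compute this over batches with automatic differentiation and negate it to get a loss to minimize.

```python
import math
from typing import Sequence


def ppo_clip_objective(
    new_log_probs: Sequence[float],
    old_log_probs: Sequence[float],
    advantages: Sequence[float],
    clip_eps: float = 0.2,  # assumed clipping range; a commonly used value
) -> float:
    """Mean clipped surrogate objective (to be maximized). Hypothetical helper."""
    if not (len(new_log_probs) == len(old_log_probs) == len(advantages)):
        raise ValueError("inputs must have the same length")
    if not advantages:
        raise ValueError("inputs must be non-empty")

    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        # Probability ratio r = pi_new(a|s) / pi_old(a|s), computed in log space.
        ratio = math.exp(new_lp - old_lp)
        # Clip the ratio into [1 - eps, 1 + eps].
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Pessimistic bound: keep the smaller of the unclipped and clipped terms,
        # so moving the ratio outside the clip range earns no extra objective.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


if __name__ == "__main__":
    # Ratio 1.5 with a positive advantage is capped at 1 + eps = 1.2.
    print(ppo_clip_objective([math.log(1.5)], [0.0], [1.0]))
```

In this sketch, an action whose probability rose 50% with advantage 1.0 contributes only about 1.2 rather than 1.5. This is the mechanism that removes the incentive to push the policy further in one step. Taking the minimum, rather than always using the clipped term, means a change that makes the objective worse is never hidden by the clipping.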
