Test reward never changes when training PPO on Acrobot-v1


I failed to train PPO agents on Acrobot-v1: the test reward never changes and stays at -500. My code is the same as test/discrete/test_ppo, except that the env is Acrobot-v1. When I use a custom actor, the test reward does not change either; it stays around -120 (env is LunarLanderContinuous-v1). I am confused. How can the reward always be the same?
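
For reference, here is a minimal sketch of one way a flat -500 could arise. It assumes the standard Gym Acrobot-v1 settings (a reward of -1 on every non-terminal step and a 500-step time limit), and it does not use my actual training code. The helper names `episode_return` and `stuck_at_floor` are hypothetical, made up for this illustration. Under those assumptions, any policy that never reaches the goal gets exactly -500 per episode, so the test reward would look constant even if the policy itself is changing.

```python
# Sketch only: assumes the standard Gym Acrobot-v1 settings
# (reward of -1 per non-terminal step, episodes truncated after 500 steps).
MAX_STEPS = 500  # assumed Acrobot-v1 time limit


def episode_return(steps_to_goal=None, max_steps=MAX_STEPS):
    """Return of a single episode (hypothetical helper).

    steps_to_goal: 1-based step on which the goal is reached,
    or None if the goal is never reached.
    """
    if steps_to_goal is None or steps_to_goal > max_steps:
        # Goal never reached: -1 on every step until truncation.
        return -float(max_steps)
    # -1 on every step except the terminal one, which yields 0.
    return -float(steps_to_goal - 1)


def stuck_at_floor(test_returns, max_steps=MAX_STEPS):
    """True if there is at least one episode and all of them hit the time limit."""
    return bool(test_returns) and all(r <= -max_steps for r in test_returns)
```

If `stuck_at_floor` holds for every test epoch, the agent may simply never be solving the task within the time limit, rather than the reward logging being broken. This would not explain the LunarLander case, where the value sits around -120 rather than at a fixed floor.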