All posts (75)

DDPG (CONTINUOUS CONTROL WITH DEEP RL)
Main Paper: https://arxiv.org/pdf/1509.02971.pdf
CONTINUOUS CONTROL WITH DEEP REINFORCEMENT LEARNING
Abstract: The paper adapts the ideas behind the success of DQN to the continuous action domain. It proposes an actor-critic, model-free algorithm, based on the deterministic policy gradient, that operates over continuous action spaces. The algorithm finds policies whose performance is competitive with those found by a planning algorithm that has full access to the domain and its derivatives.
Introduction ..
2021. 8. 10.

PPO (Proximal Policy Optimization Algorithms)
Main Paper: https://arxiv.org/pdf/1707.06347.pdf
Proximal Policy Optimization Algorithms
Abstract: The paper proposes a family of policy gradient methods that alternate between sampling data through interaction with the environment and optimizing a surrogate objective function using stochastic gradient ascent. Whereas standard policy gradient methods perform one gradient update per data sample, the paper proposes a new objective function that enables multiple epochs of mini-batch updates. The proposed PPO has some of the benefits of TRPO, but is much simpler to implement and sa..
2021. 8. 6.

TRPO (Trust Region Policy Optimization)
Main Paper: https://arxiv.org/pdf/1502.05477.pdf
Trust Region Policy Optimization
Abstract: We describe an iterative procedure for optimizing policies, with guaranteed monotonic improvement. By making several approximations to the theoretically-justified procedure, we develop a practical algorithm, called Trust Region Policy Optimization (TRPO). This algorithm is similar to natural policy grad..
2021. 8. 2.

code
    # main.py
    if __name__ == '__main__':
        args = parse_args()
        if args.option == 'train':
            train(args)
        else:
            evaluate(args)

    # set env
    def init_env(config, port=0):
        # get scenario
        scenario = config.get('scenario')
        if scenario.startswith('atsc'):
            # atsc env: set port parameter
            if scenario.endswith('large_grid'):
                # atsc-large_grid env
                return LargeGridEnv(config, port=port)
            else:
                # atsc-real_net env
                return..
2021. 7. 26.

CommNet (learning communication, PG)
Main Paper: https://arxiv.org/pdf/1605.07736.pdf
Learning Multiagent Communication with Backpropagation
Abstract: Typically, the communication protocol between agents is manually specified and not altered during training. In this paper we explore a simple neural model, called CommNet, that uses continuous communication for fully cooperative tasks. The model consists of multiple agents and the ..
2021. 7. 12.

RIAL & DIAL (learning communication, VB)
Main Paper: https://arxiv.org/pdf/1605.06676.pdf
Learning to Communicate with Deep Multi-Agent Reinforcement Learning
Abstract: We consider the problem of multiple agents sensing and acting in environments with the goal of maximising their shared utility. In these environments, agents must learn communication protocols in order to share information that is needed to solve the tasks. By embraci..
2021. 7. 7.