Tags
rl
reinforcement learning
David Silver
강화학습 (reinforcement learning)
DDQN
RIAL
Policy Gradient
백준
DQN
pg
Python
DIAL
recency weighted average
Incremental Implementation
e-greedy
Multi-Armed Bandits
A-CCNet
AC-CNet
ACCNet
BiCNet
QMIX
VDN
3M-RL
MADDPG
Distributed W-learning
DWL
W-learning
21300
24309
DDDQN
D3QN
Dueling DQN
Proximal Policy Optimization Algorithms
learning communication
MDRL
agent-modeling
communication learning
CTDE
CTCE
DTDE
fully competitive
fully cooperative
multi-agent
Slow R-CNN
Monte-Carlo
nn.Module
손실 함수 (loss function)
activation function
Action Value
PPO
loss function
선형 회귀 (linear regression)
Value Function
ReLU
Sigmoid
Multivariable linear regression
활성화함수 (activation function)
reinforcement
gradient descent
cost function
Dron
commnet
sample average
greedy
reward
softmax
linear regression
MAb
Sutton
neural network
Estimation
feature
Matrix
perceptron
PER
Policy
Selection
nn
Mixed
sgd
CNN
vb
sample
MC
Model