TensorFlow and deep reinforcement learning, without a PhD (Google I/O '18)

22 thoughts on “TensorFlow and deep reinforcement learning, without a PhD (Google I/O '18)”

  1. Thank you for the insight. I was able to successfully apply your approach to problems in OpenAI Gym.

  2. But when they take the reward and multiply it by the cross_entropy, won't a negative reward (a lost point) turn the cross-entropy term negative? And by minimizing this, wouldn't they actually be encouraging the algorithm to lose? I notice in the slides that they do loss = -R( … ), but I can't see this reflected in the code.
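
    One way to see why this still works (a sketch with illustrative names, not the talk's actual code): the cross-entropy of the sampled move already carries the minus sign, since ce = -log p(move). So reward * ce = -R * log p, which is exactly the loss = -R( … ) from the slides, and minimizing it lowers the probability of moves that ended in a negative reward:

    import numpy as np

    # Illustrative names; a minimal sketch of the sign convention only.
    def cross_entropy(action_prob):
        # Cross-entropy against a one-hot label reduces to
        # -log(probability of the sampled action).
        return -np.log(action_prob)

    prob_of_sampled_move = 0.8   # policy's probability for the move it made
    reward = -1.0                # the move eventually lost the point

    loss_term = reward * cross_entropy(prob_of_sampled_move)
    # reward * (-log p) = -reward * log p. With reward = -1 this term
    # equals log p; minimizing log p pushes p down, i.e. the losing
    # move is discouraged, not encouraged.
    print(loss_term)  # ≈ -0.223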

  3. The code is here.
    https://github.com/GoogleCloudPlatform/tensorflow-without-a-phd/tree/master/tensorflow-rl-pong

  4. Great stuff!

    Typo:

    tf.losses.softmax_cross_entropy(one_hot_labels,
    should be:
    tf.losses.softmax_cross_entropy(onehot_labels,
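
    For reference, the keyword in the TF 1.x API is indeed onehot_labels. A minimal self-contained call (the tensor values here are made up for illustration):

    import tensorflow as tf  # TensorFlow 1.x API

    logits = tf.constant([[2.0, 1.0, 0.1]])         # unnormalized scores for 3 moves
    onehot_labels = tf.constant([[0.0, 1.0, 0.0]])  # the move actually sampled
    loss = tf.losses.softmax_cross_entropy(onehot_labels=onehot_labels,
                                           logits=logits)

    with tf.Session() as sess:
        print(sess.run(loss))  # ≈ 1.42, i.e. -log(softmax(logits)[1])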

  5. This is adapted from Karpathy's blog. The original post, with the code and everything, is here: http://karpathy.github.io/2016/05/31/rl/

  6. From the code, refer to the following line:
    loss = tf.reduce_sum(processed_rewards * cross_entropies + move_cost)

    Could I know why processed_rewards is passed in as it is, instead of being negated? Because, to my understanding, even when it is normalised, a negative or small reward indicates a lost point or the result of a bad action, and it should be discouraged. And since the optimizer minimizes this loss, it seems to encourage bad actions?
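
    The minus sign the question is looking for is already inside cross_entropies: the cross-entropy of the sampled move is -log p(move), so minimizing processed_rewards * cross_entropies raises the probability of positively weighted moves and lowers it for negatively weighted ones (see the sketch under comment 2). Assuming processed_rewards follows the usual discount-and-normalize pattern from Karpathy's post (an assumption here, not a quote from the repo), the weights would look like this:

    import numpy as np

    # Illustrative sketch; gamma and the epsilon are typical choices,
    # not values taken from the talk.
    def discount_and_normalize(rewards, gamma=0.99):
        discounted = np.zeros_like(rewards, dtype=np.float64)
        running = 0.0
        for t in reversed(range(len(rewards))):
            if rewards[t] != 0:   # in Pong, a nonzero reward ends a point
                running = 0.0     # reset the running sum at point boundaries
            running = running * gamma + rewards[t]
            discounted[t] = running
        # Normalizing centers the weights around zero, so in each batch
        # roughly half the moves get a positive weight (encouraged) and
        # half a negative weight (discouraged) once multiplied by the
        # cross-entropy.
        return (discounted - discounted.mean()) / (discounted.std() + 1e-8)

    rewards = np.array([0, 0, 0, 1, 0, 0, -1], dtype=np.float64)
    print(discount_and_normalize(rewards))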

  7. Where is the code for this? Where is the game environment? Does anyone know where I can find it? Thank you.
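
    The environment appears to be Atari Pong from OpenAI Gym (the same one used in Karpathy's post linked in comment 5). A minimal setup sketch, assuming the pre-0.26 Gym API with the Atari extras (pip install gym[atari]) installed:

    import gym

    env = gym.make("Pong-v0")    # classic Atari Pong environment
    observation = env.reset()    # a 210x160x3 RGB frame
    # Take one random move; the old API returns a 4-tuple.
    observation, reward, done, info = env.step(env.action_space.sample())
    env.close()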
