- Know reinforcement learning basics: MDPs, Dynamic Programming, Monte Carlo, TD Learning
- Calculus and probability at the undergraduate level
- Experience building machine learning models in Python and Numpy
- Know how to build a feedforward, convolutional, and recurrent neural network using Theano and TensorFlow
This course is all about the application of deep learning and neural networks to reinforcement learning.
If you've taken my first reinforcement learning class, then you know that reinforcement learning is on the bleeding edge of what we can do with AI.
Specifically, the combination of deep learning with reinforcement learning has led to AlphaGo beating a world champion in the strategy game Go, to self-driving cars, and to machines that can play video games at a superhuman level.
Reinforcement learning has been around since the 70s, but none of this has been possible until now.
The world is changing at a very fast pace. The state of California is changing its regulations so that self-driving car companies can test their cars without a human in the car to supervise.
We've seen that reinforcement learning is an entirely different kind of machine learning than supervised and unsupervised learning.
Supervised and unsupervised machine learning algorithms are for analyzing and making predictions about data, whereas reinforcement learning is about training an agent to interact with an environment and maximize its reward.
Unlike supervised and unsupervised learning algorithms, reinforcement learning agents have an impetus – they want to reach a goal.
This is such a fascinating perspective that it can even make supervised/unsupervised machine learning and "data science" seem boring in hindsight. Why train a neural network to learn about the data in a database, when you can train a neural network to interact with the real world?
While deep reinforcement learning and AI have a lot of potential, they also carry enormous risk.
Bill Gates and Elon Musk have made public statements about some of the risks that AI poses to economic stability and even our existence.
As we learned in my first reinforcement learning course, one of the main principles of training reinforcement learning agents is that there are unintended consequences when training an AI.
AIs don't think like humans, and so they come up with novel and non-intuitive solutions to reach their goals, often in ways that surprise domain experts – humans who are the best at what they do.
OpenAI is a non-profit founded by Elon Musk, Sam Altman (Y Combinator), and others, in order to ensure that AI progresses in a way that is beneficial rather than harmful.
Part of the motivation behind OpenAI is the existential risk that AI poses to humans. They believe that open collaboration is one of the keys to mitigating that risk.
One of the great things about OpenAI is that they have a platform called the OpenAI Gym, which we'll be making heavy use of in this course.
It allows anyone, anywhere in the world, to train their reinforcement learning agents in standard environments.
In this course, we'll build upon what we did in the last course by working with more complex environments, specifically those provided by the OpenAI Gym:
- Mountain Car
- Atari games
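Every Gym environment, from Mountain Car to the Atari games, exposes the same `reset()`/`step()` interface, which is what makes them "standard environments." As a rough sketch of that agent-environment loop (the `ToyEnv` below is a made-up stand-in so the example is self-contained, not a real Gym environment):

```python
import random

class ToyEnv:
    """A stand-in with the classic Gym interface:
    reset() -> state, step(action) -> (state, reward, done, info)."""
    def __init__(self, horizon=10):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t  # the "state" here is just the timestep

    def step(self, action):
        self.t += 1
        reward = 1.0 if action == 1 else 0.0  # arbitrary toy reward
        done = self.t >= self.horizon         # episode ends after `horizon` steps
        return self.t, reward, done, {}

def run_episode(env):
    """One episode of the standard agent-environment loop."""
    state = env.reset()
    total_reward = 0.0
    done = False
    while not done:
        action = random.choice([0, 1])  # a real agent would choose based on state
        state, reward, done, info = env.step(action)
        total_reward += reward
    return total_reward

total = run_episode(ToyEnv())
```

With a real Gym environment you would replace `ToyEnv()` with something like `gym.make("MountainCar-v0")`; the loop itself stays the same.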
To train effective learning agents, we'll need new techniques.
We'll extend our knowledge of temporal difference learning by looking at the TD(λ) algorithm, we'll look at a special type of neural network called the RBF network, we'll look at the policy gradient method, and we'll end the course by looking at Deep Q-Learning (DQN) and A3C (Asynchronous Advantage Actor-Critic).
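To make one of these combinations concrete, here is a minimal sketch (my own illustration, not code from the course) of a single semi-gradient Q-learning update on Gaussian RBF features in numpy. The state space, centers, width, and transition are all made-up placeholders:

```python
import numpy as np

# Hypothetical setup: a scalar state in [0, 1], two actions.
centers = np.linspace(0.0, 1.0, 10)  # RBF centers spread over the state space
width = 0.1                          # kernel width (a tunable hyperparameter)

def rbf_features(s):
    """Gaussian RBF feature vector for a scalar state s."""
    return np.exp(-(s - centers) ** 2 / (2 * width ** 2))

n_actions = 2
W = np.zeros((n_actions, centers.size))  # one linear value head per action

def q_values(s):
    """Q(s, a) for all actions: a linear function of the RBF features."""
    return W @ rbf_features(s)

# One semi-gradient Q-learning update:
#   w_a <- w_a + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a)) * grad_w Q(s, a)
alpha, gamma = 0.1, 0.99
s, a, r, s_next = 0.3, 1, 1.0, 0.35  # a made-up transition (state, action, reward, next state)
td_error = r + gamma * np.max(q_values(s_next)) - q_values(s)[a]
W[a] += alpha * td_error * rbf_features(s)
```

Because Q is linear in the features, the gradient with respect to `W[a]` is just the feature vector itself, which is what makes RBF networks such a gentle step between tabular methods and deep Q-networks.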
Thanks for reading, and I'll see you in class!
All of the code for this course can be downloaded from my github:
In the directory: rl2
Make sure you always "git pull" so you have the latest version!
HARD PREREQUISITES / KNOWLEDGE YOU ARE ASSUMED TO HAVE:
- Object-oriented programming
- Python coding: if/else, loops, lists, dicts, sets
- Numpy coding: matrix and vector operations
- Linear regression
- Gradient descent
- Know how to build a feedforward, convolutional, and recurrent neural network in Theano and TensorFlow
- Markov Decision Processes (MDPs)
- Know how to implement Dynamic Programming, Monte Carlo, and Temporal Difference Learning to solve MDPs
TIPS (for getting through the course):
- Watch it at 2x.
- Take handwritten notes. This will drastically increase your ability to retain the information.
- Write down the equations. If you don't, I guarantee it will just look like gibberish.
- Ask lots of questions on the discussion board. The more the better!
- Realize that most exercises will take you days or even weeks to complete.
- Write code yourself, don't just sit there and look at my code.
WHAT ORDER SHOULD I TAKE YOUR COURSES IN?:
- Check out the lecture "What order should I take your courses in?" (available in the Appendix of any of my courses, including the free Numpy course)
Who this course is for:
- Professionals and students with strong technical backgrounds who wish to learn state-of-the-art AI techniques.
Created by Lazy Programmer Inc.
Last updated 12/2020
Size: 2.96 GB