“A pole is attached by an un-actuated joint to a cart, which moves along a frictionless track. The pendulum starts upright, and the goal is to prevent it from falling over by increasing and reducing the cart’s velocity.”
GitHub code: https://github.com/omerbsezer/QLearning_CartPole
“QLearning is a model free reinforcement learning technique that can be used to find the optimal action selection policy using Q function without requiring a model of the environment. Q-learning eventually finds an optimal policy.” Q-learning is a specific temporal-difference (TD) algorithm for learning the Q-function. When the problem is not large-scale, as is the case here, the Q-function can be stored in a lookup table.
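As a rough illustration of the lookup-table idea, here is a minimal sketch of tabular Q-learning in plain Python. It is not taken from the linked repository. The helper names (`discretize`, `choose_action`, `q_update`), the bin counts, the observation bounds, and the values of `alpha`, `gamma`, and `epsilon` are all assumptions chosen for illustration. Because CartPole observations are continuous, the sketch assumes they are first bucketed into integer bins so that each state can serve as a table key.

```python
import random
from collections import defaultdict

N_ACTIONS = 2  # CartPole actions: 0 = push cart left, 1 = push cart right


def discretize(observation, lows, highs, bins):
    """Map each continuous observation component to an integer bin index.

    lows, highs, and bins are hypothetical per-component settings.
    """
    state = []
    for value, low, high, n in zip(observation, lows, highs, bins):
        clipped = min(max(value, low), high)       # keep value inside [low, high]
        ratio = (clipped - low) / (high - low)     # scale to [0, 1]
        state.append(min(int(ratio * n), n - 1))   # bin index in [0, n - 1]
    return tuple(state)


def choose_action(q_table, state, epsilon):
    """Epsilon-greedy: explore with probability epsilon, otherwise act greedily."""
    if random.random() < epsilon:
        return random.randrange(N_ACTIONS)
    values = q_table[state]
    return values.index(max(values))


def q_update(q_table, state, action, reward, next_state,
             alpha=0.1, gamma=0.99, done=False):
    """One Q-learning (TD) update on the lookup table."""
    best_next = 0.0 if done else max(q_table[next_state])
    td_target = reward + gamma * best_next
    q_table[state][action] += alpha * (td_target - q_table[state][action])


# The lookup table: unseen states start with a Q-value of 0.0 for every action.
q_table = defaultdict(lambda: [0.0] * N_ACTIONS)

# Example with made-up bounds for the four CartPole observation components
# (cart position, cart velocity, pole angle, pole angular velocity).
lows = [-2.4, -3.0, -0.21, -3.0]
highs = [2.4, 3.0, 0.21, 3.0]
bins = [6, 6, 12, 12]

state = discretize([0.0, 0.1, 0.02, -0.3], lows, highs, bins)
action = choose_action(q_table, state, epsilon=0.1)
next_state = discretize([0.01, 0.3, 0.01, -0.6], lows, highs, bins)
q_update(q_table, state, action, reward=1.0, next_state=next_state)
```

In a full training loop, the observation, reward, and termination flag would come from the environment at each step; here they are hard-coded so the sketch runs without any external dependency.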
Refs: Q-learning: https://en.wikipedia.org/wiki/Q-learning
Cart Pole Problem: https://en.wikipedia.org/wiki/Inverted_pendulum
Cart Pole OpenAI Gym: https://github.com/openai/gym/wiki/CartPole-v0
OpenAI Gym: https://gym.openai.com/docs/