## Pong Game RL Application: Policy Gradient Implementation Using OpenAI Gym and TensorFlow

A policy gradient network is implemented on the popular Atari game Pong. "The policy gradients method involves running a policy for a while, seeing what actions lead to high rewards, and increasing their probability by backpropagating gradients."

For large-scale problems, a function approximator of some kind is needed. In this problem, a neural network serves as the function approximator: there are too many states and/or actions to store in memory, so a lookup table cannot be used.

Andrej Karpathy (Deep Reinforcement Learning: Pong from Pixels): http://karpathy.github.io/2016/05/31/rl/

The policy gradient neural network, based on Andrej's solution, will:

• take in images from the game and "preprocess" them (remove color, background, etc.).
• use a TensorFlow neural network to compute the probability of moving up or down.
• sample from that probability distribution and tell the agent to move up or down.
• when the round is over, record whether we won or lost.
• when the episode has finished, pass the result through backpropagation to compute the gradients for the weights.
• after a batch of episodes has finished, sum up the gradients and move the weights in the direction of the gradient.
• repeat this process until the weights are tuned.
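The preprocessing and sampling steps above can be sketched in NumPy (this follows Karpathy's recipe; the layer size of 200 hidden neurons and the synthetic frame are illustrative stand-ins, not the exact values of this repository's TF implementation):

```python
import numpy as np

def prepro(frame):
    """Preprocess a 210x160x3 Atari frame into a 6400-dim (80x80) float vector:
    crop the field, downsample by 2, erase background, binarize paddles/ball."""
    f = frame[35:195]          # crop the playing field
    f = f[::2, ::2, 0]         # downsample by factor of 2, keep one channel
    f[f == 144] = 0            # erase background type 1
    f[f == 109] = 0            # erase background type 2
    f[f != 0] = 1              # paddles and ball become 1
    return f.astype(np.float32).ravel()

def policy_forward(x, W1, W2):
    """Two-layer policy network: returns the probability of moving UP."""
    h = np.maximum(0, W1 @ x)            # ReLU hidden layer
    p = 1.0 / (1.0 + np.exp(-(W2 @ h)))  # sigmoid output
    return p, h

def discount_rewards(r, gamma=0.99):
    """Discounted return per timestep; the running sum is reset whenever a
    non-zero reward appears, since each point in Pong is a game boundary."""
    out = np.zeros_like(r, dtype=np.float32)
    running = 0.0
    for t in reversed(range(len(r))):
        if r[t] != 0:
            running = 0.0  # reset at a scored point
        running = running * gamma + r[t]
        out[t] = running
    return out

# Demo on a synthetic frame (real frames come from gym's Pong environment)
rng = np.random.default_rng(0)
frame = rng.integers(0, 256, size=(210, 160, 3), dtype=np.uint8)
x = prepro(frame)
W1 = rng.standard_normal((200, 6400)) / np.sqrt(6400)
W2 = rng.standard_normal(200) / np.sqrt(200)
p, _ = policy_forward(x, W1, W2)
action = 2 if rng.random() < p else 3   # in gym's Pong: 2 = UP, 3 = DOWN
```

Sampling the action (rather than always taking the more probable one) is what lets the agent explore; the discounted returns then scale each action's gradient by how good the eventual outcome was.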

### PongGame Experiment Results

After some training time, the scores improve.

After running for 2 days, the system has learned and starts to beat the opponent. The last checkpoint, saved after those 2 days, is committed in the checkpoint folder. When you start the code in your environment, the checkpoint folder, if present, will be loaded.

### References:

Policy Gradients Method: http://www.scholarpedia.org/article/Policy_gradient_methods

Policy Gradients from David Silver: http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching_files/pg.pdf

Pong Game OpenAI Gym: https://gym.openai.com/envs/Pong-v0/

OpenAI Gym: https://gym.openai.com/docs/

https://github.com/llSourcell/policy_gradients_pong

https://github.com/mrahtz/tensorflow-rl-pong

https://medium.com/@dhruvp/how-to-write-a-neural-network-to-play-pong-from-scratch-956b57d4f6e0

## Reinforcement Learning Application: CartPole Implementation Using Q-Learning

“A pole is attached by an un-actuated joint to a cart, which moves along a frictionless track. The pendulum starts upright, and the goal is to prevent it from falling over by increasing and reducing the cart’s velocity.”

Github Code: https://github.com/omerbsezer/QLearning_CartPole

### Q-Learning Implementation Using Gym

"Q-learning is a model-free reinforcement learning technique that can be used to find the optimal action-selection policy using the Q function, without requiring a model of the environment. Q-learning eventually finds an optimal policy." Q-learning is a specific TD (temporal-difference) algorithm used to learn the Q-function. Since this is not a large-scale problem, a lookup table can be used, as it is here.
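The tabular update at the heart of Q-learning can be sketched as follows (a minimal sketch, not this repository's exact code: for CartPole the four continuous state variables must first be discretized into bins to index the table, and the bin count of 100 here is illustrative):

```python
import numpy as np

# Tabular Q-learning update:
#   Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
n_states, n_actions = 100, 2     # e.g. 100 discretized states; push left / push right
Q = np.zeros((n_states, n_actions))
alpha, gamma = 0.1, 0.99

def q_update(Q, s, a, r, s_next):
    """One temporal-difference step toward the bootstrapped target."""
    target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (target - Q[s, a])
    return Q

def epsilon_greedy(Q, s, eps, rng):
    """Explore with probability eps, otherwise exploit the greedy action."""
    if rng.random() < eps:
        return int(rng.integers(n_actions))
    return int(np.argmax(Q[s]))

# Tiny demo transition: reward 1 for keeping the pole upright
Q = q_update(Q, s=0, a=1, r=1.0, s_next=5)
```

Because the target bootstraps from `max` over the next state's Q-values, the table converges toward the optimal action values without ever needing a model of the cart's dynamics.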

CartPole Results:

### References:

Q-Learning: https://en.wikipedia.org/wiki/Q-learning

Cart Pole Problem: https://en.wikipedia.org/wiki/Inverted_pendulum

Cart Pole OpenAI Gym: https://github.com/openai/gym/wiki/CartPole-v0

OpenAI Gym: https://gym.openai.com/docs/

## Prediction of Stock Prices Using LSTM Network

Stock and ETF prices are predicted using an LSTM network (Keras-TensorFlow).

• Stock prices are downloaded from finance.yahoo.com.
• The close value (column 5) is used in the network.
• Values are normalized into the range (0, 1).
• Datasets are split into train and test sets: 50% test data, 50% training data.
• Keras-TensorFlow is used for the implementation.
• The LSTM network consists of 25 hidden neurons and 1 output layer (1 dense layer).
• LSTM network hyperparameters: input: 1 layer; output: 1 layer; hidden: 25 neurons; optimizer: adam; dropout: 0.1; timestep: 240; batch size: 240; epochs: 1000 (these can be further optimized).
• Root mean squared errors are calculated.
• Output files: lstm_results (prediction and actual values) and a plot file (actual and prediction values).
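The data pipeline described in the bullets above can be sketched as follows (a sketch only: a synthetic sine series stands in for the Yahoo Finance CSV, a timestep of 60 replaces 240 for brevity, and the Keras model itself is omitted):

```python
import numpy as np

# Synthetic stand-in for the downloaded close-price series
prices = np.sin(np.linspace(0, 20, 600)) + 2.0

# Normalize into (0, 1) with min-max scaling
p_min, p_max = prices.min(), prices.max()
scaled = (prices - p_min) / (p_max - p_min)

# 50/50 train/test split, as in the bullets
split = len(scaled) // 2
train, test = scaled[:split], scaled[split:]

def make_windows(series, timestep):
    """Slice a 1-D series into (samples, timestep, 1) inputs and next-value
    targets -- the 3-D shape a Keras LSTM layer expects."""
    X, y = [], []
    for i in range(len(series) - timestep):
        X.append(series[i:i + timestep])
        y.append(series[i + timestep])
    return np.array(X)[..., None], np.array(y)

X_train, y_train = make_windows(train, timestep=60)

def rmse(pred, actual):
    """Root mean squared error, the evaluation metric used here."""
    return float(np.sqrt(np.mean((pred - actual) ** 2)))
```

Note that predictions made on the scaled data must be inverse-transformed (`pred * (p_max - p_min) + p_min`) before reporting RMSE in price units.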

What is LSTM? (General Information) https://en.wikipedia.org/wiki/Long_short-term_memory

Keras: https://keras.io/

Tensorflow: https://www.tensorflow.org/

## A Deep Neural-Network based Stock Trading System based on Evolutionary Optimized Technical Analysis Parameters

In this study, we propose a stock trading system based on optimized technical analysis parameters for creating buy-sell points using genetic algorithms. The model is developed utilizing the Apache Spark big data platform. The optimized parameters are then passed to a deep MLP neural network for buy-sell-hold predictions. Dow 30 stocks are chosen for model validation. Each Dow stock is trained separately using daily close prices between 1996-2016 and tested between 2007-2016. The results indicate that optimizing the technical indicator parameters not only enhances the stock trading performance but also provides a model that might be used as an alternative to Buy and Hold and other standard technical analysis models. The phases of the proposed method are illustrated below.

Utilizing optimized technical analysis feature parameter values as input features for a neural network stock trading system is the basis of our proposed model. We used genetic algorithms to optimize RSI parameters for uptrend and downtrend market conditions. Then, we used those optimized feature values as buy-sell trigger points for our deep neural network data set. We used Dow 30 stocks to validate our model. The results indicate that such a trading system produces comparable or better results than Buy & Hold and other trading systems for a wide range of stocks, even over relatively long periods. The structure of the chromosomes and the genes within them is shown below.

• RSI Buy values are created randomly between 5 and 40.
• RSI Buy intervals are created randomly between 5 and 20 days.
• RSI Sell values are created randomly between 60 and 95.
• RSI Sell intervals are created randomly between 5 and 20 days.
• The same procedure is followed to create 4 genes for uptrend.
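The chromosome construction in the bullets above might be sketched like this (a hypothetical illustration using the ranges stated in the text; the gene names and dictionary layout are my own, not the paper's encoding):

```python
import random

def random_trend_genes(rng):
    """4 genes for one market trend, with the ranges given in the bullets."""
    return {
        "rsi_buy":           rng.randint(5, 40),   # buy-trigger RSI value
        "rsi_buy_interval":  rng.randint(5, 20),   # RSI lookback in days
        "rsi_sell":          rng.randint(60, 95),  # sell-trigger RSI value
        "rsi_sell_interval": rng.randint(5, 20),   # RSI lookback in days
    }

def random_chromosome(rng):
    """One GA individual: 4 genes for the downtrend plus 4 for the uptrend."""
    return {
        "downtrend": random_trend_genes(rng),
        "uptrend":   random_trend_genes(rng),
    }

rng = random.Random(42)
chrom = random_chromosome(rng)
```

The genetic algorithm would then evaluate each chromosome's fitness by backtesting its buy-sell trigger points and evolve the population by selection, crossover, and mutation.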

The genetic algorithm phase is illustrated as follows:

Science Direct Link: http://www.sciencedirect.com/science/article/pii/S1877050917318252

Cite as:

Bibtex:

```bibtex
@article{sezer2017deep,
  title={A Deep Neural-Network Based Stock Trading System Based on Evolutionary Optimized Technical Analysis Parameters},
  author={Sezer, Omer Berat and Ozbayoglu, Murat and Dogdu, Erdogan},
  journal={Procedia Computer Science},
  volume={114},
  pages={473--480},
  year={2017},
  publisher={Elsevier}
}
```

MLA:

Sezer, Omer Berat, Murat Ozbayoglu, and Erdogan Dogdu. “A Deep Neural-Network Based Stock Trading System Based on Evolutionary Optimized Technical Analysis Parameters.” Procedia Computer Science 114 (2017): 473-480.

What is Multi Layer Perceptron (MLP)? (General Information): https://en.wikipedia.org/wiki/Multilayer_perceptron

What is Genetic Algorithm?: https://en.wikipedia.org/wiki/Genetic_algorithm

What is Relative Strength Index?: https://en.wikipedia.org/wiki/Relative_strength_index

Apache Spark MLlib: https://spark.apache.org/mllib/

## An Artificial Neural Network Based Stock Trading System Using Technical Analysis and Big Data Framework

The model developed first converts the financial time series data into a series of `buy-sell-hold` trigger signals using the most commonly preferred technical analysis indicators (TA4J is used to calculate technical analysis indicators’ values). Then, a multilayer perceptron (MLP) is trained in the learning stage on the daily stock prices between 1997 and 2007 for all of the Dow 30 stocks. Apache Spark big data framework is used in the training stage. The trained model is then tested with data from 2007 to 2017. The results indicate that by choosing the most appropriate technical indicators, the NN model can achieve comparable results against the `buy` and `hold` strategy in most of the cases. Furthermore, fine tuning the technical indicators and/or optimization strategy can enhance the overall trading performance.

We presented a new stock trading and prediction model based on an MLP, utilizing technical analysis indicator values as features. The Apache Spark big data framework is used in the implementation. The model is trained and tested on Dow 30 stocks in order to evaluate it. The results indicate that comparable outcomes are obtained against the baseline `buy` and `hold` strategy even without fine tuning and/or optimizing the model parameters. Phases of the proposed method are illustrated below.
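The signal-labeling idea described above (technical indicator values converted to buy-sell-hold triggers for the classifier) can be sketched with RSI as the indicator. This is a generic illustration, not the paper's TA4J-based implementation, and the 30/70 thresholds are the textbook RSI defaults rather than the paper's tuned values:

```python
import numpy as np

def rsi(prices, period=14):
    """Classic Relative Strength Index over a close-price series (Wilder
    smoothing). Entries before the warm-up period are left as NaN."""
    deltas = np.diff(prices)
    gains = np.where(deltas > 0, deltas, 0.0)
    losses = np.where(deltas < 0, -deltas, 0.0)
    out = np.full(len(prices), np.nan)
    avg_gain = gains[:period].mean()
    avg_loss = losses[:period].mean()
    for i in range(period, len(prices) - 1):
        avg_gain = (avg_gain * (period - 1) + gains[i]) / period
        avg_loss = (avg_loss * (period - 1) + losses[i]) / period
        rs = avg_gain / avg_loss if avg_loss > 0 else np.inf
        out[i + 1] = 100.0 - 100.0 / (1.0 + rs)
    return out

def label_signals(rsi_values, buy_below=30.0, sell_above=70.0):
    """Map indicator values to class labels: 0 = hold, 1 = buy, 2 = sell.
    NaN warm-up entries fall through to hold."""
    labels = np.zeros(len(rsi_values), dtype=int)
    labels[rsi_values < buy_below] = 1
    labels[rsi_values > sell_above] = 2
    return labels
```

The resulting label series, paired with the indicator values as features, forms the kind of data set on which the MLP classifier is trained.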

Abstract:

In this paper, a neural network-based stock price prediction and trading system using technical analysis indicators is presented. The model developed first converts the financial time series data into a series of buy-sell-hold trigger signals using the most commonly preferred technical analysis indicators. Then, a Multilayer Perceptron (MLP) artificial neural network (ANN) model is trained in the learning stage on the daily stock prices between 1997 and 2007 for all of the Dow30 stocks. Apache Spark big data framework is used in the training stage. The trained model is then tested with data from 2007 to 2017. The results indicate that by choosing the most appropriate technical indicators, the neural network model can achieve comparable results against the Buy and Hold strategy in most of the cases. Furthermore, fine tuning the technical indicators and/or optimization strategy can enhance the overall trading performance.

Github Link: https://github.com/omerbsezer/SparkMlpDow30

Related Links:

What is Multi Layer Perceptron (MLP)? (General Information): https://en.wikipedia.org/wiki/Multilayer_perceptron

What is Relative Strength Index?: https://en.wikipedia.org/wiki/Relative_strength_index

What is MACD?: https://en.wikipedia.org/wiki/MACD

What is Williams %R?: https://www.investopedia.com/terms/w/williamsr.asp

Apache Spark MLlib: https://spark.apache.org/mllib/