Final Project: Reinforcement Learning, CS 301 (004)

Posted 08 Jul at 00:00h in general questions

CS 301 (004), Spring 2020, Introduction to Data Science

WARNING: this project might be hard for some of you: please start as soon as possible!

Remarks. You are expected to write a short essay that covers in detail your approaches and answers to the questions below. It is highly recommended that you first state your approaches and ideas at a high level and then show how your ideas apply to the two concrete examples shown here. Your score on this project will be based not only on your answers to specific questions but also on your overall writing skills.

Consider the following game. There is a special die with \(N\) sides, where the \(i\)th side has the number \(i\) for each \(1 \le i \le N\). Let \([N] := \{1, 2, 3, \ldots, N\}\), the set of integers ranging from 1 to \(N\). Let \(p \in [0, 1]^N\) be a vector of length \(N\) such that the \(i\)th entry of \(p\), denoted by \(p_i\), is the probability that the die lands on the \(i\)th side (so we see the number \(i\)) when it is rolled once. For example, \(N = 4\) and \(p = (0, 1/2, 1/4, 1/4)\) means that if we roll the die once, we will see the numbers 1, 2, 3, and 4 with probability 0, 1/2, 1/4, and 1/4, respectively. There is another binary vector \(q \in \{0, 1\}^N\), where the \(i\)th entry of \(q\), denoted by \(q_i\), indicates whether the \(i\)th side is BAD (\(q_i = 1\)) or not (\(q_i = 0\)).

Game Rules. At the beginning, you have $0 at hand. Suppose that at some time you have \(x\) dollars at hand. You have two choices: either accept the challenge or quit.

(Case 1) If your choice is quit, then the game is over and you take \(x\) dollars and go away.

(Case 2) If your choice is accept, then you roll the die once and see a random number \(X \in [N]\). There are two subcases. (1) If \(q_X = 1\), i.e., the \(X\)th side is BAD, then you lose all the money you currently have at hand. (2) If \(q_X = 0\), i.e., the \(X\)th side is not BAD, then you get a reward of \(f(X)\), where \(f\) is a function of \(X\). In this case, you will have \(x + f(X)\) dollars.

Here is a tricky part: if \(x + f(X) \ge K\) (\(K\) is a parameter known in advance), then the game is over, and you take \(x + f(X)\) dollars and go away; otherwise, you continue the game with \(x + f(X)\) dollars at hand.

Attention: If you accept the challenge, roll the die once, get a random number \(X\), and \(q_X = 1\), you lose all the money at hand but the game is NOT over: you can still continue to play the game with $0 at hand. The game is over only when you choose to quit or you have at least \(K\) dollars at hand. Note that these key components uniquely define the game: \((N, p, q, f, K)\).

(Question 1) Consider a simple case where \(N = 6\) and \(p = (1/6, 1/6, 1/6, 1/6, 1/6, 1/6)\). In other words, we have a normal die with six sides, and each side appears with the same chance when we roll once. Let \(q = (1, 1, 0, 0, 0, 0)\), \(f(X) = X\), and \(K = 100\). You are asked to do the following.

(a) Formulate the above game as a reinforcement learning system. Please specify the five key components of the game \((S, A, T, R, \gamma)\), where \(S\) is the state space, \(A\) is the action space, \(T\) is the transition probability function, \(R\) is the reward function, and \(\gamma\) is the discount factor. Please specify clearly the terminal state space (\(S_A\)) and the non-terminal state space (\(S_N\)).

(b) Compute the optimal value function \(V^*\) and the optimal policy \(\pi^*\). You can try either the value iteration method or the dynamic programming method. Please make sure to state explicitly the values of \(V^*(s)\) and \(\pi^*(s)\) for all \(s \in S_N\), where \(S_N\) refers to the non-terminal state space. Based on your results, state explicitly the maximum expected total reward you will get in this game when starting with $0.

(c) Please use linear programming (LP) to compute the optimal value function \(V^*\) and the optimal policy \(\pi^*\). You should explicitly specify the following elements of the LP: variables, the objective function, and constraints. Again, please state explicitly the values of \(V^*(s)\) and \(\pi^*(s)\) for all \(s \in S_N\).

(Question 2) Consider a special case where \(N = 5\), \(p = (1/2, 1/4, 1/8, 1/16, 1/16)\), \(q = (0, 0, 1, 1, 0)\), \(f(X) = X^2\), and \(K = 100\). Answer the same questions (a), (b), and (c) as in Question 1.
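To make the value iteration method mentioned in part (b) concrete, here is a minimal Python sketch for a general game \((N, p, q, f, K)\). It is one possible formulation, not the intended solution, and it rests on several assumptions that the assignment leaves open: the non-terminal states are the integer amounts \(0, \ldots, K-1\); \(f\) returns a positive integer for every side that is not BAD; the payout is modelled as a reward received on the transition into a terminal state (quitting, or reaching at least \(K\)); and the discount factor \(\gamma = 0.9\) is only a placeholder. The function name `solve_game` and its parameters are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple


def solve_game(
    p: Sequence[float],
    q: Sequence[int],
    f: Callable[[int], int],
    K: int,
    gamma: float = 0.9,      # ASSUMPTION: the assignment does not fix gamma
    tol: float = 1e-10,
    max_sweeps: int = 100_000,
) -> Tuple[List[float], List[str]]:
    """Value iteration over the non-terminal states x = 0, ..., K-1.

    Assumes f(i) is a positive integer for every side i that is not BAD,
    so the money at hand is always an integer.
    """
    N = len(p)
    # Initial guess: the value of quitting immediately in every state.
    V = [float(x) for x in range(K)]

    def accept_value(x: int) -> float:
        """Expected return of rolling once from x, then following V."""
        total = 0.0
        for i in range(1, N + 1):
            prob = p[i - 1]
            if prob == 0:
                continue
            if q[i - 1] == 1:
                # BAD side: lose everything, continue from $0.
                total += prob * gamma * V[0]
            else:
                y = x + f(i)
                if y >= K:
                    # Game ends; the payout y is the immediate reward.
                    total += prob * y
                else:
                    # Game continues from y with no immediate reward.
                    total += prob * gamma * V[y]
        return total

    for _ in range(max_sweeps):
        delta = 0.0
        # In-place sweep from the richest state down to $0.
        for x in range(K - 1, -1, -1):
            new_value = max(float(x), accept_value(x))  # quit vs. accept
            delta = max(delta, abs(new_value - V[x]))
            V[x] = new_value
        if delta < tol:
            break

    policy = ["accept" if accept_value(x) > x else "quit" for x in range(K)]
    return V, policy


# Question 1 parameters, with the placeholder gamma = 0.9.
V1, policy1 = solve_game(p=[1 / 6] * 6, q=[1, 1, 0, 0, 0, 0],
                         f=lambda i: i, K=100)

# Question 2 parameters, with the same placeholder gamma.
V2, policy2 = solve_game(p=[1 / 2, 1 / 4, 1 / 8, 1 / 16, 1 / 16],
                         q=[0, 0, 1, 1, 0], f=lambda i: i * i, K=100)

print("Q1: V(0) =", V1[0], " pi(0) =", policy1[0])
print("Q2: V(0) =", V2[0], " pi(0) =", policy2[0])
```

Because a BAD roll sends the player back to $0 rather than ending the game, the value of every state depends on \(V(0)\), which is why the sketch iterates to a fixed point instead of doing a single backward pass. The resulting policy depends strongly on the chosen \(\gamma\) and on how the rewards are modelled, so both choices should be stated and justified in the essay.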
