Reinforcement learning: SARSA action values keep decreasing over time - Julia

I am currently trying to implement the SARSA algorithm, as described in Sutton and Barto, "Reinforcement Learning: An Introduction", on a gridworld with an upward wind (the same environment as Sutton and Barto, p. 130). There are 70 fields, and one can move in four directions: up, down, left or right. In some states, the wind shifts the resulting position up by one step. The reward is -1 for every time step in which the goal has not been reached.
I implemented the environment and everything seems to be working fine. However, the learning algorithm does not seem to work. The authors of the book claim that, with certain parameters, the algorithm converges to a near-optimal solution after about 150 episodes. This is not the case for my code (written in Julia v1.1.0):
g = GridWorld()
α = 0.5
γ = 1
ϵ = 0.1
Q = zeros(70, 4)
for episode in 1:100000
    isDone = false
    S = g.start
    A = eps_greedy(Q, ϵ, S)
    while !isDone
        (R, isDone) = action(g, A)      # take action A, observe reward and terminal flag
        S´ = g.position                 # state reached after the move
        A´ = eps_greedy(Q, ϵ, S´)       # on-policy choice of the next action
        Q[S, A] += α * (R + γ * Q[S´, A´] - Q[S, A])   # SARSA update
        S = S´
        A = A´
    end
end
The object g stores the current state, which gets changed according to action A when calling action(g,A). The function eps_greedy(Q,epsilon,S) just takes a current state and chooses an epsilon-greedy action from the action-value function Q.
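To be explicit, eps_greedy is intended to implement the usual ε-greedy selection rule, written out here in math rather than code:

A = argmax_a Q(S, a)            with probability 1 - ϵ
A = a uniformly random action   with probability ϵ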
The problem is that the longer I train, the lower the action values stored in Q become. For example, after training for about 2000 episodes, the action values of the starting state are all roughly -950. Training for 20000 episodes yields action values of around -10000.
I don't think this is supposed to happen, but I am not quite sure what causes the problem. Any help would be greatly appreciated!

Related

How to overcome "Variable returned by scilab argument function is incorrect" while using fsolve in Scilab?

While solving a problem in fluid mechanics, I came across a situation where I have to solve 4 nonlinear equations to get 4 unknown variable values. So I used the fsolve function in Scilab to solve the equations. My code is as follows:
clc
clear
function f=F(x)
    f(1)=x(1)-(0.4458*x(2)^(-2))
    f(2)=x(3)-(0.26936*x(2)*(-1))
    f(3)=(2.616*x(2))-(x(4)*x(1)^2)
    f(4)=(0.316/(x(3)^(1/4)))
endfunction
function j=jacob(x)
    j(1,1)=1;j(1,2)=0.8916*x(2)^(-3);j(1,3)=0;j(1,4)=0
    j(2,1)=0;j(2,2)=0.26936*x(2)^(-2);j(2,3)=1;j(2,4)=0;
    j(3,1)=-2*x(1)*x(4);j(3,2)=2.616;j(3,3)=0;j(3,4)=-1*x(1)^2;
    j(4,1)=0;j(4,2)=0;j(4,3)=-2/x(3)/log(10);j(4,4)=(-0.5*x(4)^(-1.5))-(1/x(4)/log(10));
endfunction
x0=[1 1 2000 1];
[x,v,info]=fsolve(x0,F,jacob);
disp(x);
Error:
[x,v,info]=fsolve(x0,F,jacob);
!--error 98
Variable returned by scilab argument function is incorrect.
at line 17 of exec file called by :
exec('D:\Desktop files\Ajith\TBC\SCILAB code\Chapter_08\fsolve.sce', -1)
Details of the question:-
Actual question: Heated air at 1 atm and 35 °C is to be transported in a 150 m long circular plastic duct at a rate of 0.35 m3/s. If the head loss in the pipe is not to exceed 20 m, determine the minimum diameter of the duct.
Book name: Fluid Mechanics: Fundamentals and Applications by Y.A.Cengel and J.M.Cimbala.
Page and question number: Page no.: 345, EXAMPLE 8-4
ISBN of the book: 0-07-247236-7
Textbook link: https://www.academia.edu/32439502/Cengel_fluid_mechanics_6_edition.PDF
In my code: x(1) is velocity, x(2) is the diameter, x(3) is the Reynolds number, x(4) is the friction factor
Expected answers: x(1)=6.24, x(2)=0.267, x(3)=100800, x(4)=0.0180.
My thoughts about the error:
What I see is that if I change the power of a variable, for example from 0.5 to 2 or from -1.5 to 1, the answer is calculated and displayed. So the problem seems to be related to the powers of the variables used.
Also, for some initial values of x there is no error and I get an output.
After reading the description of the problem in the book, there is only one non-trivial equation (the third); all the others directly give the other unknowns as functions of D. Here is the code to determine the diameter:
function out=F(D)
    V = 0.35/%pi/D^2*4;         // velocity from the 0.35 m3/s flow rate
    Re = V*D/1.655e-5;          // Reynolds number (kinematic viscosity of air at 35 °C)
    f = 20/(150/D*V^2/2/9.81);  // friction factor implied by the 20 m head loss over 150 m
    out = 1/sqrt(f) + 2*log10(2.51/Re/sqrt(f));  // Colebrook (smooth pipe) residual
endfunction
D0 = 1;
[D,v,info]=fsolve(D0,F);
disp(D)

Iteration / Maximization Excel solver in R

I am trying to do a maximization in R that I have previously done in Excel with the solver. The problem is that I don't know how to deal with it (I am not very proficient in R).
Let's talk a bit about my data. I have the 26 Swiss cantons and the Swiss government (which is the sum of the values of the 26 cantons) with their population and their "wealth". So I have 27 observations per variable. I'm not sure that the following descriptions are useful, but I put them anyway. From this, I calculate some variables with while loops. For each canton [i]:
resource potential = mean(wealth2011 [i],wealth2012 [i],wealth2013 [i])
population mean = mean(population2011 [i],population2012 [i],population2013 [i])
resource potential per capita = 1000*resource potential [i]/population [i]
resource index = 100*resource potential capita [i]/resource potential capita [swiss government]
Here is a little example of the kind of loops I used:
RI = 0
i = 1
while (i < 28) {
  # resource_potential_capita stands for the "resource potential per capita" vector built above
  RI[i] = resource_potential_capita[i] / resource_potential_capita[27] * 100
  i = i + 1
}
The resource index (RI) for the Swiss government (i = 27) is 100, because we divide the resource potential per capita of the Swiss government (i = 27) by itself and multiply by 100. Hence, all cantons with RI > 100 are rich cantons and the others (RI < 100) are poor cantons. Up to here there was no problem; I just explained how I built my dataset.
Now the problem that I face: I have to create the variable weighted difference (wd). It takes the value of:
0 if RI>100 (rich canton)
(100-RI[i])^(1+P)*Pop[i] if RI<100 (poor canton)
I create this variable like this (sorry for the weakness of the code, I did my best):
wd = -1
i = 1
a = 0
c = 0
tot = 0
while (i < 28) {
  if (i == 27) {
    wd[i] = a                                 # Swiss government: sum of the cantonal wd
  } else if (RI[i] < 100) {
    wd[i] = (100 - RI[i])^(1 + P) * Pop[i]    # poor canton
    c = wd[i]
    a = a + c
  } else {
    wd[i] = 0                                 # rich canton
  }
  i = i + 1
}
However, I don't know the value of p. It is a value between 0 and 1. To find the value of p, I have to do a maximization using the following features:
RI_26 = 65.9, it is the minimum of RI in my data
RI_min = 100-((x*wd[27])/((1+p)*z*100))^(1/p), where x and z are fixed values (x = 8'677, z = 4'075'977'077) and wd[27] is the sum of wd over all cantons.
We have p in two equations: RI_min and wd. To solve it in Excel, I used the Excel solver with the following setup:
p_dot = (RI_26/RI_min)*p ==> p_dot = [65.9/(100-((x*wd[27])/((1+p)*z*100))^(1/p))]*p
RI_26 = RI_min ==> 65.9 = 100-((x*wd[27])/((1+p)*z*100))^(1/p)
In Excel, p is my variable cell (the only value allowed to change), p_dot is my objective, and RI_26 = RI_min is my constraint.
So I would like to maximize p, and I don't know how to do this in R. My main problem is the presence of p in both RI_min and wd. We need to iterate to solve it, but this is beyond my skills.
Is anyone able to help me with the information I provided?
You should look into the optim function.
Here I will try to give you a really simple explanation, since you said you are not very experienced in R.
Assume I have a function f(x) that I want to maximize, i.e. I want to find the parameter x that gives me the maximum value of f(x).
The first thing to do is to define the function; in R you can do this with:
myfunction<- function(x) {...}
Having defined the function I can optimize it with the command:
optim(par,myfunction)
where par is the vector of initial parameters of the function, and myfunction is the function that needs to be optimized. Bear in mind that optim performs minimization; however, it will maximize if control$fnscale is negative. Another strategy is to change the function (i.e. flip its sign) to suit the problem.
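For instance, a minimal sketch with a made-up one-parameter toy function (not your actual p_dot) could look like this:

# Toy objective, just to illustrate: maximize f(p) = -(p - 0.3)^2 over p in [0, 1]
myfunction <- function(p) -(p - 0.3)^2

res <- optim(par = 0.5,                     # initial guess for p
             fn = myfunction,
             method = "Brent",              # "Brent" is meant for one-parameter problems and needs bounds
             lower = 0, upper = 1,          # p is assumed to lie in [0, 1]
             control = list(fnscale = -1))  # negative fnscale turns minimization into maximization
res$par    # the maximizing p (0.3 for this toy function)
res$value  # the maximum of f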
Hope that this helps,
Marco
From the description you provided, if I'm not mistaken, it looks like everything you need to do is solve an equation.
In particular you have the following two expressions:
RI_min = 100-((x*y)/((1+p)*z*100))^(1/p)
and, since x,y,z are fixed, the only variable is p.
Moreover, since RI_26 = RI_min, this yields:
65.9 =100-((x*y)/((1+p)*z*100))^(1/p)
Plugging in the values of x, y and z you have provided, this yields
p=0.526639915936052
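If you want to solve this equation in R rather than by hand, a minimal sketch with uniroot (the one-dimensional root finder in base R) could look like the following; note that y below is only a placeholder, since your actual wd[27] is not in the post, so you would substitute your own sum:

x <- 8677
z <- 4075977077
y <- 5e8    # placeholder for wd[27]; replace with your actual sum of wd

# g(p) = 0 exactly when 65.9 = 100 - ((x*y)/((1+p)*z*100))^(1/p)
g <- function(p) 100 - ((x * y) / ((1 + p) * z * 100))^(1 / p) - 65.9

uniroot(g, lower = 0.01, upper = 1)$root   # p is assumed to lie in (0, 1)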
I don't understand what exactly you are trying to maximize.

Chen's chaotic system solution using the differential transform method

I am calculating the solution of Chen's chaotic system using the differential transform method. The code that I am using is:
x=zeros(1,7);
x(1)=-0.1;
y=zeros(1,7);
y(1)=0.5;
z=zeros(1,7);
z(1)=-0.6;
for k=0:5
    x(k+2)=(40*gamma(1+k)/gamma(2+k))*(y(k+1)-x(k+1));
    sum=0;
    for l=0:k
        sum=sum+x(l+1)*z(k+1-l);
    end
    y(k+2)=(gamma(1+k)/gamma(2+k))*(-12*x(k+1)-sum+28*y(k+1));
    sum=0;
    for l=0:k
        sum=sum+x(l+1)*y(k+1-l);
    end
    z(k+2)=(gamma(1+k)/(1+k))*(sum-3*z(k+1));
end
s=fliplr(x);
t=0:0.05:2;
a=polyval(s,t);
plot(t,a)
What this code does is calculate x(k), y(k) and z(k); these are the coefficients of the polynomial that approximates the solution.
The solution is x(t) = sum_{k=0}^infinity x(k)*t^k, and similarly for y(t) and z(t). But this code doesn't give the desired output of a chaotic sequence; the graph of x(t) that I am getting is:
This is not an answer, but a clearer and more correct (programmatically speaking) way to write your loop:
for k = 1:6
    x(k+1) = (40*1/k)*(y(k)-x(k));
    temp_sum = sum(x(1:k).*z(k:-1:1),2);
    y(k+1) = (1/k)*(-12*x(k)-temp_sum+28*y(k));
    temp_sum = sum(x(1:k).*y(k:-1:1),2);
    z(k+1) = (1/k)*(temp_sum-3*z(k));
end
The most important issue here is not overloading the built-in function sum (I replaced it with temp_sum). Other changes include vectorizing the inner loops (using sum over element-wise products), indexing that starts at 1 (instead of writing k+1 all the time), and removing unnecessary calls to gamma (gamma(k)/gamma(k+1) = 1/k).
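For reference, my reading of both versions is that they implement the differential transform recurrence of Chen's system with a = 40, b = 3, c = 28:

(k+1)*x(k+1) = 40*(y(k) - x(k))
(k+1)*y(k+1) = -12*x(k) - sum_{l=0}^{k} x(l)*z(k-l) + 28*y(k)
(k+1)*z(k+1) = sum_{l=0}^{k} x(l)*y(k-l) - 3*z(k)

with x(0) = -0.1, y(0) = 0.5, z(0) = -0.6 (MATLAB's 1-based arrays shift every index by one), which is where the factor gamma(1+k)/gamma(2+k) = 1/(k+1) in the original code comes from.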

Characterizing the "prominence" of a poll response

I’m trying to characterize how prominent a selection is.
You poll 10 people for their
favorite color and you get the following response:
r = c(rep("blue",5),rep("green",4),rep("red",1))
And you make a contingency table:
tab = table(r)
If you take the density of this...
d = density(tab)
If you ask someone else and they say "red", you characterize the prominence
of this response by the integral of the probability density function over the region at or below the rate of their response:
get.prominence = function(new.response){
  rate = tab[new.response]
  window.index = tail(which(d$x <= rate), n = 1)
  sum(d$y[1:window.index]) / sum(d$y)
}
get.prominence("red") # .16
get.prominence("blue") # .77
Is this a reasonable way of doing this? There must be a known technique
for this and I don't know the name of it.

Why are there no breakpoints in the time series?

I have the following time series:
Lines <- "D1,Diff
1,14/01/2015 00:00,0.03
2,14/01/2015 01:00,0.03
3,14/01/2015 02:00,0.01
4,14/01/2015 03:00,0.02
5,14/01/2015 04:00,0.03
6,14/01/2015 05:00,0.02
7,14/01/2015 06:00,0.01
8,14/01/2015 07:00,0.03
9,14/01/2015 08:00,0.02
10,14/01/2015 09:00,0.01
11,14/01/2015 10:00,0.03
12,14/01/2015 11:00,0.03
13,14/01/2015 12:00,0.03
14,14/01/2015 13:00,0.02
15,14/01/2015 14:00,0.01
16,14/01/2015 15:00,0.03
17,14/01/2015 16:00,0.03
18,14/01/2015 17:00,0.03
19,14/01/2015 18:00,0.01
20,14/01/2015 19:00,0.02
21,14/01/2015 20:00,0.03
22,14/01/2015 21:00,0.03
23,14/01/2015 22:00,0.02
24,14/01/2015 23:00,0.01
25,15/01/2015 00:00,0.03
26,15/01/2015 01:00,0.02
27,15/01/2015 02:00,0.03
28,15/01/2015 03:00,0
29,15/01/2015 04:00,0.03
30,15/01/2015 05:00,0.03
31,15/01/2015 06:00,0.03
32,15/01/2015 07:00,0
33,15/01/2015 08:00,0.03
34,15/01/2015 09:00,0.03
35,15/01/2015 10:00,0.03
36,15/01/2015 11:00,0.01
37,15/01/2015 12:00,0.02
38,15/01/2015 13:00,0.03
39,15/01/2015 14:00,0.03
40,15/01/2015 15:00,0.03
41,15/01/2015 16:00,0.02
42,15/01/2015 17:00,0.01
43,15/01/2015 18:00,0.03
44,15/01/2015 19:00,0.03
45,15/01/2015 20:00,0.03
46,15/01/2015 21:00,0.01
47,15/01/2015 22:00,0.02
48,15/01/2015 23:00,0.03
49,16/01/2015 00:00,0.03
50,16/01/2015 01:00,0
51,16/01/2015 02:00,0.03
52,16/01/2015 03:00,0.03
53,16/01/2015 04:00,0.02
54,16/01/2015 05:00,0.01
55,16/01/2015 06:00,0.03
56,16/01/2015 07:00,0.03
57,16/01/2015 08:00,0
58,16/01/2015 09:00,0.03
59,16/01/2015 10:00,0.03
60,16/01/2015 11:00,0.01
61,16/01/2015 12:00,0.02
62,16/01/2015 13:00,0.03
63,16/01/2015 14:00,0.02
64,16/01/2015 15:00,0.01
65,16/01/2015 16:00,0.03
66,16/01/2015 17:00,0.03
67,16/01/2015 18:00,0
68,16/01/2015 19:00,0.03
69,16/01/2015 20:00,0.03
70,16/01/2015 21:00,0
71,16/01/2015 22:00,0.03
72,16/01/2015 23:00,0.02"
I use the following commands:
library("strucchange")
z <- read.zoo(text = Lines, tz = "", format = "%d/%m/%Y %H:%M", sep = ",")
bp <- breakpoints(z~ 1, h = 36)
> bp
Optimal 1-segment partition:
Call:
breakpoints.formula(formula = z ~ 1, h = 36)
Breakpoints at observation number:
NA
Corresponding to breakdates:
NA
I would like to find the segmentation. For one day I found an optimal 3-segment partition. However, for 3 days (72 hours) no breakpoints were found. How can I find the 12 segments (breakpoints)?
Regarding breakpoint (aka changepoint) detection, here I borrow from the headline of a blog post from Dr. Andrew Gelman (https://statmodeling.stat.columbia.edu/2016/03/18/i-definitely-wouldnt-frame-it-as-to-determine-if-the-time-series-has-a-change-point-or-not-the-time-series-whatever-it-is-has-a-change-point-at-every-time-the-question/):
I definitely wouldn’t frame it as “To determine if the time series has a change-point or not.” The time series, whatever it is, has a change point at every time. The question might be, “Is a change point necessary to model these data?” That’s a question I could get behind.
So, given that time series segmentation is model-based, when and how many breakpoints occur are more or less model-dependent (e.g., assumptions-dependent); that is also why there are numerous alternative breakpoint detection models available in R--the same argument applies essentially to all data analyses. With that said, here are some quick results from two Bayesian changepoint packages: bcp and Rbeast (as a disclaimer, I developed the latter). Unlike frequentist models that seek the single best set of breakpoint locations, Bayesian approaches try to estimate the probability of a breakpoint occurring at any given point in time.
# Your sample time series; this is a pure data vector without the time info.
z = c( 0.03,0.03,0.01,0.02,0.03,0.02,0.01,0.03,0.02,0.01,0.03,0.03,0.03,0.02,0.01,0.03,
0.03,0.03,0.01,0.02,0.03,0.03,0.02,0.01,0.03,0.02,0.03,0.00,0.03,0.03,0.03,0.00,
0.03,0.03,0.03,0.01,0.02,0.03,0.03,0.03,0.02,0.01,0.03,0.03,0.03,0.01,0.02,0.03,
0.03,0.00,0.03,0.03,0.02,0.01,0.03,0.03,0.00,0.03,0.03,0.01,0.02,0.03,0.02,0.01,
0.03,0.03,0.00,0.03,0.03,0.00,0.03,0.02)
library(bcp)
out = bcp(z)
plot(out)
No breakpoints are found, as shown in the plot of the posterior probabilities, although the posterior probability curve does carry a tiny bit of probability of a changepoint here and there. Overall, the statistical evidence suggesting a breakpoint is very weak.
Here is a try with Rbeast, which aims both to detect breakpoints and to decompose time series (i.e., separate seasonality from trend). Your data contains no periodic/seasonal component; that is why season='none' is used in the beast function.
library(Rbeast)
out = beast(z, season='none')
plot(out)
Similarly, the overall evidence suggesting the presence of breakpoints is low, but Rbeast finds a few locations that are more likely to be breakpoints than others, as indicated by the small peaks in the Pr(tcp) curve. Not surprisingly, the magnitudes of these peak probabilities are very small. On average, beast finds 1 breakpoint, and if it has to pinpoint a location, the most probable one is the last peak--the one marked by the vertical dashed line.
Again, how to segment a time series depends on how you define breakpoints; I am pretty sure that if you try a different approach, the result will vary. If you simply intend to find the locations where there is a literal change (i.e., where the value is not constant), you can try something like this:
which(abs(diff(z)) >0.02)
which, as you expected, gives 12 breakpoints (though I am not sure whether these are the 12 breakpoints you expected to see).
