I hope I don't have a big gap in education.
I need to estimate the best-fitting alpha (the learning rate of the model), but I can't get the function right.
I have data that looks something like this:
ID Turn_no p_mean t_mean
1 1 170 99
1 2 176 93
1 3 138 92
1 4 172 118
1 5 163 96
1 6 170 105
1 7 146 99
1 8 172 94
and so on...
I want to use the equation:
p(turn) = p(turn-1) + alpha * [p(turn-1) - t(turn-1)]
I'm pretty stuck on making a function and log-likelihood based on the Rescorla-Wagner model.
This is the function so far:
RWmodel = function(data, par) {
ll <- NA
alpha <- par[1]
ID <- data$ID
Turn_no <- data$Turn_no
p_mean<- data$p_mean
t_mean<- data$t_mean
num_reps <- length(Turn_no)
i <- 2
for (i in 2:num_reps) {
#calculate prediction error
PE <- p_mean[i-1] - t_mean[i-1]
#update p's value
p_mean[i] <- p_mean[i-1] + alpha*PE
}
#minus maximum log likelihood, use sum and log functions
ll <- -sum(log(??))
#return ll
ll
}
I know I'm missing an important step in the function, I just can't figure out how to execute the log likelihood right in this situation.
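One possible way to fill in the missing step, offered as a minimal sketch rather than a definitive answer: assume each observed p_mean is normally distributed around the model's one-step-ahead prediction, with an unknown standard deviation sigma. The negative log-likelihood is then a sum of dnorm(..., log = TRUE) terms. The Gaussian error model, the extra parameter sigma, the function name rw_negloglik, the optimizer bounds, and fitting one ID at a time are all assumptions that are not in the post above.

```r
# Sketch only: Gaussian observation noise is an ASSUMPTION, not part of the question.
rw_negloglik <- function(par, data) {
  alpha <- par[1]          # learning rate
  sigma <- par[2]          # assumed noise SD (hypothetical extra parameter)
  n <- nrow(data)
  prev_p <- data$p_mean[-n]
  prev_t <- data$t_mean[-n]
  # one-step-ahead prediction: p(turn) = p(turn-1) + alpha * [p(turn-1) - t(turn-1)]
  pred <- prev_p + alpha * (prev_p - prev_t)
  # negative log-likelihood of the observed p_mean on turns 2..n
  -sum(dnorm(data$p_mean[-1], mean = pred, sd = sigma, log = TRUE))
}

# the eight rows shown in the question (a single ID)
dat <- data.frame(
  ID = 1, Turn_no = 1:8,
  p_mean = c(170, 176, 138, 172, 163, 170, 146, 172),
  t_mean = c(99, 93, 92, 118, 96, 105, 99, 94)
)

# bounds are illustrative; sigma must stay strictly positive
fit <- optim(par = c(0, 10), fn = rw_negloglik, data = dat,
             method = "L-BFGS-B", lower = c(-2, 1e-3), upper = c(2, Inf))
fit$par   # estimated alpha and sigma
```

With several IDs, the same function could be applied per participant, for example by splitting the data on ID first.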
I'm struggling to get a well-performing script for this problem: I have a table with the columns score, x and y. I want to sort the table by score and then build groups based on the x value. Each group should have an equal sum (not count) of x. x is a numeric variable in the dataset and represents the historic turnover of a customer.
score x y
0.436024136 3 435
0.282303336 46 56
0.532358015 24 34
0.644236597 0 2
0.99623626 0 4
0.557673456 56 46
0.08898779 0 7
0.702941303 453 2
0.415717835 23 1
0.017497461 234 3
0.426239166 23 59
0.638896238 234 86
0.629610596 26 68
0.073107526 0 35
0.85741877 0 977
0.468612039 0 324
0.740704267 23 56
0.720147257 0 68
0.965212467 23 0
A good way to do this is to add a group variable to the data frame with cumsum. You can then easily sum within the groups, e.g. with subset.
df$group <- cumsum(as.numeric(df$x)) %/% ceiling(sum(df$x) / 3) + 1
Remarks:
df is the data frame, already sorted by score
converting with as.numeric() first keeps cumsum() from overflowing the integer range on large data frames
%/% is integer division: it returns the whole-number part of the quotient
the '+1' just makes the groups start at 1 instead of 0
Thank you @Ronak Shah!
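As a worked sketch of the cumsum approach, here are the first eight rows of the question's table, sorted by score and split into three groups. The names df, n_groups and target are mine, ascending sort order is an assumption, and the pmin() clamp is an addition: without it the last rows land in an extra group whenever the total of x is an exact multiple of the per-group target.

```r
# first eight rows of the question's data (score and x only)
df <- data.frame(
  score = c(0.436024136, 0.282303336, 0.532358015, 0.644236597,
            0.99623626, 0.557673456, 0.08898779, 0.702941303),
  x = c(3, 46, 24, 0, 0, 56, 0, 453)
)

df <- df[order(df$score), ]             # sort by score (ascending, an assumption)
n_groups <- 3
target <- ceiling(sum(df$x) / n_groups) # turnover each group should reach

# running total of x, cut into bands of width `target`
df$group <- cumsum(as.numeric(df$x)) %/% target + 1
df$group <- pmin(df$group, n_groups)    # clamp the boundary case into the last group

group_sums <- tapply(df$x, df$group, sum)
group_sums
```

Because rows are never split, the group sums are only as equal as the data allow: one very large x can swallow a whole band, leaving a group empty.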
I am trying to do something pretty simple with R but I am not sure I am doing it well. I have a dataset containing three columns V1,V4,V5 and I want to do a regression to get the coefficients Ci,j of the following polynomial of two variables:
sum[i=0->3] sum[j=0->i] Ci,j . (V4_k)^i . (V5_k)^(3-j)
So I tried using the function polym:
lm(V1 ~ polym(V4, V5, degree=3, raw = TRUE), data)
which gives me the following coefficients
[1] 1.048122e+04 -2.050453e+02 1.407736e+00 -3.309312e-03 -3.748650e+01 8.983050e-01 -4.308559e-03 1.834724e-01 -6.868446e-04 4.030224e-04
Now, if I understand well how we must build a formula, I assumed that the following would give the same:
lm(V1 ~ V4 + V5 + I(V4 * V5) + I(V4^2 * V5) + I(V4^3 * V5) + I(V4^2 * V5^2) + I(V4^2*V5^3) + I(V4^3 * V5^2) + I(V4^3 * V5^3), data)
But I get different coefficients:
[1] 3.130403e+03 -1.652007e+01 -1.592879e+02 3.984177e+00 -2.419069e-02 3.919910e-05 1.008657e-04 4.271893e-07 -5.305623e-07 -2.289836e-09
Could you please tell me what I am doing wrong, and what is the correct way to achieve this regression with R?
The polym(V4, V5) call is not giving you what you think it is. (It doesn't matter if you use poly or polym for this example)
Let's look at an example:
v1 <- 1:10; v2 <- 1:10
poly(v1, v2, degree=3, raw=TRUE)
1.0 2.0 3.0 0.1 1.1 2.1 0.2 1.2 0.3
[1,] 1 1 1 1 1 1 1 1 1
[2,] 2 4 8 2 4 8 4 8 8
[3,] 3 9 27 3 9 27 9 27 27
[4,] 4 16 64 4 16 64 16 64 64
[5,] 5 25 125 5 25 125 25 125 125
[6,] 6 36 216 6 36 216 36 216 216
[7,] 7 49 343 7 49 343 49 343 343
[8,] 8 64 512 8 64 512 64 512 512
[9,] 9 81 729 9 81 729 81 729 729
[10,] 10 100 1000 10 100 1000 100 1000 1000
The column labels tell you the degrees of the first and second vectors that you gave as arguments; for example, "2.1" is v1^2 * v2^1. The first three columns are constant in v2 (v2^0), the second three are linear in v2, and so on.
That output is correct for a third-degree polynomial, but your second example contains terms of degree 4 and higher (for example V4^3 * V5 and V4^3 * V5^3). If you actually want terms up to the 4th degree, just change degree to 4 in the call.
If you need more help with polynomial regression, this article on R-Bloggers should be helpful. It shows how to create models with both I() and poly, although I think the examples were all univariate.
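To make the column labels concrete, here is a small sketch that parses them into exponent pairs and checks the total degree of every term. The vectors v4 and v5 are made-up values chosen to differ from each other, and the names P, exps and total_degree are mine.

```r
# hypothetical inputs; they differ so the two variables can be told apart
v4 <- c(1, 2, 3, 4, 5, 6)
v5 <- c(2, 1, 4, 3, 6, 5)

P <- poly(v4, v5, degree = 3, raw = TRUE)

# each column name is "i.j": the power of v4, then the power of v5
exps <- do.call(rbind, lapply(strsplit(colnames(P), ".", fixed = TRUE), as.integer))
colnames(exps) <- c("v4_power", "v5_power")
rownames(exps) <- colnames(P)

total_degree <- rowSums(exps)   # i + j for every term
exps
max(total_degree)               # no term exceeds the requested degree
```

The same check on a hand-written formula would show immediately whether any I() term has exponents that sum to more than the intended degree.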
With the sample data
dd<-data.frame(x1=rnorm(50),
x2=rnorm(50))
dd<-transform(dd, z = 2*x1-.5*x1*x2 + 3*x2^2+x1^2 + rnorm(50))
we see that
lm(z ~ polym(x1, x2, degree = 3, raw = TRUE), dd)
lm(z ~ x1 + I(x1^2) + I(x1^3) + I(x2) + I(x1*x2) +
I(x1^2*x2) + I(x2^2) + I(x1*x2^2) + I(x2^3), dd)
are the same.
Note that in your expansion, you have terms like
I(V4^3 * V5) + I(V4^2 * V5^2)
which are both 4th degree terms (the sum of the exponents is 4) so they should not appear in a third degree polynomial. So it depends on what you want. Normally, for a third degree polynomial, you have
sum[i=0->3] sum[j=0->3-i] Ci,j . (V4_k)^i . (V5_k)^j
so i+j<=3 always. It's unclear to me exactly what type of regression you want.
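The equivalence claimed above can be checked numerically instead of by eye. This sketch regenerates sample data of the same form (the seed and the names fit_polym and fit_terms are mine) and compares the coefficients of the two fits; the I() terms are written in the same order as the polym columns so the vectors line up.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible
dd <- data.frame(x1 = rnorm(50), x2 = rnorm(50))
dd$z <- with(dd, 2*x1 - 0.5*x1*x2 + 3*x2^2 + x1^2 + rnorm(50))

fit_polym <- lm(z ~ polym(x1, x2, degree = 3, raw = TRUE), data = dd)
fit_terms <- lm(z ~ x1 + I(x1^2) + I(x1^3) + x2 + I(x1*x2) +
                  I(x1^2*x2) + I(x2^2) + I(x1*x2^2) + I(x2^3), data = dd)

# names differ between the two fits, so compare the bare numbers
same_coefs <- all.equal(unname(coef(fit_polym)), unname(coef(fit_terms)))
same_fitted <- all.equal(unname(fitted(fit_polym)), unname(fitted(fit_terms)))
same_coefs
same_fitted
```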
The libraries used are:
library(survival)
library(splines)
library(boot)
library(frailtypack)
The function I am using, multivePenal, is in the frailtypack package.
In my data I have two recurrent events (delta.stable and delta.unstable) and one terminal event (delta.censor). There are some time-varying explanatory variables, such as the unemployment rate (u.rate), which is quarterly; that is why my dataset has been split by quarters.
Here is a link to the subsample used in the code below, in case it helps to spot the mistake: https://www.dropbox.com/s/spfywobydr94bml/cr_05_males_services.rda
The problem is that it runs for a long time before the error and warning messages appear.
The main variables of the survival function are:
I have two recurrent events:
delta.unstable (unst.): takes the value 1 when the individual finds an unstable job.
delta.stable (stable): takes the value 1 when the individual finds a stable job.
And one terminal event:
delta.censor (d.censor): takes the value 1 when the individual has died, retired or emigrated.
row id contadorbis unst. stable d.censor .t0 .t
1 78 1 0 1 0 0 88
2 101 2 0 1 0 0 46
3 155 3 0 1 0 0 27
4 170 4 0 0 0 0 61
5 170 4 1 0 0 61 86
6 213 5 0 0 0 0 92
7 213 5 0 0 0 92 182
8 213 5 0 0 0 182 273
9 213 5 0 0 0 273 365
10 213 5 1 0 0 365 394
11 334 6 0 1 0 0 6
12 334 7 1 0 0 0 38
13 369 8 0 0 0 0 27
14 369 8 0 0 0 27 119
15 369 8 0 0 0 119 209
16 369 8 0 0 0 209 300
17 369 8 0 0 0 300 392
When I apply multivePenal I obtain the following message:
Error in aggregate.data.frame(as.data.frame(x), ...) :
arguments must have same length
In addition: Lost warning messages
In Surv(.t0, .t, delta.stable) : Stop time must be > start time, NA created
#### multivePenal function
fit.joint.05_malesP <- multivePenal(Surv(.t0, .t, delta.stable) ~ cluster(contadorbis) + terminal(as.factor(delta.censor)) + event2(delta.unstable),
formula.terminalEvent = ~1, formula2 = ~as.factor(h.skill),
data = cr_05_males_serv, Frailty = TRUE, recurrentAG = TRUE, cross.validation = FALSE,
n.knots = c(7, 7, 7), kappa = c(1, 1, 1), maxit = 1000, hazard = "Splines")
I have checked whether Surv(.t0,.t,delta.stable) contains NAs, and it does not.
In addition, when I apply frailtyPenal to the same data for both possible combinations, the function runs fine and I get results. I have spent a week on this without finding the cause, and I would appreciate any light you can shed on the problem.
#delta unstable+death
fit.joint.05_males<-frailtyPenal(Surv(.t0,.t,delta.unstable)~cluster(id)+u.rate+as.factor(h.skill)+as.factor(m.skill)+as.factor(non.manual)+as.factor(municipio)+as.factor(spanish.speakers)+ as.factor(no.spanish.speaker)+as.factor(Aged.16.19)+as.factor(Aged.20.24)+as.factor(Aged.25.29)+as.factor(Aged.30.34)+as.factor(Aged.35.39)+ as.factor(Aged.40.44)+as.factor(Aged.45.51)+as.factor(older61)+ as.factor(responsabilities)+
terminal(delta.censor),formula.terminalEvent=~u.rate+as.factor(h.skill)+as.factor(m.skill)+as.factor(municipio)+as.factor(spanish.speakers)+as.factor(no.spanish.speaker)+as.factor(Aged.16.19)+as.factor(Aged.20.24)+as.factor(Aged.25.29)+as.factor(Aged.30.34)+as.factor(Aged.35.39)+as.factor(Aged.40.44)+as.factor(Aged.45.51)+as.factor(older61)+ as.factor(responsabilities),data=cr_05_males_services,n.knots=12,kappa1=1000,kappa2=1000,maxit=1000, Frailty=TRUE,joint=TRUE, recurrentAG=TRUE)
###Be patient. The program is computing ...
###The program took 2259.42 seconds
#delta stable+death
fit.joint.05_males<-frailtyPenal(Surv(.t0,.t,delta.stable)~cluster(id)+u.rate+as.factor(h.skill)+as.factor(m.skill)+as.factor(non.manual)+as.factor(municipio)+as.factor(spanish.speakers)+as.factor(no.spanish.speaker)+as.factor(Aged.16.19)+as.factor(Aged.20.24)+as.factor(Aged.25.29)+as.factor(Aged.30.34)+as.factor(Aged.35.39)+as.factor(Aged.40.44)+as.factor(Aged.45.51)+as.factor(older61)+as.factor(responsabilities)+terminal(delta.censor),formula.terminalEvent=~u.rate+as.factor(h.skill)+as.factor(m.skill)+as.factor(municipio)+as.factor(spanish.speakers)+as.factor(no.spanish.speaker)+as.factor(Aged.16.19)+as.factor(Aged.20.24)+as.factor(Aged.25.29)+as.factor(Aged.30.34)+as.factor(Aged.35.39)+as.factor(Aged.40.44)+as.factor(Aged.45.51)+as.factor(older61)+as.factor(responsabilities),data=cr_05_males_services,n.knots=12,kappa1=1000,kappa2=1000,maxit=1000, Frailty=TRUE,joint=TRUE, recurrentAG=TRUE)
###The program took 3167.15 seconds
Because you provide neither information about the packages used nor the data needed to run multivePenal or frailtyPenal, I can only help you with the Surv part (because I happened to have that package loaded).
The Surv warning message you provided (In Surv(.t0, .t, delta.stable) : Stop time must be > start time, NA created) suggests that something is wrong with your variables .t0 (the time argument in Surv, referred to as 'start time' in the warning) and/or .t (the time2 argument, 'Stop time' in the warning). I checked this possibility with a simple example:
library(survival)
# read the data you feed `Surv` with
df <- read.table(text = "row id contadorbis unst. stable d.censor .t0 .t
1 78 1 0 1 0 0 88
2 101 2 0 1 0 0 46
3 155 3 0 1 0 0 27
4 170 4 0 0 0 0 61
5 170 4 1 0 0 61 86
6 213 5 0 0 0 0 92
7 213 5 0 0 0 92 182
8 213 5 0 0 0 182 273
9 213 5 0 0 0 273 365
10 213 5 1 0 0 365 394
11 334 6 0 1 0 0 6
12 334 7 1 0 0 0 38
13 369 8 0 0 0 0 27
14 369 8 0 0 0 27 119
15 369 8 0 0 0 119 209
16 369 8 0 0 0 209 300
17 369 8 0 0 0 300 392", header = TRUE)
# create survival object
mysurv <- with(df, Surv(time = .t0, time2 = .t, event = stable))
mysurv
# create a new data set where one .t is, for some reason, less than .t0
# on row five .t0 is 61, so I set .t to 60
df2 <- df
df2$.t[df2$.t == 86] <- 60
# create survival object using new data which contains at least one Stop time that is less than Start time
mysurv2 <- with(df2, Surv(time = .t0, time2 = .t, event = stable))
# Warning message:
# In Surv(time = .t0, time2 = .t, event = stable) :
# Stop time must be > start time, NA created
# i.e. the same warning message as you got
# check the survival object
mysurv2
# as you can see, the fifth interval contains NA
# I would recommend you check .t0 and .t in your data set carefully
# one way to examine rows where Stop time (.t) is not greater than start time (.t0) is:
df2[which(df2$.t0 >= df2$.t), ]
I am not familiar with multivePenal, but it seems that it does not accept a survival object containing intervals with NA, whereas frailtyPenal might.
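The row check above can be wrapped in a small helper that also catches missing times and zero-length intervals, since the warning requires stop to be strictly greater than start. This is a sketch in base R only; the function name bad_intervals and the toy data are hypothetical, and the default column names are taken from the question.

```r
# Return the rows whose (start, stop] interval Surv() would turn into NA
bad_intervals <- function(data, start = ".t0", stop = ".t") {
  s <- data[[start]]
  e <- data[[stop]]
  # flag missing times and any stop that is not strictly after its start
  flagged <- is.na(s) | is.na(e) | e <= s
  data[flagged, , drop = FALSE]
}

# hypothetical toy data: row 2 has stop < start, row 3 has stop == start
toy <- data.frame(id  = c(170, 170, 213, 213),
                  .t0 = c(0, 61, 92, 92),
                  .t  = c(61, 60, 92, 182))
bad <- bad_intervals(toy)
bad
```

Running the helper on the full data set before fitting would list every row that needs fixing, rather than relying on a single warning.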
The authors of the package have told me that the function is not finished yet, so perhaps that is the reason that it is not working well.
I encountered the same error and arrived at this solution.
frailtyPenal() will not accept data.frames of different length. The data.frame used in Surv and data.frame named in data= in frailtyPenal must be the same length. I used a Cox regression to identify the incomplete cases, reset the survival object to exclude the missing cases and, finally, run frailtyPenal:
library(survival)
library(frailtypack)
data(readmission)
#Reproduce the error
#change the first start time to NA
readmission[1,3] <- NA
#create a survival object with one missing time
surv.obj1 <- with(readmission, Surv(t.start, t.stop, event))
#observe the error
frailtyPenal(surv.obj1 ~ cluster(id) + dukes,
data=readmission,
cross.validation=FALSE,
n.knots=10,
kappa=1,
hazard="Splines")
#repair by resetting the surv object to omit the missing value(s)
#identify NAs using a Cox model
cox.na <- coxph(surv.obj1 ~ dukes, data = readmission)
#remove the NA cases from the original set to create complete cases
readmission2 <- readmission[-cox.na$na.action,]
#reset the survival object using the complete cases
surv.obj2 <- with(readmission2, Surv(t.start, t.stop, event))
#run frailtyPenal using the complete cases dataset and the complete cases Surv object
frailtyPenal(surv.obj2 ~ cluster(id) + dukes,
data = readmission2,
cross.validation = FALSE,
n.knots = 10,
kappa = 1,
hazard = "Splines")
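A shorter alternative to fitting a Cox model just to locate the missing rows is base R's complete.cases(), applied to the columns that go into Surv(). This is a different technique from the one above, shown on hypothetical toy data with the same column names as readmission; I have not run it against frailtyPenal itself.

```r
# hypothetical toy data mimicking the relevant readmission columns
toy <- data.frame(id      = c(1, 1, 2, 3),
                  t.start = c(NA, 24, 0, 0),
                  t.stop  = c(24, 457, 1037, 783),
                  event   = c(1, 0, 0, 1))

# keep only rows where every variable used in Surv() is present
surv_vars <- c("t.start", "t.stop", "event")
keep <- complete.cases(toy[, surv_vars])
toy2 <- toy[keep, ]

# the Surv object built from toy2 now has the same length as toy2
nrow(toy2)
```

Any covariates used in the model formula could be added to surv_vars so that the data frame and the survival object stay the same length.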
I am trying to carry out diagnostics on the mixed-effects logistic regression model below.
mod <- glmer(CEever ~ (1|SL)
+ birthWeightCat
+ AFno
+ FRAgeY*factor(genCat)
+ damGirBir
+ factor(YNSUPPLEM),
data=Data, family="binomial")
The data for this model is in the form:
head(Data)
CalfID CEever birthWeightCat AFno FRAgeY damGirBir YNSUPPLEM
305 CA010110001 1 <20 2 48 140.0 1
306 CA010110002 1 21-25 1 45 144.0 0
307 CA010110004 0 21-25 1 47 151.5 0
308 CA010110005 0 <20 2 71 147.0 0
309 CA010110006 0 <20 1 57 141.5 1
310 CA010110007 0 <20 1 53 141.5 1
I can plot the residuals:
res <- resid(mod)
plot(res)
.... but can't get values for leverage, Cook's distance or DFBETA.
First, are these techniques useful for this type of model, and if so, what code have people used to get these values?
Have a look at the influence.ME package on CRAN.
library(influence.ME)
alt.est <- influence(mod, group = "SL")
will produce an estex object from which you can derive DFBETAS, Cook's distance, etc.
alt.est.cooks <- cooks.distance(alt.est)
alt.est.dfB <- dfbetas(alt.est)