How to use `grm` from the `ltm` package in R?

I'm trying to run grm from the ltm package. My script is as follows:
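library(ltm)
library(msm)
library(polycor)
dim(data)
head(data)
str(data)
descript(data)
options(max.print = 1000000)
rcor.test(data, method = "pearson")
# Convert every item to a factor (levels = the scores observed in that item)
data_2 <- data
data_2[] <- lapply(data_2, factor)
out <- grm(data_2)                        # unconstrained graded response model
out2 <- grm(data_2, constrained = TRUE)   # equal discrimination parameters for all items
anova(out2, out)                          # likelihood ratio test: constrained vs unconstrained
margins(out)                              # this call raises the error below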
However, when I run margins(out) I get this error:
Error in exp[ind] <- n * colSums(GHw * pp) : subscript out of bounds
Would someone please explain this? And how can I resolve this?
I have 35 items in my questionnaire and 576 respondents. Here is an example of the data (first 6 respondents and first 6 items):
pespd_qa1 pespd_qa2 pespd_qa3 pespd_qa4 pespd_qa5 pespd_qa6
1 9 5 7 4 1 3
2 5 0 9 6 0 8
3 5 3 5 6 3 5
4 7 5 4 3 1 1
5 2 3 0 0 0 0
6 10 1 8 2 2 5
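One thing worth checking (an assumption on my part, not a confirmed cause of the margins() error): lapply(data_2, factor) creates levels only for the scores actually observed in each column, so items scored 0 to 10 can end up with different numbers of categories. A minimal sketch using the six rows shown above:

```r
# The first 6 respondents and first 6 items shown in the question
data <- data.frame(
  pespd_qa1 = c(9, 5, 5, 7, 2, 10),
  pespd_qa2 = c(5, 0, 3, 5, 3, 1),
  pespd_qa3 = c(7, 9, 5, 4, 0, 8),
  pespd_qa4 = c(4, 6, 6, 3, 0, 2),
  pespd_qa5 = c(1, 0, 3, 1, 0, 2),
  pespd_qa6 = c(3, 8, 5, 1, 0, 5)
)

# Same conversion as in the script: factor() keeps only observed values as levels
data_2 <- data
data_2[] <- lapply(data_2, factor)

# Number of response categories per item after conversion
n_levels <- sapply(data_2, nlevels)
print(n_levels)  # 5, 4, 6, 5, 4 and 5 levels for pespd_qa1 to pespd_qa6

# Scores from 0 to 10 that never occur in each item (these get no level)
missing_scores <- lapply(data, function(x) setdiff(0:10, x))
```

Running the same check on all 35 items and 576 respondents shows whether the items really share a common set of categories.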


Global fit using nlsLM

I am trying to fit a difference of Gamma functions to some fMRI data. Here is the function I am trying to fit:
# Difference of Gamma distributions to model HRF
DiffGammas <- function(x, w, ww, a, aa, b, bb) {
  y1 <- w * ((b^a * x^(a - 1) * exp(-x * b)) / gamma(a))               # first gamma term
  y2 <- (1 - ww) * ((bb^aa * x^(aa - 1) * exp(-x * bb)) / gamma(aa))   # term subtracted from it
  y <- y1 - y2
  return(y)
}
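Each term of DiffGammas is a scaled gamma density with shape a and rate b, so the same function can be written with base R's dgamma(). A sketch (DiffGammas2 is a hypothetical name) that checks the two forms agree on the Run 1 start values; dgamma() evaluates the density without forming bb^aa and x^(aa - 1) separately, which may be numerically safer for shape values as large as the 90 to 140 used here.

```r
# Original definition from the question
DiffGammas <- function(x, w, ww, a, aa, b, bb) {
  y1 <- w * ((b^a * x^(a - 1) * exp(-x * b)) / gamma(a))
  y2 <- (1 - ww) * ((bb^aa * x^(aa - 1) * exp(-x * bb)) / gamma(aa))
  y1 - y2
}

# Equivalent form: b^a x^(a-1) exp(-b x) / gamma(a) is the gamma density
# with shape = a and rate = b
DiffGammas2 <- function(x, w, ww, a, aa, b, bb) {
  w * dgamma(x, shape = a, rate = b) - (1 - ww) * dgamma(x, shape = aa, rate = bb)
}

# Run 1 start values from the question, evaluated at t = 0..23
time_pts <- 0:23
y_orig <- DiffGammas(time_pts, w = 1.769255, ww = 0.3870352, a = 10.67308,
                     aa = 92.03272, b = 2.163427, bb = 6.408473)
y_dgam <- DiffGammas2(time_pts, w = 1.769255, ww = 0.3870352, a = 10.67308,
                      aa = 92.03272, b = 2.163427, bb = 6.408473)
all.equal(y_orig, y_dgam)
```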
Here is the data:
Run t y
1 0 0.032003192
1 1 0.035247903
1 2 0.075404794
1 3 0.246668665
1 4 0.43784297
1 5 0.48204744
1 6 0.306346753
1 7 0.143187816
1 8 0.057954844
1 9 0.013958918
1 10 0.022630042
1 11 -0.00735287
1 12 -0.055431955
1 13 -0.11563044
1 14 -0.155657944
1 15 -0.146548568
1 16 -0.086195647
1 17 -0.048550909
1 18 0.016424371
1 19 0.049021839
1 20 0.012366969
1 21 -0.03851945
1 22 -0.071969113
1 23 -0.044332852
2 0 0.08518882
2 1 0.110297941
2 2 0.185532434
2 3 0.352716178
2 4 0.53645537
2 5 0.599135887
2 6 0.443617796
2 7 0.275094048
2 8 0.179031458
2 9 0.118620937
2 10 0.111958314
2 11 0.072388446
2 12 -0.004448448
2 13 -0.058529647
2 14 -0.086651798
2 15 -0.085788373
2 16 -0.032654685
2 17 0.020878978
2 18 0.104788051
2 19 0.169295268
2 20 0.101337921
2 21 0.021178963
2 22 -0.025350047
2 23 -0.053233691
3 0 0.058608233
3 1 0.096408759
3 2 0.194452044
3 3 0.374613189
3 4 0.570983267
3 5 0.572352346
3 6 0.417996955
3 7 0.257623921
3 8 0.16186917
3 9 0.116943452
3 10 0.119766292
3 11 0.064198058
3 12 -0.013711493
3 13 -0.095039932
3 14 -0.105732843
3 15 -0.085641436
3 16 -0.041355324
3 17 0.001644888
3 18 0.037273866
3 19 0.03784796
3 20 0.004481299
3 21 -0.0216824
3 22 -0.020064194
3 23 -0.039836136
4 0 0.068518121
4 1 0.08325848
4 2 0.13751084
4 3 0.276952687
4 4 0.473697571
4 5 0.49691874
4 6 0.37607162
4 7 0.243455766
4 8 0.161476939
4 9 0.132455191
4 10 0.154391828
4 11 0.138457915
4 12 0.120507831
4 13 0.049945217
4 14 0.002031973
4 15 -0.009507957
4 16 0.052133462
4 17 0.107326776
4 18 0.153646926
4 19 0.15333057
4 20 0.107420992
4 21 0.038419348
4 22 0.009900797
4 23 -0.026444602
Here 'Run' is the type of stimulus, 't' is the time and 'y' is the BOLD signal. I want to compare a model in which Runs 1-4 each have a separate set of parameters (model14) with a global model in which Runs 1-4 share the same parameters (model0).
model0 converges and works fine:
library(minpack.lm)  # provides nlsLM()
## Global fit (one curve for all data sets)
fo <- y ~ DiffGammas(t, w, ww, a, aa, b, bb)
model0 <- NULL
model0 <- nlsLM(fo,
                data = mydata,
                subset = Run %in% 1:4,
                start = as.data.frame(rbind(coef(m1))),
                trace = TRUE)
summary(model0)
'start' in this case is:
w ww a aa b bb
1 1.769255 0.3870352 10.67308 92.03272 2.163427 6.408473
The parameters were estimated with an individual fit (m1) to Run 1 using the same DiffGammas function.
However, when I try to fit a model with a different set of parameters for each Run:
model14 <- NULL
model14 <- nlsLM(y ~ DiffGammas(t, w[Run], ww[Run], a[Run], aa[Run], b[Run], bb[Run]),
                 data = mydata,
                 subset = Run %in% 1:4,
                 start = as.data.frame(rbind(coef(m1), coef(m2), coef(m3), coef(m4))),
                 trace = TRUE)
summary(model14)
start in this case is:
w ww a aa b bb
1 1.769255 0.3870352 10.673081 92.03272 2.1634274 6.408473
2 2.857442 1.4833173 6.072707 139.16018 1.1338433 7.297339
3 2.868868 0.6270769 5.665530 132.47579 1.0744604 9.449620
4 2.721601 1.6320522 4.703770 138.55078 0.8022566 7.463612
with the parameters estimated from separate fits to Runs 1-4 using the same DiffGammas function.
Running this last bit of code I get the following errors and I am not sure how to deal with them:
Error in dimnames(x) <- dn :
length of 'dimnames' [2] not equal to array extent
In addition: Warning message:
In matrix(out$hessian, nrow = length(unlist(par))) :
data length [36] is not a sub-multiple or multiple of the number of rows [24]
Any help is appreciated.
Best,
Andrea
Keeping the rest of the data as they were, you can fit each run separately, each with its own row of start values:
start2 <- read.table(text=
" w ww a aa b bb
1 1.769255 0.3870352 10.673081 92.03272 2.1634274 6.408473
2 2.857442 1.4833173 6.072707 139.16018 1.1338433 7.297339
3 2.868868 0.6270769 5.665530 132.47579 1.0744604 9.449620
4 2.721601 1.6320522 4.703770 138.55078 0.8022566 7.463612
", header=TRUE )
models14 <- lapply( 1:nrow(start2), function(i) {
try( nlsLM( fo, data=mydata, start=start2[i,], subset = Run == i, trace=TRUE ) )
})
You will probably see, like me, that start parameter sets 2 and 4 fail to produce a model.
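Once both fits converge, comparing the global model with the per-run model is a nested-model comparison. For least-squares fits this is usually the extra-sum-of-squares F test, which is what anova() reports when given two nls fits (nlsLM returns objects of class nls). With n = 96 observations (4 runs × 24 time points), p_0 = 6 parameters in model0 and p_14 = 24 in model14:

```latex
\[
F = \frac{(\mathrm{RSS}_0 - \mathrm{RSS}_{14}) / (p_{14} - p_0)}{\mathrm{RSS}_{14} / (n - p_{14})}
  = \frac{(\mathrm{RSS}_0 - \mathrm{RSS}_{14}) / 18}{\mathrm{RSS}_{14} / 72}
\]
```

Here RSS_0 and RSS_14 are the residual sums of squares of model0 and model14, and F is referred to an F distribution with 18 and 72 degrees of freedom. For nonlinear models this test is an approximation.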

R: How to make sequence (1,1,1,2,3,3,3,4,5,5,5,6,7,7,7,8)

Title says it all: how would I code a repeating sequence whose base unit is the vector c(1,1,1,2), repeated 4 times, with the values incremented by 2 each time?
I've tried various combinations of rep (with times and each) and seq, but can't get the result I want.
c(1,1,1,2) + rep(seq(0, 6, 2), each = 4)
# [1] 1 1 1 2 3 3 3 4 5 5 5 6 7 7 7 8
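The same idea can be wrapped in a small helper for other base vectors, repeat counts and steps. A sketch; stepped_rep is a hypothetical name, not a base R function:

```r
# Repeat `base` n times, adding `step` to every value on each successive repeat
stepped_rep <- function(base, n, step) {
  offsets <- rep(seq(0, by = step, length.out = n), each = length(base))
  base + offsets  # base is recycled across the n blocks
}

stepped_rep(c(1, 1, 1, 2), n = 4, step = 2)
# [1] 1 1 1 2 3 3 3 4 5 5 5 6 7 7 7 8
```

For example, stepped_rep(c(0, 1), n = 3, step = 10) gives 0 1 10 11 20 21.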
The times argument of rep accepts a vector of the same length as x, giving a separate count for each element. We can build the desired pattern of counts (c(3, 1) recycled to length 8) with the super secret rep_len.
rep(1:8, rep_len(c(3, 1), 8))
#[1] 1 1 1 2 3 3 3 4 5 5 5 6 7 7 7 8
I'm not sure I understand correctly, but what's wrong with something as simple as this:
base <- c(1,1,1,2)   # avoid reusing the name of rep()
step <- 2
vec <- c(base, step + base, 2*step + base, 3*step + base)
I accepted luke's answer as it is the easiest for me to understand (and closest to what I was already trying, but failing with!)
I have used this final form:
> c(1,1,1,2)+rep(c(0,2,4,6),each=4)
[1] 1 1 1 2 3 3 3 4 5 5 5 6 7 7 7 8
You could do:
pattern <- rep(c(3, 1), length.out = 50)
unlist(lapply(1:8, function(x) rep(x, pattern[x])))
[1] 1 1 1 2 3 3 3 4 5 5 5 6 7 7 7 8
This lets you adjust the length of the pattern via rep(length.out = X) and avoids the addition used in some of the other answers.
How about:
input <- c(1,1,1,2)
n <- 4
increment <- 2
sort(rep.int(seq.int(from = 0, by = increment, length.out = n), length(input))) + input
[1] 1 1 1 2 3 3 3 4 5 5 5 6 7 7 7 8

fit linear regression model for a variable that depends on past values in R

I am working on a model that is similar to time series prediction.
I have to fit a linear regression model to a target variable (TV) that depends on two other variables (X and Y) and also on its own past values.
Basically the model looks like this:
TV(t) ~ X(t) + Y(t) + TV(t-1) + TV(t-2) + TV(t-3)
I got stuck trying to write this in R:
model <- lm(modeldata$TV ~ modeldata$X +modeldata$Y+ ??)
How do I write the R code to fit this kind of model?
One possible solution is to use Hadley Wickham's dplyr package and its lag() function.
Here is a complete example. We first create a simple modeldata data frame.
modeldata <- data.frame(X=1:10, Y=1:10, TV=1:10)
modeldata
X Y TV
1 1 1 1
2 2 2 2
3 3 3 3
4 4 4 4
5 5 5 5
6 6 6 6
7 7 7 7
8 8 8 8
9 9 9 9
10 10 10 10
Then we load the dplyr package and use its mutate() function to create new columns in the data frame with lag().
library(dplyr)
modeldata <- mutate(modeldata, TVm1 = lag(TV,1), TVm2 = lag(TV,2), TVm3 = lag(TV, 3))
modeldata
X Y TV TVm1 TVm2 TVm3
1 1 1 1 NA NA NA
2 2 2 2 1 NA NA
3 3 3 3 2 1 NA
4 4 4 4 3 2 1
5 5 5 5 4 3 2
6 6 6 6 5 4 3
7 7 7 7 6 5 4
8 8 8 8 7 6 5
9 9 9 9 8 7 6
10 10 10 10 9 8 7
Lastly, we pass all variables in our data frame to lm() using the ~ . notation. The first three rows contain NA lags, so lm() drops them under the default na.action.
model <- lm(TV ~ ., data = modeldata)
To obtain predictions from this model, we have to prepare the test set in the same way.
testdata <- data.frame(X = 11:15, Y = 11:15, TV = 11:15)
testdata <- mutate(testdata, TVm1 = lag(TV,1), TVm2 = lag(TV,2), TVm3 = lag(TV, 3))
predict(model, newdata = testdata)
In this case we can obtain predictions only for the observations with TV = 14 and 15 in testdata (rows 4 and 5). For the earlier rows, not all lag values can be calculated.
Of course, this assumes we have some kind of time series data. Otherwise, it is not possible to fit and use such a model.
You need to build the proper dataset before passing it to lm. Several lag functions exist: one in the dplyr package and a different one (stats::lag) for use with time series objects. A quick approach to creating lagged versions of a variable is:
laggedVar <- embed(Var, 4)
E.g.
> embed(1:10, 4)
[,1] [,2] [,3] [,4]
[1,] 4 3 2 1
[2,] 5 4 3 2
[3,] 6 5 4 3
[4,] 7 6 5 4
[5,] 8 7 6 5
[6,] 9 8 7 6
[7,] 10 9 8 7
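Putting embed() together with lm() in base R: a minimal sketch on hypothetical random data (fitdata and TVm1 to TVm3 are illustrative names). Column 1 of embed(TV, 4) is TV(t) and columns 2 to 4 are TV(t-1) to TV(t-3), so X and Y lose their first three values to stay aligned in time.

```r
# Hypothetical toy data (random, so lm() has something to estimate)
set.seed(42)
n <- 30
modeldata <- data.frame(X = rnorm(n), Y = rnorm(n), TV = cumsum(rnorm(n)))

# Row i of embed(TV, 4) holds TV at times i+3, i+2, i+1 and i
lagged <- embed(modeldata$TV, 4)

fitdata <- data.frame(
  TV   = lagged[, 1],            # TV(t)
  TVm1 = lagged[, 2],            # TV(t-1)
  TVm2 = lagged[, 3],            # TV(t-2)
  TVm3 = lagged[, 4],            # TV(t-3)
  X    = modeldata$X[-(1:3)],    # X(t), first 3 values dropped to align with TV(t)
  Y    = modeldata$Y[-(1:3)]     # Y(t)
)

# TV(t) ~ X(t) + Y(t) + TV(t-1) + TV(t-2) + TV(t-3)
model <- lm(TV ~ X + Y + TVm1 + TVm2 + TVm3, data = fitdata)
coef(model)
```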
You might also look at the regression methods designed for panel data, where some degree of autocorrelation is expected.

SAX function of TSclust package generates an error

I am using the TSclust package for SAX (symbolic aggregate approximation) plots. Following the example shown on page 25, I am using the function
SAX.plot(as.ts(df$power), w=30, alpha=4)
But it generates this error:
Error in if ((n <- as.integer(n[1L])) > 0) { : argument is of length zero
I am not able to debug it. I even looked into the source code of the SAX.plot function, but I could not find that error message in it.
The required R data object can be found at link
R version: 3.2
TSclust version: 1.2.3
Apparently it's because you need to normalize your data. Check out this example:
# Parameters
w <- 30
alpha <- 4
# PAA
x <- df$power
paax <- PAA(x, w)
plot(x, type="l", main="PAA reduction of series x")
p <- rep(paax,each=length(x)/length(paax)) #just for plotting the PAA
lines(p, col="red")
# SAX
convert.to.SAX.symbol(paax , alpha)
# [1] 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
# You need to scale PAA result
convert.to.SAX.symbol(scale(paax) , alpha)
# [1] 1 1 1 1 1 1 1 1 1 2 2 1 4 3 3 1 2 2 2 4 4 4 1 1 2 4 3 3 4 4
# SAX plot : with scaling this works
SAX.plot(as.ts(scale(df$power)), w=w, alpha=alpha)
That's likely the example you can find on the function's help page.
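Why scaling matters: in the standard SAX formulation the alphabet breakpoints are quantiles of the standard normal distribution, so with alpha = 4 any value above about 0.67 falls into the top symbol, consistent with the all-4 output above for the unscaled PAA values. A base-R sketch of that discretization step (sax_symbols is a hypothetical helper, not TSclust's implementation):

```r
# Map values to SAX symbols 1..alpha using N(0, 1) quantile breakpoints
sax_symbols <- function(x, alpha) {
  breaks <- qnorm(seq_len(alpha - 1) / alpha)  # alpha = 4: -0.674, 0, 0.674
  findInterval(x, breaks) + 1
}

sax_symbols(c(-2, -0.5, 0.5, 2), alpha = 4)   # one value in each region
# [1] 1 2 3 4
sax_symbols(c(150, 300, 450), alpha = 4)      # unscaled values: all top symbol
# [1] 4 4 4
sax_symbols(as.numeric(scale(c(150, 300, 450))), alpha = 4)  # scaled to -1, 0, 1
# [1] 1 3 4
```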

Optimum way to perform a bland altman analysis using R

Is there a way to produce a Bland-Altman plot using ggplot2?
I have looked at using MethComp but can't seem to get my data into a Meth object:
library(MethComp)
comp <- read.csv("HIVVL.csv")
com <- data.frame(comp)
co <- Meth(com)
with(co, BA.plot(Qiagen, Abbot))
I keep running into this error:
comp <- read.csv("HIVVL.csv")
com <- data.frame(comp)
co <- Meth(com)
Error in `[.data.frame`(data, , meth) : undefined columns selected
A print of com looks something like this:
Abbot Qiagen
1 66000 66057
2 40273 73376
3 13818 14684
4 53328 195509
5 8369 25000
6 89833 290000
7 116 219
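Independently of MethComp, the quantities on a Bland-Altman plot are simple to compute: the per-sample mean and difference of the two methods, the mean difference (bias) and the limits of agreement. A ggplot2 plot is then points of diff against avg plus horizontal lines at those three values (for example with geom_point() and geom_hline()). A base-R sketch using the seven rows printed above, with the plotting step left out so it runs without ggplot2:

```r
# Values from the print of com above
com <- data.frame(
  Abbot  = c(66000, 40273, 13818, 53328, 8369, 89833, 116),
  Qiagen = c(66057, 73376, 14684, 195509, 25000, 290000, 219)
)

# Bland-Altman quantities for each sample
ba <- data.frame(
  avg  = (com$Abbot + com$Qiagen) / 2,  # x axis: mean of the two methods
  diff = com$Abbot - com$Qiagen         # y axis: difference between methods
)

bias <- mean(ba$diff)                         # mean difference
loa  <- bias + c(-1.96, 1.96) * sd(ba$diff)   # 95% limits of agreement
bias
loa
```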
Have you read ?Meth? It is looking for columns named meth and item in your data, which don't exist (see my example below).
Also, the step com <- data.frame(comp) does nothing different from com <- comp: read.csv already returns a data.frame.
d <- data.frame(x=1:10, y=1:10)
Meth(d)
# Error in `[.data.frame`(data, , meth) : undefined columns selected
Meth(d, meth='x')
# Error in `[.data.frame`(data, , item) : undefined columns selected
Meth(d, meth='x', item='y')
# The following variables from the dataframe
# "d" are used as the Meth variables:
# meth: x
# item: y
# y: y
# #Replicates
# Method 1 #Items #Obs: 10 Values: min med max
# 1 1 1 1 1 1 1
# 2 1 1 1 2 2 2
# 3 1 1 1 3 3 3
# 4 1 1 1 4 4 4
# 5 1 1 1 5 5 5
# 6 1 1 1 6 6 6
# 7 1 1 1 7 7 7
# 8 1 1 1 8 8 8
# 9 1 1 1 9 9 9
# 10 1 1 1 10 10 10
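For the two-column data in the question, a sketch of reshaping into the long layout that Meth() looks for: one row per measurement, with columns named meth, item and y (the default names, judging from the output above). The Meth() call is left commented because it needs MethComp installed and has not been run on this data.

```r
com <- data.frame(
  Abbot  = c(66000, 40273, 13818, 53328, 8369, 89833, 116),
  Qiagen = c(66057, 73376, 14684, 195509, 25000, 290000, 219)
)

# One row per (method, sample) measurement
long <- data.frame(
  meth = rep(c("Abbot", "Qiagen"), each = nrow(com)),  # which assay
  item = rep(seq_len(nrow(com)), times = 2),           # which sample
  y    = c(com$Abbot, com$Qiagen)                      # measured value
)
head(long)

# Untested outline:
# library(MethComp)
# co <- Meth(long)   # columns meth, item and y picked up by name
```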
