Global fit using nlsLM in R

I am trying to fit a difference of Gamma functions to some fMRI data. Here is the function I am trying to fit:
# Difference of Gamma distributions to model HRF
DiffGammas <- function(x, w, ww, a, aa, b, bb) {
  y1 <- w * ((b^a * x^(a - 1) * exp(-x * b)) / gamma(a))
  y2 <- (1 - ww) * ((bb^aa * x^(aa - 1) * exp(-x * bb)) / gamma(aa))
  y1 - y2
}
Here is the data:
Run t y
1 0 0.032003192
1 1 0.035247903
1 2 0.075404794
1 3 0.246668665
1 4 0.43784297
1 5 0.48204744
1 6 0.306346753
1 7 0.143187816
1 8 0.057954844
1 9 0.013958918
1 10 0.022630042
1 11 -0.00735287
1 12 -0.055431955
1 13 -0.11563044
1 14 -0.155657944
1 15 -0.146548568
1 16 -0.086195647
1 17 -0.048550909
1 18 0.016424371
1 19 0.049021839
1 20 0.012366969
1 21 -0.03851945
1 22 -0.071969113
1 23 -0.044332852
2 0 0.08518882
2 1 0.110297941
2 2 0.185532434
2 3 0.352716178
2 4 0.53645537
2 5 0.599135887
2 6 0.443617796
2 7 0.275094048
2 8 0.179031458
2 9 0.118620937
2 10 0.111958314
2 11 0.072388446
2 12 -0.004448448
2 13 -0.058529647
2 14 -0.086651798
2 15 -0.085788373
2 16 -0.032654685
2 17 0.020878978
2 18 0.104788051
2 19 0.169295268
2 20 0.101337921
2 21 0.021178963
2 22 -0.025350047
2 23 -0.053233691
3 0 0.058608233
3 1 0.096408759
3 2 0.194452044
3 3 0.374613189
3 4 0.570983267
3 5 0.572352346
3 6 0.417996955
3 7 0.257623921
3 8 0.16186917
3 9 0.116943452
3 10 0.119766292
3 11 0.064198058
3 12 -0.013711493
3 13 -0.095039932
3 14 -0.105732843
3 15 -0.085641436
3 16 -0.041355324
3 17 0.001644888
3 18 0.037273866
3 19 0.03784796
3 20 0.004481299
3 21 -0.0216824
3 22 -0.020064194
3 23 -0.039836136
4 0 0.068518121
4 1 0.08325848
4 2 0.13751084
4 3 0.276952687
4 4 0.473697571
4 5 0.49691874
4 6 0.37607162
4 7 0.243455766
4 8 0.161476939
4 9 0.132455191
4 10 0.154391828
4 11 0.138457915
4 12 0.120507831
4 13 0.049945217
4 14 0.002031973
4 15 -0.009507957
4 16 0.052133462
4 17 0.107326776
4 18 0.153646926
4 19 0.15333057
4 20 0.107420992
4 21 0.038419348
4 22 0.009900797
4 23 -0.026444602
Where 'Run' is the type of stimulus, 't' is the time and 'y' is the BOLD signal. I want to compare a model in which Run 1-4 each has a separate set of parameters (model14) with a global model in which Runs 1-4 have the same parameters (model0).
model0 converges and works fine:
## Global fit (one curve for all data sets)
fo <- y ~ DiffGammas(t, w, ww, a, aa, b, bb)
model0 <- nlsLM(fo,
                data = mydata,
                subset = Run %in% 1:4,
                start = as.data.frame(rbind(coef(m1))),
                trace = TRUE)
summary(model0)
'start' in this case is:
w ww a aa b bb
1 1.769255 0.3870352 10.67308 92.03272 2.163427 6.408473
These starting parameters were estimated by an individual fit (m1) to Run 1 using the same 'DiffGammas' function.
However, when I try to fit a model with a different set of parameters for each Run:
model14 <- nlsLM(y ~ DiffGammas(t, w[Run], ww[Run], a[Run], aa[Run], b[Run], bb[Run]),
                 data = mydata,
                 subset = Run %in% 1:4,
                 start = as.data.frame(rbind(coef(m1), coef(m2), coef(m3), coef(m4))),
                 trace = TRUE)
summary(model14)
'start' in this case is:
w ww a aa b bb
1 1.769255 0.3870352 10.673081 92.03272 2.1634274 6.408473
2 2.857442 1.4833173 6.072707 139.16018 1.1338433 7.297339
3 2.868868 0.6270769 5.665530 132.47579 1.0744604 9.449620
4 2.721601 1.6320522 4.703770 138.55078 0.8022566 7.463612
where the parameters were estimated by separate fits to Runs 1-4 with the same 'DiffGammas' function.
Running this last bit of code I get the following errors and I am not sure how to deal with them:
Error in dimnames(x) <- dn :
length of 'dimnames' [2] not equal to array extent
In addition: Warning message:
In matrix(out$hessian, nrow = length(unlist(par))) :
data length [36] is not a sub-multiple or multiple of the number of rows [24]
Any help is appreciated.
Best,
Andrea

With the rest of the data as they were,
start2 <- read.table(text = "
         w        ww         a        aa         b       bb
1 1.769255 0.3870352 10.673081  92.03272 2.1634274 6.408473
2 2.857442 1.4833173  6.072707 139.16018 1.1338433 7.297339
3 2.868868 0.6270769  5.665530 132.47579 1.0744604 9.449620
4 2.721601 1.6320522  4.703770 138.55078 0.8022566 7.463612
", header = TRUE)
models14 <- lapply(1:nrow(start2), function(i) {
  try(nlsLM(fo, data = mydata, start = start2[i, ], subset = Run == i, trace = TRUE))
})
You will probably see, as I did, that starting parameter sets 2 and 4 fail to produce a model.
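One other thing worth checking with the indexed-parameter form is the shape of `start`: nls-style indexed fitting expects `start` to be a list whose elements are vectors with one entry per group, not a four-row data frame, and a mismatched shape would be consistent with the 36-vs-24 dimension errors above. A minimal sketch of the indexed syntax with base `nls` on synthetic data (the grouping variable must be a factor; whether `nlsLM` accepts exactly the same form is worth verifying separately):

```r
set.seed(1)
# Two groups, each following its own exponential decay
d <- data.frame(g = factor(rep(1:2, each = 20)), x = rep(1:20, 2))
d$y <- ifelse(d$g == 1, 2 * exp(-0.3 * d$x), 5 * exp(-0.1 * d$x)) +
  rnorm(40, sd = 0.01)

# One A and one k per level of g; start is a list of length-2 vectors
fit <- nls(y ~ A[g] * exp(-k[g] * x), data = d,
           start = list(A = c(1, 4), k = c(0.25, 0.12)))
coef(fit)  # A1, A2, k1, k2
```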

Related

Repeat a set of ID's for every "n rows"

I have this data set in R:
first_variable = rexp(100,100)
second_variable = rexp(100,100)
n_obs = 1:100
question_data = data.frame(n_obs, first_variable, second_variable)
I want to make this dataset so that:
Rows 1-10 have ids 1,2,3,4,5,6,7,8,9,10
Rows 11-20 have ids 1,2,3,4,5,6,7,8,9,10
Rows 21-30 have ids 1,2,3,4,5,6,7,8,9,10
etc.
In other words, the id's 1-10 repeat for each sets of 10 rows.
I found this code that I thought would work:
# here, n = 10 (a set of n = 10 rows)
bloc_len <- 10
question_data$id <-
rep(seq(1, 1 + nrow(question_data) %/% bloc_len), each = bloc_len, length.out = nrow(question_data))
But this is not working: it gives every row within each set of 10 rows the same id:
n_obs first_variable second_variable id
1 1 0.006223412 0.0258968583 1
2 2 0.004473815 0.0065543554 1
3 3 0.011745754 0.0005061101 1
4 4 0.005620351 0.0033549525 1
5 5 0.045860202 0.0132625822 1
6 6 0.002477348 0.0068517981 1
I would have wanted something like this:
n_obs first_variable second_variable id
1 1 0.0062234115 0.0258968583 1
2 2 0.0044738150 0.0065543554 2
3 3 0.0117457544 0.0005061101 3
4 4 0.0056203508 0.0033549525 4
5 5 0.0458602019 0.0132625822 5
6 6 0.0024773478 0.0068517981 6
7 7 0.0049527013 0.0047461094 7
8 8 0.0058581805 0.0108604478 8
9 9 0.0041171801 0.0002445268 9
10 10 0.0090667287 0.0019289691 10
11 11 0.0039002449 0.0135441919 1
12 12 0.0064558661 0.0230979415 2
13 13 0.0104993267 0.0005609776 3
14 14 0.0153162705 0.0038364012 4
15 15 0.0107109676 0.0183818539 5
16 16 0.0131620151 0.0029710189 6
17 17 0.0244441763 0.0095645480 7
18 18 0.0058112355 0.0125754349 8
19 19 0.0005022588 0.0156614272 9
20 20 0.0007572985 0.0049964333 10
21 21 0.0276024376 0.0024303513 1
Is this possible?
Thank you!
Instead of each, try using times:
question_data$id <-
rep(seq(bloc_len), times = nrow(question_data) %/% bloc_len, length.out = nrow(question_data))
As in the example shared, if the number of rows in the data (100) is exactly divisible by the number of ids (10), we can use R's recycling rules to repeat the ids:
bloc_len <- 10
question_data$id <- seq_len(bloc_len)
If the row count is not exactly divisible, we can use rep with length.out:
question_data$id <- rep(seq_len(bloc_len), length.out = nrow(question_data))
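A quick self-contained check of the times-based pattern (the frame itself is not needed, since only the length matters):

```r
bloc_len <- 10
n <- 100  # nrow(question_data)
id <- rep(seq_len(bloc_len), times = n %/% bloc_len, length.out = n)
print(head(id, 12))  # 1 2 3 4 5 6 7 8 9 10 1 2
```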

How to use `grm` in `ltm` package?

I'm trying to run grm in the ltm package. My script is as follows:
library(ltm)
library(msm)
library(polycor)
dim(data)
head(data)
str(data)
descript(data)
options(max.print = 1000000)
rcor.test(data, method = "pearson")
data_2 <- data
data_2[] <- lapply(data_2, factor)
out <- grm(data_2)
out2 <- grm(data_2, constrained = TRUE)
anova(out2,out)
margins(out)
However, when I run margins(out) I get this error: Error in exp[ind] <- n * colSums(GHw * pp) : subscript out of bounds
Would someone please explain this? And how can I resolve this?
I have 35 items in my questionnaire and 576 responders. Here is an example of the data (first 6 responders and first 6 items).
pespd_qa1 pespd_qa2 pespd_qa3 pespd_qa4 pespd_qa5 pespd_qa6
1 9 5 7 4 1 3
2 5 0 9 6 0 8
3 5 3 5 6 3 5
4 7 5 4 3 1 1
5 2 3 0 0 0 0
6 10 1 8 2 2 5

Attempting to remove a row in R using variable names

I am trying to remove some rows in a for loop in R. The conditional involves comparing it to the line below it, so I can't filter within the brackets.
I know that I can remove a row when a constant is specified: dataframe[-2, ]. I just want to do the same with a variable: dataframe[-x, ]. Here's the full loop:
for (j in 1:(nrow(referrals) - 1)) {
  k <- j + 1
  if (referrals[j, "Client ID"] == referrals[k, "Client ID"] &
      referrals[j, "Provider SubCode"] == referrals[k, "Provider SubCode"]) {
    referrals[-k, ]
  }
}
The code runs without complaint, but no rows are removed (and I know some should be). Of course, if I test it with a constant, it works fine: referrals[-2, ].
You need to add a reproducible example for people to work with. I don't know the structure of your data, so I can only guess if this will work for you. I would not use a loop, for the reasons pointed out in the comments. I would identify the rows to remove first, and then remove them using normal means. Consider:
set.seed(4499) # this makes the example exactly reproducible
d <- data.frame(Client.ID = sample.int(4, 20, replace=T),
Provider.SubCode = sample.int(4, 20, replace=T))
d
# Client.ID Provider.SubCode
# 1 1 1
# 2 1 4
# 3 3 2
# 4 4 4
# 5 4 1
# 6 2 2
# 7 2 2 # redundant
# 8 3 1
# 9 4 4
# 10 3 4
# 11 1 3
# 12 1 3 # redundant
# 13 3 4
# 14 1 2
# 15 3 2
# 16 4 4
# 17 3 4
# 18 2 2
# 19 4 1
# 20 3 3
redundant.rows <- with(d, Client.ID[1:(nrow(d)-1)] == Client.ID[2:nrow(d)] &
                          Provider.SubCode[1:(nrow(d)-1)] == Provider.SubCode[2:nrow(d)])
d[-c(which(redundant.rows) + 1), ]
# Client.ID Provider.SubCode
# 1 1 1
# 2 1 4
# 3 3 2
# 4 4 4
# 5 4 1
# 6 2 2
# 8 3 1 # 7 is missing
# 9 4 4
# 10 3 4
# 11 1 3
# 13 3 4 # 12 is missing
# 14 1 2
# 15 3 2
# 16 4 4
# 17 3 4
# 18 2 2
# 19 4 1
# 20 3 3
Using all information given by you, I believe this could be a good alternative:
duplicated.rows <- duplicated(referrals)
Then, if you want the duplicated results run:
referrals.double <- referrals[duplicated.rows, ]
However, if you want the non-duplicated results run:
referrals.not.double <- referrals[!duplicated.rows, ]
If you prefer to go step by step (maybe it's interesting for you):
duplicated.rows.Client.ID <- duplicated(referrals$"Client ID")
duplicated.rows.Provider.SubCode <- duplicated(referrals$"Provider SubCode")
referrals.not.double <- referrals[!duplicated.rows.Client.ID, ]
referrals.not.double <- referrals.not.double[!duplicated.rows.Provider.SubCode, ]
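Note that duplicated() flags a value that repeats anywhere earlier in the vector, not only on the adjacent row as in the question, and filtering the two columns one after the other is stricter than checking the pair together. To flag rows where the Client ID / Provider SubCode pair repeats, duplicated() can be applied to both columns at once; a small sketch with hypothetical data:

```r
d <- data.frame(Client.ID = c(1, 1, 2, 2, 1),
                Provider.SubCode = c("a", "b", "b", "b", "a"))
# duplicated() on a two-column data frame marks repeated (ID, SubCode) pairs
keep <- !duplicated(d[c("Client.ID", "Provider.SubCode")])
d[keep, ]  # rows 1-3 survive; the repeated (2,"b") and (1,"a") pairs are dropped
```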

Optimum way to perform a bland altman analysis using R

Is there a way to produce a Bland-Altman plot using ggplot2?
I have looked at using MethComp but can't seem to get my data into a Meth object.
library(MethComp)
comp <- read.csv("HIVVL.csv")
com <- data.frame(comp)
co <- Meth(com)
with(co, BA.plot(Qiagen, Abbot))
I keep running into this error:
comp <- read.csv("HIVVL.csv")
com <- data.frame(comp)
co <- Meth(com)
Error in `[.data.frame`(data, , meth) : undefined columns selected
A print of com looks something like this:
Abbot Qiagen
1 66000 66057
2 40273 73376
3 13818 14684
4 53328 195509
5 8369 25000
6 89833 290000
7 116 219
Have you read ?Meth? It is looking for columns named meth and item in your data, which don't exist (see my example below).
Also, the step com <- data.frame(comp) is not doing anything different from com <- comp; read.csv already returns a data.frame.
d <- data.frame(x=1:10, y=1:10)
Meth(d)
# Error in `[.data.frame`(data, , meth) : undefined columns selected
Meth(d, meth='x')
# Error in `[.data.frame`(data, , item) : undefined columns selected
Meth(d, meth='x', item='y')
# The following variables from the dataframe
# "d" are used as the Meth variables:
# meth: x
# item: y
# y: y
# #Replicates
# Method 1 #Items #Obs: 10 Values: min med max
# 1 1 1 1 1 1 1
# 2 1 1 1 2 2 2
# 3 1 1 1 3 3 3
# 4 1 1 1 4 4 4
# 5 1 1 1 5 5 5
# 6 1 1 1 6 6 6
# 7 1 1 1 7 7 7
# 8 1 1 1 8 8 8
# 9 1 1 1 9 9 9
# 10 1 1 1 10 10 10
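As for the original goal of a Bland-Altman plot: it is just the pairwise differences plotted against the pairwise means, with the bias and the ±1.96·SD limits of agreement, so it can be computed directly from the two columns without a Meth object. A sketch in base graphics using the values shown in the question (if ggplot2 is preferred, the same avg/dif data frame feeds geom_point() plus geom_hline() calls):

```r
com <- data.frame(
  Abbot  = c(66000, 40273, 13818, 53328, 8369, 89833, 116),
  Qiagen = c(66057, 73376, 14684, 195509, 25000, 290000, 219)
)
avg  <- (com$Abbot + com$Qiagen) / 2   # mean of the two methods
dif  <- com$Abbot - com$Qiagen         # difference between methods
bias <- mean(dif)
lims <- bias + c(-1.96, 1.96) * sd(dif)  # limits of agreement
plot(avg, dif, xlab = "Mean of methods", ylab = "Abbot - Qiagen")
abline(h = c(bias, lims), lty = c(1, 2, 2))
```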

Read csv with two headers into a data.frame

Apologies for the seemingly simple question, but I can't seem to find a solution to the following re-arrangement problem.
I'm used to using read.csv to read in files with a header row, but I have an excel spreadsheet with two 'header' rows - cell identifier (a, b, c ... g) and three sets of measurements (x, y and z; 1000s each) for each cell:
a b
x y z x y z
10 1 5 22 1 6
12 2 6 21 3 5
12 2 7 11 3 7
13 1 4 33 2 8
12 2 5 44 1 9
csv file below:
a,,,b,,
x,y,z,x,y,z
10,1,5,22,1,6
12,2,6,21,3,5
12,2,7,11,3,7
13,1,4,33,2,8
12,2,5,44,1,9
How can I get to a data.frame in R as shown below?
cell x y z
a 10 1 5
a 12 2 6
a 12 2 7
a 13 1 4
a 12 2 5
b 22 1 6
b 21 3 5
b 11 3 7
b 33 2 8
b 44 1 9
Use base R reshape():
temp = read.delim(text="a,,,b,,
x,y,z,x,y,z
10,1,5,22,1,6
12,2,6,21,3,5
12,2,7,11,3,7
13,1,4,33,2,8
12,2,5,44,1,9", header=TRUE, skip=1, sep=",")
names(temp)[1:3] = paste0(names(temp[1:3]), ".0")
OUT = reshape(temp, direction="long", ids=rownames(temp), varying=1:ncol(temp))
OUT
# time x y z id
# 1.0 0 10 1 5 1
# 2.0 0 12 2 6 2
# 3.0 0 12 2 7 3
# 4.0 0 13 1 4 4
# 5.0 0 12 2 5 5
# 1.1 1 22 1 6 1
# 2.1 1 21 3 5 2
# 3.1 1 11 3 7 3
# 4.1 1 33 2 8 4
# 5.1 1 44 1 9 5
Basically, you should just skip the first row, where the letters a-g appear every third column. Since the sub-column names repeat, R automatically appends a grouping suffix (.1, .2, ...) to every repeated name after the first set; so we only need to add a grouping suffix to the first three columns ourselves.
You can either then create an "id" variable, or, as I've done here, just use the row names for the IDs.
You can change the "time" variable to your "cell" variable as follows:
# Change the following to the number of levels you actually have
OUT$cell = factor(OUT$time, labels=letters[1:2])
Then, drop the "time" column:
OUT$time = NULL
Update
To answer a question in the comments below, if the first label was something other than a letter, this should still pose no problem. The sequence I would take would be as follows:
temp = read.csv("path/to/file.csv", skip=1, stringsAsFactors = FALSE)
GROUPS = read.csv("path/to/file.csv", header=FALSE,
                  nrows=1, stringsAsFactors = FALSE)
GROUPS = GROUPS[!is.na(GROUPS)]
names(temp)[1:3] = paste0(names(temp)[1:3], ".0")
OUT = reshape(temp, direction="long", ids=rownames(temp), varying=1:ncol(temp))
OUT$cell = factor(OUT$time, labels=GROUPS)
OUT$time = NULL
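Putting the recipe together as a self-contained script, reading the csv text from the question via the text= argument (the two group labels are assumed to be a and b, as shown):

```r
csv <- "a,,,b,,
x,y,z,x,y,z
10,1,5,22,1,6
12,2,6,21,3,5
12,2,7,11,3,7
13,1,4,33,2,8
12,2,5,44,1,9"
temp <- read.csv(text = csv, skip = 1)          # names become x, y, z, x.1, y.1, z.1
names(temp)[1:3] <- paste0(names(temp)[1:3], ".0")
out <- reshape(temp, direction = "long", ids = rownames(temp), varying = 1:ncol(temp))
out$cell <- factor(out$time, labels = c("a", "b"))
out$time <- NULL
print(out[order(out$cell), c("cell", "x", "y", "z")])
```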
