I'm doing a network meta-analysis of 29 studies in R using the netmeta package, and I now have to do a leave-one-out analysis. I was wondering whether there is a way to use a for loop to obtain the results of such a method, rather than manually excluding one trial at a time.
I came up with this:
for (i in 1:29) {
  NMA_DB_L <- NMA_DB[-i, ]
  yi_All_cause <- summary(escalc(ai = NMA_DB_L$All_Cause_d_C, bi = NMA_DB_L$PTS_All_Cause_d_C - NMA_DB_L$All_Cause_d_C,
                                 ci = NMA_DB_L$All_Cause_d_I, di = NMA_DB_L$PTS_All_Cause_d_I - NMA_DB_L$All_Cause_d_I,
                                 measure = "RR"))[, "yi"]
  sei_All_cause <- summary(escalc(ai = NMA_DB_L$All_Cause_d_C, bi = NMA_DB_L$PTS_All_Cause_d_C - NMA_DB_L$All_Cause_d_C,
                                  ci = NMA_DB_L$All_Cause_d_I, di = NMA_DB_L$PTS_All_Cause_d_I - NMA_DB_L$All_Cause_d_I,
                                  measure = "RR"))[, "sei"]
  netmeta(TE = yi_All_cause, seTE = sei_All_cause, treat1 = NMA_DB_L$Arm_1, treat2 = NMA_DB_L$INT, sm = "RR",
          studlab = NMA_DB_L$Study, reference.group = "Standard_DAPT")
}
It seems to work properly, but I cannot find a way to save the results of each analysis (each one excluding a different trial).
Does anyone have an idea of how to do so?
Consider also lapply, which avoids the bookkeeping of initializing a list and assigning into it by index in a for loop. Also, use a defined function and avoid rerunning summary + escalc just to retrieve attributes: run it once and extract what you need.
# DEFINED FUNCTION TO RUN CALCULATIONS
# FOLLOWS DRY (I.E., DON'T REPEAT YOURSELF)
run_trials <- function(i) {
  NMA_DB_L <- NMA_DB[-i, ]
  results <- summary(escalc(
    ai = NMA_DB_L$All_Cause_d_C,
    bi = NMA_DB_L$PTS_All_Cause_d_C - NMA_DB_L$All_Cause_d_C,
    ci = NMA_DB_L$All_Cause_d_I,
    di = NMA_DB_L$PTS_All_Cause_d_I - NMA_DB_L$All_Cause_d_I,
    measure = "RR"
  ))
  yi_All_cause <- results[, "yi"]
  sei_All_cause <- results[, "sei"]
  netmeta(
    TE = yi_All_cause,
    seTE = sei_All_cause,
    treat1 = NMA_DB_L$Arm_1,
    treat2 = NMA_DB_L$INT,
    sm = "RR",
    studlab = NMA_DB_L$Study,
    reference.group = "Standard_DAPT"
  )
}
# BUILD LIST OF RESULTS
netmeta_results <- lapply(1:29, run_trials)
Why not save the outputs of the netmeta function into a list?
# Create a list of length 29
net_results <- vector('list', 29)

for (i in 1:29) {
  NMA_DB_L <- NMA_DB[-i, ]
  ...
  net <- netmeta(TE = yi_All_cause, seTE = sei_All_cause,
                 treat1 = NMA_DB_L$Arm_1, treat2 = NMA_DB_L$INT, sm = "RR",
                 studlab = NMA_DB_L$Study, reference.group = "Standard_DAPT")
  net_results[[i]] <- net
}
You can then access the results of a specific run with net_results[[1]], etc. R lists can in general contain any type of element, which makes them a suitable structure for this type of problem.
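For example, a minimal sketch (assuming NMA_DB$Study holds one study label per row) that names each element after the excluded study and then inspects the runs:
# name each run after the study that was left out
names(net_results) <- NMA_DB$Study
# inspect the run that excluded the first study
summary(net_results[[1]])
# collect the random-effects estimates across all runs
# (netmeta objects store these in the TE.random matrix)
TE_loo <- lapply(net_results, function(net) net$TE.random)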
Solution (thanks @Peter_Evan!) in case anyone coming across this question has a similar issue.
(Original question is below)
## get all slopes (lm coefficients) first
# list of subfields of interest to loop through
sf <- c("left_presubiculum", "right_presubiculum",
"left_subiculum", "right_subiculum", "left_CA1", "right_CA1",
"left_CA3", "right_CA3", "left_CA4", "right_CA4", "left_GC-ML-DG",
"right_GC-ML-DG")
# dependent variables are the subfields (sf); the independent variable common to all models in the inner lm() call is ICV
# applies lm(subfield ~ ICV, data = DF) to all subfields of interest (sf) specified previously
lm.results <- lapply(sf, function(dv) {
temp.lm <- lm(get(dv) ~ ICV, data = DF)
coef(temp.lm)
})
# returns a list, where each element is a vector of coefficients
# do.call(rbind, ) will paste them together
lm.coef <- data.frame(sf = sf,
do.call(rbind, lm.results))
# tidy up name of intercept variable
names(lm.coef)[2] <- "intercept"
lm.coef
## set up all components for the equation
# matrix to store output
out <- matrix(ncol = length(sf), nrow = NROW(DF))
# name the rows after each subject
row.names(out) <- DF$Subject
# name the columns after each subfield
colnames(out) <- sf
# nested for loop that goes by subject (j) and subfield (i)
for(j in DF$Subject){
for (i in sf) {
slope <- lm.coef[lm.coef$sf == i, "ICV"]
out[j,i] <- as.numeric( DF[DF$Subject == j, i] - (slope * (DF[DF$Subject == j, "ICV"] - mean(DF$ICV))) )
}
}
# check output
out
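As a side note, the nested loop can also be written in vectorized form; a minimal sketch, assuming the slopes sit in lm.coef$ICV in the same row order as sf:
# vectorized alternative: subtract slope * centered ICV from every subfield at once
centered_ICV <- DF$ICV - mean(DF$ICV)
out2 <- as.matrix(DF[, sf]) - outer(centered_ICV, lm.coef$ICV)
rownames(out2) <- DF$Subject
all.equal(out, out2)  # both approaches should agree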
===============
Original Question:
I have a dataframe (DF) with 13 columns (12 different brain subfields, and one column containing total intracranial volume (ICV)) and 50 rows (each a different participant). I'm trying to automate an equation that is looped over every column for each participant.
The data:
structure(list(Subject = c("sub01", "sub02", "sub03", "sub04",
"sub05", "sub06", "sub07", "sub08", "sub09", "sub10", "sub11",
"sub12", "sub13", "sub14", "sub15", "sub16", "sub17", "sub18",
"sub19", "sub20"), ICV = c(1.50813, 1.3964237, 1.6703585, 1.4641886,
1.6351018, 1.5524641, 1.4445532, 1.6384505, 1.6152434, 1.5278011,
1.4788126, 1.4373356, 1.4109637, 1.3634952, 1.3853583, 1.4855268,
1.6082085, 1.5644998, 1.5617522, 1.4304141), left_subiculum = c(411.225013,
456.168033, 492.968477, 466.030173, 533.95505, 476.465524, 448.278213,
476.45566, 422.617374, 498.995121, 450.773906, 461.989663, 549.805272,
452.619547, 457.545623, 451.988333, 475.885847, 490.127968, 470.686415,
494.06548), left_CA1 = c(666.893596, 700.982955, 646.21927, 580.864234,
721.170599, 737.413139, 737.683665, 597.392434, 594.343911, 712.781376,
733.157168, 699.820162, 701.640861, 690.942843, 606.259484, 731.198846,
567.70879, 648.887718, 726.219904, 712.367433), left_presubiculum = c(325.779458,
391.252815, 352.765098, 342.67797, 390.885737, 312.857458, 326.916867,
350.657957, 325.152464, 320.718835, 273.406949, 305.623938, 371.079722,
315.058313, 311.376271, 319.56678, 348.343569, 349.102678, 322.39908,
306.966008), `left_GC-ML-DG` = c(327.037756, 305.63224, 328.945065,
238.920358, 319.494513, 305.153183, 311.347404, 259.259723, 295.369164,
312.022281, 324.200989, 314.636501, 306.550385, 311.399107, 295.108592,
356.197094, 251.098248, 294.76349, 317.308576, 301.800253), left_CA3 = c(275.17038,
220.862237, 232.542718, 170.088695, 234.707172, 210.803287, 246.861975,
171.90896, 220.83478, 236.600832, 246.842024, 239.677362, 186.599097,
224.362411, 229.9142, 293.684776, 172.179779, 202.18936, 232.5666,
221.896625), left_CA4 = c(277.614028, 264.575987, 286.605092,
206.378619, 281.781858, 258.517989, 269.354864, 226.269982, 256.384436,
271.393257, 277.928824, 265.051581, 262.307377, 266.924683, 263.038686,
306.133918, 226.364556, 262.42823, 264.862956, 255.673948), right_subiculum = c(468.762375,
445.35738, 446.536018, 456.73484, 521.041823, 482.768261, 487.2911,
456.39996, 445.392976, 476.146498, 451.775611, 432.740085, 518.170065,
487.642399, 405.564237, 487.188989, 467.854363, 479.268714, 473.212833,
472.325916), right_CA1 = c(712.973011, 717.815214, 663.637105,
649.614586, 711.844375, 779.212704, 862.784416, 648.925038, 648.180611,
760.761704, 805.943016, 717.486756, 801.853608, 722.213109, 621.676321,
791.672796, 605.35667, 637.981476, 719.805053, 722.348921), right_presubiculum = c(327.285242,
364.937865, 288.322641, 348.30058, 341.309111, 279.429847, 333.096795,
342.184296, 364.245998, 350.707173, 280.389853, 276.423658, 339.439377,
321.534798, 302.164685, 328.365751, 341.660085, 305.366589, 320.04127,
303.83284), `right_GC-ML-DG` = c(362.391907, 316.853532, 342.93274,
282.550769, 339.792696, 357.867386, 342.512721, 277.797528, 309.585721,
343.770416, 333.524912, 302.505077, 309.063135, 291.29361, 302.510461,
378.682679, 255.061044, 302.545288, 313.93902, 297.167161), right_CA3 = c(307.007404,
243.839349, 269.063801, 211.336979, 249.283479, 276.092623, 268.183349,
202.947849, 214.642782, 247.844657, 291.206598, 235.864996, 222.285729,
201.427853, 237.654913, 321.338801, 199.035108, 243.204203, 236.305659,
213.386702), right_CA4 = c(312.164065, 272.905586, 297.99392,
240.765062, 289.98697, 306.459566, 284.533068, 245.965817, 264.750571,
296.149675, 290.66935, 264.821461, 264.920869, 246.267976, 266.07378,
314.205819, 229.738951, 274.152503, 256.414608, 249.162404)), row.names = c(NA,
-20L), class = c("tbl_df", "tbl", "data.frame"))
The equation:
adjustedBrain(participant1) = rawBrain(participant1) - slope*[ICV(participant1) - (mean of all ICV measures included in the calculation of the slope)]
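In R terms, for a single subject and subfield, this would be something like the following sketch (the slope here is pulled from a hypothetical regression of left_CA1 on ICV):
# slope from the regression of one subfield on ICV
slope <- coef(lm(left_CA1 ~ ICV, data = DF))[["ICV"]]
# adjusted volume for one subject and one subfield
DF$left_CA1[DF$Subject == "sub01"] -
  slope * (DF$ICV[DF$Subject == "sub01"] - mean(DF$ICV))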
The code (which is not working and I was hoping for some pointers):
adjusted_Brain <- function(DF, subject) {
subfields <- colnames(select(DF, "left_presubiculum", "right_presubiculum",
"left_subiculum", "right_subiculum", "left_CA1", "right_CA1",
"left_CA3", "right_CA3", "left_CA4", "right_CA4", "left_GC-ML-DG",
"right_GC-ML-DG"))
out <- matrix(ncol = length(subfields), nrow = NROW(DF))
for (i in seq_along(subfields)) {
DF[i] = DF[DF$Subject == "subject", "i"] -
slope * (DF[DF$Subject == "subject", "ICV"] -
mean(DF$ICV))
}
}
Getting this error:
Error: Can't subset columns that don't exist.
x Column `i` doesn't exist.
A few notes:
The slopes for each subject for each subfield will be different (and will come from a regression) -> is there a way to specify that in the function so the slope (coefficient from the appropriate regression equation) gets called in?
I have my nrow set to the number of participants right now in the output because I'd like to have this run through EVERY subject across EVERY subfield and spit out a matrix with all the adjusted brain volumes... But that seems very complicated and so for now I will just settle for running each participant separately.
Any help is greatly appreciated!
As others have noted in the comments, there are quite a few syntax issues that prevent your code from running, as well as a few unstated requirements. That aside, I think there is enough to recommend a few improvements that you can hopefully build on. Here are the top line changes:
You likely don't need this to be a function, but rather a nested for loop (if you want to do this with base R). As written, the code isn't flexible enough to merit a function. If you intend to apply this many times across different datasets, a function might make sense. However, it will require a much larger rewrite.
Assuming you are fitting a simple regression via lm, you can pull out the coefficient of interest via the $ operator and indexing (see below). Some thought will need to go into how to handle different models in the loop. Here, we assume you only need one coefficient from one model.
There are a few areas where the syntax is incorrect and a review of subsetting in base R would be helpful. Others have pointed out in the comments where some of these are.
Here is one approach where we loop through each subject (j) and each feature or subfield (i) and store the results in a matrix (out). This is just an approach and will almost certainly need tweaking on your end!
# NOTE: the dataset you provided is saved as x in this example.
# fit a linear model - here we assume there is only one coef. of interest, but you may need to alter
# depending on how the slope changes in each calculation
# (the subfield is regressed on ICV, matching the adjustment equation)
reg <- lm(right_CA3 ~ ICV, data = x)
# view the coefficients
reg$coefficients
# pull out the slope: the second coefficient (the first is the intercept)
slope <- reg$coefficients[[2]]
# list of features/subfields to loop through
sf <- c("left_presubiculum", "right_presubiculum",
"left_subiculum", "right_subiculum", "left_CA1", "right_CA1",
"left_CA3", "right_CA3", "left_CA4", "right_CA4", "left_GC-ML-DG",
"right_GC-ML-DG")
# matrix to store output
out <- matrix(ncol = length(sf), nrow = NROW(x))
#name the rows after each subject
row.names(out) <- x$Subject
# name the columns after each subfield
colnames(out) <- sf
# nested for loop that goes by subject (j) and feature/subfield (i)
for(j in x$Subject){
for (i in sf) {
out[j,i] <- as.numeric( x[x$Subject == j, i] - (slope * (x[x$Subject == j, "ICV"] - mean(x$ICV))) )
}
}
# check output
out
I am a relative beginner in R and am trying to figure out how to use the cpquery function from the bnlearn package for all edges of a DAG.
First of all, I created a bn object, fitted the network, and made a table with all arc strengths.
library(bnlearn)
data(learning.test)
baynet = hc(learning.test)
fit = bn.fit(baynet, learning.test)
sttbl = arc.strength(x = baynet, data = learning.test)
Then I tried to create a new variable in the sttbl dataset that holds the result of the cpquery function.
sttbl = sttbl %>% mutate(prob = NA) %>% arrange(strength)
sttbl[1,4] = cpquery(fit, `A` == 1, `D` == 1)
It looks pretty good (especially on bigger data), but when I try to automate this process somehow, I keep running into errors such as:
Error in sampling(fitted = fitted, event = event, evidence = evidence, :
logical vector for evidence is of length 1 instead of 10000.
In a perfect situation, I need to create a function that fills the prob variable of the sttbl dataset regardless of its size. I tried to do it with a for loop too, but stumbled over the error above again and again. Unfortunately, I deleted the failed attempts, but they were something like this:
for (i in 1:nrow(sttbl)) {
j = sttbl[i,1]
k = sttbl[i,2]
sttbl[i,4]=cpquery(fit, fit$j %in% sttbl[i,1]==1, fit$k %in% sttbl[i,2]==1)
}
or this:
for (i in 1:nrow(sttbl)) {
sttbl[i,4]=cpquery(fit, sttbl[i,1] == 1, sttbl[i,2] == 1)
}
Now I think I misunderstood something in R or the bnlearn package.
Could you please tell me how to accomplish this task, filling the column with multiple cpquery calls? That would help me a lot with my research!
cpquery is quite difficult to work with programmatically. If you look at the examples in the help page, you can see the author uses eval(parse(...)) to build the queries. I have added two approaches below: one using the methods from the help page, and one using cpdist to draw samples and reweighting to get the probabilities.
Your example
library(bnlearn); library(dplyr)
data(learning.test)
baynet = hc(learning.test)
fit = bn.fit(baynet, learning.test)
sttbl = arc.strength(x = baynet, data = learning.test)
sttbl = sttbl %>% mutate(prob = NA) %>% arrange(strength)
This uses cpquery and the much-maligned eval(parse(...)) -- this is the approach that the bnlearn author takes to do this programmatically in the ?cpquery examples. Anyway,
# You want the evidence and event to be the same; in your question it is `1`
# but for example using learning.test data we use 'a'
state = "\'a\'" # note if the states are character then these need to be quoted
event = paste(sttbl$from, "==", state)
evidence = paste(sttbl$to, "==", state)
# loop through using code similar to that found in `cpquery`
set.seed(1) # to make sampling reproducible
for(i in 1:nrow(sttbl)) {
qtxt = paste("cpquery(fit, ", event[i], ", ", evidence[i], ",n=1e6", ")")
sttbl$prob[i] = eval(parse(text=qtxt))
}
I find it preferable to work with cpdist, which is used to generate random samples conditional on some evidence. You can then use these samples to build up the queries. If you use likelihood weighting (method="lw") it is slightly easier to do this programmatically (and without evil(parse(...))).
The evidence is added in a named list i.e. list(A='a').
# The following just gives a quick way to assign the same
# evidence state to all the evidence nodes.
evidence = setNames(replicate(nrow(sttbl), "a", simplify = FALSE), sttbl$to)
# Now loop though the queries
# As we are using likelihood weighting we need to reweight to get the probabilities
# (cpquery does this under the hood)
# Also note with this method that you could simulate from more than
# one variable (event) at a time if the evidence was the same.
for(i in 1:nrow(sttbl)) {
temp = cpdist(fit, sttbl$from[i], evidence[i], method="lw")
w = attr(temp, "weights")
sttbl$prob2[i] = sum(w[temp=='a'])/ sum(w)
}
sttbl
# from to strength prob prob2
# 1 A D -1938.9499 0.6186238 0.6233387
# 2 A B -1153.8796 0.6050552 0.6133448
# 3 C D -823.7605 0.7027782 0.7067417
# 4 B E -720.8266 0.7332107 0.7328657
# 5 F E -549.2300 0.5850828 0.5895373
I am trying to bring real time data from LabVIEW (vibration of a bearing and temperature) into an app written in R to create a control chart. It works for a while but eventually crashes with the following error message:
Error in aggregate.data.frame(B, list(rep(1:(nrow(B)%/%n + 1), each = n, :
no rows to aggregate
The process works as follows: LabVIEW takes the data and writes it out to two Excel (CSV) files. Those files are read by the R code and used to produce a control chart in R. The process succeeds for some time, but the failure moment does not always come after the same amount of time. Sometimes the control chart will run for 6-7 min, other times it will crash in 2 min.
My suspicion is that the Excel files are not being updated fast enough, so the R code tries to read that Excel file when it is empty.
Any suggestions would be great! thank you!
I have tried to lower the sample size taken per second. That did not work.
getwd()
setwd("C:/Users/johnd/Desktop/R Data")
while(1) {
A = fread("C:/Users/johnd/Desktop/R Data/a1.csv" , skip = 4 , header = FALSE , col.names = c("t1","B2","t2","AM","t3","M","t4","B1"))
t1 = A$t1
B2 = A$B2
t2 = A$t2
AM = A$AM
t3 = A$t3
M = A$M
t4 = A$t4
B1 = A$B1
B = fread("C:/Users/johnd/Desktop/R Data/b1.csv" , skip = 4 , header = FALSE , col.names = c("T1","small","T2","big"))
T1 = B$T1
small = B$small
T2 = B$T2
big = B$big
DJ1 = A[seq(1,nrow(A),1),c('t1','B2','AM','M','B1')]
DJ1
n = 16
DJ2 = aggregate(B,list(rep(1:(nrow(B)%/%n+1),each=n,len=nrow(B))),mean)[-1]
DJ2
#------------------------------------------------------------------------
DJ6 = cbind(DJ1[,'B1'],DJ2[,c('small','big')]) # creates matrix for these three indicators
DJ6
#--------------T2 Hand made---------------------------------------------------------------------
new_B1 = DJ6[,'B1']
new_small = DJ6[,'small'] ### decompose the DJ6 matrix into vectors for each indicator(temperature, big & small accelerometers)
new_big = DJ6[,'big']
new_B1
new_small
new_big
mean_B1 = as.numeric(colMeans(DJ6[,'B1']))
mean_small = as.numeric(colMeans(DJ6[,'small'])) ##decomposes into vectors of type numeric
mean_big = as.numeric(colMeans(DJ6[,'big']))
cov_inv = data.matrix(solve(cov(DJ6))) # obtain inverse covariance matrix
cov_inv
p = ncol(DJ6) # changed to pull the number of parameters by taking the number of columns in the OG matrix # p=3 # # of quality characteristics
m=64 # #of samples (10 seconds of data)
a_alpha = 0.99
f= qf(a_alpha , df1 = p,df2 = (m-p)) ### calculates the F-Statistic for our data
f
UCL = (p*(m+1)*(m-1)*(f))/(m*(m-p)) ###produces upper control limit
UCL
diff_B1 = new_B1-mean_B1
diff_small = new_small-mean_small
diff_big = new_big-mean_big
DJ7 = cbind(diff_B1, diff_small , diff_big) #produces matrix of difference between average and observations (x-(x-bar))
DJ7
# DJ8 = data.matrix(DJ7[1,])
# DJ8
DJ9 = data.matrix(DJ7) ### turns matrix into appropriate numeric form
DJ9
# T2.1.1 = DJ8 %*% cov_inv %*% t(DJ8)
# T2.1.1
# T2.1 = t(as.matrix(DJ9[1,])) %*% cov_inv %*% as.matrix(DJ9[1,])
# T2.1
#T2 <- NULL
for(i in 1:64){ #### creates vector of T^2 statistic
T2<- t(as.matrix(DJ9[i,])) %*% cov_inv %*% as.matrix(DJ9[i,]) # calculation of T^2 test statistic ## there is no calculation of x-double bar
write.table(T2,"C:/Users/johnd/Desktop/R Data/c1.csv",append=T,sep="," , col.names = FALSE)#
#
DJ12 <-fread("C:/Users/johnd/Desktop/R Data/c1.csv" , header = FALSE ) #
}
# DJ12
DJ12$V1 = 1:nrow(DJ12)
# plot(DJ12 , type='l')
p1 = nrow(DJ12)-m
p2 = nrow(DJ12)
plot(DJ12[p1:p2,], type ='o', ylim =c(0,15), ylab="T2 Chart" , xlab="Data points") ### plots last 640 points
# plot(DJ12[p1:p2,], type ='o' , ylim =c(0,15) , ylab="T2 Chart" , xlab="Data points")
abline(h=UCL , col="red") ## displays upper control limit
Sys.sleep(1)
}
The process succeeds for some time, but the failure moment does not always come after the same amount of time. Sometimes the control chart will run for 6-7 min, other times it will crash in 2 min.
My suspicion is that the Excel files are not being updated fast enough, so the R code tries to read that Excel file when it is empty.
Your suspicion is correct.
With your current design, your R application can crash depending on how fast it runs relative to your LabVIEW application. This is called a race condition; you must eliminate race conditions from your code.
A quick and dirty solution
One simple solution to avoid the crash is to call NROW to check if any data exists. If there's no data available, don't call aggregate. This is described here: error message in r: no rows to aggregate
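In the loop from the question, that check could look like the following sketch (it skips the current iteration when b1.csv has produced no rows yet):
B = fread("C:/Users/johnd/Desktop/R Data/b1.csv", skip = 4, header = FALSE,
          col.names = c("T1","small","T2","big"))
# file caught mid-write: nothing to aggregate yet, so wait and retry
if (NROW(B) == 0) {
  Sys.sleep(1)
  next
}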
A more robust solution
A better solution is to use a communications protocol like TCP to stream data from LabVIEW to R, instead of using CSV files to transfer real-time data. For example, your R program could listen for data on a TCP socket. Make it wait for data to be sent from LabVIEW before running your data processing code.
Here is an example on using socketConnection in R: http://blog.corynissen.com/2013/05/using-r-to-communicate-via-socket.html
Here is an example on sending/receiving data over TCP in LabVIEW: http://www.ni.com/product-documentation/2710/en/
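For illustration, here is a minimal R-side sketch (the port number and the one-sample-per-line, comma-separated format are assumptions; the LabVIEW side would connect and write matching lines):
# open a server socket and block until LabVIEW connects
con <- socketConnection(host = "localhost", port = 6011,
                        server = TRUE, blocking = TRUE, open = "r")
repeat {
  line <- readLines(con, n = 1)   # blocks until a full line arrives
  if (length(line) == 0) break    # the sender closed the connection
  sample <- as.numeric(strsplit(line, ",")[[1]])
  # ... append `sample` to a buffer and update the control chart here ...
}
close(con)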