I'm doing a network meta-analysis of 29 studies using the netmeta package in R, and I now have to do a leave-one-out analysis. I was wondering whether there is a way to use a for loop to obtain the results of such an analysis, rather than manually excluding one trial at a time.
I came up with this:
for (i in 1:29) {
  NMA_DB_L <- NMA_DB[-i, ]
  yi_All_cause <- summary(escalc(ai = NMA_DB_L$All_Cause_d_C, bi = NMA_DB_L$PTS_All_Cause_d_C - NMA_DB_L$All_Cause_d_C,
                                 ci = NMA_DB_L$All_Cause_d_I, di = NMA_DB_L$PTS_All_Cause_d_I - NMA_DB_L$All_Cause_d_I,
                                 measure = "RR"))[, "yi"]
  sei_All_cause <- summary(escalc(ai = NMA_DB_L$All_Cause_d_C, bi = NMA_DB_L$PTS_All_Cause_d_C - NMA_DB_L$All_Cause_d_C,
                                  ci = NMA_DB_L$All_Cause_d_I, di = NMA_DB_L$PTS_All_Cause_d_I - NMA_DB_L$All_Cause_d_I,
                                  measure = "RR"))[, "sei"]
  netmeta(TE = yi_All_cause, seTE = sei_All_cause, treat1 = NMA_DB_L$Arm_1, treat2 = NMA_DB_L$INT, sm = "RR",
          studlab = NMA_DB_L$Study, reference.group = "Standard_DAPT")
}
It seems to work properly, but I cannot find a way to save the results of each analysis (each one excluding a different trial).
Does anyone have an idea of how to do so?
Consider lapply, which avoids the bookkeeping of initializing a list and assigning into it by index inside a for loop. Also, wrap the computation in a named function, and avoid rerunning summary + escalc just to retrieve different columns: run it once and extract what you need.
# DEFINED FUNCTION TO RUN THE CALCULATIONS
# FOLLOWS DRY (I.E., DON'T REPEAT YOURSELF): summary(escalc(...)) RUNS ONLY ONCE
run_trials <- function(i) {
  NMA_DB_L <- NMA_DB[-i, ]
  results <- summary(escalc(
    ai = NMA_DB_L$All_Cause_d_C,
    bi = NMA_DB_L$PTS_All_Cause_d_C - NMA_DB_L$All_Cause_d_C,
    ci = NMA_DB_L$All_Cause_d_I,
    di = NMA_DB_L$PTS_All_Cause_d_I - NMA_DB_L$All_Cause_d_I,
    measure = "RR"
  ))
  yi_All_cause <- results[, "yi"]
  sei_All_cause <- results[, "sei"]
  netmeta(
    TE = yi_All_cause,
    seTE = sei_All_cause,
    treat1 = NMA_DB_L$Arm_1,
    treat2 = NMA_DB_L$INT,
    sm = "RR",
    studlab = NMA_DB_L$Study,
    reference.group = "Standard_DAPT"
  )
}
# BUILD LIST OF RESULTS
netmeta_results <- lapply(1:29, run_trials)
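If you also want each element labeled by the trial that was left out (this assumes NMA_DB has one row per study, as the loop over 1:29 implies), you can name the list:
# Label each result with the study excluded in that run
names(netmeta_results) <- NMA_DB$Study
# netmeta_results[["<study label>"]] then returns the network fitted without that trial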
Why not save the outputs of the netmeta function into a list?
# Create list of length 29
net_results <- vector('list', 29)
for (i in 1:29) {
  NMA_DB_L <- NMA_DB[-i, ]
  ...
  net <- netmeta(TE = yi_All_cause, seTE = sei_All_cause,
                 treat1 = NMA_DB_L$Arm_1, treat2 = NMA_DB_L$INT, sm = "RR",
                 studlab = NMA_DB_L$Study, reference.group = "Standard_DAPT")
  net_results[[i]] <- net
}
You can then access the results of a specific run with net_results[[1]], etc.
R lists can in general contain any type of element, which makes them a suitable structure for this type of problem.
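Once the runs are in a list, you can also post-process them all at once with another lapply. For example (TE.random is one of the components a netmeta object carries; swap in whichever component you need):
# Collect the random-effects treatment-effect matrices from every leave-one-out run
all_TE_random <- lapply(net_results, function(net) net$TE.random)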
Solution (thanks @Peter_Evan!) in case anyone coming across this question has a similar issue.
(Original question is below)
## get all slopes (lm coefficients) first
# list of subfields of interest to loop through
sf <- c("left_presubiculum", "right_presubiculum",
"left_subiculum", "right_subiculum", "left_CA1", "right_CA1",
"left_CA3", "right_CA3", "left_CA4", "right_CA4", "left_GC-ML-DG",
"right_GC-ML-DG")
# dependent variables are the subfields in sf; the independent variable, common to all models in the inner lm() call, is ICV
# applies lm(subfield ~ ICV, data = DF) to all subfields of interest (sf) specified previously
lm.results <- lapply(sf, function(dv) {
temp.lm <- lm(get(dv) ~ ICV, data = DF)
coef(temp.lm)
})
# returns a list, where each element is a vector of coefficients
# do.call(rbind, ) will paste them together
lm.coef <- data.frame(sf = sf,
do.call(rbind, lm.results))
# tidy up name of intercept variable
names(lm.coef)[2] <- "intercept"
lm.coef
## set up all components for the equation
# matrix to store output
out <- matrix(ncol = length(sf), nrow = NROW(DF))
# name the rows after each subject
row.names(out) <- DF$Subject
# name the columns after each subfield
colnames(out) <- sf
# nested for loop that goes by subject (j) and subfield (i)
for(j in DF$Subject){
for (i in sf) {
slope <- lm.coef[lm.coef$sf == i, "ICV"]
out[j,i] <- as.numeric( DF[DF$Subject == j, i] - (slope * (DF[DF$Subject == j, "ICV"] - mean(DF$ICV))) )
}
}
# check output
out
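For reference, the nested loop can also be replaced with a vectorized version; this is a compact sketch that reuses the sf vector and lm.coef table built above:
# vectorized alternative: for each subfield, adjusted = raw - slope * (ICV - mean(ICV))
icv_centered <- DF$ICV - mean(DF$ICV)
out2 <- sapply(sf, function(i) {
  slope <- lm.coef[lm.coef$sf == i, "ICV"]
  as.numeric(DF[[i]] - slope * icv_centered)
})
rownames(out2) <- DF$Subject
# out2 should match out up to floating-point noise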
===============
Original Question:
I have a dataframe (DF) with 13 columns (12 different brain subfields, and one column containing total intracranial volume (ICV)) and 50 rows (each a different participant). I'm trying to automate an equation that is looped over every column for each participant.
The data:
structure(list(Subject = c("sub01", "sub02", "sub03", "sub04",
"sub05", "sub06", "sub07", "sub08", "sub09", "sub10", "sub11",
"sub12", "sub13", "sub14", "sub15", "sub16", "sub17", "sub18",
"sub19", "sub20"), ICV = c(1.50813, 1.3964237, 1.6703585, 1.4641886,
1.6351018, 1.5524641, 1.4445532, 1.6384505, 1.6152434, 1.5278011,
1.4788126, 1.4373356, 1.4109637, 1.3634952, 1.3853583, 1.4855268,
1.6082085, 1.5644998, 1.5617522, 1.4304141), left_subiculum = c(411.225013,
456.168033, 492.968477, 466.030173, 533.95505, 476.465524, 448.278213,
476.45566, 422.617374, 498.995121, 450.773906, 461.989663, 549.805272,
452.619547, 457.545623, 451.988333, 475.885847, 490.127968, 470.686415,
494.06548), left_CA1 = c(666.893596, 700.982955, 646.21927, 580.864234,
721.170599, 737.413139, 737.683665, 597.392434, 594.343911, 712.781376,
733.157168, 699.820162, 701.640861, 690.942843, 606.259484, 731.198846,
567.70879, 648.887718, 726.219904, 712.367433), left_presubiculum = c(325.779458,
391.252815, 352.765098, 342.67797, 390.885737, 312.857458, 326.916867,
350.657957, 325.152464, 320.718835, 273.406949, 305.623938, 371.079722,
315.058313, 311.376271, 319.56678, 348.343569, 349.102678, 322.39908,
306.966008), `left_GC-ML-DG` = c(327.037756, 305.63224, 328.945065,
238.920358, 319.494513, 305.153183, 311.347404, 259.259723, 295.369164,
312.022281, 324.200989, 314.636501, 306.550385, 311.399107, 295.108592,
356.197094, 251.098248, 294.76349, 317.308576, 301.800253), left_CA3 = c(275.17038,
220.862237, 232.542718, 170.088695, 234.707172, 210.803287, 246.861975,
171.90896, 220.83478, 236.600832, 246.842024, 239.677362, 186.599097,
224.362411, 229.9142, 293.684776, 172.179779, 202.18936, 232.5666,
221.896625), left_CA4 = c(277.614028, 264.575987, 286.605092,
206.378619, 281.781858, 258.517989, 269.354864, 226.269982, 256.384436,
271.393257, 277.928824, 265.051581, 262.307377, 266.924683, 263.038686,
306.133918, 226.364556, 262.42823, 264.862956, 255.673948), right_subiculum = c(468.762375,
445.35738, 446.536018, 456.73484, 521.041823, 482.768261, 487.2911,
456.39996, 445.392976, 476.146498, 451.775611, 432.740085, 518.170065,
487.642399, 405.564237, 487.188989, 467.854363, 479.268714, 473.212833,
472.325916), right_CA1 = c(712.973011, 717.815214, 663.637105,
649.614586, 711.844375, 779.212704, 862.784416, 648.925038, 648.180611,
760.761704, 805.943016, 717.486756, 801.853608, 722.213109, 621.676321,
791.672796, 605.35667, 637.981476, 719.805053, 722.348921), right_presubiculum = c(327.285242,
364.937865, 288.322641, 348.30058, 341.309111, 279.429847, 333.096795,
342.184296, 364.245998, 350.707173, 280.389853, 276.423658, 339.439377,
321.534798, 302.164685, 328.365751, 341.660085, 305.366589, 320.04127,
303.83284), `right_GC-ML-DG` = c(362.391907, 316.853532, 342.93274,
282.550769, 339.792696, 357.867386, 342.512721, 277.797528, 309.585721,
343.770416, 333.524912, 302.505077, 309.063135, 291.29361, 302.510461,
378.682679, 255.061044, 302.545288, 313.93902, 297.167161), right_CA3 = c(307.007404,
243.839349, 269.063801, 211.336979, 249.283479, 276.092623, 268.183349,
202.947849, 214.642782, 247.844657, 291.206598, 235.864996, 222.285729,
201.427853, 237.654913, 321.338801, 199.035108, 243.204203, 236.305659,
213.386702), right_CA4 = c(312.164065, 272.905586, 297.99392,
240.765062, 289.98697, 306.459566, 284.533068, 245.965817, 264.750571,
296.149675, 290.66935, 264.821461, 264.920869, 246.267976, 266.07378,
314.205819, 229.738951, 274.152503, 256.414608, 249.162404)), row.names = c(NA,
-20L), class = c("tbl_df", "tbl", "data.frame"))
The equation:
adjustedBrain(participant1) = rawBrain(participant1) - slope*[ICV(participant1) - (mean of all ICV measures included in the calculation of the slope)]
The code (which is not working and I was hoping for some pointers):
adjusted_Brain <- function(DF, subject) {
subfields <- colnames(select(DF, "left_presubiculum", "right_presubiculum",
"left_subiculum", "right_subiculum", "left_CA1", "right_CA1",
"left_CA3", "right_CA3", "left_CA4", "right_CA4", "left_GC-ML-DG",
"right_GC-ML-DG"))
out <- matrix(ncol = length(subfields), nrow = NROW(DF))
for (i in seq_along(subfields)) {
DF[i] = DF[DF$Subject == "subject", "i"] -
slope * (DF[DF$Subject == "subject", "ICV"] -
mean(DF$ICV))
}
}
Getting this error:
Error: Can't subset columns that don't exist.
x Column `i` doesn't exist.
A few notes:
The slopes for each subfield will be different (each comes from its own regression) -> is there a way to specify that in the function so the slope (the coefficient from the appropriate regression equation) gets called in?
I currently have nrow set to the number of participants in the output because I'd like this to run through EVERY subject across EVERY subfield and spit out a matrix with all the adjusted brain volumes. But that seems very complicated, so for now I will settle for running each participant separately.
Any help is greatly appreciated!
As others have noted in the comments, there are quite a few syntax issues that prevent your code from running, as well as a few unstated requirements. That aside, I think there is enough here to recommend a few improvements that you can hopefully build on. Here are the top-line changes:
You likely don't need this to be a function, but rather a nested for loop (if you want to do this with base R). As written, the code isn't flexible enough to merit a function. If you intend to apply this many times across different datasets, a function might make sense. However, it will require a much larger rewrite.
Assuming you are fitting a simple regression via lm, then you can pull out the coefficient of interest via the $ operator and indexing (see below). Some thought will need to go into how to handle different models in the loop. Here, we assume you only need one coefficient from one model.
There are a few areas where the syntax is incorrect, and a review of subsetting in base R would be helpful. Others have pointed out in the comments where some of these are.
Here is one approach where we loop through each subject (j) and each feature or subfield (i) and store the results in a matrix (out). This is just one approach and will almost certainly need tweaking on your end!
# NOTE: the dataset you provided is saved as x in this example.
# Fit a linear model - here we assume there is only one coef. of interest, but you may need to alter
# this depending on how the slope changes in each calculation.
# To match the equation, the subfield is regressed on ICV:
reg <- lm(right_CA3 ~ ICV, x)
# view the coefficients
reg$coefficients
# pull out the slope by taking the coeff. of interest (via index) from the reg object;
# [[1]] is the intercept, so the slope is [[2]]
slope <- reg$coefficients[[2]]
# list of features/subfields to loop through
sf <- c("left_presubiculum", "right_presubiculum",
        "left_subiculum", "right_subiculum", "left_CA1", "right_CA1",
        "left_CA3", "right_CA3", "left_CA4", "right_CA4", "left_GC-ML-DG",
        "right_GC-ML-DG")
# matrix to store output
out <- matrix(ncol = length(sf), nrow = NROW(x))
# name the rows after each subject
row.names(out) <- x$Subject
# name the columns after each subfield
colnames(out) <- sf
# nested for loop that goes by subject (j) and feature/subfield (i)
for (j in x$Subject) {
  for (i in sf) {
    out[j, i] <- as.numeric(x[x$Subject == j, i] - (slope * (x[x$Subject == j, "ICV"] - mean(x$ICV))))
  }
}
# check output
out
I want to plot or visualize the output of each layer of a trained CNN built with mxnet in R - like those abstract-art images of what each layer of a NN "sees".
But I don't know how. One way I can think of is to put the trained weights and biases back into each step and plot that step's output. But when I try to put model$arg.params$convolution0_weight back into mx.symbol.Convolution(), I get
Error in mx.varg.symbol.Convolution(list(...)) :
./base.h:291: Unsupported parameter type object type for argument weight, expect integer, logical, or string.
Can anyone help me?
I thought of one way, but encountered a difficulty at one step. Here is what I did.
I found all the trained CNN's parameters in model$arg.params, and to compute with the parameters we can use the mx.nd.* functions as below:
# convolution 1 result
conv1_result <- mxnet::mx.nd.Convolution(data = mx.nd.array(train_array), weight = model$arg.params$convolution0_weight, bias = model$arg.params$convolution0_bias, kernel = c(8,8), num_filter = 50)
str(conv1_result)
tanh1_result <- mx.nd.Activation(data = conv1_result, act_type = "sigmoid")
pool1_result <- mx.nd.Pooling(data = tanh1_result, pool_type = "avg", kernel = c(4,4), stride = c(4,4))
# convolution 2 result
conv2_result <- mxnet::mx.nd.Convolution(data = pool1_result, weight = model$arg.params$convolution1_weight, bias = model$arg.params$convolution1_bias, kernel = c(5,5), num_filter = 50)
tanh2_result <- mx.nd.Activation(data = conv2_result, act_type = "sigmoid")  # takes conv2_result, not conv1_result
pool2_result <- mx.nd.Pooling(data = tanh2_result, pool_type = "avg", kernel = c(4,4), stride = c(4,4))  # takes tanh2_result, not tanh1_result
# 1st fully connected layer result
flat_result <- mx.nd.flatten(data = pool2_result)
fcl_1_result <- mx.nd.FullyConnected(data = flat_result, weight = model$arg.params$fullyconnected0_weight, bias = model$arg.params$fullyconnected0_bias, num_hidden = 500)
tanh_3_result <- mx.nd.Activation(data = fcl_1_result, act_type = "tanh")
# 2nd fully connected layer result
fcl_2_result <- mx.nd.FullyConnected(data = tanh_3_result, weight = model$arg.params$fullyconnected1_weight, bias = model$arg.params$fullyconnected1_bias, num_hidden = 100)
But when I got to the mx.nd.FullyConnected() step, I ran out of memory (I have 16 GB RAM) and R crashed.
So, does anyone know how to batch the input data for mx.nd.FullyConnected(), or any other way to make mx.nd.FullyConnected() run successfully, the way mx.model.FeedForward.create() did?
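A minimal sketch of one way to batch this manual forward pass (untested; it assumes train_array is a plain R array of dim c(H, W, 1, N) and reuses the exact layer calls above): run the chain on one chunk of samples at a time and pull each chunk's final output back into R, so the large intermediate NDArrays only exist for one batch at a time.
batch_size <- 100
n <- dim(train_array)[4]
fc2_chunks <- lapply(seq(1, n, by = batch_size), function(b) {
  idx <- b:min(b + batch_size - 1, n)
  x <- mx.nd.array(train_array[, , , idx, drop = FALSE])
  # same layer calls as above, just on a slice of the data
  conv1 <- mx.nd.Convolution(data = x, weight = model$arg.params$convolution0_weight,
                             bias = model$arg.params$convolution0_bias,
                             kernel = c(8, 8), num_filter = 50)
  pool1 <- mx.nd.Pooling(data = mx.nd.Activation(data = conv1, act_type = "sigmoid"),
                         pool_type = "avg", kernel = c(4, 4), stride = c(4, 4))
  conv2 <- mx.nd.Convolution(data = pool1, weight = model$arg.params$convolution1_weight,
                             bias = model$arg.params$convolution1_bias,
                             kernel = c(5, 5), num_filter = 50)
  pool2 <- mx.nd.Pooling(data = mx.nd.Activation(data = conv2, act_type = "sigmoid"),
                         pool_type = "avg", kernel = c(4, 4), stride = c(4, 4))
  fc1 <- mx.nd.FullyConnected(data = mx.nd.flatten(data = pool2),
                              weight = model$arg.params$fullyconnected0_weight,
                              bias = model$arg.params$fullyconnected0_bias, num_hidden = 500)
  fc2 <- mx.nd.FullyConnected(data = mx.nd.Activation(data = fc1, act_type = "tanh"),
                              weight = model$arg.params$fullyconnected1_weight,
                              bias = model$arg.params$fullyconnected1_bias, num_hidden = 100)
  as.array(fc2)  # copy back to R memory so the batch NDArrays can be freed
})
# combine the chunks afterwards, e.g. do.call(cbind, fc2_chunks) if the batch ends up as the last dimension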
Here is code that can help you achieve what you want. The code below displays the activations of the 2 convolution layers of LeNet. It takes the MNIST dataset (28x28 grayscale images, downloaded automatically) as input and produces images of the activations.
You can grab outputs from the executor. To see the list of available outputs, use names(executor$ref.outputs).
The result of each output is available as a matrix with values in the [-1; 1] range. The dimensions of the matrix depend on the parameters of the layer. The code uses these matrices to display grayscale images where -1 is a white pixel and 1 a black pixel. (Most of the code is taken from https://github.com/apache/incubator-mxnet/issues/1152 and massaged a little bit.)
The code is self-sufficient to run, but I have noticed that if I build the model a second time in the same R session, the names of the outputs get different indices, and the code later fails because the expected output names are hard-coded. So if you decide to create the model more than once, you will need to restart the R session.
Hope it helps and you can adjust this example to your case.
library(mxnet)
download.file('https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/R/data/mnist_csv.zip', destfile = 'mnist_csv.zip')
unzip('mnist_csv.zip', exdir = '.')
train <- read.csv('train.csv', header=TRUE)
data.x <- train[,-1]
data.x <- data.x/255
data.y <- train[,1]
val_ind = 1:100
train.x <- data.x[-val_ind,]
train.x <- t(data.matrix(train.x))
train.y <- data.y[-val_ind]
val.x <- data.x[val_ind,]
val.x <- t(data.matrix(val.x))
val.y <- data.y[val_ind]
train.array <- train.x
dim(train.array) <- c(28, 28, 1, ncol(train.x))
val.array <- val.x
dim(val.array) <- c(28, 28, 1, ncol(val.x))
# input layer
data <- mx.symbol.Variable('data')
# first convolutional layer
convLayer1 <- mx.symbol.Convolution(data=data, kernel=c(5,5), num_filter=30)
convAct1 <- mx.symbol.Activation(data=convLayer1, act_type="tanh")
poolLayer1 <- mx.symbol.Pooling(data=convAct1, pool_type="max", kernel=c(2,2), stride=c(2,2))
# second convolutional layer
convLayer2 <- mx.symbol.Convolution(data=poolLayer1, kernel=c(5,5), num_filter=60)
convAct2 <- mx.symbol.Activation(data=convLayer2, act_type="tanh")
poolLayer2 <- mx.symbol.Pooling(data=convAct2, pool_type="max",
kernel=c(2,2), stride=c(2,2))
# big hidden layer
flattenData <- mx.symbol.Flatten(data=poolLayer2)
hiddenLayer <- mx.symbol.FullyConnected(flattenData, num_hidden=500)
hiddenAct <- mx.symbol.Activation(hiddenLayer, act_type="tanh")
# softmax output layer
outLayer <- mx.symbol.FullyConnected(hiddenAct, num_hidden=10)
LeNet1 <- mx.symbol.SoftmaxOutput(outLayer)
# Group some output layers for visual analysis
out <- mx.symbol.Group(c(convAct1, poolLayer1, convAct2, poolLayer2, LeNet1))
# Create an executor
executor <- mx.simple.bind(symbol=out, data=dim(val.array), ctx=mx.cpu())
# Prepare for training the model
mx.set.seed(0)
# Set a logger to keep track of callback data
logger <- mx.metric.logger$new()
# Using cpu by default, but set gpu if your machine has a supported one
devices=mx.cpu(0)
# Train model
model <- mx.model.FeedForward.create(LeNet1, X=train.array, y=train.y,
eval.data=list(data=val.array, label=val.y),
ctx=devices,
num.round=1,
array.batch.size=100,
learning.rate=0.05,
momentum=0.9,
wd=0.00001,
eval.metric=mx.metric.accuracy,
epoch.end.callback=mx.callback.log.train.metric(100, logger))
# Update parameters
mx.exec.update.arg.arrays(executor, model$arg.params, match.name=TRUE)
mx.exec.update.aux.arrays(executor, model$aux.params, match.name=TRUE)
# Select data to use
mx.exec.update.arg.arrays(executor, list(data=mx.nd.array(val.array)), match.name=TRUE)
# Do a forward pass with the current parameters and data
mx.exec.forward(executor, is.train=FALSE)
# List of outputs available.
names(executor$ref.outputs)
# Plot the filters of a sample from validation set
sample_index <- 99 # sample number in the validation set; change it if you want to see other samples
activation0_filter_count <- 30 # number of filters of the "convLayer1" layer
par(mfrow=c(6,5), mar=c(0.1,0.1,0.1,0.1)) # number of rows x columns in output
dim(executor$ref.outputs$activation0_output)
for (i in 1:activation0_filter_count) {
outputData <- as.array(executor$ref.outputs$activation0_output)[,,i,sample_index]
image(outputData,
xaxt='n', yaxt='n',
col=gray(seq(1,0,-0.1)))
}
activation1_filter_count <- 60 # number of filters of the "convLayer2" layer
dim(executor$ref.outputs$activation1_output)
par(mfrow=c(6,10), mar=c(0.1,0.1,0.1,0.1)) # number of rows x columns in output
for (i in 1:activation1_filter_count) {
outputData <- as.array(executor$ref.outputs$activation1_output)[,,i,sample_index]
image(outputData,
xaxt='n', yaxt='n',
col=gray(seq(1,0,-0.1)))
}
As a result you should see the following images for validation sample #99 (use the RStudio left and right arrows to navigate between them).
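If you would rather save the activation grids to files than view them in the plot pane, you can wrap the same plotting loop in a png device (base grDevices, nothing mxnet-specific; the file name below is arbitrary). For example, for the first convolution layer:
# Save the first convolution layer's activations to a PNG file
png("activation0_sample.png", width = 1000, height = 1200)
par(mfrow = c(6, 5), mar = c(0.1, 0.1, 0.1, 0.1))
for (i in 1:activation0_filter_count) {
  outputData <- as.array(executor$ref.outputs$activation0_output)[, , i, sample_index]
  image(outputData, xaxt = 'n', yaxt = 'n', col = gray(seq(1, 0, -0.1)))
}
dev.off()  # close the device and write the file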