How to make a Loop in R referencing a data set - r

I'm confused on how to run a complicated loop. I want R to run a function (rpt) on each of the 14 turtles in the data set (starting with R3L12). Here is what the code looks like for just running the function for one turtle.
R3L12repodba <- rpt(odba ~ (1|date.1), grname = "date.1", data= R3L12rep,
datatype = "Gaussian", nboot = 500, npermut = 0)
print(R3L12repodba)
The problem is that the dataset changes each time. For the next turtle, R3L1, the data = argument would be R3L1rep.
It could just be easier to copy and paste the above code and change it for the 13 turtles, but I wanted to see if anyone could help me with a loop.
Thank you!

You could just make a vector containing the names of each dataset.
data_names=c("R3L12rep","R3L1rep")
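Then loop over each name, using get() to fetch the data frame that each name refers to (passing the name itself would hand rpt a character string instead of the data):
for(i in seq_along(data_names)){
foo = rpt(odba ~ (1|date.1),
grname = "date.1",
data= get(data_names[i]),
datatype = "Gaussian",
nboot = 500,
npermut = 0)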
print(foo)
}

Put your datasets into a list, then iterate over that list:
datasets = list(R3L12rep,R3L1rep, <insert-rest-of-turtles>)
for (data in datasets) {
R3L12repodba <- rpt(odba ~ (1|date.1), grname = "date.1", data= data,
datatype = "Gaussian", nboot = 500, npermut = 0)
print(R3L12repodba)
}
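Both answers print each result and then overwrite it on the next pass. If the fits need to be kept, one option is a named list plus lapply, so each result stays attached to its turtle. The following is only a minimal sketch: the tiny data frames and the fit_one helper are hypothetical stand-ins (fit_one just takes a mean so the example runs without rptR), and in real use its body would be the rpt(...) call from the question.

```r
# Stand-in data frames; in the question these would be the real R3L12rep, R3L1rep, ...
R3L12rep <- data.frame(odba = c(1, 2, 3))
R3L1rep  <- data.frame(odba = c(4, 6))

data_names <- c("R3L12rep", "R3L1rep")

# mget() looks up each name and returns a list named after the datasets.
datasets <- mget(data_names, envir = environment())

# Hypothetical stand-in for the model fit. In real use the body would be:
#   rpt(odba ~ (1|date.1), grname = "date.1", data = d,
#       datatype = "Gaussian", nboot = 500, npermut = 0)
fit_one <- function(d) mean(d$odba)

# One result per turtle, stored under the dataset's name.
results <- lapply(datasets, fit_one)
print(results$R3L1rep)
```

Afterwards results[["R3L12rep"]] retrieves the fit for that turtle, and names(results) lists which turtles were processed.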

Related

Parallel Processing packages in R with user function and multiple outcomes

I'm trying to make my model fitting procedure in R more efficient. Currently, I have all of my data generated with 1500 sims for 15 variables. This data is stored in an array, with each level being one sim, each row being one "person" and each column being one of the 15 variables (e.g., 300 x 15 x 1500). I then pass one layer of the array through mplusObject numerous times, fitting different LPA models (one class, two class, etc.). For each of these models, there are numerous outcomes that get reported and saved. I've been trying for a while now to figure out how to speed this up using parallel processing, given that the data is pre-generated and one layer of the array doesn't depend on the others. I'll show what I currently have below, but it isn't working, so I'm wondering if I need a different package. Thanks!
inp <- array(1:(300*15*1500), dim=c(300,15,1500)) #Really there's actual data here, not random values, but the data generation process is a whole other thing.
results <- matrix(NA,1500,129) #A results table for values to be written to, filled with NAs, 1500 simulations, 129 results.
num_sims=1500
foreach(i=1:num_sims, .packages=c('mclust','MplusAutomation')) %dopar% {
working <- inp[,,i]
sim_num=i
results[sim_num,1] = working[1,17] #number of groups
results[sim_num,2] = working[1,18] #sample size 1
results[sim_num,3] = working[1,19] #sample size 2
results[sim_num,4] = working[1,20] #sample size 3
results[sim_num,5] = working[1,21] #dist2
results[sim_num,6] = working[1,22] #dist3
df <- as.data.frame(working[,1:15])
lpa1_15 <- mplusObject(
TITLE = "1-Class LPA;",
VARIABLE = "USEVARIABLES = x01-x15;
CLASSES=c(1);",
ANALYSIS = "ESTIMATOR = MLR;
TYPE=MIXTURE;",
MODEL = "
%OVERALL%
x01-x15;
[x01-x15];
%c#1%
x01-x15;
[x01-x15];",
usevariables = c("x01", "x02", "x03", "x04", "x05",
"x06", "x07", "x08", "x09", "x10",
"x11", "x12", "x13", "x14", "x15"),
rdata = df)
lpa1_15_fit = mplusModeler(lpa1_15, "df.dat", modelout = "lpa1_15.inp", killOnFail = FALSE, run = 1L)
if (!is.null(lpa1_15_fit$results$summaries$LL)){
results[sim_num,7] = -2 * lpa1_15_fit$results$summaries$LL
results[sim_num,8] = lpa1_15_fit$results$summaries$BIC
results[sim_num,9] = lpa1_15_fit$results$summaries$aBIC
results[sim_num,10] = lpa1_15_fit$results$summaries$AIC
results[sim_num,11] = lpa1_15_fit$results$summaries$AICC}
lpa2_15 <- mplusObject(
TITLE = "2-Class LPA;",
VARIABLE = "USEVARIABLES = x01-x15;
CLASSES=c(2);",
ANALYSIS = "ESTIMATOR = MLR;
TYPE=MIXTURE;",
MODEL = "
%OVERALL%
x01-x15;
[x01-x15];
%c#1%
x01-x15;
[x01-x15];
%c#2%
x01-x15;
[x01-x15];",
OUTPUT = "TECH11;",
usevariables = c("x01", "x02", "x03", "x04", "x05",
"x06", "x07", "x08", "x09", "x10",
"x11", "x12", "x13", "x14", "x15"),
rdata = df)
lpa2_15_fit = mplusModeler(lpa2_15, "df.dat", modelout = "lpa2_15.inp", killOnFail = FALSE, run = 1L)
if (!is.null(lpa2_15_fit$results$summaries$LL)){
results[sim_num,12] = -2 * lpa2_15_fit$results$summaries$LL
results[sim_num,13] = lpa2_15_fit$results$summaries$BIC
results[sim_num,14] = lpa2_15_fit$results$summaries$aBIC
results[sim_num,15] = lpa2_15_fit$results$summaries$AIC
results[sim_num,16] = lpa2_15_fit$results$summaries$AICC
results[sim_num,17] = lpa2_15_fit$results$summaries$Entropy
if (!is.null(lpa2_15_fit$results$summaries$T11_VLMR_2xLLDiff)){
results[sim_num,18] = lpa2_15_fit$results$summaries$T11_VLMR_2xLLDiff
results[sim_num,19] = lpa2_15_fit$results$summaries$T11_VLMR_PValue
results[sim_num,20] = lpa2_15_fit$results$summaries$T11_LMR_Value
results[sim_num,21] = lpa2_15_fit$results$summaries$T11_LMR_PValue}
... and so on...
}
The results I got from running this were:
[[1]]
[1] 0.491
[[2]]
[1] 0.7037
I've tried using parallel, foreach and dopar, and parLapply, but just can't get them to work. The closest I got was using the foreach function, but that returned a single value for each and none of the results were saved to the results table. I can provide the code for how I attempted these, but none of them worked really, so at this point I'm questioning if it can be done (and if so, which method/approach is best for this setup).
I should also point out that the levels of the array can be run in any order (e.g., [,,1], [,,5], [,,3]), but once a level is called, the full function (or however it should be set up) should be run on it, because several tests compare the current model to the previous model (3 classes vs. 2 classes) for that dataset. In that sense, the models within a level do have to be run in order.
Thanks for any help or suggestions you might have!
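One likely reason the results table stays empty: each parallel worker operates on its own copy of the data, so assignments to results inside the loop body never reach the main session, and foreach only hands back the value of the last expression in each iteration. A common pattern is to have each iteration return a complete row and bind the rows together afterwards. The sketch below uses the base parallel package with a small toy array; fit_one_sim is a hypothetical worker, and the column means are placeholders for the Mplus fit statistics.

```r
library(parallel)

# Toy stand-in for the real array: 20 "people" x 3 variables x 8 simulations.
set.seed(1)
inp <- array(rnorm(20 * 3 * 8), dim = c(20, 3, 8))

# Hypothetical worker: processes one layer and RETURNS a full row of results
# instead of assigning into a shared `results` matrix.
fit_one_sim <- function(i, inp) {
  working <- inp[, , i]
  row <- rep(NA_real_, 4)
  row[1] <- i                      # simulation number
  row[2:4] <- colMeans(working)    # placeholder for the model-fit statistics
  row
}

cl <- makeCluster(2)               # two worker processes (an arbitrary choice)
rows <- parLapply(cl, seq_len(dim(inp)[3]), fit_one_sim, inp = inp)
stopCluster(cl)

# Assemble the results table in the main session, one row per simulation.
results <- do.call(rbind, rows)
print(dim(results))
```

All models for one layer run inside a single worker call, so the within-layer ordering (2 classes before 3 classes, and so on) is preserved. With the real Mplus code, each worker would presumably also need its own file names or working directory, since the question's code writes "df.dat" and "lpa1_15.inp" to fixed paths.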

Exporting Seurat Object Data by Cluster

I'm using Seurat to perform a single cell analysis and am interested in exporting the data for all cells within each of my clusters. I tried to use the below code but have had no success.
My Seurat object is called Patients. I also attached a screenshot of my Seurat object. I am looking to extract all the clusters (i.e. Ductal1, Macrophage1, Macrophage2, etc...)
meta.data.cluster <- unique(x = Patients@meta.data$active.ident)
for(group in meta.data.cluster) {
group.cells <- WhichCells(object = Patients, subset.name = "active.ident" , accept.value = group)
data_to_write_out <- as.data.frame(x = as.matrix(x = Patients@raw.data[, group.cells]))
write.csv(x = data_to_write_out, row.names = TRUE, file = paste0(save_dir,"/",group, "_cluster_outfile.csv"))
}
I am new to R and coding so any help is greatly appreciated! :)
It doesn't work because there is no active.ident column under your metadata. For example if we use an example dataset like yours and set the ident:
library(Seurat)
M = matrix(rnbinom(5000,mu=20,size=1),ncol=50)
colnames(M) = paste0("P",1:50)
rownames(M) = paste0("gene",1:100)
Patients = CreateSeuratObject(M)
Patients$grp = sample(c("Ductal1","Macrophage1","Macrophage2"),50,replace=TRUE)
Idents(Patients) = Patients$grp
You can see this line of code gives you no value:
meta.data.cluster <- unique(x = Patients@meta.data$active.ident)
meta.data.cluster
NULL
You can do:
meta.data.cluster <- unique(Idents(Patients))
for(group in meta.data.cluster) {
group.cells <- WhichCells(object = Patients, idents = group)
data_to_write_out <- as.data.frame(as.matrix(GetAssayData(Patients,slot = 'counts')[,group.cells]))
write.csv(data_to_write_out, row.names = TRUE, file = paste0(save_dir,"/",group, "_cluster_outfile.csv"))
}
Note also that you can get the counts out using GetAssayData. The counts are stored as a sparse matrix, so convert with as.matrix before as.data.frame. You can subset one group and write it out like this:
wh <- which(Idents(Patients) =="Macrophage1" )
da = as.data.frame(as.matrix(GetAssayData(Patients,slot = 'counts')[,wh]))
write.csv(da,...)
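The underlying pattern (group the cell names by cluster label, then write one file per group) can be tried without Seurat. This is a base R sketch under stated assumptions: M and grp are toy stand-ins for the count matrix and the cluster identities, and save_dir points at a temporary directory rather than the asker's real output folder.

```r
# Toy stand-in: a genes x cells count matrix and one cluster label per cell.
set.seed(42)
M <- matrix(rpois(40, lambda = 5), nrow = 4,
            dimnames = list(paste0("gene", 1:4), paste0("P", 1:10)))
grp <- rep(c("Ductal1", "Macrophage1"), each = 5)

save_dir <- tempdir()   # assumption: the real code would use its own output folder

# split() groups the cell names by cluster label.
cells_by_group <- split(colnames(M), grp)

out_files <- character(0)
for (group in names(cells_by_group)) {
  out <- file.path(save_dir, paste0(group, "_cluster_outfile.csv"))
  # drop = FALSE keeps a matrix even if a cluster contains a single cell.
  write.csv(as.data.frame(M[, cells_by_group[[group]], drop = FALSE]),
            file = out, row.names = TRUE)
  out_files[group] <- out
}
print(out_files)
```

Each file holds genes as rows and that cluster's cells as columns, which matches the layout the loop in the answer produces.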

for loop in ctree [R]

I want to run a decision tree for each variable in my dataframe, so I'm using this:
results_cont = list()
for (i in 2:(ncol(DATA)-1)) {
current_var = colnames(DATA[i])
current_result = ctree(TARGET ~ current_var, DATA, control = ctrl)
results_cont[[i]] = current_result
}
Where DATA is a dataframe where the first column is the ID and the last column (TARGET) is my binary Target.
I keep getting this error:
Error in trafo(data = data, numeric_trafo = numeric_trafo, factor_trafo = factor_trafo, :
data class “character” is not supported
But I don't have any character columns in my dataframe.
Is there anything wrong with my loop, or is it something else?
Thank you guys.
Since you do not provide data, I have not tested this, but I believe your problem is the line
current_result = ctree(TARGET ~ current_var, DATA, control = ctrl)
This is not working because current_var is just a character string. You need to build the formula as a string and then convert it to a formula - like this:
current_var = colnames(DATA[i])
FORM = as.formula(paste("TARGET ~ ", current_var))
current_result = ctree(FORM, DATA, control = ctrl)
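To see the formula construction in a complete loop, here is a small runnable sketch. It makes two substitutions so that it runs in base R alone: lm() stands in for ctree(), and the built-in mtcars data (with mpg as the response) stands in for DATA and TARGET. It uses reformulate(), which builds the same formula as as.formula(paste(...)), and stores results by variable name rather than by column index so the list has no empty first slot.

```r
# Assumption: lm() and mtcars replace ctree() and DATA so the sketch is self-contained.
predictors <- setdiff(colnames(mtcars), "mpg")

results_cont <- list()
for (current_var in predictors) {
  # Equivalent to as.formula(paste("mpg ~", current_var)).
  form <- reformulate(current_var, response = "mpg")
  results_cont[[current_var]] <- lm(form, data = mtcars)
}

# Each fitted model now carries a real formula, not a character string.
print(formula(results_cont[["wt"]]))
```

Swapping lm(form, data = mtcars) for ctree(form, DATA, control = ctrl) should give the loop the question was after.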

R "Error in terms.formula" using GA/genalg library

I'm attempting to create a genetic algorithm (not picky about library, ga and genalg produce same errors) to identify potential columns for use in a linear regression model, by minimizing -adj. r^2. Using mtcars as a play-set, trying to regress on mpg.
I have the following fitness function:
mtcarsnompg <- mtcars[,2:ncol(mtcars)]
evalFunc <- function(string) {
costfunc <- summary(lm(mtcars$mpg ~ ., data = mtcarsnompg[, which(string == 1)]))$adj.r.squared
return(-costfunc)
}
ga("binary",fitness = evalFunc, nBits = ncol(mtcarsnompg), popSize = 100, maxiter = 100, seed = 1, monitor = FALSE)
this causes:
Error in terms.formula(formula, data = data) :
'.' in formula and no 'data' argument
Researching this error, I decided I could work around it this way:
evalFunc = function(string) {
child <- mtcarsnompg[, which(string == 1)]
costfunc <- summary(lm(as.formula(paste("mtcars$mpg ~", paste(child, collapse = "+"))), data = mtcars))$adj.r.squared
return(-costfunc)
}
ga("binary",fitness = evalFunc, nBits = ncol(mtcarsnompg), popSize = 100, maxiter = 100, seed = 1, monitor = FALSE)
but this results in:
Error in terms.formula(formula, data = data) :
invalid model formula in ExtractVars
I know it should work, because I can evaluate the function by hand written either way, while not using ga:
solution <- c("1","1","1","0","1","0","1","1","1","0")
evalFunc(solution)
[1] -0.8172511
I also found in "A quick tour of GA" (https://cran.r-project.org/web/packages/GA/vignettes/GA.html) that using "string" in which(string == 1) is something the GA ought to be able to handle, so I have no idea what GA's issue with my function is.
Any thoughts on a way to write this to get ga or genalg to accept the function?
It turns out I hadn't considered that a solution string of all 0s (or indeed a string of 0s with a single 1) would cause the internal paste to produce "mpg ~ ", which is not a valid regression formula.
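A fitness function can guard against that case explicitly. The sketch below is one possible way to do it, not the asker's final code: it selects predictors by name (which also avoids the problem of a single selected column collapsing to a plain vector), and it returns a fixed penalty for the empty selection. The penalty value of 1e6 is an arbitrary assumption, chosen only to be worse than any real score under the question's convention of returning the negative adjusted R-squared. Whether the optimiser minimises or maximises the fitness depends on the library, so the sign may need flipping.

```r
predictors <- setdiff(names(mtcars), "mpg")

evalFunc <- function(string, penalty = 1e6) {
  chosen <- predictors[string == 1]
  # An all-zero string selects no columns; "mpg ~ " is not a valid formula,
  # so return a bad score instead of calling lm().
  if (length(chosen) == 0) return(penalty)
  form <- reformulate(chosen, response = "mpg")
  -summary(lm(form, data = mtcars))$adj.r.squared
}

print(evalFunc(rep(0, length(predictors))))          # empty selection: penalty
print(evalFunc(c(1, rep(0, length(predictors) - 1))))  # single predictor: works
```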

R - XGBoost: Error building DMatrix

I am having trouble using the XGBoost in R.
I am reading a CSV file with my data:
get_data = function()
{
#Loading Data
path = "dados_eye.csv"
data = read.csv(path)
#Dividing into two groups
train_porcentage = 0.05
train_lines = nrow(data)*train_porcentage
train = data[1:train_lines,]
test = data[train_lines:nrow(data),]
rownames(train) = c(1:nrow(train))
rownames(test) = c(1:nrow(test))
return (list("test" = test, "train" = train))
}
This function is called by main.R:
lista_dados = get_data()
#machine = train_svm(lista_dados$train)
#machine = train_rf(lista_dados$train)
machine = train_xgt(lista_dados$train)
The problem is here in the train_xgt
train_xgt = function(train_data)
{
data_train = data.frame(train_data[,1:14])
label_train = data.frame(factor(train_data[,15]))
print(is.data.frame(data_train))
print(is.data.frame(label_train))
dtrain = xgb.DMatrix(data_train, label=label_train)
machine = xgboost(dtrain, num_class = 4 ,max.depth = 2,
eta = 1, nround = 2,nthread = 2,
objective = "binary:logistic")
return (machine)
}
This is the Error:
becchi@ubuntu:~/Documents/EEG_DATA/Dados_Eye$ Rscript main.R
[1] TRUE
[1] TRUE
Error in xgb.DMatrix(data_train, label = label_train) :
xgb.DMatrix: does not support to construct from list
Calls: train_xgt -> xgb.DMatrix
Execution halted
becchi@ubuntu:~/Documents/EEG_DATA/Dados_Eye$
As you can see, they are both data frames.
I don't know what I am doing wrong, please help!
Just convert the data frame to a matrix first using as.matrix() and then pass it to xgb.DMatrix(). A data frame is a list internally, which is why the error mentions constructing from a list.
Check that all columns contain numeric data. I think this could be because some column has data stored as factors or characters, which cannot be converted to a numeric matrix. If you have factor variables, you can use one-hot encoding to convert them into dummy variables.
Try:
dtrain = xgb.DMatrix(as.matrix(sapply(data_train, as.numeric)), label=label_train)
instead of just:
dtrain = xgb.DMatrix(data_train, label=label_train)
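Note that label_train in the question is also a data frame (of a factor), and the label argument expects a plain numeric vector. The following base R sketch shows one way to prepare both inputs. The toy data frame is hypothetical, and the xgb.DMatrix call is left as a comment so the example runs without xgboost installed. Subtracting 1 assumes the objective wants class labels starting at 0, which is worth checking against the objective actually used.

```r
# Hypothetical training data: two features (one stored as a factor) and a factor label.
train_data <- data.frame(
  f1 = c(0.1, 0.5, 0.9, 0.3),
  f2 = factor(c("a", "b", "a", "b")),
  y  = factor(c("open", "closed", "open", "closed"))
)

# data.matrix() returns a numeric matrix, replacing factors with their integer codes.
x <- data.matrix(train_data[, c("f1", "f2")])

# Turn the factor label into a 0-based integer vector (levels sort alphabetically).
label <- as.integer(train_data$y) - 1L

# With xgboost installed, these would then be passed on as:
# dtrain <- xgboost::xgb.DMatrix(x, label = label)
print(str(x))
print(label)
```

Integer codes impose an ordering on factor levels, so for unordered categories the one-hot encoding mentioned above (for example via model.matrix) may be the better choice.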
