for loop in ctree [R] - r

I want to run a decision tree for each variable in my dataframe, so I'm using this:
results_cont = list()
for (i in 2:(ncol(DATA)-1)) {
current_var = colnames(DATA[i])
current_result = ctree(TARGET ~ current_var, DATA, control = ctrl)
results_cont[[i]] = current_result
}
Where DATA is a dataframe where the first column is the ID and the last column (TARGET) is my binary Target.
I keep getting this error:
Error in trafo(data = data, numeric_trafo = numeric_trafo, factor_trafo = factor_trafo, :
data class “character” is not supported
But I don't have any character in mi dataframe.
Is there anything wrong with my loop or something else ?
Thank you guys.

Since you do not provide data, I have not tested this, but I believe your problem is the line
current_result = ctree(TARGET ~ current_var, DATA, control = ctrl)
This is not working because current_var is just a character string. You need to build the formula as a string and then convert it to a formula - like this:
current_var = colnames(DATA[i])
FORM = as.formula(paste("TARGET ~ ", current_var))
current_result = ctree(FORM, DATA, control = ctrl)

Related

Making a function that builds a dataframe

I'm trying to make a function that basically builds a dataframe and returns it. This new dataframe is made of columns taken from another dataframe that I have, called metadata.. in addetion to some additional data that I want to control, by passing the TRUE or FALSE values when calling the function.
Here is what I did:
make_data = function(metric, use_additions = FALSE){
data = data.frame(my_metric = metadata[['metric']], gender = metadata$Gender ,
age = as.numeric(metadata$Age) , use_additions = t(additional_data))
data = data %>% dplyr::select(my_metric, everything())
return(data)
}
data = make_data(CR, FALSE)
I want to pass different metric values each time, and all other features stay the same. So here for example I called the function with metric as CR which is the name of the column I want in the metadata. The argument I want to control is use_additions, sometines I want to add it and sometimes I don't.
metadata and additional_data have the exact same row names and the same rows number. It's just adding the data or not.
I get this error(s):
Error in data.frame(metric = metadata[["metric"]], gender = metadata$Gender, :
arguments imply differing number of rows: 0, 1523
In addition: Warning message:
In data.frame(metric = metadata[["metric"]], gender = metadata$Gender, :
Error in data.frame(my_metric = metadata[["metric"]], gender = metadata$Gender, :
arguments imply differing number of rows: 0, 1523
I've tried several ways to do this, with '' and without, using the $, but non of these worked. So for example when I type metric = metadata[[metric]] I get this:
Error in (function(x, i, exact) if (is.matrix(i)) as.matrix(x)[[i]] else .subset2(x, :
object 'CR' not found
make_data = function(colname, use_additions = FALSE){
data = data.frame(my_metric = metadata[colname], gender = metadata$Gender ,
age = as.numeric(metadata$Age))
if (use_additions) data$use_additions=additional_data
return(data)
}
data = make_data(“CR”, FALSE)

How to make a Loop in R referencing a data set

I'm confused on how to run a complicated loop. I want R to run a function (rpt) on each of the 14 turtles in the data set (starting with R3L12). Here is what the code looks like for just running the function for one turtle.
R3L12repodba <- rpt(odba ~ (1|date.1), grname = "date.1", data= R3L12rep,
datatype = "Gaussian", nboot = 500, npermut = 0)
print(R3L12repodba)
The problem is is that the dataset will be changing each time. For the next turtle, turtle R3L1, the data = would be R3L1rep.
It could just be easier to copy and paste the above code and change it for the 13 turtles, but I wanted to see if anyone could help me with a loop.
Thank you!
You could just make a vector containing the names of each dataset.
data_names=c("R3L12rep","R3L1rep")
Then loop over each name:
for(i in seq_along(data_names)){
foo = rpt(odba ~ (1|date.1),
grname = "date.1",
data= data_names[i],
datatype = "Gaussian",
nboot = 500,
npermut = 0))
print(foo)
}
put your datasets into a list, then iterate over that list:
datasets = list(R3L12rep,R3L1rep, <insert-rest-of-turtles>)
for (data in datasets) {
R3L12repodba <- rpt(odba ~ (1|date.1), grname = "date.1", data= data,
datatype = "Gaussian", nboot = 500, npermut = 0)
print(R3L12repodba)
}

How to structure output to write to csv within a for statement in R?

I'm running a conditional logistic regression analysis on different individuals using a for statement in R. The code for this is pretty straightforward:
for(ID in unique(Hour168Fin$BAND)){
modelone = clogit(Hour168Fin$OBSERVED ~ Hour168Fin$LNSTEPLENG + Hour168Fin$PowCross + Shrub +
strata(Hour168Fin$STEPID), data=Hour168Fin, subset = which(ID==Hour168Fin$BAND))
I'm interested in very specific parts of the output, so I've structured the output to give me exactly the coefficients I need using this:
x1beta = as.numeric(summary(modelone)$coef[1,1])
x2beta = as.numeric(summary(modelone)$coef[2,1])
x3beta = as.numeric(summary(modelone)$coef[3,1])
x1SE = as.numeric(summary(modelone)$coef[1,3])
x2SE = as.numeric(summary(modelone)$coef[2,3])
x3SE = as.numeric(summary(modelone)$coef[3,3])
x1pvalue = as.numeric(summary(modelone)$coef[1,5])
x2pvalue = as.numeric(summary(modelone)$coef[2,5])
x3pvalue = as.numeric(summary(modelone)$coef[3,5])
modelAIC = AIC(modelone)
results = table(x1beta, x1SE, x1pvalue, x2beta, x2SE, x2pvalue, x2beta, x2SE, x2pvalue, modelAIC, rownames = ID)}
In R, I can see all the results in the format I'm looking for, but when I use this to get these results into a csv:
write.csv = (results, file = "TrialOut.csv")
I'm only getting the results of 1 unique ID. I've tried embedding the write.csv statement in the for statement, and using it outside of it with the same results. Any suggestions? I'm really baffled because I can see the results in R but can't seem to get that to translate to a csv.
Thanks for your time!
Try including the write.csv call inside the loop, and use append = TRUE:
for (...) {
# ...
# ...
write.csv(results, file = "someFile.csv", append = TRUE)
}

Resolving a variable value within a function call R

I have a data frame defined as follows:
model_comp
logLik IC Lack of fit Res var
W2.4 -353.2939 716.5878 1.361885e-01 26.80232
baro5 -353.2936 718.5871 NaN 27.04363
LL.5 -353.2940 718.5880 NaN 27.04384
LL.3 -360.3435 728.6871 3.854799e-04 29.99842
W1.3 -360.3842 728.7684 3.707592e-04 30.01948
W1.4 -360.3129 730.6258 7.850947e-05 30.25028
LL.4 -360.3170 730.6340 7.818416e-05 30.25243
The best model fit is the one with the lowest IC (information criteria). I want to use the best fit to do some plotting etc... So I created:
> bestmodel <- noquote(paste0(as.name(rownames(model_comp[which.min(model_comp$IC),])),"()"))
> bestmodel
[1] W2.4()
I want to use the W2.4() as a function call to a the DRC package.
For example this call works when manually specified:
drm(y~x,logDose = 10, fct=W2.4())
I'm trying to use the value in bestmodel instead to do something like:
drm(y~x,logDose = 10,fct = as.formula(paste(bestmodel)))
I've tried all the options given here with no success. I've messed with as.formula(), noquote(), as.name() with no success.
I also tried as.name(paste0(as.name(bestmodel),"()")) where I didn't add on the "()" in the bestmodel definition above. Still no dice.
model_comp <- structure(list(logLik = c(-353.293902612472, -353.293568997018,
-353.294024776211, -360.343530770823, -360.384220907907, -360.312897918459,
-360.317018443052), IC = c(716.587805224944, 718.587137994035,
718.588049552421, 728.687061541646, 728.768441815814, 730.625795836919,
730.634036886105), `Lack of fit` = c(0.136188459104035, NaN,
NaN, 0.000385479884900107, 0.000370759187117765, 7.85094742623572e-05,
7.81841606352332e-05), `Res var` = c(26.8023196097934, 27.0436263934882,
27.0438389102235, 29.9984226526044, 30.0194755526501, 30.2502847248304,
30.2524338881051)), .Names = c("logLik", "IC", "Lack of fit",
"Res var"), row.names = c("W2.4", "baro5", "LL.5", "LL.3", "W1.3",
"W1.4", "LL.4"), class = "data.frame")
Just using noquote() not to draw the quotes around a string doesn't turn a character value into an executable piece of code. There is a big different in R between a character value an a symbol or function call. You can't really just replace one with the other.
So let's say you have extracted the character value from the rownames
x <- "W2.4"
This is basically the string version of the function you want. You can get the value of a symbol (in this case the function W2.4 from the drc:package) from its string name with get(). So you can call
drm(y~x, logDose = 10, fct = get(x)())
Note the extra parenthesis. The get(x)-call returns the W2.4 function, and the second set of parenthesis calls that function returned by get().
Using the ryegrass dataset that comes with the drc package, we can see that these two lines return the same thing
drm(rootl ~ conc, data = ryegrass, fct = W2.4())
drm(rootl ~ conc, data = ryegrass, fct = get(x)())

R "Error in terms.formula" using GA/genalg library

I'm attempting to create a genetic algorithm (not picky about library, ga and genalg produce same errors) to identify potential columns for use in a linear regression model, by minimizing -adj. r^2. Using mtcars as a play-set, trying to regress on mpg.
I have the following fitness function:
mtcarsnompg <- mtcars[,2:ncol(mtcars)]
evalFunc <- function(string) {
costfunc <- summary(lm(mtcars$mpg ~ ., data = mtcarsnompg[, which(string == 1)]))$adj.r.squared
return(-costfunc)
}
ga("binary",fitness = evalFunc, nBits = ncol(mtcarsnompg), popSize = 100, maxiter = 100, seed = 1, monitor = FALSE)
this causes:
Error in terms.formula(formula, data = data) :
'.' in formula and no 'data' argument
Researching this error, I decided I could work around it this way:
evalFunc = function(string) {
child <- mtcarsnompg[, which(string == 1)]
costfunc <- summary(lm(as.formula(paste("mtcars$mpg ~", paste(child, collapse = "+"))), data = mtcars))$adj.r.squared
return(-costfunc)
}
ga("binary",fitness = evalFunc, nBits = ncol(mtcarsnompg), popSize = 100, maxiter = 100, seed = 1, monitor = FALSE)
but this results in:
Error in terms.formula(formula, data = data) :
invalid model formula in ExtractVars
I know it should work, because I can evaluate the function by hand written either way, while not using ga:
solution <- c("1","1","1","0","1","0","1","1","1","0")
evalFunc(solution)
[1] -0.8172511
I also found in "A quick tour of GA" (https://cran.r-project.org/web/packages/GA/vignettes/GA.html) that using "string" in which(string == 1) is something the GA ought to be able to handle, so I have no idea what GA's issue with my function is.
Any thoughts on a way to write this to get ga or genalg to accept the function?
Turns out I didn't consider that a solution string of 0s (or indeed, a string of 0s with one 1) would cause the internal paste to read "mpg ~ " which is not a possible linear regression.

Resources