Entering variables into regression function

I have a feature_list that contains several column names, say "A", "B", "C", and a time_list with the name of the time column.
I want to loop over these names and put each one into a formula,
something like for(i in ...) with my_feature <- feature_list[i] and my_time <- time_list[i].
I then put the time and the chosen feature into a data frame to be used for the regression:
feature_list <- c("GPRS")
time_list <- c("time")
calc <- 0
feature_dim <- length(feature_list)
time_dim <- length(time_list)
data <- read.csv("data.csv", header = TRUE, sep = ";")
result <- matrix(nrow=0, ncol=5)
errors <- matrix(nrow=0, ncol=3)
for(i in 1:feature_dim) {
  my_feature <- feature_list[i]
  my_time <- time_list[i]
  fitdata <- data.frame(data[my_feature], data[my_time])
  for(j in 1:60) {
    my_b <- 0.0001 * (2^j)
    for(k in 1:60) {
      my_c <- 0.0001 * (2^k)
      cat("Feature: ", my_feature, "\t")
      cat("b: ", my_b, "\t")
      cat("c: ", my_c, "\n")
      err <- try(nlsfit <- nls(GPRS ~ 53E5*exp(-1*b*exp(-1*c*time)), data=fitdata, start=list(b=my_b, c=my_c)), silent=TRUE)
      calc <- calc+1
      if(class(err) == "try-error") {
        next
      } else {
        coefs <- coef(nlsfit)
        ess <- deviance(nlsfit)
        result <- rbind(result, c(coefs[1], coefs[2], ess, my_b, my_c))
      }
    }
  }
}
Now, in the nls() call, I want to use my_feature instead of hard-coding "A" or "B" (GPRS above), and then move on to the next name in the list. But I get an error when I try. What am I doing wrong?

You can use paste to build a string version of your formula that includes the variable name you want, then use as.formula (or formula) to convert it to a formula to pass to nls:
as.formula(paste(my_feature, "~ 53E5*exp(-1*b*exp(-1*c*time))"))
Another option is to use bquote to insert the variable name into the function call, then eval the call.
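A minimal sketch of both options, using simulated data in place of data.csv. The second column name (SMS), the simulated values and the start values are assumptions made only for illustration; the nls() call is wrapped in try() as in the question, because convergence is not guaranteed.

```r
# Simulated stand-in for data.csv (assumption: columns GPRS, SMS and time)
set.seed(1)
time <- 1:30
sim_data <- data.frame(
  time = time,
  GPRS = 53E5 * exp(-5 * exp(-0.2 * time)) * (1 + rnorm(30, sd = 0.01)),
  SMS  = 53E5 * exp(-3 * exp(-0.1 * time)) * (1 + rnorm(30, sd = 0.01))
)
feature_list <- c("GPRS", "SMS")

formulas_paste  <- list()
formulas_bquote <- list()
fits <- list()
for (my_feature in feature_list) {
  # Option 1: build the formula from a string
  f1 <- as.formula(paste(my_feature, "~ 53E5*exp(-1*b*exp(-1*c*time))"))
  # Option 2: splice the column name in with bquote(), then eval the call
  f2 <- eval(bquote(.(as.name(my_feature)) ~ 53E5*exp(-1*b*exp(-1*c*time))))
  formulas_paste[[my_feature]]  <- f1
  formulas_bquote[[my_feature]] <- f2

  # Fit with the generated formula; keep the result only if nls() succeeds
  fit <- try(nls(f1, data = sim_data, start = list(b = 4, c = 0.15)),
             silent = TRUE)
  if (!inherits(fit, "try-error")) fits[[my_feature]] <- fit
}
```

Both approaches should produce the same formula for each feature, so either can replace the hard-coded GPRS in the nls() call.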

I worked with R a while ago, so treat this as a suggestion to try.
What you want is to create a formula from a list of variable names, right?
If the response variable is the first element of your vector and the others are the explanatory variables, you could create your formula this way:
reformulate(my_feature[-1], response = my_feature[1])
This way you can create formulas that depend on the variables in my_feature.

Related

R how to pass NULL for optional parameters to function (e.g. in for loop)

I wrote a for loop to test different settings for an ordination function in R (package "vegan", called by "phyloseq"). I have several subsets of my data within a list (sample_subset_list) and therefore, testing different parameters for all these subsets results in many combinations.
The ordination function contains the optional argument formula and I would like to perform my ordinations with and without a formula. I assume NULL would be the correct way to not use the formula parameter? But how do I pass NULL when using a for loop (or apply etc)?
Using the phyloseq example data:
library(phyloseq)
data(GlobalPatterns)
ps <- GlobalPatterns
ps1 <- filter_taxa(ps, function (x) {sum(x > 0) > 10}, prune = TRUE)
ps2 <- filter_taxa(ps, function (x) {sum(x > 0) > 20}, prune = TRUE)
sample_subset_list <- list()
sample_subset_list <- c(ps1, ps2)
I tried:
formula <- c("~ SampleType", NULL)
> formula
[1] "~ SampleType"
ordination_list <- list()
for (current_formula in formula) {
tmp <- lapply(sample_subset_list,
ordinate,
method = "CCA",
formula = as.formula(current_formula))
ordination_list[[paste(current_formula)]] <- tmp
}
This way, formula consists only of "~ SampleType". If I put NULL in quotes, it gets wrongly interpreted as a formula:
formula <- c("~ SampleType", "NULL")
Error in parse(text = x, keep.source = FALSE)
What is the right way to solve this?
Regarding Lyzander's answer:
# make sure to use (as suggested)
formula <- list("~ SampleType", NULL)
# and not
formula <- list()
formula <- c("~ SampleType", NULL)

You can use a list instead:
formula <- list("~ my_constraint", NULL)
# for (i in formula) print(i)
#[1] "~ my_constraint"
#NULL
If your function accepts NULL for that argument, you should also skip the as.formula conversion when the element is NULL:
ordination_list <- list()
for (current_formula in formula) {
tmp <- lapply(sample_subset_list,
ordinate,
method = "CCA",
formula = if (is.null(current_formula)) NULL else as.formula(current_formula))
ordination_list[[length(ordination_list) + 1]] <- tmp
}
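The difference between c() and list() can be checked without phyloseq. The sketch below uses a hypothetical stand-in function, describe(), with an optional formula argument; it is not part of vegan or phyloseq.

```r
# c() silently drops NULL, while list() keeps it as an element
n_vector <- length(c("~ cyl", NULL))
n_list   <- length(list("~ cyl", NULL))

# Hypothetical stand-in for a function with an optional formula argument
describe <- function(data, formula = NULL) {
  if (is.null(formula)) "no formula" else paste("formula:", deparse(formula))
}

formulas <- list("~ cyl", NULL)
results <- lapply(formulas, function(f) {
  # Convert only non-NULL entries; pass NULL through unchanged
  describe(mtcars, formula = if (is.null(f)) NULL else as.formula(f))
})
```

Because the NULL entry has no useful name, indexing the result list by position (as in the answer above) is safer than indexing it by paste(current_formula).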

R: dynamic arguments

I'm using an R function which requires a list of variables as input arguments in the following format:
output <- funName(gender ~ height + weight + varName4, data=tableName)
Basically the input arguments are column names in the table (and are not to be enclosed in ""). I have a list of these variables that I want to add one by one; i.e. run the function with one variable first, get the output, and incrementally adding variables (getting an output each time) i.e.
iteration 1:
output <- funName(gender ~ height, data=tableName)
iteration 2:
output <- funName(gender ~ height + weight, data=tableName)
iteration 3:
output <- funName(gender ~ height + weight + varName4, data=tableName)
Is this possible?

Try the following:
# vector of variable names
myNames <- c("gender", "height", "weight", "varName4")
# print out results
for(i in 2:4) {
print(as.formula(paste(myNames[1], "~", paste(myNames[2:i], collapse="+"))))
}
Of course, you can replace print with the appropriate funName, such as lm, along with additional arguments. For example:
for(i in 2:4) {
  lm(as.formula(paste(myNames[1], "~", paste(myNames[2:i], collapse="+"))), data=tableName)
}
This should work as you would expect, although inside a for loop the fits are discarded unless you print or store them. You could also use lapply if you wanted to save the results in an orderly fashion:
temp <- lapply(2:4, function(i) as.formula(paste(myNames[1], "~",
paste(myNames[2:i], collapse="+"))))
will save a list of formulas, for example.
Using the reformulate function as mentioned by @ben-bolker, you can simplify the web of paste functions:
for(i in 2:4) {
print(reformulate(myNames[2:i], response = myNames[1], intercept = TRUE))
}
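As a sketch of how this might look with a real model function, the following fits incrementally larger lm models on the built-in mtcars data and keeps every fit. The column names are chosen only for illustration and stand in for gender, height, weight and varName4.

```r
# Response first, then the predictors in the order they should be added
myNames <- c("mpg", "wt", "hp", "qsec")

fits <- lapply(seq_along(myNames)[-1], function(i) {
  # reformulate() builds mpg ~ wt, then mpg ~ wt + hp, and so on
  f <- reformulate(myNames[2:i], response = myNames[1])
  lm(f, data = mtcars)
})

# Number of coefficients per fit (intercept plus predictors)
n_coef <- sapply(fits, function(m) length(coef(m)))
```

Storing the fits in a list makes it easy to compare them afterwards, for example with sapply(fits, AIC).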

Pass generic column names to xtabs function in R

Is there any way to pass generic column names to functions like xtabs in R?
Typically, I'm trying to do something like:
xtabs(weight ~ col, data=dframe)
with col and weight two columns of my data.frame, weight being a column containing weights. It works, but if I want to wrap xtabs in a function to which I pass the column names as argument, it fails. So, if I do:
xtabs.wrapper <- function(dframe, colname, weightname) {
return(xtabs(weightname ~ colname, data=dframe))
}
it fails. Is there a simple way to do something similar? Perhaps I'm missing something with R logic, but it seems to me quite annoying not to be able to pass generic variables to such functions since I'm not particularly fond of copy/paste.
Any help or comments appreciated!
Edit: as mentioned in the comments, it was suggested that I use eval, and I came up with this solution:
xtabs.wrapper <- function(dframe, wname, cname) {
xt <- eval(parse(text=paste("xtabs(", wname, "~", cname, ", data=",
deparse(substitute(dframe)), ")")))
return(xt)
}
As I said, it seems to me to be an ugly trick, but I'm probably missing something about the language logic.

Not sure if this is any prettier, but here is a way to define a function without using eval ... it involves accessing the correct columns of dframe via []:
xtabs.wrapper <- function(dframe, wname, cname) {
tmp.wt <- dframe[,wname]
tmp.col <- dframe[,cname]
xt <- xtabs(tmp.wt~tmp.col)
return(xt)
}
Or you can shorten the guts of the function to:
xtabs.wrapper2 <- function(dframe, wname, cname) {
xt <- xtabs(dframe[,wname]~dframe[,cname])
return(xt)
}
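To show they give the same counts, here is an example with the mtcars data (the wrappers label the dimension with the expression used in the formula, such as tmp.col, rather than cyl):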
data(mtcars)
xtabs(wt~cyl, mtcars)
xtabs.wrapper(mtcars, "wt", "cyl")
xtabs.wrapper2(mtcars, "wt", "cyl")

I did this once:
creatextab<-function(factorsToUse, data)
{
newform<-as.formula(paste("Freq ~", paste(factorsToUse, collapse="+"), sep=""))
xtabs(formula= newform, drop.unused.levels = TRUE, data=data)
}
Obviously this is a different form because of the Freq, but basically you can generate the formula as a string and then just use xtabs() directly.

If you want an n-way crosstab and cname contains a character vector of variable names, then you'll want the following:
xtabs.wrapper3 <- function(dframe, wname, cname) {
  formula <- as.formula(paste0(wname, " ~ ", paste0(cname, collapse=" + ")))
  xt <- xtabs(formula, data = dframe)
  return(xt)
}
xtabs.wrapper3(mtcars, "wt", c("cyl", "vs"))
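The same idea can be sketched with reformulate(), which avoids the nested paste calls. The wrapper name xtabs_by_name is hypothetical; the result is compared against a hard-coded xtabs call on mtcars.

```r
# Hypothetical wrapper: column names are passed as strings
xtabs_by_name <- function(dframe, wname, cnames) {
  # reformulate() builds wname ~ cnames[1] + cnames[2] + ...
  xtabs(reformulate(cnames, response = wname), data = dframe)
}

res <- xtabs_by_name(mtcars, "wt", c("cyl", "vs"))
ref <- xtabs(wt ~ cyl + vs, data = mtcars)

# Same sums and the same dimension labels as the hard-coded call
same_values <- isTRUE(all.equal(as.vector(res), as.vector(ref)))
same_labels <- identical(names(dimnames(res)), names(dimnames(ref)))
```

Because the formula refers to the real column names, the dimensions of the resulting table are labelled cyl and vs.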

How do I convert this for loop into something cooler like by in R

uniq <- unique(file[,12])
pdf("SKAT.pdf")
for(i in 1:length(uniq)) {
dat <- subset(file, file[,12] == uniq[i])
names <- paste("Sample_filtered_on_", uniq[i], sep="")
qq.chisq(-2*log(as.numeric(dat[,10])), df = 2, main = names, pvals = T,
sub=subtitle)
}
dev.off()
file[,12] is an integer so I convert it to a factor when I'm trying to run it with by instead of a for loop as follows:
pdf("SKAT.pdf")
by(file, as.factor(file[,12]), function(x) { qq.chisq(-2*log(as.numeric(x[,10])), df = 2, main = paste("Sample_filtered_on_", file[1,12], sep=""), pvals = T, sub=subtitle) } )
dev.off()
It works fine to sort the data frame by this (now a factor) column. My problem is that for the plot title, I want to label it with the correct index from that column. This is easy to do in the for loop by uniq[i]. How do I do this in a by function?
Hope this makes sense.

A more vectorized (== cooler?) version would pull the common operations out of the loop and let R do the book-keeping about unique factor levels.
dat <- split(-2 * log(as.numeric(file[,10])), file[,12])
names(dat) <- paste0("Sample_filtered_on_", names(dat))
(paste0 is a convenience function for the common use case where normally one would use paste with the argument sep=""). The for loop is entirely appropriate when you're running it for its side effects (plotting pretty pictures) rather than trying to capture values for further computation; it's definitely un-cool to use T instead of TRUE, while seq_along(dat) means that your code won't produce unexpected results when length(dat) == 0.
pdf("SKAT.pdf")
for(i in seq_along(dat)) {
  vals <- dat[[i]]
  nm <- names(dat)[[i]]
  qq.chisq(vals, main = nm, df = 2, pvals = TRUE, sub=subtitle)
}
dev.off()
If you did want to capture values, the basic observation is that your function takes two arguments that vary. So by, tapply, sapply and the like are not appropriate; each of these assumes that just a single argument is varying. Instead, use mapply or the comparable Map:
Map(qq.chisq, dat, main=names(dat),
MoreArgs=list(df=2, pvals=TRUE, sub=subtitle))
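The split-then-Map pattern can be tried without qq.chisq. In this sketch, summarise_group is a hypothetical stand-in for the plotting function, and mtcars replaces the original file; the two varying arguments are the values and the label, and digits is a fixed extra argument.

```r
# Split one column by the levels of another; R tracks the unique levels
groups <- split(mtcars$mpg, mtcars$cyl)
names(groups) <- paste0("cyl_", names(groups))

# Hypothetical function with two varying arguments and one fixed argument
summarise_group <- function(values, label, digits) {
  paste0(label, ": mean = ", round(mean(values), digits))
}

# Map walks over values and labels in parallel; MoreArgs holds the constants
out <- Map(summarise_group, groups, names(groups),
           MoreArgs = list(digits = 1))
```

Map returns a list named after its first varying argument, so each result stays attached to its group label.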

How can I include a variable name in a function call in R?

I'm trying to change the name of a variable that is included inside a for loop and function call. In the example below, I'd like column_1 to be passed to the plot function, then column_2 etc. I've tried using do.call, but it returns "object 'column_j' not found". But object column_j is there, and the plot function works if I hard-code them in. Help much appreciated.
for (j in 2:12) {
column_to_plot = paste("column_", j, sep = "")
do.call("plot", list(x, as.name(column_to_plot)))
}

I do:
x <- runif(100)
column_2 <-
column_3 <-
column_4 <-
column_5 <-
column_6 <-
column_7 <-
column_8 <-
column_9 <-
column_10 <-
column_11 <-
column_12 <- rnorm(100)
for (j in 2:12) {
column_to_plot = paste("column_", j, sep = "")
do.call("plot", list(x, as.name(column_to_plot)))
}
And I get no errors. Maybe you could post the hard-coded version that (according to your question) works; then it will be simpler to find the reason for the error.
(I know that I can generate vectors using loop and assign, but I want to provide clear example)

You can do it without the paste() command in your for loop. Simply assign the columns via the function colnames() in your loop:
column_to_plot <- colnames(dataframeNAME)[j]
Hope that helps as a first kludge.

Are you trying to retrieve an object in the workspace by a character string? In that case, parse() might help:
for (j in 2:12) {
column_to_plot = paste("column_", j, sep = "")
plot(x, eval(parse(text=column_to_plot)))
}
In this case you could use do.call(), but it would not be required.
Edit: wrapped parse() in eval()
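A common alternative to eval(parse()) for looking up an object by name is get(), or [[ ]] when the columns live in a data frame. The sketch below assumes the vectors are named column_2 and column_3 as in the question; pdf(NULL) is used only so that the plots are not written to a file.

```r
set.seed(42)
x <- runif(20)
column_2 <- rnorm(20)
column_3 <- rnorm(20)

pdf(NULL)  # null graphics device, so nothing is written to disk
for (j in 2:3) {
  column_to_plot <- paste0("column_", j)
  # get() returns the object whose name is stored in the string
  plot(x, get(column_to_plot), ylab = column_to_plot)
}
dev.off()

# If the columns are in a data frame, index by name instead
df <- data.frame(x = x, column_2 = column_2, column_3 = column_3)
y_from_df <- df[["column_3"]]
```

Keeping the columns in a data frame and indexing by name usually avoids the need to look up workspace objects at all.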

Here is one way to do it:
tmp.df <- data.frame(col_1=rnorm(10),col_2=rnorm(10),col_3=rnorm(10))
x <- seq(2,20,by=2)
plot(x, tmp.df$col_1)
for(j in 2:3){
  name.list <- list("x",paste("col_",j,sep=""))
  with(tmp.df, do.call("lines",lapply(name.list,as.name)))
}
You can also do colnames(tmp.df)[j] instead of paste(..) if you'd like.
