I'm using an R function which requires a list of variables as input arguments in the following format:
output <- funName(gender ~ height + weight + varName4, data=tableName)
Basically the input arguments are column names in the table (and are not to be enclosed in ""). I have a list of these variables that I want to add one by one: run the function with one variable first, get the output, then incrementally add variables (getting an output each time), i.e.
iteration 1:
output <- funName(gender ~ height, data=tableName)
iteration 2:
output <- funName(gender ~ height + weight, data=tableName)
iteration 3:
output <- funName(gender ~ height + weight + varName4, data=tableName)
Is this possible?
Try the following:
# vector of variable names
myNames <- c("gender", "height", "weight", "varName4")
# print out results
for(i in 2:4) {
print(as.formula(paste(myNames[1], "~", paste(myNames[2:i], collapse="+"))))
}
Of course, you can replace print with the appropriate funName, such as lm, along with additional arguments. Keep in mind that a for loop neither prints nor keeps the value of each call, so store the result explicitly:
fits <- list()
for(i in 2:4) {
  fits[[i - 1]] <- lm(as.formula(paste(myNames[1], "~", paste(myNames[2:i], collapse="+"))), data=tableName)
}
This should work as you would expect. You could also use lapply if you wanted to save the results in an orderly fashion:
temp <- lapply(2:4, function(i) as.formula(paste(myNames[1], "~",
paste(myNames[2:i], collapse="+"))))
will save a list of formulas, for example.
Using the reformulate function, as mentioned by @ben-bolker, you can simplify the web of paste functions:
for(i in 2:4) {
print(reformulate(myNames[2:i], response = myNames[1], intercept = TRUE))
}
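As a minimal sketch of the same idea with fitted models kept for later use: the example below assumes the built-in mtcars data, with mpg standing in for the response and wt, hp, disp standing in for the predictors (these names are illustrative, not from the question).

```r
# Assumed stand-in data: mtcars; the first name is the response
myNames <- c("mpg", "wt", "hp", "disp")

# One fitted model per step: mpg ~ wt, mpg ~ wt + hp, mpg ~ wt + hp + disp
fits <- lapply(seq_along(myNames)[-1], function(i) {
  f <- reformulate(myNames[2:i], response = myNames[1])
  lm(f, data = mtcars)
})

# Label each model with the formula it was fitted from
names(fits) <- vapply(fits, function(m) deparse(formula(m)), character(1))

# Number of coefficients grows by one per added variable
n_coef <- sapply(fits, function(m) length(coef(m)))
print(n_coef)
```

Keeping the models in a named list makes it easy to compare them afterwards, for example with summary() on each element.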
Related
I have a function that takes a dataset, extracts different variables, and then makes linear models from those variables (it expects the response in the last column). I want the data argument of the calls for these models to use objects in the global environment so that I can manipulate them with other functions outside this function. The following gives the expected behavior when provided with a single dataset.
make_mods <- function(dataset) {
make_mod <- function(x){
response <- names(dataset)[length(dataset)]
form <- paste0(response, " ~ ", x)
form <- as.formula(form)
bquote( lm(.(form), data = .(d_sub)) ) # Unevaluated to show output
}
d_sub <- substitute(dataset)
vars <- names(dataset)[-length(dataset)]
mods <- lapply(vars, make_mod)
return(mods)
}
# Make some different datasets
ex1 <- ex2 <- ex3 <- mtcars[c(3,4,6,1)]
new_data <- function(x) {
x + rnorm(length(x), mean = 0, sd = sd(x))
}
ex2[-length(ex2)] <- lapply(ex2[-length(ex2)], new_data)
ex3[-length(ex3)] <- lapply(ex3[-length(ex3)], new_data)
make_mods(ex1)
I also want to be able to use this function within lapply:
# List of datasets for testing function with lapply
ex_l <- mget(c("ex1", "ex2", "ex3"))
lapply(ex_l, make_mods)
But here the model calls end up looking like this: lm(mpg ~ disp, data = X[[i]]) and, of course, this model call doesn't evaluate in the default environment (the actual function evaluates the model call in the function). The desired output is a list of lists of models that look like: lm(mpg ~ disp, data = ex_l[["ex1"]]), i.e., they have valid calls with data arguments that reference data frames in the global environment.
I've experimented with passing names to lapply and with different wrapper functions for calling make_mods from lapply, but it seems that my function, because it uses substitute, only gives the expected behavior when called from the global environment. I'm new to working with scoping and environments. How can I get my function to give the desired lm call both when passed a data frame from the global environment and when passed data frames from within lapply?
The only thing that I could think of was to add an if statement to my make_mods function that tests whether the input is a call or not. If it's a call, the function expects it to be a call for a dataset in the global environment.
make_mods <- function(dataset) {
make_mod <- function(x){
response <- names(dataset)[length(dataset)]
form <- paste0(response, " ~ ", x)
form <- as.formula(form)
bquote( lm(.(form), data = .(d_sub)) )
}
if(is.call(dataset)) {
d_sub <- dataset
dataset <- eval(dataset)
} else {
d_sub <- substitute(dataset)
}
vars <- names(dataset)[-length(dataset)]
mods <- lapply(vars, make_mod)
return(mods)
}
Then I can use lapply like this:
out <- lapply(names(ex_l), function(x){
g <- bquote(ex_l[[.(x)]])
make_mods(g)
})
names(out) <- names(ex_l)
which gives me this:
$ex1
$ex1[[1]]
lm(mpg ~ disp, data = ex_l[["ex1"]])
$ex1[[2]]
lm(mpg ~ hp, data = ex_l[["ex1"]])
$ex1[[3]]
lm(mpg ~ wt, data = ex_l[["ex1"]])
<<output truncated>>
Maybe not an elegant solution, but it's working.
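To check that calls built this way really are valid, here is a small self-contained sketch. It assumes a list ex_l holding one data frame (rebuilt here from mtcars) and constructs a single call with bquote in the same style as above; the object names are reused from the question purely for illustration.

```r
# Assumed stand-in for the list of datasets from the question
ex_l <- list(ex1 = mtcars[c(3, 4, 6, 1)])

# Build an unevaluated lm call whose data argument refers to ex_l[["ex1"]]
nm <- "ex1"
cl <- bquote(lm(.(as.formula("mpg ~ disp")), data = ex_l[[.(nm)]]))
print(cl)

# The call evaluates wherever ex_l is visible, and the fitted model
# keeps a data argument that can be re-evaluated later (e.g. by update())
fit <- eval(cl)
print(fit$call$data)
```

Because the stored call refers to ex_l by name, functions such as update() that re-evaluate the call can still find the data.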
Is there any way to pass generic column names to functions like xtabs in R?
Typically, I'm trying to do something like:
xtabs(weight ~ col, data=dframe)
with col and weight two columns of my data.frame, weight being a column containing weights. It works, but if I want to wrap xtabs in a function to which I pass the column names as argument, it fails. So, if I do:
xtabs.wrapper <- function(dframe, colname, weightname) {
return(xtabs(weightname ~ colname, data=dframe))
}
it fails. Is there a simple way to do something similar? Perhaps I'm missing something with R logic, but it seems to me quite annoying not to be able to pass generic variables to such functions since I'm not particularly fond of copy/paste.
Any help or comments appreciated!
Edit: as suggested in the comments, I used eval and came up with this solution:
xtabs.wrapper <- function(dframe, wname, cname) {
xt <- eval(parse(text=paste("xtabs(", wname, "~", cname, ", data=",
deparse(substitute(dframe)), ")")))
return(xt)
}
As I said, it seems to me to be an ugly trick, but I'm probably missing something about the language logic.
Not sure if this is any prettier, but here is a way to define a function without using eval ... it involves accessing the correct columns of dframe via []:
xtabs.wrapper <- function(dframe, wname, cname) {
tmp.wt <- dframe[,wname]
tmp.col <- dframe[,cname]
xt <- xtabs(tmp.wt~tmp.col)
return(xt)
}
Or you can shorten the guts of the function to:
xtabs.wrapper2 <- function(dframe, wname, cname) {
xt <- xtabs(dframe[,wname]~dframe[,cname])
return(xt)
}
To show they are equivalent here with an example from the mtcars data:
data(mtcars)
xtabs(wt~cyl, mtcars)
xtabs.wrapper(mtcars, "wt", "cyl")
xtabs.wrapper2(mtcars, "wt", "cyl")
I did this once:
creatextab<-function(factorsToUse, data)
{
newform<-as.formula(paste("Freq ~", paste(factorsToUse, collapse="+"), sep=""))
xtabs(formula= newform, drop.unused.levels = TRUE, data=data)
}
Obviously this is a different form because of the Freq, but basically you can generate the formula as a string, convert it, and then you are just using xtabs() directly.
If you want an n-way crosstab and cname contains a character vector of variable names, then you'll want the following:
xtabs.wrapper3 <- function(dframe, wname, cname) {
  # join the column names with " + " and convert the string to a formula
  formula <- as.formula(paste0(wname, " ~ ", paste0(cname, collapse=" + ")))
  xt <- xtabs(formula, data = dframe)
  return(xt)
}
xtabs.wrapper3(mtcars, "wt", c("cyl", "vs"))
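The same wrapper can also be sketched with reformulate, which avoids pasting strings by hand. The function name xtabs_by_name below is hypothetical, and mtcars is used only as example data.

```r
# Hypothetical helper: weight column and grouping columns are passed as strings
xtabs_by_name <- function(dframe, wname, cnames) {
  # reformulate builds e.g. wt ~ cyl + vs from the character inputs
  f <- reformulate(cnames, response = wname)
  xtabs(f, data = dframe)
}

res <- xtabs_by_name(mtcars, "wt", c("cyl", "vs"))
print(res)

# Same cell totals as writing the formula out by hand
direct <- xtabs(wt ~ cyl + vs, data = mtcars)
print(all.equal(as.vector(res), as.vector(direct)))
```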
I am trying to get the column names of a dataframe to use them in another call, but this apply call returns the values separated, instead of concatenated correctly. What did I do wrong here?
df<-data.frame(c(1,2,3),c(4,5,6))
colnames(df)<-c("hi","bye")
apply(df,2,function(x){
paste("subscale_scores$",colnames(x),sep="")
#this is the command I am eventually trying to run
#lm(paste("subscale_scores",colnames(x))~surveys$npitotal+ipip$extraversion+ipip$agreeableness+ipip$conscientiousness+ipip$emotionalStability+ipip$intelImagination)
})
Goal output:
subscale_scores$hi
subscale_scores$bye
Is there any need for the apply?
Is this what you mean?
paste0('subscale_scores$', names(df))
# [1] "subscale_scores$hi" "subscale_scores$bye"
If you need them concatenated into a single string, say separated by newlines, add collapse='\n'.
paste0 is shorthand for paste(..., sep="").
A note on your lm call later - if you want to do lm(Y ~ ...) where Y is each of your columns separately, try:
lms <- lapply(colnames(df),
function (y) {
# construct your formula
frm <- paste0('subscale_scores$', y, ' ~ surveys$npitotal+ipip$extraversion+ipip$agreeableness+ipip$conscientiousness+ipip$emotionalStability+ipip$intelImagination')
lm(frm)
})
names(lms) <- colnames(df)
Then lms$hi will contain the output of lm(subscale_scores$hi ~ ...) and so on.
Or, if the aim was to combine all the columns together (Y1 + Y2 ~ ...),
then paste0('subscale_scores$', names(df), collapse='+') will give you subscale_scores$hi+subscale_scores$bye.
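If the responses and predictors live in one data frame, a variant that passes the data argument instead of pasting `$` references may be easier to read. This is only a sketch: mtcars and the column names below are stand-ins for the survey data in the question.

```r
# Assumed stand-ins: two response columns and two predictors from mtcars
responses  <- c("mpg", "qsec")
predictors <- c("wt", "hp")

# One model per response column, all sharing the same predictors
lms <- lapply(responses, function(y) {
  lm(reformulate(predictors, response = y), data = mtcars)
})
names(lms) <- responses

# Each element is an ordinary lm fit, accessible by response name
print(coef(lms$mpg))
```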
How about this?
unlist(lapply(colnames(df),function(x){
paste("subscale_scores$",x,sep="")
}))
uniq <- unique(file[,12])
pdf("SKAT.pdf")
for(i in 1:length(uniq)) {
dat <- subset(file, file[,12] == uniq[i])
names <- paste("Sample_filtered_on_", uniq[i], sep="")
qq.chisq(-2*log(as.numeric(dat[,10])), df = 2, main = names, pvals = T,
sub=subtitle)
}
dev.off()
file[,12] is an integer so I convert it to a factor when I'm trying to run it with by instead of a for loop as follows:
pdf("SKAT.pdf")
by(file, as.factor(file[,12]), function(x) { qq.chisq(-2*log(as.numeric(x[,10])), df = 2, main = paste("Sample_filtered_on_", file[1,12], sep=""), pvals = T, sub=subtitle) } )
dev.off()
It works fine to sort the data frame by this (now a factor) column. My problem is that for the plot title, I want to label it with the correct index from that column. This is easy to do in the for loop by uniq[i]. How do I do this in a by function?
Hope this makes sense.
A more vectorized (== cooler?) version would pull the common operations out of the loop and let R do the book-keeping about unique factor levels.
dat <- split(-2 * log(as.numeric(file[,10])), file[,12])
names(dat) <- paste0("Sample_filtered_on_", names(dat))
(paste0 is a convenience function for the common use case where normally one would use paste with the argument sep=""). The for loop is entirely appropriate when you're running it for its side effects (plotting pretty pictures) rather than trying to capture values for further computation; it's definitely un-cool to use T instead of TRUE, while seq_along(dat) means that your code won't produce unexpected results when length(dat) == 0.
pdf("SKAT.pdf")
for(i in seq_along(dat)) {
vals <- dat[[i]]
nm <- names(dat)[[i]]
qq.chisq(vals, main = nm, df = 2, pvals = TRUE, sub=subtitle)
}
dev.off()
If you did want to capture values, the basic observation is that your function takes 2 arguments that vary. So by or tapply or sapply or ... are not appropriate; each of these assumes that just a single argument is varying. Instead, use mapply or the comparable Map:
Map(qq.chisq, dat, main=names(dat),
MoreArgs=list(df=2, pvals=TRUE, sub=subtitle))
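Since qq.chisq comes from an add-on package, here is a self-contained sketch of the same split + Map pattern using only base R. The describe function is a hypothetical stand-in for qq.chisq, and mtcars replaces the original data.

```r
# Split one column by the levels of another; names become the group labels
vals <- split(mtcars$mpg, mtcars$cyl)
names(vals) <- paste0("cyl_", names(vals))

# Hypothetical stand-in for qq.chisq: takes the values and a matching title
describe <- function(x, main, digits) {
  sprintf("%s: mean = %s", main, round(mean(x), digits))
}

# Map walks over vals and names(vals) in parallel; MoreArgs holds the constants
res <- Map(describe, vals, main = names(vals), MoreArgs = list(digits = 1))
print(res)
```

Map returns a list named after its first varying argument, so each result stays attached to its group label.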
I have this feature_list that contains several possible values, say "A", "B", "C" etc. And there is time in time_list.
So I will have a loop where I will want to go through each of these different values and put it in a formula.
something like for(i in ...) and then my_feature <- feature_list[i] and my_time <- time_list[i].
Then I put the time and the chosen feature to a dataframe to be used for regression
feature_list<- c("GPRS")
time_list<-c("time")
calc<-0
feature_dim <- length(feature_list)
time_dim <- length(time_list)
data <- read.csv("data.csv", header = TRUE, sep = ";")
result <- matrix(nrow=0, ncol=5)
errors<-matrix(nrow=0, ncol=3)
for(i in 1:feature_dim) {
my_feature <- feature_list[i]
my_time <- time_list[i]
fitdata <- data.frame(data[my_feature], data[my_time])
for(j in 1:60) {
my_b <- 0.0001 * (2^j)
for(k in 1:60) {
my_c <- 0.0001 * (2^k)
cat("Feature: ", my_feature, "\t")
cat("b: ", my_b, "\t")
cat("c: ", my_c, "\n")
err <- try(nlsfit <- nls(GPRS ~ 53E5*exp(-1*b*exp(-1*c*time)), data=fitdata, start=list(b=my_b, c=my_c)), silent=TRUE)
calc<-calc+1
if(class(err) == "try-error") {
next
}
else {
coefs<-coef(nlsfit)
ess<-deviance(nlsfit)
result<-rbind(result, c(coefs[1], coefs[2], ess, my_b, my_c))
}
}
}
}
Now in the nls() call I want to be able to call my_feature instead of just "A" or "B" or something and then to the next one on the list. But I get an error there. What am I doing wrong?
You can use paste to create a string version of your formula including the variable name you want, then use either as.formula or formula functions to convert this to a formula to pass to nls.
as.formula(paste(my_feature, "~ 53E5*exp(-1*b*exp(-1*c*time))"))
Another option is to use the bquote function to insert the variable names into a function call, then eval the function call.
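A rough sketch of the bquote route follows. It assumes simulated data and a simpler exponential-decay model than the one in the question, so the parameter names a and b and the starting values are illustrative only.

```r
# Simulated stand-in data: a noisy exponential decay in a column named GPRS
set.seed(1)
fitdata <- data.frame(time = 1:20)
fitdata$GPRS <- 5 * exp(-0.3 * fitdata$time) + rnorm(20, sd = 0.01)

my_feature <- "GPRS"

# as.name turns the string into a symbol; bquote splices it into the call
cl <- bquote(nls(.(as.name(my_feature)) ~ a * exp(-b * time),
                 data = fitdata, start = list(a = 4, b = 0.2)))
print(cl)

# Evaluating the call runs the fit with the chosen column as the response
fit <- eval(cl)
print(coef(fit))
```

Inside a loop over feature_list, only my_feature changes; the rest of the call stays the same.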
I worked with R a while ago; maybe you can give this a try.
What you want is to create a formula from a list of variables, right?
If the response variable is the first element of your list and the others are the explanatory variables, you could create your formula this way (remember that R indexing starts at 1):
reformulate(my_feature[-1], response = my_feature[1])
This way you can create formulae that depend on the variables in my_feature.