I'm trying to separate a dataset into parts that have factor variables and non-factor variables.
I'm looking to do something like:
This part works:
factorCols <- sapply(df1, is.factor)
factorDf <- df1[,factorCols]
This part won't work:
nonFactorCols <- sapply(df1, !is.factor)
due to this error:
Error in !is.factor : invalid argument type
Is there a correct way to do this?
Correct way:
nonFactorCols <- sapply(df1, function(col) !is.factor(col))
# or, more efficiently
nonFactorCols <- !sapply(df1, is.factor)
# or, even more efficiently
nonFactorCols <- !factorCols
Joshua gave you the correct way to do it. As for why sapply(df1, !is.factor) did not work:
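sapply is expecting a function as its second argument, and !is.factor is not a function. The bang operator negates a logical value, so in !is.factor R tries to negate the function is.factor itself, which fails with "invalid argument type" before sapply ever runs.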
Alternatively, you could use Negate(is.factor) which does in fact return a function.
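As a minimal sketch of the Negate() approach (df1 here is made-up example data, and drop = FALSE is my addition so that a single selected column stays a data frame):

```r
# Hypothetical example data: one factor column, two non-factor columns.
df1 <- data.frame(
  g = factor(c("a", "b", "a")),
  n = c(1.5, 2.5, 3.5),
  s = c("x", "y", "z"),
  stringsAsFactors = FALSE
)

# Negate() wraps is.factor in a new function that returns the opposite result.
nonFactorCols <- sapply(df1, Negate(is.factor))

# drop = FALSE keeps a data frame even when only one column is selected.
factorDf    <- df1[, !nonFactorCols, drop = FALSE]
nonFactorDf <- df1[, nonFactorCols, drop = FALSE]

names(factorDf)
names(nonFactorDf)
```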
(I hope that this question hasn't been asked before).
For convenience I am using abbreviations for functions like "cn" instead of "colnames". However, for colnames/rownames the abbreviated functions only work for reading purposes. I am not able to set colnames with that new "cn" function. Can anyone explain the black magic behind the colnames function? This is the example:
cn <- match.fun(colnames)
x <- matrix(1:2)
colnames(x) <- "a" # OK, works.
cn(x) <- "b" # Error in cn(x) <- "b" : could not find function "cn<-"
Thank you, echasnovski, for the link to that great website.
It has helped me a lot to better understand R!
http://adv-r.had.co.nz/Functions.html#replacement-functions
In R, special "replacement functions" like foo<- can be defined. E.g. we can define a function
`setSecondElement<-` <- function(x, value){
x[2] <- value
return(x)
}
# Let's try it:
x <- 1:3
setSecondElement(x) <- 100
print(x)
# [1] 1 100 3
The colnames<- function works essentially the same. However, "behind the scenes" it will check if x is a data.frame or matrix and set either names(x) or dimnames(x)[[2]]. Just execute the following line in R and you'll see the underlying routine.
print( `colnames<-` )
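To make the mechanism concrete, here is a small sketch of what the assignment form roughly expands to. The explicit call below is my illustration of the idea, not something from the linked page verbatim:

```r
x <- matrix(1:2)

# `colnames(x) <- "a"` is evaluated roughly as this explicit call:
# the replacement function returns a modified copy, which is assigned back to x.
x <- `colnames<-`(x, value = "a")

colnames(x)
```

This is also why cn(x) <- "b" fails: R looks for a function literally named cn<- and finds none.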
For my specific problem the solution turns out to be very simple. Remember that I'd like to have a shorter version of colnames which shall be called cn. I can either do it like this:
cn <- match.fun(colnames)
`cn<-` <- function(x, value){
colnames(x) <- value
return(x)
}
More easily, as Stéphane Laurent points out, the definition of `cn<-` can be simplified to:
`cn<-` <- `colnames<-`
There is a minor difference between these approaches. The first defines a new function that calls colnames<-. The second binds the name cn<- to the very same function object as colnames<-, so calling cn<- runs exactly the same code. The second approach is slightly more efficient, since one additional function call is avoided.
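A quick way to check the aliasing variant, as a sketch (the matrix is just example data, and I use plain assignment for cn instead of match.fun):

```r
# Alias both the getter and the replacement function.
cn <- colnames
`cn<-` <- `colnames<-`

x <- matrix(1:4, nrow = 2)
cn(x) <- c("a", "b")   # now finds `cn<-`
cn(x)

# Both names refer to the same function object.
identical(`cn<-`, `colnames<-`)
```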
Could somebody please point out to me why the following example does not work:
df <- data.frame(ex =rep(1,5))
sample.fn <- function(var1) with(df, mean(var1))
sample.fn(ex)
It seems that I am using the wrong syntax to combine with inside of a function.
Thanks,
This is what I meant by learning to use "[" (actually "[["):
> df <- data.frame(ex =rep(1,5))
> sample.fn <- function(var1) mean(df[[var1]])
> sample.fn('ex')
[1] 1
You cannot use an unquoted ex since there is no object named 'ex', at least not at the global environment where you are making the call to sample.fn. 'ex' is a name only inside the environment of the df-dataframe and only df itself is "visible" when the sample.fn-function is called.
Out of interest, I tried using the method that the with.default function uses to build a function taking an unquoted expression argument in the manner you were expecting:
samp.fn <- function(expr) mean(
eval(substitute(expr), df, enclos = parent.frame())
)
samp.fn(ex)
#[1] 1
It's not a very useful function, since it would only be applicable when there was a dataframe named 'df' in the parent.frame(). And apologies for incorrectly claiming that there was a warning on the help page. As @rawr points out, the warning about using functions that depend on non-standard evaluation appears on the ?subset page.
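One way to lift that limitation would be to pass the data frame in as an argument. This is only a sketch along the same lines; mean_in, data and expr are names I made up for illustration:

```r
# Hypothetical helper: `mean_in`, `data` and `expr` are illustrative names.
mean_in <- function(data, expr) {
  # substitute() captures the unevaluated expression; eval() then looks up
  # its symbols in `data` first and in the caller's environment second.
  mean(eval(substitute(expr), data, enclos = parent.frame()))
}

df <- data.frame(ex = rep(1, 5))
mean_in(df, ex)
mean_in(df, ex * 2)
```

The usual caveat about non-standard evaluation still applies: this is convenient interactively, but harder to call from other functions than the df[[var1]] version.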
I am trying to understand how to pass a data frame to an R function. I found an answer to this question on StackOverflow that provides the following demonstration / solution:
Pass a data.frame column name to a function
df <- data.frame(A=1:10, B=2:11, C=3:12)
fun1 <- function(x, column){
max(x[,column])
}
fun1(df, "B")
fun1(df, c("B","A"))
This makes sense to me, but I don't quite understand the rules for referring to data frame columns within a function. Take the following example:
data(iris)
x.test <- function(df, x){
out <- with(df, mean(x))
return(out)
}
x.test(iris, "Sepal.Length")
The output of this is NA, with a warning message. But, if I do the same procedure without the function it seems to work just fine.
with(iris, mean(Sepal.Length))
I'm obviously missing something here -- any help would be greatly appreciated.
Thanks!
You have been given the correct advice already (which was to use "[" or "[[" rather than with inside functions) but it might also be helpful to ponder why the problem occurred. Inside the with you asked the mean function to return the mean of a character vector, so NA was the result. When you used with at the interactive level, you had no quotes around the character name of the column and if you had you would have gotten the same result:
> with(iris, mean('Sepal.Length'))
[1] NA
Warning message:
In mean.default("Sepal.Length") :
argument is not numeric or logical: returning NA
If you had used get() to look up the object named by the character string, you would actually have succeeded, although with is still generally not recommended for programming use:
x.test <- function(df, x){
out <- with(df, mean( get(x)) ) # get() looks up the object named by the string x, here a column of df
return(out)
}
x.test(iris, "Sepal.Length")
#[1] 5.843333
See the Details section of the ?with page for warnings about its use in functions.
This will work
data(iris)
x.test <- function(df, x){
out <- mean(df[, x])
return(out)
}
x.test(iris, "Sepal.Length")
Your code is trying to take mean("Sepal.Length") which is clearly not what you want.
In R, when I try to assign a function via ifelse, I get the following error:
> my.func <- ifelse(cond, sqrt, identity)
Error in rep(yes, length.out = length(ans)) :
attempt to replicate an object of type 'builtin'
If cond is FALSE, the error looks equivalent, R complains about an
attempt to replicate an object of type 'closure'
What can I do to assign one of two functions to a variable and what is going on here?
Because ifelse is vectorized and has no special case for a length-one condition, its yes and no arguments are replicated with rep(...). rep(...) cannot replicate a function, whether a builtin such as sqrt or a closure such as identity, hence the error.
A workaround would be to temporarily wrap the functions in a list:
my.func <- ifelse(cond, c(sqrt), c(identity))[[1]]
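To see why the wrapping helps, here is a small sketch that calls rep() directly, which is the step inside ifelse that fails. The tryCatch is only there so the snippet runs to the end:

```r
# rep() on a bare function fails; this is the error ifelse() surfaces.
res <- tryCatch(
  rep(sqrt, length.out = 1),
  error = function(e) conditionMessage(e)
)
res

# c(sqrt) is a one-element list, and a list can be replicated without trouble.
wrapped <- rep(c(sqrt), length.out = 1)
is.list(wrapped)
is.function(wrapped[[1]])
```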
@Joshua Ulrich's comment is a proper answer. The correct way to accomplish conditional function assignment is a classic if...else rather than the vectorized ifelse function:
my.func <- if (cond) sqrt else identity
Instead of writing one vector subscript operation per line, such as:
x.and.y <- intersect(x, y)
idx.x <- match(x, x.and.y)
idx.x <- idx.x[!is.na(idx.x)]
I could chain them in one line:
x.and.y <- intersect(x, y)
idx.x <- subset(tmp <- match(x, x.and.y), !is.na(tmp))
In order to do that, I must give the intermediate vector a name to be used in subscript operations. To make the code even more concise, is there a way to refer to a vector anonymously? Like this:
x.and.y <- intersect(x, y)
idx.x <- match(x, x.and.y)[!is.na] ## illegal R
Considering intersect calls match, what you're doing is redundant. intersect is defined as:
function (x, y)
{
y <- as.vector(y)
unique(y[match(as.vector(x), y, 0L)])
}
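And you can pick out the elements of x that also occur in y by using %in%: x[x %in% y]. Note that this returns the matching values (duplicates included) rather than their positions in x.and.y.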
I realize this may not be representative of your actual problem, but "referring to a vector anonymously" doesn't really fit the R paradigm. Function arguments are pass-by-value. You're essentially saying, "I want a function to manipulate an object, but I don't want to provide the object to the function."
You could use R's scoping rules to do this (which is what mplourde did using Filter with an anonymous function), but you're going to create quite a bit of convoluted code that way.
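For reference, a sketch of what the Filter variant might look like (x and y are made-up example vectors, and the exact code in the other answer may differ):

```r
x <- c("a", "b", "c", "d")
y <- c("b", "d", "e")

x.and.y <- intersect(x, y)

# Filter() hands the result of match() straight to the predicate,
# so the intermediate vector never needs a name in the calling code.
idx.x <- Filter(Negate(is.na), match(x, x.and.y))
idx.x

# The same idea with an immediately invoked anonymous function.
idx.x2 <- (function(v) v[!is.na(v)])(match(x, x.and.y))
```

Both forms still name the vector, just inside a function body rather than at the call site, which is why this arguably costs more readability than it saves.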