(I hope that this question hasn't been asked before).
For convenience I am using abbreviations for functions like "cn" instead of "colnames". However, for colnames/rownames the abbreviated functions only work for reading purposes. I am not able to set colnames with that new "cn" function. Can anyone explain the black magic behind the colnames function? This is the example:
cn <- match.fun(colnames)
x <- matrix(1:2)
colnames(x) <- "a" # OK, works.
cn(x) <- "b" # Error in cn(x) <- "b" : could not find function "cn<-"
Thank you, echasnovski, for the link to that great website.
It has helped me a lot to better understand R!
http://adv-r.had.co.nz/Functions.html#replacement-functions
In R, special "replacement functions" like foo<- can be defined. E.g. we can define a function
`setSecondElement<-` <- function(x, value){
x[2] <- value
return(x)
}
# Let's try it:
x <- 1:3
setSecondElement(x) <- 100
print(x)
# [1] 1 100 3
The colnames<- function works essentially the same. However, "behind the scenes" it will check if x is a data.frame or matrix and set either names(x) or dimnames(x)[[2]]. Just execute the following line in R and you'll see the underlying routine.
print( `colnames<-` )
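To see the two cases side by side, here is a small sketch (the objects m and d are made up for illustration). It only relies on the base behaviour described above: for a matrix the names end up in dimnames(m)[[2]], for a data.frame in names(d).

```r
# Matrix: colnames<- stores the names in the second dimnames component
m <- matrix(1:4, nrow = 2)
colnames(m) <- c("a", "b")
dimnames(m)[[2]]

# Data frame: colnames<- sets names(d)
d <- data.frame(u = 1:2, v = 3:4)
colnames(d) <- c("a", "b")
names(d)
```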
For my specific problem the solution turns out to be very simple. Remember that I'd like to have a shorter version of colnames which shall be called cn. I can either do it like this:
cn <- match.fun(colnames);
`cn<-` <- function(x, value){
colnames(x) <- value
return(x)
}
More easily, as Stéphane Laurent points out, the definition of `cn<-` can be simplified to:
`cn<-` <- `colnames<-`
There is a minor difference between these approaches. The first approach defines a new function that calls the colnames<- function. The second approach makes cn<- refer to the very same function object as colnames<-, so calling cn<- runs exactly the same code. The second approach is slightly more efficient, since one additional function call is avoided.
I am trying to create a custom function that, applied within a loop, gives me a table with all the information I need for every variable in my table. My function is based on dplyr and base functions.
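As a quick check of the second approach, the following sketch defines both the getter and the replacement alias and then uses them. The last assignment spells out what R does internally with cn(x) <- value, namely x <- `cn<-`(x, value = value).

```r
# Getter and replacement alias for colnames
cn <- colnames
`cn<-` <- `colnames<-`

x <- matrix(1:2)
cn(x) <- "b"                   # now works: R looks up a function named "cn<-"
cn(x)

# The same assignment written as an explicit call
x <- `cn<-`(x, value = "c")
cn(x)
```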
myfun <- function(x, y) summarise(x, var=names(x[y]), n=sum(!is.na(y)), blank=n()-sum(!is.na(y)), distinct=n_distinct(y, na.rm=TRUE))
My problem is that the base function (names()) requires the y argument (the variable name) to be given in quotation marks, but the dplyr function n_distinct needs it without quotation marks to give the right answer with na.rm=TRUE (if I use n_distinct(x[y], na.rm=TRUE), the NA values are not excluded from the result). So I don't know how to pass the y argument in a form that works for both functions. I've tried using \" for the names() function, but it didn't seem to work. Here are the errors I get:
myfun <- function(x, y) summarise(x, var=names(x[y]), n=sum(!is.na(y)), blank=n()-sum(!is.na(y)), distinct=n_distinct(y, na.rm=TRUE))
myfun(mtcars, "cyl")
Error: Error in summarise_impl(.data, dots) : variable 'y' not found
myfun <- function(x, y) summarise(x, var=names(x[y]), n=sum(!is.na(y)), blank=n()-sum(!is.na(y)), distinct=n_distinct(y, na.rm=TRUE))
myfun(mtcars, cyl)
Error: Error in summarise_impl(.data, dots) : Evaluation error: object 'cyl' not found.
myfun <- function(x, y) summarise(x, var=names(x[y]), n=sum(!is.na(x[y])), blank=n()-sum(!is.na(x[y])), distinct=n_distinct(x[y], na.rm=TRUE))
myfun(mtcars, "cyl")
No error, but na.rm=TRUE doesn't seem to be seen.
My goal would then be to apply it in a loop to build a table with one row for each variable of my data frame, which I could then export to have this information for all the variables in a single table.
I tried to make a minimal reproducible example:
library(dplyr)
myfun <- function(x, y) summarise(x, var=names(x[, y]), n=sum(!is.na(x[, y])), blank=n()-sum(!is.na(x[, y])), n_distinct=n_distinct(x[, y], na.rm=TRUE))
a <- mtcars%>%
summarise(n=sum(!is.na(cyl)), blank=n()-sum(!is.na(cyl)), n_distinct=n_distinct(cyl, na.rm=TRUE))
a <- lapply(colnames(mtcars), function(x) data.frame(bind_rows(a, myfun(mtcars, x))))
a <- data.frame(bind_rows(a, myfun(mtcars, "cyl")))
a <- a%>%
filter(!is.na(var))%>%
distinct(var, .keep_all=TRUE)
But for some reason that I cannot understand, it doesn't work: the line a <- lapply(colnames(mtcars), function(x) data.frame(bind_rows(a, myfun(mtcars, x)))) fails with the error message Error in summarise_impl(.data, dots) : Column `var` is of unsupported type NULL. It works fine with my own data frame; I subsetted it and it still worked, but when I recreated the same data by hand, typing in the same values with the same classes, it didn't work. So I'm really lost and don't understand why it works for my dataset but no other. I'm new to R and am learning by trial and error, without any formal teaching in the language, so I sometimes have no idea what I'm really doing: it works (like the code above does for me), and then it no longer does.
So this code works pretty well for me; the only problem, as I said, is that because I use n_distinct(x[, y]) it ignores na.rm=TRUE, which I cannot understand.
Sorry for the rather unclear question. I would be glad to edit it if you leave a comment about how to clarify it. I'm simply totally lost with my attempts and have no idea how to present things more clearly. Thanks for the help and sorry for the mess.
I'm not entirely clear on exactly what you are trying to do, but this might get at it.
First create a function that will be run for each column.
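fn <- function(x){
  n <- sum(!is.na(x))                   # number of non-missing values
  blank <- length(x) - n                # number of missing values
  dist <- length(unique(x[!is.na(x)]))  # distinct values, NA excluded
  # the column name is not needed here: apply() labels the result by column
  c(n = n, blank = blank, distinct = dist)
}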
Then use apply to apply the function to each column of the data.frame. I've transposed it to provide rows.
t(apply(mtcars, 2, fn))
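If the column has to be selected by its name as a string, as in the question, a base R sketch along the following lines might do. The helper name col_summary is made up for illustration. It extracts the column with [[ so that it gets a plain vector; x[y] and x[, y, drop = FALSE] return a one-column data frame instead, which is a plausible reason for the na.rm surprise described in the question, though I have not verified that.

```r
# Hypothetical helper: summarise one column, selected by name (a string)
col_summary <- function(df, col) {
  v <- df[[col]]                                 # [[ ]] returns the column as a vector
  data.frame(
    var      = col,
    n        = sum(!is.na(v)),                   # non-missing values
    blank    = sum(is.na(v)),                    # missing values
    distinct = length(unique(v[!is.na(v)])),     # distinct values, NA excluded
    stringsAsFactors = FALSE
  )
}

# One row per variable, stacked into a single table
res <- do.call(rbind, lapply(names(mtcars), col_summary, df = mtcars))
res
```

Unlike apply(), which first converts the data frame to a matrix, this goes column by column, so it should also cope with data frames that mix numeric, character and factor columns.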
I am trying to develop my first package in R and I am facing some issues with "myclass" generic functions that I will try to describe.
Assume a data.frame X with n <- nrow(X) rows and K <- ncol(X) columns.
My main package function (too big to put it in this post) lets say
fun1 <- function(X){
# do stuff...
out <- list(index= character vector, A= A, B= B,... etc)
class(out) <- "myclass" # set the class before returning; code after return() never runs
return(out)
}
returns as an output a list. Then I have to use the output for the generic print method in a print.myclass function. However, in my print function I want to use the data frame X used in my main function without asking the user to provide it in an argument (i.e, print(out,X)) and without having it in my output list out (visible to the user at least). Is there any way to do that? Thanks in advance!
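One possible direction, offered only as a sketch and not as the established way to do it: attach X to the result as an attribute, so that it is not an element of the list but is still available to print.myclass. The list contents and the attribute name "data" below are placeholders made up for illustration. Note that the attribute is hidden only in the sense that names(out) and the custom print method do not show it; attributes(out) or str(out) would still reveal it.

```r
fun1 <- function(X) {
  out <- list(n = nrow(X), K = ncol(X))  # placeholder results, not the real output
  attr(out, "data") <- X                 # keep X alongside the list, not inside it
  class(out) <- "myclass"
  out
}

print.myclass <- function(x, ...) {
  X <- attr(x, "data")                   # recover the original data frame
  cat("myclass object built from", nrow(X), "rows and", ncol(X), "columns\n")
  invisible(x)
}

out <- fun1(mtcars)
print(out)
names(out)   # the data frame is not among the list elements
```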
Could somebody please point out to me why the following example does not work:
df <- data.frame(ex =rep(1,5))
sample.fn <- function(var1) with(df, mean(var1))
sample.fn(ex)
It seems that I am using the wrong syntax to use with() inside a function.
Thanks,
This is what I meant by learning to use "[" (actually "[["):
> df <- data.frame(ex =rep(1,5))
> sample.fn <- function(var1) mean(df[[var1]])
> sample.fn('ex')
[1] 1
You cannot use an unquoted ex since there is no object named 'ex', at least not at the global environment where you are making the call to sample.fn. 'ex' is a name only inside the environment of the df-dataframe and only df itself is "visible" when the sample.fn-function is called.
Out of interest, I tried using the method that the with.default function uses to build a function taking an unquoted expression argument in the manner you were expecting:
samp.fn <- function(expr) mean(
eval(substitute(expr), df, enclos = parent.frame())
)
samp.fn(ex)
#[1] 1
It's not a very useful function, since it would only be applicable when there was a dataframe named 'df' in the parent.frame(). And apologies for incorrectly claiming that there was a warning on the help page. As @rawr points out, the warning about using functions that depend on non-standard evaluation appears on the subset help page.
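A variant that does not depend on a data frame called 'df' could take the data as an argument. This is just a sketch using the same eval(substitute()) idea; the name mean_in is made up for illustration.

```r
# Hypothetical helper: evaluate an unquoted expression inside a given data frame
mean_in <- function(data, expr) {
  # substitute() captures the expression unevaluated;
  # eval() looks names up in `data` first, then in the caller's environment
  mean(eval(substitute(expr), data, enclos = parent.frame()))
}

df <- data.frame(ex = rep(1, 5))
mean_in(df, ex)
#[1] 1
mean_in(df, ex * 2)
#[1] 2
```

The usual caveat about non-standard evaluation still applies: this is convenient at the console but awkward to call from other functions, where the df[[var1]] form with a quoted name is easier to program with.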
I'm trying to separate a dataset into parts that have factor variables and non-factor variables.
I'm looking to do something like:
This part works:
factorCols <- sapply(df1, is.factor)
factorDf <- df1[,factorCols]
This part won't work:
nonFactorCols <- sapply(df1, !is.factor)
due to this error:
Error in !is.factor : invalid argument type
Is there a correct way to do this?
Correct way:
nonFactorCols <- sapply(df1, function(col) !is.factor(col))
# or, more efficiently
nonFactorCols <- !sapply(df1, is.factor)
# or, even more efficiently
nonFactorCols <- !factorCols
Joshua gave you the correct way to do it. As for why sapply(df1, !is.factor) did not work:
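sapply expects a function as its second argument, and !is.factor is not a function. The ! operator takes a logical (or numeric) value and returns a logical value; it cannot take the function is.factor itself as an argument, which is what the "invalid argument type" error is saying.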
Alternatively, you could use Negate(is.factor) which does in fact return a function.
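For completeness, a small sketch of the Negate() variant on a made-up data frame (df1 here is only an illustration, not the asker's data). drop = FALSE is added so that a single selected column still comes back as a data frame rather than a vector.

```r
df1 <- data.frame(a = factor(c("x", "y")),
                  b = 1:2,
                  c = c("u", "v"),
                  stringsAsFactors = FALSE)

factorCols    <- sapply(df1, is.factor)
nonFactorCols <- sapply(df1, Negate(is.factor))  # Negate() returns a function

factorDf    <- df1[, factorCols, drop = FALSE]
nonFactorDf <- df1[, nonFactorCols, drop = FALSE]

names(factorDf)
names(nonFactorDf)
```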