Anonymous function in R

Using a dataset w, which includes a numeric column PY, I can do:
nrow(subset(w, PY==50))
and get the correct answer. If, however, I try to create a function:
fxn <- function(dataset, fac, lev){nrow(subset(dataset, fac==lev))}
and run
fxn(w, PY, 50)
I get the following error:
Error in eval(expr, envir, enclos) : object 'PY' not found
What am I doing wrong? Thanks.

From the documentation of subset:
Warning
This is a convenience function intended for use interactively. For programming it is better to use the standard subsetting functions like [, and in particular the non-standard evaluation of argument subset can have unanticipated consequences.
This rather obscure warning was very well explained here: Why is `[` better than `subset`?
The upshot is that subset should only be used interactively, not inside a wrapper function like the one you are writing. Use [ instead, and pass the column name as a character string, i.e. call fxn(w, "PY", 50):
fxn <- function(dataset, fac, lev) nrow(dataset[dataset[fac] == lev, , drop = FALSE])
or, more simply:
fxn <- function(dataset, fac, lev) sum(dataset[fac] == lev)
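A minimal sketch of the same idea on made-up data, in case it helps. The data frame w below and the function name count_level are hypothetical stand-ins, and na.rm = TRUE is an added assumption so that missing values are skipped in the same way subset skips them:

```r
# Hypothetical data standing in for the asker's `w`.
w <- data.frame(PY = c(50, 50, 10, NA),
                grp = c("a", "b", "a", "b"))

# The column name is passed as a string; [[ extracts the column as a vector,
# so the comparison yields a plain logical vector.
count_level <- function(dataset, fac, lev) {
  sum(dataset[[fac]] == lev, na.rm = TRUE)
}

count_level(w, "PY", 50)    # same count as nrow(subset(w, PY == 50))
count_level(w, "grp", "a")
```

Because the column is looked up by name with standard evaluation, the function behaves the same wherever it is called from.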

Related

R custom function to apply to all variables in a dataframe

I am trying to create a custom function that, applied within a loop, would give me a table with all the information I need for every variable in my data frame. My function is based on dplyr and base functions.
myfun <- function(x, y) summarise(x, var=names(x[y]), n=sum(!is.na(y)), blank=n()-sum(!is.na(y)), distinct=n_distinct(y, na.rm=TRUE))
My problem is that the base function names() requires the y argument (the variable name) to be given in quotation marks, but the dplyr function n_distinct needs it without quotation marks to give the right answer with na.rm=TRUE (if I use n_distinct(x[y], na.rm=TRUE), the result still counts NA values). So I don't know how to write the y argument so that it works in both functions. I've tried using \" for the names() function, but it didn't seem to work. Here are the errors I get:
myfun <- function(x, y) summarise(x, var=names(x[y]), n=sum(!is.na(y)), blank=n()-sum(!is.na(y)), distinct=n_distinct(y, na.rm=TRUE))
myfun(mtcars, "cyl")
Error: Error in summarise_impl(.data, dots) : variable 'y' not found
myfun <- function(x, y) summarise(x, var=names(x[y]), n=sum(!is.na(y)), blank=n()-sum(!is.na(y)), distinct=n_distinct(y, na.rm=TRUE))
myfun(mtcars, cyl)
Error: Error in summarise_impl(.data, dots) : Evaluation error: object 'cyl' not found.
myfun <- function(x, y) summarise(x, var=names(x[y]), n=sum(!is.na(x[y])), blank=n()-sum(!is.na(x[y])), distinct=n_distinct(x[y], na.rm=TRUE))
myfun(mtcars, "cyl")
No error, but na.rm=TRUE seems to be ignored.
My goal is then to apply this in a loop to build a table with one row per variable of my data frame, which I could then export so that I have this information for all variables in a single table.
I tried to make a minimal reproducible example:
library(dplyr)
myfun <- function(x, y) summarise(x, var=names(x[, y]), n=sum(!is.na(x[, y])), blank=n()-sum(!is.na(x[, y])), n_distinct=n_distinct(x[, y], na.rm=TRUE))
a <- mtcars%>%
summarise(n=sum(!is.na(cyl)), blank=n()-sum(!is.na(cyl)), n_distinct=n_distinct(cyl, na.rm=TRUE))
a <- lapply(colnames(mtcars), function(x) data.frame(bind_rows(a, myfun(mtcars, x))))
a <- data.frame(bind_rows(a, myfun(mtcars, "cyl")))
a <- a%>%
filter(!is.na(var))%>%
distinct(var, .keep_all=TRUE)
But for some reason I can't understand, it doesn't work: the line a <- lapply(colnames(mtcars), function(x) data.frame(bind_rows(a, myfun(mtcars, x)))) fails with the error message Error in summarise_impl(.data, dots) : Column `var` is of unsupported type NULL. It works fine with my own data frame; I subsetted it and it still worked, but when I manually recreated the same data by typing in the same values with the same classes, it didn't work. So I'm really lost and don't understand why it works for my dataset but no other. I'm new to R and am learning by trial and error, without any formal teaching, so I sometimes have no idea what I'm really doing: things work (like the code above on my data), and then they don't.
So this code works quite well for me; the only problem, as I said, is that because I use n_distinct(x[, y]) it ignores na.rm=TRUE, which I cannot understand.
Sorry for the rather unclear question. I would be glad to edit it if you leave a comment on how to clarify it. I'm simply lost with my attempts and have no idea how to present things more clearly. Thanks for the help and sorry for the mess.

I'm not entirely clear on exactly what you are trying to do, but this might get at it.
First create a function that will be run for each column.
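fn <- function(x) {
  n <- sum(!is.na(x))                    # number of non-missing values
  blank <- length(x) - n                 # number of missing values
  dist <- length(unique(x[!is.na(x)]))   # distinct values, ignoring NA
  c(n = n, blank = blank, distinct = dist)
}
Then use apply to apply the function to each column of the data.frame. I've transposed the result so that each variable is a row; the row names of the result are the column names.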
t(apply(mtcars, 2, fn))
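As a hedged variant: apply converts the data frame to a matrix first, which coerces mixed column types to a single type. A sketch that iterates over the columns directly and returns a data frame with a var column could look like this. The data frame d and the name summarise_column are made up for illustration:

```r
# Hypothetical data with missing values and mixed column types.
d <- data.frame(a = c(1, 2, 2, NA),
                b = c("x", NA, NA, "y"),
                stringsAsFactors = FALSE)

# Per-column summary: non-missing count, missing count, distinct non-missing values.
summarise_column <- function(x) {
  present <- !is.na(x)
  c(n = sum(present),
    blank = sum(!present),
    distinct = length(unique(x[present])))
}

# vapply walks over the columns of the data frame; the result is a 3 x ncol matrix.
res <- vapply(d, summarise_column, integer(3))

# One row per variable, with the variable name as a regular column.
out <- data.frame(var = colnames(res), t(res), stringsAsFactors = FALSE)
rownames(out) <- NULL
out
```

The resulting data frame can then be exported with write.csv or similar.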

Evaluate strings for regression

From these strings
data = "mtcars"
y = "mpg"
x = c("cyl","disp")
I am trying to fit a linear model. I tried things like
epp=function(x) eval(parse(text=paste0(x,collapse="+")))
lm(data=epp(data),epp(y)~epp(x))
# Error in eval(expr, envir, enclos) : object 'cyl' not found
where the last line was aimed to be equivalent to
lm(data=mtcars,mpg~cyl+disp)

This involves two operations, both described in multiple SO entries (though usually one at a time), using the get and as.formula functions:
lm(data=get(data),
formula=as.formula( paste( y, "~", paste(x, collapse="+") ) )
)
In both cases you are using a character object to obtain the object that lm needs. In the first argument, get returns the object named by the string (here the mtcars data frame), and in the second, as.formula returns a 'formula' object. @blmoore is correct in advising us that lm will accept a character object, so the as.formula call is not needed here.
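As an alternative sketch, base R's reformulate builds the formula from character vectors without pasting strings by hand. The variable names below are the ones from the question; fit and reference are names introduced here for illustration:

```r
data <- "mtcars"
y <- "mpg"
x <- c("cyl", "disp")

# reformulate() builds mpg ~ cyl + disp from the character vectors.
f <- reformulate(x, response = y)

# get() looks up the data frame by its name.
fit <- lm(f, data = get(data))

# For comparison: the model written out by hand.
reference <- lm(mpg ~ cyl + disp, data = mtcars)
all.equal(coef(fit), coef(reference))
```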

Specifying variables in cor.matrix

Trying to use Deducer's cor.matrix to create a correlation matrix to be used in ggcorplot.
Trying to run a simple example. Only explicitly specifying the variable names in the data works:
cor.mat<-cor.matrix(variables=d(Sepal.Length,Sepal.Width,Petal.Length,Petal.Width),
data=iris[1:4],test=cor.test,method='p')
But I'd like it to simply use all columns in the provided data.
This:
cor.mat<-cor.matrix(data=iris[1:4],test=cor.test,method='p')
throws an error:
Error in eval(expr, envir, enclos) : argument is missing, with no default
This:
cor.mat<-cor.matrix(variables=d(as.name(paste(colnames(iris)[1:4]),collapse=",")),
data=iris[1:4],test=cor.test,method='p')
Error in as.name(paste(colnames(iris)[1:4]), collapse = ",") :
unused argument (collapse = ",")
So is there any way to tell variables to use all columns in data without explicitly specifying them?

The first argument of the function is variables =, which is required, but you did not specify it (you only supplied data =). Try
cor.mat <- cor.matrix(variables = iris[1:4], test = cor.test, method = 'p')
ggcorplot(cor.mat, data=iris)

cannot combine with and functions in R

Could somebody please point out to me why the following example does not work:
df <- data.frame(ex =rep(1,5))
sample.fn <- function(var1) with(df, mean(var1))
sample.fn(ex)
It seems that I am using the wrong syntax to use with inside a function.
Thanks,

This is what I meant by learning to use "[" (actually "[["):
> df <- data.frame(ex =rep(1,5))
> sample.fn <- function(var1) mean(df[[var1]])
> sample.fn('ex')
[1] 1
You cannot use an unquoted ex since there is no object named 'ex', at least not at the global environment where you are making the call to sample.fn. 'ex' is a name only inside the environment of the df-dataframe and only df itself is "visible" when the sample.fn-function is called.
Out of interest, I tried using the method that the with.default function uses to build a function taking an unquoted expression argument in the manner you were expecting:
samp.fn <- function(expr) mean(
eval(substitute(expr), df, enclos = parent.frame())
)
samp.fn(ex)
#[1] 1
It's not a very useful function, since it is only applicable when a data frame named 'df' is visible from where the function was defined. And apologies for incorrectly claiming that there was a warning on the help page. As @rawr points out, the warning about using functions that depend on non-standard evaluation appears on the subset page.
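A slightly more general sketch, assuming the data frame is passed in as an argument instead of being looked up under the fixed name df. The names mean_in and mean_col are made up for illustration, and the extra column wy is hypothetical:

```r
df <- data.frame(ex = rep(1, 5), wy = 1:5)

# Non-standard evaluation: capture the unevaluated expression and
# evaluate it with the data frame's columns in scope.
mean_in <- function(data, expr) {
  e <- substitute(expr)
  mean(eval(e, data, enclos = parent.frame()))
}

# Standard evaluation: the column name is passed as a string.
mean_col <- function(data, var) {
  mean(data[[var]])
}

mean_in(df, ex)        # unquoted column name
mean_in(df, wy * 2)    # any expression over the columns
mean_col(df, "wy")     # quoted column name
```

The string-based version is usually easier to call from other functions, because nothing depends on how the argument was typed at the call site.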

Why can't I assign a function with ifelse in R?

In R, when I try to assign a function via ifelse, I get the following error:
> my.func <- ifelse(cond, sqrt, identity)
Error in rep(yes, length.out = length(ans)) :
attempt to replicate an object of type 'builtin'
If cond is FALSE, the error looks equivalent, R complains about an
attempt to replicate an object of type 'closure'
What can I do to assign one of two functions to a variable and what is going on here?

Because ifelse is vectorized and has no special case for a length-one condition, its yes and no arguments are replicated with rep(...) to the length of the test. rep(...) cannot replicate functions (neither builtins such as sqrt nor closures such as identity), hence the error.
A workaround would be to temporarily wrap the functions:
my.func <- ifelse(cond, c(sqrt), c(identity))[[1]]
@Joshua Ulrich's comment is a proper answer. The correct way to accomplish conditional function assignment is a classic if...else rather than the vectorized ifelse method:
my.func <- if (cond) sqrt else identity
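A small sketch of how this might be wrapped up, since if ... else is an expression that returns whichever branch is taken. The helper names choose_transform and transform_by_name are hypothetical, and the switch variant is an added suggestion for more than two choices:

```r
# if/else returns the chosen function object unchanged.
choose_transform <- function(cond) {
  stopifnot(is.logical(cond), length(cond) == 1, !is.na(cond))
  if (cond) sqrt else identity
}

# With more than two options, switch() selects a function by name;
# the unnamed last argument is the fallback.
transform_by_name <- function(name) {
  switch(name, sqrt = sqrt, log = log, identity)
}

f <- choose_transform(TRUE)
f(16)

g <- transform_by_name("log")
g(1)

# For contrast, ifelse() fails on function arguments.
ifelse_failed <- inherits(try(ifelse(TRUE, sqrt, identity), silent = TRUE),
                          "try-error")
```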
