I am trying to do the following but can't figure it out. Could someone please help me?
f <- expression(x^3+4*y)
df <- D(f,'x')
x <-0
df0 <- eval(df)
df0 should be a function of y!
If you take the derivative of f with respect to x you get 3 * x^2. The 4*y is a constant as far as x is concerned. So you don't have a function of y as such, your df is a constant as far as y is concerned (although it is a function of x).
Assigning to x doesn't change df; it remains the expression 3 * x^2 and is still a function of x if you wanted to treat it as such.
If you want to substitute a variable in an expression, then substitute() is what you are looking for.
> substitute(3 * x^2, list(x = 0))
3 * 0^2
It is a blind substitution with no simplification of the expression. We probably expected 0 here but got 3 * 0^2, and that is simply how substitute() works.
Unfortunately, substituting in an expression you have in a variable is a bit cumbersome, since substitute() thinks its first argument is the verbatim expression, so you get
> substitute(df, list(x = 0))
df
The expression is df, there is no x in that so nothing is substituted, and you just get df back.
You can get around that with two substitutions and an eval:
> df0 <- eval(
+ substitute(substitute(expr, list(x = 0)),
+ list(expr = df)))
> df0
3 * 0^2
> eval(df0)
[1] 0
The outermost substitute() puts the value of df into expr, so you get the right expression there, and the inner substitute() changes the value of x.
There are nicer functions for manipulating expressions in the Tidyverse, but I don't remember them off the top of my head.
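A shorter base R route is also possible. The sketch below assumes the same f and df as in the question and uses do.call(), which evaluates its argument list before building the call, so substitute() sees the stored expression rather than the symbol df:

```r
f  <- expression(x^3 + 4 * y)
df <- D(f, "x")                      # the call 3 * x^2

# do.call() looks up `df` first, then calls substitute() on its value
df0 <- do.call("substitute", list(df, list(x = 0)))
df0                                  # 3 * 0^2
eval(df0)                            # 0

# if only the number is needed, evaluate df with x bound in a list
eval(df, list(x = 0))                # 0
```

The last line skips the substitution step entirely, which is usually enough when no symbolic result is needed.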
I'm sure the question is a bit dumb (sorry)... I'm trying to create a function using different variables I have stored in a data frame. The function looks like this:
mlr_turb <- function(Cond_in, Flow_in, pH_in, pH_out, Turb_in, nm250_i, nm400_i, nm250_o, nm400_o){
Coag = (+0.032690 + 0.090289*Cond_in + 0.003229*Flow_in - 0.021980*pH_in - 0.037486*pH_out
+0.016031*Turb_in -0.026006*nm250_i +0.093138*nm400_i - 0.397858*nm250_o - 0.109392*nm400_o)/0.167304 # nm400_i assumed here; the post used nm400_o twice
return(Coag)
}
m4_turb <- mlr_turb(dataset)
The problem is when I try to run my function in a dataframe (with the same name of variables). It doesn't detect my variables and shows this message:
Error in mlr_turb(dataset) :
argument "Flow_in" is missing, with no default
But that column is actually there, along with all the other variables.
I think I have misplaced or am missing something in the function that would let it take the variables from the dataset. I have searched a lot for this but have not found an answer...
No dumb questions!
I think you're looking for do.call. This function allows you to unpack values into a function as arguments. Here's a really simple example.
# a simple function that takes x, y and z as arguments
myFun <- function(x, y, z){
result <- (x + y)/z
return(result)
}
# a simple data frame with columns x, y and z
myData <- data.frame(x=1:5,
y=(1:5)*pi,
z=(11:15))
# unpack the values into the function using do.call
do.call('myFun', myData)
Output:
[1] 0.3765084 0.6902654 0.9557522 1.1833122 1.3805309
You have run into a standard problem in R, related to standard evaluation (SE) versus non-standard evaluation (NSE). If you need more detail, you can have a look at this blog post I wrote.
I think the most convenient way to write function using variables is to use variable names as arguments of the function.
Let's take @Muon's example again.
# a simple function that takes x, y and z as arguments
myFun <- function(x, y, z){
result <- (x + y)/z
return(result)
}
The question is where R should find the values behind the names x, y and z. In a function, R first looks in the function's own environment (here x, y and z are defined as parameters), then in the environment where the function was defined (often the global environment), and finally in the attached packages.
In myFun, R expects vectors. If you pass a column name instead, you will get an error. What if you want to pass a column name? You must tell R that the name should be looked up within a data frame. You can, for instance, do something like this:
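As a small illustration of that lookup order, here is a sketch with made-up names (offset, add_offset, make_adder are not from the question):

```r
offset <- 10                       # defined in the global environment

add_offset <- function(x) {
  scale <- 2                       # local to the function
  x * scale + offset               # offset is not local, so R finds the global one
}
add_offset(1)                      # 12

# a function defined inside another function sees that function's variables first
make_adder <- function(offset) function(x) x + offset
add5 <- make_adder(5)
add5(1)                            # 6, the enclosing offset (5) hides the global one (10)
```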
myFun <- function(df, col1 = "x", col2 = "y", col3 = "z"){
result <- (df[,col1] + df[,col2])/df[,col3]
return(result)
}
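A usage sketch for this column-name style, with a hypothetical function name col_ratio and the same toy data as in the other answer. It indexes with [[ ]] instead of [ , ], on the assumption that the function should also work on tibbles:

```r
myData <- data.frame(x = 1:5, y = (1:5) * pi, z = 11:15)

# columns are chosen by name; the defaults match the data frame above
col_ratio <- function(df, col1 = "x", col2 = "y", col3 = "z") {
  (df[[col1]] + df[[col2]]) / df[[col3]]
}

col_ratio(myData)                              # same as (x + y) / z
col_ratio(myData, col1 = "z", col3 = "x")      # (z + y) / x
```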
You can go much further in this direction with the data.table package. If you start writing functions that need to use variables from a data frame, I recommend having a look at this package.
I like Muon's answer, but I couldn't get it to work if there are columns in the data.frame not in the function. Using the with() function is a simple way to make this work as well...
#Code from Muon:
# a simple function that takes x, y and z as arguments
myFun <- function(x, y, z){
result <- (x + y)/z
return(result)
}
# a simple data frame with columns x, y and z
myData <- data.frame(x=1:5,
y=(1:5)*pi,
z=(11:15),
a=6:10) #adding a var not used in myFun
# unpack the values into the function using do.call
do.call('myFun', myData)
#generates an error for the unused "a" column
#using with() function:
with(myData, myFun(x, y, z))
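If you would rather keep do.call, one option is to pass only the columns the function actually declares. This is a sketch, assuming every argument of myFun has a column of the same name:

```r
myFun <- function(x, y, z) {
  (x + y) / z
}

myData <- data.frame(x = 1:5, y = (1:5) * pi, z = 11:15, a = 6:10)

# names(formals(myFun)) is c("x", "y", "z"), so the extra column a is dropped
wanted <- names(formals(myFun))
res <- do.call(myFun, myData[wanted])
res
```

This gives the same values as with(myData, myFun(x, y, z)).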
Usage of the pipe operator in the purrr/dplyr packages is (in short) defined as follows:
y%>%f(x,.,z) is the same as f(x,y,z)
I am trying to do the following task using pipe operator. First I show you the task without using pipes:
#####for reproducibility
set.seed(50)
z0<-factor(sample(c(letters[1:3],NA),100,replace = T))
###the task
rep(1,length(table(z0)))
Now I want to do this using pipes:
z0%>%table%>%rep(1,length(.))
However, the result is not the same. It seems that the pipe operator cannot handle passing its input to a composition of functions. That is,
y%>%f(x,g(.)) should be the same as f(x,g(y))
So the concrete question is whether it is possible to do
y%>%f(x,g(.))
Thank you in advance for your comments.
The %>% operator implements a first-argument rule: it passes the left-hand side as the first argument to the function unless . appears as a direct argument. In your second case the arguments to rep are 1 and length(.), so . is only nested and the first-argument rule still applies. To avoid this, enclose the expression in {}. You can read more about this under Re-using the placeholder for attributes:
Re-using the placeholder for attributes
It is straight-forward to use the placeholder several times in a
right-hand side expression. However, when the placeholder only appears
in a nested expressions magrittr will still apply the first-argument
rule. The reason is that in most cases this results more clean code.
x %>% f(y = nrow(.), z = ncol(.)) is equivalent to f(x, y = nrow(x), z = ncol(x))
The behavior can be overruled by enclosing the right-hand side in
braces:
x %>% {f(y = nrow(.), z = ncol(.))} is equivalent to f(y = nrow(x), z = ncol(x))
rep(1,length(table(z0)))
# [1] 1 1 1
Equivalent would be:
z0 %>% table %>% {rep(1,length(.))}
# [1] 1 1 1
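To see why the two pipelines differ, it can help to write out by hand what each one expands to. The sketch below uses only base R (no pipe is loaded); the variable names first_arg_rule and braced are made up for illustration:

```r
set.seed(50)
z0  <- factor(sample(c(letters[1:3], NA), 100, replace = TRUE))
tbl <- table(z0)

# z0 %>% table %>% rep(1, length(.))     expands to:
first_arg_rule <- rep(tbl, 1, length(tbl))   # the table itself is what gets repeated

# z0 %>% table %>% {rep(1, length(.))}   expands to:
braced <- rep(1, length(tbl))                # 1 is what gets repeated

braced
identical(as.vector(first_arg_rule), braced) # FALSE
```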
I came across this fantastic function called
within {base}
I use it more often now than the much hyped
mutate {dplyr}
My question is: why does within() have such a peculiar format, with the assignment operator <- used instead of the usual = for args? How is it different from mutate, other than what is given in this fantastic article I found? I am interested to know the underlying mechanism.
Article by Bob Muenchen - 2013
The function within takes an expression as second argument. That expression is essentially a codeblock, best contained within curly brackets {}.
In this codeblock, you can assign new variables, change values and the likes. The variables can be used in the codeblock as objects.
mutate on the other hand takes a set of arguments for the mutation. These arguments have to be named after the variable that should be created, and get the value for that variable as the value.
So :
mutate(iris, ratio = Sepal.Length/Petal.Length)
# and
within(iris, {ratio = Sepal.Length/Petal.Length})
give the same result. The problem starts when you remove the curly brackets:
> within(iris, ratio = Sepal.Length/Petal.Length)
Error in eval(substitute(expr), e) : argument is missing, with no default
The curly brackets enclose an expression (a piece of code), and hence within() works correctly. If you don't use the {}, then R reads that last command as "call the function within with iris as first argument and a second argument called ratio set to Sepal.Length/Petal.Length". As the function within() doesn't have an argument named ratio, that one is swallowed by ... and ignored. Instead, within looks for the expression that should be the second argument. It can't find one, and that explains the error.
So there's little peculiar about it. Both functions just have different arguments. All the rest is pretty much how R deals with arguments.
Args of within are not assigned with <- but with the usual =.
Let's see the first example in your link:
mydata.new <- within(mydata, {
x2 <- x ^ 2
x3 <- x2 + 100
} )
Here,
{
x2 <- x ^ 2
x3 <- x2 + 100
}
is just an argument of the function (an R expression). Neither x2 nor x3 is an argument to within. The function could have been called this way instead to make that clearer:
mydata.new <- within(data = mydata, expr = {
x2 <- x ^ 2
x3 <- x2 + 100
})
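A runnable version of the same idea, assuming a toy mydata with a single column x (the original data is not shown in the question):

```r
mydata <- data.frame(x = 1:3)

# expr is passed by name; the braces just hold ordinary R statements
mydata.new <- within(data = mydata, expr = {
  x2 <- x^2
  x3 <- x2 + 100
})
mydata.new

# a single statement needs no braces, provided it is an assignment
# expression (<-) and not a named argument (=)
mydata.one <- within(mydata, x2 <- x^2)
mydata.one
```

The second call works because x2 <- x^2 is matched positionally to expr, whereas x2 = x^2 would be read as an argument named x2.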
I'm trying to make a small R package with my limited knowledge in R programming. I am trying to use the following argument:
formula=~a+b*X
where X is vector, 'a' and 'b' are constants in a function call.
What I'm wondering is once I input the formula, I want to extract (a,b) and X separately and use them for other data manipulations inside the function call. Is there a way to do it in R?
I would really appreciate any guidance.
Note: Edited my question for clarity
I'm looking for something similar to model.matrix() output. The above mentioned formula can be more generalized to accommodate 'n' number of variables, say,
~2+3*X +4*Y+...+2*Z
In the output, I need the coefficients (2 3 4 ...2) as a vector and [1 X Y ... Z] as a covariate matrix.
The question is not completely clear so we will assume that the question is, given a formula using standard formula syntax, how do we parse out the variables names (or in the second answer the variable names and constants) giving as output a character vector containing them.
1) all.vars Try this:
fo <- ~ a + b * X # input
all.vars(fo)
giving:
[1] "a" "b" "X"
2) strapplyc Also we could do it with string manipulation. In this case it also parses out the constants.
library(gsubfn)
fo <- ~ 25 + 35 * X # input
strapplyc(gsub(" ", "", format(fo)), "-?[0-9.]+|[a-zA-Z0-9._]+", simplify = unlist)
giving:
[1] "25" "35" "X"
Note: If all you are trying to do is to evaluate the RHS of the formula as an R expression then it is just:
X <- 1:3
fo <- ~ 1 + 2 * X
eval(fo[[2]])
giving:
[1] 3 5 7
Update: Fixed and added second solution and Note.
A call is a list of symbols and/or other calls and its elements can be accessed through normal indexing operations, e.g.
f <- ~a+bX
f[[1]]
#`~`
f[[2]]
#a + bX
f[[2]][[1]]
#`+`
f[[2]][[2]]
#a
However notice that in your formula bX is one symbol, you probably meant b * X instead.
f <- ~a + b * X
Then a and b typically would be stored in an unevaluated list.
vars <- call('list', f[[2]][[2]], f[[2]][[3]][[2]])
vars
#list(a, b)
and vars would be passed to eval at some point.
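Building on the same indexing idea, here is a rough sketch of how the generalized case from the question (~2+3*X+4*Y) could be split into a coefficient vector and a covariate matrix. The helpers split_sum and parse_linear are hypothetical, and the sketch assumes every term is either a bare number or number * variable:

```r
# split a sum a + b + c into a list of its terms by walking the call tree
split_sum <- function(e) {
  if (is.call(e) && identical(e[[1]], as.name("+")) && length(e) == 3) {
    c(split_sum(e[[2]]), split_sum(e[[3]]))
  } else {
    list(e)
  }
}

parse_linear <- function(fo) {
  parts <- split_sum(fo[[2]])          # fo[[2]] is the RHS of a one-sided formula
  coefs <- numeric(0)
  vars  <- character(0)
  for (p in parts) {
    if (is.numeric(p)) {               # bare constant: the intercept
      coefs <- c(coefs, p)
      vars  <- c(vars, "(Intercept)")
    } else if (is.call(p) && identical(p[[1]], as.name("*")) &&
               is.numeric(p[[2]]) && is.name(p[[3]])) {
      coefs <- c(coefs, p[[2]])        # number * variable
      vars  <- c(vars, as.character(p[[3]]))
    } else {
      stop("unsupported term: ", deparse(p))
    }
  }
  list(coef = coefs, vars = vars)
}

res <- parse_linear(~ 2 + 3 * X + 4 * Y)
dat <- data.frame(X = 1:3, Y = c(0, 1, 0))

# covariate matrix [1 X Y] built with model.matrix()
M <- model.matrix(reformulate(setdiff(res$vars, "(Intercept)")), dat)
fitted <- as.numeric(M %*% res$coef)
fitted
```

Negative coefficients, terms written as X * 3, and interactions are deliberately left unsupported here and would trigger the stop() branch.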
Consider the following simple function:
f <- function(x, value){print(x);print(substitute(value))}
Argument x will eventually be evaluated by print, but value never will. So we can get results like this:
> f(a, a)
Error in print(x) : object 'a' not found
> f(3, a)
[1] 3
a
> f(1+1, 1+1)
[1] 2
1 + 1
> f(1+1, 1+"one")
[1] 2
1 + "one"
Everything as expected.
Now consider the same function body in a replacement function:
`g<-` <- function(x, value){print(x);print(substitute(value))}
Let's try it:
> x <- 3
> g(x) <- 4
[1] 3
[1] 4
Nothing unusual so far...
> g(x) <- a
Error: object 'a' not found
This is unexpected. Name a should be printed as a language object.
> g(x) <- 1+1
[1] 4
1 + 1
This is ok, as x's former value is 4. Notice the expression passed unevaluated.
The final test:
> g(x) <- 1+"one"
Error in 1 + "one" : non-numeric argument to binary operator
Wait a minute... Why did it try to evaluate this expression?
Well the question is: bug or feature? What is going on here? I hope some guru users will shed some light about promises and lazy evaluation on R. Or we may just conclude it's a bug.
We can reduce the problem to a slightly simpler example:
g <- function(x, value) x
`g<-` <- function(x, value) x
x <- 3
# Works
g(x, a)
`g<-`(x, a)
# Fails
g(x) <- a
This suggests that R is doing something special when evaluating a replacement function: I suspect it evaluates all arguments. I'm not sure why, but the comments in the C code (https://github.com/wch/r-source/blob/trunk/src/main/eval.c#L1656 and https://github.com/wch/r-source/blob/trunk/src/main/eval.c#L1181) suggest it may be to make sure other intermediate variables are not accidentally modified.
Luke Tierney has a long comment about the drawbacks of the current approach, and illustrates some of the more complicated ways replacement functions can be used:
There are two issues with the approach here:
A complex assignment within a complex assignment, like
f(x, y[] <- 1) <- 3, can cause the value temporary
variable for the outer assignment to be overwritten and
then removed by the inner one. This could be addressed by
using multiple temporaries or using a promise for this
variable as is done for the RHS. Printing of the
replacement function call in error messages might then need
to be adjusted.
With assignments of the form f(g(x, z), y) <- w the value
of z will be computed twice, once for a call to g(x, z)
and once for the call to the replacement function g<-. It
might be possible to address this by using promises.
Using more temporaries would not work as it would mess up
replacement functions that use substitute and/or
nonstandard evaluation (and there are packages that do
that -- igraph is one).
I think the key may be found in this comment beginning at line 1682 of "eval.c" (and immediately followed by the evaluation of the assignment operation's RHS):
/* It's important that the rhs get evaluated first because
assignment is right associative i.e. a <- b <- c is parsed as
a <- (b <- c). */
PROTECT(saverhs = rhs = eval(CADR(args), rho));
We expect that if we do g(x) <- a <- b <- 4 + 5, both a and b will be assigned the value 9; this is in fact what happens.
Apparently, the way that R ensures this consistent behavior is to always evaluate the RHS of an assignment first, before carrying out the rest of the assignment. If that evaluation fails (as when you try something like g(x) <- 1 + "a"), an error is thrown and no assignment takes place.
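The difference between the two call styles can be checked without crashing the session by wrapping each in tryCatch(). This is a sketch; undefined_name is a deliberately undefined symbol and the replacement function never touches value:

```r
`g<-` <- function(x, value) x      # ignores value entirely
x <- 3

# direct call: value stays an unevaluated promise, so no error
direct <- tryCatch(`g<-`(x, undefined_name),
                   error = function(e) "error")

# assignment form: the RHS is evaluated before g<- is ever called
sugar <- tryCatch({ g(x) <- undefined_name; "ok" },
                  error = function(e) "error")

direct   # 3
sugar    # "error"
```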
I'm going to go out on a limb here, so please, folks with more knowledge feel free to comment/edit.
Note that when you run
`g<-` <- function(x, value){print(x);print(substitute(value))}
x <- 1
g(x) <- 5
a side effect is that 5 is assigned to x. Hence, both must be evaluated. But if you then run
`g<-`(x,10)
both the values of x and 10 are printed, but the value of x remains the same.
Speculation:
So R apparently distinguishes between calling g<- as part of an actual assignment and calling g<- directly.