Why did I get the wrong mean value in R?

I am trying to calculate the mean of -0.9643991 and -0.6756494, but for some reason the mean function returns the value of the first number, as shown below:
> a = -0.9643991
> b = -0.6756494
> mean(a, b)
[1] -0.9643991
What is the issue here?

mean takes the mean of its first argument, a vector. Additional arguments let you set preferences, which can depend on the class of the first argument, such as ignoring missing values. If you want the mean of a and b, you need to put them together in a vector, using c(). Like this:
mean(c(a, b))
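To see why the original call returned a, it may help to know that, as far as I am aware, the default method has the signature mean(x, trim = 0, na.rm = FALSE, ...), so b is matched positionally to trim instead of being averaged. A minimal sketch using the values from the question (the names wrong and right are only for illustration):

```r
a <- -0.9643991
b <- -0.6756494

# b is matched to the second argument (trim), so only a is averaged
wrong <- mean(a, b)

# Combine the values into a single vector first
right <- mean(c(a, b))

print(wrong)
print(right)
```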

Related

How does mutate_all(.funs=~./sum(x)) work?

I used this code to calculate relative abundance (cell/total of column) of a table I had. I don't understand how the . and ~ functions work.
The construct ~./sum(x) is technically a special type of R object called a formula
class(~./sum(x))
#> [1] "formula"
However, in tidyverse functions such as mutate_all, this formula is taken and converted into a lambda function, which is an anonymous function (i.e. a function that isn't named and is written in place as a parameter passed in a call to another function).
Internally, the formula is converted into a function with rlang::as_function. Suppose we wanted to write a function that just adds two to a variable. In base R we might write
add_two <- function(var){
return(var + 2)
}
add_two(5)
#> [1] 7
In the tidyverse, we can use a formula as shorthand for this function, where the . becomes a shorthand for "the variable that was passed as a first argument to the function":
add_two <- rlang::as_function(~ . + 2)
add_two(5)
#> [1] 7
In functions such as mutate_all, the formula will automatically be passed through rlang::as_function, so if we wanted to add two to each column in our data frame, instead of writing:
mutate_all(.funs= function(var) {return(var + 2);})
we could write
mutate_all(.funs=~.+2)
In your case, the formula ~./sum(x) is effectively transformed into
function(var) {
return(var / sum(x))
}
where x has to exist either as a column in your data frame or a variable in the calling environment.
The reasons for having it this way are that it saves typing and shortens lines of code. Inserting a function within a call to another function often leads to messy and poorly formatted code. This shorthand method helps to prevent that.
You can read more about anonymous functions and how they are used in the tidyverse here
Suppose we have this dataset:
dataset <- data.frame(a = c(1,2,3,4),
b = c(2,3,4,5),
c = c(3,4,5,6))
And you want to divide every vector by its total (i.e. for vector a: 1/10, 2/10, 3/10, 4/10). To avoid writing this out for each variable, you can use mutate_all with a lambda in .funs, which says: make a function that divides all values in each vector (represented by the dot) by the sum of all values in that vector.
dataset %>% mutate_all(.funs = ~./sum(.))
Hope this helps.
mutate_all applies the function in .funs to every column. Each value (.) is divided by sum(x) to give you the "relative abundance", which is essentially the fraction of the total, where the total is sum(x). You can think of ~ as "a function of". So you are saying each cell in the data frame is replaced by itself divided by the overall sum.
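For readers who want to see what the lambda ~./sum(.) amounts to without loading any packages, here is a rough base R sketch. It is not what mutate_all does internally; the helper name relative is hypothetical, and lapply stands in for the column-by-column application.

```r
dataset <- data.frame(a = c(1, 2, 3, 4),
                      b = c(2, 3, 4, 5),
                      c = c(3, 4, 5, 6))

# Written-out equivalent of the lambda ~ . / sum(.)
relative <- function(col) col / sum(col)

# Apply it to every column, one column at a time
result <- as.data.frame(lapply(dataset, relative))

print(result$a)          # 0.1 0.2 0.3 0.4
print(colSums(result))   # every column now sums to 1
```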

R- distinguishing argument values

Hi I want to pass a list of arguments into my main function to use two sub functions.
f<-function(a,...){
x1<-f1(...)
x2<-f2(...)
}
Suppose f1 takes an argument named "a" and f2 also takes an argument named "a". How can I solve this problem? The name "a" is used inside the main function and in the two subfunctions. I am trying to distinguish which "a" belongs to which function, but it seems to be a very difficult task.
I can give a more specific example
f<-function(x,...){
print(mean(x))
x1<-dnorm(...)
x2<-dbinom(...)
}
Obviously, dnorm and dbinom both use the name "x" for their input. But I want to use a different value of x for each of the subfunctions. Furthermore, I want to use the name "x" inside the main function to calculate its mean, because the main x is a vector.
Since they have the same name, you'll need some way of distinguishing them or they will simply clash, as you've pointed out. There's really not much magic beyond that: you've spotted the issue.
You'll also need a way of keeping dbinom-specific arguments out of dnorm, because dnorm throws an error if you give it a size argument, for example.
You can write out all the relevant args, for example:
f<-function(x,dnx, mean=0, sd=1, dnlog=FALSE, dbx, size, prob, dblog=FALSE, ...){
print(mean(x))
x1<-dnorm(x=dnx, mean, sd, log=dnlog)
x2<-dbinom(x=dbx, size, prob, log=dblog)
}
or supply them as lists:
f<-function(x,
dn_args=list(x=0, mean = 0, sd = 1, log = FALSE),
db_args=list(x=5, size=10, prob=0.5, log = FALSE), ...){
print(mean(x))
x1<-do.call(dnorm, dn_args)
x2<-do.call(dbinom, db_args)
}
You can also consider whether you need to refactor the function into smaller pieces. :)
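As a usage sketch of the list-based approach, the variant below returns the three results in a list so they can be inspected. Returning a list and dropping the log entries are my own simplifications, not part of the answer above.

```r
f <- function(x,
              dn_args = list(x = 0, mean = 0, sd = 1),
              db_args = list(x = 5, size = 10, prob = 0.5)) {
  # x is the main vector; each sub-function gets its own x from its list
  list(mean_x = mean(x),
       x1 = do.call(dnorm, dn_args),
       x2 = do.call(dbinom, db_args))
}

# Each x is supplied separately, so the three never clash
res <- f(c(1, 2, 3),
         dn_args = list(x = 1, mean = 0, sd = 2),
         db_args = list(x = 2, size = 4, prob = 0.5))
print(res)
```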

Vector with elements equal to a function evaluated at a, a+1, ..., b in R

I have two integers a and b (with a less than b), as well as a function f(x). Is there a way of getting the vector
x<-(f(a), ..., f(b))
from R without having to write it out explicitly, since my a and b vary?
Thanks for your help.
You can try something like the following :
foo <- function(x) x+1
a <- 1
b <- 5
sapply(a:b, foo)
But note that if you need this kind of behavior, you should vectorize your function, i.e. make it accept a vector as its argument instead of a single integer. In my previous example, the sapply is not needed at all: + is vectorized, so I can just do:
foo(a:b)
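When the function cannot simply be called on a vector, for example because it branches with if, applying it element by element is still needed. A small sketch with a made-up foo, using vapply so that the result type is checked:

```r
# Hypothetical non-vectorized function: `if` accepts only a single condition
foo <- function(x) if (x %% 2 == 0) x / 2 else 3 * x + 1

a <- 1
b <- 5

# vapply requires each call to return exactly one numeric value
res <- vapply(a:b, foo, numeric(1))
print(res)   # 4 1 10 2 16
```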

Inline expansion of variables in R

I'm confused with when a value is treated as a variable, and when as a string in R. In Ruby and Python, I'm used to a string always having to be quoted, and an unquoted string is always treated as a variable. Ie.
a["hello"] => a["hello"]
b = "hi"
a[b] => a["hi"]
But in R, this is not the case, for example
a$b <- c(1,2,3)
b here is the value/name of the column, not the variable b.
c <- "b"
a$c => column not found (it's looking for column c, not b, which is the value of the variable c)
(I know that in this specific case I can use a[c], but there are many other cases. Such as ggplot(a, aes(x=c)) - I want to plot the column that is the value of c, not with the name c)...
In other StackOverflow questions, I've seen things like quote, substitute etc mentioned.
My question is: Is there a general way of "expanding" a variable and making sure the value of the variable is used, instead of the name of the variable? Or is that just not how things are done in R?
In your example, a$b is syntactic sugar for a[["b"]]. That's a special feature of the $ symbol when used with lists. The second form does what you expect: a[[b]] will return the element of a whose name == the value of the variable b, rather than the element whose name is "b".
Data frames are similar. For a data frame a, the $ operator refers to the column names. So a$b is the same as a[ , "b"]. In this case, to refer to the column of a indicated by the value of b, use a[, b].
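A short sketch of the difference, using a made-up data frame and a hypothetical variable col_name that holds the column name:

```r
a <- data.frame(b = c(1, 2, 3), d = c(10, 20, 30))
col_name <- "b"   # hypothetical variable holding a column name

# `$` does not evaluate col_name; it looks for a column called "col_name"
by_dollar <- a$col_name
print(by_dollar)           # NULL

# `[[` and `[` evaluate col_name and use its value, "b"
by_name <- a[[col_name]]
print(by_name)             # 1 2 3
print(a[, col_name])       # 1 2 3
```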
The reason your example with the $ operator doesn't work is quite subtle, and it differs from most other situations in R, where you can just use a function like get, which was designed for that purpose. Calling a$b is equivalent to calling
`$`(a , b)
This reminds us, that in R, everything is an object. $ is a function and it takes two arguments. If we check the source code we can see that calling a$c and expecting R to evaluate c to "b" will never work, because in the source code it states:
/* The $ subset operator.
We need to be sure to only evaluate the first argument.
The second will be a symbol that needs to be matched, not evaluated.
*/
It achieves this using the following:
if(isSymbol(nlist) )
SET_STRING_ELT(input, 0, PRINTNAME(nlist));
else if(isString(nlist) )
SET_STRING_ELT(input, 0, STRING_ELT(nlist, 0));
else {
errorcall(call,_("invalid subscript type '%s'"),
type2char(TYPEOF(nlist)));
}
nlist is the argument you passed to do_subset_3 (the name of the C function $ maps to), in this case c. It found that c was a symbol, so it replaced it with a string but did not evaluate it. If it had been a string, it would have been passed along as a string.
Here are some links to help you understand the 'why's and 'when's of evaluation in R. They may be enlightening, they may even help, if nothing else they will let you know that you are not alone:
http://developer.r-project.org/nonstandard-eval.pdf
http://journal.r-project.org/2009-1/RJournal_2009-1_Chambers.pdf
http://www.burns-stat.com/documents/presentations/inferno-ish-r/
In that last one, the most important piece is bullet point 2, then read through the whole set of slides. I would probably start with the 3rd one, then the 1st 2.
These are less in the spirit of how to make a specific case work (as the other answers have done) and more in the spirit of what has led to this state of affairs and why in some cases it makes sense to have standard nonstandard ways of accessing variables. Hopefully understanding the why and when will help with the overall what to do.
If you want the object whose name is held in a string, use the get function: get("b") returns the value of the variable named b, and after c <- "b", get(c) does the same.
If you want to play around with expressions, you need to use quote(), substitute(), bquote(), and friends like you mentioned.
For example:
x <- quote(list(a = 1))
names(x) # [1] "" "a"
a <- "foo" # the new name must be defined before it is used
names(x) <- c("", a)
x # list(foo = 1)
And:
c <- "foo"
bquote(ggplot(a, aes(x=.(c)))) # ggplot(a, aes(x = "foo"))
substitute(ggplot(a, aes(x=c)), list(c = "foo"))
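Note that splicing in a string gives a quoted "foo" in the expression. If the goal is to refer to the column itself, one option is to convert the string to a symbol with as.name before splicing it in. A minimal sketch with a made-up data frame (col_name is a hypothetical variable, and mean stands in for whatever function you want to call):

```r
a <- data.frame(foo = c(1, 2, 3))
col_name <- "foo"   # hypothetical variable holding the column name

# as.name() turns the string into a symbol before it is spliced in
expr <- bquote(mean(.(as.name(col_name))))
print(expr)        # mean(foo)

# Evaluate the expression with the data frame's columns in scope
result <- eval(expr, a)
print(result)      # 2
```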

R curve() on expression involving vector

I'd like to plot a function of x, where x is applied to a vector. Anyway, easiest to give a trivial example:
var <- c(1,2,3)
curve(mean(var)+x)
curve(mean(var+x))
While the first one works, the second one gives errors:
'expr' did not evaluate to an object of length 'n' and
In var + x : longer object length is not a multiple of shorter object length
Basically I want to find the minimum of such a function: e.g.
optimize(function(x) mean(var+x), interval=c(0,1))
And then be able to visualise the result. While the optimize function works, I can't figure out how to get the curve() to work as well.. Thanks!
The function needs to be vectorized. That means that if it is given a vector of inputs, it has to return a vector of the same length. If you pass any vector to mean, the result is always a vector of length 1. Thus, mean is not vectorized. You can use Vectorize:
f <- Vectorize(function(x) mean(var+x))
curve(f,from=0, to=10)
This can be done in the general case using sapply:
curve(sapply(x, function(e) mean(var + e)))
In the specific example you give, mean(var) + x, is of course arithmetically equivalent to what you're looking for. Similar shortcuts might exist for whatever more complicated function you're working with.
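The equivalence can be checked numerically without opening a plot. The sketch below evaluates the Vectorize version on a small grid (the grid xs is my own choice) and compares it with the shortcut:

```r
var <- c(1, 2, 3)

# Vectorized wrapper: one mean per element of x
f <- Vectorize(function(x) mean(var + x))

xs <- seq(0, 1, by = 0.25)
vals <- f(xs)
print(vals)

# Same values as the shortcut mean(var) + x
print(all.equal(vals, mean(var) + xs))

# The function is increasing, so the minimum lies near the left edge of [0, 1]
opt <- optimize(function(x) mean(var + x), interval = c(0, 1))
print(opt$minimum)
```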
