Calculate mean value of two objects - r

Let's say we have two objects at the beginning:
a <- c(2,4,6)
b <- 8
If we apply the mean() function in each of them we get this:
> mean(a)
[1] 4
> mean(b)
[1] 8
... which is absolutely normal.
If I create a new object merging a and b...
c <- c(2,4,6,8)
and calculate its mean...
> mean(c)
[1] 5
... we get 5, which is the expected value.
However, I would like to calculate the mean value of both objects at the same time. I tried this way:
> mean(a,b)
[1] 4
As we can see, its value differs from the expected correct value (5). What am I missing?

As mentioned, the correct solution is to concatenate the vectors before passing them to mean:
mean(c(a, b))
Your original code gives a wrong result because of what mean's second argument is:
## Default S3 method:
mean(x, trim = 0, na.rm = FALSE, ...)
So when calling mean with two numeric arguments, the second one is passed as the trim argument, which controls how much trimming is to be done. Any trim of 0.5 or more makes the function simply return the median, so in your case 8 gives the median of a, which is 4 (meaningful values for trim would be fractions between 0 and 0.5).
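A minimal sketch of this behaviour. The vector below is illustrative rather than taken from the question; it is chosen so that its mean and median differ, which makes the effect of the stray trim argument visible:

```r
a <- c(2, 4, 60)   # mean is 22, median is 4
b <- 8

mean(a, b)      # b is matched to trim; trim >= 0.5 returns the median of a
median(a)       # same value as the line above
mean(c(a, b))   # concatenating first gives the mean of all four numbers
```

With the question's original a, the mean and the median happen to coincide at 4, which hides what is actually going on.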

If you print the arguments a,b the same way you are feeding them into the mean function, you will see that only a prints (print takes b as its digits argument, not as data):
print(a,b)
[1] 2 4 6
So mean(a,b) computes its result from a alone; b is consumed as the trim argument rather than as data.
mean(c(a,b)) will produce the expected answer of 5.

Related

How to suppress warnings from stats:::regularize.values?

In newer versions of R (I have 3.6 and previously had 3.2), the stats::regularize.values function has been changed so that warn.collapsing defaults to TRUE. This function is used in splinefun and several other interpolation functions in R. In a microsimulation model, I am using splinefun to smooth a large number (n > 100,000) of data points of the form (x, f(x)). Here, x is a simulated vector of positive-valued scalars, and f(x) is some function of x. With an n that large, there are often some repeated pseudo-randomly generated values (i.e., not all values of x are unique). My understanding is that splinefun gets rid of ties in the x values. That is not a problem for me, but, because of the new default, I get the warning message below printed each time:
"In regularize.values(x, y, ties, missing(ties)) : collapsing to
unique 'x' values"
Is there a way to change the default of the warn.collapsing argument of the stats::regularize.values function back to FALSE? Or can I somehow suppress that particular warning? This matters because it's embedded in a long microsimulation code and when I update it I often run into bugs, so I can't just ignore warning messages.
I tried using the formals function. I was able to get the default arguments of stats::regularize.values printed, but when I tried to assign new values using the alist function, R said there is no object 'stats'.
I had this problem too, and fixed it by adding ties=min to the argument list of splinefun().
The value of missing(ties) is now passed as warn.collapsing to regularize.values().
https://svn.r-project.org/R/trunk/src/library/stats/R/splinefun.R
https://svn.r-project.org/R/trunk/src/library/stats/R/approx.R
Also see:
https://cran.r-project.org/doc/manuals/r-release/NEWS.html
and search for regularize.values().
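As a rough sketch of the ties fix, assuming R 3.6.0 or later. The vectors x and y are made-up data with one tied x value, not the asker's data:

```r
# Hypothetical data: x contains a duplicated value (2)
x <- c(1, 2, 2, 3, 4)
y <- c(1, 4, 5, 9, 16)

# ties is supplied explicitly, so missing(ties) is FALSE inside splinefun()
# and regularize.values() is not asked to warn about collapsing
f <- splinefun(x, y, ties = min)

f(2)  # the tied y values 4 and 5 were collapsed with min()
```

Which function to pass as ties (min, mean, max, ...) depends on how tied observations should be combined; the point is only that any explicit choice avoids the warning.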
Referencing this article
Wrap the call that triggers the warning (in your case the splinefun call, which reaches regularize.values internally) like this:
withCallingHandlers(splinefun(x, y), warning = function(w){
  if (grepl("collapsing to unique 'x' values", w$message))
    invokeRestart("muffleWarning")
})
Working example (adapted from the above link to call a function):
f1 <- function(){
  x <- 1:10
  x + 1:3
}
f1()
# if we just call f1() we get a warning
Warning in x + 1:3 :
longer object length is not a multiple of shorter object length
[1] 2 4 6 5 7 9 8 10 12 11
withCallingHandlers(f1(), warning=function(w){invokeRestart("muffleWarning")})
[1] 2 4 6 5 7 9 8 10 12 11
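The working example above muffles every warning. A small sketch of a selective version follows; muffle_matching is a hypothetical helper name, not part of R, and it only silences warnings whose message contains the given text:

```r
# Hypothetical helper: muffle only warnings whose message contains `pattern`
muffle_matching <- function(expr, pattern) {
  withCallingHandlers(
    expr,
    warning = function(w) {
      if (grepl(pattern, conditionMessage(w), fixed = TRUE)) {
        invokeRestart("muffleWarning")
      }
    }
  )
}

f1 <- function() {
  x <- 1:10
  x + 1:3
}

# The recycling warning is muffled; any other warning would still surface
res <- muffle_matching(f1(), "longer object length")
```

Because non-matching warnings are left alone, unexpected problems in a long simulation run still show up.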

R: Transforming a vector into a list of arguments

I would like to apply a function to a vector. However, the function is expecting a sequence of arguments. Thus, I need to "split" the vector into unrelated arguments.
Suppose that I have a dataframe called dta. I want to run a function, say mean, on one of its columns, say DV.
The following shows the problem
call("mean", dta$DV)
returns
mean(c(0.371, -0.860, etc... ))
The fact that the column is a vector is not compatible with the function mean which expects a sequence of arguments, not combined.
The solution should work if "mean" is replaced by a variable containing a string, e.g.
fun <- "mean"
call( fun, dta$DV)
R has functions that are not completely consistent. For instance, min and max accept arbitrary numbers of arguments, where all unrecognized arguments are considered in the mathematical calculation. mean does not: all numbers to be considered must be supplied in the first (or named x=) argument.
min(1,20,0)
# [1] 0
min(c(1,20,0))
# [1] 0
mean(1,20,0) # might not be what one would expect
# [1] 1
mean(c(1,20,0))
# [1] 7
For the curious, the 20 and 0 are not ignored: the first mean call is interpreted as mean(1, trim=20, na.rm=0) (where na.rm=0 is effectively the same as na.rm=FALSE).
Your use of call is a little off. From the help ?call,
call returns an unevaluated function call
which doesn't really help you a lot. You might do eval(call(...)), but that seems silly in light of this next function.
Use of do.call is a bit more straightforward. It can take as its first argument a function (anonymous or named) or a character string which matches a function. There are actually speed differences between using one or the other, so I tend to use a character reference to the function name when able. (I don't recall the reference that quantifies this assertion, I'll include it if I find it soon.)
For functions like min above that can accept any number of arguments, one can do this:
args <- c(1,20,0)
as.list(args)
# [[1]]
# [1] 1
# [[2]]
# [1] 20
# [[3]]
# [1] 0
do.call("min", as.list(args)) # == min(1,20,0)
# [1] 0
list(args)
# [[1]]
# [1] 1 20 0
do.call("min", list(args)) # == min(c(1,20,0))
# [1] 0
However, for mean and similar, you need to force the latter behavior:
do.call("mean", list(args)) # == mean(c(1,20,0))
# [1] 7
For you to call a function with programmatically-defined arguments, you need to use do.call.
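Applied to the question's setup, a minimal sketch might look like this. The data frame below is a made-up stand-in for dta, since the original data is not shown:

```r
# Hypothetical data frame standing in for the question's dta
dta <- data.frame(DV = c(0.5, -1, 2, NA))

fun <- "mean"

# The whole column is one argument, so it goes in a one-element list
do.call(fun, list(dta$DV))

# Extra arguments are passed as further (optionally named) list elements
do.call(fun, list(dta$DV, na.rm = TRUE))
```

The first call returns NA because of the missing value; the second strips it before averaging.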

How to safely drop nothing from a vector when the negative index could be integer(0)?

Suppose I have a vector x = 1:10, and it is constructed by concatenating two other vectors a = integer(0) and b = 1:10 together (this is an edge case). I want to split up the combined vector again into a and b later on. I would have thought I could safely separate them with:
i = seq_along(a)
x[i]
x[-i]
But I discovered that when I use x[-integer(0)] I get integer(0) returned, instead of x itself as I naively thought. What is the best way to do this sort of thing?
If you want to use negative indexing and the index may degenerate to integer(0) (for example, the index is computed from which), pad a large "out-of-bound" value to the index. Removing an "out-of-bound" value has no side effect.
x <- 1:10
i <- integer(0)
x[-c(i, 11)] ## position 11 is "out-of-bound"
# [1] 1 2 3 4 5 6 7 8 9 10
If you are unsure which "out-of-bound" value to use, here is a canonical choice: 2 ^ 31, because this value exceeds the representable range of a 32-bit signed integer, yet it is not Inf.
An alternative way is to do an if test on length(i). For example:
if (length(i)) x[-i] else x
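Caution: don't use the ifelse function for this purpose. ifelse returns a result with the same length as its test, so with the length-one test length(i) it would return a single element rather than the whole vector.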
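The length test can be wrapped in a small function so the edge case is handled in one place. This is only a sketch; drop_at is a hypothetical name, not a base R function:

```r
# Hypothetical helper: drop positions i from x, treating an empty i as "drop nothing"
drop_at <- function(x, i) {
  if (length(i)) x[-i] else x
}

a <- integer(0)
b <- 1:10
x <- c(a, b)

i <- seq_along(a)   # integer(0) in this edge case
x[i]                # recovers a
drop_at(x, i)       # recovers b
```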

Loss of decimal places when calculating mean in R

I have a list entitled SET1Bearing1slope with nine numbers, and each number has at least 10 decimal places. When I use the mean() function on the list I get an arithmetic mean.
Yet if I list the numbers individually and then use the mean() function, I get a different output
I know that this is caused by a rounding and that the second mean is more accurate. Is there a way to avoid this issue? What method can I use to avoid rounding errors when calculating the mean?
In R, mean() expects a vector of values, not multiple values. It is also a generic function so it is tolerant of additional parameters it doesn't understand (but doesn't warn you about them). See
mean(c(1,5,6))
# [1] 4
mean(1, 5, 6) # only 1 is treated as data; 5 and 6 are matched to trim and na.rm
# [1] 1
So in your example there are no rounding errors, you are just calling the function incorrectly.
Look at the difference in the way you're calling the function:
mean(c(1,2,5))
[1] 2.666667
mean(1,2,5)
[1] 1
As pointed out by MrFlick, in the first case you're passing a vector of numbers (the correct way); in the second, you're passing three separate arguments, and only the first one is treated as data.
As for the number of digits, you can specify it using options():
options(digits = 10)
x <- runif(10)
x
[1] 0.49957540398 0.71266139182 0.07266473584 0.90541790240 0.41799820261
[6] 0.59809536533 0.88133668737 0.17078919476 0.92475634208 0.48827998806
mean(x)
[1] 0.5671575214
But remember that a greater number of digits is not necessarily better. There's a reason why R and other tools limit the number of digits. Check this topic: https://en.wikipedia.org/wiki/Significance_arithmetic
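To see that the digits setting only affects display and not the stored value, here is a small sketch with made-up numbers. It uses format() with an explicit digits argument instead of changing the global option:

```r
x <- c(1.123456789012, 2.123456789012, 3.123456789012)
m <- mean(x)

format(m, digits = 7)    # what printing shows with the default digits setting
format(m, digits = 13)   # same stored value, more digits displayed
```

Using format() or print(m, digits = ...) for a single value avoids changing how everything else in the session is printed.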

Difference between mean(c(1,2,21)) and mean(1,2,21)

What's the difference between these two?
mean(c(1,2,21))
and
mean(1,2,21)
The answers are different, but what's the meaning of each one?
mean(c(1,2,21))
#[1] 8
This passes a vector of three elements to the mean function and the mean value of these three elements is calculated.
mean(1,2,21)
#[1] 1
This passes 1 as the first argument, 2 as the second argument and 21 as the third argument to the mean function. mean passes these arguments to mean.default. In help("mean.default") you can find the arguments of this function:
x: The object you want the mean for.
trim: the fraction (0 to 0.5) of observations to be trimmed from each end of x before the mean is computed. Values of trim outside that range are taken as the nearest endpoint.
na.rm: a logical value indicating whether NA values should be stripped before the computation proceeds. (Since you pass a numeric value, it is coerced to logical automatically).
So you calculate this:
mean.default(1, 0.5, TRUE)
[1] 1
When using mean(c(1,2,21)), R takes the mean of the vector consisting of 1, 2 and 21. The second case, mean(1,2,21), is equivalent to mean(1, trim=2, na.rm=21): R takes the mean of the single number 1, the value 2 goes to trim, which controls the fraction (0 to 0.5) of observations to be trimmed from each end of the vector before the mean is computed, and the value 21 goes to the na.rm argument, which should be TRUE or FALSE. As you can see, without c the 2 and 21 contribute nothing to the data being averaged.
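A short sketch that checks the equivalence described above and, for contrast, shows trim used with a value inside its documented range. The vector v is illustrative only:

```r
# Positional and named forms are the same call
identical(mean(1, 2, 21), mean(1, trim = 2, na.rm = 21))

# A trim value inside the documented range: drop 25% from each end
v <- c(1, 2, 21, 100)
mean(v, trim = 0.25)   # averages the middle values 2 and 21
```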
