How to do a mathematical sum in R? - r

I have the following mathematical formula that I want to program as efficiently as possible in R.
$\sum_{i=1}^{N}(x_i-\bar x)(y_i-\bar y)$
Let's say we have the following example data:
x = c(1,5,7,10,11)
y = c(2,4,8,9,12)
How can I easily get this sum with this data without making a separate function?
Isn't there a package or a function that can compute these mathematical sums?

Use the sum command and vectorized operations: sum((x-mean(x))*(y-mean(y)))
The key point is that sum simply adds up all the elements of its argument (a vector, a matrix, whatever). Here the argument is a vector expression that is a little more complicated than sum(z), but (x-mean(x))*(y-mean(y)) evaluates to a vector z, so the slightly ornate call doesn't change how the function works. This is true in many places, not just for sum.
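As a quick check with the example data from the question, the one-liner can be run directly. The comparison with cov() is an added cross-check, not part of the original answer: the sample covariance divides this same sum by N - 1, so the two should agree up to floating-point error.

```r
x <- c(1, 5, 7, 10, 11)
y <- c(2, 4, 8, 9, 12)

# Element-wise deviations from the means, multiplied pairwise, then summed
s <- sum((x - mean(x)) * (y - mean(y)))
s

# Cross-check: cov() is this sum divided by (N - 1)
all.equal(s, (length(x) - 1) * cov(x, y))
```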

Related

Why is a for-loop not required to build a function that scales all values in a vector between 0 and 1?

I found this previous thread here:
Range standardization (0 to 1) in R
This leads me to my question: when building a function that performs a calculation across all values in a vector, my understanding was that a for-loop would be necessary (because the calculation is applied to every value in the vector). However, apparently that is not the case. What am I misunderstanding?
All of the basic arithmetic operations in R, and most of the basic numerical/mathematical functions, are natively vectorized; that is, they operate position-by-position on elements of vectors (or pairs of elements from matched vectors). So for example if you compute
c(1,3,5) + c(2,4,7)
You don't need an explicit for loop (although there is one in the underlying C code!) to get c(3,7,12) as the answer.
In addition, R has vector recycling rules; any time you call an operation with two vectors, the shorter automatically gets recycled to match the length of the longer one. R doesn't have scalar types, so a single number is stored as a numeric vector of length 1. So if we compute
(x-min(x))/(max(x)-min(x))
max(x) and min(x) are both of length 1, so the denominator is also of length 1. Subtracting min(x) from x, min(x) gets replicated/recycled to a vector the same length as x, and then the pairwise subtraction is done. Finally, the numerator (length == length(x)) is divided by the denominator (length 1) following similar rules.
In general, exploiting this built-in vectorization in R is much faster than writing your own for-loop, so it's definitely worth trying to get the hang of when operations can be vectorized.
This kind of vectorization is implemented in many higher-level languages/libraries that are specialized for numerical computation (pandas and NumPy in Python, Matlab, Julia, ...).
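The recycling described above can be seen in a small sketch. The vector x is made-up sample data, and the explicit loop is included only to show that it computes the same values as the vectorized expression.

```r
x <- c(1, 5, 7, 10, 11)  # sample data, assumed for illustration

# Element-wise addition of two equal-length vectors, no explicit loop
c(1, 3, 5) + c(2, 4, 7)

# min(x) and max(x) are length-1 vectors that get recycled against x
length(min(x))
scaled <- (x - min(x)) / (max(x) - min(x))
scaled

# The same computation with an explicit loop, for comparison only
scaled_loop <- numeric(length(x))
for (i in seq_along(x)) {
  scaled_loop[i] <- (x[i] - min(x)) / (max(x) - min(x))
}
all.equal(scaled, scaled_loop)
```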

How to use formula to calculate sum of sequence in R?

I wonder whether there is already a formula function ready to use. For example:
sum(seq(10)) = 55
It could use a closed-form formula in n to get the result faster. But I don't know the proper keyword to search for to find out whether such a formula is built in or already exists in an R repository.
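The closed form the question alludes to is presumably the arithmetic series identity $\sum_{i=1}^{n} i = n(n+1)/2$. A minimal sketch in base R follows; the helper name sum_to_n is made up for illustration and is not a built-in function.

```r
# Hypothetical helper: sum of 1..n via the closed form n * (n + 1) / 2
sum_to_n <- function(n) {
  n * (n + 1) / 2
}

sum_to_n(10)

# Agrees with the direct (vectorized) summation from the question
sum_to_n(10) == sum(seq(10))
```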

Handling matrices using Brobdingnag package

I need to build a matrix with extremely small entries.
So far I realized that the fastest way to define the kind of matrix that I need is:
Define a vectorized function of coordinates:
func = function(m,n){...}
Combine every possible coordinate using outer:
matrix = outer(1:100,1:100,FUN=func)
Because I have to deal with extremely small numbers, I work with brob numbers inside func, so its output is also of type brob:
typeof(func(0:100,0:100) )
[1] "S4"
If I pass the two vectors 0:100 directly to func, it returns a vector of brobs, but if I try to use it with outer I get the error:
Error in outer(1:100, 1:100, FUN = func) : invalid first argument
I suppose this is because package Brobdingnag can somehow deal with vectors but not with matrices. Is it right? Is there any way to make it work?

what data type is produced by mapply()?

This is what my code looks like. a, b, c, and d are scalars; e is a list of vectors. A, B, C and D are vectors.
GetOutput=function(a,b,c,d){
  e=FunOther(a,b,c,d)
  i=mean(e$f)
  j=mean(e$g)
  k=abs(mean(e$h))
  return(list(b=b,i=i,j=j,k=k))
}
Output=mapply(GetOutput,A,B,C,D)
GetOutput will return a list of 4 scalars. I want to scale this up to a matrix of inputs and a matrix of outputs. I had been using a for loop but I thought I would try mapply instead.
Suppose A, B, C and D each have length 100. I just want to get a vector of length 100 which gives me all of the i's so that I can calculate their minimum, and then the same for the j's and k's. This is part of a Monte Carlo study. But I am having trouble understanding the Output object. It appears to be a list of lists. What I thought would be a one-liner turns into several operations. The best I can come up with is:
Output2=as.data.frame(t(Output))
OutputMeans=c(mean(as.numeric(Output2$i)),
              mean(as.numeric(Output2$j)),
              mean(as.numeric(Output2$k)))
This seems just bananas to me. I thought I could operate on Output directly with the mean function without having to bother with all of these transformations.
If you had instead written return(c(b=b,i=i,j=j,k=k)), then each result collected by mapply would have been a named vector rather than a list, and you would not have ended up with a list of lists. And since mapply's SIMPLIFY argument (TRUE by default) would then have let it return a matrix, you would have gotten a non-recursive result as well. Because mapply is so versatile, it gives you multiple levels of control over the returned structure.
Another R programming tip: don't use '$' as an extraction function inside functions. If you are sure of your column name you can use [['colname']], and by using '[[' you can later generalize your function to accept column names as arguments, a feature which '$' will not support.
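A minimal sketch of the suggestion above. FunOther is not shown in the question, so the version here is a hypothetical stand-in that just returns a list with elements f, g and h; the input vectors are also made up and shortened to length 4.

```r
# Hypothetical stand-in for FunOther(); the real function is not shown
FunOther <- function(a, b, c, d) {
  list(f = c(a, b), g = c(b, c), h = c(c, -d))
}

GetOutput <- function(a, b, c, d) {
  e <- FunOther(a, b, c, d)
  # Return a named numeric vector instead of a list, using [[ ]] rather than $
  c(b = b,
    i = mean(e[["f"]]),
    j = mean(e[["g"]]),
    k = abs(mean(e[["h"]])))
}

# Made-up inputs
A <- 1:4; B <- 5:8; C <- 9:12; D <- 13:16

# With the default SIMPLIFY = TRUE the result is a numeric matrix:
# one row per named element (b, i, j, k), one column per set of inputs
Output <- mapply(GetOutput, A, B, C, D)

Output["i", ]       # all of the i values
min(Output["i", ])  # their minimum
rowMeans(Output)    # means of b, i, j and k in one call
```

With a plain numeric matrix, the conversion through as.data.frame and as.numeric is no longer needed.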

R curve() on expression involving vector

I'd like to plot a function of x, where x is applied to a vector. Anyway, easiest to give a trivial example:
var <- c(1,2,3)
curve(mean(var)+x)
curve(mean(var+x))
While the first one works, the second one gives errors:
'expr' did not evaluate to an object of length 'n' and
In var + x : longer object length is not a multiple of shorter object length
Basically I want to find the minimum of such a function: e.g.
optimize(function(x) mean(var+x), interval=c(0,1))
And then be able to visualise the result. While the optimize function works, I can't figure out how to get curve() to work as well. Thanks!
The function needs to be vectorized. That means, if it evaluates a vector it has to return a vector of the same length. If you pass any vector to mean the result is always a vector of length 1. Thus, mean is not vectorized. You can use Vectorize:
f <- Vectorize(function(x) mean(var+x))
curve(f,from=0, to=10)
This can be done in the general case using sapply:
curve(sapply(x, function(e) mean(var + e)))
In the specific example you give, mean(var) + x, is of course arithmetically equivalent to what you're looking for. Similar shortcuts might exist for whatever more complicated function you're working with.
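The length mismatch behind the error can be checked without plotting. This sketch reuses var and the function from the question; the test input c(0, 1, 2) is an arbitrary choice for illustration.

```r
var <- c(1, 2, 3)

# Not vectorized: var + x is computed by recycling, then mean()
# collapses the whole result to a single number
g <- function(x) mean(var + x)
length(g(c(0, 1, 2)))

# Vectorize() calls the function once per element of x,
# so the output has the same length as the input
f <- Vectorize(function(x) mean(var + x))
f(c(0, 1, 2))

# Here the result matches the arithmetic shortcut mean(var) + x
all.equal(f(c(0, 1, 2)), mean(var) + c(0, 1, 2))
```

curve() evaluates its expression on a whole grid of x values at once, which is why it needs the second form.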
