Implementing equations in R

I am new to R (and not too good at math) and I am trying to calculate this equation in R, with some difficulty:
X is some integer data I have, with 550 samples.
Any help is appreciated, since I am unsure how to do this. I think I have to use a for loop and the sum() function, but other than that I don't know.

R supports vectorisation, which means you very rarely need to implement for loops.
For example, you can solve your equation like so:
## making up a numeric vector for x here - substitute your own 550 samples
x <- 1:1000
solution <- sum(20/x)^0.5
Unless the brackets mean the square root is applied inside the sum (to each term) rather than to the whole sum? In which case:
solution <- sum((20/x)^0.5)
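For comparison, the explicit for loop you were anticipating looks like this; it gives the same result as the vectorised sum((20/x)^0.5), just slower and more verbose:
# loop version of sum((20/x)^0.5), shown only for comparison
total <- 0
for (i in seq_along(x)) {
  total <- total + (20/x[i])^0.5
}
total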

Related

R: difference between apply(object, 1, function(x) sum(x-a)/b) and rowsums((object-a)/b)

I'm new to R and am struggling with the apply function. It is really slow to execute and I was trying to optimize some code I received.
I am trying to do some matrix operations (element-wise multiplication and division on ~10^6 element matrices) then sum the rows of the resulting matrix. I found the fantastic library Rfast and it executes what I thought was the same code in about 1/30 the time, but I am getting systematic differences between my 'optimized' answer and the previous answer.
The original code was something along the lines of
ans <- apply(object, 1, function(x) sum((x - a) / b))
and my code is
ans = Rfast::rowsums((object - a) / b)
I'm not sure if it's because one of the methods is throwing away precision or making rounding errors - any thoughts?
Edit
Trying to reproduce the error is pretty hard...
I have been able to isolate the discrepancy to the division by my vector b, whose entries are each ~3000 (e.g. [3016.460436, 3021.210321, 3033.3303219]). If I take this term out, the two methods give the same answer.
I then tried two methods to improve my answer, one was dividing b by 1000 then dividing the sum by 1000 at the end. This didn't work, presumably because the float precision is the same either way.
I also tried forcing my b vector to be integers, which also didn't work.
Sample data doesn't reproduce my error either, which is frustrating...
objmat <- rbind(rep(c(1, 0, 0), 1000), rep(c(0, 0, 1), 1000))
amat <- rbind(rep(c(0.064384654, 0.025465132, 0.36543214), 1000))
bmat <- rbind(rep(c(1016.460431, 1021.210431, 1033.330431), 1000))
ans <- apply(objmat, 1, function(x) sum((x - amat) / bmat))
gives
ans[1] = 0.5418828413
rowsums((objmat[1,]-amat)/bmat) = 0.5418828413
I think it has to be a floating point precision error, but I'm not sure why my dummy data doesn't reproduce it, or which method (apply or rowsums) would be more accurate!
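For what it's worth, one plausible mechanism: base R's sum() accumulates in long double precision on most platforms, while compiled row-sum routines typically accumulate in ordinary 64-bit doubles, so the two can round differently even on the same data. A minimal sketch of accumulator width mattering (the values here are invented purely for the demo):
# near 1e16 the spacing between representable doubles is 2,
# so adding 1 is rounded away in plain double arithmetic:
(1e16 + 1) - 1e16        # 0
# base R's sum() uses a long double accumulator on most platforms,
# which can keep the 1 (prints 1 on typical x86 builds of R):
sum(c(1e16, 1, -1e16))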

Limit in R in terms of another variable

I need to take the limit of the function $\frac{x^n-1}{x-1}$ in R, but I want the answer in terms of n. I tried defining n as a symbol, but it did not work. I am new to R, so I would appreciate any assistance.
Assuming that you want to take the limit of this function for x->1, you can obtain the result by using the package Ryacas in the following way:
require(Ryacas)
x <- Sym("x")
n <- Sym("n")
Limit((x^n-1)/(x-1),x,1)
which yields the answer:
expression(n)
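Note that Sym() and Limit() belong to the old (pre-1.0) Ryacas interface. With current versions of Ryacas, a rough equivalent (an untested sketch) passes the expression to yacas directly:
library(Ryacas)
# yacas syntax puts the variable and the point first: Limit(var, point) expr
yac_str("Limit(x, 1) (x^n - 1)/(x - 1)")
# expected to return the string "n"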

Output of parApply different from my input

I am still quite new to R (I used to program in Matlab) and I am trying to use the parallel package to speed up some calculations. Below is an example in which I calculate the rolling standard deviation of a matrix (by column) using the zoo package, with and without parallelising the code. However, the shapes of the outputs come out different.
# load libraries
library('zoo')
library('parallel')
library('snow')
# Data
z <- matrix(runif(1000000,0,1),100,1000)
#This is what I want to calculate with timing
system.time(zz <- rollapply(z,10,sd,by.column=T, fill=NA))
# Trying to achieve the same output with parallel computing
cl<-makeSOCKcluster(4)
clusterEvalQ(cl, library(zoo))
system.time(yy <-parCapply(cl,z,function(x) rollapplyr(x,10,sd,fill=NA)))
stopCluster(cl)
My first output zz has the same dimensions as the input z, whereas the output yy is a vector rather than a matrix. I understand that I can do something like matrix(yy, nrow(z), ncol(z)); however, I would like to know whether I have done something wrong or whether there is a better way of coding this. Thank you.
From the documentation:
parRapply and parCapply always return a vector. If FUN always returns
a scalar result this will be of length the number of rows or columns:
otherwise it will be the concatenation of the returned values.
And:
parRapply and parCapply are parallel row and column apply functions
for a matrix x; they may be slightly more efficient than parApply but
do less post-processing of the result.
So, I'd suggest you use parApply.
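For instance, a sketch under the same setup as above (z already defined, zoo loaded on the workers): parApply over columns keeps the matrix shape, because each call to FUN returns one value per row.
cl <- makeCluster(4)
clusterEvalQ(cl, library(zoo))
# MARGIN = 2 applies FUN to each column; since rollapplyr returns a
# vector of length nrow(z), apply-style post-processing rebuilds a matrix
yy <- parApply(cl, z, 2, function(x) rollapplyr(x, 10, sd, fill = NA))
stopCluster(cl)
identical(dim(yy), dim(z))  # TRUE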

R: conditional expand.grid function

I would like to find all combinations of vector elements that match a specific condition. The function expand.grid returns all possible combinations without checking for any condition. It is possible to test for a condition after calling expand.grid, but in some situations the number of possible combinations is too large to generate them all first. Is there, therefore, a function that lets me check for a condition while the combinations are being generated?
This is a simplified version of the problem:
A <- seq.int(from = 0, to = 12, by = 1) * 15
B <- seq.int(from = 0, to = 27, by = 1) * 23
C <- seq.int(from = 0, to = 18, by = 1) * 18
D <- seq.int(from = 0, to = 33, by = 1) * 10
out <- expand.grid(A, B, C, D)   # out is a data frame with dimensions 235144 x 4
idx <- which(rowSums(out) <= 400 & rowSums(out) >= 300)   # only a small fraction of 'out' is needed
results <- out[idx, ]
In a word, no. After all, if you knew a priori which combinations were desirable/undesirable, you could exclude them from the expansion, e.g. expand.grid(A[A < 20], B[B < 15], ...). In the general case, which I'm assuming is your real question, you have no simple way to exclude portions of the input vectors.
You might just want to write a multilevel loop which tests each combination in turn and saves or rejects it. This will be slow (again, unless you come up with some clever algorithm to predict regions which are all TRUE or FALSE). So, in the long run, you may be better off using some of the R-packages which partition large calculations (and datasets) so as to avoid exceeding your memory limits.
Now that I've said all that, someone's going to post a link to a package which does exactly that :-(
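That said, if memory is the binding constraint, one workaround (a sketch, not a packaged solution) is to expand over all but one variable and loop over the last, filtering each slice before moving on, so the full four-way grid never exists at once:
# expand only A x B x C (13 * 28 * 19 = 6916 rows), then loop over D,
# keeping just the rows that satisfy the condition for each value d
base <- expand.grid(A = A, B = B, C = C)
base.sums <- rowSums(base)
keep <- list()
for (d in D) {
  hit <- base.sums + d >= 300 & base.sums + d <= 400
  if (any(hit)) keep[[length(keep) + 1]] <- cbind(base[hit, ], D = d)
}
results <- do.call(rbind, keep)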

using argmax or something simpler in R

I am trying to set up a Gibbs sampler in R, updating my value at each step.
I have a function in R that I want to maximise over two values: my previous value and a new one.
So I know the maximum outcome of the function applied to the two values, but how do I select the corresponding input without doing it manually? (I need to do a lot of iterations.) Here is an idea of the code and the variables:
g0 <- function(k) sample(0:1, k, replace = TRUE)
This generates a k-dimensional vector with entries 0 or 1 drawn uniformly; it is the initial state of my chain. If entry i is 1, the i-th variable is included in the design matrix.
X1 is my design matrix.
Xg <- function(g){
  Xg <- cbind(X1[,1]*g[1], X1[,2]*g[2], X1[,3]*g[3], X1[,4]*g[4],
              X1[,5]*g[5], X1[,6]*g[6], X1[,7]*g[7])
  return(Xg[, which(!apply(Xg, 2, FUN = function(x) all(x == 0)))])
}
Xg0 <- Xg(g0(7))
This is the reduced design matrix for the initial g.
c <- 1:100000
mp <- function(g){
  mp <- sum((1/(c*(c+1)^-((q+1)/2))) *
              (t(Y)%*%Y - (c/(c+1))*t(Y)%*%Xg(g)%*%solve(t(Xg(g))%*%Xg(g))%*%t(Xg(g))%*%Y)^(-27/2))
  return(mp)
}
this is my function.
Therefore, if I have mp(g) and mp(g*) for two inputs g and g*, such that the maximum is mp(g*), how can I return g*?
Thanks for any help, and if you have any queries just ask. Sorry about the messy code as well; I have not used this site before.
Like this:
inputs <- list(g, g2)
outputs <- sapply(inputs, mp)
best.input <- inputs[[which.max(outputs)]]  # [[ ]] returns the winning input itself
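Within a sampler loop the same pattern carries over; a hypothetical skeleton (propose() is a stand-in for however you generate the candidate g* at each iteration):
g.current <- g0(7)                       # initial 7-dimensional 0/1 state
for (iter in 1:1000) {
  g.star <- propose(g.current)           # your proposal mechanism (placeholder)
  candidates <- list(g.current, g.star)
  g.current <- candidates[[which.max(sapply(candidates, mp))]]
}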
