Simplify an equation by giving some variable a value in R

I want to simplify the equation below and, at the same time, vary the value of b
b*(b*((1-b)*x) + (1-b)*y) + (1-b)*z
So, if I give b = 0.9,
b <- 0.9
# the answer will be:
0.081x + 0.09y + 0.1z
The reason is that I want to see how different values of b affect the weights/coefficients of x, y, and z.
I have no idea how to do this, or whether it is even possible in R.
Any help is appreciated.

I guess you may try Reduce like below
Reduce(function(u, v) b * u + v, (1 - b) * c(x, y, z))
and you will see
> b <- 0.9
> x <- 1e3
> y <- 1e2
> z <- 1e1
> Reduce(function(u, v) b * u + v, (1 - b) * c(x, y, z))
[1] 91
If you want to see the coefficients of x, y and z, you can use
> f <- function(b) (1 - b) * b^((3:1) - 1)
> f(0.9)
[1] 0.081 0.090 0.100
and the sum of weighted x, y, and z can be written as
s = sum(f(0.9)*c(x,y,z))
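If you want to see how the weights change across a whole range of b values, here is a minimal sketch building on the f defined above (the grid of b values is just for illustration):
# weights of x, y and z for several values of b (one row per value of b)
b.grid <- seq(0.1, 0.9, by = 0.2)
weights <- t(sapply(b.grid, f))
rownames(weights) <- b.grid
colnames(weights) <- c("x", "y", "z")
weights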

Related

Efficiently compute a sum based on sequences in R

I'm trying to compute a specific sum in R as quickly as possible. The object of interest is
\sum_{l=1}^{L} \left[ \sum_{k=1}^{K} \sum_{i=0}^{x_{lk}-1} \log(\alpha_{lk} + i) - \sum_{i=0}^{N_l - 1} \log(A_l + i) \right]
and the relevant input objects are two L × K matrices x (contains only positive integers) and alpha (contains only positive real values). A is equivalent to rowSums(alpha) and N is equivalent to rowSums(x). Subscripts l and k denote a row / a column of alpha or x, respectively.
At first I thought it would be easy to come up with something super-quick, but I couldn't find an elegant solution. I think a matrix-valued version of seq() would be very helpful here. Does anyone have a creative solution to implement this efficiently?
Here's an easy-to-read, but obviously inefficient, loop-based version for reference:
# parameters
L = 20
K = 5
# x ... L x K matrix of integers
x = matrix(1 : (L * K), L, K)
# alpha ... L x K matrix of positive real numbers
alpha = matrix(1 : (L * K) / 100, L, K)
# N ... sum over rows of x
N = rowSums(x)
# A ... sum over rows of alpha
A = rowSums(alpha)
# implementation
stacksum = function(x, alpha, N, A){
  # parameters
  K = ncol(x)
  L = nrow(x)
  result = 0
  for(ll in 1:L){
    # first part of sum
    first.sum = 0
    for(kk in 1:K){
      # create sequence
      sequence.k = seq(alpha[ll, kk], (alpha[ll, kk] + x[ll, kk] - 1), 1)
      # take logs and sum
      first.sum = first.sum + sum(log(sequence.k))
    }
    # second part of sum
    second.sum = sum(log(seq(A[ll], (A[ll] + N[ll] - 1), 1)))
    # add to result
    result = result + first.sum - second.sum
  }
  return(result)
}
# test
stacksum(x, alpha, N, A)
Update: added an lgamma solution based on @RobertDodier's comments.
Using sequence and rep.int.
# parameters
L <- 20
K <- 5
# x ... L x K matrix of integers
x <- matrix(1 : (L * K), L, K)
# alpha ... L x K matrix of positive real numbers
alpha <- matrix(1 : (L * K) / 100, L, K)
# N ... sum over rows of x
N <- rowSums(x)
# A ... sum over rows of alpha
A <- rowSums(alpha)
# proposed solution
stacksum2 <- function(x, alpha, N, A) {
  # sequence() coerces its `from` argument to integer, so the fractional parts
  # of alpha and A are added back via rep.int(... %% 1, ...)
  sum(log(sequence(x, alpha) + rep.int(alpha %% 1, x))) -
    sum(log(sequence(N, A) + rep.int(A %% 1, N)))
}
# solution from Robert Dodier's comments
stacksum3 <- function(x, alpha, N, A) {
  sum(lgamma(alpha + x) - lgamma(alpha)) - sum(lgamma(A + N) - lgamma(A))
}
# OP solution
stacksum1 = function(x, alpha, N, A){
  # parameters
  K = ncol(x)
  L = nrow(x)
  result = 0
  for(ll in 1:L){
    # first part of sum
    first.sum = 0
    for(kk in 1:K){
      # create sequence
      sequence.k = seq(alpha[ll, kk], (alpha[ll, kk] + x[ll, kk] - 1), 1)
      # take logs and sum
      first.sum = first.sum + sum(log(sequence.k))
    }
    # second part of sum
    second.sum = sum(log(seq(A[ll], (A[ll] + N[ll] - 1), 1)))
    # add to result
    result = result + first.sum - second.sum
  }
  result
}
res <- list(
  stacksum1(x, alpha, N, A),
  stacksum2(x, alpha, N, A),
  stacksum3(x, alpha, N, A)
)
all.equal(res[1:2], res[-1])
#> [1] TRUE
microbenchmark::microbenchmark(stacksum1 = stacksum1(x, alpha, N, A),
                               stacksum2 = stacksum2(x, alpha, N, A),
                               stacksum3 = stacksum3(x, alpha, N, A),
                               check = "equal")
#> Unit: microseconds
#> expr min lq mean median uq max neval
#> stacksum1 1654.2 1704.60 1899.384 1740.80 1964.75 4234.4 100
#> stacksum2 238.2 246.45 258.284 252.35 268.40 319.4 100
#> stacksum3 18.5 19.05 20.981 20.55 21.70 36.4 100
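For reference, the lgamma version works because each inner sum telescopes into a ratio of gamma functions: \sum_{i=0}^{x-1} \log(\alpha + i) = \log\Gamma(\alpha + x) - \log\Gamma(\alpha). A quick spot check for a single (l, k) cell, using the alpha and x defined above:
ll <- 2; kk <- 1
sum(log(seq(alpha[ll, kk], alpha[ll, kk] + x[ll, kk] - 1, 1)))
lgamma(alpha[ll, kk] + x[ll, kk]) - lgamma(alpha[ll, kk])
# both lines print the same value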

How to calculate integral inside an integral in R?

I need to evaluate an integral in the following form:
\int_a^b f(x) \int_0^x g(t)(x-t)dtdx
Can you please suggest a way? I assume that this integral can't be done with the standard approach suggested in the following answer:
Standard approach
Update: The functions are given in the image below. f(x) basically represents the pdf of a uniform distribution, but g(t) is a bit more complicated. a and b can be any positive real numbers.
The domain of integration is a simplex (triangle) with vertices (a,a), (a,b) and (b,b). Use the SimplicialCubature package:
library(SimplicialCubature)
alpha <- 3
beta <- 4
g <- function(t){
  ((beta/t)^(1/2) + (beta/t)^(3/2)) * exp(-(t/beta + beta/t - 2)/(2*alpha^2)) /
    (2*alpha*beta*sqrt(2*pi))
}
a <- 1
b <- 2
h <- function(tx){
  t <- tx[1]
  x <- tx[2]
  g(t) * (x - t)
}
S <- cbind(c(a, a), c(a, b), c(b, b))
adaptIntegrateSimplex(h, S)
# $integral
# [1] 0.01962547
#
# $estAbsError
# [1] 3.523222e-08
Another way, less efficient and less reliable, is:
InnerFunc <- function(t, x) { g(t) * (x - t) }
InnerIntegral <- Vectorize(function(x) { integrate(InnerFunc, a, x, x = x)$value})
integrate(InnerIntegral, a, b)
# 0.01962547 with absolute error < 2.2e-16
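If f(x) is meant to be the uniform density on [a, b], it can be folded into the nested-integration version explicitly; a minimal sketch, assuming the g, a and b defined above (with a = 1 and b = 2 the density equals 1 on [a, b], so the value is unchanged):
inner <- Vectorize(function(x) integrate(function(t) g(t) * (x - t), a, x)$value)
integrate(function(x) dunif(x, a, b) * inner(x), a, b)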

Same logic but different results from a simple optimization in R

I'm completely baffled by the following simple R code. In the first part x will equal v (that's what I want).
But then, strangely, in the second part I change the input values but follow the exact same logic as in the first part; however, this time x and v no longer match! I'm wondering where the problem is.
First Part:
m1 = 5
m2 = 1.3*m1
A = m1 + m2
x = 5
a <- function(m3){
  abs((m1 - (A + m3)/3)^2 + (1.3*m1 - (A + m3)/3)^2 + (m3 - (A + m3)/3)^2 - 3*x)
}
m3 = optimize(a, interval = c(0, 100), tol = 1e-20)[[1]]
v = var(c(m1, m2, m3))*(2/3) # gives "5" same as "x"
Second Part:
eta.sq = .25
beta = qnorm(c(1e-12, .999999999999))
q = c(0, 25)
mu.sig = solve(cbind(1L, beta), q)
m1 = mu.sig[[1]]
H = (mu.sig[[2]])^2
m2 = 1.3 * m1
A = m1 + m2
x = (H * eta.sq) / (1 - eta.sq) # "x" is: 1.052529
a = function(m3){
  abs((m1 - (A + m3)/3)^2 + (1.3*m1 - (A + m3)/3)^2 + (m3 - (A + m3)/3)^2 - 3*x)
}
m3 = optimize(a, interval = c(0, 100), tol = 1e-20)[[1]]
v = var(c(m1, m2, m3))*(2/3) # "v" is: 2.343749
The difference is that for your first part, the function a has two roots, and the optimize function finds one of them (m3 = 10.31207). At this value of m3, the fact that a(m3) == 0 implies that the centered sum of squares (SS) of m1, m2, and m3 is equal to 3*x:
> a(m3)
[1] 3.348097e-07
> ss <- function(x) { sum((x-mean(x))^2) }
> ss(c(m1, m2, m3))
[1] 15
> 3*x
[1] 15
>
By the definition of the sample variance, the variable v is equal to one-third the SS, so you get v==x.
In contrast, in the second part, your function a has no roots. It attains a minimum at m3 = 14.375, but at this value of m3, the value of a(m3) == 3.87366 is not zero, so the centered sum of squares is not equal to 3*x, and so there's no reason to expect that v (one-third the SS) should equal x.
> a(m3)
[1] 3.87366
> ss(c(m1, m2, m3))
[1] 7.031247 -- actual SS value...
> 3*x
[1] 3.157587 -- ...couldn't be optimized to equal 3*x
>
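One quick way to see the difference is to inspect the objective value that optimize returns, not just the minimizing m3; a minimal check, assuming a is the function from whichever part was last run:
opt <- optimize(a, interval = c(0, 100), tol = 1e-20)
opt$minimum    # the m3 that minimizes a
opt$objective  # ~0 in the first part (a root exists), ~3.87 in the second part (no root)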

solving set of linear equations using R for plane 3D equation

I'm having some trouble solving my set of linear equations.
I have three 3D points (A, B, C) in my example, and I want to automate solving the system. I want to create a plane from these 3 points.
It's very simple to do manually (mathematically), but I don't see why my code doesn't solve the problem...
I have a system of Cartesian equations based on the equation of a plane: ax + by + cz + d = 0
xA*x + yA*y + zA*z + d = 0 # point A
xB*x + yB*y + zB*z + d = 0 # point B
etc.
I use a matrix; for example A = (0,0,1), B = (4,2,3) and C = (-3,1,0).
Solving manually, I get for this example the solution x + 3y - 5z + 5 = 0.
To solve it in R, I wanted to use solve().
A <- c(0,0,1)
B <- c(4,2,3)
C <- c(-3,1,0)
res0 <- c(-d, -d, -d) # I don't know how to set this, so I tried c(0,0,0) since each equation equals 0. But I'm really not sure about that!
#' @param A vector 3x1 with the 3d coordinates of the point A
carteq <- function(A, B, C, res0) {
  matrixtest0 <- matrix(c(A[1], A[2], A[3], B[1], B[2], B[3], C[1], C[2], C[3]), ncol = 3) # I tried to add a 4th column to solve for "d" but that doesn't work.
  # checking the invertibility of my matrix
  out <- tryCatch(determinant(matrixtest0)$modulus < threshold, error = function(e) e) # or out <- tryCatch(solve(X) %*% X, error = function(e) e)
  abcd <- solve(matrixtest0, res0) # returns just 3 values
  abcd <- qr.solve(matrixtest0, res0) # returns just 3 values
}
That's not the right method... but I don't know how I can bring "d" into my problem.
The return that I need is: return(a, b, c, d)
I think my problem is classical and easy, but I can't find a function like solve() or qr.solve() that solves it...
Your solution is actually wrong:
A <- c(0,0,1)
B <- c(4,2,3)
C <- c(-3,1,0)
CrossProduct3D <- function(x, y, i = 1:3) {
  # http://stackoverflow.com/a/21736807/1412059
  To3D <- function(x) head(c(x, rep(0, 3)), 3)
  x <- To3D(x)
  y <- To3D(y)
  Index3D <- function(i) (i - 1) %% 3 + 1
  return(x[Index3D(i + 1)] * y[Index3D(i + 2)] -
         x[Index3D(i + 2)] * y[Index3D(i + 1)])
}
N <- CrossProduct3D(A - B, C - B)
#[1] 4 2 -10
d <- -sum(N * B)
#[1] 10
#test it:
crossprod(A, N) + d
# [,1]
#[1,] 0
crossprod(B, N) + d
# [,1]
#[1,] 0
crossprod(C, N) + d
# [,1]
#[1,] 0
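If you would rather stay with solve(), note that the plane equation is only determined up to a common scale factor, so as long as the plane does not pass through the origin you can fix d = 1 and solve for the remaining three coefficients. A minimal sketch using the A, B, C above (the d = 1 normalization is an assumption, not part of the answer above):
M <- rbind(A, B, C)          # one row of coordinates per point
abc <- solve(M, rep(-1, 3))  # solves a*x + b*y + c*z = -1 for all three points
c(abc, d = 1)                # (0.4, 0.2, -1, 1), proportional to (4, 2, -10, 10) above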

Weighted Pearson's Correlation?

I have a 2396x34 double matrix named y wherein each row (2396) represents a separate situation consisting of 34 consecutive time segments.
I also have a numeric[34] named x that represents a single situation of 34 consecutive time segments.
Currently I am calculating the correlation between each row in y and x like this:
crs[,2] <- cor(t(y),x)
What I need now is to replace the cor function in the above statement with a weighted correlation. The weight vector xy.wt is 34 elements long so that a different weight can be assigned to each of the 34 consecutive time segments.
I found the Weighted Covariance Matrix function cov.wt and thought that if I first scale the data it should work just like the cor function. In fact you can specify for the function to return a correlation matrix as well. Unfortunately it does not seem like I can use it in the same manner because I cannot supply my two variables (x and y) separately.
Does anyone know of a way I can get a weighted correlation in the manner I described without sacrificing much speed?
Edit: Perhaps some mathematical function could be applied to y prior to the cor function in order to get the same results that I'm looking for. Maybe if I multiply each element by xy.wt/sum(xy.wt)?
Edit #2: I found another function, corr, in the boot package.
corr(d, w = rep(1, nrow(d))/nrow(d))
d: A matrix with two columns corresponding to the two variables whose correlation we wish to calculate.
w: A vector of weights to be applied to each pair of observations. The default is equal weights for each pair. Normalization takes place within the function so sum(w) need not equal 1.
This also is not what I need but it is closer.
Edit #3
Here is some code to generate the type of data I am working with:
x<-cumsum(rnorm(34))
y<- t(sapply(1:2396,function(u) cumsum(rnorm(34))))
xy.wt<-1/(34:1)
crs<-cor(t(y),x) #this works but I want to use xy.wt as weight
Unfortunately the accepted answer is wrong when y is a matrix of more than one row. The error is in the line
vy <- rowSums( w * y * y )
We want to multiply the columns of y by w, but this will multiply the rows by the elements of w, recycled as necessary. Thus
> f(x, y[1, , drop = FALSE], xy.wt)
[1] 0.103021
is correct, because in this case the multiplication is performed element-wise, which is equivalent to column-wise multiplication here, but
> f(x, y, xy.wt)[1]
[1] 0.05463575
gives a wrong answer due to the row-wise multiplication.
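The column-major recycling that causes the bug is easy to see on a tiny example:
w <- c(1, 2)
m <- matrix(1, nrow = 2, ncol = 3)
w * m  # w is recycled down each column, so it scales the rows of m, not its columns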
We can correct the function as follows
f2 <- function(x, y, w = rep(1, length(x))) {
  stopifnot(length(x) == dim(y)[2])
  w <- w / sum(w)
  # Center x and y, using the weighted means
  x <- x - sum(x * w)
  ty <- t(y - colSums(t(y) * w))
  # Compute the variance
  vx <- sum(w * x * x)
  vy <- colSums(w * ty * ty)
  # Compute the covariance
  vxy <- colSums(ty * x * w)
  # Compute the correlation
  vxy / sqrt(vx * vy)
}
and check the results against those produced by corr from the boot package:
> res1 <- f2(x, y, xy.wt)
> res2 <- sapply(1:nrow(y),
+ function(i, x, y, w) corr(cbind(x, y[i,]), w = w),
+ x = x, y = y, w = xy.wt)
> all.equal(res1, res2)
[1] TRUE
which in itself gives another way that this problem could be solved.
You can go back to the definition of the correlation.
f <- function(x, y, w = rep(1, length(x))) {
  stopifnot(length(x) == dim(y)[2])
  w <- w / sum(w)
  # Center x and y, using the weighted means
  x <- x - sum(x * w)
  y <- y - apply(t(y) * w, 2, sum)
  # Compute the variance
  vx <- sum(w * x * x)
  vy <- rowSums(w * y * y) # Incorrect: see Heather's remark, in the other answer
  # Compute the covariance
  vxy <- colSums(t(y) * x * w)
  # Compute the correlation
  vxy / sqrt(vx * vy)
}
f(x,y)[1]
cor(x,y[1,]) # Identical
f(x, y, xy.wt)
Here is a generalization to compute the weighted Pearson correlation between two matrices (instead of a vector and a matrix, as in the original question):
matrix.corr <- function(a, b, w = rep(1, nrow(a)) / nrow(a)) {
  # normalize weights
  w <- w / sum(w)
  # center matrices
  a <- sweep(a, 2, colSums(a * w))
  b <- sweep(b, 2, colSums(b * w))
  # compute weighted correlation
  t(w * a) %*% b / sqrt(colSums(w * a^2) %*% t(colSums(w * b^2)))
}
Using the above example and the correlation function from Heather, we can verify it:
> sum(matrix.corr(as.matrix(x, nrow=34),t(y),xy.wt) - f2(x,y,xy.wt))
[1] 1.537507e-15
In terms of calling syntax, this resembles the unweighted cor:
> a <- matrix( c(1,2,3,1,3,2), nrow=3)
> b <- matrix( c(2,3,1,1,7,3,5,2,8,1,10,12), nrow=3)
> matrix.corr(a,b)
[,1] [,2] [,3] [,4]
[1,] -0.5 0.3273268 0.5 0.9386522
[2,] 0.5 0.9819805 -0.5 0.7679882
> cor(a, b)
[,1] [,2] [,3] [,4]
[1,] -0.5 0.3273268 0.5 0.9386522
[2,] 0.5 0.9819805 -0.5 0.7679882
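As one more cross-check, cov.wt (mentioned in the question) can also produce a weighted correlation when applied row by row; a sketch assuming the x, y and xy.wt generated in the question (much slower than f2, but handy for validation):
res3 <- apply(y, 1, function(yi) cov.wt(cbind(x, yi), wt = xy.wt, cor = TRUE)$cor[1, 2])
all.equal(f2(x, y, xy.wt), res3)  # should be TRUE up to floating-point error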

Resources