Speeding up this tricky matrix calculation - r

As of now I am computing some features from a large matrix and doing it all in a for-loop. As expected it's very slow. I have been able to vectorize part of the code, but I'm stuck on one part.
I would greatly appreciate some advice/help!
s1 <- MyMatrix #dim = c(5167,256)
fr <- MyVector #vector of length 256
tw <- 5
fw <- 6
# For each point S(t,f) we need the sub-matrix of points S_hat(i,j),
# i in [t - tw, t + tw], j in [f - fw, f + fw] for the feature vector.
# To avoid edge effects, I pad the original matrix with zeros,
# resulting in a matrix of size nobs+2*tw x nfreqs+2*fw
nobs <- dim(s1)[1] #note: this is 5167
nf <- dim(s1)[2] #note: this is 256
sp <- matrix(0, nobs+2*tw, nf+2*fw)
t1 <- tw+1; tn <- nobs+tw
f1 <- fw+1; fn <- nf+fw
sp[t1:tn, f1:fn] <- s1 # embed the actual matrix into the padding
nfeatures <- 1 + (2*tw+1)*(2*fw+1) + 1
fsp <- array(NaN, c(dim(sp),nfeatures))
for (t in t1:tn){
for (f in f1:fn){
fsp[t,f,1] <- fr[(f - f1 + 1)] #this part I can vectorize
fsp[t,f,2:(nfeatures-1)] <- as.vector(sp[(t-tw):(t+tw),(f-fw):(f+fw)]) #this line is the problem
fsp[t,f,nfeatures] <- var(fsp[t,f,2:(nfeatures-1)])
}
}
fsp[t1:tn, f1:fn, 1] <- t(matrix(rep(fr,(tn-t1+1)),ncol=(tn-t1+1)))
#vectorized version of the first feature ^
return(fsp[t1:tn, f1:fn, ]) #this is the returned matrix

I assume that the var feature will be easy to vectorize after the 2nd feature is vectorized
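One possible way to vectorize the problem line, sketched here on small made-up data rather than the real 5167 x 256 matrix: loop over the (2*tw+1)*(2*fw+1) window offsets instead of over every point, and copy one shifted block of the padded matrix per offset. The names rows, cols, out, win and m are illustrative assumptions; out plays the role of fsp[t1:tn, f1:fn, ] from the question.

```r
# Small hypothetical stand-ins for MyMatrix and MyVector
set.seed(1)
nobs <- 8; nf <- 6; tw <- 2; fw <- 1
s1 <- matrix(rnorm(nobs * nf), nobs, nf)
fr <- seq_len(nf)

# Zero-padded copy of s1, as in the question
sp <- matrix(0, nobs + 2 * tw, nf + 2 * fw)
rows <- tw + seq_len(nobs)   # same as t1:tn
cols <- fw + seq_len(nf)     # same as f1:fn
sp[rows, cols] <- s1

nwin <- (2 * tw + 1) * (2 * fw + 1)
out <- array(NaN, c(nobs, nf, nwin + 2))

# Feature 1: the frequency value, repeated down every row
out[, , 1] <- matrix(fr, nobs, nf, byrow = TRUE)

# Features 2..(nwin + 1): one shifted block per window offset.
# dj is the outer loop and di the inner loop so that the order matches
# as.vector() on the sub-matrix, which is column-major.
k <- 2
for (dj in -fw:fw) {
  for (di in -tw:tw) {
    out[, , k] <- sp[rows + di, cols + dj]
    k <- k + 1
  }
}

# Last feature: sample variance over the window, computed from sums
win <- out[, , 2:(nwin + 1), drop = FALSE]
m <- rowMeans(win, dims = 2)
out[, , nwin + 2] <- (rowSums(win^2, dims = 2) - nwin * m^2) / (nwin - 1)
```

With tw = 5 and fw = 6 this would run 143 block assignments instead of about 1.3 million scalar ones. The sum-of-squares form of the variance can lose precision when values are large relative to their spread, so it is worth comparing against var() on a few points.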

Related

Remove a vector from another vector

I would like to remove from the vector wine below the vector b=c(1,0).
The result should be d=c(1,1,0).
library(gtools)
wine=c(1,1,1,0,0)
x=combinations(5,2,v=wine,set=FALSE,repeats.allowed=FALSE)
y=matrix(NA,nrow(x),3)
I want to find the complementary matrix y of x.
Thanks for your time.
The following uses a function I have posted here. The function finds where in y the vector x occurs returning an index vector into y.
First, get where b occurs in wine. Then the location is used to remove the found vector.
occurs <- function(x, y) {
m <- length(x)
n <- length(y)
candidate <- seq.int(length = n - m + 1L)
for (i in seq.int(length = m)) {
candidate <- candidate[x[i] == y[candidate + i - 1L]]
}
candidate
}
wine <- c(1,1,1,0,0)
b <- c(1,0)
i <- occurs(b, wine)
d <- wine[-(i + seq(b) - 1L)]
d
#[1] 1 1 0
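One caveat with the removal step, shown as a hedged sketch: if b does not occur in the vector, occurs() returns an empty index, and negative indexing with an empty index returns an empty vector instead of the original one. The wrapper remove_first below is a hypothetical helper, not part of the answer; it also restricts the removal to the first match.

```r
occurs <- function(x, y) {
  m <- length(x)
  n <- length(y)
  if (m > n) return(integer(0))   # x cannot fit inside y
  candidate <- seq_len(n - m + 1L)
  for (i in seq_len(m)) {
    candidate <- candidate[x[i] == y[candidate + i - 1L]]
  }
  candidate
}

# Hypothetical helper: drop the first occurrence of b from y, if any
remove_first <- function(b, y) {
  i <- occurs(b, y)
  if (length(i) == 0L) return(y)          # nothing to remove
  y[-(i[1] + seq_along(b) - 1L)]
}

wine <- c(1, 1, 1, 0, 0)
remove_first(c(1, 0), wine)   # 1 1 0
remove_first(c(0, 1), wine)   # unchanged: 1 1 1 0 0
```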

Is it possible to use vector math in R for a summation involving intervals?

Title's a little rough, open to suggestions to improve.
I'm trying to calculate time-average covariances for a 500 length vector.
This is the equation we're using
The result I'm hoping for is a vector with an entry for k from 0 to 500 (0 would just be the variance of the whole set).
I've started with something like this, but I know I'll need to reference the gap (i) in the first mean comparison as well:
x <- rnorm(500)
xMean <-mean(x)
i <- seq(1, 500)
dfGam <- data.frame(i)
dfGam$gamma <- (1/(500-dfGam$i))*(sum((x-xMean)*(x[-dfGam$i]-xMean)))
Is it possible to do this using vector math or will I need to use some sort of for loop?
Here's the for loop that I've come up with for the solution:
gamma_func <- function(input_vec) {
output_vec <- c()
input_mean <- mean(input_vec)
iter <- seq(1, length(input_vec)-1)
for(val in iter){
iter2 <- seq((val+1), length(input_vec))
gamma_sum <- 0
for(val2 in iter2){
gamma_sum <- gamma_sum + (input_vec[val2]-input_mean)*(input_vec[val2-val]-input_mean)
}
output_vec[val] <- (1/length(iter2))*gamma_sum
}
return(output_vec)
}
Thanks
Using data.table, mostly for the shift function to make x_{t - k}, you can do this:
library(data.table)
gammabar <- function(k, x){
xbar <- mean(x)
n <- length(x)
df <- data.table(xt = x, xtk = shift(x, k))[!is.na(xtk)]
df[, sum((xt - xbar)*(xtk - xbar))/n]
}
gammabar(k = 10, x)
# [1] -0.1553118
The filter [!is.na(xtk)] starts the sum at t = k + 1, because xtk will be NA for the first k indices due to being shifted by k.
Reproducible x
x <- c(0.376972124936433, 0.301548373935665, -1.0980231706536, -1.13040590360378,
-2.79653431987176, 0.720573498411587, 0.93912102300901, -0.229377746707471,
1.75913134696347, 0.117366786802848, -0.853122822287008, 0.909259181618213,
1.19637295955276, -0.371583903741348, -0.123260233287436, 1.80004311672545,
1.70399587729432, -3.03876460529759, -2.28897494991878, 0.0583034949929225,
2.17436525195634, 1.09818265352131, 0.318220322390854, -0.0731475581637693,
0.834268741278827, 0.198750636733429, 1.29784138432631, 0.936718306241348,
-0.147433193833294, 0.110431994640128, -0.812504663900505, -0.743702167768748,
1.09534507180741, 2.43537370755095, 0.38811846676708, 0.290627670295127,
-0.285598287083935, 0.0760147178373681, -0.560298603759627, 0.447188372143361,
0.908501134499943, -0.505059597708343, -0.301004012157305, -0.726035976548133,
-1.18007702699501, 0.253074712637114, -0.370711296884049, 0.0221795637601637,
0.660044122429767, 0.48879363533552)
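To get the whole vector of lags in one call without data.table, a base-R sketch along the same lines could look like this. The function name gamma_vec and the divide_by_n switch are assumptions for illustration: the answer above divides by n, while the loop in the question divides by n - k.

```r
gamma_vec <- function(x, max_lag = length(x) - 1L, divide_by_n = TRUE) {
  n <- length(x)
  d <- x - mean(x)                       # centre once, reuse for every lag
  vapply(0:max_lag, function(k) {
    s <- sum(d[(k + 1):n] * d[1:(n - k)])  # sum over t = k + 1, ..., n
    if (divide_by_n) s / n else s / (n - k)
  }, numeric(1))
}

set.seed(42)
x <- rnorm(500)
g_n  <- gamma_vec(x)                       # lags 0..499, divisor n
g_nk <- gamma_vec(x, divide_by_n = FALSE)  # divisor n - k, as in the question's loop
```

The first element is lag 0. With the n divisor, the result should agree with acf(x, lag.max = length(x) - 1, type = "covariance", plot = FALSE)$acf from the stats package, which is a convenient cross-check.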

Can't get an R loop to execute

I am just starting out in R and trying to get this loop to execute. The loop should calculate consecutive distances between coordinates using a function (Vincenty's formula). 'Distfunc' is the file containing the function, which is then called to compute 'x' below. All I want is a data frame or a list of the distances between coordinates. Grateful for any help!
Distfunc <- source("F://Distfunc.R")
for (i in length(Radians)) {
LatRad1 <- Radians[i,1]
LongRad1 <- Radians[i,2]
LatRad2 <- Radians[i+1,1]
LongRad2 <- Radians[i+1,2]
x <- gcd.vif(LongRad1, LatRad1, LongRad2, LatRad2)
print(data.frame(x[i]))
}
Well, without a good description of the problem you are facing and a proper reproducible example it is very difficult to provide any good insight. To start off, see How to make a great R reproducible example?.
There are many things that are not clear in the way you are doing things. First of all, why assign the results of source(...) to the variable Distfunc?
Anyways, here is some code that I put together in trying to understand this; it runs without problems, but it is not clear that it accomplishes what you expect (since you don't provide much information). In particular, the code uses the implementation of the function gcd.vif by Mario Pineda-Krch (http://www.r-bloggers.com/great-circle-distance-calculations-in-r/). The code below is aimed at clarity, since you mention that you are starting in R.
# Calculates the geodesic distance between two points specified by radian latitude/longitude using
# Vincenty inverse formula for ellipsoids (vif)
# By Mario Pineda-Krch (http://www.r-bloggers.com/great-circle-distance-calculations-in-r/)
gcd.vif <- function(long1, lat1, long2, lat2) {
# WGS-84 ellipsoid parameters
a <- 6378137 # length of major axis of the ellipsoid (radius at equator)
b <- 6356752.314245 # length of minor axis of the ellipsoid (radius at the poles)
f <- 1/298.257223563 # flattening of the ellipsoid
L <- long2-long1 # difference in longitude
U1 <- atan((1-f) * tan(lat1)) # reduced latitude
U2 <- atan((1-f) * tan(lat2)) # reduced latitude
sinU1 <- sin(U1)
cosU1 <- cos(U1)
sinU2 <- sin(U2)
cosU2 <- cos(U2)
cosSqAlpha <- NULL
sinSigma <- NULL
cosSigma <- NULL
cos2SigmaM <- NULL
sigma <- NULL
lambda <- L
lambdaP <- 0
iterLimit <- 100
while (abs(lambda-lambdaP) > 1e-12 & iterLimit>0) {
sinLambda <- sin(lambda)
cosLambda <- cos(lambda)
sinSigma <- sqrt( (cosU2*sinLambda) * (cosU2*sinLambda) +
(cosU1*sinU2-sinU1*cosU2*cosLambda) * (cosU1*sinU2-sinU1*cosU2*cosLambda) )
if (sinSigma==0) return(0)  # Co-incident points
cosSigma <- sinU1*sinU2 + cosU1*cosU2*cosLambda
sigma <- atan2(sinSigma, cosSigma)
sinAlpha <- cosU1 * cosU2 * sinLambda / sinSigma
cosSqAlpha <- 1 - sinAlpha*sinAlpha
cos2SigmaM <- cosSigma - 2*sinU1*sinU2/cosSqAlpha
if (is.na(cos2SigmaM)) cos2SigmaM <- 0  # Equatorial line: cosSqAlpha=0
C <- f/16*cosSqAlpha*(4+f*(4-3*cosSqAlpha))
lambdaP <- lambda
lambda <- L + (1-C) * f * sinAlpha *
(sigma + C*sinSigma*(cos2SigmaM+C*cosSigma*(-1+2*cos2SigmaM*cos2SigmaM)))
iterLimit <- iterLimit - 1
}
if (iterLimit==0) return(NA)  # formula failed to converge
uSq <- cosSqAlpha * (a*a - b*b) / (b*b)
A <- 1 + uSq/16384*(4096+uSq*(-768+uSq*(320-175*uSq)))
B <- uSq/1024 * (256+uSq*(-128+uSq*(74-47*uSq)))
deltaSigma = B*sinSigma*(cos2SigmaM+B/4*(cosSigma*(-1+2*cos2SigmaM^2) -
B/6*cos2SigmaM*(-3+4*sinSigma^2)*(-3+4*cos2SigmaM^2)))
s <- b*A*(sigma-deltaSigma) / 1000
return(s) # Distance in km
}
# Initialize the variable 'Radians' with random data
Radians <- matrix(runif(20, min = 0, max = 2 * pi), ncol = 2)
lst <- list() # temporary list to store the results
for (i in seq(1, nrow(Radians) - 1)) { # loop through each row of the 'Radians' matrix
LatRad1 <- Radians[i, 1]
LongRad1 <- Radians[i, 2]
LatRad2 <- Radians[i + 1, 1]
LongRad2 <- Radians[i + 1, 2]
gcd_vif <- gcd.vif(LongRad1, LatRad1, LongRad2, LatRad2)
# Store the input data and the results
lst[[i]] <- c(
latitude_position_1 = LatRad1,
longitude_position_1 = LongRad1,
latitude_position_2 = LatRad2,
longitude_position_2 = LongRad2,
GCD = gcd_vif
)
}
Results <- as.data.frame(do.call(rbind, lst)) # store the input data and the results in a data frame
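A likely reason the original loop appeared not to execute, shown as a small sketch on made-up data: for (i in length(Radians)) iterates over a single value, the total number of elements, not over the rows. The variables visited and idx are illustrative only.

```r
# Hypothetical 10 x 2 matrix standing in for the real 'Radians'
Radians <- matrix((1:20) / 10, ncol = 2)

length(Radians)   # 20: number of elements, not number of rows
nrow(Radians)     # 10

# The original loop header visits exactly one value of i
visited <- c()
for (i in length(Radians)) visited <- c(visited, i)
visited           # 20, so Radians[i, 1] would be out of bounds

# Index sequence for consecutive pairs (row i and row i + 1)
idx <- seq_len(nrow(Radians) - 1)
idx               # 1 2 3 4 5 6 7 8 9
```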

non-numeric argument to binary operator, AR(1) model

I have an exercise to do where I have to run the following AR(1) model:
$x_i = c + \phi x_{i-1} + \eta_i \quad (i = 1, \dots, T)$
I know that $\eta_i \sim N(0, 1)$; $x_0 \sim N\left(c/(1-\phi),\ 1/(1-\phi^2)\right)$; $c = 2$; $\phi = 0.6$
I am trying to do a for loop. My code is the following:
n <- rnorm(T, 0, 1)
c <- 2
phi <- 0.6
x_0 <- rnorm(1,c/(1-phi), 1/(1-phi**2))
v <- vector("numeric", 0)
for (i in 2:T){
name <- paste("x", i, sep="_")
v <- c(v,name)
v[1] <- c + phi*x_0 + n[1]
v[i] <- c + phi*v[i-1] + n[i]
}
However, I keep getting this error:
Error in phi * v[i - 1] : non-numeric argument to binary operator
I understand what this error is, but I can't find any solutions to solve it. Could someone please enlighten me? How could I assign numeric values to the name vector?
Thank you!
You're defining v as a numeric vector, but then v <- c(v, name) turns v into a character vector since name is character. That's what's causing the error.
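The coercion can be reproduced in a few lines. This is only a minimal illustration; the value "x_2" and the variable res are made up for the demo.

```r
v <- vector("numeric", 0)
class(v)              # "numeric"

# Combining with a character value coerces the whole vector
v <- c(v, "x_2")
class(v)              # "character"

# Arithmetic on a character element now fails; capture the message
res <- tryCatch(0.6 * v[1], error = function(e) conditionMessage(e))
res                   # "non-numeric argument to binary operator"
```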
If I'm not mistaken, your intent is to assign names to the values in a numeric vector. That's fine, you just need a different approach.
t <- 10 # number of time steps (example value; use your own T)
n <- rnorm(t)
c <- 2
phi <- 0.6
x_0 <- rnorm(1, c/(1-phi), 1/(1-phi^2))
v <- c + phi*x_0 + n[1]
for (i in 2:t) {
v[i] <- c + phi*v[i-1] + n[i]
}
names(v) <- paste("x", 1:t, sep="_")
Vectors in R don't have a static size; they're dynamically resized as needed. So even though we're initializing v with a scalar value, it grows to fit each new value in the loop.
The final step is to give v a list of names. This can be accomplished using names(v) <-. Take a look at v now--it has names!
And as an aside, since T is a synonym for TRUE in R, it's best not to use T as a variable name. Thus I've used t here instead.
I think you need something like the following. It produces 11 elements, including the initial x value, which you can drop afterwards.
set.seed(1237)
t <- 10
n <- rnorm(t, 0, 1)
c <- 2
phi <- 0.6
x0 <- rnorm(1, c/(1-phi), 1/(1-phi**2))
v <- c(x0, rep(0, t))
for(i in 2:length(v)) {
v[i] <- c + phi * v[i-1] + n[i-1]
}
v
[1] 4.967833 4.535847 2.748292 2.792992 5.389548 6.173001 4.526824 3.790483 4.307981 5.442913 4.958193
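For comparison, here is a sketch that wraps the same recursion in a function and preallocates the result. Two assumptions that are not in the original posts: the second parameter of the N(., .) for x0 is read as a variance, so its square root is passed to rnorm (which expects a standard deviation), and the names simulate_ar1, const and eta are made up for this example.

```r
simulate_ar1 <- function(n_steps, const = 2, phi = 0.6, eta = rnorm(n_steps)) {
  force(eta)  # draw the noise before x0 so the argument is evaluated up front
  # Assumption: 1 / (1 - phi^2) is a variance, hence the sqrt()
  x0 <- rnorm(1, mean = const / (1 - phi), sd = sqrt(1 / (1 - phi^2)))
  v <- numeric(n_steps)   # preallocate instead of growing the vector
  prev <- x0
  for (i in seq_len(n_steps)) {
    v[i] <- const + phi * prev + eta[i]
    prev <- v[i]
  }
  names(v) <- paste("x", seq_len(n_steps), sep = "_")
  list(x0 = x0, x = v, eta = eta)
}

set.seed(1237)
sim <- simulate_ar1(10)
sim$x   # named vector x_1, ..., x_10
```

If the problem statement means a standard deviation instead, drop the sqrt() to match the code in the answers above.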

R: Generate matrix from function

In R I'm interested in the general case to generate a matrix from a formula such as:
X = some other matrix
Y(i, j) = X(i, j) + Y(i - 1, j - 1)
Unfortunately I can't find how to account for the matrix self-referencing.
Obviously order of execution and bounds checking are factors here, but I imagine these could be accounted for by the matrix orientation and formula respectively.
Thanks.
This solution assumes that you want Y[1,n] == X[1,n] and Y[n,1] == X[n,1]. If not, you can apply the same solution on the sub-matrix X[-1,-1] to fill in the values of Y[-1,-1]. It also assumes that the input matrix is square.
We use the fact that Y[N,N] = X[N,N] + X[N-1, N-1] + ... + X[1,1] plus similar relations for off-diagonal elements. Note that off-diagonal elements are a diagonal of a particular sub-matrix.
# Example input
X <- matrix(1:16, ncol=4)
Y <- matrix(0, ncol=ncol(X), nrow=nrow(X))
diag(Y) <- cumsum(diag(X))
Y[1,ncol(X)] <- X[1,ncol(X)]
Y[nrow(X),1] <- X[nrow(X),1]
for (i in 1:(nrow(X)-2)) {
ind <- seq(i)
diag(Y[-ind,]) <- cumsum(diag(X[-ind,])) # lower triangle
diag(Y[,-ind]) <- cumsum(diag(X[,-ind])) # upper triangle
}
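The same cumulative-sum-along-diagonals idea can be sketched without the explicit loop by grouping cells on row(X) - col(X), which is constant along each diagonal. This assumes the same boundary condition as above (the first row and first column of Y equal those of X); it is one possible formulation, not the answer's own code.

```r
X <- matrix(1:16, ncol = 4)

# Each diagonal has a constant value of row - col; within a diagonal,
# cells are visited from top-left to bottom-right, so cumsum() gives
# Y[i, j] = X[i, j] + Y[i - 1, j - 1].
Y <- ave(X, row(X) - col(X), FUN = cumsum)

Y[2, 2]   # X[2, 2] + X[1, 1] = 6 + 1 = 7
Y[4, 4]   # 1 + 6 + 11 + 16 = 34
```

Because the grouping does not depend on the matrix being square, this form should also work for rectangular input.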
Well, you can always use a for loop:
Y <- matrix(0, ncol=3, nrow=3)
#boundary values:
Y[1,] <- 1
Y[,1] <- 2
X <- matrix(1:9, ncol=3)
for (i in 2:nrow(Y)) {
for (j in 2:ncol(Y)) {
Y[i, j] <- X[i, j] + Y[i-1, j-1]
}
}
If that is too slow you can translate it to C++ (using Rcpp) easily.
