I have a triple summation like this:
sum(l from 1 to n)
sum(i from 1 to m)
sum(t from 1 to m)
[phil_z1_1[i]*phil_z1_1[t]*I(x[l]<min(y[i],y[t]))]
So far I have:
set.seed(1234567)
x <- rnorm(2900)
n <- length(x)
y <- rnorm(3000)*0.25
m <-length(y)
z1 <- runif(m,min=0,max=1)
z2 <- runif(m,min=0,max=1)
phil_z1_1 <- sqrt(12*(z1/z2))
For min(y[i],y[t]) I have done something like:
y_m<-matrix(rep(y,length(y)),ncol=length(y))
y_m_t<-t(y_m)
y_min<-pmin(y_m_t,y_m)
After expanding the two inner summations, for example with m=2 and n=3,
I can write the original expression in matrix form as x*A*x'
where
x=[phil_z1_1[1] phil_z1_1[2]]
and A is a 2*2 matrix:
A=[sum(l from 1 to n) I(x[l]<=min(y[1],y[1])), sum(l from 1 to n) I(x[l]<=min(y[1],y[2])); sum(l from 1 to n) I(x[l]<=min(y[2],y[1])), sum(l from 1 to n) I(x[l]<=min(y[2],y[2]))]
Therefore,
x*A*x'=[phil_z1_1[1] phil_z1_1[2]]*[sum(l from 1 to n) I(x[l]<=min(y[1],y[1])), sum(l from 1 to n) I(x[l]<=min(y[1],y[2])); sum(l from 1 to n) I(x[l]<=min(y[2],y[1])), sum(l from 1 to n) I(x[l]<=min(y[2],y[2]))]*[phil_z1_1[1] phil_z1_1[2]]'
Basically, I want to create an m*m matrix A in which each element equals the corresponding sum; for example, sum(l from 1 to n) I(x[l]<=min(y[1],y[1])) would be element a11 of the matrix A I want to create.
I have tried to use
args <- expand.grid(l=1:n, i=1:m, t=1:m)
args <- subset(args, x[l] <= pmin(y[i],y[t])-z1[i]*z2[t])
args <- transform(args, result=phil_z1_1[i]*phil_z1_1[t])
sum(args[,"result"])
But R cannot run the above code because the data set is too large (around 3,000 observations).
Can someone tell me how to solve this problem?
Thanks in advance!
Here is a matrix approach for your triple sum
set.seed(1234567)
n <- 10
x <- rnorm(n)
m <- 3000
y <- rnorm(m)/4
y_m <- pmin(matrix(rep(y,m), ncol=m, byrow=TRUE), y)
z1 <- runif(m,min=0,max=1)
z2 <- runif(m,min=0,max=1)
phi <- sqrt(12*(z1/z2))
phi_m <- phi %o% phi
f1 <- function(l) sum(phi_m * (x[l] < y_m))
sum(sapply(1:n, f1))
[1] 242034847337
It is not lightning fast, but much faster than the data.frame approach
f2 <- function(lrng) {
args <- expand.grid(l=lrng, i=1:m, t=1:m)
args <- subset(args, x[l] <= pmin(y[i],y[t]))
args <- transform(args, result=phi[i]*phi[t])
sum(args[,"result"])
}
sum(sapply(1:n, f2)) # 90 times slower
[1] 242034847337
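The quadratic form x*A*x' described in the question can also be built directly, without looping over l. The following is only a sketch under stated assumptions: the sizes n and m are small illustrative values, phi stands in for phil_z1_1, and the strict inequality from the triple sum is used. It relies on findInterval() over the sorted x to count, for every pair (i, t), how many x[l] fall below min(y[i], y[t]).

```r
set.seed(1)
n <- 50                      # illustrative sizes (assumed, smaller than in the question)
m <- 40
x <- rnorm(n)
y <- rnorm(m) / 4
phi <- sqrt(12 * runif(m) / runif(m))   # stand-in for phil_z1_1

# y_min[i, t] = min(y[i], y[t])
y_min <- outer(y, y, pmin)

# A[i, t] = number of x[l] strictly below min(y[i], y[t]);
# with left.open = TRUE, findInterval() counts the sorted values < each entry
A <- matrix(findInterval(y_min, sort(x), left.open = TRUE), nrow = m)

# phi' A phi, i.e. the triple sum written as a quadratic form
quad <- drop(crossprod(phi, A %*% phi))

# brute-force check: one m x m indicator matrix per l
phi_m <- phi %o% phi
brute <- sum(sapply(seq_len(n), function(l) sum(phi_m * (x[l] < y_min))))
all.equal(quad, brute)
```

Because A is computed once, the cost no longer grows with n times m squared; only the sort of x and one m by m matrix are needed.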
I am a novice in R, and my superior requires me to do things a certain way. I need to compute two descriptive statistics: setup count and heavy-dominance setup count. Setup count is the number of setups found within a location, while heavy-dominance setup count is the number of setups within that location whose dominance value for population x is ≥ 50%. This is how I would normally calculate these statistics:
##Normal Approach
#Sample Data 1
v <- c(53, 2, 97) #let vector "v" represent Location 1
w <- c(7, 16, 31, 44, 16) #let vector "w" represent Location 2
#Setup Count
sc_v <- length(v)
sc_w <- length(w)
sc <- c(sc_v, sc_w)
sc
#Heavy-Dominance Setup Count
hd_v <- length(which(v >= 50))
hd_w <- length(which(w >= 50))
hd <- c(hd_v, hd_w)
hd
I am tasked with developing a function that can both determine said statistical values from raw data and concatenate the outputs into a single vector. Here are the working functions I developed:
#Setup Count (2 vectors at a time only)
setup.count <- function(x, y){
a <- length(x)
b <- length(y)
d <- c(a, b)
d
}
#Heavy-Dominance Setup Count (2 vectors at a time only)
heavy.dominance <- function(x, y){
a <- length(which(x >= 50))
b <- length(which(y >= 50))
d <- c(a, b)
d
}
y <- setup.count(v, w)
y
z <- heavy.dominance(v, w)
z
Suppose there are more than two locations:
#Sample Data 2
v <- c(53, 2, 97)
w <- c(7, 16, 31, 44, 16)
x <- c(45, 22, 96, 74) #let vector "x" represent the additional Location 3
How can I tell R to process one argument at a time when multiple arguments are passed through '...'? Here are my failed attempts at revising the functions above, to give an idea:
##Attempt 1
#Setup Count (incorrect v1)
setup.count <- function(x, ...){
data <- list(...)
a <- length(x)
b <- length(data) #will return the number of locations other than x, not the separate number of setups within each of these locations
d <- c(a, b)
d
}
#Heavy-Dominance Setup Count (incorrect v1)
heavy.dominance <- function(x, ...){
data <- list(...)
a <- length(which(x >= 50))
b <- length(which(data >= 50)) #will return the error "'list' object cannot be coerced to type 'double'"
d <- c(a, b)
d
}
y <- setup.count(v, w, x)
y
z <- heavy.dominance(v, w, x)
z
##Attempt 2
#Setup Count (incorrect v2)
setup.count <- function(x, ...){
data <- list(...)
a <- length(x)
b <- length(unlist(data)) #will return the total number of setups in all locations other than x, not as separate values
d <- c(a, b)
d
}
#Heavy-Dominance Setup Count (incorrect v2)
heavy.dominance <- function(x, ...){
data <- list(...)
a <- length(which(x >= 50))
b <- length(which(unlist(data) >= 50)) #will return the total number of setups with dominance ≥ 50% in all locations other than x, not as separate values
d <- c(a, b)
d
}
y <- setup.count(v, w, x)
y
z <- heavy.dominance(v, w, x)
z
You can collect the arguments passed through the ellipsis with list() and use sapply() to loop over the list elements. Adding a type= argument gives you one function for both purposes, and a thresh= argument makes the threshold adjustable.
setup.fun <- function(..., type=c('count', 'dominance'), thresh=50) {
x <- list(...)
type <- match.arg(type)
if (type == 'count') sapply(x, length)
else sapply(x, function(x) length(which(x >= thresh)))
}
setup.fun(v, w, x)
# [1] 3 5 4
setup.fun(v, w, x, type='count')
# [1] 3 5 4
setup.fun(v, w, x, type='dominance')
# [1] 2 0 2
setup.fun(v, w, x, type='d')
# [1] 2 0 2
setup.fun(v, w, x, v)
# [1] 3 5 4 3
setup.fun(v)
# [1] 3
setup.fun(v, w, x, type='dominance', thresh=40)
# [1] 2 1 3
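As a variation on the same idea, here is a minimal sketch that keeps location names in the output. The function name count_setups and the labels Loc1 to Loc3 are hypothetical; it assumes that leaving thresh unset should mean "plain setup count". It uses lengths() and vapply(), which both preserve the names of the list built from '...'.

```r
# Hypothetical helper: setup count when thresh is NULL,
# heavy-dominance count otherwise
count_setups <- function(..., thresh = NULL) {
  locs <- list(...)
  if (is.null(thresh)) return(lengths(locs))
  # sum() of a logical vector counts the TRUE values
  vapply(locs, function(u) sum(u >= thresh), integer(1))
}

v <- c(53, 2, 97)
w <- c(7, 16, 31, 44, 16)
x <- c(45, 22, 96, 74)

sc <- count_setups(Loc1 = v, Loc2 = w, Loc3 = x)
hd <- count_setups(Loc1 = v, Loc2 = w, Loc3 = x, thresh = 50)
sc
hd
```

vapply() is used instead of sapply() so that the result is always an integer vector, even when no arguments are passed.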
I am having problems when saving the results in a for loop.
I am computing a variance (this is not relevant I think) and my code is:
library(dirmult)
n <- 50
p <- 20
size <- 5*p
prob_true <- rep(1/p, p)
multinom <- as.matrix(rmultinom(n, size, prob = prob_true))
zeros <- round(0.5*p*n)
a <- c(as.matrix(multinom))
a[sample(1:(p*n), zeros)] <- 0
data_zeros <- matrix(a, p, n)
dirmult <- dirmult(t(data_zeros))
alpha <- dirmult$gamma
sum_alpha <- (1-dirmult$theta)/dirmult$theta
for (j in ncol(data_zeros)){
A <- alpha/sum_alpha
B <- 1 - A
N <- colSums(data_zeros)
C <- 1 + sum_alpha
var_s_dirm <- list()
var_s_dirm[[j]] <- N[j]*A*B*((N[j]+sum_alpha)/C)
}
In particular, alpha is a vector with 20 values, sum_alpha is a scalar, data_zeros is my dataset with 20 rows and 50 columns, and N is the sum of each column of the dataset, so it is a vector with 50 values.
What I want to do seems very simple:
I want to get a list of 50 vectors, where each one differs from the others only in that it is multiplied by a different value of N.
I really hope that somebody can help me find the error.
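The problem is (probably) twofold: for (j in ncol(data_zeros)) visits only the single value j = 50 instead of 1:50, and the line var_s_dirm <- list() inside the loop clears the list on every pass. The constants do not depend on j either, so they can be set once before the loop.
See if this works for you: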
library(dirmult)
n <- 50
p <- 20
size <- 5*p
prob_true <- rep(1/p, p)
multinom <- as.matrix(rmultinom(n, size, prob = prob_true))
zeros <- round(0.5*p*n)
a <- c(as.matrix(multinom))
a[sample(1:(p*n), zeros)] <- 0
data_zeros <- matrix(a, p, n)
dirmult <- dirmult(t(data_zeros))
alpha <- dirmult$gamma
sum_alpha <- (1-dirmult$theta)/dirmult$theta
A <- alpha/sum_alpha
B <- 1 - A
N <- colSums(data_zeros)
C <- 1 + sum_alpha
var_s_dirm <- list()
for (j in 1:ncol(data_zeros)){
var_s_dirm[[j]] <- N[j]*A*B*((N[j]+sum_alpha)/C)
}
Output:
var_s_dirm
[[1]]
[1] 2.614833 2.327105 2.500483 3.047700 2.233528 2.130223 2.700103 2.869699 2.930213 2.575903 2.198459 2.846096
[13] 2.425448 3.517559 3.136266 2.565345 2.578267 2.763113 2.709707 3.420792
[[2]]
[1] 2.568959 2.286279 2.456615 2.994231 2.194343 2.092850 2.652732 2.819353 2.878806 2.530712 2.159889 2.796165
[13] 2.382897 3.455848 3.081244 2.520339 2.533034 2.714637 2.662168 3.360778
[[3]]
[1] 3.211199 2.857849 3.070769 3.742790 2.742930 2.616064 3.315916 3.524193 3.598509 3.163391 2.699862 3.495207
[13] 2.978622 4.319811 3.851556 3.150424 3.166294 3.393297 3.327711 4.200974
....
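The same list can be built without an explicit loop. The sketch below is not the fitted model from the question: A, sum_alpha and N are stand-in values (assumptions) so that it runs without the dirmult package. It also shows why a loop over a single number visits only one index.

```r
# Stand-in inputs (assumed values, not fitted with dirmult)
set.seed(42)
p <- 20
n <- 50
A <- runif(p)
A <- A / sum(A)                         # plays the role of alpha / sum_alpha
sum_alpha <- 15                         # hypothetical scalar
N <- sample(40:60, n, replace = TRUE)   # plays the role of colSums(data_zeros)
B <- 1 - A
C <- 1 + sum_alpha

# One vector of length p for each column total N[j]
var_list <- lapply(N, function(Nj) Nj * A * B * ((Nj + sum_alpha) / C))

# Why the original loop filled only one slot: a single number is a
# vector of length one, so the loop body runs exactly once
visited <- c()
for (j in n) visited <- c(visited, j)
visited   # contains only the value 50
```

Writing the loop bounds as seq_len(ncol(data_zeros)) or using lapply() as above avoids both problems.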
This is a combination problem. I have 10 shops. I want to find the best 8 shops which minimize the sum of distances from my 100 observations.
From the combination matrix 'test_comb_matrix', I wish to extract combinations of 8 at each iteration of choose(10, 8).
I then apply those indices to the distance matrix 'test_dist_matrix' and record distances. I use pmin() to find the closest shop for each observation, then record the minimum in myminCol.
Below is my code (scroll down for reproducible code). I want to remove the 'a to h' bit.
for(i in 1:nrow(testDat))
{
print(i)
# get indices from combination matrix
a <- test_comb_matrix[1, i]
b <- test_comb_matrix[2, i]
c <- test_comb_matrix[3, i]
d <- test_comb_matrix[4, i]
e <- test_comb_matrix[5, i]
f <- test_comb_matrix[6, i]
g <- test_comb_matrix[7, i]
h <- test_comb_matrix[8, i]
# find the minimum
myminCol <- as.vector(pmin(test_dist_matrix[, a], test_dist_matrix[, b],
test_dist_matrix[, c], test_dist_matrix[, d],
test_dist_matrix[, e], test_dist_matrix[, f],
test_dist_matrix[, g], test_dist_matrix[, h]))
# sum distances
mySum <- sum(myminCol)
testDat[i, 1] <- mySum
}
Reproducible code:
# number of combinations from 10 choose 8
n <- choose(10, 8)
# get combination matrix
test_comb_matrix <- combn(1:10, 8)
# view first 5 combinations
test_comb_matrix[, 1:5]
# create distance matrix for 100 observations and 10 columns
test_dist_matrix <- data.frame(matrix(rnorm(100 * 10), nrow = 100, ncol = 10)) # 100 * 10 draws so the columns are not recycled copies
testDat <- data.frame(matrix(NA, nrow = n, ncol = 1))
names(testDat) <- "min"
Try this, which takes the row-wise minimum over the 8 selected columns for each combination:
result_vec <- sapply(seq_len(nrow(testDat)), function(i)
sum(matrixStats::rowMins(as.matrix(test_dist_matrix[, test_comb_matrix[, i]]))))
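If matrixStats is not available, a base R sketch along the same lines is possible. The names dist_mat, combs, totals and best are illustrative, and the distances are simulated as absolute normal draws (an assumption, so they are non-negative). do.call(pmin, ...) passes the selected columns to pmin() as separate arguments, which replaces the 'a to h' variables.

```r
set.seed(1)
# Simulated distances: 100 observations x 10 shops (assumed data)
dist_mat <- matrix(abs(rnorm(100 * 10)), nrow = 100, ncol = 10)
combs <- combn(10, 8)     # each column is one choice of 8 shops

# For each combination: nearest-shop distance per observation, then the total
totals <- apply(combs, 2, function(idx) {
  sum(do.call(pmin, as.data.frame(dist_mat[, idx])))
})

best <- combs[, which.min(totals)]   # shops in the combination with the smallest total
best
```

which.min() returns the first minimum, so ties between combinations are resolved in favour of the earlier column of combs.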
What is the fastest way to do this summation in R?
This is what I have so far
ans = 0
for (i in 1:dimx[1]){
for (j in 1:dimx[2]){
ans = ans + ((x[i,j] - parameters$mu)^2)/(parameters$omega_2[i]*parameters$sigma_2[j])
}
}
where omega_2, and sigma_2 are omega^2 and sigma^2 respectively.
Nothing fancy:
# sample data
m <- matrix(1:20, 4)
sigma <- 1:ncol(m)
omega <- 1:nrow(m)
mu <- 2
sum(((m - mu) / outer(omega, sigma))^2)
Operations of this kind are usually easy to vectorize. This case is a bit trickier because the matrix is not square (n is not equal to m) and because of the double summation, but here is how we can proceed:
# n = 3, m = 2
xs <- cbind(1:3, 4:6)
omegas <- 1:3
sigmas <- 1:2
mu <- 3
sum((t((xs - mu) / omegas) / sigmas)^2)
# [1] 5
Here recycling is used three times (for mu, omegas and sigmas), and t() makes sure that each column of xs is divided by the matching element of sigmas.
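To check a vectorized version against the original double loop, a small comparison like the following can help. It is a sketch with made-up inputs: x, mu, omega_2 and sigma_2 are illustrative values, with omega_2 and sigma_2 holding the already squared quantities as in the question.

```r
set.seed(7)
x <- matrix(rnorm(12), nrow = 4, ncol = 3)   # illustrative data
mu <- 0.5
omega_2 <- c(1, 2, 3, 4)      # omega^2, one value per row
sigma_2 <- c(0.5, 1, 2)       # sigma^2, one value per column

# Reference: the double loop from the question
loop_sum <- 0
for (i in seq_len(nrow(x))) {
  for (j in seq_len(ncol(x))) {
    loop_sum <- loop_sum + (x[i, j] - mu)^2 / (omega_2[i] * sigma_2[j])
  }
}

# Vectorized: outer() builds the matrix of denominators omega_2[i] * sigma_2[j]
vec_sum <- sum((x - mu)^2 / outer(omega_2, sigma_2))

# Same sum as a bilinear form: (1/omega_2)' (x - mu)^2 (1/sigma_2)
quad_sum <- drop(crossprod(1 / omega_2, (x - mu)^2 %*% (1 / sigma_2)))

all.equal(loop_sum, vec_sum)
all.equal(loop_sum, quad_sum)
```

The bilinear form avoids building the full matrix of denominators, which may matter for large inputs.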
I would like to get a feel for functional programming in R.
To that end, I would like to write the Vandermonde matrix computation, as it can involve a few constructs.
In imperative style that would be:
vandermonde.direct <- function (alpha, n)
{
if (!is.vector(alpha)) stop("argument alpha is not a vector")
if (!is.numeric(alpha)) stop("argument alpha is not a numeric vector")
m <- length(alpha)
V <- matrix(0, nrow = m, ncol = n)
V[, 1] <- rep(1, m)
j <- 2
while (j <= n) {
V[, j] <- alpha^(j - 1)
j <- j + 1
}
return(V)
}
How would you write that elegantly in R in functional style?
The following does not work:
x10 <- runif(10)
n <- 3
Reduce(cbind, aaply(seq_len(n-1),1, function (i) { function (x) {x**i}}), matrix(1,length(x10),1))
It fails with "Error: Results must have one or more dimensions." because the result is a list of functions, one for each i in seq_len(3-1), each mapping x to x**i.
It does not seem very natural to use Reduce for this task.
The error message is caused by aaply, which tries to return an array.
You can use alply instead, but you also need to call your functions somewhere.
Here are a few idiomatic alternatives:
outer( x10, 0:(n-1), `^` )
t(sapply( x10, function(u) u^(0:(n-1)) ))
sapply( 0:(n-1), function(k) x10^k )
Here it is with Reduce:
m <- as.data.frame(Reduce(f=function(left, right) left * x10,
x=1:(n-1), init=rep(1,length(x10)), accumulate=TRUE))
names(m) <- 1:n - 1
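If a plain matrix is preferred over a data frame, the accumulated list can be bound column-wise. This is a minimal sketch with illustrative inputs (x10 and n are chosen here, not taken from the question); it checks the result against the outer() formulation with powers 0 to n-1.

```r
x10 <- c(0.5, 2, 3)   # illustrative input vector
n <- 4                # number of columns, i.e. powers 0 .. n-1

# Each step multiplies the previous column by x10; accumulate = TRUE
# keeps every intermediate result, starting from the column of ones
cols <- Reduce(function(acc, k) acc * x10, seq_len(n - 1),
               accumulate = TRUE, init = rep(1, length(x10)))

V <- do.call(cbind, cols)   # length(x10) x n matrix

all.equal(V, outer(x10, 0:(n - 1), `^`))
```

The second argument k of the reducing function is unused; seq_len(n - 1) only controls how many times the multiplication is applied.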
Here's another option that uses a closure, i.e. a function that remembers the environment in which it was created:
vdm <- function(a)
{
function(i, j) a[i]^(j-1)
}
This will work for arbitrary n (the number of columns).
To create the "Vandermonde functional" for a given a, use this:
v <- vdm(a=c(10,100))
To build a matrix all at once, use this:
> outer(1:3, 1:4, v)
     [,1] [,2]  [,3]  [,4]
[1,]    1   10   100 1e+03
[2,]    1  100 10000 1e+06
[3,]    1   NA    NA    NA
Note that index a[3] is out of bounds, so the third row is NA (except for the first column, which is 1 because NA^0 evaluates to 1 in R).
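To avoid the out-of-bounds row, the row index can be derived from the vector itself. A minimal sketch, assuming the same closure and using seq_along() for the rows:

```r
# Closure returning the (i, j) entry of the Vandermonde matrix of a
vdm <- function(a) {
  function(i, j) a[i]^(j - 1)
}

a <- c(10, 100)

# seq_along(a) yields exactly one row per element of a, so no NA rows appear
V <- outer(seq_along(a), 1:4, vdm(a))
V
```

This works because outer() calls the function once with full index vectors, and a[i]^(j - 1) is vectorized over both.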