dist is an nxn matrix of costs:
dist <- matrix(c(0,3.2,1.2,3.2,0,0.5,1.2,0.5,0),nrow=3,ncol=3)
v is a vector of length n, where the index of the vector corresponds to the row of dist, and the value in the vector corresponds to the column of dist
v <- c(2,2,3)
I want to sum the costs like this:
cost <- 0
for(i in 1:length(v)){
cost <- dist[i,v[i]] + cost
}
but this seems clumsy and slow. What is the trick to doing this without the for loop? Is the for loop not taking advantage of some magical R alternative? Suggestions please!
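Index dist with a two-column matrix built by cbind-ing the row indices with v. Each row of that matrix is a (row, column) pair, so this extracts one value per row, which we then sum: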
sum(dist[cbind(1:nrow(dist), v)])
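As a quick sanity check, the matrix-index form can be compared against the loop from the question on the example data. This is only a sketch; the names cost_loop, idx and cost_vec are illustrative and not part of the original post.

```r
dist <- matrix(c(0, 3.2, 1.2, 3.2, 0, 0.5, 1.2, 0.5, 0), nrow = 3, ncol = 3)
v <- c(2, 2, 3)

# Loop version from the question
cost_loop <- 0
for (i in seq_along(v)) {
  cost_loop <- cost_loop + dist[i, v[i]]
}

# A two-column matrix used as an index picks one (row, column) pair per row
idx <- cbind(seq_len(nrow(dist)), v)
cost_vec <- sum(dist[idx])

all.equal(cost_loop, cost_vec)
```

Both versions read dist[1,2], dist[2,2] and dist[3,3]; the indexed version just does it in a single call.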
Suppose I have a vector of length 10
vec <- c(10,9,8,7,6,5,4,3,2,1)
and I wanted to create a function that takes in a subset length value (say 3) and computes the squared inverse up to that length. I would like to compute:
10+(9/(2^2))+(8/(3^2))
which would be
vec[1]+(vec[2]/(2^2))+(vec[3]/(3^2))
but with a function that can take input of the subset length.
The only solution I can think of is a for loop. Is there a faster, more elegant solution in R?
Yes, you can use the fact that most operations in R are vectorised to do this without a loop:
vec <- c(10,9,8,7,6,5,4,3,2,1)
cum_inverse_square <- function(vec, n) {
sum(vec[1:n] / (1:n)^2)
}
cum_inverse_square(vec, 3) == 10+(9/(2^2))+(8/(3^2)) # TRUE
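One possible refinement, offered as a sketch rather than a required change: 1:n evaluates to c(1, 0) when n is 0, so seq_len(n) is a safer choice if a zero-length subset can ever be requested. The function name cum_inverse_square_safe is made up for this illustration.

```r
vec <- c(10, 9, 8, 7, 6, 5, 4, 3, 2, 1)

# Same computation, but seq_len(0) is an empty vector, so n = 0 gives 0
cum_inverse_square_safe <- function(vec, n) {
  idx <- seq_len(n)
  sum(vec[idx] / idx^2)
}

# all.equal() compares with a tolerance, which is safer than == for doubles
all.equal(cum_inverse_square_safe(vec, 3), 10 + (9 / (2^2)) + (8 / (3^2)))
cum_inverse_square_safe(vec, 0)
```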
How can I create a simple loop to calculate the following equation?
v0 = v * exp(k*d)
where v is a data frame with 17631 rows and 15 variables; every row of v is multiplied by exp(k*d),
k is a column vector containing 15 rate constants, one for each variable,
and d is a vector containing 17631 values, one for each row of v.
Many thanks!
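If you want for loops, you can do it as shown below (assuming d and k are plain numeric vectors). Start from an empty v0 before whichever loop you run:
# for loop by row: row i of v is scaled by exp(d[i]*k), one factor per column
v0 <- NULL
for (i in seq(nrow(v))) {
  v0 <- rbind(v0,v[i,]*exp(d[i]*k))
}
# for loop by column: column j of v is scaled by exp(d*k[j]), one factor per row
v0 <- NULL
for (j in seq(ncol(v))) {
  v0 <- cbind(v0,v[,j]*exp(d*k[j]))
}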
However, the most efficient way is to work with matrices. Instead of a for loop, you can try the code below:
# matrix approach
v0 <- as.matrix(v)*exp(outer(d,k,"*"))
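To see why the outer() form matches the equation, here is a small check on made-up data (4 rows and 3 variables instead of 17631 and 15). The values of v, k and d are invented for illustration, and v0_mat and v0_ref are hypothetical names.

```r
set.seed(1)
v <- as.data.frame(matrix(runif(12), nrow = 4, ncol = 3))  # 4 rows x 3 variables
k <- c(0.1, 0.2, 0.3)   # one rate constant per column of v
d <- c(1, 2, 3, 4)      # one value per row of v

# outer(d, k) is a 4 x 3 matrix whose element [i, j] is d[i] * k[j]
v0_mat <- as.matrix(v) * exp(outer(d, k))

# Element-by-element reference computation
v0_ref <- matrix(NA_real_, nrow(v), ncol(v))
for (i in seq_len(nrow(v))) {
  for (j in seq_len(ncol(v))) {
    v0_ref[i, j] <- v[i, j] * exp(d[i] * k[j])
  }
}

all.equal(unname(v0_mat), v0_ref)
```

Because outer(d, k) has the same shape as v, the multiplication is a plain element-wise product with no recycling to worry about.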
MC is a very large matrix, 1E6 rows (or more) and 500 columns. I am trying to get the number of occurrences of the values 1 through 13 for each of the columns. Sometimes the number of occurrences for one of these values will be zero. I would like my final output to be a 500 x 13 matrix (or data frame) with these count values. Can anyone suggest a more efficient approach than what I currently have, which is the following:
MCct<-matrix(0,500,13)
for (j in 1:500){
for (i in 1:13){
MCct[j,i]<-length(which(MC[,j]==i))}}
I don't think table works, because I also need to know when a value occurs zero times, and I couldn't figure out how to do that, if it is possible. I am only somewhat familiar with apply, so maybe there is a way to use that, but I haven't been successful in figuring it out yet.
Thanks for the help,
Vivien
You could do this with sapply (to iterate over the values 1 to 13) and colSums (to count, for each column of MC, how many entries equal the current value):
MCct <- sapply(1:13, function(i) {
colSums(MC == i)
})
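A small check on a toy matrix may help confirm the shape of the result and the handling of zero counts. The matrix below is invented for illustration (40 rows, 5 columns, with the value 7 deliberately removed); counts is a hypothetical name.

```r
set.seed(42)
MC <- matrix(sample(1:13, 200, replace = TRUE), nrow = 40, ncol = 5)
MC[MC == 7] <- 1L   # force the value 7 to be absent everywhere

# MC == i is a logical matrix; colSums() counts the TRUEs in each column
counts <- sapply(1:13, function(i) colSums(MC == i))

dim(counts)                       # one row per column of MC, one column per value
all(counts[, 7] == 0)             # absent values show up as explicit zeros
all(rowSums(counts) == nrow(MC))  # every entry of MC is counted exactly once
```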
Suppose you have a set of values you're interested in
set <- 1:4
n = length(set)
and you have a matrix that includes those values, and others
m <- matrix(sample(10, 120, TRUE), 12, 10)
Create a vector indicating the index in the set of each matching value
idx <- match(m, set)
then make the index unique to each column
idx <- idx + (col(m) - 1) * n
idx ranges from 1 (occurrences of the first set element in the first column) to n * ncol(m) (occurrence of the nth set element in the last column of m). Tabulate the unique values of idx
v <- tabulate(idx, nbins = n * ncol(m))
The first n elements of v summarize the number of times set elements 1..n appear in the first column of m. The second n elements of v summarize the number of times set elements 1..n appear in the second column of m, etc. Reshape as the desired matrix, where each row represents the corresponding member of the set.
matrix(v, ncol=ncol(m))
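For anyone who wants to convince themselves that the tabulate() route gives the right counts, here is a sketch that repeats the steps on a seeded matrix and compares them with a direct colSums() count. The names by_tabulate and by_colsums are illustrative. It relies on tabulate() ignoring NA entries, which is what match() returns for values outside the set.

```r
set.seed(123)
set <- 1:4
n <- length(set)
m <- matrix(sample(10, 120, TRUE), 12, 10)

# Position within the set, shifted by n for each successive column of m;
# entries of m that are not in the set become NA
idx <- match(m, set) + (col(m) - 1) * n
by_tabulate <- matrix(tabulate(idx, nbins = n * ncol(m)), ncol = ncol(m))

# Direct count: one row per set member, one column per column of m
by_colsums <- t(sapply(set, function(s) colSums(m == s)))

dim(by_tabulate)
all(by_tabulate == by_colsums)
```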
table can count zero occurrences; you just need to create a factor that has the whole range of levels, e.g.
apply(MC, 2, function(x) table(factor(x, levels=1:13)))
This is not as efficient as @Patronus' solution though.
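One thing to watch with the apply/table form is its orientation: each column of the result holds the 13 counts for one column of MC, so it is the transpose of a columns-by-values layout. A small sketch on invented data (the value 7 never occurs); by_table and by_colsums are hypothetical names.

```r
set.seed(7)
MC <- matrix(sample(c(1:6, 8:13), 150, replace = TRUE), nrow = 30, ncol = 5)

# factor(..., levels = 1:13) keeps a level for 7, so table() reports a 0 for it
by_table <- apply(MC, 2, function(x) table(factor(x, levels = 1:13)))

# Direct count for comparison: one row per column of MC, one column per value
by_colsums <- sapply(1:13, function(i) colSums(MC == i))

dim(by_table)                    # values in rows, columns of MC in columns
all(t(by_table) == by_colsums)   # same counts once transposed
all(by_table[7, ] == 0)
```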
I average coordinates stored in a data frame as follows:
sapply(coords[N:M,],mean) # mean of coordinates N to M
I need the average of several sets of coordinates, so I made this loop, which finds the mean of coordinates 1-4, 5-11 and 20-30.
N <- c(1, 5,20)
M <- c(4,11,30)
for ( i in 1:length(N) ) {
  sapply(coords[N[i]:M[i],],mean)
}
How can I vectorize that loop? I've tried to pass a matrix to coords (coords[NM,]), but that doesn't give me what I want.
You can replace your sapply(x, mean) with colMeans(x) for the sake of simplicity and efficiency.
When thinking in vectorized terms, it also helps to combine related variables (N and M) into a single object, here a data frame, whenever that is possible and simple.
N <- data.frame(from=c(1,5,20), to=c(4,11,30))
apply(N, 1, function(x) colMeans(coords[x[1]:x[2],]))
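As an alternative sketch that keeps N and M as the two separate vectors from the question, mapply() can walk over them in parallel. The coords data frame below is invented for illustration (30 rows, columns x, y, z), and means is a hypothetical name.

```r
set.seed(1)
coords <- data.frame(x = runif(30), y = runif(30), z = runif(30))
N <- c(1, 5, 20)
M <- c(4, 11, 30)

# Each call receives one (from, to) pair and returns a vector of column means;
# mapply() binds those vectors together as the columns of a matrix
means <- mapply(function(from, to) colMeans(coords[from:to, ]), N, M)

means   # rows are x, y, z; columns are the ranges 1-4, 5-11 and 20-30
all.equal(means[, 2], colMeans(coords[5:11, ]))
```

Note that this still loops internally over the three ranges; the gain is mainly in readability, since the heavy lifting within each range is done by colMeans().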
I'm stuck with a simple loop that takes more than an hour to run, and need help to speed it up.
Basically, I have a matrix with 31 columns and 400 000 rows. The first 30 columns have values, and the 31st column has a column-number. I need to, per row, retrieve the value in the column indicated by the 31st column.
Example row: [26,354,72,5987..,461,3] (this means that the value in column 3 is sought after (72))
The too slow loop looks like this:
a <- rep(0,nrow(data)) #To pre-allocate memory
for (i in 1:nrow(data)) {
a[i] <- data[i,data[i,31]]
}
I would think this would work:
a <- data[,data[,31]]
... but it results in "Error: cannot allocate vector of size 2.8 Mb".
I fear that this is a really simple question, so I've spent hours trying to understand apply, lapply, reshape, and more, but somehow I can't get a grip on the vectorization concept in R.
The matrix actually has even more columns that also go into the a-parameter, which is why I don't want to rebuild the matrix, or split it.
Your support is highly appreciated!
Chris
t(data[,1:30])[30*(0:399999)+data[,31]]
This works because you can reference matrices both in array format and in vector format (a 400000*30 long vector for the 30 value columns used here), with elements counted column by column. To count row by row, index the transpose instead.
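A scaled-down sketch of the same idea, on an invented 4-row matrix with 3 value columns plus a selector column, checked against the loop from the question. The names vals, nr, nv, a_loop and a_vec are illustrative.

```r
set.seed(2)
vals <- matrix(sample(100, 12), nrow = 4, ncol = 3)  # 4 rows, 3 value columns
data <- cbind(vals, c(2, 3, 1, 3))                   # last column: which column to pick

nr <- nrow(data)
nv <- ncol(data) - 1

# Loop version from the question
a_loop <- numeric(nr)
for (i in seq_len(nr)) a_loop[i] <- data[i, data[i, nv + 1]]

# In t(data[, 1:nv]) each original row becomes a column of length nv,
# so original row r starts right after linear position nv * (r - 1)
a_vec <- t(data[, 1:nv])[nv * (0:(nr - 1)) + data[, nv + 1]]

all.equal(a_loop, a_vec)
```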
Single-index notation for the matrix may use less memory. This would involve doing something like:
i <- nrow(data)*(data[,31]-1) + 1:nrow(data)
a <- data[i]
Below is an example of single-index notation for matrices in R. In this example, the index of the per-row maximum is appended as the last column of a random matrix. This last column is then used to select the per-row maxima via single-index notation.
## create a random (10 x 5) matrix
M <- matrix(rpois(50,50),10,5)
## use the last column to index the maximum value of the first 5
## columns
MM <- cbind(M,apply(M,1,which.max))
## single index = nrow * (column ID - 1) + row ID
i <- nrow(MM)*(MM[,ncol(MM)]-1) + 1:nrow(MM)
all(MM[i] == apply(M,1,max))
Using an index matrix is an alternative that will probably use more memory but is slightly clearer:
ii <- cbind(1:nrow(MM),MM[,ncol(MM)])
all(MM[ii] == apply(M,1,max))
Try to change the code to work a column at a time:
M <- matrix(rpois(30*400000,50),400000,30)
MM <- cbind(M,apply(M,1,which.max))
a <- rep(0,nrow(MM))
for (i in 1:(ncol(MM)-1)) {
a[MM[, ncol(MM)] == i] <- MM[MM[, ncol(MM)] == i, i]
}
For each i, this sets the elements of a whose last-column value is i to the corresponding values from column i. Building the matrix took longer than calculating the vector a.
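As a rough check that the column-at-a-time loop returns the same values as indexing with a two-column (row, column) matrix, here is a sketch on a smaller random matrix (1000 rows instead of 400000). The names last, hit and b are made up for this example.

```r
set.seed(3)
M <- matrix(rpois(30 * 1000, 50), 1000, 30)
MM <- cbind(M, apply(M, 1, which.max))
last <- ncol(MM)

# Column-at-a-time loop: 30 vectorized assignments instead of one per row
a <- numeric(nrow(MM))
for (i in seq_len(last - 1)) {
  hit <- MM[, last] == i      # rows whose selector points at column i
  a[hit] <- MM[hit, i]
}

# Reference: index with a matrix of (row, column) pairs
b <- MM[cbind(seq_len(nrow(MM)), MM[, last])]
all(a == b)
```

The loop runs once per value column rather than once per row, which is why it stays fast even when the number of rows is large.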