Writing a loop in R with a function

Teach me how to create a simple loop to calculate the following equation:
v0 = v * exp(k*d)
where v is a data frame with 17631 rows x 15 variables. Every row of v is multiplied by exp(k*d).
where k is a column vector of 15 rate constants, one for each variable.
where d is a row vector of 17631 values, one for each row of v.
Heartfelt thanks!
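Written element-wise, and assuming (as the question describes) that d holds one value per row of v and k holds one value per column, the computation is:

```latex
v0_{ij} = v_{ij} \exp(k_j d_i), \qquad i = 1, \dots, 17631, \quad j = 1, \dots, 15
```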

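If you want to use for loops, you can do it as below (growing v0 with rbind/cbind on every pass can be slow for 17631 rows):
# for loop by row: row i is multiplied by exp(d[i] * k), one factor per variable
v0 <- NULL
for (i in seq_len(nrow(v))) {
  v0 <- rbind(v0, v[i, ] * exp(d[i] * k))
}
# for loop by column: column j is multiplied by exp(d * k[j])
v0 <- NULL
for (j in seq_len(ncol(v))) {
  v0 <- cbind(v0, v[, j] * exp(d * k[j]))
}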
However, the most efficient way is to work with matrices. Instead of a for loop, you can try the code below:
# matrix approach: outer(d, k) is the 17631 x 15 matrix of d[i] * k[j]
v0 <- as.matrix(v) * exp(outer(d, k, "*"))
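As a quick sanity check, here is a small self-contained sketch with made-up data (5 rows and 3 variables instead of 17631 x 15; the values of v, k and d are arbitrary assumptions) confirming that the outer() version matches a preallocated column loop:

```r
# Made-up stand-ins for the question's data (assumption: 5 x 3 instead of 17631 x 15)
set.seed(1)
v <- as.data.frame(matrix(runif(15), nrow = 5, ncol = 3))
k <- c(0.1, 0.2, 0.3)  # one rate constant per column of v
d <- c(1, 2, 3, 4, 5)  # one value per row of v

# Vectorized: outer(d, k) is the 5 x 3 matrix with entries d[i] * k[j]
v0_outer <- as.matrix(v) * exp(outer(d, k))

# Preallocated loop: scale column j by exp(d * k[j]) in place
v0_loop <- as.matrix(v)
for (j in seq_along(k)) {
  v0_loop[, j] <- v0_loop[, j] * exp(d * k[j])
}

print(all.equal(v0_outer, v0_loop))
```

Filling a preallocated matrix column by column avoids the repeated copying that rbind/cbind inside a loop causes.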

Related

Iterative multiplication of two lists in R

I am trying to multiply the values stored in a list containing 1,000 values by a vector of ages. Ultimately, I want to store the 1,000 resulting rows in a data frame.
I wonder whether it's better to use the lapply function or a for loop here.
list 1
lambdaSamples1 <- lapply(
floor(runif(numSamples, min = 1, max = nrow(mcmcMatrix))),
function(x) mcmcMatrix[x, lambdas[[1]]])
The output is a list of 1,000 different values.
list 2
ager1 <- 14:29
What I want to do is:
for (i in 1:numSamples) {
  assign(paste0("newRow1_", i), 1 - exp(-lambdaSamples1[[i]] * ager1))
}
Now I have 1,000 rows of values that I want to store in a predetermined data frame, outDf_1 (nrow = 1000, ncol = length(ager1)).
I tried
for (i in 1:numSamples) {
  outDf_1[i, ] <- newRow1_i
}
I want to store newRow1_1, ..., newRow1_1000 in the corresponding rows of the 1,000-row outDf_1 data frame.
Should I approach this a different way?
I think you're overcomplicating this a bit. Many operations in R are vectorized, so you shouldn't need lapply or for loops for this. You didn't give us any data to work with, but the code below should do what you want in a more straightforward and faster way.
lambdaSamples1 <- mcmcMatrix[sample(nrow(mcmcMatrix), numSamples, replace = TRUE),
                             lambdas[[1]]]
# numSamples x length(ager1) matrix
outDF_1 <- 1 - exp(-lambdaSamples1 %*% t(ager1))
Just note that this makes outDF_1 a matrix, not a data frame.
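Here is a runnable sketch of the vectorized idea. The mcmcMatrix below is made up: its column named "lambda" stands in for mcmcMatrix[, lambdas[[1]]], and both that column name and the simulated draws are assumptions. It uses outer(), which gives the same numSamples x length(ager1) matrix as the %*% t() form, and converts the result to a data frame in case one is needed:

```r
# Hypothetical stand-in for mcmcMatrix: 5000 posterior draws in a column named "lambda"
set.seed(42)
mcmcMatrix <- cbind(lambda = rexp(5000, rate = 10), other = rnorm(5000))
numSamples <- 1000
ager1 <- 14:29

# Draw numSamples rows with replacement; selecting one column returns a plain vector
lambdaSamples1 <- mcmcMatrix[sample(nrow(mcmcMatrix), numSamples, replace = TRUE), "lambda"]

# outer() gives the numSamples x length(ager1) matrix of lambda[i] * age[j]
outMat <- 1 - exp(-outer(lambdaSamples1, ager1))

# Same numbers via the matrix product used in the answer
outMat2 <- 1 - exp(-lambdaSamples1 %*% t(ager1))

# Convert to a data frame with one column per age, if a data frame is needed
outDf_1 <- as.data.frame(outMat)
names(outDf_1) <- paste0("age", ager1)

print(dim(outDf_1))
```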
To do this for multiple age vectors, you could use a loop to save the resulting matrices in a list:
outDF <- list()
x <- 5  # number of age vectors in ager
for (i in seq_len(x)) {
  lambdaSamples <- mcmcMatrix[sample(nrow(mcmcMatrix), numSamples, replace = TRUE),
                              lambdas[[1]]]
  outDF[[i]] <- 1 - exp(-lambdaSamples %*% t(ager[[i]]))
}
Here, ager1, ..., agerx are expected to be stored in a list (ager).
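The same list can also be built with lapply instead of an explicit loop. This sketch assumes a hypothetical one-column mcmcMatrix (column "lambda") and a made-up list of three age vectors in ager:

```r
# Hypothetical one-column stand-in for mcmcMatrix and a made-up list of age vectors
set.seed(7)
mcmcMatrix <- cbind(lambda = rexp(5000, rate = 10))
numSamples <- 1000
ager <- list(14:29, 30:44, 45:59)

# One matrix per age vector; each call draws its own lambda samples
outDF <- lapply(ager, function(ages) {
  lambdaSamples <- mcmcMatrix[sample(nrow(mcmcMatrix), numSamples, replace = TRUE), "lambda"]
  1 - exp(-outer(lambdaSamples, ages))
})

print(sapply(outDF, dim))
```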

Calculating differences between points in a vector

I'm trying to calculate the difference between all points in a vector of length 10605 in R. For example, I am trying to do this:
for (i in 1:10605){
for (j in 1:10605){
differences[i] = housedata$Mean_household_income[i] - housedata$Mean_household_income[j]
}
}
It takes a very long time to compute, and I suspect there is a faster way to calculate the difference between every pair of points in this vector. Does anyone have any suggestions?
Thanks!
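The dist function should do that. It returns only the lower triangle of the distance matrix, because distance(x,y) == distance(y,x), and its entries are absolute differences |x[i] - x[j]| rather than signed ones. Pass the vector only once (the second argument of dist is method, not a second vector):
my.distances <- dist(housedata$Mean_household_income)
It's going to be faster since the computation is done in C code. Just type
dist
at the console to see that it calls compiled code.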
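To see what dist returns, here is a small sketch on a made-up four-element vector, comparing as.matrix(dist(x)) with the absolute values of the signed differences from outer():

```r
x <- c(10, 4, 7, 1)  # made-up data

d <- dist(x)          # lower triangle only: n * (n - 1) / 2 = 6 values
dm <- as.matrix(d)    # expanded to the full symmetric 4 x 4 matrix

signed <- outer(x, x, "-")  # signed differences x[i] - x[j]

# dist() keeps the magnitudes but loses the sign
print(all.equal(unname(dm), abs(signed)))
```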
You could loop through an incrementally shifted (wrapped) copy of the vector and subtract the two vectors. You still loop over the length of the data once, shifting and subtracting on each pass, but each pass replaces the whole inner loop with one vectorized subtraction, so it should save time.
Here is an example:
# make a shift/wrap function: element k of the result is x[k - offset], wrapping around
shift <- function(x, offset) {
  x[(seq_along(x) - 1 - offset) %% length(x) + 1]
}
# make some data
data <- seq(1, 4)
# make an empty vector to hold the differences
difs <- vector()
# loop through the data; each pass computes x[k] - x[k - i] for all k at once
for (i in seq_along(data)) {
  shifted <- shift(data, i)
  result <- data - shifted
  difs <- c(difs, result)
}
print(difs)
What about using outer? It applies a vectorized function (here -) to all combinations of elements of two vectors and stores the results in a matrix.
For example,
x <- runif(10605)
system.time(
differences <- outer(x, x, '-')
)
takes one second on my computer. Note that the result is a full 10605 x 10605 matrix of doubles, roughly 900 MB.
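On a small made-up vector you can check how outer() lays out the signed differences, and pull out each unordered pair once with upper.tri():

```r
x <- c(3, 8, 5)  # made-up data

# differences[i, j] is x[i] - x[j]
differences <- outer(x, x, "-")
# rows: c(0, -5, -2), c(5, 0, 3), c(2, -3, 0)

# Each unordered pair once (i < j), read column by column from the upper triangle
pair_diffs <- differences[upper.tri(differences)]
print(pair_diffs)  # c(-5, -2, 3)
```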

Avoiding a loop in matrix index

dist is an n x n matrix of costs:
dist <- matrix(c(0, 3.2, 1.2, 3.2, 0, 0.5, 1.2, 0.5, 0), nrow = 3, ncol = 3)
v is a vector of length n, where the index of each element corresponds to a row of dist and its value corresponds to a column of dist:
v <- c(2, 2, 3)
I want to sum the costs like this:
cost <- 0
for(i in 1:length(v)){
cost <- dist[i,v[i]] + cost
}
but this seems clumsy and slow. What is the trick to doing this without the for loop? Is there some magical R alternative I am missing? Suggestions please!
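Use cbind to pair each row index with the corresponding column index from v. Indexing dist with that two-column matrix extracts one element per row, which you then sum:
sum(dist[cbind(seq_len(nrow(dist)), v)])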
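Applied to the question's data, the index-matrix approach gives the same total as the loop. This sketch renames the cost matrix to cost_mat (a naming choice, to avoid masking the stats function dist):

```r
cost_mat <- matrix(c(0, 3.2, 1.2, 3.2, 0, 0.5, 1.2, 0.5, 0), nrow = 3, ncol = 3)
v <- c(2, 2, 3)

# Two-column index matrix: row i paired with column v[i]
idx <- cbind(seq_len(nrow(cost_mat)), v)
picked <- cost_mat[idx]  # one element per row: c(3.2, 0, 0)
cost <- sum(picked)

# Loop from the question, for comparison
cost_loop <- 0
for (i in seq_along(v)) cost_loop <- cost_loop + cost_mat[i, v[i]]

print(c(cost, cost_loop))
```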

Matrix multiplication using a variable element produces the error "non-conformable arguments"

I am a newbie to R, but eager to learn.
I have been trying endlessly to create a matrix with a variable element (in this case [2,2]). The variable element should take the value 4 on the first iteration and 5 on the second (from numbers).
This matrix would be multiplied by another matrix (N0) to produce a result matrix (resul).
So far, I have only been able to create the initial matrix with the variable element using a for loop, but I am having problems indexing the result matrix. I have tried several versions, but this is the latest. Any suggestions would be greatly appreciated. Thank you.
numbers <- c(4, 5)
A <- matrix(c(1,2,3,NA),nrow=2,ncol=2)
resul <- matrix(nrow=2,ncol=1)
for (i in 1:2) {
A[2,2]<- matrix(numbers[i])
N0 <- matrix(c(1,2),nrow=2,ncol=1)
resul[i,]<- A[i,i]%*%N0
}
Your code has two distinct problems. The first is that A[i,i] is a single element, which %*% treats as a 1 x 1 matrix, so you get an error because you are multiplying a 1 x 1 matrix by a 2 x 1 matrix (N0). The second is that the result matrix has to be shaped to hold whichever product you actually want.
You could either drop the subscript [i,i] and initialize the result as a 2 x 2 matrix, storing each 2 x 1 product A %*% N0 in column i:
result <- matrix(nrow = 2, ncol = 2)
for (i in 1:2) {
  A[2, 2] <- numbers[i]
  # a column vector
  N0 <- matrix(c(1, 2),
               nrow = 2,
               ncol = 1)
  # note the index is on the column because A %*% N0 is a column matrix
  result[, i] <- A %*% N0
}
or you could drop the second subscript (use A[i,] instead of A[i,i]) and keep the result as a 2 x 1 matrix, storing the 1 x 1 product A[i,] %*% N0 in row i:
result <- matrix(nrow = 2, ncol = 1)
for (i in 1:2) {
  A[2, 2] <- numbers[i]
  # a column vector
  N0 <- matrix(c(1, 2),
               nrow = 2,
               ncol = 1)
  # A[i, ] is a length-2 vector, so A[i, ] %*% N0 is 1 x 1
  result[i, ] <- A[i, ] %*% N0
}
But it's not clear from your post which (if either) answer is the correct one. Indexing is tricky :)
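If the goal is the first option (one full product A %*% N0 per value in numbers), sapply can replace the loop. This is a sketch assuming numbers <- c(4, 5); product_for is a hypothetical helper name:

```r
numbers <- c(4, 5)
N0 <- matrix(c(1, 2), nrow = 2, ncol = 1)

# Hypothetical helper: build A with the given [2, 2] element and return A %*% N0 as a vector
product_for <- function(a22) {
  A <- matrix(c(1, 2, 3, a22), nrow = 2, ncol = 2)
  drop(A %*% N0)
}

# Column i holds the product for numbers[i]: c(7, 10) and c(7, 12)
result <- sapply(numbers, product_for)
print(result)
```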

Return value from column indicated in same row

I'm stuck with a simple loop that takes more than an hour to run, and need help to speed it up.
Basically, I have a matrix with 31 columns and 400 000 rows. The first 30 columns have values, and the 31st column has a column-number. I need to, per row, retrieve the value in the column indicated by the 31st column.
Example row: [26,354,72,5987..,461,3] (this means the value in column 3, i.e. 72, is wanted)
The too slow loop looks like this:
a <- rep(0,nrow(data)) #To pre-allocate memory
for (i in 1:nrow(data)) {
a[i] <- data[i,data[i,31]]
}
I would think this would work:
a <- data[,data[,31]]
... but it results in "Error: cannot allocate vector of size 2.8 Mb".
I fear that this is a really simple question, so I've spent hours trying to understand apply, lapply, reshape, and more, but somehow I can't get a grip on the vectorization concept in R.
The matrix actually has even more columns that also go into the a-parameter, which is why I don't want to rebuild the matrix, or split it.
Your support is highly appreciated!
Chris
t(data[,1:30])[30*(0:399999)+data[,31]]
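This works because you can index a matrix either with [row, column] pairs or with a single index into the underlying vector, which R stores column by column. Here t(data[,1:30]) is a 30 x 400000 matrix (a vector of length 30*400000); transposing turns each original row into a column, so the value from row r and column c sits at position 30*(r-1) + c.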
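The indexing arithmetic can be checked on a small made-up matrix. This sketch assumes the index column is the last one, computes the number of value columns (p) and rows (n) instead of hard-coding 30 and 400000, and compares the transpose trick with the original loop:

```r
# Made-up stand-in for the question's data: 6 rows, 4 value columns, then an index column
set.seed(3)
vals <- matrix(sample(100, 24), nrow = 6, ncol = 4)
data <- cbind(vals, sample(4, 6, replace = TRUE))

n <- nrow(data)        # 400000 in the question
p <- ncol(data) - 1    # number of value columns (30 in the question)
idx <- data[, p + 1]   # column to pick in each row

# t() turns row r into column r, so element (idx[r], r) sits at position p * (r - 1) + idx[r]
a_t <- t(data[, 1:p])[p * (seq_len(n) - 1) + idx]

# Loop from the question, for comparison
a_loop <- numeric(n)
for (r in seq_len(n)) a_loop[r] <- data[r, idx[r]]

print(all(a_t == a_loop))
```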
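Single-index notation for the matrix may use less memory, because it avoids making a transposed copy. Element (r, c) of a matrix with n rows sits at position n*(c-1) + r, so this would involve doing something like: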
i <- nrow(data)*(data[,31]-1) + 1:nrow(data)
a <- data[i]
Below is an example of single-index notation for matrices in R. In this example, the index of the per-row maximum is appended as the last column of a random matrix. This last column is then used to select the per-row maxima via single-index notation.
## create a random (10 x 5) matrix
M <- matrix(rpois(50,50),10,5)
## use the last column to index the maximum value of the first 5
## columns
MM <- cbind(M,apply(M,1,which.max))
## linear index: nrow * (column - 1) + row
i <- nrow(MM)*(MM[,ncol(MM)]-1) + 1:nrow(MM)
all(MM[i] == apply(M,1,max))
Using an index matrix is an alternative that will probably use more memory but is slightly clearer:
ii <- cbind(1:nrow(MM),MM[,ncol(MM)])
all(MM[ii] == apply(M,1,max))
Try to change the code to work a column at a time:
M <- matrix(rpois(30*400000,50),400000,30)
MM <- cbind(M,apply(M,1,which.max))
a <- rep(0,nrow(MM))
for (i in 1:(ncol(MM)-1)) {
a[MM[, ncol(MM)] == i] <- MM[MM[, ncol(MM)] == i, i]
}
This sets the elements of a to the values from column i wherever the last column equals i. It took longer to build the matrix than to calculate the vector a.
