R Nested For Loop for a Sensitivity Analysis - r

I'm fairly new to R and I've been trying for a while to do something, which I assumed to be very simple, but I keep failing at it (unfortunately for me, it doesn't mean it's not simple!).
I have defined a function that takes a time series as an input and outputs a single value (as a vector) at the end. The function has two parameters (from now on n and m) affecting the output, so it looks like this:
fnc <- function(x, n, m)
My goal is to store/see (possibly in a matrix?) the output while changing n and m (so, a basic sensitivity analysis, if this makes sense?).
My attempts were to create an empty matrix, run a nested for loop for several n and m values and fill in the matrix accordingly.
So, something like this (I'm aware that the code below gives an error, as i,j values would end up being out of bounds the 3x3 matrix, but it is just one of the illogical-trial I did):
n_lens = c(750, 1000, 1250)
m_lens = c(250, 300, 350)
output_matrix = matrix(data=NA, nrow = length(n_lens), ncol = length(m_lens))
for (i in n_lens){
for (j in m_lens){
output_matrix[i,j] <- function(x, i, j)
}
}
Unfortunately all of them were far from getting the job done.
Any suggestion/tip is much appreciated.

I took the liberty of defining a simple fnc function.
The idea is to loop over the indices of n_lens and m_lens, not over their values.
Nested for loops may be slower in R than vectorized alternatives.
The code below produces the required output.
fnc <- function(x, n, m)
{
return (n+m)
}
n_lens = c(750, 1000, 1250)
m_lens = c(250, 300, 350)
x = 1
len_n = length(n_lens)
len_m = length(m_lens)
output_matrix = matrix(data=NA, nrow = length(n_lens), ncol = length(m_lens))
for (i in seq(len_n)){
for (j in seq(len_m)){
output_matrix[i,j] <- fnc(x, n_lens[i], m_lens[j])
}
}
output_matrix
The output received is
[,1] [,2] [,3]
[1,] 1000 1050 1100
[2,] 1250 1300 1350
[3,] 1500 1550 1600
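If you would rather avoid the explicit loops, one possible alternative is outer() with a Vectorize() wrapper. This is only a sketch: fnc is the same placeholder as above, and the wrapper is there because outer() passes whole vectors of n and m at once, which a real time-series function may not accept.

```r
# Placeholder function, as in the answer above (an assumption, not the asker's real fnc)
fnc <- function(x, n, m) {
  n + m
}

n_lens <- c(750, 1000, 1250)
m_lens <- c(250, 300, 350)
x <- 1

# Vectorize() makes the wrapper call fnc once per (n, m) pair
output_matrix <- outer(n_lens, m_lens,
                       Vectorize(function(n, m) fnc(x, n, m)))

# Label rows with the n values and columns with the m values
dimnames(output_matrix) <- list(as.character(n_lens), as.character(m_lens))
output_matrix
```

Rows correspond to n_lens and columns to m_lens, so the layout matches the loop version.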

Related

R code for replacing the values of Matrix

Hey everyone, I have a large matrix X with dimensions 645x7095. I want to subset this matrix and replace the values of the subset with another matrix that I have created. The R code is as follows:
install.packages("Matrix")
install.packages("base")
library(Matrix)
library(base)
T = 215
n = 3
k = 33
X = matrix(0,T*n,T*k)
IN = diag(n)
K1 = Matrix(0, n*n, n*(n-1)/2, sparse = TRUE)
for(i in 1:(n-1)){
K1[(2+(i-1)*(n+1)):(i*n), (1+(i-1)*(n-i/2)):(i*(n-i)*(i+1)/2)] <- diag(n-i)
}
yin = matrix(rnorm(645), ncol = 3)
Xu = matrix(rnorm(2150), ncol = 10)
# So far I have defined the variables and matrices that will be used in the subsetting.
The code above runs fine; however, the code below throws an error:
#Loop for X subsetting
for(i in 1:T){
X[(((i-1)*n)+1):(i*n), (((i-1)*k)+1):(i*k)] <- cbind( (t(kronecker(yin[i,],IN))%*%K1) , (t(kronecker(Xu[i,],IN))))
}
# kronecker() computes the Kronecker tensor product of two matrices A and B. It is part of base R, so no extra package is needed.
When I run the code above, I get this error:
Error in X[(((i - 1) * n) + 1):(i * n), ] <- cbind((t(kronecker(yin[i, :
number of items to replace is not a multiple of replacement length
However, when I run the same command in MATLAB it works perfectly. MATLAB code:
X = zeros(T*n,T*k);
for i = 1:T
X((i-1)*n+1:i*n,(i-1)*k+1:i*k) = [kron(yin(i,:),IN)*K1, kron(Xu(i,:),IN)];
end
MATLAB fills in the values in the rows and columns defined in the loop for subsetting X. I have attached a snapshot of the desired output from MATLAB. However, R throws an error for the same operation.
Can someone enlighten me as to where I am going wrong with the R code?
I appreciate the help, many thanks.
I think the problem is how the class 'dgeMatrix' is handled. Try
for (i in 1:T) {
X[(((i-1)*n)+1):(i*n), (((i-1)*k)+1):(i*k)] <- as.matrix(cbind((t(kronecker(yin[i,],IN))%*%K1) , (t(kronecker(Xu[i,],IN)))))
}

Call a function and provide output in a matrix

I have a function in R which I call
RS1 = t(cbind(Data[,18], Data[,20]))
RS2 = t(cbind(Data[,19], Data[,21]))
p = t(Data[23:24])
rand_x <- function (p, x) {
n.goods <- dim (p)[1]
n.obs <- dim (p)[2]
xRC = NaN*matrix(1, n.goods, n.obs)
for(i in 1:n.obs) {
xRC[1,i] <- RS1[1,i] + RS1[2,i]
xRC[2,i] <- RS2[1,i] + RS2[2,i]
}
result <- xRC
return(result)
}
Given these two inputs, this function generates a 2x50 matrix with some random numbers. I want to call rand_x 1000 times to get 1000 matrices and then bind the results into a final matrix. I have tried to write a loop for this but I am still struggling. Any help will be much appreciated.
If you intend to add each element of column 18 to the corresponding element of column 20 (which is what your code does), try rowSums().
Try:
xRC <- rbind(
rowSums (Data [, c(18, 20)]),
rowSums (Data [, c(19, 21)])
)
The output will be a matrix.
I do not see where randomness appears in your function, though. If you just want a 2x50 matrix of random numbers you may want to use:
xRC <- matrix (rnorm(50*2), 2) # for standard-normal generated numbers
xRC <- matrix (sample(1:100, replace = T, size = 100), 2) # for numbers between 1 and 100, uniformly distributed
To do this 1000 times, try:
xRC <- NULL # start empty, then append two rows per iteration
for (i in 1:1000) {
xRC <- rbind(xRC,
rowSums (Data [, c(18, 20)]),
rowSums (Data [, c(19, 21)])
)
}
# or if you just want to generate random numbers, performance is way faster when you use:
xRC <- matrix(rnorm(1000 * 2 * 50), ncol = 50)
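If the function really does return a fresh random matrix on every call, replicate() can collect the 1000 results without a loop. A minimal sketch, assuming a hypothetical stand-in for rand_x that returns a 2x50 matrix of standard-normal draws (the real function and its inputs are not reproduced here):

```r
set.seed(123)

# Hypothetical stand-in for rand_x: returns an n.goods x n.obs random matrix
rand_x <- function(n.goods = 2, n.obs = 50) {
  matrix(rnorm(n.goods * n.obs), nrow = n.goods)
}

# Option 1: keep the draws in a list, then stack them row-wise
draws <- replicate(1000, rand_x(), simplify = FALSE)
final <- do.call(rbind, draws)
dim(final)   # 2000 rows, 50 columns

# Option 2: let replicate() build a 3-d array, one slice per call
arr <- replicate(10, rand_x())
dim(arr)     # 2 x 50 x 10
```

Stacking with do.call(rbind, ...) binds everything in one step, which avoids growing a matrix inside a loop.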

Create a function that takes in a vector and returns a matrix in R

I am trying to create a function that will take in a vector k and return to me a matrix with dimensions length(distMat[1,]) by length(k). distMat is a huge matrix and indSpam is a long vector. In particular to my situation, length(distMat[1,]) is 2412. When I enter in k as a vector of length one, I get a vector of length 2412. I want to be able to enter in k as a vector of length two and get a matrix of 2412x2. I am trying to use a while loop to let it go through the length of k, but it only returns to me a vector of length 2412. What am I doing wrong?
predNeighbor = function(k, distMat, indSpam){
counter = 1
while (counter<(length(k)+1))
{
preMatrix = apply(distMat, 1, order)
orderedMatrix = t(preMatrix)
truncate = orderedMatrix[,1:k[counter]]
checking = indSpam[truncate]
checking2 = matrix(checking, ncol = k[counter])
number = apply(checking2, 1, sum)
return(number[1:length(distMat[1,])] > (k[counter]/2))
counter = counter + 1
}
}
I am trying to create a function that will take in a vector k and return to me a matrix with dimensions length(distMat[1,]) by length(k)
Here's a function that does this.
foo <- function(k, distMat) {
return(matrix(0, nrow = length(distMat[1, ]), ncol = length(k)))
}
If you have other requirements, please describe them in words.
Based on your comment, I think I understand your goal better. You have a function that returns a vector the same length as k and you want to save its output in a matrix. This is a pretty common task. Let's do a simple example where k starts out as 1:10, and say we want to add some noise to it with a function noise_and_rank() and see how the rank changes.
In the case where the input to the function is always the same, replicate() works very well. It will automatically put everything in a matrix:
k <- 1:10
noise_and_rank <- function(k) {
rank(k + runif(length(k), min = -2, max = 2))
}
results <- replicate(n = 8, expr = {noise_and_rank(k)})
In the case where you want to iterate, i.e., the output from the one go is the input for the next, a for loop is good, and we just pre-allocate a matrix with 0's, to fill in one column/row at a time
k <- 1:10
n.sim <- 8
results <- matrix(0, nrow = length(k), ncol = n.sim)
results[, 1] <- k
for(i in 2:n.sim) {
results[, i] <- noise_and_rank(results[, i - 1])
}
Your original question seems to be about how to do the pre-allocation. If the input is always the same, using replicate() means you don't have to worry about it. If the input is different each time, then pre-allocate using matrix(); you don't need to write any special function.
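Applied to the question's predNeighbor, note that return() inside the while loop exits the function on the first pass, which is why only one vector ever comes back. A rough sketch of the pre-allocation approach follows; the toy distance matrix and spam indicator are made up for illustration, and pred_neighbor is a hypothetical rewrite that assumes a square distMat:

```r
set.seed(1)

# Toy inputs (assumptions): 20 points, the first 10 flagged as spam
pts <- matrix(rnorm(40), ncol = 2)
dist_mat <- as.matrix(dist(pts))          # 20 x 20 distance matrix
ind_spam <- rep(c(TRUE, FALSE), each = 10)

pred_neighbor <- function(k, dist_mat, ind_spam) {
  # Row i lists the indices of all points, nearest to point i first
  ordered <- t(apply(dist_mat, 1, order))
  # Pre-allocate one column per value of k
  result <- matrix(NA, nrow = nrow(dist_mat), ncol = length(k))
  for (j in seq_along(k)) {
    nearest <- ordered[, seq_len(k[j]), drop = FALSE]
    votes <- matrix(ind_spam[as.vector(nearest)], ncol = k[j])
    # Majority vote among the k[j] nearest neighbours
    result[, j] <- rowSums(votes) > k[j] / 2
  }
  result   # returned once, after the loop has filled every column
}

res <- pred_neighbor(c(1, 3), dist_mat, ind_spam)
dim(res)
```

The ordering is also computed once outside the loop, since it does not depend on k.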

Formatting print output of for loop

Updated with code that addresses most of my questions.
I have a modest function to generate iterations of a MLE for a population estimate. (I know that iterations are poor form in R, but I am trying to show a nonlinear search procedure in detail, to accompany methods from an Excel spreadsheet).
n <- c(32,54,37,60,41) # number of captures
R <- c(32,36,6,13,5) # of marked fish returned to the population
fn <- function(x){
N = 97 #starting value of N
mle = matrix(0, nrow=x, ncol=8) #per suggestion
colnames(mle) = c("N","g.N","h.N","N1","g.N1","h.N1","delta.h","corr") #added column names
for (i in 1:x) {
g.N = prod(1-n/N)
h.N = N-sum(R)-N*g.N
N1 = N-1
g.N1 = prod(1-n/N1)
h.N1 = N1-sum(R)-N*g.N1
delta.h = h.N-h.N1
corr = -h.N/delta.h
#print(c(N,g.N,h.N,N1,g.N1,h.N1,delta.h,corr))#original output
mle[i,] = c(N,g.N,h.N,N1,g.N1,h.N1,delta.h,corr) #per suggestion
N = N+corr
}
return(mle) #per suggestion
}
fn(5)
This creates the following output
N g.N h.N N1 g.N1 h.N1 delta.h corr
[1,] 97.00000 0.04046356 1.075034e+00 96.00000 0.03851149 0.2643856 0.8106486 -1.326141e+00
[2,] 95.67386 0.03788200 4.954192e-02 94.67386 0.03597455 -0.7679654 0.8175073 -6.060119e-02
[3,] 95.61326 0.03776543 2.382189e-03 94.61326 0.03586008 -0.8154412 0.8178234 -2.912841e-03
[4,] 95.61035 0.03775983 1.147664e-04 94.61035 0.03585458 -0.8177238 0.8178386 -1.403289e-04
[5,] 95.61020 0.03775956 5.529592e-06 94.61020 0.03585432 -0.8178338 0.8178393 -6.761220e-06
I would like to clean up the output, but have not been able to crack the code to put the results in a matrix or data.frame or any format where I can give column titles and adjust the digits, numeric format, etc. in a meaningful manner. I have had limited success with cat and format but have been unable to get them to do precisely what I would like. Any help formatting this as a table, matrix or data.frame would be appreciated.
Your function doesn't actually work for me (what's n, for example?). Anyway, you should have something like:
fn <- function(x) {
N <- 97 # starting value of N
m <- matrix(0, nrow = x, ncol = 7) # one row per iteration
for (i in 1:x) {
#<snip>
m[i,] <- c(N, g.N, N1, g.N1, h.N1, delta.h, corr)
N <- N + corr
}
return(m)
}
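Once the results are in a matrix with column names, controlling the digits is mostly a matter of round() or format(). A small sketch using the updated function from the question (the formulas are copied as posted, not checked against the MLE derivation; the choice of 4 digits is arbitrary):

```r
n <- c(32, 54, 37, 60, 41)  # number of captures
R <- c(32, 36, 6, 13, 5)    # marked fish returned to the population

fn <- function(x) {
  N <- 97  # starting value of N
  cols <- c("N", "g.N", "h.N", "N1", "g.N1", "h.N1", "delta.h", "corr")
  mle <- matrix(0, nrow = x, ncol = 8, dimnames = list(NULL, cols))
  for (i in seq_len(x)) {
    g.N  <- prod(1 - n / N)
    h.N  <- N - sum(R) - N * g.N
    N1   <- N - 1
    g.N1 <- prod(1 - n / N1)
    h.N1 <- N1 - sum(R) - N * g.N1  # kept exactly as in the question
    delta.h <- h.N - h.N1
    corr <- -h.N / delta.h
    mle[i, ] <- c(N, g.N, h.N, N1, g.N1, h.N1, delta.h, corr)
    N <- N + corr
  }
  mle
}

mle <- as.data.frame(fn(5))

# Numeric result rounded to 4 decimal places
mle_round <- round(mle, 4)

# Character version for printing: 4 significant digits, no scientific notation
mle_text <- format(mle, digits = 4, scientific = FALSE)
print(mle_text)
```

round() keeps the columns numeric for further computation, while format() returns text that is only suitable for display.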

Calculating all distances between one point and a group of points efficiently in R

First of all, I am new to R (I started yesterday).
I have two groups of points, data and centers, the first one of size n and the second of size K (for instance, n = 3823 and K = 10), and for each i in the first set, I need to find j in the second with the minimum distance.
My idea is simple: for each i, let dist[j] be the distance between i and j, I only need to use which.min(dist) to find what I am looking for.
Each point is an array of 64 doubles, so
> dim(data)
[1] 3823 64
> dim(centers)
[1] 10 64
I have tried with
for (i in 1:n) {
for (j in 1:K) {
d[j] <- sqrt(sum((centers[j,] - data[i,])^2))
}
S[i] <- which.min(d)
}
which is extremely slow (with n = 200, it takes more than 40s!!). The fastest solution that I wrote is
distance <- function(point, group) {
return(dist(t(array(c(point, t(group)), dim=c(ncol(group), 1+nrow(group)))))[1:nrow(group)])
}
for (i in 1:n) {
d <- distance(data[i,], centers)
which.min(d)
}
Even though it does a lot of computation that I don't use (because dist(m) computes the distance between all rows of m), it is much faster than the other one (can anyone explain why?), but it is not fast enough for what I need, because it will not be used only once. Also, the distance code is very ugly. I tried to replace it with
distance <- function(point, group) {
return (dist(rbind(point,group))[1:nrow(group)])
}
but this seems to be twice as slow. I also tried to use dist for each pair, but that is slower too.
I don't know what to do now. It seems like I am doing something very wrong. Any ideas on how to do this more efficiently?
PS: I need this to implement k-means by hand (and I need to do it, it is part of an assignment). I believe I will only need Euclidean distance, but I am not sure yet, so I would prefer code where the distance computation can be replaced easily. stats::kmeans does all the computation in less than one second.
Rather than iterating across data points, you can just condense that to a matrix operation, meaning you only have to iterate across K.
# Generate some fake data.
n <- 3823
K <- 10
d <- 64
x <- matrix(rnorm(n * d), ncol = n)
centers <- matrix(rnorm(K * d), ncol = K)
system.time(
dists <- apply(centers, 2, function(center) {
colSums((x - center)^2)
})
)
Runs in:
user system elapsed
0.100 0.008 0.108
on my laptop.
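To get from the distance matrix to the nearest center for each point, one option is which.min() applied row by row. A short sketch on smaller fake data, using the same layout as above (one point per column). The distances stay squared, which does not change the position of the minimum because the square root is monotone:

```r
set.seed(1)
n <- 200
K <- 10
d <- 64
x <- matrix(rnorm(n * d), ncol = n)        # one data point per column
centers <- matrix(rnorm(K * d), ncol = K)  # one center per column

# n x K matrix of squared Euclidean distances
dists <- apply(centers, 2, function(center) {
  colSums((x - center)^2)
})

# Index of the closest center for every data point
S <- apply(dists, 1, which.min)
head(S)
```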
rdist() is an R function from the {fields} package that quickly calculates the distances between two sets of points given as matrices.
https://www.image.ucar.edu/~nychka/Fields/Help/rdist.html
Usage :
library(fields)
#generating fake data
n <- 5
m <- 10
d <- 3
x <- matrix(rnorm(n * d), ncol = d)
y <- matrix(rnorm(m * d), ncol = d)
rdist(y, x) # 10 x 5 matrix: entry [i, j] is the distance between y[i,] and x[j,]
[,1] [,2] [,3] [,4] [,5]
[1,] 1.512383 3.053084 3.1420322 4.942360 3.345619
[2,] 3.531150 4.593120 1.9895867 4.212358 2.868283
[3,] 1.925701 2.217248 2.4232672 4.529040 2.243467
[4,] 2.751179 2.260113 2.2469334 3.674180 1.701388
[5,] 3.303224 3.888610 0.5091929 4.563767 1.661411
[6,] 3.188290 3.304657 3.6668867 3.599771 3.453358
[7,] 2.891969 2.823296 1.6926825 4.845681 1.544732
[8,] 2.987394 1.553104 2.8849988 4.683407 2.000689
[9,] 3.199353 2.822421 1.5221291 4.414465 1.078257
[10,] 2.492993 2.994359 3.3573190 6.498129 3.337441
You may want to have a look into the apply functions.
For instance, this code
for (j in 1:K)
{
d[j] <- sqrt(sum((centers[j,] - data[i,])^2))
}
can easily be replaced by something like
dt <- data[i,]
d <- apply(centers, 1, function(x){ sqrt(sum((x-dt)^2)) })
You can definitely optimise it further, but I hope you get the point.
dist works fast because it's vectorized and calls internal C functions.
The code in your loop could be vectorized in many ways.
For example, to compute the distances between data and centers you could use outer:
diff_ij <- function(i,j) sqrt(rowSums((data[i,]-centers[j,])^2))
X <- outer(seq_len(n), seq_len(K), diff_ij)
This gives you an n x K matrix of distances, and should be way faster than the loop.
Then you could use max.col to find the maximum in each row (see the help page; there are some nuances when there are several maxima). X must be negated because we are searching for the minimum.
CL <- max.col(-X)
To be efficient in R you should vectorize as much as possible. Loops can in many cases be replaced by a vectorized substitute. Check the help for rowSums (which also describes rowMeans, colSums, colMeans), pmax, cumsum. You could search SO, e.g.
https://stackoverflow.com/search?q=[r]+avoid+loop (copy and paste this link, I don't know how to make it clickable) for some examples.
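For reference, here is the outer/max.col idea put together on small fake data. This is a sketch with made-up sizes; ties.method = "first" is chosen so that the result is deterministic, whereas the default breaks ties at random:

```r
set.seed(42)
n <- 50
K <- 4
d <- 3
data <- matrix(rnorm(n * d), ncol = d)     # one point per row
centers <- matrix(rnorm(K * d), ncol = d)  # one center per row

# outer() passes index vectors, so each call handles many (i, j) pairs at once
diff_ij <- function(i, j) {
  sqrt(rowSums((data[i, , drop = FALSE] - centers[j, , drop = FALSE])^2))
}
X <- outer(seq_len(n), seq_len(K), diff_ij)

# Negate so that the row maximum corresponds to the smallest distance
CL <- max.col(-X, ties.method = "first")
head(CL)
```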
My solution:
# data is a matrix where each row is a point
# point is a vector of values
euc.dist <- function(data, point) {
apply(data, 1, function (row) sqrt(sum((point - row) ^ 2)))
}
You can try it, like:
x <- matrix(rnorm(25), ncol=5)
euc.dist(x, x[1,])
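Since the question asks for a distance that can be swapped out easily, the distance function can be passed in as an argument. A possible sketch: nearest_center and man.dist are hypothetical names introduced here for illustration, and this version favours flexibility over speed.

```r
# Euclidean distance from every row of data to point (as defined above)
euc.dist <- function(data, point) {
  apply(data, 1, function(row) sqrt(sum((point - row)^2)))
}

# Hypothetical alternative with the same interface: Manhattan distance
man.dist <- function(data, point) {
  apply(data, 1, function(row) sum(abs(point - row)))
}

# For each row of data, return the index of the closest row of centers
nearest_center <- function(data, centers, dist_fun = euc.dist) {
  vapply(seq_len(nrow(data)),
         function(i) which.min(dist_fun(centers, data[i, ])),
         integer(1))
}

centers <- rbind(c(0, 0), c(10, 10))
pts <- rbind(c(1, 1), c(9, 9), c(0, -1))
nearest_center(pts, centers)             # Euclidean
nearest_center(pts, centers, man.dist)   # Manhattan
```

Any function that takes a matrix and a point and returns one distance per row can be dropped in as dist_fun.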

Resources