How to iterate over a matrix using a function in R? - r

I have created a function to order a vector of length 2, using the following code:
x = (c(6,2))
orders = function(x){
for(i in 1:(length(x)-1)){
if(x[i+1] < x[i]){
return(c(x[i+1], x[i]))} else{
(return(x))
}}}
orders(x)
I have been asked to use this function to process a dataset with 2 columns as follows. Iterate over the rows of the
data set, and if the element in the 2nd column of row i is less than the element in the first
column of row i, switch the order of the two entries in the row by making a suitable call to
the function you just wrote.
I've tried using the following code
set.seed(1128719)
data=matrix(rnorm(20),byrow=T,ncol=2)
df = for (i in 1:2) {
for(j in 1:10){
data = orders(c(x[i], x[j]))
return(data)
}
}
The output is null. I'm not quite sure where I'm going wrong.
Any suggestions?

I modified your code a bit but tried to keep the style the same.
There is no need for a loop inside the function: for a vector of length 2,
1:(length(x)-1) always evaluates to 1:1, so i only ever takes the value 1.
orders = function(x){
# Since the function only works on vectors of length 2,
# it's good practice to raise an error right at the start
if (length(x) != 2) {
stop("x must be a vector of length 2")
}
if (x[2] < x[1]) {
return(c(x[2], x[1]))
} else {
return(x)
}
}
orders(c(6, 2))
set.seed(1128719)
data <- matrix(rnorm(20),byrow=T,ncol=2)
A for loop cannot usefully be assigned to a variable, because its value is always NULL.
Instead, we use the loop to modify the matrix data in place:
for (row in 1:nrow(data)) {
data[row, ] <- orders(data[row,])
}
data
Edit:
This is the input:
[,1] [,2]
[1,] -0.04142965 0.2377140
[2,] -0.76237866 -0.8004284
[3,] 0.18700893 -0.6800310
[4,] 0.76499646 0.4430643
[5,] 0.09193440 -0.2592316
[6,] 1.17478053 -0.4044760
[7,] -1.62262500 0.1652850
[8,] -1.54848857 0.7475451
[9,] -0.05907252 -0.8324074
[10,] -1.11064318 -0.1148806
This is the output I get:
[,1] [,2]
[1,] -0.04142965 0.23771403
[2,] -0.80042842 -0.76237866
[3,] -0.68003104 0.18700893
[4,] 0.44306433 0.76499646
[5,] -0.25923164 0.09193440
[6,] -0.40447603 1.17478053
[7,] -1.62262500 0.16528496
[8,] -1.54848857 0.74754509
[9,] -0.83240742 -0.05907252
[10,] -1.11064318 -0.11488062
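As a small illustration of the point about assigning a for loop, the sketch below uses a made-up 3 x 2 matrix m (not the data from the question) and a compact version of orders. It shows that the value of a for loop is NULL, while assigning inside the loop changes the matrix row by row.

```r
# Compact two-element ordering function (same idea as orders above)
orders <- function(x) {
  stopifnot(length(x) == 2)
  if (x[2] < x[1]) x[2:1] else x
}

# Small made-up matrix for illustration only
m <- matrix(c(6, 2,
              1, 3,
              5, 4), ncol = 2, byrow = TRUE)

# The value of a for loop is NULL, whatever happens in its body
res <- for (i in seq_len(nrow(m))) orders(m[i, ])
is.null(res)

# Assigning inside the loop updates the matrix one row at a time
for (i in seq_len(nrow(m))) {
  m[i, ] <- orders(m[i, ])
}
m
```

After the second loop, every row of m has its smaller value in the first column.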

Here are two ways of ordering the rows of a 2-column matrix.
This is the test matrix posted in the question.
set.seed(1128719)
data <- matrix(rnorm(20), byrow = TRUE, ncol = 2)
1. With a function orders.
The function expects as input a 2 element vector. If they are out of order, return the vector with its elements reversed, else return the vector as is.
orders <- function(x){
stopifnot(length(x) == 2)
if(x[2] < x[1]){
x[2:1]
}else{
x
}
}
Test the function.
x <- c(6,2)
orders(x)
#[1] 2 6
Now with the matrix data.
df1 <- t(apply(data, 1, orders))
2. Vectorized code.
Create a logical index that is TRUE whenever the two elements of a row are out of order, and reverse only those rows.
df2 <- data
inx <- data[,2] < data[,1]
df2[inx, ] <- data[inx, 2:1]
The results are the same.
identical(df1, df2)
#[1] TRUE
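A third possibility, offered only as a sketch and not part of the answer above, is to build the two columns directly with the base R functions pmin and pmax, which take element-wise minima and maxima. The name df3 is introduced here for illustration.

```r
set.seed(1128719)
data <- matrix(rnorm(20), byrow = TRUE, ncol = 2)

# Vectorized version from above, repeated so this block runs on its own
df2 <- data
inx <- data[, 2] < data[, 1]
df2[inx, ] <- data[inx, 2:1]

# Row-wise minimum goes in column 1, row-wise maximum in column 2
df3 <- cbind(pmin(data[, 1], data[, 2]),
             pmax(data[, 1], data[, 2]))

all.equal(df3, df2)
```

This only works because each row has exactly two entries; for more columns, sorting each row would be needed instead.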

Related

Replacing pair of element of symmetric matrix with NA

I have a positive definite symmetric matrix. Pasting the matrix generated using the following code:
library(clusterGeneration) # provides genPositiveDefMat()
set.seed(123)
m <- genPositiveDefMat(
dim = 3,
covMethod = "unifcorrmat",
rangeVar = c(0,1) )
x <- as.matrix(m$Sigma)
diag(x) <- 1
x
#Output
[,1] [,2] [,3]
[1,] 1.0000000 -0.2432303 -0.4110525
[2,] -0.2432303 1.0000000 -0.1046602
[3,] -0.4110525 -0.1046602 1.0000000
Now, I want to run the matrix through iterations and in each iteration I want to replace the symmetric pair with NA. For example,
Iteration 1:
x[1,2] = x[2,1] <- NA
Iteration 2:
x[1,3] = x[3,1] <- NA
and so on....
My idea was to check using a for loop
Prototype:
for( r in 1:nrow(x)
for( c in 1:ncol(x)
if x[r,c]=x[c,r]<-NA
else
x[r,c]
The issue with my code is for row 1 and column 1, the values are equal hence it sets to 0 (which is wrong). Also, the moment it is not NA it comes out of the loop.
Appreciate any help here.
Thanks
If you need the replacement done iteratively, you can use the indices given by upper.tri(x)/lower.tri(x) to do the replacements pair by pair. That lets you pass the result to a function before or after each replacement, e.g. (with mat standing for the matrix x from the question):
idx <- which(lower.tri(mat), arr.ind=TRUE)
sel <- cbind(
replace(mat, , seq_along(mat))[ idx ],
replace(mat, , seq_along(mat))[ idx[,2:1] ]
)
# [,1] [,2]
#[1,] 2 4 ##each row represents the lower/upper pair
#[2,] 3 7
#[3,] 6 8
for( i in seq_len(nrow(sel)) ) {
mat[ sel[i,] ] <- NA
print(mean(mat, na.rm=TRUE))
}
#[1] 0.2812249
#[1] 0.5581359
#[1] 1
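The same idea can be written with explicit row and column indices, which some readers may find easier to follow. This is a minimal sketch: the matrix x below is a hand-typed, rounded stand-in for the one in the question, and idx, r and cc are names chosen here.

```r
# Rounded stand-in for the symmetric matrix in the question
x <- matrix(c( 1.00, -0.24, -0.41,
              -0.24,  1.00, -0.10,
              -0.41, -0.10,  1.00), nrow = 3, byrow = TRUE)

# One row per off-diagonal pair: (1,2), (1,3), (2,3)
idx <- which(upper.tri(x), arr.ind = TRUE)

for (k in seq_len(nrow(idx))) {
  r  <- idx[k, "row"]
  cc <- idx[k, "col"]
  x[r, cc] <- NA   # upper-triangle entry
  x[cc, r] <- NA   # its mirror image below the diagonal
  # any per-iteration computation on x could go here
}
x
```

Because only the upper triangle is visited, the diagonal is never touched, which avoids the problem with row 1 and column 1 described in the question.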

Use a FOR loop to run function over each column in r

I have several columns of data in a matrix, and I want to use a for loop (specifically a for loop; please, no answers that don't involve one) to run a function on each column.
x <- runif(10,0,10)
y <- runif(10,10,20)
z <- runif(10,20,30)
tab <- cbind(x,y,z)
x y z
[1,] 9.5262742 16.22999 21.93228
[2,] 5.8183264 14.53771 21.81774
[3,] 3.9509342 17.36694 22.46594
[4,] 3.0245614 19.46411 25.80411
[5,] 5.0284351 13.89636 21.61767
[6,] 3.0291715 17.50267 26.28110
[7,] 8.4727471 16.77365 27.60535
[8,] 3.3816903 15.23395 22.01265
[9,] 0.3182083 13.97575 29.25909
[10,] 2.6499290 16.71129 27.05160
for (i in 1:ncol(tab)){
print(mean(i))
}
I have almost no familiarity with R and have had trouble finding a solution that specifically uses a for loop to run a function and output a result per column.
Well, strictly using a for loop, I think this does what you want:
x <- runif(10,0,10)
y <- runif(10,10,20)
z <- runif(10,20,30)
tab <- cbind(x,y,z)
for (i in 1:ncol(tab)){
print(mean(tab[, i]))
}
You need to index the matrix by using [row, column]. When you want to select all rows for a specific column (which is your case), just leave the row field empty. So that's why you have to use [, i], where i is your index.
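If the per-column results should be kept and not only printed, one option is to fill a preallocated vector inside the same kind of loop. The sketch below assumes a seed of 1 for reproducibility, and the name col_means is introduced here.

```r
set.seed(1)  # assumed seed, only to make the example reproducible
tab <- cbind(x = runif(10, 0, 10),
             y = runif(10, 10, 20),
             z = runif(10, 20, 30))

# Preallocate one slot per column and label the slots with the column names
col_means <- numeric(ncol(tab))
names(col_means) <- colnames(tab)

# seq_len() is safer than 1:ncol(tab) if the matrix could have zero columns
for (i in seq_len(ncol(tab))) {
  col_means[i] <- mean(tab[, i])
}
col_means
```

The result should agree with colMeans(tab), which is a convenient way to check the loop.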

Retaining lagged value to compound towards end value

I would like to please ask for your help concerning the following issue.
In a table-like object where each row corresponds to an observation in time, I would like to obtain the value from the previous row for one particular variable (:= p0), multiply it with an element of another column (:= returnfactor) and write the result to the current row as an element of another column (:= p1).
Illustrated via two pictures (not reproduced here), I want to go from the initial table to the completed one.
I have written
matrix <- cbind (
1:10,
1+rnorm(10, 0, 0.05),
NA,
NA
)
colnames(matrix) <- c("timeid", "returnfactor", "p0", "p1")
matrix[1, "p0"] <- 100
for (i in 1:10)
{
if (i==1)
{
matrix[i, "p1"] <- matrix[1, "p0"] * matrix[i, "returnfactor"]
}
else
{
matrix[i, "p0"] <- matrix[i-1, "p1"]
matrix[i, "p1"] <- matrix[i, "p0"] * matrix[i, "returnfactor"]
}
}
That is, I implemented what I would like to reach using a loop. However, this loop is too slow. Obviously, I am new to R.
Could you please give me a hint how to improve the speed using the capabilities R has to offer? I assume there is no need for a loop here, though I lack an approach how to do it else. In SAS, I used its reading of data frames by row and the retain-statement in a data step.
Yours sincerely,
Sinistrum
We can indeed improve this. The key observation is that both p0 and p1 are essentially cumulative products. Writing mat for the question's matrix (which also avoids masking the base function matrix), we have
mat[, "p1"] <- mat[1, "p0"] * cumprod(mat[, "returnfactor"])
mat[-1, "p0"] <- head(mat[, "p1"], -1)
where head(mat[, "p1"], -1) just takes all the mat[, "p1"] except for its last element. This gives
# timeid returnfactor p0 p1
# [1,] 1 0.9903601 100.00000 99.03601
# [2,] 2 1.0788946 99.03601 106.84941
# [3,] 3 1.0298117 106.84941 110.03478
# [4,] 4 0.9413212 110.03478 103.57806
# [5,] 5 0.9922179 103.57806 102.77200
# [6,] 6 0.9040545 102.77200 92.91149
# [7,] 7 0.9902371 92.91149 92.00440
# [8,] 8 0.8703836 92.00440 80.07913
# [9,] 9 1.0657001 80.07913 85.34033
# [10,] 10 0.9682228 85.34033 82.62846
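To check that the cumulative-product version reproduces the loop, the two can be run side by side on the same input. This is a sketch with an assumed seed of 42; loop_res and vec_res are names introduced here.

```r
set.seed(42)  # assumed seed, only for reproducibility
n <- 10
mat <- cbind(timeid = 1:n,
             returnfactor = 1 + rnorm(n, 0, 0.05),
             p0 = NA,
             p1 = NA)
mat[1, "p0"] <- 100

# Loop version, as in the question
loop_res <- mat
for (i in seq_len(n)) {
  if (i > 1) loop_res[i, "p0"] <- loop_res[i - 1, "p1"]
  loop_res[i, "p1"] <- loop_res[i, "p0"] * loop_res[i, "returnfactor"]
}

# Vectorized version: p1 is the start value times the running product
vec_res <- mat
vec_res[, "p1"] <- vec_res[1, "p0"] * cumprod(vec_res[, "returnfactor"])
vec_res[-1, "p0"] <- head(vec_res[, "p1"], -1)

all.equal(loop_res, vec_res)
```

The comparison uses all.equal and not identical because the two versions multiply in a different order, so tiny floating-point differences are possible.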

Calculating distance between two points for multiple records for matching rows - loop over rows of two matrices

I have got two matrices with coordinates and I am trying to compute distances between points in matching rows, i.e. between row 1 in first matrix and row 1 in second matrix.
What I am getting instead is the distance between row 1 and all the other rows. This creates memory issues, as I have 800,000 rows. Does anyone know how to request only the matching-row distances?
I am using
dist1 <- distm(FareStageMatrix[1:25000,], LSOACentroidMatrix[1:25000,], fun=distHaversine)
I am trying to create something like this, but it doesn't seem to work:
for(i in 1:nrow(FareStageMatrix)) {
for(j in 1:nrow(LSOACentroidMatrix)) {
my_matrix[i] <- my_matrix[distm(FareStageMatrix[i], LSOACentroidMatrix[i], fun=distHaversine)]
}
}
I then changed it to:
for (i in 1:nrow(FareStageMatrix)){
for (i in 1:nrow(LSOACentroidMatrix)){
r1<-FareStageMatrix[i,]
r2<-LSOACentroidMatrix[i,]
results[i]<-distm(r1, r2, fun=distHaversine)
}
}
Is that something that should be working?
It seems I have managed to find a solution. A single loop over the row index is enough, because row i of one matrix is only ever paired with row i of the other:
results <- matrix(NA, nrow(FareStageMatrix))
for (i in 1:nrow(FareStageMatrix)) {
r1 <- FareStageMatrix[i, ]
r2 <- LSOACentroidMatrix[i, ]
results[i] <- distm(r1, r2, fun = distHaversine) # one distance per matching pair
}
where FareStageMatrix and LSOACentroidMatrix are matrices with coordinates.
This calculates one distance for each matching pair of points.
I've adapted geosphere's distGeo function (geodesic distance) for this purpose.
library(geosphere)
source("https://raw.githubusercontent.com/RomanAbashin/distGeo_v/master/distGeo_v.R")
Data
set.seed(1702)
m1 <- matrix(runif(20000, -10, 10), ncol = 2)
m2 <- matrix(runif(20000, -10, 10), ncol = 2)
Code
result <- distGeo_v(m1[, 1], m1[, 2],
m2[, 1], m2[, 2])
Result
> head(m1)
[,1] [,2]
[1,] 8.087152 9.227607
[2,] 9.528334 9.103403
[3,] 5.637921 -2.213228
[4,] -2.473758 -9.812986
[5,] -2.844036 -5.245779
[6,] -4.824615 -4.330890
> head(m2)
[,1] [,2]
[1,] 0.1673027 0.6483745
[2,] -2.5033184 0.1386050
[3,] 4.8589785 5.1996968
[4,] 8.3239454 -8.9810949
[5,] 0.8280422 -7.8272613
[6,] -6.2633738 -5.8725562
> head(result)
[1] 1292351.3 1661739.3 824260.0 1189476.4 496403.2 233480.2
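For readers who want to see what a row-wise distance looks like without any package, here is a minimal base R sketch of the haversine formula applied to matching rows. The function name haversine_rowwise is hypothetical, the columns are assumed to be longitude then latitude in degrees, and the default radius of 6378137 m is an assumption. It returns one distance per row, so no n x n matrix is ever built. geosphere's own distHaversine is documented as accepting two-column matrices as well, which may already be sufficient.

```r
# Hypothetical helper: great-circle distance between matching rows of p1 and p2.
# Assumes column 1 = longitude, column 2 = latitude, both in degrees.
haversine_rowwise <- function(p1, p2, r = 6378137) {
  stopifnot(nrow(p1) == nrow(p2))
  to_rad <- pi / 180
  lon1 <- p1[, 1] * to_rad
  lat1 <- p1[, 2] * to_rad
  lon2 <- p2[, 1] * to_rad
  lat2 <- p2[, 2] * to_rad
  a <- sin((lat2 - lat1) / 2)^2 +
    cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2)^2
  # pmin() guards against rounding pushing sqrt(a) slightly above 1
  2 * r * asin(pmin(1, sqrt(a)))
}

set.seed(1702)
m1 <- matrix(runif(20000, -10, 10), ncol = 2)
m2 <- matrix(runif(20000, -10, 10), ncol = 2)
d <- haversine_rowwise(m1, m2)
length(d)  # one distance per row
```

The haversine formula treats the Earth as a sphere, so the values will differ slightly from the geodesic distances shown above.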

R: Apply family that deletes columns as part of the function

I am trying to iterate through each row in a matrix, find the column with the minimum value and the column name and then delete that column after it has been used so that a new minimum can be calculated. The correct answer should look like this:
result
1/1 50
2/2 61
3/3 72
4/4 83
Test_Matrix <- matrix(c(50:149), ncol = 10 , byrow=FALSE)
Names <- c(1:10)
colnames(Test_Matrix) <- Names
rownames(Test_Matrix) <- Names
result <- t(sapply(seq(nrow(Test_Matrix)), function(i) {
j <- which.min(Test_Matrix[i,])
c(paste(rownames(Test_Matrix)[i], colnames(Test_Matrix)[j], sep='/'), Test_Matrix[i,j])
drops <- colnames(Test_Matrix)[j]
Test_Matrix[ , !(names(Test_Matrix) %in% drops)]
}))
result
Second question is that I would like to choose the order of the rows during the iteration so that it chooses to go to the next row that had the same name as the column name. For example, if the column with the minimum was named 5, column 5 would be deleted and the minimum for the row named 5 would be calculated next.
Wondering if this is possible and if a loop is needed for these calculations.
As a new R user, I appreciate any help.
Thanks!
For the first part of your question:
Test_Matrix <- matrix(c(50:149), ncol = 10 , byrow=FALSE)
Names <- c(1:10)
colnames(Test_Matrix) <- Names
rownames(Test_Matrix) <- Names
result <- matrix(nrow=0, ncol=2)
for (i in 1:nrow(Test_Matrix)) {
j <- which.min(Test_Matrix[i, ])
result <- rbind(result, c(paste(rownames(Test_Matrix)[i],
colnames(Test_Matrix)[j], sep='/'),
as.numeric(Test_Matrix[i,j])))
Test_Matrix <- Test_Matrix[, -j, drop = FALSE] # remove column j; drop = FALSE keeps the matrix shape and column names
}
result
## [,1] [,2]
## [1,] "1/1" "50"
## [2,] "2/2" "61"
## [3,] "3/3" "72"
## [4,] "4/4" "83"
## [5,] "5/5" "94"
## [6,] "6/6" "105"
## [7,] "7/7" "116"
## [8,] "8/8" "127"
## [9,] "9/9" "138"
##[10,] "10/10" "149"
Edit: For the second part, instead of the for loop, you can use this:
i <- 1
while(length(Test_Matrix)>0) {
Test_Matrix <- as.matrix(Test_Matrix)
j <- which.min(Test_Matrix[i, ])
result <- rbind(result, c(paste(rownames(Test_Matrix)[i],
colnames(Test_Matrix)[j], sep='/'),
as.numeric(Test_Matrix[i,j])))
Test_Matrix <- Test_Matrix[, -j]
i <- as.numeric(names(j))+1
}

Resources