Suppose we have a dataframe or matrix with one column specifying an integer value N as below (col 5).
Is there a vectorized approach to repopulate the object such that each row gets copied N times?
> y
[,1] [,2] [,3] [,4] [,5]
[1,] -0.02738267 0.5170621 -0.01644855 0.48830663 1
[2,] -0.30076544 1.8136359 0.02319640 -1.59649330 2
[3,] 1.73447245 0.4043638 -0.29112385 -0.25102988 3
[4,] 0.01025271 -0.4908636 0.80857300 0.08137033 4
The result would be as follows.
[1,] -0.02738267 0.5170621 -0.01644855 0.48830663 1
[2,] -0.30076544 1.8136359 0.02319640 -1.59649330 2
[2,] -0.30076544 1.8136359 0.02319640 -1.59649330 2
[3,] 1.73447245 0.4043638 -0.29112385 -0.25102988 3
[3,] 1.73447245 0.4043638 -0.29112385 -0.25102988 3
[3,] 1.73447245 0.4043638 -0.29112385 -0.25102988 3
[4,] 0.01025271 -0.4908636 0.80857300 0.08137033 4
[4,] 0.01025271 -0.4908636 0.80857300 0.08137033 4
[4,] 0.01025271 -0.4908636 0.80857300 0.08137033 4
[4,] 0.01025271 -0.4908636 0.80857300 0.08137033 4
Another question would be how to jitter the newly populated rows, so that the copied rows do not overlap exactly.
Some made-up data:
y <- cbind(matrix(runif(16), 4, 4), 1:4)
Just do:
z <- y[rep(seq_len(nrow(y)), y[,5]), ]
# [,1] [,2] [,3] [,4] [,5]
# [1,] 0.5256007 0.07467979 0.95189484 0.2887943 1
# [2,] 0.3083967 0.03518523 0.08380005 0.9168161 2
# [3,] 0.3083967 0.03518523 0.08380005 0.9168161 2
# [4,] 0.8549639 0.79452728 0.22483537 0.4452553 3
# [5,] 0.8549639 0.79452728 0.22483537 0.4452553 3
# [6,] 0.8549639 0.79452728 0.22483537 0.4452553 3
# [7,] 0.5453508 0.47633523 0.51522514 0.3936340 4
# [8,] 0.5453508 0.47633523 0.51522514 0.3936340 4
# [9,] 0.5453508 0.47633523 0.51522514 0.3936340 4
# [10,] 0.5453508 0.47633523 0.51522514 0.3936340 4
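The same row-index trick should carry over to a data frame. The sketch below assumes a hypothetical data frame df whose count column is named n (both names are made up for illustration):

```r
# Hypothetical data frame; 'n' holds the number of copies wanted per row
df <- data.frame(a = c(0.1, 0.2, 0.3), n = c(1L, 2L, 3L))

# Repeat each row index n times, then subset the rows by that index vector
expanded <- df[rep(seq_len(nrow(df)), times = df$n), ]

# Reset the row names, which otherwise come out as 1, 2, 2.1, 3, 3.1, 3.2
rownames(expanded) <- NULL
```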
And I am not sure what you mean by "jitter", but maybe
z <- z + runif(length(z)) / 1000
?
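If "jitter" means adding a little noise so that copies no longer coincide, one possible variant is to perturb only the data columns and leave the count column untouched. This is a sketch under that assumption; the noise range of +/- 5e-4 and the name data_cols are arbitrary choices, not something the question specifies:

```r
set.seed(1)
y <- cbind(matrix(runif(16), 4, 4), 1:4)

# Replicate each row according to the count in column 5
idx <- rep(seq_len(nrow(y)), times = y[, 5])
z <- y[idx, , drop = FALSE]

# Add small uniform noise to the data columns only (columns 1 to 4)
data_cols <- 1:4
noise <- matrix(runif(nrow(z) * length(data_cols), -5e-4, 5e-4),
                nrow = nrow(z))
z[, data_cols] <- z[, data_cols] + noise
```

Column 5 still holds the original counts, so the replication can be traced back afterwards.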
Let's say I have a symmetric matrix A, for example:
> A <- matrix(runif(16),nrow = 4,byrow = T)
> ind <- lower.tri(A)
> A[ind] <- t(A)[ind]
> A
[,1] [,2] [,3] [,4]
[1,] 0.4212778 0.6874073 0.1551896 0.46757640
[2,] 0.6874073 0.5610995 0.1779030 0.54072946
[3,] 0.1551896 0.1779030 0.9515304 0.79429777
[4,] 0.4675764 0.5407295 0.7942978 0.01206526
I also have a 4 x 3 matrix B that gives specific positions of matrix A, for example:
> B<-matrix(c(1,2,4,2,1,3,3,2,4,4,1,3),nrow=4,byrow = T)
> B
[,1] [,2] [,3]
[1,] 1 2 4
[2,] 2 1 3
[3,] 3 2 4
[4,] 4 1 3
The B matrix represents the following positions of A: (1,1), (1,2), (1,4), (2,2), (2,1), (2,3), (3,3), (3,2), (3,4), (4,4), (4,1), (4,3).
I want to change the values of A that are NOT in the positions given by B, replacing them by Inf. The result I want is:
[,1] [,2] [,3] [,4]
[1,] 0.4212778 0.6874073 Inf 0.46757640
[2,] 0.6874073 0.5610995 0.1779030 Inf
[3,] Inf 0.1779030 0.9515304 0.79429777
[4,] 0.4675764 Inf 0.7942978 0.01206526
How can I do that quickly while avoiding a for loop (which I'm able to code)? I've seen many similar posts, but none of them gave me what I want. Thank you!
You want to do something like matrix subsetting (e.g., P[Q]) except that you can't use negative indexing in matrix subsetting (e.g., P[-Q] is not allowed). Here's a work-around.
Store the elements you want to retain from A in a 2-column matrix where each row is a coordinate of A:
Idx <- cbind(rep(1:4, each=ncol(B)), as.vector(t(B)))
Create a matrix where all values are Inf, and then overwrite the values you wanted to "keep" from A:
Res <- matrix(Inf, nrow=nrow(A), ncol=ncol(A))
Res[Idx] <- A[Idx]
Result
Res
# [,1] [,2] [,3] [,4]
#[1,] 0.9043131 0.639718071 Inf 0.19158238
#[2,] 0.6397181 0.601327568 0.007363378 Inf
#[3,] Inf 0.007363378 0.752123162 0.61428003
#[4,] 0.1915824 Inf 0.614280026 0.02932679
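A related way to get the effect of "negative" matrix indexing is a logical mask: mark the cells to keep, then overwrite everything else. The sketch below uses a small deterministic A (not symmetric, which does not matter for the indexing) and the hypothetical name keep:

```r
A <- matrix(as.numeric(1:16), nrow = 4)
B <- matrix(c(1, 2, 4,
              2, 1, 3,
              3, 2, 4,
              4, 1, 3), nrow = 4, byrow = TRUE)

# Same (row, column) coordinate matrix as in the answer above
Idx <- cbind(rep(seq_len(nrow(B)), each = ncol(B)), as.vector(t(B)))

# Logical mask: TRUE for cells listed in B, FALSE everywhere else
keep <- matrix(FALSE, nrow = nrow(A), ncol = ncol(A))
keep[Idx] <- TRUE

# Negating the mask selects exactly the cells that B does not mention
A[!keep] <- Inf
```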
Here is a one-liner
A[cbind(1:nrow(A), sum(c(1:ncol(A))) - rowSums(B))] <- Inf
[,1] [,2] [,3] [,4]
[1,] 0.4150663 0.23440503 Inf 0.6665222
[2,] 0.2344050 0.38736067 0.01352211 Inf
[3,] Inf 0.01352211 0.88319263 0.9942303
[4,] 0.6665222 Inf 0.99423028 0.7630221
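The one-liner relies on each row of B listing all but one column index, so the missing index equals sum(1:ncol(A)) minus the row sum. A quick check of that arithmetic, with a guard for the assumption (n_col and missing_col are names introduced here for illustration):

```r
B <- matrix(c(1, 2, 4,
              2, 1, 3,
              3, 2, 4,
              4, 1, 3), nrow = 4, byrow = TRUE)
n_col <- 4  # number of columns of A

# The trick is only valid if every row of B has n_col - 1 distinct indices
stopifnot(all(apply(B, 1, function(r) length(unique(r)) == n_col - 1)))

# 1 + 2 + 3 + 4 = 10; subtracting each row sum leaves the absent column
missing_col <- sum(seq_len(n_col)) - rowSums(B)
missing_col
# 3 4 1 2
```

If a row of B could omit more than one column, this shortcut would silently give a wrong index, which is why the guard is there.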
Another way would be to identify the cells with apply() and set them to Inf.
cnum <- 1:ncol(A)
A[cbind(1:nrow(A), apply(B, 1, function(x) cnum[-which(cnum %in% x)]))] <- Inf
A
# [,1] [,2] [,3] [,4]
# [1,] 0.9148060 0.9370754 Inf 0.8304476
# [2,] 0.9370754 0.5190959 0.7365883 Inf
# [3,] Inf 0.7365883 0.4577418 0.7191123
# [4,] 0.8304476 Inf 0.7191123 0.9400145
Note: set.seed(42).
A <- matrix(runif(16),nrow = 4,byrow = T)
ind <- lower.tri(A)
A[ind] <- t(A)[ind]
## >A[]
## [,1] [,2] [,3] [,4]
## [1,] 0.07317535 0.167118857 0.0597721 0.2128698
## [2,] 0.16711886 0.008661005 0.6419335 0.6114373
## [3,] 0.05977210 0.641933514 0.7269202 0.3547959
## [4,] 0.21286984 0.611437278 0.3547959 0.4927997
The first thing to notice is that the matrix B is not very helpful in its current form, because what we need are (row, column) pairs: the row number in B together with each value in that row.
B<-matrix(c(1,2,4,2,1,3,3,2,4,4,1,3),nrow=4,byrow = T)
> B
## [,1] [,2] [,3]
## [1,] 1 2 4
## [2,] 2 1 3
## [3,] 3 2 4
## [4,] 4 1 3
We can build those pairs with melt (from the reshape2 package), using the Var1 and value columns.
>melt(B)
## Var1 Var2 value
## 1 1 1 1
## 2 2 1 2
## 3 3 1 3
## 4 4 1 4
## 5 1 2 2
## 6 2 2 1
## 7 3 2 2
## 8 4 2 1
## 9 1 3 4
## 10 2 3 3
## 11 3 3 4
## 12 4 3 3
We need to replace the entries of A whose positions are not listed in B by Inf. This is not easy to do directly, so an easy way out is to create another matrix filled with Inf and copy in the values of A at the positions given by melt(B):
> C<-matrix(Inf,nrow(A),ncol(A))
idx <- as.matrix(melt(B)[,c("Var1","value")])
C[idx]<-A[idx]
> C
## [,1] [,2] [,3] [,4]
## [1,] 0.07317535 0.167118857 Inf 0.2128698
## [2,] 0.16711886 0.008661005 0.6419335 Inf
## [3,] Inf 0.641933514 0.7269202 0.3547959
## [4,] 0.21286984 Inf 0.3547959 0.4927997
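The same index matrix can probably be built in base R without melt, since row(B) gives the row number of every cell of B. A minimal sketch with a deterministic stand-in for A (the values 1 to 16 are made up so the result is easy to check):

```r
A <- matrix(as.numeric(1:16), nrow = 4)
B <- matrix(c(1, 2, 4,
              2, 1, 3,
              3, 2, 4,
              4, 1, 3), nrow = 4, byrow = TRUE)

# Pair each value of B with its row number; both vectors are column-major,
# which is the same order melt(B) lists them in
idx <- cbind(as.vector(row(B)), as.vector(B))

# Start from an all-Inf matrix and copy in only the listed cells
C <- matrix(Inf, nrow(A), ncol(A))
C[idx] <- A[idx]
```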
Another approach that accomplishes matrix subsetting (e.g., P[Q]) would be to create the index Q manually. Here's one approach.
Figure out which column index is "missing" from each row of B:
col_idx <- apply(B, 1, function(x) (1:nrow(A))[-match(x, 1:nrow(A))])
Create subsetting matrix Q
Idx <- cbind(1:nrow(A), col_idx)
Do the replacement
A[Idx] <- Inf
Of course, you can make this a one-liner if you really want to:
A[cbind(1:nrow(A), apply(B, 1, function(x) (1:nrow(A))[-match(x, 1:nrow(A))]))] <- Inf
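The apply-based versions assume exactly one missing column per row. If that might not hold (for instance, a row of B repeating an index), a setdiff-based sketch like the following should cope with any number of missing columns per row; missing_idx is a name introduced here, and A is a made-up deterministic matrix:

```r
A <- matrix(as.numeric(1:16), nrow = 4)
B <- matrix(c(1, 2, 4,
              2, 1, 3,
              3, 2, 4,
              4, 1, 3), nrow = 4, byrow = TRUE)

# For each row i, collect the columns of A that B[i, ] does not list,
# and stack the resulting (row, column) pairs into one index matrix
missing_idx <- do.call(rbind, lapply(seq_len(nrow(B)), function(i) {
  cols <- setdiff(seq_len(ncol(A)), B[i, ])
  cbind(rep(i, length(cols)), cols)
}))

A[missing_idx] <- Inf
```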
I'm trying to keep a given number of rows from each matrix in a list, using R.
I have 2 lists: one holds matrices with n rows and p columns, and the second holds the number of rows that I need to keep from each.
mat <- list(a = matrix(rnorm(8*4),8), b = matrix(rnorm(15*4),15), c = matrix(rnorm(7*4),7))
rw <- list(a = 6, b = 7, c = 4)
Both lists have common names. In the above example, I would like to retain the first 6 rows for element a, the first 7 rows for b and the first 4 rows for c.
How would you do that in R?
One solution with Map:
Map(function(x, y) x[1:y, ], mat, rw)
# $a
# [,1] [,2] [,3] [,4]
# [1,] 1.3331549 -0.6985623 -1.1842788 -0.1496880
# [2,] 0.2096395 -0.2901906 0.4210395 0.9116542
# [3,] 0.1763317 1.3858205 -1.1567526 -1.1794618
# [4,] 1.3596395 0.5815012 -0.3681799 -0.6569447
# [5,] 0.2251352 0.2331387 -1.2509844 -1.1346729
# [6,] 0.6796729 1.1274772 0.3992489 0.2305927
#
# $b
# [,1] [,2] [,3] [,4]
# [1,] 0.30700748 -1.2173855 -0.3377885 -0.6748974
# [2,] 1.09506443 -0.6142685 -1.1301122 -0.7792081
# [3,] -0.61049306 -1.3414474 0.9771373 1.0191636
# [4,] 0.66687294 -0.5269721 0.9971987 -0.6514121
# [5,] 0.54623236 0.9020964 0.3252700 -0.3925129
# [6,] -0.04848903 -0.5204047 0.3344675 -0.3232105
# [7,] -0.56502719 -0.3743275 2.1760364 -0.2941956
#
# $c
# [,1] [,2] [,3] [,4]
# [1,] -0.3225609 -0.40126955 -1.787255 -1.5005721
# [2,] 0.3474430 -1.16657015 1.106033 0.3114282
# [3,] 0.4099467 -0.04353555 0.838330 0.3282246
# [4,] -1.4648740 0.51279791 0.198768 -0.3394502
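Since the two lists share names, a slightly more defensive variant is to align them by name first and let head() take the leading rows. This is a sketch of that idea; it assumes every name in rw also exists in mat:

```r
set.seed(1)
mat <- list(a = matrix(rnorm(8 * 4), 8),
            b = matrix(rnorm(15 * 4), 15),
            c = matrix(rnorm(7 * 4), 7))
rw <- list(c = 4, a = 6, b = 7)  # deliberately in a different order

# Reorder mat to match rw by name, then keep the first n rows of each matrix
res <- Map(function(m, n) head(m, n), mat[names(rw)], rw)
```

head() keeps the result a matrix even when only one row is requested, whereas x[1:y, ] would drop it to a vector.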
This is my problem:
I have a grid (see plot below), and I need to get the coordinates of each vertex of each block (cell) and store them in a list. The order of blocks that I need is '1-1', ... '4-1', '1-2', ... '4-2'. To keep it simple I'm just working with the indexes for now.
Based on two vectors with the common East and North coordinates, I've written a little function that produces part of the output I need. It skips the cells '1-2' and '2-2' (see output below). I can't see exactly where the error is, but I suspect the issue is in my nested for loop. (There are many questions on for loops, but none helped me with my problem.)
Any help will be appreciated, and I apologise if this is too basic to be asked here.
vectors:
x.breaks <- c(191789.1, 291789.1, 391789.1)
y.breaks <- c(5172287, 5272287, 5372287, 5472287, 5572287)
Function:
getting_vertices <- function(x.breaks, y.breaks){
  xs <- list()
  ys <- list()
  polys <- list()
  for(i in 1 : (length(x.breaks)-1)){
    xs[[i]] <- c(i, i+1 , i+1, i, i)
  }
  for(j in 1 : (length(y.breaks)-1)){
    ys[[j]] <- c(j, j, j+1, j+1, j)
  }
  for(v in 1 : length(sapply(ys, length)) ){
    for(k in 1: length(sapply(xs, length))){
      polys[[v*k]] <- cbind(xs[[k]], ys[[v]])
    }
  }
  return(polys)
}
getting_vertices(x.breaks, y.breaks)
Output (this is partially correct):
[[1]]
[,1] [,2]
[1,] 1 1
[2,] 2 1
[3,] 2 2
[4,] 1 2
[5,] 1 1
[[2]]
[,1] [,2]
[1,] 1 2
[2,] 2 2
[3,] 2 3
[4,] 1 3
[5,] 1 2
[[3]]
[,1] [,2]
[1,] 1 3
[2,] 2 3
[3,] 2 4
[4,] 1 4
[5,] 1 3
[[4]]
[,1] [,2]
[1,] 1 4
[2,] 2 4
[3,] 2 5
[4,] 1 5
[5,] 1 4
[[5]]
NULL
[[6]]
[,1] [,2]
[1,] 2 3
[2,] 3 3
[3,] 3 4
[4,] 2 4
[5,] 2 3
[[7]]
NULL
[[8]]
[,1] [,2]
[1,] 2 4
[2,] 3 4
[3,] 3 5
[4,] 2 5
[5,] 2 4
The logic behind the line polys[[v*k]] <- ... is incorrect: different (v, k) pairs can give the same product, so for example v=2, k=1 overwrites v=1, k=2. There are also no combinations of v and k that make 5 or 7, hence these entries are empty.
I expect that you meant to write something like:
polys[[v+(k-1)*(length(ys))]] <- ...
or
polys[[k+(v-1)*(length(xs))]] <- ...
depending on the order that you want your results in.
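Putting the second indexing formula into the function gives something like the sketch below. The function name getting_vertices_fixed is made up here, and the choice of letting the x index vary fastest is an assumption about the desired order; swap in the other formula if the y index should vary fastest.

```r
getting_vertices_fixed <- function(x.breaks, y.breaks) {
  nx <- length(x.breaks) - 1  # number of cells along x
  ny <- length(y.breaks) - 1  # number of cells along y
  polys <- vector("list", nx * ny)
  for (v in seq_len(ny)) {
    for (k in seq_len(nx)) {
      # k + (v - 1) * nx is unique for every (v, k) pair
      polys[[k + (v - 1) * nx]] <- cbind(c(k, k + 1, k + 1, k, k),
                                         c(v, v, v + 1, v + 1, v))
    }
  }
  polys
}

x.breaks <- c(191789.1, 291789.1, 391789.1)
y.breaks <- c(5172287, 5272287, 5372287, 5472287, 5572287)
polys <- getting_vertices_fixed(x.breaks, y.breaks)
```

With these breaks the list has 8 entries and none of them is NULL.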
In the matrix example below (stock returns):
IBOV PETR4 VALE5 ITUB4 BBDC4 PETR3
[1,] -0.03981646 -0.027412907 -0.051282051 -0.05208333 -0.047300526 -0.059805285
[2,] -0.03000415 -0.030534351 -0.046332046 -0.03943116 -0.030090271 -0.010355030
[3,] -0.02241318 -0.026650515 0.000000000 -0.04912517 -0.077559462 0.005231689
[4,] -0.05584830 -0.072184194 -0.066126856 -0.04317056 -0.066704036 0.000000000
[5,] 0.01196833 -0.004694836 0.036127168 -0.00591716 -0.006006006 Inf
[6,] 0.02039587 0.039083558 0.009762901 0.01488095 0.024169184 0.011783189
I would like to replace the 0 and Inf values with the value from the first column of the same row.
Here's a sample matrix
set.seed(15)
stocks<-matrix(rnorm(3*5), nrow=3)
stocks[cbind(c(2,3,1),c(4,4,2))] <- 0
stocks[2,2] <- Inf
stocks
# [,1] [,2] [,3] [,4] [,5]
# [1,] 0.2588229 0.000000 0.0227882 -1.075001 0.1655543
# [2,] 1.8311207 Inf 1.0907732 0.000000 -1.2427850
# [3,] -0.3396186 -1.255386 -0.1321224 0.000000 1.4592877
Now we can find the bad values, and then replace them with the values in the first column of the same row by using matrix indexing and the row() function to find the correct row.
bad <- stocks==0 | is.infinite(stocks)
stocks[bad] <- stocks[row(bad)[bad], 1]
stocks
# [,1] [,2] [,3] [,4] [,5]
# [1,] 0.2588229 0.2588229 0.0227882 -1.0750013 0.1655543
# [2,] 1.8311207 1.8311207 1.0907732 1.8311207 -1.2427850
# [3,] -0.3396186 -1.2553858 -0.1321224 -0.3396186 1.4592877
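To see why row(bad)[bad] lines up with stocks[bad], it may help to walk through a tiny deterministic matrix (m is a made-up example, not the stock data):

```r
m <- matrix(c(1, 0, 3,
              4, 5, Inf), nrow = 2, byrow = TRUE)

# TRUE at (1,2) and (2,3)
bad <- m == 0 | is.infinite(m)

# Row number of each bad cell, in the same column-major order
# that m[bad] uses
bad_rows <- row(bad)[bad]
bad_rows
# 1 2

# So the i-th bad cell receives column 1 of its own row
m[bad] <- m[bad_rows, 1]
m
#      [,1] [,2] [,3]
# [1,]    1    1    3
# [2,]    4    5    4
```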
Is it possible to obtain the actual observations within each cluster after performing k-means in R?
For example, if my analysis gives 2 clusters, is it possible to find the exact observations within each cluster?
# random samples
x <- matrix(c(rnorm(30,10,2), rnorm(30,0,1)), nrow=12, byrow=T)
# clustering
clusters <- kmeans(x, 2)
# accessing cluster membership
clusters$cluster
[1] 1 1 1 1 1 1 2 2 2 2 2 2
# samples within cluster 1
c1 <- x[which(clusters$cluster == 1),]
# samples within cluster 2
c2 <- x[which(clusters$cluster == 2),]
# printing variables
x
[,1] [,2] [,3] [,4] [,5]
[1,] 10.8415151 9.3075438 9.443433171 13.5402818 7.0574904
[2,] 6.0721775 7.4570368 9.999411972 12.8186182 6.1697638
[3,] 11.3170525 10.9458832 7.576416396 12.7177707 6.7104535
[4,] 8.1377999 8.0558304 9.925363089 11.6547736 9.4911071
[5,] 11.6078294 8.7782984 8.619840508 12.2816048 9.4460169
[6,] 10.2972477 9.1498916 11.769122361 7.6224395 12.0658246
[7,] -0.9373027 -0.5051318 -0.530429758 -0.8200562 -0.0623147
[8,] -0.7257655 -1.1469400 -0.297539831 -0.0477345 -1.0278240
[9,] 0.7285393 -0.6621878 2.914976054 0.6390049 -0.5032553
[10,] 0.2672737 -0.6393167 -0.198287317 0.1430110 -2.2213365
[11,] -0.8679649 0.3354149 -0.003510304 0.6665495 0.6664689
[12,] 0.1731384 -1.8827645 0.270357961 0.3944154 1.3564678
c1
[,1] [,2] [,3] [,4] [,5]
[1,] 10.841515 9.307544 9.443433 13.540282 7.057490
[2,] 6.072177 7.457037 9.999412 12.818618 6.169764
[3,] 11.317053 10.945883 7.576416 12.717771 6.710454
[4,] 8.137800 8.055830 9.925363 11.654774 9.491107
[5,] 11.607829 8.778298 8.619841 12.281605 9.446017
[6,] 10.297248 9.149892 11.769122 7.622439 12.065825
c2
[,1] [,2] [,3] [,4] [,5]
[1,] -0.9373027 -0.5051318 -0.530429758 -0.8200562 -0.0623147
[2,] -0.7257655 -1.1469400 -0.297539831 -0.0477345 -1.0278240
[3,] 0.7285393 -0.6621878 2.914976054 0.6390049 -0.5032553
[4,] 0.2672737 -0.6393167 -0.198287317 0.1430110 -2.2213365
[5,] -0.8679649 0.3354149 -0.003510304 0.6665495 0.6664689
[6,] 0.1731384 -1.8827645 0.270357961 0.3944154 1.3564678
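With more than a couple of clusters, writing one line per cluster gets tedious. One possible generalization is to split the row indices by cluster label, so the result is a list with one matrix per cluster (by_cluster is a name introduced here; the seed is only there to make the sketch reproducible):

```r
set.seed(1)
x <- matrix(c(rnorm(30, 10, 2), rnorm(30, 0, 1)), nrow = 12, byrow = TRUE)
clusters <- kmeans(x, centers = 2)

# Group row numbers by cluster label, then pull those rows out of x;
# drop = FALSE keeps a one-row cluster as a matrix
by_cluster <- lapply(split(seq_len(nrow(x)), clusters$cluster),
                     function(i) x[i, , drop = FALSE])
```

by_cluster[["1"]] then plays the role of c1 above, and the same code works unchanged for any number of centers.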