I have a matrix with 2 columns, and I'd like to turn it into a matrix with specified dimensions.
> t <- matrix(rnorm(20), ncol=2, nrow=10)
[,1] [,2]
[1,] 1.4938530 1.2493088
[2,] -0.8079445 1.8715868
[3,] 0.5775695 -0.9277420
[4,] 0.4415969 2.6357908
[5,] 0.3209226 -1.1306049
[6,] 0.5109251 -0.8661100
[7,] 1.9495571 0.2092941
[8,] 0.7816373 1.1517466
[9,] 0.0300595 -0.1351532
[10,] 0.7550894 0.7778869
What I'd like to do is something like:
> tt <- matrix(t, ncol=4, nrow=5)
[,1] [,2] [,3] [,4]
[1,] 1.4938530 1.2493088 -0.8079445 1.8715868
[2,] 0.5775695 -0.9277420 0.4415969 2.6357908
[3,] etc.
I tried to do things with modulo but my head hurts too much for me to try even one more minute.
You can transpose your first matrix, so that data is stored in the order you want, and then fill the second matrix by row:
tt <- matrix(t(t), ncol=4, nrow=5, byrow = TRUE)
t
# [,1] [,2]
# [1,] -1.4162465950 0.01532476
# [2,] -0.2366332875 -0.04024386
# [3,] 0.5146631983 -0.34720239
# [4,] 1.9243922633 -0.24016160
# [5,] 1.6161165230 0.63187438
# [6,] -0.3558181508 -0.73199138
# [7,] 0.7459405376 0.01934826
# [8,] -1.0428581093 -2.04422042
# [9,] 0.0003166344 0.98973993
#[10,] 0.6390745275 -0.65584930
tt
# [,1] [,2] [,3] [,4]
# [1,] -1.4162465950 0.01532476 -0.2366333 -0.04024386
# [2,] 0.5146631983 -0.34720239 1.9243923 -0.24016160
# [3,] 1.6161165230 0.63187438 -0.3558182 -0.73199138
# [4,] 0.7459405376 0.01934826 -1.0428581 -2.04422042
# [5,] 0.0003166344 0.98973993 0.6390745 -0.65584930
A matrix in R is essentially a vector with a dim attribute, and its data is stored column by column. Extracting data by row is therefore less straightforward than extracting by column, which simply follows the storage order. Transposing the first matrix puts the data in the order you want to read it, so filling the second matrix by row then gives the desired result.
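To see the column-by-column storage concretely, here is a minimal sketch that uses integers instead of random numbers so the ordering is easy to follow. The name `m` is made up for this illustration and does not come from the question.

```r
# Small integer matrix so the element order is easy to see.
# `m` is an illustrative name, not from the question.
m <- matrix(1:20, ncol = 2, byrow = TRUE)   # rows are (1,2), (3,4), ..., (19,20)

as.vector(m)      # column by column: 1 3 5 ... 19 2 4 ... 20
as.vector(t(m))   # after transposing, the stored order is row by row: 1 2 3 ... 20

# Transpose, then fill by row: each new row takes two consecutive old rows
tt <- matrix(t(m), ncol = 4, byrow = TRUE)
tt[1, ]           # 1 2 3 4
```

The same pattern should work for any target width that is a multiple of the original number of columns.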
I have a list of matrices and a list of vectors, and I want to divide the columns of each matrix with the corresponding vector element.
For example, given
set.seed(230)
data <- list(cbind(c(NA, rnorm(6)),c(rnorm(6),NA)), cbind(runif(7), runif(7)))
divisors <- list(c(0.5,2), c(3,4))
I'm looking for a vectorized function that produces output that looks the same as
for(i in 1:length(data)){
for(j in 1:ncol(data[[i]])){data[[i]][,j] <- data[[i]][,j] / divisors[[i]][j]}
}
i.e.
[[1]]
[,1] [,2]
[1,] NA 0.28265752
[2,] -0.46967014 -0.07132588
[3,] 0.20253439 -0.37432527
[4,] 0.65736410 0.06630705
[5,] 0.72349294 0.67202129
[6,] 0.88532648 -0.80892508
[7,] 0.08162027 NA
[[2]]
[,1] [,2]
[1,] 0.26597435 0.18120979
[2,] 0.31213250 0.16493883
[3,] 0.19250804 0.14104145
[4,] 0.21196882 0.10172964
[5,] 0.10389773 0.04979742
[6,] 0.02754329 0.15064043
[7,] 0.25771766 0.23042586
The closest I have been able to come is
Map(`/`, data, divisors)
But that divides rows (rather than columns) of the matrix by the vector. Any help appreciated.
Transpose your matrices before and after:
lapply(Map(`/`, lapply(data, t), divisors), t)
# [[1]]
# [,1] [,2]
# [1,] NA 0.28265752
# [2,] -0.46967014 -0.07132588
# [3,] 0.20253439 -0.37432527
# [4,] 0.65736410 0.06630705
# [5,] 0.72349294 0.67202129
# [6,] 0.88532648 -0.80892508
# [7,] 0.08162027 NA
#
# [[2]]
# [,1] [,2]
# [1,] 0.26597435 0.18120979
# [2,] 0.31213250 0.16493883
# [3,] 0.19250804 0.14104145
# [4,] 0.21196882 0.10172964
# [5,] 0.10389773 0.04979742
# [6,] 0.02754329 0.15064043
# [7,] 0.25771766 0.23042586
I prefer the transpose approach above, but another option is to expand your divisor vectors into matrices of the same dimensions as in data:
div_mat <- Map(matrix, data = divisors, nrow = sapply(data, nrow), ncol = 2, byrow = TRUE)
Map("/", data, div_mat)
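A third option, not mentioned in the answers above, is `sweep()`, which applies an operation along a chosen margin. The sketch below reuses the question's data and checks the result against the double-transpose approach; the names `by_transpose` and `by_sweep` are introduced here only for the comparison.

```r
# Same data as in the question
set.seed(230)
data <- list(cbind(c(NA, rnorm(6)), c(rnorm(6), NA)), cbind(runif(7), runif(7)))
divisors <- list(c(0.5, 2), c(3, 4))

# Reference result from the double transpose
by_transpose <- lapply(Map(`/`, lapply(data, t), divisors), t)

# sweep() over MARGIN = 2 divides column j of each matrix by element j of its divisor
by_sweep <- Map(function(m, d) sweep(m, 2, d, "/"), data, divisors)

all.equal(by_transpose, by_sweep)
```

Unlike the expanded-divisor version, this does not need the number of columns to be spelled out.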
So basically, I want to split a randomly generated matrix into two matrices: one for training and one for testing.
a <- s[sample(nrow(s),size=3,replace=FALSE),]
b <- s[-a,]
> s
[,1] [,2]
[1,] 0.69779187 -0.75869384
[2,] -0.46857477 -0.33813598
[3,] 0.53903809 -0.95950598
[4,] -0.33312675 -0.49951164
[5,] 0.88500834 0.08256923
[6,] 0.63664652 0.87420720
[7,] 0.61614134 0.77893294
[8,] 0.36956134 0.07586245
[9,] -0.03678593 -0.23743987
[10,] -0.27057064 -0.86067063
> a
[,1] [,2]
[1,] 0.8850083 0.08256923
[2,] 0.6366465 0.87420720
[3,] -0.2705706 -0.86067063
> b
[,1] [,2]
The idea is to generate a 10 x 2 matrix, randomly pick 3 of its rows as training data, and then output the training matrix along with the remaining rows as the testing matrix.
Does anyone have suggestions on how to remove the rows in a from s?
The issue is that you're trying to index s with a matrix a, rather than the randomly selected indices. Modifying your code to the following should do the trick:
i <- sample(nrow(s),size=3,replace=FALSE)
a <- s[i,]
b <- s[-i,] # Note the indexing with i, rather than a
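As a hedged illustration of why the original attempt returned an empty matrix: numeric indices are truncated toward zero, so when every value in `a` lies strictly between -1 and 1, `-a` becomes a vector of zero indices, which select nothing. The sketch below builds its own `s` with `runif()` (an assumption, chosen to guarantee that range) rather than using the question's data.

```r
set.seed(1)                                   # arbitrary seed, for reproducibility only
s <- matrix(runif(20, -0.9, 0.9), ncol = 2)   # all entries strictly between -1 and 1

i <- sample(nrow(s), size = 3, replace = FALSE)
a <- s[i, , drop = FALSE]    # training rows
b <- s[-i, , drop = FALSE]   # remaining rows

dim(a)   # 3 2
dim(b)   # 7 2

# The original attempt: -a holds fractions, which truncate to index 0,
# and zero indices select nothing
dim(s[-a, ])   # 0 2
```

Adding `drop = FALSE` is optional here, but it keeps the result a matrix even when only one row is selected.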
Very, very specific question, but I'm stuck trying to unravel the code within contr.poly() in R.
I am at what I think is the last hurdle... There is this internal function, make.poly(), which is the critical part of contr.poly(). Within make.poly I see that there is a raw matrix generated, which for contr.poly(4) is:
[,1] [,2] [,3] [,4]
[1,] 1 -1.5 1 -0.3
[2,] 1 -0.5 -1 0.9
[3,] 1 0.5 -1 -0.9
[4,] 1 1.5 1 0.3
From there the function sweep() is applied with the following call and result:
Z <- sweep(raw, 2L, apply(raw, 2L, function(x) sqrt(sum(x^2))),
"/", check.margin = FALSE)
[,1] [,2] [,3] [,4]
[1,] 0.5 -0.6708204 0.5 -0.2236068
[2,] 0.5 -0.2236068 -0.5 0.6708204
[3,] 0.5 0.2236068 -0.5 -0.6708204
[4,] 0.5 0.6708204 0.5 0.2236068
I am familiar with the apply functions, and I guess sweep is similar, at least in syntax, but I don't understand what 2L is doing, and I don't know if "/" and check.margin = F are important to understand the mathematical operation being performed.
EDIT: It turned out to be quite easy. The call just normalizes each column to unit length: the function(x) is applied column-wise to get each column's Euclidean norm, and every entry of the matrix is then divided ("/") by the norm of its column.
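To make the pieces of that call explicit: `2L` is just the integer 2 passed as `MARGIN` (columns), `"/"` is the operation applied, and `check.margin = FALSE` only skips a check that the statistics match the dimensions of the matrix. The sketch below retypes the raw matrix from the question and compares `sweep()` with a plain division; `norms` and `Z2` are names introduced for this illustration.

```r
# Raw matrix for contr.poly(4), copied from the question
raw <- matrix(c(1, -1.5,  1, -0.3,
                1, -0.5, -1,  0.9,
                1,  0.5, -1, -0.9,
                1,  1.5,  1,  0.3), nrow = 4, byrow = TRUE)

norms <- apply(raw, 2L, function(x) sqrt(sum(x^2)))   # Euclidean norm of each column
Z <- sweep(raw, 2L, norms, "/", check.margin = FALSE)

# Equivalent without sweep(): repeat the norms so they line up with the columns
Z2 <- raw / rep(norms, each = nrow(raw))

all.equal(Z, Z2)   # TRUE
colSums(Z^2)       # every column now has unit length
```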
Here is an example that illustrates the operation performed by sweep().
I start with a matrix
> set.seed(0)
> (mat = matrix(rnorm(30, 5, 3), nrow= 10))
[,1] [,2] [,3]
[1,] 8.7888629 7.290780 4.327196
[2,] 4.0212999 2.602972 6.132187
[3,] 8.9893978 1.557029 5.400009
[4,] 8.8172880 4.131615 7.412569
[5,] 6.2439243 4.102355 4.828680
[6,] 0.3801499 3.765468 6.510824
[7,] 2.2142989 5.756670 8.257308
[8,] 4.1158387 2.324237 2.927138
[9,] 4.9826985 6.307050 1.146202
[10,] 12.2139602 1.287385 5.140179
and I want to center the data column-wise. Granted, I could use scale(mat, center = TRUE, scale = FALSE) and be done, but this function attaches attributes to the result, such as:
attr(,"scaled:center")
[1] 6.076772 3.912556 5.208229
corresponding to the column means. Good to have, but I just wanted the matrix, clean and neat. So it turns out that this can be achieved with:
> (centered = sweep(mat, 2, apply(mat,2, function(x) mean(x)),"-"))
[,1] [,2] [,3]
[1,] 2.7120910 3.3782243 -0.88103281
[2,] -2.0554720 -1.3095838 0.92395779
[3,] 2.9126259 -2.3555271 0.19177993
[4,] 2.7405161 0.2190592 2.20433938
[5,] 0.1671524 0.1897986 -0.37954947
[6,] -5.6966220 -0.1470886 1.30259477
[7,] -3.8624730 1.8441143 3.04907894
[8,] -1.9609332 -1.5883194 -2.28109067
[9,] -1.0940734 2.3944938 -4.06202721
[10,] 6.1371883 -2.6251713 -0.06805063
So the sweep() function is understood as:
sweep(x, MARGIN, STATS, FUN), where x is the matrix to sweep through; MARGIN says whether to work column-wise (2) or row-wise (1); STATS holds the values to sweep out, computed beforehand, here with apply() on the same matrix (or on another one), again column- or row-wise, with function(x) mean(x); and FUN is the operation to perform, given in quotes: "-" or "/".
Interestingly, we could have used the means of the columns of a completely different matrix to then sweep through the original matrix - presumably a more complex operation, more in line with the reason why this function was developed.
> aux.mat = matrix(rnorm(9), nrow = 3)
> aux.mat
[,1] [,2] [,3]
[1,] -0.2793463 -0.4527840 -1.065591
[2,] 1.7579031 -0.8320433 -1.563782
[3,] 0.5607461 -1.1665705 1.156537
> (centered = sweep(mat, 2, apply(aux.mat,2, function(x) mean(x)),"-"))
[,1] [,2] [,3]
[1,] 8.1090952 8.107913 4.818142
[2,] 3.3415323 3.420105 6.623132
[3,] 8.3096302 2.374162 5.890954
[4,] 8.1375203 4.948748 7.903514
[5,] 5.5641567 4.919487 5.319625
[6,] -0.2996178 4.582600 7.001769
[7,] 1.5345313 6.573803 8.748253
[8,] 3.4360710 3.141369 3.418084
[9,] 4.3029308 7.124183 1.637147
[10,] 11.5341925 2.104517 5.631124
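As a hedged aside, the centering above can be written a little more compactly, and the `scale()` result differs only in its extra attribute. The sketch regenerates `mat` with the same seed; `centered2` and `centered3` are names introduced here for the comparison.

```r
set.seed(0)
mat <- matrix(rnorm(30, 5, 3), nrow = 10)

centered <- sweep(mat, 2, apply(mat, 2, function(x) mean(x)), "-")

# colMeans() gives the same statistics, and "-" is sweep()'s default FUN
centered2 <- sweep(mat, 2, colMeans(mat))

# scale() computes the same values; only its extra attribute differs
centered3 <- scale(mat, center = TRUE, scale = FALSE)
attr(centered3, "scaled:center") <- NULL

all.equal(centered, centered2)   # TRUE
all.equal(centered, centered3)   # TRUE
```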
Data
I have a list of lists that looks something like this:
sublist1 <- list(power=as.matrix(rnorm(10)),x=rnorm(10),y=rnorm(10))
sublist2 <- list(power=as.matrix(rnorm(10)),x=rnorm(10),y=rnorm(10))
sublist3 <- list(power=as.matrix(rnorm(10)),x=rnorm(10),y=rnorm(10))
mylist = list(sublist1,sublist2,sublist3)
My goal would be to pull out only the matrices named power
I've tried
mylist_power =mylist[sapply(mylist, '[', 'Power')]
But that's not working.
Brownie point alert!!!
How can I find the mean of the newly created list of matrices named power?
mylist_power <- sapply(mylist, '[', 'power')
and some means:
sapply(mylist_power, mean) # one per matrix
sapply(mylist_power, colMeans) # for each column and each matrix
sapply(mylist_power, rowMeans) # for each row and each matrix
mean(unlist(mylist_power)) # for the whole list
Reduce(`+`, mylist_power) / length(mylist_power) # element-wise
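Two details are worth spelling out in a short sketch: element names are case-sensitive, so "Power" matches nothing, and `[[` returns the matrix itself whereas `[` returns a one-element list wrapping it. The helper `make_sublist()` is hypothetical, introduced only to rebuild data shaped like the question's.

```r
set.seed(42)   # arbitrary seed

# Hypothetical helper that builds one sublist shaped like those in the question
make_sublist <- function() {
  list(power = as.matrix(rnorm(10)), x = rnorm(10), y = rnorm(10))
}
mylist <- list(make_sublist(), make_sublist(), make_sublist())

# `[[` extracts the matrix itself from every sublist
mylist_power <- lapply(mylist, `[[`, "power")

# Names are case-sensitive: there is no element called "Power"
wrong_case <- lapply(mylist, `[[`, "Power")   # list of three NULLs

sapply(mylist_power, mean)                    # one mean per matrix
```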
A purrr solution, which can be replicated with base R's Map:
library(purrr)
library(magrittr) # for %>%
# part 1 (return only $power from every list item)
map(mylist, ~.x$power)
[[1]]
[,1]
[1,] 0.33281918
[2,] -1.12404046
[3,] -0.70613078
[4,] -0.72754386
[5,] -1.83431439
[6,] -0.40768794
[7,] 0.02686119
[8,] 0.91162864
[9,] 1.63434648
[10,] 0.06068561
[[2]]
[,1]
[1,] -0.02256943
[2,] -0.90315486
[3,] 0.90777295
[4,] 1.16194290
[5,] -0.45795340
[6,] 0.92795667
[7,] -2.10293514
[8,] -1.67716711
[9,] 1.76565577
[10,] 0.79444742
[[3]]
[,1]
[1,] -0.36200564
[2,] -1.13955016
[3,] -0.81537133
[4,] 1.31024563
[5,] -0.25836094
[6,] 0.60626489
[7,] 0.31344822
[8,] 0.05360308
[9,] 1.12825379
[10,] -0.55813346
# part 2 (column means of each power matrix)
map(mylist, ~.x$power %>% colMeans)
[[1]]
[1] -0.1833376
[[2]]
[1] 0.03939958
[[3]]
[1] 0.02783941
To get these values in a vector instead
map_dbl(mylist, ~.x$power %>% colMeans)
[1] -0.18333763 0.03939958 0.02783941
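For readers without purrr, a base R sketch of the same three steps might look like the following. It rebuilds a list shaped like the question's with `replicate()`, and the names `powers`, `power_means` and `power_means_vec` are introduced here for illustration.

```r
set.seed(42)   # arbitrary seed
mylist <- replicate(
  3,
  list(power = as.matrix(rnorm(10)), x = rnorm(10), y = rnorm(10)),
  simplify = FALSE
)

# part 1: list of the power matrices (like map())
powers <- lapply(mylist, function(el) el$power)

# part 2: column means, one list element per matrix (like map())
power_means <- lapply(mylist, function(el) colMeans(el$power))

# as a numeric vector (like map_dbl()); this relies on each matrix having one column
power_means_vec <- vapply(mylist, function(el) colMeans(el$power), numeric(1))
```

`vapply()` states the expected result type up front, so it fails loudly if a matrix ever has more than one column.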
Is it possible to obtain the actual observations within each cluster after performing k-means in R?
For example, after my analysis I have 2 clusters, and I want to find the exact observations within each cluster.
# random samples
x <- matrix(c(rnorm(30,10,2), rnorm(30,0,1)), nrow=12, byrow=T)
# clustering
clusters <- kmeans(x, 2)
# accessing cluster membership
clusters$cluster
[1] 1 1 1 1 1 1 2 2 2 2 2 2
# samples within cluster 1
c1 <- x[which(clusters$cluster == 1),]
# samples within cluster 2
c2 <- x[which(clusters$cluster == 2),]
# printing variables
x
[,1] [,2] [,3] [,4] [,5]
[1,] 10.8415151 9.3075438 9.443433171 13.5402818 7.0574904
[2,] 6.0721775 7.4570368 9.999411972 12.8186182 6.1697638
[3,] 11.3170525 10.9458832 7.576416396 12.7177707 6.7104535
[4,] 8.1377999 8.0558304 9.925363089 11.6547736 9.4911071
[5,] 11.6078294 8.7782984 8.619840508 12.2816048 9.4460169
[6,] 10.2972477 9.1498916 11.769122361 7.6224395 12.0658246
[7,] -0.9373027 -0.5051318 -0.530429758 -0.8200562 -0.0623147
[8,] -0.7257655 -1.1469400 -0.297539831 -0.0477345 -1.0278240
[9,] 0.7285393 -0.6621878 2.914976054 0.6390049 -0.5032553
[10,] 0.2672737 -0.6393167 -0.198287317 0.1430110 -2.2213365
[11,] -0.8679649 0.3354149 -0.003510304 0.6665495 0.6664689
[12,] 0.1731384 -1.8827645 0.270357961 0.3944154 1.3564678
c1
[,1] [,2] [,3] [,4] [,5]
[1,] 10.841515 9.307544 9.443433 13.540282 7.057490
[2,] 6.072177 7.457037 9.999412 12.818618 6.169764
[3,] 11.317053 10.945883 7.576416 12.717771 6.710454
[4,] 8.137800 8.055830 9.925363 11.654774 9.491107
[5,] 11.607829 8.778298 8.619841 12.281605 9.446017
[6,] 10.297248 9.149892 11.769122 7.622439 12.065825
c2
[,1] [,2] [,3] [,4] [,5]
[1,] -0.9373027 -0.5051318 -0.530429758 -0.8200562 -0.0623147
[2,] -0.7257655 -1.1469400 -0.297539831 -0.0477345 -1.0278240
[3,] 0.7285393 -0.6621878 2.914976054 0.6390049 -0.5032553
[4,] 0.2672737 -0.6393167 -0.198287317 0.1430110 -2.2213365
[5,] -0.8679649 0.3354149 -0.003510304 0.6665495 0.6664689
[6,] 0.1731384 -1.8827645 0.270357961 0.3944154 1.3564678
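With more than a couple of clusters, writing one line per cluster gets tedious. A minimal sketch of a more general approach uses `split()` on the row indices; the seed and the names `idx_by_cluster` and `obs_by_cluster` are assumptions made for this illustration.

```r
set.seed(123)   # arbitrary seed so the clustering is reproducible
x <- matrix(c(rnorm(30, 10, 2), rnorm(30, 0, 1)), nrow = 12, byrow = TRUE)
clusters <- kmeans(x, 2)

# Row indices grouped by cluster label
idx_by_cluster <- split(seq_len(nrow(x)), clusters$cluster)

# One matrix of observations per cluster;
# drop = FALSE keeps single-row clusters as matrices
obs_by_cluster <- lapply(idx_by_cluster, function(idx) x[idx, , drop = FALSE])

obs_by_cluster[["1"]]   # same rows as x[clusters$cluster == 1, ]
```

The list is named by cluster label, so it scales to any number of clusters without extra code.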