I am trying to randomize a matrix such that each of the rows in each column are randomized individually so that in the final matrix there is no association between columns. I know that I need to use the sample() function and some sort of for(each column) loop, but I'm not exactly sure of how to go about doing it. Specifically, I am asking how to write a function that will loop through the columns of a matrix and randomize the rows of each column.
Edit: An example of what I'm trying to achieve
Original matrix:
X1 X2 X3
1 4 3 6
2 7 2 4
3 9 5 1
Sample desired output:
X1 X2 X3
1 7 3 1
2 4 5 6
3 9 2 4
As you can see, the rows in each column have been randomized separately.
If you have a matrix X, you can use apply() (ideal for matrix)
apply(X, 2, sample)
Example:
X <- matrix(1:25, 5)
# [,1] [,2] [,3] [,4] [,5]
# [1,] 1 6 11 16 21
# [2,] 2 7 12 17 22
# [3,] 3 8 13 18 23
# [4,] 4 9 14 19 24
# [5,] 5 10 15 20 25
Applying the code above gives:
# [,1] [,2] [,3] [,4] [,5]
# [1,] 3 10 11 16 21
# [2,] 5 8 12 20 22
# [3,] 4 9 14 18 24
# [4,] 2 6 15 19 25
# [5,] 1 7 13 17 23
I did not set a random seed via set.seed(), so you will get a different result when you run it. All that matters is that the result is random.
If you have a data frame X, it is better to use sapply(), which shuffles each column and returns the result as a matrix:
sapply(X, sample)
You could use a for loop for each column.
Or you could use:
apply(x, 2, function(col) sample(col, replace = FALSE))
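To make the apply() approach above easy to verify, here is a minimal sketch with a fixed seed. The seed value and the names shuffled and same_values are only illustrative assumptions; the check confirms that every column keeps its own values and only their order changes.

```r
set.seed(123)                      # only so the example is reproducible
X <- matrix(1:25, 5)

# Permute the entries of each column independently of the other columns.
shuffled <- apply(X, 2, sample)

# Sorting each column should recover the sorted columns of X.
same_values <- all(apply(shuffled, 2, sort) == apply(X, 2, sort))
same_values
```

One caveat: for a numeric vector of length one, sample(x) samples from 1:x rather than returning x, so this pattern is not safe for a one-row matrix.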
I have the following matrix
> mat<-rbind(c(9,6),c(10,6),c(11,7),c(12,7),c(12,8),c(12,9),c(12,10),c(12,11),c(12,12),c(13,12))
> mat
[,1] [,2]
[1,] 9 6
[2,] 10 6
[3,] 11 7
[4,] 12 7
[5,] 12 8
[6,] 12 9
[7,] 12 10
[8,] 12 11
[9,] 12 12
[10,] 13 12
I would like to remove duplicate rows based on the first column's values and keep the row whose entry in the second column is the maximum. E.g. for the example above, the desired outcome is
[,1] [,2]
[1,] 9 6
[2,] 10 6
[3,] 11 7
[4,] 12 12
[5,] 13 12
I tried with
> mat[!duplicated(mat[,1]),]
but I obtained
[,1] [,2]
[1,] 9 6
[2,] 10 6
[3,] 11 7
[4,] 12 7
[5,] 13 12
which differs from the desired outcome in entry [4,2]. Suggestions?
You can sort the matrix first, using ascending order for column 1 and descending order for column 2. Then the duplicated function will remove all but the maximum column 2 value for each column 1 value.
mat <- mat[order(mat[,1],-mat[,2]),]
mat[!duplicated(mat[,1]),]
[,1] [,2]
[1,] 9 6
[2,] 10 6
[3,] 11 7
[4,] 12 12
[5,] 13 12
Like Joseph's solution, but if you add row names first you can restore the original row order afterwards (which will be the same in this case). The row names of the result show which original rows were kept.
rownames(mat) <- 1:nrow(mat)
# sort by column 1 ascending, then column 2 descending
mat <- mat[order(mat[,1], -mat[,2]),]
mat <- mat[!duplicated(mat[,1]),]
mat[order(as.numeric(rownames(mat))),]
# [,1] [,2]
# 1 9 6
# 2 10 6
# 3 11 7
# 9 12 12
# 10 13 12
First sort, then keep only the first row for each duplicate:
mat <- mat[order(mat[,1], mat[,2]),]
mat[!duplicated(mat[,1]),]
EDIT: Sorry, I thought your desired result was the last output. OK, so you want the max value:
mat<-rbind(c(9,6),c(10,6),c(11,7),c(12,7),c(12,8),c(12,9),c(12,10),c(12,11),c(12,12),c(13,12))
#Reverse sort
mat <- mat[order(mat[,1], mat[,2], decreasing=TRUE),]
#Keep only the first row for each duplicate, this will give the largest values
mat <- mat[!duplicated(mat[,1]),]
#finally sort it
mat <- mat[order(mat[,1], mat[,2]),]
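For a two-column matrix like this one, the same result can also be reached without sorting. This is just a sketch of an alternative using tapply(), not the approach from the answers above; the names mx and res are made up for the example.

```r
mat <- rbind(c(9,6), c(10,6), c(11,7), c(12,7), c(12,8),
             c(12,9), c(12,10), c(12,11), c(12,12), c(13,12))

# Maximum of column 2 within each distinct value of column 1.
mx <- tapply(mat[, 2], mat[, 1], max)

# Rebuild a two-column matrix; the group labels come back as names.
res <- cbind(as.numeric(names(mx)), as.vector(mx))
res
```

Note that this only carries the key and the maximum. If the matrix has further columns and you need the whole row, the sort-then-duplicated() approach is the one to use.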
I need to create a list of sequences that always goes back to the first digit in the sequence. I've written the code below but it seems clunky. Is there a solution that uses fewer characters?
(i = seq(1, 24, by = 3))
#> [1] 1 4 7 10 13 16 19 22
(i_list = purrr::map(i, ~c(.:(. + 2), .)))
#> [[1]]
#> [1] 1 2 3 1
#>
#> [[2]]
#> ...
Edit: here's a way with lapply(). Not sure why this is getting downvotes, any advice on how to improve the question welcome!
(i_list = lapply(i, function(x) c(x:(x+2), x)))
I was wondering if there's a way with replicate() so have added that tag.
In matrix form, rather than list form, there's:
cbind(matrix(1:24, ncol=3,byrow=TRUE),seq(1, 24, by = 3))
[,1] [,2] [,3] [,4]
[1,] 1 2 3 1
[2,] 4 5 6 4
[3,] 7 8 9 7
[4,] 10 11 12 10
[5,] 13 14 15 13
[6,] 16 17 18 16
[7,] 19 20 21 19
[8,] 22 23 24 22
and then you'd iterate over rows of the matrix instead of elements of the list.
Or if you are into code golf:
> seq(1,24,by=3) + t(matrix(c(0,1,2,0),ncol=8,nrow=4))
[,1] [,2] [,3] [,4]
[1,] 1 2 3 1
[2,] 4 5 6 4
[3,] 7 8 9 7
[4,] 10 11 12 10
...
but then how much work do you put into constructing the RHS of the + in this case? How is your question parameterised?
This depends on i having a regular pattern (with some adjustment for step size), it doesn't work for arbitrary i sequences.
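If i can be arbitrary, one possible sketch is to add a fixed offset pattern to each start value. The example starts below are made up, and the offsets c(0, 1, 2, 0) assume the same "three consecutive values, then back to the first" shape as in the question.

```r
i <- c(1, 4, 7, 20)                 # arbitrary start values (illustrative)
offsets <- c(0, 1, 2, 0)            # three consecutive values, then the start

# Matrix form: one row per start value.
m <- outer(i, offsets, "+")

# List form: one vector per start value.
i_list <- lapply(i, function(x) x + offsets)
```

Here the step size of i no longer matters, because each row or list element is built from its own start value.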
I'm trying to compute column-wise sums over the cells of an 8x8 matrix M[[k]]: when the column is odd, it should add the odd rows of that column; when the column is even, it should add the even rows. I then need this to loop through a data folder.
vin <- rep(c(1,0),(NoParticipants/2))
vout <- rep(c(0,1),(NoParticipants/2))
M <- vector("list")
total <- vector("list")
group <- vector("list")
for(k in 1:NoGames){
M[[k]] <- (CumulativeAdjacencyMatrices[[k]][[20]])
total[[k]] <- colSums(M[[k]])
group[[k]] <- (M[[k]]%*%vin)*vin + (NoRounds - (M[[k]]%*%vin))*vout
}
The line with group[[k]] is giving back negative integers (which it shouldn't). How can I re-write the command to do what I want it to? Any thoughts would be greatly appreciated :)
I found where I was wrong. I needed to replace NoRounds with total[[k]] in the last line.
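For anyone who wants to try that fix in isolation, here is a small sketch on toy data. The matrix M below is a made-up symmetric stand-in for one of the cumulative adjacency matrices (an assumption, since the real data is not shown), and in_sum is just a helper name.

```r
# Toy stand-in for one cumulative adjacency matrix (hypothetical data).
set.seed(42)
NoParticipants <- 8
A <- matrix(sample.int(5, NoParticipants^2, replace = TRUE), NoParticipants)
M <- A + t(A)                 # symmetric, so rowSums(M) equals colSums(M)

vin  <- rep(c(1, 0), NoParticipants / 2)   # 1 at odd positions
vout <- rep(c(0, 1), NoParticipants / 2)   # 1 at even positions
total <- colSums(M)

in_sum <- as.vector(M %*% vin)             # per row: sum over odd columns
# Same expression as in the loop, with total in place of NoRounds.
group <- in_sum * vin + (total - in_sum) * vout
group
```

With a symmetric non-negative M, the even positions end up holding the sum over even columns, so nothing can go negative.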
As an alternative to your approach, I suggest simply indexing the odd and even positions directly, as follows:
#sample matrix
m <- matrix(sample.int(20, replace = TRUE), nrow = 8, ncol = 8)
> m
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,] 11 9 2 14 10 11 9 2
[2,] 17 6 9 7 4 17 6 9
[3,] 1 10 20 12 18 1 10 20
[4,] 1 14 8 3 12 1 14 8
[5,] 14 10 11 9 2 14 10 11
[6,] 7 4 17 6 9 7 4 17
[7,] 12 18 1 10 20 12 18 1
[8,] 3 12 1 14 8 3 12 1
#adding odd index columns
> colSums(m[,seq(1,ncol(m),2)])
[1] 66 69 83 83
#adding even index columns
> colSums(m[,seq(2,ncol(m),2)])
[1] 83 75 66 69
# column i becomes the row-wise sum of columns i, i+2, i+4, ... (e.g. column 1 = columns 1 + 3 + 5 + 7)
for(i in 1:(ncol(m)-2)){
m[,i] <- rowSums(m[,seq(i,8,2)])
}
Hope this helps.
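If the goal is what the question literally describes (for an odd column, add up its odd rows; for an even column, add up its even rows), a minimal sketch could look like the following. The seed, the toy matrix and the names parity_match and same_parity_sums are assumptions for illustration only.

```r
set.seed(1)
m <- matrix(sample.int(20, 64, replace = TRUE), nrow = 8, ncol = 8)

# TRUE where the row index and the column index have the same parity.
parity_match <- outer(seq_len(nrow(m)), seq_len(ncol(m)),
                      function(r, k) r %% 2 == k %% 2)

# Zero out the non-matching cells, then sum each column.
same_parity_sums <- colSums(m * parity_match)
same_parity_sums
```

Inside a loop over games, m would be replaced by M[[k]] and the result stored per k.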
Suppose I have the following matrix:
mat <- matrix(1:20, ncol=5)
[,1] [,2] [,3] [,4] [,5]
[1,] 1 5 9 13 17
[2,] 2 6 10 14 18
[3,] 3 7 11 15 19
[4,] 4 8 12 16 20
and the following vector
counts=c(2,1,2)
I need to collapse this matrix by adding the columns based on each value of that vector counts. That means that the first two columns must be added, the third remains as it is, and the last two columns are summed. My resulting matrix must look like this
[,1] [,2] [,3]
[1,] 6 9 30
[2,] 8 10 32
[3,] 10 11 34
[4,] 12 12 36
How could I do this in an automatic way, given that in my case I have a very big matrix and with a vector of counts with different values?
One way is to build a grouping vector by repeating seq_along(counts) according to 'counts', use it to split the column indices of 'mat' into a list, then loop through that list with sapply, subsetting 'mat' by each set of column indices and taking the rowSums.
mat2 <- sapply(split(1:ncol(mat), rep(seq_along(counts), counts)),
function(i) rowSums(mat[,i,drop=FALSE]))
dimnames(mat2) <- NULL
mat2
# [,1] [,2] [,3]
#[1,] 6 9 30
#[2,] 8 10 32
#[3,] 10 11 34
#[4,] 12 12 36
Another idea, conceptually similar to akrun's:
t(rowsum(t(mat), rep(seq_along(counts), counts)))
# 1 2 3
#[1,] 6 9 30
#[2,] 8 10 32
#[3,] 10 11 34
#[4,] 12 12 36
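For repeated use on a big matrix, either approach can be wrapped in a small function. The sketch below uses the rowsum() variant; collapse_cols is a hypothetical name, and the stopifnot() guard is my own addition to catch a counts vector that does not cover all columns.

```r
collapse_cols <- function(mat, counts) {
  # counts must account for every column exactly once
  stopifnot(sum(counts) == ncol(mat))
  groups <- rep(seq_along(counts), counts)   # e.g. 1 1 2 3 3
  out <- t(rowsum(t(mat), groups))           # sum columns within each group
  dimnames(out) <- NULL
  out
}

mat <- matrix(1:20, ncol = 5)
res <- collapse_cols(mat, c(2, 1, 2))
res
```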
I have a 3D array of dimensions MxNxO. For each of the M arrays of dimensions NxO, I want to apply a function myfunction that takes as input a NxO array and return a NxO array.
If I do
apply(array, 1, myfunction)
the output is a 2D array of dimensions (N*O)xM instead of a 3D array of dimensions MxNxO.
As an example, we can use the identity function from R.
Here is a 3D array
> a <- array(1:20, c(2,2,5))
, , 1
[,1] [,2]
[1,] 1 3
[2,] 2 4
, , 2
[,1] [,2]
[1,] 5 7
[2,] 6 8
, , 3
[,1] [,2]
[1,] 9 11
[2,] 10 12
, , 4
[,1] [,2]
[1,] 13 15
[2,] 14 16
, , 5
[,1] [,2]
[1,] 17 19
[2,] 18 20
and the apply result should be the same (for my needs) but it is a 2D array instead:
> apply(a,1,identity)
[,1] [,2]
[1,] 1 2
[2,] 3 4
[3,] 5 6
[4,] 7 8
[5,] 9 10
[6,] 11 12
[7,] 13 14
[8,] 15 16
[9,] 17 18
[10,] 19 20
Also, I would like to preserve the labels on each dimension of the array (myfunction itself preserves those labels).
You are probably looking for aaply in the plyr package:
library(plyr)
a <- array(1:20, c(2,2,5))
> aaply(a,1,identity)
, , = 1
X1 1 2
1 1 3
2 2 4
, , = 2
X1 1 2
1 5 7
2 6 8
, , = 3
X1 1 2
1 9 11
2 10 12
, , = 4
X1 1 2
1 13 15
2 14 16
, , = 5
X1 1 2
1 17 19
2 18 20
If you know the dimensions that your function will be returning (as you do in your trivial example), then you can use vapply. I also would have suggested aaply, though I prefer the result to be 2 2x5 matrices, not 5 2x2 matrices (since a[1, ,] is a 2x5 matrix, so this makes more sense to me).
dim <- c(x=2,y=2,z=5)
dim.n <- lapply(1:length(dim), function(x) paste(names(dim)[x], seq(len=dim[x]), sep="_"))
a <- array(1:20, dim, dim.n)
vapply(dimnames(a)[[1]], function(i) identity(a[i, ,]), a[1, ,])
# , , x_1
#
# z_1 z_2 z_3 z_4 z_5
# y_1 1 5 9 13 17
# y_2 3 7 11 15 19
#
# , , x_2
#
# z_1 z_2 z_3 z_4 z_5
# y_1 2 6 10 14 18
# y_2 4 8 12 16 20
The main advantage of this is that it's all base R, so you don't need to load plyr. The plyr answer is fine so long as you're okay with its interpretation of what the return dimensions should be, or if you don't necessarily know the result format ahead of time.
Also, I recognize the format here is nowhere near as clean as aaply, and wish apply respected the dimensions so one didn't need to resort to these things. I had a very similar question last week.
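If the MxNxO layout from the question is what you want, another base R option is to refold the apply() result and permute the dimensions with aperm(). This is only a sketch: myfunction is a placeholder set to identity here, and it assumes the function returns an array with the same NxO shape it was given.

```r
a <- array(1:20, c(2, 2, 5))
myfunction <- identity                  # placeholder for the real function

flat <- apply(a, 1, myfunction)         # (N*O) x M matrix

# Column i of flat is myfunction(a[i, , ]) unrolled, so refold it to
# N x O x M and then move the M dimension back to the front.
res <- aperm(array(flat, dim = c(dim(a)[-1], dim(a)[1])), c(3, 1, 2))

# Reattach labels; only valid if myfunction keeps them unchanged.
dimnames(res) <- dimnames(a)
```

With identity as the function, res comes out equal to a, which is the behaviour the question asks for.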