I want to repeatedly divide a set into two complementary subsets of known size and store them as the columns of two matrices. For example, assume the main set is {1, 2, ..., 10}, the size of the first sample is 8, and I want to repeat the sampling 3 times. I want to have:
[,1] [,2] [,3]
[1,] 10 9 1
[2,] 8 1 10
[3,] 3 7 5
[4,] 4 2 3
[5,] 1 8 8
[6,] 6 4 2
[7,] 9 5 7
[8,] 5 10 6
and
[,1] [,2] [,3]
[1,] 2 3 4
[2,] 7 6 9
Any idea how to implement it in R avoiding for loops?
I would use replicate + sample, like this:
set.seed(1) # Just so you can replicate my results
A <- replicate(3, sample(10, 8, FALSE)) # Change 3 to the number of replications
A
# [,1] [,2] [,3]
# [1,] 3 7 8
# [2,] 4 1 9
# [3,] 5 2 4
# [4,] 7 8 6
# [5,] 2 5 7
# [6,] 8 10 2
# [7,] 9 4 3
# [8,] 6 6 1
For the other set, I would use apply + setdiff, like this:
B <- apply(A, 2, function(x) setdiff(1:10, x))
B
# [,1] [,2] [,3]
# [1,] 1 3 5
# [2,] 10 9 10
Another option, as suggested by @thelatemail (and more efficient), is to use replicate to create one matrix of full permutations, then use basic subsetting to split it into your two matrices.
A <- replicate(3, sample(10))
B <- A[-(seq_len(8)), ]
A <- A[seq_len(8), ]
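The subsetting idea above can be wrapped in a small helper. This is only a sketch: the function name split_sample and its arguments are made up for illustration, and it assumes 0 < k < n.

```r
# Hypothetical helper (name and arguments are illustrative, not from the answers above).
# Assumes 0 < k < n.
split_sample <- function(n, k, reps) {
  perms <- replicate(reps, sample(n))   # each column is a random permutation of 1:n
  list(first  = perms[seq_len(k), , drop = FALSE],    # first k rows of every column
       second = perms[-seq_len(k), , drop = FALSE])   # remaining n - k rows
}

set.seed(1)
parts <- split_sample(n = 10, k = 8, reps = 3)
dim(parts$first)    # 8 rows, 3 columns
dim(parts$second)   # 2 rows, 3 columns

# Check that each pair of columns together covers 1:10
covers <- sapply(seq_len(3), function(j)
  setequal(c(parts$first[, j], parts$second[, j]), 1:10))
all(covers)
```

Because both pieces come from the same permutation, column j of the second matrix is always the complement of column j of the first, without calling setdiff.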
Assume I have to generate 1000 Sample Pairs (Y1,Y2) (from a Normal Distribution with replacement). Each of the pairs should have 20 observations.
y1 <- rep(sample(c(1:10),10, replace = TRUE))
y2 <- rep(sample(c(1:10),10, replace = TRUE))
How would I now generate 1000 of these pairs, so that they are easy to access for further computations?
I had the idea of looping 1000 times and saving them in a data frame, but this may get chaotic.
Is there a simpler/nicer way to do this? A package or a function that I am missing?
Help would be appreciated!
One way is to use replicate, i.e.
replicate(5, sample(1:10, 10, replace = TRUE))
# [,1] [,2] [,3] [,4] [,5]
# [1,] 3 9 2 4 5
# [2,] 4 1 10 8 1
# [3,] 5 6 1 3 7
# [4,] 1 9 9 6 5
# [5,] 5 3 4 7 9
# [6,] 4 5 4 4 5
# [7,] 2 10 9 4 9
# [8,] 3 1 10 5 3
# [9,] 7 3 10 9 10
#[10,] 10 3 10 10 1
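Scaled up to the sizes in the question, one possible layout is two matrices with one column per pair. This sketch assumes standard normal draws via rnorm (the question mentions a normal distribution); the names Y1, Y2, n_pairs and n_obs are just illustrative. If discrete draws are intended instead, swap rnorm(n_obs) for sample(1:10, n_obs, replace = TRUE).

```r
set.seed(123)          # reproducibility
n_pairs <- 1000        # number of (Y1, Y2) pairs
n_obs   <- 20          # observations per sample

# Column j of Y1 and column j of Y2 together form pair j
Y1 <- replicate(n_pairs, rnorm(n_obs))
Y2 <- replicate(n_pairs, rnorm(n_obs))

dim(Y1)   # 20 rows, 1000 columns

# Accessing a single pair, e.g. the 7th
pair_7 <- cbind(y1 = Y1[, 7], y2 = Y2[, 7])

# Vectorised computation across all pairs, e.g. difference in means
mean_diff <- colMeans(Y1) - colMeans(Y2)
length(mean_diff)
```

Keeping the pairs as matching columns avoids a loop and makes column-wise functions such as colMeans or apply directly usable.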
I have a small matrix, say
x <- matrix(1:10, nrow = 5) # values 1:10 across 5 rows and 2 columns
The result is
[,1] [,2]
[1,] 1 6
[2,] 2 7
[3,] 3 8
[4,] 4 9
[5,] 5 10
What I want to be able to do now is duplicate random rows in x; for example, producing
[,1] [,2]
[1,] 1 6
[2,] 2 7
[3,] 5 10
[4,] 4 9
[5,] 5 10
I believe the R function 'rep()' is the solution and also 'sample()', but I don't want to have to specify the size argument in sample(); i.e., I want an arbitrary number of rows to be duplicated each time.
Is there a simple way of accomplishing this using rep() and sample()?
We can use the sample function. I've used set.seed for reproducibility; if you remove that line, the results will change from run to run.
set.seed(1848) # reproducibility
x[sample(x = nrow(x), size = nrow(x), replace = TRUE), ]
[,1] [,2]
[1,] 1 6
[2,] 2 7
[3,] 5 10
[4,] 1 6
[5,] 5 10
Another option is to sample one row number and overwrite that row with another sampled row. Note that the two draws are independent, so they can pick the same row, in which case x is unchanged:
x[sample(1:nrow(x),1),] <- x[sample(1:nrow(x),1),]
x
# [,1] [,2]
#[1,] 5 10
#[2,] 2 7
#[3,] 3 8
#[4,] 4 9
#[5,] 5 10
Or, to duplicate up to 3 random rows:
x[sample(1:nrow(x),3),] <- x[sample(1:nrow(x),3),]
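If the number of duplicated rows should itself be arbitrary, one way is to draw that number at random first. This is a sketch under my own assumptions: the names k, target, keep and src are illustrative, and at least one row is always left untouched so there is something to copy from.

```r
x <- matrix(1:10, nrow = 5)
set.seed(42)                           # reproducibility
n <- nrow(x)

k      <- sample.int(n - 1, 1)         # how many rows to overwrite (1 to n - 1)
target <- sample.int(n, k)             # rows that will be overwritten
keep   <- setdiff(seq_len(n), target)  # rows left untouched

# Index into keep rather than calling sample(keep, ...):
# sample() treats a single number m as 1:m, which would be wrong here.
src <- keep[sample.int(length(keep), k, replace = TRUE)]

y <- x
y[target, ] <- x[src, ]                # overwrite targets with copies of kept rows
y
```

Since every overwritten row is a copy of an untouched row, the result always contains at least one duplicated row and keeps the original dimensions.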
Let's say I have a simple vector
v <- 1:5
I can add the vector to each element within the vector with the following code to generate the resulting matrix.
matrix(rep(v, 5), nrow=5, byrow=TRUE) + matrix(rep(v, 5), nrow=5)
[,1] [,2] [,3] [,4] [,5]
[1,] 2 3 4 5 6
[2,] 3 4 5 6 7
[3,] 4 5 6 7 8
[4,] 5 6 7 8 9
[5,] 6 7 8 9 10
But this seems verbose and inefficient. Is there a more concise way to accomplish this? Perhaps some linear algebra concept that is evading me?
outer should do what you want
outer(v, v, `+`)
# [,1] [,2] [,3] [,4] [,5]
# [1,] 2 3 4 5 6
# [2,] 3 4 5 6 7
# [3,] 4 5 6 7 8
# [4,] 5 6 7 8 9
# [5,] 6 7 8 9 10
Posting this answer not for upvotes but to highlight Frank's comment. You can use
sapply(v,"+",v)
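A quick check that the three approaches agree, plus a note on orientation when the two vectors differ. The second vector w is my own example, not from the question.

```r
v <- 1:5

verbose    <- matrix(rep(v, 5), nrow = 5, byrow = TRUE) + matrix(rep(v, 5), nrow = 5)
via_outer  <- outer(v, v, `+`)
via_sapply <- sapply(v, `+`, v)

all(verbose == via_outer)     # same values
all(verbose == via_sapply)    # same values

# With two different vectors the orientation differs:
# outer(v, w, `+`)[i, j] is v[i] + w[j], while column j of sapply(v, `+`, w) is v[j] + w
w <- c(10, 20, 30)            # illustrative second vector
dim(outer(v, w, `+`))         # 5 rows, 3 columns
dim(sapply(v, `+`, w))        # 3 rows, 5 columns
transposed_match <- all(sapply(v, `+`, w) == t(outer(v, w, `+`)))
```

For v + v the result is symmetric, so the transposition does not matter; it only shows up once the two inputs differ.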
In R, let M be the matrix
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 1 3 3
[3,] 2 4 5
[4,] 6 7 8
I would like to select the submatrix m
[,1] [,2] [,3]
[1,] 1 3 3
[2,] 2 4 5
[3,] 6 7 8
using unique on M[,1], specifying to keep the row with the maximal value in the second column of M.
At the end, the algorithm should keep row [2,] from the set {[1,], [2,]}. Unfortunately, unique() returns a vector of the actual values, not the row numbers, after eliminating duplicates.
Is there a way to get the answer without the package plyr?
Thanks a lot,
Avitus
Here's how:
is.first.max <- function(x) seq_along(x) == which.max(x)
M[as.logical(ave(M[, 2], M[, 1], FUN = is.first.max)), ]
# [,1] [,2] [,3]
# [1,] 1 3 3
# [2,] 2 4 5
# [3,] 6 7 8
You're looking for duplicated.
m <- as.matrix(read.table(text="1 2 3
1 3 3
2 4 5
6 7 8"))
m <- m[order(m[,2], decreasing=TRUE), ]
m[!duplicated(m[,1]),]
# V1 V2 V3
# [1,] 6 7 8
# [2,] 2 4 5
# [3,] 1 3 3
Not the most efficient:
M <- matrix(c(1,1,2,6,2,3,4,7,3,3,5,8),4)
t(sapply(unique(M[,1]),function(i) {temp <- M[M[,1]==i,,drop=FALSE]
temp[which.max(temp[,2]),]
}))
# [,1] [,2] [,3]
#[1,] 1 3 3
#[2,] 2 4 5
#[3,] 6 7 8
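A variant of the duplicated idea that keeps the groups in ascending order of the first column: sort by column 1, then by column 2 descending, and keep the first row of each group. This is a sketch; the names o, sorted and m are illustrative.

```r
M <- matrix(c(1, 1, 2, 6,
              2, 3, 4, 7,
              3, 3, 5, 8), 4)

# Order by column 1 ascending, breaking ties by column 2 descending
o      <- order(M[, 1], -M[, 2])
sorted <- M[o, , drop = FALSE]

# The first row of each group now holds that group's maximum of column 2
m <- sorted[!duplicated(sorted[, 1]), , drop = FALSE]
m
#      [,1] [,2] [,3]
# [1,]    1    3    3
# [2,]    2    4    5
# [3,]    6    7    8
```

The drop = FALSE arguments keep the result a matrix even if only one row survives.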
I have the following matrix
2 4 1
6 32 1
4 2 1
5 3 2
4 2 2
I want to make the following two matrices based on 3rd column
first
2 4
6 32
4 2
second
5 3
4 2
Best I can come up with, but I get an error
x <- cbind(mat[,1], mat[,2]) if mat[,3]=1
y <- cbind(mat[,1], mat[,2]) if mat[,3]=2
If mat is your matrix:
mat <- matrix(1:15,ncol=3)
mat[,3] <- c(1,1,1,2,2)
> mat
[,1] [,2] [,3]
[1,] 1 6 1
[2,] 2 7 1
[3,] 3 8 1
[4,] 4 9 2
[5,] 5 10 2
Then you can use split:
> lapply( split( mat[,1:2], mat[,3] ), matrix, ncol=2)
$`1`
[,1] [,2]
[1,] 1 6
[2,] 2 7
[3,] 3 8
$`2`
[,1] [,2]
[1,] 4 9
[2,] 5 10
The lapply of matrix is necessary because split drops the attributes that make a vector a matrix, so you need to add them back in.
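To see why the extra matrix call is needed, this small sketch inspects what split returns on its own (the name parts is illustrative):

```r
mat <- matrix(1:15, ncol = 3)
mat[, 3] <- c(1, 1, 1, 2, 2)

parts <- split(mat[, 1:2], mat[, 3])

parts[["1"]]                # a plain vector: 1 2 3 6 7 8
is.null(dim(parts[["1"]]))  # TRUE, the dim attribute is gone

# Rebuilding with ncol = 2 restores the original rows of group 1
rebuilt <- matrix(parts[["1"]], ncol = 2)
identical(rebuilt, mat[1:3, 1:2])
```

This works because the grouping vector is recycled down each column, so the values of a group come out column by column, which is exactly the order matrix() fills in.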
Yet another example:
#test data
mat <- matrix(1:15,ncol=3)
mat[,3] <- c(1,1,1,2,2)
#make a list storing a matrix for each id as components
result <- lapply(by(mat,mat[,3],identity),as.matrix)
Final product:
> result
$`1`
V1 V2 V3
1 1 6 1
2 2 7 1
3 3 8 1
$`2`
V1 V2 V3
4 4 9 2
5 5 10 2
If you have a matrix A, this will get the first two columns when the third column is 1:
A[A[,3] == 1,c(1,2)]
You can use this to obtain matrices for any value in the third column.
Explanation: A[,3] == 1 returns a vector of booleans, where the i-th position is TRUE if A[i,3] is 1. This vector of booleans can be used to index into a matrix to extract the rows we want.
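Applied to the matrix from the question, the logical indexing looks like this. The drop = FALSE is my addition: without it, a group with a single row would come back as a plain vector rather than a one-row matrix.

```r
A <- matrix(c(2, 6, 4, 5, 4,     # column 1
              4, 32, 2, 3, 2,    # column 2
              1, 1, 1, 2, 2),    # column 3 (group id)
            ncol = 3)

keep <- A[, 3] == 1              # TRUE TRUE TRUE FALSE FALSE

first  <- A[keep, 1:2, drop = FALSE]          # rows where column 3 is 1
second <- A[A[, 3] == 2, 1:2, drop = FALSE]   # rows where column 3 is 2

first
#      [,1] [,2]
# [1,]    2    4
# [2,]    6   32
# [3,]    4    2
second
#      [,1] [,2]
# [1,]    5    3
# [2,]    4    2
```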
Disclaimer: I have very little experience with R; this is the MATLAB-ish way to do it.
split.data.frame can also be used to split a matrix.
mat <- matrix(1:15,ncol=3)
mat[,3] <- c(1,1,1,2,2)
x <- split.data.frame(mat[,-3], mat[,3])
x
#$`1`
# [,1] [,2]
#[1,] 1 6
#[2,] 2 7
#[3,] 3 8
#
#$`2`
# [,1] [,2]
#[1,] 4 9
#[2,] 5 10
str(x)
#List of 2
# $ 1: num [1:3, 1:2] 1 2 3 6 7 8
# $ 2: num [1:2, 1:2] 4 5 9 10
Or split the row indices and use them in lapply to subset.
lapply(split(seq_along(mat[,3]), mat[,3]), \(i) mat[i, -3, drop=FALSE])
#$`1`
# [,1] [,2]
#[1,] 1 6
#[2,] 2 7
#[3,] 3 8
#
#$`2`
# [,1] [,2]
#[1,] 4 9
#[2,] 5 10
This is a functional version of pedrosorio's idea:
getthird <- function(mat, idx) mat[mat[,3]==idx, 1:2]
sapply(unique(mat[,3]), getthird, mat=mat) #idx gets sent the unique values
#-----------
[[1]]
[,1] [,2]
[1,] 1 6
[2,] 2 7
[3,] 3 8
[[2]]
[,1] [,2]
[1,] 4 9
[2,] 5 10
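One caveat with the sapply version: it returns a list here only because the two groups have different sizes. When all groups have the same size, sapply simplifies the result into a single matrix. A sketch of the difference, using a made-up balanced matrix and lapply as the safer alternative (drop = FALSE is also my addition):

```r
getthird <- function(mat, idx) mat[mat[, 3] == idx, 1:2, drop = FALSE]

# Illustrative matrix with two groups of equal size
balanced <- cbind(1:4, 5:8, c(1, 1, 2, 2))

# sapply collapses the two 2x2 pieces into one 4x2 matrix
simplified <- sapply(unique(balanced[, 3]), getthird, mat = balanced)
is.matrix(simplified)   # TRUE

# lapply always returns a list; setNames labels each piece with its group id
ids    <- unique(balanced[, 3])
pieces <- setNames(lapply(ids, getthird, mat = balanced), ids)
pieces[["2"]]
#      [,1] [,2]
# [1,]    3    7
# [2,]    4    8
```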
We can use by or tapply
> by(seq_along(mat[, 3]), mat[, 3], function(k) mat[k, -3])
mat[, 3]: 1
[,1] [,2]
[1,] 1 6
[2,] 2 7
[3,] 3 8
------------------------------------------------------------
mat[, 3]: 2
[,1] [,2]
[1,] 4 9
[2,] 5 10
> tapply(seq_along(mat[, 3]), mat[, 3], function(k) mat[k, -3])
$`1`
[,1] [,2]
[1,] 1 6
[2,] 2 7
[3,] 3 8
$`2`
[,1] [,2]
[1,] 4 9
[2,] 5 10