I have a small matrix, say
x <- matrix(1:10, nrow = 5) # values 1:10 across 5 rows and 2 columns
The result is
[,1] [,2]
[1,] 1 6
[2,] 2 7
[3,] 3 8
[4,] 4 9
[5,] 5 10
What I want to be able to do now is duplicate random rows in x; for example, producing
[,1] [,2]
[1,] 1 6
[2,] 2 7
[3,] 5 10
[4,] 4 9
[5,] 5 10
I believe the R function 'rep()' is the solution and also 'sample()', but I don't want to have to specify the size argument in sample(); i.e., I want an arbitrary number of rows to be duplicated each time.
Is there a simple way of accomplishing this using rep() and sample()?
We can use the sample function to draw row indices with replacement. I've used set.seed for reproducibility; if you remove that line, the results will differ between runs.
set.seed(1848) # reproducibility
x[sample(x = nrow(x), size = nrow(x), replace = TRUE), ]
[,1] [,2]
[1,] 1 6
[2,] 2 7
[3,] 5 10
[4,] 1 6
[5,] 5 10
Another option is to sample one row number and overwrite that row with another sampled row:
x[sample(1:nrow(x),1),] <- x[sample(1:nrow(x),1),]
x
# [,1] [,2]
#[1,] 5 10
#[2,] 2 7
#[3,] 3 8
#[4,] 4 9
#[5,] 5 10
Or, to duplicate up to 3 random rows:
x[sample(1:nrow(x),3),] <- x[sample(1:nrow(x),3),]
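The question asks for an arbitrary number of duplicated rows on each call. One way this might be sketched is to sample the count itself before sampling the rows. The helper name duplicate_random_rows and the choice of 1 to nrow - 1 overwritten rows are assumptions made for illustration, not part of the answers above.

```r
# Hypothetical helper: overwrite a random number of rows with copies of
# randomly chosen rows. Assumes the matrix has at least 2 rows.
duplicate_random_rows <- function(m) {
  n <- nrow(m)
  k <- sample(seq_len(n - 1), 1)              # how many rows to overwrite (1 .. n - 1)
  to <- sample(seq_len(n), k)                 # rows that get overwritten
  from <- sample(seq_len(n), k, replace = TRUE)  # rows copied into them
  m[to, ] <- m[from, ]                        # right-hand side is read before assignment
  m
}

x <- matrix(1:10, nrow = 5)
duplicate_random_rows(x)   # a different set of rows may be overwritten on each call
```

Because a sampled source row can coincide with its target, some calls may leave a row unchanged; every row of the result is still one of the original rows.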
M = matrix(1:9,3,3)
colnames(M)=c('a','b','c')
Suppose I have a matrix M with column names 'a', 'b', 'c', and I want to remove the names, so that M prints as
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 9
Rather than
a b c
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 9
How do I do this?
I know it's been a while since this was asked, but seeing as it is a highly trafficked question, I thought this might be useful.
If you want to perform this action on M instead of its column names, you could try
M <- unname(M)
>M
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 9
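This is more convenient if you want to pipe or nest the output into subsequent functions, because unname() returns the modified matrix, whereas the assignment colnames(M) <- NULL changes M in a separate step and does not return it. Note that unname() drops row names as well as column names.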
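To illustrate the nesting point, here is a minimal sketch using the M from the question. The names tM and N are only illustrative.

```r
M <- matrix(1:9, 3, 3)
colnames(M) <- c("a", "b", "c")

# unname() returns a copy without dimnames, so it can be nested in another call
tM <- t(unname(M))
dimnames(tM)        # NULL

# The assignment form needs its own statement before the result can be used
N <- M
colnames(N) <- NULL
colnames(N)         # NULL
```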
You can try
colnames(M) <- NULL
Using your example:
> M
# a b c
#[1,] 1 4 7
#[2,] 2 5 8
#[3,] 3 6 9
> colnames(M) <- NULL
> M
# [,1] [,2] [,3]
#[1,] 1 4 7
#[2,] 2 5 8
#[3,] 3 6 9
However, if your data is stored in a data.frame instead of a matrix, this won't work. As explained in ?data.frame:
The column names should be non-empty, and attempts to use empty names will have unsupported results
If your data is stored as a data.frame (this can be checked with class(my_data)), you could try to convert it into a matrix with M <- as.matrix(my_data). Hope this helps.
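As a rough sketch of that conversion, assuming a small data frame named my_data (made up here for illustration):

```r
# Hypothetical data frame standing in for my_data
my_data <- data.frame(a = 1:3, b = 4:6, c = 7:9)
class(my_data)                     # "data.frame"

# Convert to a matrix first, then drop the names
M <- as.matrix(my_data)
colnames(M) <- NULL
M
```

This only makes sense when all columns share one type; as.matrix() coerces mixed columns to a common type (often character).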
If you also want to delete the row names, use the row.names() replacement function:
> M
a b c
1 1 4 7
2 2 5 8
3 3 6 9
> row.names(M) <- NULL; colnames(M) <- NULL
> M
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 9
I want to repeatedly divide a set into two complementary subsets of known size and keep them as the columns of two matrices. For example, assume the main set is {1, 2, ..., 10}, the size of the first sample is 8, and I want to repeat the sampling 3 times. I want to have:
[,1] [,2] [,3]
[1,] 10 9 1
[2,] 8 1 10
[3,] 3 7 5
[4,] 4 2 3
[5,] 1 8 8
[6,] 6 4 2
[7,] 9 5 7
[8,] 5 10 6
and
[,1] [,2] [,3]
[1,] 2 3 4
[2,] 7 6 9
Any idea how to implement it in R avoiding for loops?
I would use replicate + sample, like this:
set.seed(1) # Just so you can replicate my results
A <- replicate(3, sample(10, 8, FALSE)) # Change 3 to the number of replications
A
# [,1] [,2] [,3]
# [1,] 3 7 8
# [2,] 4 1 9
# [3,] 5 2 4
# [4,] 7 8 6
# [5,] 2 5 7
# [6,] 8 10 2
# [7,] 9 4 3
# [8,] 6 6 1
For the other set, I would use apply + setdiff, like this:
B <- apply(A, 2, function(x) setdiff(1:10, x))
B
# [,1] [,2] [,3]
# [1,] 1 3 5
# [2,] 10 9 10
Another option, as suggested by @thelatemail (which would be more efficient), is to use replicate to create a matrix of full permutations and then use basic subsetting to split it into your two matrices.
A <- replicate(3, sample(10))
B <- A[-(seq_len(8)), ]
A <- A[seq_len(8), ]
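The permutation-and-subset idea could be wrapped up roughly as follows. The function name split_repeatedly and its arguments are hypothetical, and the sketch assumes n >= 2 and 1 <= k < n. drop = FALSE keeps both parts as matrices even when one of them has a single row.

```r
# Hypothetical wrapper: split 1:n into a part of size k and its complement,
# repeated reps times; each column is one split.
split_repeatedly <- function(n, k, reps) {
  perms <- replicate(reps, sample(n))   # n x reps, each column a permutation of 1:n
  list(first  = perms[seq_len(k), , drop = FALSE],
       second = perms[-seq_len(k), , drop = FALSE])
}

set.seed(1)
parts <- split_repeatedly(10, 8, 3)
dim(parts$first)    # 8 3
dim(parts$second)   # 2 3
```

Since each column starts as a full permutation, the two parts of a column are complementary by construction, with no need for setdiff.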
First create a "row" vector and a "column" vector in R:
> row.vector <- seq(from = 1, length.out = 4, by = 1)
> col.vector <- t(seq(from = 1, length.out = 3, by = 2))
From that I'd like to create a matrix by, e.g., multiplying each value in the row vector with each value in the column vector, thus creating from just those two vectors:
[,1] [,2] [,3]
[1,] 1 3 5
[2,] 2 6 10
[3,] 3 9 15
[4,] 4 12 20
Can this be done somehow using apply()? sweep()? ...a for loop?
Thank you for any help!
Simple matrix multiplication will work just fine, because %*% treats row.vector as a 4 x 1 column and col.vector is already a 1 x 3 matrix:
row.vector %*% col.vector
# [,1] [,2] [,3]
# [1,] 1 3 5
# [2,] 2 6 10
# [3,] 3 9 15
# [4,] 4 12 20
You'd be better off working with two actual vectors, instead of a vector and a matrix:
outer(row.vector,as.vector(col.vector))
# [,1] [,2] [,3]
#[1,] 1 3 5
#[2,] 2 6 10
#[3,] 3 9 15
#[4,] 4 12 20
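For comparison, a minimal sketch that starts from two plain vectors, so no t() or as.vector() is needed. The vectors are rebuilt here with the same values as in the question.

```r
row.vector <- 1:4            # 1 2 3 4
col.vector <- c(1, 3, 5)     # plain vector, not a 1 x 3 matrix

prod1 <- outer(row.vector, col.vector)   # 4 x 3 matrix of all pairwise products
prod2 <- row.vector %o% col.vector       # %o% is shorthand for outer()
```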
Here's a way to get there with apply. Is there a reason why you're not using matrix?
> apply(col.vector, 2, function(x) row.vector * x)
## [,1] [,2] [,3]
## [1,] 1 3 5
## [2,] 2 6 10
## [3,] 3 9 15
## [4,] 4 12 20
I am trying to determine which columns were sampled when I randomly sample values within each row of a matrix. The function sample does not appear to be able to tell you which positions were actually sampled. A simple matching routine would solve the problem if all values were unique. However, they are not in my case, so this will not work.
x <- c(2,3,5,1,6,7,2,3,5,6,3,5)
y <- matrix(x,ncol=4,nrow=3)
random <- t(apply(y,1,sample,2,replace=FALSE))
y
[,1] [,2] [,3] [,4]
[1,] 2 1 2 6
[2,] 3 6 3 3
[3,] 5 7 5 5
random
[,1] [,2]
[1,] 2 6
[2,] 3 3
[3,] 5 5
With repeated values in the original matrix, I cannot tell if random[1,1] was sampled from column 1 or column 3, since they both have a value of 2. Hence, matching won't work here.
Alongside the matrix "random", I would also like an identically sized matrix that gives the column from which each value was sampled. For example:
[,1] [,2]
[1,] 1 4
[2,] 1 3
[3,] 3 4
Thanks!
You need to save your random selections from sample separately so you don't have to worry about matching later. E.g., using y again:
y
# [,1] [,2] [,3] [,4]
#[1,] 2 1 2 6
#[2,] 3 6 3 3
#[3,] 5 7 5 5
set.seed(42)
randkey <- t(replicate(nrow(y),sample(1:ncol(y),2)))
# [,1] [,2]
#[1,] 4 3
#[2,] 2 3
#[3,] 3 2
random <- matrix(y[cbind(c(row(randkey)), c(randkey))], nrow(y))
# [,1] [,2]
#[1,] 6 2
#[2,] 6 3
#[3,] 5 7
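The same idea (sample column indices first, then look up the values) might be packaged as below. The function name sample_with_positions and the list element names are assumptions for illustration. Building the index matrix with matrix(..., byrow = TRUE) rather than t() keeps the shape right even when size is 1.

```r
# Hypothetical helper: for each row of m, sample `size` columns without
# replacement and return both the column indices and the sampled values.
sample_with_positions <- function(m, size) {
  idx <- replicate(nrow(m), sample.int(ncol(m), size))   # one sample per row of m
  cols <- matrix(idx, nrow = nrow(m), byrow = TRUE)      # row i holds the columns for row i
  vals <- matrix(m[cbind(c(row(cols)), c(cols))], nrow = nrow(m))
  list(columns = cols, values = vals)
}

y <- matrix(c(2, 3, 5, 1, 6, 7, 2, 3, 5, 6, 3, 5), ncol = 4, nrow = 3)
set.seed(42)
res <- sample_with_positions(y, 2)
res$columns   # which column each value came from
res$values    # the sampled values, same shape as res$columns
```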
I have the matrix y with variable x:
x
[1,] 0
[2,] 1
[3,] 0
[4,] 0
[5,] 1
[6,] 1
I selected just the values equal to 1. Now I have a vector z:
2 5 6
I need to use this vector to select the corresponding rows of my matrix y. This is just an example; my real data is much bigger. I tried y[z], but this doesn't show the rows. Thanks
y[z, ] returns the rows z of matrix y.
y[z] returns the elements z of matrix y, indexing it as a vector in column-major order.
> y <- matrix(1:12, ncol=3)
> y
[,1] [,2] [,3]
[1,] 1 5 9
[2,] 2 6 10
[3,] 3 7 11
[4,] 4 8 12
> y[c(2,3),]
[,1] [,2] [,3]
[1,] 2 6 10
[2,] 3 7 11
> y[c(2,3)]
[1] 2 3
As Joran points out, if you are working with a single-column matrix, include drop = FALSE (that is, y[z, , drop = FALSE]) to make sure your output is still a matrix.
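A small sketch of the single-column case from the question, with the matrix rebuilt by hand; the names as_vector and as_matrix are only illustrative:

```r
# Single-column matrix with a column named x, as in the question
y <- matrix(c(0, 1, 0, 0, 1, 1), ncol = 1, dimnames = list(NULL, "x"))
z <- which(y[, "x"] == 1)          # positions of the 1s: 2 5 6

as_vector <- y[z, ]                # dimensions are dropped: a plain vector
as_matrix <- y[z, , drop = FALSE]  # stays a 3 x 1 matrix

is.matrix(as_vector)   # FALSE
dim(as_matrix)         # 3 1
```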