I am looking for R code that transforms a matrix as follows (a: the original matrix, b: the desired output). Example:
a <- matrix(c(1,2,3,4,5,6,1,2,3,4,5,6,1,2,3,4,5,6,1,2,3,4,5,6), nrow = 6, ncol = 4)
b <- matrix(c(1,2,3,4,5,6,2,3,4,5,6,0,3,4,5,6,0,0,4,5,6,0,0,0), nrow = 6, ncol = 4)
a
[,1] [,2] [,3] [,4]
[1,] 1 1 1 1
[2,] 2 2 2 2
[3,] 3 3 3 3
[4,] 4 4 4 4
[5,] 5 5 5 5
[6,] 6 6 6 6
b
[,1] [,2] [,3] [,4]
[1,] 1 2 3 4
[2,] 2 3 4 5
[3,] 3 4 5 6
[4,] 4 5 6 0
[5,] 5 6 0 0
[6,] 6 0 0 0
Thus, the first column is not shifted, the second column is shifted up one step, the third column shifted up two steps, and so on. The shifted columns are padded with zeros.
The following links didn't help me (nor did a double for-loop, a function with different variables, or the functions diag and kronecker):
R: Shift values in single column of dataframe UP
r matrix individual shift operations of elements
Rotate a Matrix in R
Do you have any ideas? Thanks.
This seems to work with data.table. Should perform well with a large matrix:
library(data.table)
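# One way: shift the first column by 0, 1, 2 and 3 rows
# (sufficient here because all columns of a are identical)
dt[, shift(.SD, n = 0:3, fill = 0, type = "lead", give.names = FALSE), .SDcols = 1]
# Alternatively: shift every column by 0:3, then keep the first four results
dt[, shift(dt, n = 0:3, fill = 0, type = "lead", give.names = FALSE)][, 1:4]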
Both return:
V1 V2 V3 V4
1: 1 2 3 4
2: 2 3 4 5
3: 3 4 5 6
4: 4 5 6 0
5: 5 6 0 0
6: 6 0 0 0
Using the following data:
a <- matrix(c(1,2,3,4,5,6,1,2,3,4,5,6,1,2,3,4,5,6,1,2,3,4,5,6), nrow = 6, ncol = 4)
dt <- setDT(as.data.frame(a))
I have a rough solution using sapply. Each iteration shifts one column: it takes column x from row x downwards and pads it with x - 1 zeros. sapply then combines the results, which you can pass to matrix with the dimensions of your initial matrix:
matrix(sapply(1:dim(a)[2], function(x){c(a[x:dim(a)[1], x], rep(0, (x - 1) ))}), ncol = dim(a)[2], nrow = dim(a)[1])
[,1] [,2] [,3] [,4]
[1,] 1 2 3 4
[2,] 2 3 4 5
[3,] 3 4 5 6
[4,] 4 5 6 0
[5,] 5 6 0 0
[6,] 6 0 0 0
You can shift the columns by filling a matrix that has one more row than a with the values from a (a warning is generated during the recycling). Then select the original number of rows and replace the lower-right triangle with zeros.
nr <- nrow(a)
a2 <- matrix(a, ncol = ncol(a), nrow = nr + 1)[1:nr, ]
a2[col(a2) + row(a2) > nr + 1] <- 0
a2
# [,1] [,2] [,3] [,4]
# [1,] 1 2 3 4
# [2,] 2 3 4 5
# [3,] 3 4 5 6
# [4,] 4 5 6 0
# [5,] 5 6 0 0
# [6,] 6 0 0 0
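The same shift can also be written with explicit index arithmetic, which avoids the recycling warning. This is only a sketch: shift_columns_up and its fill argument are hypothetical names, not part of the answers here, and it assumes column j should move up by j - 1 rows.

```r
# Hypothetical helper: move column j up by (j - 1) rows, padding with `fill`.
shift_columns_up <- function(a, fill = 0) {
  nr <- nrow(a)
  src <- row(a) + col(a) - 1L      # source row for every target cell
  ok <- src <= nr                  # cells whose source row exists in `a`
  out <- matrix(fill, nr, ncol(a))
  out[ok] <- a[cbind(src[ok], col(a)[ok])]
  out
}

a <- matrix(rep(1:6, 4), nrow = 6, ncol = 4)
b <- shift_columns_up(a)
b[, 2]   # second column shifted up by one: 2 3 4 5 6 0
```

Because each target cell looks up its own source row, this also works when the columns of a differ from each other.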
Building on tyluRp's answer, which almost worked for me, I suggest looping through all columns and calling shift on each one individually. Let's start with a matrix of random numbers:
a <- matrix(floor(10*runif(24)), ncol=4)
a
[,1] [,2] [,3] [,4]
[1,] 8 4 8 3
[2,] 0 6 9 0
[3,] 1 6 0 7
[4,] 0 3 9 7
[5,] 2 4 2 9
[6,] 4 8 5 6
library(data.table)
dt <- setDT(as.data.frame(a))
Now the loop that does the job...
for (i in 2:length(dt)) dt[, i] <- shift(dt[, i, with = FALSE], i - 1, 0, "lead")
...by replacing columns with their shifted version.
The original answers replaced all columns by shifted copies of the first column, thus losing data. This is probably due to the group behaviour of data.table.
I want to generate a symmetric matrix around a diagonal of zeroes and a predetermined sequence around them. In theory the lines should show as
0 1 3 5 7 9
1 0 3 5 7 9
I've tried tweaking with the conditionals, but I suspect that it's wonky because of indexing, which I am nowhere near skilled enough to fix.
bend <- function(n){
  m <- seq(1, n, by=2)
  a <- length(m)
  y <- matrix(nrow= a, ncol = a, byrow= TRUE)
  y <- ifelse(row(y) == col(y), 0, m)
  y
}
Assuming that the input is a 9, expected output is
0 1 3 5 7 9
1 0 3 5 7 9
1 3 0 5 7 9
1 3 5 0 7 9
1 3 5 7 0 9
1 3 5 7 9 0
Actual output is
0 3 5 7 9 1
3 0 7 9 1 3
5 7 0 1 3 5
7 9 1 0 5 7
9 1 3 5 0 9
1 3 5 7 9 0
There's a simpler way to do what you need. Start by creating a square matrix with length(x) + 1 rows and columns in which every element is a logical TRUE. Then set the diagonal to FALSE using diag(). Now you can replace the TRUEs with your desired vector; the FALSE diagonal is not affected and becomes 0 when the matrix is coerced to numeric. Since the values are filled in column-wise, you need a final transpose t() to get the correct result.
This way, you don't need to worry about tracking indices.
x <- c(1,3,5,7,9)
make_matrix <- function(x) {
  m <- matrix(TRUE, ncol = length(x) + 1, nrow = length(x) + 1)
  diag(m) <- FALSE
  m[m] <- x
  t(m)
}
make_matrix(x)
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 0 1 3 5 7 9
[2,] 1 0 3 5 7 9
[3,] 1 3 0 5 7 9
[4,] 1 3 5 0 7 9
[5,] 1 3 5 7 0 9
[6,] 1 3 5 7 9 0
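To see why the final transpose is needed, it may help to look at the intermediate matrix on a smaller input. This is a sketch of the same steps as make_matrix(), using a three-element x chosen here only for illustration.

```r
x <- c(1, 3, 5)
m <- matrix(TRUE, nrow = length(x) + 1, ncol = length(x) + 1)
diag(m) <- FALSE
m[m] <- x        # fills the TRUE cells column by column, recycling x
m                # each *column* now holds x with a 0 on the diagonal
res <- t(m)      # transposing moves that pattern into the rows
res[1, ]         # 0 1 3 5
```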
Here's another way with sapply. This creates the necessary row elements in each iteration and puts them in a matrix by column. Again, you need a t() to get the correct result.
t(sapply(0:length(x), function(a) append(x, 0, after = a)))
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 0 1 3 5 7 9
[2,] 1 0 3 5 7 9
[3,] 1 3 0 5 7 9
[4,] 1 3 5 0 7 9
[5,] 1 3 5 7 0 9
[6,] 1 3 5 7 9 0
Benchmarks:
sapply is slower, likely because it's creating the matrix elements one row at a time and calls append for every row. All this overhead is avoided in the make_matrix() approach.
library(microbenchmark)
x <- sample(100)
microbenchmark(
  make_matrix = make_matrix(x),
  sapply = t(sapply(0:length(x), function(a) append(x, 0, after = a))),
  akrun_forloop = {
    n <- length(x) + 1
    m1 <- matrix(0, n, n)
    for(i in seq_len(nrow(m1))) m1[i, -i] <- x
  },
  times = 1000
)
Unit: microseconds
expr min lq mean median uq max neval
make_matrix 111.495 117.5610 128.3135 126.890 135.7540 225.323 1000
sapply 520.620 551.1765 592.2642 573.335 602.2585 10477.221 1000
akrun_forloop 3380.292 3526.3080 3837.1570 3648.765 3812.5075 20943.245 1000
Using a simple for loop:
n <- length(x) + 1
m1 <- matrix(0, n, n)
for(i in seq_len(nrow(m1))) m1[i, -i] <- x
m1
# [,1] [,2] [,3] [,4] [,5] [,6]
#[1,] 0 1 3 5 7 9
#[2,] 1 0 3 5 7 9
#[3,] 1 3 0 5 7 9
#[4,] 1 3 5 0 7 9
#[5,] 1 3 5 7 0 9
#[6,] 1 3 5 7 9 0
data
x <- c(1,3,5,7,9)
Is it possible to extend the sample function in R to not return more than say 2 of the same element when replace = TRUE?
Suppose I have a list:
l = c(1,1,2,3,4,5)
To sample 3 elements with replacement, I would do:
sample(l, 3, replace = TRUE)
Is there a way to constrain its output so that only a maximum of 2 of the same elements are returned? So (1,1,2) or (1,3,3) is allowed, but (1,1,1) or (3,3,3) is excluded?
set.seed(0)
The basic idea is to convert sampling with replacement to sampling without replacement.
ll <- unique(l) ## unique values
#[1] 1 2 3 4 5
pool <- rep.int(ll, 2) ## replicate each unique so they each appear twice
#[1] 1 2 3 4 5 1 2 3 4 5
sample(pool, 3) ## draw 3 samples without replacement
#[1] 4 3 5
## replicate it a few times
## each column is one sample (`replicate` simplifies the results to a matrix)
replicate(5, sample(pool, 3))
# [,1] [,2] [,3] [,4] [,5]
#[1,] 1 4 2 2 3
#[2,] 4 5 1 2 5
#[3,] 2 1 2 4 1
If you want different values to appear up to different numbers of times, you can do, for example:
pool <- rep.int(ll, c(2, 3, 3, 4, 1))
#[1] 1 1 2 2 2 3 3 3 4 4 4 4 5
## draw 9 samples; replicate 5 times
oo <- replicate(5, sample(pool, 9))
# [,1] [,2] [,3] [,4] [,5]
# [1,] 5 1 4 3 2
# [2,] 2 2 4 4 1
# [3,] 4 4 1 1 1
# [4,] 4 2 3 2 5
# [5,] 1 4 2 5 2
# [6,] 3 4 3 3 3
# [7,] 1 4 2 2 2
# [8,] 4 1 4 3 3
# [9,] 3 3 2 2 4
We can call tabulate on each column to count the frequency of 1, 2, 3, 4, 5:
## set `nbins` in `tabulate` so frequency table of each column has the same length
apply(oo, 2L, tabulate, nbins = 5)
# [,1] [,2] [,3] [,4] [,5]
#[1,] 2 2 1 1 2
#[2,] 1 2 3 3 3
#[3,] 2 1 2 3 2
#[4,] 3 4 3 1 1
#[5,] 1 0 0 1 1
The counts in every column respect the upper bounds c(2, 3, 3, 4, 1) that we set.
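The pool idea above can be wrapped in a small function. This is a sketch only: sample_capped and max_rep are hypothetical names, and, like the code above, it starts from the unique values, so it ignores how often a value occurs in the original vector.

```r
# Hypothetical helper: draw `size` values so that the k-th unique value
# appears at most max_rep[k] times (max_rep may also be a single number).
sample_capped <- function(values, size, max_rep = 2L) {
  pool <- rep.int(unique(values), max_rep)
  stopifnot(size <= length(pool))
  # index with sample.int so a length-one pool is not treated as 1:pool
  pool[sample.int(length(pool), size)]
}

l <- c(1, 1, 2, 3, 4, 5)
draws <- replicate(1000, sample_capped(l, 3))
max(apply(draws, 2, function(s) max(table(s))))   # never exceeds 2
```

Passing a vector such as max_rep = c(2, 3, 3, 4, 1) gives per-value caps, in the order of unique(values).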
Would you explain the difference between rep and rep.int?
rep.int is not the "integer" method for rep. It is a faster, simplified version of rep with less functionality. You can get more details on rep, rep.int and rep_len from the documentation page ?rep.
Using R, I am trying to extract unique rows from a matrix, where two rows count as duplicates if they contain the same values, regardless of order.
For example if I had this data set:
x = matrix(c(1,1,1,2,2,5,1,2,2,1,2,1,5,3,5,2,1,1),6,3)
Rows 1 & 6, and rows 4 & 5 are duplicated since (1,1,5) = (5,1,1) and (2,1,2) = (2,2,1).
Ultimately, I'm trying to end up with something in the form of:
y = matrix(c(1,1,1,2,1,2,2,1,5,3,5,2),4,3)
or
z = matrix(c(1,1,2,5,2,2,2,1,3,5,1,1),4,3)
The order doesn't matter as long as only one of the unique rows remains. I've searched online, but functions such as unique() and duplicated() have only worked for exact matching rows.
Thanks in advance for any help you provide.
Another answer: use sets. Slightly modified matrix:
library(sets)
x <- matrix(c(1,1,1,2,2,5,5, 1,2,2,1,2,1,5, 5,3,5,2,1,1,1),7,3)
x
[,1] [,2] [,3]
[1,] 1 1 5
[2,] 1 2 3
[3,] 1 2 5
[4,] 2 1 2
[5,] 2 2 1
[6,] 5 1 1
[7,] 5 5 1
If (5,1,1) = (5,5,1) you can use just ordinary sets:
a <- sapply(1:nrow(x), function(i) as.set(x[i,]))
x[!duplicated(a),]
[,1] [,2] [,3]
[1,] 1 1 5
[2,] 1 2 3
[3,] 1 2 5
[4,] 2 1 2
Note: rows 6 and 7 are both gone.
If (5,1,1) != (5,5,1), use generalized sets:
b <- sapply(1:nrow(x), function(i) as.gset(x[i,]))
x[!duplicated(b),]
[,1] [,2] [,3]
[1,] 1 1 5
[2,] 1 2 3
[3,] 1 2 5
[4,] 2 1 2
[5,] 5 5 1
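If you would rather not depend on the sets package, both comparisons can be approximated in base R. This is a sketch under the same assumptions as above; sorted_rows, row_sets, keep_multiset and keep_set are illustrative names.

```r
x <- matrix(c(1,1,1,2,2,5,5, 1,2,2,1,2,1,5, 5,3,5,2,1,1,1), 7, 3)

# Multiset comparison, (5,1,1) != (5,5,1): sort each row, then deduplicate.
sorted_rows <- t(apply(x, 1, sort))
keep_multiset <- !duplicated(sorted_rows)
x[keep_multiset, , drop = FALSE]    # keeps rows 1, 2, 3, 4 and 7

# Set comparison, (5,1,1) == (5,5,1): reduce each row to its sorted unique values.
row_sets <- lapply(seq_len(nrow(x)), function(i) sort(unique(x[i, ])))
keep_set <- !duplicated(row_sets)
x[keep_set, , drop = FALSE]         # keeps rows 1, 2, 3 and 4
```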
I have a sparse matrix represented as
> (f <- data.frame(row=c(1,2,3,1,2,1,2,3,4,1,1,2),value=1:12))
row value
1 1 1
2 2 2
3 3 3
4 1 4
5 2 5
6 1 6
7 2 7
8 3 8
9 4 9
10 1 10
11 1 11
12 2 12
Here the first column is always present (in fact, the first few are present, the rest are not).
I want to get the data into the matrix format:
> t(matrix(c(1,2,3,NA,4,5,NA,NA,6,7,8,9,10,NA,NA,NA,11,12,NA,NA),nrow=4,ncol=5))
[,1] [,2] [,3] [,4]
[1,] 1 2 3 NA
[2,] 4 5 NA NA
[3,] 6 7 8 9
[4,] 10 NA NA NA
[5,] 11 12 NA NA
Here is what seems to be working:
> library(Matrix)
> as.matrix(sparseMatrix(i = cumsum(f[[1]] == 1), j=f[[1]], x=f[[2]]))
[,1] [,2] [,3] [,4]
[1,] 1 2 3 0
[2,] 4 5 0 0
[3,] 6 7 8 9
[4,] 10 0 0 0
[5,] 11 12 0 0
Except that I have to replace 0 with NA myself.
Is there a better solution?
You can do everything with base functions. The trick is to index with a two-column matrix of (row, column) indices:
j <- f$row
i <- cumsum(j == 1)
x <- f$value
m <- matrix(NA, max(i), max(j))
m[cbind(i, j)] <- x
m
Whether this is better than using the Matrix package is subjective. In my opinion the package is overkill if you are not doing anything else with it. Also, with the Matrix approach, any genuine 0 in the f$value column would end up converted to NA when you replace 0 with NA, unless you are careful.
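A small check of the point about zeros, as a sketch: f0 is made-up data (not from the question) that contains genuine zeros, and the base indexing approach keeps them distinct from the missing cells.

```r
# Made-up data for illustration: two rows, with real zeros in `value`.
f0 <- data.frame(row = c(1, 2, 1, 2, 3), value = c(0, 5, 7, 0, 9))
j <- f0$row
i <- cumsum(j == 1)                   # a new matrix row starts whenever row == 1
m <- matrix(NA_real_, max(i), max(j))
m[cbind(i, j)] <- f0$value            # one (row, column) pair per value
m
#      [,1] [,2] [,3]
# [1,]    0    5   NA
# [2,]    7    0    9
```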
In R, let M be the matrix
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 1 3 3
[3,] 2 4 5
[4,] 6 7 8
I would like to select the submatrix m
[,1] [,2] [,3]
[1,] 1 3 3
[2,] 2 4 5
[3,] 6 7 8
using unique on M[,1], keeping, for each duplicated value, the row with the maximal value in the second column of M.
In the end, the algorithm should keep row [2,] from the set {[1,], [2,]}. Unfortunately, unique() returns a vector of the actual values after eliminating duplicates, not the row numbers.
Is there a way to get the answer without the plyr package?
Thanks a lot,
Avitus
Here's how:
is.first.max <- function(x) seq_along(x) == which.max(x)
M[as.logical(ave(M[, 2], M[, 1], FUN = is.first.max)), ]
# [,1] [,2] [,3]
# [1,] 1 3 3
# [2,] 2 4 5
# [3,] 6 7 8
You're looking for duplicated.
m <- as.matrix(read.table(text="1 2 3
1 3 3
2 4 5
6 7 8"))
m <- m[order(m[,2], decreasing=TRUE), ]
m[!duplicated(m[,1]),]
# V1 V2 V3
# [1,] 6 7 8
# [2,] 2 4 5
# [3,] 1 3 3
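The result above comes back ordered by the second column. If the rows should instead stay ordered by the first column, one possible variant is to sort by column 1 and then by decreasing column 2 before dropping duplicates. This is a sketch; o, M_sorted and result are illustrative names.

```r
M <- matrix(c(1,1,2,6, 2,3,4,7, 3,3,5,8), 4)
# Sort by column 1, breaking ties by the largest value in column 2 first.
o <- order(M[, 1], -M[, 2])
M_sorted <- M[o, , drop = FALSE]
# The first row of each group is now the one with the maximal column 2.
result <- M_sorted[!duplicated(M_sorted[, 1]), , drop = FALSE]
result
#      [,1] [,2] [,3]
# [1,]    1    3    3
# [2,]    2    4    5
# [3,]    6    7    8
```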
Not the most efficient:
M <- matrix(c(1,1,2,6,2,3,4,7,3,3,5,8),4)
t(sapply(unique(M[,1]), function(i) {
  temp <- M[M[,1]==i,,drop=FALSE]
  temp[which.max(temp[,2]),]
}))
# [,1] [,2] [,3]
#[1,] 1 3 3
#[2,] 2 4 5
#[3,] 6 7 8