I'm trying to recycle a vector, but not in the way R recycles by default.
Imagine I have two vectors with an unequal number of elements:
gen1 = 2:10
gen2 = 1:10
rbind(gen1,gen2)
This gives the following table:
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
gen1 2 3 4 5 6 7 8 9 10 2
gen2 1 2 3 4 5 6 7 8 9 10
As you can see in the last column, the 2 gets paired with 10. But I want this:
gen1 = c(2,2:10)
gen2 = 1:10
rbind(gen1,gen2)
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
gen1 2 2 3 4 5 6 7 8 9 10
gen2 1 2 3 4 5 6 7 8 9 10
Now the 2 is duplicated, but at the front. Obviously I do not want to do this by hand, since I have a collection of such unequal-length vector pairs to which I want to apply this trick. Is there a way to do this?
Or perhaps a way to find the 'closest' position possible in the list.
For example, if I have
[,1] [,2] [,3] [,4] [,5] [,6]
gen1 8 9 10 8 9 10
gen2 5 6 7 8 9 10
I would like this to be:
[,1] [,2] [,3] [,4] [,5] [,6]
gen1 8 8 9 9 10 10
gen2 5 6 7 8 9 10
First example in question
1) Convert each to a ts series with appropriate alignment and then use na.locf.
library(zoo)
# inputs
gen1 <- 2:10; gen2 = 1:10
t(na.locf(cbind(gen1 = ts(gen1, start = 2), gen2 = ts(gen2)), fromLast = TRUE))
giving:
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
gen1 2 2 3 4 5 6 7 8 9 10
gen2 1 2 3 4 5 6 7 8 9 10
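For readers without zoo, the same front-padding can be sketched in base R. This only illustrates the effect of the aligned na.locf(fromLast = TRUE) call and is not the answer's own code. The helper name pad_front is made up here, and it assumes the shorter vector should be right-aligned with the longer one.

```r
# Hypothetical helper: right-align x to length n by repeating its first
# element at the front (the effect of carrying the next observation backward).
pad_front <- function(x, n) {
  stopifnot(length(x) >= 1, n >= length(x))
  c(rep(x[1], n - length(x)), x)
}

gen1 <- 2:10
gen2 <- 1:10
n <- max(length(gen1), length(gen2))

res <- rbind(gen1 = pad_front(gen1, n), gen2 = pad_front(gen2, n))
res
```

Unlike the ts-based version, this does not handle gaps in the middle of a series; it only pads at the front.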
2) It can also be written with pipes like this
cbind(gen1 = ts(gen1, start = 2), gen2 = ts(gen2)) |>
na.locf(fromLast = TRUE) |>
t()
3) If you want to derive the alignment from the data itself, use this:
maxlen <- max(length(gen1), length(gen2))
cbind(gen1 = ts(gen1, end = maxlen), gen2 = ts(gen2, end = maxlen)) |>
na.locf(fromLast = TRUE) |>
t()
4) Another approach is to use dynamic time warping.
library(dtw)
with(dtw(gen1, gen2), rbind(gen1 = gen1[index1], gen2 = gen2[index2]))
giving:
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
gen1 2 2 3 4 5 6 7 8 9 10
gen2 1 2 3 4 5 6 7 8 9 10
Last example in question
The last example in the question seems entirely different and is just a matter of sorting each row.
# input in reproducible form
m <- rbind(gen1 = c(8, 9, 10, 8, 9, 10), gen2 = c(5, 6, 7, 8, 9, 10))
t(apply(m, 1, sort))
giving
[,1] [,2] [,3] [,4] [,5] [,6]
gen1 8 8 9 9 10 10
gen2 5 6 7 8 9 10
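Both examples in the question can also be reproduced by recycling the shorter vector explicitly and then sorting it. A rough sketch, assuming the shorter vector is meant to end up in non-decreasing order; recycle_sorted is a name invented for this illustration.

```r
# Hypothetical helper: recycle x to length n, then sort the result.
recycle_sorted <- function(x, n) sort(rep_len(x, n))

# Last example in the question
rbind(gen1 = recycle_sorted(8:10, 6), gen2 = 5:10)

# First example in the question
rbind(gen1 = recycle_sorted(2:10, 10), gen2 = 1:10)
```

This avoids the recycling warning from rbind because both rows already have the same length when they are bound.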
I think this is the trick:
gen1 = 2:10  # raw, shorter vector; rbind() recycles it (with a warning)
gen2 = 1:10
ddd=rbind(gen1,gen2)
ddd[1,] = sort(ddd[1,])
ddd
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
gen1 2 2 3 4 5 6 7 8 9 10
gen2 1 2 3 4 5 6 7 8 9 10
I was able to write a function in R to "shift" the columns of a matrix over to the right by one:
shift <- function(disc){
mat <- matrix(nrow = 4, ncol = 12)
mat[,1] <- disc[,12]
for(i in 1:11){
mat[,i+1] <- disc[,i]
}
return(mat)
}
So to see how that works:
> disc0
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
[1,] 2 5 10 7 16 8 7 8 8 3 4 12
[2,] 3 3 14 14 21 21 9 9 4 4 6 6
[3,] 8 9 10 11 12 13 14 15 4 5 6 7
[4,] 14 11 14 14 11 14 11 14 11 11 14 11
> shift(disc0)
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
[1,] 12 2 5 10 7 16 8 7 8 8 3 4
[2,] 6 3 3 14 14 21 21 9 9 4 4 6
[3,] 7 8 9 10 11 12 13 14 15 4 5 6
[4,] 11 14 11 14 14 11 14 11 14 11 11 14
What if I wanted to shift over 3 times, for example? I could do this manually:
> x <- disc0
> x <- shift(x)
> x <- shift(x)
> x <- shift(x)
> x
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
[1,] 3 4 12 2 5 10 7 16 8 7 8 8
[2,] 4 6 6 3 3 14 14 21 21 9 9 4
[3,] 5 6 7 8 9 10 11 12 13 14 15 4
[4,] 11 14 11 14 11 14 14 11 14 11 14 11
The original first column (2, 3, 8, 14) is now in the 4th column.
But how can I automate this? I want to write a function that will repeat my shift function n times. Thanks in advance
You could write a function that takes in the shift parameter:
shift <- function(x, num = 1){
n <- ncol(x)
x[, c((n - num +1):n, 1:(n - num))]
}
mat <- matrix(rep(1:8, each = 4), nrow = 4)  # example input, reconstructed from the output shown
mat
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,] 1 2 3 4 5 6 7 8
[2,] 1 2 3 4 5 6 7 8
[3,] 1 2 3 4 5 6 7 8
[4,] 1 2 3 4 5 6 7 8
shift(mat)
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,] 8 1 2 3 4 5 6 7
[2,] 8 1 2 3 4 5 6 7
[3,] 8 1 2 3 4 5 6 7
[4,] 8 1 2 3 4 5 6 7
shift(mat,2)
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,] 7 8 1 2 3 4 5 6
[2,] 7 8 1 2 3 4 5 6
[3,] 7 8 1 2 3 4 5 6
[4,] 7 8 1 2 3 4 5 6
shift(mat,3)
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,] 6 7 8 1 2 3 4 5
[2,] 6 7 8 1 2 3 4 5
[3,] 6 7 8 1 2 3 4 5
[4,] 6 7 8 1 2 3 4 5
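One caveat with index expressions like (n - num + 1):n and 1:(n - num): they misbehave when num is 0 or at least ncol(x), because a sequence such as 1:0 counts downward in R. A possible variant, offered as a sketch rather than a replacement for the answer, computes the column order with modular arithmetic. The name shift_mod is hypothetical.

```r
shift_mod <- function(x, num = 1) {
  n <- ncol(x)
  # Column j of the result comes from column ((j - num - 1) mod n) + 1.
  idx <- (seq_len(n) - num - 1) %% n + 1
  x[, idx, drop = FALSE]
}

mat <- matrix(rep(1:8, each = 4), nrow = 4)
shift_mod(mat, 3)[1, ]   # 6 7 8 1 2 3 4 5
shift_mod(mat, 0)[1, ]   # unchanged: 1 2 3 4 5 6 7 8
shift_mod(mat, 11)[1, ]  # wraps around, same as shifting by 3
```

With this form a negative num shifts to the left.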
You may use a for loop:
n <- 3
for(i in seq_len(n)) {
disc0 <- shift(disc0)
}
disc0
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
#[1,] 3 4 12 2 5 10 7 16 8 7 8 8
#[2,] 4 6 6 3 3 14 14 21 21 9 9 4
#[3,] 5 6 7 8 9 10 11 12 13 14 15 4
#[4,] 11 14 11 14 11 14 14 11 14 11 14 11
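The loop above overwrites disc0. If keeping the original matrix matters, the repetition can be wrapped in a function instead. A minimal sketch using Reduce(); shift_once and repeat_shift are illustrative names, and shift_once stands in for the one-step shift from the question without the hard-coded 4 x 12 dimensions.

```r
# One-step shift to the right: the last column moves to the front.
shift_once <- function(m) {
  n <- ncol(m)
  m[, c(n, seq_len(n - 1)), drop = FALSE]
}

# Apply shift_once `times` times, starting from m and leaving m untouched.
repeat_shift <- function(m, times) {
  Reduce(function(acc, i) shift_once(acc), seq_len(times), m)
}

m <- matrix(1:12, nrow = 3)
shifted <- repeat_shift(m, 3)
shifted
```

Here the first column of m ends up in the 4th column of shifted, just as in the manual example.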
I have a vector of numbers
x <- 1:10
I am after a matrix with n rows. Here is an example with a 3-row matrix:
1 2 3 4 5 6 7 8 9 10
NA 1 2 3 4 5 6 7 8 9
NA NA 1 2 3 4 5 6 7 8
What would be the best way to create one?
There is an odd little function called embed that will do it...
t(embed(c(NA, NA, 1:10), 3))
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 1 2 3 4 5 6 7 8 9 10
[2,] NA 1 2 3 4 5 6 7 8 9
[3,] NA NA 1 2 3 4 5 6 7 8
For a vector x and a matrix of n rows, the equivalent would be
t(embed(c(rep(NA, n-1), x), n))
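Wrapped as a small function, that expression can be checked against other inputs. The name lag_matrix is made up for this sketch; embed() itself comes from base R's stats package.

```r
lag_matrix <- function(x, n) {
  # Pad with n - 1 leading NAs so that every lagged row has length(x) entries.
  padded <- c(rep(NA, n - 1), x)
  # embed() puts lag k in column k + 1; transpose to get one lag per row.
  t(embed(padded, n))
}

res <- lag_matrix(c(4, 3, 6, 8, 7), 4)
res
```

Row k of the result is x delayed by k - 1 positions, with NA filling the positions that have no earlier value.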
There may be a simpler way to do this, but one way to create this matrix would be
create_matrix <- function(x, n) {
t(sapply(seq(n), function(m) c(rep(NA, m - 1), head(x, length(x) - m + 1))))
}
create_matrix(1:10, 3)
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
#[1,] 1 2 3 4 5 6 7 8 9 10
#[2,] NA 1 2 3 4 5 6 7 8 9
#[3,] NA NA 1 2 3 4 5 6 7 8
create_matrix(c(4, 3, 6, 8, 7), 4)
# [,1] [,2] [,3] [,4] [,5]
#[1,] 4 3 6 8 7
#[2,] NA 4 3 6 8
#[3,] NA NA 4 3 6
#[4,] NA NA NA 4 3
I'm doing cross validation and I want to separate the data into 3 folds.
I create a matrix with mat=matrix(sample.int(10, 6*10, TRUE), 6, 10), which looks like this:
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 10 10 9 3 3 3 4 4 3 9
[2,] 9 3 5 1 3 9 5 5 4 8
[3,] 7 6 6 3 8 2 3 10 7 4
[4,] 7 4 10 8 7 5 2 6 2 8
[5,] 9 7 7 5 3 9 5 8 7 8
[6,] 3 3 1 2 9 3 6 7 6 9
I then want to get 3 matrices from the data:
fold 1
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 10 10 9 3 3 3 4 4 3 9
[2,] 9 3 5 1 3 9 5 5 4 8
fold 2
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 7 6 6 3 8 2 3 10 7 4
[2,] 7 4 10 8 7 5 2 6 2 8
fold 3
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 9 7 7 5 3 9 5 8 7 8
[2,] 3 3 1 2 9 3 6 7 6 9
Here is what I tried:
require(stats)
mat=matrix(sample.int(10, 6*10, TRUE), 6, 10)
folds=cut(seq(1, nrow(mat)), breaks = 3, labels = FALSE)
#Perform 3-fold cross validation
for(i in 1:3){
#segment your data by folds using the which() function
testIndexes=which(folds==i, arr.ind = TRUE)
testData=mat[testIndexes,]
trainData=mat[-testIndexes,]
}
The training data that I get from fold 1 and fold 2 are combined into one matrix, but I want to generate them separately.
This is the generated training set, which should be separated into two folds:
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 10 10 9 3 3 3 4 4 3 9
[2,] 9 3 5 1 3 9 5 5 4 8
[3,] 7 6 6 3 8 2 3 10 7 4
[4,] 7 4 10 8 7 5 2 6 2 8
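One possible way to keep the folds apart is to split the row indices by fold label first and subset the matrix once per fold. This is a sketch, not a confirmed fix from the thread; fold_list, test_data, train_folds and the seed are choices made for this illustration.

```r
set.seed(1)  # arbitrary seed, only for reproducibility
mat <- matrix(sample.int(10, 6 * 10, replace = TRUE), nrow = 6, ncol = 10)
folds <- cut(seq_len(nrow(mat)), breaks = 3, labels = FALSE)

# One matrix per fold; drop = FALSE keeps single-row folds as matrices.
fold_list <- lapply(split(seq_len(nrow(mat)), folds),
                    function(idx) mat[idx, , drop = FALSE])

for (i in seq_along(fold_list)) {
  test_data   <- fold_list[[i]]  # the held-out fold
  train_folds <- fold_list[-i]   # list of the remaining folds, still separate
}
```

If a single training matrix is needed later, do.call(rbind, train_folds) joins the remaining folds again.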
Each iteration of my sapply call outputs an n*m matrix; n is fixed, m is not.
For example, if I run this in R:
sapply(1:3, function(x) {matrix(1:9, 3)})
and it will output:
[,1] [,2] [,3]
[1,] 1 1 1
[2,] 2 2 2
[3,] 3 3 3
[4,] 4 4 4
[5,] 5 5 5
[6,] 6 6 6
[7,] 7 7 7
[8,] 8 8 8
[9,] 9 9 9
However, what I want is something like this:
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
[1,] 1 4 7 1 4 7 1 4 7
[2,] 2 5 8 2 5 8 2 5 8
[3,] 3 6 9 3 6 9 3 6 9
Any idea for this? Thanks
One solution is:
do.call(cbind, lapply(1:3, function(x) {matrix(1:9, 3)}))
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
[1,] 1 4 7 1 4 7 1 4 7
[2,] 2 5 8 2 5 8 2 5 8
[3,] 3 6 9 3 6 9 3 6 9
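Because cbind only needs the row counts to agree, the same pattern should carry over to the case in the question where m changes between iterations. A small sketch with made-up widths (1, 2 and 3 columns); pieces and combined are illustrative names.

```r
# Each iteration returns a matrix with 3 rows and a different number of columns.
pieces <- lapply(1:3, function(k) matrix(seq_len(3 * k), nrow = 3))

# Bind all pieces side by side.
combined <- do.call(cbind, pieces)
dim(combined)  # 3 6
```

lapply is used rather than sapply so that the matrices stay as list elements instead of being flattened into columns.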
We can use replicate
`dim<-`(replicate(3, matrix(1:9, 3)), c(3, 3*3))
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
#[1,] 1 4 7 1 4 7 1 4 7
#[2,] 2 5 8 2 5 8 2 5 8
#[3,] 3 6 9 3 6 9 3 6 9