I have a large (rectangular) vector of strings, e.g.:
my.strings <- c("1234567", "1234567", "1234567", "1234567")
which I would like to convert to a matrix:
# [,1] [,2] [,3] [,4] [,5] [,6] [,7]
# [1,] 1 2 3 4 5 6 7
# [2,] 1 2 3 4 5 6 7
# [3,] 1 2 3 4 5 6 7
# [4,] 1 2 3 4 5 6 7
Is there a simple way to do this in R? (Unfortunately, yes, the strings of numbers are indeed character strings and not numeric.)
We could use strsplit to split each string into single characters (split = ""), convert the type with type.convert, and then rbind the list elements:
do.call(rbind, type.convert(strsplit(my.strings, ""), as.is = TRUE))
[,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,] 1 2 3 4 5 6 7
[2,] 1 2 3 4 5 6 7
[3,] 1 2 3 4 5 6 7
[4,] 1 2 3 4 5 6 7
Here, we assume all the strings have the same number of characters (nchar). If they differ, the list elements will have different lengths, so we have to pad them with NA before binding them into a matrix:
lst1 <- type.convert(strsplit(my.strings, ""), as.is = TRUE)
mx <- max(lengths(lst1))
do.call(rbind, lapply(lst1, `length<-`, mx))
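To see the padding in action, here is the same code run on two strings of different lengths (the input `ragged` is a made-up example). The shorter string's row ends in NA where it has no characters:

```r
# Hypothetical ragged input: strings with 3 and 5 characters
ragged <- c("123", "12345")

# Split into single characters and convert "1", "2", ... to integers
lst1 <- type.convert(strsplit(ragged, ""), as.is = TRUE)
mx <- max(lengths(lst1))

# `length<-` pads each shorter vector with NA up to length mx
res <- do.call(rbind, lapply(lst1, `length<-`, mx))
res
```

Row 1 is 1 2 3 NA NA and row 2 is 1 2 3 4 5.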
Another possible solution:
library(tidyverse)
my.strings <- c("1234567", "1234567", "1234567", "1234567")
my.strings %>%
sapply(function(x) str_split(x,"") %>% unlist %>% as.numeric) %>%
unname %>% t
#> [,1] [,2] [,3] [,4] [,5] [,6] [,7]
#> [1,] 1 2 3 4 5 6 7
#> [2,] 1 2 3 4 5 6 7
#> [3,] 1 2 3 4 5 6 7
#> [4,] 1 2 3 4 5 6 7
Here's another way:
matrix(as.numeric(unlist(strsplit(my.strings, ""))), nrow = length(my.strings), byrow = TRUE)
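This reshape relies on every string having the same number of characters. With ragged input, matrix() recycles the data, which gives a misaligned matrix (with a warning only when the total length is not a multiple of the row count). Here is a minimal sketch of a guarded version. The helper name strings_to_matrix is hypothetical:

```r
# Hypothetical helper: reshape equal-length digit strings into a numeric
# matrix, refusing ragged input instead of letting matrix() recycle it
strings_to_matrix <- function(s) {
  n <- unique(nchar(s))
  if (length(n) > 1L) {
    stop("all strings must have the same number of characters")
  }
  matrix(as.numeric(unlist(strsplit(s, ""))), nrow = length(s), byrow = TRUE)
}

strings_to_matrix(c("1234567", "1234567", "1234567", "1234567"))
```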
[,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,] 1 2 3 4 5 6 7
[2,] 1 2 3 4 5 6 7
[3,] 1 2 3 4 5 6 7
[4,] 1 2 3 4 5 6 7
Related
Given a vector, 1:4, and a sequence length, 2, I would like to separate the vector into 'sub-vectors', each with a length of 2, and generate a matrix of all possible combinations of these sub-vectors.
Output would look like this:
[,1] [,2] [,3] [,4]
[1,] 1 2 3 4
[2,] 3 4 1 2
Another example. With vector 1:8 and sub-vector length of 4, output would look like this:
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,] 1 2 3 4 5 6 7 8
[2,] 5 6 7 8 1 2 3 4
With a vector 1:9 and sub-vector length of 3, output would look like this:
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
[1,] 1 2 3 4 5 6 7 8 9
[2,] 1 2 3 7 8 9 4 5 6
[3,] 4 5 6 1 2 3 7 8 9
[4,] 4 5 6 7 8 9 1 2 3
[5,] 7 8 9 4 5 6 1 2 3
[6,] 7 8 9 1 2 3 4 5 6
It's a given that the vector length must be divisible by the sub-vector length.
I can answer the whole question, but it will take a bit longer. This should give you the flavour of the answer.
The package combinat has a function called permn which gives you all the permutations of a vector. You want this, but not quite. What you need is the permutations of all the blocks. So in your first example you have two blocks of length two, and in your third example you have three blocks of length three. If we look at the first, and think about ordering the blocks:
> library(combinat)
> numBlocks = 2
> permn(1:numBlocks)
[[1]]
[1] 1 2
[[2]]
[1] 2 1
So I hope you can see that the first permutation would take the blocks b1 = c(1,2), and b2 = c(3,4) and order them c(b1,b2), and the second would order them c(b2,b1).
Equally, if you had three blocks, b1 = 1:3, b2 = 4:6 and b3 = 7:9, then
permn(1:3)
[[1]]
[1] 1 2 3
[[2]]
[1] 1 3 2
[[3]]
[1] 3 1 2
[[4]]
[1] 3 2 1
[[5]]
[1] 2 3 1
[[6]]
[1] 2 1 3
gives you the ordering of these blocks. The more general solution is figuring out how to move the blocks around, but that isn't too hard.
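Moving the blocks around can be done by splitting the vector into a list of blocks and indexing that list with each permutation from permn(). Here is a minimal sketch, assuming combinat is installed. The helper name block_perms is hypothetical. Rows come out in permn()'s order, so the first row is the original vector:

```r
library(combinat)

# Hypothetical helper: every ordering of the consecutive blocks of v, one per row
block_perms <- function(v, blockLength) {
  if (length(v) %% blockLength != 0) {
    stop("vector length must be divisible by blockLength")
  }
  # b1 = v[1:blockLength], b2 = the next blockLength values, and so on
  blocks <- split(v, ceiling(seq_along(v) / blockLength))
  # a permutation p puts the blocks in the order c(b[p[1]], b[p[2]], ...)
  perms <- permn(seq_along(blocks))
  do.call(rbind, lapply(perms, function(p) unlist(blocks[p], use.names = FALSE)))
}

block_perms(1:4, 2)
block_perms(1:9, 3)
```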
Update: using my multicool package. Note that the co-lexical ordering (coolex) it produces isn't the order you would come up with by yourself.
library(multicool)
combs = function(v, blockLength){
  if(length(v) %% blockLength != 0){
    stop("vector length must be divisible by blockLength")
  }
  numBlocks = length(v) / blockLength
  # one block per row
  blockWise = matrix(v, ncol = blockLength, byrow = TRUE)
  # every ordering of the block indices, one permutation per row (coolex order)
  m = initMC(1:numBlocks)
  Perms = allPerm(m)
  # stack the blocks in each permuted order and flatten row by row
  t(apply(Perms, 1, function(p) as.vector(t(blockWise[p, , drop = FALSE]))))
}
> combs(1:4, 2)
[,1] [,2] [,3] [,4]
[1,] 3 4 1 2
[2,] 1 2 3 4
> combs(1:9, 3)
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
[1,] 7 8 9 4 5 6 1 2 3
[2,] 1 2 3 7 8 9 4 5 6
[3,] 7 8 9 1 2 3 4 5 6
[4,] 4 5 6 7 8 9 1 2 3
[5,] 1 2 3 4 5 6 7 8 9
[6,] 4 5 6 1 2 3 7 8 9
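If you prefer a plain lexicographic order of the rows to the coolex one, you can sort the rows afterwards with do.call(order, ...), passing one argument per column. The sketch below enters the combs(1:9, 3) output shown above as literal data, so it runs without multicool:

```r
# combs(1:9, 3) output from above, entered as literal data
res <- matrix(c(7, 8, 9, 4, 5, 6, 1, 2, 3,
                1, 2, 3, 7, 8, 9, 4, 5, 6,
                7, 8, 9, 1, 2, 3, 4, 5, 6,
                4, 5, 6, 7, 8, 9, 1, 2, 3,
                1, 2, 3, 4, 5, 6, 7, 8, 9,
                4, 5, 6, 1, 2, 3, 7, 8, 9),
              nrow = 6, byrow = TRUE)

# order rows by column 1, breaking ties with columns 2, 3, ...
sorted <- res[do.call(order, split(res, col(res))), ]
sorted
```

The first row of `sorted` is 1 2 3 4 5 6 7 8 9 and the last is 7 8 9 4 5 6 1 2 3.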
I have an array of numbers
x <- 1:10
I am after a matrix with n rows. Here is an example with a 3-row matrix:
1 2 3 4 5 6 7 8 9 10
NA 1 2 3 4 5 6 7 8 9
NA NA 1 2 3 4 5 6 7 8
What would be the best way to create one?
There is an odd little function called embed that will do it...
t(embed(c(NA, NA, 1:10), 3))
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 1 2 3 4 5 6 7 8 9 10
[2,] NA 1 2 3 4 5 6 7 8 9
[3,] NA NA 1 2 3 4 5 6 7 8
For a vector x and a matrix of n rows, the equivalent would be
t(embed(c(rep(NA, n-1), x), n))
Maybe there is a simpler way to do this, but one way to create this matrix would be:
create_matrix <- function(x, n) {
t(sapply(seq(n), function(m) c(rep(NA, m - 1), head(x, length(x) - m + 1))))
}
create_matrix(1:10, 3)
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
#[1,] 1 2 3 4 5 6 7 8 9 10
#[2,] NA 1 2 3 4 5 6 7 8 9
#[3,] NA NA 1 2 3 4 5 6 7 8
create_matrix(c(4, 3, 6, 8, 7), 4)
# [,1] [,2] [,3] [,4] [,5]
#[1,] 4 3 6 8 7
#[2,] NA 4 3 6 8
#[3,] NA NA 4 3 6
#[4,] NA NA NA 4 3
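The embed() one-liner and create_matrix() should give the same matrices. Here is a quick check that wraps the embed() formula in a function with the hypothetical name lag_matrix and reruns the two examples above:

```r
# Hypothetical wrapper around t(embed(c(rep(NA, n - 1), x), n))
lag_matrix <- function(x, n) t(embed(c(rep(NA, n - 1), x), n))

lag_matrix(1:10, 3)
lag_matrix(c(4, 3, 6, 8, 7), 4)
```

Both calls print the same values as the create_matrix() output shown above.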
I want to sort the columns of a matrix by the first row, using the following rows to break ties. The following example should be self-explanatory:
m = unique(replicate(5, sample(1:5, 5, replace = FALSE)), MARGIN = 2)
m
[,1] [,2] [,3] [,4] [,5]
[1,] 1 5 1 4 3
[2,] 5 1 5 1 2
[3,] 4 3 3 3 1
[4,] 3 4 4 5 5
[5,] 2 2 2 2 4
But what I want is instead:
[,1] [,2] [,3] [,4] [,5]
[1,] 1 1 3 4 5
[2,] 5 5 2 1 1
[3,] 3 4 1 3 3
[4,] 4 3 5 5 4
[5,] 2 2 4 2 2
Ideally, I would like to find a method that allows the same process to be carried out when the column vectors are words (alphabetic order).
I tried things like m[ , sort(m)] but nothing did the trick...
m[, order(m[1, ])] will order the columns by the first row. m[, order(m[1, ], m[2, ])] will order by the first row, using the second row as a tie-breaker. Getting fancy, m[, do.call(order, split(m, row(m)))] will order the columns by the first row, using all subsequent rows as tie-breakers. This works for character data just as well as for numeric data.
set.seed(47)
m = replicate(5, sample(1:5, 5, replace = FALSE))
m
# [,1] [,2] [,3] [,4] [,5]
# [1,] 5 4 1 5 1
# [2,] 2 2 3 2 3
# [3,] 3 5 5 1 2
# [4,] 4 3 2 3 5
# [5,] 1 1 4 4 4
m[, do.call(order, split(m, row(m)))]
# [,1] [,2] [,3] [,4] [,5]
# [1,] 1 1 4 5 5
# [2,] 3 3 2 2 2
# [3,] 2 5 5 1 3
# [4,] 5 2 3 3 4
# [5,] 4 4 1 4 1
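The same call works for the word case from the question. Here it is on a small, made-up character matrix in which two columns tie on the first row. For plain lowercase words like these the order is alphabetical. Mixed-case or accented words can sort differently depending on the locale:

```r
# Hypothetical character matrix; its columns are ("kiwi", "pear", "fig"),
# ("apple", "plum", "fig") and ("kiwi", "apple", "pear")
w <- matrix(c("kiwi", "pear", "fig",
              "apple", "plum", "fig",
              "kiwi", "apple", "pear"), nrow = 3)

# order columns by row 1, breaking the "kiwi" tie with row 2
w_sorted <- w[, do.call(order, split(w, row(w)))]
w_sorted
```

The first row of `w_sorted` is "apple" "kiwi" "kiwi" and the second is "plum" "apple" "pear".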
Is it possible in R to say - I want all indices from position i to the end of vector/matrix?
Say I want a submatrix from 3rd column onwards. I currently only know this way:
A = matrix(rep(1:8, each = 5), nrow = 5) # just generate some example matrix...
A[,3:ncol(A)] # get submatrix from 3rd column onwards
But do I really need to write ncol(A)? Isn't there an elegant way to say "from the 3rd column onwards"? Something like A[,3:]? (or A[,3:...])?
Sometimes it's easier to tell R what you don't want. In other words, exclude columns from the matrix using negative indexing:
Here are two alternative ways that both produce the same results:
A[, -(1:2)]
A[, -seq_len(2)]
Results:
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 3 4 5 6 7 8
[2,] 3 4 5 6 7 8
[3,] 3 4 5 6 7 8
[4,] 3 4 5 6 7 8
[5,] 3 4 5 6 7 8
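Be careful when generalizing the negative form to a variable start column i. For i = 1, -seq_len(0) is -integer(0), which selects no columns rather than all of them. A logical index avoids this. Here is a minimal sketch with the hypothetical helper name from_col:

```r
A <- matrix(rep(1:8, each = 5), nrow = 5)

i <- 1
ncol(A[, -seq_len(i - 1)])   # 0: -integer(0) selects no columns

# Hypothetical helper: columns i, i + 1, ..., ncol(x) via a logical index,
# which also behaves for i = 1 (all columns) and i > ncol(x) (no columns)
from_col <- function(x, i) x[, seq_len(ncol(x)) >= i, drop = FALSE]

from_col(A, 3)   # same as A[, -(1:2)]
from_col(A, 1)   # all 8 columns
```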
But to answer your question as asked: Use ncol to find the number of columns. (Similarly there is nrow to find the number of rows.)
A[, 3:ncol(A)]
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 3 4 5 6 7 8
[2,] 3 4 5 6 7 8
[3,] 3 4 5 6 7 8
[4,] 3 4 5 6 7 8
[5,] 3 4 5 6 7 8
For rows (rather than columns, as in your example), head() and tail() can be used.
A <- matrix(rep(1:8, each = 5), nrow = 5)
tail(A, 3)
is almost the same as
A[3:dim(A)[1],]
(the only difference is in the row names/indices that are printed).
Those work for vectors and data frames too:
> tail(1:10, 4)
[1] 7 8 9 10
> tail(data.frame(A = 1:5, B = 1:5), 3)
A B
3 3 3
4 4 4
5 5 5
For the column version you could adapt tail(), but it is a bit trickier. I wonder if NROW() and NCOL() might be more useful here than dim():
> A[, 3:NCOL(A)]
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 3 4 5 6 7 8
[2,] 3 4 5 6 7 8
[3,] 3 4 5 6 7 8
[4,] 3 4 5 6 7 8
[5,] 3 4 5 6 7 8
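Adapting tail() to columns is easier in newer R. The sketch below assumes R >= 4.0.0, where head() and tail() accept a vector n with one entry per dimension. NA keeps a dimension whole, and a negative value in tail() drops that many indices from the start. keepnums = FALSE stops tail() from adding index-based dimnames:

```r
A <- matrix(rep(1:8, each = 5), nrow = 5)

# n = c(NA, -2): NA keeps every row, -2 drops the first 2 columns
B <- tail(A, c(NA, -2), keepnums = FALSE)
B
```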
Or flip this on its head and instead of asking R for things, ask it to drop things instead. Here is a function that encapsulates this:
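give <- function(x, i, dimen = 1L) {
  ind <- seq_len(i - 1)  # positions to drop: 1, ..., i - 1
  # if i = 1 there is nothing to drop; x[-integer(0), ] would return nothing
  if(isTRUE(all.equal(dimen, 1L))) { ## rows
    out <- if (length(ind)) x[-ind, , drop = FALSE] else x
  } else if(isTRUE(all.equal(dimen, 2L))) { ## cols
    out <- if (length(ind)) x[, -ind, drop = FALSE] else x
  } else {
    stop("Only for 2d objects")
  }
  out
}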
> give(A, 3)
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,] 1 2 3 4 5 6 7 8
[2,] 1 2 3 4 5 6 7 8
[3,] 1 2 3 4 5 6 7 8
> give(A, 3, dimen = 2)
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 3 4 5 6 7 8
[2,] 3 4 5 6 7 8
[3,] 3 4 5 6 7 8
[4,] 3 4 5 6 7 8
[5,] 3 4 5 6 7 8
You can use the following instruction, where length(A[1, ]) (the length of a row) is the number of columns:
A[, 3:length(A[1, ])]
A more readable dplyr approach for the same thing:
library(dplyr)
# as_tibble() names the unnamed matrix columns V1, V2, ...
A %>% as_tibble() %>%
  select(-c(V1, V2))
A %>% as_tibble() %>%
  select(V3:ncol(A))
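dplyr's select() also understands the tidyselect helper last_col(), so "from the 3rd column onwards" can be written without ncol(A). Here is a sketch, assuming dplyr is installed. as.data.frame() is used here to name the columns V1, ..., V8:

```r
library(dplyr)

A <- matrix(rep(1:8, each = 5), nrow = 5)

# as.data.frame() names the matrix columns V1, ..., V8
df <- as.data.frame(A)

# 3:last_col() means "column 3 through the last column"
res <- df %>% select(3:last_col())
res
```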