R - Convert matrix to vector from the bottom and row wise

I have the following matrix:
m <- matrix(1:9, ncol=3, byrow=TRUE)
m
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 4 5 6
[3,] 7 8 9
that I need to flatten, i.e., convert to a vector.
However, instead of going along the columns:
as.vector(m)
[1] 1 4 7 2 5 8 3 6 9
I need the resulting vector to go along the rows, starting from the bottom row and reading each row left to right, i.e.:
[1] 7 8 9 4 5 6 1 2 3
How can I do that?

1) Reverse the first dimension, transpose and then unravel:
c(t(m[nrow(m):1, ]))
## [1] 7 8 9 4 5 6 1 2 3
2) Here is a second approach which computes the indices and then applies them. It is longer but avoids the transpose:
nr <- nrow(m)
nc <- ncol(m)
c(m[cbind(rep(nr:1, each = nc), 1:nc)])
## [1] 7 8 9 4 5 6 1 2 3
2a) A variation of (2) is to use a 1d index:
m[rep(nr:1, each = nc) + nr * (0:(nc - 1))]
## [1] 7 8 9 4 5 6 1 2 3
Note
I tried it for a 100x100 and a 1000x1000 matrix. In the first case (1) was the fastest, and in the second case (2) and (2a) were the fastest. So if speed is a concern, the actual dimensions seem to make a difference as to which to choose.
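As a quick check of the approaches above on a non-square matrix, the sketch below wraps (1), (2) and (2a) in functions and confirms that they agree. The names flatten_t, flatten_2d and flatten_1d are made up for this illustration, and the timing calls are only a rough way to repeat the comparison on your own machine; no particular ranking is implied.

```r
# Hypothetical wrappers around approaches (1), (2) and (2a); names are illustrative.
flatten_t <- function(m) c(t(m[nrow(m):1, , drop = FALSE]))
flatten_2d <- function(m) {
  nr <- nrow(m); nc <- ncol(m)
  m[cbind(rep(nr:1, each = nc), 1:nc)]          # (row, column) index pairs
}
flatten_1d <- function(m) {
  nr <- nrow(m); nc <- ncol(m)
  m[rep(nr:1, each = nc) + nr * (0:(nc - 1))]   # column-major 1d positions
}

m2 <- matrix(1:12, nrow = 3, byrow = TRUE)      # 3 x 4, so rows and columns differ
res_t <- flatten_t(m2)
res_2d <- flatten_2d(m2)
res_1d <- flatten_1d(m2)
print(res_t)

# Rough timing on a larger matrix; results depend on machine and dimensions.
big <- matrix(runif(200 * 200), nrow = 200)
print(system.time(for (k in 1:20) flatten_t(big)))
print(system.time(for (k in 1:20) flatten_1d(big)))
```

The drop = FALSE in flatten_t is an addition here so that a one-row matrix is not collapsed to a plain vector before t() is applied.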

One option could be also using asplit():
unlist(rev(asplit(m, 1)))
[1] 7 8 9 4 5 6 1 2 3

Maybe you can use the following ways:
Solution 1:
as.vector(t(apply(m, 2, rev)))
which gives:
> as.vector(t(apply(m, 2, rev)))
[1] 7 8 9 4 5 6 1 2 3
Solution 2:
unlist(rev(data.frame(t(m))), use.names = FALSE)
which gives:
> unlist(rev(data.frame(t(m))), use.names = FALSE)
[1] 7 8 9 4 5 6 1 2 3

Related

Automate input for new vector

I was wondering if you had any idea what R code I could use to automate my process.
I would like to repeat "chunks" of an initial vector (Vec1). I divide the vector in groups of 4 values and repeat each group 5 times. Currently, with my bad technique, each time I add a new experiment to the analysis I have to manually create a vector to indicate which chunk I would like to repeat next. In the end I put the vector corresponding to each experiment together to get my desired output.
Vec1 <- A simple numeric vector that grows in size for each new experiment. Each new experiment extends the vector by 4 additional values.
Exp1 <- rep(Vec1 [1:4], times=5)
Exp2 <- rep(Vec1 [5:8], times=5)
Exp3 <- rep(Vec1 [9:12], times=5)
NewVector<- c(Exp1, Exp2, Exp3)
Could I use a trick to automate it?
Many thanks for the help,
Best regards,
Edouard M.
I don't know about "automate". You could write a function that takes the values 1:4 and adds multiples of 4 to it.
add_exp <- function(values = 1:4, n = 0) {
rep(values, 5) + 4 * n
}
Then add_exp() gives:
[1] 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4
And add_exp(n = 1) gives:
[1] 5 6 7 8 5 6 7 8 5 6 7 8 5 6 7 8 5 6 7 8
So you could get NewVector using:
NewVector<- c(add_exp(), add_exp(n = 1), add_exp(n = 2))
Or if you wanted to use lapply to supply the values of n:
NewVector <- unlist(lapply(0:2, function(x) add_exp(n = x)))
Using sequence (its from argument requires R >= 4.0.0):
n <- 3L # number of experiments
v <- 4L # length of vector added for each experiment
r <- 5L # number of replications
sequence(rep(v, n*r), rep(seq(1, n*v, v), each = r))
#> [1] 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 5 6 7 8 5
#> [26] 6 7 8 5 6 7 8 5 6 7 8 5 6 7 8 9 10 11 12 9 10 11 12 9 10
#> [51] 11 12 9 10 11 12 9 10 11 12
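Both answers above build index positions rather than the repeated values themselves. If the goal is to get NewVector directly from Vec1, one possible sketch is to reshape Vec1 so that each experiment is a column and then repeat the columns. The function name repeat_chunks and the sample values are invented here, and the sketch assumes that length(Vec1) is a multiple of the chunk size.

```r
# Hypothetical helper: repeat each consecutive chunk of `vec` `times` times.
repeat_chunks <- function(vec, chunk = 4L, times = 5L) {
  stopifnot(length(vec) %% chunk == 0)         # assumes complete chunks only
  m <- matrix(vec, nrow = chunk)               # one experiment per column
  c(m[, rep(seq_len(ncol(m)), each = times)])  # repeat columns, then flatten
}

Vec1 <- seq(10, 80, by = 10)                   # made-up data: two experiments
NewVector <- repeat_chunks(Vec1)
print(NewVector)
```

When a new experiment appends four more values to Vec1, calling repeat_chunks(Vec1) again picks up the extra chunk without any manual bookkeeping.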

Generate a vector of sequence greater than 1 but less than n in r

How do I generate a sequence vector in the range 1 < i < n, that is, every number in the vector is a positive integer greater than 1 but less than n?
Here is what I tried below:
n <- 10
my_seq <- seq(from => 1, to =< n)
It gave me this error:
Error: unexpected '>' in "my_seq <- seq(from =>"
my expected output should be
[1] 2 3 4 5 6 7 8 9
It depends on which type of vector you need. Below are some examples:
If you want an ascending sequence (without duplicates)
seq(n-2)+1
# [1] 2 3 4 5 6 7 8 9
If you want to shuffle the values 2 to n-1:
sample(n-2)+1
# [1] 6 7 9 5 8 4 2 3
If you need random integers that allow duplicates
sample(n-2,replace = TRUE)+1
# [1] 5 2 8 9 4 3 6 9
You could generate the sequence using
n <- 10
2:(n-1)
#[1] 2 3 4 5 6 7 8 9
OR
seq(2, n - 1)
You can also do:
tail(head(1:n, -1), -1)
[1] 2 3 4 5 6 7 8 9
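One edge case worth noting: for n <= 2 there is no integer strictly between 1 and n, yet 2:(n-1) counts downwards (for n = 2 it returns 2 1). A hedged sketch that returns an empty vector in that case, using a made-up function name open_interval:

```r
# Hypothetical helper: integers i with 1 < i < n, empty when none exist.
open_interval <- function(n) seq_len(max(n - 2, 0)) + 1L

a <- open_interval(10)   # the integers 2 through 9
b <- open_interval(2)    # empty: nothing lies strictly between 1 and 2
d <- 2:(2 - 1)           # the colon operator counts down instead
print(a)
print(b)
print(d)
```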

repeat sequences from vector

Say I have a vector like so:
vector <- 1:9
#$ [1] 1 2 3 4 5 6 7 8 9
I now want to repeat every i to i+x sequence n times, like so for x=3, and n=2:
#$ [1] 1 2 3 1 2 3 4 5 6 4 5 6 7 8 9 7 8 9
I'm accomplishing this like so:
index <- NULL
x <- 3
n <- 2
for (i in 1:(length(vector)/x)) {
index <- c(index, rep(c(1:x + (i-1)*x), n))
}
#$ [1] 1 2 3 1 2 3 4 5 6 4 5 6 7 8 9 7 8 9
This works just fine, but I have a hunch there's got to be a better way (especially since usually, a for loop is not the answer).
Ps.: the use case for this is actually repeating rows in a dataframe, but just getting the index vector would be fine.
You can try to first split the vector, then use rep and unlist:
x <- 3 # this is the length of each subset sequence from i to i+x (see above)
n <- 2 # this is how many times you want to repeat each subset sequence
unlist(lapply(split(vector, rep(1:(length(vector)/x), each = x)), rep, n), use.names = FALSE)
# [1] 1 2 3 1 2 3 4 5 6 4 5 6 7 8 9 7 8 9
Or, you can try creating a matrix with one subset sequence per column (nrow = x) and converting it to a vector:
c(do.call(rbind, replicate(n, matrix(vector, nrow = x), FALSE)))
# [1] 1 2 3 1 2 3 4 5 6 4 5 6 7 8 9 7 8 9
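For the stated use case of repeating data frame rows, the index vector can also be built arithmetically, without splitting. This is a sketch under the assumption that the number of rows is a positive multiple of x; block_index and the toy data frame d are hypothetical.

```r
# Hypothetical helper: indices 1..len in blocks of x, each block repeated n times.
block_index <- function(len, x, n) {
  stopifnot(len >= x, len %% x == 0)           # assumes complete blocks only
  starts <- seq(0, len - x, by = x)            # offset of each block: 0, x, 2x, ...
  rep(starts, each = n * x) + rep(seq_len(x), times = n * length(starts))
}

idx <- block_index(9, x = 3, n = 2)
print(idx)

d <- data.frame(id = 1:9, val = letters[1:9])  # made-up data
d_rep <- d[idx, ]                              # rows repeated block by block
print(d_rep$id)
```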

Finding unique combinations irrespective of position [duplicate]

I'm sure it's something simple, but I have a data frame
df <- data.frame(a = c(1, 2, 3),
b = c(2, 3, 1),
c = c(3, 1, 4))
And I want a new data frame that contains the unique combinations of values in the rows, irrespective of which column they're in. So in the case above I'd want
a b c
1 2 3
3 1 4
I've tried
unique(df[c('a', 'b', 'c')])
but it sees (1, 2, 3) as unique from (2, 3, 1), which I don't want.
Maybe something like this:
indx <- !duplicated(t(apply(df, 1, sort))) # finds non - duplicates in sorted rows
df[indx, ] # selects only the non - duplicates according to that index
# a b c
# 1 1 2 3
# 3 3 1 4
If your data.frame is quite big, speed may matter to you. You can find duplicated sets much faster with the following idea.
Assign each possible value in the rows a distinct prime number and compute the product for each row. For example, for the given df we can take primenums = c(2,3,5,7) and get the products c(30,30,70). Because prime factorization is unique, duplicates in this vector of products correspond to duplicated sets in our data.frame. As multiplication is much faster than any kind of sorting, you gain efficiency.
The code is as follows.
require("numbers")
primenums <- Primes(100)[1:4]
dfmult <- apply(as.matrix(df), 1, function(z) prod(primenums[z]) )
my_indx <- !duplicated(dfmult)
df[my_indx,]
Here we initialize the vector primenums with the function Primes from the package numbers, but you can also build it manually in some other way.
Take a look at the following example, which compares the efficiency of the methods.
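As one way of building the primes by hand, here is a minimal base R sketch using a sieve of Eratosthenes, applied to the small three-row df from the question. The names primes_upto and df_small are hypothetical, and the sketch assumes the values in the data are positive integers no larger than the number of primes generated.

```r
# Hypothetical helper: all primes <= limit via a simple sieve of Eratosthenes.
primes_upto <- function(limit) {
  if (limit < 2) return(integer(0))
  is_prime <- rep(TRUE, limit)
  is_prime[1] <- FALSE
  for (p in seq_len(floor(sqrt(limit)))) {
    if (is_prime[p]) {
      is_prime[seq(p * p, limit, by = p)] <- FALSE   # cross out multiples of p
    }
  }
  which(is_prime)
}

df_small <- data.frame(a = c(1, 2, 3), b = c(2, 3, 1), c = c(3, 1, 4))
primenums <- primes_upto(10)                         # 2 3 5 7
dfmult <- apply(as.matrix(df_small), 1, function(z) prod(primenums[z]))
print(dfmult)
res <- df_small[!duplicated(dfmult), ]
print(res)
```

Keep in mind that products above 2^53 cannot be represented exactly as doubles, so for long rows or large primes distinct sets could in principle end up with the same key.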
require("numbers")
# generate all unique combinations 10 out of 20
allcomb <- t(combn(20,10))
# make sample of 1 million rows
set.seed(789)
df <- allcomb[sample(nrow(allcomb), 1e6, T),]
# lets sort matrix to show we have duplicates
df <- df[do.call(order, lapply(1:ncol(df), function(i) df[, i])), ]
head(df, 10)
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
# [1,] 1 2 3 4 5 6 7 8 9 10
# [2,] 1 2 3 4 5 6 7 8 9 10
# [3,] 1 2 3 4 5 6 7 8 9 10
# [4,] 1 2 3 4 5 6 7 8 9 10
# [5,] 1 2 3 4 5 6 7 8 9 11
# [6,] 1 2 3 4 5 6 7 8 9 11
# [7,] 1 2 3 4 5 6 7 8 9 11
# [8,] 1 2 3 4 5 6 7 8 9 11
# [9,] 1 2 3 4 5 6 7 8 9 11
# [10,] 1 2 3 4 5 6 7 8 9 11
# to be fair need to permutate numbers in rows before searching for identical sets
df <- t(apply(df, 1, function(z) z[sample(10,10)] ))
df <- as.data.frame(df)
names(df) <- letters[1:10]
# how does it look like now?
head(df, 10)
# a b c d e f g h i j
# 1 2 3 7 9 10 1 4 8 5 6
# 2 4 2 6 3 8 10 9 1 5 7
# 3 4 2 6 8 5 1 10 7 3 9
# 4 6 8 5 4 2 1 10 9 7 3
# 5 11 2 7 6 8 1 9 4 5 3
# 6 9 6 3 11 4 2 8 7 5 1
# 7 5 2 3 11 1 8 6 9 7 4
# 8 3 9 7 1 2 5 4 8 11 6
# 9 6 2 8 3 4 1 11 5 9 7
# 10 4 6 3 9 7 2 1 5 11 8
# now lets shuffle rows to make df more plausible
df <- df[sample(nrow(df), nrow(df)),]
Now that the data.frame is ready, we can test different algorithms.
system.time(indx <- !duplicated(t(apply(df, 1, sort))) )
# user system elapsed
# 119.75 0.06 120.03
# doesn't impress, frankly speaking
library(sets)
system.time(indx <- !duplicated(apply(df, 1, as.set)) )
# user system elapsed
# 91.60 0.00 91.89
# better, but we want faster! =)
# now lets check out the method with prime numbers
primenums <- Primes(100)[1:20]
# [1] 2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71
system.time({
dfmult <- apply(as.matrix(df), 1, function(z) prod(primenums[z]) )
my_indx <- !duplicated(dfmult) })
# user system elapsed
# 6.44 0.16 6.61
# not bad, isn't it? but lets compare results
identical(indx, my_indx)
# [1] TRUE
# So, if there is no difference, why wait more? ;)
There is one important assumption here: we use as.matrix(df), but what if our data.frame contains more than just numeric variables? A more general solution is as follows:
system.time({
dfmult <- apply(
apply(df, 2, function(colmn) as.integer(factor(colmn,
levels = unique(c(as.matrix(df)))))),
1, function(z) prod(primenums[z]) )
my_indx <- !duplicated(dfmult) })
# user system elapsed
# 27.48 0.34 27.84
# is distinctly slower but still much faster than previous methods
And what if we have very many columns or very many distinct values? In this case, instead of prod() we can use sum(log()) (which is probably computed even faster for large numbers). Take a look at this.
pr <- Primes(5e7)
length(pr)
# [1] 3001134
system.time(N <- sum(log(pr)))
# user system elapsed
# 0.12 0.00 0.13
N
# [1] 49993718
It's hard to imagine a df with 3 million columns, but even that is fine here. This approach lets us handle a df of practically any size, with as many columns as our RAM can hold.
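A minimal sketch of the sum(log()) variant on the small three-row df from the question is shown here. The rounding step is an assumption added for this illustration and is not part of the answer above: sums of logs taken in a different order may differ in their last bits, so comparing the raw doubles exactly could miss duplicates. The name df_small is made up.

```r
df_small <- data.frame(a = c(1, 2, 3), b = c(2, 3, 1), c = c(3, 1, 4))
primenums <- c(2, 3, 5, 7)                     # first four primes, typed by hand

# One key per row: the sum of the logs of the primes assigned to its values.
logkey <- apply(as.matrix(df_small), 1, function(z) sum(log(primenums[z])))

# Assumption: round before comparing to absorb floating-point noise.
keep <- !duplicated(round(logkey, 8))
res <- df_small[keep, ]
print(res)
```

The number of digits kept in round() is a judgment call: too few could merge distinct sets, too many could let rounding noise through.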
As an alternative approach, the package sets provides a fast way of checking for set equality:
library(sets)
df.sets <- apply(df, 1, as.set)
#[[1]]
#{1, 2, 3}
#[[2]]
#{1, 2, 3}
#[[3]]
#{1, 3, 4}
df[!duplicated(df.sets),]
# a b c
#1 1 2 3
#3 3 1 4

R: create a data frame out of a rolling window

Let's say I have a data frame with the following structure:
DF <- data.frame(x = 0:4, y = 5:9)
> DF
x y
1 0 5
2 1 6
3 2 7
4 3 8
5 4 9
What is the most efficient way to turn 'DF' into a data frame with the following structure:
w x y
1 0 5
1 1 6
2 1 6
2 2 7
3 2 7
3 3 8
4 3 8
4 4 9
Here w is a length-2 window rolling through the data frame 'DF'. The length of the window should be arbitrary, i.e. a length of 3 yields
w x y
1 0 5
1 1 6
1 2 7
2 1 6
2 2 7
2 3 8
3 2 7
3 3 8
3 4 9
I am a bit stumped by this problem, because the data frame can also contain an arbitrary number of columns, i.e. w,x,y,z etc.
/edit 2: I've realized edit 1 is a bit unreasonable, as xts doesn't seem to deal with multiple observations per data point
My approach would be to use the embed function. The first thing to do is to create a rolling sequence of indices into a vector. Take a data-frame:
df <- data.frame(x = 0:4, y = 5:9)
nr <- nrow(df)
w <- 3 # window size
i <- 1:nr # indices of the rows
iw <- embed(i,w)[, w:1] # matrix of rolling-window indices of length w
> iw
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 2 3 4
[3,] 3 4 5
wnum <- rep(1:nrow(iw),each=w) # window number
inds <- i[c(t(iw))] # the indices flattened, to use below
dfw <- sapply(df, '[', inds)
dfw <- transform(data.frame(dfw), w = wnum)
> dfw
x y w
1 0 5 1
2 1 6 1
3 2 7 1
4 1 6 2
5 2 7 2
6 3 8 2
7 2 7 3
8 3 8 3
9 4 9 3
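The steps above can be wrapped into a function that accepts any window length and any number of columns, and that puts w first as in the question's desired output. This is only a sketch; the function name roll_df is made up, and it assumes 1 <= w <= nrow(df).

```r
# Hypothetical helper: stack all rolling windows of length w from a data frame.
roll_df <- function(df, w) {
  stopifnot(w >= 1, w <= nrow(df))
  iw <- embed(seq_len(nrow(df)), w)[, w:1, drop = FALSE]  # one window per row
  inds <- c(t(iw))                                        # row indices, window by window
  out <- cbind(w = rep(seq_len(nrow(iw)), each = w), df[inds, , drop = FALSE])
  rownames(out) <- NULL                                   # drop the duplicated row names
  out
}

DF <- data.frame(x = 0:4, y = 5:9)
res2 <- roll_df(DF, 2)
res3 <- roll_df(DF, 3)
print(res2)
```

Indexing the data frame directly with df[inds, ] keeps each column's original type, whereas the sapply() step above goes through a matrix and so coerces mixed column types to a common one.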
