Create integer sequences defined by 'from' and 'to' vectors - r

I have two vectors which define start (from) indices and finish (to) indices:
Start = c(1, 10, 20)
Finish = c(9, 19, 30)
I want to create a list of all Start:Finish sequences along the two vectors, i.e. generate the sequences Start[1]:Finish[1] (1:9); Start[2]:Finish[2], and so on.
## [[1]]
## [1] 1 2 3 4 5 6 7 8 9
##
## [[2]]
## [1] 10 11 12 13 14 15 16 17 18 19
##
## [[3]]
## [1] 20 21 22 23 24 25 26 27 28 29 30
Preferably in some vectorized way. The values in the 'Start' vector will always be smaller than the corresponding elements in the 'Finish' vector.

Just use mapply:
Start = c(1,10,20)
Finish = c(9,19,30)
mapply(":", Start, Finish)
## [[1]]
## [1] 1 2 3 4 5 6 7 8 9
##
## [[2]]
## [1] 10 11 12 13 14 15 16 17 18 19
##
## [[3]]
## [1] 20 21 22 23 24 25 26 27 28 29 30
##
You could, of course, also use Vectorize, but that's just a wrapper for mapply. However, Vectorize cannot be used with primitive functions such as : or seq.int, and the generic seq exposes only ... as its formal argument, so you'll have to specify seq.default rather than seq or seq.int.
Example:
Vectorize(seq.default)(Start, Finish)
## [[1]]
## [1] 1 2 3 4 5 6 7 8 9
##
## [[2]]
## [1] 10 11 12 13 14 15 16 17 18 19
##
## [[3]]
## [1] 20 21 22 23 24 25 26 27 28 29 30
##

Agree with @ColonelBeauvel and @nicola, though you could use seq instead of :, hence
Start = c(1, 10, 20)
Finish = c(9, 19, 30)
Map(seq, Start, Finish)
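One caveat worth sketching as an aside to the answers above: mapply simplifies its result by default, so when every Start:Finish pair happens to have the same length it returns a matrix rather than a list. The vectors Start2 and Finish2 below are hypothetical and not part of the question.

```r
# Hypothetical inputs where every sequence has the same length (3)
Start2  <- c(1, 11, 21)
Finish2 <- c(3, 13, 23)

# Default SIMPLIFY = TRUE collapses equal-length results into a 3 x 3 matrix
m <- mapply(":", Start2, Finish2)

# SIMPLIFY = FALSE always returns a list; Map() is the same call in short form
l1 <- mapply(":", Start2, Finish2, SIMPLIFY = FALSE)
l2 <- Map(":", Start2, Finish2)
```

If a list is required regardless of the input lengths, Map (or SIMPLIFY = FALSE) is probably the safer choice.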

Related

Generate a list from a vector in r

I have a vector like a. I would like to generate a list b from a. I only typed the first 4 components of the list. Any suggestions for quick ways to achieve this would be appreciated:
a <- seq(from =0, to = 359, by = 8)
b <- list(c(0:7), c(8:(8+7)), c(16:(16+7)), c(24:(24+7)))
> b
[[1]]
[1] 0 1 2 3 4 5 6 7
[[2]]
[1] 8 9 10 11 12 13 14 15
[[3]]
[1] 16 17 18 19 20 21 22 23
[[4]]
[1] 24 25 26 27 28 29 30 31
You can create a sequence from min value of a to max and then use findInterval or cut to split the sequence based on intervals.
tmp <- seq(min(a), max(a))
split(tmp, findInterval(tmp, a))
#$`1`
#[1] 0 1 2 3 4 5 6 7
#$`2`
#[1] 8 9 10 11 12 13 14 15
#$`3`
#[1] 16 17 18 19 20 21 22 23
#$`4`
#[1] 24 25 26 27 28 29 30 31
#$`5`
#[1] 32 33 34 35 36 37 38 39
#...
Another way, using Map:
Map(seq, a[-length(a)], a[-1] - 1)
This will achieve the desired result:
list1 <- list()
for (i in 1:45) {
  base <- i * 8 - 8          # first value of block i: 0, 8, 16, ...
  list1[[i]] <- base + 0:7   # eight consecutive values starting at base
}
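As a further sketch (not taken from the answers above), integer division can label each value with its block number, which yields all 45 groups of eight in a single split call. The name vals is introduced here for illustration only.

```r
a <- seq(from = 0, to = 359, by = 8)

# All integers covered by the blocks: 0, 1, ..., 359
vals <- seq(min(a), max(a) + 7)

# vals %/% 8 is the zero-based block number of each value;
# unname() drops the "0", "1", ... labels that split() adds
b <- unname(split(vals, vals %/% 8))
```

This assumes the blocks are contiguous and all of width 8, as in the question.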

Finding the right package in R for cluster analysis

I'm trying to find a package in R where I can find clusters that exceed a given threshold in a dataset.
What I want to know is the cluster duration/size and the individual values of each cluster.
For example (a simple one):
I have a vector of data,
10 8 6 14 14 7 14 5 11 12 8 11 11 16 20 6 8 8 6 15
The clusters of values larger than 9 are marked in brackets,
[10] 8 6 [14 14] 7 [14] 5 [11 12] 8 [11 11 16 20] 6 8 8 6 [15]
So here the cluster sizes in order are,
1, 2, 1, 2, 4, 1
What I want R to do is return the clusters in separate ordered vectors, e.g.
[1] 10
[2] 14 14
[3] 14
[4] 11 12
[5] 11 11 16 20
[6] 15
Is there such a package? A piece of code, with if statements for example, would also help.
Cheers
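The data.table::rleid function works well for this:
# example data from the question
vec <- c(10, 8, 6, 14, 14, 7, 14, 5, 11, 12, 8, 11, 11, 16, 20, 6, 8, 8, 6, 15)
Filter(function(a) a[1] > 9, split(vec, data.table::rleid(vec > 9)))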
# $`1`
# [1] 10
# $`3`
# [1] 14 14
# $`5`
# [1] 14
# $`7`
# [1] 11 12
# $`9`
# [1] 11 11 16 20
# $`11`
# [1] 15
If you'd prefer to not load the data.table package just for that, then a base-R approach from https://stackoverflow.com/a/33509966:
myrleid <- function(x) {
  rl <- rle(x)$lengths
  rep(seq_along(rl), times = rl)
}
Filter(function(a) a[1] > 9, split(vec, myrleid(vec > 9)))
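Since the question also asks for the cluster sizes, here is a minimal base-R sketch that reads them straight from rle. It assumes vec holds the example data from the question; runs, sizes, run_id and clusters are names introduced here.

```r
vec <- c(10, 8, 6, 14, 14, 7, 14, 5, 11, 12, 8, 11, 11, 16, 20, 6, 8, 8, 6, 15)

# Run-length encode the logical vector "value exceeds the threshold"
runs <- rle(vec > 9)

# The lengths of the TRUE runs are the cluster sizes
sizes <- runs$lengths[runs$values]

# Label each element with its run number, then split only the values above 9
run_id <- rep(seq_along(runs$lengths), times = runs$lengths)
clusters <- unname(split(vec[vec > 9], run_id[vec > 9]))
```

Here sizes gives the duration of each cluster and clusters holds the corresponding values, in order of appearance.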

Unexpected behavior of order(x, na.last = FALSE) [duplicate]

I am having trouble understanding the difference between the R function rank and the R function order. They seem to produce the same output:
> rank(c(10,30,20,50,40))
[1] 1 3 2 5 4
> order(c(10,30,20,50,40))
[1] 1 3 2 5 4
Could somebody shed some light on this for me?
Thanks
set.seed(1)
x <- sample(1:50, 30)
x
# [1] 14 19 28 43 10 41 42 29 27 3 9 7 44 15 48 18 25 33 13 34 47 39 49 4 30 46 1 40 20 8
rank(x)
# [1] 9 12 16 25 7 23 24 17 15 2 6 4 26 10 29 11 14 19 8 20 28 21 30 3 18 27 1 22 13 5
order(x)
# [1] 27 10 24 12 30 11 5 19 1 14 16 2 29 17 9 3 8 25 18 20 22 28 6 7 4 13 26 21 15 23
rank returns a vector with the "rank" of each value: the number in the first position (14) is the 9th lowest. order returns the indices that would put the initial vector x in order.
The 27th value of x is the lowest, so 27 is the first element of order(x) - and if you look at rank(x), the 27th element is 1.
x[order(x)]
# [1] 1 3 4 7 8 9 10 13 14 15 18 19 20 25 27 28 29 30 33 34 39 40 41 42 43 44 46 47 48 49
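One way to see the relationship, sketched here under the assumption that x has no ties: order(x) and rank(x) are inverse permutations of each other, so applying order twice gives the ranks. The names o, r and the three *_check variables are introduced for illustration.

```r
set.seed(1)
x <- sample(1:50, 30)   # sampled without replacement, so there are no ties

o <- order(x)
r <- rank(x)

# Indexing one permutation by the other gives the identity 1, 2, ..., n
identity_check <- all(o[r] == seq_along(x))

# The order of the order is the rank (only when there are no ties)
inverse_check <- all(order(o) == r)

# sort(x)[r] puts every sorted value back at its original position
roundtrip_check <- all(sort(x)[r] == x)
```

All three checks should be TRUE for any vector of distinct values; with ties, rank returns averaged values by default and the identities no longer hold as written.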
As it turned out, my example was a special case, which made things confusing. I explain below for anyone interested:
rank returns the position each element would have in the ascending (sorted) vector
order returns, for each position in the ascending vector, the index of that element in the original vector
I always find it confusing to think about the difference between the two, and I always think, "how can I get to order using rank"?
Starting with Justin's example:
Order using rank:
## Setup example to match Justin's example
set.seed(1)
x <- sample(1:50, 30)
## Make a vector to store the sorted x values
xx = integer(length(x))
## i is the index, ir is the ith "rank" value
i = 0
for (ir in rank(x)) {
  i = i + 1
  xx[ir] = x[i]
}
all(xx == x[order(x)])
# [1] TRUE
rank is more complicated and not necessarily an index (integer):
> rank(c(1))
[1] 1
> rank(c(1,1))
[1] 1.5 1.5
> rank(c(1,1,1))
[1] 2 2 2
> rank(c(1,1,1,1))
[1] 2.5 2.5 2.5 2.5
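The fractional values come from rank's default ties.method = "average". A short sketch of some of the other documented tie-breaking options, using a made-up vector y that is not part of the original answer:

```r
y <- c(20, 10, 20, 30)   # hypothetical data with one tie (the two 20s)

r_avg   <- rank(y)                          # default "average": ties share the mean rank
r_min   <- rank(y, ties.method = "min")     # ties all get the lowest rank
r_max   <- rank(y, ties.method = "max")     # ties all get the highest rank
r_first <- rank(y, ties.method = "first")   # ties broken by position, giving a permutation
```

With ties.method = "first" the result is always a permutation of 1:length(y), which is the variant that behaves like an index.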
In layman's terms, order gives the position in the original vector of each value, taken in sorted order.
For example:
a <- c(3, 4, 2, 7, 8, 5, 1, 6)
sort(a)
# [1] 1 2 3 4 5 6 7 8
The position of 1 in a is 7. Similarly, the position of 2 in a is 3.
order(a)
# [1] 7 3 1 2 6 8 4 5
As stated by ?order at the R prompt, order just returns a permutation which sorts the original vector into ascending/descending order.
Suppose that we have a vector without ties (with tied values, match returns only the first matching position, so the identities below fail at the tied elements):
A <- c(1, 4, 3, 6, 7, 5)
A.sort <- sort(A)
then
order(A) == match(A.sort, A)
rank(A) == match(A, A.sort)
Besides, I find that order has the following properties (not validated theoretically):
1. every element of order(A) lies in 1:length(A)
2. order(order(order(....order(A)....))): if you take the order of A an odd number of times, the result stays the same, and likewise for an even number of times.
some observations:
set.seed(0)
x<-matrix(rnorm(10),1)
dm<-diag(length(x))
# compute rank from order and backwards:
rank(x) == col(x)%*%dm[order(x),]
order(x) == col(x)%*%dm[rank(x),]
# in special cases like this
x<-cumsum(rep(c(2,0),times=5))+rep(c(0,-1),times=5)
# they are equal
order(x)==rank(x)

R Append a list with assign

I want to define a list that will depend on the loop sequence and append this list with another list
for (i in 4:4) {
  nam <- paste0("estim", i)
  assign(nam, list(1:10))
  assign(paste0(nam, "[2]"), list(11:40))
}
##estim4
##[[1]]
##[1] 1 2 3 4 5 6 7 8 9 10
desired output
## estim4
## [[1]]
## [1] 1 2 3 4 5 6 7 8 9 10
## [[2]]
## [1] 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35
## [26] 36 37 38 39 40
any help please?
update
As mentioned by @nicola below, this method is difficult to implement and, granted, the desired output is not exactly what I was asking for. Sorry for the imprecise question.
l <- list()
for (i in 1:1) {
  l[i] <- list(list(1:10))
  l[[i]][2] <- list(list(letters[1:4]))
}
## l
## [[1]]
## [[1]][[1]]
## [1] 1 2 3 4 5 6 7 8 9 10
## [[1]][[2]]
## [[1]][[2]][[1]]
## [1] "a" "b" "c" "d"
Is there a simpler way to carry variables around?
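A common alternative to assign, sketched here as a suggestion rather than as an answer from the thread: keep the per-iteration results in one named list, so that components can be appended with ordinary indexing. The container name estim is introduced here for illustration.

```r
# One container holds every "estim<i>" result, keyed by name
estim <- list()

for (i in 4:4) {
  nam <- paste0("estim", i)
  estim[[nam]] <- list(1:10)    # first component
  estim[[nam]][[2]] <- 11:40    # append a second component by index
}

estim$estim4
```

This avoids building variable names as strings: assign(paste0(nam, "[2]"), ...) creates a separate variable literally called estim4[2] instead of modifying estim4.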
