Split a numeric vector into continuous chunks in R [duplicate]

This question already has answers here:
Collapse continuous integer runs to strings of ranges
(6 answers)
Closed 9 years ago.
If I have a numeric vector [1 2 3 4 7 8 9 10 15 16 17], how can I split it so that I have multiple vectors returned that separate the continuous elements of that vector? I.e. [1 2 3 4] [7 8 9 10] [15 16 17]. I've found an answer of how to do this in matlab, but I only use R.
Thanks.

Here's another alternative:
vec <- c( 1, 2, 3, 4, 7, 8, 9, 10, 15, 16, 17 )
split(vec, cumsum(seq_along(vec) %in% (which(diff(vec)>1)+1)))
# $`0`
# [1] 1 2 3 4
#
# $`1`
# [1] 7 8 9 10
#
# $`2`
# [1] 15 16 17

Another option:
split(vec, cummax(c(1,diff(vec))))
Result
$`1`
[1] 1 2 3 4
$`3`
[1] 7 8 9 10
$`5`
[1] 15 16 17
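Note that the cummax grouping relies on each gap being larger than every earlier gap, which happens to hold for this vector. Below is a minimal sketch of a variant that should not depend on gap sizes, assuming a sorted vector where "continuous" means a step of exactly 1. The helper name split_runs and the vector vec2 are made up for illustration:

```r
vec  <- c(1, 2, 3, 4, 7, 8, 9, 10, 15, 16, 17)
vec2 <- c(1, 2, 10, 11, 13, 14)   # hypothetical input: a gap of 8, then a smaller gap of 2

# A new run starts wherever the step from the previous element is not 1;
# cumsum turns those start flags into run ids 1, 2, 3, ...
split_runs <- function(v) split(v, cumsum(c(1, diff(v) != 1)))

runs  <- split_runs(vec)    # three runs: 1:4, 7:10, 15:17
runs2 <- split_runs(vec2)   # three runs: 1:2, 10:11, 13:14

# For comparison, cummax merges the last two runs of vec2 into one group
merged <- split(vec2, cummax(c(1, diff(vec2))))
```

Using diff(v) != 1 rather than diff(v) > 1 also starts a new run on a repeated or decreasing value, which may or may not be what you want for unsorted input.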

Related

map() into an argument that is not the first argument

I have a function that takes multiple arguments (simple reproducible example below):
return_numbers <- function(first = 1, last = 10){
seq(first, last)
}
If I then have a vector that I want to map(), for example:
x <- c(5, 6, 7)
It's quite easy to map() the vector x into the first argument of the function:
map(x, return_numbers)
[[1]]
[1] 5 6 7 8 9 10
[[2]]
[1] 6 7 8 9 10
[[3]]
[1] 7 8 9 10
But I can't work out how to map x into the second argument (last = ).
I referred to Hadley Wickham's Advanced R:
https://adv-r.hadley.nz/functionals.html#change-argument
and tried this, but I must be doing something wrong:
map(x, ~ return_numbers(x, last = .x))
My desired output would be:
[[1]]
[1] 1 2 3 4 5
[[2]]
[1] 1 2 3 4 5 6
[[3]]
[1] 1 2 3 4 5 6 7
This should work:
map(x, ~return_numbers(last = .))
You can also name the first argument explicitly, so that each element of x is matched to the next free argument, last:
return_numbers <- function(first = 1, last = 10){
seq(first, last)
}
x <- c(5, 6, 7)
purrr::map(x, return_numbers, first=1)
#> [[1]]
#> [1] 1 2 3 4 5
#>
#> [[2]]
#> [1] 1 2 3 4 5 6
#>
#> [[3]]
#> [1] 1 2 3 4 5 6 7
Created on 2019-11-10 by the reprex package (v0.3.0)
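For comparison, the same mapping can be sketched in base R without purrr, assuming the return_numbers function from the question. Both forms below mirror the two purrr answers above:

```r
return_numbers <- function(first = 1, last = 10) {
  seq(first, last)
}
x <- c(5, 6, 7)

# Each element of x is passed as `last`; `first` keeps its default of 1
res <- lapply(x, function(v) return_numbers(last = v))

# Equivalent: naming `first` in the extra arguments leaves `last` as the
# next unmatched argument, so each element of x is matched to it
res2 <- lapply(x, return_numbers, first = 1)
```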

How to implement extract/separate functions (from dplyr and tidyr) to separate a column into multiple columns based on arbitrary values?

I have a column:
Y = c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20)
I would like to split into multiple columns, based on the positions of the column values. For instance, I would like:
Y1=c(1,2,3,4,5)
Y2=c(6,7,8,9,10)
Y3=c(11,12,13,14,15)
Y4=c(16,17,18,19,20)
Since I am working with a big data time series set, the divisions will be arbitrary depending on the length of one time period.
You can use the base split to split this vector into vectors that are each 5 items long. You could also use a variable to store this interval length.
Calling rep with each = 5 on a programmatically created sequence gives the numbers 1, 2, ... up to the length divided by 5 (in this case, 4), each repeated 5 times consecutively. split then uses these values as group labels and returns a list of vectors.
It's worth noting that a variety of SO posts will recommend you store similar data in lists such as this, rather than creating multiple variables, so I'm leaving it in list form here.
Y <- 1:20
breaks <- rep(1:(length(Y) / 5), each = 5)
split(Y, breaks)
#> $`1`
#> [1] 1 2 3 4 5
#>
#> $`2`
#> [1] 6 7 8 9 10
#>
#> $`3`
#> [1] 11 12 13 14 15
#>
#> $`4`
#> [1] 16 17 18 19 20
Created on 2019-02-12 by the reprex package (v0.2.1)
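The rep approach assumes the length of Y is an exact multiple of the chunk size. Here is a hedged sketch for the general case, where chunk_size is a hypothetical variable holding the interval length and the last chunk is allowed to be shorter:

```r
Y <- 1:22          # made-up length that is not a multiple of 5
chunk_size <- 5    # hypothetical name for the interval length

# Positions 1..5 map to group 1, 6..10 to group 2, and so on
groups <- ceiling(seq_along(Y) / chunk_size)
chunks <- split(Y, groups)

lengths(chunks)    # four chunks of 5 and a final chunk of 2
```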
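Not a dplyr solution, but I believe the easiest way would involve using matrices.
foo = function(data, sep.in=5) {
  # matrix() fills column by column, so with nrow = sep.in each column
  # holds one chunk of sep.in consecutive values
  data.matrix = matrix(data, nrow=sep.in)
  data.df = as.data.frame(data.matrix)
  return(data.df)
}
I have not tested it, but this function should create a data.frame that can be merged into an existing one using cbind(). The length of data should be a multiple of sep.in; otherwise matrix() recycles values with a warning.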
We can make use of split (writing the commented code as solution) to split the vector into a list of vectors.
lst <- split(Y, as.integer(gl(length(Y), 5, length(Y))))
lst
#$`1`
#[1] 1 2 3 4 5
#$`2`
#[1] 6 7 8 9 10
#$`3`
#[1] 11 12 13 14 15
#$`4`
#[1] 16 17 18 19 20
Here, gl creates a grouping index from the n, k and length parameters, where n is an integer giving the number of levels, k is an integer giving the number of replications, and length is an integer giving the length of the result.
In our case, we want to have 'k' as 5.
as.integer(gl(length(Y), 5, length(Y)))
#[1] 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3 4 4 4 4 4
If we want to have multiple objects in the global environment, use list2env
list2env(setNames(lst, paste0("Y", seq_along(lst))), envir = .GlobalEnv)
Y1
#[1] 1 2 3 4 5
Y2
#[1] 6 7 8 9 10
Y3
#[1] 11 12 13 14 15
Y4
#[1] 16 17 18 19 20
Or as the OP mentioned dplyr/tidyr in the question, we can use those packages as well
library(tidyverse)
tibble(Y) %>%
group_by(grp = (row_number()-1) %/% 5 + 1) %>%
summarise(Y = list(Y)) %>%
pull(Y)
#[[1]]
#[1] 1 2 3 4 5
#[[2]]
#[1] 6 7 8 9 10
#[[3]]
#[1] 11 12 13 14 15
#[[4]]
#[1] 16 17 18 19 20
data
Y <- 1:20

Create all possible combinations from two values for each element in a vector in R [duplicate]

This question already has answers here:
How to generate a matrix of combinations
(3 answers)
Closed 6 years ago.
I have been trying to create vectors where each element can take two different values present in two different vectors.
For example, if there are two vectors a and b, where a is c(6,2,9) and b is c(12,5,15) then the output should be 8 vectors given as follows,
6 2 9
6 2 15
6 5 9
6 5 15
12 2 9
12 2 15
12 5 9
12 5 15
The following piece of code works,
aa1 <- c(6,12)
aa2 <- c(2,5)
aa3 <- c(9,15)
for(a1 in 1:2)
for(a2 in 1:2)
for(a3 in 1:2)
{
v <- c(aa1[a1],aa2[a2],aa3[a3])
print(v)
}
But I was wondering whether there is a simpler way to do this than writing nested for loops, whose number grows linearly with the number of elements in the final vector.
expand.grid is a function that makes all combinations of whatever vectors you pass it, but in this case you need to rearrange your vectors so you have a pair of first elements, second elements, and third elements so the ultimate call is:
expand.grid(c(6, 12), c(2, 5), c(9, 15))
A quick way to rearrange the vectors in base R is Map, the multivariate version of lapply, with c() as the function:
a <- c(6, 2, 9)
b <- c(12, 5, 15)
Map(c, a, b)
## [[1]]
## [1] 6 12
##
## [[2]]
## [1] 2 5
##
## [[3]]
## [1] 9 15
Conveniently expand.grid is happy with either individual vectors or a list of vectors, so we can just call:
expand.grid(Map(c, a, b))
## Var1 Var2 Var3
## 1 6 2 9
## 2 12 2 9
## 3 6 5 9
## 4 12 5 9
## 5 6 2 15
## 6 12 2 15
## 7 6 5 15
## 8 12 5 15
If Map is confusing, you can put a and b in a list instead; purrr::transpose then does the same rearrangement, flipping a list of two elements of length three into a list of three elements of length two:
library(purrr)
list(a, b) %>% transpose() %>% expand.grid()
This returns the same thing.
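The expand.grid output above varies the first column fastest, whereas the question lists the combinations with the first element varying slowest. If the row order matters, one possible sketch in base R is to build the grid from the reversed list and then reverse the columns. The names grid, ordered and the V1..V3 column labels are chosen here for illustration only:

```r
a <- c(6, 2, 9)
b <- c(12, 5, 15)

pairs <- Map(c, a, b)                 # list(c(6, 12), c(2, 5), c(9, 15))

# expand.grid varies its first argument fastest, so build the grid from the
# reversed list and then put the columns back in their original order
grid <- expand.grid(rev(pairs))
ordered <- grid[rev(seq_along(grid))]
names(ordered) <- paste0("V", seq_along(ordered))

ordered
```

The first rows of ordered are then 6 2 9, 6 2 15, 6 5 9, matching the order printed by the nested loops in the question.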
I think what you're looking for is expand.grid.
a <- c(6,2,9)
b <- c(12,5,15)
expand.grid(a,b)
Var1 Var2
1 6 12
2 2 12
3 9 12
4 6 5
5 2 5
6 9 5
7 6 15
8 2 15
9 9 15

How can I add vector elements to corresponding vectors in lists?

I have a list of vectors of variable length, and a separate vector, somewhat like this:
set.seed(0)
x <- lapply(as.list(sample(1:10, 10, repl=TRUE)),
function(x) sample(1:10, x, repl=TRUE))
y <- sample(1:10, 10, repl=TRUE)
I need to add each element of y to the corresponding vector in x. Currently I accomplish this like so:
newList <- list()
for (i in seq_along(y)) {
newList <- c(newList, list(y[i] + x[[i]]))
}
> x[1:2]
[[1]]
[1] 1 3 2 7 4 8 5 8 10
[[2]]
[1] 4 8 10
> y[1:2]
[1] 4 8
> newList
[[1]]
[1] 5 7 6 11 8 12 9 12 14
[[2]]
[1] 12 16 18
[[3]]
[1] 13 17 12 13
...
Is there a better way, perhaps using a lapply-like function?
This is very similar to previous questions, which use Map or mapply to operate on two lists/vectors of the same length in tandem:
How do I apply an index vector over a list of vectors?
Add respective dataframes in list together in R
For this specific case, try:
Map("+",x,y)
#[[1]]
#[1] 5 7 6 11 8 12 9 12 14
#
#[[2]]
#[1] 12 16 18
#
#[[3]]
#[1] 13 17 12 13
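To make the pairing explicit, here is a small deterministic sketch with made-up inputs (the x and y below are not the sampled data from the question). It also shows the index-based lapply form that Map replaces:

```r
x <- list(c(1, 2, 3), c(10, 20))   # made-up list of vectors of different lengths
y <- c(100, 200)                   # one value per element of x

# Map pairs x[[1]] with y[1], x[[2]] with y[2], and applies `+` to each pair
res <- Map(`+`, x, y)

# The same thing written with lapply over the indices
res2 <- lapply(seq_along(y), function(i) x[[i]] + y[i])
```

Here res is a list holding 101 102 103 and 210 220, and res2 is identical to it.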

How to make a sequence from a range

I am trying to make a sequence from the output of range.
> range(wines$quality)
[1] 3 8
> seq(3, 8)
[1] 3 4 5 6 7 8
> seq(range(wines$quality))
[1] 1 2
I am trying to pass the output of range (3, 8) into seq to get the sequence 3, 4, 5, 6, 7, 8. Why is it giving me 1 2 instead? How do I make it behave as I want?
Another option:
do.call(seq, as.list(range(wines$quality)))
# [1] 3 4 5 6 7 8
Your problem right now is that you are passing a two-element vector as one argument, when seq expects two one-element arguments in order to do what you want. Given a single vector argument of length greater than one, seq behaves like seq_along and returns 1 2.
do.call calls seq with each of the items in as.list(...) as a separate argument.
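The difference can be sketched without the wines data, using a hypothetical stand-in vector r for the output of range:

```r
r <- c(3, 8)   # stand-in for range(wines$quality), which is not available here

# One vector argument of length 2: seq behaves like seq_along(r)
seq(r)

# do.call spreads the list elements over seq's arguments, like seq(3, 8)
s1 <- do.call(seq, as.list(r))

# Plain indexing does the same thing
s2 <- seq(r[1], r[2])
```

Both s1 and s2 contain 3 4 5 6 7 8, while seq(r) returns 1 2.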
I am sure there is a fancier way to do it but why not just:
x <- range(wines$quality)
seq(x[1], x[2])
Some possible solutions, though the eval parse is more fooling around:
set.seed(10)
x <- rpois(20, 10)
y <- range(x); y[1]:y[2]
seq(y[1], y[2])
eval(parse(text = paste(range(x), collapse=":")))
## > y <- range(x); y[1]:y[2]
## [1] 5 6 7 8 9 10 11 12 13 14 15
## > seq(y[1], y[2])
## [1] 5 6 7 8 9 10 11 12 13 14 15
## > eval(parse(text = paste(range(x), collapse=":")))
## [1] 5 6 7 8 9 10 11 12 13 14 15

Resources