Splitting numeric vectors in R

If I have a vector, x <- c(1,2,3,5,7,9,10,12), and another vector, y <- c(3,7,10), how would I produce the following:
[[1]]
1,2,3
[[2]]
5,7
[[3]]
9,10
[[4]]
12
Notice how 3, 7 and 10 become the last number of each list element (except the last one); in a sense they are the "breakpoints". I am sure there is a simple R function for this that I either don't know or have forgotten.

Here's one way using cut and split:
split(x, cut(x, c(-Inf, y, Inf)))
#$`(-Inf,3]`
#[1] 1 2 3
#
#$`(3,7]`
#[1] 5 7
#
#$`(7,10]`
#[1] 9 10
#
#$`(10, Inf]`
#[1] 12

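You could also do this, with include.lowest = TRUE so that the minimum of x falls inside the first interval rather than becoming NA:
split(x, cut(x, unique(c(y, range(x))), include.lowest = TRUE))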
## $`[1,3]`
## [1] 1 2 3
## $`(3,7]`
## [1] 5 7
## $`(7,10]`
## [1] 9 10
## $`(10,12]`
## [1] 12

Similar to @beginneR's answer, but using findInterval instead of cut
split(x, findInterval(x, y + 1))
# $`0`
# [1] 1 2 3
#
# $`1`
# [1] 5 7
#
# $`2`
# [1] 9 10
#
# $`3`
# [1] 12
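Shifting the breakpoints with y + 1 relies on the values being integers. As a minimal sketch, assuming an R version whose findInterval supports the left.open argument, the intervals can instead be closed on the right, and unname() gives the unlabelled list shown in the question:

```r
x <- c(1, 2, 3, 5, 7, 9, 10, 12)
y <- c(3, 7, 10)

# left.open = TRUE makes the intervals (-Inf, 3], (3, 7], (7, 10], (10, Inf),
# so each breakpoint closes its own group without needing y + 1
grp <- findInterval(x, y, left.open = TRUE)

# unname() drops the "0", "1", ... labels that split() adds
res <- unname(split(x, grp))
res
```

This version should also behave the same way for non-integer data, since no offset is added to the breakpoints.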

Related

Use mapply or lapply to nested list

I want to apply a sample function to a nested list (I will call this list bb) and I also have a list of numbers (I will call this list k) to be supplied in the sample function. I would like each of the numbers in k to iterate through all the values of each list in bb. How to do this using mapply or lapply?
Here are the data:
k <- list(1,2,4,3) #this is the list of numbers to be supplied in the `sample.int` function
b1 <- list(c(1,2,3),c(2,3,4),c(3,4,5),c(4,5,6)) #The first list of bb
b2 <- list(c(1,2),c(2,3),c(3,4),c(4,5), c(5,6)) #The second list of bb
bb <- list(b1,b2) #This is list bb containing b1 and b2 whose values are to be iterated through
I created this mapply function but it didn't get the expected outcome:
mapply(function(x, y) {
x[sample.int(y,y, replace = TRUE)]
}, bb,k, SIMPLIFY = FALSE)
This only returns 10 output values but I would like each number of k to loop through all values of the two lists in bb and so there should be 10*2 outputs for the two lists in bb. I might be using mapply in the wrong way and so I would appreciate if anyone can point me to the right direction!
outer is your friend. It's normally used to calculate the outer matrix product. Consider:
outer(1:3, 2:4)
1:3 %o% 2:4 ## or
# [,1] [,2] [,3]
# [1,] 2 3 4
# [2,] 4 6 8
# [3,] 6 9 12
It also has a FUN= argument that defaults to "*". However it enables you to calculate any function over the combinations of x and y cross-wise, i.e. x[1] X y[1], x[1] X y[2], ... whereas *apply functions only calculate x[1] X y[1], x[2] X y[2], .... So let's do it:
FUN <- Vectorize(function(x, y) x[sample.int(y, y)])
set.seed(42)
res <- outer(bb, k, FUN)
res
# [,1] [,2] [,3] [,4]
# [1,] List,1 List,2 List,4 List,3
# [2,] List,1 List,2 List,4 List,3
This result looks a little weird, but we may easily unlist it.
res <- unlist(res, recursive = FALSE)
Result
res
# [[1]]
# [1] 1 2 3
#
# [[2]]
# [1] 1 2
#
# [[3]]
# [1] 1 2 3
#
# [[4]]
# [1] 2 3 4
#
# [[5]]
# [1] 2 3
#
# [[6]]
# [1] 1 2
#
# [[7]]
# [1] 2 3 4
#
# [[8]]
# [1] 4 5 6
#
# [[9]]
# [1] 1 2 3
#
# [[10]]
# [1] 3 4 5
#
# [[11]]
# [1] 3 4
#
# [[12]]
# [1] 4 5
#
# [[13]]
# [1] 2 3
#
# [[14]]
# [1] 1 2
#
# [[15]]
# [1] 1 2 3
#
# [[16]]
# [1] 2 3 4
#
# [[17]]
# [1] 3 4 5
#
# [[18]]
# [1] 2 3
#
# [[19]]
# [1] 3 4
#
# [[20]]
# [1] 1 2
Voilà, 20 results.
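Since the question asks about lapply, here is a hedged sketch of the same cross-wise idea with two nested lapply calls instead of outer. The names nested and flat are only illustrative, and the elements come out in a different order than with outer (grouped by list in bb first, then by the numbers in k):

```r
k  <- list(1, 2, 4, 3)
b1 <- list(c(1, 2, 3), c(2, 3, 4), c(3, 4, 5), c(4, 5, 6))
b2 <- list(c(1, 2), c(2, 3), c(3, 4), c(4, 5), c(5, 6))
bb <- list(b1, b2)

set.seed(42)
# Outer loop over the lists in bb, inner loop over the numbers in k:
# every (list, number) pair is visited once
nested <- lapply(bb, function(b) lapply(k, function(n) b[sample.int(n, n)]))

# Two levels of nesting to remove: bb level, then k level
flat <- unlist(unlist(nested, recursive = FALSE), recursive = FALSE)
length(flat)
```

As in the outer version, each pair contributes as many vectors as the number drawn from k, so the total is 2 * (1 + 2 + 4 + 3) = 20.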

Replacing values in a list based on a condition

I have a list of values called squares and would like to replace all values that are 0 with 40.
I tried:
replace(squares, squares==0, 40)
but the list remains unchanged
If it is a list, then loop through the list with lapply and use replace
squares <- lapply(squares, function(x) replace(x, x==0, 40))
squares
#[[1]]
#[1] 40 1 2 3 4 5
#[[2]]
#[1] 1 2 3 4 5 6
#[[3]]
#[1] 40 1 2 3
data
squares <- list(0:5, 1:6, 0:3)
If every element of the list is a single value, I think you can just treat it as if it were a vector, as follows:
squares=list(2,4,6,0,8,0,10,20)
squares[squares==0]=40
Output:
[[1]]
[1] 2
[[2]]
[1] 4
[[3]]
[1] 6
[[4]]
[1] 40
[[5]]
[1] 8
[[6]]
[1] 40
[[7]]
[1] 10
[[8]]
[1] 20
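Note that replace returns a modified copy, so its result has to be assigned back for the change to stick. As another hedged sketch, base R's rapply with how = "replace" applies the same replacement to every vector in the list while keeping the list structure; the name fixed is only illustrative:

```r
squares <- list(0:5, 1:6, 0:3)

# rapply() visits each non-list element; how = "replace" keeps the list shape
fixed <- rapply(squares, function(v) replace(v, v == 0, 40), how = "replace")
fixed
```

Unlike squares[squares == 0], this does not require each list element to have length one.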

How to check if a given value belongs to the vectors in a list?

Suppose we have a value y = 4 and a list of vectors. I want to check whether this value belongs to any vector in the list; if it does, I want to add the value to every element of the vectors that contain it.
y<-4
M<- list( c(1,3,4,6) , c(2,3,5), c(1,3,6) ,c(1,4,5,6))
> M
[[1]]
[1] 1 3 4 6
[[2]]
[1] 2 3 5
[[3]]
[1] 1 3 6
[[4]]
[1] 1 4 5 6
The outcomes will be similar to :
> R
[[1]]
[1] 5 7 8 10
[[2]]
[1] 5 8 9 10
We can use keep which only keeps elements that satisfy a predicate. In this case, it is only keeping the vectors that contain y.
We then add y to each of the vectors.
library('tidyverse')
keep(M, ~y %in% .) %>%
map(~. + y)
Here is a simple hacky way to do this:
lapply(M[sapply(M, function(x){y %in% x})],function(x){x+y})
returning:
[[1]]
[1] 5 7 8 10
[[2]]
[1] 5 8 9 10
Logic: use sapply to work out which parts of M have a 4 in, then add 4 to those with lapply
You can do this with...
lapply(M[sapply(M, `%in%`, x=y)], `+`, y)
[[1]]
[1] 5 7 8 10
[[2]]
[1] 5 8 9 10
Here is a method with lapply and set functions.
# loop through M, check length of intersect
myList <- lapply(M, function(x) if(length(intersect(y, x)) > 0) x + y else NULL)
# now subset, dropping the NULL elements
myList <- myList[lengths(myList) > 0]
this returns
myList
[[1]]
[1] 5 7 8 10
[[2]]
[1] 5 8 9 10
Everyone has given great answers; I'm just adding one that uses Map.
Map("+",M[unlist(Map("%in%", y,M))],y)
[[1]]
[1] 5 7 8 10
[[2]]
[1] 5 8 9 10
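For completeness, a minimal base R sketch of the same filter-then-add idea using Filter, which plays the role that keep plays in the purrr answer. The names hits and R are only illustrative (R follows the question's naming):

```r
y <- 4
M <- list(c(1, 3, 4, 6), c(2, 3, 5), c(1, 3, 6), c(1, 4, 5, 6))

# Keep only the vectors that contain y ...
hits <- Filter(function(v) y %in% v, M)

# ... then add y to every element of each kept vector
R <- lapply(hits, function(v) v + y)
R
```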

Remove outliers based on a preceding value

How can I remove outliers using the criterion that a value cannot be more than 2-fold higher than its preceding one?
Here is my try:
x<-c(1,2,6,4,10,20,50,10,2,1)
remove_outliers <- function(x, na.rm = TRUE, ...) {
for(i in 1:length(x))
x < (x[i-1] + 2*x)
x
}
remove_outliers(y)
expected outcome: 1,2,4,10,20,2,1
Thanks!
I think the first 10 should be removed in your data because 10>2*4. Here's a way to do what you want without loops. I'm using the dplyr version of lag.
library(dplyr)
x<-c(1,2,6,4,10,20,50,10,2,1)
x[c(TRUE,na.omit(x<=dplyr::lag(x)*2))]
[1] 1 2 4 20 10 2 1
EDIT
To use this with a data.frame:
df <- data.frame(id=1:10, x=c(1,2,6,4,10,20,50,10,2,1))
df[c(TRUE,na.omit(df$x<=dplyr::lag(df$x,1)*2)),]
id x
1 1 1
2 2 2
4 4 4
6 6 20
8 8 10
9 9 2
10 10 1
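If you would rather avoid the dplyr dependency, the same comparison can be sketched in base R by lining the vector up against itself shifted by one position. The name keep is only illustrative, and the sketch assumes x has at least two elements:

```r
x <- c(1, 2, 6, 4, 10, 20, 50, 10, 2, 1)

# x[-1] is every value except the first; x[-length(x)] is its predecessor.
# The first value has no predecessor, so it is always kept.
keep <- c(TRUE, x[-1] <= 2 * x[-length(x)])
x[keep]
```

Like the lag version, this compares each value with its predecessor in the original vector, not with the last value that was kept.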
A simple sapply, always keeping the first element and then keeping every value that is at most twice its predecessor:
keep <- c(TRUE, sapply(2:length(x), function(i) x[i] <= 2 * x[i - 1]))
keep
[1] TRUE TRUE FALSE TRUE FALSE TRUE FALSE TRUE TRUE TRUE
resulting in:
x[keep]
[1] 1 2 4 20 10 2 1

Split a vector by its sequences

The following vector x contains the two sequences 1:4 and 6:7, among other non-sequential digits.
x <- c(7, 1:4, 6:7, 9)
I'd like to split x by its sequences, so that the result is a list like the following.
# [[1]]
# [1] 7
#
# [[2]]
# [1] 1 2 3 4
#
# [[3]]
# [1] 6 7
#
# [[4]]
# [1] 9
Is there a quick and simple way to do this?
I've tried
split(x, c(0, diff(x)))
which gets close, but I don't feel like appending 0 to the differenced vector is the right way to go. Using findInterval didn't work either.
split(x, cumsum(c(TRUE, diff(x)!=1)))
#$`1`
#[1] 7
#
#$`2`
#[1] 1 2 3 4
#
#$`3`
#[1] 6 7
#
#$`4`
#[1] 9
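To see why the cumsum approach works, here is a short sketch that builds the grouping vector step by step. The intermediate names steps, starts and grp are only illustrative:

```r
x <- c(7, 1:4, 6:7, 9)

# Differences between neighbours; a run of consecutive numbers has diff 1
steps <- diff(x)

# TRUE marks the start of a new run; the first element always starts one
starts <- c(TRUE, steps != 1)

# The running count of run starts gives each element its group number
grp <- cumsum(starts)
grp

split(x, grp)
```

Prepending TRUE (rather than 0, as in the question's attempt) is what keeps the grouping vector the same length as x.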
Just for fun, you can make use of Carl Witthoft's seqle function from his "cgwtools" package. (It's not going to be anywhere near as efficient as Roland's answer.)
library(cgwtools)
## Here's what seqle does...
## It's like rle, but for sequences
seqle(x)
# Run Length Encoding
# lengths: int [1:4] 1 4 2 1
# values : num [1:4] 7 1 6 9
y <- seqle(x)
split(x, rep(seq_along(y$lengths), y$lengths))
# $`1`
# [1] 7
#
# $`2`
# [1] 1 2 3 4
#
# $`3`
# [1] 6 7
#
# $`4`
# [1] 9
