Remove outliers based on a preceding value - r

How can I remove outliers using the criterion that a value cannot be more than 2-fold higher than its preceding one?
Here is my attempt:
x<-c(1,2,6,4,10,20,50,10,2,1)
remove_outliers <- function(x, na.rm = TRUE, ...) {
  for(i in 1:length(x))
    x < (x[i-1] + 2*x)
  x
}
remove_outliers(x)
expected outcome: 1,2,4,10,20,2,1
Thanks!

I think the first 10 should be removed in your data because 10>2*4. Here's a way to do what you want without loops. I'm using the dplyr version of lag.
library(dplyr)
x<-c(1,2,6,4,10,20,50,10,2,1)
x[c(TRUE,na.omit(x<=dplyr::lag(x)*2))]
[1] 1 2 4 20 10 2 1
EDIT
To use this with a data.frame:
df <- data.frame(id=1:10, x=c(1,2,6,4,10,20,50,10,2,1))
df[c(TRUE,na.omit(df$x<=dplyr::lag(df$x,1)*2)),]
   id  x
1   1  1
2   2  2
4   4  4
6   6 20
8   8 10
9   9  2
10 10  1
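If dplyr is not available, the same comparison can be sketched in base R by pairing each element with its predecessor through negative indexing. This is only an illustration under the same reading of the rule (each value is compared with the previous value of the original vector); the names `prev` and `keep` are made up for the example.

```r
x <- c(1, 2, 6, 4, 10, 20, 50, 10, 2, 1)

# predecessor of x[2], x[3], ..., x[n]
prev <- x[-length(x)]

# the first element has no predecessor, so it is always kept
keep <- c(TRUE, x[-1] <= 2 * prev)

x[keep]
```

For the sample data this keeps 1 2 4 20 10 2 1, the same values as the dplyr version.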

A simple sapply that always keeps the first element and compares every later element with its predecessor:
keep <- c(TRUE, sapply(seq_along(x)[-1], function(i) x[i] <= 2*x[i-1]))
keep
[1] TRUE TRUE FALSE TRUE FALSE TRUE FALSE TRUE TRUE TRUE
resulting in:
x[keep]
[1] 1 2 4 20 10 2 1
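The dplyr answer compares each value with its immediate predecessor in the original vector. Another possible reading of the question is that each value should be compared with the last value that was actually kept. A minimal sketch of that sequential variant follows; the function name `remove_outliers_seq` and the parameter `max_ratio` are hypothetical, and this is not necessarily what the asker intended.

```r
remove_outliers_seq <- function(x, max_ratio = 2) {
  if (length(x) == 0) return(x)
  keep <- logical(length(x))
  keep[1] <- TRUE          # the first value has nothing to be compared with
  last_kept <- x[1]
  for (i in seq_along(x)[-1]) {
    # keep x[i] only if it is at most max_ratio times the last kept value
    if (x[i] <= max_ratio * last_kept) {
      keep[i] <- TRUE
      last_kept <- x[i]
    }
  }
  x[keep]
}

remove_outliers_seq(c(1, 2, 6, 4, 10, 20, 50, 10, 2, 1))
```

Under this reading the sample data reduces to 1 2 4 2 1, so the two interpretations can give quite different results.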

Related

Remove nested loops in R with map or apply

I have a list of vectors in R:
test_list <- list()
test_list[[1]] <- c(1,2,3,4,5,6,7)
test_list[[2]] <- c(1,2,3)
test_list[[3]] <- c(6,7)
test_list[[4]] <- c(9,10,11)
And I want to check the intersection of each vector with all the other vectors. A nested loop approach would look like this:
for(i in test_list) {
  for(j in test_list) {
    print(intersect(i, j))
  }
}
And the results would look like this:
[1] 1 2 3 4 5 6 7
[1] 1 2 3
[1] 6 7
numeric(0)
[1] 1 2 3
[1] 1 2 3
numeric(0)
numeric(0)
[1] 6 7
numeric(0)
[1] 6 7
numeric(0)
numeric(0)
numeric(0)
numeric(0)
[1] 9 10 11
I have seen that I can remove one of the for loops using map or apply:
get_overlap_cells <- function(x) {
  for(i in test_list) {
    overlaping_cells <- intersect(i, x)
  }
}
r <- map(test_list, get_overlap_cells)
However, I would like to remove both loops, any ideas on how to achieve this?
Thank you!
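You can build every pair of indices with expand.grid and then apply intersect over the pairs with mapply:
combins <- expand.grid(seq_along(test_list), seq_along(test_list))
mapply(function(x, y) intersect(test_list[[x]], test_list[[y]]),
       combins[,1], combins[,2])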
[[1]]
[1] 1 2 3 4 5 6 7
[[2]]
[1] 1 2 3
[[3]]
[1] 6 7
[[4]]
numeric(0)
[[5]]
[1] 1 2 3
[[6]]
[1] 1 2 3
[[7]]
numeric(0)
[[8]]
numeric(0)
[[9]]
[1] 6 7
[[10]]
numeric(0)
[[11]]
[1] 6 7
[[12]]
numeric(0)
[[13]]
numeric(0)
[[14]]
numeric(0)
[[15]]
numeric(0)
[[16]]
[1] 9 10 11
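The expand.grid result contains each pair twice and also every vector intersected with itself. If only the distinct pairs matter, one possible variation is to enumerate them with combn. The sketch below assumes that reading; the names `pairs` and `overlaps` are illustrative only.

```r
test_list <- list(c(1, 2, 3, 4, 5, 6, 7), c(1, 2, 3), c(6, 7), c(9, 10, 11))

# all unordered index pairs: (1,2), (1,3), (1,4), (2,3), (2,4), (3,4)
pairs <- combn(seq_along(test_list), 2, simplify = FALSE)

# intersect the two vectors of each pair
overlaps <- lapply(pairs, function(p) intersect(test_list[[p[1]]], test_list[[p[2]]]))

# label each result with the indices it came from, e.g. "1-2"
names(overlaps) <- vapply(pairs, paste, character(1), collapse = "-")

overlaps
```

This returns 6 results instead of 16, at the cost of dropping the self-intersections and the mirrored pairs.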

Get subsets between one element and the previous same element

Consider a vector:
vec <- c(1, 3, 4, 3, 3, 1, 1)
I'd like to get, for each element of the vector, a subset of the values in between the nth element and its previous occurrence.
The expected output is:
f(vec)
# [[1]]
# [1] 1
#
# [[2]]
# [1] 3
#
# [[3]]
# [1] 4
#
# [[4]]
# [1] 3 4 3
#
# [[5]]
# [1] 3 3
#
# [[6]]
# [1] 1 3 4 3 3 1
#
# [[7]]
# [1] 1 1
We can loop over the positions of the vector, find the index of the last earlier occurrence of the same element ('i1'), fall back to the current position when there is none, and use the sequence operator (:) to subset the vector:
lapply(seq_along(vec), function(i) {
  # index of the most recent earlier match; NA when there is none
  i1 <- tail(which(vec[seq_len(i - 1)] == vec[i]), 1)[1]
  i1[is.na(i1)] <- i
  vec[i1:i]
})
Output:
[[1]]
[1] 1
[[2]]
[1] 3
[[3]]
[1] 4
[[4]]
[1] 3 4 3
[[5]]
[1] 3 3
[[6]]
[1] 1 3 4 3 3 1
[[7]]
[1] 1 1
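Since the question calls the result `f(vec)`, the same idea can be wrapped in a function. This is a sketch of one way to write it, not the only one; the helper names `prev` and `start` are made up here.

```r
f <- function(v) {
  lapply(seq_along(v), function(i) {
    # positions before i that hold the same value as v[i]
    prev <- which(v[seq_len(i - 1)] == v[i])
    # start at the most recent earlier occurrence, or at i itself if there is none
    start <- if (length(prev) > 0) max(prev) else i
    v[start:i]
  })
}

vec <- c(1, 3, 4, 3, 3, 1, 1)
result <- f(vec)
result
```

Using seq_len(i - 1) rather than 1:(i - 1) avoids the backwards sequence 1:0 when i is 1.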

How to use "for loop" to get couples of indexes every time in R?

Let's assume I have this vector v:
v = seq(1,30,1)
I write this simple loop:
for(i in v) {
  print(i)
}
However, I would like to write a loop that gives me consecutive pairs on each iteration: 1:2, 3:4, 5:6, 7:8, etc. I would then get:
[1] 1,2
[1] 3,4
[1] 5,6
[1] 7,8
...
Can anyone help me?
Thanks!
You could generate v with a step of 2:
v = seq(1,30,2)
for(i in v) {
  cat(paste(i, i + 1, sep = ','), '\n')
}
#1,2
#3,4
#5,6
#7,8
#9,10
#11,12
#13,14
#...
If you want to keep your approach, try this:
for(i in v[-length(v)]) {
print(c(i, i+1))
}
[1] 1 2
[1] 2 3
[1] 3 4
...
Another option is to add i to the offsets 0:1 and use the result as a subset index:
for(i in v) {
print(v[0:1 + i])
}
# [1] 1 2
# [1] 2 3
# [1] 3 4
# [1] ...
Alternatively you could also consider this:
cbind(v[-length(v)], v[-1])
# [,1] [,2]
# [1,] 1 2
# [2,] 2 3
# [3,] 3 4
# [4,] 4 5
# [5,] 5 6
# [6,] ...
You need to update the print command and use range (note that this snippet is Python, not R):
https://www.w3schools.com/python/ref_func_range.asp
for i in range(0, len(v) - 1, 2):
    print(str(v[i]) + "," + str(v[i+1]))
In this way you should get
1,2
3,4
5,6
7,8
...
Try this
> for(i in v) if (i%%2) print(c(i,i+1))
[1] 1 2
[1] 3 4
[1] 5 6
[1] 7 8
[1] 9 10
[1] 11 12
[1] 13 14
[1] 15 16
[1] 17 18
[1] 19 20
[1] 21 22
[1] 23 24
[1] 25 26
[1] 27 28
[1] 29 30
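Another way to get non-overlapping pairs is to reshape the vector into a two-column matrix and loop over its rows. The sketch below assumes that v has an even number of elements; the name `pairs` is illustrative.

```r
v <- seq(1, 30, 1)

# fill row by row so that each row holds two consecutive values
pairs <- matrix(v, ncol = 2, byrow = TRUE)

for (k in seq_len(nrow(pairs))) {
  print(pairs[k, ])
}
```

Each iteration prints one pair, starting with 1 2 and ending with 29 30. With an odd-length vector, matrix() would recycle values and warn, so the last element would need separate handling.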

Replacing values in a list based on a condition

I have a list of values called squares and would like to replace all values that are 0 with 40.
I tried:
replace(squares, squares==0, 40)
but the list remains unchanged.
If it is a list, then loop through the list with lapply, use replace on each element, and assign the result back:
squares <- lapply(squares, function(x) replace(x, x==0, 40))
squares
#[[1]]
#[1] 40 1 2 3 4 5
#[[2]]
#[1] 1 2 3 4 5 6
#[[3]]
#[1] 40 1 2 3
data
squares <- list(0:5, 1:6, 0:3)
I think for this purpose, you can just treat it as if it were a vector as follows:
squares=list(2,4,6,0,8,0,10,20)
squares[squares==0]=40
Output:
[[1]]
[1] 2
[[2]]
[1] 4
[[3]]
[1] 6
[[4]]
[1] 40
[[5]]
[1] 8
[[6]]
[1] 40
[[7]]
[1] 10
[[8]]
[1] 20
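One likely reason the original call appeared to do nothing is that replace() returns a modified copy and never changes its argument in place, so the result has to be assigned. The short sketch below illustrates this with the sample data from the first answer; the names `first`, `first_fixed` and `squares_fixed` are made up for the example.

```r
squares <- list(0:5, 1:6, 0:3)
first <- squares[[1]]

# returns a modified copy; `first` itself is left untouched
replace(first, first == 0, 40)

# store the result to keep the change
first_fixed <- replace(first, first == 0, 40)

# same idea for every element of the list
squares_fixed <- lapply(squares, function(x) replace(x, x == 0, 40))
```

After running this, `first` still starts with 0, while `first_fixed` and the elements of `squares_fixed` have 40 in place of every 0.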

Splitting numeric vectors in R

If I have a vector, c(1,2,3,5,7,9,10,12)...and another vector c(3,7,10), how would I produce the following:
[[1]]
1,2,3
[[2]]
5,7
[[3]]
9,10
[[4]]
12
Notice how 3, 7 and 10 become the last number of each list element (except the last one), acting in a sense as breakpoints. I am sure there is a simple R function for this that I either don't know or have forgotten.
Here's one way using cut and split:
split(x, cut(x, c(-Inf, y, Inf)))
#$`(-Inf,3]`
#[1] 1 2 3
#
#$`(3,7]`
#[1] 5 7
#
#$`(7,10]`
#[1] 9 10
#
#$`(10, Inf]`
#[1] 12
You could also do:
split(x, cut(x, unique(c(y, range(x))), include.lowest = TRUE))
## $`[1,3]`
## [1] 1 2 3
## $`(3,7]`
## [1] 5 7
## $`(7,10]`
## [1] 9 10
## $`(10,12]`
## [1] 12
Similar to @beginneR's answer, but using findInterval instead of cut
split(x, findInterval(x, y + 1))
# $`0`
# [1] 1 2 3
#
# $`1`
# [1] 5 7
#
# $`2`
# [1] 9 10
#
# $`3`
# [1] 12
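The answers above use x and y without defining them, and the y + 1 trick relies on the values being whole numbers. A possible variation that avoids the shift is the left.open argument of findInterval, which makes the intervals open on the left and closed on the right. This is a sketch assuming x is the data vector and y the breakpoints from the question, and an R version that supports left.open (3.3.0 or later).

```r
x <- c(1, 2, 3, 5, 7, 9, 10, 12)   # data from the question
y <- c(3, 7, 10)                   # breakpoints from the question

# with left.open = TRUE a value equal to a breakpoint falls into the group ending there
idx <- findInterval(x, y, left.open = TRUE)

groups <- unname(split(x, idx))
groups
```

The result has four elements: 1 2 3, then 5 7, then 9 10, then 12.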

Resources