How to find the indices of list elements that contain a given value? - r

library(gRbase)
m<-combn(1:4,2,simplify=FALSE)
m
[[1]]
[1] 1 2
[[2]]
[1] 1 3
[[3]]
[1] 1 4
[[4]]
[1] 2 3
[[5]]
[1] 2 4
[[6]]
[1] 3 4
I want the indices of the list elements that contain a given value. For example, for element 2, fn(2) should return 1, 4, 5.

res <- vector("numeric"); for( i in seq_along(m)){ if(2 %in% m[[i]]){res<-c(res,i)}}
>res
[1] 1 4 5
To encapsulate in a function:
fn <- function(lis, item){
res <- vector("numeric")
for( i in seq_along(lis)){ if(item %in% lis[[i]]){res<-c(res,i)}}
res}
> fn(m, 2)
[1] 1 4 5
Here's another method, which seems less obvious to me. I'm posting the sequence of intermediate steps I used to get there:
sapply(seq_along(m),function(x) match(x=2,m[[x]]) )
#[1] 2 NA NA 1 1 NA
as.logical(sapply(seq_along(m),function(x) match(x=2,m[[x]]) ) )
#[1] TRUE NA NA TRUE TRUE NA
which(as.logical(sapply(seq_along(m),function(x) match(x=2,m[[x]]) ) ) )
#[1] 1 4 5

which(vapply(m, function(x) 2 %in% x, FUN.VALUE = FALSE))
#[1] 1 4 5
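The vapply one-liner can also be wrapped as a reusable function. A minimal sketch; the name find_in_list is only illustrative and not from the answers above:

```r
# Return the positions of the list elements that contain `item`.
# find_in_list is a hypothetical helper name.
find_in_list <- function(lst, item) {
  which(vapply(lst, function(el) item %in% el, logical(1)))
}

m <- combn(1:4, 2, simplify = FALSE)
find_in_list(m, 2)
#[1] 1 4 5
```

Because vapply always returns a logical vector here, an empty list gives integer(0) instead of an error.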

Related

Convert each row of dataframe to new list in R

I have below sample input data-
> df <- data.frame(a=c(1,2,9),b=c(3,4,5),c=c(2,6,7))
> df
a b c
1 1 3 2
2 2 4 6
3 9 5 7
I am trying to convert each row into a separate list.
My Attempt-
> apply(df,1,as.list)
The above solution converts each row into a sublist of one list. But I am looking for 3 separate lists in this case.
nrow(df) = no. of lists
Desired Output-
> list1
$a
[1] 1
$b
[1] 3
$c
[1] 2
> list2
$a
[1] 2
$b
[1] 4
$c
[1] 6
> list3
$a
[1] 9
$b
[1] 5
$c
[1] 7
You can use by and as.list
out <- by(df, 1:nrow(df), as.list)
out
#1:nrow(df): 1
#$a
#[1] 1
#
#$b
#[1] 3
#$c
#[1] 2
#------------------------------------------------------------------------------
#1:nrow(df): 2
#$a
#[1] 2
#$b
#[1] 4
#$c
#[1] 6
#------------------------------------------------------------------------------
#1:nrow(df): 3
#$a
#[1] 9
#$b
#[1] 5
#$c
#[1] 7
That creates an object of class by. So you may call unclass(out) in the end.
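If a plain list of row lists is wanted rather than a by object, something along these lines may work. This is a sketch in base R; the names rows and list1, list2, list3 are assumptions chosen to match the desired output in the question:

```r
df <- data.frame(a = c(1, 2, 9), b = c(3, 4, 5), c = c(2, 6, 7))

# One list per row; as.list() on a one-row data frame gives a named list
# with one element per column.
rows <- lapply(seq_len(nrow(df)), function(i) as.list(df[i, ]))

# Illustrative names list1, list2, ... (an assumption, taken from the question)
names(rows) <- paste0("list", seq_along(rows))

rows$list2
```

Calling list2env(rows, envir = globalenv()) afterwards would create list1, list2 and list3 as separate objects, though keeping them together in one list is usually easier to work with.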

Split to list() based on condition, omitting the FALSE elements

What is the most elegant way to split a vector into n list elements based on a condition?
Every separate true-block should go into its own list element. All the false elements get thrown away.
example1:
vec <- c(1:3,NA,NA,NA,4:6,NA,NA,NA,7:9,NA)
cond <- !is.na(vec)
result = list(1:3,4:6,7:9)
example2:
vec_2 <- c(3:1,11:13,6:4,14:16,9:7,20)
cond_2 <- vec_2 < 10
results_2 = list(3:1,6:4,9:7)
It would be great to have a general solution for a vector vec and a relating condition cond.
My best try:
res <- split(vec,data.table::rleidv(cond))
odd <- as.logical(seq_along(res)%%2)
res[if(cond[1])odd else !odd]
I guess this should work generally:
> split(vec[cond], data.table::rleid(cond)[cond])
$`1`
[1] 1 2 3
$`3`
[1] 4 5 6
$`5`
[1] 7 8 9
Let's make it a function:
> f <- function(vec, cond) split(vec[cond], data.table::rleid(cond)[cond])
> f(vec_2, cond_2)
$`1`
[1] 3 2 1
$`3`
[1] 6 5 4
$`5`
[1] 9 8 7
Here is a base R option with rle
grp <- with(rle(cond), rep(seq_along(values) * NA^ !values, lengths))
split(vec[cond], grp[cond])
#$`1`
#[1] 1 2 3
#$`3`
#[1] 4 5 6
#$`5`
#[1] 7 8 9
Similarly with 'vec_2'
grp <- with(rle(cond_2), rep(seq_along(values) * NA^ !values, lengths))
split(vec_2[cond_2], grp[cond_2])
#$`1`
#[1] 3 2 1
#$`3`
#[1] 6 5 4
#$`5`
#[1] 9 8 7
Or create a grouping variable with cumsum and diff
grp <- cumsum(c(TRUE, diff(cond) < 0)) * NA^ !cond
split(vec[cond], grp[cond])
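For readers who find the NA^ trick hard to read, the same idea can be written as a small base R function. A minimal sketch, assuming cond is a logical vector of the same length as vec with no NAs; split_true_runs is a hypothetical name:

```r
# Split `vec` into one list element per consecutive run of TRUE in `cond`;
# elements where `cond` is FALSE are dropped.
split_true_runs <- function(vec, cond) {
  stopifnot(length(vec) == length(cond), !anyNA(cond))
  # A new run starts wherever cond differs from the previous element.
  run_id <- cumsum(c(TRUE, cond[-1] != cond[-length(cond)]))
  unname(split(vec[cond], run_id[cond]))
}

vec <- c(1:3, NA, NA, NA, 4:6, NA, NA, NA, 7:9, NA)
split_true_runs(vec, !is.na(vec))

vec_2 <- c(3:1, 11:13, 6:4, 14:16, 9:7, 20)
split_true_runs(vec_2, vec_2 < 10)
```

The unname() call drops the run numbers that split() would otherwise use as names, so the result matches the unnamed lists asked for in the question.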

Conditional merging of two lists in R

I am trying to combine two lists that complement each other, where one contains half the set of values and the second the other half:
v1 <- c(1,2,2,4)
v2 <- c(NULL)
v3 <- c(1,2,2,4)
l1 <- list(v1,v2,v3)
v1b <- c(NULL)
v2b <- c(1,2,2,4)
v3b <- c(NULL)
l2 <- list(v1b,v2b,v3b)
> l1
[[1]]
[1] 1 2 2 4
[[2]]
NULL
[[3]]
[1] 1 2 2 4
> l2
[[1]]
NULL
[[2]]
[1] 1 2 2 4
[[3]]
NULL
The desired result is:
[[1]]
[1] 1 2 2 4
[[2]]
[1] 1 2 2 4
[[3]]
[1] 1 2 2 4
I tried several ways. This is the closest I got:
> sapply(l1, function(x) ifelse(x == "NULL", l2[[x]], x))
[[1]]
[1] 1 2 2 4
[[2]]
logical(0)
[[3]]
[1] 1 2 2 4
Any help is appreciated.
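One possible approach, sketched here under the assumption that both lists have the same length and that a NULL in l1 should be filled from the same position in l2. Note that x == "NULL" compares against the string "NULL"; is.null() is the usual test for a missing element:

```r
l1 <- list(c(1, 2, 2, 4), NULL, c(1, 2, 2, 4))
l2 <- list(NULL, c(1, 2, 2, 4), NULL)

# Walk both lists in parallel and take the element from l2
# wherever the element in l1 is NULL.
merged <- Map(function(a, b) if (is.null(a)) b else a, l1, l2)
merged
```

Map() always returns a list, so the result is not simplified the way sapply() might simplify it.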

Getting all splits of numeric sequence in R

I'm trying to get all the possible splits of a sequence [1:n] in R. E.g.:
getSplits(0,3)
Should return all possible splits of the sequence 123, in other words (in a list of vectors):
[1] 1
[2] 1 2
[3] 1 2 3
[4] 1 3
[5] 2
[6] 2 3
[7] 3
I've created a function which does get to these vectors recursively, but I'm having trouble combining them into one list as above. My function is:
getSplits <- function(currentDigit, lastDigit, split) {
splits=list();
for (nextDigit in currentDigit: lastDigit)
{
currentSplit <- c(split, c(nextDigit));
print(currentSplit);
if(nextDigit < lastDigit) {
possibleSplits = c(list(currentSplit), getSplits(nextDigit+1, lastDigit, currentSplit));
}else{
possibleSplits = currentSplit;
}
splits <- c(splits, list(possibleSplits));
}
return(splits);
}
Printing each currentSplit gives all the right vectors I need, but somehow the final returned list (splits) nests them into deeper levels of lists, returning:
[[1]]
[[1]][[1]]
[1] 1
[[1]][[2]]
[[1]][[2]][[1]]
[1] 1 2
[[1]][[2]][[2]]
[1] 1 2 3
[[1]][[3]]
[1] 1 3
[[2]]
[[2]][[1]]
[1] 2
[[2]][[2]]
[1] 2 3
[[3]]
[1] 3
For the corresponding function call getSplits(1, 3, c()).
If anyone could help me out on getting this to work the way I described above, it'd be much appreciated!
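One way to keep the recursive approach but get a flat list is to append both the current vector and the results of the recursive call directly to splits, instead of wrapping them in another list(). A sketch of that idea; get_splits_flat is a hypothetical name, and the default for split is an assumption added for convenience:

```r
get_splits_flat <- function(currentDigit, lastDigit, split = c()) {
  splits <- list()
  for (nextDigit in currentDigit:lastDigit) {
    currentSplit <- c(split, nextDigit)
    # Append the vector itself as one list element ...
    splits <- c(splits, list(currentSplit))
    # ... and then the already flat result of the recursive call.
    if (nextDigit < lastDigit) {
      splits <- c(splits, get_splits_flat(nextDigit + 1, lastDigit, currentSplit))
    }
  }
  splits
}

res <- get_splits_flat(1, 3)
sapply(res, toString)
#[1] "1" "1, 2" "1, 2, 3" "1, 3" "2" "2, 3" "3"
```

c() on two lists concatenates their elements, which is what avoids the extra nesting.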
character vector output
Try combn:
k <- 3
s <- unlist(lapply(1:k, combn, x = k, toString))
s
## [1] "1" "2" "3" "1, 2" "1, 3" "2, 3" "1, 2, 3"
data frame output
If you would prefer that the output be in the form of a data frame:
read.table(text = s, header = FALSE, sep = ",", fill = TRUE, col.names = 1:k)
giving:
X1 X2 X3
1 1 NA NA
2 2 NA NA
3 3 NA NA
4 1 2 NA
5 1 3 NA
6 2 3 NA
7 1 2 3
list output
or a list:
lapply(s, function(x) scan(text = x, quiet = TRUE, sep = ","))
giving:
[[1]]
[1] 1
[[2]]
[1] 2
[[3]]
[1] 3
[[4]]
[1] 1 2
[[5]]
[1] 1 3
[[6]]
[1] 2 3
[[7]]
[1] 1 2 3
Update: Have incorporated improvement mentioned in comments as well as one further simplification and also added data frame and list output.
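The list output can also be built directly, without the round trip through strings. A minimal sketch in base R; all_subsets is a hypothetical name. It applies combn to index positions rather than to the values, since combn(x, m) treats a single positive number x as seq_len(x):

```r
# All non-empty subsets of x, ordered by size, as a list of vectors.
all_subsets <- function(x) {
  by_size <- lapply(seq_along(x), function(k) {
    combn(seq_along(x), k, simplify = FALSE)
  })
  # Flatten one level, then map index sets back to the values of x.
  lapply(unlist(by_size, recursive = FALSE), function(i) x[i])
}

subsets <- all_subsets(1:3)
sapply(subsets, toString)
## [1] "1" "2" "3" "1, 2" "1, 3" "2, 3" "1, 2, 3"
```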
Here is another approach:
f <- function(nums) lapply(seq_along(nums), function(x) t(combn(nums, m = x)))
f(1:3)
This yields
[[1]]
[,1]
[1,] 1
[2,] 2
[3,] 3
[[2]]
[,1] [,2]
[1,] 1 2
[2,] 1 3
[3,] 2 3
[[3]]
[,1] [,2] [,3]
[1,] 1 2 3
The OP is looking for the power set of c(1,2,3), minus the empty set. There are several packages that will quickly get you this in one line. Using the package rje, we have:
library(rje)
powerSet(c(1,2,3))
[[1]]
numeric(0)
[[2]]
[1] 1
[[3]]
[1] 2
[[4]]
[1] 1 2
[[5]]
[1] 3
[[6]]
[1] 1 3
[[7]]
[1] 2 3
[[8]]
[1] 1 2 3
... and with iterpc:
library(iterpc)
getall(iterpc(c(2,1,1,1), 3, labels = 0:3))
[,1] [,2] [,3]
[1,] 0 0 1
[2,] 0 0 2
[3,] 0 0 3
[4,] 0 1 2
[5,] 0 1 3
[6,] 0 2 3
[7,] 1 2 3
More generally,
n <- 3
getall(iterpc(c(n-1,rep(1, n)), n, labels = 0:n)) ## same as above

Remove outliers based on a preceding value

How can I remove outliers using the criterion that a value cannot be more than 2-fold higher than its preceding one?
Here is my try:
x<-c(1,2,6,4,10,20,50,10,2,1)
remove_outliers <- function(x, na.rm = TRUE, ...) {
for(i in 1:length(x))
x < (x[i-1] + 2*x)
x
}
remove_outliers(x)
expected outcome: 1,2,4,10,20,2,1
Thanks!
I think the first 10 should be removed in your data because 10>2*4. Here's a way to do what you want without loops. I'm using the dplyr version of lag.
library(dplyr)
x<-c(1,2,6,4,10,20,50,10,2,1)
x[c(TRUE,na.omit(x<=dplyr::lag(x)*2))]
[1] 1 2 4 20 10 2 1
EDIT
To use this with a data.frame:
df <- data.frame(id=1:10, x=c(1,2,6,4,10,20,50,10,2,1))
df[c(TRUE,na.omit(df$x<=dplyr::lag(df$x,1)*2)),]
id x
1 1 1
2 2 2
4 4 4
6 6 20
8 8 10
9 9 2
10 10 1
A simple sapply:
keep<-sapply(seq_along(x),function(i) i==1 || x[i]<=2*x[i-1])
keep
[1] TRUE TRUE FALSE TRUE FALSE TRUE FALSE TRUE TRUE TRUE
resulting in:
x[keep]
[1] 1 2 4 20 10 2 1
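The answers above compare each value with its immediate predecessor in the original vector. If "preceding" is instead read as the last value that was kept, the result differs, and a loop is one straightforward way to express it. A sketch under that assumption; remove_jumps and max_ratio are hypothetical names:

```r
# Keep a value only if it is at most max_ratio times the last kept value.
# The first value is always kept.
remove_jumps <- function(x, max_ratio = 2) {
  if (length(x) == 0) return(x)
  keep <- logical(length(x))
  keep[1] <- TRUE
  last_kept <- x[1]
  for (i in seq_along(x)[-1]) {
    if (x[i] <= max_ratio * last_kept) {
      keep[i] <- TRUE
      last_kept <- x[i]
    }
  }
  x[keep]
}

x <- c(1, 2, 6, 4, 10, 20, 50, 10, 2, 1)
remove_jumps(x)
#[1] 1 2 4 2 1
```

Under this reading, 10, 20, 50 and the second 10 are all dropped because each is more than twice the last kept value, 4.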
