R: find vector in list of vectors

I'm working with R and my goal is to check whether a given vector is in a list of unique vectors.
The list looks like
final_states <- list(c("x" = 5, "y" = 1),
c("x" = 5, "y" = 2),
c("x" = 5, "y" = 3),
c("x" = 5, "y" = 4),
c("x" = 5, "y" = 5),
c("x" = 3, "y" = 5))
Now I want to check whether a given state is in the list. For example:
state <- c("x" = 5, "y" = 3)
As you can see, the vector state is an element of the list final_states. My idea was to check it with the %in% operator:
state %in% final_states
But I get this result:
[1] FALSE FALSE
Can anyone tell me what is wrong?
Greets,
lupi

If you just want to determine if the vector is in the list, try
Position(function(x) identical(x, state), final_states, nomatch = 0) > 0
# [1] TRUE
Position() basically works like match(), but on a list. If you set nomatch = 0 and check for Position > 0, you'll get a logical result telling you whether state is in final_states
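For what it's worth, state %in% final_states returns two values because %in% looks up each element of state (the numbers 5 and 3) separately in the list, and none of the list elements matches a single number. Below is a minimal sketch that wraps the Position() idea in a helper. The name vec_in_list and the shortened list are my own assumptions, not something from the question.

```r
# Shortened version of the list from the question
final_states <- list(c(x = 5, y = 1), c(x = 5, y = 3), c(x = 3, y = 5))
state <- c(x = 5, y = 3)

# Hypothetical helper (the name is an assumption):
# TRUE if `vec` is identical() to at least one element of `lst`.
vec_in_list <- function(vec, lst) {
  Position(function(x) identical(x, vec), lst, nomatch = 0) > 0
}

vec_in_list(state, final_states)            # TRUE
vec_in_list(c(x = 9, y = 9), final_states)  # FALSE
# identical() also compares names, so an unnamed copy does not match
vec_in_list(c(5, 3), final_states)          # FALSE
```

Position() stops at the first hit, so it does not have to scan the whole list when the match comes early.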

"final_states" is a "list", so you could convert the "state" to list and then do
final_states %in% list(state)
#[1] FALSE FALSE TRUE FALSE FALSE FALSE
or use mapply to check whether all the elements in "state" are present in each of the list elements of "final_states" (assuming that the lengths are the same for the vector and the list elements)
f1 <- function(x,y) all(x==y)
mapply(f1, final_states, list(state))
#[1] FALSE FALSE TRUE FALSE FALSE FALSE
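One thing to keep in mind when choosing between all(x == y) and identical(): they do not test quite the same thing. A small sketch with made-up vectors a and b:

```r
a <- c(x = 5, y = 3)   # named
b <- c(5, 3)           # same values, no names

all(a == b)               # TRUE: == compares the values only
identical(a, b)           # FALSE: the names attribute differs
identical(unname(a), b)   # TRUE once the names are dropped
```

So all(x == y) is the more forgiving test, while identical() requires the names to agree as well.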
Or rbind the list elements to a matrix and then check whether "state" and the "rows" of "m1" are the same.
m1 <- do.call(rbind, final_states)
!rowSums(m1!=state[col(m1)])
#[1] FALSE FALSE TRUE FALSE FALSE FALSE
Or
m1[,1]==state[1] & m1[,2]==state[2]
#[1] FALSE FALSE TRUE FALSE FALSE FALSE
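The rowSums line is compact, so here is the same computation spelled out step by step on a shortened list. The intermediate names expanded and mismatches are my own:

```r
final_states <- list(c(x = 5, y = 1), c(x = 5, y = 3), c(x = 3, y = 5))
state <- c(x = 5, y = 3)

m1 <- do.call(rbind, final_states)     # 3 x 2 matrix, one state per row
# col(m1) holds each cell's column index, so state[col(m1)] repeats
# state[1] for every cell of column 1 and state[2] for every cell of column 2
expanded <- state[col(m1)]
mismatches <- rowSums(m1 != expanded)  # number of differing coordinates per row
mismatches == 0                        # TRUE only for rows equal to state
```

A row matches exactly when it has zero mismatches, which is what !rowSums(...) expresses.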
Update
If you need to get a single TRUE/FALSE
any(mapply(f1, final_states, list(state)))
#[1] TRUE
Or
any(final_states %in% list(state))
#[1] TRUE
Or
list(state) %in% final_states
#[1] TRUE
Or use the "faster" fmatch from fastmatch
library(fastmatch)
fmatch(list(state), final_states) >0
#[1] TRUE
Benchmarks
#Richard Scriven's base R function is very fast compared to other solutions except the one with fmatch
set.seed(295)
final_states <- replicate(1e6, sample(1:20, 20, replace=TRUE),
simplify=FALSE)
state <- final_states[[151]]
richard <- function() {Position(function(x) identical(x, state),
final_states, nomatch = 0) > 0}
Bonded <- function(){any( sapply(final_states, identical, state) )}
akrun2 <- function() {fmatch(list(state), final_states) >0}
akrun1 <- function() {f1 <- function(x,y) all(x==y)
any(mapply(f1, final_states, list(state)))}
library(microbenchmark)
microbenchmark(richard(), Bonded(), akrun1(), akrun2(),
unit='relative', times=20L)
#Unit: relative
#      expr          min           lq        mean      median          uq         max neval cld
# richard()     35.22635     29.47587    17.49164    15.66833    14.58235    14.62328    20 a
#  Bonded() 109440.56885 101382.92450 55252.86141 47734.96467 44289.80309 46299.43325    20 b
#  akrun1() 167001.23864 138812.85016 75664.91378 61417.59871 62667.94867 63890.68133    20 c
#  akrun2()      1.00000      1.00000     1.00000     1.00000     1.00000     1.00000    20 a

Whenever I see a list object I first think of lapply. It seems to deliver the expected result with identical as the test and 'state' as the second argument:
> lapply(final_states, identical, state)
[[1]]
[1] FALSE
[[2]]
[1] FALSE
[[3]]
[1] TRUE
[[4]]
[1] FALSE
[[5]]
[1] FALSE
[[6]]
[1] FALSE
You get a possibly useful intermediate result with:
lapply(final_states, match, state)
... but it returns, for each list element, the positions in state where its values are found, so c(1, 2) marks the matching element.
If you want the result to come back as a vector, for instance because you want to use any, use sapply instead of lapply.
> any( sapply(final_states[-3], identical, state) )
[1] FALSE
> any( sapply(final_states, identical, state) )
[1] TRUE
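If you'd rather have the return type enforced, vapply is a drop-in alternative to sapply here. A small sketch on a shortened list (the variable name hits is mine):

```r
final_states <- list(c(x = 5, y = 1), c(x = 5, y = 3), c(x = 3, y = 5))
state <- c(x = 5, y = 3)

# logical(1) declares that each call must return exactly one logical value
hits <- vapply(final_states, identical, logical(1), state)
hits          # FALSE TRUE FALSE
any(hits)     # TRUE
which(hits)   # 2, the position of the matching element
```

Unlike Position(), this always visits every element, but it also tells you where all the matches are.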

Related

Is there a better way to check if all elements in a list are named?

I want to check if all elements in a list are named. I've come up with this solution, but I wanted to know if there is a more elegant way to check this.
x <- list(a = 1, b = 2)
y <- list(1, b = 2)
z <- list (1, 2)
any(stringr::str_length(methods::allNames(x)) == 0L) # FALSE: all elements are named.
any(stringr::str_length(methods::allNames(y)) == 0L) # TRUE: at least one element is not named. Throw an error here.
any(stringr::str_length(methods::allNames(z)) == 0L) # TRUE: at least one element is not named. Throw an error here.
I am not sure if the following base R code works for your general cases, but it seems to work for the ones in your post.
Define a function f to check the names
f <- function(lst) length(lst) == sum(names(lst) != "",na.rm = TRUE)
and you will see
> f(x)
[1] TRUE
> f(y)
[1] FALSE
> f(z)
[1] FALSE
We can create a function that checks whether the names attribute is NULL or (|) contains a blank ("") name, and then negates (!) the result
f1 <- function(lst1) is.list(lst1) && !(is.null(names(lst1))| '' %in% names(lst1))
-checking
f1(x)
#[1] TRUE
f1(y)
#[1] FALSE
f1(z)
#[1] FALSE
Or with allNames
f2 <- function(lst1) is.list(lst1) && !("" %in% allNames(lst1))
-checking
f2(x)
#[1] TRUE
f2(y)
#[1] FALSE
f2(z)
#[1] FALSE
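Another base R variant, as a sketch only: nzchar() tests for non-empty strings, and the extra is.na() check covers NA names, which the versions above don't address. The function name all_named is my own.

```r
x <- list(a = 1, b = 2)
y <- list(1, b = 2)
z <- list(1, 2)

# Hypothetical helper: TRUE only if every element has a non-empty, non-NA name
all_named <- function(lst) {
  nms <- names(lst)
  !is.null(nms) && all(!is.na(nms) & nzchar(nms))
}

all_named(x)  # TRUE
all_named(y)  # FALSE
all_named(z)  # FALSE
```

Note that an empty list has no names attribute, so this sketch reports FALSE for list(); whether that is the right answer depends on your use case.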

Check if a number is between two others

I am looking for a function that verifies if a number is between two other numbers. I also need to control if I want a strict comparison (a
I know the function between() in dplyr. Yet, I have to know the upper and lower numbers.
MyNumber = 8
First = 2
Second = 10
# This will return TRUE
between(MyNumber, lower = First, upper = Second)
# But this will return FALSE
between(MyNumber, lower = Second, upper = First)
# This will return TRUE. I want it to return FALSE
First = 8
between(MyNumber, lower = First, upper = Second)
I need a function that returns TRUE no matter what is the order.
Something like:
between2 <- function(number,bounds) { number > min(bounds) & number < max(bounds)}
between2(8, c(2,10))
[1] TRUE
between2(8, c(10,2))
[1] TRUE
This function also deals with your added condition
between2(8,c(8,10))
[1] FALSE
You could do it with simple arithmetic:
between <- function(number, first, second) { (first - number) * (second - number) < 0 }
Here are some example outputs:
> between(8, 2, 10)
[1] TRUE
> between(8, 10, 2)
[1] TRUE
> between(8, 10, 12)
[1] FALSE
> between(8, 1, 2)
[1] FALSE
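The product is negative exactly when the two differences have opposite signs, i.e. when the number lies strictly between the bounds. Changing < to <= also accepts the bounds themselves, because the product is zero when the number equals one of them. A sketch with two separately named functions (the names are mine, chosen so they don't mask dplyr's between):

```r
# Strict: TRUE only if number lies strictly between first and second
between_strict <- function(number, first, second) {
  (first - number) * (second - number) < 0
}

# Inclusive: a zero product means number equals one of the bounds
between_incl <- function(number, first, second) {
  (first - number) * (second - number) <= 0
}

between_strict(8, 8, 10)  # FALSE
between_incl(8, 8, 10)    # TRUE
between_incl(8, 10, 2)    # TRUE
between_incl(11, 10, 2)   # FALSE
```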
You could use %in% with the : function, once you know first and last (this only works for whole numbers, since first:last steps by 1):
first <- 2
last <- 10
number <- 8
number %in% first:last
[1] TRUE
first <- 10
last <- 2
number <- 8
number %in% first:last
[1] TRUE
first <- 10
last <- 12
number <- 8
number %in% first:last
[1] FALSE
first <- 12
last <- 10
number <- 8
number %in% first:last
[1] FALSE
Wrapped in a function, where strict controls whether the bounds themselves are excluded:
my_between <- function(n, f, l, strict = FALSE) {
  inside <- n %in% f:l          # inclusive test; f:l counts down when f > l
  if (strict) {
    inside & n != f & n != l    # strict == TRUE: also exclude both bounds
  } else {
    inside                      # strict == FALSE (default)
  }
}
my_between(8, 2, 10)
What's wrong with
f_between <- function (num, L, R) num>=min(L,R) & num<=max(L,R)
f_between(8, 2, 10)
#[1] TRUE
f_between(6, 6, 10)
#[1] TRUE
f_between(2, -10, -2)
#[1] FALSE
f_between(3, 5, 7)
#[1] FALSE
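Since the question also asks for control over strictness, here is a sketch that combines the min/max idea with a strict switch. The function name between_any and its argument names are my own assumptions; pmin and pmax are used instead of min and max so that vectors of bounds work element-wise.

```r
# Hypothetical helper: order-independent range check with optional strictness
between_any <- function(number, a, b, strict = FALSE) {
  lo <- pmin(a, b)  # lower bound, whichever argument it came from
  hi <- pmax(a, b)  # upper bound
  if (strict) {
    number > lo & number < hi
  } else {
    number >= lo & number <= hi
  }
}

between_any(8, 2, 10)                 # TRUE
between_any(8, 10, 2)                 # TRUE
between_any(8, 8, 10)                 # TRUE
between_any(8, 8, 10, strict = TRUE)  # FALSE
between_any(c(1, 5, 10), 10, 2, strict = TRUE)  # FALSE TRUE FALSE
```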

Replace elements of vector by vector

I want to replace a few elements of a vector with a whole second vector. The condition is that the replaced elements of the first vector are equal to the elements of a third vector. Here is an example:
a <- 1:10
b <- 5:7
v <- rnorm(2, mean = 1, sd = 5)
my output should be
c(a[1:4], v, a[8:10])
I have already tried
replace(a, a == b, v)
a[a == b] <- v
but with little success. Can anyone help?
The == operator is best used to compare vectors of the same length, or when one of the vectors has length 1.
Try this, and notice that in neither case do you get the positional match that you want.
> a == b
[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
Warning message:
In a == b : longer object length is not a multiple of shorter object length
> b == a
[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
Warning message:
In b == a : longer object length is not a multiple of shorter object length
Instead, use match() - this gives you the index position where there is a match in the values.
> match(b, a)
[1] 5 6 7
Then:
a <- 1:10
b <- 5:7
v <- rnorm(3, mean=1, sd=5)
a[match(b, a)] <- v
The results:
a
[1] 1.0000000 2.0000000 3.0000000 4.0000000 -4.6843669 0.9014578 -0.7601413 8.0000000
[9] 9.0000000 10.0000000
Here's another option:
a[a %in% b] <- v
In the OP's example, the vectors a and b have three numbers in common while v <- rnorm(2, mean = 1, sd = 5) contains only 2 numbers, so v will be recycled and a warning will be issued.
The warning and recycling can be prevented, e.g., by defining v as
v <- rnorm(sum(a %in% b), mean = 1, sd = 5)
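Putting the pieces together, a minimal end-to-end sketch. The seed, the names idx and a2, and the NA filtering step (which only matters if some value of b does not occur in a) are my own additions:

```r
set.seed(1)   # only so the example is reproducible
a <- 1:10
b <- 5:7

idx <- match(b, a)        # positions in a where b's values occur: 5 6 7
idx <- idx[!is.na(idx)]   # drop values of b that are absent from a
v <- rnorm(length(idx), mean = 1, sd = 5)  # one replacement per position

a2 <- replace(a, idx, v)  # same effect as a[idx] <- v, but returns a copy
a2
```

Because v is numeric, the result a2 is a numeric vector even though a started out as integer.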

check whether a variable is in increasing order in R

Supposing I have a variable
x <- c(1,3,5,7,8)
Now x is in increasing order
How to check whether a variable is in increasing order in R?
From ?is.unsorted:
Test if an object is not sorted (in increasing order) ...
So, in this case, you could:
is.sorted = Negate(is.unsorted)
is.sorted(x)
#[1] TRUE
#> is.sorted(1:5)
#[1] TRUE
#> is.sorted(5:1)
#[1] FALSE
#> is.sorted(sample(5))
#[1] FALSE
#> is.sorted(sort(runif(5)))
#[1] TRUE
#> is.sorted(c(1,2,2,3))
#[1] TRUE
#> is.sorted(c(1,2,2,3), strictly = T)
#[1] FALSE
This function is fast, because it loops over the vector and breaks the loop as soon as an element is not ">=" (or ">", if "strictly = T") than the previous one.
Try this:
all(diff(x) > 0)
or
all(diff(x) >= 0)
I agree with @flodel that is.unsorted (h/t @alexis_laz) is probably even better.
Look at the differences:
R> x <- c(1,3,5,7,8)
R> allIncreasing <- function(x) all(diff(x)>0)
R> allIncreasing(x)
[1] TRUE
R> y <- x; y[3] <-0
R> allIncreasing(y)
[1] FALSE
R>
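To see how the two approaches line up, here is a short side-by-side sketch. The vector ties is a made-up example with a repeated value:

```r
x <- c(1, 3, 5, 7, 8)
ties <- c(1, 2, 2, 3)   # non-decreasing, but not strictly increasing

!is.unsorted(x)                      # TRUE
all(diff(x) > 0)                     # TRUE

!is.unsorted(ties)                   # TRUE  (ties allowed)
all(diff(ties) >= 0)                 # TRUE  (same test via diff)

!is.unsorted(ties, strictly = TRUE)  # FALSE (ties not allowed)
all(diff(ties) > 0)                  # FALSE (same test via diff)
```

In other words, strictly = TRUE corresponds to the > version of the diff test and the default corresponds to >=.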

Insert elements into a vector at given indexes

I have a logical vector, for which I wish to insert new elements at particular indexes. I've come up with a clumsy solution below, but is there a neater way?
probes <- rep(TRUE, 15)
ind <- c(5, 10)
probes.2 <- logical(length(probes)+length(ind))
probes.ind <- ind + 1:length(ind)
probes.original <- (1:length(probes.2))[-probes.ind]
probes.2[probes.ind] <- FALSE
probes.2[probes.original] <- probes
print(probes)
gives
[1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
and
print(probes.2)
gives
[1] TRUE TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE TRUE TRUE FALSE
[13] TRUE TRUE TRUE TRUE TRUE
So it works but is ugly looking - any suggestions?
These are all very creative approaches. I think working with indexes is definitely the way to go (Marek's solution is very nice).
I would just mention that there is a function to do roughly that: append().
probes <- rep(TRUE, 15)
probes <- append(probes, FALSE, after=5)
probes <- append(probes, FALSE, after=11)
Or you could do this iteratively with your indexes (you need to grow the "after" value on each iteration, because every insertion shifts the later elements by one):
probes <- rep(TRUE, 15)
ind <- c(5, 10)
for(i in 0:(length(ind)-1))
probes <- append(probes, FALSE, after=(ind[i+1]+i))
Incidentally, this question was also previously asked on R-Help. As Barry says:
"Actually I'd say there were no ways of doing this, since I dont think you can actually insert into a vector - you have to create a new vector that produces the illusion of insertion!"
You can do some magic with indexes:
First create vector with output values:
probs <- rep(TRUE, 15)
ind <- c(5, 10)
val <- c( probs, rep(FALSE,length(ind)) )
# > val
# [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
# [13] TRUE TRUE TRUE FALSE FALSE
Now the trick: each old element gets its position as rank, and each new element gets a half-rank, which places it right after the element it should follow.
id <- c( seq_along(probs), ind+0.5 )
# > id
# [1] 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0 10.0 11.0 12.0 13.0 14.0 15.0
# [16] 5.5 10.5
Then use order to sort in proper order:
val[order(id)]
# [1] TRUE TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE TRUE TRUE FALSE
# [13] TRUE TRUE TRUE TRUE TRUE
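The half-rank trick above can be wrapped in a small reusable function. This is only a sketch; the name insert_after and its arguments are my own:

```r
# Hypothetical helper: insert `value` after each position listed in `ind`
insert_after <- function(x, ind, value) {
  vals <- c(x, rep(value, length(ind)))  # old values followed by the new ones
  id <- c(seq_along(x), ind + 0.5)       # new values rank just after ind
  vals[order(id)]
}

res <- insert_after(rep(TRUE, 15), c(5, 10), FALSE)
which(!res)   # 6 12

# Repeating an index inserts several values at the same spot
insert_after(0:9, c(3, 5, 5, 5, 6), 100)
# 0 1 2 100 3 4 100 100 100 5 100 6 7 8 9
```

Unlike the loop-based versions, ind does not need to be sorted here, because order() takes care of the placement.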
probes <- rep(TRUE, 1000000)
ind <- c(50:100)
val <- rep(FALSE,length(ind))
new.probes <- vector(mode="logical",length(probes)+length(val))
new.probes[-ind] <- probes
new.probes[ind] <- val
Some timings:
My method
user system elapsed
0.03 0.00 0.03
Marek method
user system elapsed
0.18 0.00 0.18
R append with for loop
user system elapsed
1.61 0.48 2.10
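As far as I can tell, the negative-index method above treats ind as positions in the result vector rather than "insert after" positions in the original. If you start from "insert after" indexes as in the question, you can convert them first. A sketch using the question's small example (the names pos and out are mine):

```r
probes <- rep(TRUE, 15)
ind <- c(5, 10)                     # insert after the 5th and 10th element

# Each earlier insertion shifts later ones by one, so the k-th new element
# ends up at position ind[k] + k in the result (ind sorted ascending)
pos <- sort(ind) + seq_along(ind)   # 6 12

out <- logical(length(probes) + length(pos))  # all FALSE to start with
out[-pos] <- probes                 # old values fill every other slot
out[pos] <- FALSE                   # new values (already FALSE, shown for clarity)
which(!out)                         # 6 12
```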
How about this:
> probes <- rep(TRUE, 15)
> ind <- c(5, 10)
> probes.ind <- rep(NA, length(probes))
> probes.ind[ind] <- FALSE
> new.probes <- as.vector(rbind(probes, probes.ind))
> new.probes <- new.probes[!is.na(new.probes)]
> new.probes
[1] TRUE TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE TRUE TRUE FALSE
[13] TRUE TRUE TRUE TRUE TRUE
That is sort of tricky. Here's one way. It iterates over the indexes, inserting one element each time, so it's not too efficient.
probes <- rep(TRUE, 15)
ind <- c(5, 10)
probes.ind <- ind + 0:(length(ind)-1)
for (i in probes.ind) {
probes <- c(probes[1:i], FALSE, probes[(i+1):length(probes)])
}
> probes
[1] TRUE TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE TRUE TRUE FALSE
[13] TRUE TRUE TRUE TRUE TRUE
This should even work if ind has repeated elements, although ind does need to be sorted for the probes.ind construction to work.
Or you can do it using the insertRow function from the miscTools package.
library(miscTools)
probes <- rep(TRUE, 15)
ind <- c(5,10)
for (i in ind){
probes <- as.vector(insertRow(as.matrix(probes), i, FALSE))
}
I came up with a good answer that's easy to understand and fairly fast to run, building off Wojciech's answer above. I'll adapt the method for the example here, but it can be easily generalized to pretty much any data type for an arbitrary pattern of missing points (shown below).
probes <- rep(TRUE, 15)
ind <- c(5,10)
probes.final <- rep(FALSE, length(probes)+length(ind))
probes.final[-ind] <- probes
The data I needed this for is sampled at a regular interval, but many samples are thrown out, and the resulting data file only includes the timestamps and measurements for those retained. I needed to produce a vector containing all the timestamps and a data vector with NAs inserted for timestamps that were tossed. I used the "not in" function stolen from here to make it a bit simpler.
`%notin%` <- Negate(`%in%`)
dat <- rnorm(50000) # Data given
times <- seq(from=554.3, by=0.1, length.out=70000) # "Original" time stamps
times <- times[-sample(2:69999, 20000)] # "Given" times with arbitrary points missing from interior
times.final <- seq(from=times[1], to=times[length(times)], by=0.1)
na.ind <- which(times.final %notin% times)
dat.final <- rep(NA, length(times.final))
dat.final[-na.ind] <- dat
Um, hi, I had the same question, but I couldn't understand what people had answered, because I'm still learning the language. So I tried to make my own solution and I suppose it works! I created a vector and I wanted to insert the value 100 after the 3rd, 5th and 6th indexes. This is what I wrote.
vector <- c(0:9)
indexes <- c(6, 3, 5)
indexes <- indexes[order(indexes)]
i <- 1
j <- 0
while(i <= length(indexes)){
vector <- append(vector, 100, after = indexes[i] + j)
i <-i + 1
j <- j + 1
}
vector
The vector "indexes" must be in ascending order for this to work. This is why I put them in order at the third line.
The variable "j" is necessary because at each iteration, the length of the new vector increases and the original values are moved.
If you wish to insert several new values next to each other, simply repeat the index. For instance, by assigning indexes <- c(3, 5, 5, 5, 6), you should get vector == 0 1 2 100 3 4 100 100 100 5 100 6 7 8 9
