I have a vector of numbers
f <- c(1, 3, 5, 8, 10, 12, 19, 27)
I want to compare the values in the vector to another number, and find the closest smaller value.
For example, if the input number is 18, then the closest, smaller value in the vector is 12 (position 6 in the vector).
If the input is 19, then the result should be the value 19, i.e. the index 7.
I think this answer is pretty straightforward:
f <- c(1,3,6,8,10,12,19,27)
x <- 18
# find the value that is closest to x
maxless <- max(f[f <= x])
# find out which value that is
which(f == maxless)
If your vector f is always sorted, then you can do sum(f <= x)
f <- c(1,3,6,8,10,12,19,27)
x <- 18
sum(f <= x)
# [1] 6
x <- 19
sum(f <= x)
# [1] 7
Try this (not a perfect solution)
x<-c(1,3,6,8,10,12,19,27)
showIndex<-function(x,input){
abs.diff<-abs(x-input)
index.value<-unique(ifelse(abs.diff==0,which.min(abs.diff),which.min(abs.diff)-1))
return(index.value)
}
showIndex(x,12)
[1] 6
showIndex(x,19)
[1] 7
You could try:
x <- 18
f <- c(1,3,6,8,10,12,19,27)
ifelse(x %in% f, which(f %in% x), which.min(abs(f - x)) - 1)
That way if x is not in f, it will return the nearest previous index. If x is in f, it will return x index.
Another one:
which.min(abs(18 - replace(f, f>18, Inf)))
#[1] 6
f[which.min(abs(18 - replace(f, f>18, Inf)))]
#[1] 12
Or as a function:
minsmaller <- function(x,value) which.min(abs(value - replace(x, x>value, Inf)))
minsmaller(f, 18)
#[1] 6
minsmaller(f, 19)
#[1] 7
There is findInterval:
findInterval(18:19, f)
#[1] 6 7
And building a more concrete function:
ff = function(x, table)
{
ot = order(table)
ans = findInterval(x, table[ot])
ot[ifelse(ans == 0, NA, ans)]
}
set.seed(007); sf = sample(f)
sf
#[1] 27 6 1 12 10 19 8 3
ff(c(0, 1, 18, 19, 28), sf)
#[1] NA 3 4 6 1
In a functional programming style:
f <- c(1, 3, 6, 8, 10, 12, 19, 27)
x <- 18
Position(function(fi) fi <= x, f, right = TRUE)
Related
I have a question I have the following data
c(1, 2, 4, 5, 1, 8, 9)
I set a l = 2 and an u = 6
I want to find all the values in the range (3,7)
How can I do this?
In base R we can use comparison operators to create a logical vector and use that for subsetting the original vector
x[x > 2 & x <= 6]
#[1] 3 5 6
Or using a for loop, initialize an empty vector, loop through the elements of 'x', if the value is between 2 and 6, then concatenate that value to the empty vector
v1 <- c()
for(i in x) {
if(i > 2 & i <= 6) v1 <- c(v1, i)
}
v1
#[1] 3 5 6
data
x <- c(3, 5, 6, 8, 1, 2, 1)
I am looking for a function which takes a vector and keeps dropping the first value until the sum of the vector is less than 20. Return the remaining values.
I've tried both a for-loop and while-loop and can't find a solution.
vec <- c(3,5,3,4,3,9,1,8,2,5)
short <- function(vec){
for (i in 1:length(vec)){
while (!is.na((sum(vec)) < 20)){
vec <- vec[i+1:length(vec)]
#vec.remove(i)
}
}
The expected output should be:
1,8,2,5
which is less than 20.
Looking at the expected output it looks like you want to drop values until sum of remaining values is less than 20.
We can create a function
drop_20 <- function(vec) {
tail(vec, sum(cumsum(rev(vec)) < 20))
}
drop_20(vec)
#[1] 1 8 2 5
Trying it on another input
drop_20(1:10)
#[1] 9 10
Breaking down the function, first the vec
vec = c(3,5,3,4,3,9,1,8,2,5)
We then reverse it
rev(vec)
#[1] 5 2 8 1 9 3 4 3 5 3
take cumulative sum over it (cumsum)
cumsum(vec)
#[1] 3 8 11 15 18 27 28 36 38 43
Find out number of enteries that are less than 20
cumsum(rev(vec)) < 20
#[1] TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE
sum(cumsum(rev(vec)) < 20)
#[1] 4
and finally subset these last enteries using tail.
A slight modification in the code and it should be able to handle NAs as well
drop_20 <- function(vec) {
tail(vec, sum(cumsum(replace(rev(vec), is.na(rev(vec)), 0)) < 20))
}
vec = c(3, 2, NA, 4, 5, 1, 2, 3, 4, 9, NA, 1, 2)
drop_20(vec)
#[1] 3 4 9 NA 1 2
The logic being we replace NA with zeroes and then take the cumsum
You need to remove the first value each time, so your while loop should be,
while (sum(x, na.rm = TRUE) >= 20) {
x <- x[-1]
}
#[1] 1 8 2 5
base solution without loops
not my most readable code ever, but it's pretty fast (see benchmarking below)
rev( rev(vec)[cumsum( replace( rev(vec), is.na( rev(vec) ), 0 ) ) < 20] )
#[1] 1 8 2 5
note: 'borrowed' the NA-handling from #Ronak's answer
sample data
vec = c(3, 2, NA, 4, 5, 1, 2, 3, 4, 9, NA, 1, 2)
benchmarks
microbenchmark::microbenchmark(
Sotos = {
while (sum(vec, na.rm = TRUE) >= 20) {
vec <- vec[-1]
}
},
Ronak = tail(vec, sum(cumsum(replace(rev(vec), is.na(rev(vec)), 0)) < 20)),
Wimpel = rev( rev(vec)[cumsum( replace( rev(vec), is.na( rev(vec) ), 0 ) ) < 20]),
WimpelMarkus = vec[rev(cumsum(rev(replace(vec, is.na(vec), 0))) < 20)]
)
# Unit: microseconds
# expr min lq mean median uq max neval
# Sotos 2096.795 2127.373 2288.15768 2152.6795 2425.4740 3071.684 100
# Ronak 30.127 33.440 42.54770 37.2055 49.4080 101.827 100
# Wimpel 13.557 15.063 17.65734 16.1175 18.5285 38.261 100
# WimpelMarkus 7.532 8.737 12.60520 10.0925 15.9680 45.491 100
I would go with Reduce
vec[Reduce(f = "+", x = vec, accumulate = T, right = T) < 20]
##[1] 1 8 2 5
Alternatively, define Reduce with function sum with the conditional argument na.rm = T in order to hanlde NAs if desired:
vec2 <- c(3, 2, NA, 4, 5, 1, 2, 3, 4, 9, NA, 1, 2)
vec2[Reduce(f = function(a,b) sum(a, b, na.rm = T), x = vec2, accumulate = TRUE, right = T) < 20]
##[1] 3 4 9 NA 1 2
I find the Reduce option to start from right (end of the integer vector), and hence not having to reverse it first, convenient.
I need to create groups of numbers which summed up do not reach 34.
For example: I have an array x<-c(28,26,20,5,3,2,1) and I need to create the following groups: a=(28,5,1), b=(26,3,2), c=(20) because the sums of the groups a, b and c do not exceed 34.
Is it possible to implement this procedure in R?
If I understand correctly this is what you want to do:
create_groups <- function(input, threshold) {
input <- sort(input, decreasing = TRUE)
result <- vector("list", length(input))
sums <- rep(0, length(input))
for (k in input) {
i <- match(TRUE, sums + k <= threshold)
if (!is.na(i)) {
result[[i]] <- c(result[[i]], k)
sums[i] <- sums[i] + k
}
}
result[sapply(result, is.null)] <- NULL
result
}
create_groups(x, 34)
# [[1]]
# [1] 28 5 1
#
# [[2]]
# [1] 26 3 2
#
# [[3]]
# [1] 20
However it is not guaranteed that this greedy algorithm will output the optimal solution in terms of number of groups. For instance:
y <- c(18, 15, 11, 9, 8, 7)
create_groups(y, 34)
# [[1]]
# [1] 18 15
#
# [[2]]
# [1] 11 9 8
#
# [[3]]
# [1] 7
while the optimal solution in this case consists of only 2 groups: list(c(18, 9, 7), c(15, 11, 8)).
Assuming you want all possible combinations of subsets of x that meet this condition, you can use
x = c(28,26,20,5,3,2,1)
y = lapply(seq_along(x), function(y) combn(x, y)) # list all combinations of all subsets
le34 = sapply(y, function(z) colSums(z) <= 34) # which sums are less than 34
lapply(seq_along(y), function(i) y[[i]][,le34[[i]]] ) # list of combinations that meet condition
I have a vector that has been divided into two clusters (as discussed in this question):
x <- c(1, 4, 5, 6, 9, 29, 32, 46, 55)
tree <- hclust(dist(x), method = "single")
split(x, cutree(tree, h = 19))
# $`1`
# [1] 1 4 5 6 9
#
# $`2`
# [1] 29 32 46 55
Now suppose I have another cluster of the same length, which I wish to divide into the same number of clusters by the same indices as x, take the following vector y as an example:
set.seed(77)
y = rnorm(9)
y
#[1] -0.54964 1.09105 0.63978 1.04258 0.16970 1.13780 -0.97055 -0.13183
#[9] 0.14623
The desired result should be like this:
# $`1`
# [1] -0.54964 1.09105 0.63978 1.04258 0.16970
#
# $`2`
# [1] 1.13780 -0.97055 -0.13183 0.14623
Just like you did for x:
split(y, cutree(tree, h = 19))
And since you are now using cutree(tree, h = 19) in multiple places, you might as well assign it to a variable:
groups <- cutree(tree, h = 19)
split(x, groups)
split(y, groups)
Suppose that my vector numbers contains c(1,2,3,5,7,8), and I wish to find if it contains 3 consecutive numbers, which in this case, are 1,2,3.
numbers = c(1,2,3,5,7,8)
difference = diff(numbers) //The difference output would be 1,1,2,2,1
To verify that there are 3 consecutive integers in my numbers vector, I've tried the following with little reward.
rep(1,2)%in%difference
The above code works in this case, but if my difference vector = (1,2,2,2,1), it would still return TRUE even though the "1"s are not consecutive.
Using diff and rle, something like this should work:
result <- rle(diff(numbers))
any(result$lengths>=2 & result$values==1)
# [1] TRUE
In response to the comments below, my previous answer was specifically only testing for runs of length==3 excluding longer lengths. Changing the == to >= fixes this. It also works for runs involving negative numbers:
> numbers4 <- c(-2, -1, 0, 5, 7, 8)
> result <- rle(diff(numbers4))
> any(result$lengths>=2 & result$values==1)
[1] TRUE
Benchmarks!
I am including a couple functions of mine. Feel free to add yours. To qualify, you need to write a general function that tells if a vector x contains n or more consecutive numbers. I provide a unit test function below.
The contenders:
flodel.filter <- function(x, n, incr = 1L) {
if (n > length(x)) return(FALSE)
x <- as.integer(x)
is.cons <- tail(x, -1L) == head(x, -1L) + incr
any(filter(is.cons, rep(1L, n-1L), sides = 1, method = "convolution") == n-1L,
na.rm = TRUE)
}
flodel.which <- function(x, n, incr = 1L) {
is.cons <- tail(x, -1L) == head(x, -1L) + incr
any(diff(c(0L, which(!is.cons), length(x))) >= n)
}
thelatemail.rle <- function(x, n, incr = 1L) {
result <- rle(diff(x))
any(result$lengths >= n-1L & result$values == incr)
}
improved.rle <- function(x, n, incr = 1L) {
result <- rle(diff(as.integer(x)) == incr)
any(result$lengths >= n-1L & result$values)
}
carl.seqle <- function(x, n, incr = 1) {
if(!is.numeric(x)) x <- as.numeric(x)
z <- length(x)
y <- x[-1L] != x[-z] + incr
i <- c(which(y | is.na(y)), z)
any(diff(c(0L, i)) >= n)
}
Unit tests:
check.fun <- function(fun)
stopifnot(
fun(c(1,2,3), 3),
!fun(c(1,2), 3),
!fun(c(1), 3),
!fun(c(1,1,1,1), 3),
!fun(c(1,1,2,2), 3),
fun(c(1,1,2,3), 3)
)
check.fun(flodel.filter)
check.fun(flodel.which)
check.fun(thelatemail.rle)
check.fun(improved.rle)
check.fun(carl.seqle)
Benchmarks:
x <- sample(1:10, 1000000, replace = TRUE)
library(microbenchmark)
microbenchmark(
flodel.filter(x, 6),
flodel.which(x, 6),
thelatemail.rle(x, 6),
improved.rle(x, 6),
carl.seqle(x, 6),
times = 10)
# Unit: milliseconds
# expr min lq median uq max neval
# flodel.filter(x, 6) 96.03966 102.1383 144.9404 160.9698 177.7937 10
# flodel.which(x, 6) 131.69193 137.7081 140.5211 185.3061 189.1644 10
# thelatemail.rle(x, 6) 347.79586 353.1015 361.5744 378.3878 469.5869 10
# improved.rle(x, 6) 199.35402 200.7455 205.2737 246.9670 252.4958 10
# carl.seqle(x, 6) 213.72756 240.6023 245.2652 254.1725 259.2275 10
After diff you can check for any consecutive 1s -
numbers = c(1,2,3,5,7,8)
difference = diff(numbers) == 1
## [1] TRUE TRUE FALSE FALSE TRUE
## find alteast one consecutive TRUE
any(tail(difference, -1) &
head(difference, -1))
## [1] TRUE
It's nice to see home-grown solutions here.
Fellow Stack Overflow user Carl Witthoft posted a function he named seqle() and shared it here.
The function looks like this:
seqle <- function(x,incr=1) {
if(!is.numeric(x)) x <- as.numeric(x)
n <- length(x)
y <- x[-1L] != x[-n] + incr
i <- c(which(y|is.na(y)),n)
list(lengths = diff(c(0L,i)),
values = x[head(c(0L,i)+1L,-1L)])
}
Let's see it in action. First, some data:
numbers1 <- c(1, 2, 3, 5, 7, 8)
numbers2 <- c(-2, 2, 3, 5, 6, 7, 8)
numbers3 <- c(1, 2, 2, 2, 1, 2, 3)
Now, the output:
seqle(numbers1)
# $lengths
# [1] 3 1 2
#
# $values
# [1] 1 5 7
#
seqle(numbers2)
# $lengths
# [1] 1 2 4
#
# $values
# [1] -2 2 5
#
seqle(numbers3)
# $lengths
# [1] 2 1 1 3
#
# $values
# [1] 1 2 2 1
#
Of particular interest to you is the "lengths" in the result.
Another interesting point is the incr argument. Here we can set the increment to, say, "2" and look for sequences where the difference between the numbers are two. So, for the first vector, we would expect the sequence of 3, 5, and 7 to be detected.
Let's try:
> seqle(numbers1, incr = 2)
$lengths
[1] 1 1 3 1
$values
[1] 1 2 3 8
So, we can see that we have a sequence of 1 (1), 1 (2), 3 (3, 5, 7), and 1 (8) if we set incr = 2.
How does it work with ECII's second challenge? Seems OK!
> numbers4 <- c(-2, -1, 0, 5, 7, 8)
> seqle(numbers4)
$lengths
[1] 3 1 2
$values
[1] -2 5 7
Simple but works
numbers = c(-2,2,3,4,5,10,6,7,8)
x1<-c(diff(numbers),0)
x2<-c(0,diff(numbers[-1]),0)
x3<-c(0,diff(numbers[c(-1,-2)]),0,0)
rbind(x1,x2,x3)
colSums(rbind(x1,x2,x3) )==3 #Returns TRUE or FALSE where in the vector the consecutive intervals triplet takes place
[1] FALSE TRUE TRUE FALSE FALSE FALSE TRUE FALSE FALSE
sum(colSums(rbind(x1,x2,x3) )==3) #How many triplets of consecutive intervals occur in the vector
[1] 3
which(colSums(rbind(x1,x2,x3) )==3) #Returns the location of the triplets consecutive integers
[1] 2 3 7
Note that this will not work for consecutive negative intervals c(-2,-1,0) because of how diff() works