I have a question about finding index values in a vector.
Let's say I have a vector as follows:
vector <- c(1,2,4,6,8,10)
And, let's say I have the value '5'. I would like to find the maximum index in "vector" such that it is less than or equal to the value 5. In the case of the example above, this index would be 3 (since 4 is less than or equal to 5). Similarly, if instead I had a vector such as:
vector <- c(1,2,4,5,6,8,10)
Then if I were to find a value less than or equal to 5, this index would now be 4 instead of 3.
However, I also want to find the first and last time this index occurs. For example, if I had a vector such as:
vector <- c(1,1,2,2,4,5,5,5,5,6,8,10)
Then the first time this index occurs would be 6 and the last time this index occurs would be 9.
Is there a short, one-line method which would allow me to perform this task? Up until now I have been using the function max(which(....)), however I find that this method is extremely inefficient for large datasets since it will literally list hundreds/thousands of values, so I would like to find a more efficient method if possible which can fit in one line.
Thanks in advance.
You can use the following code:
min(max(which(vector <= 5)), min(which(vector == 5)))
First, it searches all indices where vector is less or equal to 5 with which function, then it takes the maximum one.
Second, it searches all indices where vector is equal to 5 and takes the minimum.
Third, it takes the first of these two indices
Thanks for all those who replied, I actually found an extremely short, one-line method to do this by download a package BBmisc. It has functions called which.last and which.first, and they perform the actions I need. Thanks again for taking the time to reply, I appreciate it.
You can use:
my_ind <- function(vec, num){
ind <- which.max(vec == num) # Check for equality first
if(ind == 1L && vec[1L] != num){
ind <- which.min(vec < num) - 1L
}
ind
}
my_ind(c(1,2,4,6,8,10), 5L) # 3
my_ind(c(1,2,4,5,6,8,10), 5L) # 4
my_ind(c(1,1,2,2,4,5,5,5,5,6,8,10), 5L) # 6
my_ind(c(5,8,10), 5L) # 1
my_ind(c(6,8,10), 5L) # 0 - returns 0 if all(vec > 5L)
I don't see a need for packages here. It seems like the construct which(x == max(x[x <= 5])) would work for you.
x <- c(1, 2, 4, 6, 8, 10)
which(x == max(x[x <= 5]))
# [1] 3
x <- c(1, 2, 4, 5, 6, 8, 10)
which(x == max(x[x <= 5]))
# [1] 4
x <- c(1, 1, 2, 2, 4, 5, 5, 5, 5, 6, 8, 10)
which(x == max(x[x <= 5]))
# [1] 6 7 8 9
And to find the min/max index for multiples indices, use head/tail.
head(which(x == max(x[x <= 5])), 1)
# [1] 6
tail(which(x == max(x[x <= 5])), 1)
# [1] 9
Related
Can I pass a custom compare function to order that, given two items, indicates which one is ranked higher?
In my specific case I have the following list.
scores <- list(
'a' = c(1, 1, 2, 3, 4, 4),
'b' = c(1, 2, 2, 2, 3, 4),
'c' = c(1, 1, 2, 2, 3, 4),
'd' = c(1, 2, 3, 3, 3, 4)
)
If we take two vectors a and b, the index of the first element i at which a[i] > b[i] or a[i] < b[i] should determine what vector comes first. In this example, scores[['d']] > scores[['a']] because scores[['d']][2] > scores[['a']][2] (note that it doesn't matter that scores[['d']][5] < scores[['a']][5]).
Comparing two of those vectors could look something like this.
compare <- function(a, b) {
# get first element index at which vectors differ
i <- which.max(a != b)
if(a[i] > b[i])
1
else if(a[i] < b[i])
-1
else
0
}
The sorted keys of scores by using this comparison function should then be d, b, a, c.
From other solutions I've found, they mess with the data before ordering or introduce S3 classes and apply comparison attributes. With the former I fail to see how to mess with my data (maybe turn it into strings? But then what about numbers above 9?), with the latter I feel uncomfortable introducing a new class into my R package only for comparing vectors. And there doesn't seem to be a sort of comparator parameter I'd want to pass to order.
Here's an attempt. I've explained every step in the comments.
compare <- function(a, b) {
# subtract vector a from vector b
comparison <- a - b
# get the first non-zero result
restult <- comparison[comparison != 0][1]
# return 1 if result == 1 and 2 if result == -1 (0 if equal)
if(is.na(restult)) {return(0)} else if(restult == 1) {return(1)} else {return(2)}
}
compare_list <- function(list_) {
# get combinations of all possible comparison
comparisons <- combn(length(list_), 2)
# compare all possibilities
results <- apply(comparisons, 2, function(x) {
# get the "winner"
x[compare(list_[[x[1]]], list_[[x[2]]])]
})
# get frequency table (how often a vector "won" -> this is the result you want)
fr_tab <- table(results)
# vector that is last in comparison
last_vector <- which(!(1:length(list_) %in% as.numeric(names(fr_tab))))
# return the sorted results and add the last vectors name
c(as.numeric(names(sort(fr_tab, decreasing = T))), last_vector)
}
If you run the function on your example, the result is
> compare_list(scores)
[1] 4 2 1 3
I haven't dealt with the case that the two vectors are identical, you haven't explained how to deal with this.
The native R way to do this is to introduce an S3 class.
There are two things you can do with the class. You can define a method for xtfrm that converts your list entries to numbers. That could be vectorized, and conceivably could be really fast.
But you were asking for a user defined compare function. This is going to be slow because R function calls are slow, and it's a little clumsy because nobody does it. But following the instructions in the xtfrm help page, here's how to do it:
scores <- list(
'a' = c(1, 1, 2, 3, 4, 4),
'b' = c(1, 2, 2, 2, 3, 4),
'c' = c(1, 1, 2, 2, 3, 4),
'd' = c(1, 2, 3, 3, 3, 4)
)
# Add a class to the list
scores <- structure(scores, class = "lexico")
# Need to keep the class when subsetting
`[.lexico` <- function(x, i, ...) structure(unclass(x)[i], class = "lexico")
# Careful here: identical() might be too strict
`==.lexico` <- function(a, b) {identical(a, b)}
`>.lexico` <- function(a, b) {
a <- a[[1]]
b <- b[[1]]
i <- which(a != b)
length(i) > 0 && a[i[1]] > b[i[1]]
}
is.na.lexico <- function(a) FALSE
sort(scores)
#> $c
#> [1] 1 1 2 2 3 4
#>
#> $a
#> [1] 1 1 2 3 4 4
#>
#> $b
#> [1] 1 2 2 2 3 4
#>
#> $d
#> [1] 1 2 3 3 3 4
#>
#> attr(,"class")
#> [1] "lexico"
Created on 2021-11-27 by the reprex package (v2.0.1)
This is the opposite of the order you asked for, because by default sort() sorts to increasing order. If you really want d, b, a, c use sort(scores, decreasing = TRUE.
Here's another, very simple solution:
sort(sapply(scores, function(x) as.numeric(paste(x, collapse = ""))), decreasing = T)
What it does is, it takes all the the vectors, "compresses" them into a single numerical digit and then sorts those numbers in decreasing order.
I have a question I have the following data
c(1, 2, 4, 5, 1, 8, 9)
I set a l = 2 and an u = 6
I want to find all the values in the range (3,7)
How can I do this?
In base R we can use comparison operators to create a logical vector and use that for subsetting the original vector
x[x > 2 & x <= 6]
#[1] 3 5 6
Or using a for loop, initialize an empty vector, loop through the elements of 'x', if the value is between 2 and 6, then concatenate that value to the empty vector
v1 <- c()
for(i in x) {
if(i > 2 & i <= 6) v1 <- c(v1, i)
}
v1
#[1] 3 5 6
data
x <- c(3, 5, 6, 8, 1, 2, 1)
I have a random vector (of numbers 1:5) of length 20. I need to count the number of runs of 1 (i.e. each number that is not followed by the same number), 2 (i.e. 2 consecutive numbers the same), 3 and 4.
I'm trying to write a function that takes x[1] and x[2] and compares them, if they are the same then + 1 to a counting variable. After that, x[1] becomes x[2] and x[2] should become x[3] so it keeps on repeating. How do I make x[2] change to x[3] without assigning it again? Sorry if that doesn't make much sense
This is my first day learning R so please simplify as much as you can so I understand lol..
{
startingnumber <- x[1]
nextnumber <- x[2]
count <- 0
repeat {
if (startingnumber == nextnumber) {
count <- count + 1
startingnumber <- nextnumber
nextnumber <- x[3]
} else {
if (startingnumber != nextnumber) {
break
........
}
}
}
}
As mentioned in the comments, using table() on the rle() lengths is probably the most concise solution
E.g:
x <- c(3, 1, 1, 3, 4, 5, 3, 1, 5, 4, 2, 4, 2, 3, 2, 3, 2, 4, 5, 4)
table(rle(x)$lengths)
# 1 2
# 18 1
# or
v <- c(1, 1, 2, 4, 5, 5, 4, 5, 5, 3, 3, 2, 2, 2, 1, 4, 4, 4, 2, 1)
table(rle(v)$lengths)
# 1 2 3
# 6 4 2
In the first example there's 18 singles and one double (the two 1s near the beginning), for a total of 1*18 + 2*1 = 20 values
In the second example there are 6 singles, 4 doubles, and 2 triples, giving a total of 1*6 + 2*4 + 3*2 = 20 values
But if computational speed is of more importance than concise code, we can do better, as both table() and rle() do computations internally that we don't really need. Instead we can assemble a function that only does the bare minimum.
runlengths <- function(x) {
n <- length(x)
r <- which(x[-1] != x[-n])
rl <- diff(c(0, r, n))
rlu <- sort(unique(rl))
rlt <- tabulate(match(rl, rlu))
names(rlt) <- rlu
as.table(rlt)
}
runlengths(x)
# 1 2
# 18 1
runlengths(v)
# 1 2 3
# 6 4 2
Bonus:
You already know that you can compare individual elements of a vector like this
x[1] == x[2]
x[2] == x[3]
but did you know that you can compare vectors with each other, and that you can select multiple elements from a vector by specifying multiple indices? Together that means we can instead of doing
x[1] == x[2]
x[2] == x[3]
.
.
.
x[18] == x[19]
x[19] == x[20]
do
x[1:19] == x[2:20]
# Or even
x[-length(x)] == x[-1]
Is there a better way to count how many elements of a result satisfy a condition?
a <- c(1:5, 1:-3, 1, 2, 3, 4, 5)
b <- c(6:-8)
u <- a > b
length(u[u == TRUE])
## [1] 7
sum does this directly, counting the number of TRUE values in a logical vector:
sum(u, na.rm=TRUE)
And of course there is no need to construct u for this:
sum(a > b, na.rm=TRUE)
works just as well. sum will return NA by default if any of the values are NA. na.rm=TRUE ignores NA values in the sum (for logical or numeric).
If z consists of only TRUE or FALSE, then simply
length(which(z))
I've always used table for this:
a <- c(1:5, 1:-3, 1, 2, 3, 4, 5)
b <- c(6:-8)
table(a>b)
FALSE TRUE
8 7
I am looking for a nice and fast way of applying some arbitrary function which operates on vectors, such as sum, consecutively to a subvector of consecutive K elements.
Here is one simple example, which should illustrate very clearly what I want:
v <- c(1, 2, 3, 4, 5, 6, 7, 8)
v2 <- myapply(v, sum, group_size=3) # v2 should be equal to c(6, 15, 15)
The function should try to process groups of group_size elements of a given vector and apply a function to each group (treating it as another vector). In this example, the vector v2 is obtained as follows: (1 + 2 + 3) = 6, (4 + 5 + 6) = 15, (7 + 8) = 15. In this case, the K did not divide N exactly, so the last group was of size less then K.
If there is a nicer/faster solution which only works if N is a multiple of K, I would also appreciate it.
Try this:
library(zoo)
rollapply(v, 3, by = 3, sum, partial = TRUE, align = "left")
## [1] 6 15 15
or
apply(matrix(c(v, rep(NA, 3 - length(v) %% 3)), 3), 2, sum, na.rm = TRUE)
## [1] 6 15 15
Also, in the case of sum the last one could be shortened to
colSums(matrix(c(v, rep(0, 3 - length(v) %% 3)), 3))
As #Chase said in a comment, you can create your own grouping variable and then use that. Wrapping that process into a function would look like
myapply <- function(v, fun, group_size=1) {
unname(tapply(v, (seq_along(v)-1) %/% group_size, fun))
}
which gives your results
> myapply(v, sum, group_size=3)
[1] 6 15 15
Note this does not require the length of v to be a multiple of the group_size.
You could try this as well. This works nicely even if you want to include overlapping intervals, as controlled by by, and as a bonus, returns the intervals over which each value is derived:
library (gtools)
v2 <- running(v, fun=sum, width=3, align="left", allow.fewer=TRUE, by=3)
v2
1:3 4:6 7:8
6 15 15