So I am basically looking for a more efficient way to do this:
c(seq(1, 5, 2), seq(2, 6, 2))
Is there a simpler function built in R or some of the packages that would allow me to specify just one interval (from 1 to 6; instead of having to specify from 1 to 5 and from 2 to 6), but to sort the numbers so that all the odd numbers appear before the even ones?
You can use sequence. The first argument of the function is the length of each sequence, from is the starting point, and by is the interval.
sequence(c(3, 3), from = c(1, 2), by = 2)
#[1] 1 3 5 2 4 6
Or, as a function that fits your request:
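seqOrdered <- function(from = 1, to){
# total number of values in the interval
len <- to - from + 1
# the first sequence gets the extra element when len is odd (e.g. 1 to 5)
sequence(c(ceiling(len / 2), floor(len / 2)), from = c(from, from + 1), by = 2)
}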
seqOrdered(1, 6)
#[1] 1 3 5 2 4 6
Just concatenate the subset of the original data that contains only the odd numbers with the subset that contains the remaining even numbers.
In the following, the original data x1 consists of 10 integers drawn from a Poisson distribution with mean 8 (rpois(n = 10, lambda = 8)). The result combines the subset of odd numbers (x1[x1 %% 2 == 1]) with the subset of even numbers (x1[x1 %% 2 == 0]).
## To prepare data
x1 <- rpois(n = 10, lambda = 8)
x1
## To sort the data so that odd numbers come earlier
c(x1[x1 %% 2 == 1], x1[x1 %% 2 == 0])
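The same idea can be written with a single index expression, which also covers the "one interval" case from the question. This is only a sketch: odd_first is a hypothetical helper name, and it relies on order() leaving tied elements in their original order.

```r
# Hypothetical helper: odd values first, then even values.
odd_first <- function(x) {
  # FALSE (odd) sorts before TRUE (even); ties keep their input order
  x[order(x %% 2 == 0)]
}

odd_first(1:6)
#[1] 1 3 5 2 4 6
```

Because the input order is kept within each group, this works on unsorted data such as x1 as well.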
Can I pass a custom compare function to order that, given two items, indicates which one is ranked higher?
In my specific case I have the following list.
scores <- list(
'a' = c(1, 1, 2, 3, 4, 4),
'b' = c(1, 2, 2, 2, 3, 4),
'c' = c(1, 1, 2, 2, 3, 4),
'd' = c(1, 2, 3, 3, 3, 4)
)
If we take two vectors a and b, the index of the first element i at which a[i] > b[i] or a[i] < b[i] should determine what vector comes first. In this example, scores[['d']] > scores[['a']] because scores[['d']][2] > scores[['a']][2] (note that it doesn't matter that scores[['d']][5] < scores[['a']][5]).
Comparing two of those vectors could look something like this.
compare <- function(a, b) {
# get first element index at which vectors differ
i <- which.max(a != b)
if(a[i] > b[i])
1
else if(a[i] < b[i])
-1
else
0
}
The sorted keys of scores by using this comparison function should then be d, b, a, c.
The other solutions I've found either transform the data before ordering or introduce S3 classes with comparison methods. With the former I fail to see how to transform my data (maybe turn it into strings? But then what about numbers above 9?); with the latter I feel uncomfortable introducing a new class into my R package only for comparing vectors. And order doesn't seem to have the kind of comparator parameter I'd want to pass.
Here's an attempt. I've explained every step in the comments.
compare <- function(a, b) {
# subtract vector b from vector a
comparison <- a - b
# get the first non-zero result
result <- comparison[comparison != 0][1]
# return 1 if a wins (result > 0) and 2 if b wins (result < 0); 0 if equal
if(is.na(result)) {return(0)} else if(result > 0) {return(1)} else {return(2)}
}
compare_list <- function(list_) {
# get combinations of all possible comparison
comparisons <- combn(length(list_), 2)
# compare all possibilities
results <- apply(comparisons, 2, function(x) {
# get the "winner"
x[compare(list_[[x[1]]], list_[[x[2]]])]
})
# get frequency table (how often a vector "won" -> this is the result you want)
fr_tab <- table(results)
# vector that is last in comparison
last_vector <- which(!(seq_along(list_) %in% as.numeric(names(fr_tab))))
# return the sorted results and add the last vectors name
c(as.numeric(names(sort(fr_tab, decreasing = TRUE))), last_vector)
}
If you run the function on your example, the result is
> compare_list(scores)
[1] 4 2 1 3
I haven't dealt with the case where two vectors are identical, since you haven't explained how that should be handled.
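If what you want is literally "order with a comparator", one possible sketch is a small insertion sort that takes the comparison function as an argument. The names order_with and cmp are hypothetical, the comparator is a compact restatement of the one in the question, and this is quadratic in the number of items, so it only suits short lists.

```r
scores <- list(
  a = c(1, 1, 2, 3, 4, 4),
  b = c(1, 2, 2, 2, 3, 4),
  c = c(1, 1, 2, 2, 3, 4),
  d = c(1, 2, 3, 3, 3, 4)
)

# Comparator from the question: 1 if a ranks higher, -1 if b does, 0 if equal
compare <- function(a, b) {
  i <- which.max(a != b)   # first differing position (1 if none differ)
  sign(a[i] - b[i])
}

# Hypothetical helper: indices of items, highest-ranked first
order_with <- function(items, cmp) {
  idx <- seq_along(items)
  for (k in seq_along(idx)[-1]) {
    j <- k
    # move item k left while the item before it ranks strictly lower
    while (j > 1 && cmp(items[[idx[j - 1]]], items[[idx[j]]]) < 0) {
      idx[c(j - 1, j)] <- idx[c(j, j - 1)]
      j <- j - 1
    }
  }
  idx
}

names(scores)[order_with(scores, compare)]
#[1] "d" "b" "a" "c"
```

Identical vectors compare as 0 and are never swapped, so they keep their original relative order.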
The native R way to do this is to introduce an S3 class.
There are two things you can do with the class. You can define a method for xtfrm that converts your list entries to numbers. That could be vectorized, and conceivably could be really fast.
But you were asking for a user defined compare function. This is going to be slow because R function calls are slow, and it's a little clumsy because nobody does it. But following the instructions in the xtfrm help page, here's how to do it:
scores <- list(
'a' = c(1, 1, 2, 3, 4, 4),
'b' = c(1, 2, 2, 2, 3, 4),
'c' = c(1, 1, 2, 2, 3, 4),
'd' = c(1, 2, 3, 3, 3, 4)
)
# Add a class to the list
scores <- structure(scores, class = "lexico")
# Need to keep the class when subsetting
`[.lexico` <- function(x, i, ...) structure(unclass(x)[i], class = "lexico")
# Careful here: identical() might be too strict
`==.lexico` <- function(a, b) {identical(a, b)}
`>.lexico` <- function(a, b) {
a <- a[[1]]
b <- b[[1]]
i <- which(a != b)
length(i) > 0 && a[i[1]] > b[i[1]]
}
is.na.lexico <- function(a) FALSE
sort(scores)
#> $c
#> [1] 1 1 2 2 3 4
#>
#> $a
#> [1] 1 1 2 3 4 4
#>
#> $b
#> [1] 1 2 2 2 3 4
#>
#> $d
#> [1] 1 2 3 3 3 4
#>
#> attr(,"class")
#> [1] "lexico"
Created on 2021-11-27 by the reprex package (v2.0.1)
This is the opposite of the order you asked for, because by default sort() sorts in increasing order. If you really want d, b, a, c, use sort(scores, decreasing = TRUE).
Here's another, very simple solution:
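sort(sapply(scores, function(x) as.numeric(paste(x, collapse = ""))), decreasing = TRUE)
It takes each vector, "compresses" it into a single number by pasting its elements together, and then sorts those numbers in decreasing order. Note that this only gives the right order when every element is a single digit and all vectors have the same length.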
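For values above 9, a sketch that avoids both string conversion and a new class is to pass each position as a separate sort key to order(). The helper name lexico_order is hypothetical, and the code assumes all vectors have the same length.

```r
scores <- list(
  a = c(1, 1, 2, 3, 4, 4),
  b = c(1, 2, 2, 2, 3, 4),
  c = c(1, 1, 2, 2, 3, 4),
  d = c(1, 2, 3, 3, 3, 4)
)

# Hypothetical helper; assumes equal-length numeric vectors
lexico_order <- function(lst, decreasing = TRUE) {
  m <- do.call(rbind, lst)                              # one row per vector
  keys <- lapply(seq_len(ncol(m)), function(j) m[, j])  # one key per position
  # order() breaks ties in the first key with the second key, and so on
  do.call(order, c(keys, list(decreasing = decreasing)))
}

names(scores)[lexico_order(scores)]
#[1] "d" "b" "a" "c"
```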
I am new to R and looking for a way to calculate the h index.
The h index is a popular measure for quantifying scientific productivity.
Formally, if f is the function that corresponds to the number of citations for each publication, we compute the h index as follows:
First we order the values of f from the largest to the lowest value. Then, we look for the last position in which f is greater than or equal to the position (we call this position h).
For example, if we have a researcher with 5 publications A, B, C, D, and E with 10, 8, 5, 4, and 3 citations, respectively, the h index is equal to 4 because the 4th publication has 4 citations and the 5th has only 3. In contrast, if the same publications have 25, 8, 5, 3, and 3 citations, then the index is 3 because the fourth paper has only 3 citations.
Can anyone suggest a smarter way to solve this?
a <- c(10,8,5,4,3)
I expect the output of h index value as 4.
Assuming the input is already sorted, I would use this:
tail(which(a >= seq_along(a)), 1)
# [1] 4
You could, of course, put this in a little function:
h_index = function(cites) {
if(max(cites) == 0) return(0) # assuming this is reasonable
cites = cites[order(cites, decreasing = TRUE)]
tail(which(cites >= seq_along(cites)), 1)
}
a1 = c(10,8, 5, 4, 3)
a2 = c(10, 9, 7, 1, 1)
h_index(a1)
# [1] 4
h_index(a2)
# [1] 3
h_index(1)
# [1] 1
## set this to be 0, not sure if that's what you want
h_index(0)
# [1] 0
I propose a shorter and more flexible function that accepts any numeric vector of citations (sorted or unsorted, with or without zeros, only zeros, etc.):
hindex <- function(x) {
tx <- sort(x, decreasing = TRUE)
# return the value instead of printing it, so the result can be reused
sum(tx >= seq_along(tx))
}
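A quick check of the counting approach against the two worked examples from the question. Once the counts are sorted in decreasing order, the comparison with the position is TRUE for a leading run and FALSE afterwards, so counting the TRUE values gives the last position that qualifies. The function is restated under the hypothetical name h_sketch so the snippet runs on its own.

```r
# Hypothetical name; same counting idea as above
h_sketch <- function(cites) {
  sorted <- sort(cites, decreasing = TRUE)
  # count the positions whose citation count is at least the position
  sum(sorted >= seq_along(sorted))
}

h_sketch(c(10, 8, 5, 4, 3))
# [1] 4
h_sketch(c(25, 8, 5, 3, 3))
# [1] 3
h_sketch(c(0, 0))
# [1] 0
```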
A dplyr version if citation data is in a dataframe (thanks to https://stackoverflow.com/users/5313511/oelshie):
library(dplyr)
a <- data.frame(cites = c(10,8,5,4,3))
b <- a %>%
arrange(desc(cites)) %>%
summarise(h_index = sum(cites >= seq_along(cites)))
b
#   h_index
# 1       4
I have a question about finding index values in a vector.
Let's say I have a vector as follows:
vector <- c(1,2,4,6,8,10)
And, let's say I have the value '5'. I would like to find the maximum index in "vector" such that it is less than or equal to the value 5. In the case of the example above, this index would be 3 (since 4 is less than or equal to 5). Similarly, if instead I had a vector such as:
vector <- c(1,2,4,5,6,8,10)
Then if I were to find a value less than or equal to 5, this index would now be 4 instead of 3.
However, I also want to find the first and last time this index occurs. For example, if I had a vector such as:
vector <- c(1,1,2,2,4,5,5,5,5,6,8,10)
Then the first time this index occurs would be 6 and the last time this index occurs would be 9.
Is there a short, one-line method which would allow me to perform this task? Up until now I have been using the function max(which(....)), however I find that this method is extremely inefficient for large datasets since it will literally list hundreds/thousands of values, so I would like to find a more efficient method if possible which can fit in one line.
Thanks in advance.
You can use the following code:
min(max(which(vector <= 5)), min(which(vector == 5)))
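First, which finds all indices where vector is less than or equal to 5, and max takes the largest one.
Second, it finds all indices where vector is equal to 5 and takes the smallest.
Third, it takes the smaller of these two indices. If 5 does not occur in vector, min(which(vector == 5)) evaluates to Inf (with a warning), so the result is the index from the first step.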
Thanks to all those who replied. I actually found an extremely short, one-line method to do this by downloading the BBmisc package. It has functions called which.last and which.first, and they perform the actions I need. Thanks again for taking the time to reply, I appreciate it.
You can use:
my_ind <- function(vec, num){
ind <- which.max(vec == num) # Check for equality first
if(ind == 1L && vec[1L] != num){
ind <- sum(vec < num) # last index below num (vec sorted); also right when all values are below num
}
ind
}
my_ind(c(1,2,4,6,8,10), 5L) # 3
my_ind(c(1,2,4,5,6,8,10), 5L) # 4
my_ind(c(1,1,2,2,4,5,5,5,5,6,8,10), 5L) # 6
my_ind(c(5,8,10), 5L) # 1
my_ind(c(6,8,10), 5L) # 0 - returns 0 if all(vec > 5L)
I don't see a need for packages here. It seems like the construct which(x == max(x[x <= 5])) would work for you.
x <- c(1, 2, 4, 6, 8, 10)
which(x == max(x[x <= 5]))
# [1] 3
x <- c(1, 2, 4, 5, 6, 8, 10)
which(x == max(x[x <= 5]))
# [1] 4
x <- c(1, 1, 2, 2, 4, 5, 5, 5, 5, 6, 8, 10)
which(x == max(x[x <= 5]))
# [1] 6 7 8 9
And to find the min/max index when there are multiple indices, use head/tail.
head(which(x == max(x[x <= 5])), 1)
# [1] 6
tail(which(x == max(x[x <= 5])), 1)
# [1] 9
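Since the question mentions efficiency on large data, here is a sketch using base R's findInterval(), which does a binary search instead of building the full index vector. It assumes the vector is sorted in increasing order, and index_range is a hypothetical helper name.

```r
x <- c(1, 1, 2, 2, 4, 5, 5, 5, 5, 6, 8, 10)

# Hypothetical helper; assumes x is sorted in increasing order
index_range <- function(x, value) {
  # number of elements <= value, i.e. the last qualifying index
  last <- findInterval(value, x)
  if (last == 0L) return(c(first = 0L, last = 0L))  # nothing is <= value
  # number of elements strictly below the matched value, plus one
  first <- findInterval(x[last], x, left.open = TRUE) + 1L
  c(first = first, last = last)
}

index_range(x, 5)                      # first = 6, last = 9
index_range(c(1, 2, 4, 6, 8, 10), 5)   # first = 3, last = 3
```

Returning 0 for both positions when every value exceeds the target is a design choice for this sketch, not something the question specifies.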
v <- c(1,2,3,4,5,6)
Say I specify max = 4 and min = 2.
I want to know how many values fall outside this range.
I tried (v < 2 & v > 4),
but I'm not sure how to do the count...
After that I will simply compute a percentage with respect to the total number of values (here 6).
You can create and sum a logical vector. TRUE elements count as 1 and FALSE as 0, so this will give you the number of elements matching a particular condition.
v <- c(1, 2, 3, 4, 5, 6)
sum(v < 2 | v > 4)
This returns 3 because there are three values less than 2 or greater than 4. The comparisons are vectorized, so v < 2 tests whether each element of v in turn is less than 2. The OR operator is given by | in R.
To get the proportion of values beyond the range, you can divide the sum by the length of the vector, or alternatively use mean(), since the mean is the sum divided by the length.
mean(v < 2 | v > 4)
You can simply do:
sum(v < 2 | v > 4) / length(v)
# [1] 0.5
You want to use | instead of & because no number will be both less than 2 and greater than 4.
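Putting the count and the percentage together, a minimal sketch could look like this. The helper name outside_range and its lo/hi arguments are hypothetical; values equal to a bound count as inside the range, matching the strict comparisons above.

```r
# Hypothetical helper: count and percentage of values outside [lo, hi]
outside_range <- function(v, lo, hi) {
  out <- v < lo | v > hi            # TRUE where a value falls outside
  c(count = sum(out), percent = 100 * mean(out))
}

outside_range(c(1, 2, 3, 4, 5, 6), lo = 2, hi = 4)   # count = 3, percent = 50
```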