Return last match from vector - r

Is there a simple way to get the index of the last match of a vector?
lastInflectionRow<-function(temp){
m<-match(TRUE,temp,nomatch=NA)
m
}
lastInflectionRow(c(TRUE,FALSE,TRUE,FALSE,FALSE))
# returns 1 (the first match), not the last
GOAL: 3

Another simple way could be using max on the index of TRUE elements.
x <- c(TRUE,FALSE,TRUE,FALSE,FALSE)
max(which(x))
#[1] 3

?Position is made for this sort of thing, when using the right=TRUE argument. All of the below should be essentially equivalent.
Position(I, x, right=TRUE)
#[1] 3
Position(identity, x, right=TRUE)
#[1] 3
Position(isTRUE, x, right=TRUE)
#[1] 3
Position(function(x) x, x, right=TRUE)
#[1] 3

We could use == if we are comparing with a single element
tail(which(v1 == TRUE),1)
#[1] 3
The == part is not necessary as the vector is logical
tail(which(v1),1)
#[1] 3
NOTE: Here I am assuming that the OP's vector may not always contain only TRUE/FALSE values, as shown in the example.
If we need to use match, one option is mentioned here
data
v1 <- c(TRUE,FALSE,TRUE,FALSE,FALSE)

If performance is a consideration, then the best way I've found of doing this is
length(x) + 1L - match(TRUE, rev(x))
This is significantly faster, particularly in the general case where one desires the rightmost match for more than one entry.
MatchLast <- function (needles, haystack) # This approach
length(haystack) + 1L - match(needles, rev(haystack))
MaxWhich <- function (needles, haystack) # Ronak Shah's approach
vapply(needles, function (needle) max(which(haystack==needle)), integer(1))
Pos <- function (needles, haystack) # thelatemail's suggestion
vapply(needles, function (needle)
Position(function (x) x == needle, haystack, right=TRUE),
integer(1))
Tail <- function (needles, haystack) # akrun's solution
vapply(needles, function (needle) tail(which(haystack==needle), 1), integer(1))
With Rilkon42's data:
x <- c(TRUE, FALSE, TRUE, FALSE, FALSE)
microbenchmark(MatchLast(TRUE, x), MaxWhich(TRUE, x), Pos(TRUE, x), Tail(TRUE, x))
## function min lq mean median uq max
## MatchLast 10.730 19.1270 175.3851 23.7920 28.458 14757.131
## MaxWhich 11.663 22.1600 275.4657 25.1920 28.224 24355.120
## Pos 25.192 47.5845 194.1296 52.7160 64.612 12890.622
## Tail 39.187 69.7435 223.1278 83.0395 101.233 9223.848
In the more general case:
needles <- 24:45
haystack <- c(45, 45, 44, 44, 43, 43, 42, 42, 41, 41, 40, 40, 39, 39, 38, 38, 37, 37,
36, 36, 35, 35, 34, 34, 33, 33, 32, 32, 31, 31, 30, 30, 29, 29, 28, 28,
27, 27, 26, 26, 25, 25, 24, 24)
microbenchmark(MatchLast(needles, haystack), MaxWhich(needles, haystack),
Pos(needles, haystack), Tail(needles, haystack))
## function min lq mean median uq max
## MatchLast 15.395 30.3240 137.3086 36.8550 48.051 9842.441
## MaxWhich 90.971 102.1665 161.1100 172.3765 214.829 238.854
## Pos 709.563 733.8220 1111.7000 1162.7780 1507.530 1645.383
## Tail 654.981 690.2035 1017.7400 882.6385 1404.197 1595.933
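The approaches above agree when a match exists, but they differ when there is none. The sketch below first walks through the index arithmetic behind the reversed match and then compares the no-match results. The helper name last_match is hypothetical and only used for illustration.

```r
# last_match() is a hypothetical helper name, not taken from the answers above.
last_match <- function(needle, haystack) {
  # match() finds the first hit in the reversed vector;
  # subtracting it from length + 1 maps it back to the original index.
  length(haystack) + 1L - match(needle, rev(haystack))
}

x <- c(TRUE, FALSE, TRUE, FALSE, FALSE)
last_match(TRUE, x)   # rev(x) has its first TRUE at 3, so 5 + 1 - 3 = 3

x_none <- c(FALSE, FALSE)                  # no TRUE at all
last_match(TRUE, x_none)                   # NA
Position(identity, x_none, right = TRUE)   # NA
tail(which(x_none), 1)                     # integer(0)
suppressWarnings(max(which(x_none)))       # -Inf (max of an empty vector warns)
```

If the result is used as a row index later on, NA is usually the easiest of these to test for with is.na().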

Related

How to find GCD for a list of (1:n) numbers in R

How to find GCD for a list of (1:n) numbers in R?
GCD=function(a, b){
m=min(a, b)
while(a%%m>0|b%%m>0){m=m-1}
return(m)}
Here is my code to find the GCD of two integers. How can I modify it to find the GCD of a list of numbers from 1 to n without too many changes to my original code?
Thank you very much!
You can use the any function:
GCD <- function(x) {
m = min(x)
while (any(x %% m > 0)){
m = m - 1
}
return(m)
}
GCD(c(12, 24, 28, 36, 200))
# [1] 4
GCD(c(6, 24, 28, 36, 200))
# [1] 2
You can define a function gcd based on the two-argument GCD from the question, like below
gcd <- function(...) Reduce(GCD,list(...))
and you can try
> gcd(6, 24, 28, 36, 200)
[1] 2
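As a possible alternative to counting down from the minimum, the same Reduce idea can be combined with Euclid's algorithm, which needs far fewer iterations for large inputs. This is only a sketch for positive integers; the names gcd2 and gcd_all are hypothetical and not part of the answers above.

```r
# gcd2() and gcd_all() are hypothetical names used for illustration.
# Euclid's algorithm for two positive integers.
gcd2 <- function(a, b) {
  while (b != 0) {
    r <- a %% b   # remainder of a divided by b
    a <- b
    b <- r
  }
  a
}

# Fold the two-argument function over a whole vector.
gcd_all <- function(x) Reduce(gcd2, x)

gcd_all(c(12, 24, 28, 36, 200))  # 4
gcd_all(c(6, 24, 28, 36, 200))   # 2
gcd_all(1:10)                    # 1, because 1 is one of the numbers
```

Note that for any vector of the form 1:n the result is always 1, since 1 itself is in the list.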

unexpected results when comparing Biostrings subsequences using the identical function

I'm checking if a sequence is present at the beginning and at the end of a longer sequence. I considered using identical but this gives me a surprising result:
library(Biostrings)
EcoRI <- DNAString("GAATTC")
myseq <- DNAString("GAATTCGGGGAAAATTTTCCCCGAATTC")
EcoRI
# 6-letter "DNAString" instance
#seq: GAATTC
subseq(myseq, 1, 6)
# 6-letter "DNAString" instance
#seq: GAATTC
subseq(myseq, 23, 28)
# 6-letter "DNAString" instance
#seq: GAATTC
identical(EcoRI, subseq(myseq, 1, 6))
#TRUE
identical(EcoRI, subseq(myseq, 23, 28))
#FALSE
identical(subseq(myseq, 1, 6), subseq(myseq, 23, 28))
#FALSE
An easy fix is to use:
identical(toString(EcoRI), toString(subseq(myseq, 23, 28)))
# TRUE
But I don't understand why identical on the DNAString objects returns FALSE sometimes.
Does identical also compare the offset attributes?
attributes(EcoRI)$offset
#[1] 0
attributes(subseq(myseq, 1, 6))$offset
#[1] 0
attributes(subseq(myseq, 23, 28))$offset
#[1] 22

Writing a median function in R

I have been tasked with writing my own median function in R, without using the built-in median function. If the number of values is even, average the two middle values, as is usual for the median.
This is something I could probably do in Java, but I struggle with some of the syntax in R.
Code:
list1 <- c(7, 24, 9, 42, 12, 88, 91, 131, 47, 71)
sorted=list1[order(list1)]
sorted
n = length(sorted)
n
if(n%2==0) # problem here, implementing mod() and the rest of logic.
Here is a self-written function mymedian:
mymedian <- function(lst) {
n <- length(lst)
s <- sort(lst)
ifelse(n%%2==1,s[(n+1)/2],mean(s[n/2+0:1]))
}
Example
list1 <- c(7, 24, 9, 42, 12, 88, 91, 131, 47, 71)
list2 <- c(7, 24, 9, 42, 12, 88, 91, 131, 47)
mymedian(list1)
mymedian(list2)
such that
> mymedian(list1)
[1] 44.5
> mymedian(list2)
[1] 42
I believe this should get you the median you're looking for:
homemade_median <- function(vec){
sorted <- sort(vec)
n <- length(sorted)
if(n %% 2 == 0){
mid <- sorted[c(floor(n/2),floor(n/2)+1)]
med <- sum(mid)/2
} else {
med <- sorted[ceiling(n/2)]
}
med
}
homemade_median(list1)
median(list1) # for comparison
A short function that does the trick:
my_median <- function(x){
# Order Vector ascending
x <- sort(x)
# For even length, average the two values surrounding the middle
if((length(x) %% 2) == 0){
return((x[length(x)/2] + x[length(x)/2 + 1]) / 2)
}
# For odd length, just take the value that is right in the center
else{
return(x[(length(x)/2) + 0.5])
}
}
Check to see if it returns desired outcomes:
my_median(list1)
44.5
median(list1)
44.5
#
list2 <- c(1,4,5,90,18)
my_median(list2)
5
median(list2)
5
You don't need to test for evenness; you can just create a sequence from half the length plus one, using floor and ceiling as appropriate:
x <- rnorm(100)
y <- rnorm(101)
my_median <- function(x)
{
mid <- seq(floor((length(x)+1)/2),ceiling((length(x)+1)/2))
mean(sort(x)[mid])
}
my_median(x)
[1] 0.1682606
median(x)
[1] 0.1682606
my_median(y)
[1] 0.2473015
median(y)
[1] 0.2473015
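One difference from the built-in median is worth keeping in mind: sort() drops NA values by default, so the homemade versions above silently ignore missing values, whereas median() returns NA unless na.rm = TRUE is given. A minimal sketch that mirrors the built-in behaviour could look like this; the name safe_median and its na.rm argument are assumptions made for illustration.

```r
# safe_median() is a hypothetical name; na.rm mimics the argument of median().
safe_median <- function(x, na.rm = FALSE) {
  if (anyNA(x)) {
    if (!na.rm) return(NA_real_)   # same result as median() with missing values
    x <- x[!is.na(x)]
  }
  n <- length(x)
  if (n == 0L) return(NA_real_)    # nothing left to take a median of
  s <- sort(x)
  # floor and ceiling pick the same index for odd n, two neighbours for even n
  mean(s[c(floor((n + 1) / 2), ceiling((n + 1) / 2))])
}

safe_median(c(7, 24, 9, 42, 12, 88, 91, 131, 47, 71))  # 44.5
safe_median(c(1, NA, 3))                               # NA
safe_median(c(1, NA, 3), na.rm = TRUE)                 # 2
```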

unique pairs or combinations from a vector

Where am I going wrong with my function?
I am trying to create a function which will count all the unique pairs in a vector, say I have the following input:
ar <- c(10, 20, 20, 30, 30, 30, 40, 50)
The number of unique pairs is 20 = 1, 30 = 1 so I can just sum these up and the total number of unique pairs is 2.
However, everything I try counts 30 as having 2 unique pairs (since 30 occurs 3 times in the vector).
n <- 9
ar <- c(10, 20, 20, 30, 30, 30, 40, 50)
CountThePairs <- function(n, ar){
for(i in 1:length(ar)){
sum = ar[i] - ar[]
pairs = length(which(sum == 0))
}
return(sum)
}
CountThePairs(n = NULL, ar)
Is there an easier way of doing this? I prefer the base R version but interested in package versions also.
Here's a simpler way using floor and table from base R -
ar <- c(10, 20, 20, 30, 30, 30, 40, 50)
sum(floor(table(ar)/2))
[1] 2
Example 2 - Adding one more 30 to vector so now there are 2 pairs of 30 -
ar <- c(10, 20, 20, 30, 30, 30, 30, 40, 50)
sum(floor(table(ar)/2))
[1] 3
If two pairs of 30 count as one "unique" pair, then the original solution by @tmfmnk was correct -
sum(table(ar) >= 2)
You could use sapply on the unique values of the vector to return a logical vector if that value is repeated. The sum of that logical value is then the number of unique pairs.
ar <- c(10, 20, 20, 30, 30, 30, 40, 50)
is_pair <- sapply(unique(ar), function(x) length(ar[ar == x]) > 1)
sum(is_pair)
#[1] 2
I'm not sure what behaviour you want if there are four 30's - does this count as one unique pair still or is it now two? If the latter, you would need a slightly different solution:
n_pair <- sapply(unique(ar), function(x) length(ar[ar == x]) %/% 2)
sum(n_pair)
#[1] 2
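The two interpretations only give different answers once a value occurs four or more times. A small side-by-side sketch, reusing the vector with four 30s from the earlier answer (the variable name counts is just for illustration):

```r
ar <- c(10, 20, 20, 30, 30, 30, 30, 40, 50)
counts <- table(ar)        # occurrences per value: 1, 2, 4, 1, 1

# Every complete pair counts: 20 gives one pair, 30 gives two.
sum(counts %/% 2)          # 3

# Each value counts at most once, however many pairs it forms.
sum(counts >= 2)           # 2
```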

R: Average nearby elements in a vector

I have many vectors such as this: c(28, 30, 50, 55, 99, 102) and I would like to obtain a new vector in which neighbouring elements that differ by less than 10 are averaged. In this case, I would like to obtain c(29, 52.5, 100.5).
Another way
vec <- c(28, 30, 50, 55, 99, 102)
indx <- cumsum(c(0, diff(vec)) > 10)
tapply(vec, indx, mean)
# 0 1 2
# 29.0 52.5 100.5
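The grouping step can be wrapped in a small function. This is a sketch under a few assumptions: the name average_nearby and the gap argument are hypothetical, the input is sorted first, and a new group starts whenever the step to the previous element is at least gap, so that "less than 10" stays in the same group.

```r
# average_nearby() and gap are hypothetical names used for illustration.
average_nearby <- function(vec, gap = 10) {
  vec <- sort(vec)
  # TRUE marks the start of a new group; cumsum turns the marks into group ids
  grp <- cumsum(c(0, diff(vec)) >= gap)
  unname(vapply(split(vec, grp), mean, numeric(1)))
}

average_nearby(c(28, 30, 50, 55, 99, 102))  # 29.0 52.5 100.5
average_nearby(c(1, 9, 17))                 # 9
```

The second call shows that groups chain: 1 and 17 differ by more than 10, but both are within 10 of 9, so all three values end up in one group.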
