unique pairs or combinations from a vector - r

Where am I going wrong with my function?
I am trying to create a function that counts all the unique pairs in a vector. Say I have the following input:
ar <- c(10, 20, 20, 30, 30, 30, 40, 50)
20 gives 1 pair and 30 gives 1 pair, so I can just sum these up, and the total number of unique pairs is 2.
However, everything I try counts 30 as having 2 unique pairs (since 30 occurs 3 times in the vector).
n <- 9
ar <- c(10, 20, 20, 30, 30, 30, 40, 50)
CountThePairs <- function(n, ar){
  for(i in 1:length(ar)){
    sum = ar[i] - ar[]
    pairs = length(which(sum == 0))
  }
  return(sum)
}
CountThePairs(n = NULL, ar)
Is there an easier way of doing this? I prefer a base R solution but am also interested in package-based ones.
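The loop above goes wrong because `sum` is overwritten on every iteration (so only the differences for the last element are returned) and `pairs` is computed but never used. A minimal loop-based sketch that keeps the question's structure, assuming a "unique pair" means a value that occurs at least twice; the name `count_unique_pairs` is hypothetical:

```r
# Count values that occur at least twice in a vector ("unique pairs").
# count_unique_pairs is a hypothetical name, not taken from the question.
count_unique_pairs <- function(ar) {
  n_pairs <- 0
  for (v in unique(ar)) {
    # how many times this value occurs in the whole vector
    occurrences <- sum(ar == v)
    if (occurrences >= 2) {
      n_pairs <- n_pairs + 1
    }
  }
  n_pairs
}

ar <- c(10, 20, 20, 30, 30, 30, 40, 50)
count_unique_pairs(ar)
# 2
```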

Here's a simpler way using floor and table from base R:
ar <- c(10, 20, 20, 30, 30, 30, 40, 50)
sum(floor(table(ar)/2))
[1] 2
Example 2: adding one more 30 to the vector, so there are now 2 pairs of 30:
ar <- c(10, 20, 20, 30, 30, 30, 30, 40, 50)
sum(floor(table(ar)/2))
[1] 3
If two pairs of 30 count as one "unique" pair, then the original solution by @tmfmnk was correct:
sum(table(ar) >= 2)
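Another base R way to count values that occur at least twice uses duplicated(), which flags every occurrence after the first and so avoids building a table. A sketch, checked against the table-based count on the two vectors from the examples above; the helper name `n_repeated` is hypothetical:

```r
ar1 <- c(10, 20, 20, 30, 30, 30, 40, 50)      # three 30s
ar2 <- c(10, 20, 20, 30, 30, 30, 30, 40, 50)  # four 30s

# duplicated() marks repeats, so the values it marks are exactly
# those that occur at least twice; unique() collapses them to one each
n_repeated <- function(v) length(unique(v[duplicated(v)]))

n_repeated(ar1)        # 2
n_repeated(ar2)        # 2

# same answers as the table-based count of repeated values
sum(table(ar1) >= 2)   # 2
sum(table(ar2) >= 2)   # 2
```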

You could use sapply on the unique values of the vector to return a logical vector indicating whether each value is repeated. The sum of that logical vector is then the number of unique pairs.
ar <- c(10, 20, 20, 30, 30, 30, 40, 50)
is_pair <- sapply(unique(ar), function(x) length(ar[ar == x]) > 1)
sum(is_pair)
#[1] 2
I'm not sure what behaviour you want if there are four 30s: does this still count as one unique pair, or is it now two? If the latter, you would need a slightly different solution:
n_pair <- sapply(unique(ar), function(x) length(ar[ar == x]) %/% 2)
sum(n_pair)
#[1] 2
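A slightly tighter variant of the same idea, as a sketch: sum(ar == x) counts occurrences directly instead of subsetting, and vapply() fixes the result type to integer rather than relying on sapply() to simplify. It is run here on a vector with four 30s so the two interpretations give different answers:

```r
ar <- c(10, 20, 20, 30, 30, 30, 30, 40, 50)  # four 30s

# occurrences of each distinct value (the sum of a logical vector is an integer)
occ <- vapply(unique(ar), function(x) sum(ar == x), integer(1))

sum(occ > 1)      # each repeated value counted once: 2
sum(occ %/% 2L)   # every complete pair counted: 3
```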

Related

Rounding numbers to the nearest values (with different intervals) in R

I want to round (or replace) numbers in a
a <- c(0.505, 1.555, 2.667, 53.850, 411.793)
to the nearest values in b:
b <- c(0, 5, 10, 50, 100, 200, 500)
The expected output is:
a_rnd <- c(0, 0, 5, 50, 500)
The logic is simple, but I couldn't find a solution, because all the approaches I found require the values in b to be equally spaced!
How can I achieve this?
You can use sapply to loop over all values of a, find the index of the nearest b value for each one with which.min, and use those indices to extract the proper b values:
b[sapply(a, function(x) which.min(abs(x - b)))]
#> [1] 0 0 5 50 500
This is a relatively simple approach:
b[apply(abs(outer(a, b, "-")), 1, which.min)]
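Because b is sorted, the nearest value can also be found without a loop or an a-by-b matrix: a value below the midpoint between two neighbouring b values rounds down, and one at or above it rounds up. A sketch using findInterval() on those midpoints, assuming b is sorted in increasing order. One behavioural difference: a value exactly on a midpoint rounds up here, whereas which.min() picks the lower b value.

```r
a <- c(0.505, 1.555, 2.667, 53.850, 411.793)
b <- c(0, 5, 10, 50, 100, 200, 500)  # must be sorted increasingly

# midpoints between consecutive b values: 2.5, 7.5, 30, 75, 150, 350
mids <- (head(b, -1) + tail(b, -1)) / 2

# findInterval() returns how many midpoints are <= each value of a,
# which is the index of the nearest b value minus one
a_rnd <- b[findInterval(a, mids) + 1]
a_rnd
# expected: 0 0 5 50 500
```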

Find the values that are immediately below a given set of values and return the entry from another variable

I have two data frames:
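a <- c(10, 20, 30)
c <- c(1, 50, 100)
df1 <- data.frame(a, c)  # the question's column b is omitted (its values are not given)
x <- c(80, 30, 15)
z <- c(10, 46, 99)
df2 <- data.frame(x, z)  # the question's column y is omitted (its values are not given)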
I want to find the values in c that are immediately below the values in z, and then return the corresponding values in a.
So matching z to c gives me the locations 1, 1, 2, and I want to output the values of a at those locations (i.e. 10, 10, 20).
Edit: For each value in z, I want to find the location of the largest value in c that is below it, then return the value in a at that location.
You can use outer with the comparison <. Then colSums counts the TRUEs in each column, which gives you the answer as long as df1 is sorted by c, i.e.
colSums(outer(df1$c, df2$z, `<`))
#[1] 1 1 2
or
df1$a[colSums(outer(df1$c, df2$z, `<`))]
#[1] 10 10 20
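When df1 is sorted by c, findInterval() with left.open = TRUE returns the same counts (the number of c values strictly below each z) without building a c-by-z matrix. One caveat applies to both approaches: a z value with nothing below it gives index 0, and indexing with 0 silently drops that element. This sketch maps 0 to NA so the output stays aligned with z; the extra z value 0.5 is made up for illustration.

```r
df1 <- data.frame(a = c(10, 20, 30), c = c(1, 50, 100))  # sorted by c
z <- c(10, 46, 99, 0.5)  # the question's z plus a made-up value below every c

# number of c values strictly less than each z (intervals are (c[i], c[i+1]])
idx <- findInterval(z, df1$c, left.open = TRUE)
idx
# 1 1 2 0

# index 0 means "nothing below"; use NA so the result keeps one entry per z
idx[idx == 0] <- NA
df1$a[idx]
# 10 10 20 NA
```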

How to create a combined sequence of constant length but starting at different values in R?

If I have a vector:
c(17,18,19)
And I want to get
c(17:(17+5), 18:(18+5), 19:(19+5))
Or in other words:
c(17, 18, 19, 20, 21, 22, 18, 19, 20, 21, 22, 23, 19, 20, 21, 22, 23, 24)
How would I accomplish this in one line? Is there perhaps an essential R function I am missing? I'm sure this can be done with sapply, but I'm wondering if there is a non-iterative method.
x <- c(17, 18, 19)
c(outer(0:5, x, `+`))
or
rep(x, each = 6) + rep(0:5, 3)
There are probably a few easier ways, but here's an mapply method.
> x <- c(17,18,19)
> c(mapply(seq, from = x, to = x + 5))
# [1] 17 18 19 20 21 22 18 19 20 21 22 23 19 20 21 22 23 24
Or, even quicker:
> c(mapply(`:`, from = x, to = x + 5))
mapply is a multivariate version of sapply: it applies a function to the first elements of each argument, then to the second elements, and so on.
The following actually proved slightly faster than mapply:
> c(sapply(x, function(y) `:`(y, y+5)))
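For the non-iterative method the question asks about, R 4.0.0 and later give sequence() a from argument, so one run of 6 consecutive values per element of x can be generated in a single call. A sketch, assuming R 4.0.0 or newer; sequence() returns integers, so compare with == rather than identical() against a double vector:

```r
x <- c(17, 18, 19)

# one run of length 6 per element of x, each starting at that element
res <- sequence(rep(6L, length(x)), from = x)
res
# 17 18 19 20 21 22 18 19 20 21 22 23 19 20 21 22 23 24
```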

R Transition Matrix

I would like to convert a vector into a transitions matrix.
I have a vector t, which I divided by its maximum value to get values between 0 and 1. I then made this into a matrix:
t <- c(22, 65, 37, 84, 36, 14, 9, 19, 5, 49)
x <- t/max(t)
y <- x%*%t(x)
My problem is that I want the columns of the matrix y to add up to 1, i.e. to make it into a transition matrix, but I'm not sure how to do that. Any suggestions appreciated!
sweep() is a versatile little function that you can use here to divide each column by its own sum:
yy <- sweep(y, MARGIN = 2, STATS = colSums(y), FUN = "/")
## Confirm that the columns of yy all sum to 1
colSums(yy)
## [1] 1 1 1 1 1 1 1 1 1 1
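Base R also has prop.table(), which with margin = 2 divides each column by its column sum, the same operation as the sweep() call. Because y is an outer product, y[i, j] = x[i] * x[j], so after normalising every column equals x / sum(x); the sketch checks that too. The vector is named v here instead of t only to avoid confusion with the transpose function t():

```r
v <- c(22, 65, 37, 84, 36, 14, 9, 19, 5, 49)
x <- v / max(v)
y <- x %*% t(x)              # 10 x 10 outer product

# divide every column by its own sum (margin = 2 means columns)
yy <- prop.table(y, margin = 2)
colSums(yy)                  # all 1, up to floating-point rounding

# every column of yy is the same probability vector x / sum(x)
all.equal(yy[, 1], x / sum(x))
```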

R: find nearest index

I have two vectors, each with a few thousand points, simplified here:
A <- c(10, 20, 30, 40, 50)
b <- c(13, 17, 20)
How can I get the indices of A that are nearest to each value of b? The expected outcome would be c(1, 2, 2).
I know that findInterval only finds the interval each value falls into (the last element of A that is not greater than it), not the nearest element, and I'm aware that which.min(abs(b[2] - A)) is getting warmer, but I can't figure out how to vectorize it to work with long vectors of both A and b.
You can just put your code in sapply. I think this has the same speed as a for loop, so it isn't technically vectorized, though:
sapply(b, function(x) which.min(abs(x - A)))
findInterval gets you very close. You just have to pick between the offset it returns and the next one:
# returns the index of the element of vec nearest to each x (vec must be sorted increasingly)
nearest.vec <- function(x, vec)
{
  smallCandidate <- findInterval(x, vec, all.inside=TRUE)
  largeCandidate <- smallCandidate + 1
  # nudge is TRUE if the large candidate is nearer, FALSE otherwise
  nudge <- 2 * x > vec[smallCandidate] + vec[largeCandidate]
  return(smallCandidate + nudge)
}
nearest.vec(b,A)
returns c(1, 2, 2), and should be comparable to findInterval in performance.
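A quick randomized check that the findInterval approach agrees with the sapply/which.min version. The sizes, seed, and ranges are arbitrary choices for this sketch; A must be sorted, and the comparison assumes A has no repeated values (with duplicates both methods find an equally near element but may report different indices). Some values of b fall outside the range of A to exercise the all.inside = TRUE edge handling. The function below is a compact equivalent of nearest.vec above, repeated so the block runs on its own:

```r
# compact equivalent of nearest.vec from the answer above
nearest.vec <- function(x, vec) {
  small <- findInterval(x, vec, all.inside = TRUE)
  # add 1 where the next element up is strictly nearer
  small + (2 * x > vec[small] + vec[small + 1])
}

set.seed(1)
A <- sort(runif(5000))                     # sorted, practically no ties
b <- runif(2000, min = -0.1, max = 1.1)    # includes values outside range(A)

fast  <- nearest.vec(b, A)
brute <- sapply(b, function(x) which.min(abs(x - A)))
identical(fast, brute)
```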
Here's a solution that uses R's often overlooked outer function. Not sure if it'll perform better, but it does avoid sapply.
A <- c(10, 20, 30, 40, 50)
b <- c(13, 17, 20)
dist <- abs(outer(A, b, '-'))
result <- apply(dist, 2, which.min)
# [1] 1 2 2
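The apply() step can also be replaced by max.col(), which returns the column holding each row's maximum; negating the distances turns that into the nearest element. ties.method = "first" matters here because max.col() breaks ties at random by default. This sketch still builds the full b-by-A distance matrix, so with a few thousand points in each vector it holds millions of values in memory, which the findInterval approach avoids:

```r
A <- c(10, 20, 30, 40, 50)
b <- c(13, 17, 20)

# rows are elements of b, columns are elements of A
d <- abs(outer(b, A, "-"))

# column index of the smallest distance in each row
result <- max.col(-d, ties.method = "first")
result
# 1 2 2
```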
