Kendall tau distance (a.k.a. bubble-sort distance) between permutations in base R

How can the Kendall tau distance (a.k.a. bubble-sort distance) between two permutations be calculated in R without loading additional libraries?

Here is an O(n log n) implementation pieced together after some reading, but I suspect there may be better R solutions.
inversionNumber <- function(x){
  # Merge sort that also counts inversions while merging
  mergeSort <- function(x){
    if(length(x) == 1){
      inv <- 0
    } else {
      n <- length(x)
      n1 <- ceiling(n/2)
      n2 <- n-n1
      y1 <- mergeSort(x[1:n1])
      y2 <- mergeSort(x[n1+1:n2])  # n1+(1:n2), i.e. the second half
      inv <- y1$inversions + y2$inversions
      x1 <- y1$sortedVector
      x2 <- y2$sortedVector
      i1 <- 1
      i2 <- 1
      while(i1+i2 <= n1+n2+1){
        if(i2 > n2 || (i1 <= n1 && x1[i1] <= x2[i2])){ # ***
          x[i1+i2-1] <- x1[i1]
          i1 <- i1 + 1
        } else {
          # x2[i2] jumps ahead of every element still left in x1
          inv <- inv + n1 + 1 - i1
          x[i1+i2-1] <- x2[i2]
          i2 <- i2 + 1
        }
      }
    }
    return (list(inversions=inv,sortedVector=x))
  }
  r <- mergeSort(x)
  return (r$inversions)
}
kendallTauDistance <- function(x,y){
  return(inversionNumber(order(x)[rank(y)]))
}
If you need custom tie-breaking, adjust the last condition on the line marked # ***.
Usage:
> kendallTauDistance(c(1,2,4,3),c(2,3,1,4))
[1] 3

You could use
(choose(length(x),2) - cov(x,y,method='kendall')/2)/2
if you know that neither of the input vectors x and y contains duplicates.
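As a sanity check, the cov() shortcut can be compared against a direct O(n^2) count of discordant pairs. This is only an illustrative sketch: the function names kendallTauNaive and kendallTauCov are made up here, and it assumes x and y are tie-free numeric vectors of equal length (at least 2).

```r
# Count index pairs i < j that x and y order in opposite ways.
kendallTauNaive <- function(x, y) {
  idx <- combn(length(x), 2)          # all index pairs, one per column
  dx <- x[idx[1, ]] - x[idx[2, ]]
  dy <- y[idx[1, ]] - y[idx[2, ]]
  sum(dx * dy < 0)
}

# The cov() shortcut quoted above, wrapped in a function.
kendallTauCov <- function(x, y) {
  (choose(length(x), 2) - cov(x, y, method = "kendall") / 2) / 2
}

x <- c(1, 2, 4, 3)
y <- c(2, 3, 1, 4)
kendallTauNaive(x, y)   # 3 discordant pairs
kendallTauCov(x, y)     # should agree with the naive count
```

The naive version is far too slow for long vectors, but it is handy for testing a faster implementation on small inputs.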

I have been working on exactly the same thing.
Below is my code in Python.
from collections import OrderedDict

def invert(u):
    # Inverse of the permutation u, relative to its sorted (identity) order
    identity = sorted(u)
    ui = []
    for x in identity:
        index = u.index(x)
        ui.append(identity[index])
    print("Given U is:\n", u)
    print("Inverse of U is:\n", ui)
    return identity, ui

def r_vector(x, y, id):
    # Compose the two permutations: r[i] = y[x[i]]
    id_x_Map = OrderedDict(zip(id, x))
    id_y_Map = OrderedDict(zip(id, y))
    r = []
    for x_index, x_value in id_x_Map.items():
        for y_index, y_value in id_y_Map.items():
            if x_value == y_index:
                r.append(y_value)
    print(r)
    return r

def xr_vector(x):
    # For each value, count the smaller values to its right (its inversions)
    values_checked = []
    unordered_xr = []
    ordered_xr = []
    for value in x:
        values_to_right = []
        for n in x[x.index(value)+1:]:
            values_to_right.append(n)
        result = [i for i in values_to_right if i < value]
        if len(result) != 0:
            values_checked.append(value)
            unordered_xr.append(len(result))
    value_ltValuePair = OrderedDict(zip(values_checked, unordered_xr))
    for key in sorted(value_ltValuePair):
        ordered_xr.append(value_ltValuePair[key])
    print("Xr= ", ordered_xr)
    print("Kendall tau distance = ", sum(ordered_xr))

if __name__ == '__main__':
    # Note: the tokens are compared as strings, so "10" sorts before "2"
    print("***********************************************************")
    print("Enter the first string (U):")
    u = input().split()
    print("Enter the second string (V):")
    v = input().split()
    print("***********************************************************")
    print("Step 1: Find U Inverse")
    identity, uinverse = invert(u)
    print("***********************************************************")
    print("Step 2: Find R = V.UInverse")
    r = r_vector(v, uinverse, identity)
    print("***********************************************************")
    print("Step 3: Finding XR and Kendall tau")
    xr_vector(r)
As for the approach and algorithm behind computing the Kendall tau distance this way, I will either leave it to you or point you to the research paper Optimal Permutation Codes and the Kendall’s τ-Metric.
You can implement the same approach in R.

Related

Slow recursion even with memoization in R

I'm trying to solve Project Euler problem #14.
The main objective is to find the length of a Collatz sequence.
First I solved the problem with a regular loop:
compute <- function(n) {
  result <- 0
  max_chain <- 0
  hashmap <- 1
  for (i in 1:n) {
    chain <- 1
    number <- i
    while (number > 1) {
      if (!is.na(hashmap[number])) {
        chain <- chain + hashmap[number]
        break
      }
      if (number %% 2 == 0) {
        chain <- chain + 1
        number <- number / 2
      } else {
        chain <- chain + 2
        number <- (3 * number + 1) / 2
      }
    }
    hashmap[i] <- chain
    if (chain > max_chain) {
      max_chain <- chain
      result <- i
    }
  }
  return(result)
}
Only 2 seconds for n = 1000000.
I then decided to replace the while loop with recursion:
len_collatz_chain <- function(n, hashmap) {
  get_len <- function(n) {
    if (is.na(hashmap[n])) {
      hashmap[n] <<- ifelse(n %% 2 == 0, 1 + get_len(n / 2), 2 + get_len((3 * n + 1) / 2))
    }
    return(hashmap[n])
  }
  get_len(n)
  return(hashmap)
}
compute <- function(n) {
  result <- 0
  max_chain <- 0
  hashmap <- 1
  for (i in 1:n) {
    hashmap <- len_collatz_chain(i, hashmap)
    print(length(hashmap))
    if (hashmap[i] > max_chain) {
      max_chain <- hashmap[i]
      result <- i
    }
  }
  return(result)
}
This solution works, but it is very slow: almost 1 minute for n = 10000.
I suppose one of the reasons is that R creates the hashmap object each time len_collatz_chain is called.
I know about the Rcpp package, and yes, the first solution works fine, but I can't understand where I'm going wrong.
Any tips?
For comparison, my recursive Python solution runs in 1 second with n = 1000000:
def len_collatz_chain(n: int, hashmap: dict) -> int:
    if n not in hashmap:
        hashmap[n] = 1 + len_collatz_chain(n // 2, hashmap) if n % 2 == 0 else 2 + len_collatz_chain((3 * n + 1) // 2, hashmap)
    return hashmap[n]

def compute(n: int) -> int:
    result, max_chain, hashmap = 0, 0, {1: 1}
    for i in range(2, n):
        chain = len_collatz_chain(i, hashmap)
        if chain > max_chain:
            result, max_chain = i, chain
    return result
The main difference between your R and Python code is that in R you use a vector for the hashmap, while in Python you use a dictionary, and that hashmap is passed many times as a function argument.
In Python, if you pass a dictionary as a function argument, only a reference to the actual data is handed to the called function. This is fast. The called function works on the same data as the caller.
In R, a vector passed as a function argument is copied as soon as the called function modifies it (copy-on-modify). This is potentially slow, but safer in the sense that the called function cannot alter the data of the caller.
This is the main reason that Python is so much faster in your code.
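The difference can be seen in a few lines of base R. The following is only a small illustration with made-up helper names (bump_vector, bump_env): a vector modified inside a function leaves the caller's vector untouched, whereas an environment is shared by reference, much like the Python dictionary.

```r
# Modifying a vector argument only changes the function's own copy.
bump_vector <- function(v) {
  v[1] <- 99
  v
}

# Environments are not copied: the change is visible to the caller.
bump_env <- function(e) {
  e$v[1] <- 99
  invisible(NULL)
}

v <- c(1, 2, 3)
changed <- bump_vector(v)

e <- new.env()
e$v <- c(1, 2, 3)
bump_env(e)

v[1]        # 1: the caller's vector is unchanged
changed[1]  # 99: only the returned copy was modified
e$v[1]      # 99: the environment was modified in place
```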
You can, however, alter the R code slightly so that the hashmap is no longer passed as a function argument:
len_collatz_chain <- local({
  hashmap <- 1L
  get_len <- function(n) {
    if (is.na(hashmap[n])) {
      hashmap[n] <<- ifelse(n %% 2 == 0, 1 + get_len(n / 2), 2 + get_len((3 * n + 1) / 2))
    }
    hashmap[n]
  }
  get_len
})
compute <- function(n) {
  result <- rep(NA_integer_, n)
  for (i in seq_len(n)) {
    result[i] <- len_collatz_chain(i)
  }
  result
}
compute(n=10000)
This makes the R code much faster. (Python will probably still be faster though).
Note that I have also removed the return statements in the R code, as they are not needed and add one level to the call stack.
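If something closer to the Python dictionary is wanted, an environment can serve as the cache. The sketch below is one possible variant, not a benchmarked solution; the name collatz_len and the string keys built with sprintf() are choices made for this illustration.

```r
collatz_len <- local({
  memo <- new.env(hash = TRUE)   # environment used as a hash map
  memo[["1"]] <- 1               # the chain starting at 1 has length 1
  get_len <- function(n) {
    key <- sprintf("%.0f", n)    # string key, e.g. "27"
    cached <- memo[[key]]
    if (!is.null(cached)) {
      return(cached)
    }
    len <- if (n %% 2 == 0) {
      1 + get_len(n / 2)
    } else {
      2 + get_len((3 * n + 1) / 2)   # two Collatz steps at once
    }
    memo[[key]] <- len
    len
  }
  get_len
})

collatz_len(9)    # 20
collatz_len(27)   # 112
```

Unlike a vector indexed by n, the environment only stores the values that were actually visited, so large intermediate terms of a chain do not force a large allocation.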

Error in for loop - attempt to select less than one element in integerOneIndex

I'm trying to translate a C routine from an old sound synthesis program into R, but have indexing issues which I'm struggling to understand (I'm a beginner when it comes to using loops).
The routine creates an exponential lookup table - the vector exptab:
# Define parameters
sinetabsize <- 8192
prop <- 0.8
BP <- 10
BD <- -5
BA <- -1
# Create output vector
exptab <- vector("double", sinetabsize)
# Loop
while(abs(BD) > 0.00001){
  BY = (exp(BP) -1) / (exp(BP*prop)-1)
  if (BY > 2){
    BS = -1
  }
  else{
    BS = 1
  }
  if (BA != BS){
    BD = BD * -0.5
    BA = BS
    BP = BP + BD
  }
  if (BP <= 0){
    BP = 0.001
  }
  BQ = 1 / (exp(BP) - 1)
  incr = 1 / sinetabsize
  x = 0
  stabsize = sinetabsize + 1
  for (i in (1:(stabsize-1))){
    x = x + incr
    exptab [[sinetabsize-i]] = 1 - (BQ * (exp(BP * x) - 1))
  }
}
Running the code gives the error:
Error in exptab[[sinetabsize - i]] <- 1 - (BQ * (exp(BP * x) - 1)) :
attempt to select less than one element in integerOneIndex
Which, I understand from looking at other posts, indicates an indexing problem. But, I'm finding it difficult to work out the exact issue.
I suspect the error may lie in my translation. The original C code for the last few lines is:
for (i=1; i < stabsize;i++){
    x += incr;
    exptab[sinetabsize-i] = 1.0 - (float) (BQ*(exp(BP*x) - 1.0));
}
I had thought the R code for (i in (1:(stabsize-1))) was equivalent to the C code for (i=1; i< stabsize;i++) (i.e. the initial value of i is i = 1, the test is whether i < stabsize, and the increment is +1). But now I'm not so sure.
Any suggestions as to where I'm going wrong would be greatly appreciated!
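Array indexing in R starts at 1. In C it starts at zero. I reckon that's your problem. Can sinetabsize-i ever get to zero? It can: on the last iteration i equals sinetabsize, so the index is 0, which is a valid position in C but not in R.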
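One way to carry the C loop over is to shift every index by one, so that the C positions sinetabsize-1 down to 0 become the R positions sinetabsize down to 1. The following is a minimal sketch of just the inner loop, using a small table and fixed BP and BQ values chosen only for illustration:

```r
sinetabsize <- 8                 # small size, for illustration only
BP <- 1                          # illustrative value, not from the original run
BQ <- 1 / (exp(BP) - 1)
incr <- 1 / sinetabsize
exptab <- vector("double", sinetabsize)

x <- 0
for (i in seq_len(sinetabsize)) {
  x <- x + incr
  # C index (sinetabsize - i) runs from sinetabsize-1 down to 0;
  # adding 1 gives the matching R index, from sinetabsize down to 1.
  exptab[[sinetabsize - i + 1]] <- 1 - BQ * (exp(BP * x) - 1)
}
exptab
```

With this shift the last iteration writes to exptab[[1]] instead of the invalid exptab[[0]].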

How to use the output of an r function in another function?

I want to create a script that calculates probabilities for a role-playing game.
I'm new to programming and I'm stuck on return values and nested functions. What I want is to use the values returned by the first function in the next one.
I have two functions, dice(k, n) and fight(a, b) (for this example, the functions are only partly written):
dice <- function (k, n) {
  if (k > 3 && n > 2){
    a <- 3
    b <- 2
    attack <- sample(1:6, a)
    deff <- sample(1:6, b)
  }
  return(c(attack, deff))
}
So I want to use the vector attack, and deff in the next function:
fight <- function(a, b){
  if (a == 3 && b == 2){
    if (sort(attack,T)[1] > sort(deff,T)[1]){
      n <- n - 1}
    if (sort(attack,T)[1] <= sort(deff,T)[1]) {
      k <- k - 1}
    if (sort(attack,T)[2] > sort(deff,T)[2]) {
      n <- n - 1}
    if (sort(attack,T)[2] <= sort(deff,T)[2]){
      k <- k - 1}
  }
  return(c(k, n))
}
But this gives me the following error:
Error in sort(attack, T) : object 'attack' not found
Any ideas? Thanks!
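The error arises because attack and deff exist only inside dice(); once that function returns, fight() cannot see them. One possible way around this is to return the rolls as a named list and pass that list on as an argument. The sketch below is not the asker's final design: the names roll_dice and rolls are invented here, and it assumes that dice may repeat (replace = TRUE), whereas the original sample(1:6, a) draws without replacement.

```r
# Roll a attack dice and b defence dice; return both, sorted high to low.
roll_dice <- function(a, b) {
  list(
    attack = sort(sample(1:6, a, replace = TRUE), decreasing = TRUE),
    deff   = sort(sample(1:6, b, replace = TRUE), decreasing = TRUE)
  )
}

# Compare the dice pairwise: the defender (n) loses a unit when the attack
# die is strictly higher, otherwise the attacker (k) loses one.
fight <- function(k, n, rolls) {
  pairs <- seq_len(min(length(rolls$attack), length(rolls$deff)))
  attacker_wins <- rolls$attack[pairs] > rolls$deff[pairs]
  c(k = k - sum(!attacker_wins), n = n - sum(attacker_wins))
}

rolls <- roll_dice(3, 2)          # output of the first function ...
fight(k = 5, n = 4, rolls)        # ... used as input of the second
```

Passing the result explicitly keeps both functions independent of global variables, so each can be tested on its own.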

R mknapsack function

I ran the R program from an article that uses the mknapsack function from the adagio package, and everything worked. But when I use random values, I get the error "Error condition raised".
This is my program:
n=16
m=5
max=700
min = 10
planks_we_have = floor(runif(n=m, min = 100, max = max))
planks_we_want = floor(runif(n=n, min = min, max = 16))
library(adagio)
# mknapsack calling signature is: mknapsack(values, weights, capacities)
solution <- mknapsack(planks_we_want, planks_we_want, planks_we_have)
# Above I added +1 cm to each length to compensate for the loss when sawing.
solution$ksack
# Now pretty printing what to cut so that we don't make mistakes...
assignment <- data.frame(cut_this = planks_we_have[solution$ksack], into_this = planks_we_want)
t(assignment[order(assignment[,1]), ])
Result:
Warning message:
In mknapsack(planks_we_want, planks_we_want, planks_we_have) :
Error condition raised: check input data ...!
Error in data.frame(cut_this = planks_we_have[solution$ksack], into_this = planks_we_want) :
arguments imply differing number of rows: 0, 5
I don't understand the reason. The source code of the mknapsack function doesn't tell me anything:
function (p, w, k, bck = -1)
{
    stopifnot(is.numeric(p), is.numeric(w), is.numeric(k))
    if (any(w <= 0))
        stop("'weights' must be a vector of positive numbers.")
    if (any(p <= 0))
        stop("'profits' must be a vector of positive numbers.")
    if (any(floor(p) != ceiling(p)) || any(floor(w) != ceiling(w)) ||
        any(floor(k) != ceiling(k)) || any(p >= 2^31) || any(w >=
        2^31) || any(k >= 2^31))
        stop("All inputs must be positive integers < 2^31 !")
    n <- length(p)
    m <- length(k)
    if (length(w) != n)
        stop("Profit 'p' and weight 'w' must be vectors of equal length.")
    xstar <- vector("integer", n)
    vstar <- 0
    num <- 5 * m + 14 * n + 4 * m * n + 3
    wk <- numeric(n)
    iwk <- vector("integer", num)
    S <- .Fortran("mkp", as.integer(n), as.integer(m), as.integer(p),
        as.integer(w), as.integer(k), bs = as.integer(bck),
        xs = as.integer(xstar), vs = as.integer(vstar), as.numeric(wk),
        as.integer(iwk), as.integer(num), PACKAGE = "adagio")
    if (S$vs < 0)
        warning("Error condition raised: check input data ...!")
    return(list(ksack = S$xs, value = S$vs, btracks = S$bs))
}
Versions:
R - 3.4.1
Adagio - 0.7.1
Please read the help page first if you have problems with a function. Looking at the solution returned, it has error code vs=-7, and the help says "vs=-7 if array k is not correctly sorted". Sorting the vector of capacities may give another error, for instance in case all items can be put in one knapsack. Of course, all this depends on the random numbers generated (it is better to fix the random seed before asking).
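Both suggestions can be prepared in base R before mknapsack is called at all. The snippet below is a rough sketch under one loud assumption: that "correctly sorted" means ascending order, which should be checked against the help page. The seed value and the names ord and capacities are arbitrary choices for this illustration.

```r
set.seed(123)   # arbitrary seed, so every run draws the same numbers
m <- 5
planks_we_have <- floor(runif(n = m, min = 100, max = 700))

# ASSUMPTION: ascending order is what the routine expects; see ?mknapsack.
ord <- order(planks_we_have)
capacities <- planks_we_have[ord]
is.unsorted(capacities)   # FALSE

# A knapsack index j reported for the sorted capacities refers to
# planks_we_have[ord[j]] in the original order.
# solution <- mknapsack(planks_we_want, planks_we_want, capacities)
```

Keeping ord around makes it possible to translate the knapsack numbers in the result back to the original planks.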

Complex numbers and missing arguments in R function

I am solving a task for my online R course. The task is to write a function that solves the quadratic equation with the Lagrange resolvents, i.e.:
x1<--p/2+sqrt((p/2)^2-q)
x2<--p/2-sqrt((p/2)^2-q)
The requirements are:
1) If the arguments are non-numeric, the function should return an explanatory error (saying why the error happened).
2) If there are missing arguments, the function should return an explanatory error (different from the default one).
3) If x1 and x2 are complex numbers (for example, if p=-4 and q=7, then x1=2+i*1.73 and x2=2-i*1.73), the function should still solve the equation instead of generating NaNs, and return a warning message that the numbers are complex. Maybe I could somehow cast with as.complex, but I want this to be a special case and don't want to cast in the basic formula.
My function looks like this:
quadraticEquation<-function(p,q){
  if(!is.numeric(c(p,q)))stop("p and q are not numeric") #partly works
  if(is.na(c(p,q)))stop("there are argument/s missing") #does not work
  x1<--p/2+sqrt((p/2)^2-q)
  x2<--p/2-sqrt((p/2)^2-q)
  #x1<--p/2+sqrt(as.complex((p/2)^2-q)) works, but I want to perform this only in case the numbers are complex
  #x2<--p/2-sqrt(as.complex((p/2)^2-q))
  return (c(x1,x2))
}
When testing the function:
quadraticEquation(4,3) #basic case is working
quadraticEquation(TRUE,5) #non-numeric, however the if-statement is not executed, because it assumes that TRUE==1
quadraticEquation(-4,7) #complex number
1) How can I write the function so that it treats TRUE (without quotes) and anything else non-numeric as non-numeric?
2) The basic case works.
3) How can I write the function so that it solves the equation, prints the complex roots and also warns (via warning()) that the roots are complex?
Something like this?
quadraticEquation <- function(p, q){
  ## ------------------------% check the arguments %--------------------------##
  # Stop if any argument is missing. p and q are then tested separately,
  # because is.numeric(c(1, T)) returns TRUE: c() converts to a common type.
  if(missing(p) | missing(q)){
    stop("[!] There are argument/s missing")
  } else if(!is.numeric(p) | !is.numeric(q) | any(is.na(c(p, q)))){
    stop("[!] Argument/s p or/and q are not numeric")
  }
  ## --------------------% main part of the function %------------------------##
  r2 <- p^2 - 4*q                           # calculate r^2
  if(r2 < 0){                               # if r2 < 0, convert it
    warning("equation has complex roots")   # to complex and warn
    r2 <- as.complex(r2)
  }
  # return named roots
  setNames(c(-1, 1) * sqrt(r2)/2 - p/2, c("x1", "x2"))
}
quadraticEquation() # No arguments provided
#Error in quadraticEquation() : [!] There are argument/s missing
quadraticEquation(p = 4) # Argument q is missing
#Error in quadraticEquation(p = 4) : [!] There are argument/s missing
quadraticEquation(p = TRUE, q = 7) # p is logical
#Error in quadraticEquation(p = TRUE, q = 7) :
#[!] Argument/s p or/and q are not numeric
quadraticEquation(p = NA, q = 7) # p is NA
#Error in quadraticEquation(p = NA, q = 7) :
#[!] Argument/s p or/and q are not numeric
quadraticEquation(p = 7, q = -4) # real roots
# x1 x2
#-7.5311289 0.5311289
quadraticEquation(p = -4, q = 7) # complex roots
# x1 x2
#2-1.732051i 2+1.732051i
#Warning message:
#In quadraticEquation(p = -4, q = 7) : equation has complex roots
When you write is.numeric(c(p, q)), R first evaluates c(p, q) before determining whether it is numeric or not. In particular if p = TRUE and q = 3, then c(p, q) is promoted to the higher type: c(1, 3).
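The coercion is easy to verify at the console. The lines below are a small illustration with arbitrary values; the variable names are chosen only for this example.

```r
# A logical on its own is not numeric ...
alone <- is.numeric(TRUE)                 # FALSE

# ... but c() coerces TRUE to 1 before is.numeric() ever sees it.
combined <- c(TRUE, 3)
together <- is.numeric(combined)          # TRUE
class(combined)                           # "numeric"

# Testing each argument separately keeps the original types visible.
separately <- vapply(list(TRUE, 3), is.numeric, logical(1))
separately                                # FALSE TRUE
```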
Here is a vectorized solution, so if p and q are vectors instead of scalars the result is also a vector.
quadraticEquation <- function(p, q) {
  if (missing(p)) {
    stop("`p` is missing.")
  }
  if (missing(q)) {
    stop("`q` is missing.")
  }
  if (!is.numeric(p)) {
    stop("`p` is not numeric.")
  }
  if (!is.numeric(q)) {
    stop("`q` is not numeric.")
  }
  if (anyNA(p)) {
    stop("`p` contains NAs.")
  }
  if (anyNA(q)) {
    stop("`q` contains NAs.")
  }
  R <- p^2 / 4 - q
  if (min(R) < 0) {
    R <- as.complex(R)
    warning("Returning complex values.")
  }
  list(x1 = -p / 2 + sqrt(R),
       x2 = -p / 2 - sqrt(R))
}
Also, you should never write x1<--p/2. Keep spaces around infix operators: x1 <- -p/2.
