I want to program a function in R that computes the elementary symmetric polynomials. For i = 0, 1, ..., p, the i-th elementary symmetric polynomial is given by $e_i(x_1,\dots,x_p) = \sum_{1 \le j_1 < j_2 < \dots < j_i \le p} x_{j_1} x_{j_2} \cdots x_{j_i}$, with $e_0 = 1$.
How can I code this function in R? I've tried
x<-c(1,2,3,4)
crossprod(x)
# or
for (i in 1:length(x)) print(crossprod((combn(x,i))))
but I don't get the desired result, which is supposed to give
e0= 1
e1= 10
e2= 35
e3= 50
e4= 24
Take the product of each combination using combn(x, k, prod) and then sum that:
sympoly <- function(k, x) sum(combn(x, k, prod))
sapply(0:4, sympoly, 1:4)
## [1] 1 10 35 50 24
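As a cross-check (not from the original answers): by Vieta's formulas, the e_k are the coefficients of the expanded polynomial prod(t + x_i), which can be built by iterated convolution - a short sketch:
elSymPolyVieta <- function(x) {
  # multiply (t + x_i) into the coefficient vector, one root at a time
  Reduce(function(p, r) c(p, 0) + c(0, r * p), x, init = 1)
}
elSymPolyVieta(1:4)
## [1]  1 10 35 50 24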
The solution is not crossprod, it's combn/prod followed by sum.
elSymPoly <- function(x) {
  sapply(c(0, seq_along(x)), function(n) {
    sum(apply(combn(x, n), 2, prod))
  })
}
x <- c(1, 2, 3, 4)
elSymPoly(x)
#[1] 1 10 35 50 24
Note that the function also works with an empty vector (but not with NULL).
y <- integer(0)
elSymPoly(y)
#[1] 1
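By contrast, the NULL case fails inside combn(), which cannot construct a result from a NULL input (the exact error message depends on your R version):
elSymPoly(NULL)
# (throws an error from combn)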
I'm trying to write a function based on the Luhn algorithm (mod 10 algorithm), and I need a function that sums the digits of each number > 9 in my vector individually. E.g. 10 should sum to 1+0 = 1, and 19 should sum to 1+9 = 10. Example code:
nmr <- ("1_9_8_2_0_5_0_1_3_3_4_8")
nmr <- strsplit(nmr, "_")
nmr <- as.numeric(as.character(unlist(nmr[[1]])))
luhn_alg <- c(0,0,2,1,2,1,2,1,2,1,2,0)
x <- nmr*luhn_alg
x
[1] 0 0 16 2 0 5 0 1 6 3 8 0
sum(x)
[1] 41
I don't want the sum of x to equal 41. Instead I want the sum to equal 0+0+1+6+2+0+5+0+1+6+3+8+0 = 32. I tried with a for loop but couldn't get it right. Any help is much appreciated.
You may need to split the data again after multiplying it with luhn_alg.
Luhn_sum <- function(x, y) {
  nmr <- as.numeric(unlist(strsplit(x, "_")))
  x1 <- nmr * y
  # split each product into its individual digits before summing
  x1 <- as.numeric(unlist(strsplit(as.character(x1), '')))
  sum(x1)
}
nmr <- ("1_9_8_2_0_5_0_1_3_3_4_8")
luhn_alg <- c(0,0,2,1,2,1,2,1,2,1,2,0)
Luhn_sum(nmr, luhn_alg)
#[1] 32
You can use substring and seq to create a vector of single-digit numbers; then you only need to sum them:
sum(
  as.numeric(
    substring(
      paste(x, collapse = ""),
      seq(1, sum(nchar(x)), 1),
      seq(1, sum(nchar(x)), 1)
    )
  )
)
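Since each product of a single digit and a weight of at most 2 is at most 18 (two digits), an arithmetic alternative is to digit-sum with integer division and modulo - a sketch using the x from the question:
x <- c(0, 0, 16, 2, 0, 5, 0, 1, 6, 3, 8, 0)  # nmr * luhn_alg from the question
sum(x %/% 10 + x %% 10)                      # tens digit + ones digit of each product
# [1] 32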
I would like to find out the three closest numbers in a vector.
Something like
v = c(10,23,25,26,38,50)
c = findClosest(v,3)
c
23 25 26
I tried with sort(colSums(as.matrix(dist(x))))[1:3], and it kind of works, but it selects the three numbers with the minimum overall distance, not the three closest numbers.
There is already an answer for MATLAB, but I do not know how to translate it to R:
% finds the index with the minimal difference in A
minDiffInd = find(abs(diff(A)) == min(abs(diff(A))));
% extract this index, and its neighbor index, from A
val1 = A(minDiffInd);
val2 = A(minDiffInd+1);
How to find two closest (nearest) values within a vector in MATLAB?
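For reference, a near-literal R translation of that MATLAB snippet (a sketch; it finds the two closest values only, and assumes the vector is sorted first):
A <- sort(v)
minDiffInd <- which(abs(diff(A)) == min(abs(diff(A))))
val1 <- A[minDiffInd]
val2 <- A[minDiffInd + 1]
c(val1, val2)
# [1] 25 26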
My assumption is that, for the n nearest values, the only thing that matters is the difference v[i] - v[i - (n - 1)] in the sorted vector. That is, finding the minimum of diff(x, lag = n - 1L).
findClosest <- function(x, n) {
  x <- sort(x)
  x[seq.int(which.min(diff(x, lag = n - 1L)), length.out = n)]
}
findClosest(v, 3L)
[1] 23 25 26
Let's define "nearest numbers" as "numbers with minimal sum of L1 distances". You can achieve what you want with a combination of diff and a windowed sum.
You could write a much shorter function but I wrote it step by step to make it easier to follow.
v <- c(10,23,25,26,38,50)
#' Find the n nearest numbers in a vector
#'
#' @param v Numeric vector
#' @param n Number of nearest numbers to extract
#'
#' @details "Nearest numbers" defined as the numbers which minimise the
#'   within-group sum of L1 distances.
#'
findClosest <- function(v, n) {
  # Sort and remove NA
  v <- sort(v, na.last = NA)
  # Compute L1 distances between closest points. We know each point is next to
  # its closest neighbour since we sorted.
  delta <- diff(v)
  # Compute the sum of L1 distances on a rolling window with n - 1 elements.
  # Why n - 1? Because we are looking at deltas, and 2 deltas ~ 3 elements.
  withingroup_distances <- zoo::rollsum(delta, k = n - 1)
  # Now it's simply finding the group with the minimum within-group sum
  # and working out the elements
  group_index <- which.min(withingroup_distances)
  element_indices <- group_index + 0:(n - 1)
  v[element_indices]
}
findClosest(v, 2)
# 25 26
findClosest(v, 3)
# 23 25 26
A base R option: the idea is to first sort the vector, subtract every ith element from the (i + n - 1)th element in the sorted vector, and select the group with the minimum difference.
closest_n_vectors <- function(v, n) {
  v1 <- sort(v)
  inds <- which.min(sapply(head(seq_along(v1), -(n - 1)), function(x)
    v1[x + n - 1] - v1[x]))
  v1[inds:(inds + n - 1)]
}
closest_n_vectors(v, 3)
#[1] 23 25 26
closest_n_vectors(c(2, 10, 1, 20, 4, 5, 23), 2)
#[1] 1 2
closest_n_vectors(c(19, 23, 45, 67, 89, 65, 1), 2)
#[1] 65 67
closest_n_vectors(c(19, 23, 45, 67, 89, 65, 1), 3)
#[1] 1 19 23
In case of a tie this will return the numbers with the smallest values, since we are using which.min.
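A quick illustration of that tie behaviour, using the function above:
closest_n_vectors(c(1, 2, 10, 11), 2)
#[1] 1 2  (both pairs have a gap of 1; which.min picks the first group, i.e. the smallest values)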
BENCHMARKS
Since we have got quite a few answers, it is worth benchmarking all the solutions so far (each function suffixed with its answerer's name):
set.seed(1234)
x <- sample(100000000, 100000)
res <- list(findClosest_antoine(x, 3), findClosest_Sotos(x, 3),
            closest_n_vectors_Ronak(x, 3), findClosest_Cole(x, 3))
# identical() compares only two objects at a time, so check each result against the first
all(sapply(res[-1], identical, res[[1]]))
#[1] TRUE
microbenchmark::microbenchmark(
  antoine = findClosest_antoine(x, 3),
  Sotos = findClosest_Sotos(x, 3),
  Ronak = closest_n_vectors_Ronak(x, 3),
  Cole = findClosest_Cole(x, 3),
  times = 10
)
#Unit: milliseconds
#    expr      min       lq     mean   median       uq      max neval cld
# antoine  148.751  159.071  163.298  162.581  167.365  181.314    10  b
#   Sotos 1086.098 1349.762 1372.232 1398.211 1453.217 1553.945    10   c
#   Ronak   54.248   56.870   78.886   83.129   94.748  100.299    10 a
#    Cole    4.958    5.042    6.202    6.047    7.386    7.915    10 a
An idea is to use the zoo library to do a rolling operation, i.e.
library(zoo)
m1 <- rollapply(v, 3, by = 1, function(i) c(sum(diff(i)), c(i)))
m1[which.min(m1[, 1]), ][-1]
#[1] 23 25 26
Or make it into a function,
findClosest <- function(vec, n) {
  require(zoo)
  vec1 <- sort(vec)
  m1 <- rollapply(vec1, n, by = 1, function(i) c(sum(diff(i)), c(i)))
  return(m1[which.min(m1[, 1]), ][-1])
}
findClosest(v, 3)
#[1] 23 25 26
For use in a data frame:
data %>%
  group_by(var1, var2) %>%
  do(data.frame(findClosest(.$val, 3)))
This is my function:
g <- function(x, y) {
  x <- (x - y):x
  y <- 1:30             # (y is always fixed to 1:30)
  z <- outer(x, y, fv)  # (fv is a previously defined function)
  s <- colSums(z)
  which(s == max(s), arr.ind = TRUE)
}
It tells me the position of the max value in s. I basically have a problem in choosing y, because for a small y the value max(s) appears more than once in s. For example:
#given x=53
> g(53,1)
[1] 13 16 20 22 25 26 27
> g(53,2)
[1] 20 25 26
> g(53,3)
[1] 20 25 26
> g(53,4)
[1] 20 25 26
> g(53,5)
[1] 20 25
> g(53,6)
[1] 25 -----> This is the only result I would like from my function (right y=6)
Another example:
# given x=71
> g(71,1)
[1] 7 9 14
> g(71,2)
[1] 7 14
> g(71,3)
[1] 14 -----> my desired result (right y=3)
Therefore, I would like a function that returns the first unique solution for the smallest possible y (e.g. g(53)=25, g(71)=14, ...). Any help? Thanks
This is a simplified example; I hope it makes the question clearer:
#The idea is the same:
n <- 1:9
e <- rep(n, 500)
p <- sample(e)  # (need to sample in order to have more maxima later - a mixed matrix)
mat <- matrix(p, 90)
g <- function(x, y) {
  x <- (x - y):x
  k <- rowSums(mat[, x])
  which(k == max(k), arr.ind = TRUE)
}
# In my sample matrix:
k <- rowSums(mat[,44:45])
which(k==max(k), arr.ind=T)
[1] 44 71 90
# In fact:
g(45,1)
[1] 44 71 90 # ---> more than one solution
g(45,2)
[1] 90 # ----> I would like to pick up this value, which is the first unique solution given x=45
Therefore, I would like a function returning the first unique solution for the smallest possible y given x (in this new example: g(45)=90, ...).
I got it. It is a bit long, but I think it is right. Taking into consideration the second simplified example:
g <- function(x, y) {
  x <- (x - y):x
  k <- rowSums(mat[, x])
  q <- which(k == max(k), arr.ind = TRUE)
  length(q)
}
gv <- Vectorize(g)

l <- function(x) {
  y <- 1:30          # (up to 30 to be sure)
  z <- outer(x, y, gv)
  y <- which.min(z)  # (the minimum surely has length 1, and which.min takes the first)
  x <- (x - y):x
  k <- rowSums(mat[, x])
  q <- which(k == max(k), arr.ind = TRUE)
  q
}
l(45)
[1] 90
It seems like you could just do this with a recursive function. Consider the following:
set.seed(42)
n = 1:9
e = rep(n, 500)
p = sample(e)
mat = matrix(p, 90)
g <- function(x, y = 1) {
  xv <- (x - y):x
  k <- rowSums(mat[, xv])
  i <- which(k == max(k), arr.ind = TRUE)
  n <- length(i)
  if (n == 1) {
    return(y)   # we want to know the min y that solves the problem, right?
  } else {
    y <- y + 1  # increase y by 1
    g(x, y)     # run our function again with the new value of y
  }
}
You should now be able to run g(45) and get 1 as the result, since that is the value of y that solves the problem, and g(33) to get 2.
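If you want the unique row index itself (e.g. 90 for x = 45, as in the question's l()) rather than the minimal y, the same recursion can return the index instead - a sketch, not from the original answer:
g_val <- function(x, y = 1) {
  k <- rowSums(mat[, (x - y):x])
  i <- which(k == max(k))
  if (length(i) == 1) i else g_val(x, y + 1)  # widen the window until the max is unique
}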
I have a (13*122) x 14 matrix (122 stacked 13 x 14 matrices), which I made into a list of 122 individual 13 x 14 matrices.
set.seed(1)
mat = matrix(rnorm(13*122*14,0,1),(13*122),14)
I have another matrix that is 122 x 14.
beta = matrix(rnorm(122*14,0,1),122,14)
I want to multiply each stacked matrix by the corresponding row in beta, so the first 13 x 14 matrix would get multiplied by beta[1,] (which is 14 x 1), giving a 13 x 1 result, etc.
Should I do this with a list or is it unnecessary? I would like it to be as fast as possible.
I want to return a 13 x 122 matrix.
We could split the matrix into a 'list' of length 122 and use mapply to do the %*% of corresponding elements of 'lst' and rows of 'beta':
lst <- lapply(split(1:nrow(mat), (1:nrow(mat) - 1) %/% 13 + 1),
              function(i) mat[i, ])
res <- mapply(`%*%`, lst, split(beta, row(beta)))
dim(res)
#[1] 13 122
Or we could convert the matrix to an array and then do the multiplication, which I guess would be fast:
mat1 <- mat  # if we need a copy of the original matrix
dim(mat1) <- c(13, 122, 14)
mat2 <- aperm(mat1, c(1, 3, 2))
res2 <- matrix(, ncol = 122, nrow = 13)
for (i in 1:(dim(mat2)[3])) {
  res2[, i] <- mat2[, , i] %*% beta[i, ]
}
all.equal(res, res2, check.attributes=FALSE)
#[1] TRUE
Try this:
mat <- lapply(1:122, function(x) matrix(data = rnorm(13*14,0,1), nrow = 13, ncol = 14))
mat2 <- lapply(1:122, function(x) mat[[x]] %*% beta[x,])
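Note that this snippet regenerates mat as a fresh random list rather than reusing the question's data; to apply the same idea to the original stacked (13*122) x 14 matrix, you could reuse the split from the first answer - a sketch, assuming mat is still the stacked matrix:
lst <- lapply(split(1:nrow(mat), (1:nrow(mat) - 1) %/% 13 + 1),
              function(i) mat[i, ])
res3 <- sapply(1:122, function(j) lst[[j]] %*% beta[j, ])
dim(res3)
# [1] 13 122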
Look up the book Introduction to Algorithms, page 331. There is a pseudo-algorithm for this: you build a tree of matrix products that sorts the multiplications into an optimal order. In short, if you have three matrices M1 of m x n, M2 of n x v, and M3 of v x w, and you wish to know whether (M1 * M2) * M3 or M1 * (M2 * M3) is better, the answer is to calculate the two numbers m*n*v and n*v*w and decide which is bigger; the smaller one is always better.
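For a rough illustration of that comparison in R (a sketch with made-up dimensions; the full scalar-multiplication costs are m*n*v + m*v*w versus n*v*w + m*n*w):
m <- 10; n <- 100; v <- 5; w <- 50
cost_left  <- m * n * v + m * v * w  # cost of (M1 %*% M2) %*% M3
cost_right <- n * v * w + m * n * w  # cost of M1 %*% (M2 %*% M3)
c(left = cost_left, right = cost_right)
#  left right
#  7500 75000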
Sorry in advance if "inversion score" isn't the proper terminology. Here's a wiki entry.
Consider a list of values, for instance
1 2 3 4 7 6 9 10 8
would have three penalties (a score of 3):
The 6 comes after 7
The 8 comes after 9
The 8 comes after 10
How can I calculate this inversion for a given vector of numbers in R? Note that some values will be NA, and I just want to skip these.
Your "inversion score" is a central component of Kendall's tau statistic. According to Wikipedia (see link), the tau statistic is (# concordant pairs-#discordant pairs)/(n*(n-1)/2). I believe that what R reports as T is the number of concordant pairs. Therefore, we should be able to reconstruct the number of discordant pairs (which I think is what you want) via n*(n-1)/2-T, as follows
x <- c(1,2,3,4,7,6,9,10,8)
(cc <- cor.test(sort(x),x,method="kendall"))
## Kendall's rank correlation tau
## data: sort(x) and x
## T = 33, p-value = 0.0008543
## alternative hypothesis: true tau is not equal to 0
## sample estimates:
## tau
## 0.8333333
So this function should work:
ff <- function(x) {
  cc <- cor.test(sort(x), x, method = "kendall")
  n <- length(x)
  n * (n - 1) / 2 - unname(cc$statistic["T"])
}
ff(x) is 3 as requested (it would be good if you gave more examples of desired output ...). I haven't checked speed, but this has the advantage of being implemented in underlying C code.
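One caveat: the question mentions NA values, which ff() as written doesn't handle - sort() drops NAs while x keeps them, so cor.test() sees vectors of different lengths. A minimal tweak, assuming dropping NAs is acceptable:
ff_na <- function(x) {
  x <- na.omit(x)  # skip NAs, as the question requests
  cc <- cor.test(sort(x), x, method = "kendall")
  n <- length(x)
  n * (n - 1) / 2 - unname(cc$statistic["T"])
}
ff_na(c(1, 2, 3, 4, 7, 6, 9, 10, 8, NA))
## [1] 3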
I quickly came up with two strategies: a naive one, and a more clever one using the outer function. We look at two vectors of numbers, A and B, where A is your example.
A <- scan(text = "1 2 3 4 7 6 9 10 8")
B <- sample(1:2321)
Define and try the naive inversion counting:
simpleInversion <- function(A) {
  sum <- 0
  n <- length(A)
  for (i in 1:(n - 1)) {
    for (j in (i + 1):n) {
      sum <- sum + (A[i] > A[j])
    }
  }
  return(sum)
}
simpleInversion(A)
simpleInversion(B)
Define and try the slightly more clever inversion counting:
cleverInversion <- function(A) {
  tab <- outer(A, A, FUN = ">")
  return(sum(tab[upper.tri(tab)]))
}
cleverInversion(A)
cleverInversion(B)
For the version which ignores NAs we can simply add an na.omit:
cleverInversion2 <- function(A) {
  AA <- na.omit(A)
  tab <- outer(AA, AA, FUN = ">")
  return(sum(tab[upper.tri(tab)]))
}
A[2] <- NA
cleverInversion2(A)
Hope this helps.
Edit: A faster version
Both functions quickly become slow as the size of the vector grows, so I came up with a faster version:
fastInversion <- function(A) {
  return(sum(cbind(1, -1) %*% combn(na.omit(A), 2) > 0))
}
C <- sample(c(1:500, NA))
library("microbenchmark")
microbenchmark(
  simpleInversion(C),
  cleverInversion(C),
  fastInversion(C))
#Unit: microseconds
#               expr        min         lq     median         uq        max neval
# simpleInversion(C) 128538.770 130483.626 133999.272 144660.116 185767.208   100
# cleverInversion(C)   9546.897   9893.358  10513.799  12564.298  17041.789   100
#   fastInversion(C)    104.632    114.229    193.144    198.209    324.614   100
So we gain quite a speed-up of nearly two orders of magnitude. The speed-up is even greater for larger vectors.
You could test each pair of values in your vector, counting the number that are inverted:
inversion.score <- function(vec) {
  sum(apply(combn(length(vec), 2), 2, function(x) vec[x[2]] < vec[x[1]]), na.rm = TRUE)
}
inversion.score(c(1, 2, 3, 7, 6, 9, 10, 8, NA))
# [1] 3