Solutions to a system of inequalities in R

Suppose I have the following system of inequalities:
-2x + y <= -3
1.25x + y <= 2.5
y >= -3
I want to find multiple tuples of (x, y) that satisfy the above inequalities.
library(Rglpk)
obj <- numeric(2)
mat <- matrix(c(-2, 1, 1.25, 1, 0, 1), nrow = 3)
dir <- c("<=", "<=", ">=")
rhs <- c(-3, 2.5, -3)
Rglpk_solve_LP(obj = obj, mat = mat, dir = dir, rhs = rhs)
Using the above code only seems to return one possible solution tuple, (1.5, 0). Is it possible to return other solution tuples?
Edit: Based on the comments, I would be interested to learn if there are any functions that could help me find the corner points.

To understand the possible answers to this question, we can solve the system of inequalities graphically.
There is a nice answer on Stack Overflow about plotting inequalities in R. Using that approach, we can plot the following graph:
library(ggplot2)
fun1 <- function(x) 2*x - 3 # this is the same as -2x + y <= -3
fun2 <- function(x) -1.25*x + 2.5 # 1.25x + y <= 2.5
fun3 <- function(x) -3 # y >= -3
x1 = seq(-1,5, by = 1/16)
mydf = data.frame(x1, y1=fun1(x1), y2=fun2(x1),y3= fun3(x1))
mydf <- transform(mydf, z = pmax(y3,pmin(y1,y2)))
ggplot(mydf, aes(x = x1)) +
geom_line(aes(y = y1), colour = 'blue') +
geom_line(aes(y = y2), colour = 'green') +
geom_line(aes(y = y3), colour = 'red') +
geom_ribbon(aes(ymin=y3,ymax = z), fill = 'gray60')
All the possible tuples (there are infinitely many) lie inside the gray triangle.
The vertices can be found using the following code.
obj <- numeric(2)
mat <- matrix(c(-2, 1.25, 1, 1), nrow = 2)
rhs <- matrix(c(-3, 2.5), nrow = 2)
aPoint <- solve(mat, rhs)
mat <- matrix(c(-2, 0, 1, 1), nrow = 2)
rhs <- matrix(c(-3, -3), nrow = 2)
bPoint <- solve(mat, rhs)
mat <- matrix(c(1.25, 0, 1, 1), nrow = 2)
rhs <- matrix(c(2.5, -3), nrow = 2)
cPoint <- solve(mat, rhs)
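Note the order of the matrix entries: matrix() fills column by column, so each vector lists the x coefficients first and then the y coefficients.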
And you get the coordinates:
> aPoint
[,1]
[1,] 1.6923077
[2,] 0.3846154
> bPoint
[,1]
[1,] 0
[2,] -3
> cPoint
[,1]
[1,] 4.4
[2,] -3.0
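As a quick sanity check on the three points above, the sketch below tests a point against all three inequalities. The helper name is_feasible and the tolerance tol are illustrative assumptions, not part of the original answer; the tolerance only absorbs floating-point rounding on the boundary.

```r
# Hypothetical helper: TRUE if (x, y) satisfies all three inequalities,
# allowing a small tolerance for floating-point rounding on the boundary.
is_feasible <- function(x, y, tol = 1e-9) {
  (-2 * x + y <= -3 + tol) &&
    (1.25 * x + y <= 2.5 + tol) &&
    (y >= -3 - tol)
}

# The three vertices, entered as exact fractions / decimals
vertices <- rbind(a = c(22 / 13, 5 / 13), b = c(0, -3), c = c(4.4, -3))
apply(vertices, 1, function(p) is_feasible(p[1], p[2]))

is_feasible(2, -2)  # a point inside the triangle
is_feasible(0, 0)   # a point outside the triangle
```

Any point for which is_feasible() returns TRUE is a valid solution tuple.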

All the code below uses base R only (no need for library(Rglpk)).
1. Corner Points
If you want to get all the corner points, here is one option
A <- matrix(c(-2, 1.25, 0, 1, 1, -1), nrow = 3)
b <- c(-3, 2.5, 3)
# we use `det` to check if the coefficient matrix is singular. If so, we return `NA`.
xh <-
combn(nrow(A), 2, function(k) {
if (det(A[k, ]) == 0) {
rep(NA, length(k))
} else {
solve(A[k, ], b[k])
}
})
# We keep only the points that satisfy all the constraints
corner_points <- t(xh[, colSums(A %*% xh <= b, na.rm = TRUE) == length(b)])
such that
> corner_points
[,1] [,2]
[1,] 1.692308 0.3846154
[2,] 0.000000 -3.0000000
[3,] 4.400000 -3.0000000
2. Possible Tuples
If you want to have multiple tuples, e.g., n=10, we can use Monte Carlo simulation (based on the obtained corner_points in the previous step) to select the tuples under the constraints:
xrange <- range(corner_points[, 1])
yrange <- range(corner_points[, 2])
n <- 10
res <- list()
while (length(res) < n) {
px <- runif(1, xrange[1], xrange[2])
py <- runif(1, yrange[1], yrange[2])
if (all(A %*% c(px, py) <= b)) {
res[length(res) + 1] <- list(c(px, py))
}
}
and you will see n possible tuples in a list like below
> res
[[1]]
[1] 3.643167 -2.425809
[[2]]
[1] 2.039007 -2.174171
[[3]]
[1] 0.4990635 -2.3363637
[[4]]
[1] 0.6168402 -2.6736421
[[5]]
[1] 3.687389 -2.661733
[[6]]
[1] 3.852258 -2.704395
[[7]]
[1] 1.7571062 0.1067597
[[8]]
[1] 3.668024 -2.771307
[[9]]
[1] 2.108187 -1.365349
[[10]]
[1] 2.106528 -2.134310
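An alternative to rejection sampling, sketched here under the assumption that the feasible region is bounded and convex (as it is for this triangle): every convex combination of the corner points is itself feasible, so random weights that sum to 1 give valid tuples without any rejected draws. The corner points are re-entered by hand so the snippet is self-contained; the variable names are illustrative.

```r
set.seed(1)  # only to make the sketch reproducible

A <- matrix(c(-2, 1.25, 0, 1, 1, -1), nrow = 3)
b <- c(-3, 2.5, 3)
corner_points <- rbind(c(22 / 13, 5 / 13), c(0, -3), c(4.4, -3))

n <- 10
# Random non-negative weights, normalised so that each row sums to 1
w <- matrix(rexp(n * nrow(corner_points)), nrow = n)
w <- w / rowSums(w)

# Each row of pts is a convex combination of the corner points
pts <- w %*% corner_points

# Check: every sampled point satisfies A %*% p <= b (up to rounding)
all(A %*% t(pts) <= b + 1e-9)
```

For a triangle, normalised exponential weights spread the points uniformly over the region; for polygons with more vertices the points are still feasible but no longer uniformly distributed.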

First of all, the matrix representing the three equations needs a small correction, because R fills matrices column by column :
-2x + y <= -3
1.25x + y <= 2.5
y >= -3
mat <- matrix(c(-2, 1.25, 0, 1, 1, 1), nrow = 3)
# and not : mat <- matrix(c(-2, 1, 1.25, 1, 0, 1), nrow = 3)
To get different tuples, you could modify the objective function :
obj <- numeric(2) results in an objective function 0 * x + 0 * y which is always equal to 0 and can't be maximized : the first valid x,y will be selected.
Optimization on x is achieved by using obj <- c(1,0), resulting in maximization / minimization of 1 * x + 0 * y.
Optimization on y is achieved by using obj <- c(0,1).
#setting the bounds is necessary, otherwise optimization occurs only for x>=0 and y>=0
bounds <- list(lower = list(ind = c(1L, 2L), val = c(-Inf, -Inf)),
upper = list(ind = c(1L, 2L), val = c(Inf, Inf)))
# finding maximum x: obj = c(1,0), max = TRUE
Rglpk_solve_LP(obj = c(1, 0), mat = mat, dir = dir, rhs = rhs, bounds = bounds, max = TRUE)$solution
# [1] 4.4 -3.0
# finding minimum x: obj = c(1,0), max = FALSE
Rglpk_solve_LP(obj = c(1, 0), mat = mat, dir = dir, rhs = rhs, bounds = bounds, max = FALSE)$solution
#[1] 0 -3
# finding maximum y: obj = c(0,1), max = TRUE
Rglpk_solve_LP(obj = c(0, 1), mat = mat, dir = dir, rhs = rhs, bounds = bounds, max = TRUE)$solution
#[1] 1.6923077 0.3846154
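The three solutions above are exactly the corners of the feasible triangle, which is expected: over a bounded feasible region, a linear objective attains its optimum at a vertex. The base R sketch below illustrates this without Rglpk by evaluating each objective at hand-entered corner points; the names objectives and best are assumptions made for the illustration.

```r
# Corner points of the feasible region, entered by hand
corner_points <- rbind(c(22 / 13, 5 / 13), c(0, -3), c(4.4, -3))

# Minimising x is written as maximising -x
objectives <- list(max_x = c(1, 0), min_x = c(-1, 0), max_y = c(0, 1))

# For each objective, pick the corner with the largest objective value
best <- t(sapply(objectives, function(obj) {
  corner_points[which.max(corner_points %*% obj), ]
}))
best
```

Each row of best should agree with the corresponding Rglpk_solve_LP() solution.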

Related

issue with loop increments of lower than 1 when simulating data

I'm trying to simulate data utilizing a for loop and storing it in a matrix with the following code:
m <- matrix(nrow = 500 , ncol = 7)
for(i in seq(from = 1, to = 4, by = 0.5)){
a <- 1 * i + rnorm(n = 500, mean = 0, sd = 1)
m[, i] <- a
}
But instead of giving me 7 columns with means of roughly 1, 1.5, 2, 2.5, 3, 3.5 and 4, matrix m contains 4 columns with means of roughly 1.5, 2.5, 3.5 and 4, and 3 columns of NA values.
If I change the increments to 1 and run the code below, everything behaves as expected, so the issue seems to be with the increments, but I can't figure out what I should do differently. Help would be most appreciated.
m <- matrix(nrow = 500 , ncol = 7)
for(i in seq(from = 1, to = 7, by = 1)){
a <- 1 * i + rnorm(n = 500, mean = 0, sd = 1)
m[, i] <- a
}
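Column indices must be integers. In your case you try to select column 1.5, which does not exist; R silently truncates the index to 1, so columns get overwritten. You can fix this with a simple calculation that maps i to an integer column index: (i * 2) - 1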
# reduce number of rows for showcase
n <- 100
m <- matrix(nrow = n , ncol = 7)
for(i in seq(from = 1, to = 4, by = 0.5)){
# NOTE: 1*i does not change anything
a <- 1*i + rnorm(n = n, mean = 0, sd = 1)
# make column index integerish
m[, (i * 2) - 1] <- a
}
m[1:5, ]
#> [,1] [,2] [,3] [,4] [,5] [,6] [,7]
#> [1,] 1.15699467 0.8917952 1.999899 2.330557 4.502607 4.469957 5.687460
#> [2,] -1.13634309 1.5394771 1.700148 1.669329 2.124019 3.472836 3.513351
#> [3,] 2.08584731 1.0591743 2.866186 3.192953 3.984286 3.593902 3.983265
#> [4,] 0.02211767 2.2222376 2.055832 2.927851 2.846376 3.411725 3.742966
#> [5,] 0.49167319 2.2244472 2.190050 3.525931 2.841522 5.722172 4.797856
colMeans(m)
#> [1] 0.8537568 1.6805235 1.9907633 2.6434843 2.8651140 3.5499583 3.9757984
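To see why the original loop filled only four columns, here is a small demonstration of how R treats non-integer indices: they are truncated toward zero, as by as.integer(). The vector x is just an illustrative stand-in for the matrix columns.

```r
x <- c("first", "second", "third")
x[1.5]  # same as x[1]
x[2.9]  # same as x[2]

# Column indices actually hit by the original loop
idx <- seq(from = 1, to = 4, by = 0.5)
as.integer(idx)          # 1 1 2 2 3 3 4
unique(as.integer(idx))  # only columns 1 to 4 are ever written
```

Each of columns 1 to 3 is written twice, and the later write (with the larger mean) wins, which matches the means of roughly 1.5, 2.5, 3.5 and 4 reported in the question.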
When you use rnorm, it actually allows vectorized input for the mean value, so you can try the code below (but you should use matrix to fit the output into the desired dimensions of your output matrix)
nr <- 500
nc <- 7
m <- t(matrix(rnorm(nr * nc, seq(1, 4, 0.5), 1), nc, nr))
where you can see, for example
> m[1:5, ]
[,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,] 3.2157776 0.3805689 0.7550255 2.508356 3.567479 2.597378 4.122201
[2,] 0.8634009 0.4887092 2.5655513 1.710756 2.377790 3.733045 4.199812
[3,] -0.1786419 2.4471083 1.2138140 3.090687 2.763694 3.471715 4.676037
[4,] 1.2492511 2.3480447 2.2180039 1.965656 1.505342 3.832380 4.086075
[5,] -0.1301543 1.7463687 1.2467769 2.649525 4.795677 2.606623 4.318468
> colMeans(m)
[1] 0.901146 1.476423 1.900147 2.567463 2.996918 3.468140 4.025929
You're using i as a column index, but i has non-integer values. Only integers can be used for indexing a matrix/df. When i is, say, 1.5 and you use it in the m[,i] expression, it gets coerced to an integer and truncated to 1, so the first 2 runs of your loop write to the same column, the second overwriting the first (and likewise the 3rd and 4th, etc.).
You could simply use your second code and replace 1*i with 0.5 + 0.5*i:
m <- matrix(nrow = 5000 , ncol = 7)
for(i in seq(from = 1, to = 7, by = 1)){
a <- 0.5 + 0.5*i + rnorm(n = 5000, mean = 0, sd = 1)
m[,i] <- a
}
However, it may be better to use the params of the rnorm function to generate values with a specified mean/sd: currently, you are drawing from a normal distribution centered around 0 then shifting it sideways; you could simply tell it to use the mean you actually want.
m <- matrix(nrow = 5000 , ncol = 7)
for(i in seq(from = 1, to = 7, by = 1)){
m[,i] <- rnorm(n = 5000, mean = 0.5 + 0.5*i, sd = 1)
}
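Another way to keep the index and the mean separate, shown as a minimal sketch: store the desired means in a vector and loop over its positions with seq_along(). The names means and k are illustrative, and set.seed() is only there to make the sketch reproducible.

```r
set.seed(42)

means <- seq(from = 1, to = 4, by = 0.5)       # the 7 target means
m <- matrix(nrow = 500, ncol = length(means))

for (k in seq_along(means)) {                  # k is always an integer
  m[, k] <- rnorm(n = 500, mean = means[k], sd = 1)
}

round(colMeans(m), 2)  # should be close to 1, 1.5, ..., 4
```

This avoids any arithmetic on the loop variable, so the same pattern works for an arbitrary vector of means.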

R function to plot inequalities with shading

Suppose I have a set of inequalities:
-2x + y <= -3
1.25x + y <= 2.5
y >= -3
And I can summarize the information as follows:
mat <- matrix(c(-2, 1, 1.25, 1, 0, 1), nrow = 3, byrow = TRUE)
dir <- c("<=", "<=", ">=")
rhs <- c(-3, 2.5, -3)
I wrote the following function to plot the inequalities:
plot(0, 0, xlim = c(-1, 5), ylim = c(-4, 1))
plot_ineq <- function(mat, dir, rhs, xlow, xhigh){
line <- list()
for(i in 1:nrow(mat)){
if(mat[i, 2] > 0){
line[[i]] <- sapply(seq(xlow, xhigh, 0.1), function(x) (rhs[i] - mat[i, 1] * x)/mat[i, 2])
}else if(mat[i, 2] < 0){
line[[i]] <- sapply(seq(xlow, xhigh, 0.1), function(x) (rhs[i] - mat[i, 1] * x)/mat[i, 2])
if(dir[i] == ">="){
dir[i] = "<="
}else dir[i] = ">="
}
lines(seq(xlow, xhigh, 0.1), line[[i]])
}
}
plot_ineq(mat = mat, dir = dir, rhs = rhs, xlow = -1, xhigh = 5)
I have two questions: (1) how can I have a blank plot without having the (0, 0) point there? and (2) how can I shade the corresponding region according to dir? Should I try ggplot2?
I'm simply looking to shade the area that is described by the set of inequalities above. Not where (0, 0) lies.
1) Change the last inequality to be the same direction as the others and then use plotPolytope in gMOIP.
library(gMOIP)
mat <- matrix(c(-2, 1, 1.25, 1, 0, -1), nrow = 3, byrow = TRUE)
rhs <- c(-3, 2.5, 3)
argsFaces <- list(argsGeom_polygon = list(fill = "blue"))
plotPolytope(mat, rhs, argsFaces = argsFaces)
giving (continued after image)
2) The above uses ggplot2 graphics but if you prefer classic graphics then using mat and rhs from above:
library(gMOIP)
cp <- cornerPoints(mat, rhs)
cp <- cp[chull(cp), ] # chull gives indices of convex hull in order
plot(cp, type = "n")
polygon(cp, col = "blue")
# not shown but to add lines run this too
for(i in 1:nrow(cp)) {
ix <- if (i < nrow(cp)) i + 0:1 else c(i, 1)
b <- diff(cp[ix, 2]) / (d <- diff(cp[ix, 1]))
if (abs(d) < 1e-5) abline(v = a <- cp[i, 1])
else abline(a = a <- cp[i, 2] - b * cp[i, 1], b = b)
}
giving (continued after image)
3) Note that there is an archived package named intpoint on CRAN and it could be used to draw the boundary of the feasible region and lines. It does have the limitation that it is hard coded to show X and Y axes between -1 and 5 although it might not be hard to generalize it. It is used like this (output not shown) where mat, rhs and cp are from above.
library(intpoint)
intpoint:::show2d(mat, rhs, c = numeric(2))
polygon(cp, col = "blue")
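If neither package is available, the shaded region can also be sketched in base R alone. The code below assumes the constraints are written as A %*% c(x, y) <= b, that no two boundary lines are parallel (true here, so solve() never sees a singular system), and that the region is bounded. The names A, b, cand and keep are illustrative.

```r
A <- matrix(c(-2, 1, 1.25, 1, 0, -1), nrow = 3, byrow = TRUE)
b <- c(-3, 2.5, 3)

# Intersect every pair of boundary lines (one candidate point per column)
pairs <- combn(nrow(A), 2)
cand <- apply(pairs, 2, function(k) solve(A[k, ], b[k]))

# Keep the candidates that satisfy all constraints (small tolerance for rounding)
keep <- colSums(A %*% cand <= b + 1e-9) == nrow(A)
cp <- t(cand[, keep, drop = FALSE])
cp <- cp[chull(cp), , drop = FALSE]  # order the vertices around the hull

# Empty plot sized to the region, then the shaded polygon
plot(cp, type = "n", xlab = "x", ylab = "y")
polygon(cp, col = "gray80")
```

Because plot() is called with type = "n", nothing is drawn at (0, 0) or anywhere else until polygon() adds the region.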

How can I plot this 'integral' code in R?

f1 <- function(x){integrate(f = function(t){
sqrt(t^3-1)
}, lower = 1, upper = x)}
The domain of x is 1 to 4. f1 always returns an object of class 'integrate' rather than a plain number, and I don't know how to plot this integral function in R.
Thanks to anyone who can help me.
You may need to compute the values of your function f1 and then use an apply function as follows:
f1 <- function(x) {
integrate( function(t) sqrt(t^3-1), lower = 1, upper = x)
}
u <- seq(1, 4, by = 0.1) # Defining a vector of values from 1 to 4 in steps of .1
f1u <- sapply(u, function(x) f1(x)$value) #computing the values of f1 over u
plot(u,f1u, type = "l", xlab = "x", ylab = "f1(x)") # your plot
You can vectorize the upper argument to integrate like so:
vintegrate <- Vectorize(integrate, "upper")
f1 <- function(x) {
unlist(vintegrate(function(t) sqrt(t^3-1), lower = 1, upper = x)[1,])
}
Then you can plot using the curve function in base R:
curve(f1(x), from = 1, to = 4)
Or using ggplot2:
library(ggplot2)
ggplot(data.frame(x = 0)) +
geom_function(fun = f1) +
xlim(1, 4)
Without vectorizing, the upper argument expects a vector of length 1, or else it will error:
integrate(function(t) sqrt(t^3-1), lower = 1, upper = 1:4)
Error in integrate(function(t) sqrt(t^3 - 1), lower = 1, upper = 1:4) :
length(upper) == 1 is not TRUE
After vectorizing:
vintegrate(function(t) sqrt(t^3-1), lower = 1, upper = 1:4)
[,1] [,2] [,3] [,4]
value 0 1.515927 5.356315 11.84309
abs.error 0 0.0001312847 0.0003641383 0.0006563824
subdivisions 1 5 5 5
message "OK" "OK" "OK" "OK"
call Expression Expression Expression Expression
And we use unlist and [1,] to get the value.
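A variant of the same idea, given as a sketch rather than a replacement: vapply() extracts $value for each upper limit and guarantees a numeric vector of the same length as x, so the function can be passed directly to curve(). The name f1_vec is an assumption made for this example.

```r
# Hypothetical vectorized wrapper: one integrate() call per element of x
f1_vec <- function(x) {
  vapply(
    x,
    function(u) integrate(function(t) sqrt(t^3 - 1), lower = 1, upper = u)$value,
    numeric(1)
  )
}

f1_vec(c(1, 2, 3, 4))  # a plain numeric vector, one value per upper limit
```

With this definition, curve(f1_vec(x), from = 1, to = 4) draws the same plot as above.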

Constrained Optimization in R not giving expected results

I have 2 given matrices
a1 <- matrix(c(0.4092951, 0.1611806, 0.4283178, 0.001206529), nrow =
1)
a2 <- matrix(c(0.394223557, 0.140443266, 0.463980790, 0.001352387),
nrow = 1)
I have an initial matrix
b <- matrix(c(0.4095868, 0.1612955, 0.4286231, 0.0004946572,
0, 0.2732351, 0.7260891, 0.0006757670,
0, 0, 0.9909494, 0.0090505527,
0, 0, 0, 1), nrow = 4, byrow = T)
I need to update 'b' such that
a1 %*% b = a2
The above is an optimization problem where the
objective function is to minimize
(a1 %*% b - a2)
which would drive the value of the sum(absolute value(a1 %*% b - a2)) to zero, subject to the constraints:
Lower triangle(b) = 0 ;
RowSum(b) = 1
## creating a data vector with a1 and a2
data = c(as.numeric(a1), as.numeric(a2))
## objective function
min_obj <- function(p){
## Creating a matrix to recreate 'b'
p1 <- matrix(rep(0, 16), nrow = 4)
k = 1
for(i in 1:nrow(p1)){
for (j in 1:ncol(p1)){
if(j >= i){
p1[i,j] <- p[k]
k = k+1
}
}
}
actual <- matrix(data[1:(length(data)/2)], nrow = 1)
pred <- matrix(data[(length(data)/ 2 + 1):length(data)], nrow = 1)
s <- (actual %*% p1) - pred
sum(abs(s))
}
## Initializing the initial values for b taking only non-zero values
init <- b[b>0]
opt <- optim(init, min_obj, control = list(trace = T), method =
"L-BFGS-B", lower = rep(0, length(init)), upper = rep(1,
length(init)))
transformed_b <- matrix(rep(0, 16), nrow = 4)
k = 1
for(i in 1:nrow(transformed_b)){
for (j in 1:ncol(transformed_b)){
if(j >= i){
transformed_b[i,j] <- opt$par[k]
k = k+1
}
}
}
transformed_b
The issue with transformed_b is that rowSum of the matrix is not 1. Any help is highly appreciated.
"optim" is the right choice. Since the row sums have to be 1, there are only 6 parameters, not 10 as in your attempt. The diagonal is uniquely determined by the values strictly above the diagonal.
a1 <- matrix(c(0.4092951, 0.1611806, 0.4283178, 0.001206529), nrow =
1)
a2 <- matrix(c(0.394223557, 0.140443266, 0.463980790, 0.001352387),
nrow = 1)
b <- matrix(c(0.4095868, 0.1612955, 0.4286231, 0.0004946572,
0, 0.2732351, 0.7260891, 0.0006757670,
0, 0, 0.9909494, 0.0090505527,
0, 0, 0, 1), nrow = 4, byrow = T)
#======================================================================
# Build an upper triangular matrix with rowsums 1:
B <- function(x)
{
X <- matrix(c(0,x[1:3],0,0,x[4:5],0,0,0,x[6],rep(0,4)),4,4,byrow=TRUE)
diag(X) <- 1-rowSums(X)
return(X)
}
#----------------------------------------------------------------------
# The function we want to minimize:
f <- function(x)
{
return (sum((a1%*%B(x) - a2)^2))
}
#----------------------------------------------------------------------
#Optimization:
opt <- optim( par = c(b[1,2:4],b[2,3:4],b[3,4]),
fn = f,
lower = rep(0,6),
method = "L-BFGS-B" )
optB <- B(opt$par)
Result:
> optB
[,1] [,2] [,3] [,4]
[1,] 0.9631998 0.03680017 0.0000000 0.0000000000
[2,] 0.0000000 0.77820700 0.2217930 0.0000000000
[3,] 0.0000000 0.00000000 0.9998392 0.0001608464
[4,] 0.0000000 0.00000000 0.0000000 1.0000000000
> a1 %*% optB - a2
[,1] [,2] [,3] [,4]
[1,] 9.411998e-06 5.07363e-05 1.684534e-05 -7.696464e-05
> rowSums(optB)
[1] 1 1 1 1
I chose the sum of squares instead of sum of absolute values, since it is differentiable. This makes it easier for "optim" to find the minimum, I guess.
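To make the parameterization concrete, the sketch below rebuilds B() from the answer and checks, for arbitrary illustrative values of x, that both constraints hold by construction. It also shows a caveat: nothing in this construction keeps the diagonal non-negative, so it is worth inspecting the diagonal of the fitted matrix.

```r
# Same construction as in the answer: 6 free values above the diagonal,
# diagonal filled in so that every row sums to 1
B <- function(x) {
  X <- matrix(c(0, x[1:3], 0, 0, x[4:5], 0, 0, 0, x[6], rep(0, 4)),
              4, 4, byrow = TRUE)
  diag(X) <- 1 - rowSums(X)
  X
}

x <- c(0.1, 0.2, 0.3, 0.1, 0.2, 0.3)  # arbitrary illustrative values
M <- B(x)

all.equal(rowSums(M), rep(1, 4))  # row sums are 1
all(M[lower.tri(M)] == 0)         # lower triangle is 0

# Caveat: large off-diagonal values push the diagonal below zero
diag(B(rep(0.5, 6)))              # first entry is 1 - 1.5 = -0.5
```

If negative diagonal entries are not acceptable, an additional constraint or a penalty term would be needed; the optimum reported above happens to have a non-negative diagonal.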

Is there a weighted.median() function?

I'm looking for something similar in form to weighted.mean(). I've found some solutions via search that write out the entire function but would appreciate something a bit more user friendly.
The following packages all have a function to calculate a weighted median: 'aroma.light', 'isotone', 'limma', 'cwhmisc', 'ergm', 'laeken', 'matrixStats', 'PSCBS', and 'bigvis' (on github).
To find them I used the invaluable findFn() in the 'sos' package, which is an extension of R's built-in help.
findFn('weighted median')
Or,
???'weighted median'
as ??? is a shortcut in the same way ?some.function is for help(some.function)
Some experience using the answers from @wkmor1 and @Jaitropmange.
I've checked 3 functions from 3 packages: isotone, laeken, and matrixStats. Only matrixStats works properly. The other two (just like the median(rep(x, times=w)) solution) give integer output. Since I was calculating the median age of populations, decimal places matter.
Reproducible example. Calculation of the median age of a population
df <- data.frame(age = 0:100,
pop = spline(c(4,7,9,8,7,6,4,3,2,1),n = 101)$y)
library(isotone)
library(laeken)
library(matrixStats)
isotone::weighted.median(df$age,df$pop)
# [1] 36
laeken::weightedMedian(df$age,df$pop)
# [1] 36
matrixStats::weightedMedian(df$age,df$pop)
# [1] 36.164
median(rep(df$age, times=df$pop))
# [1] 35
Summary
matrixStats::weightedMedian() is the reliable solution
To calculate the weighted median of a vector x using a same length vector of (integer) weights w:
median(rep(x, times=w))
This is just a simple solution, ready to use almost anywhere.
weighted.median <- function(x, w) {
w <- w[order(x)]
x <- x[order(x)]
prob <- cumsum(w)/sum(w)
ps <- which(abs(prob - .5) == min(abs(prob - .5)))
return(x[ps])
}
Really old post, but I just came across it and did some testing of the different methods. spatstat::weighted.median() seemed to be about 14 times faster than median(rep(x, times=w)), and it's actually noticeable if you want to run the function more than a couple of times. Testing was with a relatively large survey, about 15,000 people.
One can also use stats::density to create a weighted PDF, then convert this to a CDF, as elaborated here:
my_wtd_q = function(x, w, prob, n = 4096)
with(density(x, weights = w/sum(w), n = n),
x[which.max(cumsum(y*(x[2L] - x[1L])) >= prob)])
Then my_wtd_q(x, w, .5) will be the weighted median.
One could also be more careful to ensure that the total area under the density is one by re-normalizing.
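A minimal sketch of that re-normalization, under a few assumptions: the function name wtd_q_normalized is made up for this example, the cumulative sum is divided by its final value so the approximate CDF ends at exactly 1, and the bandwidth is passed explicitly using the default rule bw.nrd0() applied to x alone (unweighted).

```r
# Hypothetical variant of my_wtd_q with a re-normalized CDF
wtd_q_normalized <- function(x, w, prob, n = 4096) {
  d <- density(x, bw = bw.nrd0(x), weights = w / sum(w), n = n)
  cdf <- cumsum(d$y)
  cdf <- cdf / cdf[length(cdf)]   # force the CDF to end at exactly 1
  d$x[which.max(cdf >= prob)]     # first grid point where the CDF reaches prob
}

wtd_q_normalized(c(10, 40), c(1, 2), 0.5)
```

As with the original version, the result is a smoothed estimate that depends on the kernel bandwidth, so it will generally not coincide with any of the observed values.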
A way in base R to get a weighted median is to order by the values, build the cumulative sum of the weights, and take the value(s) at which the cumulative weight reaches 0.5 times the total weight.
medianWeighted <- function(x, w, q=.5) {
n <- length(x)
i <- order(x)
w <- cumsum(w[i])
p <- w[n] * q
j <- findInterval(p, w)
Vectorize(function(p,j) if(w[n] <= 0) NA else
if(j < 1) x[i[1]] else
if(j == n) x[i[n]] else
if(w[j] == p) (x[i[j]] + x[i[j+1]]) / 2 else
x[i[j+1]])(p,j)
}
This gives the following results with simple input data.
medianWeighted(c(10, 40), c(1, 2))
#[1] 40
median(rep(c(10, 40), c(1, 2)))
#[1] 40
medianWeighted(c(10, 40), c(2, 1))
#[1] 10
median(rep(c(10, 40), c(2, 1)))
#[1] 10
medianWeighted(c(10, 40), c(1.5, 2))
#[1] 40
medianWeighted(c(10, 40), c(3, 4))
#[1] 40
median(rep(c(10, 40), c(3, 4)))
#[1] 40
medianWeighted(c(10, 40), c(1.5, 1.5))
#[1] 25
medianWeighted(c(10, 40), c(3, 3))
#[1] 25
median(rep(c(10, 40), c(3, 3)))
#[1] 25
medianWeighted(c(10, 40), c(0, 1))
#[1] 40
medianWeighted(c(10, 40), c(1, 0))
#[1] 10
medianWeighted(c(10, 40), c(0, 0))
#[1] NA
It can also be used for other quantiles
medianWeighted(1:10, 10:1, seq(0, 1, 0.25))
#[1] 1 2 4 6 10
Compare with other methods.
#Functions from other Answers
weighted.median <- function(x, w) {
w <- w[order(x)]
x <- x[order(x)]
prob <- cumsum(w)/sum(w)
ps <- which(abs(prob - .5) == min(abs(prob - .5)))
return(x[ps])
}
my_wtd_q = function(x, w, prob, n = 4096)
with(density(x, weights = w/sum(w), n = n),
x[which.max(cumsum(y*(x[2L] - x[1L])) >= prob)])
weighted.quantile <- function(x, w, probs = seq(0, 1, 0.25),
na.rm = FALSE, names = TRUE) {
if (any(probs > 1) | any(probs < 0)) stop("'probs' outside [0,1]")
if (length(w) == 1) w <- rep(w, length(x))
if (length(w) != length(x)) stop("w must have length 1 or be as long as x")
if (isTRUE(na.rm)) {
w <- w[!is.na(x)]
x <- x[!is.na(x)]
}
w <- w[order(x)] / sum(w)
x <- x[order(x)]
cum_w <- cumsum(w) - w * (1 - (seq_along(w) - 1) / (length(w) - 1))
res <- approx(x = cum_w, y = x, xout = probs)$y
if (isTRUE(names)) {
res <- setNames(res, paste0(format(100 * probs, digits = 7), "%"))
}
res
}
Methods
M <- alist(
medRep = median(rep(DF$x, DF$w)),
isotone = isotone::weighted.median(DF$x, DF$w),
laeken = laeken::weightedMedian(DF$x, DF$w),
spatstat1 = spatstat.geom::weighted.median(DF$x, DF$w, type=1),
spatstat2 = spatstat.geom::weighted.median(DF$x, DF$w, type=2),
spatstat4 = spatstat.geom::weighted.median(DF$x, DF$w, type=4),
survey = survey::svyquantile(~x, survey::svydesign(id=~1, weights=~w, data=DF), 0.5)$x[1],
RAndres = weighted.median(DF$x, DF$w),
matrixStats = matrixStats::weightedMedian(DF$x, DF$w),
MichaelChirico = my_wtd_q(DF$x, DF$w, .5),
Leonardo = weighted.quantile(DF$x, DF$w, .5),
GKi = medianWeighted(DF$x, DF$w)
)
Results
DF <- data.frame(x=c(10, 40), w=c(1, 2))
sapply(M, eval)
# medRep isotone laeken spatstat1 spatstat2
# 40.00000 40.00000 40.00000 40.00000 25.00000
# spatstat4 survey RAndres matrixStats MichaelChirico
# 17.50000 40.00000 10.00000 30.00000 34.15005
# Leonardo.50% GKi
# 25.00000 40.00000
DF <- data.frame(x=c(10, 40), w=c(1, 1))
sapply(M, eval)
# medRep isotone laeken spatstat1 spatstat2
# 25.00000 25.00000 40.00000 10.00000 10.00000
# spatstat4 survey RAndres matrixStats MichaelChirico
# 10.00000 10.00000 10.00000 25.00000 25.05044
# Leonardo.50% GKi
# 25.00000 25.00000
In those two cases only isotone and GKi give identical results compared to what median(rep(x, w)) returns.
If you're working with the survey package, assuming you've defined your survey design and x is your variable of interest:
svyquantile(~x, mydesign, c(0.5))
I got here looking for weighted quantiles, so I thought I might as well leave for future readers what I ended up with. Naturally, using probs = 0.5 will return the weighted median.
I started with MichaelChirico's answer, which unfortunately was off at the edges. Then I decided to switch from density() to approx(). Finally, I believe I nailed the correction factor to ensure consistency with the default algorithm of the unweighted quantile().
weighted.quantile <- function(x, w, probs = seq(0, 1, 0.25),
na.rm = FALSE, names = TRUE) {
if (any(probs > 1) | any(probs < 0)) stop("'probs' outside [0,1]")
if (length(w) == 1) w <- rep(w, length(x))
if (length(w) != length(x)) stop("w must have length 1 or be as long as x")
if (isTRUE(na.rm)) {
w <- w[!is.na(x)]
x <- x[!is.na(x)]
}
w <- w[order(x)] / sum(w)
x <- x[order(x)]
cum_w <- cumsum(w) - w * (1 - (seq_along(w) - 1) / (length(w) - 1))
res <- approx(x = cum_w, y = x, xout = probs)$y
if (isTRUE(names)) {
res <- setNames(res, paste0(format(100 * probs, digits = 7), "%"))
}
res
}
When weights are uniform, the weighted quantiles are identical to regular unweighted quantiles:
x <- rnorm(100)
stopifnot(identical(weighted.quantile(x, w = 1), quantile(x)))
Example using the same data as in the weighted.mean() man page.
x <- c(3.7, 3.3, 3.5, 2.8)
w <- c(5, 5, 4, 1)/15
stopifnot(isTRUE(all.equal(
weighted.quantile(x, w, 0:4/4, names = FALSE),
c(2.8, 3.33611111111111, 3.46111111111111, 3.58157894736842,
3.7)
)))
And this is for whoever solely wants the weighted median value:
weighted.median <- function(x, w, ...) {
weighted.quantile(x, w, probs = 0.5, names = FALSE, ...)
}

Resources