I am new to optimization so please bear with me. Here is my problem:
A, B, C, D and E are percentages (18%, 2%, 1%, 78%, 1%).
Maximize the sum A(x) + B(x) + C(x) + D(x) + E(x), i.e. maximize x (with x <= 499572),
such that
A(x) <= 20076
B(x) <= 8619
C(x) <= 145
D(x) <= 465527
E(x) <= 5205
How do I frame this problem in R?
I was using the lpSolve package but I am OK with any suggestions.
We restate the problem, omitting positivity constraints on the single scalar variable x, as:
maximize 1 * x
such that
0.18 * x <= 20076
0.02 * x <= 8619
0.01 * x <= 145
0.78 * x <= 465527
0.01 * x <= 5205
Solving it as a linear program with lpSolve gives the following optimum value of x:
library(lpSolve)
constr.mat <- c(.18, .02, .01, .78, .01)   # constraint coefficients
RHS <- c(20076, 8619, 145, 465527, 5205)   # right-hand sides
soln <- lp("max", 1, cbind(constr.mat), rep("<=", 5), RHS)
soln$solution
soln$solution
## [1] 14500
Of course, as pointed out in the comments below the question, this problem can be solved trivially without linear programming by taking the tightest upper bound on x:
min(RHS / constr.mat)
## [1] 14500
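The question's explicit cap x <= 499572 is not binding here; if you want to account for it anyway, it can simply be folded into the same calculation:
# include the explicit cap on x from the question as one more upper bound
min(c(RHS / constr.mat, 499572))
## [1] 14500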
Note
If what you really meant was not the problem stated in the question but rather this 5-variable problem:
max 0.18 * x1 + 0.02 * x2 + 0.01 * x3 + 0.78 * x4 + 0.01 * x5
such that
0.18 * x1 <= 20076
0.02 * x2 <= 8619
0.01 * x3 <= 145
0.78 * x4 <= 465527
0.01 * x5 <= 5205
then we have
soln2 <- lp("max", constr.mat, diag(constr.mat), rep("<=", 5), RHS)
soln2$solution
## [1] 111533.3 430950.0 14500.0 596829.5 520500.0
Again this is trivial to compute without linear programming:
RHS / constr.mat
## [1] 111533.3 430950.0 14500.0 596829.5 520500.0
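As a side note, every constraint binds at this optimum, so the optimal objective value (soln2$objval in the returned lp object) is simply the sum of the right-hand sides, which happens to equal the 499572 bound mentioned in the question:
soln2$objval   # optimal objective value
sum(RHS)
## [1] 499572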
Related
I am working on a problem that requires me to use the Metropolis-Hastings algorithm to generate values for 3 unknowns (beta, lambda and phi) for a group of users, and then use that output as input to a Gibbs sampler to solve a regression equation and estimate the true values of beta, lambda and phi. I am using the rjags library in R, which works fine, but I don't clearly understand how the library is calculating the values under the hood.
The following is what the output of Metropolis-Hastings looks like; say we did 3 iterations of MH with 3 users, so each user has 3 values of beta, lambda and phi:
user number   beta   lambda   phi
          1    0.1      0.2   0.5
          1    0.2      0.2   0.3
          1    0.1      0.4   0.3
          2    0.8      0.4   0.6
          2    0.7      0.5   0.7
          2    0.8      0.4   0.7
          3    1.8      2.4   1.7
          3    3.6      4.8   6.1
          3    3.6      8.2   3.2
I also have certain covariates for the 3 users:
user number   x1   x2
          1   24   36
          2   12   15
          3   18   31
The above two tables are the input to rjags. The rjags code is as follows:
# metropolis is the output of MH (here assumed to be a data.table); covariates holds x1, x2 per user
beta <- metropolis[, beta]
lambda <- metropolis[, lambda]
phi <- metropolis[, phi]
user_id <- metropolis[, `user number`]
x1 <- covariates[, x1]
x2 <- covariates[, x2]
# number of MH iterations
N <- dim(metropolis)[1]
# number of customers
P <- uniqueN(user_id)
data.jags <- list(beta = beta, lambda = lambda, phi = phi,
N = N, P = P,
user_id = user_id, x1 = x1, x2 = x2)
inits.jags <- list(".RNG.name" = "base::Wichmann-Hill",
".RNG.seed" = 111)
modelString <-
"model{
for (i in 1:N){
beta[i] ~ dnorm(theta_beta[user_id[i]], tau_beta)
lambda[i] ~ dnorm(theta_lambda[user_id[i]], tau_lambda)
phi[i] ~ dnorm(theta_phi[user_id[i]], tau_phi)
}
for (j in 1:P){
theta_beta[j] <- d_0 + d_1 * x1[j] + d_2 * x2[j] + error_beta[j]
theta_lambda[j] <- g_0 + g_1 * x1[j] + g_2 * x2[j] + error_lambda[j]
theta_phi[j] <- e_0 + e_1 * x1[j] + e_2 * x2[j] + error_phi[j]
real_beta[j] <- exp(theta_beta[j])
real_lambda[j] <- exp(theta_lambda[j])/(1 + exp(theta_lambda[j]))
real_phi[j] <- exp(theta_phi[j])/(1 + exp(theta_phi[j]))
error_beta[j] ~ dnorm(0,0.1)
error_lambda[j] ~ dnorm(0,0.1)
error_phi[j] ~ dnorm(0,0.1)
}
#priors
d_0 ~ dnorm(0,0.01)
d_1 ~ dnorm(0,0.01)
d_2 ~ dnorm(0,0.01)
g_0 ~ dnorm(0,0.01)
g_1 ~ dnorm(0,0.01)
g_2 ~ dnorm(0,0.01)
e_0 ~ dnorm(0,0.01)
e_1 ~ dnorm(0,0.01)
e_2 ~ dnorm(0,0.01)
sigma_beta ~ dunif(0,100)
sigma_lambda ~ dunif(0,100)
sigma_phi ~ dunif(0,100)
tau_beta <- 1/pow(sigma_beta,2)
tau_phi <- 1/pow(sigma_phi,2)
tau_lambda <- 1/pow(sigma_lambda,2)
}"
Now, the results from the above code look fine, but I don't understand how exactly the output of Metropolis-Hastings (beta, lambda, phi) is being used here. In the first for loop of modelString, the values of beta, lambda and phi are drawn from a normal distribution, so where do the values of beta, lambda and phi that were given as inputs in data.jags come into the picture?
Any kind of explanation or insight would be much appreciated. Thanks!
I am writing a paper about the validity of a billing code in hospitalized children. I am a very novice RStudio user. I need the confidence intervals for the sensitivity and specificity and the positive and negative predictive values, but I can't figure out how to do it.
My data has 3 columns: ID, true value, billing value
Here is my code:
confusionMatrix(table(finalcodedataset$billing_value, finalcodedataset$true_value),
positive="1", boot=TRUE, boot_samples=4669, alpha=0.05)
here is the output:
Confusion Matrix and Statistics
        0    1
  0  4477  162
  1    10   20
Accuracy : 0.9632
95% CI : (0.9574, 0.9684)
No Information Rate : 0.961
P-Value [Acc > NIR] : 0.238
Kappa : 0.1796
Mcnemar's Test P-Value : <2e-16
Sensitivity : 0.109890
Specificity : 0.997771
Pos Pred Value : 0.666667
Neg Pred Value : 0.965079
Prevalence : 0.038981
Detection Rate : 0.004284
Detection Prevalence : 0.006425
Balanced Accuracy : 0.553831
'Positive' Class : 1
You can use the epiR package for this purpose.
Example:
library(epiR)
# 2 x 2 table: row 1 = test positive, row 2 = test negative; column 1 = outcome positive, column 2 = outcome negative
data <- as.table(matrix(c(670, 202, 74, 640), nrow = 2, byrow = TRUE))
rval <- epi.tests(data, conf.level = 0.95)
print(rval)
          Outcome +   Outcome -   Total
Test +          670         202     872
Test -           74         640     714
Total           744         842    1586
Point estimates and 95 % CIs:
---------------------------------------------------------
Apparent prevalence 0.55 (0.52, 0.57)
True prevalence 0.47 (0.44, 0.49)
Sensitivity 0.90 (0.88, 0.92)
Specificity 0.76 (0.73, 0.79)
Positive predictive value 0.77 (0.74, 0.80)
Negative predictive value 0.90 (0.87, 0.92)
Positive likelihood ratio 3.75 (3.32, 4.24)
Negative likelihood ratio 0.13 (0.11, 0.16)
---------------------------------------------------------
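Applied to the counts from the confusionMatrix output in the question (TP = 20, FP = 10, FN = 162, TN = 4477, with "1" as the positive class), the equivalent call would be along these lines:
# rows: test + / test -; columns: outcome + / outcome -
my_tab <- as.table(matrix(c(20, 10, 162, 4477), nrow = 2, byrow = TRUE))
epi.tests(my_tab, conf.level = 0.95)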
caret and other packages use the Clopper-Pearson interval method to calculate the confidence interval.
I consider your 2x2 table reversed, since the TP (true positive) count is in the bottom-right cell. If the TP were at the top left, the variables (A, B, C, D) would be switched.
# cell counts from your confusion matrix (TP in the bottom-right of the printed table)
D = 4477   # true negatives
C = 162    # false negatives
B = 10     # false positives
A = 20     # true positives
Acc = (A + D) / (A + B + C + D)
Sensitivity = A / (A + C)
Specificity = D / (D + B)
P = (A + C) / (A + B + C + D)   # prevalence
PPV = (Sensitivity * P) / ((Sensitivity * P) + ((1 - Specificity) * (1 - P)))
NPV = (Specificity * (1 - P)) / (((1 - Sensitivity) * P) + ((Specificity) * (1 - P)))
# Clopper-Pearson (exact) 95% CI for accuracy, via the F-distribution form,
# computed on the misclassification count x and then flipped back to accuracy
n = A + B + C + D   # total observations
x = n - (A + D)     # number of misclassifications
alpha = 0.05
ub = 1 - ((1 + (n - x + 1) / (x * qf(alpha * .5, 2 * x, 2 * (n - x + 1))))^-1)
lb = 1 - ((1 + (n - x) / ((x + 1) * qf(1 - (alpha * .5), 2 * (x + 1), 2 * (n - x))))^-1)
CI = c(lb, ub)
> Acc
[1] 0.9631613
> CI
[1] 0.9573536 0.9683800
> Sensitivity
[1] 0.1098901
> Specificity
[1] 0.9977713
> PPV
[1] 0.6666667
> NPV
[1] 0.9650787
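As a quick cross-check, base R's binom.test() also returns a Clopper-Pearson interval, so the accuracy CI can be reproduced directly from the counts above:
# exact (Clopper-Pearson) 95% CI for accuracy; should agree with CI computed above
binom.test(x = A + D, n = n)$conf.int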
Here is also a good resource for where these formulas come from.
The following reproducible example is partially inspired by "ROC curve from training data in caret".
library(MLeval)
library(caret)
library(pROC)
library(mlbench)   # provides the Sonar data set
data(Sonar)
ctrl <- trainControl(method = "cv", summaryFunction = twoClassSummary,
                     classProbs = TRUE, savePredictions = TRUE)
set.seed(42)
fit1 <- train(Class ~ ., data = Sonar, method = "rf", trControl = ctrl)
bestmodel <- merge(fit1$bestTune, fit1$pred)
mtx <- confusionMatrix(table(bestmodel$pred, bestmodel$obs))$table
# M R
# M 104 23
# R 7 74
# 95% Confidence Intervals (normal approximation)
## Sensitivity
sens_errors <- sqrt(sensitivity(mtx) * (1 - sensitivity(mtx)) / sum(mtx[,1]))
sensLower <- sensitivity(mtx) - 1.96 * sens_errors
sensUpper <- sensitivity(mtx) + 1.96 * sens_errors
## Specificity
spec_errors <- sqrt(specificity(mtx) * (1 - specificity(mtx)) / sum(mtx[,2]))
specLower <- specificity(mtx) - 1.96 * spec_errors
specUpper <- specificity(mtx) + 1.96 * spec_errors
## Positive Predictive Values
ppv_errors <- sqrt(posPredValue(mtx) * (1 - posPredValue(mtx)) / sum(mtx[1,]))
ppvLower <- posPredValue(mtx) - 1.96 * ppv_errors
ppvUpper <- posPredValue(mtx) + 1.96 * ppv_errors
## Negative Predictive Values
npv_errors <- sqrt(negPredValue(mtx) * (1 - negPredValue(mtx)) / sum(mtx[2,]))
npvLower <- negPredValue(mtx) - 1.96 * npv_errors
npvUpper <- negPredValue(mtx) + 1.96 * npv_errors
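To view the four intervals together, one possible way (my addition, not part of the original answer) is to collect them in a data frame:
# assemble point estimates and Wald CIs into one table
data.frame(metric   = c("Sensitivity", "Specificity", "PPV", "NPV"),
           estimate = c(sensitivity(mtx), specificity(mtx), posPredValue(mtx), negPredValue(mtx)),
           lower    = c(sensLower, specLower, ppvLower, npvLower),
           upper    = c(sensUpper, specUpper, ppvUpper, npvUpper))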
Given I have 4 different values
intensities <- c(0.1,-0.1,0.05,-0.05)
My goal is to randomly sample every value 5 times but positive and negative values should alternate, e.g.
resultingList = (0.1, -0.05, 0.05, -0.05, 0.1, -0.1, ...)
Does anybody know an elegant way to do this in R?
Maybe something like this
set.seed(123)  # for reproducibility
# five copies of each positive and each negative intensity
plus <- rep(intensities[intensities >= 0], each = 5)
minus <- rep(intensities[intensities < 0], each = 5)
# interleave: shuffled positives in odd positions, shuffled negatives in even positions
out <- numeric(length(plus) + length(minus))
out[seq(1, length(out), 2)] <- sample(plus)
out[seq(2, length(out), 2)] <- sample(minus)
out
# [1] 0.10 -0.05 0.05 -0.10 0.10 -0.05 0.05 -0.05 0.05 -0.10 0.10 -0.05 0.05 -0.05 0.05 -0.10
# [17] 0.10 -0.10 0.10 -0.10
If the intensities you are sampling from come in +/- pairs, you could just sample from the positive values and then flip the sign of every other number drawn:
N <- 5
positiveIntensities <- c(0.1, 0.05)
resultingList <- sample(positiveIntensities, N, replace = TRUE) * (-1)^(0:(N - 1))
Here is my solution, which creates a custom function whose argument n gives the length of the output. ceiling() and floor() determine how many values go in the odd and even positions.
mySample <- function(x, n){
res <- c()
res[seq(1, n, 2)] <- sample(x[x >= 0], ceiling(n / 2), T)
res[seq(2, n, 2)] <- sample(x[x < 0], floor(n / 2), T)
return(res)
}
intensities <- c(0.1, -0.1, 0.05, -0.05)
mySample(intensities, 10)
# [1] 0.10 -0.10 0.05 -0.05 0.10 -0.05 0.05 -0.05 0.05 -0.10
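A quick sanity check (my addition) that the signs alternate as required:
out <- mySample(intensities, 20)
all(out[seq(1, 20, 2)] >= 0) && all(out[seq(2, 20, 2)] < 0)   # should be TRUE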
I am trying to fit a Marketing Mix Model and run into the following problem:
Warning message:
In KFS(model) :
Possible error in diffuse filtering: Negative variances in Pinf, check the model or try changing the tolerance parameter tol or P1/P1inf of the model.
Below is a reproducible sample code with more detail.
The goal is to develop a tool to optimize the marketing mix.
Below, x1(t), x2(t), x3(t) are investments into 3 marketing channels at time t,
and y(t) is the sales at time t.
We want to use a Kalman filter approach:
y(t) = alpha + lambda * y(t-1) + beta1 * x1(t) + beta2 * x2(t) + beta3 * x3(t) +
beta12 * x1(t) * x2(t) + beta13 * x1(t) * x3(t) + beta23 * x2(t) * x3(t) + N(0, sigma)
In order to:
i. deduce the sales attributed to each channel x1(t), x2(t), x3(t)
ii. deduce their synergies x1 * x2, x1 * x3, x2 * x3
iii. deduce the carry-over sales yCO(t) = y(t-1)
We use the KFAS package:
https://cran.r-project.org/web/packages/KFAS/index.html
Below is a simple reproducible example where we:
1. Simulate x1(t), x2(t), x3(t)
2. Set input parameter values used for sales simulation
b1, b2, b12, ..., b23, lambda0, and sigma
3. Use simulated y(t), x1(t), x2(t), x3(t) to fit the model
4. Compare the estimated coefficients with the input values b1, b2, ...
library(KFAS)
library(dplyr)
sigma<-50
set.seed(1)
x1<-1000 + rnorm(n = 100,mean = 0,sd = 100) + rnorm(100, 0, sigma)
x2<-rep(0, 100)
x2[sort(which(1:100%%6==0))]<-500
x3<-300+100*sin(1:100%%12/12*pi) + rnorm(100, 0, sigma)
#Operationalize with SQRT
x1<-sqrt(x1)
x2<-sqrt(x2)
x3<-sqrt(x3)
#Set input parameters for simulation
lambda0<-0.5
b1 <- 3
b2 <- 4
b3 <- 5
b12 <- 0.3
b13 <- 0.2
b23 <- 0.1
y_s <-
b1 * x1 +
b2 * x2 +
b3 * x3 +
b12 * x1 * x2 +
b13 * x1 * x3 +
b23 * x2 * x3 +
rnorm(100, sd = sigma)
# function to account for carry over term
# y(t) = lambda * y(t-1) + y_s(t) , where
# y_s(t) = b1 * x1 + b2 * x2 + b3 * x3 + Synergy terms
getCarryOver<-function(t,
y_s,
lambda)
{
if (t==1) return(y_s[1])
else lambda*getCarryOver(t-1,y_s,lambda) + y_s[t]
}
# Add Carry Over term
y<-vector('numeric',100)
for (i in 1:100){
y[i]<-getCarryOver(i,y_s,lambda0)
}
yCO=dplyr::lag(y)
yCO[1]=0
if (!identical(y[-1],y_s[-1]+lambda0*yCO[-1]))
stop('identical(y,y_s+lambda0*yCO)')
model <- SSModel(y ~ SSMregression(~ x1 +
x2 +
x3 +
x1*x2 +
x1*x3 +
x2*x3 +
yCO
, Q = diag(NA,1)), H = NA)
fit <- fitSSM(model, inits = c(0,0,0,0,0,0,0,0), method = "BFGS")
model <- fit$model
model$Q
model$H
out <- KFS(model)
print(out)
This doesn't fully answer your question but this is too long for a comment...
You are not creating the same model you formulated above the code. By defining Q = NA you are actually stating that the first coefficient, the one for x1, should be time varying with unknown variance. And then in the fitSSM call you are giving too many initial values, so you don't notice the error (only Q and H need numerical estimation by fitSSM; the coefficients for x1 etc. are estimated directly by the Kalman filter). I admit there are probably a few checks missing here which would warn the user accordingly. The SSMregression function is only needed if you have time-varying regression coefficients or complex multivariate models; here you can just write SSModel(y ~ x1*x2 + x1*x3 + x2*x3 + yCO, H = NA) (the main effects are included automatically, as in lm).
I would also check the carryover term calculations, just to be sure that you actually generate your data correctly.
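One way to do that check, sketched here: the carry-over series is supposed to satisfy y(t) = y_s(t) + lambda0 * y(t-1), which stats::filter can compute in a single call, so it should reproduce the loop above.
y_check <- as.numeric(stats::filter(y_s, filter = lambda0, method = "recursive"))
all.equal(y, y_check)   # should be TRUE if the data were generated as intended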
If you still get errors, it could be that you have such high multicollinearity that you need to modify the prior for the first time step, i.e. set model$P1inf[] <- 0 (which removes the diffuse initialization) and set diag(model$P1) to something moderate such as 100 (the prior variance of the coefficients).
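A minimal sketch of that suggestion, applied to the simplified formula above (the value 100 is just the moderate prior variance mentioned):
model <- SSModel(y ~ x1*x2 + x1*x3 + x2*x3 + yCO, H = NA)
model$P1inf[] <- 0      # remove the diffuse initialization of the coefficients
diag(model$P1) <- 100   # moderate proper prior variance for the first time step
fit <- fitSSM(model, inits = 0, method = "BFGS")
out <- KFS(fit$model)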
Actually, if you are using the formulation you suggest (all x's and yCO as simple explanatory variables), then you should get identical results with lm. And running your code with lm I get the same apparently wrong results:
> model <- SSModel(y~ x1*x2 + x1*x3 + x2*x3 + yCO, H=NA)
>
> fit <- fitSSM(model, inits = 0, method = "BFGS")
> out <- KFS(fit$model)
Warning message:
In KFS(fit$model) :
Possible error in diffuse filtering: Negative variances in Pinf, check the model or try changing the tolerance parameter tol or P1/P1inf of the model.
> out
Smoothed values of states and standard errors at time n = 100:
Estimate Std. Error
(Intercept) -1.171e+03 1.300e+03
x1 3.782e+01 4.102e+01
x2 -4.395e+00 1.235e+01
x3 7.287e+01 6.844e+01
yCO 5.244e-01 3.396e-02
x1:x2 5.215e-01 3.979e-01
x1:x3 -1.853e+00 2.167e+00
x2:x3 1.671e-01 3.471e-01
> summary(lm(y~ x1*x2 + x1*x3 + x2*x3 + yCO))
Call:
lm(formula = y ~ x1 * x2 + x1 * x3 + x2 * x3 + yCO)
Residuals:
Min 1Q Median 3Q Max
-137.297 -29.870 -2.214 35.178 87.578
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.171e+03 1.300e+03 -0.901 0.370
x1 3.782e+01 4.102e+01 0.922 0.359
x2 -4.395e+00 1.235e+01 -0.356 0.723
x3 7.287e+01 6.844e+01 1.065 0.290
yCO 5.244e-01 3.396e-02 15.445 <2e-16 ***
x1:x2 5.215e-01 3.979e-01 1.311 0.193
x1:x3 -1.853e+00 2.167e+00 -0.855 0.395
x2:x3 1.671e-01 3.471e-01 0.481 0.631
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 50.25 on 92 degrees of freedom
Multiple R-squared: 0.88, Adjusted R-squared: 0.8709
F-statistic: 96.43 on 7 and 92 DF, p-value: < 2.2e-16
So I think you have some issues with generating your example data, or in the model formulation.
I need to create a fancy table and export it as a png. I'm trying the tables package in R. I need to group "variacion" by the agents ("agentes") who had a positive variation vs. the rest, and I want the mean, the sd and the number of agents that fulfil each condition.
My table is:
agente mes1 mes2 variacion
1 a1 0.50 0.60 0.20000000
2 a2 0.70 0.65 -0.07142857
3 a3 0.60 0.75 0.25000000
4 a4 0.80 0.60 -0.25000000
5 a5 0.78 0.90 0.15384615
My output should be (including format):
You can arrive at those numbers by doing, for example:
sd(t_agentes1$variacion[t_agentes1$variacion<=0])
And the result is the last number in the table for the column sd: 0.126
So in tables library:
library(tables)
X<-t_agentes1$variacion
latex( tabular( (X > 0) + (X < 0) + 1
                ~ ((n = 1) + X*(mean + sd + length)) ) )
But I get the error:
non-numeric argument to binary operator
Also, when I try the first example from the package documentation I get an error:
tabular( (Species + 1) ~ (n=1) + Format(digits=2)*
         (Sepal.Length + Sepal.Width)*(mean + sd), data=iris )
Error in e[[3]] : subscript out of bounds
I really don't understand the parameters of this package. Is there a way to do the grouping? I'm really lost with this so any help would be really appreciated. Thanks.
X <- read.table(header = TRUE, text="agente mes1 mes2 variacion
1 a1 0.50 0.60 0.20000000
2 a2 0.70 0.65 -0.07142857
3 a3 0.60 0.75 0.25000000
4 a4 0.80 0.60 -0.25000000
5 a5 0.78 0.90 0.15384615")
X <- within(X, variation <- factor(variacion > 0, levels = c(TRUE, FALSE),
labels = c('variation > 0',
'variation <= 0')))
library(tables)
# latex(
# tabular(Heading() * variation ~
# Justify(l) * (Heading() * Format(digits = 2) * variacion * (mean + sd) + (number = (n = 1))),
# data = X))
latex(
tabular(Heading() * variation ~
Justify(l) * (Heading() * variacion * (Format(digits = 2) * mean + Format(digits = 2) *sd) + (number = (n = 1))),
data = X))
# mean sd number
# variation $>$ 0 0.20 0.048 3
# variation $\\leq$ 0 -0.16 0.126 2
which gives the rendered table shown in the comments above.
Without prettifying the results:
tabular((X > 0) + (X < 0) ~ mean*X + sd*X + length*X)