Setting variables to a value using linear programming in R

I have developed a linear programming model in R and would like to know the command that sets a variable to a fixed value. Here are my code and the results:
install.packages("lpSolveAPI")
library(lpSolveAPI)
#want to solve for 6 variables, these correspond to the number of bins
lprec <- make.lp(0, 6)
lp.control(lprec, sense="max")
#MODEL 1
set.objfn(lprec, c(13.8, 70.52,122.31,174.73,223.49,260.65))
add.constraint(lprec, c(13.8, 70.52, 122.31, 174.73, 223.49, 260.65), "=", 204600)
add.constraint(lprec, c(1,1,1,1,1,1), "=", 5000)
Here are the results:
> solve(lprec)
[1] 0
> get.objective(lprec)
[1] 204600
> get.variables(lprec)
[1] 2609.309 2390.691 0.000 0.000 0.000 0.000
I would like to fix the first variable (currently 2609.309) at 3200 and the last variable at 48, and then optimize over the remaining variables. Any help would be much appreciated.

What you are describing is constrained optimization, so you should add one equality constraint for each variable you want to fix. I am not familiar with lpSolveAPI, so the code below is untested, but you need something like:
add.constraint(lprec, c(1, 0, 0, 0, 0, 0), "=", 3200)
add.constraint(lprec, c(0, 0, 0, 0, 0, 1), "=", 48)
Along with your existing constraints.
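An alternative to extra equality constraints, sketched here under the assumption that the lpSolveAPI package is installed, is to pin each variable by giving it identical lower and upper bounds with set.bounds(). The model is the one from the question; the names coefs, status and vars are only illustrative.

```r
library(lpSolveAPI)

# Rebuild the model from the question: 6 variables, maximize.
lprec <- make.lp(0, 6)
invisible(lp.control(lprec, sense = "max"))
coefs <- c(13.8, 70.52, 122.31, 174.73, 223.49, 260.65)  # illustrative name
set.objfn(lprec, coefs)
add.constraint(lprec, coefs, "=", 204600)
add.constraint(lprec, rep(1, 6), "=", 5000)

# Fix variable 1 at 3200 and variable 6 at 48 by making lower == upper.
set.bounds(lprec, lower = c(3200, 48), upper = c(3200, 48), columns = c(1, 6))

status <- solve(lprec)        # 0 means an optimal solution was found
vars <- get.variables(lprec)  # entries 1 and 6 should come back as 3200 and 48
print(status)
print(vars)
```

Either approach should leave the solver free to choose variables 2 to 5, while the two original equality constraints still apply.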

Related

"Error in make.unique(bi, sep = sep): 'names' must be a character vector"? Any suggestions to solve it?

I need help resolving an error related to vector names.
I'm trying to define the function ginv2 in R to display a matrix in fraction format, but I get an error related to 'names':
`Error in make.unique(bi, sep = sep) : 'names' must be a character vector`
Here are the details of my goal and the code I tried. I want to set up contrast coding for one of the predictor variables in a linear mixed-effects model, using repeated contrasts for a variable (AgeRange) with four levels.
*To code the repeated contrasts in R, I followed the four steps described in Schad et al. (2020).*
##Step1~ specify hypothesis:
# HoII-I: -1.I + 1.II + 0.III + 0.IV = 0
# HoIII-II: 0.I - 1.II + 1.III + 0.IV = 0
# HoIV-III: 0.I + 0.II - 1.III + 1.IV = 0
##Step2~ create vector weights for all the hypotheses for this data
cIIvsI <- c(AgeRangeI=-1, AgeRangeII= +1, AgeRangeIII= 0, AgeRangeIV= 0)
cIIIvsII <- c(AgeRangeI= 0, AgeRangeII= -1, AgeRangeIII= +1, AgeRangeIV= 0)
cIVvsIII <- c(AgeRangeI= 0, AgeRangeII= 0, AgeRangeIII= -1, AgeRangeIV= +1)
##Step3~ extract the weights from all the hypotheses and combine them into a hypothesis matrix in R
RTacc.df2424$AgeRange <- factor(RTacc.df2424$AgeRange)
AgeRange.contrast <- rbind (
cIIvsI <- c(AgeRangeI=-1, AgeRangeII= +1, AgeRangeIII= 0, AgeRangeIV= 0),
cIIIvsII <- c(AgeRangeI= 0, AgeRangeII= -1, AgeRangeIII= +1, AgeRangeIV= 0),
cIVvsIII <- c(AgeRangeI= 0, AgeRangeII= 0, AgeRangeIII= -1, AgeRangeIV= +1))
fractions(t(AgeRange.contrast))
[,1] [,2] [,3]
AgeRangeI -1 0 0
AgeRangeII 1 -1 0
AgeRangeIII 0 1 -1
AgeRangeIV 0 0 1
##Step4~ define the generalized inverse to obtain a new contrast matrix XcRE.
##The generalized inverse yields a contrast matrix that tests exactly the coded hypotheses, and ginv2 prints it in fraction format
Before applying the generalized inverse to the hypothesis matrix, we need to define the ginv2 function.
ginv2 <- function(x)
fractions(provideDimnames(ginv(x),
base = dimnames(x)[2:1]))
After defining ginv2, I passed the AgeRange contrast matrix to the ginv2 function:
ginv2(AgeRange.contrast)
This gives an error:
Error in make.unique(bi, sep = sep) : 'names' must be a character vector
5.make.unique(bi, sep = sep)
4.provideDimnames(ginv(x), sep = "", base = dimnames(x)[2:1])
3..rat(x, cycles, max.denominator)
2.fractions(provideDimnames(ginv(x), sep = "", base = dimnames(x)[2:1]))
1.ginv2(AgeRange.contrast)
How can I resolve this error? Any feedback and suggestions on defining the function correctly would be helpful.
Thank you in advance.
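One likely cause, offered as a hedged reading of the code above rather than a confirmed diagnosis: inside rbind(), `cIIvsI <- c(...)` is an assignment expression, not a named argument, so the hypothesis matrix ends up with no row names (the printed transpose shows [,1] [,2] [,3]). dimnames(x)[2:1] then contains a NULL, and provideDimnames() passes that NULL on to make.unique(). A minimal sketch, assuming MASS is installed, that names the rows with `=` instead:

```r
library(MASS)  # provides ginv() and fractions()

# Name the rows with `=`; `<-` inside rbind() leaves the row names NULL.
AgeRange.contrast <- rbind(
  cIIvsI   = c(AgeRangeI = -1, AgeRangeII =  1, AgeRangeIII =  0, AgeRangeIV = 0),
  cIIIvsII = c(AgeRangeI =  0, AgeRangeII = -1, AgeRangeIII =  1, AgeRangeIV = 0),
  cIVvsIII = c(AgeRangeI =  0, AgeRangeII =  0, AgeRangeIII = -1, AgeRangeIV = 1)
)

# Same ginv2 as in the question, with braces added for readability.
ginv2 <- function(x) {
  fractions(provideDimnames(ginv(x), base = dimnames(x)[2:1]))
}

XcRE <- ginv2(AgeRange.contrast)
print(XcRE)
```

With both dimnames present, the rows of XcRE should be labelled by the AgeRange levels and the columns by the three contrasts.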

Emoji Sentiment Analysis in R

I am working on a project that uses tweets containing emojis and emoticons. My main goal is to get a combined sentiment score for each tweet (text + emoticons). These emoticons are probably the most meaningful part of the data, which is why they cannot be neglected. I have converted the encoding of the emojis and emoticons via iconv, but I only get a sentiment score for the text, not for the emojis. I am using VADER sentiment for this, but if another sentiment library or lexicon can also score the emojis, that would be very helpful and highly appreciated.
Tweets:
dput(df_emoji$Description)
c("DoorDash or Uber method asap<f0><9f><98><ad> cause I be starving<f0><9f><98><ad><f0><9f><98><ad>",
"such a real ahh niqq cuz I be having myself weak asl<f0><9f><98><82>",
"shii made me laugh so fuccin hard bro<f0><9f><98><82><f0><9f><98><82><f0><9f><98><82><f0><9f><98><82>",
"Hart and Will Ferrell made a Gem in Get hard fr<f0><9f><98><82><f0><9f><98><82><f0><9f><98><82>",
"#NigerianAmazon Chill<f0><9f><a4><a3><f0><9f><98><ad>", "so bomedy <f0><9f><98><82><f0><9f><98><82><f0><9f><98><82>",
"is that ass Gotdam<f0><9f><98><82><f0><9f><98><82><f0><9f><98><82>",
"wild<f0><9f><98><82><f0><9f><98><82><f0><9f><98><82>",
"them late night DoorDash<e2><80><99>s be goin crazy<f0><9f><a4><a3>",
"of the week<f0><9f><98><82><f0><9f><98><82><f0><9f><98><82><f0><9f><98><82>"
)
Code:
emoji_senti <- data.frame(text = iconv(data_sample$text, "latin1", "ASCII", "byte"),
stringsAsFactors = FALSE)
column1 <- separate(emoji_senti, text, into = c("Bytes", "Description"), sep = "\\ ")
column2 <- separate(emoji_senti, text, into = c("Bytes", "Description"), sep = "^[^\\s]*\\s")
df_emoji <- data.frame(Bytes = column1$Bytes, Description = column2$Description)
allvals_emoji <- NULL
for (i in 1:length(df_emoji$Description)){
outs <- vader_df(df_emoji$Description[i])
allvals_emoji <- rbind(allvals_emoji,outs)
}
allvals_emoji
Note that the first tweet has only 9 English words, each of which gets a score, but no score is produced for the converted Unicode emojis.
# word_scores compound pos neu neg but_count
# 1 {0, 0, 0, 0, 0, 0, 0, 0, 0} 0.000 0.000 1.000 0.000 0
# 2 {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -1.9, 0, 0} -0.440 0.000 0.805 0.195 0
# 3 {0, 0, 0, 2.6, 0, 0, -0.67835, 0, 0} 0.444 0.293 0.570 0.137 0
# 4 {0, 0, 0, 0, 0, 0, 0, 0, 0, -0.4, 0} -0.103 0.000 0.877 0.123 0
# 5 {0, 0} 0.000 0.000 1.000 0.000 0
# 6 {0, 0, 0, 0} 0.000 0.000 1.000 0.000 0
# 7 {0, 0, -2.5, 0, 0} -0.542 0.000 0.533 0.467 0
# 8 {0, 0} 0.000 0.000 1.000 0.000 0
# 9 {0, 0, 0, 0, 0, 0, 0} 0.000 0.000 1.000 0.000 0
# 10 {0, 0, 0, 0} 0.000 0.000 1.000 0.000 0
Check this discussion: VaderSentiment: unable to update emoji sentiment score
"Vader transforms emojis to their word representation prior to extracting sentiment"
Basically, from what I tested, the emoji values are hidden but are part of the score and can influence it. If you need the score for a specific emoji, load library(lexicon) and run data.frame(hash_emojis_identifier) (a data frame that maps emojis to identifiers in a lexicon format) and data.frame(hash_sentiment_emojis) to get each emoji's sentiment value. From those tables alone, however, it is not possible to determine the impact of a series of emojis on the total message score, because that depends on how vader computes their cumulative effect.
You can still estimate the impact of the emojis by taking the difference between the total score of the message with emojis and the score without them:
allvals <- NULL
for (i in 1:length(data_sample)){
outs <- vader_df(data_sample[i])
allvals <- rbind(allvals,outs)
}
allvalswithout <- NULL
for (i in 1:length(data_samplewithout)){
outs <- vader_df(data_samplewithout[i])
allvalswithout <- rbind(allvalswithout,outs)
}
emojiscore <- allvals$compound-allvalswithout$compound
Then:
allvals <- cbind(allvals,emojiscore)
For large datasets it would be ideal to automate the removal of emojis from the texts. Here I removed them manually just to illustrate the approach.
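The removal step could be automated roughly as follows. This is a sketch in base R that assumes the emojis appear in the byte-escaped form shown in the question (for example <f0><9f><98><82>) and that every emoji of interest is a 4-byte UTF-8 sequence starting with f0; emojis encoded in 3 bytes would need a broader pattern. The helper name strip_emoji_bytes is hypothetical.

```r
# Hypothetical helper: drop byte-escaped 4-byte UTF-8 sequences (lead byte f0).
strip_emoji_bytes <- function(x) {
  trimws(gsub("<f0>(<[0-9a-f]{2}>){3}", "", x))
}

# Two tweets copied from the question's dput() output.
tweets <- c(
  "wild<f0><9f><98><82><f0><9f><98><82><f0><9f><98><82>",
  "them late night DoorDash<e2><80><99>s be goin crazy<f0><9f><a4><a3>"
)

tweets_without <- strip_emoji_bytes(tweets)
print(tweets_without)
```

The 3-byte sequence <e2><80><99> (a curly apostrophe) is deliberately left alone, since it is punctuation rather than an emoji. The cleaned vector could then play the role of data_samplewithout in the loop above.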

Quadratic optimization - portfolio maximization problems

In portfolio analysis, given a target expected return, we aim to find the weight of each asset that minimizes the variance.
Here is the code:
install.packages("quadprog")
library(quadprog)
#Denoting annualized risk as a vector sigma
sigma <- c(0.56, 7.77, 13.48, 16.64)
#Building the correlation matrix given in the question
m <- diag(0.5, nrow = 4, ncol = 4)
m[upper.tri(m)] <- c(-0.07, -0.095, 0.959, -0.095, 0.936, 0.997)
corr <- m + t(m)
sig <- corr * outer(sigma, sigma)
#Defining the mean
mu = matrix(c(1.73, 6.65, 9.11, 10.30), nrow = 4)
m0 = 8
Amat <- t(matrix(c(1, 1, 1, 1,
c(mu),
1, 0, 0, 0,
0, 1, 0, 0,
0, 0, 1, 0,
0, 0, 0, 1), 6, 4, byrow = TRUE))
bvec <- c(1, m0, 0, 0, 0, 0)
qp <- solve.QP(sig, rep(0, nrow(sig)), Amat, bvec, meq = 2)
qp
x = matrix(qp$solution)
x
(t(x) %*% sig %*% x)^0.5
I understand how mu and the covariance matrix are built and know how to use quadprog.
However, I don't understand why Amat and bvec are defined this way, and why the constraint matrix is 6 by 4.
m0 is the expected return we want the portfolio to achieve; it is fixed at 8%.
Attached is the question
As you are probably aware, there are four assets that you are allocating over, which is why the matrix built inside t() has four columns. It has six rows because there are six constraints in your problem. After the transpose, Amat itself is 4 by 6, with one column per constraint:
The allocations add up to 1 (100%)
Expected return = 8%
'Money market' allocation >= 0
'Capital stable' allocation >= 0
'Balance' allocation >= 0
'Growth' allocation >= 0
Look at the numbers that define each constraint. They are why bvec is [1, 8, 0, 0, 0, 0]. Of these six, the first two are equality constraints, which is why meq is set to 2 (the other four are greater than or equal constraints).
Edited to add:
The way the constraints work is this: each column of Amat defines a constraint, which is then multiplied by the asset allocations, with the result equal to (or greater-than-or-equal-to) some target that is set in bvec. For example:
The first column of Amat is [1, 1, 1, 1], and the first entry of bvec is 1. So the first constraint is:
1 * money_market + 1 * capital_stable + 1 * balance + 1 * growth = 1
This is a way of saying that the asset allocations add up to 1.
The second constraint says that the expected returns add up to 8:
1.73 * money_market + 6.65 * capital_stable + 9.11 * balance + 10.30 * growth = 8
Now consider the third constraint, which says that the 'Money market' allocation is greater than or equal to zero. That's because the 3rd column of Amat is [1, 0, 0, 0] and the third entry of bvec is 0. So this constraint looks like:
1 * money_market + 0 * capital_stable + 0 * balance + 0 * growth >= 0
Simplifying, that's the same as:
money_market >= 0
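To see the column-per-constraint layout concretely, the sketch below rebuilds Amat and bvec from the question in base R and evaluates every constraint for a candidate allocation. The equal-weight vector x and the names lhs and check are assumptions made for illustration; x is not the optimizer's solution.

```r
mu <- c(1.73, 6.65, 9.11, 10.30)
m0 <- 8

# Same construction as in the question: six constraint rows, then transposed.
Amat <- t(matrix(c(1, 1, 1, 1,
                   mu,
                   1, 0, 0, 0,
                   0, 1, 0, 0,
                   0, 0, 1, 0,
                   0, 0, 0, 1), 6, 4, byrow = TRUE))
bvec <- c(1, m0, 0, 0, 0, 0)
print(dim(Amat))  # 4 rows (assets) by 6 columns (constraints)

# Hypothetical candidate allocation: equal weights across the four assets.
x <- rep(0.25, 4)

# Left-hand side of every constraint: t(Amat) %*% x, one value per column of Amat.
lhs <- drop(crossprod(Amat, x))
check <- data.frame(lhs = lhs,
                    relation = c("=", "=", ">=", ">=", ">=", ">="),
                    bvec = bvec)
print(check)
```

Equal weights satisfy the budget constraint and the four non-negativity constraints, but the expected return comes out at 6.9475 rather than 8, so this allocation violates the second equality constraint and solve.QP would not return it.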

R Code error in matrix: 'dimnames' must be a list

I am currently using the Monte Carlo method here.
Although the code (with some minor adaptations) worked with my 2x2 or 3x3 matrix, I keep getting the following error code for my 4x4 matrix:
Error in matrix(c(0.0461705, 0, 0, 0, 0, 0.0028639, 0, 0, 0, 0,
0.0740766, : 'dimnames' must be a list
What am I doing wrong and how do I address this error message?
################################################
# This code can be edited in this window and #
# submitted to Rweb, or for faster performance #
# and a nicer looking histogram, submit #
# directly to R. #
################################################
require(MASS)
a=1.1727132
b=0.2171818
c=1.3666784
d=0.1850852
rep=20000
conf=95
pest=c(a,b,c,d)
acov <- matrix(c(
0.0461705, 0, 0, 0,
0, 0.0028639, 0, 0,
0, 0, 0.0740766, 0,
0, 0, 0, 0.0013694
),4,4,4,4)
mcmc <- mvrnorm(rep,pest,acov,empirical=FALSE)
abcd <- mcmc[,1]*mcmc[,2]*mcmc[,3]*mcmc[,4]
low=(1-conf/100)/2
upp=((1-conf/100)/2)+(conf/100)
LL=quantile(abcd,low)
UL=quantile(abcd,upp)
LL4=format(LL,digits=4)
UL4=format(UL,digits=4)
################################################
# The number of columns in the histogram can #
# be changed by replacing 'FD' below with #
# an integer value. #
################################################
hist(abcd,breaks='FD',col='skyblue',xlab=paste(conf,'% Confidence Interval ','LL',LL4,' UL',UL4),
main='Distribution of Indirect Effect')
Thank you!
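As @Remko said, please specify the arguments correctly. matrix() takes its arguments in the order data, nrow, ncol, byrow, dimnames, so the last of the four 4s is passed as dimnames. The matrix can be created as: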
acov <- matrix(c(
0.0461705, 0, 0, 0,
0, 0.0028639, 0, 0,
0, 0, 0.0740766, 0,
0, 0, 0, 0.0013694
),nrow = 4, ncol = 4, byrow = TRUE, dimnames = list(c("r","o","w","s"),c("c","o","l","s")))
You can set byrow = FALSE if you want the data to be filled column-wise. The lengths of the row-name and column-name vectors must match the number of rows and columns, respectively.
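The mix-up can be reproduced in base R. The sketch below first repeats the positional call from the question inside tryCatch() to capture the error, then builds the same matrix with named arguments and no dimnames, which are optional. The names res and vals are illustrative only.

```r
# Positional call as in the question: the 4th and 5th arguments after the data
# are matched to byrow and dimnames, so dimnames = 4 triggers the error.
res <- tryCatch(matrix(0, 4, 4, 4, 4), error = function(e) e)
print(inherits(res, "error"))
print(conditionMessage(res))

# Naming the arguments removes the ambiguity.
acov <- matrix(c(
  0.0461705, 0, 0, 0,
  0, 0.0028639, 0, 0,
  0, 0, 0.0740766, 0,
  0, 0, 0, 0.0013694
), nrow = 4, ncol = 4, byrow = TRUE)
print(acov)

# Since acov is diagonal, diag() on the four variances gives the same matrix.
vals <- c(0.0461705, 0.0028639, 0.0740766, 0.0013694)
print(all.equal(acov, diag(vals)))
```

The 2x2 and 3x3 versions presumably worked because they were called with only the data, nrow and ncol arguments.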

get.basis() in lpSolveAPI

I am confused by the return value of get.basis(). For example,
lprec <- make.lp(0, 4)
set.objfn(lprec, c(1, 3, 6.24, 0.1))
add.constraint(lprec, c(0, 78.26, 0, 2.9), ">=", 92.3)
add.constraint(lprec, c(0.24, 0, 11.31, 0), "<=", 14.8)
add.constraint(lprec, c(12.68, 0, 0.08, 0.9), ">=", 4)
set.bounds(lprec, lower = c(28.6, 18), columns = c(1, 4))
set.bounds(lprec, upper = 48.98, columns = 4)
RowNames <- c("THISROW", "THATROW", "LASTROW")
ColNames <- c("COLONE", "COLTWO", "COLTHREE", "COLFOUR")
dimnames(lprec) <- list(RowNames, ColNames)
solve(lprec)
Then the basic variables are
> get.basis(lprec)
[1] -7 -2 -3
However, the solution is
> get.variables(lprec)
[1] 28.60000 0.00000 0.00000 31.82759
From the solution, it seems that variables 1 and 4 are basic. So where does the vector (-7, -2, -3) come from?
My guess is that it relates to the 3 constraints and 4 decision variables.
After reviewing the simplex method for bounded variables, I finally understood how this happens. These two links are helpful. Example; Video
Coming back to this problem, the structure is like
lpSolveAPI (the R interface to lp_solve) rewrites the constraints in the following format after adding the appropriate slack variables. The first three columns are the slack variables. Hence the values returned by get.basis(), -7, -2, -3, refer to columns 7, 2 and 3, which represent decision variable 4 and slack variables 2 and 3.
In this kind of LP with bounded variables, a variable can be nonbasic at either its lower bound or its upper bound. get.basis(lprec, nonbasic = TRUE) reports -1, -4, -5, -6 for the nonbasic variables. The minus sign means these variables are at their lower bound: slack variable 1 = 0, column 4 (decision variable 1) = 28.6, column 5 (decision variable 2) = 0, and column 6 (decision variable 3) = 0.
Thus, the optimal solution is 28.6 (nonbasic), 0 (nonbasic), 0 (nonbasic), 31.83 (basic).
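The index convention can be made explicit with a small lookup in base R. decode_basis is a hypothetical helper, not part of lpSolveAPI; it assumes, as described above, that indices 1 to 3 refer to the slack variables of the three rows and indices 4 to 7 to the four decision variables.

```r
# Hypothetical helper: translate get.basis() indices into readable labels.
# Slack variables (one per constraint row) come first, then the decision variables.
decode_basis <- function(idx, row_names, col_names) {
  labels <- c(paste0("slack:", row_names), col_names)
  labels[abs(idx)]
}

RowNames <- c("THISROW", "THATROW", "LASTROW")
ColNames <- c("COLONE", "COLTWO", "COLTHREE", "COLFOUR")

# Basic variables reported in the question.
basic <- decode_basis(c(-7, -2, -3), RowNames, ColNames)
print(basic)

# Nonbasic variables reported in the answer.
nonbasic <- decode_basis(c(-1, -4, -5, -6), RowNames, ColNames)
print(nonbasic)
```

The sign is dropped here because it only indicates which bound a variable sits on, not which variable it is.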

Resources