I want to solve a linear optimization problem with binary decision variables in R (currently I am using the Rglpk package). However, I am having trouble setting up the constraints.
Suppose a company wants to decide in which quarters to sell their product to maximize their profit, but if they sell at all, they must sell in at least 3 quarters in a row. This is an example of what their profits might look like:
library(tibble)

profits <- tibble(year = 1,
                  quarter = 1:4,
                  profit = c(23, -4, 6, -2))
I could then set up an Rglpk constraint matrix and solve as below.
cons.m <- matrix(c( 2, -1, -1,  0,
                   -2,  3, -2, -1,
                   -1, -2,  3, -1,
                    0, -1, -2,  2),
                 nrow = 4, byrow = TRUE)
solution <- Rglpk_solve_LP(obj = profits$profit,
                           mat = cons.m,
                           dir = rep("<=", 4),
                           rhs = rep(0, 4),
                           types = rep("B", 4),
                           max = TRUE)
solution$solution
[1] 1 1 1 0
This says I should sell in the first 3 quarters and not sell in Q4, which is clearly the correct solution.
How could I extend this solution to work with 12 periods, where I must sell at least 5 quarters in a row?
profits.new <- tibble(year = rep(1:3, each = 4),
                      quarter = 1:12,
                      profit = runif(12, -20, 20))
I realize I could generate all combinations and then select the maximum that meets the requirements, but I want a solution that can generalize to much larger cases where there would be too many combinations.
This can be modeled as:

$$\sum_{j=t}^{\min(t+n-1,\,T)} x_j \;\ge\; n \cdot (x_t - x_{t-1}) \qquad \text{for } t = 1,\dots,T \text{ (with } x_0 = 0\text{)}$$

where n is the minimum length of a production run. This will only require T = 12 constraints.
The total number of feasible selling patterns whose runs all have length >= n (with n = 5, T = 12) is 42. Of course, this difference will increase (rather dramatically) for longer planning horizons: e.g. for T = 24, n = 5 we have 24 constraints vs 4316 possible solutions.
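A minimal sketch of a constraint generator implementing this formulation, using Rglpk as in the question (the helper name min_run_matrix is illustrative; for T = 4, n = 3 it produces a valid but slightly different matrix than the hand-built cons.m above):

library(Rglpk)

# one constraint row per period t:
#   n * (x_t - x_{t-1}) <= sum of x_j for j = t .. min(t+n-1, T)
# rewritten as a <= 0 row; x_0 = 0, so row 1 has no x_{t-1} term
min_run_matrix <- function(n_periods, n) {
  m <- matrix(0, nrow = n_periods, ncol = n_periods)
  for (t in 1:n_periods) {
    window <- t:min(t + n - 1, n_periods)
    m[t, window] <- m[t, window] - 1   # minus the sum over the run window
    m[t, t] <- m[t, t] + n             # plus n * x_t
    if (t > 1) m[t, t - 1] <- -n       # minus n * x_{t-1}
  }
  m
}

cons.m <- min_run_matrix(n_periods = 12, n = 5)
solution <- Rglpk_solve_LP(obj = profits.new$profit,
                           mat = cons.m,
                           dir = rep("<=", 12),
                           rhs = rep(0, 12),
                           types = rep("B", 12),
                           max = TRUE)
solution$solution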
An optimal solution then consists of one or more selling runs, each at least n quarters long. There is much more to say about constraints like this.
I have a data.frame of intervals given row-wise: the interval starts are in column one, the interval ends in column two. The numbers are not integers. How can I find the overlapping range, if any, of all intervals? E.g.:
df <- cbind(c(1.5, 3, 2.1, 1), c(6, 5, 3.7, 10.1))
plot(1:11, ylim = c(0, 5), col = NA)
segments(x0 = c(1.5, 3, 2.1, 1), y0 = 1:4, x1 = c(6, 5, 3.7, 10.1), y1 = 1:4)
abline(v = 3, col = "red", lty = 2)
abline(v = 3.7, col = "red", lty = 2)
somefunc(df)
[1] 3 3.7
A nice, fast base R (or common package like dplyr etc.) solution is preferred. I already know of foverlaps (data.table) and IRanges, but they do not seem to address my problem. For bonus points: if there were interval(s) that prevented total overlap, e.g. rbind'ing c(20, 25) to df above, then the function should still return the largest possible overlap of any of the intervals, i.e. still returning c(3, 3.7).
EDIT: the solution linked by Henrik is good, but relies on generating a sequence with a given step (e.g. seq(start, end, by = 1)) and then reducing them to get the intersection. My intervals may be narrower than this step. Ideally I need a solution that operates using logical comparisons or something like that. The second solution in the linked page is also not quite right (see below).
EDIT EDIT: The intersection should be returned only if it is common to all ranges. Solution 2 in the post linked by Henrik groups intervals together even if not all intervals in the group intersect with every other interval.
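As a quick aside: for the plain all-overlap case (without the bonus requirement), the common range in base R is just the maximum start against the minimum end. A minimal sketch (overlap_all is an illustrative name):

overlap_all <- function(m) {
  lo <- max(m[, 1])
  hi <- min(m[, 2])
  if (lo <= hi) c(lo, hi) else NULL  # NULL when there is no common overlap
}
overlap_all(df)
[1] 3.0 3.7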
Here is a solution which seems to return the expected result for the given sample datasets.
It takes the vector of all unique interval endpoints and counts the number of intervals each endpoint intersects (by aggregating in a non-equi join). Among the subset of points with the maximum number of intersections, the range is taken.
library(data.table)
# enhanced dataset with 2 additional intervals
dt <- fread("lb, ub
1.5, 6
3 , 5
2.1, 3.7
1 , 10.1
8.3 , 12
20 , 25")
mdt <- dt[, .(b = unique(unlist(.SD)))]
res <- dt[mdt, on = .(lb <= b, ub >= b), .N, by = .EACHI][N == max(N), range(lb)]
res
[1] 3.0 3.7
Visualisation:
library(ggplot2)

ggplot(dt) +
  aes(x = lb, y = seq_along(lb), xend = ub, yend = seq_along(ub)) +
  geom_segment() +
  geom_vline(xintercept = res, col = "red", lty = 2)
EDIT: Handling of no overlaps
The OP has pointed out that the case where there are no overlaps needs to be recognized and handled separately. So I have modified the code:
mdt <- dt[, .(b = unique(unlist(.SD)))]
res <- dt[mdt, on = .(lb <= b, ub >= b), .N, by = .EACHI][
N == max(N), {
if (max(N) > 1) {
cat("Maximum overlaps found:", max(N), "out of", nrow(dt), "intervals\n")
range(lb)
} else {
cat("No overlaps found\n")
NULL
}
}]
This code will recognize the situation where there are no overlaps and will return NULL in these cases. In addition, a message is printed.
In all other cases, it will print an informative message, e.g.,
Maximum overlaps found: 4 out of 6 intervals
For OP's sample dataset without overlaps
dt <- data.table(lb = c(3, 6, 10), ub = c(5, 9, 15))
it will print
No overlaps found
Caveat
In case of multiple solutions the code above will return the overall range, i.e., the start of the first interval and the end of the last interval, instead of a list of separate intervals.
Sample data for this use case:
dt <- fread("lb, ub
1.5, 6
3 , 5
2.1, 3.7
1 , 10.1
11.5, 16
13 , 15
12.1, 13.7
11 , 20.1
0 , 22
")
So, there is a 5-fold overlap between 3 and 3.7 and a second 5-fold overlap between 13 and 13.7.
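If separate intervals are needed instead, a possible refinement (a sketch, not thoroughly tested): sort the endpoints, then split consecutive runs of maximum-overlap points into groups with rleid():

library(data.table)
mdt <- dt[, .(b = sort(unique(unlist(.SD))))]
tmp <- dt[mdt, on = .(lb <= b, ub >= b), .N, by = .EACHI]
tmp[, grp := rleid(N == max(N))]  # consecutive runs of max-overlap points
tmp[N == max(N), .(start = min(lb), end = max(lb)), by = grp]

For the sample data above this should return the two intervals (3, 3.7) and (13, 13.7).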
Furthermore, there is another use case which needs to be considered: how should intervals be treated which overlap in only one point, i.e. where one interval ends exactly where another starts?
I am learning to improve my coding in R. I have this code:
data$score[testA == 1] <- testA_score
data$score[testB == 1] <- testB_score
So basically I have four columns that I want to combine into one: testA = 1 indicates that the student took version A of the test and testA_score is their score; testB = 1 indicates that the student took version B and testB_score is their score. I want to combine this information into a new column, score.
Also, suppose I had testA, testB through testH, where all values are 0 or 1. How can I make a new column test_complete which is 1 if any of the tests are 1?
Basically, as a former Stata user I am looking for the R equivalents of egen rowtotal and egen rowfirst. Thanks so much.
You can take the max across all tests: since the values are only 0 or 1, the max will equal 1 if at least one test was completed.
testA <- c(1, 0, 0, 1, 0, 0, 0)
testB <- c(0, 1, 0, 0, 1, 0, 1)
testC <- c(0, 0, 0, 1, 0, 1, 0)
df <- data.frame(testA, testB, testC)
df$completed <- apply(df[, 1:3], 1, max)
So if I understand correctly, taking the maximum value by row should give what you need:
binary <- c(0,1)
df <- data.frame(
score1 = sample(binary, 20, replace = TRUE),
score2 = sample(binary, 20, replace = TRUE),
score3 = sample(binary, 20, replace = TRUE)
)
df$passed <- apply(df, 1, max)
head(df)
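For the first part of the question (combining the version-specific score columns into one), a minimal sketch; the example data here is hypothetical:

# hypothetical data: an indicator and a score column per test version
data <- data.frame(testA = c(1, 0, 0), testA_score = c(85, NA, NA),
                   testB = c(0, 1, 0), testB_score = c(NA, 72, NA))

# take testA_score where testA == 1, otherwise testB_score where testB == 1
data$score <- ifelse(data$testA == 1, data$testA_score,
                     ifelse(data$testB == 1, data$testB_score, NA))

# Stata's egen rowtotal corresponds to rowSums();
# "any test taken" is then a comparison against 0
data$test_complete <- as.integer(rowSums(data[, c("testA", "testB")]) > 0)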
I want to quantify the structural effect of an event in a Time Series analysis. For doing so I'm using what the KFAS package has to offer.
The problem is that the code gets a bit unwieldy if I want to model n different dates simultaneously, i.e. n different events.
Here's an example that should clarify, I hope:
# required libraries
library(fpp2)
library(KFAS)

y <- hyndsight # just for the example
plot(hyndsight)
abline(v = c(19, 35, 47), col = "red", lwd = 2)
Let's say that I want to check whether each of the "red-line" events is a "permanent shock" (i.e. it changes the level of the hyndsight series). Here's the model that can do that with KFAS functions:
# Z and T component for event on pos 19
aZ1 <- array(0, c(1, 1, length(y)))
aZ1[1, 1, (19 + 1):length(y)] <- 1 # 1s from pos 19(+1) forward
aT1 <- array(1, c(1, 1, length(y)))
# Z and T component for event on pos 35
aZ2 <- array(0, c(1, 1, length(y)))
aZ2[1, 1, (35 + 1):length(y)] <- 1 # 1s from pos 35(+1) forward
aT2 <- array(1, c(1, 1, length(y)))
# Z and T component for event on pos 47
aZ3 <- array(0, c(1, 1, length(y)))
aZ3[1, 1, (47 + 1):length(y)] <- 1 # 1s from pos 47(+1) forward
aT3 <- array(1, c(1, 1, length(y)))
And here's the actual model:
mod <- SSModel(y ~ 0 + SSMtrend(2, list(NA, NA)) + SSMseasonal(12, NA) +
                 SSMcustom(Z = aZ1, T = aT1,
                           R = matrix(0, 1, 0), Q = matrix(0, 0, 0),
                           a1 = 0, P1 = matrix(0), P1inf = matrix(1)) + # first event
                 SSMcustom(Z = aZ2, T = aT2,
                           R = matrix(0, 1, 0), Q = matrix(0, 0, 0),
                           a1 = 0, P1 = matrix(0), P1inf = matrix(1)) + # second event
                 SSMcustom(Z = aZ3, T = aT3,
                           R = matrix(0, 1, 0), Q = matrix(0, 0, 0),
                           a1 = 0, P1 = matrix(0), P1inf = matrix(1)), # third event
               H = NA)
initial_val <- c(0, 0, 0, 0, 0, 0, 0) # the first 4 are always there
fit <- fitSSM(mod, inits = initial_val)
Now, the question is, how can I "automate" this process depending on the number of events that I want to model?
As you can see, for each event I need to create an array aZ and an array aT, which are then passed to the model via the SSMcustom function. What if I have a new time series and need to evaluate just two events, or four, or more?
The problem is that I cannot keep adding SSMcustom terms to the model by hand. I want to pass a new time series with a vector of n events and automatically build the same model, differing only in the number of events to evaluate. Can I build a unique SSMcustom for all the events?
Also, initial_val has to change, but that's less complicated: it's always 4 + n, with n the number of events.
I know this question is quite specific; maybe it's more for CrossValidated, but I'm not so sure.
Introduce "permanent shock" variables equals to 0 before the shock and equals to 1 after the shock (one variable per shock). Then add these variables as explanatory variables (regressors) in you model and look if they are significant.
Take care that each of these variables will reduce the degree of freedom. So you may want to test the model with the same coefficient for all the shocks.
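A minimal sketch of this regressor approach with KFAS, reusing the hyndsight setup from the question (the step-dummy construction and the inits length are assumptions, not tested against the OP's data):

library(fpp2)
library(KFAS)

y <- hyndsight
events <- c(19, 35, 47)  # event positions; any number works

# permanent-shock regressors: 0 before each event, 1 from the next period on
shocks <- sapply(events, function(e) as.integer(seq_along(y) > e))
colnames(shocks) <- paste0("shock", seq_along(events))

mod <- SSModel(y ~ 0 + shocks + SSMtrend(2, list(NA, NA)) + SSMseasonal(12, NA),
               H = NA)
# one initial value per NA hyperparameter (2 trend variances, seasonal, H)
fit <- fitSSM(mod, inits = rep(0, 4))
out <- KFS(fit$model)
out$alphahat  # smoothed states include one coefficient per shock dummy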
I have two matrices. The first, m1, is 100x100 and contains numbers with decimal places and the other, m2, is 300x100 and is sparsely populated with integers, like so:
m1 <- matrix(rexp(1000, rate = .1), ncol = 100)
m2 <- matrix(sample(c(rep(0, 1000), rep(1, 10), rep(2, 1)), 300 * 100, replace = T), 300, 100)
Each row in m1 corresponds to the column of the same number in m2, and each entry in a row of m2 gives the number of occurrences of the corresponding row of m1 for that observation.
For each row of m2, I want the colMeans of the rows of m1, each scaled by its count in that row of m2. The result should be a 300x100 matrix. I want to know the most efficient way of doing this.
It's a complex operation but hopefully you understand what I mean. If you need any clarification I can give it. If it helps, what I'm trying to do is to get a document features matrix from a word feature matrix and a document-term matrix.
dtm <- matrix(c(0, 1, 1, 1, 0, 0, 0, 1, 0, 0, 1, 0), ncol = 4)
wvm <- matrix(c(27.305102, 9.095906, 3.792833, 17.561222, 32.06434, 4.719152, 8.367996, 0.0568822), ncol = 2)
dtm
wvm
t(apply(dtm, 1, function(dtm_row) {
  # rows of wvm for the terms present, scaled by their counts
  vs <- wvm[dtm_row > 0, ] * dtm_row[dtm_row > 0]
  # colMeans needs a matrix; a single selected row comes back as a vector
  if (is.matrix(vs)) colMeans(vs) else vs
}))
Solved my own problem. But if anyone wants to improve my method, I'll mark their answer as the correct one.
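One possible improvement, a sketch assuming the same weighting semantics as the apply() version (rows of dtm with no nonzero entries give NaN in both):

# each row of dtm %*% wvm is the count-weighted column sum;
# dividing by the number of distinct terms per row reproduces colMeans()
res <- (dtm %*% wvm) / rowSums(dtm > 0)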
I am writing a simple grid world Q-learning program using R. This is my grid world:
This simple grid world has 6 states, in which state 1 and state 6 are the starting and ending states. I avoided adding a fire pit, walls, or wind to keep my grid world as simple as possible. For the reward matrix, the starting state has a value of -0.1, the ending state +1, and the rest of the states 0. The -0.1 reward for the starting state is to discourage the agent from coming back to the start position.
# Reward and action-value matrix:
#   rows = states (1:6)
#   columns = actions (1:4) [Left, Right, Down, Up in that order]
I wrote my program in R and it's working, but with a problem in finding the next state when the current state is beyond the 4th row: the Q matrix doesn't update after the 4th row.
#q-learning example
#https://en.wikipedia.org/wiki/Q-learning
# 2x3 grid world
# S for starting grid G for goal/terminal grid
# actions left right down up
# 4 5 6 state
#########
# [0,0,G]
# [S,0,0]
#########
# 1 2 3 state
#setting seed
set.seed(2016)
#number of iterations
N=10
#discount factor
gamma=0.9
#learning rate
alpha=0.1
#target state
tgt.state=6
#reward matrix starting grid has -0.1 and ending grid has 1
R=matrix( c( NA, 0, NA, 0,
-0.1, 0, NA, 0,
0, NA, NA, 1,
NA, 0,-0.1, NA,
0, 1, 0, NA,
0, NA, 0, NA
),
nrow=6,ncol=4,byrow = TRUE)
#initializing Q matrix with zeros
Q=matrix( rep( 0, len=dim(R)[1]*dim(R)[2]), nrow = dim(R)[1],ncol=dim(R)[2])
for (i in 1:N) {
## for each episode, choose an initial state at random
cs <- 1
## iterate until we get to the tgt.state
while (1) {
## choose next state from possible actions at current state
## Note: if only one possible action, then choose it;
## otherwise, choose one at random
next.states <- which(R[cs,] > -1)
if (length(next.states)==1)
ns <- next.states
else
ns <- sample(next.states,1)
## this is the update
Q[cs,ns] <- Q[cs,ns] + alpha*(R[cs,ns] + gamma*max(Q[ns, which(R[ns,] > -1)]) - Q[cs,ns])
## break out of while loop if target state is reached
## otherwise, set next.state as current.state and repeat
if (ns == tgt.state) break
cs <- ns
Sys.sleep(0.5)
print(Q)
}
}
Currently, when my algorithm starts, the agent always starts from state 1. In the first state (first row of R) there are two actions, Right (R(1,2)) or Up (R(1,4)). If an action is randomly selected, say Up (R(1,4)), the agent moves to the next state using the action index, i.e. Q(4, action).
But now consider state 4 (fourth row of R): it has two actions, Right (R(4,2)) and Down (R(4,3)), and this causes a problem for my algorithm. If it randomly selects an action, say Right, the agent should logically move to the 5th state, but my code above uses the action index 2 as the next state, so instead of going to the 5th state it goes to the 2nd state.
In the end, my algorithm works perfectly if the state and action matrices have the same dimensions (m x m), but in my problem they are different (m x n). I have tried to find a logical approach to determine the next state for $\max(Q(s',a'))$, but I am stuck.
(The comments in your code don't correspond to what you are actually doing. Always try to avoid this.)
You are conflating the transition and the reward matrices. For a non-stochastic environment, they should look something like this:
R <- matrix(c(
-1, -1, -1, -1,
-1, -1, -1, -1,
-1, -1, -1, 10,
-1, -1, -1, -1,
-1, 10, -1, -1,
10, 10, -1, -1),
nrow=6, ncol=4, byrow=T)
T <- matrix(c(
1, 2, 1, 4,
1, 3, 2, 5,
2, 3, 3, 6,
4, 5, 1, 4,
4, 6, 2, 5,
6, 6, 3, 5),
nrow=6, ncol=4, byrow=T)
An ε-greedy strategy would be (note: here e is the exploration probability, and ties in the greedy choice are broken at random):
greedy <- function(s) {
  best <- which(Q[s, ] == max(Q[s, ]))
  if (length(best) > 1) sample(best, 1) else best  # break ties at random
}
egreedy <- function(s, e) if (runif(1) < e) sample(1:ncol(Q), 1) else greedy(s)

ca <- egreedy(cs, epsilon)
Then choosing the next state is just:
ns <- T[cs, ca]
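A minimal sketch of the full episode loop built on these pieces, using the OP's hyperparameters (epsilon is an assumed exploration rate):

alpha <- 0.1; gamma <- 0.9; epsilon <- 0.1
N <- 10; tgt.state <- 6
Q <- matrix(0, nrow = nrow(R), ncol = ncol(R))

for (i in 1:N) {
  cs <- 1
  repeat {
    ca <- egreedy(cs, epsilon)  # choose an action index
    ns <- T[cs, ca]             # the transition matrix gives the next state
    # standard Q-learning update
    Q[cs, ca] <- Q[cs, ca] +
      alpha * (R[cs, ca] + gamma * max(Q[ns, ]) - Q[cs, ca])
    if (ns == tgt.state) break
    cs <- ns
  }
}
Q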