lpSolve with constraints in R
I would like to use R to solve an optimization problem with the lpSolve package, which can perform tasks similar to the Solver add-in in Excel. Below is a simple case where I would like to maximize an NPV value using lpSolve.
df<-structure(list(id = c(1, 2, 3, 4, 5, 6, 7, 8), Revenue = c(109,
111, 122, 139, 156, 140, 137, 167)), row.names = c(NA, 8L), class = "data.frame")
dcf <- function(x, r, t0=FALSE){
  # calculates discounted cash flows (DCF) given cash flows and a discount rate
  #
  # x  - cash flows vector
  # r  - vector of discount rates, in decimals; a single value will be recycled
  # t0 - if TRUE, the cash flow starts in year 0, i.e. the discount rate in the
  #      first period is zero; default is FALSE
  if(length(r) == 1){
    r <- rep(r, length(x))
    if(t0) r[1] <- 0
  }
  x/cumprod(1 + r)
}
npv <- function(x, r, t0=FALSE){
  # calculates net present value (NPV) given cash flows and a discount rate
  #
  # x  - cash flows vector
  # r  - discount rate, in decimals
  # t0 - cash flow starts in year 0; default is FALSE
  sum(dcf(x, r, t0))
}
npv(df$Revenue,.2)
# The non-optimized NPV is 492.
# How can I use lpSolve to optimize my table? Said another way, how can I
# rearrange the rows to maximize NPV using lpSolve?
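Without the penalty introduced below, choosing the order that maximizes NPV is an assignment problem: project i placed in period t contributes Revenue[i]/1.2^t, and lpSolve ships lp.assign for exactly this shape. A minimal sketch (see ?lp.assign for the exact shape of the returned solution):

library(lpSolve)

# benefit[i, t]: discounted value of scheduling project i in period t
disc <- 1/1.2^seq_len(nrow(df))
benefit <- outer(df$Revenue, disc)

sol <- lp.assign(benefit, direction = "max")
sol$objval                                   # maximized NPV
period <- apply(sol$solution, 1, which.max)  # period assigned to each project

For this unpenalized case the optimum is simply the revenues sorted in decreasing order (the largest cash flows should be discounted the least); the assignment formulation only earns its keep once a penalty couples consecutive periods.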
A more complicated problem involves a penalizing rule:
Ids represent projects.
If a project is not in the starting period (row 1), check whether the previous Id is within a delta of 2 (the absolute value of the difference between the current row's Id and the previous row's Id). If true, penalize Revenue by 20%. I think this problem still involves solving for the correct order. How can I optimize this function?
# Randomize the order to give a base NPV. Now I need to optimize the order to
# find the max value.
library(dplyr)

df <- df %>% mutate(random_sort = sample(nrow(df)))

penalized_npv <- function(i){
  df_fcn <- i %>%
    mutate(Penalty = if_else(abs(random_sort - lag(random_sort)) > 2, 1, .8)) %>%
    mutate(Penalty = ifelse(is.na(Penalty), 1, Penalty)) %>%
    mutate(Revenue_Penalized = Revenue * Penalty)
  npv(df_fcn$Revenue_Penalized, .2)
}
The best I've come up with is to randomly rearrange the data many times and keep the maximum value.
schedule_function <- function(i){
  df_fcn <- i %>%
    mutate(random_sort = sample(random_sort)) %>%
    mutate(Penalty = if_else(abs(random_sort - lag(random_sort)) > 2, 1, .8)) %>%
    mutate(Penalty = ifelse(is.na(Penalty), 1, Penalty)) %>%
    mutate(Revenue_Penalized = Revenue * Penalty)
  npv(df_fcn$Revenue_Penalized, .2)
}
n_draws <- 10000
MAX <- -Inf  # initialize maximum
for (i in seq_len(n_draws)) {
  val <- schedule_function(df)
  if (val > MAX) MAX <- val
}
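With only 8 projects the search space is 8! = 40,320 orderings, which is small enough to enumerate instead of sampling. A brute-force benchmark sketch, following the stated rule (penalty when consecutive scheduled Ids differ by 2 or less) and assuming combinat::permn as the permutation generator:

library(combinat)

perms <- permn(seq_len(nrow(df)))
scores <- vapply(perms, function(ord) {
  # first period is never penalized; later periods lose 20% when the
  # previously scheduled Id is within a delta of 2
  pen <- c(1, ifelse(abs(diff(ord)) > 2, 1, 0.8))
  npv(df$Revenue[ord] * pen, .2)
}, numeric(1))

perms[[which.max(scores)]]  # best ordering of the Ids
max(scores)                 # its penalized NPV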
Related
Creating an Excel one-way data table in R -- Problem with my for loop
I'm trying to create an Excel one-way data table in R so that I can find the exponent that minimizes the errors of a coefficient in an equation. I have a for loop that produces the correct result, but it does something strange that I can't figure out. Here is an example of the data. I'll use the Pythagorean win formula from baseball and a for loop to find the exponent that minimizes the mean absolute error in the win projections.

## Create data
Teams <- c("Bulls", "Sharks", "Snakes", "Dogs", "Cats")
Wins <- c(5, 3, 8, 1, 9)
Losses <- 10 - Wins
Win.Pct <- Wins/(Wins + Losses)
Points.Gained <- c(30, 50, 44, 28, 60)
Points.Allowed <- c(28, 74, 40, 92, 25)
season <- data.frame(Teams, Wins, Losses, Win.Pct, Points.Gained, Points.Allowed)
season

## Calculate scoring ratio
season$Score.Ratio <- with(season, Points.Gained/Points.Allowed)

## Predict wins from scoring ratio
exponent <- 2
season$Predicted.Wins <- season$Score.Ratio^exponent / (1 + season$Score.Ratio^exponent)

## Calculate mean absolute error
season$Abs.Error <- with(season, abs(Win.Pct - Predicted.Wins))
mae <- mean(season$Abs.Error)
mae

Here is my for loop, which looks at a range of exponent options to see if any of them are better than the exponent, 2, used above. For some strange reason, when I run the for loop it keeps printing the table several times (many of the tables with incorrect results) until finally producing the correct table as the last one. Can anyone explain what is wrong with my for loop and why this is happening?

## Identify potential exponent options that minimize mean absolute error
exp.options <- seq(from = 0.5, to = 3, by = 0.1)
mae.results <- data.frame("Exp" = exp.options, "Results" = NA)
for(i in 1:length(exp.options)){
  win.pct <- season$Predicted.Wins
  pred.win.pct <- (season$Points.Gained/season$Points.Allowed)^exp.options[i] /
    (1 + (season$Points.Gained/season$Points.Allowed)^exp.options[i])
  mae.results[i,2] <- mean(abs(win.pct - pred.win.pct))
  print(mae.results)
}
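One likely explanation: print(mae.results) sits inside the loop body, so the partially filled frame is echoed on every iteration. A sketch with the print moved after the loop (and comparing against the observed Win.Pct rather than the exponent-2 predictions, which matches the stated goal of minimizing projection error):

for(i in seq_along(exp.options)){
  pred.win.pct <- season$Score.Ratio^exp.options[i] /
    (1 + season$Score.Ratio^exp.options[i])
  mae.results[i, 2] <- mean(abs(season$Win.Pct - pred.win.pct))
}
print(mae.results)                             # once, after the loop
mae.results[which.min(mae.results$Results), ]  # exponent with the lowest MAE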
How to solve a matrix equation in R
My friend and I (both non-R experts) are trying to solve a matrix equation in R. We have a matrix y which is defined by:

y <- matrix(c(0.003, 0.977, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.02,
              0, 0.0117, 0.957, 0, 0, 0, 0, 0, 0, 0, 0, 0.03,
              0, 0, 0.0067, 0.917, 0, 0, 0, 0, 0, 0, 0, 0.055,
              0, 0, 0, 0.045, 0.901, 0, 0, 0, 0, 0, 0, 0.063,
              0, 0, 0, 0, 0.0533, 0.913, 0, 0, 0, 0, 0, 0.035,
              0, 0, 0, 0, 0, 0.05, 0, 0, 0, 0, 0, 0.922,
              0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
              0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
              0, 0, 0, 0.01, 0, 0, 0, 0, 0, 0, 0, 0,
              0, 0, 0, 0, 0.023, 0, 0, 0, 0, 0, 0, 0,
              0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
              0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
            nrow=12, ncol=12, byrow=TRUE)

This matrix simulates the way students in our school pass on to the next year. Multiplying this matrix by a vector containing the number of students in each year gives the number of students in each year one year later. With the function:

sumfun <- function(x, start, end){
  return(sum(x[start:end]))
}

we add up the number of students in a range of years to get the total number of students in the school. We want to fill the vector we multiply with our matrix with the number of students currently in the school, and leave the number of new students (the first element of the vector) as our variable x. For example:

sumfun(colSums(y*c(x, 200, 178, 180, 201, 172, 0, 0, 200, 194, 0, 0)), 2, 6)

We want to set this expression equal to 1000, the maximum number of students our school building can house. By doing this, we can calculate how many new students our school can accept. We have no idea how to do this. We expect x to be somewhere between 100 and 300. We would be very grateful if somebody could help us with this!
I'm not familiar with R, but I can guide you through the main process of solving this matrix equation. Assume your matrix is called P, and let the current student vector be called s0:

s0 = {x, 200, 178, 180, 201, 172, 0, 0, 200, 194, 0, 0}

Note that we leave x undefined, as we want to solve for this variable later. Even though x is unknown, we can still multiply s0 with P. We call this new vector s1:

s1 = s0.P = {0.003*x, 2.34 + 0.977*x, 192.593, 173.326, 177.355, 192.113, 0, 0, 0, 0, 0, 192.749 + 0.02*x}

We can verify that this is correct: of the student years 2-6, only year 2 is affected by the number of new students (x). If we now sum over years 2-6 as in your example, we find:

s1[2:6] = 737.727 + 0.977*x

All that is left is solving the trivial equation s1[2:6] == 1000:

737.727 + 0.977*x == 1000
x = 268.447

Let me know if this is correct! This was all done in Mathematica. The following code shows how to do it in R:

y <- matrix(c(0.003, 0.977, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.02,
              0, 0.0117, 0.957, 0, 0, 0, 0, 0, 0, 0, 0, 0.03,
              0, 0, 0.0067, 0.917, 0, 0, 0, 0, 0, 0, 0, 0.055,
              0, 0, 0, 0.045, 0.901, 0, 0, 0, 0, 0, 0, 0.063,
              0, 0, 0, 0, 0.0533, 0.913, 0, 0, 0, 0, 0, 0.035,
              0, 0, 0, 0, 0, 0.05, 0, 0, 0, 0, 0, 0.922,
              0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
              0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
              0, 0, 0, 0.01, 0, 0, 0, 0, 0, 0, 0, 0,
              0, 0, 0, 0, 0.023, 0, 0, 0, 0, 0, 0, 0,
              0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
              0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
            nrow=12, ncol=12, byrow=TRUE)

sumfun <- function(x, start, end){
  return(sum(x[start:end]))
}

students <- function(x) {
  total <- sumfun(colSums(y*c(x, 200, 178, 180, 201, 172, 0, 0, 200, 194, 0, 0)), 2, 6)
  return(total - 1000)
}

uniroot(students, lower=100, upper=300)$root

The function uniroot finds where a function is 0. So if you define a function that returns the number of students for a value x, minus 1000, uniroot will find the x for which the number of students is 1000.

Note: this only describes the short-term behavior of the total number of students. To have the number of students be 1000 in the long term, other equations must be solved.
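Picking up that closing note: if the model is "a fresh intake x enters year 1 every year and the remaining years evolve through y", the long-run cohort sizes are the fixed point of a linear map, so the long-term intake cap has a closed form. A sketch under that assumption (this extrapolates beyond the original answer):

# Steady state: for years j >= 2, s[j] = (s %*% y)[j], with s[1] = x each year.
# Writing v = s[2:12], this reads v = x*b + B %*% v with b = y[1, 2:12] and
# B = t(y[2:12, 2:12]), hence v = x * solve(I - B, b).
b <- y[1, 2:12]
B <- t(y[2:12, 2:12])
v_per_intake <- solve(diag(11) - B, b)  # steady-state years 2-12 per unit intake

# Years 2-6 are the first five entries of v; cap their long-run sum at 1000.
x_longrun <- 1000/sum(v_per_intake[1:5])
x_longrun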
I would suggest probing various x values and looking at the resulting answers. From that, you can see the trend and use it to figure out the answer. Here is an example:

# Sample data
y <- matrix(c(0.003, 0.977, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.02,
              0, 0.0117, 0.957, 0, 0, 0, 0, 0, 0, 0, 0, 0.03,
              0, 0, 0.0067, 0.917, 0, 0, 0, 0, 0, 0, 0, 0.055,
              0, 0, 0, 0.045, 0.901, 0, 0, 0, 0, 0, 0, 0.063,
              0, 0, 0, 0, 0.0533, 0.913, 0, 0, 0, 0, 0, 0.035,
              0, 0, 0, 0, 0, 0.05, 0, 0, 0, 0, 0, 0.922,
              0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
              0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
              0, 0, 0, 0.01, 0, 0, 0, 0, 0, 0, 0, 0,
              0, 0, 0, 0, 0.023, 0, 0, 0, 0, 0, 0, 0,
              0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
              0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
            nrow=12, ncol=12, byrow=TRUE)

# Function f returns the total number of students in the school for a given x
f <- function(x) {
  z <- c(x, 200, 178, 180, 201, 172, 0, 0, 200, 194, 0, 0)
  sum(t(y[, 2:6]) %*% z)
}

# Let's see the plot
px <- 1:1000
py <- sapply(px, f)  # total number of students for each x from 1 to 1000
plot(px, py, type='l', lty=2)

# Analyze the matrices (analysis not shown here) and reproduce the linear trend
lines(px, f(0) + sum(y[1, 2:6])*px, col='red', lty=4)

# Obtain the answer using the linear trend
Xstudents <- (1000 - f(0))/sum(y[1, 2:6])
floor(Xstudents)
Knapsack case R implementation for multiple persons using genetic algorithm
I am trying to implement a genetic algorithm in R. I found out that R has the 'GA' and 'genalg' packages for genetic algorithm implementations. I encountered the example at the link http://www.r-bloggers.com/genetic-algorithms-a-simple-r-example/, where the knapsack problem is solved. The problem can be briefly explained as: "You are going to spend a month in the wilderness. You're taking a backpack with you; however, the maximum weight it can carry is 20 kilograms. You have a number of survival items available, each with its own number of 'survival points'. Your objective is to maximize the number of survival points." The problem is easily solved using the 'genalg' package for a single person, and the output is a binary string. Now I have a doubt: let's say that instead of one person there are 2 or more, i.e. multiple persons, and we need to distribute the survival points among them. The weight constraint applies to each person. How can we solve this problem? Can we use the 'genalg' or 'GA' package? If so, how can we apply them? Are there any examples of this solved in R or other software? Thanks
The R package adagio (https://cran.r-project.org/web/packages/adagio/index.html) comes with two functions (knapsack and mknapsack) which solve this type of problem more efficiently by dynamic programming.
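A minimal sketch of what that might look like on the example data below, for three people with a 20 kg limit each; the mknapsack argument order (profits, weights, capacities) and the shape of its return value are assumptions from a reading of the adagio docs, so check ?mknapsack before relying on them:

library(adagio)

p <- c(10, 20, 15, 2, 30, 10, 30)  # survival points (profits)
w <- c(1, 5, 10, 1, 7, 5, 1)       # weights
caps <- c(20, 20, 20)              # one capacity per person

sol <- mknapsack(p, w, caps)       # argument order assumed; see ?mknapsack
str(sol)  # inspect the item-to-person assignment and the optimal value

Note one modeling difference: mknapsack assigns each item to at most one knapsack, whereas the genalg chromosome in the next answer lets several people carry the same item.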
A simple approach could be to have one chromosome containing all individuals in the group and have the evaluation function split this chromosome into multiple parts, one for each individual, and then have these parts evaluated. In the example below (based on the example in the question) I have assumed each individual has the same weight limit and that multiple individuals can bring the same item.

library(genalg)

# Set up the problem parameters
# How many people are in the group
individual_count <- 3
# The weight limit for one individual
weightlimit <- 20
# The items with their survival points
dataset <- data.frame(item = c("pocketknife", "beans", "potatoes", "unions",
                               "sleeping bag", "rope", "compass"),
                      survivalpoints = c(10, 20, 15, 2, 30, 10, 30),
                      weight = c(1, 5, 10, 1, 7, 5, 1))

# Next, we choose the number of iterations, then design and run the model
iter <- 100
# The chromosome has to be large enough to contain a bit for every
# individual/item pair
chromosomesize <- individual_count * nrow(dataset)

# Function definitions
# A function to split vector x into n equal parts
split_vector <- function(x, n) split(x, cut(seq_along(x), n, labels = FALSE))

# Evaluate an individual (a part of the chromosome)
evalIndividual <- function(x) {
  current_solution_survivalpoints <- x %*% dataset$survivalpoints
  current_solution_weight <- x %*% dataset$weight
  if (current_solution_weight > weightlimit) return(0)
  else return(-current_solution_survivalpoints)
}

# Evaluate a chromosome
evalFunc <- function(x) {
  # First split the chromosome into a list of individuals, then evaluate each
  individuals <- split_vector(x, individual_count)
  return(sum(sapply(individuals, evalIndividual)))
}

# Run the genetic algorithm
GAmodel <- rbga.bin(size = chromosomesize, popSize = 200, iters = iter,
                    mutationChance = 0.01, elitism = TRUE, evalFunc = evalFunc)

# First show a summary
summary(GAmodel, echo = TRUE)

# Then extract the best solution from GAmodel (copied from the source code of
# the summary function)
filter <- GAmodel$evaluations == min(GAmodel$evaluations)
bestSolution <- GAmodel$population[filter, , drop = FALSE][1, ]

# Now split the solution into the individuals
split_vector(bestSolution, individual_count)
Optimization in R using constraints
I have some problems setting up an optimization process using R. My data set is as follows:

set.seed(123)
library(lpSolve)

num_data <- 1000
bal_max <- .2/100
ind_max <- 10.5/100
data <- data.frame(id = 1:num_data,
                   balance = pmax(0, runif(num_data, 0, 1000)),
                   industry = rep(seq(1:10), num_data/10))
data$risk <- pmax(0, data$balance + rnorm(num_data, 100, 10))

As you can see, there are 1000 ids and 10 different industries. The objective is to maximize the sum of the "risk" column while making sure that the fraction of each individual loan and of each industry does not exceed 0.2% and 10.5% of the total balance, respectively. In the current data set these conditions are not met:

max(data$balance) / sum(data$balance)
#[1] 0.002009751
industry <- aggregate(balance ~ industry, FUN=sum, data=data)
max(industry$balance) / sum(industry$balance)
#[1] 0.1093997

Thus, before we can maximize the risk column, these two conditions need to be met. The remainder of my code is as follows:

# Set up the linear programming problem
num_x <- nrow(data)
num_ind <- length(unique(data$industry))
objective.in <- data$risk  # define the quantity to be maximized

# Construct the right-hand side of the constraint vector:
# - sum of balances = 1
# - each balance <= bal_max
# - sum of balances for each industry <= ind_max
# - the lp solver imposes the constraint that each balance >= 0
const.rhs <- c(1, rep(bal_max, num_x), rep(ind_max, num_ind))

# Construct the constraint matrix for the same constraints
mat_ind <- matrix(0, nrow=num_ind, ncol=num_x)
for(i in 1:num_ind) mat_ind[i, which(data$industry == i)] <- 1
const.mat <- rbind(matrix(1, nrow=1, ncol=num_x), diag(num_x), mat_ind)

# Define directions for each constraint equation
const.dir <- c("=", rep("<=", num_x), rep("<=", num_ind))

# Find the balances for maximum risk
max_risk <- lp(direction="max", objective.in=objective.in, const.mat=const.mat,
               const.dir=const.dir, const.rhs=const.rhs)
max_risk

# Replace the data balances with the optimum solution
data$balance <- max_risk$solution

# Each balance should be smaller than bal_max
max(data$balance)

# Each industry should be smaller than 10.5%
industry <- aggregate(balance ~ industry, FUN=sum, data=data)
industry

As you can see, no individual id or industry exceeds its boundary (0.2% and 10.5%) anymore. The problem is that this code fills in either 0.2% or 0% for each loan id (such that the sum is 1), while the absolute value of the initial balance should not rise either; in this example the original balances often rise (they get filled in at 0.2% of the total).

In short, I want to optimize the "risk" column, where each individual id is capped at 0.2% of the total balance (after the optimization) and each industry is capped at 10.5% of the total balance (after the optimization). The sum of all fractions should be one, and the absolute value of a balance cannot increase. The idea is to decrease the balances such that all conditions are met and risk is optimized.
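One way to make caps that reference the post-optimization total linear is to switch the decision variables from fractions to absolute balances b and rewrite each cap b_i <= bal_max * sum(b) as b_i - bal_max * sum(b) <= 0. The "fractions sum to one" condition then holds by construction (the fractions are b/sum(b)), and "no increase" becomes a plain upper bound. A sketch of that reformulation (using data$risk directly as the per-id objective coefficient is an assumption):

# Decision variables: absolute balances b_i, 0 <= b_i <= original balance
n <- nrow(data)
k <- length(unique(data$industry))

# b_i - bal_max * sum(b) <= 0  (id cap as a share of the new total)
id_cap <- diag(n) - matrix(bal_max, n, n)

# sum of b over industry j - ind_max * sum(b) <= 0
ind_cap <- matrix(-ind_max, k, n)
for (j in seq_len(k)) ind_cap[j, data$industry == j] <- 1 - ind_max

const.mat <- rbind(id_cap, ind_cap, diag(n))
const.dir <- rep("<=", n + k + n)
const.rhs <- c(rep(0, n + k), data$balance)  # last block: balances cannot rise

res <- lp("max", data$risk, const.mat, const.dir, const.rhs)
fractions <- res$solution/sum(res$solution)  # sum to 1 by construction
max(fractions)
max(tapply(fractions, data$industry, sum))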
Fast loan rate calculation for a big number of loans
I have a big data set (around 200k rows) where each row is a loan. I have the loan amount, the number of payments, and the loan payment. I'm trying to get the loan rate. R doesn't have a function for calculating this (at least base R doesn't, and I couldn't find one). It isn't that hard to write NPV and IRR functions:

Npv <- function(i, cf, t=seq(from=0, by=1, along.with=cf)) sum(cf/(1+i)^t)

Irr <- function(cf) {
  uniroot(Npv, c(0, 100000), cf=cf)$root
}

And then you can just do:

rate <- Irr(c(-amt, rep(pmt, times=n)))  # disbursement as a negative cash flow

The problem is when you try to calculate the rate for a lot of loans. Because uniroot is not vectorized, and because rep takes a surprising amount of time, you end up with a slow calculation. You can make it faster if you do some math and figure out that you are looking for the root of the following equation:

zerome <- function(r) amt/pmt - (1 - 1/(1+r)^n)/r

and then use that as input for uniroot. On my PC, this takes around 20 seconds to run for my 200k database. The problem is that I'm trying to do some optimization, and this is one step of the optimization, so I'm trying to speed it up even more. I've tried vectorization, but because uniroot is not vectorized, I can't go further that way. Is there any root-finding method that is vectorized? Thanks
Instead of using a root finder, you could use a linear interpolator. You will have to create one interpolator for each value of n (the number of remaining payments). Each interpolator will map (1 - 1/(1+r)^n)/r to r. Of course you will have to build a grid fine enough so it will return r to an acceptable precision level. The nice thing with this approach is that linear interpolators are fast and vectorized: you can find the rates for all loans with the same number of remaining payments (n) in a single call to the corresponding interpolator.

Now some code that proves it is a viable solution. First, we create the interpolators, one for each possible value of n:

n.max <- 360L # 30 years

one.interpolator <- function(n) {
  r <- seq(from = 0.0001, to = 0.1500, by = 0.0001)
  y <- (1 - 1/(1+r)^n)/r
  approxfun(y, r)
}

interpolators <- lapply(seq_len(n.max), one.interpolator)

Note that I used a precision of 1/100 of a percent (1 bp). Then we create some fake data:

n.loans <- 200000L
n <- sample(n.max, n.loans, replace = TRUE)
amt <- 1000 * sample(100:500, n.loans, replace = TRUE)
pmt <- amt / (n * (1 - runif(n.loans)))
loans <- data.frame(n, amt, pmt)

Finally, we solve for r:

library(plyr)
system.time(ddply(loans, "n", transform, r = interpolators[[n[1]]](amt / pmt)))
#   user  system elapsed
#  2.684   0.423   3.084

It's fast. Note that some of the output rates are NA, but that is because my random inputs made no sense and would have returned rates outside of the [0 ~ 15%] grid I selected. Your real data won't have that problem.
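Another route for the closing question: bisection vectorizes trivially, because every step is plain arithmetic on whole vectors. A minimal sketch built on the zerome rearrangement from the question (the bracket [1e-9, 1] for the per-period rate is an assumption about the data):

# f(r) = amt/pmt - (1 - (1+r)^-n)/r is increasing in r (the annuity factor
# decreases in r), so bisection converges; 60 halvings of [1e-9, 1] shrink
# the bracket below 1e-18.
vec_bisect <- function(amt, pmt, n, lo = 1e-9, hi = 1, iters = 60) {
  f <- function(r) amt/pmt - (1 - (1 + r)^(-n))/r
  for (k in seq_len(iters)) {
    mid <- (lo + hi)/2
    below <- f(mid) < 0           # mid is still below the root
    lo <- ifelse(below, mid, lo)
    hi <- ifelse(below, hi, mid)
  }
  (lo + hi)/2
}

# rates <- vec_bisect(loans$amt, loans$pmt, loans$n)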