I took a class a few years ago on power market optimization and tried building a small example in R, which I have now worked up the courage to tackle again. However, I need some help.
I would like to take the constraints of ~4 power plants and try to satisfy a single location's demand for power, as well as the demand for 3 other types of ancillary services needed in the power market (called Reserves). I am looking to minimize the total cost of generating the electricity.
I've laid out a lot of information but can't seem to figure out how to start using any optimization packages (I have used optim before but couldn't quite work with these constraints). This will be a little lengthy, but I'm using comments in R code for an easy copy-paste-run for those interested in helping or viewing what I have.
Warning: You're going to learn more about power plants and electricity markets than you want to from this post.
#####-------------------------------
# G for GENERATOR No.
# For each generator we have the following data:
## Maximum Capacity: the most power it can be producing
## Technical Minimum Capacity: the smallest amount (other than being off)
## Cost per Megawatt: The cost of generating power ($/MW)
## Ramp Rate: The speed a plant can change to a higher or lower output (MW/min)
G1 <- c(200, 100, 50, 2)
G2 <- c(150, 10, 80, 10)
G3 <- c(200, 100, 55, 2)
G4 <- c(150, 10, 85, 10)
Gdat.1 <- rbind(G1,G2,G3,G4)
colnames(Gdat.1) = c("MWMax","MWMin","Cost","RampRate")
Gdat.1
n <- nrow(Gdat.1) # number of generators
#####-------------------------------
# System Requirements: Demand
## Supply (MW) must equal demand.
Demand <- 415
# System Requirements: Reserves
## Total Reserves of the system must be met.
### R1: Primary Reserves
##### 0.5 minute response time, bi-directional (Ramp UP/DOWN)
### R2: Secondary Reserves
##### 5 minute response time, bi-directional (Ramp UP/DOWN)
### R3: Tertiary Reserves
##### 15 minute response time, uni-directional (Ramp UP only)
# R for Reserve Type.
# For each Reserve Type we have the following data:
# Total: MW needed
# Minutes: the time within which the MW must be available
# Bid: amount the operator will pay for reserve MW ($/MW)
R1 <- c(2, 0.5, 60) # Primary
R2 <- c(8, 5, 40) # Secondary
R3 <- c(20, 15, 0) # Tertiary
Reserves <- rbind(R1,R2,R3)
colnames(Reserves) = c("Total","Minutes","Bid")
Reserves
#####-------------------------------
## Ramp Rate constraint of generators
### For R1 (Primary Reserves) the system needs 2 MW that can be supplied
### within 30 seconds: a generator with a ramp rate of 2 MW/min will only
### be able to supply 1 MW for primary reserves, while a generator with a
### ramp rate of 10 MW/min will be able to supply 5 MW.
# How much each generator can supply within each reserve's response time
R1max <- Gdat.1[,"RampRate"] * Reserves["R1","Minutes"]
R2max <- Gdat.1[,"RampRate"] * Reserves["R2","Minutes"]
R3max <- Gdat.1[,"RampRate"] * Reserves["R3","Minutes"]
R1min <- -R1max # recall, bi-directional
R2min <- -R2max # bi-directional
R3min <- 0 * R3max # uni-directional (up only), so the lower bound is 0
# we no longer need RampRate since we've used it for these calculations
Gdat <- cbind(Gdat.1[,-4], cbind(R1max,R2max,R3max,R1min,R2min,R3min))
#####-------------------------------
# Now we initialize each generator's commitments that can change during optimization
MW.Demand = rep(0,n) # general MW to satisfy demand
MW.R1 = rep(0,n) # MW to satisfy Primary Reserves
MW.R2 = rep(0,n) # MW to satisfy Secondary Reserves
MW.R3 = rep(0,n) # MW to satisfy Tertiary Reserves
Commit.orig <- cbind(MW.Demand,MW.R1,MW.R2,MW.R3)
rownames(Commit.orig) <- paste0("G",seq(1,n))
Commit <- Commit.orig
# Some initial guess (may be exactly the right answer...)
Commit <- matrix(c(200,   0,   0,  0,
                   17.5,  1, 6.5, 20,
                   197.5, 1, 1.5,  0,
                   0,     0,   0,  0),
                 4, 4, byrow = TRUE, dimnames = dimnames(Commit.orig))
#####-------------------------------
# Objective Function, cost per MW of each generator times their total MW output
# minimize the total cost; not sure which way to list it, or if this way even works
# (all three expressions below compute the same total)
sum(Commit * Gdat[,"Cost"])
sum(Gdat[,"Cost"] %*% Commit)
sum(rowSums(Commit * Gdat[,"Cost"]))
#####-------------------------------
# Constraints
sum(Commit[,"MW.Demand"]) == Demand & # All generators together must sum to meet system demand requirements
sum(Commit[,"MW.R1"]) == Reserves["R1","Total"] & # Total Primary Reserves are met
sum(Commit[,"MW.R2"]) == Reserves["R2","Total"] & # Total Secondary
sum(Commit[,"MW.R3"]) == Reserves["R3","Total"] & # Total Tertiary
(rowSums(Commit) <= Gdat[,"MWMax"] | rowSums(Commit) == 0) & # Each generator must be at or below its max, or off
(rowSums(Commit) >= Gdat[,"MWMin"] | rowSums(Commit) == 0) & # Each generator must be at or above its min, or off
Commit[,"MW.R1"] <= Gdat[,"R1max"] & Commit[,"MW.R1"] >= Gdat[,"R1min"] & # Generators cannot exceed ramp-rate limitations
Commit[,"MW.R2"] <= Gdat[,"R2max"] & Commit[,"MW.R2"] >= Gdat[,"R2min"] & # - for the bi-directional
Commit[,"MW.R3"] <= Gdat[,"R3max"] & Commit[,"MW.R3"] >= Gdat[,"R3min"] # - or uni-directional reserves
Thanks to anyone willing to take a look at this.
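One way to start: the linear part of this maps naturally onto lpSolve. Below is a minimal sketch (mine, not verified against a known solution) with two simplifying assumptions: it drops the "or off" option on the minimum-output constraints, which requires binary commitment variables and hence a mixed-integer formulation, and it keeps all reserve commitments non-negative, since lp() assumes non-negative variables by default (so the bi-directional lower bounds R1min/R2min are not enforced).
library(lpSolve)
# 16 decision variables: four generator blocks, one per product
# (MW.Demand, MW.R1, MW.R2, MW.R3), reusing Gdat and n from above
obj <- rep(Gdat.1[, "Cost"], times = 4)
# equality rows: total demand plus the three reserve totals
eq_mat <- t(sapply(1:4, function(k) {
  v <- rep(0, 4 * n); v[((k - 1) * n + 1):(k * n)] <- 1; v
}))
# capacity rows: each generator's total commitment vs MWMax / MWMin
cap_mat <- do.call(cbind, rep(list(diag(n)), 4))
# ramp rows: upper bounds on the 12 reserve variables
ramp_mat <- cbind(matrix(0, 3 * n, n), diag(3 * n))
const_mat <- rbind(eq_mat, cap_mat, cap_mat, ramp_mat)
const_dir <- c(rep("=", 4), rep("<=", n), rep(">=", n), rep("<=", 3 * n))
const_rhs <- c(Demand, Reserves[, "Total"],
               Gdat[, "MWMax"], Gdat[, "MWMin"],
               Gdat[, "R1max"], Gdat[, "R2max"], Gdat[, "R3max"])
sol <- lp("min", obj, const_mat, const_dir, const_rhs)
matrix(sol$solution, n, 4,
       dimnames = list(rownames(Commit.orig),
                       c("MW.Demand", "MW.R1", "MW.R2", "MW.R3")))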
I'm trying to implement a check for decreasing values of average temperatures to detect when the temperature starts falling. (The chart of temperatures is omitted here.)
Here is the formula I'm trying to implement, the CUSUM recursion S_t = max(0, S_{t-1} + (mu - x_t - C)):
Here is my code to implement that formula:
temps <- read.delim("temps.txt")
date_avgs <- rowMeans(temps[2:length(temps)], dims=1, na.rm=T)
mu <- 87
threshold <- 86
constant <- 3
date_avgs
S <- 0 * date_avgs
for (i in 2:length(date_avgs)) {
value <- S[i-1] + (mu - date_avgs[i] - constant)
cat("\nvalue", value, "si", date_avgs[i], i)
S[i] <- max(0, value)
if(S[i] >= threshold){
#Once I hit this for the first time, that indicates at this index the temp is decreasing
cat("\nDecreased past my threshold!!!", S[i] ,i)
}
}
But I'm not able to detect the change as I expect. My formula doesn't get over the threshold until index 108, when it should get there around index 60.
(The plot of my S, or CUSUM, values is omitted here.)
Any ideas what I'm doing wrong in my formula?
I think the problem is that mu <- mean(date_avgs) is basically the mean of all the observations, but mu should be the "mean of X if no change". Thus mu should be about 87, whereas according to your code and the plotted data it seems to be 80 or less.
# simulated data
set.seed(4422)
date_avgs <- c(runif(60, 84, 92), 88-(1:50)-rnorm(50,0,4))
plot(date_avgs)
# setting constants
mu <- 87
threshold <- 86
constant <- 3
# after running the for loop above
Index <- match(S[S >= threshold][1], S)
Index
[1] 75
# looking at the data near that index
date_avgs[74]
[1] 73.41981
# Considering a lower threshold
# (e.g. the maximum allowable deviation before declaring a trend, 2 * C)
mu <- 87
threshold <- 6 # arbitrary
constant <- 3
# after re-running the for loop
Index <- match(S[S >= threshold][1], S)
Index
[1] 66
So I think the code is fine; maybe the interpretation is not.
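For convenience, here is the same CUSUM logic wrapped in a small helper (cusum_detect is my own made-up name), so different thresholds are easy to try against the simulated date_avgs above:
cusum_detect <- function(x, mu, C, threshold) {
  S <- numeric(length(x))
  for (i in 2:length(x)) {
    S[i] <- max(0, S[i - 1] + (mu - x[i] - C)) # accumulate downward drift
  }
  which(S >= threshold)[1] # first index past the threshold; NA if never reached
}
cusum_detect(date_avgs, mu = 87, C = 3, threshold = 6)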
I am new to R and am trying to program a pairs trading strategy in R.
I have already written the code for downloading the data, and I have created additional columns and prepared the data. Now I need to calculate the trading signals.
My signal rules are as follows.
- If the Z-score is greater than 2.25, sell the pair; buy back when the Z-score is less than 0.25.
- If the Z-score is less than -2.25, buy the pair; sell (exit) when the Z-score is above -0.25.
- close any open position if there is a change in signal.
When we sell a pair, we sell the first stock and buy the second stock. In this case, we sell ACC and Buy Ambujacem.
When we buy a pair, we buy the first stock and sell the second stock. In this case, we buy ACC and Sell Ambujacem.
Could anyone help me with the coding for the trading signals?
Enclosing the code.
Regards,
Subash
# Trading Code
library(quantmod)
getSymbols("ACC.NS", from=as.Date('2007-01-01'), to=as.Date('2015-07-24'))
getSymbols("AMBUJACEM.NS", from=as.Date('2007-01-01'), to=as.Date('2015-07-24'))
acc=ACC.NS[,6]
amb=AMBUJACEM.NS[,6]
t.zoo <- merge(acc, amb, all=TRUE)
t.zoo=as.data.frame(t.zoo)
typeof(t.zoo)
t.zoo=na.omit(t.zoo)
#adding columns
t.zoo$spread <- 0
t.zoo$adfTest <- 0
t.zoo$mean <- 0
t.zoo$stdev <- 0
t.zoo$zScore <- 0
t.zoo$signal <- 0
t.zoo$BuyPrice <- 0
t.zoo$SellPrice <- 0
t.zoo$LongReturn <- 0
t.zoo$ShortReturn <- 0
t.zoo$Slippage <- 0
t.zoo$TotalReturn <- 0
#preparing the data
#Calculating the pair ratio
t.zoo$pairRatio <- t.zoo$ACC.NS.Adjusted/t.zoo$AMBUJACEM.NS.Adjusted
#Calculate the log prices of the two time series
t.zoo$LogA <- log10(t.zoo$ACC.NS.Adjusted)
t.zoo$LogB <- log10(t.zoo$AMBUJACEM.NS.Adjusted)
#Calculating the spread
t.zoo$spread <- t.zoo$ACC.NS.Adjusted/t.zoo$AMBUJACEM.NS.Adjusted
#Calculating the mean
# Computes the mean using the SMA function
# choose the number of days for calculating the mean
SMAdays = 20
t.zoo$mean <- SMA(t.zoo$spread,SMAdays)
#Calculating the Std Deviation
t.zoo$stdev <- rollapply(t.zoo$spread,20,sd, fill=NA, align='right')
#Calculating the Z Score
t.zoo$zScore <- (t.zoo$pairRatio - t.zoo$mean)/t.zoo$spread
View(t.zoo)
#Calculation of trading signals and trading prices
#Trigger sell or buy signal if Z Score moves above 2.25 or below -2.25.
# Close position if Z Score reaches 0.2 or -0.2.
# close any open position if there is a change in signal.
I think the main issue was to come up with trading signals for a strategy that depends not only on the current level of the indicator but also on the direction from which the indicator is crossed.
There were a number of problems with the code posted in the comments, including use of a single = for comparisons. So I've reworked it afresh.
Here's my attempt at solving this. It seems to work. I've added some plotting code to eyeball the results; I suggest you check the result over different periods.
This code comes after the one in the original question. The only difference is that I have kept t.zoo as an xts/zoo object and not converted it to a data.frame. Also, I've multiplied the zScores by 100.
It generates trigger dates and also a column depicting the state of the strategy. Calculating returns would be easy from there; a rough sketch follows the code below.
colnames(t.zoo)
#t.zoo must be an xts object
#working on a separate xts object
sigs<- t.zoo[, c("ACC.NS.Adjusted", "AMBUJACEM.NS.Adjusted" , "zScore")]
# creating my own triggers as there are not enough good values
# buyTrig<- mean(t.zoo$zScore ,na.rm = T) - 1*sd(t.zoo$zScore ,na.rm = T)
# sellTrig<- (-1) * buyTrig
# sqOffTrig<- mean(t.zoo$zScore ,na.rm = T) - 0.5*sd(t.zoo$zScore ,na.rm = T)
# Another approach: scaling t.zoo's zScore to fit your criterion
sigs$zScore<- sigs$zScore*100
buyTrig<- (-2.25)
sellTrig<- (-1) * buyTrig
sqOffTrig<- 0.25
cat ( buyTrig, sellTrig , sqOffTrig)
hist(sigs$zScore, breaks = 40)
abline(v=c(buyTrig,sellTrig), col="red")
abline(v=c(-sqOffTrig, sqOffTrig), col="green")
sum(sigs$zScore >= -sqOffTrig & sigs$zScore<= sqOffTrig , na.rm = T) # 139
sigs$action<- 0
sigs$mode <- NA
sigs$zLag<- lag.xts(sigs$zScore,1)
sigs[19:22,]
#these are not the real trigger dates, but they will serve our purpose
# along with na.locf
buyTrigDays<- time(sigs[sigs$zScore<= buyTrig & sigs$zLag > buyTrig, ])
sellTrigDays<- time(sigs[sigs$zScore>= sellTrig & sigs$zLag < sellTrig, ])
#square offs
buySqOffDays<- time( sigs[sigs$zScore>= (-1*sqOffTrig) & sigs$zLag < (-1*sqOffTrig), ] )
buySqOffDays
sellSqOffDays<- time( sigs[sigs$zScore<= (sqOffTrig) & sigs$zLag > (sqOffTrig), ] )
sellSqOffDays
sigs$mode[buyTrigDays]=1 ; sigs$mode[sellTrigDays]= -1;
sigs$mode[buySqOffDays]=0 ; sigs$mode[sellSqOffDays]= 0;
sigs$mode
# use local fill to repeat these triggered position into future
# till you meet another non NA value
sigs$mode<- na.locf(sigs$mode, fromLast = F)
plot((sigs$zScore["2015"] ))
points(sigs$zScore[sigs$mode==1], col="red", on=1, pch = 19)
points(sigs$zScore[sigs$mode==-1], col="green", on=1 , pch = 19)
points(sigs$zScore[sigs$mode==0], col="blue", on=1)
sum(is.na(sigs$mode))
#now to get the real dates when square off is triggered
trigdays<- time( sigs[diff(sigs$mode,1) != 0, ] ) #when the value changes
squareOffTrigger_real<- time(sigs[sigs$mode==0][trigdays])
buyTrigger_real<- time(sigs[sigs$mode==1] [trigdays])
sellTrigger_real<- time(sigs[sigs$mode==-1][trigdays])
#check
length(sellTrigger_real) + length(buyTrigger_real) == length(squareOffTrigger_real)
plot(sigs$zScore["2015"])
points(sigs$zScore[buyTrigger_real] , col="blue", pch = 19, on=1)
points(sigs$zScore[sellTrigger_real] , col="red", pch = 19, on=1)
points(sigs$zScore[squareOffTrigger_real] , col="green", pch = 19, on=1)
abline(h=c(-sqOffTrig, sqOffTrig) , col= "green" )
# further calculations can be easily made using either the mode
# column or the trigger dates computed at the end
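On that last point, here is a minimal sketch of a return calculation from the mode column (my own addition; it assumes daily log returns on the adjusted closes, uses yesterday's mode to avoid look-ahead, and ignores transaction costs):
# mode 1 = long ACC / short AMBUJACEM, -1 = the reverse, 0 = flat
rA <- diff(log(sigs$ACC.NS.Adjusted))
rB <- diff(log(sigs$AMBUJACEM.NS.Adjusted))
stratRet <- lag.xts(sigs$mode, 1) * (rA - rB)
plot(cumsum(na.omit(stratRet))) # rough cumulative log return, pre-costs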
The high-level question is in the subject title: what can you do to debug linear optimisation when using R's lp (from lpSolve)?
The detailed issue is that I have a working program adapted from http://pena.lt/y/2014/07/24/mathematically-optimising-fantasy-football-teams/
Based on player data it chooses an optimal 15-man squad - handy for the start of the year or when you can change all players.
I have changed it to:
1) Read player data from an Excel file (which I can supply - just tell me how)
2) Add 2 constraints to mark players I definitely want to include in the team and those I definitely don't.
Player data has the following columns:
web_name
team_name
type_name
now_cost
total_points
InTeam
In
Out
Good start. So I went about modelling the normal weeks, when you can only transfer 1 player. I think I have the right constraint, but now lp chooses about 200 players for me - not 15. Something is very wrong, but I can't see how it gets there.
I have tried going back from my new code to strip out the new feature, and it still works.
I have tried removing the In/Out constraints and keeping the new "1 change" constraint. Same result.
I have upgraded the packages and moved to the latest R.
Any pointers?
Code is
#Straight lift from Web - http://pena.lt/y/2014/07/24/mathematically-optimising-fantasy-football-teams/
# plus extra constraints to exclude and include specific players via Excel In/Out columns
# This variant looks to limit changes (typically 1 or 2) for a normal week
library(gdata)
library(lpSolve)
library(stringr)
library(RCurl)
library(jsonlite)
library(plyr)
excelfile<-"C:/Users/mike/Documents/FF/Start2015R.xlsx"
df=read.xls(excelfile)
# Constants
num_teams = 20
num_constraints = 8
# InTeam,In,Out,Cost + 4 positions
#Create the constraints
num_gk = 2
num_def = 5
num_mid = 5
num_fwd = 3
team_size = num_gk + num_def + num_mid + num_fwd
#max_cost = 1000
max_cost = 998
#max_cost = 2000
max_changes = 2
min_same = team_size - max_changes
# Create vectors to constrain by position
df$Goalkeeper = ifelse(df$type_name == "Goalkeeper", 1, 0)
df$Defender = ifelse(df$type_name == "Defender", 1, 0)
df$Midfielder = ifelse(df$type_name == "Midfielder", 1, 0)
df$Forward = ifelse(df$type_name == "Forward", 1, 0)
# Create vector to constrain by max number of players allowed per team
team_constraint = unlist(lapply(unique(df$team_name), function(x, df){
ifelse(df$team_name==x, 1, 0)
}, df=df))
# next we need the constraint directions. First is for MinSame
const_dir <- c(">=","=","=","=", "=", "=", "=", rep("<=", 21))
# The vector to optimize against
objective = df$total_points
# Put the complete matrix together
# nrow is number of constraints
const_mat = matrix(c(df$Inteam,df$In,df$Out,df$Goalkeeper, df$Defender, df$Midfielder, df$Forward,
df$now_cost, team_constraint),
nrow=( num_constraints + length(unique(df$team_name))),
byrow=TRUE)
const_rhs = c(min_same ,sum(df$In),0,num_gk, num_def, num_mid, num_fwd, max_cost, rep(3, num_teams))
# And solve the linear system
x = lp ("max", objective, const_mat, const_dir, const_rhs, all.bin=TRUE, all.int=TRUE)
print(arrange(df[which(x$solution==1),], desc(Goalkeeper), desc(Defender), desc(Midfielder), desc(Forward), desc(total_points)))
print (df[which(x$solution==1),"web_name",drop=FALSE], row.names = FALSE)
# what changed
df[which(x$solution != df$InTeam),"web_name",drop=FALSE]
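As one generic debugging pointer (a sketch of pre-flight checks, not a diagnosis of this specific model): verify that the pieces handed to lp() agree in size, because a misspelled df$ column silently evaluates to NULL, shortening the data fed to matrix() and causing recycling:
# all of these must hold before the lp() result can mean anything
stopifnot(ncol(const_mat) == length(objective),
          nrow(const_mat) == length(const_dir),
          nrow(const_mat) == length(const_rhs))
# a wrong column name (e.g. df$Inteam vs df$InTeam) returns NULL silently;
# every entry here should equal nrow(df)
lengths(list(df$InTeam, df$In, df$Out, df$now_cost))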
I was wondering how to implement a moving maximum and minimum for a price over the last 5 minutes in O(n) time in R. My data consists of two columns: one with the time of day in seconds and the other with the price. Right now, I take the current time, subtract 5 minutes, subset to the last 5 minutes, and then search for the min and max, repeating this at each index, so the operation is O(n^2). Is there any way to do this in O(n)?
Sample data:
time
[34200.19, 34200.23, 34201.45, ..., 35800, 35800.2, 35800.5]
price
[100, 103, 102, ..., 95, 97, 99]
The following compares a direct approach with a slightly more efficient variant, but it looks to scale as about n^1.6 on the values I've tried (10,000 - 100,000). This partly depends on whether increasing n is assumed to mean more points in the same time period, or points extending over a longer period.
#Create some data
n <- 10000
d <- data.frame(t=as.POSIXct(24*3600*runif(n), origin = "2014-01-01"),x=runif(n))
d <- d[order(d$t),]
d$inmax2 <-d$inmax <- rep(FALSE,n)
d$inmax2[1] <-d$inmax[1] <- TRUE
if (max(diff(d$t)) > 300) warning("There are gaps of more than 300 secs")
#Method 1, assume that you've done something like this
t1 <- system.time({
for (i in 2:n) d$inmax[i] <- !any((difftime(d$t[i], d$t[1:(i-1)] ,units="secs") < 300) & (d$x[i] < d$x[1:(i-1)] ))
})
#Method 2
t2 <- system.time({
cand <- 1
next_cand <- 2
while (next_cand <= n)
{
cand <- cand[difftime(d$t[next_cand],d$t[cand],units="secs")<300]
cand <- c(cand[d$x[cand] > d$x[next_cand]],next_cand)
if(length(cand)==1) d$inmax2[cand] <- TRUE
next_cand <- next_cand + 1
}
})
rbind(method1=t1,method2=t2)
# user.self sys.self elapsed user.child sys.child
# method1 14.98 0.03 15.04 NA NA
# method2 2.59 0.05 2.63 NA NA
all(d[[3]]==d[[4]])
# TRUE
The approach is to run through the data, keeping all possible candidates from the past 5 minutes that are not less than the present value. If there are no such candidates, the current point must be the maximum. I assume you can generalise this to the minimum.
It possibly doesn't work if you want to know the maximum over the last 5 minutes between datapoints rather than at datapoints - not sure if you require that.
Sort the dataframe by time first. Then maintain a max heap of the prices, removing expired entries after every shift. Since rebalancing a heap is O(log n), this will be O(n log n) overall. For implementing a max heap, consult any algorithms textbook (although I may edit this post later with one).
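A monotonic deque gets the comparisons down to amortised O(n), since each index enters and leaves the deque at most once. Here is a minimal sketch in base R (moving_max is a made-up helper; note that shrinking an R vector is not a constant-time deque operation, so this illustrates the algorithm rather than guaranteeing O(n) in R itself):
moving_max <- function(time, price, window = 300) {
  # assumes time is sorted ascending, in seconds
  n <- length(price)
  out <- numeric(n)
  dq <- integer(0) # candidate indices; their prices are kept decreasing
  for (i in seq_len(n)) {
    while (length(dq) && time[i] - time[dq[1]] > window)
      dq <- dq[-1] # drop candidates older than the window
    while (length(dq) && price[dq[length(dq)]] <= price[i])
      dq <- dq[-length(dq)] # drop candidates dominated by the new price
    dq <- c(dq, i)
    out[i] <- price[dq[1]] # front of the deque is the window maximum
  }
  out
}
Swapping the two comparison directions gives the moving minimum.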
My problem:
If the sex ratio at birth (male to female) is 1.1, but people adopt the following
strategy: have children until you have one son, and then stop, unless you have 12
daughters (in which case you stop, too). What would be the average sex ratio in the
population? (Calculate by simulation. Suppose you randomly select 10,000 families.)
My code
pm=0.5238095 # Probability of Male
pw=0.4761905 # Female
w=0 # initial number of Female
n=1 # loop
p=0 # count of number
for(i in 1:n){
s=rbinom(1,1,0.4761905)
if(s==1){
w=w+1
}
p=p+1
while(w<=12){ ####1. How to count the number of female? ###
while(s==1){
s=rbinom(1,1,0.4761905)
if(s==1){
w=w+1
}
p=p+1
}
}
f[i]=p
}
w/p
My question
How do I count the number of females? I'm using a loop to count the number of women (if(s==1){ w=w+1 }). It seems inefficient; I think maybe counting TRUE/FALSE values would be more efficient.
How can I write the code more concisely?
The answer, of course, is that this strategy won't affect the sex ratio at all! At least as you've set this up, no matter what a couple's previous birth history is, the probability of a male arising from each birth is always the same.
Here's one way to confirm that with some calculations. (The code's offered without further explanation, at least for now.):
pm <- 0.5238095
## Possible outcomes: k girls then a boy (k = 0, ..., 11), or 12 girls and no boy
m <- cbind(boys=c(rep(1, 12), 0), girls=0:12)
## Probability of each outcome; dgeom(k, pm) is P(k girls before the first boy)
p <- c(dgeom(0:11, pm), 1-pgeom(11, pm))
## Calculate expected number of boys and girls for an immortal couple pursuing
## this "strategy"
(res <- p %*% m)
# boys girls
# [1,] 0.9998641 0.9089674
res[1] / sum(res)
# [1] 0.5238095 ## Look familiar
Yes, this is very inefficient. Perhaps I can address just a couple of things that almost make sense and it will give you your answer. In your code...
for(i in 1:n){
s=rbinom(1,1,0.4761905)
if(s==1){
w=w+1
}
can be rewritten as...
s = rbinom(n,1,0.4761905)
w = sum(s)
That's the same result. Keep in mind that rbinom is producing 0s and 1s; you can just sum them to know how many 1s there are. Given that you define n, the number of 0s (males, in this setup) is...
n - w
But if you didn't, it would be easy to find too...
length(s) - sum(s)
It is still probably inefficient but at least it's correct for what you're trying to do:
# set.seed(1)
pw <- 0.4761905 # Initial sex ratio
w <- 0 # number of daughters
n <- 10000 # number of families
p <- 0 # number of kids
f <- data.frame(Daughters=vector(length=n), Kids=vector(length=n))
for(i in 1:n){
while(w < 12 & w==p){ #As long as you don't have 12 daughters or 1 son...
s <- rbinom(1,1,pw)
if(s==1){w <- w+1}
p <- p+1
}
f[i,] <- c(w,p) # Number of daughters and total kids in each family
w <- p <- 0 # Reset number of kids and daughters for the next family
}
colSums(f)[1]/colSums(f)[2] #Final sex ratio
Daughters
0.4736842 # So as @JoshO'Brien pointed out, very close to the original sex ratio.
And you can verify vector f to see that there is never more than 1 son (number of kids minus number of daughters):
range(f[,2]-f[,1])
[1] 1 1 # Range of the number of boys per family
range(f[,1])
[1] 0 11 # Range of the number of daughters per family
nrow(f[f[,1]==0,])
[1] 5275 # Number of families having 1 son and no daughters (to be compared with 1-pw)
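As a footnote, the whole simulation can also be vectorised with rgeom(), which draws the number of failures before the first success directly (here, daughters before the first son); a sketch under the same assumptions:
set.seed(1)
pm <- 0.5238095 # probability each birth is a son
g <- pmin(rgeom(10000, pm), 12) # daughters per family, capped at 12
b <- as.integer(g < 12) # a son arrives only if the cap wasn't hit
sum(g) / sum(g + b) # population share of daughters, ~ 1 - pm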