lpSolveAPI in RStudio

I am using lpSolveAPI in RStudio. When I type the name of a model with few decision variables, I can read a printout of the current constraints in the model. For example:
> lprec
Model name: 
            COLONE   COLTWO   COLTHREE   COLFOUR
Minimize         1        3       6.24       0.1
THISROW          0    78.26          0       2.9  >=  92.3
THATROW       0.24        0      11.31         0  <=  14.8
LASTROW      12.68        0       0.08       0.9  >=     4
Type          Real     Real       Real      Real
Upper          Inf      Inf        Inf     48.98
Lower         28.6        0          0        18
But when I make a model that has more than 9 decision variables, it no longer gives the full summary and I instead see:
> lprec
Model name:
a linear program with 13 decision variables and 258 constraints
Does anyone know how I can see the same detailed summary of the model when there are large numbers of decision variables?
Bonus Question: Is RStudio the best console for working with R?
Here is an example:
>lprec <- make.lp(0,5)
This makes a new model called lprec, with 0 constraints and 5 variables. Even if you just type its name now, you get:
>lprec
Model name: 
            C1    C2    C3    C4    C5
Minimize     0     0     0     0     0
Kind       Std   Std   Std   Std   Std
Type      Real  Real  Real  Real  Real
Upper      Inf   Inf   Inf   Inf   Inf
Lower        0     0     0     0     0
The C columns correspond to the 5 variables. Right now there are no constraints and the objective function is 0.
You can add a constraint with
>add.constraint(lprec, c(1,3,4,2,-8), "<=", 0)
This is the constraint C1 + 3*C2 + 4*C3 + 2*C4 - 8*C5 <= 0. Now the printout is:
Model name: 
            C1    C2    C3    C4    C5
Minimize     0     0     0     0     0
R1           1     3     4     2    -8  <=  0
Kind       Std   Std   Std   Std   Std
Type      Real  Real  Real  Real  Real
Upper      Inf   Inf   Inf   Inf   Inf
Lower        0     0     0     0     0
Anyway, the point is that no matter how many constraints there are, if there are more than 9 variables I don't get the full printout.
>lprec <- make.lp(0,15)
>lprec
Model name:
a linear program with 15 decision variables and 0 constraints

Write it out to a file for examination
When I work with LPs using lpSolveAPI, I prefer to write them out to a file. The lp format works fine for my needs. I then examine the LP model using any text editor. If you click on the output file in the "Files" panel in RStudio, it will open it too, and you can inspect it.
write.lp(lprec, "lpfilename.lp", "lp") #write it to a file in LP format
You can also write it out as MPS format if you so choose.
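For example (a small sketch; the file name is arbitrary, and read.lp() is the matching reader in lpSolveAPI):
write.lp(lprec, "lpfilename.mps", "mps")    # write the model in MPS format instead
lprec2 <- read.lp("lpfilename.mps", "mps")  # read it back into a new model object later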
Here's the help file on write.lp().
Hope that helps.

Since it is an S3 object of class lpExtPtr,
the function called to display it is print.lpExtPtr.
If you check its code, you will see that it displays the object
differently depending on its size --
details for very big objects would not be very useful.
Unfortunately, the threshold cannot be changed.
class(r)
# [1] "lpExtPtr"
print.lpExtPtr
# function (x, ...)
# {
# (...)
# if (n > 8) {
# cat(paste("Model name: ", name.lp(x), "\n", " a linear program with ",
# n, " decision variables and ", m, " constraints\n",
# sep = ""))
# return(invisible(x))
# }
# (...)
You can access the contents of the object with the various get.* functions,
as the print method does.
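For example (a minimal sketch; these lpSolveAPI accessors retrieve roughly what the full printout would have shown):
get.objfn(lprec)        # objective coefficients
get.constr.type(lprec)  # direction of each constraint (>=, <=, =)
get.rhs(lprec)          # right-hand sides of the constraints
get.bounds(lprec)       # lower and upper bounds of the decision variables
get.mat(lprec, 1, 2)    # a single constraint coefficient: row 1, column 2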
Alternatively, you can just change the print method.
# A function to modify functions
patch <- function( f, before, after ) {
f_text <- capture.output(dput(f))
g_text <- gsub( before, after, f_text )
g <- eval( parse( text = g_text ) )
environment(g) <- environment(f)
g
}
# Sample data
library(lpSolveAPI)
r <- make.lp(0,5)
r # Shows the details
r <- make.lp(0,20)
r # Does not show the details
# Set the threshold to 800 variables instead of 8
print.lpExtPtr <- patch( print.lpExtPtr, "8", "800" )
r # Shows the details
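An aside: gsub() here replaces every "8" in the deparsed function body. If that worries you, patching a more specific pattern (assuming the test really reads n > 8 in your version of the package) should work as an alternative:
print.lpExtPtr <- patch( print.lpExtPtr, "n > 8", "n > 800" )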

Related

Custom function to compute contrasts in emmeans

I want to create a custom contrast function in emmeans which could remove a given list of levels from the input vector and apply the built-in contrast method ("trt.vs.ctrl") on the remaining levels. An example dataset is available here. I am using the following R code for computing ANOVA and post hoc comparisons:
library(car); library(emmeans)   # Anova() is from car; emmeans(), emmip() and contrast() are from emmeans
options(contrasts=c("contr.sum", "contr.poly"))
my_lm <- lm(D1 ~ C*R, data=df)
Anova(my_lm, type = "III")
#show Interaction effects using emmeans
emmip(my_lm, C ~ R )
emm = emmeans(my_lm, ~ C * R)
emm
contrast(emmeans(my_lm, ~ C * R), "consec", by = "C")
#compare 1st with next 3 groups (how to remove other three levels?)
contrast(emmeans(my_lm, ~ C * R), "trt.vs.ctrl", by = "R")
The built-in contrast option ("trt.vs.ctrl") compares the first level with everything that follows it (there are 7 factor levels in C, and I want to remove the last 3 of them and compute the contrasts for the remaining 4). An example is provided in the official documentation for writing a custom contrast function.
skip_comp.emmc <- function(levels, skip = 1, reverse = FALSE) {
    if ((k <- length(levels)) < skip + 1)
        stop("Need at least ", skip + 1, " levels")
    coef <- data.frame()
    coef <- as.data.frame(lapply(seq_len(k - skip - 1), function(i) {
        sgn <- ifelse(reverse, -1, 1)
        sgn * c(rep(0, i - 1), 1, rep(0, skip), -1, rep(0, k - i - skip - 1))
    }))
    names(coef) <- sapply(coef, function(x)
        paste(which(x == 1), "-", which(x == -1)))
    attr(coef, "adjust") = "fdr" # default adjustment method
    coef
}
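(For reference, with five levels this example function compares each level with the one skip + 1 positions later:)
skip_comp.emmc(1:5)
#   1 - 3 2 - 4 3 - 5
# 1     1     0     0
# 2     0     1     0
# 3    -1     0     1
# 4     0    -1     0
# 5     0     0    -1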
However, due to my limited understanding, I am not sure where to apply the modifications needed to customise the example. Any ideas?
Is this something you are going to want to do lots of times in the future? My guess is not: you only want to do this once, or a few times at most, in which case it is way too much trouble to write a custom contrast function. Just get the contrast coefficients you need, and use those as the second argument in contrast().
Now, consider these results:
> con <- emmeans:::trt.vs.ctrl.emmc(1:7)
> con
  2 - 1 3 - 1 4 - 1 5 - 1 6 - 1 7 - 1
1    -1    -1    -1    -1    -1    -1
2     1     0     0     0     0     0
3     0     1     0     0     0     0
4     0     0     1     0     0     0
5     0     0     0     1     0     0
6     0     0     0     0     1     0
7     0     0     0     0     0     1
From the description, I think you just want the first 3 sets of contrast coefficients. So use those columns:
contrast(emm, con[, 1:3], by = "R")
Update
StackOverflow can occasionally inspire developers to add software features. In this case, I decided it could be useful to add an exclude argument to most built-in .emmc functions in emmeans (all except poly.emmc()). This was fairly straightforward to do, and those features are now incorporated in the latest push to github -- https://github.com/rvlenth/emmeans. These features will be included in the next CRAN update as well.
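For example, with a recent enough emmeans, dropping the last three levels of C (assumed here to be levels 5-7) reduces to:
contrast(emmeans(my_lm, ~ C * R), "trt.vs.ctrl", by = "R", exclude = 5:7)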

modulus values (roots) in VECM model using R?

Thanks for reading my question. I am trying to fit a VECM for economic research; I am using the vars and urca packages in R via RStudio. Since my time series are not stationary and both need one difference (both are I(1)), I need to use the VECM approach, but I cannot get all the tests I need.
For example:
First I load the libraries
library(vars)
library(urca)
and create my model
data("Canada")
df <- Canada
VARselect(df)
vecm <- urca::ca.jo(df,K = 3)
model <- vec2var(vecm)
The problem is that I cannot get the modulus values to prove stability. I know I can use the roots() function to get these values from a "varest" object, for example:
roots(VAR(df,3))
My question is:
How can I get the modulus values from my vec2var object? roots() doesn't handle this kind of object. I know Gretl can do it (using the unit circle to prove stability), so is it possible to get these values from a VECM? How can I do it in R?
Starting with:
data("Canada")
dim(Canada) # 84 observations x 4 variables
VARselect(Canada) # since in small samples, AIC>BIC; VAR(3) is chosen.
Now, the range of the Canada dataset, 1980.1 - 2000.4 (20 years), is long enough for modeling. This 20-year period certainly includes plenty of crises and interventions. Hence, structural breaks in the data MUST be searched for. This is necessary because in structurally broken series the existence of SBs changes the t values of nonstationarity tests (thereby affecting the decision on whether a series is stationary or not).
Since the Narayan-Popp 2010 nonstationarity test under multiple structural breaks is statistically much more powerful than its predecessors (Lee-Strazicich 2003, Zivot-Andrews 1992), since Joyeux 2007 (in Rao 2007) has shown the illogicality of those earlier tests, and since NP 2013 has demonstrated the superior statistical power of NP 2010, one MUST use NP 2010. Since the Gauss code for NP 2010 seemed ugly to me, I converted it to R code, and with the help of ggplot2 the results are presented more nicely.
[Handling structural breaks is a MUST for the cointegration check as well, since the Osterwald-Lenum 1992 CVs ignore SBs whereas the Johansen-Mosconi-Nielsen 2000 CVs account for them.]
Canada <- as.data.frame(Canada)
head(Canada)
e prod rw U
1 929.6105 405.3665 386.1361 7.53
2 929.8040 404.6398 388.1358 7.70
...................................
# Assign lexicographic row names for the observation dates
row.names(Canada) <- paste(sort(rep(seq(1980, 2000, 1), 4) ), rep(seq(1, 4, 1), 20), sep = ".")
# Insert a lexicographic "date" column into the data frame. This is necessary for creating intervention dummies.
DCanada <- data.frame(date=row.names(Canada),Canada) # dataset with obs dates in a column
head(DCanada)
date e prod rw U
1980.1 1980.1 929.6105 405.3665 386.1361 7.53
1980.2 1980.2 929.8040 404.6398 388.1358 7.70
Perform the Narayan-Popp 2010 nonstationarity test on the series:
[H0: "(with 2 structural breaks) the series is nonstationary";
H1: "(with 2 structural breaks) the series is stationary";
"test stat > critical value" => "retain H0"; "test stat < critical value" => "accept H1"]
library(causfinder)
narayanpopp(DCanada[,2]) # for e
narayanpopp(DCanada[,3]) # for prod
narayanpopp(DCanada[,4]) # for rw
narayanpopp(DCanada[,5]) # for U
Narayan-Popp 2010 nonstationarity test results (with obs #s):
variable   t stat   lag   SB1         SB2         Integration order
e          -4.164   2     37:946.86   43:948.03   I(1)
prod       -3.325   1     24:406.77   44:405.43   I(1)
rw         -5.087   0     36:436.15   44:446.96   I(0) <trend-stationary>
U          -5.737   1     43:8.169    53:11.070   I(0) <stationary pattern> (M2 computationally singular; used M1 model)
(critical values (M2), (1%, 5%, 10%): -5.576, -4.937, -4.596)
(critical values (M1), (1%, 5%, 10%): -4.958, -4.316, -3.980)
Since in a VAR structure all variables are treated equally, continue this equal treatment when determining the structural breaks system-wise:
mean(c(37,24,36,43)) # 35; SB1 of system=1988.3
mean(c(43,44,44,53)) # 46; SB2 of system=1990.2
The following is to overcome the "In Ops.factor(left, right): >= not meaningful for factors" error. For some datasets, we need to do the following:
library(readxl); library(openxlsx)   # write.xlsx() is from openxlsx (or the xlsx package), not readxl
write.xlsx(Canada, file="data.xlsx", row.names=FALSE) # move this file to the folder below and add a "date" column with values 1980.1, ..., 2000.4
mydata <- read_excel("D://eKitap//RAO 2007 Cointegration for the applied economist 2E//JoyeuxCalisma//Canada//data.xlsx")
# arrange your path accordingly in the above line.
mydata <- as.data.frame(mydata)
library(lubridate); library(zoo)
row.names(mydata) <- as.yearqtr(seq(ymd('1980-01-01'), by = '1 quarter', length.out=(84)))
Dmydata <- mydata # Hold it in a variable
Define intervention dummy matrix with 2 SBs (35:1988.3 and 46: 1990.2) as follows:
library(data.table)
DataTable <- data.table(Dmydata, keep.rownames=FALSE)
Dt <- cbind("bir"=1, # intervention dummies matrix
"D2t" = as.numeric(ifelse( DataTable[,c("date"), with=FALSE] >= "1988.3" & DataTable[,c("date"), with=FALSE] <= "1990.1", 1 , 0)),
"D3t" = as.numeric(ifelse( DataTable[,c("date"), with=FALSE] >= "1990.2" & DataTable[,c("date"), with=FALSE] <= "2000.4", 1 , 0)))
On the fly indicator variables accompanying intervention dummies:
OnTheFlyIndicator <- cbind(
"I2t" = as.numeric(DataTable[, c("date"), with=FALSE] == "1988.3"),
"I3t" = as.numeric(DataTable[, c("date"), with=FALSE] == "1990.2"))
myTimeTrend <- as.matrix(cbind("TimeTrend" = as.numeric(1:nrow(Dt))))
zyDt <- Dt * as.vector(myTimeTrend) # time trend x regime dummies (TimeTrendDavranisDegisimleri)
colnames(zyDt) <- paste(colnames(myTimeTrend), colnames(Dt), sep="*")
mydata <- mydata[,-1]
Selection of VAR order:
library(vars)
# Lag order selection with the effects of intervention dummies
VARselect(mydata, lag.max=5, "both", exogen=cbind(zyDt[drop=FALSE], Dt[drop=FALSE], OnTheFlyIndicator)) # Take VAR(3)
Lag-matrix helper for the Joyeux 2007 indexing technique:
lagmatrix <- function(x, maxlag){
x <- as.matrix(x)
if(is.null(colnames(x))== TRUE){ colnames(x) <- "VarCol0" }
DondurulenDizey <- embed(c(rep(NA,maxlag),x),maxlag+1)
dimnames(DondurulenDizey)[[2]] <- c(colnames(x)[1, drop = FALSE], paste(colnames(x)[1,drop=FALSE],".",1:maxlag,"l", sep = ""))
return(DondurulenDizey)
}
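For example, applied to one of the dummy columns defined above, the helper returns the series together with its lags (a quick check; leading values are NA until they are replaced with 0 further below):
head(lagmatrix(Dt[, "D2t", drop = FALSE], maxlag = 3), 4)  # columns: D2t, D2t.1l, D2t.2l, D2t.3l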
Assign VAR lag and no. of subsamples:
VARlag <- 3
Subsamples <- 3 # subsamples = no. of str breaks +1
Dummy matrix for 2 structural breaks:
dummymatrix2SB <- matrix(NA,DataTable[,.N], 10)
dummymatrix2SB <- cbind(myTimeTrend,
lagmatrix(zyDt[,c("TimeTrend*D2t"), drop=FALSE], maxlag=VARlag)[,1+VARlag, drop=FALSE],
lagmatrix(zyDt[,c("TimeTrend*D3t"), drop=FALSE], maxlag=VARlag)[,1+VARlag, drop=FALSE],
lagmatrix(Dt[,c("D2t"), drop=FALSE], maxlag=VARlag)[,1+VARlag, drop=FALSE],
lagmatrix(Dt[,c("D3t"), drop=FALSE], maxlag=VARlag)[,1+VARlag, drop=FALSE],
lagmatrix(OnTheFlyIndicator[,c("I2t"), drop=FALSE], maxlag=VARlag-1),
lagmatrix(OnTheFlyIndicator[,c("I3t"), drop=FALSE], maxlag=VARlag-1))
dummymatrix2SB[is.na(dummymatrix2SB)] <- 0 # replace NAs with 0
dummymatrix2SB # Print dummy matrix for 2 str breaks to make sure all are OK
TimeTrend TimeTrend.D2t.3l TimeTrend.D3t.3l D2t.3l D3t.3l I2t I2t.1l I2t.2l I3t I3t.1l I3t.2l
1 0 0 0 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0 0 0 0
...........................................
34 0 0 0 0 0 0 0 0 0 0
35 0 0 0 0 1 0 0 0 0 0
36 0 0 0 0 0 1 0 0 0 0
37 0 0 0 0 0 0 1 0 0 0
38 35 0 1 0 0 0 0 0 0 0
39 36 0 1 0 0 0 0 0 0 0
40 37 0 1 0 0 0 0 0 0 0
41 38 0 1 0 0 0 0 0 0 0
42 39 0 1 0 0 0 0 1 0 0
43 40 0 1 0 0 0 0 0 1 0
44 41 0 1 0 0 0 0 0 0 1
45 0 42 0 1 0 0 0 0 0 0
46 0 43 0 1 0 0 0 0 0 0
............................................
83 0 80 0 1 0 0 0 0 0 0
84 0 81 0 1 0 0 0 0 0 0
STABILITY of VAR:
Victor, theoretically you are wrong. Stability is checked from the VAR side even in the case of restricted (cointegrated) VAR models. See Joyeux 2007 for details. Also, the estimates from both sides are the same:
"unrestricted VAR = unrestricted VECM" and
"restricted VAR = restricted VECM".
Hence, checking the stability of the unrestricted VAR is equivalent to checking the stability of the unrestricted VECM, and vice versa. They are mathematically equal; they are just different representations.
Likewise, checking the stability of the restricted VAR is equivalent to checking the stability of the restricted VECM, and vice versa. But you do not need this check in restricted VECM cases, since you are already working within the subspace of a feasible VAR. That is to say, if the original unrestricted VAR corresponding to the restricted VECM is stable, then all is OK.
If your series are cointegrated, you still check stability from the VAR side! If you wonder whether you should check the stability of the restricted VECM, the answer is NO, you should not, because in the cointegrated case you are in the subspace of the feasible solution. That said, if you insist on checking the stability of the restricted (cointegrated) VECM, you can still do that via urca::ca.jo extensions and vars::vec2var extensions:
print(roots(VAR(mydata, p=3, "both", exogen=cbind(zyDt[drop=FALSE], Dt[drop=FALSE], OnTheFlyIndicator)), modulus=TRUE))
# [1] 0.96132524 0.77923543 0.68689517 0.68689517 0.67578368 0.67578368
[7] 0.59065419 0.59065419 0.55983617 0.55983617 0.33700725 0.09363846
print(max(roots(VAR(mydata, p=3, "both", exogen=cbind(zyDt[drop=FALSE], Dt[drop=FALSE], OnTheFlyIndicator)), modulus=TRUE)))
#0.9613252
(optional) Check stability via OLS-CUSUM:
plot(stability(VAR(mydata, p=3, "both", exogen=cbind(zyDt[drop=FALSE], Dt[drop=FALSE], OnTheFlyIndicator)), type="OLS-CUSUM"))
NON-AUTOCORRELATION of VAR residuals test:
for (j in as.integer(1:5)){
print(paste("VAR's lag no:", j))
print(serial.test(VAR(mydata, p=j, "both", exogen=cbind(zyDt[drop=FALSE], Dt[drop=FALSE], OnTheFlyIndicator)), lags.bg=4, type= c("ES")))
# lags.bg: AR order of VAR residuals
}
NORMALITY of VAR residuals test:
print(normality.test(VAR(mydata, p=3, "both", exogen=cbind(zyDt[drop=FALSE], Dt[drop=FALSE], OnTheFlyIndicator)), multivariate=TRUE))
library(normtest)
for (i in as.integer(1:4)){ # there are 4 variables
print(skewness.norm.test(resid(VAR(mydata, p=3, "both", exogen=cbind(zyDt[drop=FALSE], Dt[drop=FALSE], OnTheFlyIndicator)))[,i]))
print(kurtosis.norm.test(resid(VAR(mydata, p=3, "both", exogen=cbind(zyDt[drop=FALSE], Dt[drop=FALSE], OnTheFlyIndicator)))[,i]))
print(jb.norm.test(resid(VAR(mydata, p=3, "both", exogen=cbind(zyDt[drop=FALSE], Dt[drop=FALSE], OnTheFlyIndicator)))[,i]))
}
HOMOSCEDASTICITY of VAR residuals test:
print(arch.test(VAR(mydata, p=3, "both", exogen=cbind(zyDt[drop=FALSE], Dt[drop=FALSE], OnTheFlyIndicator)), lags.multi=6, multivariate.only=TRUE))
Since the integration orders of the series differ, there is no way they are cointegrated. That said,
assume for a while that all are I(1) and perform the cointegration test with multiple structural breaks using Johansen-Mosconi-Nielsen 2000 CVs:
(extend urca::ca.jo to causfinder::ykJohEsbInc, i.e., add the functionality to process 1 SB and 2 SBs)
summary(ykJohEsbInc(mydata, type="trace", ecdet="zamanda2yk", K=3, spec="longrun", dumvar=dummymatrix2SB[,c(-1,-2,-3)]))
# summary(ykJohEsbInc(mydata, type="trace", ecdet="zamanda2yk", K=3, spec="transitory", dumvar=dummymatrix2SB[,c(-1,-2,-3)])) gives the exactly same result.
Since there are 2 SBs in the system (1988.3, 1990.2), there are q=2+1=3 subsamples.
1st SB ratio: v1= (35-1)/84= 0.4047619
2nd SB ratio:v2= (46-1)/84= 0.5357143
Hence, JMN2000 CVs for cointegration test with 2 SBs:
(The following is TR-localized. The original EN code can be found on David Giles' website.)
library(gplots)
# Code to compute the asymptotic p-values and critical values of the modified trace tests for cointegration analysis in the presence of structural breaks, derived by Johansen et al. (2000)
# Ryan Godwin & David Giles (Dept. of Economics, University of Victoria, Canada), 29.06.2011
# The user must assign the following 4 values
#======================================
degiskensayisi <- 4 # p: number of variables in the system
q <- 3 # q: number of distinct periods in the data; q=1 means one period, i.e. no structural breaks, so the values of v1 and v2 are ignored
v1 <- 0.4047619 # (35-1)/84; 1st SB moment = 34+1 = 35. Johansen et al. 2000 definition: v1 = (SB1 - 1)/T
v2 <- 0.5357143 # (46-1)/84; 2nd SB moment = 45+1 = 46
#======================================
# If p-values are wanted for one or both of the trace statistics, change one or both of the next 2 lines
izZ <- 15.09 # value of the Vz(r) statistic
izK <- 114.7 # value of the Vk(r) statistic
#=========================================
enbuyuk_p_r <- degiskensayisi # do not let p-r > 10; see Johansen et al. (2000)
# The values of "a" and "b" depend on the number of structural breaks (q-1)
# For q=1 (no structural breaks), set a = b = 0
# For q=2 (1 structural break), set a = 0 (Table 4 in Johansen et al. 2000) and b = min[V1, (1-V1)]
# For q=3 (2 structural breaks), set a = min[V1, (V2-V1), (1-V2)] and b = min[the remaining two V expressions]
a = c(0, 0, min(v1, v2-v1, 1-v2))[q]
b = c(0, min(v1, 1-v1), median(c(v1,v2-v1,1-v2)))[q]
# YanDagOrtLog: log of the mean of the asymptotic distribution
# YanDagDegLog: log of the variance of the asymptotic distribution
# Append z or k to the names to denote the V(trend) or V(intercept) tests.
# See Table 4 in Johansen et al. (2000).
# Generate the critical values first for the Vz(r) test, then for the Vk(r) test
pr<- c(1:enbuyuk_p_r)
YanDagOrtLogZ <- 3.06+0.456*pr+1.47*a+0.993*b-0.0269*pr^2-0.0363*a*pr-0.0195*b*pr-4.21*a^2-2.35*b^2+0.000840*pr^3+6.01*a^3-1.33*a^2*b+2.04*b^3-2.05/pr-0.304*a/pr+1.06*b/pr+
  9.35*a^2/pr+3.82*a*b/pr+2.12*b^2/pr-22.8*a^3/pr-7.15*a*b^2/pr-4.95*b^3/pr+0.681/pr^2-0.828*b/pr^2-5.43*a^2/pr^2+13.1*a^3/pr^2+1.5*b^3/pr^2
YanDagDegLogZ <- 3.97+0.314*pr+1.79*a+0.256*b-0.00898*pr^2-0.0688*a*pr-4.08*a^2+4.75*a^3-0.587*b^3-2.47/pr+1.62*a/pr+3.13*b/pr-4.52*a^2/pr-1.21*a*b/pr-5.87*b^2/pr+4.89*b^3/pr+
  0.874/pr^2-0.865*b/pr^2
OrtalamaZ<- exp(YanDagOrtLogZ)-(3-q)*pr
DegismeZ<- exp(YanDagDegLogZ)-2*(3-q)*pr
# Using the asymptotic mean and variance to obtain the shape and scale parameters of the Gamma distribution
# that approximates the asymptotic distribution of the test statistic, obtain the desired quantiles under the null (V0) assumption:
# quantiles: cut points dividing the range of a probability distribution, or the observations in a sample, into contiguous intervals with equal probabilities.
tetaZ <- DegismeZ/OrtalamaZ
kZ <- OrtalamaZ^2/DegismeZ
YanDagOrtLogK <- 2.80+0.501*pr+1.43*a+0.399*b-0.0309*pr^2-0.0600*a*pr-5.72*a^2-1.12*a*b-1.70*b^2+0.000974*pr^3+0.168*a^2*pr+6.34*a^3+1.89*a*b^2+1.85*b^3-2.19/pr-0.438*a/pr+
  1.79*b/pr+6.03*a^2/pr+3.08*a*b/pr-1.97*b^2/pr-8.08*a^3/pr-5.79*a*b^2/pr+0.717/pr^2-1.29*b/pr^2-1.52*a^2/pr^2+2.87*b^2/pr^2-2.03*b^3/pr^2
YanDagDegLogK<- 3.78+0.346*pr+0.859*a-0.0106*pr^2-0.0339*a*pr-2.35*a^2+3.95*a^3-0.282*b^3-2.73/pr+0.874*a/pr+2.36*b/pr-2.88*a^2/pr-4.44*b^2/pr+4.31*b^3/pr+1.02/pr^2-0.807*b/pr^2
OrtalamaK <- exp(YanDagOrtLogK)-(3-q)*pr
DegismeK <- exp(YanDagDegLogK)-2*(3-q)*pr
# Using the asymptotic mean and variance to obtain the shape and scale parameters of the Gamma distribution
# that approximates the asymptotic distribution of the test statistic, obtain the desired quantiles under the null (V0) assumption:
# quantiles: cut points dividing the range of a probability distribution, or the observations in a sample, into contiguous intervals with equal probabilities.
tetaK <- DegismeK/OrtalamaK
kK <- OrtalamaK^2/DegismeK
# (if either izZ or izK is different from 0) tabulate the critical values and p-values:
windows(6,3.8)
KararDegerleri <- cbind(sapply(c(.90,.95,.99) , function(x) sprintf("%.2f",round(c(qgamma(x, shape=kZ,scale=tetaZ)),2))),
sapply(c(.9,.95,.99) , function(x) sprintf("%.2f",round(c(qgamma(x, shape=kK,scale=tetaK)),2))))
colnames(KararDegerleri) <- rep(c(0.90,0.95,0.99),2)
# rownames(KararDegerleri) <- pr
rownames(KararDegerleri) <- c(sapply((degiskensayisi -1):1, function(i) paste(degiskensayisi - i, " ","(r<=", i, ")",sep="")), paste(degiskensayisi, " ( r=0)", sep=""))
textplot(KararDegerleri, cex=1)
text(.064,.91,"p-r",font=2)
text(.345,1,expression(paste(plain(V)[z],"(r) test")),col=2)
text(.821,1,expression(paste(plain(V)[k],"(r) test")),col=4)
title("Yanasik Karar Degerleri \n (p:duzendeki degisken sayisi; r:esbutunlesim ranki)")
if(izZ!=0){
windows(4,3.8)
pDegerleri <- matrix(sprintf("%.3f",round(1 - pgamma(izZ, shape=kZ, scale = tetaZ),3)))
# rownames(pDegerleri) <- pr
rownames(pDegerleri) <- c(sapply((degiskensayisi -1):1, function(i) paste(degiskensayisi - i, " ","(r<=", i, ")",sep="")), paste(degiskensayisi, " ( r=0)", sep=""))
textplot(pDegerleri,cex=1,show.colnames=F)
text(.69,.96,substitute(paste("Pr(",plain(V)[z],">",nn,")"),list(nn=izZ)),col=2)
text(.45,.96,"p-r",font=2)
title("Yanasik p Degerleri \n (p:duzendeki degisken sayisi; \n r:esbutunlesim ranki)")
}
if(izK!=0){
windows(3,3.8)
pDegerleri <- matrix(sprintf("%.3f",round(1 - pgamma(izK, shape=kK, scale = tetaK),3)))
#rownames(pDegerleri) <- pr
rownames(pDegerleri) <- c(sapply((degiskensayisi -1):1, function(i) paste(degiskensayisi - i, " ","(r<=", i, ")",sep="")), paste(degiskensayisi, " ( r=0)", sep=""))
textplot(pDegerleri,cex=1,show.colnames=F)
text(.78,.96,substitute(paste("Pr(",plain(V)[k],">",nn,")"),list(nn=izK)),col=4)
text(.43,.96,"p-r",font=2)
title("Yanasik p Degerleri \n (p:duzendeki degisken sayisi; \n r:esbutunlesim ranki)")
}
Hence, according to the JMN2000 CVs, there is no cointegration either. So your use of vec2var is meaningless, because vec2var is only needed in cointegrated cases. Again, assume all series are cointegrated to make you happy (to create the need for vec2var) and continue with the most difficult case (cointegration for series with multiple structural breaks); i.e., we are continuing with the "one who pee-pees ambitiously drills the wall" logic.
Extend vars::vec2var to causfinder::vec2var_ykJohEsbInc to handle the transformations in the "multiple structural breaks" case with the relevant intervention dummies. The JMN2000 application above showed that the cointegration rank r is not within the [1, 4-1] = [1, 3] range. Even so, assume for the sake of argument that the JMN2000 CVs gave r = 1.
So, to transform the restricted VECM to the restricted VAR (under multiple (=2) structural breaks), apply:
vec2var_ykJohEsbInc(ykJohEsbInc(mydata, type="trace", ecdet="zamanda2yk", K=3, spec="longrun", dumvar=dummymatrix2SB[,c(-1,-2,-3)]),r=1)
This results in:
Deterministic coefficients (detcoeffs):
e prod rw U
kesme 22.6612871 -0.215892151 32.0610121 -9.26649249 #(const)
zyonsemesi 0.2505164 -0.009900004 0.3503561 -0.10494714 #(trend)
zy*D2t_3 0.2238060 -0.008844454 0.3130007 -0.09375756
zy*D3t_3 -0.1234803 0.004879743 -0.1726916 0.05172878
$deterministic
kesme zyonsemesi zy*D2t_3 zy*D3t_3 D2t.3l D3t.3l
e 22.6612871 0.250516390 0.223806048 -0.123480327 -8.8012612 5.3052074
prod -0.2158922 -0.009900004 -0.008844454 0.004879743 -0.1157137 -0.3396206
rw 32.0610121 0.350356063 0.313000702 -0.172691620 -12.5838458 7.2201840
U -9.2664925 -0.104947142 -0.093757559 0.051728781 3.5836119 -2.2921099
I2t I2t.1l I2t.2l I3t I3t.1l I3t.2l
e -0.2584379 0.08470453 0.2102661 -0.51366831 -1.0110891 -2.08728944
prod 0.3013044 0.25103445 -0.8640467 0.08804425 -0.2362783 -0.05606892
rw -0.5838161 0.28400182 1.2073483 -0.67760848 -2.2650094 -0.70586316
U 0.1305258 0.03559119 0.1476985 0.14614290 0.6847273 1.27469940
$A
$A$A1
e.1g prod.1g rw.1g U.1g
e 1.4817704 0.1771082 -0.2274936 0.2332402
prod -0.1605790 1.1846699 0.0406294 -0.9398689
rw -0.8366449 -0.1910611 0.9774874 0.4667430
U -0.4245817 -0.1498295 0.1226085 0.7557885
$A$A2
e.2g prod.2g rw.2g U.2g
e -0.8441175 -0.04277845 0.01128282 -0.01896916
prod -0.3909984 -0.25960184 -0.20426749 0.79420691
rw 1.4181448 -0.03659278 -0.12240211 -0.06579174
U 0.4299422 0.09070905 0.04935195 -0.12691817
$A$A3
e.3g prod.3g rw.3g U.3g
e 0.40149641+0i -0.07067529+0i -0.008175418-0i 0.2286283+0i
prod 0.55003024+0i 0.07241639+0i 0.172505474-0i 0.1281593+0i
rw -0.52674826+0i 0.31667695+0i -0.168897398-0i 0.2184591+0i
U -0.02176108-0i 0.03245409-0i -0.077959841+0i 0.1855889-0i
So, now, check roots:
print(roots(vec2var_ykJohEsbInc(ykJohEsbInc(mydata, type="trace", ecdet="zamanda2yk", K=3, spec="longrun", dumvar=dummymatrix2SB[,c(-1,-2,-3)]),r=1), modulus=TRUE))
This results in "Please provide an object of class 'varest', generated by 'VAR()'.", since vars::roots was not extended, because we do NOT need this extension! As I said before, even in the case of a restricted VECM, stability is checked from the VAR side. You must read Joyeux 2007 line by line to see this.
I will supply the outputs (screenshots) of the above functions thoroughly for further clarification.
I will also write an extension to vars::roots, just for pedagogical reasons.

Wrong result from constrOptim function

I'm trying to use constrOptim to optimize the sum of squared errors from a linear multiple regression. The main equation should be D = Beta1*Xa + Beta2*Xb + Beta3*Xc + Beta4*Xd, with D, Xa, Xb, Xc, Xd from an imported .csv file, and the Betas are the coefficients I want to find by minimizing the squared errors.
So far I have imported the .csv file into R, named the columns Ds, Xa, Xb, Xc, Xd, and created the objective function
sum(E^2) = (sum(D) - sum(Beta1*Xa + Beta2*Xb + Beta3*Xc + Beta4*Xd))^2
I also created the matrix 'C' and vector 'd' to configure the constraints that should restrict the Betas to <= 0. I don't know how to find the feasible region, although I've used initial values that made the function work.
Here is the code:
> Tabela= read.table("Simulacao.csv", header=T, sep= ";")
> Tabela
D A B C D.1
1 -1 1 -1 0 0
2 4 0 0 1 -1
3 4 1 0 -1 0
4 0 0 1 0 -1
5 -2 1 0 0 -1
> Ds= Tabela[,1]
> Xa= Tabela[,2]
> Xb= Tabela[,3]
> Xc= Tabela[,4]
> Xd= Tabela[,5]
> simulaf= function(x1,x2,x3,x4) {
+ Ds= Tabela[,1]
+ Xa= Tabela[,2]
+ Xb= Tabela[,3]
+ Xc= Tabela[,4]
+ Xd= Tabela[,5]
+ J=sum(Ds)
+ H=sum(x1*Xa+x2*Xb+x3*Xc+x4*Xd)
+ sx=(J-H)^2
+ return(sx)
+ }
> s= function(x) {simulaf(x[1],x[2],x[3],x[4])}
> d= c(0,0,0,0)
> C= matrix(c(-1,0,0,0,0,-1,0,0,0,0,-1,0,0,0,0,-1),nrow=4,ncol=4,byrow=T)
> constrOptim(c(-1,-1,-1,-1),s,NULL,C,d)
$par
[1] -0.2608199 -0.8981110 -1.1095961 -1.9274866
The result I expect should be:
$par
[1] -0.125 0 -0.5 -0.875
After researching this, my conclusion is that it could be because I'm using bad initial values, because of a parameterization problem (I don't understand why it's needed), or simply because I have programmed it incorrectly.
What do I need to do to fix this?
The formula for the sum of squared errors is
sum((y - yhat)^2)
and not
(sum(y) - sum(yhat))^2
where yhat is the predicted value.
Also, if your only constraints are that the estimated betas should be negative (which is a bit weird, usually you want them to be positive but never mind), then you don't need constrOptim. Regular optim(method="L-BFGS-B") or nlminb will work with so-called box constraints.
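A minimal sketch along those lines, using the objects from the question (upper = 0 imposes the beta <= 0 restriction):
sse <- function(b) sum((Ds - (b[1]*Xa + b[2]*Xb + b[3]*Xc + b[4]*Xd))^2)  # correct sum of squared errors
fit <- optim(c(-1, -1, -1, -1), sse, method = "L-BFGS-B",
             lower = -Inf, upper = 0)  # box constraints: each beta <= 0
fit$par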

How to use genetic algorithm for prediction correctly

I'm trying to use a genetic algorithm for a classification problem. However, I didn't succeed in getting a summary of the model, nor a prediction for a new data frame. How can I get the summary and the prediction for the new dataset?
Here is my toy example:
library(genalg)
dat <- read.table(text = " cats birds wolfs snakes
0 3 9 7
1 3 8 7
1 1 2 3
0 1 2 3
0 1 2 3
1 6 1 1
0 6 1 1
1 6 1 1 ", header = TRUE)
evalFunc <- function(x) {
if (dat$cats < 1)
return(0) else return(1)
}
iter = 100
GAmodel <- rbga.bin(size = 7, popSize = 200, iters = iter, mutationChance = 0.01,
elitism = T, evalFunc = evalFunc)
###########summary try#############
cat(summary.rbga(GAmodel))
# Error in cat(summary.rbga(GAmodel)) :
# could not find function "summary.rbga"
############# prediction try###########
dat$pred<-predict(GAmodel,newdata=dat)
# Error in UseMethod("predict") :
# no applicable method for 'predict' applied to an object of class "rbga"
Update:
After reading the answer given and reading this link:
Pattern prediction using Genetic Algorithm
I wonder how I can programmatically use the GA as part of a prediction mechanism. According to the link's text, one can use the GA to optimize a regression or a neural network and then use the predict function they provide.
Genetic Algorithms are for optimization, not for classification. Therefore, there is no prediction method. Your summary statement was close to working.
cat(summary(GAmodel))
GA Settings
Type = binary chromosome
Population size = 200
Number of Generations = 100
Elitism = TRUE
Mutation Chance = 0.01
Search Domain
Var 1 = [,]
Var 0 = [,]
GA Results
Best Solution : 1 1 0 0 0 0 1
Some additional information is available from Imperial College London
Update in response to updated question:
I see from the paper that you mentioned how this makes sense. The idea is to use the genetic algorithm to optimize the weights for a neural network, then use the neural network for classification. This would be a big task, too big to respond here.
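For what it is worth, here is a toy sketch of the general idea with genalg alone (no neural network; it just evolves real-valued weights for a linear score on the toy data above, with a hypothetical fitness function that counts misclassifications):
library(genalg)
evalFit <- function(w) {   # lower fitness = fewer misclassifications
  score <- w[1]*dat$birds + w[2]*dat$wolfs + w[3]*dat$snakes + w[4]
  sum(as.numeric(score > 0) != dat$cats)
}
GAreal <- rbga(stringMin = rep(-5, 4), stringMax = rep(5, 4),
               popSize = 50, iters = 100, evalFunc = evalFit)
best <- GAreal$population[which.min(GAreal$evaluations), ]  # best weights in the final generation
as.numeric(best[1]*dat$birds + best[2]*dat$wolfs + best[3]*dat$snakes + best[4] > 0)  # "predictions" from the evolved weights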

R: too long computation of likelihood function for conditional logit model

I am trying to maximize a log-likelihood function to get the coefficients of a conditional logit model. I have a big data frame with about 9M rows (300k choice sets) and about 40 parameters to be estimated. It looks like this:
ChoiceSet Choice SKU Price Caramel etc.
1 1 1234 1.0 1 ...
1 0 145 2.0 1 ...
1 0 5233 2.0 0 ...
2 0 1432 1.5 1 ...
2 0 5233 2.0 0 ...
2 1 8320 2.0 0 ...
3 0 1234 1.5 1 ...
3 1 145 1.0 1 ...
3 0 8320 1.0 0 ...
Where ChoiceSet is the set of products available in the store at the moment of purchase, and Choice = 1 when the SKU is chosen.
Since the choice sets vary, I use the log-likelihood function:
clogit.ll <- function(beta, X) { #### This is the function to be maximized
    X <- as.data.table(X)
    setkey(X, ChoiceSet, Choice)
    sum((as.matrix(X[J(t(as.vector(unique(X[, 1, with = F]))), 1), 3:ncol(X), with = F])) %*% beta) -
        sum(foreach(chset = unique(X[, list(ChoiceSet)])$ChoiceSet, .combine = 'c', .packages = 'data.table') %dopar% {
            Z <- as.matrix(X[J(chset, 0:1), 3:ncol(X), with = F])
            Zb <- Z %*% beta
            e <- exp(Zb)
            log(sum(e))
        })
}
Create a new data frame without SKU (it's not needed) and a zero starting vector:
X0 <- Data[,-3]
b0 <- rep(0,ncol(X0)-2)
I maximize this function with the help of the maxLik package, where I supply the gradient to make the calculation faster:
grad.clogit.ll <- function(beta, X) { ### This is the gradient of the likelihood function
    X <- as.data.table(X)
    setkey(X, ChoiceSet, Choice)
    colSums(foreach(chset = unique(X[, list(ChoiceSet)])$ChoiceSet, .combine = 'rbind', .packages = 'data.table') %dopar% {
        Z <- as.matrix(X[J(chset, 0:1), 3:ncol(X), with = F])
        Zb <- Z %*% beta
        e <- exp(Zb)
        as.vector(X[J(chset, 1), 3:ncol(X), with = F] - t(as.vector(X[J(chset, 0:1), 3:ncol(X), with = F])) %*% (e/sum(e)))
    })
}
The maximization problem is the following:
fit <- maxLik(logLik = clogit.ll, grad = grad.clogit.ll, start=b0, X=X0, method="NR", tol=10^(-6), iterlim=100)
Generally, it works fine for small samples, but takes too long for big ones:
Number of Choice sets Duration of computation
300 4.5min
400 10.5min
1000 25min
But when I do it for 5000+ choice sets, R terminates the session.
So (if you are still reading this) how can I maximize this function when I have 300,000+ choice sets and 1.5 weeks to finish my course work? Please help, I have no idea.
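One direction worth trying is to compute the log-likelihood in a single grouped data.table pass instead of launching a foreach task per choice set. A rough sketch, assuming the column layout shown above:
library(data.table)
clogit.ll.fast <- function(beta, X) {
  DT <- as.data.table(X)
  attr.cols <- setdiff(names(DT), c("ChoiceSet", "Choice"))
  DT[, xb := as.vector(as.matrix(.SD) %*% beta), .SDcols = attr.cols]
  # chosen utilities minus the log-sum-exp over each choice set
  sum(DT[Choice == 1, xb]) -
    DT[, .(lse = log(sum(exp(xb)))), by = ChoiceSet][, sum(lse)]
}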
