Singularity issue with sequential estimation of nested logit model - r

I try to estimate a nested logit model as a sequence of MNLs for teaching purposes. At some point of this procedure (see Ben Akiva & Lerman, 1986, Ch. 10, 10.4) one estimates jointly MNLs representing the nests. The MNLs (i.e., the "nests") are represented by the corresponding choice sets.
However when estimating the model, I come across an error message due to singularity.
I tried the code below.
library(mlogit)
data("Fishing", package = "mlogit")
# nest1: beach & pier -> model 1
# nest2: boat & charter -> model 2
# joint estimation of model 1 and 2
# Ben Akiva & Lerman (1986), Chp. 10 (.4)
# availabilities according to models
Fishing$avail.beach <- 0
Fishing$avail.pier <- 0
Fishing$avail.boat <- 0
Fishing$avail.charter <- 0
Fishing$avail.charter[Fishing$mode == "charter" | Fishing$mode == "boat"] <- 1
Fishing$avail.boat[Fishing$mode == "charter" | Fishing$mode == "boat"] <- 1
Fishing$avail.pier[Fishing$mode == "pier" | Fishing$mode == "beach"] <- 1
Fishing$avail.beach[Fishing$mode == "pier" | Fishing$mode == "beach"] <- 1
names(Fishing)
Fish <- mlogit.data(Fishing, shape="wide", varying=c(2:9, 11:14), choice="mode")
fish.joint <- subset(Fish, avail ==1) # eliminate rows of non-available alts
head(model.matrix(mFormula(mode ~ price), fish.joint), n=4)
fish.1 <- mlogit(mode ~ price, fish.joint)
To me, the model.matrix looks identifiable
> head(model.matrix(mFormula(mode ~ price), fish.joint), n=4)
boat:(intercept) charter:(intercept) pier:(intercept) price
1.boat 1 0 0 157.930
1.charter 0 1 0 182.930
2.boat 1 0 0 10.534
2.charter 0 1 0 34.534
However,
> fish.1 <- mlogit(mode ~ price, fish.joint)
Error in solve.default(H, g[!fixed]) :
Lapack routine dgesv: system is exactly singular: U[2,2] = 0

Related

Ranger Predicted Class Probability of each row in a data frame

With regard to this link Predicted probabilities in R ranger package, I have a question.
Imagine I have a mixed data frame, df (comprising of factor and numeric variables) and I want to do classification using ranger. I am splitting this data frame as test and train sets as Train_Set and Test_Set. BiClass is my prediction factor variable and comprises of 0 and 1 (2 levels)
I want to calculate and attach class probabilities to the data frame using ranger using the following commands:
Biclass.ranger <- ranger(BiClass ~ ., ,data=Train_Set, num.trees = 500, importance="impurity", save.memory = TRUE, probability=TRUE)
probabilities <- as.data.frame(predict(Biclass.ranger, data = Test_Set, num.trees = 200, type='response', verbose = TRUE)$predictions)
The data frame probabilities is a data frame consisting of 2 columns (0 and 1) with number of rows equal to the number of rows in Test_Set.
Does it mean, if I append or attach this data frame, namely, probabilities to the Test_Set as the last two columns, it shows the probability of each row being either 0 or 1? Is my understanding correct?
My second question, when I attempt to calcuate confusion matrix through
pred = predict(Biclass.ranger, data=Test_Set, num.trees = 500, type='response', verbose = TRUE)
table(Test_Set$BiClass, pred$predictions)
I get the following error:
Error in table(Test_Set$BiClass, pred$predictions) :
all arguments must have the same length
What am I doing wrong?
For your first question yes, it shows the probability of each row being 0 or 1. Using the example below:
library(ranger)
idx = sample(nrow(iris),100)
data = iris
data$Species = factor(ifelse(data$Species=="versicolor",1,0))
Train_Set = data[idx,]
Test_Set = data[-idx,]
mdl <- ranger(Species ~ ., ,data=Train_Set,importance="impurity", save.memory = TRUE, probability=TRUE)
probabilities <- as.data.frame(predict(mdl, data = Test_Set,type='response', verbose = TRUE)$predictions)
We can always check whether they agree:
par(mfrow=c(1,2))
boxplot(probabilities[,"0"] ~ Test_Set$Species,ylab="Prob 0",xlab="Actual label")
boxplot(probabilities[,"1"] ~ Test_Set$Species,ylab="Prob 1",xlab="Actual label")
Not the best plot, but sometimes if the labels are flipped you will see something weird. We need to find the column that has the max probability and assign the label, for this we do:
max.col(probabilities) - 1
[1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 0 1 1 0 1 1 1 1 1 1 1 1 1 0
[39] 0 0 0 0 0 0 0 0 0 0 0 0
This goes through each row of probabilities returns 1 or 2 depending on which column has maximum probability and we simply subtract 1 from it to get 0,1. For the confusion matrix:
caret::confusionMatrix(table(max.col(probabilities) - 1,Test_Set$Species))
Confusion Matrix and Statistics
0 1
0 31 2
1 0 17
Accuracy : 0.96
95% CI : (0.8629, 0.9951)
No Information Rate : 0.62
P-Value [Acc > NIR] : 2.048e-08
In your case, you can just do:
confusionMatrix(table(max.col(probabilities)-1,Test_Set$BiClass))

modulus values (roots) in VECM model using R?

thanks for reading my question. I am trying to fit a VECM for an economic research, i am using the vars and urca package on R using Rstudio. Considering i have no stationary time series, and both need one difference ,both are I(1), i need to use the VECM approach, but i can not get all the tests i need.
For example:
First i load the libraries
library(vars)
library(urca)
and create my model
data("Canada")
df <- Canada
VARselect(df)
vecm <- urca::ca.jo(df,K = 3)
model <- vec2var(vecm)
The problem is, i can not get the "modules" values to prove stability, i know i can use roots() function to get this values from a "varest" object, for example:
roots(VAR(df,3))
My question is:
how can i get modulus from my vec2var object, roots() doesn't handle this kind of object. I know Gretl can do it (using unit circle to prove stability), so is posible to get this values from a VECM?. How can i do it in R?
Starting with:
data("Canada")
dim(Canada) #84observations x 4 variables
VARselect(Canada) # since in small samples, AIC>BIC; VAR(3) is chosen.
Now, the range of the dataset Canada: 1980.1 - 2000.4 (20 years) is long enough for modeling. This 20-year long period definitely includes lots of crises and interventions. Hence, structural breaks in the data MUST be searched. This is necessary since in structurally-broken series, the existence of SBs changes t values of nonstationarity tests (thereby affects the decision on whether a series is stationary or not).
Since Narayan-Popp 2010 nonstationarity test under multiple structural breaks is statistically very powerful against previous ones (Lee-Strazichic2003, Zivot-Andres1992), and since Joyeux 2007 (in Rao2007) has proven the illogicalness of these previous tests, and NP2013 has proven the superiority of NP2010's statistical power, one MUST use NP2010. Since Gauss code for NP2010 seemed to be ugly to me, I converted it to R code, and with the help of ggplot2, results are presented nicer.
[Processing structural breaks is a MUST for cointegration check as well since Osterwald-Lenum1992 CVs ignore SBs whereas Johansen-Mosconi-Nielsen2000 CVs cares SBs.]
Canada <- as.data.frame(Canada)
head(Canada)
e prod rw U
1 929.6105 405.3665 386.1361 7.53
2 929.8040 404.6398 388.1358 7.70
...................................
# Assign lexiographic row names for dates of observations
row.names(Canada) <- paste(sort(rep(seq(1980, 2000, 1), 4) ), rep(seq(1, 4, 1), 20), sep = ".")
# Insert lexiographic "date" column to the dataframe. This is necessary for creating intervention dummies.
DCanada <- data.frame(date=row.names(Canada),Canada) # dataset with obs dates in a column
head(DCanada)
date e prod rw U
1980.1 1980.1 929.6105 405.3665 386.1361 7.53
1980.2 1980.2 929.8040 404.6398 388.1358 7.70
Perform Narayan-Popp 2010 nonstationarity test to the series:
[H0: "(with 2 structural breaks) series is nonstationary";
H1: "(with 2 structural breaks) series is stationary";
"test stat > critical value" => "hold H0"; "test stat < critical value" => "hold H1"]
library(causfinder)
narayanpopp(DCanada[,2]) # for e
narayanpopp(DCanada[,3]) # for prod
narayanpopp(DCanada[,4]) # for rw
narayanpopp(DCanada[,5]) # for U
Narayan-Popp 2010 nonstationarity test results (with obs #s):
variable t stat lag SB1 SB2 Integration Order
e -4.164 2 37:946.86 43:948.03 I(1)
prod -3.325 1 24:406.77 44:405.43 I(1)
rw -5.087 0 36:436.15 44:446.96 I(0) <trend-stationary>
U -5.737 1 43:8.169 53:11.070 I(0) <stationary pattern> (M2 computationally singular; used M1 model)
(critical values (M2): (1%,5%,10%): -5.576 -4.937 -4.596)
(critical values (M1): (1%,5%,10%): -4.958 -4.316 -3.980
Since in a VAR structure, all variables are treated equally, continue to equal-treatment when determining structural breaks systemwise:
mean(c(37,24,36,43)) # 35; SB1 of system=1988.3
mean(c(43,44,44,53)) # 46; SB2 of system=1990.2
The following is to overcome "In Ops.factor(left, right) : >= not meaningful for factors" error. In some dataset, we need to do the following:
library(readxl)
write.xlsx(Canada, file="data.xlsx", row.names=FALSE) # Take this to the below folder, add "date" column with values 1980.1,....,2000.4
mydata <- read_excel("D://eKitap//RAO 2007 Cointegration for the applied economist 2E//JoyeuxCalisma//Canada//data.xlsx")
# arrange your path accordingly in the above line.
mydata <- as.data.frame(mydata)
library(lubridate); library(zoo)
row.names(mydata) <- as.yearqtr(seq(ymd('1980-01-01'), by = '1 quarter', length.out=(84)))
Dmydata <- mydata # Hold it in a variable
Define intervention dummy matrix with 2 SBs (35:1988.3 and 46: 1990.2) as follows:
library(data.table)
DataTable <- data.table(Dmydata, keep.rownames=FALSE)
Dt <- cbind("bir"=1, # intervention dummies matrix
"D2t" = as.numeric(ifelse( DataTable[,c("date"), with=FALSE] >= "1988.3" & DataTable[,c("date"), with=FALSE] <= "1990.1", 1 , 0)),
"D3t" = as.numeric(ifelse( DataTable[,c("date"), with=FALSE] >= "1990.2" & DataTable[,c("date"), with=FALSE] <= "2000.4", 1 , 0)))
On the fly indicator variables accompanying intervention dummies:
OnTheFlyIndicator <- cbind(
"I2t" = as.numeric(DataTable[, c("date"), with=FALSE] == "1988.3"),
"I3t" = as.numeric(DataTable[, c("date"), with=FALSE] == "1990.2"))
myTimeTrend <- as.matrix(cbind("TimeTrend" = as.numeric(1:nrow(Dt))))
zyDt <- Dt * as.vector(myTimeTrend) # TimeTrendDavranisDegisimleri
colnames(zyDt) <- paste(colnames(myTimeTrend), colnames(Dt), sep="*")
mydata <- mydata[,-1]
Selection of VAR order:
library(vars)
# Lag order selection with the effects of intervention dummies
VARselect(mydata, lag.max=5, "both", exogen=cbind(zyDt[drop=FALSE], Dt[drop=FALSE], OnTheFlyIndicator)) # Take VAR(3)
Lagger matrix for Joyeux2007 indexing technique:
lagmatrix <- function(x, maxlag){
x <- as.matrix(x)
if(is.null(colnames(x))== TRUE){ colnames(x) <- "VarCol0" }
DondurulenDizey <- embed(c(rep(NA,maxlag),x),maxlag+1)
dimnames(DondurulenDizey)[[2]] <- c(colnames(x)[1, drop = FALSE], paste(colnames(x)[1,drop=FALSE],".",1:maxlag,"l", sep = ""))
return(DondurulenDizey)
}
Assign VAR lag and no. of subsamples:
VARlag <- 3
Subsamples <- 3 # subsamples = no. of str breaks +1
Dummy matrix for 2 structural breaks:
dummymatrix2SB <- matrix(NA,DataTable[,.N], 10)
dummymatrix2SB <- cbind(myTimeTrend,
lagmatrix(zyDt[,c("TimeTrend*D2t"), drop=FALSE], maxlag=VARlag)[,1+VARlag, drop=FALSE],
lagmatrix(zyDt[,c("TimeTrend*D3t"), drop=FALSE], maxlag=VARlag)[,1+VARlag, drop=FALSE],
lagmatrix(Dt[,c("D2t"), drop=FALSE], maxlag=VARlag)[,1+VARlag, drop=FALSE],
lagmatrix(Dt[,c("D3t"), drop=FALSE], maxlag=VARlag)[,1+VARlag, drop=FALSE],
lagmatrix(OnTheFlyIndicator[,c("I2t"), drop=FALSE], maxlag=VARlag-1),
lagmatrix(OnTheFlyIndicator[,c("I3t"), drop=FALSE], maxlag=VARlag-1))
dummymatrix2SB[is.na(dummymatrix2SB)] <- 0 # replace NAs with 0
dummymatrix2SB # Print dummy matrix for 2 str breaks to make sure all are OK
TimeTrend TimeTrend.D2t.3l TimeTrend.D3t.3l D2t.3l D3t.3l I2t I2t.1l I2t.2l I3t I3t.1l I3t.2l
1 0 0 0 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0 0 0 0
...........................................
34 0 0 0 0 0 0 0 0 0 0
35 0 0 0 0 1 0 0 0 0 0
36 0 0 0 0 0 1 0 0 0 0
37 0 0 0 0 0 0 1 0 0 0
38 35 0 1 0 0 0 0 0 0 0
39 36 0 1 0 0 0 0 0 0 0
40 37 0 1 0 0 0 0 0 0 0
41 38 0 1 0 0 0 0 0 0 0
42 39 0 1 0 0 0 0 1 0 0
43 40 0 1 0 0 0 0 0 1 0
44 41 0 1 0 0 0 0 0 0 1
45 0 42 0 1 0 0 0 0 0 0
46 0 43 0 1 0 0 0 0 0 0
............................................
83 0 80 0 1 0 0 0 0 0 0
84 0 81 0 1 0 0 0 0 0 0
STABILITY of VAR:
Victor, theoretically you are wrong. Stability is checked from VAR side even in the case of restricted (cointegrated) VAR models. See Joyeux2007 for details. Also, estimations from both sides are same:
"unrestricted VAR = unrestricted VECM" and
"restricted VAR = restricted VECM".
Hence, checking stability of unrestricted VAR is equal to checking stability of unrestricted VECM, and vice versa. They are equal math'ly, they are just different representations.
Also, checking stability of restricted VAR is equal to checking stability of restricted VECM, and vice versa. They are equal math'ly, they are just different representations. But, you do not need this checking for restricted VECM cases since we are surfing in subspace of a feasible VAR. That is to say, if original unr VAR corresponding to restd VeCM is stable, then all are OK.
If your series are cointegrated, you check the stability from VAR side even in that case! If you wonder "whether you should check stability for restricted VECM", the answer is NO. You should not check. Because, in cointegrated case, you are in the subspace of feasible solution. That said, if you insist to check stability of restricted (cointegrated) VECM, you can still do that via urca::ca.jo extentions and vars::vec2var extentions:
print(roots(VAR(mydata, p=3, "both", exogen=cbind(zyDt[drop=FALSE], Dt[drop=FALSE], OnTheFlyIndicator)), modulus=TRUE))
# [1] 0.96132524 0.77923543 0.68689517 0.68689517 0.67578368 0.67578368
[7] 0.59065419 0.59065419 0.55983617 0.55983617 0.33700725 0.09363846
print(max(roots(VAR(mydata, p=3, "both", exogen=cbind(zyDt[drop=FALSE], Dt[drop=FALSE], OnTheFlyIndicator)), modulus=TRUE)))
#0.9613252
(optional) Check stability via OLS-CUSUM:
plot(stability(VAR(mydata, p=3, "both", exogen=cbind(zyDt[drop=FALSE], Dt[drop=FALSE], OnTheFlyIndicator)), type="OLS-CUSUM"))
NON-AUTOCORRELATION of VAR residuals test:
for (j in as.integer(1:5)){
print(paste("VAR's lag no:", j))
print(serial.test(VAR(mydata, p=j, "both", exogen=cbind(zyDt[drop=FALSE], Dt[drop=FALSE], OnTheFlyIndicator)), lags.bg=4, type= c("ES")))
# lags.bg: AR order of VAR residuals
}
NORMALITY of VAR residuals test:
print(normality.test(VAR(mydata, p=3, "both", exogen=cbind(zyDt[drop=FALSE], Dt[drop=FALSE], OnTheFlyIndicator)), multivariate=TRUE))
library(normtest)
for (i in as.integer(1:4)){ # there are 4 variables
print(skewness.norm.test(resid(VAR(mydata, p=3, "both", exogen=cbind(zyDt[drop=FALSE], Dt[drop=FALSE], OnTheFlyIndicator)))[,i]))
print(kurtosis.norm.test(resid(VAR(mydata, p=3, "both", exogen=cbind(zyDt[drop=FALSE], Dt[drop=FALSE], OnTheFlyIndicator)))[,i]))
print(jb.norm.test(resid(VAR(mydata, p=3, "both", exogen=cbind(zyDt[drop=FALSE], Dt[drop=FALSE], OnTheFlyIndicator)))[,i]))
}
HOMOSCEDASTICITY of VAR residuals test:
print(arch.test(VAR(mydata, p=3, "both", exogen=cbind(zyDt[drop=FALSE], Dt[drop=FALSE], OnTheFlyIndicator))), lags.multi=6, multivariate.only=TRUE)
Since integration orders of series is different, there is no way that they are cointegrated. That said,
Assume for a while all are I(1) and perform cointegration test with multiple structural breaks with Johansen-Mosconi-Nielsen 2000 CVs:
(extend urca::cajo to causfinder::ykJohEsbInc (i.e., add the functionality to process 1 SB and 2 SBs))
summary(ykJohEsbInc(mydata, type="trace", ecdet="zamanda2yk", K=3, spec="longrun", dumvar=dummymatrix2SB[,c(-1,-2,-3)]))
# summary(ykJohEsbInc(mydata, type="trace", ecdet="zamanda2yk", K=3, spec="transitory", dumvar=dummymatrix2SB[,c(-1,-2,-3)])) gives the exactly same result.
Since there are 2 SBs in the system (1988.3, 1990.2), there are q=2+1=3 subsamples.
1st SB ratio: v1= (35-1)/84= 0.4047619
2nd SB ratio:v2= (46-1)/84= 0.5357143
Hence, JMN2000 CVs for cointegration test with 2 SBs:
(The following is TR-localized. One can find original EN-local code in Giles website)
library(gplots)
# Johansen vd. (2000) nin buldugu, yapisal kirilmalarin varliginda esbutunlesim incelemesinin degistirilmis iz sinamalarinin yanasik p degerleri ve karar degerlerini hesaplama kodu
# Ryan Godwin & David Giles (Dept. of Economics, Univesity of Victoria, Canada), 29.06.2011
# Kullanici asagidaki 4 degeri atamalidir
#======================================
degiskensayisi <- 4 # p
q<- 3 # q: verideki farkli donemlerin sayisi; q=1: 1 donem, hicbir yapisal kirilma yok demek oldugundan v1 ve v2 nin degerleri ihmal edilir
v1<- 0.4047619 # (35-1)/84 # 1.yk anı=34+1=35. Johansen et. al 2000 v1 def'n , v1: SB1 - 1
v2<- 0.5357143 # (46-1)/84 # 2nd SB moment 45+1=46.
#======================================
# iz istatistiginin biri veya her ikisi icin p degerlerinin olmasi istendiginde, sonraki 2 satirin biri veya her ikisini degistir
izZ <- 15.09 # Vz(r) istatistiginin degeri
izK <- 114.7 # Vk(r) istatistiginin degeri
#=========================================
enbuyuk_p_r<- degiskensayisi # "p-r > 10" olmasın; bkz: Johansen vd. (2000)
# "a" ve "b" nin değerleri yapısal kırılmaların sayısına (q-1) bağlıdır
# q=1 iken, hiçbir yapısal kırılma olmadığı bu durumda a=b=0 ata
# q=2 iken, 1 yapısal kırılma olduğu bu durumda a=0 (Johansen vd. 2000 4.Tabloda) ve b=min[V1 , (1-V1)] ata
# q=3 iken, 2 yapısal kırılma olduğu bu durumda a=min[V1, (V2-V1), (1-V2)] ve b=min[geriye kalan iki V ifadesi] ata
a = c(0, 0, min(v1, v2-v1, 1-v2))[q]
b = c(0, min(v1, 1-v1), median(c(v1,v2-v1,1-v2)))[q]
# YanDagOrtLog: yanaşık dağılımın ortalamasının logaritması
# YanDagDegLog: yanaşık dağılımın değişmesinin logaritması
# V(Zamanyönsemsi) veya V(Kesme) sınamalarını yansıtmak üzere adlara z veya k ekle.
# Bkz. Johansen vd. (2000) 4. Tablo.
# Önce Vz(r) sınamasının sonra Vk(r) sınamasının karar değerlerini oluştur
pr<- c(1:enbuyuk_p_r)
YanDagOrtLogZ <- 3.06+0.456*pr+1.47*a+0.993*b-0.0269*pr^2-0.0363*a*pr-0.0195*b*pr-4.21*a^2-2.35*b^2+0.000840*pr^3+6.01*a^3-1.33*a^2*b+2.04*b^3-2.05/pr-0.304*a/pr+1.06*b/pr
+9.35*a^2/pr+3.82*a*b/pr+2.12*b^2/pr-22.8*a^3/pr-7.15*a*b^2/pr-4.95*b^3/pr+0.681/pr^2-0.828*b/pr^2-5.43*a^2/pr^2+13.1*a^3/pr^2+1.5*b^3/pr^2
YanDagDegLogZ <- 3.97+0.314*pr+1.79*a+0.256*b-0.00898*pr^2-0.0688*a*pr-4.08*a^2+4.75*a^3-0.587*b^3-2.47/pr+1.62*a/pr+3.13*b/pr-4.52*a^2/pr-1.21*a*b/pr-5.87*b^2/pr+4.89*b^3/pr
+0.874/pr^2-0.865*b/pr^2
OrtalamaZ<- exp(YanDagOrtLogZ)-(3-q)*pr
DegismeZ<- exp(YanDagDegLogZ)-2*(3-q)*pr
# Sinama istatistiginin yanasik dagilimina yaklasmakta kullanilacak Gama dagiliminin sekil ve olcek degiskelerini elde etmek icin yanasik ortalama ve degismeyi kullanarak
# V0 varsayimi altinda istenen quantilelari elde et:
# quantilelar: olasilik dagiliminin araligini veya bir ornekteki gozlemleri, esit olasiliklara sahip birbirlerine bitisik araliklarla bolen kesim noktalari.
tetaZ <- DegismeZ/OrtalamaZ
kZ <- OrtalamaZ^2/DegismeZ
YanDagOrtLogK<- 2.80+0.501*pr+1.43*a+0.399*b-0.0309*pr^2-0.0600*a*pr-5.72*a^2-1.12*a*b-1.70*b^2+0.000974*pr^3+0.168*a^2*pr+6.34*a^3+1.89*a*b^2+1.85*b^3-2.19/pr-0.438*a/pr
+1.79*b/pr+6.03*a^2/pr+3.08*a*b/pr-1.97*b^2/pr-8.08*a^3/pr-5.79*a*b^2/pr+0.717/pr^2-1.29*b/pr^2-1.52*a^2/pr^2+2.87*b^2/pr^2-2.03*b^3/pr^2
YanDagDegLogK<- 3.78+0.346*pr+0.859*a-0.0106*pr^2-0.0339*a*pr-2.35*a^2+3.95*a^3-0.282*b^3-2.73/pr+0.874*a/pr+2.36*b/pr-2.88*a^2/pr-4.44*b^2/pr+4.31*b^3/pr+1.02/pr^2-0.807*b/pr^2
OrtalamaK <- exp(YanDagOrtLogK)-(3-q)*pr
DegismeK <- exp(YanDagDegLogK)-2*(3-q)*pr
# Sinama istatistiginin yanasik dagilimina yaklasmakta kullanilacak Gama dagiliminin sekil ve olcek degiskelerini elde etmek icin yanasik ortalama ve degismeyi kullanarak
# V0 varsayimi altinda istenen quantilelari elde et:
# quantilelar: olasilik dagiliminin araligini veya bir ornekteki gozlemleri, esit olasiliklara sahip birbirlerine bitisik araliklarla bolen kesim noktalari.
tetaK <- DegismeK/OrtalamaK
kK <- OrtalamaK^2/DegismeK
# (izZ veya izK den biri 0 dan farklı ise) karar değerlerini ve p değerlerini tablolaştır:
windows(6,3.8)
KararDegerleri <- cbind(sapply(c(.90,.95,.99) , function(x) sprintf("%.2f",round(c(qgamma(x, shape=kZ,scale=tetaZ)),2))),
sapply(c(.9,.95,.99) , function(x) sprintf("%.2f",round(c(qgamma(x, shape=kK,scale=tetaK)),2))))
colnames(KararDegerleri) <- rep(c(0.90,0.95,0.99),2)
# rownames(KararDegerleri) <- pr
rownames(KararDegerleri) <- c(sapply((degiskensayisi -1):1, function(i) paste(degiskensayisi - i, " ","(r<=", i, ")",sep="")), paste(degiskensayisi, " ( r=0)", sep=""))
textplot(KararDegerleri, cex=1)
text(.064,.91,"p-r",font=2)
text(.345,1,expression(paste(plain(V)[z],"(r) test")),col=2)
text(.821,1,expression(paste(plain(V)[k],"(r) test")),col=4)
title("Yanasik Karar Degerleri \n (p:duzendeki degisken sayisi; r:esbutunlesim ranki)")
if(izZ!=0){
windows(4,3.8)
pDegerleri <- matrix(sprintf("%.3f",round(1 - pgamma(izZ, shape=kZ, scale = tetaZ),3)))
# rownames(pDegerleri) <- pr
rownames(pDegerleri) <- c(sapply((degiskensayisi -1):1, function(i) paste(degiskensayisi - i, " ","(r<=", i, ")",sep="")), paste(degiskensayisi, " ( r=0)", sep=""))
textplot(pDegerleri,cex=1,show.colnames=F)
text(.69,.96,substitute(paste("Pr(",plain(V)[z],">",nn,")"),list(nn=izZ)),col=2)
text(.45,.96,"p-r",font=2)
title("Yanasik p Degerleri \n (p:duzendeki degisken sayisi; \n r:esbutunlesim ranki)")
}
if(izK!=0){
windows(3,3.8)
pDegerleri <- matrix(sprintf("%.3f",round(1 - pgamma(izK, shape=kK, scale = tetaK),3)))
#rownames(pDegerleri) <- pr
rownames(pDegerleri) <- c(sapply((degiskensayisi -1):1, function(i) paste(degiskensayisi - i, " ","(r<=", i, ")",sep="")), paste(degiskensayisi, " ( r=0)", sep=""))
textplot(pDegerleri,cex=1,show.colnames=F)
text(.78,.96,substitute(paste("Pr(",plain(V)[k],">",nn,")"),list(nn=izK)),col=4)
text(.43,.96,"p-r",font=2)
title("Yanasik p Degerleri \n (p:duzendeki degisken sayisi; \n r:esbutunlesim ranki)")
}
Hence, the according to JMN2000 CVs, there is no cointegration as well. So, your usage of vec2var is meaningless. Because, vec2var is needed in cointegrated cases. Again, assume all series are cointegrated to make you happy (to create need to use vec2var) and continue with the most difficult case (cointegration for series with multiple structural breaks); i.e., we are continueing with "One who pee-pees ambitiously drills the wall" logic.
Extend vars::vec2var to causfinder::vec2var_ykJohEsbInc to handle transformations under "multiple structural breaks" case having relevant intervention dummies. JMN2000 application above showed cointegration rank r is not within [1,4-1]=[1,3] range. Even though that assume JMN2000 CVs resulted r=1 in the above for the sake of argument.
So, to transform restricted VECM to restricted VAR (under multiple=2 structural breaks), apply:
vec2var_ykJohEsbInc(ykJohEsbInc(mydata, type="trace", ecdet="zamanda2yk", K=3, spec="longrun", dumvar=dummymatrix2SB[,c(-1,-2,-3)]),r=1)
These results in:
Deterministic coefficients (detcoeffs):
e prod rw U
kesme 22.6612871 -0.215892151 32.0610121 -9.26649249 #(const)
zyonsemesi 0.2505164 -0.009900004 0.3503561 -0.10494714 #(trend)
zy*D2t_3 0.2238060 -0.008844454 0.3130007 -0.09375756
zy*D3t_3 -0.1234803 0.004879743 -0.1726916 0.05172878
$deterministic
kesme zyonsemesi zy*D2t_3 zy*D3t_3 D2t.3l D3t.3l
e 22.6612871 0.250516390 0.223806048 -0.123480327 -8.8012612 5.3052074
prod -0.2158922 -0.009900004 -0.008844454 0.004879743 -0.1157137 -0.3396206
rw 32.0610121 0.350356063 0.313000702 -0.172691620 -12.5838458 7.2201840
U -9.2664925 -0.104947142 -0.093757559 0.051728781 3.5836119 -2.2921099
I2t I2t.1l I2t.2l I3t I3t.1l I3t.2l
e -0.2584379 0.08470453 0.2102661 -0.51366831 -1.0110891 -2.08728944
prod 0.3013044 0.25103445 -0.8640467 0.08804425 -0.2362783 -0.05606892
rw -0.5838161 0.28400182 1.2073483 -0.67760848 -2.2650094 -0.70586316
U 0.1305258 0.03559119 0.1476985 0.14614290 0.6847273 1.27469940
$A
$A$A1
e.1g prod.1g rw.1g U.1g
e 1.4817704 0.1771082 -0.2274936 0.2332402
prod -0.1605790 1.1846699 0.0406294 -0.9398689
rw -0.8366449 -0.1910611 0.9774874 0.4667430
U -0.4245817 -0.1498295 0.1226085 0.7557885
$A$A2
e.2g prod.2g rw.2g U.2g
e -0.8441175 -0.04277845 0.01128282 -0.01896916
prod -0.3909984 -0.25960184 -0.20426749 0.79420691
rw 1.4181448 -0.03659278 -0.12240211 -0.06579174
U 0.4299422 0.09070905 0.04935195 -0.12691817
$A$A3
e.3g prod.3g rw.3g U.3g
e 0.40149641+0i -0.07067529+0i -0.008175418-0i 0.2286283+0i
prod 0.55003024+0i 0.07241639+0i 0.172505474-0i 0.1281593+0i
rw -0.52674826+0i 0.31667695+0i -0.168897398-0i 0.2184591+0i
U -0.02176108-0i 0.03245409-0i -0.077959841+0i 0.1855889-0i
So, now, check roots:
print(roots(vec2var_ykJohEsbInc(ykJohEsbInc(mydata, type="trace", ecdet="zamanda2yk", K=3, spec="longrun", dumvar=dummymatrix2SB[,c(-1,-2,-3)]),r=1), modulus=TRUE))
This result in "Please provide an object of class 'varest', generated by 'VAR()'." since vars::roots was not extended because: we do NOT need this extention! As I said before, even in the case of restricted VECM, stability is checked from VAR side. You must read Joyeux2007 line by line to see this.
I will supply the ouputs (print-screens) of above functions thouroughly for further clarification.
I will also write extention to vars::root as well just for pedagogical reasons.

Procedure of the OneR algorithm in R

I have used the OneR algorithm of the FSelecter Package to find the Attribut with the lowest error rate. My class Attribut is yes and no.
My characteristics of the attributs are also yes and no.
The result of the OneR algorithm is:
Ranking-No. 1
Atribut-Name: OR1:
Matrix: ------ 0(Attribut-Characteristic) -- 1 (Attribut Characteristics
0(Class):-------------------25243-------------------0
1(Class: -------------------1459-------------------18
Error-Rate: 1459 (0 + 1459)
Ranking-No. 2
Atribut-Name: OR2:
Matrix: ------ 0(Attribut-Characteristic) -- 1 (Attribut Characteristics
0(Class):-------------------25243-------------------0
1(Class: -------------------1460-------------------17
Error-Rate: 1460 (0 + 1460)
However, if I use the correlation function on the same data Frame the best attributs have got a lower error rate than the attributs, which i get with the oneR function.
Atribut-Name: CO4:
Matrix: ------ 0(Attribut-Characteristic) -- 1 (Attribut Characteristics
0(Class):-------------------25204-------------------39
1(Class: -------------------1348-------------------129
Error-Rate: 1387 (39 + 1348)
Can anybody tell me, why the OneR algorithm does not show the CO4 Attribut as the best Attribut (based on the error rate)?
Which criterias does the OneR algorithm use?
--- Addition to better understand my question ---
The complete data are too big to show it.
I have constructed a new datapool, which has the same effect
DELAYED - OR1 - CO4 ..
1 ---------1--------1--
0 ---------0--------0--
0 ---------0--------1--
1 ---------0--------1--
0 ---------0--------0--
1 ---------0--------1--
0 ---------0--------0--
1 ---------0--------1--
The code for show the error rate for a single attribute:
print(table(datapool_stackoverflow$DELAYED, datapool_stackoverflow$OR1))
The code the OneR function:
library(FSelector)
oneR_stackoverflow <- oneR(DELAYED~., datapool_stackoverflow)
subset_stackoverflow <- cutoff.k(oneR_stackoverflow, 2)
print(subset_stackoverflow)
The code for the correlation:
cor(as.numeric(datapool_stackoverflow$DELAYED), as.numeric(datapool_stackoverflow$OR1))
In this case the results are:
Error-Rate: OR1
Matrix: ------ 0(Attribut-Characteristic) -- 1 (Attribut Characteristics
0(Class):---------------------4-------------------------0
1(Class: ---------------------3-------------------------1
Manuel calculated Error-Rate: 3(0 + 3)
Error-Rate: CO4
Matrix: ------ 0(Attribut-Characteristic) -- 1 (Attribut Characteristics
0(Class):-----------------------3-----------------------1
1(Class: -----------------------0-----------------------4
Error-Rate: 1(1 + 0)
Correlation:
Attribut OR1: 0.377
Attribut CO4: 0.77
OneR: "OR1", "CO4"
Why, does the OneR function provide the OR1 Attribut as the best Attribut to classify?
You haven't given the types of your data, but I'm assuming that you have numerical values. FSelector discretizes these values before using them in oneR and it seems that bad things happen there (which may be a bug in RWeka's Discretize function). However, you probably want factor variables anyway and not numeric data as you have only 0-1 values. Then everything works fine for me:
> df = data.frame(delayed=factor(c(1,0,0,1,0,1,0,1)), or1 = factor(c(1,0,0,0,0,0,0,0)), co4 = factor(c(1,0,1,1,0,1,0,1)))
> library(FSelector)
> oneR(delayed~., df)
attr_importance
or1 0.2000000
co4 0.4285714
As you can see, co4 now has a much higher importance than or1, as it should have.
Ok, i have the solution. The algorithm calculates the sum of the error rate of the characteristcs in a attribut (in relation to the max value of a characteristc)
In this example:
Attribut OR1: 3/7 + 0/1 = 3/7
Attribut CO4: 0/3 + 1/5 = 0.2
No, the CO4 should be chosen, choosing the other attribute is wrong - see what the OneR package (available on CRAN) gives:
> library(OneR)
> DELAYED <- c(1, 0, 0, 1, 0, 1, 0, 1)
> OR1 <- c(1, rep(0, 7))
> CO4 <- c(1, 0, 1, 1, 0, 1, 0, 1)
>
> data <- data.frame(DELAYED, OR1, CO4)
>
> model <- OneR(formula = DELAYED ~., data = data, verbose = T)
Attribute Accuracy
1 * CO4 87.5%
2 OR1 62.5%
---
Chosen attribute due to accuracy
and ties method (if applicable): '*'
> summary(model)
Rules:
If CO4 = 0 then DELAYED = 0
If CO4 = 1 then DELAYED = 1
Accuracy:
7 of 8 instances classified correctly (87.5%)
Contingency table:
CO4
DELAYED 0 1 Sum
0 * 3 1 4
1 0 * 4 4
Sum 3 5 8
---
Maximum in each column: '*'
Pearson's Chi-squared test:
X-squared = 2.1333, df = 1, p-value = 0.1441
>
> model_2 <- OneR(formula = DELAYED ~ OR1, data = data)
> summary(model_2)
Rules:
If OR1 = 0 then DELAYED = 0
If OR1 = 1 then DELAYED = 1
Accuracy:
5 of 8 instances classified correctly (62.5%)
Contingency table:
OR1
DELAYED 0 1 Sum
0 * 4 0 4
1 3 * 1 4
Sum 7 1 8
---
Maximum in each column: '*'
Pearson's Chi-squared test:
X-squared = 0, df = 1, p-value = 1
You can find more information about the OneR package here: https://github.com/vonjd/OneR
(full disclosure: I am the author of this package)

JAGS Runtime Error: Cannot insert node into X[ ]. Dimension Mismatch

I'm trying to add a bit of code to a data-augmentation capture-recapture model and am coming up with some errors I haven't encountered before. In short, I want to estimate a series of survivorship phases that each last more than a single time interval. I want the model to estimate the length of each survivorship phase and use that to improve the capture recapture model. I tried and failed with a few different approaches, and am now trying to accomplish this using a switching state array for the survivorship phases:
for (t in 1:(n.occasions-1)){
phi1switch[t] ~ dunif(0,1)
phi2switch[t] ~ dunif(0,1)
phi3switch[t] ~ dunif(0,1)
phi4switch[t] ~ dunif(0,1)
psphi[1,t,1] <- 1-phi1switch[t]
psphi[1,t,2] <- phi1switch[t]
psphi[1,t,3] <- 0
psphi[1,t,4] <- 0
psphi[1,t,5] <- 0
psphi[2,t,1] <- 0
psphi[2,t,2] <- 1-phi2switch[t]
psphi[2,t,3] <- phi2switch[t]
psphi[2,t,4] <- 0
psphi[2,t,5] <- 0
psphi[3,t,1] <- 0
psphi[3,t,2] <- 0
psphi[3,t,3] <- 1-phi3switch[t]
psphi[3,t,4] <- phi3switch[t]
psphi[3,t,5] <- 0
psphi[4,t,1] <- 0
psphi[4,t,2] <- 0
psphi[4,t,3] <- 0
psphi[4,t,4] <- 1-phi4switch[t]
psphi[4,t,5] <- phi4switch[t]
psphi[5,t,1] <- 0
psphi[5,t,2] <- 0
psphi[5,t,3] <- 0
psphi[5,t,4] <- 0
psphi[5,t,5] <- 1
}
So this creates a [5,t,5] array where the survivorship state can only switch to the subsequent state and not backwards (e.g. 1 to 2, 4 to 5, but not 4 to 3). Now I create a vector where the survivorship state is defined:
PhiState[1] <- 1
for (t in 2:(n.occasions-1)){
# State process: draw PhiState(t) given PhiState(t-1)
PhiState[t] ~ dcat(psphi[PhiState[t-1], t-1,])
}
We start in state 1 always, and then take a categorical draw at each time step 't' for remaining in the current state or moving on to the next one given the probabilities within the array. I want a maximum of 5 states (assuming that the model will be able to functionally produce fewer by estimating the probability of moving from state 3 to 4 and onwards near 0, or making the survivorship value of subsequent states the same or similar if they belong to the same survivorship value in reality). So I create 5 hierarchical survival probabilities:
for (a in 1:5){
mean.phi[a] ~ dunif(0,1)
phi.tau[a] <- pow(phi_sigma[a],-2)
phi.sigma[a] ~ dunif(0,20)
}
Now this next step is where the errors start. Now that I've assigned values 1-5 to my PhiState vector it should look something like this:
[1] 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3 4 4 5
or maybe
[1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2
and I now want to assign a mean.phi[] to my actual phi[] term, which feeds into the model:
for(t in 1:(n.occasions-1)){
phi[t] ~ dnorm(mean.phi[PhiState[t]],phi.tau[PhiState[t]])
}
However, when I try to run this I get the following error:
Error in jags.model(model.file, data = data, inits = init.values, n.chains = n.chains, :
RUNTIME ERROR:
Cannot insert node into mean.phi[1:5]. Dimension mismatch
It's worth noting that the model works just fine when I use the following phi[] determinations:
phi[t] ~ dunif(0,1) #estimate independent annual phi's
or
phi[t] ~ dnorm(mean.phi,phi_tau) #estimate hierarchical phi's from a single mean.phi
or
#Set fixed survial periods (this works the best, but I don't want to have to tell it when
#the periods start/end and how many there are, hence the current exercise):
for (a in 1:21){
surv[a] ~ dnorm(mean.phi1,phi1_tau)
}
for (b in 22:30){
surv[b] ~ dnorm(mean.phi2,phi2_tau)
}
for (t in 1:(n.occasions-1)){
phi[t] <- surv[t]
}
I did read this post: https://sourceforge.net/p/mcmc-jags/discussion/610037/thread/36c48f25/
but I don't see where I'm redefining variables in this case... Any help fixing this or advice on a better approach would be most welcome!
Many thanks,
Josh
I'm a bit confused as to what are your actual data (the phi[t]?), but the following might give you a starting point:
nt <- 29
nstate <- 5
M <- function() {
phi_state[1] <- 1
for (t in 2:nt) {
up[t-1] ~ dbern(p[t-1])
p[t-1] <- ifelse(phi_state[t-1]==nstate, 0, p_[t-1])
p_[t-1] ~ dunif(0, 1)
phi_state[t] <- phi_state[t-1] + equals(up[t-1], 1)
}
for (k in 1:nstate) {
mean_phi[k] ~ dunif(0, 1)
phi_sigma[k] ~ dunif(0, 20)
}
for(t in 1:(nt-1)){
phi[t] ~ dnorm(mean_phi[phi_state[t]], phi_sigma[phi_state[t]]^-2)
}
}
library(R2jags)
fit <- jags(list(nt=nt, nstate=nstate), NULL,
c('phi_state', 'phi', 'mean_phi', 'phi_sigma', 'p'),
M, DIC=FALSE)
Note that above, p is a vector of probabilities of moving up to the next (adjacent) state.

Modelling for zero using glm function in R

I am trying to build a logistic regression model using glm function in R. My dependent variable is binomial with 0 and 1 only. Here 0 - Non Return , 1- Return.
I want to model for Non-Return (0's),but glm function of R by default build for 1's. Like in SAS which by default build for lower value and we can use descending attribute in proc logistic to change the order, do we have something similar in glm too ?
I have one option to achieve this by changing 0 to 1 and vice-versa in my raw data but don't want to change my raw data.
Please help me or guide how can I do the similar thing in R.
Thanks in advance.
Just specify 1 - y as the DV:
set.seed(42)
y <- sample(c(0, 1), 10, TRUE)
#[1] 1 1 0 1 1 1 1 0 1 1
fit <- glm(y ~ 1, family = binomial)
coef(fit)
# (Intercept)
# 1.386294
log(mean(y) / (1 - mean(y)))
#[1] 1.386294
1 - y
#[1] 0 0 1 0 0 0 0 1 0 0
fit1 <- glm(1 - y ~ 1, family = binomial)
coef(fit1)
#(Intercept)
#-1.386294
log(mean(1 - y) / (1 - mean(1 - y)))
#[1] -1.386294
Alternatively, you can temporarily transform your data by using...transform:
glm( data = transform( data.frame(y=0), y=y+1 ), ... )

Resources