twang - Error in Di - crossprod(WX[index, ], X[index, ]) : non-conformable arrays - propensity-score-matching

I'm trying to build propensity scores with the twang package, but I keep getting this error:
Error in Di - crossprod(WX[index, ], X[index, ]) : non-conformable arrays
I'm attaching the code:
ps.TPSV.gbm = ps(Cardioversione ~ Sesso+ age,
data = prova)
> ps.TPSV.gbm = ps(Cardioversione ~ Sesso+ age,
+ data = prova)
Fitting boosted model
Iter TrainDeviance ValidDeviance StepSize Improve
1 0.6590 nan 0.0100 nan
2 0.6581 nan 0.0100 nan
3 0.6572 nan 0.0100 nan
4 0.6564 nan 0.0100 nan
5 0.6556 nan 0.0100 nan
6 0.6548 nan 0.0100 nan
7 0.6540 nan 0.0100 nan
8 0.6533 nan 0.0100 nan
9 0.6526 nan 0.0100 nan
...
9900 0.4164 nan 0.0100 nan
9920 0.4161 nan 0.0100 nan
9940 0.4160 nan 0.0100 nan
9960 0.4158 nan 0.0100 nan
9980 0.4157 nan 0.0100 nan
10000 0.4155 nan 0.0100 nan
Diagnosis of unweighted analysis
Error in Di - crossprod(WX[index, ], X[index, ]) : non-conformable arrays
I honestly don't understand what the problem is: one variable is a factor (Sesso), the other is numeric (age), and there are no missing values. Could anyone help me?
Thank you in advance
I've already tried changing the variables included in the propensity score model, but the error persists. I also checked the example code with the lalonde dataset included in twang, and it works well.

Related

Object p not found when running gbm()

I am aware of the question GBM: Object 'p' not found; however, it did not contain sufficient information to be answered. I don't believe this is a duplicate: I've followed what was indicated in that question and in the linked duplicate, Error in R gbm function when cv.folds > 0, which does not describe the same error.
I have been sure to follow the recommendation of leaving out any columns that were not used in the model.
This error appears when the cv.folds is greater than 0:
object 'p' not found
From what I can see, setting cv.folds to 0 does not produce meaningful output. I have tried different distributions, fractions, numbers of trees, etc. I'm confident I've parameterized something incorrectly, but I can't for the life of me see what it is.
Model and output:
model_output <- gbm(formula = ign ~ . ,
distribution = "bernoulli",
var.monotone = rep(0,9),
data = model_sample,
train.fraction = 0.50,
n.cores = 1,
n.trees = 150,
cv.folds = 1,
keep.data = T,
verbose=T)
Iter TrainDeviance ValidDeviance StepSize Improve
1 nan nan 0.1000 nan
2 nan nan 0.1000 nan
3 nan nan 0.1000 nan
4 nan nan 0.1000 nan
5 nan nan 0.1000 nan
6 nan nan 0.1000 nan
7 nan nan 0.1000 nan
8 nan nan 0.1000 nan
9 nan nan 0.1000 nan
10 nan nan 0.1000 nan
20 nan nan 0.1000 nan
40 nan nan 0.1000 nan
60 nan nan 0.1000 nan
80 nan nan 0.1000 nan
100 nan nan 0.1000 nan
120 nan nan 0.1000 nan
140 nan nan 0.1000 nan
150 nan nan 0.1000 nan
The minimal data to reproduce the error used to be here; however, once the suggestion by @StupidWolf is applied, that data set is too small. The suggestion below will get past the initial error. Subsequent errors are occurring, and solutions will be posted here upon discovery.
It's not meant to deal with the situation where someone sets cv.folds = 1. By definition, k-fold cross-validation means splitting the data into k parts, training on k - 1 of them and testing on the remaining one, so 1-fold cross-validation is not well defined. If you look at the code for gbm, at line 437:
if(cv.folds > 1) {
cv.results <- gbmCrossVal(cv.folds = cv.folds, nTrain = nTrain,
....
p <- cv.results$predictions
}
It makes the predictions and when it collects the results into gbm, line 471:
if (cv.folds > 0) {
gbm.obj$cv.fitted <- p
}
So if cv.folds ==1, p is not calculated, but it is > 0 hence you get the error.
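The control flow described above can be reproduced without gbm. The following is a minimal sketch in base R; collect_cv is a hypothetical stand-in, not gbm's actual source, and the placeholder predictions are made up for illustration.

```r
# Simplified stand-in for the two conditionals quoted above (not gbm's real code).
collect_cv <- function(cv.folds) {
  obj <- list()
  if (cv.folds > 1) {
    p <- seq_len(cv.folds)   # placeholder for the cross-validated predictions
  }
  if (cv.folds > 0) {
    obj$cv.fitted <- p       # with cv.folds == 1, p was never assigned
  }
  obj
}

# cv.folds = 1 enters the second branch only, so looking up p fails.
result <- tryCatch(collect_cv(1), error = function(e) e)
print(conditionMessage(result))

# cv.folds = 2 assigns p first, so the second branch succeeds.
ok <- collect_cv(2)
```

The sketch assumes no object named p exists in the global environment; if one did, R would find it there and the failure would be masked.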
Below is a reproducible example:
library(MASS)
library(gbm)
test = Pima.tr
test$type = as.numeric(test$type)-1
model_output <- gbm(type~ . ,
distribution = "bernoulli",
var.monotone = rep(0,7),
data = test,
train.fraction = 0.5,
n.cores = 1,
n.trees = 30,
cv.folds = 1,
keep.data = TRUE,
verbose=TRUE)
This gives me the error: object 'p' not found
Set it to cv.folds = 2, and it runs smoothly....
model_output <- gbm(type~ . ,
distribution = "bernoulli",
var.monotone = rep(0,7),
data = test,
train.fraction = 0.5,
n.cores = 1,
n.trees = 30,
cv.folds = 2,
keep.data = TRUE,
verbose=TRUE)

How to stop printing for "ps" function in "twang" package?

The "ps" function (propensity score estimation) in "twang" package in R keeps printing its report. How can I turn that off?
I already tried to set the "print.level" argument to be 0. But it is not working for me.
library(twang)
D = rbinom(100, size = 1, prob = 0.5)
X1 = rnorm(100)
X2 = rnorm(100)
ps(D ~ ., data = data.frame(D, X1, X2), stop.method = 'es.mean',
estimand = "ATE", print.level = 0)
I hope there is no printing of the process, but it keeps giving me something like:
Fitting gbm model
Iter TrainDeviance ValidDeviance StepSize Improve
1 1.3040 nan 0.0100 nan
2 1.3012 nan 0.0100 nan
3 1.2985 nan 0.0100 nan
4 1.2959 nan 0.0100 nan
5 1.2932 nan 0.0100 nan
6 1.2907 nan 0.0100 nan
7 1.2880 nan 0.0100 nan
8 1.2855 nan 0.0100 nan
9 1.2830 nan 0.0100 nan
10 1.2804 nan 0.0100 nan
20 1.2562 nan 0.0100 nan
.....
which is annoying.
Presumably you want to capture the result in a variable; if you combine that with the verbose = FALSE parameter, it should do what you need:
res <- ps(D ~ ., data = data.frame(D, X1, X2), stop.method = 'es.mean',
estimand = "ATE", print.level = 0, verbose = FALSE)
I haven't tested whether you still need print.level = 0.
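If a function offers no switch for its progress output, a generic base R fallback is to capture whatever it writes to standard output while still assigning the result. This is only a sketch: noisy_fit is a hypothetical stand-in for a chatty fitting function, and I have not checked this approach against ps itself.

```r
# Hypothetical stand-in for a fitting function that prints progress lines.
noisy_fit <- function(n_iter = 3) {
  for (i in seq_len(n_iter)) cat("Iter", i, "\n")
  list(n_iter = n_iter)
}

# The assignment happens in the calling environment; the printed lines
# end up in log_lines instead of on the console.
log_lines <- utils::capture.output(res <- noisy_fit())
```

Keeping log_lines around rather than discarding it means the progress log is still available if the fit needs debugging later.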

How to suppress iteration output from Boosted tree model gbm in Caret from R studio

If I run this code to train a gbm model with knitr, I receive several pages of Iter output like the lines copied below. Is there a way to suppress this output?
mod_gbm <- train(classe ~ ., data = TrainSet, method = "gbm")
## Iter TrainDeviance ValidDeviance StepSize Improve
## 1 1.6094 nan 0.1000 0.1322
## 2 1.5210 nan 0.1000 0.0936
## 3 1.4608 nan 0.1000 0.0672
## 4 1.4165 nan 0.1000 0.0561
## 5 1.3793 nan 0.1000 0.0441
Thank you!
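Try passing train the argument verbose = FALSE; train forwards it to gbm, which is what prints the iteration log. (trace = FALSE is the equivalent switch for models such as nnet, not for gbm.)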
This is a parameter not defined in the train documentation explicitly as it is part of the ... optional parameters.
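How an argument travels through ... can be sketched in base R. Both functions below are hypothetical stand-ins (outer_train for a wrapper like train, inner_fit for the underlying fitting routine); they only illustrate the forwarding mechanism, not caret's real implementation.

```r
# Hypothetical fitting routine with its own verbosity switch.
inner_fit <- function(x, verbose = TRUE) {
  if (verbose) cat("Iter 1 ...\n")
  mean(x)
}

# Hypothetical wrapper: it does not name 'verbose' in its own signature,
# but anything supplied via ... is forwarded untouched.
outer_train <- function(x, ...) {
  inner_fit(x, ...)
}

# With verbose = FALSE forwarded, nothing is printed.
quiet <- utils::capture.output(val <- outer_train(c(1, 2, 3), verbose = FALSE))
```

This is why such switches do not show up in the wrapper's own argument list: they are documented with the function that finally receives them.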

lapply and passing arguments

I'm trying to learn how to effectively use the apply family in R. I have the following numeric vector
>aa
[1] 0.047619 0.000000 NaN 0.000000 0.000000 0.000000 0.000000 0.000000
[9] NaN NaN 0.000000 NaN NaN NaN NaN NaN
[17] 0.000000 0.000000 NaN NaN NaN NaN NaN NaN
[25] NaN 0.100000 0.000000 0.000000 0.000000 0.000000 1.000000 NaN
[33] NaN NaN NaN NaN NaN NaN 0.133333 NaN
[41] NaN 0.000000 0.000000 0.000000 NaN NaN NaN NaN
[49] NaN
and I'm trying to get the n factor out of pwr.t.test with each of these as an input to the d argument.
My attempt(s) have yielded this as the latest result, and frankly, I'm stumped...
> lapply(aa,function(x) pwr.t.test(d=x,power=.8,sig.level=.05,type="one.sample",alternative="two.sided"))
with the following error message:
Error in uniroot(function(n) eval(p.body) - power, c(2 + 1e-10, 1e+07)) :
f() values at end points not of opposite sign
Any ideas on the right way to do this?
Short answer: The number of subjects needed is greater than the maximum that R will check for. Add some checks so that you don't run the function when d == 0 and it will work.
When d = 0, you need an infinite number of subjects to detect the difference. The error you are seeing is because R tries to calculate power numerically. The algorithm R uses first checks the bounds of the interval over which the possible values for N lie (about 2 to 1e+07). Because the function for power has the same sign at both endpoints of the interval and is monotonic in N, R throws an error saying that the root (the value of N you are looking for) cannot be found.
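One way to add such a check is to return NA for any effect size the solver cannot handle. The sketch below makes two assumptions: it uses stats::power.t.test from base R as a stand-in for pwr.t.test (with sd = 1, delta plays the role of d), and it also skips NaN entries, since the vector in the question contains many of them. n_for_d and d_values are illustrative names.

```r
# A few effect sizes in the spirit of the question, including 0 and NaN.
d_values <- c(0.047619, 0, NaN, 0.1, 1)

# Skip the root search when d is zero or not finite; otherwise solve for n.
n_for_d <- function(d) {
  if (!is.finite(d) || d == 0) return(NA_real_)
  stats::power.t.test(delta = d, sd = 1, power = 0.8, sig.level = 0.05,
                      type = "one.sample", alternative = "two.sided")$n
}

# vapply returns a plain numeric vector, one sample size per effect size.
n_needed <- vapply(d_values, n_for_d, numeric(1))
```

Using vapply instead of lapply here is a design choice: it enforces that every call returns exactly one number, so the result lines up with the input vector.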

How to deal with NaN in R?

I have two binary files with the same dimensions (corr and rmse). I want to do this:
replace every pixel in rmse with NA wherever corr is NA.
file1:
conne <- file("D:\\omplete.bin","rb")
corr<- readBin(conne, numeric(), size=4, n=1440*720, signed=TRUE)
file2:
rms <- file("D:\\hgmplete.bin","rb")
rmse<- readBin(rms, numeric(), size=4, n=1440*720, signed=TRUE)
I did this:
rmse[corr==NA]=NA
That did not do anything, so I tried this:
rmse[corr==NaN]=NA
That did not do anything either! Can anybody help me with this?
Head of the file corr:
> corr
[1] NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
You need to use the logical test is.nan(). In this case:
rmse[is.nan(corr)]=NA
should do the trick
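The reason the == attempts fail is that any comparison with NA or NaN returns NA rather than TRUE or FALSE, so no element is ever selected. A small sketch with made-up values (not the contents of the real files):

```r
# Illustrative stand-ins for the two vectors read from the binary files.
corr <- c(0.5, NaN, 0.2, NaN)
rmse <- c(1, 2, 3, 4)

# Comparing with NaN (or NA) yields NA for every element, never TRUE.
cmp <- corr == NaN

# is.nan() returns a proper TRUE/FALSE mask that can be used for indexing.
rmse[is.nan(corr)] <- NA
```

Note that is.na() is TRUE for both NA and NaN, so it is the broader test if the file may contain either.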
