Problems with using plotCalibration() from the PredictABEL package in R

I’ve been having some trouble with the plotCalibration() function. I have managed to get it to work before, but recently, while working with another dataset (here is a link to the .Rda data file), I have been unable to shake off an error message that keeps cropping up:
> plotCalibration(data = data, cOutcome = 2, predRisk = data$sortmort)
Error in plotCalibration(data = data, cOutcome = 2, predRisk = data$sortmort) : The specified outcome is not a binary variable.
When I’ve tried to set the cOutcome column to factors or to logical, it still doesn’t work.
I’ve looked at the source of the function and the only time the error message comes up is in the first if()else{} statement:
if (length(unique(y))!=2) {stop(" The specified outcome is not a binary variable.\n")}
else{
But I have checked that length(unique(y)) is indeed 2, so I don’t understand why the error message still crops up!

Be sure you're passing a data frame to plotCalibration(). Passing a dplyr tibble can cause this error; converting with the normal as.data.frame() worked for me.
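A minimal sketch of that fix, assuming your dataset is a tibble called data with the outcome in column 2 and the predicted risks in a sortmort column (as in the question):

```r
library(PredictABEL)

# Tibbles keep their own class through subsetting, which can confuse
# functions written for plain data frames; drop back to a data.frame
# before calling plotCalibration().
data <- as.data.frame(data)
plotCalibration(data = data, cOutcome = 2, predRisk = data$sortmort)
```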

Using the data you sent earlier, I do not see any error, though. The following output was produced along with a calibration plot:
> library(PredictABEL)
> plotCalibration(data = data, cOutcome = 2, predRisk = data$sortmort)
$Table_HLtest
total meanpred meanobs predicted observed
[0.000632,0.00129) 340 0.001 0.000 0.31 0
0.001287 198 0.001 0.000 0.25 0
[0.001374,0.00201) 283 0.002 0.004 0.53 1
0.002009 310 0.002 0.000 0.62 0
[0.002505,0.00409) 154 0.003 0.000 0.52 0
[0.004086,0.00793) 251 0.006 0.000 1.42 0
[0.007931,0.00998) 116 0.008 0.009 0.96 1
[0.009981,0.19545] 181 0.024 0.011 4.40 2
$Chi_square
[1] 4.906
$df
[1] 8
$p_value
[1] 0.7676

Please run table(data[,2], useNA = "ifany") to see the number of levels of the outcome variable in your dataset.
plotCalibration() will only execute when the outcome is a binary variable (two levels).
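A small illustration (with a made-up outcome vector) of why the check can fail even when the outcome looks binary: unique() counts NA as an extra value, so a single missing outcome is enough to trip the length(unique(y)) != 2 test.

```r
outcome <- c(0, 1, 0, 0, 1, NA)

table(outcome, useNA = "ifany")  # counts for 0, 1 and NA
length(unique(outcome))          # 3, not 2 -- NA counts as a value
```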

Related

Matrix error when trying to calculate qv in R

So I have tried fitting a Plackett-Luce model to my matrix. Everything seems to be working fine until I try to calculate the quasi-variances of the model, when I receive the following error message:
Error in X %*% as.vector(coefs) : Cholmod error 'X and/or Y have wrong dimensions' at file ../MatrixOps/cholmod_sdmult.c, line 88.
R <- as.rankings(ordered.matrix, input="orderings")
mod <- PlackettLuce(R, npseudo = 0.5, as.grouped_rankings=TRUE)
avRank <- apply(R, 2, function(x) mean(x[x > 0]))
coefs <- round(coef(mod)[order(avRank)], 2)
coefs
131 6 3 9 10 208 5 15 1 209
1.32 0.82 0.48 0.51 0.48 0.23 0.21 0.42 0.00 -0.37
qv <- qvcalc(mod)
Error in X %*% as.vector(coefs) :
Cholmod error 'X and/or Y have wrong dimensions' at file ../MatrixOps/cholmod_sdmult.c, line 88
Does anyone know what the problem might be? Cheers

if/else test if ANY row in a column is greater than an absolute value (R studio)

I'm trying to test if any value/row from the column cooksd is greater than absolute 1.
if (any(lev$cooksd) > abs(-1)) {
print('yey')
} else {
print('no')
}
Warning message:
In any(lev$cooksd) : coercing argument of type 'double' to logical
but I don't think the any() function is doing the job...Any thoughts on this?
Thanks in advance!
Note: the data looks like this (first two rows):
cooksd mdffits covtrace covratio leverage.o~1
<dbl> <dbl> <dbl> <dbl> <dbl>
0.0496 0.0478 0.0368 1.04 0.395
0.0261 0.0251 0.0371 1.04 0.395
edited after comment:
if (any(abs(lev$cooksd) > 1)) {
print('yey')
} else {
print('no')
}
It is simpler, more efficient and more informative to compute the maximum of the column values:
max(lev$cooksd) > 1
As well as showing you what the maximum value is, which might be informative, this approach is 3 to 4 times as fast as using any():
> x <- runif(1e8)
> system.time(any(x > 1))
user system elapsed
0.47 0.01 0.49
> system.time(max(x) > 1)
user system elapsed
0.14 0.00 0.14
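One caveat worth noting, shown with a small made-up vector: both approaches propagate NA, so an if () built on them will fail on missing values unless you handle them explicitly.

```r
x <- c(0.0496, 0.0261, NA)

max(x) > 1                 # NA -- not usable inside if ()
max(x, na.rm = TRUE) > 1   # FALSE
any(x > 1, na.rm = TRUE)   # FALSE
```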

Smooth.Pspline yields the same results with different spar values

I am trying to determine the best value of spar to implement across a dataset by reducing the root mean square error between test and training replicates on the same raw data. My test and training replicates look like this:
Traindataset
t    = -0.008, -0.006, -0.004, -0.002, 0, 0.002, 0.004, 0.006, 0.008,
       0.01, 0.012, 0.014, 0.016, 0.018, 0.02, 0.022, 0.024
dist = NA, 0, 0, 0, 0, 0.000165038, 0.000686934, 0.001168098,
       0.001928885, 0.003147262, 0.004054971, 0.005605361, 0.007192645,
       0.009504648, 0.011498809, 0.013013655, 0.01342625
Testdataset
t    = -0.008, -0.006, -0.004, -0.002, 0, 0.002, 0.004, 0.006, 0.008,
       0.01, 0.012, 0.014, 0.016, 0.018, 0.02, 0.022, 0.024
dist = NA, 0, 0, 0, 0, 0, 0.000481184, 0.001306409, 0.002590156,
       0.003328259, 0.004429246, 0.005012768, 0.005829698, 0.006567801,
       0.008030102, 0.009617453, 0.011202827
I need the spline to be 5th order so I can accurately predict the 3rd derivative, so I am using smooth.Pspline (from the pspline package) instead of the more common smooth.spline. I attempted using a variant of the solution outlined here (using root mean squared error of predicting testdataset from traindataset instead of cross validation sum of squares within one dataset). My code looks like this:
RMSE <- function(m, o){
sqrt(mean((m - o)^2))
}
Psplinermse <- function(spar){
trainmod <- smooth.Pspline(traindataset$t, traindataset$dist, norder = 5,
spar = spar)
testpreddist <- predict(trainmod,testdataset$t)[,1]
RMSE(testpreddist, testdataset$dist)
}
spars <- seq(0, 1, by = 0.001)
rmsevals <- rep(0, length(spars))
for (i in 1:length(spars)){
rmsevals[i] <- Psplinermse(spars[i])
}
plot(spars, rmsevals, 'l', xlab = 'spar', ylab = 'RMSE' )
The issue I am having is that for pspline, the values of RMSE are the same for any spar above 0 (graph of spar vs RMSE). When I dug into just the prediction line of code, I realized I am getting exactly the same predicted values of dist for any spar above 0. Any ideas on why this might be are greatly appreciated.

Apply log and log1p to several columns with an if condition

I have a dataframe and I need to calculate log() for all numbers greater than 0 and log1p() for numbers equal to 0. My dataframe is called tcPainelLog and it looks like this (str() of columns 6:8):
$ IDD: num 0.04 0.06 0.07 0.72 0.52 ...
$ Soil: num 0.25 0.22 0.16 0.00 0.00 ...
$ QAI: num 0.00 0.50 0.00 0.71 0.26 ...
Therefore, I guess I need to combine an ifelse() statement with the log and log1p functions. However, I have tried several different ways to do it, but none has succeeded. For instance:
tcPainelLog <- tcPainel
cols <- names(tcPainelLog[,6:17]) # These are the columns I need to calculate
tcPainelLog$IDD <- ifelse(X = tcPainelLog$IDD>0, log(X), log1p(X))
tcPainelLog[cols] <- lapply(tcPainelLog[cols], function(x) ifelse((x > 0), log(x), log1p(x)))
tcPainelLog[cols] <- if(tcPainelLog[,cols] > 0) log(.) else log1p(.)
I haven't been able to get this to work and I would appreciate any help. I am really sorry if there is already an explanation for this; I searched with many different terms but didn't find one.
Best regards.
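For what it's worth, the lapply()/ifelse() idea in the question is sound; here is a self-contained sketch, with a made-up three-column data frame standing in for columns 6:17 of the real data:

```r
# Toy data standing in for tcPainelLog[, 6:17]
tcPainelLog <- data.frame(IDD  = c(0.04, 0.06, 0.72),
                          Soil = c(0.25, 0.00, 0.16),
                          QAI  = c(0.00, 0.50, 0.26))
cols <- names(tcPainelLog)  # in the real data: names(tcPainelLog)[6:17]

# log() for positive entries; log1p() (i.e. log(1 + x), which is 0 at
# x = 0) for the zeros:
tcPainelLog[cols] <- lapply(tcPainelLog[cols],
                            function(x) ifelse(x > 0, log(x), log1p(x)))
```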

Add all elements in matrix R

I am trying to add all the elements in a matrix. This is an example of my matrix (the actual matrix is bigger):
m = matrix(c(528,479,538,603),nrow=2,ncol=2)
m
A B
male 528 538
female 479 603
I am trying to do:
sum.elements = colSums(colSums(m))
but it gives the following error:
Error in colSums(colSums(m)) : 'x' must be an array of at least two
dimensions
I have tried doing:
x = colSums(m)
sum.elements = x[1] + x[2]
but this would be very long when you have a 100-column matrix...
Any help would be greatly appreciated!
You can use sum(), which also has the na.rm option to remove NA values.
sum(m)
#[1] 2148
In general, sum() works for vectors, matrices and data frames.
Benchmarks
set.seed(24)
m1 <- matrix(sample(0:20, 5000*5000, replace=TRUE), ncol=5000)
system.time(sum(m1))
# user system elapsed
# 0.027 0.000 0.026
system.time(sum(colSums(m1)))
# user system elapsed
# 0.027 0.000 0.027
system.time(Reduce('+', m1))
# user system elapsed
#25.977 0.644 26.673
Reduce will also work, though (as the benchmark above shows) it is far slower:
Reduce(`+`,m)
[1] 2148
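A self-contained version of the example, adding the dimnames the printed matrix shows, plus the na.rm behaviour mentioned above:

```r
m <- matrix(c(528, 479, 538, 603), nrow = 2, ncol = 2,
            dimnames = list(c("male", "female"), c("A", "B")))

sum(m)                # 2148 -- adds every element of the matrix
m["male", "A"] <- NA
sum(m, na.rm = TRUE)  # 1620 -- NAs dropped before summing
```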
