Debugging an AUC error in R (data table provided)

I am trying to calculate an AUC for the following predictions and outcomes using the AUC function, but I keep getting an error. I am supposed to do this by tomorrow. Any help would be much appreciated! Thanks!
structure(list(POD1HemoglobinCut = structure(c(1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 1L
), .Label = c("[10,Inf)", "[0,10)"), class = "factor"), pred = c(0.0044102752927413,
0.00782725095161221, 0.210140717409347, 0.066525545459026, 0.0666137804946143,
0.0125809431305506, 0.0107560804580978, 0.829245110498723, 0.759165998590355,
0.0128042545229042, 0.738354081921031, 0.00287448336844446, 0.0448026818172726,
0.0162243785121634, 0.0687716959646373, 0.0724616690876388, 0.005033110699528,
0.893314696161109, 0.883299551200163, 0.189696433058773)), row.names = c(NA,
-20L), class = c("data.table", "data.frame"))
roc <- roc(test, x = test$pred, class = test$POD1HemoglobinCut)
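As a cross-check that does not depend on any package, the AUC can also be computed from ranks (it equals the normalized Mann-Whitney U statistic). This is only a minimal sketch: rank_auc is a hypothetical helper name and the toy data below is made up for illustration.

```r
# Rank-based AUC, base R only.
# `rank_auc` is a hypothetical helper, not part of the AUC package or any other package.
rank_auc <- function(score, is_positive) {
  stopifnot(length(score) == length(is_positive))
  n_pos <- sum(is_positive)
  n_neg <- sum(!is_positive)
  r <- rank(score)  # average ranks, so ties are handled
  (sum(r[is_positive]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Toy data (made up): two negatives, two positives
score <- c(0.10, 0.40, 0.35, 0.80)
is_positive <- c(FALSE, FALSE, TRUE, TRUE)
toy_auc <- rank_auc(score, is_positive)
toy_auc  # 0.75
```

For the table in the question this would be something like rank_auc(test$pred, test$POD1HemoglobinCut == "[0,10)"), assuming the data is stored in test and that "[0,10)" is the positive class.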

Related

What does "Error in app$vspace(new_style$`margin-top` %||% 0) : attempt to apply non-function" mean?

I want to find out whether liver values (various biochemical parameters) differ from those of other organs, and I want to include each individual's ID in my approach. I sampled the same set of organs from each individual every time, so the measurements are no longer independent.
I have liver, muscle, etc.
I thought about a one-way repeated-measures ANOVA, but I am not very experienced with the different kinds of ANOVA. As far as I understand, this is a within-subject design; please correct me if I am wrong.
I have many samples (n = 2130). Normality seems OK from the Q-Q plot.
This is some data:
structure(list(ID = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L), .Label = c("FB4", "FB5", "FB6", "FB7", "FB8", "FB9",
"KBO16", "KBO21", "KBU10", "KBU11", "KBU12", "KBU15", "RB2",
"RB3", "SR1", "SR2", "SR3", "SR5", "SR6", "SR9", "TG1", "TG3",
"TG4", "TG5", "TG6", "YGL23", "YGL30", "YGL31", "YGL34", "YLG16",
"YLG19", "YLG21", "YLG22", "YLK11"), class = "factor"), sub_group.x = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("bullhead", "salmonid"
), class = "factor"), taxa.x = structure(c(1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L), .Label = c("cottus.gobio", "oncorhynchus.mykiss",
"salmo.trutta"), class = "factor"), combi = c("bullhead.brain",
"bullhead.brain", "bullhead.brain", "bullhead.brain", "bullhead.eyes",
"bullhead.eyes", "bullhead.eyes", "bullhead.eyes", "bullhead.liver",
"bullhead.liver"), sub_group.y = c("bullhead", "bullhead", "bullhead",
"bullhead", "bullhead", "bullhead", "bullhead", "bullhead", "bullhead",
"bullhead"), taxa.y = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L), .Label = c("cottus.gobio", "oncorhynchus.mykiss",
"salmo.trutta"), class = "factor"), PUFA = structure(c(1L, 3L,
4L, 2L, 1L, 3L, 4L, 2L, 1L, 3L), .Label = c("ARA.d13C", "DHA.d13C",
"EPA.d13C", "SDA.d13C"), class = "factor"), organ = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("brain", "eyes",
"liver", "muscle"), class = "factor"), isotopic_value = c(-40.0226662,
-43.1508914, -49.2039419, -44.6943377, -40.0226662, -43.1508914,
-49.2039419, -44.6943377, -40.0226662, -43.1508914)), row.names = c(NA,
10L), class = "data.frame")
I wrote a loop because I want to look at each biochemical parameter (variable "PUFA") separately:
library(rstatix)  # anova_test() comes from rstatix
# run one repeated-measures ANOVA per PUFA level (the 4th level is skipped)
lapply(levels(leber_new$PUFA)[-4], function(x)
  droplevels(na.omit(leber_new[leber_new$PUFA == x, c("ID", "PUFA", "organ", "sub_group.x", "isotopic_value")])) %>%
    anova_test(dv = isotopic_value, wid = ID,
               within = c(organ)))
But it doesn't work:
Error in app$vspace(new_style$`margin-top` %||% 0) :
attempt to apply non-function
Called from: clii__container_start(app, "span", class = funname)
What am I doing wrong?

CORElearn package Isotonic regression calibration

Considering the following script, why does isotonic regression calibration with this package return an all-zero vector after calibration? Any ideas? (I asked the author of the package, but unfortunately he did not answer.)
library("CORElearn")
#Dataset for naive bayesian gaussian classification
(mydata_gauss<-structure(list(Individu = structure(c(2L, 2L, 2L, 2L, 1L, 1L,
1L, 1L), .Label = c("femme", "homme"), class = "factor"), Hauteur = c(182.88,
180.44, 170.07, 180.44, 152.4, 167.64, 165.2, 175.26), Masse = c(81.64,
86.18, 77.11, 74.84, 45.35, 68.03, 58.96, 68.03), Taille_Pied = c(30.48,
27.94, 30.48, 25.4, 15.24, 20.32, 17.78, 22.86)), class = "data.frame", row.names = c(NA,
-8L)))
model_gauss <- CoreModel(Individu ~., mydata_gauss, model="bayes")
#Dataset for naive bayesian binomial classification
(mydata_binomial<-structure(list(Couleur = structure(c(2L, 2L, 2L, 1L, 1L, 1L,
1L, 1L, 2L, 2L, 2L), .Label = c("Jaune", "Rouge"), class = "factor"),
Type = structure(c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L,
2L), .Label = c("Sports", "SUV"), class = "factor"), Origine = structure(c(1L,
1L, 1L, 1L, 2L, 2L, 2L, 1L, 2L, 2L, 1L), .Label = c("Domestique",
"Importé"), class = "factor"), Volé = structure(c(2L, 1L,
2L, 1L, 2L, 1L, 2L, 1L, 1L, 2L, 2L), .Label = c("Non", "Oui"
), class = "factor")), row.names = c(NA, -11L), class = "data.frame"))
model_binomial <- CoreModel(Volé ~., mydata_binomial, model="bayes")
#we display both models
print(model_gauss)
print(model_binomial)
#we predict with both models to check that they agree with manual calculations
(pred_gauss <- predict(model_gauss, mydata_gauss, type="both"))
(pred_binomial <- predict(model_binomial, mydata_binomial, type="both"))
#confusion matrix
modelEval(model_gauss, mydata_gauss$Individu, pred_gauss$class, pred_gauss$prob)$predictionMatrix
modelEval(model_binomial, mydata_binomial$Volé, pred_binomial$class, pred_binomial$prob)$predictionMatrix
#calibration (the stray doubled comma after class1="Oui" has been removed)
(calibration_gauss <- calibrate(mydata_gauss$Individu, pred_gauss$prob[,2], class1="homme",
method="isoReg", assumeProbabilities=TRUE))
(calibration_binomial <- calibrate(mydata_binomial$Volé, pred_binomial$prob[,2], class1="Oui",
method="isoReg", assumeProbabilities=TRUE))
#class1 was never defined as a variable; column 2 is used here to match the calibrate() calls
(calibratedProbs_gauss <- applyCalibration(pred_gauss$prob[,2], calibration_gauss))
(calibratedProbs_binomial <- applyCalibration(pred_binomial$prob[,2], calibration_binomial))
Thanks in advance for your precious help

How to calculate observation per square km from spatial point data in R

How can I calculate the number of observations per square kilometer from point data with latitude and longitude, where each point is one observation, as in the sample data below?
I want to know the total number of values per square km (it can be 1 or more). I tried creating a raster, but had no luck.
den_dt <- structure(list(lat = c(49.0267, 49.0984, 49.1023, 49.107, 49.1077,
49.1107, 49.1178, 49.1278, 49.1493, 49.1634, 49.2385, 49.2498,
49.2834, 49.3235, 49.3467, 49.3796, 49.3878, 51.7285, 51.7319,
51.7524, 51.7781, 51.7841, 51.7851, 51.7926, 51.8188, 51.9553,
52.0331, 52.0342, 52.214, 52.2379, 52.4323, 52.492, 52.5312,
52.5337, 52.5772, 52.6456, 52.656, 52.7196, 52.8439, 52.851),
lon = c(108.9861, 108.9342, 108.8654, 108.73, 109.0154, 108.9548,
108.8164, 108.6334, 108.9442, 108.6959, 119.4774, 119.5568,
117.5601, 117.5536, 119.3105, 119.9174, 117.594, 119.7592,
119.7747, 122.6436, 119.7638, 119.651, 122.6079, 119.7761,
119.7572, 121.9104, 121.8712, 122.4515, 121.7362, 121.7861,
121.9452, 121.9638, 121.471, 122.1595, 121.5216, 122.2008,
121.9462, 121.4331, 122.1229, 122.1054), val = c(1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L)), row.names = c(NA, -40L), class = c("tbl_df",
"tbl", "data.frame"))
den_dt_ras <- rasterFromXYZ(den_dt)
You can do that like this:
1. Create a raster r with the desired resolution and an appropriate coordinate reference system (crs). If you want a spatial resolution of 1 km², you need a planar crs.
2. Project your points p to the crs of the raster.
3. Use rasterize(p, r, fun="count").
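To illustrate the idea behind these steps without any spatial packages, here is a rough base R sketch. Note that it swaps in a simpler technique: instead of a real projection plus rasterize(), it uses a local equirectangular approximation to turn lon/lat into kilometers and then bins the points into roughly 1 km x 1 km cells. count_per_km2 is a hypothetical helper name, the three points are a made-up subset, and the approximation gets worse the larger the study area is, so a proper planar crs is still the better choice for real data.

```r
# Count points per ~1 km x 1 km cell using a local equirectangular approximation.
# `count_per_km2` is a hypothetical helper; this is NOT the raster::rasterize() workflow.
count_per_km2 <- function(lon, lat, cell_km = 1) {
  earth_radius_km <- 6371.0088            # mean Earth radius
  lat0 <- mean(lat) * pi / 180            # reference latitude in radians
  x_km <- earth_radius_km * (lon * pi / 180) * cos(lat0)
  y_km <- earth_radius_km * (lat * pi / 180)
  cells <- data.frame(
    cell_x = floor(x_km / cell_km),       # column index of the grid cell
    cell_y = floor(y_km / cell_km),       # row index of the grid cell
    n = 1L
  )
  # sum the per-point 1s within each occupied cell
  aggregate(n ~ cell_x + cell_y, data = cells, FUN = sum)
}

# Made-up example: two identical points plus one point about 8 km further north
pts <- data.frame(lon = c(108.9861, 108.9861, 108.9342),
                  lat = c(49.0267, 49.0267, 49.0984))
res <- count_per_km2(pts$lon, pts$lat)
res
```

Each row of the result is one occupied cell, and n is the number of observations in that cell, i.e. observations per (approximate) square km.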

awtan method in caret::train fails with Error: 'lpawnb' is not an exported object from 'namespace:bnclassify'

I'm trying to create a predictive model using the awtan method in caret, but training continues to fail with the error:
model fit failed for Resample04: score=bic, smooth=1 Error : 'lpawnb' is not an exported object from 'namespace:bnclassify'
I'm using bnclassify version 0.3.4. Based on the release notes on GitHub, it looks like lpawnb() was replaced by lp() in version 0.3.2, so my initial guess is that the problem is some sort of legacy bug (in caret? in bnclassify?) that's calling the former instead of the latter.
On the other hand, perhaps I'm just doing something wrong. Here's a toy example:
library(caret)
library(dplyr)  # needed for %>% and select() below
data <- structure(list(var1 = structure(c(2L, 2L, 1L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("0", "1"), class = "factor"), var2 = structure(c(2L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 2L, 1L, 2L, 2L, 1L, 1L, 2L), .Label = c("0", "1"), class = "factor"), var3 = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L), .Label = c("0", "1"), class = "factor"), var4 = structure(c(2L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 2L, 1L, 1L, 2L, 1L, 1L, 1L, 1L), .Label = c("0", "1"), class = "factor"), var5 = structure(c(2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("0", "1"), class = "factor"), var6 = structure(c(2L, 2L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L), .Label = c("0", "1"), class = "factor"), var7 = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 2L), .Label = c("0", "1"), class = "factor"), var8 = structure(c(2L, 1L, 2L, 2L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 2L, 2L, 1L, 2L, 2L, 1L, 2L), .Label = c("0", "1"), class = "factor"), outcome = structure(c(2L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L ), .Label = c("0", "1"), class = "factor")), .Names = c("var1", "var2", "var3", "var4", "var5", "var6", "var7", "var8", "outcome" ), row.names = c(NA, 20L), class = "data.frame")
model <- train(x = data %>% select(-outcome), y = data$outcome, method = 'awtan')
# Eventually dies with the following errors
#
# Something is wrong; all the Accuracy metric values are missing:
# Accuracy Kappa
# Min. : NA Min. : NA
# 1st Qu.: NA 1st Qu.: NA
# Median : NA Median : NA
# Mean :NaN Mean :NaN
# 3rd Qu.: NA 3rd Qu.: NA
# Max. : NA Max. : NA
# NA's :6 NA's :6
# Error: Stopping
# In addition: There were 50 or more warnings (use warnings() to see the first 50)
#
#
# > warnings()
# Warning messages:
# 1: model fit failed for Resample01: score=loglik, smooth=1 Error : 'lpawnb' is not an exported object from 'namespace:bnclassify'
#
# 2: model fit failed for Resample01: score=bic, smooth=1 Error : 'lpawnb' is not an exported object from 'namespace:bnclassify'
#
# 3: model fit failed for Resample01: score=aic, smooth=1 Error : 'lpawnb' is not an exported object from 'namespace:bnclassify'
Using the same data, I'm able to build a model using bnclassify's functions, so I'm guessing it's a bug in caret, which calls lpawnb() when the appropriate function is bnclassify::lp(), but again, I'm unclear as to how to confirm this.
Can anyone shed any light on what I might be doing wrong (before I blame package developers much smarter than I am)?
It appears that the current version of caret on CRAN, 6.0-78, defines the fit function for awtan models using bnclassify::lpawnb(), while bnclassify has replaced it with lp():
getModelInfo('awtan')
...
$awtan$fit
function (x, y, wts, param, lev, last, classProbs, ...)
{
    dat <- if (is.data.frame(x))
        x
    else as.data.frame(x)
    dat$.outcome <- y
    struct <- bnclassify::tan_cl(class = ".outcome", dataset = dat,
        score = as.character(param$score))
    bnclassify::lpawnb(struct, dat, smooth = param$smooth, trees = 10,
        bootstrap_size = 0.5, ...)
}
...
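One way to confirm which side is out of date is to ask R directly whether a name is exported by an installed package. The sketch below uses base R's getNamespaceExports(); is_exported is a hypothetical helper name, and the bnclassify check only runs if that package is installed, so no output is claimed for it here.

```r
# Check whether a function name is exported from a package namespace.
# `is_exported` is a hypothetical helper built on base R's getNamespaceExports().
is_exported <- function(fun_name, pkg) {
  fun_name %in% getNamespaceExports(pkg)
}

# Sanity check against a package that ships with R
is_exported("median", "stats")

# Only runs when bnclassify is installed; the result depends on the installed version
if (requireNamespace("bnclassify", quietly = TRUE)) {
  print(is_exported("lpawnb", "bnclassify"))
  print(is_exported("lp", "bnclassify"))
}
```

If lpawnb comes back FALSE while the fit function shown above still calls bnclassify::lpawnb(), that is consistent with the model definition in caret lagging behind bnclassify.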

chi squared and basic statistics on multiple columns of a data frame

I would like to compute a chi-squared test for each column in a data frame, grouped by the variable Project.
Basically, I would like to compute a two-by-two table for each column and then store the resulting value in a new table.
Here is an example of my data frame:
structure(list(Project = structure(c(1L, 1L, 1L, 2L, 2L, 2L), .Label = c("discovery", "validation"), class = "factor"), MLL = c(1L, 1L, 1L, 1L, 1L, 1L), CREB = c(0L, 1L, 1L, 1L, 1L, 0L), TNR = c(1L, 1L, 0L, 0L, 1L, 1L)), .Names = c("Project", "MLL", "CREB", "TNR"), row.names = c(1L, 2L, 3L, 300L, 301L, 302L), class = "data.frame")
After Jaap's comment I tried:
pvalue <- data.frame(apply(cast_subset[-1] , 2 , function(i) chisq.test(table(cast_subset$Project , i ))$p.value))
colnames(pvalue) <- "p.value"
but I cannot access the column with the gene names, which I need for merging with another data set.
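One possible way to keep the gene names, sketched under the assumption that the example data above is stored in cast_subset: sapply() over the columns returns a named vector, and the names can be turned into a regular column (called gene here, a name chosen just for illustration) instead of staying as row names.

```r
# Example data from the question, assumed to be stored in `cast_subset`
cast_subset <- data.frame(
  Project = factor(c("discovery", "discovery", "discovery",
                     "validation", "validation", "validation")),
  MLL  = c(1L, 1L, 1L, 1L, 1L, 1L),
  CREB = c(0L, 1L, 1L, 1L, 1L, 0L),
  TNR  = c(1L, 1L, 0L, 0L, 1L, 1L)
)

# One chi-squared test per gene column; sapply() keeps the column names.
# suppressWarnings() hides the small-count approximation warnings of this tiny example.
p <- sapply(cast_subset[-1], function(col) {
  suppressWarnings(chisq.test(table(cast_subset$Project, col))$p.value)
})

# Store the gene names in their own column so the table can be merged later
pvalue <- data.frame(gene = names(p), p.value = unname(p), row.names = NULL)
pvalue
```

After that, something like merge(pvalue, other_data, by = "gene") should work, where other_data stands for whatever data set you want to join and is assumed to have a matching gene column.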
