Optimum cut-off values - r

I have both univariate and multivariate logistic regression models and I want to find cut-off values with their respective sensitivity and specificity. I want to choose the best cut-off values for both my univariate and multivariate models.
I tried the following code for the univariate models but I am getting the sensitivity and specificity values in decimals. Is there any other way I can get the cut-off values as whole numbers rather than rounding up to the nearest integer? I am also not sure how to use the same code to get the cut-off values of the multivariate model.
Thank you for any help in advance!!
###Cut off values of Var1
library(pROC)
ok <- multiclass.roc(DATA$Outcome, DATA$Var1)
class_1 <- ok$rocs[[1]]
wants <- cbind(sensitivity = class_1$sensitivities, specificity = class_1$specificities, cutt_off = class_1$thresholds)
wants
I am getting the values:
sensitivity specificity cutt_off
[1,] 1.00000 0.000000 Inf
[2,] 1.00000 0.012346 73.500
[3,] 1.00000 0.024691 72.500
[4,] 1.00000 0.049383 71.500
[5,] 1.00000 0.061728 70.500
[6,] 1.00000 0.135802 69.500
[7,] 1.00000 0.172840 68.500
[8,] 0.94118 0.222222 67.500
[9,] 0.88235 0.283951 66.500
[10,] 0.88235 0.320988 65.750
[11,] 0.88235 0.333333 65.250
[12,] 0.88235 0.432099 64.500
[13,] 0.88235 0.506173 63.500
[14,] 0.82353 0.617284 62.500
[15,] 0.82353 0.629630 61.750
[16,] 0.76471 0.629630 61.250
[17,] 0.76471 0.691358 60.500
[18,] 0.70588 0.753086 59.750
[19,] 0.70588 0.777778 59.250
[20,] 0.70588 0.814815 58.500
[21,] 0.64706 0.827160 57.500
[22,] 0.64706 0.876543 56.500
[23,] 0.64706 0.901235 55.250
[24,] 0.58824 0.913580 54.250
[25,] 0.58824 0.938272 53.900
[26,] 0.52941 0.938272 53.400
[27,] 0.41176 0.938272 52.500
[28,] 0.35294 0.950617 51.835
[29,] 0.29412 0.950617 50.835
[30,] 0.29412 0.962963 49.000
[31,] 0.23529 0.975309 47.500
[32,] 0.17647 0.975309 46.000
[33,] 0.11765 0.987654 44.500
[34,] 0.00000 0.987654 42.500
[35,] 0.00000 1.000000 -Inf
To determine the cut-off values for the multivariate model, I tried the following code but I am getting errors. Also, my model comprises both continuous and categorical variables. Var1, Var2 and Var3 are continuous variables and Var4 is categorical, recoded as 0, 1 and 2.
library(pROC)
ok <- multiclass.roc(DATA$Outcome, DATA$var1 + DATA$Var2 + DATA$Var3 + DATA$Var4)
class_1 <- ok$rocs[[1]]
wants <- cbind(sensitivity = class_1$sensitivities, specificity = class_1$specificities, cutt_off = class_1$thresholds)
wants
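One possible approach for the multivariate case, as a rough sketch: fit the logistic model with glm() and treat its fitted probabilities as a single score, then scan cut-offs on that score. The data below are simulated stand-ins (only the column names come from the question), and picking the "best" cut-off by Youden's J is an assumption, since the question does not say which criterion is wanted. The sketch uses base R only rather than pROC. Note that sensitivity and specificity are proportions, so they will always be decimals between 0 and 1.

```r
# Simulated stand-in for DATA; column names mirror the question, values are made up.
set.seed(1)
n <- 200
DATA <- data.frame(
  Var1 = rnorm(n, 60, 6),
  Var2 = rnorm(n),
  Var3 = rnorm(n),
  Var4 = factor(sample(0:2, n, replace = TRUE))
)
DATA$Outcome <- rbinom(n, 1, plogis(-0.15 * (DATA$Var1 - 60) + 0.8 * DATA$Var2))

# Multivariable logistic model; its fitted probabilities act as one combined score.
fit <- glm(Outcome ~ Var1 + Var2 + Var3 + Var4, data = DATA, family = binomial)
score <- unname(fitted(fit))

# Sensitivity and specificity when "score >= cut" is called positive.
cuts <- sort(unique(score))
sens <- sapply(cuts, function(k) mean(score[DATA$Outcome == 1] >= k))
spec <- sapply(cuts, function(k) mean(score[DATA$Outcome == 0] <  k))

# Youden's J = sensitivity + specificity - 1; keep the cut-off that maximises it.
best <- which.max(sens + spec - 1)
result <- c(cut_off = cuts[best], sensitivity = sens[best], specificity = spec[best])
result
```

Here the cut-off is on the predicted-probability scale, not on the scale of any single variable. The same scan works for a univariate model by using DATA$Var1 itself as the score.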

Related

How to generate random numbers with normal distribution and uniform distribution

I am a newbie in R. I want to create a matrix by drawing 20 random numbers from each of three uniform distributions, U(0.6,0.8), U(0.0001,0.0003) and U(100,110), and placing them in the first three columns, one column per distribution. Then I want to draw 20 random numbers from each of two normal distributions, N(7750,0.01) and N(12,0.4), and place them in the last two columns. My program is below, but it only produces uniformly distributed numbers in every column; I cannot get the first three columns to be uniform and the last two to be normal. How can I change it?
input <-5 # variable input
xinput <- 20 #sampling number
range <- matrix(c(0.60,0.80,
0.0001,0.0003,
100,110,
7700,8000,
10,15
),nrow=input,ncol=2,byrow=TRUE)
range
rangeresult <- matrix(0, nrow=xinput, ncol=input)# empty matrix for latter data
rangeresult
##uniform distribution
for (i in 1:input){
set.seed(456+i) # make results reproducible
rangeresult[,i] <- runif(xinput,range[i,1],range[i,2])
}
Perhaps try this
cbind(
u1 = runif(20L, 0.6, 0.8),
u2 = runif(20L, 0.0001, 0.0003),
u3 = runif(20L, 100, 110),
n1 = rnorm(20L, 7750, 0.01),
n2 = rnorm(20L, 12, 0.4)
)
Output
u1 u2 u3 n1 n2
[1,] 0.7558480 0.0002851074 101.7209 7749.988 11.75270
[2,] 0.7807589 0.0002600877 104.9278 7749.998 11.67970
[3,] 0.7480385 0.0001562960 109.5744 7749.979 11.84603
[4,] 0.6283492 0.0001408027 108.9455 7749.999 12.00459
[5,] 0.7666862 0.0002485003 106.4735 7750.002 12.58783
[6,] 0.6354397 0.0001042544 107.0999 7749.982 12.36555
[7,] 0.7340912 0.0002507386 109.7052 7749.994 11.75111
[8,] 0.7220797 0.0001173221 105.7116 7749.995 11.35322
[9,] 0.6956138 0.0001478050 104.6444 7750.004 11.68879
[10,] 0.6146491 0.0001238944 108.5946 7750.006 12.78417
[11,] 0.7436676 0.0002492057 107.6073 7750.003 11.80814
[12,] 0.7916866 0.0001927277 100.1949 7750.016 12.16362
[13,] 0.7701075 0.0002236796 103.9207 7750.007 11.82555
[14,] 0.7151522 0.0001528767 101.0997 7749.996 11.75938
[15,] 0.6866158 0.0002872521 100.7036 7750.018 11.36261
[16,] 0.6106267 0.0001278512 105.8946 7749.986 11.81682
[17,] 0.6537794 0.0002875799 104.2015 7750.007 11.56224
[18,] 0.6095022 0.0001534366 108.9352 7749.993 12.22691
[19,] 0.7156714 0.0001303851 107.7274 7749.995 12.01923
[20,] 0.6397735 0.0002706792 109.6200 7749.986 12.01927
matrix(
c(runif(20, .6, .8),
runif(20, .0001, .0003),
runif(20, 100, 110),
rnorm(20, 7750, .01),
rnorm(20, 12, .4)),
ncol=5)
#> [,1] [,2] [,3] [,4] [,5]
#> [1,] 0.6303004 0.0002700728 102.6577 7750.008 12.10271
#> [2,] 0.7611678 0.0001594420 106.2736 7750.001 11.95071
#> [3,] 0.7217263 0.0002726162 105.9933 7749.993 12.16880
#> [4,] 0.7873636 0.0001409666 109.9674 7750.016 11.58212
#> [5,] 0.7329912 0.0002504620 105.8886 7750.005 11.62768
#> [6,] 0.6775068 0.0002546660 109.9630 7750.000 11.75542
#> [7,] 0.6927353 0.0001217041 105.5130 7750.004 12.46987
#> [8,] 0.7889347 0.0001849753 105.8204 7750.002 11.96011
#> [9,] 0.7555766 0.0001712631 104.6053 7750.013 12.77534
#> [10,] 0.6225500 0.0001441519 101.4559 7750.011 11.62323
#> [11,] 0.6004412 0.0002862156 100.7426 7750.015 12.34398
#> [12,] 0.7896445 0.0001871342 103.5566 7750.002 11.18040
#> [13,] 0.7995510 0.0002998966 101.2008 7750.005 11.79095
#> [14,] 0.7271423 0.0001385434 108.3129 7750.006 11.85577
#> [15,] 0.7990341 0.0001868429 102.3255 7749.974 12.00426
#> [16,] 0.7711383 0.0001362412 108.1071 7749.995 11.62242
#> [17,] 0.7168780 0.0001821163 103.0949 7750.021 12.35856
#> [18,] 0.7197489 0.0002015831 109.4623 7749.981 11.46613
#> [19,] 0.7006335 0.0001257633 100.9744 7750.001 12.03066
#> [20,] 0.7503335 0.0002953110 102.1582 7749.989 12.54394
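If the loop structure from the question is preferred, a variation on the answers above is to store the generator alongside its parameters and let sapply() build the matrix. This is only a sketch: the list layout and the names spec, fun, a and b are made up for illustration. One thing worth checking is that rnorm() takes a standard deviation as its third argument, so if N(7750, 0.01) is meant as a variance, sqrt(0.01) would be needed instead.

```r
set.seed(456)  # a single seed up front keeps the whole matrix reproducible
n <- 20L

# One entry per column: the generator and its two parameters.
# For rnorm the second parameter (b) is the standard deviation.
spec <- list(
  u1 = list(fun = runif, a = 0.6,    b = 0.8),
  u2 = list(fun = runif, a = 0.0001, b = 0.0003),
  u3 = list(fun = runif, a = 100,    b = 110),
  n1 = list(fun = rnorm, a = 7750,   b = 0.01),
  n2 = list(fun = rnorm, a = 12,     b = 0.4)
)

# sapply() binds the five length-20 vectors into a 20 x 5 matrix with named columns.
rangeresult <- sapply(spec, function(s) s$fun(n, s$a, s$b))
dim(rangeresult)
```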

r igraph - how does plot() read the layout matrix?

My question is related to this one here, which unfortunately has not been answered. I'm trying to automatically annotate text next to highlighted communities on a plot. An intermediate step is to understand how nodes are placed on a plot.
library(igraph)
G <- make_graph('zachary')
l <- layout_with_fr(G)
l
A layout is a matrix with rows representing nodes and columns representing the x and y plot parameters.
[,1] [,2]
[1,] 2.8510654 -2.2404898
[2,] 2.7183497 -1.1815130
[3,] 3.1429205 0.1117099
[4,] 1.5585372 -1.0743325
[5,] 2.2808632 -4.2035479
[6,] 2.1698198 -5.0526766
[7,] 1.4938068 -4.6975884
[8,] 1.9710816 -1.4672218
[9,] 3.5407035 0.5407852
[10,] 2.2222909 1.9079805
[11,] 3.0784642 -4.5828448
[12,] 4.4115351 -4.1057462
[13,] 0.6002378 -2.2432049
[14,] 2.5010525 -0.1563341
[15,] 4.8914673 4.1417759
[16,] 3.2053338 3.9212694
[17,] 1.1825200 -6.4099021
[18,] 3.7155897 -2.8354432
[19,] 3.8272351 4.2660906
[20,] 3.8636487 -0.5671906
[21,] 2.7302411 3.3998888
[22,] 1.6084374 -2.7407388
[23,] 4.3432855 3.8101278
[24,] 5.9392042 2.2364929
[25,] 6.9980077 0.2389222
[26,] 7.1608499 1.1360134
[27,] 6.0171481 4.0279067
[28,] 5.4996627 1.0367163
[29,] 4.4961257 0.9434659
[30,] 5.5987563 3.2314488
[31,] 2.9958404 1.2022317
[32,] 5.1188900 0.2919268
[33,] 4.1088296 2.5032294
[34,] 4.1686534 2.1339884
But the x, y coordinates of the plot go from -1 to 1, unlike the min-max coordinates in the layout matrix. So how is plot(G, layout = l) reading the layout matrix?
According to the source, the plot method for objects of class igraph simply rescales the layout matrix so that each column runs from -1 to 1.
library(igraph)
set.seed(3)
l <- layout_with_fr(G)
[,1] [,2]
[1,] -2.283 0.658
[2,] -1.289 -0.108
[3,] 0.146 1.012
[4,] -1.523 1.601
#... with 30 more rows.
plot(G,layout = l)
maxs <- apply(l, 2, max)
mins <- apply(l, 2, min)
ll <- scale(l, center=(maxs+mins)/2, scale=(maxs-mins)/2)
ll
[,1] [,2]
[1,] -0.2422 -0.1051
[2,] -0.0704 -0.3821
[3,] 0.1775 0.0228
[4,] -0.1108 0.2357
#... with 30 more rows.
plot(G,layout = ll)
Note that the actual rescaling is performed with igraph::norm_coords:
igraph::norm_coords(l)
[,1] [,2]
[1,] -0.2422 -0.1051
[2,] -0.0704 -0.3821
[3,] 0.1775 0.0228
[4,] -0.1108 0.2357
#... with 30 more rows.
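For the annotation goal mentioned in the question, the rescaled coordinates can be used to work out where a label would sit on the plot. The following is a minimal sketch in base R, assuming a made-up layout and a made-up community membership vector; rescale_layout is a hypothetical helper name that just wraps the scale() call shown above.

```r
# Same rescaling as in the answer above, wrapped in a helper (hypothetical name).
rescale_layout <- function(m) {
  maxs <- apply(m, 2, max)
  mins <- apply(m, 2, min)
  scale(m, center = (maxs + mins) / 2, scale = (maxs - mins) / 2)
}

set.seed(3)
l_demo <- cbind(runif(34, 0, 7), runif(34, -6, 4))   # made-up layout, 34 nodes
membership <- rep(1:3, length.out = 34)              # made-up communities

ll_demo <- rescale_layout(l_demo)

# Mean position of each community in plot coordinates (one row per community).
centroids <- apply(ll_demo, 2, function(v) tapply(v, membership, mean))
centroids
```

Each row of centroids is an (x, y) position in the -1 to 1 plot space, so something like text(centroids[, 1], centroids[, 2], labels) after plot() should land near the corresponding group, though an offset would probably be needed to keep labels off the nodes.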

Extract contour vertices from a dataframe

Hi guys and thanks in advance for your help.
I have a three-column dataframe: two columns with the coordinates of my data (x and y) and one with a value of brain activity (z). Out of 7505 rows, many coordinates have null data that I need to exclude from my statistical analysis.
I'm using the package ImageSCC (https://rdrr.io/github/funstatpackages/ImageSCC/man/), so I need to extract the boundaries or contour of my data, meaning that I need a two-column list of the coordinates that separate brain activity data from null data. This is an example provided by the package:
$Brain.V1
V1 V2
[1,] 0.07781920 0.33867403
[2,] 0.07781920 0.56408840
[3,] 0.07781920 0.65469613
[4,] 0.07968313 0.43812155
[5,] 0.10950606 0.50000000
[6,] 0.11323392 0.25690608
[7,] 0.12068966 0.73425414
[8,] 0.16728798 0.18176796
[9,] 0.19897484 0.83812155
[10,] 0.23625349 0.10441989
[11,] 0.26663278 0.63322031
[12,] 0.28808616 0.46804766
[13,] 0.28939153 0.30397021
[14,] 0.30335508 0.91325967
[15,] 0.31081081 0.04033149
[16,] 0.37862235 0.75747676
[17,] 0.42193552 0.19377116
[18,] 0.42823858 0.02044199
[19,] 0.43196645 0.94419890
[20,] 0.44787179 0.57753089
[21,] 0.45363822 0.38319933
[22,] 0.54193849 0.02554144
[23,] 0.54231688 0.73945984
[24,] 0.55125815 0.92756906
[25,] 0.58275116 0.25043703
[26,] 0.59755088 0.47843551
[27,] 0.63513514 0.89662983
[28,] 0.64072693 0.07616022
[29,] 0.66888720 0.64103671
[30,] 0.71390307 0.35218048
[31,] 0.72087605 0.84138122
[32,] 0.75069897 0.15050829
[33,] 0.78631797 0.50829246
[34,] 0.79543336 0.77508287
[35,] 0.84016775 0.26132597
[36,] 0.85507922 0.69552486
[37,] 0.92845294 0.39287293
[38,] 0.92845294 0.60502762
I have tried the package 'contoureR', but every time I run my code RStudio crashes and restarts the session. This is a sample of my code:
#install.packages("contoureR")
library(contoureR)
x = 1:ncol(df)
y = 1:nrow(df)
z = expand.grid(x=x,y=y)
z$z = apply(z,1,function(xx){df[xx[1],xx[2]]})
z$z[is.nan(z$z)] <- 0
cl = getContourLines(z)
Does anyone have another idea of how I could extract the boundaries of my data?
Thanks in advance.

R - transform scaled and centered data to original values

If I scale and centre the numeric columns in a data frame (the mean of each column subtracted from each of its values, and the result divided by the column SD), how do I then back-transform to the original values?
In the simple example below I see that the mean and SD of each column are stored in the object d4 after application of 'scale' with centering.
d1 <- as.data.frame(seq(1,20,1))
d2 <- as.data.frame(seq(0.11,0.3,0.01))
d3 <- cbind(d1,d2)
names(d3) <- c("A","B")
d4 <- scale(d3,center=TRUE)
d4
A B
[1,] -1.60579308 -1.60579308
[2,] -1.43676223 -1.43676223
[3,] -1.26773138 -1.26773138
[4,] -1.09870053 -1.09870053
[5,] -0.92966968 -0.92966968
[6,] -0.76063883 -0.76063883
[7,] -0.59160798 -0.59160798
[8,] -0.42257713 -0.42257713
[9,] -0.25354628 -0.25354628
[10,] -0.08451543 -0.08451543
[11,] 0.08451543 0.08451543
[12,] 0.25354628 0.25354628
[13,] 0.42257713 0.42257713
[14,] 0.59160798 0.59160798
[15,] 0.76063883 0.76063883
[16,] 0.92966968 0.92966968
[17,] 1.09870053 1.09870053
[18,] 1.26773138 1.26773138
[19,] 1.43676223 1.43676223
[20,] 1.60579308 1.60579308
attr(,"scaled:center")
A B
10.500 0.205
attr(,"scaled:scale")
A B
5.9160798 0.0591608
How can I now use the stored mean and SD values to compute the data frame of original values from d4?
We can do
r1 <- d4 * attr(d4, 'scaled:scale')[col(d4)] + attr(d4, 'scaled:center')[col(d4)]
all.equal(as.data.frame(r1), d3)
#[1] TRUE
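An equivalent way to write the same back-transformation, as a sketch, is with sweep(): multiply each column by its stored SD, then add back its stored mean, which reverses the order scale() applied them in. The data are rebuilt from the question so the snippet runs on its own; r2 is just an illustrative name.

```r
# Rebuild the example from the question.
d3 <- data.frame(A = seq(1, 20, 1), B = seq(0.11, 0.3, 0.01))
d4 <- scale(d3, center = TRUE)

# Undo scaling first (multiply by SD), then undo centring (add the mean).
r2 <- sweep(d4, 2, attr(d4, "scaled:scale"), `*`)
r2 <- sweep(r2, 2, attr(d4, "scaled:center"), `+`)

# Compare with the original data frame.
all.equal(as.data.frame(r2), d3)
```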

system is computationally singular error

I am using the fastICA package in R. In this package, I am using the fastICA function, which has some parameters. If I set n.comp to 2, it works fine, but if I set this parameter to 3 or more, it fails:
ica<-fastICA(datalist,n.comp=3)
Here datalist is a matrix with 20 rows and 4 columns:
[,1] [,2] [,3] [,4]
[1,] 567.00 324.225 281.0889 538.25
[2,] 557.75 317.500 269.5556 529.15
[3,] 543.75 309.900 264.5778 515.95
[4,] 557.00 316.225 265.0889 528.25
[5,] 538.25 307.750 266.6667 510.95
[6,] 531.25 301.025 250.0222 503.70
[7,] 545.00 311.800 270.9333 517.40
[8,] 550.00 316.925 284.3778 522.65
[9,] 514.75 290.300 235.6000 487.75
[10,] 518.00 293.800 245.1556 491.20
[11,] 553.75 318.125 281.6667 526.00
[12,] 563.50 325.925 297.2667 535.75
[13,] 540.00 303.300 241.1556 511.40
[14,] 546.00 310.350 261.6444 517.90
[15,] 567.25 324.425 281.4889 538.50
[16,] 577.75 330.125 285.2222 548.40
[17,] 560.75 317.425 262.3778 531.60
[18,] 570.00 323.925 272.8222 540.65
[19,] 569.00 324.700 278.8444 540.00
[20,] 565.50 324.150 284.1333 537.00
I am getting this error:
Error in solve.default(w %*% t(w)) :
system is computationally singular: reciprocal condition number = 1.16873e-16
Could you please tell me why I am getting this error and how I can solve it?
In solve(), use a smaller tolerance, like solve(..., tol = 1e-17).
This should be fine since you get reciprocal condition number = 1.16873e-16.
More info in the help file and this related question.
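To illustrate the mechanism only: solve() refuses to invert a matrix whose estimated reciprocal condition number is below tol, and passing a smaller tol lets it proceed. The matrix below is made up for the demonstration and is not the one fastICA builds internally; since the failing solve() call sits inside fastICA, applying this there would mean changing that call, which this sketch does not attempt.

```r
# Made-up matrix with a reciprocal condition number of about 1e-17.
a <- diag(c(1, 1e-17))

# With the default tolerance (.Machine$double.eps) solve() raises an error,
# which is captured here as a message string.
default_try <- tryCatch(solve(a), error = function(e) conditionMessage(e))
default_try

# With a tolerance below the reciprocal condition number the inverse is returned.
relaxed <- solve(a, tol = 1e-18)
relaxed
```

A result obtained this way can still be numerically unreliable, so it may be worth checking whether some columns of datalist are close to linear combinations of the others.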
