R - transform scaled and centered data to original values

If I scale and centre the numeric columns of a data frame (the column mean is subtracted from each value and the result is divided by the column SD), how do I then back-transform to the original values?
In the simple example below, I see that the mean and SD of each column are stored as attributes of the object d4 after applying 'scale' with centering.
d1 <- as.data.frame(seq(1,20,1))
d2 <- as.data.frame(seq(0.11,0.3,0.01))
d3 <- cbind(d1,d2)
names(d3) <- c("A","B")
d4 <- scale(d3,center=TRUE)
d4
A B
[1,] -1.60579308 -1.60579308
[2,] -1.43676223 -1.43676223
[3,] -1.26773138 -1.26773138
[4,] -1.09870053 -1.09870053
[5,] -0.92966968 -0.92966968
[6,] -0.76063883 -0.76063883
[7,] -0.59160798 -0.59160798
[8,] -0.42257713 -0.42257713
[9,] -0.25354628 -0.25354628
[10,] -0.08451543 -0.08451543
[11,] 0.08451543 0.08451543
[12,] 0.25354628 0.25354628
[13,] 0.42257713 0.42257713
[14,] 0.59160798 0.59160798
[15,] 0.76063883 0.76063883
[16,] 0.92966968 0.92966968
[17,] 1.09870053 1.09870053
[18,] 1.26773138 1.26773138
[19,] 1.43676223 1.43676223
[20,] 1.60579308 1.60579308
attr(,"scaled:center")
A B
10.500 0.205
attr(,"scaled:scale")
A B
5.9160798 0.0591608
How can I now use the stored mean and SD values to recover the data frame of original values from d4?

We can do
r1 <- d4 * attr(d4, 'scaled:scale')[col(d4)] + attr(d4, 'scaled:center')[col(d4)]
all.equal(as.data.frame(r1), d3)
#[1] TRUE
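A variant of the same idea, written as a small reusable function. scale() computes z = (x - mean) / sd, so the inverse is x = z * sd + mean, applied column by column here with sweep(). The name unscale is hypothetical (it is not a base R function). This is a sketch that relies only on the 'scaled:center' and 'scaled:scale' attributes shown in the output above. Either attribute may be absent if scale() was called with center = FALSE or scale = FALSE.

```r
# Example data from the question
d3 <- data.frame(A = seq(1, 20, 1), B = seq(0.11, 0.3, 0.01))
d4 <- scale(d3, center = TRUE)

# Hypothetical helper: invert scale() using the attributes stored on its result
unscale <- function(z) {
  ctr <- attr(z, "scaled:center")  # column means (NULL if center = FALSE)
  scl <- attr(z, "scaled:scale")   # column scaling factors (NULL if scale = FALSE)
  out <- z
  # scale() centres first and then divides, so undo in reverse order
  if (!is.null(scl)) out <- sweep(out, 2, scl, "*")  # undo the division
  if (!is.null(ctr)) out <- sweep(out, 2, ctr, "+")  # undo the centring
  attr(out, "scaled:center") <- NULL
  attr(out, "scaled:scale") <- NULL
  out
}

r2 <- as.data.frame(unscale(d4))
all.equal(r2, d3)
```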


Optimum cut-off values

I have both univariate and multivariate logistic regression models, and I want to find cut-off values with their respective sensitivity and specificity. I want to choose the best cut-off values for both my univariate and multivariate models.
I tried the following code for the univariate models, but the values (including the cut-offs) come out as decimals. Is there a way to get the cut-off values as whole numbers, other than rounding to the nearest integer? I am also not sure how to adapt the same code to get the cut-off values for the multivariate model.
Thank you for any help in advance!!
###Cut off values of Var1
library(pROC)
ok <- multiclass.roc(DATA$Outcome, DATA$Var1)
class_1 <- ok$rocs[[1]]
wants <- cbind(sensitivity = class_1$sensitivities, specificity = class_1$specificities, cutt_off = class_1$thresholds)
wants
I am getting the values:
sensitivity specificity cutt_off
[1,] 1.00000 0.000000 Inf
[2,] 1.00000 0.012346 73.500
[3,] 1.00000 0.024691 72.500
[4,] 1.00000 0.049383 71.500
[5,] 1.00000 0.061728 70.500
[6,] 1.00000 0.135802 69.500
[7,] 1.00000 0.172840 68.500
[8,] 0.94118 0.222222 67.500
[9,] 0.88235 0.283951 66.500
[10,] 0.88235 0.320988 65.750
[11,] 0.88235 0.333333 65.250
[12,] 0.88235 0.432099 64.500
[13,] 0.88235 0.506173 63.500
[14,] 0.82353 0.617284 62.500
[15,] 0.82353 0.629630 61.750
[16,] 0.76471 0.629630 61.250
[17,] 0.76471 0.691358 60.500
[18,] 0.70588 0.753086 59.750
[19,] 0.70588 0.777778 59.250
[20,] 0.70588 0.814815 58.500
[21,] 0.64706 0.827160 57.500
[22,] 0.64706 0.876543 56.500
[23,] 0.64706 0.901235 55.250
[24,] 0.58824 0.913580 54.250
[25,] 0.58824 0.938272 53.900
[26,] 0.52941 0.938272 53.400
[27,] 0.41176 0.938272 52.500
[28,] 0.35294 0.950617 51.835
[29,] 0.29412 0.950617 50.835
[30,] 0.29412 0.962963 49.000
[31,] 0.23529 0.975309 47.500
[32,] 0.17647 0.975309 46.000
[33,] 0.11765 0.987654 44.500
[34,] 0.00000 0.987654 42.500
[35,] 0.00000 1.000000 -Inf
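The decimals in the cut_off column appear to come from pROC placing each threshold halfway between two adjacent observed values of Var1, with -Inf and Inf at the ends. Any threshold strictly between two neighbouring observed values gives the same sensitivity and specificity. So for whole-number data, a midpoint such as 61.5 can be replaced by the appropriate neighbouring integer without changing either value.

Below is a minimal base-R sketch that uses the observed values themselves as candidate cut-offs, so an integer predictor gives whole-number cut-offs. It then picks the cut-off that maximises Youden's J (sensitivity + specificity - 1). Several assumptions apply:
- DATA is simulated here.
- The helper cutoff_table is hypothetical.
- It assumes a 0/1 Outcome where higher scores indicate the positive class.

In the table above, lower values of Var1 seem to go with the outcome. In that case, the two comparisons should be flipped.

```r
# Simulated stand-in for DATA (hypothetical): 0/1 Outcome and a whole-number Var1
set.seed(1)
DATA <- data.frame(Outcome = rbinom(100, 1, 0.3))
DATA$Var1 <- round(rnorm(100, mean = 55 + 8 * DATA$Outcome, sd = 6))

# Hypothetical helper: sensitivity and specificity at every observed value,
# classifying an observation as positive when score >= cut-off
cutoff_table <- function(score, outcome) {
  cuts <- sort(unique(score))
  sens <- vapply(cuts, function(k) mean(score[outcome == 1] >= k), numeric(1))
  spec <- vapply(cuts, function(k) mean(score[outcome == 0] < k), numeric(1))
  data.frame(cut_off = cuts, sensitivity = sens, specificity = spec,
             youden = sens + spec - 1)
}

tab  <- cutoff_table(DATA$Var1, DATA$Outcome)
best <- tab[which.max(tab$youden), ]  # cut-off maximising Youden's J
best
```

Because best is a row of tab, best$cut_off, best$sensitivity and best$specificity can be read off directly. pROC's coords() with x = "best" should also report a Youden-optimal threshold for a roc object, but that threshold will still be one of its midpoints.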
To determine the cut-off values for the multivariate model, I tried the following code, but I am getting errors. Also, my model includes both continuous and categorical variables: Var1, Var2 and Var3 are continuous, and Var4 is categorical, coded as 0, 1 and 2.
library(pROC)
ok <- multiclass.roc(DATA$Outcome, DATA$Var1 + DATA$Var2 + DATA$Var3 + DATA$Var4)
class_1 <- ok$rocs[[1]]
wants <- cbind(sensitivity = class_1$sensitivities, specificity = class_1$specificities, cutt_off = class_1$thresholds)
wants
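Adding the raw predictors together, as in the multiclass.roc call above, mixes variables on different scales. It also treats the 0/1/2 coding of Var4 as a number. A more usual approach is to fit the multivariable logistic regression and use its predicted probabilities as the single score for the ROC analysis. The cut-off is then a probability rather than a value of any one variable.

The sketch below does this in base R on simulated data; the coefficients used to simulate Outcome are made up. It assumes a binary 0/1 Outcome. With pROC, fitted(fit) could instead be passed as the predictor to roc() and then to coords().

```r
# Simulated stand-in for DATA (hypothetical); Var4 is a 0/1/2 category
set.seed(2)
n <- 200
DATA <- data.frame(Var1 = rnorm(n, 60, 8), Var2 = rnorm(n), Var3 = rnorm(n),
                   Var4 = sample(0:2, n, replace = TRUE))
lin <- -0.1 * (DATA$Var1 - 60) + 0.8 * DATA$Var2 + 0.5 * (DATA$Var4 == 2)
DATA$Outcome <- rbinom(n, 1, plogis(lin))

# Multivariable logistic model; factor() treats Var4 as categorical
fit <- glm(Outcome ~ Var1 + Var2 + Var3 + factor(Var4),
           family = binomial, data = DATA)
p <- unname(fitted(fit))  # predicted probability per row: the combined score

# Sensitivity/specificity at each distinct probability (p >= cut-off is positive)
cuts <- sort(unique(p))
sens <- vapply(cuts, function(k) mean(p[DATA$Outcome == 1] >= k), numeric(1))
spec <- vapply(cuts, function(k) mean(p[DATA$Outcome == 0] < k), numeric(1))
best <- which.max(sens + spec - 1)  # index maximising Youden's J
c(cut_off = cuts[best], sensitivity = sens[best], specificity = spec[best])
```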

How to generate random numbers with normal distribution and uniform distribution

I am a newbie in R. I want to create a matrix, draw 20 random numbers from each of three uniform distributions, U(0.6,0.8), U(0.0001,0.0003) and U(100,110), and place them in the first three columns of the matrix, one column per distribution. Then I want to draw 20 random numbers from each of two normal distributions, N(7750,0.01) and N(12,0.4), and place them in the last two columns. My program is below, but it only draws from uniform distributions; I cannot get the first three columns to be uniform and the last two columns to be normal. How can I change it?
input <-5 # variable input
xinput <- 20 #sampling number
range <- matrix(c(0.60,0.80,
0.0001,0.0003,
100,110,
7700,8000,
10,15
),nrow=input,ncol=2,byrow=TRUE)
range
rangeresult <- matrix(0, nrow=xinput, ncol=input)# empty matrix for latter data
rangeresult
##uniform distribution
for (i in 1:input){
set.seed(456+i) # make results reproducible
rangeresult[,i] <- runif(xinput,range[i,1],range[i,2])
}
Perhaps try this
cbind(
u1 = runif(20L, 0.6, 0.8),
u2 = runif(20L, 0.0001, 0.0003),
u3 = runif(20L, 100, 110),
n1 = rnorm(20L, 7750, 0.01),
n2 = rnorm(20L, 12, 0.4)
)
Output
u1 u2 u3 n1 n2
[1,] 0.7558480 0.0002851074 101.7209 7749.988 11.75270
[2,] 0.7807589 0.0002600877 104.9278 7749.998 11.67970
[3,] 0.7480385 0.0001562960 109.5744 7749.979 11.84603
[4,] 0.6283492 0.0001408027 108.9455 7749.999 12.00459
[5,] 0.7666862 0.0002485003 106.4735 7750.002 12.58783
[6,] 0.6354397 0.0001042544 107.0999 7749.982 12.36555
[7,] 0.7340912 0.0002507386 109.7052 7749.994 11.75111
[8,] 0.7220797 0.0001173221 105.7116 7749.995 11.35322
[9,] 0.6956138 0.0001478050 104.6444 7750.004 11.68879
[10,] 0.6146491 0.0001238944 108.5946 7750.006 12.78417
[11,] 0.7436676 0.0002492057 107.6073 7750.003 11.80814
[12,] 0.7916866 0.0001927277 100.1949 7750.016 12.16362
[13,] 0.7701075 0.0002236796 103.9207 7750.007 11.82555
[14,] 0.7151522 0.0001528767 101.0997 7749.996 11.75938
[15,] 0.6866158 0.0002872521 100.7036 7750.018 11.36261
[16,] 0.6106267 0.0001278512 105.8946 7749.986 11.81682
[17,] 0.6537794 0.0002875799 104.2015 7750.007 11.56224
[18,] 0.6095022 0.0001534366 108.9352 7749.993 12.22691
[19,] 0.7156714 0.0001303851 107.7274 7749.995 12.01923
[20,] 0.6397735 0.0002706792 109.6200 7749.986 12.01927
Alternatively, concatenate all 100 draws and fill a 5-column matrix column by column:
matrix(
c(runif(20, .6, .8),
runif(20, .0001, .0003),
runif(20, 100, 110),
rnorm(20, 7750, .01),
rnorm(20, 12, .4)),
ncol=5)
#> [,1] [,2] [,3] [,4] [,5]
#> [1,] 0.6303004 0.0002700728 102.6577 7750.008 12.10271
#> [2,] 0.7611678 0.0001594420 106.2736 7750.001 11.95071
#> [3,] 0.7217263 0.0002726162 105.9933 7749.993 12.16880
#> [4,] 0.7873636 0.0001409666 109.9674 7750.016 11.58212
#> [5,] 0.7329912 0.0002504620 105.8886 7750.005 11.62768
#> [6,] 0.6775068 0.0002546660 109.9630 7750.000 11.75542
#> [7,] 0.6927353 0.0001217041 105.5130 7750.004 12.46987
#> [8,] 0.7889347 0.0001849753 105.8204 7750.002 11.96011
#> [9,] 0.7555766 0.0001712631 104.6053 7750.013 12.77534
#> [10,] 0.6225500 0.0001441519 101.4559 7750.011 11.62323
#> [11,] 0.6004412 0.0002862156 100.7426 7750.015 12.34398
#> [12,] 0.7896445 0.0001871342 103.5566 7750.002 11.18040
#> [13,] 0.7995510 0.0002998966 101.2008 7750.005 11.79095
#> [14,] 0.7271423 0.0001385434 108.3129 7750.006 11.85577
#> [15,] 0.7990341 0.0001868429 102.3255 7749.974 12.00426
#> [16,] 0.7711383 0.0001362412 108.1071 7749.995 11.62242
#> [17,] 0.7168780 0.0001821163 103.0949 7750.021 12.35856
#> [18,] 0.7197489 0.0002015831 109.4623 7749.981 11.46613
#> [19,] 0.7006335 0.0001257633 100.9744 7750.001 12.03066
#> [20,] 0.7503335 0.0002953110 102.1582 7749.989 12.54394
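If you want to keep the loop and parameter matrix from the question, one option is to store the distribution family next to its two parameters. The loop can then switch between runif() and rnorm(). This is a sketch with two assumptions:
- The data frame params is a hypothetical replacement for the question's range matrix.
- The second parameter of N(7750,0.01) and N(12,0.4) is the standard deviation (as in the answers above), not the variance.

```r
n_draws <- 20  # sampling number

# One row per output column: distribution family and its two parameters
# (min and max for "unif", mean and sd for "norm")
params <- data.frame(
  dist = c("unif", "unif", "unif", "norm", "norm"),
  p1   = c(0.6, 0.0001, 100, 7750, 12),
  p2   = c(0.8, 0.0003, 110, 0.01, 0.4)
)

result <- matrix(0, nrow = n_draws, ncol = nrow(params))  # empty matrix for the draws
for (i in seq_len(nrow(params))) {
  set.seed(456 + i)  # make results reproducible, as in the question
  result[, i] <- if (params$dist[i] == "unif") {
    runif(n_draws, params$p1[i], params$p2[i])
  } else {
    rnorm(n_draws, params$p1[i], params$p2[i])
  }
}
result
```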

r igraph - how does plot() read the layout matrix?

My question is related to this one here, which unfortunately has not been answered. I'm trying to automatically add text labels next to highlighted communities on a plot. An intermediate step is to understand how nodes are placed on the plot.
library(igraph)
G <- make_graph('zachary')
l <- layout_with_fr(G)
l
A layout is a matrix with one row per node and two columns giving the x and y coordinates of each node.
[,1] [,2]
[1,] 2.8510654 -2.2404898
[2,] 2.7183497 -1.1815130
[3,] 3.1429205 0.1117099
[4,] 1.5585372 -1.0743325
[5,] 2.2808632 -4.2035479
[6,] 2.1698198 -5.0526766
[7,] 1.4938068 -4.6975884
[8,] 1.9710816 -1.4672218
[9,] 3.5407035 0.5407852
[10,] 2.2222909 1.9079805
[11,] 3.0784642 -4.5828448
[12,] 4.4115351 -4.1057462
[13,] 0.6002378 -2.2432049
[14,] 2.5010525 -0.1563341
[15,] 4.8914673 4.1417759
[16,] 3.2053338 3.9212694
[17,] 1.1825200 -6.4099021
[18,] 3.7155897 -2.8354432
[19,] 3.8272351 4.2660906
[20,] 3.8636487 -0.5671906
[21,] 2.7302411 3.3998888
[22,] 1.6084374 -2.7407388
[23,] 4.3432855 3.8101278
[24,] 5.9392042 2.2364929
[25,] 6.9980077 0.2389222
[26,] 7.1608499 1.1360134
[27,] 6.0171481 4.0279067
[28,] 5.4996627 1.0367163
[29,] 4.4961257 0.9434659
[30,] 5.5987563 3.2314488
[31,] 2.9958404 1.2022317
[32,] 5.1188900 0.2919268
[33,] 4.1088296 2.5032294
[34,] 4.1686534 2.1339884
But the x and y axes of the plot run from -1 to 1, unlike the minimum and maximum coordinates in the layout matrix. So how does plot(G, layout = l) read the layout matrix?
According to the source, the plot method for objects of class igraph simply rescales each column of the layout matrix to the range -1 to 1.
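For a column with minimum m and maximum M, the rescaling works out to the following. This is what the scale() call below computes with center = (M + m)/2 and scale = (M - m)/2:

$$x' = \frac{x - \frac{M + m}{2}}{\frac{M - m}{2}} = \frac{2\,(x - m)}{M - m} - 1$$

So the smallest coordinate in each column maps to -1 and the largest to 1.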
library(igraph)
set.seed(3)
G <- make_graph('zachary')
l <- layout_with_fr(G)
l
[,1] [,2]
[1,] -2.283 0.658
[2,] -1.289 -0.108
[3,] 0.146 1.012
[4,] -1.523 1.601
#... with 30 more rows.
plot(G,layout = l)
maxs <- apply(l, 2, max)
mins <- apply(l, 2, min)
ll <- scale(l, center=(maxs+mins)/2, scale=(maxs-mins)/2)
ll
[,1] [,2]
[1,] -0.2422 -0.1051
[2,] -0.0704 -0.3821
[3,] 0.1775 0.0228
[4,] -0.1108 0.2357
#... with 30 more rows.
plot(G,layout = ll)
Note that the actual rescaling is performed with igraph::norm_coords:
igraph::norm_coords(l)
[,1] [,2]
[1,] -0.2422 -0.1051
[2,] -0.0704 -0.3821
[3,] 0.1775 0.0228
[4,] -0.1108 0.2357
#... with 30 more rows.
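For the original goal of placing text next to a community, the same mapping can turn layout coordinates into plot coordinates. The sketch below re-implements the rescaling in base R so it can be checked without igraph. Some assumptions apply:
- rescale_layout and the members vector are hypothetical.
- The 4-node matrix is a toy stand-in for layout_with_fr(G).
- plot() is called with its default rescale = TRUE.

```r
# Hypothetical helper mirroring the rescaling above: map each column of a
# layout matrix linearly onto [lo, hi]
rescale_layout <- function(l, lo = -1, hi = 1) {
  apply(l, 2, function(v) {
    rng <- range(v)
    if (diff(rng) == 0) return(rep((lo + hi) / 2, length(v)))  # constant column
    lo + (v - rng[1]) / diff(rng) * (hi - lo)
  })
}

# Toy 4-node layout standing in for layout_with_fr(G)
l <- matrix(c(-2, 0, 2, 4,
               1, 3, 5, 9), ncol = 2)
ll <- rescale_layout(l)

# Label position for a community: centroid of its members in plot coordinates
members <- c(1, 2)  # hypothetical community membership (node indices)
label_pos <- colMeans(ll[members, , drop = FALSE])
label_pos
```

After plot(G, layout = l), a call such as text(label_pos[1], label_pos[2], "community 1") should then place the label at the centroid of those nodes. In practice igraph::norm_coords(l) can replace rescale_layout(l), and the membership could come from a community-detection result.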

Extracting element from list of lists in R?

I have a list in the following format (825 sublists of 4 elements each):
[[825]][[4]]
The 4 inner elements are arrays of different sizes and dimensions:
[[1]]
[1] 0.02918644 0.03239657 0.03560670 0.03881683 0.04202696 0.04523709 0.04844722 0.05165735
[9] 0.05486748 0.05807761 0.06128774 0.06449787 0.06770800 0.07091813 0.07412827 0.07733840
[17] 0.08054853 0.08375866 0.08696879 0.09017892
[[2]]
[1] 0.7581078 0.7587820 0.7608009 0.7641538 0.7688234 0.7747857 0.7820113 0.7904655 0.8001093
[10] 0.8109003 0.8244816 0.8444896 0.8706241 0.9023530 0.9391094 0.9803280 1.0254709 1.0740433
[19] 1.1256013 1.1797536
[[3]]
[,1] [,2] [,3]
[1,] 0.4177711 0.34606863 2.361603e-01
[2,] 0.4345125 0.35491274 2.105747e-01
[3,] 0.4512540 0.36375685 1.849892e-01
[4,] 0.4679954 0.37260096 1.594036e-01
[5,] 0.4847369 0.38144507 1.338180e-01
[6,] 0.5014783 0.39028918 1.082325e-01
[7,] 0.5182198 0.39913329 8.264693e-02
[8,] 0.5349612 0.40797740 5.706137e-02
[9,] 0.5517027 0.41682150 3.147581e-02
[10,] 0.5684441 0.42566561 5.890257e-03
[11,] 0.6059978 0.39400216 0.000000e+00
[12,] 0.6497759 0.35022414 0.000000e+00
[13,] 0.6935539 0.30644612 0.000000e+00
[14,] 0.7373319 0.26266811 -2.408519e-18
[15,] 0.7811099 0.21889009 -6.394265e-19
[16,] 0.8248879 0.17511207 1.129666e-18
[17,] 0.8686659 0.13133405 2.898758e-18
[18,] 0.9124440 0.08755604 4.667850e-18
[19,] 0.9562220 0.04377802 6.436942e-18
[20,] 1.0000000 0.00000000 0.000000e+00
[[4]]
[,1]
[1,] 0.03849906
[2,] 0.04269549
[3,] 0.04680160
[4,] 0.05079714
[5,] 0.05466400
[6,] 0.05838658
[7,] 0.06195207
[8,] 0.06535055
[9,] 0.06857498
[10,] 0.07162115
[11,] 0.07433489
[12,] 0.07637498
[13,] 0.07776951
[14,] 0.07859245
[15,] 0.07893464
[16,] 0.07889032
[17,] 0.07854784
[18,] 0.07798443
[19,] 0.07726429
[20,] 0.07643877
I want to have 4 new lists, each with 825 elements:
[[4]][[825]]
That is, all the [[1]] elements of the 825 sublists should be combined into one list, all the [[2]] elements into another, and so on.
What's the best way to do this? I've been trying to figure it out with some sort of apply...
First create an example list of lists:
big.lst <- lapply(1:825, function(x) rep(list(rnorm(10)), 4))
#check lengths
length(big.lst)
#[1] 825
unique(lengths(big.lst))
#[1] 4
Then use lapply over the positions, extracting the same position from every sublist. I chose 1:4 to create four new groups, but you can generalize with 1:length(big.lst[[1]]), since each sublist has the same length:
newlst <- lapply(1:4, function(x) lapply(big.lst, '[[', x))
#verify answer
length(newlst)
#[1] 4
unique(lengths(newlst))
#[1] 825
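The same lapply() call also works when the inner elements have different shapes, as in the question. Below is a sketch with simulated data shaped like the example: two length-20 vectors, a 20 x 3 matrix and a 20 x 1 matrix. Here seq_along(big.lst[[1]]) replaces the hard-coded 1:4. simplify2array() is one option if a group should be stacked into a single array afterwards.

```r
# Simulated list shaped like the question: 825 sublists, each holding two
# length-20 vectors, a 20 x 3 matrix and a 20 x 1 matrix
set.seed(1)
big.lst <- lapply(1:825, function(x) list(runif(20), runif(20),
                                           matrix(runif(60), 20, 3),
                                           matrix(runif(20), 20, 1)))

# Element i of every sublist goes into new list i
newlst <- lapply(seq_along(big.lst[[1]]), function(i) lapply(big.lst, `[[`, i))

# Optional: stack one group of equal-shaped elements into a single array
stacked <- simplify2array(newlst[[3]])  # 20 x 3 x 825
dim(stacked)
```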

Select columns from nested lists in r

I have a list with 50 elements, and each element is a 21x2 matrix. I want to extract the first column of every matrix so that I can multiply each of them by another vector (assign below).
Example data:
x<-replicate(50,cbind(rnorm(21,0,1),rnorm(21,1,1)))
x<-lapply(seq(dim(x)[3]), function(i) x[ , , i])
x[[1]]
[,1] [,2]
[1,] -1.00653872 1.2780327
[2,] -0.30442989 -0.6854457
[3,] -1.05715492 -0.3464085
[4,] 0.12005815 1.1885382
[5,] 0.93834177 1.4968285
[6,] 0.85975400 1.3084381
[7,] 0.91980222 -0.1580829
[8,] 0.35785346 1.7679500
[9,] -1.03510124 2.2865753
[10,] -0.74853505 0.5148834
[11,] -1.23582377 0.8514812
[12,] 0.69546075 0.8294420
[13,] 0.08527011 1.7080554
[14,] -0.81635552 0.7492530
[15,] 0.53826428 -0.3058294
[16,] 0.16545497 0.4415540
[17,] -0.27144363 0.8299643
[18,] 0.02851933 1.2673526
[19,] 1.86516449 0.3009744
[20,] -0.46998359 -0.3232826
[21,] -0.60222069 2.3836219
assign <- rep(c(0,1),times=c(10,11))
If I do
x[[1]][,1]*assign
I get what I'm looking for, but I want to be able to do this for all elements of x without a for-loop.
I tried
alt<-lapply(x, `[[`, 1)
but this only gives the first element of the first columns, whereas I want the whole vector.
Any suggestions?
Try using split to split each matrix by columns and take the first one
sapply(x, function(mat) split(mat, col(mat))[1])
You could also try simplify2array
simplify2array(x)[,1,]
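A more direct variant extracts the column and multiplies it by assign in the same call. Below is a sketch using simulated data. Here replicate(..., simplify = FALSE) returns the list of matrices directly, so the lapply() over the third dimension is not needed.

```r
set.seed(1)
x <- replicate(50, cbind(rnorm(21, 0, 1), rnorm(21, 1, 1)), simplify = FALSE)
assign <- rep(c(0, 1), times = c(10, 11))

# First column of every matrix times assign; sapply() collects the
# 50 results as the columns of a 21 x 50 matrix
res <- sapply(x, function(mat) mat[, 1] * assign)

# Or keep them as a list of 50 vectors
res_list <- lapply(x, function(mat) mat[, 1] * assign)
```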
