Doing multiple ttests within a dataframe. - r

I would like to do a series of ttests on a dataset:
specifically, i'd like to do seperate ttest for rats 1-5 vs 6-10 for every gene.
I've tried to do this:
>goi2 <- (goi[-1])
control <- goi2[1:5,]
stress <- goi2[6:10,]
for (i in 1:92){
x <- control[,i]
y <- stress[,i]
x <= t.test(x, y)
# print(x=i)
}
but i get this error:
Error: Can't use matrix or array for column indexing
I've tried a few varieties of this but cant figure out why this wont work.
Im a complete newb to R, but not programming in general.
Dataset:
Gene,Rat_1,Rat_2,Rat_3,Rat_4,Rat_5,Rat_6,Rat_7,Rat_8,Rat_9,Rat_10
Oprd1,2.746,1.387,2.25,3.363,3.191,2.432,1.985,1.75,2.752,1.771
Grin2a,3.134,2.644,2.962,5.168,2.484,3.54,2.596,1.535,3.197,2.232
Grin2d(2),4.496,5.528,2.631,4.684,3.934,6.047,0.98,0.077,4.381,2.327
Oprm1,1.998,1.804,1.611,1.712,3.672,3.215,0.249,1.248,1.758,2.671
Scn2b,137.35,97.158,113.65,141.93,77.295,133.02,88.872,75.586,108.96,97.626
Ntf3,0.989,1.835,1.604,1.133,0.889,0.782,0.918,2.241,2.216,3.921
Scn1a(2),9.224,7.369,10.145,14.242,17.262,11.535,8.144,7.166,13.625,6.604
Ntrk2(2),21.929,17.018,14.799,19.783,14.632,24.421,14.235,9.344,16.658,17.913
Cacna1c,4.585,3.637,3.948,4.135,3.403,5.381,4.193,3.162,3.455,3.695
Grin2b,3.273,2.57,2.101,2.922,1.826,3.338,2.121,1.416,2.973,2.005
Scn9a(2),0.319,0,0,0.453,0.434,0.376,0,0,0.346,0.469
Gria4(2),10.867,8.156,7.889,9.236,14.134,10.574,8.404,8.179,9.442,7.982
Cacna1e(2),1.805,1.783,2.045,1.968,1.405,1.807,0.973,0.993,0.857,1.769
Gria3,4.237,4.188,3.901,5.221,6.439,3.993,3.421,4.012,4.452,4.631
Gria1,8.284,7.942,7.557,12.001,3.976,9.472,7.653,4.16,7.971,5.381
Kcnj5,3.089,2.046,3.332,3.392,2.168,3.786,3.865,1.414,2.37,2.009
Cacna1b(2),11.071,8.716,8.246,9.594,7.189,11.62,6.028,4.481,9.307,9.074
Scn5a,1.301,1.017,0.714,1.401,0.449,1.183,1.065,0.292,0.823,0.714
Scn2a(2),3.286,2.119,2.257,2.024,1.902,3.441,1.327,1.072,2.576,2.09
Scn10a,0.037,0.069,0.087,0.076,0.082,0.095,0.052,0.019,0.078,0.045
Cacna1g(2),6.543,5.095,5.463,8.404,3.084,7.359,5.746,4.682,5.969,4.315
Cacna1e(3),5.37,4.002,3.313,4.803,2.665,5.623,3.296,1.953,3.827,4.092
Bdnf(4),0.869,0.509,0.996,1.032,0.256,0.742,0.498,0.531,0.994,0.473
Scn4a,0.284,0.278,0.359,0.45,0.761,0.31,0.319,0.27,0.366,0.273
Scn5a(2),0.256,0.477,0.587,0.283,0,0.564,0.044,0.023,0.204,0.15
Gabra1,51.019,44.3,57.609,81.522,40.853,64.921,68.263,31.766,58.006,39.518
Scn8a,6.854,14.666,5.416,12.347,4.823,14.935,7.014,16.684,9.686,17.44
Kcnj3,17.047,14.3,13.741,14.363,14.01,13.268,12.172,10.718,15.374,13.048
Slc6a2,107.9,69.941,91.704,36.411,112.57,114.5,23.398,63.848,53.323,135.26
Grin3a,6.952,5.676,7.301,12.557,3.65,10.628,9.783,4.286,8.015,4.499
Cnr1,20.261,16.981,19.996,26.469,12.709,24.705,25.548,10.61,19.746,14.64
Scn1b,13.732,15.763,5.03,20.68,17.788,14.959,16.298,24.682,22.477,15.117
Gria1(2),2.709,3.667,2.51,2.9,2.134,1.93,4.308,2.59,2.487,1.742
Scn3a(2),1.439,2.614,0,0.352,0,1.358,1.027,0,0.452,0.586
Scn11a,0.058,0.292,0.036,0.127,0.058,0.06,0.074,0.164,0.047,0.05
Gria1(3),25.283,17.779,22.725,32.705,8.823,28.727,26.915,12.876,23.545,17.879
Cacna1f,0.056,0.067,0.14,0.123,0.04,0.182,0.072,0.083,0.077,0.097
Cacna1a,20.791,19.816,17.613,21.663,15.697,22.824,16.737,16.719,16.604,20.469
Gria4,8.51,7.107,8.342,9.338,7.46,8.877,7.673,6.341,8.393,9.555
Scn8a,6.738,14.706,4.172,11.467,2.552,10.757,6.021,15.222,3.588,11.333
Grin2d,20.398,15.794,22.521,24.693,16.97,24.108,24.19,21.016,18.314,19.044
Gria3(2),15.301,13.087,13.918,14.433,12.282,14.914,12.198,11.602,13.738,15.481
Oprk1(2),6.66,4.97,7.604,10.281,2.151,10.462,10.278,1.525,6.869,4.902
Scn1b(3),46.553,42.795,49.498,55.558,64.101,38.178,44.1,59.033,43.837,39.382
Cacna1h,9.145,7.295,8.7,8.028,5.415,10.799,8.21,6.332,8.455,7.683
Scn2a,36.803,29.975,30.609,38.334,19.053,39.127,31.146,23.066,30.896,32.345
Cacna1g,5.489,5.213,6.24,7.896,3.97,4.876,6.283,5.464,6.08,3.692
Ntrk2(3),147.81,152.45,153.46,136.09,181.1,156.85,219.8,164.53,156.64,147.92
Scn1a,9.222,9.162,9.659,13.83,12.679,8.088,11.45,10.406,9.503,6.827
Grin1(3),69.943,68.01,76.358,81.029,63.692,83.424,70.981,80.088,69.821,70.764
Grin3b(2),2.065,1.265,1.45,1.576,3.875,1.441,1.822,1.964,2.286,0.965
Gabra2(2),2.268,1.251,1.638,2.844,2.93,2.934,3.725,1.724,1.455,2.674
Scn1b2(2),161.76,164.24,213.24,209.19,235.38,172.98,207.33,216.96,198.26,130.93
Oprm1(2),4.046,5.181,2.362,1.925,0.806,2.232,1.178,1.491,3.259,3.751
Cacna1c(3),0.077,0.194,0.23,0,0.132,0.127,0,0.035,0.09,0.092
Ntrk2,27.139,26.028,23.881,27.22,22.259,30.728,22.381,19.782,24.704,30.85
Cacna1d(2),2.126,2.263,2.038,2.1,1.995,2.966,1.943,2.01,2.317,2.214
Scn3a,21.272,16.356,16.245,14.875,11.825,19.753,10.994,11.08,16.905,19.832
Grin1(2),76.771,65.788,66.059,78.716,33.91,88.228,73.859,47.717,70.674,61.275
Grina,672.31,705.45,679.04,623.4,597.51,742.12,619.74,662.95,665.18,781.29
Cacna1e,2.448,1.981,1.506,2.003,1.318,3.052,1.953,0.814,2.17,2.482
Bdnf(2),1.853,2.128,2.553,1.996,0.663,2.5,2.385,0.468,1.922,1.481
Fos,18.402,24.653,23.038,20.615,8.027,38.444,20.836,11.756,20.823,20.296
Scn4b,23.772,27.874,25.388,25.109,51.926,20.291,25.521,28.701,30.256,17.344
Slc6a2(3),480.05,455.95,307.6,186.82,376.96,447.61,123.5,409.58,347.86,681.04
Ntf3(3),1.87,3.561,2.421,3.133,2.134,2.327,1.712,2.32,1.735,3.497
Bdnf(3),0.319,0.09,0.665,0.187,0.107,0.185,0.394,0.264,0.21,0.345
Scn3b,112.86,115.29,99.711,96.245,71.741,122.34,85.875,88.906,102.88,132.13
Grin2c,14.224,15.944,15.473,21.936,32.732,13.98,20.168,23.958,14.541,17.402
Gabrd,0.701,3.542,0.532,5.222,5.593,0.133,2.954,0.961,0.506,2.152
Cacna1b,16.935,15.764,14.475,15.639,10.655,19.408,14.115,14.079,14.26,16.737
Slc18a2,433.92,429.22,293.57,164.53,287.51,370.72,93.973,283.12,321.49,551.07
Cacnb1(2),16.456,5.099,16.969,4.469,12.471,5.143,14.017,10.049,17.537,4.26
Gabrg1,40.614,37.373,43.103,39.253,47.768,41.202,51.665,37.74,42.17,39.097
Grin1,1.235,0.812,0.909,1.605,0.513,1.371,1.596,1.346,1.213,0.922
Slc6a2(2),138.21,136.75,34.759,38.393,25.89,87.126,0,0.467,99.703,137.66
Galr3,2.691,2.51,2.517,4.446,0.727,2.933,4.041,2.08,2.638,1.456
Oprm1(3),7.273,7.676,7.08,6.196,5.515,9.023,2.57,4.8,7.699,10.471
Gabrq,70.623,67.728,51.095,42.456,43.156,77.924,28.63,32.975,54.192,87.697
Gria4(3),25.846,26.045,24.37,37.866,18.037,26.907,31.423,21.292,26.795,24.642
Cacna1c(2),0.644,0.894,0.831,1.084,0.721,1.026,0.817,0.371,1.333,1.015
Cacna1d(3),0.299,0.406,0.127,0.319,0.319,0.231,0.178,0.075,0.18,0.405
Cacnb1,47.24,51.505,42.702,48.718,33.28,60.334,38.611,41.827,40.352,56.132
Scn7a,2.351,2.38,2.114,1.96,0.316,2.647,1.945,1.219,2.559,1.498
Cacna1d,2.661,2.733,2.714,2.649,2.403,2.923,3.216,2.768,2.401,2.302
Gabra2,25.209,26.731,23.249,25.599,20.17,22.928,24.072,18.664,23.808,23.306
Scn9a,3.209,3.106,3.212,3.206,1.094,3.35,3.994,1.934,2.883,2.046
Ntf3(2),2.347,2.282,2.112,1.025,1.762,2.029,0.501,1.652,2.717,1.982
Gria2,12.726,12.997,12.74,15.615,7.156,14.375,13.387,11.682,12.968,11.332
Bdnf,0.703,0.777,1.034,0.571,0.166,1.164,0.549,0.325,0.801,1.12
Gria2(2),17.769,17.694,16.62,18.603,11.295,19.926,18.044,13.594,16.946,17.712
Bdnf(5),1.321,2.152,1.882,2.397,1.598,3.072,3.038,1.53,2.04,1.464

Here's a working sample using just base R. Using your goi:
str(goi)
# 'data.frame': 92 obs. of 11 variables:
# $ Gene : chr "Oprd1" "Grin2a" "Grin2d(2)" "Oprm1" ...
# $ Rat_1 : num 2.75 3.13 4.5 2 137.35 ...
# $ Rat_2 : num 1.39 2.64 5.53 1.8 97.16 ...
# $ Rat_3 : num 2.25 2.96 2.63 1.61 113.65 ...
# $ Rat_4 : num 3.36 5.17 4.68 1.71 141.93 ...
# $ Rat_5 : num 3.19 2.48 3.93 3.67 77.3 ...
# $ Rat_6 : num 2.43 3.54 6.05 3.21 133.02 ...
# $ Rat_7 : num 1.985 2.596 0.98 0.249 88.872 ...
# $ Rat_8 : num 1.75 1.535 0.077 1.248 75.586 ...
# $ Rat_9 : num 2.75 3.2 4.38 1.76 108.96 ...
# $ Rat_10: num 1.77 2.23 2.33 2.67 97.63 ...
control <- goi[,2:6]
stress <- goi[,7:11]
Now, instead of using for loop and processing each return as we calculate it, let's calculate everything, store the complete object for each test within the list, and preserve the opportunity to grab whatever we want from all tests afterwards.
results <- lapply(seq_len(nrow(goi)),
function(i) t.test(control[i,], stress[i,]))
length(results)
# [1] 92
Each element of results is the return value from a single call of t.test.
results[[1]]
# Welch Two Sample t-test
# data: control[i, ] and stress[i, ]
# t = 1.1034, df = 6.2218, p-value = 0.3107
# alternative hypothesis: true difference in means is not equal to 0
# 95 percent confidence interval:
# -0.5386851 1.4374851
# sample estimates:
# mean of x mean of y
# 2.5874 2.1380
You can access any component of the test results:
names(results[[1]])
# [1] "statistic" "parameter" "p.value" "conf.int" "estimate"
# [6] "null.value" "alternative" "method" "data.name"
head( sapply(results, `[[`, "p.value") )
# [1] 0.3107098 0.3083295 0.2626753 0.6245368 0.4406157 0.2800657
head( t(sapply(results, `[[`, "conf.int")) )
# [,1] [,2]
# [1,] -0.5386851 1.4374851
# [2,] -0.7513650 2.0681650
# [3,] -1.5018657 4.4862657
# [4,] -1.1880098 1.8504098
# [5,] -23.5402499 48.8678499
# [6,] -2.2762668 0.8250668
NB: one of R's many nuances is the fact that the *apply family will return a matrix that some might think is transposed from what it should be. Because f this, calls that return a matrix will benefit from being sandwiched in t(...). (This is a great opportunity to press the "I Believe" button and move on.)
You can combine all of these results into a single data.frame with something like:
namefunc <- function(x, nameroot) { dimnames(x) <- list(NULL, paste0(nameroot, seq_len(ncol(x)))) ; x ; }
(That was a small helper function to make the following slightly easier to read. It's a very naïve naming convention, used only to keep the columns unique for now.)
test_results <- cbind.data.frame(
statistic = sapply(results, `[[`, "statistic"),
p.value = sapply(results, `[[`, "p.value"),
parameter = sapply(results, `[[`, "parameter"),
namefunc( t(sapply(results, `[[`, "conf.int")), "conf" ),
namefunc( t(sapply(results, `[[`, "estimate")), "est" )
)
head(test_results)
# statistic p.value parameter conf1 conf2 est1 est2
# 1 1.1033554 0.3107098 6.221806 -0.5386851 1.4374851 2.5874 2.1380
# 2 1.0948456 0.3083295 7.312678 -0.7513650 2.0681650 3.2784 2.6200
# 3 1.2480711 0.2626753 5.480699 -1.5018657 4.4862657 4.2546 2.7624
# 4 0.5107431 0.6245368 7.337202 -1.1880098 1.8504098 2.1594 1.8282
# 5 0.8134064 0.4406157 7.633546 -23.5402499 48.8678499 113.4766 100.8128
# 6 -1.2161356 0.2800657 4.824393 -2.2762668 0.8250668 1.2900 2.0156
There is definitely room here to use packages from the tidyverse as RobertMc suggested. For that, I recommend dplyr and tidyr, though perhaps broom has utility here as well.

Related

Extracting P-value column from output Anova (car package)

I am using the 'car' package function Anova for some statistical testing.
It gives the following output:
Y = cbind(curdata$V1, curdata$V2, curdata$V3)
mymdl = lm(Y ~ curdata$V4 + curdata$V5)
myanova = Anova(mymdl)
Type II MANOVA Tests: Pillai test statistic
Df test stat approx F num Df den Df Pr(>F)
curdata$V4 1 0.27941 2.9728 3 23 0.05280 .
curdata$V5 1 0.33570 3.8743 3 23 0.02228 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
I would like to extract the values in the 'Pr(>F)' column, so I can place these p-values in another matrix for later correction of multiple comparisons.
I have tried using unlist, but it still does not provide the p-values found in the column.
Any help with this would be greatly appreciated.
If we have multiple response variables, it is a Manova. We could capture the output and use regex
as.numeric(sub(".*\\s*(\\d+\\.[0-9e-]+)\\s*[*.]*", "\\1", capture.output(out)[4:5]))
#[1] 8.836e-06 2.200e-16
data
mymdl <- lm(cbind(Sepal.Length, Sepal.Width) ~ Petal.Width +
Petal.Length, data = iris)
out <- Anova(mymdl)
Maybe not the most practical way, but you can play around columns using separate() from tidyr:
library(car)
library(dplyr)
library(tidyr)
#Code
v1 <- data.frame(capture.output(myanova))
v1 <- v1[3:5,,drop=F]
names(v1)<-'v1'
v2 <- separate(v1,v1,c(paste0('v',1:21)),sep = '\\s')
v2 <- v2[-1,]
Output:
as.numeric(v2$v21)
[1] 8.836e-06 2.200e-16
Warning: you would need to change 1:21 if necessary if more columns are present in the capture action.
TLDR:
# define helper:
get_summary_for_print <- car:::print.Anova.mlm
body(get_summary_for_print) <- local({tmp <- body(get_summary_for_print);tmp[-(length(tmp)-(0:1))]})
#use it:
get_summary_for_print(Anova(mymdl))$`Pr(>F)`
Unfortunately there is no designated way. But you can look at the source of car:::print.Anova.mlm (by typing this in the R console) to learn how it gets the values you want:
function (x, ...)
{
if ((!is.null(x$singular)) && x$singular)
stop("singular error SSP matrix; multivariate tests unavailable\ntry summary(object, multivariate=FALSE)")
test <- x$test
repeated <- x$repeated
ntests <- length(x$terms)
tests <- matrix(NA, ntests, 4)
if (!repeated)
SSPE.qr <- qr(x$SSPE)
for (term in 1:ntests) {
eigs <- Re(eigen(qr.coef(if (repeated) qr(x$SSPE[[term]]) else SSPE.qr,
x$SSP[[term]]), symmetric = FALSE)$values)
tests[term, 1:4] <- switch(test, Pillai = Pillai(eigs,
x$df[term], x$error.df), Wilks = Wilks(eigs, x$df[term],
x$error.df), `Hotelling-Lawley` = HL(eigs, x$df[term],
x$error.df), Roy = Roy(eigs, x$df[term], x$error.df))
}
ok <- tests[, 2] >= 0 & tests[, 3] > 0 & tests[, 4] > 0
ok <- !is.na(ok) & ok
tests <- cbind(x$df, tests, pf(tests[ok, 2], tests[ok, 3],
tests[ok, 4], lower.tail = FALSE))
rownames(tests) <- x$terms
colnames(tests) <- c("Df", "test stat", "approx F", "num Df",
"den Df", "Pr(>F)")
tests <- structure(as.data.frame(tests), heading = paste("\nType ",
x$type, if (repeated)
" Repeated Measures", " MANOVA Tests: ", test, " test statistic",
sep = ""), class = c("anova", "data.frame"))
print(tests, ...)
invisible(x)
}
<bytecode: 0x56032ea80990>
<environment: namespace:car>
In this case, there is quite a few lines of code involved to compute the p-values. However, we can easily create a modified version of the print function to return the table (tests) instead of only printing it (print(tests, ...)) and returning the original object (invisible(x)):
get_summary_for_print <- car:::print.Anova.mlm # copy the original print function (inclusive environment)
body(get_summary_for_print) <- # replace the code of our copy
local({ # to avoid pollution of environment by tmp
tmp <- body(get_summary_for_print) # to avoid code duplication
tmp[-(length(tmp)-(0:1))] # remove the last two code lines of the function
})
And use it for example like this:
library(car)
#> Loading required package: carData
res <- Anova(lm(cbind(Sepal.Width, Sepal.Length, Petal.Width) ~ Species + Petal.Length, iris))
res
#>
#> Type II MANOVA Tests: Pillai test statistic
#> Df test stat approx F num Df den Df Pr(>F)
#> Species 2 0.70215 26.149 6 290 < 2.2e-16 ***
#> Petal.Length 1 0.63487 83.461 3 144 < 2.2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
str(get_summary_for_print(res))
#> Classes 'anova' and 'data.frame': 2 obs. of 6 variables:
#> $ Df : num 2 1
#> $ test stat: num 0.702 0.635
#> $ approx F : num 26.1 83.5
#> $ num Df : num 6 3
#> $ den Df : num 290 144
#> $ Pr(>F) : num 7.96e-25 2.41e-31
#> - attr(*, "heading")= chr "\nType II MANOVA Tests: Pillai test statistic"

Using glmnet on binomial data error

I imported some data as follows
surv <- read.table("http://www.stat.ufl.edu/~aa/glm/data/Student_survey.dat",header = T)
x <- as.matrix(select(surv,-ab))
y <- as.matrix(select(surv,ab))
glmnet::cv.glmnet(x,y,alpha=1,,family="binomial",type.measure = "auc")
and I am getting the following error.
NAs introduced by coercion
Show Traceback
Error in lognet(x, is.sparse, ix, jx, y, weights, offset, alpha, nobs, : NA/NaN/Inf in foreign function call (arg 5)
What is a good fix for this?
The documentation of the glmnet package has the information that you need,
surv <- read.table("http://www.stat.ufl.edu/~aa/glm/data/Student_survey.dat", header = T, stringsAsFactors = T)
x <- surv[, -which(colnames(surv) == 'ab')] # remove the 'ab' column
y <- surv[, 'ab'] # the 'binomial' family takes a factor as input (too)
xfact = sapply(1:ncol(x), function(y) is.factor(x[, y])) # separate the factor from the numeric columns
xfactCols = model.matrix(~.-1, data = x[, xfact]) # one option is to build dummy variables from the factors (the other option is to convert to numeric)
xall = as.matrix(cbind(x[, !xfact], xfactCols)) # cbind() numeric and dummy columns
fit = glmnet::cv.glmnet(xall,y,alpha=1,family="binomial",type.measure = "auc") # run glmnet error free
str(fit)
List of 10
$ lambda : num [1:89] 0.222 0.202 0.184 0.168 0.153 ...
$ cvm : num [1:89] 1.12 1.11 1.1 1.07 1.04 ...
$ cvsd : num [1:89] 0.211 0.212 0.211 0.196 0.183 ...
$ cvup : num [1:89] 1.33 1.32 1.31 1.27 1.23 ...
$ cvlo : num [1:89] 0.908 0.9 0.89 0.874 0.862 ...
$ nzero : Named int [1:89] 0 2 2 3 3 3 4 4 5 6 ...
.....
I have come across the same problem of mixed data types of numeric and character/factor. For converting the predictors, I recommend using a function that comes with the glmnet package for exactly this mixed data type problem: glmnet::makeX(). It handles the dummy creation and is even able to perform a simple imputation in case of missing data.
x <- glmnet::makeX(surv[, -which(colnames(surv) == 'ab')])
or more tidy-ish:
library(tidyverse)
x <-
surv %>%
select(-ab) %>%
glmnet::makeX()

How to export results from bootstrapping in R?

I have a time series of 540 observations which I resample 999 times using the following code:
boot.mean = function(x,i){boot.mean = mean(x[i])}
z1 = boot(x1, boot.mean, R=999)
z1
ORDINARY NONPARAMETRIC BOOTSTRAP
Call:
boot(data = x1, statistic = boot.mean, R = 999)
Bootstrap Statistics :
original bias std. error
t1* -0.009381397 -5.903801e-05 0.002524366
trying to export the results gives me the following error:
write.csv(z1, "z1.csv")
Error in as.data.frame.default(x[[i]], optional = TRUE, stringsAsFactors = stringsAsFactors) :
cannot coerce class ""boot"" to a data.frame
How can I export the results to a .csv file?
I am expecting to obtain a file with 540 observations 999 times, and the goal is to apply the approx_entropy function from the pracma package, to obtain 999 values for approximate entropy and plot the distribution in Latex.
First, please make sure that your example is reproducible. You can do so by generating a small x1 object, or by generating a random x1 vector:
> x1 <- rnorm(540)
Now, from your question:
I am expecting to obtain a file with 540 observations 999 times
However, this is not what you will get. You are generating 999 repetitions of the mean of the resampled data. That means that every bootstrap replicate is actually a single number.
From Heroka's comment:
Hint: look at str(z1).
The function str shows you the actual data inside the z1 object, without the pretty formatting.
> str(z1)
List of 11
$ t0 : num 0.0899
$ t : num [1:999, 1] 0.1068 0.1071 0.0827 0.1413 0.0914 ...
$ R : num 999
$ data : num [1:540] 1.02 1.27 1.82 -2.92 0.68 ...
(... lots of irrelevant stuff here ...)
- attr(*, "class")= chr "boot"
So your original data is stored as z1$data, and the data that you have bootstraped, which is the mean of each resampling, is stored in z1$t. Notice how it tells you the dimension of each slot: z1$t is 999 x 1.
Now, what you probably want to do is change the boot.mean function by a boot.identity function, which simply returns the resampled data. It goes like:
> boot.identity = function(x,i){x[i]}
> z1 = boot(x1, boot.identity, R=999)
> str(z1)
List of 11
$ t0 : num [1:540] 1.02 1.27 1.82 -2.92 0.68 ...
$ t : num [1:999, 1:540] -0.851 -0.434 -2.138 0.935 -0.493 ...
$ R : num 999
$ data : num [1:540] 1.02 1.27 1.82 -2.92 0.68 ...
(... etc etc etc ...)
And you can save this data with write.csv(z1$t, "z1.csv").

Error in match.names(clabs, nmi) - Linear programming model - R

I am applying the CCR Data Envelopment Analysis model to benchmark between stock data. To do that I am running R code from a DEA paper published here. This document comes with step by step instructions on how to implement the model below in R.
The mathematical formulation looks like this:
Finding the model I needed already made for me seemed too good to be true. I am getting this error when I run it:
Error in match.names(clabs, nmi) : names do not match previous names
Traceback:
4 stop("names do not match previous names")
3 match.names(clabs, nmi)
2 rbind(deparse.level, ...)
1 rbind(aux, c(inputs[i, ], rep(0, m)))
My test data looks as follows:
> dput(testdfst)
structure(list(Name = structure(1:10, .Label = c("Stock1", "Stock2",
"Stock3", "Stock4", "Stock5", "Stock6", "Stock7", "Stock8", "Stock9",
"Stock10"), class = "factor"), Date = structure(c(14917, 14917,
14917, 14917, 14917, 14917, 14917, 14917, 14917, 14917), class = "Date"),
`(Intercept)` = c(0.454991569278089, 1, 0, 0.459437188169979,
0.520523252955415, 0.827294243132907, 0.642696631099892,
0.166219881886161, 0.086341470900152, 0.882092217743293),
rmrf = c(0.373075150411683, 0.0349067218712968, 0.895550280607866,
1, 0.180151549474574, 0.28669170468735, 0.0939821798173586,
0, 0.269645291515763, 0.0900619760898984), smb = c(0.764987877309785,
0.509094491489323, 0.933653313048327, 0.355340700554647,
0.654000372286503, 1, 0, 0.221454091364611, 0.660571586102851,
0.545086931342479), hml = c(0.100608151187926, 0.155064872867367,
1, 0.464298576152336, 0.110803875258027, 0.0720803195598597,
0, 0.132407005239869, 0.059742053684015, 0.0661623383303703
), rmw = c(0.544512524466665, 0.0761995312858816, 1, 0, 0.507699534880555,
0.590607506295898, 0.460148690870041, 0.451871218073951,
0.801698199214685, 0.429094840372901), cma = c(0.671162426988512,
0.658898571758625, 0, 0.695830176886926, 0.567814542084284,
0.942862571603074, 1, 0.37571611336359, 0.72565234813082,
0.636762557753099), Returns = c(0.601347600017365, 0.806071701848376,
0.187500487065719, 0.602971876359073, 0.470386289298666,
0.655773224143057, 0.414258177255333, 0, 0.266112191477882,
1)), .Names = c("Name", "Date", "(Intercept)", "rmrf", "smb",
"hml", "rmw", "cma", "Returns"), row.names = c("Stock1.2010-11-04",
"Stock2.2010-11-04", "Stock3.2010-11-04", "Stock4.2010-11-04",
"Stock5.2010-11-04", "Stock6.2010-11-04", "Stock7.2010-11-04",
"Stock8.2010-11-04", "Stock9.2010-11-04", "Stock10.2010-11-04"
), class = "data.frame")
And the linear model program is this:
namesDMU <- testdfst[1]
inputs <- testdfst[c(4,5,6,7,8)]
outputs <- testdfst[9]
N <- dim(testdfst)[1] # number of DMU
s <- dim(inputs)[2] # number of inputs
m <- dim(outputs)[2] # number of outputs
f.rhs <- c(rep(0,N),1) # RHS constraints
f.dir <- c(rep("<=",N),"=") # directions of the constraints
aux <- cbind(-1*inputs,outputs) # matrix of constraint coefficients in (6)
for (i in 1:N) {
f.obj <- c(0*rep(1,s),outputs[i,]) # objective function coefficients
f.con <- rbind(aux ,c(inputs[i,], rep(0,m))) # add LHS of bTz=1
results <-lp("max",f.obj,f.con,f.dir,f.rhs,scale=1,compute.sens=TRUE) # solve LPP
multipliers <- results$solution # input and output weights
efficiency <- results$objval # efficiency score
duals <- results$duals # shadow prices
if (i==1) {
weights <- multipliers
effcrs <- efficiency
lambdas <- duals [seq(1,N)]
} else {
weights <- rbind(weights,multipliers)
effcrs <- rbind(effcrs , efficiency)
lambdas <- rbind(lambdas,duals[seq(1,N)])
}
}
Spotting the problem..
A quick search reveals that the rbind function might be at fault. This is located on this line:
f.con <- rbind(aux ,c(inputs[i,], rep(0,m)))
I tried to isolate the data from the loops to see what the problem is:
aux <- cbind(-1*inputs,outputs)
a <- c(inputs[1,])
b <- rep(0,m)
> aux
rmrf smb hml rmw cma Returns
1 -0.37307515 -0.7649879 -0.10060815 -0.54451252 -0.6711624 0.6013476
2 -0.03490672 -0.5090945 -0.15506487 -0.07619953 -0.6588986 0.8060717
3 -0.89555028 -0.9336533 -1.00000000 -1.00000000 0.0000000 0.1875005
4 -1.00000000 -0.3553407 -0.46429858 0.00000000 -0.6958302 0.6029719
5 -0.18015155 -0.6540004 -0.11080388 -0.50769953 -0.5678145 0.4703863
6 -0.28669170 -1.0000000 -0.07208032 -0.59060751 -0.9428626 0.6557732
7 -0.09398218 0.0000000 0.00000000 -0.46014869 -1.0000000 0.4142582
8 0.00000000 -0.2214541 -0.13240701 -0.45187122 -0.3757161 0.0000000
9 -0.26964529 -0.6605716 -0.05974205 -0.80169820 -0.7256523 0.2661122
10 -0.09006198 -0.5450869 -0.06616234 -0.42909484 -0.6367626 1.0000000
> a
$rmrf
[1] 0.3730752
$smb
[1] 0.7649879
$hml
[1] 0.1006082
$rmw
[1] 0.5445125
$cma
[1] 0.6711624
I also looked at this:
> identical(names(aux[1]), names(a[1]))
[1] TRUE
Column and row names are unimportant to me as long as the problem is calculated so I decided to try remove them. This one works but doesn't solve the problem.
rownames(testdfst) <- NULL
Looking at the contents of a and aux, maybe the problem lies with the column names.
colnames(testdfst) <- NULL does not work. It deletes everything in my data-frame. It could maybe... provide a solution to the problem if I can figure out how to remove the column names.
As you correctly identified, the following line is giving you the trouble:
i <- 1
f.con <- rbind(aux ,c(inputs[i,], rep(0,m))) # add LHS of bTz=1
# Error in match.names(clabs, nmi) : names do not match previous names
You can use the str function to see the structure of each element of this expression:
str(aux)
# 'data.frame': 10 obs. of 6 variables:
# $ rmrf : num -0.3731 -0.0349 -0.8956 -1 -0.1802 ...
# $ smb : num -0.765 -0.509 -0.934 -0.355 -0.654 ...
# $ hml : num -0.101 -0.155 -1 -0.464 -0.111 ...
# $ rmw : num -0.5445 -0.0762 -1 0 -0.5077 ...
# $ cma : num -0.671 -0.659 0 -0.696 -0.568 ...
# $ Returns: num 0.601 0.806 0.188 0.603 0.47 ...
str(inputs[i,])
# 'data.frame': 1 obs. of 5 variables:
# $ rmrf: num 0.373
# $ smb : num 0.765
# $ hml : num 0.101
# $ rmw : num 0.545
# $ cma : num 0.671
str(c(inputs[i,], rep(0, m)))
# List of 6
# $ rmrf: num 0.373
# $ smb : num 0.765
# $ hml : num 0.101
# $ rmw : num 0.545
# $ cma : num 0.671
# $ : num 0
Now you can see that the list you are trying to combine with rbind has different names from the data frame it's being combined with. Probably the simplest way to proceed would be to pass a vector as the new row instead of a list, which you can accomplish by converting inputs[i,] to a matrix with as.matrix:
str(c(as.matrix(inputs[i,]), rep(0, m)))
# num [1:6] 0.373 0.765 0.101 0.545 0.671 ...
This will cause the code to work without an error:
f.con <- rbind(aux, c(as.matrix(inputs[i,]), rep(0, m)))
A few unsolicited R coding tips -- instead of dim(x)[1] and dim(x)[2] to get the number of rows and columns, most would find it more readable to do nrow(x) and ncol(x). Also, building objects in a for loop by rbinding one row at a time can be very inefficient -- you can read more about that in the second circle of the R Inferno.

Return multiple lists in mapply [duplicate]

This question already has answers here:
Force mapply to return a list?
(2 answers)
Closed 8 years ago.
I have a function which I am applying across a list of data frames or matrices (df) using mapply. The function outputs four different types of data frames (ex. a:d) based on some criteria of transformation of the original data frame, but I am having issues as function will only let me output one. I have tried to collate them into a list, but when I run a simplified function, I get this.
df1<-matrix(rnorm(10), nrow=10, ncol=10)
df2<-df1
df<-list(df1, df2)
func<-function (x) {
a<-x*1
b<-x*2
c<-x*3
d<-x*4
return(list(a,b,c,d))
}
finalresult<-mapply(func, x=df)
str(finalresult)
List of 8
$ : num [1:10, 1:10] -3.211 -0.121 -0.2 0.491 1.118 ...
$ : num [1:10, 1:10] -6.422 -0.242 -0.4 0.982 2.235 ...
$ : num [1:10, 1:10] -9.633 -0.362 -0.6 1.473 3.353 ...
$ : num [1:10, 1:10] -12.844 -0.483 -0.8 1.964 4.471 ...
$ : num [1:10, 1:10] -3.211 -0.121 -0.2 0.491 1.118 ...
$ : num [1:10, 1:10] -6.422 -0.242 -0.4 0.982 2.235 ...
$ : num [1:10, 1:10] -9.633 -0.362 -0.6 1.473 3.353 ...
$ : num [1:10, 1:10] -12.844 -0.483 -0.8 1.964 4.471 ...
- attr(*, "dim")= int [1:2] 4 2
In this case you can see it is just giving me the list of the four output data frames (a:d) from df1 and then appends the next set of outputs from df2 right after. I want to have an output where each data frame transformation is put in a list where I can access it by category (ex. finalresults$a). Any help would be very much appreciated!
Thanks,
-Chelsea
How about:
Use mapply but don't simplify.
result1 <- mapply(func, x=df, SIMPLIFY=FALSE)
Iterate over the indices (this assumes both lists in the result are the same length); for each index i, use lapply to pull out the ith element of each list in result1.
result2 <- lapply(seq_along(result1[[1]]),
function(i) lapply(result1,"[[",i))
I tried a little bit to shorten/obfuscate this still further (i.e. remove the need to define an anonymous function), but with nested lapplys I can't quite figure out how to make it work.
Here is a solution that gives what you want. It basically names your list returns and then aggregates by the name.
df1<-matrix(rnorm(10), nrow=10, ncol=10)
df2<-df1
df<-list(df1, df2)
namesToReferTo = letters[1:4]
func<-function (x) {
a<-x*1; b<-x*2; c<-x*3; d<-x*4
retList = list(a,b,c,d)
names(retList)=namesToReferTo
return(retList)
}
finalresult = lapply(namesToReferTo, function(y) { lapply(lapply(df,func), function(x) { x[[y]] }) } )
# alternatively: finalresult = lapply(namesToReferTo, function(y) { Map(function(x) { return(x[[y]])} , lapply(df,func)) } )
names(finalresult) = namesToReferTo
str(finalresult)
finalresult$b

Resources