Error in match.names(clabs, nmi) - Linear programming model in R
I am applying the CCR Data Envelopment Analysis model to benchmark stock data. To do that I am running R code from a DEA paper published here. The paper comes with step-by-step instructions on how to implement the model in R, and gives the mathematical formulation of the model (shown there as an image).
Finding the model I needed already implemented seemed too good to be true, and indeed I get this error when I run it:
Error in match.names(clabs, nmi) : names do not match previous names
Traceback:
4 stop("names do not match previous names")
3 match.names(clabs, nmi)
2 rbind(deparse.level, ...)
1 rbind(aux, c(inputs[i, ], rep(0, m)))
My test data looks as follows:
> dput(testdfst)
structure(list(Name = structure(1:10, .Label = c("Stock1", "Stock2",
"Stock3", "Stock4", "Stock5", "Stock6", "Stock7", "Stock8", "Stock9",
"Stock10"), class = "factor"), Date = structure(c(14917, 14917,
14917, 14917, 14917, 14917, 14917, 14917, 14917, 14917), class = "Date"),
`(Intercept)` = c(0.454991569278089, 1, 0, 0.459437188169979,
0.520523252955415, 0.827294243132907, 0.642696631099892,
0.166219881886161, 0.086341470900152, 0.882092217743293),
rmrf = c(0.373075150411683, 0.0349067218712968, 0.895550280607866,
1, 0.180151549474574, 0.28669170468735, 0.0939821798173586,
0, 0.269645291515763, 0.0900619760898984), smb = c(0.764987877309785,
0.509094491489323, 0.933653313048327, 0.355340700554647,
0.654000372286503, 1, 0, 0.221454091364611, 0.660571586102851,
0.545086931342479), hml = c(0.100608151187926, 0.155064872867367,
1, 0.464298576152336, 0.110803875258027, 0.0720803195598597,
0, 0.132407005239869, 0.059742053684015, 0.0661623383303703
), rmw = c(0.544512524466665, 0.0761995312858816, 1, 0, 0.507699534880555,
0.590607506295898, 0.460148690870041, 0.451871218073951,
0.801698199214685, 0.429094840372901), cma = c(0.671162426988512,
0.658898571758625, 0, 0.695830176886926, 0.567814542084284,
0.942862571603074, 1, 0.37571611336359, 0.72565234813082,
0.636762557753099), Returns = c(0.601347600017365, 0.806071701848376,
0.187500487065719, 0.602971876359073, 0.470386289298666,
0.655773224143057, 0.414258177255333, 0, 0.266112191477882,
1)), .Names = c("Name", "Date", "(Intercept)", "rmrf", "smb",
"hml", "rmw", "cma", "Returns"), row.names = c("Stock1.2010-11-04",
"Stock2.2010-11-04", "Stock3.2010-11-04", "Stock4.2010-11-04",
"Stock5.2010-11-04", "Stock6.2010-11-04", "Stock7.2010-11-04",
"Stock8.2010-11-04", "Stock9.2010-11-04", "Stock10.2010-11-04"
), class = "data.frame")
And the linear programming code is this:
library(lpSolve)                       # provides lp()
namesDMU <- testdfst[1]
inputs  <- testdfst[c(4, 5, 6, 7, 8)]
outputs <- testdfst[9]
N <- dim(testdfst)[1]                  # number of DMUs
s <- dim(inputs)[2]                    # number of inputs
m <- dim(outputs)[2]                   # number of outputs
f.rhs <- c(rep(0, N), 1)               # RHS of the constraints
f.dir <- c(rep("<=", N), "=")          # directions of the constraints
aux <- cbind(-1 * inputs, outputs)     # matrix of constraint coefficients in (6)
for (i in 1:N) {
  f.obj <- c(rep(0, s), outputs[i, ])               # objective function coefficients
  f.con <- rbind(aux, c(inputs[i, ], rep(0, m)))    # add LHS of bTz = 1
  results <- lp("max", f.obj, f.con, f.dir, f.rhs, scale = 1, compute.sens = TRUE)  # solve the LP
  multipliers <- results$solution                   # input and output weights
  efficiency  <- results$objval                     # efficiency score
  duals       <- results$duals                      # shadow prices
  if (i == 1) {
    weights <- multipliers
    effcrs  <- efficiency
    lambdas <- duals[seq(1, N)]
  } else {
    weights <- rbind(weights, multipliers)
    effcrs  <- rbind(effcrs, efficiency)
    lambdas <- rbind(lambdas, duals[seq(1, N)])
  }
}
Spotting the problem
A quick search suggests that the rbind call is at fault, on this line:
f.con <- rbind(aux ,c(inputs[i,], rep(0,m)))
I tried to isolate the data from the loop to see what the problem is:
aux <- cbind(-1*inputs,outputs)
a <- c(inputs[1,])
b <- rep(0,m)
> aux
rmrf smb hml rmw cma Returns
1 -0.37307515 -0.7649879 -0.10060815 -0.54451252 -0.6711624 0.6013476
2 -0.03490672 -0.5090945 -0.15506487 -0.07619953 -0.6588986 0.8060717
3 -0.89555028 -0.9336533 -1.00000000 -1.00000000 0.0000000 0.1875005
4 -1.00000000 -0.3553407 -0.46429858 0.00000000 -0.6958302 0.6029719
5 -0.18015155 -0.6540004 -0.11080388 -0.50769953 -0.5678145 0.4703863
6 -0.28669170 -1.0000000 -0.07208032 -0.59060751 -0.9428626 0.6557732
7 -0.09398218 0.0000000 0.00000000 -0.46014869 -1.0000000 0.4142582
8 0.00000000 -0.2214541 -0.13240701 -0.45187122 -0.3757161 0.0000000
9 -0.26964529 -0.6605716 -0.05974205 -0.80169820 -0.7256523 0.2661122
10 -0.09006198 -0.5450869 -0.06616234 -0.42909484 -0.6367626 1.0000000
> a
$rmrf
[1] 0.3730752
$smb
[1] 0.7649879
$hml
[1] 0.1006082
$rmw
[1] 0.5445125
$cma
[1] 0.6711624
I also looked at this:
> identical(names(aux[1]), names(a[1]))
[1] TRUE
Column and row names are unimportant to me as long as the problem is solved, so I tried removing them. The following runs, but does not fix the error:
rownames(testdfst) <- NULL
Looking at the contents of a and aux, maybe the problem lies with the column names.
colnames(testdfst) <- NULL does not work: it deletes everything in my data frame. It could maybe... provide a solution to the problem if I can figure out how to remove the column names properly.
As you correctly identified, the following line is giving you the trouble:
i <- 1
f.con <- rbind(aux ,c(inputs[i,], rep(0,m))) # add LHS of bTz=1
# Error in match.names(clabs, nmi) : names do not match previous names
You can use the str function to see the structure of each element of this expression:
str(aux)
# 'data.frame': 10 obs. of 6 variables:
# $ rmrf : num -0.3731 -0.0349 -0.8956 -1 -0.1802 ...
# $ smb : num -0.765 -0.509 -0.934 -0.355 -0.654 ...
# $ hml : num -0.101 -0.155 -1 -0.464 -0.111 ...
# $ rmw : num -0.5445 -0.0762 -1 0 -0.5077 ...
# $ cma : num -0.671 -0.659 0 -0.696 -0.568 ...
# $ Returns: num 0.601 0.806 0.188 0.603 0.47 ...
str(inputs[i,])
# 'data.frame': 1 obs. of 5 variables:
# $ rmrf: num 0.373
# $ smb : num 0.765
# $ hml : num 0.101
# $ rmw : num 0.545
# $ cma : num 0.671
str(c(inputs[i,], rep(0, m)))
# List of 6
# $ rmrf: num 0.373
# $ smb : num 0.765
# $ hml : num 0.101
# $ rmw : num 0.545
# $ cma : num 0.671
# $ : num 0
Now you can see that the list you are trying to combine with rbind has different names from the data frame it's being combined with. Probably the simplest way to proceed would be to pass a vector as the new row instead of a list, which you can accomplish by converting inputs[i,] to a matrix with as.matrix:
str(c(as.matrix(inputs[i,]), rep(0, m)))
# num [1:6] 0.373 0.765 0.101 0.545 0.671 ...
This will cause the code to work without an error:
f.con <- rbind(aux, c(as.matrix(inputs[i,]), rep(0, m)))
A few unsolicited R coding tips -- instead of dim(x)[1] and dim(x)[2] to get the number of rows and columns, most would find it more readable to do nrow(x) and ncol(x). Also, building objects in a for loop by rbinding one row at a time can be very inefficient -- you can read more about that in the second circle of the R Inferno.
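To illustrate that last tip with a toy example (made-up data, not the DEA model itself): collect each iteration's row in a list and bind once at the end, rather than rbind-ing inside the loop.

```r
# Growing an object with rbind() inside a loop copies it on every iteration.
# Collecting rows in a pre-allocated list and binding once avoids that cost.
N <- 1000
rows <- vector("list", N)
for (i in seq_len(N)) {
  rows[[i]] <- c(i, i^2, sqrt(i))   # stand-in for one iteration's results
}
out <- do.call(rbind, rows)         # single bind at the end
nrow(out)                           # note: nrow(), not dim(out)[1]
```

The loop body stays the same; only the accumulation changes, and the final do.call(rbind, ...) is a single allocation.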
Related
I'm missing the second line in a ggplot; there should be both test and train lines present
I'm trying to use ggplot2 in R to graph train and test curves for the iterative error rates of a neural network. There should be two lines, but I'm only seeing the test line. Does anyone know what happened? When I use head(error_df), every row is labelled as test for some reason.
Edit: even with just error_df, without any subsets, it's still not showing the line for the training set's error; this also happens with various ranges such as error_df[2500:5000,] and error_df[7500:10000,].
Here's the ggplot graph:
Here's the code, and this is a link to a public google spreadsheet of the data:
library(Rcpp)
library(RSNNS)
library(ggplot2)
library(plotROC)
library(tidyr)
setwd("**set working directory**")
data <- read.csv("WDBC.csv", header = T)
data <- data[, 1:4]
data <- scale(data)  # normalizes the data
numHneurons3 = 3
DecTargets = decodeClassLabels(data[, 4])
train.test3 <- splitForTrainingAndTest(data, DecTargets, ratio = 0.50)  # split
model3_02 <- mlp(train.test3$inputsTrain, train.test3$targetsTrain,     # build model3
                 size = numHneurons3, learnFuncParams = c(0.02), maxit = 10000,
                 inputsTest = train.test3$inputsTest,
                 targetsTest = train.test3$targetsTest)
#--------------------------------------
# GGPlots of the Iterative Error:
#--------------------------------------
str(model3_02)
test_error <- model3_02$IterativeTestError
train_error <- model3_02$IterativeFitError
error_df <- data.frame(iter = c(seq_along(test_error), seq_along(train_error)),
                       Error = c(test_error, train_error),
                       type = c(rep("test", length(test_error)),
                                rep("train", length(train_error))))
ggplot(error_df[5000:10000,], aes(iter, Error, color = type, each = length(test_error))) +
  geom_line()
Here's also a snippet of the data, model, and data frame:
> head(data, 10)
       PatientID     radius    texture  perimeter
 [1,] -0.2361973  1.0960995 -2.0715123  1.26881726
 [2,] -0.2361956  1.8282120 -0.3533215  1.68447255
 [3,]  0.4313615  1.5784992  0.4557859  1.56512598
 [4,]  0.4317407 -0.7682333  0.2535091 -0.59216612
 [5,]  0.4318215  1.7487579 -1.1508038  1.77501133
 [6,] -0.2361855 -0.4759559 -0.8346009 -0.38680772
 [7,] -0.2361809  1.1698783  0.1605082  1.13712450
 [8,]  0.4326197 -0.1184126  0.3581350 -0.07280278
 [9,] -0.2361759 -0.3198854  0.5883121 -0.18391855
[10,]  0.4329621 -0.4731182  1.1044669 -0.32919213
> str(model3_02)
List of 17
 $ nInputs            : int 4
 $ maxit              : num 10000
 $ IterativeFitError  : num [1:10000] 18838 4468 2365 1639 1278 ...
 $ IterativeTestError : num [1:10000] 7031 3006 1916 1431 1161 ...
 $ fitted.values      : num [1:284, 1:522] 0.00386 0.00386 0.00387 0.00387 0.00386 ...
 $ fittedTestValues   : num [1:285, 1:522] 0.00387 0.00387 0.00387 0.00387 0.00387 ...
 $ nOutputs           : int 522
 - attr(*, "class")= chr [1:2] "mlp" "rsnns"
> head(error_df)
  iter     Error type
1    1 7031.3101 test
2    2 3006.4253 test
3    3 1915.8997 test
4    4 1430.6152 test
5    5 1160.6987 test
6    6  990.2686 test
You created a data frame (error_df) by concatenating the two error vectors into one column, so the test rows (1-10000) are followed by the train rows (10001-20000). By limiting your plot to rows 5000 to 10000 you keep only test rows. Selecting the matching rows of both halves shows both curves:
ggplot(error_df[c(5000:10000, 15000:20000),], aes(iter, Error, color = type, each = length(test_error))) +
  geom_line()
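A more robust alternative (a sketch on made-up stand-in data, since the original error_df isn't reproducible here) is to subset on the iter value rather than on row positions, which keeps the matching rows of both groups automatically:

```r
library(ggplot2)

# Stand-in for error_df: two decaying error curves in long format
error_df <- data.frame(
  iter  = rep(1:10000, times = 2),
  Error = c(exp(-(1:10000) / 2000), exp(-(1:10000) / 1500)),
  type  = rep(c("test", "train"), each = 10000)
)

# Filtering on iter selects iterations 5000-10000 of BOTH curves,
# regardless of where the rows sit in the stacked data frame.
ggplot(subset(error_df, iter >= 5000), aes(iter, Error, color = type)) +
  geom_line()
```

This way the subset stays correct even if the two curves have different lengths or the stacking order changes.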
How to bind rows of svymean output?
In order to apply weights etc. on survey data I am working with the survey package. Included is a wonderful function svymean() which gives me neat pairs of mean and standard error. I now have several of these pairs and want to combine them with rbind() into a data.frame.
library(survey)
data(fpc)
fpc.w1 <- with(fpc, svydesign(ids = ~0, weights = weight, data = fpc))
fpc.w2 <- with(fpc, svydesign(ids = stratid, weights = weight, data = fpc))
(msd.1 <- svymean(fpc$x, fpc.w1))
#        mean     SE
# [1,] 5.4481 0.7237
(msd.2 <- svymean(fpc$x, fpc.w2))
#        mean     SE
# [1,] 5.4481 0.5465
rbind(msd.1, msd.2)
#           [,1]
# msd.1 5.448148
# msd.2 5.448148
As one can see, the SE is missing. Examining the object yields the following:
class(msd.1)
# [1] "svystat"
str(msd.1)
# Class 'svystat'  atomic [1:1] 5.45
#   ..- attr(*, "var")= num [1, 1] 0.524
#   .. ..- attr(*, "dimnames")=List of 2
#   .. .. ..$ : NULL
#   .. .. ..$ : NULL
#   ..- attr(*, "statistic")= chr "mean"
So I did some guesswork:
msd.1$mean
# Error in msd.1$mean : $ operator is invalid for atomic vectors
msd.1$SE
# Error in msd.1$SE : $ operator is invalid for atomic vectors
msd.1[2]
# [1] NA
msd.1[1, 2]
# Error in msd.1[1, 2] : incorrect number of dimensions
Included in the package is a function called SE() which yields:
SE(msd.1)
#          [,1]
# [1,] 0.723725
With this I could finally put together a solution to bind these rows:
t(data.frame(msd.1 = c(msd.1, SE(msd.1)),
             msd.2 = c(msd.2, SE(msd.2)),
             row.names = c("mean", "SD")))
#           mean        SD
# msd.1 5.448148 0.7237250
# msd.2 5.448148 0.5465021
Do I really have to take this much pain with the package just to bind rows, or am I missing something?
You can just coerce the svymean output to a data frame, then rbind them together:
do.call(rbind, lapply(list(msd.1, msd.2), as.data.frame))
#       mean        SE
# 1 5.448148 0.7237250
# 2 5.448148 0.5465021
If you want row names, name the items in the list; do.call(rbind, ...) picks the names up automatically:
do.call(rbind, lapply(list(msd.1 = msd.1, msd.2 = msd.2), as.data.frame))
#           mean        SE
# msd.1 5.448148 0.7237250
# msd.2 5.448148 0.5465021
Or a tidyverse option would be:
library(tidyverse)
list(msd.1, msd.2) %>% map_df(as_tibble)
# A tibble: 2 x 2
#       mean        SE
#      <dbl>     <dbl>
# 1 5.448148 0.7237250
# 2 5.448148 0.5465021
Error using glmnet on binomial data
I imported some data as follows:
library(dplyr)  # for select()
surv <- read.table("http://www.stat.ufl.edu/~aa/glm/data/Student_survey.dat", header = T)
x <- as.matrix(select(surv, -ab))
y <- as.matrix(select(surv, ab))
glmnet::cv.glmnet(x, y, alpha = 1, family = "binomial", type.measure = "auc")
and I am getting the following error:
NAs introduced by coercion
Error in lognet(x, is.sparse, ix, jx, y, weights, offset, alpha, nobs, :
  NA/NaN/Inf in foreign function call (arg 5)
What is a good fix for this?
The documentation of the glmnet package has the information that you need:
surv <- read.table("http://www.stat.ufl.edu/~aa/glm/data/Student_survey.dat",
                   header = T, stringsAsFactors = T)
x <- surv[, -which(colnames(surv) == 'ab')]  # remove the 'ab' column
y <- surv[, 'ab']  # the 'binomial' family takes a factor as input (too)
# separate the factor from the numeric columns
xfact <- sapply(1:ncol(x), function(y) is.factor(x[, y]))
# one option is to build dummy variables from the factors
# (the other option is to convert to numeric)
xfactCols <- model.matrix(~ . - 1, data = x[, xfact])
xall <- as.matrix(cbind(x[, !xfact], xfactCols))  # cbind() numeric and dummy columns
fit <- glmnet::cv.glmnet(xall, y, alpha = 1, family = "binomial", type.measure = "auc")  # runs error-free
str(fit)
# List of 10
#  $ lambda : num [1:89] 0.222 0.202 0.184 0.168 0.153 ...
#  $ cvm    : num [1:89] 1.12 1.11 1.1 1.07 1.04 ...
#  $ cvsd   : num [1:89] 0.211 0.212 0.211 0.196 0.183 ...
#  $ cvup   : num [1:89] 1.33 1.32 1.31 1.27 1.23 ...
#  $ cvlo   : num [1:89] 0.908 0.9 0.89 0.874 0.862 ...
#  $ nzero  : Named int [1:89] 0 2 2 3 3 3 4 4 5 6 ...
#  ...
I have come across the same problem of mixed numeric and character/factor data types. For converting the predictors, I recommend using a function that comes with the glmnet package for exactly this mixed-type problem: glmnet::makeX(). It handles the dummy creation and is even able to perform a simple imputation in case of missing data.
x <- glmnet::makeX(surv[, -which(colnames(surv) == 'ab')])
or, more tidy-ish:
library(tidyverse)
x <- surv %>%
  select(-ab) %>%
  glmnet::makeX()
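As a quick illustration of what makeX() does (a toy sketch on made-up data, not the survey dataset): it dummy-codes factor columns and, with na.impute = TRUE, mean-imputes missing numeric values.

```r
library(glmnet)

# Tiny mixed-type frame with one missing value
df <- data.frame(num = c(1.5, NA, 3.0),
                 grp = factor(c("a", "b", "a")))

# makeX() returns a numeric matrix: grp is dummy-coded into indicator
# columns and the NA in num is imputed with the column mean
x <- glmnet::makeX(df, na.impute = TRUE)
x
```

The resulting matrix can be passed straight to cv.glmnet().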
Doing multiple t-tests within a data frame
I would like to run a series of t-tests on a dataset: specifically, I'd like to do a separate t-test of rats 1-5 vs 6-10 for every gene. I've tried to do this:
goi2 <- (goi[-1])
control <- goi2[1:5,]
stress <- goi2[6:10,]
for (i in 1:92){
  x <- control[,i]
  y <- stress[,i]
  x <= t.test(x, y)
  # print(x=i)
}
but I get this error:
Error: Can't use matrix or array for column indexing
I've tried a few variations of this but can't figure out why it won't work. I'm a complete newb to R, but not to programming in general.
Dataset:
Gene,Rat_1,Rat_2,Rat_3,Rat_4,Rat_5,Rat_6,Rat_7,Rat_8,Rat_9,Rat_10
Oprd1,2.746,1.387,2.25,3.363,3.191,2.432,1.985,1.75,2.752,1.771
Grin2a,3.134,2.644,2.962,5.168,2.484,3.54,2.596,1.535,3.197,2.232
Grin2d(2),4.496,5.528,2.631,4.684,3.934,6.047,0.98,0.077,4.381,2.327
Oprm1,1.998,1.804,1.611,1.712,3.672,3.215,0.249,1.248,1.758,2.671
Scn2b,137.35,97.158,113.65,141.93,77.295,133.02,88.872,75.586,108.96,97.626
Ntf3,0.989,1.835,1.604,1.133,0.889,0.782,0.918,2.241,2.216,3.921
Scn1a(2),9.224,7.369,10.145,14.242,17.262,11.535,8.144,7.166,13.625,6.604
Ntrk2(2),21.929,17.018,14.799,19.783,14.632,24.421,14.235,9.344,16.658,17.913
Cacna1c,4.585,3.637,3.948,4.135,3.403,5.381,4.193,3.162,3.455,3.695
Grin2b,3.273,2.57,2.101,2.922,1.826,3.338,2.121,1.416,2.973,2.005
Scn9a(2),0.319,0,0,0.453,0.434,0.376,0,0,0.346,0.469
Gria4(2),10.867,8.156,7.889,9.236,14.134,10.574,8.404,8.179,9.442,7.982
Cacna1e(2),1.805,1.783,2.045,1.968,1.405,1.807,0.973,0.993,0.857,1.769
Gria3,4.237,4.188,3.901,5.221,6.439,3.993,3.421,4.012,4.452,4.631
Gria1,8.284,7.942,7.557,12.001,3.976,9.472,7.653,4.16,7.971,5.381
Kcnj5,3.089,2.046,3.332,3.392,2.168,3.786,3.865,1.414,2.37,2.009
Cacna1b(2),11.071,8.716,8.246,9.594,7.189,11.62,6.028,4.481,9.307,9.074
Scn5a,1.301,1.017,0.714,1.401,0.449,1.183,1.065,0.292,0.823,0.714
Scn2a(2),3.286,2.119,2.257,2.024,1.902,3.441,1.327,1.072,2.576,2.09
Scn10a,0.037,0.069,0.087,0.076,0.082,0.095,0.052,0.019,0.078,0.045
Cacna1g(2),6.543,5.095,5.463,8.404,3.084,7.359,5.746,4.682,5.969,4.315
Cacna1e(3),5.37,4.002,3.313,4.803,2.665,5.623,3.296,1.953,3.827,4.092
Bdnf(4),0.869,0.509,0.996,1.032,0.256,0.742,0.498,0.531,0.994,0.473
Scn4a,0.284,0.278,0.359,0.45,0.761,0.31,0.319,0.27,0.366,0.273
Scn5a(2),0.256,0.477,0.587,0.283,0,0.564,0.044,0.023,0.204,0.15
Gabra1,51.019,44.3,57.609,81.522,40.853,64.921,68.263,31.766,58.006,39.518
Scn8a,6.854,14.666,5.416,12.347,4.823,14.935,7.014,16.684,9.686,17.44
Kcnj3,17.047,14.3,13.741,14.363,14.01,13.268,12.172,10.718,15.374,13.048
Slc6a2,107.9,69.941,91.704,36.411,112.57,114.5,23.398,63.848,53.323,135.26
Grin3a,6.952,5.676,7.301,12.557,3.65,10.628,9.783,4.286,8.015,4.499
Cnr1,20.261,16.981,19.996,26.469,12.709,24.705,25.548,10.61,19.746,14.64
Scn1b,13.732,15.763,5.03,20.68,17.788,14.959,16.298,24.682,22.477,15.117
Gria1(2),2.709,3.667,2.51,2.9,2.134,1.93,4.308,2.59,2.487,1.742
Scn3a(2),1.439,2.614,0,0.352,0,1.358,1.027,0,0.452,0.586
Scn11a,0.058,0.292,0.036,0.127,0.058,0.06,0.074,0.164,0.047,0.05
Gria1(3),25.283,17.779,22.725,32.705,8.823,28.727,26.915,12.876,23.545,17.879
Cacna1f,0.056,0.067,0.14,0.123,0.04,0.182,0.072,0.083,0.077,0.097
Cacna1a,20.791,19.816,17.613,21.663,15.697,22.824,16.737,16.719,16.604,20.469
Gria4,8.51,7.107,8.342,9.338,7.46,8.877,7.673,6.341,8.393,9.555
Scn8a,6.738,14.706,4.172,11.467,2.552,10.757,6.021,15.222,3.588,11.333
Grin2d,20.398,15.794,22.521,24.693,16.97,24.108,24.19,21.016,18.314,19.044
Gria3(2),15.301,13.087,13.918,14.433,12.282,14.914,12.198,11.602,13.738,15.481
Oprk1(2),6.66,4.97,7.604,10.281,2.151,10.462,10.278,1.525,6.869,4.902
Scn1b(3),46.553,42.795,49.498,55.558,64.101,38.178,44.1,59.033,43.837,39.382
Cacna1h,9.145,7.295,8.7,8.028,5.415,10.799,8.21,6.332,8.455,7.683
Scn2a,36.803,29.975,30.609,38.334,19.053,39.127,31.146,23.066,30.896,32.345
Cacna1g,5.489,5.213,6.24,7.896,3.97,4.876,6.283,5.464,6.08,3.692
Ntrk2(3),147.81,152.45,153.46,136.09,181.1,156.85,219.8,164.53,156.64,147.92
Scn1a,9.222,9.162,9.659,13.83,12.679,8.088,11.45,10.406,9.503,6.827
Grin1(3),69.943,68.01,76.358,81.029,63.692,83.424,70.981,80.088,69.821,70.764
Grin3b(2),2.065,1.265,1.45,1.576,3.875,1.441,1.822,1.964,2.286,0.965
Gabra2(2),2.268,1.251,1.638,2.844,2.93,2.934,3.725,1.724,1.455,2.674
Scn1b2(2),161.76,164.24,213.24,209.19,235.38,172.98,207.33,216.96,198.26,130.93
Oprm1(2),4.046,5.181,2.362,1.925,0.806,2.232,1.178,1.491,3.259,3.751
Cacna1c(3),0.077,0.194,0.23,0,0.132,0.127,0,0.035,0.09,0.092
Ntrk2,27.139,26.028,23.881,27.22,22.259,30.728,22.381,19.782,24.704,30.85
Cacna1d(2),2.126,2.263,2.038,2.1,1.995,2.966,1.943,2.01,2.317,2.214
Scn3a,21.272,16.356,16.245,14.875,11.825,19.753,10.994,11.08,16.905,19.832
Grin1(2),76.771,65.788,66.059,78.716,33.91,88.228,73.859,47.717,70.674,61.275
Grina,672.31,705.45,679.04,623.4,597.51,742.12,619.74,662.95,665.18,781.29
Cacna1e,2.448,1.981,1.506,2.003,1.318,3.052,1.953,0.814,2.17,2.482
Bdnf(2),1.853,2.128,2.553,1.996,0.663,2.5,2.385,0.468,1.922,1.481
Fos,18.402,24.653,23.038,20.615,8.027,38.444,20.836,11.756,20.823,20.296
Scn4b,23.772,27.874,25.388,25.109,51.926,20.291,25.521,28.701,30.256,17.344
Slc6a2(3),480.05,455.95,307.6,186.82,376.96,447.61,123.5,409.58,347.86,681.04
Ntf3(3),1.87,3.561,2.421,3.133,2.134,2.327,1.712,2.32,1.735,3.497
Bdnf(3),0.319,0.09,0.665,0.187,0.107,0.185,0.394,0.264,0.21,0.345
Scn3b,112.86,115.29,99.711,96.245,71.741,122.34,85.875,88.906,102.88,132.13
Grin2c,14.224,15.944,15.473,21.936,32.732,13.98,20.168,23.958,14.541,17.402
Gabrd,0.701,3.542,0.532,5.222,5.593,0.133,2.954,0.961,0.506,2.152
Cacna1b,16.935,15.764,14.475,15.639,10.655,19.408,14.115,14.079,14.26,16.737
Slc18a2,433.92,429.22,293.57,164.53,287.51,370.72,93.973,283.12,321.49,551.07
Cacnb1(2),16.456,5.099,16.969,4.469,12.471,5.143,14.017,10.049,17.537,4.26
Gabrg1,40.614,37.373,43.103,39.253,47.768,41.202,51.665,37.74,42.17,39.097
Grin1,1.235,0.812,0.909,1.605,0.513,1.371,1.596,1.346,1.213,0.922
Slc6a2(2),138.21,136.75,34.759,38.393,25.89,87.126,0,0.467,99.703,137.66
Galr3,2.691,2.51,2.517,4.446,0.727,2.933,4.041,2.08,2.638,1.456
Oprm1(3),7.273,7.676,7.08,6.196,5.515,9.023,2.57,4.8,7.699,10.471
Gabrq,70.623,67.728,51.095,42.456,43.156,77.924,28.63,32.975,54.192,87.697
Gria4(3),25.846,26.045,24.37,37.866,18.037,26.907,31.423,21.292,26.795,24.642
Cacna1c(2),0.644,0.894,0.831,1.084,0.721,1.026,0.817,0.371,1.333,1.015
Cacna1d(3),0.299,0.406,0.127,0.319,0.319,0.231,0.178,0.075,0.18,0.405
Cacnb1,47.24,51.505,42.702,48.718,33.28,60.334,38.611,41.827,40.352,56.132
Scn7a,2.351,2.38,2.114,1.96,0.316,2.647,1.945,1.219,2.559,1.498
Cacna1d,2.661,2.733,2.714,2.649,2.403,2.923,3.216,2.768,2.401,2.302
Gabra2,25.209,26.731,23.249,25.599,20.17,22.928,24.072,18.664,23.808,23.306
Scn9a,3.209,3.106,3.212,3.206,1.094,3.35,3.994,1.934,2.883,2.046
Ntf3(2),2.347,2.282,2.112,1.025,1.762,2.029,0.501,1.652,2.717,1.982
Gria2,12.726,12.997,12.74,15.615,7.156,14.375,13.387,11.682,12.968,11.332
Bdnf,0.703,0.777,1.034,0.571,0.166,1.164,0.549,0.325,0.801,1.12
Gria2(2),17.769,17.694,16.62,18.603,11.295,19.926,18.044,13.594,16.946,17.712
Bdnf(5),1.321,2.152,1.882,2.397,1.598,3.072,3.038,1.53,2.04,1.464
Here's a working sample using just base R. Using your goi:
str(goi)
# 'data.frame': 92 obs. of 11 variables:
#  $ Gene  : chr "Oprd1" "Grin2a" "Grin2d(2)" "Oprm1" ...
#  $ Rat_1 : num 2.75 3.13 4.5 2 137.35 ...
#  $ Rat_2 : num 1.39 2.64 5.53 1.8 97.16 ...
#  $ Rat_3 : num 2.25 2.96 2.63 1.61 113.65 ...
#  $ Rat_4 : num 3.36 5.17 4.68 1.71 141.93 ...
#  $ Rat_5 : num 3.19 2.48 3.93 3.67 77.3 ...
#  $ Rat_6 : num 2.43 3.54 6.05 3.21 133.02 ...
#  $ Rat_7 : num 1.985 2.596 0.98 0.249 88.872 ...
#  $ Rat_8 : num 1.75 1.535 0.077 1.248 75.586 ...
#  $ Rat_9 : num 2.75 3.2 4.38 1.76 108.96 ...
#  $ Rat_10: num 1.77 2.23 2.33 2.67 97.63 ...
control <- goi[, 2:6]
stress  <- goi[, 7:11]
Now, instead of using a for loop and processing each return as we calculate it, let's calculate everything, store the complete object for each test in a list, and preserve the opportunity to grab whatever we want from all tests afterwards.
results <- lapply(seq_len(nrow(goi)), function(i) t.test(control[i, ], stress[i, ]))
length(results)
# [1] 92
Each element of results is the return value from a single call of t.test.
results[[1]]
#         Welch Two Sample t-test
# data:  control[i, ] and stress[i, ]
# t = 1.1034, df = 6.2218, p-value = 0.3107
# alternative hypothesis: true difference in means is not equal to 0
# 95 percent confidence interval:
#  -0.5386851  1.4374851
# sample estimates:
# mean of x mean of y
#    2.5874    2.1380
You can access any component of the test results:
names(results[[1]])
# [1] "statistic"   "parameter"   "p.value"     "conf.int"    "estimate"
# [6] "null.value"  "alternative" "method"      "data.name"
head( sapply(results, `[[`, "p.value") )
# [1] 0.3107098 0.3083295 0.2626753 0.6245368 0.4406157 0.2800657
head( t(sapply(results, `[[`, "conf.int")) )
#             [,1]       [,2]
# [1,]  -0.5386851  1.4374851
# [2,]  -0.7513650  2.0681650
# [3,]  -1.5018657  4.4862657
# [4,]  -1.1880098  1.8504098
# [5,] -23.5402499 48.8678499
# [6,]  -2.2762668  0.8250668
NB: one of R's many nuances is that the *apply family will return a matrix that some might think is transposed from what it should be. Because of this, calls that return a matrix benefit from being sandwiched in t(...). (This is a great opportunity to press the "I Believe" button and move on.)
You can combine all of these results into a single data.frame with something like:
namefunc <- function(x, nameroot) {
  dimnames(x) <- list(NULL, paste0(nameroot, seq_len(ncol(x))))
  x
}
(That was a small helper function to make the following slightly easier to read. It's a very naïve naming convention, used only to keep the columns unique for now.)
test_results <- cbind.data.frame(
  statistic = sapply(results, `[[`, "statistic"),
  p.value   = sapply(results, `[[`, "p.value"),
  parameter = sapply(results, `[[`, "parameter"),
  namefunc( t(sapply(results, `[[`, "conf.int")), "conf" ),
  namefunc( t(sapply(results, `[[`, "estimate")), "est" )
)
head(test_results)
#    statistic   p.value parameter       conf1      conf2     est1     est2
# 1  1.1033554 0.3107098  6.221806  -0.5386851  1.4374851   2.5874   2.1380
# 2  1.0948456 0.3083295  7.312678  -0.7513650  2.0681650   3.2784   2.6200
# 3  1.2480711 0.2626753  5.480699  -1.5018657  4.4862657   4.2546   2.7624
# 4  0.5107431 0.6245368  7.337202  -1.1880098  1.8504098   2.1594   1.8282
# 5  0.8134064 0.4406157  7.633546 -23.5402499 48.8678499 113.4766 100.8128
# 6 -1.2161356 0.2800657  4.824393  -2.2762668  0.8250668   1.2900   2.0156
There is definitely room here to use packages from the tidyverse, as RobertMc suggested. For that, I recommend dplyr and tidyr, though perhaps broom has utility here as well.
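On the broom note: broom::tidy() flattens an htest object into a one-row data frame, so (assuming broom is installed) the whole results list collapses via do.call(rbind, lapply(results, broom::tidy)). A minimal sketch on made-up vectors:

```r
library(broom)

# tidy() turns one t.test result into a one-row data frame with columns
# such as estimate, statistic, p.value, conf.low and conf.high.
x <- c(2.7, 1.4, 2.3, 3.4, 3.2)
y <- c(2.4, 2.0, 1.8, 2.8, 1.8)
tt <- tidy(t.test(x, y))
tt
```

Applied per gene, this replaces the namefunc() bookkeeping with consistently named columns.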
Neural network: in neurons[[i]] %*% weights[[i]] : requires numeric/complex matrix/vector arguments
I am trying to run the neural network method on my data and I am stuck. I always get the message:
in neurons[[i]] %*% weights[[i]] : requires numeric/complex matrix/vector arguments
The facts are:
I am reading my data using read.csv. I am adding a link to a file with some of my data, I hope it helps: https://www.dropbox.com/s/b1btx0cnhmj229p/collineardata0.4%287.2.2017%29.csv?dl=0
I have no NA in my data (I checked twice).
The outcome of str(data) is:
'data.frame': 20 obs. of 457 variables:
 $ X300.5_alinine.sulphate  : num 0.351 0.542 0.902 0.656 1 ...
 $ X300.5_bromocresol.green : num 0.435 0.603 0.749 0.314 0.922 ...
 $ X300.5_bromophenol.blue  : num 0.415 0.662 0.863 0.345 0.784 ...
 $ X300.5_bromothymol.blue  : num 0.2365 0.0343 0.4106 0.3867 0.8037 ...
 $ X300.5_chlorophenol.red  : num 0.465 0.1998 0.7786 0.0699 1 ...
 $ X300.5_cresol.red        : num 0.534 0.311 0.678 0.213 0.821 ...
(continued)
I have tried to use model.matrix. The code was tried on different datasets (i.e. iris) and it was fine. Can anyone please try it and suggest what is wrong with my data / data reading? The code is:
require(neuralnet)
require(MASS)
require(grid)
require(nnet)
# READ IN DATA
data <- read.table("data.csv", sep = ",", dec = ".", head = TRUE)
dim(data)
# Create vector of column max and min values
maxs <- apply(data[, 3:459], 2, max)
mins <- apply(data[, 3:459], 2, min)
# Use scale() and convert the resulting matrix to a data frame
scaled.data <- as.data.frame(scale(data[, 3:459], center = mins, scale = maxs - mins))
# Check out results
print(head(scaled.data, 2))
# Create formula
feats <- names(scaled.data)
# Concatenate strings
f <- paste(feats, collapse = ' + ')
f <- paste('data$Type ~', f)
# Convert to formula
f <- as.formula(f)
f
# Create neural net
nn <- neuralnet(f, model, hidden = c(21, 15), linear.output = FALSE)
str(scaled.data)
apply(scaled.data, 2, function(x) sum(is.na(x)))
There are multiple things wrong with your code.
1. There are multiple factor levels in your dependent variable Type. neuralnet only accepts numeric input, so you must convert it to a binary matrix with model.matrix:
y <- model.matrix(~ Type + 0, data = data[, 1, drop = FALSE])
# fix up names for as.formula
y_feats <- gsub(" |\\+", "", colnames(y))
colnames(y) <- y_feats
scaled.data <- cbind(y, scaled.data)
# Concatenate strings
f <- paste(feats, collapse = ' + ')
y_f <- paste(y_feats, collapse = ' + ')
f <- paste(y_f, '~', f)
# Convert to formula
f <- as.formula(f)
2. You didn't even pass your scaled.data to the neuralnet call:
nn <- neuralnet(f, scaled.data, hidden = c(21, 15), linear.output = FALSE)
The function will run now, but you will need to look into multiclass problems (beyond the scope of this question). This package does not output straight probabilities, so you must be cautious.
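The model.matrix() step in point 1 can be seen in isolation on a toy factor (a sketch, not the poster's actual data):

```r
# "+ 0" drops the intercept, so every factor level gets its own 0/1
# indicator column: exactly the numeric encoding neuralnet needs.
Type <- factor(c("A", "B", "C", "A"))
y <- model.matrix(~ Type + 0)
colnames(y)   # one indicator column per level: "TypeA" "TypeB" "TypeC"
```

Each row of y has a single 1 marking that observation's class, which is why the columns can serve as the multiple left-hand-side terms of the formula.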