Error in match.names(clabs, nmi) - Linear programming model - R

I am applying the CCR Data Envelopment Analysis (DEA) model to benchmark stock data. To do that, I am running R code from a DEA paper published here. The paper comes with step-by-step instructions on how to implement the model below in R.
The mathematical formulation looks like this:
Finding the model I needed already implemented seemed too good to be true, and sure enough I get this error when I run it:
Error in match.names(clabs, nmi) : names do not match previous names
Traceback:
4 stop("names do not match previous names")
3 match.names(clabs, nmi)
2 rbind(deparse.level, ...)
1 rbind(aux, c(inputs[i, ], rep(0, m)))
My test data looks as follows:
> dput(testdfst)
structure(list(Name = structure(1:10, .Label = c("Stock1", "Stock2",
"Stock3", "Stock4", "Stock5", "Stock6", "Stock7", "Stock8", "Stock9",
"Stock10"), class = "factor"), Date = structure(c(14917, 14917,
14917, 14917, 14917, 14917, 14917, 14917, 14917, 14917), class = "Date"),
`(Intercept)` = c(0.454991569278089, 1, 0, 0.459437188169979,
0.520523252955415, 0.827294243132907, 0.642696631099892,
0.166219881886161, 0.086341470900152, 0.882092217743293),
rmrf = c(0.373075150411683, 0.0349067218712968, 0.895550280607866,
1, 0.180151549474574, 0.28669170468735, 0.0939821798173586,
0, 0.269645291515763, 0.0900619760898984), smb = c(0.764987877309785,
0.509094491489323, 0.933653313048327, 0.355340700554647,
0.654000372286503, 1, 0, 0.221454091364611, 0.660571586102851,
0.545086931342479), hml = c(0.100608151187926, 0.155064872867367,
1, 0.464298576152336, 0.110803875258027, 0.0720803195598597,
0, 0.132407005239869, 0.059742053684015, 0.0661623383303703
), rmw = c(0.544512524466665, 0.0761995312858816, 1, 0, 0.507699534880555,
0.590607506295898, 0.460148690870041, 0.451871218073951,
0.801698199214685, 0.429094840372901), cma = c(0.671162426988512,
0.658898571758625, 0, 0.695830176886926, 0.567814542084284,
0.942862571603074, 1, 0.37571611336359, 0.72565234813082,
0.636762557753099), Returns = c(0.601347600017365, 0.806071701848376,
0.187500487065719, 0.602971876359073, 0.470386289298666,
0.655773224143057, 0.414258177255333, 0, 0.266112191477882,
1)), .Names = c("Name", "Date", "(Intercept)", "rmrf", "smb",
"hml", "rmw", "cma", "Returns"), row.names = c("Stock1.2010-11-04",
"Stock2.2010-11-04", "Stock3.2010-11-04", "Stock4.2010-11-04",
"Stock5.2010-11-04", "Stock6.2010-11-04", "Stock7.2010-11-04",
"Stock8.2010-11-04", "Stock9.2010-11-04", "Stock10.2010-11-04"
), class = "data.frame")
And the linear model program is this:
library(lpSolve)  # provides lp() used below
namesDMU <- testdfst[1]
inputs <- testdfst[c(4,5,6,7,8)]
outputs <- testdfst[9]
N <- dim(testdfst)[1] # number of DMU
s <- dim(inputs)[2] # number of inputs
m <- dim(outputs)[2] # number of outputs
f.rhs <- c(rep(0,N),1) # RHS constraints
f.dir <- c(rep("<=",N),"=") # directions of the constraints
aux <- cbind(-1*inputs,outputs) # matrix of constraint coefficients in (6)
for (i in 1:N) {
  f.obj <- c(rep(0, s), outputs[i,])              # objective function coefficients
  f.con <- rbind(aux, c(inputs[i,], rep(0, m)))   # add LHS of bTz=1
  results <- lp("max", f.obj, f.con, f.dir, f.rhs, scale = 1, compute.sens = TRUE) # solve LPP
  multipliers <- results$solution                 # input and output weights
  efficiency <- results$objval                    # efficiency score
  duals <- results$duals                          # shadow prices
  if (i == 1) {
    weights <- multipliers
    effcrs <- efficiency
    lambdas <- duals[seq(1, N)]
  } else {
    weights <- rbind(weights, multipliers)
    effcrs <- rbind(effcrs, efficiency)
    lambdas <- rbind(lambdas, duals[seq(1, N)])
  }
}
Spotting the problem
A quick search suggests that the rbind function might be at fault. It is called on this line:
f.con <- rbind(aux ,c(inputs[i,], rep(0,m)))
I tried to isolate the data from the loops to see what the problem is:
aux <- cbind(-1*inputs,outputs)
a <- c(inputs[1,])
b <- rep(0,m)
> aux
rmrf smb hml rmw cma Returns
1 -0.37307515 -0.7649879 -0.10060815 -0.54451252 -0.6711624 0.6013476
2 -0.03490672 -0.5090945 -0.15506487 -0.07619953 -0.6588986 0.8060717
3 -0.89555028 -0.9336533 -1.00000000 -1.00000000 0.0000000 0.1875005
4 -1.00000000 -0.3553407 -0.46429858 0.00000000 -0.6958302 0.6029719
5 -0.18015155 -0.6540004 -0.11080388 -0.50769953 -0.5678145 0.4703863
6 -0.28669170 -1.0000000 -0.07208032 -0.59060751 -0.9428626 0.6557732
7 -0.09398218 0.0000000 0.00000000 -0.46014869 -1.0000000 0.4142582
8 0.00000000 -0.2214541 -0.13240701 -0.45187122 -0.3757161 0.0000000
9 -0.26964529 -0.6605716 -0.05974205 -0.80169820 -0.7256523 0.2661122
10 -0.09006198 -0.5450869 -0.06616234 -0.42909484 -0.6367626 1.0000000
> a
$rmrf
[1] 0.3730752
$smb
[1] 0.7649879
$hml
[1] 0.1006082
$rmw
[1] 0.5445125
$cma
[1] 0.6711624
I also looked at this:
> identical(names(aux[1]), names(a[1]))
[1] TRUE
Column and row names are unimportant to me as long as the problem gets solved, so I decided to try removing them. The following runs, but doesn't fix the error:
rownames(testdfst) <- NULL
Looking at the contents of a and aux, maybe the problem lies with the column names.
colnames(testdfst) <- NULL does not work either: it deletes everything in my data frame. Removing the column names might solve the problem, if I could figure out how to do it without destroying the data.

As you correctly identified, the following line is giving you the trouble:
i <- 1
f.con <- rbind(aux ,c(inputs[i,], rep(0,m))) # add LHS of bTz=1
# Error in match.names(clabs, nmi) : names do not match previous names
You can use the str function to see the structure of each element of this expression:
str(aux)
# 'data.frame': 10 obs. of 6 variables:
# $ rmrf : num -0.3731 -0.0349 -0.8956 -1 -0.1802 ...
# $ smb : num -0.765 -0.509 -0.934 -0.355 -0.654 ...
# $ hml : num -0.101 -0.155 -1 -0.464 -0.111 ...
# $ rmw : num -0.5445 -0.0762 -1 0 -0.5077 ...
# $ cma : num -0.671 -0.659 0 -0.696 -0.568 ...
# $ Returns: num 0.601 0.806 0.188 0.603 0.47 ...
str(inputs[i,])
# 'data.frame': 1 obs. of 5 variables:
# $ rmrf: num 0.373
# $ smb : num 0.765
# $ hml : num 0.101
# $ rmw : num 0.545
# $ cma : num 0.671
str(c(inputs[i,], rep(0, m)))
# List of 6
# $ rmrf: num 0.373
# $ smb : num 0.765
# $ hml : num 0.101
# $ rmw : num 0.545
# $ cma : num 0.671
# $ : num 0
Now you can see that the list you are trying to combine with rbind has different names from the data frame it's being combined with. Probably the simplest way to proceed would be to pass a vector as the new row instead of a list, which you can accomplish by converting inputs[i,] to a matrix with as.matrix:
str(c(as.matrix(inputs[i,]), rep(0, m)))
# num [1:6] 0.373 0.765 0.101 0.545 0.671 ...
This will cause the code to work without an error:
f.con <- rbind(aux, c(as.matrix(inputs[i,]), rep(0, m)))
A few unsolicited R coding tips -- instead of dim(x)[1] and dim(x)[2] to get the number of rows and columns, most would find it more readable to do nrow(x) and ncol(x). Also, building objects in a for loop by rbinding one row at a time can be very inefficient -- you can read more about that in the second circle of the R Inferno.
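To illustrate the rbind-in-a-loop point, here is a minimal base-R sketch of preallocating the result objects instead. The lp() call is replaced by made-up random numbers, since only the bookkeeping pattern matters here:

```r
# Dimensions mirroring the DEA loop: N DMUs, s inputs, m outputs
N <- 10; s <- 5; m <- 1

# Preallocate once instead of rbind-ing one row per iteration
weights <- matrix(NA_real_, nrow = N, ncol = s + m)
effcrs  <- numeric(N)
lambdas <- matrix(NA_real_, nrow = N, ncol = N)

for (i in 1:N) {
  # stand-in for results <- lp(...); solution/duals would come from there
  solution <- runif(s + m)
  weights[i, ] <- solution          # fill row i in place
  effcrs[i]    <- sum(solution)     # stand-in for results$objval
  lambdas[i, ] <- runif(N)          # stand-in for duals[seq(1, N)]
}
```

Filling a preallocated matrix does constant work per iteration, whereas rbind-ing inside the loop copies the whole accumulated object every time.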

Related

I'm missing the second line in a ggplot; there should be test and train lines present

I'm trying to use ggplot2 in R to graph train and test curves for the iterative error rates of a neural network. There should be two lines, but I'm only seeing the test line. Does anyone know what happened? When I look at head(error_df), every row's type is labelled as test for some reason.
Edit: even with error_df unsubsetted it still doesn't show the training set's error line; the same happens with various row ranges such as error_df[c(2500:5000, 7500:10000),]
Here's the ggplot graph:
Here's the code, and this is a link to a public Google spreadsheet of the data:
library(Rcpp)
library(RSNNS)
library(ggplot2)
library(plotROC)
library(tidyr)
setwd("**set working directory**")
data <- read.csv("WDBC.csv", header=T)
data <- data[,1:4]
data <- scale(data) # normalizes the data
numHneurons3 = 3
DecTargets = decodeClassLabels(data[,4])
train.test3 <- splitForTrainingAndTest(data, DecTargets,ratio = 0.50) # split
model3_02 <- mlp(train.test3$inputsTrain, train.test3$targetsTrain, # build model3
size = numHneurons3, learnFuncParams = c(0.02),maxit = 10000,
inputsTest = train.test3$inputsTest,
targetsTest = train.test3$targetsTest)
#--------------------------------------
# GGPlots of the Iterative Error:
#--------------------------------------
str(model3_02)
test_error <- model3_02$IterativeTestError
train_error <- model3_02$IterativeFitError
error_df <- data.frame(iter = c(seq_along(test_error),
seq_along(train_error)),
Error = c(test_error, train_error),
type = c(rep("test", length(test_error)),
rep("train", length(train_error))
))
ggplot(error_df[5000:10000,], aes(iter, Error, color = type, each = length(test_error))) + geom_line()
Here's also a snippet of the data, model, and data frame:
> head(data, 10)
PatientID radius texture perimeter
[1,] -0.2361973 1.0960995 -2.0715123 1.26881726
[2,] -0.2361956 1.8282120 -0.3533215 1.68447255
[3,] 0.4313615 1.5784992 0.4557859 1.56512598
[4,] 0.4317407 -0.7682333 0.2535091 -0.59216612
[5,] 0.4318215 1.7487579 -1.1508038 1.77501133
[6,] -0.2361855 -0.4759559 -0.8346009 -0.38680772
[7,] -0.2361809 1.1698783 0.1605082 1.13712450
[8,] 0.4326197 -0.1184126 0.3581350 -0.07280278
[9,] -0.2361759 -0.3198854 0.5883121 -0.18391855
[10,] 0.4329621 -0.4731182 1.1044669 -0.32919213
> str(model3_02)
List of 17
$ nInputs : int 4
$ maxit : num 10000
$ IterativeFitError : num [1:10000] 18838 4468 2365 1639 1278 ...
$ IterativeTestError : num [1:10000] 7031 3006 1916 1431 1161 ...
$ fitted.values : num [1:284, 1:522] 0.00386 0.00386 0.00387 0.00387 0.00386 ...
$ fittedTestValues : num [1:285, 1:522] 0.00387 0.00387 0.00387 0.00387 0.00387 ...
$ nOutputs : int 522
- attr(*, "class")= chr [1:2] "mlp" "rsnns"
> head(error_df)
iter Error type
1 1 7031.3101 test
2 2 3006.4253 test
3 3 1915.8997 test
4 4 1430.6152 test
5 5 1160.6987 test
6 6 990.2686 test
You created a data frame (error_df) by stacking two variables into one column: all 10000 test errors come first, followed by all 10000 training errors. By limiting your plot to rows 5000 to 10000, you select only test rows, which is why the train line never appears.
ggplot(error_df[c(5000:10000, 15000:20000),], aes(iter, Error, color = type, each = length(test_error))) + geom_line()
should show both curves.
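The row-position issue is easy to verify on a toy version of error_df (length 10 standing in for the 10000 iterations):

```r
n <- 10  # stands in for the 10000 iterations
df <- data.frame(iter  = c(seq_len(n), seq_len(n)),
                 Error = runif(2 * n),
                 type  = rep(c("test", "train"), each = n))

# A positional subset of the first half contains only "test" rows
unique(df[5:10, "type"])

# Filtering on the iter value instead keeps both curves
sub <- df[df$iter >= 5, ]
unique(sub$type)
```

Filtering on the iter value (e.g. subset(error_df, iter >= 5000)) is also more robust than hard-coding row positions, since it keeps working if the number of iterations changes.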

How to bind rows of svymean output?

In order to apply weights etc. on survey data, I am working with the survey package. It includes a wonderful function, svymean(), which gives me neat pairs of means and standard errors. I now have several of these pairs and want to combine them with rbind() into a data.frame.
library(survey)
data(fpc)
fpc.w1 <- with(fpc, svydesign(ids = ~0, weights = weight, data = fpc))
fpc.w2 <- with(fpc, svydesign(ids = stratid, weights = weight, data = fpc))
(msd.1 <- svymean(fpc$x, fpc.w1))
# mean SE
# [1,] 5.4481 0.7237
(msd.2 <- svymean(fpc$x, fpc.w2))
# mean SE
# [1,] 5.4481 0.5465
rbind(msd.1, msd.2)
# [,1]
# msd.1 5.448148
# msd.2 5.448148
As one can see, the SE is missing. Examining the object yields following:
class(msd.1)
# [1] "svystat"
str(msd.1)
# Class 'svystat' atomic [1:1] 5.45
# ..- attr(*, "var")= num [1, 1] 0.524
# .. ..- attr(*, "dimnames")=List of 2
# .. .. ..$ : NULL
# .. .. ..$ : NULL
# ..- attr(*, "statistic")= chr "mean"
So I resorted to some guesswork.
msd.1$mean
# Error in msd.1$mean : $ operator is invalid for atomic vectors
msd.1$SE
# Error in msd.1$SE : $ operator is invalid for atomic vectors
msd.1[2]
# [1] NA
msd.1[1, 2]
# Error in msd.1[1, 2] : incorrect number of dimensions
Included in the package is a function called SE() which yields:
SE(msd.1)
# [,1]
# [1,] 0.723725
OK. With this I could finally piece together a way to bind the rows:
t(data.frame(msd.1=c(msd.1, SE(msd.1)),
msd.2=c(msd.2, SE(msd.2)),
row.names = c("mean", "SD")))
# mean SD
# msd.1 5.448148 0.7237250
# msd.2 5.448148 0.5465021
Do I really have to go through this pain with the package just to bind rows, or am I missing something?
You can just coerce the svymean output to a data frame, then rbind them together.
do.call(rbind, lapply(list(msd.1, msd.2), as.data.frame))
mean SE
1 5.448148 0.7237250
2 5.448148 0.5465021
If you want row names, name the items in the list -- do.call(rbind, ...) carries the list names through:
do.call(rbind, lapply(list("msd.1" = msd.1, "msd.2" = msd.2), as.data.frame))
mean SE
msd.1 5.448148 0.7237250
msd.2 5.448148 0.5465021
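The row names here come purely from the names given to the list elements, which a quick base-R check confirms (toy one-row data frames stand in for the coerced svystat objects):

```r
# Toy stand-ins for as.data.frame(msd.1) and as.data.frame(msd.2)
lst <- list(msd.1 = data.frame(mean = 5.45, SE = 0.72),
            msd.2 = data.frame(mean = 5.45, SE = 0.55))

out <- do.call(rbind, lst)
rownames(out)  # "msd.1" "msd.2"
```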
Or a tidyverse option would be
library(tidyverse)
list(msd.1, msd.2) %>%
  map_df(as_tibble)
# A tibble: 2 x 2
# mean SE
# <dbl> <dbl>
#1 5.448148 0.7237250
#2 5.448148 0.5465021

Using glmnet on binomial data error

I imported some data as follows
surv <- read.table("http://www.stat.ufl.edu/~aa/glm/data/Student_survey.dat",header = T)
library(dplyr)  # select() comes from dplyr
x <- as.matrix(select(surv, -ab))
y <- as.matrix(select(surv, ab))
glmnet::cv.glmnet(x, y, alpha = 1, family = "binomial", type.measure = "auc")
and I am getting the following error.
NAs introduced by coercion
Show Traceback
Error in lognet(x, is.sparse, ix, jx, y, weights, offset, alpha, nobs, : NA/NaN/Inf in foreign function call (arg 5)
What is a good fix for this?
The documentation of the glmnet package has the information that you need:
surv <- read.table("http://www.stat.ufl.edu/~aa/glm/data/Student_survey.dat", header = T, stringsAsFactors = T)
x <- surv[, -which(colnames(surv) == 'ab')] # remove the 'ab' column
y <- surv[, 'ab'] # the 'binomial' family takes a factor as input (too)
xfact = sapply(1:ncol(x), function(y) is.factor(x[, y])) # separate the factor from the numeric columns
xfactCols = model.matrix(~.-1, data = x[, xfact]) # one option is to build dummy variables from the factors (the other option is to convert to numeric)
xall = as.matrix(cbind(x[, !xfact], xfactCols)) # cbind() numeric and dummy columns
fit = glmnet::cv.glmnet(xall,y,alpha=1,family="binomial",type.measure = "auc") # run glmnet error free
str(fit)
List of 10
$ lambda : num [1:89] 0.222 0.202 0.184 0.168 0.153 ...
$ cvm : num [1:89] 1.12 1.11 1.1 1.07 1.04 ...
$ cvsd : num [1:89] 0.211 0.212 0.211 0.196 0.183 ...
$ cvup : num [1:89] 1.33 1.32 1.31 1.27 1.23 ...
$ cvlo : num [1:89] 0.908 0.9 0.89 0.874 0.862 ...
$ nzero : Named int [1:89] 0 2 2 3 3 3 4 4 5 6 ...
.....
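The model.matrix(~ . - 1, ...) step above can be seen on a tiny mixed data frame (toy columns, not the survey data): dropping the intercept makes every level of the factor get its own dummy column.

```r
toy <- data.frame(num = c(1.2, 3.4, 5.6),
                  col = factor(c("red", "blue", "red")))

# Without an intercept, the factor is expanded to one dummy per level
m <- model.matrix(~ . - 1, data = toy)
colnames(m)  # "num" "colblue" "colred"
```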
I have come across the same problem of mixed numeric and character/factor data types. For converting the predictors, I recommend a function that ships with the glmnet package for exactly this problem: glmnet::makeX(). It handles the dummy creation and can even perform a simple imputation in case of missing data.
x <- glmnet::makeX(surv[, -which(colnames(surv) == 'ab')])
or more tidy-ish:
library(tidyverse)
x <- surv %>%
  select(-ab) %>%
  glmnet::makeX()

Doing multiple t-tests within a dataframe

I would like to do a series of t-tests on a dataset:
specifically, I'd like to do a separate t-test of rats 1-5 vs 6-10 for every gene.
I've tried to do this:
goi2 <- goi[-1]
control <- goi2[1:5,]
stress <- goi2[6:10,]
for (i in 1:92) {
  x <- control[, i]
  y <- stress[, i]
  x <= t.test(x, y)
  # print(x=i)
}
but I get this error:
Error: Can't use matrix or array for column indexing
I've tried a few variations of this but can't figure out why it won't work.
I'm a complete newb to R, but not to programming in general.
Dataset:
Gene,Rat_1,Rat_2,Rat_3,Rat_4,Rat_5,Rat_6,Rat_7,Rat_8,Rat_9,Rat_10
Oprd1,2.746,1.387,2.25,3.363,3.191,2.432,1.985,1.75,2.752,1.771
Grin2a,3.134,2.644,2.962,5.168,2.484,3.54,2.596,1.535,3.197,2.232
Grin2d(2),4.496,5.528,2.631,4.684,3.934,6.047,0.98,0.077,4.381,2.327
Oprm1,1.998,1.804,1.611,1.712,3.672,3.215,0.249,1.248,1.758,2.671
Scn2b,137.35,97.158,113.65,141.93,77.295,133.02,88.872,75.586,108.96,97.626
Ntf3,0.989,1.835,1.604,1.133,0.889,0.782,0.918,2.241,2.216,3.921
Scn1a(2),9.224,7.369,10.145,14.242,17.262,11.535,8.144,7.166,13.625,6.604
Ntrk2(2),21.929,17.018,14.799,19.783,14.632,24.421,14.235,9.344,16.658,17.913
Cacna1c,4.585,3.637,3.948,4.135,3.403,5.381,4.193,3.162,3.455,3.695
Grin2b,3.273,2.57,2.101,2.922,1.826,3.338,2.121,1.416,2.973,2.005
Scn9a(2),0.319,0,0,0.453,0.434,0.376,0,0,0.346,0.469
Gria4(2),10.867,8.156,7.889,9.236,14.134,10.574,8.404,8.179,9.442,7.982
Cacna1e(2),1.805,1.783,2.045,1.968,1.405,1.807,0.973,0.993,0.857,1.769
Gria3,4.237,4.188,3.901,5.221,6.439,3.993,3.421,4.012,4.452,4.631
Gria1,8.284,7.942,7.557,12.001,3.976,9.472,7.653,4.16,7.971,5.381
Kcnj5,3.089,2.046,3.332,3.392,2.168,3.786,3.865,1.414,2.37,2.009
Cacna1b(2),11.071,8.716,8.246,9.594,7.189,11.62,6.028,4.481,9.307,9.074
Scn5a,1.301,1.017,0.714,1.401,0.449,1.183,1.065,0.292,0.823,0.714
Scn2a(2),3.286,2.119,2.257,2.024,1.902,3.441,1.327,1.072,2.576,2.09
Scn10a,0.037,0.069,0.087,0.076,0.082,0.095,0.052,0.019,0.078,0.045
Cacna1g(2),6.543,5.095,5.463,8.404,3.084,7.359,5.746,4.682,5.969,4.315
Cacna1e(3),5.37,4.002,3.313,4.803,2.665,5.623,3.296,1.953,3.827,4.092
Bdnf(4),0.869,0.509,0.996,1.032,0.256,0.742,0.498,0.531,0.994,0.473
Scn4a,0.284,0.278,0.359,0.45,0.761,0.31,0.319,0.27,0.366,0.273
Scn5a(2),0.256,0.477,0.587,0.283,0,0.564,0.044,0.023,0.204,0.15
Gabra1,51.019,44.3,57.609,81.522,40.853,64.921,68.263,31.766,58.006,39.518
Scn8a,6.854,14.666,5.416,12.347,4.823,14.935,7.014,16.684,9.686,17.44
Kcnj3,17.047,14.3,13.741,14.363,14.01,13.268,12.172,10.718,15.374,13.048
Slc6a2,107.9,69.941,91.704,36.411,112.57,114.5,23.398,63.848,53.323,135.26
Grin3a,6.952,5.676,7.301,12.557,3.65,10.628,9.783,4.286,8.015,4.499
Cnr1,20.261,16.981,19.996,26.469,12.709,24.705,25.548,10.61,19.746,14.64
Scn1b,13.732,15.763,5.03,20.68,17.788,14.959,16.298,24.682,22.477,15.117
Gria1(2),2.709,3.667,2.51,2.9,2.134,1.93,4.308,2.59,2.487,1.742
Scn3a(2),1.439,2.614,0,0.352,0,1.358,1.027,0,0.452,0.586
Scn11a,0.058,0.292,0.036,0.127,0.058,0.06,0.074,0.164,0.047,0.05
Gria1(3),25.283,17.779,22.725,32.705,8.823,28.727,26.915,12.876,23.545,17.879
Cacna1f,0.056,0.067,0.14,0.123,0.04,0.182,0.072,0.083,0.077,0.097
Cacna1a,20.791,19.816,17.613,21.663,15.697,22.824,16.737,16.719,16.604,20.469
Gria4,8.51,7.107,8.342,9.338,7.46,8.877,7.673,6.341,8.393,9.555
Scn8a,6.738,14.706,4.172,11.467,2.552,10.757,6.021,15.222,3.588,11.333
Grin2d,20.398,15.794,22.521,24.693,16.97,24.108,24.19,21.016,18.314,19.044
Gria3(2),15.301,13.087,13.918,14.433,12.282,14.914,12.198,11.602,13.738,15.481
Oprk1(2),6.66,4.97,7.604,10.281,2.151,10.462,10.278,1.525,6.869,4.902
Scn1b(3),46.553,42.795,49.498,55.558,64.101,38.178,44.1,59.033,43.837,39.382
Cacna1h,9.145,7.295,8.7,8.028,5.415,10.799,8.21,6.332,8.455,7.683
Scn2a,36.803,29.975,30.609,38.334,19.053,39.127,31.146,23.066,30.896,32.345
Cacna1g,5.489,5.213,6.24,7.896,3.97,4.876,6.283,5.464,6.08,3.692
Ntrk2(3),147.81,152.45,153.46,136.09,181.1,156.85,219.8,164.53,156.64,147.92
Scn1a,9.222,9.162,9.659,13.83,12.679,8.088,11.45,10.406,9.503,6.827
Grin1(3),69.943,68.01,76.358,81.029,63.692,83.424,70.981,80.088,69.821,70.764
Grin3b(2),2.065,1.265,1.45,1.576,3.875,1.441,1.822,1.964,2.286,0.965
Gabra2(2),2.268,1.251,1.638,2.844,2.93,2.934,3.725,1.724,1.455,2.674
Scn1b2(2),161.76,164.24,213.24,209.19,235.38,172.98,207.33,216.96,198.26,130.93
Oprm1(2),4.046,5.181,2.362,1.925,0.806,2.232,1.178,1.491,3.259,3.751
Cacna1c(3),0.077,0.194,0.23,0,0.132,0.127,0,0.035,0.09,0.092
Ntrk2,27.139,26.028,23.881,27.22,22.259,30.728,22.381,19.782,24.704,30.85
Cacna1d(2),2.126,2.263,2.038,2.1,1.995,2.966,1.943,2.01,2.317,2.214
Scn3a,21.272,16.356,16.245,14.875,11.825,19.753,10.994,11.08,16.905,19.832
Grin1(2),76.771,65.788,66.059,78.716,33.91,88.228,73.859,47.717,70.674,61.275
Grina,672.31,705.45,679.04,623.4,597.51,742.12,619.74,662.95,665.18,781.29
Cacna1e,2.448,1.981,1.506,2.003,1.318,3.052,1.953,0.814,2.17,2.482
Bdnf(2),1.853,2.128,2.553,1.996,0.663,2.5,2.385,0.468,1.922,1.481
Fos,18.402,24.653,23.038,20.615,8.027,38.444,20.836,11.756,20.823,20.296
Scn4b,23.772,27.874,25.388,25.109,51.926,20.291,25.521,28.701,30.256,17.344
Slc6a2(3),480.05,455.95,307.6,186.82,376.96,447.61,123.5,409.58,347.86,681.04
Ntf3(3),1.87,3.561,2.421,3.133,2.134,2.327,1.712,2.32,1.735,3.497
Bdnf(3),0.319,0.09,0.665,0.187,0.107,0.185,0.394,0.264,0.21,0.345
Scn3b,112.86,115.29,99.711,96.245,71.741,122.34,85.875,88.906,102.88,132.13
Grin2c,14.224,15.944,15.473,21.936,32.732,13.98,20.168,23.958,14.541,17.402
Gabrd,0.701,3.542,0.532,5.222,5.593,0.133,2.954,0.961,0.506,2.152
Cacna1b,16.935,15.764,14.475,15.639,10.655,19.408,14.115,14.079,14.26,16.737
Slc18a2,433.92,429.22,293.57,164.53,287.51,370.72,93.973,283.12,321.49,551.07
Cacnb1(2),16.456,5.099,16.969,4.469,12.471,5.143,14.017,10.049,17.537,4.26
Gabrg1,40.614,37.373,43.103,39.253,47.768,41.202,51.665,37.74,42.17,39.097
Grin1,1.235,0.812,0.909,1.605,0.513,1.371,1.596,1.346,1.213,0.922
Slc6a2(2),138.21,136.75,34.759,38.393,25.89,87.126,0,0.467,99.703,137.66
Galr3,2.691,2.51,2.517,4.446,0.727,2.933,4.041,2.08,2.638,1.456
Oprm1(3),7.273,7.676,7.08,6.196,5.515,9.023,2.57,4.8,7.699,10.471
Gabrq,70.623,67.728,51.095,42.456,43.156,77.924,28.63,32.975,54.192,87.697
Gria4(3),25.846,26.045,24.37,37.866,18.037,26.907,31.423,21.292,26.795,24.642
Cacna1c(2),0.644,0.894,0.831,1.084,0.721,1.026,0.817,0.371,1.333,1.015
Cacna1d(3),0.299,0.406,0.127,0.319,0.319,0.231,0.178,0.075,0.18,0.405
Cacnb1,47.24,51.505,42.702,48.718,33.28,60.334,38.611,41.827,40.352,56.132
Scn7a,2.351,2.38,2.114,1.96,0.316,2.647,1.945,1.219,2.559,1.498
Cacna1d,2.661,2.733,2.714,2.649,2.403,2.923,3.216,2.768,2.401,2.302
Gabra2,25.209,26.731,23.249,25.599,20.17,22.928,24.072,18.664,23.808,23.306
Scn9a,3.209,3.106,3.212,3.206,1.094,3.35,3.994,1.934,2.883,2.046
Ntf3(2),2.347,2.282,2.112,1.025,1.762,2.029,0.501,1.652,2.717,1.982
Gria2,12.726,12.997,12.74,15.615,7.156,14.375,13.387,11.682,12.968,11.332
Bdnf,0.703,0.777,1.034,0.571,0.166,1.164,0.549,0.325,0.801,1.12
Gria2(2),17.769,17.694,16.62,18.603,11.295,19.926,18.044,13.594,16.946,17.712
Bdnf(5),1.321,2.152,1.882,2.397,1.598,3.072,3.038,1.53,2.04,1.464
Here's a working sample using just base R. Using your goi:
str(goi)
# 'data.frame': 92 obs. of 11 variables:
# $ Gene : chr "Oprd1" "Grin2a" "Grin2d(2)" "Oprm1" ...
# $ Rat_1 : num 2.75 3.13 4.5 2 137.35 ...
# $ Rat_2 : num 1.39 2.64 5.53 1.8 97.16 ...
# $ Rat_3 : num 2.25 2.96 2.63 1.61 113.65 ...
# $ Rat_4 : num 3.36 5.17 4.68 1.71 141.93 ...
# $ Rat_5 : num 3.19 2.48 3.93 3.67 77.3 ...
# $ Rat_6 : num 2.43 3.54 6.05 3.21 133.02 ...
# $ Rat_7 : num 1.985 2.596 0.98 0.249 88.872 ...
# $ Rat_8 : num 1.75 1.535 0.077 1.248 75.586 ...
# $ Rat_9 : num 2.75 3.2 4.38 1.76 108.96 ...
# $ Rat_10: num 1.77 2.23 2.33 2.67 97.63 ...
control <- goi[,2:6]
stress <- goi[,7:11]
Now, instead of using a for loop and processing each return value as we calculate it, let's calculate everything, store the complete object for each test in a list, and keep the option to grab whatever we want from all tests afterwards.
results <- lapply(seq_len(nrow(goi)),
function(i) t.test(control[i,], stress[i,]))
length(results)
# [1] 92
Each element of results is the return value from a single call of t.test.
results[[1]]
# Welch Two Sample t-test
# data: control[i, ] and stress[i, ]
# t = 1.1034, df = 6.2218, p-value = 0.3107
# alternative hypothesis: true difference in means is not equal to 0
# 95 percent confidence interval:
# -0.5386851 1.4374851
# sample estimates:
# mean of x mean of y
# 2.5874 2.1380
You can access any component of the test results:
names(results[[1]])
# [1] "statistic" "parameter" "p.value" "conf.int" "estimate"
# [6] "null.value" "alternative" "method" "data.name"
head( sapply(results, `[[`, "p.value") )
# [1] 0.3107098 0.3083295 0.2626753 0.6245368 0.4406157 0.2800657
head( t(sapply(results, `[[`, "conf.int")) )
# [,1] [,2]
# [1,] -0.5386851 1.4374851
# [2,] -0.7513650 2.0681650
# [3,] -1.5018657 4.4862657
# [4,] -1.1880098 1.8504098
# [5,] -23.5402499 48.8678499
# [6,] -2.2762668 0.8250668
NB: one of R's many nuances is the fact that the *apply family will return a matrix that some might think is transposed from what it should be. Because of this, calls that return a matrix will benefit from being sandwiched in t(...). (This is a great opportunity to press the "I Believe" button and move on.)
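The transpose nuance is quick to demonstrate with toy length-2 vectors standing in for the conf.int components:

```r
# Each element mimics a conf.int: a named length-2 numeric vector
res <- list(c(lo = -1, hi = 2), c(lo = -3, hi = 4))

wide <- sapply(res, identity)  # 2 x 2 matrix, one COLUMN per list element
long <- t(wide)                # transposed: one ROW per element, rbind-style
```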
You can combine all of these results into a single data.frame with something like:
namefunc <- function(x, nameroot) { dimnames(x) <- list(NULL, paste0(nameroot, seq_len(ncol(x)))) ; x ; }
(That was a small helper function to make the following slightly easier to read. It's a very naïve naming convention, used only to keep the columns unique for now.)
test_results <- cbind.data.frame(
statistic = sapply(results, `[[`, "statistic"),
p.value = sapply(results, `[[`, "p.value"),
parameter = sapply(results, `[[`, "parameter"),
namefunc( t(sapply(results, `[[`, "conf.int")), "conf" ),
namefunc( t(sapply(results, `[[`, "estimate")), "est" )
)
head(test_results)
# statistic p.value parameter conf1 conf2 est1 est2
# 1 1.1033554 0.3107098 6.221806 -0.5386851 1.4374851 2.5874 2.1380
# 2 1.0948456 0.3083295 7.312678 -0.7513650 2.0681650 3.2784 2.6200
# 3 1.2480711 0.2626753 5.480699 -1.5018657 4.4862657 4.2546 2.7624
# 4 0.5107431 0.6245368 7.337202 -1.1880098 1.8504098 2.1594 1.8282
# 5 0.8134064 0.4406157 7.633546 -23.5402499 48.8678499 113.4766 100.8128
# 6 -1.2161356 0.2800657 4.824393 -2.2762668 0.8250668 1.2900 2.0156
There is definitely room here to use packages from the tidyverse as RobertMc suggested. For that, I recommend dplyr and tidyr, though perhaps broom has utility here as well.

neural network: in neurons[[i]] %*% weights[[i]] : requires numeric/complex matrix/vector arguments

I am trying to apply the neural network method to my data and I am stuck.
I always get the message:
in neurons[[i]] %*% weights[[i]] : requires numeric/complex matrix/vector arguments
The facts are:
I am reading my data using read.csv.
I am adding a link to a file with some of my data, I hope it helps:
https://www.dropbox.com/s/b1btx0cnhmj229p/collineardata0.4%287.2.2017%29.csv?dl=0
I have no NA in my data (I checked twice).
The outcome of str(data) is:
'data.frame': 20 obs. of 457 variables:
$ X300.5_alinine.sulphate : num 0.351 0.542 0.902 0.656 1 ...
$ X300.5_bromocresol.green : num 0.435 0.603 0.749 0.314 0.922 ...
$ X300.5_bromophenol.blue : num 0.415 0.662 0.863 0.345 0.784 ...
$ X300.5_bromothymol.blue : num 0.2365 0.0343 0.4106 0.3867 0.8037 ...
$ X300.5_chlorophenol.red : num 0.465 0.1998 0.7786 0.0699 1 ...
$ X300.5_cresol.red : num 0.534 0.311 0.678 0.213 0.821 ...
continued
I have tried to use model.matrix.
The code I have was tried on different datasets (e.g. iris) and it worked fine.
Can anyone please take a look and suggest what is wrong with my data or how I'm reading it?
The code is:
require(neuralnet)
require(MASS)
require(grid)
require(nnet)
#READ IN DATA
data<-read.table("data.csv", sep=",", dec=".", head=TRUE)
dim(data)
# Create Vector of Column Max and Min Values
maxs <- apply(data[,3:459], 2, max)
mins <- apply(data[,3:459], 2, min)
# Use scale() and convert the resulting matrix to a data frame
scaled.data <- as.data.frame(scale(data[,3:459],center = mins, scale = maxs - mins))
# Check out results
print(head(scaled.data,2))
#create formula
feats <- names(scaled.data)
# Concatenate strings
f <- paste(feats,collapse=' + ')
f <- paste('data$Type ~',f)
# Convert to formula
f <- as.formula(f)
f
#creating neural net
nn <- neuralnet(f,model,hidden=c(21,15),linear.output=FALSE)
str(scaled.data)
apply(scaled.data,2,function(x) sum(is.na(x)))
There are multiple things wrong with your code.
1. Your dependent variable Type contains multiple factor levels. neuralnet only accepts numeric input, so you must convert it to a binary matrix with model.matrix:
y <- model.matrix(~ Type + 0, data = data[,1,drop=FALSE])
# fix up names for as.formula
y_feats <- gsub(" |\\+", "", colnames(y))
colnames(y) <- y_feats
scaled.data <- cbind(y, scaled.data)
# Concatenate strings
f <- paste(feats,collapse=' + ')
y_f <- paste(y_feats,collapse=' + ')
f <- paste(y_f, '~',f)
# Convert to formula
f <- as.formula(f)
2. You also never passed your scaled.data to the neuralnet call:
nn <- neuralnet(f,scaled.data,hidden=c(21,15),linear.output=FALSE)
The function will run now, but you will need to look into multiclass problems (beyond the scope of this question). This package does not output probabilities directly, so be cautious.
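The paste/as.formula construction in the fix can be sanity-checked in isolation with made-up column names (x1..x3 and Typea/Typeb are placeholders, not the real data):

```r
feats   <- c("x1", "x2", "x3")    # toy predictor names
y_feats <- c("Typea", "Typeb")    # toy one-hot response names

# Build "Typea + Typeb ~ x1 + x2 + x3" and convert it to a formula
f <- as.formula(paste(paste(y_feats, collapse = " + "), "~",
                      paste(feats, collapse = " + ")))
f
```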
