Using glmnet on binomial data error

I imported some data as follows:
library(dplyr) # for select()
surv <- read.table("http://www.stat.ufl.edu/~aa/glm/data/Student_survey.dat", header = TRUE)
x <- as.matrix(select(surv, -ab))
y <- as.matrix(select(surv, ab))
glmnet::cv.glmnet(x, y, alpha = 1, family = "binomial", type.measure = "auc")
and I get the following warning and error:
NAs introduced by coercion
Error in lognet(x, is.sparse, ix, jx, y, weights, offset, alpha, nobs, : NA/NaN/Inf in foreign function call (arg 5)
What is a good fix for this?

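The documentation of the glmnet package has the information you need: x must be a numeric matrix. Some columns of surv are text (factor) columns, so as.matrix() produces a character matrix, and coercing it to numeric creates the NAs. Build a numeric matrix first, for example by turning the factors into dummy variables: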
surv <- read.table("http://www.stat.ufl.edu/~aa/glm/data/Student_survey.dat", header = TRUE, stringsAsFactors = TRUE)
x <- surv[, -which(colnames(surv) == 'ab')] # remove the 'ab' column
y <- surv[, 'ab'] # the 'binomial' family also accepts a factor as the response
xfact <- sapply(x, is.factor) # flag the factor columns (the rest are numeric)
xfactCols <- model.matrix(~ . - 1, data = x[, xfact, drop = FALSE]) # one option is to build dummy variables from the factors (the other is to convert them to numeric)
xall <- as.matrix(cbind(x[, !xfact, drop = FALSE], xfactCols)) # cbind() the numeric and dummy columns
fit <- glmnet::cv.glmnet(xall, y, alpha = 1, family = "binomial", type.measure = "auc") # now runs without error
str(fit)
List of 10
$ lambda : num [1:89] 0.222 0.202 0.184 0.168 0.153 ...
$ cvm : num [1:89] 1.12 1.11 1.1 1.07 1.04 ...
$ cvsd : num [1:89] 0.211 0.212 0.211 0.196 0.183 ...
$ cvup : num [1:89] 1.33 1.32 1.31 1.27 1.23 ...
$ cvlo : num [1:89] 0.908 0.9 0.89 0.874 0.862 ...
$ nzero : Named int [1:89] 0 2 2 3 3 3 4 4 5 6 ...
.....
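The root cause can be reproduced without glmnet. The toy data frame below is hypothetical (it is not the survey data): as.matrix() on a data frame that mixes numeric and factor columns returns a character matrix, and converting that to numeric turns the text entries into NAs, which glmnet cannot handle. model.matrix() instead expands the factor into 0/1 dummy columns, giving a purely numeric matrix.

```r
# Hypothetical toy data: one numeric and one factor predictor.
df <- data.frame(age = c(21, 25, 30),
                 gender = factor(c("f", "m", "f")))

# as.matrix() on mixed column types gives a *character* matrix ...
m_chr <- as.matrix(df)

# ... so numeric coercion turns the text entries into NA
# (normally with the "NAs introduced by coercion" warning).
m_num <- suppressWarnings(matrix(as.numeric(m_chr), nrow = nrow(m_chr)))

# model.matrix() builds dummy columns; "- 1" drops the intercept column,
# so the first factor keeps one column per level.
x_ok <- model.matrix(~ . - 1, data = df)
x_ok
```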

I have run into the same problem with mixed numeric and character/factor data. To convert the predictors, I recommend glmnet::makeX(), a function that ships with the glmnet package for exactly this mixed-type problem. It creates the dummy variables and can even do a simple imputation of missing values (na.impute = TRUE).
x <- glmnet::makeX(surv[, -which(colnames(surv) == 'ab')])
Or, in tidyverse style:
library(tidyverse)
x <-
surv %>%
select(-ab) %>%
glmnet::makeX()

Related

Error in Y * 0: non numeric argument to binary operator - RNN

Good Morning,
I am currently trying to run a recurrent neural network for regression with the "rnn" package on a dataset called BostonHousing, which contains numerical values; this is its structure:
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 1038 obs. of 3 variables:
$ date : Date, format: "2013-11-19" "2013-11-20" "2013-11-21" "2013-11-22" ...
$ Quantità : num 0.85 0.85 -0.653 -0.653 -0.653 ...
$ Giacenza_In: num 0.945 1.648 -0.694 -0.694 -0.694 ...
#Split into train and test
cutoff = round(0.8*nrow(BostonHousing))
train_x <- BostonHousing[1:cutoff,]
test_x <- BostonHousing[-(1:cutoff),]
str(train_x)
#I apply the model and remove the first column because it's made up of dates
require(rnn)
model <- trainr( Y = train_x[,2],
X = train_x[,3],
learningrate = 0.05,
hidden_dim = 4,
numepochs = 100)
pred <- predictr( model, test_x[,3])
Whenever I try to run the code, it gives me the error reported in the title.
Basically, I would like to predict "Quantità" (the quantity ordered) given the quantity of products currently in stock ("Giacenza_In").
Best Regards, Alessandro

It seems that trainr() in the rnn package needs the input and output values in binary format, so you have to convert each column with int2bin().
Because of this, you first have to convert your numeric values to integers (multiply by 10^n, then round).
#Split into train and test
cutoff = round(0.8*nrow(BostonHousing))
train_x <- BostonHousing[1:cutoff,]
test_x <- BostonHousing[-(1:cutoff),]
str(train_x)
#I apply the model and remove the first column because it's made up of dates
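n <- 5 # number of decimal places to keep
require(rnn)
# [[ ]] extracts plain numeric vectors from the tibble; scale by 10^n *before* rounding.
# int2bin() uses a fixed bit width, so make sure the scaled (non-negative) integers fit.
model <- trainr(Y = int2bin(round(train_x[[2]] * 10^n)),
                X = int2bin(round(train_x[[3]] * 10^n)),
                learningrate = 0.05,
                hidden_dim = 4,
                numepochs = 100)
pred <- predictr(model, int2bin(round(test_x[[3]] * 10^n)))
pred[pred >= 0.5] <- 1
pred[pred < 0.5] <- 0
Then you have to convert the binary predictions back to integers (for example with rnn's bin2int()) and divide by 10^n.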
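The scaling and binary round trip can be sketched in base R. The helpers to_fixed(), from_fixed(), encode_bits() and decode_bits() below are hypothetical (they are not rnn functions); they only mimic the idea behind int2bin()/bin2int() with a fixed bit width and, as written, handle non-negative values only. The sketch also shows why the order matters: round(x) * 10^n throws the decimals away, while round(x * 10^n) keeps them as integer digits.

```r
# Hypothetical helpers, not part of the rnn package.
# Scale first, then round, so the decimals survive as integer digits.
to_fixed   <- function(x, n) as.integer(round(x * 10^n))
from_fixed <- function(k, n) k / 10^n

# One row per value, least-significant bit first (non-negative integers only).
encode_bits <- function(k, width = 16) {
  t(vapply(k, function(v) as.integer(intToBits(v))[seq_len(width)],
           integer(width)))
}

# Weighted sum of the bits: bit j contributes 2^(j - 1).
decode_bits <- function(bits) {
  as.vector(bits %*% 2^(seq_len(ncol(bits)) - 1))
}

x <- c(0.85, 1.648, 0.25)
round(x) * 10^2                    # wrong order: 100 200 0
k <- to_fixed(x, 2)                # 85 165 25
bits <- encode_bits(k)
from_fixed(decode_bits(bits), 2)   # 0.85 1.65 0.25
```

Negative values, such as the ones in the scaled columns above, would need an extra step (for example shifting by a constant before encoding) that this sketch does not include.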

neural network: in neurons[[i]] %*% weights[[i]] : requires numeric/complex matrix/vector arguments

I am trying to apply the neural network method to my data and I am stuck.
I always get the message:
in neurons[[i]] %*% weights[[i]] : requires numeric/complex matrix/vector arguments
The facts are:
I am reading my data using read.csv.
Here is a link to a file with some of my data; I hope it helps:
https://www.dropbox.com/s/b1btx0cnhmj229p/collineardata0.4%287.2.2017%29.csv?dl=0
I have no NAs in my data (I checked twice).
The output of str(data) is:
'data.frame': 20 obs. of 457 variables:
$ X300.5_alinine.sulphate : num 0.351 0.542 0.902 0.656 1 ...
$ X300.5_bromocresol.green : num 0.435 0.603 0.749 0.314 0.922 ...
$ X300.5_bromophenol.blue : num 0.415 0.662 0.863 0.345 0.784 ...
$ X300.5_bromothymol.blue : num 0.2365 0.0343 0.4106 0.3867 0.8037 ...
$ X300.5_chlorophenol.red : num 0.465 0.1998 0.7786 0.0699 1 ...
$ X300.5_cresol.red : num 0.534 0.311 0.678 0.213 0.821 ...
(output truncated)
I have tried to use model.matrix.
The same code was tried on other datasets (e.g. iris) and worked fine.
Can anyone please suggest what is wrong with my data or with how I read it?
The code is:
require(neuralnet)
require(MASS)
require(grid)
require(nnet)
#READ IN DATA
data<-read.table("data.csv", sep=",", dec=".", head=TRUE)
dim(data)
# Create Vector of Column Max and Min Values
maxs <- apply(data[,3:459], 2, max)
mins <- apply(data[,3:459], 2, min)
# Use scale() and convert the resulting matrix to a data frame
scaled.data <- as.data.frame(scale(data[,3:459],center = mins, scale = maxs - mins))
# Check out results
print(head(scaled.data,2))
#create formula
feats <- names(scaled.data)
# Concatenate strings
f <- paste(feats,collapse=' + ')
f <- paste('data$Type ~',f)
# Convert to formula
f <- as.formula(f)
f
#creating neural net
nn <- neuralnet(f,model,hidden=c(21,15),linear.output=FALSE)
str(scaled.data)
apply(scaled.data,2,function(x) sum(is.na(x)))

There are multiple things wrong with your code.
1. Your dependent variable Type is a factor with multiple levels. neuralnet only accepts numeric input, so you must convert it to a binary indicator matrix with model.matrix:
y <- model.matrix(~ Type + 0, data = data[,1,drop=FALSE])
# fix up names for as.formula
y_feats <- gsub(" |\\+", "", colnames(y))
colnames(y) <- y_feats
scaled.data <- cbind(y, scaled.data)
# Concatenate strings
f <- paste(feats,collapse=' + ')
y_f <- paste(y_feats,collapse=' + ')
f <- paste(y_f, '~',f)
# Convert to formula
f <- as.formula(f)
2. You didn't pass scaled.data to the neuralnet call at all (you passed model instead):
nn <- neuralnet(f,scaled.data,hidden=c(21,15),linear.output=FALSE)
The function will run now, but you will need to look into how to handle multiclass problems (beyond the scope of this question). This package does not output probabilities directly, so be cautious when interpreting its outputs.
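A small self-contained sketch of the same response-encoding steps, on hypothetical toy data whose factor levels contain a space and a "+": model.matrix(~ Type + 0) creates one 0/1 column per level, the gsub() cleanup from the answer makes the names usable in a formula, and the formula is assembled from the column names.

```r
# Hypothetical toy data: a factor response with awkward level names.
df <- data.frame(Type = factor(c("low grade", "high+", "low grade")),
                 x1 = c(0.1, 0.5, 0.9),
                 x2 = c(0.3, 0.2, 0.7))

# One 0/1 column per level; "+ 0" drops the intercept so no level is lost.
y <- model.matrix(~ Type + 0, data = df[, 1, drop = FALSE])

# Remove spaces and "+" so the names are valid in a formula.
colnames(y) <- gsub(" |\\+", "", colnames(y))

# Build "Typehigh + Typelowgrade ~ x1 + x2" from the column names.
feats <- c("x1", "x2")
f <- as.formula(paste(paste(colnames(y), collapse = " + "), "~",
                      paste(feats, collapse = " + ")))

# Response and predictor columns in one data frame, ready for neuralnet(f, train, ...).
train <- cbind(y, df[, feats])
f
```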

Standard errors for smooth coefficient kernel regression with npscoef {np}

While fitting a smooth coefficient kernel regression with npscoef() from the np package in R, I cannot get the standard errors of the regression estimates.
The help page states that if errors = TRUE, asymptotic standard errors are computed and returned in the resulting smoothcoefficient object.
Based on the example provided by the authors of the "np" package:
library("np")
data(wage1)
NP.Ydata <- wage1$lwage
NP.Xdata <- wage1[c("educ", "tenure", "exper", "expersq")]
NP.Zdata <- wage1[c("female", "married")]
NP.bw.scoef <- npscoefbw(xdat=NP.Xdata, ydat=NP.Ydata, zdat=NP.Zdata)
NP.scoef <- npscoef(NP.bw.scoef,
betas = TRUE,
residuals = TRUE,
errors = TRUE)
With betas = TRUE, the coefficients are stored in the object and can be extracted with coef(NP.scoef):
> str(coef(NP.scoef))
num [1:526, 1:5] 0.146 0.504 0.196 0.415 0.415 ...
- attr(*, "dimnames")=List of 2
..$ : NULL
..$ : chr [1:5] "Intercept" "educ" "tenure" "exper" ...
But shouldn't the standard errors of the estimates be stored as well, given errors = TRUE?
I see only one vector, not five columns (intercept + 4 explanatory variables):
> str(se(NP.scoef))
num [1:526] 0.015 0.0155 0.0155 0.0268 0.0128 ...
I am confused and would appreciate a clarification.

How to export results from bootstrapping in R?

I have a time series of 540 observations which I resample 999 times using the following code:
library(boot)
boot.mean = function(x,i){boot.mean = mean(x[i])}
z1 = boot(x1, boot.mean, R=999)
z1
ORDINARY NONPARAMETRIC BOOTSTRAP
Call:
boot(data = x1, statistic = boot.mean, R = 999)
Bootstrap Statistics :
original bias std. error
t1* -0.009381397 -5.903801e-05 0.002524366
Trying to export the results gives me the following error:
write.csv(z1, "z1.csv")
Error in as.data.frame.default(x[[i]], optional = TRUE, stringsAsFactors = stringsAsFactors) :
cannot coerce class ‘"boot"’ to a data.frame
How can I export the results to a .csv file?
I expect to get a file with 999 sets of 540 observations. The goal is to apply the approx_entropy function from the pracma package to obtain 999 values of approximate entropy and plot their distribution in LaTeX.

First, please make sure that your example is reproducible. You can do so by generating a small x1 object, or by generating a random x1 vector:
> x1 <- rnorm(540)
Now, from your question:
I am expecting to obtain a file with 540 observations 999 times
However, this is not what you will get. You are generating 999 repetitions of the mean of the resampled data. That means that every bootstrap replicate is actually a single number.
From Heroka's comment:
Hint: look at str(z1).
The function str shows you the actual data inside the z1 object, without the pretty formatting.
> str(z1)
List of 11
$ t0 : num 0.0899
$ t : num [1:999, 1] 0.1068 0.1071 0.0827 0.1413 0.0914 ...
$ R : num 999
$ data : num [1:540] 1.02 1.27 1.82 -2.92 0.68 ...
(... lots of irrelevant stuff here ...)
- attr(*, "class")= chr "boot"
So your original data is stored in z1$data, and the bootstrapped statistic, i.e. the mean of each resample, is stored in z1$t. Notice how str() tells you the dimensions of each slot: z1$t is 999 x 1.
Now, what you probably want to do is replace the boot.mean function with a boot.identity function, which simply returns the resampled data:
> boot.identity = function(x,i){x[i]}
> z1 = boot(x1, boot.identity, R=999)
> str(z1)
List of 11
$ t0 : num [1:540] 1.02 1.27 1.82 -2.92 0.68 ...
$ t : num [1:999, 1:540] -0.851 -0.434 -2.138 0.935 -0.493 ...
$ R : num 999
$ data : num [1:540] 1.02 1.27 1.82 -2.92 0.68 ...
(... etc etc etc ...)
And you can save this data with write.csv(z1$t, "z1.csv").
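If boot is not needed for anything else, the same matrix of resamples can also be built in base R with replicate() and sample(). This is a swapped-in alternative, not what boot() does internally. Each row is one bootstrap replicate, so a per-replicate statistic is just apply() over rows; sd() below is only a stand-in for pracma's approx_entropy(), and the CSV goes to a temporary path.

```r
set.seed(1)                 # reproducible toy data
x1 <- rnorm(540)            # stand-in for the real series

# 999 resamples with replacement; replicate() returns 540 x 999, so transpose
# to get one replicate per row (the same layout as z1$t above).
resamples <- t(replicate(999, sample(x1, replace = TRUE)))

# Export: 999 rows (replicates) x 540 columns (observations).
out <- file.path(tempdir(), "z1.csv")
write.csv(resamples, out, row.names = FALSE)

# One statistic per replicate, e.g. sd() as a stand-in for approx_entropy().
stats <- apply(resamples, 1, sd)
```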

Error in match.names(clabs, nmi) - Linear programming model - R

I am applying the CCR Data Envelopment Analysis (DEA) model to benchmark stocks against each other. To do that, I am running R code from a DEA paper published here. The paper comes with step-by-step instructions on how to implement the model below in R.
The mathematical formulation looks like this:
Finding the model I needed already made for me seemed too good to be true. I am getting this error when I run it:
Error in match.names(clabs, nmi) : names do not match previous names
Traceback:
4 stop("names do not match previous names")
3 match.names(clabs, nmi)
2 rbind(deparse.level, ...)
1 rbind(aux, c(inputs[i, ], rep(0, m)))
My test data looks as follows:
> dput(testdfst)
structure(list(Name = structure(1:10, .Label = c("Stock1", "Stock2",
"Stock3", "Stock4", "Stock5", "Stock6", "Stock7", "Stock8", "Stock9",
"Stock10"), class = "factor"), Date = structure(c(14917, 14917,
14917, 14917, 14917, 14917, 14917, 14917, 14917, 14917), class = "Date"),
`(Intercept)` = c(0.454991569278089, 1, 0, 0.459437188169979,
0.520523252955415, 0.827294243132907, 0.642696631099892,
0.166219881886161, 0.086341470900152, 0.882092217743293),
rmrf = c(0.373075150411683, 0.0349067218712968, 0.895550280607866,
1, 0.180151549474574, 0.28669170468735, 0.0939821798173586,
0, 0.269645291515763, 0.0900619760898984), smb = c(0.764987877309785,
0.509094491489323, 0.933653313048327, 0.355340700554647,
0.654000372286503, 1, 0, 0.221454091364611, 0.660571586102851,
0.545086931342479), hml = c(0.100608151187926, 0.155064872867367,
1, 0.464298576152336, 0.110803875258027, 0.0720803195598597,
0, 0.132407005239869, 0.059742053684015, 0.0661623383303703
), rmw = c(0.544512524466665, 0.0761995312858816, 1, 0, 0.507699534880555,
0.590607506295898, 0.460148690870041, 0.451871218073951,
0.801698199214685, 0.429094840372901), cma = c(0.671162426988512,
0.658898571758625, 0, 0.695830176886926, 0.567814542084284,
0.942862571603074, 1, 0.37571611336359, 0.72565234813082,
0.636762557753099), Returns = c(0.601347600017365, 0.806071701848376,
0.187500487065719, 0.602971876359073, 0.470386289298666,
0.655773224143057, 0.414258177255333, 0, 0.266112191477882,
1)), .Names = c("Name", "Date", "(Intercept)", "rmrf", "smb",
"hml", "rmw", "cma", "Returns"), row.names = c("Stock1.2010-11-04",
"Stock2.2010-11-04", "Stock3.2010-11-04", "Stock4.2010-11-04",
"Stock5.2010-11-04", "Stock6.2010-11-04", "Stock7.2010-11-04",
"Stock8.2010-11-04", "Stock9.2010-11-04", "Stock10.2010-11-04"
), class = "data.frame")
And the linear programming code is this:
library(lpSolve) # provides lp()
namesDMU <- testdfst[1]
inputs <- testdfst[c(4,5,6,7,8)]
outputs <- testdfst[9]
N <- dim(testdfst)[1] # number of DMUs
s <- dim(inputs)[2] # number of inputs
m <- dim(outputs)[2] # number of outputs
f.rhs <- c(rep(0,N),1) # RHS constraints
f.dir <- c(rep("<=",N),"=") # directions of the constraints
aux <- cbind(-1*inputs,outputs) # matrix of constraint coefficients in (6)
for (i in 1:N) {
  f.obj <- c(0*rep(1,s),outputs[i,]) # objective function coefficients
  f.con <- rbind(aux ,c(inputs[i,], rep(0,m))) # add LHS of bTz=1
  results <- lp("max",f.obj,f.con,f.dir,f.rhs,scale=1,compute.sens=TRUE) # solve LPP
  multipliers <- results$solution # input and output weights
  efficiency <- results$objval # efficiency score
  duals <- results$duals # shadow prices
  if (i==1) {
    weights <- multipliers
    effcrs <- efficiency
    lambdas <- duals [seq(1,N)]
  } else {
    weights <- rbind(weights,multipliers)
    effcrs <- rbind(effcrs , efficiency)
    lambdas <- rbind(lambdas,duals[seq(1,N)])
  }
}
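For reference, here is the linear program the loop solves for each DMU i, reconstructed from the code rather than copied from the paper (whose notation may differ). It is the CCR multiplier model, with input data x_{jk} (k = 1..s), output data y_{jr} (r = 1..m), input weights v_k and output weights u_r. The first N rows of f.con (aux) give the "<= 0" constraints, the last row gives the normalization, and nonnegativity is implicit because lp() treats all decision variables as nonnegative.

```latex
\begin{aligned}
\max_{u,\,v}\quad & \sum_{r=1}^{m} u_r\, y_{ir} \\
\text{s.t.}\quad  & \sum_{r=1}^{m} u_r\, y_{jr} - \sum_{k=1}^{s} v_k\, x_{jk} \le 0, \qquad j = 1, \dots, N, \\
                  & \sum_{k=1}^{s} v_k\, x_{ik} = 1, \\
                  & u_r \ge 0, \quad v_k \ge 0 .
\end{aligned}
```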
Spotting the problem
A quick search reveals that the rbind function might be at fault. This is located on this line:
f.con <- rbind(aux ,c(inputs[i,], rep(0,m)))
I tried to isolate the data from the loops to see what the problem is:
aux <- cbind(-1*inputs,outputs)
a <- c(inputs[1,])
b <- rep(0,m)
> aux
rmrf smb hml rmw cma Returns
1 -0.37307515 -0.7649879 -0.10060815 -0.54451252 -0.6711624 0.6013476
2 -0.03490672 -0.5090945 -0.15506487 -0.07619953 -0.6588986 0.8060717
3 -0.89555028 -0.9336533 -1.00000000 -1.00000000 0.0000000 0.1875005
4 -1.00000000 -0.3553407 -0.46429858 0.00000000 -0.6958302 0.6029719
5 -0.18015155 -0.6540004 -0.11080388 -0.50769953 -0.5678145 0.4703863
6 -0.28669170 -1.0000000 -0.07208032 -0.59060751 -0.9428626 0.6557732
7 -0.09398218 0.0000000 0.00000000 -0.46014869 -1.0000000 0.4142582
8 0.00000000 -0.2214541 -0.13240701 -0.45187122 -0.3757161 0.0000000
9 -0.26964529 -0.6605716 -0.05974205 -0.80169820 -0.7256523 0.2661122
10 -0.09006198 -0.5450869 -0.06616234 -0.42909484 -0.6367626 1.0000000
> a
$rmrf
[1] 0.3730752
$smb
[1] 0.7649879
$hml
[1] 0.1006082
$rmw
[1] 0.5445125
$cma
[1] 0.6711624
I also looked at this:
> identical(names(aux[1]), names(a[1]))
[1] TRUE
Column and row names don't matter to me as long as the problem gets solved, so I tried to remove them. This runs but doesn't solve the problem:
rownames(testdfst) <- NULL
Looking at the contents of a and aux, maybe the problem lies with the column names.
colnames(testdfst) <- NULL does not work: it deletes everything in my data frame. Removing the column names might still solve the problem, if I can figure out how to do it.

As you correctly identified, the following line is giving you the trouble:
i <- 1
f.con <- rbind(aux ,c(inputs[i,], rep(0,m))) # add LHS of bTz=1
# Error in match.names(clabs, nmi) : names do not match previous names
You can use the str function to see the structure of each element of this expression:
str(aux)
# 'data.frame': 10 obs. of 6 variables:
# $ rmrf : num -0.3731 -0.0349 -0.8956 -1 -0.1802 ...
# $ smb : num -0.765 -0.509 -0.934 -0.355 -0.654 ...
# $ hml : num -0.101 -0.155 -1 -0.464 -0.111 ...
# $ rmw : num -0.5445 -0.0762 -1 0 -0.5077 ...
# $ cma : num -0.671 -0.659 0 -0.696 -0.568 ...
# $ Returns: num 0.601 0.806 0.188 0.603 0.47 ...
str(inputs[i,])
# 'data.frame': 1 obs. of 5 variables:
# $ rmrf: num 0.373
# $ smb : num 0.765
# $ hml : num 0.101
# $ rmw : num 0.545
# $ cma : num 0.671
str(c(inputs[i,], rep(0, m)))
# List of 6
# $ rmrf: num 0.373
# $ smb : num 0.765
# $ hml : num 0.101
# $ rmw : num 0.545
# $ cma : num 0.671
# $ : num 0
Now you can see that the list you are trying to combine with rbind has different names from the data frame it's being combined with. Probably the simplest way to proceed would be to pass a vector as the new row instead of a list, which you can accomplish by converting inputs[i,] to a matrix with as.matrix:
str(c(as.matrix(inputs[i,]), rep(0, m)))
# num [1:6] 0.373 0.765 0.101 0.545 0.671 ...
This will cause the code to work without an error:
f.con <- rbind(aux, c(as.matrix(inputs[i,]), rep(0, m)))
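The mismatch and the fix can be reproduced on a hypothetical two-input, one-output toy version of the data. c() on a one-row data frame yields a list whose last element has an empty name, which rbind() cannot match to the column "Returns"; c() on as.matrix() yields an unnamed numeric vector, which rbind() simply appends as a new row.

```r
# Hypothetical toy data with the same shape as inputs/outputs above.
inputs  <- data.frame(rmrf = c(0.2, 0.4), smb = c(0.1, 0.3))
outputs <- data.frame(Returns = c(0.5, 0.7))
aux <- cbind(-1 * inputs, outputs)

# List named "rmrf", "smb", "" -> "names do not match previous names".
bad <- tryCatch(rbind(aux, c(inputs[1, ], 0)),
                error = function(e) conditionMessage(e))

# Unnamed numeric vector -> appended as row 3.
good <- rbind(aux, c(as.matrix(inputs[1, ]), 0))
good
```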
A few unsolicited R coding tips: instead of dim(x)[1] and dim(x)[2] to get the number of rows and columns, most people find nrow(x) and ncol(x) more readable. Also, building objects in a for loop by rbind()ing one row at a time can be very inefficient; you can read more about that in the second circle of the R Inferno.
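A minimal sketch of the alternative to growing objects with rbind(): preallocate a list and a numeric vector, fill one element per iteration, and bind the list once at the end. The loop body is a hypothetical stand-in for the lp() call; in the original loop, the list element would hold results$solution and the vector element results$objval.

```r
N <- 5                              # number of DMUs (toy value)
weights_list <- vector("list", N)   # one slot per DMU
effcrs <- numeric(N)                # preallocated efficiency scores

for (i in seq_len(N)) {
  multipliers <- c(i, i^2)          # stand-in for results$solution
  weights_list[[i]] <- multipliers
  effcrs[i] <- i / N                # stand-in for results$objval
}

weights <- do.call(rbind, weights_list)  # a single rbind() at the end
```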
