How to use deepnet for classification in R

When I use the code from the example:
library(deepnet)
Var1 <- c(rnorm(50, 1, 0.5), rnorm(50, -0.6, 0.2))
Var2 <- c(rnorm(50, -0.8, 0.2), rnorm(50, 2, 1))
x <- matrix(c(Var1, Var2), nrow = 100, ncol = 2)
y <- c(rep(1, 50), rep(0, 50))
nn <- dbn.dnn.train(x, y, hidden = c(5))
it works. But when I use this code:
Var1 <- c(rnorm(50, 1, 0.5), rnorm(50, -0.6, 0.2))
Var2 <- c(rnorm(50, -0.8, 0.2), rnorm(50, 2, 1))
x <- matrix(c(Var1, Var2), nrow = 100, ncol = 2)
y <- c(rep("1", 50), rep("0", 50))
nn <- dbn.dnn.train(x, y, hidden = c(5))
I receive an error:
Error in batch_y - nn$post[[i]] : non-numeric argument to binary operator
How can I use the deepnet package for a classification problem?

The line
y <- c(rep("1", 50), rep("0", 50))
creates a character vector, which the package does not accept; that is why you get the error.
class(y)
#[1] "character"
The correct y is numeric, as follows:
y <- c(rep(1, 50), rep(0, 50))
class(y)
#[1] "numeric"
If you look inside y, you will see that it contains only 1 and 0, the binary values used for classification:
table(y)
#y
# 0 1
#50 50
If you want to train as described in the manual, you can do the following to train a model and predict on a test set:
Var1 <- c(rnorm(50, 1, 0.5), rnorm(50, -0.6, 0.2))
Var2 <- c(rnorm(50, -0.8, 0.2), rnorm(50, 2, 1))
x <- matrix(c(Var1, Var2), nrow = 100, ncol = 2)
y <- c(rep(1, 50), rep(0, 50))
If you now inspect x and y with str(x) and str(y), you can see that they are numeric (to make sure, you can also check them with class(x) and class(y)).
Once you have x and y, you can build your model:
dnn <- dbn.dnn.train(x, y, hidden = c(5, 5))
If you have a test set to predict, you can evaluate the model on it as described in the manual, for example:
test_Var1 <- c(rnorm(50, 1, 0.5), rnorm(50, -0.6, 0.2))
test_Var2 <- c(rnorm(50, -0.8, 0.2), rnorm(50, 2, 1))
test_x <- matrix(c(test_Var1, test_Var2), nrow = 100, ncol = 2)
nn.test(dnn, test_x, y)
#[1] 0.25
Again, test_x must be numeric. If your problem is that the values are stored as character, you can convert them to numeric with as.numeric(), e.g. y <- as.numeric(y).
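As a minimal sketch of that conversion (base R only; the names y_chr, y_fac and the "yes"/"no" labels are made up for illustration): as.numeric() works directly on strings that hold digits, but labels stored as a factor or as arbitrary text need an explicit mapping to 0/1 first.

```r
# Labels stored as digit strings, as in the question
y_chr <- c(rep("1", 50), rep("0", 50))
y_num <- as.numeric(y_chr)
class(y_num)

# Hypothetical labels stored as a factor with text levels:
# as.numeric() on a factor returns the internal level codes (1, 2, ...),
# so compare against the positive level instead
y_fac <- factor(c(rep("yes", 50), rep("no", 50)))
y_bin <- as.numeric(y_fac == "yes")
table(y_bin)
```

Either y_num or y_bin is then a numeric 0/1 vector of the kind the working example passes as y.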

Related

R: non-numeric arguments to binary operators

I am working with the R programming language. I am trying to make a "parallel coordinates plot" using some fake data:
library(MASS)
a = rnorm(100, 10, 10)
b = rnorm(100, 10, 5)
c = rnorm(100, 5, 10)
d = cbind(a, b, c)
parcoord(d[, c(3, 1, 2)], col = 1 + (0:149) %/% 50)
However, a problem arises when I try to mix numeric and factor variables together:
group <- sample( LETTERS[1:4], 100, replace=TRUE, prob=c(0.25, 0.25, 0.25, 0.25) )
d = cbind(a, b, group)
parcoord(d[, c(3, 1, 2)], col = 1 + (0:149) %/% 50)
Error in x - min(x, na.rm = TRUE): non-numeric argument to binary operator
I am just curious. Can this problem be resolved? Or is it simply impossible to make such a plot using numeric and factor variables together?
I saw a previous stackoverflow post over here where a similar plot is made using numeric and factor variables: How to plot parallel coordinates with multiple categorical variables in R
However, I am using a computer with no USB port or internet access - I have a pre-installed version of R with limited libraries (I have plotly, ggplot2, dplyr, MASS ... I don't have ggally or tidyverse) and was looking for a way to do this only with the parcoord() function.
Does anyone have any ideas if this can be done?
Thanks
One option is to label rows of the matrix using a factor and use that on the plot, e.g.
library(MASS)
set.seed(300)
par(xpd=TRUE)
par(mar=c(4, 4, 4, 6))
a = rnorm(12, 10, 10)
b = rnorm(12, 10, 5)
c = rnorm(12, 5, 10)
group <- sample(c("#FF9289", "#FF8AFF", "#00DB98", "#00CBFF"),
12, replace=TRUE)
d = cbind(a, b, c)
rownames(d) <- group
parcoord(d[, c(3, 1, 2)], col = group)
title(main = "Plot", xlab = "Variable", ylab = "Values")
axis(side = 2, at = seq(0, 1, 0.1),
tick = TRUE, las = 1)
legend(3.05, 1, legend = c("A", "B", "C", "D"), lty = 1,
col = c("#FF9289", "#FF8AFF", "#00DB98", "#00CBFF"))
EDIT
Thanks for the additional explanation. What you want does make sense, but unfortunately it doesn't look like it will work as I expected. I tried to make a plot using an ordered factor as the middle variable (per https://pasteboard.co/JKK4AUD.jpg) but got the same error ("non-numeric argument to binary operator").
One way I thought of doing it is to recode the factor as a number (e.g. "Var_1" -> 0.8, "Var_2" -> 0.6) as below:
library(MASS)
set.seed(123)
par(xpd=TRUE)
par(mar=c(4, 4, 4, 6))
a = rnorm(12, 10, 10)
b = c(rep("Var_1", 3),
rep("Var_2", 3),
rep("Var_3", 3),
rep("Var_4", 3))
c = rnorm(12, 5, 10)
group <- c(rep("#FF9289", 3),
rep("#FF8AFF", 3),
rep("#00DB98", 3),
rep("#00CBFF", 3))
d = data.frame("A" = a,
"Factor" = b,
"C" = c,
"Group" = group)
d$Factor <- sapply(d$Factor, switch,
"Var_1" = 0.8,
"Var_2" = 0.6,
"Var_3" = 0.4,
"Var_4" = 0.2)
parcoord(d[, c(1, 2, 3)], col = group)
title(main = "Plot", xlab = "Variable", ylab = "Values")
axis(side = 2, at = seq(0, 1, 0.1),
tick = TRUE, las = 1)
legend(3.05, 1, legend = c("A", "B", "C", "D"), lty = 1,
col = c("#FF9289", "#FF8AFF", "#00DB98", "#00CBFF"))
mtext(text = "Var 1", side = 1, adj = 0.6, padj = -30)
mtext(text = "Var 3", side = 1, adj = 0.6, padj = -12)
mtext(text = "Var 2", side = 1, adj = 0.6, padj = -21)
mtext(text = "Var 4", side = 1, adj = 0.6, padj = -3)
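The reason for the error, and an alternative to sapply(..., switch, ...) for the recoding, can be sketched in base R. This is only an illustration: the lookup vector level_pos is a made-up name, and its values are the same arbitrary positions used above.

```r
set.seed(123)
a <- rnorm(12, 10, 10)
b <- rep(c("Var_1", "Var_2", "Var_3", "Var_4"), each = 3)

# Combining numeric and character columns coerces everything to character,
# which is what makes the arithmetic inside parcoord() fail
m_bad <- cbind(a, b)
is.numeric(m_bad)   # FALSE

# Recode the labels through a named lookup vector instead
level_pos <- c(Var_1 = 0.8, Var_2 = 0.6, Var_3 = 0.4, Var_4 = 0.2)
b_num <- unname(level_pos[b])

m_ok <- cbind(a, b_num)
is.numeric(m_ok)    # TRUE
```

The numeric matrix m_ok is the kind of input parcoord() can rescale; the category labels then have to be drawn separately, as the mtext() calls above do.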

R: Problem with loop showing variable names

I want to create a loop using variable names instead of numbers but I'm struggling with it.
I have over 1000 variables in my data but the structure looks like this:
#Reproducible data
library(dplyr)  # provides %>% and mutate()
id <- rep(c("1","2","3","4","5","6"),3)
sequence <- rep(c("1","2","1","2","1","1"),3)
treatment <- c(rep(c("A"), 6), rep(c("B"), 6),rep(c("C"), 6))
var1 <- c(rnorm(3, 1, 0.4), rnorm(3, 3, 0.5), rnorm(3, 6, 0.8), rnorm(3, 1.1, 0.4), rnorm(3, 0.8, 0.2), rnorm(3, 1, 0.6))
var1_base <- c(rnorm(3, 1, 0.4), rnorm(3, 3, 0.5), rnorm(3, 6, 0.8), rnorm(3, 1.1, 0.4), rnorm(3, 0.8, 0.2), rnorm(3, 1, 0.6))
var2 <- c(rnorm(3, 1, 0.4), rnorm(3, 3, 0.5), rnorm(3, 6, 0.8), rnorm(3, 1.1, 0.4), rnorm(3, 0.8, 0.2), rnorm(3, 1, 0.6))
var2_base <- c(rnorm(3, 1, 0.4), rnorm(3, 3, 0.5), rnorm(3, 6, 0.8), rnorm(3, 1.1, 0.4), rnorm(3, 0.8, 0.2), rnorm(3, 1, 0.6))
var3 <- c(rnorm(3, 1, 0.4), rnorm(3, 3, 0.5), rnorm(3, 6, 0.8), rnorm(3, 1.1, 0.4), rnorm(3, 0.8, 0.2), rnorm(3, 1, 0.6))
var3_base <- c(rnorm(3, 1, 0.4), rnorm(3, 3, 0.5), rnorm(3, 6, 0.8), rnorm(3, 1.1, 0.4), rnorm(3, 0.8, 0.2), rnorm(3, 1, 0.6))
DF <- data.frame(id,sequence,treatment, var1, var2, var3, var1_base, var2_base, var3_base) %>%
mutate(id = factor(id),
sequence = factor(sequence),
treatment = factor(treatment, levels = c("A","B","C")))
> head(DF)
  id sequence treatment      var1      var2      var3 var1_base var2_base var3_base
1  1        1         A 0.5488589 1.3045888 0.2367363 1.2646227 1.2241417 0.1968524
2  2        2         A 1.0201801 1.3480361 0.9944096 0.3625067 0.8987885 1.5868442
3  3        1         A 0.7269204 0.7091029 1.2025266 0.1238612 1.8828400 0.8687552
4  4        2         A 3.3240269 3.3133104 3.2251780 2.4116230 2.6284785 2.6027341
5  5        1         A 3.3051822 2.4542786 2.1687379 3.5250026 3.2231797 2.9990167
6  6        1         A 2.7436715 2.7419527 3.8349072 2.9971485 3.0528477 2.6970430
I want to create a linear mixed model with var as the outcome; treatment, var_base (baseline), and sequence as the fixed effect; id as a random effect.
To code it one by one, it would look like this:
lm1 <- lmer(var1 ~ var1_base + treatment + sequence + (1|id), data = DF)
But since I have over 1000 vars, it wouldn't make sense to do it individually. I tried writing a for loop, but it did not turn out as I expected.
#Approaches 1--it worked but I want the result to show "var" instead of "[[1]]"
lm_output <- list()
for(i in 4:6){
lm1 <-lmer(DF[[i+3]] ~ DF[[i]] + Treatment+ sequence + (1|id), data = DF)
summary(lm1)
lm_output[[i]] <- summary(lm1)
}
>print(lm_output[1:6])
[[1]]
NULL
[[2]]
NULL
[[3]]
NULL
[[4]]
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 0.8995 0.6129 13.0000 1.468 0.16598
DF[[i]] 0.6772 0.1860 13.0000 3.641 0.00299 **
TreatmentB 0.1621 0.6885 13.0000 0.235 0.81751
TreatmentC -0.3112 0.7049 13.0000 -0.441 0.66611
sequence2 -0.1001 0.5715 13.0000 -0.175 0.86367
[[5]]
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 0.137752 0.365302 11.104560 0.377 0.713
DF[[i]] 0.729762 0.071874 9.810327 10.153 1.61e-06 ***
TreatmentB 0.531048 0.332585 9.144490 1.597 0.144
TreatmentC 0.060414 0.343280 9.185060 0.176 0.864
sequence2 -0.001702 0.440920 4.000881 -0.004 0.997
[[6]]
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 0.765739 0.446747 13.000000 1.714 0.110
DF[[i]] 0.783985 0.132198 13.000000 5.930 4.98e-05 ***
TreatmentB 0.006516 0.554550 13.000000 0.012 0.991
TreatmentC -0.312968 0.515562 13.000000 -0.607 0.554
sequence2 -0.762799 0.436095 13.000000 -1.749 0.104
Is there a way to transform [[4]] --> var1, [[5]] --> var2..., so it's more intuitive and easier to manage the data?
#Approaches 2--Tried storing vars name as a vector first and ran. Did not work
responseList <- names(DF)[c(4:6)]
lm_output2 <- list()
for(i in n){
lm2<-lmer(get(n+3) ~ get(n) + Treatment+ sequence + (1|id), data = DF)
summary(lm2)
lm_output2[[i]] <- summary(lm2)
}
> Error in n + 3 : non-numeric argument to binary operator
I understand this error: in this case n is not numeric, so get(n+3) fails. But I don't know how I can specify var and var_base in the same loop.
Any suggestion is appreciated, thank you!
You can build the formula for lmer as a string. So we could loop over the variable indices (1, 2, 3, etc.) and build the formula from the desired variable names, like this:
library(lme4)
lm_output <- list()
for(i in 1:3) {
  outcome_var = paste("var", i, sep = "")
  base_var = paste(outcome_var, "base", sep = "_")
  form = as.formula(paste(outcome_var,
                          " ~ ",
                          base_var,
                          " + treatment + sequence + (1 | id)",
                          sep = ""))
  lm1 = lmer(form, data = DF)
  # store the summary under the variable name, so the list prints
  # $var1, $var2, ... instead of [[1]], [[2]], ...
  lm_output[[outcome_var]] <- summary(lm1)
}
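If the real data has many outcome columns, one option is to derive the names from the data instead of hard-coding 1:3. The sketch below only builds the formulas, so it runs without lme4; the vector outcomes stands in for something like names(DF)[4:6], and the naming rule (outcome plus "_base") is taken from the example data.

```r
outcomes <- c("var1", "var2", "var3")   # e.g. names(DF)[4:6]

# One formula per outcome, each paired with its own baseline column
forms <- lapply(outcomes, function(v) {
  as.formula(paste0(v, " ~ ", v, "_base + treatment + sequence + (1 | id)"))
})
names(forms) <- outcomes

forms$var2
```

Each element can then be fitted with lmer(forms[[v]], data = DF), and a list filled by name prints as $var1, $var2, ... rather than [[1]], [[2]], ...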

How to adapt the size of multiple plots?

How can I adapt the size of the following plots with regard to their length of the x-axis?
The width of the plots should refer to the length of their respective section of the x-axis. The height should be the same for all plots.
The function you want is the base graphics function layout(); see help("layout").
First I will make up a dataset, since you have not posted one. I will not draw the regression lines, just the points.
Data creation code.
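fun <- function(X, A) {
  # one list(x, y) per row of X, using the matching row of A as slope and intercept
  lapply(seq_len(nrow(X)), function(i) {
    xx <- seq(X[i, 1], X[i, 2], length.out = 100)
    y <- A[i, 1]*xx + A[i, 2] + rnorm(100, 0, 25)
    list(xx, y)
  })}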
Coef <- matrix(c(0.24, 0.54,
0.75, 0.54,
0.33, 2.17,
0.29, 3.3,
0.29, 4.41), byrow = TRUE, ncol = 2)
X <- matrix(c(0.1, 0.49,
0.5, 2.49,
2.5, 3.9,
4.0, 5.9,
6.0, 12.0), byrow = TRUE, ncol = 2)
set.seed(1234)
res <- fun(X, Coef)
The problem.
Define a layout matrix that places the plots in sequence from the first to the fifth, with the widths given by the X ranges.
layout_mat <- matrix(c(1, 2, 3, 4, 5), 1, 5, byrow = TRUE)
w <- apply(X, 1, diff)
l <- layout(layout_mat, widths = w)
layout.show(l)
Now make some room for the axis annotation, saving the default graphics parameters, and plot the 5 graphs.
om <- par(mar = c(3, 0.1, 0.1, 0.1),
oma = c(3, 2, 0.1, 0.1))
for(i in 1:5) plot(res[[i]][[1]], res[[i]][[2]])
par(om)
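To see what layout() receives as widths here, the ranges can be computed on their own. A small sketch using the same X as above; share is just an illustrative name for each panel's fraction of the total width.

```r
X <- matrix(c(0.1, 0.49,
              0.5, 2.49,
              2.5, 3.9,
              4.0, 5.9,
              6.0, 12.0), byrow = TRUE, ncol = 2)

# Length of each x-axis section: upper limit minus lower limit, row by row
w <- apply(X, 1, diff)

# layout() treats numeric widths as relative, so only the proportions matter
share <- w / sum(w)
round(share, 3)
```

Because the heights are left at their default, all five panels keep the same height and only the widths differ.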

Calling ROI "LP" and "QP" functions

I am trying to reproduce some of the examples given by the ROI creators.
For example in http://statmath.wu.ac.at/courses/optimization/Presentations/ROI-2011.pdf (slides 15-17) there is the example:
library("ROI")
#ROI: R Optimization Infrastructure
#Installed solver plugins: cplex, lpsolve, glpk, quadprog, symphony, nlminb.
#Default solver: glpk.
(constr1 <- L_constraint(c(1, 2), "<", 4))
#An object containing 1 linear constraints.
(constr2 <- L_constraint(matrix(c(1:4), ncol = 2), c("<", "<"), c(4, 5)))
#An object containing 2 linear constraints.
rbind(constr1, constr2)
#An object containing 3 linear constraints.
(constr3 <- Q_constraint(matrix(rep(2, 4), ncol = 2), c(1, 2), "<", 5))
#An object containing 1 constraints.
#Some constraints are of type quadratic.
foo <- function(x) {sum(x^3) - seq_along(x) %*% x}
F_constraint(foo, "<", 5)
lp <- LP(objective = c(2, 4, 3), L_constraint(L = matrix(c(3, 2, 1, 4, 1, 3, 2, 2, 2), nrow = 3), dir = c("<=", "<=", "<="), rhs = c(60, 40, 80)), maximum = TRUE)
qp <- QP(Q_objective(Q = diag(1, 3), L = c(0, -5, 0)), L_constraint(L = matrix(c(-4, -3, 0, 2, 1, 0, 0, -2, 1), ncol = 3, byrow = TRUE), dir = rep(">=", 3), rhs = c(-8, 2, 0)))
When I run it I get the errors
Error in LP(objective = c(2, 4, 3), L_constraint(L = matrix(c(3, 2, 1, :
could not find function "LP"
and
Error in QP(Q_objective(Q = diag(1, 3), L = c(0, -5, 0)), L_constraint(L = matrix(c(-4, :
could not find function "QP"
In fact the functions are not in ROI's namespace. e.g.
ROI::LP
Error: 'LP' is not an exported object from 'namespace:ROI'
The same syntax appears in other examples I found on the web but the functions LP and QP are never defined.
I am using ROI 0.3.0
Can someone tell me what is going wrong?
The commands LP and QP were both changed to OP.
library("ROI")
## ROI: R Optimization Infrastructure
## Registered solver plugins: nlminb, alabama, cbc, cccp, clp, deoptim, ecos, glpk, ipop, lpsolve, msbinlp, neos, nloptr, ucminf, spg, cgm, vmm, bobyqa, newuoa, uobyqa, hjk, nmk, lbfgs, optimx, qpoases, quadprog, scs, symphony.
## Default solver: auto.
(constr1 <- L_constraint(c(1, 2), "<", 4))
## An object containing 1 linear constraint.
(constr2 <- L_constraint(matrix(c(1:4), ncol = 2), c("<", "<"), c(4, 5)))
## An object containing 2 linear constraints.
rbind(constr1, constr2)
## An object containing 3 linear constraints.
(constr3 <- Q_constraint(matrix(rep(2, 4), ncol = 2), c(1, 2), "<", 5))
## An object containing 0 linear constraints
## 1 quadratic constraint.
foo <- function(x) {sum(x^3) - seq_along(x) %*% x}
F_constraint(foo, "<", 5)
## An object containing 1 nonlinear constraint.
lp <- OP(objective = c(2, 4, 3),
L_constraint(L = matrix(c(3, 2, 1, 4, 1, 3, 2, 2, 2), nrow = 3),
dir = c("<=", "<=", "<="),
rhs = c(60, 40, 80)), maximum = TRUE)
qp <- OP(Q_objective(Q = diag(1, 3), L = c(0, -5, 0)),
L_constraint(L = matrix(c(-4, -3, 0, 2, 1, 0, 0, -2, 1), ncol = 3, byrow = TRUE),
dir = rep(">=", 3), rhs = c(-8, 2, 0)))
The slides you refer to are outdated. The new documentation is at http://roi.r-forge.r-project.org

SVM from e1071 R package replaces labels if there is a feature with only 1 unique value

Why does svm() from the e1071 package replace the original labels with "1" and "2" if there is at least one column that has only one unique value?
For example, the code below works correctly:
library(e1071)
trainData <- data.frame("cA" = c(1, 1, 1, 0.99),
"cB" = c(0.5, 0.6, 0.5, 0.3),
"is_match" = factor(c("N", "N", "P", "P")))
testData <- data.frame("cA" = c(1, 1, 0, 0),
"cB" = c(0.2, 0.3, 0.2, 0.1))
model <- svm(is_match ~ ., data = trainData, type = "C-classification")
pred <- predict(model, testData, type = "class")
print(pred)
It returns:
1 2 3 4
P P P P
However, if I change 0.99 to 1 in the first column - so that all values become the same - svm changes labels "N" and "P" to "1" and "2":
trainData <- data.frame("cA" = c(1, 1, 1, 1),
"cB" = c(0.5, 0.6, 0.5, 0.3),
"is_match" = factor(c("N", "N", "P", "P")))
testData <- data.frame("cA" = c(1, 1, 0, 0),
"cB" = c(0.2, 0.3, 0.2, 0.1))
model <- svm(is_match ~ ., data = trainData, type = "C-classification")
pred <- predict(model, testData, type = "class")
print(pred)
Such code returns:
1 2 3 4
2 2 2 2
Additional notes:
It happens with any value in the column (zeros, NAs), as long as the value is the same for every instance.
If the labels are digits, svm doesn't replace them.
Other ML methods like rpart or ada work correctly.
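A quick way to check whether a training set contains such a column is sketched below in base R. It is a diagnostic only: features and is_constant are illustrative names, and whether dropping the flagged column avoids the relabeling is an assumption that the observations above do not confirm.

```r
trainData <- data.frame("cA" = c(1, 1, 1, 1),
                        "cB" = c(0.5, 0.6, 0.5, 0.3),
                        "is_match" = factor(c("N", "N", "P", "P")))

features <- setdiff(names(trainData), "is_match")

# TRUE for every feature column that holds a single unique value
is_constant <- vapply(trainData[features],
                      function(col) length(unique(col)) == 1,
                      logical(1))
names(which(is_constant))
```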
