I am trying to implement a simple multi-layer feed-forward neural network using the "neuralnet" package available in R for the "iris" dataset.
The code that I am using is as follows-
library(neuralnet)
data(iris)
D <- data.frame(iris, stringsAsFactors=TRUE)
# create formula-
f <- as.formula(Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width)
# convert qualitative variables to dummy (binary) variables-
m <- model.matrix(f, data = D)
# create neural network-
iris_nn <- neuralnet(f, data = m, hidden = 4, learningrate = 0.3)
I have two questions at this point in time-
1.) How do I use the "hidden" parameter? According to the manual pages, it says:
hidden: a vector of integers specifying the number of hidden neurons (vertices) in each layer
How should I supply this vector of integers? Say, for example, I wanted to have 1 hidden layer of 4 neurons/perceptrons, or alternatively 3 hidden layers of 5 neurons each.
2.) The last line of code gives me the error-
Error in eval(predvars, data, env) : object 'Species' not found
If I remove the "hidden" parameter, this error still persists.
What am I doing wrong here?
Edit: after adding the line-
m <- model.matrix(f, data = D)
the matrix 'm' no longer contains the "Species" variable/attribute that I am trying to predict.
Output of str(D):
'data.frame':   150 obs. of  5 variables:
 $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
 $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
 $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
 $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
 $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
I have coded this with "nnet" successfully. Posting my code for reference-
data(iris)
library(nnet)
# create formula-
f <- as.formula(Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width)
# create a NN with hidden layer having 4 neurons/node and
# maximum number of iterations = 3
iris_nn <- nnet(f, data = iris, size = 4, maxit = 3)
# create a test data-
new_obs <- data.frame(Sepal.Length = 5.5, Sepal.Width = 3.1, Petal.Length = 1.4, Petal.Width = 0.4)
# make prediction-
predict(iris_nn, new_obs) # gives percentage of which class it may belong
predict(iris_nn, new_obs, type = "class") # gives the class instead of percentages of which 'class' this data type may belong to
# create a 'confusion matrix' to measure accuracy of model-
# rows are actual values and columns are predicted values-
# table(iris$Species, predict(iris_nn, iris[, 1:4], type = "class"))
cat("\n\nConfusion Matrix for # of iters = 3\n")
print(table(iris$Species, predict(iris_nn, iris[, 1:4], type = "class")))
cat("\n\n")
rm(iris_nn)
# setting 'maxit' to 1000 makes the model converge-
iris_nn <- nnet(f, data = iris, size = 4, maxit = 1000)
# create a new confusion matrix to check model accuracy again-
cat("\n\nConfusion Matrix for # of iters = 1000\n")
print(table(iris$Species, predict(iris_nn, iris[, 1:4], type = "class")))
# table(iris$Species, predict(iris_nn, iris[, 1:4], type = "class"))
# to plot 'iris_nn' trained NN-
# library("NeuralNetTools")
# plotnet(iris_nn)
Thanks!!
No clue how a NN runs or what the best way to run it is, and I don't know much about the iris dataset either.
I am just pointing out why it's not running - the column Species:
str(d)
'data.frame': 150 obs. of 5 variables:
$ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
$ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
$ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
$ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
$ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
Species is a factor, and the NN doesn't take factors.
Convert it to dummy variables -
d <- iris   # d here is just the iris data frame
# create one dummy (0/1) column per class; the third class is implied when both are 0
d$set <- 0
d$set[d$Species == "setosa"] <- 1
d$versi <- 0
d$versi[d$Species == "versicolor"] <- 1
f <- as.formula(set + versi ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width)
iris_nn <- neuralnet(f, data = d, hidden = 4, learningrate = 0.3)
EDIT:
So when you say hidden = c(5, 3), the neural network diagram would have your input nodes, then a hidden layer of 5 nodes side by side, then another hidden layer of 3 nodes, and finally the output node(s).
No clue how they impact the accuracy.
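As a minimal sketch (reusing the formula f and dummy-coded data frame d defined above), that architecture would be requested like this:
# two hidden layers: 5 neurons in the first, 3 in the second
nn_two_layers <- neuralnet(f, data = d, hidden = c(5, 3), learningrate = 0.3)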
The compute() function for neuralnet is like predict() for other machine learning models.
library(neuralnet)
library(caret)   # provides the confusionMatrix() function
# compute() has to be called with the namespace prefix (neuralnet::compute),
# probably because another loaded package masks it; calling it plainly was producing an error
nnans <- neuralnet::compute(NN, test)$net.result   # matrix of fitted values, one column per output node
# pick the output node with the highest activation as the predicted class
# (assuming the output columns are in the same order as the levels of test_labels)
nn_pred <- factor(levels(test_labels)[max.col(nnans)], levels = levels(test_labels))
confusionMatrix(nn_pred, test_labels)
1.) Referring to your question about how to use the "hidden" parameter, here are some examples:
# three hidden layers with 2, 3 and 2 neurons respectively
neuralnet(f, data = m, hidden = c(2, 3, 2), learningrate = 0.3)
or
# two hidden layers with 2 neurons each
neuralnet(f, data = m, hidden = c(2, 2), learningrate = 0.3)
Related
I am trying to compute several models at the same time. The dependent variable is in the first column, and the rest of the columns are independent variables. I want to run a logistic regression between the DV and each independent variable separately. Thank you very much for your help! Please let me know if anything else needs to be provided.
**** Some of the IVs are binary variables, so they should be treated with as.factor() in R.
*** After computing each model, can I also compute a p-value for each model in one go?
*** Right now, I just compute and run summary() on each model separately.
The data and my current code are shown below.
[screenshots of the data and code]
Pictures of your data are not as helpful as providing a sample of your data with dput(). Also you should paste your code directly into your question and not paste a picture. Here is an example using the iris data set that is included with R:
data(iris)
iris.2 <- iris[iris$Species!="setosa", ]
iris.2 <- droplevels(iris.2)
iris.2$Species <- as.numeric(iris.2$Species) - 1
# Species: 0 == versicolor, 1== virginica
str(iris.2)
# 'data.frame': 100 obs. of 5 variables:
# $ Sepal.Length: num 7 6.4 6.9 5.5 6.5 5.7 6.3 4.9 6.6 5.2 ...
# $ Sepal.Width : num 3.2 3.2 3.1 2.3 2.8 2.8 3.3 2.4 2.9 2.7 ...
# $ Petal.Length: num 4.7 4.5 4.9 4 4.6 4.5 4.7 3.3 4.6 3.9 ...
# $ Petal.Width : num 1.4 1.5 1.5 1.3 1.5 1.3 1.6 1 1.3 1.4 ...
# $ Species : num 0 0 0 0 0 0 0 0 0 0 ...
Now we compute the logistic regression in which Species is the dependent variable against each of the independent variables.
forms <- paste("Species ~", colnames(iris.2)[-5])
forms
# [1] "Species ~ Sepal.Length" "Species ~ Sepal.Width" "Species ~ Petal.Length" "Species ~ Petal.Width"
iris.glm <- lapply(forms, function(x) glm(as.formula(x), iris.2, family=binomial))
Now iris.glm is a list containing all of the results. The results of the first logistic regression are iris.glm[[1]] and summary(iris.glm[[1]]) gives you the summary. To print all of the results use lapply():
lapply(iris.glm, print)
lapply(iris.glm, summary)
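Regarding the p-value question: one possible sketch is to pull the p-value of each predictor's coefficient out of summary() for every fitted model in one go, for example:
# p-value of the single predictor in each model (row 2 of the coefficient table; row 1 is the intercept)
pvals <- sapply(iris.glm, function(m) summary(m)$coefficients[2, "Pr(>|z|)"])
names(pvals) <- forms
pvals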
I'm trying to use leveneTest() from the "car" library in R with the iris dataset.
The code I have is:
library(tidyverse)
library(car)
iris %>% group_by (Species) %>% leveneTest( Sepal.Length )
From there I'm getting the following error:
Error in leveneTest.default(., Sepal.Length) :
. is not a numeric variable
I don't know how to fix this, since the data seem to be of the right types:
> str(iris)
'data.frame': 150 obs. of 5 variables:
$ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
$ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
$ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
$ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
$ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
For the Levene test, you need to specify a grouping factor, for example:
leveneTest(Sepal.Length ~ Species,data=iris)
Levene's Test for Homogeneity of Variance (center = median)
Df F value Pr(>F)
group 2 6.3527 0.002259 **
147
This tests whether the variances are homogeneous across groups. It doesn't quite make sense to group the data first and then do the leveneTest within each group. If you intend to do something else, you can elaborate more or comment.
Try doing it this way:
with(iris, leveneTest(Sepal.Length, Species))
Maybe you are looking for a solution like this, which runs the test for every numeric column:
library(purrr)   # provides map()
map(iris[, 1:4], ~ leveneTest(.x, iris$Species))
Are there any packages in R that can generate a random dataset given a pre-existing template dataset?
For example, let's say I have the iris dataset:
data(iris)
> head(iris)
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5.0 3.6 1.4 0.2 setosa
6 5.4 3.9 1.7 0.4 setosa
I want some function random_df(iris) which will generate a data frame with the same columns as iris but with random data (preferably random data that preserves certain statistical properties of the original, e.g., the mean and standard deviation of the numeric variables).
What is the easiest way to do this?
[Comment from question author moved here. --Editor's note]
I don't want to sample random rows from an existing dataset. I want to generate actually random data with all the same columns (and types) as an existing dataset. Ideally, if there is some way to preserve statistical properties of the data for numeric variables, that would be preferable, but it's not needed.
How about this for a start:
Define a function that simulates data from df by
drawing samples from a normal distribution for numeric columns in df, with the same mean and sd as in the original data column, and
uniformly drawing samples from the levels of factor columns.
generate_data <- function(df, nrow = 10) {
  as.data.frame(lapply(df, function(x) {
    if (is.numeric(x)) {
      # numeric column: draw from a normal with the column's mean and sd
      rnorm(nrow, mean = mean(x), sd = sd(x))
    } else if (is.factor(x)) {
      # factor column: sample uniformly from the levels, keeping the factor type
      factor(sample(levels(x), nrow, replace = TRUE), levels = levels(x))
    }
  }))
}
Then for example, if we take iris, we get
set.seed(2019)
df <- generate_data(iris)
str(df)
#'data.frame': 10 obs. of 5 variables:
# $ Sepal.Length: num 6.45 5.42 4.49 6.6 4.79 ...
# $ Sepal.Width : num 2.95 3.76 2.57 3.16 3.2 ...
# $ Petal.Length: num 4.26 5.47 5.29 6.19 2.33 ...
# $ Petal.Width : num 0.487 1.68 1.779 0.809 1.963 ...
# $ Species : Factor w/ 3 levels "setosa","versicolor",..: 3 2 1 2 3 2 1 1 2 3
It should be fairly straightforward to extend the generate_data function to account for other column types.
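For instance, a possible sketch that also handles integer and character columns (neither type occurs in iris, so treat this as illustrative only):
generate_data2 <- function(df, nrow = 10) {
  as.data.frame(lapply(df, function(x) {
    if (is.integer(x)) {
      # integer column: round a normal draw and keep the integer type
      as.integer(round(rnorm(nrow, mean = mean(x), sd = sd(x))))
    } else if (is.numeric(x)) {
      rnorm(nrow, mean = mean(x), sd = sd(x))
    } else if (is.factor(x)) {
      factor(sample(levels(x), nrow, replace = TRUE), levels = levels(x))
    } else if (is.character(x)) {
      # character column: sample uniformly from the distinct observed values
      sample(unique(x), nrow, replace = TRUE)
    } else {
      # any other type: fall back to resampling the original values
      sample(x, nrow, replace = TRUE)
    }
  }), stringsAsFactors = FALSE)
}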
I estimate a randomForest model, then run the predict function on some hold-out data.
What I would like to do is (preferably) append the prediction for each row to the dataframe containing the holdout data as a new column, or (second choice) save the (row number in test data, prediction for that row) as a .csv file.
What I can't do is access the internals of the results object in a way that lets me do that. I'm new to R so I appreciate your help.
I have:
res <-predict(forest_tst1,
test_d,
type="response")
which successfully gives me a bunch of predictions.
The following is not valid R, but ideally I would do something like:
test_d$predicted_value <- results[some_field_of_the_results]
or,
for i = 1:nrow(test_d)
test_d[i, new_column] = results[prediction_for_row_i]
end
Basically I just want a column of predicted 1's or 0's corresponding to rows in test_d. I've been trying to use the following commands to get at the internals of the res object, but I've not found anything that's helped me.
attributes(res)
names(res)
Finally, I'm a bit confused by the following, if anyone can explain!
typeof(res) = "integer"
Edit: I can do
res != test_d$gold_label
which is if anything a little confusing, because I'm comparing a column and a non-column object (??), and
length(res) = 2053
and res appears to be indexable
attributes(res[1])
$names
[1] "6836"
$levels
[1] "0" "1"
$class
[1] "factor"
but I can't select out the sub-parts in a sensible way
> res[1][1]
6836
0
Levels: 0 1
> res[1]["levels"]
<NA>
<NA>
Levels: 0 1
If I understand right, all you are trying to do is add predictions to your test data?
library(randomForest)
ind <- sample(2, nrow(iris), replace = TRUE, prob = c(0.8, 0.2))
TestData <- iris[ind == 2, ]                                   ## Generate Test Data
iris.rf <- randomForest(Species ~ ., data = iris[ind == 1, ])  ## Build Model
iris.pred <- predict(iris.rf, iris[ind == 2, ])                ## Get Predictions
TestData$Predictions <- iris.pred                              ## Append the Predictions Column
Output:
Sepal.Length Sepal.Width Petal.Length Petal.Width Species Predictions
9 4.4 2.9 1.4 0.2 setosa setosa
16 5.7 4.4 1.5 0.4 setosa setosa
17 5.4 3.9 1.3 0.4 setosa setosa
32 5.4 3.4 1.5 0.4 setosa setosa
42 4.5 2.3 1.3 0.3 setosa setosa
46 4.8 3.0 1.4 0.3 setosa setosa
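And if you prefer the second option from your question (saving the row numbers together with the predictions to a .csv file), a minimal sketch based on the objects above:
# write the row ids of the hold-out data and the predicted class to a CSV file
write.csv(data.frame(row_id = rownames(TestData), prediction = TestData$Predictions),
          "predictions.csv", row.names = FALSE)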
I'm running a KNN algorithm in R. I have three datasets. I've been working on my code; here's what I have:
library(stats)
library(class)
#load up train and testing files
train1<-read.table("train1.txt",header=FALSE)
test1<-read.table("test1.txt",header=FALSE)
#convert inputs into matrix
train = matrix(train1, byrow = T, ncol=3)
test = matrix(test1, byrow = T, ncol=3)
#load the classes in the training data
cl1a<-read.table("classes1.txt",header = FALSE)
clas=matrix(cl1a,byrow=T,ncol=1)
#set k
kk = 2
#run knn
kn1 = knn(train, test, clas, k=kk, prob=TRUE)
After running the last line I get the error message:
Error in knn(train, test, clas, k = kk, prob = TRUE) :
(list) object cannot be coerced to type 'double'
I've read somewhere else that this can be fixed by converting the tables into matrices, but as you can see I have already done that in my code.
Any help is appreciated!
You need to use as.matrix as I suggested in my comments above. Here's why:
str(matrix(iris,byrow=T,ncol=5))
As you can see this produces a list.
List of 5
$ : num [1:150] 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
$ : num [1:150] 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
$ : num [1:150] 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
$ : num [1:150] 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
$ : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
- attr(*, "dim")= int [1:2] 1 5
as.matrix on the other hand produces a matrix.
Now, why the error anyway?
From ?knn we can see that it accepts matrices or dataframes:
train: matrix or data frame of training set cases.
test: matrix or data frame of test set cases. A vector will be interpreted as a row vector for a single case.
This explains why we have the error:
Error in knn(train, test, clas, k = kk, prob = TRUE) : (list) object cannot be coerced to type 'double'
The safe thing to do is to use either as.data.frame or as.matrix.
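Applied to your code, that would look roughly like this (a sketch, assuming the same input files as in your question):
# coerce the data frames returned by read.table() into numeric matrices
train <- as.matrix(train1)
test  <- as.matrix(test1)
cl    <- factor(cl1a[, 1])   # knn() expects the class labels as a factor/vector, not a one-column data frame
kn1   <- knn(train, test, cl, k = 2, prob = TRUE)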