R neuralnet training for a simple dataset of squares of numbers

Dear neuralnet experts,
I am studying ANNs with a book and an R package.
One of the examples is to train a neuralnet (R package) on a simple set of the squares of the numbers [1~10]. It was quite quick and easy to fit them with 1 hidden layer of 10 neurons.
But for a larger set, [1~30], the algorithm does not converge. I think some parameters need to be changed to train the neuralnet. At first I increased the number of neurons and hidden layers, i.e. c(20,10), but that failed...
Could somebody please guide me to learn more about neuralnet to train the dataset?
My code in R is given below:
library("neuralnet")
#Read the input file
mydata50=read.csv('Squares50.csv',sep=",",header=TRUE)
mydata30 <- mydata50[1:30,]
attach(mydata30)
names(mydata30)
mydata30
#Train the model based on output from input
model30=neuralnet(formula = Output~Input,
data = mydata30,
hidden=c(20,10),
threshold=0.01 )
print(model30)
#Let's plot and see the layers
plot(model30)
Best regards,
Dong-Ho
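Scaling the data is usually the first thing to try with neuralnet when the target values are large (here the squares run up to 900): with unscaled inputs and outputs the default settings often fail to converge within stepmax. A minimal sketch of rescaling to [0, 1] and back, regenerating the data instead of reading Squares50.csv:
library("neuralnet")
# Generate the data directly instead of reading Squares50.csv
Input <- 1:30
Output <- Input^2
mydata30 <- data.frame(Input, Output)
# Rescale both columns to [0, 1]
mins <- apply(mydata30, 2, min)
maxs <- apply(mydata30, 2, max)
scaled30 <- as.data.frame(scale(mydata30, center = mins, scale = maxs - mins))
# One small hidden layer is often enough once the data are scaled
model30 <- neuralnet(Output ~ Input, data = scaled30, hidden = 10, threshold = 0.01)
# Undo the scaling to compare predictions with the true squares
pred_scaled <- compute(model30, scaled30["Input"])$net.result
pred <- pred_scaled * (maxs["Output"] - mins["Output"]) + mins["Output"]
cbind(Input, Output, round(pred, 1))
Increasing stepmax in the neuralnet call is another commonly suggested workaround, but scaling is usually the more effective fix.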

Related

Find out the most contributing variables/features of an R H2O AutoML model?

I'm currently working with some insurance data to predict which insured-sum class a customer will fall into. To achieve this I'm using the AutoML function of the H2O package in R. Now that I have my model, I'd like to be able to see which variables/features in my data contribute the most to the predictions the model makes. Is such a thing possible with H2O? If not, what would be another good option to achieve this in R? Thanks!
Definitely possible. If the best-fitting model that AutoML has selected is not an ensemble, then you can use the following to plot the variable importances (where model is your model extracted from AutoML):
library(h2o)
h2o.varimp_plot(model)
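In case it helps, assuming the AutoML run itself is stored in an object called aml (a hypothetical name, not from the question), the leader model can be extracted first:
model <- aml@leader  # aml is the (hypothetical) H2OAutoML object returned by h2o.automl()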
If the best fitting model is an ensemble then things are a little more complicated. A good option is to use the lime package to look at local importance.
library(h2o)
library(lime)
## Train explainer
explainer <- lime(train, model)
## Get explanations for a subset of samples
explanation <- explain(train[1:5, ], explainer, n_features = 10)
## Plot global explanations
plot_explanations(explanation)
## Plot local explanations
plot_features(explanation)

Classification with One Class SVM in R

I am trying to code an SVM for classification using a training data-set that contains only one class. So I want to predict whether new data is different from my data-set or not.
I used the same data-set for prediction as for training, but unfortunately the SVM is not predicting well.
library(e1071)
# Data set
high <- c(10,5,14,12,20)
temp <- c(12,15,20,15,9)
x <- cbind(high,temp)
# Create SVM
model <- svm(x,y=NULL,type='one-classification',kernel='linear')
# Predict training data-set
pred <- predict(model,x)
pred
It returns:
TRUE TRUE FALSE FALSE TRUE
It should be TRUE for all of them.
I am working on a similar problem. In reading the vignettes that the e1071 authors have on CRAN, I believe that by definition the SVM is going to draw a hyperplane that separates the data into 2 classes. In other words, that 3rd item is the most likely to be an outlier; SVM will always define at least one outlier.
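One knob worth knowing about here (not mentioned in the question): the nu argument of e1071::svm is an upper bound on the fraction of training points treated as outliers, and it defaults to 0.5, which is consistent with 2 of the 5 points coming back FALSE. A minimal sketch with a smaller nu:
library(e1071)
# Same toy data as in the question
high <- c(10,5,14,12,20)
temp <- c(12,15,20,15,9)
x <- cbind(high,temp)
# A smaller nu allows fewer training points to fall outside the learned region
model <- svm(x, y = NULL, type = 'one-classification', kernel = 'linear', nu = 0.1)
predict(model, x)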
I'm not sure traditional supervised learning techniques, such as SVMs, are well suited to training data where you only have 1 class. There's nothing in the data to inform the model how to differentiate between class A and class B.
I think the best you can do with your 1-class training data is to learn a probability density/mass function from the data, and then check how likely a new instance is under the learned density. For some more info see the Wikipedia article on one-class classification.
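As a small sketch of that density idea (assuming a multivariate normal is an acceptable model for the toy data above, and using the mvtnorm package; neither assumption is in the original answer):
library(mvtnorm)
# Same toy data as in the question
high <- c(10,5,14,12,20)
temp <- c(12,15,20,15,9)
x <- cbind(high,temp)
# Fit a multivariate normal to the one-class training data
mu <- colMeans(x)
sigma <- cov(x)
# Density of each training point, and a cutoff below which a point counts as "different"
dens <- dmvnorm(x, mean = mu, sigma = sigma)
threshold <- min(dens)
# A far-away point should fall below the cutoff
dmvnorm(c(30, 2), mean = mu, sigma = sigma) >= threshold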

Random forest evaluation in R

I am a newbie in R and I am trying my best to create my first model. I am working on a 2-class random forest project and so far I have programmed the model as follows:
library(randomForest)
set.seed(2015)
randomforest <- randomForest(as.factor(goodkit) ~ ., data=training1, importance=TRUE,ntree=2000)
varImpPlot(randomforest)
prediction <- predict(randomforest, test,type='prob')
print(prediction)
I am not sure why I don't get the overall prediction for my model. I must be missing something in my code. I get the OOB error and the per-case predictions on the test set, but not the overall prediction of the model.
library(pROC)
auc <-roc(test$goodkit,prediction)
print(auc)
This doesn't work at all.
I have been through the pROC manual but I cannot manage to understand everything. It would be very helpful if anyone could help with the code or post a link to a good practical example.
Using the ROCR package, the following code should work for calculating the AUC:
library(ROCR)
predictedROC <- prediction(prediction[,2], as.factor(test$goodkit))
as.numeric(performance(predictedROC, "auc")@y.values)
Your problem is that predict on a randomForest object with type='prob' returns two columns of predictions: each column contains the probability of belonging to one of the classes (for binary prediction).
You have to decide which of these predictions to use to build the ROC curve. Fortunately for binary classification they are identical (just reversed):
auc1 <-roc(test$goodkit, prediction[,1])
print(auc1)
auc2 <-roc(test$goodkit, prediction[,2])
print(auc2)

unsupervised random forest classification of raster stack in R

I want to compute an unsupervised random forest classification out of a raster stack in R. The raster stack represents the same extent in different spectral bands, and as a result I want to obtain an unsupervised classification of the stack.
I am having problems with my code as my data is very large. Is it okay to just convert the stack into a data frame in order to run the random forest algorithm, like this:
stack_median <- stack(b1_mosaic_median, b2_mosaic_median, b3_mosaic_median, b4_mosaic_median, b5_mosaic_median, b7_mosaic_median)
stack_median_df <- as.data.frame(stack_median)
Here is the data as a csv file (https://www.dropbox.com/s/gkaryusnet46f0i/stack_median_df.csv?dl=0) - and you can read it in via:
stack_median_df<-read.csv(file="stack_median_df.csv")
stack_median_df<-stack_median_df[,-1]
stack_median_df_na <- na.omit(stack_median_df)
My next step would be the unsupervised classification:
median_rf <- randomForest(stack_median_df_na, importance=TRUE, proximity=FALSE, ntree=500) # no response variable is supplied, so randomForest runs in unsupervised mode
Due to my huge dataset a proximity measure can't be calculated (it would need around 6000 GB). Do you know how I can have a look at the classification? predict(median_rf) and plot(median_rf) don't return anything useful.
I am happy for any suggestion, improvement, or code snippet for an unsupervised random forest classification with its accuracy measures, ...
Thanks a lot!
I think you could use a large sample for the unsupervised classification, then create a supervised classification model that predicts those classes from the raw data (it should have a very good fit), and apply that model to the entire data set, as sketched below.
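A minimal sketch of that two-step idea, assuming stack_median_df_na from the question; the sample size of 5000 and k = 5 classes are placeholders, not values from the question:
library(randomForest)
library(cluster)
set.seed(2015)
# Step 1: unsupervised forest on a manageable sample, so the proximity matrix stays small
samp <- stack_median_df_na[sample(nrow(stack_median_df_na), 5000), ]
urf <- randomForest(samp, ntree = 500, proximity = TRUE)
# Cluster the sample using the forest proximities as similarities
clusters <- pam(1 - urf$proximity, k = 5, diss = TRUE)
samp$class <- factor(clusters$clustering)
# Step 2: supervised forest that reproduces the cluster labels from the band values
srf <- randomForest(class ~ ., data = samp, ntree = 500)
# Apply to every pixel (row) of the full data frame
all_classes <- predict(srf, stack_median_df_na)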

Multivariate time series model using MARSS package (or maybe dlm)

I have two temporal processes. I would like to see if one temporal process (X_{t,2}) can be used to produce better forecasts of the other process (X_{t,1}). I have multiple sources providing temporal data on X_{t,2} (e.g. 3 time series measuring X_{t,2}). All time series require a seasonal component.
I found MARSS's notation pretty natural for fitting this type of model, and the code looks like this:
Z=factor(c("R","S","S","S")) # observation matrix
B=matrix(list(1,0,"beta",1),2,2) #evolution matrix
A="zero" #demeaned
R=matrix(list(0),4,4); diag(R)=c("r","s","s","s")
Q="diagonal and unequal"
U="zero"
period = 12
per.1st = 1 # Now create factors for seasons; TT below is the number of time points in the data
c.in = diag(period)
for(i in 2:(ceiling(TT/period))) {c.in = cbind(c.in,diag(period))}
c.in = c.in[,(1:TT)+(per.1st-1)]
rownames(c.in) = month.abb
C = "unconstrained" #2 x 12 matrix
dlmfit = MARSS(data, model=list(Z=Z,B=B,Q=Q,C=C, c=c.in,R=R,A=A,U=U))
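Unpacking the specification above (this is just a reading of B and Z, using R's column-major filling of matrix(list(1, 0, "beta", 1), 2, 2), and assuming the first data row is the X_{t,1} series), the state equations are roughly:
X_{t,1} = X_{t-1,1} + beta * X_{t-1,2} + (seasonal effect) + w_{t,1}
X_{t,2} = X_{t-1,2} + (seasonal effect) + w_{t,2}
and the Z factor says that observed series 1 measures X_{t,1} (the "R" state) while series 2-4 all measure X_{t,2} (the "S" state), with observation variance r for series 1 and a shared variance s for series 2-4.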
I got a beta estimate implying that the second temporal process is useful in forecasting the first process, but to my dismay, MARSS gives me an error when I use MARSSsimulate to forecast, because one of the matrices (related to seasonality) is time-varying.
Does anyone know a way around this issue with the MARSS package? And if not, any tips on fitting an analogous model using, say, the dlm package?
I was able to represent my state-space model in a form adequate for the dlm package, but I encountered some problems using dlm too. First, the ML estimates are VERY unstable. I bypassed this issue by constructing the dlm model from the MARSS estimates. However, dlmFilter is not working properly; I think the issue is that dlmFilter is not designed to deal with models that have multiple sources for one time series plus additional seasonal components. Fortunately, dlmForecast gives me the forecasts that I need.
In summary, for my multivariate time series model (with multiple sources providing data for one of the temporal processes), the MARSS library gave me reasonable estimates of the parameters and allowed me to obtain filtered and smoothed values of the states, but forecast values were not possible. On the other hand, dlm gave fishy estimates for my model and dlmFilter didn't work, but I was able to use dlmForecast to forecast values with the model I fitted in MARSS and re-expressed in dlm's form.
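For anyone trying to reproduce that last step, here is a minimal univariate sketch of the dlm forecasting call (the real model was multivariate and its matrices were filled in from the MARSS estimates; dlmModPoly, dlmModSeas, and y below are placeholders, not the actual specification):
library(dlm)
# Placeholder structure: local level plus a monthly seasonal component
mod <- dlmModPoly(order = 1, dV = 1, dW = 0.1) + dlmModSeas(frequency = 12)
filt <- dlmFilter(y, mod)          # Kalman filter pass over the data y
fc <- dlmForecast(filt, nAhead = 12)
fc$f                               # point forecasts
fc$Q                               # forecast variances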
