I am working on my research and have been stuck for a long time trying to get the weights to converge with the nnet package. I am running a back-propagation algorithm on weather data to predict temperature. I previously used the neuralnet package and the weights converged very well; I was able to increase the number of iterations to improve accuracy. But neuralnet becomes extremely slow when I increase the input pattern size. Below is a brief explanation of the data.
I collect weather data from a station every hour and pad the 24 hourly readings of a day into a single row; this forms one input pattern. neuralnet runs beautifully and converges for one month's data, which is around 653 patterns. But when I try to train on one year of data, neuralnet seems to run for ages.
I therefore switched to nnet. nnet is very quick, but it does not converge the way neuralnet does.
Below are the code snippets I used.
formula.in=V385+V386+....+V392~V1+V2+...+V382+V383+V384
net <- neuralnet(formula = formula.in, data = data, hidden = 90, threshold = 0.0001, act.fct = "tanh", err.fct = "sse", algorithm = "rprop+")
As you can see, I can change the threshold to get the desired convergence.
My nnet snippet is below (this is the one that is quicker but does not let me change the convergence threshold):
target <- as.matrix(data[, 385:392])   # the 8 output columns
input  <- as.matrix(data[, 1:384])     # the 384 input columns
net <- nnet(input, target, size = 2, maxit = 15000)
Any help would be greatly appreciated.
Thanks a ton.
I am also using R for neural networks. As I am a beginner, please pardon me if my point sounds a bit silly. You said you are using a back-propagation algorithm to train the network, but as far as I know, nnet only implements a single-hidden-layer feed-forward neural network, unlike neuralnet.
In short, I think nnet is not using back-propagation. Kindly correct me if I am wrong.
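For what it's worth, if part of the problem is only the stopping rule, nnet() does expose abstol and reltol arguments that control when the optimisation stops, and linout = TRUE is needed for a continuous target such as temperature. A minimal sketch, assuming the same data layout as in the question (the tolerance values are just placeholders):

library(nnet)

# Minimal sketch, assuming columns 1:384 are inputs and 385:392 are outputs.
# linout = TRUE gives linear output units (regression on temperature);
# abstol / reltol tighten the stopping criterion, which is the closest
# analogue to neuralnet's threshold argument.
target <- as.matrix(data[, 385:392])
input  <- as.matrix(data[, 1:384])
net <- nnet(input, target,
            size   = 2,
            linout = TRUE,
            maxit  = 15000,
            abstol = 1e-6,    # stop when the fit criterion drops below this
            reltol = 1e-10)   # or when relative improvement gets this small

Note that nnet fits a single-hidden-layer network via the BFGS method of optim, so there is no direct equivalent of neuralnet's rprop+ algorithm.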
I am trying to build a mixed-model lasso using glmmLasso in RStudio and am looking for some assistance.
I have specified my model as follows:
glmmModel <- glmmLasso(outcome ~ year + married, rnd = list(ID = ~1), lambda = 100, family = gaussian(link = "identity"), data = data1, control = list(print.iter = TRUE))
where outcome is a continuous variable, year is the year the data was collected, and married is a binary indicator (1/0) of whether or not the subject is married. I would eventually like to include more covariates, but for now, just to get the model to run successfully, I am fitting it with these two covariates only. My data1 data frame has 48,000 observations and 57 variables.
When I run it, however, the model runs for many hours (48+) without stopping. The only feedback I get is "ITERATION 1", "ITERATION 2", etc. Is there something I am missing or doing wrong? Note that I am running on a machine with only 8 GB of RAM, but I don't think that should be the issue, right? My dataset (48,000 observations) isn't particularly large (at least I don't think so). Any advice or thoughts on how I can fix this would be appreciated. Thank you!
This is too long to be a comment, but I feel like you deserve an answer to this confusion.
It is not uncommon to experience "slow" performance; in many GLMM implementations it is more common than not. The fact is that generalized linear mixed-effect models are very hard to estimate. For purely Gaussian models (with no penalizer) a series of proofs gives us the REML estimator, which can be computed very efficiently, but for generalized models this is not the case. Note also that the random-effect model matrix can become absolutely massive: every random effect contributes a block-diagonal block, so even for modestly sized data you can end up with a model matrix with 2000+ columns that has to be pushed through PIRLS optimization (matrix inversions and so on).
Some packages (glmmTMB, lme4 and to some extent nlme) have very efficient implementations that exploit the block-diagonality of the random-effect matrix and use high-performance C/C++ libraries for sparse-matrix calculations, while the glmmLasso package (link to source) uses base R for all of its computations. No matter how you look at it, the fact that it neither exploits sparse computations nor implements its code in a compiled language makes it slow.
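To get a feel for how large (and how sparse) the random-effect design matrix becomes for your own data, you can build the model structure without fitting anything, for example with lme4's lFormula(). A small sketch, assuming the variable names from your call:

library(lme4)

# Build the mixed-model structure only (no fitting). The transposed
# random-effect design matrix Zt has one row per random-effect coefficient,
# here one per distinct ID.
parsed <- lFormula(outcome ~ year + married + (1 | ID), data = data1)
dim(parsed$reTrms$Zt)   # (number of IDs) x 48,000 observations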
As a side note, my thesis project had about 24,000 observations, with 3 random-effect variables (and some 20 fixed effects). Fitting that dataset could take anywhere between 15 minutes and 3 hours depending on the complexity, which was driven primarily by the random-effect structure.
So, the answer from here:
Yes, glmmLasso will be slow. It may take hours, days or even weeks depending on your dataset. I would suggest taking a stratified (and/or clustered) subsample across independent groups, fitting the model on the smaller dataset (3,000-4,000 observations, maybe?) to obtain starting points, and "hoping" that these are close to the real values. Be patient. If you think neural networks are complex, welcome to the world of generalized mixed-effect models.
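A rough sketch of the subsample-first idea, again assuming your variable names (the choice of 500 IDs is arbitrary; glmmLassoControl also has start and q_start arguments if you later want to warm-start the full fit, but they are omitted here to keep the sketch simple):

library(glmmLasso)

# Draw a subsample of subjects (IDs) and fit on that first to get a feel for
# run time and roughly sensible estimates before scaling up.
# glmmLasso expects the grouping variable ID to be a factor.
set.seed(1)
sub_ids  <- sample(unique(data1$ID), 500)
data_sub <- droplevels(data1[data1$ID %in% sub_ids, ])

fit_small <- glmmLasso(outcome ~ year + married, rnd = list(ID = ~1),
                       lambda = 100, family = gaussian(link = "identity"),
                       data = data_sub,
                       control = list(print.iter = TRUE))
summary(fit_small)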
I have been trying different methods of forecasting and stumbled upon the nnetar() function in the forecast package of R. I quickly realized that while it does produce forecasts, it gives me something different every time I run it. Could anybody explain why this happens? I thought I had a decent understanding of neural nets, and I don't see what could make such drastic differences in the forecasts, unless nnetar() randomly selects the number of nodes or something. Any help?
By default, 20 networks are trained with random starting values, and their predictions are averaged when you use the function.
Because the function uses random starting values for each run, the forecasts will be different for each call too.
EDIT: new question from OP in the comments
In order to control this and get the same random starting values each time, you can simply use set.seed() with a value of your choice.
For example:
set.seed(666)
forecast(nnetar(...),...)
set.seed(666)
forecast(nnetar(...),...)
set.seed(666)
forecast(nnetar(...),...)
will give the same results every time you run it with this "seed" value (666). You have to run set.seed(666) before every run of the rest of your code, of course.
EDIT 2: another question from the OP in the comments
To fit 100 different networks with random starting weights:
nnetar(...,repeats=100,...)
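Putting the two together, a small runnable illustration, using the built-in lynx series as stand-in data:

library(forecast)

set.seed(666)
fit1 <- nnetar(lynx, repeats = 100)   # average of 100 nets with random starts
fc1  <- forecast(fit1, h = 10)

set.seed(666)
fit2 <- nnetar(lynx, repeats = 100)   # same seed, so same starting weights
fc2  <- forecast(fit2, h = 10)

all.equal(fc1$mean, fc2$mean)         # TRUE: the forecasts are reproducible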
I would like to fit an LSTM model using MXNet in R to predict a continuous response (i.e., regression) given several continuous predictors. However, the mx.lstm() function seems to be geared toward NLP, as it requires arguments that don't seem applicable to a regression problem (such as those related to embedding).
Is MXNET capable of this sort of modeling and, if not, what is an example of an appropriate tool (preferably in R)? Are there any tutorials relevant to the problem I've described?
LSTM is used for working with temporal data: text, speech, time series. If you want to predict a continuous response, then I assume you want to do something similar to time series analysis.
If my assumption is correct, then please take a look here. It gives quite a good example of how to use MXNet with R for time series on the CPU. A GPU version is also available here.
Background
I am trying to fit a topic model with the following data and specification: documents = 140,000, words = 3,000, and topics = 15. I am using the package topicmodels in R (3.1.2) on a Windows 7 machine (24 GB RAM, 8 cores). My problem is that the computation just goes on and on without any "convergence" being produced.
I am using the default options of the LDA() function in topicmodels:
Run model
dtm2.sparse_TM <- LDA(dtm2.sparse, 15)
The model has been running for about 72 hours – and still is as I am writing.
Question
So, my questions are: (a) is this normal behaviour? (b) If not, do you have any suggestions on what to do? (c) If it is, how can I substantially improve the speed of the computation?
Additional information: The original data contains not 3,000 words but about 3.7 million. When I ran that (on the same machine) it did not converge, not even after a couple of weeks. So I ran it with 300 words and only 500 documents (randomly selected), and that worked fine. I used the same number of topics and default values as before for all models.
So for my current model (see my question) I removed sparse terms with the help of the tm package.
Remove sparse terms
dtm2.sparse <- removeSparseTerms(dtm2, 0.9)
Thanks for the input in advance
Adel
You need to use online variational Bayes, which can easily handle training on that number of documents. In online variational Bayes you train the model using mini-batches of your training samples, which increases the convergence speed dramatically (see the SGD link below).
For R, you can use this package. Here you can read more about it and how to use it. Also look at this paper, since that R package implements the method used in that paper. If possible, import their Python code uploaded here into R; I highly recommend the Python code, since I had a great experience with it on a project I recently worked on. Once the model is learned, you can save the topic distributions for future use, and feed them to onlineldavb.py along with your test samples to integrate over the topic distributions given those unseen documents. With online variational Bayesian methods I trained an LDA on a dataset of 500,000 documents and a 5,400-word vocabulary in less than 15 hours.
Sources
Variational Bayesian Methods
Stochastic Gradient Descent (SGD)
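Separately from the online variational Bayes route above, if you would rather stay inside topicmodels, a more modest workaround is to switch from the default VEM method to collapsed Gibbs sampling with an explicit iteration budget, so the running time is bounded and progress is visible. A hedged sketch (the iteration counts are placeholders; check ?LDAcontrol for the control slots in your installed version):

library(topicmodels)

# Gibbs sampling runs a fixed number of iterations rather than iterating
# to a variational tolerance, so you know in advance roughly how long it takes.
lda_gibbs <- LDA(dtm2.sparse, k = 15, method = "Gibbs",
                 control = list(seed = 123,
                                burnin = 1000,   # warm-up iterations, discarded
                                iter = 2000,     # sampling iterations, kept
                                verbose = 100))  # report progress every 100 iterations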
I am using the library e1071, in particular the svm function. My dataset has 270 fields and 800,000 rows. I've been running this program for 24+ hours now, and I have no idea whether it has hung or is still running properly. The command I issued was:
svmmodel <- svm(V260 ~ ., data=traindata);
I'm using Windows, and in the Task Manager the status of Rgui.exe is "Not Responding". Did R crash already? Are there any other tips/tricks to better gauge what's happening inside R or the SVM learning process?
If it helps, here are some additional things I noticed using resource monitor (in windows):
CPU usage is at 13% (stable)
Number of threads is at 3 (stable)
Memory usage is at 10,505.9 MB +/- 1 MB (fluctuates)
As I'm writing this, I also see the "similar questions" suggestions and am clicking through them. It seems that SVM training is quadratic or cubic in the number of examples. Still, after 24+ hours, if it's reasonable to wait I will wait, but if not I will have to eliminate SVM as a viable predictive model.
As mentioned in the answer to this question, "SVM training can be arbitrarily long" depending on the parameters selected.
If I remember correctly from my ML class, running time is roughly proportional to the square of the number of training examples, so going from, say, 10,000 rows to your 800,000 (80 times more data) would take on the order of 6,400 times longer. For 800k examples you probably do not want to wait.
Also, as an anecdote, I once ran e1071 in R for more than two days on a smaller data set than yours. It eventually completed, but the training took too long for my needs.
Keep in mind that most ML algorithms, including SVM, will usually not achieve the desired result out of the box. Therefore, when you are thinking about how fast you need it to run, keep in mind that you will have to pay the running time every time you tweak a tuning parameter.
Of course you can reduce this running time by sampling down to a smaller training set, with the understanding that you will be learning from less data.
By default the svm function from e1071 uses a radial basis kernel, which makes SVM induction computationally expensive. You might want to consider using a linear kernel (argument kernel="linear") or a specialized library like LiblineaR, which is built for large datasets. And since your dataset really is large, if a linear kernel does not do the trick then, as suggested by others, you can use a subset of your data to build the model.
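A rough sketch of both suggestions, assuming V260 is the class label and the remaining 269 columns are numeric features (the 50,000-row subsample size is just a placeholder):

library(e1071)
library(LiblineaR)

# 1. e1071 with a linear kernel, on a subsample first to gauge running time
set.seed(1)
sub <- traindata[sample(nrow(traindata), 50000), ]
svm_linear <- svm(V260 ~ ., data = sub, kernel = "linear")

# 2. LiblineaR works on a plain feature matrix and scales far better to
#    hundreds of thousands of rows (type = 1 is L2-regularized L2-loss SVC)
x <- scale(as.matrix(sub[, setdiff(names(sub), "V260")]))
y <- sub$V260
lin_model <- LiblineaR(data = x, target = y, type = 1, cost = 1)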