Running separate epochs of the fit_one_cycle function in fastai

I am trying to run different epochs of the fit_one_cycle function separately: saving the model, loading it, and starting a new epoch:
learn = language_model_learner(data, AWD_LSTM, drop_mult=0.5, pretrained=False).to_fp16()
learn.load('/content/gdrive/My Drive/Language Model/language_model')
learn.load_encoder('/content/gdrive/My Drive/Language Model/model_encoder');
lr = 1e-3
lr *= bs/48 # scale learning rate by batch size (bs is assumed to be defined earlier, when building data)
learn.unfreeze()
learn.fit_one_cycle(1, lr, moms=(0.8,0.7))
learn.save('/content/gdrive/My Drive/Language Model/language_model')
learn.save_encoder('/content/gdrive/My Drive/Language Model/model_encoder')
Question: how should I change the learning rate after each epoch?

You can look into discriminative layer training, which uses different learning rates for different layers of the model.
Create layer groups of the model using:
# creates 3 layer groups with start, middle and end groups
learn.split(lambda m: (m[0][6], m[1]))
# only randomly initialized head now trainable
learn.freeze()
Note: for the built-in architectures you usually don't need to split manually; fastai creates the layer groups for you.
Manually set the learning rate and weight decay for each layer group:
# all layers now trainable
learn.unfreeze()
# optionally, separate LR and WD for each group for 5 epochs
learn.fit_one_cycle(5, max_lr=slice(1e-5,1e-3), wd=(1e-4,1e-4,1e-1))

Related

How to remove two data points from a data set that have a large influence on the regression model

I have found two outlier data points in my data set, but I don't know how to remove them. All of the guides I have found online emphasize plotting the data, but my question does not require plotting; it only involves fitting regression models. I am having great difficulty finding out how to remove the two data points from my data set and then fit a new model to the remaining data.
Here is the code that I have written and the outliers that I found:
library(alr4)
library(MASS)
data(lathe1)
head(lathe1)
y=lathe1$Life
x1=lathe1$Speed
x2=lathe1$Feed
x1_square=(x1)^2
x2_square=(x2)^2
#part A (Box-Cox method show log transformation)
# note: (x1)^2 in a formula does not square x1, so use the precomputed squares (or I(x1^2))
y.regression=lm(y~x1+x2+x1_square+x2_square+(x1*x2))
mod=boxcox(y.regression, data=lathe1, lambda = seq(-1, 1, length=10))
best.lam=mod$x[which(mod$y==max(mod$y))]
best.lam
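# A lambda near 0 corresponds to the log transform, which is why the models
# below use log(y). A finer grid makes this easier to see; a sketch, using
# boxcox's standard plotit argument:
mod.fine=boxcox(y.regression, lambda = seq(-1, 1, length=100), plotit=FALSE)
mod.fine$x[which.max(mod.fine$y)] # best lambda on the denser grid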
#part B (null-hypothesis F-test)
y.regression1_Reduced=lm(log(y)~1)
y.regression1=lm(log(y)~x1+x2+x1_square+x2_square+(x1*x2))
anova(y.regression1_Reduced, y.regression1)
#part D (F-test of log(Y) without beta1)
y.regression2=lm(log(y)~x2+x2_square)
anova(y.regression1_Reduced, y.regression2)
#part E (Cook's distance and refit)
cooks.distance(y.regression1)
Outliers:
9 10
0.7611370235 0.7088115474
I think you may be able (if execution time / corpus size allows it) to pass through your data with a loop and copy or remove elements according to your criteria to obtain the desired result, e.g.:
corpus_list_without_outliers = []
for elem in corpus_list:
    if elem.speed <= 10000:  # elem.<any_param_name> <= arbitrary_outlier_value
        # keep it because it is OK :)
        corpus_list_without_outliers.append(elem)
print(corpus_list_without_outliers)
# regression algorithm after
This is how I'd see the situation, but you can replace the if above with a remove statement to avoid creating a second list, etc., e.g.:
for elem in list(corpus_list):  # iterate over a copy so removing is safe
    if elem.speed > 10000:  # elem.<any_param_name>
        # remove from the current corpus because it is an outlier :(
        corpus_list.remove(elem)
print(corpus_list)
# regression algorithm after
Hope it helped you!

Setting up a statnet model in R

I would like to simulate exponential family random graphs, and I just started learning to use the statnet and ergm R packages. From the tutorial I found online, I am able to learn an ERGM model from an example dataset:
# install.packages('statnet')
# install.packages('ergm')
# install.packages('coda')
library(statnet)
set.seed(123)
data(package='ergm') # tells us the datasets in our packages
data(florentine) # loads flomarriage and flobusiness data
# Triad model
flomodel <- ergm(flomarriage ~ edges + triangle)
summary(flomodel)
Currently, I would like to use the simulate command to simulate networks with a pre-specified number of nodes from a pre-specified formula (that is not learned from any particular dataset), for example, P(y) = 1/Z exp(a * num_edges + b * num_triangles), where a and b are user-specified coefficients.
How should I go about writing such a model in statnet?
You can simulate from a given formula with simulate (or simulate.formula):
simulate(flomarriage ~ edges + triangles, coef = c(3,1))
To fix a simulation to have the same number of edges as a given graph (flomarriage in this case):
simulate(flomarriage ~ edges + triangles, coef = c(3,1), constraints = ~edges)
Not every constraint you might want to apply is available, since each requires a specific MCMC sampler, but for a list of what is available see ?ergm.constraints.
To fix the simulation to have an arbitrary number of nodes and edges (not based on observed data), a workaround is to create such a network first. For example, to simulate over networks with 17 nodes and 16 edges:
test.mat = matrix(0, 17, 17)
test.mat[1,] = 1 #adds 16 edges
test.net = as.network(test.mat, directed = F)
test.sim = simulate(test.net ~ triangles, coef = 1, constraints = ~edges)
summary.statistics(test.sim ~ edges + triangles)
P.S. I don't recommend using the triangles term in ERGM models; it is prone to model degeneracy. The geometrically weighted terms (gwesp, gwdsp) are more stable substitutes.
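For instance, a sketch of the same simulation with gwesp in place of triangles (the decay value 0.5 is an arbitrary illustration, not a recommendation):
# geometrically weighted edgewise shared partners instead of raw triangles
test.sim2 = simulate(test.net ~ gwesp(0.5, fixed = TRUE), coef = 1,
                     constraints = ~edges)
summary.statistics(test.sim2 ~ edges + triangles)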

How to create a sliding window in R to divide data into test and train samples to test accuracy of forecasts?

We are using the forecast package in R on 3 weeks' worth of hourly data (3*7*24 data points) to make predictions for the next 24 hours. It's a time series with multiple seasonality.
We have the forecast model running just fine and it seems to be doing well. Now we wish to quantify the accuracy of our forecasting approach on our data, using the accuracy function in the forecast package. We understand that it works as follows: if f is the forecast and x is the vector of actual observations, then accuracy(f, x) gives us several accuracy measures for that forecast.
We have data from the past several months, and we wish to write a sliding-window algorithm that picks 3*7*24 hourly values, predicts the next 24 hours, compares those predictions against the actual data for the next day / 24 hours, reports the accuracy, then slides the window forward by 24 points / one day and repeats.
The sample data is generated as follows:
library("forecast")
time <- 1:(12*168)
set.seed(1)
ds <- msts(sin(2*pi*time/24)+c(1,1,1.2,0.8,1,0,0)[((time-1)%/%24)%%7+1]+ time/400+rnorm(length(time),0,0.2),seasonal.periods=c(24,168))
plot(ds)
head(ds)
tail(ds)
length(ds)
length(time)
Forecasting procedure is as follows:
model <- tbats(ds[1:504])
fcst <- forecast(model,h=24,level=90)
accuracy(fcst,ds[505:528]) ##Test accuracy of forecast against next/actual 24 values
Now, we wish to slide the "window" by 24 and repeat the same procedure, that is, the next set of values used to build the model will be ds[25:528] and their accuracy will be tested against ds[529:552] ... and so on. How can we implement this?
Also, is there a better way to test overall accuracy of this forecasting algorithm for our scenario?
I would do this by creating a vector of times representing the front edge of the sliding windows, then using lapply to iterate the forecasting and scoring process over the windows those edges imply. Like...
# set a couple of parameters we'll use to slice the series into chunks:
# window width (w) and the time step at which you want to end the first
# training set
w = 24 ; start = 504
# now use those parameters to make a vector of the time steps at which each
# window will end
steps <- seq(start + w, length(ds), by = w)
# using lapply, iterate the forecasting-and-scoring process over the
# windows that those steps define
cv_list <- lapply(steps, function(x) {
  train <- ds[1:(x - w)]
  test <- ds[(x - w + 1):x]
  model <- tbats(train)
  fcst <- forecast(model, h = w, level = 90)
  accuracy(fcst, test)
})
Example output for the first window:
> cv_list[[1]]
ME RMSE MAE MPE MAPE MASE
Training set 0.0001587681 0.3442898 0.2689754 34.3957362 84.30841 0.9560206
Test set 0.2619029897 0.8961109 0.7868256 -0.6832273 36.64301 2.7966186
ACF1
Training set 0.02588145
Test set NA
If you want summaries of the scores for the whole list, you can do something like...
rmse <- mean(unlist(lapply(cv_list, '[[', "Test set","RMSE")))
...which produces this:
[1] 1.011177
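The same pattern extends to all of the test-set measures at once. A small sketch: stack the "Test set" row of each accuracy matrix and average column-wise:
# one row of test-set scores per window, then the mean of each measure
cv_scores <- do.call(rbind, lapply(cv_list, function(a) a["Test set", , drop = FALSE]))
colMeans(cv_scores, na.rm = TRUE) # na.rm because the test-set ACF1 is NA here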

Restricted Boltzmann Machine

I am currently trying to work on RBMs in R using the deepnet package. I trained an RBM on my own dataset with 3 input features. After training the network I got 2 sets of weights and 2 sets of biases.
My code runs like this:
library(deepnet)
a <- matrix(c(1,0,0, 0,1,0, 0,0,1, 1,1,1), nrow = 4, ncol = 3, byrow = TRUE)
RBM_trn <- rbm.train(a, 2, numepochs = 30, batchsize = 100, learningrate = 0.8,
                     momentum = 0.5, visible_type = "bin", hidden_type = "bin", cd = 1)
RBM_trn
The results came in sets of two: I got two 2x3 weight matrices. What does the other matrix mean?
Check this: https://github.com/cran/deepnet/blob/master/R/rbm_train.R
There, W and B correspond to the weights and biases learnt at each iteration using stochastic (or mini-batch) gradient descent to optimize the cost function, while VW and VB incorporate the momentum term as well (helping to damp noisy weight updates).
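So the second 2x3 matrix is the momentum buffer, not a second set of learnt weights. A minimal sketch for inspecting the two (the field names W and VW are assumed from the source file linked above):
str(RBM_trn$W)  # 2 x 3: learnt weights (hidden x visible)
str(RBM_trn$VW) # 2 x 3: momentum buffer, roughly VW <- momentum * VW + learningrate * dW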

How to draw the regression tree correctly when clustering using R

I got stuck when trying to build a model.
I want to classify the dataset freeny into 10 subsets by year.
library(RSNNS) # for decodeClassLabels and splitForTrainingAndTest
library(tree)
data(freeny)
options(digits = 2)
year <- as.integer(rownames(freeny))
freeny <- cbind(freeny, year)
freeny <- freeny[sample(nrow(freeny)), 1:ncol(freeny)] # shuffle the rows
freenyValues <- freeny[, 1:5]
freenyTargets <- decodeClassLabels(freeny[, 6])
freeny <- splitForTrainingAndTest(freenyValues, freenyTargets, ratio = 0.15)
km <- kmeans(freeny$inputsTrain, 10, iter.max = 100, nstart = 5)
kclust <- as.factor(km$cluster)
# note: cbind(matrix, factor) would coerce the factor to numeric,
# so build the data frame directly to keep kclust a factor
mdp <- data.frame(freeny$inputsTrain, kclust)
mdp.tr <- tree(kclust ~ ., mdp)
But the resulting tree has only 5 terminal nodes. It should have 10 terminal nodes, because I divided the data into 10 clusters with kmeans. What's wrong?
No, it shouldn't. tree is an algorithm that tries to fit a tree given predictors and a response, and stops if
    the terminal nodes are too small or too few to be split
(manual page). Try adjusting the minsize parameter (see ?tree.control):
    minsize: The smallest allowed node size: a weighted quantity. The default is 10.
I think the following will do what is intended:
mdp.tr <- tree(kclust ~ ., mdp, minsize = 1)
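To check the result, count the leaves of the fitted tree; in a tree object the leaves are the rows of $frame whose var is "<leaf>":
sum(mdp.tr$frame$var == "<leaf>") # number of terminal nodes, ideally one per cluster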
