Best way to save a trained model in Flux.jl? - julia

I have a model which I have trained and I would like to save it for future use and distributing to others. What is the best way to save a trained model with Flux.jl?

If your model does not have things like dynamically created/sized layers, you should be able to save just the weights instead of serializing the whole model. This can be much more robust than using BSON.jl or the Serialization stdlib to serialize the whole model (both of which can be very fragile).
The weights can be obtained from a model by weights=collect(params(cpu(model))) and loaded back into the model by Flux.loadparams!(model, weights). Thus, one just needs to save a Vector of numeric arrays to disk, instead of more complicated Julia-side objects in the model. So I would suggest a pattern like:
function make_model(config)
    # ...define layers, put them in a chain, etc...
    return model
end
# train model
...
# collect weights
weights=collect(params(cpu(model)))
# save them to disk somehow...
Then when it's time to reload the model,
weights = # load them from disk
fresh_model = make_model(config)
Flux.loadparams!(fresh_model, weights)
Note that this approach means you can't e.g. add a layer to make_model and reload old weights; they will no longer be the right size. So you need to version your code and your weights and ensure they match up.
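To make the versioning point concrete, here is a minimal, language-agnostic sketch of the same pattern, written in Python rather than Julia purely for illustration. It is not Flux or LegolasFlux code: MODEL_VERSION, shapes, save_weights and load_weights are hypothetical names, the "model" is just a list of nested lists, and JSON stands in for whatever on-disk format you choose.

```python
import json
import os
import tempfile

MODEL_VERSION = "v1"  # hypothetical tag; bump it whenever make_model changes


def make_model(config):
    """Build a fresh, untrained model: one zero-filled weight matrix per layer."""
    sizes = config["layer_sizes"]
    return [[[0.0] * n_in for _ in range(n_out)]
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]


def shapes(weights):
    """Return (rows, columns) for every weight matrix."""
    return [(len(matrix), len(matrix[0])) for matrix in weights]


def save_weights(path, weights):
    """Write only the numeric arrays plus a version tag, not the model object."""
    with open(path, "w") as handle:
        json.dump({"version": MODEL_VERSION, "weights": weights}, handle)


def load_weights(path, config):
    """Rebuild the model from code, then accept saved weights only if they fit."""
    with open(path) as handle:
        saved = json.load(handle)
    if saved["version"] != MODEL_VERSION:
        raise ValueError("weights were saved by a different model version")
    fresh_model = make_model(config)
    if shapes(saved["weights"]) != shapes(fresh_model):
        raise ValueError("saved weights do not match the current architecture")
    return saved["weights"]


config = {"layer_sizes": [3, 2, 1]}
trained = make_model(config)
trained[0][0][0] = 0.5  # stand-in for training

path = os.path.join(tempfile.mkdtemp(), "weights.json")
save_weights(path, trained)
restored = load_weights(path, config)

# Adding a layer to the architecture makes the old file unusable.
try:
    load_weights(path, {"layer_sizes": [3, 2, 2, 1]})
    mismatch_detected = False
except ValueError:
    mismatch_detected = True
```

Failing loudly on a version or shape mismatch is the design choice here: it turns a silent mix-up between old weights and new code into an immediate error.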
Last week I helped make a new package LegolasFlux.jl to make this pattern easier (in particular, providing a way to use Arrow to save the weights to disk along with any other configuration parameters, losses, etc, you would like to save). It should be registered in two days.

According to the Flux.jl docs (https://fluxml.ai/Flux.jl/stable/saving/) the best way to save a trained model is using BSON.jl by doing the following:
julia> using Flux
julia> model = Chain(Dense(10,5,relu),Dense(5,2),softmax)
Chain(Dense(10, 5, NNlib.relu), Dense(5, 2), NNlib.softmax)
julia> using BSON: @save
julia> @save "mymodel.bson" model
and then you can load the saved model by doing:
julia> using Flux
julia> using BSON: @load
julia> @load "mymodel.bson" model
julia> model
Chain(Dense(10, 5, NNlib.relu), Dense(5, 2), NNlib.softmax)

Related

Why is data pre-processed with caret encoded as non-tabular objects

I am learning how to use the R caret package, and I am wondering why so many of its functions return their output as objects that are not directly usable for training or regression.
For example, for preprocessing, the dummyVars function returns an object of class "dummyVars". Similarly, the preProcess function returns an object of class "preProcess". These cannot be passed to caret::train as data, so one first has to apply them with stats::predict, like:
caret::dummyVars(Y ~ ., data = mydata) %>%
stats::predict(newdata = mydata)
Is there a reason for that? Why? What are the benefits?
After becoming more familiar with the package, I think I can provide an answer to this question and explain the advantages of such an approach.
The caret package is intended to be used in a context in which one would not typically use just one predictive model to fit the data, but many.
Thus, objects such as preProcess are useful because they simply hold rules for processing data that can later be applied for as many models as required. This saves coding and avoids errors (e.g. copy-paste mistakes), because the same preprocessing rules can be reused for all subsequent models; in train, preprocessing is requested through the preProcess argument, with its options set via trainControl.
One must also note that these objects (e.g. dummyVars) do not save entire "pre-processed" datasets, just the rules to be applied during training or pre-processing. This also helps save memory in a context where one might tend to accumulate many temporary variables and data frames.
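The "rules, not data" idea can be sketched outside of caret as well. The following is a rough illustration in Python, not caret's implementation: fit_scaler and apply_scaler are hypothetical names, and centering and scaling stand in for whatever preprocessing is wanted. The fitted object holds only one mean and one standard deviation per column, and it can be applied to any later dataset.

```python
from statistics import mean, stdev


def fit_scaler(rows):
    """Learn centering/scaling rules from training rows; no data is stored."""
    columns = list(zip(*rows))
    return {
        "center": [mean(column) for column in columns],
        "scale": [stdev(column) for column in columns],
    }


def apply_scaler(rules, rows):
    """Apply previously learned rules to any dataset with the same columns."""
    return [
        [(value - center) / scale
         for value, center, scale in zip(row, rules["center"], rules["scale"])]
        for row in rows
    ]


train = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
new = [[2.0, 40.0]]

rules = fit_scaler(train)              # plays the role of the preProcess object
scaled_train = apply_scaler(rules, train)
scaled_new = apply_scaler(rules, new)  # new data reuses the training rules
```

Because new data is transformed with the training means and standard deviations, every model sees inputs on the same scale, which is the point of keeping the rules as a separate object.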

Used saveRDS to save a model but not enough memory to readRDS?

I created a model based on a very large dataset and had the program save the results using
saveRDS(featVarLogReg.mod, file="featVarLogReg.mod.RDS")
Now I'm trying to load the model to evaluate, but readRDS runs out of memory.
featVarLR.mod <- readRDS(file = "featVarLogReg.mod.RDS")
Is there a way to load the file that takes less memory? Or at least the same amount of memory that was used to save it?
The RDS file ended up being 1.5GB in size for this logistic regression using caret. My other models using the same dataset and very similar caret models were 50MB in size so I can load them.
The caret linear model saves the training data in the model object. You could try to use returnData = FALSE in the trainControl argument to train. I don't recall if this fixed my issue in the past.
https://www.rdocumentation.org/packages/caret/versions/6.0-77/topics/trainControl
You could also try to just export the coefficients into a dataframe and use a manual formula to score new data.
Use coef(model_object)
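As a rough sketch of what scoring with a manual formula means for a logistic regression, here is the calculation written out in Python. The coefficient names and values are made up for illustration; in practice they would be whatever coef(model_object) returns, and score is a hypothetical helper.

```python
import math

# Hypothetical coefficients, standing in for the exported output of coef(model_object).
coefficients = {"(Intercept)": -1.0, "age": 0.02, "income": 0.5}


def score(coefficients, row):
    """Return the predicted probability for one observation."""
    linear = coefficients["(Intercept)"]
    for name, value in row.items():
        linear += coefficients[name] * value
    # The inverse logit turns the linear predictor into a probability.
    return 1.0 / (1.0 + math.exp(-linear))


probability = score(coefficients, {"age": 50.0, "income": 0.0})
```

Only the small table of coefficients needs to be kept in memory or on disk, not the 1.5GB model object.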

t-SNE predictions in R

Goal: I aim to use t-SNE (t-distributed Stochastic Neighbor Embedding) in R for dimensionality reduction of my training data (with N observations and K variables, where K>>N) and subsequently aim to come up with the t-SNE representation for my test data.
Example: Suppose I aim to reduce the K variables to D=2 dimensions (often, D=2 or D=3 for t-SNE). Two R packages are available, Rtsne and tsne; I use the former here.
# load packages
library(Rtsne)
# Generate Training Data: random standard normal matrix with K=400 variables and N=100 observations
x.train <- matrix(rnorm(n=40000, mean=0, sd=1), nrow=100, ncol=400)
# Generate Test Data: random standard normal vector with N=1 observation for K=400 variables
x.test <- rnorm(n=400, mean=0, sd=1)
# perform t-SNE
set.seed(1)
fit.tsne <- Rtsne(X=x.train, dims=2)
where the command fit.tsne$Y will return the (100x2)-dimensional object containing the t-SNE representation of the data; can also be plotted via plot(fit.tsne$Y).
Problem: Now, what I am looking for is a function that returns a prediction pred of dimension (1x2) for my test data based on the trained t-SNE model. Something like,
# The function I am looking for (but doesn't exist yet):
pred <- predict(object=fit.tsne, newdata=x.test)
(How) Is this possible? Can you help me out with this?
From the author himself (https://lvdmaaten.github.io/tsne/):
Once I have a t-SNE map, how can I embed incoming test points in that
map?
t-SNE learns a non-parametric mapping, which means that it does not
learn an explicit function that maps data from the input space to the
map. Therefore, it is not possible to embed test points in an existing
map (although you could re-run t-SNE on the full dataset). A potential
approach to deal with this would be to train a multivariate regressor
to predict the map location from the input data. Alternatively, you
could also make such a regressor minimize the t-SNE loss directly,
which is what I did in this paper (https://lvdmaaten.github.io/publications/papers/AISTATS_2009.pdf).
So you can't directly apply new data points. However, you can fit a multivariate regression model between your data and the embedded dimensions. The author recognizes that it's a limitation of the method and suggests this way to get around it.
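A minimal sketch of that workaround, in Python and with made-up numbers: a k-nearest-neighbour regressor places a new point at the average map position of its closest training points. This is only one possible choice of multivariate regressor, picked here for brevity; it is not the method from the paper, and knn_embed, train_x and train_map are hypothetical names (train_map stands in for fit.tsne$Y).

```python
import math


def knn_embed(train_x, train_map, new_x, k=3):
    """Predict a map location as the mean map position of the k nearest training points."""
    nearest = sorted(range(len(train_x)),
                     key=lambda i: math.dist(train_x[i], new_x))[:k]
    n_dims = len(train_map[0])
    return [sum(train_map[i][d] for i in nearest) / len(nearest)
            for d in range(n_dims)]


# Toy stand-ins: four training points and invented 2-D map coordinates.
train_x = [[0.0, 0.0, 0.0], [0.0, 0.0, 1.0], [5.0, 5.0, 5.0], [5.0, 5.0, 6.0]]
train_map = [[-1.0, 0.0], [-1.0, 2.0], [4.0, 0.0], [4.0, 2.0]]

pred = knn_embed(train_x, train_map, [5.0, 5.0, 5.4], k=2)
```

The prediction is an approximation learned on top of the map, so it inherits whatever distortions the original embedding has.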
t-SNE does not really work this way:
The following is an excerpt from the t-SNE author's website (https://lvdmaaten.github.io/tsne/):
Once I have a t-SNE map, how can I embed incoming test points in that
map?
t-SNE learns a non-parametric mapping, which means that it does not
learn an explicit function that maps data from the input space to the
map. Therefore, it is not possible to embed test points in an existing
map (although you could re-run t-SNE on the full dataset). A potential
approach to deal with this would be to train a multivariate regressor
to predict the map location from the input data. Alternatively, you
could also make such a regressor minimize the t-SNE loss directly,
which is what I did in this paper.
You may be interested in his paper: https://lvdmaaten.github.io/publications/papers/AISTATS_2009.pdf
This website in addition to being really cool offers a wealth of info about t-SNE: http://distill.pub/2016/misread-tsne/
On Kaggle I have also seen people do things like this, which may also be of interest:
https://www.kaggle.com/cherzy/d/dalpozz/creditcardfraud/visualization-on-a-2d-map-with-t-sne
This is the mail answer from the author (Jesse Krijthe) of the Rtsne package:
Thank you for the very specific question. I had an earlier request for
this and it is noted as an open issue on GitHub
(https://github.com/jkrijthe/Rtsne/issues/6). The main reason I am
hesitant to implement something like this is that, in a sense, there
is no 'natural' way to explain what a prediction means in terms of tsne.
To me, tsne is a way to visualize a distance matrix. As such, a new
sample would lead to a new distance matrix and hence a new
visualization. So, my current thinking is that the only sensible way
would be to rerun the tsne procedure on the train and test set
combined.
Having said that, other people do think it makes sense to define
predictions, for instance by keeping the train objects fixed in the
map and finding good locations for the test objects (as was suggested
in the issue). An approach I would personally prefer over this would
be something like parametric tsne, which Laurens van der Maaten (the
author of the tsne paper) explored in a paper. However, this would best
be implemented using something else than my package, because the
parametric model is likely most effective if it is selected by the
user.
So my suggestion would be to 1) refit the mapping using all data or 2)
see if you can find an implementation of parametric tsne, the only one
I know of would be Laurens's Matlab implementation.
Sorry I can not be of more help. If you come up with any other/better
solutions, please let me know.
t-SNE fundamentally does not do what you want. t-SNE is designed only for visualizing a dataset in a low-dimensional (2 or 3 dimensions) space. You give it all the data you want to visualize at once. It is not a general-purpose dimensionality reduction tool.
If you are trying to apply t-SNE to "new" data, you are probably not thinking about your problem correctly, or perhaps simply did not understand the purpose of t-SNE.

Simulate data in JAGS/r2jags

Is it possible to misuse JAGS as a tool for generating data from a model with known parameters? I need to sample data points from a predefined model in order to do a simulation study and test the power of a model I have developed in R.
Unfortunately, the model is somewhat tricky (hierarchical structure with AR and VAR components) and I was not able to simulate the data directly in R.
While searching the internet, I found a blog post where the data was generated in JAGS using the data{} block. In the post, the author then estimated the model directly in JAGS. Since I have my model in R, I would like to transfer the data back to R without a model{} block. Is this possible?
Best,
win
There is no particular reason that you need to use the data block for generating data in this way - the model block can just as easily work in 'reverse' to generate data based on fixed parameters. Just specify the parameters as 'data' to JAGS, and monitor the simulated data points (and run for as many iterations as you need datasets - which might only be 1!).
Having said that, in principle you can simulate data using either the data or model blocks (or a combination of both), but you need to have a model block (even if it is a simple and unrelated model) for JAGS to run. For example, the following uses the data block to simulate some data:
txtstring <- '
data{
    for(i in 1:N){
        Simulated[i] ~ dpois(i)
    }
}
model{
    fake <- 0
}
#monitor# Simulated
#data# N
'
library('runjags')
N <- 10
Simulated <- coda::as.mcmc(run.jags(txtstring, sample=1, n.chains=1, summarise=FALSE))
Simulated
The only real difference is that the data block is updated only once (at the start of the simulation), whereas the model block is updated at each iteration. In this case we only take 1 sample so it doesn't matter, but if you wanted to generate multiple realisations of your simulated data within the same JAGS run you would have to put the code in the model block. [There might also be other differences between data and model blocks but I can't think of any offhand].
Note that you will get the data back out of JAGS in a different format (a single vector with names giving the indices of any arrays within the monitored data), so some legwork might be required to get that back to a list of vectors / arrays / whatever in R. Edit: unless R2jags provides some utility for this - I'm not sure as I don't use that package.
Using a model block to run a single MCMC chain that simulates multiple datasets would be problematic because MCMC samples are typically correlated (each subsequent sample is drawn using the previous sample). For a simulation study, you would want to generate independent samples from your distribution. The way to go would be to run the data or model block repeatedly, e.g. in a for loop, with one separate run per dataset, which would ensure that your samples are independent.
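The loop suggested above can be sketched as follows. This is plain Python rather than JAGS, used only to show the structure: each dataset comes from its own separately seeded run, mirroring Simulated[i] ~ dpois(i) from the earlier example. rpois and simulate_dataset are hypothetical helpers, and the Poisson sampler is Knuth's simple multiplication method, which is fine for small rates like these.

```python
import math
import random


def rpois(rng, rate):
    """Draw one Poisson variate with Knuth's multiplication method."""
    limit = math.exp(-rate)
    count, product = 0, 1.0
    while True:
        product *= rng.random()
        if product <= limit:
            return count
        count += 1


def simulate_dataset(n, seed):
    """One independent run: Simulated[i] ~ Poisson(i) for i = 1..n."""
    rng = random.Random(seed)  # separate generator per dataset
    return [rpois(rng, i) for i in range(1, n + 1)]


# One run per dataset, each with its own seed, instead of one long chain.
datasets = [simulate_dataset(10, seed) for seed in range(5)]
```

In R, the equivalent would be calling the simulation once per loop iteration and collecting the results in a list.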

100-fold-cross-validation for Ridge Regression in R

I have a huge dataset, and I am quite new to R, so the only way I can think of implementing 100-fold CV myself is with many for loops and if statements, which is extremely inefficient for my huge dataset and might even take several hours to run. I started looking for packages that do this instead and found quite a few topics related to CV on Stack Overflow. I have been trying to use the ones I found, but none of them work for me, and I would like to know what I am doing wrong.
For instance, this code from DAAG package:
cv.lm(data=Training_Points, form.lm=formula(t(alpha_cofficient_values)
%*% Training_Points), m=100, plotit=TRUE)
..gives me the following error:
Error in formula.default(t(alpha_cofficient_values)
%*% Training_Points) : invalid formula
I am trying to do Kernel Ridge Regression, therefore I have alpha coefficient values already computed. So for getting predictions, I only need to do either t(alpha_cofficient_values)%*% Test_Points or simply crossprod(alpha_cofficient_values,Test_Points) and this will give me all the predictions for unknown values. So I am assuming that in order to test my model, I should do the same thing but for KNOWN values, therefore I need to use my Training_Points dataset.
My Training_Points data set has 9000 columns and 9000 rows. I can write for's and if's and do 100-fold-CV each time take 100 rows as test_data and leave 8900 rows for training and do this until the whole data set is done, and then take averages and then compare with my known values. But isn't there a package to do the same? (and ideally also compare the predicted values with known values and plot them, if possible)
Please do excuse me for my elementary question, I am very new to both R and cross-validation, so I might be missing some basic points.
The CVST package implements fast cross-validation via sequential testing. This method significantly speeds up the computations while preserving full cross-validation capability. Additionally, the package developers also added default cross-validation functionality.
I haven't used the package before but it seems pretty flexible and straightforward to use. Additionally, KRR is readily available as a CVST.learner object through the constructKRRLearner() function.
To use the crossval functionality, you must first convert your data to a CVST.data object by using the constructData(x, y) function, with x the feature data and y the labels. Next, you can use one of the cross validation functions to optimize over a defined parameter space. You can tweak the settings of both the cv or fastcv methods to your liking.
After the cross validation spits out the optimal parameters you can create the model by using the learn function and subsequently predict new labels.
I puzzled together an example from the package documentation on CRAN.
library(CVST)
# Construct CVST.data using constructData(x, y)
# constructData(x, y)
# Load some data..
ns = noisySinc(1000)
# Kernel ridge regression
krr = constructKRRLearner()
# Create parameter Space
params=constructParams(kernel="rbfdot", sigma=10^(-3:3),
lambda=c(0.05, 0.1, 0.2, 0.3)/getN(ns))
# Run Crossval
opt = fastCV(ns, krr, params, constructCVSTModel())
# OR.. much slower!
opt = CV(ns, krr, params, fold=100)
# p = list(kernel=opt[[1]]$kernel, sigma=opt[[1]]$sigma, lambda=opt[[1]]$lambda)
p = opt[[1]]
# Create model
m = krr$learn(ns, p)
# Predict with model
nsTest = noisySinc(10000)
pred = krr$predict(m, nsTest)
# Evaluate..
sum((pred - nsTest$y)^2) / getN(nsTest)
If further speedup is required, you can run the cross-validations in parallel. View this post for an example of the doParallel package.
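For comparison, the hand-written fold loop described in the question does not need many nested conditionals. Here is a minimal sketch in Python (not R, and not how CVST works internally): k_fold_indices and cross_validate are hypothetical helpers, and the toy fit and predict functions stand in for kernel ridge regression.

```python
import random


def k_fold_indices(n_rows, n_folds, seed=0):
    """Shuffle the row indices and deal them into n_folds disjoint test sets."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    return [indices[fold::n_folds] for fold in range(n_folds)]


def cross_validate(x, y, n_folds, fit, predict):
    """Return the mean squared error over all held-out predictions."""
    squared_errors = []
    for test_idx in k_fold_indices(len(x), n_folds):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(x)) if i not in held_out]
        model = fit([x[i] for i in train_idx], [y[i] for i in train_idx])
        for i in test_idx:
            squared_errors.append((predict(model, x[i]) - y[i]) ** 2)
    return sum(squared_errors) / len(squared_errors)


# Toy stand-in for the real learner: a least-squares slope through the origin.
def fit_slope(xs, ys):
    return sum(a * b for a, b in zip(xs, ys)) / sum(a * a for a in xs)


def predict_slope(slope, value):
    return slope * value


x = [float(i) for i in range(1, 201)]
y = [2.0 * value for value in x]

folds = k_fold_indices(len(x), 100)
cv_mse = cross_validate(x, y, 100, fit_slope, predict_slope)
```

Each row is held out exactly once, and the model is refitted on the remaining rows for every fold, which is why 100 folds on a large dataset is slow without something like fastCV or parallel execution.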

Resources