Saving and loading a model in R

When working with caret, how can I save a model after training, and load it later (e.g. in a different session) for prediction?

A better solution nowadays is to use saveRDS to save and readRDS to read:
saveRDS(model, "model.rds")
my_model <- readRDS("model.rds")
This lets you choose a new name for the object when you read it back (you don't need to remember the name you used when you saved it).
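A minimal round-trip sketch of the saveRDS()/readRDS() approach. As an assumption to keep it runnable without extra packages, a plain lm() fit stands in for the caret model (caret::train() returns an object of class "train", which is saved the same way). The file lives in tempdir() only for the demo. With a real caret model, it is safest to attach caret (and the package behind the chosen method) before calling predict() on the restored object.

```r
# Stand-in for a trained caret model (assumption: lm() used so this runs without caret)
model <- lm(mpg ~ wt + hp, data = mtcars)

# Save the single object to an .rds file
path <- file.path(tempdir(), "model.rds")
saveRDS(model, path)

# Later, possibly in a fresh session: read it back under any name you like
my_model <- readRDS(path)

# Predict on new data with the restored model
new_cars <- data.frame(wt = c(2.5, 3.0), hp = c(110, 150))
pred <- predict(my_model, newdata = new_cars)
print(pred)
```

Because readRDS() returns the object instead of assigning it, nothing in your workspace is silently overwritten.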

The correct syntax would be to use:
save(model, file="model.Rdata")
Thereafter, it can be loaded with load("model.Rdata"), which restores the object under its original name, model.

The following code assumes that your model's variable name is 'model':
save(model, file = "model.RData")
This will save your model as "model.RData" in the current working directory. You can find out what the working directory is by issuing the following:
getwd()
To load it back in, make sure the file is in your working directory (or pass its full path) and run:
load("model.RData")
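A short sketch of the save()/load() pair. Again an lm() fit stands in for the trained model (an assumption for runnability), and tempdir() is used instead of the working directory. It shows that the file argument must be named, that load() returns the names of the objects it restored, and that loading into a separate environment avoids overwriting objects of the same name in your workspace.

```r
# Stand-in for a trained model (assumption: lm() instead of a caret fit)
model <- lm(mpg ~ wt, data = mtcars)
path <- file.path(tempdir(), "model.RData")

# 'file =' must be named; otherwise the string is treated as another object name
save(model, file = path)
rm(model)

# load() restores objects under their saved names and returns those names invisibly
restored_names <- load(path)
print(restored_names)

# Loading into a fresh environment keeps the workspace untouched
env <- new.env()
load(path, envir = env)
m2 <- env$model
```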

Related

How can I build a custom context-based question answering model (SQuAD) using DeepPavlov?

I have the following questions:
1. Dataset format (i.e., how to split train, test and valid data)
2. Where to place the dataset
3. How to change the path for the dataset reader
4. How to save the model in my own directory
5. How to use the trained model
Edit
my_config['dataset_reader']['data_path'] = '/home/ec2-user/SageMaker/squad/data/'
my_config['metadata']['variables']['MODELS_PATH'] = '/home/ec2-user/SageMaker/squad/model/'
I used these lines to change the dataset path and the model path in the configuration file. The model is saved in this location, but training does not use my dataset; instead, it downloads its own dataset into that folder and uses it.
1. An example of the dataset format is here: https://github.com/deepmipt/DeepPavlov/blob/f5117cd9ad1e64f6c2d970ecaa42fc09ccb23144/deeppavlov/dataset_readers/squad_dataset_reader.py#L46
Your dataset should have the same format.
2-3. The dataset should be placed in the folder specified here: https://github.com/deepmipt/DeepPavlov/blob/f5117cd9ad1e64f6c2d970ecaa42fc09ccb23144/deeppavlov/configs/squad/squad_torch_bert.json#L4
(you can change the folder name)
4. The model is saved in the directory specified here: https://github.com/deepmipt/DeepPavlov/blob/f5117cd9ad1e64f6c2d970ecaa42fc09ccb23144/deeppavlov/configs/squad/squad_torch_bert.json#L166
(you can write your own directory here)
5. The trained model can be used with the command:
python3 -m deeppavlov interact <your_config_name>
A more detailed tutorial on how to launch models is here: https://github.com/deepmipt/DeepPavlov

How to extract code from .Rdata file?

I have an .Rdata file that "stores" some logic/code.
How can I extract the code written in this .Rdata file?
I want to edit/fix this code, but the general pipeline just loads this .Rdata with its variables and SVM model, without the option to fix and edit them.
Please advise.
P.S. An .Rdata file saves a workspace, which includes the function and value objects created during an open session in R. I need the actual logic/code/initialization that was run to create these objects. For example, I get the SVM model fit result but not the code that created this object, and that is what I need.
You can try loading the RData file, and listing its contents:
load("mydata.RData", verbose=TRUE)
Then you can view the code behind the loaded objects. For instance, if you loaded a function called myfunc, you can view its definition by entering the function's name:
myfunc
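A sketch of that inspect-edit-resave loop. The objects myfunc and fit are hypothetical stand-ins for whatever mydata.RData actually contains (an lm() fit plays the role of the SVM model). Functions carry their full definition, which deparse() turns into editable text; a fitted model usually keeps only the call that created it (lm stores it in $call), not the surrounding script, so the original preprocessing code cannot be recovered from the file.

```r
# Hypothetical contents of mydata.RData, created here so the sketch is self-contained
myfunc <- function(x) x^2 + 1
fit <- lm(mpg ~ wt, data = mtcars)
path <- file.path(tempdir(), "mydata.RData")
save(myfunc, fit, file = path)
rm(myfunc, fit)

# Load into a separate environment and list what was restored
env <- new.env()
loaded <- load(path, envir = env, verbose = TRUE)

# A function's source can be turned into text you can copy into a script and edit
src <- deparse(env$myfunc)
cat(src, sep = "\n")

# A model object typically stores only the call that produced it
print(env$fit$call)

# After editing, replace the function and re-save all objects to the file
env$myfunc <- function(x) x^2 + 2
save(list = loaded, envir = env, file = path)
```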

Same Random Data generated when .RData is loaded

When a .RData file is loaded, the same random numbers are generated every time. For example, try this (type these in the R console):
rm(list=ls())
x=10 #Just some random value
save.image("samplefile.RData")
Now try this:
rm(list=ls())
load("samplefile.RData")
print(runif(n=100,min=0,max=100)) # Now it prints the same random numbers every time I run the code above.
Can anyone please explain?
Thanks.
This is intentional behaviour: .Random.seed is saved within the .RData file. If you want different data generated, remove it with rm(.Random.seed) before saving (or after loading), or set it to a different value.
If you need to load a .RData file that has a saved .Random.seed, you can reset the seed using the clock time and this bit of code:
a <- as.numeric(Sys.time())
set.seed(a)
Note that there are benefits to being able to exactly reproduce randomizations, i.e., reproducible research. But for everyday purposes it's probably safer to save and load individual objects (e.g. with saveRDS() and readRDS()) rather than the whole environment: https://www.rdocumentation.org/packages/base/versions/3.4.0/topics/readRDS
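A sketch reproducing the behaviour and both fixes. To keep it small, it saves x and .Random.seed explicitly with save() rather than calling save.image(), which would store the same .Random.seed object along with everything else in the global environment; the file name is only illustrative.

```r
f <- file.path(tempdir(), "samplefile.RData")
set.seed(123)   # ensures .Random.seed exists in the global environment
x <- 10
# Store x together with the current RNG state, as save.image() would
save(list = c("x", ".Random.seed"), file = f, envir = globalenv())

# Each load() restores the saved seed, so the draws repeat
load(f, envir = globalenv())
a <- runif(3, min = 0, max = 100)
load(f, envir = globalenv())
b <- runif(3, min = 0, max = 100)

# Fix 1: drop the restored seed; R re-seeds itself on the next draw
load(f, envir = globalenv())
rm(".Random.seed", envir = globalenv())
d <- runif(3, min = 0, max = 100)

# Fix 2: the clock-based reset from the answer above
load(f, envir = globalenv())
set.seed(as.numeric(Sys.time()))
e <- runif(3, min = 0, max = 100)
```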

How to create my own datasets like the default datasets in R?

I am a new R user. I really like the data() function: it gives access to many built-in datasets, and I can use and test them whenever I want.
Can I also put my own datasets in R like the default datasets, so that I don't have to import them every time I need them?
You could use the following workflow:
Put your data (e.g. a mydataset.CSV) in a data folder of your project.
Put an R file named mydataset.R that does the loading in the same folder.
When you want to use the data function, first set the working directory to the project folder using setwd, then call data(mydataset, package=character(0))
Here is an example of what the R script can look like:
# this code goes into mydataset.R
mydataset <- local({
  dat <- read.csv("mydataset.CSV")
  # some transformations here if necessary
  dat
})
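A self-contained sketch of the whole workflow. The project folder and the three-row dataset are made up for the demo and created under tempdir(). It relies on documented data() behaviour: package = character(0) restricts the search to the data subdirectory of the working directory, an .R file takes precedence over a .CSV file with the same base name, and .R files are sourced with the working directory temporarily set to the data folder, which is why the relative path in read.csv() works.

```r
# Hypothetical project layout: <proj>/data/mydataset.CSV and <proj>/data/mydataset.R
proj <- file.path(tempdir(), "myproject")
dir.create(file.path(proj, "data"), recursive = TRUE, showWarnings = FALSE)

# Made-up data file
write.csv(data.frame(id = 1:3, value = c(2.5, 3.1, 4.7)),
          file.path(proj, "data", "mydataset.CSV"), row.names = FALSE)

# The loader script from the answer
writeLines(c(
  "mydataset <- local({",
  "  dat <- read.csv(\"mydataset.CSV\")",
  "  dat",
  "})"
), file.path(proj, "data", "mydataset.R"))

# Switch to the project folder, load the dataset, then restore the old directory
old_wd <- setwd(proj)
data(mydataset, package = character(0))
setwd(old_wd)

print(mydataset)
```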
If you want to be independent of the working directory, consider putting your data in your own package. The devtools package may help with this.

Retrieve path of supplementary data file of developed package

While developing a package, I ran into the problem of importing supplementary data; this has been "kind of" solved here.
Nevertheless, I need to use a function from another package that requires a path to the file. Sadly, using global environment variables here is not an option.
[By the way: the file needs to be .txt, while supplementary data should be .RData. The function is quite picky.]
So I need to know how to get the path to a supplementary data file of a package. Is this even possible?
I had the idea of reading the .RData into the global environment and then saving it to a temporary file for further processing. I would really like to know a clean way, since the supplementary data is about 100 MB...
Thank you very much!
Use system.file() to reliably find paths inside the installed package. Typically the file is placed in your-pkg-source/inst/extdata/your-file.txt (everything under inst/ is copied into the package's top-level directory on installation) and then referenced as
system.file(package="your-pkg", "extdata", "your-file.txt")
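A sketch of how system.file() behaves. The stats package's DESCRIPTION file serves as a stand-in because it exists in every R installation. "your-pkg", "your-file.txt" and get_supplementary_path() are hypothetical placeholders. system.file() silently returns "" for a missing file; mustWork = TRUE turns that into an error, which is usually what you want before passing the path to a picky function such as readLines().

```r
# A file that ships with every R installation, used as a stand-in
desc_path <- system.file("DESCRIPTION", package = "stats")
print(desc_path)

# A missing file yields an empty string rather than an error
missing_path <- system.file("extdata", "no-such-file.txt", package = "stats")
print(identical(missing_path, ""))

# Hypothetical helper for your own package: fail loudly if the file is absent
get_supplementary_path <- function() {
  system.file("extdata", "your-file.txt", package = "your-pkg", mustWork = TRUE)
}
# Usage (inside your package): readLines(get_supplementary_path())
```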
