I am working with secure datasets in a safe data environment, and producing models from them, e.g. Cox proportional hazards models that are safe to export,
i.e. model <- coxph(survobject ~ x + a + b + c, data = df).
I can use expand.grid on all possible values of x, a, b and c, and predict(model, newdata = grid) to visualise what the model is saying about the risk distribution.
However, I'd like to export the models and carry out all the model visualisation outside the safe space.
What is the best way to export, save and re-import the model so that I can use predict at a later time on new data, without access to the original data? Bear in mind that the file needs to be readable by a human, to assess that it is a data-safe export and does not include any of the original (sensitive) data.
I note that you can export the model as an RDS file.
https://stackoverflow.com/questions/54744797/is-there-a-way-to-export-and-import-models-rather-than-re-running-them-every-ses
However, when opened in Notepad this is unreadable.
I have already exported all the coefficients, i.e. the outputs from glance and tidy (both exponentiated and not). Is there any way to reconstruct a model that can carry out predict from this data?
How can I export the model in a human-readable file, perhaps via unlist, saving to Excel and then reconstructing the model?
I know how to calculate predictions from linear models but not from CoxPH models.
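For the relative-risk part, I imagine something like the following would work. A minimal sketch, assuming all covariates are numeric and that cox_coefficients.csv and cox_means.csv are hypothetical data-safe exports of broom::tidy(model) and model$means, with the terms in matching order:

coefs <- read.csv("cox_coefficients.csv")   # columns: term, estimate (log hazard ratios)
means <- read.csv("cox_means.csv")          # columns: term, mean (centering values)

# linear predictor, centred the same way predict.coxph() centres it
new_lp <- function(newdata) {
  X <- as.matrix(newdata[, coefs$term])
  drop(sweep(X, 2, means$mean) %*% coefs$estimate)
}

grid <- expand.grid(x = 0:1, a = c(10, 20), b = 0:1, c = 0:1)
grid$relative_hazard <- exp(new_lp(grid))

Note that the coefficients alone only give relative hazards; for absolute survival probabilities the baseline hazard (e.g. survival::basehaz(model)) would also need to be exported, which is likewise a small human-readable table.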
I have a question. I'm trying to analyze some data for my thesis with R.
Since my data are not normally distributed, I cannot fit a linear regression model.
I've read a paper where they use a power regression to fit their data (which are more or less similar to mine).
I was wondering: how can I find the power function that fits my data?
Is there a command similar to the one for a linear regression model, like this one?
regr_bivariata <- lm(Cells_Llog ~ Cells_glog, data = Ostreo)
summary(regr_bivariata)
If there is nothing built in: I found out from a link how I can calculate my power function.
The question is: using the method proposed in the link, can I take the p-value and R^2 that R reports for the regression of ln(y) on ln(x) and use them to describe my power function?
Final question: is there a way to plot the power function I will obtain?
I need a script that lets me discriminate the plotted points (I attach a photo to show what I mean).
To show what I would like to obtain, here is what I got using Excel:
[figure: power-function trendline fitted in Excel]
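A power law y = a * x^b is linear on the log-log scale, ln(y) = ln(a) + b * ln(x), so it can be fitted with lm. A minimal sketch, assuming Cells_Llog (response) and Cells_glog (predictor) hold the raw, untransformed values:

# fit on the log-log scale
fit_log <- lm(log(Cells_Llog) ~ log(Cells_glog), data = Ostreo)
summary(fit_log)                 # R^2 and p-value refer to this log-scale fit

# back-transform to the power-function parameters
a <- exp(coef(fit_log)[1])
b <- coef(fit_log)[2]

# plot the data and overlay the fitted power curve
plot(Cells_Llog ~ Cells_glog, data = Ostreo)
curve(a * x^b, add = TRUE)

Note that the reported R^2 and p-value describe the regression on the log scale, not the fit of y = a * x^b on the original scale.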
I used the forecast package to forecast a daily time series of a variable Y using its lag values and a time series of an external parameter X. I found the nnetar model (a NARX model) to be the best in terms of overall performance. However, despite various attempts at parameter tuning, I was not able to predict the peaks of the time series well.
I then extracted the peak values of Y (above a threshold; of course this is not a regular time series any more) and the corresponding X values, and tried to fit a regression model (note: not an autoregression model) using various models in the caret package. I found that the prediction of peak values using a brnn (Bayesian regularized neural network) model, using only the X values, is better than that of nnetar, which uses both the lag values and the X values.
Now my question is how do I go from here to create an ensemble of these two models (i.e. whenever the prediction using the brnn regression model, or any other regression model, is better, I want it to replace the nnetar prediction and move forward; I am mostly concerned about the peaks)? Is this a commonly used approach?
Instead of trying to pick the one model that would be superior at any given time, it is typically better to average the models, in order to include as many individual views as possible.
In the experiments I've been involved in, where we tried to pick one model that would outperform based on historical performance, a simple average typically turned out to be as good or better. This is in line with the typical results on this problem: https://otexts.com/fpp2/combinations.html
So, before you try to go more advanced, by picking a specific model based on previous performance or by using a weighted average, consider a simple average of the two models.
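As a minimal sketch (pred_nnetar and pred_brnn are hypothetical numeric vectors of forecasts for the same dates):

# equal-weight combination of the two forecasts
pred_ensemble <- (pred_nnetar + pred_brnn) / 2

# weighted variant, if you later estimate a weight w on a validation set
w <- 0.5
pred_weighted <- w * pred_nnetar + (1 - w) * pred_brnn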
If you want to continue with a sort of selection/weighted averaging, try to have a look at the FFORMA package in R: https://github.com/pmontman/fforma
I've not tried that specific package (yet), but I have seen promising results in my tests using the original m4metalearning package.
I have an app that can fit some predefined models (say, an lm model). The app is hosted on a server. Now I would like to add functionality so that the user can define any model (arima, or anything user-defined) "on the side", add it to the app, and then calculate estimates using that model.
The best solution would be for the user to define the model in their own R instance, export it to a file and import it via the front end on the server. That is the best solution for me, because the user doesn't need any permissions on the server.
I was thinking about saving the model definition as an RDS file and then importing it into the app. However, if the model is saved via:
modelTest <- glm(y ~ x, data = df)
saveRDS(modelTest, file = "modelTest.rds")
And then after import:
modelTest2 <- readRDS("modelTest.rds")
df2$prediction <- predict(modelTest2, newdata=df2)
In the above example the whole glm object is saved. That means the predicted values are also saved, so the file can be large when many predictions are stored. Is it possible to use another method and save the model with only the model definition, without the data?
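One approach (a sketch, not the only option): fit without storing the model frame, then null out the per-observation components that predict() does not need when newdata is supplied:

modelTest <- glm(y ~ x, data = df, model = FALSE, x = FALSE, y = FALSE)
modelTest$data              <- NULL
modelTest$residuals         <- NULL
modelTest$fitted.values     <- NULL
modelTest$linear.predictors <- NULL
modelTest$weights           <- NULL
modelTest$prior.weights     <- NULL
# formulas capture their enclosing environment, which can drag the whole
# workspace into the RDS file; point them at an empty environment instead
environment(modelTest$terms)   <- baseenv()
environment(modelTest$formula) <- baseenv()
saveRDS(modelTest, file = "modelTest.rds")

Prediction on new data only needs the coefficients, terms, contrasts and xlevels, so predict(modelTest2, newdata = df2) still works after re-import; the trade-off is that functions that need the per-observation components, such as residuals(), will no longer work on the stripped object.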
I conduct a large number of regression analyses using ols and cph (different models, sensitivity analyses, etc.), which takes around two hours on my computer. Therefore, I would like to save these models so that I don't have to re-run the same analyses every time I want to work with them. The models all have very structured names, so I can create a vector of names as follows:
model.names <- ls()[grep("^im", ls())]
But how can I use this to save those models? Could they be placed into a data frame?
I think you are looking for save()
save writes an external representation of R objects to the specified file. The objects can be read back from the file at a later date by using the function load or attach (or data in some cases).
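Building on the naming scheme from the question, a minimal sketch (the file name is arbitrary):

# collect every object whose name starts with "im" ...
model.names <- ls()[grep("^im", ls())]
# ... and write them all to a single file
save(list = model.names, file = "models.RData")

# later, in a fresh session:
load("models.RData")   # restores each model under its original name

There is no need to put the models into a data frame; save(list = ...) takes the character vector of names directly.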
Is it possible to misuse JAGS as a tool for generating data from a model with known parameters? I need to sample data points from a predefined model in order to do a simulation study and test the power of a model I have developed in R.
Unfortunately, the model is somewhat tricky (a hierarchical structure with AR and VAR components) and I was not able to simulate the data directly in R.
While searching the internet, I found a blog post where the data were generated in JAGS using the data{} block. In the post, the author then estimated the model directly in JAGS. Since I have my model in R, I would like to transfer the data back to R without using a model{} block. Is this possible?
There is no particular reason that you need to use the data block for generating data in this way - the model block can just as easily work in 'reverse' to generate data based on fixed parameters. Just specify the parameters as 'data' to JAGS, and monitor the simulated data points (and run for as many iterations as you need datasets - which might only be 1!).
Having said that, in principle you can simulate data using either the data or model blocks (or a combination of both), but you need to have a model block (even if it is a simple and unrelated model) for JAGS to run. For example, the following uses the data block to simulate some data:
txtstring <- '
data{
  for(i in 1:N){
    Simulated[i] ~ dpois(i)
  }
}
model{
  fake <- 0
}
#monitor# Simulated
#data# N
'
library('runjags')
N <- 10
Simulated <- coda::as.mcmc(run.jags(txtstring, sample=1, n.chains=1, summarise=FALSE))
Simulated
The only real difference is that the data block is updated only once (at the start of the simulation), whereas the model block is updated at each iteration. In this case we only take 1 sample so it doesn't matter, but if you wanted to generate multiple realisations of your simulated data within the same JAGS run you would have to put the code in the model block. [There might also be other differences between data and model blocks but I can't think of any offhand].
Note that you will get the data back out of JAGS in a different format (a single vector with names giving the indices of any arrays within the monitored data), so some legwork might be required to get that back to a list of vectors / arrays / whatever in R. Edit: unless R2jags provides some utility for this - I'm not sure as I don't use that package.
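For example, continuing the toy run above: the returned mcmc object is a single row whose column names encode the indices, so for a plain vector monitor the legwork is just stripping the names:

colnames(Simulated)                  # "Simulated[1]" ... "Simulated[10]"
sim <- as.numeric(Simulated[1, ])    # plain numeric vector of length N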
Using a model block to run a single MCMC chain that simulates multiple datasets would be problematic because MCMC samples are typically correlated (each sample is drawn using the previous one). For a simulation study, you would want to generate independent samples from your distribution. The way to go would be to run the data or model block repeatedly, e.g. in a for loop, which ensures that your samples are independent.
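A minimal sketch of that approach, reusing txtstring from above (the replicate count of 50 is arbitrary, and you may want to set the JAGS RNG seed per call for reproducibility):

# each call starts a fresh JAGS run, so the datasets are independent replicates
datasets <- lapply(1:50, function(k) {
  out <- run.jags(txtstring, sample = 1, n.chains = 1, summarise = FALSE)
  as.numeric(coda::as.mcmc(out))
})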