My question is if an object in R saved to binary format using the save function can be different if saved from different (but recent) versions of R.
That is because I have a script that makes some calculations and save its results to a file. When reproducing the same calculations later, I decided to compare the two files using
diff --binary -s mv3p.Rdata mv3p.Rdata.backup
To my surprise the two files are different. However when analysing the contents in R, they are identical.
The new version is 3.3.1. I believe the older version have been created by R 3.3.0 but it could also be by 3.2.x, I am not 100% sure. I used the save command with only the object I wanted to save and the filename arguments.
So my question is : is it normal that the same object is written differently in different versions of R? is it documented somewhere? How can I be sure to be able to reproduce exactly the same file? On what can it depend (R version, OS, processor architecture, etc...)
Please , I am NOT asking if versions can be read by another version of R and I am NOT asking about very old R versions.
R data files also include the R version used to write it. That's one reason the files may be different. See here on documentation: http://biostat.mc.vanderbilt.edu/wiki/Main/RBinaryFormat
Also, you can use save(..., ascii=T) to see the difference in plain text.
Related
I have a file run_experiment.rmd which performs an analysis on data using a bunch of .r scripts in another folder.
Every analysis is saved into its own timestamped folder. I save the outputs of the analysis, the inputs used, and if possible I would also like to save the code used to generate the analysis (including the contents of both the .rmd file and the .r files).
The reason for this is because if I make changes to the way my analyses are run, then if I re-run the analysis using the new updated file, I will get different results. If possible, I would like to keep legacy versions of the code so that I can always, if need be, re-run the original analysis.
Have you considered using a git repository to commit your code and output each time you update/run it? I think this is the optimal solution for what you are describing. Each commit would have a timestamp associated with it for you to rollback to a previous version when needed.
The best way to do this is to put all of those scripts into an R package, and in your Rmd file, print sessionInfo() to record the package version used.
You should change the version number of the package each time you make a non-trivial change to it (or even better, with every change).
Then when you want to reproduce the analysis, you use the sessionInfo() listing to work out which version of R and of the packages to install, and you'll get the same environment.
There are packages to help with this (pak and renv, maybe others), but I haven't used them, so I can't give details or recommendations.
I am using plyr to rename the columns of a large data set to a shorter aliases. The from names are very long with occasional unusual symbols (i.e. Â) This code works in in R Studio when I manually (i.e. Ctrl+R) execute the code. No errors.
However, when it is run using source in another script and/or in the standard Rgui (even using Ctrl+R), it recognizes some of the names in the from statement, but not others, which are identified in the error:
The following from values were not present in x
32/64 bit doesn't seem to make a difference. Can't identify character or value that is producing the error. Any solutions?
Should this be posted as an issue on the plyr Github?
I have prepared a dummy replica of the data set here.
The program that works in R Studio, but not in standard Rgui is here.
The code for the "source" call that produces errors is
source("dftest.R")
All software and packages updated on 3/18/2016.
See similar but unrelated question here.
Looks like copying the code from R Studio text editor and pasting into native client text editor, and then saving, solved both the source problem and the native client problem.
I have written and collected R code on various topics that solve particular problems at hand. I stored the R script/code in .txt files. I have now 100s of them.
How do you keep your R code at hand efficiently?
#Manetheran has the right idea: write a package. It's easy to do (especially with RStudio). Read "Writing R Extensions" and then on top of that learn about roxygen2 (which allows you to document each function in-line and avoid writing .Rd files).
Then you can use devtools to load your package locally, or once it's stable if you think other people can use the functions you can submit your package to CRAN.
I prefer to keep it simple. I use Total Commander and when I need an example which uses some R function, I just do Alt-F7 and search for *.R files which contain the desired string.
I use RStudio and have created two or three basic scripts. I save my much-used functions in the basic script that is most appropriate. Then, at the start of an RStudio script for a project, I source one or more of the basic scripts as is appropriate.
I tried to load my R workspace and received this error:
Error: bad restore file magic number (file may be corrupted) -- no data loaded
In addition: Warning message:
file ‘WORKSPACE_Wedding_Weekend_September’ has magic number '#gets'
Use of save versions prior to 2 is deprecated
I'm not particularly interested in the technical details, but mostly in how I caused it and how I can prevent it in the future. Here's some notes on the situation:
I'm running R 2.15.1 on a MacBook Pro running Windows XP on a bootcamp partition.
There is something obviously wrong this workspace file, since it weighs in at only ~80kb while all my others are usually >10,000
Over the weekend I was running an external modeling program in R and storing its output to different objects. I ran several iterations of the model over the course of several days, eg output_Saturday <- call_model()
There is nothing special to the model output, its just a list with slots for betas, VC-matrices, model specification, etc.
I got that error when I accidentally used load() instead of source() or readRDS().
Also worth noting the following from a document by the R Core Team summarizing changes in versions of R after v3.5.0 (here):
R has new serialization format (version 3) which supports custom serialization of
ALTREP framework objects... Serialized data in format 3 cannot be read by versions of R prior to version 3.5.0.
I encountered this issue when I saved a workspace in v3.6.0, and then shared the file with a colleague that was using v3.4.2. I was able to resolve the issue by adding "version=2" to my save function.
Assuming your file is named "myfile.ext"
If the file you're trying to load is not an R-script, for which you would use
source("myfile.ext")
you might try the readRDSfunction and assign it to a variable-name:
my.data <- readRDS("myfile.ext")
The magic number comes from UNIX-type systems where the first few bytes of a file held a marker indicating the file type.
This error indicates you are trying to load a non-valid file type into R. For some reason, R no longer recognizes this file as an R workspace file.
Install the readr package, then use library(readr).
It also occurs when you try to load() an rds object instead of using
object <- readRDS("object.rds")
I got the error when saved with saveRDS() rather than save(). E.g. save(iris, file="data/iris.RData")
This fixed the issue for me. I found this info here
Also note that with save() / load() the object is loaded in with the same name it is initially saved with (i.e you can't rename it until it's already loaded into the R environment under the name it had when you initially saved it).
I had this problem when I saved the Rdata file in an older version of R and then I tried to open in a new one. I solved by updating my R version to the newest.
If you are working with devtools try to save the files with:
devtools::use_data(x, internal = TRUE)
Then, delete all files saved previously.
From doc:
internal If FALSE, saves each object in individual .rda files in the data directory. These are available whenever the package is loaded. If
TRUE, stores all objects in a single R/sysdata.rda file. These objects
are only available within the package.
This error occured when I updated my R and R Studio versions and loaded files I created under my prior version. So I reinstalled my prior R version and everything worked as it should.
I was wondering if it is possible to work between matlab and R to plot some data. I have a script in matlab which generates a text file. From this I was wondering if it was possible to open R from within matlab and plot the data from this text file and then return to matlab.
For example, if I save a text file called test.txt, in the path 'E:\', and then define the path of R which in my case will be:
pathR = 'C:\Program Files\R\R-2.14.1\bin\R';
Is it possible to run a script already written in R saved under test1.R (saved in the same directory as test.txt) in R from matlab?
If you're working with Windows (from the path it looks like you are), you can use the MATLAB R-link from the File Exchange to pass data from Matlab to R, execute commands there, and retrieve output.
I don't use R so this is not something I have done but I see no reason why you shouldn't be able to use the system function to call R from a Matlab session. Look in the product documentation under the section Run External Commands, Scripts, and Programs for this and related approaches.
There are some platform-specific peculiarities to be aware of and you may have to wrestle a little with what is returned (though since you are planning to have R create a plot which is likely to be a side-effect rather than what is returned you may not). As ever this is covered quite well in the product's documentation
After using R(D)COM and Matlab R-link for a while, I do not recommend it. The COM interface has trouble parsing many commands and it is difficult to debug the code. I recommend using a system command from Matlab as described in the R Wiki. This also avoids having to install RAndFriends.