Save or retrieve data in RStudio - r

Sir, I am a student learning R. I have a question about how to store data in R, and how to retrieve data that has been erased.

Sir,
Using RStudio is in many ways like using, say, Word or Notepad, but there are some differences.
First the similarities:
If you do not save your R script or data, it might not be available after you restart RStudio, or if you overwrite/erase your data.
The advantage of using R and RStudio is that you can script how you load and manipulate your data, and hence recreate it, provided you use a script and do not rely only on the interactive console.
As for the differences, RStudio can be set to save your current workspace, which is where all loaded data and variables reside. To change this setting, go to "Tools" --> "Global Options" and look under the "General" tab for the workspace options.
However, if you erase your data by overwriting it with other values or by removing it with rm(), the data is lost. Your only recourse is to retrace how it was loaded/modified, using either your script or the console "History".
For saving data, see e.g. http://www.sthda.com/english/wiki/saving-data-into-r-data-format-rds-and-rdata. Note the difference between save() and saveRDS(): the former saves data together with the variable names, whereas saveRDS() saves the data without a name, so it must be assigned to a variable when read back.
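A minimal sketch of that difference (the object and file names are arbitrary):

x <- data.frame(a = 1:3)

# save()/load(): the object comes back under its original name
save(x, file = "x.RData")
rm(x)
load("x.RData")    # 'x' reappears in the workspace

# saveRDS()/readRDS(): the object is stored without its name,
# so you assign it to whatever name you like when reading
saveRDS(x, file = "x.rds")
y <- readRDS("x.rds")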

Related

How to output a list of dataframes, which is able to be used by another user

I have a list whose elements are several data frames (a screenshot accompanied the original question).
Because it is hard for another user to use these data by re-running my original code, I would like to export the list. The data frames in that list have different numbers of rows. I am wondering whether there is any method to export it as a file without losing any information, so that it can be used in RStudio. I have tried to save it as RData, but I don't know how to preserve the information.
Thanks a lot
To output objects in R, here are 4 common methods:
dput() writes a text representation of an R object
This is very convenient if you want to let someone get your object by copying and pasting text (for instance on this site), without having to email or upload and download a file. The downside, however, is that the output is long, and re-reading the object into R (simply by assigning the copied text to an object) can hang R for large objects. This works best for creating reproducible examples; a quick illustration follows. For a list of data frames, this would not be a very good option.
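For instance (the object x is just an example):

x <- data.frame(a = 1:3, b = c("u", "v", "w"))
dput(x)
# prints a text representation that anyone can paste back
# into R and assign to recreate the object:
y <- structure(list(a = 1:3, b = c("u", "v", "w")),
               class = "data.frame", row.names = c(NA, -3L))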
You can print an object to a .csv, .xlsx, etc. file with write.table(), write.csv(), readr::write_csv(), xlsx::write.xlsx(), etc.
While the file can then be used by other software (and re-imported into R with read.csv(), readr::read_csv(), readxl::read_excel(), etc.), the data can be transformed in the process, and some objects cannot be written to a single file without prior modification. So this is not ideal in your case either.
save.image() saves your entire workspace (objects + environment)
The workspace can then be recreated with load(). This can be useful, but you are here only interested in saving one object. In that case, it is preferable to use:
saveRDS() which allows you to write one object to a file
The object can then be re-created with readRDS(). This is the best option for saving an R object to a file without any modification and then re-creating it.
In your situation, this is definitely the best solution.
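For example, assuming your list is called my_list (a placeholder name):

saveRDS(my_list, file = "my_list.rds")   # one file, structure fully preserved
# another user can then recreate it under any name they like:
my_list <- readRDS("my_list.rds")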

BlueSky Statistics - File naming conventions

When opening the file 'TestFile.RData' in BlueSky Statistics, it is opened under this name PLUS 'Dataset3' attached. It looks like this in the tab: TestFile.RData(Dataset3)
I would like to use my original name when using R code in the R command editor, but from what I see, BlueSky wants me to use the Dataset3 name.
Please clarify this file name issue for me.
If my original name is changed, I see issues with reproducing things, as the given name Dataset3 is not controllable.
Regards
Your observation is correct. Whenever a file that is not an R data file is opened in BlueSky Statistics, we create a data frame object in R. We name these objects sequentially, namely Dataset1, Dataset2, Dataset3, etc. We could have used the name of the original file, but we went with Dataset1, Dataset2, Dataset3 for compatibility with SPSS: many of our users come from SPSS, and that is exactly what SPSS does. There is a simple workaround, see below.
To work around this, you need to change the default code we use to open the dataset. To see the code in the output window, go to the top-level menu: Tools -> Configuration Settings -> select the Output tab and check the box next to "Show syntax in output window".
The code you will see when you open a dataset in the output window is
BSkyloadDataset(fullpathfilename = 'C:/Users/Aaron_2/Documents/BlueSky Statistics/Sample Datasets/IRT/engagement.csv',
                filetype = 'CSV',
                worksheetName = '',
                load.missing = FALSE,
                character.to.factor = FALSE,
                csvHeader = TRUE,
                isBasketData = FALSE,
                trimSPSStrailing = FALSE,
                sepChar = ',',
                deciChar = '.',
                datasetName = 'Dataset2')
All you need to do is change the datasetName parameter to the name you want to use.
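For example, changing the last argument of the call above to datasetName = 'engagement' (or any other valid name of your choosing) will open the dataset under that name instead of Dataset2.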
I will also add an enhancement so that the default behavior when opening a file is to name the dataset after the file. This is easy to do.
With R data files this is not a problem, because we load all data frame objects into the grid, and the name of the dataset in the grid continues to be the name of the dataset object.
BlueSky is one of the few packages that use R and allow you to open and work on multiple data files at once. This naming approach is its way of allowing that while using files that have not yet been stored as R data files (.RData). After importing data from a non-R file, simply use "File > Save As" and save it as an R object (.RData). The next time you open that file, it will keep the name you've given it.

How to share data frames between scripts in R

I've got multiple R scripts: one that cleans my original data and produces a tidy data frame, and several others that perform functions on that data frame.
When I wrote them, the data frame produced by the first script was in my RStudio environment and the other scripts referenced the resulting data frame without trouble.
Now that I'm trying to run them from the console, the data frame produced by the first script isn't available to the others.
What's the best way to share a data frame between scripts?
You could try using the commands save.image() and load() to save your workspace to a file and then load it into your console environment, as it's likely that your console instance and RStudio each have their own independent environments.
This way, you would have access to all objects that the previous scripts created. However, if you're only interested in the generated data, you could save your data.frame using save() and open it using load().
As mentioned by @Dirk Eddelbuettel, there are also plenty of good functions for saving single objects, such as saveRDS() and readRDS() (which provide a better serialization than save()), and write.csv() and read.csv().
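A minimal sketch of that pattern (the file and object names, and the cleaning step, are placeholders):

# script1.R: clean the data and persist the result
tidy_df <- clean_data(raw_df)            # your cleaning code goes here
saveRDS(tidy_df, file = "tidy_df.rds")

# script2.R (and the others): load the shared data frame
tidy_df <- readRDS("tidy_df.rds")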

Running jobs in background in R

I am working with a 250 by 250 matrix. However, it takes loads and loads of time to compute it; at least an hour.
Is it possible to store this matrix in memory in R, such that every time I open up R, it is already there?
Ideally, I would like to know if it is possible to run a job in the background in R, so that I don't have to wait an hour to get the matrix out and be able to play around with it.
1) You can save the workspace of R when closing R. Usually R asks "Save workspace image?" when you are closing it. If you answer "Yes", it will save the workspace in a file named ".RData" and will load it when starting a new R instance.
2) The better (safer) option is to save the matrix explicitly. There are several ways to do it. One option is to save it as an RData file:
save(m, file = "matrix.Rdata")
where m is your matrix.
You can load the matrix at any time with
load("matrix.Rdata")
if you are in the same working directory.
3) There is no built-in option for background computing in R itself. But you can open several R instances: do the computation in one instance and do something else in another.
What would help is to write the matrix to a file once you have computed it, and then read that file every time you open R. Write yourself a computeMatrix() function or script that produces a file with the matrix stored in a sensible format. Also write yourself a loadMatrix() function or script that reads that file and loads the matrix into memory for use; then call or run loadMatrix every time you start R and want to use the matrix.
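A sketch of the two scripts (the computation itself is a stand-in):

# computeMatrix.R: run once, possibly overnight
m <- matrix(rnorm(250 * 250), nrow = 250)   # placeholder for the hour-long computation
saveRDS(m, file = "matrix.rds")

# loadMatrix.R: source this at the start of each session
m <- readRDS("matrix.rds")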
In terms of running an R job in the background, you can run an R script from the command line with the syntax R CMD BATCH scriptName, with scriptName replaced by the name of your script.
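For example, on a Unix-like system the compute script sketched above could be left running in the background (the trailing & detaches it from the shell):

R CMD BATCH computeMatrix.R &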
It might be better to use the ff package and save the matrix as an ff object. This means that the actual matrix will be saved on the disk in an efficient manner, then when you start a new R session you can point to that same file without loading the entire matrix into memory. When you need part of the matrix, only the part you need will be loaded so it will be much quicker. Even if you need the entire matrix loaded into memory it should load faster than reading a text file.
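A rough sketch of that idea, assuming the ff package is installed (the file name is arbitrary):

library(ff)

# create a disk-backed 250 x 250 matrix of doubles
m <- ff(vmode = "double", dim = c(250, 250), filename = "matrix.ff")
m[1, 1] <- 42    # reads and writes touch only the parts you ask for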

How to write multiple tables, dataframes, regression results etc. - to one Excel file?

I am looking for an easy way to get objects into MS Excel.
(I am using the preinstalled "Puromycin" dataset for the examples.)
I would like to place the contents of these objects into a single Excel file:
Puromycin
summary(Puromycin$rate)
summary(Puromycin$conc)
table(Puromycin$state)
lm( conc ~ rate , data=Puromycin)
By "contents" i mean what is shown in the console when i press enter. I dont know what to call it.
I tried to do this:
sink("datafilewhichexcelhopefullyunderstands.csv")
Puromycin
summary(Puromycin$rate)
summary(Puromycin$conc)
table(Puromycin$state)
lm( conc ~ rate , data=Puromycin)
sink()
This gives me a file with the CSV extension; however, when I open the file in Notepad,
there is no comma separation. That means that I can't get Excel to open it properly. By properly
I mean that each number is in its own cell.
Others have suggested this for a similar problem
https://stackoverflow.com/a/13007555/1831980
But as a novice, I feel that the solution is too complex, and I am hoping for a simpler method.
What I am doing now is this:
write.table(Puromycin, file="clipboard" , sep=";" , row.names=FALSE )
write.table(summary(Puromycin$conc), file="clipboard" , sep=";" , row.names=FALSE )
... etc...
But this requires a lot of copying and pasting, which I hope to eliminate.
Any help would be appreciated.
write.table and its friends are intended to write out columns of data separated by whatever separator is specified. Your clipboard contains several data types because you are using summary, whose output is formatted for the console rather than as a rectangular table.
For writing the data values out, you can use write.csv on a data frame and then open with Excel. For example, Puromycin is already a data frame (which you can see with str(Puromycin)) so you can just write it out directly:
write.csv(file = "some file.csv", x = Puromycin)
The file will go into the current working directory (which can be determined with getwd()).
Writing out/saving the results of the regression model is a bit more of a challenge. You could definitely use sink as you did, but specify a .txt extension on your file so a text editor can open it. There are fancier methods (Sweave, knitr) which you might want to look into in the long run, as they can write really nice reports automatically.
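For example (the file name is arbitrary):

sink("model_results.txt")    # redirect console output to a text file
print(summary(lm(conc ~ rate, data = Puromycin)))
sink()                       # stop redirecting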
In the meantime, get to know str(any R object) as it will be your friend. You can see all the objects in your workspace with ls().
This will only be helpful if you are prepared to use Excel's Data/Text to Columns functions:
capture.output(invisible(lapply(list(Puromycin,   # list() keeps each object intact; c() would flatten the data frame
                                     summary(Puromycin$rate),
                                     summary(Puromycin$conc),
                                     table(Puromycin$state),
                                     lm(conc ~ rate, data = Puromycin)),
                                print)),          # invisible() stops the returned list printing a second time
               file = "datafilewhichexcelhopefullyunderstands.csv",
               append = TRUE)
The problem is that Excel will not read the whitespace as a cell separator unless you specifically tell it to. You can (and I have often done so) use the fixed-field input features offered by the Text to Columns dialog interface.
Your simplest option may be to use the RExcel tool, which transfers information between R and Excel. However, it is not free software.
The XLConnect package is another option; it can be used to write information directly to an Excel file.
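A hedged sketch of that approach, assuming the XLConnect package (and the Java runtime it requires) is installed; the file and sheet names are placeholders:

library(XLConnect)

wb <- loadWorkbook("puromycin.xlsx", create = TRUE)
createSheet(wb, name = "data")
writeWorksheet(wb, Puromycin, sheet = "data")
createSheet(wb, name = "state_table")
writeWorksheet(wb, as.data.frame(table(Puromycin$state)), sheet = "state_table")
saveWorkbook(wb)    # one workbook, several sheets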
The tricky part is the lm call: lm does not return a simple vector, matrix, or data frame (all of which are easy to convert to CSV or send directly), and there is no clear way to map the various parts of a list onto cells in a spreadsheet. What would be better is to use extractor functions to pull the important parts from the return value of lm, or from the summary of the lm object, and send those to Excel using the other tools.
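For instance, the coefficient table is easy to extract and write out (the file name is arbitrary):

fit <- lm(conc ~ rate, data = Puromycin)
coefs <- as.data.frame(coef(summary(fit)))   # estimates, std. errors, t and p values
write.csv(coefs, file = "lm_coefficients.csv")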
If you can tell us more about why you want the numbers in Excel and what you plan to do with them afterwards, then we may be able to offer better help (you may be able to skip Excel completely).
If the main goal is to share output with others, then you should really look at the knitr package (or other related packages). This will not create Excel files, but it can be used (along with the pandoc program and possibly other tools) to create a report file in a format that is easy to share with others not familiar with R. You could put everything into a .pdf file or a .docx file (the latter can be read by MS Word and will have tables which can be edited using Word). There is no simple way to get edits back into R, but with track changes you can easily see what changes have been made and hand-edit your R script/template accordingly.
