R load script objects to workspace

This is a rookie question that I cannot seem to figure out. Say you've built an R script that manipulates a few data frames. You run the script, it prints out the result. All is well. How is it possible to load objects created within the script to be used in the workspace? For example, say the script creates data frame df1. How can we access that in the workspace? Thanks.
Here is the script. It's a simple function that just reads a CSV file and computes the difference between columns 2 and 3. Basically I would like to access spdat in the workspace.
mspreaddata<-function(filename){
# read csv file
rdat<-read.csv(filename,header=T,sep=",")
# compute spread value column 2-3
spdat$sp<-rdat[,2]-rdat[,3]
}

You should use the source function.
i.e. use source("script.R")
EDIT:
Check the documentation for further details. It'll run the script you call. The objects will then be in your workspace.
Alternatively you can save those objects using save and then load them using load.
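For example, a minimal save()/load() round trip might look like this (df1 and the temporary file path are made up for illustration):

```r
# Example object standing in for something your script creates
df1 <- data.frame(x = 1:3, y = c(10, 20, 30))

# Save the object to a file...
path <- tempfile(fileext = ".RData")
save(df1, file = path)

# ...remove it, then restore it by name with load()
rm(df1)
load(path)  # re-creates df1 in the current workspace
```

After load(), df1 is available again under its original name.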

So when you source that, the function mspreaddata becomes available in your workspace, but spdat is never created: you are just defining a function, not running it. The object spdat would only exist within that function, not in any environment external to it. You should add something like
newObject <- mspreaddata("filename.csv")
Then you can access newObject
EDIT:
It is also the case that spdat is never created inside your function, so the assignment spdat$sp<-rdat[,2]-rdat[,3] is itself incorrect (it refers to an object that does not exist). Simply use return(rdat[,2]-rdat[,3]) instead.
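Putting the two fixes together (return the computed spread, and assign the function's return value in the workspace), a corrected sketch could look like this; the demo CSV is made up for illustration:

```r
mspreaddata <- function(filename) {
  # read csv file
  rdat <- read.csv(filename, header = TRUE, sep = ",")
  # return the spread between columns 2 and 3
  return(rdat[, 2] - rdat[, 3])
}

# Demo with a temporary CSV (your real file name will differ)
f <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:2, a = c(5, 7), b = c(2, 3)), f, row.names = FALSE)

spdat <- mspreaddata(f)  # assigning the result puts it in the workspace
spdat                    # 3 4
```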

Related

How to use a file modified by an R chunk in a Python one

I am working in R Markdown, primarily in R chunks, which I use to modify data frames. Now that they are ready, a colleague gave me Python code to process some of the data. But when transitioning from an R chunk to a Python one, the environment changes and I do not know how to use the previous objects directly.
reticulate::repl_python()
biodata_file = women_personal_data
NameError: name 'women_personal_data' is not defined
Ideally, I would like not to have to save the files on my computer between R and Python, and then back at R again, to avoid accumulating files that are not completely clean yet (because I figured it could be a solution).
I tried this solution but it does not seem to work with data frames.
Thanks!
biodata_file = r.women_personal_data
The r. prefix makes Python take the object from R, where the variable was called women_personal_data.
TIP: to come back to R, the variable is then accessed as py$women_personal_data.
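A minimal R Markdown sketch of this round trip (the object name is from the question; this assumes the reticulate package is installed and configured):

````
```{r}
library(reticulate)
women_personal_data <- data.frame(age = c(34, 51))
```

```{python}
# Access the R data frame from Python via the r object
biodata_file = r.women_personal_data
```

```{r}
# And back: access Python variables from R via py$
py$biodata_file
```
````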

How to output a list of dataframes that can be used by another user

I have a list whose elements are several dataframes.
It is hard for another user to use these data by re-running my original code, so I would like to export the list. The dataframes in the list have different numbers of rows. I am wondering if there is any method to export it as a file without losing any information, so that it can be used again from RStudio. I have tried to save it as RData, but I don't know how to preserve the information.
Thanks a lot
To output objects in R, here are 4 common methods:
dput() writes a text representation of an R object
This is very convenient if you want to allow someone to get your object by copying and pasting text (for instance on this site), without having to email or upload and download a file. The downside however is that the output is long and re-reading the object into R (simply by assigning the copied text to an object) can hang R for large objects. This works best to create reproducible examples. For a list of data frames, this would not be a very good option.
You can print an object to a .csv, .xlsx, etc. file with write.table(), write.csv(), readr::write_csv(), xlsx::write.xlsx(), etc.
While the file can then be used by other software (and re-imported into R with read.csv(), readr::read_csv(), readxl::read_excel(), etc.), the data can be transformed in the process and some objects cannot be printed in a single file without prior modifications. So this is not ideal in your case either.
save.image() saves your entire workspace (objects + environment)
The workspace can then be recreated with load(). This can be useful, but you are here only interested in saving one object. In that case, it is preferable to use:
saveRDS() writes a single object to a file
The object can then be re-created with readRDS(). This is the best option to save an R object to file, without any modification and then re-create it.
In your situation, this is definitely the best solution.
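For a list of data frames with different row counts (the list below is made up for illustration), the saveRDS()/readRDS() round trip preserves the object exactly:

```r
# A list of data frames with different numbers of rows
my_list <- list(
  a = data.frame(x = 1:3),
  b = data.frame(y = c("u", "v"), z = c(TRUE, FALSE))
)

# Write the single object to a file...
path <- tempfile(fileext = ".rds")
saveRDS(my_list, path)

# ...and re-create it (under any name) in another session
restored <- readRDS(path)
identical(my_list, restored)  # TRUE
```

The other user only needs the .rds file and a single readRDS() call; no re-running of the original code is required.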

How to share data frames between scripts in R

I've got multiple R scripts; one that cleans my original data and produces a tidy data frame, and several others that performs functions on that data frame.
When I wrote them, the data frame produced by the first script was in my RStudio environment and the other scripts referenced the resulting data frame without trouble.
Now that I'm trying to run them from the console, the data frame produced by the first script isn't reference-able for the others.
What's the best way to share a data frame between scripts?
You could try using the commands save.image() and load() to save your workspace to a file and then load it into your console environment, since your console instance and RStudio likely each have their own independent environments.
Done this way, you would have access to all objects the previous scripts created. However, if you're only interested in the generated data, you could save your data frame using save() and open it using load().
As mentioned by @Dirk Eddelbuettel, there are also plenty of good functions for saving single objects, like saveRDS() and readRDS() (which provide a better serialization than save()), as well as write.csv() and read.csv().
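A minimal two-script sketch of that workflow (file and object names are made up): the cleaning script ends by writing the tidy data frame to disk, and each analysis script starts by reading it back, so neither depends on a shared RStudio environment.

```r
# --- clean.R: ends by writing the tidy data frame to disk ---
tidy_df <- data.frame(id = 1:2, value = c(3.5, 4.5))  # stands in for your cleaned data
saveRDS(tidy_df, "tidy_df.rds")

# --- analysis.R: starts by reading it back ---
tidy_df <- readRDS("tidy_df.rds")
mean(tidy_df$value)  # 4
```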

Command for an easy way to share an R dataframe

I have a dataframe loaded successfully in R.
I would like to give the data in df to someone else in a quick and easy way, without them needing to load the file into a dataframe again.
What is the command to give the whole data of df (not just the str())?
You can save the object into a .RData file using save or save.image, depending on your needs. The first one saves specific objects while the latter dumps the whole workspace to a file. This method has the advantage of working on probably any R object.
Another option, as @user1945827 mentioned, is dput, which produces a string that is parseable in another R session. This will not work for complex (like S4) objects.
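A small sketch of the dput() route, using dget() to read the text representation back (the data frame is made up; for sharing on a site you would copy-paste the text instead of using a file):

```r
df <- data.frame(a = 1:2, b = c("x", "y"))

# dput() writes a parseable text representation of the object...
path <- tempfile(fileext = ".R")
dput(df, file = path)

# ...which dget() (or pasting the text into a session) re-creates
df2 <- dget(path)
identical(df, df2)  # TRUE
```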

Read a file if it hasn't read before in R

I am writing an R script (not a function, just a collection of commands) that will read a csv file into a data frame. The CSV file is large and I don't want to read it every time I am running the script, if it has been already read. This is how I am checking if the variable exists:
if (!exists("df")) {
  df <- read_csv(file = "./some_file.csv")
}
However, every time I run the script no matter whether df exists or not, the read_csv function runs.
What am I missing here? Should I specify where the df data frame should be searched for?
Edit: Here is a bit of context for what I am trying to achieve. Usually I work interactively in R or RStudio. If I am reading a file, I read it once, the data is then in the global environment, and I play with it. I was trying to put all my work in a script and add to it step by step. At the beginning of the script I read this CSV file, which is about 11MB, and then start manipulating the data. However, as I add new lines to the script and want to test them, I don't want to read the CSV file again; it has already been read and the data frame is available in the global environment. That was the reason I put the call to read_csv() inside an if statement.
The problem is that despite the variable existing in the global environment, every time I run the script the read_csv() function runs, as if the if statement were ignored.
df is actually a function in the stats package (the F-distribution density), which normally always exists :-)
So basically, just choose a better variable name!
Try using the where and envir arguments of exists(). These arguments direct the exists command to look for the variable in a specific place/environment:
exists(x, where = -1, envir = as.environment(where), frame, mode = "any",
       inherits = TRUE)
Here x is the variable name.
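The two answers combine into one sketch: because stats::df is always on the search path, a bare exists("df") is TRUE regardless of your script, while restricting the lookup with envir and inherits = FALSE checks only the global environment (the assign() call is just an explicit way of creating df there for the demo):

```r
# A bare exists() searches the whole search path, where stats::df
# (the F-distribution density) is always found:
exists("df")  # TRUE even before your script creates df

# Restrict the lookup to the global environment only:
exists("df", envir = globalenv(), inherits = FALSE)  # FALSE in a fresh session

# Create df in the global environment explicitly
assign("df", data.frame(x = 1), envir = globalenv())
exists("df", envir = globalenv(), inherits = FALSE)  # TRUE now
```

Renaming the variable to something that is not already a function name avoids the problem entirely.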
