Command for an easy way to share the contents of an R dataframe - r

I have a dataframe df loaded successfully in R.
I would like to give the data in df to someone else so they can use it quickly and easily, without having to load the file into a dataframe again.
Which command outputs the whole data of df (not just the str())?

You can save the data to a .RData file using save or save.image, depending on your needs. The former saves specific objects, while the latter dumps the whole workspace to a file. This method has the advantage of working on probably any R object.
Another option, as @user1945827 mentioned, is dput, which produces a string that is parseable in another R session. This will not work for complex (like S4) objects.
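For example (a minimal sketch, assuming your dataframe is called df and the file names are up to you):
save(df, file = "df.RData")            # recipient restores it with load("df.RData")
save.image(file = "workspace.RData")   # dumps the whole workspace instead
dput(df)                               # prints text the recipient can paste into another R session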

Related

How to keep style format unchanged after writing data using openxlsx in R

I am using openxlsx in order to write the outputs of my data.
I have used the following code to read my data using readxl.
df1=read_excel("C:/my_data.xlsx",skip=2);
Now I want to write the output while keeping the original Excel file, using any possible package. I have used the following code, but it does not keep the original Excel file. Can we do this with R packages?
write.xlsx(df1, 'C:/mydata.xlsx',skip=2)
Given your code, you should have two different data files in your working directory:
"my_data.xlsx" (the one that you loaded) and "mydata.xlsx" (the one that you created through R). R shouldn't overwrite your files if you give them different names.
If there's only one file, are you sure you didn't use the same name for both files? If so, then everything should work fine if you give the files different names (e.g. "my_file1.xlsx" and "my_file2.xlsx")!
Also, in general, it's a good idea to give data files informative names so that you don't accidentally delete or overwrite files that you need. For example, if the original Excel file is your raw data, consider naming it "data_raw.xlsx", make sure that you only read it, and whenever you make changes, save the result under a different name (e.g. "data_processed1.xlsx").
You can also save data files in the native R format .rds using the saveRDS() function. This is especially helpful if you want to keep special attributes of variables, such as factors.
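For example (a minimal sketch, reusing df1 from above; the file name is arbitrary):
saveRDS(df1, file = "my_data.rds")      # write a single object to file
df1_again <- readRDS("my_data.rds")     # restore it, factors and other attributes intact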
Hope this helps!

How to output a list of dataframes, which is able to be used by another user

I have a list whose elements are several dataframes.
Because it is hard for another user to use these data by re-running my original code, I would like to export the list. The dataframes in the list have different numbers of rows. I am wondering if there is any method to export it as a file without losing any information, so that it can be used in RStudio. I have tried to save it as RData, but I don't know how to save the information.
Thanks a lot
To output objects in R, here are 4 common methods:
dput() writes a text representation of an R object
This is very convenient if you want to allow someone to get your object by copying and pasting text (for instance on this site), without having to email or upload and download a file. The downside however is that the output is long and re-reading the object into R (simply by assigning the copied text to an object) can hang R for large objects. This works best to create reproducible examples. For a list of data frames, this would not be a very good option.
You can print an object to a .csv, .xlsx, etc. file with write.table(), write.csv(), readr::write_csv(), xlsx::write.xlsx(), etc.
While the file can then be used by other software (and re-imported into R with read.csv(), readr::read_csv(), readxl::read_excel(), etc.), the data can be transformed in the process and some objects cannot be printed in a single file without prior modifications. So this is not ideal in your case either.
save.image() saves your entire workspace (objects + environment)
The workspace can then be recreated with load(). This can be useful, but you are here only interested in saving one object. In that case, it is preferable to use:
saveRDS() writes a single object to a file
The object can then be re-created with readRDS(). This is the best option for saving an R object to a file without any modification and re-creating it later.
In your situation, this is definitely the best solution.
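For example (a minimal sketch with a made-up list of dataframes; names and file path are hypothetical):
df_list <- list(a = data.frame(x = 1:3),
                b = data.frame(x = 1:5, y = letters[1:5]))
saveRDS(df_list, file = "df_list.rds")   # rows can differ per dataframe; nothing is lost
df_list2 <- readRDS("df_list.rds")       # the other user re-creates it, under any name
identical(df_list, df_list2)             # TRUE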

How to save large output sufficiently fast in text or any other format?

My question is: how can I save the output, i.e. mydata,
mydata=array(sample(100),dim=c(2,100,4000))
reasonably fast?
I used the reshape2 package as suggested here.
melt(mydata)
and
write.table(mydata,file="data_1")
But it is taking more than one hour to save the data into the file. I am looking for any other faster ways to do the job.
I strongly suggest referring to this great post, which helps clarify the issues around saving files.
Anyway, saveRDS is probably the most suitable for you. The most relevant difference here is that save can write many objects to a file in a single call, whilst saveRDS, being a lower-level function, works with a single object at a time.
save and load allow you to save a named R object to a file or other connection and restore that object again. But, when loaded, the named object is restored to the current environment with the same name it had when saved.
saveRDS and readRDS, instead, allow you to save a single R object to a connection (typically a file) and to restore it, possibly under a different name. Their lower-level operation probably makes the RDS functions more efficient in your case.
Read the help text for saveRDS using ?saveRDS. This will probably be the best way for you to save and load large dataframes.
saveRDS(yourdata, file = "yourdata.rds")
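For example, applied to the array from the question (a quick sketch):
mydata <- array(sample(100), dim = c(2, 100, 4000))
saveRDS(mydata, file = "mydata.rds")   # binary and compressed, much faster than write.table
mydata2 <- readRDS("mydata.rds")       # restore, possibly under a different name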

R load script objects to workspace

This is a rookie question that I cannot seem to figure out. Say you've built an R script that manipulates a few data frames. You run the script, it prints out the result. All is well. How is it possible to load objects created within the script to be used in the workspace? For example, say the script creates data frame df1. How can we access that in the workspace? Thanks.
Here is the script...simple function just reads a csv file and computes diff between columns 2 and 3...basically I would like to access spdat in workspace
mspreaddata <- function(filename) {
  # read csv file
  rdat <- read.csv(filename, header = TRUE, sep = ",")
  # compute spread value: column 2 - column 3
  spdat$sp <- rdat[, 2] - rdat[, 3]
}
You should use the source function.
i.e. use source("script.R")
EDIT:
Check the documentation for further details. It'll run the script you call. The objects will then be in your workspace.
Alternatively you can save those objects using save and then load them using load.
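For example (a minimal sketch; the file name is hypothetical):
save(df1, file = "objects.RData")   # in the script: save selected objects to a file
load("objects.RData")               # in your session: restore them under their original names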
When you source that, only the function mspreaddata becomes available in your workspace; spdat is never created, because you are just defining the function, not running it. Even when the function does run, spdat would only exist within the function and not in any environment external to it. You should add something like
newObject <- mspreaddata("filename.csv")
Then you can access newObject
EDIT:
It is also the case that spdat is never created inside your function, so the line spdat$sp <- rdat[, 2] - rdat[, 3] is itself incorrect. Simply use return(rdat[, 2] - rdat[, 3]) instead.
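Putting both fixes together, a corrected sketch of the function:
mspreaddata <- function(filename) {
  rdat <- read.csv(filename, header = TRUE, sep = ",")
  return(rdat[, 2] - rdat[, 3])   # return the spread instead of assigning to spdat
}
newObject <- mspreaddata("filename.csv")   # the result is now in your workspace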

How to put datasets into an R package

I am creating my own R package and I was wondering what are the possible methods that I can use to add (time-series) datasets to my package. Here are the specifics:
I have created a package subdirectory called data and I am aware that this is the location where I should save the datasets that I want to add to my package. I am also cognizant of the fact that the files containing the data may be .rda, .txt, or .csv files.
Each series of data that I want to add to the package consists of a single column of numbers (e.g. of the form 340 or 4.5), and each series differs in length.
So far, I have saved all of the datasets into a .txt file. I have also successfully loaded the data using the data() function. Problem not solved, however.
The problem is that each series of data loads as a factor, except for the longest series. The series that load as factors contain missing values (of the form '.'). I had to add these missing values in order to make each column of data the same length. I tried saving the data as unequal columns, but I received an error message after calling data().
A consequence of adding missing values to get the data to load is that once the data is loaded, I need to remove the NA's in order to get on with my analysis of the data! So, this clearly is not a good way of doing things.
Ideally (I suppose), I would like the data to load as numeric vectors or as a list. In this way, I wouldn't need the NA's appended to the end of each series.
How do I solve this problem? Should I save all of the data into one single file? If so, in what format should I do it? Perhaps I should save the datasets into a number of files? Again, in which format? What is the best practical way of doing this? Any tips would greatly be appreciated.
I'm not sure if I understood your question correctly, but if you edit your data in your favorite format and save it with
save(myediteddata, file = "data.rda")
the data should load exactly the way you saw it in R.
To load all the files in the data directory, you should add
LazyData: true
to your DESCRIPTION file in your package.
If this doesn't help, you could post one of your files and a printout of the format you want; this will help us to help you ;)
In addition to saving as rda files you could also choose to load them as numeric with:
read.table( ... , colClasses="numeric")
Or as non-factor-text:
read.table( ..., as.is=TRUE) # which does pretty much the same as stringsAsFactors=FALSE
read.table( ..., colClasses="character")
It also appears that the data function would accept these arguments, since it is documented to be a simple wrapper for read.table(..., header=TRUE).
Preferred saving location of your data depends on its format.
As Hadley suggested:
If you want to store binary data and make it available to the user,
put it in data/. This is the best place to put example datasets.
If you want to store parsed data, but not make it available to the
user, put it in R/sysdata.rda. This is the best place to put data
that your functions need.
If you want to store raw data, put it in inst/extdata.
I suggest you have a look at the linked chapter as it goes into detail about working with data when developing R packages.
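If you use the devtools workflow, a convenient shortcut is usethis::use_data (an assumption: the usethis package must be installed; you can always call save() yourself instead):
mydataset <- c(340, 4.5, 12.3)                 # hypothetical numeric series
usethis::use_data(mydataset)                   # creates data/mydataset.rda
usethis::use_data(mydataset, internal = TRUE)  # writes R/sysdata.rda instead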
You'll need to create the data file and include it in the R package, and you may want to also document it. Here's how to do both.
Create the data file and include it in R package
Create a directory inside the package called /data and place any data in it. Use only .rda and .RData files.
When creating the rda/RData file from an R object, make sure the R object is named what you want it to be named when it's used in the package and use save() to create it. Example:
save(river_fish, file = "data/river_fish.rda", version = 2)
Add this on a new line in the file called DESCRIPTION:
LazyData: true
Documenting the dataset
Document the dataset by placing a string with the dataset name after the documentation:
#' This is data to be included in my package
#'
#' @author My Name \email{blahblah@@roxygen.org}
#' @references \url{data_blah.com}
"data-name"
Here and here are some nice examples from dplyr.
Notes
To access the data in the package, run river_fish or whatever the name of the dataset is. Nothing more is needed.
Using version = 2 when calling save() ensures your data object is available for older R versions (i.e. prior to 3.5.0) i.e. it will prevent this warning:
WARNING: Added dependency on R >= 3.5.0 because serialized objects in serialize/load version 3 cannot be read in older versions of R.
There is no need to use load() in the R package (just call the object directly, e.g. river_fish will be enough to yield the data from data/river_fish.rda). But in the event you do wish to load an rda/RData file for some reason (e.g. playing around or testing), this will do it:
load("data/river_fish.rda")
Informative sources here and here
