How to output a list of dataframes, which is able to be used by another user

How to output a list of dataframes, which is able to be used by another user - r

I have a list whose elements are several dataframes, which looks like this
Because it is hard for another user to use these data by re-running my original code. Hence, I would like to export it. As the graph shows, the dataframes in that list have different number of rows. I am wondering if there is any method to export it as file without damaging any information, and make it be able to be used by Rstudio. I have tried to save it as RData, but I don't know how to save the information.
Thanks a lot

To output objects in R, here are 4 common methods:
dput() writes a text representation of an R object
This is very convenient if you want to allow someone to get your object by copying and pasting text (for instance on this site), without having to email or upload and download a file. The downside however is that the output is long and re-reading the object into R (simply by assigning the copied text to an object) can hang R for large objects. This works best to create reproducible examples. For a list of data frames, this would not be a very good option.
You can print an object to a .csv, .xlsx, etc. file with write.table(), write.csv(), readr::write_csv(), xlsx::write.xlsx(), etc.
While the file can then be used by other software (and re-imported into R with read.csv(), readr::read_csv(), readxl::read_excel(), etc.), the data can be transformed in the process and some objects cannot be printed in a single file without prior modifications. So this is not ideal in your case either.
save.image() saves your entire workspace (objects + environment)
The workspace can then be recreated with load(). This can be useful, but you are here only interested in saving one object. In that case, it is preferable to use:
saveRDS() which allows to write one object to file
The object can then be re-created with readRDS(). This is the best option to save an R object to file, without any modification and then re-create it.
In your situation, this is definitely the best solution.

Related

In R and Sparklyr, writing a table to .CSV (spark_write_csv) yields many files, not one single file. Why? And can I change that?

Background
I'm doing some data manipulation (joins, etc.) on a very large dataset in R, so I decided to use a local installation of Apache Spark and sparklyr to be able to use my dplyr code to manipulate it all. (I'm running Windows 10 Pro; R is 64-bit.) I've done the work needed, and now want to output the sparklyr table to a .csv file.
The Problem
Here's the code I'm using to output a .csv file to a folder on my hard drive:
spark_write_csv(d1, "C:/d1.csv")
When I navigate to the directory in question, though, I don't see a single csv file d1.csv. Instead I see a newly created folder called d1, and when I click inside it I see ~10 .csv files all beginning with "part". Here's a screenshot:
The folder also contains the same number of .csv.crc files, which I see from Googling are "used to store CRC code for a split file archive".
What's going on here? Is there a way to put these files back together, or to get spark_write_csv to output a single file like write.csv?
Edit
A user below suggested that this post may answer the question, and it nearly does, but it seems like the asker is looking for Scala code that does what I want, while I'm looking for R code that does what I want.

I had the exact same issue.
In simple terms, the partitions are done for computational efficiency. If you have partitions, multiple workers/executors can write the table on each partition. In contrast, if you only have one partition, the csv file can only be written by a single worker/executor, making the task much slower. The same principle applies not only for writing tables but also for parallel computations.
For more details on partitioning, you can check this link.
Suppose I want to save table as a single file with the path path/to/table.csv. I would do this as follows
table %>% sdf_repartition(partitions=1)
spark_write_csv(table, path/to/table.csv,...)
You can check full details of sdf_repartition in the official documentation.

Data will be divided into multiple partitions. When you save the dataframe to CSV, you will get file from each partition. Before calling spark_write_csv method you need to bring all the data to single partition to get single file.
You can use a method called as coalese to achieve this.
coalesce(df, 1)

How to keep style format unchanged after writing data using openxlsx in R

I am using openxlsx in order to write the outputs of my data.
I have used the following code to read my data using readxl.
df1=read_excel("C:/my_data.xlsx",skip=2);
Now I want to write the output and keep the original Excel file using any possible package. I have used the following codes, but it does not keep the original Excel file. Can we do it it in R packages?
write.xlsx(df1, 'C:/mydata.xlsx',skip=2)

Given your code, you should nhave two different data files in your working directory:
"my_data.xlsx" (the one that you loaded), and "mydata.xlsx" (the one that you created through R). R shouldn't overwrite your files if you give them different names.
If there's only one file, are you sure you didn't use the same name for both files? If so, then everything should work fine if you give the files different names (e.g. "my_file1.xlsx" and "my_file2.xlsx")!
Also, in general, it's a good idea to give data files an informative name so that you don't accidentally delete/overwrite files that you need. For example, if the original excel data is you raw data, consider naming it "data_raw.xlsx", and make sure that you only read it, and whenever you make some changes to it, save it under a different name (e.g. "data_processed1.xlsx").
You can also save data files in the native R format .rds using the save_rds() function, this is especially helpful if you want to keep special attributes of variables such as factors, etc...
Hope this helps!

What are the commands for viewing a ".RData" file's data in RStudio?

I am trying to find out how I can see the data within a dataset with a .RData extension.
I tried view(), it gave me one object present in the dataset but I know that this dataset is a large dataset with over 300MB size and consists of a very large number of names list. I need to view all of the contents of it and have been unsuccessful so far.
Should I convert it into a CSV instead in order to view all of the contents? If yes, how can I do that using RStudio?

The cross-platform function is View. (Caps are discriminatory in R.) If you did:
obj <- load("filename.Rdata") # assuming a file exist in your working directory
Then type:
obj
You should see a print-listing of the character representations of the objects created (or possibly overwritten) in your global environment. The Rstudio aspect of this question would not affect the result.

How to save large output sufficiently fast in text or any other format?

My question is: how to save the output i.e., mydata
mydata=array(sample(100),dim=c(2,100,4000))
reasonably fast?
I used the reshape2 package as suggested here.
melt(mydata)
and
write.table(mydata,file="data_1")
But it is taking more than one hour to save the data into the file. I am looking for any other faster ways to do the job.

I strongly suggest to refer to this great post, that surely helps in make issues clear about file saving.
Anyway, saveRDS could be the most adequate for you. The difference more relevant, in this case, is that save can save many objects to a file in a single call, whilst saveRDS, being a lower-level function, works with a single object at a time.
save and load allow you to save a named R object to a file or other connection and restore that object again. But, when loaded, the named object is restored to the current environment with the same name it had when saved.
saveRDS and loadRDS, instead, allow to save a single R object to a connection (typically a file) and to restore the object, possibly with a different name. The low level operability makes RDS functions more efficient, probably, for your case.

Read the help text for saveRDS using ?saveRDS. This will probably be the best way for you to save and load large dataframes.
saveRDS(yourdata, file = "yourdata.Rda")

Command to use with easy way the insert of R dataframe

I have a dataframe loaded successfully in R.
I would like to give the data of df to someone else to use them with quick and easy way without need to load again the file into a df.
Which is the command to give the whole data of df (not the str())

You can save the file into a .RData using save or save.image, depending on your needs. First one will save specific objects while the latter will dump the whole workspace to a file. This method has the advantage of working on probably any R object.
Another option is as #user1945827 mentioned, using dput which will produce a string that is parseable into another R session. This will not work for complex (like S4) objects.

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex

How to output a list of dataframes, which is able to be used by another user - r

Related

In R and Sparklyr, writing a table to .CSV (spark_write_csv) yields many files, not one single file. Why? And can I change that?

How to keep style format unchanged after writing data using openxlsx in R

What are the commands for viewing a ".RData" file's data in RStudio?

How to save large output sufficiently fast in text or any other format?

Command to use with easy way the insert of R dataframe

Categories

Resources