What to do when you have a lot of individual objects in the global environment and you want to save them as a table of values?
I could not find any similar questions here, but eventually found an answer in the R help files. It's posted below.
The dump function in base R did the trick. I used dump(ls(), "my_file_name.txt")
It produces a text file that you can edit pretty easily. I used a macro in Notepad++ to replace the <- operators and delete the line breaks, resulting in a file I could easily copy and paste into a table.
There's probably a better way. Other answers are more than welcome.
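One possible alternative that skips the manual editing step, offered as a rough sketch rather than a tested recipe: collect the objects with mget() and build the table directly in R. The helper name objects_to_table and the example objects are made up for illustration, and the sketch assumes you only care about objects that hold a single value.

```r
# Hypothetical stand-ins for the objects in your workspace.
env <- new.env()
assign("alpha", 1.5, envir = env)
assign("beta", "text", envir = env)
assign("gamma", TRUE, envir = env)

# Build a name/value table from every single-value object in an environment.
objects_to_table <- function(envir = globalenv()) {
  objs <- mget(ls(envir = envir), envir = envir)
  # Keep only atomic objects of length one; vectors, lists, etc. are skipped.
  is_scalar <- vapply(objs, function(x) is.atomic(x) && length(x) == 1, logical(1))
  data.frame(
    name = names(objs)[is_scalar],
    value = unname(vapply(objs[is_scalar], as.character, character(1))),
    stringsAsFactors = FALSE
  )
}

tbl <- objects_to_table(env)
write.csv(tbl, file.path(tempdir(), "objects.csv"), row.names = FALSE)
```

Calling objects_to_table() with no argument would use the global environment instead of the demo environment.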
Background
I'm doing some data manipulation (joins, etc.) on a very large dataset in R, so I decided to use a local installation of Apache Spark and sparklyr to be able to use my dplyr code to manipulate it all. (I'm running Windows 10 Pro; R is 64-bit.) I've done the work needed, and now want to output the sparklyr table to a .csv file.
The Problem
Here's the code I'm using to output a .csv file to a folder on my hard drive:
spark_write_csv(d1, "C:/d1.csv")
When I navigate to the directory in question, though, I don't see a single csv file d1.csv. Instead I see a newly created folder called d1, and when I click inside it I see ~10 .csv files all beginning with "part". Here's a screenshot:
The folder also contains the same number of .csv.crc files, which I see from Googling are "used to store CRC code for a split file archive".
What's going on here? Is there a way to put these files back together, or to get spark_write_csv to output a single file like write.csv?
Edit
A user below suggested that this post may answer the question, and it nearly does, but it seems like the asker is looking for Scala code that does what I want, while I'm looking for R code that does what I want.
I had the exact same issue.
In simple terms, the data is partitioned for computational efficiency. If you have several partitions, multiple workers/executors can each write their own partition of the table in parallel. In contrast, if you only have one partition, the csv file can only be written by a single worker/executor, making the task much slower. The same principle applies not only to writing tables but also to parallel computations.
For more details on partitioning, you can check this link.
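Suppose I want to save a table as a single file under the path path/to/table.csv. I would do this as follows:
table <- table %>% sdf_repartition(partitions = 1)
spark_write_csv(table, "path/to/table.csv", ...)
Note that Spark still creates a folder at that path; it will just contain a single part file.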
You can check full details of sdf_repartition in the official documentation.
The data is divided into multiple partitions. When you save the data frame to CSV, you get one file per partition. Before calling spark_write_csv you need to bring all the data into a single partition to get a single file.
You can use a method called coalesce to achieve this (the call below is the SparkR form; in sparklyr the equivalent is sdf_coalesce):
coalesce(df, 1)
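If the part files have already been written, they can also be stitched back together in plain R without Spark. This is only a sketch: the function name combine_part_files and the demo folder are invented for illustration, and it assumes every part file carries its own header row (which is what spark_write_csv produces with header = TRUE) and that the combined data fits in memory.

```r
# Read every part-*.csv file in a folder and write them out as one csv.
combine_part_files <- function(dir, out_file) {
  parts <- sort(list.files(dir, pattern = "^part-.*\\.csv$", full.names = TRUE))
  parts <- parts[file.size(parts) > 0]  # skip empty partitions
  pieces <- lapply(parts, read.csv, stringsAsFactors = FALSE)
  combined <- do.call(rbind, pieces)
  write.csv(combined, out_file, row.names = FALSE)
  invisible(combined)
}

# Demo with two fake part files in a temporary folder.
demo_dir <- file.path(tempdir(), "d1_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(id = 1:2, x = c("a", "b")),
          file.path(demo_dir, "part-00000.csv"), row.names = FALSE)
write.csv(data.frame(id = 3:4, x = c("c", "d")),
          file.path(demo_dir, "part-00001.csv"), row.names = FALSE)

combined <- combine_part_files(demo_dir, file.path(tempdir(), "d1_single.csv"))
```

The .csv.crc checksum files are ignored by the file name pattern, so they do not need to be deleted first.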
I'm not an R programmer (I only started learning it because of this problem); I use Python. For a forecasting task I got a dataset, signalList.rdata, of a phenomenon called partial discharge.
I tried some commands to load, open, and view it, but hardly got a glimpse:
my_data <- get(load('C:/Users/Zack-PC/Desktop/Study/Data Sets/pdCluster/signalList.Rdata'))
But since I lack deep knowledge of R, I wanted to convert it into a csv file, or any type that I can deal with in Python, or else explore it and copy-paste manually.
So I'm asking for any solution, whether using R, Python, or any other tool, to get at what's in the .rdata file.
Have you managed to load the data successfully into your working environment?
If so, write.csv is the function you are looking for.
If not,
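setwd("C:/Users/Zack-PC/Desktop/Study/Data Sets/pdCluster/")
# load() restores the saved objects and returns their names, not the data itself
loaded <- load("signalList.Rdata")
signalList <- get(loaded[1])
write.csv(signalList, "signalList.csv")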
should do the trick.
If you would like to remove signalList from your workspace,
rm(signalList)
will accomplish this.
Note: changing your working directory isn't necessary; I just feel it makes the code easier to read. You may also specify another path for the csv in the second argument of write.csv.
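Since write.csv expects something table-like, it may help to look at what the file actually contains before exporting. Below is a small self-contained sketch; the example data is invented (I don't know the real structure of signalList), and it assumes the object turns out to be a list of equal-length vectors. A more deeply nested list would need extra reshaping first.

```r
# Create a stand-in .RData file so the example runs on its own.
rdata_path <- file.path(tempdir(), "example.RData")
signalList <- list(a = c(0.1, 0.2, 0.3), b = c(1.0, 1.1, 1.2))  # hypothetical data
save(signalList, file = rdata_path)
rm(signalList)

# Load into a separate environment so nothing in the workspace is overwritten.
env <- new.env()
loaded_names <- load(rdata_path, envir = env)  # character vector of object names
obj <- get(loaded_names[1], envir = env)
str(obj)  # inspect the structure before deciding how to export it

# A list of equal-length vectors converts directly to a data frame.
df <- as.data.frame(obj)
csv_path <- file.path(tempdir(), "example.csv")
write.csv(df, csv_path, row.names = FALSE)
```

The resulting csv can then be read from Python with any csv reader.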
I'm looking to see if there is a way in R to read in a .R file's source code as text. What I'd like to do is give the path to a .R file, have R grab that R file and return the source code, in text, of that R file.
After that, I plan to make a gsub edit on the source code, and then save the edited text to the same location (which I believe I can do with the save function). The gsub regular expression is solid (as I wrote the code), and I know that it will only match what I want it to match and replace.
Naturally, I'm backing up everything before attempting any of this. The part that I am having the most issue with is reading in a .R file's code as text to be edited. I'm also not sure if this would destroy the formatting of the R file, but obviously it would be preferred to not do that. Any help is greatly appreciated!
code <- readLines("<path>")
code_edited <- gsub("foo", "bar", code)
writeLines(code_edited, "<path>")
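To check that this round trip leaves the formatting alone, here is the same idea run against a throwaway file. The file contents are made up for the demonstration, and fixed = TRUE is my own addition so that "foo" is treated as literal text instead of a regular expression. Note that writeLines, not save, is what writes plain text back; save produces a binary R data file.

```r
# Write a small example script to a temporary file.
script_path <- tempfile(fileext = ".R")
writeLines(c("foo <- function(x) {",
             "  x + 1  # indented comment",
             "}",
             "foo(2)"), script_path)

# Read the source as a character vector, one element per line.
code <- readLines(script_path)

# Replace the text; indentation and comments are untouched because
# each line is just a string.
code_edited <- gsub("foo", "bar", code, fixed = TRUE)

# Overwrite the file with the edited lines.
writeLines(code_edited, script_path)

result <- readLines(script_path)
```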
I'm currently implementing a tool in R and I got stuck on a problem. I already looked in the forums and didn't find anything.
I have many .csv files, which are somehow correlated with each other. The problem is I don't know yet how (this depends on the input of the user of the tool). Now I would like to read in a csv file that contains an arbitrary function f, e.g. f: a=b+max(c,d), and then the inputs, e.g. a="Final Sheet", b="Sheet1", c="Sheet2", d="Sheet3". (Maybe I didn't explain it very well; if so, I will upload a picture.)
Now my question is, can I somehow read that csv file in, such that I can later use the function f in the program? (Of course the given function has to be common in R.)
I hope you understand my problem and I would appreciate any help or idea!!
I would not combine data files with R source. Much easier to keep them separate. You put your functions in separate script files and then source() them as needed, and load your data with read.csv() etc.
"Keep It Simple" :-)
I am sure there's a contorted way of reading in the source code of a function from a text file and then eval() it somehow -- but I am not sure it would be worth the effort.
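For what it's worth, the contorted way might look roughly like the sketch below, using parse() and eval(). The csv layout (columns named output and expression) is something I made up, and the caveat is real: eval() will run whatever code the file contains, so this only makes sense for files you trust.

```r
# A hypothetical csv holding the formula as text.
spec_path <- tempfile(fileext = ".csv")
writeLines(c("output,expression",
             "a,\"b + max(c, d)\""), spec_path)

spec <- read.csv(spec_path, stringsAsFactors = FALSE)

# Turn the text into an unevaluated R expression.
expr <- parse(text = spec$expression[1])[[1]]

# Wrap it in a function that supplies the inputs by name.
f <- function(b, c, d) eval(expr, list(b = b, c = c, d = d))

answer <- f(1, 2, 5)  # evaluates 1 + max(2, 5)
```

In the real tool the three arguments would be the data read from the sheets named in the csv, not plain numbers.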
Situation
I wrote an R program which I split up into multiple R-files for the sake of keeping a good code structure.
There is a Main.R file which references all the other R-files with the 'source()' command, like this:
source(paste(getwd(), dirname1, 'otherfile1.R', sep="/"))
source(paste(getwd(), dirname3, 'otherfile2.R', sep="/"))
...
As you can see, the working directory needs to be set correctly in advance, otherwise, this could go wrong.
Now, if I want to share this R program with someone else, I have to pass all the R files and folders in relative order of each other for things to work. Hence my next question.
Question
Is there a way to replace all the 'source' commands with the actual R script code which it refers to? That way, I have a SINGLE R script file, which I can simply pass along without having to worry about setting the working directory.
I'm not looking for a solution which is an 'R package' (which by the way is one single directory, so I would lose my own directory structure). I'm simply wondering if there is an easy way to combine these self-referencing R files into one single file.
Thanks,
OK, I think you could do something like reading all the files and then writing them out again into a single new one. This can be done using readLines and sink:
sink("mynewRfile.R")
for (i in seq_len(Nfiles)) {
  current_file <- readLines(filedir[i])
  cat("\n\n#### Current file:", filedir[i], "\n\n")
  cat(current_file, sep = "\n")
}
sink()
Here I have assumed that all your file paths are in a vector filedir of length Nfiles; I guess you can adapt that.
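Here is a self-contained variant of the same idea that you can run as is. It writes through a file connection instead of sink, which is just my preference since it leaves console output alone. The two small scripts and the folder name are invented for the demo, and the files must be listed in the order they need to run.

```r
# Set up two tiny example scripts in a temporary folder.
work_dir <- file.path(tempdir(), "combine_demo")
dir.create(work_dir, showWarnings = FALSE)
filedir <- file.path(work_dir, c("otherfile1.R", "otherfile2.R"))
writeLines("add_one <- function(x) x + 1", filedir[1])
writeLines("result <- add_one(41)", filedir[2])

# Concatenate them into one script, with a marker comment before each file.
combined_path <- file.path(work_dir, "combined.R")
out <- file(combined_path, open = "w")
for (path in filedir) {
  writeLines(c("", paste("#### Current file:", basename(path)), ""), out)
  writeLines(readLines(path), out)
}
close(out)

# Check that the single combined file runs on its own.
env <- new.env()
source(combined_path, local = env)
```

The source() lines in Main.R would still have to be removed or commented out by hand, since this only concatenates the files.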