Source Multiple R Scripts With Delay In Between

I'm trying to source multiple R scripts with a short delay in between each one. The 15 R scripts to be 'sourced' all collect data from the GA API, transform/clean/analyze the data, and finally push the results into their own worksheets within a single Google Sheet. So I'd like to set a one-minute wait between each script to make sure I'm not overloading the Google Sheet file.
How can I turn the code (below) into a mini-function where there is a wait time between each source() command?
source("/code/processed/script1.R")
source("/code/processed/script1.R")
source("/code/processed/script1.R")
...
source("/code/processed/script15.R")
Thanks in advance for your help! :)
PS - For context, please note I have my working directory organized in the following hierarchy:
|-project
  |-code
    |-processed
    |-raw
  |-data
    |-processed
    |-raw

As suggested in my comment, I would use Sys.sleep(), either by manually adding it between every source() command:
source(...)
Sys.sleep(60)
source(...)
Or by storing all scripts in a vector and looping over them.
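For example, a minimal sketch of the loop approach (assuming the scripts really are numbered script1.R through script15.R under /code/processed/ as in the question):
# source each script in order, pausing one minute between runs
scripts <- sprintf("/code/processed/script%d.R", 1:15)
for (i in seq_along(scripts)) {
  source(scripts[i])
  if (i < length(scripts)) Sys.sleep(60)  # no need to wait after the last script
}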

Related

In R and Sparklyr, writing a table to .CSV (spark_write_csv) yields many files, not one single file. Why? And can I change that?

Background
I'm doing some data manipulation (joins, etc.) on a very large dataset in R, so I decided to use a local installation of Apache Spark and sparklyr to be able to use my dplyr code to manipulate it all. (I'm running Windows 10 Pro; R is 64-bit.) I've done the work needed, and now want to output the sparklyr table to a .csv file.
The Problem
Here's the code I'm using to output a .csv file to a folder on my hard drive:
spark_write_csv(d1, "C:/d1.csv")
When I navigate to the directory in question, though, I don't see a single csv file d1.csv. Instead I see a newly created folder called d1, and when I click inside it I see ~10 .csv files all beginning with "part".
The folder also contains the same number of .csv.crc files, which I see from Googling are "used to store CRC code for a split file archive".
What's going on here? Is there a way to put these files back together, or to get spark_write_csv to output a single file like write.csv?
Edit
A user below suggested that this post may answer the question, and it nearly does, but it seems like the asker is looking for Scala code that does what I want, while I'm looking for R code that does what I want.
I had the exact same issue.
In simple terms, the partitions are done for computational efficiency. If you have partitions, multiple workers/executors can write the table on each partition. In contrast, if you only have one partition, the csv file can only be written by a single worker/executor, making the task much slower. The same principle applies not only for writing tables but also for parallel computations.
For more details on partitioning, you can check this link.
Suppose I want to save table as a single file at the path path/to/table.csv. I would do this as follows:
table <- table %>% sdf_repartition(partitions = 1)
spark_write_csv(table, "path/to/table.csv", ...)
You can check full details of sdf_repartition in the official documentation.
Data will be divided into multiple partitions. When you save the dataframe to CSV, you will get one file per partition. Before calling spark_write_csv you need to bring all the data into a single partition to get a single file.
In sparklyr you can use sdf_coalesce() to achieve this:
df <- sdf_coalesce(df, partitions = 1)
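For completeness, here is a minimal, self-contained sketch of the single-partition approach; the local connection and the mtcars stand-in table are just for illustration, and note that Spark still writes into a folder, which will now contain a single part-*.csv file:
library(sparklyr)
library(dplyr)

sc <- spark_connect(master = "local")
d1 <- copy_to(sc, mtcars, "d1")                 # stand-in for the real Spark table

d1 %>%
  sdf_coalesce(partitions = 1) %>%              # collapse the data into one partition
  spark_write_csv("C:/d1", mode = "overwrite")  # the output folder now holds one part-*.csv

spark_disconnect(sc)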

How to combine multiple similar scripts into one in R?

I have 48 scripts used to clean data corresponding to 48 different tests. The cleaning protocols for each test used to be unique and test-specific, but the final project guideline now allows all tests to use the same cleaning protocol, provided they save all output files to the appropriate directory (each test's own folder of results). I'm trying to combine these tests into one master cleaning script that any team member can use to clean data as more is collected, or to make small changes, given they have the raw data files and a folder for each test (which I would give to them).
Currently I have tried two approaches:
The first is to include all necessary libraries in the body of a master cleaning script, then source() each individual cleaning script. Inside each script, the libraries are require()d, the appropriate files are read in, and the cleaned files are saved to their correct destinations. This method seems to work best, but if the whole master script is run, some subtests are successfully cleaned and saved to their correct locations while the rest have to be saved individually, and I'm not sure why.
library(readr)
library(dplyr)
library(data.table)
library(lubridate)
source("~/SF_Cleaning_Protocol.R")
etc
.
.
The second is to save the body of the general cleaning script as a function, and then call that function in a series of if statements based on the test one wants to clean.
For example:
if (testname == "SF"){
setwd("~/SF")
#read in the csv file
subtest<- read_csv()
path_map<- read_csv()
SpecIDs<- read_csv()
CleaningProtocol(subtest,path_map,SpecIDs)
write.csv("output1.csv")
write.csv("output2.csv")
write.csv("output3.csv")
write.csv("output4.csv")
} else if (testname == "EV"){
etc
}
The code reads in and writes out files fine if a test is selected individually, but when testname is specified and the script is run as a whole, it ignores the if statements, runs all tests, but fails to print results for any of them.
Is there a better option I haven't tried, or can anyone help me diagnose my issues?
Many thanks.
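For illustration, a minimal sketch of the second approach with the per-test work wrapped in one function; the input file names and the single output file below are placeholders, and CleaningProtocol() is assumed to exist as described above:
library(readr)

clean_test <- function(testname) {
  test_dir <- file.path("~", testname)                     # e.g. "~/SF"
  subtest  <- read_csv(file.path(test_dir, "subtest.csv"))
  path_map <- read_csv(file.path(test_dir, "path_map.csv"))
  SpecIDs  <- read_csv(file.path(test_dir, "SpecIDs.csv"))

  results <- CleaningProtocol(subtest, path_map, SpecIDs)

  # write.csv needs the object to write as its first argument, then the file path
  write.csv(results, file.path(test_dir, "output1.csv"), row.names = FALSE)
}

clean_test("SF")                    # clean one test
lapply(c("SF", "EV"), clean_test)   # or several in one go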

"filename.rdata" file Exploring and Converting to CSV

I'm no R programmer (I only started learning it because of this problem); I'm using Python. For a forecasting task I got a dataset, signalList.rdata, of a phenomenon called partial discharge.
I tried some commands to load, open and view it, but hardly got a glimpse:
my_data <- get(load('C:/Users/Zack-PC/Desktop/Study/Data Sets/pdCluster/signalList.Rdata'))
But since I lack deep knowledge of R, I wanted to convert it into a csv file, or any format I can deal with in Python,
or explore it and copy-paste manually.
So I'm asking for any solution, whether using R or Python or any other tool, to get at what's in the .rdata file.
Have you managed to load the data successfully into your working environment?
If so, write.csv is the function you are looking for.
If not,
setwd("C:/Users/Zack-PC/Desktop/Study/Data Sets/pdCluster/")
signalList <- load("signalList.Rdata")
write.csv(signalList, "signalList.csv")
should do the trick.
If you would like to remove signalList from your workspace,
rm(signalList)
will accomplish this.
Note: changing your working directory isn't necessary; I just find it makes the code easier to read. You may also specify another path for saving your csv within the second argument of write.csv.
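If you are not sure what the .Rdata file actually contains (it can hold several objects, and the object inside may not be called signalList), a small sketch like the following can help you inspect it before exporting; the object name used below is only an assumption:
e <- new.env()
load("C:/Users/Zack-PC/Desktop/Study/Data Sets/pdCluster/signalList.Rdata", envir = e)

ls(e)              # names of the objects stored in the file
str(e$signalList)  # structure of the object of interest (name assumed)

# if it is (coercible to) a data frame, export it for use in Python
write.csv(as.data.frame(e$signalList), "signalList.csv", row.names = FALSE)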

Print plain text of help file to console [duplicate]

I'd like to be able to write the contents of a help file in R to a file from within R.
The following works from the command-line:
R --slave -e 'library(MASS); help(survey)' > survey.txt
This command writes the help file for the survey dataset to survey.txt:
--slave hides both the initial prompt and the commands entered from the resulting output
-e '...' sends the command to R
> survey.txt redirects R's output to the file survey.txt
However, this does not seem to work:
library(MASS)
sink("survey.txt")
help(survey)
sink()
How can I save the contents of a help file to a file from within R?
Looks like the two functions you would need are tools:::Rd2txt and utils:::.getHelpFile. This prints the help file to the console, but you may need to fiddle with the arguments to get it to write to a file in the way you want.
For example:
hs <- help(survey)
tools:::Rd2txt(utils:::.getHelpFile(as.character(hs)))
Since these functions aren't currently exported, I would not recommend you rely on them for any production code. It would be better to use them as a guide to create your own stable implementation.
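As a rough sketch of that idea: Rd2txt() accepts an out argument, so writing straight to a file might look like the following (it still relies on unexported functions, so treat it as fragile):
library(MASS)

save_help_txt <- function(topic, file) {
  h  <- help(topic)                            # help object for the topic
  rd <- utils:::.getHelpFile(as.character(h))  # parsed Rd object
  tools:::Rd2txt(rd, out = file)               # render the Rd as plain text into 'file'
}

save_help_txt("survey", "survey.txt")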
While Joshua's instructions work perfectly, I stumbled upon another strategy for saving an R help file, so I thought I'd share it. It works on my computer (Ubuntu), where less is the R pager. It essentially just involves saving the file from within less.
help(survey)
Then save the less buffer to a file by typing g|$tee survey.txt, where:
g goes to the top of the less buffer if you aren't already there
| pipes the text in the range from the current position to $ (the end of the buffer)
to the shell command tee, which writes its standard input both to standard out and to the file survey.txt

How to save a function as new R script?

Given a function, how to save it to an R script (.R)?
save() works well with data, but apparently cannot create a .R file.
Copy-pasting from the console to a new script file appears to introduce characters that cause errors.
Take a look at the dump() function. It writes files containing R code that can be read back in with source() or used in some other way.
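For instance, a minimal example of dump() (the function and file names here are arbitrary):
my_fun <- function(x) x^2

dump("my_fun", file = "my_fun.R")  # writes an .R file containing the definition of my_fun

rm(my_fun)                         # remove it from the workspace...
source("my_fun.R")                 # ...and recreate it from the script
my_fun(3)                          # returns 9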
I have to ask: why are you writing your functions in the console in the first place? Any number of editors support a "source" call, so you can update the function as you edit. Copy/pasting from the console will carry prompt characters along, if nothing else, so it's a bad idea to begin with.
