Create an automated R script using taskscheduleR

I am trying to create some automated R scripts using the taskscheduleR library. I have created the following script:
library(lubridate)
setwd("C:/Users/Marc/Desktop/")
create_df <- function() {
  # build a small data frame and write it to a csv named after the current second
  values <- c(1, 2, 3)
  df <- data.frame(values)
  x <- format(Sys.time(), "%S")
  name <- paste0("name_", x, ".csv")
  write.csv(df, name)
}
create_df()
That can be fired up with the following:
myscript <- "C:/Users/Marc/Dropbox/PROJECTEN/Lopend/taskschedulR_test/test.R"
taskscheduler_create(taskname = "myfancyscript", rscript = myscript,
schedule = "ONCE", starttime = format(Sys.time() + 62, "%H:%M"))
However, when I execute it, nothing happens. Any thoughts on how I can get this running?

It worked for me; I now have a .csv called "name_03". I keep the script inside the folder the output goes into, unlike yours, which sits in your Dropbox. You can check the event log via the History tab of the Task Scheduler; to open it from R, type:
system("control schedtasks")

Running R notebook from command line

OK, I'm trying to run this script through a batch file on Windows Server 2016, but it just prints line breaks and dots to the output screen:
"c:\Program Files\R\R-3.5.1\bin\rscript.exe" C:\projects\r\HentTsmReport.R
The script works like a charm in RStudio: it reads an HTML file (a TSM backup report) and transforms the content into a data frame, then saves one of the HTML tables as a csv file.
Why do I just get a screenful of nothing instead of the csv output when running it through rscript.exe?
My goal is to run this script as a daily scheduled task to keep a history of the backup status in a table, so I can keep track of failed backups through Tivoli.
This is the script in the R-file:
library(XML)
library(RCurl)
#library(tidyverse)
library(rlist)
# read the TSM backup report from the UNC path
theurl <- getURL("file://\\\\chill\\um\\backupreport20181029.htm", .opts = list(ssl.verifypeer = FALSE))
tables <- readHTMLTable(theurl)
tables <- list.clean(tables, fun = is.null, recursive = FALSE)
n.rows <- unlist(lapply(tables, function(t) dim(t)[1]))
head(tables)          # note: under Rscript this top-level call auto-prints the tables to the console
test <- tables[[5]]   # select table number 5 as a data frame
write.csv(test, file = "c:\\temp\\backupreport.csv")
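Once the on-screen printing is out of the way, the daily-history part of the goal can be handled by stamping each run with its date and appending to a single file. A minimal sketch, assuming the same packages and paths as above; the history file name and the ReportDate column are made up for illustration:

library(XML)
library(RCurl)
library(rlist)

theurl <- getURL("file://\\\\chill\\um\\backupreport20181029.htm",
                 .opts = list(ssl.verifypeer = FALSE))
tables <- readHTMLTable(theurl)
tables <- list.clean(tables, fun = is.null, recursive = FALSE)

report <- tables[[5]]            # the table of interest
report$ReportDate <- Sys.Date()  # stamp each run with the date it was taken

history <- "c:\\temp\\backupreport_history.csv"
# append to the history file; write the header only when the file is new
write.table(report, history, sep = ",", row.names = FALSE,
            col.names = !file.exists(history),
            append = file.exists(history))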

Is there any method to defer the execution of code in R?

I have created the following function to read a csv file from a given URL:
function(){
  s <- 1
  # first get the bhav copy
  today <- c(); ty <- c(); tm <- c(); tmu <- c(); td <- c()
  # get the URL first
  today <- Sys.Date()
  ty <- format(today, format = "%Y")
  tm <- format(today, format = "%b")
  tmu <- toupper(tm)
  td <- format(today, format = "%d")
  dynamic.URL <- paste("https://www.nseindia.com/content/historical/EQUITIES/", ty, "/", tmu, "/cm", td, tmu, ty, "bhav.csv.zip", sep = "")
  file.string <- paste("C:/Users/user/AppData/Local/Temp/cm", td, tmu, ty, "bhav.csv")
  download.file(dynamic.URL, "C:/Users/user/Desktop/bhav.csv.zip")
  bhav.copy <- read.csv(file.string)
  return(bhav.copy)
}
If I run the function, it immediately says "file.string not found". But when I run it again after some time (a few seconds), it executes normally. I think that when download.file executes, it passes control to read.csv, which tries to load a file that has not yet been properly saved. When I run it again after some time, it tries to overwrite the existing file, which it cannot, and read.csv properly loads the saved file.
I want the function to work the first time I run it. Is there any way, or a function, to defer the action of read.csv until the file is properly saved? Something like this:
download.file(dynamic.URL, "C:/Users/user/Desktop/bhav.csv.zip")
wait......
bhav.copy <- read.csv(file.string)
Ignore the fact that the destfile in download.file is different from file.string; that is down to how my system (Windows 7) handles the download.
Very many thanks for your time and effort...
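One way to get that "wait" behaviour is to poll for the file before reading it. A minimal sketch, assuming the same dynamic.URL and file.string as above; the 30-second timeout is an arbitrary choice of mine:

download.file(dynamic.URL, "C:/Users/user/Desktop/bhav.csv.zip", mode = "wb")  # mode = "wb" keeps the zip intact on Windows

# wait (up to ~30 seconds) until the csv exists and has a non-zero size
waited <- 0
while ((!file.exists(file.string) || file.size(file.string) == 0) && waited < 30) {
  Sys.sleep(1)
  waited <- waited + 1
}

bhav.copy <- read.csv(file.string)

Alternatively, read.csv(unz(...)) can read the csv straight out of the downloaded archive, which removes the race entirely.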

R, Rscript: works when variables are hard-coded, but not when passed as an argument

I built the following R script to take a .csv generated by an automated report and split it into several .csv files.
This code works perfectly, and outputs a .csv file for each unique value of "facility" in "todays_data.csv":
disps <- read.csv("/Users/me/Downloads/todays_data.csv", header = TRUE, sep=",")
for (facility in levels(disps$Facility)) {
temp <- subset(disps, disps$Facility == facility & disps$Alert.End == "")
temp <- temp[order(temp$Unit, temp$Area),]
fn <- paste("/Users/me/Documents/information/", facility, "_todays_data.csv", sep = "")
write.csv(temp, fn, row.names=FALSE)
}
But this does not output anything:
args <- commandArgs(trailingOnly = TRUE)
file <- args[1]
disps <- read.csv(file, header = TRUE, sep = ",")
for (facility in levels(disps$Facility)) {
  temp <- subset(disps, disps$Facility == facility & disps$Alert.End == "")
  temp <- temp[order(temp$Unit, temp$Area), ]
  fn <- paste("/Users/me/Documents/information/", facility, "_todays_data.csv", sep = "")
  write.csv(temp, fn, row.names = FALSE)
}
The only difference between the two files is that the first hardcodes the path to the .csv file to be split, while the second one has it passed as an argument in the command line using Rscript.
The read.csv() command works with the passed file path, because I can successfully run commands like head(disps) while running the script via Rscript.
Nothing within the for-loop will execute when run via Rscript, but things before and after it will.
Does anyone have any clues as to what I've missed? Thank you.
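One common cause of exactly this symptom is that, under Rscript, read.csv() hands back Facility as a character column rather than a factor, so levels(disps$Facility) is NULL and the loop runs zero times. That is an assumption about your setup rather than a confirmed diagnosis, but a sketch that does not depend on the column being a factor looks like this:

args <- commandArgs(trailingOnly = TRUE)
disps <- read.csv(args[1], header = TRUE, sep = ",")

# unique() works for factor and character columns alike
for (facility in unique(disps$Facility)) {
  temp <- subset(disps, Facility == facility & Alert.End == "")
  temp <- temp[order(temp$Unit, temp$Area), ]
  fn <- paste0("/Users/me/Documents/information/", facility, "_todays_data.csv")
  write.csv(temp, fn, row.names = FALSE)
}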

Sourcing an R script from github, for global session use, from within a wrapper function?

I can source an R script held on github (using the 'raw' text link) as follows:
# load package
require(RCurl)
# check 1
ls()
#character(0)
# read script lines from website
u <- "https://raw.github.com/tonybreyal/Blog-Reference-Functions/master/R/bingSearchXScraper/bingSearchXScraper.R"
script <- getURL(u, ssl.verifypeer = FALSE)
eval(parse(text = script))
# clean-up
rm("script", "u")
# check 2
ls()
#[1] "bingSearchXScraper"
However, what I would really like to do is wrap that up in a function. This is where I run into problems, and I suspect it has something to do with the sourced functions only existing locally inside the function they are evaluated in. For example, here is the sort of thing I am aiming for:
source_github <- function(u) {
  # load package
  require(RCurl)
  # read script lines from website and evaluate
  script <- getURL(u, ssl.verifypeer = FALSE)
  eval(parse(text = script))
}
source_github("https://raw.github.com/tonybreyal/Blog-Reference-Functions/master/R/bingSearchXScraper/bingSearchXScraper.R")
Many thanks in advance for your time.
Use:
eval(parse(text = script), envir = .GlobalEnv)
to stick the results into your default search space. This will, of course, overwrite anything else with the same names.
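Putting the question's wrapper and that fix together, a sketch in which the envir default argument is the only change:

source_github <- function(u, envir = .GlobalEnv) {
  # load package
  require(RCurl)
  # read the script from the web and evaluate it in the chosen environment
  script <- getURL(u, ssl.verifypeer = FALSE)
  eval(parse(text = script), envir = envir)
}

source_github("https://raw.github.com/tonybreyal/Blog-Reference-Functions/master/R/bingSearchXScraper/bingSearchXScraper.R")
ls()  # bingSearchXScraper should now appear in the global environment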

Troubleshooting R mapper script on Amazon Elastic MapReduce - Results not as expected

I am trying to use Amazon Elastic MapReduce to run a series of simulations of several million cases. This is an Rscript streaming job with no reducer; I am using the Identity Reducer in my EMR call, --reducer org.apache.hadoop.mapred.lib.IdentityReducer.
The script file works fine when tested locally from the command line on a Linux box, passing one line of input manually: echo "1,2443,2442,1,5" | ./mapper.R gives me the one line of results I expect. However, when I tested my simulation using about 10,000 cases (lines) from the input file on EMR, I only got output for a dozen or so of the 10k input lines. I've tried several times and cannot figure out why. The Hadoop job runs fine without any errors. It seems as though input lines are being skipped, or perhaps something is happening with the Identity Reducer. The results are correct for the cases where there is output.
My input file is a csv with the following data format, a series of five integers separated by commas:
1,2443,2442,1,5
2,2743,4712,99,8
3,2443,861,282,3177
etc...
Here is my R script for mapper.R
#! /usr/bin/env Rscript

# Define functions
trimWhiteSpace <- function(line) gsub("(^ +)|( +$)", "", line)
splitIntoWords <- function(line) unlist(strsplit(line, "[[:space:]]+"))

# function to read in the relevant data from the needed data files
get.data <- function(casename) {
  list <- lapply(casename, function(x) {
    read.csv(file = paste("./inputdata/", x, ".csv", sep = ""),
             header = TRUE,
             stringsAsFactors = FALSE)
  })
  return(data.frame(list))
}

con <- file("stdin")
line <- readLines(con, n = 1, warn = FALSE)
line <- trimWhiteSpace(line)
values <- unlist(strsplit(line, ","))
lv <- length(values)
cases <- as.numeric(values[2:lv])
simid <- paste("sim", values[1], ":", sep = "")
l <- length(cases) # for indexing

## create a vector for the case names
names.vector <- paste("case", cases, sep = ".")

## read in metadata and the necessary data columns using the get.data function
metadata <- read.csv(file = "./inputdata/metadata.csv", header = TRUE,
                     stringsAsFactors = FALSE)
d <- cbind(metadata[, 1:3], get.data(names.vector))

## Calculations that use the data frame d and produce a string called 'output'
## in the form of "id: value1 value2 value3 ..." to be used at a
## later time for aggregation.
cat(output, "\n")
close(con)
The (generalized) EMR call for this simulation is:
ruby elastic-mapreduce --create --stream --input s3n://bucket/project/input.txt --output s3n://bucket/project/output --mapper s3n://bucket/project/mapper.R --reducer org.apache.hadoop.mapred.lib.IdentityReducer --cache-archive s3n://bucket/project/inputdata.tar.gz#inputdata --name Simulation --num-instances 2
If anyone has any insights as to why I might be experiencing these issues, I am open to suggestions, as well as any changes/optimization to the R script.
My other option is to turn the script into a function and run a parallelized apply using R multicore packages, but I haven't tried it yet. I'd like to get this working on EMR. I used JD Long's and Pete Skomoroch's R/EMR examples as a basis for creating the script.
Nothing obvious jumps out. However, can you run the job using a simple input file of only 10 lines? Make sure these 10 lines are scenarios which did not run in your big test case. Try this to eliminate the possibility that your inputs are causing the R script to not produce an answer.
Debugging EMR jobs is a skill of its own.
EDIT:
This is a total fishing expedition, but fire up an EMR interactive Pig session using the AWS GUI. "Interactive Pig" sessions stay up and running, so you can ssh into them. You could also do this from the command-line tools, but it's a little easier from the GUI since, hopefully, you only need to do this once. Then ssh into the cluster, transfer over your test-case infile, your cache files, and your mapper, and run this:
cat infile.txt | ./yourMapper.R > outfile.txt
This is just to test if your mapper can parse the infile in the EMR environment with no Hadoop bits in the way.
EDIT 2:
I'm leaving the above text there for posterity, but the real issue is that your script never goes back to stdin to pick up more data. Thus you get one run per mapper and then it ends. If you run the above one-liner you will only get one result, not a result for each line in infile.txt. If you had run the cat test even on your local machine, the error would have popped out!
So let's look at Pete's word count in R example:
#! /usr/bin/env Rscript

trimWhiteSpace <- function(line) gsub("(^ +)|( +$)", "", line)
splitIntoWords <- function(line) unlist(strsplit(line, "[[:space:]]+"))

## **** could work with a single readLines or in blocks
con <- file("stdin", open = "r")
while (length(line <- readLines(con, n = 1, warn = FALSE)) > 0) {
  line <- trimWhiteSpace(line)
  words <- splitIntoWords(line)
  ## **** can be done as cat(paste(words, "\t1\n", sep=""), sep="")
  for (w in words)
    cat(w, "\t1\n", sep = "")
}
close(con)
close(con)
The piece your script is missing is this bit:
while (length(line <- readLines(con, n = 1, warn = FALSE)) > 0) {
  # do your dance
  # do your dance quick
  # come on everybody tell me what's the word
  # word up
}
You should, naturally, replace the lyrics of Cameo's Word Up! with your actual logic.
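Applied to the mapper above, that means reading the constant inputs once and then moving everything from trimWhiteSpace() onward inside the loop. A sketch; get.data() and the elided calculations are the ones from the original script:

#! /usr/bin/env Rscript
trimWhiteSpace <- function(line) gsub("(^ +)|( +$)", "", line)

# read the constant inputs once, outside the loop
metadata <- read.csv(file = "./inputdata/metadata.csv", header = TRUE,
                     stringsAsFactors = FALSE)

con <- file("stdin", open = "r")
while (length(line <- readLines(con, n = 1, warn = FALSE)) > 0) {
  line <- trimWhiteSpace(line)
  values <- unlist(strsplit(line, ","))
  cases <- as.numeric(values[-1])
  simid <- paste("sim", values[1], ":", sep = "")
  names.vector <- paste("case", cases, sep = ".")
  d <- cbind(metadata[, 1:3], get.data(names.vector))
  ## ...the calculations that build the 'output' string go here...
  cat(output, "\n")
}
close(con)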
Keep in mind that proper debugging music makes the process less painful:
http://www.youtube.com/watch?v=MZjAantupsA
