Starting a function in async mode in R, as a separate process

I am looking for the ability to start R processes asynchronously from within R, with something like the function below:
startFunctionInAsynchMode <- function(workingDir, filesToSource, functionName, ...) {
  # workingDir    - the dir that should be set as wd
  # filesToSource - vector of file names to be sourced
  # functionName  - the actual function to be run asynchronously
  # ...           - other parameters to be passed to the function
  # Return value  - should be the system process id of the started process
}
Would anyone have quick ideas? I checked packages like parallel, but they don't seem to fit this need.
Thanks in advance

Here is an implementation using R CMD BATCH. The basic version is tested; some open items remain.
startFunctionInAsynchMode <- function(workingDir, filesToSource, functionName, ...) {
  wd <- getwd()
  setwd(workingDir)
  fs <- makeFiles()
  scriptFile <- fs$ScriptFile
  # Generate a script that sources the required files, then calls the function
  cat(file = scriptFile, paste0("source(\"", filesToSource, "\")", collapse = "\n"))
  cat(file = scriptFile, "\n", append = TRUE)
  functionCall <- getFunctionCall(functionName, as.list(match.call()), startIndex = 5)
  cat(file = scriptFile, functionCall, append = TRUE)
  # Run the generated script in a background R process
  commandsToRun <- paste0("(R CMD BATCH ", scriptFile, " ", fs$LogFile, " --slave) &")
  print(commandsToRun)
  system(commandsToRun)
  Sys.sleep(5)
  # Find the pid of the new process and write a kill script for it
  pids <- getPids(scriptFile, "--restore")
  cat(file = fs$KillScript, paste0("kill -9 ", pids$PID[1]))
  setwd(wd)
  return(as.character(pids$PID[1]))
}
makeFiles <- function() {
  res <- list()
  dir.create("./temp/tempRgen", recursive = TRUE, showWarnings = FALSE)
  tf <- tempfile("rGen-", "./temp/tempRgen", fileext = "")
  res$ScriptFile <- paste0(tf, ".R")
  res$LogFile <- paste0(tf, ".log")
  res$KillScript <- paste0(tf, "-kill.sh")
  file.create(res$KillScript, showWarnings = FALSE)
  file.create(res$ScriptFile, showWarnings = FALSE)
  res
}
# Open items to be handled:
# 1. Named arguments
# 2. Non-string arguments
getFunctionCall <- function(functionName, argList, startIndex) {
  res <- paste0(functionName, "(")
  if (!is.null(argList) && length(argList) >= startIndex) {
    first <- TRUE
    for (i in startIndex:length(argList)) {
      if (first) {
        first <- FALSE
      } else {
        res <- paste0(res, ",")
      }
      res <- paste0(res, "\"", argList[[i]], "\"")
    }
  }
  res <- paste0(res, ")")
  res
}
# getPids() and parseEachPsLine() need stringr (str_split, str_detect) and plyr (ldply)
library(stringr)
library(plyr)
getPids <- function(grepFor, refineWith) {
  numCols <- length(unlist(str_split(system("ps aux", intern = TRUE)[1], "\\s+")))
  psOutput <- system(paste0("ps auxww | grep ", grepFor), intern = TRUE)
  psOutput <- psOutput[str_detect(psOutput, refineWith)]
  pidDf <- ldply(psOutput, parseEachPsLine)
  # Remove the grep process that matched my own search string
  pidDf <- pidDf[!str_detect(pidDf$COMMAND, "grep"), ]
  return(pidDf)
}
parseEachPsLine <- function(line) {
  tabular <- read.table(textConnection(line), header = FALSE, sep = " ")
  tabular <- tabular[!is.na(tabular)]
  psTitles <- c("USER", "PID", "CPU", "MEM", "VSZ", "RSS", "TTY", "STAT", "START", "TIME", "COMMAND")
  psColNames <- setNames(seq(1, length(psTitles)), psTitles)
  COMMAND <- paste0(tabular[(psColNames["COMMAND"]):length(tabular)], collapse = " ")
  return(data.frame("PID" = tabular[psColNames["PID"]],
                    "STARTED" = tabular[psColNames["START"]],
                    "COMMAND" = COMMAND,
                    "STATUS" = tabular[psColNames["STAT"]]))
}

Related

Why is my for loop only half-working? (R)

I'm not sure how to make a reproducible example of my problem, and this post is very verbose; I was hoping the issue might pop out. Basically, this for loop obtains output from an external program, performs some calculations in R, and then posts the results of those calculations back into an external file.
The first iteration of the loop runs perfectly fine. It does everything correctly, then returns to the top of the loop and changes to the correct directory (flist[2]), but when it reaches get_stress_table on the second pass, it chokes by printing "NA" into the files rather than the file names (flist is a vector of file names).
The file names and subdirectories being iterated through in this loop share a common name. What confuses me is that the loop correctly changes to the right subdirectory in setwd but prints 'NA' as a file name in the first function, so I don't understand the problem.
Anything sticking out?
Here is the for loop I am trying to run:
for (i in 1:length(flist)) {
  setwd(paste0(solutions_dir, "\\", flist[i]))
  max_stress <- get_stress_table(solutions_dir = solutions_dir, flist = flist[i], lsdynadir = lsdynadir, states = 5)
  xy_table <- element_time_series(stressed_eid = max_stress, solutions_dir = solutions_dir, flist = flist[i], lsdynadir = lsdynadir)
  damp_coeff <- find_damp(xy_table = file_xy)
  setwd(kfile_complete)
  erode_damp(erosion_lines = erosion_lines, damp_coef = damp_coeff, kfile_mesh = flist[i])
}
Here is the error traceback I get:
3. file(con, "r")
2. readLines(flist[i])
1. get_stress_table(solutions_dir = solutions_dir, flist = flist[i],
       lsdynadir = lsdynadir, states = 5)
Here is that function:
get_stress_table <- function(solutions_dir, flist, lsdynadir, states) {  # signature as called above
  biggest_stresses <- data.frame(eid = numeric(),
                                 stress = numeric(),
                                 stringsAsFactors = FALSE)
  for (j in 1:states) {
    fileconn <- file("get_stresses.cfile")
    line_one <- paste0("open d3plot ", solutions_dir, "\\", flist[i], "\\", "d3plot")
    line_two <- "ac"
    line_three <- "fringe 14"
    line_four <- "pfringe"
    line_five <- "anim forward"
    line_six <- "anim stop; state 100;"
    line_seven <- paste0("output ", solutions_dir, "\\", flist[i], "\\", flist[i], " ", j, " 1 1 1 0 0 0 0 1 0 0 0 0 0 0 1.000000")
    writeLines(c(line_one, line_two, line_three, line_four, line_five, line_six, line_seven), fileconn)
    close(fileconn)
    system(paste0(lsdynadir, "\\lsprepost4.3_x64.exe c=get_stresses.cfile -nographics"))
    stresses <- readLines(flist[i])
    start <- grep(stresses, pattern = "*KEYWORD", fixed = TRUE)
    stop <- grep(stresses, pattern = "$Interpreted from averaged nodal data", fixed = TRUE)
    stresses <- stresses[-seq(start, stop, by = 1)]
    writeLines(stresses, flist[i])
    stresses <- read.table(flist[i], header = FALSE)
    names(stresses) <- c("eid", "stress")
    max_stress <- which(stresses$eid == which.max(stresses$stress))
    biggest_stresses <- rbind(biggest_stresses, stresses[max_stress, ])
  }
  return(biggest_stresses[which.max(biggest_stresses$stress), 1])
}

Error in ocrFile function in AbbyyR package

While using the Abbyy Cloud SDK for OCR, I keep getting the error below whenever I call the ocrFile function from the AbbyyR package.
Error in curl_download(finishedlist$resultUrl[res$id == finishedlist$id], :
  Argument 'url' must be string.
Sending the files to the cloud and processing them works fine, but there is a problem downloading the files the cloud returns. I thought it might be a network or certificate problem, but I haven't been able to solve it.
Thanks in advance
There is a problem in the source code: the url argument needs to be wrapped in as.character().
I updated the ocrFile function as follows:
install.packages("curl")
library(curl)
new_ocrFile <- function(file_path = "", output_dir = "./",
                        exportFormat = c("txt", "txtUnstructured", "rtf", "docx",
                                         "xlsx", "pptx", "pdfSearchable",
                                         "pdfTextAndImages", "pdfa", "xml",
                                         "xmlForCorrectedImage", "alto"),
                        save_to_file = TRUE)
{
  exportFormat <- match.arg(exportFormat)
  res <- processImage(file_path = file_path, exportFormat = exportFormat)
  # Wait until the task appears in the list of finished tasks
  while (!(any(as.character(res$id) == as.character(listFinishedTasks()$id)))) {
    Sys.sleep(1)
  }
  finishedlist <- listFinishedTasks()
  res$id <- as.character(res$id)
  finishedlist$id <- as.character(finishedlist$id)
  if (identical(save_to_file, FALSE)) {
    res <- curl_fetch_memory(as.character(finishedlist$resultUrl[res$id == finishedlist$id]))
    return(rawToChar(res$content))
  }
  curl_download(as.character(finishedlist$resultUrl[res$id == finishedlist$id]),
                destfile = paste0(output_dir,
                                  unlist(strsplit(basename(file_path), "[.]"))[1],
                                  ".", exportFormat))
}
I hope it helps.
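A quick usage sketch, under the assumption that AbbyyR credentials are already registered via setapp() and that sample.png is a hypothetical input file:
# Hypothetical usage; requires valid Abbyy credentials, e.g. setapp(c("app_id", "app_password"))
library(abbyyR)
text <- new_ocrFile(file_path = "sample.png", save_to_file = FALSE)
cat(text)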

R Parallel Programming: Error in { : task 1 failed - "could not find function "%>%""

I tried to do parallel programming in R by modifying my script. The script contains two parallel sections with the same structure; the first one works, but the second one fails. Below is my code:
library(rvest)
library(RMySQL)
library(curl)
library(gdata)
library(doMC)
library(foreach)
library(doParallel)
library(raster)
trim <- function (x) gsub("^\\s+|\\s+$", "", x)
setwd('/home/chandra/R/IlmuOne/MisterAladin')
no_cores <- detectCores()
cl<-makeCluster(no_cores)
registerDoParallel(cl)
MasterData = read.xls("Master Hotels - FINAL.xlsx", sheet = 1, header = TRUE)
MasterData$url_agoda = as.character(MasterData$url_agoda)
today = as.Date(format(Sys.time(), "%Y-%m-%d"))+2
ntasks <- nrow(MasterData)
# This section performed well
foreach(i = 1:ntasks) %dopar% {
  url = MasterData$url_agoda[i]
  if (trim(url) != '-' & trim(url) != '') {
    from = gregexpr(pattern = '=', url)[[1]][1]
    piece1 = substr(url, 1, from)
    from = gregexpr(pattern = '&los=', url)[[1]][1]
    piece2 = substr(url, from, nchar(url))
    MasterData$url_agoda[i] = paste0(piece1, today, piece2)
  }
}
con <- dbConnect(RMySQL::MySQL(), username = "root", password = "master",host = "localhost", dbname = "mister_aladin")
# Try the first 10 rows
# The section below fails, always returning: Error in { : task 1 failed - "could not find function "%>%""
foreach(a = 1:10, .packages = 'foreach') %dopar% {
  hotel_id = MasterData$id[a]
  vendor = 'Agoda'
  url = MasterData$url_agoda[a]
  if (url != '-') {
    tryCatch({
      hotel <- curl(url) %>%
        read_html() %>%
        html_nodes(xpath = '//*[@id="room-grouping"]') %>%
        html_table(fill = TRUE)
      hotel <- hotel[[1]]
      hotel$hotel_id = hotel_id
      hotel$vendor = vendor
      colnames(hotel)[1] = 'TheSpace'
      colnames(hotel)[4] = 'PricePerNight'
      room = '-'
      hotel$NormalPrice = 0
      hotel$FinalPrice = 0
      for (i in 1:nrow(hotel)) {
        if (i == 1 | (!grepl('See photos', hotel$TheSpace[i]) & hotel$TheSpace[i] != '')) {
          room = hotel$TheSpace[i]
        }
        hotel$TheSpace[i] = room
        # Normal price
        if (gregexpr(pattern = 'IDR', hotel$PricePerNight[i])[[1]][1][1] == 1) {
          split = strsplit(hotel$PricePerNight[i], '\n')[[1]]
          NormalPrice = trim(split[2])
          hotel$NormalPrice[i] = NormalPrice
          NormalPrice = as.integer(gsub(",", "", NormalPrice))
          hotel$NormalPrice[i] = NormalPrice
        }
        # Final price
        if (gregexpr(pattern = 'IDR', hotel$PricePerNight[i])[[1]][1][1] == 1) {
          split = strsplit(hotel$PricePerNight[i], '\n')[[1]]
          FinalPrice = trim(split[6])
          hotel$FinalPrice[i] = FinalPrice
          FinalPrice = as.integer(gsub(",", "", FinalPrice))
          hotel$FinalPrice[i] = FinalPrice
        }
        hotel$NormalPrice[is.na(hotel$NormalPrice)] <- 0
        hotel$FinalPrice[is.na(hotel$FinalPrice)] <- 0
      }
      hotel = hotel[which(hotel$FinalPrice != 0), c("TheSpace", "NormalPrice", "FinalPrice")]
      colnames(hotel) = c('room', 'normal_price', 'final_price')
      hotel$log = format(Sys.time(), "%Y-%m-%d %H:%M:%S")
      hotel$hotel_id = hotel_id
      hotel$vendor = vendor
      Push = hotel[, c('hotel_id', 'room', 'normal_price', 'final_price', 'vendor', 'log')]
      #print(paste0('Agoda: push one record, hotel id ', hotel_id, '!'))
      #cat(paste(paste0('Agoda: push one record, hotel id ', hotel_id, '!'), '\n'))
      dbWriteTable(conn = con, name = 'prices_', value = as.data.frame(Push), append = TRUE, row.names = FALSE)
    },
    error = function(e) {
      Sys.sleep(2)
      e
    })
  }
}
dbDisconnect(con)
stopImplicitCluster()
Every time I run the script it gives me the error: Error in { : task 1 failed - "could not find function "%>%""
I have already checked every related post on this forum and tried to apply the suggestions, but none of them work.
Please advise a solution.
You have to use .packages = c("magrittr", ...) and include all the packages that are necessary to run the code within the foreach loop; .packages = "foreach" alone does not help. You can think of every package you list in .packages as being forwarded to and loaded in each parallel worker.
The %>% operator requires the magrittr package. In this case, however, it does not suffice to load it at the beginning of your script; it needs to be loaded on each of the nodes. You could add the clusterCall line to the creation of your cluster to accomplish this:
cl <- makeCluster(no_cores)
registerDoParallel(cl)
clusterCall(cl, function() library(magrittr))
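Equivalently, a sketch of the .packages fix from the first answer; the exact package list here is an assumption and must cover everything the loop body uses (for this script at least magrittr, rvest, curl, and RMySQL):
foreach(a = 1:10, .packages = c("magrittr", "rvest", "curl", "RMySQL")) %dopar% {
  # ... loop body unchanged ...
}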

R staticdocs, titles of demos

Does anyone know how to define the names of demos when using R staticdocs? The title in the HTML file that is produced does not reflect what is contained in the demo 00Index file.
I think there is something missing in the demo-building routine. The following is a slightly modified version of the routine that comes standard with staticdocs. Basically, the pkg$title attribute should be set, which I have tried to highlight below:
build_demos = function(pkg = ".") {
  pkg <- staticdocs::as.sd_package(pkg)
  demo_dir <- file.path(pkg$path, "demo")
  if (!file.exists(demo_dir))
    return()
  message("Rendering demos")
  demos <- readLines(file.path(demo_dir, "00Index"))
  pieces <- stringr::str_split_fixed(demos, "\\s+", 2)
  in_path <- stringr::str_c(pieces[, 1], ".r")
  filename <- stringr::str_c("demo-", pieces[, 1], ".html")
  title <- pieces[, 2]
  for (i in seq_along(title)) {
    demo_code <- readLines(file.path(demo_dir, in_path[i]))
    demo_expr <- evaluate::evaluate(demo_code, new.env(parent = globalenv()), new_device = FALSE)
    # NH: replay_html is not exported...
    replay_html <- getFromNamespace('replay_html', 'staticdocs')
    pkg$demo <- replay_html(demo_expr, pkg = pkg, name = stringr::str_c(pieces[i], "-"))
    pkg$pagetitle <- title[i]
    #---------------------------------------------------
    # NH: Need to set the title attribute...
    #---------------------------------------------------
    pkg$title <- pkg$pagetitle
    #---------------------------------------------------
    staticdocs::render_page(pkg, "demo", pkg, file.path(pkg$site_path, filename[i]))
  }
  invisible(list(demo = unname(apply(cbind(filename, title), 1, as.list))))
}
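A minimal usage sketch, assuming a hypothetical package source path containing a demo/00Index file:
# Hypothetical path; run after defining the modified build_demos() above
build_demos(pkg = "~/projects/mypackage")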

How to input HDFS file into R mapreduce for processing and get the result into HDFS file

I have a question similar to the one in the link below on Stack Overflow:
R+Hadoop: How to read CSV file from HDFS and execute mapreduce?
I am trying to read a file from the HDFS location "/somnath/logreg_data/ds1.10.csv", reduce its number of columns from 10 to 5, and then write it to another HDFS location "/somnath/logreg_data/reduced/ds1.10.reduced.csv", using the transfer.csvfile.hdfs.to.hdfs.reduced function below.
transfer.csvfile.hdfs.to.hdfs.reduced("hdfs://10.5.5.82:8020/somnath/logreg_data/ds1.10.csv", "hdfs://10.5.5.82:8020/somnath/logreg_data/reduced/ds1.10.reduced.csv", 5)
The function definition is:
transfer.csvfile.hdfs.to.hdfs.reduced =
  function(hdfsFilePath, hdfsWritePath, reducedCols = 1) {
    #local.df = data.frame()
    #hdfs.get(hdfsFilePath, local.df)
    #to.dfs(local.df)
    #r.file <- hdfs.file(hdfsFilePath, "r")
    transfer.reduced.map =
      function(., M) {
        label <- M[, dim(M)[2]]
        reduced.predictors <- M[, 1:reducedCols]
        reduced.M <- cbind(reduced.predictors, label)
        keyval(1, as.numeric(reduced.M))
      }
    reduced.values =
      values(
        from.dfs(
          mapreduce(
            input = from.dfs(hdfsFilePath),
            input.format = "native",
            map = function(., M) {
              label <- M[, dim(M)[2]]
              print(label)
              reduced.predictors <- M[, 1:reducedCols]
              reduced.M <- cbind(reduced.predictors, label)
              keyval(1, as.numeric(reduced.M))
            }
          )))
    write.table(reduced.values, file = "/root/somnath/reduced.values.csv")
    w.file <- hdfs.file(hdfsWritePath, "w")
    hdfs.write(reduced.values, w.file)
    #to.dfs(reduced.values)
  }
But I am receiving this error:
Error in file(fname, paste(if (is.read) "r" else "w", if (format$mode == :
cannot open the connection
Calls: transfer.csvfile.hdfs.to.hdfs.reduced ... make.keyval.reader -> do.call -> <Anonymous> -> file
In addition: Warning message:
In file(fname, paste(if (is.read) "r" else "w", if (format$mode == :
cannot open file 'hdfs://10.5.5.82:8020/somnath/logreg_data/ds1.10.csv': No such file or directory
Execution halted
Alternatively, when I try to load a file from HDFS using the command below, I get this error:
> x <- hdfs.file(path="hdfs://10.5.5.82:8020/somnath/logreg_data/ds1.10.csv", mode="r")
Error in hdfs.file(path = "hdfs://10.5.5.82:8020/somnath/logreg_data/ds1.10.csv", :
  attempt to apply non-function
Any help will be highly appreciated.
Thanks
I basically found a solution to the problem stated above:
r.file <- hdfs.file(hdfsFilePath, "r")
from.dfs(
  mapreduce(
    input = as.matrix(hdfs.read.text.file(r.file)),
    input.format = "csv",
    map = ...
  ))
Below is the entire modified function:
transfer.csvfile.hdfs.to.hdfs.reduced =
  function(hdfsFilePath, hdfsWritePath, reducedCols = 1) {
    hdfs.init()
    #local.df = data.frame()
    #hdfs.get(hdfsFilePath, local.df)
    #to.dfs(local.df)
    r.file <- hdfs.file(hdfsFilePath, "r")
    transfer.reduced.map =
      function(., M) {
        numRows <- length(M)
        # Split each text line on commas and rebuild the matrix
        M.vec.elems <- unlist(lapply(M, function(x) strsplit(x, ",")))
        M.matrix <- matrix(M.vec.elems, nrow = numRows, byrow = TRUE)
        label <- M.matrix[, dim(M.matrix)[2]]
        reduced.predictors <- M.matrix[, 1:reducedCols]
        reduced.M <- cbind(reduced.predictors, label)
        keyval(1, as.numeric(reduced.M))
      }
    reduced.values =
      values(
        from.dfs(
          mapreduce(
            input = as.matrix(hdfs.read.text.file(r.file)),
            input.format = "csv",
            map = function(., M) {
              numRows <- length(M)
              M.vec.elems <- unlist(lapply(M, function(x) strsplit(x, ",")))
              M.matrix <- matrix(M.vec.elems, nrow = numRows, byrow = TRUE)
              label <- M.matrix[, dim(M.matrix)[2]]
              reduced.predictors <- M.matrix[, 1:reducedCols]
              reduced.M <- cbind(reduced.predictors, label)
              keyval(1, as.numeric(reduced.M))
            }
          )))
    write.table(reduced.values, file = "/root/somnath/reduced.values.csv")
    w.file <- hdfs.file(hdfsWritePath, "w")
    hdfs.write(reduced.values, w.file)
    hdfs.close(r.file)
    hdfs.close(w.file)
    #to.dfs(reduced.values)
  }
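With the modified function defined, the invocation from the top of the question should work unchanged:
transfer.csvfile.hdfs.to.hdfs.reduced(
  "hdfs://10.5.5.82:8020/somnath/logreg_data/ds1.10.csv",
  "hdfs://10.5.5.82:8020/somnath/logreg_data/reduced/ds1.10.reduced.csv",
  5)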
Hope this helps.
