how to pause an R script until a file has been downloaded? - r

I have the following piece of code which runs a python selenium script that downloads a report
library(reticulate)
source_python("C:/Users/Gunathilakel/Desktop/Dashboard Project/Dashboard/PBK-Report-Automation-for-Dashboard-master/pbk automation.py")
I want to make sure that R waits until the earlier piece of code has downloaded a file into my downloads folder before the script executes this next piece of code
my.file.copy <- function(from, to) {
todir <- dirname(to)
if (!isTRUE(file.info(todir)$isdir)) dir.create(todir, recursive=TRUE)
file.copy(from = from, to = to,overwrite = TRUE)
}
my.file.copy(from = "C:/Users/Gunathilakel/Downloads/Issued and Referral Charge.csv",
to = "C:/Users/Gunathilakel/Desktop/Dashboard Project/Dashboard/RawData/Issued and Referral Charge2020.csv")
I found this question How to make execution pause, sleep, wait for X seconds in R?
But is it possible to wait for execution until a file has been downloaded?

actually found a solution from this question
Wait for file to exist
library(reticulate)
source_python("C:/Users/Gunathilakel/Desktop/Dashboard Project/Dashboard/PBK-Report-Automation-for-Dashboard-master/pbk automation.py")
while (!file.exists("C:/Users/Gunathilakel/Downloads/Issued and Referral Charge.csv")) {
Sys.sleep(1)
}
# Moves Downloaded CSV file of Issued_and_refused from Downloads --------
my.file.copy <- function(from, to) {
todir <- dirname(to)
if (!isTRUE(file.info(todir)$isdir)) dir.create(todir, recursive=TRUE)
file.copy(from = from, to = to,overwrite = TRUE)
}
my.file.copy(from = "C:/Users/Gunathilakel/Downloads/Issued and Referral Charge.csv",
to = "C:/Users/Gunathilakel/Desktop/Dashboard Project/Dashboard/RawData/Issued and Referral Charge2020.csv")

Related

My R function is consuming too much memory. Can you help me optimizing it?

I'm new to R and having trouble with optimizing a function.
My function is to:
create a directory specified in the function
download the zip file from the link inside the function and extract it to the directory
move extracted files to the main directory if files are extracted under a new subfolder
delete the subfolder
It works but consumes a lot of memory and takes 30mins to do such an easy job on a 2.7MB zip file.
Thank you in advance!
create_dir <- function(directory) {
path <- file.path(getwd(), directory)
if (!file.exists(path)) {
dir.create(path)
}
link <-
"https://d396qusza40orc.cloudfront.net/rprog%2Fdata%2Fspecdata.zip"
temp <- tempfile()
download.file(link, temp, mode = "wb")
unzip(temp, exdir = path)
unlink(temp)
existing_loc <- list.files(path, recursive = TRUE)
for (loc in existing_loc) {
if (length(grep("/", loc))) {
file.copy(file.path(path, loc), path)
file.remove(file.path(path, loc))
}
}
dirs <- list.dirs(path)
rm_dirs <- dirs[dirs != path]
if (length(rm_dirs)) {
for (dir in rm_dirs) {
unlink(rm_dirs, recursive = TRUE)
}
}
}
create_dir("testDirectory")
Thanks, I found the problem. It's because of setting a working directory on OneDrive that syncs for every extraction, moving, and deletion of 332 files processed by the function. AntiVirus also run along with OneDrive and caused my PC to freeze for 30 mins by using 70% of CPU.

Quit a Plumber API once a condition is met

I am trying to run a Plumber API inline to receive an input, and once the proper input is received and a specified condition is met, the input is returned to the globalenv and the API closes itself such that the script can continue to run.
I've specified a condition within a #get endpoint that calls quit(), stop() etc, none of which successfully shut down the API.
I've attempted to run the API in parallel using future such that the parent script can close the Plumber API.
It appears that there isn't actually a method in the Plumber API class object to close the Plumber API, and the API can't be closed from within itself.
I've been through the extended documentation, SO, and the Github Issues in search of a solution. The only semi-relevant solution suggested is to use R.Utils::withTimeout to create a time-bounded timeout. However, this method is also unable to close the API.
A simple use case:
Main Script:
library(plumber)
code_api <- plumber::plumb("code.R")
code_api$run(port = 8000)
code.R
#' #get /<code>
function(code) {
print(code)
if (nchar(code) == 3) {
assign("code",code,envir = globalenv())
quit()}
return(code)
}
#' #get /exit
function(exit){
stop()
}
The input is successfully returned to the global environment, but the API does not shut down afterward, nor after calling the /exit endpoint.
Any ideas on how to accomplish this?
You could look at Iterative testing with plumber #Irène Steve's, Dec 23 2018 with:
trml <- rstudioapi::terminalCreate()
rstudioapi::terminalKill(trml)
excerpt of her article (2nd version of 3):
.state <- new.env(parent = emptyenv()) #create .state when package is first loaded
start_plumber <- function(path, port) {
trml <- rstudioapi::terminalCreate(show = FALSE)
rstudioapi::terminalSend(trml, "R\n")
Sys.sleep(2)
cmd <- sprintf('plumber::plumb("%s")$run(port = %s)\n', path, port)
rstudioapi::terminalSend(trml, cmd)
.state[["trml"]] <- trml #store terminal name
invisible(trml)
}
kill_plumber <- function() {
rstudioapi::terminalKill(.state[["trml"]]) #access terminal name
}
Running a Plumber in the terminal might work in some cases but as I needed access to the R session (for insertText) I had to come up with the different approach. While not ideal the following solution worked:
# plumber.R
#* Insert
#* #param msg The msg to insert to the cursor location
#* #post /insert
function(msg="") {
rstudioapi::insertText(paste0(msg))
stop_plumber(Sys.getpid())
}
.state <- new.env(parent = emptyenv()) #create .state when package is first loaded
stop_plumber <- function(pid) {
trml <- rstudioapi::terminalCreate(show = FALSE)
Sys.sleep(2) # Wait for the terminal to initialize
# Wait a bit for the Plumber to flash the buffers and then send a SIGINT to the R session process,
# to terminate the Plumber
cmd <- sprintf("sleep 2 && kill -SIGINT %s\n", pid)
rstudioapi::terminalSend(trml, cmd)
.state[["trml"]] <- trml # store terminal name
invisible(trml)
Sys.sleep(2) # Wait for the Plumber to terminate and then kill the terminal
rstudioapi::terminalKill(.state[["trml"]]) # access terminal name
}

is reading .wma sound files in R possible?

Is the there a way to read .WMA sound files in R or the copyright restrictions do not allow this?
The final aim is to convert it to another format (MP3/WAV)
In one or another way R audio packages use ffmpeg converter.
Please see the following options:
After its download you can use R to convert WMA to MP3 file format
directly by system function call;
or you can use ffmpeg wrapper package for simple audio conversions. However it is Linux-oriented it could be easily transformed into Windows compatible one.
Please see the code below for Option 2:
# install.packages("devtools")
# library(devtools)
# install_github("pmur002/ffmpeg")
library(ffmpeg)
# set path to your ffmpeg.exe file
ffmpeg_path <- "C:\\<Path to ffmpeg>\\ffmpeg-20180906-70a7087-win64-static\\bin\\ffmpeg.exe"
ffmpeg_win <- function (inputs, outputs, filters = NULL, overwrite = FALSE,
wait = TRUE, echo = FALSE) {
if (!is.null(filters)) {
stop("Filters are currently unsupported")
}
if (inherits(inputs, "FFmpeg_input")) {
inputs <- list(inputs)
}
if (inherits(outputs, "FFmpeg_output")) {
outputs <- list(outputs)
}
options <- ""
if (overwrite) {
options <- paste0(options, "-y ")
}
cmd <- paste(ffmpeg_path, options, do.call(paste, inputs), do.call(paste,
outputs))
system(cmd, wait = wait)
if (echo) {
cat(cmd, "\n")
}
}
# just copy to your working directory required file, here is for example "mellow.wma"
ffmpeg_win(fileInput("mellow.wma"), fileOutput("mellow.mp3"), echo = TRUE)

Web scraping PDF files from a map

I've been trying to download pdfs embedded in a map following this code (original one can be found here). Each pdf refers to a brazilian municipality (5,570 files).
library(XML)
library(RCurl)
url <- "http://simec.mec.gov.br/sase/sase_mapas.php?uf=RJ&tipoinfo=1"
page <- getURL(url)
parsed <- htmlParse(page)
links <- xpathSApply(parsed, path="//a", xmlGetAttr, "href")
inds <- grep("*.pdf", links)
links <- links[inds]
regex_match <- regexpr("[^/]+$", links, perl=TRUE)
destination <- regmatches(links, regex_match)
for(i in seq_along(links)){
download.file(links[i], destfile=destination[i])
Sys.sleep(runif(1, 1, 5))
}
I already used this code in other projects a few times and it worked. For this specific case, it doesn't. In fact, I've tried many things to scrape these files but it seems impossible to me. Recently, I got the following link. Then it makes possible to combine uf (state) and muncod (municipal code) to download the file, but I dont know how to include this to the code though.
http://simec.mec.gov.br/sase/sase_mapas.php?uf=MT&muncod=5100102&acao=download
Thanks in advance!
devtools::install_github("ropensci/RSelenium")
library(rvest)
library(httr)
library(RSelenium)
# connect to selenium server from within r (REPLACE SERVER ADDRESS)
rem_dr <- remoteDriver(
remoteServerAddr = "192.168.50.25", port = 4445L, browserName = "firefox"
)
rem_dr$open()
# get the two-digit state codes for brazil by scraping the below webpage
tables <- "https://en.wikipedia.org/wiki/States_of_Brazil" %>%
read_html() %>%
html_table(fill = T)
states <- tables[[4]]$Abbreviation
# for each state, we are going to go navigate to the map of that state using
# selenium, then scrape the list of possible municipality codes from the drop
# down menu present in the map
get_munip_codes <- function(state) {
url <- paste0("http://simec.mec.gov.br/sase/sase_mapas.php?uf=", state)
rem_dr$navigate(url)
# have to wait until the drop down menu loads. 8 seconds will be enough time
# for each state
Sys.sleep(8)
src <- rem_dr$getPageSource()
out <- read_html(src[[1]]) %>%
html_nodes(xpath = "//select[#id='muncod']/option[boolean(#value)]") %>%
xml_attrs("value") %>%
unlist(use.names = F)
print(state)
out
}
state_munip <- sapply(
states, get_munip_codes, USE.NAMES = TRUE, simplify = FALSE
)
# now you can download each pdf. first create a directory for each state, where
# the pdfs for that state will go:
lapply(names(state_munip), function(x) dir.create(file.path("brazil-pdfs", x)))
# ...then loop over each state/municipality code and download the pdf
lapply(
names(state_munip), function(state) {
lapply(state_munip[[state]], function(munip) {
url <- sprintf(
"http://simec.mec.gov.br/sase/sase_mapas.php?uf=%s&muncod=%s&acao=download",
state, munip
)
file <- file.path("brazil-pdfs", state, paste0(munip, ".pdf"))
this_one <- paste0("state ", state, ", munip ", munip)
tryCatch({
GET(url, write_disk(file, overwrite = TRUE))
print(paste0(this_one, " downloaded"))
},
error = function(e) {
print(paste0("couldn't download ", this_one))
try(unlink(file, force = TRUE))
}
)
})
}
)
STEPS:
Get the IP address of your windows machine (see https://www.digitalcitizen.life/find-ip-address-windows)
start selenium server docker container by running this:
docker run -d -p 4445:4444 selenium/standalone-firefox:2.53.1
start rocker/tidyverse docker container by running this:
docker run -v `pwd`/brazil-pdfs:/home/rstudio/brazil-pdfs -dp 8787:8787 rocker/tidyverse
Go into your preferred browser and enter this address: http://localhost:8787 ...This will take you to the login screen for rstudio server. login using the username "rstudio" and password "rstudio"
Copy/paste the code shown above in a new Rstudio .R document. Replace the value for remoteServerAddr with the IP address you found in step 1.
Run the code...this should write the pdfs to a directory "brazil-pdfs" that is both inside the container and mapped to your windows machine (in other words, the pdfs will show up in the brazil-pdfs dir on your local machine as well). note, it takes a while to run the code b/c there are a lot of pdfs.

system open RStudio close connection

I'm attempting to use R to open a .Rproj file used in RStudio. I have succeeded with the code below (stolen from Ananda here). However, the connection to open RStudio called from R is not closed after the file is opened. How can I sever this "connection" after the .Rproj file is opened? (PS this has not been tested on Linux or Mac yet).
## Create dummy .Rproj
x <- c("Version: 1.0", "", "RestoreWorkspace: Default", "SaveWorkspace: Default",
"AlwaysSaveHistory: Default", "", "EnableCodeIndexing: Yes",
"UseSpacesForTab: No", "NumSpacesForTab: 4", "Encoding: UTF-8",
"", "RnwWeave: knitr", "LaTeX: pdfLaTeX")
loc <- file.path(getwd(), "Bar.rproj")
cat(paste(x, collapse = "\n"), file = loc)
## wheresRStudio function to find RStudio location
wheresRstudio <-
function() {
myPaths <- c("rstudio", "~/.cabal/bin/rstudio",
"~/Library/Haskell/bin/rstudio", "C:\\PROGRA~1\\RStudio\\bin\\rstudio.exe",
"C:\\RStudio\\bin\\rstudio.exe")
panloc <- Sys.which(myPaths)
temp <- panloc[panloc != ""]
if (identical(names(temp), character(0))) {
ans <- readline("RStudio not installed in one of the typical locations.\n
Do you know where RStudio is installed? (y/n) ")
if (ans == "y") {
temp <- readline("Enter the (unquoted) path to RStudio: ")
} else {
if (ans == "n") {
stop("RStudio not installed or not found.")
}
}
}
temp
}
## function to open .Rproj files
open_project <- function(Rproj.loc) {
action <- paste(wheresRstudio(), Rproj.loc)
message("Preparing to open project!")
system(action)
}
## Test it (it works but does no close)
open_project(loc)
It's not clear what you're trying to do exactly. What you've described doesn't really sound to me like a "connection" -- it's a system call.
I think what you're getting at is that after you run open_project(loc) in your above example, you don't get your R prompt back until you close the instance of RStudio that was opened by your function. If that is the case, you should add wait = FALSE to your system call.
You might also need to add an ignore.stderr = TRUE in there to get directly back to the prompt. I got some error about "QSslSocket: cannot resolve SSLv2_server_method" on my Ubuntu system, and after I hit "enter" it took me back to the prompt. ignore.stderr can bypass that (but might also mean that the user doesn't get meaningful errors in the case of serious errors).
In other words, I would change your open_project() function to the following and see if it does what you expect:
open_project <- function(Rproj.loc) {
action <- paste(wheresRstudio(), Rproj.loc)
message("Preparing to open project!")
system(action, wait = FALSE, ignore.stderr = TRUE)
}

Resources