Leaving connections open indefinitely using file() in R - r

I have this function to write streaming data from Twitter into one file for 12 hours, then to another file for 12 hours. This is so we can clean, parse, and store the data twice a day.
conn <- file(description = "after12.json", open = "a")
conn2 <- file(description = "before12.json", open = "a")

write.tweets <- function(x) {
  # route tweets by hour of day: afternoon to one file, morning to the other
  if (nchar(x) > 0 && as.integer(format(Sys.time(), "%H")) >= 12) {
    writeLines(x, conn, sep = "")
  } else {
    writeLines(x, conn2, sep = "")
  }
}
This is within a much larger function to pull and write the data. My question is pretty simple. I want to leave both connections open indefinitely to be able to call the connection after 12 hours of inactivity. Is there a way I can do this?

Use open():

conn <- file(description = "after12.json")
open(conn, open = "a")
as per ?open:
open opens a connection. In general functions using connections will open them if they are not open, but then close them again, so to leave a connection open call open explicitly.
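
Applied to your two files, a minimal sketch (leaving it to your larger function to decide when the connections are finally closed):

conn <- file(description = "after12.json")
open(conn, open = "a")    # stays open until close(conn) is called
conn2 <- file(description = "before12.json")
open(conn2, open = "a")

# ... stream tweets through write.tweets() for as long as needed ...

close(conn)
close(conn2)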

Related

How to tryCatch the same function call multiple times (N times) in R

We have a basic tryCatch that writes a dataframe to Google Sheets, and tries again if the first write fails for any reason:
result = tryCatch({
  print('TRYING')
  googlesheets4::sheet_write(data = our_df, ss = our_spreadsheet, sheet = 'our_sheetname')
}, error = function(e) {
  print('ERROR, TRYING AGAIN')
  googlesheets4::sheet_write(data = our_df, ss = our_spreadsheet, sheet = 'our_sheetname')
})
Is it possible to generalize this code to retry the googlesheets4::sheet_write() function call for N number of tries? Is something built into base R for this, or is there a good R library that handles unlimited retries of a function?
You can put it in a for loop like this.
First, I am going to define a function that often fails (as I don't have access to your Google sheet).
russian_roulette <- function(n = 6) {
  revolver <- sample(1:n, 1)
  if (revolver == 1) {
    return("You lived")
  } else {
    stop("Better luck next time...")
  }
}
Then you can try it as many times as you consider reasonable. You can replace my call to russian_roulette() with your call to googlesheets4::sheet_write().
NUM_TRIES <- 10
for (i in 1:NUM_TRIES) {
  message(i)
  result <- try({
    russian_roulette()
  })
  # inherits() is safer than comparing class(result) directly
  if (!inherits(result, "try-error")) {
    print("Success!")
    break
  }
}
Output:
1
Error in russian_roulette() : Better luck next time...
2
Error in russian_roulette() : Better luck next time...
3
Error in russian_roulette() : Better luck next time...
4
Error in russian_roulette() : Better luck next time...
5
Error in russian_roulette() : Better luck next time...
6
[1] "Success!"
result
# [1] "You lived"
I don't know why you expect writing to a file to fail, but depending on the reason you may want to add a Sys.sleep() call for a certain number of seconds after every failure.
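For example, a minimal sketch that pauses between attempts (the 5-second delay is an arbitrary choice; tune it to your failure mode):

NUM_TRIES <- 10
for (i in 1:NUM_TRIES) {
  result <- try(googlesheets4::sheet_write(data = our_df, ss = our_spreadsheet,
                                           sheet = 'our_sheetname'))
  if (!inherits(result, "try-error")) break
  Sys.sleep(5)  # wait before the next attempt
}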

Test if socket is empty (was: Reading data from a raw socket)

At the start, I thought that the bad performance of my driver was caused by the way in which I read data from a socket.
This was the original function I used:
socket_char_reader = function(in_sock) {
  string_read <- raw(0)
  while ((rd <- readBin(in_sock, what = "raw", n = 1)) > 0) {
    string_read <- c(string_read, rd)
  }
  return(string_read %>% strip_CR_NUL() %>% rawToChar())
}
The results from 3 consecutive calls to this function give the expected result. It takes 0.004 seconds to read 29 bytes in total.
My second try reads the socket until it is empty. Another function splits the resulting raw array in 3 separate parts.
get_response = function() {
  in_sock <- self$get_socket()
  BUF_SIZE <- 8
  chars_read <- raw(0)
  while (length(rd <- readBin(in_sock, what = "raw", n = BUF_SIZE)) > 0) {
    chars_read <- c(chars_read, rd)
  }
  return(chars_read)
},
Reading from the socket now takes 2.049 seconds!
Can somebody explain to me what could be the cause for this difference (or what is the best method for reading the socket until it is empty)?
In the meantime I'll return to my original solution and continue looking for the cause of the bad performance.
Ben
I believe I found the cause (but not the solution).
While debugging, I noticed that the delay is caused by the last call to readBin().
In socket_char_reader(), the stop condition for the while loop is based on the value of rd; if that value equals 0, the loop stops.
In get_response(), the stop condition is based on the number of bytes in the buffer. Before that number can be determined, readBin() first waits to see whether any more bytes will be sent to the socket.
The timeOut-period is set in the socketConnection() call.
private$conn <- socketConnection(host = "localhost", port,
                                 open = "w+b", server = FALSE, blocking = TRUE,
                                 encoding = "UTF-8", timeout = 1)
Timeout has to be given a value > 0, otherwise it will take days before the loop stops.
Is it possible to check whether there are still any bytes in the socket without actually reading them?
Ben
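One thing worth trying (untested against your driver) is base R's socketSelect(), which reports whether a socket has data ready without consuming any bytes:

# returns immediately; TRUE means at least one byte can be read without blocking
has_data <- socketSelect(list(in_sock), write = FALSE, timeout = 0)
if (has_data) {
  rd <- readBin(in_sock, what = "raw", n = BUF_SIZE)
}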

How to improve formatting of slack messages using slackr?

I'm using slackr to send alert messages to a Slack channel. It works well, except that the message formatting is poor and I want to improve it.
devtools::install_github("hrbrmstr/slackr")
library(slackr)
slackr_setup(channel = "#alerts", username = "Mark Davis",
             incoming_webhook_url = "https://hooks.slack.com/services/T31P8UDAB/BCH4HKQSC/*********",
             api_token = "*********", echo = F)
alert="On Monday, 2018-09-03 # 2pm Pacific..."
slackr(alert)
Here is an example of how a message from slackr looks in Slack, versus how I'd like it to look (screenshots omitted).
slackr doesn't seem to have many options in the way of formatting. I was thinking of building an image and inserting that, but I'm having trouble building an image out of a text file using R.
Perhaps there is another api I could call that could take my text and format it for slack?
I'm open to any suggestions.
Addendum:
Slackr has an option to upload files, so my latest attempt is to create an image from the text message and upload that object.
I am able to create a png file from the text message using the magick library. I created an image with a colored background and simply added the message text to it:
library(magick)
alert_picture <- image_read('alert_480x150_dark_red.png')
alert_picture <- image_annotate(alert_picture, DreamCloud_Alert, size = 20, gravity = "southwest",
                                color = "white", location = "+10+10")
image_write(alert_picture, path = "alert_picture.png", format = "png")
The image looks pretty good (although there doesn't seem to be an easy way to bold or underline specific words in the message), but the obstacle now is that I can't get the upload command to work.
slackr_upload(filename = "alert_picture.png")
I don't get any error messages but nothing is uploaded to slack.
I got around this issue by using the httr package to post the image to Slack.
Thanks to Adil B. for providing the solution:
Post Image to Slack Using HTTR package in R
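For reference, a minimal sketch of that httr approach, assuming a token with permission to upload files (the token, channel, and title below are placeholders):

library(httr)
resp <- POST(
  url = "https://slack.com/api/files.upload",
  body = list(
    token = Sys.getenv("SLACK_API_TOKEN"),   # placeholder: token with file-upload rights
    channels = "#alerts",
    file = upload_file("alert_picture.png"),
    title = "Alert"                          # placeholder title
  )
)
warn_for_status(resp)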
I am not sure this is what you meant, but I solved it, allowing formatting as in a regular Slack message, by altering the slackr_bot() function and simply removing the two sets of three back-ticks near the end of the code, where it handles the text. Then call it slackr_bot1() or something similar, and you can post formatted messages. This is the function after the back-tick removal:
slackr_bot1 <- function(...,
                        channel = Sys.getenv("SLACK_CHANNEL"),
                        username = Sys.getenv("SLACK_USERNAME"),
                        icon_emoji = Sys.getenv("SLACK_ICON_EMOJI"),
                        incoming_webhook_url = Sys.getenv("SLACK_INCOMING_URL_PREFIX")) {

  if (incoming_webhook_url == "") {
    stop("No incoming webhook URL specified. Did you forget to call slackr_setup()?", call. = FALSE)
  }

  if (icon_emoji != "") { icon_emoji <- sprintf(', "icon_emoji": "%s"', icon_emoji) }

  resp_ret <- ""

  if (!missing(...)) {

    # mimics capture.output

    # get the arglist
    args <- substitute(list(...))[-1L]

    # setup in-memory sink
    rval <- NULL
    fil <- textConnection("rval", "w", local = TRUE)

    sink(fil)
    on.exit({
      sink()
      close(fil)
    })

    # where we'll need to eval expressions
    pf <- parent.frame()

    # how we'll eval expressions
    evalVis <- function(expr) withVisible(eval(expr, pf))

    # for each expression
    for (i in seq_along(args)) {

      expr <- args[[i]]

      # do something, note all the newlines...Slack ``` needs them
      tmp <- switch(mode(expr),
                    # if it's actually an expression, iterate over it
                    expression = {
                      cat(sprintf("> %s\n", deparse(expr)))
                      lapply(expr, evalVis)
                    },
                    # if it's a call or a name, eval, printing run output as if in console
                    call = ,
                    name = {
                      cat(sprintf("> %s\n", deparse(expr)))
                      list(evalVis(expr))
                    },
                    # if pretty much anything else (i.e. a bare value) just output it
                    integer = ,
                    double = ,
                    complex = ,
                    raw = ,
                    logical = ,
                    numeric = cat(sprintf("%s\n\n", as.character(expr))),
                    character = cat(sprintf("%s\n\n", expr)),
                    stop("mode of argument not handled at present by slackr"))

      for (item in tmp) if (item$visible) { print(item$value, quote = FALSE); cat("\n") }
    }

    on.exit()

    sink()
    close(fil)

    # combine all of them (rval is a character vector)
    output <- paste0(rval, collapse = "\n")

    loc <- Sys.getlocale('LC_CTYPE')
    Sys.setlocale('LC_CTYPE', 'C')
    on.exit(Sys.setlocale("LC_CTYPE", loc))

    resp <- POST(url = incoming_webhook_url, encode = "form",
                 add_headers(`Content-Type` = "application/x-www-form-urlencoded",
                             Accept = "*/*"),
                 body = URLencode(sprintf("payload={\"channel\": \"%s\", \"username\": \"%s\", \"text\": \"%s\"%s}",
                                          channel, username, output, icon_emoji)))

    warn_for_status(resp)
  }

  return(invisible())
}
slackr_bot1("*test* on time")

Unable to start Julia connection on port 1023: all connections are in use

I am trying to run Julia function via R using XRJulia package. Below is my code snippet.
## start
library(XRJulia)
prevInterface <- XR::getInterface()
if (is.null(prevInterface)) {
  ev <- RJulia(.makeNew = TRUE)
} else {
  ev <- RJulia(.makeNew = FALSE)
}
juliaAddToPath(directory = '/home/.julia/lib/v0.6/', package = NULL, evaluator = ev)
runjl <- juliaEval('function sum(a, b)
  c = a + b;
  return c
end
')
runjl_function <- JuliaFunction(runjl)
sum_result <- runjl_function(1, 5)
XR::rmInterface(XR::getInterface())
## end
This code works fine. But occasionally, when I run the above code multiple times, I get this
error: Unable to start Julia connection on port 1023: all connections
are in use.
How do I close all of Julia's connections, and what is the systematic way to do that? Please suggest.
You have the ServerQuit() function in RJuliaConnect.R:
https://github.com/johnmchambers/XRJulia/blob/master/R/RJuliaConnect.R
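A minimal sketch of how that might be used before tearing down the interface (assuming, per the linked source, that the evaluator object exposes ServerQuit() as a method):

ev <- XR::getInterface()
if (!is.null(ev)) {
  ev$ServerQuit()       # ask the Julia server process to quit and free its port
  XR::rmInterface(ev)   # then remove the interface on the R side
}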

make concurrent RCurl GET requests for set of URLs

I wrote a function to use RCurl to obtain the effective URL for a list of shortened URL redirects (bit.ly, t.co, etc.) and handle errors when the effective URL locates a document (PDFs tend to throw "Error in curlPerform... embedded nul in string.")
I would like to make this function more efficient if possible (while keeping it in R). As written, the run-time is prohibitively long for un-shortening a thousand or more URLs.
?getURI tells us that by default, getURI/getURL goes asynchronous when the length of the url vector is >1. But my performance seems totally linear, presumably because sapply turns the thing into one big for loop and the concurrency is lost.
Is there any way I can speed up these requests? Extra credit for fixing the "embedded nul" issue.
require(RCurl)
options(RCurlOptions = list(verbose = F, followlocation = T,
                            timeout = 500, autoreferer = T, nosignal = T,
                            useragent = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_2)"))

# find successful location (or error msg) after any redirects
getEffectiveUrl <- function(url) {
  c = getCurlHandle()
  h = basicHeaderGatherer()
  curlSetOpt(.opts = list(header = T, verbose = F), curl = c, .encoding = "CE_LATIN1")
  possibleError <- tryCatch(getURI(url, curl = c, followlocation = T,
                                   headerfunction = h$update, async = T),
                            error = function(e) e)
  if (inherits(possibleError, "error")) {
    effectiveUrl <- "ERROR_IN_PAGE" # fails on linked documents (PDFs etc.)
  } else {
    headers <- h$value()
    names(headers) <- tolower(names(headers)) # sometimes cases change on header names?
    statusPrefix <- substr(headers[["status"]], 1, 1) # 1st digit of http status
    if (statusPrefix == "2") { # status = success
      effectiveUrl <- getCurlInfo(c)[["effective.url"]]
    } else {
      effectiveUrl <- paste(headers[["status"]], headers[["statusmessage"]])
    }
  }
  effectiveUrl
}

testUrls <- c("http://t.co/eivRJJaV4j","http://t.co/eFfVESXE2j","http://t.co/dLI6Q0EMb0",
              "http://www.google.com","http://1.uni.vi/01mvL","http://t.co/05Mz00DHLD",
              "http://t.co/30aM6L4FhH","http://www.amazon.com","http://bit.ly/1fwWZLK",
              "http://t.co/cHglxQkz6Z") # 10th URL redirects to content w/ embedded nul

system.time(
  effectiveUrls <- sapply(X = testUrls, FUN = getEffectiveUrl, USE.NAMES = F)
) # takes 7-10 secs on my laptop

# does Vectorize help?
vGetEffectiveUrl <- Vectorize(getEffectiveUrl, vectorize.args = "url")
system.time(
  effectiveUrls2 <- vGetEffectiveUrl(testUrls)
) # nope, makes it worse
I had a bad experience with RCurl and async requests: R would completely freeze (no error message; CPU and RAM did not spike) with only 20 concurrent requests.
I recommend switching to the curl package and using its curl_fetch_multi() function. In my case it easily handled 50,000 JSON requests in one pool (with some division into subpools under the hood).
https://cran.r-project.org/web/packages/curl/vignettes/intro.html#async_requests
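A minimal sketch of that approach, adapted to the question's testUrls (the callback bodies and handle options are illustrative assumptions, not a drop-in replacement for getEffectiveUrl):

library(curl)

results <- character(0)
cb_done <- function(res) {
  # res$url is the final URL after any redirects
  results <<- c(results, res$url)
}
cb_fail <- function(msg) message("request failed: ", msg)

for (u in testUrls) {
  curl_fetch_multi(u, done = cb_done, fail = cb_fail,
                   handle = new_handle(followlocation = TRUE, nobody = TRUE))
}
multi_run()  # performs all queued requests concurrently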
