Get all R code which is run when running - r

Suppose I have a bunch of R code in a script and I want to log all the R code which is run from the .GlobalEnv to a flat file or a database together with the errors and warning messages.
I could write a simple logme function as follows, or make it a bit more complex to also catch errors by setting options(error = myloggingfunction):
mylogfile <- tempfile()
logme <- function(x){
mode <- "at"
if(!file.exists(mylogfile)){
mode <- "wt"
}
myconn <- file(mylogfile, mode)
writeLines(x, myconn)
close(myconn)
invisible()
}
logme(sprintf("%s: started some yadayada, ", Sys.time()))
x <- 10
x * 7
logme(sprintf("%s: done with yadayada", Sys.time()))
## Get the log
cat(readLines(mylogfile))
The log prints out:
2015-05-14 17:24:31: started some yadayada, 2015-05-14 17:24:31: done with yadayada
But what I would like to have is that the logfile writes down the expressions which were executed without me having to write a wrapper around each statement.
I would like the log to look like this:
2015-05-14 17:24:31: started some yadayada, x <- 10, x * 7 2015-05-14 17:24:31: done with yadayada
So my question is: how do I capture what R is executing so that I can store the executed expressions in a log or database, without having to wrap each expression in a function call (as in myhandler(x <- 10); myhandler(x * 10))?
Any help on this?

To catch input commands you could use addTaskCallback:
mylogfile <- tempfile()
addTaskCallback(
function(...) {
expr <- deparse(as.expression(...)[[1]]) # this could be handled better...
cat(expr, file=mylogfile, append=TRUE, sep="\n")
# or cat(sprintf("[%s] %s", Sys.time(), expr),...) if you want timestamps
TRUE
}
,name="logger"
)
x <- 10
x * 7
removeTaskCallback("logger")
The result is:
cat(readLines(mylogfile), sep="\n")
... addTaskCallback definition ...
x <- 10
x * 7
But what you get is the parsed expression, which means that the line
x+1;b<-7;b==2
will be logged as
x + 1
b <- 7
b == 2
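The splitting described above can be reproduced without a callback. This is only a small sketch: it parses the same line by hand and deparses each top-level expression, which approximates what the callback receives. The names parsed and logged_as are made up for this illustration.

```r
# Parse one line that holds three statements separated by semicolons
parsed <- parse(text = "x+1;b<-7;b==2")

# Each top-level expression is deparsed separately, so the original
# spacing and the semicolons are lost
logged_as <- vapply(
  as.list(parsed),
  function(e) paste(deparse(e), collapse = " "),
  character(1)
)
print(logged_as)
```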
In addition:
output is not logged, in particular messages or warnings shown in the console
in case of an error the callback is not triggered, so you need a separate function to handle errors
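For the two limitations just listed, one possible complement is a small wrapper that records warnings and errors. This is a sketch rather than part of the answer above: log_conditions is a hypothetical helper, and it has to be wrapped around the code explicitly, so it does not remove the need for wrapping that the question wants to avoid.

```r
# Hypothetical helper: evaluate `expr` and append any warning or error
# message it raises to `logfile`
log_conditions <- function(expr, logfile) {
  log_line <- function(kind, cond) {
    cat(sprintf("[%s] %s\n", kind, conditionMessage(cond)),
        file = logfile, append = TRUE)
  }
  tryCatch(
    withCallingHandlers(
      expr,
      warning = function(w) {
        log_line("warning", w)
        invokeRestart("muffleWarning")  # continue evaluating after logging
      }
    ),
    error = function(e) {
      log_line("error", e)
      invisible(NULL)                   # the error is logged, not re-thrown
    }
  )
}

cond_log <- tempfile()
res_ok  <- log_conditions({ warning("careful"); 1 + 1 }, cond_log)
res_err <- log_conditions(stop("boom"), cond_log)
logged  <- readLines(cond_log)
```

Warnings are logged and evaluation continues, while an error stops the wrapped expression and NULL is returned.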

This is probably too simple to work in every case, but you can try this:
Define myhandler as:
myhandler <- function(x, file = stdout()) {
expr <- substitute(x)
for(e_line in as.list(expr)) {
cat( file = file, as.character(Sys.time()), capture.output(e_line), "\n")
eval(e_line, envir = parent.frame())
}
}
Use it with your code inside the brackets:
myhandler({
a <- 1
a <- a + 1
print(a)
})
Result:
# 2015-05-14 18:46:34 `{`
# 2015-05-14 18:46:34 a <- 1
# 2015-05-14 18:46:34 a <- a + 1
# 2015-05-14 18:46:34 print(a)
# [1] 2
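The `{` line in the result appears because as.list() on a braced expression returns the brace symbol as its first element. A possible variant that skips it is sketched below; myhandler2 is a hypothetical name, and the sketch assumes the code is passed either inside braces or as a single expression.

```r
# Variant of the handler above that does not log the opening brace
myhandler2 <- function(x, file = stdout()) {
  expr <- substitute(x)
  env <- parent.frame()
  # For a braced block, drop the leading `{` symbol; otherwise treat
  # the argument as one single step
  steps <- if (is.call(expr) && identical(expr[[1]], as.name("{"))) {
    as.list(expr)[-1]
  } else {
    list(expr)
  }
  for (e_line in steps) {
    cat(format(Sys.time()), paste(deparse(e_line), collapse = " "), "\n",
        file = file)
    eval(e_line, envir = env)
  }
  invisible(NULL)
}

handler_out <- capture.output(
  myhandler2({
    a <- 1
    a <- a + 1
  })
)
```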

I confess that I didn't really get what "to have the running expressions in the same process available as where the R commands are run" meant when we chatted a bit in the comments. However, I expanded on what I had in mind. You can create a logGenerator.R file with the following lines:
logGenerator<-function(sourcefile,log) {
..zz <- file(log, open = "at")
sink(..zz)
sink(..zz, type = "message")
on.exit({
sink(type="message")
sink()
close(..zz)
})
..x<-parse(sourcefile)
for (..i in seq_along(..x)) { # seq_along() also handles an empty file
cat(as.character(Sys.time()),"\n")
cat(as.character(..x[..i]),"\n")
..y<-eval(..x[..i])
}
}
This function takes the source file name and the log file name as arguments. It parses the R file and, for each top-level expression, writes the time at which it is executed followed by the expression itself to the log file. Everything sent to stdout(), as well as messages, warnings and errors, is redirected to the same log file. You don't have to modify your source file in any way.
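The same parse-and-evaluate idea can be sketched without sink(), writing only the timestamp and the code of each top-level expression to the log. This is a simplified illustration, not a replacement for the function above: log_script and the demo file names are hypothetical, and output and error messages are not captured here.

```r
# Log every top-level expression of a script before evaluating it
log_script <- function(sourcefile, logfile, envir = new.env()) {
  exprs <- parse(sourcefile)
  for (e in as.list(exprs)) {
    code <- paste(deparse(e), collapse = " ")
    cat(sprintf("%s: %s\n", format(Sys.time()), code),
        file = logfile, append = TRUE)
    eval(e, envir = envir)
  }
  invisible(envir)  # the environment holds the objects the script created
}

# Small demonstration with a temporary two-line script
demo_script <- tempfile(fileext = ".R")
writeLines(c("x <- 10", "x * 7"), demo_script)
demo_log <- tempfile()
demo_env <- log_script(demo_script, demo_log)
log_lines <- readLines(demo_log)
```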

Related

how to properly close connection so I won't get "Error in file(con, "r") : all connections are in use" when using "readlines" and "tryCatch"

I have a list of URLs (more than 4000) from a specific domain (pixilink.com) and what I want to do is to figure out whether each URL points to a picture or a video. To do this, I used the solutions provided here: How to write trycatch in R and Check whether a website provides photo or video based on a pattern in its URL and wrote the code shown below:
#Function to get the value of initial_mode from the URL
urlmode <- function(x){
mycontent <- readLines(x)
mypos <- grep("initial_mode = ", mycontent)
if(grepl("0", mycontent[mypos])){
return("picture")
} else if(grepl("tour", mycontent[mypos])){
return("video")
} else{
return(NA)
}
}
Also, to avoid errors for URLs that don't exist, I used the code below:
readUrl <- function(url) {
out <- tryCatch(
{
readLines(con=url, warn=FALSE)
return(1)
},
error=function(cond) {
return(NA)
},
warning=function(cond) {
return(NA)
},
finally={
message( url)
}
)
return(out)
}
Finally, I took a subset of the list of URLs and passed it to the functions described above (here, for instance, I used 1000 values from the URL list):
a <- subset(new_df, new_df$host=="www.pixilink.com")
vec <- a[['V']]
vec <- vec[1:1000] # only chose first 1000 rows
tt <- numeric(length(vec)) # checking validity of url
for (i in 1:length(vec)){
tt[i] <- readUrl(vec[i])
print(i)
}
g <- data.frame(vec,tt)
g2 <- g[which(!is.na(g$tt)),] #only valid url
dd <- numeric(nrow(g2))
for (j in 1:nrow(g2)){
dd[j] <- urlmode(g2[j,1])
}
Final <- cbind(g2,dd)
Final <- left_join(g, Final, by = c("vec" = "vec"))
I ran this code on a sample list of 100 URLs and it worked; however, after I ran it on the whole list of URLs, it returned an error. Here is the error: Error in textConnection("rval", "w", local = TRUE) : all connections are in use
After this, even when I ran the code on the 100 sample URLs that I had tested before, I got this error message: Error in file(con, "r") : all connections are in use
I also tried calling closeAllConnections() after each function call in the loop, but it didn't work.
Can anyone explain what this error is about? Is it related to the number of requests we can make to the website? What's the solution?
So, my guess as to why this is happening is that you're not closing the connections that you open with readLines(), both inside tryCatch() and inside urlmode(). I was unsure of how urlmode() was going to be used in your previous post, so I had made it as simple as I could (and in hindsight, that was badly done, my apologies). So I took the liberty of rewriting urlmode() to try and make it a little bit more robust for what appears to be a more expansive task at hand.
I think the comments in the code should help, so take a look below:
#Updated URL mode function with better
#URL checking, connection handling,
#and "mode" investigation
urlmode <- function(x){
#Check if URL is good to go
if(!httr::http_error(x)){
#Test cases
#x <- "www.pixilink.com/3"
#x <- "https://www.pixilink.com/93320"
#x <- "https://www.pixilink.com/93313"
#Then since there are redirect shenanigans
#Get the actual URL the input points to
#It should just be the input URL if there is
#no redirection
#This is important as this also takes care of
#checking whether http or https need to be prefixed
#in case the input URL is supplied without those
#(this can cause problems for url() below)
myx <- httr::HEAD(x)$url
#Then check for what the default mode is
mycon <- url(myx)
open(mycon, "r")
mycontent <- readLines(mycon)
mypos <- grep("initial_mode = ", mycontent)
#Close the connection since it's no longer
#necessary
close(mycon)
#Some URLs with weird formats can return
#empty on this one since they don't
#follow the expected format.
#See for example: "https://www.pixilink.com/clients/899/#3"
#which is actually
#redirected from "https://www.pixilink.com/3"
#After that, evaluate what's at mypos, and always
#return the actual URL
#along with the result
if(!purrr::is_empty(mypos)){
#mystr<- stringr::str_extract(mycontent[mypos], "(?<=initial_mode\\s\\=).*")
mystr <- stringr::str_extract(mycontent[mypos], "(?<=\').*(?=\')")
return(c(myx, mystr))
#return(mystr)
#So once all that is done, check if the line at mypos
#contains a 0 (picture), tour (video)
#if(grepl("0", mycontent[mypos])){
# return(c(myx, "picture"))
#return("picture")
#} else if(grepl("tour", mycontent[mypos])){
# return(c(myx, "video"))
#return("video")
#}
} else{
#Valid URL but not interpretable
return(c(myx, "uninterpretable"))
#return("uninterpretable")
}
} else{
#Straight up invalid URL
#No myx variable to return here
#Just x
return(c(x, "invalid"))
#return("invalid")
}
}
#--------
#Sample code execution
library(purrr)
library(parallel)
library(future.apply)
library(httr)
library(stringr)
library(progressr)
library(progress)
#All future + progressr related stuff
#learned courtesy
#https://stackoverflow.com/a/62946400/9494044
#Setting up parallelized execution
no_cores <- parallel::detectCores()
#The above setup will ensure ALL cores
#are put to use
clust <- parallel::makeCluster(no_cores)
future::plan(cluster, workers = clust)
#Progress bar for sanity checking
progressr::handlers(progressr::handler_progress(format="[:bar] :percent :eta :message"))
#Website's base URL
baseurl <- "https://www.pixilink.com"
#Using future_lapply() to recursively apply urlmode()
#to a sequence of the URLs on pixilink in parallel
#and storing the results in sitetype
#Using a future chunk size of 10
#Everything is wrapped in with_progress() to enable the
#progress bar
#
range <- 93310:93350
#range <- 1:10000
progressr::with_progress({
myprog <- progressr::progressor(along = range)
sitetype <- do.call(rbind, future_lapply(range, function(b, x){
myprog() ##Progress bar signaller
myurl <- paste0(b, "/", x)
cat("\n", myurl, " ")
myret <- urlmode(myurl)
cat(myret, "\n")
return(c(myurl, myret))
}, b = baseurl, future.chunk.size = 10))
})
#Converting into a proper data.frame
#and assigning column names
sitetype <- data.frame(sitetype)
names(sitetype) <- c("given_url", "actual_url", "mode")
#A bit of wrangling to tidy up the mode column
sitetype$mode <- stringr::str_replace(sitetype$mode, "0", "picture")
head(sitetype)
# given_url actual_url mode
# 1 https://www.pixilink.com/93310 https://www.pixilink.com/93310 invalid
# 2 https://www.pixilink.com/93311 https://www.pixilink.com/93311 invalid
# 3 https://www.pixilink.com/93312 https://www.pixilink.com/93312 floorplan2d
# 4 https://www.pixilink.com/93313 https://www.pixilink.com/93313 picture
# 5 https://www.pixilink.com/93314 https://www.pixilink.com/93314 floorplan2d
# 6 https://www.pixilink.com/93315 https://www.pixilink.com/93315 tour
unique(sitetype$mode)
# [1] "invalid" "floorplan2d" "picture" "tour"
#--------
Basically, urlmode() now opens and closes connections only when necessary, checks for URL validity and URL redirection, and also "intelligently" extracts the value assigned to initial_mode. With the help of future_lapply() and the progress bar from the progressr package, this can now be applied quite conveniently in parallel to as many pixilink.com/<integer> URLs as desired. With a bit of wrangling thereafter, the results can be presented very tidily as a data.frame as shown.
As an example, I've demonstrated this for a small range in the code above. Note the commented out 1:10000 range in the code in this context: I let this code run for the last couple of hours over this (hopefully sufficiently) large range of URLs to check for errors and problems. I can attest that I encountered no errors (only the regular warnings In readLines(mycon) : incomplete final line found on 'https://www.pixilink.com/93334'). For proof, I have the data from all 10000 URLs written to a CSV file that I can provide upon request (I don't fancy uploading that to pastebin or elsewhere unnecessarily). Due to an oversight on my part, I forgot to benchmark that run, but I suppose I could do that later if performance metrics are desired or would be considered interesting.
For your purposes, I believe you can simply take the entire code snippet above and run it verbatim (or with modifications) by just changing the range assignment right before the with_progress(do.call(...)) step to a range of your liking. I believe this approach is simpler and does away with having to deal with multiple functions and such (and no tryCatch() messes to deal with).
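The connection handling itself can be illustrated offline. The sketch below uses a local temporary file instead of a URL so that it runs without network access; read_lines_safely is a hypothetical helper. The point is that on.exit() closes the connection even if readLines() fails, so repeated calls do not exhaust R's limited pool of connections.

```r
# Open a connection explicitly and guarantee that it is closed again
read_lines_safely <- function(path) {
  con <- file(path, open = "r")
  on.exit(close(con), add = TRUE)  # runs on normal exit and on error
  readLines(con, warn = FALSE)
}

demo_file <- tempfile(fileext = ".txt")
writeLines(c("initial_mode = 'tour'", "other line"), demo_file)

open_before <- nrow(showConnections())
# More iterations than R allows simultaneously open connections
for (i in 1:200) demo_lines <- read_lines_safely(demo_file)
open_after <- nrow(showConnections())
```

The same pattern should carry over to url() connections, with the caveat that network errors then have to be handled as well.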

How to get all objects in a script

I am trying to determine all the objects in a script (specifically, to get all the data frames, but I'll settle for all the assigned objects, i.e. vectors, lists, etc.).
Is there a way of doing this? Should I make the script run in its own session and then somehow get the objects from that session, rather than rely on the global environment?
Use the second argument to source() when you execute the script. For example, here's a script:
x <- y + 1
z <- 2
which I can put in script.R. Then I will execute it in its own environment using the following code:
x <- 1 # This value will *not* change
y <- 2 # This value will be visible to the script
env <- new.env()
source("script.R", local = env)
Now I can print the values and see that the comments are correct:
x # the original one
# [1] 1
ls(env) # what was created?
# [1] "x" "z"
env$x # this is the one from the script
# [1] 3
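Since the question asks specifically for data frames, the environment can be filtered by class after sourcing. This is a minimal sketch using a temporary script; the object names df, v and lst are made up for the example.

```r
# Write a small throwaway script that creates three objects
demo_script <- tempfile(fileext = ".R")
writeLines(
  c("df <- data.frame(a = 1:3)", "v <- 1:10", "lst <- list(1)"),
  demo_script
)

# Run it in its own environment, then keep only the data frames
env <- new.env()
source(demo_script, local = env)
objs <- mget(ls(env), envir = env)
data_frames <- names(objs)[vapply(objs, is.data.frame, logical(1))]
print(data_frames)
```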
I had a similar question and found an answer. I am copying the answer from my other post here.
I wrote the following function, get.objects(), that returns all the objects created in a script:
get.objects <- function(path2file = NULL, exception = NULL, source = FALSE, message = TRUE) {
library("utils")
library("tools")
# Step 0-1: Possibility to leave path2file = NULL if using RStudio.
# We are using rstudioapi to get the path to the current file
if(is.null(path2file)) path2file <- rstudioapi::getSourceEditorContext()$path
# Check that file exists
if (!file.exists(path2file)) {
stop("couldn't find file ", path2file)
}
# Step 0-2: If .Rmd file, need to extract the code in R chunks first
# Use code in https://felixfan.github.io/extract-r-code/
if(file_ext(path2file)=="Rmd") {
require("knitr")
tmp <- purl(path2file)
path2file <- paste(getwd(),tmp,sep="/")
source = TRUE # Must be changed to TRUE here
}
# Step 0-3: Start by running the script if you are calling an external script.
if(source) source(path2file)
# Step 1: screen the script
summ_script <- getParseData(parse(path2file, keep.source = TRUE))
# Step 2: extract the objects
list_objects <- summ_script$text[which(summ_script$token == "SYMBOL")]
# List unique
list_objects <- unique(list_objects)
# Step 3: find where the objects are.
src <- paste(as.vector(sapply(list_objects, find)))
src <- tapply(list_objects, factor(src), c)
# List of the objects in the Global Environment
# They can be in both the Global Environment and some packages.
src_names <- names(src)
list_objects = NULL
for (i in grep("GlobalEnv", src_names)) {
list_objects <- c(list_objects, src[[i]])
}
# Step 3bis: if any exception, remove from the list
if(!is.null(exception)) {
list_objects <- list_objects[!list_objects %in% exception]
}
# Step 4: done!
# If message, print message:
if(message) {
cat(paste0(" ",length(list_objects)," objects were created in the script \n ", path2file,"\n"))
}
return(list_objects)
}
To run it, you need a saved script. Here is an example of a script:
# This must be saved as a script, e.g, "test.R".
# Create a bunch of objects
library(ggplot2) # needed for ggplot(), aes() and geom_point()
temp <- LETTERS[1:3]
data <- data.frame(x = 1:10, y = 10:1)
p1 <- ggplot(data, aes(x, y)) + geom_point()
# List the objects. To list all the objects except some, use the argument exception. Here, "p1" is listed as an exception.
get.objects()
get.objects(exception = "p1", message = FALSE)
Note that the function also works for external scripts and R Markdown files.
For an external script, the script has to be run first. To do so, set the argument source to TRUE.

How to get content or length of current console output?

I'm looking for the functions get_output_content or at least get_output_length below, that would tell me how many characters were printed in the console.
test <- function(){
cat("ab")
cat("\b")
cat("cd")
c <- get_output_content() # "acd" (I'd be happy with "ab\bcd" as well)
l <- get_output_length() # 3
return(list(c,l))
}
test()
In this example I could obviously count the characters in the input, but if I'm using other functions I may not be able to. Can you help me build one or both of these functions?
EDIT to clarify:
In my real situation, I cannot work upstream and count beforehand, as in the proposed solutions. I need to count the displayed output at a given time without monitoring what was printed before.
Here's a reproducible example that looks more like what I want to achieve:
library(pbapply)
my_files <- paste0(1000:1,".pdf")
work_on_pdf <- function(pdf_file){
Sys.sleep(0.001)
}
report <- pbsapply(my_files,work_on_pdf) # the simple version, but I want to add the pdf name next to the bar to have more info about my progress
# so I tried this but it's not satisfying because it "eats" some of the current output of pbapply
report <- pbsapply(my_files,function(x){
buffer_length <- 25
work_on_pdf(x)
catmsg <- paste0(c( # my additional message, which is in 3 parts:
rep("\b",buffer_length), # (1) eat 25 characters
x, # (2) print filename
rep(" ",buffer_length-nchar(x))), # (3) print spaces to cover up what may have been printed before
collapse="")
cat(catmsg)
})
If I were able to count what's in the console, I could easily tweak my function to get something satisfying.
NEW EDIT: FYI, a solution to the example but not to the general question:
I could solve my precise issue with this, though it doesn't solve the general question, which is measuring the current output of the console when you don't have any other information.
library(pbapply)
my_files <- paste0(1000:1,".pdf")
work_on_pdf <- function(pdf_file){
Sys.sleep(0.01)
}
pbsapply2 <- function(X,FUN,FUN2){
# FUN2 will give the additional message
pbsapply(X,function(x){
msg <- FUN2(x)
cat(msg)
output <- FUN(x)
eraser <- paste0(c(
rep("\b",nchar(msg)), # go back to position before additional message
rep(" ",nchar(msg)), # cover with blank spaces
rep("\b",nchar(msg))), # go back again to initial position
collapse="")
cat(eraser)
return(output)
})
}
report <- pbsapply2(my_files,work_on_pdf,function(x) paste("filename:",x))
Something like this (?):
test <- function(){
c <- paste0(capture.output(cat("ab")),
capture.output(cat("\b")),
capture.output(cat("cd")))
n <- nchar(c)
l <- length(c)
return(list(c,n,l))
}
test()
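Note that capture.output() keeps the backspace as a character, so nchar() on the captured text counts it. A possible way to get the displayed text is to apply the backspaces afterwards. The sketch below assumes that a backspace simply removes the previous character, and apply_backspaces is a hypothetical helper. It still only works on output you captured yourself; it does not read what is already on the console.

```r
# Resolve backspace characters in a captured string
apply_backspaces <- function(x) {
  out <- character(0)
  for (ch in strsplit(x, "", fixed = TRUE)[[1]]) {
    if (ch == "\b") {
      # A backspace drops the last kept character, if there is one
      if (length(out) > 0) out <- out[-length(out)]
    } else {
      out <- c(out, ch)
    }
  }
  paste(out, collapse = "")
}

raw_output <- paste(
  capture.output(cat("ab"), cat("\b"), cat("cd")),
  collapse = ""
)
shown <- apply_backspaces(raw_output)
shown_length <- nchar(shown)
```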

Test interaction with users in R package

I am developing an R package and one of its functions interacts with users through standard input via readline. I now wonder how to test the behavior of this function, preferably with the testthat library.
It seems test_that function assumes the answer is "" for user-input. I wish I could test the behavior conditional of various answers users may type in.
Below is a small example code. In the actual development, the marryme function is defined in a separate file and exported to the namespace.
devtools::test() gets me an error on the last line because the answer never becomes yes. I would like to test if the function correctly returns true when user types "y".
library(testthat)
test_that("input", {
marryme <- function() {
ans <- readline("will you marry me? (y/n) > ")
return(ans == "y")
}
expect_false(marryme()) # this is good
expect_true(marryme()) # this is no good
})
Use readLines() with a custom connection
By using readLines() instead of readline(), you can define the connection, which allows you to customize it using global options.
There are two steps that you need to do:
Set a default option in your package in zzz.R that points to stdin:
.onAttach <- function(libname, pkgname){
options(mypkg.connection = stdin())
}
In your function, change readline to readLines(n = 1) and set the connection in readLines() to getOption("mypkg.connection")
Example
Based on your MWE:
library(testthat)
options(mypkg.connection = stdin())
marryme <- function() {
cat("will you marry me? (y/n) > ")
ans <- readLines(con = getOption("mypkg.connection"), n = 1)
cat("\n")
return(ans == "y")
}
test_that("input", {
f <- file()
options(mypkg.connection = f)
ans <- paste(c("n", "y"), collapse = "\n") # set this to the number of tests you want to run
write(ans, f)
expect_false(marryme()) # this is good
expect_true(marryme()) # this now passes
# reset connection
options(mypkg.connection = stdin())
# close the file
close(f)
})
#> will you marry me? (y/n) >
#> will you marry me? (y/n) >
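The set-up and clean-up steps can be bundled into a small helper so that each test only states the answers it wants to simulate. This is a sketch under the same assumptions as the example above (the option name mypkg.connection); with_answers is a hypothetical helper, and the prompt is left out of marryme() to keep the output quiet.

```r
# Feed simulated user input to code that reads from the connection
# stored in the option "mypkg.connection"
with_answers <- function(answers, code) {
  con <- file()                         # anonymous file connection
  old <- options(mypkg.connection = con)
  on.exit({
    options(old)                        # restore the previous setting
    close(con)
  })
  writeLines(answers, con)
  code                                  # evaluated while the option is set
}

marryme <- function() {
  ans <- readLines(con = getOption("mypkg.connection"), n = 1)
  ans == "y"
}

replies <- with_answers(c("n", "y"), c(marryme(), marryme()))
```

Inside test_that(), the last line could become expect_equal(replies, c(FALSE, TRUE)).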

R Language - Adding timestamp to console output

I'm using sink() for logging purposes for running R-Scripts, which works fine.
R> sink(file = paste(Log_Path, FileName), append = TRUE, type = c("output"), split = TRUE)
I'm now doing performance tests and need to find out how long certain parts of the R script take to run, without adding tons of print statements.
This solution works in the RGui interface:
R> updatePrompt <- function(...) {options(prompt=paste(Sys.time(),"> ")); return(TRUE)}
R> addTaskCallback(updatePrompt)
However, the timestamped prompts don't propagate into the console stream captured by sink() when running on the R server.
Suggestions?
I've explored txtStart, but I'm not sure if that's what I need.
Is there a different package, or an option to set, that puts the timestamp in the prompt in the sink() console output?
Thanks for any help...
The prompt is not part of stdout, which is why it doesn't make it to the sink. Why don't you just print from your callback? For example:
make_timing_fun <- function() {
time.start <- proc.time()
function(...) {
new.time <- proc.time()
print(new.time - time.start)
time.start <<- new.time
TRUE
}
}
addTaskCallback(make_timing_fun()) # note parens used to generate actual function
Note that this measures the time between statements completing, so any time you spend idle at the console is included as well.
I did try that originally, and I tried it again, but received the same results:
Snippet of saved console output from log file:
> startdate <- as.vector(input_data2)
> input_data3 <- stop_date
> stopdate <- as.vector(input_data3)
.
Was hoping for this:
2014-01-03 09:07:57 > startdate <- as.vector(input_data2)
2014-01-03 09:07:57 > input_data3 <- stop_date
2014-01-03 09:07:57 > stopdate <- as.vector(input_data3)
.
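One possible way to get lines of the hoped-for form is to let the callback itself write the timestamp and the expression to stdout, which sink() does capture. This is only a sketch: timestamp_callback is a hypothetical name, the timestamp is the time at which the statement completed rather than when it started, and whether task callbacks fire at all depends on how the script is run. The registration is left as a comment, and the callback is called by hand for demonstration.

```r
# Task callbacks receive the expression, its value and two flags
timestamp_callback <- function(expr, value, ok, visible) {
  stamp <- format(Sys.time(), "%Y-%m-%d %H:%M:%S")
  code <- paste(deparse(expr), collapse = " ")
  cat(sprintf("%s > %s\n", stamp, code))  # goes to stdout, hence to sink()
  TRUE                                    # TRUE keeps the callback registered
}

# In an interactive session it would be registered like this:
# addTaskCallback(timestamp_callback, name = "timestamper")
# ... and removed again with removeTaskCallback("timestamper")

# Manual call to show the line that would be written
demo_line <- capture.output(
  keep <- timestamp_callback(quote(x <- 1), NULL, TRUE, FALSE)
)
```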
