Adding logging to a complex function?

Assume that I have a (at least subjectively) complex function like this:
library(rgithub)
library(magrittr)  # provides the %>% pipe used below
library(plyr)      # provides try_default()

# `owner` and `repo` are assumed to be defined in the calling environment
pull <- function(i) {
  commits <- get.pull.request.commits(owner = owner, repo = repo, id = i,
                                      ctx = get.github.context(), per_page = 100)
  links <- digest_header_links(commits)
  number_of_pages <- links[2, ]$page
  if (number_of_pages != 0)
    try_default(for (n in 1:number_of_pages) {
      if (as.integer(commits$headers$`x-ratelimit-remaining`) < 5)
        # sleep until the rate limit resets
        Sys.sleep(as.integer(commits$headers$`x-ratelimit-reset`) -
                    (as.POSIXct(Sys.time()) %>% as.integer()))
      else
        get.pull.request.commits(owner = owner, repo = repo, id = i,
                                 ctx = get.github.context(), per_page = 100, page = n)
    }, default = NULL)
  else
    return(commits)
}

ids <- c(500, 501, 502)  # pull request ids
pull_lists <- lapply(ids, pull)
Let's say I want to gain a deeper understanding of what actually happens inside this function. How can I add some kind of logging that will help me trace what goes on inside the function as it runs?

You can use futile.logger.
You can then set the logging threshold with:
flog.threshold(INFO)
Functions such as flog.debug() or flog.info() then produce the logging output.
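For example, here is a sketch of how the pull() function from the question could be instrumented; the log messages themselves are illustrative, not part of futile.logger:
library(futile.logger)

flog.threshold(DEBUG)  # also show DEBUG messages; the default threshold is INFO

pull <- function(i) {
  flog.info("Fetching commits for pull request #%s", i)
  commits <- get.pull.request.commits(owner = owner, repo = repo, id = i,
                                      ctx = get.github.context(), per_page = 100)
  links <- digest_header_links(commits)
  number_of_pages <- links[2, ]$page
  flog.debug("Pagination header reports %s page(s)", number_of_pages)
  flog.debug("Rate limit remaining: %s", commits$headers$`x-ratelimit-remaining`)
  # ... rest of the function as in the question ...
  commits
}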
For further details see:
http://www.r-bloggers.com/better-logging-in-r-aka-futile-logger-1-3-0-released/
http://cran.r-project.org/web/packages/futile.logger/index.html

Related

Parallel computing on two R servers using batchtools/BatchJobs

I'm trying to use batchtools/BatchJobs for parallel computing on two Unix-based R servers. I'm completely new to this, so I followed a few articles and the package documentation. I have added some links below:
batchtools,
BatchJobs
So far I have not really understood how to use batchtools across multiple machines; with BatchJobs, on the other hand, I have made better progress.
I first made an SSH connection from the terminal and then executed the following lines:
reg = makeRegistry("TestExp")
reg$cluster.functions = makeClusterFunctionsSSH(worker = makeSSHWorker(nodename = "sla19438"))  # from BatchJobs

# test function: Monte Carlo approximation of pi
piApprox = function(n) {
  nums = matrix(runif(2 * n), ncol = 2)
  d = sqrt(nums[, 1]^2 + nums[, 2]^2)
  4 * mean(d <= 1)
}
set.seed(42)
piApprox(1000)

BatchJobs::batchMap(reg = reg, fun = piApprox, n = rep(1e7, 10))
getJobTable()
BatchJobs::submitJobs(reg = reg, resources = list(walltime = 3600, memory = 1024))
getStatus(reg = reg)
loadResult(reg = reg, id = 5)
mean(sapply(1:10, loadResult, reg = reg))
It works and gives me the results but I can't see any indication of the jobs being run on the other machine (sla19438) when I run "top" in the terminal.
Please help me understand what I'm doing wrong. Maybe there is some configuration needed but I don't see any material online which dumbs down the steps for a newbie like me.
Thanks
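One thing worth checking: batchtools has its own SSH cluster functions, and mixing a batchtools registry with BatchJobs' makeSSHWorker() may be why nothing runs remotely. A minimal pure-batchtools sketch, assuming batchtools' Worker$new() API and that the registry's file.dir sits on a file system shared by both machines:
library(batchtools)

# registry on a directory visible to both machines
reg <- makeRegistry(file.dir = "TestExp", seed = 42)

# batchtools' own SSH workers (not BatchJobs' makeSSHWorker)
reg$cluster.functions <- makeClusterFunctionsSSH(
  workers = list(Worker$new("sla19438", ncpus = 2))
)

piApprox <- function(n) {
  nums <- matrix(runif(2 * n), ncol = 2)
  d <- sqrt(nums[, 1]^2 + nums[, 2]^2)
  4 * mean(d <= 1)
}

batchMap(fun = piApprox, n = rep(1e7, 10), reg = reg)
submitJobs(reg = reg)
waitForJobs(reg = reg)
mean(unlist(reduceResultsList(reg = reg)))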

Simple deep pagination example using SOLR and R

I need to perform deep pagination using R and the solr package (Solr 7.2.1 server, R 3.4.3).
I can't figure out how to get the nextCursorMark from the resultant dataframe. I usually do this in Python but this is stumping me.
res <- solr_all(base = myBase, rows = 100, verbose = TRUE,
                sort = "unique_id asc",
                fq = "*:*",
                cursorMark = "*")
I cannot get the nextCursorMark from the result. Any help would be appreciated.
I have noticed that if I add nextCursorMark to pageDoc, it will return the value when parsetype is set to json, but not when it is dataframe. So I guess the other part of the question is: where is that value when a data frame is returned?
I finally found a way to make this work. It is not optimal (the final solution is in the GitHub issue referenced in the comment), but it works:
dat <- "http://yadda.com"
cM <- "*"
done <- FALSE
rowCount <- 0
a <- data.frame()

while (!done) {
  Data <- solr_search(base = dat, rows = 100, verbose = FALSE,
                      sort = "unique_id asc",
                      fq = "*:*",
                      parsetype = "json",
                      cursorMark = cM,
                      pageDoc = "nextCursorMark")
  # Solr signals the last page by returning the same cursor mark you sent
  if (cM == Data$nextCursorMark) {
    done <- TRUE
  } else {
    cM <- Data$nextCursorMark
  }
  a <- append(x = a, Data$response$docs)
  rowCount <- rowCount + length(Data$response$docs)
  print(rowCount)
}
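One side note on the loop above: growing a with append() copies it on every iteration. A common pattern is to collect each page in a list and combine once at the end. A small self-contained sketch, where fetch_page() is a hypothetical stand-in for the solr_search() call:
# hypothetical stand-in for the solr_search() call above
fetch_page <- function(cursor) {
  list(docs = runif(3), next_cursor = "end")
}

pages <- list()
cursor <- "*"
repeat {
  page <- fetch_page(cursor)
  pages[[length(pages) + 1]] <- page$docs
  if (identical(page$next_cursor, cursor)) break  # same mark twice = last page
  cursor <- page$next_cursor
}
docs <- do.call(c, pages)  # one combine at the end instead of repeated appends
length(docs)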

Error from Rfacebook 'getPage' with since&until

I am using Rfacebook version 0.6.
When I call getPage with since and until dates as follows, I get the error below. What am I doing wrong, or is there something that needs to be updated in the package itself?
Note: <facebook_page_name>, <my_app_id>, and <my_app_secret> are placeholders for illustration; the actual values are not shown.
Here are the details:
content <- get_fb_data("<facebook_page_name>", since = "2016/01/01",
                       until = "2016/01/20", condition = 2)

get_fb_data <- function(page_name, no_of_records, since_date, until_date, condition)
{
  # get data from a Facebook page
  library("Rfacebook")
  fb_oauth <- fbOAuth(app_id = "<my_app_id>", app_secret = "<my_app_secret>",
                      extended_permissions = FALSE)
  if (condition == 1)
  {
    content <- getPage(page_name, fb_oauth, no_of_records, feed = TRUE)
  }
  else
  {
    since_date <- paste(since_date, "00:00:00 IST", sep = " ")
    until_date <- paste(until_date, "23:59:59 IST", sep = " ")
    from_value <- as.numeric(as.POSIXct(since_date))
    to_value <- as.numeric(as.POSIXct(until_date))
    content <- getPage(page_name, fb_oauth,
                       since = from_value,
                       until = to_value,
                       feed = TRUE)
  }
  return(content)
}
Error displayed:
Error in as.Date.numeric(since) : 'origin' must be supplied
Debugging shows that this comes from the as.Date call inside getPage.
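The underlying behavior is plain base R: as.Date() on a numeric value requires an explicit origin, and numeric dates count days since that origin, not seconds, so the numeric POSIX timestamps fail before ever reaching the API. A quick illustration, independent of Rfacebook:
try(as.Date(16801))                    # Error: 'origin' must be supplied
as.Date(16801, origin = "1970-01-01")  # "2016-01-01" (16801 days after the origin)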
This should work:
library("Rfacebook")
fb_oauth <- fbOAuth(app_id="<my_app_id>", app_secret="<my_app_secret>",
extended_permissions = FALSE)
get_fb_data <- function(page_name, no_of_records, since_date,
until_date, condition){
if (condition == 1){
content<-getPage(page_name, fb_oauth, no_of_records, feed = TRUE)
} else{
content <- getPage(page_name, fb_oauth,
since = since_date,
until = until_date,
feed = TRUE)
}
content
}
content <- get_fb_data("humansofnewyork",
since_date="2016/01/01",
until_date="2016/01/2",
condition=2)
I do not really understand why you are trying to change the date format; it is unnecessary. You also have a syntax problem: when run line by line, else must follow the } that closes the if rather than start a new line. And you shouldn't load packages inside your function; why load them on every call? The same goes for creating fb_oauth each time.

Increase the github xrate limit in R

I am using the rgithub package in R to extract pull requests from GitHub. I have registered the application and generated the client IDs; in the code below, ctx contains those details. To increase the rate limit I am running the curl command shown at the end. Is this the correct way to increase the rate limit? Please suggest any other way possible.
ctx = interactive.login("clientid", "client_secret_id")
owner = "a"
repo = "repo_name"

comments <- function(i) {
  commits <- get.issue.comments(owner = owner, repo = repo, number = i,
                                ctx = get.github.context(), per_page = 100)
  links <- digest_header_links(commits)
  number_of_pages <- links[2, ]$page
  if (number_of_pages != 0)
    try_default(for (n in 1:number_of_pages) {
      if (as.integer(commits$headers$`x-ratelimit-remaining`) < 5)
        Sys.sleep(as.integer(commits$headers$`x-ratelimit-reset`) -
                    (as.POSIXct(Sys.time()) %>% as.integer()))
      else
        get.issue.comments(owner = owner, repo = repo, number = i,
                           ctx = get.github.context(), per_page = 100, page = n)
    }, default = NULL)
  else
    return(commits)
}

issue_ids <- c(issueid)  # an issue id on GitHub
comments_lists <- lapply(issue_ids, comments)
curl -i 'https://api.github.com/users/whatever?client_id=&client_secret=yyyy'
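To verify whether your authenticated requests are actually getting the higher limit, you can query GitHub's /rate_limit endpoint directly; calls to it do not count against your quota. A sketch using httr, which is not part of the question's code; GH_TOKEN is a placeholder environment variable:
library(httr)

# unauthenticated: the core limit is typically 60 requests/hour
res <- GET("https://api.github.com/rate_limit")
content(res)$resources$core

# authenticated: typically 5000 requests/hour
# (GH_TOKEN holds a personal access token)
res <- GET("https://api.github.com/rate_limit",
           add_headers(Authorization = paste("token", Sys.getenv("GH_TOKEN"))))
content(res)$resources$core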

How to initialize R function during first run or whenever input changes

I'm new to R and have some trouble understanding so-called "environments" and the way to use them properly. What I miss a lot in the R language are static variables (like in Java).
I'm writing a program with a couple of functions that need to initialize themselves during their first run. To achieve this, for each function I've created a new environment that is only accessed by that particular function (for example, "f1" is only accessed from inside "myfunction1").
What I don't like about my solution is that there is some additional code outside of the function body, and it's not very readable. Is there any simpler way to achieve the same thing? If so, it would be nice if you could modify my example to show me how it works. Thank you.
f1 <- new.env()
f1$initialized <- FALSE
f1$o <- NULL
f1$length <- NULL
f1$compute <- NULL

myfunction1 <- function(x) {
  if (f1$initialized == FALSE) {
    f1$initialized <- TRUE
    f1$compute <- 2 * pi^2 + 3
  }
  if (is.null(f1$length) || f1$length != length(x)) {
    f1$length <- length(x)
    if (f1$length == 2) {
      f1$o <- read.table("data_1.txt")
    } else {
      f1$o <- read.table("data_2.txt")
    }
  }
  print("Lets print something!")
  return(f1$o * f1$compute * x + 1000)
}
If you are familiar with Java, then maybe using ReferenceClasses would be a good way to go. This seems to do what you are looking for:
myclass <- setRefClass('myclass', fields = list(initialized = 'logical',
                                                o = 'data.frame',
                                                len = 'numeric',
                                                compute = 'numeric'))
# constructor
myclass$methods(initialize = function(initialized, len) {
  initialized <<- initialized
  len <<- len
})
# method
myclass$methods(myfunction1 = function(x) {
  if (initialized == FALSE) {
    initialized <<- TRUE
    compute <<- 2 * pi^2 + 3
  }
  if (is.null(len) || len != length(x)) {
    len <<- length(x)
    if (len == 2) {
      o <<- read.table("data_1.txt")
    } else {
      o <<- read.table("data_2.txt")
    }
  }
  print("Lets print something!")
  return(o * compute * x + 1000)
})
obj <- myclass$new(FALSE, 0)
obj$myfunction1(2)
Check out ?ReferenceClasses for information on what's going on here (much more OOP styled and has some support for class inheritance, which sounds like what you want anyway).
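A lighter-weight alternative, not covered in the answer above, is the classic closure idiom: local() gives the function a private environment that persists between calls, which behaves much like static variables. A sketch, making the same assumptions about data_1.txt and data_2.txt:
myfunction1 <- local({
  compute <- NULL  # persists between calls, visible only to this function
  len <- NULL
  o <- NULL
  function(x) {
    if (is.null(compute)) compute <<- 2 * pi^2 + 3  # first-run initialization
    if (is.null(len) || len != length(x)) {
      len <<- length(x)
      o <<- if (len == 2) read.table("data_1.txt") else read.table("data_2.txt")
    }
    o * compute * x + 1000
  }
})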
