How do I create a set of R functions that all access the same private variable?
Let's say I want to create readSetting(key) and writeSetting(key,value) functions that both operate on the same hidden list settings. If I try it like so...
local( {
settings <- list()
readSetting <<- function ( key ) settings[[key]]
writeSetting <<- function ( key, value ) settings[[key]] <<- value # <<- so the shared settings list is updated, not a local copy
} )
...then readSetting and writeSetting are not visible outside of the local call. If I want them to be visible there, I have to first assign
readSetting <- writeSetting <- NULL
outside the local call. There must be a better way, because my code isn't DRY if I have to say in two different ways which variables are public.
(The context of this work is that I'm developing an R package, and this code will be in an auxiliary file loaded into the main file via source.)
This question is related to "How to limit the scope of the variables used in a script?", but the answers there do not solve my problem.
You can simulate something like that with R6Class from the R6 package and the following very rough code:
library(R6)
Privates <- R6Class("Privates",
public=list(
readSetting = function(key) {
private$settings[[key]]
},
writeSetting = function(key,value) {
private$settings[[key]] <- value # private is an environment, so plain <- updates it in place
}
),
private=list(
settings = list()
)
)
a <- Privates$new()
a$writeSetting("a",4)
a$readSetting("a")
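Directly reading or setting a$settings would not work: settings is private, so a$settings just returns NULL, and assigning to it fails because R6 objects are locked by default.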
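Another option, sketched here without R6, is a function factory: both closures share the environment of a single make_settings() call, so settings stays hidden from ordinary name lookup. make_settings is a hypothetical name, and copying the functions out with list2env() assumes the file is evaluated in the environment where readSetting and writeSetting should end up.

```r
make_settings <- function() {
  settings <- list()   # private state, visible only to the closures below
  list(
    readSetting = function(key) settings[[key]],
    writeSetting = function(key, value) {
      # <<- updates the shared 'settings' in make_settings()'s environment;
      # a plain <- or = would only change a copy local to this call
      settings[[key]] <<- value
      invisible(value)
    }
  )
}

# Copy both functions into the current environment in one step
invisible(list2env(make_settings(), envir = environment()))

writeSetting("a", 4)
readSetting("a")   # 4
```

The public names appear only once, in the list returned by make_settings(), which avoids declaring them a second time outside the factory.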
Related
I am trying to write an R script that uses nested functions to save multiple data.frames to the global environment in parallel. The sample code below works fine on Windows. But when I moved the same code to a Linux server, the objects that the function prepare_output() saves to the global environment are not captured by the save() call in the function get_output().
Am I missing something fundamentally different about how mcmapply affects scoping on Linux vs. Windows?
library(data.table)
library(parallel)
#Function definitions
default_case <- function(flag){
if(flag == 1){
create_input()
get_output()
}else{
print("select a proper flag!")
}
}
create_input <- function(){
dt_initial <<- data.table('col1' = c(1:20), 'col2' = c(21:40)) #Assignment to global envir
}
get_output<- function(){
list1 <- c(5,6,7,8)
dt1 <- data.table(dt_initial[1:15,])
prepare_output<- function(cnt){
dt_new <- data.table(dt1)
dt_new <- dt_new[col1 <= cnt, ]
assign(paste0('dt_final_',cnt), dt_new, envir = .GlobalEnv )
#eval(call("<<-",paste0('dt_final_',cnt), dt_new))
print('contents in global envir inside:')
print(ls(name = .GlobalEnv)) # This prints all the object names dt_final_5 through dt_final_8 correctly
}
mcmapply(FUN = prepare_output,list1,mc.cores = globalenv()$numCores)
print('contents in global envir outside:')
print(ls(name = .GlobalEnv)) #this does NOT print dataframes generated and assigned to global in function prepare_output
save( list = ls(name = .GlobalEnv)[ls(name = .GlobalEnv) %like% 'dt_final_' ], file = 'dt_final.Rdata')
}
if(Sys.info()['sysname'] == "Windows"){numCores <- 1}else{numCores <- parallel::detectCores()}
print('numCores:')
print(numCores)
#Function call
default_case(1)
I am using the nested structure because preparing dt1 is time-consuming, and I do not want to increase the execution time by recomputing it in every iteration of the apply call.
(Sorry, I'll write this as an 'Answer' because the comment box is too brief)
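The best solution to your problem would be to make sure you return the objects you produce rather than trying to assign them from inside a function to an external environment [edit 2020-01-26], which never works in parallel processing because parallel workers do not have access to the environments of the main R process. On Linux, mcmapply() forks worker processes that each get a copy of the main process, so their assign() calls change the copy's global environment and are discarded when the workers exit. On Windows your code sets numCores to 1, and with a single core mcmapply() simply runs sequentially in the main process, which is why the assignments appear to work there.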
A very good rule of thumb in R that will help you achieve this: never use assign() or <<- in code, whether for sequential or parallel processing. At best, you can get such code to work in sequential mode but, in general, you will end up with hard-to-maintain and error-prone code.
By focusing on returning values (y <- mclapply(...) in your example), you'll get it right. It also fits in much better with the overall functional design of R and parallelizes more naturally.
I've got a blog post 'Parallelize a For-Loop by Rewriting it as an Lapply Call' from 2019-01-11 that might help you transition to this functional style.
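A minimal sketch of that return-value style for the question's example, assuming plain data frames instead of data.table to keep it dependency-free: dt1 is still computed only once before the workers start, prepare_output() returns its result instead of assigning it, and the main process names the list and saves it. The temporary output file is only for illustration.

```r
library(parallel)

# Stand-ins for the question's data (base data frames instead of data.table)
dt_initial <- data.frame(col1 = 1:20, col2 = 21:40)
dt1 <- dt_initial[1:15, ]   # the expensive step: done once, before the workers start

prepare_output <- function(cnt) {
  # Return the result; forked workers can read dt1 because they inherit a copy of it
  dt1[dt1$col1 <= cnt, ]
}

list1 <- c(5, 6, 7, 8)
# mclapply() forks on Linux/macOS; Windows only supports mc.cores = 1
cores <- if (.Platform$OS.type == "windows") 1L else 2L
results <- mclapply(list1, prepare_output, mc.cores = cores)
names(results) <- paste0("dt_final_", list1)

# save() wants named objects in an environment, so put the list into one
out_env <- list2env(results)
outfile <- tempfile(fileext = ".Rdata")
save(list = names(results), envir = out_env, file = outfile)
```

Because the results come back through the return value of mclapply(), the code behaves the same whether the work runs sequentially or in forked workers.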
I am working on an R script based on the RSelenium library that is driven by "scraping scenarios" stored in a tibble. Therefore I would like a function that, given certain arguments, performs the corresponding action of the remote driver. The general idea is to have something that converts arguments to method syntax:
scraper(driver, method, arguments) == driver$method(arguments)
So if I call:
scraper(remDr, "open") - it simply does - remDr$open()
scraper(remDr, "navigate", "https://google.com") - it does - remDr$navigate("https://google.com")
scraper(remDr, "findElement", list(using = "xpath", "[#=...]")) - it does - remDr$findElement("xpath", "[#=...]")
Here is the sample that I've ended up with:
scraper <- function(driver, method, arguments = "") {
open <- function(driver) {
return(
driver$open()
)
}
close <- function(driver) {
return(
driver$close()
)
}
navigate <- function(driver, arguments) {
return(
driver$navigate(arguments)
)
}
findElement <- function(driver, arguments) {
return(
driver$findElement(arguments)
)
}
scraperMethods <- list(open = open,
close = close,
navigate = navigate,
findElement = findElement)
return(scraperMethods[[method]](arguments))
}
The double-bracket lookup scraperMethods[[method]] seems to work in the global environment, but when I call
scraper(remDr, "open")
(or any of the other methods defined so far within the scraper function), it throws an error:
Error: $ operator is invalid for atomic vectors
So my questions are:
1. Is this the right approach?
2. If not - is there more convenient way to achieve my goal?
Thanks in advance for all answers.
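The error most likely comes from the last line of scraper(): scraperMethods[[method]](arguments) calls, for example, open(arguments), so inside open() the name driver is bound to the default string "" and driver$open() fails because $ does not work on a character vector. A sketch of a simpler approach, assuming only that the driver's methods are reached with driver$name(...) as in remDr$open(): look the method up by name and splice the arguments in with do.call(). fakeDriver is a hypothetical stand-in so the sketch runs without RSelenium.

```r
# Hypothetical stand-in for an RSelenium remoteDriver: any object whose
# methods are called as driver$name(...) works the same way here
fakeDriver <- list(
  open = function() "opened",
  navigate = function(url) paste("navigated to", url),
  findElement = function(using, value) paste("found", value, "by", using)
)

scraper <- function(driver, method, arguments = list()) {
  # Build and evaluate driver$<method>, i.e. the same lookup as typing it
  fn <- eval(call("$", driver, as.name(method)))
  # Splice the arguments in: list(using = "xpath", "//a") becomes
  # fn(using = "xpath", "//a"), and a single string becomes fn("...")
  do.call(fn, as.list(arguments))
}

scraper(fakeDriver, "open")                                       # "opened"
scraper(fakeDriver, "navigate", "https://google.com")             # "navigated to https://google.com"
scraper(fakeDriver, "findElement", list(using = "xpath", "//a"))  # "found //a by xpath"
```

For a plain list, driver[[method]] would also do; going through $ keeps the lookup the same as writing remDr$open() by hand, and no wrapper function per method is needed.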
I'm having trouble returning data frames from a loop in R. I have a set of functions that reads in files and turns them into data frames for the larger project to use/visualize.
I have a list of file names to pass:
# list of files to read
frameList <-c("apples", "bananas", "pears")
This function iterates over the list and runs the functions to create the data frames if they are not already present.
populateFrames <- function(){
for (frame in frameList){
if (exists(frame) && is.data.frame(get(frame))){
# do nothing
}
else {
frame <- clean_data(gather_data(frame))
}
}
}
When executed, the function runs with no errors, but does not save any data frame to the environment.
I can manually run the same thing and that saves a data frame:
# manually create "apples" data frame
apples <- clean_data(gather_data(frameList[1]))
From reading similar questions here, I see that assign() is used for this kind of thing. But as before, I can run the code manually without problems, yet when it is put inside the loop no data frame is saved to the environment.
# returns a data frame, "apples" to the environment
assign(x = frame[1], value = clean_data(gather_data(frame[1])))
Solutions, following the principle of "change as little about the OP's implementation as possible".
You have two problems here.
Your function is not returning anything, so any changes that happen are stuck in the environment of the function.
I think you're expecting the re-assignment of frame in the else statement to re-assign it to that element in frameList. It's not: frame is a local variable holding a copy of the current element, so re-assigning it does not touch frameList.
This is the NOT RECOMMENDED way of doing this, where you assign a variable in the function's parent environment. In this case populateFrames works by side effect, mutating frameList in the parent environment. Mutating the input is generally something you want to avoid if you want to practice defensive programming.
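populateFrames <- function(){
frameList <<- as.list(frameList) # a character vector cannot hold data frames, so make the global frameList a list first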
for (i in seq_along(frameList)){
if (exists(frameList[[i]]) && is.data.frame(get(frameList[[i]]))){
# do nothing
}
else {
frameList[[i]] <<- clean_data(gather_data(frameList[[i]]))
}
}
}
This is the RECOMMENDED version, where you return the new frameList (which means you have to assign the result to a variable, e.g. frames <- populateFrames()).
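populateFrames <- function(){
frameList <- as.list(frameList) # local copy as a list, so its elements can be replaced by data frames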
for (i in seq_along(frameList)){
if (exists(frameList[[i]]) && is.data.frame(get(frameList[[i]]))){
# do nothing
}
else {
frameList[[i]] <- clean_data(gather_data(frameList[[i]]))
}
}
frameList
}
Avoiding global variable assignments, which are typically a no-no, try lapply:
lapply(
frameList,
function(frame){
if(exists(frame) && is.data.frame(get(frame))){
frame
}else{
clean_data(gather_data(frame))
}
}
)
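Building on the lapply() answer, a sketch that also names the results and, only if separate variables are really wanted, creates them in one explicit step with list2env(). gather_data() and clean_data() are hypothetical stand-ins for the OP's functions, and unlike the answer above this version returns an existing data frame itself rather than its name.

```r
# Hypothetical stand-ins for the OP's gather_data() and clean_data()
gather_data <- function(name) data.frame(fruit = name, n = nchar(name))
clean_data <- function(df) df[!is.na(df$n), , drop = FALSE]

frameList <- c("apples", "bananas", "pears")

# sapply(..., simplify = FALSE) is lapply() plus names taken from frameList
frames <- sapply(frameList, function(frame) {
  if (exists(frame) && is.data.frame(get(frame))) {
    get(frame)                       # reuse a data frame that already exists
  } else {
    clean_data(gather_data(frame))   # otherwise build it
  }
}, simplify = FALSE)

frames$pears   # work with the list directly ...

# ... or, if separate variables are really wanted, create them once, explicitly
invisible(list2env(frames, envir = globalenv()))
```

Keeping the data frames in one named list is usually easier to iterate over later than three separate global variables.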
My problem is very basic (I am a beginner R user). I am trying to collect the value selected in a gradio widget (gWidgets2 package for R).
I am using a script similar to this simplified one:
library(gWidgets2)
U=vector(mode="character")
DF=function() {
Win=gbasicdialog(handler=function(h,...) {
T=svalue(A)
print(T)
# I can print but not assign the value using : assign (U,T, .GlobalEnv)
})
A<-gradio(c("1","2","3"), selected=1, container=Win)
out <- visible(Win)
}
DF()
Using this script, I am able to print the value selected in the gradio widget, but when I try to assign this value to a variable in the global environment, I get an error.
This is strange, as the same script structure works fine for collecting values from other widgets (like gtable). What am I doing wrong?
Thanks for the help.
I am not sure what goes wrong, but I was able to run your code with a small change:
DF <- function() {
Win <- gbasicdialog(
handler = function(h, ...) {
.GlobalEnv$varT = svalue(A)
print(varT)
}
)
A <- gradio(c("1", "2", "3"), selected = 1, container = Win)
out <- visible(Win)
}
DF()
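A small piece of advice: avoid using the single letters T or F as variable names. R predefines them as TRUE and FALSE, so a reader (or code that relies on that shortcut) may take T to mean TRUE rather than your object T.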
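A likely cause of the error in the question's commented-out assign (U,T, .GlobalEnv): assign() expects the variable name as a single string, but U is an empty character vector, so R rejects it as an invalid first argument. A small sketch without the GUI part, where value is a hypothetical stand-in for svalue(A):

```r
U <- character(0)   # as in the question: an empty character vector
value <- "2"        # stands in for svalue(A)

# assign() needs the *name* as one string; a zero-length vector is rejected
failed <- tryCatch({
  assign(U, value, envir = .GlobalEnv)
  FALSE
}, error = function(e) TRUE)

# Passing the name "U" as a string works
assign("U", value, envir = .GlobalEnv)
```

Inside the handler this would be assign("U", svalue(A), envir = .GlobalEnv), which has the same effect as the .GlobalEnv$varT form in the answer.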
I created a function that produces a matrix as its result, but I can't figure out how to make the output usable outside the function environment, so that I could, for instance, save it to a CSV file.
My code is the following.
I created a function that takes a URL from a specific site and returns the page title:
getTitle <- function(url) {
webpage <- readLines(url)
first.row <- webpage[1]
start <- regexpr("<title>", first.row)
end <- regexpr("</title>", first.row)
title <- substr(first.row,start+7,end-1)
return(title)
}
I created a function that takes a one-column data frame of URLs and returns an n x 2 matrix of URLs and page titles:
getTitles <- function(pages) {
my.matrix <- matrix(NA, ncol=2, nrow=nrow(pages))
for (i in seq_len(nrow(pages))) {
my.matrix[i,1] <- as.character(pages[i,])
my.matrix[i,2] <- getTitle(as.character(pages[i,])) }
return(my.matrix)
print(my.matrix)}
After running these functions on a sample file from http://goo.gl/D9lLZ, which I import with read.csv and name "mypages", I get the following output:
getTitles(mypages)
[,1] [,2]
[1,] "http://support.google.com/adwords/answer/1704395" "Create your first ad campaign - AdWords Help"
[2,] "http://support.google.com/adwords/answer/1704424" "How costs are calculated in AdWords - AdWords Help"
[3,] "http://support.google.com/adwords/answer/2375470" "Organizing your account for success - AdWords Help"
This is exactly what I need, but I'd love to be able to export this output to a CSV file or reuse it for further manipulation. However, when I try print(my.matrix), I get an error saying "Error: object 'my.matrix' not found"
I feel like it's quite a basic gap in my knowledge, but I have not been working with R for a while and could not solve it.
Thanks!
Sergey
That's easy: use <<- for assignment to a global.
But then again, global assignment is evil and not functional. Maybe you'd rather return a list with several results from your function? Looking at your code, it seems that your second function mixes up return and print. Make sure you return the correct data structure.
A little about functional programming. First of all, when you define your function:
getTitles <- function(pages) {
[...]
return(my.matrix)
print(my.matrix)
}
know that when the function is called it will never reach the print statement. Instead, it will exit right before, with return. So you can remove that print statement, it is useless.
Now the more important stuff. Inside your function, you define and return my.matrix. The object only exists within the scope of the function: as the function exits, what is returned is an unnamed object (and my.matrix is lost.)
In your session, when you call
getTitles(mypages)
the result is printed because you did not assign it. Instead, you should do:
out.matrix <- getTitles(mypages)
Now the result won't be printed but you can definitely do so by typing print(out.matrix) or just out.matrix on a single line. And because you have stored the result in an object, you can now reuse it for further manipulations.
If it helps you grasp the concept, this is just like calling the c() function from the command line:
c(1, 5, 2) # will return and print a vector
x <- c(1, 5, 2) # will return and assign a vector (not printed.)
Bonus: I don't think you really need to define getTitles; you can use one of the *apply functions instead. I would try this:
url <- as.character(mypages[[1]]) # the URLs are in the first (only) column of the data frame
title <- sapply(url, getTitle)
report <- data.frame(url, title)
write.csv(report, file = "report.csv", row.names = FALSE)
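To try the sapply()/write.csv() round trip without network access, the sketch below points it at two small local HTML files instead of the AdWords URLs (readLines() accepts file paths as well as URLs). It also uses a hypothetical variant of getTitle() that searches the whole page rather than only the first line, on the assumption that the title may not be on line 1.

```r
getTitle <- function(url) {
  # Join all lines so a <title> anywhere on the page is found
  page <- paste(readLines(url, warn = FALSE), collapse = " ")
  m <- regmatches(page, regexpr("<title>[^<]*</title>", page))
  if (length(m) == 0) return(NA_character_)   # no title on this page
  gsub("</?title>", "", m)                     # strip the surrounding tags
}

# Two throwaway pages standing in for the real URLs
f1 <- tempfile(fileext = ".html")
f2 <- tempfile(fileext = ".html")
writeLines("<html><head><title>First page</title></head></html>", f1)
writeLines(c("<html>", "<head><title>Second page</title></head>", "</html>"), f2)

url <- c(f1, f2)
title <- vapply(url, getTitle, character(1), USE.NAMES = FALSE)
report <- data.frame(url, title)

out <- tempfile(fileext = ".csv")
write.csv(report, file = out, row.names = FALSE)
back <- read.csv(out)   # the saved result can be read back and reused
```

vapply() is used instead of sapply() here so the result is guaranteed to be a character vector with one title per page.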
Can't you just use <<- to assign the object to the workspace? The following code works for me and saves the amort_value object.
amortization <- function(cost, downpayment, interest, term) {
amort_value <<- (cost)*(1-downpayment/100)*(interest/1200)*((1+interest/1200)^(term*12))/((1+interest/1200)^(term*12)-1)
sprintf("$%.2f", amort_value)
}
amortization(445000,20,3,15)
amort_value
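As an alternative to writing a global, a sketch of the same calculation that returns a list with both the number and its formatted label, leaving it to the caller to store them. The local names principal, r and n are introduced only for readability; the formula is the one from the code above.

```r
amortization <- function(cost, downpayment, interest, term) {
  principal <- cost * (1 - downpayment / 100)
  r <- interest / 1200   # monthly interest rate
  n <- term * 12         # number of monthly payments
  payment <- principal * r * (1 + r)^n / ((1 + r)^n - 1)
  # Return both results instead of assigning a global variable
  list(value = payment, label = sprintf("$%.2f", payment))
}

res <- amortization(445000, 20, 3, 15)
res$label
amort_value <- res$value   # the caller decides where the result lives
```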
At the end of the function, you can return the result.
First define the function:
getRangeOf <- function (v) {
numRange <- max(v) - min(v)
return(numRange)
}
Then call it and assign the output to a variable:
scores <- c(60, 65, 70, 92, 99)
scoreRange <- getRangeOf(scores)
From here on, use scoreRange in the environment. Any variables or nested functions defined within your function are not accessible from outside, unless, of course, you use <<- to assign a global variable. So in this example, you can't see what numRange is from the outside unless you make it global.
In general, try to avoid global variables, especially early on. Local variables are "encapsulated", so we know which one is used within the current context (the "environment"). Global variables are harder to tame.
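A quick way to see this scoping in action with the example above: after the call, scoreRange holds the result, while the local numRange no longer exists.

```r
getRangeOf <- function(v) {
  numRange <- max(v) - min(v)   # local: lives only while the call runs
  return(numRange)
}

scores <- c(60, 65, 70, 92, 99)
scoreRange <- getRangeOf(scores)
scoreRange           # 39
exists("numRange")   # FALSE: the function's local variable is gone
```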