How to make object created within function usable outside - r

I created a function which produces a matrix as a result, but I can't figure out how to make the output of this function usable outside of the function environment, so that I could, for instance, save it in a CSV file.
The code for my functions is the following:
First, a function that takes a URL from a specific site and returns the page title:
getTitle <- function(url) {
webpage <- readLines(url)
first.row <- webpage[1]
start <- regexpr("<title>", first.row)
end <- regexpr("</title>", first.row)
title <- substr(first.row,start+7,end-1)
return(title)
}
Second, a function that takes a vector of URLs and returns an n*2 matrix of URLs and page titles:
getTitles <- function(pages) {
my.matrix <- matrix(NA, ncol=2, nrow=nrow(pages))
for (i in seq_along(1:nrow(pages))) {
my.matrix[i,1] <- as.character(pages[i,])
my.matrix[i,2] <- getTitle(as.character(pages[i,])) }
return(my.matrix)
print(my.matrix)}
After running these functions on a sample file from here http://goo.gl/D9lLZ, which I import with the read.csv function and name "mypages", I get the following output:
getTitles(mypages)
[,1] [,2]
[1,] "http://support.google.com/adwords/answer/1704395" "Create your first ad campaign - AdWords Help"
[2,] "http://support.google.com/adwords/answer/1704424" "How costs are calculated in AdWords - AdWords Help"
[3,] "http://support.google.com/adwords/answer/2375470" "Organizing your account for success - AdWords Help"
This is exactly what I need, but I'd love to be able to export this output to a CSV file or reuse it for further manipulations. However, when I try to print(my.matrix), I get an error saying "Error: object 'my.matrix' not found".
I feel like it's quite a basic gap in my knowledge, but I have not been working with R for a while and could not solve it.
Thanks!
Sergey

That's easy: use <<- for assignment to a global.
But then again, global assignment is evil and not functional. Maybe you'd rather return a list with several results from your function? Looking at your code, it seems that your second function may confuse return and print. Make sure you return the correct data structure.
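As a minimal sketch of the "return a list" idea (the function name summariseScores and its field names are made up for illustration, not taken from the question):

```r
# Hypothetical helper: returns several results at once in a named list
summariseScores <- function(x) {
  list(
    mean  = mean(x),
    range = max(x) - min(x)
  )
}

# Assign the returned list to a name so it can be reused afterwards
res <- summariseScores(c(60, 65, 70, 92, 99))
res$mean
res$range
```

Each element of the returned list can then be picked out with $ and used like any other object.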

A little about functional programming. First of all, when you define your function:
getTitles <- function(pages) {
[...]
return(my.matrix)
print(my.matrix)
}
know that when the function is called it will never reach the print statement. Instead, it will exit right before, with return. So you can remove that print statement, it is useless.
Now the more important stuff. Inside your function, you define and return my.matrix. The object only exists within the scope of the function: as the function exits, what is returned is an unnamed object (and my.matrix is lost.)
In your session, when you call
getTitles(mypages)
the result is printed because you did not assign it. Instead, you should do:
out.matrix <- getTitles(mypages)
Now the result won't be printed but you can definitely do so by typing print(out.matrix) or just out.matrix on a single line. And because you have stored the result in an object, you can now reuse it for further manipulations.
If it helps you grasp the concept, this is all the same as calling the c() function from the command line:
c(1, 5, 2) # will return and print a vector
x <- c(1, 5, 2) # will return and assign a vector (not printed.)
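To make the "assign, then reuse" step concrete, here is a small sketch that stores a returned matrix and writes it to a CSV file. makeTitles is a made-up stand-in for getTitles that needs no network access, and the example.com URLs and the temporary file are assumptions for illustration only:

```r
# Stand-in for getTitles(): builds an n x 2 character matrix without
# downloading anything (hypothetical, for illustration only)
makeTitles <- function(urls) {
  m <- matrix(NA_character_, nrow = length(urls), ncol = 2)
  m[, 1] <- urls
  m[, 2] <- paste("Title for", basename(urls))
  m
}

# Store the returned matrix under a name of your choice
out.matrix <- makeTitles(c("http://example.com/a", "http://example.com/b"))
colnames(out.matrix) <- c("url", "title")

# The stored object can now be exported or manipulated further
out.file <- tempfile(fileext = ".csv")
write.csv(out.matrix, file = out.file, row.names = FALSE)
back <- read.csv(out.file, stringsAsFactors = FALSE)
```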
Bonus: Really, I don't think you need to define getTitles, but you can use one of the *apply functions. I would try this:
url <- as.character(mypages[[1]]) # mypages is a data frame; its first column holds the URLs
title <- sapply(url, getTitle)
report <- data.frame(url, title)
write.csv(report, file = "report.csv", row.names = FALSE)

Can't you just use <<- to assign the object to the workspace? The following code works for me and saves the amort_value object.
amortization <- function(cost, downpayment, interest, term) {
amort_value <<- (cost)*(1-downpayment/100)*(interest/1200)*((1+interest/1200)^(term*12))/((1+interest/1200)^(term*12)-1)
sprintf("$%.2f", amort_value)
}
amortization(445000,20,3,15)
amort_value
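For comparison, here is a sketch of the same calculation without the global assignment: the function returns the number, and the caller decides what to call it and how to format it. The name amortizationValue and the intermediate variables are made up for this example:

```r
# Same formula as above, but the result is returned instead of being
# pushed into the global environment with <<-
amortizationValue <- function(cost, downpayment, interest, term) {
  monthlyRate <- interest / 1200   # annual percentage rate -> monthly fraction
  nPayments   <- term * 12         # years -> number of monthly payments
  cost * (1 - downpayment / 100) * monthlyRate * (1 + monthlyRate)^nPayments /
    ((1 + monthlyRate)^nPayments - 1)
}

# The caller chooses the name and the formatting
amort_value <- amortizationValue(445000, 20, 3, 15)
sprintf("$%.2f", amort_value)
```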

At the end of the function, you can return the result.
First define the function:
getRangeOf <- function (v) {
numRange <- max(v) - min(v)
return(numRange)
}
Then call it and assign the output to a variable:
scores <- c(60, 65, 70, 92, 99)
scoreRange <- getRangeOf(scores)
From here on, use scoreRange in the environment. Any variables or nested functions within your defined function are not accessible from the outside, unless, of course, you use <<- to assign a global variable. So in this example, you can't see what numRange is from the outside unless you make it global.
Usually, try to avoid global variables at an early stage. Variables are "encapsulated" so we know which one is used within the current context ("environment"). Global variables are harder to tame.
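A quick way to see this encapsulation for yourself, as a sketch that assumes no other object called numRange exists in your session:

```r
getRangeOf <- function(v) {
  numRange <- max(v) - min(v)  # lives only inside this function call
  numRange
}

scoreRange <- getRangeOf(c(60, 65, 70, 92, 99))

# The returned value is available under the name we assigned it to ...
scoreRange
# ... but the function's internal variable is not visible out here
exists("numRange")  # FALSE under the assumption stated above
```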

Related

R: Continue loop to next iteration if function used in loop has stop() clause

I have created a function that reads in a dataset but returns a stop() when this specific file does not exist on the drive. This function is called sondeprofile(), but the only important part is this:
if(file.exists(sonde)) {
dfs <- read.table(sonde, header=T, sep=",", skip = idx, fill = T)
} else {
stop("No sonde data available for this day")
}
This function has then been used within a for-loop to loop over specific days and stations to do calculations on each day. Extremely simplified problem:
for(name in stations) {
sonde <- sondeprofile(date)
# Continue with loop if sonde exists, skip this if not
if(exists("sonde")) {
## rest of code ##
}
}
But my issue is that whenever the sondeprofile() function finds that there is no file for this specific date, the stop("No sonde data available for this day") causes the whole for loop above to stop. I thought checking whether the object exists would be enough to make sure it skips this iteration. But alas, I can't get this to work properly.
Whenever the sondeprofile() function finds that there is no data available for a specific date, I want the loop to skip the rest of the code for that iteration and go on to the next one.
How can I make this happen? sondeprofile() is used in other portions of the code as well, as a standalone function so I need it to skip the iteration in the for loop.
When the function sondeprofile() throws an error, it stops your whole loop. However, you can avoid that with try(), which is a wrapper to run "an expression that might fail and allow the user's code to handle error-recovery" (from help("try")).
So, if you replace
sonde <- sondeprofile(date)
with
sonde <- try(sondeprofile(date), silent = TRUE)
you can avoid the problem of it stopping your loop. But then how do you deal with the if() condition?
Well, if a try() call encounters an error, what it returns will be of class try-error. So, you can just make sure that sonde isn't of that class, changing
if(exists("sonde")) {
to
if ( !inherits(sonde, "try-error") ) {
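Put together, the pattern might look like the following sketch. sondeprofileDemo is a made-up stand-in that fails for one date, since the real sondeprofile() and its files are not shown here:

```r
# Stand-in for sondeprofile(): fails for one particular date (hypothetical)
sondeprofileDemo <- function(date) {
  if (date == "2020-01-02") {
    stop("No sonde data available for this day")
  }
  data.frame(date = date, value = 1)
}

dates <- c("2020-01-01", "2020-01-02", "2020-01-03")
processed <- character(0)

for (d in dates) {
  sonde <- try(sondeprofileDemo(d), silent = TRUE)
  # Skip the rest of this iteration if the call failed
  if (inherits(sonde, "try-error")) next
  ## rest of code ##
  processed <- c(processed, d)
}

processed
```

Using next instead of wrapping the rest of the body in an if() block is only a matter of taste; both skip the failed date and keep the loop running.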

Saving R objects to global environment from inside a nested function called by a parent function using mcmapply

I am trying to write an R script that uses nested functions to save multiple data.frames (in parallel) to the global environment. The sample code below works fine on Windows. But when I moved the same code to a Linux server, the objects that prepare_output() saves to the global environment are not captured by the save() operation in get_output().
Am I missing something fundamentally different in how mcmapply affects scoping on Linux vs Windows?
library(data.table)
library(parallel)
#Function definitions
default_case <- function(flag){
if(flag == 1){
create_input()
get_output()
}else{
print("select a proper flag!")
}
}
create_input <- function(){
dt_initial <<- data.table('col1' = c(1:20), 'col2' = c(21:40)) #Assignment to global envir
}
get_output<- function(){
list1 <- c(5,6,7,8)
dt1 <- data.table(dt_initial[1:15,])
prepare_output<- function(cnt){
dt_new <- data.table(dt1)
dt_new <- dt_new[col1 <= cnt, ]
assign(paste0('dt_final_',cnt), dt_new, envir = .GlobalEnv )
#eval(call("<<-",paste0('dt_final_',cnt), dt_new))
print('contents in global envir inside:')
print(ls(name = .GlobalEnv)) # This print all object names dt_final_5 through dt_final_8 correctly
}
mcmapply(FUN = prepare_output,list1,mc.cores = globalenv()$numCores)
print('contents in global envir outside:')
print(ls(name = .GlobalEnv)) #this does NOT print dataframes generated and assigned to global in function prepare_output
save( list = ls(name = .GlobalEnv)[ls(name = .GlobalEnv) %like% 'dt_final_' ], file = 'dt_final.Rdata')
}
if(Sys.info()['sysname'] == "Windows"){numCores <- 1}else{numCores <- parallel::detectCores()}
print('numCores:')
print(numCores)
#Function call
default_case(1)
The reason I am using a nested structure is that preparing dt1 is time-consuming, and I do not want to increase the execution time by repeating it on every iteration of the apply call.
(Sorry, I'll write this as an 'Answer' because the comment box is too brief)
The best solution to your problem would be to make sure you return the objects you produce rather than trying to assign them from inside a function to an external environment [edit 2020-01-26] which never works in parallel processing because parallel workers do not have access to the environments of the main R process.
A very good rule of thumb in R that will help you achieve this: Never use assign() or <<- in code - neither for sequential nor for parallel processing. At best, you can get such code to work in sequential mode but, in general, you will end up with hard to maintain and error-prone code.
By focusing on returning values (y <- mclapply(...) in your example), you'll get it right. It also fits in much better with the overall functional design of R and parallelizes more naturally.
I've got a blog post 'Parallelize a For-Loop by Rewriting it as an Lapply Call' from 2019-01-11 that might help you transition to this functional style.
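As a rough sketch of what "returning values" could look like for the example in the question: the code below uses a plain data.frame instead of data.table, sets mc.cores = 1L so that it also runs on Windows, and saves with saveRDS() to a temporary file. All of these are assumptions made for illustration, not part of the original code:

```r
library(parallel)

# Prepared once, outside the parallel call, as in the question
dt1 <- data.frame(col1 = 1:15, col2 = 21:35)

# The worker returns its result instead of assigning it anywhere
prepare_output <- function(cnt) {
  dt1[dt1$col1 <= cnt, ]
}

cnts <- c(5, 6, 7, 8)

# Results come back to the main process as the return value of mclapply();
# mc.cores = 1L is an assumption here, increase it on Linux/macOS
results <- mclapply(cnts, prepare_output, mc.cores = 1L)
names(results) <- paste0("dt_final_", cnts)

# Save the whole named list in one go
out_file <- tempfile(fileext = ".rds")
saveRDS(results, file = out_file)
```

Because everything the workers produce travels back through the return value, the same code behaves the same way whether the workers are forked processes or the call runs sequentially.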

How To Create R Data Frame From List In Loop

I'm having trouble returning data frames from a loop in R. I have a set of functions that reads in files and turns them into data frames for the larger project to use/visualize.
I have a list of file names to pass:
# list of files to read
frameList <-c("apples", "bananas", "pears")
This function iterates over the list and runs the functions to create the data frames if they are not already present.
populateFrames <- function(){
for (frame in frameList){
if (exists(frame) && is.data.frame(get(frame))){
# do nothing
}
else {
frame <- clean_data(gather_data(frame))
}
}
}
When executed, the function runs with no errors, but does not save any data frame to the environment.
I can manually run the same thing and that saves a data frame:
# manually create "apples" data frame
apples <- clean_data(gather_data(frameList[1]))
From my reading through similar questions here, I see that assign() is used for similar things. But in the same way as before, I can run the code manually fine; but when put inside the loop no data frame is saved to the environment.
# returns a data frame, "apples" to the environment
assign(x = frame[1], value = clean_data(gather_data(frame[1])))
Solutions, following the principle of "change as little about the OP's implementation as possible".
You have two problems here.
Your function is not returning anything, so any changes that happen are stuck in the environment of the function.
I think you're expecting the re-assignment of frame in the else statement to re-assign it to that element in frameList. It's not.
This is the NOT RECOMMENDED way of doing this, where you assign a variable in the function's parent environment. In this case you are populating the frames as a side effect, mutating frameList in the parent environment. Mutating the input is generally something you want to avoid if you want to practice defensive programming.
populateFrames <- function(){
for (i in seq_along(frameList)){
if (exists(frameList[[i]]) && is.data.frame(get(frameList[[i]]))){
# do nothing
}
else {
frameList[[i]] <<- clean_data(gather_data(frameList[[i]]))
}
}
}
This is the RECOMMENDED version where you return the new frameList (which means you have to assign it to a value).
populateFrames <- function(){
for (i in seq_along(frameList)){
if (exists(frameList[[i]]) && is.data.frame(get(frameList[[i]]))){
# do nothing
}
else {
frameList[[i]] <- clean_data(gather_data(frameList[[i]]))
}
}
frameList
}
Avoiding global variable assignments, which are typically a no-no, try lapply:
lapply(
frameList,
function(frame){
if(exists(frame) && is.data.frame(get(frame))){
get(frame) # return the existing data frame itself, not just its name
}else{
clean_data(gather_data(frame))
}
}
)
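One possible way to keep track of which result belongs to which name is to return a named list. In this sketch, gatherDataDemo and cleanDataDemo are made-up stand-ins for gather_data and clean_data, which are not shown in the question:

```r
# Hypothetical stand-ins for gather_data() and clean_data()
gatherDataDemo <- function(name) data.frame(item = name, n = nchar(name))
cleanDataDemo  <- function(df) df[!is.na(df$n), , drop = FALSE]

frameList <- c("apples", "bananas", "pears")

# Build every data frame and collect them in a list
frames <- lapply(frameList, function(name) {
  cleanDataDemo(gatherDataDemo(name))
})

# Name the list elements after the inputs so they can be looked up later
names(frames) <- frameList

frames$apples
frames[["pears"]]
```

The data frames then live in one object (frames) instead of as separate variables in the global environment, so nothing has to be assigned from inside the loop.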

How to create R functions with private variables?

How do I create a set of R functions that all access the same private variable?
Let's say I want to create readSetting(key) and writeSetting(key,value) functions that both operate on the same hidden list settings. If I try it like so...
local( {
settings <- list()
readSetting <<- function ( key ) settings[[key]]
writeSetting <<- function ( key, value ) settings[[key]] = value
} )
...then readSetting and writeSetting are not visible outside of the local call. If I want them to be visible there, I have to first assign
readSetting <- writeSetting <- NULL
outside the local call. There must be a better way, because my code isn't DRY if I have to say in two different ways which variables are public.
(The context of this work is that I'm developing an R package, and this code will be in an auxiliary file loaded into the main file via source.)
This question is related to How to limit the scope of the variables used in a script? but the answers there do not solve my problem.
You can simulate something like that using the R6 package and the following very rough code:
library(R6)
Privates <- R6Class("Privates",
public=list(
readSetting = function(key) {
private$settings[[key]]
},
writeSetting = function(key,value) {
private$settings[[key]] <- value
}
),
private=list(
settings = list()
)
)
a <- Privates$new()
a$writeSetting("a",4)
a$readSetting("a")
Directly reading or setting a$settings would not work.
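If you would rather stay in base R, a closure gives a similar effect. This is only a sketch: the factory name makeSettings is made up, and the two functions are returned in a list rather than assigned globally:

```r
# Factory function: 'settings' lives in the environment created by this call
makeSettings <- function() {
  settings <- list()
  list(
    readSetting = function(key) settings[[key]],
    writeSetting = function(key, value) {
      # <<- updates 'settings' in the enclosing (factory) environment,
      # not in the global environment
      settings[[key]] <<- value
      invisible(value)
    }
  )
}

s <- makeSettings()
s$writeSetting("a", 4)
s$readSetting("a")
```

Here <<- is used deliberately: because settings already exists in the enclosing environment, the assignment stops there, and the list itself is never exposed through s.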

Assigning namespace variables inside of a function

I'm struggling to assign a namespace variable inside of a function. Consider this example using the CRAN package "qcc": qcc() generates a plot, but the display options of that plot are controlled by qcc.options().
When working in global, everything is fine:
library(qcc)
qcc.options(bg.margin="red") # sets margin background colour, i.e.
# qcc:::.qcc.options$bg.margin is "red"
qcc(rnorm(100), type = "xbar.one") # generates a plot with red margin
But when working in the local environment of a function, qcc and qcc.options seem to use the namespace differently:
foo <- function(x){
qcc.options(bg.margin=x)
qcc(rnorm(100), type = "xbar.one")
}
foo("green") # generates a default plot with grey margins
Here is an ugly hack:
foo <- function(x){
old.qcc.options <- get(".qcc.options", asNamespace("qcc"))
assign(".qcc.options", qcc.options(bg.margin=x), asNamespace("qcc"))
res <- qcc(rnorm(100), type = "xbar.one")
assign(".qcc.options", old.qcc.options, asNamespace("qcc"))
invisible(res)
}
foo("green")
Of course, the scoping issues would be better solved by changing qcc.options. You should contact the package maintainer about that.
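The save-and-restore part of the hack can be made more robust with on.exit(), which runs even if the plotting call fails. The sketch below shows the pattern with base R's options() rather than qcc, so it runs without the package; withDigits is a made-up name and the digits option is just an example setting:

```r
# Temporarily change an option, run some code, and always restore it
withDigits <- function(d, fun) {
  old <- options(digits = d)      # options() returns the previous values
  on.exit(options(old), add = TRUE)
  fun()
}

before <- getOption("digits")
txt <- withDigits(3, function() format(pi))
after <- getOption("digits")

txt
identical(before, after)
```

The same idea should carry over to the hack above by moving the second assign() call into on.exit(), though that has not been tried against qcc here.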
This is because of where qcc.options stores its .qcc.options variable. Working in global, this is qcc:::.qcc.options, but when you're inside a function, it is storing it in a local variable just called .qcc.options, thus when you try to use plot.qcc (called by qcc) it retrieves options from the global (non-exported) qcc:::.qcc.options rather than the local .qcc.options.
Here's a function that shows what's happening with the options:
bar <- function(x){
pre <- qcc:::.qcc.options
pre.marg <- qcc.options("bg.margin")
qcc.options(bg.margin=x)
post1 <- qcc:::.qcc.options
post2 <- .qcc.options
post.marg <- qcc.options("bg.margin")
qcc(rnorm(100), type = "xbar.one")
list(pre,post1,post2,pre.marg,post.marg)
}
bar('green')
If you look at the results, you'll see that qcc.options creates the local variable and changes its value of bg.margin to "green" but this isn't the object that's subsequently referenced by plot.qcc.
Seems like you should request some code modifications from the package maintainer because this is probably not the best setup.
EDIT: A workaround is to use assignInNamespace to use the local variable to overwrite the global one. (Obviously, this then changes the parameter globally and would affect all subsequent plots unless the parameter is updated.)
foo <- function(x){
qcc.options(bg.margin=x)
assignInNamespace('.qcc.options',.qcc.options,ns='qcc')
qcc(rnorm(100), type = "xbar.one")
}
foo('green')

Resources