Why assigning to globalenv not allowed when creating R package? - r

I'm creating a R package and I have a function that returns an object which its name is constructed with the argument passed.
I use the function assign() to do this as in the code below and it works fine.
df <- data.frame(A = 1:10, B = 2:11, C = 5:14)
ot_test <- function(df, min){
tmp <- colSums(df)
tmp2 <- df[, tmp >= min]
assign(paste0(deparse(substitute(df)), "_min_", min), tmp2, envir= .GlobalEnv)
}
ot_test(df,60)
ls()
[1] "df" "df_min_60" "ot_test"
But when I check the package with devtools::check I have the message.
Found the following assignments to the global environment:
File 'test/R/ottest.R':
assign(paste0(deparse(substitute(df)), "_min_", min), tmp2, envir = .GlobalEnv)
Is there a way to do the same without having .GlobalEnv in argument or without using the function assign().

Its just an ugly, bad thing to do in a functional programming environment.
What's wrong with:
df_min_60 = ot_test(df,60)
Your argument will be that your method saves a bit of typing, but it opens you up to all sorts of bugs and obscurities.
Suppose I want to call ot_test in a function, in a loop maybe. Now its stomped on, with no warning or obvious clue its going to do it, the df_min_60 in my global workspace. Gee thanks for that. So what do I have to do?
ot_test(df, 60)
# now rename so I don't stomp on it
df_min_60.1 = df_min_60
results = domyloop(d1,d2,d3)
Which has meant more typing.
Now another idea. Suppose I want to call ot_test on a list of data frames and make a list of the results. Normally I'd do something like:
for(i in 1:10){res[[i]] = ot_test(data[[i]], 60)}
but with your code I can't. I have to do:
for(i in 1:10){d=data[[i]]; ot_test(d,60); res[[i]] = d_min_60)}
which is WAY more typing.
Be thankful that devtools::check only gives a message and doesn't set your computer on fire for doing this. Seriously, don't create things in the global environment, return them as return values.

Related

interpolate_from_env: is call to setNames redundant?

I've been exploring engine.R to improve my understanding of how the sql engine for Knitr works. I noticed a call to setNames in the definition for interpolate_from_env that looks like it may be redundant (at least to my relatively inexperienced eyes!)
I am unsure if it is appropriate to reproduce the entirety of the function definition below, so I'm just including the lines in question:
args = if (length(names) > 0) setNames(
mget(names, envir = env), names)
setNames appears to be called on the result of mget, which already returns a named list of objects. Similarly, the call to identical below returns TRUE:
identical(
mget(names, envir = env),
setNames(mget(names, envir = env, names)
)
Have I overlooked something here or is the call to setNames in fact redundant?
TIA!

R pipe, mget, and environments

I'm posting this in hopes someone could explain the behavior here. And perhaps this may save others some time in tracking down how to fix a similar error.
The answer is likely somewhere here in this vignette by Hadley Wickham and Lionel Henry. Yet it will take someone like me weeks of study to connect the dots.
I am running a number of queries from a remote database and then combining them into a single data.table. I add the "part_" prefix to the name of each individual query result and use ls() and mget() with data.table's rbindlist() to combined them.
This works:
results_all <- rbindlist(mget(ls(pattern = "part_", )))
I learned that approach, probably from list data.tables in memory and combine by row (rbind), and it is a helpful thing to know how to do for sure.
For readability, I often prefer using the magrittr pipe (or chaining with data.table) and especially so with projects like this because I use dplyr to query the database. Yet this code results in an error:
results_all <- ls(pattern = "part_", ) %>%
mget() %>%
rbindlist()
The error reads Error: value for ‘part_a’ not found where part_a is the first object name in the character vector returned by ls().
Searching that error message, I came across the discussion in this data.table Github issue. Reading through that, I tried setting "inherits = TRUE" within mget() like so:
results_all <- ls(pattern = "part_", ) %>%
mget(inherits = TRUE) %>%
rbindlist()
And that works. So the error is happening when piping the result of ls() to mget(). And given that nesting ls() within mget() works, my guess is that it is something to do with the pipe and "the enclosing frames of the environment".
In writing this up, I came across Unexpected error message while joining data.table with rbindlist() using mget(). From the discussion there I found out that this also works.
results_all <- ls(pattern = "part_", ) %>%
mget(envir = .GlobalEnv) %>%
rbindlist()
Again, I am hoping someone can explain what is going on for folks looking to learn more about how environments work in R.
Edit: Adding reproducible example
Per the request for a reproducible answer, running the code above using these three data.tables (data.frames or tibbles will behave the same) should do it.
part_a <- data.table(col1 = 1:10, col2 = sample(letters, 10))
part_b <- data.table(col1 = 11:20, col2 = sample(letters, 10))
part_c <- data.table(col1 = 21:30, col2 = sample(letters, 10))
The rhs argument to a pipe operator (in your example, the expression mget()) is never evaluated as a function call by the interpreter. The pipe operator is an infix function that performs non-standard evaluation of its second argument (rhs). The pipe function composes and performs a new function call using the RHS expression as a sort of "template".
The calling environment of this new function call is the function environment of %>%, not the calling environment of the lhs function or the global environment. .GlobalEnv and the calling environment of the lhs function happen to be the same environment in your example, and that environment is a parent to the function environment of %>%, which is why inherits = TRUE or setting the environment to .GlobalEnv works for you.

Possible to call a function to all particular objects stored in project environment?

I, stupidly, have been creating loads of new data_frames in an R project trying to solve a particular problem without making proper commits. Having gone through all practical names and most of the Greek alphabet, I now have an environment full of data_frame objects with names like 'bob','might.work','almostthere'. I'd like to use a looping function - lapply or otherwise - to return some indicators that will tell me something about each dataframe object in the environment. I can then clean up/delete based on the returns.
So is it possible to use lapply to access all data_frames in an R project environment? Something like this?
lapply(environment, function (x){
if(is.dataframe(x)){
dplyr::glimpse(x)
}
}
Thanks.
The eapply() function easily iterates over objects in an environment
eapply(globalenv(), function(x) if (is.data.frame(x)) dplyr::glimpse(x))
Sure is possible!
lapply(ls(),function(x){
o = get(x,envir = globalenv())
cat("if"(is.data.frame(o),paste0(x," is a data frame!\n"),"Nope.\n"))
})
ls() will list all object names in the environment (global by default).
Since this is just a name, we need to get the value but specify the global environment (since we're in a function environment at this point)
Then I cat out if it's a data frame, but you can do whatever you want with the o object.
The following function will return the objects that inherit from class data.frame in the environment environ, which defaults to .GlobalEnv.
getDataFrames <- function(environ = .GlobalEnv){
l <- ls(name = environ)
res <- NULL
for(i in seq_along(l)){
r <- inherits(get(l[i], envir = environ), "data.frame")
if(r) res <- c(res, l[i])
}
res
}
getDataFrames()

R: How to pass two filepaths as parameters to a function?

I'm trying to pass two file paths as parameters to a function. But it's not accepting the inputs. Here's what I'm doing:
partition<-function(d1,p2){
d1<-read.table(file = d1, fill = TRUE)
p2<-read.table(file = p2, fill = TRUE)
}
and while calling the function:
partition("samcopy.txt","partcopy.txt")
The .txt is not being read by the variables inside the function. How to make the variables read the table?
AidanGawronSki's approach works, but from a programming standpoint should be avoided! Here is a more traditional answer to your problem.
partition<-function(d1,p2){
a <- read.table(file = d1, fill = TRUE)
b <- read.table(file = p2, fill = TRUE)
res <- list(a,b)
names(res) <- c(d1,p2)
res
}
To understand why the above approach is "better", it is important to understand what environments are and more generally the R scoping rules. Environments are essentially your workspace. For example, when you first open R and begin assigning objects, these objects are stored within the Global Environment. Another example of an environment is when you call a function, the function creates it own environment comprised of any parameters you have passed to the function. By doing this R ensures that when you call a function, it has no "side effects" or said another way it does not affect the global environment.
Let me show you an example. Imagine you begin an R session, and assign d1 <- 1 in your Global Environment. You're going to want to use d1 later on in your analysis and it would be a shame if it changed without you knowing it, right?
If you utilize AidanGawronSki's approach when you call
partition<-function(d1,p2){
d1 <<- read.table(file = d1, fill = TRUE)
}
The d1 in your Global Environment will change to be read.table(file = d1, fill = TRUE). This is very very dangerous! A object you previously assigned to be one thing is now another thing and you are not even warned of this change.
The same problem, however, will never occur with the approach I have proposed. I strongly recommend you get in the habit of using this approach! If you don't any function can change things in your Global Environment without you knowing.
For more info read this, this or just google something like "functions with no side effects"
FYI there are also several other problems with your code. First you need to tell your function what to return. All you did is call a function, assign stuff to the local environment and then close the function. Functions will always return the last line (as long as it is not an assignment). This is why in my example, I put res as the last line of the function. Also you are not correctly assigning your object. You pass a string like d1 <- "text.txt", to your function and then ask your function to do the following, "text.txt" <- read.table("text.txt",...). That simply does not make sense. You need to assign the output from read.table to an object. In my example, I assign them to a and b.
use the super assignment operator <<-
partition<-function(d1,p2){
d1 <<- read.table(file = d1, fill = TRUE)
p2 <<- read.table(file = p2, fill = TRUE)
}

In R, getting the following error: "attempt to replicate an object of type 'closure'"

I am trying to write an R function that takes a data set and outputs the plot() function with the data set read in its environment. This means you don't have to use attach() anymore, which is good practice. Here's my example:
mydata <- data.frame(a = rnorm(100), b = rnorm(100,0,.2))
plot(mydata$a, mydata$b) # works just fine
scatter_plot <- function(ds) { # function I'm trying to create
ifelse(exists(deparse(quote(ds))),
function(x,y) plot(ds$x, ds$y),
sprintf("The dataset %s does not exist.", ds))
}
scatter_plot(mydata)(a, b) # not working
Here's the error I'm getting:
Error in rep(yes, length.out = length(ans)) :
attempt to replicate an object of type 'closure'
I tried several other versions, but they all give me the same error. What am I doing wrong?
EDIT: I realize the code is not too practical. My goal is to understand functional programming better. I wrote a similar macro in SAS, and I was just trying to write its counterpart in R, but I'm failing. I just picked this as an example. I think it's a pretty simple example and yet it's not working.
There are a few small issues. ifelse is a vectorized function, but you just need a simple if. In fact, you don't really need an else -- you could just throw an error immediately if the data set does not exist. Note that your error message is not using the name of the object, so it will create its own error.
You are passing a and b instead of "a" and "b". Instead of the ds$x syntax, you should use the ds[[x]] syntax when you are programming (fortunes::fortune(312)). If that's the way you want to call the function, then you'll have to deparse those arguments as well. Finally, I think you want deparse(substitute()) instead of deparse(quote())
scatter_plot <- function(ds) {
ds.name <- deparse(substitute(ds))
if (!exists(ds.name))
stop(sprintf("The dataset %s does not exist.", ds.name))
function(x, y) {
x <- deparse(substitute(x))
y <- deparse(substitute(y))
plot(ds[[x]], ds[[y]])
}
}
scatter_plot(mydata)(a, b)

Resources