Object not found - nested function - R

I am still getting used to writing functions. I had a look at the environments documentation, but I can't figure out how to solve the error. Here is what I have tried so far:
I have a list of documents. Let's suppose it is called "core":
library(dplyr)
table_1 <- data.frame(replicate(10,sample(0:1,1000,rep=TRUE)))
table_2 <- data.frame(replicate(10,sample(0:1,1000,rep=TRUE)))
core <- list(table_1, table_2)
Then, I have to run the function documents_ for each element of the list. This function sets up some parameters and passes them to another, nested function:
documents_ <- function(i) {
  core_processed <- as.data.frame(core[[i]])
  x <- 1:nrow(core_processed)
  y <- 1:ncol(core_processed)
  temp <- sapply(x, function(x) mapply(calc_dens_, x, y))
  return(temp)
}
Inside that, there is the function calc_dens_, which is:
calc_dens_ <- function(x, y) {
  core_temp <- core_processed %>%
    filter(X2 == x & X3 == y)
  return(core_temp)
}
Then, to iterate over each element of the list, I tried the following, without success:
calc <- lapply(c(1:2), function(i) documents_(i))
Error in eval(lhs, parent, parent) : object 'core_processed' not found
The calc_dens_ function can't see the variables created in documents_ (an environment problem). Is there a way to solve this, or a better approach? My real function is more complex than this, but the main elements are in this example. Thank you in advance.

As the other commenters have said, the problem is that you are referring to a variable, core_processed, that is not in scope. You could make it a global variable, but it is more sensible to capture it in a closure, like this:
table_1 <- data.frame(replicate(10,sample(0:1,1000,rep=TRUE)))
table_2 <- data.frame(replicate(10,sample(0:1,1000,rep=TRUE)))
cores <- list(table_1, table_2)
documents_ <- function(core_processed) {
  x <- 1:nrow(core_processed)
  y <- 1:ncol(core_processed)
  calc_dens <- function(x, y) core_processed %>% filter(X2 == x & X3 == y)
  sapply(x, function(x) mapply(calc_dens, x, y))
}
calc <- lapply(cores, documents_)
If cores is a list of data frames, you do not need to use as.data.frame, and since you use lapply, there is no need to apply over indices and then index into the list. So the code I wrote here is simplified, but it does the same as your code.
I have to wonder, though: is this really what you want? The sapply over x and then mapply over x and y -- where x is the one from the sapply and not the list you built in documents_ -- looks mighty strange to me.
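If the intent is to evaluate calc_dens once for every (x, y) combination, a more explicit sketch (this is an assumption about the intent, not something the question states) would build the full grid first:
documents_grid <- function(core_processed) {
  # enumerate every (row index, column index) pair explicitly
  grid <- expand.grid(x = 1:nrow(core_processed), y = 1:ncol(core_processed))
  calc_dens <- function(x, y) core_processed %>% filter(X2 == x & X3 == y)
  # apply calc_dens to each pair; the result is a list of filtered data frames
  Map(calc_dens, grid$x, grid$y)
}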


Loop with left join gives me the error: no applicable method for 'left_join' applied to an object of class "list"

I am trying to create a loop that left-joins 5 data frames, like this:
c <- list(EC_Pop, EC_GDP, EC_Inflation, ST_Tech_Exp, ST_Res_Jour)
for (i in seq_along(c)) {
  if (i < 2) {
    EC_New <- c[i] %>%
      left_join(c[i+1], by = c("Country", "Year"))
  } else if (i > 1 & i < 4) {
    EC_New <- EC_New %>%
      left_join(c[i+1], by = c("Country", "Year"))
  } else {
    EC_New
  }
}
But I get an error: Error in UseMethod("left_join") : no applicable method for 'left_join' applied to an object of class "list"
Can somebody explain the reason? The way I wrote it seems very logical to me...
According to the documentation of left_join, both x and y must be data frames.
Your c is a list, and so is c[i].
However, c[[i]] is a data frame. So change your code to include two square brackets.
EC_New <- c[[i]] %>%
  left_join(c[[i+1]], by = c("Country", "Year"))
I think you can also replace your code using Reduce:
EC_New2 <- Reduce(left_join, c)
Then check:
identical(EC_New, EC_New2) # should be TRUE
But I'm not sure since I don't have your data. It should work if the common columns are only "Country" and "Year".
And thanks to this answer, you can use the following command if the "Country" and "Year" are not the only common columns.
EC_New2 <- Reduce(function(x, y) left_join(x, y, by=c("Country","Year")), c)
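To see what the fold is doing, here is a minimal, self-contained sketch with made-up data frames (all names and values are hypothetical, purely for illustration):
library(dplyr)
a <- data.frame(Country = "A", Year = 2020, pop = 10)
b <- data.frame(Country = "A", Year = 2020, gdp = 99)
e <- data.frame(Country = "A", Year = 2020, inflation = 2)
joined <- Reduce(function(x, y) left_join(x, y, by = c("Country", "Year")),
                 list(a, b, e))
joined
#   Country Year pop gdp inflation
# 1       A 2020  10  99         2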
By the way, try not to use function names such as c to name your R objects. While R allows this, it can lead to confusion later. For example, if you want to concatenate x and y but accidentally type c[x, y] instead of c(x, y), R may not return an error but something totally unexpected.

R - Creating function call within function using relational operator as variable

I am trying to write a function that will apply a user-specified binary operator (e.g. < ) to a raster object. To do so is fairly simple. For example:
selection <- raster::overlay(x = data, fun = function(x) {return(x < 2)})
My issue is that this code will be running within a function, in which I would like to specify both the binary operator and the criterion value (which is 2 in the example above) as variables. For example:
my.func <- function(data, binary_operator, value){
  # pseudocode: binary_operator stands in for <, >, etc.
  selection <- raster::overlay(x = data, fun = function(x) {x binary_operator value})
  return(selection)
}
I have tried to construct the function as a call without success.
my.func <- function(data, binary_operator, value){
  selection <- raster::overlay(x = data, fun = function(x) {call(sprintf("x %s %s", binary_operator, value))})
  return(selection)
}
Is there a way to construct the call of the second function using variables in the first function?
Thanks for your help.
Write your code like this:
my.func <- function(data, binary_operator, value){
  selection <- raster::overlay(x = data, fun = function(x) binary_operator(x, value))
  return(selection)
}
You need to call this as
my.func(data, `<`, 2)
(with backticks for quotes). If you want to allow "<" for the operator, you could use do.call:
my.func <- function(data, binary_operator, value){
  selection <- raster::overlay(x = data, fun = function(x)
    do.call(binary_operator, list(x, value)))
  return(selection)
}
This will work with either form of argument.
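As a side note, the do.call pattern works independently of raster; here is a standalone sketch on plain vectors (apply_op is a hypothetical name):
apply_op <- function(x, binary_operator, value) {
  # do.call accepts the operator as a function (`<`) or as a string ("<")
  do.call(binary_operator, list(x, value))
}
apply_op(1:5, `<`, 3)  # TRUE TRUE FALSE FALSE FALSE
apply_op(1:5, "<", 3)  # same result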
The example is probably simpler than the real case, but in the example you give, it would be more direct to do:
selection <- data < 2

Error message when using lapply to apply a function to multiple dataframes in a list

I have a list of data frames, and each dataset looks like this:
Plot_ID Canopy_infection_rate DAI
1 YO01 5 7
2 YO01 8 14
3 YO01 10 21
What I want to do is to apply a function called "audpc_Canopyinfactionrate" to a list of dataframes.
However, when I run lapply, I get an error as below:
Error in FUN(X[[i]], ...) : argument "DAI" is missing, with no default
I've checked my list, and the data does not have a shifted column.
Does anyone know what's wrong with it? Thanks
Here is part of my code:
#Read files into a list (`files` is assumed to be a character vector of file paths)
lst <- list()
for(i in seq_along(files)) {
  lst[[i]] <- read.delim(files[i], header = TRUE, sep = " ")
}
#Apply a function to the list
densities <- lapply(lst, audpc_Canopyinfactionrate)
#canopy infection rate
audpc_Canopyinfactionrate <- function(Canopy_infection_rate, DAI){
  n <- length(DAI)
  meanvec <- matrix(-1, (n-1))
  intvec <- matrix(-1, (n-1))
  for(i in 1:(n-1)){
    meanvec[i] <- mean(c(Canopy_infection_rate[i],
                         Canopy_infection_rate[i+1]))
    intvec[i] <- DAI[i+1] - DAI[i]
  }
  infprod <- meanvec * intvec
  sum(infprod)
}
As pointed out in the comments, the problem lies in the way you are using lapply.
This function is built up like this: lapply(X, FUN, ...). FUN is the name of a function used to apply to the elements in a data.frame/list called X. So far so good.
Back to your case: You want to apply a function audpc_Canopyinfactionrate() to all data frames in lst. This function takes two arguments. And I think this is where things got mixed up in your code. Make sure you understand that in the way you are using lapply, you use lst[[1]], lst[[2]], etc. as the only argument in audpc_Canopyinfactionrate(), whereas it actually requires two arguments!
If you reformulate your function a bit, you can use lst[[1]], lst[[2]] as the only argument to your function, because you know that argument contains the columns you need - Canopy_infection_rate and DAI:
audpc_Canopyinfactionrate <- function(df){
  n <- nrow(df)
  meanvec <- matrix(-1, (n-1))
  intvec <- matrix(-1, (n-1))
  for(i in 1:(n-1)){
    meanvec[i] <- mean(c(df$Canopy_infection_rate[i],
                         df$Canopy_infection_rate[i+1]))
    intvec[i] <- df$DAI[i+1] - df$DAI[i]
  }
  infprod <- meanvec * intvec
  return(sum(infprod))
}
Call lapply in the following way:
lapply(lst, audpc_Canopyinfactionrate)
Note: lapply can also be used with more than one argument, by using the ... in lapply(X, FUN, ...). In your case, however, I think this is not the best option.
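For completeness, here is a generic sketch of that mechanism (take_head, lst2, and the data are made up for illustration):
take_head <- function(df, n) head(df, n)
lst2 <- list(data.frame(a = 1:5), data.frame(a = 6:10))
lapply(lst2, take_head, n = 2)  # n = 2 is forwarded to every call via ...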

dplyr and overlapping variable names with surrounding environment

Let's say I have a (dplyr/tibble) data-frame/tbl constructed like so:
df <- data_frame(x = 1:10)
Now, I'd like to use this within a function that works with df via some dplyr verbs, like so:
myfun <- function(df, x) {
  x <- doSomeStuffTo(x)
  filter(df, x == x)
}
But this will always return the full df... I'm trying to figure out a way to implement scoping within a dplyr verb, something like:
filter_(df, ~x == x)
... which doesn't work, either. In some other languages, you might be able to achieve this via something like:
df.filter(this.x == x)
... where this refers to the df instance.
My only work-around so far is naming the function's variable like so:
myfun <- function(df, query_x) {
  query_x <- doSomeStuffTo(query_x)
  filter(df, x == query_x)
}
I suspect this is doable (without using a name like query_x) somehow with SE dplyr verbs (e.g. filter_), but I haven't stumbled upon the correct pattern yet. Anyone here have the answer?
To dynamically build different dplyr commands you typically use the standard evaluation versions of the functions (the ones with the underscores) and the lazyeval package. Here's how you could change your function
doSomeStuffTo <- function(x) { x + 1 }
myfun <- function(df, x) {
  x <- doSomeStuffTo(x)
  filter_(df, lazyeval::interp(~x == y, y = x))
}
df <- data_frame(x = 1:10)
myfun(df, 3)
but even in the interp we can't have x==x because it's not clear which x you want to replace. Both filter(df, 3==x) and filter(df, x==3) work with dplyr. You can have constants or column names on either side of the equality.
If you use filter_ you can pass logical expressions via quote:
myfun <- function(df, t) {
  df$x <- 5 * df$x
  filter_(df, t)
}
> myfun(df, t= quote(x < 25) )
# A tibble: 4 x 1
x
<dbl>
1 5
2 10
3 15
4 20
I stumbled into the same issue. Instead of wrangling with even more complex evaluations, it's usually easier to just rename the function argument. Like this:
myfun <- function(df, x) {
  x_ <- doSomeStuffTo(x)
  filter(df, x == x_)
}
This solution is still dangerous because we might hit another variable called x_. One can be defensive about this by checking the variable names in df and making sure to pick one that isn't there. Or more lazily, one can use very implausible variable names. I often use stuff like _____temp.
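A tiny helper along those lines might look like this (safe_name is a hypothetical name; just a sketch of the idea):
safe_name <- function(df, candidate = "x_") {
  # append underscores until the candidate matches no existing column
  while (candidate %in% names(df)) candidate <- paste0(candidate, "_")
  candidate
}
safe_name(data.frame(x = 1, x_ = 2))  # returns "x__"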
Maybe the new dplyr 0.6.0 evaluation system will handle this better. See the notes about the new system, tidyeval.
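For reference, a minimal sketch in that tidyeval style (assuming a dplyr with tidyeval support, i.e. version 0.7.0 or later; the unquote operator !! forces evaluation in the function environment):
library(dplyr)
doSomeStuffTo <- function(x) { x + 1 }
myfun <- function(df, x) {
  x_val <- doSomeStuffTo(x)
  # !! substitutes the value of x_val, so the left-hand x can only be the column
  filter(df, x == !!x_val)
}
myfun(data_frame(x = 1:10), 3)  # returns the row where x == 4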

Fixing a function to find and remove outliers from the dataset

I am trying to make a simple function which will find and remove outliers automatically. This is the function I have created so far:
fOutlier <- function(x, y) {
  outlier <- with(x, boxplot.stats(y)$out)
  subset(x, !(y %in% outlier))
}
data <- fOutlier(data, variable)
The problem is that the function does not read x as the dataset name. It works if I use the following:
data <- fOutlier(data, data$variable)
Non-standard evaluation seems to be the culprit.
This is what I would personally do.
set.seed(1)
# mock data set
d <- data.frame(var1 = rnorm(1000, 500, 50),
                var2 = rnorm(1000, 1000, 100),
                var3 = rnorm(1000, 1000, 100),
                var4 = rnorm(1000, 1000, 100))
fOutlier <- function(dat, var_name){
  var_vec <- dat[, var_name]
  outliers <- boxplot.stats(var_vec)$out
  dat[!(var_vec %in% outliers), ]  # the last expression is returned
}
# test with different variables
d_var1_clean <- fOutlier(d, 'var1')
d_var2_clean <- fOutlier(d, 'var2')
d_var3_clean <- fOutlier(d, 'var3')
If you really like the non-standard evaluation, then you can add eval() and substitute() to maintain this functionality.
This function is a workable version of what you posted (note the creation of y_vec):
fOutlier2 <- function(x, y) {
  y_vec <- eval(substitute(y), eval(x))
  outlier <- boxplot.stats(y_vec)$out
  subset(x, !(y_vec %in% outlier))
}
d_var1_clean2 <- fOutlier2(d, var1)
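As a quick sanity check (assuming both versions were run on the same d from above), the two approaches should agree:
identical(d_var1_clean, d_var1_clean2)  # should be TRUE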
