Change a data frame in an outer scope in a function - r

In fun_outer I created an empty data frame df, and I want to add a new row to df via the inner function fun_inner:
fun_outer <- function() {
df <- data.frame()
fun_inner()
return(df)
}
fun_inner <- function(){
tmp <- data.frame(x = 1 ,y = 2)
df <<- rbind(df, tmp)
}
I would expect executing fun_outer() to return a df like:
x y
1 2
But I actually got an error:
Error in rep_len(xi, nvar) (temp.R#355): attempt to replicate non-vector
Then I tried another approach:
fun_outer <- function() {
df <- data.frame()
fun_inner(df)
return(df)
}
fun_inner <- function(x){
tmp <- data.frame(x = 1 ,y = 2)
df <<- rbind(x, tmp)
}
And this time by executing fun_outer() I got another error:
Error in fun_inner(df) (temp.R#344): cannot change value of locked binding for 'df'
How can I create a data frame in an outer function, and bind row(s) to it using an inner scope function?
My intention is to use an iterator function inside a function A to append the new data from each iteration to a data frame created inside function A.

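If a variable cannot be found in the current function, R looks it up in the environment where the function was defined, not the environment from which it was called. <<- works the same way. Because fun_inner is defined at the top level, df inside it never refers to the data frame created in fun_outer; it resolves to whatever df is visible from the global environment, which by default is the function stats::df. What you want is the parent frame, which is the caller's environment.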
fun_outer <- function() {
df <- data.frame()
fun_inner()
return(df)
}
fun_inner <- function(envir = parent.frame()){
tmp <- data.frame(x = 1 ,y = 2)
envir$df <- rbind(envir$df, tmp)
}
fun_outer()
## x y
## 1 1 2
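An alternative that follows from the same lookup rule, sketched here as an assumption rather than as part of the answer above: if fun_inner is defined inside fun_outer, its enclosing environment is fun_outer's frame, so <<- finds the local df without any use of parent.frame(). The loop of three calls is only there to illustrate repeated appends.

```r
fun_outer <- function() {
  df <- data.frame()

  # Defined inside fun_outer, so the enclosing environment of fun_inner
  # is this call's frame and `<<-` updates the local `df`.
  fun_inner <- function() {
    tmp <- data.frame(x = 1, y = 2)
    df <<- rbind(df, tmp)
  }

  # Append one row per iteration
  for (i in 1:3) fun_inner()
  df
}

result <- fun_outer()
print(result)
```

This keeps the data frame private to fun_outer, at the cost of fun_inner no longer being reusable from elsewhere.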

Related

How do I make dataframes in a for loop in R?

I want to create dataframes in a for loop where every dataframe gets a value specified in a vector. It seems very simple but for some reason I cannot find the answer.
So what I want is something like this:
x <- c(1,2,3)
for (i in x) {
df_{{i}} <- ""
return df_i
}
The result I want is:
df_1
df_2
df_3
So df_{{i}} should be something else but I don't know what.
EDIT: I have solved my problem by creating a list of lists like this:
function_that_creates_model_output <- function(var) {
output_function <- list()
output_function$a <- df_a %>% something(var)
output_function$b <- df_b %>% something(var)
return(output_function)
}
meta_output <- list()
for (i in x) {
meta_output[[i]] <- function_that_creates_model_output(var = i)
}
One solution would be to use the function assign:
x <- c(1,2,3)
for (i in x) {
assign(x = paste0("df_",i),value = NULL)
}
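A minimal sketch of how variables created this way can be read back, and of the named-list alternative that the edit in the question ends up using. The column name id and the data frame contents are placeholders chosen for illustration, not taken from the question.

```r
x <- c(1, 2, 3)

# Create df_1, df_2, df_3 in the current environment
for (i in x) {
  assign(paste0("df_", i), data.frame(id = i))
}

# A variable created by assign() is retrieved by name with get()
second <- get("df_2")

# Alternative: keep the data frames in one named list instead of
# separate variables, which is easier to loop over later
dfs <- setNames(
  lapply(x, function(i) data.frame(id = i)),
  paste0("df_", x)
)
print(names(dfs))
```

With the list, dfs[["df_2"]] or dfs$df_2 plays the role of the separate variable df_2.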

How can I make a loop that calls dataframes

I wrote the code below to transform the rows of a data frame into columns:
RowsToColums <- function(df)
{
model = list()
for(i in seq_along(df))
{
if(i>4)
{
dataf <- data.frame(names = df[1], Year=colnames(df[i]), index = df[,i:i])
names(dataf)[3]<- toString(df[[3]][2])
names(dataf)[1]<- "Country"
model[[i]] <- dataf
}
}
df <- do.call(rbind, model)
df <- arrange(df, Country)
}
EC_Pop <- RowsToColums(EC_Pop)
EC_GDP <- RowsToColums(EC_GDP)
EC_Inflation <- RowsToColums(EC_Inflation)
ST_Tech_Exp <- RowsToColums(ST_Tech_Exp)
ST_Res_Jour <- RowsToColums(ST_Res_Jour)
ST_Res_Exp <- RowsToColums(ST_Res_Exp)
ST_Res_Pop <- RowsToColums(ST_Res_Pop)
ED_Unempl <- RowsToColums(ED_Unempl)
ED_Edu_Exp <- RowsToColums(ED_Edu_Exp)
But as you can see, I call the same function many times.
I tried to put all these data frames in a list and loop over it like this:
list_a = list(EC_Pop,EC_GDP,EC_Inflation,ST_Tech_Exp,ST_Res_Exp)
for (i in seq_along(list_a))
{
list_a[i] <- RowsToColums(list_a[i])
}
The loop is meant to process each data frame in turn, but it fails with an error:
UseMethod ("arrange_") error:
Inapplicable method for 'arrange_' applied to object of class "NULL"
Does anybody know how to fix this case?
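A likely cause, sketched under the assumption that the loop is otherwise as shown: list_a[i] with single brackets returns a list of length one, not the data frame itself. Inside RowsToColums, seq_along(df) is then 1, the i > 4 branch never runs, model stays empty, and do.call(rbind, model) returns NULL, which is what arrange() receives. The toy data frames and the add_double function below are hypothetical stand-ins for the real data and for RowsToColums.

```r
# Toy stand-ins for the real data frames (hypothetical)
frames <- list(a = data.frame(v = 1:2), b = data.frame(v = 3:5))

# Single brackets keep the list wrapper; double brackets extract the element
print(is.data.frame(frames[1]))    # a list of length 1, not a data frame
print(is.data.frame(frames[[1]]))  # the data frame itself

# rbind-ing an empty list yields NULL, which arrange() cannot handle
print(is.null(do.call(rbind, list())))

# Hypothetical stand-in for RowsToColums: adds a doubled column
add_double <- function(df) transform(df, w = v * 2)

# Loop form, using [[ ]] on both sides of the assignment
for (i in seq_along(frames)) {
  frames[[i]] <- add_double(frames[[i]])
}
```

Calling lapply(frames, add_double) on the original list would give the same result without an explicit loop.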

R: can't add row to a dataframe from within a function

I have a function where I want it to add its result to a dataframe. However, this doesn't seem to work. If I try the following:
testdf <- data.frame(matrix(ncol = 1, nrow = 0))
testfunction <- function() {
testdf[nrow(testdf) + 1,] <- list("a")
}
testfunction()
testdf
[1] matrix.ncol...1..nrow...0.
<0 rows> (or 0-length row.names)
It doesn't add a row. But if I do what's in the testfunction() directly, it works:
testdf[nrow(testdf) + 1,] <- list("a")
testdf
matrix.ncol...1..nrow...0.
1 a
Why is this the case and how can I add a row of data to a dataframe from within a function?
Perhaps the best way to do this would be to pass your data frame to the function as a parameter, and then to return the modified data frame to the caller.
testfunction <- function(df) {
df[nrow(df) + 1,] <- list("a")
return(df)
}
testdf <- data.frame(matrix(ncol = 1, nrow = 0))
testdf <- testfunction(testdf)
testdf
You could also keep everything the same but use the global assignment operator <<-:
testfunction <- function() {
testdf[nrow(testdf) + 1,] <<- list("a") # generally bad
}
But there are potential caveats with doing this, and the first option I gave is preferable.

Forcing the use of a for loop with group_by and mutate()

I have a list of data frames (generated by permuting the column order of an initial data frame) to which I would like to apply complicated calculations using group_by_at() and mutate(). It works well with a single data frame but fails in a for loop, since mutate requires the name of the data frame, and so do some of my calculations. So I thought: let's create a list of data frames that all have the same name and loop over the initial sequence of names. Unfortunately the trick does not work and I get the following message:
Error: object of type 'closure' is not subsettable.
Here is a self-contained example showing all my steps. I think the problem comes from mutate. So, how could I force the use of a for loop with mutate?
data <- read.table(text = 'obs gender ageclass weight year subdata income
1 F 1 10 yearA sub1 1000
2 M 2 25 yearA sub1 1200
3 M 2 5 yearB sub2 1400
4 M 1 11 yearB sub1 1350',
header = TRUE)
library(dplyr)
library(GiniWegNeg)
dataA <- select(data, gender, ageclass)
dataB <- select(data, -gender, -ageclass)
rm(data)
# Generate permutation of indexes based on the number of column in dataA
library(combinat)
index <- permn(ncol(dataA))
# Attach dataA to the previous list of index
res <- lapply(index, function(x) dataA[x])
# name my list keeping track of permutation order in dataframe name
names(res) <- unlist(lapply(res,function(x) sprintf('data%s',paste0(toupper(substr(colnames(x),1,1)),collapse = ''))))
# Create a list containing the name of each data.frame name
NameList <- unlist(lapply(res,function(x) sprintf('data%s',paste0(toupper(substr(colnames(x),1,1)),collapse = ''))))
# Define as N the number of columns/permutation/dataframes
N <- length(res)
# Merge res and dataB for all permutation of dataframes
res <- lapply(res,function(x) cbind(x,dataB))
# Change the name of res so that all data frames are named data
names(res) <- rep("data", N)
# APPLY FOR LOOP TO ALL DATAFRAMES
for (j in NameList){
runCalc <- function(data, y){
data <- data %>%
group_by_at(1) %>%
mutate(Income_1 = weighted.mean(income, weight))
data <- data %>%
group_by_at(2) %>%
mutate(Income_2 = weighted.mean(income, weight))
gini <- c(Gini_RSV(data$Income_1, data$weight), Gini_RSV(data$Income_2,data$weight))
Gini <- data.frame(gini)
colnames(Gini) <- c("Income_1","Income_2")
rownames(Gini) <- c(paste0("Gini_", y))
return(Gini)
}
runOtherCalc <- function(df, y){
Contrib <- (1/5) * df$Income_1 + df$Income_2
Contrib <- data.frame(Contrib)
colnames(Contrib) <- c("myresult")
rownames(Contrib) <- c(paste0("Contrib_", y))
return(Contrib)
}
# Run runCalc over dataframe data by year
df1_List <- lapply(unique(data$year), function(i) {
byperiod <- subset(data, year == i)
runCalc(byperiod, i)
})
# runCalc returns df which then passes to runOtherCalc, again by year
df1_OtherList <- lapply(unique(data$year), function(i) {
byperiod <- subset(data, year == i)
df <- runCalc(byperiod, i)
runOtherCalc(df, i)
})
# Run runCalc over dataframe data by subdata
df2_List <- lapply(unique(data$subdata), function(i) {
bysubdata <- subset(data, subdata == i)
runCalc(bysubdata, i)
})
# runCalc returns df which then passes to runOtherCalc, again by subdata
df2_OtherList <- lapply(unique(data$subdata), function(i) {
bysubdata <- subset(data, subdata == i)
df <- runCalc(bysubdata, i)
runOtherCalc(df, i)
})
# Return all results in separate frames, then append by row in 2 frames
Gini_df1 <- do.call(rbind, df1_List)
Contrib_df1 <- do.call(rbind,df1_OtherList)
Gini_df2 <- do.call(rbind, df2_List)
Contrib_df2 <- do.call(rbind,df2_OtherList)
Gini <- rbind(Gini_df1, Gini_df2)
Contrib <- rbind(Contrib_df1, Contrib_df2)
}
Admittedly, the R error you receive below is a bit cryptic, but it usually means you are subsetting a name that refers to a function rather than to the object you intended.
Error: object of type 'closure' is not subsettable.
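Specifically, it comes from your lapply call: data is not defined as a data frame anywhere globally (only as an argument within runCalc), since you removed it above with rm(data). The name data therefore resolves to the built-in function data(), and data$year tries to subset that function.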
dfList <- lapply(unique(data$year), function(i) {
byperiod <- subset(data, year == i)
runCalc(byperiod, i)
})
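The message can be reproduced in isolation. A minimal sketch, referring to utils::data explicitly so that it does not depend on what is defined in the session:

```r
# utils::data is a function (a closure), not a data frame
is_fun <- is.function(utils::data)
print(is_fun)

# Subsetting a function with `$` raises an error;
# capture its message instead of stopping the script
msg <- tryCatch(
  utils::data$year,
  error = function(e) conditionMessage(e)
)
print(msg)
```

The same thing happens with a bare data$year whenever no object called data exists in the calling scope.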
By the way, the lapply...unique...subset pattern can be replaced with the underused base R grouping function, by().
Gathering from your text and code, I believe you intend to run a year grouping on each data frame of your list, res. If so, consider two by calls wrapped in a larger function that receives a data frame, df, as its parameter. Then run lapply across all items of the list to return a new list of nested data frame pairs.
# SECONDARY FUNCTIONS
runCalc <- function(data) {
data <- data %>%
group_by_at(1) %>%
mutate(Income_1 = weighted.mean(income, weight))
data <- data %>%
group_by_at(2) %>%
mutate(Income_2 = weighted.mean(income, weight))
Gini <- data.frame(
year = data$year[[1]],
Income_1 = unname(Gini_RSV(data$Income_1, data$weight)),
Income_2 = unname(Gini_RSV(data$Income_2, data$weight)),
row.names = paste0("Gini_", data$year[[1]])
)
return(Gini)
}
runOtherCalc <- function(df){
Contrib <- data.frame(
myresult = (1/5) * df$Income_1 + df$Income_2,
row.names = paste0("Contrib_", df$year[[1]])
)
return(Contrib)
}
# PRIMARY FUNCTION
runDfOperations <- function(df) {
gList <- by(df, df$year, runCalc)
gTmp <- do.call(rbind, gList)
cList <- by(gTmp, gTmp$year, runOtherCalc)
cTmp <- do.call(rbind, cList)
gTmp$year <- NULL
return(list(gTmp, cTmp))
}
# RETURNS NESTED LIST OF TWO DFs FOR EACH ORIGINAL DF
new_res <- lapply(res, runDfOperations)
# SEPARATE LISTS IF NEEDED (EQUAL LENGTH)
Gini <- lapply(new_res, "[[", 1)
Contrib <- lapply(new_res, "[[", 2)
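As a smaller illustration of the by() pattern used above, here is a sketch on a toy data frame that needs no extra packages. The function summarise_year and the column mean_income are hypothetical names; the values are taken from the question's sample data.

```r
toy <- data.frame(
  year   = c("yearA", "yearA", "yearB", "yearB"),
  income = c(1000, 1200, 1400, 1350),
  weight = c(10, 25, 5, 11)
)

# Called once per group; receives the rows of that group as a data frame
summarise_year <- function(d) {
  data.frame(
    year        = d$year[[1]],
    mean_income = weighted.mean(d$income, d$weight)
  )
}

# by() splits toy on year and applies the function to each piece
by_list <- by(toy, toy$year, summarise_year)

# Stack the per-group results into one data frame
out <- do.call(rbind, by_list)
print(out)
```

This replaces the lapply(unique(...), function(i) subset(...)) idiom with a single call that does the splitting itself.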

lapply() changing global variable in R

Using R, I wanted to save each variable's value when running lapply().
Below is what I have tried so far:
list_C <- list()
list_D <- list()
n <- 1
data_partition <- split(data, with(data, paste(A, B, sep=":")))
final_result <- lapply(data_partition,
function(dat) {
if(... condition ...) {
<Some R codes to run>
list_C[[n]] <- dat$C
list_D[[n]] <- dat$D
n <- n + 1
}
})
However, after running the code, 'n' is still 1 and the lists have not changed. How can I update 'n' so that 'list_C' and 'list_D' are filled correctly?
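The assignments inside the anonymous function create local copies of n, list_C and list_D that are discarded when each call returns, which is why the globals never change. One way around this, sketched under assumptions: filter the partitions first and let lapply return the values, so no counter is needed. The sample data and the condition sum(dat$C) > 3 are hypothetical stand-ins for the elided parts of the question.

```r
# Hypothetical sample data with the columns used in the question
data <- data.frame(
  A = c("a", "a", "b", "b"),
  B = c("x", "x", "y", "y"),
  C = 1:4,
  D = 5:8
)
data_partition <- split(data, with(data, paste(A, B, sep = ":")))

# Hypothetical stand-in for the elided condition
keep <- function(dat) sum(dat$C) > 3

# Keep only the partitions that satisfy the condition
kept <- Filter(keep, data_partition)

# Build the result lists from return values instead of side effects
list_C <- lapply(kept, function(dat) dat$C)
list_D <- lapply(kept, function(dat) dat$D)
print(list_C)
```

Replacing <- with <<- inside the original function would also modify the globals, but returning values keeps the function free of side effects.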
