R: can't add row to a dataframe from within a function

I have a function where I want it to add its result to a dataframe. However, this doesn't seem to work. If I try the following:
testdf <- data.frame(matrix(ncol = 1, nrow = 0))
testfunction <- function() {
testdf[nrow(testdf) + 1,] <- list("a")
}
testfunction()
testdf
[1] matrix.ncol...1..nrow...0.
<0 rows> (or 0-length row.names)
It doesn't add a row. But if I do what's in the testfunction() directly, it works:
testdf[nrow(testdf) + 1,] <- list("a")
testdf
matrix.ncol...1..nrow...0.
1 a
Why is this the case and how can I add a row of data to a dataframe from within a function?

Perhaps the best way to do this would be to pass your data frame to the function as a parameter, and then to return the modified data frame to the caller.
testfunction <- function(df) {
df[nrow(df) + 1,] <- list("a")
return(df)
}
testdf <- data.frame(matrix(ncol = 1, nrow = 0))
testdf <- testfunction(testdf)
testdf
You could also keep everything the same but use the global assignment operator <<-:
testfunction <- function() {
testdf[nrow(testdf) + 1,] <<- list("a") # generally bad
}
But <<- changes a variable outside the function as a side effect, which makes the code harder to follow and debug, so the first option I gave is preferable.
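The reason the original version has no visible effect is that an assignment with <- inside a function creates or modifies a local variable: the function works on its own copy of testdf, and that copy is discarded when the function returns. A minimal sketch of this behaviour, where the function names add_row_local and add_row_returned are made up for illustration:

```r
# Same empty one-column data frame as in the question.
testdf <- data.frame(matrix(ncol = 1, nrow = 0))

# Modifies a local copy only; the global `testdf` is untouched.
add_row_local <- function() {
  testdf[nrow(testdf) + 1, ] <- list("a")
  nrow(testdf)  # number of rows in the local copy
}

# Takes the data frame as an argument and returns the modified copy.
add_row_returned <- function(df) {
  df[nrow(df) + 1, ] <- list("a")
  df
}

rows_inside  <- add_row_local()  # counted inside the function
rows_outside <- nrow(testdf)     # the global data frame is still empty

testdf <- add_row_returned(testdf)  # reassigning makes the change stick
```

Comparing rows_inside with rows_outside shows that the row only ever existed inside the function call.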

Related

Change a data frame in an outer scope in a function

In fun_outer I created an empty data frame df, and I want to add a new row to df via the inner function fun_inner:
fun_outer <- function() {
df <- data.frame()
fun_inner()
return(df)
}
fun_inner <- function(){
tmp <- data.frame(x = 1 ,y = 2)
df <<- rbind(df, tmp)
}
I would expect that executing fun_outer() could return a df like:
x y
1 2
But I actually got an error:
Error in rep_len(xi, nvar) (temp.R#355): attempt to replicate non-vector
Then I tried another approach:
fun_outer <- function() {
df <- data.frame()
fun_inner(df)
return(df)
}
fun_inner <- function(x){
tmp <- data.frame(x = 1 ,y = 2)
df <<- rbind(x, tmp)
}
And this time by executing fun_outer() I got another error:
Error in fun_inner(df) (temp.R#344): cannot change value of locked binding for 'df'
How can I create a data frame in an outer function, and bind row(s) to it using an inner scope function?
My intention was to use an iterator function inside a function A to append new data from each iteration to a data frame created inside function A
If a variable cannot be found in the current function, it is looked up in the environment where the function was defined, not the environment from which it was called. <<- works the same way. What you want is the parent frame, which is the caller's environment.
fun_outer <- function() {
df <- data.frame()
fun_inner()
return(df)
}
fun_inner <- function(envir = parent.frame()){
tmp <- data.frame(x = 1 ,y = 2)
envir$df <- rbind(envir$df, tmp)
}
fun_outer()
## x y
## 1 1 2
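The lookup rule described in this answer can be checked with a small sketch. All function names here (show_x, caller, show_callers_x, caller2) are illustrative. As a side note, and as an assumption rather than something the question confirms: since no df exists in the global environment, the name df inside fun_inner most likely resolves to the function stats::df, which would explain both error messages.

```r
x <- "global"

# `show_x` is defined at top level, so its free variables are looked up there.
show_x <- function() x

caller <- function() {
  x <- "local to caller"
  show_x()  # still sees the top-level `x`, not the caller's
}

# parent.frame() gives access to the caller's environment instead.
show_callers_x <- function(envir = parent.frame()) envir$x

caller2 <- function() {
  x <- "local to caller2"
  show_callers_x()
}

lexical_result <- caller()   # value found where show_x was defined
caller_result  <- caller2()  # value found in the calling function
```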

Applying a Function to a Data Frame : lapply vs traditional way

I have this data frame in R:
x <- seq(1, 10,0.1)
y <- seq(1, 10,0.1)
data_frame <- expand.grid(x,y)
I also have this function:
some_function <- function(x,y) { return(x+y) }
Basically, I want to create a new column in the data frame based on "some_function". I thought I could do this with the "lapply" function in R:
data_frame$new_column <-lapply(c(data_frame$x, data_frame$y),some_function)
This does not work:
Error in `$<-.data.frame`(`*tmp*`, f, value = list()) :
replacement has 0 rows, data has 8281
I know how to do this in a more "clunky and traditional" way:
data_frame$new_column = x + y
But I would like to know how to do this using "lapply" - in the future, I will have much more complicated and longer functions that will be a pain to write out like I did above. Can someone show me how to do this using "lapply"?
Thank you!
When working within a data.frame you could use apply instead of lapply:
x <- seq(1, 10,0.1)
y <- seq(1, 10,0.1)
data_frame <- expand.grid(x,y)
head(data_frame)
some_function <- function(x,y) { return(x+y) }
data_frame$new_column <- apply(data_frame, 1, \(x) some_function(x["Var1"], x["Var2"]))
head(data_frame)
To apply a function to rows, set MARGIN = 1; to apply a function to columns, set MARGIN = 2.
lapply, as the name suggests, is a list-apply. As a data.frame is a list of columns, you can use it to compute over columns, but within rectangular data apply is often the easiest.
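A small sketch of the two margins, using a made-up data frame called scores:

```r
scores <- data.frame(a = 1:3, b = 4:6)

row_totals <- apply(scores, 1, sum)  # MARGIN = 1: one result per row
col_totals <- apply(scores, 2, sum)  # MARGIN = 2: one result per column
```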
If some_function is written for that specific purpose, it can be written to accept a single row of the data.frame as in
x <- seq(1, 10,0.1)
y <- seq(1, 10,0.1)
data_frame <- expand.grid(x,y)
head(data_frame)
some_function <- function(row) { return(row[1]+row[2]) }
data_frame$yet_another <- apply(data_frame, 1, some_function)
head(data_frame)
Final comment: Functions written for a single pair of values often turn out to be perfectly vectorized. Probably the best way to call some_function is without any function of the apply family, as in
some_function <- function(x,y) { return(x + y) }
data_frame$last_one <- some_function(data_frame$Var1, data_frame$Var2)
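For the more complicated functions mentioned in the question, which may not be vectorized, mapply (the multi-argument relative of lapply) is one option. The sketch below assumes a hypothetical function larger_or_sum that only works on single values because it uses if:

```r
x <- seq(1, 3, 0.5)
y <- seq(1, 2, 0.5)
data_frame <- expand.grid(x, y)  # columns are named Var1 and Var2

# A deliberately non-vectorized function: `if` needs a single TRUE/FALSE.
larger_or_sum <- function(a, b) if (a > b) a else a + b

# mapply calls the function once per pair of elements.
data_frame$new_column <- mapply(larger_or_sum, data_frame$Var1, data_frame$Var2)
```

This calls the function once per row, so it is slower than a vectorized call and is worth using only when the function cannot be vectorized.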

R How to transform a for loop with recursive lag into a function?

I can compute a recursive variable with a for loop like this:
df <- as.data.frame(cbind(1:10))
df$it <- NA
for(i in 1:length(df$V1)) {
df$it <- 0.5*df$V1 + dplyr::lag(df$it, default = 0)
}
df
But how can I do this with a function on the fly?
The following produces an error "C stack usage 15924032 is too close to the limit":
adstwm <- function(x){
0.5*x + adstwm(dplyr::lag(x, default = 0))
}
adstwm(df$V1)
I guess I need to define a stop for the process, but how?
You can use the cumulative sum to achieve the desired result. This sums all preceding values in a vector which is effectively the same as your recursive loop:
df$it_2 <- cumsum(0.5*df$V1)
If you do want to write a recursive function, for example to address the comment below, you can include an if statement that makes it stop:
function_it <- function(vector, length) {
if (length == 1) 0.5*vector[length]
else 0.5*vector[length] + 0.1*function_it(vector, length-1)
}
df$it <- NA
for(i in 1:length(df$V1)) {
df$it[i] <- function_it(df$V1,i)
}
df
However, you still need the for loop since the function is not vectorised, so I am not sure it really helps.
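One way to get the same values without either the explicit loop or the recursion is base R's Reduce with accumulate = TRUE. This is only a sketch: the helper name step is made up, and the 0.1 weight is taken from function_it above.

```r
V1 <- 1:10

# Each step combines the previous result (`prev`) with the current value.
step <- function(prev, value) 0.5 * value + 0.1 * prev

# accumulate = TRUE keeps every intermediate result. The first element is
# the starting value 0, which is dropped with [-1].
it_reduce <- unlist(Reduce(step, V1, 0, accumulate = TRUE))[-1]
```

The result has one element per element of V1 and can be assigned directly to a column such as df$it.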
I can put the lagged for loop into the function without recursing the function:
df <- as.data.frame(cbind(1:10))
func.it2 <- function(x){
df$it2 <- NA
for(i in 1:length(df$V1)) {
df$it2 <- 0.5*df$V1 + 0.1*dplyr::lag(df$it2, default = 0)
}
df$it2
}
func.it2(df$V1)
df
df$it2 <- func.it2(df$V1)
df
That works. But why is df$it2 not available as a variable in df until I declare it (second last line) although it was already available in the for loop (line 5)?
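A likely explanation, in line with the first answer on this page: func.it2 ignores its argument x and modifies a local copy of df, so the global df only changes once the returned vector is assigned to it. The sketch below uses the argument instead and avoids the dplyr dependency. The names lag0 and func_it3 are illustrative, and lag0 is a base R stand-in for dplyr::lag(v, default = 0).

```r
# Shift a vector by one position, filling the first slot with 0.
lag0 <- function(v) c(0, head(v, -1))

func_it3 <- function(x) {
  it <- rep(NA_real_, length(x))
  for (i in seq_along(x)) {
    it <- 0.5 * x + 0.1 * lag0(it)
  }
  it  # returned to the caller; nothing outside the function is modified
}

df <- data.frame(V1 = 1:10)
result <- func_it3(df$V1)                  # df is unchanged at this point
has_column_before <- "it3" %in% names(df)  # FALSE: no such column yet
df$it3 <- result                           # the assignment adds the column
```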

Loop through df and create new df in R

I have a df (10 rows, 15 columns)
df<-data.frame(replicate(15,sample(0:1,10,rep=TRUE)))
I want to loop over each column, do something to each row and create a new df with the answer.
I actually want to do a linear regression on each column. I get back a list for each column. For example, I have a second df with what I want to put into the lm:
df2<-data.frame(replicate(2,sample(0:1,10,rep=TRUE)))
I then want to do something like:
new_df <- data.frame()
for (i in 1:ncol(df)){
j<-lm(df[,i] ~ df2$X1 + df2$X2)
temp_df<-j$residuals
new_df[,i]<-cbind(new_df,temp_df)
}
I get the error:
Error in data.frame(..., check.names = FALSE) : arguments imply
differing number of rows: 0, 8
I have checked other similar posts but they always seem to involve a function or something similarly complex for a newbie like me. Please help
This can be done without loops but for your understanding, using loops we can do
new_df <- df
for (i in names(df)) {
j<-lm(df[,i] ~ df$X1 + df$X2)
new_df[i] <- j$residuals
}
new_df is initialised as an empty data frame with 0 rows and 0 columns, so assigning a column of values to it gives an error. Instead, assign the original df to new_df, since both are going to share the same structure, and then use the above.
Update
Based on the new example
lst1 <- lapply(names(df), function(nm) {dat <- cbind(df[nm], df2[c('X1', 'X2')])
lm(paste0(nm, "~ X1 + X2"), data = dat)$residuals})
out <- setNames(data.frame(lst1), names(df))
Also, this doesn't need any loop
out2 <- lm(as.matrix(df) ~ X1 + X2, data = cbind(df, df2))$residuals
Old
We can do this easily without any loop
new_df <- df + 10
---
If we need a loop, it can be done with `lapply`
new_df <- df
new_df[] <- lapply(df, function(x) x + 10)
---
Or with a `for` loop
lst1 <- vector('list', ncol(df))
for(i in seq_along(df)) lst1[[i]] <- df[, i] + 10
new_df <- as.data.frame(lst1)
data
set.seed(24)
df <- data.frame(replicate(15,sample(0:1,10,rep=TRUE)))
df2 <- data.frame(replicate(2,sample(0:1,10,rep=TRUE)))
I would do as suggested by akrun. But if you do need (or want) to loop for some reasons you can use:
df<-data.frame(replicate(15,sample(0:1,10,rep=TRUE)))
new_df <- data.frame(replicate(15, rep(NA, 10)))
for (i in 1:ncol(df)){
new_df[ ,i] <- df[ , i] + 10
}
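The residual computation from the question can also be sketched with sapply. The names responses and predictors are hypothetical stand-ins for df and df2, chosen so that the X1 and X2 in the formula can only refer to the predictor columns.

```r
set.seed(24)
responses  <- data.frame(replicate(15, sample(0:1, 10, replace = TRUE)))
predictors <- data.frame(replicate(2, sample(0:1, 10, replace = TRUE)))

# One regression per response column. sapply collects the residual vectors
# into a matrix with one column per response.
resid_matrix <- sapply(responses, function(col) {
  residuals(lm(col ~ X1 + X2, data = predictors))
})

new_df <- as.data.frame(resid_matrix)
```

Each column of new_df then holds the residuals for the response column of the same name.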

Passing variable names to function in R

I have a subset function that takes in an object of a user defined class, a condition passed to the function, and adds that condition as an attribute of the object.
subset.survey.data.frame <- function(x, condition, drop=FALSE, inside=FALSE) {
if(inside) {
condition_call <- deparse(substitute(condition, env=parent.frame(n=1)))
}
else {
condition_call <- substitute(condition)
}
x[["user_conditions"]] <- unique(c(x[["user_conditions"]],list(condition_call)))
cat("Subset Conditions have been added to SDF")
x
}
I can call this function as:
sdf <- subset.survey.data.frame(sdf,dsex =="Male")
This adds dsex == "Male" in user_conditions attribute.
However, if I call it from within another function and a loop, it passes v1 and v2 instead of the actual variable names.
for(i in 1:length(lvls)) {
v1 <- rhs_vars[1]
v2 <- lvls[i]
print(v1) #"dsex"
print(v2) #"Male"
dsdf <- subset.survey.data.frame(sdf, v1 == v2, inside=TRUE)
}
How can I modify the subset function so that I can get the names of v1 and v2 and then add the condition to the object?
Here is what SDF, lvls, and rhs_vars looks like
sdf <- list(user_conditions = list(),default_conditions = list(default_conditions) ,data = data_Laf, weights=weights, pvvars=pvs, fileDescription = f)
Here, data_Laf is an LaF object (http://cran.r-project.org/web/packages/LaF/index.html), weights, pvs, and f are all lists.
rhs_vars <- rhs.vars(y ~ dsex + b017451) # from formula.tools package
> rhs_vars
[1] "dsex" "b017451"
lvls is the levels of a column in a dataframe
lvls <- levels(data[,rhs_vars[1]])
"Male" "Female"
Here is a working example:
default_conditions= quote(rptsamp=="Reporting sample")
sdf <- list(user_conditions = list(),default_conditions = list(default_conditions))
class(sdf) <- "Userdefined"
subset.survey.data.frame <- function(x, condition, drop=FALSE, inside=FALSE) {
if(inside) {
condition_call <- deparse(substitute(condition, env=parent.frame(n=1)))
}
else {
condition_call <- substitute(condition)
}
x[["user_conditions"]] <- unique(c(x[["user_conditions"]],list(condition_call)))
cat("Subset Conditions have been added to X")
x
}
sdf <- subset.survey.data.frame(sdf,dsex =="Male")
print(sdf)
#This gives the correct answer and adds dsex == "Male" to user conditions
#Creating some sample data
dsex =c('1','2','1','1','2','1','1','2','1')
b017451 <- sample(c(1:100), 9)
y <- rep(10, 9)
data <- data.frame(dsex, y, b017451)
data[,'dsex'] <- factor(data[,'dsex'], levels=c("1", "2"), labels=c('Male','Female'))
require(formula.tools)
rhs_vars <- rhs.vars(y ~ dsex + b017451)
lvls <- levels(data[,rhs_vars[1]])
for(i in 1:length(lvls)) {
v1 <- rhs_vars[1]
v2 <- lvls[i]
print(v1) #"dsex"
print(v2) #"Male"
dsdf <- subset.survey.data.frame(sdf, v1 == v2, inside=F)
print(dsdf)
#this doesnt give the correct answer and adds v1 == v2 to user conditions
break
}
As @nrussell was alluding to, substitute should help you build your expressions. Then you just need to evaluate them. Here's a simple example
v1 <- quote(cyl)
v2 <- 6
eval(substitute(subset(mtcars, v1==v2), list(v1=v1, v2=v2)))
If your v1 is of class character, you can convert it to a symbol via as.name(), because you need a symbol and not a character for the expression to work.
v1 <- "cyl"
v2 <- 6
eval(substitute(subset(mtcars, v1==v2), list(v1=as.name(v1), v2=v2)))
If you're controlling the "inside" parameter, then isn't it as simple as:
if(inside) condition_call = call(substitute(condition[[1]]), as.name(condition[[2]]), condition[[3]])
This of course assumes people are only using binary conditions, but you can extend the above logic.
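For a binary condition like the one in the question, the expression can also be assembled directly with call(). This is a sketch on mtcars rather than on the survey object, and the names cond, cond_text, and matching are illustrative:

```r
v1 <- "cyl"  # variable name held as a string
v2 <- 6      # value to compare against

# Build the unevaluated expression cyl == 6 from its parts.
cond <- call("==", as.name(v1), v2)
cond_text <- deparse(cond)

# The stored call can be evaluated later with a data frame as its environment.
matching <- mtcars[eval(cond, mtcars), ]
```

A call built this way carries the actual variable name and value, so it could be stored in user_conditions in place of the v1 == v2 that substitute(condition) captures.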
