function fundamentals - r

I am attempting to write a function to add a suffix to variable names in a data frame.
The code I want to turn into a function: colnames(dataframe) <- paste0(colnames(dataframe), "_suffix")
My function:
rename <- function(x, y) {colnames(x) <- paste0(colnames(x), y)}
When I call the function on the data frame I would like to append with a suffix, the data frame does not change. I am sure I am missing some fundamental understanding of how the function should work to append the data frame column names and preserve the data frame name.

Try this solution:
df <- data.frame(x = c(1, 2, 3), y = c(4, 5, 6))
rename <- function(x, y) {
colnames(x) <- paste0(colnames(x), y)
return(x)
}
As suggested above, you would overwrite the original data frame like so:
df <- rename(df, "_suffix")

Related

How to write a function with an unspecified number of arguments where the arguments are column names

I am trying to write a function with an unspecified number of arguments using ... but I am running into issues where those arguments are column names. As a simple example, if I want a function that takes a data frame and uses within() to make a new column that is several other columns pasted together, I would intuitively write it as
example.fun <- function(input,...){
res <- within(input,pasted <- paste(...))
res}
where input is a data frame and ... specifies column names. This gives an error saying that the column names cannot be found (they are treated as objects). e.g.
df <- data.frame(x = c(1,2),y=c("a","b"))
example.fun(df,x,y)
This returns "Error in paste(...) : object 'x' not found "
I can use attach() and detach() within the function as a work around,
example.fun2 <- function(input,...){
attach(input)
res <- within(input,pasted <- paste(...))
detach(input)
res}
This works, but it's clunky and runs into issues if there happens to be an object in the global environment that is called the same thing as a column name, so it's not my preference.
What is the correct way to do this?
Thanks
1) Wrap the code in eval(substitute(...code...)) like this:
example.fun <- function(data, ...) {
eval(substitute(within(data, pasted <- paste(...))))
}
# test
df <- data.frame(x = c(1, 2), y = c("a", "b"))
example.fun(df, x, y)
## x y pasted
## 1 1 a 1 a
## 2 2 b 2 b
1a) A variation of that would be:
example.fun.2 <- function(data, ...) {
data.frame(data, pasted = eval(substitute(paste(...)), data))
}
example.fun.2(df, x, y)
2) Another possibility is to convert each argument to a character string and then use indexing.
example.fun.3 <- function(data, ...) {
vnames <- sapply(substitute(list(...))[-1], deparse)
data.frame(data, pasted = do.call("paste", data[vnames]))
}
example.fun.3(df, x, y)
3) Other possibilities are to change the design of the function and pass the variable names as a formula or character vector.
example.fun.4 <- function(data, formula) {
data.frame(data, pasted = do.call("paste", get_all_vars(formula, data)))
}
example.fun.4(df, ~ x + y)
example.fun.5 <- function(data, vnames) {
data.frame(data, pasted = do.call("paste", data[vnames]))
}
example.fun.5(df, c("x", "y"))

Applying a Function to a Data Frame : lapply vs traditional way

I have this data frame in R:
x <- seq(1, 10,0.1)
y <- seq(1, 10,0.1)
data_frame <- expand.grid(x,y)
I also have this function:
some_function <- function(x,y) { return(x+y) }
Basically, I want to create a new column in the data frame based on "some_function". I thought I could do this with the "lapply" function in R:
data_frame$new_column <-lapply(c(data_frame$x, data_frame$y),some_function)
This does not work:
Error in `$<-.data.frame`(`*tmp*`, f, value = list()) :
replacement has 0 rows, data has 8281
I know how to do this in a more "clunky and traditional" way:
data_frame$new_column = x + y
But I would like to know how to do this using "lapply" - in the future, I will have much more complicated and longer functions that will be a pain to write out like I did above. Can someone show me how to do this using "lapply"?
Thank you!
When working within a data.frame you could use apply instead of lapply:
x <- seq(1, 10,0.1)
y <- seq(1, 10,0.1)
data_frame <- expand.grid(x,y)
head(data_frame)
some_function <- function(x,y) { return(x+y) }
data_frame$new_column <- apply(data_frame, 1, \(x) some_function(x["Var1"], x["Var2"]))
head(data_frame)
To apply a function to rows set MAR = 1, to apply a function to columns set MAR = 2.
lapply, as the name suggests, is a list-apply. As a data.frame is a list of columns you can use it to compute over columns but within rectangular data, apply is often the easiest.
If some_function is written for that specific purpose, it can be written to accept a single row of the data.frame as in
x <- seq(1, 10,0.1)
y <- seq(1, 10,0.1)
data_frame <- expand.grid(x,y)
head(data_frame)
some_function <- function(row) { return(row[1]+row[2]) }
data_frame$yet_another <- apply(data_frame, 1, some_function)
head(data_frame)
Final comment: Often functions written for only a pair of values come out as perfectly vectorized. Probably the best way to call some_function is without any function of the apply-familiy as in
some_function <- function(x,y) { return(x + y) }
data_frame$last_one <- some_function(data_frame$Var1, data_frame$Var2)

Assigning a variable to pasted name of column in R

I have a few data frames with the names:
Meanplots1,
Meanplots2,
Meanplots3 etc.
I am trying to write a for loop to do a series of equations on each data frame.
I am attempting to use the paste0 function.
What I want to happen is for x to be a column of each data set. So the code should work like this line:
x <- Meanplots1$PAR
However, since I want to put this in a for loop I want to format it like this:
for (i in 1:3){
x <- paste0("Meanplots",i,"$PAR")
Dmodel <- nls(y ~ ((a*x)/(b + x )) - c, data = dat, start = list(a=a,b=b,c=c))
}
What this does is it assigns x to the list "Meanplots1$PAR" not the actual column. Any idea on how to fix this?
We can get all the data.frame in a list with mget
lst1 <- mget(ls(pattern = '^MeanPlots\\d+$'))
then loop over the list with lapply and apply the model
DmodelLst <- lapply(lst1, function(dat) nls(y ~ ((a* PAR)/(b + PAR )) - c,
data = dat, start = list(a=a,b=b,c=c)))
Replace 'x' with the column name 'PAR'.
In the OP's loop, create a NULL list to store the output ('Outlst'), get the value of the object from paste0, then apply the formula with the unquoted column name i.e. 'PAR'
Outlst <- vector("list", 3)
ndat <- data.frame(x = seq(0,2000,100))
for(i in 1:3) {
dat <- get(paste0("MeanPlots", i))
modeltmp <- nls(y ~ ((a*PAR)/(b + PAR )) - c,
data = dat, start = list(a=a,b=b,c=c))
MD <- data.frame(predict(modeltmp, newdata = ndat))
MD[,2] <- ndat$x
names(MD) <- c("Photo","PARi")
Outlst[[i]] <- MD
}
Now, we extract the output of each list element
Outlst[[1]]
Outlst[[2]]
instead of creating multiple objects in the global environment

Create variables in data frame with for loop

I have a main data frame (mydata) and two secondary ones (df1, df2) such as follows:
x <- c(1, 2, 3, 4, 5)
y <- c(5, 4, 3, 2, 1)
mydata <- data.frame(x)
df1 <- data.frame(y)
df2 <- data.frame(y)
df2$y <- y+1 #This way, the columns in the df have the same name but different values.
I want to create new columns in mydata based on a formula with the variables in df1 and df2 like this:
mydata$new1 <- mydata$x*df1$y
mydata$new2 <- mydata$x*df2$y
Is there a way I can do this with a for loop? This is what I had in mind:
for (i in 2) {
mydata$paste0("new", i) <- mydata$x*dfpaste0(i)$y
}
Something along the lines of:
for (i in 1:2) {
mydata[[as.symbol(paste0('new', i))]] <- mydata$x*get(paste0("df", i))$y
}
We could also use mget to get all the object values in a list and multiply with the concerned vector
mydata[paste0("new", 1:2)] <- mydata$x * data.frame(mget(paste0("df", 1:2)))

Apply a user defined function to a list of data frames

I have a series of data frames structured similarly to this:
df <- data.frame(x = c('notes','year',1995:2005), y = c(NA,'value',11:21))
df2 <- data.frame(x = c('notes','year',1995:2005), y = c(NA,'value',50:60))
In order to clean them I wrote a user defined function with a set of cleaning steps:
clean <- function(df){
colnames(df) <- df[2,]
df <- df[grep('^[0-9]{4}', df$year),]
return(df)
}
I'd now like to put my data frames in a list:
df_list <- list(df,df2)
and clean them all at once. I tried
lapply(df_list, clean)
and
for(df in df_list){
clean(df)
}
But with both methods I get the error:
Error in df[2, ] : incorrect number of dimensions
What's causing this error and how can I fix it? Is my approach to this problem wrong?
You are close, but there is one problem in code. Since you have text in your dataframe's columns, the columns are created as factors and not characters. Thus your column naming does not provide the expected result.
#need to specify strings to factors as false
df <- data.frame(x = c('notes','year',1995:2005), y = c(NA,'value',11:21), stringsAsFactors = FALSE)
df2 <- data.frame(x = c('notes','year',1995:2005), y = c(NA,'value',50:60), stringsAsFactors = FALSE)
clean <- function(df){
colnames(df) <- df[2,]
#need to specify the column to select the rows
df <- df[grep('^[0-9]{4}', df$year),]
#convert the columns to numeric values
df[, 1:ncol(df)] <- apply(df[, 1:ncol(df)], 2, as.numeric)
return(df)
}
df_list <- list(df,df2)
lapply(df_list, clean)

Resources