How to apply the same function to multiple dataframes in R

I am applying the same function to multiple dataframes. For example, I want to paste together the second and third columns of df1. After applying the function, df1 gets a new column called col2_col3.
df1 <- data.frame(x = rep(3, 5), y = seq(1, 5, 1), ID = letters[1:5])
df2 <- data.frame(x = rep(5, 5), y = seq(2, 6, 1), ID = letters[6:10])
#I define a function:
PasteTwoColumn <- function(x)
{
x$col2_col3 <- paste(x[,2], x[,3], sep = "_")
return(x)
}
#apply the function to the df1, it works.
df1 <- PasteTwoColumn(df1)
# but lapply does not do what I want, because it returns a list, not the separate dataframes
mylist <- list(df1, df2)
result <- lapply(mylist, PasteTwoColumn)
I want to apply this function to all my dataframes, e.g. df1, df2, df3 ... df100. Each output should still be a dataframe and keep its original name. The lapply function does not work for this, because it returns a list, not the separate dataframes.

We can keep the datasets in a list and loop over the list with lapply
lst1 <- lapply(list(df1, df2), PasteTwoColumn)
If there are many datasets, use mget to get the values of the datasets into a list
lst1 <- lapply(mget(paste0('df', 1:100)), PasteTwoColumn)
Or instead of paste, we can also use ls
lst1 <- lapply(mget(ls(pattern = '^df\\d+$')), PasteTwoColumn)
If we need to update the original objects, use list2env. The list must be named, which is the case when it is built with mget but not with list(df1, df2)
list2env(lst1, .GlobalEnv) #not recommended though
If we need to use a for loop
for(obj in paste0("df", 1:100)) {
assign(obj, PasteTwoColumn(get(obj)))
}
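As a self-contained sketch of the list-based approach, the following applies PasteTwoColumn to a named list and then writes the results back by name. The environment `scratch` is a hypothetical stand-in used here so the example does not overwrite anything in .GlobalEnv; it is an assumption for illustration, not part of the answer above.

```r
df1 <- data.frame(x = rep(3, 5), y = seq(1, 5, 1), ID = letters[1:5])
df2 <- data.frame(x = rep(5, 5), y = seq(2, 6, 1), ID = letters[6:10])

PasteTwoColumn <- function(x) {
  x$col2_col3 <- paste(x[, 2], x[, 3], sep = "_")
  x
}

# A named list (the same shape mget() returns); lapply() keeps the names
dfs <- list(df1 = df1, df2 = df2)
dfs <- lapply(dfs, PasteTwoColumn)

# 'scratch' is a hypothetical environment, used instead of .GlobalEnv
scratch <- new.env()
list2env(dfs, envir = scratch)

# Each element is still a data frame and is stored under its own name
names(scratch$df1)
```

Because the names travel with the list, it is usually simpler to keep working with dfs$df1, dfs$df2, and so on than to copy the results back into separate objects.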

Related

How to dynamically complement colnames in a list of data.frames by information from a vector

I have a list of data.frames in which the name of the first column of each data.frame should be extended with information taken from a vector.
Example:
set.seed(1)
df1 <- data.frame(matrix(sample(32), ncol = 8))
names(df1) <- paste(rep(c("a", "b"), each = 4), 1:4, sep = "")
set.seed(2)
df2 <- data.frame(matrix(sample(32), ncol = 8))
names(df2) <- paste(rep(c("a", "b"), each = 4), 1:4, sep = "")
list_dfs <- list(df1, df2)
add_info <- c("add1", "other")
How can I add information from add_info to change the colname for a1 in df1 to "a1 add1" and a1 in df2 to "a1 other" in a scalable way within the given list structure? The other colnames are not supposed to be changed.
I tried several approaches setting colnames using paste0 within lapply or a for loop and reviewed similar questions on SO but couldn't solve this problem so far.
You can do the following:
list_dfs <- lapply(seq_along(list_dfs), function(i) {
  setNames(list_dfs[[i]], paste(names(list_dfs[[i]]), add_info[[i]]))
})
Now every column name in the first dataframe has the first element of add_info appended, and every column name in the second has the second element appended. You can easily scale this to longer lists of data.frames and correspondingly longer add_info vectors.
Update:
If you only want to change the first name, do
list_dfs <- lapply(seq_along(list_dfs), function(i) {
  # [-1] drops the first name and also works for a single-column data.frame
  lastNames <- names(list_dfs[[i]])[-1]
  firstName <- paste(names(list_dfs[[i]])[1], add_info[[i]])
  setNames(list_dfs[[i]], c(firstName, lastNames))
})
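An alternative sketch, not from the answer above, pairs each data.frame with its add_info element using base R's Map, which avoids indexing by i. It assumes add_info has one element per data.frame.

```r
set.seed(1)
df1 <- data.frame(matrix(sample(32), ncol = 8))
names(df1) <- paste0(rep(c("a", "b"), each = 4), 1:4)
set.seed(2)
df2 <- data.frame(matrix(sample(32), ncol = 8))
names(df2) <- paste0(rep(c("a", "b"), each = 4), 1:4)

list_dfs <- list(df1, df2)
add_info <- c("add1", "other")

# Map() walks both inputs in parallel: first df with "add1", second with "other"
list_dfs <- Map(function(df, info) {
  names(df)[1] <- paste(names(df)[1], info)  # only the first column is renamed
  df
}, list_dfs, add_info)

names(list_dfs[[1]])
names(list_dfs[[2]])
```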

Renaming the same column in multiple dataframes

I have three dataframes with one variable each, but it is labelled differently in each. Is there a way to rename the variable by position (or just by the dataframes each having a single variable) across all three dataframes rather than doing it individually?
E.g., I would like to rename the column in dfa, dfb, dfc to "Percentage"
dfa <- data.frame(x = c(45, 55))
dfb <- data.frame(y = c(60, 40))
dfc <- data.frame(z = c(30, 70))
I tried using a loop like below - why doesn't this work?
for (i in c(dfa, dfb, dfc)) {
names(i)[1] <- "Percentage"
}
Using the purrr and dplyr libraries:
library(purrr)
library(dplyr)
list2env(purrr::imap(list(dfa = dfa, dfb = dfb, dfc = dfc), ~ dplyr::rename(., Percentage = 1)), envir = .GlobalEnv)
Or using pipes you can write this as
list(dfa = dfa, dfb = dfb, dfc = dfc) %>%
purrr::imap(~ dplyr::rename(., Percentage = 1)) %>%
list2env(envir = .GlobalEnv)
How it works
1. If you put your dataframes into a list with the same names (i.e. dfa = dfa), then purrr::imap will apply a function over that list and preserve the names. The output of imap will be a list whose element names are still dfa, dfb, dfc. This will be useful in step 3.
2. The function being mapped over the list is dplyr::rename, which you can use positionally. Here Percentage = 1 renames the first column to "Percentage".
3. Lastly, list2env assigns each dataframe in the list to your global environment under the same name.
You could do something similar in base R:
# names(x)[1] <- "Percentage" renames first column
list_of_dfs <- lapply(list(dfa, dfb, dfc), function(x) {
names(x)[1] <- "Percentage"
x})
names(list_of_dfs) <- c("dfa", "dfb", "dfc")
list2env(list_of_dfs, envir = .GlobalEnv)
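As for why the loop in the question has no effect, here is a short sketch of what seems to be going on: c() applied to dataframes concatenates their columns into a plain list, so the loop variable is a numeric vector, and it is also a copy, so changing it never touches dfa, dfb or dfc. The second half shows one loop that does work, indexing into a named list; the name dfs is an assumption made for this example.

```r
dfa <- data.frame(x = c(45, 55))
dfb <- data.frame(y = c(60, 40))
dfc <- data.frame(z = c(30, 70))

# c() drops the data.frame class and concatenates the columns into one list
cols <- c(dfa, dfb, dfc)
is.data.frame(cols[[1]])  # FALSE: each element is a numeric vector

# A loop that does modify the data frames: assign into the list by name
dfs <- list(dfa = dfa, dfb = dfb, dfc = dfc)
for (nm in names(dfs)) {
  names(dfs[[nm]])[1] <- "Percentage"
}

names(dfs$dfa)
```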

Coerce specific column to "double" within a dataframe list

Let's say I have a list of dataframes
myList <- list(df1 = data.frame(A = as.character(sample(10)), B = rep(1:2, 10)),
               df2 = data.frame(A = as.character(sample(10)), B = rep(1:2, 10)))
I want to coerce column A in each dataframe to double.
I'm trying:
myList = sapply(myList,simplify = FALSE, function(x){
x$A <- as.double(x$A) })
But this returns only the coerced values, not the dataframes with their columns and column names.
I also tried dplyr's mutate_if, but with no success.
We can use lapply with transform in base R
myList2 <- lapply(myList, transform, A = as.double(A))
Or use map with mutate from tidyverse
library(dplyr)
library(purrr)
myList2 <- map(myList, ~ .x %>%
mutate(A = as.double(A)))
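The issue in the OP's code is that the function does not return the data, i.e. 'x'. Its last expression is the assignment, so each list element becomes the coerced vector instead of the dataframe. Returning 'x' fixes it: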
myList2 <- sapply(myList, simplify = FALSE,
function(x){
x$A <- as.double(x$A)
x
})
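One caveat worth sketching: before R 4.0.0, data.frame() converted character columns to factors by default, and as.double() on a factor returns the internal integer codes rather than the printed values. The helper to_double below is hypothetical, introduced only for this example; it converts via character first so both cases give the expected numbers.

```r
# Hypothetical helper: go through character so factor labels, not codes, are used
to_double <- function(x) as.double(as.character(x))

myList <- list(
  df1 = data.frame(A = factor(c("10", "20", "30")), B = 1:3),
  df2 = data.frame(A = c("1.5", "2.5", "3.5"), B = 4:6,
                   stringsAsFactors = FALSE)
)

myList2 <- lapply(myList, function(d) {
  d$A <- to_double(d$A)
  d  # return the whole data frame, not just the coerced column
})

# Check the storage type of column A in every data frame
vapply(myList2, function(d) typeof(d$A), character(1))
```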

R Paste List to Bind

data1 = data.frame("time" = c(1:10))
data2 = data.frame("time" = c(11:20))
data3 = data.frame("time" = c(21:30))
data4 = data.frame("time" = c(31:40))
rbind(data1, data2, data3, data4)
rbind(paste("'","data","'",1:4,sep=","))
I want to bind together a whole bunch of data frames, but instead of spelling them all out I want to build their names with paste. As my simple example shows, the paste version doesn't work as desired, whereas spelling out the dataframes does.
We can use mget on the pasted strings to return the values of the object names in a list and then rbind the elements with do.call
`row.names<-`(do.call(rbind, mget(paste0('data', 1:4))), NULL)
Or use pattern in ls
do.call(rbind, mget(ls(pattern = '^data\\d+$')))
With data.table, it would be rbindlist
library(data.table)
rbindlist(mget(paste0('data', 1:4)))
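One thing to watch with the ls(pattern = ...) variant: ls() returns names sorted as text, so with ten or more objects "data10" comes before "data2". The sketch below reorders the names by their numeric suffix before binding. The environment env and the twelve one-row data frames are assumptions made up for this example.

```r
# Hypothetical setup: twelve small data frames in a separate environment
env <- new.env()
for (i in 1:12) {
  assign(paste0("data", i), data.frame(time = i), envir = env)
}

nms <- ls(envir = env, pattern = "^data\\d+$")

# Order by the number after "data" instead of alphabetically
nms <- nms[order(as.integer(sub("^data", "", nms)))]

# unname() keeps rbind from prefixing row names with the object names
combined <- do.call(rbind, unname(mget(nms, envir = env)))
combined$time
```

The paste0('data', 1:4) form does not have this problem, because the names are generated in numeric order to begin with.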

Renaming Several Columns in Data Frames Stored in a List Simultaneously

I have the following list, which contains several dataframes that all have the same column names:
my_list <- list(df1 = data.frame(A = c(1:3), B = c(4:6), C = c(7:9)),
df2 = data.frame(A = c(1:4), B = c(5:8), C = c(9:12)),
df3 = data.frame(A = c(1:5), B = c(6:10), C = c(11:15)))
Is there an efficient way to rename all of the column As in each data frame in the list simultaneously using base R functions?
I was thinking that something like
names(lapply(my_list, `[[`, "A")) <- "new_name"
may work, but I think I'm off track - the lapply function returns an object that might not work for what I'm trying to do.
Thanks!
A few more base options:
# rename first column name
lapply(my_list, function(x) setNames(x, replace(names(x), 1, "new_name_for_A")))
# rename column named "A"
lapply(my_list, function(x) setNames(x, replace(names(x), names(x) == "A", "new_name_for_A")))
# lowly for loop
for (i in seq_along(my_list)) {
names(my_list[[i]])[names(my_list[[i]]) == "A"] = "new_name_for_A"
}
We can use map to loop over the list and rename the column named 'A' to 'new_name' with rename_at
library(purrr)
library(dplyr)
map(my_list, ~ .x %>%
rename_at(vars("A"), ~ "new_name"))
Or with base R by making use of anonymous function call
lapply(my_list, function(x) {names(x)[names(x) == "A"] <- "new_name"; x})
How about
new.names = c('New', 'B', 'C')
lapply(my_list, `names<-`, new.names)
For the added example in your edit, you would simply change this to
new.names = sub('B', 'New', names(my_list[[1]]))
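To rename several columns at once by name rather than by position, one possible base R sketch uses a named lookup vector. Both lookup and rename_cols are hypothetical names introduced for this example; columns not mentioned in the lookup are left unchanged.

```r
my_list <- list(df1 = data.frame(A = 1:3, B = 4:6, C = 7:9),
                df2 = data.frame(A = 1:4, B = 5:8, C = 9:12))

# Hypothetical lookup: names are the old column names, values the new ones
lookup <- c(A = "alpha", C = "gamma")

rename_cols <- function(df, lookup) {
  hit <- names(df) %in% names(lookup)               # columns that have a new name
  names(df)[hit] <- unname(lookup[names(df)[hit]])  # look each one up by old name
  df
}

renamed <- lapply(my_list, rename_cols, lookup = lookup)
names(renamed$df1)
```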
