I have three dataframes, each with a single variable that is labelled differently in each one. Is there a way to rename the variable by position (or just by relying on each dataframe having a single variable) across all three dataframes, rather than doing it individually?
E.g. I would like to rename the column in dfa, dfb and dfc to "Percentage".
dfa <- data.frame(x = c(45, 55))
dfb <- data.frame(y = c(60, 40))
dfc <- data.frame(z = c(30, 70))
I tried using a loop like below - why doesn't this work?
for (i in c(dfa, dfb, dfc)) {
names(i)[1] <- "Percentage"
}
Using the purrr and dplyr libraries:
library(purrr)
library(dplyr)
list2env(purrr::imap(list(dfa = dfa, dfb = dfb, dfc = dfc), ~ dplyr::rename(., Percentage = 1)), envir = .GlobalEnv)
Or using pipes you can write this as
list(dfa = dfa, dfb = dfb, dfc = dfc) %>%
purrr::imap(~ dplyr::rename(., Percentage = 1)) %>%
list2env(envir = .GlobalEnv)
How it works
1. If you put your dataframes into a named list using the same names (i.e. dfa = dfa), purrr::imap applies a function over that list and preserves the names. The output of imap is a list whose element names are still dfa, dfb, dfc. This will be useful in step 3.
2. The function being mapped over the list is dplyr::rename, which you can use positionally. Here Percentage = 1 renames the first column to "Percentage".
3. Lastly, list2env assigns each element of the list to an object of the same name in your global environment, overwriting the original dataframes.
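The last step can be tried without touching the global environment. The sketch below is base R only and uses a scratch environment called scratch (a name chosen here purely for illustration) to show that list2env creates one object per named list element:

```r
# Two of the data frames from the question
dfa <- data.frame(x = c(45, 55))
dfb <- data.frame(y = c(60, 40))

# Rename the first column of each data frame; the list names are kept
renamed <- lapply(list(dfa = dfa, dfb = dfb), function(df) {
  names(df)[1] <- "Percentage"
  df
})

# 'scratch' is a hypothetical environment used only for this demonstration
scratch <- new.env()
list2env(renamed, envir = scratch)

# One object per list element, named after the list names
sort(ls(envir = scratch))
names(get("dfa", envir = scratch))

# The original dfa in the calling environment is untouched here
names(dfa)
```

Passing envir = .GlobalEnv instead of the scratch environment is what overwrites dfa and dfb in place.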
You could do something similar in base R:
# names(x)[1] <- "Percentage" renames first column
list_of_dfs <- lapply(list(dfa, dfb, dfc), function(x) {
names(x)[1] <- "Percentage"
x})
names(list_of_dfs) <- c("dfa", "dfb", "dfc")
list2env(list_of_dfs, envir = .GlobalEnv)
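As for why the loop in the question does not work, there are two separate reasons, and the following small sketch (base R, reusing the question's data) illustrates both. First, c() applied to data frames does not give a list of data frames; it flattens them into a list of their columns. Second, the loop variable i is a copy, so changing it never touches dfa, dfb or dfc.

```r
dfa <- data.frame(x = c(45, 55))
dfb <- data.frame(y = c(60, 40))
dfc <- data.frame(z = c(30, 70))

# c() flattens the data frames into a plain list of column vectors
combined <- c(dfa, dfb, dfc)
class(combined)                # "list"
names(combined)                # "x" "y" "z"
is.data.frame(combined[[1]])   # FALSE: each element is a numeric vector

# Inside the loop, i is a copy of one column vector;
# names(i)[1] names the first element of that copy and is then discarded
for (i in combined) {
  names(i)[1] <- "Percentage"
}

# The original data frames are unchanged
names(dfa)                     # "x"
```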
Related
Let's say I have a list of dataframes
myList <- list(df1 = data.frame(A = as.character(sample(10)), B = rep(1:2, 10)),
df2 = data.frame(A = as.character(sample(10)), B = rep(1:2, 10)))
I want to coerce column A in each dataframe to double.
I'm trying:
myList = sapply(myList,simplify = FALSE, function(x){
x$A <- as.double(x$A) })
But this returns only the coerced values, not the data frames with their columns and column names.
I also tried with dplyr and mutate_if, but with no success.
We can use lapply with transform in base R
myList2 <- lapply(myList, transform, A = as.double(A))
Or use map with mutate from tidyverse
library(dplyr)
library(purrr)
myList2 <- map(myList, ~ .x %>%
mutate(A = as.double(A)))
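The issue in the OP's code is that the function does not return the data, i.e. 'x'. Its last expression is the assignment, so only the coerced vector comes back. Returning x at the end fixes it: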
myList2 <- sapply(myList, simplify = FALSE,
function(x){
x$A <- as.double(x$A)
x
})
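To make the difference concrete, here is a minimal sketch comparing the two versions on a single small data frame. The function names coerce_no_return and coerce_and_return are made up for this illustration, and stringsAsFactors = FALSE is set so the example behaves the same on older R versions.

```r
# Last expression is an assignment, so its value (the coerced vector) is returned
coerce_no_return <- function(x) {
  x$A <- as.double(x$A)
}

# Last expression is x, so the whole modified data frame is returned
coerce_and_return <- function(x) {
  x$A <- as.double(x$A)
  x
}

df <- data.frame(A = c("3", "1", "2"), B = 1:3, stringsAsFactors = FALSE)

res_bad  <- coerce_no_return(df)
res_good <- coerce_and_return(df)

is.data.frame(res_bad)    # FALSE: just a numeric vector
is.data.frame(res_good)   # TRUE
str(res_good)
```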
I want to apply the same function to multiple dataframes. For example, I want to paste together the second and third columns of df1, so that after applying the function df1 gets a new column called col2_col3.
df1 <- data.frame(x = rep(3, 5), y = seq(1, 5, 1), ID = letters[1:5])
df2 <- data.frame(x = rep(5, 5), y = seq(2, 6, 1), ID = letters[6:10])
#I define a function:
PasteTwoColumn <- function(x)
{
x$col2_col3 <- paste(x[,2], x[,3], sep = "_")
return(x)
}
#apply the function to the df1, it works.
df1 <- PasteTwoColumn(df1)
# but I failed with lapply, because it returns a list, not the dataframe
mylist <- list(df1, df2)
result <- lapply(mylist, PasteTwoColumn)
I want to apply this function to all my dataframes, e.g. df1, df2, df3, ..., df100. Each output should remain a dataframe and keep its name. The lapply function does not work for this, because it returns a list, not the separate data frames.
We can keep the datasets in a list and loop over the list with lapply
lst1 <- lapply(list(df1, df2), PasteTwoColumn)
If there are many datasets, use mget to get the values of the datasets into a list
lst1 <- lapply(mget(paste0('df', 1:100)), PasteTwoColumn)
Or instead of paste, we can also use ls
lst1 <- lapply(mget(ls(pattern = '^df\\d+$')), PasteTwoColumn)
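One practical difference between the paste0 and ls variants is the order of the resulting list. The sketch below is an illustration under assumptions: it uses a scratch environment (hypothetical name scratch) with a few toy data frames so nothing in the workspace is touched. ls returns names sorted alphabetically by default, so df10 would normally come before df2, whereas paste0 keeps the numeric order you give it.

```r
# 'scratch' and the toy objects are made up for this demonstration
scratch <- new.env()
assign("df1",  data.frame(v = 1),  envir = scratch)
assign("df2",  data.frame(v = 2),  envir = scratch)
assign("df10", data.frame(v = 10), envir = scratch)
assign("dfx",  data.frame(v = 99), envir = scratch)  # does not match the pattern

# Names found by pattern: only 'df' followed by digits, sorted alphabetically
by_pattern <- ls(envir = scratch, pattern = "^df\\d+$")

# Names built explicitly: kept in the order given
by_number <- paste0("df", c(1, 2, 10))

lst_pattern <- mget(by_pattern, envir = scratch)
lst_number  <- mget(by_number,  envir = scratch)

names(lst_pattern)
names(lst_number)
```

Both lists are named, so either can be passed to lapply and the names carry through to the result.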
If we need to update the original objects, use list2env. The list must be named, as it is when built with mget.
list2env(lst1, .GlobalEnv) # not recommended though
If we need to use a for loop
for(obj in paste0("df", 1:100)) {
assign(obj, PasteTwoColumn(get(obj)))
}
data1 = data.frame("time" = c(1:10))
data2 = data.frame("time" = c(11:20))
data3 = data.frame("time" = c(21:30))
data4 = data.frame("time" = c(31:40))
rbind(data1, data2, data3, data4)
rbind(paste("'","data","'",1:4,sep=","))
I want to bind together a whole bunch of data frames, but instead of spelling them all out I want to build their names with paste. In my simple example you will see that this doesn't work as desired, but when I spell out the data frames it works.
We can use mget on the pasted strings to return the values of the object names in a list and then rbind the elements with do.call
`row.names<-`(do.call(rbind, mget(paste0('data', 1:4))), NULL)
Or use pattern in ls
do.call(rbind, mget(ls(pattern = '^data\\d+$')))
With data.table, it would be rbindlist
library(data.table)
rbindlist(mget(paste0('data', 1:4)))
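The reason the first base R version resets the row names can be seen in a small sketch. It builds the named list by hand (the list name parts is made up here) rather than with mget, which should give an equivalent named list when the objects exist in the calling environment.

```r
data1 <- data.frame(time = 1:3)
data2 <- data.frame(time = 4:6)

# Named list, equivalent in shape to what mget(paste0("data", 1:2)) returns
parts <- list(data1 = data1, data2 = data2)

# rbind on named arguments prefixes the row names with the list names
stacked <- do.call(rbind, parts)
rownames(stacked)

# Setting the row names to NULL restores the default 1..n numbering
stacked_clean <- stacked
rownames(stacked_clean) <- NULL
rownames(stacked_clean)
```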
I have multiple .csv files (mydata_1, mydata_2, ...) with the same number of columns and the same column names (but different numbers of rows, if that helps in finding an answer). After reading them into my environment they have the class data.frame. I put them all in a list and now want to select specific columns by name from all of them, so that each variable keeps its name but contains just the chosen columns.
mydata_1 = matrix(c(1:21), nrow=3, ncol=7,byrow = TRUE)
mydata_2 = matrix(c(1:21), nrow=3, ncol=7,byrow = TRUE)
colnames(mydata_1) = c(paste0("X","1":"7"))
colnames(mydata_2) = c(paste0("X","1":"7"))
df1 = as.data.frame(mydata_1)
df2 = as.data.frame(mydata_2)
all_data = c(df1, df2)
class(all_data)
class(df1)
for (i in all_data){
i = select(i,"X3":"X5")
}
My for loop should produce the data.frames df1 and df2 with just three columns (instead of the prior seven), but when I run the code an error message regarding the select command appears.
Error in UseMethod("select_") :
no applicable method for 'select_' applied to an object of class "c('integer', 'numeric')"
How can I get a working output of my new dfs?
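The first issue here is that you are trying to create a list using c(df1, df2), which flattens the data frames into a single list of their columns (hence select being applied to an integer vector). You have to use list(df1, df2) instead.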
Data
library(dplyr)
library(purrr)
mydata_1 = matrix(c(1:21), nrow=3, ncol=7,byrow = TRUE)
mydata_2 = matrix(c(1:21), nrow=3, ncol=7,byrow = TRUE)
colnames(mydata_1) = paste0("X", 1:7)
colnames(mydata_2) = paste0("X", 1:7)
df1 = as.data.frame(mydata_1)
df2 = as.data.frame(mydata_2)
all_data = list(df1 = df1, df2 = df2)
The second problem is within your loop: assigning to i only changes the loop variable, not the list. With this approach you have to create an empty list before running the loop and then store the result of each iteration in it.
all_data2 <- list()
for(i in seq_along(all_data)) {
all_data2[[i]] <- all_data[[i]] %>% select(X3, X4, X5)
}
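Note that filling the result by position leaves it unnamed. If the names df1 and df2 should carry over, one option is to loop over the names instead. The following is a base R sketch under that assumption; it rebuilds the example data and uses bracket indexing in place of select so it needs no packages.

```r
# Rebuild the example data: two 3 x 7 data frames with columns X1..X7
all_data <- list(
  df1 = as.data.frame(matrix(1:21, nrow = 3, ncol = 7, byrow = TRUE)),
  df2 = as.data.frame(matrix(1:21, nrow = 3, ncol = 7, byrow = TRUE))
)
all_data <- lapply(all_data, setNames, paste0("X", 1:7))

# Pre-allocate the result and give it the same names as the input
all_data2 <- vector("list", length(all_data))
names(all_data2) <- names(all_data)

# Loop over names so each result lands under the matching name
for (nm in names(all_data)) {
  all_data2[[nm]] <- all_data[[nm]][, c("X3", "X4", "X5")]
}

names(all_data2)
names(all_data2$df1)
```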
Try using map from purrr, which is part of the tidyverse and leads to cleaner code with the same result.
# Down here the `.x` is replaced by each element of the list all_data
# in each iteration, ending with a list of two data frames
all_data2 = map(all_data, ~.x %>%
select(X3, X4, X5))
Consider base R's subset with its select argument for contiguous column selection, wrapped in an lapply call. Unlike a for loop, lapply does not require the bookkeeping of reassigning each element back into a list:
all_data <- list(df1 = df1, df2 = df2)
all_data_sub <- lapply(all_data, function(df) subset(df, select=X3:X5))
Suppose I have the following data frames in a list:
df1 <- data.frame(x = runif(3), y = runif(3))
df2 <- data.frame(x = runif(3), y = runif(3))
df.list <- list(df1, df2)
Now suppose I want to add columns x and y to get column z.
I know that doing this in a single dataframe with mutate is as easy as:
dplyr::mutate(lapply(df.list, z = x + y))
How do I perform operations on multiple columns in a list using lapply?
We can use transform with lapply
lapply(df.list, transform, z= x+y)
If we need to do this for multiple columns,
lapply(df.list, transform, z= x+y, w= x*y)
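One thing to keep in mind with transform is that all new columns in a single call are computed from the columns of the original data frame, so a column created in that call (such as z) cannot be used to define another one in the same call. A small sketch with fixed numbers (used here instead of runif so the results are easy to check by hand) shows the one-call form and a chained form for dependent columns:

```r
df.list <- list(
  data.frame(x = c(1, 2), y = c(3, 4)),
  data.frame(x = c(5, 6), y = c(7, 8))
)

# Both new columns depend only on the original x and y: one call is enough
one_call <- lapply(df.list, transform, z = x + y, w = x * y)

# w depends on the new column z, so add z first and then w in a second call
chained <- lapply(df.list, function(df) {
  df <- transform(df, z = x + y)
  transform(df, w = z * y)
})

one_call[[1]]
chained[[1]]
```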
Another option would be to use library(purrr) (from the authors of dplyr)
library(dplyr)
library(purrr)
df.list %>%
map(mutate, z=x+y, w= z*y)