data1 = data.frame("time" = c(1:10))
data2 = data.frame("time" = c(11:20))
data3 = data.frame("time" = c(21:30))
data4 = data.frame("time" = c(31:40))
rbind(data1, data2, data3, data4)
rbind(paste("'","data","'",1:4,sep=","))
I want to bind together a whole bunch of data frames, but instead of spelling out all of them I want to build their names with paste. In my simple example above this doesn't work as desired, although it works when I spell out the data frames.
We can use mget on the pasted strings to return the objects with those names as a list, and then rbind the elements with do.call. The outer row.names<- call resets the row names of the result.
`row.names<-`(do.call(rbind, mget(paste0('data', 1:4))), NULL)
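To make the steps visible, here is a minimal sketch with two small data frames (cut down from the question's four, purely for illustration). It shows that paste0 only produces strings, that mget turns those strings into a named list of objects, and that do.call then passes the list elements to rbind as separate arguments.

```r
# Two small data frames standing in for data1 ... data4
data1 <- data.frame(time = 1:3)
data2 <- data.frame(time = 4:6)

# paste0 only builds a character vector of names, not the objects themselves
nms <- paste0("data", 1:2)

# mget looks the names up and returns a named list of data frames
dfs <- mget(nms)

# do.call(rbind, dfs) is equivalent to rbind(data1 = data1, data2 = data2)
combined <- do.call(rbind, dfs)

# rbind builds row names from the list names, so reset them if unwanted
rownames(combined) <- NULL
combined
```

Calling rbind directly on the pasted strings fails because rbind only ever sees a character vector, never the data frames that the strings refer to.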
Or use pattern in ls
do.call(rbind, mget(ls(pattern = '^data\\d+$')))
With data.table, it would be rbindlist
library(data.table)
rbindlist(mget(paste0('data', 1:4)))
I have three dataframes with one variable each, but it is labelled differently in each. Is there a way to rename the variable by position (or just by the dataframes each having a single variable) across all three dataframes rather than doing it individually?
For example, I would like to rename the column in dfa, dfb and dfc to "Percentage".
dfa <- data.frame(x = c(45, 55))
dfb <- data.frame(y = c(60, 40))
dfc <- data.frame(z = c(30, 70))
I tried using a loop like below - why doesn't this work?
for (i in c(dfa, dfb, dfc)) {
names(i)[1] <- "Percentage"
}
Using the purrr and dplyr libraries:
library(purrr)
library(dplyr)
list2env(purrr::imap(list(dfa = dfa, dfb = dfb, dfc = dfc), ~ dplyr::rename(., Percentage = 1)), envir = .GlobalEnv)
Or using pipes you can write this as
list(dfa = dfa, dfb = dfb, dfc = dfc) %>%
purrr::imap(~ dplyr::rename(., Percentage = 1)) %>%
list2env(envir = .GlobalEnv)
How it works
1. If you put your dataframes into a list with the same names (i.e. dfa = dfa), then purrr::imap will apply a function over that list and preserve the names. The output of imap will be a list whose element names are still dfa, dfb, dfc. This will be useful in step 3.
2. The function being mapped over the list is dplyr::rename, which you can use positionally. Here Percentage = 1 renames the first column to "Percentage".
3. Lastly, list2env assigns each element of the list to an object of the same name in your global environment, overwriting the original dfa, dfb and dfc.
You could do something similar in base R:
# names(x)[1] <- "Percentage" renames first column
list_of_dfs <- lapply(list(dfa, dfb, dfc), function(x) {
  names(x)[1] <- "Percentage"
  x
})
names(list_of_dfs) <- c("dfa", "dfb", "dfc")
list2env(list_of_dfs, envir = .GlobalEnv)
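As for why the loop in the question has no effect, the following base R sketch (using the question's data) may help. It suggests two reasons: c() flattens the data frames into a plain list of columns, and the loop variable is only a copy, so changing it never touches dfa, dfb or dfc. The loop over names with get and assign at the end is one possible workaround, not the only one.

```r
dfa <- data.frame(x = c(45, 55))
dfb <- data.frame(y = c(60, 40))
dfc <- data.frame(z = c(30, 70))

# c() on data frames concatenates their columns into a plain list
flat <- c(dfa, dfb, dfc)
is.data.frame(flat[[1]])  # FALSE: each element is a numeric vector

# Even with list(dfa, dfb, dfc), the loop variable would be a copy,
# so assignments to it would not reach the original objects.
# Looping over the object names and writing back explicitly does work:
for (nm in c("dfa", "dfb", "dfc")) {
  d <- get(nm)                  # fetch the data frame by name
  names(d)[1] <- "Percentage"   # rename its first column
  assign(nm, d)                 # store the modified copy under the same name
}
names(dfa)
```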
I know how to manually merge specific columns of a dataframe into a single column:
df_new <- data.frame(paste(df$a, df$b, df$c))
My question is how can I do this dynamically with all of the dataframe's columns?
You can use do.call: ‘do.call’ constructs and executes a function call from a name or a function and a list of arguments to be passed to it.
do.call(paste, df)
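A short sketch of what this does, assuming a small example data frame with columns a, b and c (made up here to mirror the question). Because a data frame is a list of columns, do.call passes each column to paste as a separate argument. Extra arguments such as sep can be appended to that list with c().

```r
# Hypothetical example data with the column names used in the question
df <- data.frame(a = c("x", "y"), b = c("p", "q"), c = 1:2)

# Spelling the columns out by hand
by_hand <- paste(df$a, df$b, df$c)

# Passing every column of df to paste in one go
dynamic <- do.call(paste, df)
identical(by_hand, dynamic)  # TRUE

# Additional arguments to paste can be appended to the argument list
with_sep <- do.call(paste, c(df, sep = "_"))
with_sep  # "x_p_1" "y_q_2"
```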
A solution from the tidyverse could be tidyr::unite():
df <- data.frame(x = letters[1:4], y = LETTERS[1:4], z = 1:4)
df_new <- tidyr::unite(df, col = "union", sep = " ")
where col is the name of the newly constructed column in the dataframe. sep is equivalent to its use in paste.
I am applying the same function to multiple dataframes. For example, I want to merge the column2 and column3 in df1. After applying this function, the df1 will get a new column called col2_col3.
df1 <- data.frame(x = rep(3, 5), y = seq(1, 5, 1), ID = letters[1:5])
df2 <- data.frame(x = rep(5, 5), y = seq(2, 6, 1), ID = letters[6:10])
#I define a function:
PasteTwoColumn <- function(x)
{
  x$col2_col3 <- paste(x[,2], x[,3], sep = "_")
  return(x)
}
#apply the function to the df1, it works.
df1 <- PasteTwoColumn(df1)
# but it fails with lapply, because lapply returns a list, not the data frame
mylist <- list(df1, df2)
result <- lapply(mylist, PasteTwoColumn)
I want to continue to apply this function to all my dataframes, eg. df1, df2, df3 ...df100. The output file should keep the same type of dataframe and the name. The lapply function does not work, because it returns a list, not the separate data frame.
We can keep the datasets in a list and loop over the list with lapply
lst1 <- lapply(list(df1, df2), PasteTwoColumn)
If there are many datasets, use mget to get the values of the datasets into a list
lst1 <- lapply(mget(paste0('df', 1:100)), PasteTwoColumn)
Or instead of paste, we can also use ls
lst1 <- lapply(mget(ls(pattern = '^df\\d+$')), PasteTwoColumn)
If we need to update the original objects, use list2env. This requires a named list, such as the one returned by mget
list2env(lst1, .GlobalEnv) #not recommended though
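The role of the list names can be seen in a small sketch. The helper add_row_count and the environment target are hypothetical names introduced only for this illustration. A separate environment is used instead of .GlobalEnv so that nothing is overwritten.

```r
df1 <- data.frame(x = 1:2, ID = c("a", "b"))
df2 <- data.frame(x = 3:4, ID = c("c", "d"))

# Hypothetical helper standing in for PasteTwoColumn
add_row_count <- function(d) {
  d$n_rows <- nrow(d)
  d
}

# mget returns a named list, and lapply keeps those names
lst <- lapply(mget(c("df1", "df2")), add_row_count)
names(lst)  # "df1" "df2"

# list2env uses the names to decide which objects to create
target <- new.env()
list2env(lst, envir = target)
ls(target)  # "df1" "df2"

# list(df1, df2) carries no names, so the result cannot be passed
# to list2env as is
unnamed <- lapply(list(df1, df2), add_row_count)
is.null(names(unnamed))  # TRUE
```

Keeping the data frames together in the named list and accessing them as lst$df1 or lst[["df2"]] avoids writing to the global environment at all.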
If we need to use a for loop
for(obj in paste0("df", 1:100)) {
assign(obj, PasteTwoColumn(get(obj)))
}
I have multiple .csv files (mydata_1, mydata_2, ...) with the same number of columns and the same column names (but different numbers of rows, if that helps in finding an answer). After reading them into my environment they have the class data.frame. I put them all in a list and now want to select specific columns by name from all of them, so that each variable keeps its name but contains just the chosen columns.
mydata_1 = matrix(c(1:21), nrow=3, ncol=7,byrow = TRUE)
mydata_2 = matrix(c(1:21), nrow=3, ncol=7,byrow = TRUE)
colnames(mydata_1) = c(paste0("X","1":"7"))
colnames(mydata_2) = c(paste0("X","1":"7"))
df1 = as.data.frame(mydata_1)
df2 = as.data.frame(mydata_2)
all_data = c(df1, df2)
class(all_data)
class(df1)
for (i in all_data){
i = select(i,"X3":"X5")
}
My for command shall output the data.frames df1 and df2 with just three columns (instead of the prior seven), but when running the code an error message regarding the select command appears.
Error in UseMethod("select_") :
no applicable method for 'select_' applied to an object of class "c('integer', 'numeric')"
How can I get a working output of my new dfs?
The first issue here is that you are trying to create a list using c(df1, df2), whereas you have to use list(df1, df2).
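The difference can be checked directly. This is a minimal sketch with two small made-up data frames (two columns each rather than the question's seven).

```r
df1 <- data.frame(X1 = 1:3, X2 = 4:6)
df2 <- data.frame(X1 = 7:9, X2 = 10:12)

# c() concatenates the columns of both data frames into one flat list
wrong <- c(df1, df2)
length(wrong)      # 4: one element per column
class(wrong[[1]])  # "integer": a bare vector, which select() cannot handle

# list() keeps each data frame intact as one element
right <- list(df1 = df1, df2 = df2)
length(right)               # 2
is.data.frame(right[[1]])   # TRUE
```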
Data
library(dplyr)
library(purrr)
mydata_1 = matrix(c(1:21), nrow=3, ncol=7,byrow = TRUE)
mydata_2 = matrix(c(1:21), nrow=3, ncol=7,byrow = TRUE)
colnames(mydata_1) = c(paste0("X","1":"7"))
colnames(mydata_2) = c(paste0("X","1":"7"))
df1 = as.data.frame(mydata_1)
df2 = as.data.frame(mydata_2)
all_data = list(df1 = df1, df2 = df2)
The second problem is within your loop. With this approach you have to create an empty list before running the loop and then store the result of each iteration in it.
all_data2 <- list()
for(i in seq_along(all_data)) {
all_data2[[i]] <- all_data[[i]] %>% select(X3, X4, X5)
}
Alternatively, try map from purrr, which is part of the tidyverse and leads to cleaner code with the same result.
# Here `.x` is replaced by each element of the list all_data
# in each iteration, ending with a list of two data frames
all_data2 = map(all_data, ~.x %>%
select(X3, X4, X5))
Consider base R's subset with the select argument for contiguous column selection, wrapped in an lapply call. Unlike a for loop, lapply does not require the bookkeeping of assigning each element back into a list:
all_data <- list(df1 = df1, df2 = df2)
all_data_sub <- lapply(all_data, function(df) subset(df, select=X3:X5))
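When the wanted columns are not contiguous, or their names are held in a variable, a similar sketch with the `[` operator may be more convenient. The data below is a cut-down stand-in for the question's data frames, and cols is a name introduced only for this illustration.

```r
# Small stand-ins for the question's data frames
df1 <- data.frame(X1 = 1:3, X2 = 4:6, X3 = 7:9, X4 = 10:12, X5 = 13:15)
df2 <- df1
all_data <- list(df1 = df1, df2 = df2)

# Column names to keep, stored as an ordinary character vector
cols <- c("X3", "X4", "X5")

# `[` with a character vector selects columns by name and
# always returns a data frame
all_data_sub <- lapply(all_data, `[`, cols)
names(all_data_sub$df1)  # "X3" "X4" "X5"
```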
Say I have multiple data frames which all have identical column names and I'd like to rbind all of those whose names share a common pattern. So for these 3 data frames:
df.1 <- data.frame(column1 = factor(sample(c("Male","Female"), 10, replace=TRUE)),
speed=runif(10))
df.2 <- data.frame(column1 = factor(sample(c("Male","Female"), 10, replace=TRUE)),
speed=runif(10))
df.3 <- data.frame(column1 = factor(sample(c("Male","Female"), 10, replace=TRUE)),
speed = runif(10))
I would like to rbind everything with the common pattern "df.*"
I have tried creating a list and then creating a data-frame from this using:
temp <- lapply(ls(pattern = "df.*"), get)
temp2<- as.data.frame(temp)
However this only produces a data frame of 6 columns effectively cbinding the whole thing rather than rbinding.
We can use ls with mget
library(data.table)
rbindlist(mget(ls(pattern = "^df\\.\\d+")))
Or with dplyr
library(dplyr)
mget(ls(pattern="^df\\.\\d+")) %>%
bind_rows()
Or with rbind from base R
do.call(rbind, mget(ls(pattern="^df\\.\\d+")))
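You can also try the following, though note that the pattern ^df.* matches every object whose name starts with df, not only df.1 to df.3: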
new_df <- do.call("rbind",mget(ls(pattern = "^df.*")))
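The patterns used in this thread differ in how strictly they match. A quick way to compare them is to run grepl over a few candidate object names. The names below are made up for illustration.

```r
# Hypothetical object names that might exist in a workspace
candidates <- c("df.1", "df.2", "df.10", "df_new", "dfx", "mydf.1")

# Unanchored: matches any name containing "df"
grepl("df.*", candidates)
# TRUE TRUE TRUE TRUE TRUE TRUE

# Anchored at the start only: still matches df_new and dfx
grepl("^df.*", candidates)
# TRUE TRUE TRUE TRUE TRUE FALSE

# Anchored at both ends, with a literal dot followed by digits
grepl("^df\\.\\d+$", candidates)
# TRUE TRUE TRUE FALSE FALSE FALSE
```

A loose pattern can silently pull unrelated objects (including an earlier result such as new_df) into the rbind, so the stricter form is usually the safer choice.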