Say I have multiple data frames which all have identical vector names and I'd like to cbind all which have a commmon pattern. So for these 3 data frames:
df.1 <- data.frame(column1 = factor(sample(c("Male","Female"), 10, replace=TRUE)),
speed=runif(10))
df.2 <- data.frame(column1 = factor(sample(c("Male","Female"), 10, replace=TRUE)),
speed=runif(10))
df.3 <- data.frame(column1 = factor(sample(c("Male","Female"), 10, replace=TRUE)),
speed = runif(10))
I would like to rbind everything with the common pattern "df.*"
I have tried creating a list and then creating a data-frame from this using:
temp <- lapply(ls(pattern = "df.*"), get)
temp2<- as.data.frame(temp)
However this only produces a data frame of 6 columns effectively cbinding the whole thing rather than rbinding.
We can use ls with mget
library(data.table)
rbindlist(mget(ls(pattern = "^df\\.\\d+")))
Or with dplyr
library(dplyr)
mget(ls(pattern="^df\\.\\d+")) %>%
bind_rows()
Or with rbind from base R
do.call(rbind, mget(ls(pattern="^df\\.\\d+")))
You can try:
new_df <- do.call("rbind",mget(ls(pattern = "^df.*")))
Related
I have three dataframes with one variable but it is labelled differently in each. Is there a way to rename the variable by position (or just by the dataframes each having a single variable) across all three dataframes rather than doing it individually.
e.g I would like to rename the column in dfa, dfb, dfc to "Percentage"
dfa <- data.frame(x = c(45, 55))
dfb <- data.frame(y = c(60, 40))
dfc <- data.frame(z = c(30, 70))
I tried using a loop like below - why doesn't this work?
for (i in c(dfa, dfb, dfc)) {
names(i)[1] <- "Percentage"
}
Using the purrr and dplyr libraries:
library(purrr)
library(dplyr)
list2env(purrr::imap(list(dfa = dfa, dfb = dfb, dfc = dfc), ~ dplyr::rename(., Percentage = 1)), envir = .GlobalEnv)
Or using pipes you can write this as
list(dfa = dfa, dfb = dfb, dfc = dfc) %>%
purrr::imap(~ dplyr::rename(., Percentage = 1)) %>%
list2env(envir = .GlobalEnv)
How it works
If you put your dataframes into a list with the same names (ie dfa = dfa) then purrr::imap will apply a function over that list and preserve the names. The output of imap will be a list where element names are still dfa, dfb, dfc. This will be useful in step 3.
The function being mapped over the list is dplyr::rename, which you can use positionally. Here Percentage = 1 is renaming the first column to be "Percentage".
Lastly, list2env will unlist the dataframes into your global environment with the same names.
You could do something similar in base R:
# names(x)[1] <- "Percentage" renames first column
list_of_dfs <- lapply(list(dfa, dfb, dfc), function(x) {
names(x)[1] <- "Percentage"
x})
names(list_of_dfs) <- c("dfa", "dfb", "dfc")
list2env(list_of_dfs, envir = .GlobalEnv)
I am applying the same function to multiple dataframes. For example, I want to merge the column2 and column3 in df1. After applying this function, the df1 will get a new column called col2_col3.
df1 <- data.frame(x = rep(3, 5), y = seq(1, 5, 1), ID = letters[1:5])
df2 <- data.frame(x = rep(5, 5), y = seq(2, 6, 1), ID = letters[6:10])
#I define a function:
PasteTwoColumn <- function(x)
{
x$col2_col3 <- paste(x[,2], x[,3], sep = "_")
return(x)
}
#apply the function to the df1, it works.
df1 <- PasteTwoColumn(df1)
# but I failed by an lappy function, because it returns a list, not the dataframe
mylist <- list(df1, df2)
result <- lapply(mylist, PasteTwoColumn)
I want to continue to apply this function to all my dataframes, eg. df1, df2, df3 ...df100. The output file should keep the same type of dataframe and the name. The lapply function does not work, because it returns a list, not the separate data frame.
We can keep the datasets in a list and loop over the list with lapply
lst1 <- lapply(list(df1, df2), PasteTwoColumn)
If there are many datasets, use mget to get the values of the datasets into a list
lst1 <- lapply(mget(paste0('df', 1:100)), PasteTwoColumn)
Or instead of paste, we can also use ls
lst1 <- lapply(mget(ls(pattern = '^df\\d+$')), PasteTwoColumn)
If we need to update the original object, use list2env
list2env(lst1, .GlobalEnv) #not recommended though
If we need to use a for loop
for(obj in paste0("df", 1:100)) {
assign(obj, PasteTwoColumn(get(obj)))
}
I have two dataframes. I want to replace the ids in dataframe1 with generic ids. In dataframe2 I have mapped the ids from dataframe1 with the generic ids.
Do I have to merge the two dataframes and after it is merged do I delete the column I don't want?
Thanks.
With dplyr
library(dplyr)
left_join(df1, df2, by = 'ids')
We can use merge and then delete the ids.
dataframe1 <- data.frame(ids = 1001:1010, variable = runif(min=100,max = 500,n=10))
dataframe2 <- data.frame(ids = 1001:1010, generics = 1:10)
result <- merge(dataframe1,dataframe2,by="ids")[,-1]
Alternatively we can use match and replace by assignment.
dataframe1$ids <- dataframe2$generics[match(dataframe1$ids,dataframe2$ids)]
Subsetting data frames isn't very difficult in R: hope this helps, you didn't provide much code so I hope this will be of help to you:
#create 4 random columns (vectors) of data, and merge them into data frames:
a <- rnorm(n=100,mean = 0,sd=1)
b <- rnorm(n=100,mean = 0,sd=1)
c <- rnorm(n=100,mean = 0,sd=1)
d<- rnorm(n=100,mean = 0,sd=1)
df_ab <- as.data.frame(cbind(a,b))
df_cd <- as.data.frame(cbind(c,d))
#if you want column d in df_cd to equal column a in df_ab simply use the assignment operator
df_cd$d <- df_ab$a
#you can also use the subsetting with square brackets:
df_cd[,"d"] <- df_ab[,"a"]
data1 = data.frame("time" = c(1:10))
data2 = data.frame("time" = c(11:20))
data3 = data.frame("time" = c(21:30))
data4 = data.frame("time" = c(31:40))
rbind(data1, data2, data3, data4)
rbind(paste("'","data","'",1:4,sep=","))
I want to bind together a whole bunch of data frames but instead of spelling out all of them want to use paste functions. Here in my simple example you will see it doesn't work as desired but when I spell out the dataframes it works..
We can use mget on the pasted strings to return the values of the object names in a list and then rbind the elements with do.call
`row.names<-`(do.call(rbind, mget(paste0('data', 1:4))), NULL)
Or use pattern in ls
do.call(rbind, mget(ls(pattern = '^data\\d+$')))
With data.table, it would be rbindlist
library(data.table)
rbindlist(mget(paste0('data', 1:4)))
I have a dataset where I only want to loop through certain columns in a dataframe one at a time to create a graph. The structure of my dataframe consists of data that I parsed from a larger dataset into a vector containing multiple dataframes.
I want to call one column from one dataframe in the vector. I want to loop on the dataframes to call each column.
See example below:
d1 <- data.frame(y1=c(1,2,3),y2=c(4,5,6))
d2 <- data.frame(y1=c(3,2,1),y2=c(6,5,4))
my.list <- list(d1, d2)
All I have to work with is my.list
How would I do this?
You can use lapply to plot each of the individual data frames in your list. For example,
d1 <- data.frame(y1=c(1,2,3),y2=c(4,5,6),y3=c(7,8,9))
d2 <- data.frame(y1=c(3,2,1),y2=c(6,5,4),y3=c(11,12,13))
mylist <- list(d1, d2)
par(mfrow=c(2,1))
# lapply on a subset of columns
lapply(mylist, function(x) plot(x$y2, x$y3))
You don't need a for loop to get their data points. You can call the column by their column names.
# a toy dataframe
d <- data.frame(A = 1:20, B = sample(c(FALSE, TRUE), 20, replace = TRUE),
C = LETTERS[1:20], D = rnorm(20, 0, 1))
col_names <- c("A", "B", "D") # names of columns I want to get
d[,col_names] # returns a dataset with the values of the columns you want
Here is a solution to your problem using a for loop:
# a toy dataframe
mylist <- list(dat1 = data.frame(A = 1:20, B = LETTERS[1:20]),
dat2 = data.frame(A = 21:40, B = LETTERS[1:20]),
dat3 = data.frame(A = 41:60, B = LETTERS[1:20]))
col_names <- c("A") # name of columns I want to get
for (i in 1:length(mylist)){
# you can do whatever you want with what is returned;
# here I am just print them out
print(names(mylist)[i]) # name of the data frame
print(mylist[[i]][,col_names]) # values in Column A
}
I think the simplest answer to your question is to use double brackets.
for (i in 1:length(my.list)) {
print(my.list[[i]]$column)
}
That works assuming all of the columns in your list of data frames have the same names. You could also call the position of the column in the data frame if you wanted.
Yes, lapply can be more elegant, but in some situations a for loop makes more sense.