Using a list's elements in loops in R (example: setDT)

I have multiple data frames and I want to perform the same action on all of them, for example converting them all to data.tables (this is just an example; I want to apply other functions too).
A simple example (df1 = df2 = df3, without loss of generality):
df1 <- data.frame(var1 = c(1, 2, 3, 4, 5), var2 =c(1, 2, 2, 1, 2), var3 = c(10, 8, 15, 7, 9))
df2 <- data.frame(var1 = c(1, 2, 3, 4, 5), var2 =c(1, 2, 2, 1, 2), var3 = c(10, 8, 15, 7, 9))
df3 <- data.frame(var1 = c(1, 2, 3, 4, 5), var2 =c(1, 2, 2, 1, 2), var3 = c(10, 8, 15, 7, 9))
My approach was: (i) to create a list of the data frame names (list.df), (ii) to create a list of the names they should have afterwards (list.dt), and (iii) to loop over those two lists:
list.df:
list.df<-vector('list',3)
for(j in 1:3){
name <- paste('df',j,sep='')
list.df[j] <- name
}
list.dt:
list.dt<-vector('list',3)
for(j in 1:3){
name <- paste('dt',j,sep='')
list.dt[j] <- name
}
Loop (to make all data frames into data tables):
for(i in 1:3){
name<-list.dt[i]
assign(unlist(name), setDT(list.df[i]))
}
I am definitely doing something wrong, as the result is three data tables with 1 variable and 1 observation each (exactly the name stored in list.df[i]).
I've tried to unlist list.df[i], thinking R would then recognize it as an entire data frame and not only as a string:
for(i in 1:3){
name<-list.dt[i]
assign(unlist(name), setDT(unlist(list.df[i])))
}
But I get the error message:
Error in setDT(unlist(list.df[i])) :
Argument 'x' to 'setDT' should be a 'list', 'data.frame' or 'data.table'
Any suggestions?

You can just put all the data into one data frame. Then, if you want to iterate over the original data frames, use dplyr::do or, preferably, other dplyr functions:
library(dplyr)
data <-
  list(df1 = df1, df2 = df2, df3 = df3) %>%
  bind_rows(.id = "source") %>%
  group_by(source)
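As a rough illustration of what "other dplyr functions" could look like once everything sits in one data frame, here is a minimal sketch. The summary columns `mean_var3` and `n_rows` are names made up for this example, and the data frames are cut down to two columns.

```r
library(dplyr)

# Cut-down versions of the question's data frames (assumption: two columns suffice here)
df1 <- data.frame(var1 = 1:5, var3 = c(10, 8, 15, 7, 9))
df2 <- df1
df3 <- df1

# Stack the data frames; the list names end up in the "source" column
combined <- bind_rows(list(df1 = df1, df2 = df2, df3 = df3), .id = "source")

# One summary row per original data frame
per_source <- combined %>%
  group_by(source) %>%
  summarise(mean_var3 = mean(var3), n_rows = n())
```

Each row of `per_source` then corresponds to one of the original data frames, so no explicit loop is needed.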

Change your last snippet to this:
for(i in 1:3){
name <- list.dt[i]
assign(unlist(name), setDT(get(list.df[[i]])))
}
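To see why the original loop produced one-cell tables, it may help to compare single and double brackets on a list of names. This is a minimal base R sketch; `demo_df1` and `nm` are hypothetical names chosen for illustration only.

```r
# Hypothetical example objects, not taken from the question
demo_df1 <- data.frame(var1 = 1:3)
nm <- list("demo_df1")

class(nm[1])         # "list": single brackets keep the list wrapper
class(nm[[1]])       # "character": double brackets extract the string
class(get(nm[[1]]))  # "data.frame": get() looks up the object with that name
```

So `setDT(list.df[i])` receives a one-element list holding a string, while `get(list.df[[i]])` hands over the actual data frame.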

# Alternative to using lists
list.df <- paste0("df", 1:3)
# For loop that works with the length of the input 'list'/vector
# Creates the 'dt' objects on the fly
for(i in seq_along(list.df)){
assign(paste0("dt", i), setDT(get(list.df[i])))
}

Using data.table (which deserves far more advertising):
a) If you need all your data.frames converted to data.tables, then, as was already suggested in the comments by @A5C1D2H2I1M1N2O1R2T1, iterate over your data.frames with setDT
library(data.table)
lapply(mget(paste0("df", 1:3)), setDT)
# or, if you wish to type them one by one:
lapply(list(df1, df2, df3), setDT)
class(df1) # check if coercion took place
# [1] "data.table" "data.frame"
b) If you need to bind your data.frames by rows, then use data.table::rbindlist
data <- rbindlist(mget(paste0("df", 1:3)), idcol = TRUE)
# or, if you wish to type them one by one:
data <- rbindlist(list(df1 = df1, df2 = df2, df3 = df3), idcol = TRUE)
Side note: If you like chaining/piping with the magrittr package (which you see almost always in combination with dplyr syntax), then it goes like:
library(data.table)
library(magrittr)
# for a)
mget(paste0("df", 1:3)) %>% lapply(setDT)
# for b)
data <- mget(paste0("df", 1:3)) %>% rbindlist(idcol = TRUE)

Related

Add a Column created Within a Function to a dataframe in R

I have searched and tried multiple previously asked questions that might be similar to my question, but none worked.
I have a dataframe in R called df2, a column called df2$col. I created a function to take the df, the df$col, and two parameters that are names for two new columns I want created and worked on within the function. After the function finishes running, I want a return df with the two new columns included. I get the two columns back indeed, but they are named after the placeholders in the function shell. See below:
df2 = data.frame(col = c(1, 3, 4, 5),
col1 = c(9, 6, 8, 3),
col2 = c(8, 2, 8, 4))
The function I created takes col and does something to it; it should return the transformed col as well as the two newly created columns:
no_way <- function(df, df_col_name, df_col_flagH, df_col_flagL) {
lo_perc <- 2
hi_perc <- 6
df$df_col_flagH <- as.factor(ifelse(df_col_name<lo_perc, 1, 0))
df$df_col_flagL <- as.factor(ifelse(df_col_name>hi_perc, 1, 0))
df_col_name <- df_col_name + 1.4
df_col_name <- df_col_name * .12
return(df)
}
When I call the function, no_way(df2, col, df$new_col, df$new_col2), instead of getting a df with col, col1, col2, new_col1, new_col2, I get the first three right but get the parametric names for the last two. So something like df, col, col1, col2, df_col_flagH, df_col_flagL. I essentially want the function to return the df with the new columns' names I give it when I am calling it. Please help.
I don't see what your function is trying to do, but this might point you in the right direction:
no_way <- function(df = df2, df_col_name = "col", df_col_flagH = "col1", df_col_flagL = "col2") {
lo_perc <- 2
hi_perc <- 6
df[[df_col_flagH]] <- as.factor(ifelse(df[[df_col_name]] < lo_perc, 1, 0)) # as.factor?
df[[df_col_flagL]] <- as.factor(ifelse(df[[df_col_name]] > hi_perc, 1, 0))
df[[df_col_name]] <- (df[[df_col_name]] + 1.4) * 0.12 # Do in one step
return(df)
}
I needed to call the function with the new column names as strings instead:
no_way(mball, 'TEAM_BATTING_H', 'hi_TBH', 'lo_TBH')
Additionally, I had to use brackets around the target column in my function.
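The point about strings and double brackets can be sketched on the question's df2. The helper `add_flags` and its `low`/`high` arguments are hypothetical stand-ins for illustration, not the original no_way function.

```r
# Hypothetical helper: column names are passed as strings and used with [[ ]]
add_flags <- function(df, col_name, flag_low_name, flag_high_name,
                      low = 2, high = 6) {
  df[[flag_low_name]]  <- as.integer(df[[col_name]] < low)   # 1 if below `low`
  df[[flag_high_name]] <- as.integer(df[[col_name]] > high)  # 1 if above `high`
  df
}

df2 <- data.frame(col  = c(1, 3, 4, 5),
                  col1 = c(9, 6, 8, 3),
                  col2 = c(8, 2, 8, 4))

out <- add_flags(df2, "col", "new_col1", "new_col2")
names(out)  # "col" "col1" "col2" "new_col1" "new_col2"
```

With `df$name`, R always looks for a column literally called `name`; `df[[name]]` uses the value stored in the variable instead, which is why the new columns get the names supplied in the call.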

how to apply same function to multiple dataframes in R

I am applying the same function to multiple dataframes. For example, I want to merge the column2 and column3 in df1. After applying this function, the df1 will get a new column called col2_col3.
df1 <- data.frame(x = rep(3, 5), y = seq(1, 5, 1), ID = letters[1:5])
df2 <- data.frame(x = rep(5, 5), y = seq(2, 6, 1), ID = letters[6:10])
#I define a function:
PasteTwoColumn <- function(x)
{
x$col2_col3 <- paste(x[,2], x[,3], sep = "_")
return(x)
}
#apply the function to the df1, it works.
df1 <- PasteTwoColumn(df1)
# but this fails with lapply, because it returns a list, not the data frame
mylist <- list(df1, df2)
result <- lapply(mylist, PasteTwoColumn)
I want to continue to apply this function to all my dataframes, eg. df1, df2, df3 ...df100. The output file should keep the same type of dataframe and the name. The lapply function does not work, because it returns a list, not the separate data frame.
We can keep the datasets in a list and loop over the list with lapply
lst1 <- lapply(list(df1, df2), PasteTwoColumn)
If there are many datasets, use mget to get the values of the datasets into a list
lst1 <- lapply(mget(paste0('df', 1:100)), PasteTwoColumn)
Or instead of paste, we can also use ls
lst1 <- lapply(mget(ls(pattern = '^df\\d+$')), PasteTwoColumn)
If we need to update the original object, use list2env
list2env(lst1, .GlobalEnv) #not recommended though
If we need to use a for loop
for(obj in paste0("df", 1:100)) {
assign(obj, PasteTwoColumn(get(obj)))
}
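If the goal is mainly to keep track of which result belongs to which data frame, a named list may already be enough. A small sketch using the question's two data frames; the list name `dfs` is an assumption for this example.

```r
df1 <- data.frame(x = rep(3, 5), y = seq(1, 5, 1), ID = letters[1:5])
df2 <- data.frame(x = rep(5, 5), y = seq(2, 6, 1), ID = letters[6:10])

PasteTwoColumn <- function(x) {
  x$col2_col3 <- paste(x[, 2], x[, 3], sep = "_")
  x
}

# A named list: lapply() preserves the names, so each result stays identifiable
dfs <- list(df1 = df1, df2 = df2)
dfs <- lapply(dfs, PasteTwoColumn)

names(dfs)          # "df1" "df2"
dfs$df2$col2_col3   # "2_f" "3_g" "4_h" "5_i" "6_j"
```

Each element is still an ordinary data frame, reachable as `dfs$df1`, `dfs$df2`, and so on, without writing anything back to the global environment.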

How to print the names of dataframes in list with for loop?

I have a list of dataframes such as follows:
x <- c(1, 2, 3, 4, 5)
y <- c(5, 4, 3, 2, 1)
df1 <- data.frame(x)
df2 <- data.frame(y)
x <- list(df1, df2)
I want to print the names of the dataframes in list x with a for loop such as this:
for (i in x) {
deparse(substitute(x[i]))
}
But it doesn't work. My goal is to have the names of the dataframes printed out as characters such as this:
[1] df1
[2] df2
Thanks!
Data.frames don't have "names". A variable can point to a data.frame, but a data.frame can exist without a name at all (like if you did x <- list(data.frame(x), data.frame(y))). So your data.frame isn't named df1; df1 is a variable name that happens to point to a data.frame.
If you put a variable in a list, the value of the variable is placed in the list, not the variable name itself. So if you want to keep a name of the variable that originally held the object in the list, you'd need to store the name in the list. One common way to do that is
x <- list(df1=df1, df2=df2)
Then you can get the names with names(x). If you want to see other ways to create lists that keep the object names, see the existing question: Can lists be created that name themselves based on input object names?
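One base R way to build such a self-naming list is mget(), which returns the objects named after the variables it looked up. A minimal sketch; `x_vals`, `y_vals` and `dfs` are names chosen for this example.

```r
x_vals <- c(1, 2, 3, 4, 5)
y_vals <- c(5, 4, 3, 2, 1)
df1 <- data.frame(x = x_vals)
df2 <- data.frame(y = y_vals)

# mget() fetches the objects by name and uses those names for the list
dfs <- mget(c("df1", "df2"))

# Loop over the names rather than over the data frames themselves
for (nm in names(dfs)) {
  print(nm)
}
```

Looping over `names(dfs)` also makes it easy to reach the matching data frame inside the loop with `dfs[[nm]]`.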
There were some very helpful points: using x <- list(df1=df1, df2=df2) to save the names of the data frames in the list, and using the names() function. Here is the code I used to print the names of the data frames in the list:
x <- c(1, 2, 3, 4, 5)
y <- c(5, 4, 3, 2, 1)
df <- data.frame(x, y)
df1 <- data.frame(x)
df2 <- data.frame(y)
x <- list(df1=df1, df2=df2)
for (i in 1:2) {
print(as.character(names(x)[i]))
}
And this prints out the names of the data frames on the list.

R subset df based on multiple columns from another data frame

I am trying to find a more succinct way to filter a data frame using rows from another data frame (I am currently using a loop).
For example, suppose you have the following data frame df1 consisting of quantities of apples, pears, lemons and oranges. There is also a 5th column which we will call happiness.
library(gtools) # permutations()
library(dplyr)  # %>%, filter(), bind_rows()
df1 <- data.frame(permutations(n = 4, r = 4, v = 1:4)) %>% cbind(sample(1:24))
colnames(df1) <- c("Apples", "Pears", "Lemons", "Oranges", "Happiness")
However you wish to filter this dataframe to leave only certain combinations of fruit which exist in a second data frame (not with the same column order):
df2 = data.frame(Apples = c(1, 3, 2, 4), Pears = c(4, 1, 1, 3), Lemons = c(2, 2, 3, 1), Oranges = c(3, 4, 4, 2))
Currently I am using a loop to apply each row of df2 as a filter condition one-by-one and then binding the result e.g:
df.ss = list()
for (i in seq_len(nrow(df2))) {
  # use row i of df2 as the filter condition
  df.ss[[i]] = filter(df1,
    df1$Apples == df2$Apples[i] &
    df1$Pears == df2$Pears[i] &
    df1$Lemons == df2$Lemons[i] &
    df1$Oranges == df2$Oranges[i])
}
df.ss %>% bind_rows()
Is there a more elegant way of going about this ?
I think you are looking for an inner join
dplyr::inner_join(df1, df2)
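If df2 is used purely as a filter, dplyr's semi_join is a closely related option: it keeps only the rows of df1 that have a match in df2 and adds no columns from df2. A small sketch on made-up data with two fruit columns instead of four; the explicit `by` is optional but documents which columns are matched.

```r
library(dplyr)

# Made-up miniature data, not the question's permutations table
df1 <- data.frame(Apples    = c(1, 2, 3),
                  Pears     = c(4, 1, 1),
                  Happiness = c(10, 20, 30))
df2 <- data.frame(Pears  = c(4, 1),   # column order differs from df1 on purpose
                  Apples = c(1, 3))

# Keep rows of df1 whose (Apples, Pears) combination appears in df2
kept <- semi_join(df1, df2, by = c("Apples", "Pears"))
kept$Happiness  # 10 30
```

Matching is done by column name, so the differing column order in df2 does not matter.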

Using an element from a table in selecting columns/rows in R

I've been working on a process to create all possible combinations of unique integers for lengths 1:n. I found the nCr function (the combn function in the combinat package) to be useful here.
Once all unique occurrences are iterated, they are appended to a consolidation table that contains any possible length+combination of the digits 1:n. A subset of the final table's relevant column (one record) looks like this (column is named String and the subset table f1):
c(1,3,4,5,9,10)
I need to select these columns from a secondary data source (df) one at a time (I am going to loop through this table), so my logic was to use this code:
df[,f1$String]
However, I get a message that says that undefined columns are selected, but if I copy and paste the contents of the cell such as:
df[,c(1, 3, 4, 5, 9, 10)]
it works fine ... I've tried all I can think of at this point; if anyone has some insight it would be greatly appreciated.
Code to reproduce is:
library(combinat)
library(data.table)
library(plyr)
rm(list=ls())
NCols=10
NRows=10
myMat<-matrix(runif(NCols*NRows), ncol=NCols)
XVars <- as.data.frame(myMat)
colnames(XVars) <- c("a","b","c","d","e","f","g","h","i","j")
x1 <- as.data.frame(colnames(XVars[1:ncol(XVars)]))
colnames(x1) <- "Independent.Variable"
setDT(x1)[, Index := .GRP, by = "Independent.Variable"]
colClasses = c("character", "numeric", "numeric")
col.names = c("String", "r!", "n!")
Combination <- read.table(text = "", colClasses = colClasses, col.names = col.names)
for(i in 1:nrow(x1)){
x2<- as.data.frame(combn(nrow(x1),i))
for (i in 1:ncol(x2)){
x3 <- paste("c(",paste(x2[1:nrow(x2),i], collapse = ", "), ")", sep="")
x3 <- as.data.frame(x3)
colnames(x3) <- "String"
x3 <- mutate(x3, "r!" = nrow(x2))
x3 <- mutate(x3, "n!" = nrow(x1))
Combination <- rbind(Combination, x3)
}
}
setDT(Combination)[, Index := .GRP, by = c("String", "r!", "n!")]
f1 <- Combination[717,]
f1$String <- as.character(f1$String)
## reference to data frame
myMat[,(f1$String)]
## pasted element
myMat[, c(1, 3, 4, 5, 9, 10)]
f1$String is the string "c(1, 3, 4, 5, 9, 10)". When you use myMat[,(f1$String)], R will look for the column with name "c(1, 3, 4, 5, 9, 10)". To get column numbers 1,3,4,5,9,10, you have to parse the string to an R expression and evaluate it first:
myMat[,eval(parse(text=f1$String))]
As @user3794498 noticed, you set f1$String with as.character(), so you cannot use it directly to get the columns you want.
You can change the way you define f1 or extract the column numbers from f1$String. Something like this should also work (load stringr first): myMat[, f1$String %>% str_match_all("[0-9]+") %>% unlist %>% as.numeric].
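One way to "change the way you define f1" is to never turn the combinations into strings at all. The sketch below is an assumption about how that could look, using the base combn (from utils) with simplify = FALSE on a smaller 5-column matrix; `all_combos` and `cols` are names made up for this example.

```r
set.seed(1)
n <- 5
myMat <- matrix(runif(n * n), ncol = n)

# All combinations of column indices for lengths 1..n, kept as integer vectors
all_combos <- unlist(
  lapply(seq_len(n), function(r) combn(n, r, simplify = FALSE)),
  recursive = FALSE
)

length(all_combos)      # 31, i.e. 2^5 - 1 non-empty subsets

# Each element can index the matrix directly, no eval(parse()) needed
cols <- all_combos[[6]] # first combination of length 2: columns 1 and 2
sub  <- myMat[, cols]
```

Because every element is already an integer vector, looping over `all_combos` and subsetting with `myMat[, cols]` avoids both string parsing and regular expressions.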
