For loop to eliminate columns in multiple dfs - r

I've got about 10 dataframes. For the example, here are two:
name <- c("A", "B", "C")
name.footnote <- c("this", "that", "the other")
class <- c("one", "two", "three")
class.footnote <- c("blank", "blank", "blank")
df1 <- data.frame(name, name.footnote, class, class.footnote)
df2 <- data.frame(name, name.footnote, class, class.footnote)
When I eliminate columns from them one at a time, my code works fine.
library(dplyr)
df1 <- select(df1, -ends_with("footnote"))
I'd like to write a loop to process both dfs with less code, but can't get my loop working right. I keep getting the same error message:
Error in UseMethod("select_") : no applicable method for 'select_'
applied to an object of class "character".
See a few of the many loop codes I've tried, below. What am I missing?
listofDfs <- list("df1","df2")
1.
lapply(listofDfs, function(df){
df <- select(df, -ends_with("footnote"))
return(df)
}
)
2.
for (i in listofDfs){
i <- select(i, -ends_with("footnote"))
}

Try dropping the quotes when defining your list: listofDfs <- list(df1, df2). As the error states, with the quotes the elements of your list are character strings rather than the data frames that select() expects.
library(dplyr)
listofDfs <- list(df1,df2)
#using lapply
list_out1 <- lapply(listofDfs, function(df){
df <- select(df, -ends_with("footnote"))
return(df)
})
#using for loop
list_out2 <- vector("list", length(listofDfs))
for (i in seq_along(listofDfs)){
list_out2[[i]] <- select(listofDfs[[i]], -ends_with("footnote"))
}
Follow-up per comment:
You can use get() and assign() to work with your original list of names and modify the data frames in your global environment while iterating.
listofDfs <- list('df1','df2')
invisible(lapply(listofDfs, function(i){
df <- select(get(i, globalenv()), -ends_with("footnote"))
assign(i, df, envir = globalenv())
}))
for (i in listofDfs){
df <- select(get(i, globalenv()), -ends_with("footnote"))
assign(i, df, envir = globalenv())
}
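An alternative to get() and assign() is to keep the data frames in a named list and work on the list itself. The following is a minimal sketch in base R, not the answer's dplyr code: endsWith() is a base stand-in for ends_with() (and, unlike the dplyr default, it is case-sensitive), and the toy data simply mirrors the question's example.

```r
# Toy data frames mirroring the question's example
df1 <- data.frame(
  name = c("A", "B", "C"),
  name.footnote = c("this", "that", "the other"),
  class = c("one", "two", "three"),
  class.footnote = c("blank", "blank", "blank")
)
df2 <- df1

# Store the data frames themselves (not their names) in a named list
dfs <- list(df1 = df1, df2 = df2)

# Drop every column whose name ends in "footnote";
# drop = FALSE keeps the result a data frame even if one column remains
dfs <- lapply(dfs, function(d) {
  d[, !endsWith(names(d), "footnote"), drop = FALSE]
})

names(dfs$df1)
```

Each cleaned data frame is then available as dfs$df1, dfs$df2, and so on, without touching the global environment.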

Related

Loop-generated list of data frames not being joined by rbind properly

I have a table with samples of data named Sample_1, Sample_2, etc. I take user input as a string for which samples are wanted (Sample_1,Sample_3,Sample_5). Then after parsing the string, I have a for-loop which I pass each sample name to and the program filters the original dataset for the name and creates a DF with calculations. I then append the DF to a list after each iteration of the loop and at the end, I rbind the list for a complete DF.
sampleloop <- function(samplenames) {
data <- unlist(strsplit(samplenames, ","))
temp = list()
for(inc in 1:length(data)) {
df <- CT[CT[["Sample_Name"]] == data[inc],]
........
tempdf = goitemp
temp[inc] <- tempdf
}
newdf <- do.call(rbind.data.frame, temp)
}
The inner function on its own produces the correct wanted output. However, with the loop the function produces the following wrong DF if the input is "Sample_3,Sample_9":
I'm wondering if it has something to do with the rbind?
The issue seems to be the use of [ instead of [[ to access and assign to the list element.
sampleloop <- function(samplenames) {
data <- unlist(strsplit(samplenames, ","))
temp <- vector('list', length(data))
for(inc in seq_along(data)) {
df <- CT[CT[["Sample_Name"]] == data[inc],]
........
tempdf <- goitemp
temp[[inc]] <- tempdf
}
newdf <- do.call(rbind.data.frame, temp)
return(newdf)
}
The difference can be noted with the reproducible example below
lst1 <- vector('list', 5)
lst2 <- vector('list', 5)
for(i in 1:5) {
lst1[i] <- data.frame(col1 = 1:5, col2 = 6:10)
lst2[[i]] <- data.frame(col1 = 1:5, col2 = 6:10)
}
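To make the difference visible, the two lists can be inspected after the assignment. This is a small sketch with two slots instead of five; the warning raised by the single-bracket assignment is suppressed here only to keep the output quiet.

```r
d <- data.frame(col1 = 1:5, col2 = 6:10)
lst1 <- vector("list", 2)
lst2 <- vector("list", 2)

for (i in 1:2) {
  # Single bracket: R treats d as a list of two columns and tries to fit
  # them into one slot, so only the first column is stored (with a warning)
  suppressWarnings(lst1[i] <- d)
  # Double bracket: the whole data frame is stored in the slot
  lst2[[i]] <- d
}

is.data.frame(lst1[[1]])  # FALSE: the slot holds a bare column
is.data.frame(lst2[[1]])  # TRUE: the slot holds the data frame

# Only the second list binds back into a data frame with all columns
combined <- do.call(rbind, lst2)
dim(combined)
```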

How to rewrite as a loop when I have identical frames for different years and the year is in the name?

I am new, so this question is a bit basic, but it might help others get a good start as well...
How to rewrite the below as a loop and have it include the years in the new names, as below...
DFNUM2011 = DF2011[,!(names(DF2011) %in% mydummies)]
DFNUM2012 = DF2012[,!(names(DF2012) %in% mydummies)]
DFNUM2013 = DF2013[,!(names(DF2013) %in% mydummies)]
I tried
df.list <- list("2011","2012","2013")
for (i in df.list){
DFNUM[[i]] = DF[[i]][,!(names(DF2011) %in% mydummies)]
}
Error in DF : object 'DF' not found
This can work:
#List
List <- list(DF2011,DF2012,DF2013)
#Loop
for (i in seq_along(List))
{
List[[i]] = List[[i]][,!(names(List[[i]]) %in% mydummies)]
}
A working example can be:
#Example
List <- list(iris,mtcars)
mydummies <- c('Species','mpg')
#Loop
for (i in seq_along(List))
{
List[[i]] = List[[i]][,!(names(List[[i]]) %in% mydummies)]
}
And a more compact way without loops:
#Code
List <- lapply(List, function(x) x[, !names(x) %in% mydummies])
You can also use purrr:
library(purrr)
n <- 2011:2013
result <- map(mget(paste0('DF', n)), ~keep(.x, !(names(.x) %in% mydummies)))
If you want to create new data frames with different names in your global environment:
names(result) <- paste0('DFNUM', n)
list2env(result, .GlobalEnv)
This should create DFNUM2011, DFNUM2012 and DFNUM2013 dataframes.
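The same idea can be sketched in base R by keeping the yearly frames in one list and building the names with paste0(). The toy frames and the contents of mydummies below are hypothetical stand-ins for the question's DF2011, DF2012, DF2013 and dummy columns.

```r
mydummies <- c("d1", "d2")  # hypothetical dummy column names
years <- 2011:2013

# Toy inputs, one per year, named DF2011, DF2012, DF2013
DF <- lapply(years, function(y) data.frame(year = y, x = 1:3, d1 = 0, d2 = 1))
names(DF) <- paste0("DF", years)

# Drop the dummy columns from every frame; drop = FALSE guards against
# the result collapsing to a vector when a single column is left
DFNUM <- lapply(DF, function(d) d[, !(names(d) %in% mydummies), drop = FALSE])
names(DFNUM) <- paste0("DFNUM", years)

DFNUM[["DFNUM2012"]]
```

Keeping the results in DFNUM means a later step can loop over the years again without constructing variable names.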

How can I make a loop that calls dataframes

I wrote the code below to transform the rows of a data frame into columns:
RowsToColums <- function(df)
{
model = list()
for(i in seq_along(df))
{
if(i>4)
{
dataf <- data.frame(names = df[1], Year=colnames(df[i]), index = df[,i:i])
names(dataf)[3]<- toString(df[[3]][2])
names(dataf)[1]<- "Country"
model[[i]] <- dataf
}
}
df <- do.call(rbind, model)
df <- arrange(df, Country)
}
EC_Pop <- RowsToColums(EC_Pop)
EC_GDP <- RowsToColums(EC_GDP)
EC_Inflation <- RowsToColums(EC_Inflation)
ST_Tech_Exp <- RowsToColums(ST_Tech_Exp)
ST_Res_Jour <- RowsToColums(ST_Res_Jour)
ST_Res_Exp <- RowsToColums(ST_Res_Exp)
ST_Res_Pop <- RowsToColums(ST_Res_Pop)
ED_Unempl <- RowsToColums(ED_Unempl)
ED_Edu_Exp <- RowsToColums(ED_Edu_Exp)
But as you can see, I call the same function many times.
I tried to put all these data frames in a list like this
list_a = list(EC_Pop,EC_GDP,EC_Inflation,ST_Tech_Exp,ST_Res_Exp)
for (i in seq_along(list_a))
{
list_a[i] <- RowsToColums(list_a[i])
}
so that the loop takes one data frame at a time, but it fails with an error:
UseMethod ("arrange_") error:
Inapplicable method for 'arrange_' applied to object of class "NULL"
Does anybody know how to fix this case?
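A likely cause, given the code shown: list_a[i] returns a one-element list rather than the data frame, so inside the function seq_along(df) is just 1, the i > 4 branch never runs, model stays empty, and do.call(rbind, model) yields NULL, which is what arrange() then receives. The sketch below illustrates this with hypothetical toy data; transform_one() is a made-up stand-in for RowsToColums(), not the original function.

```r
# Toy stand-ins for the question's data frames (hypothetical data)
list_a <- list(
  EC_Pop = data.frame(Country = c("A", "B"), y2000 = 1:2, y2001 = 3:4),
  EC_GDP = data.frame(Country = c("A", "B"), y2000 = 5:6, y2001 = 7:8)
)

class(list_a[1])    # a list that wraps one data frame
class(list_a[[1]])  # the data frame itself
length(list_a[1])   # 1, so a loop over seq_along() never gets past 1

# Binding an empty list gives NULL
empty_result <- do.call(rbind, list())

# Stand-in for RowsToColums(): takes a data frame, returns a data frame
transform_one <- function(df) {
  df[order(df$Country, decreasing = TRUE), , drop = FALSE]
}

# Either index with [[ ]] inside the loop, or let lapply() pass each frame
list_a <- lapply(list_a, transform_one)
```

With a for loop, the equivalent fix would be list_a[[i]] <- RowsToColums(list_a[[i]]).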

Apply a user defined function to a list of data frames

I have a series of data frames structured similarly to this:
df <- data.frame(x = c('notes','year',1995:2005), y = c(NA,'value',11:21))
df2 <- data.frame(x = c('notes','year',1995:2005), y = c(NA,'value',50:60))
In order to clean them I wrote a user defined function with a set of cleaning steps:
clean <- function(df){
colnames(df) <- df[2,]
df <- df[grep('^[0-9]{4}', df$year),]
return(df)
}
I'd now like to put my data frames in a list:
df_list <- list(df,df2)
and clean them all at once. I tried
lapply(df_list, clean)
and
for(df in df_list){
clean(df)
}
But with both methods I get the error:
Error in df[2, ] : incorrect number of dimensions
What's causing this error and how can I fix it? Is my approach to this problem wrong?
You are close, but there is one problem in your code. Because the columns of your data frames contain text, they are created as factors rather than character vectors (the default before R 4.0.0), so the column naming does not give the expected result.
#need to specify strings to factors as false
df <- data.frame(x = c('notes','year',1995:2005), y = c(NA,'value',11:21), stringsAsFactors = FALSE)
df2 <- data.frame(x = c('notes','year',1995:2005), y = c(NA,'value',50:60), stringsAsFactors = FALSE)
clean <- function(df){
colnames(df) <- df[2,]
#need to specify the column to select the rows
df <- df[grep('^[0-9]{4}', df$year),]
#convert the columns to numeric values
df[, 1:ncol(df)] <- apply(df[, 1:ncol(df)], 2, as.numeric)
return(df)
}
df_list <- list(df,df2)
lapply(df_list, clean)
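Note that lapply() returns a new list and leaves df_list unchanged, so the result has to be stored to be useful. A minimal sketch of that follows; it is a variation on the function above rather than a copy of it, using unlist() to pull the header row out as a plain character vector and lapply() for the numeric conversion.

```r
df <- data.frame(x = c("notes", "year", 1995:2005),
                 y = c(NA, "value", 11:21),
                 stringsAsFactors = FALSE)
df2 <- data.frame(x = c("notes", "year", 1995:2005),
                  y = c(NA, "value", 50:60),
                  stringsAsFactors = FALSE)

clean <- function(df) {
  # Use the second row as column names
  names(df) <- unlist(df[2, ], use.names = FALSE)
  # Keep only rows whose year column is a four-digit number
  df <- df[grepl("^[0-9]{4}$", df$year), ]
  # Convert every column to numeric while keeping the data frame structure
  df[] <- lapply(df, as.numeric)
  rownames(df) <- NULL
  df
}

# Store the cleaned frames instead of only printing them
cleaned <- lapply(list(df, df2), clean)
str(cleaned[[1]])
```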

subsetting a data.frame using a for loop

I have a data.frame, and I want to subset it every 10 rows, apply a function to each subset, save the resulting object, and remove the previous object. Here is what I have so far:
L3 <- LETTERS[1:20]
df <- data.frame(1:391, "col", sample(L3, 391, replace = TRUE))
names(df) <- c("a", "b", "c")
b <- seq(from=1, to=391, by=10)
nsamp <- 0
for(i in seq_along(b)){
a <- i+1
nsamp <- nsamp+1
df_10 <- df[b[nsamp]:b[a], ]
res <- lapply(seq_along(df_10$b), function(x){...})
saveRDS(res, file="res.rds")
rm(res)
}
My problem is that the for loop crashes when it reaches the last element of my sequence b.
When partitioning data, split is your friend. It will create a list with each data subset as an item which is then easy to iterate over.
# subtract 1 so that rows 1-10 form the first group (otherwise it gets only 9 rows)
dfs = split(df, (seq_len(nrow(df)) - 1) %/% 10)
Then your for loop can be simplified to something like this (untested... I'm not exactly sure what you're doing because example data seems to switch from df to sc2_10 and I only hope your column named b is different from your vector named b):
for(i in seq_along(dfs)){
res <- lapply(seq_along(dfs[[i]]$b), function(x){...})
saveRDS(res, file = sprintf("res_%s.rds", i))
rm(res)
}
I also modified your save file name so that you aren't overwriting the same file every time.
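Put together, the approach might look like the sketch below. The per-chunk work is elided in the original, so table() on column c is only a placeholder for it, and writing to tempdir() is an assumption made to keep the example self-contained. The grouping index subtracts 1 before the integer division so that the first chunk holds ten rows.

```r
set.seed(1)  # only so the toy data is reproducible
df <- data.frame(a = 1:391,
                 b = "col",
                 c = sample(LETTERS[1:20], 391, replace = TRUE))

# Rows 1-10 get index 0, rows 11-20 get index 1, and so on
chunk_id <- (seq_len(nrow(df)) - 1) %/% 10
dfs <- split(df, chunk_id)

out_dir <- tempdir()  # hypothetical output location
paths <- character(length(dfs))

for (i in seq_along(dfs)) {
  # Placeholder for the elided per-chunk function: count letters in column c
  res <- table(dfs[[i]]$c)
  paths[i] <- file.path(out_dir, sprintf("res_%s.rds", i))
  saveRDS(res, file = paths[i])
}

length(dfs)  # number of chunks, the last one holding the leftover rows
```

Because split() handles the final, shorter chunk on its own, no index ever runs past the end of the data.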
