R: mutating along a list of dataframes

df<- data_frame(first =seq(1:10), second = seq(1:10))
ldf <- list(df, df, df, df, df)
names(ldf) <- c('alpha', 'bravo', 'charlie', 'delta', 'echo')
I have this list of dataframes and I am attempting to apply the mutate function to each dataframe, but I get a "not compatible with STRSXP" error that confuses me.
Here is the code that gives me the error:
for( i in seq_along(ldf)){
ldf[[i]] <- mutate( ldf[[i]], NewColumn1= ldf[[i]][1]/(ldf[[i]][2] *2),
NewColumn2= ldf[[i]][1]/(ldf[[i]][2] * 3))
}
My intention is that the for loop goes to the first dataframe, applies the mutate function, and creates a new column called "NewColumn1" that divides the first column by two times the second column. It does something similar for the next new column.
Am I in the right ballpark with this code, or can I not use mutate when looping through dataframes in a list?

You seem to be on the right track, but the way you're substituting the elements of your original list is a bit faulty. While there are multiple ways this could be achieved, the following are in the realm of what you started with:
for-loop
for (df_name in names(ldf)) {
  ldf[[df_name]] <- mutate(ldf[[df_name]],
                           new_col_one = first / (second * 2),
                           new_col_two = first / (second * 3))
}
This actually overwrites the original list.
lapply
lapply(ldf, function(x) {
  mutate(x,
         new_col_one = first / (second * 2),
         new_col_two = first / (second * 3))
})
This will create a new list.
Map
Map(function(x) {
  mutate(x,
         new_col_one = first / (second * 2),
         new_col_two = first / (second * 3))
}, ldf)
This will create a new list, as well.
You can also look into map from the purrr package.
I hope one of these serves a purpose.
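A note on the likely cause of the original error, offered as a sketch rather than a definitive diagnosis: single-bracket indexing such as ldf[[i]][1] returns a one-column data frame, whereas mutate expects a vector for each new column. Double brackets (ldf[[i]][[1]]) extract the column itself, so the positional version of the loop could be written as below. This assumes dplyr is installed; tibble() is used here in place of the deprecated data_frame(), and the temporary name d is introduced only for illustration.

```r
library(dplyr)

df  <- tibble(first = 1:10, second = 1:10)
ldf <- list(alpha = df, bravo = df, charlie = df, delta = df, echo = df)

# `[` keeps the data-frame wrapper, `[[` extracts the column vector
is.data.frame(df[1])    # TRUE
is.numeric(df[[1]])     # TRUE

for (i in seq_along(ldf)) {
  d <- ldf[[i]]  # current data frame
  ldf[[i]] <- mutate(d,
                     NewColumn1 = d[[1]] / (d[[2]] * 2),
                     NewColumn2 = d[[1]] / (d[[2]] * 3))
}
```

Referring to columns by name, as in the answer above, is usually easier to read than positional indexing.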

Here is an option using map from purrr (loaded with the tidyverse):
library(tidyverse)
ldf %>%
map(~mutate(., NewColumn1 = first/(second*2), NewColumn2 = first/(second*3)))

Related

R function used to rename columns of a data frames

I have a data frame, say acs10. I need to relabel the columns. To do so, I created another data frame, named as labelName with two columns: The first column contains the old column names, and the second column contains names I want to use, like the table below:
column_1    column_2
oldLabel1   newLabel1
oldLabel2   newLabel2
Then, I wrote a for loop to change the column names:
for (i in seq_len(nrow(labelName))){
names(acs10)[names(acs10) == labelName[i,1]] <- labelName[i,2]}
This works.
However, when I tried to put the for loop into a function, because I need to rename column names for other data frames as well, the function failed. The function I wrote looks like below:
renameDF <- function(dataF,varName){
for (i in seq_len(nrow(varName))){
names(dataF)[names(dataF) == varName[i,1]] <- varName[i,2]
print(varName[i,1])
print(varName[i,2])
print(names(dataF))
}
}
renameDF(acs10, labelName)
where dataF is the data frame whose names I need to change, and varName is another data frame where old variable names and new variable names are paired. I used print(names(dataF)) to debug, and the printout suggests that the function works. However, calling the function does not actually change the column names. I suspect it has something to do with scope, but I want to know how to make it work.
In your function you need to return the changed dataframe.
renameDF <- function(dataF, varName) {
  for (i in seq_len(nrow(varName))) {
    names(dataF)[names(dataF) == varName[i, 1]] <- varName[i, 2]
  }
  return(dataF)
}
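Returning the data frame is only half of the fix: R modifies a copy inside the function, so the caller also has to assign the result back. A minimal sketch, using made-up data in place of the real acs10 and labelName (the column called other is hypothetical):

```r
# Hypothetical stand-ins for the data frames in the question
acs10 <- data.frame(oldLabel1 = 1:2, oldLabel2 = 3:4, other = 5:6)
labelName <- data.frame(column_1 = c("oldLabel1", "oldLabel2"),
                        column_2 = c("newLabel1", "newLabel2"),
                        stringsAsFactors = FALSE)

renameDF <- function(dataF, varName) {
  for (i in seq_len(nrow(varName))) {
    names(dataF)[names(dataF) == varName[i, 1]] <- varName[i, 2]
  }
  dataF  # the modified copy is the function's return value
}

renameDF(acs10, labelName)           # prints a renamed copy; acs10 is unchanged
acs10 <- renameDF(acs10, labelName)  # assigning the result makes the change stick
names(acs10)
```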
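You can also simplify this and avoid the for loop by using match. Columns without an entry in varName keep their original name:
renameDF <- function(dataF, varName) {
  idx <- match(names(dataF), varName[[1]])
  # match returns NA where a column has no entry in varName;
  # keep the original name in that case
  names(dataF) <- ifelse(is.na(idx), names(dataF), as.character(varName[[2]])[idx])
  return(dataF)
}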
This should do the whole thing in one line.
colnames(acs10)[colnames(acs10) %in% labelName$column_1] <- labelName$column_2[match(colnames(acs10)[colnames(acs10) %in% labelName$column_1], labelName$column_1)]
This also works when a column name isn't in the data dictionary, but it's a bit more convoluted:
library(tibble)
df <- tribble(~column_1,~column_2,
"oldLabel1", "newLabel1",
"oldLabel2", "newLabel2")
d <- tibble(oldLabel1 = NA, oldLabel2 = NA, oldLabel3 = NA)
fun <- function(dat, dict) {
names(dat) <- sapply(names(dat), function(x) ifelse(x %in% dict$column_1, dict[dict$column_1 == x,]$column_2, x))
dat
}
fun(d, df)
You can create a function containing just one line of code. Note that pmatch allows partial matches and returns NA for names with no entry in varName:
renameDF <- function(df, varName){
setNames(df,varName[[2]][pmatch(names(df),varName[[1]])])
}

How to drop columns that meet a certain pattern over a list of dataframes

I'm trying to drop columns that have the suffix .1, which indicates a repeated column name. This needs to act over a list of dataframes.
I have written a function:
drop_duplicated_columns <- function (df) {
lapply(df, function(x) {
x <- x %>% select(-contains(".1"))
x
})
return(df)
}
However it is not working. Any ideas why?
Your function discards the result of lapply and returns the original df unchanged. One tidy way to solve this problem is to first create a function that works for one data.frame and then map this function over the list:
library(tidyverse)
drop_duplicated_columns <- function(df) {
df %>%
select(-contains(".1"))
}
Or even better:
drop_duplicated_columns <- . %>%
select(-contains(".1"))
To use it in a pipe, combine it with map:
list_dfs <- list(mtcars,mtcars)
list_dfs %>%
map(drop_duplicated_columns)
If you just need one function, you can create a new pipeline from the code you have already tested:
drop_duplicated_columns_list <- . %>%
map(drop_duplicated_columns)
list_dfs %>%
drop_duplicated_columns_list()
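For comparison, here is a sketch of the smallest change that would make the original function work: return the lapply result itself instead of the untouched input. It swaps in ends_with(".1") for contains(".1"), on the assumption that only a trailing .1 marks a duplicate (contains would also drop a column such as x.10). The sample data frames are made up, and dplyr is assumed to be installed.

```r
library(dplyr)

drop_duplicated_columns <- function(df_list) {
  # return the lapply() result directly instead of the untouched input
  lapply(df_list, function(x) select(x, -ends_with(".1")))
}

# Hypothetical example data with duplicated-name suffixes
list_dfs <- list(
  one = data.frame(a = 1:2, b = 3:4, a.1 = 5:6),
  two = data.frame(a = 1:2, b.1 = 3:4)
)
cleaned <- drop_duplicated_columns(list_dfs)
lapply(cleaned, names)
```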

How to use mapply to mutate columns in a list of dataframes

I have a list of dataframes - some dataframes in this list require their columns to be mutated into date columns. I was wondering if it is possible to do this with mapply.
Here is my attempt (files1 is the list of dataframes, c("data, data1") are the names of dataframes within files1, and c("adfFlowDate","datedate") are the names of the columns within the respective dataframes):
files2 <- repair_dates(files1, c("data, data1"), c("adfFlowDate","datedate"))
The function that does not work:
repair_dates <- function(data, df_list, col_list) {
mapply(function(n, i) data[[n]] <<- data[[n]] %>% mutate(i = as.Date(i, origin = "1970-01-01")), df_list, col_list)
return(data)
}
Your set-up is fairly complex here, calling an anonymous function inside an mapply inside another function, which takes three parameters, all relating to a single nested object.
Personally, I wouldn't add to this complexity by accommodating the non-standard evaluation required to get mutate to work here (though it is possible). Perhaps something like this (though it is difficult to tell without any reproducible data):
repair_dates <- function(data, df_list, col_list)
{
  mapply(function(n, i) {
    data[[n]][[i]] <- as.Date(data[[n]][[i]], origin = "1970-01-01")
    return(data[[n]])
  }, df_list, col_list, SIMPLIFY = FALSE)
}
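A usage sketch with made-up data, since the question provides none. Two points are assumptions worth checking against the real files: the names are passed as two separate strings, c("data", "data1"), and the function returns only the repaired data frames, so they are written back into the full list by name. The element called other is hypothetical.

```r
repair_dates <- function(data, df_list, col_list) {
  mapply(function(n, i) {
    # converts column i of data frame n; `data` is modified only locally
    data[[n]][[i]] <- as.Date(data[[n]][[i]], origin = "1970-01-01")
    data[[n]]
  }, df_list, col_list, SIMPLIFY = FALSE)
}

# Hypothetical input: dates stored as days since 1970-01-01
files1 <- list(
  data  = data.frame(adfFlowDate = c(18000, 18001)),
  data1 = data.frame(datedate = c(18500, 18501)),
  other = data.frame(x = 1:2)
)

repaired <- repair_dates(files1, c("data", "data1"), c("adfFlowDate", "datedate"))

# mapply names its result after df_list, so the repaired frames
# can replace the matching elements of the original list
files2 <- files1
files2[names(repaired)] <- repaired
str(files2)
```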

R - lapply - getting data frames back out of lists?

I have the same problem as this guy: returning from list to data.frame after lapply
Whilst they solved his specific problem, no one actually answered his original question about how to get dataframes out of a list.
I have a list of data frames:
dfPreList = list(yearlyFunding, yearlyPubs, yearlyAuthors)
And I want to filter/replace etc on them all.
So my function is:
DoThis = function(x){
filter(x, year >=2015 & year <=2018) %>%
replace(is.na(.), 0) %>%
adorn_totals("row")
}
And I use lapply to run the function on them all like this:
a = lapply(dfPreList, DoThis)
As the other post stated, these data frames are now stuck in this list (a), and I need a for loop to get them out, which just cannot be the correct way of doing it.
This is my current working way of applying the function to the dataframes and then getting them out:
dfPreList = list(yearlyFunding, yearlyPubs, yearlyAuthors)
dfPreListstr= list('yearlyFunding', 'yearlyPubs', 'yearlyAuthors')
DoThis = function(x){
filter(x, year >=2015 & year <=2018) %>%
replace(is.na(.), 0) %>%
adorn_totals("row")
}
a = lapply(dfPreList, DoThis)
for( i in seq_along(dfPreList)){
assign(dfPreListstr[[i]], as.data.frame(a[i]))
}
Is there a way of doing this without having to rely on for loops and string names of the dataframes? I.e. a one-liner with the lapply?
Many thanks for your help
You can assign names to the list and then use list2env.
dfPreList = list(yearlyFunding, yearlyPubs, yearlyAuthors)
a = lapply(dfPreList, DoThis)
names(a) <- c('yearlyFunding', 'yearlyPubs', 'yearlyAuthors')
list2env(a, .GlobalEnv)
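If the list is named when it is created, lapply carries the names through and the separate names(a) <- step is not needed. A minimal sketch with made-up data and a simplified base R stand-in for DoThis (called do_this here, with no totals row), so that it runs without dplyr or janitor:

```r
# Hypothetical stand-in data; the real frames come from the question
dfs <- list(
  yearlyFunding = data.frame(year = 2013:2019, value = c(1, NA, 3, 4, NA, 6, 7)),
  yearlyPubs    = data.frame(year = 2013:2019, value = 7:1)
)

# Simplified stand-in for DoThis: filter the years, replace NA with 0
do_this <- function(x) {
  x <- x[x$year >= 2015 & x$year <= 2018, , drop = FALSE]
  x[is.na(x)] <- 0
  x
}

a <- lapply(dfs, do_this)  # names carry over: a$yearlyFunding, a$yearlyPubs

# Optional: copy the list elements into the global environment
invisible(list2env(a, envir = globalenv()))
```

Keeping the results in the named list and accessing them as a$yearlyFunding is often enough, and avoids creating global variables as a side effect.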
Another way is to unlist each processed data frame and rebuild it. Note that this drops the column names and coerces all columns to a common type, and that the dimensions must come from the processed data frames in a, not from the originals, because DoThis changes the number of rows.
dfPreList = list(yearlyFunding, yearlyPubs, yearlyAuthors)
a = lapply(dfPreList, DoThis)
names(a) <- c('yearlyFunding', 'yearlyPubs', 'yearlyAuthors')
yearlyFunding <- data.frame(matrix(unlist(a$yearlyFunding), nrow = nrow(a$yearlyFunding), ncol = ncol(a$yearlyFunding)))
yearlyPubs <- data.frame(matrix(unlist(a$yearlyPubs), nrow = nrow(a$yearlyPubs), ncol = ncol(a$yearlyPubs)))
yearlyAuthors <- data.frame(matrix(unlist(a$yearlyAuthors), nrow = nrow(a$yearlyAuthors), ncol = ncol(a$yearlyAuthors)))
Since unlist returns a vector, we first build a matrix and then convert it to a data frame.

How to facilitate the output when splitting data.frame in the list?

I have a list of data.frame objects and want to split each one by comparing its score column with a given threshold. Specifically, I want my function to return only the rows whose score is greater than the threshold, and to export the rows that are not above the threshold as a csv file (I will process the returned data.frames further, while the exported ones will only be listed in a summary at the end). I am aware that it would be easier to split the data.frames first, then export them as csv files and process the desired ones separately, but I want to do all of this in one wrapper function. Can anyone point out how to handle the output of my function more efficiently?
mini example:
mylist <- list(
foo=data.frame( from=seq(1, by=4, len=16), to=seq(3, by=4, len=16), score=sample(30, 16)),
bar=data.frame( from=seq(3, by=7, len=20), to=seq(6, by=7, len=20), score=sample(30, 20)),
cat=data.frame( from=seq(4, by=8, len=25), to=seq(7, by=8, len=25), score=sample(30, 25)))
I intend to split them like this:
func <- function(list, threshold=16, ...) {
# input param checking
stopifnot(is.numeric(threshold))
reslt <- lapply(list, function(x) {
res <- split(x, c("Droped", "Saved")[(x$score > threshold)+1])
# FIXME : anyway to export Droped instance while return Saved
})
}
In my sketch, I intend to export the Droped part of each data.frame as a csv file, while returning the Saved part of each data.frame for further processing.
I tried to make this happen in my function, but my approach is not efficient. Can anyone point out how to accomplish this easily, or suggest a trick for producing the expected output more elegantly? Thanks in advance.
You could roll both processes into a call to lapply, like this:
# function to perform both tasks on one data frame in mylist
splitter <- function(i, threshold) {
  require(dplyr)
  DF <- mylist[[i]]
  DF %>%
    filter(score <= threshold) %>%
    write.csv(., sprintf("dropped.%s.csv", i), row.names = FALSE)
  Saved <- filter(DF, score > threshold)
  return(Saved)
}
# now use the function to create a new list, mylist2, with the Saved
# versions as its elements. the csvs of the dropped rows will get created
# as this runs.
mylist2 <- lapply(seq_along(mylist), function(i) splitter(i, 16))
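A variant of the same idea, sketched in base R under a few assumptions: the list is passed in as an argument rather than read from the global environment, the files are named after the list elements instead of their positions, and they are written to a temporary directory. The function name split_and_export and the out_dir parameter are made up for illustration.

```r
split_and_export <- function(df_list, threshold = 16, out_dir = tempdir()) {
  stopifnot(is.numeric(threshold), !is.null(names(df_list)))
  # iterate over the names so the result keeps them
  lapply(setNames(names(df_list), names(df_list)), function(nm) {
    df   <- df_list[[nm]]
    keep <- df$score > threshold
    # export the rows at or below the threshold
    write.csv(df[!keep, , drop = FALSE],
              file.path(out_dir, sprintf("dropped.%s.csv", nm)),
              row.names = FALSE)
    # return the rows above the threshold
    df[keep, , drop = FALSE]
  })
}

set.seed(1)
mylist <- list(
  foo = data.frame(from = seq(1, by = 4, length.out = 16),
                   to = seq(3, by = 4, length.out = 16),
                   score = sample(30, 16)),
  bar = data.frame(from = seq(3, by = 7, length.out = 20),
                   to = seq(6, by = 7, length.out = 20),
                   score = sample(30, 20))
)

saved <- split_and_export(mylist, threshold = 16)
list.files(tempdir(), pattern = "^dropped\\.")
```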
