do.call skip error and continue processing

After a for loop I create 4 data frames (data1, data2, data3, data4), and I want to rbind all of them.
I tried:
do.call(rbind, mget(paste0("data", 1:4)))
but sometimes the for loop gives me only 3 of them, for example data1, data2, data4,
and then mget fails because one of the objects does not exist.
How can I still rbind data1, data2, and data4?

You can get all your objects from the global environment (via ls()) and use grep to get the ones that follow the pattern you need, i.e.
do.call(rbind, mget(grep('data[0-9]+', ls(), value = TRUE)))
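Note that this pattern is unanchored, so it would also match names such as mydata10; anchoring the regular expression is safer if other objects are around. A minimal sketch:
do.call(rbind, mget(grep('^data[0-9]+$', ls(), value = TRUE)))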

Maybe check whether each data frame exists in the environment and mget only those that do.
data_names <- paste0("data", 1:4)
do.call(rbind, mget(data_names[sapply(data_names, exists)]))
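If you would rather keep the explicit name vector, base R's mget also has an ifnotfound argument; a minimal sketch, assuming the same data1..data4 names:
found <- mget(paste0("data", 1:4), ifnotfound = list(NULL)) # missing objects become NULL
do.call(rbind, Filter(Negate(is.null), found))              # drop the NULLs before binding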

You can use the pattern-matching mechanism in ls to identify your objects: mget takes a character vector of object names, and the pattern argument of ls accepts a regular expression, which is more flexible than generating object names via paste.
data_cars_one <- mtcars
data_cars_two <- mtcars
library(tidyverse)
res_all <- bind_rows(mget(x = ls(pattern = "^data")))
Concerning the binding, I've used bind_rows just as an alternative to the do.call and Reduce solutions.
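For completeness, the Reduce counterpart mentioned here could look like this minimal sketch, reusing the pattern-matched names:
res_all <- Reduce(rbind, mget(ls(pattern = "^data")))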

Related

Apply an `as.character()` function to a list of dataframes

So essentially I have a list of dataframes that I want to apply as.character() to.
To obtain the list of dataframes, I have a list of files that I read in using a map() function and a read function that I created. I can't use map_df() because some columns are read in as different data types. All of the files are the same, and I know that I could hard-code the data types in the read function if I wanted, but I want to avoid that if I can.
At this point I throw the list of dataframes into a for loop and apply another map() function to apply as.character(). This final list of dataframes is then collapsed using bind_rows().
All in all, this seems like an extremely convoluted process; see the code below.
library(readxl) # for read_xlsx
library(purrr)  # for map / map_df
library(dplyr)  # for bind_rows

audits <- list.files()
my_reader <- function(x) {
  read_xlsx(x)
}
audits <- map(audits, my_reader)
for (i in seq_along(audits)) {
  audits[[i]] <- map_df(audits[[i]], as.character)
}
audits <- bind_rows(audits)
Does anybody have any ideas on how I can improve this? Ideally to the point where I can do everything in a single vectorised map() function?
For reproducibility you can use two iris datasets with one column's data type changed.
iris2 <- iris
iris2[[1]] <- as.character(iris2[[1]])
my_list <- list(iris, iris2)
as.character works on a vector, whereas a data.frame is a list of vectors. An option is to use across if we want only a single use of map:
library(dplyr)
library(purrr)
map_dfr(my_list, ~ .x %>%
mutate(across(everything(), as.character)))
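If the goal is truly a single vectorised pass, reading and converting can also be fused into one map_dfr call over the files; a sketch assuming .xlsx files as in the question:
library(readxl)
audits <- map_dfr(list.files(pattern = "\\.xlsx$"),
                  ~ read_xlsx(.x) %>% mutate(across(everything(), as.character)))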
I wanted to show a base R solution just in case it helps anyone else. You can use rapply to recursively go through the list and apply a function. You can specify the class and whether you want to replace or unlist/list the returned object:
iris2 <- iris
iris2[[1]] <- as.character(iris2[[1]])
my_list <- list(iris, iris2)
mylist2 <- rapply(my_list, class = "ANY", f = as.character, how = "replace")
bigdf <- do.call(rbind, mylist2)
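For comparison, a plain lapply sketch that achieves the same without recursion, by converting each column in place:
mylist2 <- lapply(my_list, function(d) { d[] <- lapply(d, as.character); d })
bigdf <- do.call(rbind, mylist2)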

A Function to Merge 100 Dataframes into One Dataframe

I am new to programming and R is my first programming language to learn.
I want to merge 100 dataframes; each dataframe contains one column and 20 observations, as shown below:
df1 <- as.data.frame(c(6,3,4,4,5,...))
df2 <- as.data.frame(c(2,2,3,5,10,...))
df3 <- as.data.frame(c(5,9,2,3,7,...))
...
df100 <- as.data.frame(c(4,10,5,9,8,...))
I tried using df.list <- list(df1:df100) to construct an overall dataframe from all of the dataframes, but I am not sure whether df.list merges all the columns from all the dataframes together into one table.
Can anyone tell me if I am right? And what do I need to do?
We can use mget to get all the objects into a list by specifying the pattern in ls to check for object names that start (^) with 'df' followed by one or more digits (\\d+) until the end ($) of the string:
df.list <- mget(ls(pattern = '^df\\d+$'))
From the list, if we want to cbind all the datasets, use cbind in do.call:
out <- do.call(cbind, df.list)
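Because each data frame here carries an auto-generated column name, you may want to relabel the combined result with the list names; a small sketch, using the df.list from above:
out <- setNames(do.call(cbind, df.list), names(df.list))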
NOTE: It is better not to create multiple objects in the global environment. We could have read all the data into a list directly, or constructed the data within a list in the first place. For example, if the files are read from .csv, get all the .csv files from the directory of interest with list.files, then loop over the files with lapply, read them individually with read.csv, and cbind:
files <- list.files(path = 'path/to/your/location',
pattern = '\\.csv$', full.names = TRUE)
out <- do.call(cbind, lapply(files, read.csv))
We can also use the reduce function from the purrr package, after creating a character vector of data frame names:
library(dplyr)
library(purrr)
names <- paste0("df", 1:100)
names[-1] %>%
reduce(.init = get(names[1]), ~ bind_rows(..1, get(..2)))
Or in base R:
Reduce(function(x, y) rbind(x, get(y)), names[-1], init = get(names[1]))
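Note that once the name vector exists, the fold is optional; fetching everything with mget and binding in a single call is equivalent:
out <- bind_rows(mget(names))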

colnames and mutate on multiple dataframes

I have a problem with cleaning up my code. I understand I could type this all out, but obviously we don't want that.
I have only dataframes in my global environment. They are all "data.frame".
I want to check the dimensions of all of them and put that in a tibble, which I managed somehow. I would also like to change their colnames() with tolower(), which is easy if I just type the name of one data.frame, but there are more than two and I want it done automatically. Then I also want to mutate all data.frames in the same way.
Small example of my code:
library(tidyverse)
x <- data.frame(letters[1:2]) # to create the data
y <- data.frame(letters[3:4])
dfs <- as.list(ls()) # I take whatever is in my environment
I managed below to get a tibble of the dimensions:
z <- as_tibble(lapply(seq_along(dfs),
function(j) dim(get(dfs[[j]]))), .name_repair = "unique")
colnames(z) <- dfs
Now for the colnames of all the data.frames stored in my list I basically want to perform this code:
colnames(dfs[[1]]) <- tolower(colnames(dfs[[1]]))
but that returns NULL, as I found out earlier. So I used get() in there to make it work for the dimensions. But if I use get() to assign colnames, it says it can't find the function "get<-".
Since all colnames for all dataframes are the same (just different nrows()), I could save the lowercase colnames as a value and use that, but that doesn't change that it can't find the get<- function.
names <- tolower(colnames(x))
sapply(seq_along(dfs),
function(j) colnames(get(dfs[[j]])) <- names)
Error in colnames(get(dfs[[j]])) <- names :
could not find function "get<-"
As for the mutating part, I tried a for loop:
for(i in seq_along(dfs)){
get(dfs[[i]]) <- get(dfs[[i]]) %>% mutate(cd = ab)
}
But it's the same issue.
Could anyone help clear up this problem for me? (And if cleaner code for the dimensions is available, that would be highly appreciated.)
I am just trying to up my coding skills. I would have been long done if I just typed it all out but that defeats the purpose.
Thanks!
-JK
Using base R (note that dfs holds object names, so the data frames themselves must be fetched with mget first):
lapply(mget(unlist(dfs)), function(x) transform(setNames(x, tolower(names(x))), X = c('a', 'b')))
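The underlying obstacle, assigning back to objects that are only known by name (there is no get<- function), is usually avoided by fetching the data frames into a named list with mget, modifying the list, and, only if the loose objects are really needed again, writing it back with list2env; a hedged sketch:
dfs <- mget(unlist(dfs))                                       # the actual data frames, named
dfs <- lapply(dfs, function(d) setNames(d, tolower(names(d)))) # lowercase the colnames
list2env(dfs, envir = .GlobalEnv)                              # overwrite the originals, if needed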

R: Combine more than two lists' elements with regex

I have multiple lists starting with the same name (values_1, values_2, ..., values_n).
Is there a way to combine them like
all_lists <- c(values_*)
As suggested by Ronak Shah's comment:
You have to work with the global environment, .GlobalEnv.
The function ls returns all the objects already defined in .GlobalEnv.
The pattern parameter allows you to obtain only the objects whose names match the pattern.
ls() returns a character vector with the names of the objects.
To access the value of an object by its name, you have to use the get() function.
When you have multiple names, you can use mget(). So the final snippet is:
list_data <- mget(ls(pattern = 'values_'))
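If the goal is one flat list rather than a list of lists, the pieces can then be concatenated; a small sketch, assuming the values_* objects are themselves lists:
all_lists <- do.call(c, mget(ls(pattern = '^values_')))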
If you want to do the same with data frames, here is a working example:
mtc_1 <- mtcars
mtc_2 <- mtcars
mtc_3 <- mtcars
list_data <- mget(ls(pattern = 'mtc_'))
do.call(rbind, list_data)

Combine lapply, seq_along and ddply

I've been searching around this forum and trying to apply what was said in previous answers to my case. However, something in my code is missing.
I use lapply() with a function inside that runs ddply. This works nicely. However, I would like to identify every result by the name of its source data frame, not by [[1]], [[2]], ...
For this reason, I am trying to use seq_along, but unsuccessfully. Let's see what I have:
I created a list grouping 16 different data frames (with the same structure) into one object, called melt_noNA_noDC_regression:
melt_noNA_noDC_regression <-
list(I1U_melt_noNA_noDC_regression, I1L_melt_noNA_noDC_regression,
I1U_melt_noNA_noDC_regression, I1L_melt_noNA_noDC_regression,
CU_melt_noNA_noDC_regression, CL_melt_noNA_noDC_regression,
P3U_melt_noNA_noDC_regression, P3L_melt_noNA_noDC_regression,
P4U_melt_noNA_noDC_regression, P4L_melt_noNA_noDC_regression,
M1U_melt_noNA_noDC_regression, M1L_melt_noNA_noDC_regression,
M2U_melt_noNA_noDC_regression, M2L_melt_noNA_noDC_regression,
M3U_melt_noNA_noDC_regression, M3L_melt_noNA_noDC_regression)
Later, I run this lapply() line successfully.
lapply(melt_noNA_noDC_regression, function(x) ddply(x, .(Species), model_regression))
As I have 16 different data frames, I would like to identify them in the results of the lapply function. I have tried several combinations to include seq_along within the lapply code, as in this case:
lapply(melt_noNA_noDC_regression, function(x) {
ddply(x, .(Species), model_regression)
seq_along(x), function(i) paste(names(x)[[i]], x[[i]])
})
However, I've been getting errors constantly, and it is a bit frustrating. It may be very easy to solve, but I am stuck.
Any idea how to solve this?
Consider using eapply (lapply's lesser-known sibling) or mget to retrieve a named list of your dataframes. Then run them through lapply for the ddply call to return the same named dataframe list with new corresponding values.
df_list <- eapply(.GlobalEnv, function(d) d)[c("I1U_melt_noNA_noDC_regression",
"I1L_melt_noNA_noDC_regression",
"I1U_melt_noNA_noDC_regression",
...)]
df_list <- mget(c("I1U_melt_noNA_noDC_regression",
"I1L_melt_noNA_noDC_regression",
"I1U_melt_noNA_noDC_regression",
...))
# GENERALIZED FOR ANY DF IN GLOBAL ENV
df_list <- Filter(function(i) inherits(i, "data.frame"), eapply(.GlobalEnv, function(d) d))
new_list <- lapply(df_list, function(x) ddply(x, .(Species), model_regression))
And because eapply (being environment apply) is part of the apply family and can iterate through an environment's objects, you can bypass lapply. But you must account for non-dataframes and then subset by data frame names; hence tryCatch and [] indexing are used:
new_list2 <- eapply(.GlobalEnv, function(x)
tryCatch(ddply(x, .(Species), model_regression),
warning = function(w) return(NA),
error = function(e) return(NA)
)
)[c("I1U_melt_noNA_noDC_regression",
"I1L_melt_noNA_noDC_regression",
"I1U_melt_noNA_noDC_regression",
...)]
all.equal(new_list, new_list2)
# [1] TRUE
With all that said, ideally your data processing would build a named dataframe list from the start, rather than creating 16 separate, similarly structured objects that flood your global environment. Therefore, consider adjusting the source of your regression objects, so replace the following:
I1U_melt_noNA_noDC_regression <- ...
with this:
df_list <- list()
df_list[["I1U_melt_noNA_noDC_regression"]] <- ...
