How to unpack particular list elements into dataframes in R?

I have been researching this question on SO and found only solutions for merging list elements into one large data frame. However, I am struggling to unpack only those elements that meet a certain condition.
library(dplyr)    # for filter() and %>%
library(ggplot2)  # provides the diamonds dataset
df1 <- iris %>% filter(Sepal.Length > 2.5)
df2 <- mtcars %>% filter(qsec > 16)
not_neccessary <- head(diamonds, 10)
not_neccessary2 <- head(beaver1, 12)
data_lists <- list("#123 DATA" = df1, "CON" = not_neccessary2, "#432 DATA" = df2, "COM" = not_neccessary)
My goal is to convert only those list elements that contain "DATA" in their name. I was thinking about writing a loop function within a lapply:
a <- lapply(data_lists, function(x){if (x == "#+[1-9]+_+DATA"){new_df <- as.data.frame(x)}})
It does not work. I also tried a for loop:
for (i in list){
if (i == "#+[1-9]+_+DATA"){
df <- i
}
}
It does not work either.
Is there an effective way to unpack my list into separate data frames based on a condition? My R skills are weak, especially in writing functions, although I am not really new to this language. Sorry about that.

Use grepl/grep to find the list elements that have 'DATA' in their name and subset the list.
result <- data_lists[grepl('DATA', names(data_lists))]
#With `grep`
#result <- data_lists[grep('DATA', names(data_lists))]
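The attempts in the question fail because lapply and for iterate over the list elements (the data frames), not their names, and == compares for exact equality rather than matching a regular expression. A minimal sketch of the name-based filter with a stricter pattern follows; the stand-in data uses only built-in datasets, which is an assumption made so the example runs without extra packages.

```r
# Stand-in data using only built-in datasets (an assumption for this sketch)
data_lists <- list(
  "#123 DATA" = head(iris),
  "CON"       = head(beaver1),
  "#432 DATA" = head(mtcars),
  "COM"       = head(cars)
)

# Match names such as "#123 DATA": a hash, digits, a space, then DATA
keep <- grepl("^#[0-9]+ DATA$", names(data_lists))
result <- data_lists[keep]

names(result)
```

The elements of result are already data frames, so no as.data.frame() call is needed; result[["#123 DATA"]] retrieves one of them.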

Using %like% (from the data.table package):
library(data.table)
result <- data_lists[names(data_lists) %like% 'DATA']

Related

How to name data frame in for loops using object?

I have an object that contains a list of lab tests, and I have created a for loop that runs once per element of that object. During each iteration, R should create a data frame named after the corresponding element. Please see below.
adlb <- data.frame(subjid = c(1:20), aval = c(100:119))
adlb$paramcd <- ifelse(adlb$subjid <= 10, "ALT", "AST")
lab_list <- unique(filter(adlb, !is.na(aval))$paramcd)
for (i in 1:length(lab_list))
{
lab_name <- unlist(lab_list)[[i]]
print(lab_name)
??? <- adlb %>%
dplyr::filter(paramcd == lab_name) %>%
drop_na(aval)
}
When I run it, it should first create a data frame named ALT, followed by one named AST. What should I replace ??? with?
The only reason I would prefer it this way is that it helps me review the data in question and debug scripts when needed.
Thank you in advance.
I tried lab_name[[i]] and a few other options, but they resulted in either an error or an incorrect data frame name.

I think this might help:
# example dataframes
df1 <- iris
df2 <- mtcars
df3 <- iris
#put them into list
mylist <- list(df1,df2,df3)
#give names to list
names(mylist) <- c("df_name1","df_name2","df_name3")
#put dataframes into global env
list2env(mylist ,.GlobalEnv)
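Applied to the adlb data from the question, the same idea can be sketched without a loop. This is only one possible approach: split() builds the named list directly, and the object names adlb_ok, by_param and lab_env are made up for this example. Inside the original loop, assign(lab_name, ...) would be the direct replacement for ???.

```r
adlb <- data.frame(subjid = 1:20, aval = 100:119)
adlb$paramcd <- ifelse(adlb$subjid <= 10, "ALT", "AST")

# Drop rows with missing aval, then split into one data frame per paramcd
adlb_ok <- adlb[!is.na(adlb$aval), ]
by_param <- split(adlb_ok, adlb_ok$paramcd)

names(by_param)

# Optional: copy each element into an environment as ALT, AST, ...
# A fresh environment is used here; pass .GlobalEnv to mirror the answer above
lab_env <- list2env(by_param, envir = new.env())
```

Keeping the data frames in by_param still allows inspection while debugging, for example with by_param$ALT or View(by_param[["AST"]]).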

Apply an `as.character()` function to a list of dataframes

So essentially I have a list of dataframes that I want to apply as.character() to.
To obtain the list of dataframes I have a list of files that I read in using a map() function and a read function that I created. I can't use map_df() because some columns are read in as different data types. All of the files are the same, and I know that I could hard-code the data types in the read function if I wanted, but I want to avoid that if I can.
At this point I throw the list of dataframes in a for loop and apply another map() function to apply the as.character() function. This final list of dataframes is then compressed using bind_rows().
All in all, this seems like an extremely convoluted process, see code below.
audits <- list.files()
my_reader <- function(x) {
my_file <- read_xlsx(x)
}
audits <- map(audits, my_reader)
for (i in 1:length(audits)) {
audits[[i]] <- map_df(audits[[i]], as.character)
}
audits <- bind_rows(audits)
Does anybody have any ideas on how I can improve this? Ideally to the point where I can do everything in a single vectorised map() function?
For reproducibility you can use two iris datasets with one column's data type changed.
iris2 <- iris
iris2[[1]] <- as.character(iris2[[1]])
my_list <- list(iris, iris2)

as.character works on a vector, whereas a data.frame is a list of vectors. One option, if we want only a single call to map, is to use across:
library(dplyr)
library(purrr)
map_dfr(my_list, ~ .x %>%
mutate(across(everything(), as.character)))

I wanted to show a base R solution just in case it helps anyone else. You can use rapply to recursively go through the list and apply a function. You can specify the class to match and whether to replace, unlist, or list the returned object:
iris2 <- iris
iris2[[1]] <- as.character(iris2[[1]])
my_list <- list(iris, iris2)
mylist2 <- rapply(my_list, class = "ANY", f = as.character, how = "replace")
bigdf <- do.call(rbind, mylist2)
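Another base R route, sketched here as an alternative rather than as the method used above, converts each data frame column by column with lapply. The helper name to_character is made up for this example; the df[] <- idiom replaces the columns while keeping the data.frame structure.

```r
iris2 <- iris
iris2[[1]] <- as.character(iris2[[1]])   # one column stored as character
my_list <- list(iris, iris2)

# df[] <- ... replaces the columns but keeps the data.frame structure
to_character <- function(df) {
  df[] <- lapply(df, as.character)
  df
}

# Convert every data frame, then stack them row-wise
combined <- do.call(rbind, lapply(my_list, to_character))
str(combined)
```

Because every column is character after conversion, the row-binding step no longer has to reconcile differing column types.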

input variable stored in list into loop in R

I'm sure there are much better ways of doing this, I'm open to suggestions.
I have these vectors:
vkt1 <- c("df1", "df2", "df3")
vector2 <- paste("sample", wSheatx, sep="_")
The first vector contains a list of the names of dataframes stored in the environment. These are stored as strings, but I'd like to call them as variable names.
The second vector is just the first one with "sample" added at the beginning, equivalent to:
vector2 <- c('sample_df1', 'sample_df2', 'sample_df3')
These strings from vector2 would serve as the names of new data frames to be created.
Alrighty, so now I want to do something like this:
for (i in 1:length(vector){ # meaning for i in 1,2,3
vector2[i] = data.frame(which(eval(parse(text = vkt1[i])) == "Some_String", arr.ind=TRUE))
addStyle(wb, vkt1[i], cols = 1:ncol(eval(parse(text = vkt1[i]))), rows = vector2[[i]][,1]+1, style = duppedStyle, gridExpand = TRUE)
}
It may look complicated, but the idea is to make data frames named after the strings contained in vector2, each being the subset of the corresponding data frame from vkt1 where "Some_String" is found.
Then, use that created data frame and add a style to the entire row when said string is present.
vector2[[i]][,1]+1 is intended to deploy as sample_df1[,1]+1 (in the first iteration)
Note that I'm using eval(parse(text = vkt1[i])) to get the variables from the strings of vkt1. So, say, eval(parse(text = vkt1[1])) is equal to df1 (the data frame, not the string).
Like this, the code gives the following error:
In file(filename, "r") :
cannot open file 'noCoinColor_Concat': No such file or directory
Been trying to get it working like so, but I'm beginning to feel this approach might be very wrong.

It is easier to manage code and data when you keep the dataframes in a list instead of as separate objects.
You can use mget to fetch all the dataframes named in vkt1 into a list. Say you want to search for 'Some_String' in the first column of each dataframe; then you can do:
new_data <- lapply(mget(vkt1), function(df) df[df[[1]] == 'Some_String', ])
I haven't included the addStyle code here because I don't know from which package it is and what it does but you can easily include it in lapply's anonymous function.
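The loop in the question assigns a data frame into vector2[i], which is an element of a character vector and cannot hold one; a named list can. The sketch below searches every cell instead of only the first column, as the which(..., arr.ind = TRUE) call in the question does. The three small data frames are hypothetical stand-ins, since the real ones are not shown, and the names dfs, hits and hit_rows are made up for this example.

```r
# Hypothetical stand-in data frames; the originals are not shown in the question
df1 <- data.frame(x = c("a", "Some_String", "c"), y = c("Some_String", "b", "c"))
df2 <- data.frame(x = c("a", "b"), y = c("c", "d"))
df3 <- data.frame(x = c("Some_String", "b"), y = c("e", "f"))
vkt1 <- c("df1", "df2", "df3")

dfs <- mget(vkt1)   # named list of the data frames, no eval(parse()) needed

# For each data frame, the row/column positions of every matching cell
hits <- lapply(dfs, function(df) which(df == "Some_String", arr.ind = TRUE))
names(hits) <- paste("sample", vkt1, sep = "_")

# Distinct row numbers that contain the string, one vector per data frame
hit_rows <- lapply(hits, function(h) unique(h[, "row"]))
hit_rows
```

Something like hit_rows[[i]] + 1 could then be passed as the rows argument in the addStyle call from the question, where the + 1 presumably accounts for a header row in the worksheet.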

Is it not easier to combine your data frames into a list and then use apply or map family functions to adjust your data frames?
library(dplyr)
data(mtcars)
df1 <- mtcars %>% filter(cyl == 4)
df2 <- mtcars %>% filter(cyl == 6)
df3 <- mtcars %>% filter(cyl == 8)
df_old_names <- c("df1", "df2", "df3")
df_new_names <- c("df_cyl_4", "df_cyl_6", "df_cyl_8")
df_list <- lapply(df_old_names, get)
names(df_list) <- df_new_names

Using a loop to create multiple dataframes from a single dataset

Quick question for you. I have the following:
a <- c(1,5,2,3,4,5,3,2,1,3)
b <- c("a","a","f","d","f","c","a","r","a","c")
c <- c(.2,.6,.4,.545,.98,.312,.112,.4,.9,.5)
df <- data.frame(a,b,c)
What I am looking to do is use a for loop to create multiple data frames from the rows, based on the contents of column b (i.e. a df for the "a" rows, one for the "d" rows, and so on).
At the same time, I would also like to name each data frame after the corresponding value from column b (the data frame created from the "a" rows will be named "a").
I tried making it work based on the answers provided in "Using a loop to create multiple data frames in R", but I had no luck.
If it helps, I have variables created with levels() and nlevels() to use in the loop to keep it scalable based on how my data changes. Any help would be much appreciated.
Thanks!

This should do:
require(dplyr)
df$b <- as.character(df$b)
col.filters <- unique(df$b)
lapply(seq_along(col.filters), function(x) {
filter(df, b == col.filters[x])
}
) -> list
names(list) <- col.filters
list2env(list, .GlobalEnv)
Naturally, you don't need dplyr to do this. You can just use base syntax:
df$b <- as.character(df$b)
col.filters <- unique(df$b)
lapply(seq_along(col.filters), function(x) {
df[df[, "b"] == col.filters[x], ]
}
) -> list
names(list) <- col.filters
list2env(list, .GlobalEnv)
But I find dplyr much more intuitive.
Cheers
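For comparison, base R's split() produces the same kind of named list in a single call. This is a sketch of an alternative, not part of the answer above, and by_b is a name made up for this example.

```r
a <- c(1, 5, 2, 3, 4, 5, 3, 2, 1, 3)
b <- c("a", "a", "f", "d", "f", "c", "a", "r", "a", "c")
c <- c(.2, .6, .4, .545, .98, .312, .112, .4, .9, .5)
df <- data.frame(a, b, c)

# One data frame per distinct value of column b, named after that value
by_b <- split(df, df$b)

names(by_b)
sapply(by_b, nrow)
```

Note that copying these elements into the global environment with list2env would overwrite the existing vectors a and c, since the new data frames carry the same names. Working with by_b$a, by_b$c and so on avoids that clash.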

R: Address objects deep inside lists with filter commands inside functions/loops (ExtremeBounds package)

I am using the ExtremeBounds package, which returns a multi-level list with (amongst others) dataframes at the lowest level. I run this package over several specifications and I would like to collect some columns of selected dataframes in these results. These should be collected by specification (spec1 and spec2 in the example below) and arranged in a list of dataframes. This list of dataframes can then be used for all kinds of things, for example to export the results of different specifications into different Excel sheets.
Here is some code which creates the problematic object (just run this code blindly, my problem only concerns how to deal with the kind of list it creates: eba_results):
library("ExtremeBounds")
Data <- data.frame(var1=rbinom(30,1,0.2),var2=rbinom(30,2,0.2),
var3=rnorm(30),var4=rnorm(30),var5=rnorm(30))
spec1 <- list(y=c("var1"),
freevars=c("var2"),
doubtvars=c("var3","var4"))
spec2 <- list(y=c("var1"),
freevars=c("var2"),
doubtvars=c("var3","var4","var5"))
indicators <- c("spec1","spec2")
ebaFun <- function(x){
eba <- eba(data=Data, y=x$y,
free=x$freevars,
doubtful=x$doubtvars,
reg.fun=glm, k=1, vif=7, draws=50, weights = "lri", family = binomial(logit))}
eba_results <- lapply(mget(indicators),ebaFun) #eba_results is the object in question
Manually I know how to access each element, for example:
eba_results$spec1$bounds$type #look at str(eba_results) to see the different levels
So "bounds" is a dataframe with identical column names for both spec1 and spec2. I would like to collect the following 5 columns from "bounds":
type, cdf.mu.normal, cdf.above.mu.normal, cdf.mu.generic, cdf.above.mu.generic
into one dataframe per spec. Manually this is simple but ugly:
collectedManually <-list(
manual_spec1 = data.frame(
type=eba_results$spec1$bounds$type,
cdf.mu.normal=eba_results$spec1$bounds$cdf.mu.normal,
cdf.above.mu.normal=eba_results$spec1$bounds$cdf.above.mu.normal,
cdf.mu.generic=eba_results$spec1$bounds$cdf.mu.generic,
cdf.above.mu.generic=eba_results$spec1$bounds$cdf.above.mu.generic),
manual_spec2= data.frame(
type=eba_results$spec2$bounds$type,
cdf.mu.normal=eba_results$spec2$bounds$cdf.mu.normal,
cdf.above.mu.normal=eba_results$spec2$bounds$cdf.above.mu.normal,
cdf.mu.generic=eba_results$spec2$bounds$cdf.mu.generic,
cdf.above.mu.generic=eba_results$spec2$bounds$cdf.above.mu.generic))
But I have more than 2 specifications and I think this should be possible with lapply functions in a prettier way. Any help would be appreciated!
p.s.: A generic example to which hrbrmstr's answer applies but which turned out to be too simplistic:
exampleList = list(a=list(aa=data.frame(A=rnorm(10),B=rnorm(10)),bb=data.frame(A=rnorm(10),B=rnorm(10))),
b=list(aa=data.frame(A=rnorm(10),B=rnorm(10)),bb=data.frame(A=rnorm(10),B=rnorm(10))))
and I want to have an object which collects, for example, all the A and B vectors into two data frames (each with its respective A and B) which are then a list of data frames. Manually this would look like:
dfa <- data.frame(A=exampleList$a$aa$A,B=exampleList$a$aa$B)
dfb <- data.frame(A=exampleList$b$aa$A,B=exampleList$b$aa$B)
collectedResults <- list(a=dfa, b=dfb)

There's probably a less brute-force way to do this.
If you want lists of individual columns this is one way:
get_col <- function(my_list, col_name) {
unlist(lapply(my_list, function(x) {
lapply(x, function(y) { y[, col_name] })
}), recursive=FALSE)
}
get_col(exampleList, "A")
get_col(exampleList, "B")
If you want a consolidated data.frame of indicator columns this is one way:
collect_indicators <- function(my_list, indicators) {
lapply(my_list, function(x) {
do.call(rbind, c(lapply(x, function(y) { y[, indicators] }), make.row.names=FALSE))
})[[1]]
}
collect_indicators(exampleList, c("A", "B"))
If you just want to bring the individual data.frames up a level to make it easier to iterate over to write to a file:
unlist(exampleList, recursive=FALSE)
This makes several assumptions about the desired output format (the question was a bit vague).

There is a brute force way which works but is dependent on several named objects:
collectEBA <- function(x){
df <- paste0("eba_results$",x,"$bounds")
df <- eval(parse(text=df))[,c("type",
"cdf.mu.normal","cdf.above.mu.normal",
"cdf.mu.generic","cdf.above.mu.generic")]
df[is.na(df)] <- "NA"
df
}
eba_export <- lapply(indicators,collectEBA)
names(eba_export) <- indicators
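The same result can probably be had without eval(parse()) by iterating over eba_results directly, since lapply keeps the names of the list it is given. The sketch below runs on a mock of the nested structure, not on real eba() output: make_bounds, its random values and the extra column named other are hypothetical, and keep_cols is a name made up for this example.

```r
# Mock of the nested structure (hypothetical values, not real eba() output)
make_bounds <- function(n) {
  data.frame(type = rep("free", n),
             cdf.mu.normal = runif(n), cdf.above.mu.normal = runif(n),
             cdf.mu.generic = runif(n), cdf.above.mu.generic = runif(n),
             other = runif(n))
}
eba_results <- list(spec1 = list(bounds = make_bounds(3)),
                    spec2 = list(bounds = make_bounds(4)))

keep_cols <- c("type", "cdf.mu.normal", "cdf.above.mu.normal",
               "cdf.mu.generic", "cdf.above.mu.generic")

# One data frame per specification; the names spec1, spec2 carry over
eba_export <- lapply(eba_results, function(res) res$bounds[, keep_cols])
names(eba_export)
```

This version depends only on eba_results and the column names, not on the separate indicators vector.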