So essentially I have a list of dataframes that I want to apply as.character() to.
To obtain the list of dataframes I have a list of files that I read in using a map() function and a read function that I created. I can't use map_df() because some columns are read in as different data types across files. All of the files have the same layout, and I know that I could hard code the data types in the read function if I wanted, but I want to avoid that if I can.
At this point I throw the list of dataframes in a for loop and apply another map() function to apply the as.character() function. This final list of dataframes is then compressed using bind_rows().
All in all, this seems like an extremely convoluted process, see code below.
library(readxl)  # read_xlsx()
library(purrr)   # map(), map_df()
library(dplyr)   # bind_rows()

audits <- list.files()
my_reader <- function(x) {
  read_xlsx(x)
}
audits <- map(audits, my_reader)
for (i in seq_along(audits)) {
  audits[[i]] <- map_df(audits[[i]], as.character)
}
audits <- bind_rows(audits)
Does anybody have any ideas on how I can improve this? Ideally to the point where I can do everything in a single vectorised map() function?
For reproducibility, you can use two copies of the iris dataset with the data type of one column changed.
iris2 <- iris
iris2[[1]] <- as.character(iris2[[1]])
my_list <- list(iris, iris2)
as.character works on a vector, whereas a data.frame is a list of vectors. One option is to use across() if we want only a single map call:
library(dplyr)
library(purrr)
map_dfr(my_list, ~ .x %>%
mutate(across(everything(), as.character)))
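To illustrate why as.character() has to be applied column by column, here is a minimal base R sketch on a small made-up data frame (df and its columns are hypothetical, not taken from the question). Calling as.character() on a one-column data frame deparses the whole column into a single string, while applying it to each column converts the individual values.

```r
# Hypothetical two-column data frame, for illustration only
df <- data.frame(x = c(1.5, 2), y = c("a", "b"))

# A one-column data frame is still a list, so the column is deparsed
# into ONE string rather than converted value by value
whole <- as.character(df["x"])

# Extracting the column as a plain vector converts each value
per_value <- as.character(df[["x"]])

# Converting every column while keeping the data.frame structure
df_chr <- df
df_chr[] <- lapply(df_chr, as.character)

print(whole)
print(per_value)
str(df_chr)
```

The df_chr[] <- lapply(...) idiom is the base R counterpart of mutate(across(everything(), as.character)).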
I wanted to show a base R solution in case it helps anyone else. You can use rapply to recursively go through the list and apply a function. You can specify the classes to match and whether to replace the elements in place or to return the result unlisted or as a list:
iris2 <- iris
iris2[[1]] <- as.character(iris2[[1]])
my_list <- list(iris, iris2)
mylist2 <- rapply(my_list, classes = "ANY", f = as.character, how = "replace")
bigdf <- do.call(rbind, mylist2)
I know this topic has appeared on SO a few times, but the examples were often more complicated and I would like to have an answer (or set of possible solutions) to this simple situation. I am still wrapping my head around R and programming in general. So here I want to apply the lapply function or a simple loop to data, which is a list of three lists of vectors.
data1 <- list(rnorm(100),rnorm(100),rnorm(100))
data2 <- list(rnorm(100),rnorm(100),rnorm(100))
data3 <- list(rnorm(100),rnorm(100),rnorm(100))
data <- list(data1,data2,data3)
Now, I want to obtain the list of means for each vector. The result would be a list of three elements (lists).
I only know how to obtain a list of outcomes for a single list of vectors, either with a loop:
for (i in 1:length(data1)){
means <- lapply(data1,mean)
}
or by:
lapply(data1,mean)
and I know how to get all the means using rapply:
rapply(data,mean)
The problem is that rapply does not maintain the list structure.
Help and possibly some tips/explanations would be much appreciated.
We can loop through the list of list with a nested lapply/sapply
lapply(data, sapply, mean)
It is otherwise written as
lapply(data, function(x) sapply(x, mean))
Or if you need the output with the list structure, a nested lapply can be used
lapply(data, lapply, mean)
Or with rapply, we can use the argument how to get what kind of output we want.
rapply(data, mean, how='list')
If we are using a for loop, we may need to create an object to store the results.
res <- vector('list', length(data))
for(i in seq_along(data)){
for(j in seq_along(data[[i]])){
res[[i]][[j]] <- mean(data[[i]][[j]])
}
}
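As a quick check of the shapes these calls return, here is a small sketch that uses fixed numbers instead of rnorm() so the means are easy to verify by eye (the values are made up for illustration).

```r
# Fixed numbers instead of rnorm() so the results are easy to check
data <- list(
  list(c(1, 2, 3), c(4, 6)),
  list(c(10, 20), c(0, 0))
)

flat   <- rapply(data, mean)                # default how = "unlist": one flat vector
nested <- rapply(data, mean, how = "list")  # keeps the nesting of the input
semi   <- lapply(data, sapply, mean)        # list of numeric vectors

str(flat)
str(nested)
str(semi)
```

Which form is preferable depends on what comes next: the flat vector is convenient for summaries, while the nested list keeps a one-to-one correspondence with the input.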
I have a problem with cleaning up my code. I understand I could type this all out but we don't want that obviously.
I have only dataframes in my global environment. They are all "data.frame".
I want to check the dimensions of all of them and put that in a tibble. I managed that somehow. I also would like to change their colnames() tolower() which works easy if I just type the name of the data.frame, but there's more than 2 and I want it done automatically. Then I also want to mutate all data.frames in the same way.
Small example of my code:
library(tidyverse)
x <- data.frame(letters[1:2]) #To create the data
y <- data.frame(letters[3:4])
dfs <- as.list(ls()) #I take whatever is in my environment
I managed below to get a tibble of the dimensions:
z <- as_tibble(lapply(seq_along(dfs),
function(j) dim(get(dfs[[j]]))), .name_repair = "unique")
colnames(z) <- dfs
Now for the colnames of all the data.frames stored in my list I basically want to perform this code:
colnames(dfs[[1]]) <- tolower(colnames(dfs[[1]]))
but that returns NULL as I found out earlier. So I used get() in there to make it work for the dimensions. But if I use get() to assign colnames it says it can't find function "get<-".
Since all colnames for all dataframes are the same (just different nrows()) I could save the lowercase colnames as value and use that, but that doesn't take away that it cant find the get<- function.
names <- tolower(colnames(x))
sapply(seq_along(dfs),
function(j) colnames(get(dfs[[j]])) <- names)
Error in colnames(get(dfs[[j]])) <- names :
could not find function "get<-"
as for the mutating part I tried a for loop:
for(i in seq_along(dfs)){
get(dfs[[i]]) <- get(dfs[[i]]) %>% mutate(cd = ab)
}
But it's the same issue.
Could anyone help clearing this problem for me? (and if a cleaner code for the dimensions is available that would be highly appreciated)
I am just trying to up my coding skills. I would have been long done if I just typed it all out but that defeats the purpose.
Thanks!
-JK
Using base R, assuming dfs holds the data frames themselves (e.g. dfs <- mget(ls())) rather than their names:
lapply(dfs, function(x) transform(setNames(x, tolower(names(x))), X = c('a', 'b')))
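The "get<-" error arises because get() only reads a value; it has no replacement form, so it cannot appear on the left-hand side of an assignment. A common pattern is to collect the data frames into a named list with mget(), modify the copies with lapply(), and optionally write them back with list2env(). The sketch below is a minimal illustration under assumptions: the environment env and the column name AB are hypothetical, chosen so that the mutate(cd = ab) step from the question has something to work on.

```r
# Hypothetical environment standing in for the global environment
env <- new.env()
env$x <- data.frame(AB = letters[1:2])
env$y <- data.frame(AB = letters[3:5])

# Named list of the data frames themselves, not just their names
dfs <- mget(ls(env), envir = env)

# Dimensions of every data frame: one column per data frame
dims <- sapply(dfs, dim)
rownames(dims) <- c("rows", "cols")

# Lowercase the column names and add a column, on the list copies
dfs <- lapply(dfs, function(d) {
  names(d) <- tolower(names(d))
  d$cd <- d$ab
  d
})

# Optional: write the modified copies back under their original names
list2env(dfs, envir = env)

print(dims)
print(names(env$x))
```

Keeping the data frames in the list and skipping the list2env() step is often simpler, since every later operation can then be another lapply() call.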
I have multiple lists starting with the same name.
(values_1, values_2,values_n)
Is there a way to combine them like
all_lists <- c(values_*)
As suggested in Ronak Shah's comment:
You have to work with the global environment .GlobalEnv
The function ls returns all the objects already defined in the .GlobalEnv
The pattern parameter allows you to obtain only objects which match the pattern.
ls() returns a character vector with the names of the objects.
To access the value of objects with their names, you have to use the get() function
When you have multiple names, you can use mget(). So the final snippet is
list_data <- mget(ls(pattern = 'values_'))
If you want to do the same with dataframes
Here is a working example:
mtc_1 <- mtcars
mtc_2 <- mtcars
mtc_3 <- mtcars
list_data <- mget(ls(pattern = 'mtc_'))
do.call(rbind, list_data)
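Because pattern is a regular expression, 'values_' also matches names that merely contain that text. The sketch below, using made-up objects, anchors the pattern with ^ and then flattens the result into a single list, which is closer to the c(values_1, values_2, ...) form asked for in the question. The object names and contents are hypothetical.

```r
# Made-up objects for illustration
values_1 <- list(a = 1)
values_2 <- list(b = 2)
other_values_x <- list(z = 0)   # contains "values_" but should not be picked up

# "^" anchors the match to the start of the object name
wanted <- ls(pattern = "^values_")
list_data <- mget(wanted)

# unname() drops the object names so the inner names (a, b) are kept as-is
all_lists <- do.call(c, unname(list_data))

print(wanted)
str(all_lists)
```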
Have been researching this question on SO, and found only solutions for merging list elements into one large data frame. However, I am struggling with unpacking only those elements that meet certain condition.
df1 <- iris %>% filter(Sepal.Length > 2.5)
df2 <- mtcars %>% filter(qsec > 16)
not_neccessary <- head(diamonds, 10)
not_neccessary2 <- head(beaver1, 12)
data_lists <- list("#123 DATA" = df1, "CON" = not_neccessary2, "#432 DATA" = df2, "COM" = not_neccessary)
My goal is to convert only those list elements that contain "DATA" in their name. I was thinking about writing a loop function within a lapply:
a <- lapply(data_lists, function(x){if (x == "#+[1-9]+_+DATA"){new_df <- as.data.frame(x)}})
It does not work. Also was trying to make a for loop:
for (i in list){
if (i == "#+[1-9]+_+DATA"){
df <- i
}
}
It does not work either.
Is there any effective function that will unpack my list into particular dataframes by certain condition? My R skills are very bad, especially in writing functions, although I am not really new to this language. Sorry about that.
Use grepl/grep to find lists that have 'DATA' in their name and subset the list.
result <- data_lists[grepl('DATA', names(data_lists))]
#With `grep`
#result <- data_lists[grep('DATA', names(data_lists))]
Using %like% from the data.table package
library(data.table)
result <- data_lists[names(data_lists) %like% 'DATA']
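The attempts in the question fail for two reasons: inside lapply() the variable x is the data frame itself rather than its name, and == compares values literally instead of matching a regular expression. A minimal sketch of subsetting by name with a stricter pattern follows; built-in datasets stand in for the question's data frames, and the pattern is an assumption about how the names are formed.

```r
# Built-in datasets stand in for the question's data frames
data_lists <- list(
  "#123 DATA" = head(iris),
  "CON"       = head(beaver1),
  "#432 DATA" = head(mtcars),
  "COM"       = head(cars)
)

# Match on the element names: "#", digits, a space, then "DATA"
keep <- grepl("^#[0-9]+ DATA$", names(data_lists))
result <- data_lists[keep]

print(names(result))
```

Individual data frames can then be pulled out with result[["#123 DATA"]].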
I am trying to use bind_rows and tibble from tidyverse, and getting unexpected results.
When I combine several data frames with bind_rows and then transform them to a tibble, the column names get messed up:
library(dplyr)  # bind_rows() comes from dplyr, not tidyr
library(tidyr)
pred.models <- c('1.csv', '2.csv', '3.csv')
prediction.slides <- list()
for (modelid in pred.models){
tmp <- read.csv(modelid)
tmp[,'modelid'] <- modelid
prediction.slides[[length(prediction.slides)+1]] <- (tmp)
}
prediction.slides <- (bind_rows(prediction.slides))
typeof(prediction.slides)
# -> list
# now let's see what we got:
prediction.slides
# -> `bind_rows(prediction.slides)`$hash $class_prob $modelid
However, when I try the following:
pred.models <- c('1.csv', '2.csv', '3.csv')
prediction.slides <- list()
for (modelid in pred.models){
tmp <- read.csv(modelid)
tmp[,'modelid'] <- modelid
############################################ Changed here:
prediction.slides[[length(prediction.slides)+1]] <- tibble(tmp)
}
prediction.slides <- (bind_rows(prediction.slides))
I am getting the error "Error: Argument 1 can't be a list containing data frames" on the last line, which is very strange given that, according to the docs, bind_rows is for combining a list of data frames.
Any idea how to do it correctly and get a nice tibble as output?
UPD: the csv files look like the following:
hash,class_prob
1578d8,0.9451976000
1c7644,0.4519760001
dc7358,0.5197600012
The reason is that tibble() doesn't do what you think it does. You need as_tibble() instead. tibble() is used to construct data.frames from given inputs, while as_tibble() transforms the input into a tibble, which is what you want.
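A minimal sketch of the corrected workflow, assuming dplyr and tibble are installed. To keep it self-contained, the CSV contents are supplied as in-memory strings (csv_text is a hypothetical stand-in for the files on disk), and the column types are fixed with colClasses so that the hash-like strings are never guessed as another type.

```r
library(dplyr)   # bind_rows()
library(tibble)  # as_tibble()

# Hypothetical in-memory stand-ins for 1.csv and 2.csv
csv_text <- c(
  "1.csv" = "hash,class_prob\n1578d8,0.9451976\n1c7644,0.4519760",
  "2.csv" = "hash,class_prob\ndc7358,0.5197600"
)

# Read each source and convert the resulting data.frame with as_tibble()
slides <- lapply(csv_text, function(txt) {
  as_tibble(read.csv(text = txt, colClasses = c("character", "numeric")))
})

# .id records which list element each row came from
prediction.slides <- bind_rows(slides, .id = "modelid")

print(prediction.slides)
```

With real files, read.csv(modelid, ...) would replace read.csv(text = txt, ...), and naming the list with the file names lets .id fill in the modelid column without the explicit assignment inside the loop.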