How to export all dataframes made by group_split - r

I am doing some text analysis in R Studio, and as part of this analysis I have split my data frame into various tibbles, split by a column in my data called "topic". This has worked successfully.
All I need to do now is find some way to export each of those tibbles to a csv, xlsx or even html file - anything that will let me look through them properly.
Has anyone got any solutions for this? Feels like it should be something easy to do but in my research it is not.
A screenshot of the tibbles I am trying to export
Thanks

You may use map or lapply to write each dataframe to a csv. However, group_split does not give names to the list. To get proper names of the csv you can use split and imap together.
For example, with iris dataset -
library(tidyverse)
iris %>%
split(.$Species) %>%
imap(~write_csv(.x, paste0(.y, '.csv')))
This creates 3 csvs named setosa.csv, versicolor.csv and virginica.csv in the working directory.

If you don't mind using a for() loop try this. You could probably find a better way to name the list items, but this works.
my_list <- mtcars %>%
group_split(gear)
names(my_list) <- 1:length(my_list)
for(i in 1:length(my_list)){
write_csv(my_list[[i]], file = paste0(names(my_list)[i], ".csv"))
}
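One possible way to get more descriptive file names, sketched here as an assumption rather than the answerer's method: group_keys() returns the key values in the same order as group_split(), so they can be used to name the list. The output folder out_dir and the "gear_" prefix are hypothetical choices for this example.

```r
library(dplyr)
library(readr)

# Hypothetical output folder inside the session's temp directory
out_dir <- file.path(tempdir(), "gear_csvs")
dir.create(out_dir, showWarnings = FALSE)

by_gear <- mtcars %>% group_by(gear)

# Split into one tibble per gear value and name each piece after its key
my_list <- group_split(by_gear)
names(my_list) <- paste0("gear_", group_keys(by_gear)$gear)

# Write one csv per group, named after the group
for (nm in names(my_list)) {
  write_csv(my_list[[nm]], file.path(out_dir, paste0(nm, ".csv")))
}
```

The same pattern should carry over to the "topic" column from the question by grouping on topic instead of gear.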

Related

Parse tables split across multiple pages on pdf, in single table, in r

How can I parse a data (table) which has been split across multiple pages on a pdf document here into a single table in R?
I have tried some code, but I am still wondering how it can be done, as I am not good at parsing text files.
Please help?
After running the following code, you only need to split the columns of the data frame df (you can use Excel for that):
library(pdftools)
library(tidyverse)
text <- pdf_text("consolidated transfer orders.pdf")
df <- map_dfr(1:length(text),
~ str_extract_all(text[.x],"(?<=\\n\\s{1,3})\\d+\\s+(.*)") %>%
unlist() %>% data.frame())

Nested for loops in R Using Tidyverse, where iterator is generated within the first loop

I'm working on a little problem in R where I'm curious if I can use purrr to iterate over excel files and sheets.
I've seen a lot of examples where map() is being given the iterable object directly, i.e. map(1:6, function(x)... but I'm not sure if/how to do it when I want to generate one of the iterators within the first map call.
Take this example where we have a folder of excel files, and we want to run the same function on each sheet. So we need two loops, one to loop through the files, and one to iterate through the sheets.
library(tidyverse)
library(readxl)
fileList <- list.files()
customFunction <- function(xlsheet, filePath){
return(1)
}
output <- list()
i <- 1
for (file in fileList){
# get a list of excel sheets in each file
sheet_list <- excel_sheets(file)
for (sheet in sheet_list){
# apply customFunction to each sheet of each file
output[[i]] <- customFunction(sheet, file)
i <- i + 1
}
}
I think where I'm getting stuck is that I need arguments from both the first and the second loop in each call of customFunction().
Looking at an example from @r2evans in another question, it seems like they're describing something like this:
map(fileList, ~ map(excel_sheets(.x), ~ customFunction(.x, .y)))
But that returns an error in my actual code (and in this example it returns a nested list instead of a single list like the for loop, noting my example won't fail if the sheet and path aren't correctly passed to customFunction)
Error in is_string(path) : the ... list contains fewer than 2 elements
And I'm honestly a little lost with the .x and .y pronouns
Finally, if this is a silly thing to try to do with purrr and the for loop is generally a better solution, that's great feedback too.
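Using .x and .y can be confusing when you have nested maps: the inner formula's .x shadows the outer one, and .y is not defined when map passes only one argument, which is what the error is complaining about. I usually prefer anonymous functions with named arguments to keep things clear. And yes, nested maps return a nested list. You can use flatten to get one flat list like the for loop, or use unlist with recursive = FALSE.
library(purrr)
# customFunction(xlsheet, filePath) takes the sheet first, then the file
flatten(map(fileList, function(file)
  map(excel_sheets(file), function(sheet)
    customFunction(sheet, file))))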
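The unlist alternative mentioned above can be sketched in base R as follows. So that the example runs without any Excel files, sheets_in() is a hypothetical stand-in for readxl::excel_sheets(), and the body of customFunction() and the file names are made up for illustration.

```r
# Hypothetical stand-in for readxl::excel_sheets(): pretends each file has two sheets
sheets_in <- function(file) paste0(file, "_sheet", 1:2)

# Illustrative body; the argument order matches the question (sheet, then file)
customFunction <- function(xlsheet, filePath) paste(filePath, xlsheet, sep = "::")

fileList <- c("a.xlsx", "b.xlsx")  # made-up file names

# The nested lapply gives a list of lists; recursive = FALSE removes only one level
output <- unlist(
  lapply(fileList, function(file) {
    lapply(sheets_in(file), function(sheet) customFunction(sheet, file))
  }),
  recursive = FALSE
)

length(output)  # one element per file/sheet pair
```

With real workbooks, replacing sheets_in() by excel_sheets() should give the same flat structure as the original for loop.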

R: Generate dynamic names for dataframes

I need to read several csv files from a directory and save each data in separate dataframe.
The filenames are in a character vector:
lcl_forecast_data_files <- dir(lcl_forecast_data_path, pattern=glob2rx("*.csv"), full.names=TRUE)
For example: "fruc2021.csv", "gem2020.csv", "strb2021.csv".
So far I am reading the files step by step:
fruc2021 <- read_csv2("fruc2021.csv")
gem2020 <- read_csv2("gem2020.csv")
strb2021 <- read_csv2("strb2021.csv")
But there are many more files in the directory and subdirectories. To read them all one by one is very tedious.
Now I have already experimented a little with the map function, but I have not yet figured out how to automatically generate the names of the dataframes from the file names.
A first simple try was:
lcl_forecast_data <- lcl_forecast_data_files %>%
map(
function(x) {
str_replace(basename(x), ".csv","") <- read_csv2(x)
}
)
But this did not work :-(
Is it even possible to generate names for dataframes like this?
Or are there other, simpler possibilities?
Greetings
Benne
If you do not want to use a list and lapply as @Onyambu suggested, you can use assign() to create the data frames.
filenames <- c("fruc2021.csv", "gem2020.csv", "strb2021.csv")
for (i in filenames) {
  # strip the ".csv" extension to get the object name;
  # use read.csv2() instead if the files are semicolon-separated
  assign(sub("\\.csv$", "", i), read.csv(i))
}
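For comparison, the list-based route could look roughly like this. It is a sketch, not the code from the comment it refers to: the temporary folder and the two tiny files are created only so the example runs on its own, and read.csv2() is used on the assumption that the files are semicolon-separated, as read_csv2() in the question suggests.

```r
# Create two small example files in a temporary folder (illustrative data only)
tmp_dir <- file.path(tempdir(), "forecast_demo")
dir.create(tmp_dir, showWarnings = FALSE)
write.csv2(data.frame(x = 1:3), file.path(tmp_dir, "fruc2021.csv"), row.names = FALSE)
write.csv2(data.frame(x = 4:6), file.path(tmp_dir, "gem2020.csv"), row.names = FALSE)

# Find all csv files; recursive = TRUE would also search subdirectories
files <- list.files(tmp_dir, pattern = "\\.csv$", full.names = TRUE)

# Read every file and name each list element after its file (without extension)
lcl_forecast_data <- lapply(files, read.csv2)
names(lcl_forecast_data) <- tools::file_path_sans_ext(basename(files))

names(lcl_forecast_data)
```

Each data frame is then available as, for example, lcl_forecast_data$fruc2021, without creating one global variable per file.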

How to Create Excel Pivot Table to R

I want to recreate a pivot table from my Excel data set in R. I have been following this tutorial on how to do this: http://excel2r.com/pivot-tables-in-r-basic-pivot-table-columns-and-metrics/ . I have used the code from this tutorial, replacing the variables with my own, but I keep getting this error message: Error: select() doesn't handle lists.
What does this error message mean and how can I fix it?
The R-Script I have been using from the tutorial is:
library(dplyr)
library(tidyr)
pivot <- df %>%
select(Product.Category, Region, Customer.Segment, Sales)%>%
group_by(Product.Category, Region, Customer.Segment) %>%
summarise(TotalSales = sum(Sales))
Thank you in advance for the help!
Judging by your error message, "select() doesn't handle lists.", I suppose that your object called df isn't a data frame.
Maybe you have a data frame inside a list.
Try this in your R console:
class(df)
If the class is list, you need to extract the data frame from the list. You can do this by position; it is probably the first element: df[[1]]
The functions you are using generally work only on data frames (and tibbles, which are another type of data frame).
I hope it works for you.
And next time, try to provide a reproducible example.
You could at least print your original data frame before using these functions; that way I could help you more efficiently.
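A minimal sketch of that situation, assuming the data frame really is the first element of a list. The sales data below is invented for illustration; only the column names come from the question.

```r
library(dplyr)

# Invented data; column names follow the question
sales <- data.frame(
  Product.Category = c("Furniture", "Furniture", "Technology"),
  Region           = c("East", "East", "West"),
  Customer.Segment = c("Consumer", "Consumer", "Corporate"),
  Sales            = c(100, 50, 200)
)

df <- list(sales)  # a data frame wrapped in a list, as suspected above
class(df)          # "list"

df <- df[[1]]      # extract the data frame by position

pivot <- df %>%
  select(Product.Category, Region, Customer.Segment, Sales) %>%
  group_by(Product.Category, Region, Customer.Segment) %>%
  summarise(TotalSales = sum(Sales), .groups = "drop")
```

After the extraction step, the tutorial's pipeline runs on an ordinary data frame.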

exports hundreds of dataframes as xlsx using loop in R

I created hundreds of data frames in R, and I want to export them to a local position. All the names of the data frames are stored in a vector :
name.vec<-c('df1','df2','df3','df4','df5','df6')
Each name in name.vec refers to a data frame.
What I want to do is export those data frames as Excel files, but I do not want to do it the way below:
library("xlsx")
write.xlsx(df1,file ="df1.xlsx")
write.xlsx(df2,file ="df2.xlsx")
write.xlsx(df3,file ="df3.xlsx")
because with hundreds of data frames, it's tedious and dangerous.
I want some thing like below instead :
library('xlsx')
for (k in name.vec) {
write.xlsx(k,file=paste0(k,'.xlsx'))
}
but this would not work.
Does anyone know how to achieve this? Your time and knowledge would be deeply appreciated. Thanks in advance.
The for loop doesn't work because it writes the name itself (the string 'df1', for example) as the xlsx file contents instead of the data frame: name.vec stores only the names, not the data frames. To fix the loop, put the data frames in a list and index into it:
df.list <- list(df1, df2, df3)
name.vec <- c('df1', 'df2', 'df3')
library('xlsx')
# loop over positions so each data frame is paired with its name
for (k in seq_along(df.list)) {
  write.xlsx(df.list[[k]], file = paste0(name.vec[k], '.xlsx'))
}
If you prefer to avoid an explicit loop, here's another way (it is unlikely to be noticeably faster, since writing the files dominates the run time):
sapply(1:length(df.list),
function(i) write.xlsx(df.list[[i]],file=paste0(name.vec[i],'.xlsx')))
The output is three xlsx files, one per data frame in df.list, each named after the matching entry of name.vec.
It may also be worth switching at some point to the newer package for this: writexl.
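A rough sketch of the writexl route, which also shows one way to go from the vector of names straight to the data frames with mget(). The three small data frames and the out_dir folder are assumptions made so the example is self-contained.

```r
library(writexl)

# Illustrative data frames standing in for the asker's df1, df2, df3
df1 <- data.frame(a = 1:2)
df2 <- data.frame(a = 3:4)
df3 <- data.frame(a = 5:6)
name.vec <- c("df1", "df2", "df3")

# Hypothetical output folder inside the session's temp directory
out_dir <- file.path(tempdir(), "xlsx_demo")
dir.create(out_dir, showWarnings = FALSE)

# mget() looks up each name and returns a named list of the objects
df.list <- mget(name.vec)

# Write one xlsx file per data frame, named after the object
for (nm in names(df.list)) {
  write_xlsx(df.list[[nm]], path = file.path(out_dir, paste0(nm, ".xlsx")))
}
```

Because mget() keeps the names, there is no need to maintain a separate list and name vector in parallel.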
