Pulling out elements of a list of tibbles after using dplyr group_split (R)

I'm working with a dataset containing the full works of Shakespeare, where each row in the dataframe is a line of a Shakespeare text, and am trying to combine the individual lines of text into one large row of content for each play of Shakespeare's. So far, I've been able to use group_split from dplyr to split the dataframe into tibbles containing the contents of each work, but I can't seem to pull the contents out of the tibbles based on the name of the play.
So far, I've been able to create a large list of tibbles by using the following code:
playtbls <- data %>%
group_split(name)
I'm stuck on how to access anything from the resulting tibbles without repeatedly indexing into the list. I've been able to pull the contents of individual tibbles by using the following code, but I know there has to be a faster and cleaner way to do this!
playtbls[[1]][["content"]]
My ultimate goal is to be able to append the contents of these tibbles to a new dataframe "plays" which contains the title and genre of each play, where each play is an individual row. Basically, for each play's name, when plays$name == playtbls[[x]][["name"]], I need to append playtbls[[x]][["content"]] to the plays dataframe.
The data is available from the bardr package.

We can use
lapply(playtbls, `[[`, "content")
Or with tidyverse
library(purrr)
library(dplyr)
map(playtbls, ~ .x %>%
  pluck("content"))
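If the end goal is one row per play with all of its lines combined, it may be simpler to skip the list of tibbles altogether. The sketch below assumes the columns are called name and content as in the question; the two small tibbles are toy stand-ins made up for illustration, not the real bardr data.

```r
library(dplyr)

# Toy stand-ins (assumptions): `data` has one row per line of text,
# `plays` has one row per play with its genre.
data <- tibble(
  name    = c("Hamlet", "Hamlet", "Twelfth Night", "Twelfth Night"),
  content = c("Hamlet line 1", "Hamlet line 2",
              "Twelfth Night line 1", "Twelfth Night line 2")
)
plays <- tibble(
  name  = c("Hamlet", "Twelfth Night"),
  genre = c("Tragedy", "Comedy")
)

# Collapse each play's lines into a single string, one row per play.
play_text <- data %>%
  group_by(name) %>%
  summarise(content = paste(content, collapse = " "), .groups = "drop")

# Attach the collapsed text to the title/genre table by play name.
plays <- left_join(plays, play_text, by = "name")
```

Here plays ends up with the columns name, genre and content, and the join takes care of matching each play's text to the right row.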

Related

Create filtered dataframe with dplyr and for loop

I want to create several data frames from an original one in R using for loop.
I want to get three separated data frames for each species to conduct separate analysis.
I tried the following code but it doesn't work:
data(iris)
library(dplyr)
for i in levels(iris$Species){
paste0(i,".data") <- data.frame(filter(iris,
Species=="i"))
}
I do not necessarily need dplyr but it's the one I am used to.
I think it is better to keep the separate frames in a list, which, as pointed out in the comments, is done easily with split(iris, iris$Species).
If you really want them out of the list and into separate named frames, you can use
list2env(split(iris, iris$Species),envir = .GlobalEnv)
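As a small sketch of both options with the built-in iris data: the names species_list, mean_petal and species_env are made up for illustration, and a fresh environment is used instead of .GlobalEnv only to keep the example from cluttering the workspace.

```r
# Split iris into a named list, one data frame per species.
species_list <- split(iris, iris$Species)
names(species_list)   # "setosa" "versicolor" "virginica"

# Work on each piece without creating separate variables.
mean_petal <- sapply(species_list, function(d) mean(d$Petal.Length))

# If separate objects are really needed, list2env() creates one per list name.
# Pass envir = .GlobalEnv instead to get them in the workspace.
species_env <- list2env(species_list, envir = new.env())
nrow(species_env$setosa)   # 50
```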
An even better approach would be to keep everything in one data frame and then use dplyr/tidyr’s group_by() and nest() functions to fit a model for each group.
See here for a detailed walkthrough: https://tidyr.tidyverse.org/articles/nest.html
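A minimal sketch of that nested approach, assuming a simple linear model of Sepal.Length on Sepal.Width purely for illustration (the question does not say which analysis is needed):

```r
library(dplyr)
library(tidyr)
library(purrr)

models <- iris %>%
  group_by(Species) %>%
  nest() %>%   # one row per species; the rest of the data sits in a list-column
  mutate(
    # Fit one model per species (the formula is a placeholder).
    fit   = map(data, ~ lm(Sepal.Length ~ Sepal.Width, data = .x)),
    # Pull a single coefficient out of each fitted model.
    slope = map_dbl(fit, ~ coef(.x)[["Sepal.Width"]])
  ) %>%
  ungroup()
```

The result has one row per species, with the data, the fitted model and the slope side by side.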

How to use blank df and lists to speed up processing time in R

I know that creating a blank dataframe or list prior to populating it is a good thing to speed up processing time in R, but I'm having trouble executing it. Generally, all I would like to do is create a blank list of dataframes in which a map function fills in after completing some filtering. Below I'll recreate a simplified example to help explain what I'm trying to accomplish.
library(tidyverse)
library(purrr)
library(dplyr)
The code to create the lists of dataframes below is too much to show, but essentially I have a list of 192 dataframes that each contain the same type of information, but the data in each dataframe varies depending on which list.
"ListofDF1" is a list of 192 dataframes, each containing 468 rows and 27 columns of data. This list of dfs is created using a series of map functions.
Next, I have a pmap function that performs many tasks. Too many to show here, but below I'll generally show what I'm trying to accomplish.
return <-
  pmap(inputs, function("variables that are contained in inputs dataframe") {
    ListofDF2 <- map(ListofDF1, ~ filter(., "series of filters")) %>%
      map(., ~ data.frame(.) %>%
        select(column1))
  })
To summarize, inside the pmap function, a map function is performed on ListofDF1 (192 times because ListofDF1 contains 192 dataframes) to filter various metrics in the ListofDF1. The result is ListofDF2, which is a list of 192 dataframes. Note that each dataframe within the list of dfs contains only 1 column (due to select(column1)). But the number of rows in each dataframe is NOT consistent, as it depends on the filtering that occurs.
I would like to try to improve the speed of my overall pmap function because it is cycling through several thousand times and I believe that creating a blank ListofDF2 may help.
Therefore, does anyone have any suggestions on how to create the blank ListofDF2 list of dataframes and then populate it using the filtering map function? To clarify, my existing code works just fine. I am just trying to improve speed and therefore efficiency.
Additionally, I would also like to create a blank "return" list for the pmap function to populate. But one step at a time.
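For reference, a preallocated list in base R can be sketched as below. The mtcars split and the mpg > 20 filter are placeholders standing in for ListofDF1 and the real filters. Preallocation mostly helps explicit for loops that would otherwise grow a list one element at a time; map() already returns a list of known length, so it is an assumption worth benchmarking that this would speed up the pmap call at all.

```r
# Hypothetical stand-in for ListofDF1: a small list of data frames.
ListofDF1 <- split(mtcars, mtcars$cyl)

# Preallocate an empty list with one slot per input data frame.
ListofDF2 <- vector("list", length(ListofDF1))
names(ListofDF2) <- names(ListofDF1)

# Fill each slot; the filter is a placeholder for the real ones.
for (i in seq_along(ListofDF1)) {
  d <- ListofDF1[[i]]
  # drop = FALSE keeps the result a one-column data frame.
  ListofDF2[[i]] <- d[d$mpg > 20, "mpg", drop = FALSE]
}

# The row counts can differ per element, which a list handles without trouble.
sapply(ListofDF2, nrow)
```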

Unlisting a dataframe from a list of a list

I want to extract a dataframe from a list that is also inside a list. Also, some dataframes have a different number of columns than others. This is what I have used without success.
Name of the first list is comments.
df <- do.call(rbind.fill,comments)
When I try
df <- do.call(rbind.fill,comments[[1]])
it does work, but I would like all the dataframes to be together as one.
I know that this is not a reproducible example, but please bear with me, as this would take some time to reproduce, and I think the problem is clear enough.
Thanks
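One possible approach, sketched under the assumption that comments is a list whose elements are themselves lists of data frames: remove one level of nesting first, then bind. The toy comments object below is made up for illustration, and dplyr's bind_rows() is swapped in for rbind.fill here (do.call(rbind.fill, flat) from plyr should behave similarly).

```r
library(dplyr)

# Hypothetical stand-in for `comments`: a list of lists of data frames
# whose columns do not all match.
comments <- list(
  list(data.frame(id = 1:2, text = c("a", "b")),
       data.frame(id = 3L, text = "c", score = 10)),
  list(data.frame(id = 4L, score = 7))
)

# Remove one level of nesting so every element is a data frame.
flat <- unlist(comments, recursive = FALSE)

# bind_rows() fills columns that a data frame lacks with NA.
df <- bind_rows(flat)
```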

How to merge a set of lists into a single data frame

I am new to R and coding in general, so please bear with me.
I have a spreadsheet that has 7 sheets, 6 of these sheets are formatted in the same way and I am skipping the one that is not formatted the same way.
The code I have is thus:
lst <- lapply(2:7,
function(i) read_excel("CONFIDENTIAL Ratio 062018.xlsx", sheet = i)
)
This code was taken from this post: How to import multiple xlsx sheets in R
So far so good, the formula works and I have a large list with 6 sub lists that appears to represent all of my data.
It is at this point that I get stuck, being so new I do not understand lists yet, and really need the lists to be merged into one single data frame that looks and feels like the source data (so columns and rows).
I cannot work out how to get from a list to a single data frame. I've tried using rbind and other suggestions from here, but all seem to either fail or only partially work, and I end up with a data frame that looks like a list, etc.
If each sheet has the same number of columns (ncol) and the same column names (colnames), then this will work. It needs the dplyr package.
library(dplyr)
my_dataframe <- bind_rows(lst)
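As a small sketch of what this does, with a made-up two-element list standing in for the sheets read from the workbook (the column names ratio and region are placeholders): naming the list elements and passing .id also records which sheet each row came from.

```r
library(dplyr)

# Hypothetical stand-in for the list of sheets returned by lapply().
lst <- list(
  data.frame(ratio = c(0.5, 0.7), region = c("N", "S")),
  data.frame(ratio = 0.9, region = "E")
)
names(lst) <- c("sheet2", "sheet3")

# Stack the sheets; .id adds a column holding each row's list element name.
combined <- bind_rows(lst, .id = "sheet")
```

combined is then an ordinary data frame with three rows and the columns sheet, ratio and region.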

How can I extract a single element from a list into a data frame?

I compiled a list of ~60 data frames to keep my RStudio environment tidy.
I will need to occasionally extract a single element into a data frame so that I can work on it before putting it back into the list - how can this extract be achieved?
I am aware that I can manipulate the list element directly, but that isn't ideal and being able to extract the data frame would serve me better for my needs.
If dflist is your list of dataframes, then the easiest way to work on element n would be something like
df <- dflist[[n]]
#...work on df...then
dflist[[n]] <- df
