Using lapply instead of repeating code in R

I would like to know how to use lapply and/or for loops to have more concise code.
This is what I currently have and it works.
MLFreq <- MLlyrics %>%
unnest_tokens(word, line) %>%
anti_join(stop_words) %>%
ungroup() %>%
count(word)
MLpct <- sum(albumList2$MLlyrics$n) / sum(MLFreq$n)
ViewFreq <- ViewLyrics %>%
unnest_tokens(word, line) %>%
anti_join(stop_words) %>%
ungroup() %>%
count(word)
Viewpct <- sum(albumList2$ViewLyrics$n) / sum(ViewFreq$n)
#... repeating 6 times with different data frames
I've been trying
Freq <- lapply(albumList2, function(df){
df %>% unnest_tokens(word, line) %>%
anti_join(stop_words) %>%
ungroup()%>%
count(word) %>%
sum(albumList2$df$n) / sum(df$n)
})
and
for (i in 1:length(albumList2)) {
unnest_tokens(word, line) %>%
anti_join(stop_words) %>%
ungroup()%>%
count(word) %>%
print(sum(albumList2$i$n) / sum(i$n))
}
but the lapply version gives
Error in check_input(x) : Input must be a character vector of any length or
a list of character vectors, each of which has a length of 1.
and the for loop gives
no applicable method for 'unnest_tokens_' applied to an object of class
"function"
For reference, albumList2 is a list of data frames (MLlyrics, ViewLyrics, etc.).
I was originally going to leave it as is, but I just read something along the lines of "If you use the same code 3 times, loop over it."

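The problem with the lapply example is that the list you are looping over is a nested list rather than a single, flat list.
Also, references such as sum(albumList2$df$n) / sum(df$n) and print(sum(albumList2$i$n) / sum(i$n)) will not work: albumList2$df looks for an element literally named "df", not for the data frame stored in the variable df.
In the for loop, i is just a number ranging from 1 to length(albumList2), so 1$n or albumList2$1$n does not make sense; use albumList2[[i]] to get the i-th data frame. The pipe inside the loop also has no data frame at its start, so unnest_tokens() receives word as its first argument; that name evidently resolves to a function (stringr, for example, has one called word()), hence the error about class "function".
You should read about indexing in lists and nested lists here and here. Please also add some dummy data that everyone can test, so others can help you better.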
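Since the question has no dummy data, here is a minimal base-R sketch of the looping pattern, under stated assumptions: each data frame has a character column line (as in the question), the data and the stop-word vector my_stop_words are hypothetical, and strsplit()/table() stand in for tidytext's unnest_tokens()/anti_join()/count(). The point is that the function body refers only to its argument df, and the for loop indexes by name with [[ ]] rather than with $.

```r
# Hypothetical dummy data: a named list of data frames, each with a `line` column
albumList2 <- list(
  MLlyrics   = data.frame(line = c("the sun is up", "up we go"),
                          stringsAsFactors = FALSE),
  ViewLyrics = data.frame(line = c("a view from the top", "top of the world tonight"),
                          stringsAsFactors = FALSE)
)

# Hypothetical stop words standing in for tidytext::stop_words$word
my_stop_words <- c("the", "is", "a", "we", "of", "from")

# Count non-stop words in ONE data frame; `df` is the current list element
count_words <- function(df) {
  words <- unlist(strsplit(tolower(df$line), "\\s+"))
  words <- words[!words %in% my_stop_words]
  as.data.frame(table(word = words), stringsAsFactors = FALSE, responseName = "n")
}

# lapply() hands each data frame to count_words(); list names are preserved
freq_list <- lapply(albumList2, count_words)

# Per-album total, computed from each result's own `n` column
totals <- sapply(freq_list, function(freq) sum(freq$n))
print(totals)

# Equivalent for loop: iterate over names and index with [[ ]], not $
totals_loop <- numeric(length(albumList2))
names(totals_loop) <- names(albumList2)
for (nm in names(albumList2)) {
  totals_loop[[nm]] <- sum(count_words(albumList2[[nm]])$n)
}
```

Any per-album ratio like the question's MLpct can then be computed from freq_list and the corresponding element of albumList2 inside the same function.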

Related

R Shiny how to render a list without displaying c(...)

I'm trying to display a list of the top three users based on a user-selected variable (see below). I have created a function that filters my table based on the Agency selected in the dropdown and retrieves the top 3 users in a column. I then transformed the column into a string to render it in the app, but the results are displayed wrapped in c(...):
I'm okay with the format of the names separated by a comma, but I cannot find a way to eliminate the c(...).
This is the code for my function:
Top3UsersbyAgency <- function(filteredbyAgencyPool) {
filteredbyAgencyPool %>%
arrange(desc(MTD_Domestic)) %>%
group_by(userDisplayName) %>%
head(3) %>%
select(userDisplayName) %>%
na.exclude() %>%
na_if("") %>%
na.omit() %>%
toString()
}
And this is the result:
> Top3UsersbyAgency(filteredbyAgencyPool)
[1] "c(\"Payal Malhotra\", \"Swati Parmar\", \"Unassigned\")"
In the app, I simply used textOutput in the ui and renderText in the server function. I tried to also use renderTable to display the results in the column, but it honestly looks ugly with the title of the column in the middle, so I'd rather display the information just as a list of names in plain text. Any suggestion on how to clean this string up?
Try converting the data frame column to a vector with %>% .[[1]] before calling toString():
Top3UsersbyAgency <- function(filteredbyAgencyPool) {
filteredbyAgencyPool %>%
arrange(desc(MTD_Domestic)) %>%
group_by(userDisplayName) %>%
head(3) %>%
select(userDisplayName) %>%
na.exclude() %>%
na_if("") %>%
na.omit() %>%
.[[1]] %>%
toString()
}
As an illustration:
library(magrittr)
data.frame(c('a','b','c')) %>% toString()
#> [1] "c(\"a\", \"b\", \"c\")"
data.frame(c('a','b','c')) %>% .[[1]] %>% toString()
#> [1] "a, b, c"
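If you'd rather not rely on the .[[1]] pipe trick, here is a base-R sketch of the same idea; the data frame pool and the function top3_users() are hypothetical stand-ins for filteredbyAgencyPool and Top3UsersbyAgency(). The key step is pulling the column out as a character vector (df$userDisplayName) before calling toString(), which joins a vector's elements with ", ". Unlike the answer's code, this version drops empty and NA names before taking the top 3, so blanks don't reduce the count; that ordering is a design choice of this sketch, not something from the original.

```r
# Hypothetical stand-in for filteredbyAgencyPool
pool <- data.frame(
  userDisplayName = c("Swati Parmar", "", "Payal Malhotra", NA, "Unassigned"),
  MTD_Domestic    = c(50, 90, 80, 70, 10),
  stringsAsFactors = FALSE
)

top3_users <- function(df) {
  df <- df[order(df$MTD_Domestic, decreasing = TRUE), ]        # highest MTD_Domestic first
  names_vec <- df$userDisplayName                               # a character vector, not a data frame
  names_vec <- names_vec[!is.na(names_vec) & names_vec != ""]   # drop NA and empty names
  toString(head(names_vec, 3))                                  # same as paste(..., collapse = ", ")
}

top3_users(pool)
```

In the app, renderText() on a result like this shows plain comma-separated names with no c(...).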

RLang: accessing list items via chaining and map (purrr)

I'm trying R again after a few years and am much more used to Python dicts, Kotlin maps, or JS objects. I am simply trying to access the values of key-value pairs after using some chaining methods. Unfortunately, the usual accessors $ and [[ either don't return the expected values or throw errors.
Any idea how to simply get a list of the correct state names ("Alabama", "California", "Arizona") from my sample code? Thank you.
states_list <- list("AL"="Alabama", "AK"="Alaska", "AZ"="Arizona", "CA"="California") # (etc)
states_hash <- hash("AL"="Alabama", "AK"="Alaska", "AZ"="Arizona", "CA"="California") # (etc)
"AL-CA-AZ" %>% str_split("-") %>% map(~ states_list$.x) # NULL
"AL-CA-AZ" %>% str_split("-") %>% map(~ states_list[.x]) # k-v pairs, not just the values
"AL-CA-AZ" %>% str_split("-") %>% map(~ states_hash$.x) # NULL
"AL-CA-AZ" %>% str_split("-") %>% map(~ has.key(.x, states_hash)) # AL:TRUE CA:TRUE AZ:TRUE
"AL-CA-AZ" %>% str_split("-") %>% map(~ states_hash[.x]) # k-v pairs, not just the values
"AL-CA-AZ" %>% str_split("-") %>% map(~ states_list[[.x]]) # error - "recursive indexing failed at level 2"
"AL-CA-AZ" %>% str_split("-") %>% map(~ states_hash[[.x]]) # error - "wrong arguments for subsetting an environment"
"AL-CA-AZ" %>% str_split("-") %>% states_list[[.x]] # error - "object '.x' not found"
Your problem is really the str_split step. Note that str_split returns a list rather than a vector. (It does this because you can pass multiple strings to the function at once, and it keeps the result for each string separate in the list.) So when you map over that list, you are mapping over its single element (the whole vector c("AL", "CA", "AZ")), not over each of the three elements of that vector. That is also why states_list[[.x]] fails with "recursive indexing failed at level 2": given a length-3 vector, [[ treats it as a path into nested lists. A somewhat clumsy way to fix that is
"AL-CA-AZ" %>% {str_split(., "-")[[1]]} %>% map(~states_list[[.x]])
You can clean it up a bit with purrr::pluck():
"AL-CA-AZ" %>% str_split("-") %>% pluck(1) %>% map(~states_list %>% pluck(.x))
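Or just do direct indexing for the last step. Single brackets return a named list (the key-value pairs you saw), so unlist() it to get a plain character vector:
"AL-CA-AZ" %>% str_split("-") %>% pluck(1) %>% {unlist(states_list[.], use.names = FALSE)}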
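For readers coming from Python dicts, a base-R sketch without purrr or stringr may also help; it reuses states_list from the question. strsplit() has the same list-returning behavior as str_split(), hence the [[1]]. Looking up keys one at a time with [[ ]] via vapply(), or converting the list to a named character vector (R's closest everyday analogue of a dict of strings), both yield only the values.

```r
# Named list, as in the question
states_list <- list("AL" = "Alabama", "AK" = "Alaska", "AZ" = "Arizona", "CA" = "California")

# strsplit() returns a list (one element per input string), so take [[1]]
codes <- strsplit("AL-CA-AZ", "-", fixed = TRUE)[[1]]

# Option 1: [[ ]] per key; vapply() guarantees a character vector result
res1 <- vapply(codes, function(code) states_list[[code]], character(1), USE.NAMES = FALSE)

# Option 2: a named character vector; [ ] with several keys returns their values
states_vec <- unlist(states_list)
res2 <- unname(states_vec[codes])

print(res1)
```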

Loop over list in R, conduct analysis specific to element in list, save results in element dataframe?

I am trying to replicate an analysis using tidytext in R, except using a loop. The specific example comes from Julia Silge and David Robinson's Text Mining with R: A Tidy Approach. The context for it can be found here: https://www.tidytextmining.com/sentiment.html#sentiment-analysis-with-inner-join.
In the text, they give an example of how to do sentiment analysis using the NRC lexicon, which has eight different sentiments, including joy, anger, and anticipation. I'm not doing an analysis for a specific book like the example, so I commented out that line, and it still works:
nrc_list <- get_sentiments("nrc") %>%
filter(sentiment == "joy")
wordcount_joy <- wordcount %>%
# filter(book == "Emma") %>%
inner_join(nrc_list) %>%
count(word, sort = TRUE)
As I said, this works. I now want to modify it to loop over all eight emotions and save each result in a data frame labeled with the emotion. Here is how I tried to modify it:
emotion <- c('anger', 'disgust', 'joy', 'surprise', 'anticip', 'fear', 'sadness', 'trust')
for (i in emotion) {
nrc_list <- get_sentiments("nrc") %>%
filter(sentiment == "i")
wcount[[i]] <- wordcount %>%
inner_join(nrc_list) %>%
count(word, sort = TRUE)
}
I get an "Error: object 'wcount' not found" message when I do this. From googling, it seems the answer to this kind of question is to use wcount[[i]], but clearly something is off in how I adapted it. Do you have any suggestions?
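The code below will do the trick. Two things were wrong in your loop: wcount has to exist before you can assign wcount[[i]] into it, and the filter must compare sentiment to the variable i rather than to the string "i". Note also that your loop refers to wordcount, while the example uses tidy_books. The code follows the steps in the tidytextmining chapter you linked.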
library(janeaustenr)
library(dplyr)
library(stringr)
library(tidytext)
tidy_books <- austen_books() %>%
group_by(book) %>%
mutate(linenumber = row_number(),
chapter = cumsum(str_detect(text, regex("^chapter [\\divxlc]",
ignore_case = TRUE)))) %>%
ungroup() %>%
unnest_tokens(word, text)
# the NRC lexicon spells this sentiment 'anticipation'; 'anticip' would match nothing
emotion <- c('anger', 'disgust', 'joy', 'surprise', 'anticipation', 'fear', 'sadness', 'trust')
# initialize list with the length of the emotion vector
wcount <- vector("list", length(emotion))
# name the list entries
names(wcount) <- emotion
# run loop
for (i in emotion) {
nrc_list <- get_sentiments("nrc") %>%
filter(sentiment == i)
wcount[[i]] <- tidy_books %>%
inner_join(nrc_list) %>%
count(word, sort = TRUE)
}
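The same preallocate-then-fill pattern can be tried without downloading the NRC lexicon. This base-R sketch uses small hypothetical tokens and lexicon data frames in place of tidy_books and get_sentiments("nrc"), and merge() as the inner join instead of dplyr::inner_join(). It also shows that lapply() over a named vector builds the labeled list directly, with no preallocation needed.

```r
# Hypothetical toy data standing in for tidy_books and get_sentiments("nrc")
tokens  <- data.frame(word = c("happy", "angry", "happy", "storm", "gift"),
                      stringsAsFactors = FALSE)
lexicon <- data.frame(word      = c("happy", "gift", "angry", "storm", "storm"),
                      sentiment = c("joy",   "joy",  "anger", "anger", "fear"),
                      stringsAsFactors = FALSE)
emotion <- c("anger", "fear", "joy")

count_emotion <- function(emo) {
  lex  <- lexicon[lexicon$sentiment == emo, ]   # compare to the value of emo, not the string "emo"
  hits <- merge(tokens, lex, by = "word")       # inner join on word
  sort(table(hits$word), decreasing = TRUE)     # word counts, largest first
}

# Loop version: the list must exist before wcount[[i]] <- ... is used
wcount <- vector("list", length(emotion))
names(wcount) <- emotion
for (i in emotion) {
  wcount[[i]] <- count_emotion(i)
}

# lapply version: setNames() labels each result with its emotion
wcount2 <- lapply(setNames(emotion, emotion), count_emotion)
```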

How to resolve 'Don't know how to pluck from a closure' error in R

The code below works if I remove the Sys.sleep() from within the map() function. I tried to research the error ('Don't know how to pluck from a closure'), but I haven't found much on that topic.
Does anyone know where I can find documentation on this error, and any help on why it is happening and how to prevent it?
library(rvest)
library(tidyverse)
library(stringr)
# use only pages 0 to 18, to keep the run short
page <- (0:18)
# no need to create a list. Just a vector
urls = paste0("https://www.mlssoccer.com/players?page=", page)
# define this function that collects the player's name from a url
get_the_names = function( url){
url %>%
read_html() %>%
html_nodes("a.name_link") %>%
html_text()
}
# map the urls to the function that gets the names
players = map(urls, get_the_names) %>%
# turn into a single character vector
unlist() %>%
# make lower case
tolower() %>%
# replace spaces with hyphens
str_replace_all(" ", "-")
# Now create a vector of player urls
player_urls = paste0("https://www.mlssoccer.com/players/", players )
# define a function that reads the 3rd table of the url
get_the_summary_stats <- function(url){
url %>%
read_html() %>%
html_nodes("table") %>%
html_table() %>% .[[3]]
}
# read only the first 5 players to speed things up [otherwise it takes a significant amount of time to run...]
a_few_players <- player_urls[1:5]
# get the stats
tables = a_few_players %>%
# important step so I can name the rows I get in the table
set_names() %>%
#map the player urls to the function that reads the 3rd table
# note the `safely` wrapper around the get_the_summary_stats function,
# since some players have no stats, which causes an error (e.g. brenden-aaronson)
# the output will be a list of lists [result and error]
map(., ~{ Sys.sleep(5)
safely(get_the_summary_stats) }) %>%
# collect only the `result` output (the table) INTO A DATA FRAME
# There is also an `error` output
# also, name each row with the players name
map_df("result", .id = "player") %>%
#keep only the player name (remove the www.mls.... part)
mutate(player = str_replace(player, "https://www.mlssoccer.com/players/", "")) %>%
as_tibble()
tables <- tables %>% separate(Match,c("awayTeam","homeTeam"), extra= "drop", fill = "right")
purrr::safely(...) returns a function, so your map(., ~{ Sys.sleep(5); safely(get_the_summary_stats) }) returns a list of functions, not any data. map_df("result", ...) then tries to pluck the element "result" out of each of those functions, which is what the error is complaining about: in R, a "closure" is a function together with its enclosing environment.
Tilde notation is a tidyverse (purrr) shorthand for terser anonymous functions. In base R (e.g., with lapply) one would typically write lapply(mydata, function(x) get_the_summary_stats(x)). In tilde notation, the same thing is written as map(mydata, ~ get_the_summary_stats(.)).
So, rewrite it as:
... %>% map(~ { Sys.sleep(5); safely(get_the_summary_stats)(.); })
From comments by @r2evans
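To see the closure problem without scraping anything, here is a base-R sketch. safely_base() is a hypothetical, simplified stand-in for purrr::safely() built on tryCatch(), and get_stats() is a hypothetical stand-in for get_the_summary_stats(). Returning the wrapper from the anonymous function yields a list of functions; calling the wrapper on the current element yields result/error pairs, from which the results can be combined.

```r
# Hypothetical, simplified stand-in for purrr::safely(): it RETURNS A FUNCTION
safely_base <- function(f) {
  function(x) {
    tryCatch(list(result = f(x), error = NULL),
             error = function(e) list(result = NULL, error = e))
  }
}

# Hypothetical stand-in for the scraper; fails for some inputs
get_stats <- function(x) {
  if (x < 0) stop("no stats for this player")
  data.frame(player_value = x)
}

inputs <- c(a = 1, b = -1, c = 3)

# Bug pattern: the lambda returns the wrapper itself, so every element is a closure
# (Sys.sleep(0) here; use a real delay such as 5 seconds when scraping)
wrong <- lapply(inputs, function(x) { Sys.sleep(0); safely_base(get_stats) })
is.function(wrong[[1]])   # TRUE

# Fix: call the wrapper on the current element
right <- lapply(inputs, function(x) { Sys.sleep(0); safely_base(get_stats)(x) })

# Keep only the `result` parts; the failing input contributes NULL, which rbind() skips
results  <- lapply(right, `[[`, "result")
combined <- do.call(rbind, results)
```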

Getting the tidyr::nest() -> purrr::map() workflow to work for the special case of no grouping variable

I'm trying to write a function that does a split-apply-combine for which the split variable(s) are parameters, and - importantly - a null split is acceptable. For example, running statistics either on subsets of data or on the entire dataset.
somedata=expand.grid(a=1:3,b=1:3)
somefun=function(df_in,grpvars=NULL){
df_in %>% group_by_(.dots=grpvars) %>% nest() %>%
mutate(X2.Resid=map(data,~with(.x,chisq.test(b)$residuals))) %>%
unnest(data,X2.Resid) %>% return()
}
somefun(somedata,"a") # This works
somefun(somedata) # This fails
The null condition fails because nest() seems to need a variable to nest by, rather than nesting the entire df into a 1x1 data.frame. I can get around this as follows:
somefun2=function(df_in,grpvars="Dummy"){
df_in$Dummy=1
df_in %>% group_by_(.dots=grpvars) %>% nest() %>%
mutate(X2.Resid=map(data,~with(.x,chisq.test(b)$residuals))) %>%
unnest(data,X2.Resid) %>%
select(-Dummy) %>% return()
}
somefun2(somedata) # This works
However, I'm wondering whether there is a more elegant way to fix this, without needing the dummy variable?
Hmm, that behavior is a little surprising to me. The fix is easy, though: just make sure you nest everything():
somefun3 <- function(df_in, grpvars = NULL) {
df_in %>%
group_by_(.dots = grpvars) %>%
nest(everything()) %>%
mutate(X2.Resid = map(data, ~with(.x, chisq.test(b)$residuals))) %>%
unnest()
}
somefun3(somedata, "a")
somefun3(somedata)
Both work.
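For comparison, here is a base-R sketch of the same split-apply-combine that treats grpvars = NULL as a single group without a dummy column. It is an alternative to the tidyr/purrr approach above, not a fix to somefun3(); somefun_base() is a hypothetical name, while somedata and the chisq.test() call mirror the question.

```r
somedata <- expand.grid(a = 1:3, b = 1:3)

# grpvars = NULL means "treat all rows as one group"
somefun_base <- function(df_in, grpvars = NULL) {
  groups <- if (length(grpvars) == 0) {
    list(df_in)                                  # null split: one group, no dummy column
  } else {
    split(df_in, df_in[grpvars], drop = TRUE)    # one data frame per grouping combination
  }
  out <- lapply(groups, function(d) {
    # chisq.test() warns about small expected counts on this toy data; silenced here
    d$X2.Resid <- suppressWarnings(chisq.test(d$b)$residuals)
    d
  })
  out <- do.call(rbind, out)                     # combine the groups back into one data frame
  rownames(out) <- NULL
  out
}

r1 <- somefun_base(somedata, "a")
r0 <- somefun_base(somedata)
```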
