RLang: accessing list items via chaining and map (purrr)

I'm trying R again after a few years, and I'm much more used to Python dicts, Kotlin maps, or JS objects. I am simply trying to access the values of key-value pairs after using some chaining methods. Unfortunately the normal accessors $ and [[ either do not return the expected values or throw errors.
Any idea how to simply get a list of the correct state names ("Alabama", "California", "Arizona") from my sample code? Thank you.
states_list <- list("AL"="Alabama", "AK"="Alaska", "AZ"="Arizona", "CA"="California") # (etc)
states_hash <- hash("AL"="Alabama", "AK"="Alaska", "AZ"="Arizona", "CA"="California") # (etc)
"AL-CA-AZ" %>% str_split("-") %>% map(~ states_list$.x) # NULL
"AL-CA-AZ" %>% str_split("-") %>% map(~ states_list[.x]) # k-v pairs, not just the values
"AL-CA-AZ" %>% str_split("-") %>% map(~ states_hash$.x) # NULL
"AL-CA-AZ" %>% str_split("-") %>% map(~ has.key(.x, states_hash)) # AL:TRUE CA:TRUE AZ:TRUE
"AL-CA-AZ" %>% str_split("-") %>% map(~ states_hash[.x]) # k-v pairs, not just the values
"AL-CA-AZ" %>% str_split("-") %>% map(~ states_list[[.x]]) # error - "recursive indexing failed at level 2"
"AL-CA-AZ" %>% str_split("-") %>% map(~ states_hash[[.x]]) # error - "wrong arguments for subsetting an environment"
"AL-CA-AZ" %>% str_split("-") %>% states_list[[.x]] # error - "object '.x' not found"

Your problem is really the str_split step. Note that str_split will return a list, rather than a vector. (It does this because you can pass multiple strings to the function at once and it will keep all the results separated in the list.) So when you map over that list, you are just mapping over the single list, not each of three elements in the vector in the list. A somewhat clumsy way to change that is
"AL-CA-AZ" %>% {str_split(., "-")[[1]]} %>% map(~states_list[[.x]])
You can clean it up a bit with purrr::pluck
"AL-CA-AZ" %>% str_split("-") %>% pluck(1) %>% map(~states_list %>% pluck(.x))
Or just do direct indexing for the last step
"AL-CA-AZ" %>% str_split("-") %>% pluck(1) %>% {states_list[.]}
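To see why the earlier attempts failed, it can help to inspect what str_split returns before mapping over it. The following is a minimal sketch, assuming stringr and purrr are installed; the names pieces, codes, state_names and state_names_base are only illustrative and not from the question:

```r
library(stringr)
library(purrr)

states_list <- list("AL" = "Alabama", "AK" = "Alaska", "AZ" = "Arizona", "CA" = "California")

# str_split() returns a list with one element per input string
pieces <- "AL-CA-AZ" %>% str_split("-")
length(pieces)        # a single element holding the vector of codes
codes <- pieces[[1]]  # extract the character vector itself

# map_chr() visits each code and returns a character vector instead of a list
state_names <- map_chr(codes, ~ states_list[[.x]])

# base-R alternative: subset the list by name, then drop the keys
state_names_base <- unlist(states_list[codes], use.names = FALSE)
```

If a list is preferred over a character vector, map() can be used in place of map_chr().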

Related

R Shiny how to render a list without displaying c(...)

I'm trying to display a list with the top three users based on a user-selected variable (see below). I have created a function that filters my table based on the Agency selected via the dropdown and retrieves the top 3 users in a column. I then transformed the column into a string to render it in the app, but the results are displayed wrapped in c(...):
I'm okay with the format of the names separated by a comma, but I cannot find a way to eliminate the c(...).
This is the code for my function:
Top3UsersbyAgency <- function(filteredbyAgencyPool) {
filteredbyAgencyPool %>%
arrange(desc(MTD_Domestic)) %>%
group_by(userDisplayName) %>%
head(3) %>%
select(userDisplayName) %>%
na.exclude() %>%
na_if("") %>%
na.omit() %>%
toString()
}
And this is the result:
> Top3UsersbyAgency(filteredbyAgencyPool)
[1] "c(\"Payal Malhotra\", \"Swati Parmar\", \"Unassigned\")"
In the app, I simply used textOutput in the ui and renderText in the server function. I tried to also use renderTable to display the results in the column, but it honestly looks ugly with the title of the column in the middle, so I'd rather display the information just as a list of names in plain text. Any suggestion on how to clean this string up?
Try to transform the data.frame column into a vector with %>% .[[1]]:
Top3UsersbyAgency <- function(filteredbyAgencyPool) {
filteredbyAgencyPool %>%
arrange(desc(MTD_Domestic)) %>%
group_by(userDisplayName) %>%
head(3) %>%
select(userDisplayName) %>%
na.exclude() %>%
na_if("") %>%
na.omit() %>%
.[[1]] %>%
toString()
}
As an illustration :
library(magrittr)
data.frame(c('a','b','c')) %>% toString()
#> [1] "c(\"a\", \"b\", \"c\")"
data.frame(c('a','b','c')) %>% .[[1]] %>% toString()
#> [1] "a, b, c"

How to use list elements to query data from a database using dplyr?

I use dplyr (version 0.8.3) to query data. The query includes parameters that are defined in the beginning of the query, as shown in the first reprex below. On top of that, I want to collect the parameters in a list, as shown the second reprex. The first and second examples, which query data from a dataframe, work fine. However, when I want to query data from a database using parameters saved in a list, the SQL translation produces a non-working SQL that cannot be used to query data from a database.
Is there a way to incorporate list entries into a dplyr pipeline to query data from a database?
library(dplyr)
library(dbplyr)
library(RSQLite)
# 1. Data frame: value
a <- 1
mtcars %>% filter(vs == a) # Works
# 2. Data frame: list value
input <- list()
input$a <- 1
mtcars %>% filter(vs == input$a) # Works
# 3. Database: value and list value
con <- DBI::dbConnect(RSQLite::SQLite(), path = ":memory:")
copy_to(con, mtcars, "mtcars", temporary = FALSE)
db_mtcars <- tbl(con, "mtcars")
db_mtcars %>% filter(vs == a) %>% show_query() %>% collect() # Works
db_mtcars %>% filter(vs == input$a) %>% show_query() %>% collect() # Does not work
db_mtcars %>% filter(vs == input[1]) %>% show_query() %>% collect() # Does not work
db_mtcars %>% filter(vs == input[[1]]) %>% show_query() %>% collect() # Does not work
The background of my question is that I want to process and analyze data in a shiny app. I find it easier to develop the code for processing data outside the app and then include the code in the app afterwards. However, this task becomes increasingly difficult with a growing number of inputs. For development, my idea was to define a list named "input" such that I can copy and paste the code into the app. However, I stumble over the problem described above. Suggestions for an alternative development workflow are very welcome, too.
For dplyr>=1.0.0 you need to {{embrace}} values, see programming with dplyr :
db_mtcars %>% filter(vs == a) %>% show_query() %>% collect() # Works
db_mtcars %>% filter(vs == {{input$a}}) %>% show_query() %>% collect() # should work
db_mtcars %>% filter(vs == {{input[1]}}) %>% show_query() %>% collect() # should work
db_mtcars %>% filter(vs == {{input[[1]]}}) %>% show_query() %>% collect() # should work
for dplyr <1.0.0 you can use the bang bang !! operator : !!input$a.
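As a hedged illustration of the !! approach, here is a minimal sketch, assuming dplyr, dbplyr, DBI and RSQLite are installed. The names from_list and from_value are introduced only for this example. The idea is that the list entry is evaluated locally, so only its value reaches the SQL translation:

```r
library(dplyr)
library(dbplyr)

con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
copy_to(con, mtcars, "mtcars")
db_mtcars <- tbl(con, "mtcars")

input <- list(a = 1)

# !! evaluates input$a in R first, so the query only sees the value 1
from_list <- db_mtcars %>% filter(vs == !!input$a) %>% collect()

# equivalent workaround: copy the list entry into a plain variable
a <- input$a
from_value <- db_mtcars %>% filter(vs == a) %>% collect()

DBI::dbDisconnect(con)
```

Both queries should return the same rows; adding show_query() before collect() shows the generated SQL.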

How to resolve 'Don't know how to pluck from a closure' error in R

The code below works if I remove the Sys.sleep() from within the map() function. I tried to research the error ('Don't know how to pluck from a closure') but I haven't found much on that topic.
Does anyone know where I can find documentation on this error, and any help on why it is happening and how to prevent it?
library(rvest)
library(tidyverse)
library(stringr)
# lets assume 3 pages only to do it quickly
page <- (0:18)
# no need to create a list. Just a vector
urls = paste0("https://www.mlssoccer.com/players?page=", page)
# define this function that collects the player's name from a url
get_the_names = function( url){
url %>%
read_html() %>%
html_nodes("a.name_link") %>%
html_text()
}
# map the urls to the function that gets the names
players = map(urls, get_the_names) %>%
# turn into a single character vector
unlist() %>%
# make lower case
tolower() %>%
# replace the `space` to underscore
str_replace_all(" ", "-")
# Now create a vector of player urls
player_urls = paste0("https://www.mlssoccer.com/players/", players )
# define a function that reads the 3rd table of the url
get_the_summary_stats <- function(url){
url %>%
read_html() %>%
html_nodes("table") %>%
html_table() %>% .[[3]]
}
# lets read 3 players only to speed things up [otherwise it takes a significant amount of time to run...]
a_few_players <- player_urls[1:5]
# get the stats
tables = a_few_players %>%
# important step so I can name the rows I get in the table
set_names() %>%
#map the player urls to the function that reads the 3rd table
# note the `safely` wrap around the get_the_summary_stats' function
# since there are players with no stats and causes an error (eg.brenden-aaronson )
# the output will be a list of lists [result and error]
map(., ~{ Sys.sleep(5)
safely(get_the_summary_stats) }) %>%
# collect only the `result` output (the table) INTO A DATA FRAME
# There is also an `error` output
# also, name each row with the players name
map_df("result", .id = "player") %>%
#keep only the player name (remove the www.mls.... part)
mutate(player = str_replace(player, "https://www.mlssoccer.com/players/", "")) %>%
as_tibble()
tables <- tables %>% separate(Match,c("awayTeam","homeTeam"), extra= "drop", fill = "right")
purrr::safely(...) returns a function, so your map(., ~{ Sys.sleep(5); safely(get_the_summary_stats) }) is returning functions, not any data. In R, a "closure" is a function together with its enclosing environment, which is why the later map_df("result", ...) step complains that it cannot pluck from a closure.
Tilde notation is a tidyverse-specific, terser way of writing anonymous functions. Typically (e.g., with lapply) one would use lapply(mydata, function(x) get_the_summary_stats(x)). In tilde notation, the same thing is written as map(mydata, ~ get_the_summary_stats(.))
So, re-write to:
... %>% map(~ { Sys.sleep(5); safely(get_the_summary_stats)(.); })
From comments by @r2evans
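The difference between wrapping a function with safely() and actually calling the wrapped function can be seen without any web scraping. This is a minimal sketch, assuming purrr is installed; risky_sqrt is a hypothetical stand-in for get_the_summary_stats, and the pause is shortened for illustration:

```r
library(purrr)

# hypothetical stand-in for get_the_summary_stats(): fails on some inputs
risky_sqrt <- function(x) {
  if (x < 0) stop("negative input")
  sqrt(x)
}

# safely() returns a new function; nothing has been computed yet
safe_sqrt <- safely(risky_sqrt)
is.function(safe_sqrt)

# calling the wrapped function yields a list with `result` and `error`
results <- map(c(4, -1, 9), ~ {
  Sys.sleep(0.1)  # short pause between calls, as in the scraping loop
  safe_sqrt(.x)
})

# extract the results; the failed call contributes NULL
values <- map(results, "result")
```

Creating the wrapped function once, outside map(), also avoids rebuilding it on every iteration.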

Using lapply instead of repeating code

I would like to know how to use lapply and/or for loops to have more concise code.
This is what I currently have and it works.
MLFreq <- MLlyrics %>%
unnest_tokens(word, line) %>%
anti_join(stop_words) %>%
ungroup() %>%
count(word)
MLpct <- sum(albumList2$MLlyrics$n) / sum(MLFreq$n)
ViewFreq <- ViewLyrics %>%
unnest_tokens(word, line) %>%
anti_join(stop_words) %>%
ungroup() %>%
count(word)
Viewpct <- sum(albumList2$ViewLyrics$n) / sum(ViewFreq$n)
#... repeating 6 times with different data frames
I've been trying
Freq <- lapply(albumList2, function(df){
df %>% unnest_tokens(word, line) %>%
anti_join(stop_words) %>%
ungroup()%>%
count(word) %>%
sum(albumList2$df$n) / sum(df$n)
})
and
for (i in 1:length(albumList2)) {
unnest_tokens(word, line) %>%
anti_join(stop_words) %>%
ungroup()%>%
count(word) %>%
print(sum(albumList2$i$n) / sum(i$n))
}
but the lapply brings
Error in check_input(x) : Input must be a character vector of any length or
a list of character vectors, each of which has a length of 1.
and the for loop brings
no applicable method for 'unnest_tokens_' applied to an object of class
"function"
For reference albumList2 contains a list of data frames (MLlyrics, ViewLyrics, etc...)
I was originally going to leave it as is but just read something along the lines of "If you use the same code 3 times, loop over it"
The problem with the lapply example is that the list you are looping over is a nested list instead of a single list.
Also, references of the kind sum(albumList2$df$n) / sum(df$n) and print(sum(albumList2$i$n) / sum(i$n)) would not work.
i is just a number ranging from 1 to length(albumList2). Saying you want 1$n or albumList2$1$n does not make sense; use albumList2[[i]] to fetch the i-th element.
You should read about indexing in lists and nested lists. Please add some dummy data that everyone can test, so that we can help you better.
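The indexing points above can be sketched with made-up data. This is only an illustration in base R, under the assumption that each list element is a data frame with a word column; album_list, freq_list and totals are hypothetical names, and the tokenising and stop-word steps from the question are left out:

```r
# hypothetical dummy data: a named list of data frames with a `word` column
album_list <- list(
  MLlyrics   = data.frame(word = c("love", "love", "night"), stringsAsFactors = FALSE),
  ViewLyrics = data.frame(word = c("city", "night", "night", "rain"), stringsAsFactors = FALSE)
)

# inside lapply() the function receives each data frame itself,
# so refer to `df` directly rather than album_list$df
freq_list <- lapply(album_list, function(df) {
  as.data.frame(table(word = df$word), responseName = "n", stringsAsFactors = FALSE)
})

# in a for loop, `i` is only a position; use [[i]] to fetch the element
totals <- numeric(length(album_list))
for (i in seq_along(album_list)) {
  totals[i] <- nrow(album_list[[i]])
}
names(totals) <- names(album_list)
```

Because lapply() keeps the names of its input, freq_list$MLlyrics and freq_list$ViewLyrics can be used afterwards in the same way as the original data frames.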

Getting the tidyr::nest() -> purrr::map() workflow to work for special case of no grouping var

I'm trying to write a function that does a split-apply-combine for which the split variable(s) are parameters, and - importantly - a null split is acceptable. For example, running statistics either on subsets of data or on the entire dataset.
somedata=expand.grid(a=1:3,b=1:3)
somefun=function(df_in,grpvars=NULL){
df_in %>% group_by_(.dots=grpvars) %>% nest() %>%
mutate(X2.Resid=map(data,~with(.x,chisq.test(b)$residuals))) %>%
unnest(data,X2.Resid) %>% return()
}
somefun(somedata,"a") # This works
somefun(somedata) # This fails
The null condition fails because nest() seems to need a variable to nest by, rather than nesting the entire df into a 1x1 data.frame. I can get around this as follows:
somefun2=function(df_in,grpvars="Dummy"){
df_in$Dummy=1
df_in %>% group_by_(.dots=grpvars) %>% nest() %>%
mutate(X2.Resid=map(data,~with(.x,chisq.test(b)$residuals))) %>%
unnest(data,X2.Resid) %>%
select(-Dummy) %>% return()
}
somefun2(somedata) # This works
However, I'm wondering if there is a more elegant way to fix this, without needing the dummy variable?
Hmm, that behavior is a little surprising to me. A fix is easy though: you just have to make sure you nest everything():
somefun3 <- function(df_in, grpvars = NULL) {
df_in %>%
group_by_(.dots = grpvars) %>%
nest(everything()) %>%
mutate(X2.Resid = map(data, ~with(.x, chisq.test(b)$residuals))) %>%
unnest()
}
somefun3(somedata, "a")
somefun3(somedata)
Both work.
