Which environment should be used when calling eval() in a function?

I've got a set of functions that I'm trying to work with and I'm struggling to figure out why the assignment isn't working. Here are the functions I'm using:
new_timeline <- function() {
  timeline <- structure(list(), class = "timeline")
  timeline$title <- list("text" = list("headline" = NULL, "text" = NULL),
                         "start_date" = list("year" = NULL, "month" = NULL, "day" = NULL),
                         "end_date" = list("year" = NULL, "month" = NULL, "day" = NULL))
  return(timeline)
}
.add_date <- function(self, date, time_type) {
  valid_date <- stringr::str_detect(date, "^[0-9]{4}(-[0-9]{1,2}){0,2}$")
  if (!valid_date) {
    stringr::str_interp("Your ${time_type} date does not appear to be formatted correctly. It must be of the form 'yyyy-mm-dd'. Only the year is required.") %>% stop()
  }
  date_elements <- date %>% as.character() %>% stringr::str_split(" ") %>% unlist()
  date <- date_elements[1] %>% stringr::str_split("-") %>% unlist()
  stringr::str_interp("self$title$${time_type}_date$year <- date[1]") %>% parse(text = .) %>% eval()
  if (!is.na(date[2])) stringr::str_interp("self$title$${time_type}_date$month <- date[2]") %>% parse(text = .) %>% eval()
  if (!is.na(date[3])) stringr::str_interp("self$title$${time_type}_date$day <- date[3]") %>% parse(text = .) %>% eval()
  return(self)
}
edit_title <- function(self, headline = NULL, text = NULL, start_date = NULL, end_date = NULL) {
  if (class(self) != "timeline") stop("The object passed must be a timeline object.")
  if (is.null(headline) && is.null(self$title$text$headline)) stop("Headline cannot be empty when adding a new title.")
  if (!is.null(headline)) self$title$text$headline <- headline
  if (!is.null(text)) self$title$text$text <- text
  if (!is.null(start_date)) self <- .add_date(self, date = start_date, time_type = "start")
  if (!is.null(end_date)) self <- .add_date(self, date = end_date, time_type = "end")
  return(self)
}
EDIT: The above code has been severely reduced per a request in the comments. The code is still sufficient to reproduce the error.
I know that's a bit long-winded, so I apologize. The first function establishes a new timeline object. The third function allows us to change the title of the timeline object and the second function is a helper function that handles dates. The code would be used like this:
library(magrittr)
#devtools::install_github("hadley/stringr")
library(stringr)
tl <- new_timeline()
tl <- tl %>% edit_title(headline = "My Timeline", text = "Example", start_date = "2015-10-18")
The code runs with no errors, but when I call tl$title$start_date$year, it comes back as NULL. Using an answer from a previous question I asked, I tried setting envir = globalenv() within eval(). When I do that, the function returns an error saying that object self cannot be found.
So I'm under the impression that self is held in the parent.frame(), and I tried passing both as a list: envir = list(globalenv(), parent.frame()). This causes the function to run without error, but there's still no assignment.
Where am I going wrong? Thanks in advance!

As mentioned in the comments, I think you could probably do away with all of the code parsing and just use [[ with variable names for your assignments (see the sketch after the code below). Anyway, when you use the pipe operator a lot of function wrapping happens, so determining how many frames to go back is painful. Here are a couple of solutions modifying the .add_date function.
You already found one, using <<-, since it searches back through the parent environments until it finds the variable (or doesn't, and assigns it in the global environment).
Another would be storing the function's environment() and passing that to eval.
A third would be counting how many frames deep you go and using sys.frame to tell eval which environment to look in.
.add_date <- function(self, date, time_type) {
  valid_date <- stringr::str_detect(date, "^[0-9]{4}(-[0-9]{1,2}){0,2}$")
  if (!valid_date) {
    stringr::str_interp("Your ${time_type} date does not appear to be formatted correctly. It must be of the form 'yyyy-mm-dd'. Only the year is required.") %>% stop()
  }
  ## Examining environments
  e <- environment()      # current environment
  efirst <- sys.nframe()  # current frame number
  print(paste("Currently in frame", efirst))
  envs <- stringr::str_interp("${date}") %>% parse(text = .) %>% {.; sys.frames()}   # list of frames
  elast <- stringr::str_interp("${date}") %>% parse(text = .) %>% {.; sys.nframe()}  # number of the last frame
  print(paste("Went", elast, "frames deep."))
  ## Go back this many frames in eval
  goback <- efirst - elast
  date_elements <- date %>% as.character() %>% stringr::str_split(" ") %>% unlist()
  date <- date_elements[1] %>% stringr::str_split("-") %>% unlist()
  ## Solution 1: use sys.frame
  stringr::str_interp("self$title$${time_type}_date$year <- date[1]") %>%
    parse(text = .) %>% eval(envir = sys.frame(goback))
  ## Solution 2: use the environment saved in the function
  if (!is.na(date[2])) stringr::str_interp("self$title$${time_type}_date$month <- date[2]") %>%
    parse(text = .) %>% eval(envir = e)
  return(self)
}
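For completeness, here is a minimal sketch of the [[ approach mentioned at the top of this answer. It avoids parse()/eval() entirely, so the environment question never comes up; the date validation and the split on spaces are omitted for brevity:
.add_date <- function(self, date, time_type) {
  date <- date %>% as.character() %>% stringr::str_split("-") %>% unlist()
  slot <- paste0(time_type, "_date")   # "start_date" or "end_date"
  parts <- c("year", "month", "day")
  for (i in seq_along(date)) {
    self$title[[slot]][[parts[i]]] <- date[i]  # ordinary assignment inside the function
  }
  self
}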

Related

A for() loop to overwrite existing data.frames

I'm testing some operations on the classic '99 Czech bank data set, trying to run a tidyverse task over several data.frames in my global environment, but the loop I've created keeps overwriting the same object val, which was supposed to stand in for the data frames themselves:
x <- c("loans93","loans94","loans95","loans96",
"loans97","loans98")
x <- base::mget(x, envir=as.environment(-1), mode= "any", inherits=F)
for (val in x) {
val <- val %>%
select(account_id, district_id, balance, status, date) %>%
group_by(account_id, district_id, status, date) %>%
summarise(balance=mean(balance, na.rm=T)) %>%
ungroup()
}
What am I doing wrong? I've searched for similar questions, but people keep answering with lapply solutions; I just need the result to be saved back to my data frames instead of to this "val" object I keep getting.
you could try this:
df_names <- c("loans93", "loans94", "loans95",
              "loans96", "loans97", "loans98")
for (df_name in df_names) {
  get(df_name) %>%
    head %>%  ## replace with desired manipulations
    assign(value = .,
           x = paste0(df_name, '_manipulated'),  ## or just: df_name to overwrite original
           envir = globalenv())
}
Aside: list2env() is handy for "spawning" list members (e.g. data frames) into an environment. Example:
list_of_dataframes <- list(
  iris_short = head(iris),
  cars_short = head(cars)
)
list2env(x = list_of_dataframes, envir = globalenv())
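Putting the two ideas together, a sketch of the whole round trip might look like this (assuming the loansXX data frames exist and dplyr is loaded). mget() keeps the names, so list2env() writes the results back over the originals:
df_names <- c("loans93", "loans94", "loans95", "loans96", "loans97", "loans98")

manipulated <- mget(df_names, envir = globalenv()) %>%
  lapply(function(df) {
    df %>%
      select(account_id, district_id, balance, status, date) %>%
      group_by(account_id, district_id, status, date) %>%
      summarise(balance = mean(balance, na.rm = TRUE)) %>%
      ungroup()
  })

list2env(manipulated, envir = globalenv())  # overwrites the original data frames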

Problems generating tree diagram with hctreemap2

library(highcharter)
library(dplyr)
library(viridisLite)
library(forecast)
library(treemap)
data("Groceries", package = "arules")
dfitems <- tbl_df(Groceries@itemInfo)
set.seed(10)
dfitemsg <- dfitems %>%
  mutate(category = gsub(" ", "-", level1),
         subcategory = gsub(" ", "-", level2)) %>%
  group_by(category, subcategory) %>%
  summarise(sales = n() ^ 3) %>%
  ungroup() %>%
  sample_n(31)
dfitemsg %>%
  hctreemap2(group_vars = c("category", "subcategory"),
             size_var = "sales") %>%
  hc_tooltip(pointFormat = "<b>{point.name}</b>:<br>
             Pop: {point.value:,.0f}<br>
             GNI: {point.colorValue:,.0f}")
The error is the following:
Error in hctreemap2(., group_vars = c("category", "subcategory"), size_var = "sales") :
  Treemap data uses same label at multiple levels.
I have tried everything and it doesn't work. Could someone with experience explain to me what is happening?
When I tried your code, it also stated that the function was deprecated and to use data_to_hierarchical instead. Though it's never quite that simple, right? I tried multiple ways to get hctreemap2 to work, but wasn't able to discern the issue. From there I turned to the recommended data_to_hierarchical. That worked without an issue, once I figured out the right type, which in hindsight seems kind of obvious.
That being said, this is what I've got:
data_to_hierarchical(data = dfitemsg,
                     group_vars = c(category, subcategory),
                     size_var = sales) %>%
  hchart(type = "treemap") %>%
  hc_tooltip(pointFormat = "<b>{point.name}</b>:<br>
             Pop: {point.value:,.0f}<br>
             GNI: {point.colorValue:,.0f}")
You didn't actually designate a color, so the GNI comes up blank.
Let me know if you run into any issues.
Based on your comment:
I have not found a way to change the color to density, which is what both hctreemap2 and treemap appear to do. The function data_to_hierarchical codes the colors to the first grouping variable, i.e. the level 1 variable.
Inadvertently, I did figure out why the function hctreemap2 would not work. It checks whether any category label is the same as a subcategory label. I didn't go through all of the data, but I know there is a category perfumery with a subcategory perfumery. I don't understand why that's a hard stop. And if that is a problem for this call, why wouldn't data_to_hierarchical be checking for the same issue?
So, I changed the function. First, I assigned the function itself to a new name.
x <- hctreemap2
Then I selected it from the environment pane. Alternatively, you can run View(x).
That view is read-only, but it's easier to read than the console. I copied the function and reassigned it to its original name with my changes. I removed two pieces of the code, which changes nothing about how the chart is structurally created.
I removed the first line of code in the function:
.Deprecated("data_to_hierarchical")
and this code (about a third of the way down):
if (data %>% select(!!!group_syms) %>% map(unique) %>% unlist() %>%
    anyDuplicated()) {
  stop("Treemap data uses same label at multiple levels.")
}
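If you want to see which labels actually trip that check in your own data before deciding to patch the function, the same test can be run directly on the summarised data. A minimal sketch, assuming dfitemsg from the question and purrr loaded:
labels <- dfitemsg %>%
  select(category, subcategory) %>%
  purrr::map(unique) %>%
  unlist()
labels[duplicated(labels)]  # any label listed here appears at more than one level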
This left me to recreate the function with this code:
hctreemap2 <- function(data, group_vars, size_var, color_var = NULL, ...) {
  assertthat::assert_that(is.data.frame(data))
  assertthat::assert_that(is.character(group_vars))
  assertthat::assert_that(is.character(size_var))
  if (!is.null(color_var))
    assertthat::assert_that(is.character(color_var))

  group_syms <- rlang::syms(group_vars)
  size_sym <- rlang::sym(size_var)
  color_sym <- rlang::sym(ifelse(is.null(color_var), size_var, color_var))

  data <- data %>% mutate_at(group_vars, as.character)

  name_cell <- function(..., depth) paste0(list(...),
                                           seq_len(depth),
                                           collapse = "")

  data_at_depth <- function(depth) {
    data %>%
      group_by(!!!group_syms) %>%
      summarise(value = sum(!!size_sym), colorValue = sum(!!color_sym)) %>%
      ungroup() %>%
      mutate(name = !!group_syms[[depth]], level = depth) %>%
      mutate_at(group_vars, as.character) %>% {
        if (depth == 1) {
          mutate(., id = paste0(name, 1))
        }
        else {
          mutate(.,
                 parent = pmap_chr(list(!!!group_syms[seq_len(depth) - 1]),
                                   name_cell, depth = depth - 1),
                 id = paste0(parent, name, depth))
        }
      }
  }

  treemap_df <- seq_along(group_vars) %>% map(data_at_depth) %>% bind_rows()

  data_list <- treemap_df %>%
    highcharter::list_parse() %>%
    purrr::map(~ .[!is.na(.)])

  colorVals <- treemap_df %>%
    filter(level == length(group_vars)) %>%
    pull(colorValue)

  highchart() %>%
    hc_add_series(data = data_list, type = "treemap",
                  allowDrillToNode = TRUE, ...) %>%
    hc_colorAxis(min = min(colorVals), max = max(colorVals), enabled = TRUE)
}
Now your code, as originally written, will work. You have not changed the highcharter package by doing this, so if you think you'll use this in the future, save the function code as well. You will need the purrr library; since you already load dplyr (where most conflicts, if any, occur), you could just load tidyverse, which attaches several packages at once, including both dplyr and purrr.
This is what it will look like with set.seed(10):
If you drill down on the largest block:
It looks odd to me, but I'm guessing that's what you were looking for to begin with.
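If you also want the colorAxis (the GNI line in the tooltip) to show something, the patched function accepts a color_var argument. Reusing sales as the color variable here is only an illustration; any numeric column would do:
dfitemsg %>%
  hctreemap2(group_vars = c("category", "subcategory"),
             size_var = "sales",
             color_var = "sales") %>%   # illustrative choice of color variable
  hc_tooltip(pointFormat = "<b>{point.name}</b>:<br>
             Pop: {point.value:,.0f}<br>
             GNI: {point.colorValue:,.0f}")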

Object not found in function environment for nested objects

I have a code snippet which I am trying to convert into a function. This function is supposed to look for potential spelling errors in a manual-entry field. The snippet works and you can try it out like this, using the starwars data from the tidyverse package:
require(tidyverse)
require(rlang)          # loaded for {{ to force function arguments as well as the with_env() function
require(RecordLinkage)  # loaded for the jarowinkler() function
starwars_cleaning <- starwars %>%
  add_count(name, name = "Freq_name") %>%  # this keeps track of which spelling is more frequent
  distinct(name, .keep_all = T) %>%        # this prevents duplicated comparisons and self-comparisons
  nest_by(homeworld, .key = ".Nest") %>%
  mutate(Mapped = list(imap_dfr(.x = .Nest$name,
                                .f = ~jarowinkler(str1 = .x,
                                                  str2 = .Nest$name[-.y]) %>%
                                  list() %>%
                                  tibble(Score_n = ., Match_n = list(.Nest$name[-.y]),
                                         Freq_n = list(.Nest$Freq_name[-.y]))
                                )))
The function should accept, as arguments, the variable(s) to nest on (the ellipses) and the variable in which to look for potentially misspelled matches. Right now, it looks like this:
string_matching <- function(.df, .string_col, ...) {
  .df$.tmp_string <- .df %>% select({{.string_col}})
  .df <- .df %>%
    add_count(.tmp_string, name = "Freq_name") %>%
    distinct(.tmp_string, .keep_all = T) %>%
    nest_by(..., .key = ".Nest") %>%
    mutate(Mapped_n = list(with_env(env = current_env(),  # same error with or without specifying the execution environment for imap
                                    expr = imap_dfr(.x = .Nest$.tmp_string,
                                                    .f = ~jarowinkler(str1 = .x,
                                                                      str2 = .Nest$.tmp_string[-.y]) %>%
                                                      list() %>%
                                                      tibble(Score_n = ., Match_n = list(.Nest$.tmp_string[-.y]),
                                                             Freq_n = list(.Nest$Freq_name[-.y]))
                                                    )
                                    ))
    )
  return(.df)
}
starwars %>%
string_matching(name, homeworld)
On the starwars data, it isn't very useful, clearly. And I cut down some of the features of this code to get a MWE--but that's the idea. When I wrap the code up like this in a function, it returns invalid argument to unary operator (apparently caused by the [-.y]). I tried the force() command after reading this post since this problem apparently comes up a lot. Because of the current error and that post, I thought the problem might have to do with the function environment causing imap_dfr() to lose track of the data somehow. I tried to wrap the call to map in with_env() and instruct it to use the function environment rather than its own. I also tried to break up the function by assigning an intermediate object to the global environment so that it could be found in the mapping step of the function:
assign(x = "TEMP", value = .df$.Nest, envir = global_env())
That landed me with the same 'unary operator' error. I'm not sure what to try next; I seem to be going in circles. Any insights into what is causing this problem and how to fix it would be greatly appreciated.
I don't think the post you pointed to is really related here, and your problem isn't about execution environments. The problem is how you've handled passing the variable to your function. When you create .tmp_string, you are calling select(), which returns a tibble rather than a vector of column values. Instead, use pull() to extract those values.
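To see the difference concretely, a two-line illustration with the same starwars data:
starwars %>% select(name) %>% class()  # "tbl_df" "tbl" "data.frame"  (still a tibble)
starwars %>% pull(name) %>% class()    # "character"  (a plain vector)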
string_matching <- function(.df, .string_col, ...) {
  .df$.tmp_string <- .df %>% pull({{.string_col}})
  .df <- .df %>%
    add_count(.tmp_string, name = "Freq_name") %>%
    distinct(.tmp_string, .keep_all = T) %>%
    nest_by(..., .key = ".Nest") %>%
    mutate(Mapped_n = list(with_env(env = current_env(),  # same error with or without specifying the execution environment for imap
                                    expr = imap_dfr(.x = .Nest$.tmp_string,
                                                    .f = ~jarowinkler(str1 = .x,
                                                                      str2 = .Nest$.tmp_string[-.y]) %>%
                                                      list() %>%
                                                      tibble(Score_n = ., Match_n = list(.Nest$.tmp_string[-.y]),
                                                             Freq_n = list(.Nest$Freq_name[-.y]))
                                                    )
                                    ))
    )
  return(.df)
}
Or you could write your code to avoid the need for that temp column completely:
string_matching <- function(.df, .string_col, ...) {
  col <- rlang::ensym(.string_col)
  .df <- .df %>%
    add_count(!!col, name = "Freq_name") %>%
    distinct(!!col, .keep_all = T) %>%
    nest_by(..., .key = ".Nest") %>%
    mutate(Mapped_n = list(imap_dfr(.x = .Nest %>% pull(!!col),
                                    .f = ~jarowinkler(str1 = .x,
                                                      str2 = (.Nest %>% pull(!!col))[-.y]) %>%
                                      list() %>%
                                      tibble(Score_n = ., Match_n = list((.Nest %>% pull(!!col))[-.y]),
                                             Freq_n = list(.Nest$Freq_name[-.y]))
                                    ))
    )
  return(.df)
}

Threading the needle: finding the name of the actual argument corresponding to a formal of an outer function

The function strip() below tries to produce a brief report on the result of its operation via the tee pipe (%T>%). Because this function is in turn being handed to a wrapper function and then to purrr::pwalk, which will supply it with a bunch of dataframes one by one, I want to get a report of its operation on each dataframe along with the dataframe name; which is to say, the name of the actual dataframe that is supplied to correspond to the formal argument tib in the function below. In the example supplied, this would be "tst_df". I don't know the names in advance of running the function, as they are constructed from the filenames read from disk and various other inputs.
Somewhat to my surprise, I actually have almost all of this working, except for getting the name of the supplied dataframe. In the example below, the code that is supposed to do this is enexpr(XX), but I have also tried expr(XX), and both of these expressions applied to tib or the dot (.), with or without a preceding !!. Also deparse(substitute()) on XX, tib, and ., but without the bang bangs.
I see that the name is stripped initially by pass-by-value, and then again, maybe, by each stage of the pipe, including the tee, and again, maybe, by (XX = .) in the anonymous function after the tee. But I know R + tidyverse will have a way. I just hope it does not involve providing an integer to count backwards up the call stack.
tst_df <- tibble(A = 1:10, B = 11:20, C=21:30, D = 31:40)
tst_df
################################################################################
# The strip function expects a non-anonymous dataframe, from which it removes
# the rows specified in remove_rows and the columns specified in remove_cols. It
# also prints a brief report; just the df name, length and width.
strip <- function(tib, remove_rows = FALSE, remove_cols = NULL) {
  remove_rows <- enquo(remove_rows)
  remove_cols <- enquo(remove_cols)
  out <- tib %>%
    filter(! (!! remove_rows)) %>%
    select(- !! remove_cols) %T>% (function(XX = .) {
      function(XX = .) print(
        paste0("length of ", enexpr(XX), " = ", nrow(XX), " Width = ", ncol(XX)))
      cat("\n")
    })
  out
}
out_tb <- strip(tib = tst_df, remove_rows = (A < 3 | D > 38), remove_cols = c(C, D))
out_tb
Just save the name of tib at the beginning of your function; it will then be found by your reporter function:
strip <- function(tib, remove_rows = FALSE, remove_cols = NULL) {
  remove_rows <- enquo(remove_rows)
  remove_cols <- enquo(remove_cols)
  tib_name <- as.character(substitute(tib))
  report <- function(out) {
    cat("output length of", tib_name, "=", nrow(out), ", width =", ncol(out), "\n")
  }
  tib %>%
    filter(! (!! remove_rows)) %>%
    select(- !! remove_cols) %T>%
    report
}
out_tb <- strip(tib = tst_df, remove_rows = (A < 3 | D > 38), remove_cols = c(C, D))
output length of tst_df = 6 , width = 2
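The mechanism in isolation, as a minimal illustration: substitute() captures the unevaluated argument, and as.character() (or deparse()) turns it into a label you can use later.
f <- function(tib) as.character(substitute(tib))
f(tst_df)  # "tst_df"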

How to pass multiple values in a rvest submission form

This is a follow-up to a prior thread. The code works great for a single value, but when I try to pass more than one value I get the following error about the length of the values:
Error in vapply(elements, encode, character(1)) :
  values must be length 1,
 but FUN(X[[1]]) result is length 3
Here is a sample of the code. In most instances I have been able just to name an object and scrape that way.
library(httr)
library(rvest)
library(dplyr)
b<-c('48127','48180','49504')
POST(
  url = "http://www.nearestoutlet.com/cgi-bin/smi/findsmi.pl",
  body = list(zipcode = b),
  encode = "form"
) -> res
I was wondering if a loop to insert the values into the form would be the right way to go. However, my loop-writing skills are still developing and I am unsure where to place it; in addition, when I call the loop it doesn't print line by line, it just returns null results.
# d isn't listed in the above code as it returns null
d <- for (i in 1:3) { nrow(b) }
Here is an approach to send multiple POST requests
library(httr)
library(rvest)
b <- c('48127','48180','49504')
For each element in b, apply a function that sends the appropriate POST request:
res <- lapply(b, function(x){
  res <- POST(
    url = "http://www.nearestoutlet.com/cgi-bin/smi/findsmi.pl",
    body = list(zipcode = x),
    encode = "form"
  )
  res <- read_html(content(res, as = "raw"))
})
Now for each element of the list res you should do the parsing steps explained by hrbrmstr: How can I Scrape a CGI-Bin with rvest and R?
library(tidyverse)
I will use hrbrmstr's code since he is king and it is already familiar to you. The only thing we are doing here is applying it to each element of the res list.
res_list <- lapply(res, function(x){
  rows <- html_nodes(x, "table[width='300'] > tr > td")
  ret <- data_frame(
    record = !is.na(html_attr(rows, "bgcolor")),
    text = html_text(rows, trim = TRUE)
  ) %>%
    mutate(record = cumsum(record)) %>%
    filter(text != "") %>%
    group_by(record) %>%
    summarise(x = paste0(text, collapse = "|")) %>%
    separate(x, c("store", "address1", "city_state_zip", "phone_and_or_distance"),
             sep = "\\|", extra = "merge")
  return(ret)
})
or using map from purrr
res %>%
  map(function(x){
    rows <- html_nodes(x, "table[width='300'] > tr > td")
    data_frame(
      record = !is.na(html_attr(rows, "bgcolor")),
      text = html_text(rows, trim = TRUE)
    ) %>%
      mutate(record = cumsum(record)) %>%
      filter(text != "") %>%
      group_by(record) %>%
      summarise(x = paste0(text, collapse = "|")) %>%
      separate(x, c("store", "address1", "city_state_zip", "phone_and_or_distance"),
               sep = "\\|", extra = "merge") -> ret
    return(ret)
  })
If you would like this in a data frame:
res_df <- data.frame(do.call(rbind, res_list),               # rbinds the list elements into one data frame
                     b = rep(b, times = sapply(res_list, nrow)))  # labels each row with the element of b it came from
You can put the values inside the POST call as below:
b <- c('48127', '48180', '49504')
for (i in 1:length(b)) {
  POST(
    url = "http://www.nearestoutlet.com/cgi-bin/smi/findsmi.pl",
    body = list(zipcode = b[i]),
    encode = "form"
  ) -> res
  # YOUR CODE HERE (for getting the content of the page, etc.)
}
But since res will be different for every zipcode value, you need to put the rest of the code inside the area I commented. Otherwise you only get the last value.
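For example, the responses could be collected into a list as the loop runs; a sketch reusing the read_html(content(...)) step from the other answer, with the parsing done inside the loop or afterwards over the finished list:
results <- vector("list", length(b))
names(results) <- b
for (i in seq_along(b)) {
  res <- POST(
    url = "http://www.nearestoutlet.com/cgi-bin/smi/findsmi.pl",
    body = list(zipcode = b[i]),
    encode = "form"
  )
  results[[i]] <- read_html(content(res, as = "raw"))
  # ...parse results[[i]] here, or loop over the finished list later...
}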
