R - dbplyr - `lang_name()` is deprecated

I just updated dbplyr, and since then I have started seeing these warnings:
Warning messages:
1: lang_name() is deprecated as of rlang 0.2.0. Please use call_name() instead. This warning is displayed once per session.
2: lang() is deprecated as of rlang 0.2.0. Please use call2() instead. This warning is displayed once per session.
I have no clue what I should do, since my code looks like this:
df <- tbl(conn, in_schema("schema", "table")) %>%
  filter(status != "CLOSED" | is.na(status)) %>%
  group_by(customer_id) %>%
  filter(created == min(created, na.rm = TRUE)) %>%
  ungroup() %>%
  select(
    contract_number,
    customer_id,
    approved_date = created
  ) %>%
  collect()
There is no call_name() or lang_name() in my code. Does anyone know what's wrong? I know that my code works even with these warnings, but I don't want to see them.

As you already mentioned, there is nothing wrong and your code works fine; this is only a warning. The window-function translation in dbplyr still calls the deprecated lang_name() function, and that translation is triggered by your filter(created == min(...)) statement. There is already an open issue for this on GitHub: link.
If you do not want to see the warning you can suppress it like this:
suppressWarnings(
  df <- tbl(conn, in_schema("schema", "table")) %>%
    filter(status != "CLOSED" | is.na(status)) %>%
    group_by(customer_id) %>%
    filter(created == min(created, na.rm = TRUE)) %>%
    ungroup() %>%
    select(
      contract_number,
      customer_id,
      approved_date = created
    ) %>%
    collect()
)
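A more targeted alternative is to muffle only warnings whose message mentions deprecation and let every other warning through. Here is a self-contained sketch of the pattern (the toy warning stands in for the dbplyr one; adapt the grepl() pattern as needed):

```r
# Muffle only deprecation warnings; any other warning still surfaces.
result <- withCallingHandlers(
  {
    warning("lang_name() is deprecated")  # stands in for the dbplyr warning
    42
  },
  warning = function(w) {
    if (grepl("deprecated", conditionMessage(w))) {
      invokeRestart("muffleWarning")
    }
  }
)
result  # 42, and no deprecation warning is printed
```

Wrapping your whole pipeline in withCallingHandlers() this way avoids hiding unrelated warnings the query might raise.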

Related

How to use list elements to query data from a database using dplyr?

I use dplyr (version 0.8.3) to query data. The query includes parameters that are defined at the beginning of the query, as shown in the first reprex below. On top of that, I want to collect the parameters in a list, as shown in the second reprex. The first and second examples, which query data from a data frame, work fine. However, when I want to query data from a database using parameters saved in a list, the SQL translation produces non-working SQL that cannot be used to query the database.
Is there a way to incorporate list entries into a dplyr pipeline to query data from a database?
library(dplyr)
library(dbplyr)
library(RSQLite)

# 1. Data frame: value
a <- 1
mtcars %>% filter(vs == a) # Works

# 2. Data frame: list value
input <- list()
input$a <- 1
mtcars %>% filter(vs == input$a) # Works

# 3. Database: value and list value
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
copy_to(con, mtcars, "mtcars", temporary = FALSE)
db_mtcars <- tbl(con, "mtcars")
db_mtcars %>% filter(vs == a) %>% show_query() %>% collect()          # Works
db_mtcars %>% filter(vs == input$a) %>% show_query() %>% collect()    # Does not work
db_mtcars %>% filter(vs == input[1]) %>% show_query() %>% collect()   # Does not work
db_mtcars %>% filter(vs == input[[1]]) %>% show_query() %>% collect() # Does not work
The background of my question is that I want to process and analyze data in a Shiny app. I find it easier to develop the code for processing data outside the app and then include it in the app afterwards. However, this task becomes increasingly difficult with a growing number of inputs. For development, my idea was to define a list named "input" so that I can copy and paste the code into the app. However, I stumble over the problem described above. Suggestions for an alternative development workflow are very welcome, too.
For dplyr >= 1.0.0 you need to {{embrace}} values; see Programming with dplyr:
db_mtcars %>% filter(vs == a) %>% show_query() %>% collect() # Works
db_mtcars %>% filter(vs == {{input$a}}) %>% show_query() %>% collect() # should work
db_mtcars %>% filter(vs == {{input[1]}}) %>% show_query() %>% collect() # should work
db_mtcars %>% filter(vs == {{input[[1]]}}) %>% show_query() %>% collect() # should work
For dplyr < 1.0.0 you can use the bang-bang operator !! instead: !!input$a.
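To illustrate the !! approach (a sketch, reusing the db_mtcars and input objects from the question): !! forces the value to be evaluated locally before dbplyr translates the expression, so the literal value ends up in the generated SQL instead of the untranslatable input$a expression.

```r
# !! evaluates input$a immediately, so the value 1 is inlined into the
# generated SQL rather than the expression `input$a`:
db_mtcars %>%
  filter(vs == !!input$a) %>%
  show_query()
```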

error in could not find function in r with pipeline

I'm just starting with r, so this may very well be a very simple question but...
I've tried changing the name in 'a' to be more elaborate but this makes no difference
If I try to assign it to a variable
(e.g. baseline <- a %>% filter(Period == "Baseline") %>% group_by(File) %>% ...)
It just tells me:
"Error in a %>% filter(Period == "Baseline") %>% group_by(File) %>% :
could not find function "%>%<-"
I'd really be grateful for any help with this.
It keeps telling me "Error in a(.) : could not find function "a"", and that it is unable to find Baseline_MAP (even though it is defined earlier) in mutate(Delta_MAP = Group_MAP - Baseline_MAP, ...). My code:
a <- read_csv("file.csv")
summary(a)
a %>%
  filter(Period == "Baseline") %>%
  group_by(File) %>%
  summarise(Baseline_MAP = mean(MAP_Mean, na.rm = T),
            Baseline_SBP = mean(SBP_Mean, na.rm = T),
            Baseline_LaserMc1 = mean(Laser1_Magic, na.rm = T),
            Baseline_Laser1 = mean(Laser1_Mean, na.rm = T)) %>%
a %>%
  filter(Period != "Baseline") %>%
  group_by(File) %>%
  summarise(Group_MAP = mean(MAP_Mean, na.rm = T),
            Group_SBP = mean(SBP_Mean, na.rm = T),
            Group_Laser_1Magic = mean(Laser1_Magic, na.rm = T),
            Group_Laser_1 = mean(Laser1_Mean, na.rm = T))
a %>%
  mutate(Delta_MAP = Group_MAP - Baseline_MAP,
         Delta_MAP_Log = log(Group_MAP) - log(Baseline_MAP),
         Delta_SBP = Group_SBP - Baseline_SBP,
         Delta_SBP_Log = log(Group_SBP) - log(Baseline_SBP),
         Delta_Laser1_Magic = Group_Laser_1Magic - Baseline_LaserMc1,
         Delta_Laser1_Log = log(Group_Laser_1Magic) - log(Baseline_LaserMc1))
%>% comes from the package magrittr (and is re-exported by dplyr), so make sure you load one of them, i.e. library(dplyr).
Next, %>% does not assign the result to a variable. I.e.
a %>% mutate(foo = bar(x))
does not alter a. It just prints the result on the console (and prints nothing if you are running a script or calling it from a function).
You might be confusing the pipe-operator with %<>% (found in the package magrittr) which uses the left-hand variable as input for the pipe, and overwrites the variable with the modified result.
Finally, when you write
If I try to assign it to a variable (e.g. baseline <- a %>% filter(Period == "Baseline") %>% group_by(File)%>%)
You are assigning the result from the pipeline to a variable baseline -- this however does not modify the variable-names in the data frames (i.e. the column names).
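To make the two options concrete, here is a minimal self-contained sketch of explicit assignment versus the magrittr compound-assignment pipe:

```r
library(dplyr)
library(magrittr)  # provides %<>%

df <- data.frame(x = 1:3)

# Option 1: assign the pipeline result to a new variable explicitly.
df2 <- df %>% mutate(y = x * 2)

# Option 2: overwrite df in place with the compound assignment pipe.
df %<>% mutate(y = x * 2)

df$y   # 2 4 6 -- df itself was modified by %<>%
```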

Translate Spark SQL function to "normal" R code

I am trying to follow a vignette, "How to make a Markov Chain" (http://datafeedtoolbox.com/attribution-theory-the-two-best-models-for-algorithmic-marketing-attribution-implemented-in-apache-spark-and-r/).
This tutorial is interesting because it uses the same data source as I do. But part of the code uses Spark SQL (see my previous question, Concat_ws() function in Sparklyr is missing).
My question: I googled a lot and tried to solve this by myself, but I have no idea how, since I don't know exactly what the data should look like (the author didn't give an example of his data frame before and after the function).
How can I transform this piece of code into "normal" R code (without using Spark)? Especially the concat_ws and collect_list functions are causing trouble.
He is using this line of code:
channel_stacks <- data_feed_tbl %>%
  group_by(visitor_id, order_seq) %>%
  summarize(
    path = concat_ws(" > ", collect_list(mid_campaign)),
    conversion = sum(conversion)
  ) %>%
  ungroup() %>%
  group_by(path) %>%
  summarize(
    conversion = sum(conversion)
  ) %>%
  filter(path != "") %>%
  collect()
From my previous question, I know that we can replace part of the code:
concat_ws() can be replaced by the paste() function.
But another part of the code is the problem:
collect_list() # description: Aggregate function: returns a list of objects with duplicates.
I hope that I described this question as clear as possible.
paste can collapse a character vector into a single string using a separator supplied via its collapse parameter.
This makes it a drop-in replacement for concat_ws(" > ", collect_list(mid_campaign)):
channel_stacks <- data_feed_tbl %>%
  group_by(visitor_id, order_seq) %>%
  summarize(
    path = paste(mid_campaign, collapse = " > "),
    conversion = sum(conversion)
  ) %>%
  ungroup() %>%
  group_by(path) %>%
  summarize(
    conversion = sum(conversion)
  ) %>%
  filter(path != "")
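A minimal self-contained illustration of the paste(collapse =) substitution, using toy data in place of the tutorial's data_feed_tbl:

```r
library(dplyr)

# Toy clickstream: two visitors, three campaign touches.
df <- data.frame(visitor  = c(1, 1, 2),
                 campaign = c("email", "search", "display"))

# paste(collapse = " > ") joins each group's campaigns into one string,
# mirroring concat_ws(" > ", collect_list(campaign)) in Spark SQL:
df %>%
  group_by(visitor) %>%
  summarize(path = paste(campaign, collapse = " > "))
#> visitor 1 -> "email > search", visitor 2 -> "display"
```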

Same R function works as function/object in Global Env, does not work when loaded from R package

This might be a hard question to field because I can't easily offer a reproducible SNAFU without providing my access token to the Gmail API, but my hope is that I am tripping over an issue that will be clear enough from my description. We'll see...
I wrote a function that takes the Gmail message ID of a Google Scholar alert and parses it into a data frame with columns for article title, authors, publication, date, etc. My conundrum is that the same code that works when I load it interactively (i.e., define the function in the session a la my_fancy_function <- function(arguments){blah blah}) does not work when I load the function as part of an R package (a la devtools::load_all("/mypackage/")). It's worth noting that the other main function in my package works fine either way, but this one trips up when I use the package version after loading it via devtools::load_all("my_misbehaving_package/").
Below is the .R file content of the function that won't come to heel when loaded as part of a package -- and I hereby acknowledge in advance that it's some ugly mess of code, if that will spare me some finger wagging in your answers. This feels like the package version of the classic "strings as factors" SNAFU, but you tell me. The problem seems to occur very early in the function evaluation, and the error I get reads as follows:
Error in UseMethod("read_xml") :
no applicable method for 'read_xml' applied to an object of class "NULL"
And here is my whole sick code for this function:
library(stringr)
library(rvest)
library(plyr)
library(dplyr)
library(lubridate)
library(gmailr)
GScholar_alert_msg_to_df <- function(message_id){
one_message <- message(message_id)
msg_html <- body(one_message)
title <- read_html(msg_html) %>% html_nodes("h3") %>% html_text()
link <- read_html(msg_html) %>% html_nodes("h3 a") %>% html_attr("href")
msg_chunks <- msg_html %>% str_split("<a href") %>% unlist
msg_chunks <- msg_chunks[2:(length(msg_chunks)-2)]
excerpt <- msg_chunks %>% str_replace_all(fixed("<b>"), "") %>%
str_replace_all(fixed("</b>"), "") %>%
str_replace_all(fixed("<br>"), "") %>%
str_extract("<(font size=2 color=#222222|font size=\"-1\")>(.*?)</font>") %>%
unlist %>% str_replace_all("<(font size=2 color=#222222||font size=\"-1\")>", "") %>%
str_replace_all("</font>", "")
author_pub_field <- msg_chunks %>% str_replace_all(fixed("<b>"), "") %>%
str_replace_all(fixed("</b>"), "") %>%
str_extract("<font size=(2|\"-1\") color=#(006621|009933|008000)>(.*?)</font>") %>%
unlist %>%
str_replace_all("<font size=(2|\"-1\") color=#(006621|009933|008000)>", "") %>%
str_replace_all("</font>", "")
one_message_df <- data.frame(title, excerpt, link, author_pub_field, stringsAsFactors = FALSE)
one_message_df$date <- date(one_message) # needs reformatting
one_message_df$date %>% str_replace("[[:alpha:]]{3}, ", "") %>%
str_extract("^.{11}") %>% dmy -> one_message_df$date
one_message_df$author_only <- str_detect(one_message_df$author_pub_field, " - ")
one_message_df$author <- one_message_df$author_pub_field %>%
str_extract("^(.*?) - ") %>% str_replace(" - ", "")
one_message_df$author <- ifelse(one_message_df$author_only == 1, one_message_df$author, one_message_df$author_pub_field)
one_message_df$publication <- one_message_df$author_pub_field %>%
str_extract(" - (.*?)$") %>% str_replace(" - ", "") %>%
str_replace(", [0-9]{4}$", "") %>% str_replace("^[0-9]{4}$", NA_character_)
one_message_df$publication <- str_replace(one_message_df$publication, "^[0-9]{4}$", NA_character_)
one_message_df$author_MIAs <- str_detect(one_message_df$author, "…")
one_message_df$author %>% str_replace("…", " \\.\\.\\.") -> one_message_df$author
one_message_df$pub_name_clipped <- one_message_df$publication %>% str_detect("…")
one_message_df$publication %>% str_replace("…", " \\.\\.\\.") -> one_message_df$publication
return(one_message_df)
}
Yeah, it's ugly, but again, I promise it works when the code is entered interactively. As for the misbehaving package version, according to the RStudio traceback the error seems to happen right at the top, I think in either the body() or the message() function I'm using from the gmailr package. (Yes, I've tried the code via the terminal, to no happier conclusion.) Help me, oh, be anyone, you're my only hope.
Thank you to @MrFlick for indicating a way out of the woods here. I was able to resolve the problem by moving that list of packages to the 'Depends' field of the DESCRIPTION file, and I also had to use the '::' operator to make explicit calls to the 'body' and 'date' functions from the gmailr package. Those two names also belong to functions in the 'base' namespace, and base can't be masked by loading gmailr within the package environment. According to Hadley, I should be using '::' for every use of a function from another package when I'm writing my own.
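For illustration, the fix inside the package code looks roughly like this sketch (using the pre-1.0 gmailr function names that appear in the question):

```r
# Explicit namespacing ensures the gmailr accessors are called, rather
# than base::body() and base::date(), when this runs inside a package:
GScholar_alert_msg_to_df <- function(message_id) {
  one_message <- gmailr::message(message_id)
  msg_html    <- gmailr::body(one_message)
  msg_date    <- gmailr::date(one_message)
  # ... parsing as before ...
}
```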

Weird error with lapply and dplyr/magrittr

Here's a piece of code:
library(dplyr)
data <- data.frame(a = runif(20), b = runif(20), subject = rep(1:2, 10)) %>%
  group_by(subject) %>%
  do(distance = dist(.))
# no dplyr
intermediate <- lapply(data$distance, as.matrix)
mean.dists <- apply(simplify2array(intermediate), MARGIN = c(1, 2), FUN = mean)

# dplyr
mean.dists <- lapply(data$distance, as.matrix) %>%
  apply(simplify2array(.), MARGIN = c(1, 2), FUN = mean)
Why does the "no dplyr" version work, and the "dplyr" version throws the error, "dim(X) must have a positive length"? They seem identical to me.
The issue is that you haven't quite fully implemented the pipeline. You are using magrittr here, and the issue has little to do with dplyr. The pipe always inserts the left-hand side as the first argument of the next call, so in your version apply() receives the list itself as X (hence dim(X) is NULL). Write each step as its own pipe stage instead:
data$distance %>%
lapply(as.matrix ) %>%
simplify2array %>%
apply(MARGIN=1:2, FUN=mean)
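A self-contained sketch of the corrected pipeline, using toy matrices in place of the dist objects from the question:

```r
library(magrittr)

m1 <- matrix(1:4, nrow = 2)
m2 <- matrix(5:8, nrow = 2)

# Each stage receives the previous result as its first argument, so
# apply() finally gets a 2x2x2 array as X instead of a bare list:
mean.dists <- list(m1, m2) %>%
  lapply(as.matrix) %>%
  simplify2array() %>%
  apply(MARGIN = c(1, 2), FUN = mean)

mean.dists  # element-wise mean of m1 and m2: matrix(c(3, 4, 5, 6), 2)
```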
