Does sql_variant in dbplyr work as it should?

Let's take a look at the example in ?sql_variant:
We define a new translator function for aggregated functions, expanded from the default one:
postgres_agg <- sql_translator(.parent = base_agg,
  cor = sql_prefix("corr"),
  cov = sql_prefix("covar_samp"),
  sd = sql_prefix("stddev_samp"),
  var = sql_prefix("var_samp")
)
We then define a new variant, which is made from translation functions of the 3 different types (here 2):
postgres_var <- sql_variant(
  base_scalar,
  postgres_agg
)
translate_sql(cor(x, y), variant = postgres_var)
# <SQL> COR("x", "y")
translate_sql(sd(income / years), variant = postgres_var)
# <SQL> SD("income" / "years")
These don't look translated to me; shouldn't they be "CORR" and "STDDEV_SAMP"?
# Original comment:
# Any functions not explicitly listed in the converter will be translated
# to sql as is, so you don't need to convert all functions.
translate_sql(regr_intercept(y, x), variant = postgres_var)
# <SQL> REGR_INTERCEPT("y", "x")
This one behaves as expected, but that is exactly how the other two behaved as well.
On the other hand, functions covered by the default translations do work:
translate_sql(mean(x), variant = postgres_var)
# <SQL> avg("x") OVER ()
Is this a bug, or am I missing something?
My goal is to create some variants for Oracle and use them in the following fashion, then for more complicated functions (example with SQLite to be reproducible):
con <- DBI::dbConnect(RSQLite::SQLite(), dbname = ":memory:")
copy_to(con, cars, "cars")
con %>% tbl("cars") %>% summarize(dist = group_concat(dist)) # works as expected, as we're stealing the keyword from sqlite directly
sqlite_variant <- sql_variant(aggregate = sql_translator(.parent = base_agg, gpc = sql_prefix("group_concat")))
con %>% tbl("cars") %>% summarize(dist = gpc(dist)) # how do I make this work?
EDIT:
One bounty later, still no solution. I've cross-posted the issue on the dplyr/dbplyr GitHub page directly, where I'm not sure if it has or will get attention, but in case I (or someone else) don't update this in time, check this URL: https://github.com/tidyverse/dplyr/issues/3117

This is what Hadley Wickham answered on the provided GitHub link:
translate_sql() doesn't have a variant argument any more
Indeed, the variant argument is no longer documented, though the examples still use it; I suppose that will be corrected in the next version.
Asked how to define custom SQL translations, he had this to offer:
Have a look at http://dbplyr.tidyverse.org/articles/new-backend.html
and http://dbplyr.tidyverse.org/articles/sql-translation.html
I guess another option is to get the older version of dbplyr::sql_variant.
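For reference, a rough sketch of what those vignettes describe. This assumes a current dbplyr, where translate_sql() takes a connection (a simulated one works for experiments) rather than a variant, and where backends supply translations through an S3 method (sql_translation() in recent releases, sql_translate_env() in older ones); "MyBackend" below is a hypothetical class name:
library(dbplyr)
# Unknown functions pass through untranslated, which is why calling
# group_concat() directly already worked above:
translate_sql(group_concat(dist), con = simulate_sqlite())
# roughly: <SQL> group_concat(`dist`) (older versions uppercased unknown functions)
# To register gpc() as an alias, give a backend class its own translator:
sql_translation.MyBackend <- function(con) {
  sql_variant(
    base_scalar,
    sql_translator(.parent = base_agg, gpc = sql_prefix("group_concat")),
    base_win
  )
}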

Related

r - check if columns renamed when unnesting

Is there a way to know if columns were renamed in the process of unnesting? I want to know if there is something where I can intercept any messages that come through with New names: and give more context about solutions.
# min reprex
library(tidyverse)
f <- function() {
  tibble(
    x = 1:2,
    y = 2:1,
    z = tibble(x = 1)
  ) |>
    unnest_wider(z, names_repair = "unique")
}
f()
New names:
• `x` -> `x...1`
• `x` -> `x...3`
  x...1     y x...3
  ----- ----- -----
      1     2     1
      2     1     1
More context:
The message stems from
vctrs::vec_as_names(c("x", "x"), repair = "universal")
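Running that line directly reproduces the message and returns the repaired names:
vctrs::vec_as_names(c("x", "x"), repair = "universal")
# New names:
# • `x` -> `x...1`
# • `x` -> `x...2`
# [1] "x...1" "x...2"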
I see information about withCallingHandlers() but am not sure if that is the right route. I thought there was a way for errors/messages to have classes that you can intercept, but I can't remember what I read.
Something in testthat::expect_message() may help. I thought there would be a has_message() function out there.
There is a lot of tidy evaluation involved, and comparing names before and after might be tricky. I could look for the repaired names with the regex "\\.+\\d+$", but I'm not sure that is robust enough, since the data could have fields with that syntax already.
Thank you!
Taking inspiration from Hadley's answer on the question that ritchie-sacramento linked, you should check out the evaluate package.
> eval_res <- evaluate::evaluate("f()")
> eval_res[[2]]$message
[1] "New names:\n* x -> x...1\n* x -> x...3\n"
This will require more testing to see what happens to the data structure when there are errors, warnings, or even multiple messages. But this seems like the right track.
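The withCallingHandlers() route from the question can also work; here is a minimal sketch that matches on the message text (admittedly fragile if the wording changes) and then muffles the original message:
res <- withCallingHandlers(
  f(),
  message = function(m) {
    if (grepl("New names:", conditionMessage(m))) {
      message("unnest_wider() repaired duplicate column names; see ?vctrs::vec_as_names")
      invokeRestart("muffleMessage") # suppress the original message
    }
  }
)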

googledrive::drive_mv gives error "Parent specified via 'path' is invalid: x Does not exist"

This is a weird one, and I am hoping someone can figure it out. I have written a function that uses googlesheets4 and googledrive. One thing I'm trying to do is move a Google Drive document (spreadsheet) from the base folder to a specified folder. I had this working perfectly yesterday, so I don't know what happened, as it just didn't work when I came in this morning.
The weird thing is that if I step through the function, it works fine. It's just when I run the function all at once that I get the error.
I am using a folder ID instead of a name and using drive_find to get the correct folder ID. I am also using a sheet ID instead of a name. The folder already exists and like I said, it was working yesterday.
outFolder <- 'exact_outFolder_name_without_slashes'
createGoogleSheets <- function(outFolder) {
  folder_id <- googledrive::drive_find(n_max = 10, pattern = outFolder)$id
  data <- data.frame(Name = c("Sally", "Sue"), Data = c("data1", "data2"))
  sheet_id <- NA
  nameDate <- NA
  tempData <- data.frame()
  for (i in 1:nrow(data)) {
    nameDate <- data[i, "Name"]
    tempData <- data[i, ]
    googlesheets4::gs4_create(name = nameDate, sheets = list(sheet1 = tempData)
    sheet_id <- googledrive::drive_find(type = "spreadsheet", n_max = 10, pattern = nameDate)$id
    googledrive::drive_mv(file = as_id(sheet_id), path = as_id(folder_id))
  } # end 'for'
} # end 'function'
I don't think this will be a reproducible example. The offending code is within the for loop inside the function, and it works fine when I run through it step by step. folder_id is defined within the function but outside of the for loop; sheet_id is within the for loop. When I move folder_id into the for loop, it still doesn't work, although I don't know why that would change anything. These are just the things I have tried. I do have the proper authorization for Google Drive and googlesheets4, via:
googledrive::drive_auth()
googlesheets4::gs4_auth(token = drive_token())
<error/rlang_error>
Error in as_parent():
! Parent specified via path is invalid:
x Does not exist.
Backtrace:
  global createGoogleSheets(inputFile, outPath, addNames)
  googledrive::drive_mv(file = as_id(sheet_id), path = as_id(folder_id))
  googledrive:::as_parent(path)
Run rlang::last_trace() to see the full context.
Backtrace:
  x
  \-global createGoogleSheets(inputFile, outPath, addNames)
    \-googledrive::drive_mv(file = as_id(sheet_id), path = as_id(folder_id))
      \-googledrive:::as_parent(path)
        \-googledrive:::drive_abort(c(invalid_parent, x = "Does not exist."))
          \-cli::cli_abort(message = message, ..., .envir = .envir)
            \-rlang::abort(message, ..., call = call, use_cli_format = TRUE)
I have tried changing the folder_id to the exact path of my google drive W:/My Drive... and got the same error. I should mention I have also tried deleting the folder and re-creating it fresh.
Anybody have any ideas?
Thank you in advance for your help!
I can't comment because I don't have the reputation yet, but I believe you're missing a parenthesis in your for-loop.
You need that SECOND parenthesis below:
for (i in 1:nrow(tempData) ) {
...
}
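For what it's worth, when balancing the parentheses in the question's code, the unmatched one appears to be on the gs4_create() call rather than the for line; that line with the call closed would read:
googlesheets4::gs4_create(name = nameDate, sheets = list(sheet1 = tempData))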

R - tryCatch is saving warning instead of returning function output

I am trying to download records from Twitter using rtweet. One issue with this is that the Twitter server makes you wait 15 minutes every 18000 records. So, after record number 18000, I receive a data frame with all the records and a nice warning telling me to wait for a bit. search_tweets has a function argument called retryonratelimit to download more than 18000 records, but it isn't working, so I am exploring other options.
I have produced a function incorporating tryCatch to address this. However, when the warning at 18000 records pops up, tryCatch is saving the warning rather than the data frame which should be spat out before the warning, something it would not do if 17999 records were downloaded.
library(rtweet)
library(RDCOMClient)
library(profvis)
TwitScrape = function(SearchTerm){
  ReturnDF = tryCatch({
    TempList = NULL
    Temp = search_tweets(SearchTerm, n = 18000)
    TempList = list(as.data.frame(Temp), SearchTerm)
    return(TempList)
  },
  warning = function(TempList){
    Comb = NULL
    MAXID = min(TempList[[1]]$status_id)
    message("Delay for 15 minutes to accommodate server download limits")
    pause(901)
    TempWarn = search_tweets(TempList[[2]], n = 18000, max_id = MAXID)
    TempWarn = as.data.frame(TempWarn)
    Comb = rbind(TempList[[1]], TempWarn)
    CombList = list(Comb, TempList[[2]])
    return(CombList)
  })
}
Searches = c("#MUFC", "#LFC", "#MCFC")
TestExpandList = NULL
TestExpand = NULL
TestExpand2 = NULL
for (i in seq_along(Searches)){
  TestExpandList = TwitScrape(SearchTerm = Searches[i])
  TestExpand = TestExpandList[[1]]
  TestExpand$Cat = Searches[i]
  TestExpand$DownloadDate = Sys.Date()
  TestExpand2 = rbind(TestExpand2, TestExpand)
}
I hope this makes sense. If I can offer any more information please let me know. In summary, why is tryCatch saving my warning rather than the data frame I want?
I am not 100% sure what you would like to achieve, but it seems you are using tryCatch with a wrong understanding of how it works.
The argument of the warning handler, warning = function(TempList), is the warning condition itself. You have named it TempList, but that doesn't mean it becomes your TempList variable; the warning is still all that gets passed into the handler.
Your function TwitScrape returns ReturnDF by convention, since you are not explicitly returning anything at the top level; I guess that is still what you want and is fine.
I would try to re-structure your solution without tryCatch.
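To illustrate that point, a minimal standalone sketch: once the warning is signalled, tryCatch() abandons the partially evaluated expression, and the handler only receives the condition object.
res <- tryCatch({
  x <- "partial result"
  warning("rate limit exceeded")
  x # never reached once the warning is raised
}, warning = function(w) {
  conditionMessage(w) # the condition, not your TempList
})
res
# [1] "rate limit exceeded"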
Thanks for your comments. RolandASc, you were right; I went back to the drawing board. See the working TwitScrape function below:
TwitScrape = function(SearchTerm){
  DF = NULL
  DF = search_tweets(SearchTerm, n = 18001)
  Warn = warnings()
  if (names(Warn[1]) == "Rate limit exceeded - 88"){
    message("paused")
    pause(910)
    DF2 = search_tweets(SearchTerm, n = 18000, max_id = min(DF$status_id))
    DF3 = rbind(DF, DF2)
    return(DF3)
  } else {
    return(DF)
  }
}

Importing data into R (RData) from GitHub

I want to put some R code plus the associated data file (RData) on Github.
So far, everything works okay. But when people clone the repository, I want them to be able to run the code immediately. At the moment, this isn't possible because they will have to change their working directory (setwd) to the directory that the RData file was cloned (i.e. downloaded) to.
Therefore, I thought it might be easier if I changed the R code so that it linked to the RData file on GitHub. But I cannot get this to work using the following snippet. I think perhaps there is some text/binary issue.
x <- RCurl::getURL("https://github.com/thefactmachine/hex-binning-gis-data/raw/master/popDensity.RData")
y <- load(x)
Any help would be appreciated.
Thanks
This works for me:
githubURL <- "https://github.com/thefactmachine/hex-binning-gis-data/raw/master/popDensity.RData"
load(url(githubURL))
head(df)
# X Y Z
# 1 16602794 -4183983 94.92019
# 2 16602814 -4183983 91.15794
# 3 16602834 -4183983 87.44995
# 4 16602854 -4183983 83.79617
# 5 16602874 -4183983 80.19643
# 6 16602894 -4183983 76.65052
EDIT: Response to OP comment.
From the documentation:
Note that the https:// URL scheme is not supported except on Windows.
So you could try this:
download.file(githubURL,"myfile")
load("myfile")
which works for me as well, but this will clutter your working directory. If that doesn't work, try setting method="curl" in the call to download.file(...).
I've had trouble with this before as well, and the solution I've found to be the most reliable is to use a tiny modification of source_url from the fantastic devtools package [1]. This works for me (on a Mac).
load_url <- function (url, ..., sha1 = NULL) {
  # based very closely on code for devtools::source_url
  stopifnot(is.character(url), length(url) == 1)
  temp_file <- tempfile()
  on.exit(unlink(temp_file))

  request <- httr::GET(url)
  httr::stop_for_status(request)
  writeBin(httr::content(request, type = "raw"), temp_file)

  file_sha1 <- digest::digest(file = temp_file, algo = "sha1")

  if (is.null(sha1)) {
    message("SHA-1 hash of file is ", file_sha1)
  } else {
    if (nchar(sha1) < 6) {
      stop("Supplied SHA-1 hash is too short (must be at least 6 characters)")
    }
    file_sha1 <- substr(file_sha1, 1, nchar(sha1))
    if (!identical(file_sha1, sha1)) {
      stop("SHA-1 hash of downloaded file (", file_sha1,
           ")\n  does not match expected value (", sha1, ")",
           call. = FALSE)
    }
  }

  load(temp_file, envir = .GlobalEnv)
}
I use a very similar modification to get text files from github using read.table, etc. Note that you need to use the "raw" version of the github URL (which you included in your question).
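A hypothetical usage with the raw URL from the question (assuming the object stored in the file is named df, as the first answer's output suggests):
load_url("https://github.com/thefactmachine/hex-binning-gis-data/raw/master/popDensity.RData")
head(df)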
[1] https://github.com/hadley/devtools
load takes a filename, and an RData file is binary, so fetch the bytes with getBinaryURL() and write them with writeBin() rather than writeLines():
x <- RCurl::getBinaryURL("https://github.com/thefactmachine/hex-binning-gis-data/raw/master/popDensity.RData")
writeBin(x, tmp <- tempfile())
y <- load(tmp)

Get function's title from documentation

I would like to get the title of a base function (e.g.: rnorm) in one of my scripts. That is included in the documentation, but I have no idea how to "grab" it.
I mean the line given in the RD files as \title{} or the top line in documentation.
Is there any simple way to do this without calling the Rd_db function from tools and parsing all Rd files, which would be a very big overhead for this simple task? Another thing: I tried parse_Rd too, but:
I do not know which Rd file holds my function,
I have no Rd files on my system (just rdb, rdx and rds).
So a function to parse the (offline) documentation would be the best :)
POC demo:
> get.title("rnorm")
[1] "The Normal Distribution"
If you look at the code for help, you see that the function index.search seems to be what is pulling in the location of the help files, and that the default for the associated find.package() function is NULL. It turns out that there is neither a help page for that function nor is it exported, so I tested the usual suspects for which package it was in (base, tools, utils) and ended up with utils:
utils:::index.search("+", find.package())
#[1] "/Library/Frameworks/R.framework/Resources/library/base/help/Arithmetic"
So:
ghelp <- utils:::index.search("+", find.package())
gsub("^.+/", "", ghelp)
#[1] "Arithmetic"
ghelp <- utils:::index.search("rnorm", find.package())
gsub("^.+/", "", ghelp)
#[1] "Normal"
What you are asking for is \title{Title}, but here I have shown you how to find the specific Rd file to parse, and it sounds as though you already know how to do that.
EDIT: Hadley has provided a method for getting all of the help text once you know the package name, so applying that to the index.search() value above:
target <- gsub("^.+/library/(.+)/help.+$", "\\1",
               utils:::index.search("rnorm", find.package()))
doc.txt <- pkg_topic(target, "rnorm") # assuming both of Hadley's functions are here
print(doc.txt[[1]][[1]][1])
#[1] "The Normal Distribution"
It's not completely obvious what you want, but the code below will get the Rd data structure corresponding to the topic you're interested in; you can then manipulate that to extract whatever you want.
There may be simpler ways, but unfortunately very little of the needed code is exported and documented. I really wish there was a base help package.
pkg_topic <- function(package, topic, file = NULL) {
  # Find "file" name given topic name/alias
  if (is.null(file)) {
    topics <- pkg_topics_index(package)
    topic_page <- subset(topics, alias == topic, select = file)$file

    if (length(topic_page) < 1)
      topic_page <- subset(topics, file == topic, select = file)$file

    stopifnot(length(topic_page) >= 1)
    file <- topic_page[1]
  }

  rdb_path <- file.path(system.file("help", package = package), package)
  tools:::fetchRdDB(rdb_path, file)
}

pkg_topics_index <- function(package) {
  help_path <- system.file("help", package = package)

  file_path <- file.path(help_path, "AnIndex")
  if (length(readLines(file_path, n = 1)) < 1) {
    return(NULL)
  }

  topics <- read.table(file_path, sep = "\t",
    stringsAsFactors = FALSE, comment.char = "", quote = "", header = FALSE)

  names(topics) <- c("alias", "file")
  topics[complete.cases(topics), ]
}
