How to prevent db connection code chunks from evaluating in Rmd?

I have a .Rmd file that I knit to generate reports. At the beginning of the document there is a code chunk that executes a query on a remote DB and returns some data as output (a typical select query over joins of several tables). The data being retrieved is essentially fixed: I'm pulling data from a certain date interval, and the interval I'm analysing doesn't change.
Each time I make an appearance-related change to the .Rmd file and knit it, it reruns this query, which takes more than 2 minutes because quite a lot of data is returned. I don't want this to happen, since the base data I'm analysing doesn't change at all.
How do I ensure this one chunk alone doesn't evaluate every time? I have tried setting eval = FALSE on the chunk.
However, I get the following error:
Error in UseMethod("mutate") : no applicable method for 'mutate' applied to an object of class "function"
Calls: <Anonymous> ... withCallingHandlers -> withVisible -> eval -> eval -> %>% -> mutate
Execution halted
For context, df is the data frame returned after the query is executed with dbGetQuery(). Immediately after the chunk with eval = FALSE, there is another chunk that calls mutate() on df; that is where the error is generated.
The chunk with eval = FALSE is just a generic SQL execution chunk. Its contents look something like this:
query <- 'select * from table1 join table2'
query2 <- 'select * from table3'
df1 <- dbGetQuery(conn, query)
df2 <- dbGetQuery(conn, query2)
df <- left_join(df1, df2)
The next code chunk where the error originates from does something like this:
df <- df %>% mutate(newcol = is.na(somecol))
I found an answer on Stack Overflow that roughly describes my problem but offers no satisfactory solution.
Link to the SO post: error knitting flex dashboard rmarkdown dplyr

I assume that even if you get the error message, you receive it before the two-minute mark, i.e. before the chunk would have finished if it were actually evaluated, right? So eval = FALSE itself shouldn't be the problem at all. If you don't need to run the query again and again, I assume you have stored the data locally?
The error message itself could pop up because of a package conflict; maybe try dplyr::mutate?

You could perhaps try memoise.
It would be something like this:
library(memoise)
query <- 'select * from table1 join table2'
query2 <- 'select * from table3'
my.get.query <- memoise(dbGetQuery)
df1 <- my.get.query(conn, query)
df2 <- my.get.query(conn, query2)
df <- left_join(df1, df2)
This may not work; it depends entirely on how knitting operates, i.e. whether or not it starts a fresh R session each time.
Otherwise write your own function that caches the results to a file.
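A minimal sketch of that file-based approach, assuming the conn object and the two query strings from the question; the cache file name is just an illustration:
library(dplyr)
cache_file <- "cached_query_results.rds"   # hypothetical cache location
if (file.exists(cache_file)) {
  df <- readRDS(cache_file)                # reuse the saved result, skip the slow queries
} else {
  df1 <- DBI::dbGetQuery(conn, query)
  df2 <- DBI::dbGetQuery(conn, query2)
  df  <- left_join(df1, df2)
  saveRDS(df, cache_file)                  # save for the next knit
}
Delete the .rds file whenever the underlying data or the date interval changes, so the queries are rerun.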

Related

R Check Warning: View() should not be used in examples

I created my first R package, which has three functions that query data in a database and return data frames based on user input. Since the data frames are large, instead of printing them to the console I added View() inside my function to show the user the data extracted based on their input.
Code goes like this:
queryData <- function(p, q, r, s, t){
  d <- DBI::dbGetQuery(conn = con, statement = "SELECT * FROM dataset")
  d <- d %>%
    dplyr::filter(org == p) %>%
    dplyr::filter(exp == q) %>%
    dplyr::filter(dis %like% r) %>%
    dplyr::filter(tis %like% s) %>%
    dplyr::filter(Rx %like% t)
  print(paste("No. of datasets that matched your criteria:", nrow(d)))
  View(d)
}
R CMD check was fine, and I was able to install the package and run the functions. But it gave me an error when I built the vignette for the package.
Here is the error message:
Error: processing vignette 'package_vignette.Rmd' failed with diagnostics:
View() should not be used in examples etc
--- failed re-building 'package_vignette.Rmd'
SUMMARY: processing the following file failed:
'package_vignette.Rmd'
Error: Vignette re-building failed.
Any advice on how to fix this issue?
As the error message says, View() is not meant for R Markdown, which is what package vignettes are written in. The R Markdown Cookbook suggests you can display the data just by calling the object, or by using knitr::kable(). If the table is too long, you can show just the first part by subsetting it, e.g.
knitr::kable(my_table[1:5, 1:5])
will print only the first 5 rows and columns of the table. There are other packages you can use too (a brief list here), which work differently depending on the desired output format.
Alternatively, you can use paged tables to avoid scrolling:
rmarkdown::paged_table(my_table)
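As a rough sketch of how the package function itself could be adapted: return the filtered data frame instead of calling View(), and let the vignette render it. The example inputs below are made up, and the %like% filters from the original are abbreviated here:
queryData <- function(p, q, r, s, t) {
  d <- DBI::dbGetQuery(conn = con, statement = "SELECT * FROM dataset")
  d <- dplyr::filter(d, org == p, exp == q)   # remaining %like% filters omitted in this sketch
  message("No. of datasets that matched your criteria: ", nrow(d))
  d                                           # return the data frame instead of View(d)
}
Then, in the vignette:
matches <- queryData("orgA", "expB", "disC", "tisD", "rxE")   # hypothetical inputs
knitr::kable(head(matches))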

Can I use the output of a function in another R file?

I built a function that retrieves data from an Azure table via a REST API. I sourced the function, so I can reuse it in other R scripts.
The function is as below:
Connect_To_Azure_Table(Account, Container, Key)
and it returns as an output a table called Azure_table. The very last line of the code in the function is
head(Azure_table)
In my next script, I'm going to call that function and execute some data transformation.
However, while the function executes (and my Azure_table is previewed), I don't seem to be able to use it in the code to start performing my data transformation. For example, this is the beginning of my ETL function:
library(dplyr)
library(vroom)
library(tidyverse)
library(stringr)
#Connects to datasource
if(exists("Connect_To_Azure_Table", mode = "function")) {
source("ConnectToAzureTable.R")
}
Account <- "storageaccount"
Container <- "Usage"
Key <- "key"
Connect_To_Azure_Table(Account, Container, Key)
# Performs ETL process
colnames(Azure_table) <- gsub("value.", "", colnames(Azure_table)) # Removes prefix from column headers
Both the function and the table get a warning in RStudio. But while the function executes anyway, the Azure_table line throws an error:
> # Performs ETL process
>
> colnames(Azure_table) <- gsub("value.", "", colnames(Azure_table)) # Removes prefix from column headers
Error in is.data.frame(x) : object 'Azure_table' not found
What should I be doing to use Azure_table in my script?
Thanks in advance!
~Alienvolm
You can ignore the RStudio warnings; they are based on a heuristic, and in this case it is very imprecise and misleading.
However, there are some errors in your code.
Firstly, you're only sourcing the code if the function was already defined. Surely you want it the other way round: source the code if the function does not yet exist. But furthermore, that check is completely redundant: if you haven't yet sourced the code that defines the function, the function won't be defined. The existence check is unnecessary and misleading. Remove it:
source("ConnectToAzureTable.R")
Secondly, when you’re calling the function you’re not assigning its return value to any name. You probably meant to write the following:
Azure_table <- Connect_To_Azure_Table(Account, Container, Key)
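Putting both fixes together, the top of the ETL script might look roughly like this. Note that this sketch also assumes Connect_To_Azure_Table() ends by returning the full table (i.e. its last line is Azure_table rather than head(Azure_table)); otherwise the assignment would only capture the six-row preview:
library(dplyr)
library(stringr)
source("ConnectToAzureTable.R")   # defines Connect_To_Azure_Table()
Account <- "storageaccount"
Container <- "Usage"
Key <- "key"
# assign the function's return value to a name so later steps can use it
Azure_table <- Connect_To_Azure_Table(Account, Container, Key)
# Performs ETL process
colnames(Azure_table) <- gsub("value.", "", colnames(Azure_table))  # removes prefix from column headers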

Error in UseMethod("select_") in Blogdown

I am using blogdown to create a new post, and I am getting the following error when trying to preview it.
The code works well in my R Markdown file, but I cannot get it to work in my blog post. Does anyone know where the problem is?
Quitting from lines 36-47
Error in UseMethod("select_") :
no applicable method for 'select_' applied to an object of class "function"
Calls: local ... freduce -> -> select -> select.default -> select_
Execution halted
Here is my code from lines 36-47:
library(corrplot)
library(RColorBrewer)
library(tidyverse)
corrplot(cor(df %>% select(Sales, Customers, Store,
Open, SchoolHoliday,
DayOfWeek, month, year,
CompetitionDistance,
Promo, Promo2_active) %>%
filter(!is.na(Sales), !is.na(CompetitionDistance))),
type="upper", order="original",
col=brewer.pal(n=8, name="RdYlBu"))
Thanks a lot.
I think you're getting this error because you don't have an object called df in your global environment. Either your data frame hasn't been created yet or it is called something else. There is a little-known function called df in the stats package, which is on the search path when you start an R session. You can check this by starting a new R session and typing df into the console. You will see the body of the function stats::df.
You are therefore getting the error because you are trying to subset a function, not a data frame. To resolve the error, make sure you create a data frame called df before your call to corrplot().
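As a minimal illustration, assuming the post's data comes from a CSV file (the file name here is made up), the fix is simply to create the object before it is used:
library(tidyverse)
df <- read_csv("store_sales.csv")   # hypothetical file; any data source works
is.data.frame(df)                   # TRUE once this assignment has run; before it, df
                                    # would resolve to the function stats::df
With that in place, the select()/filter() pipeline inside corrplot() operates on a data frame rather than on a function.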

Avoiding warning message “There is a result object still in use” when using dbSendQuery to create table on database

Background:
I use dbplyr and dplyr to extract data from a database, then I use the command dbSendQuery() to build my table.
Issue:
After the table is built, if I run another command I get the following warning:
Warning messages:
1: In new_result(connection@ptr, statement) : Cancelling previous query
2: In connection_release(conn@ptr) :
  There is a result object still in use.
  The connection will be automatically released when it is closed.
Question:
Because I don't have a result to fetch (I am sending a command to build a table), I'm not sure how to avoid this warning. At the moment I disconnect after building the table and the warning goes away. Is there anything I can do to avoid this warning?
Currently everything works; I just have this warning. I'd like to avoid it, as I assume I should be clearing something after I've built my table.
Code sample
# establish connection
con = DBI::dbConnect(<connection stuff here>)
# connect to table and database
transactions = tbl(con, in_schema("DATABASE_NAME", "TABLE_NAME"))
# build query string
query_string = "SELECT * FROM some_table"
# drop current version of table
DBI::dbSendQuery(con, paste('DROP TABLE MY_DB.MY_TABLE'))
# build new version of table
DBI::dbSendQuery(con, paste('CREATE TABLE MY_DB.MY_TABLE AS (', query_string, ') WITH DATA'))
Even though you're not retrieving anything with a SELECT clause, DBI still allocates a result set after every call to DBI::dbSendQuery().
Give DBI::dbClearResult() a try in between the DBI::dbSendQuery() calls.
DBI::dbClearResult() does the following (quoting its man page):
Clear A Result Set
Frees all resources (local and remote) associated with a result set. In some cases (e.g., very large result sets) this can be a critical step to avoid exhausting resources (memory, file descriptors, etc.).
The example from the man page gives a hint of how the function should be called:
con <- dbConnect(RSQLite::SQLite(), ":memory:")
rs <- dbSendQuery(con, "SELECT 1")
print(dbFetch(rs))
dbClearResult(rs)
dbDisconnect(con)
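Applied to the code in the question, that would look roughly like this (the SQL statements are unchanged from the question):
# drop current version of table, then free the result set
res <- DBI::dbSendQuery(con, "DROP TABLE MY_DB.MY_TABLE")
DBI::dbClearResult(res)
# build new version of table, then free its result set as well
res <- DBI::dbSendQuery(con, paste("CREATE TABLE MY_DB.MY_TABLE AS (", query_string, ") WITH DATA"))
DBI::dbClearResult(res)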

Knitting returns parse error

In attempting to knit a PDF, I'm calling a script that should return two ggplots, by calling the chunk:
```{r, echo=FALSE}
read_chunk('Script.R')
```r
But I receive the error:
processing file: Preview-24a46368403c.Rmd
Quitting from lines 9-12 (Preview-24a46368403c.Rmd)
Error in parse(text = x, srcfile = src) : attempt to use zero-length variable name
Calls: <Anonymous> ... <Anonymous> -> parse_all -> parse_all.character -> parse
Execution halted
The script on its own runs and returns the two plots, but won't return them when knitted.
I similarly attempted to use source(), but got a similar error:
Quitting from lines 7-10 (Preview-24a459ca4c1.Rmd)
Error in file(filename, "r", encoding = encoding) : cannot open the connection
Calls: <Anonymous> ... withCallingHandlers -> withVisible -> eval -> eval -> source -> file
Execution halted
While this does not appear to be the solution for you, this exact same error message appears if the chunk is not ended properly.
I experienced this error and traced it to ending a chunk with `` instead of ```. Correcting the chunk syntax solved the problem, which had produced the same error message as yours.
Are you sure that knitr is running from the directory you think it is? It appears to be failing to find the file.
Use an absolute path; if that fixes it, you've found your problem.
Once you've done that, you can use knitr::opts_knit$set(root.dir = "...") -- don't use setwd() if you want the working directory to be maintained.
Knitr's default is the directory of the .Rmd file itself.
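For example, a setup chunk near the top of the .Rmd can set the working directory for all subsequent chunks (the path below is only a placeholder):
```{r setup, include=FALSE}
knitr::opts_knit$set(root.dir = "/path/to/project")  # placeholder; use your project directory
```
This keeps relative paths such as 'Script.R' resolving against the project root rather than against wherever the intermediate file happens to be rendered.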
It may have to do with the "r" at the end of the triple backquotes demarcating your code chunk. There should be nothing after the triple backquotes, but I think the problem is specifically that the letter is "r".
The issue stems from the fact that R Markdown processes backquoted text starting with r as inline code, meaning it actually runs whatever is between the backquotes.
I had similar issues writing a problem set in an Rmd with this statement, which had backquoted text intended to be monospace but not run as inline code:
Use sapply or map to calculate the probability of a failure rate over r <- seq(.05, .5, .025).
When I knit the document, I got opaque error messages saying I had an improper assignment using <-. It was because, instead of just displaying the backquoted statement in monospace, r <- seq(.05, .5, .025) was actually processed as R inline code, i.e. <- seq(.05, .5, .025), hence the improper-assignment error. I fixed it by changing the variable name from r to rate.
The actual text of the error message in your question might refer to whatever follows your code chunk, as the knitting process is probably trying to run that as code. In this case, just removing that stray r at the end of the code chunk should fix the error.
You should use syntax similar to the following; I had the exact same issue and this fixed it:
```{r views}
bank.df <- read.csv("C:/Users/User/Desktop/Banks.csv", header = TRUE) #load data
dim(bank.df) # to find dimension of data frame
head(bank.df) # show first six rows
```
The closing ``` has to be on its own line, with nothing after it.
In my case, the problem was that I had ended the code chunk with four backticks rather than three. Check for this, and if you ended yours with four as well, try deleting one of them.
