R bigrquery - how to catch error messages from executed SQL? - r

Say I have some SQL code that refreshes a table of data, and I would like to schedule an R script to run this code daily. Is there a way to capture any error messages the SQL code may throw and save them to an R variable, instead of having them displayed in the R console log?
For example, assume I have a stored procedure sp_causing_error() in BigQuery that takes data from a source table source_table and refreshes a target table table_to_refresh.
CREATE OR REPLACE PROCEDURE sp_causing_error()
BEGIN
CREATE OR REPLACE TABLE table_to_refresh AS (
Select non_existent_column, x, y, z
From source_table
);
END;
Assume the schema of the source_table has changed and column non_existent_column no longer exists. When attempting to call sp_causing_error() in RStudio via:
library(bigrquery)
query <- "CALL sp_causing_error()"
bq_project_query(my_project, query)
We get an error message printed to the console (which masks the actual error message we would encounter if running in BigQuery):
Error in UseMethod("as_bq_table") : no applicable method for 'as_bq_table' applied to an object of class "NULL"
If we were to run sp_causing_error() in BigQuery, it throws an error message stating:
Query error: Unrecognized name: non_existent_column at [sp_throw_error:3:8]
Are the query error messages displayed in BigQuery ever captured anywhere in bigrquery when executing SQL? My goal is to have a try/catch block in the R script that catches the error message so it can be written to an output file if the SQL code did not run successfully. I am hoping there is a way to capture the descriptive error message from BigQuery and assign it to an R variable for further processing.
UPDATE
R's tryCatch() function comes in handy here to catch the R error message:
query <- "CALL sp_causing_error()"
result <- tryCatch(
  bq_project_query("research-01-217611", query),
  error = function(err) {
    return(err)
  }
)
result now contains the error message from the R console:
<simpleError in UseMethod("as_bq_table"): no applicable method for 'as_bq_table' applied to an object of class "NULL">
However, this is still not the descriptive error message we see when executing the same SQL code in BigQuery (quoted above), which references an unrecognized column name. Are we able to catch that error message instead of the more generic R error message?
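Whatever message ends up being caught, conditionMessage() turns the condition object into a plain string that can be stored in a variable or written to a file. A minimal sketch, assuming a hypothetical helper run_and_log() and a temporary log path (neither is part of bigrquery); the failing call is simulated with stop() so the snippet runs without a BigQuery connection:

```r
# Hypothetical helper: runs `job` (a zero-argument function) and, if it fails,
# appends the error text to `log_file`. Returns NULL on success, or the
# error message as a character string on failure.
run_and_log <- function(job, log_file) {
  tryCatch(
    {
      job()
      NULL
    },
    error = function(err) {
      msg <- conditionMessage(err)  # plain string, without the <simpleError ...> wrapper
      cat(format(Sys.time()), msg, "\n", file = log_file, append = TRUE)
      msg
    }
  )
}

# Stand-in for bq_project_query(my_project, query); replace with the real call.
failing_job <- function() stop("no applicable method for 'as_bq_table'")

log_file <- tempfile(fileext = ".log")
error_message <- run_and_log(failing_job, log_file)
```

After this runs, error_message holds the text of the error and the same text has been appended to the log file, so a scheduled script can carry on instead of halting.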

UPDATE/ANSWER
Wrapping the stored procedure call within R using BigQuery's BEGIN...EXCEPTION...END syntax lets us get at the actual error message. Example code snippet:
query <- '
BEGIN
  CALL sp_causing_error();
EXCEPTION WHEN ERROR THEN
  SELECT 1 AS error_flag,
         @@error.message AS error_message,
         @@error.statement_text AS error_statement_text,
         @@error.formatted_stack_trace AS stack_trace;
END;
'
query_result <- bq_table_download(bq_project_query(<project>, query))
error_flag <- query_result["error_flag"][[1]]
if (error_flag == 0) {
  print("Job ran successfully")
} else {
  print("Job failed")
  # Access error message variables here and take additional action as desired
}
Warning: this solution can cause an R error if the stored procedure completes successfully, because error_flag will not exist unless it is explicitly returned at the end of the stored procedure. A workaround is to add one line at the end of the stored procedure in BigQuery that sets the flag, so that bq_table_download() receives a value when the stored procedure runs successfully:
BEGIN
  -- BigQuery stored procedure code
  -- ...
  -- ...
  SELECT 0 AS error_flag;
END;
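The downloaded result can also be interpreted a little more defensively. The sketch below assumes the query result has already been downloaded into a data frame (simulated here, so no BigQuery connection is needed) and that the column names match the snippet above; interpret_job_result() is a hypothetical helper, not a bigrquery function:

```r
# Hypothetical helper: inspects the data frame returned by bq_table_download()
# and reports whether the job failed, plus the BigQuery error text if present.
interpret_job_result <- function(query_result) {
  # Treat a missing error_flag column as "unknown" rather than failing in R.
  if (!"error_flag" %in% names(query_result) || nrow(query_result) == 0) {
    return(list(status = "unknown", message = NA_character_))
  }
  failed <- query_result[["error_flag"]][1] != 0
  has_msg <- "error_message" %in% names(query_result)
  list(
    status = if (failed) "failed" else "ok",
    message = if (failed && has_msg) {
      as.character(query_result[["error_message"]][1])
    } else {
      NA_character_
    }
  )
}

# Simulated results: one from the EXCEPTION branch, one from a successful run.
failed_result <- data.frame(
  error_flag = 1,
  error_message = "Unrecognized name: non_existent_column",
  stringsAsFactors = FALSE
)
ok_result <- data.frame(error_flag = 0)

failed_info <- interpret_job_result(failed_result)
ok_info <- interpret_job_result(ok_result)
```

The message element can then be written to an output file or passed on to whatever alerting the scheduled script uses.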

Related

MariaDB SphinxSE not accepting weights parameter

If I try this query:
select * FROM sphinx.products where `query` = "test";
it works. But if I try to give it weights it returns an error:
select * FROM sphinx.products where `query` = "test;sort=extended:@weight DESC;weights=3,1,1,1";
Fails with error:
Error in query (1430): There was a problem processing the query on the foreign data source.
Data source error: searchd error: invalid deprecated unordered_weight count 4 (expe
(Error reported by MariaDB gets truncated there, but I believe it says "expecting 0")
And:
select * FROM sphinx.products where `query` = "test;sort=extended:@weight DESC";
Fails with error:
Error in query (1430): There was a problem processing the query on the foreign data source.
Data source error: searchd error: index 'sku_products': sort-by attribute '@weight'
(Again, error returned by SphinxSearch gets truncated by MariaDB)
All the documentation I can find about SphinxSE tells me to query the index this way, yet it does not work, and nobody on the Internet seems to have run into this error, since nobody is asking about it anywhere...
Am I doing something wrong?
Well, the option weights= didn't work, but it accepted fieldweights=sku,90,partnumber,30,barcode,20,name,10.
(I.e., fieldweights=<field1_name>,<field1_weight>,...)
Results came ordered by weight even without specifying sort=extended:@weight DESC, so I dodged both errors and got what I needed.
Hope this helps anyone in the same situation.

pSQL query against Redshift works in DataGrip but is cancelled by WLM abort action when run through R?

I have a query that runs within seconds through DataGrip but keeps failing when run through R; I'm at my wits' end as to what could be causing it. I have copied the exact query string used within R, and it works just fine through DataGrip.
My connection to my Redshift database works through R: I am able to select rows and even perform simple GROUP BY operations, so I doubt that is the issue. Here's my code/query:
library(RPostgreSQL)
library(RJDBC)
library(tidyverse)
conn <- dbConnect(dbDriver("PostgreSQL"),
                  host = 'X',
                  port = 'Y',
                  user = 'A',
                  password = 'B',
                  dbname = 'C')
df = dbGetQuery(conn, str_remove_all(paste0("
    SELECT
        hour,
        date,
        group1,
        group2,
        ad_group,
        SUM(factor1),
        SUM(factor2),
        SUM(factor3),
        SUM(factor4)
    FROM table
    WHERE
        filter_col = ",value,"
    GROUP BY
        hour,
        date,
        group1,
        group2,
        group3;"
),'\n'))
Error generated in console after a few seconds:
Error in postgresqlExecStatement(conn, statement, ...) :
RS-DBI driver: (could not Retrieve the result : ERROR: Query (X) cancelled by WLM abort action
DETAIL:
-----------------------------------------------
error: Query (X) cancelled by WLM abort action
code: 1078
context: Query (X) cancelled by WLM abort action
query: 0
location: abort_query_action.cpp:Z
process: wlm [pid=Y]
-----------------------------------------------
)
Warning message:
In postgresqlQuickSQL(conn, statement, ...) :
Could not create execute: (above query);
Edit- value is a single numeric variable.
Edit2- I've just checked the timeout definitions for the Redshift cluster. It is set to 3 minutes, whereas my query was aborted in less than a minute
It turns out my Redshift cluster has a one-million-row return limit. Because DataGrip only shows the first 500 rows when you run a query, and only processes the full result when you download it or skip to the last 'page', I never hit this limit there (until I actually tried downloading everything and dug into my Redshift cluster settings). To work around it, I'm using OFFSET and LIMIT on my query in a loop to download everything I need. I hope this helps someone!
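A loop of that kind could look roughly like the sketch below. fetch_all_pages() and fetch_page are hypothetical names; in practice fetch_page would wrap dbGetQuery() with a query ending in ORDER BY ... LIMIT ... OFFSET ... (a stable ORDER BY is assumed so that pages do not overlap). Here it is simulated with an in-memory data frame so the loop can be run without a database:

```r
# Hypothetical helper: repeatedly calls fetch_page(limit, offset) until a page
# comes back with fewer rows than requested, then binds all pages together.
# Returns NULL if the very first page is empty.
fetch_all_pages <- function(fetch_page, page_size) {
  pages <- list()
  offset <- 0
  repeat {
    page <- fetch_page(limit = page_size, offset = offset)
    if (nrow(page) > 0) {
      pages[[length(pages) + 1]] <- page
    }
    if (nrow(page) < page_size) break  # last (possibly empty) page reached
    offset <- offset + page_size
  }
  do.call(rbind, pages)
}

# Stand-in for the database: 25 rows served in slices, as LIMIT/OFFSET would.
fake_table <- data.frame(id = 1:25, value = (1:25) * 2)
fake_fetch_page <- function(limit, offset) {
  start <- offset + 1
  end <- min(offset + limit, nrow(fake_table))
  if (start > nrow(fake_table)) return(fake_table[0, , drop = FALSE])
  fake_table[start:end, , drop = FALSE]
}

df <- fetch_all_pages(fake_fetch_page, page_size = 10)
```

With a live connection, fetch_page might be something like function(limit, offset) dbGetQuery(conn, paste0(base_query, " LIMIT ", limit, " OFFSET ", offset)), where base_query is an assumed variable holding the SELECT (including its ORDER BY) without the trailing semicolon, and page_size stays below the cluster's row limit.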

How to fetch data from HANA to R? My code is showing an error

I am fetching a data table from HANA into R, but it is showing an error.
The ODBC connection was set up and connected successfully, but the data is not being fetched.
sqlFetch(ch,'SELECT * FROM "MY_SCHEMA.TICKETS_BY_YEAR"')
Error in odbcTableExists(channel, sqtable) :
‘SELECT * FROM "MY_SCHEMA.TICKETS_BY_YEAR"’: table not found on channel
I expected the data to be returned, but nothing comes back.
The cause of the error message is the wrong use of double quotes (").
To quote the schema name and the table name correctly, each of them needs to be enclosed in its own pair of double quotes, like so:
FROM "SCHEMA_NAME"."TABLE_NAME"
     ^           ^ ^          ^
Note also that sqlFetch() expects a table name rather than a SQL statement; to run a SELECT, use sqlQuery(). Your command in R needs to look like this:
sqlQuery(ch, 'SELECT * FROM "MY_SCHEMA"."TICKETS_BY_YEAR"')

How might I get detailed database error messages from dplyr::tbl?

I'm using R to plot some data I pull out of a database (the Stack Exchange data dump, to be specific):
dplyr::tbl(serverfault,
dbplyr::sql("
select year(p.CreationDate) year,
avg(p.AnswerCount*1.0) answers_per_question,
sum(iif(ClosedDate is null, 0.0, 100.0))/count(*) close_rate
from Posts p
where PostTypeId = 1
group by year(p.CreationDate)
order by year(p.CreationDate)
"))
The query works fine on SEDE, but I get this error in the R console:
Error: <SQL> 'SELECT *
FROM (
select year(p.CreationDate) year,
avg(p.AnswerCount*1.0) answers_per_question,
sum(iif(ClosedDate is null, 0.0, 100.0))/count(*) close_rate
from Posts p
where PostTypeId = 1
group by year(p.CreationDate)
order by year(p.CreationDate)
) "zzz11"
WHERE (0 = 1)'
nanodbc/nanodbc.cpp:1587: 42000: [FreeTDS][SQL Server]Statement(s) could not be prepared.
I reckoned "Statement(s) could not be prepared." meant that SQL Server didn't like the query for some reason. Unfortunately, it didn't give any hint about what went wrong. After fiddling with the query for a bit, I noticed it was wrapped in a subselect, according to the error message. Copying and executing the full query as constructed by one of the libraries in the chain, SQL Server gave me this more informative error message:
The ORDER BY clause is invalid in views, inline functions, derived tables, subqueries, and common table expressions, unless TOP, OFFSET or FOR XML is also specified.
Now the solution is obvious: remove (or comment out) the order by clause. But where is the detailed error message in the R console? I'm using RStudio, in case that matters. If I could get the full exception right next to the code I'm working on, it would help me fix bugs a lot quicker. (And just to be clear, I often get cryptic errors from dplyr::tbl and typically resort to binary search debugging to fix them.)

Create a stored procedure using RMySQL

Background: I am developing an R script that pulls data from a MySQL database, performs a logistic regression and then inserts the predictions back into the database. I want the entire system to be self-contained in the script in case of database failure. This includes all MySQL stored procedures that the script depends on to aggregate the data on the backend, since these would be deleted in such a database failure.
Question: I'm having trouble creating a stored procedure from an R script. I am running the following:
mySQLDriver <- dbDriver("MySQL")
connect <- dbConnect(mySQLDriver, group = connection)
query <-
"
DROP PROCEDURE IF EXISTS Test.Tester;
DELIMITER //
CREATE PROCEDURE Test.Tester()
BEGIN
/***DO DATA AGGREGATION***/
END //
DELIMITER ;
"
sendQuery <- dbSendQuery(connect, query)
dbClearResult(dbListResults(connect)[[1]])
dbDisconnect(connect)
However, I get the following error, which seems to involve the DELIMITER change.
Error in .local(conn, statement, ...) :
could not run statement: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'DELIMITER //
CREATE PROCEDURE Test.Tester()
BEGIN
/***DO DATA AGGREGATION***/
EN' at line 2
What I've Done: I have spent quite a bit of time searching for the answer, but have come up with nothing. What am I missing?
Just wanted to follow up on this string of comments. Thank you for your thoughts on this issue. I have a couple Python scripts that need to have this functionality and I began researching the same topic for Python. I found this question that indicates the answer. The question states:
"The DELIMITER command is a MySQL shell client builtin, and it's recognized only by that program (and MySQL Query Browser). It's not necessary to use DELIMITER if you execute SQL statements directly through an API.
The purpose of DELIMITER is to help you avoid ambiguity about the termination of the CREATE FUNCTION statement, when the statement itself can contain semicolon characters. This is important in the shell client, where by default a semicolon terminates an SQL statement. You need to set the statement terminator to some other character in order to submit the body of a function (or trigger or procedure)."
Hence the following code will run in R:
mySQLDriver <- dbDriver("MySQL")
connect <- dbConnect(mySQLDriver, group = connection)
query <-
"
CREATE PROCEDURE Test.Tester()
BEGIN
/***DO DATA AGGREGATION***/
END
"
sendQuery <- dbSendQuery(connect, query)
dbClearResult(dbListResults(connect)[[1]])
dbDisconnect(connect)
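Note that the working version no longer contains the DROP PROCEDURE IF EXISTS line. Assuming the connection does not have multi-statement support enabled (which I believe is the default), the DROP and the CREATE would need to be sent as separate calls. A rough sketch, where run_statements() is a hypothetical helper and the executor is passed in as a function so the loop can be exercised without a MySQL server:

```r
# Hypothetical helper: sends each SQL statement on its own, in order, using
# the supplied executor function. An error raised by the executor propagates
# and stops the loop at the first failing statement.
run_statements <- function(statements, execute) {
  for (sql in statements) {
    execute(sql)
  }
  invisible(length(statements))
}

statements <- c(
  "DROP PROCEDURE IF EXISTS Test.Tester",
  "CREATE PROCEDURE Test.Tester()
   BEGIN
     /***DO DATA AGGREGATION***/
   END"
)

# Stand-in executor that only records what it was asked to run.
sent <- character(0)
record_only <- function(sql) sent <<- c(sent, sql)

n_sent <- run_statements(statements, record_only)
```

With a live RMySQL connection, the executor might be function(sql) dbClearResult(dbSendQuery(connect, sql)), reusing the connect object from the code above.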
