I'm having trouble passing NULL as a parameter in an INSERT query using RPostgres and RPostgreSQL:
In PostgreSQL:
create table foo (ival int, tval text, bval bytea);
In R:
This works:
res <- dbSendQuery(con, "INSERT INTO foo VALUES($1, $2, $3)",
                   params = list(ival = 1,
                                 tval = 'not quite null',
                                 bval = charToRaw('asdf')))
But this throws an error:
res <- dbSendQuery(con, "INSERT INTO foo VALUES($1, $2, $3)",
                   params = list(ival = NULL,
                                 tval = 'not quite null',
                                 bval = charToRaw('asdf')))
Using RPostgres, the error message is:
Error: expecting a string
Under RPostgreSQL, the error is:
Error in postgresqlExecStatement(conn, statement, ...) :
RS-DBI driver: (could not Retrieve the result : ERROR: invalid input
syntax for integer: "NULL"
)
Substituting NA would be fine with me, but it isn't a work-around - a literal 'NA' gets written to the database.
Using e.g. integer(0) gives the same "expecting a string" message.
You can use NULLIF directly in your insert statement:
res <- dbSendQuery(con, "INSERT INTO foo VALUES(NULLIF($1, 'NULL')::integer, $2, $3)",
                   params = list(ival = NULL,
                                 tval = 'not quite null',
                                 bval = charToRaw('asdf')))
This works with NA as well.
One way to work around not knowing how to express a NULL value in R that the PostgreSQL package can translate is simply to omit the column whose value you want to be NULL from the INSERT statement.
So in your example you could use this:
res <- dbSendQuery(con, "INSERT INTO foo (tval, bval) VALUES($1, $2)",
                   params = list(tval = 'not quite null',
                                 bval = charToRaw('asdf')))
when you want ival to have a NULL value. This of course assumes that ival in your table is nullable, which may not be the case.
Thanks all for the help. Tim's answer is a good one, and I used it to catch the integer values. I went a different route for the rest of it, writing a function in PostgreSQL to handle most of this. It looks roughly like:
CREATE OR REPLACE FUNCTION add_stuff(ii integer, tt text, bb bytea)
RETURNS integer
AS
$$
DECLARE
    bb_comp bytea;
    rows integer;
BEGIN
    bb_comp = convert_to('NA', 'UTF8'); -- my database is in UTF8.
    -- front-end catches ii is NA; RPostgres blows up
    -- trying to convert 'NA' to integer.
    tt = nullif(tt, 'NA');
    bb = nullif(bb, bb_comp);
    INSERT INTO foo VALUES (ii, tt, bb);
    GET DIAGNOSTICS rows = ROW_COUNT;
    RETURN rows;
END;
$$
LANGUAGE plpgsql VOLATILE;
Now to have a look at the RPostgres source and see if there's an easy-enough way to make it handle NULL / NA a bit more easily. Hoping that it's missing because nobody thought of it, not because it's super-tricky. :)
This will give the "wrong" answer if someone is trying to put literally 'NA' into the database and mean something other than NULL / NA (e.g. NA = "North America"); given our use case, that seems very unlikely. We'll see in six months time.
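For completeness, the R side of this might look like the sketch below. It is untested; insert_foo is my own hypothetical glue, combining Tim's NULLIF trick for the integer with the 'NA' sentinels that add_stuff() strips out of the text and bytea arguments:

```r
library(DBI)

# Hypothetical wrapper (untested): NA values are sent as the literal
# string 'NA'; the integer is recovered with NULLIF on the SQL side,
# and add_stuff() strips the sentinels from the text/bytea arguments.
insert_foo <- function(con, ival, tval, bval) {
  dbGetQuery(con,
             "SELECT add_stuff(NULLIF($1, 'NA')::integer, $2, $3)",
             params = list(if (is.na(ival)) "NA" else as.character(ival),
                           if (is.na(tval)) "NA" else tval,
                           if (is.null(bval)) charToRaw("NA") else bval))
}
```

This keeps all the NULL-mapping logic in one place on the R side, at the cost of the same 'NA' collision caveat described above.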
Related
I have a query in the following format, which performs COALESCE and also defines a new column using a CASE statement.
SELECT ....
COALESCE(mm1,'missing') AS mm1,
COALESCE(mm2,'missing') AS mm2,
CASE WHEN mm1='false' AND mm2='false' THEN 'No-Proxy'
WHEN mm1 IN ('false','missing') AND mm2='true' THEN 'Good-Proxy'
WHEN mm1 ='true' AND mm2 IN ('false','missing') THEN 'Bad-Proxy'
WHEN ((mm1='true' AND mm2='true') OR (mm1='missing' AND mm2='missing')
OR (mm1='false' AND mm2='missing') OR (mm1='missing' AND mm2='false')) THEN 'Unknown'
END AS Proxy_Type,
As seen above, when both mm1 and mm2 are originally NULL, we need Proxy_Type to be 'Unknown'. But when we run the query, we get unexpected output. Please see the screenshot.
Kindly advise on how to fix it.
It seems that "inline/lateral column aliasing" does not allow you to "override" a column at the same level:
CREATE OR REPLACE TABLE t
AS SELECT NULL AS mm1, NULL AS mm2;
Option 1: Using different column alias
SELECT
COALESCE(mm1,'missing') AS mm1_,
COALESCE(mm2,'missing') AS mm2_,
CASE WHEN mm1_='false' AND mm2_='false' THEN 'No-Proxy'
WHEN mm1_ IN ('false','missing') AND mm2_='true' THEN 'Good-Proxy'
WHEN mm1_ ='true' AND mm2_ IN ('false','missing') THEN 'Bad-Proxy'
WHEN ((mm1_='true' AND mm2_='true') OR (mm1_='missing' AND mm2_='missing')
OR (mm1_='false' AND mm2_='missing')
OR (mm1_='missing' AND mm2_='false')) THEN 'Unknown'
END AS Proxy_Type
FROM t;
-- MM1_ MM2_ PROXY_TYPE
--missing missing Unknown
Option 2: LATERAL JOIN and prefixing with subquery alias:
SELECT -- t.mm1, t.mm2,
s.mm1, s.mm2,
CASE WHEN s.mm1='false' AND s.mm2='false' THEN 'No-Proxy'
WHEN s.mm1 IN ('false','missing') AND s.mm2='true' THEN 'Good-Proxy'
WHEN s.mm1 ='true' AND s.mm2 IN ('false','missing') THEN 'Bad-Proxy'
WHEN ((s.mm1='true' AND s.mm2='true') OR (s.mm1='missing' AND s.mm2='missing')
OR (s.mm1='false' AND s.mm2='missing')
OR (s.mm1='missing' AND s.mm2='false')) THEN 'Unknown'
END AS Proxy_Type
FROM t,
LATERAL(SELECT COALESCE(t.mm1,'missing') AS mm1,COALESCE(t.mm2,'missing') AS mm2) s;
-- MM1 MM2 PROXY_TYPE
--missing missing Unknown
The ideal situation would be an additional keyword to distinguish between the original column and the calculated expression, along the lines of SAS's CALCULATED keyword:
SELECT
col,
col+10 AS col,
col,
calculated col
FROM t;
-- output
t.col/expression/t.col/expression
I'm guessing you are trying to use the re-defined values of mm1/mm2 in your CASE statement? If so, SQL doesn't work like that: values don't change within the same SELECT statement, so mm1/mm2 will have their starting values wherever they are referenced in that statement.
One way round this is to use something like this:
COALESCE(mm1,'missing') AS mm1,
COALESCE(mm2,'missing') AS mm2,
CASE WHEN COALESCE(mm1,'missing') ='false' …
I am trying to query data from a ClickHouse database from R with a subset filter.
Here is an example:
library(data.table)
library(RClickhouse)
library(DBI)
subset <- paste(traffic[,unique(IDs)][1:30], collapse = ',')
conClickHouse <- DBI::dbConnect('here is the connection')
DataX <- dbGetQuery(conClickHouse, paste0("select * from database
                    where IDs in (", subset, ")"))
As a result I get error:
DB::Exception: Type mismatch in IN or VALUES section. Expected: FixedString(34).
Got: UInt64: While processing (IDs IN ....
Any help is appreciated
Thanks to the comment of @DennyCrane,
"select * from database where toFixedString(IDs,34) in
(toFixedString(ID1, 34), toFixedString(ID2,34 ))"
This query subsets properly.
https://clickhouse.tech/docs/en/sql-reference/functions/#strong-typing
Strong Typing
In contrast to standard SQL, ClickHouse has strong typing. In other words, it doesn’t make implicit conversions between types. Each function works for a specific set of types. This means that sometimes you need to use type conversion functions.
https://clickhouse.tech/docs/en/sql-reference/functions/type-conversion-functions/#tofixedstrings-n
select * from (select 'x' B ) where B in (select toFixedString('x',1))
DB::Exception: Types of column 1 in section IN don't match: String on the left, FixedString(1) on the right.
Use a cast with toString or toFixedString:
select * from (select 'x' B ) where toFixedString(B,1) in (select toFixedString('x',1))
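Applied to the original R snippet, building the IN list might look like the sketch below. It is untested against a live server and assumes the IDs contain no single quotes that would need escaping:

```r
library(data.table)
library(RClickhouse)
library(DBI)

ids <- traffic[, unique(IDs)][1:30]
# Quote each ID and wrap it in toFixedString() so both sides of the
# IN comparison have the same type, FixedString(34).
in_list <- paste0("toFixedString('", ids, "', 34)", collapse = ", ")
DataX <- dbGetQuery(conClickHouse,
                    paste0("SELECT * FROM database WHERE toFixedString(IDs, 34) IN (",
                           in_list, ")"))
```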
I have a dataframe in R containing 10 rows and 7 columns. There's a stored procedure that does a few logic checks in the background and then inserts the data into the table 'commodity_price'.
library(RMySQL)
#Connection Settings
mydb = dbConnect(MySQL(),
                 user = 'uid',
                 password = 'pwd',
                 dbname = 'database_name',
                 host = 'localhost')
#Listing the tables
dbListTables(mydb)
f= data.frame(
location= rep('Bhubaneshwar', 4),
sourceid= c(8,8,9,2),
product= c("Ingot", "Ingot", "Sow Ingot", "Alloy Ingot"),
Specification = c('ie10','ic20','se07','se08'),
Price=c(14668,14200,14280,20980),
currency=rep('INR',4),
uom=rep('INR/MT',4)
)
For multiple rows insert, there's a pre-created stored proc 'PROC_COMMODITY_PRICE_INSERT', which I need to call.
for (i in 1:nrow(f))
{
dbGetQuery(mydb,"CALL PROC_COMMODITY_PRICE_INSERT(
paste(f$location[i],',',
f$sourceid[i],',',f$product[i],',',f$Specification[i],',',
f$Price[i],',',f$currency[i],',', f$uom[i],',',#xyz,')',sep='')
);")
}
I am repeatedly getting this error:
Error in .local(conn, statement, ...) :
could not run statement: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '[i],',',
f$sourceid[i],',',f$product[i],',',f$Specification' at line 2
I tried using RODBC but it's not connecting at all. How can I insert the data from the R dataframe into the 'commodity_price' table by calling a stored proc? Thanks in advance!
That is probably because your paste() call sits inside the SQL string, so it is sent to MySQL literally instead of being evaluated by R. This might work:
for (i in 1:nrow(f))
{
dbGetQuery(mydb,paste("CALL PROC_COMMODITY_PRICE_INSERT(",f$location[i],',',
f$sourceid[i],',',f$product[i],',',f$Specification[i],',',
f$Price[i],',',f$currency[i],',', f$uom[i],',',"#xyz",sep='',");"))
}
or the one-liner:
dbGetQuery(mydb,paste0("CALL PROC_COMMODITY_PRICE_INSERT('",apply(f, 1, paste0, collapse = "', '"),"');"))
Or the for loop again, this time with quoting around the string values:
for (i in 1:nrow(f))
{
dbGetQuery(mydb,paste("CALL PROC_COMMODITY_PRICE_INSERT(","'",f$location[i],"'",',',"'",
f$sourceid[i],"'",',',"'",f$product[i],"'",',',"'",f$Specification[i],"'",',',"'",
f$Price[i],"'",',',"'",f$currency[i],"'",',',"'",f$uom[i],"'",',','#xyz',sep='',");"))
}
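A variant that may be less error-prone than hand-placed quotes (a sketch, untested) is to let DBI do the quoting with dbQuoteString(). It also writes the session variable as @xyz on the assumption that a MySQL user variable is intended, since # starts a comment in MySQL:

```r
library(DBI)

for (i in seq_len(nrow(f))) {
  # dbQuoteString() quotes and escapes each string value for this
  # connection; numeric columns are interpolated as-is.
  sql <- sprintf("CALL PROC_COMMODITY_PRICE_INSERT(%s, %s, %s, %s, %s, %s, %s, @xyz);",
                 dbQuoteString(mydb, as.character(f$location[i])),
                 f$sourceid[i],
                 dbQuoteString(mydb, as.character(f$product[i])),
                 dbQuoteString(mydb, as.character(f$Specification[i])),
                 f$Price[i],
                 dbQuoteString(mydb, as.character(f$currency[i])),
                 dbQuoteString(mydb, as.character(f$uom[i])))
  dbGetQuery(mydb, sql)
}
```

The as.character() calls guard against the data.frame columns being factors.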
I am trying to write a prepared statement with dbSendQuery. My issue is that the data frame of inputs is converted to numeric values, even though two of the three inputs are dates. This results in the following error message:
Warning: Error in postgresqlExecStatement: RS-DBI driver: (could not Retrieve the result : ERROR: invalid input syntax for type timestamp: "17624"
)
My code is as follows:
query = dbSendQuery(con,"
SELECT
***A LOT OF TABLES AND JOINS***
WHERE
users_terminals.user_user_id = $1 and
planning_stops.planned_arrival >= $2 and
planning_stops.planned_arrival <= $3"
,
data.frame(user$users_users_id,
datefrom,
dateto))
tmp = dbFetch(query)
dbClearResult(query)
The numeric value of datefrom is 17624, so this makes me think that $2 is replaced by as.numeric(datefrom) when I run the command. Also, user$users_users_id is a numeric value and I do not get an error for that one. Probably the whole data frame is converted to numeric.
I have created a workaround, but it is not an optimal situation and I would like to understand what happens here. The workaround I created is:
query = dbSendQuery(con,"
SELECT
***A LOT OF TABLES AND JOINS***
WHERE
users_terminals.user_user_id = $1 and
EXTRACT(day from planning_stops.planned_arrival - '1970-01-01') >= $2 and
EXTRACT(day from planning_stops.planned_arrival - '1970-01-01') <= $3"
,
data.frame(user$users_users_id,
datefrom,
dateto))
tmp = dbFetch(query)
dbClearResult(query)
Could anyone help me out here? The workaround works for now, but it does not seem to be optimal.
Thanks.
UPDATE
I have read something about sqlInterpolate, so I thought let's give it a try. However, I still receive an error message:
Error in postgresqlExecStatement(conn, statement, ...) :
RS-DBI driver: (could not Retrieve the result : ERROR: operator does not exist: timestamp without time zone >= integer
LINE 57: ... planning_stops.planned_arrival >= 2018-04...
^
HINT: No operator matches the given name and argument type(s). You might need to add explicit type casts.
)
My code is now:
query = sqlInterpolate(con,"
SELECT
***A LOT OF TABLES AND JOINS***
WHERE
users_terminals.user_user_id = ?id1 and
planning_stops.planned_arrival >= ?date1 and
planning_stops.planned_arrival <= ?date2"
,
id1 = user$users_users_id,
date1 = datefrom,
date2 = dateto)
tmp = dbGetQuery(con,
query)
Still not working, though. It seems sqlInterpolate converts the inputs to integers.
Just pass the dates as strings and convert them to DATE in the query:
query = sqlInterpolate(con,"SELECT ... WHERE
users_terminals.user_user_id = ?id1 AND
planning_stops.planned_arrival >= ?date1::DATE AND
planning_stops.planned_arrival <= ?date2::DATE"
,
id1 = user$users_users_id,
date1 = strftime(datefrom, "%Y-%m-%d"),
date2 = strftime(dateto, "%Y-%m-%d"))
tmp = dbGetQuery(con, query)
If you would like to pass a timestamp instead, just use the appropriate format in strftime() and convert to timestamp in the query.
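For the timestamp variant, the same pattern might look like this (a sketch over the same query skeleton, untested):

```r
library(DBI)

# Format the R dates as full timestamp strings and cast them in the
# query, mirroring the ::DATE approach above.
query <- sqlInterpolate(con, "
  SELECT ...
  WHERE users_terminals.user_user_id = ?id1 AND
        planning_stops.planned_arrival >= ?ts1::TIMESTAMP AND
        planning_stops.planned_arrival <= ?ts2::TIMESTAMP",
  id1 = user$users_users_id,
  ts1 = strftime(datefrom, "%Y-%m-%d %H:%M:%S"),
  ts2 = strftime(dateto, "%Y-%m-%d %H:%M:%S"))
tmp <- dbGetQuery(con, query)
```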
I've asked Vertica support about this as well, but I'm wondering if anyone here has hit the same issue.
I'm working with Vertica Analytic Database v6.1.3-0.
I'm using R version 3.0.0, which comes with the Vertica R language pack.
I'm trying to create a simple UDF that uses a parameter passed with the USING PARAMETERS keyword.
This is the R code:
testFun <- function(x,y) {
# extract the function parameters from y
parameter <- y[["parameter"]] # parameter to be passed
sum(x[,1])
}
testFunParams <- function()
{
params <- data.frame(datatype=rep(NA, 1), length=rep(NA,1),scale=rep(NA,1),name=rep(NA,1))
params[1,1] <- "varchar"
params[1,2] <- "40"
params[1,4] <- "parameter"
params
}
testFunFactory <- function()
{
list(
name=testFun
,udxtype=c("transform")
,intype=c("int")
,outtype=c("varchar(200)")
,outnames=c('test')
,parametertypecallback=testFunParams
,volatility=c("stable")
,strict=c("called_on_null_input")
)
}
In Vertica I load the library:
drop library r_test cascade;
create or replace library r_test as '.../testFun.r' language 'R';
create transform function testFun as name 'testFunFactory' library r_test;
create table test as select 1 as x union select 2 union select 3 union select 4 union select 5 union select 6 union select 7;
select testFun(x) over() from test;
> ERROR 3399: Failure in UDx RPC call InvokeGetParameterType(): Error calling getParameterType() in User Defined Object [testFun] at [/scratch_a/release/vbuild/vertica/UDxFence/vertica-udx-R.cpp:245], error code: 0, message: Error happened in getParameterType : not compatible with REALSXP
I've tried Vertica's example for a function with parameters and it worked, when I changed the parameter type to varchar it failed.
What can be done?
Thanks
I tested your configuration and it returned the following error:
[...] Error happened in getParameterType : not compatible with REALSXP [...]
And after some tweaking I know what happened: you saved "scale" as a character value instead of a numeric in your "testFunParams".
See if that helps you =)
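A corrected testFunParams might then look like the sketch below. I have not run it against Vertica; the idea is simply to keep the length and scale columns numeric so R hands getParameterType() REALSXP vectors rather than character ones:

```r
testFunParams <- function() {
  # Keep length and scale numeric so getParameterType() receives
  # numeric (REALSXP) values rather than character ones.
  data.frame(datatype = "varchar",
             length   = 40,        # numeric, not the string "40"
             scale    = NA_real_,  # numeric NA keeps the column numeric
             name     = "parameter",
             stringsAsFactors = FALSE)
}
```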