Updating a table with a dataframe - R

I'm using data in a dataframe to update a table in an SQLite database that looks like this:
Part | Price
-----|------
a    | 5
b    | 9
I am getting a syntax error for this:
for (row in 1:nrow(newdata)) {
  dbGetQuery(conn = db,
             "UPDATE Parts SET Price = ", newdata$Price[row],
             " WHERE Part = '", newdata$Part[row], "';")
}
The exact error I'm getting:
Error in rsqlite_send_query(conn@ptr, statement) : near " ": syntax error
Why is this please?

The query needs to be built into a single string: dbGetQuery() sends only its statement argument to the database and does not concatenate the extra arguments onto it, which is why SQLite sees the query cut off after "Price = " and reports a syntax error.
for (row in seq_len(nrow(newdata))) {
  dbGetQuery(conn = db,
             sprintf("UPDATE Parts SET Price = %i WHERE Part = '%s';",
                     newdata$Price[row], newdata$Part[row]))
}
It's also possible to accomplish this with paste or paste0, but sprintf can be easier to read.
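A safer alternative worth noting (a sketch beyond the original answer, assuming RSQLite as the backend): let DBI do the quoting via a parameterized statement, which also protects against stray quotes in the data:

library(DBI)

# One prepared UPDATE, executed once per row: the i-th element of each
# vector in `params` is bound to the ?s, in order.
dbExecute(db, "UPDATE Parts SET Price = ? WHERE Part = ?",
          params = list(newdata$Price, newdata$Part))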

Related

R, ClickHouse: Expected: FixedString(34). Got: UInt64: While processing

I am trying to query a subset of data from a ClickHouse database from R. Here is an example:
library(data.table)
library(RClickhouse)
library(DBI)

subset <- paste(traffic[, unique(IDs)][1:30], collapse = ',')
conClickHouse <- DBI::dbConnect('here is the connection')
DataX <- dbGetQuery(conClickHouse,
                    paste0("select * from database and IDs in (", subset, ")"))
As a result I get the error:
DB::Exception: Type mismatch in IN or VALUES section. Expected: FixedString(34).
Got: UInt64: While processing (IDs IN ....
Any help is appreciated
Thanks to the comment of @DennyCrane, this query subsets properly:
"select * from database where toFixedString(IDs, 34) in
(toFixedString(ID1, 34), toFixedString(ID2, 34))"
https://clickhouse.tech/docs/en/sql-reference/functions/#strong-typing
Strong Typing
In contrast to standard SQL, ClickHouse has strong typing. In other words, it doesn’t make implicit conversions between types. Each function works for a specific set of types. This means that sometimes you need to use type conversion functions.
https://clickhouse.tech/docs/en/sql-reference/functions/type-conversion-functions/#tofixedstrings-n
select * from (select 'x' B) where B in (select toFixedString('x', 1))
DB::Exception: Types of column 1 in section IN don't match: String on the left, FixedString(1) on the right.
Use a cast, either toString or toFixedString:
select * from (select 'x' B) where toFixedString(B, 1) in (select toFixedString('x', 1))
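Applying the same fix back to the R code from the question, a hedged sketch (it reuses traffic, IDs, and conClickHouse as defined above, and assumes the IDs are the 34-character strings the column type implies):

# Wrap every ID in toFixedString(...) so both sides of the IN clause
# have the same ClickHouse type.
ids <- traffic[, unique(IDs)][1:30]
in_list <- paste0("toFixedString('", ids, "', 34)", collapse = ", ")
DataX <- dbGetQuery(conClickHouse,
                    paste0("select * from database where toFixedString(IDs, 34) in (",
                           in_list, ")"))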

Error: Query schema does not match table schema. QuerySchema=('long')

I am running a .set-or-append command to ingest from one table into another. I know that the query from the source table is fine, and the target table, if it exists, should have the same schema; if it doesn't exist, the command should just create it. Initially I was not having a problem with this, but a few of my .set-or-append queries have been getting this error:
Invalid query for distributed set / append / replace operation. Error: Query schema does not match table schema. QuerySchema=('long'), TableSchema=('datetime,string,string,string,string,dynamic,dynamic,dynamic'). Query:...'
I know for a fact that the schemas match. I ran the same command again and again, and on about the third try the call succeeded, which makes no sense to me. So what is this error, and why did the same command work after failing, with no change to the query whatsoever?
The query/command I am running is essentially the following:
.set-or-append async TargetTable <|
SourceTable
| where __id in ("...", "....", ........) // approximately 250 distinct ids in the "in" operator
It appears that the query you're using is extending the data with an extra column:
extend hashBucket = hash(row_number(), ...) | where hashBucket == ...
and thus you're getting the schema mismatch.
Perhaps your intention was to filter based on the hashBucket, in which case you can filter without extending:
| where hash(row_number(), ...) == ...
@Yoni, could you explain what you mean by bag_unpack giving an inconsistent mismatch? I aligned my bag_unpack ... project-reorder to match the target table I'm unpacking to, but it just swaps around a few variable types in the error message:
Query schema does not match table schema.
QuerySchema=(
'datetime,long,datetime,string,string,datetime,string,
long,real,string,bool,guid,guid,string,real'),
TableSchema=(
'datetime,long,datetime,string,string,datetime,string,
long,real,guid,guid,string,bool,string,real')
I'm really confused about what the table schema and query schema even are at this point.
For reference, my query is like this:
.set-or-append async apiV2FormationSearchTransform <|
//set notruncation;
apiV2FormationSearchLatest
| where hash(toguid(fullRecord["id"]), 1) == 0
| project fullRecord
| evaluate bag_unpack(fullRecord)
| extend dateCatalogued = todatetime(column_ifexists("dateCatalogued", ""))
, simpleId = tolong(column_ifexists("simpleId", ""))
, dateLastModified = todatetime(column_ifexists("dateLastModified", ""))
, reportedFormationName = tostring(column_ifexists("reportedFormationName", ""))
, comments = tostring(column_ifexists("comments", ""))
, dateCreated = todatetime(column_ifexists("dateCreated", ""))
, formationName = tostring(column_ifexists("formationName", ""))
, internalId = tolong(column_ifexists("internalId", ""))
, topDepth = toreal(column_ifexists("topDepth", ""))
, wellId = column_ifexists("wellId", toguid(""))
, id = column_ifexists("id", toguid(""))
, methodObtained = tostring(column_ifexists("methodObtained", ""))
, isTarget = tobool(column_ifexists("isTarget", ""))
, completionId = tostring(column_ifexists("completionId", ""))
, baseDepth = toreal(column_ifexists("baseDepth", ""))
| project-reorder dateCatalogued
, simpleId
, dateLastModified
, reportedFormationName
, comments
, dateCreated
, formationName
, internalId
, topDepth
, wellId
, id
, methodObtained
, isTarget
, completionId
, baseDepth
and this is the getschema output of my target table:
ColumnName             ColumnOrdinal  DataType         ColumnType
dateCatalogued         0              System.DateTime  datetime
simpleId               1              System.Int64     long
dateLastModified       2              System.DateTime  datetime
reportedFormationName  3              System.String    string
comments               4              System.String    string
dateCreated            5              System.DateTime  datetime
formationName          6              System.String    string
internalId             7              System.Int64     long
topDepth               8              System.Double    real
wellId                 9              System.Guid      guid
id                     10             System.Guid      guid
methodObtained         11             System.String    string
isTarget               12             System.SByte     bool
completionId           13             System.String    string
baseDepth              14             System.Double    real
I had a similar problem. In my case, there was another table in the database with the same name, although in a different folder (I was using the with (folder = 'foo/bar') <| option). I changed the table name and the error disappeared.

Inserting an R dataframe into a SQL table using a stored proc

I have a dataframe in R containing 10 rows and 7 columns. There's a stored procedure that does a few logic checks in the background and then inserts the data into the table 'commodity_price'.
library(RMySQL)

# Connection settings
mydb = dbConnect(MySQL(),
                 user = 'uid',
                 password = 'pwd',
                 dbname = 'database_name',
                 host = 'localhost')

# Listing the tables
dbListTables(mydb)

f = data.frame(
  location = rep('Bhubaneshwar', 4),
  sourceid = c(8, 8, 9, 2),
  product = c("Ingot", "Ingot", "Sow Ingot", "Alloy Ingot"),
  Specification = c('ie10', 'ic20', 'se07', 'se08'),
  Price = c(14668, 14200, 14280, 20980),
  currency = rep('INR', 4),
  uom = rep('INR/MT', 4)
)
For inserting multiple rows, there's a pre-created stored proc 'PROC_COMMODITY_PRICE_INSERT' which I need to call.
for (i in 1:nrow(f))
{
  dbGetQuery(mydb, "CALL PROC_COMMODITY_PRICE_INSERT(
    paste(f$location[i],',',
    f$sourceid[i],',',f$product[i],',',f$Specification[i],',',
    f$Price[i],',',f$currency[i],',', f$uom[i],',',@xyz,')',sep='')
  );")
}
I am repeatedly getting this error:
Error in .local(conn, statement, ...) :
could not run statement: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '[i],',',
f$sourceid[i],',',f$product[i],',',f$Specification' at line 2
I tried using RODBC but it's not connecting at all. How can I insert the data from the R dataframe into the 'commodity_price' table by calling a stored proc? Thanks in advance!
That is probably due to your use of quotes: the paste() call needs to be outside the query string, not inside it. This might work:
for (i in 1:nrow(f))
{
  dbGetQuery(mydb, paste("CALL PROC_COMMODITY_PRICE_INSERT(", f$location[i], ',',
                         f$sourceid[i], ',', f$product[i], ',', f$Specification[i], ',',
                         f$Price[i], ',', f$currency[i], ',', f$uom[i], ',', "@xyz",
                         ");", sep = ''))
}
or the one-liner:
dbGetQuery(mydb,paste0("CALL PROC_COMMODITY_PRICE_INSERT('",apply(f, 1, paste0, collapse = "', '"),"');"))
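Note that apply(f, 1, paste0, collapse = "', '") returns one string per row of f, so this builds a vector of CALL statements; depending on the DBI backend, that vector may need to be executed element by element (e.g. inside sapply) rather than in a single dbGetQuery call.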
Trying the for loop:
for (i in 1:nrow(f))
{
  dbGetQuery(mydb, paste("CALL PROC_COMMODITY_PRICE_INSERT(",
                         "'", f$location[i], "'", ',', "'", f$sourceid[i], "'", ',',
                         "'", f$product[i], "'", ',', "'", f$Specification[i], "'", ',',
                         "'", f$Price[i], "'", ',', "'", f$currency[i], "'", ',',
                         "'", f$uom[i], "'", ',', '@xyz', ");", sep = ''))
}
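For readability, the same call can also be built with sprintf; a sketch, not from the original answers (it assumes the procedure's last argument is the session variable @xyz, as above, and that sourceid and Price may be passed unquoted):

for (i in 1:nrow(f)) {
  # %s fills each value in order; strings are quoted in the template,
  # numeric values are left bare.
  stmt <- sprintf("CALL PROC_COMMODITY_PRICE_INSERT('%s', %s, '%s', '%s', %s, '%s', '%s', @xyz);",
                  f$location[i], f$sourceid[i], f$product[i],
                  f$Specification[i], f$Price[i], f$currency[i], f$uom[i])
  dbGetQuery(mydb, stmt)
}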

How to read fixed width files using Spark in R

I need to read a 10GB fixed width file to a dataframe. How can I do it using Spark in R?
Suppose my text data is the following:
text <- c("0001BRAjonh ",
"0002USAmarina ",
"0003GBPcharles")
I want the first 4 characters to be associated with the column "ID" of a data frame, characters 5-7 with a column "Country", and characters 8-14 with a column "Name".
I would use the function read.fwf if the dataset were small, but that is not the case.
I can read the file as a text file using the sparklyr::spark_read_text function, but I don't know how to map the values of the file onto the columns of a data frame properly.
EDIT: Forgot to say substring starts at 1 and array starts at 0, because reasons.
Going through and adding the code I talked about in the comments above.
The process is dynamic and is based on a Hive table called Input_Table. The table has 5 columns: Table_Name, Column_Name, Column_Ordinal_Position, Column_Start, and Column_Length. It is external, so any user can change, drop, or remove any file in the folder location. I quickly built this from scratch so as not to take actual code; does everything make sense?
// Load the input DataFrame and the Hive table. For the Hive table we make sure to take
// only the correct columns, in the correct order.
val inputDF = spark.read.format(recordFormat).option("header", "false")
  .load(folderLocation + "/" + tableName + "." + tableFormat)
  .rdd.toDF("Odd_Long_Name")
val inputSchemaDF = spark.sql("select * from Input_Table where Table_Name = '" + tableName + "'")
  .sort($"Column_Ordinal_Position")

// Build arrays from the columns; rdd.map(...).collect turns a DataFrame column
// into an array of strings that can be iterated over.
val columnNameArray = inputSchemaDF.selectExpr("Column_Name").rdd.map(x => x.mkString).collect
val columnStartArray = inputSchemaDF.selectExpr("Column_Start_Position").rdd.map(x => x.mkString).collect
val columnLengthArray = inputSchemaDF.selectExpr("Column_Length").rdd.map(x => x.mkString).collect

// Declare the iterator as well as the other variables that are meant to be overwritten
var columnAllocationIterator = 1
var localCommand = ""
var commandArray = Array("")

// Loop over as many columns as the input table has
while (columnAllocationIterator <= columnNameArray.length) {
  // Overwrite the command string with the next substring expression
  // ("Odd_Long_Name" felt too accurate not to place into the code)
  localCommand = "substring(Odd_Long_Name, " + columnStartArray(columnAllocationIterator - 1) +
    ", " + columnLengthArray(columnAllocationIterator - 1) + ") as " +
    columnNameArray(columnAllocationIterator - 1)

  // The first pass overwrites the command array, later passes append to it
  if (columnAllocationIterator == 1) {
    commandArray = Array(localCommand)
  } else {
    commandArray = commandArray ++ Array(localCommand)
  }

  // I really like iterating my iterators like this
  columnAllocationIterator = columnAllocationIterator + 1
}

// Run all elements of the string array independently against the table
val finalDF = inputDF.selectExpr(commandArray: _*)
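Since the question asks for R, here is a minimal sparklyr sketch of the same slicing (an assumption-laden sketch, not the answer's code: it assumes a local Spark connection and a file path of fwf.txt; spark_read_text exposes each line of the file in a column named line):

library(sparklyr)
library(dplyr)

sc <- spark_connect(master = "local")

# Each line of the fixed-width file arrives as a single string in `line`.
raw <- spark_read_text(sc, name = "fwf", path = "fwf.txt")

# Slice the fields out by position; substr(x, start, stop) is translated
# to Spark SQL's substring for us, and positions start at 1.
parsed <- raw %>%
  transmute(ID      = substr(line, 1, 4),
            Country = substr(line, 5, 7),
            Name    = substr(line, 8, 14))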

Parameters and NULL

I'm having trouble passing NULL as a parameter in an INSERT query using RPostgres and RPostgreSQL:
In PostgreSQL:
create table foo (ival int, tval text, bval bytea);
In R:
This works:
res <- dbSendQuery(con, "INSERT INTO foo VALUES($1, $2, $3)",
                   params = list(ival = 1,
                                 tval = 'not quite null',
                                 bval = charToRaw('asdf')))
But this throws an error:
res <- dbSendQuery(con, "INSERT INTO foo VALUES($1, $2, $3)",
                   params = list(ival = NULL,
                                 tval = 'not quite null',
                                 bval = charToRaw('asdf')))
Using RPostgres, the error message is:
Error: expecting a string
Under RPostgreSQL, the error is:
Error in postgresqlExecStatement(conn, statement, ...) :
RS-DBI driver: (could not Retrieve the result : ERROR: invalid input
syntax for integer: "NULL"
)
Substituting NA would be fine with me, but it isn't a work-around: a literal 'NA' gets written to the database.
Using e.g. integer(0) gives the same "expecting a string" message.
You can use NULLIF directly in your insert statement:
res <- dbSendQuery(con, "INSERT INTO foo VALUES(NULLIF($1, 'NULL')::integer, $2, $3)",
                   params = list(ival = NULL,
                                 tval = 'not quite null',
                                 bval = charToRaw('asdf')))
It works with NA as well.
One option here, to work around not knowing how to express a NULL value in R that the PostgreSQL package will successfully translate, is to simply not specify the column whose value you want to be NULL in the database.
So in your example you could use this:
res <- dbSendQuery(con, "INSERT INTO foo (tval, bval) VALUES($1, $2)",
                   params = list(tval = 'not quite null',
                                 bval = charToRaw('asdf')))
when you want ival to have a NULL value. This of course assumes that ival in your table is nullable, which may not be the case.
Thanks all for the help. Tim's answer is a good one, and I used it to catch the integer values. I went a different route for the rest of it, writing a function in PostgreSQL to handle most of this. It looks roughly like:
CREATE OR REPLACE FUNCTION add_stuff(ii integer, tt text, bb bytea)
RETURNS integer
AS
$$
DECLARE
bb_comp bytea;
rows integer;
BEGIN
bb_comp = convert_to('NA', 'UTF8'); -- my database is in UTF8.
-- front-end catches ii is NA; RPostgres blows up
-- trying to convert 'NA' to integer.
tt = nullif(tt, 'NA');
bb = nullif(bb, bb_comp);
INSERT INTO foo VALUES (ii, tt, bb);
GET DIAGNOSTICS rows = ROW_COUNT;
RETURN rows;
END;
$$
LANGUAGE plpgsql VOLATILE;
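Called from R, the function slots in where the bare INSERT was; a rough sketch, assuming the same connection con and the 'NA'-as-sentinel convention from the function above:

# 'NA' sentinels are turned back into NULLs by add_stuff()'s nullif() calls.
res <- dbGetQuery(con, "SELECT add_stuff($1, $2, $3) AS rows_inserted",
                  params = list(ival = 1,
                                tval = 'NA',
                                bval = charToRaw('NA')))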
Now to have a look at the RPostgres source and see if there's an easy enough way to make it handle NULL / NA a bit more gracefully. Hoping that it's missing because nobody thought of it, not because it's super-tricky. :)
This will give the "wrong" answer if someone is trying to put a literal 'NA' into the database and mean something other than NULL / NA (e.g. NA = "North America"); given our use case, that seems very unlikely. We'll see in six months' time.
