R dbWriteTable to Cloud Spanner requiring column names

I'm trying to insert data into a Cloud Spanner table using DBI's dbWriteTable in R, but it is asking me to supply column names. My understanding is that as long as the dataframe contains the same number of columns as the table, this should work. Here's my code and the error I'm facing (connection details omitted):
Code:
write.to.spanner <- function(table, df){
  dbWriteTable(con, table, df, overwrite=FALSE, append=TRUE, row.names=FALSE)
}
req_df <- data.frame(req_id=123, req_name="test req")
write.to.spanner("dbi_test", req_df)
DDL for spanner table:
CREATE TABLE dbi_test (
  req_id INT64,
  req_name STRING(20),
) PRIMARY KEY(req_id);
Error:
Warning: Error in .local: execute JDBC update query failed in dbSendUpdate ([Simba][SpannerJDBCDriver](100605) There was an error while executing the DML query : INVALID_ARGUMENT: com.simba.cloudspanner.shaded.com.google.api.gax.rpc.InvalidArgumentException: com.simba.cloudspanner.shaded.io.grpc.StatusRuntimeException: INVALID_ARGUMENT: INSERT must specify a column list [at 1:1]
INSERT INTO dbi_test VALUES(@var1,@var2)
I'm able to insert using dbSendUpdate but would prefer to use dbWriteTable and not write out insert statements.
This works fine:
write.to.spanner <- function(table, df){
  insert_qry <- paste0("INSERT INTO ", table, " (req_id, req_name) VALUES (", df[1,1], ",'", df[1,2], "');")
  dbSendUpdate(con, insert_qry)
}
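Since the Simba Spanner JDBC driver insists on an explicit column list, the workaround can be generalized by building the INSERT from the dataframe's own column names instead of hard-coding them. A minimal sketch, assuming con is the open connection from above and that naive quoting is acceptable (a production version should escape embedded quotes or use proper parameter binding):
write.to.spanner <- function(table, df){
  # Build the explicit column list the driver demands
  cols <- paste(names(df), collapse = ", ")
  for (i in seq_len(nrow(df))) {
    # Quote character/factor values; pass numeric values through as-is
    vals <- vapply(df[i, , drop = FALSE], function(x) {
      if (is.character(x) || is.factor(x)) paste0("'", x, "'") else as.character(x)
    }, character(1))
    qry <- paste0("INSERT INTO ", table, " (", cols, ") VALUES (",
                  paste(vals, collapse = ", "), ");")
    dbSendUpdate(con, qry)
  }
}
With this version, write.to.spanner("dbi_test", req_df) generates INSERT INTO dbi_test (req_id, req_name) VALUES (123, 'test req'); which satisfies the driver's column-list requirement.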

Related

How to create multi-column indices in DuckDB/SQLite?

I have a DuckDB database with columns of data which I would like to query using multiple columns. I'm working in R, but I'm not sure how to create a multi-column index (or even a single-column index). Can anyone suggest a reference, please? I've added SQLite as a tag because I gather the commands could be the same.
Edit:
Based on kukuk1de's recommendation I'm trying the following
require(DBI)
require(duckdb)
DBI::dbExecute(con,statement = "CREATE INDEX multi_idx ON (percent prevalence fresh_flow maskProp dropExhale)")
but I get the following error:
Error in .local(conn, statement, ...) :
duckdb_prepare_R: Failed to prepare query CREATE INDEX multi_idx ON (percent prevalence fresh_flow maskProp dropExhale)
Error: Parser Error: syntax error at or near "("
LINE 1: CREATE INDEX multi_idx ON (percent prevalence fresh_flow maskProp...
Try this:
library("DBI")
con = dbConnect(duckdb::duckdb(), dbdir=":memory:", read_only=FALSE)
dbExecute(con, "CREATE TABLE items(item VARCHAR, value DECIMAL(10,2), count INTEGER)")
dbExecute(con, "INSERT INTO items VALUES ('jeans', 20.0, 1), ('hammer', 42.2, 2)")
dbExecute(con, "CREATE INDEX itemcount_idx ON items (item, count);")
Running the last command again will tell you the index already exists.
dbExecute(con, "CREATE INDEX itemcount_idx ON items (item, count);")
Error in duckdb_execute(res) : duckdb_execute_R: Failed to run query
Error: Catalog Error: Index with name "itemcount_idx" already exists!
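Applied to the columns in the question, the statement needs both the table name and commas between the columns; the missing table name is what the parser choked on. Assuming the data lives in a table called mytable (substitute your actual table name):
dbExecute(con, "CREATE INDEX multi_idx ON mytable (percent, prevalence, fresh_flow, maskProp, dropExhale)")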

Retain text values with special characters (such as hyphens and spaces) when updating a PostgreSQL database through R

I want to update a table in a PostgreSQL database from a local newData dataframe through a loop, wherever the id matches in both tables. However, the text values do not arrive in the database exactly as they appear in newData. Numbers update correctly, but there are two issues with text:
1) I have a column for house_nbr that can be '120-12', but it was somehow calculated and updated as '108' (the unquoted value is evaluated as the arithmetic expression 120 - 12) when it should be the text '120-12'.
2) I have a column for street_name that can be 'Main Street', but I received an error that I couldn't resolve:
(Error in { :
task 1 failed - "Failed to prepare query: ERROR: syntax error at or near "Street")
The database table's datatype is char. Something seems to go wrong with special characters in the text, such as hyphens and spaces. Please advise how to retain the character text when updating a PostgreSQL database. Below is the code I am using. Thanks!
Update <- function(i) {
  con <- dbConnect(RPostgres::Postgres(),
                   user="xxx",
                   password="xxx",
                   host="xxx",
                   dbname="xxx",
                   port=5432)
  text <- paste("UPDATE dbTable SET house_nbr=", newData$house_nbr[i], ",street_name=", newData$street_name[i], "where id=", newData$id[i])
  dbExecute(con, text)
  dbDisconnect(con)
}
foreach(i = 1:length(newData$id), .inorder=FALSE, .packages="RPostgreSQL") %dopar% {
  Update(i)
}

Unable to write dataframe in R as UPDATE statement to PostGIS/PostgreSQL

I have the following dataframe:
library(rpostgis)
library(RPostgreSQL)
library(glue)
df <- data.frame(elevation = c(450, 900),
                 id = c(1, 2))
Now I am trying to upload this to a table in my PostgreSQL/PostGIS database. My connection (dbConnect) works properly for SELECT statements. However, I have tried two ways of updating a database table with this dataframe, and both failed.
First:
pgInsert(postgis, name = "fields", data.obj = df, overwrite = FALSE, partial.match = TRUE,
         row.names = FALSE, upsert.using = TRUE, df.geom = NULL)
2 out of 2 columns of the data frame match database table columns and will be formatted for database insert.
Error: x must be character or SQL
I do not know what this error is trying to tell me, as the values both in the dataframe and in the table are integers.
Second:
sql <- glue_sql("UPDATE fields SET elevation ={df$elevation} WHERE
  id = {df$id};", .con = postgis)
> sql
<SQL> UPDATE fields SET elevation =450 WHERE
id = 1;
<SQL> UPDATE fields SET elevation =900 WHERE
id = 2;
dbSendStatement(postgis,sql)
<PostgreSQLResult>
In both cases no data is transferred to the database, and I do not see any error logs within the database. Any hint on how to solve this problem?
It was a mistake on my side: I got glue_sql wrong. To correctly update the database with every query created by glue_sql, you have to loop through the resulting object, as in the following example:
for (i in seq_along(sql)) {
  dbSendStatement(postgis, sql[i])
}
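Equivalently, dbExecute runs each statement and frees its result set immediately, so no open results are left behind — a small sketch of the same loop:
invisible(lapply(sql, function(s) dbExecute(postgis, s)))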

Appending new data to a SQLite db in R

I have created a table in a SQLite3 database from R using the following code:
con <- DBI::dbConnect(drv = RSQLite::SQLite(),
                      dbname = "data/compfleet.db")
s <- sprintf("create table %s(%s, primary key(%s))", "PositionList",
             paste(names(FinalTable), collapse = ", "),
             names(FinalTable)[2])
dbGetQuery(con, s)
dbDisconnect(con)
The second column of the table is UID, which is the primary key. I then run a script to update the data in the table. The updated data can contain UIDs that already exist in the table. I don't want those existing records to be updated; I only want the new records (with new UID values) to be appended to the database. The code I am using is:
DBI::dbWriteTable(con, "PositionList", FinalTable, append=TRUE, row.names=FALSE, overwrite=FALSE)
Which returns an error:
Error in result_bind(res@ptr, params) :
UNIQUE constraint failed: PositionList.UID
How can I append only the rows with new UID values, without changing the existing records, even when duplicate UIDs appear when I run my update script?
You can query the existing UIDs (as a one-column data frame) and remove corresponding rows from the table you want to insert.
uid_df <- dbGetQuery(con, "SELECT UID FROM PositionList")
dbWriteTable(con, "PositionList", FinalTable[!(FinalTable$UID %in% uid_df[[1]]), ], ...)
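Put together, a minimal sketch of this approach (assuming FinalTable has a UID column matching the table's primary key):
con <- DBI::dbConnect(RSQLite::SQLite(), dbname = "data/compfleet.db")
# Fetch the UIDs already stored, then append only the unseen rows
existing <- DBI::dbGetQuery(con, "SELECT UID FROM PositionList")$UID
new_rows <- FinalTable[!(FinalTable$UID %in% existing), ]
if (nrow(new_rows) > 0) {
  DBI::dbWriteTable(con, "PositionList", new_rows,
                    append = TRUE, row.names = FALSE)
}
DBI::dbDisconnect(con)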
When you are about to insert data, first fetch the existing record from the database by UID. If a record with that UID already exists, do nothing; otherwise insert the new row with its new UID. The error occurs because a row with that primary key (UID) already exists, and duplicate primary keys are not allowed.

RJDBC dbGetQuery() error when creating an external table in Hive

I am running into this problem: the DB call does create the table, but it then fails to retrieve a JDBC result set.
Error in .verify.JDBC.result(r, "Unable to retrieve JDBC result set for
Calls: dbGetQuery ... dbSendQuery -> dbSendQuery -> .local -> .verify.JDBC.result
Execution halted
options(java.parameters = "-Xmx32g")
library(rJava)
library(RJDBC)
drv <- JDBC("org.apache.hive.jdbc.HiveDriver", "/tmp/r_jars/hive-jdbc.jar")
for (jar in list.files('/tmp/r_jars/')) {
  .jaddClassPath(paste("/tmp/r_jars/", jar, sep = ""))
}
conn <- dbConnect(drv, "jdbc:hive2://10.40.51.75:10000/default", "myusername", "mypassword")
createSCOREDDL_query <- "CREATE EXTERNAL TABLE hiveschema.mytable (
  myvariables
)
ROW FORMAT SERDE
  'com.bizo.hive.serde.csv.CSVSerde'
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  's3://mybucket/myschema/'"
dbGetQuery(conn, createSCOREDDL_query)
dbDisconnect(conn)
Instead of dbGetQuery, can you try using dbSendUpdate? I was having similar issues, and making this switch solved the problem.
I tried the following code, as suggested by @KaIC, and it worked:
dbSendUpdate(conn, "CREATE EXTERNAL TABLE hiveschema.mytable ( col_A string, col_B string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE")
For multiple tables, you can put the statements in a list or character vector within a function and use an apply() construct to run it over the entire set, as in the sketch below.
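For example, a sketch of that pattern, where ddl_statements is a hypothetical character vector holding your CREATE TABLE statements and conn is the connection from above:
# Hypothetical DDL statements; substitute your real table definitions
ddl_statements <- c(
  "CREATE EXTERNAL TABLE hiveschema.table_a (col_A string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE",
  "CREATE EXTERNAL TABLE hiveschema.table_b (col_B string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE"
)
# Run each DDL statement through dbSendUpdate, which expects no result set
invisible(sapply(ddl_statements, function(q) dbSendUpdate(conn, q)))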
