Appending new data to sqlite db in R

I have created a table in a sqlite3 database from R using the following code:
con <- DBI::dbConnect(drv = RSQLite::SQLite(),
                      dbname = "data/compfleet.db")
s <- sprintf("create table %s(%s, primary key(%s))", "PositionList",
             paste(names(FinalTable), collapse = ", "),
             names(FinalTable)[2])
dbGetQuery(con, s)
dbDisconnect(con)
The second column of the table is UID, which is the primary key. I then run a script to update the data in the table. The updated data can contain UIDs that already exist in the table. I don't want these existing records to be updated; I only want the new records (with new UID values) to be appended to the database. The code I am using is:
DBI::dbWriteTable(con, "PositionList", FinalTable, append = TRUE, row.names = FALSE, overwrite = FALSE)
This returns an error:
Error in result_bind(res@ptr, params) :
UNIQUE constraint failed: PositionList.UID
How can I append only the records with new UID values, without changing the existing records, even if duplicate UIDs appear when I run my update script?

You can query the existing UIDs (as a one-column data frame) and remove corresponding rows from the table you want to insert.
uid_df <- dbGetQuery(con, "SELECT UID FROM PositionList")
dbWriteTable(con, "PositionList", FinalTable[!(FinalTable$UID %in% uid_df[[1]]), ], ...)
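For example, the full filtered append might look like this (a sketch: the remaining arguments simply mirror the dbWriteTable call from the question and are assumptions, not a tested drop-in):
uid_df <- dbGetQuery(con, "SELECT UID FROM PositionList")
# keep only the rows whose UID is not already in PositionList, then append
new_rows <- FinalTable[!(FinalTable$UID %in% uid_df$UID), ]
dbWriteTable(con, "PositionList", new_rows, append = TRUE, row.names = FALSE)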

Before inserting, look the record up in the database by its UID. If a row with that UID already exists, do nothing; otherwise insert the new row with its new UID. The error appears because a duplicate primary key (UID) is not allowed, so the insert fails.
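A row-by-row sketch of that check-then-insert logic (assuming the con and FinalTable objects from the question; slower than filtering in bulk, but it follows the description above exactly):
for (i in seq_len(nrow(FinalTable))) {
  # look up this UID and insert the row only if it is not present yet
  hit <- dbGetQuery(con, "SELECT 1 FROM PositionList WHERE UID = ?",
                    params = list(FinalTable$UID[i]))
  if (nrow(hit) == 0) {
    dbWriteTable(con, "PositionList", FinalTable[i, , drop = FALSE],
                 append = TRUE, row.names = FALSE)
  }
}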

Related

Unable to write dataframe in R as UPDATE statement to PostGIS/PostgreSQL

I have the following dataframe:
library(rpostgis)
library(RPostgreSQL)
library(glue)
df <- data.frame(elevation = c(450, 900),
                 id = c(1, 2))
Now I try to upload this to a table in my PostgreSQL/PostGIS database. My connection (dbConnect) works properly for SELECT statements. However, I tried two ways of updating a database table with this dataframe and both failed.
First:
pgInsert(postgis, name = "fields", data.obj = df, overwrite = FALSE,
         partial.match = TRUE, row.names = FALSE, upsert.using = TRUE, df.geom = NULL)
2 out of 2 columns of the data frame match database table columns and will be formatted for database insert.
Error: x must be character or SQL
I do not know what the error is trying to tell me as both the values in the dataframe and table are set to integer.
Second:
sql <- glue_sql("UPDATE fields SET elevation ={df$elevation} WHERE
id = {df$id};", .con = postgis)
> sql
<SQL> UPDATE fields SET elevation =450 WHERE
id = 1;
<SQL> UPDATE fields SET elevation =900 WHERE
id = 2;
dbSendStatement(postgis,sql)
<PostgreSQLResult>
In both cases no data is transferred to the database and I do not see any Error logs within the database.
Any hint on how to solve this problem?
It was a mistake on my side; I got glue_sql wrong. To correctly update the database with every query created by glue_sql, you have to loop through the resulting object, as in the following example:
for (i in seq_along(sql)) {
  dbSendStatement(postgis, sql[i])
}
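A slightly tidier variant of the same loop (a sketch, assuming postgis is a valid DBI connection): DBI::dbExecute() runs each statement and releases its result immediately, so no dbClearResult() call is needed.
for (stmt in sql) {
  # send one UPDATE statement per element of the glue_sql vector
  DBI::dbExecute(postgis, stmt)
}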

Unique identifier not recognized by SQL query

I am making a table for users to fill out in Shiny using SQLite. At the end of each session I want to delete all entries containing the unique sessionID:
library(RSQLite)
library(pool)
library(DBI)
#Generates unique token. For example "ce20ca2792c26a702653ce54896fc10a"
sessionID <- session$token
pool <- dbPool(RSQLite::SQLite(), dbname = "db.sqlite")
df <- data.frame(sessionID = character(),
                 name = character(),
                 group = character(),
                 stringsAsFactors = FALSE)
dbWriteTable(pool, "user_data", df, overwrite=FALSE, append=TRUE)
-------------#Code to fill out the table-----------------
At the end of the session I delete the session specific entries using:
dbExecute(pool, sprintf('DELETE FROM "user_data" WHERE "sessionID" == (%s)', sessionID))
I get the following error:
Warning: Error in result_create: no such column: ce20ca2792c26a702653ce54896fc10a
If I replace the session ID with a random generated number for example "4078540723057" the entries are deleted without any problem. Why is the session$token not recognized?
As the sessionID column is text in your SQLite database, SQLite expects the literal value to be surrounded by single quotes. Normally you would use a prepared statement for this, but you may try:
dbExecute(pool, sprintf("DELETE FROM user_data WHERE sessionID = '%s'", sessionID))
Skipping the prepared statement here may be justified, as your script is not open or accessible to the outside.
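For reference, the parameterized form mentioned above might look like this (a sketch using DBI's params argument with RSQLite's ? placeholder, and the pool and sessionID objects from the question):
# the session ID is bound as a parameter, so no manual quoting is needed
dbExecute(pool, 'DELETE FROM "user_data" WHERE "sessionID" = ?',
          params = list(sessionID))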

Use R's mongolite to correctly (insert? update?) add data to an existing collection

I have the following function written in R that (I think) is doing a poor job of updating my mongo database's collections.
library(mongolite)
con <- mongolite::mongo(collection = "mongo_collection_1", db = 'mydb', url = 'myurl')
myRdataframe1 <- con$find(query = '{}', fields = '{}')
rm(con)
con <- mongolite::mongo(collection = "mongo_collection_2", db = 'mydb', url = 'myurl')
myRdataframe2 <- con$find(query = '{}', fields = '{}')
rm(con)
... code to update my dataframes (rbind additional rows onto each of them) ...
# write dataframes to database
write.dfs.to.mongodb.collections <- function() {
  collections <- c("mongo_collection_1", "mongo_collection_2")
  my.dataframes <- c("myRdataframe1", "myRdataframe2")
  # loop over dataframes, writing each one to its collection
  for (i in 1:length(collections)) {
    # connect and add data to this collection
    con <- mongo(collection = collections[i], db = 'mydb', url = 'myurl')
    con$remove('{}')
    con$insert(get(my.dataframes[i]))
    con$count()
    rm(con)
  }
}
write.dfs.to.mongodb.collections()
My dataframes myRdataframe1 and myRdataframe2 are very large dataframes, currently ~100K rows and ~50 columns. Each time my script runs, it:
uses con$find('{}') to pull the mongodb collection into R, saved as a dataframe myRdataframe1
scrapes new data from a data provider that gets appended as new rows to myRdataframe1
uses con$remove() and con$insert to fully remove the data in the mongodb collection, and then re-insert the entire myRdataframe1
This last bullet point is iffy, because I run this R script daily in a cronjob and I don't like that each time I am entirely wiping the mongo db collection and re-inserting the R dataframe to the collection.
If I remove the con$remove() line, I receive an error that states I have duplicate _id keys. It appears I cannot simply append using con$insert().
Any thoughts on this are greatly appreciated!
When you attempt to insert documents into MongoDB whose primary key already exists in the database, you will get the duplicate key exception. To work around that, you can unset the _id column before the con$insert (and then insert df instead of get(my.dataframes[i])), for example:
df <- get(my.dataframes[i])  # my.dataframes holds object names, so fetch the data frame first
df[["_id"]] <- NULL          # drop "_id" via [[ ]] (or backticks); $_id is not valid R syntax
This way, the newly inserted documents will automatically get new _id values assigned.
You can use an upsert, which matches a document by the query condition; if one is found it is updated, otherwise a new document is inserted.
First you need to separate the _id values from each data frame:
updateData <- get(my.dataframes[i])   # fetch the data frame by its name
ids <- updateData[["_id"]]            # keep the _id values for the query
updateData[["_id"]] <- NULL           # and drop them from the update body
Then use an upsert, building the JSON per row (there may be an easier way to build these strings in R, for example with jsonlite):
for (j in seq_along(ids)) {
  con$update(paste0('{"_id":"', ids[j], '"}'),
             paste0('{"$set":', jsonlite::toJSON(as.list(updateData[j, ]), auto_unbox = TRUE), '}'),
             upsert = TRUE)
}

Deleting row in table in sqlite DB from R

I am building a shiny application which will allow CRUD operations by a user on a table that exists in an sqlite3 database. I am using the input$responsesTable_rows_selected value provided by DT to get the indices of the rows selected by the user. I am then trying to delete the rows (using an action button deleteRows) from the database which have a matching timestamp (the epoch time stored as the primary key). The following code runs without any error but does not delete the selected rows.
observeEvent(input$deleteRows, {
  if (!is.null(input$responsesTable_rows_selected)) {
    s <- input$responsesTable_rows_selected
    conn <- poolCheckout(pool)
    # handle every selected row
    lapply(seq_along(s), function(i) {
      timestamp <- rvsTL$data[s[i], 8]
      query <- glue::glue_sql("DELETE FROM TonnageListChartering
        WHERE TonnageListChartering.timestamp = {timestamp}
        ", .con = conn)
      dbExecute(conn, sqlInterpolate(ANSI(), query))
    })
    poolReturn(conn)
    # Show a modal when the button is pressed
    shinyalert("Success!", "The selected rows have been deleted. Refresh
the table by pressing F5", type = "success")
  }
})
pool is a handler at the global level for connecting to the database.
pool <- pool::dbPool(drv = RSQLite::SQLite(),
                     dbname = "data/compfleet.db")
Why does this not work? And if it did, is there any way of refreshing the datatable output without having to reload the application?
As pointed out by @RomanLustrik, there was definitely something 'funky' going on with timestamp. I am not well versed in sqlite, but running PRAGMA table_info(TonnageListChartering); revealed this:
0|vesselName||0||0
1|empStatus||0||0
2|openPort||0||0
3|openDate||0||0
4|source||0||0
5|comments||0||0
6|updatedBy||0||0
7|timestamp||0||1
8|VesselDetails||0||0
9|Name||0||0
10|VslType||0||0
11|Cubic||0||0
12|DWT||0||0
13|IceClass||0||0
14|IMO||0||0
15|Built||0||0
16|Owner||0||0
I guess none of the columns have a data type defined, and I am not sure whether it is possible to add one now. Anyway, I changed the query to ensure that the timestamp is in quotes.
query <- glue::glue_sql("DELETE FROM TonnageListChartering
WHERE TonnageListChartering.timestamp = '{timestamp}'
", .con = conn)
This deletes the user-selected rows.
However, when I am left with only one row, I am unable to delete it. No idea why. Maybe because of a primary key that I have defined while creating the table?

Adding row to redshift table rather than replacing table

I have the following data being sent to Redshift with a replace-table command. Is there a command to add new rows to the table instead of replacing the entire thing?
PipelineSimulation<-matrix(,42,7)
PipelineSimulation<-as.data.frame(PipelineSimulation)
PipelineSimulation[1,1]<-"APAC"
PipelineSimulation[1,2]<-"Enterprise"
and so on through
PipelineSimulation[42,3]<-"Commit"
PipelineSimulation[42,4]<-"Upsell"
PipelineSimulation[42,5]<-NAMEFURate
PipelineSimulation[42,6]<-mean(NFUEntTotals)
PipelineSimulation[,7]<-Sys.time()
Then, to get it into Redshift, I use:
library(RPostgres)
library(redshiftTools)
library(RPostgreSQL)
library("aws.s3")
library("DBI")
drv <- dbDriver('PostgreSQL')
con <- dbConnect(RPostgres::Postgres(),
                 host = 'bi-prod-dw-instance.cceimtxgnc4w.us-west-2.redshift.amazonaws.com',
                 port = '5439', dbname = '***', user = "***", password = "***",
                 sslmode = 'require')
query="select * from everyonesdb.jet_pipelinesimulation_historic;"
result<-dbGetQuery(con,query)
print (nrow(result))
Sys.setenv("AWS_ACCESS_KEY_ID" = "***",
"AWS_SECRET_ACCESS_KEY" = "***",
"AWS_DEFAULT_REGION" = "us-west-2")
b=get_bucket(bucket = 'bjnbi-bjnrd/jetPipelineSimulation')
rs_replace_table(PipelineSimulation, con,
                 tableName = 'everyonesdb.jet_pipelinesimulation_historic',
                 bucket = 'bjnbi-bjnrd/jetPipelineSimulation', split_files = 2)
So instead of rs_replace_table, I want to preserve the old data and simply add new rows onto the existing table, if that's possible.
From How to bulk upload your data from R into Redshift:
rs_replace_table truncates the target table and then loads it entirely from the data frame, only do this if you don't care about the current data it holds.
On the other hand, rs_upsert_table replaces rows which have coinciding keys, and inserts those that do not exist in the table.
Does using rs_upsert_table instead of rs_replace_table solve your issue?
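A sketch of what that call might look like, reusing the arguments from the rs_replace_table call in the question (the keys value below is a hypothetical placeholder for whichever column(s) identify an existing row, and argument names can differ between redshiftTools versions):
rs_upsert_table(PipelineSimulation, con,
                tableName = 'everyonesdb.jet_pipelinesimulation_historic',
                keys = c('V7'),  # hypothetical key column; adjust to your table
                bucket = 'bjnbi-bjnrd/jetPipelineSimulation', split_files = 2)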
