Something I ran into with the RPostgreSQL package is that dbWriteTable(…, overwrite=TRUE) seems to destroy the existing table structure (datatypes and constraints), and dbRemoveTable() is equivalent to DROP TABLE.
So I’ve used:
ltvTable <- "the_table_to_use"
dfLTV <- data.frame(x, y, z)
sql_truncate <- paste("TRUNCATE ", ltvTable) ##unconditional DELETE FROM…
res <- dbSendQuery(conn=con, statement=sql_truncate)
dbWriteTable(conn=con, name=ltvTable, value=dfLTV, row.names=FALSE, append=TRUE)
Is the TRUNCATE step necessary, or is there a dbWriteTable method that overwrites just the content not the structure?
I experience different behaviour from the answer offered by Manura Omal to "How we can write data to a postgres DB table using R?", as overwrite=TRUE does not appear to truncate first.
I'm using: RPostgreSQL 0.4-1; PostgreSQL 9.4
best wishes - JS
As far as I know, overwrite=TRUE does three things:
DROPs the table
CREATEs the table
fills the table with the new data
So if you want to preserve the structure, you need the TRUNCATE step.
The different behaviour you see might be caused by the presence or absence of foreign keys, which can prevent the DROP TABLE step.
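For example, a minimal sketch of that preserve-structure pattern, reusing con, ltvTable and dfLTV from the question (DBI::dbExecute is used here; on older RPostgreSQL versions dbSendQuery works the same way):
library(DBI)
# TRUNCATE keeps the table definition (datatypes, constraints) and removes only the rows;
# append = TRUE then inserts the new data without touching the schema.
dbExecute(con, paste("TRUNCATE TABLE", ltvTable))
dbWriteTable(con, name = ltvTable, value = dfLTV, row.names = FALSE, append = TRUE)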
I have created and filled an SQLite database within R using the DBI and RSQLite packages, e.g. like this:
con_sqlite <- DBI::dbConnect(RSQLite::SQLite(), "../mydatabase.sqlite")
DBI::dbWriteTable(con_sqlite, "mytable", d_table, overwrite = TRUE)
...
Now the SQLite file has become too big, so I reduced the tables. However, the file size does not decrease, and I found out that I have to run the VACUUM command. Is there a way to run this command from within R?
I think this should do the trick:
DBI::dbExecute(con, "VACUUM;")
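For example, with the connection and file from the question, the reclaimed space should be visible on disk afterwards (a sketch; the path is the one used above):
con_sqlite <- DBI::dbConnect(RSQLite::SQLite(), "../mydatabase.sqlite")
file.size("../mydatabase.sqlite")        # size before
DBI::dbExecute(con_sqlite, "VACUUM;")    # rebuilds the file, reclaiming free pages
file.size("../mydatabase.sqlite")        # size after: should be smaller
DBI::dbDisconnect(con_sqlite)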
I have two very large CSV files that contain the same variables. I want to combine them into one table inside an SQLite database, if possible using R.
I successfully managed to put both CSV files into separate tables inside one database using inborutils::csv_to_sqlite, which imports small chunks of data at a time.
Is there a way to create a third table where both tables are simply appended using R (keeping in mind the limited RAM)? And if not, how else can I perform this task? Maybe via the terminal?
We assume that when the question refers to the "same variables" it means that the two tables have the same column names. Below we create two such test tables, BOD and BOD2, and then in the CREATE statement we combine them, creating the table both. The combining is done entirely on the SQLite side. Finally we look at both.
library(RSQLite)
con <- dbConnect(SQLite()) # modify to refer to existing SQLite database
dbWriteTable(con, "BOD", BOD)
dbWriteTable(con, "BOD2", 10 * BOD)
dbExecute(con, "create table both as select * from BOD union select * from BOD2")
dbReadTable(con, "both")
dbDisconnect(con)
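One caveat: UNION removes duplicate rows, so if you want a plain append that keeps every row from both imports, UNION ALL does that. A sketch of that variant (using a hypothetical both_all table name):
# union all keeps duplicates, so both_all is an exact append of the two inputs
dbExecute(con, "create table both_all as
                select * from BOD
                union all
                select * from BOD2")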
I have my final output in an R dataframe. I need to write this output to a database in Azure Databricks. Can someone help me with the syntax? I used this code:
require(SparkR)
data1 <- createDataFrame(output)
write.df(data1, path="dbfs:/datainput/sample_dataset.parquet",
source="parquet", mode="overwrite")
This code runs without error, but I don't see the database in the datainput folder (mentioned in the path). Is there some other way to do it?
I believe you are looking for the saveAsTable function. write.df only saves the data to the file system; it does not register the data as a table.
require(SparkR)
data1 <- createDataFrame(output)
saveAsTable(data1, tableName = "default.sample_table", source="parquet", mode="overwrite")
In the code above, default is an existing database name, under which a new table named sample_table will be created. If you pass sample_table instead of default.sample_table, the table will be saved in the default database.
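Assuming the table was saved as above, you can read it back in SparkR either with tableToDF or through Spark SQL, e.g.:
library(SparkR)
# Read the saved table back as a SparkDataFrame and inspect a few rows
tbl <- tableToDF("default.sample_table")
head(tbl)
# Equivalent via Spark SQL
collect(sql("SELECT * FROM default.sample_table LIMIT 5"))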
I would like to bulk-INSERT/UPSERT a moderately large number of rows into a PostgreSQL database using R. To do so, I am preparing a multi-row INSERT string in R.
query <- sprintf("BEGIN;
CREATE TEMPORARY TABLE
md_updates(ts_key varchar, meta_data hstore) ON COMMIT DROP;
INSERT INTO md_updates(ts_key, meta_data) VALUES %s;
LOCK TABLE %s.meta_data_unlocalized IN EXCLUSIVE MODE;
UPDATE %s.meta_data_unlocalized
SET meta_data = md_updates.meta_data
FROM md_updates
WHERE md_updates.ts_key = %s.meta_data_unlocalized.ts_key;
COMMIT;", md_values, schema, schema, schema, schema)
DBI::dbGetQuery(con,query)
The entire function can be found here. Surprisingly (at least to me), I learned that the UPDATE part is not the problem: I left it out, ran the query again, and it wasn't much faster. INSERTing a million+ records seems to be the issue here.
I did some research and found quite a bit of information:
bulk inserts
bulk inserts II
what causes large inserts to slow down
Answers from Erwin Brandstetter and Craig Ringer were particularly helpful. I was able to speed things up quite a bit by dropping indices and following a few other suggestions.
However, I struggled to implement another suggestion that sounded promising: COPY. The problem is that I can't get it done from within R.
The following works for me:
sql <- sprintf('CREATE TABLE
md_updates(ts_key varchar, meta_data hstore);
COPY md_updates FROM STDIN;')
dbGetQuery(sandbox,"COPY md_updates FROM 'test.csv' DELIMITER ';' CSV;")
But I can't get it done without reading from an extra .csv file. So my questions are:
Is COPY really a promising approach here (compared to the multi-row INSERT I have)?
Is there a way to use COPY from within R without writing the data to a file? The data fits in memory, and since it's already in memory, why write it to disk?
I am using PostgreSQL 9.5 on OS X and 9.5 on RHEL respectively.
RPostgreSQL has a postgresqlCopyInDataframe function that looks like it should do what you want:
install.packages("RPostgreSQL")
library(RPostgreSQL)
con <- dbConnect(PostgreSQL(), user="...", password="...", dbname="...", host="...")
dbSendQuery(con, "copy foo from stdin")
postgresqlCopyInDataframe(con, df)
where the table foo has the same columns as the dataframe df.
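Applied to the question's staging-table setup, the flow might look roughly like this (a sketch; md_df is a hypothetical data frame with ts_key and meta_data columns, and the hstore values must already be formatted as hstore literals in a character column):
library(RPostgreSQL)
con <- dbConnect(PostgreSQL(), user = "...", password = "...", dbname = "...", host = "...")
# Staging table matching the data frame's columns
dbSendQuery(con, "CREATE TEMPORARY TABLE md_updates (ts_key varchar, meta_data hstore)")
# Stream the data frame over the same connection, with no intermediate file on disk
dbSendQuery(con, "COPY md_updates FROM STDIN")
postgresqlCopyInDataframe(con, md_df)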
I have a table I wish to insert records into in a Teradata environment using R.
I have connected to the DB and created my table using JDBC.
From reading the documentation, there doesn't appear to be an easy way to insert records except by writing your own INSERT statements manually. I am trying to do this with a vectorized approach using apply (or something similar).
Below is my code, but I'm clearly not using apply correctly. Can anyone help?
s <- seq(1:1000)
str_update_table <- sprintf("INSERT INTO foo VALUES (%s)", s)
# Set Up the Connections
myconn <- dbConnect(drv,service, username, password)
# Attempt to run each of the 1000 sql statements
apply(str_update_table,2,dbSendUpdate,myconn)
I don't have the infrastructure to test this, but you are passing a vector to apply, which expects an array (or matrix); with your vector str_update_table, the MARGIN of 2 in apply does not make much sense.
Try Map, as in:
Map(function(x) dbSendUpdate(myconn, x), str_update_table)
(untested)
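An equivalent plain loop works too, and if the JDBC driver accepts bind parameters through dbSendUpdate, a single prepared statement avoids pasting 1000 SQL strings (both untested sketches):
# Plain loop, equivalent to the Map() call above
for (stmt in str_update_table) {
  dbSendUpdate(myconn, stmt)
}
# Parameterised variant (if the driver supports bind parameters)
for (val in s) {
  dbSendUpdate(myconn, "INSERT INTO foo VALUES (?)", val)
}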