How to create a table in PostgreSQL using R?

I'd like to create a table in PostgreSQL using the R DBI package.
Here is a small example:
dbExecute(con, "create table data1 (var1 int not null, var2 date not null, var3 int)")
where con is a connection object.
But I get an error, Failed to fetch row, plus some additional text that I cannot read due to a UTF-8 encoding problem.
I also tried dbSendQuery and dbGetQuery, with the same result.
How can I write code to complete this task?
One restriction applies: I know there is a dbCreateTable command that creates a table in PostgreSQL, but it uses R notation, and I want to use exact SQL notation.
Thanks in advance.

I found the way out.
All I need is to run the following code:
RPostgres::dbSendQuery(con, "create table ...")
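For completeness, here is a minimal sketch of the whole workflow, assuming the RPostgres driver and placeholder connection details; dbExecute is used instead of dbSendQuery only because the statement returns no rows:

library(DBI)
library(RPostgres)

# Placeholder connection details; adjust to your server
con <- dbConnect(RPostgres::Postgres(), dbname = "mydb", host = "localhost",
                 user = "user", password = "password")

# Exact SQL notation, no R-side table specification needed
dbExecute(con, "create table data1 (var1 int not null, var2 date not null, var3 int)")

dbDisconnect(con)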

Related

How to insert nested values from R into Big Query

I would like to insert data into BQ via R. When I have a normal table, everything is OK. The problem begins when I have to insert a table which contains a map (nested/record repeated) column.
The column is defined like this:
I use the bigrquery package and DBI like this:
dbWriteTable(
  con,
  "database.table",
  table,
  overwrite = FALSE,
  append = TRUE,
  row.names = FALSE
)
How should I define the customerdata column in R to insert it into Big Query? I've tried json and list, but it didn't work, although it could also be a wrongly written json or list :)
I know the example is not reproducible, but a reproducible one is hardly possible here, or I have no idea how to create one.
Do you have any idea how to do this?
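As an untested sketch (not taken from an answer above), one common approach is to build the nested column as a list-column of data frames and upload with bigrquery::bq_table_upload instead of dbWriteTable. Whether the list-column is serialized to a repeated record may depend on the bigrquery version, so treat everything below, including the project/dataset names, as an assumption:

library(bigrquery)

# Hypothetical example data: customerdata holds one data frame of key/value pairs per row
table <- data.frame(id = c(1, 2))
table$customerdata <- list(
  data.frame(key = c("a", "b"), value = c("1", "2")),
  data.frame(key = "c",         value = "3")
)

# Placeholder project/dataset/table identifiers
bq_table_upload(bq_table("my-project", "my_dataset", "my_table"), values = table)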

Using RODBC sqlQuery in R to import a long string, but R truncates the string. How to get around this?

I'm using the R RODBC package (library(RODBC)) to import the results of a SQL stored procedure and save them in a data frame, then export that data frame using write.table to an XML file (the result from SQL is XML output).
Anyhow, R is truncating the string (the imported XML results from SQL).
I've tried to find a function or an option to expand the size/length of the R data frame cell but didn't find any.
I also tried to use sqlQuery directly in the write.table statement to avoid using a data frame, but the imported data from SQL is still truncated.
Does anyone have any suggestions or an answer that could help me?
Here is my code:
#library & starting the sql connection
library(RODBC)
my_conn<-odbcDriverConnect('Driver={SQL Server};server=sql2014;database=my_conn;trusted_connection=TRUE')
#Create a folder and a path to save my output
x <- "C:/Users/ATM1/Documents/R/CSV/test"
dir.create(x, showWarnings=FALSE)
setwd(x)
Mpath <- getwd()
#importing the data from the sql stored procedure output
xmlcode1 <- sqlQuery(my_conn, "exec dbo.p_webDefCreate 'SA25'", stringsAsFactors=F, as.is=TRUE)
#writing to a file
write.table(xmlcode1, file=paste0(Mpath,"/SA5b_def.xml"), quote = FALSE, col.names = FALSE, row.names = FALSE)
What I get is plain text that is not the full output.
The code below (using the stringi package) is how I check the current length of my string:
library(stringi)
stri_length(xmlcode1)
[1] 65534
I had a similar issue with our project: the data coming from the db was getting truncated to 257 characters, and I could not really get around it. Eventually I converted the column definition on the db table from varchar(max) to varchar(8000), and I got all the characters back. I did not mind changing the table definition.
In your case you can perhaps convert the column type in your proc output to varchar with some defined size, if possible.
M
I am using PostgreSQL but experienced the same truncation issue when importing into R with the RODBC package. I used Michael Kassa's solution with a slight change: I set the data type to text, which can store a string of unlimited length per postgresqltutorial. This worked for me.
The TEXT data type can store a string with unlimited length.
varchar() also worked for me
If you do not specify the n integer for the VARCHAR data type, it behaves like the TEXT datatype. The performance of the VARCHAR (without the size n) and TEXT are the same.
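As a hedged illustration of the two fixes described above (table and column names are placeholders; pick the statement that matches your database):

library(RODBC)
# SQL Server: widen the column to a fixed-size varchar, as in the first answer
sqlQuery(my_conn, "ALTER TABLE my_table ALTER COLUMN xml_col varchar(8000);")
# PostgreSQL: switch the column to TEXT, which has no length limit, as in the second answer
# sqlQuery(my_conn, "ALTER TABLE my_table ALTER COLUMN xml_col TYPE text;")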

Speed up INSERT of 1 million+ rows into Postgres via R using COPY?

I would like to bulk-INSERT/UPSERT a moderately large number of rows into a PostgreSQL database using R. To do so, I am preparing a multi-row INSERT string in R.
query <- sprintf("BEGIN;
CREATE TEMPORARY TABLE
md_updates(ts_key varchar, meta_data hstore) ON COMMIT DROP;
INSERT INTO md_updates(ts_key, meta_data) VALUES %s;
LOCK TABLE %s.meta_data_unlocalized IN EXCLUSIVE MODE;
UPDATE %s.meta_data_unlocalized
SET meta_data = md_updates.meta_data
FROM md_updates
WHERE md_updates.ts_key = %s.meta_data_unlocalized.ts_key;
COMMIT;", md_values, schema, schema, schema, schema)
DBI::dbGetQuery(con,query)
The entire function can be found here. Surprisingly (at least to me), I learned that the UPDATE part is not the problem: I left it out, ran the query again, and it wasn't much faster. INSERTing a million+ records seems to be the issue here.
I did some research and found quite a bit of information:
bulk inserts
bulk inserts II
what causes large inserts to slow down
The answers from Erwin Brandstetter and Craig Ringer were particularly helpful. I was able to speed things up quite a bit by dropping indices and following a few other suggestions.
However, I struggled to implement another suggestion which sounded promising: COPY. The problem is I can't get it done from within R.
The following works for me:
sql <- sprintf('CREATE TABLE
md_updates(ts_key varchar, meta_data hstore);
COPY md_updates FROM STDIN;')
dbGetQuery(sandbox,"COPY md_updates FROM 'test.csv' DELIMITER ';' CSV;")
But I can't get it done without reading from an extra .csv file. So my questions are:
Is COPY really a promising approach here (over the multi-row INSERT I've got)?
Is there a way to use COPY from within R without writing the data to a file? The data fits in memory, and since it's already in memory, why write it to disk?
I am using PostgreSQL 9.5 on OS X and 9.5 on RHEL, respectively.
RPostgreSQL has a postgresqlCopyInDataframe function that looks like it should do what you want:
install.packages("RPostgreSQL")
library(RPostgreSQL)
con <- dbConnect(PostgreSQL(), user="...", password="...", dbname="...", host="...")
dbSendQuery(con, "copy foo from stdin")  # open the COPY ... FROM STDIN protocol
postgresqlCopyInDataframe(con, df)       # stream the data frame over the open COPY
where the table foo has the same columns as the data frame df.
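As a possible alternative (an assumption, not part of the answer above): the newer RPostgres driver reportedly uses COPY when appending a data frame with dbWriteTable, which would avoid building the INSERT string entirely. A sketch, with placeholder connection details and md_updates_df standing in for the prepared data frame:

library(DBI)
library(RPostgres)

con <- dbConnect(RPostgres::Postgres(), dbname = "...", host = "...",
                 user = "...", password = "...")
# Append the rows; the driver is expected to stream them via COPY under the hood
dbWriteTable(con, "md_updates", md_updates_df, append = TRUE, row.names = FALSE)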

Using R to Insert Records into a Database using apply

I have a table I wish to insert records into in a Teradata environment using R.
I have connected to the DB and created my table using JDBC.
From reading the documentation, there doesn't appear to be an easy way to insert records other than building your own INSERT statements. I am trying to do this with a vectorized approach using apply (or anything similar).
Below is my code, but I'm clearly not using apply correctly. Can anyone help?
s <- seq(1:1000)
str_update_table <- sprintf("INSERT INTO foo VALUES (%s)", s)
# Set Up the Connections
myconn <- dbConnect(drv,service, username, password)
# Attempt to run each of the 1000 sql statements
apply(str_update_table,2,dbSendUpdate,myconn)
I don't have the infrastructure to test this, but you are passing a vector to apply where apply expects an array (or matrix). With a plain vector like str_update_table, the MARGIN argument 2 in apply does not make much sense.
Try Map like in
Map(function(x) dbSendUpdate(myconn, x), str_update_table)
(untested)
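Putting the pieces together as a sketch (also untested; the driver jar, connection string, and credentials are placeholders, and dbSendUpdate is assumed to come from RJDBC):

library(RJDBC)

# Placeholder driver path and connection string; adjust to your environment
drv <- JDBC("com.teradata.jdbc.TeraDriver", "terajdbc4.jar")
myconn <- dbConnect(drv, "jdbc:teradata://host/database=mydb", "username", "password")

s <- 1:1000
str_update_table <- sprintf("INSERT INTO foo VALUES (%s)", s)

# Run each statement; Map iterates over the character vector directly
invisible(Map(function(x) dbSendUpdate(myconn, x), str_update_table))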

No results using read.csv.sql

I have a 6 GB csv file that I am trying to read into R using read.csv.sql from the sqldf package. For some reason, the result returns 0 rows. What is wrong with my code? I get a warning message of "closing unused connection", which may not be related to the fact that no results are returned. My code is below.
TestData <- read.csv.sql("2025_nonroad_ff10_NCD20130831_23feb2015_v3_part1.csv", sql = "select * from file where poll == 'EXH__100414';", header=TRUE, skip=27, eol="\n", sep=",")
If I run a simpler SQL statement: select * from file limit 2, the result is:
Perhaps this revision might help:
TestData <- read.csv.sql("2025_nonroad_ff10_NCD20130831_23feb2015_v3_part1.csv", sql = "select * from file where poll = 'EXH__100414'", header=TRUE, skip=27, eol="\n", sep=",")
These are only minor changes:
removed the double equals in the SQL statement
removed the closing semicolon; in other programs a closing semicolon is needed, but in read.csv.sql it is not
If that doesn't work, we need to try to isolate the problem:
Try a simpler SQL statement, such as showing just the first two records: select * from file limit 2. Does that even work?
If it works, then everything else is working and your original SQL condition is bad or wrong.
If not, it means something else is wrong with the rest of the read.csv.sql arguments, or perhaps with the file, or with read.csv.sql itself.
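A short diagnostic sketch of the two-step check described above (the file name and the skip/eol/sep values are taken from the question and may need adjusting):

library(sqldf)

# Step 1: does a trivial query return rows at all?
head2 <- read.csv.sql("2025_nonroad_ff10_NCD20130831_23feb2015_v3_part1.csv",
                      sql = "select * from file limit 2",
                      header = TRUE, skip = 27, eol = "\n", sep = ",")
print(head2)

# Step 2: if step 1 works, inspect the values in the poll column before filtering on it
polls <- read.csv.sql("2025_nonroad_ff10_NCD20130831_23feb2015_v3_part1.csv",
                      sql = "select distinct poll from file limit 20",
                      header = TRUE, skip = 27, eol = "\n", sep = ",")
print(polls)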
