I am using the in_schema() function from the dbplyr package to create a table in a named schema of a PostgreSQL database from R.
This is not a new piece of code, and it used to work as expected, i.e. it created a table called 'my_table' in the schema 'my_schema'.
con <- dbConnect(odbc::odbc(),
                 driver = "PostgreSQL Unicode",
                 server = "server",
                 port = 5432,
                 uid = "user name",
                 password = "password",
                 database = "dbase")

dbWriteTable(con,
             in_schema('my_schema', 'my_table'),
             value = whatever)  # assume that 'whatever' is a data frame
This piece of code has now developed issues: it has unexpectedly started to create a table literally named 'my_schema.my_table' in the default public schema of my database, instead of the expected my_schema.my_table.
Has anybody else noticed this behaviour, and is there a solution (other than using the default PostgreSQL schema, which is not practical in my case)?
For that, I would recommend using copy_to() instead of dbWriteTable():
copy_to(con, iris, in_schema("production", "iris"))
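For reference, a slightly fuller sketch of that approach (assuming the odbc connection from the question; note that copy_to() creates a temporary table by default, so temporary = FALSE is needed for a persistent one):
library(dplyr)
library(dbplyr)

# write 'whatever' into my_schema.my_table as a persistent table
copy_to(con,
        whatever,
        in_schema("my_schema", "my_table"),
        temporary = FALSE,
        overwrite = TRUE)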
The pandas code I have used for connecting to Teradata is:
database = config.get('Teradata connection', 'database')
host = config.get('Teradata connection', 'host')
user = config.get('Teradata connection', 'user')
pwd = config.get('Teradata connection', 'pwd')
with teradatasql.connect(host=host, user=user, password=pwd) as connect:
    query1 = "SELECT * FROM {}.{}".format(database, tables)
    df = pd.read_sql_query(query1, connect)
Now, I need to use the Dask library for loading big data as an alternative to pandas.
Please suggest a way to connect to Teradata using Dask.
Teradata appears to have a SQLAlchemy dialect, so you should be able to install that, set your connection string appropriately, and use Dask's existing read_sql_table function.
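A minimal sketch of that route (the URI format, the teradatasqlalchemy dialect name, and the index_col value are assumptions to adapt to your setup):
import dask.dataframe as dd

# hypothetical connection URI; requires the Teradata SQLAlchemy dialect to be installed
uri = "teradatasql://{user}:{pwd}@{host}".format(user=user, pwd=pwd, host=host)

# index_col should be an indexed, orderable column Dask can partition on
ddf = dd.read_sql_table(tables, uri, index_col="id",
                        schema=database, npartitions=10)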
Alternatively, you could do this by hand: you need to decide on a set of conditions which will partition the data for you, each partition being small enough for your workers to handle. Then you can make a set of partitions and combine into a dataframe as follows
import pandas as pd
import teradatasql
import dask
import dask.dataframe as dd

def get_part(condition):
    with teradatasql.connect(host=host, user=user, password=pwd) as connect:
        query1 = "SELECT * FROM {}.{} WHERE {}".format(database, tables, condition)
        return pd.read_sql_query(query1, connect)

parts = [dask.delayed(get_part)(cond) for cond in conditions]  # one delayed partition per condition
df = dd.from_delayed(parts)
(ideally, you can derive the meta= parameter for from_delayed beforehand, perhaps by getting the first 10 rows of the original query).
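One way to build that meta= value beforehand (a sketch; it assumes the Teradata SAMPLE clause is available and reuses database, tables, and parts from above):
with teradatasql.connect(host=host, user=user, password=pwd) as connect:
    sample = pd.read_sql_query(
        "SELECT * FROM {}.{} SAMPLE 10".format(database, tables), connect)

meta = sample.iloc[:0]  # empty frame that keeps the column names and dtypes
df = dd.from_delayed(parts, meta=meta)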
Why is dbWriteTable not able to write data to a non-default schema in HANA?
jdbcConnection_Test1 <- dbConnect(jdbcDriver,
                                  "jdbc:sap://crbwhd12:30215?SCHEMA_NAME/TABLE_NAME",
                                  "Username", "Password")
dbWriteTable(jdbcConnection_Test1, name = "TABLE_NAME", value = DataFrame1,
             overwrite = FALSE, append = TRUE)
I have established connection between R and HANA.
Then I have specified HANA Table name in which the data needs to be updated.
I am getting an error message saying that the table name is invalid and that it is not available in the default schema. I want to upload the data to a different schema.
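For illustration, a hedged sketch of two approaches that are often tried with RJDBC connections (SCHEMA_NAME and TABLE_NAME are placeholders; whether the driver accepts a qualified name can vary):
# Option 1: switch the session's current schema before writing
dbSendUpdate(jdbcConnection_Test1, 'SET SCHEMA "SCHEMA_NAME"')
dbWriteTable(jdbcConnection_Test1, name = "TABLE_NAME", value = DataFrame1, append = TRUE)

# Option 2: pass a schema-qualified table name
dbWriteTable(jdbcConnection_Test1, name = "SCHEMA_NAME.TABLE_NAME",
             value = DataFrame1, append = TRUE)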
I have the following data being sent to Redshift with a replace-table command. Is there a command that instead adds new rows to the table rather than replacing the entire thing?
PipelineSimulation<-matrix(,42,7)
PipelineSimulation<-as.data.frame(PipelineSimulation)
PipelineSimulation[1,1]<-"APAC"
PipelineSimulation[1,2]<-"Enterprise"
and so on through
PipelineSimulation[42,3]<-"Commit"
PipelineSimulation[42,4]<-"Upsell"
PipelineSimulation[42,5]<-NAMEFURate
PipelineSimulation[42,6]<-mean(NFUEntTotals)
PipelineSimulation[,7]<-Sys.time()
Then, to get it into Redshift, I use:
library(RPostgres)
library(redshiftTools)
library(RPostgreSQL)
library("aws.s3")
library("DBI")
drv<-dbDriver('PostgreSQL')
con <- dbConnect(RPostgres::Postgres(),
                 host = 'bi-prod-dw-instance.cceimtxgnc4w.us-west-2.redshift.amazonaws.com',
                 port = '5439', dbname = '***', user = "***", password = "***",
                 sslmode = 'require')
query="select * from everyonesdb.jet_pipelinesimulation_historic;"
result<-dbGetQuery(con,query)
print (nrow(result))
Sys.setenv("AWS_ACCESS_KEY_ID" = "***",
"AWS_SECRET_ACCESS_KEY" = "***",
"AWS_DEFAULT_REGION" = "us-west-2")
b=get_bucket(bucket = 'bjnbi-bjnrd/jetPipelineSimulation')
rs_replace_table(PipelineSimulation, con,
                 tableName = 'everyonesdb.jet_pipelinesimulation_historic',
                 bucket = 'bjnbi-bjnrd/jetPipelineSimulation',
                 split_files = 2)
So instead of rs_replace_table, I want to preserve the old data and simply add new rows to the existing table, if that is possible.
From How to bulk upload your data from R into Redshift:
rs_replace_table truncates the target table and then loads it entirely from the data frame; only do this if you don't care about the current data it holds.
On the other hand, rs_upsert_table replaces rows which have coinciding keys, and inserts those that do not exist in the table.
Does using rs_upsert_table instead of rs_replace_table solve your issue?
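A sketch of what that call might look like with your objects (the argument names mirror your rs_replace_table call; keys is an assumption and should list the column(s) that uniquely identify a row, so the hypothetical choice below needs replacing):
rs_upsert_table(PipelineSimulation, con,
                tableName = 'everyonesdb.jet_pipelinesimulation_historic',
                keys = c('V1', 'V2', 'V7'),  # hypothetical key columns
                bucket = 'bjnbi-bjnrd/jetPipelineSimulation',
                split_files = 2)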
I'm reading a BLOB field from an ODBC data connection (the BLOB field is a file). I connect to and query the database, returning the blob and the filename. However, the blob does not contain the same data as I see in the database. My code is below, along with the data returned versus what is in the DB.
library(RODBC)
sqlret<-odbcConnect('ODBCConnection')
qry<-'select content,Filename from document with(nolock) where documentid = \'xxxx\''
df<-sqlQuery(sqlret,qry)
close(sqlret)
rootpath<-paste0(getwd(),'/DocTest/')
dir.create(rootpath,showWarnings = FALSE)
content<-unlist(df$content)
fileout<-file(paste0(rootpath,df$Filename),"w+b")
writeBin(content, fileout)
close(fileout)
The database blob is:
0x50726F642050434E203A0D0A35363937313533320D0A33383335323133320D0A42463643453335380D0A0D0A574C4944203A0D0A0D0…
The data frame's content is:
00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000004b020000000000000d0000f1000000008807840200000000d0f60c0c0000…
The filenames match up, as does the size of the content/blob.
The exact approach you take may vary depending on your ODBC driver. I'll demonstrate how I do this on MS SQL Server, and hopefully you can adapt it to your needs.
I'm going to use a table in my database called InsertFile with the following definition:
CREATE TABLE [dbo].[InsertFile](
[OID] [int] IDENTITY(1,1) NOT NULL,
[filename] [varchar](50) NULL,
[filedata] [varbinary](max) NULL
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
GO
Now let's create a file that we will push into the database.
file <- "hello_world.txt"
write("Hello world", file)
I need to do a little work to prep the byte code for this file to go into SQL. I use this function for that.
prep_file_for_sql <- function(filename){
  # read the file as raw bytes
  bytes <-
    mapply(FUN = readBin,
           con = filename,
           what = "raw",
           n = file.info(filename)[["size"]],
           SIMPLIFY = FALSE)
  # convert each raw vector to hexadecimal character codes
  chars <-
    lapply(X = bytes,
           FUN = as.character)
  # collapse the hex codes into a single string per file
  vapply(X = chars,
         FUN = paste,
         collapse = "",
         FUN.VALUE = character(1))
}
Now, this is a bit strange, but the SQL Server ODBC driver is pretty good at writing VARBINARY columns, but terrible at reading them.
Coincidentally, the SQL Server Native Client 11.0 ODBC driver is terrible at writing VARBINARY columns, but okay-ish with reading them.
So I'm going to have two RODBC objects, conn_write and conn_read.
conn_write <-
RODBC::odbcDriverConnect(
paste0("driver=SQL Server; server=[server_name]; database=[database_name];",
"uid=[user_name]; pwd=[password]")
)
conn_read <-
RODBC::odbcDriverConnect(
paste0("driver=SQL Server Native Client 11.0; server=[server_name]; database=[database_name];",
"uid=[user_name]; pwd=[password]")
)
Now I'm going to insert the text file into the database using a parameterized query.
library(RODBCext)  # provides sqlExecute() for parameterized queries

sqlExecute(
  channel = conn_write,
  query = "INSERT INTO dbo.InsertFile (filename, filedata) VALUES (?, ?)",
  data = list(file,
              prep_file_for_sql(file)),
  fetch = FALSE
)
And now to read it back out using a parameterized query. The unpleasant trick here is recasting the VARBINARY(MAX) column as a VARBINARY (don't ask me why, but it works).
X <- sqlExecute(
channel = conn_read,
query = paste0("SELECT OID, filename, ",
"CAST(filedata AS VARBINARY(8000)) AS filedata ",
"FROM dbo.InsertFile WHERE filename = ?"),
data = list("hello_world.txt"),
fetch = TRUE,
stringsAsFactors = FALSE
)
Now you can look at the contents with
unlist(X$filedata)
And write the file with
writeBin(unlist(X$filedata),
con = "hello_world2.txt")
BIG DANGEROUS CAVEAT
You need to be aware of the size of your files. I usually store files as VARBINARY(MAX), and SQL Server isn't very friendly about exporting those through ODBC (I'm not sure about other SQL engines; see RODBC sqlQuery() returns varchar(255) when it should return varchar(MAX) for more details).
The only way I've found to get around this is to recast the VARBINARY(MAX) as a VARBINARY(8000). That is obviously a terrible solution if you have more than 8000 bytes in your file. When I need to get around this, I've had to loop over the VARBINARY(MAX) column, create multiple new columns each of length 8000, and then paste them all together in R (check out: Reconstitute PNG file stored as RAW in SQL Database).
As of yet, I've not come up with a generalized solution to this problem. Perhaps that's something I should spend more time on, though.
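Purely as an illustration of that chunking idea, a sketch for a file that fits in two 8000-byte chunks (a real implementation would loop until the chunks come back empty; it assumes the InsertFile table and conn_read from above):
X <- sqlExecute(
  channel = conn_read,
  query = paste0("SELECT filename, ",
                 "CAST(SUBSTRING(filedata, 1, 8000) AS VARBINARY(8000)) AS chunk1, ",
                 "CAST(SUBSTRING(filedata, 8001, 8000) AS VARBINARY(8000)) AS chunk2 ",
                 "FROM dbo.InsertFile WHERE filename = ?"),
  data = list("hello_world.txt"),
  fetch = TRUE,
  stringsAsFactors = FALSE
)

# stitch the chunks back together before writing the file out
writeBin(c(unlist(X$chunk1), unlist(X$chunk2)),
         con = "hello_world3.txt")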
The 8000-byte limit is imposed by the ODBC driver, not by the RODBC, DBI, or odbc packages.
Use the latest driver to remove the limitation: ODBC Driver 17 for SQL Server
https://learn.microsoft.com/en-us/sql/connect/odbc/linux-mac/installing-the-microsoft-odbc-driver-for-sql-server?view=sql-server-2017
There is no need to convert the column to VARBINARY with this latest driver.
The following should work:
X <- sqlExecute(
channel = conn_read,
query = paste0("SELECT OID, filename, ",
"filedata ",
"FROM dbo.InsertFile WHERE filename = ?"),
data = list("hello_world.txt"),
fetch = TRUE,
stringsAsFactors = FALSE
)
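For reference, a connection using that driver might look like this (a sketch; the server, database, and credentials are placeholders as in the earlier examples):
conn_read <-
  RODBC::odbcDriverConnect(
    paste0("driver=ODBC Driver 17 for SQL Server; server=[server_name]; ",
           "database=[database_name]; uid=[user_name]; pwd=[password]")
  )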
In MonetDB I have set up a schema main, and my tables are created in this schema.
For example, the department table is main.department.
With dplyr I try to query the table:
mdb <- src_monetdb(dbname="model", user="monetdb", password="monetdb")
tbl(mdb, "department")
But I get
Error in .local(conn, statement, ...) :
Unable to execute statement 'PREPARE SELECT * FROM "department"'.
Server says 'SELECT: no such table 'department'' [#42S02].
I tried to use "main.department" and other similar combinations with no luck.
What is the appropriate syntax?
There is a somewhat hacky workaround for this: we can manually set the default schema for the connection. I have a database testing, in which there is a schema foo with a table called bar.
mdb <- src_monetdb("testing")
dbSendQuery(mdb$con, "SET SCHEMA foo");
t <- tbl(mdb, "bar")
The dbplyr package (the dplyr backend for database connections) has an in_schema() function for these cases:
conn <- dbConnect(
MonetDB.R(),
host = "localhost",
dbname = "model",
user = "monetdb",
password = "monetdb",
timeout = 86400L
)
department <- tbl(conn, dbplyr::in_schema("main", "department"))
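The resulting object is a lazy table, so the usual dplyr verbs can be applied to it; a small usage sketch (nothing is pulled from MonetDB until collect() is called):
library(dplyr)

department %>%
  head(10) %>%
  collect()  # runs the query against main.department and returns a tibble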