dplyr & monetdb - appropriate syntax for querying schema.table? - r

In MonetDB I have set up a schema called main, and my tables are created in this schema.
For example, the department table is main.department.
With dplyr I try to query the table:
mdb <- src_monetdb(dbname="model", user="monetdb", password="monetdb")
tbl(mdb, "department")
But I get
Error in .local(conn, statement, ...) :
Unable to execute statement 'PREPARE SELECT * FROM "department"'.
Server says 'SELECT: no such table 'department'' [#42S02].
I tried to use "main.department" and other similar combinations with no luck.
What is the appropriate syntax?

There is a somewhat hacky workaround for this: manually set the default schema for the connection. In this example I have a database called testing, which contains a schema foo with a table called bar.
mdb <- src_monetdb("testing")
dbSendQuery(mdb$con, "SET SCHEMA foo");
t <- tbl(mdb, "bar")

The dbplyr package (the dplyr backend for database connections) has an in_schema() function for these cases:
conn <- dbConnect(
  MonetDB.R(),
  host = "localhost",
  dbname = "model",
  user = "monetdb",
  password = "monetdb",
  timeout = 86400L
)
department = tbl(conn, dbplyr::in_schema("main", "department"))

Related

How to connect to Teradata database using Dask?

The pandas code I have used to connect to Teradata is:
import pandas as pd
import teradatasql
# config is assumed to be a configparser.ConfigParser that has already read the settings file
database = config.get('Teradata connection', 'database')
host = config.get('Teradata connection', 'host')
user = config.get('Teradata connection', 'user')
pwd = config.get('Teradata connection', 'pwd')
with teradatasql.connect(host=host, user=user, password=pwd) as connect:
    query1 = "SELECT * FROM {}.{}".format(database, tables)
    df = pd.read_sql_query(query1, connect)
Now I need to use the Dask library as an alternative to pandas for loading big data.
Please suggest a way to make the same connection to Teradata with Dask.
Teradata appears to have a SQLAlchemy dialect, so you should be able to install that, set your connection string appropriately, and use Dask's existing read_sql_table function.
Alternatively, you could do this by hand: decide on a set of conditions that partition the data, with each partition small enough for your workers to handle. Then you can build the partitions and combine them into a dataframe as follows:
import dask
import dask.dataframe as dd
def get_part(condition):
    # Each task opens its own connection and reads one partition
    with teradatasql.connect(host=host, user=user, password=pwd) as connect:
        query1 = "SELECT * FROM {}.{} WHERE {}".format(database, tables, condition)
        return pd.read_sql_query(query1, connect)
# conditions: an iterable of SQL predicates, one per partition
parts = [dask.delayed(get_part)(cond) for cond in conditions]
df = dd.from_delayed(parts)
(Ideally, you can derive the meta= parameter for from_delayed beforehand, perhaps by getting the first 10 rows of the original query.)
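One possible way to produce the conditions list used above, assuming the table has an integer key column with a known minimum and maximum. The helper name range_conditions, the column name order_id, and the bounds 1 to 100 are hypothetical and chosen only for illustration. The sketch splits the key range into contiguous, non-overlapping slices so that every row falls into exactly one partition.

```python
def range_conditions(column, lower, upper, n_parts):
    """Split the inclusive integer range [lower, upper] into up to n_parts
    non-overlapping SQL conditions on `column`."""
    if n_parts < 1 or upper < lower:
        raise ValueError("need n_parts >= 1 and lower <= upper")
    span = upper - lower + 1
    # Integer boundaries; the last one is upper + 1, so each value is covered once.
    bounds = [lower + span * i // n_parts for i in range(n_parts + 1)]
    return [
        f"{column} >= {lo} AND {column} < {hi}"
        for lo, hi in zip(bounds, bounds[1:])
        if lo < hi  # skip empty slices when n_parts exceeds the number of values
    ]


# Hypothetical key column and bounds, for illustration only.
conditions = range_conditions("order_id", 1, 100, 4)
for cond in conditions:
    print(cond)
```

Each resulting string can be passed to get_part as its condition argument. If the key values are heavily skewed, equal-width slices will give uneven partitions, and boundaries taken from quantiles of the key may work better.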

How do I find the schema of a table in an ODBC connection by name?

I'm using the odbc package to connect to an MS SQL Server:
con <- dbConnect(odbc::odbc(),
                 Driver = "ODBC Driver 13 for SQL Server",
                 Server = "server",
                 Database = "database",
                 UID = "user",
                 PWD = "pass",
                 Port = 1111)
This server has many tables, so I'm using dbListTables(con) to search for the ones containing a certain substring. But once I find them I need to discover which schema they are in to be able to query them. I'm currently doing this manually (looking for the name of the table in each schema), but is there any way I can get the schema of all tables that match a string?
Consider running an SQL query with a LIKE search against the built-in INFORMATION_SCHEMA metadata views, if your user has sufficient privileges.
SELECT SCHEMA_NAME
FROM INFORMATION_SCHEMA.SCHEMATA
WHERE SCHEMA_NAME LIKE '%some string%'
Run the query above from R with odbc, using a parameterized query for the wildcard search:
# PREPARED STATEMENT
strSQL <- paste("SELECT SCHEMA_NAME" ,
"FROM INFORMATION_SCHEMA.SCHEMATA",
"WHERE SCHEMA_NAME LIKE ?SEARCH")
# SAFELY INTERPOLATED QUERY
query <- sqlInterpolate(conn, strSQL, SEARCH = '%some string%')
# DATA FRAME BUILD FROM RESULTSET
schema_names_df <- dbGetQuery(conn, query)
I found a workaround using the RODBC package:
library('RODBC')
# First connect to the DB
dbconn <- odbcDriverConnect("driver = {ODBC Driver xx for SQL Server};
server = server;
database = database;
uid = username;
pwd = password")
# Now fetch the DB tables
sqlTables(dbconn)
For my specific DB I get:
names(sqlTables(dbconn))
[1] "TABLE_CAT" "TABLE_SCHEM" "TABLE_NAME" "TABLE_TYPE" "REMARKS"

dbplyr in_schema() function behaving strangely

I am using the in_schema() function from the dbplyr package to create a table in a named schema of a PostgreSQL database from R.
It is not a new piece of code, and it used to work as expected, creating a table called 'my_table' in schema 'my_schema'.
con <- dbConnect(odbc::odbc(),
                 driver = "PostgreSQL Unicode",
                 server = "server",
                 port = 5432,
                 uid = "user name",
                 password = "password",
                 database = "dbase")
dbWriteTable(con,
             in_schema('my_schema', 'my_table'),
             value = whatever) # assume that 'whatever' is a data frame...
This piece of code has now developed issues: it unexpectedly started to create a table called 'my_schema.my_table' in the default public schema of my database, instead of the expected my_schema.my_table.
Has anybody else noticed such behaviour, and is there a solution (other than using the default PostgreSQL schema, which is not practical in my case)?
For that, I would recommend using copy_to() instead of dbWriteTable(): copy_to(con, iris, in_schema("production", "iris"))

Read a View created from a procedure in SAP HANA from R

I have a schema in SAP HANA named "HYZ_ProcurementToSales" and a view "V_HYZ_P25_Market_Market_Orders" that is created from a procedure. I am trying to read the view from R server version 1.0.153. The code I am using is:
library(RJDBC)
conn_server <- dbConnect(jdbcDriver,
"jdbc:sap:rdkom12.dhcp.pal.sap.corp:30015", "system",
"manager")
res <- dbGetQuery(conn,"select * from
HYZ_ProcurementToSales.V_HYZ_P25_Market_Market_Orders")
The error that I get is this:
"Unable to retrieve JDBC result set for
select * from HYZ_ProcurementToSales.V_HYZ_P25_Market_Market_Orders".
I believe that something other than dbGetQuery is needed here. It works fine if I simply run:
res <- dbGetQuery(conn,"select * from Tables")
The following works for me on HANA 1 SPS12 with a procedure that exposes a view called V_CURRENTUSERS:
library(RJDBC)
drv <- JDBC("com.sap.db.jdbc.Driver",
"C:\\Program Files\\SAP\\hdbclient\\ngdbc.jar",
identifier.quote='"')
conn <- dbConnect(drv, "jdbc:sap://<hanaserver>:3<instance>15/?", "**username**", "*pw*")
res <- dbSendQuery(conn = conn, statement = 'select * from v_currentusers;')
jusers <- dbFetch(res)
At this point, the whole result set is bound to jusers.
Once finished you should release the result set again:
dbClearResult(res)
Finally, close the connection:
dbDisconnect(conn)
Be aware that procedures with result views are deprecated and should not be used/developed anymore. Instead, use table functions as these can also be reused in information views and allow for dynamic parameter assignment.

RJDBC dbGetQuery() ERROR to create external table HIVE

I have run into this problem: the DB call only creates a table, and it fails when retrieving the JDBC result set.
Error in .verify.JDBC.result(r, "Unable to retrieve JDBC result set for
Calls: dbGetQuery ... dbSendQuery -> dbSendQuery -> .local -> .verify.JDBC.result
Execution halted
options( java.parameters = "-Xmx32g" )
library(rJava)
library(RJDBC)
drv <- JDBC("org.apache.hive.jdbc.HiveDriver", "/tmp/r_jars/hive-jdbc.jar")
for (jar in list.files('/tmp/r_jars/')) {
  .jaddClassPath(paste("/tmp/r_jars/", jar, sep = ""))
}
conn <- dbConnect(drv, "jdbc:hive2://10.40.51.75:10000/default", "myusername", "mypassword")
createSCOREDDL_query <- "CREATE EXTERNAL TABLE hiveschema.mytable (
myvariables
)
ROW FORMAT SERDE
'com.bizo.hive.serde.csv.CSVSerde'
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
's3://mybucket/myschema/'"
dbGetQuery(conn, createSCOREDDL_query)
dbDisconnect(conn)
Instead of dbGetQuery, can you try using dbSendUpdate? A CREATE TABLE statement returns no result set, which is what dbGetQuery tries to retrieve. I was having similar issues and making this switch solved the problem.
I tried the following code, as suggested by @KaIC, and it worked:
dbSendUpdate(conn, "CREATE EXTERNAL TABLE hiveschema.mytable ( col_A string, col_B string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE")
For multiple tables, you can put the statements in a list and run them in a loop or with an apply-family function such as lapply().
