Access parquet files in Azure DataBricks by R via RODBC - r

I successfully configured a connection to an Azure Databricks cluster and can query tables with
conn <- odbcConnect("AzureDatabricks")
sqlQuery(conn, "SELECT * FROM my_table")
but I need to access parquet files.
In Databricks I can do it with this code:
%sql
Select * FROM parquet.`/path/to/folder`
If I try this from R with
sqlQuery(conn, "Select * FROM parquet.`/path/to/folder`")
I receive this error:
[Simba][SQLEngine] Table or view not found: SPARK.parquet./path/to/folder"
[RODBC] ERROR: Could not SQLExecDirect 'Select * FROM parquet.`/path/to/folder`
Is there a way to access parquet files via RODBC?

This issue comes from the SQL query itself. When you run the command Select * FROM parquet.`/path/to/folder` with the path quoted in backticks, you will not see "Table or view not found", which here results from a syntax error.
Example: when you run SELECT * FROM parquet.'somepath' (single quotes instead of backticks), you will see the syntax error.
Note: After creating a DataFrame from a parquet file, you have to register it as a temp table to run SQL queries on it.
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
val df = sqlContext.read.parquet("src/main/resources/peopleTwo.parquet")
df.printSchema
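// after registering as a temp view you will be able to run sql queries
// (registerTempTable is deprecated since Spark 2.0; createOrReplaceTempView replaces it)
df.createOrReplaceTempView("people")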
sqlContext.sql("select * from people").collect.foreach(println)
Reference: Spark SQL guide - Parquet files

You need to add the UseNativeQuery=1; parameter to the ODBC connection string.
Example: Driver={Simba Spark ODBC Driver};Host=[serverHost];Port=443;HTTPPath=[httpPath];ThriftTransport=2;SSL=1;AuthMech=3;UID=token;PWD=[pwd];UseNativeQuery=1;
https://docs.databricks.com/integrations/jdbc-odbc-bi.html#ansi-sql-92-query-support-in-odbc
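From R, a connection string like the one above can be passed to RODBC's odbcDriverConnect() instead of using a DSN name. The following is a minimal sketch under assumptions: the placeholder host and HTTP path, and the DATABRICKS_TOKEN environment variable name, are hypothetical and must be replaced with your own values. The connection is attempted only when RODBC is installed and a token is set.

```r
# Hypothetical placeholders: replace with the values for your own workspace.
server_host <- "<server-hostname>"
http_path   <- "<http-path>"
token       <- Sys.getenv("DATABRICKS_TOKEN")  # assumed environment variable name

# UseNativeQuery=1 asks the driver to pass the SQL through untransformed,
# so the backtick-quoted parquet path reaches Spark as written.
conn_str <- paste0(
  "Driver={Simba Spark ODBC Driver};",
  "Host=", server_host, ";Port=443;",
  "HTTPPath=", http_path, ";",
  "ThriftTransport=2;SSL=1;AuthMech=3;UID=token;",
  "PWD=", token, ";",
  "UseNativeQuery=1;"
)

# Connect only when RODBC is installed and a token is available.
if (requireNamespace("RODBC", quietly = TRUE) && nzchar(token)) {
  conn <- RODBC::odbcDriverConnect(conn_str)
  # odbcDriverConnect() returns -1 on failure, so check the class first.
  if (inherits(conn, "RODBC")) {
    df <- RODBC::sqlQuery(conn, "SELECT * FROM parquet.`/path/to/folder`")
    RODBC::odbcClose(conn)
  }
}
```

If the connection is defined as a DSN instead, the same UseNativeQuery=1 setting would go into the DSN configuration rather than into the R code.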

Related

Push/Export large dataframe from R to Vertica database

I have a dataframe of 10M rows which needs to be uploaded back from R to a Vertica database.
The dbWriteTable() function from DBI is running into memory issues, and I have tried increasing the JVM memory to 16 GB with
options(java.parameters = c("-XX:+UseConcMarkSweepGC", "-Xmx16g"))
The process still runs into memory issues. I am planning to use Vertica's bulk COPY option to load the CSV file into the table.
I have created an empty table in Vertica.
When I execute the query
dbSendQuery(vertica, "COPY hpcom_usr.VM_test FROM LOCAL \'/opt/mount1/musoumit/MarketBasketAnalysis/Code/test.csv\' enclosed by \'\"\' DELIMITER \',\' direct REJECTED DATA \'./code/temp/rejected.txt\' EXCEPTIONS \'./code/temp/exceptions.txt\'")
I am running into this error.
Error in .verify.JDBC.result(r, "Unable to retrieve JDBC result set", :
Unable to retrieve JDBC result set
JDBC ERROR: [Vertica]JDBC A ResultSet was expected but not generated from query "COPY hpcom_usr.VM_test FROM LOCAL '/opt/mount1/musoumit/MarketBasketAnalysis/Code/test.csv' enclosed by '"' DELIMITER ',' direct REJECTED DATA './code/temp/rejected.txt' EXCEPTIONS './code/temp/exceptions.txt'". Query not executed.
Please help with what I'm doing wrong here.
Vertica also provides a STDIN option.
How can I execute this?
My Environment.
CentOS 7
R 3.6.3 (no RStudio here; I have to execute this from the CLI)
Tidyverse 1.0.x
Vertica driver 9.x
System with 128 GB memory and 28 cores.
Your problem is that you call dbSendQuery(), which is meant to be followed by dbFetch() and a final dbClearResult(), and which is only for SQL query statements: those that actually return a result set.
Vertica's COPY <table> FROM [LOCAL] 'file.ext' ... command is treated like a DML command. For those, as the DBI documentation says:
https://www.rdocumentation.org/packages/DBI/versions/0.5-1/topics/dbSendQuery
you need to use dbSendStatement() for data manipulation statements.
Have a go at it that way. Good luck.
dbSendUpdate(vertica, "COPY hpcom_usr.VM_test FROM LOCAL \'/opt/mount1/musoumit/MarketBasketAnalysis/Code/test.csv\' enclosed by \'\"\' DELIMITER \',\' direct REJECTED DATA \'./code/temp/rejected.txt\' EXCEPTIONS \'./code/temp/exceptions.txt\'")
instead of dbSendQuery did the trick for me.
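The distinction both answers rely on can be seen with plain DBI calls. The sketch below is a minimal illustration, assuming the RSQLite package and an in-memory SQLite database as a stand-in for Vertica; the table vm_test and its columns are hypothetical. It is not the Vertica COPY call itself, only a demonstration of which DBI functions fit statements without a result set and which fit queries that return rows.

```r
library(DBI)

# Stand-in backend (assumption): in-memory SQLite instead of Vertica over JDBC.
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Statements without a result set: dbExecute() returns the rows affected.
dbExecute(con, "CREATE TABLE vm_test (id INTEGER, item TEXT)")
n_inserted <- dbExecute(con, "INSERT INTO vm_test VALUES (1, 'a'), (2, 'b')")

# Lower-level equivalent: dbSendStatement() + dbGetRowsAffected() + dbClearResult().
stmt <- dbSendStatement(con, "UPDATE vm_test SET item = 'c' WHERE id = 2")
n_updated <- dbGetRowsAffected(stmt)
dbClearResult(stmt)

# Queries that return rows: dbSendQuery() + dbFetch() + dbClearResult().
res <- dbSendQuery(con, "SELECT id, item FROM vm_test ORDER BY id")
rows <- dbFetch(res)
dbClearResult(res)

dbDisconnect(con)
```

With RJDBC, as used in the question, dbSendUpdate() plays the role of the statement-style call, as the second answer reports.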

How do I load data from my SQLite DB into RStudio?

I created a database with SQL for a school project. Currently I'm stuck at importing this data into RStudio. I put my DB file in this directory: "/Users/milanpatty/Documents/Business/Semester_2/R/Proftaak". When I tried to make a connection with this code:
db <- dbConnect(SQLite(), dbname = 'Festivate.db')
I got this error:
Warning message:
Couldn't set synchronous mode: file is not a database
Use `synchronous` = NULL to turn off this warning.
BTW I use DB browser as an interface for SQLite. Could anybody help me with this problem?
edit: I installed the library RSQLite in R

Execute SQL with "like" statement in R Language

I am trying to execute a SQL query through R to get data from an Access DB.
A normal SQL statement works fine, but a LIKE statement throws an error.
Below is the code:
library(RODBC);
channel = odbcDriverConnect("Driver={Microsoft Access Driver (*.mdb, *.accdb)};DBQ=C:/Users/ADMIN/Documents/R.accdb")
test = sqlQuery(channel ,paste('SELECT R.ID, R.Template, R.WEDate FROM R WHERE R.Template Like "*slow*"'))
Error:
[1] "07002 -3010 [Microsoft][ODBC Microsoft Access Driver] Too few parameters. Expected 2."
[2] "[RODBC] ERROR: Could not SQLExecDirect 'SELECT R.ID, R.Template, R.WEDate FROM R WHERE (R.Template Like \"slow\")'
Is there a way to fix this?
Consider both of @joran's suggestions: enclose string literals in single quotes AND use the ANSI-92 wildcard operator, %. You would use the asterisk, * (ANSI-89 mode), when running an internal query, namely inside the MSAccess.exe GUI program (which defaults to DAO), or if you connect externally to Access with DAO. ADO connections, by contrast, use the percent symbol, as do most external interfaces, including RODBC.
I was able to reproduce your issue and both these remedies worked. Also, no need to use paste() as you are not concatenating any other object to query statement.
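library(RODBC)
# keep the connection string on one line so no newline ends up inside it
channel = odbcDriverConnect("Driver={Microsoft Access Driver (*.mdb, *.accdb)};DBQ=C:/Users/ADMIN/Documents/R.accdb")
# single-quoted string literal and the % wildcard
test = sqlQuery(channel,
"SELECT R.ID, R.Template, R.WEDate FROM R WHERE R.Template Like '%slow%'")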

Cannot access Hive tables through JDBC in R

I am trying to fetch records of a hive table into R console.
I have successfully created the connection. However, when I try to access the hive table weblogs, it indicates that the table can't be found.
I have already created the weblogs table in HIVE and I have granted the permission on this table to the user from which I'm logged in.
I am using a Derby DB metastore.
weblogs <- dbReadTable(conn,"weblogs")
Error in .verify.JDBC.result(r, "Unable to retrieve JDBC result set
for ", : Unable to retrieve JDBC result set for SELECT * FROM
weblogs (Error while compiling statement: FAILED: SemanticException
[Error 10001]: Line 1:14 Table not found 'weblogs')

R DBI / RPostgreSQL-- connection succeeds but dbListTables returns no tables

The following code connects to my PostgreSQL database successfully (or appears to, at any rate), but attempts to issue queries were met with "relation does not exist" errors, so I tried dbListTables, which doesn't return any tables at all. The database name passed to dbConnect is correct, and the tables do exist. I think the code I'm using is exactly the same as what I was using recently, which worked successfully. Any ideas?
> library(RPostgreSQL)
Loading required package: DBI
> drv <- dbDriver("PostgreSQL")
> con <- dbConnect(drv, dbname="mydb", user="user", password=password)
> dbListTables(con)
character(0)
I'm new to both R and DBI, so I'm sure I could be missing something extremely simple...any help would be appreciated.
Solved-- I was right; it was something incredibly simple (and very, very stupid) on my part. I was running the script from the wrong server. The server I was running it from has an empty copy of the database I was attempting to connect to, so everything succeeded, and the empty result from dbListTables was correct. Once I switched servers (or simply specified the host on the other server), everything worked.
1. Connect to MySQL
a) Check that MySQL is installed on your system; if not, install it.
b) Install the RMySQL package in R.
library(RMySQL)
drv = dbDriver("MySQL")
The driver name is just "MySQL"; a version string here is not a valid driver name.
con = dbConnect(drv, host="localhost", dbname="test", user="root", password="root")
Use localhost or the server's IP address.
Use the required database name, user name and password.
album = dbGetQuery(con, statement="select * from table")
Run the required query.
dbDisconnect(con)
2. Another way to connect to a database
a) First install a database such as MySQL, Oracle or SQL Server.
b) Install the ODBC connector for that database.
library(RODBC)
channel <- odbcConnect("test", uid="ripley", pwd="secret")
"test" is the name of the ODBC connection (DSN), which the user has to set up manually.
The user can find this in the Administrator tool.
res <- sqlFetch(channel, "table name")
A table can be retrieved as a data frame.
res <- sqlQuery(channel, paste("select query"))
With a query, part of a table (for example, the rows matching a condition) can be retrieved as a data frame.
sqlSave(channel, dataframe)
Saves a data frame to the database (do not assign the result, as in "res <-").
Other functions the user can call:
sqlCopy()
sqlDrop()
sqlTables()
close(channel)
Always close the connection.

Resources