How to apply dtplyr with SQL Server database - r

I am trying to apply dtplyr to a SQL Server database.
I succeeded in applying dplyr as shown below, but I don't know how to apply dtplyr.
How can I do this?
library(odbc)
library(DBI)
library(tidyverse)
library(dtplyr)
library(dbplyr)
con <- DBI::dbConnect(odbc::odbc(),
                      Driver   = "SQL Server",
                      Server   = "address",
                      Database = "dbname",
                      UID      = "ID",
                      PWD      = "password")

dplyr::tbl(con, dbplyr::in_schema("dbo", "table1"))

The comments by @Waldi capture the essence of the matter. You cannot use dtplyr with SQL Server, because it only translates dplyr commands into equivalent data.table code.
The official dtplyr documentation states:
The goal of dtplyr is to allow you to write dplyr code that is automatically translated to the equivalent ... data.table code
The official dbplyr documentation states:
It allows you to use remote database tables as if they are in-memory data frames by automatically converting dplyr code into SQL
Both dbplyr and dtplyr translate dplyr commands. Which one you use depends on whether you are working with data.table type objects (in R memory) or remote SQL databases (whichever flavor of SQL you prefer).
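To make the distinction concrete, here is a minimal sketch of both translation layers, reusing the con and dbo.table1 from the question; the column names value and group are placeholders:

library(dplyr)
library(dtplyr)

# dbplyr: dplyr verbs on a remote SQL Server table are translated to SQL
remote_tbl <- dplyr::tbl(con, dbplyr::in_schema("dbo", "table1"))
remote_tbl %>%
  filter(value > 10) %>%
  count(group) %>%
  collect()        # the SQL only runs (and data only moves into R) here

# dtplyr: dplyr verbs on a local data.table are translated to data.table code,
# so the data must first be collected into R memory
local_dt <- lazy_dt(data.table::as.data.table(collect(remote_tbl)))
local_dt %>%
  filter(value > 10) %>%
  count(group) %>%
  as_tibble()      # the generated data.table code only runs here

In other words, dtplyr only becomes useful once the data has been collect()ed out of SQL Server into R memory.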

Related

When connecting R to Microsoft SQL Server, do you have to use a DSN?

I want to connect R to SQL Server so I can export some R data frames as tables to SQL Server.
From a few online tutorials I've seen, they use the RODBC package, and it seems that you first need to create an ODBC data source name (DSN) by going to ODBC Data Sources (64-bit) > System DSN > Add > SQL Server Native Client 11.0 and then entering your specifications.
I have no idea how databases are managed, so forgive my ignorance here. My question is: if a database/server is already set up on SQL Server, and in particular it is the one I want to export my R data to, do I still need to do this?
For instance, when I open Microsoft SQL Server Management Studio, I see the following:
Server type: Database Engine
Server name: example.server.myorganization.com
Authentication: SQL Server Authentication
Login: organization_user
Password: organization_password
After logging in, I can access a database called "Organization_Division_DBO" > Tables which is where I want to upload my data from R as a table. Does this mean the whole ODBC shebang is already setup for me, and I can skip the steps mentioned here where an ODBC needs to be set up?
Can I instead use the code shown here:
library(sqldf)
library(odbc)
con <- dbConnect(odbc(),
                 Driver   = "SQL Server",
                 Server   = "example.server.myorganization.com",
                 Database = "Organization_Division_DBO",
                 UID      = "organization_user",
                 PWD      = "organization_password")
dbWriteTable(conn  = con,
             name  = "My_R_Table",
             value = x)  # x is any data frame I have in R
I note that on this page they use code similar to the above (what is the port number?) and there is also some mention "that there is also support for DSNs", so I am a little confused. Also, is there any advantage/disadvantage to using the odbc package over the RODBC package to do this?
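For reference, this is roughly how I understand the two connection styles would look with the odbc package (the DSN name here is made up, and I believe 1433 is SQL Server's default port):

library(DBI)
library(odbc)

# DSN-less connection: all driver details are supplied directly in R
con <- dbConnect(odbc(),
                 Driver   = "SQL Server",
                 Server   = "example.server.myorganization.com",
                 Database = "Organization_Division_DBO",
                 UID      = "organization_user",
                 PWD      = "organization_password",
                 Port     = 1433)   # default SQL Server port; often optional

# DSN-based connection: the details live in a DSN configured under
# "ODBC Data Sources", so only its name (and credentials) are needed here
con_dsn <- dbConnect(odbc(), dsn = "My_Organization_DSN",
                     UID = "organization_user",
                     PWD = "organization_password")

dbWriteTable(conn = con, name = "My_R_Table", value = x)  # x is a data frame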

r - SQL on large datasets from several access databases

I'm working on a process improvement that will use SQL in r to work with large datasets. Currently the source data is stored in several different MS Access databases. My initial approach was to use RODBC to read all of the source data into r, and then use sqldf() to summarize the data as needed. I'm running out of RAM before I can even begin use sqldf() though.
Is there a more efficient way for me to complete this task using r? I've been looking for a way to run a SQL query that joins the separate databases before reading them into r, but so far I haven't found any packages that support this functionality.
If your data is in a database, dplyr (part of the tidyverse) would be the tool you are looking for.
You can use it to connect to a local / remote database, push your joins / filters / aggregations there, and collect() the result as a data frame. The process is neatly summarized at http://db.rstudio.com/dplyr/
What I am not quite certain of - although it is an MS Access issue rather than an R issue - is the means of accessing data across multiple MS Access databases.
You may need to write custom SQL for that, pass it to one of the databases via DBI::dbGetQuery(), and have MS Access handle the database link; a sketch of the dplyr approach follows below.
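A minimal sketch of that workflow, assuming the Microsoft Access ODBC driver is installed; the file path, table name, and column name are placeholders:

library(DBI)
library(odbc)
library(dplyr)

con <- dbConnect(odbc(),
                 .connection_string = paste0(
                   "Driver={Microsoft Access Driver (*.mdb, *.accdb)};",
                   "Dbq=C:/Documents/Name_Of_My_Access_Database.accdb;"))

# Build the summary lazily and only pull the (small) result into R
tbl(con, "Name_of_table_in_my_database") %>%
  group_by(some_column) %>%
  summarise(n = n()) %>%
  collect()

# For joins across linked databases, fall back to custom SQL:
# DBI::dbGetQuery(con, "SELECT ... FROM table1 INNER JOIN table2 ON ...")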
The link you posted looks promising. If it doesn't yield the intended results, consider linking one Access DB to all the others. Links take almost no memory. Union the links and fetch the data from there.
# Load RODBC package
library(RODBC)

# Connect to the Access database (odbcConnectAccess requires 32-bit R on Windows)
channel <- odbcConnectAccess("C:/Documents/Name_Of_My_Access_Database")

# Get data
data <- sqlQuery(channel, "SELECT * FROM Name_of_table_in_my_database")
These URLs may help as well.
https://www.r-bloggers.com/getting-access-data-into-r/
How to connect R with Access database in 64-bit Window?

Connecting to DynamoDB using R

I would like to connect to DynamoDB with R. My ultimate goal is to create a Shiny App to display data that is stored at DynamoDB and updated frequently. So I need an efficient way to retrieve it using R.
The following references give an intuition but they do not include a native implementation in R and have not been updated for a long time.
r language support for AWS DynamoDB
AWS dynamodb support for "R" programming language
R + httr and EC2 api authentication issues
As mentioned in the answers above, running Python within R through rPython would be an option as there are SDKs for Python such as boto3.
Another alternative would be using a JDBC driver through RJDBC, which I tried:
library(RJDBC)
drv <- JDBC(
  driverClass = "cdata.jdbc.dynamodb.DynamoDBDriver",
  classPath = "MyInstallationDir\\lib\\cdata.jdbc.dynamodb.jar",  # backslashes must be escaped in R strings
  identifier.quote = "'"
)
conn <- dbConnect(
  drv,
  "Access Key=xxx;Secret Key=xxx;Domain=amazonaws.com;Region=OREGON;"
)
(Access Key and Secret Key replaced by xxx) and I got the error:
Error in .verify.JDBC.result(jc, "Unable to connect JDBC to ", url) :
Unable to connect JDBC to Access Key=xxx;Secret
Key=xxx;Domain=amazonaws.com;Region=OREGON;
What would be the best practice in this matter? Is there a working, native solution for R? I would appreciate if anyone could point me in the right direction.
Note: The package aws.dynamodb (https://github.com/cloudyr/aws.dynamodb) looks promising but the documentation lacks examples and I could not find any tutorial for it.
I would like to share some updates so that people with the same issue can benefit from this post:
First, I figured out how to use the JDBC driver with a few tweaks:
library(DBI)
library(RJDBC)
drv <- JDBC(
  driverClass = "cdata.jdbc.dynamodb.DynamoDBDriver",
  classPath = "/Applications/CData/CData JDBC Driver for DynamoDB 2018/lib/cdata.jdbc.dynamodb.jar",
  identifier.quote = "'"
)
conn <- dbConnect(
  drv,
  url = 'jdbc:dynamodb: Access Key=xxx; SecretKey=xxx; Domain=amazonaws.com; Region=OREGON;'
)
dbListTables(conn)
Second, I realized that reticulate makes it very convenient (even more so than rPython) to run Python code inside R, and I ended up using reticulated boto3 to get data from DynamoDB into R. You can refer to the following documentation for additional info; a small sketch follows the links:
reticulate
boto3 - DynamoDB
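A minimal sketch of the reticulate/boto3 route; the region and table name are placeholders, boto3 must be installed in the Python environment reticulate points to, and the flattening step assumes simple scalar attributes:

library(reticulate)

boto3    <- import("boto3")
dynamodb <- boto3$resource("dynamodb", region_name = "us-west-2")
table    <- dynamodb$Table("my_table")

# Scan the whole table (fine for small tables; use Query/pagination for large ones)
resp  <- table$scan()
items <- resp$Items

# Flatten the list of items into a data frame for use in R / Shiny
df <- do.call(rbind, lapply(items, as.data.frame, stringsAsFactors = FALSE))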
Last, I heard that RStudio is planning to build a NoSQL database driver (which would be compatible with DBI, dbplyr, pool, etc.), but it probably won't be available anytime soon.
Hope someone will create an R package for AWS as comprehensive as boto3, as AWS gets more and more popular.

Difference between src_postgres and dbConnect function to connect R with postgres

What is the difference between the src_postgres and dbConnect functions? Both can be used to connect R to Postgres using the RPostgreSQL package. In my experiments I could only use src_postgres to read and dbConnect to write to the database.
When I tried them in other combinations, I only received errors.
This seems fairly strange to me.
src_postgres is a function for creating a connection to a PostgreSQL database from the dplyr package. The RPostgreSQL package implements a method for the generic dbConnect from the DBI package. src_postgres calls dbConnect from RPostgreSQL (I assume).
The generic connection object returned by dbConnect is meant to be an open ended interface for sending SQL queries to the data base. This means you could feed it any select, update, insert, delete, etc. query that you like.
src_postgres is part of the higher level interface to working with data from databases that Hadley built in dplyr. The src_* functions connect to a db and then the tbl functions specify a more specific data source (table, view, arbitrary select query) to pull data from. There are some basic table manipulation functions in dplyr but I don't believe it is intended to be a tool for doing update or insert type things in the db. That's just not what that tool is for. Note that the "verbs" implemented in dplyr are all focused on pulling data out and summarising (select, filter, mutate, etc.).
If you need to alter data in a data base on a row level, you'll need to send SQL queries to a connection created by dbConnect. If all you're doing is pulling data from a db and analyzing it in R, that is what dplyr is for.
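A minimal sketch of the two interfaces side by side (database name, table name, column names, and credentials are placeholders):

library(DBI)
library(RPostgreSQL)
library(dplyr)

# Low-level DBI connection: send any SQL, including INSERT/UPDATE/DELETE
con <- dbConnect(PostgreSQL(), dbname = "mydb", host = "localhost",
                 user = "me", password = "secret")
dbGetQuery(con, "SELECT count(*) FROM mytable")
dbSendQuery(con, "UPDATE mytable SET flag = TRUE WHERE id = 1")

# dplyr's src_postgres: read-oriented, verbs are lazily translated to SQL
src <- src_postgres(dbname = "mydb", host = "localhost",
                    user = "me", password = "secret")
tbl(src, "mytable") %>%
  filter(id > 100) %>%
  summarise(n = n()) %>%
  collect()

Note that in current versions of dplyr this functionality lives in the dbplyr package, and tbl(con, "mytable") on a DBI connection replaces the src_* functions.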

Export R results to vertica DB

I have a table that should store the results of the analytics I have performed. The connection between R and Vertica is set up, and I am able to extract data from Vertica tables, but I am not able to store the results of my analysis in a Vertica table.
Can someone help with how to insert records into Vertica through R via RODBC?
Here is the code I used with Oracle:
install.packages("RODBC")
library("RODBC")
channeldev<-odbcConnect("Dev_k", uid="krish", pwd="****", believeNRows=FALSE)
odbcGetInfo(channeldev)
dataframe_dev<- sqlQuery(channeldev, "
SELECT input_stg_id
FROM
k.input_stg WHERE emp_ID=85
and update_timestamp > to_date('8/5/2013 04.00.00','mm/dd/yyyy HH24.MI.SS')")
dataframe_dev
sqlSave(channeldev,dataframe_dev,tablename="K.R2_TEST",append=TRUE)
sqlUpdate(channeldev, dataframe_dev, tablename="K.R2_TEST",index="INPUT_STG_ID")
You can use basically the same RODBC command sequence you used for Oracle:
Load the RODBC library: library(RODBC)
Connect to the Vertica database: odbcConnect()
Save data: sqlSave(). For performance reasons I'd suggest setting fast=TRUE, disabling auto-commit, and committing the transaction at the end.
However, also consider Vertica's bulk-load utility COPY with the LOCAL option: it works transparently through ODBC and is much, much faster than sqlSave(). A rough sketch of both approaches follows.
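A minimal sketch, reusing the Dev_k DSN and dataframe_dev from the question; the target table name and the CSV path are placeholders:

library(RODBC)

channel <- odbcConnect("Dev_k", uid = "krish", pwd = "****", believeNRows = FALSE)

# sqlSave with fast = TRUE, auto-commit off, and a single commit at the end
odbcSetAutoCommit(channel, autoCommit = FALSE)
sqlSave(channel, dataframe_dev, tablename = "K.R2_TEST",
        append = TRUE, rownames = FALSE, fast = TRUE)
odbcEndTran(channel, commit = TRUE)

# Bulk load: write a CSV from R, then let Vertica's COPY ... FROM LOCAL ingest it
write.csv(dataframe_dev, "dataframe_dev.csv", row.names = FALSE)
sqlQuery(channel, "COPY K.R2_TEST FROM LOCAL 'dataframe_dev.csv'
                   DELIMITER ',' SKIP 1 ABORT ON ERROR")

odbcClose(channel)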
