I would like to connect to DynamoDB with R. My ultimate goal is to create a Shiny app to display data that is stored in DynamoDB and updated frequently, so I need an efficient way to retrieve it using R.
The following references give some intuition, but they do not include a native R implementation and have not been updated in a long time.
r language support for AWS DynamoDB
AWS dynamodb support for "R" programming language
R + httr and EC2 api authentication issues
As mentioned in the answers above, running Python within R through rPython would be an option as there are SDKs for Python such as boto3.
Another alternative would be using a JDBC driver through RJDBC, which I tried:
library(RJDBC)
drv <- JDBC(
driverClass = "cdata.jdbc.dynamodb.DynamoDBDriver",
classPath = "MyInstallationDir\lib\cdata.jdbc.dynamodb.jar",
identifier.quote = "'"
)
conn <- dbConnect(
drv,
"Access Key=xxx;Secret Key=xxx;Domain=amazonaws.com;Region=OREGON;"
)
(Access Key and Secret Key replaced by xxx) and I got the error:
Error in .verify.JDBC.result(jc, "Unable to connect JDBC to ", url) :
Unable to connect JDBC to Access Key=xxx;Secret
Key=xxx;Domain=amazonaws.com;Region=OREGON;
What would be the best practice in this matter? Is there a working, native solution for R? I would appreciate it if anyone could point me in the right direction.
Note: The package aws.dynamodb (https://github.com/cloudyr/aws.dynamodb) looks promising but the documentation lacks examples and I could not find any tutorial for it.
I would like to share some updates so that people with the same issue can benefit from this post:
First, I figured out how to use the JDBC driver with a few tweaks:
library(DBI)
library(RJDBC)
drv <- JDBC(
driverClass = "cdata.jdbc.dynamodb.DynamoDBDriver",
classPath = "/Applications/CData/CData JDBC Driver for DynamoDB 2018/lib/cdata.jdbc.dynamodb.jar",
identifier.quote = "'"
)
conn <- dbConnect(
drv,
url = 'jdbc:dynamodb: Access Key=xxx; SecretKey=xxx; Domain=amazonaws.com; Region=OREGON;'
)
dbListTables(conn)
Second, I realized that reticulate makes it very convenient (even more so than rPython) to run Python code inside R, and I ended up using boto3 through reticulate to get data from DynamoDB into R. You can refer to the following documentation for additional info:
reticulate
boto3 - DynamoDB
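As a rough sketch of the reticulate route described above (not the original poster's exact code): it assumes Python with boto3 is visible to reticulate and that AWS credentials are already configured. The helper names `scan_dynamodb` and `items_to_df`, the region default, and the table name in the usage line are hypothetical. It uses the low-level boto3 client, whose `scan` returns each attribute as a typed value such as `{"S": "a"}` or `{"N": "1"}`.

```r
# Sketch: read one page of items from a DynamoDB table via reticulate + boto3.
# Assumes boto3 is installed in the Python used by reticulate and that AWS
# credentials are configured; the region default is a placeholder.
scan_dynamodb <- function(table_name, region = "us-west-2") {
  boto3 <- reticulate::import("boto3")
  client <- boto3$client("dynamodb", region_name = region)
  resp <- client$scan(TableName = table_name)  # first page only, no pagination
  resp$Items
}

# Convert typed items, e.g. list(id = list(S = "a"), n = list(N = "1")),
# into a data frame. Only string (S) and number (N) attributes are handled;
# other types become NA. Assumes every item has the same attributes.
items_to_df <- function(items) {
  rows <- lapply(items, function(item) {
    vals <- lapply(item, function(av) {
      if (!is.null(av[["S"]])) {
        av[["S"]]
      } else if (!is.null(av[["N"]])) {
        as.numeric(av[["N"]])
      } else {
        NA
      }
    })
    as.data.frame(vals, stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}
```

A call might look like `df <- items_to_df(scan_dynamodb("my_table"))`. For tables larger than one page, the scan would need to be repeated while the response contains `LastEvaluatedKey`.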
Last, I heard that RStudio is planning to build a NoSQL database driver (which would be compatible with DBI, dbplyr, pool etc.), but it probably won't be available anytime soon.
Hope someone will create an R package as comprehensive as boto3 for AWS as it gets more and more popular.
Related
I'm working on converting a simple Python script to R to act as a template for connecting to SQLite files. The data is on an NFS mount, and we've run into a few snags in setting up the original Python template (namely, IO errors), but we were able to work around them by connecting in read-only mode and setting the VFS to unix-none, e.g.:
# Python version
import sqlite3
path = "file:///mnt_nfs/examplepath.edu/path/file.db"
connect = sqlite3.connect(path + "?mode=ro&vfs=unix-none", uri = True)
cur = connect.cursor()
While we know this is far from a perfect solution, it's acting as an interim solution while we set up a more robust database (so our users can still connect to their data in the meantime). However, most of our students are more familiar with R than Python, and I'm having difficulty finding how to recreate the workarounds in R. Is there some way to set dbConnect to include the read-only and unix-none arguments (or equivalent)?
I have the basics, but it's throwing a similar disk IO Error as the Python code was before we added the arguments. I can't seem to find any info on it in the DBI documentation.
connect <- dbConnect(RSQLite::SQLite(), path)
I think it is
dbConnect(SQLite(), dbname=path, flags=SQLITE_RO, vfs="unix-none")
documented at
library(RSQLite)
?`dbConnect,SQLiteDriver-method`
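To make the suggestion above concrete, here is a minimal, self-contained sketch. It uses a temporary file in place of the NFS path (an assumption for illustration), passes a plain file path rather than a `file:` URI, and only sets `vfs = "unix-none"` on Unix-like systems, since that VFS is not available on Windows.

```r
library(DBI)
library(RSQLite)

# Create a small throwaway database to stand in for the file on the NFS mount.
path <- tempfile(fileext = ".db")
con <- dbConnect(SQLite(), dbname = path)
dbWriteTable(con, "t", data.frame(x = 1:3))
dbDisconnect(con)

# Reopen read-only; use the "unix-none" VFS (no file locking) where supported.
vfs <- if (.Platform$OS.type == "unix") "unix-none" else NULL
ro <- dbConnect(SQLite(), dbname = path, flags = SQLITE_RO, vfs = vfs)

# Reads work as usual.
n <- dbGetQuery(ro, "SELECT COUNT(*) AS n FROM t")$n

# Writes should be rejected because the connection is read-only.
write_failed <- inherits(
  try(dbExecute(ro, "DELETE FROM t"), silent = TRUE),
  "try-error"
)
dbDisconnect(ro)
```

For the real data, `path` would simply be the file's location on the mount, such as the `/mnt_nfs/...` path from the question.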
I want to connect R to SQL Server so I can export some R data frames as tables to SQL Server.
From a few online tutorials, I've seen that they use the RODBC package, and it seems that you first need to create an ODBC data source name by going to ODBC Data Sources (64-bit) > System DSN > Add > SQL Server Native Client 11.0 and then entering your specifications.
I have no idea how databases are managed, so forgive my ignorance here.. my question is: if there is already a database/server set up on SQL Server, particularly also where I want to export my R data to, do I still need to do this?
For instance, when I open Microsoft SQL Server Management Studio, I see the following:
Server type: Database Engine
Server name: example.server.myorganization.com
Authentication: SQL Server Authentication
Login: organization_user
Password: organization_password
After logging in, I can access a database called "Organization_Division_DBO" > Tables which is where I want to upload my data from R as a table. Does this mean the whole ODBC shebang is already setup for me, and I can skip the steps mentioned here where an ODBC needs to be set up?
Can I instead use the code shown here:
library(sqldf)
library(DBI)
library(odbc)
con <- dbConnect(odbc(),
                 Driver = "SQL Server",
                 Server = "example.server.myorganization.com",
                 Database = "Organization_Division_DBO",
                 UID = "organization_user",
                 PWD = "organization_password")
dbWriteTable(conn = con,
             name = "My_R_Table",
             value = x) ## x is any data frame I have in R
I note that on this page they use a similar code to above (what is port number?) and also there is some mention "that there is also support for DSNs", so I am a little confused. Also, is there any advantage/disadvantage over using the ODBC package over the RODBC package to do this?
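Passing `Driver`, `Server`, and the other fields directly to `dbConnect()` as above is a DSN-less connection, so a separate DSN should not be needed as long as the named ODBC driver is installed on the machine. Since `dbWriteTable()` is a DBI generic, the write step can be tried locally first. The sketch below uses an in-memory SQLite database purely as a stand-in for SQL Server (an assumption for illustration); the data frame `x` is made up.

```r
library(DBI)

# Stand-in backend: in-memory SQLite instead of SQL Server (illustration only).
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Any data frame will do; this one is invented for the example.
x <- data.frame(id = 1:3, label = c("a", "b", "c"), stringsAsFactors = FALSE)

# Write the data frame as a table, replacing it if it already exists.
dbWriteTable(con, name = "My_R_Table", value = x, overwrite = TRUE)

# Confirm the table is there and read it back.
tables <- dbListTables(con)
back <- dbReadTable(con, "My_R_Table")
dbDisconnect(con)
```

With a working odbc connection, the same `dbWriteTable()` and `dbReadTable()` calls should apply to the SQL Server connection object.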
I am trying to connect to a MySQL database through RMySQL but get the following error
"Error in .local(drv, ...) :
Failed to connect to database: Error: Unknown database 'XXX'"
Has anyone had a similar issue and was able to resolve it?
Running
macOS High Sierra, Version 10.13.6
MySQL workbench 8.0
RStudio Version 1.1.453
I constructed the SQL driver as follows:
install.packages("RMySQL")
install.packages("dbConnect")
library(DBI)
library(dbConnect)
con <- dbConnect(RMySQL::MySQL(),
dbname = "xxx",
host = "xxx",
port = xxx,
user = "xxx",
password = "xxx")
I've been following Filip Schouwenaars' datacamp course Importing Data in R (https://www.datacamp.com/courses/importing-data-in-r-part-1) and was hoping to establish a connection to the SQL database and create an MySQLConnection object to then run SQL queries from inside R.
The problem is that I get stuck at the very beginning because of the failure to connect to database. In MySQLworkbench, the script opens and looks great. I'm a complete newbie at this, and am wondering whether this may have something to do with the location of the database file itself? Should I be saving it in a specific folder?
PS: I've read through all RMySQL threads on here and could not find a solution; if I missed something, please let me know. This is my first ask on this forum, and I'm both super grateful for the community here but also worried that I missed something, somewhere. THANK YOU for your help.
Solution: I did in fact not have a 'database' (or schema, as they are now called in MySQL), but merely an .sql file. Once I created a database from the file, it worked like a charm!
For other newbies out there, especially in the humanities, this was a very helpful tutorial on how to set up MySQL with R: https://programminghistorian.org/en/lessons/getting-started-with-mysql-using-r
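One way to check for this situation from R is to connect to the server without naming a database and list the schemas it actually has. This is a sketch only: `list_mysql_databases` is a hypothetical helper, and the connection details are whatever you would normally pass to `dbConnect()`.

```r
# Sketch: list the databases (schemas) visible on a MySQL server.
# Connects without a dbname, so it works even if the target schema is missing.
list_mysql_databases <- function(host, port, user, password) {
  con <- DBI::dbConnect(RMySQL::MySQL(), host = host, port = port,
                        username = user, password = password)
  on.exit(DBI::dbDisconnect(con))  # always release the connection
  DBI::dbGetQuery(con, "SHOW DATABASES")$Database
}
```

If the name passed as `dbname` does not appear in the returned vector, the "Unknown database" error is expected, and the schema has to be created (for example by running the .sql file in MySQL Workbench) before connecting to it.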
Can I import data directly from a Snowflake database into R? I was able to do this by creating an ODBC connection with my Snowflake credentials; however, my company switched Snowflake to single sign-on and I'm unable to get it to work.
You can certainly connect to Snowflake from R, and I would highly recommend the RJDBC library. There are two requirements: install the RJDBC package and download Snowflake's JDBC jar file (keep the jar somewhere on your drive where it cannot be deleted or moved accidentally). You may pick any version of the jar, for example 3.6.6.
# load library
library(RJDBC)
# specify driver
jdbcDriver <- JDBC(driverClass="net.snowflake.client.jdbc.SnowflakeDriver",
classPath="/home/username/R/snowflake-jdbc-3.6.6.jar") # <-- this is where I saved the jar file
# create a connection
# this is the most critical part.
# you have to make sure you enter your SSO path as well as corp username with domain
con <- dbConnect(jdbcDriver, "jdbc:snowflake://company.us-east-1.snowflakecomputing.com/?authenticator=https://your_domain_name.okta.com/",
'username#domain.com', 'password')
# to query data
# at this point, you are good to go. start querying data.
dbGetQuery(con, "select current_timestamp() as now")
We support OKTA single sign on from ODBC. Please follow https://docs.snowflake.net/manuals/user-guide/odbc-parameters.html for steps to configure your ODBC DSN.
I am trying to connect R to Teradata and am not sure what the input items are to the RODBC::odbcDriverConnect(). There is a teradataR package, but it is only used with R versions 3 and under, which I neither have nor want to switch to. Below is a list of the input parameters to get ODBCDriverConnect to work. "Connection" I believe is most important. I need to get an address for a driver that I don't even know if I have. This is what I need most help with. How do I get a driver for Teradata to connect to R? IT at my work is not sure how to do this. Also, if anyone knows of another way to connect Teradata to R (some other package?), please let me know.
connection = ""
case
believeNRows = TRUE
colQuote, tabQuote = colQuote
interpretDot = TRUE
DBMSencoding = "",
rows_at_time = 100
readOnlyOptimize = FALSE
Thank you for your help!
I was able to connect R to Teradata using RODBC package. Here is how to do it if you are working on a pc and have a Teradata driver.
Set up DSN:
Go to: control panel-> administrative tools -> Data Sources (ODBC) -> User DSN tab -> click add-> select Teradata driver (or whatever driver you will be using. ie. could be sql) and press finish.
A box will pop up that needs to be filled in. The following fields need to be filled:
Name: Can be any name you would like. I chose TeraDataRConnection, for example.
Name or IP address (DBC name or address): Mine for example is: Databasename.companyname.com. I looked to see how Microsoft access was connected to the database and in doing that, found the DBC address.
Username: username that you use to connect to database.
Password: password used to connect to the database (if you don't put your password in here, you will have to manually type it into R every time you connect).
In R:
Download RODBC package
library(RODBC)
ch=odbcConnect("TeraDataRConnection", uid="USERNAME HERE",pwd="PASSWORD HERE")
If you want to confirm you are connected, you can run this code to see the tables:
ListOfTables=sqlTables(ch,tableType="TABLE")
That's it!
I am able to connect to Teradata and created a Shiny app which reads data from it.
First, install the RODBC package in R. It requires R version 4.0.0 or later. No admin access is required to upgrade R, even on enterprise laptops.
Follow the steps below to set up the connection.
Create an ODBC data source to connect to Teradata. The data source must be 64-bit or 32-bit to match your R installation.
Use the code snippet below to get the data into a reactive variable:
data <- reactive({
  ch <- odbcConnect(dsn = "DSNName", uid = "username", pwd = "password")
  on.exit(odbcClose(ch))  # close the connection once the query has run
  sqlQuery(ch, "select * from emp")
})
DSNName - Name of DSN connection created
You can use data() to display and use the value stored in it.
Enjoy!