Setup Global Options of RMySQL to use with sqldf in R

I am having a lot of problems with column encoding when using sqldf with RPostgreSQL (when I run queries on datasets that contain Spanish-language characters, they come back as strange characters). For this reason I want to move to MySQL, but I find its documentation on use with sqldf too confusing. I need to know how to enter my database parameters so that sqldf works against my previously configured database.
Example for PostgreSQL:
options(sqldf.RPostgreSQL.user = "postgres",
        sqldf.RPostgreSQL.password = "test",
        sqldf.RPostgreSQL.dbname = "postgres",
        sqldf.RPostgreSQL.host = "localhost",
        sqldf.RPostgreSQL.port = 5432)
Does anyone know how I can check the options of a specific package, in this case RMySQL?
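For reference, the options currently set for a package can be inspected from base R by searching the names returned by options(); a quick sketch:

# List any currently set options whose names mention sqldf, and their values
grep("sqldf", names(options()), value = TRUE)
options()[grep("sqldf", names(options()))]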

Finally, after at least three hours of tortuous reading across multiple articles, I found a way to configure a MySQL database for use with sqldf without dying in the attempt. I must warn you that this is the only way that works: if you create a connection with RMySQL yourself and then pass that connection as a parameter to sqldf, you will get multiple connection errors.
Create a my.cnf file
Here is everything necessary for it to work. The group must be named rs-dbi, because that is the default group sqldf reads when performing queries. In addition, the MySQL server must be set up with the old native authentication method; if you use the newer encrypted authentication it will generate errors, because the sqldf drivers do not support it.
The local-infile parameters must be added to allow loading local data into the MySQL database; otherwise it will generate errors.
[rs-dbi]
user = root
password = test
port = 3310
default-character-set = latin1
host = 127.0.0.1
database = test
local-infile

[mysql]
local-infile

[mysqld]
local_infile = 1
Then just call sqldf(x = test_query, drv = "RMySQL").
Enjoy. Keep in mind that the encoding of the databases you query must match the default set in the rs-dbi group (default-character-set above) to avoid encoding errors when running queries.
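To make the call concrete, here is a minimal sketch; the data frame and query are hypothetical. sqldf uploads any referenced data frame to the MySQL database named in the [rs-dbi] group, runs the query there, and cleans up afterwards:

library(sqldf)
library(RMySQL)

# Hypothetical data with Spanish characters, to check the encoding survives
clientes <- data.frame(nombre = c("José", "María"), ciudad = c("Bogotá", "Lima"))
test_query <- "SELECT * FROM clientes WHERE ciudad = 'Bogotá'"
sqldf(x = test_query, drv = "RMySQL")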

The documentation is in ?sqldf. (You will need to refer to the MySQL documentation and the RMySQL R package for details; this could vary depending on which version of MySQL you are using.) From ?sqldf:
On MySQL the database must pre-exist. Create a c:\my.ini or
%MYSQL_HOME%\my.ini file on Windows or a /etc/my.cnf file on UNIX to
contain information about the database. This file may specify the
username, password and port. The password can be omitted if one has
not been set. If using a standard port setup then the port can be
omitted as well. The database is taken from the dbname argument of the
sqldf command or if not set from getOption("sqldf.dbname") or if that
option is not set it is assumed to be "test". Note that MySQL does not
use the user, password, host and port arguments of sqldf. See
http://dev.mysql.com/doc/refman/5.6/en/option-files.html for
additional locations that the configuration files can be placed as
well as other information.
Also, you could try using SQLite instead (it is the default backend and requires no installation). This works for me on Windows using the default SQLite backend:
library(sqldf)
d <- data.frame(x = "el perro saltó sobre el zorro perezoso")
sqldf("select * from d")
## x
## 1 el perro saltó sobre el zorro perezoso

Related

DBConnect to open SQLite in read-only mode in R

I'm working on converting a simple Python script to R to act as a template for connecting to SQLite files. The data is on an NFS mount, and we ran into a few snags in setting up the original Python template (namely, I/O errors), but we were able to work around them by connecting in read-only mode and setting the VFS to unix-none, e.g.:
# Python version
import sqlite3

path = "file:///mnt_nfs/examplepath.edu/path/file.db"
connect = sqlite3.connect(path + "?mode=ro&vfs=unix-none", uri=True)
cur = connect.cursor()
While we know this is far from a perfect solution, it's acting as an interim solution while we set up a more robust database (so our users can still connect to their data in the meantime). However, most of our students are more familiar with R than Python, and I'm having difficulty finding how to recreate the workarounds in R. Is there some way to set dbConnect to include the read-only and unix-none arguments (or equivalent)?
I have the basics, but it's throwing the same disk I/O error the Python code threw before we added the arguments, and I can't seem to find any information on these options in the DBI documentation.
connect <- dbConnect(RSQLite::SQLite(), path)
I think it is
dbConnect(SQLite(), dbname = path, flags = SQLITE_RO, vfs = "unix-none")
documented at
library(RSQLite)
?`dbConnect,SQLiteDriver-method`
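Putting it together with the path from the question, a minimal sketch (note that RSQLite takes a plain file path for dbname, not a file:// URI):

library(RSQLite)

path <- "/mnt_nfs/examplepath.edu/path/file.db"
con <- dbConnect(SQLite(), dbname = path, flags = SQLITE_RO, vfs = "unix-none")
dbListTables(con)  # quick check that the read-only connection works
dbDisconnect(con)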

When connecting R to Microsoft SQL Server, do you have to use a DSN?

I want to connect R to SQL Server so I can export some R data frames as tables to SQL Server.
From a few online tutorials I've seen, they use the RODBC package, and it seems that you first need to create an ODBC data source name (DSN) by going to ODBC Data Sources (64-bit) > System DSN > Add > SQL Server Native Client 11.0 and then entering your specifications.
I have no idea how databases are managed, so forgive my ignorance here. My question is: if there is already a database/server set up on SQL Server, specifically the one I want to export my R data to, do I still need to do this?
For instance, when I open Microsoft SQL Server Management Studio, I see the following:
Server type: Database Engine
Server name: example.server.myorganization.com
Authentication: SQL Server Authentication
Login: organization_user
Password: organization_password
After logging in, I can access a database called "Organization_Division_DBO" > Tables which is where I want to upload my data from R as a table. Does this mean the whole ODBC shebang is already setup for me, and I can skip the steps mentioned here where an ODBC needs to be set up?
Can I instead use the code shown here:
library(DBI)
library(odbc)

con <- dbConnect(odbc(),
                 Driver = "SQL Server",
                 Server = "example.server.myorganization.com",
                 Database = "Organization_Division_DBO",
                 UID = "organization_user",
                 PWD = "organization_password")

dbWriteTable(conn = con,
             name = "My_R_Table",
             value = x)  # x is any data frame I have in R
I note that on this page they use similar code to the above (what is the port number?), and there is also some mention that "there is also support for DSNs", so I am a little confused. Also, is there any advantage or disadvantage to using the odbc package over the RODBC package for this?

How to connect to parquet files in Azure Blob Storage with arrow::open_dataset?

I am open to other ways of doing this. Here are my constraints:
I have parquet files in a container in Azure Blob Storage
These parquet files will be partitioned by a product id, as well as the date (year/month/day)
I am doing this in R, and want to be able to connect interactively (not just set up a notebook in Databricks, though that is something I will probably want to figure out later)
Here's what I am able to do:
I understand how to use arrow::open_dataset() to connect to a local parquet directory: ds <- arrow::open_dataset(filepath, partitioning = "product")
I can connect to, view, and download from my blob container with the AzureStor package. I can download a single parquet file this way and turn it into a data frame:
blob <- AzureStor::storage_endpoint("{URL}", key="{KEY}")
cont <- AzureStor::storage_container(blob, "{CONTAINER-NAME}")
parq <- AzureStor::storage_download(cont, src = "{FILE-PATH}", dest = NULL)
df <- arrow::read_parquet(parq)
What I haven't been able to figure out is how to use arrow::open_dataset() to reference the parent directory of {FILE-PATH}, where I have all the parquet files, using the connection to the container that I'm creating with AzureStor. arrow::open_dataset() only accepts a character vector as the "sources" parameter. If I just give it the URL with the path, I'm not passing any kind of credential to access the container.
Unfortunately, you probably are not going to be able to do this today purely from R.
Arrow-R is built on Arrow-C++, and Arrow-C++ does not yet have a filesystem implementation for Azure. There are JIRA tickets (ARROW-9611, ARROW-2034) for creating one, but these tickets are not in progress at the moment.
In Python it is possible to create a filesystem purely in Python using the fsspec adapter. Since there is a Python SDK for Azure Blob Storage, it should be possible to do what you want today in Python.
Presumably something similar could be created for R but you would still need to create the R equivalent of the fsspec adapter and that would involve some C++ code.
If you use Azure Synapse, you can connect to your data with ODBC as if it were a SQL Server database, and it has support for partitioning and other file types as well. The pricing, from what I recall, is something like $5/month fixed plus $5/TB queried.
Querying data would look something like this:
library(odbc)

syncon <- dbConnect(odbc(),
                    Driver = "SQL Server Native Client 11.0",
                    Server = "yourname-ondemand.sql.azuresynapse.net",
                    Database = "dbname",
                    UID = "sqladminuser",
                    PWD = rstudioapi::askForPassword("Database password"),
                    Port = 1433)

somedata <- dbGetQuery(syncon, r"---{
SELECT top 100
    result.filepath(1) as year,
    result.filepath(2) as month,
    *
FROM OPENROWSET(
    BULK 'blobcontainer/directory/*/*/*.parquet',
    DATA_SOURCE = 'blobname',
    FORMAT = 'parquet'
) as [result]
order by node, pricedate, hour}---")
The filepath keyword refers to the wildcards in the BULK path: filepath(1) returns the directory name matched by the first *, filepath(2) the second, and so on.
Here's the MS documentation: https://learn.microsoft.com/en-us/azure/synapse-analytics/sql/query-specific-files
You can also make views so that people who like SQL but not parquet files can query the views without having to know anything about the underlying data structure; to them it will just look like a SQL Server database.
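For example, a view over the same OPENROWSET query could be created through the same connection; a sketch with hypothetical names, not part of the original setup:

# Wrap the OPENROWSET query in a view so SQL users can query dbo.prices directly
dbExecute(syncon, r"---{
CREATE VIEW dbo.prices AS
SELECT
    result.filepath(1) as year,
    result.filepath(2) as month,
    *
FROM OPENROWSET(
    BULK 'blobcontainer/directory/*/*/*.parquet',
    DATA_SOURCE = 'blobname',
    FORMAT = 'parquet'
) as [result]}---")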

R connect to database

Sorry, but I am failing at a very simple task right now.
I have the following database information:
database name
hostname
port
SID
TNS
User ID
password
I want to build a connection with the RODBC package.
According to the results of my Google search, I should do
conn <- odbcConnect(dsn, uid = ***, pwd = ***)
What is "dsn"? Is this even the right way?
dsn is the Data Source Name, a shortcut you can define on your machine to store key information about the connection. How you set up a DSN varies depending on your operating system.
I write scripts that run on multiple machines, so rather than use a DSN, I use odbcDriverConnect, via something like
odbcDriverConnect(connection = "driver=[driver]; server=[server]; database=[database]; uid=[User ID]; pwd=[password]")
You'll need to know your driver name to make this work. Where to find this will depend on your operating system, as well as the flavor of SQL you are using.
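Since your list of details includes an SID and TNS entry, which suggests Oracle, a filled-in sketch might look like the following; the driver name and every value are placeholders to replace with what is actually installed on your machine:

library(RODBC)

# Hypothetical Oracle-style connection string; check the exact driver name in
# your OS's ODBC data source administrator before using it.
conn <- odbcDriverConnect(
  connection = "driver={Oracle ODBC Driver}; dbq=hostname:port/SID; uid=myuser; pwd=mypassword"
)
sqlTables(conn)  # quick sanity check that the connection works
odbcClose(conn)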

R and odbcDriverConnect() to connect R to teradata

I am trying to connect R to Teradata and am not sure what the inputs to RODBC::odbcDriverConnect() should be. There is a teradataR package, but it only works with R versions 3 and under, which I neither have nor want to switch to. Below is the list of input parameters for odbcDriverConnect. I believe "connection" is the most important one: I need an address for a driver that I don't even know if I have, and that is what I need the most help with. How do I get a Teradata driver to connect to R? IT at my work is not sure how to do this. Also, if anyone knows of another way to connect Teradata to R (some other package?), please let me know.
connection = ""
case
believeNRows = TRUE
colQuote, tabQuote = colQuote
interpretDot = TRUE
DBMSencoding = ""
rows_at_time = 100
readOnlyOptimize = FALSE
Thank you for your help!
I was able to connect R to Teradata using the RODBC package. Here is how to do it if you are working on a PC and have a Teradata driver.
Set up DSN:
Go to: Control Panel -> Administrative Tools -> Data Sources (ODBC) -> User DSN tab -> click Add -> select the Teradata driver (or whatever driver you will be using, e.g. SQL Server) and press Finish.
A box will pop up that needs to be filled in. The following fields need to be filled:
Name: Can be any name you would like. I chose TeraDataRConnection, for example.
Name or IP address (DBC name or address): Mine, for example, is Databasename.companyname.com. I looked at how Microsoft Access was connected to the database and found the DBC address that way.
Username: the username you use to connect to the database.
Password: the password you use to connect to the database. (If you don't put your password in here, you will have to type it into R manually every time you connect.)
In R:
Install and load the RODBC package:
library(RODBC)
ch <- odbcConnect("TeraDataRConnection", uid = "USERNAME HERE", pwd = "PASSWORD HERE")
If you want to confirm you are connected, you can run this code to see the tables:
ListOfTables <- sqlTables(ch, tableType = "TABLE")
That's it!
I am able to connect to Teradata and have created a Shiny app which reads data from it.
First we need to install the RODBC package in R. The prerequisite is R (≥ 4.0.0); no admin access is required to upgrade R, even on enterprise laptops.
Follow the steps below to set up the connection successfully.
Create an ODBC data source to connect to Teradata. The DSN must match your R build's architecture (64-bit or 32-bit).
Use the code snippet below to load the data into a reactive variable:
data <- reactive({
  ch <- odbcConnect(dsn = "DSNName", uid = "username", pwd = "password")
  on.exit(odbcClose(ch))  # close the connection once the query has run
  sqlQuery(ch, "select * from emp")
})
DSNName - the name of the DSN connection you created.
You can call data() wherever you need to display or use the value stored in it.
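For instance, inside the Shiny server function the reactive could feed an output like this (a sketch; empTable is a hypothetical output ID):

output$empTable <- renderTable({
  data()  # re-runs the query when the reactive invalidates
})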
Enjoy!
