R convert query output to table

New to R programming.
I have a simple SQL Server query whose output looks like this:
EFFECTIVE_DATE  NumberOfUser
2015-07-01      564
2015-07-02      433
2015-07-03      306
2015-07-04      50
Here's how I issue the query:
barData <- sqlQuery(sqlCon,
  "select EFFECTIVE_DATE, COUNT(USER_ID) as NumberOfUser
   from UserTable
   where start_dt between '20150701' and '20150704'
   group by EFFECTIVE_DATE
   order by EFFECTIVE_DATE")
Now I am running this query from R and want to do a barplot on the result. What is the best way to do that?
Also, how do I convert any query result to a data.table with which I can do a barplot? When I try table(myList), it shows a different format altogether.

The help on sqlQuery (I don't use ODBC generally) says "On success, a data frame…" is returned. That would mean you should be able to do something like:
barplot(barData$NumberOfUser, names.arg=barData$EFFECTIVE_DATE,
xlab="Effective Date", ylab="Number of Users")
But posting the output of a dput(barData) into your question would really make it easier to help you.

Assuming that you used the sqldf package in R, an SQL query of the form SELECT EFFECTIVE_DATE, NUM_OF_USERS FROM USERTABLE is executed using the sqldf(x, stringsAsFactors = FALSE, ...) statement:
sql_string <- "select
effective_date
, num_of_users
from USRTABLE"
user_dates <- sqldf(join_string,stringsAsFactors = TRUE)
resulting in a data.frame object. Use the data.table package to convert the data frame into a data table:
library(data.table)
user_dates <- as.data.table(user_dates)
The sqldf call creates a new data frame, user_dates. At minimum, sqldf requires a character string containing the SQL statement to be executed. Setting stringsAsFactors = FALSE keeps categorical variables as class character rather than converting them to factor.
EDIT: Sincere apologies, I didn't see that you had stated the package name in the question. In case you decide to use the sqldf package, creating a bar plot is a straightforward call to the barplot(height, ...) function:
barplot(user_dates$num_of_users, names.arg = user_dates$effective_date)
Also, please note that the result of sqlQuery on successful execution is a data frame and not a list:
"On success, a data frame (possibly with 0 rows) or character string. On error, if errors = TRUE, a character vector of error message(s), otherwise an invisible integer error code -1 (general, call odbcGetErrMsg for details) or -2 (no data, which may not be an error as some SQL statements do return no data)."
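To tie that back to the original question: since the value returned by sqlQuery is already a data frame, converting it to a data.table is a single call. A minimal sketch, reusing barData from above:
library(data.table)
class(barData)                    # "data.frame"
barDT <- as.data.table(barData)   # now usable with data.table syntax as well
barplot(barDT$NumberOfUser, names.arg = as.character(barDT$EFFECTIVE_DATE))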

Related

creating a looped SQL QUERY using RODBC in R

First and foremost, thank you for taking your time to view my question, regardless of whether you answer or not!
I am trying to create a function that loops through my df and queries the necessary data from SQL using the RODBC package in R. However, I am having trouble setting up the query, since the parameters of the query change on each iteration (example below).
So my df looks like this:
ID  Start_Date  End_Date
1   2/2/2008    2/9/2008
2   1/1/2006    1/1/2007
1   5/7/2010    5/15/2010
5   9/9/2009    10/1/2009
How would I go about specifying the start date and end date in my SQL query?
Here's what I have so far:
data_pull <- function(df) {
  a <- data.frame()
  b <- data.frame()
  for (i in df$id)
  {
    dbconnection <- odbcDriverConnect(".....")
    query <- paste("Select ID, Date, Account_Balance from Table where ID = (",i,") and Date > (",df$Start_Date,") and Date <= (",df$End_Date,")")
    a <- sqlQuery(dbconnection, paste(query))
    b <- rbind(b,a)
  }
  return(b)
}
However, this doesn't pull in anything. I believe it has something to do with how I am specifying the start and end dates for each iteration.
If anyone can help on this it would be greatly appreciated. If you need further explanation, please don't hesitate to ask!
A couple of syntax issues arise from the current setup:
LOOP: You do not iterate through the rows of the data frame but only through the atomic ID values in the single column df$ID. In that same loop you pass the entire df$Start_Date and df$End_Date vectors into the query concatenation.
DATES: Your date format does not align with the 'YYYY-MM-DD' format most databases expect. Some databases, such as Oracle, additionally require explicit string-to-date conversion: TO_DATE(mydate, 'YYYY-MM-DD').
A couple of performance / best-practice issues:
PARAMETERIZATION: While parameterization is not strictly needed for security here, since your values do not come from user input that could inject malicious SQL, parameterized queries are still advised for maintainability and readability. Hence, consider doing so.
GROWING OBJECTS: According to Patrick Burns' The R Inferno, Circle 2: Growing Objects, R programmers should avoid growing multi-dimensional objects such as data frames inside a loop, which causes excessive copying in memory. Instead, build a list of data frames and rbind them once outside the loop, as sketched below.
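A sketch of that list-then-rbind pattern only, assuming the df shown in the question and an already open RODBC connection named dbconnection; it also formats the dates per the point above:
# build one data frame per row of df, then bind once outside the loop
df_list <- lapply(seq_len(nrow(df)), function(i) {
  qry <- sprintf(
    "SELECT ID, Date, Account_Balance FROM Table
     WHERE ID = %s
       AND Date >  '%s'
       AND Date <= '%s'",
    df$ID[i],
    format(as.Date(df$Start_Date[i], "%m/%d/%Y"), "%Y-%m-%d"),
    format(as.Date(df$End_Date[i],   "%m/%d/%Y"), "%Y-%m-%d"))
  sqlQuery(dbconnection, qry)
})
final_df <- do.call(rbind, df_list)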
With that said, you can avoid any looping or list building by saving your data frame as a database table and then joining it to the final table for a filtered query import. This assumes your database user has CREATE TABLE and DROP TABLE privileges.
# CONVERT DATE FIELDS TO DATE TYPE
df <- within(df, {
  Start_Date = as.Date(Start_Date, format="%m/%d/%Y")
  End_Date = as.Date(End_Date, format="%m/%d/%Y")
})
# SAVE DATA FRAME TO DATABASE
sqlSave(dbconnection, df, "myRData", rownames = FALSE, append = FALSE)
# IMPORT JOINED AND DATE-FILTERED QUERY
q <- "SELECT t.ID, t.Date, t.Account_Balance
      FROM Table t
      INNER JOIN myRData r
        ON r.ID = t.ID
       AND t.Date BETWEEN r.Start_Date AND r.End_Date"
final_df <- sqlQuery(dbconnection, q)
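If the staging table is only needed for this one import, it can be dropped again afterwards with RODBC's sqlDrop (same dbconnection assumed):
# clean up the staging table created by sqlSave
sqlDrop(dbconnection, "myRData")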

RODBC (SQL Server) giving inconsistent results for long character fields converted to numeric [duplicate]

I'm trying to import a SQL Server table into R. The first column of this table is a 17-digit ID.
library(RODBC)
channel <- odbcConnect("my_db", uid="my_id", pwd="my_pw")
options(digits=22)
sqlQuery(channel, "select ID from dbo.my_table where ID = 10000000047974745")
Output:
ID
1 10000000047974744
As you can see the last digit is 4 instead of 5.
I've tried to use cast(ID as char) in the select, but the result is the same. What could I do?
As joran said, using as.is = TRUE as an argument to sqlQuery() solves the problem.
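For illustration, a sketch of that call reusing the channel and query from the question; with as.is = TRUE the ID comes back as a character string, so the last digit is preserved:
res <- sqlQuery(channel,
                "select ID from dbo.my_table where ID = 10000000047974745",
                as.is = TRUE)
str(res$ID)   # character, so no loss of precision in the 17-digit ID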

looping a gsub to pull from hana based on a table of values

I hope my title makes sense! If not feel free to edit it.
I have a table in R that contains unique dates. Sometimes this table has one date, at other times it has multiple dates. I would like to loop these unique dates into a SQL query I have created to pull data and append it to px_tbl. I am at a loss, however, as to where to start. Below is what I have so far; it obviously works when I have only one unique date, but when the table contains two dates it doesn't pull.
unique_dates_df
DATE
2016-12-15
2017-02-15
2017-03-02
2017-03-09
sqlCMD_px <- 'SELECT *
FROM "_SYS_BIC"."My.Table/PRICE"
(\'PLACEHOLDER\' = (\'$$P_EFF_DATE$$\',\'%D\'))'
# the gsub is needed so that the dates are formatted correctly for the SQL pull
sqlCMD_px <- gsub("%D", unique_dates_tbl, sqlCMD_px)
px_tbl <- sqlQuery(myconn, sqlCMD_px)
I am convinced that an apply function will work in one form or another but haven't been able to figure it out. Thanks for the help!
This should work:
# SQL command template
sqlCmdTemp <- 'SELECT *
FROM "_SYS_BIC"."My.Table/PRICE"
(\'PLACEHOLDER\' = (\'$$P_EFF_DATE$$\',\'%D\'))'
# dates as character
unique_dates <- c("2017-03-08", "2017-03-09", "2017-03-10")
# run the query once per date, substituting each date into the template
res <- sapply(unique_dates,
              function(d) sqlQuery(conn, gsub("%D", d, sqlCmdTemp)),
              simplify = FALSE)
# bind rows into a single data frame
tbl.df <- do.call(rbind, res)
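To connect this back to the question, the dates can come straight from unique_dates_df, and the result bound into px_tbl (assuming the DATE column holds the values shown above and myconn is the open connection):
unique_dates <- as.character(unique_dates_df$DATE)
res <- sapply(unique_dates,
              function(d) sqlQuery(myconn, gsub("%D", d, sqlCmdTemp)),
              simplify = FALSE)
px_tbl <- do.call(rbind, res)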

R Mysql help-need to do a calculation from variables in different tables

I'm (very!) new to R and mysql and I have been struggling and researching for this problem for days. So I would really appreciate ANY help.
I need to complete a mathematical expression from 2 variables in two different tables. Essentially, I'm trying to figure out how old a subject was (DOB is in one table) when they were serviced (date of service is in another table). I have an identifying variable that is the same in both.
I have tried merging these:
age<-merge("tbl1", "tbl2", by=c("patient_id") all= TRUE)
this returns:
Error in fix.by(by.x, x) : 'by' must specify a uniquely valid column
I have tried subsetting so that I just keep the variables of interest, but it is not working because I believe subsetting only works for numbers, not characters... right?
Again, I would appreciate any help. Thanks in advance
Since you are new to databases, I think you should use dplyr here. It is an abstraction layer over many database back ends, so you will not have to deal with database-specific problems. Here I show a simple example where:
I read the tables from MySQL
I merge the tables, assuming they share a unique identifying variable
The code:
library(dplyr)
library(RMySQL)
## create a connection
SDB <- src_mysql(host = "localhost", user = "foo", dbname = "bar", password = getPassword())
# reading tables
tbl1 <- tbl(SDB, "TABLE1_NAME")
tbl2 <- tbl(SDB, "TABLE2_NAME")
## merge : this step can be done using dplyr also
age <- merge(tbl1, tbl2, all= TRUE)
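From there, the age calculation the question asks about can be done on the merged result; a minimal sketch, assuming hypothetical column names dob (from one table) and service_date (from the other):
# hypothetical column names; adjust to your actual tables
age$dob             <- as.Date(age$dob)
age$service_date    <- as.Date(age$service_date)
age$age_at_service  <- as.numeric(age$service_date - age$dob) / 365.25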

RODBC sqlQuery as.is returning bad results

I'm trying to import an Excel worksheet into R. I want to retrieve a (character) ID column and a couple of date columns from the worksheet. The following code works fine but brings one column in as a date and not the other. I think it has something to do with more of the leading cells being empty in the second date column.
dateFile <- odbcConnectExcel2007(xcelFile)
query <- "SELECT ANIMALID, ST_DATE_TIME, END_DATE_TIME FROM [KNWR_CL$]"
idsAndDates <- sqlQuery(dateFile,query)
So my plan now is to bring in the date columns as character fields and convert them myself using as.POSIXct. However, the following code produces only a single row in idsAndDates.
dateFile <- odbcConnectExcel2007(xcelFile)
query <- "SELECT ANIMALID, ST_DATE_TIME, END_DATE_TIME FROM [KNWR_CL$]"
idsAndDates <- sqlQuery(dateFile,query,as.is=TRUE,TRUE,TRUE)
What am I doing wrong?
I had to move on and ended up using the gdata library (which worked). I'd still be interested in an answer for this though.
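Not a verified fix, but one thing worth trying: as.is also accepts a per-column logical vector, so the two date columns could be kept as character and then converted with as.POSIXct as originally planned:
dateFile <- odbcConnectExcel2007(xcelFile)
query <- "SELECT ANIMALID, ST_DATE_TIME, END_DATE_TIME FROM [KNWR_CL$]"
# leave only the two date columns as-is (character)
idsAndDates <- sqlQuery(dateFile, query, as.is = c(FALSE, TRUE, TRUE))
# convert manually; a format argument may be needed depending on the sheet
idsAndDates$ST_DATE_TIME  <- as.POSIXct(idsAndDates$ST_DATE_TIME)
idsAndDates$END_DATE_TIME <- as.POSIXct(idsAndDates$END_DATE_TIME)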
