I'm trying to import an excel worksheet into R. I want to retrieve a (character) ID column and a couple of date columns from the worksheet. The following code works fine but brings one column in as a date and not another. I think it has something to do with more leading columns being empty in the second date field.
dateFile <- odbcConnectExcel2007(xcelFile)
query <- "SELECT ANIMALID, ST_DATE_TIME, END_DATE_TIME FROM [KNWR_CL$]"
idsAndDates <- sqlQuery(dateFile,query)
So my plan now is to bring in the date columns as character fields and convert them myself using as.POSIXct. However, the following code produces only a single row in idsAndDates.
dateFile <- odbcConnectExcel2007(xcelFile)
query <- "SELECT ANIMALID, ST_DATE_TIME, END_DATE_TIME FROM [KNWR_CL$]"
idsAndDates <- sqlQuery(dateFile,query,as.is=TRUE,TRUE,TRUE)
What am I doing wrong?
I had to move on and ended up using the gdata library (which worked). I'd still be interested in an answer for this though.
Related
So I have 2 excels VIN1.xlsx and VIN2.xlsx that I need to compare.
VIN1 excel has a dale column OUTGATE_DT which is populated for atleast 1 rows.
VIN2 excel has a date column OUTGATE_DT which is completely null for all rows.
when I import VIN1.xlsx excel using read_excel, it creates the object, and when I check the OUTGATE_DT column, it says its datatype to be as POSIXct[1:4] (which I assume is correct for Date Column. )
But when I import VIN2.xlsx excel using read_excel, it creates the object, and when I check the OUTGATE_DT column, it says its datatype to be logical[1:4] (it is doing this because this column is entirely empty).
and that is why my compare_df(vin1,vin2) functions failing
stating -
Error in rbindlist(l, use.names, fill, idcol) :
Class attribute on column 80 of item 2 does not match with column 80 of item 1.
I am completely new with R, your help would be highly appreciated. TIA
Please check the screenshot for reference.
You should use read_excel() as the following read_excel(, col_types = "text")
All your columns will be considered as text so you won't have any issue to compare or anything.
Or, if you want to keep the column types in your original df, you can do something like this:
library(dplyr)
library(readxl)
VIN2 <- read_excel(VIN2.xlsx) %>%
mutate(OUTGATE_DT = as.Date(OUTGATE_DT))
then you shouldn't have a problem using rbind or bind_rows from dplyr.
First and foremost - thank you for taking your time to view my question, regardless of if you answer or not!
I am trying to create a function that loops through my df and queries in the necessary data from SQL using the RODBC package in R. However, I am having trouble setting up the query, since the parameter of the query change through each iteration (example below)
So my df looks like this:
ID Start_Date End_Date
1 2/2/2008 2/9/2008
2 1/1/2006 1/1/2007
1 5/7/2010 5/15/2010
5 9/9/2009 10/1/2009
How would I go about specifying the start date and end date in my sql program?
here's what i have so far:
data_pull <- function(df) {
a <- data.frame()
b <- data.frame()
for (i in df$id)
{
dbconnection <- odbcDriverConnect(".....")
query <- paste("Select ID, Date, Account_Balance from Table where ID = (",i,") and Date > (",df$Start_Date,") and Date <= (",df$End_Date,")")
a <- sqlQuery(dbconnection, paste(query))
b <- rbind(b,a)
}
return(b)
}
However, this doesn't query in anything. I believe it has something to do with how I am specifying the start and the end date for the iteration.
If anyone can help on this it would be greatly appreciated. If you need further explanation, please don't hesitate to ask!
A couple of syntax issues arise from current setup:
LOOP: You do not iterate through all rows of data frame but only the atomic ID values in the single column, df$ID. In that same loop you are passing the entire vectors of df$Start_Date and df$End_Date into query concatenation.
DATES: Your date formats do not align to most data base date formats of 'YYYY-MM-DD'. And still some others like Oracle, you require string to data conversion: TO_DATE(mydate, 'YYYY-MM-DD').
A couple of aforementioned performance / best practices issues:
PARAMETERIZATION: While parameterization is not needed for security reasons since your values are not generated by user input who can inject malicious SQL code, for maintainability and readability, parameterized queries are advised. Hence, consider doing so.
GROWING OBJECTS: According to Patrick Burn's Inferno Circle 2: Growing Objects, R programmers should avoid growing multi-dimensional objects like data frames inside a loop which can cause excessive copying in memory. Instead, build a list of data frames to rbind once outside the loop.
With that said, you can avoid any looping or listing needs by saving your data frame as a database table then joined to final table for a filtered, join query import. This assumes your database user has CREATE TABLE and DROP TABLE privileges.
# CONVERT DATE FIELDS TO DATE TYPE
df <- within(df, {
Start_Date = as.Date(Start_Date, format="%m/%d/%Y")
End_Date = as.Date(End_Date, format="%m/%d/%Y")
})
# SAVE DATA FRAME TO DATABASE
sqlSave(dbconnection, df, "myRData", rownames = FALSE, append = FALSE)
# IMPORT JOINED AND DATE FILTERED QUERY
q <- "SELECT ID, Date, Account_Balance
FROM Table t
INNER JOIN myRData r
ON r.ID = t.ID
AND t.Date BETWEEN r.Start_Date AND r.End_Date"
final_df <- sqlQuery(dbconnection, q)
I'm using R to tidy data supplied to me (in a SAS file) so that I can bulk insert it into a SQLserver database. The problem that I'm having is that sometimes numeric fields get transformed by R after I read them in eg.(the leading 0 gets dropped, some numeric fields convert to scientific notation, long ID numbers turn into gibberish after the 15th digit).
Reading all the data into R as character solves these issues. When I'm supplied a csv file I can use data.tables 'fread' function to specify colClasses = 'character' however as far as I'm aware something like this doesnt exist for the 'read_sas' function from the haven package.
Are there any workarounds or extra documentation on how I can better approach and solve this issue?
Edit to highlight issues (left values is numeric and what I want to avoid, right value is as character and what I want):
1.
postcode <- c(0629,'0629')
postcode
[1] "629" "0629"
2.
id <- c(12000000,'12000000')
id
[1] "1.2e+07" "12000000"
3.
options(scipen=999)
id <- c(123123123123123123123123,'123123123123123123123123')
id
[1] "123123123123123117883392" "123123123123123123123123"
How can I import the data directly from SAS so that all columns in the data frame are read in as character data type (in order to avoid data quality issues when I insert into SQLserver)
I hope my title makes sense! If not feel free to edit it.
I have a table in R that contains unique dates. Sometimes this table may have one date at other times it may have multiple dates .I would like to loop these unique dates into SQL query I have created to pull data and append to px_tbl. I am at a loss however where to start. Below is what I have so far and obviously works when I have only 1 unique date however when the table contains 2 dates it doesn't pull.
unique_dates_df
DATE
2016-12-15
2017-02-15
2017-03-02
2017-03-09
sqlCMD_px <- 'SELECT *
FROM "_SYS_BIC"."My.Table/PRICE"
(\'PLACEHOLDER\' = (\'$$P_EFF_DATE$$\',\'%D\'))'
sqlCMD_px <- gsub("%D", unique_dates_tbl, sqlCMD_px)##<- the gsub is needed
so that the dates are formatted correctly for the SQL pull
px_tbl <- sqlQuery(myconn, sqlCMD_px)
I am convinced that an apply function will work in one form or another but haven't been able to figure it out. Thanks for the help!
This should work:
#SQL command template
sqlCmdTemp <- 'SELECT *
FROM '_SYS_BIC'.'My.Table/PRICE'
(\'PLACEHOLDER\' = (\'$$P_EFF_DATE$$\',\'%D\'))'
#Dates as character
unique_dates <- c("2017-03-08","2017-03-09", "2017-03-10")
#sapply command
res<-sapply(unique_dates, function(d) { sqlQuery(conn, gsub("%D",d,sqlCmdTemp))},simplify=F)
#bind rows
tbl.df<-do.call(rbind,res)
There is a large dataset that I need to download over the web using R, but I would like to learn how to filter it at the same time while downloading to the Dates that I need. Right now, I have it setup to download and .unzip and then I create another data set with a filter. The file is a text ";" delimited file
There is a Date column with format 1/1/2009 and I need to only select two dates, 3/1/2009 and 3/2/2009, how to do that in R ?
When I import it, R set it as a factor, since I only need those two dates and there is no need to do a Between, I just select the two factors and call it a day.
Thanks!
I don't think you can filter while downloading. To select only these dates you can use the subset function:
# do not convert string to factors
d.all = read.csv(file, ..., stringsAsFactors = FALSE, sep = ';')
# Date column is called DATE:
d.filter = subset(d.all, DATE %in% c("1/1/2009", "3/1/2009"))