I am trying to use RSQLite to read in tables from my database. All the tables have column names that contain a ".".
For example, my test table has 2 columns: index, first.name
How do I write a query that filters the test table on the first.name column?
My code is:
dbGetQuery(con,"SELECT * FROM test WHERE 'first.name' = 'Joe'")
and it gave me an error:
Error: no such column: first.name
The following should work. Wrap the column name in square brackets:
dbGetQuery(con,"SELECT * FROM test WHERE [first.name] = 'Joe'")
See the below thread:
How to write a column name with dot (".") in the SELECT clause?
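To make the bracket fix concrete, here is a minimal sketch against an in-memory SQLite database. The table contents are made up for illustration, and it assumes the DBI and RSQLite packages are installed. It also shows that standard SQL double quotes work the same way as square brackets in SQLite.

```r
library(DBI)

# In-memory database with a made-up version of the "test" table
con <- dbConnect(RSQLite::SQLite(), ":memory:")
test <- data.frame(index = 1:3,
                   first.name = c("Joe", "Ann", "Joe"),
                   stringsAsFactors = FALSE)
dbWriteTable(con, "test", test)

# Square brackets mark first.name as an identifier
res_brackets <- dbGetQuery(con, "SELECT * FROM test WHERE [first.name] = 'Joe'")

# Double quotes are the standard SQL way to quote an identifier
res_quotes <- dbGetQuery(con, "SELECT * FROM test WHERE \"first.name\" = 'Joe'")

dbDisconnect(con)
```

Both queries should return the same two rows. Single quotes, as in the question, produce a string literal rather than a column reference.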
I am new to R programming.
I have a simple SQL Server query whose output looks like this:
EFFECTIVE_DATE NumberOfUser
2015-07-01 564
2015-07-02 433
2015-07-03 306
2015-07-04 50
Here's how I issue the query:
barData <- sqlQuery(sqlCon,
"select EFFECTIVE_DATE,COUNT(USER_ID) as NumberOfUser from UserTable where start_dt between '20150701' AND '20150704' group by EFFECTIVE_DATE order by EFFECTIVE_DATE")
I am running this query from R and want to draw a barplot from the result. What is the best way to do that?
Also, how do I convert any query result to a data.table that I can use for a barplot? When I try table(myList), it shows a different format altogether.
The help on sqlQuery (I don't use ODBC generally) says "On success, a data frame…" is returned. That would mean you should be able to do something like:
barplot(barData$NumberOfUser, names.arg=barData$EFFECTIVE_DATE,
xlab="Effective Date", ylab="Number of Users")
But posting the output of a dput(barData) into your question would really make it easier to help you.
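In the absence of a dput, here is a small self-contained sketch that rebuilds the four rows shown in the question by hand and plots them. It assumes EFFECTIVE_DATE comes back as a date (or something as.Date can parse); the real column type returned by sqlQuery may differ.

```r
# Hand-built stand-in for the data frame returned by sqlQuery (assumption)
barData <- data.frame(
  EFFECTIVE_DATE = as.Date(c("2015-07-01", "2015-07-02",
                             "2015-07-03", "2015-07-04")),
  NumberOfUser = c(564, 433, 306, 50)
)

# barplot() wants a numeric vector of heights and character labels
mids <- barplot(barData$NumberOfUser,
                names.arg = format(barData$EFFECTIVE_DATE, "%Y-%m-%d"),
                xlab = "Effective Date", ylab = "Number of Users")
```

barplot returns the bar midpoints, which is handy if you later want to add text or lines above the bars.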
Assuming that you used the sqldf package in R, an sql query of the form SELECT EFFECTIVE_DATE, NUM_OF_USERS FROM USERTABLE is executed using the sqldf(x, stringsAsFactors = FALSE,...) statement:
library(sqldf)
sql_string <- "select
effective_date
, num_of_users
from USERTABLE"
user_dates <- sqldf(sql_string, stringsAsFactors = FALSE)
resulting in a data.frame object. Use the data.table package to convert the data frame into a data table:
library(data.table)
user_dates <- as.data.table(user_dates)
The sqldf call creates a new data frame, user_dates. At minimum, sqldf requires a character string containing the SQL operation to perform. Setting stringsAsFactors = FALSE keeps categorical variables as class character rather than converting them to factor.
EDIT : Sincere apologies, didn't see you stating the package name in the question. In case you decide to use the sqldf package, creating a bar plot is a straightforward call to the barplot(height, ...) function:
barplot(user_dates$num_of_users,names.arg=user_dates$effective_date)
Also please note that the result of the sqlQuery on successful execution is a data frame and not a list:
On success, a data frame (possibly with 0 rows) or character string. On error, if errors = TRUE a character vector of error message(s), otherwise an invisible integer error code -1 (general, call odbcGetErrMsg for details) or -2 (no data, which may not be an error as some SQL statements do return no data).
I have a CSV file that has ~1.9M rows and 32 columns. I also have limited RAM, which makes loading it into memory very inconvenient. As a result I am thinking of using a database, but I do not have any intimate knowledge of the subject. I have looked around this site but found no viable solutions so far.
The CSV file looks like this:
Case,Event,P01,P02,P03,P04,P05,P06,P07,P08,P09,P10,P11,P12,P13,P14,P15,P16,P17,P18,P19,P20,P21,P22,P23,P24,P25,P26,P27,P28,P29,P30
C000039,E97553,8,10,90,-0.34176313227395744,-5.581162038780728E-4,-0.12090388100201072,-1.5172412910939355,-0.9075283173030568,2.0571877671625742,-0.002902632819930783,-0.6761896565590585,-0.7258602353522214,0.8684602429202587,0.0023189312896576167,0.002318939470525324,-0.1881462494296103,-0.0014303471592995315,-0.03133299206977217,7.72338072867324E-4,-0.08952068388668191,-1.4536398437657685,-0.020065144945600275,-0.16276139919188118,0.6915962670997067,-1.593412697264055,-1.563877781707804,-1.4921751129092755,4.701551108078644,6,-0.688302560842075
C000039,E23039,8,10,90,-0.3420173545012358,-5.581162038780728E-4,-1.6563770995734233,-1.5386562526752448,-1.3604342580422861,2.1025445031625525,-0.0028504751366762804,-0.6103972392687121,-2.0390388918403284,-1.7249948885013526,0.00231891181914203,0.0023189141684282384,-0.18603688853814693,-0.0014303471592995315,-0.03182759137355937,0.001011754948131039,0.13009444290656555,-1.737249614361576,-0.015763602969926262,-0.16276139919188118,0.7133868949811379,-1.624962995908364,-1.5946762525901037,-1.5362787555380522,4.751479927607516,6,-0.688302560842075
C000039,E23039,35,10,90,-0.3593468363273839,-5.581162038780728E-4,-2.2590624066428937,-1.540784192984501,-1.3651511418164592,0.05539868728273849,-0.00225912499740972,0.20899232681704485,-2.2007336302050633,-2.518401278903022,0.0023189850665203673,0.0023189834133465186,-0.1386548782028836,-0.0013092574968056093,-0.0315006293688149,9.042390365542781E-4,-0.3514180333671346,-1.8007561969675518,-0.008593259125791147,-2.295351187387221,0.6329101442826701,-1.8095530459660578,-1.7748676145152822,-1.495347406256394,2.553693742122162,34,-0.6882806822066699
....
....
upto 1.9 M rows
As you can see, the 'Case' column repeats itself, but I want to get only unique records before importing the data into a data frame. So I used this:
f<-file("test.csv")
bigdf <- sqldf("select * from 'f' where Case in (select Case from 'f' group by Case having count(*) = 1)", dbname = tempfile(), file.format = list(header = T, row.names = F))
However I get this error:
Error in sqliteExecStatement(con, statement, bind.data) :
RS-DBI driver: (error in statement: near "in": syntax error)
Is there something obvious I am missing here?
Much thanks in advance.
CASE is a keyword, so you have to quote this column name as "Case" in your query.
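As a rough illustration of the quoting, here is a sketch that runs the question's query against a small made-up data frame instead of the CSV file. It assumes the sqldf package is installed; the data frame name cases and its contents are hypothetical.

```r
library(sqldf)

# Made-up data: C1 appears twice, C2 appears once
cases <- data.frame(Case  = c("C1", "C1", "C2"),
                    Event = c("E1", "E2", "E3"),
                    stringsAsFactors = FALSE)

# Double quotes make SQLite read "Case" as a column name, not the CASE keyword
once <- sqldf('select * from cases
               where "Case" in (select "Case" from cases
                                group by "Case" having count(*) = 1)')
```

Note that having count(*) = 1 keeps only the cases that occur exactly once, so here only the C2 row comes back. That is not the same as dropping duplicate rows.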
For those who want unique rows using sqldf, use DISTINCT:
newdf <- sqldf("SELECT DISTINCT * FROM df") # Get unique rows
sqldf uses SQLite Syntax by default.
newdf <- sqldf("SELECT DISTINCT name FROM df") # Get unique column values
newdf <- sqldf("SELECT *, COUNT(DISTINCT name) as NoNames FROM df GROUP BY whatever") # Get a count of unique names
If you use "Case" in sqldf in R, you should put a "," before "Case". Because the "Case" query is the whole line, you should make it separate.
Is it possible to pass a value into the query in dbGetQuery from the RMySQL package?
For example, if I have a set of values in a character vector:
df <- c('a','b','c')
And I want to loop through the values to pull out a specific value from a database for each.
library(RMySQL)
res <- dbGetQuery(con, "SELECT max(ID) FROM table WHERE columna='df[2]'")
When I try to add the reference to the value I get an error. Wondering if it is possible to add a value from an R object in the query.
One option is to manipulate the SQL string within the loop. At the moment you have a string literal, the 'df[2]' is not interpreted by R as anything other than characters. There are going to be some ambiguities in my answer, because df in your Q is patently not a data frame (it is a character vector!). Something like this will do what you want.
Store the output in a numeric vector:
require(RMySQL)
df <- c('a','b','c')
out <- numeric(length(df))
names(out) <- df
Now we can loop over the elements of df to execute your query three times. We can set the loop up two ways: i) with i as a number which we use to reference the elements of df and out, or ii) with i as each element of df in turn (i.e. a, then b, ...). I will show both versions below.
## Version i
for(i in seq_along(df)) {
    SQL <- paste("SELECT max(ID) FROM table WHERE columna='", df[i], "';", sep = "")
    out[i] <- dbGetQuery(con, SQL)[[1]]  ## first column of the one-row result
}
dbDisconnect(con)  ## disconnect once, after the loop has finished
OR:
## Version ii
for(i in df) {
    SQL <- paste("SELECT max(ID) FROM table WHERE columna='", i, "';", sep = "")
    out[i] <- dbGetQuery(con, SQL)[[1]]  ## first column of the one-row result
}
dbDisconnect(con)  ## disconnect once, after the loop has finished
Which you use is a matter of personal taste. The second version (ii) requires the names set on the output vector out to match the values in df, because each result is assigned by name.
Having said all that, assuming your actual SQL query is similar to the one you posted, can't you do this in a single SQL statement, using the GROUP BY clause to group the data before computing max(ID)? Doing simple things like this in the database will likely be much quicker. Unfortunately, I don't have a MySQL instance around to play with and my SQL-fu is weak currently, so I can't give an example of this.
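For what it is worth, a single-statement version might look like the sketch below. It is not tested against MySQL: an in-memory SQLite database stands in for the server, and the table name tbl and its contents are invented for illustration. It assumes the DBI and RSQLite packages are installed.

```r
library(DBI)

# In-memory SQLite stands in for the MySQL server (assumption)
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Hypothetical table with the two columns used in the question
tbl <- data.frame(ID = 1:6,
                  columna = c("a", "a", "b", "b", "c", "d"),
                  stringsAsFactors = FALSE)
dbWriteTable(con, "tbl", tbl)

# One round trip: the database groups the rows and computes max(ID) per group
res <- dbGetQuery(con, "
  SELECT columna, max(ID) AS max_id
  FROM tbl
  WHERE columna IN ('a', 'b', 'c')
  GROUP BY columna
  ORDER BY columna")

# Same shape as the named vector `out` built by the loops
out <- setNames(res$max_id, res$columna)

dbDisconnect(con)
```

The IN list restricts the result to the values of interest, so the group for 'd' is never returned.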
You could also use the sprintf command to solve the issue (it's what I use when building Shiny Apps).
df <- c('a','b','c')
res <- dbGetQuery(con, sprintf("SELECT max(ID) FROM table WHERE columna='%s'", df[2]))
Something along those lines should work.
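A short sketch of two ways to build the query text, without needing a live connection. sprintf is vectorised, so it can produce one query per value in a single call. DBI's sqlInterpolate additionally escapes quotes in the value, which matters once the values come from user input (as in a Shiny app). The table name tbl is a placeholder, and ANSI() is used here only as a dummy connection for generating the SQL.

```r
library(DBI)

df <- c("a", "b", "c")

# sprintf() is vectorised: this yields one query string per element of df
queries <- sprintf("SELECT max(ID) FROM tbl WHERE columna = '%s'", df)

# sqlInterpolate() quotes and escapes the value before inserting it
safe <- sqlInterpolate(ANSI(),
                       "SELECT max(ID) FROM tbl WHERE columna = ?value",
                       value = "O'Brien")
```

With a real connection you would pass con instead of ANSI(), then hand the resulting SQL to dbGetQuery.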