I am trying to get the row counts of all my tables with a query and I want to save the results in a dataframe. Right now, it only saves one value and I'm not sure what the issue is. Thanks for any help.
schema <- "test"
table_prefix <- "results_"
row_count <- list()
for (geo in geos) {
  table_name <- paste0(schema, ".", table_prefix, geo)
  queries <- paste("SELECT COUNT(*) FROM", table_name)
}
for (x in queries) {
  row_count <- dbGetQuery(con, x)
}
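Both loops overwrite their target on every pass: the first keeps only the last query in queries, and the second keeps only the last result in row_count. A minimal sketch of a fix that accumulates everything into one data frame (assuming geos and con are defined elsewhere):
row_counts <- data.frame()
for (geo in geos) {
  table_name <- paste0(schema, ".", table_prefix, geo)
  query <- paste("SELECT COUNT(*) AS n FROM", table_name)
  # dbGetQuery() returns a one-row data frame; pull out the count
  n <- dbGetQuery(con, query)$n
  row_counts <- rbind(row_counts, data.frame(table = table_name, n = n))
}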
I'm having issues pasting a variable into the query string for RMariaDB. A query without paste works, and I can find the WHERE value I'm looking for (e.g. MIN) in the data frame it returns, but as soon as I put a variable into the query it fails. I have searched Stack Overflow up and down and read the dbGetQuery docs, but nothing seems to work. I am sure it is something simple; I just can't seem to find it.
library(RMariaDB)
team <- "MIN"
# This returns the entire table, with MIN among the values in the tm column.
filename <- dbGetQuery(conn, "select * from nhl_lab_lines_today")
# These will all give me a [1054] error.
test <- paste("select * from nhl_lab_lines_today WHERE tm = ",paste(team,collapse=", "),sep ="")
test <- paste("select * from nhl_lab_lines_today WHERE tm = team")
test <- paste("select * from nhl_lab_lines_today WHERE tm =", team,sep=" ")
filename <- dbGetQuery(conn, test)
The string value needs SQL quotes around it:
dbGetQuery(conn, paste0("select * from nhl_lab_lines_today WHERE tm = '", team, "'"))
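If I'm not mistaken, DBI's parameterized queries also work with RMariaDB, which sidesteps the quoting problem (and SQL injection) entirely:
# `?` is the placeholder; the driver handles the quoting
filename <- dbGetQuery(conn, "select * from nhl_lab_lines_today WHERE tm = ?",
                       params = list(team))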
I would like to work on a large database table. The idea was to read some rows, process them, append the result to another table, and so on. In code:
stmt <- "SELECT * FROM input_table WHERE cond"
rs <- DBI::dbSendQuery(con, stmt)
while (!DBI::dbHasCompleted(rs)) {
  current_set <- DBI::dbFetch(rs, 50000)
  res <- process(current_set)
  dbWriteTable(con, "output_table", value = res, append = TRUE)
}
DBI::dbClearResult(rs)
However, I get the message "Closing open result set, pending rows". Is there any way to save the intermediate output?
I would like to work with sqlite and later on Postgres.
Just for reference, I ended up with a solution using a LIMIT / OFFSET construct. Not sure if it is efficient, but it is fast enough for my case (700k rows).
batchsize <- 50000
stmt <- "SELECT * FROM input_table WHERE cond"
lim <- paste("LIMIT", batchsize, ";")
finished <- FALSE
i <- 0
while (!finished) {
  curr_stmt <- paste(stmt, lim)
  current_set <- dbGetQuery(con, curr_stmt)
  res <- process(current_set)
  dbWriteTable(con, "output_table", value = res, append = TRUE)
  finished <- nrow(current_set) < batchsize
  i <- i + nrow(current_set)
  lim <- paste("LIMIT", batchsize, "OFFSET", i, ";")
}
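Two caveats, as far as I can tell: LIMIT/OFFSET without an ORDER BY does not guarantee a stable row order, so adding ORDER BY on a key column is safer; and the original "Closing open result set, pending rows" message appears because dbWriteTable() on the same connection interrupts the open result set. A sketch of an alternative that keeps the dbFetch() loop by writing through a second connection (the connection details are placeholders; this works well with Postgres, while SQLite may report "database is locked"):
library(DBI)
con_read  <- dbConnect(RPostgres::Postgres(), dbname = "mydb")  # hypothetical dsn
con_write <- dbConnect(RPostgres::Postgres(), dbname = "mydb")
rs <- dbSendQuery(con_read, "SELECT * FROM input_table WHERE cond")
while (!dbHasCompleted(rs)) {
  current_set <- dbFetch(rs, 50000)
  res <- process(current_set)
  dbWriteTable(con_write, "output_table", value = res, append = TRUE)  # separate connection
}
dbClearResult(rs)
dbDisconnect(con_read)
dbDisconnect(con_write)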
I am trying to write these commands in a loop:
table1 <- table[table$ringnr == 1,]
interaction.plot(table1$expnr, table1$disturbance, table1$flights)
table2 <- table[table$ringnr == 2,]
interaction.plot(table2$expnr, table2$disturbance, table2$flights)
table3 <- table[table$ringnr == 3,]
interaction.plot(table3$expnr, table3$disturbance, table3$flights)
etc.
This is what I have so far:
for (i in 1:19) {
  mypath <- file.path("C:", "Users", paste("expnr_", i, ".jpg", sep = ""))
  jpeg(file = mypath)
  assign(paste("table", i), subset(table, ringnr == i))
  interaction.plot(table[i]$expnr, table[i]$disturbance, table[i]$flights)
  dev.off()
}
The first part works and I get the data sets table1, table2, etc.
However, when I try to work with them on the next line, R doesn't understand that I want those data sets.
I know it is bad practice to use a loop for this, but does anyone know how I can keep working with the data frames created in the loop?
Or can I do it with an apply function?
Thanks in advance!
If you do not need the filtered tables later, you can do this:
for (i in 1:19) {
  mypath <- file.path("C:", "Users", paste("expnr_", i, ".jpg", sep = ""))
  jpeg(file = mypath)
  temp_table <- subset(table, ringnr == i)
  interaction.plot(temp_table$expnr, temp_table$disturbance, temp_table$flights)
  dev.off()
}
If you need them later, you can store them in a list:
table_list <- list()
for (i in 1:19) {
  mypath <- file.path("C:", "Users", paste("expnr_", i, ".jpg", sep = ""))
  jpeg(file = mypath)
  table_list[[i]] <- subset(table, ringnr == i)
  interaction.plot(table_list[[i]]$expnr, table_list[[i]]$disturbance, table_list[[i]]$flights)
  dev.off()
}
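Afterwards table_list[[1]], table_list[[2]], and so on hold the subsets, so you can keep working with them after the loop instead of reaching for table1, table2, etc.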
Don't name your object table: table is a commonly used R function, and by overwriting it you'll run into trouble at some point.
Also, and more importantly here, don't create 3 separate tables: put them in a list. They are numbered objects of the same kind, so they should stay linked. Avoiding assign altogether is a good rule of thumb.
your_tables <- lapply(1:3, function(i) subset(your_table, ringnr == i))
Then you can do for example:
lapply(your_tables, function(x) interaction.plot(x$expnr, x$disturbance, x$flights))
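And if you also want the jpeg files from the original loop, a minimal sketch reusing the C:/Users path from the question:
invisible(lapply(seq_along(your_tables), function(i) {
  mypath <- file.path("C:", "Users", paste0("expnr_", i, ".jpg"))
  jpeg(file = mypath)
  x <- your_tables[[i]]
  interaction.plot(x$expnr, x$disturbance, x$flights)
  dev.off()  # close the jpeg device before the next plot
}))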
I'm trying to import a number of .db3 files and rbind them together for further analysis. I have no trouble importing a single .db3 file, but my rbind won't work, despite working fine for .csv files. Where have I gone wrong?
df <- c()
for (x in list.files(pattern = "*.db3")) {
  sqlite <- dbDriver("SQLite")
  mydb <- dbConnect(sqlite, x)
  dbListTables(mydb)
  results <- dbSendQuery(mydb, "SELECT * FROM gps_data")
  data <- fetch(results, n = -1)
  data$Label <- factor(x)
  data <- rbind(df, data)
}
Any help you can offer would be great!
Let's have a close look at that rbind call at the end of your loop:
df <- c()
for (x in list.files(pattern = "*.db3")) {
  sqlite <- dbDriver("SQLite")
  mydb <- dbConnect(sqlite, x)
  dbListTables(mydb)
  results <- dbSendQuery(mydb, "SELECT * FROM gps_data")
  data <- fetch(results, n = -1)
  data$Label <- factor(x)
  data <- rbind(df, data)
}
You've created the object df, then you're binding data onto the end of it and using the result to overwrite data (note that df itself never changes). Great. Now your loop starts again, creating a new data object from the next file, and binding it to.... the still-empty df. Doh! It's a simple error: the result is being assigned to the wrong object. Try changing that last line to:
df <- rbind(df, data)
and see how it goes.
What you'll be doing differently is overwriting df over and over, making it bigger each time. Before, data was recreated from scratch on every pass, throwing away what had just been built.
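As a side note, growing a data frame with rbind inside a loop is slow, and the loop never disconnects from the databases. A sketch of a tidier version, assuming every .db3 file contains a gps_data table:
library(DBI)
library(RSQLite)
files <- list.files(pattern = "\\.db3$")
dfs <- lapply(files, function(x) {
  mydb <- dbConnect(SQLite(), x)
  on.exit(dbDisconnect(mydb))  # always close the connection
  data <- dbGetQuery(mydb, "SELECT * FROM gps_data")
  data$Label <- factor(x)
  data
})
df <- do.call(rbind, dfs)  # bind everything once at the end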
I have an SQLite database file exported from Scraperwiki with .sqlite file extension. How do I import it into R, presumably mapping the original database tables into separate data frames?
You could use the RSQLite package.
Some example code that stores each table in a data.frame:
library("RSQLite")
## connect to db
con <- dbConnect(drv=RSQLite::SQLite(), dbname="YOURSQLITEFILE")
## list all tables
tables <- dbListTables(con)
## exclude sqlite_sequence (contains table information)
tables <- tables[tables != "sqlite_sequence"]
lDataFrames <- vector("list", length=length(tables))
## create a data.frame for each table
for (i in seq_along(tables)) {
  lDataFrames[[i]] <- dbGetQuery(conn=con, statement=paste("SELECT * FROM '", tables[[i]], "'", sep=""))
}
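If you want to refer to the frames by table name later, you can also name the list:
names(lDataFrames) <- tables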
To anyone else who comes across this post: a nice way to write the loop from the top answer using the purrr package is:
library("purrr")
lDataFrames <- map(tables, ~{
  dbGetQuery(conn=con, statement=paste("SELECT * FROM '", .x, "'", sep=""))
})
It also means you don't have to do:
lDataFrames <- vector("list", length=length(tables))
Putting together sgibb's and primaj's answers, naming the tables, and adding the facility to retrieve either all tables or a specific one:
getDatabaseTables <- function(dbname="YOURSQLITEFILE", tableName=NULL){
  library("RSQLite")
  library("purrr")  # map() comes from purrr
  con <- dbConnect(drv=RSQLite::SQLite(), dbname=dbname)  # connect to db
  on.exit(dbDisconnect(con))  # close the connection when the function exits
  tables <- dbListTables(con)  # list all table names
  if (is.null(tableName)){
    # get all tables
    lDataFrames <- map(tables, ~{ dbGetQuery(conn=con, statement=paste("SELECT * FROM '", .x, "'", sep="")) })
    # name tables
    names(lDataFrames) <- tables
    return(lDataFrames)
  } else {
    # get specific table
    return(dbGetQuery(conn=con, statement=paste("SELECT * FROM '", tableName, "'", sep="")))
  }
}
# get all tables
lDataFrames <- getDatabaseTables(dbname="YOURSQLITEFILE")
# get specific table
df <- getDatabaseTables(dbname="YOURSQLITEFILE", tableName="YOURTABLE")