I have an SQLite database file exported from ScraperWiki with a .sqlite file extension. How do I import it into R, ideally mapping the original database tables into separate data frames?
You could use the RSQLite package.
Some example code that reads all of the tables into a list of data.frames:
library("RSQLite")

## connect to the db
con <- dbConnect(drv = RSQLite::SQLite(), dbname = "YOURSQLITEFILE")

## list all tables
tables <- dbListTables(con)

## exclude sqlite_sequence (an internal table that stores AUTOINCREMENT counters)
tables <- tables[tables != "sqlite_sequence"]

lDataFrames <- vector("list", length = length(tables))

## create a data.frame for each table
for (i in seq_along(tables)) {
  lDataFrames[[i]] <- dbGetQuery(conn = con, statement = paste0("SELECT * FROM '", tables[[i]], "'"))
}
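A small optional follow-up (not part of the original answer): name the list by table so each data.frame can be looked up directly, and disconnect when done.

names(lDataFrames) <- tables
str(lDataFrames[[1]])  # inspect the first table
dbDisconnect(con)      # close the connection when finished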
To anyone else who comes across this post, a nice way to do the loop from the top answer using the purrr package is:
library(purrr)

lDataFrames <- map(tables, ~{
  dbGetQuery(conn = con, statement = paste0("SELECT * FROM '", .x, "'"))
})
This also means you don't have to pre-allocate the list:
lDataFrames <- vector("list", length=length(tables))
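A related purrr convenience (an optional sketch, assuming library(purrr) as above): set_names() attaches the table names in the same call, so the list elements can be accessed by name.

lDataFrames <- map(set_names(tables), ~ dbGetQuery(conn = con, statement = paste0("SELECT * FROM '", .x, "'")))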
Putting together sgibb's and primaj's answers: naming the tables, and adding the facility to retrieve either all tables or a specific table:
getDatabaseTables <- function(dbname = "YOURSQLITEFILE", tableName = NULL) {
  library("RSQLite")
  library("purrr")
  con <- dbConnect(drv = RSQLite::SQLite(), dbname = dbname)  # connect to db
  tables <- dbListTables(con)                                 # list all table names
  if (is.null(tableName)) {
    # get all tables
    lDataFrames <- map(tables, ~{ dbGetQuery(conn = con, statement = paste0("SELECT * FROM '", .x, "'")) })
    # name tables
    names(lDataFrames) <- tables
    return(lDataFrames)
  } else {
    # get specific table
    return(dbGetQuery(conn = con, statement = paste0("SELECT * FROM '", tableName, "'")))
  }
}
# get all tables
lDataFrames <- getDatabaseTables(dbname="YOURSQLITEFILE")
# get specific table
df <- getDatabaseTables(dbname="YOURSQLITEFILE", tableName="YOURTABLE")
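One caveat worth noting (an addition, not in the original answer): every call to getDatabaseTables() opens a new connection and never closes it. An on.exit() placed right after the dbConnect() line inside the function releases the connection when the function returns:

con <- dbConnect(drv = RSQLite::SQLite(), dbname = dbname)
on.exit(dbDisconnect(con))  # close the connection when the function exits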
I have ~250 csv files I want to load into an SQLite db. I've loaded all the csvs into my global environment as data frames. I'm using the following function to copy all of them to the db, but get Error: df must be local dataframe or a remote tbl_sql
library(DBI)
library(odbc)
library(rstudioapi)
library(tidyverse)
library(dbplyr)
library(RSQLite)
library(dm)

# Create DB Instance ---------------------------------------------
my_db <- dbConnect(RSQLite::SQLite(), "test_db.sqlite", create = TRUE)

# Load all csv files ---------------------------------------------
filenames <- list.files(pattern = ".*csv")
names <- substr(filenames, 1, nchar(filenames) - 4)

for (i in names) {
  filepath <- file.path(paste(i, ".csv", sep = ""))
  assign(i, read.csv(filepath, sep = ","))
}

# Get list of data.frames ----------------------------------------
tables <- as.data.frame(sapply(mget(ls(), .GlobalEnv), is.data.frame))
colnames(tables) <- "is_data_frame"
tables <- tables %>%
  filter(is_data_frame == "TRUE")
table_list <- row.names(tables)

# Copy dataframes to db ------------------------------------------
for (j in table_list) {
  copy_to(my_db, j)
}
I have had mixed success using copy_to. I recommend the dbWriteTable command from the DBI package. Example code below:
DBI::dbWriteTable(
  db_connection,
  DBI::Id(
    catalog = db_name,
    schema = schema_name,
    table = table_name
  ),
  r_table_name
)
This would replace your copy_to command. You will need to provide a string to name the table; the database and schema names are likely optional and can probably be omitted. As for the original error: copy_to(my_db, j) passes j, which is only the table's name as a string, not the data frame itself, which is why dplyr complains that df must be a local dataframe.
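For instance, a minimal sketch of the question's final loop rewritten with dbWriteTable (assuming table_list holds the data frame names, as in the question; SQLite has no catalogs or schemas, so the Id() wrapper is omitted):

for (j in table_list) {
  # j is only the table's name; get(j) retrieves the data frame itself
  DBI::dbWriteTable(my_db, name = j, value = get(j), overwrite = TRUE)
}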
I would like to work on a large database table. The idea was to read some rows, process them, append the result to another table, and so on. In code:
stmt <- "SELECT * FROM input_table WHERE cond"
rs <- DBI::dbSendQuery(con, stmt)
while (!DBI::dbHasCompleted(rs)) {
  current_set <- DBI::dbFetch(rs, 50000)
  res <- process(current_set)
  dbWriteTable(con, "output_table", value = res, append = TRUE)
}
DBI::dbClearResult(rs)
However, I get the message "Closing open result set, pending rows". Is there any way to save the intermediate output?
I would like this to work with SQLite and, later on, with Postgres.
Just for reference, I ended up with a solution using a LIMIT / OFFSET construct. Not sure if it is efficient, but it is fast enough for my case (700k rows).
batchsize <- 50000
stmt <- "SELECT * FROM input_table WHERE cond"
lim <- paste("LIMIT", batchsize, ";")
finished <- FALSE
i <- 0
while (!finished) {
  curr_stmt <- paste(stmt, lim)
  current_set <- dbGetQuery(con, curr_stmt)
  res <- process(current_set)
  dbWriteTable(con, "output_table", value = res, append = TRUE)
  finished <- nrow(current_set) < batchsize
  i <- i + nrow(current_set)
  lim <- paste("LIMIT", batchsize, "OFFSET", i, ";")
}
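For completeness, another possible approach (an untested sketch added here, not from the original answer): keep the chunked dbFetch() loop from the question but route the writes through a second connection, so they no longer invalidate the open result set. With SQLite this likely requires WAL mode for a reader and a writer to coexist; the file name below is a placeholder.

con_read  <- DBI::dbConnect(RSQLite::SQLite(), "mydb.sqlite")
con_write <- DBI::dbConnect(RSQLite::SQLite(), "mydb.sqlite")
DBI::dbGetQuery(con_write, "PRAGMA journal_mode=WAL;")  # let reads and writes overlap

rs <- DBI::dbSendQuery(con_read, "SELECT * FROM input_table WHERE cond")
while (!DBI::dbHasCompleted(rs)) {
  current_set <- DBI::dbFetch(rs, 50000)
  res <- process(current_set)  # the user-defined processing step from the question
  DBI::dbWriteTable(con_write, "output_table", value = res, append = TRUE)
}
DBI::dbClearResult(rs)
DBI::dbDisconnect(con_read)
DBI::dbDisconnect(con_write)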
I am trying to get the row counts of all my tables with a query and I want to save the results in a dataframe. Right now, it only saves one value and I'm not sure what the issue is. Thanks for any help.
schema <- "test"
table_prefix <- "results_"
row_count <- list()

for (geo in geos) {
  table_name <- paste0(schema, ".", table_prefix, geo)
  queries <- paste("SELECT COUNT(*) FROM", table_name)
}

for (x in queries) {
  row_count <- dbGetQuery(con, x)
}
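For what it's worth, a sketch of one possible fix (assuming geos and con are defined as in the question): the first loop overwrites queries on every pass, so only the last query survives, and the second loop likewise overwrites row_count on each iteration. paste0() is vectorized over geos, so the queries can be built in one go, and storing each result in the list keeps them all:

queries <- paste0("SELECT COUNT(*) FROM ", schema, ".", table_prefix, geos)

row_count <- list()
for (x in queries) {
  row_count[[x]] <- dbGetQuery(con, x)  # keep every result, keyed by its query
}
row_count <- do.call(rbind, row_count)  # collapse into one data frame of counts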
I'm trying to import a number of .db3 files and rbind them together for further analysis. I'm having no trouble importing a single .db3 file, but my rbind won't work, despite it working fine for .csv files. Where have I gone wrong?
df <- c()
for (x in list.files(pattern = "*.db3")) {
  sqlite <- dbDriver("SQLite")
  mydb <- dbConnect(sqlite, x)
  dbListTables(mydb)
  results <- dbSendQuery(mydb, "SELECT * FROM gps_data")
  data <- fetch(results, n = -1)
  data$Label <- factor(x)
  data <- rbind(df, data)
}
Any help you can offer would be great!
Let's have a close look at that rbind call at the end of your loop:
df <- c()
for (x in list.files(pattern = "*.db3")) {
  sqlite <- dbDriver("SQLite")
  mydb <- dbConnect(sqlite, x)
  dbListTables(mydb)
  results <- dbSendQuery(mydb, "SELECT * FROM gps_data")
  data <- fetch(results, n = -1)
  data$Label <- factor(x)
  data <- rbind(df, data)
}
You've created the object df, then you're binding data to the end of it and assigning the result back to data (note that df itself never changes). Great. Now your loop starts again, a fresh data object is created, and it gets bound to... the still-empty df. Doh! It's a simple error: the combined result is being assigned to the wrong object. Try changing that last line to:
df <- rbind(df, data)
and see how it goes.
What you'll be doing differently is updating df over and over, so it grows with each iteration. Before, data was recreated at the top of every pass, throwing away what you'd just built.
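As a side note (an optional sketch, not part of the original answer): growing a data frame with rbind() inside a loop copies every existing row on each pass. A common alternative is to collect the pieces in a list and bind once at the end; dbGetQuery() also folds the send/fetch/clear steps into a single call:

files <- list.files(pattern = "\\.db3$")
pieces <- vector("list", length(files))

for (i in seq_along(files)) {
  mydb <- dbConnect(dbDriver("SQLite"), files[i])
  data <- dbGetQuery(mydb, "SELECT * FROM gps_data")
  data$Label <- factor(files[i])
  pieces[[i]] <- data
  dbDisconnect(mydb)  # close each connection once its table is read
}

df <- do.call(rbind, pieces)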
I have an SQLite database connection to a database file. I want to extract some data from one of the tables, do some processing in R and then create a temporary table on the same connection from the processed data. It needs to be a temp table because users may not have write access to the database, but I want to be able to query this new data alongside the data already in the database.
So, for example:
require(sqldf)
db <- dbConnect(SQLite(), "tempdb")
dbWriteTable(db, "iris", iris)
# do some processing in R:
d <- dbGetQuery(db, "SELECT Petal_Length, Petal_Width FROM iris;")
names(d) <- c("length_2", "width_2")
d <- exp(d)
and then I want to make a temporary table in the connection db from d
I know I could do:
dbWriteTable(conn=db, name= "iris_proc", value = d)
but I need it in a temp table and there doesn't seem to be an option for this in dbWriteTable.
One workaround I thought of was to add a temp table and then add columns and update them:
dbGetQuery(db, "CREATE TEMP TABLE iris_proc AS SELECT Species FROM iris;")
dbGetQuery(db, "ALTER TABLE iris_proc ADD COLUMN length_2;")
But then I can't get the data from d into the columns:
dbGetQuery(db, paste("UPDATE iris2 SET length_2 =", paste(d$length_2, collapse = ", "), ";"))
Error in sqliteExecStatement(con, statement, bind.data) :
RS-DBI driver: (error in statement: near "4.05519996684467": syntax error)
I imagine that, even if I get this to work, it will be horribly inefficient.
I thought there might have been some way to do this with read.csv.sql but this does not seem to work with open connection objects.
Use an in-memory database for the temporary table:
library(RSQLite)

db <- dbConnect(SQLite(), "tempdb")
dbWriteTable(db, "iris", iris)
d <- dbGetQuery(db, "SELECT Petal_Length, Petal_Width FROM iris")
d <- exp(d)

## attach an in-memory database to the same connection and write d into it
dbGetQuery(db, "attach ':memory:' as mem")
dbWriteTable(db, "mem.d", d, row.names = FALSE)  # d now in mem database

## both databases are queryable side by side on the one connection
dbGetQuery(db, "select * from iris limit 3")
dbGetQuery(db, "select * from mem.d limit 3")
dbGetQuery(db, "select * from sqlite_master")      # tables in the main db
dbGetQuery(db, "select * from mem.sqlite_master")  # tables in mem
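Two notes added for later readers (not part of the original answer): everything attached as mem lives only in memory and disappears when the connection is closed, which matches the temp-table behaviour the question asks for. Also, recent versions of RSQLite expose a temporary argument on dbWriteTable, so on a current install the whole workaround may reduce to:

dbWriteTable(conn = db, name = "iris_proc", value = d, temporary = TRUE)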