Background:
I can successfully pull a particular dataset (shown in the code below) from the internet using the read.csv() function. However, when I try to use the sqldf package's read.csv.sql() to speed up the process, it produces errors. I've tried various solutions but can't seem to solve the problem.
I can successfully pull the data and create the data frame that I want with read.csv() using the following code:
ce_data <- read.csv("http://download.bls.gov/pub/time.series/cx/cx.data.1.AllData",
                    fill = TRUE, header = TRUE, sep = "")
To test the functionality of sqldf on my machine, I successfully tested read.csv.sql() by reading in the data as 1 variable rather than the 5 desired using the following code:
library(sqldf)
ce_data_sql1 <- read.csv.sql("http://download.bls.gov/pub/time.series/cx/cx.data.1.AllData",
                             sql = "select * from file")
To produce the result that I got using read.csv() but utilizing the speed of read.csv.sql(), I tried this code:
ce_data_sql2 <- read.csv.sql("http://download.bls.gov/pub/time.series/cx/cx.data.1.AllData",
                             fill = TRUE, header = TRUE, sep = "", sql = "select * from file")
Unfortunately, it produced this error:
trying URL 'http://download.bls.gov/pub/time.series/cx/cx.data.1.AllData'
Content type 'text/plain' length 24846571 bytes (23.7 MB)
downloaded 23.7 MB

Error in sqldf(sql, envir = p, file.format = file.format, dbname = dbname, :
  unused argument (fill = TRUE)
I have tried various approaches to address the error, working from the sqldf documentation, but have been unsuccessful.
Question:
Is there a solution where I can read in this table with 5 variables desired using read.csv.sql()?
The reason you are reading it in as a single variable is that you did not correctly specify the separator for the original file. Try the following, with sep = "\t" for tab-separated:
ce_data_sql2 <- read.csv.sql("http://download.bls.gov/pub/time.series/cx/cx.data.1.AllData",
                             sep = "\t", sql = "select * from file")
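As a quick sanity check (assuming the tab separator is in fact right for this file), you can confirm that all five expected columns came through:
# number of columns and a peek at the first rows of the result
ncol(ce_data_sql2)
head(ce_data_sql2)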
The error you are getting in the final example:
Error in sqldf(sql, envir = p, file.format = file.format, dbname = dbname, :
  unused argument (fill = TRUE)
is because read.csv.sql does not accept a fill argument.
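If you want to confirm which arguments read.csv.sql does accept, you can inspect its formal arguments directly (plain base R; nothing beyond the sqldf package itself is assumed here):
library(sqldf)
# list the arguments read.csv.sql accepts; 'fill' is not among them
names(formals(read.csv.sql))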
Related
I'm just starting my journey with R, so I'm a complete newbie and I can't find anything that will help me solve this.
I have a CSV table (random integers in each column) with 9 columns. I read 8 of them and want to append them to an SQL table with 8 fields (Col1 ... Col8, all ints). After loading the CSV into RStudio, it looks right and only has 8 columns:
The code I'm using is:
# Libraries
library(DBI)
library(odbc)
library(tidyverse)

# CSV Files
df <- head(
  read_delim(
    "C:/Data/test.txt",
    delim = " ",
    trim_ws = TRUE,
    skip = 1,
    skip_empty_rows = TRUE,
    col_types = cols('X7' = col_skip())
  ),
  -1
)

# Add Column Headers
col_headings <- c('Col1', 'Col2', 'Col3', 'Col4', 'Col5', 'Col6', 'Col7', 'Col8')
names(df) <- col_headings

# Connect to SQL Server
con <- dbConnect(odbc(), "SQL", timeout = 10)

# Append data
dbAppendTable(conn = con,
              schema = "tmp",
              name = "test",
              value = df,
              row.names = NULL)
I'm getting this error message:
Error in result_describe_parameters(rs@ptr, fieldDetails) :
  Query requires '8' params; '18' supplied.
I ran into this issue also. I agree with Hayward Oblad: the dbAppendTable function appears to be finding another table of the same name, which throws the error. Our solution was to specify the name parameter as an Id() (from DBI::Id()).
So taking your example above:
# Append data
dbAppendTable(conn = con,
              name = Id(schema = "tmp", table = "test"),
              value = df,
              row.names = NULL)
Ran into this issue...
Error in result_describe_parameters(rs@ptr, fieldDetails) :
  Query requires '6' params; '18' supplied.
when saving to a Snowflake database, and couldn't find any good information on the error.
Turns out that there was a test schema in which the tables had exactly the same names as in the prod schema. DBI::dbAppendTable() doesn't differentiate between the schemas, so until the tables in the test schema were renamed to unique names, the params error persisted.
Hope this saves someone the 10 hours I spent trying to figure out why DBI was throwing the error.
See this related question for more on this: ODBC/DBI in R will not write to a table with a non-default schema in R.
Add name = Id(schema = "my_schema", table = "table_name") to DBI::dbAppendTable(), or in my case it was DBI::dbWriteTable().
I'm not sure why the function doesn't use the schema from my connection object, though; it seems redundant.
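For illustration, a minimal sketch of the dbWriteTable() variant (the connection, schema, and table names here are placeholders, not taken from the question):
library(DBI)
library(odbc)

con <- dbConnect(odbc(), "SQL", timeout = 10)

# Qualify the table with its schema via DBI::Id() so the driver does not
# match a same-named table in another schema
dbWriteTable(con,
             name = Id(schema = "my_schema", table = "table_name"),
             value = df,
             append = TRUE)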
I'm having some trouble reading big CSV files in R, so I'm trying to use the sqldf package to read just some columns or lines from the CSV.
I tried this:
test <- read.csv.sql("D:\\X17065382\\Documents\\cad\\2016_mar\\2016_domicilio_mar.csv",
                     sql = "select * from file limit 5", header = TRUE, sep = ",", eol = "\n")
but I got this error:
Error in connection_import_file(conn@ptr, name, value, sep, eol, skip) :
  RS_sqlite_import: D:\X17065382\Documents\cad\2016_mar\2016_domicilio_mar.csv line 198361 expected 1 columns of data but found 2
If you're not too fussy about which package you use, data.table has a great function for doing just what you need:
library(data.table)
file <- "D:\\X17065382\\Documents\\cad\\2016_mar\\2016_domicilio_mar.csv"
fread(file, nrows = 5)
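If you only need certain columns as well as the first few rows, fread() can also subset columns at read time (the column positions below are just an example):
# read only the 1st and 3rd columns from the first 5 rows
fread(file, select = c(1, 3), nrows = 5)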
Like Shinobi_Atobe said, the fread() function from data.table works really well. If you prefer to use base R, you could also use read.csv() or read.csv2().
i.e.:
read.csv2(file_path, nrows = 5)
Also, what do you mean by "big files"? 1 GB, 10 GB, 100 GB?
This works for me.
require(sqldf)
df <- read.csv.sql("C:\\your_path\\CSV1.csv", "select * from file where Name='Asher'")
df
I'm running a database query using sqldf in Shiny in R, but I'm getting an error.
ui.R:
observeEvent(input$uploadForTest_1, {
  inFile <- input$uploadForTest_1
  inFileName <- input$uploadForTest_1$name
  file <- "tss.txt"
  tmp <- paste("audio/street", inFileName, sep = "/")
  res <- read.csv.sql(file, header = FALSE,
                      sql = "select * from file where V1=tmp", sep = "\t")
  print(res)
})
I'm successfully running the following query:
res <- read.csv.sql(file,header=FALSE,sql = "select * from file where V1='audio/street/b098.wav'",sep="\t")
But if I run the query mentioned in ui.R, it gives me an error that the tmp column doesn't exist:
Warning: Error in result_create: no such column: tmp
I don't want to hard-code a string in my query; I want to use a variable name instead. Can I use a variable name in the query instead of a string? If yes, how can I do this? I couldn't find a solution to this problem on the Internet. Thanks.
Preface read.csv.sql with fn$, and use '$tmp' in the SQL statement.
fn$read.csv.sql(file, sql = "select * from file where V1 = '$tmp'",
                header = FALSE, sep = "\t")
See ?fn and the gsubfn vignette for more info. Note that sqldf automatically loads the gsubfn package so it will already be available.
You could use sprintf. Another option would be to paste together a string, but I find sprintf far more elegant for this task.
> tmp <- "audio/street/somefile.txt"
> "select * from file where V1=tmp"
[1] "select * from file where V1=tmp"
> sprintf("select * from file where V1='%s'", tmp)
[1] "select * from file where V1='audio/street/somefile.txt'"
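The formatted string can then be passed straight to the sql argument; for example, with the file and sep from the question:
# build the query with sprintf, then run it
qry <- sprintf("select * from file where V1='%s'", tmp)
res <- read.csv.sql(file, header = FALSE, sql = qry, sep = "\t")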
I am working on a program that pulls data out of .mdb and .accdb files and creates the appropriate tables in R.
My working program on my Mac looks like this:
library(Hmisc)
p <- '/Users/Josh/Desktop/Directory/'
mdbfilename <- 'x.mdb'
mdbconcat <- paste(p, mdbfilename, sep = "")
mdb <- mdb.get(mdbconcat)
mdbnames <- data.frame(mdb.get(mdbconcat, tables = TRUE))
list2env(mdb, .GlobalEnv)
accdbfilename <- 'y.accdb'
accdbconcat <- paste(p, accdbfilename, sep = '')
accdb <- mdb.get(accdbconcat)
accdbnames <- data.frame(mdb.get(accdbconcat, tables = TRUE))
list2env(accdb, .GlobalEnv)
This works fine on my Mac, but on the PC I'm developing this for, I get this error message:
Error in system(paste("mdb-tables -1", file), intern = TRUE) :
'mdb-tables' not found
I've thought a lot about using RODBC, but this program allows me to have the tables arranged in a way where subsequent querying and dplyr functions work. Is there any way to get these functions to work on a Windows machine?
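For reference, one route on Windows (not tested here; the driver name and path are assumptions) is to go through the Access ODBC driver instead of mdb.get(), which relies on the external mdb-tables utility:
library(DBI)
library(odbc)

# Requires the Microsoft Access Database Engine / ODBC driver to be installed
con <- dbConnect(odbc::odbc(),
                 .connection_string = paste0(
                   "Driver={Microsoft Access Driver (*.mdb, *.accdb)};",
                   "Dbq=C:/Users/Josh/Desktop/Directory/x.mdb;"))

# table names, analogous to mdb.get(tables = TRUE); drop Access system tables
tbls <- grep("^MSys", dbListTables(con), value = TRUE, invert = TRUE)

# one data frame per table in the global environment, as in the Mac version
mdb <- lapply(setNames(tbls, tbls), function(t) dbReadTable(con, t))
list2env(mdb, .GlobalEnv)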
This is driving me mad.
I have a csv file "hello.csv"
a,b
"drivingme,mad",1
I just want to convert this into an SQLite database from within R (I need to do this because the actual file is 10 GB and won't fit into a data.frame, so I will use SQLite as an intermediate datastore).
library(RSQLite)
dbWriteTable(conn = dbConnect(SQLite(), dbname = "c:/temp/data.sqlite3"),
             name = "data",
             value = "c:/temp/hello.csv",
             row.names = FALSE, header = TRUE)
The above code failed with error
Error in try({ :
RS-DBI driver: (RS_sqlite_import: c:/temp/hello.csv line 2 expected 2 columns of data but found 3)
In addition: Warning message:
In read.table(fn, sep = sep, header = header, skip = skip, nrows = nrows, :
incomplete final line found by readTableHeader on 'c:/temp/hello.csv'
How do I tell it that a comma (,) within quotes ("") should be treated as part of the string and not as a separator?
I tried adding the argument
quote = "\""
but it didn't work. Help!! read.csv works just fine, but it will fail when reading large files.
Update
A much better way now is to use readr's chunked functions, e.g.:
library(DBI); library(RSQLite); library(readr)
# setting up sqlite
con_data <- dbConnect(SQLite(), dbname = "yoursqlitefile")
# read the csv in chunks; the callback receives each chunk and its position
read_delim_chunked(file, delim = ",", callback = function(chunk, pos) {
  dbWriteTable(con_data, name = "data", value = chunk, append = TRUE)  # write to sqlite
})
Original, more cumbersome way
One way to do this is to read from the file in chunks, since read.csv works fine; it just cannot load the whole dataset into memory at once.
library(DBI)
library(RSQLite)

n <- 100000                 # chunk size; experiment with this number
f <- file(csv)              # csv is the path to your file
open(f)                     # open a connection to the file
data <- read.csv(f, nrows = n, header = TRUE)
var.names <- names(data)

# setting up sqlite
con_data <- dbConnect(SQLite(), dbname = "yoursqlitefile")

while (nrow(data) == n) {   # while we have not reached the end of the file
  dbWriteTable(con_data, name = "data", value = data, append = TRUE)  # write to sqlite
  data <- read.csv(f, nrows = n, header = FALSE)
  names(data) <- var.names
}
close(f)

if (nrow(data) != 0) {      # write the final, partial chunk
  dbWriteTable(con_data, name = "data", value = data, append = TRUE)
}
Improving the proposed answer:
library(DBI)
library(RSQLite)

data_full_path <- paste0(data_folder, data_file)

con_data <- dbConnect(SQLite(),
                      dbname = ":memory:")  # you can also store in a .sqlite file if you prefer

readr::read_delim_chunked(file = data_full_path,
                          callback = function(chunk,
                                              dummyVar  # https://stackoverflow.com/a/42826461/9071968
                                              ) {
                            dbWriteTable(con_data, name = "data", value = chunk, append = TRUE)  # write to sqlite
                          },
                          delim = ";",
                          quote = "\"")
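Once the chunked import has finished, a quick check that the rows actually landed in SQLite might look like this (same connection and table name as in the snippet above):
dbGetQuery(con_data, "SELECT COUNT(*) AS n FROM data")   # total rows imported
dbGetQuery(con_data, "SELECT * FROM data LIMIT 5")       # peek at the first rows
dbDisconnect(con_data)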
(The other answer with readr, as originally posted, did not work: its parentheses were not balanced and the chunk callback requires two parameters; see https://stackoverflow.com/a/42826461/9071968.)
You make a parser to parse it. Something along these lines (pseudocode):
string = yourline[i];
if (string.contains(",")) string = string.replace(",", "%40");  // stand-in for commas inside values
yourline[i] = string;
or something of that nature. You could also use:
string.split(",");
and rebuild your string that way. That's how I would do it.
Keep in mind that you'll have to "de-parse" it when you want to get the values back. Commas in SQL separate columns, so they can really screw things up, not to mention JSONArrays or JSONObjects.
Also keep in mind that this might be very costly for 10 GB of data, so you might want to parse the input before it even gets to the CSV if possible.