ODBC Connection error for merging files in R

I am trying to read Excel files using the odbcConnectExcel2007 function from the RODBC package in R. Reading an individual file works, but when I run it in a for loop it throws the following error:
3 stop(sQuote(tablename), ": table not found on channel")
2 odbcTableExists(channel, sqtable)
1 sqlFetch(conn1, sqlTables(conn1)$TABLE_NAME[1])
Below is the code:
file_list <- list.files("./Raw Data")
file_list

for (i in 1:length(file_list)){
  conn1 = odbcConnectExcel2007(paste0("./Raw Data/", file_list[i])) # open a connection to the Excel file
  sqlTables(conn1)$TABLE_NAME
  data = sqlFetch(conn1, sqlTables(conn1)$TABLE_NAME[1])
  close(conn1)
  data <- data[, c("Branch", "Custome", "Category", "Sub Category", "SKU",
                   "Weight", "Order Type", "Invoice Date")]
  if(i==1) alldata = data else {
    alldata = rbind(alldata, data)
  }
}
I would appreciate any kind of help.
Thanks in advance.

I think it's getting tripped up by the quotes in the table name returned by sqlTables(conn1)$TABLE_NAME. Try removing the quotes from the table name. Something like this:
table <- sqlTables(conn1)$TABLE_NAME
table <- gsub("'", "", table) # remove the quotes around the sheet name
And then just do:
data <- sqlFetch(conn1, table)
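Putting it together inside the loop, here's a minimal sketch of the fix (assuming, as in the question, that the first table listed is the sheet you want):

conn1 <- odbcConnectExcel2007(paste0("./Raw Data/", file_list[i]))
tbl <- sqlTables(conn1)$TABLE_NAME[1]
tbl <- gsub("['\"]", "", tbl) # strip any stray quote characters from the sheet name
data <- sqlFetch(conn1, tbl)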

Related

Reading zip files using fread

I tried to read a zip file using fread like this:
data <- "www/608.zip"
test <- fread('gunzip -cq data')
It showed the error that the file does not exist or is non-readable.
But it works if I call
test <- fread('gunzip -cq www/608.zip')
In my script the value of data changes each time, so I used an if construct to choose data, like this:
data <- reactive({
  if (input$list == 'all') {
    "www/6.zip"
  } else if (input$list == 'hkj') {
    "www/6.zip"
  }
})
I think it should work as follows:
data <- "www/608.zip"
test <- fread(cmd = paste("gunzip -cq", data))
i.e. you have to create the command string with paste() first and then pass it as the cmd argument to fread().
If you want to build the command from the file path, you can also use paste0 to create the string:
data <- "www/608.zip"
test <- fread(cmd = paste0("gunzip -cq ", data))
fread suggests using the cmd argument for security reasons.
We can also use glue:
data <- "www/608.zip"
fread(cmd = glue::glue("gunzip -cq {data}"))
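Note that in the Shiny snippet above data is a reactive, so it has to be called as data() when building the command. A minimal sketch, assuming the reactive returns a valid path:

test <- reactive({
  fread(cmd = paste("gunzip -cq", data())) # data() returns the current path
})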

How do I name file downloads in R using data from another column in dataframe?

I have a large dataset of unique file IDs and links to download the files. It looks like this:
file_id <- c("id:fghjs12:ws8c7/syx", "id:f7gnsfu:7a6#*s", "id:dug:shxgcvu:6sh")
link <- c("https://www.dynare.org/wp-repo/dynarewp028.pdf", "https://www.dynare.org/wp-repo/dynarewp029.pdf", "https://www.dynare.org/wp-repo/dynarewp020.pdf")
df <- data.frame(file_id, link, stringsAsFactors = FALSE)
I want to download each file using its handle (file_id) as the file name. Some of the links are broken. I have the following loop to do the task, but it's not working.
download_documents <- function(url, file_id) {
  tryCatch(
    {download.file(url, paste0('~/Desktop/Dataset/files/', file_id))},
    error = function(e) {NA},
    warning = function(w) {NA})
}

Map(download_documents, df$link, df$file_id)
Does anyone know what I'm doing wrong or have a better solution? Thanks in advance for your help!
You can turn the file_id values into valid file names using make.names.
Map(download_documents, df$link, make.names(df$file_id))
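make.names replaces characters that are not allowed in R names (such as : and /) with dots, which also makes the IDs safe as file names here. For example:

make.names("id:fghjs12:ws8c7/syx")
#> [1] "id.fghjs12.ws8c7.syx"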

How to pull and loop the data in RStudio?

It would be great if someone could help with the requirement below.
I need to pull data from a Hive table based on "Fiscal Quarter" and write it to text files. The process should loop, producing 3 txt files (FY19Q1_Txtfile1.txt / FY19Q2_Txtfile2.txt / FY19Q3_Txtfile3.txt) over 3 iterations.
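If the table still needs to be pulled from Hive into R, here is a minimal sketch using DBI (the connection con and the table name are placeholders, assuming you connect via something like the odbc package):

library(DBI)
# `con` is assumed to be an existing DBI connection to Hive
data <- dbGetQuery(con, "SELECT * FROM my_hive_table") # my_hive_table is hypothetical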
Once your table is stored as a data.frame in R, named data for example, you can do this:
write.csv(data[data$Fiscal_Quarter == 'FY19Q1', ], 'FY19Q1_Txtfile1.txt')
write.csv(data[data$Fiscal_Quarter == 'FY19Q2', ], 'FY19Q2_Txtfile2.txt')
write.csv(data[data$Fiscal_Quarter == 'FY19Q3', ], 'FY19Q3_Txtfile3.txt')
And if you want to use a loop instead:
for (i in 1:3){
  file_name = paste('FY19Q', i, '_Txtfile', i, '.txt', sep="")
  FQ = paste('FY19Q', i, sep="")
  write.csv(data[data$Fiscal_Quarter == FQ, ], file_name)
}
I hope this answers the question.

RPostgreSQL loading multiple CSV files into a PostgreSQL table

I'm new to using PostgreSQL, and I'm having trouble populating a table I created from multiple *.csv files. I was working first in pgAdmin4, then I decided to work in RPostgreSQL, as R is my main language.
Anyway, I am dealing (for now) with 30 csv files located in one folder. All have the same headers and general structure, for instance:
Y:/Clickstream/test1/video-2016-04-01_PARSED.csv
Y:/Clickstream/test1/video-2016-04-02_PARSED.csv
Y:/Clickstream/test1/video-2016-04-03_PARSED.csv
... and so on.
I tried to load all the csv files by following the RPostgreSQL-specific answer from Parfait. Sadly, it didn't work. My code is below:
library(RPostgreSQL)

dir = list.dirs(path = "Y:/Clickstream/test1")
num = (length(dir))

psql.connection <- dbConnect(PostgreSQL(),
                             dbname="coursera",
                             host="127.0.0.1",
                             user = "postgres",
                             password="xxxx")

for (d in dir){
  filenames <- list.files(d)
  for (f in filenames){
    csvfile <- paste0(d, '/', f)
    # IMPORT USING COPY COMMAND
    sql <- paste("COPY citl.courses FROM '", csvfile , "' DELIMITER ',' CSV ;")
    dbSendQuery(psql.connection, sql)
  }
}

# CLOSE CONNECTION
dbDisconnect(psql.connection)
I don't understand the error I got:
Error in postgresqlExecStatement(conn, statement, ...) :
RS-DBI driver: (could not Retrieve the result : ERROR: could not open file
" Y:/Clickstream/test1/video-2016-04-01_PARSED.csv " for reading: Invalid
argument
)
If I'm understanding correctly, there is an invalid argument in the path of my first file. I'm not very sure about it, as I have only recently started using PostgreSQL and RPostgreSQL in R. Any help will be much appreciated.
Thanks in advance!
Edit: I found the problem, but cannot solve it for some reason. When I inspect the SQL statement built in the for loop:
# IMPORT USING COPY COMMAND
sql <- paste("COPY citl.courses FROM '",csvfile,"' DELIMITER ',' CSV ;")
I have the following result:
sql
[1] "COPY citl.courses FROM ' Y:/Clickstream/test1/video-2016-04-01_PARSED.csv ' DELIMITER ',' CSV ;"
This means that the invalid argument comes from the blank spaces that paste inserts around the file path. I've tried to change this, unsuccessfully. Any help will be deeply appreciated!
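A minimal fix for that spacing issue is to build the statement with paste0, which joins its arguments without separators (paste inserts a space between arguments by default):

sql <- paste0("COPY citl.courses FROM '", csvfile, "' DELIMITER ',' CSV;")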
Alternatively, skip building SQL strings entirely and let dbWriteTable do the import. Try something like this:
Files <- list.files("Y:/Clickstream/test1", pattern = "*.csv", full.names = TRUE)
CSVs <- lapply(Files, read.csv)

psql.connection <- dbConnect(PostgreSQL(),
                             dbname="coursera",
                             host="127.0.0.1",
                             user = "postgres",
                             password="xxxx")

for(i in 1:length(Files)){
  dbWriteTable(psql.connection,
               c("citl", "courses"), # schema and table
               CSVs[[i]],            # [[ ]] extracts the data.frame from the list
               append = TRUE,        # add rows to the bottom
               row.names = FALSE)
}
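Note that this reads all of the CSVs into memory before writing. If the files are large, here is a sketch of a leaner variant that reads and writes one file per iteration:

for (f in Files){
  dbWriteTable(psql.connection,
               c("citl", "courses"),
               read.csv(f),
               append = TRUE,
               row.names = FALSE)
}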

How to import CSV into SQLite in R where one of the variables has a comma (,) within quotes?

This is driving me mad.
I have a csv file "hello.csv"
a,b
"drivingme,mad",1
I just want to convert this into a SQLite database from within R. (I need to do this because the actual file is 10GB and won't fit into a data.frame, so I will use SQLite as an intermediate datastore.)
dbWriteTable(conn = dbConnect(SQLite(), dbname = "c:/temp/data.sqlite3"),
             name = "data",
             value = "c:/temp/hello.csv",
             row.names = FALSE, header = TRUE)
The above code failed with the error:
Error in try({ :
RS-DBI driver: (RS_sqlite_import: c:/temp/hello.csv line 2 expected 2 columns of data but found 3)
In addition: Warning message:
In read.table(fn, sep = sep, header = header, skip = skip, nrows = nrows, :
incomplete final line found by readTableHeader on 'c:/temp/hello.csv'
How do I tell it to treat a comma (,) inside quotes "" as part of the string and not as a separator?
I tried adding in the argument
quote="\""
But it didn't work. Help!! read.csv works just fine; it only fails when reading large files.
Update
A much better way now is to use readr's chunked functions, e.g.
# setting up sqlite
con_data = dbConnect(SQLite(), dbname="yoursqlitefile")

readr::read_delim_chunked(file, delim = ",", callback = function(chunk, pos) {
  dbWriteTable(con_data, chunk, name="data", append=TRUE) # write to sqlite
})
Original, more cumbersome way
One way to do this is to read from the file in chunks, since read.csv handles the quoting correctly but cannot load the whole file into memory.
n = 100000 # experiment with this number
f = file(csv)
con = open(f) # open a connection to the file
data <- read.csv(f, nrows=n, header=TRUE)
var.names = names(data)

# setting up sqlite
con_data = dbConnect(SQLite(), dbname="yoursqlitefile")

while(nrow(data) == n) { # keep going until the end of the file
  dbWriteTable(con_data, data, name="data", append=TRUE) # write to sqlite
  data <- read.csv(f, nrows=n, header=FALSE)
  names(data) <- var.names
}
close(f)

if (nrow(data) != 0) {
  dbWriteTable(con_data, data, name="data", append=TRUE)
}
Improving the proposed answer:

data_full_path <- paste0(data_folder, data_file)

con_data <- dbConnect(SQLite(),
                      dbname=":memory:") # you can also store in a .sqlite file if you prefer

readr::read_delim_chunked(file = data_full_path,
                          callback = function(chunk,
                                              dummyVar # https://stackoverflow.com/a/42826461/9071968
                                              ) {
                            dbWriteTable(con_data, chunk, name="data", append=TRUE) # write to sqlite
                          },
                          delim = ";",
                          quote = "\"")
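A quick sanity check that the chunks landed in the table (a sketch):

dbGetQuery(con_data, "SELECT COUNT(*) FROM data")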
(The readr snippet in the update above, as originally posted, had unbalanced parentheses and a callback with only one argument; read_delim_chunked's chunk callback takes two parameters, see https://stackoverflow.com/a/42826461/9071968)
You could write a parser to pre-process it. In Java-style pseudocode:
string = yourline[i];
if (string.equals(",")) string = "%40";
yourline[i] = string;
or something of that nature. You could also use:
string.split(",");
and rebuild your string that way. That's how I would do it.
Keep in mind that you'll have to "de-parse" it when you want the values back. Commas in SQL delimit columns, so stray commas can really screw things up, not to mention JSONArrays or JSONObjects.
Also keep in mind that this might be very costly for 10GB of data, so you might want to parse the input before it even gets to the CSV, if possible.
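For reference, a rough R sketch of that pre-parse idea (the helper name and the %40 token are just illustrative):

protect_quoted_commas <- function(line, token = "%40") {
  m <- gregexpr('"[^"]*"', line) # locate double-quoted fields
  regmatches(line, m) <- lapply(regmatches(line, m),
                                function(x) gsub(",", token, x, fixed = TRUE))
  line
}

lines <- readLines("c:/temp/hello.csv")
lines <- vapply(lines, protect_quoted_commas, character(1), USE.NAMES = FALSE)
# ...import, then swap the token back for a comma when reading values out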
