ROracle dbWriteTable affirms an insertion that didn't happen

My goal is to create a table in the database and fill it with data afterwards. This is my code:
library(ROracle)
# ... "con" is the database connection, created at an earlier stage
# 1 create example
testdata <- data.frame(A = c(1,2,3), B = c(4,5,6))
# 2 create-statement
createTable <- paste0(
  "CREATE TABLE TestTable(",
  paste(paste(colnames(testdata), c("integer", "integer")), collapse = ","),
  ")"
)
# 3 send and execute query
dbGetQuery(con, createTable)
# 4 write example data
dbWriteTable(con, "TestTable", testdata, row.names = TRUE, append = TRUE)
I have already succeeded a few times: the table was created and filled.
Now step 4 no longer works, although R still returns TRUE after dbWriteTable executes. The table stays empty.
I know this is a vague question, but does anyone have an idea what could be wrong here?

I found the solution to my problem. After creating the table in step 3, you have to commit! After that, the data is written into the table.
library(ROracle)
# ... "con" is the database connection, created at an earlier stage
# 1 create example
testdata <- data.frame(A = c(1,2,3), B = c(4,5,6))
# 2 create-statement
createTable <- paste0(
  "CREATE TABLE TestTable(",
  paste(paste(colnames(testdata), c("integer", "integer")), collapse = ","),
  ")"
)
# 3 send and execute query
dbGetQuery(con, createTable)
# NEW LINE: COMMIT!
dbCommit(con)
# 4 write example data
dbWriteTable(con, "TestTable", testdata, row.names = TRUE, append = TRUE)
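For reference, the paste0() construction in step 2 is plain string assembly and can be checked without any database connection:

```r
# Rebuild the CREATE statement exactly as in step 2 and inspect the result
testdata <- data.frame(A = c(1, 2, 3), B = c(4, 5, 6))
createTable <- paste0(
  "CREATE TABLE TestTable(",
  paste(paste(colnames(testdata), c("integer", "integer")), collapse = ","),
  ")"
)
stopifnot(createTable == "CREATE TABLE TestTable(A integer,B integer)")
```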

Related

dbAppendTable() error when I try to append data to a local server

I'm just starting my journey with R, so I'm a complete newbie and I can't find anything that will help me solve this.
I have a CSV table (random integers in each column) with 9 columns. I read 8 of them and want to append them to a SQL table with 8 fields (Col1 ... Col8, all ints). After loading the CSV into RStudio, it looks right and has only 8 columns.
The code I'm using is:
# Libraries
library(DBI)
library(odbc)
library(tidyverse)
# CSV Files
df <- head(
  read_delim(
    "C:/Data/test.txt",
    " ",
    trim_ws = TRUE,
    skip = 1,
    skip_empty_rows = TRUE,
    col_types = cols('X7' = col_skip())
  ),
  -1
)
# Add Column Headers
col_headings <- c('Col1', 'Col2', 'Col3', 'Col4', 'Col5', 'Col6', 'Col7', 'Col8')
names(df) <- col_headings
# Connect to SQL Server
con <- dbConnect(odbc(), "SQL", timeout = 10)
# Append data
dbAppendTable(conn = con,
              schema = "tmp",
              name = "test",
              value = df,
              row.names = NULL)
I'm getting this error message:
> Error in result_describe_parameters(rs@ptr, fieldDetails) :
> Query requires '8' params; '18' supplied.
I ran into this issue also. I agree with Hayward Oblad: the dbAppendTable() function appears to be finding another table of the same name, which throws the error. Our solution was to specify the name parameter as an Id() (from DBI::Id()).
So taking your example above:
# Append data
dbAppendTable(conn = con,
              name = Id(schema = "tmp", table = "test"),
              value = df,
              row.names = NULL)
Ran into this issue...
Error in result_describe_parameters(rs@ptr, fieldDetails) : Query
requires '6' params; '18' supplied.
when saving to a Snowflake database, and couldn't find any good information on the error.
Turns out that there was a test schema where the tables within the schema had exactly the same names as in the prod schema. DBI::dbAppendTable() doesn't differentiate between the schemas, so until the tables in the test schema were renamed to unique names, the params error persisted.
Hope this saves someone the 10 hours I spent trying to figure out why DBI was throwing the error.
ODBC/DBI in R will not write to a table with a non-default schema in R
Add name = Id(schema = "my_schema", table = "table_name") to DBI::dbAppendTable(),
or, in my case, to DBI::dbWriteTable().
Not sure why the function doesn't use the schema from my connection object, though; it seems redundant.

How to read.transactions in R from a data frame?

I have the following task. I have 50M transaction rows. I couldn't export them to a .txt file, but I have a connection to my Hive instance, and I created a table with the transactions:
Transaction_id Item
1 A
1 B
1 C
2 A
2 A
I cannot use
order_trans <- read.transactions(
  file = "(...)/trans2019.csv",
  format = "single",
  header = TRUE,
  sep = ",",
  cols = c("trans_id", "item"),
  rm.duplicates = TRUE,
  encoding = "UTF-16LE")
because it cuts transactions.
I would like to do the same, but in place of file I would like to pass my data frame (trans_id, item); however, it doesn't work.
I also tried:
trans = as(data.frame,"transactions")
but then the apriori algorithm gives me wrong rules, such as
APPLE --> transaction_ID
Can anyone help me with this?
Here are the solutions from the manual page (see '? transactions'):
## example 4: creating transactions from a data.frame with
## transaction IDs and items (by converting it into a list of transactions first)
a_df3 <- data.frame(
  TID = c(1, 1, 2, 2, 2, 3),
  item = c("a", "b", "a", "b", "c", "b")
)
a_df3
trans4 <- as(split(a_df3[, "item"], a_df3[, "TID"]), "transactions")
trans4
inspect(trans4)
## Note: This is very slow for large datasets. It is much faster to
## read transactions using read.transactions() with format = "single".
## This can be done using an anonymous file.
write.table(a_df3, file = tmp <- file(), row.names = FALSE)
trans4 <- read.transactions(tmp, format = "single",
                            header = TRUE, cols = c("TID", "item"))
close(tmp)
inspect(trans4)
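The key step in example 4 is the split() call, which is plain base R and can be verified without arules installed:

```r
# Group items by transaction ID; each list element becomes one transaction
a_df3 <- data.frame(
  TID = c(1, 1, 2, 2, 2, 3),
  item = c("a", "b", "a", "b", "c", "b"),
  stringsAsFactors = FALSE
)
tid_lists <- split(a_df3[, "item"], a_df3[, "TID"])
stopifnot(
  length(tid_lists) == 3,
  identical(tid_lists[["1"]], c("a", "b")),
  identical(tid_lists[["2"]], c("a", "b", "c")),
  identical(tid_lists[["3"]], "b")
)
```

Passing this list to as(..., "transactions") is what produces one transaction per TID, rather than the item/ID cross-rules the direct data.frame coercion gave.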

R: Insert csv-file into database using RJDBC

As RJDBC is the only package I have been able to make work on Ubuntu, I am trying to use it to INSERT a CSV-file into a database.
I can make the following work:
# Connecting to database
library(RJDBC)
drv <- JDBC('com.microsoft.sqlserver.jdbc.SQLServerDriver', 'drivers/sqljdbc42.jar', identifier.quote="'")
connection_string <- "jdbc:sqlserver://blablaserver;databaseName=testdatabase"
ch <- dbConnect(drv, connection_string, "username", "password")
# Inserting a row
dbSendQuery(ch, "INSERT INTO cpr_esben.CPR000_Startrecord (SORTFELT_10,OPGAVENR,PRODDTO,PRODDTOFORRIG,opretdato) VALUES ('TEST', 123, '2012-01-01', '2012-01-01', '2012-01-01')")
The insert works. Next I try to make an INSERT from a CSV file with the same data, separated by the default tab. I am working on Windows.
# Creating csv
df <- data.frame(matrix(c('TEST', 123, '2012-01-01', '2012-01-01', '2012-01-01'), nrow = 1), stringsAsFactors = F)
colnames(df) <- c("SORTFELT_10","OPGAVENR","PRODDTO","PRODDTOFORRIG","opretdato")
class(df$SORTFELT_10) <- "character"
class(df$OPGAVENR) <- "character"
class(df$PRODDTO) <- "character"
class(df$PRODDTOFORRIG) <- "character"
class(df$opretdato) <- "character"
write.table(df, file = "test.csv", col.names = FALSE, quote = FALSE)
# Inserting CSV to database
dbSendQuery(ch, "INSERT cpr_esben.CPR000_Startrecord FROM 'test.csv'")
Unable to retrieve JDBC result set for INSERT cpr_esben.CPR000_Startrecord FROM 'test.csv' (Incorrect syntax near the keyword 'FROM'.)
Do you have any suggestions as to what I am doing wrong when trying to insert the CSV file? I do not understand the Incorrect syntax near the keyword 'FROM' error.
What if you create a statement from your data? Something like:
# Data from your example
df <- data.frame(matrix(c('TEST', 123, '2012-01-01', '2012-01-01', '2012-01-01'), nrow = 1), stringsAsFactors = F)
colnames(df) <- c("SORTFELT_10","OPGAVENR","PRODDTO","PRODDTOFORRIG","opretdato")
class(df$SORTFELT_10) <- "character"
class(df$OPGAVENR) <- "character"
class(df$PRODDTO) <- "character"
class(df$PRODDTOFORRIG) <- "character"
class(df$opretdato) <- "character"
# Formatting rows to insert into the SQL statement
# (single quotes, since SQL string literals are single-quoted)
rows <- apply(df, 1, function(x) paste0("'", x, "'", collapse = ", "))
rows <- paste0("(", rows, ")")
# SQL statement
statement <- paste0(
  "INSERT INTO cpr_esben.CPR000_Startrecord (",
  paste0(colnames(df), collapse = ", "),
  ")",
  " VALUES ",
  paste0(rows, collapse = ", ")
)
dbSendQuery(ch, statement)
This should work for any number of rows in your df
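As a sanity check, the statement assembly can be run without any database connection (string literals are single-quoted, as SQL Server expects):

```r
# Same example data as above, all columns already character
df <- data.frame(
  SORTFELT_10 = "TEST", OPGAVENR = "123", PRODDTO = "2012-01-01",
  PRODDTOFORRIG = "2012-01-01", opretdato = "2012-01-01",
  stringsAsFactors = FALSE
)
# One "(v1, v2, ...)" tuple per data frame row
rows <- apply(df, 1, function(x) paste0("'", x, "'", collapse = ", "))
rows <- paste0("(", rows, ")")
statement <- paste0(
  "INSERT INTO cpr_esben.CPR000_Startrecord (",
  paste0(colnames(df), collapse = ", "),
  ") VALUES ",
  paste0(rows, collapse = ", ")
)
stopifnot(statement ==
  "INSERT INTO cpr_esben.CPR000_Startrecord (SORTFELT_10, OPGAVENR, PRODDTO, PRODDTOFORRIG, opretdato) VALUES ('TEST', '123', '2012-01-01', '2012-01-01', '2012-01-01')")
```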
RJDBC is built on DBI, which has many useful functions for tasks like this. What you want is dbWriteTable. The syntax would be:
dbWriteTable(ch, 'cpr_esben.CPR000_Startrecord', df, append = TRUE)
and would replace your write.table line.
I am not that familiar with RJDBC specifically, but I think the issue with your dbSendQuery is that you reference test.csv inside the SQL statement. The statement is executed by the database server, not in your R working directory, so it cannot locate the file you created with write.table.
Have you tried loading the file directly into the database, as below?
library(RJDBC)
drv <- JDBC("connections")
conn <- dbConnect(drv,"...")
query = "LOAD DATA INFILE 'test.csv' INTO TABLE test"
dbSendUpdate(conn, query)
You can also try appending other clauses at the end, such as a column delimiter: e.g. "|" for a .txt file or "," for a CSV file.

R Data file not converting to Stata file

I am getting this error and cannot figure out why. Any advice?
library(foreign)
x <- data.frame(a = "", b = 1, stringsAsFactors = FALSE)
write.dta(x, 'x.dta')
Error in write.dta(x, "x.dta") :
4 arguments passed to .Internal(nchar) which requires 3
The haven package works much better than foreign in this case as it will read strings (including empty strings) as string values.
library( haven )
x <- data.frame( a = "", b = 1, stringsAsFactors = FALSE )
write_dta( x, 'x.dta' )
Alternatively, if you give column a a value when creating the data frame, instead of an empty string, foreign will be fine.
x <- data.frame( a = "a", b = 1, stringsAsFactors = FALSE )
write.dta( x,"y.dta" )
As you're using an older version of Stata, haven is the way to go, as you can specify the version of Stata you wish the dta file to be compatible with.
write_dta( x, 'x.dta', version = 13 )

Read.table and dbWriteTable result in different output?

I'm working with 12 large data files, all of which hover between 3 and 5 GB, so I was turning to RSQLite for import and initial selection. Giving a reproducible example in this case is difficult, so if you can come up with anything, that would be great.
If I take a small set of the data, read it in, and write it to a table, I get exactly what I want:
con <- dbConnect("SQLite", dbname = "R2")
f <- file("chr1.ld")
open(f)
data <- read.table(f, nrows = 100, header = TRUE)
dbWriteTable(con, name = "Chr1test", value = data)
> dbListFields(con, "Chr1test")
[1] "row_names" "CHR_A" "BP_A" "SNP_A" "CHR_B" "BP_B" "SNP_B" "R2"
> dbGetQuery(con, "SELECT * FROM Chr1test LIMIT 2")
row_names CHR_A BP_A SNP_A CHR_B BP_B SNP_B R2
1 1 1 1579 SNP-1.578. 1 2097 SNP-1.1096. 0.07223050
2 2 1 1579 SNP-1.578. 1 2553 SNP-1.1552. 0.00763724
If I read in all of my data directly to a table, though, my columns aren't separated correctly. I've tried both sep = " " and sep = "\t", but both give the same column separation.
dbWriteTable(con, name = "Chr1", value ="chr1.ld", header = TRUE)
> dbListFields(con, "Chr1")
[1] "CHR_A_________BP_A______________SNP_A__CHR_B_________BP_B______________SNP_B___________R
I can tell that it's clearly some sort of delimination issue, but I've exhausted my ideas on how to fix it. Has anyone run into this before?
Edit, update:
It seems as though this works:
n <- 1000000
f <- file("chr1.ld")
open(f)
data <- read.table(f, nrows = n, header = TRUE)
cols <- names(data)
con_data <- dbConnect("SQLite", dbname = "R2")
while (nrow(data) == n) {
  dbWriteTable(con_data, name = "ch1", value = data, append = TRUE)
  # later chunks have no header line, so supply the column names ourselves
  data <- read.table(f, nrows = n, header = FALSE, col.names = cols)
}
close(f)
if (nrow(data) != 0) {
  dbWriteTable(con_data, name = "ch1", value = data, append = TRUE)
}
Though I can't quite figure out why just writing the table through SQLite is a problem. Possibly a memory issue.
I am guessing that your big file is causing a free memory issue (see Memory Usage under docs for read.table). It would have been helpful to show us the first few lines of chr1.ld (on *nix systems you just say "head -n 5 chr1.ld" to get the first five lines).
If it is a memory issue, then you might try sipping the file as a work-around rather than gulping it whole.
Determine or estimate the number of lines in chr1.ld (on *nix systems, say "wc -l chr1.ld").
Let's say your file has 100,000 lines.
sip.size <- 100
col.names <- c("CHR_A", "BP_A", "SNP_A", "CHR_B", "BP_B", "SNP_B", "R2")
for (i in seq(0, 100000, sip.size)) {
  # skip i data rows plus the header line, supplying column names ourselves
  data <- read.table("chr1.ld", nrows = sip.size, skip = i + 1,
                     header = FALSE, col.names = col.names)
  dbWriteTable(con, name = "SippyCup", value = data, append = TRUE)
}
You'll probably see warnings at the end but the data should make it through. If you have character data that read.table is trying to factor, this kludge will be unsatisfactory unless there are only a few factors, all of which are guaranteed to occur in every chunk. You may need to tell read.table not to factor those columns or use some other method to look at all possible factors so you can list them for read.table. (On *nix, split out one column and pipe it to uniq.)
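The chunk-reading pattern above can be exercised without SQLite or a large file by substituting a textConnection() for the file connection; this is a minimal sketch with made-up data:

```r
# Miniature of the chunked read: 5 data rows, read 2 at a time
txt <- "A B\n1 2\n3 4\n5 6\n7 8\n9 10"
f <- textConnection(txt)  # stands in for file("chr1.ld"); created already open
n <- 2
data <- read.table(f, nrows = n, header = TRUE)
cols <- names(data)
total <- nrow(data)
while (nrow(data) == n) {
  data <- tryCatch(
    # later chunks have no header; supply the column names ourselves
    read.table(f, nrows = n, header = FALSE, col.names = cols),
    error = function(e) data[0, ]  # input exhausted exactly at a chunk boundary
  )
  total <- total + nrow(data)
}
close(f)
stopifnot(total == 5)
```

In the real loop each chunk would go to dbWriteTable(..., append = TRUE) instead of just being counted.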
