Insert in RMySQL from data frame - r

I'm trying to add data to a MySQL table using RMySQL. I only need to add one row at a time, but it's not working. This is what I'm trying to do:
dbGetQuery(con,"INSERT INTO names VALUES(data[1,1], data[1,2])")
I have values in a data frame named "data" and need to put them into a MySQL table. Before inserting, I check whether they are already in the table and add them only if they are not, but the call above doesn't work. The data is read from a .csv file with read.csv.

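Inside the quoted string, data[1,1] is sent to MySQL as literal text rather than being evaluated by R. You can use paste to construct the actual query.
data <- matrix(1:4, 2, 2)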
query <- paste("INSERT INTO names VALUES(",data[1,1], ",", data[1,2], ")")
query
#[1] "INSERT INTO names VALUES( 1 , 3 )"
dbGetQuery(con, query)
# If there are a lot of columns this could be tedious...
# So we could also use paste to add all the values at once.
query <- paste("INSERT INTO names VALUES(", paste(data[1,], collapse = ", "), ")")
query
#[1] "INSERT INTO names VALUES( 1, 3 )"
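The paste approach above works for numbers, but character values also need quoting and escaping before they go into the statement. The following is a minimal base R sketch of that idea; the helper sql_literal and the example row are hypothetical and not part of RMySQL or DBI. Where the driver supports it, a parameterized statement or the quoting functions in DBI are usually a safer choice.

```r
# Hypothetical one-row data frame standing in for data[1, ]
row <- data.frame(first = "O'Brien", age = 42, stringsAsFactors = FALSE)

# Hypothetical helper: turn a single R value into a SQL literal
sql_literal <- function(x) {
  if (is.na(x)) return("NULL")
  if (is.numeric(x)) return(as.character(x))
  # double any single quotes, then wrap the value in single quotes
  paste0("'", gsub("'", "''", as.character(x), fixed = TRUE), "'")
}

# Apply the helper to every column of the row and build the statement
values <- vapply(row[1, ], sql_literal, character(1))
query <- sprintf("INSERT INTO names VALUES (%s)", paste(values, collapse = ", "))
query
```

The resulting string can then be passed to dbGetQuery(con, query) as in the answer above.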

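You could also try dbWriteTable, which takes the connection, the table name as a string, and the rows to append:
dbWriteTable(con, "names", data[1, ], append = TRUE)
as described in the DBI package documentation.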

How to retrieve data from oracle to R in a faster way than this?

Here is the data frame that I have:
trail_df= data.frame(d= seq.Date(as.Date("2020-01-01"), as.Date("2020-02-01"), by= 1),
AA= NA,
BB= NA,
CC= NA)
Now I loop over the columns of trail_df and, for each column name, fetch the data for the given dates from the Oracle database, like this:
for ( i in 2:ncol(trail_df)){
c_name = colnames(trail_df)[i]
query = paste0("SELECT * FROM tablename WHERE ID= '",c_name,"' ") # this query would return Date and price
result= dbGetQuery(con, query) # con is the connection variable from db
for (k in seq_len(nrow(result))) { # iterate over every returned row, not just the last index
trail_df [which(as.Date(result[k,1])==as.Date(trail_df[,1])),i]= result[k,2]
# just matching the date in trail_df dataframe and pasting the value in front of respective column
}
}
This is a snippet of the code; the date filtering and so on is taken care of in the real code.
The problem is that I have more than 6000 columns and 500 rows, for which I have to match the dates (because the dates are random) and put the price in the matching row, and this is taking forever.
I am new to R and would appreciate any help speeding up this code, perhaps with multiprocessing if that is possible in R.
There are two steps to this answer:
Use parameterized queries to get the raw data; and
Get this data into the "wide" format you desire.
Parameterized query
My (first) suggestion is to use parameterized queries, which is safer. It may not improve the speed relative to @RonakShah's answer (using sprintf), at least not on the first time.
However, it might help a touch if the query is repeated: DBMSes tend to parse/optimize queries and cache this optimization. When a query changes even a little, this caching cannot happen, and the query is re-optimized. In this case, this cache-invalidation is unnecessary, and can be avoided if we use binding parameters.
query <- sprintf("SELECT * FROM tablename WHERE ID IN (%s)",
paste(rep("?", ncol(trail_df[-1])), collapse = ","))
query
# [1] "SELECT * FROM tablename WHERE ID IN (?,?,?)"
# the IDs are the column names of trail_df (minus the date column), one per placeholder
res <- dbGetQuery(con, query, params = as.list(names(trail_df)[-1]))
Some thoughts:
If the database has many more dates than what you have here, you can restrict the data returned by adding a date range to the query. This will work well if your trail_df dates are close together:
query <- sprintf("SELECT * FROM tablename WHERE ID IN (%s) and Date between ? and ?",
paste(rep("?", ncol(trail_df[-1])), collapse = ","))
query
res <- dbGetQuery(con, query, params = c(as.list(names(trail_df)[-1]), as.list(range(trail_df$d))))
If your dates are more variable and you end up querying many more rows than you actually need, I suggest uploading your trail_df dates into a temporary table and running something like:
"select tb.Date, tb.ID, tb.Price
from mytemptable tmp
left join tablename tb on tmp.d = tb.Date
where ..."
Reshape
It appears as if your database table may be more "long" shaped and you want it "wide" in your frame. There are many ways to reshape from long-to-wide (examples), but these should work:
reshape2::dcast(res, Date ~ ID, value.var = "Price") # 'Price' is the 'value' column, unk here
tidyr::pivot_wider(res, id_cols = "Date", names_from = "ID", values_from = "Price")
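If the goal is to fill the existing trail_df rather than build a new wide frame, the inner row-by-row loop from the question can be replaced with one vectorized match per ID. This is only a base R sketch with made-up data; the column names Date, ID and Price are the same assumptions used in the reshape calls above, and the small res frame stands in for the query result.

```r
# Small stand-in for trail_df from the question
trail_df <- data.frame(
  d  = seq.Date(as.Date("2020-01-01"), as.Date("2020-01-05"), by = 1),
  AA = NA_real_,
  BB = NA_real_
)

# Made-up "long" query result; column names are assumptions
res <- data.frame(
  Date  = as.Date(c("2020-01-02", "2020-01-04", "2020-01-01")),
  ID    = c("AA", "AA", "BB"),
  Price = c(10.5, 11, 7),
  stringsAsFactors = FALSE
)

for (id in intersect(names(trail_df)[-1], unique(res$ID))) {
  rows <- res[res$ID == id, ]
  idx  <- match(rows$Date, trail_df$d)  # position of each returned date in trail_df
  keep <- !is.na(idx)                   # ignore dates that are not in trail_df
  trail_df[idx[keep], id] <- rows$Price[keep]
}
trail_df
```

Dates that the database returns but that are absent from trail_df are simply skipped, and dates with no price stay NA.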

dbWriteTable with geometry (point) type to MariaDB

Working with sf objects in R, as well as tables in MariaDB with geometry (in this case point) columns, I'm struggling to efficiently move data between the two (sf object to MariaDB table and vice-versa).
Note I'm using the RMariaDB package to connect to MariaDB, and have defined my connection here as consdb.
Example data:
library(sf)
pnt <- data.frame( name = c("first", "second"),
lon = c(145, 146),
lat = c(-38, -39) )
pnt <- st_as_sf( pnt, coords = c("lon", "lat") )
Trying to write sf object directly
Ideally, I'd like to be able to write sf objects like this directly to MariaDB using dbWriteTable or dbAppendTable. At the moment, that gives what look like compatibility errors.
If I try with dbWriteTable:
dbWriteTable(consdb, "temp", pnt, temporary=TRUE, overwrite=TRUE)
# Error in result_bind(res@ptr, params) : Cannot get geometry object from data you send to the GEOMETRY field [1416]
Or by creating the table first:
dbExecute(consdb, "CREATE OR REPLACE TEMPORARY TABLE temp (name VARCHAR(10), geometry POINT)")
dbAppendTable(consdb, "temp", pnt)
# Error in result_bind(res@ptr, params) : Unsupported column type list
Trying to convert to point type on insert
If I were inserting with a SQL insert query, I'd use PointFromText like so
INSERT INTO temp (name, geometry) VALUES ('new point', PointFromText('POINT(145 38)', 4326));
So I tried using that to send the data as a string. I wrote a couple of functions to convert the sf geometry column into an appropriate string column:
# to convert 1 value
point_to_text <- function(x, srid = 4326) {
sprintf("PointFromText('POINT(%f %f)', %i)", x[1], x[2], srid)
}
# to apply the above over a whole column
points_to_text <- function(x, srid = 4326) {
vapply(x, point_to_text, srid = srid, NA_character_)
}
Used that to turn the sf object into a data.frame
for_sql <- data.frame(pnt)
for_sql$geometry <- points_to_text(for_sql$geometry)
The geometry column is now a character column like: PointFromText('POINT(145.000000 -38.000000)', 4326)
Using dbWriteTable would just create a text column so I try creating the table, then using dbAppendTable:
dbExecute(consdb, "CREATE OR REPLACE TEMPORARY TABLE temp (name VARCHAR(10), geometry POINT)")
dbAppendTable(consdb, "temp", for_sql)
# Error in result_bind(res@ptr, params) : Cannot get geometry object from data you send to the GEOMETRY field [1416]
Something that works, but seems silly
I can get this to work if I create a temporary SQL table, change the column to text, insert the data from R, convert the column in SQL, then append that to the original SQL table. It seems ridiculously convoluted, but just to show that it works:
# create temporary table
dbExecute(consdb, "CREATE OR REPLACE TEMPORARY TABLE temp_geom LIKE temp")
# change the geometry column to text
dbExecute(consdb, "ALTER TABLE temp_geom MODIFY COLUMN geometry TEXT")
# add the data to the temporary table
dbAppendTable(consdb, "temp_geom", for_sql)
# add a new point column
dbExecute(consdb, "ALTER TABLE temp_geom ADD COLUMN geom_conv POINT")
# convert strings to points
dbExecute(consdb, "UPDATE temp_geom SET geom_conv = PointFromText(geometry, 4326)")
# drop the old column and replace it with the new one
dbExecute(consdb, "ALTER TABLE temp_geom DROP COLUMN geometry")
dbExecute(consdb, "ALTER TABLE temp_geom CHANGE COLUMN geom_conv geometry POINT")
# append the data from the temporary table to the main one
dbExecute(consdb, "INSERT INTO temp SELECT * FROM temp_geom")
Are there any solutions others use for this, or anything that might solve the problem of passing data between sf objects and MariaDB tables?
EDIT TO ADD: As per the comment from @SymbolixAU, I've now tried the following:
st_write(
obj=pnt, # the sf class object, as created above
dsn=consdb, # the MariaDB connection
layer="temp", # the table name on MariaDB
append=TRUE,
layer_options=c('OVERWRITE=false', 'APPEND=true')
)
# Error in result_bind(res@ptr, params) :
#   Cannot get geometry object from data you send to the GEOMETRY field [1416]
I've come up with a bit of a hacky solution to this. It's not ideal, but I think it might be good enough.
Since the sf package can convert a geometry column to a WKT string with st_as_text, and MariaDB can do the reverse with ST_GeomFromText, I can use those to get things working.
One problem is that the function call I want to pass to MariaDB (something like ST_GeomFromText('POINT(1 2)')) can't be passed to any of the usual table write functions like dbAppendTable because they convert the function call to a text string (I assume by enquoting it), so I have to create my own insert query and call it with dbExecute.
Here's the function I've come up with, the intention of which is to play the role of dbAppendTable, when the object to write is an sf object.
sf_dbAppendTable <- function(conn, name, value, srid = 4326) {
# convert the geometry columns to MariaDB function calls
sfc_cols <- vapply(value, inherits, NA, "sfc")
for(col in which(sfc_cols)) {
value[[col]] <- sprintf(
"ST_GeomFromText('%s', %i)",
sf::st_as_text( value[[col]] ),
srid
)
}
# when inserting to sql, surround some values in quotes, except a few types
# specifically exclude the geometry columns from this
cols_to_enquote <- vapply(value, function(x) {
if (is.logical(x)) return( FALSE )
# is.numeric covers both integer and double columns
if (is.numeric(x)) return( FALSE )
return( TRUE )
}, NA) & !sfc_cols
# set aside column names
col_names <- names(value)
# convert to a matrix
value <- as.matrix(value)
# it should be character (storage.mode keeps the matrix dimensions, as.character would drop them)
if (typeof(value) != "character") storage.mode(value) <- "character"
# enquote the columns that need it, except for `NA` values, replace with `NULL`
value[ , which(cols_to_enquote) ] <- ifelse(
is.na(value[ , which(cols_to_enquote) ]),
"NULL",
paste0("'", value[ , which(cols_to_enquote) ], "'")
)
# any `NA` values still remaining, also replace with `NULL`
value[ is.na(value) ] <- "NULL"
# create a single insert query
sql_query <- sprintf(
"INSERT INTO %s (%s) VALUES (%s);",
name,
paste(col_names, collapse = ","),
paste(apply(value, 1, paste, collapse = ","), collapse = "),(")
)
# execute the query
dbExecute(conn, sql_query)
}
This seems to be working for me, but I'm sure it's nowhere near as robust or efficient as something like dbAppendTable would be. For one thing I'm using a single query string which won't work well for large queries, and won't be as efficient as the LOAD DATA INFILE method some packages manage to leverage.
If anyone has a better solution, I'd still love to hear it.
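To make the shape of the generated statement concrete, here is a stripped-down base R sketch that builds the same kind of multi-row INSERT for the two example points without using sf. It assumes plain lon/lat columns in place of a geometry column and writes the WKT by hand; the table and column names follow the question, and doubling single quotes in the name column is an extra precaution that is not in the function above.

```r
# Example data from the question, kept as a plain data frame
pnt <- data.frame(
  name = c("first", "second"),
  lon  = c(145, 146),
  lat  = c(-38, -39),
  stringsAsFactors = FALSE
)

# WKT for each point, wrapped in the MariaDB conversion call
wkt      <- sprintf("POINT(%s %s)", pnt$lon, pnt$lat)
geom_sql <- sprintf("ST_GeomFromText('%s', %d)", wkt, 4326L)

# Quote the character column, doubling any embedded single quotes
name_sql <- paste0("'", gsub("'", "''", pnt$name, fixed = TRUE), "'")

# One parenthesised tuple per row, joined into a single statement
tuples    <- paste0("(", name_sql, ",", geom_sql, ")")
sql_query <- sprintf("INSERT INTO temp (name,geometry) VALUES %s;",
                     paste(tuples, collapse = ","))
sql_query
```

The string could then be sent with dbExecute(consdb, sql_query), as in the function above.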

Inserting Data into an Oracle Table

I am very new to R, so please forgive any obvious or naive errors. I need to insert multiple rows of data from R into an Oracle database table.
Make the data frame (I have made the RJDBC connection earlier in the script):
df <- data.frame("field_1" = 1:2, "field_2" = c("f","k"), "field_3"= c("j","t"))
This code runs without error, but inserts only the first row into the table:
insert <- sprintf("insert into temp_r_test_u_suck values (%s')",
apply(df, 1, function(i) gsub(" ", "", paste("'", i, collapse="',"), fixed = TRUE)))
dbSendUpdate(con, insert)
This code runs:
insert <- sprintf("into temp_r_test_u_suck values (%s')",
apply(df, 1, function(i) gsub(" ", "", paste("'", i, collapse="',"), fixed = TRUE)))
insert_all <- c("insert all", insert, "select * from dual")
dbSendUpdate(con, insert_all)
But gives me this error:
Error in .local(conn, statement, ...) :
execute JDBC update query failed in dbSendUpdate (ORA-00905: missing keyword
Both of the queries work on their own in Oracle. WHAT am I doing wrong?
Thank you!
Multiple SQL statements are not supported in dbGetQuery, dbSendQuery, or dbSendUpdate calls; you need to iterate and send each statement separately. This is why only the first statement is processed. To resolve this, extend the anonymous function inside apply to call dbSendUpdate:
apply(df, 1, function(i) {
# BUILD SQL STATEMENT
insert <- sprintf("insert into temp_r_test_u_suck values (%s')",
paste0("'", i, collapse="',"))
# RUN QUERY
dbSendUpdate(con, insert)
})
However, RJDBC extends the DBI standard by supporting parameterization with dbSendUpdate as mentioned in rForge docs for bulk-inserts with no need for iteratively concatenating strings.
dbSendUpdate(conn, statement, ...) This function is analogous to
dbSendQuery, but works with DBML statements and thus doesn't return a
result set. It is more efficient than dbSendQuery. In addition, as of
RJDBC 0.2-9 it supports vectors in prepared statements which allows
bulk-inserts.
# ALL CHARACTER DATAFRAME
df <- data.frame(field_1=as.character(1:2), field_2=c("f","k"), field_3=c("j","t"),
stringsAsFactors=FALSE)
# PREPARED STATEMENT
sql <- "insert into temp_r_test_u_suck values (?, ?, ?)"
# RUN QUERY
dbSendUpdate(con, sql, df$field_1, df$field_2, df$field_3)
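To see why the original attempt inserted only one row, it can help to look at what sprintf actually returns: a character vector with one statement per row, not a single statement. The sketch below builds those statements in base R without a database connection; quote_sql is a hypothetical helper, not an RJDBC function.

```r
df <- data.frame(field_1 = 1:2, field_2 = c("f", "k"), field_3 = c("j", "t"),
                 stringsAsFactors = FALSE)

# Hypothetical helper: quote every value as a SQL string literal
quote_sql <- function(x) {
  paste0("'", gsub("'", "''", as.character(x), fixed = TRUE), "'")
}

# Paste the quoted columns together row by row
rows  <- do.call(paste, c(unname(lapply(df, quote_sql)), sep = ", "))
stmts <- sprintf("insert into temp_r_test_u_suck values (%s)", rows)

length(stmts)  # one statement per row of df
stmts
```

Each element would then be sent on its own, for example with for (s in stmts) dbSendUpdate(con, s), although the prepared-statement form above avoids building strings at all.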

Use values in df column to create a sql query

I would like to take the values from a data frame and paste them into a text string that can be used as a SQL query. In SAS I would do it like this:
proc sql noprint; Select Names into :names separated by ", " from df; quit;
This would create a macro variable &names storing all the names, like: Id, Name, Account. I would like to do the same type of thing in R, but do not know how. I can create a vector with the names separated by commas and each one surrounded by quotes, and I can take away the quotes using the noquote function, but I cannot get the elements into another paste statement to add the SELECT and FROM. Is there a way to pull the values of a column and create a text string that can be used as a SQL query inside R? Here is what I have tried in R:
name = c("Id", "IsDeleted", "Name", "Credit__Loan__c")
label = c("Record Id", "Deleted", "ID", "Loan")
df = data.frame(name, label)
names(df) <- c("name", "label")
as.query.fields = noquote(paste(df$name, collaspe=", "))
as.query.final <- paste("SELECT " , noquote(paste(df$name, collaspe=", ")), " id FROM Credit_Amortization_Schedule__c")
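The argument is spelled collapse, not collaspe; misspelled, it is treated as one more string to paste onto every element. Using the iris column names as an example:
data(iris)
colnames(iris)
a <- paste(colnames(iris), collapse = ", ")
# use paste0, not cat: cat prints the string but returns NULL
as.query.final <- paste0("SELECT ", a, ", id FROM Credit_Amortization_Schedule__c")
The result is:
SELECT Sepal.Length, Sepal.Width, Petal.Length, Petal.Width, Species, id FROM Credit_Amortization_Schedule__c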
which you can then use with SQL like this:
require(RODBC)
result <- sqlQuery(db, as.query.final)
where db is your database connection
Or, since I see your sqldf tag now, if you want to use sqldf it's just:
sqldf(as.query.final)
The gsubfn package supports string interpolation:
library(gsubfn)
Names <- toString( sprintf("%s '%s'", df$name, df$label) )
fn$identity("select $Names from myTable")
giving:
[1] "select Id 'Record Id', IsDeleted 'Deleted', Name 'ID', Credit__Loan__c 'Loan' from myTable"
Here are some additional examples: SO example 1 and SO example 2.
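For comparison, the field list from the question can also be built with paste and sprintf alone. This sketch uses the name vector and table name from the question and shows what the misspelled argument does; nothing here needs a database connection.

```r
name <- c("Id", "IsDeleted", "Name", "Credit__Loan__c")

# collapse joins the elements into a single string
fields <- paste(name, collapse = ", ")
query  <- sprintf("SELECT %s FROM Credit_Amortization_Schedule__c", fields)
query

# With the misspelling, ", " is pasted onto each element instead,
# so the result is still a vector of four strings
wrong <- paste(name, collaspe = ", ")
length(wrong)
```

toString(name) is a shorthand for the same comma-separated collapse.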

Pass R variable to RODBC's sqlQuery with multiple entries?

I'm in the process of learning R to wave SAS goodbye. I'm still new to this and somehow have difficulty finding exactly what I'm looking for.
But for this specific case, I read:
Pass R variable to RODBC's sqlQuery?
and made it work for myself, as long as I'm only inserting one variable in the destination table.
Here is my code:
library(RODBC)
channel <- odbcConnect("test")
b <- sqlQuery(channel,
"select top 1 Noinscr
FROM table
where PrixVente > 100
order by datevente desc")
sqlQuery(channel,
paste("insert into TestTable (UniqueID) Values (",b,")", sep = ""))
When I replace the top 1 by any other number, let's say top 2, and run the exact same code, I get the following errors:
[1] "42000 195 [Microsoft][SQL Server Native Client 10.0][SQL Server]
'c' is not a recognized built-in function name."
[2] "[RODBC] ERROR: Could not SQLExecDirect
'insert into TestTable (UniqueID) Values (c(8535735, 8449336))'"
I understand that it is because there is an extra c that is generated, I assume for column when I give the command: paste(b).
So how can I get "8535735, 8449336" instead of "c(8535735, 8449336)" when using paste(b)? Or is there another way to do this?
Look into the collapse argument in the paste() documentation. Try replacing b with paste(b, collapse = ", "), as shown below.
Edit As Joshua points out, sqlQuery returns a data.frame, not a vector. So, instead of paste(b, collapse = ", "), you could use paste(b[[1]], collapse = ", ").
library(RODBC)
channel <- odbcConnect("test")
b <- sqlQuery(channel,
"select top 1 Noinscr
FROM table
where PrixVente > 100
order by datevente desc")
sqlQuery(channel,
## note paste(b[[1]], collapse = ", ") in line below
paste("insert into TestTable (UniqueID) Values (", paste(b[[1]], collapse = ", "),")", sep = ""))
Assuming b looks like this:
b <- data.frame(Noinscr=c("8535735", "8449336"))
Then you only need a couple steps:
# in case Noinscr is a factor
b$Noinscr <- as.character(b$Noinscr)
# convert the vector into a single string
# NOTE that I subset to get the vector, since b is a data.frame
B <- paste(b$Noinscr, collapse=",")
# create your query
paste("insert into TestTable (UniqueID) Values (",B,")", sep="")
# [1] "insert into TestTable (UniqueID) Values (8535735,8449336)"
You got odd results because sqlQuery returns a data.frame, not a vector. As you learned, using paste on a data.frame (or any list) can provide weird results because paste must return a character vector.
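One caveat about the query built above: it places both IDs inside a single pair of parentheses, which SQL reads as two values for one row, while the insert names only one column. If the intent is one row per ID, each value needs its own parentheses. The sketch below assumes SQL Server's multi-row VALUES syntax (available from SQL Server 2008) and uses the example IDs from the question.

```r
b <- data.frame(Noinscr = c(8535735, 8449336))

# paste() on the whole data frame deparses the column, hence the stray c(...)
paste(b)

# One parenthesised value per row, joined with commas
rows  <- paste0("(", b[[1]], ")", collapse = ", ")
query <- paste0("insert into TestTable (UniqueID) Values ", rows)
query
```

The resulting string can be passed to sqlQuery(channel, query) in the same way as before.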
