I have a Shiny app in which the user selects a table in a SQL database and then some of its columns. The app then runs this function to fetch the table:
selectAllGetQuery <- function(conn, columns, table){
  columns <- toString(sprintf("[%s]", columns))
  query <- sprintf("select %s from %s", columns, table)
  print(columns) # ok
  print(query)   # ok
  dbGetQuery(conn, query)
}
One table has a column named ACT_TEMP_°C_. When I select it, the dbGetQuery call fails with: Invalid column name 'ACT_TEMP_°C_'. However, as you can see, I included print(columns) and print(query) in the function, and both correctly print ACT_TEMP_°C_ in the R console. So I'm really lost: the function receives the correct name, but the name changes to something wrong inside dbGetQuery.
Solved with
dbGetQuery(conn, stringi::stri_enc_tonative(query))
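For reference, a sketch of the full function with the fix applied; base R's enc2native() is a dependency-free stand-in for stringi::stri_enc_tonative() (this assumes conn is a live DBI connection):

```r
# Same function as above, with the query string converted to the native
# encoding before it reaches the ODBC driver. dbGetQuery() comes from
# the DBI package, which must be attached when the function is called.
selectAllGetQuery <- function(conn, columns, table) {
  columns <- toString(sprintf("[%s]", columns))
  query <- sprintf("select %s from %s", columns, table)
  dbGetQuery(conn, enc2native(query))
}
```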
Background
I am using RStudio to connect R to Microsoft SQL Server Management Studio. I am reading tables into R as follows:
library(sqldf)
library(DBI)
library(odbc)
library(data.table)
TableX <- dbGetQuery(con, statement = "SELECT * FROM [dim1].[dimA].[TableX]")
This works fine for some tables. However, for most tables that have a binary ID variable, the following happens:
TableA <- dbGetQuery(con, statement = "SELECT * FROM [dim1].[dimA].[TableA]")
Error in result_fetch(res@ptr, n) :
nanodbc/nanodbc.cpp:xxx: xxxxx: [Microsoft][ODBC SQL Server Driver]Invalid Descriptor Index
Warning message:
In dbClearResult(rs) : Result already cleared
I figured out that the problem is caused by the first column, which I can select like this:
TableA <- dbGetQuery(con, statement = "SELECT ID FROM [dim1].[dimA].[TableA]")
and looks as follows:
AlwaysLearning mentioned in the comments that this is a recurring problem (1, 2, 3). The query only works when ID is selected last:
TableA <- dbGetQuery(con, statement = "SELECT AEE, ID FROM [dim1].[dimA].[TableA]")
Updated Question
The question is essentially how I can read in the table with the ID variable last, without specifying all table variables each time (because this would be unworkable).
Possible Workaround
I thought a workaround could be to select ID as an integer:
TableA <- dbGetQuery(con, statement = "SELECT CAST(ID AS int), COL2 FROM [dim1].[dimA].[TableA]")
However how do I select the whole table in this case?
I am an SQL beginner, but I thought I could solve it by using something like this (from this link):
TableA <- dbGetQuery(con, statement = "SELECT * EXCEPT(ID), SELECT CAST(ID AS int) FROM [[dim1].[dimA].[TableA]")
where I select everything except the ID column, and then the ID column last. However, this is not accepted syntax.
Other links
A similar problem for Java can be found here.
I believe I have found a workaround that meets your requirements using a table alias.
Assigning the alias T to the table I want to query allows me to select both a specific column ([ID]) and all columns of the aliased table, without having to specify them all by name.
This returns all columns of the table (including the ID column) as well as a copy of the ID column at the end of the table.
I then remove the ID column from the resulting table.
This leaves you with the desired result: all columns of a table in the order that they appear with the exception of the ID column that is placed at the end.
PS: For the sake of completeness, I have provided a template of my own DBIConnection object. You can substitute this with the specifics of your own DBIConnection object.
library(sqldf)
library(DBI)
library(odbc)
library(data.table)
con <- dbConnect(odbc::odbc(),
.connection_string = 'driver={YourDriver};
server=YourServer;
database=YourDatabase;
Trusted_Connection=yes'
)
dataframe <- dbGetQuery(con, statement= 'SELECT T.*, T.[ID] FROM [SCHEMA_NAME].[TABLE_NAME] AS T')
dataframe_scoped <- dataframe[,-1]
Today, for the first time, I discovered the sqldf package, which I find very useful and convenient. Here is what the documentation says about it:
https://www.rdocumentation.org/packages/sqldf/versions/0.4-11
sqldf is an R package for running SQL statements on R data frames,
optimized for convenience. The user simply specifies an SQL statement
in R using data frame names in place of table names and a database
with appropriate table layouts/schema is automatically created, the
data frames are automatically loaded into the database, the specified
SQL statement is performed, the result is read back into R and the
database is deleted all automatically behind the scenes making the
database's existence transparent to the user who only specifies the
SQL statement.
So if I understand correctly, a data.frame held in the computer's RAM is temporarily mapped to a table in an on-disk database, the query is executed, the result is returned to R, and everything temporarily created in the database is removed as if it never existed.
My question is, does it work the other way around? That is, assuming there is already a table, say my_table (just an example), in the database (I use PostgreSQL), is there any way to import its data from the database into a data.frame in R via sqldf? Currently the only way I know is RPostgreSQL.
Thanks to G. Grothendieck for the answer. It is indeed perfectly possible to select data from tables that already exist in the database. My mistake was thinking that the name of the data.frame and the corresponding table must always be the same, whereas, if I understand correctly, that is only the case when a data.frame is mapped to a temporary table in the database. As a result, when I tried to select data, I got an error message saying that a table with the same name already existed in my database.
Anyway, just as a test to see whether this works, I did the following in PostgreSQL (postgres user and test database which is owned by postgres)
test=# create table person(fname text, lname text, email text);
CREATE TABLE
test=# insert into person(fname, lname, email) values ('fname-01', 'lname-01', 'fname-01.lname-01#gmail.com'), ('fname-02', 'lname-02', 'fname-02.lname-02#gmail.com'), ('fname-03', 'lname-03', 'fname-03.lname-03#gmail.com');
INSERT 0 3
test=# select * from person;
fname | lname | email
----------+----------+-----------------------------
fname-01 | lname-01 | fname-01.lname-01#gmail.com
fname-02 | lname-02 | fname-02.lname-02#gmail.com
fname-03 | lname-03 | fname-03.lname-03#gmail.com
(3 rows)
test=#
Then I wrote the following in R
options(sqldf.RPostgreSQL.user = "postgres",
sqldf.RPostgreSQL.password = "postgres",
sqldf.RPostgreSQL.dbname = "test",
sqldf.RPostgreSQL.host = "localhost",
sqldf.RPostgreSQL.port = 5432)
###
###
library(tidyverse)
library(RPostgreSQL)
library(sqldf)
###
###
result_df <- sqldf("select * from person")
And indeed we can see that result_df contains the data stored in the table person.
> result_df
fname lname email
1 fname-01 lname-01 fname-01.lname-01#gmail.com
2 fname-02 lname-02 fname-02.lname-02#gmail.com
3 fname-03 lname-03 fname-03.lname-03#gmail.com
>
>
I am trying to insert values into an existing table in Oracle database from R via ODBC by running the following command,
conODBC<-odbcConnect(dsn="xxxx", uid="xxxxx", pwd="xxxx", readOnly=FALSE)
sqls<-sprintf("INSERT INTO R_AAAA (AAA,AAB,AAC,AAD) VALUES (%s)",
apply(R_AAAA, 1, function(i) paste(i, collapse=",")))
lapply(sqls, function(s) sqlQuery(conODBC, s))
And I got this error,
"HY000 984 [Oracle][ODBC][Ora]ORA-00984: 列在此处不允许\n"
列在此处不允许 is Chinese for 'Column is not allowed here'. The variable sqls looks as follows:
>sqls
"INSERT INTO R_AAAA (AAA,AAB,AAC,AAD) VALUES
(H3E000000000000,344200402050,12, 2.347826e+01)"
Column AAA in R_AAAA is a string column. It appears that Oracle needs single quotes around a string value in an INSERT statement, 'H3E000000000000' for instance. Does anyone know how to add the single quotes? I would like to insert rows into an existing table rather than create a new table in Oracle with sqlSave.
Thanks
You can preprocess R_AAAA by adding ' to its character columns before building the statements:
R_AAAA <- data.frame(AAA = "H3E000000000000", AAB = 344200402050,
                     AAC = 12, AAD = 2.347826e+01,
                     stringsAsFactors = FALSE) # keeps AAA character on R < 4.0
as.data.frame(lapply(R_AAAA, function(x) {
  if (is.character(x)) {
    x <- paste0("'", x, "'")
  }
  x
}))
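Putting it together with the sprintf() call from the question, the quoted columns produce syntactically valid VALUES lists. This sketch only builds the statements; the sqlQuery() loop from the question is unchanged:

```r
# Quote character columns, then build one INSERT statement per row.
R_AAAA <- data.frame(AAA = "H3E000000000000", AAB = 344200402050,
                     AAC = 12, AAD = 2.347826e+01,
                     stringsAsFactors = FALSE)

quoted <- as.data.frame(lapply(R_AAAA, function(x) {
  if (is.character(x)) paste0("'", x, "'") else x
}), stringsAsFactors = FALSE)

sqls <- sprintf("INSERT INTO R_AAAA (AAA,AAB,AAC,AAD) VALUES (%s)",
                apply(quoted, 1, paste, collapse = ","))
```

Note that this simple quoting does not escape single quotes inside the values themselves; for anything beyond clean data, parameterized queries are the safer route.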
What R package are you using for sqlSave? There should be an option to append new rows, e.g. append=TRUE.
Preprocessing works. I use the RODBC package. If I create an empty R_AAAA in the Oracle database and run this command in R:
sqlSave(conODBC,R_AAAA,tablename="R_AAAA",append=TRUE)
RStudio crashes. There wasn't even an error message :)
I am new to RSQLite.
I have an input document in text format in which the values are separated by '|'.
I created a table with the required variables (dummy code as follows)
library(RSQLite)
db <- dbConnect(SQLite(), dbname = "test.sqlite")
dbSendQuery(conn = db,
            "CREATE TABLE TABLE1(
             MARKS INTEGER,
             ROLLNUM INTEGER,
             NAME CHAR(25),
             DATED DATE)"
)
However, I am stuck on how to import values into the created table.
I cannot use an INSERT INTO ... VALUES command, as there are thousands of rows and more than 20 columns in the original data file, and it is impossible to type in each data point manually.
Can someone suggest an alternative efficient way to do so?
You are using a scripting language. The whole point of that is literally to avoid manually typing each data point. Sorry.
You have two routes:
1: You have correctly loaded a database connection and created an empty table in your SQLite database. Nice!
To load data into the table, load your text file into R with e.g. df <- read.table('textfile.txt', sep='|') (modify the arguments to fit your text file).
To make a 'dynamic' INSERT statement, you can use placeholders. RSQLite allows both named and positional placeholders. To insert a single row, you can do:
dbSendQuery(db, 'INSERT INTO table1 (MARKS, ROLLNUM, NAME) VALUES (?, ?, ?);', params = list(1, 16, 'Big fellow'))
You see? The first ? gets the value 1, the second ? gets 16, and the last ? gets the string Big fellow. Also note that you do not enclose placeholders for text in quotation marks (' or ")!
Now, you have thousands of rows, or maybe just more than one. Either way, you can send in your entire data frame. dbSendQuery has two requirements: 1) each vector must have the same number of entries (not an issue when providing a data.frame), and 2) you may only submit as many vectors as you have placeholders.
I assume your data frame df contains the columns mark, roll, and name, corresponding to the table's columns. Then you may run:
dbSendQuery(db, 'INSERT INTO table1 (MARKS, ROLLNUM, NAME) VALUES (:mark, :roll, :name);', params = df)
This will execute an INSERT statement for each row in df!
TIP! Because an INSERT statement is executed for each row, inserting thousands of rows can take a long time: after each insert, the data is written to file and the indices are updated. Instead, enclose the inserts in a transaction:
dbBegin(db)
res <- dbSendQuery(db, 'INSERT ...;', df)
dbClearResult(res)
dbCommit(db)
and SQLite will write the data to a journal file, only persisting the result when you execute dbCommit(db). Try both methods and compare the speed!
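As a self-contained illustration of the transactional variant, using an in-memory SQLite database (this assumes the DBI and RSQLite packages are installed; dbSendStatement() is the DBI generic intended for data-modifying statements and behaves like the dbSendQuery() call above):

```r
library(DBI)

# An in-memory database stands in for test.sqlite.
db <- dbConnect(RSQLite::SQLite(), ":memory:")
dbExecute(db, "CREATE TABLE TABLE1 (MARKS INTEGER, ROLLNUM INTEGER, NAME CHAR(25))")

df <- data.frame(mark = c(55, 82), roll = c(16, 17),
                 name = c("Big fellow", "Wee fellow"),
                 stringsAsFactors = FALSE)

# All inserts happen inside one transaction; nothing is persisted
# until dbCommit() runs.
dbBegin(db)
res <- dbSendStatement(db,
  "INSERT INTO TABLE1 (MARKS, ROLLNUM, NAME) VALUES (:mark, :roll, :name);",
  params = df)
dbClearResult(res)
dbCommit(db)

n_rows <- dbGetQuery(db, "SELECT COUNT(*) AS n FROM TABLE1")$n
dbDisconnect(db)
```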
2: Ah, yes, the second way. This can be done entirely in SQLite.
With the SQLite command-line utility (sqlite3 from your command line, not R), you can attach a text file as a table and simply do an INSERT INTO ... SELECT ... ; command. Alternatively, read the text file in sqlite3 into a temporary table and run an INSERT INTO ... SELECT ... ;.
Useful site to remember: http://www.sqlite.com/lang.html
A little late to the party, but DBI provides dbAppendTable() which will write the contents of a dataframe to an SQL table. Column names in the dataframe must match the field names in the database. For your example, the following code would insert the contents of my random dataframe into your newly created table.
library(DBI)
db <- dbConnect(RSQLite::SQLite(), dbname = ":memory:")
dbExecute(db,
"CREATE TABLE TABLE1(
MARKS INTEGER,
ROLLNUM INTEGER,
NAME TEXT
)"
)
df <- data.frame(MARKS = sample(1:100, 10),
ROLLNUM = sample(1:100, 10),
NAME = stringi::stri_rand_strings(10, 10))
dbAppendTable(db, "TABLE1", df)
I don't think there is a nice way to do a large number of inserts directly from R. SQLite does have a bulk insert functionality, but the RSQLite package does not appear to expose it.
From the command line you may try the following:
.separator |
.import your_file.csv your_table
where your_file.csv is the CSV (or pipe delimited) file containing your data and your_table is the destination table.
See the documentation under CSV Import for more information.
I can't figure out how to update an existing DB2 database from R, or how to update a single value in it.
I can't find much information on this topic online other than very general information, but no specific examples.
library(RJDBC)
teachersalaries=data.frame(name=c("bob"), earnings=c(100))
dbSendUpdate(conn, "UPDATE test1 salary",teachersalaries[1,2])
AND
teachersalaries=data.frame(name=c("bob",'sally'), earnings=c(100,200))
dbSendUpdate(conn, "INSERT INTO test1 salary", teachersalaries[which(teachersalaries$earnings>200),] )
Have you tried passing a regular SQL statement like you would in other languages?
dbSendUpdate(conn, "UPDATE test1 set salary=? where id=?", teachersalary, teacherid)
or
dbSendUpdate(conn,"INSERT INTO test1 VALUES (?,?)",teacherid,teachersalary)
Basically you specify the regular SQL DML statement using parameter markers (those question marks) and provide a list of values as comma-separated parameters.
Try this, it worked well for me.
dbSendUpdate(conn,"INSERT INTO test1 VALUES (?,?)",teacherid,teachersalary)
You just need to pass a regular SQL statement the same way you would in any programming language. Try it out.
To update multiple rows at the same time, I have built the following function.
I have tested it with batches of up to 10,000 rows and it works perfectly.
# Libraries
library(RJDBC)
library(dplyr)
# Function upload data into database
db_write_table <- function(conn, table, df){
  # Format data to write
  batch <- apply(df, 1, FUN = function(x) paste0("'", trimws(x), "'", collapse = ",")) %>%
    paste0("(", ., ")", collapse = ",\n")
  # Build query
  query <- paste("INSERT INTO", table, "VALUES", batch)
  # Send update
  dbSendUpdate(conn, query)
}
# Push data
db_write_table(conn,"schema.mytable",mydataframe)
Thanks to the other authors.