Special Characters are Converted to ? When Inserting into Oracle Database Using R

I'm making a connection to an Oracle database using the ROracle package and the DBI package. When I try to execute insert statements that contain special characters, the special characters get converted to non-special characters. (I'm sure there are more correct terms for "special" and "non-special" that I'm not aware of.)
First I make the following connection:
connection <- dbConnect(
  dbDriver("Oracle"),
  username = "xxxxx",
  password = "xxxxx",
  dbname = "xxxx"
)
Then I execute the following insert statement on a table I have already created. Column A has a type of NVARCHAR2.
dbSendQuery(connection, "insert into TEST_TABLE (A) values('£')")
This is what gets returned:
Statement: insert into TEST_TABLE (A) values('#')
Rows affected: 1
Row count: 0
Select statement: FALSE
Statement completed: TRUE
OCI prefetch: FALSE
Bulk read: 1000
Bulk write: 1000
As you can see, the "£" symbol gets replaced by a "#". I can execute the insert statement directly in PL/SQL and there's no issue, so it seems to be an issue with R. Any help is appreciated.

This was resolved by running Sys.setenv(NLS_LANG = "AMERICAN_AMERICA.AL32UTF8") before creating the connection.
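For reference, a minimal sketch of the working sequence (the credentials are the placeholders from the question; NLS_LANG is read when the Oracle client initialises, so it must be set before connecting):
Sys.setenv(NLS_LANG = "AMERICAN_AMERICA.AL32UTF8")
library(ROracle)

connection <- dbConnect(
  dbDriver("Oracle"),
  username = "xxxxx",
  password = "xxxxx",
  dbname = "xxxx"
)

# the pound sign should now survive the round trip
dbSendQuery(connection, "insert into TEST_TABLE (A) values('£')")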

Related

Can't find a way to gather data and upload it again to a different SQL server without breaking the encoding (R/dbplyr/DBI)

The basic setup is that I connect to database A, get some data back into R, and write it to another connection, database B.
The database collation is SQL_Latin1_General_CP1_CI_AS, and I'm using encoding = "windows-1252" in connections A and B.
Display in RStudio is fine, special characters show as they should.
When I try to write the data, I get a "Cannot insert the value NULL into column" error.
I narrowed it down to at least one offending field: a cell with a PHI symbol, which causes the error.
How do I make it so the PHI symbol and presumably other special characters are kept the same from source to destination?
conA <- dbConnect(odbc(),
                  Driver = "ODBC Driver 17 for SQL Server",
                  Server = "DB",
                  Database = "serverA",
                  Trusted_connection = "yes",
                  encoding = "1252")

# conB is set up the same way; this write causes the "cannot insert null value" error
dbWriteTable(conB, SQL("schema.table"), failing_row, append = TRUE)
I suggest working around this problem rather than trying to solve it within dbplyr. The overwhelming majority of encoding questions have nothing to do with dbplyr (the encoding tag has 23k questions, while the dbplyr tag has fewer than 400), so this is most likely a general encoding problem, and it may be non-trivial to resolve.
Here are two work-arounds to consider:
Use a text file as an intermediate step
R will have no problem writing an in-memory table out to a text file/csv. And SQL server has standard ways of reading in a text file/csv. This gives you the added advantage of validating the contents of the text file before loading it into SQL.
The documentation for SQL Server BULK INSERT covers the relevant options: CODEPAGE = '65001' selects UTF-8 encoding, and DATAFILETYPE = 'widechar' selects Unicode input.
If you want to take this approach entirely within R, it will likely look something like:
write.csv(failing_row, "output_file.csv", row.names = FALSE)

# each column listed along with a suitable data type
query_for_creating_table = "CREATE TABLE schema_name.table_name (
  col1 INT,
  col2 NCHAR(10)
)"

query_for_bulk_insert = "BULK INSERT schema_name.table_name
FROM 'output_file.csv'
WITH
(
  DATAFILETYPE = 'widechar',
  FIRSTROW = 2,
  ROWTERMINATOR = '\n'
)"

DBI::dbExecute(con, query_for_creating_table)
DBI::dbExecute(con, query_for_bulk_insert)
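One encoding detail to verify: DATAFILETYPE = 'widechar' expects a UTF-16 file, while write.csv writes in the session's native encoding by default. A UTF-8 pairing may be easier to get right; here is a sketch under that assumption:
# write UTF-8 explicitly and tell BULK INSERT to read UTF-8 (code page 65001)
write.csv(failing_row, "output_file.csv", row.names = FALSE, fileEncoding = "UTF-8")

query_for_bulk_insert = "BULK INSERT schema_name.table_name
FROM 'output_file.csv'
WITH
(
  CODEPAGE = '65001',
  FIRSTROW = 2,
  ROWTERMINATOR = '\n'
)"
DBI::dbExecute(con, query_for_bulk_insert)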
Load all the non-error rows and append the final row after
I have had some success in the past using the INSERT INTO syntax, so I would recommend loading the failing row using this approach.
Something like the following:
failing_row = local_df %>%
  filter(condition_to_get_just_the_failing_row)
non_failing_rows = local_df %>%
  filter(!condition_to_get_just_the_failing_row)

# write non-failing rows
dbWriteTable(con, SQL("schema.table"), non_failing_rows, append = TRUE)

# prep for insert of the failing row: quote character values, leave numerics bare
values = c()
for (col in colnames(failing_row)) {
  value = failing_row[[col]]
  if (is.numeric(value)) {
    values = c(values, value)
  } else {
    values = c(values, paste0("'", value, "'"))
  }
}
insert_query = paste0("INSERT INTO schema.table VALUES (",
                      paste(values, collapse = ", "), ");")

# insert failing row
dbExecute(con, insert_query)
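Note that the paste0() quoting above does not escape single quotes inside values. As a hedged alternative sketch, DBI's dbQuoteString() does that escaping for you:
# let DBI escape string values (handles embedded quotes too)
values = vapply(colnames(failing_row), function(col) {
  v = failing_row[[col]]
  if (is.numeric(v)) {
    as.character(v)
  } else {
    as.character(DBI::dbQuoteString(con, as.character(v)))
  }
}, character(1))
insert_query = paste0("INSERT INTO schema.table VALUES (",
                      paste(values, collapse = ", "), ");")
dbExecute(con, insert_query)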
Other resources
If you have not seen them already, here are several related Q&A topics that might assist: Arabic characters, reading UTF-8, encoding to MySQL, and non-Latin characters appearing as question marks (though some of these cover reading data into R rather than writing it).

R with PostgreSQL database

I've been trying to query data from a PostgreSQL database (pgAdmin) into R to analyse it. Most of the queries work, except when I write a condition to filter out most of the rows. Please find the code below:
dbGetQuery(con, 'select * from "db_name"."User" where "db_name"."User"."FirstName" = "Mani" ')
Error in result_create(conn@ptr, statement) :
Failed to prepare query: ERROR: column "Mani" does not exist
LINE 1: ...from "db_name"."User" where "db_name"."User"."FirstName" = "Mani"
^
This is the error I get. Why is it treating Mani as a column when it is just a value? Someone please assist me.
String literals in Postgres (and most flavors of SQL) take single quotes. This, combined with a few other tidy-ups to your code, leaves us with:
sql <- "select * from db_name.User u where u.FirstName = 'Mani'"
dbGetQuery(con, sql)
Note that I introduced a table alias, u, for the User table, so that we don't have to repeat the fully qualified name in the WHERE clause. (One caveat: if the table was created with quoted, mixed-case identifiers, as the error message suggests, keep the double quotes around the identifiers; only the string literal changes to single quotes.)
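As a side note, a parameterized query sidesteps literal quoting entirely. A sketch assuming the RPostgres driver (which uses $1-style placeholders):
library(DBI)

# the value is bound by the driver, so no quoting is needed in the SQL text
sql <- 'SELECT * FROM "db_name"."User" u WHERE u."FirstName" = $1'
dbGetQuery(con, sql, params = list("Mani"))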

Loading in a MySQL table called "order" with RMySQL

I'm currently trying to connect my R session to a MySQL server using the RMySQL package.
One of the tables on the server is called "order". I have already searched for how to import a table called order in MySQL (by wrapping the name in quotes), yet the syntax does not work for the RMySQL query.
When I run the following statement:
order_query = dbSendQuery(mydb,"SELECT * FROM 'order'")
It returns the following error:
Error in .local(conn, statement, ...) : could not run statement:
You have an error in your SQL syntax; check the manual that
corresponds to your MySQL server version for the right syntax to use
near ''order'' at line 1
Does anyone know how to get around this in R?
Single quotes in MySQL indicate string literals, and you should not be putting them around your table names. Try the query with backticks instead:
order_query = dbSendQuery(mydb,"SELECT * FROM `order`")
If you did, for some reason, need to escape your table name, then use backticks, e.g.
SELECT * FROM `some table` -- table name with a space (generally a bad thing)
Edit:
As @Ralf pointed out, in this case you do need the backticks, because ORDER is a MySQL keyword, so you should avoid using it to name your tables and columns.
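Alternatively, you can let DBI quote the identifier for you, so reserved words never bite. A small sketch:
library(DBI)

# dbQuoteIdentifier() produces `order` for a MySQL connection
tbl <- dbQuoteIdentifier(mydb, "order")
order_query <- dbSendQuery(mydb, paste("SELECT * FROM", tbl))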

R insert string column into Oracle database

I am trying to insert values into an existing table in an Oracle database from R via ODBC by running the following commands:
conODBC <- odbcConnect(dsn="xxxx", uid="xxxxx", pwd="xxxx", readOnly=FALSE)
sqls <- sprintf("INSERT INTO R_AAAA (AAA,AAB,AAC,AAD) VALUES (%s)",
                apply(R_AAAA, 1, function(i) paste(i, collapse=",")))
lapply(sqls, function(s) sqlQuery(conODBC, s))
And I got this error:
"HY000 984 [Oracle][ODBC][Ora]ORA-00984: 列在此处不允许\n"
列在此处不允许 is Chinese for "column not allowed here". The variable sqls looks as follows:
>sqls
"INSERT INTO R_AAAA (AAA,AAB,AAC,AAD) VALUES
(H3E000000000000,344200402050,12, 2.347826e+01)"
Column AAA in R_AAAA is a string column. It appears to me that Oracle needs single quotes around a string value in an insert statement, 'H3E000000000000' for instance. Does anyone know how to add the single quotes? I would like to insert rows into an existing table rather than create a new table in Oracle with sqlSave.
Thanks
You can preprocess R_AAAA by wrapping character columns in single quotes before building the statements:
R_AAAA <- data.frame(AAA="H3E000000000000", AAB=344200402050, AAC=12, AAD=2.347826e+01)
R_AAAA <- as.data.frame(lapply(R_AAAA, function(x) {
  if (is.character(x)) {
    x <- paste0("'", x, "'")
  }
  x
}))
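If you are open to the DBI/odbc stack instead of RODBC, a parameterized insert avoids hand-built quoting entirely. A sketch, using the original unquoted R_AAAA and the DSN and credentials placeholders from the question:
library(DBI)
library(odbc)

con <- dbConnect(odbc::odbc(), dsn = "xxxx", uid = "xxxxx", pwd = "xxxx")

# one ? placeholder per column; values are bound row-wise, no manual quoting
dbExecute(con,
          "INSERT INTO R_AAAA (AAA, AAB, AAC, AAD) VALUES (?, ?, ?, ?)",
          params = unname(as.list(R_AAAA)))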
What R package are you using for sqlSave? There should be an option to insert new rows, e.g. append=TRUE.
Preprocessing works. I use the RODBC package. But if I create an empty R_AAAA in the Oracle database and run this command in R,
sqlSave(conODBC, R_AAAA, tablename="R_AAAA", append=TRUE)
RStudio crashes. There wasn't even an error message. :)

In R: dbGetQuery() coerces a string to a numeric and causes problems

I have a table in a sqlite database with the following schema
CREATE TABLE os_logs (version STRING, user STRING, date STRING);
I assigned the following query to a variable called cmd:
select count(*), version
from os_logs
group by version
order by version;
After I send that command through dbGetQuery I get numeric results back for version instead of a string.
db <- dbConnect(SQLite(),"./os_backup.db")
dbGetQuery(db,cmd)
count(*) version
1421 NA
1797 0.7
6 0.71
2152 0.71
1123 0.72
3455 1
2335 1
The versions should be
0.70.1111_Product
0.71.22_Dev
0.71.33_Product
...
Any idea why the strings in my sqlite database are being turned into numerics in R? If I run that query at the sqlite command line it works perfectly.
Edit:
Here is how the tables are created (with more info that I had edited out of the original question):
drop table vocf_logs;
CREATE TABLE vocf_logs (version STRING, driver STRING, dir STRING, uuid STRING PRIMARY KEY, t_start STRING);
CREATE TABLE log_os (uuid STRING PRIMARY KEY, os STRING);
.separator ","
.import vocf_dirs.csv vocf_logs
-- Put the OsVersion info from name_redacted into the table
UPDATE vocf_logs
SET version=(select log_os.os from log_os where uuid = vocf_logs.uuid);
What you describe should work fine. You must have done something differently or inserted the data into the db incorrectly.
Here is a step by step test that does the exact same and works:
# Load package and connect
R> library(RSQLite)
R> db <- dbConnect(SQLite(),"./os_backup.db")
# Create db and insert data
R> dbSendQuery(db, "CREATE TABLE os_logs (version STRING, user STRING, date STRING);")
R> dbSendQuery(db, "INSERT INTO os_logs VALUES ('0.70.1111_Product', 'while', '2015-04-23')")
R> dbSendQuery(db, "INSERT INTO os_logs VALUES ('0.70.1111_Product', 'while', '2015-04-24')")
R> dbSendQuery(db, "INSERT INTO os_logs VALUES ('0.71.22_Dev', 'while', '2015-04-24')")
# Run query counting versions
R> dbGetQuery(db, "SELECT version, count(*) FROM os_logs GROUP BY version ORDER BY version;")
version count(*)
1 0.70.1111_Product 2
2 0.71.22_Dev 1
The creation of the original table was wrong; the method in R was correct. From the data type descriptions at https://www.sqlite.org/datatype3.html:
The declared type of "STRING" has an affinity of NUMERIC, not TEXT.
When the table was created using type TEXT, it worked as expected.
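A minimal sketch of the corrected schema, using the table and column names from the question:
library(RSQLite)
db <- dbConnect(SQLite(), "./os_backup.db")

# TEXT has TEXT affinity; the non-standard STRING falls through to NUMERIC affinity
dbExecute(db, "CREATE TABLE os_logs (version TEXT, user TEXT, date TEXT)")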
