R saves invalid encoding into SQL Server nvarchar column

I'm trying to save text into SQL Server using R and ODBC Driver 17 for SQL Server. I have this simple script:
test <- data.table(dbGetQuery(con, "select LOCATIONNAME from LOCATION where LOCATION = 'AT2331'"))
name <- test[, LOCATIONNAME]
sql <- paste0('INSERT INTO LOCATION_TEST (TEST) VALUES (\'', name, '\')')
dbGetQuery(con, sql)
The data in the original table is VÖSENDORF, and it displays properly in R, but the result in the database is VÃ–SENDORF. The column type is nvarchar and the server collation is SQL_Latin1_General_CP1_CI_AS. How can I fix this?
I also tried
sql2 <- iconv(sql, from="UTF-8", to="UTF-16LE")
But that says:
embedded nul in string
Edit:
Using the N prefix doesn't change the result:
sql <- paste0("INSERT INTO LOCATION_TEST (TEST) VALUES (N'", name, "')")
If I use just a hard coded value, it works fine:
sql <- paste0("INSERT INTO LOCATION_TEST (TEST) VALUES (N'VÖSENDORF')")
This is RStudio in Windows. Output of Sys.getlocale():
"LC_COLLATE=English_United Kingdom.1252;LC_CTYPE=English_United Kingdom.1252;LC_MONETARY=English_United Kingdom.1252;LC_NUMERIC=C;LC_TIME=English_United Kingdom.1252"
Tested my code on Linux with the following locale, and there it works fine:
"LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=en_US.UTF-8;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C"
Tested with the latest R version, 4.2.0 (2022-04-22 ucrt), and the issue still persists.
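One workaround worth trying (a sketch of mine, not from the original post): let the driver bind the value as a query parameter instead of pasting it into the SQL string, using DBI's dbExecute() with the same connection, table, and column as above:
library(DBI)
# The driver binds `name` as a query parameter, so no manual escaping
# or encoding of the SQL literal is needed.
dbExecute(con, "INSERT INTO LOCATION_TEST (TEST) VALUES (?)", params = list(name))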

Related

Roracle special characters on ShinyServer (Linux)

There seem to be plenty of similar issues and answers to them, yet so far I haven't found a solution that works for me.
The issue in short: special characters (Scandinavian letters, the ohm symbol, the degree Celsius symbol, etc.) get scrambled when the application runs on the Linux Shiny Server. The data shown on the Shiny dashboard is queried from an Oracle database. The column name in these examples is "NAME" and its type is VARCHAR2. When I run similar code in R on the Linux server or in my local Windows RStudio, all characters look fine.
What I've tried so far: characters started to look fine in Linux R after setting NLS_LANG=AMERICAN_AMERICA.AL32UTF8 in /etc/environment. I confirmed these are the correct NLS_LANG settings by running SELECT * FROM V$NLS_PARAMETERS and SELECT * FROM NLS_SESSION_PARAMETERS in R on Linux. However, this didn't fix the issue on the Shiny Server side.
I've also played around with the dbConnect encoding parameter, with no luck.
A somewhat reproducible example (sorry, I can't give access to my Oracle server ;-) ):
library(ROracle)
ORAdrv <- dbDriver("Oracle", unicode_as_utf8 = TRUE, ora.attributes = TRUE) # doesn't matter if I have these two latter attributes or not
ORAconnect.string <- paste(
  "(DESCRIPTION=",
  "(ADDRESS=(PROTOCOL=tcp)(HOST=xx.xx.xx.xx)(PORT=xxxx))",
  "(CONNECT_DATA=(SID=...)))", sep = ""
)
query2 <- "select NAME, DATA_FIELD from TABLE where DATA_FIELD in ('ID7018789', 'ID7025838', 'ID7021380')"
ORAcon <- dbConnect(ORAdrv, username = "...", password = "...", dbname = ORAconnect.string, encoding = "UTF-8") # doesn't matter if encoding is defined or not
res <- dbSendQuery(ORAcon, query2, 'set character set "utf8"') # doesn't matter if the last attribute is defined or not
df <- fetch(res)
dbDisconnect(ORAcon)
print(df)
What the end result looks like: if I run the code in Linux R, the result is as expected (the ohm and Celsius symbols and the Scandinavian characters look good). If I run the same code and render the data frame as a datatable in the Shiny Server app, the ohm and Celsius symbols are replaced with question marks, and the Scandinavian characters äö come out as ao.
Any help on getting the encoding right on the Shiny Server application side is highly appreciated =)
I was finally able to solve it. If someone else is struggling with this, cast the column to the national character set already in the query; in my case, to_nchar(NAME) did the trick.
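For illustration, the example query from above with that cast applied (a sketch only; table and column names as in the question):
query2 <- "select to_nchar(NAME) as NAME, DATA_FIELD from TABLE where DATA_FIELD in ('ID7018789', 'ID7025838', 'ID7021380')"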
https://docs.oracle.com/cd/B19306_01/server.102/b14200/functions187.htm

Querying from Snowflake in R via DBI imports � symbol

I am trying to query some data from a table in Snowflake into RStudio.
I've set up ODBC properly and execute the R code below:
Data <-dbGetQuery(snowflake, " SELECT * FROM my_table")
Then I get the data into a data frame, but some entries show the � character.
For example, the German word Groß is imported as Gro� into R.
I checked the original my_table and the data are stored without � symbols there.
How can I fix this issue?
Specifying the encoding fixed the issue
snowflake <- dbConnect(odbc::odbc(), "snowflake", uid = "username", pwd = "password", encoding = "latin1")
Data <- dbGetQuery(snowflake, "SELECT * FROM my_table")

Import UTF-8 characters from database to R Studio

I am querying data from a SQL Server database into RStudio. Some columns contain Cyrillic letters that should be used in further analysis. However, they are encoded the wrong way, so I cannot use them. For privacy reasons, I will create a reproducible example that shows the problem.
library(odbc)
library(pool)
library(DBI)
poolX <- dbPool(drv = odbc::odbc(),
                Driver = "ODBC Driver 17 for SQL Server",
                Database = "database",
                Server = "server",
                UID = "user",
                PWD = "123456")
The connection works and lets RStudio query data from the database. The database contains a table with character data; the column City contains city names written in Russian.
It's shown in SQL Server as:
City = Алматы, Астана
However, when I query this column into RStudio, it comes back in this form:
City = <c0><eb><ec><e0><f2><fb>,<c0><f1><f2><e0><ed><e0>
R also shows it in a different form:
unique(City)
#[1] "\xc0\xeb\xec\xe0\xf2\xfb"
#[2] "\xc0\xf1\xf2\xe0\xed\xe0"
Interestingly, if I export the data from SQL Server to Excel and load that into RStudio, it works fine.
However, I need a direct connection from the database to RStudio, so I have to fix this issue.
Any help is welcome. What is the problem?
You can set your locale to Russian before you import from MSSQL with
Sys.setlocale(locale = "Russian")
If you do not want to set everything to Russian, you can set just the character handling with
Sys.setlocale(category = "LC_CTYPE", locale = "Russian")
Example:
> City = "Алматы, Астана"
> data.frame(City)
City
1 <U+0410><U+043B><U+043C><U+0430><U+0442><U+044B>, <U+0410><U+0441><U+0442><U+0430><U+043D><U+0430>
> Sys.setlocale(category = "LC_CTYPE", locale = "Russian")
[1] "Russian_Russia.1251"
>
> data.frame(City)
City
1 Алматы, Астана
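If you would rather not leave the session locale changed, a small sketch (not part of the original answer) that restores the previous setting after the import:
old_ctype <- Sys.getlocale("LC_CTYPE")                    # remember the current setting
Sys.setlocale(category = "LC_CTYPE", locale = "Russian")
# ... run the query against SQL Server here ...
Sys.setlocale(category = "LC_CTYPE", locale = old_ctype)  # restore the old setting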

ROracle connect and pull utf8 characters

I am connecting to an Oracle database from R using ROracle. The problem is that for every special UTF-8 character it returns a question mark, and some Chinese values return solid strings of question marks. I believe this is relevant because I haven't found any other question on this site (or others) that answers this for the ROracle package.
The most promising questions include an answer for MySQL, Fetching UTF-8 text from MySQL in R returns "????", but I was unable to make it work for ROracle. This site also provided some useful information: https://docs.oracle.com/cd/E17952_01/mysql-5.5-en/charset-connection.html. Previously I was using RODBC and was easily able to configure the UTF-8 encoding.
Here is some sample code... I am sorry that unless you have an Oracle database with UTF-8 characters it may be impossible to reproduce... I also changed the host number and the SID for privacy reasons...
library(ROracle)
drv <- dbDriver("Oracle")
# Create the connection string
host <- "10.00.000.86"
port <- 1521
sid <- "f110"
connect.string <- paste(
  "(DESCRIPTION=",
  "(ADDRESS=(PROTOCOL=tcp)(HOST=", host, ")(PORT=", port, "))",
  "(CONNECT_DATA=(SID=", sid, ")))", sep = "")
con <- dbConnect(drv, username = "XXXXXXXXX",
                 password = "xxxxxxxxx", dbname = connect.string)
my.table <- dbReadTable(con, "DASH_D_PROJECT_INFO")
my.table[40, 1:3]
PROJECT_ID DATE_INPUT PROJECT_NAME
211625 2012-07-01 ??????, ?????????????????? ????? ??????, 1869?1917 [????? 3]
Any help is appreciated. I have read the entire documentation of the ROracle package, and it seemed to have a solution for writing UTF-8 characters, but not for reading them.
Okay, after several weeks I found my own answer. I hope it will be of value to someone else.
My question is largely answered by how Oracle stores the data. If you want UTF-8 characters preserved, the column in the table needs to be an NVARCHAR, not just a VARCHAR. At that point regular data pulling and encoding works in R as expected. I was looking for the error in the wrong place.
I also want to mention one hang-up on how to write UTF-8 data from R to Oracle.
When writing, I had some values that would not convert to UTF-8 in the following manner, so I split the data into two parts and wrote them to an Oracle table in two steps. The results worked perfectly.
# Mark the character column as UTF-8
Encoding(my.data1$Project.Name) <- "UTF-8"
# Split rows by whether the marking took effect (pure-ASCII strings keep encoding "unknown")
my.data1.1 <- my.data1[Encoding(my.data1$Project.Name) == "UTF-8", ]
my.data1.2 <- my.data1[Encoding(my.data1$Project.Name) != "UTF-8", ]
# Tell ROracle to treat this column as UTF-8 when writing
attr(my.data1.1$Project.Name, "ora.encoding") <- "UTF-8"
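For completeness, a sketch of the write step this alludes to; the dbWriteTable call and target table name are my assumptions, not part of the original answer:
# Hypothetical write step: ROracle honours the ora.encoding attribute on character columns.
dbWriteTable(con, "PROJECT_INFO_UTF8", my.data1.1, append = TRUE)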
If you found this insightful, give it an upvote so more people can find it.

R Shiny: Unable to retrieve JDBC result set for vertica DB

I am getting the error below while using Vertica COPY FROM LOCAL. Please suggest a solution.
Error:Unable to retrieve JDBC result set for COPY
Monetisation_Base_table FROM LOCAL 'E://testCSV.csv' delimiter ','
([Vertica]JDBC A ResultSet was expected but not generated
from query "COPY Monetisation_Base_table FROM LOCAL 'E://testCSV.csv'
delimiter ','". Query not executed. )
Code Used:
library(RJDBC)
vDriver <- JDBC(driverClass = "com.vertica.jdbc.Driver",
                classPath = "full/path/to/driver/vertica_jdbc_VERSION.jar")
vertica <- dbConnect(vDriver, "jdbc:vertica://30.0.9.163:5433/db", "sk14930IU", "Snapdeal_40")
myframe <- dbGetQuery(vertica, "COPY Monetisation_Base_table FROM LOCAL 'E://testCSV.csv' delimiter ','")
dbSendUpdate should do the job in this case.
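A minimal sketch of that suggestion, reusing the connection from the question; COPY does not return a result set, so it has to be run as an update statement:
# dbSendUpdate executes a statement that returns no result set.
dbSendUpdate(vertica, "COPY Monetisation_Base_table FROM LOCAL 'E://testCSV.csv' delimiter ','")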
