rodbc character encoding error with PostgreSQL - r

I'm getting a new error which I've never gotten before when connecting from R to a GreenPlum PostgreSQL database using RODBC. I've gotten the error using both EMACS/ESS and RStudio, and the RODBC call has worked as is in the past.
library(RODBC)
gp <- odbcConnect("greenplum", believeNRows = FALSE)
data <- sqlQuery(gp, "select * from mytable")
> data
[1] "22P05 7 ERROR: character 0xc280 of encoding \"UTF8\" has no equivalent in \"WIN1252\";\nError while executing the query"
[2] "[RODBC] ERROR: Could not SQLExecDirect 'select * from mytable'"
EDIT:
Just tried querying another table and did get results. So I guess it's not an RODBC problem but a PostgreSQL table encoding problem.
R version 2.13.0 (2011-04-13)
Platform: i386-pc-mingw32/i386 (32-bit)
locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] RODBC_1.3-2
>

First, the issue arises because R is trying to convert to a Windows locale that supports UTF8. Unfortunately, Brian Ripley has reported numerous times that Windows has no UTF8 locales. From hours spent searching the web, StackOverflow, Microsoft, etc., I have come to the conclusion that Windows won't support UTF8.
As a result, I'm not sure that there's an easy solution to this, if there is any solution at all. The best I can recommend is to wrap some kind of conversion on the server side, look at filtering the data if you can, or try a different language, if appropriate (e.g. Chinese, Japanese, Korean).
If you do decide to wrap a converter, unicode.org recommends this ICU toolkit.

0xc280 is the UTF-8 encoding of a control character (U+0080 in Unicode) that causes trouble fairly often when working with SQL and the like. The problem usually lies in the conversion chain that inevitably arises when different applications use different encoding schemes. Windows includes UTF-8 support by now, so it's not strictly a Windows problem. I believe the problem arises before R reads the data in.
In fact, somewhere in the chain the Unicode code point 0x80 gets mapped to the byte sequence 0xc280 in UTF-8. This is a control character and cannot be printed. But chances are good that the 0x80 was in fact not Unicode, but a byte in Windows Latin-1 or Latin-2 (Windows-1252 or Windows-1250). In that case, the 0x80 represents the euro sign. That might explain how it ends up in your data. Check whether you can find something like that in the data; that would already explain a lot.
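The byte arithmetic described above can be checked in base R. This is only a minimal sketch: the variable names are made up for illustration, and it assumes the platform's iconv knows the names "ISO-8859-1" and "CP1252".

```r
# One byte, 0x80, as it might sit in the source data.
byte_80 <- list(as.raw(0x80))

# Read as ISO-8859-1 (Latin-1), 0x80 is the control character U+0080.
# Re-encoded as UTF-8 it becomes the two bytes c2 80 from the error message.
as_latin1 <- iconv(byte_80, from = "ISO-8859-1", to = "UTF-8", toRaw = TRUE)[[1]]

# Read as Windows-1252, the same byte is the euro sign U+20AC,
# which is e2 82 ac in UTF-8.
as_cp1252 <- iconv(byte_80, from = "CP1252", to = "UTF-8", toRaw = TRUE)[[1]]

print(as_latin1)
print(as_cp1252)
```

If the offending column was loaded from a Windows-1252 source that was declared as Latin-1, this mix-up would be consistent with a euro sign ending up as 0xc280 in the database.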
My guess is that the solution will not lie at the R-end of this workchain, but before that. It will try automatic conversion, but this one is reported to fail in some cases (also for SQL and Oracle btw). Check in which encoding you're working in Postgresql, and try to use any of the latin types. There might be other links involved (a Putty or similar terminal for example). I'm pretty sure all the encodings there are ISO8859-1, which is Latin-1. Somewhere UTF-8 gets thrown in between, and when the 0x80 character gets wrongly mapped to 0xc280, you get trouble.
So check the encodings in your complete workchain, and make sure that they all match. If they don't, the automatic conversion done between each step is bound to give trouble for some characters.
Hope this helps.

I might have posted this response elsewhere, but here goes.
I get a similar error when connecting to a Postgres DB from the MS SQL Management client. Trying to fix the source data is almost impossible in my case.
My Scenario:
Trying to connect to Postgres using MS SQL Linked Objects via an ODBC System DSN, and seeing errors such as "ERROR: character 0xc280 of encoding "UTF8" has no equivalent in "WIN1252";
Select statements on some tables work and others throw this error.
Fix: Use an ODBC driver that supports Unicode. I am using an ODBC driver from PostgreSQL Global Development Group. Go to Configure DSN/Manage DSN and select the Unicode driver.
Good luck.

By default Greenplum uses UTF8 for character encoding. You can check this by logging in to the Greenplum server and launching psql, the console client for Greenplum.
In this console application you can issue the command \l to list all of the databases configured in Greenplum; this should also show the character set of each database.
I think your problem is that R doesn't support UTF8 for chars (you use a different locale).
But you could use on-the-fly transcoding in the ODBC driver. I'm not sure about all ODBC drivers, but DataDirect drivers support an extra option in the odbc.ini file (usually located in the user's home directory): IANAAppCodePage.
You can find the appropriate code for this parameter at this link:
http://www.iana.org/assignments/character-sets
Here is an example of odbc.ini content:
[ODBC]
Driver=/opt/odbc/lib/S0gplm60.so
IANAAppCodePage=2252
AlternateServers=
ApplicationUsingThreads=1
ConnectionReset=0
ConnectionRetryCount=0
ConnectionRetryDelay=3
Database=mysdb
EnableDescribeParam=1
ExtendedColumnMetadata=0
FailoverGranularity=0
FailoverMode=0
FailoverPreconnect=0
FetchRefCursor=1
FetchTSWTZasTimestamp=0
FetchTWFSasTime=0
HostName=192.168.1.100
InitializationString=
LoadBalanceTimeout=0
LoadBalancing=0
LoginTimeout=15
LogonID=
MaxPoolSize=100
MinPoolSize=0
Password=
Pooling=0
PortNumber=5432
QueryTimeout=0
ReportCodepageConversionErrors=0
TransactionErrorBehavior=1
XMLDescribeType=-10

Related

DB2 ODBC connection doesn't work on R 4.2

I had a working connection to a DB2 server from R. Then I upgraded to R v4.2 and it no longer works.
This is my connection string:
con_DB2 = DBI::dbConnect(odbc::odbc(),
Driver = "IBM DB2 ODBC DRIVER - C_PROGRA~2_IBM_V111~1.4FP_CLIDRI~1",
Database='DB2Q',
Hostname='usddcs',
Port=3700,
PROTOCOL='TCPIP',
UID= rstudioapi::askForPassword("Database username"),
PWD=rstudioapi::askForPassword("Database password"))
I get the following error message:
Error: nanodbc/nanodbc.cpp:1021: IM004: [Microsoft][ODBC Driver Manager] Driver's SQLAllocHandle on SQL_HANDLE_ENV failed
This seems to have been raised in this issue: https://github.com/rstudio/rstudio/issues/10509
It was NOT solved, but suggestions are that the encoding might be impacting this. Are there any arguments to the dbConnect() function that can be changed to fiddle with encoding to make it work in the new R version?
> sessionInfo()
R version 4.2.2 (2022-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)
Works for me with R4.2.2 on Win10 pro(19045) and Rstudio 2022.07.02-576 and
IBM clidriver version 11.5.6.0 (or higher). It may also work with older driver versions.
The key detail is that you need to have the system environment variable DB2CODEPAGE=1208 set before starting Rstudio. This tells the clidriver to use utf-8 as the application code page.
If you only want the change to impact Rstudio and R (but not separate command-line based programs, for perl, python, php, clp etc ) then set that variable in the Rstudio script via Sys.setenv(DB2CODEPAGE=1208) or equivalent configuration/startup file.
Note that if you are not using clidriver, and instead you are using a larger footprint Db2 client (for example, the fat client, or the runtime client) then you may also use the db2set DB2CODEPAGE=1208 method of setting the variable, and the consequence should be the same.
Then the database connection to Db2-LUW succeeds (the target database is also utf-8 encoded).
You can also ignore the cli driver and instead try accessing the database via rJDBC and a jdbc driver (there is no shared code between the jdbc driver and the cli driver).
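The Sys.setenv() variant mentioned above can be sketched as follows. It assumes the variable is set at the top of the script, before the first connection is opened in the session; the connection call itself is left as a comment because it needs a live Db2 server.

```r
# Assumption: this runs before any Db2 connection is opened in this R session.
# 1208 is the UTF-8 application code page named in the answer above.
Sys.setenv(DB2CODEPAGE = "1208")

# Confirm that the variable is visible to the current process.
db2_codepage <- Sys.getenv("DB2CODEPAGE")
print(db2_codepage)

# Then connect as in the question, for example:
# con_DB2 <- DBI::dbConnect(odbc::odbc(), Driver = "...", Database = "DB2Q", ...)
```

Setting the variable from within R only affects the R process and anything it starts, which matches the "only RStudio and R" scope described above.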

Connect Rstudio to oracle database with WE8MSWIN1252 encoding

I have issues with wrong encoding when connecting to an Oracle database from RStudio. I have tried both the RODBC and ODBC packages.
When I query the database, either with sqlQuery() or through dplyr, Norwegian letters (Æ, Ø and Å) are not displayed correctly.
I have tried setting the default text encoding in RStudio to UTF-8, and saving the R scripts with UTF-8 encoding, but the problem persists.
When querying the database with SELECT * FROM NLS_DATABASE_PARAMETERS I find that the database has NLS_CHARACTERSET WE8MSWIN1252, so I have tried to specify this when connecting.
con <- dbConnect(odbc::odbc(), "RStudio", uid = "user",
pwd = "password", encoding = "WE8MSWIN1252")
But this only gives me this error message:
Error in new_result(connection@ptr, statement) :
Can't convert from WE8MSWIN1252 to UTF-8
Any ideas on what I'm doing wrong?
The encoding argument in dbConnect should be set to the encoding used by the system on which you run R, rather than the encoding used in the database (in accordance with the NLS_LANG system environment variable). Therefore, if you run R/RStudio on a Windows machine, encoding = "cp1252" or encoding = "latin1" should do it. However, if you're on Linux or Mac, encoding = "UTF8" should hopefully work. If this does not work, make sure that NLS_LANG is set correctly.
(If you use Rstudio server, you have to set NLS_LANG in your .Renviron file, as Rstudio server does not export OS wide environment variables)
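As a rough sketch of that advice, the helper below picks an encoding name from what the R session reports about its own locale via l10n_info(). The function name guess_client_encoding is hypothetical, and whether the returned name suits your driver is an assumption you should check against your NLS_LANG setting.

```r
# Hypothetical helper: derive a value for the `encoding` argument from the
# locale of the running R session rather than from the database character set.
guess_client_encoding <- function(info = l10n_info()) {
  if (isTRUE(info[["UTF-8"]])) return("UTF-8")      # Linux, macOS, R >= 4.2 on recent Windows
  if (isTRUE(info[["Latin-1"]])) return("latin1")   # e.g. a Windows-1252 session
  cp <- info[["codepage"]]                          # only reported on Windows
  if (!is.null(cp)) return(paste0("cp", cp))
  ""                                                # fall back to the native encoding
}

print(guess_client_encoding())

# Usage, with the DSN and credentials from the question:
# con <- DBI::dbConnect(odbc::odbc(), "RStudio", uid = "user", pwd = "password",
#                       encoding = guess_client_encoding())
```

Passing the locale information as an argument makes it easy to try the helper against settings from another machine.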

Directory not found when using Drat on a network drive

I have developed a package that I want to share with my colleagues at work.
I have a network drive in which I created the local repository structure that looks like this:
MyRepo
 \__bin
     \__windows
         \__contrib
 \__src
     \__contrib
All folders are empty.
So I built my package with RStudio on Windows using the "Build/More/Build source package" menu, which created a tar.gz file.
Then I tried:
drat::insertPackage("../myPkg_0.0.0.9000.tar.gz",
repodir = "file://networkdrive/path/to/MyRepo",
action = "prune")
But this gives me an error:
Error: Directory file://networkdrive/path/to/MyRepo not found
Which is strange because file.exists("//networkdrive/path/to/MyRepo") returns TRUE.
OK, then I tried:
drat::insertPackage("../myPkg_0.0.0.9000.tar.gz",
repodir = "//networkdrive/path/to/MyRepo",
action = "prune")
Without the file: in the repository path and I get another error:
tar (child): "//networkdrive/path/to/MyRepo/src/contrib/myPkg_0.0.0.9000.tar.gz: Cannot open: No such file or directory
tar (child): Error is not recoverable: exiting now
/usr/bin/tar: Child returned status 2
/usr/bin/tar: myPkg/DESCRIPTION: Not found in archive
/usr/bin/tar: Exiting with failure status due to previous errors
reading DESCRIPTION for package ‘myPkg’ failed with message:
cannot open the connection
But when I go in the "//networkdrive/path/to/MyRepo/src/contrib" folder, I can definitely see the myPkg_0.0.0.9000.tar.gz file that has been copied despite the error message.
Can anyone help?
> sessionInfo()
R version 3.3.3 (2017-03-06)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1
locale:
[1] LC_COLLATE=French_France.1252 LC_CTYPE=French_France.1252 LC_MONETARY=French_France.1252 LC_NUMERIC=C LC_TIME=French_France.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] drat_0.1.2 tools_3.3.3 git2r_0.18.0
I know this is old, but my colleague just came across the same problem and found this post. I believe the issue may be the lack of a trailing slash in your directory name. I have been able to recreate the error with a mapped network drive. I can resolve the issue by using "H:/MyRepo/" instead of "H:/MyRepo".
I haven't tried it with the "file://" format, but I wanted to include my answer in case someone else comes across this question.
OK, so after some research, here are my conclusions.
It cannot be done
It's not Drat's fault
The reason why it does not work is that the tools::write_PACKAGES function does not work on network drives. Period.
I manually copied my package on the network drive, then ran setwd() to its location and executed write_PACKAGES(".", type="source") and I got the same error.
So to make this work, I just left my package.tar.gz file on a local drive, ran the tools::write_PACKAGES command locally and then moved the files to the network drive.
Adding the network drive to my repository list using options(repos = c(MyRepo = "file://networkdrive/path/to/MyRepo/")) works: RStudio and available.packages find my package.
It's not completely satisfactory, but I think it's the only way today.
I was having this problem as well and finally got to the bottom of it today.
For me, the problem was not isolated to just network locations but also occurred on C: drive. The root cause was the version of tar.exe being used to unpack the existing packages in the package directory. Calls to utils::untar are made in the tools::write_PACKAGES function.
The documentation for utils::untar explains that on Windows, an external tar.exe is tried first. Sure enough, I had a version installed with Git which, when used with default arguments, fails when a colon is in the file name. I was able to stop utils::untar from using that tar.exe by setting the environment variable TAR to "internal", which makes it fall back to R's own internal tar implementation.
drat::insertPackage now works.
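A minimal sketch of that workaround, using only base R so it can be tried before involving drat. The temporary paths and the dummy package directory are made up for illustration; according to the utils::untar documentation, the value "internal" selects R's built-in implementation.

```r
# Ask utils::untar() to use R's internal implementation instead of an
# external tar.exe found on the PATH.
Sys.setenv(TAR = "internal")

# Build a tiny dummy source tree and archive it (illustrative names only).
work <- tempfile("work")
dir.create(file.path(work, "myPkg"), recursive = TRUE)
writeLines("Package: myPkg", file.path(work, "myPkg", "DESCRIPTION"))
tarball <- file.path(tempdir(), "myPkg_0.0.0.9000.tar.gz")
old_wd <- setwd(work)
utils::tar(tarball, files = "myPkg", compression = "gzip", tar = "internal")
setwd(old_wd)

# Unpack it again; untar() picks up TAR from the environment by default.
out <- tempfile("unpacked")
dir.create(out)
utils::untar(tarball, exdir = out)
unpacked_ok <- file.exists(file.path(out, "myPkg", "DESCRIPTION"))
print(unpacked_ok)

# With TAR set, retry the call from the question:
# drat::insertPackage("../myPkg_0.0.0.9000.tar.gz",
#                     repodir = "//networkdrive/path/to/MyRepo", action = "prune")
```

Setting TAR in .Renviron instead would make the choice persistent across sessions.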

Encoding lost when reading XML in R

I am retrieving online XML data using the XML R package. My issue is that the UTF-8 encoding is lost during the call to xmlToList: for instance, 'é' is replaced by 'Ã©'. This happens during the XML parsing.
Here is a code snippet, with one example where the encoding is lost and another where it is kept (depending on the data source):
library(XML)
library(RCurl)
url = "http://www.bdm.insee.fr/series/sdmx/data/DEFAILLANCES-ENT-FR-ACT/M.AZ+BE.BRUT+CVS-CJO?lastNObservations=2"
res <- getURL(url)
xmlToList(res)
# encoding lost
url2 = "http://www.bdm.insee.fr/series/sdmx/conceptscheme/"
res2 <- getURL(url2)
xmlToList(res2)
# encoding kept
Why does the encoding behaviour differ? I tried setting .encoding = "UTF-8" in getURL, and calling enc2utf8(res), but that makes no difference.
Any help is welcome !
Thanks,
Jérémy
R version 3.2.1 (2015-06-18)
Platform: i386-w64-mingw32/i386 (32-bit)
Running under: Windows 7 (build 7601) Service Pack 1
locale:
[1] LC_COLLATE=French_France.1252 LC_CTYPE=French_France.1252
[3] LC_MONETARY=French_France.1252 LC_NUMERIC=C
[5] LC_TIME=French_France.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] RCurl_1.95-4.7 bitops_1.0-6 XML_3.98-1.3
loaded via a namespace (and not attached):
[1] tools_3.2.1
You are trying to read SDMX documents in R. I would suggest using the rsdmx package, which makes reading SDMX documents easier. The package is available on CRAN; you can also get the latest version on GitHub.
rsdmx allows you to read SDMX documents by file or url, e.g.
require(rsdmx)
sdmx = readSDMX("http://www.bdm.insee.fr/series/sdmx/data/DEFAILLANCES-ENT-FR-ACT/M.AZ+BE.BRUT+CVS-CJO?lastNObservations=2")
as.data.frame(sdmx)
Another approach is to use the web-service interface to embedded data providers, and INSEE is one of them. Try:
sdmx <- readSDMX(providerId = "INSEE", resource = "data",
flowRef = "DEFAILLANCES-ENT-FR-ACT",
key = "M.AZ+BE.BRUT+CVS-CJO", key.mode = "SDMX",
start = 2010, end = 2015)
as.data.frame(sdmx)
AFAIK the package also has issues with character encoding, but I'm currently investigating a solution to make available soon in the package. Calling getURL(file, .encoding="UTF-8") retrieves the data properly, but the encoding is lost when calling the xml functions.
Note: I also see you use a parameter lastNObservations. For the moment the web-service interface does not support extra parameters, but it may be made available quite easily if you need it.
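The kind of corruption described in the question is typical of UTF-8 bytes being decoded as Windows-1252, the native code page of the French_France.1252 locale shown in the session info. The base R sketch below reproduces and reverses it. It illustrates the mechanism only and is not what the XML package does internally; it also assumes the platform's iconv knows the name "CP1252".

```r
# 'é' (U+00E9) is the two bytes c3 a9 in UTF-8.
utf8_bytes <- charToRaw("\u00e9")

# Decoding those two bytes as Windows-1252 yields two characters, 'Ã' and '©'.
mojibake <- iconv(list(utf8_bytes), from = "CP1252", to = "UTF-8")
print(mojibake)

# Reversing the wrong step: map the characters back to single bytes,
# then declare the result to be UTF-8.
repaired <- rawToChar(iconv(mojibake, from = "UTF-8", to = "CP1252", toRaw = TRUE)[[1]])
Encoding(repaired) <- "UTF-8"
print(repaired)
```

The same round trip can be applied to character columns after parsing as a stopgap, although declaring the correct encoding at parse time is the cleaner fix.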

Reading an Access database (mdb) in 64 bit in R

I have a database and I need to read that in R. I found some packages such as Hmisc and RODBC which have the functions to do that. I am using windows and was not able to use Hmisc because you need to have mdb-tools package and I found no tutorial or way to install mdb-tools on windows.
Now, I was trying to start with RODBC. I found this question "How to connect R with Access database in 64-bit Window?" which shows how to have a connection in windows. I tried to use the command similar to what was accepted answer in that question.
odbcDriverConnect("Driver={Microsoft Access Driver (*.mdb, *.accdb)}; DBQ=E:/Projects\Newdata/2013 Database/Data/pgdabc_SW.mdb")
It gives the following error :
1: In odbcDriverConnect("Driver={Microsoft Access Driver (*.mdb, *.accdb)}, DBQ=E:/Projects\Newdata/2013 Database/Data/pgdabc_SW.mdb") :
[RODBC] ERROR: state 01S00, code 0, message [Microsoft][ODBC Driver Manager] Invalid connection string attribute
2: In odbcDriverConnect("Driver={Microsoft Access Driver (*.mdb, *.accdb)}, DBQ=E:/Projects\Newdata/2013 Database/Data/pgdabc_SW.mdb") :
ODBC connection failed
I am not sure how to check and start diagnosing what's going on here. I went to Administrative Tools and checked the options under "Data Sources (ODBC)". I changed the target to sysWOW.
Then I created a new data source as follows:
I am not sure if I need to select database or not. I found Brian Ripley's http://cran.r-project.org/web/packages/RODBC/vignettes/RODBC.pdf RODBC tutorial but still I am not able to make it work.
This works fine for me & might work for you, too:
require(RODBC)
conn <- odbcConnectAccess2007(path.expand("~/Database.accdb"))
subset(sqlTables(conn), TABLE_TYPE == "TABLE")
df <- sqlFetch(conn, "Table1")
close(conn)
My sessionInfo():
# R version 3.1.1 (2014-07-10)
# Platform: x86_64-w64-mingw32/x64 (64-bit)
#
# other attached packages:
# [1] RODBC_1.3-10
#
# loaded via a namespace (and not attached):
# [1] tools_3.1.1
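If you would rather keep odbcDriverConnect() as in the question, two details of that call are worth checking: attributes in an ODBC connection string are separated by semicolons (the error output shows a comma before DBQ), and a lone backslash such as \N is not a valid escape in an R string. The helper below is a hypothetical sketch that builds the string with forward slashes; whether this resolves the error on your machine is an assumption.

```r
# Hypothetical helper: assemble an Access connection string with
# semicolon-separated attributes and forward slashes in the path.
build_access_conn_string <- function(db_path,
                                     driver = "Microsoft Access Driver (*.mdb, *.accdb)") {
  clean_path <- gsub("\\", "/", db_path, fixed = TRUE)  # turn each backslash into a slash
  paste0("Driver={", driver, "};DBQ=", clean_path)
}

conn_string <- build_access_conn_string("E:/Projects/Newdata/2013 Database/Data/pgdabc_SW.mdb")
print(conn_string)

# Usage (needs the Access ODBC driver matching the bitness of R):
# channel <- RODBC::odbcDriverConnect(conn_string)
```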
I have had issues with this (trying to query 32-bit Access from 64-bit R) for a long time. I think it has been fixed in Windows 10.
I made a kludge by modifying something I found in this post:
How to connect R with Access database in 64-bit Window?
I made a function that saves a script (which in turn connects to the database and saves the result of the query), run it using R32, and load the data into the R64 work environment.
I prepared it for Access 2007, but something analogous could be done for Access 2003 (just using odbcConnectAccess instead of odbcConnectAccess2007) or other 32-bit databases.
MysqlQueryAccess2007<-function(filename,query){
tempdir=gsub('\\\\','/',tempdir())
txt<-paste("if (!'RODBC' %in% installed.packages()) install.packages('RODBC')
require(RODBC)
channel<-odbcConnectAccess2007('",filename,"')
data<-sqlQuery(channel,\"",query,"\")
save(data,file=paste('",tempdir,"','tempRODBCquery.Rdata',sep='/'))
close(channel)",sep="")
writeLines(txt,con=paste(tempdir,'RODBCscripttemp.r',sep='/')->tempscript)
system(paste0(Sys.getenv("R_HOME"), "/bin/i386/Rscript.exe ",tempscript))
tt<-get(load(paste(tempdir,'tempRODBCquery.Rdata',sep='/')))
return(tt)
}
Then you only have to do the queries this way:
dat<-MysqlQueryAccess2007("samplefile.accdb","SELECT TOP 5 * FROM TableI")
Have been trying to figure it out for a while myself.
The solution is given in the accepted answer here:
Reading data from 32-bit Access db using 64-bit R, credits to @erg,
as well as here:
How to connect R with Access database in 64-bit Window?, credits to @JATT.
The bottom line:
Install 64-bit Microsoft Access drivers https://www.microsoft.com/en-us/download/details.aspx?id=54920
Setup appropriate System DSN in ODBC Data Sources (64-bit)
In R 64-bit read .mdb file by using odbc package: dbConnect(odbc(), 'your_64bit_dsn').
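Before following the steps above, it can help to confirm which architecture the running R session uses, since the driver and DSN have to match it. A small sketch; the DSN name in the commented call is the placeholder from the step above.

```r
# Pointer size in bytes times 8 gives the bitness of the running R process.
r_bits <- 8 * .Machine$sizeof.pointer
message("This R session is ", r_bits, "-bit")

# In a 64-bit session, with the 64-bit Access driver and a System DSN set up:
# con <- DBI::dbConnect(odbc::odbc(), "your_64bit_dsn")
# DBI::dbListTables(con)
```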

Resources