I'm trying to query some data from a table in Snowflake into RStudio. I've set up ODBC properly and execute the R code below:
Data <- dbGetQuery(snowflake, "SELECT * FROM my_table")
The data comes back as a data frame, but some entries show the � character. For example, the German word Groß is imported into R as Gro�.
I checked the original my_table, and the data are stored there without � symbols.
How can I fix this issue?
Specifying the encoding when connecting fixed the issue:
snowflake <- dbConnect(odbc::odbc(), "snowflake", uid = "username", pwd = "password", encoding = "latin1")
Data <- dbGetQuery(snowflake, "SELECT * FROM my_table")
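For context (my note, not part of the original answer): the encoding argument tells the odbc package which encoding the driver uses for character data, so strings can be converted properly on their way into R. A quick sanity check, using a hypothetical column name:

Data <- dbGetQuery(snowflake, "SELECT * FROM my_table")
Encoding(Data$city)  # "city" is a hypothetical column name; should typically report "UTF-8"
unique(Data$city)    # "Groß" should now print correctly instead of "Gro�"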
I'm trying to save text into SQL Server using R and ODBC Driver 17 for SQL Server. I have this simple script:
test <- data.table(dbGetQuery(con, "select LOCATIONNAME from LOCATION where LOCATION = 'AT2331'"))
name <- test[, LOCATIONNAME]
sql <- paste0('INSERT INTO LOCATION_TEST (TEST) VALUES (\'', name, '\')')
dbGetQuery(con, sql)
The data in the original table is VÖSENDORF and it's shown properly in R, but the result in the database is VÃ–SENDORF. The column type is nvarchar and the server collation is SQL_Latin1_General_CP1_CI_AS. How can I fix this?
I also tried
sql2 <- iconv(sql, from="UTF-8", to="UTF-16LE")
But that says:
embedded nul in string
Edit:
Using 'N' prefix doesn't change the result:
sql <- paste0("INSERT INTO LOCATION_TEST (TEST) VALUES (N'", name, "')")
If I use just a hard-coded value, it works fine:
sql <- paste0("INSERT INTO LOCATION_TEST (TEST) VALUES (N'VÖSENDORF')")
This is RStudio on Windows. The output of Sys.getlocale() is:
"LC_COLLATE=English_United Kingdom.1252;LC_CTYPE=English_United Kingdom.1252;LC_MONETARY=English_United Kingdom.1252;LC_NUMERIC=C;LC_TIME=English_United Kingdom.1252"
I tested my code on Linux with the following locale, and there it works fine:
"LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=en_US.UTF-8;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C"
I also tested with the latest R version, 4.2.0 (2022-04-22 ucrt), and the same issue continues.
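One workaround worth trying (my suggestion, not from the original post): send the value as a bound parameter and let the driver do the Unicode conversion itself, instead of pasting it into the SQL string:

# Parameterized insert; the odbc driver converts the bound value itself
dbExecute(con, "INSERT INTO LOCATION_TEST (TEST) VALUES (?)", params = list(name))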
This is my data: SÃO. R reads it properly. However, when saving it to Oracle, it is stored as SÃƒO.
I have set both the DB and the OS where R is hosted to UTF-8.
Before calling sqlSave I execute Sys.setenv(NLS_LANG = "AMERICAN_AMERICA.AL32UTF8") explicitly to set the encoding once again, but no luck:
Sys.setenv(NLS_LANG = "AMERICAN_AMERICA.AL32UTF8")
lconn <- odbcConnect("MYDB", uid = "*****", pwd = "*****", believeNRows = FALSE)
aout <- sqlSave(lconn, txtdframe, tablename = datatblnm, append = TRUE, fast = FALSE,
                safer = TRUE, rownames = FALSE, nastring = NULL, varTypes = varTypes,
                verbose = FALSE)
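One option worth trying (my suggestion, not from the original post): RODBC's DBMSencoding argument declares which encoding the DBMS uses, so RODBC can convert character data when sending and receiving:

lconn <- odbcConnect("MYDB", uid = "*****", pwd = "*****",
                     believeNRows = FALSE, DBMSencoding = "UTF-8")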
So, I've been struggling with this for a while now and can't seem to google my way out of it. I'm trying to read a .sql file into R; I always do that to avoid putting 100+ lines of SQL in my R scripts. I usually do this:
library(tidyverse)
library(DBI)
con <- dbConnect(<CONNECTION ARGUMENTS>)
query <- read_file("path/to/script.sql")
df <- as_tibble(dbGetQuery(con, query))
dbDisconnect(con)
However, this time my SQL script has some Spanish characters in it. Say, something like this:
select tree_id, tree
from forest.trees
where species = 'árbol'
When I read this script into R and run the query, it just doesn't return anything, but if I copy and paste the SQL script into an R string, it works! So it seems the problem is in the line where I read the script into R.
I tried changing the string's encoding in a couple of ways:
# none of these work
query <- read_file("path/to/script.sql")
Encoding(query) <- "latin1"
query <- readLines("path/to/script.sql", encoding = "latin1")
query <- paste0(query, collapse = " ")
Unfortunately, I don't have a public database to offer to anyone reading this. I'm connecting to a PostgreSQL 11 database.
--- UPDATE ---
I'm on a Windows 10 machine with a US locale.
When I use the read_file function, the contents of query look OK and the Spanish characters print as they should, but when I pass it to dbGetQuery it just doesn't fetch anything.
I tried forcing the "latin1" encoding because I found online that this often fixes Spanish characters in R. With it the Spanish characters print out wrong, so I didn't expect it to work, and it didn't.
The character values in my database have UTF-8 encoding.
Just to be completely clear: none of my attempts to read the .sql script have worked. However, this does work:
library(tidyverse)
library(DBI)
con <- dbConnect(<CONNECTION ARGUMENTS>)
query <- "select tree_id, tree from forest.trees where species = 'árbol'"
# df actually has results
df <- as_tibble(dbGetQuery(con, query))
dbDisconnect(con)
The encoding argument of readLines only marks the strings it reads as latin1 or UTF-8; it does not re-encode the file's contents. To actually convert from the file's encoding, open a connection with the encoding set. Try this instead:
filetext <- readLines(file("path/to/script.sql", encoding = "latin1"))
See this answer for more details: R: can't read unicode text files even when specifying the encoding
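An equivalent approach with readr (a sketch, assuming the file really is latin1-encoded): read_file can re-encode at read time via its locale argument:

library(readr)
# Declare the file's encoding; readr converts the contents to UTF-8
query <- read_file("path/to/script.sql",
                   locale = locale(encoding = "latin1"))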
After some time thinking about it, I wondered why the solution proposed by MrFlick didn't work, so I checked the encoding of the file created by this chunk:
query <- "select tree_id, tree from forest.trees where species = 'árbol'"
write_lines(query, "test.sql")
When I checked what encoding test.sql had, it turned out to be ANSI, which didn't look right. So I manually changed my original script.sql's encoding to ANSI, and after that it worked totally fine.
This solution, however, didn't apply when I cloned my repo in an Ubuntu environment; on Ubuntu there was no problem with the original UTF-8 encoding.
Hope this helps anyone dealing with this on Windows.
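An alternative that avoids re-saving the file (a sketch based on the same finding, untested on the original setup): keep script.sql as UTF-8 and convert the query string to the Windows ANSI code page just before sending it:

query <- read_file("path/to/script.sql")  # UTF-8 contents
# CP1252 is the ANSI code page on a US/Western-European Windows locale
query_native <- iconv(query, from = "UTF-8", to = "CP1252")
df <- as_tibble(dbGetQuery(con, query_native))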
I'm using the R package RODBC to import the results of a SQL stored procedure into a data frame, then exporting that data frame to an XML file with write.table (the result from SQL is XML output).
However, R is truncating the string (the XML result imported from SQL).
I've tried to find a function or an option to expand the size/length of an R data frame cell, but didn't find any.
I also tried calling sqlQuery directly inside the write.table statement to avoid the intermediate data frame, but the imported data is still truncated.
Does anyone have suggestions or an answer that could help?
Here is my code:
# Library & starting the SQL connection
library(RODBC)
my_conn <- odbcDriverConnect('Driver={SQL Server};server=sql2014;database=my_conn;trusted_connection=TRUE')

# Create a folder and a path to save my output
x <- "C:/Users/ATM1/Documents/R/CSV/test"
dir.create(x, showWarnings = FALSE)
setwd(x)
Mpath <- getwd()

# Importing the data from the stored procedure output
xmlcode1 <- sqlQuery(my_conn, "exec dbo.p_webDefCreate 'SA25'", stringsAsFactors = FALSE, as.is = TRUE)

# Writing to a file
write.table(xmlcode1, file = paste0(Mpath, "/SA5b_def.xml"), quote = FALSE, col.names = FALSE, row.names = FALSE)
What I get is plain text that is not the full output.
The code below is how I check the current length of my string:
stri_length(xmlcode1)
[1] 65534
I had a similar issue with our project: the data coming from the db was getting truncated to 257 characters, and I could not really get around it. Eventually I converted the column definition on the db table from varchar(max) to varchar(8000) and got all the characters back. I did not mind changing the table definition.
In your case, you could perhaps convert the column type in your proc output to varchar with a defined size, if possible.
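For example (a sketch with hypothetical column and table names, since the original proc isn't shown), the cast can be done inside the query:

xmlcode1 <- sqlQuery(my_conn,
                     "SELECT CAST(xml_output AS varchar(8000)) AS xml_output FROM dbo.my_output_table",
                     stringsAsFactors = FALSE, as.is = TRUE)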
I am using PostgreSQL and experienced the same truncation issue when importing into R with the RODBC package. I used Michael Kassa's solution with a slight change: I set the data type to text, which can store a string with unlimited length per postgresqltutorial. This worked for me.
The TEXT data type can store a string with unlimited length.
varchar() also worked for me
If you do not specify the n integer for the VARCHAR data type, it behaves like the TEXT datatype. The performance of the VARCHAR (without the size n) and TEXT are the same.
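As a sketch (with hypothetical table and column names), the change on the PostgreSQL side would look like:

sqlQuery(my_conn, "ALTER TABLE my_table ALTER COLUMN xml_output TYPE text")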
I'd like to import a bunch of large text files to a SQLite db using RSQLite. If my data were comma delimited, I'd do this:
library(DBI)
library(RSQLite)
db <- dbConnect(SQLite(), dbname = 'my_db.sqlite')
dbWriteTable(conn=db, name='my_table', value='my_file.csv')
But what about '\t'-delimited data? I know I could read the data into an R data frame and then create the table from that, but I'd like to go straight into SQLite, since there are lots of large files. When I try the above with my data, I get a single character field.
Is there a sep='\t' sort of option I can use? I tried just adding sep='\t', like this:
dbWriteTable(conn=db, name='my_table', value='my_file.csv', sep='\t')
EDIT: In fact that works great; a flaw in the file I was working with was producing an error. It's also good to add header=TRUE if you have headers, as I do.
Try the following:
dbWriteTable(conn=db, name='my_table', value='my_file.csv', sep='\t')
Per the following toward the top of page 21 of http://cran.r-project.org/web/packages/RMySQL/RMySQL.pdf
When dbWriteTable is used to import data from a file, you may optionally specify header=,
row.names=, col.names=, sep=, eol=, field.types=, skip=, and quote=.
[snip]
sep= specifies the field separator, and its default is ','.
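Putting the answer together with the header= note from the question's edit, a complete sketch (using the question's placeholder names) looks like this:

library(DBI)
library(RSQLite)

db <- dbConnect(SQLite(), dbname = 'my_db.sqlite')
# Import the tab-delimited file directly, treating the first row as headers
dbWriteTable(conn = db, name = 'my_table', value = 'my_file.csv',
             sep = '\t', header = TRUE)
dbDisconnect(db)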