SAP HANA: How to apply the 'Client Code Page' option when connecting via ODBC on Linux

When I run a SQL statement containing UTF-8 characters, the database does not seem to recognize the character encoding.
When accessing a SAP HANA database via ODBC, I want to pass the code page along as a connection option, but I cannot find an example of how to use that option.
The connection string I currently build in C++ (without the code page option) is as follows.
connectString =
"driver={HDBODBC};UID=" + userId +
";PWD=" + password +
";serverNode=" + serverNode +
";DATABASENAME=" + databaseName +
";CHAR_AS_UTF8=true";
What do I add here to use the code page option?
I think it is something like the following, but I am not sure.
connectString =
"driver={HDBODBC};UID=" + userId +
";PWD=" + password +
";serverNode=" + serverNode +
";DATABASENAME=" + databaseName +
";CHAR_AS_UTF8=true" +
";ClientCodePage=65001";
Additional question:
If Korean text is included in the query as shown below, the column name is returned correctly, but the value is not.
Setting CHAR_AS_UTF8 to false lets me include Korean in the SQL statement, but then I cannot retrieve Korean data.
SQL : SELECT '한글' FROM DUMMY;
JDBC = '한글'
한글
ODBC = '한글'
��
Here is the problem: if Korean is used in a condition the query runs normally, but I cannot get Korean data back because CHAR_AS_UTF8 is false.
What can I do about this?
SQL : SELECT * FROM TEST WHERE col1 = '한글';

The SAP HANA ODBC driver does not provide automatic character set conversion for non-Unicode character sets (see https://help.sap.com/docs/SAP_HANA_PLATFORM/0eec0d68141541d1b07893a39944924e/7cab593774474f2f8db335710b2f5c50.html - there is no option for that).
Instead, the application has to know how to interpret the codes stored in the database, if it indeed uses a non-Unicode encoding.
From HANA's point of view, all strings are Unicode strings, even if declared as VARCHAR, CHAR, or CLOB (instead of NVARCHAR, NCHAR, or NCLOB).
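Since the driver itself performs no client-side code-page conversion, one way forward is to keep CHAR_AS_UTF8=true and make sure the application sends and receives UTF-8 bytes through the narrow ODBC API. The following is only a minimal sketch of that idea (placeholder connection values, not verified against a live HANA system):

#include <sql.h>
#include <sqlext.h>
#include <cstdio>

int main() {
    SQLHENV env;
    SQLHDBC dbc;
    SQLHSTMT stmt;

    SQLAllocHandle(SQL_HANDLE_ENV, SQL_NULL_HANDLE, &env);
    SQLSetEnvAttr(env, SQL_ATTR_ODBC_VERSION, (SQLPOINTER)SQL_OV_ODBC3, 0);
    SQLAllocHandle(SQL_HANDLE_DBC, env, &dbc);

    // UID/PWD/serverNode/DATABASENAME values are placeholders
    char connectString[] =
        "driver={HDBODBC};UID=user;PWD=secret;serverNode=host:30015;"
        "DATABASENAME=DB;CHAR_AS_UTF8=true";
    SQLDriverConnect(dbc, NULL, (SQLCHAR*)connectString, SQL_NTS,
                     NULL, 0, NULL, SQL_DRIVER_NOPROMPT);

    SQLAllocHandle(SQL_HANDLE_STMT, dbc, &stmt);
    // The source file is saved as UTF-8, so this literal is already UTF-8 bytes
    SQLExecDirect(stmt, (SQLCHAR*)"SELECT '한글' FROM DUMMY", SQL_NTS);

    char value[64];
    SQLLEN ind = 0;
    if (SQLFetch(stmt) == SQL_SUCCESS) {
        // With CHAR_AS_UTF8=true this buffer receives UTF-8, not the locale code page
        SQLGetData(stmt, 1, SQL_C_CHAR, value, sizeof(value), &ind);
        std::printf("%s\n", value);
    }

    SQLFreeHandle(SQL_HANDLE_STMT, stmt);
    SQLDisconnect(dbc);
    SQLFreeHandle(SQL_HANDLE_DBC, dbc);
    SQLFreeHandle(SQL_HANDLE_ENV, env);
    return 0;
}

The key point is that the conversion responsibility sits with the application: the statement text and the fetch buffers must genuinely contain UTF-8.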

Related

Teradata query returns bad characters in string column but exporting to CSV from assistant console works

I am using the DBI package in R to connect to Teradata this way:
library(teradatasql)
query <- "
SELECT sku, description
FROM sku_table
WHERE sku = '12345'
"
dbconn <- DBI::dbConnect(
  teradatasql::TeradataDriver(),
  host = teradataHostName, database = teradataDBName,
  user = teradataUserName, password = teradataPassword
)
dbFetch(dbSendQuery(dbconn, query), -1)
It returns a result as follows:
SKU DESCRIPTION
12345 18V MAXâ×¢ Collated Drywall Screwgun
Notice the bad characters â×¢ above. This is supposed to be superscript TM for trademarked.
When I use SQL Assistant to run the query and export the results manually to a CSV file, it works fine, in that the DESCRIPTION column has the correct encoding.
Any idea what is going on and how I can fix this problem? Obviously, I don't want a manual step of exporting to CSV and re-reading the results back into an R data frame in memory.
The Teradata SQL Driver for R (teradatasql package) only supports the UTF8 session character set, and does not support using the ASCII session character set with a client-side character set for encoding and decoding.
If you have stored non-LATIN characters in a CHARACTER SET LATIN column in the database, and are using a client-side character set to encode and decode those characters for the "good" case, that will not work with the teradatasql package.
On the other hand, if you used the UTF8 or UTF16 session character set to store Unicode characters into a CHARACTER SET UNICODE column in the database, then you will be able to retrieve those characters successfully using the teradatasql package.
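As a small illustration of the diagnosis (assuming access to the DBC dictionary views; the table name follows the question), you can check whether the column is actually CHARACTER SET UNICODE, since only data stored that way round-trips through the teradatasql package's UTF8 session:

library(DBI)
library(teradatasql)

dbconn <- DBI::dbConnect(
  teradatasql::TeradataDriver(),
  host = teradataHostName, database = teradataDBName,
  user = teradataUserName, password = teradataPassword
)

# CharType 1 = LATIN, 2 = UNICODE in the Teradata data dictionary
DBI::dbGetQuery(dbconn, "
  SELECT ColumnName, CharType
  FROM DBC.ColumnsV
  WHERE TableName = 'sku_table'
")

If description turns out to be CHARACTER SET LATIN and its bytes only decode correctly under a client-side character set, the data has to be re-stored as UNICODE before the teradatasql package can read it correctly.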

Can't find a way to gather data and upload it again to a different SQL Server without breaking the encoding (R/dbplyr/DBI)

The basic setup is that I connect to database A, get some data back into R, and write it to another connection, database B.
The database collation is SQL_Latin1_General_CP1_CI_AS and I'm using encoding = "windows-1252" on connections A and B.
The display in RStudio is fine; special characters show as they should.
When I try to write the data, I get a "Cannot insert the value NULL into column" error.
I narrowed it down to at least one offending field: a cell with a PHI symbol, which causes the error.
How do I make it so the PHI symbol and presumably other special characters are kept the same from source to destination?
conA <- dbConnect(odbc(),
  Driver = "ODBC Driver 17 for SQL Server",
  Server = "DB",
  Database = "serverA",
  Trusted_connection = "yes",
  encoding = "1252")

dbWriteTable(conB, SQL("schema.table"), failing_row, append = TRUE)
# This causes the "cannot insert null value" error
I suggest working around this problem without dbplyr. As the overwhelming majority of encoding questions have nothing to do with dbplyr (the encoding tag has 23k questions, while the dbplyr tag has fewer than 400), this may be a non-trivial problem to resolve even without dbplyr.
Here are two work-arounds to consider:
Use a text file as an intermediate step
R will have no problem writing an in-memory table out to a text file/CSV, and SQL Server has standard ways of reading in a text file/CSV. This gives you the added advantage of validating the contents of the text file before loading it into SQL.
Documentation for SQL Server BULK INSERT can be found here. This answer gives instructions for using UTF-8 encoding: CODEPAGE = '65001'. And this documentation gives instructions for Unicode: DATAFILETYPE = 'widechar'.
If you want to take this approach entirely within R, it will likely look something like:
write.csv(failing_row, "output_file.csv")

# each column listed along with a suitable data type
query_for_creating_table = "CREATE TABLE schema_name.table_name (
  col1 INT,
  col2 NCHAR(10)
)"

query_for_bulk_insert = "BULK INSERT schema_name.table_name
FROM 'output_file.csv'
WITH
(
  DATAFILETYPE = 'widechar',
  FIRSTROW = 2,
  ROWTERMINATOR = '\n'
)"

DBI::dbExecute(con, query_for_creating_table)
DBI::dbExecute(con, query_for_bulk_insert)
Load all the non-error rows and append the final row after
I have had some success in the past using the INSERT INTO syntax, so I would recommend loading the failing row using this approach.
Something like the following:
failing_row = local_df %>%
  filter(condition_to_get_just_the_failing_row)

non_failing_rows = local_df %>%
  filter(!condition_to_get_just_the_failing_row)

# write non-failing rows
dbWriteTable(con, SQL("schema.table"), non_failing_rows, append = TRUE)

# build an INSERT statement for the failing row, quoting non-numeric values
insert_query = "INSERT INTO schema.table VALUES ("
for (col in colnames(failing_row)) {
  value = failing_row[[col]]
  if (is.numeric(value)) {
    insert_query = paste0(insert_query, value, ", ")
  } else {
    insert_query = paste0(insert_query, "'", value, "', ")
  }
}
# drop the trailing ", " before closing the statement
insert_query = paste0(substr(insert_query, 1, nchar(insert_query) - 2), ");")

# insert failing row
dbExecute(con, insert_query)
other resources
If you have not seen them already, here are several related Q&A that might assist: Arabic characters, reading UTF-8, encoding to MySQL, and non-Latin characters as question marks. Though some of these are for reading data into R.

Special Characters are Converted to ? When Inserting into Oracle Database Using R

I'm making a connection to an Oracle database using the ROracle and DBI packages. When I try to execute insert statements that contain special characters, the special characters get converted to non-special characters. (I'm sure there are more correct terms for "special" and "non-special" that I'm not aware of.)
First I make the following connection:
connection <- dbConnect(
dbDriver("Oracle"),
username = "xxxxx",
password = "xxxxx",
dbname = "xxxx"
)
Then I execute the following insert statement on a table I have already created. Column A has a type of NVARCHAR2.
dbSendQuery(connection, "insert into TEST_TABLE (A) values('£')")
This is what gets returned:
Statement: insert into TEST_TABLE (A) values('#')
Rows affected: 1
Row count: 0
Select statement: FALSE
Statement completed: TRUE
OCI prefetch: FALSE
Bulk read: 1000
Bulk write: 1000
As you can see, the "£" symbol gets replaced by a "#". I can execute the insert statement directly in PL/SQL and there's no issue, so it seems to be an issue with R. Any help is appreciated.
This was resolved by running Sys.setenv(NLS_LANG = "AMERICAN_AMERICA.AL32UTF8") before creating the connection.
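For reference, a minimal sketch of that fix with the same placeholder credentials as above; the environment variable has to be set before dbConnect is called:

library(DBI)
library(ROracle)

# NLS_LANG must be set before the connection is created
Sys.setenv(NLS_LANG = "AMERICAN_AMERICA.AL32UTF8")

connection <- dbConnect(
  dbDriver("Oracle"),
  username = "xxxxx",
  password = "xxxxx",
  dbname = "xxxx"
)

# The pound sign now survives the round trip into the NVARCHAR2 column
dbSendQuery(connection, "insert into TEST_TABLE (A) values('£')")
dbCommit(connection)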

I am using OLEDB to read an Excel file into a DataTable. Some values are missing (empty). Why is OLEDB skipping string values? [duplicate]

I'm creating a utility to import data from Excel into an Oracle database, and I have a fixed template for the Excel file.
When I try to import the data with the Jet provider and an ADO.NET OleDb connection, I run into the following problem: some columns are not imported because they contain mixed data types [string and number].
I searched for this problem online and found that the cause is how Excel guesses column data types.
The load code:
// {0} in Data Source is replaced with the Excel file path (e.g. via string.Format)
connection = new OleDbConnection(@"Provider=Microsoft.Jet.OLEDB.4.0;Data Source={0};Extended Properties=Excel 8.0;");
string columns = "P_ID, FULL_NAME_AR, job_no, GENDER, BIRTH_DATE, RELIGION, MARITAL_STATUS, NAT_ID, JOB_Name, FIRST_HIRE_DATE, HIRE_DATE, CONTRACT_TYPE, GRADE_CODE, QUALIFICATION";
string sheetName = "[Emps$]";
OleDbCommand command = new OleDbCommand(string.Format("select {0} from {1} where p_id is not null", columns, sheetName), connection);
connection.Open();
OleDbDataReader dr = command.ExecuteReader();
DataTable table = new DataTable();
table.Load(dr);
What should I do to tell Excel to stop guessing and give me the data as text?
If there isn't a way, can you help me with any workarounds?
Thanks in advance
I found a solution by adding IMEX=1 to the connection string; there is a special format for it, which is described in the following link.
The IMEX parameter is for columns that use mixed numeric and alpha values. The Excel driver will typically scan the first several rows in order to determine what data type to use for each column. If a column is determined to be numeric based upon a scan of the first several rows, then any rows with alpha characters in this column will be returned as Null. The IMEX parameter (1 is input mode) forces the data type of the column to text so that alphanumeric values are handled properly.
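For illustration, a hedged example of what such a connection string can look like with the Jet provider (the file path is a placeholder):

using System.Data.OleDb;

// Sketch only: IMEX=1 has to sit inside the quoted Extended Properties value
string connectionString =
    @"Provider=Microsoft.Jet.OLEDB.4.0;" +
    @"Data Source=C:\data\employees.xls;" +
    @"Extended Properties=""Excel 8.0;HDR=Yes;IMEX=1"";";

using (var connection = new OleDbConnection(connectionString))
{
    connection.Open();
    // Columns with mixed numeric and text values are now returned as text
}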
Regards
This isn't completely right! Apparently, Jet/ACE ALWAYS assumes a string type if the first 8 rows are blank, regardless of IMEX=1, and always uses a numeric type if the first 8 rows are numbers (again, regardless of IMEX=1). Even when I set the number of rows to read to 0 in the registry, I still had the same problem. This was the only sure-fire way to get it to work:
try
{
    Console.Write(wsReader.GetDouble(j).ToString());
}
catch // Lame unfixable bug
{
    Console.Write(wsReader.GetString(j));
}
Can you work from the Excel end? This example, run in Excel, will put mixed data types into a SQL Server table:
Dim cn As New ADODB.Connection
scn = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" _
    & sFullName _
    & ";Extended Properties=""Excel 8.0;HDR=Yes;IMEX=1"";"
cn.Open scn
s = "SELECT Col1, Col2, Col3 INTO [ODBC;Description=TEST;DRIVER=SQL Server;" _
    & "SERVER=Some\Instance;Trusted_Connection=Yes;" _
    & "DATABASE=test].TableZ FROM [Sheet1$]"
cn.Execute s
An alternative solution is to add or change the TypeGuessRows setting in the registry. By setting its value to 0, the complete document will be scanned.
Unfortunately, the setting may be found in various locations in the registry, depending on which libraries and versions of them you have installed.
For instance:
[HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Microsoft\Jet\4.0\Engines\Excel]
"TypeGuessRows"=dword:00000000
[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office\14.0\Access Connectivity Engine\Engines\Excel]
"TypeGuessRows"=dword:00000000
This will also prevent truncation of textual data longer than 255 characters, which happens when TypeGuessRows is larger than 0 and the first text value longer than 255 characters occurs beyond that number of rows.
See also Setting TypeGuessRows for Excel ACE Driver.

SQLite - Insert special symbols (trademark, ...) into table

How can I insert special symbols like the trademark sign into an SQLite table? I have tried to use PRAGMA encoding = "UTF-16" with no effect :(
Typically, if you surround an SQL value with single quotes, it goes in as a literal.
i.e.
'™'
Problem solved. It is necessary to open the DB file with sqlite3_open16 and then execute the command PRAGMA encoding = "UTF-16"; (I am not sure whether that step is strictly necessary). The insert will then be done with UTF-16.
To select from the DB (to get a column value back), it is necessary to use the sqlite3_column_text16 function.
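A minimal C sketch of that flow (assuming C11 for the u"..." UTF-16 string literals; error handling omitted):

#include <sqlite3.h>
#include <stdio.h>

int main(void) {
    sqlite3 *db;
    sqlite3_stmt *stmt;

    /* Opening with sqlite3_open16 makes UTF-16 the default encoding for a new database */
    sqlite3_open16(u"symbols.db", &db);

    sqlite3_prepare16_v2(db, u"CREATE TABLE IF NOT EXISTS t (val TEXT)", -1, &stmt, NULL);
    sqlite3_step(stmt);
    sqlite3_finalize(stmt);

    /* Bind the special symbol as UTF-16 text instead of embedding it in the SQL string */
    sqlite3_prepare16_v2(db, u"INSERT INTO t (val) VALUES (?)", -1, &stmt, NULL);
    sqlite3_bind_text16(stmt, 1, u"™", -1, SQLITE_TRANSIENT);
    sqlite3_step(stmt);
    sqlite3_finalize(stmt);

    /* Read the value back as UTF-16 with sqlite3_column_text16 */
    sqlite3_prepare16_v2(db, u"SELECT val FROM t", -1, &stmt, NULL);
    if (sqlite3_step(stmt) == SQLITE_ROW) {
        const void *val = sqlite3_column_text16(stmt, 0);
        (void)val; /* val is a NUL-terminated UTF-16 string in native byte order */
    }
    sqlite3_finalize(stmt);

    sqlite3_close(db);
    return 0;
}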
