SOH (\001) FastLoad Delimiter - Teradata

I'm trying to set the delimiter in a FastLoad to \001, with no success.
Anybody know if this is possible?
SET record vartext "\001";

It is not possible. From the Teradata FastLoad Reference, Chapter 3: Teradata FastLoad Commands, section 'SET RECORD':

The delimiter can be a single or multi-character sequence (or string)
...
No control character other than a tab character can be used in a delimiter.
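Since the tab character is the only control character FastLoad will accept in a delimiter, one possible workaround (my own sketch, not from the manual; filenames are hypothetical, and it assumes the data itself never contains a tab) is to rewrite the \001-delimited file as tab-delimited before loading:

# Rewrite \x01-delimited input as tab-delimited so FastLoad can handle it.
# Assumes no field ever contains a tab character.
with open("input.dat", "rb") as src, open("input_tab.dat", "wb") as dst:
    for line in src:
        dst.write(line.replace(b"\x01", b"\t"))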

Related

PLSQL: Find invalid characters in a database column (UTF-8)

I have a text column in a table which I need to validate in order to recognize which records have non-UTF-8 characters.
Below is an example record where there are invalid characters.
text = 'PP632485 - Hala A - prace kuchnia Zepelin, wymiana muszli, monta􀄪 tablic i uchwytów na r􀄊czniki, wymiana zamka systemowego'
There are over 3 million records in this table, so I need to validate them all at once and get the rows where this text column has non-UTF-8 characters.
I tried below:
instr(text, chr(26)) > 0 - no records get fetched
text LIKE '%ó%' (tried this for a few invalid characters I noticed) - no records get fetched
update <table> set text = replace(text, 'ó', 'ó') - no change seen in text
Is there anything else I can do?
Appreciate your input.
This is Oracle 11.2
The characters you're seeing might be invalid for your data, but they are valid AL32UTF8 characters; otherwise they would not be displayed correctly. It's up to you to determine which character set contains the correct set of characters.
For example, to check whether a string contains only characters in the US7ASCII character set, use the CONVERT function. Any character that cannot be converted into a valid US7ASCII character will be displayed as ?.
The example below first replaces the literal question marks with the string '~~~~~', then converts, and then checks for the existence of a question mark in the converted text.
WITH t (c) AS (
  SELECT 'PP632485 - Hala A - prace kuchnia Zepelin, wymiana muszli, monta􀄪 tablic i uchwytów na r􀄊czniki, wymiana zamka systemowego' FROM DUAL UNION ALL
  SELECT 'Just a bit of normal text' FROM DUAL UNION ALL
  SELECT 'Question mark ?' FROM DUAL
),
converted_t (c) AS (
  SELECT CONVERT(REPLACE(c, '?', '~~~~~'), 'US7ASCII', 'AL32UTF8')
  FROM t
)
SELECT CASE WHEN INSTR(c, '?') > 0 THEN 'Invalid' ELSE 'Valid' END AS status, c
FROM converted_t;
STATUS   C
-------  -------------------------------------------------------------
Invalid  PP632485 - Hala A - prace kuchnia Zepelin, wymiana muszli, montao??? tablic i uchwyt??w na ro??Sczniki, wymiana zamka systemowego
Valid    Just a bit of normal text
Valid    Question mark ~~~~~
Again, this is just an example - you might need a less restrictive character set.
--UPDATE--
With your data, it's up to you to determine how you want to continue. Determine what a good target character set is. Contrary to what I said earlier, it's not mandatory to pass a source character set argument to the CONVERT function.
Things you could try:
Check which characters show up as '�' when converting from UTF8 to AL32UTF8:
select * from G2178009_2020030114_dinllk
WHERE INSTR(CONVERT(text, 'AL32UTF8', 'UTF8'), '�') > 0;
Check whether the converted text matches the original text. In this example the text is converted to UTF8 and compared against the original; rows where the conversion changes the text are the suspect ones:
select * from G2178009_2020030114_dinllk
WHERE CONVERT(text, 'UTF8') <> text;
This should be enough tools for you to diagnose your data issue.
As shown by previous comments, you can detect the issue in place, but it's difficult to automatically correct in place.
I have used https://pypi.org/project/ftfy/ to correct invalidly encoded characters in large files.
It guesses what the actual UTF-8 character should be, and there are some controls on how it does this. For you, the problem is that you have to pull the data out, fix it, and put it back in.
So assuming you can get the data out to the file system to fix it, you can locate files with bad encodings with something like this:
find . -type f | xargs -I {} bash -c "iconv -f utf-8 -t utf-16 {} &>/dev/null || echo {}"
This produces a list of files that potentially need to be processed by ftfy.
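Once you have located the bad files, usage of ftfy is short; a minimal sketch (the sample string here is hypothetical mojibake, not the asker's actual data):

# pip install ftfy
from ftfy import fix_text

broken = "montaÅ¼ tablic i uchwytÃ³w"  # hypothetical UTF-8-read-as-Latin-1 sample
print(fix_text(broken))                 # should recover "montaż tablic i uchwytów"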

FastLoad with control character delimiter

I have a file with the Control-P character (DLE, data link escape) as the delimiter, and I am using the command below to set the record type and delimiter (the ^P is entered as Ctrl-V then Ctrl-P). As of now everything is fine and the data loads to the table successfully. But when I use the hexadecimal equivalent of the Control-P character, which is 'x10', the FastLoad script fails with:
Delimited Data Parsing error: Column length overflow(s) in row 1 for column 1
.SET RECORD vartext '^P';
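One sanity check worth doing before blaming the hex syntax (my own suggestion, not from the thread; the filename is hypothetical) is to dump the first record's bytes and confirm the on-disk delimiter really is 0x10:

# Print the first line's bytes to verify the delimiter (e.g. 0x10 = DLE).
with open("input.dat", "rb") as f:
    first_line = f.readline()
print([hex(b) for b in first_line[:40]])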

Want a command line to get the data as-is, as we get when we export data

I have data with some thousands of records, each record having multiple columns. One of the columns contains data with a comma (",") in it.
When I spool that data into a CSV file and split the text into columns using comma as the delimiter, the data comes out wrong because the data itself has a comma in it.
I am looking for a solution where I can export the data from the command line and have it look the same as when I export it via TOAD.
Any help is much appreciated.
Note: I have been looking for a solution for many days but only now got a chance to post it here.
When exporting the dataset in Toad, select a delimiter other than a comma, or drop down the "string quoting" dropdown box and select "double quote strings including NULLS".
If you are spooling output instead, you'll need to add the double quotes in your SELECT statement like this, in order to surround the columns containing commas with double quotes:
select '"' || column || '"' as column from table;
This format is pretty standard. Alternatively, use pipes as delimiters and save the space of wrapping strings in double quotes; it depends on what the consumer of the data requires, really.
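For reference, this quoting rule is exactly what standard CSV writers apply automatically; a minimal Python sketch (rows and filename are hypothetical):

import csv

# QUOTE_MINIMAL (the default) wraps only the fields that contain the delimiter.
rows = [("PP632485", "prace kuchnia, wymiana muszli")]
with open("export.csv", "w", newline="") as f:
    csv.writer(f).writerows(rows)
# export.csv now contains: PP632485,"prace kuchnia, wymiana muszli"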

SQLite table and column name requirements

I'm wondering what constraints SQLite puts on table and column names when creating a table. The documentation for creating a table says that a table name can't begin with "sqlite_" but what other restrictions are there? Is there a formal definition anywhere of what is valid?
SQLite seems surprisingly accepting as long as the name is quoted. For example...
sqlite> create table 'name with spaces, punctuation & $pecial characters?'(x int);
sqlite> .tables
name with spaces, punctuation & $pecial characters?
If you use brackets or quotes you can use any name and there is no restriction:
create table [--This is a_valid.table+name!?] (x int);
But unquoted table names must be alphanumeric combinations that don't start with a digit and contain no spaces.
You can use underscore and $, but you cannot use symbols like: + - ? ! * # % ^ & = / \ : " '
From the SQLite docs:
If you want to use a keyword as a name, you need to quote it. There are four ways of quoting keywords in SQLite:
'keyword' A keyword in single quotes is a string literal.
"keyword" A keyword in double-quotes is an identifier.
[keyword] A keyword enclosed in square brackets is an identifier. This is not standard SQL. This quoting mechanism is used by MS Access and SQL Server and is included in SQLite for compatibility.
`keyword` A keyword enclosed in grave accents (ASCII code 96) is an identifier. This is not standard SQL. This quoting mechanism is used by MySQL and is included in SQLite for compatibility.
So, double-quote the table name and you can use any characters. [tablename] can also be used, but it is not standard SQL.
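To see both quoting styles accepted end to end, here is a short sketch using Python's sqlite3 module (an illustration of the rules above, not from the original answers):

import sqlite3

con = sqlite3.connect(":memory:")
# Double-quoted identifier: standard SQL, almost any characters allowed.
con.execute('CREATE TABLE "name with spaces, punctuation & $pecial?" (x INT)')
# Square brackets: MS Access / SQL Server style, kept for compatibility.
con.execute("CREATE TABLE [--This is a_valid.table+name!?] (y INT)")
print([row[0] for row in con.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")])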

What options are there for the sqsh style "csv" (or any way to get tab-delimited output)

With the DB tool sqsh, I want to get the column names and the data tab delimited.
The bcp option does not include the column names.
The csv option includes the column names, but uses a comma as the separator (doh). Is there a way to change it?
Currently looking to post-process the file to change the commas to tabs (ignoring the commas within strings...).
You can \set colsep="\t" to change the separator for the standard output to tab.
Edit: \t didn’t work (in my cygwin), so I used <CTRL-V><TAB>. That works:
[228] > \set colsep=" " -- Hit CTRL-V then <TAB> here.
[229] > select 'ABC' as STRING, 12 as INT;
STRING INT
------ -----------
ABC 12
(1 row affected)
Please note that since sqsh version 2.5 it is possible to assign control characters to some variables like colsep, linesep, bcp_colsep and bcp_rowsep. So the
\set colsep="\t"
should now work properly with sqsh-2.5.
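If you do end up post-processing a comma-separated dump as the question mentions, a CSV-aware rewrite avoids breaking on commas inside quoted strings; a small Python sketch (filenames hypothetical):

import csv

# Re-delimit comma-separated output as tab-separated, leaving
# commas inside quoted fields untouched.
with open("out.csv", newline="") as src, open("out.tsv", "w", newline="") as dst:
    csv.writer(dst, delimiter="\t").writerows(csv.reader(src))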
