Let's say I need to CAST(birth_date AS DATE FORMAT 'MM/DD/YYYY')
If the birth_date field contains nulls or invalid characters, it throws an untranslatable character error.
Of course I can use regular expressions or OTRANSLATE, but all that overcomplicates the SQL.
Is there any way to suppress all these errors? CAST if you can, otherwise make it NULL?
The burden of checking whether the data fits the data type you wish to store it in must reside somewhere. You could use CASE WHEN {regular expression matching} THEN CAST(...) ELSE NULL END, which may be the cleanest way to handle the data-quality validation in your SQL.
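The "CAST if you can, otherwise NULL" semantics that CASE expression aims for can be sketched outside the database too; here is a minimal Python sketch (the format string follows the question's MM/DD/YYYY, the function name is made up for illustration):

```python
from datetime import datetime

def safe_cast_date(value, fmt="%m/%d/%Y"):
    """Parse value as a date in the given format; return None on any failure."""
    if value is None:
        return None
    try:
        return datetime.strptime(value.strip(), fmt).date()
    except (ValueError, AttributeError):
        return None

# Dirty input: a valid date, garbage characters, an empty string, and a null
rows = ["02/29/2000", "13/45/19xx", "", None]
print([safe_cast_date(v) for v in rows])
```

Only the first value survives; everything unparseable quietly becomes None, which is exactly the ELSE NULL branch of the CASE approach.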
Otherwise, pre-process your data file to replace bad data with a token you can convert to NULL in your SQL. You could do this in PowerShell, a UNIX shell script, or perhaps a third-party tool (e.g. address cleansing/formatting, etc.).
Since there is no built-in way to say CAST(<field> AS <datatype>) IGNORING ERRORS AS <alias>, you could use a TPT script instead.
In TPT APPLY you can have an INSERT statement route errors into two different Error tables.
Something like the following would get you close. You would run this after the dirty date table is loaded, to move the rows into a clean date table.
DEFINE JOB DATA_insert_Example
(
DEFINE OPERATOR data_insert_Example
TYPE UPDATE
SCHEMA *
ATTRIBUTES
(
VARCHAR UserName,
VARCHAR UserPassword,
VARCHAR LogTable,
VARCHAR TargetTable,
INTEGER BufferSize,
INTEGER ErrorLimit = 5,
INTEGER MaxSessions = 4,
INTEGER MinSessions = 1,
INTEGER TenacityHours,
INTEGER TenacitySleep,
VARCHAR AccountID,
VARCHAR AmpCheck,
VARCHAR DeleteTask,
VARCHAR ErrorTable1 = '<yourdatabase>.<yourcleantable>'||'_ET',
VARCHAR ErrorTable2 = '<yourdatabase>.<yourcleantable>'||'_UV',
VARCHAR NotifyExit,
VARCHAR NotifyExitIsDLL,
VARCHAR NotifyLevel,
VARCHAR NotifyMethod,
VARCHAR NotifyString,
VARCHAR PauseAcq,
VARCHAR PrivateLogName,
VARCHAR TdpId,
VARCHAR TraceLevel,
VARCHAR WorkingDatabase = '<yourdatabase>',
VARCHAR WorkTable = '<yourdatabase>.<yourcleantable>'||'_Work'
);
DEFINE SCHEMA data_insert_schema
(
field1 VARCHAR(20),
field2 VARCHAR(20),
field3 VARCHAR(20),
field4 VARCHAR(20)
);
DEFINE OPERATOR data_insert_export
TYPE EXPORT
SCHEMA data_insert_schema
ATTRIBUTES
(
VARCHAR UserName,
VARCHAR UserPassword,
VARCHAR SelectStmt,
VARCHAR TdpId
);
STEP UPS
(
APPLY
(
'INSERT INTO <yourdatabase>.<yourcleantable>
(
field1,
field2,
field3,
field4
)
VALUES (
:field1,
:field2,
:field3,
:field4
)';
)
TO OPERATOR
(
data_insert_Example[1]
ATTRIBUTES
(
UserName = '<yourusername>',
UserPassword = '<yourpass>',
LogTable = '<yourdatabase>.<yourcleantable>' || '_LOG',
TargetTable = '<yourdatabase>.<yourcleantable>',
TdpId = '<yourserverip/address>'
)
)
SELECT * FROM OPERATOR
(
data_insert_export[1]
ATTRIBUTES
(
UserName = <yourusername>,
UserPassword = <yourpassword>,
SelectStmt = 'SELECT field1,field2,field3,field4 FROM <yourdatabase>.<yourtable> ;',
TdpId = '<yourserverip/address>'
)
);
);
)
Obviously, though, this is overkill compared to a simple RegEx. RegEx feels overwhelming when you first start using it, but I think this is a completely reasonable use case for it: checking dates stored as string literals before trying to convert them to their proper data type.
Overall, it sounds like you have garbage for data, so I totally get the frustration. Unfortunately, there is no magic bullet for garbage data. You'll need some decent ETL between the garbage and your clean output.
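For what it's worth, the RegEx pre-check mentioned above is fairly compact. A hedged Python sketch of an MM/DD/YYYY shape check (the pattern deliberately validates only digit ranges, not calendar rules such as month lengths or leap years):

```python
import re

# Shape check for MM/DD/YYYY: month 01-12, day 01-31, 4-digit year.
# It does NOT enforce calendar rules, so e.g. 02/30/2001 still passes.
DATE_RE = re.compile(r"^(0[1-9]|1[0-2])/(0[1-9]|[12][0-9]|3[01])/\d{4}$")

def looks_like_date(s):
    """Return True when s has the MM/DD/YYYY shape, False otherwise."""
    return bool(s) and DATE_RE.match(s) is not None

for s in ["12/31/1999", "02/30/2001", "99/99/9999", "garbage", ""]:
    print(s, looks_like_date(s))
```

The same pattern, dropped into a CASE WHEN ... THEN CAST ... ELSE NULL END, is what the earlier answer was suggesting.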
Related
I have a simple TPT script (given below) to load an image into a CLOB column in an empty table.
USING CHARACTER SET UTF8
DEFINE JOB LoadingtableData
DESCRIPTION 'Loading data into table using TPT'
(
DEFINE SCHEMA TableStaging
DESCRIPTION 'SYS FILE Staging Table'
(
Col_Colb CLOB(131072) AS DEFERRED BY NAME
,Col_FNAME VARCHAR(100)
,Col_ID VARCHAR(50)
);
DEFINE OPERATOR FileReader()
DESCRIPTION 'Read file with list'
TYPE DATACONNECTOR PRODUCER
SCHEMA TableStaging
ATTRIBUTES (
VARCHAR TraceLevel = 'None'
, VARCHAR PrivateLogName = 'read_log'
, VARCHAR FileName = 'datafile.txt'
, VARCHAR OpenMode = 'Read'
, VARCHAR Format = 'Delimited'
, VARCHAR TextDelimiter = ',');
DEFINE OPERATOR SQLInserter()
DESCRIPTION 'Insert from files into table'
TYPE INSERTER
INPUT SCHEMA TableStaging
ATTRIBUTES (
VARCHAR TraceLevel = 'None'
, VARCHAR PrivateLogName = '#LOG'
, VARCHAR TdpId = '#TdpId'
, VARCHAR UserName = '#UserName'
, VARCHAR UserPassword = '#UserPassword');
STEP LoadData (
APPLY ('INSERT INTO table_A(Col_Colb,Col_FNAME,Col_ID) VALUES (:Col_Colb,:Col_FNAME,:Col_ID);')
TO OPERATOR (SQLInserter[1])
SELECT * FROM OPERATOR (FileReader());
);
);
To load data into the table I'm using two text files:
The first file has all the VARCHAR column values and the CLOB data location.
Example data in file: <Clob_File_Location>,Name,123
The second file is the CLOB column value itself.
Example data in file: Image.png
After executing the above TPT, I get the message "data loaded successfully". But when I check the table, the CLOB column contains text in place of the image.
Can someone help me figure out what I might be doing wrong?
Here I'm trying to load a CSV file into a Teradata table using the TPT utility, but it is failing with an error.
Here is my TPT script:
DEFINE JOB test_tpt
DESCRIPTION 'Load a Teradata table from a file'
(
DEFINE SCHEMA SCHEMA_EMP_NAME
(
NAME VARCHAR(50),
AGE VARCHAR(50)
);
DEFINE OPERATOR od_EMP_NAME
TYPE DDL
ATTRIBUTES
(
VARCHAR PrivateLogName = 'tpt_log',
VARCHAR LogonMech = 'LDAP',
VARCHAR TdpId = 'TeraDev',
VARCHAR UserName = 'user',
VARCHAR UserPassword = 'pwd',
VARCHAR ErrorList = '3807'
);
DEFINE OPERATOR op_EMP_NAME
TYPE DATACONNECTOR PRODUCER
SCHEMA SCHEMA_EMP_NAME
ATTRIBUTES
(
VARCHAR DirectoryPath= '/home/hadoop/retail/',
VARCHAR FileName = 'emp_age.csv',
VARCHAR Format = 'Delimited',
VARCHAR OpenMode = 'Read',
VARCHAR TextDelimiter =','
);
DEFINE OPERATOR ol_EMP_NAME
TYPE LOAD
SCHEMA *
ATTRIBUTES
(
VARCHAR LogonMech = 'LDAP',
VARCHAR TdpId = 'TeraDev',
VARCHAR UserName = 'user',
VARCHAR UserPassword = 'pwd',
VARCHAR LogTable = 'EMP_NAME_LG',
VARCHAR ErrorTable1 = 'EMP_NAME_ET',
VARCHAR ErrorTable2 = 'EMP_NAME_UV',
VARCHAR TargetTable = 'EMP_NAME'
);
STEP stSetup_Tables
(
APPLY
('DROP TABLE EMP_NAME_LG;'),
('DROP TABLE EMP_NAME_ET;'),
('DROP TABLE EMP_NAME_UV;'),
('DROP TABLE EMP_NAME;'),
('CREATE TABLE EMP_NAME(NAME VARCHAR(50), AGE VARCHAR(2));')
TO OPERATOR (od_EMP_NAME);
);
STEP stLOAD_FILE_NAME
(
APPLY
('INSERT INTO EMP_NAME
(Name,Age)
VALUES
(:Name,:Age);
')
TO OPERATOR (ol_EMP_NAME)
SELECT * FROM OPERATOR(op_EMP_NAME);
);
);
Call TPT:
tbuild -f test_tpt.sql
The above TPT script fails with the following error:
Teradata Parallel Transporter Version 15.10.01.02 64-Bit
TPT_INFRA: Syntax error at or near line 6 of Job Script File 'test_tpt.sql':
TPT_INFRA: At "NAME" missing RPAREN_ in Rule: Explicit Schema Element List
TPT_INFRA: Syntax error at or near line 8 of Job Script File 'test_tpt.sql':
TPT_INFRA: TPT03020: Rule: DEFINE SCHEMA
Compilation failed due to errors. Execution Plan was not generated.
Job script compilation failed .
Am I missing any detail here?
The messages certainly could be clearer, but the issue is that NAME is a restricted word in TPT. Renaming the column in the schema definition (e.g. EMP_NAME) avoids the conflict.
I have a table with a GUID data type field as the primary key. When I run the query in the DataSet window of ASP.NET, the result is OK, but when I use it in an ASP.NET page it returns an error page like below:
Failed to enable constraints. One or more rows contain values violating non-null,unique, or foreign-key constraints.
The query does not have any joins; it is a simple SUM query that adds up daily work amounts in a field, based on a date range passed in as parameters, grouped by the activities that exist in another field.
This is the table :
CREATE TABLE [dbo].[DailyReport] (
[ReportDate] DATETIME NULL,
[ReportId] UNIQUEIDENTIFIER NOT NULL,
[ConstructionType] NVARCHAR (50) NULL,
[Zone] NVARCHAR (50) NULL,
[BuildingName] NVARCHAR (50) NULL,
[ActivityId] INT NULL,
[TodayWork] REAL NULL,
[Decription] NTEXT NULL,
PRIMARY KEY CLUSTERED ([ReportId] ASC)
);
And this is the query :
SELECT SUM(TodayWork) AS SumWork, ActivityId, COUNT(ReportId) AS RecordCount
FROM DailyReport
WHERE (ConstructionType = N'1') AND (ReportDate >= @DateStart) AND
(ReportDate <= @DateFinish)
GROUP BY ActivityId
Normally in C# an int does not allow null values, but SQL Server does; that means null values can be allowed by SQL Server but not by C#.
So it can be solved in two ways:
1) set Allow Nulls to false on the column in the database, or
2) use a nullable int type in the C# code, e.g. int? a = null;
I have created a table with an Id column as varchar(20).
I need a stored procedure which can increment the id by 1.
I have tried this:
ALTER PROCEDURE dbo.spInsertCatQuery
(@Users_Id varchar(20),
@Cat_Id varchar(20),
@Query varchar(100),
@Query_Title varchar(50)
)
AS
BEGIN
Declare @Query_Id bigint
SELECT @Query_Id = coalesce((select max(Query_Id) + 1 from tblCatQuery), 1);
INSERT INTO tblCatQuery
VALUES(@Query_Id, @Users_Id, @Cat_Id, @Query_Title, @Query)
END
But it is not working after the 10th record.
Change the selection of Query_Id from your table to the following:
SELECT @Query_Id =
coalesce((select max(cast(Query_Id as int)) + 1 from tblCatQuery), 1);
Based on Gordon's comment, my understanding is that since the ID is a varchar, max(id) does not fetch the correct maximum value, but casting it to int does.
For example try this
create table testtab (id varchar(10));
insert into testtab values(2),(200),(53)
If you run the query below, it will return 53:
select MAX(id) from testtab
but this one will return 200
select MAX(cast(id as int)) from testtab
Tested in SQL Server 2008 R2.
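The same behavior is easy to reproduce with SQLite from Python (table and values follow the example above; SQLite compares TEXT values lexicographically just as the varchar comparison does here):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table testtab (id varchar(10))")
conn.executemany("insert into testtab values (?)", [("2",), ("200",), ("53",)])

# Text comparison: '53' beats '200' because '5' sorts after '2'
text_max = conn.execute("select max(id) from testtab").fetchone()[0]

# Numeric comparison after casting: 200 is the true maximum
num_max = conn.execute("select max(cast(id as int)) from testtab").fetchone()[0]

print(text_max, num_max)  # 53 200
```
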
You do know your stored procedure has an implicit race condition, don't you?
Between your calculating the new query id and your table insert getting committed, another session can come in, get exactly the same query id, insert it and get committed. Guess what happens when your insert tries to commit? First in wins; the second gets a duplicate key error. Don't ask me how I know this :)
If you really need a text query id, you might try using a computed column, something like this:
create table dbo.tblCatQuery
(
query_id int not null identity(1,1) primary key clustered ,
query_id_text as right('0000000000'+convert(varchar(10),query_id),10) ,
user_id varchar(20) not null ,
cat_id varchar(20) not null ,
query varchar(100) not null ,
query_title varchar(50) not null
)
Then your stored procedure looks like this:
create procedure dbo.spInsertCatQuery
@Users_Id varchar(20) ,
@Cat_Id varchar(20) ,
@Query varchar(100) ,
@Query_Title varchar(50) ,
@Query_ID varchar(10) output
AS
insert dbo.tblCatQuery ( user_id , cat_id , query_title , query )
VALUES ( @Users_Id , @Cat_Id , @Query_Title , @Query )
-- give the caller back the id of the row just inserted
set @Query_ID = scope_identity()
-- for redundancy, hand it back as the SP's return code, too
return @Query_ID
GO
It sounds like your application needs a string for the ID field, yet in the database you want the ID to behave as an auto-incrementing integer field.
Consider using an integer in the database; when you retrieve the value and need it as a string, convert it at that point, either in your query or in your application.
This will solve your problem.
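A minimal sketch of that suggestion, with SQLite standing in for SQL Server and illustrative names: store the ID as an auto-incrementing integer, and build the padded string form only at the point of use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table tblCatQuery"
    " (query_id integer primary key autoincrement, query text)"
)
cur = conn.execute("insert into tblCatQuery (query) values ('first query')")
new_id = cur.lastrowid  # integer ID assigned by the database

# Convert to the 10-character zero-padded form only when it is needed
query_id_text = f"{new_id:010d}"
print(new_id, query_id_text)  # 1 0000000001
```

The database keeps the simple, correctly-ordering integer; the application decides how to present it.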
You must seriously review your design. I suggest something like this:
CREATE TABLE tblCatQuery(QueryId int NOT NULL PRIMARY KEY IDENTITY(1, 1),
UserId int NOT NULL REFERENCES tblUsers(UserId),
CatId int NOT NULL REFERENCES tblCat(CatId),
Query varchar(100), Query_Title varchar(50))
CREATE TABLE tblUsers(UserId int NOT NULL PRIMARY KEY IDENTITY(1, 1), ....
CREATE TABLE tblCat(CatId int NOT NULL PRIMARY KEY IDENTITY(1, 1), ....
CREATE PROCEDURE dbo.spInsertCatQuery
(
@Users_Id int,
@Cat_Id int,
@Query varchar(100),
@Query_Title varchar(50)
)
AS
BEGIN
INSERT INTO tblCatQuery(UserId, CatId, Query_Title, Query)
VALUES(@Users_Id, @Cat_Id, @Query_Title, @Query)
END
I am using ASP.NET 2008 and MySQL.
I want to auto-generate the value for the field username in the format
"SISI001", "SISI002",
etc. in SQL whenever a new record is inserted.
How can I do it? What would the SQL query be?
Thanks.
Add a column with an auto-increment integer data type.
Then get the maximum value of that column using the MAX() function and assign it to an integer variable (call it x).
After that:
string userid = "SISI";
x = x + 1;
string count = new string('0', 6 - x.ToString().Length);
userid = userid + count + x.ToString();
Use userid as your username.
Hope it helps. Good luck.
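The padding arithmetic in that snippet can be sketched in Python (the 6-digit width is this answer's choice; the question's "SISI001" format implies a width of 3, and the helper name is made up for illustration):

```python
def next_username(last_max: int, width: int = 6) -> str:
    """Build the next 'SISI'-style username by zero-padding the incremented ID."""
    x = last_max + 1
    # str.zfill does the same job as building a string of '0' characters
    return "SISI" + str(x).zfill(width)

print(next_username(0))      # SISI000001
print(next_username(41, 3))  # SISI042
```

Note the MAX()-then-increment step shares the race condition discussed earlier in this page: two sessions can read the same maximum and collide.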
Plan A:
You need to keep a table (keys) that contains the last numeric ID generated for various entities. In this case the entity is "user", so the table contains two columns, entity varchar(100) and lastid int.
You can then write a function that receives the entity name and returns the incremented ID. Use this ID, concatenated with the string component "SISI", in the insert you pass to MySQL.
Following is the MySQL Table tblkeys:
CREATE TABLE `tblkeys` (
`entity` varchar(100) NOT NULL,
`lastid` int(11) NOT NULL,
PRIMARY KEY (`entity`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
The MySQL Function:
DELIMITER $$
CREATE FUNCTION `getkey`( ps_entity VARCHAR(100)) RETURNS INT(11)
BEGIN
DECLARE ll_lastid INT;
UPDATE tblkeys SET lastid = lastid+1 WHERE tblkeys.entity = ps_entity;
SELECT tblkeys.lastid INTO ll_lastid FROM tblkeys WHERE tblkeys.entity = ps_entity;
RETURN ll_lastid;
END$$
DELIMITER ;
The sample function call:
SELECT getkey('user')
Sample Insert command:
insert into users(username, password) values (concat('SISI', getkey('user')), '$password')
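Plan A can be sketched with SQLite standing in for MySQL; the stored function's body becomes two statements in application code, and the table and entity names follow the answer above (the zero-padding width of 3 is an assumption taken from the question's "SISI001" format):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table tblkeys (entity varchar(100) primary key, lastid int not null)"
)
conn.execute("insert into tblkeys values ('user', 0)")

def getkey(conn, entity):
    """Increment and return the last ID for an entity, like the MySQL function."""
    conn.execute(
        "update tblkeys set lastid = lastid + 1 where entity = ?", (entity,)
    )
    row = conn.execute(
        "select lastid from tblkeys where entity = ?", (entity,)
    ).fetchone()
    return row[0]

username = "SISI" + str(getkey(conn, "user")).zfill(3)
print(username)  # SISI001
```

Because the UPDATE happens first, concurrent callers serialize on the row lock rather than reading the same stale maximum, which is the main advantage over the MAX()+1 approach.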
Plan B:
This way the ID will be a bit larger, but it does not require any extra table. Use the following SQL to get a new unique ID:
SELECT ROUND(NOW() + 0)
You can pass it as part of the insert command and concatenate it with the string component of "SISI".
I am not an ASP.NET developer, but I can help you.
You can do something like this:
Create a sequence in your database (note: this CREATE SEQUENCE syntax is Oracle-style; stock MySQL does not support sequences, though MariaDB 10.3+ does):
CREATE SEQUENCE "Database_name"."SEQUENCE1" MINVALUE 1 MAXVALUE 9999999999999999999999999999 INCREMENT BY 001 START WITH 21 CACHE 20 NOORDER NOCYCLE ;
and then while inserting use this query:
insert into testing (userName) values(concat('SISI', sequence1.nextval))
Hope this helps.
Try this:
CREATE TABLE Users (
IDs int NOT NULL IDENTITY (1, 1),
USERNAME AS 'SISI' + RIGHT('000000000' + CAST(IDs as varchar(10)), 4), -- derives a unique username from the IDs field
Address varchar(150)
)
(not tested)
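A rough way to sanity-check the computed-column idea is with SQLite from Python, using a view to stand in for the computed USERNAME column (names follow the answer; this is a sketch of the same technique, not the T-SQL above):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table Users"
    " (IDs integer primary key autoincrement, Address varchar(150))"
)
# The view plays the role of the computed USERNAME column:
# printf('%04d', ...) does the zero-padding that RIGHT(...CAST...) does in T-SQL
conn.execute(
    "create view UsersView as "
    "select IDs, 'SISI' || printf('%04d', IDs) as USERNAME, Address from Users"
)
conn.execute("insert into Users (Address) values ('somewhere')")
name = conn.execute("select USERNAME from UsersView").fetchone()[0]
print(name)  # SISI0001
```

Either way, the username is derived from the auto-increment ID on read, so it never has to be generated or stored by application code.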