Objective: Using MariaDB, I want to read some data from MS SQL Server (via the ODBC CONNECT engine) and INSERT ... SELECT it into a local table.
Issue: I keep getting "error code 1406: data too long" even though the source and destination VARCHAR fields have the very same size (see further details below).
Details:
The query I'm trying to execute has the form:
INSERT INTO DEST_TABLE(NUMERO_DOCUMENTO)
SELECT SUBSTR(TRIM(NUMERO_DOCUMENTO),0,5)
FROM CONNECT_SRC_TABLE
The above is the minimal subset of fields that reproduces the problem.
The source CONNECT table is actually a view inside SQL Server. The destination table has been defined to be identical to the ODBC CONNECT table (same field names, same NULL constraints, same field types and sizes).
There's no issue with a couple of other VARCHAR fields.
The issue occurs with a field NUMERO_DOCUMENTO VARCHAR(14) DEFAULT NULL, where the maximum length in the input table is 14.
The same issue also occurs with two other fields in the same table.
All in all, it seems to be an issue with the source data rather than the destination table.
Attempted workarounds:
I tried to force silent truncation but, reasonably enough, this makes no difference (see: Error Code: 1406. Data too long for column - MySQL).
I tried enlarging the destination field to NUMERO_DOCUMENTO VARCHAR(100) DEFAULT NULL, with no appreciable effect.
I tried to TRIM the source field (hidden spaces?) and to limit its size at the source, to no avail: INSERT INTO DEST_TABLE(NUMERO_DOCUMENTO) SELECT SUBSTR(TRIM(NUMERO_DOCUMENTO),0,5) FROM CONNECT_SRC_TABLE, but the very same error is always returned.
Workaround:
I tried performing the same operation using FOR x IN (src_query) DO INSERT ... END FOR, and this solution seems to work. This means the problem is not in the data itself but in how the engine performs the INSERT ... SELECT query.
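For reference, here is a minimal sketch of that row-by-row workaround (assuming MariaDB 10.3+, which supports FOR loops inside compound statements; table and column names as above):

DELIMITER //
BEGIN NOT ATOMIC
    -- iterate the CONNECT source one row at a time and insert each value separately,
    -- sidestepping the set-based INSERT ... SELECT path that raises error 1406
    FOR x IN (SELECT TRIM(NUMERO_DOCUMENTO) AS NUMERO_DOCUMENTO
              FROM CONNECT_SRC_TABLE) DO
        INSERT INTO DEST_TABLE (NUMERO_DOCUMENTO) VALUES (x.NUMERO_DOCUMENTO);
    END FOR;
END//
DELIMITER ;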
I have a SET table in Teradata. When I load duplicate records through Informatica, the session fails because it tries to push duplicate records into the SET table.
I want Informatica to reject duplicate records whenever they are loaded, using a TPT or relational connection.
Can anyone help me with the properties I need to set?
Do you really need to keep track of which records are rejected due to duplication in the TPT logs? It seems like you are open to suggestions about TPT or relational connections, so I assume you don't really care about TPT-level logs.
If this assumption is correct, you can simply put an Aggregator Transformation in the mapping and mark every field as Group By. As expected, this adds a GROUP BY clause to the generated query and eliminates the duplicates in the source data.
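Conceptually, the pushed-down query would behave roughly like this sketch (the column names are hypothetical):

SELECT col1, col2, col3
FROM   source_table
GROUP BY col1, col2, col3; -- grouping by every column collapses exact duplicates into one row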
Please try the following things:
1. If you use FastLoad or TPT fast load, the utility will implicitly remove the duplicates, but it can only be used to load empty tables.
2. If you are loading into a non-empty table, place a Sorter in the mapping and de-dupe your data in Informatica (or de-dupe at the source; see the sketch after this list).
3. Also try setting the Stop on error flag to 0 and the Error limit on the target to -1.
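For point 2, a SQL override along these lines should also de-dupe at the source (a sketch assuming Teradata's QUALIFY syntax; key_col and attr_col are hypothetical):

SELECT key_col, attr_col
FROM   src_table
-- keep exactly one row per distinct combination of values
QUALIFY ROW_NUMBER() OVER (PARTITION BY key_col, attr_col ORDER BY key_col) = 1;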
Please share your results with us.
There is a string that comes from a text field and has a 200-character limit. The field in the Oracle DB's table has a maximum of 200 characters. The application crashes, saying it can't write 212 characters to a field with a maximum of 200 characters. The problem is clearly at the DB level, since on another database with an identical table and CRUD it all goes well.
Suspecting that the problem might be in encoding differences, I ran
SELECT * FROM NLS_DATABASE_PARAMETERS;
on both databases. The results are identical: NLS_CHARACTERSET in both cases shows the value AL32UTF8. What might be the problem?
P.S. It's an ASP.NET application, if that helps.
If the NLS_LENGTH_SEMANTICS parameter is also the same, maybe the columns are defined differently: VARCHAR2(200 BYTE) vs VARCHAR2(200 CHAR)?
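You can check this in the data dictionary; a quick sketch (YOUR_TABLE is a placeholder; CHAR_USED shows 'B' for BYTE semantics and 'C' for CHAR semantics):

SELECT column_name, data_type, char_length, char_used
FROM   user_tab_columns
WHERE  table_name = 'YOUR_TABLE';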
HTH.
Alessandro
I've got a database table with a very large number of rows. This table represents messages that are logged by a system. Each message has a message type, stored in its own field in the table. I'm writing a website for querying this message log. If I want to search by message type, then ideally I would want a drop-down box listing the message types that have come up in the database. Message types may change over time, so I can't hard-code the types into the drop-down; I'll have to do some sort of lookup. Iterating over the entire table contents to find unique message values is obviously very stupid, but being stupid in the database field, I'm here asking for a better way. Perhaps a separate lookup table, which the database occasionally updates with just the unique message types, and from which I can populate my drop-down, would be a better idea.
Any suggestions would be much appreciated.
The platform I'm using is ASP.NET MVC and SQL Server 2005
A separate lookup table, with the id of the message type stored in your log. This will reduce the size and increase the efficiency of the log. It would also normalize your data.
Yep, I would definitely go with the separate lookup table. You can then populate it using something like:
INSERT TypeLookup (Type)
SELECT DISTINCT Type
FROM BigMassiveTable
You could then run a top-up job periodically to pull in new types from your main table that don't already exist in the lookup table.
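The top-up job could be a sketch along these lines, reusing the same names:

INSERT TypeLookup (Type)
SELECT DISTINCT bmt.Type
FROM BigMassiveTable bmt
WHERE NOT EXISTS (SELECT 1 FROM TypeLookup tl WHERE tl.Type = bmt.Type) -- only types not already in the lookup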
SELECT DISTINCT message_type
FROM message_log
is the most straightforward but not very efficient way.
If you have a list of types that can possibly appear in the log, use this:
SELECT message_type
FROM message_types mt
WHERE message_type IN
(
SELECT message_type
FROM message_log
)
This will be more efficient if message_log.message_type is indexed.
If you don't have this table but want to create one, and message_log.message_type is indexed, use a recursive CTE to emulate a loose index scan:
WITH rows (message_type) AS
(
SELECT MIN(message_type) AS mm
FROM message_log
UNION ALL
SELECT message_type
FROM (
SELECT mn.message_type, ROW_NUMBER() OVER (ORDER BY mn.message_type) AS rn
FROM rows r
JOIN message_log mn
ON mn.message_type > r.message_type
WHERE r.message_type IS NOT NULL
) q
WHERE rn = 1
)
SELECT message_type
FROM rows r
OPTION (MAXRECURSION 0)
I just wanted to state the obvious: normalize the data.
message_types
message_type | message_type_name
messages
message_id | message_type | (other message columns)
Then you can simply do the following, without any cached DISTINCT:
For your dropdown
SELECT * FROM message_types
For your retrieval
SELECT * FROM messages WHERE message_type = ?
SELECT m.*, mt.message_type_name FROM messages AS m
JOIN message_types AS mt
ON ( m.message_type = mt.message_type)
I'm not sure why you would want a cached DISTINCT that you'll have to update, when you can slightly tweak the schema and have one with referential integrity.
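For illustration, the tweaked schema with referential integrity might look like this (a sketch; the column types are assumptions):

CREATE TABLE message_types (
    message_type      int          NOT NULL PRIMARY KEY,
    message_type_name varchar(100) NOT NULL
);

CREATE TABLE messages (
    message_id   int NOT NULL PRIMARY KEY,
    message_type int NOT NULL
        REFERENCES message_types (message_type) -- the RI that keeps types consistent
    -- ... remaining message columns
);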
Create an index on the message type:
CREATE INDEX IX_Messages_MessageType ON Messages (MessageType)
Then to get a list of unique Message Types, you run:
SELECT DISTINCT MessageType
FROM Messages
ORDER BY MessageType
Because the index is physically sorted in order of MessageType, SQL Server can very quickly and efficiently scan through the index, picking up the list of unique message types.
It performs well; this is exactly the kind of thing SQL Server is good at.
Admittedly, you can save some space by having a "message types" table. And if you only display a few messages at a time, the bookmark lookup, as it joins back to the MessageTypes table, won't be a problem. But if you start displaying hundreds or thousands of messages at a time, the join back to MessageTypes can get pretty expensive, and needless, and it will be faster to have the MessageType stored with the message.
But I would have no problem with creating an index on the MessageType column and selecting distinct; SQL Server loves that sort of thing. If you find it to be a real load on your server, once you're getting dozens of hits a second, then follow the other suggestion and cache the list in memory.
My personal solution would be:
create the index
select distinct
and if I still had problems,
cache in memory, expiring after 30 seconds
As for the normalized/denormalized issue: normalizing saves space, at the cost of CPU when joins are constantly performed. And the logical point of normalization is to avoid duplicate data, which can lead to inconsistent data.
Are you planning on changing the text of a message type? If it were stored with the messages, you would have to update all rows.
Or is there something to be said for the fact that, at the time of the message, the message type was "Client response requested"?
Have you considered an indexed view? Its result set is materialized and persisted in storage, so the overhead of the lookup is separated from the rest of whatever you're trying to do.
SQL Server takes care of automagically updating the view whenever a data change would, in its opinion, change the contents of the view, so in this respect it's less flexible than Oracle's materialized views.
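A sketch of such an indexed view, assuming a dbo.Messages table (SQL Server requires SCHEMABINDING and a COUNT_BIG(*) for a grouped indexed view):

CREATE VIEW dbo.MessageTypeList
WITH SCHEMABINDING
AS
SELECT MessageType, COUNT_BIG(*) AS MessageCount -- COUNT_BIG is mandatory in a grouped indexed view
FROM dbo.Messages
GROUP BY MessageType;
GO

-- materializes the view; the distinct types can now be read straight from this index
CREATE UNIQUE CLUSTERED INDEX IX_MessageTypeList
ON dbo.MessageTypeList (MessageType);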
The MessageType should be a foreign key in the main table, referencing a definition table that contains the message type codes and descriptions. This will greatly improve your lookup performance.
Something like
DECLARE @MessageTypes TABLE(
MessageTypeCode VARCHAR(10),
MessageTypeDescription VARCHAR(100)
)

DECLARE @Messages TABLE(
MessageTypeCode VARCHAR(10),
MessageValue VARCHAR(MAX),
MessageLogDate DATETIME,
AdditionalNotes VARCHAR(MAX)
)
With this design, your lookup only needs to query the message types table.
As others have said, create a separate table of message types. When you add a record to the message table, check whether the message type already exists in that table; if not, add it. In either case, post the identifier from the message type table into the message table. This gives you normalized data. Yes, it takes a little extra time when you add a record, but it should be more efficient on retrieval.
If there are a lot more adds than reads, and if the "message type" is short, an entirely different approach would be to still create the separate message type table, but not reference it when doing adds, and only update it lazily, on demand.
Namely: (a) include a timestamp in each message record; (b) keep a list of the message types found as of the last time you checked; (c) each time you check, search for any new message types added since the last time, as in:
select distinct message_type
into #temp_new_types
from message
where timestamp > @last_type_check;

insert into message_type_list (message_type)
select message_type
from #temp_new_types
where message_type not in (select message_type from message_type_list);

drop table #temp_new_types;
Then store the timestamp of this check somewhere so you can use it the next time around.
The answer is to use DISTINCT, but the best solution differs with the size of the table. Thousands of rows? Millions? Billions? More? Each calls for a very different best solution.
Is there a way to further restrict the lookup performed by a Database Lookup functoid so that it includes another column?
I have a table containing four columns.
Id (identity not important for this)
MapId int
Ident1 varchar
Ident2 varchar
I'm trying to get Ident2 for a match on Ident1, but I want the lookup to consider only rows where MapId = 1.
The functoid only allows the four inputs. Any ideas?
UPDATE
It appears there is a technique if you are interested in searching across columns that are string data types. For those interested, I found it here:
Google Books: BizTalk 2006 Recipes
Seeing as I wish to restrict on a numeric column, this doesn't work for me. If anyone has any ideas, I'd appreciate it. Otherwise I may need to think about turning my MapId column into a string.
I changed MapId to MapCode, of type char(3), and used the technique described in the book I linked to in the update to the original question.
The only issue I faced was that my column collations were not aligned, so I was getting an error from SQL when the columns were concatenated in the statement generated by the map:
exec sp_executesql N'SELECT * FROM IdentMap WHERE MapCode+Ident1= @P1',N'@P1 nvarchar(17)',N'<MapCode><Ident2>'
I sniffed this using SQL Profiler.
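If anyone hits the same collation error, one fix is to realign the column collations so the concatenation compares cleanly (a sketch; the type, length, and collation below are placeholders, so match your actual column definitions):

-- give Ident1 the same collation as MapCode so MapCode+Ident1 works
ALTER TABLE IdentMap
ALTER COLUMN Ident1 varchar(50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL;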